preliminary benchmark evaluation: Topics by Science.gov

Sample records for preliminary benchmark evaluation

Benchmarking and validation activities within JEFF project

NASA Astrophysics Data System (ADS)

Cabellos, O.; Alvarez-Velarde, F.; Angelone, M.; Diez, C. J.; Dyrda, J.; Fiorito, L.; Fischer, U.; Fleming, M.; Haeck, W.; Hill, I.; Ichou, R.; Kim, D. H.; Klix, A.; Kodeli, I.; Leconte, P.; Michel-Sendis, F.; Nunnenmann, E.; Pecchia, M.; Peneliau, Y.; Plompen, A.; Rochman, D.; Romojaro, P.; Stankovskiy, A.; Sublet, J. Ch.; Tamagno, P.; Marck, S. van der

2017-09-01

The challenge for any nuclear data evaluation project is to periodically release a revised, fully consistent and complete library, with all needed data and covariances, and ensure that it is robust and reliable for a variety of applications. Within an evaluation effort, benchmarking activities play an important role in validating proposed libraries. The Joint Evaluated Fission and Fusion (JEFF) Project aims to provide such a nuclear data library, and thus, requires a coherent and efficient benchmarking process. The aim of this paper is to present the activities carried out by the new JEFF Benchmarking and Validation Working Group, and to describe the role of the NEA Data Bank in this context. The paper will also review the status of preliminary benchmarking for the next JEFF-3.3 candidate cross-section files.
WWTP dynamic disturbance modelling--an essential module for long-term benchmarking development.

PubMed

Gernaey, K V; Rosen, C; Jeppsson, U

2006-01-01

Intensive use of the benchmark simulation model No. 1 (BSM1), a protocol for objective comparison of the effectiveness of control strategies in biological nitrogen removal activated sludge plants, has also revealed a number of limitations. Preliminary definitions of the long-term benchmark simulation model No. 1 (BSM1_LT) and the benchmark simulation model No. 2 (BSM2) have been made to extend BSM1 for evaluation of process monitoring methods and plant-wide control strategies, respectively. Influent-related disturbances for BSM1_LT/BSM2 are to be generated with a model, and this paper provides a general overview of the modelling methods used. Typical influent dynamic phenomena generated with the BSM1_LT/BSM2 influent disturbance model, including diurnal, weekend, seasonal and holiday effects, as well as rainfall, are illustrated with simulation results. As a result of the work described in this paper, a proposed influent model/file has been released to the benchmark developers for evaluation purposes. Pending this evaluation, a final BSM1_LT/BSM2 influent disturbance model definition is foreseen. Preliminary simulations with dynamic influent data generated by the influent disturbance model indicate that default BSM1 activated sludge plant control strategies will need extensions for BSM1_LT/BSM2 to efficiently handle 1 year of influent dynamics.
Space network scheduling benchmark: A proof-of-concept process for technology transfer

NASA Technical Reports Server (NTRS)

Moe, Karen; Happell, Nadine; Hayden, B. J.; Barclay, Cathy

1993-01-01

This paper describes a detailed proof-of-concept activity to evaluate flexible scheduling technology as implemented in the Request Oriented Scheduling Engine (ROSE) and applied to Space Network (SN) scheduling. The criteria developed for an operational evaluation of a reusable scheduling system is addressed including a methodology to prove that the proposed system performs at least as well as the current system in function and performance. The improvement of the new technology must be demonstrated and evaluated against the cost of making changes. Finally, there is a need to show significant improvement in SN operational procedures. Successful completion of a proof-of-concept would eventually lead to an operational concept and implementation transition plan, which is outside the scope of this paper. However, a high-fidelity benchmark using actual SN scheduling requests has been designed to test the ROSE scheduling tool. The benchmark evaluation methodology, scheduling data, and preliminary results are described.
Fisk-based criteria to support validation of detection methods for drinking water and air.

DOE Office of Scientific and Technical Information (OSTI.GOV)

MacDonell, M.; Bhattacharyya, M.; Finster, M.

2009-02-18

This report was prepared to support the validation of analytical methods for threat contaminants under the U.S. Environmental Protection Agency (EPA) National Homeland Security Research Center (NHSRC) program. It is designed to serve as a resource for certain applications of benchmark and fate information for homeland security threat contaminants. The report identifies risk-based criteria from existing health benchmarks for drinking water and air for potential use as validation targets. The focus is on benchmarks for chronic public exposures. The priority sources are standard EPA concentration limits for drinking water and air, along with oral and inhalation toxicity values. Many contaminantsmore » identified as homeland security threats to drinking water or air would convert to other chemicals within minutes to hours of being released. For this reason, a fate analysis has been performed to identify potential transformation products and removal half-lives in air and water so appropriate forms can be targeted for detection over time. The risk-based criteria presented in this report to frame method validation are expected to be lower than actual operational targets based on realistic exposures following a release. Note that many target criteria provided in this report are taken from available benchmarks without assessing the underlying toxicological details. That is, although the relevance of the chemical form and analogues are evaluated, the toxicological interpretations and extrapolations conducted by the authoring organizations are not. It is also important to emphasize that such targets in the current analysis are not health-based advisory levels to guide homeland security responses. This integrated evaluation of chronic public benchmarks and contaminant fate has identified more than 200 risk-based criteria as method validation targets across numerous contaminants and fate products in drinking water and air combined. The gap in directly applicable values is considerable across the full set of threat contaminants, so preliminary indicators were developed from other well-documented benchmarks to serve as a starting point for validation efforts. By this approach, at least preliminary context is available for water or air, and sometimes both, for all chemicals on the NHSRC list that was provided for this evaluation. This means that a number of concentrations presented in this report represent indirect measures derived from related benchmarks or surrogate chemicals, as described within the many results tables provided in this report.« less
A classification and evaluation of data movement technologies for the delivery of highly voluminous scientific data products

NASA Technical Reports Server (NTRS)

Mattmann, Chris A.; Kelly, Sean; Crichton, Daniel J.; Hughes, J. Steven; Hardman, Sean; Ramirez, Paul; Joyner, Ron

2006-01-01

In this paper, we present a preliminary study of several different electronic data movement technologies. We detail our approach to classifying the technologies included in our study and present the preliminary results of some initial performance benchmarking. Our studies suggest that highly parallel TCP/IP streaming technologies, such as GridFTP and bbFTP, outperform commercial and open-source UDP-bursting technologies in several of the key data movement dimensions that we studied.
Benchmarking Reference Desk Service in Academic Health Science Libraries: A Preliminary Survey.

ERIC Educational Resources Information Center

Robbins, Kathryn; Daniels, Kathleen

2001-01-01

This preliminary study was designed to benchmark patron perceptions of reference desk services at academic health science libraries, using a standard questionnaire. Responses were compared to determine the library that provided the highest-quality service overall and along five service dimensions. All libraries were rated very favorably, but none…
77 FR 58512 - Corrosion-Resistant Carbon Steel Flat Products From the Republic of Korea: Preliminary Results of...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-09-21

... 2006 Decision Memorandum) at ``Benchmarks for Short-Term Financing.'' B. Benchmark for Long-Term Loans.... Subsidies Valuation Information A. Benchmarks for Short-Term Financing For those programs requiring the application of a won-denominated, short-term interest rate benchmark, in accordance with 19 CFR 351.505(a)(2...
Benchmarking the Integration of WAVEWATCH III Results into HAZUS-MH: Preliminary Results

NASA Technical Reports Server (NTRS)

Berglund, Judith; Holland, Donald; McKellip, Rodney; Sciaudone, Jeff; Vickery, Peter; Wang, Zhanxian; Ying, Ken

2005-01-01

The report summarizes the results from the preliminary benchmarking activities associated with the use of WAVEWATCH III (WW3) results in the HAZUS-MH MR1 flood module. Project partner Applied Research Associates (ARA) is integrating the WW3 model into HAZUS. The current version of HAZUS-MH predicts loss estimates from hurricane-related coastal flooding by using values of surge only. Using WW3, wave setup can be included with surge. Loss estimates resulting from the use of surge-only and surge-plus-wave-setup were compared. This benchmarking study is preliminary because the HAZUS-MH MR1 flood module was under development at the time of the study. In addition, WW3 is not scheduled to be fully integrated with HAZUS-MH and available for public release until 2008.
Developing Benchmarks for Solar Radio Bursts

NASA Astrophysics Data System (ADS)

Biesecker, D. A.; White, S. M.; Gopalswamy, N.; Black, C.; Domm, P.; Love, J. J.; Pierson, J.

2016-12-01

Solar radio bursts can interfere with radar, communication, and tracking signals. In severe cases, radio bursts can inhibit the successful use of radio communications and disrupt a wide range of systems that are reliant on Position, Navigation, and Timing services on timescales ranging from minutes to hours across wide areas on the dayside of Earth. The White House's Space Weather Action Plan has asked for solar radio burst intensity benchmarks for an event occurrence frequency of 1 in 100 years and also a theoretical maximum intensity benchmark. The solar radio benchmark team was also asked to define the wavelength/frequency bands of interest. The benchmark team developed preliminary (phase 1) benchmarks for the VHF (30-300 MHz), UHF (300-3000 MHz), GPS (1176-1602 MHz), F10.7 (2800 MHz), and Microwave (4000-20000) bands. The preliminary benchmarks were derived based on previously published work. Limitations in the published work will be addressed in phase 2 of the benchmark process. In addition, deriving theoretical maxima requires additional work, where it is even possible to, in order to meet the Action Plan objectives. In this presentation, we will present the phase 1 benchmarks and the basis used to derive them. We will also present the work that needs to be done in order to complete the final, or phase 2 benchmarks.
Space Weather Action Plan Solar Radio Burst Phase 1 Benchmarks and the Steps to Phase 2

NASA Astrophysics Data System (ADS)

Biesecker, D. A.; White, S. M.; Gopalswamy, N.; Black, C.; Love, J. J.; Pierson, J.

2017-12-01

Solar radio bursts, when at the right frequency and when strong enough, can interfere with radar, communication, and tracking signals. In severe cases, radio bursts can inhibit the successful use of radio communications and disrupt a wide range of systems that are reliant on Position, Navigation, and Timing services on timescales ranging from minutes to hours across wide areas on the dayside of Earth. The White House's Space Weather Action Plan asked for solar radio burst intensity benchmarks for an event occurrence frequency of 1 in 100 years and also a theoretical maximum intensity benchmark. The benchmark team has developed preliminary (phase 1) benchmarks for the VHF (30-300 MHz), UHF (300-3000 MHz), GPS (1176-1602 MHz), F10.7 (2800 MHz), and Microwave (4000-20000) bands. The preliminary benchmarks were derived based on previously published work. Limitations in the published work will be addressed in phase 2 of the benchmark process. In addition, deriving theoretical maxima requires additional work, where it is even possible to, in order to meet the Action Plan objectives. In this presentation, we will present the phase 1 benchmarks, the basis used to derive them, and the limitations of that work. We will also discuss the work that needs to be done to complete the phase 2 benchmarks.
Summary of ORSphere critical and reactor physics measurements

NASA Astrophysics Data System (ADS)

Marshall, Margaret A.; Bess, John D.

2017-09-01

In the early 1970s Dr. John T. Mihalczo (team leader), J.J. Lynn, and J.R. Taylor performed experiments at the Oak Ridge Critical Experiments Facility (ORCEF) with highly enriched uranium (HEU) metal (called Oak Ridge Alloy or ORALLOY) to recreate GODIVA I results with greater accuracy than those performed at Los Alamos National Laboratory in the 1950s. The purpose of the Oak Ridge ORALLOY Sphere (ORSphere) experiments was to estimate the unreflected and unmoderated critical mass of an idealized sphere of uranium metal corrected to a density, purity, and enrichment such that it could be compared with the GODIVA I experiments. This critical configuration has been evaluated. Preliminary results were presented at ND2013. Since then, the evaluation was finalized and judged to be an acceptable benchmark experiment for the International Criticality Safety Benchmark Experiment Project (ICSBEP). Additionally, reactor physics measurements were performed to determine surface button worths, central void worth, delayed neutron fraction, prompt neutron decay constant, fission density and neutron importance. These measurements have been evaluated and found to be acceptable experiments and are discussed in full detail in the International Handbook of Evaluated Reactor Physics Benchmark Experiments. The purpose of this paper is to summarize all the evaluated critical and reactor physics measurements evaluations.
77 FR 75978 - Utility Scale Wind Towers From the People's Republic of China: Final Affirmative Countervailing...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-12-26

... Comment 19: Value Added Tax and Import Duties in the HRS Benchmark Used to Calculate CS Wind's Benefit...: Preliminary Determination of Sales at Less Than Fair Value and Postponement of Final Determination, 77 FR... of HRS Benchmark Provision of Electricity for LTAR Comment 16: Electricity Benchmarks Tax Programs...
Summary of ORSphere Critical and Reactor Physics Measurements

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marshall, Margaret A.; Bess, John D.

In the early 1970s Dr. John T. Mihalczo (team leader), J. J. Lynn, and J. R. Taylor performed experiments at the Oak Ridge Critical Experiments Facility (ORCEF) with highly enriched uranium (HEU) metal (called Oak Ridge Alloy or ORALLOY) to recreate GODIVA I results with greater accuracy than those performed at Los Alamos National Laboratory in the 1950s. The purpose of the Oak Ridge ORALLOY Sphere (ORSphere) experiments was to estimate the unreflected and unmoderated critical mass of an idealized sphere of uranium metal corrected to a density, purity, and enrichment such that it could be compared with the GODIVAmore » I experiments. This critical configuration has been evaluated. Preliminary results were presented at ND2013. Since then, the evaluation was finalized and judged to be an acceptable benchmark experiment for the International Criticality Safety Benchmark Experiment Project (ICSBEP). Additionally, reactor physics measurements were performed to determine surface button worths, central void worth, delayed neutron fraction, prompt neutron decay constant, fission density and neutron importance. These measurements have been evaluated and found to be acceptable experiments and are discussed in full detail in the International Handbook of Evaluated Reactor Physics Benchmark Experiments. The purpose of this paper is summary summarize all the critical and reactor physics measurements evaluations and, when possible, to compare them to GODIVA experiment results.« less
A proposed benchmark problem for cargo nuclear threat monitoring

NASA Astrophysics Data System (ADS)

Wesley Holmes, Thomas; Calderon, Adan; Peeples, Cody R.; Gardner, Robin P.

2011-10-01

There is currently a great deal of technical and political effort focused on reducing the risk of potential attacks on the United States involving radiological dispersal devices or nuclear weapons. This paper proposes a benchmark problem for gamma-ray and X-ray cargo monitoring with results calculated using MCNP5, v1.51. The primary goal is to provide a benchmark problem that will allow researchers in this area to evaluate Monte Carlo models for both speed and accuracy in both forward and inverse calculational codes and approaches for nuclear security applications. A previous benchmark problem was developed by one of the authors (RPG) for two similar oil well logging problems (Gardner and Verghese, 1991, [1]). One of those benchmarks has recently been used by at least two researchers in the nuclear threat area to evaluate the speed and accuracy of Monte Carlo codes combined with variance reduction techniques. This apparent need has prompted us to design this benchmark problem specifically for the nuclear threat researcher. This benchmark consists of conceptual design and preliminary calculational results using gamma-ray interactions on a system containing three thicknesses of three different shielding materials. A point source is placed inside the three materials lead, aluminum, and plywood. The first two materials are in right circular cylindrical form while the third is a cube. The entire system rests on a sufficiently thick lead base so as to reduce undesired scattering events. The configuration was arranged in such a manner that as gamma-ray moves from the source outward it first passes through the lead circular cylinder, then the aluminum circular cylinder, and finally the wooden cube before reaching the detector. A 2 in.×4 in.×16 in. box style NaI (Tl) detector was placed 1 m from the point source located in the center with the 4 in.×16 in. side facing the system. The two sources used in the benchmark are 137Cs and 235U.
U.S. EPA Superfund Program's Policy for Risk and Dose Assessment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Walker, Stuart

2008-01-15

The Environmental Protection Agency (EPA) Office of Superfund Remediation and Technology Innovation (OSRTI) has primary responsibility for implementing the long-term (non-emergency) portion of a key U.S. law regulating cleanup: the Comprehensive Environmental Response, Compensation and Liability Act, CERCLA, nicknamed 'Superfund'. The purpose of the Superfund program is to protect human health and the environment over the long term from releases or potential releases of hazardous substances from abandoned or uncontrolled hazardous waste sites. The focus of this paper is on risk and dose assessment policies and tools for addressing radioactively contaminated sites by the Superfund program. EPA has almost completedmore » two risk assessment tools that are particularly relevant to decommissioning activities conducted under CERCLA authority. These are the: 1. Building Preliminary Remediation Goals for Radionuclides (BPRG) electronic calculator, and 2. Radionuclide Outdoor Surfaces Preliminary Remediation Goals (SPRG) electronic calculator. EPA developed the BPRG calculator to help standardize the evaluation and cleanup of radiologically contaminated buildings at which risk is being assessed for occupancy. BPRGs are radionuclide concentrations in dust, air and building materials that correspond to a specified level of human cancer risk. The intent of SPRG calculator is to address hard outside surfaces such as building slabs, outside building walls, sidewalks and roads. SPRGs are radionuclide concentrations in dust and hard outside surface materials. EPA is also developing the 'Radionuclide Ecological Benchmark' calculator. This calculator provides biota concentration guides (BCGs), also known as ecological screening benchmarks, for use in ecological risk assessments at CERCLA sites. This calculator is intended to develop ecological benchmarks as part of the EPA guidance 'Ecological Risk Assessment Guidance for Superfund: Process for Designing and Conducting Ecological Risk Assessments'. The calculator develops ecological benchmarks for ionizing radiation based on cell death only.« less
76 FR 54209 - Corrosion-Resistant Carbon Steel Flat Products From the Republic of Korea: Preliminary Results of...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-31

... description of the merchandise is dispositive. Subsidies Valuation Information A. Benchmarks for Short-Term Financing For those programs requiring the application of a won-denominated, short-term interest rate... Issues and Decision Memorandum (CORE from Korea 2006 Decision Memorandum) at ``Benchmarks for Short-Term...
Preliminary Results for the OECD/NEA Time Dependent Benchmark using Rattlesnake, Rattlesnake-IQS and TDKENO

DOE Office of Scientific and Technical Information (OSTI.GOV)

DeHart, Mark D.; Mausolff, Zander; Weems, Zach

2016-08-01

One goal of the MAMMOTH M&S project is to validate the analysis capabilities within MAMMOTH. Historical data has shown limited value for validation of full three-dimensional (3D) multi-physics methods. Initial analysis considered the TREAT startup minimum critical core and one of the startup transient tests. At present, validation is focusing on measurements taken during the M8CAL test calibration series. These exercises will valuable in preliminary assessment of the ability of MAMMOTH to perform coupled multi-physics calculations; calculations performed to date are being used to validate the neutron transport solver Rattlesnake\\cite{Rattlesnake} and the fuels performance code BISON. Other validation projects outsidemore » of TREAT are available for single-physics benchmarking. Because the transient solution capability of Rattlesnake is one of the key attributes that makes it unique for TREAT transient simulations, validation of the transient solution of Rattlesnake using other time dependent kinetics benchmarks has considerable value. The Nuclear Energy Agency (NEA) of the Organization for Economic Cooperation and Development (OECD) has recently developed a computational benchmark for transient simulations. This benchmark considered both two-dimensional (2D) and 3D configurations for a total number of 26 different transients. All are negative reactivity insertions, typically returning to the critical state after some time.« less
Seismo-acoustic ray model benchmarking against experimental tank data.

PubMed

Camargo Rodríguez, Orlando; Collis, Jon M; Simpson, Harry J; Ey, Emanuel; Schneiderwind, Joseph; Felisberto, Paulo

2012-08-01

Acoustic predictions of the recently developed traceo ray model, which accounts for bottom shear properties, are benchmarked against tank experimental data from the EPEE-1 and EPEE-2 (Elastic Parabolic Equation Experiment) experiments. Both experiments are representative of signal propagation in a Pekeris-like shallow-water waveguide over a non-flat isotropic elastic bottom, where significant interaction of the signal with the bottom can be expected. The benchmarks show, in particular, that the ray model can be as accurate as a parabolic approximation model benchmarked in similar conditions. The results of benchmarking are important, on one side, as a preliminary experimental validation of the model and, on the other side, demonstrates the reliability of the ray approach for seismo-acoustic applications.
77 FR 33167 - Citric Acid and Certain Citrate Salts From the People's Republic of China: Preliminary Results of...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-06-05

... to the short- and medium-term rates to convert them to long- term rates using Bloomberg U.S... derivation of the benchmark and discount rates used to value these subsidies is discussed below. Short-Term... inflation-adjusted short-term benchmark rate, we have also excluded any countries with aberrational or...
Assessing I-Grid(TM) web-based monitoring for power quality and reliability benchmarking

DOE Office of Scientific and Technical Information (OSTI.GOV)

Divan, Deepak; Brumsickle, William; Eto, Joseph

2003-04-30

This paper presents preliminary findings from DOEs pilot program. The results show how a web-based monitoring system can form the basis for aggregation of data and correlation and benchmarking across broad geographical lines. A longer report describes additional findings from the pilot, including impacts of power quality and reliability on customers operations [Divan, Brumsickle, Eto 2003].

Toward Scalable Benchmarks for Mass Storage Systems

NASA Technical Reports Server (NTRS)

Miller, Ethan L.

1996-01-01

This paper presents guidelines for the design of a mass storage system benchmark suite, along with preliminary suggestions for programs to be included. The benchmarks will measure both peak and sustained performance of the system as well as predicting both short- and long-term behavior. These benchmarks should be both portable and scalable so they may be used on storage systems from tens of gigabytes to petabytes or more. By developing a standard set of benchmarks that reflect real user workload, we hope to encourage system designers and users to publish performance figures that can be compared with those of other systems. This will allow users to choose the system that best meets their needs and give designers a tool with which they can measure the performance effects of improvements to their systems.
"Winsight"™ Assessment System: Preliminary Theory of Action. Research Report. ETS RR-17-26

ERIC Educational Resources Information Center

Wylie, E. Caroline

2017-01-01

The "Winsight"™Assessment System integrates summative, interim (including both benchmark assessments and testlets), and formative assessment components initially focused on mathematics and English language arts (ELA) in Grades 3-8 and high school.This report provides a preliminary theory of action for the Winsight Assessment System. A…
HyspIRI Low Latency Concept and Benchmarks

NASA Technical Reports Server (NTRS)

Mandl, Dan

2010-01-01

Topics include HyspIRI low latency data ops concept, HyspIRI data flow, ongoing efforts, experiment with Web Coverage Processing Service (WCPS) approach to injecting new algorithms into SensorWeb, low fidelity HyspIRI IPM testbed, compute cloud testbed, open cloud testbed environment, Global Lambda Integrated Facility (GLIF) and OCC collaboration with Starlight, delay tolerant network (DTN) protocol benchmarking, and EO-1 configuration for preliminary DTN prototype.
Performance analysis of fusion nuclear-data benchmark experiments for light to heavy materials in MeV energy region with a neutron spectrum shifter

NASA Astrophysics Data System (ADS)

Murata, Isao; Ohta, Masayuki; Miyamaru, Hiroyuki; Kondo, Keitaro; Yoshida, Shigeo; Iida, Toshiyuki; Ochiai, Kentaro; Konno, Chikara

2011-10-01

Nuclear data are indispensable for development of fusion reactor candidate materials. However, benchmarking of the nuclear data in MeV energy region is not yet adequate. In the present study, benchmark performance in the MeV energy region was investigated theoretically for experiments by using a 14 MeV neutron source. We carried out a systematical analysis for light to heavy materials. As a result, the benchmark performance for the neutron spectrum was confirmed to be acceptable, while for gamma-rays it was not sufficiently accurate. Consequently, a spectrum shifter has to be applied. Beryllium had the best performance as a shifter. Moreover, a preliminary examination of whether it is really acceptable that only the spectrum before the last collision is considered in the benchmark performance analysis. It was pointed out that not only the last collision but also earlier collisions should be considered equally in the benchmark performance analysis.
Some experiences in aircraft aeroelastic design using Preliminary Aeroelastic Design of Structures (PAD)

NASA Technical Reports Server (NTRS)

Radovcich, N. A.

1984-01-01

The design experience associated with a benchmark aeroelastic design of an out of production transport aircraft is discussed. Current work being performed on a high aspect ratio wing design is reported. The Preliminary Aeroelastic Design of Structures (PADS) system is briefly summarized and some operational aspects of generating the design in an automated aeroelastic design environment are discussed.
Evaluation of anode (electro)catalytic materials for the direct borohydride fuel cell: Methods and benchmarks

NASA Astrophysics Data System (ADS)

Olu, Pierre-Yves; Job, Nathalie; Chatenet, Marian

2016-09-01

In this paper, different methods are discussed for the evaluation of the potential of a given catalyst, in view of an application as a direct borohydride fuel cell DBFC anode material. Characterizations results in DBFC configuration are notably analyzed at the light of important experimental variables which influence the performances of the DBFC. However, in many practical DBFC-oriented studies, these various experimental variables prevent one to isolate the influence of the anode catalyst on the cell performances. Thus, the electrochemical three-electrode cell is a widely-employed and useful tool to isolate the DBFC anode catalyst and to investigate its electrocatalytic activity towards the borohydride oxidation reaction (BOR) in the absence of other limitations. This article reviews selected results for different types of catalysts in electrochemical cell containing a sodium borohydride alkaline electrolyte. In particular, propositions of common experimental conditions and benchmarks are given for practical evaluation of the electrocatalytic activity towards the BOR in three-electrode cell configuration. The major issue of gaseous hydrogen generation and escape upon DBFC operation is also addressed through a comprehensive review of various results depending on the anode composition. At last, preliminary concerns are raised about the stability of potential anode catalysts upon DBFC operation.
Source-term development for a contaminant plume for use by multimedia risk assessment models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whelan, Gene; McDonald, John P.; Taira, Randal Y.

1999-12-01

Multimedia modelers from the U.S. Environmental Protection Agency (EPA) and the U.S. Department of Energy (DOE) are collaborating to conduct a comprehensive and quantitative benchmarking analysis of four intermedia models: DOE's Multimedia Environmental Pollutant Assessment System (MEPAS), EPA's MMSOILS, EPA's PRESTO, and DOE's RESidual RADioactivity (RESRAD). These models represent typical analytically, semi-analytically, and empirically based tools that are utilized in human risk and endangerment assessments for use at installations containing radioactive and/or hazardous contaminants. Although the benchmarking exercise traditionally emphasizes the application and comparison of these models, the establishment of a Conceptual Site Model (CSM) should be viewed with equalmore » importance. This paper reviews an approach for developing a CSM of an existing, real-world, Sr-90 plume at DOE's Hanford installation in Richland, Washington, for use in a multimedia-based benchmarking exercise bet ween MEPAS, MMSOILS, PRESTO, and RESRAD. In an unconventional move for analytically based modeling, the benchmarking exercise will begin with the plume as the source of contamination. The source and release mechanism are developed and described within the context of performing a preliminary risk assessment utilizing these analytical models. By beginning with the plume as the source term, this paper reviews a typical process and procedure an analyst would follow in developing a CSM for use in a preliminary assessment using this class of analytical tool.« less
Improving guideline concordance in multidisciplinary teams: preliminary results of a cluster-randomized trial evaluating the effect of a web-based audit and feedback intervention with outreach visits

PubMed Central

van Engen-Verheul, Mariëtte M.; Gude, Wouter T.; van der Veer, Sabine N.; Kemps, Hareld M.C.; Jaspers, Monique M.W.; de Keizer, Nicolette F.; Peek, Niels

2015-01-01

Despite their widespread use, audit and feedback (A&F) interventions show variable effectiveness on improving professional performance. Based on known facilitators of successful A&F interventions, we developed a web-based A&F intervention with indicator-based performance feedback, benchmark information, action planning and outreach visits. The goal of the intervention was to engage with multidisciplinary teams to overcome barriers to guideline concordance and to improve overall team performance in the field of cardiac rehabilitation (CR). To assess its effectiveness we conducted a cluster-randomized trial in 18 CR clinics (14,847 patients) already working with computerized decision support (CDS). Our preliminary results showed no increase in concordance with guideline recommendations regarding prescription of CR therapies. Future analyses will investigate whether our intervention did improve team performance on other quality indicators. PMID:26958310
Joint Research Centre Copernicus Climate Change Service (C3S) Fitness-for-Purpose (F4P) Platform

NASA Astrophysics Data System (ADS)

Gobron, N.; Adams, J. S.; Cappucci, F.; Lanconelli, C.; Mota, B.; Melin, F.

2016-08-01

This paper presents the concept and first results of the Copernicus Climate Change Service Fitness-for-Purpose (C3S F4P) project. The main goal aims at evaluating the efficiency and overall performance of the service, mainly with regard to users information needs and high level requirements. This project will also assess the fitness- for-purpose of the C3S with a specific emphasis on the needs of European Union (EU) Policies and translate these recommendations into programmatic and technical requirements. The C3S Climate Data Records (CDS) include various Essential Climate Variables (ECVs) that are derived from space sensors, including from Copernicus Sentinels sensors. One module of the F4P platform focuses on the benchmarking of data sets and algorithms, in addition to radiative transfer models used towards understanding potential discrepancies between CDS records. Methods and preliminary results of the benchmark platform are presented in this contribution.
High Density Aerial Image Matching: State-Of and Future Prospects

NASA Astrophysics Data System (ADS)

Haala, N.; Cavegn, S.

2016-06-01

Ongoing innovations in matching algorithms are continuously improving the quality of geometric surface representations generated automatically from aerial images. This development motivated the launch of the joint ISPRS/EuroSDR project "Benchmark on High Density Aerial Image Matching", which aims on the evaluation of photogrammetric 3D data capture in view of the current developments in dense multi-view stereo-image matching. Originally, the test aimed on image based DSM computation from conventional aerial image flights for different landuse and image block configurations. The second phase then put an additional focus on high quality, high resolution 3D geometric data capture in complex urban areas. This includes both the extension of the test scenario to oblique aerial image flights as well as the generation of filtered point clouds as additional output of the respective multi-view reconstruction. The paper uses the preliminary outcomes of the benchmark to demonstrate the state-of-the-art in airborne image matching with a special focus of high quality geometric data capture in urban scenarios.
DOE Office of Scientific and Technical Information (OSTI.GOV)

MOSTELLER, RUSSELL D.

Previous studies have indicated that ENDF/B-VII preliminary releases {beta}-2 and {beta}-3, predecessors to the recent initial release of ENDF/B-VII.0, produce significantly better overall agreement with criticality benchmarks than does ENDF/B-VI. However, one of those studies also suggests that improvements still may be needed for thermal plutonium cross sections. The current study substantiates that concern by examining criticality benchmarks for unreflected spheres of plutonium-nitrate solutions and for slightly and heavily borated mixed-oxide (MOX) lattices. Results are presented for the JEFF-3.1 and JENDL-3.3 nuclear data libraries as well as ENDF/B-VII.0 and ENDF/B-VI. It is shown that ENDF/B-VII.0 tends to overpredict reactivity formore » thermal plutonium benchmarks over at least a portion of the thermal range. In addition, it is found that additional benchmark data are needed for the deep thermal range.« less
Experimental power density distribution benchmark in the TRIGA Mark II reactor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Snoj, L.; Stancar, Z.; Radulovic, V.

2012-07-01

In order to improve the power calibration process and to benchmark the existing computational model of the TRIGA Mark II reactor at the Josef Stefan Inst. (JSI), a bilateral project was started as part of the agreement between the French Commissariat a l'energie atomique et aux energies alternatives (CEA) and the Ministry of higher education, science and technology of Slovenia. One of the objectives of the project was to analyze and improve the power calibration process of the JSI TRIGA reactor (procedural improvement and uncertainty reduction) by using absolutely calibrated CEA fission chambers (FCs). This is one of the fewmore » available power density distribution benchmarks for testing not only the fission rate distribution but also the absolute values of the fission rates. Our preliminary calculations indicate that the total experimental uncertainty of the measured reaction rate is sufficiently low that the experiments could be considered as benchmark experiments. (authors)« less
Benchmark Evaluation of Start-Up and Zero-Power Measurements at the High-Temperature Engineering Test Reactor

DOE PAGES

Bess, John D.; Fujimoto, Nozomu

2014-10-09

Benchmark models were developed to evaluate six cold-critical and two warm-critical, zero-power measurements of the HTTR. Additional measurements of a fully-loaded subcritical configuration, core excess reactivity, shutdown margins, six isothermal temperature coefficients, and axial reaction-rate distributions were also evaluated as acceptable benchmark experiments. Insufficient information is publicly available to develop finely-detailed models of the HTTR as much of the design information is still proprietary. However, the uncertainties in the benchmark models are judged to be of sufficient magnitude to encompass any biases and bias uncertainties incurred through the simplification process used to develop the benchmark models. Dominant uncertainties in themore » experimental keff for all core configurations come from uncertainties in the impurity content of the various graphite blocks that comprise the HTTR. Monte Carlo calculations of keff are between approximately 0.9 % and 2.7 % greater than the benchmark values. Reevaluation of the HTTR models as additional information becomes available could improve the quality of this benchmark and possibly reduce the computational biases. High-quality characterization of graphite impurities would significantly improve the quality of the HTTR benchmark assessment. Simulation of the other reactor physics measurements are in good agreement with the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less
Comprehensive Benchmark Suite for Simulation of Particle Laden Flows Using the Discrete Element Method with Performance Profiles from the Multiphase Flow with Interface eXchanges (MFiX) Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Peiyuan; Brown, Timothy; Fullmer, William D.

Five benchmark problems are developed and simulated with the computational fluid dynamics and discrete element model code MFiX. The benchmark problems span dilute and dense regimes, consider statistically homogeneous and inhomogeneous (both clusters and bubbles) particle concentrations and a range of particle and fluid dynamic computational loads. Several variations of the benchmark problems are also discussed to extend the computational phase space to cover granular (particles only), bidisperse and heat transfer cases. A weak scaling analysis is performed for each benchmark problem and, in most cases, the scalability of the code appears reasonable up to approx. 103 cores. Profiling ofmore » the benchmark problems indicate that the most substantial computational time is being spent on particle-particle force calculations, drag force calculations and interpolating between discrete particle and continuum fields. Hardware performance analysis was also carried out showing significant Level 2 cache miss ratios and a rather low degree of vectorization. These results are intended to serve as a baseline for future developments to the code as well as a preliminary indicator of where to best focus performance optimizations.« less
The Isprs Benchmark on Indoor Modelling

NASA Astrophysics Data System (ADS)

Khoshelham, K.; Díaz Vilariño, L.; Peter, M.; Kang, Z.; Acharya, D.

2017-09-01

Automated generation of 3D indoor models from point cloud data has been a topic of intensive research in recent years. While results on various datasets have been reported in literature, a comparison of the performance of different methods has not been possible due to the lack of benchmark datasets and a common evaluation framework. The ISPRS benchmark on indoor modelling aims to address this issue by providing a public benchmark dataset and an evaluation framework for performance comparison of indoor modelling methods. In this paper, we present the benchmark dataset comprising several point clouds of indoor environments captured by different sensors. We also discuss the evaluation and comparison of indoor modelling methods based on manually created reference models and appropriate quality evaluation criteria. The benchmark dataset is available for download at: http://www2.isprs.org/commissions/comm4/wg5/benchmark-on-indoor-modelling.html.
Evaluation of control strategies using an oxidation ditch benchmark.

PubMed

Abusam, A; Keesman, K J; Spanjers, H; van, Straten G; Meinema, K

2002-01-01

This paper presents validation and implementation results of a benchmark developed for a specific full-scale oxidation ditch wastewater treatment plant. A benchmark is a standard simulation procedure that can be used as a tool in evaluating various control strategies proposed for wastewater treatment plants. It is based on model and performance criteria development. Testing of this benchmark, by comparing benchmark predictions to real measurements of the electrical energy consumptions and amounts of disposed sludge for a specific oxidation ditch WWTP, has shown that it can (reasonably) be used for evaluating the performance of this WWTP. Subsequently, the validated benchmark was then used in evaluating some basic and advanced control strategies. Some of the interesting results obtained are the following: (i) influent flow splitting ratio, between the first and the fourth aerated compartments of the ditch, has no significant effect on the TN concentrations in the effluent, and (ii) for evaluation of long-term control strategies, future benchmarks need to be able to assess settlers' performance.
Benchmarking specialty hospitals, a scoping review on theory and practice.

PubMed

Wind, A; van Harten, W H

2017-04-04

Although benchmarking may improve hospital processes, research on this subject is limited. The aim of this study was to provide an overview of publications on benchmarking in specialty hospitals and a description of study characteristics. We searched PubMed and EMBASE for articles published in English in the last 10 years. Eligible articles described a project stating benchmarking as its objective and involving a specialty hospital or specific patient category; or those dealing with the methodology or evaluation of benchmarking. Of 1,817 articles identified in total, 24 were included in the study. Articles were categorized into: pathway benchmarking, institutional benchmarking, articles on benchmark methodology or -evaluation and benchmarking using a patient registry. There was a large degree of variability:(1) study designs were mostly descriptive and retrospective; (2) not all studies generated and showed data in sufficient detail; and (3) there was variety in whether a benchmarking model was just described or if quality improvement as a consequence of the benchmark was reported upon. Most of the studies that described a benchmark model described the use of benchmarking partners from the same industry category, sometimes from all over the world. Benchmarking seems to be more developed in eye hospitals, emergency departments and oncology specialty hospitals. Some studies showed promising improvement effects. However, the majority of the articles lacked a structured design, and did not report on benchmark outcomes. In order to evaluate the effectiveness of benchmarking to improve quality in specialty hospitals, robust and structured designs are needed including a follow up to check whether the benchmark study has led to improvements.
Enhancing pediatric residents’ scholar role: the development of a Scholarly Activity Guidance and Evaluation program

PubMed Central

Pound, Catherine M.; Moreau, Katherine A.; Ward, Natalie; Eady, Kaylee; Writer, Hilary

2015-01-01

Background Research training is essential to the development of well-rounded physicians. Although many pediatric residency programs require residents to complete a research project, it is often challenging to integrate research training into educational programs. Objective We aimed to develop an innovative research program for pediatric residents, called the Scholarly Activity Guidance and Evaluation (SAGE) program. Methods We developed a competency-based program which establishes benchmarks for pediatric residents, while providing ongoing academic mentorship. Results Feedback from residents and their research supervisors about the SAGE program has been positive. Preliminary evaluation data have shown that all final-year residents have met or exceeded program expectations. Conclusions By providing residents with this supportive environment, we hope to influence their academic career paths, increase their research productivity, promote evidence-based practice, and ultimately, positively impact health outcomes. PMID:26059213
Benchmarks for Evaluation of Distributed Denial of Service (DDOS)

DTIC Science & Technology

2008-01-01

publications: [1] E. Arikan , Attack Profiling for DDoS Benchmarks, MS Thesis, University of Delaware, August 2006. [2] J. Mirkovic, A. Hussain, B. Wilson...Sigmetrics 2007, June 2007 [5] J. Mirkovic, E. Arikan , S. Wei, S. Fahmy, R. Thomas, and P. Reiher Benchmarks for DDoS Defense Evaluation, Proceedings of the...Security Experimentation, June 2006. [9] J. Mirkovic, E. Arikan , S. Wei, S. Fahmy, R. Thomas, P. Reiher, Benchmarks for DDoS Defense Evaluation
Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives.

PubMed

Lagarde, Nathalie; Zagury, Jean-François; Montes, Matthieu

2015-07-27

Virtual screening methods are commonly used nowadays in drug discovery processes. However, to ensure their reliability, they have to be carefully evaluated. The evaluation of these methods is often realized in a retrospective way, notably by studying the enrichment of benchmarking data sets. To this purpose, numerous benchmarking data sets were developed over the years, and the resulting improvements led to the availability of high quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.

Benchmark Evaluation of HTR-PROTEUS Pebble Bed Experimental Program

DOE PAGES

Bess, John D.; Montierth, Leland; Köberl, Oliver; ...

2014-10-09

Benchmark models were developed to evaluate 11 critical core configurations of the HTR-PROTEUS pebble bed experimental program. Various additional reactor physics measurements were performed as part of this program; currently only a total of 37 absorber rod worth measurements have been evaluated as acceptable benchmark experiments for Cores 4, 9, and 10. Dominant uncertainties in the experimental keff for all core configurations come from uncertainties in the ²³⁵U enrichment of the fuel, impurities in the moderator pebbles, and the density and impurity content of the radial reflector. Calculations of k eff with MCNP5 and ENDF/B-VII.0 neutron nuclear data aremore » greater than the benchmark values but within 1% and also within the 3σ uncertainty, except for Core 4, which is the only randomly packed pebble configuration. Repeated calculations of k eff with MCNP6.1 and ENDF/B-VII.1 are lower than the benchmark values and within 1% (~3σ) except for Cores 5 and 9, which calculate lower than the benchmark eigenvalues within 4σ. The primary difference between the two nuclear data libraries is the adjustment of the absorption cross section of graphite. Simulations of the absorber rod worth measurements are within 3σ of the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less
How do I know if my forecasts are better? Using benchmarks in hydrological ensemble prediction

NASA Astrophysics Data System (ADS)

Pappenberger, F.; Ramos, M. H.; Cloke, H. L.; Wetterhall, F.; Alfieri, L.; Bogner, K.; Mueller, A.; Salamon, P.

2015-03-01

The skill of a forecast can be assessed by comparing the relative proximity of both the forecast and a benchmark to the observations. Example benchmarks include climatology or a naïve forecast. Hydrological ensemble prediction systems (HEPS) are currently transforming the hydrological forecasting environment but in this new field there is little information to guide researchers and operational forecasters on how benchmarks can be best used to evaluate their probabilistic forecasts. In this study, it is identified that the forecast skill calculated can vary depending on the benchmark selected and that the selection of a benchmark for determining forecasting system skill is sensitive to a number of hydrological and system factors. A benchmark intercomparison experiment is then undertaken using the continuous ranked probability score (CRPS), a reference forecasting system and a suite of 23 different methods to derive benchmarks. The benchmarks are assessed within the operational set-up of the European Flood Awareness System (EFAS) to determine those that are 'toughest to beat' and so give the most robust discrimination of forecast skill, particularly for the spatial average fields that EFAS relies upon. Evaluating against an observed discharge proxy the benchmark that has most utility for EFAS and avoids the most naïve skill across different hydrological situations is found to be meteorological persistency. This benchmark uses the latest meteorological observations of precipitation and temperature to drive the hydrological model. Hydrological long term average benchmarks, which are currently used in EFAS, are very easily beaten by the forecasting system and the use of these produces much naïve skill. When decomposed into seasons, the advanced meteorological benchmarks, which make use of meteorological observations from the past 20 years at the same calendar date, have the most skill discrimination. They are also good at discriminating skill in low flows and for all catchment sizes. Simpler meteorological benchmarks are particularly useful for high flows. Recommendations for EFAS are to move to routine use of meteorological persistency, an advanced meteorological benchmark and a simple meteorological benchmark in order to provide a robust evaluation of forecast skill. This work provides the first comprehensive evidence on how benchmarks can be used in evaluation of skill in probabilistic hydrological forecasts and which benchmarks are most useful for skill discrimination and avoidance of naïve skill in a large scale HEPS. It is recommended that all HEPS use the evidence and methodology provided here to evaluate which benchmarks to employ; so forecasters can have trust in their skill evaluation and will have confidence that their forecasts are indeed better.
Precise Ages for the Benchmark Brown Dwarfs HD 19467 B and HD 4747 B

NASA Astrophysics Data System (ADS)

Wood, Charlotte; Boyajian, Tabetha; Crepp, Justin; von Braun, Kaspar; Brewer, John; Schaefer, Gail; Adams, Arthur; White, Tim

2018-01-01

Large uncertainty in the age of brown dwarfs, stemming from a mass-age degeneracy, makes it difficult to constrain substellar evolutionary models. To break the degeneracy, we need ''benchmark" brown dwarfs (found in binary systems) whose ages can be determined independent of their masses. HD~19467~B and HD~4747~B are two benchmark brown dwarfs detected through the TRENDS (TaRgeting bENchmark objects with Doppler Spectroscopy) high-contrast imaging program for which we have dynamical mass measurements. To constrain their ages independently through isochronal analysis, we measured the radii of the host stars with interferometry using the Center for High Angular Resolution Astronomy (CHARA) Array. Assuming the brown dwarfs have the same ages as their host stars, we use these results to distinguish between several substellar evolutionary models. In this poster, we present new age estimates for HD~19467 and HD~4747 that are more accurate and precise and show our preliminary comparisons to cooling models.
Large-eddy simulation of a backward facing step flow using a least-squares spectral element method

NASA Technical Reports Server (NTRS)

Chan, Daniel C.; Mittal, Rajat

1996-01-01

We report preliminary results obtained from the large eddy simulation of a backward facing step at a Reynolds number of 5100. The numerical platform is based on a high order Legendre spectral element spatial discretization and a least squares time integration scheme. A non-reflective outflow boundary condition is in place to minimize the effect of downstream influence. Smagorinsky model with Van Driest near wall damping is used for sub-grid scale modeling. Comparisons of mean velocity profiles and wall pressure show good agreement with benchmark data. More studies are needed to evaluate the sensitivity of this method on numerical parameters before it is applied to complex engineering problems.
OWL2 benchmarking for the evaluation of knowledge based systems.

PubMed

Khan, Sher Afgun; Qadir, Muhammad Abdul; Abbas, Muhammad Azeem; Afzal, Muhammad Tanvir

2017-01-01

OWL2 semantics are becoming increasingly popular for the real domain applications like Gene engineering and health MIS. The present work identifies the research gap that negligible attention has been paid to the performance evaluation of Knowledge Base Systems (KBS) using OWL2 semantics. To fulfil this identified research gap, an OWL2 benchmark for the evaluation of KBS is proposed. The proposed benchmark addresses the foundational blocks of an ontology benchmark i.e. data schema, workload and performance metrics. The proposed benchmark is tested on memory based, file based, relational database and graph based KBS for performance and scalability measures. The results show that the proposed benchmark is able to evaluate the behaviour of different state of the art KBS on OWL2 semantics. On the basis of the results, the end users (i.e. domain expert) would be able to select a suitable KBS appropriate for his domain.
Absolute and relative cross section measurements of 237Np(n,f) and 238U(n,f) at the National Physical Laboratory

NASA Astrophysics Data System (ADS)

Salvador-Castiñeira, Paula; Hambsch, Franz-Josef; Göök, Alf; Vidali, Marzio; Hawkes, Nigel P.; Roberts, Neil J.; Taylor, Graeme C.; Thomas, David J.

2017-09-01

Cross section measurements in the fast energy region are being demanded as one of the key ingredients for modelling Generation-IV nuclear power plants. However, in facilities where there are no time-of-flight possibilities or it is not convenient to use them, using the 235U(n,f) cross section as a benchmark would require a careful knowledge of the room scatter in the experimental area. In this paper we present measurements of two threshold reactions, 238U(n,f) and 237Np(n,f), that could become a standard between their fission threshold and 2.5 MeV, if the discrepancies shown in the evaluations and in some experimental data can be solved. The preliminary results are in agreement with the present ENDF/B-VII.1 evaluation.
Commercial Building Energy Saver, Web App

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hong, Tianzhen; Piette, Mary; Lee, Sang Hoon

The CBES App is a web-based toolkit for use by small businesses and building owners and operators of small and medium size commercial buildings to perform energy benchmarking and retrofit analysis for buildings. The CBES App analyzes the energy performance of user's building for pre-and posto-retrofit, in conjunction with user's input data, to identify recommended retrofit measures, energy savings and economic analysis for the selected measures. The CBES App provides energy benchmarking, including getting an EnergyStar score using EnergyStar API and benchmarking against California peer buildings using the EnergyIQ API. The retrofit analysis includes a preliminary analysis by looking upmore » retrofit measures from a pre-simulated database DEEP, and a detailed analysis creating and running EnergyPlus models to calculate energy savings of retrofit measures. The CBES App builds upon the LBNL CBES API.« less
HPC Analytics Support. Requirements for Uncertainty Quantification Benchmarks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paulson, Patrick R.; Purohit, Sumit; Rodriguez, Luke R.

2015-05-01

This report outlines techniques for extending benchmark generation products so they support uncertainty quantification by benchmarked systems. We describe how uncertainty quantification requirements can be presented to candidate analytical tools supporting SPARQL. We describe benchmark data sets for evaluating uncertainty quantification, as well as an approach for using our benchmark generator to produce data sets for generating benchmark data sets.
Benchmarking a geostatistical procedure for the homogenisation of annual precipitation series

NASA Astrophysics Data System (ADS)

Caineta, Júlio; Ribeiro, Sara; Henriques, Roberto; Soares, Amílcar; Costa, Ana Cristina

2014-05-01

The European project COST Action ES0601, Advances in homogenisation methods of climate series: an integrated approach (HOME), has brought to attention the importance of establishing reliable homogenisation methods for climate data. In order to achieve that, a benchmark data set, containing monthly and daily temperature and precipitation data, was created to be used as a comparison basis for the effectiveness of those methods. Several contributions were submitted and evaluated by a number of performance metrics, validating the results against realistic inhomogeneous data. HOME also led to the development of new homogenisation software packages, which included feedback and lessons learned during the project. Preliminary studies have suggested a geostatistical stochastic approach, which uses Direct Sequential Simulation (DSS), as a promising methodology for the homogenisation of precipitation data series. Based on the spatial and temporal correlation between the neighbouring stations, DSS calculates local probability density functions at a candidate station to detect inhomogeneities. The purpose of the current study is to test and compare this geostatistical approach with the methods previously presented in the HOME project, using surrogate precipitation series from the HOME benchmark data set. The benchmark data set contains monthly precipitation surrogate series, from which annual precipitation data series were derived. These annual precipitation series were subject to exploratory analysis and to a thorough variography study. The geostatistical approach was then applied to the data set, based on different scenarios for the spatial continuity. Implementing this procedure also promoted the development of a computer program that aims to assist on the homogenisation of climate data, while minimising user interaction. Finally, in order to compare the effectiveness of this methodology with the homogenisation methods submitted during the HOME project, the obtained results were evaluated using the same performance metrics. This comparison opens new perspectives for the development of an innovative procedure based on the geostatistical stochastic approach. Acknowledgements: The authors gratefully acknowledge the financial support of "Fundação para a Ciência e Tecnologia" (FCT), Portugal, through the research project PTDC/GEO-MET/4026/2012 ("GSIMCLI - Geostatistical simulation with local distributions for the homogenization and interpolation of climate data").
Blind Pose Prediction, Scoring, and Affinity Ranking of the CSAR 2014 Dataset.

PubMed

Martiny, Virginie Y; Martz, François; Selwa, Edithe; Iorga, Bogdan I

2016-06-27

The 2014 CSAR Benchmark Exercise was focused on three protein targets: coagulation factor Xa, spleen tyrosine kinase, and bacterial tRNA methyltransferase. Our protocol involved a preliminary analysis of the structural information available in the Protein Data Bank for the protein targets, which allowed the identification of the most appropriate docking software and scoring functions to be used for the rescoring of several docking conformations datasets, as well as for pose prediction and affinity ranking. The two key points of this study were (i) the prior evaluation of molecular modeling tools that are most adapted for each target and (ii) the increased search efficiency during the docking process to better explore the conformational space of big and flexible ligands.
Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks

NASA Technical Reports Server (NTRS)

Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias;

2006-01-01

The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray XI, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks are run on each of these systems. Additionally, we present Intel MPI Benchmarks (IMB) results to study the performance of 11 MPI communication functions on these systems.

Anharmonic Vibrational Spectroscopy on Metal Transition Complexes

NASA Astrophysics Data System (ADS)

Latouche, Camille; Bloino, Julien; Barone, Vincenzo

2014-06-01

Advances in hardware performance and the availability of efficient and reliable computational models have made possible the application of computational spectroscopy to ever larger molecular systems. The systematic interpretation of experimental data and the full characterization of complex molecules can then be facilitated. Focusing on vibrational spectroscopy, several approaches have been proposed to simulate spectra beyond the double harmonic approximation, so that more details become available. However, a routine use of such tools requires the preliminary definition of a valid protocol with the most appropriate combination of electronic structure and nuclear calculation models. Several benchmark of anharmonic calculations frequency have been realized on organic molecules. Nevertheless, benchmarks of organometallics or inorganic metal complexes at this level are strongly lacking despite the interest of these systems due to their strong emission and vibrational properties. Herein we report the benchmark study realized with anharmonic calculations on simple metal complexes, along with some pilot applications on systems of direct technological or biological interest.
Benchmarks: The Development of a New Approach to Student Evaluation.

ERIC Educational Resources Information Center

Larter, Sylvia

The Toronto Board of Education Benchmarks are libraries of reference materials that demonstrate student achievement at various levels. Each library contains video benchmarks, print benchmarks, a staff handbook, and summary and introductory documents. This book is about the development and the history of the benchmark program. It has taken over 3…
GROWTH OF THE INTERNATIONAL CRITICALITY SAFETY AND REACTOR PHYSICS EXPERIMENT EVALUATION PROJECTS

DOE Office of Scientific and Technical Information (OSTI.GOV)

J. Blair Briggs; John D. Bess; Jim Gulliford

2011-09-01

Since the International Conference on Nuclear Criticality Safety (ICNC) 2007, the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPhEP) have continued to expand their efforts and broaden their scope. Eighteen countries participated on the ICSBEP in 2007. Now, there are 20, with recent contributions from Sweden and Argentina. The IRPhEP has also expanded from eight contributing countries in 2007 to 16 in 2011. Since ICNC 2007, the contents of the 'International Handbook of Evaluated Criticality Safety Benchmark Experiments1' have increased from 442 evaluations (38000 pages), containing benchmark specifications for 3955 critical ormore » subcritical configurations to 516 evaluations (nearly 55000 pages), containing benchmark specifications for 4405 critical or subcritical configurations in the 2010 Edition of the ICSBEP Handbook. The contents of the Handbook have also increased from 21 to 24 criticality-alarm-placement/shielding configurations with multiple dose points for each, and from 20 to 200 configurations categorized as fundamental physics measurements relevant to criticality safety applications. Approximately 25 new evaluations and 150 additional configurations are expected to be added to the 2011 edition of the Handbook. Since ICNC 2007, the contents of the 'International Handbook of Evaluated Reactor Physics Benchmark Experiments2' have increased from 16 different experimental series that were performed at 12 different reactor facilities to 53 experimental series that were performed at 30 different reactor facilities in the 2011 edition of the Handbook. Considerable effort has also been made to improve the functionality of the searchable database, DICE (Database for the International Criticality Benchmark Evaluation Project) and verify the accuracy of the data contained therein. DICE will be discussed in separate papers at ICNC 2011. The status of the ICSBEP and the IRPhEP will be discussed in the full paper, selected benchmarks that have been added to the ICSBEP Handbook will be highlighted, and a preview of the new benchmarks that will appear in the September 2011 edition of the Handbook will be provided. Accomplishments of the IRPhEP will also be highlighted and the future of both projects will be discussed. REFERENCES (1) International Handbook of Evaluated Criticality Safety Benchmark Experiments, NEA/NSC/DOC(95)03/I-IX, Organisation for Economic Co-operation and Development-Nuclear Energy Agency (OECD-NEA), September 2010 Edition, ISBN 978-92-64-99140-8. (2) International Handbook of Evaluated Reactor Physics Benchmark Experiments, NEA/NSC/DOC(2006)1, Organisation for Economic Co-operation and Development-Nuclear Energy Agency (OECD-NEA), March 2011 Edition, ISBN 978-92-64-99141-5.« less
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

1993-01-01

A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Hospital benchmarking: are U.S. eye hospitals ready?

PubMed

de Korne, Dirk F; van Wijngaarden, Jeroen D H; Sol, Kees J C A; Betz, Robert; Thomas, Richard C; Schein, Oliver D; Klazinga, Niek S

2012-01-01

Benchmarking is increasingly considered a useful management instrument to improve quality in health care, but little is known about its applicability in hospital settings. The aims of this study were to assess the applicability of a benchmarking project in U.S. eye hospitals and compare the results with an international initiative. We evaluated multiple cases by applying an evaluation frame abstracted from the literature to five U.S. eye hospitals that used a set of 10 indicators for efficiency benchmarking. Qualitative analysis entailed 46 semistructured face-to-face interviews with stakeholders, document analyses, and questionnaires. The case studies only partially met the conditions of the evaluation frame. Although learning and quality improvement were stated as overall purposes, the benchmarking initiative was at first focused on efficiency only. No ophthalmic outcomes were included, and clinicians were skeptical about their reporting relevance and disclosure. However, in contrast with earlier findings in international eye hospitals, all U.S. hospitals worked with internal indicators that were integrated in their performance management systems and supported benchmarking. Benchmarking can support performance management in individual hospitals. Having a certain number of comparable institutes provide similar services in a noncompetitive milieu seems to lay fertile ground for benchmarking. International benchmarking is useful only when these conditions are not met nationally. Although the literature focuses on static conditions for effective benchmarking, our case studies show that it is a highly iterative and learning process. The journey of benchmarking seems to be more important than the destination. Improving patient value (health outcomes per unit of cost) requires, however, an integrative perspective where clinicians and administrators closely cooperate on both quality and efficiency issues. If these worlds do not share such a relationship, the added "public" value of benchmarking in health care is questionable.
Validation of Short-Term Noise Assessment Procedures: FY16 Summary of Procedures, Progress, and Preliminary Results

DTIC Science & Technology

Validation project. This report describes the procedure used to generate the noise models output dataset , and then it compares that dataset to the...benchmark, the Engineer Research and Development Centers Long-Range Sound Propagation dataset . It was found that the models consistently underpredict the
NASA low speed centrifugal compressor

NASA Technical Reports Server (NTRS)

Hathaway, Michael D.

1990-01-01

The flow characteristics of a low speed centrifugal compressor were examined at NASA Lewis Research Center to improve understanding of the flow in centrifugal compressors, to provide models of various flow phenomena, and to acquire benchmark data for three dimensional viscous flow code validation. The paper describes the objectives, test facilities' instrumentation, and experiment preliminary comparisons.
75 FR 16439 - Certain Welded Carbon Steel Standard Pipe From Turkey: Preliminary Results of Countervailing Duty...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-04-01

... of businesses, which the GOK deemed ``harmful to juveniles, affecting public morals, certain private... (August 30, 2002), and accompanying Issues and Decision Memorandum (Wire Rod Memorandum) at ``Benchmark... from Turkey, 71 FR 43111 (July 31, 2006) (2004 Pipe Final), and accompanying Issues and Decision...
Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Ye; Ma, Xiaosong; Liu, Qing Gary

2015-01-01

Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time-and labor-intensive to create. Real applications themselves, while offering most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters tomore » create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.« less

The philosophy of benchmark testing a standards-based picture archiving and communications system.

PubMed

Richardson, N E; Thomas, J A; Lyche, D K; Romlein, J; Norton, G S; Dolecek, Q E

1999-05-01

The Department of Defense issued its requirements for a Digital Imaging Network-Picture Archiving and Communications System (DIN-PACS) in a Request for Proposals (RFP) to industry in January 1997, with subsequent contracts being awarded in November 1997 to the Agfa Division of Bayer and IBM Global Government Industry. The Government's technical evaluation process consisted of evaluating a written technical proposal as well as conducting a benchmark test of each proposed system at the vendor's test facility. The purpose of benchmark testing was to evaluate the performance of the fully integrated system in a simulated operational environment. The benchmark test procedures and test equipment were developed through a joint effort between the Government, academic institutions, and private consultants. Herein the authors discuss the resources required and the methods used to benchmark test a standards-based PACS.
User-centered virtual environment assessment and design for cognitive rehabilitation applications

NASA Astrophysics Data System (ADS)

Fidopiastis, Cali Michael

Virtual environment (VE) design for cognitive rehabilitation necessitates a new methodology to ensure the validity of the resulting rehabilitation assessment. We propose that benchmarking the VE system technology utilizing a user-centered approach should precede the VE construction. Further, user performance baselines should be measured throughout testing as a control for adaptive effects that may confound the metrics chosen to evaluate the rehabilitation treatment. To support these claims we present data obtained from two modules of a user-centered head-mounted display (HMD) assessment battery, specifically resolution visual acuity and stereoacuity. Resolution visual acuity and stereoacuity assessments provide information about the image quality achieved by an HMD based upon its unique system parameters. When applying a user-centered approach, we were able to quantify limitations in the VE system components (e.g., low microdisplay resolution) and separately point to user characteristics (e.g., changes in dark focus) that may introduce error in the evaluation of VE based rehabilitation protocols. Based on these results, we provide guidelines for calibrating and benchmarking HMDs. In addition, we discuss potential extensions of the assessment to address higher level usability issues. We intend to test the proposed framework within the Human Experience Modeler (HEM), a testbed created at the University of Central Florida to evaluate technologies that may enhance cognitive rehabilitation effectiveness. Preliminary results of a feasibility pilot study conducted with a memory impaired participant showed that the HEM provides the control and repeatability needed to conduct such technology comparisons. Further, the HEM affords the opportunity to integrate new brain imaging technologies (i.e., functional Near Infrared Imaging) to evaluate brain plasticity associated with VE based cognitive rehabilitation.
Contributions to Integral Nuclear Data in ICSBEP and IRPhEP since ND 2013

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.; Briggs, J. Blair; Gulliford, Jim

2016-09-01

The status of the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPhEP) was last discussed directly with the international nuclear data community at ND2013. Since ND2013, integral benchmark data that are available for nuclear data testing has continued to increase. The status of the international benchmark efforts and the latest contributions to integral nuclear data for testing is discussed. Select benchmark configurations that have been added to the ICSBEP and IRPhEP Handbooks since ND2013 are highlighted. The 2015 edition of the ICSBEP Handbook now contains 567 evaluations with benchmark specifications for 4,874more » critical, near-critical, or subcritical configurations, 31 criticality alarm placement/shielding configuration with multiple dose points apiece, and 207 configurations that have been categorized as fundamental physics measurements that are relevant to criticality safety applications. The 2015 edition of the IRPhEP Handbook contains data from 143 different experimental series that were performed at 50 different nuclear facilities. Currently 139 of the 143 evaluations are published as approved benchmarks with the remaining four evaluations published in draft format only. Measurements found in the IRPhEP Handbook include criticality, buckling and extrapolation length, spectral characteristics, reactivity effects, reactivity coefficients, kinetics, reaction-rate distributions, power distributions, isotopic compositions, and/or other miscellaneous types of measurements for various types of reactor systems. Annual technical review meetings for both projects were held in April 2016; additional approved benchmark evaluations will be included in the 2016 editions of these handbooks.« less
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

1991-01-01

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Algorithm and Architecture Independent Benchmarking with SEAK

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tallent, Nathan R.; Manzano Franco, Joseph B.; Gawande, Nitin A.

2016-05-23

Many applications of high performance embedded computing are limited by performance or power bottlenecks. We have designed the Suite for Embedded Applications & Kernels (SEAK), a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions; and (b) to facilitate rigorous, objective, end-user evaluation for their solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from a particular algorithm) and goal-oriented (functional) specification. To encourage solutions that are any combination of software or hardware, we use an end-user black-box evaluation that can capture tradeoffs between performance, power, accuracy, size, andmore » weight. The tradeoffs are especially informative for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. It is challenging to create both concise and precise goal-oriented specifications for mission-centric problems. This paper describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.« less
Benchmarking. Issues in the Design and Implementation of a Benchmarking System for Employment and Training Programs for Young People.

ERIC Educational Resources Information Center

Coughlin, David C.; Bielen, Rhonda P.

This paper has been prepared to assist the United States Department of Labor to explore new approaches to evaluating and measuring the performance of employment and training activities for youth. As one of several tools for evaluating success of local youth training programs, "benchmarking" provides a system for measuring the development…
Space Operations Training Concepts Benchmark Study (Training in a Continuous Operations Environment)

NASA Technical Reports Server (NTRS)

Johnston, Alan E.; Gilchrist, Michael; Underwood, Debrah (Technical Monitor)

2002-01-01

The NASA/USAF Benchmark Space Operations Training Concepts Study will perform a comparative analysis of the space operations training programs utilized by the United States Air Force Space Command with those utilized by the National Aeronautics and Space Administration. The concentration of the study will be focused on Ground Controller/Flight Controller Training for the International Space Station Payload Program. The duration of the study is expected to be five months with report completion by 30 June 2002. The U.S. Air Force Space Command was chosen as the most likely candidate for this benchmark study because their experience in payload operations controller training and user interfaces compares favorably with the Payload Operations Integration Center's training and user interfaces. These similarities can be seen in the dynamics of missions/payloads, controller on-console requirements, and currency/proficiency challenges to name a few. It is expected that the report will look at the respective programs and investigate goals of each training program, unique training challenges posed by space operations ground controller environments, processes of setting up controller training programs, phases of controller training, methods of controller training, techniques to evaluate adequacy of controller knowledge and the training received, and approaches to training administration. The report will provide recommendations to the respective agencies based on the findings. Attached is a preliminary outline of the study. Following selection of participants and an approval to proceed, initial contact will be made with U.S. Air Force Space Command Directorate of Training to discuss steps to accomplish the study.
EBR-II Reactor Physics Benchmark Evaluation Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pope, Chad L.; Lum, Edward S; Stewart, Ryan

This report provides a reactor physics benchmark evaluation with associated uncertainty quantification for the critical configuration of the April 1986 Experimental Breeder Reactor II Run 138B core configuration.
Chemometrics resolution and quantification power evaluation: Application on pharmaceutical quaternary mixture of Paracetamol, Guaifenesin, Phenylephrine and p-aminophenol

NASA Astrophysics Data System (ADS)

Yehia, Ali M.; Mohamed, Heba M.

2016-01-01

Three advanced chemmometric-assisted spectrophotometric methods namely; Concentration Residuals Augmented Classical Least Squares (CRACLS), Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) and Principal Component Analysis-Artificial Neural Networks (PCA-ANN) were developed, validated and benchmarked to PLS calibration; to resolve the severely overlapped spectra and simultaneously determine; Paracetamol (PAR), Guaifenesin (GUA) and Phenylephrine (PHE) in their ternary mixture and in presence of p-aminophenol (AP) the main degradation product and synthesis impurity of Paracetamol. The analytical performance of the proposed methods was described by percentage recoveries, root mean square error of calibration and standard error of prediction. The four multivariate calibration methods could be directly used without any preliminary separation step and successfully applied for pharmaceutical formulation analysis, showing no excipients' interference.
Evaluation of the concrete shield compositions from the 2010 criticality accident alarm system benchmark experiments at the CEA Valduc SILENE facility

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, Thomas Martin; Celik, Cihangir; Dunn, Michael E

In October 2010, a series of benchmark experiments were conducted at the French Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA) Valduc SILENE facility. These experiments were a joint effort between the United States Department of Energy Nuclear Criticality Safety Program and the CEA. The purpose of these experiments was to create three benchmarks for the verification and validation of radiation transport codes and evaluated nuclear data used in the analysis of criticality accident alarm systems. This series of experiments consisted of three single-pulsed experiments with the SILENE reactor. For the first experiment, the reactor was bare (unshielded), whereasmore » in the second and third experiments, it was shielded by lead and polyethylene, respectively. The polyethylene shield of the third experiment had a cadmium liner on its internal and external surfaces, which vertically was located near the fuel region of SILENE. During each experiment, several neutron activation foils and thermoluminescent dosimeters (TLDs) were placed around the reactor. Nearly half of the foils and TLDs had additional high-density magnetite concrete, high-density barite concrete, standard concrete, and/or BoroBond shields. CEA Saclay provided all the concrete, and the US Y-12 National Security Complex provided the BoroBond. Measurement data from the experiments were published at the 2011 International Conference on Nuclear Criticality (ICNC 2011) and the 2013 Nuclear Criticality Safety Division (NCSD 2013) topical meeting. Preliminary computational results for the first experiment were presented in the ICNC 2011 paper, which showed poor agreement between the computational results and the measured values of the foils shielded by concrete. Recently the hydrogen content, boron content, and density of these concrete shields were further investigated within the constraints of the previously available data. New computational results for the first experiment are now available that show much better agreement with the measured values.« less
High Temperature Test Facility Preliminary RELAP5-3D Input Model Description

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bayless, Paul David

A RELAP5-3D input model is being developed for the High Temperature Test Facility at Oregon State University. The current model is described in detail. Further refinements will be made to the model as final as-built drawings are released and when system characterization data are available for benchmarking the input model.
Preliminary Evidence on the Effectiveness of Psychological Treatments Delivered at a University Counseling Center

ERIC Educational Resources Information Center

Minami, Takuya; Davies, D. Robert; Tierney, Sandra Callen; Bettmann, Joanna E.; McAward, Scott M.; Averill, Lynnette A.; Huebner, Lois A.; Weitzman, Lauren M.; Benbrook, Amy R.; Serlin, Ronald C.; Wampold, Bruce E.

2009-01-01

Treatment data from a university counseling center (UCC) that utilized the Outcome Questionnaire-45.2 (OQ-45; M. J. Lambert et al., 2004), a self-report general clinical symptom measure, was compared against treatment efficacy benchmarks from clinical trials of adult major depression that utilized similar measures. Statistical analyses suggested…
77 FR 33181 - Large Residential Washers From the Republic of Korea: Preliminary Affirmative Countervailing Duty...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-06-05

... does not include a rate for either the ``Korea Trade Insurance Corporation (K-SURE)-- Short-Term Export.... C. Benchmark Interest Rate for Short-Term Loans Section 771(5)(E)(ii) of the Act states that the... Development Bank (KDB)/IBK Short-Term Discounted Loans for Export Receivables'' program, an analysis of any...
Adding Fault Tolerance to NPB Benchmarks Using ULFM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parchman, Zachary W; Vallee, Geoffroy R; Naughton III, Thomas J

2016-01-01

In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerant is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In thismore » work, we present a mod- ification of some of the benchmarks of the NAS parallel benchmark (NPB) to include support of the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerant strategies on the application execution.« less
Cross-Evaluation of Degree Programmes in Higher Education

ERIC Educational Resources Information Center

Kettunen, Juha

2010-01-01

Purpose: This study seeks to develop and describe the benchmarking approach of enhancement-led evaluation in higher education and to present a cross-evaluation process for degree programmes. Design/methodology/approach: The benchmarking approach produces useful information for the development of degree programmes based on self-evaluation,…
A One-group, One-dimensional Transport Benchmark in Cylindrical Geometry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barry Ganapol; Abderrafi M. Ougouag

A 1-D, 1-group computational benchmark in cylndrical geometry is described. This neutron transport benchmark is useful for evaluating reactor concepts that possess azimuthal symmetry such as a pebble-bed reactor.
Benchmarking routine psychological services: a discussion of challenges and methods.

PubMed

Delgadillo, Jaime; McMillan, Dean; Leach, Chris; Lucock, Mike; Gilbody, Simon; Wood, Nick

2014-01-01

Policy developments in recent years have led to important changes in the level of access to evidence-based psychological treatments. Several methods have been used to investigate the effectiveness of these treatments in routine care, with different approaches to outcome definition and data analysis. To present a review of challenges and methods for the evaluation of evidence-based treatments delivered in routine mental healthcare. This is followed by a case example of a benchmarking method applied in primary care. High, average and poor performance benchmarks were calculated through a meta-analysis of published data from services working under the Improving Access to Psychological Therapies (IAPT) Programme in England. Pre-post treatment effect sizes (ES) and confidence intervals were estimated to illustrate a benchmarking method enabling services to evaluate routine clinical outcomes. High, average and poor performance ES for routine IAPT services were estimated to be 0.91, 0.73 and 0.46 for depression (using PHQ-9) and 1.02, 0.78 and 0.52 for anxiety (using GAD-7). Data from one specific IAPT service exemplify how to evaluate and contextualize routine clinical performance against these benchmarks. The main contribution of this report is to summarize key recommendations for the selection of an adequate set of psychometric measures, the operational definition of outcomes, and the statistical evaluation of clinical performance. A benchmarking method is also presented, which may enable a robust evaluation of clinical performance against national benchmarks. Some limitations concerned significant heterogeneity among data sources, and wide variations in ES and data completeness.
Benchmark Simulation Model No 2: finalisation of plant layout and default control strategy.

PubMed

Nopens, I; Benedetti, L; Jeppsson, U; Pons, M-N; Alex, J; Copp, J B; Gernaey, K V; Rosen, C; Steyer, J-P; Vanrolleghem, P A

2010-01-01

The COST/IWA Benchmark Simulation Model No 1 (BSM1) has been available for almost a decade. Its primary purpose has been to create a platform for control strategy benchmarking of activated sludge processes. The fact that the research work related to the benchmark simulation models has resulted in more than 300 publications worldwide demonstrates the interest in and need of such tools within the research community. Recent efforts within the IWA Task Group on "Benchmarking of control strategies for WWTPs" have focused on an extension of the benchmark simulation model. This extension aims at facilitating control strategy development and performance evaluation at a plant-wide level and, consequently, includes both pretreatment of wastewater as well as the processes describing sludge treatment. The motivation for the extension is the increasing interest and need to operate and control wastewater treatment systems not only at an individual process level but also on a plant-wide basis. To facilitate the changes, the evaluation period has been extended to one year. A prolonged evaluation period allows for long-term control strategies to be assessed and enables the use of control handles that cannot be evaluated in a realistic fashion in the one week BSM1 evaluation period. In this paper, the finalised plant layout is summarised and, as was done for BSM1, a default control strategy is proposed. A demonstration of how BSM2 can be used to evaluate control strategies is also given.
Evaluation of Neutron Radiography Reactor LEU-Core Start-Up Measurements

DOE PAGES

Bess, John D.; Maddock, Thomas L.; Smolinski, Andrew T.; ...

2014-11-04

Benchmark models were developed to evaluate the cold-critical start-up measurements performed during the fresh core reload of the Neutron Radiography (NRAD) reactor with Low Enriched Uranium (LEU) fuel. Experiments include criticality, control-rod worth measurements, shutdown margin, and excess reactivity for four core loadings with 56, 60, 62, and 64 fuel elements. The worth of four graphite reflector block assemblies and an empty dry tube used for experiment irradiations were also measured and evaluated for the 60-fuel-element core configuration. Dominant uncertainties in the experimental k eff come from uncertainties in the manganese content and impurities in the stainless steel fuel claddingmore » as well as the 236U and erbium poison content in the fuel matrix. Calculations with MCNP5 and ENDF/B-VII.0 neutron nuclear data are approximately 1.4% (9σ) greater than the benchmark model eigenvalues, which is commonly seen in Monte Carlo simulations of other TRIGA reactors. Simulations of the worth measurements are within the 2σ uncertainty for most of the benchmark experiment worth values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less
Evaluation of Neutron Radiography Reactor LEU-Core Start-Up Measurements

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.; Maddock, Thomas L.; Smolinski, Andrew T.

Benchmark models were developed to evaluate the cold-critical start-up measurements performed during the fresh core reload of the Neutron Radiography (NRAD) reactor with Low Enriched Uranium (LEU) fuel. Experiments include criticality, control-rod worth measurements, shutdown margin, and excess reactivity for four core loadings with 56, 60, 62, and 64 fuel elements. The worth of four graphite reflector block assemblies and an empty dry tube used for experiment irradiations were also measured and evaluated for the 60-fuel-element core configuration. Dominant uncertainties in the experimental k eff come from uncertainties in the manganese content and impurities in the stainless steel fuel claddingmore » as well as the 236U and erbium poison content in the fuel matrix. Calculations with MCNP5 and ENDF/B-VII.0 neutron nuclear data are approximately 1.4% (9σ) greater than the benchmark model eigenvalues, which is commonly seen in Monte Carlo simulations of other TRIGA reactors. Simulations of the worth measurements are within the 2σ uncertainty for most of the benchmark experiment worth values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less

Benchmarking transportation logistics practices for effective system planning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thrower, A.W.; Dravo, A.N.; Keister, M.

2007-07-01

This paper presents preliminary findings of an Office of Civilian Radioactive Waste Management (OCRWM) benchmarking project to identify best practices for logistics enterprises. The results will help OCRWM's Office of Logistics Management (OLM) design and implement a system to move spent nuclear fuel (SNF) and high-level radioactive waste (HLW) to the Yucca Mountain repository for disposal when that facility is licensed and built. This report suggests topics for additional study. The project team looked at three Federal radioactive material logistics operations that are widely viewed to be successful: (1) the Waste Isolation Pilot Plant (WIPP) in Carlsbad, New Mexico; (2)more » the Naval Nuclear Propulsion Program (NNPP); and (3) domestic and foreign research reactor (FRR) SNF acceptance programs. (authors)« less
SU-E-T-148: Benchmarks and Pre-Treatment Reviews: A Study of Quality Assurance Effectiveness

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lowenstein, J; Nguyen, H; Roll, J

Purpose: To determine the impact benchmarks and pre-treatment reviews have on improving the quality of submitted clinical trial data. Methods: Benchmarks are used to evaluate a site’s ability to develop a treatment that meets a specific protocol’s treatment guidelines prior to placing their first patient on the protocol. A pre-treatment review is an actual patient placed on the protocol in which the dosimetry and contour volumes are evaluated to be per protocol guidelines prior to allowing the beginning of the treatment. A key component of these QA mechanisms is that sites are provided timely feedback to educate them on howmore » to plan per the protocol and prevent protocol deviations on patients accrued to a protocol. For both benchmarks and pre-treatment reviews a dose volume analysis (DVA) was performed using MIM softwareTM. For pre-treatment reviews a volume contour evaluation was also performed. Results: IROC Houston performed a QA effectiveness analysis of a protocol which required both benchmarks and pre-treatment reviews. In 70 percent of the patient cases submitted, the benchmark played an effective role in assuring that the pre-treatment review of the cases met protocol requirements. The 35 percent of sites failing the benchmark subsequently modified there planning technique to pass the benchmark before being allowed to submit a patient for pre-treatment review. However, in 30 percent of the submitted cases the pre-treatment review failed where the majority (71 percent) failed the DVA. 20 percent of sites submitting patients failed to correct their dose volume discrepancies indicated by the benchmark case. Conclusion: Benchmark cases and pre-treatment reviews can be an effective QA tool to educate sites on protocol guidelines and to minimize deviations. Without the benchmark cases it is possible that 65 percent of the cases undergoing a pre-treatment review would have failed to meet the protocols requirements.Support: U24-CA-180803.« less
New Ecological Paradigm and Sustainability Attitudes with Respect to a Multi-Cultural Educational Milieu in China

ERIC Educational Resources Information Center

Wells, Mona; Petherick, Lynda

2016-01-01

Institutions of higher education are increasingly interested in how the student experience may or may not influence world views and particularly with respect to sustainability. Here we report preliminary results from a New Ecological Paradigm (NEP) study in China to benchmark student responses, and we relate these results to findings from other…
77 FR 45582 - Certain Pasta From Italy: Preliminary Results of the 15th (2010) Countervailing Duty...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-08-01

... of the loan (e.g., short-term v. long-term), and the currency in which the loan is denominated...'s subsidies to the sales of Tomasello only. Benchmarks for Long-Term Loans and Discount Rates Loan..., the long-term interest rate calculated according to the methodology described above for the year in...
A Pilot Study of Sensation-Focused Intensive Treatment for Panic Disorder with Moderate to Severe Agoraphobia: Preliminary Outcome and Benchmarking Data

ERIC Educational Resources Information Center

Bitran, Stella; Morissette, Sandra B.; Spiegel, David A.; Barlow, David H.

2008-01-01

This report presents results of a treatment for panic disorder with moderate to severe agoraphobia (PDA-MS) called sensation-focused intensive treatment (SFIT). SFIT is an 8-day intensive treatment that combines features of cognitive-behavioral treatment for panic disorder, such as interoceptive exposure and cognitive restructuring with ungraded…
ICSBEP Benchmarks For Nuclear Data Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Briggs, J. Blair

2005-05-24

The International Criticality Safety Benchmark Evaluation Project (ICSBEP) was initiated in 1992 by the United States Department of Energy. The ICSBEP became an official activity of the Organization for Economic Cooperation and Development (OECD) -- Nuclear Energy Agency (NEA) in 1995. Representatives from the United States, United Kingdom, France, Japan, the Russian Federation, Hungary, Republic of Korea, Slovenia, Serbia and Montenegro (formerly Yugoslavia), Kazakhstan, Spain, Israel, Brazil, Poland, and the Czech Republic are now participating. South Africa, India, China, and Germany are considering participation. The purpose of the ICSBEP is to identify, evaluate, verify, and formally document a comprehensive andmore » internationally peer-reviewed set of criticality safety benchmark data. The work of the ICSBEP is published as an OECD handbook entitled ''International Handbook of Evaluated Criticality Safety Benchmark Experiments.'' The 2004 Edition of the Handbook contains benchmark specifications for 3331 critical or subcritical configurations that are intended for use in validation efforts and for testing basic nuclear data. New to the 2004 Edition of the Handbook is a draft criticality alarm / shielding type benchmark that should be finalized in 2005 along with two other similar benchmarks. The Handbook is being used extensively for nuclear data testing and is expected to be a valuable resource for code and data validation and improvement efforts for decades to come. Specific benchmarks that are useful for testing structural materials such as iron, chromium, nickel, and manganese; beryllium; lead; thorium; and 238U are highlighted.« less
78 FR 27957 - Fisheries of the South Atlantic, Southeast Data, Assessment, and Review (SEDAR); Public Meetings

Federal Register 2010, 2011, 2012, 2013, 2014

2013-05-13

..., describes the fisheries, evaluates the status of the stock, estimates biological benchmarks, projects future.... Participants will evaluate and recommend datasets appropriate for assessment analysis, employ assessment models to evaluate stock status, estimate population benchmarks and management criteria, and project future...
Reengineering of waste management at the Oak Ridge National Laboratory. Volume 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Myrick, T.E.

1997-08-01

A reengineering evaluation of the waste management program at the Oak Ridge National Laboratory (ORNL) was conducted during the months of February through July 1997. The goal of the reengineering was to identify ways in which the waste management process could be streamlined and improved to reduce costs while maintaining full compliance and customer satisfaction. A Core Team conducted preliminary evaluations and determined that eight particular aspects of the ORNL waste management program warranted focused investigations during the reengineering. The eight areas included Pollution Prevention, Waste Characterization, Waste Certification/Verification, Hazardous/Mixed Waste Stream, Generator/WM Teaming, Reporting/Records, Disposal End Points, and On-Sitemore » Treatment/Storage. The Core Team commissioned and assembled Process Teams to conduct in-depth evaluations of each of these eight areas. The Core Team then evaluated the Process Team results and consolidated the 80 process-specific recommendations into 15 overall recommendations. Benchmarking of a commercial nuclear facility, a commercial research facility, and a DOE research facility was conducted to both validate the efficacy of these findings and seek additional ideas for improvement. The outcome of this evaluation is represented by the 15 final recommendations that are described in this report.« less
Benchmark Testing of a New 56Fe Evaluation for Criticality Safety Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leal, Luiz C; Ivanov, E.

2015-01-01

The SAMMY code was used to evaluate resonance parameters of the 56Fe cross section in the resolved resonance energy range of 0–2 MeV using transmission data, capture, elastic, inelastic, and double differential elastic cross sections. The resonance analysis was performed with the code SAMMY that fits R-matrix resonance parameters using the generalized least-squares technique (Bayes’ theory). The evaluation yielded a set of resonance parameters that reproduced the experimental data very well, along with a resonance parameter covariance matrix for data uncertainty calculations. Benchmark tests were conducted to assess the evaluation performance in benchmark calculations.
Chemometrics resolution and quantification power evaluation: Application on pharmaceutical quaternary mixture of Paracetamol, Guaifenesin, Phenylephrine and p-aminophenol.

PubMed

Yehia, Ali M; Mohamed, Heba M

2016-01-05

Three advanced chemmometric-assisted spectrophotometric methods namely; Concentration Residuals Augmented Classical Least Squares (CRACLS), Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) and Principal Component Analysis-Artificial Neural Networks (PCA-ANN) were developed, validated and benchmarked to PLS calibration; to resolve the severely overlapped spectra and simultaneously determine; Paracetamol (PAR), Guaifenesin (GUA) and Phenylephrine (PHE) in their ternary mixture and in presence of p-aminophenol (AP) the main degradation product and synthesis impurity of Paracetamol. The analytical performance of the proposed methods was described by percentage recoveries, root mean square error of calibration and standard error of prediction. The four multivariate calibration methods could be directly used without any preliminary separation step and successfully applied for pharmaceutical formulation analysis, showing no excipients' interference. Copyright © 2015 Elsevier B.V. All rights reserved.
Partially-Averaged Navier Stokes Model for Turbulence: Implementation and Validation

NASA Technical Reports Server (NTRS)

Girimaji, Sharath S.; Abdol-Hamid, Khaled S.

2005-01-01

Partially-averaged Navier Stokes (PANS) is a suite of turbulence closure models of various modeled-to-resolved scale ratios ranging from Reynolds-averaged Navier Stokes (RANS) to Navier-Stokes (direct numerical simulations). The objective of PANS, like hybrid models, is to resolve large scale structures at reasonable computational expense. The modeled-to-resolved scale ratio or the level of physical resolution in PANS is quantified by two parameters: the unresolved-to-total ratios of kinetic energy (f(sub k)) and dissipation (f(sub epsilon)). The unresolved-scale stress is modeled with the Boussinesq approximation and modeled transport equations are solved for the unresolved kinetic energy and dissipation. In this paper, we first present a brief discussion of the PANS philosophy followed by a description of the implementation procedure and finally perform preliminary evaluation in benchmark problems.
9.4T Human MRI: Preliminary Results

PubMed Central

Vaughan, Thomas; DelaBarre, Lance; Snyder, Carl; Tian, Jinfeng; Akgun, Can; Shrivastava, Devashish; Liu, Wanzahn; Olson, Chris; Adriany, Gregor; Strupp, John; Andersen, Peter; Gopinath, Anand; van de Moortele, Pierre-Francois; Garwood, Michael; Ugurbil, Kamil

2014-01-01

This work reports the preliminary results of the first human images at the new high-field benchmark of 9.4T. A 65-cm-diameter bore magnet was used together with an asymmetric 40-cm-diameter head gradient and shim set. A multichannel transmission line (transverse electromagnetic (TEM)) head coil was driven by a programmable parallel transceiver to control the relative phase and magnitude of each channel independently. These new RF field control methods facilitated compensation for RF artifacts attributed to destructive interference patterns, in order to achieve homogeneous 9.4T head images or localize anatomic targets. Prior to FDA investigational device exemptions (IDEs) and internal review board (IRB)-approved human studies, preliminary RF safety studies were performed on porcine models. These data are reported together with exit interview results from the first 44 human volunteers. Although several points for improvement are discussed, the preliminary results demonstrate the feasibility of safe and successful human imaging at 9.4T. PMID:17075852
75 FR 75628 - Final Guidance for Federal Departments and Agencies on Establishing, Applying, and Revising...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-12-06

... and facilitate the use of documentation in future evaluations and benchmarking. Extraordinary.... Benchmarking Other Agencies' Experiences A Federal agency cannot rely on another agency's categorical exclusion... was established. Federal agencies can also substantiate categorical exclusions by benchmarking, or...
INTEGRAL BENCHMARK DATA FOR NUCLEAR DATA TESTING THROUGH THE ICSBEP AND THE NEWLY ORGANIZED IRPHEP

DOE Office of Scientific and Technical Information (OSTI.GOV)

J. Blair Briggs; Lori Scott; Yolanda Rugama

The status of the International Criticality Safety Benchmark Evaluation Project (ICSBEP) was last reported in a nuclear data conference at the International Conference on Nuclear Data for Science and Technology, ND-2004, in Santa Fe, New Mexico. Since that time the number and type of integral benchmarks have increased significantly. Included in the ICSBEP Handbook are criticality-alarm / shielding and fundamental physic benchmarks in addition to the traditional critical / subcritical benchmark data. Since ND 2004, a reactor physics counterpart to the ICSBEP, the International Reactor Physics Experiment Evaluation Project (IRPhEP) was initiated. The IRPhEP is patterned after the ICSBEP, butmore » focuses on other integral measurements, such as buckling, spectral characteristics, reactivity effects, reactivity coefficients, kinetics measurements, reaction-rate and power distributions, nuclide compositions, and other miscellaneous-type measurements in addition to the critical configuration. The status of these two projects is discussed and selected benchmarks highlighted in this paper.« less
Benchmarking nitrogen removal suspended-carrier biofilm systems using dynamic simulation.

PubMed

Vanhooren, H; Yuan, Z; Vanrolleghem, P A

2002-01-01

We are witnessing an enormous growth in biological nitrogen removal from wastewater. It presents specific challenges beyond traditional COD (carbon) removal. A possibility for optimised process design is the use of biomass-supporting media. In this paper, attached growth processes (AGP) are evaluated using dynamic simulations. The advantages of these systems that were qualitatively described elsewhere, are validated quantitatively based on a simulation benchmark for activated sludge treatment systems. This simulation benchmark is extended with a biofilm model that allows for fast and accurate simulation of the conversion of different substrates in a biofilm. The economic feasibility of this system is evaluated using the data generated with the benchmark simulations. Capital savings due to volume reduction and reduced sludge production are weighed out against increased aeration costs. In this evaluation, effluent quality is integrated as well.
Computational Chemistry Comparison and Benchmark Database

National Institute of Standards and Technology Data Gateway

SRD 101 NIST Computational Chemistry Comparison and Benchmark Database (Web, free access) The NIST Computational Chemistry Comparison and Benchmark Database is a collection of experimental and ab initio thermochemical properties for a selected set of molecules. The goals are to provide a benchmark set of molecules for the evaluation of ab initio computational methods and allow the comparison between different ab initio computational methods for the prediction of thermochemical properties.
76 FR 48130 - Certain Pasta From Italy: Preliminary Results of the 14th (2009) Countervailing Duty...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-08

... rate versus variable interest rate), the maturity of the loan (e.g., short-term versus long-term), and... were received. Benchmarks for Long-Term Loans and Discount Rates Pursuant to 19 CFR 351.505(a), the... using the national average cost of long-term, fixed-rate loans pursuant to 19 CFR 351.524(d)(3)(B...
An RF amplifier for ICRF studies in the LAPD

NASA Astrophysics Data System (ADS)

Martin, M. J.; Pribyl, P.; Gekelman, W.; Lucky, Z.

2015-12-01

An RF amplifier system was designed and is under construction at the UCLA Basic Plasma Science Facility. The system is designed to output 200 kW peak RMS power at 1% duty cycle with a 1 Hz rep rate at frequencies of 2-6 MHz. This paper describes the RF amplifier system with preliminary benchmarks. Current design challenges and future work are discussed.
Benchmarking: measuring the outcomes of evidence-based practice.

PubMed

DeLise, D C; Leasure, A R

2001-01-01

Measurement of the outcomes associated with implementation of evidence-based practice changes is becoming increasingly emphasized by multiple health care disciplines. A final step to the process of implementing and sustaining evidence-supported practice changes is that of outcomes evaluation and monitoring. The comparison of outcomes to internal and external measures is known as benchmarking. This article discusses evidence-based practice, provides an overview of outcomes evaluation, and describes the process of benchmarking to improve practice. A case study is used to illustrate this concept.
Assessing Student Understanding of the "New Biology": Development and Evaluation of a Criterion-Referenced Genomics and Bioinformatics Assessment

NASA Astrophysics Data System (ADS)

Campbell, Chad Edward

Over the past decade, hundreds of studies have introduced genomics and bioinformatics (GB) curricula and laboratory activities at the undergraduate level. While these publications have facilitated the teaching and learning of cutting-edge content, there has yet to be an evaluation of these assessment tools to determine if they are meeting the quality control benchmarks set forth by the educational research community. An analysis of these assessment tools indicated that <10% referenced any quality control criteria and that none of the assessments met more than one of the quality control benchmarks. In the absence of evidence that these benchmarks had been met, it is unclear whether these assessment tools are capable of generating valid and reliable inferences about student learning. To remedy this situation the development of a robust GB assessment aligned with the quality control benchmarks was undertaken in order to ensure evidence-based evaluation of student learning outcomes. Content validity is a central piece of construct validity, and it must be used to guide instrument and item development. This study reports on: (1) the correspondence of content validity evidence gathered from independent sources; (2) the process of item development using this evidence; (3) the results from a pilot administration of the assessment; (4) the subsequent modification of the assessment based on the pilot administration results and; (5) the results from the second administration of the assessment. Twenty-nine different subtopics within GB (Appendix B: Genomics and Bioinformatics Expert Survey) were developed based on preliminary GB textbook analyses. These subtopics were analyzed using two methods designed to gather content validity evidence: (1) a survey of GB experts (n=61) and (2) a detailed content analyses of GB textbooks (n=6). By including only the subtopics that were shown to have robust support across these sources, 22 GB subtopics were established for inclusion in the assessment. An expert panel subsequently developed, evaluated, and revised two multiple-choice items to align with each of the 22 subtopics, producing a final item pool of 44 items. These items were piloted with student samples of varying content exposure levels. Both Classical Test Theory (CTT) and Item Response Theory (IRT) methodologies were used to evaluate the assessment's validity, reliability and ability inferences, and its ability to differentiate students with different magnitudes of content exposure. A total of 18 items were subsequently modified and reevaluated by an expert panel. The 26 original and 18 modified items were once again piloted with student samples of varying content exposure levels. Both CTT and IRT methodologies were once again used to evaluate student responses in order to evaluate the assessment's validity and reliability inferences as well as its ability to differentiate students with different magnitudes of content exposure. Interviews with students from different content exposure levels were also performed in order to gather convergent validity evidence (external validity evidence) as well as substantive validity evidence. Also included are the limitations of the assessment and a set of guidelines on how the assessment can best be used.

Energy benchmarking in wastewater treatment plants: the importance of site operation and layout.

PubMed

Belloir, C; Stanford, C; Soares, A

2015-01-01

Energy benchmarking is a powerful tool in the optimization of wastewater treatment plants (WWTPs) in helping to reduce costs and greenhouse gas emissions. Traditionally, energy benchmarking methods focused solely on reporting electricity consumption, however, recent developments in this area have led to the inclusion of other types of energy, including electrical, manual, chemical and mechanical consumptions that can be expressed in kWh/m3. In this study, two full-scale WWTPs were benchmarked, both incorporated preliminary, secondary (oxidation ditch) and tertiary treatment processes, Site 1 also had an additional primary treatment step. The results indicated that Site 1 required 2.32 kWh/m3 against 0.98 kWh/m3 for Site 2. Aeration presented the highest energy consumption for both sites with 2.08 kWh/m3 required for Site 1 and 0.91 kWh/m3 in Site 2. The mechanical energy represented the second biggest consumption for Site 1 (9%, 0.212 kWh/m3) and chemical input was significant in Site 2 (4.1%, 0.026 kWh/m3). The analysis of the results indicated that Site 2 could be optimized by constructing a primary settling tank that would reduce the biochemical oxygen demand, total suspended solids and NH4 loads to the oxidation ditch by 55%, 75% and 12%, respectively, and at the same time reduce the aeration requirements by 49%. This study demonstrated that the effectiveness of the energy benchmarking exercise in identifying the highest energy-consuming assets, nevertheless it points out the need to develop a holistic overview of the WWTP and the need to include parameters such as effluent quality, site operation and plant layout to allow adequate benchmarking.
New Multi-group Transport Neutronics (PHISICS) Capabilities for RELAP5-3D and its Application to Phase I of the OECD/NEA MHTGR-350 MW Benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gerhard Strydom; Cristian Rabiti; Andrea Alfonsi

2012-10-01

PHISICS is a neutronics code system currently under development at the Idaho National Laboratory (INL). Its goal is to provide state of the art simulation capability to reactor designers. The different modules for PHISICS currently under development are a nodal and semi-structured transport core solver (INSTANT), a depletion module (MRTAU) and a cross section interpolation (MIXER) module. The INSTANT module is the most developed of the mentioned above. Basic functionalities are ready to use, but the code is still in continuous development to extend its capabilities. This paper reports on the effort of coupling the nodal kinetics code package PHISICSmore » (INSTANT/MRTAU/MIXER) to the thermal hydraulics system code RELAP5-3D, to enable full core and system modeling. This will enable the possibility to model coupled (thermal-hydraulics and neutronics) problems with more options for 3D neutron kinetics, compared to the existing diffusion theory neutron kinetics module in RELAP5-3D (NESTLE). In the second part of the paper, an overview of the OECD/NEA MHTGR-350 MW benchmark is given. This benchmark has been approved by the OECD, and is based on the General Atomics 350 MW Modular High Temperature Gas Reactor (MHTGR) design. The benchmark includes coupled neutronics thermal hydraulics exercises that require more capabilities than RELAP5-3D with NESTLE offers. Therefore, the MHTGR benchmark makes extensive use of the new PHISICS/RELAP5-3D coupling capabilities. The paper presents the preliminary results of the three steady state exercises specified in Phase I of the benchmark using PHISICS/RELAP5-3D.« less
Benchmarks for Psychotherapy Efficacy in Adult Major Depression

ERIC Educational Resources Information Center

Minami, Takuya; Wampold, Bruce E.; Serlin, Ronald C.; Kircher, John C.; Brown, George S.

2007-01-01

This study estimates pretreatment-posttreatment effect size benchmarks for the treatment of major depression in adults that may be useful in evaluating psychotherapy effectiveness in clinical practice. Treatment efficacy benchmarks for major depression were derived for 3 different types of outcome measures: the Hamilton Rating Scale for Depression…
BENCHMARK DOSES FOR CHEMICAL MIXTURES: EVALUATION OF A MIXTURE OF 18 PHAHS.

EPA Science Inventory

Benchmark doses (BMDs), defined as doses of a substance that are expected to result in a pre-specified level of "benchmark" response (BMR), have been used for quantifying the risk associated with exposure to environmental hazards. The lower confidence limit of the BMD is used as...
The Suite for Embedded Applications and Kernels

DOE Office of Scientific and Technical Information (OSTI.GOV)

2016-05-10

Many applications of high performance embedded computing are limited by performance or power bottlenecks. We havedesigned SEAK, a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions to these bottlenecks? and (b) to facilitate rigorous, objective, end-user evaluation for their solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from a particular algorithm) andgoal-oriented (functional) specification. To encourage solutions that are any combination of software or hardware, we use an end-user blackbox evaluation that can capture tradeoffs between performance, power, accuracy, size, and weight. The tradeoffs are especially informativemore » for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. It is challenging to create both concise and precise goal-oriented specifications for mission-centric problems. This paper describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.« less
A review on the benchmarking concept in Malaysian construction safety performance

NASA Astrophysics Data System (ADS)

Ishak, Nurfadzillah; Azizan, Muhammad Azizi

2018-02-01

Construction industry is one of the major industries that propels Malaysia's economy in highly contributes to our nation's GDP growth, yet the high fatality rates on construction sites have caused concern among safety practitioners and the stakeholders. Hence, there is a need of benchmarking in performance of Malaysia's construction industry especially in terms of safety. This concept can create a fertile ground for ideas, but only in a receptive environment, organization that share good practices and compare their safety performance against other benefit most to establish improvement in safety culture. This research was conducted to study the awareness important, evaluate current practice and improvement, and also identify the constraint in implement of benchmarking on safety performance in our industry. Additionally, interviews with construction professionals were come out with different views on this concept. Comparison has been done to show the different understanding of benchmarking approach and how safety performance can be benchmarked. But, it's viewed as one mission, which to evaluate objectives identified through benchmarking that will improve the organization's safety performance. Finally, the expected result from this research is to help Malaysia's construction industry implement best practice in safety performance management through the concept of benchmarking.
Benchmarking reference services: an introduction.

PubMed

Marshall, J G; Buchanan, H S

1995-01-01

Benchmarking is based on the common sense idea that someone else, either inside or outside of libraries, has found a better way of doing certain things and that your own library's performance can be improved by finding out how others do things and adopting the best practices you find. Benchmarking is one of the tools used for achieving continuous improvement in Total Quality Management (TQM) programs. Although benchmarking can be done on an informal basis, TQM puts considerable emphasis on formal data collection and performance measurement. Used to its full potential, benchmarking can provide a common measuring stick to evaluate process performance. This article introduces the general concept of benchmarking, linking it whenever possible to reference services in health sciences libraries. Data collection instruments that have potential application in benchmarking studies are discussed and the need to develop common measurement tools to facilitate benchmarking is emphasized.
A possible approach to 14MeV neutron moderation: A preliminary study case.

PubMed

Flammini, D; Pilotti, R; Pietropaolo, A

2017-07-01

Deuterium-Tritium (D-T) interactions produce almost monochromatic neutrons with about 14MeV energy. These neutrons are used in benchmark experiments as well as for neutron cross sections assessment in fusion reactors technology. The possibility to moderate 14MeV neutrons for purposes beyond fusion is worth to be studied in relation to projects of intense D-T sources. In this preliminary study, carried out using the MCNP Monte Carlo code, the moderation of 14MeV neutrons is approached foreseeing the use of combination of metallic materials as pre-moderator and reflectors coupled to standard water moderators. Copyright © 2017 Elsevier Ltd. All rights reserved.
Development and application of freshwater sediment-toxicity benchmarks for currently used pesticides

USGS Publications Warehouse

Nowell, Lisa H.; Norman, Julia E.; Ingersoll, Christopher G.; Moran, Patrick W.

2016-01-01

Sediment-toxicity benchmarks are needed to interpret the biological significance of currently used pesticides detected in whole sediments. Two types of freshwater sediment benchmarks for pesticides were developed using spiked-sediment bioassay (SSB) data from the literature. These benchmarks can be used to interpret sediment-toxicity data or to assess the potential toxicity of pesticides in whole sediment. The Likely Effect Benchmark (LEB) defines a pesticide concentration in whole sediment above which there is a high probability of adverse effects on benthic invertebrates, and the Threshold Effect Benchmark (TEB) defines a concentration below which adverse effects are unlikely. For compounds without available SSBs, benchmarks were estimated using equilibrium partitioning (EqP). When a sediment sample contains a pesticide mixture, benchmark quotients can be summed for all detected pesticides to produce an indicator of potential toxicity for that mixture. Benchmarks were developed for 48 pesticide compounds using SSB data and 81 compounds using the EqP approach. In an example application, data for pesticides measured in sediment from 197 streams across the United States were evaluated using these benchmarks, and compared to measured toxicity from whole-sediment toxicity tests conducted with the amphipod Hyalella azteca (28-d exposures) and the midge Chironomus dilutus (10-d exposures). Amphipod survival, weight, and biomass were significantly and inversely related to summed benchmark quotients, whereas midge survival, weight, and biomass showed no relationship to benchmarks. Samples with LEB exceedances were rare (n = 3), but all were toxic to amphipods (i.e., significantly different from control). Significant toxicity to amphipods was observed for 72% of samples exceeding one or more TEBs, compared to 18% of samples below all TEBs. Factors affecting toxicity below TEBs may include the presence of contaminants other than pesticides, physical/chemical characteristics of sediment, and uncertainty in TEB values. Additional evaluations of benchmarks in relation to sediment chemistry and toxicity are ongoing.
Benchmarking and Its Relevance to the Library and Information Sector. Interim Findings of "Best Practice Benchmarking in the Library and Information Sector," a British Library Research and Development Department Project.

ERIC Educational Resources Information Center

Kinnell, Margaret; Garrod, Penny

This British Library Research and Development Department study assesses current activities and attitudes toward quality management in library and information services (LIS) in the academic sector as well as the commercial/industrial sector. Definitions and types of benchmarking are described, and the relevance of benchmarking to LIS is evaluated.…
Teaching Medical Students at a Distance: Using Distance Learning Benchmarks to Plan and Evaluate a Web-Enhanced Medical Student Curriculum

ERIC Educational Resources Information Center

Olney, Cynthia A.; Chumley, Heidi; Parra, Juan M.

2004-01-01

A team designing a Web-enhanced third-year medical education didactic curriculum based their course planning and evaluation activities on the Institute for Higher Education Policy's (2000) 24 benchmarks for online distance learning. The authors present the team's blueprint for planning and evaluating the Web-enhanced curriculum, which incorporates…
Coreference Resolution With Reconcile

DTIC Science & Technology

2010-07-01

evaluation of coreference re- solvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental... scores vary wildly across data sets, evaluation metrics, and system configurations. We believe that one root cause of these dispar- ities is the high...resolution and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile
Analysis of a benchmark suite to evaluate mixed numeric and symbolic processing

NASA Technical Reports Server (NTRS)

Ragharan, Bharathi; Galant, David

1992-01-01

The suite of programs that formed the benchmark for a proposed advanced computer is described and analyzed. The features of the processor and its operating system that are tested by the benchmark are discussed. The computer codes and the supporting data for the analysis are given as appendices.
Outcome Benchmarks for Adaptations of Research-Supported Treatments for Adult Traumatic Stress

ERIC Educational Resources Information Center

Rubin, Allen; Parrish, Danielle E.; Washburn, Micki

2016-01-01

This article provides benchmark data on within-group effect sizes from published randomized controlled trials (RCTs) that evaluated the efficacy of research-supported treatments (RSTs) for adult traumatic stress. Agencies can compare these benchmarks to their treatment group effect size to inform their decisions as to whether the way they are…
IT-benchmarking of clinical workflows: concept, implementation, and evaluation.

PubMed

Thye, Johannes; Straede, Matthias-Christopher; Liebe, Jan-David; Hübner, Ursula

2014-01-01

Due to the emerging evidence of health IT as opportunity and risk for clinical workflows, health IT must undergo a continuous measurement of its efficacy and efficiency. IT-benchmarks are a proven means for providing this information. The aim of this study was to enhance the methodology of an existing benchmarking procedure by including, in particular, new indicators of clinical workflows and by proposing new types of visualisation. Drawing on the concept of information logistics, we propose four workflow descriptors that were applied to four clinical processes. General and specific indicators were derived from these descriptors and processes. 199 chief information officers (CIOs) took part in the benchmarking. These hospitals were assigned to reference groups of a similar size and ownership from a total of 259 hospitals. Stepwise and comprehensive feedback was given to the CIOs. Most participants who evaluated the benchmark rated the procedure as very good, good, or rather good (98.4%). Benchmark information was used by CIOs for getting a general overview, advancing IT, preparing negotiations with board members, and arguing for a new IT project.
Benchmark Evaluation of the HTR-PROTEUS Absorber Rod Worths (Core 4)

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess; Leland M. Montierth

2014-06-01

PROTEUS was a zero-power research reactor at the Paul Scherrer Institute (PSI) in Switzerland. The critical assembly was constructed from a large graphite annulus surrounding a central cylindrical cavity. Various experimental programs were investigated in PROTEUS; during the years 1992 through 1996, it was configured as a pebble-bed reactor and designated HTR-PROTEUS. Various critical configurations were assembled with each accompanied by an assortment of reactor physics experiments including differential and integral absorber rod measurements, kinetics, reaction rate distributions, water ingress effects, and small sample reactivity effects [1]. Four benchmark reports were previously prepared and included in the March 2013 editionmore » of the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook) [2] evaluating eleven critical configurations. A summary of that effort was previously provided [3] and an analysis of absorber rod worth measurements for Cores 9 and 10 have been performed prior to this analysis and included in PROTEUS-GCR-EXP-004 [4]. In the current benchmark effort, absorber rod worths measured for Core Configuration 4, which was the only core with a randomly-packed pebble loading, have been evaluated for inclusion as a revision to the HTR-PROTEUS benchmark report PROTEUS-GCR-EXP-002.« less
Benchmarking for the Effective Use of Student Evaluation Data

ERIC Educational Resources Information Center

Smithson, John; Birks, Melanie; Harrison, Glenn; Nair, Chenicheri Sid; Hitchins, Marnie

2015-01-01

Purpose: The purpose of this paper is to examine current approaches to interpretation of student evaluation data and present an innovative approach to developing benchmark targets for the effective and efficient use of these data. Design/Methodology/Approach: This article discusses traditional approaches to gathering and using student feedback…
Commercial Building Energy Saver, API

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hong, Tianzhen; Piette, Mary; Lee, Sang Hoon

2015-08-27

The CBES API provides Application Programming Interface to a suite of functions to improve energy efficiency of buildings, including building energy benchmarking, preliminary retrofit analysis using a pre-simulation database DEEP, and detailed retrofit analysis using energy modeling with the EnergyPlus simulation engine. The CBES API is used to power the LBNL CBES Web App. It can be adopted by third party developers and vendors into their software tools and platforms.
Preliminary evaluation of factors associated with premature trial closure and feasibility of accrual benchmarks in phase III oncology trials.

PubMed

Schroen, Anneke T; Petroni, Gina R; Wang, Hongkun; Gray, Robert; Wang, Xiaofei F; Cronin, Walter; Sargent, Daniel J; Benedetti, Jacqueline; Wickerham, Donald L; Djulbegovic, Benjamin; Slingluff, Craig L

2010-08-01

A major challenge for randomized phase III oncology trials is the frequent low rates of patient enrollment, resulting in high rates of premature closure due to insufficient accrual. We conducted a pilot study to determine the extent of trial closure due to poor accrual, feasibility of identifying trial factors associated with sufficient accrual, impact of redesign strategies on trial accrual, and accrual benchmarks designating high failure risk in the clinical trials cooperative group (CTCG) setting. A subset of phase III trials opened by five CTCGs between August 1991 and March 2004 was evaluated. Design elements, experimental agents, redesign strategies, and pretrial accrual assessment supporting accrual predictions were abstracted from CTCG documents. Percent actual/predicted accrual rate averaged per month was calculated. Trials were categorized as having sufficient or insufficient accrual based on reason for trial termination. Analyses included univariate and bivariate summaries to identify potential trial factors associated with accrual sufficiency. Among 40 trials from one CTCG, 21 (52.5%) trials closed due to insufficient accrual. In 82 trials from five CTCGs, therapeutic trials accrued sufficiently more often than nontherapeutic trials (59% vs 27%, p = 0.05). Trials including pretrial accrual assessment more often achieved sufficient accrual than those without (67% vs 47%, p = 0.08). Fewer exclusion criteria, shorter consent forms, other CTCG participation, and trial design simplicity were not associated with achieving sufficient accrual. Trials accruing at a rate much lower than predicted (<35% actual/predicted accrual rate) were consistently closed due to insufficient accrual. This trial subset under-represents certain experimental modalities. Data sources do not allow accounting for all factors potentially related to accrual success. Trial closure due to insufficient accrual is common. Certain trial design factors appear associated with attaining sufficient accrual. Defining accrual benchmarks for early trial termination or redesign is feasible, but better accrual prediction methods are critically needed. Future studies should focus on identifying trial factors that allow more accurate accrual predictions and strategies that can salvage open trials experiencing slow accrual.
Preliminary evaluation of factors associated with premature trial closure and feasibility of accrual benchmarks in phase III oncology trials

PubMed Central

Schroen, Anneke T; Petroni, Gina R; Wang, Hongkun; Gray, Robert; Wang, Xiaofei F; Cronin, Walter; Sargent, Daniel J; Benedetti, Jacqueline; Wickerham, Donald L; Djulbegovic, Benjamin; Slingluff, Craig L

2014-01-01

Background A major challenge for randomized phase III oncology trials is the frequent low rates of patient enrollment, resulting in high rates of premature closure due to insufficient accrual. Purpose We conducted a pilot study to determine the extent of trial closure due to poor accrual, feasibility of identifying trial factors associated with sufficient accrual, impact of redesign strategies on trial accrual, and accrual benchmarks designating high failure risk in the clinical trials cooperative group (CTCG) setting. Methods A subset of phase III trials opened by five CTCGs between August 1991 and March 2004 was evaluated. Design elements, experimental agents, redesign strategies, and pretrial accrual assessment supporting accrual predictions were abstracted from CTCG documents. Percent actual/predicted accrual rate averaged per month was calculated. Trials were categorized as having sufficient or insufficient accrual based on reason for trial termination. Analyses included univariate and bivariate summaries to identify potential trial factors associated with accrual sufficiency. Results Among 40 trials from one CTCG, 21 (52.5%) trials closed due to insufficient accrual. In 82 trials from five CTCGs, therapeutic trials accrued sufficiently more often than nontherapeutic trials (59% vs 27%, p = 0.05). Trials including pretrial accrual assessment more often achieved sufficient accrual than those without (67% vs 47%, p = 0.08). Fewer exclusion criteria, shorter consent forms, other CTCG participation, and trial design simplicity were not associated with achieving sufficient accrual. Trials accruing at a rate much lower than predicted (<35% actual/predicted accrual rate) were consistently closed due to insufficient accrual. Limitations This trial subset under-represents certain experimental modalities. Data sources do not allow accounting for all factors potentially related to accrual success. Conclusion Trial closure due to insufficient accrual is common. Certain trial design factors appear associated with attaining sufficient accrual. Defining accrual benchmarks for early trial termination or redesign is feasible, but better accrual prediction methods are critically needed. Future studies should focus on identifying trial factors that allow more accurate accrual predictions and strategies that can salvage open trials experiencing slow accrual. PMID:20595245

Utilizing Benchmarking to Study the Effectiveness of Parent-Child Interaction Therapy Implemented in a Community Setting

ERIC Educational Resources Information Center

Self-Brown, Shannon; Valente, Jessica R.; Wild, Robert C.; Whitaker, Daniel J.; Galanter, Rachel; Dorsey, Shannon; Stanley, Jenelle

2012-01-01

Benchmarking is a program evaluation approach that can be used to study whether the outcomes of parents/children who participate in an evidence-based program in the community approximate the outcomes found in randomized trials. This paper presents a case illustration using benchmarking methodology to examine a community implementation of…
Simulation of Benchmark Cases with the Terminal Area Simulation System (TASS)

NASA Technical Reports Server (NTRS)

Ahmad, Nashat N.; Proctor, Fred H.

2011-01-01

The hydrodynamic core of the Terminal Area Simulation System (TASS) is evaluated against different benchmark cases. In the absence of closed form solutions for the equations governing atmospheric flows, the models are usually evaluated against idealized test cases. Over the years, various authors have suggested a suite of these idealized cases which have become standards for testing and evaluating the dynamics and thermodynamics of atmospheric flow models. In this paper, simulations of three such cases are described. In addition, the TASS model is evaluated against a test case that uses an exact solution of the Navier-Stokes equations. The TASS results are compared against previously reported simulations of these benchmark cases in the literature. It is demonstrated that the TASS model is highly accurate, stable and robust.
Benchmarking high performance computing architectures with CMS’ skeleton framework

NASA Astrophysics Data System (ADS)

Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

2017-10-01

In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
A small electron beam ion trap/source facility for electron/neutral–ion collisional spectroscopy in astrophysical plasmas

NASA Astrophysics Data System (ADS)

Liang, Gui-Yun; Wei, Hui-Gang; Yuan, Da-Wei; Wang, Fei-Lu; Peng, Ji-Min; Zhong, Jia-Yong; Zhu, Xiao-Long; Schmidt, Mike; Zschornack, Günter; Ma, Xin-Wen; Zhao, Gang

2018-01-01

Spectra are fundamental observation data used for astronomical research, but understanding them strongly depends on theoretical models with many fundamental parameters from theoretical calculations. Different models give different insights for understanding a specific object. Hence, laboratory benchmarks for these theoretical models become necessary. An electron beam ion trap is an ideal facility for spectroscopic benchmarks due to its similar conditions of electron density and temperature compared to astrophysical plasmas in stellar coronae, supernova remnants and so on. In this paper, we will describe the performance of a small electron beam ion trap/source facility installed at National Astronomical Observatories, Chinese Academy of Sciences.We present some preliminary experimental results on X-ray emission, ion production, the ionization process of trapped ions as well as the effects of charge exchange on the ionization.
Piloting a Process Maturity Model as an e-Learning Benchmarking Method

ERIC Educational Resources Information Center

Petch, Jim; Calverley, Gayle; Dexter, Hilary; Cappelli, Tim

2007-01-01

As part of a national e-learning benchmarking initiative of the UK Higher Education Academy, the University of Manchester is carrying out a pilot study of a method to benchmark e-learning in an institution. The pilot was designed to evaluate the operational viability of a method based on the e-Learning Maturity Model developed at the University of…
Developing a benchmark for emotional analysis of music

PubMed Central

Yang, Yi-Hsuan; Soleymani, Mohammad

2017-01-01

Music emotion recognition (MER) field rapidly expanded in the last decade. Many new methods and new audio features are developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the data representation diversity and scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, a MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons with 2Hz time resolution). Using DEAM, we organized the ‘Emotion in Music’ task at MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature-sets. We also describe the design of the benchmark, the evaluation procedures and the data cleaning and transformations that we suggest. The results from the benchmark suggest that the recurrent neural network based approaches combined with large feature-sets work best for dynamic MER. PMID:28282400
Decoys Selection in Benchmarking Datasets: Overview and Perspectives

PubMed Central

Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

2018-01-01

Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509
Developing a benchmark for emotional analysis of music.

PubMed

Aljanaki, Anna; Yang, Yi-Hsuan; Soleymani, Mohammad

2017-01-01

Music emotion recognition (MER) field rapidly expanded in the last decade. Many new methods and new audio features are developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the data representation diversity and scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, a MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons with 2Hz time resolution). Using DEAM, we organized the 'Emotion in Music' task at MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature-sets. We also describe the design of the benchmark, the evaluation procedures and the data cleaning and transformations that we suggest. The results from the benchmark suggest that the recurrent neural network based approaches combined with large feature-sets work best for dynamic MER.
Benchmarking Using Basic DBMS Operations

NASA Astrophysics Data System (ADS)

Crolotte, Alain; Ghazal, Ahmad

The TPC-H benchmark proved to be successful in the decision support area. Many commercial database vendors and their related hardware vendors used these benchmarks to show the superiority and competitive edge of their products. However, over time, the TPC-H became less representative of industry trends as vendors keep tuning their database to this benchmark-specific workload. In this paper, we present XMarq, a simple benchmark framework that can be used to compare various software/hardware combinations. Our benchmark model is currently composed of 25 queries that measure the performance of basic operations such as scans, aggregations, joins and index access. This benchmark model is based on the TPC-H data model due to its maturity and well-understood data generation capability. We also propose metrics to evaluate single-system performance and compare two systems. Finally we illustrate the effectiveness of this model by showing experimental results comparing two systems under different conditions.
Modelling of a Solar Thermal Power Plant for Benchmarking Blackbox Optimization Solvers

NASA Astrophysics Data System (ADS)

Lemyre Garneau, Mathieu

A new family of problems is provided to serve as a benchmark for blackbox optimization solvers. The problems are single or bi-objective and vary in complexity in terms of the number of variables used (from 5 to 29), the type of variables (integer, real, category), the number of constraints (from 5 to 17) and their types (binary or continuous). In order to provide problems exhibiting dynamics that reflect real engineering challenges, they are extracted from an original numerical model of a concentrated solar power (CSP) power plant with molten salt thermal storage. The model simulates the performance of the power plant by using a high level modeling of each of its main components, namely, an heliostats field, a central cavity receiver, a molten salt heat storage, a steam generator and an idealized powerblock. The heliostats field layout is determined through a simple automatic strategy that finds the best individual positions on the field by considering their respective cosine efficiency, atmospheric scattering and spillage losses as a function of the design parameters. A Monte-Carlo integral method is used to evaluate the heliostats field's optical performance throughout the day so that shadowing effects between heliostats are considered, and the results of this evaluation provide the inputs to simulate the levels and temperatures of the thermal storage. The molten salt storage inventory is used to transfer thermal energy to the powerblock, which simulates a simple Rankine cycle with a single steam turbine. Auxiliary models are used to provide additional optimization constraints on the investment cost, parasitic losses or components failure. The results of preliminary optimizations performed with the NOMAD software using default settings are provided to show the validity of the problems.
DE-NE0008277_PROTEUS final technical report 2018

DOE Office of Scientific and Technical Information (OSTI.GOV)

Enqvist, Andreas

This project details re-evaluations of experiments of gas-cooled fast reactor (GCFR) core designs performed in the 1970s at the PROTEUS reactor and create a series of International Reactor Physics Experiment Evaluation Project (IRPhEP) benchmarks. Currently there are no gas-cooled fast reactor (GCFR) experiments available in the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook). These experiments are excellent candidates for reanalysis and development of multiple benchmarks because these experiments provide high-quality integral nuclear data relevant to the validation and refinement of thorium, neptunium, uranium, plutonium, iron, and graphite cross sections. It would be cost prohibitive to reproduce suchmore » a comprehensive suite of experimental data to support any future GCFR endeavors.« less
A benchmark for fault tolerant flight control evaluation

NASA Astrophysics Data System (ADS)

Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.

2013-12-01

A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004 2008) for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. The benchmark includes a suitable set of assessment criteria and failure cases, based on reconstructed accident scenarios, to assess the potential of new adaptive control strategies to improve aircraft survivability. The application of reconstruction and modeling techniques, based on accident flight data, has resulted in high-fidelity nonlinear aircraft and fault models to evaluate new Fault Tolerant Flight Control (FTFC) concepts and their real-time performance to accommodate in-flight failures.
An Unbiased Method To Build Benchmarking Sets for Ligand-Based Virtual Screening and its Application To GPCRs

PubMed Central

2015-01-01

Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could bring the biases to the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCRs targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the “artificial enrichment” and “analogue bias” of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD. PMID:24749745
An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs.

PubMed

Xia, Jie; Jin, Hongwei; Liu, Zhenming; Zhang, Liangren; Wang, Xiang Simon

2014-05-27

Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could bring the biases to the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCRs targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the "artificial enrichment" and "analogue bias" of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD.
Benchmarking high performance computing architectures with CMS’ skeleton framework

DOE PAGES

Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

2017-11-23

Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta,more » Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.« less
Benchmarking high performance computing architectures with CMS’ skeleton framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta,more » Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.« less
Evaluation of CHO Benchmarks on the Arria 10 FPGA using Intel FPGA SDK for OpenCL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Zheming; Yoshii, Kazutomo; Finkel, Hal

The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing system. OpenCL extends the C-based programming language for developing portable codes on different platforms such as CPU, Graphics processing units (GPUs), Digital Signal Processors (DSPs) and Field Programmable Gate Arrays (FPGAs). The Intel FPGA SDK for OpenCL is a suite of tools that allows developers to abstract away the complex FPGA-based development flow for a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes themore » FPGA-based development more accessible to software users as the needs for hybrid computing using CPUs and FPGAs are increasing. It can also significantly reduce the hardware development time as users can evaluate different ideas with high-level language without deep FPGA domain knowledge. Benchmarking of OpenCL-based framework is an effective way for analyzing the performance of system by studying the execution of the benchmark applications. CHO is a suite of benchmark applications that provides support for OpenCL [1]. The authors presented CHO as an OpenCL port of the CHStone benchmark. Using Altera OpenCL (AOCL) compiler to synthesize the benchmark applications, they listed the resource usage and performance of each kernel that can be successfully synthesized by the compiler. In this report, we evaluate the resource usage and performance of the CHO benchmark applications using the Intel FPGA SDK for OpenCL and Nallatech 385A FPGA board that features an Arria 10 FPGA device. The focus of the report is to have a better understanding of the resource usage and performance of the kernel implementations using Arria-10 FPGA devices compared to Stratix-5 FPGA devices. In addition, we also gain knowledge about the limitations of the current compiler when it fails to synthesize a benchmark application.« less
Developing and Trialling an independent, scalable and repeatable IT-benchmarking procedure for healthcare organisations.

PubMed

Liebe, J D; Hübner, U

2013-01-01

Continuous improvements of IT-performance in healthcare organisations require actionable performance indicators, regularly conducted, independent measurements and meaningful and scalable reference groups. Existing IT-benchmarking initiatives have focussed on the development of reliable and valid indicators, but less on the questions about how to implement an environment for conducting easily repeatable and scalable IT-benchmarks. This study aims at developing and trialling a procedure that meets the afore-mentioned requirements. We chose a well established, regularly conducted (inter-) national IT-survey of healthcare organisations (IT-Report Healthcare) as the environment and offered the participants of the 2011 survey (CIOs of hospitals) to enter a benchmark. The 61 structural and functional performance indicators covered among others the implementation status and integration of IT-systems and functions, global user satisfaction and the resources of the IT-department. Healthcare organisations were grouped by size and ownership. The benchmark results were made available electronically and feedback on the use of these results was requested after several months. Fifty-ninehospitals participated in the benchmarking. Reference groups consisted of up to 141 members depending on the number of beds (size) and the ownership (public vs. private). A total of 122 charts showing single indicator frequency views were sent to each participant. The evaluation showed that 94.1% of the CIOs who participated in the evaluation considered this benchmarking beneficial and reported that they would enter again. Based on the feedback of the participants we developed two additional views that provide a more consolidated picture. The results demonstrate that establishing an independent, easily repeatable and scalable IT-benchmarking procedure is possible and was deemed desirable. Based on these encouraging results a new benchmarking round which includes process indicators is currently conducted.
PMLB: a large benchmark suite for machine learning evaluation and comparison.

PubMed

Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H

2017-01-01

The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
Scale/TSUNAMI Sensitivity Data for ICSBEP Evaluations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rearden, Bradley T; Reed, Davis Allan; Lefebvre, Robert A

2011-01-01

The Tools for Sensitivity and Uncertainty Analysis Methodology Implementation (TSUNAMI) software developed at Oak Ridge National Laboratory (ORNL) as part of the Scale code system provide unique methods for code validation, gap analysis, and experiment design. For TSUNAMI analysis, sensitivity data are generated for each application and each existing or proposed experiment used in the assessment. The validation of diverse sets of applications requires potentially thousands of data files to be maintained and organized by the user, and a growing number of these files are available through the International Handbook of Evaluated Criticality Safety Benchmark Experiments (IHECSBE) distributed through themore » International Criticality Safety Benchmark Evaluation Program (ICSBEP). To facilitate the use of the IHECSBE benchmarks in rigorous TSUNAMI validation and gap analysis techniques, ORNL generated SCALE/TSUNAMI sensitivity data files (SDFs) for several hundred benchmarks for distribution with the IHECSBE. For the 2010 edition of IHECSBE, the sensitivity data were generated using 238-group cross-section data based on ENDF/B-VII.0 for 494 benchmark experiments. Additionally, ORNL has developed a quality assurance procedure to guide the generation of Scale inputs and sensitivity data, as well as a graphical user interface to facilitate the use of sensitivity data in identifying experiments and applying them in validation studies.« less

A benchmarking tool to evaluate computer tomography perfusion infarct core predictions against a DWI standard.

PubMed

Cereda, Carlo W; Christensen, Søren; Campbell, Bruce Cv; Mishra, Nishant K; Mlynash, Michael; Levi, Christopher; Straka, Matus; Wintermark, Max; Bammer, Roland; Albers, Gregory W; Parsons, Mark W; Lansberg, Maarten G

2016-10-01

Differences in research methodology have hampered the optimization of Computer Tomography Perfusion (CTP) for identification of the ischemic core. We aim to optimize CTP core identification using a novel benchmarking tool. The benchmarking tool consists of an imaging library and a statistical analysis algorithm to evaluate the performance of CTP. The tool was used to optimize and evaluate an in-house developed CTP-software algorithm. Imaging data of 103 acute stroke patients were included in the benchmarking tool. Median time from stroke onset to CT was 185 min (IQR 180-238), and the median time between completion of CT and start of MRI was 36 min (IQR 25-79). Volumetric accuracy of the CTP-ROIs was optimal at an rCBF threshold of <38%; at this threshold, the mean difference was 0.3 ml (SD 19.8 ml), the mean absolute difference was 14.3 (SD 13.7) ml, and CTP was 67% sensitive and 87% specific for identification of DWI positive tissue voxels. The benchmarking tool can play an important role in optimizing CTP software as it provides investigators with a novel method to directly compare the performance of alternative CTP software packages. © The Author(s) 2015.
Performance Evaluation and Benchmarking of Next Intelligent Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

del Pobil, Angel; Madhavan, Raj; Bonsignorio, Fabio

Performance Evaluation and Benchmarking of Intelligent Systems presents research dedicated to the subject of performance evaluation and benchmarking of intelligent systems by drawing from the experiences and insights of leading experts gained both through theoretical development and practical implementation of intelligent systems in a variety of diverse application domains. This contributed volume offers a detailed and coherent picture of state-of-the-art, recent developments, and further research areas in intelligent systems. The chapters cover a broad range of applications, such as assistive robotics, planetary surveying, urban search and rescue, and line tracking for automotive assembly. Subsystems or components described in this bookmore » include human-robot interaction, multi-robot coordination, communications, perception, and mapping. Chapters are also devoted to simulation support and open source software for cognitive platforms, providing examples of the type of enabling underlying technologies that can help intelligent systems to propagate and increase in capabilities. Performance Evaluation and Benchmarking of Intelligent Systems serves as a professional reference for researchers and practitioners in the field. This book is also applicable to advanced courses for graduate level students and robotics professionals in a wide range of engineering and related disciplines including computer science, automotive, healthcare, manufacturing, and service robotics.« less
Availability of Neutronics Benchmarks in the ICSBEP and IRPhEP Handbooks for Computational Tools Testing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.; Briggs, J. Blair; Ivanova, Tatiana

2017-02-01

In the past several decades, numerous experiments have been performed worldwide to support reactor operations, measurements, design, and nuclear safety. Those experiments represent an extensive international investment in infrastructure, expertise, and cost, representing significantly valuable resources of data supporting past, current, and future research activities. Those valuable assets represent the basis for recording, development, and validation of our nuclear methods and integral nuclear data [1]. The loss of these experimental data, which has occurred all too much in the recent years, is tragic. The high cost to repeat many of these measurements can be prohibitive, if not impossible, to surmount.more » Two international projects were developed, and are under the direction of the Organisation for Co-operation and Development Nuclear Energy Agency (OECD NEA) to address the challenges of not just data preservation, but evaluation of the data to determine its merit for modern and future use. The International Criticality Safety Benchmark Evaluation Project (ICSBEP) was established to identify and verify comprehensive critical benchmark data sets; evaluate the data, including quantification of biases and uncertainties; compile the data and calculations in a standardized format; and formally document the effort into a single source of verified benchmark data [2]. Similarly, the International Reactor Physics Experiment Evaluation Project (IRPhEP) was established to preserve integral reactor physics experimental data, including separate or special effects data for nuclear energy and technology applications [3]. Annually, contributors from around the world continue to collaborate in the evaluation and review of select benchmark experiments for preservation and dissemination. The extensively peer-reviewed integral benchmark data can then be utilized to support nuclear design and safety analysts to validate the analytical tools, methods, and data needed for next-generation reactor design, safety analysis requirements, and all other front- and back-end activities contributing to the overall nuclear fuel cycle where quality neutronics calculations are paramount.« less
Metric Evaluation Pipeline for 3d Modeling of Urban Scenes

NASA Astrophysics Data System (ADS)

Bosch, M.; Leichtman, A.; Chilcott, D.; Goldberg, H.; Brown, M.

2017-05-01

Publicly available benchmark data and metric evaluation approaches have been instrumental in enabling research to advance state of the art methods for remote sensing applications in urban 3D modeling. Most publicly available benchmark datasets have consisted of high resolution airborne imagery and lidar suitable for 3D modeling on a relatively modest scale. To enable research in larger scale 3D mapping, we have recently released a public benchmark dataset with multi-view commercial satellite imagery and metrics to compare 3D point clouds with lidar ground truth. We now define a more complete metric evaluation pipeline developed as publicly available open source software to assess semantically labeled 3D models of complex urban scenes derived from multi-view commercial satellite imagery. Evaluation metrics in our pipeline include horizontal and vertical accuracy and completeness, volumetric completeness and correctness, perceptual quality, and model simplicity. Sources of ground truth include airborne lidar and overhead imagery, and we demonstrate a semi-automated process for producing accurate ground truth shape files to characterize building footprints. We validate our current metric evaluation pipeline using 3D models produced using open source multi-view stereo methods. Data and software is made publicly available to enable further research and planned benchmarking activities.
Using Benchmarking To Strengthen the Assessment of Persistence.

PubMed

McLachlan, Michael S; Zou, Hongyan; Gouin, Todd

2017-01-03

Chemical persistence is a key property for assessing chemical risk and chemical hazard. Current methods for evaluating persistence are based on laboratory tests. The relationship between the laboratory based estimates and persistence in the environment is often unclear, in which case the current methods for evaluating persistence can be questioned. Chemical benchmarking opens new possibilities to measure persistence in the field. In this paper we explore how the benchmarking approach can be applied in both the laboratory and the field to deepen our understanding of chemical persistence in the environment and create a firmer scientific basis for laboratory to field extrapolation of persistence test results.
SP2Bench: A SPARQL Performance Benchmark

NASA Astrophysics Data System (ADS)

Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg

A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is settled in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror vital key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.
Cascade Optimization Strategy with Neural Network and Regression Approximations Demonstrated on a Preliminary Aircraft Engine Design

NASA Technical Reports Server (NTRS)

Hopkins, Dale A.; Patnaik, Surya N.

2000-01-01

A preliminary aircraft engine design methodology is being developed that utilizes a cascade optimization strategy together with neural network and regression approximation methods. The cascade strategy employs different optimization algorithms in a specified sequence. The neural network and regression methods are used to approximate solutions obtained from the NASA Engine Performance Program (NEPP), which implements engine thermodynamic cycle and performance analysis models. The new methodology is proving to be more robust and computationally efficient than the conventional optimization approach of using a single optimization algorithm with direct reanalysis. The methodology has been demonstrated on a preliminary design problem for a novel subsonic turbofan engine concept that incorporates a wave rotor as a cycle-topping device. Computations of maximum thrust were obtained for a specific design point in the engine mission profile. The results (depicted in the figure) show a significant improvement in the maximum thrust obtained using the new methodology in comparison to benchmark solutions obtained using NEPP in a manual design mode.
High-order continuum kinetic method for modeling plasma dynamics in phase space

DOE PAGES

Vogman, G. V.; Colella, P.; Shumlak, U.

2014-12-15

Continuum methods offer a high-fidelity means of simulating plasma kinetics. While computationally intensive, these methods are advantageous because they can be cast in conservation-law form, are not susceptible to noise, and can be implemented using high-order numerical methods. Advances in continuum method capabilities for modeling kinetic phenomena in plasmas require the development of validation tools in higher dimensional phase space and an ability to handle non-cartesian geometries. To that end, a new benchmark for validating Vlasov-Poisson simulations in 3D (x,v x,v y) is presented. The benchmark is based on the Dory-Guest-Harris instability and is successfully used to validate a continuummore » finite volume algorithm. To address challenges associated with non-cartesian geometries, unique features of cylindrical phase space coordinates are described. Preliminary results of continuum kinetic simulations in 4D (r,z,v r,v z) phase space are presented.« less
Source-term development for a contaminant plume for use by multimedia risk assessment models

NASA Astrophysics Data System (ADS)

Whelan, Gene; McDonald, John P.; Taira, Randal Y.; Gnanapragasam, Emmanuel K.; Yu, Charley; Lew, Christine S.; Mills, William B.

2000-02-01

Multimedia modelers from the US Environmental Protection Agency (EPA) and US Department of Energy (DOE) are collaborating to conduct a comprehensive and quantitative benchmarking analysis of four intermedia models: MEPAS, MMSOILS, PRESTO, and RESRAD. These models represent typical analytically based tools that are used in human-risk and endangerment assessments at installations containing radioactive and hazardous contaminants. The objective is to demonstrate an approach for developing an adequate source term by simplifying an existing, real-world, 90Sr plume at DOE's Hanford installation in Richland, WA, for use in a multimedia benchmarking exercise between MEPAS, MMSOILS, PRESTO, and RESRAD. Source characteristics and a release mechanism are developed and described; also described is a typical process and procedure that an analyst would follow in developing a source term for using this class of analytical tool in a preliminary assessment.
Field Performance of Photovoltaic Systems in the Tucson Desert

NASA Astrophysics Data System (ADS)

Orsburn, Sean; Brooks, Adria; Cormode, Daniel; Greenberg, James; Hardesty, Garrett; Lonij, Vincent; Salhab, Anas; St. Germaine, Tyler; Torres, Gabe; Cronin, Alexander

2011-10-01

At the Tucson Electric Power (TEP) solar test yard, over 20 different grid-connected photovoltaic (PV) systems are being tested. The goal at the TEP solar test yard is to measure and model real-world performance of PV systems and to benchmark new technologies such as holographic concentrators. By studying voltage and current produced by the PV systems as a function of incident irradiance, and module temperature, we can compare our measurements of field-performance (in a harsh desert environment) to manufacturer specifications (determined under laboratory conditions). In order to measure high-voltage and high-current signals, we designed and built reliable, accurate sensors that can handle extreme desert temperatures. We will present several benchmarks of sensors in a controlled environment, including shunt resistors and Hall-effect current sensors, to determine temperature drift and accuracy. Finally we will present preliminary field measurements of PV performance for several different PV technologies.
International benchmarking of specialty hospitals. A series of case studies on comprehensive cancer centres.

PubMed

van Lent, Wineke A M; de Beer, Relinde D; van Harten, Wim H

2010-08-31

Benchmarking is one of the methods used in business that is applied to hospitals to improve the management of their operations. International comparison between hospitals can explain performance differences. As there is a trend towards specialization of hospitals, this study examines the benchmarking process and the success factors of benchmarking in international specialized cancer centres. Three independent international benchmarking studies on operations management in cancer centres were conducted. The first study included three comprehensive cancer centres (CCC), three chemotherapy day units (CDU) were involved in the second study and four radiotherapy departments were included in the final study. Per multiple case study a research protocol was used to structure the benchmarking process. After reviewing the multiple case studies, the resulting description was used to study the research objectives. We adapted and evaluated existing benchmarking processes through formalizing stakeholder involvement and verifying the comparability of the partners. We also devised a framework to structure the indicators to produce a coherent indicator set and better improvement suggestions. Evaluating the feasibility of benchmarking as a tool to improve hospital processes led to mixed results. Case study 1 resulted in general recommendations for the organizations involved. In case study 2, the combination of benchmarking and lean management led in one CDU to a 24% increase in bed utilization and a 12% increase in productivity. Three radiotherapy departments of case study 3, were considering implementing the recommendations.Additionally, success factors, such as a well-defined and small project scope, partner selection based on clear criteria, stakeholder involvement, simple and well-structured indicators, analysis of both the process and its results and, adapt the identified better working methods to the own setting, were found. The improved benchmarking process and the success factors can produce relevant input to improve the operations management of specialty hospitals.
International benchmarking of specialty hospitals. A series of case studies on comprehensive cancer centres

PubMed Central

2010-01-01

Background Benchmarking is one of the methods used in business that is applied to hospitals to improve the management of their operations. International comparison between hospitals can explain performance differences. As there is a trend towards specialization of hospitals, this study examines the benchmarking process and the success factors of benchmarking in international specialized cancer centres. Methods Three independent international benchmarking studies on operations management in cancer centres were conducted. The first study included three comprehensive cancer centres (CCC), three chemotherapy day units (CDU) were involved in the second study and four radiotherapy departments were included in the final study. Per multiple case study a research protocol was used to structure the benchmarking process. After reviewing the multiple case studies, the resulting description was used to study the research objectives. Results We adapted and evaluated existing benchmarking processes through formalizing stakeholder involvement and verifying the comparability of the partners. We also devised a framework to structure the indicators to produce a coherent indicator set and better improvement suggestions. Evaluating the feasibility of benchmarking as a tool to improve hospital processes led to mixed results. Case study 1 resulted in general recommendations for the organizations involved. In case study 2, the combination of benchmarking and lean management led in one CDU to a 24% increase in bed utilization and a 12% increase in productivity. Three radiotherapy departments of case study 3, were considering implementing the recommendations. Additionally, success factors, such as a well-defined and small project scope, partner selection based on clear criteria, stakeholder involvement, simple and well-structured indicators, analysis of both the process and its results and, adapt the identified better working methods to the own setting, were found. Conclusions The improved benchmarking process and the success factors can produce relevant input to improve the operations management of specialty hospitals. PMID:20807408
Benchmarking Diagnostic Algorithms on an Electrical Power System Testbed

NASA Technical Reports Server (NTRS)

Kurtoglu, Tolga; Narasimhan, Sriram; Poll, Scott; Garcia, David; Wright, Stephanie

2009-01-01

Diagnostic algorithms (DAs) are key to enabling automated health management. These algorithms are designed to detect and isolate anomalies of either a component or the whole system based on observations received from sensors. In recent years a wide range of algorithms, both model-based and data-driven, have been developed to increase autonomy and improve system reliability and affordability. However, the lack of support to perform systematic benchmarking of these algorithms continues to create barriers for effective development and deployment of diagnostic technologies. In this paper, we present our efforts to benchmark a set of DAs on a common platform using a framework that was developed to evaluate and compare various performance metrics for diagnostic technologies. The diagnosed system is an electrical power system, namely the Advanced Diagnostics and Prognostics Testbed (ADAPT) developed and located at the NASA Ames Research Center. The paper presents the fundamentals of the benchmarking framework, the ADAPT system, description of faults and data sets, the metrics used for evaluation, and an in-depth analysis of benchmarking results obtained from testing ten diagnostic algorithms on the ADAPT electrical power system testbed.
Evaluation and optimization of virtual screening workflows with DEKOIS 2.0--a public library of challenging docking benchmark sets.

PubMed

Bauer, Matthias R; Ibrahim, Tamer M; Vogel, Simon M; Boeckler, Frank M

2013-06-24

The application of molecular benchmarking sets helps to assess the actual performance of virtual screening (VS) workflows. To improve the efficiency of structure-based VS approaches, the selection and optimization of various parameters can be guided by benchmarking. With the DEKOIS 2.0 library, we aim to further extend and complement the collection of publicly available decoy sets. Based on BindingDB bioactivity data, we provide 81 new and structurally diverse benchmark sets for a wide variety of different target classes. To ensure a meaningful selection of ligands, we address several issues that can be found in bioactivity data. We have improved our previously introduced DEKOIS methodology with enhanced physicochemical matching, now including the consideration of molecular charges, as well as a more sophisticated elimination of latent actives in the decoy set (LADS). We evaluate the docking performance of Glide, GOLD, and AutoDock Vina with our data sets and highlight existing challenges for VS tools. All DEKOIS 2.0 benchmark sets will be made accessible at http://www.dekois.com.
Providing Nuclear Criticality Safety Analysis Education through Benchmark Experiment Evaluation

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess; J. Blair Briggs; David W. Nigg

2009-11-01

One of the challenges that today's new workforce of nuclear criticality safety engineers face is the opportunity to provide assessment of nuclear systems and establish safety guidelines without having received significant experience or hands-on training prior to graduation. Participation in the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and/or the International Reactor Physics Experiment Evaluation Project (IRPhEP) provides students and young professionals the opportunity to gain experience and enhance critical engineering skills.
Aircraft Engine Gas Path Diagnostic Methods: Public Benchmarking Results

NASA Technical Reports Server (NTRS)

Simon, Donald L.; Borguet, Sebastien; Leonard, Olivier; Zhang, Xiaodong (Frank)

2013-01-01

Recent technology reviews have identified the need for objective assessments of aircraft engine health management (EHM) technologies. To help address this issue, a gas path diagnostic benchmark problem has been created and made publicly available. This software tool, referred to as the Propulsion Diagnostic Method Evaluation Strategy (ProDiMES), has been constructed based on feedback provided by the aircraft EHM community. It provides a standard benchmark problem enabling users to develop, evaluate and compare diagnostic methods. This paper will present an overview of ProDiMES along with a description of four gas path diagnostic methods developed and applied to the problem. These methods, which include analytical and empirical diagnostic techniques, will be described and associated blind-test-case metric results will be presented and compared. Lessons learned along with recommendations for improving the public benchmarking processes will also be presented and discussed.
Evaluation of the ACEC Benchmark Suite for Real-Time Applications

DTIC Science & Technology

1990-07-23

1.0 benchmark suite waSanalyzed with respect to its measuring of Ada real-time features such as tasking, memory management, input/output, scheduling...and delay statement, Chapter 13 features , pragmas, interrupt handling, subprogram overhead, numeric computations etc. For most of the features that...meant for programming real-time systems. The ACEC benchmarks have been analyzed extensively with respect to their measuring of Ada real-time features
An automated protocol for performance benchmarking a widefield fluorescence microscope.

PubMed

Halter, Michael; Bier, Elianna; DeRose, Paul C; Cooksey, Gregory A; Choquette, Steven J; Plant, Anne L; Elliott, John T

2014-11-01

Widefield fluorescence microscopy is a highly used tool for visually assessing biological samples and for quantifying cell responses. Despite its widespread use in high content analysis and other imaging applications, few published methods exist for evaluating and benchmarking the analytical performance of a microscope. Easy-to-use benchmarking methods would facilitate the use of fluorescence imaging as a quantitative analytical tool in research applications, and would aid the determination of instrumental method validation for commercial product development applications. We describe and evaluate an automated method to characterize a fluorescence imaging system's performance by benchmarking the detection threshold, saturation, and linear dynamic range to a reference material. The benchmarking procedure is demonstrated using two different materials as the reference material, uranyl-ion-doped glass and Schott 475 GG filter glass. Both are suitable candidate reference materials that are homogeneously fluorescent and highly photostable, and the Schott 475 GG filter glass is currently commercially available. In addition to benchmarking the analytical performance, we also demonstrate that the reference materials provide for accurate day to day intensity calibration. Published 2014 Wiley Periodicals Inc. Published 2014 Wiley Periodicals Inc. This article is a US government work and, as such, is in the public domain in the United States of America.
Medical school benchmarking - from tools to programmes.

PubMed

Wilkinson, Tim J; Hudson, Judith N; Mccoll, Geoffrey J; Hu, Wendy C Y; Jolly, Brian C; Schuwirth, Lambert W T

2015-02-01

Benchmarking among medical schools is essential, but may result in unwanted effects. To apply a conceptual framework to selected benchmarking activities of medical schools. We present an analogy between the effects of assessment on student learning and the effects of benchmarking on medical school educational activities. A framework by which benchmarking can be evaluated was developed and applied to key current benchmarking activities in Australia and New Zealand. The analogy generated a conceptual framework that tested five questions to be considered in relation to benchmarking: what is the purpose? what are the attributes of value? what are the best tools to assess the attributes of value? what happens to the results? and, what is the likely "institutional impact" of the results? If the activities were compared against a blueprint of desirable medical graduate outcomes, notable omissions would emerge. Medical schools should benchmark their performance on a range of educational activities to ensure quality improvement and to assure stakeholders that standards are being met. Although benchmarking potentially has positive benefits, it could also result in perverse incentives with unforeseen and detrimental effects on learning if it is undertaken using only a few selected assessment tools.
Processing and validation of JEFF-3.1.1 and ENDF/B-VII.0 group-wise cross section libraries for shielding calculations

NASA Astrophysics Data System (ADS)

Pescarini, M.; Sinitsa, V.; Orsi, R.; Frisoni, M.

2013-03-01

This paper presents a synthesis of the ENEA-Bologna Nuclear Data Group programme dedicated to generate and validate group-wise cross section libraries for shielding and radiation damage deterministic calculations in nuclear fission reactors, following the data processing methodology recommended in the ANSI/ANS-6.1.2-1999 (R2009) American Standard. The VITJEFF311.BOLIB and VITENDF70.BOLIB finegroup coupled n-γ (199 n + 42 γ - VITAMIN-B6 structure) multi-purpose cross section libraries, based on the Bondarenko method for neutron resonance self-shielding and respectively on JEFF-3.1.1 and ENDF/B-VII.0 evaluated nuclear data, were produced in AMPX format using the NJOY-99.259 and the ENEA-Bologna 2007 Revision of the SCAMPI nuclear data processing systems. Two derived broad-group coupled n-γ (47 n + 20 γ - BUGLE-96 structure) working cross section libraries in FIDO-ANISN format for LWR shielding and pressure vessel dosimetry calculations, named BUGJEFF311.BOLIB and BUGENDF70.BOLIB, were generated by the revised version of SCAMPI, through problem-dependent cross section collapsing and self-shielding from the cited fine-group libraries. The validation results on the criticality safety benchmark experiments for the fine-group libraries and the preliminary validation results for the broad-group working libraries on the PCA-Replica and VENUS-3 engineering neutron shielding benchmark experiments are reported in synthesis.

Accelerating progress in Artificial General Intelligence: Choosing a benchmark for natural world interaction

NASA Astrophysics Data System (ADS)

Rohrer, Brandon

2010-12-01

Measuring progress in the field of Artificial General Intelligence (AGI) can be difficult without commonly accepted methods of evaluation. An AGI benchmark would allow evaluation and comparison of the many computational intelligence algorithms that have been developed. In this paper I propose that a benchmark for natural world interaction would possess seven key characteristics: fitness, breadth, specificity, low cost, simplicity, range, and task focus. I also outline two benchmark examples that meet most of these criteria. In the first, the direction task, a human coach directs a machine to perform a novel task in an unfamiliar environment. The direction task is extremely broad, but may be idealistic. In the second, the AGI battery, AGI candidates are evaluated based on their performance on a collection of more specific tasks. The AGI battery is designed to be appropriate to the capabilities of currently existing systems. Both the direction task and the AGI battery would require further definition before implementing. The paper concludes with a description of a task that might be included in the AGI battery: the search and retrieve task.
Rethinking the reference collection: exploring benchmarks and e-book availability.

PubMed

Husted, Jeffrey T; Czechowski, Leslie J

2012-01-01

Librarians in the Health Sciences Library System at the University of Pittsburgh explored the possibility of developing an electronic reference collection that would replace the print reference collection, thus providing access to these valuable materials to a widely dispersed user population. The librarians evaluated the print reference collection and standard collection development lists as potential benchmarks for the electronic collection, and they determined which books were available in electronic format. They decided that the low availability of electronic versions of titles in each benchmark group rendered the creation of an electronic reference collection using either benchmark impractical.
Benchmarking expert system tools

NASA Technical Reports Server (NTRS)

Riley, Gary

1988-01-01

As part of its evaluation of new technologies, the Artificial Intelligence Section of the Mission Planning and Analysis Div. at NASA-Johnson has made timing tests of several expert system building tools. Among the production systems tested were Automated Reasoning Tool, several versions of OPS5, and CLIPS (C Language Integrated Production System), an expert system builder developed by the AI section. Also included in the test were a Zetalisp version of the benchmark along with four versions of the benchmark written in Knowledge Engineering Environment, an object oriented, frame based expert system tool. The benchmarks used for testing are studied.
A benchmarking method to measure dietary absorption efficiency of chemicals by fish.

PubMed

Xiao, Ruiyang; Adolfsson-Erici, Margaretha; Åkerman, Gun; McLachlan, Michael S; MacLeod, Matthew

2013-12-01

Understanding the dietary absorption efficiency of chemicals in the gastrointestinal tract of fish is important from both a scientific and a regulatory point of view. However, reported fish absorption efficiencies for well-studied chemicals are highly variable. In the present study, the authors developed and exploited an internal chemical benchmarking method that has the potential to reduce uncertainty and variability and, thus, to improve the precision of measurements of fish absorption efficiency. The authors applied the benchmarking method to measure the gross absorption efficiency for 15 chemicals with a wide range of physicochemical properties and structures. They selected 2,2',5,6'-tetrachlorobiphenyl (PCB53) and decabromodiphenyl ethane as absorbable and nonabsorbable benchmarks, respectively. Quantities of chemicals determined in fish were benchmarked to the fraction of PCB53 recovered in fish, and quantities of chemicals determined in feces were benchmarked to the fraction of decabromodiphenyl ethane recovered in feces. The performance of the benchmarking procedure was evaluated based on the recovery of the test chemicals and precision of absorption efficiency from repeated tests. Benchmarking did not improve the precision of the measurements; after benchmarking, however, the median recovery for 15 chemicals was 106%, and variability of recoveries was reduced compared with before benchmarking, suggesting that benchmarking could account for incomplete extraction of chemical in fish and incomplete collection of feces from different tests. © 2013 SETAC.
Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data

PubMed Central

2014-01-01

Background The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. Results In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. Conclusions A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform. PMID:24708189
Benchmarking of HEU Mental Annuli Critical Assemblies with Internally Reflected Graphite Cylinder

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xiaobo, Liu; Bess, John D.; Marshall, Margaret A.

Three experimental configurations of critical assemblies, performed in 1963 at the Oak Ridge Critical Experiment Facility, which are assembled using three different diameter HEU annuli (15-9 inches, 15-7 inches and 13-7 inches) metal annuli with internally reflected graphite cylinder are evaluated and benchmarked. The experimental uncertainties which are 0.00055, 0.00055 and 0.00055 respectively, and biases to the detailed benchmark models which are -0.00179, -0.00189 and -0.00114 respectively, were determined, and the experimental benchmark keff results were obtained for both detailed and simplified model. The calculation results for both detailed and simplified models using MCNP6-1.0 and ENDF VII.1 agree well tomore » the benchmark experimental results with a difference of less than 0.2%. These are acceptable benchmark experiments for inclusion in the ICSBEP Handbook.« less
How to benchmark methods for structure-based virtual screening of large compound libraries.

PubMed

Christofferson, Andrew J; Huang, Niu

2012-01-01

Structure-based virtual screening is a useful computational technique for ligand discovery. To systematically evaluate different docking approaches, it is important to have a consistent benchmarking protocol that is both relevant and unbiased. Here, we describe the designing of a benchmarking data set for docking screen assessment, a standard docking screening process, and the analysis and presentation of the enrichment of annotated ligands among a background decoy database.
Evaluation of state-of-the-art segmentation algorithms for left ventricle infarct from late Gadolinium enhancement MR images.

PubMed

Karim, Rashed; Bhagirath, Pranav; Claus, Piet; James Housden, R; Chen, Zhong; Karimaghaloo, Zahra; Sohn, Hyon-Mok; Lara Rodríguez, Laura; Vera, Sergio; Albà, Xènia; Hennemuth, Anja; Peitgen, Heinz-Otto; Arbel, Tal; Gonzàlez Ballester, Miguel A; Frangi, Alejandro F; Götte, Marco; Razavi, Reza; Schaeffter, Tobias; Rhode, Kawal

2016-05-01

Studies have demonstrated the feasibility of late Gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) imaging for guiding the management of patients with sequelae to myocardial infarction, such as ventricular tachycardia and heart failure. Clinical implementation of these developments necessitates a reproducible and reliable segmentation of the infarcted regions. It is challenging to compare new algorithms for infarct segmentation in the left ventricle (LV) with existing algorithms. Benchmarking datasets with evaluation strategies are much needed to facilitate comparison. This manuscript presents a benchmarking evaluation framework for future algorithms that segment infarct from LGE CMR of the LV. The image database consists of 30 LGE CMR images of both humans and pigs that were acquired from two separate imaging centres. A consensus ground truth was obtained for all data using maximum likelihood estimation. Six widely-used fixed-thresholding methods and five recently developed algorithms are tested on the benchmarking framework. Results demonstrate that the algorithms have better overlap with the consensus ground truth than most of the n-SD fixed-thresholding methods, with the exception of the Full-Width-at-Half-Maximum (FWHM) fixed-thresholding method. Some of the pitfalls of fixed thresholding methods are demonstrated in this work. The benchmarking evaluation framework, which is a contribution of this work, can be used to test and benchmark future algorithms that detect and quantify infarct in LGE CMR images of the LV. The datasets, ground truth and evaluation code have been made publicly available through the website: https://www.cardiacatlas.org/web/guest/challenges. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Cloud-Based Evaluation of Anatomical Structure Segmentation and Landmark Detection Algorithms: VISCERAL Anatomy Benchmarks.

PubMed

Jimenez-Del-Toro, Oscar; Muller, Henning; Krenn, Markus; Gruenberg, Katharina; Taha, Abdel Aziz; Winterstein, Marianne; Eggel, Ivan; Foncubierta-Rodriguez, Antonio; Goksel, Orcun; Jakab, Andras; Kontokotsios, Georgios; Langs, Georg; Menze, Bjoern H; Salas Fernandez, Tomas; Schaer, Roger; Walleyo, Anna; Weber, Marc-Andre; Dicente Cid, Yashin; Gass, Tobias; Heinrich, Mattias; Jia, Fucang; Kahl, Fredrik; Kechichian, Razmig; Mai, Dominic; Spanier, Assaf B; Vincent, Graham; Wang, Chunliang; Wyeth, Daniel; Hanbury, Allan

2016-11-01

Variations in the shape and appearance of anatomical structures in medical images are often relevant radiological signs of disease. Automatic tools can help automate parts of this manual process. A cloud-based evaluation framework is presented in this paper including results of benchmarking current state-of-the-art medical imaging algorithms for anatomical structure segmentation and landmark detection: the VISCERAL Anatomy benchmarks. The algorithms are implemented in virtual machines in the cloud where participants can only access the training data and can be run privately by the benchmark administrators to objectively compare their performance in an unseen common test set. Overall, 120 computed tomography and magnetic resonance patient volumes were manually annotated to create a standard Gold Corpus containing a total of 1295 structures and 1760 landmarks. Ten participants contributed with automatic algorithms for the organ segmentation task, and three for the landmark localization task. Different algorithms obtained the best scores in the four available imaging modalities and for subsets of anatomical structures. The annotation framework, resulting data set, evaluation setup, results and performance analysis from the three VISCERAL Anatomy benchmarks are presented in this article. Both the VISCERAL data set and Silver Corpus generated with the fusion of the participant algorithms on a larger set of non-manually-annotated medical images are available to the research community.
Retrospective evaluation of continental-scale streamflow nudging with WRF-Hydro National Water Model V1

NASA Astrophysics Data System (ADS)

McCreight, J. L.; Wu, Y.; Gochis, D.; Rafieeinasab, A.; Dugger, A. L.; Yu, W.; Cosgrove, B.; Cui, Z.; Oubeidillah, A.; Briar, D.

2016-12-01

The streamflow (discharge) data assimilation capability in version 1 of the National Water Model (NWM; a WRF-Hydro configuration) is applied and evaluated in a 5-year (2011-2015) retrospective study using NLDAS2 forcing data over CONUS. This talk will describe the NWM V1 operational nudging (continuous-time) streamflow data assimilation approach, its motivation, and its relationship to this retrospective evaluation. Results from this study will provide a an analysis-based (not forecast-based) benchmark for streamflow DA in the NWM. The goal of the assimilation is to reduce discharge bias and improve channel initial conditions for discharge forecasting (though forecasts are not considered here). The nudging method assimilates discharge observations at nearly 7,000 USGS gages (at frequency up to 1/15 minutes) to produce a (univariate) discharge reanalysis (i.e. this is the only variable affected by the assimilation). By withholding 14% nested gages throughout CONUS in a separate validation run, we evaluate the downstream impact of assimilation at upstream gages. Based on this sample, we estimate the skill of the streamflow reanalysis at ungaged locations and examine factors governing the skill of the assimilation. Comparison of assimilation and open-loop runs is presented. Performance of DA under both high and low flow regimes and selected flooding events is examined. Preliminary evaluation of nudging parameter sensitivity and its relationship to flow regime will be presented.
Benchmarking: A strategic overview of a key management tool

Treesearch

Chris Leclair

1999-01-01

Benchmarking is a continuous, systematic process for evaluating the products, services, and work processes of organizations in an effort to identifY best practices for possible adoption in support of the objectives of enhanced activity service delivery and organizational effectiveness.
Educating Next Generation Nuclear Criticality Safety Engineers at the Idaho National Laboratory

DOE Office of Scientific and Technical Information (OSTI.GOV)

J. D. Bess; J. B. Briggs; A. S. Garcia

2011-09-01

One of the challenges in educating our next generation of nuclear safety engineers is the limitation of opportunities to receive significant experience or hands-on training prior to graduation. Such training is generally restricted to on-the-job-training before this new engineering workforce can adequately provide assessment of nuclear systems and establish safety guidelines. Participation in the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPhEP) can provide students and young professionals the opportunity to gain experience and enhance critical engineering skills. The ICSBEP and IRPhEP publish annual handbooks that contain evaluations of experiments along withmore » summarized experimental data and peer-reviewed benchmark specifications to support the validation of neutronics codes, nuclear cross-section data, and the validation of reactor designs. Participation in the benchmark process not only benefits those who use these Handbooks within the international community, but provides the individual with opportunities for professional development, networking with an international community of experts, and valuable experience to be used in future employment. Traditionally students have participated in benchmarking activities via internships at national laboratories, universities, or companies involved with the ICSBEP and IRPhEP programs. Additional programs have been developed to facilitate the nuclear education of students while participating in the benchmark projects. These programs include coordination with the Center for Space Nuclear Research (CSNR) Next Degree Program, the Collaboration with the Department of Energy Idaho Operations Office to train nuclear and criticality safety engineers, and student evaluations as the basis for their Master's thesis in nuclear engineering.« less
Structural Benchmark Creep Testing for the Advanced Stirling Convertor Heater Head

NASA Technical Reports Server (NTRS)

Krause, David L.; Kalluri, Sreeramesh; Bowman, Randy R.; Shah, Ashwin R.

2008-01-01

The National Aeronautics and Space Administration (NASA) has identified the high efficiency Advanced Stirling Radioisotope Generator (ASRG) as a candidate power source for use on long duration Science missions such as lunar applications, Mars rovers, and deep space missions. For the inherent long life times required, a structurally significant design limit for the heater head component of the ASRG Advanced Stirling Convertor (ASC) is creep deformation induced at low stress levels and high temperatures. Demonstrating proof of adequate margins on creep deformation and rupture for the operating conditions and the MarM-247 material of construction is a challenge that the NASA Glenn Research Center is addressing. The combined analytical and experimental program ensures integrity and high reliability of the heater head for its 17-year design life. The life assessment approach starts with an extensive series of uniaxial creep tests on thin MarM-247 specimens that comprise the same chemistry, microstructure, and heat treatment processing as the heater head itself. This effort addresses a scarcity of openly available creep properties for the material as well as for the virtual absence of understanding of the effect on creep properties due to very thin walls, fine grains, low stress levels, and high-temperature fabrication steps. The approach continues with a considerable analytical effort, both deterministically to evaluate the median creep life using nonlinear finite element analysis, and probabilistically to calculate the heater head s reliability to a higher degree. Finally, the approach includes a substantial structural benchmark creep testing activity to calibrate and validate the analytical work. This last element provides high fidelity testing of prototypical heater head test articles; the testing includes the relevant material issues and the essential multiaxial stress state, and applies prototypical and accelerated temperature profiles for timely results in a highly controlled laboratory environment. This paper focuses on the last element and presents a preliminary methodology for creep rate prediction, the experimental methods, test challenges, and results from benchmark testing of a trial MarM-247 heater head test article. The results compare favorably with the analytical strain predictions. A description of other test findings is provided, and recommendations for future test procedures are suggested. The manuscript concludes with describing the potential impact of the heater head creep life assessment and benchmark testing effort on the ASC program.
The adenosine triphosphate test is a rapid and reliable audit tool to assess manual cleaning adequacy of flexible endoscope channels.

PubMed

Alfa, Michelle J; Fatima, Iram; Olson, Nancy

2013-03-01

The study objective was to verify that the adenosine triphosphate (ATP) benchmark of <200 relative light units (RLUs) was achievable in a busy endoscopy clinic that followed the manufacturer's manual cleaning instructions. All channels from patient-used colonoscopes (20) and duodenoscopes (20) in a tertiary care hospital endoscopy clinic were sampled after manual cleaning and tested for residual ATP. The ATP test benchmark for adequate manual cleaning was set at <200 RLUs. The benchmark for protein was <6.4 μg/cm(2), and, for bioburden, it was <4-log10 colony-forming units/cm(2). Our data demonstrated that 96% (115/120) of channels from 20 colonoscopes and 20 duodenoscopes evaluated met the ATP benchmark of <200 RLUs. The 5 channels that exceeded 200 RLUs were all elevator guide-wire channels. All 120 of the manually cleaned endoscopes tested had protein and bioburden levels that were compliant with accepted benchmarks for manual cleaning for suction-biopsy, air-water, and auxiliary water channels. Our data confirmed that, by following the endoscope manufacturer's manual cleaning recommendations, 96% of channels in gastrointestinal endoscopes would have <200 RLUs for the ATP test kit evaluated and would meet the accepted clean benchmarks for protein and bioburden. Copyright © 2013 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Mosby, Inc. All rights reserved.
Analysis of 2D Torus and Hub Topologies of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed

NASA Technical Reports Server (NTRS)

Pedretti, Kevin T.; Fineberg, Samuel A.; Kutler, Paul (Technical Monitor)

1997-01-01

A variety of different network technologies and topologies are currently being evaluated as part of the Whitney Project. This paper reports on the implementation and performance of a Fast Ethernet network configured in a 4x4 2D torus topology in a testbed cluster of 'commodity' Pentium Pro PCs. Several benchmarks were used for performance evaluation: an MPI point to point message passing benchmark, an MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2). Our results show that for point to point communication on an unloaded network, the hub and 1 hop routes on the torus have about the same bandwidth and latency. However, the bandwidth decreases and the latency increases on the torus for each additional route hop. Collective communication benchmarks show that the torus provides roughly four times more aggregate bandwidth and eight times faster MPI barrier synchronizations than a hub based network for 16 processor systems. Finally, the SOAPBOX benchmarks, which simulate real-world CFD applications, generally demonstrated substantially better performance on the torus than on the hub. In the few cases the hub was faster, the difference was negligible. In total, our experimental results lead to the conclusion that for Fast Ethernet networks, the torus topology has better performance and scales better than a hub based network.
Paediatric International Nursing Study: using person-centred key performance indicators to benchmark children's services.

PubMed

McCance, Tanya; Wilson, Val; Kornman, Kelly

2016-07-01

The aim of the Paediatric International Nursing Study was to explore the utility of key performance indicators in developing person-centred practice across a range of services provided to sick children. The objective addressed in this paper was evaluating the use of these indicators to benchmark services internationally. This study builds on primary research, which produced indicators that were considered novel both in terms of their positive orientation and use in generating data that privileges the patient voice. This study extends this research through wider testing on an international platform within paediatrics. The overall methodological approach was a realistic evaluation used to evaluate the implementation of the key performance indicators, which combined an integrated development and evaluation methodology. The study involved children's wards/hospitals in Australia (six sites across three states) and Europe (seven sites across four countries). Qualitative and quantitative methods were used during the implementation process, however, this paper reports the quantitative data only, which used survey, observations and documentary review. The findings demonstrate the quality of care being delivered to children and their families across different international sites. The benchmarking does, however, highlight some differences between paediatric and general hospitals, and between the different key performance indicators across all the sites. The findings support the use of the key performance indicators as a novel method to benchmark services internationally. Whilst the data collected across 20 paediatric sites suggest services are more similar than different, benchmarking illuminates variations that encourage a critical dialogue about what works and why. The transferability of the key performance indicators and measurement framework across different settings has significant implications for practice. The findings offer an approach to benchmarking and celebrating the successes within practice, while learning from partners across the globe in further developing person-centred cultures. © 2016 John Wiley & Sons Ltd.
Benchmarking Strategies for Measuring the Quality of Healthcare: Problems and Prospects

PubMed Central

Lovaglio, Pietro Giorgio

2012-01-01

Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare and implementing benchmarking strategies. Besides offering accreditation and certification processes, recent approaches measure the performance of healthcare institutions in order to evaluate their effectiveness, defined as the capacity to provide treatment that modifies and improves the patient's state of health. This paper, dealing with hospital effectiveness, focuses on research methods for effectiveness analyses within a strategy comparing different healthcare institutions. The paper, after having introduced readers to the principle debates on benchmarking strategies, which depend on the perspective and type of indicators used, focuses on the methodological problems related to performing consistent benchmarking analyses. Particularly, statistical methods suitable for controlling case-mix, analyzing aggregate data, rare events, and continuous outcomes measured with error are examined. Specific challenges of benchmarking strategies, such as the risk of risk adjustment (case-mix fallacy, underreporting, risk of comparing noncomparable hospitals), selection bias, and possible strategies for the development of consistent benchmarking analyses, are discussed. Finally, to demonstrate the feasibility of the illustrated benchmarking strategies, an application focused on determining regional benchmarks for patient satisfaction (using 2009 Lombardy Region Patient Satisfaction Questionnaire) is proposed. PMID:22666140
Benchmarking strategies for measuring the quality of healthcare: problems and prospects.

PubMed

Lovaglio, Pietro Giorgio

2012-01-01

Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare and implementing benchmarking strategies. Besides offering accreditation and certification processes, recent approaches measure the performance of healthcare institutions in order to evaluate their effectiveness, defined as the capacity to provide treatment that modifies and improves the patient's state of health. This paper, dealing with hospital effectiveness, focuses on research methods for effectiveness analyses within a strategy comparing different healthcare institutions. The paper, after having introduced readers to the principle debates on benchmarking strategies, which depend on the perspective and type of indicators used, focuses on the methodological problems related to performing consistent benchmarking analyses. Particularly, statistical methods suitable for controlling case-mix, analyzing aggregate data, rare events, and continuous outcomes measured with error are examined. Specific challenges of benchmarking strategies, such as the risk of risk adjustment (case-mix fallacy, underreporting, risk of comparing noncomparable hospitals), selection bias, and possible strategies for the development of consistent benchmarking analyses, are discussed. Finally, to demonstrate the feasibility of the illustrated benchmarking strategies, an application focused on determining regional benchmarks for patient satisfaction (using 2009 Lombardy Region Patient Satisfaction Questionnaire) is proposed.
Automated benchmarking of peptide-MHC class I binding predictions.

PubMed

Trolle, Thomas; Metushi, Imir G; Greenbaum, Jason A; Kim, Yohan; Sidney, John; Lund, Ole; Sette, Alessandro; Peters, Bjoern; Nielsen, Morten

2015-07-01

Numerous in silico methods predicting peptide binding to major histocompatibility complex (MHC) class I molecules have been developed over the last decades. However, the multitude of available prediction tools makes it non-trivial for the end-user to select which tool to use for a given task. To provide a solid basis on which to compare different prediction tools, we here describe a framework for the automated benchmarking of peptide-MHC class I binding prediction tools. The framework runs weekly benchmarks on data that are newly entered into the Immune Epitope Database (IEDB), giving the public access to frequent, up-to-date performance evaluations of all participating tools. To overcome potential selection bias in the data included in the IEDB, a strategy was implemented that suggests a set of peptides for which different prediction methods give divergent predictions as to their binding capability. Upon experimental binding validation, these peptides entered the benchmark study. The benchmark has run for 15 weeks and includes evaluation of 44 datasets covering 17 MHC alleles and more than 4000 peptide-MHC binding measurements. Inspection of the results allows the end-user to make educated selections between participating tools. Of the four participating servers, NetMHCpan performed the best, followed by ANN, SMM and finally ARB. Up-to-date performance evaluations of each server can be found online at http://tools.iedb.org/auto_bench/mhci/weekly. All prediction tool developers are invited to participate in the benchmark. Sign-up instructions are available at http://tools.iedb.org/auto_bench/mhci/join. mniel@cbs.dtu.dk or bpeters@liai.org Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Congenital Heart Surgery Case Mix Across North American Centers and Impact on Performance Assessment.

PubMed

Pasquali, Sara K; Wallace, Amelia S; Gaynor, J William; Jacobs, Marshall L; O'Brien, Sean M; Hill, Kevin D; Gaies, Michael G; Romano, Jennifer C; Shahian, David M; Mayer, John E; Jacobs, Jeffrey P

2016-11-01

Performance assessment in congenital heart surgery is challenging due to the wide heterogeneity of disease. We describe current case mix across centers, evaluate methodology inclusive of all cardiac operations versus the more homogeneous subset of Society of Thoracic Surgeons benchmark operations, and describe implications regarding performance assessment. Centers (n = 119) participating in the Society of Thoracic Surgeons Congenital Heart Surgery Database (2010 through 2014) were included. Index operation type and frequency across centers were described. Center performance (risk-adjusted operative mortality) was evaluated and classified when including the benchmark versus all eligible operations. Overall, 207 types of operations were performed during the study period (112,140 total cases). Few operations were performed across all centers; only 25% were performed at least once by 75% or more of centers. There was 7.9-fold variation across centers in the proportion of total cases comprising high-complexity cases (STAT 5). In contrast, the benchmark operations made up 36% of cases, and all but 2 were performed by at least 90% of centers. When evaluating performance based on benchmark versus all operations, 15% of centers changed performance classification; 85% remained unchanged. Benchmark versus all operation methodology was associated with lower power, with 35% versus 78% of centers meeting sample size thresholds. There is wide variation in congenital heart surgery case mix across centers. Metrics based on benchmark versus all operations are associated with strengths (less heterogeneity) and weaknesses (lower power), and lead to differing performance classification for some centers. These findings have implications for ongoing efforts to optimize performance assessment, including choice of target population and appropriate interpretation of reported metrics. Copyright © 2016 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.

Toward multimodal signal detection of adverse drug reactions.

PubMed

Harpaz, Rave; DuMouchel, William; Schuemie, Martijn; Bodenreider, Olivier; Friedman, Carol; Horvitz, Eric; Ripple, Anna; Sorbello, Alfred; White, Ryen W; Winnenburg, Rainer; Shah, Nigam H

2017-12-01

Improving mechanisms to detect adverse drug reactions (ADRs) is key to strengthening post-marketing drug safety surveillance. Signal detection is presently unimodal, relying on a single information source. Multimodal signal detection is based on jointly analyzing multiple information sources. Building on, and expanding the work done in prior studies, the aim of the article is to further research on multimodal signal detection, explore its potential benefits, and propose methods for its construction and evaluation. Four data sources are investigated; FDA's adverse event reporting system, insurance claims, the MEDLINE citation database, and the logs of major Web search engines. Published methods are used to generate and combine signals from each data source. Two distinct reference benchmarks corresponding to well-established and recently labeled ADRs respectively are used to evaluate the performance of multimodal signal detection in terms of area under the ROC curve (AUC) and lead-time-to-detection, with the latter relative to labeling revision dates. Limited to our reference benchmarks, multimodal signal detection provides AUC improvements ranging from 0.04 to 0.09 based on a widely used evaluation benchmark, and a comparative added lead-time of 7-22 months relative to labeling revision dates from a time-indexed benchmark. The results support the notion that utilizing and jointly analyzing multiple data sources may lead to improved signal detection. Given certain data and benchmark limitations, the early stage of development, and the complexity of ADRs, it is currently not possible to make definitive statements about the ultimate utility of the concept. Continued development of multimodal signal detection requires a deeper understanding the data sources used, additional benchmarks, and further research on methods to generate and synthesize signals. Copyright © 2017 Elsevier Inc. All rights reserved.
Automated benchmarking of peptide-MHC class I binding predictions

PubMed Central

Trolle, Thomas; Metushi, Imir G.; Greenbaum, Jason A.; Kim, Yohan; Sidney, John; Lund, Ole; Sette, Alessandro; Peters, Bjoern; Nielsen, Morten

2015-01-01

Motivation: Numerous in silico methods predicting peptide binding to major histocompatibility complex (MHC) class I molecules have been developed over the last decades. However, the multitude of available prediction tools makes it non-trivial for the end-user to select which tool to use for a given task. To provide a solid basis on which to compare different prediction tools, we here describe a framework for the automated benchmarking of peptide-MHC class I binding prediction tools. The framework runs weekly benchmarks on data that are newly entered into the Immune Epitope Database (IEDB), giving the public access to frequent, up-to-date performance evaluations of all participating tools. To overcome potential selection bias in the data included in the IEDB, a strategy was implemented that suggests a set of peptides for which different prediction methods give divergent predictions as to their binding capability. Upon experimental binding validation, these peptides entered the benchmark study. Results: The benchmark has run for 15 weeks and includes evaluation of 44 datasets covering 17 MHC alleles and more than 4000 peptide-MHC binding measurements. Inspection of the results allows the end-user to make educated selections between participating tools. Of the four participating servers, NetMHCpan performed the best, followed by ANN, SMM and finally ARB. Availability and implementation: Up-to-date performance evaluations of each server can be found online at http://tools.iedb.org/auto_bench/mhci/weekly. All prediction tool developers are invited to participate in the benchmark. Sign-up instructions are available at http://tools.iedb.org/auto_bench/mhci/join. Contact: mniel@cbs.dtu.dk or bpeters@liai.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25717196
Evaluation of the influence of the definition of an isolated hip fracture as an exclusion criterion for trauma system benchmarking: a multicenter cohort study.

PubMed

Tiao, J; Moore, L; Porgo, T V; Belcaid, A

2016-06-01

To assess whether the definition of an IHF used as an exclusion criterion influences the results of trauma center benchmarking. We conducted a multicenter retrospective cohort study with data from an integrated Canadian trauma system. The study population included all patients admitted between 1999 and 2010 to any of the 57 adult trauma centers. Seven definitions of IHF based on diagnostic codes, age, mechanism of injury, and secondary injuries, identified in a systematic review, were used. Trauma centers were benchmarked using risk-adjusted mortality estimates generated using the Trauma Risk Adjustment Model. The agreement between benchmarking results generated under different IHF definitions was evaluated with correlation coefficients on adjusted mortality estimates. Correlation coefficients >0.95 were considered to convey acceptable agreement. The study population consisted of 172,872 patients before exclusion of IHF and between 128,094 and 139,588 patients after exclusion. Correlation coefficients between risk-adjusted mortality estimates generated in populations including and excluding IHF varied between 0.86 and 0.90. Correlation coefficients of estimates generated under different definitions of IHF varied between 0.97 and 0.99, even when analyses were restricted to patients aged ≥65 years. Although the exclusion of patients with IHF has an influence on the results of trauma center benchmarking based on mortality, the definition of IHF in terms of diagnostic codes, age, mechanism of injury and secondary injury has no significant impact on benchmarking results. Results suggest that there is no need to obtain formal consensus on the definition of IHF for benchmarking activities.
Qubit Manipulations Techniques for Trapped-Ion Quantum Information Processing

NASA Astrophysics Data System (ADS)

Gaebler, John; Tan, Ting; Lin, Yiheng; Bowler, Ryan; Jost, John; Meier, Adam; Knill, Emanuel; Leibfried, Dietrich; Wineland, David; Ion Storage Team

2013-05-01

We report recent results on qubit manipulation techniques for trapped-ions towards scalable quantum information processing (QIP). We demonstrate a platform-independent benchmarking protocol for evaluating the performance of Clifford gates, which form a basis for fault-tolerant QIP. We report a demonstration of an entangling gate scheme proposed by Bermudez et al. [Phys. Rev. A. 85, 040302 (2012)] and achieve a fidelity of 0.974(4). This scheme takes advantage of dynamic decoupling which protects the qubit against dephasing errors. It can be applied directly on magnetic-field-insensitive states, and provides a number of simplifications in experimental implementation compared to some other entangling gates with trapped ions. We also report preliminary results on dissipative creation of entanglement with trapped-ions. Creation of an entangled pair does not require discrete logic gates and thus could reduce the level of quantum-coherent control needed for large-scale QIP. Supported by IARPA, ARO contract No. EAO139840, ONR, and the NIST Quantum Information Program.
RELAP5-3D Results for Phase I (Exercise 2) of the OECD/NEA MHTGR-350 MW Benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gerhard Strydom

2012-06-01

The coupling of the PHISICS code suite to the thermal hydraulics system code RELAP5-3D has recently been initiated at the Idaho National Laboratory (INL) to provide a fully coupled prismatic Very High Temperature Reactor (VHTR) system modeling capability as part of the NGNP methods development program. The PHISICS code consists of three modules: INSTANT (performing 3D nodal transport core calculations), MRTAU (depletion and decay heat generation) and a perturbation/mixer module. As part of the verification and validation activities, steady state results have been obtained for Exercise 2 of Phase I of the newly-defined OECD/NEA MHTGR-350 MW Benchmark. This exercise requiresmore » participants to calculate a steady-state solution for an End of Equilibrium Cycle 350 MW Modular High Temperature Reactor (MHTGR), using the provided geometry, material, and coolant bypass flow description. The paper provides an overview of the MHTGR Benchmark and presents typical steady state results (e.g. solid and gas temperatures, thermal conductivities) for Phase I Exercise 2. Preliminary results are also provided for the early test phase of Exercise 3 using a two-group cross-section library and the Relap5-3D model developed for Exercise 2.« less
RELAP5-3D results for phase I (Exercise 2) of the OECD/NEA MHTGR-350 MW benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Strydom, G.; Epiney, A. S.

2012-07-01

The coupling of the PHISICS code suite to the thermal hydraulics system code RELAP5-3D has recently been initiated at the Idaho National Laboratory (INL) to provide a fully coupled prismatic Very High Temperature Reactor (VHTR) system modeling capability as part of the NGNP methods development program. The PHISICS code consists of three modules: INSTANT (performing 3D nodal transport core calculations), MRTAU (depletion and decay heat generation) and a perturbation/mixer module. As part of the verification and validation activities, steady state results have been obtained for Exercise 2 of Phase I of the newly-defined OECD/NEA MHTGR-350 MW Benchmark. This exercise requiresmore » participants to calculate a steady-state solution for an End of Equilibrium Cycle 350 MW Modular High Temperature Reactor (MHTGR), using the provided geometry, material, and coolant bypass flow description. The paper provides an overview of the MHTGR Benchmark and presents typical steady state results (e.g. solid and gas temperatures, thermal conductivities) for Phase I Exercise 2. Preliminary results are also provided for the early test phase of Exercise 3 using a two-group cross-section library and the Relap5-3D model developed for Exercise 2. (authors)« less
Microbially Mediated Kinetic Sulfur Isotope Fractionation: Reactive Transport Modeling Benchmark

NASA Astrophysics Data System (ADS)

Wanner, C.; Druhan, J. L.; Cheng, Y.; Amos, R. T.; Steefel, C. I.; Ajo Franklin, J. B.

2014-12-01

Microbially mediated sulfate reduction is a ubiquitous process in many subsurface systems. Isotopic fractionation is characteristic of this anaerobic process, since sulfate reducing bacteria (SRB) favor the reduction of the lighter sulfate isotopologue (S32O42-) over the heavier isotopologue (S34O42-). Detection of isotopic shifts have been utilized as a proxy for the onset of sulfate reduction in subsurface systems such as oil reservoirs and aquifers undergoing uranium bioremediation. Reactive transport modeling (RTM) of kinetic sulfur isotope fractionation has been applied to field and laboratory studies. These RTM approaches employ different mathematical formulations in the representation of kinetic sulfur isotope fractionation. In order to test the various formulations, we propose a benchmark problem set for the simulation of kinetic sulfur isotope fractionation during microbially mediated sulfate reduction. The benchmark problem set is comprised of four problem levels and is based on a recent laboratory column experimental study of sulfur isotope fractionation. Pertinent processes impacting sulfur isotopic composition such as microbial sulfate reduction and dispersion are included in the problem set. To date, participating RTM codes are: CRUNCHTOPE, TOUGHREACT, MIN3P and THE GEOCHEMIST'S WORKBENCH. Preliminary results from various codes show reasonable agreement for the problem levels simulating sulfur isotope fractionation in 1D.
Preliminary evaluation of military, commercial and novel skin decontamination products against a chemical warfare agent simulant (methyl salicylate).

PubMed

Matar, Hazem; Guerreiro, Antonio; Piletsky, Sergey A; Price, Shirley C; Chilcott, Robert P

2015-08-13

Rapid decontamination is vital to alleviate adverse health effects following dermal exposure to hazardous materials. There is an abundance of materials and products which can be utilised to remove hazardous materials from the skin. In this study, a total of 15 products were evaluated, 10 of which were commercial or military products and five were novel (molecular imprinted) polymers. The efficacies of these products were evaluated against a 10 µl droplet of 14 C-methyl salicylate applied to the surface of porcine skin mounted on static diffusion cells. The current UK military decontaminant (Fuller's earth) performed well, retaining 83% of the dose over 24 h and served as a benchmark to compare with the other test products. The five most effective test products were Fuller's earth (the current UK military decontaminant), Fast-Act® and three novel polymers [based on itaconic acid, 2-trifluoromethylacrylic acid and N,N-methylenebis(acrylamide)]. Five products (medical moist-free wipes, 5% FloraFree™ solution, normal baby wipes, baby wipes for sensitive skin and Diphotérine™) enhanced the dermal absorption of 14 C-methyl salicylate. Further work is required to establish the performance of the most effective products identified in this study against chemical warfare agents.
Preliminary evaluation of military, commercial and novel skin decontamination products against a chemical warfare agent simulant (methyl salicylate).

PubMed

Matar, Hazem; Guerreiro, Antonio; Piletsky, Sergey A; Price, Shirley C; Chilcott, Robert P

2016-01-01

Rapid decontamination is vital to alleviate adverse health effects following dermal exposure to hazardous materials. There is an abundance of materials and products which can be utilised to remove hazardous materials from the skin. In this study, a total of 15 products were evaluated, 10 of which were commercial or military products and five were novel (molecular imprinted) polymers. The efficacies of these products were evaluated against a 10 µl droplet of (14)C-methyl salicylate applied to the surface of porcine skin mounted on static diffusion cells. The current UK military decontaminant (Fuller's earth) performed well, retaining 83% of the dose over 24 h and served as a benchmark to compare with the other test products. The five most effective test products were Fuller's earth (the current UK military decontaminant), Fast-Act® and three novel polymers [based on itaconic acid, 2-trifluoromethylacrylic acid and N,N-methylenebis(acrylamide)]. Five products (medical moist-free wipes, 5% FloraFree™ solution, normal baby wipes, baby wipes for sensitive skin and Diphotérine™) enhanced the dermal absorption of (14)C-methyl salicylate. Further work is required to establish the performance of the most effective products identified in this study against chemical warfare agents.
Outpatient echocardiography in the evaluation of innocent murmurs in children: utilisation benchmarking.

PubMed

Frias, Patricio A; Oster, Matthew; Daley, Patricia A; Boris, Jeffrey R

2016-03-01

We sought to benchmark the utilisation of echocardiography in the outpatient evaluation of heart murmurs by evaluating two large paediatric cardiology centres. Although criteria exist for appropriate use of echocardiography, there are no benchmarking data demonstrating its utilisation. We performed a retrospective cohort study of outpatients aged between 0 and 18 years at the Sibley Heart Center Cardiology and the Children's Hospital of Philadelphia Division of Cardiology, given a sole diagnosis of "innocent murmur" from 1 July, 2007 to 31 October, 2010. Using internal claims data, we compared the utilisation of echocardiography according to centre, patient age, and physician years of service. Of 23,114 eligible patients (Sibley Heart Center Cardiology: 12,815, Children's Hospital of Philadelphia Division of Cardiology: 10,299), 43.1% (Sibley Heart Center Cardiology: 45.2%, Children's Hospital of Philadelphia Division of Cardiology: 40.4%; p1-5 years had the lowest utilisation (32.7%). In two large paediatric cardiology practices, the overall utilisation of echocardiography by physicians with a sole diagnosis of innocent murmur was similar. There was significant and similar variability in utilisation by provider at both centres. Although these data serve as initial benchmarking, the variability in utilisation highlights the importance of appropriate use criteria.
Medico-economic evaluation of healthcare products. Methodology for defining a significant impact on French health insurance costs and selection of benchmarks for interpreting results.

PubMed

Dervaux, Benoît; Baseilhac, Eric; Fagon, Jean-Yves; Biot, Claire; Blachier, Corinne; Braun, Eric; Debroucker, Frédérique; Detournay, Bruno; Ferretti, Carine; Granger, Muriel; Jouan-Flahault, Chrystel; Lussier, Marie-Dominique; Meyer, Arlette; Muller, Sophie; Pigeon, Martine; De Sahb, Rima; Sannié, Thomas; Sapède, Claudine; Vray, Muriel

2014-01-01

Decree No. 2012-1116 of 2 October 2012 on medico-economic assignments of the French National Authority for Health (Haute autorité de santé, HAS) significantly alters the conditions for accessing the health products market in France. This paper presents a theoretical framework for interpreting the results of the economic evaluation of health technologies and summarises the facts available in France for developing benchmarks that will be used to interpret incremental cost-effectiveness ratios. This literature review shows that it is difficult to determine a threshold value but it is also difficult to interpret then incremental cost effectiveness ratio (ICER) results without a threshold value. In this context, round table participants favour a pragmatic approach based on "benchmarks" as opposed to a threshold value, based on an interpretative and normative perspective, i.e. benchmarks that can change over time based on feedback. © 2014 Société Française de Pharmacologie et de Thérapeutique.
NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference.

PubMed

Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E

2015-09-29

In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow to evaluate accurately and reproducibly those methods. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework that uses various datasets highlights the specialization of some methods toward network types and data. As a result, it is possible to identify the techniques that have broad overall performances.
Chemotherapy Extravasation: Establishing a National Benchmark for Incidence Among Cancer Centers.

PubMed

Jackson-Rose, Jeannette; Del Monte, Judith; Groman, Adrienne; Dial, Linda S; Atwell, Leah; Graham, Judy; O'Neil Semler, Rosemary; O'Sullivan, Maryellen; Truini-Pittman, Lisa; Cunningham, Terri A; Roman-Fischetti, Lisa; Costantinou, Eileen; Rimkus, Chris; Banavage, Adrienne J; Dietz, Barbara; Colussi, Carol J; Catania, Kimberly; Wasko, Michelle; Schreffler, Kevin A; West, Colleen; Siefert, Mary Lou; Rice, Robert David

2017-08-01

Given the high-risk nature and nurse sensitivity of chemotherapy infusion and extravasation prevention, as well as the absence of an industry benchmark, a group of nurses studied oncology-specific nursing-sensitive indicators.  . The purpose was to establish a benchmark for the incidence of chemotherapy extravasation with vesicants, irritants, and irritants with vesicant potential. . Infusions with actual or suspected extravasations of vesicant and irritant chemotherapies were evaluated. Extravasation events were reviewed by type of agent, occurrence by drug category, route of administration, level of harm, follow-up, and patient referrals to surgical consultation. . A total of 739,812 infusions were evaluated, with 673 extravasation events identified. Incidence for all extravasation events was 0.09%.
Strategic Factors on Interpreting Remanufacturing Quality-Certifying Framework to Address Warranty Aftermarket for Malaysian Industry

NASA Astrophysics Data System (ADS)

Mohamed, N.; Saman, M. Z. M.; Sharif, S.; Hamzah, H. S.

2018-03-01

While the concept of remanufacturing is gaining popularity globally, literature and theory on strategic decision-making on certifying for warranty in this area remain limited. A strategic and establish concept flow is developed based on extensive literature review and surveys with experienced experts who are dealing with remanufactured, reconditioned, rebuilt and reused components. The remanufacturing research on evaluating quality assurance of remanufactured component targets macro-level parameters and the indicators which must be confirmed for evaluation. The strategic remanufacturing factors identified from the literature review are discussed in a brainstorming session with a number of remanufacturing researchers and academic experts. The study is further broadened by industrial surveys and case studies to justify the inputs on developing a framework to certify remanufactured components. Preliminary results have established the key factors of remanufacturing quality control that might lead to the strict quality assurance of remanufactured components. Later, the developed framework can be used as a benchmarking tool to certify remanufactured components and warranty issuance. The findings serve as the foundation for further research concerning Original Equipment Manufacturer (OEM) or Original Equipment Remanufacturer (OER) and Independent Equipment Remanufacturer (IER) in the Malaysian Remanufacturing Industry.
Application of an Upwind High Resolution Finite-Differencing Scheme and Multigrid Method in Steady-State Incompressible Flow Simulations

NASA Technical Reports Server (NTRS)

Yang, Cheng I.; Guo, Yan-Hu; Liu, C.- H.

1996-01-01

The analysis and design of a submarine propulsor requires the ability to predict the characteristics of both laminar and turbulent flows to a higher degree of accuracy. This report presents results of certain benchmark computations based on an upwind, high-resolution, finite-differencing Navier-Stokes solver. The purpose of the computations is to evaluate the ability, the accuracy and the performance of the solver in the simulation of detailed features of viscous flows. Features of interest include flow separation and reattachment, surface pressure and skin friction distributions. Those features are particularly relevant to the propulsor analysis. Test cases with a wide range of Reynolds numbers are selected; therefore, the effects of the convective and the diffusive terms of the solver can be evaluated separately. Test cases include flows over bluff bodies, such as circular cylinders and spheres, at various low Reynolds numbers, flows over a flat plate with and without turbulence effects, and turbulent flows over axisymmetric bodies with and without propulsor effects. Finally, to enhance the iterative solution procedure, a full approximation scheme V-cycle multigrid method is implemented. Preliminary results indicate that the method significantly reduces the computational effort.
Issues in Benchmarking Human Reliability Analysis Methods: A Literature Review

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ronald L. Boring; Stacey M. L. Hendrickson; John A. Forester

There is a diversity of human reliability analysis (HRA) methods available for use in assessing human performance within probabilistic risk assessments (PRA). Due to the significant differences in the methods, including the scope, approach, and underlying models, there is a need for an empirical comparison investigating the validity and reliability of the methods. To accomplish this empirical comparison, a benchmarking study comparing and evaluating HRA methods in assessing operator performance in simulator experiments is currently underway. In order to account for as many effects as possible in the construction of this benchmarking study, a literature review was conducted, reviewing pastmore » benchmarking studies in the areas of psychology and risk assessment. A number of lessons learned through these studies are presented in order to aid in the design of future HRA benchmarking endeavors.« less
78 FR 43147 - Fisheries of the South Atlantic; Southeast Data, Assessment, and Review (SEDAR); Public Meetings

Federal Register 2010, 2011, 2012, 2013, 2014

2013-07-19

..., estimates biological benchmarks, projects future population conditions, and recommends research and... the Assessment webinars are as follows: 1. Participants will employ assessment models to evaluate stock status, estimate population benchmarks and management criteria, and project future conditions. The...
An Evaluation of Unit and ½ Mass Correction Approaches as a ...

EPA Pesticide Factsheets

Rare earth elements (REE) and certain alkaline earths can produce M+2 interferences in ICP-MS because they have sufficiently low second ionization energies. Four REEs (150Sm, 150Nd, 156Gd and 156Dy) produce false positives on 75As and 78Se and 132Ba can produce a false positive on 66Zn. Currently, US EPA Method 200.8 does not address these as sources of false positives. Additionally, these M+2 false positives are typically enhanced if collision cell technology is utilized to reduce polyatomic interferences associated with ICP-MS detection. A preliminary evaluation indicates that instrumental tuning conditions can impact the observed M+2/M+1 ratio and in turn the false positives generated on Zn, As and Se. Both unit and ½ mass approaches will be evaluated to correct for these false positives relative to the benchmark concentrations estimates from a triple quadrupole ICP-MS using standard solutions. The impact of matrix on these M+2 corrections will be evaluated over multiple analysis days with a focus on evaluating internal standards that mirror the matrix induced shifts in the M+2 ion transmission. The goal of this evaluation is to move away from fixed M+2 corrective approaches and move towards sample specific approaches that mimic the sample matrix induced variability while attempting to address intra-day variability of the M+2 correction factors through the use of internal standards. Oral Presentation via webinar for EPA Laboratory Technical Informati
A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design.

PubMed

Ó Conchúir, Shane; Barlow, Kyle A; Pache, Roland A; Ollikainen, Noah; Kundert, Kale; O'Meara, Matthew J; Smith, Colin A; Kortemme, Tanja

2015-01-01

The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark. We aim to provide such a resource for protocols in macromolecular modeling and design. We present a freely accessible web resource (https://kortemmelab.ucsf.edu/benchmarks) to guide the development of protocols for protein modeling and design. The site provides benchmark datasets and metrics to compare the performance of a variety of modeling protocols using different computational sampling methods and energy functions, providing a "best practice" set of parameters for each method. Each benchmark has an associated downloadable benchmark capture archive containing the input files, analysis scripts, and tutorials for running the benchmark. The captures may be run with any suitable modeling method; we supply command lines for running the benchmarks using the Rosetta software suite. We have compiled initial benchmarks for the resource spanning three key areas: prediction of energetic effects of mutations, protein design, and protein structure prediction, each with associated state-of-the-art modeling protocols. With the help of the wider macromolecular modeling community, we hope to expand the variety of benchmarks included on the website and continue to evaluate new iterations of current methods as they become available.
Using benchmarking techniques and the 2011 maternity practices infant nutrition and care (mPINC) survey to improve performance among peer groups across the United States.

PubMed

Edwards, Roger A; Dee, Deborah; Umer, Amna; Perrine, Cria G; Shealy, Katherine R; Grummer-Strawn, Laurence M

2014-02-01

A substantial proportion of US maternity care facilities engage in practices that are not evidence-based and that interfere with breastfeeding. The CDC Survey of Maternity Practices in Infant Nutrition and Care (mPINC) showed significant variation in maternity practices among US states. The purpose of this article is to use benchmarking techniques to identify states within relevant peer groups that were top performers on mPINC survey indicators related to breastfeeding support. We used 11 indicators of breastfeeding-related maternity care from the 2011 mPINC survey and benchmarking techniques to organize and compare hospital-based maternity practices across the 50 states and Washington, DC. We created peer categories for benchmarking first by region (grouping states by West, Midwest, South, and Northeast) and then by size (grouping states by the number of maternity facilities and dividing each region into approximately equal halves based on the number of facilities). Thirty-four states had scores high enough to serve as benchmarks, and 32 states had scores low enough to reflect the lowest score gap from the benchmark on at least 1 indicator. No state served as the benchmark on more than 5 indicators and no state was furthest from the benchmark on more than 7 indicators. The small peer group benchmarks in the South, West, and Midwest were better than the large peer group benchmarks on 91%, 82%, and 36% of the indicators, respectively. In the West large, the Midwest large, the Midwest small, and the South large peer groups, 4-6 benchmarks showed that less than 50% of hospitals have ideal practice in all states. The evaluation presents benchmarks for peer group state comparisons that provide potential and feasible targets for improvement.

The use of quality benchmarking in assessing web resources for the dermatology virtual branch library of the National electronic Library for Health (NeLH).

PubMed

Kamel Boulos, M N; Roudsari, A V; Gordon, C; Muir Gray, J A

2001-01-01

In 1998, the U.K. National Health Service Information for Health Strategy proposed the implementation of a National electronic Library for Health to provide clinicians, healthcare managers and planners, patients and the public with easy, round the clock access to high quality, up-to-date electronic information on health and healthcare. The Virtual Branch Libraries are among the most important components of the National electronic Library for Health. They aim at creating online knowledge based communities, each concerned with some specific clinical and other health-related topics. This study is about the envisaged Dermatology Virtual Branch Libraries of the National electronic Library for Health. It aims at selecting suitable dermatology Web resources for inclusion in the forthcoming Virtual Branch Libraries after establishing preliminary quality benchmarking rules for this task. Psoriasis, being a common dermatological condition, has been chosen as a starting point. Because quality is a principal concern of the National electronic Library for Health, the study includes a review of the major quality benchmarking systems available today for assessing health-related Web sites. The methodology of developing a quality benchmarking system has been also reviewed. Aided by metasearch Web tools, candidate resources were hand-selected in light of the reviewed benchmarking systems and specific criteria set by the authors. Over 90 professional and patient-oriented Web resources on psoriasis and dermatology in general are suggested for inclusion in the forthcoming Dermatology Virtual Branch Libraries. The idea of an all-in knowledge-hallmarking instrument for the National electronic Library for Health is also proposed based on the reviewed quality benchmarking systems. Skilled, methodical, organized human reviewing, selection and filtering based on well-defined quality appraisal criteria seems likely to be the key ingredient in the envisaged National electronic Library for Health service. Furthermore, by promoting the application of agreed quality guidelines and codes of ethics by all health information providers and not just within the National electronic Library for Health, the overall quality of the Web will improve with time and the Web will ultimately become a reliable and integral part of the care space.
A benchmarking program to reduce red blood cell outdating: implementation, evaluation, and a conceptual framework.

PubMed

Barty, Rebecca L; Gagliardi, Kathleen; Owens, Wendy; Lauzon, Deborah; Scheuermann, Sheena; Liu, Yang; Wang, Grace; Pai, Menaka; Heddle, Nancy M

2015-07-01

Benchmarking is a quality improvement tool that compares an organization's performance to that of its peers for selected indicators, to improve practice. Processes to develop evidence-based benchmarks for red blood cell (RBC) outdating in Ontario hospitals, based on RBC hospital disposition data from Canadian Blood Services, have been previously reported. These benchmarks were implemented in 160 hospitals provincewide with a multifaceted approach, which included hospital education, inventory management tools and resources, summaries of best practice recommendations, recognition of high-performing sites, and audit tools on the Transfusion Ontario website (http://transfusionontario.org). In this study we describe the implementation process and the impact of the benchmarking program on RBC outdating. A conceptual framework for continuous quality improvement of a benchmarking program was also developed. The RBC outdating rate for all hospitals trended downward continuously from April 2006 to February 2012, irrespective of hospitals' transfusion rates or their distance from the blood supplier. The highest annual outdating rate was 2.82%, at the beginning of the observation period. Each year brought further reductions, with a nadir outdating rate of 1.02% achieved in 2011. The key elements of the successful benchmarking strategy included dynamic targets, a comprehensive and evidence-based implementation strategy, ongoing information sharing, and a robust data system to track information. The Ontario benchmarking program for RBC outdating resulted in continuous and sustained quality improvement. Our conceptual iterative framework for benchmarking provides a guide for institutions implementing a benchmarking program. © 2015 AABB.
A postprocessing method in the HMC framework for predicting gene function based on biological instrumental data

NASA Astrophysics Data System (ADS)

Feng, Shou; Fu, Ping; Zheng, Wenbin

2018-03-01

Predicting gene function based on biological instrumental data is a complicated and challenging hierarchical multi-label classification (HMC) problem. When using local approach methods to solve this problem, a preliminary results processing method is usually needed. This paper proposed a novel preliminary results processing method called the nodes interaction method. The nodes interaction method revises the preliminary results and guarantees that the predictions are consistent with the hierarchy constraint. This method exploits the label dependency and considers the hierarchical interaction between nodes when making decisions based on the Bayesian network in its first phase. In the second phase, this method further adjusts the results according to the hierarchy constraint. Implementing the nodes interaction method in the HMC framework also enhances the HMC performance for solving the gene function prediction problem based on the Gene Ontology (GO), the hierarchy of which is a directed acyclic graph that is more difficult to tackle. The experimental results validate the promising performance of the proposed method compared to state-of-the-art methods on eight benchmark yeast data sets annotated by the GO.
RESULTS OF QA/QC TESTING OF EPA BENCHMARK DOSE SOFTWARE VERSION 1.2

EPA Science Inventory

EPA is developing benchmark dose software (BMDS) to support cancer and non-cancer dose-response assessments. Following the recent public review of BMDS version 1.1b, EPA developed a Hill model for evaluating continuous data, and improved the user interface and Multistage, Polyno...
MHEC Survey Establishes Midwest Property Insurance Benchmarks.

ERIC Educational Resources Information Center

Midwestern Higher Education Commission Risk Management Institute Research Bulletin, 1994

1994-01-01

This publication presents the results of a survey of over 200 midwestern colleges and universities on their property insurance programs and establishes benchmarks to help these institutions evaluate their insurance programs. Findings included the following: (1) 51 percent of respondents currently purchase their property insurance as part of a…
Quality Assurance Testing of Version 1.3 of U.S. EPA Benchmark Dose Software (Presentation)

EPA Science Inventory

EPA benchmark dose software (BMDS) issued to evaluate chemical dose-response data in support of Agency risk assessments, and must therefore be dependable. Quality assurance testing methods developed for BMDS were designed to assess model dependability with respect to curve-fitt...
School-Based Cognitive-Behavioral Therapy for Adolescent Depression: A Benchmarking Study

ERIC Educational Resources Information Center

Shirk, Stephen R.; Kaplinski, Heather; Gudmundsen, Gretchen

2009-01-01

The current study evaluated cognitive-behavioral therapy (CBT) for adolescent depression delivered in health clinics and counseling centers in four high schools. Outcomes were benchmarked to results from prior efficacy trials. Fifty adolescents diagnosed with depressive disorders were treated by eight doctoral-level psychologists who followed a…
A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database.

PubMed

Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin

2015-12-01

Face recognition with still face images has been widely studied, while the research on video-based face recognition is inadequate relatively, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Videoto-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively, taking video or still image as query or target. To the best of our knowledge, few datasets and evaluation protocols have benchmarked for all the three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX(1) Face DB. Specifically, we make three contributions. First, we collect and release a largescale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, for benchmarking the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method, and experimentally show that it can be used as a promising baseline method for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more efforts, and our COX Face DB is a good benchmark database for evaluation.
Assessment of the monitoring and evaluation system for integrated community case management (ICCM) in Ethiopia: a comparison against global benchmark indicators.

PubMed

Mamo, Dereje; Hazel, Elizabeth; Lemma, Israel; Guenther, Tanya; Bekele, Abeba; Demeke, Berhanu

2014-10-01

Program managers require feasible, timely, reliable, and valid measures of iCCM implementation to identify problems and assess progress. The global iCCM Task Force developed benchmark indicators to guide implementers to develop or improve monitoring and evaluation (M&E) systems. To assesses Ethiopia's iCCM M&E system by determining the availability and feasibility of the iCCM benchmark indicators. We conducted a desk review of iCCM policy documents, monitoring tools, survey reports, and other rele- vant documents; and key informant interviews with government and implementing partners involved in iCCM scale-up and M&E. Currently, Ethiopia collects data to inform most (70% [33/47]) iCCM benchmark indicators, and modest extra effort could boost this to 83% (39/47). Eight (17%) are not available given the current system. Most benchmark indicators that track coordination and policy, human resources, service delivery and referral, supervision, and quality assurance are available through the routine monitoring systems or periodic surveys. Indicators for supply chain management are less available due to limited consumption data and a weak link with treatment data. Little information is available on iCCM costs. Benchmark indicators can detail the status of iCCM implementation; however, some indicators may not fit country priorities, and others may be difficult to collect. The government of Ethiopia and partners should review and prioritize the benchmark indicators to determine which should be included in the routine M&E system, especially since iCCMdata are being reviewed for addition to the HMIS. Moreover, the Health Extension Worker's reporting burden can be minimized by an integrated reporting approach.
A preliminary comparison of hydrodynamic approaches for flood inundation modeling of urban areas in Jakarta Ciliwung river basin

NASA Astrophysics Data System (ADS)

Rojali, Aditia; Budiaji, Abdul Somat; Pribadi, Yudhistira Satya; Fatria, Dita; Hadi, Tri Wahyu

2017-07-01

This paper addresses on the numerical modeling approaches for flood inundation in urban areas. Decisive strategy to choose between 1D, 2D or even a hybrid 1D-2D model is more than important to optimize flood inundation analyses. To find cost effective yet robust and accurate model has been our priority and motivation in the absence of available High Performance Computing facilities. The application of 1D, 1D/2D and full 2D modeling approach to river flood study in Jakarta Ciliwung river basin, and a comparison of approaches benchmarked for the inundation study are presented. This study demonstrate the successful use of 1D/2D and 2D system to model Jakarta Ciliwung river basin in terms of inundation results and computational aspect. The findings of the study provide an interesting comparison between modeling approaches, HEC-RAS 1D, 1D-2D, 2D, and ANUGA when benchmarked to the Manggarai water level measurement.
Benchmark simulation Model no 2 in Matlab-simulink: towards plant-wide WWTP control strategy evaluation.

PubMed

Vreck, D; Gernaey, K V; Rosen, C; Jeppsson, U

2006-01-01

In this paper, implementation of the Benchmark Simulation Model No 2 (BSM2) within Matlab-Simulink is presented. The BSM2 is developed for plant-wide WWTP control strategy evaluation on a long-term basis. It consists of a pre-treatment process, an activated sludge process and sludge treatment processes. Extended evaluation criteria are proposed for plant-wide control strategy assessment. Default open-loop and closed-loop strategies are also proposed to be used as references with which to compare other control strategies. Simulations indicate that the BM2 is an appropriate tool for plant-wide control strategy evaluation.
Benchmarking on Tsunami Currents with ComMIT

NASA Astrophysics Data System (ADS)

Sharghi vand, N.; Kanoglu, U.

2015-12-01

There were no standards for the validation and verification of tsunami numerical models before 2004 Indian Ocean tsunami. Even, number of numerical models has been used for inundation mapping effort, evaluation of critical structures, etc. without validation and verification. After 2004, NOAA Center for Tsunami Research (NCTR) established standards for the validation and verification of tsunami numerical models (Synolakis et al. 2008 Pure Appl. Geophys. 165, 2197-2228), which will be used evaluation of critical structures such as nuclear power plants against tsunami attack. NCTR presented analytical, experimental and field benchmark problems aimed to estimate maximum runup and accepted widely by the community. Recently, benchmark problems were suggested by the US National Tsunami Hazard Mitigation Program Mapping & Modeling Benchmarking Workshop: Tsunami Currents on February 9-10, 2015 at Portland, Oregon, USA (http://nws.weather.gov/nthmp/index.html). These benchmark problems concentrated toward validation and verification of tsunami numerical models on tsunami currents. Three of the benchmark problems were: current measurement of the Japan 2011 tsunami in Hilo Harbor, Hawaii, USA and in Tauranga Harbor, New Zealand, and single long-period wave propagating onto a small-scale experimental model of the town of Seaside, Oregon, USA. These benchmark problems were implemented in the Community Modeling Interface for Tsunamis (ComMIT) (Titov et al. 2011 Pure Appl. Geophys. 168, 2121-2131), which is a user-friendly interface to the validated and verified Method of Splitting Tsunami (MOST) (Titov and Synolakis 1995 J. Waterw. Port Coastal Ocean Eng. 121, 308-316) model and is developed by NCTR. The modeling results are compared with the required benchmark data, providing good agreements and results are discussed. Acknowledgment: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 603839 (Project ASTARTE - Assessment, Strategy and Risk Reduction for Tsunamis in Europe)
Phase field benchmark problems for dendritic growth and linear elasticity

DOE PAGES

Jokisaari, Andrea M.; Voorhees, P. W.; Guyer, Jonathan E.; ...

2018-03-26

We present the second set of benchmark problems for phase field models that are being jointly developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST) along with input from other members in the phase field community. As the integrated computational materials engineering (ICME) approach to materials design has gained traction, there is an increasing need for quantitative phase field results. New algorithms and numerical implementations increase computational capabilities, necessitating standard problems to evaluate their impact on simulated microstructure evolution as well as their computational performance. We propose one benchmark problem formore » solidifiication and dendritic growth in a single-component system, and one problem for linear elasticity via the shape evolution of an elastically constrained precipitate. We demonstrate the utility and sensitivity of the benchmark problems by comparing the results of 1) dendritic growth simulations performed with different time integrators and 2) elastically constrained precipitate simulations with different precipitate sizes, initial conditions, and elastic moduli. As a result, these numerical benchmark problems will provide a consistent basis for evaluating different algorithms, both existing and those to be developed in the future, for accuracy and computational efficiency when applied to simulate physics often incorporated in phase field models.« less
Phase field benchmark problems for dendritic growth and linear elasticity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jokisaari, Andrea M.; Voorhees, P. W.; Guyer, Jonathan E.

We present the second set of benchmark problems for phase field models that are being jointly developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST) along with input from other members in the phase field community. As the integrated computational materials engineering (ICME) approach to materials design has gained traction, there is an increasing need for quantitative phase field results. New algorithms and numerical implementations increase computational capabilities, necessitating standard problems to evaluate their impact on simulated microstructure evolution as well as their computational performance. We propose one benchmark problem formore » solidifiication and dendritic growth in a single-component system, and one problem for linear elasticity via the shape evolution of an elastically constrained precipitate. We demonstrate the utility and sensitivity of the benchmark problems by comparing the results of 1) dendritic growth simulations performed with different time integrators and 2) elastically constrained precipitate simulations with different precipitate sizes, initial conditions, and elastic moduli. As a result, these numerical benchmark problems will provide a consistent basis for evaluating different algorithms, both existing and those to be developed in the future, for accuracy and computational efficiency when applied to simulate physics often incorporated in phase field models.« less
Fast Neutron Spectrum Potassium Worth for Space Power Reactor Design Validation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.; Marshall, Margaret A.; Briggs, J. Blair

2015-03-01

A variety of critical experiments were constructed of enriched uranium metal (oralloy ) during the 1960s and 1970s at the Oak Ridge Critical Experiments Facility (ORCEF) in support of criticality safety operations at the Y-12 Plant. The purposes of these experiments included the evaluation of storage, casting, and handling limits for the Y-12 Plant and providing data for verification of calculation methods and cross-sections for nuclear criticality safety applications. These included solid cylinders of various diameters, annuli of various inner and outer diameters, two and three interacting cylinders of various diameters, and graphite and polyethylene reflected cylinders and annuli. Ofmore » the hundreds of delayed critical experiments, one was performed that consisted of uranium metal annuli surrounding a potassium-filled, stainless steel can. The outer diameter of the annuli was approximately 13 inches (33.02 cm) with an inner diameter of 7 inches (17.78 cm). The diameter of the stainless steel can was 7 inches (17.78 cm). The critical height of the configurations was approximately 5.6 inches (14.224 cm). The uranium annulus consisted of multiple stacked rings, each with radial thicknesses of 1 inch (2.54 cm) and varying heights. A companion measurement was performed using empty stainless steel cans; the primary purpose of these experiments was to test the fast neutron cross sections of potassium as it was a candidate for coolant in some early space power reactor designs.The experimental measurements were performed on July 11, 1963, by J. T. Mihalczo and M. S. Wyatt (Ref. 1) with additional information in its corresponding logbook. Unreflected and unmoderated experiments with the same set of highly enriched uranium metal parts were performed at the Oak Ridge Critical Experiments Facility in the 1960s and are evaluated in the International Handbook for Evaluated Criticality Safety Benchmark Experiments (ICSBEP Handbook) with the identifier HEU MET FAST 051. Thin graphite reflected (2 inches or less) experiments also using the same set of highly enriched uranium metal parts are evaluated in HEU MET FAST 071. Polyethylene-reflected configurations are evaluated in HEU-MET-FAST-076. A stack of highly enriched metal discs with a thick beryllium top reflector is evaluated in HEU-MET-FAST-069, and two additional highly enriched uranium annuli with beryllium cores are evaluated in HEU-MET-FAST-059. Both detailed and simplified model specifications are provided in this evaluation. Both of these fast neutron spectra assemblies were determined to be acceptable benchmark experiments. The calculated eigenvalues for both the detailed and the simple benchmark models are within ~0.26 % of the benchmark values for Configuration 1 (calculations performed using MCNP6 with ENDF/B-VII.1 neutron cross section data), but under-calculate the benchmark values by ~7s because the uncertainty in the benchmark is very small: ~0.0004 (1s); for Configuration 2, the under-calculation is ~0.31 % and ~8s. Comparison of detailed and simple model calculations for the potassium worth measurement and potassium mass coefficient yield results approximately 70 – 80 % lower (~6s to 10s) than the benchmark values for the various nuclear data libraries utilized. Both the potassium worth and mass coefficient are also deemed to be acceptable benchmark experiment measurements.« less
Analysis of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed

NASA Technical Reports Server (NTRS)

Fineberg, Samuel A.; Pedretti, Kevin T.; Kutler, Paul (Technical Monitor)

1997-01-01

We evaluate the performance of a Fast Ethernet network configured with a single large switch, a single hub, and a 4x4 2D torus topology in a testbed cluster of "commodity" Pentium Pro PCs. We also evaluated a mixed network composed of ethernet hubs and switches. An MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2) show that the torus network performs best for all sizes that we were able to test (up to 16 nodes). For larger networks the ethernet switch outperforms the hub, though its performance is far less than peak. The hub/switch combination tests indicate that the NAS parallel benchmarks are relatively insensitive to hub densities of less than 7 nodes per hub.
Interfacing VPSC with finite element codes. Demonstration of irradiation growth simulation in a cladding tube

DOE Office of Scientific and Technical Information (OSTI.GOV)

Patra, Anirban; Tome, Carlos

This Milestone report shows good progress in interfacing VPSC with the FE codes ABAQUS and MOOSE, to perform component-level simulations of irradiation-induced deformation in Zirconium alloys. In this preliminary application, we have performed an irradiation growth simulation in the quarter geometry of a cladding tube. We have benchmarked VPSC-ABAQUS and VPSC-MOOSE predictions with VPSC-SA predictions to verify the accuracy of the VPSCFE interface. Predictions from the FE simulations are in general agreement with VPSC-SA simulations and also with experimental trends.
Evaluating the Quantitative Capabilities of Metagenomic Analysis Software.

PubMed

Kerepesi, Csaba; Grolmusz, Vince

2016-05-01

DNA sequencing technologies are applied widely and frequently today to describe metagenomes, i.e., microbial communities in environmental or clinical samples, without the need for culturing them. These technologies usually return short (100-300 base-pairs long) DNA reads, and these reads are processed by metagenomic analysis software that assign phylogenetic composition-information to the dataset. Here we evaluate three metagenomic analysis software (AmphoraNet--a webserver implementation of AMPHORA2--, MG-RAST, and MEGAN5) for their capabilities of assigning quantitative phylogenetic information for the data, describing the frequency of appearance of the microorganisms of the same taxa in the sample. The difficulties of the task arise from the fact that longer genomes produce more reads from the same organism than shorter genomes, and some software assign higher frequencies to species with longer genomes than to those with shorter ones. This phenomenon is called the "genome length bias." Dozens of complex artificial metagenome benchmarks can be found in the literature. Because of the complexity of those benchmarks, it is usually difficult to judge the resistance of a metagenomic software to this "genome length bias." Therefore, we have made a simple benchmark for the evaluation of the "taxon-counting" in a metagenomic sample: we have taken the same number of copies of three full bacterial genomes of different lengths, break them up randomly to short reads of average length of 150 bp, and mixed the reads, creating our simple benchmark. Because of its simplicity, the benchmark is not supposed to serve as a mock metagenome, but if a software fails on that simple task, it will surely fail on most real metagenomes. We applied three software for the benchmark. The ideal quantitative solution would assign the same proportion to the three bacterial taxa. We have found that AMPHORA2/AmphoraNet gave the most accurate results and the other two software were under-performers: they counted quite reliably each short read to their respective taxon, producing the typical genome length bias. The benchmark dataset is available at http://pitgroup.org/static/3RandomGenome-100kavg150bps.fna.
Developing integrated benchmarks for DOE performance measurement

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barancik, J.I.; Kramer, C.F.; Thode, Jr. H.C.

1992-09-30

The objectives of this task were to describe and evaluate selected existing sources of information on occupational safety and health with emphasis on hazard and exposure assessment, abatement, training, reporting, and control identifying for exposure and outcome in preparation for developing DOE performance benchmarks. Existing resources and methodologies were assessed for their potential use as practical performance benchmarks. Strengths and limitations of current data resources were identified. Guidelines were outlined for developing new or improved performance factors, which then could become the basis for selecting performance benchmarks. Data bases for non-DOE comparison populations were identified so that DOE performance couldmore » be assessed relative to non-DOE occupational and industrial groups. Systems approaches were described which can be used to link hazards and exposure, event occurrence, and adverse outcome factors, as needed to generate valid, reliable, and predictive performance benchmarks. Data bases were identified which contain information relevant to one or more performance assessment categories . A list of 72 potential performance benchmarks was prepared to illustrate the kinds of information that can be produced through a benchmark development program. Current information resources which may be used to develop potential performance benchmarks are limited. There is need to develop an occupational safety and health information and data system in DOE, which is capable of incorporating demonstrated and documented performance benchmarks prior to, or concurrent with the development of hardware and software. A key to the success of this systems approach is rigorous development and demonstration of performance benchmark equivalents to users of such data before system hardware and software commitments are institutionalized.« less
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

NASA Technical Reports Server (NTRS)

Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

1996-01-01

This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.

Preliminary results of the Geoid Slope Validation Survey 2014 in Iowa

NASA Astrophysics Data System (ADS)

Wang, Y. M.; Becker, C.; Breidenbach, S.; Geoghegan, C.; Martin, D.; Winester, D.; Hanson, T.; Mader, G. L.; Eckl, M. C.

2014-12-01

The National Geodetic Survey conducted a second Geoid Slope Validation Survey in the summer of 2014 (GSVS14). The survey took place in Iowa along U.S Route 30. The survey line is approximately 200 miles long (325 km), extending from Denison, IA to Cedar Rapids, IA. There are over 200 official survey bench marks. A leveling survey was performed, conforming to 1st order, class II specifications. A GPS survey was performed using 24 to 48 hour occupations. Absolute gravity, relative gravity, and gravity gradient measurements were also collected during the survey. In addition, deflections of the vertical were acquired at 200 eccentric survey benchmarks using the Compact Digital Astrometric Camera (CODIAC) camera. This paper presents the preliminary results of the survey, including the accuracy analysis of the leveling data, GPS ellipsoidal heights, and the deflections of the vertical which serves as an independent data set in addition to the GPS/leveling implied geoid heights.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.; Sterbentz, James W.; Snoj, Luka

PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen criticalmore » configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.« less
Evaluation of U.S. Building Energy Benchmarking and Transparency Programs: Attributes, Impacts, and Best Practices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mims, Natalie; Schiller, Steven R.; Stuart, Elizabeth

In the last decade, a new policy area has emerged to boost energy efficiency in buildings that focuses on the simple action of measuring energy use as compared to buildings of similar type and size, and making that data publicly available. These efforts, referred to as benchmarking and transparency (B&T) policies, seek to unlock new energy efficiency opportunities in the country’s existing buildings by promoting data-driven decision-making and creating stronger market signals. This report focuses on the 24 state and local jurisdictions that (as of December 31, 2016) require owners of privately owned commercial buildings, multifamily buildings, or both tomore » comply with a B&T policy. The report provides a summary of U.S. B&T policy design and implementation characteristics, reports results and impacts for jurisdictions with B&T policies, and discusses opportunities for increasing the efficacy of B&T policies, as well as suggested areas for further research. Among the findings, all but one of the B&T policy evaluation studies reviewed indicate some reduction (from 1.6% to 14%) in energy use, energy costs, or energy intensity over the two- to four-year period of the analyses. More specifically, most of the studies reviewed indicate 3% to 8% reductions in gross energy consumption or energy use intensity over a two- to four-year period of B&T policy implementation. Two additional evaluation studies indicate that there is a causal relationship between B&T policies and energy savings or energy cost savings. These documented impacts should be reviewed with some caution. While consistently showing energy savings benefits associated with B&T policies, these savings estimates should be considered preliminary because of the limited period of analyses and inconsistencies in analysis methods for the various studies. A nationally standardized method for data collection, reporting, and evaluation of B&T policies—developed with an advisory group of state and local jurisdictions, energy efficiency and evaluation experts, building owner and real estate associations, and other stakeholders—could improve the consistency and quality of B&T impact studies, providing policymakers and others with a more complete understanding of the present and future impacts of these policies.« less
Short-Term Field Study Programs: A Holistic and Experiential Approach to Learning

ERIC Educational Resources Information Center

Long, Mary M.; Sandler, Dennis M.; Topol, Martin T.

2017-01-01

For business schools, AACSB and Middle States' call for more experiential learning is one reason to provide study abroad programs. Universities must attend to the demand for continuous improvement and employ metrics to benchmark and evaluate their relative standing among peer institutions. One such benchmark is the National Survey of Student…
ARL Physics Web Pages: An Evaluation by Established, Transitional and Emerging Benchmarks.

ERIC Educational Resources Information Center

Duffy, Jane C.

2002-01-01

Provides an overview of characteristics among Association of Research Libraries (ARL) physics Web pages. Examines current academic Web literature and from that develops six benchmarks to measure physics Web pages: ease of navigation; logic of presentation; representation of all forms of information; engagement of the discipline; interactivity of…
75 FR 51982 - Fisheries of the Gulf of Mexico; Southeast Data, Assessment, and Review (SEDAR) Update; Greater...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-08-24

... evaluates potential datasets and recommends which datasets are appropriate for assessment analyses. The... points to datasets incorporated in the original SEDAR benchmark assessment and run the benchmark... Webinar II November 22, 2010; 10 a.m. - 1 p.m.; SEDAR Update Assessment Webinar III Using updated datasets...
Authentic e-Learning in a Multicultural Context: Virtual Benchmarking Cases from Five Countries

ERIC Educational Resources Information Center

Leppisaari, Irja; Herrington, Jan; Vainio, Leena; Im, Yeonwook

2013-01-01

The implementation of authentic learning elements at education institutions in five countries, eight online courses in total, is examined in this paper. The International Virtual Benchmarking Project (2009-2010) applied the elements of authentic learning developed by Herrington and Oliver (2000) as criteria to evaluate authenticity. Twelve…
Benchmarking for maximum value.

PubMed

Baldwin, Ed

2009-03-01

Speaking at the most recent Healthcare Estates conference, Ed Baldwin, of international built asset consultancy EC Harris LLP, examined the role of benchmarking and market-testing--two of the key methods used to evaluate the quality and cost-effectiveness of hard and soft FM services provided under PFI healthcare schemes to ensure they are offering maximum value for money.
A review of the current state-of-the-art methodology for handling bias and uncertainty in performing criticality safety evaluations. Final report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Disney, R.K.

1994-10-01

The methodology for handling bias and uncertainty when calculational methods are used in criticality safety evaluations (CSE`s) is a rapidly evolving technology. The changes in the methodology are driven by a number of factors. One factor responsible for changes in the methodology for handling bias and uncertainty in CSE`s within the overview of the US Department of Energy (DOE) is a shift in the overview function from a ``site`` perception to a more uniform or ``national`` perception. Other causes for change or improvement in the methodology for handling calculational bias and uncertainty are; (1) an increased demand for benchmark criticalsmore » data to expand the area (range) of applicability of existing data, (2) a demand for new data to supplement existing benchmark criticals data, (3) the increased reliance on (or need for) computational benchmarks which supplement (or replace) experimental measurements in critical assemblies, and (4) an increased demand for benchmark data applicable to the expanded range of conditions and configurations encountered in DOE site restoration and remediation.« less
Benchmark problems for numerical implementations of phase field models

DOE PAGES

Jokisaari, A. M.; Voorhees, P. W.; Guyer, J. E.; ...

2016-10-01

Here, we present the first set of benchmark problems for phase field models that are being developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST). While many scientific research areas use a limited set of well-established software, the growing phase field community continues to develop a wide variety of codes and lacks benchmark problems to consistently evaluate the numerical performance of new implementations. Phase field modeling has become significantly more popular as computational power has increased and is now becoming mainstream, driving the need for benchmark problems to validate and verifymore » new implementations. We follow the example set by the micromagnetics community to develop an evolving set of benchmark problems that test the usability, computational resources, numerical capabilities and physical scope of phase field simulation codes. In this paper, we propose two benchmark problems that cover the physics of solute diffusion and growth and coarsening of a second phase via a simple spinodal decomposition model and a more complex Ostwald ripening model. We demonstrate the utility of benchmark problems by comparing the results of simulations performed with two different adaptive time stepping techniques, and we discuss the needs of future benchmark problems. The development of benchmark problems will enable the results of quantitative phase field models to be confidently incorporated into integrated computational materials science and engineering (ICME), an important goal of the Materials Genome Initiative.« less
IEA-Task 31 WAKEBENCH: Towards a protocol for wind farm flow model evaluation. Part 2: Wind farm wake models

NASA Astrophysics Data System (ADS)

Moriarty, Patrick; Sanz Rodrigo, Javier; Gancarski, Pawel; Chuchfield, Matthew; Naughton, Jonathan W.; Hansen, Kurt S.; Machefaux, Ewan; Maguire, Eoghan; Castellani, Francesco; Terzi, Ludovico; Breton, Simon-Philippe; Ueda, Yuko

2014-06-01

Researchers within the International Energy Agency (IEA) Task 31: Wakebench have created a framework for the evaluation of wind farm flow models operating at the microscale level. The framework consists of a model evaluation protocol integrated with a web-based portal for model benchmarking (www.windbench.net). This paper provides an overview of the building-block validation approach applied to wind farm wake models, including best practices for the benchmarking and data processing procedures for validation datasets from wind farm SCADA and meteorological databases. A hierarchy of test cases has been proposed for wake model evaluation, from similarity theory of the axisymmetric wake and idealized infinite wind farm, to single-wake wind tunnel (UMN-EPFL) and field experiments (Sexbierum), to wind farm arrays in offshore (Horns Rev, Lillgrund) and complex terrain conditions (San Gregorio). A summary of results from the axisymmetric wake, Sexbierum, Horns Rev and Lillgrund benchmarks are used to discuss the state-of-the-art of wake model validation and highlight the most relevant issues for future development.
Benchmarking the evaluated proton differential cross sections suitable for the EBS analysis of natSi and 16O

NASA Astrophysics Data System (ADS)

Kokkoris, M.; Dede, S.; Kantre, K.; Lagoyannis, A.; Ntemou, E.; Paneta, V.; Preketes-Sigalas, K.; Provatas, G.; Vlastou, R.; Bogdanović-Radović, I.; Siketić, Z.; Obajdin, N.

2017-08-01

The evaluated proton differential cross sections suitable for the Elastic Backscattering Spectroscopy (EBS) analysis of natSi and 16O, as obtained from SigmaCalc 2.0, have been benchmarked over a wide energy and angular range at two different accelerator laboratories, namely at N.C.S.R. 'Demokritos', Athens, Greece and at Ruđer Bošković Institute (RBI), Zagreb, Croatia, using a variety of high-purity thick targets of known stoichiometry. The results are presented in graphical and tabular forms, while the observed discrepancies, as well as, the limits in accuracy of the benchmarking procedure, along with target related effects, are thoroughly discussed and analysed. In the case of oxygen the agreement between simulated and experimental spectra was generally good, while for silicon serious discrepancies were observed above Ep,lab = 2.5 MeV, suggesting that a further tuning of the appropriate nuclear model parameters in the evaluated differential cross-section datasets is required.
Optimization of Deep Drilling Performance--Development and Benchmark Testing of Advanced Diamond Product Drill Bits & HP/HT Fluids to Significantly Improve Rates of Penetration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Alan Black; Arnis Judzis

2003-10-01

This document details the progress to date on the OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS AND HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION contract for the year starting October 2002 through September 2002. The industry cost shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through a team development of aggressive diamond product drill bit--fluid system technologies. Overall the objectives are as follows: Phase 1--Benchmark ''best in class'' diamond and other product drilling bits and fluids and develop concepts for amore » next level of deep drilling performance; Phase 2--Develop advanced smart bit--fluid prototypes and test at large scale; and Phase 3--Field trial smart bit--fluid concepts, modify as necessary and commercialize products. Accomplishments to date include the following: 4Q 2002--Project started; Industry Team was assembled; Kick-off meeting was held at DOE Morgantown; 1Q 2003--Engineering meeting was held at Hughes Christensen, The Woodlands Texas to prepare preliminary plans for development and testing and review equipment needs; Operators started sending information regarding their needs for deep drilling challenges and priorities for large-scale testing experimental matrix; Aramco joined the Industry Team as DEA 148 objectives paralleled the DOE project; 2Q 2003--Engineering and planning for high pressure drilling at TerraTek commenced; 3Q 2003--Continuation of engineering and design work for high pressure drilling at TerraTek; Baker Hughes INTEQ drilling Fluids and Hughes Christensen commence planning for Phase 1 testing--recommendations for bits and fluids.« less
The Paucity Problem: Where Have All the Space Reactor Experiments Gone?

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.; Marshall, Margaret A.

2016-10-01

The Handbooks of the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPhEP) together contain a plethora of documented and evaluated experiments essential in the validation of nuclear data, neutronics codes, and modeling of various nuclear systems. Unfortunately, only a minute selection of handbook data (twelve evaluations) are of actual experimental facilities and mockups designed specifically for space nuclear research. There is a paucity problem, such that the multitude of space nuclear experimental activities performed in the past several decades have yet to be recovered and made available in such detail that themore » international community could benefit from these valuable historical research efforts. Those experiments represent extensive investments in infrastructure, expertise, and cost, as well as constitute significantly valuable resources of data supporting past, present, and future research activities. The ICSBEP and IRPhEP were established to identify and verify comprehensive sets of benchmark data; evaluate the data, including quantification of biases and uncertainties; compile the data and calculations in a standardized format; and formally document the effort into a single source of verified benchmark data. See full abstract in attached document.« less
An Approach to Industrial Stormwater Benchmarks: Establishing and Using Site-Specific Threshold Criteria at Lawrence Livermore National Laboratory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Campbell, C G; Mathews, S

2006-09-07

Current regulatory schemes use generic or industrial sector specific benchmarks to evaluate the quality of industrial stormwater discharges. While benchmarks can be a useful tool for facility stormwater managers in evaluating the quality stormwater runoff, benchmarks typically do not take into account site-specific conditions, such as: soil chemistry, atmospheric deposition, seasonal changes in water source, and upstream land use. Failing to account for these factors may lead to unnecessary costs to trace a source of natural variation, or potentially missing a significant local water quality problem. Site-specific water quality thresholds, established upon the statistical evaluation of historic data take intomore » account these factors, are a better tool for the direct evaluation of runoff quality, and a more cost-effective trigger to investigate anomalous results. Lawrence Livermore National Laboratory (LLNL), a federal facility, established stormwater monitoring programs to comply with the requirements of the industrial stormwater permit and Department of Energy orders, which require the evaluation of the impact of effluent discharges on the environment. LLNL recognized the need to create a tool to evaluate and manage stormwater quality that would allow analysts to identify trends in stormwater quality and recognize anomalous results so that trace-back and corrective actions could be initiated. LLNL created the site-specific water quality threshold tool to better understand the nature of the stormwater influent and effluent, to establish a technical basis for determining when facility operations might be impacting the quality of stormwater discharges, and to provide ''action levels'' to initiate follow-up to analytical results. The threshold criteria were based on a statistical analysis of the historic stormwater monitoring data and a review of relevant water quality objectives.« less
Evaluating the Effect of Labeled Benchmarks on Children’s Number Line Estimation Performance and Strategy Use

PubMed Central

Peeters, Dominique; Sekeris, Elke; Verschaffel, Lieven; Luwel, Koen

2017-01-01

Some authors argue that age-related improvements in number line estimation (NLE) performance result from changes in strategy use. More specifically, children’s strategy use develops from only using the origin of the number line, to using the origin and the endpoint, to eventually also relying on the midpoint of the number line. Recently, Peeters et al. (unpublished) investigated whether the provision of additional unlabeled benchmarks at 25, 50, and 75% of the number line, positively affects third and fifth graders’ NLE performance and benchmark-based strategy use. It was found that only the older children benefitted from the presence of these benchmarks at the quartiles of the number line (i.e., 25 and 75%), as they made more use of these benchmarks, leading to more accurate estimates. A possible explanation for this lack of improvement in third graders might be their inability to correctly link the presented benchmarks with their corresponding numerical values. In the present study, we investigated whether labeling these benchmarks with their corresponding numerical values, would have a positive effect on younger children’s NLE performance and quartile-based strategy use as well. Third and sixth graders were assigned to one of three conditions: (a) a control condition with an empty number line bounded by 0 at the origin and 1,000 at the endpoint, (b) an unlabeled condition with three additional external benchmarks without numerical labels at 25, 50, and 75% of the number line, and (c) a labeled condition in which these benchmarks were labeled with 250, 500, and 750, respectively. Results indicated that labeling the benchmarks has a positive effect on third graders’ NLE performance and quartile-based strategy use, whereas sixth graders already benefited from the mere provision of unlabeled benchmarks. These findings imply that children’s benchmark-based strategy use can be stimulated by adding additional externally provided benchmarks on the number line, but that, depending on children’s age and familiarity with the number range, these additional external benchmarks might need to be labeled. PMID:28713302
Evaluating the Effect of Labeled Benchmarks on Children's Number Line Estimation Performance and Strategy Use.

PubMed

Peeters, Dominique; Sekeris, Elke; Verschaffel, Lieven; Luwel, Koen

2017-01-01

Some authors argue that age-related improvements in number line estimation (NLE) performance result from changes in strategy use. More specifically, children's strategy use develops from only using the origin of the number line, to using the origin and the endpoint, to eventually also relying on the midpoint of the number line. Recently, Peeters et al. (unpublished) investigated whether the provision of additional unlabeled benchmarks at 25, 50, and 75% of the number line, positively affects third and fifth graders' NLE performance and benchmark-based strategy use. It was found that only the older children benefitted from the presence of these benchmarks at the quartiles of the number line (i.e., 25 and 75%), as they made more use of these benchmarks, leading to more accurate estimates. A possible explanation for this lack of improvement in third graders might be their inability to correctly link the presented benchmarks with their corresponding numerical values. In the present study, we investigated whether labeling these benchmarks with their corresponding numerical values, would have a positive effect on younger children's NLE performance and quartile-based strategy use as well. Third and sixth graders were assigned to one of three conditions: (a) a control condition with an empty number line bounded by 0 at the origin and 1,000 at the endpoint, (b) an unlabeled condition with three additional external benchmarks without numerical labels at 25, 50, and 75% of the number line, and (c) a labeled condition in which these benchmarks were labeled with 250, 500, and 750, respectively. Results indicated that labeling the benchmarks has a positive effect on third graders' NLE performance and quartile-based strategy use, whereas sixth graders already benefited from the mere provision of unlabeled benchmarks. These findings imply that children's benchmark-based strategy use can be stimulated by adding additional externally provided benchmarks on the number line, but that, depending on children's age and familiarity with the number range, these additional external benchmarks might need to be labeled.
TRECVID: the utility of a content-based video retrieval evaluation

NASA Astrophysics Data System (ADS)

Hauptmann, Alexander G.

2006-01-01

TRECVID, an annual retrieval evaluation benchmark organized by NIST, encourages research in information retrieval from digital video. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies including shot boundary detection, extraction of semantic features, and the automatic segmentation of TV news broadcasts. Evaluations done in the context of the TRECVID benchmarks show that generally, speech transcripts and annotations provide the single most important clue for successful retrieval. However, automatically finding the individual images is still a tremendous and unsolved challenge. The evaluations repeatedly found that none of the multimedia analysis and retrieval techniques provide a significant benefit over retrieval using only textual information such as from automatic speech recognition transcripts or closed captions. In interactive systems, we do find significant differences among the top systems, indicating that interfaces can make a huge difference for effective video/image search. For interactive tasks efficient interfaces require few key clicks, but display large numbers of images for visual inspection by the user. The text search finds the right context region in the video in general, but to select specific relevant images we need good interfaces to easily browse the storyboard pictures. In general, TRECVID has motivated the video retrieval community to be honest about what we don't know how to do well (sometimes through painful failures), and has focused us to work on the actual task of video retrieval, as opposed to flashy demos based on technological capabilities.
A method to improve the nutritional quality of foods and beverages based on dietary recommendations.

PubMed

Nijman, C A J; Zijp, I M; Sierksma, A; Roodenburg, A J C; Leenen, R; van den Kerkhoff, C; Weststrate, J A; Meijer, G W

2007-04-01

The increasing consumer interest in health prompted Unilever to develop a globally applicable method (Nutrition Score) to evaluate and improve the nutritional composition of its foods and beverages portfolio. Based on (inter)national dietary recommendations, generic benchmarks were developed to evaluate foods and beverages on their content of trans fatty acids, saturated fatty acids, sodium and sugars. High intakes of these key nutrients are associated with undesirable health effects. In principle, the developed generic benchmarks can be applied globally for any food and beverage product. Product category-specific benchmarks were developed when it was not feasible to meet generic benchmarks because of technological and/or taste factors. The whole Unilever global foods and beverages portfolio has been evaluated and actions have been taken to improve the nutritional quality. The advantages of this method over other initiatives to assess the nutritional quality of foods are that it is based on the latest nutritional scientific insights and its global applicability. The Nutrition Score is the first simple, transparent and straightforward method that can be applied globally and across all food and beverage categories to evaluate the nutritional composition. It can help food manufacturers to improve the nutritional value of their products. In addition, the Nutrition Score can be a starting point for a powerful health indicator front-of-pack. This can have a significant positive impact on public health, especially when implemented by all food manufacturers.
Using Benchmarking Techniques and the 2011 Maternity Practices Infant Nutrition and Care (mPINC) Survey to Improve Performance among Peer Groups across the United States

PubMed Central

Edwards, Roger A.; Dee, Deborah; Umer, Amna; Perrine, Cria G.; Shealy, Katherine R.; Grummer-Strawn, Laurence M.

2015-01-01

Background A substantial proportion of US maternity care facilities engage in practices that are not evidence-based and that interfere with breastfeeding. The CDC Survey of Maternity Practices in Infant Nutrition and Care (mPINC) showed significant variation in maternity practices among US states. Objective The purpose of this article is to use benchmarking techniques to identify states within relevant peer groups that were top performers on mPINC survey indicators related to breastfeeding support. Methods We used 11 indicators of breastfeeding-related maternity care from the 2011 mPINC survey and benchmarking techniques to organize and compare hospital-based maternity practices across the 50 states and Washington, DC. We created peer categories for benchmarking first by region (grouping states by West, Midwest, South, and Northeast) and then by size (grouping states by the number of maternity facilities and dividing each region into approximately equal halves based on the number of facilities). Results Thirty-four states had scores high enough to serve as benchmarks, and 32 states had scores low enough to reflect the lowest score gap from the benchmark on at least 1 indicator. No state served as the benchmark on more than 5 indicators and no state was furthest from the benchmark on more than 7 indicators. The small peer group benchmarks in the South, West, and Midwest were better than the large peer group benchmarks on 91%, 82%, and 36% of the indicators, respectively. In the West large, the Midwest large, the Midwest small, and the South large peer groups, 4–6 benchmarks showed that less than 50% of hospitals have ideal practice in all states. Conclusion The evaluation presents benchmarks for peer group state comparisons that provide potential and feasible targets for improvement. PMID:24394963

Benchmarks of fairness for health care reform: a policy tool for developing countries.

PubMed Central

Daniels, N.; Bryant, J.; Castano, R. A.; Dantes, O. G.; Khan, K. S.; Pannarunothai, S.

2000-01-01

Teams of collaborators from Colombia, Mexico, Pakistan, and Thailand have adapted a policy tool originally developed for evaluating health insurance reforms in the United States into "benchmarks of fairness" for assessing health system reform in developing countries. We describe briefly the history of the benchmark approach, the tool itself, and the uses to which it may be put. Fairness is a wide term that includes exposure to risk factors, access to all forms of care, and to financing. It also includes efficiency of management and resource allocation, accountability, and patient and provider autonomy. The benchmarks standardize the criteria for fairness. Reforms are then evaluated by scoring according to the degree to which they improve the situation, i.e. on a scale of -5 to 5, with zero representing the status quo. The object is to promote discussion about fairness across the disciplinary divisions that keep policy analysts and the public from understanding how trade-offs between different effects of reforms can affect the overall fairness of the reform. The benchmarks can be used at both national and provincial or district levels, and we describe plans for such uses in the collaborating sites. A striking feature of the adaptation process is that there was wide agreement on this ethical framework among the collaborating sites despite their large historical, political and cultural differences. PMID:10916911
GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

PubMed

Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

2011-08-15

Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods available to the community as an open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.
Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mishra, Alok; Li, Lingda; Kong, Martin

Here, the latest OpenMP standard offers automatic device offloading capabilities which facilitate GPU programming. Despite this, there remain many challenges. One of these is the unified memory feature introduced in recent GPUs. GPUs in current and future HPC systems have enhanced support for unified memory space. In such systems, CPU and GPU can access each other's memory transparently, that is, the data movement is managed automatically by the underlying system software and hardware. Memory over subscription is also possible in these systems. However, there is a significant lack of knowledge about how this mechanism will perform, and how programmers shouldmore » use it. We have modified several benchmarks codes, in the Rodinia benchmark suite, to study the behavior of OpenMP accelerator extensions and have used them to explore the impact of unified memory in an OpenMP context. We moreover modified the open source LLVM compiler to allow OpenMP programs to exploit unified memory. The results of our evaluation reveal that, while the performance of unified memory is comparable with that of normal GPU offloading for benchmarks with little data reuse, it suffers from significant overhead when GPU memory is over subcribed for benchmarks with large amount of data reuse. Based on these results, we provide several guidelines for programmers to achieve better performance with unified memory.« less
Resonance Parameter Adjustment Based on Integral Experiments

DOE PAGES

Sobes, Vladimir; Leal, Luiz; Arbanas, Goran; ...

2016-06-02

Our project seeks to allow coupling of differential and integral data evaluation in a continuous-energy framework and to use the generalized linear least-squares (GLLS) methodology in the TSURFER module of the SCALE code package to update the parameters of a resolved resonance region evaluation. We recognize that the GLLS methodology in TSURFER is identical to the mathematical description of a Bayesian update in SAMMY, the SAMINT code was created to use the mathematical machinery of SAMMY to update resolved resonance parameters based on integral data. Traditionally, SAMMY used differential experimental data to adjust nuclear data parameters. Integral experimental data, suchmore » as in the International Criticality Safety Benchmark Experiments Project, remain a tool for validation of completed nuclear data evaluations. SAMINT extracts information from integral benchmarks to aid the nuclear data evaluation process. Later, integral data can be used to resolve any remaining ambiguity between differential data sets, highlight troublesome energy regions, determine key nuclear data parameters for integral benchmark calculations, and improve the nuclear data covariance matrix evaluation. Moreover, SAMINT is not intended to bias nuclear data toward specific integral experiments but should be used to supplement the evaluation of differential experimental data. Using GLLS ensures proper weight is given to the differential data.« less
Benchmarking infrastructure for mutation text mining

PubMed Central

2014-01-01

Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Benchmarking infrastructure for mutation text mining.

PubMed

Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

2014-02-25

Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
State Education Agency Communications Process: Benchmark and Best Practices Project. Benchmark and Best Practices Project. Issue No. 01

ERIC Educational Resources Information Center

Zavadsky, Heather

2014-01-01

The role of state education agencies (SEAs) has shifted significantly from low-profile, compliance activities like managing federal grants to engaging in more complex and politically charged tasks like setting curriculum standards, developing accountability systems, and creating new teacher evaluation systems. The move from compliance-monitoring…
75 FR 24534 - Treatment of Cigarettes and Smokeless Tobacco as Nonmailable Matter

Federal Register 2010, 2011, 2012, 2013, 2014

2010-05-05

... photocopy all written comments at USPS Headquarters Library, 475 L'Enfant Plaza, SW., 11th Floor North... benchmarking purposes of cigarette brands or sub-brands among existing adult smokers.'' 18 U.S.C. 1716E(b)(5)(D... of evaluating the product for quality assurance and benchmarking purposes of cigarette brands or sub...
75 FR 51058 - The Effects of Mountaintop Mines and Valley Fills on Aquatic Ecosystems of the Central...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-08-18

... the public additional time to evaluate the data used to derive a benchmark for conductivity. The... FR 18499). By following the link below, reviewers may download the initial data and EPA's derivative data sets that were used to calculate the conductivity benchmark. These reports were developed by the...
SAT® Subject Area Readiness Indicators: Reading, Writing, and STEM

ERIC Educational Resources Information Center

Wyatt, Jeffrey N.; Remigio, Mylene; Camara, Wayne J.

2012-01-01

In 2011, the College Board developed the SAT College and Career Readiness Benchmark to assist educators and policymakers in their efforts to better evaluate the college readiness of their students. This benchmark was designed to identify the point on the SAT score scale that is indicative of students' having a high likelihood of success in…
Turnkey CAD/CAM selection and evaluation

NASA Technical Reports Server (NTRS)

Moody, T.

1980-01-01

The methodology to be followed in evaluating and selecting a computer system for manufacturing applications is discussed. Main frames and minicomputers are considered. Benchmark evaluations, demonstrations, and contract negotiations are discussed.
Overview of the 2014 Edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook)

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess; J. Blair Briggs; Jim Gulliford

2014-10-01

The International Reactor Physics Experiment Evaluation Project (IRPhEP) is a widely recognized world class program. The work of the IRPhEP is documented in the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook). Integral data from the IRPhEP Handbook is used by reactor safety and design, nuclear data, criticality safety, and analytical methods development specialists, worldwide, to perform necessary validations of their calculational techniques. The IRPhEP Handbook is among the most frequently quoted reference in the nuclear industry and is expected to be a valuable resource for future decades.
Evaluating Cloud Computing in the Proposed NASA DESDynI Ground Data System

NASA Technical Reports Server (NTRS)

Tran, John J.; Cinquini, Luca; Mattmann, Chris A.; Zimdars, Paul A.; Cuddy, David T.; Leung, Kon S.; Kwoun, Oh-Ig; Crichton, Dan; Freeborn, Dana

2011-01-01

The proposed NASA Deformation, Ecosystem Structure and Dynamics of Ice (DESDynI) mission would be a first-of-breed endeavor that would fundamentally change the paradigm by which Earth Science data systems at NASA are built. DESDynI is evaluating a distributed architecture where expert science nodes around the country all engage in some form of mission processing and data archiving. This is compared to the traditional NASA Earth Science missions where the science processing is typically centralized. What's more, DESDynI is poised to profoundly increase the amount of data collection and processing well into the 5 terabyte/day and tens of thousands of job range, both of which comprise a tremendous challenge to DESDynI's proposed distributed data system architecture. In this paper, we report on a set of architectural trade studies and benchmarks meant to inform the DESDynI mission and the broader community of the impacts of these unprecedented requirements. In particular, we evaluate the benefits of cloud computing and its integration with our existing NASA ground data system software called Apache Object Oriented Data Technology (OODT). The preliminary conclusions of our study suggest that the use of the cloud and OODT together synergistically form an effective, efficient and extensible combination that could meet the challenges of NASA science missions requiring DESDynI-like data collection and processing volumes at reduced costs.
Impact of quality circles for improvement of asthma care: results of a randomized controlled trial

PubMed Central

Schneider, Antonius; Wensing, Michel; Biessecker, Kathrin; Quinzler, Renate; Kaufmann-Kolle, Petra; Szecsenyi, Joachim

2008-01-01

Rationale and aims Quality circles (QCs) are well established as a means of aiding doctors. New quality improvement strategies include benchmarking activities. The aim of this paper was to evaluate the efficacy of QCs for asthma care working either with general feedback or with an open benchmark. Methods Twelve QCs, involving 96 general practitioners, were organized in a randomized controlled trial. Six worked with traditional anonymous feedback and six with an open benchmark; both had guided discussion from a trained moderator. Forty-three primary care practices agreed to give out questionnaires to patients to evaluate the efficacy of QCs. Results A total of 256 patients participated in the survey, of whom 185 (72.3%) responded to the follow-up 1 year later. Use of inhaled steroids at baseline was high (69%) and self-management low (asthma education 27%, individual emergency plan 8%, and peak flow meter at home 21%). Guideline adherence in drug treatment increased (P = 0.19), and asthma steps improved (P = 0.02). Delivery of individual emergency plans increased (P = 0.008), and unscheduled emergency visits decreased (P = 0.064). There was no change in asthma education and peak flow meter usage. High medication guideline adherence was associated with reduced emergency visits (OR 0.24; 95% CI 0.07–0.89). Use of theophylline was associated with hospitalization (OR 7.1; 95% CI 1.5–34.3) and emergency visits (OR 4.9; 95% CI 1.6–14.7). There was no difference between traditional and benchmarking QCs. Conclusions Quality circles working with individualized feedback are effective at improving asthma care. The trial may have been underpowered to detect specific benchmarking effects. Further research is necessary to evaluate strategies for improving the self-management of asthma patients. PMID:18093108
HTR-PROTEUS pebble bed experimental program cores 9 & 10: columnar hexagonal point-on-point packing with a 1:1 moderator-to-fuel pebble ratio

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bess, John D.

2014-03-01

PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen criticalmore » configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.« less
HTR-PROTEUS PEBBLE BED EXPERIMENTAL PROGRAM CORES 5, 6, 7, & 8: COLUMNAR HEXAGONAL POINT-ON-POINT PACKING WITH A 1:2 MODERATOR-TO-FUEL PEBBLE RATIO

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess

2013-03-01

PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen criticalmore » configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.« less
HTR-PROTEUS PEBBLE BED EXPERIMENTAL PROGRAM CORES 9 & 10: COLUMNAR HEXAGONAL POINT-ON-POINT PACKING WITH A 1:1 MODERATOR-TO-FUEL PEBBLE RATIO

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess

2013-03-01

PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen criticalmore » configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.« less
An automated benchmarking platform for MHC class II binding prediction methods.

PubMed

Andreatta, Massimo; Trolle, Thomas; Yan, Zhen; Greenbaum, Jason A; Peters, Bjoern; Nielsen, Morten

2018-05-01

Computational methods for the prediction of peptide-MHC binding have become an integral and essential component for candidate selection in experimental T cell epitope discovery studies. The sheer amount of published prediction methods-and often discordant reports on their performance-poses a considerable quandary to the experimentalist who needs to choose the best tool for their research. With the goal to provide an unbiased, transparent evaluation of the state-of-the-art in the field, we created an automated platform to benchmark peptide-MHC class II binding prediction tools. The platform evaluates the absolute and relative predictive performance of all participating tools on data newly entered into the Immune Epitope Database (IEDB) before they are made public, thereby providing a frequent, unbiased assessment of available prediction tools. The benchmark runs on a weekly basis, is fully automated, and displays up-to-date results on a publicly accessible website. The initial benchmark described here included six commonly used prediction servers, but other tools are encouraged to join with a simple sign-up procedure. Performance evaluation on 59 data sets composed of over 10 000 binding affinity measurements suggested that NetMHCIIpan is currently the most accurate tool, followed by NN-align and the IEDB consensus method. Weekly reports on the participating methods can be found online at: http://tools.iedb.org/auto_bench/mhcii/weekly/. mniel@bioinformatics.dtu.dk. Supplementary data are available at Bioinformatics online.
Development and testing of the VITAMIN-B7/BUGLE-B7 coupled neutron-gamma multigroup cross-section libraries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Risner, J.M.; Wiarda, D.; Miller, T.M.

2011-07-01

The U.S. Nuclear Regulatory Commission's Regulatory Guide 1.190 states that calculational methods used to estimate reactor pressure vessel (RPV) fluence should use the latest version of the evaluated nuclear data file (ENDF). The VITAMIN-B6 fine-group library and BUGLE-96 broad-group library, which are widely used for RPV fluence calculations, were generated using ENDF/B-VI.3 data, which was the most current data when Regulatory Guide 1.190 was issued. We have developed new fine-group (VITAMIN-B7) and broad-group (BUGLE-B7) libraries based on ENDF/B-VII.0. These new libraries, which were processed using the AMPX code system, maintain the same group structures as the VITAMIN-B6 and BUGLE-96 libraries.more » Verification and validation of the new libraries were accomplished using diagnostic checks in AMPX, 'unit tests' for each element in VITAMIN-B7, and a diverse set of benchmark experiments including critical evaluations for fast and thermal systems, a set of experimental benchmarks that are used for SCALE regression tests, and three RPV fluence benchmarks. The benchmark evaluation results demonstrate that VITAMIN-B7 and BUGLE-B7 are appropriate for use in RPV fluence calculations and meet the calculational uncertainty criterion in Regulatory Guide 1.190. (authors)« less
Development and Testing of the VITAMIN-B7/BUGLE-B7 Coupled Neutron-Gamma Multigroup Cross-Section Libraries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Risner, Joel M; Wiarda, Dorothea; Miller, Thomas Martin

2011-01-01

The U.S. Nuclear Regulatory Commission s Regulatory Guide 1.190 states that calculational methods used to estimate reactor pressure vessel (RPV) fluence should use the latest version of the Evaluated Nuclear Data File (ENDF). The VITAMIN-B6 fine-group library and BUGLE-96 broad-group library, which are widely used for RPV fluence calculations, were generated using ENDF/B-VI data, which was the most current data when Regulatory Guide 1.190 was issued. We have developed new fine-group (VITAMIN-B7) and broad-group (BUGLE-B7) libraries based on ENDF/B-VII. These new libraries, which were processed using the AMPX code system, maintain the same group structures as the VITAMIN-B6 and BUGLE-96more » libraries. Verification and validation of the new libraries was accomplished using diagnostic checks in AMPX, unit tests for each element in VITAMIN-B7, and a diverse set of benchmark experiments including critical evaluations for fast and thermal systems, a set of experimental benchmarks that are used for SCALE regression tests, and three RPV fluence benchmarks. The benchmark evaluation results demonstrate that VITAMIN-B7 and BUGLE-B7 are appropriate for use in LWR shielding applications, and meet the calculational uncertainty criterion in Regulatory Guide 1.190.« less

Regression Tree-Based Methodology for Customizing Building Energy Benchmarks to Individual Commercial Buildings

NASA Astrophysics Data System (ADS)

Kaskhedikar, Apoorva Prakash

According to the U.S. Energy Information Administration, commercial buildings represent about 40% of the United State's energy consumption of which office buildings consume a major portion. Gauging the extent to which an individual building consumes energy in excess of its peers is the first step in initiating energy efficiency improvement. Energy Benchmarking offers initial building energy performance assessment without rigorous evaluation. Energy benchmarking tools based on the Commercial Buildings Energy Consumption Survey (CBECS) database are investigated in this thesis. This study proposes a new benchmarking methodology based on decision trees, where a relationship between the energy use intensities (EUI) and building parameters (continuous and categorical) is developed for different building types. This methodology was applied to medium office and school building types contained in the CBECS database. The Random Forest technique was used to find the most influential parameters that impact building energy use intensities. Subsequently, correlations which were significant were identified between EUIs and CBECS variables. Other than floor area, some of the important variables were number of workers, location, number of PCs and main cooling equipment. The coefficient of variation was used to evaluate the effectiveness of the new model. The customization technique proposed in this thesis was compared with another benchmarking model that is widely used by building owners and designers namely, the ENERGY STAR's Portfolio Manager. This tool relies on the standard Linear Regression methods which is only able to handle continuous variables. The model proposed uses data mining technique and was found to perform slightly better than the Portfolio Manager. The broader impacts of the new benchmarking methodology proposed is that it allows for identifying important categorical variables, and then incorporating them in a local, as against a global, model framework for EUI pertinent to the building type. The ability to identify and rank the important variables is of great importance in practical implementation of the benchmarking tools which rely on query-based building and HVAC variable filters specified by the user.
Radiation Detection Computational Benchmark Scenarios

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shaver, Mark W.; Casella, Andrew M.; Wittman, Richard S.

2013-09-24

Modeling forms an important component of radiation detection development, allowing for testing of new detector designs, evaluation of existing equipment against a wide variety of potential threat sources, and assessing operation performance of radiation detection systems. This can, however, result in large and complex scenarios which are time consuming to model. A variety of approaches to radiation transport modeling exist with complementary strengths and weaknesses for different problems. This variety of approaches, and the development of promising new tools (such as ORNL’s ADVANTG) which combine benefits of multiple approaches, illustrates the need for a means of evaluating or comparing differentmore » techniques for radiation detection problems. This report presents a set of 9 benchmark problems for comparing different types of radiation transport calculations, identifying appropriate tools for classes of problems, and testing and guiding the development of new methods. The benchmarks were drawn primarily from existing or previous calculations with a preference for scenarios which include experimental data, or otherwise have results with a high level of confidence, are non-sensitive, and represent problem sets of interest to NA-22. From a technical perspective, the benchmarks were chosen to span a range of difficulty and to include gamma transport, neutron transport, or both and represent different important physical processes and a range of sensitivity to angular or energy fidelity. Following benchmark identification, existing information about geometry, measurements, and previous calculations were assembled. Monte Carlo results (MCNP decks) were reviewed or created and re-run in order to attain accurate computational times and to verify agreement with experimental data, when present. Benchmark information was then conveyed to ORNL in order to guide testing and development of hybrid calculations. The results of those ADVANTG calculations were then sent to PNNL for compilation. This is a report describing the details of the selected Benchmarks and results from various transport codes.« less
Optimal type 2 diabetes mellitus management: the randomised controlled OPTIMISE benchmarking study: baseline results from six European countries.

PubMed

Hermans, Michel P; Brotons, Carlos; Elisaf, Moses; Michel, Georges; Muls, Erik; Nobels, Frank

2013-12-01

Micro- and macrovascular complications of type 2 diabetes have an adverse impact on survival, quality of life and healthcare costs. The OPTIMISE (OPtimal Type 2 dIabetes Management Including benchmarking and Standard trEatment) trial comparing physicians' individual performances with a peer group evaluates the hypothesis that benchmarking, using assessments of change in three critical quality indicators of vascular risk: glycated haemoglobin (HbA1c), low-density lipoprotein-cholesterol (LDL-C) and systolic blood pressure (SBP), may improve quality of care in type 2 diabetes in the primary care setting. This was a randomised, controlled study of 3980 patients with type 2 diabetes. Six European countries participated in the OPTIMISE study (NCT00681850). Quality of care was assessed by the percentage of patients achieving pre-set targets for the three critical quality indicators over 12 months. Physicians were randomly assigned to receive either benchmarked or non-benchmarked feedback. All physicians received feedback on six of their patients' modifiable outcome indicators (HbA1c, fasting glycaemia, total cholesterol, high-density lipoprotein-cholesterol (HDL-C), LDL-C and triglycerides). Physicians in the benchmarking group additionally received information on levels of control achieved for the three critical quality indicators compared with colleagues. At baseline, the percentage of evaluable patients (N = 3980) achieving pre-set targets was 51.2% (HbA1c; n = 2028/3964); 34.9% (LDL-C; n = 1350/3865); 27.3% (systolic blood pressure; n = 911/3337). OPTIMISE confirms that target achievement in the primary care setting is suboptimal for all three critical quality indicators. This represents an unmet but modifiable need to revisit the mechanisms and management of improving care in type 2 diabetes. OPTIMISE will help to assess whether benchmarking is a useful clinical tool for improving outcomes in type 2 diabetes.
The Use of Quality Benchmarking in Assessing Web Resources for the Dermatology Virtual Branch Library of the National electronic Library for Health (NeLH)

PubMed Central

Roudsari, AV; Gordon, C; Gray, JA Muir

2001-01-01

Background In 1998, the U.K. National Health Service Information for Health Strategy proposed the implementation of a National electronic Library for Health to provide clinicians, healthcare managers and planners, patients and the public with easy, round the clock access to high quality, up-to-date electronic information on health and healthcare. The Virtual Branch Libraries are among the most important components of the National electronic Library for Health . They aim at creating online knowledge based communities, each concerned with some specific clinical and other health-related topics. Objectives This study is about the envisaged Dermatology Virtual Branch Libraries of the National electronic Library for Health . It aims at selecting suitable dermatology Web resources for inclusion in the forthcoming Virtual Branch Libraries after establishing preliminary quality benchmarking rules for this task. Psoriasis, being a common dermatological condition, has been chosen as a starting point. Methods Because quality is a principal concern of the National electronic Library for Health, the study includes a review of the major quality benchmarking systems available today for assessing health-related Web sites. The methodology of developing a quality benchmarking system has been also reviewed. Aided by metasearch Web tools, candidate resources were hand-selected in light of the reviewed benchmarking systems and specific criteria set by the authors. Results Over 90 professional and patient-oriented Web resources on psoriasis and dermatology in general are suggested for inclusion in the forthcoming Dermatology Virtual Branch Libraries. The idea of an all-in knowledge-hallmarking instrument for the National electronic Library for Health is also proposed based on the reviewed quality benchmarking systems. Conclusions Skilled, methodical, organized human reviewing, selection and filtering based on well-defined quality appraisal criteria seems likely to be the key ingredient in the envisaged National electronic Library for Health service. Furthermore, by promoting the application of agreed quality guidelines and codes of ethics by all health information providers and not just within the National electronic Library for Health, the overall quality of the Web will improve with time and the Web will ultimately become a reliable and integral part of the care space. PMID:11720947
Benchmark matrix and guide: Part III.

PubMed

1992-01-01

The final article in the "Benchmark Matrix and Guide" series developed by Headquarters Air Force Logistics Command completes the discussion of the last three categories that are essential ingredients of a successful total quality management (TQM) program. Detailed behavioral objectives are listed in the areas of recognition, process improvement, and customer focus. These vertical categories are meant to be applied to the levels of the matrix that define the progressive stages of the TQM: business as usual, initiation, implementation, expansion, and integration. By charting the horizontal progress level and the vertical TQM category, the quality management professional can evaluate the current state of TQM in any given organization. As each category is completed, new goals can be defined in order to advance to a higher level. The benchmarking process is integral to quality improvement efforts because it focuses on the highest possible standards to evaluate quality programs.
Collected notes from the Benchmarks and Metrics Workshop

NASA Technical Reports Server (NTRS)

Drummond, Mark E.; Kaelbling, Leslie P.; Rosenschein, Stanley J.

1991-01-01

In recent years there has been a proliferation of proposals in the artificial intelligence (AI) literature for integrated agent architectures. Each architecture offers an approach to the general problem of constructing an integrated agent. Unfortunately, the ways in which one architecture might be considered better than another are not always clear. There has been a growing realization that many of the positive and negative aspects of an architecture become apparent only when experimental evaluation is performed and that to progress as a discipline, we must develop rigorous experimental methods. In addition to the intrinsic intellectual interest of experimentation, rigorous performance evaluation of systems is also a crucial practical concern to our research sponsors. DARPA, NASA, and AFOSR (among others) are actively searching for better ways of experimentally evaluating alternative approaches to building intelligent agents. One tool for experimental evaluation involves testing systems on benchmark tasks in order to assess their relative performance. As part of a joint DARPA and NASA funded project, NASA-Ames and Teleos Research are carrying out a research effort to establish a set of benchmark tasks and evaluation metrics by which the performance of agent architectures may be determined. As part of this project, we held a workshop on Benchmarks and Metrics at the NASA Ames Research Center on June 25, 1990. The objective of the workshop was to foster early discussion on this important topic. We did not achieve a consensus, nor did we expect to. Collected here is some of the information that was exchanged at the workshop. Given here is an outline of the workshop, a list of the participants, notes taken on the white-board during open discussions, position papers/notes from some participants, and copies of slides used in the presentations.
Model evaluation using a community benchmarking system for land surface models

NASA Astrophysics Data System (ADS)

Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Kluzek, E. B.; Koven, C. D.; Randerson, J. T.

2014-12-01

Evaluation of atmosphere, ocean, sea ice, and land surface models is an important step in identifying deficiencies in Earth system models and developing improved estimates of future change. For the land surface and carbon cycle, the design of an open-source system has been an important objective of the International Land Model Benchmarking (ILAMB) project. Here we evaluated CMIP5 and CLM models using a benchmarking system that enables users to specify models, data sets, and scoring systems so that results can be tailored to specific model intercomparison projects. Our scoring system used information from four different aspects of global datasets, including climatological mean spatial patterns, seasonal cycle dynamics, interannual variability, and long-term trends. Variable-to-variable comparisons enable investigation of the mechanistic underpinnings of model behavior, and allow for some control of biases in model drivers. Graphics modules allow users to evaluate model performance at local, regional, and global scales. Use of modular structures makes it relatively easy for users to add new variables, diagnostic metrics, benchmarking datasets, or model simulations. Diagnostic results are automatically organized into HTML files, so users can conveniently share results with colleagues. We used this system to evaluate atmospheric carbon dioxide, burned area, global biomass and soil carbon stocks, net ecosystem exchange, gross primary production, ecosystem respiration, terrestrial water storage, evapotranspiration, and surface radiation from CMIP5 historical and ESM historical simulations. We found that the multi-model mean often performed better than many of the individual models for most variables. We plan to publicly release a stable version of the software during fall of 2014 that has land surface, carbon cycle, hydrology, radiation and energy cycle components.
Methodology and issues of integral experiments selection for nuclear data validation

NASA Astrophysics Data System (ADS)

Tatiana, Ivanova; Ivanov, Evgeny; Hill, Ian

2017-09-01

Nuclear data validation involves a large suite of Integral Experiments (IEs) for criticality, reactor physics and dosimetry applications. [1] Often benchmarks are taken from international Handbooks. [2, 3] Depending on the application, IEs have different degrees of usefulness in validation, and usually the use of a single benchmark is not advised; indeed, it may lead to erroneous interpretation and results. [1] This work aims at quantifying the importance of benchmarks used in application dependent cross section validation. The approach is based on well-known General Linear Least Squared Method (GLLSM) extended to establish biases and uncertainties for given cross sections (within a given energy interval). The statistical treatment results in a vector of weighting factors for the integral benchmarks. These factors characterize the value added by a benchmark for nuclear data validation for the given application. The methodology is illustrated by one example, selecting benchmarks for 239Pu cross section validation. The studies were performed in the framework of Subgroup 39 (Methods and approaches to provide feedback from nuclear and covariance data adjustment for improvement of nuclear data files) established at the Working Party on International Nuclear Data Evaluation Cooperation (WPEC) of the Nuclear Science Committee under the Nuclear Energy Agency (NEA/OECD).
Searching for 'Unknown Unknowns'

NASA Technical Reports Server (NTRS)

Parsons, Vickie S.

2005-01-01

The NASA Engineering and Safety Center (NESC) was established to improve safety through engineering excellence within NASA programs and projects. As part of this goal, methods are being investigated to enable the NESC to become proactive in identifying areas that may be precursors to future problems. The goal is to find unknown indicators of future problems, not to duplicate the program-specific trending efforts. The data that is critical for detecting these indicators exist in a plethora of dissimilar non-conformance and other databases (without a common format or taxonomy). In fact, much of the data is unstructured text. However, one common database is not required if the right standards and electronic tools are employed. Electronic data mining is a particularly promising tool for this effort into unsupervised learning of common factors. This work in progress began with a systematic evaluation of available data mining software packages, based on documented decision techniques using weighted criteria. The four packages, which were perceived to have the most promise for NASA applications, are being benchmarked and evaluated by independent contractors. Preliminary recommendations for "best practices" in data mining and trending are provided. Final results and recommendations should be available in the Fall 2005. This critical first step in identifying "unknown unknowns" before they become problems is applicable to any set of engineering or programmatic data.
High Fidelity BWR Fuel Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoon, Su Jong

This report describes the Consortium for Advanced Simulation of Light Water Reactors (CASL) work conducted for completion of the Thermal Hydraulics Methods (THM) Level 3 milestone THM.CFD.P13.03: High Fidelity BWR Fuel Simulation. High fidelity computational fluid dynamics (CFD) simulation for Boiling Water Reactor (BWR) was conducted to investigate the applicability and robustness performance of BWR closures. As a preliminary study, a CFD model with simplified Ferrule spacer grid geometry of NUPEC BWR Full-size Fine-mesh Bundle Test (BFBT) benchmark has been implemented. Performance of multiphase segregated solver with baseline boiling closures has been evaluated. Although the mean values of void fractionmore » and exit quality of CFD result for BFBT case 4101-61 agreed with experimental data, the local void distribution was not predicted accurately. The mesh quality was one of the critical factors to obtain converged result. The stability and robustness of the simulation was mainly affected by the mesh quality, combination of BWR closure models. In addition, the CFD modeling of fully-detailed spacer grid geometry with mixing vane is necessary for improving the accuracy of CFD simulation.« less
ViSAPy: a Python tool for biophysics-based generation of virtual spiking activity for evaluation of spike-sorting algorithms.

PubMed

Hagen, Espen; Ness, Torbjørn V; Khosrowshahi, Amir; Sørensen, Christina; Fyhn, Marianne; Hafting, Torkel; Franke, Felix; Einevoll, Gaute T

2015-04-30

New, silicon-based multielectrodes comprising hundreds or more electrode contacts offer the possibility to record spike trains from thousands of neurons simultaneously. This potential cannot be realized unless accurate, reliable automated methods for spike sorting are developed, in turn requiring benchmarking data sets with known ground-truth spike times. We here present a general simulation tool for computing benchmarking data for evaluation of spike-sorting algorithms entitled ViSAPy (Virtual Spiking Activity in Python). The tool is based on a well-established biophysical forward-modeling scheme and is implemented as a Python package built on top of the neuronal simulator NEURON and the Python tool LFPy. ViSAPy allows for arbitrary combinations of multicompartmental neuron models and geometries of recording multielectrodes. Three example benchmarking data sets are generated, i.e., tetrode and polytrode data mimicking in vivo cortical recordings and microelectrode array (MEA) recordings of in vitro activity in salamander retinas. The synthesized example benchmarking data mimics salient features of typical experimental recordings, for example, spike waveforms depending on interspike interval. ViSAPy goes beyond existing methods as it includes biologically realistic model noise, synaptic activation by recurrent spiking networks, finite-sized electrode contacts, and allows for inhomogeneous electrical conductivities. ViSAPy is optimized to allow for generation of long time series of benchmarking data, spanning minutes of biological time, by parallel execution on multi-core computers. ViSAPy is an open-ended tool as it can be generalized to produce benchmarking data or arbitrary recording-electrode geometries and with various levels of complexity. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Evaluating the Effectiveness of a State-Mandated Benchmark Reading Assessment: mClass Reading 3D (Text Reading and Comprehension)

ERIC Educational Resources Information Center

Snow, Amie B.; Morris, Darrell; Perney, Jan

2018-01-01

We examined which of two instruments (Text Reading and Comprehension inventory [TRC] or a traditional informal reading inventory [IRI]) provides the more valid assessment of a primary-grade student's reading instructional level. The TRC is currently the required, benchmark reading assessment for students in grades K-3 in the state of North…
PSO algorithm enhanced with Lozi Chaotic Map - Tuning experiment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pluhacek, Michal; Senkerik, Roman; Zelinka, Ivan

2015-03-10

In this paper it is investigated the effect of tuning of control parameters of the Lozi Chaotic Map employed as a chaotic pseudo-random number generator for the particle swarm optimization algorithm. Three different benchmark functions are selected from the IEEE CEC 2013 competition benchmark set. The Lozi map is extensively tuned and the performance of PSO is evaluated.
Design and development of a community carbon cycle benchmarking system for CMIP5 models

NASA Astrophysics Data System (ADS)

Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Randerson, J. T.

2013-12-01

Benchmarking has been widely used to assess the ability of atmosphere, ocean, sea ice, and land surface models to capture the spatial and temporal variability of observations during the historical period. For the carbon cycle and terrestrial ecosystems, the design and development of an open-source community platform has been an important goal as part of the International Land Model Benchmarking (ILAMB) project. Here we designed and developed a software system that enables the user to specify the models, benchmarks, and scoring systems so that results can be tailored to specific model intercomparison projects. We used this system to evaluate the performance of CMIP5 Earth system models (ESMs). Our scoring system used information from four different aspects of climate, including the climatological mean spatial pattern of gridded surface variables, seasonal cycle dynamics, the amplitude of interannual variability, and long-term decadal trends. We used this system to evaluate burned area, global biomass stocks, net ecosystem exchange, gross primary production, and ecosystem respiration from CMIP5 historical simulations. Initial results indicated that the multi-model mean often performed better than many of the individual models for most of the observational constraints.
New Reactor Physics Benchmark Data in the March 2012 Edition of the IRPhEP Handbook

DOE Office of Scientific and Technical Information (OSTI.GOV)

John D. Bess; J. Blair Briggs; Jim Gulliford

2012-11-01

The International Reactor Physics Experiment Evaluation Project (IRPhEP) was established to preserve integral reactor physics experimental data, including separate or special effects data for nuclear energy and technology applications. Numerous experiments that have been performed worldwide, represent a large investment of infrastructure, expertise, and cost, and are valuable resources of data for present and future research. These valuable assets provide the basis for recording, development, and validation of methods. If the experimental data are lost, the high cost to repeat many of these measurements may be prohibitive. The purpose of the IRPhEP is to provide an extensively peer-reviewed set ofmore » reactor physics-related integral data that can be used by reactor designers and safety analysts to validate the analytical tools used to design next-generation reactors and establish the safety basis for operation of these reactors. Contributors from around the world collaborate in the evaluation and review of selected benchmark experiments for inclusion in the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook) [1]. Several new evaluations have been prepared for inclusion in the March 2012 edition of the IRPhEP Handbook.« less
StirMark Benchmark: audio watermarking attacks based on lossy compression

NASA Astrophysics Data System (ADS)

Steinebach, Martin; Lang, Andreas; Dittmann, Jana

2002-04-01

StirMark Benchmark is a well-known evaluation tool for watermarking robustness. Additional attacks are added to it continuously. To enable application based evaluation, in our paper we address attacks against audio watermarks based on lossy audio compression algorithms to be included in the test environment. We discuss the effect of different lossy compression algorithms like MPEG-2 audio Layer 3, Ogg or VQF on a selection of audio test data. Our focus is on changes regarding the basic characteristics of the audio data like spectrum or average power and on removal of embedded watermarks. Furthermore we compare results of different watermarking algorithms and show that lossy compression is still a challenge for most of them. There are two strategies for adding evaluation of robustness against lossy compression to StirMark Benchmark: (a) use of existing free compression algorithms (b) implementation of a generic lossy compression simulation. We discuss how such a model can be implemented based on the results of our tests. This method is less complex, as no real psycho acoustic model has to be applied. Our model can be used for audio watermarking evaluation of numerous application fields. As an example, we describe its importance for e-commerce applications with watermarking security.
District Heating Systems Performance Analyses. Heat Energy Tariff

NASA Astrophysics Data System (ADS)

Ziemele, Jelena; Vigants, Girts; Vitolins, Valdis; Blumberga, Dagnija; Veidenbergs, Ivars

2014-12-01

The paper addresses an important element of the European energy sector: the evaluation of district heating (DH) system operations from the standpoint of increasing energy efficiency and increasing the use of renewable energy resources. This has been done by developing a new methodology for the evaluation of the heat tariff. The paper presents an algorithm of this methodology, which includes not only a data base and calculation equation systems, but also an integrated multi-criteria analysis module using MADM/MCDM (Multi-Attribute Decision Making / Multi-Criteria Decision Making) based on TOPSIS (Technique for Order Performance by Similarity to Ideal Solution). The results of the multi-criteria analysis are used to set the tariff benchmarks. The evaluation methodology has been tested for Latvian heat tariffs, and the obtained results show that only half of heating companies reach a benchmark value equal to 0.5 for the efficiency closeness to the ideal solution indicator. This means that the proposed evaluation methodology would not only allow companies to determine how they perform with regard to the proposed benchmark, but also to identify their need to restructure so that they may reach the level of a low-carbon business.
Patient radiation doses in interventional cardiology in the U.S.: Advisory data sets and possible initial values for U.S. reference levels

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, Donald L.; Hilohi, C. Michael; Spelic, David C.

2012-10-15

Purpose: To determine patient radiation doses from interventional cardiology procedures in the U.S and to suggest possible initial values for U.S. benchmarks for patient radiation dose from selected interventional cardiology procedures [fluoroscopically guided diagnostic cardiac catheterization and percutaneous coronary intervention (PCI)]. Methods: Patient radiation dose metrics were derived from analysis of data from the 2008 to 2009 Nationwide Evaluation of X-ray Trends (NEXT) survey of cardiac catheterization. This analysis used deidentified data and did not require review by an IRB. Data from 171 facilities in 30 states were analyzed. The distributions (percentiles) of radiation dose metrics were determined for diagnosticmore » cardiac catheterizations, PCI, and combined diagnostic and PCI procedures. Confidence intervals for these dose distributions were determined using bootstrap resampling. Results: Percentile distributions (advisory data sets) and possible preliminary U.S. reference levels (based on the 75th percentile of the dose distributions) are provided for cumulative air kerma at the reference point (K{sub a,r}), cumulative air kerma-area product (P{sub KA}), fluoroscopy time, and number of cine runs. Dose distributions are sufficiently detailed to permit dose audits as described in National Council on Radiation Protection and Measurements Report No. 168. Fluoroscopy times are consistent with those observed in European studies, but P{sub KA} is higher in the U.S. Conclusions: Sufficient data exist to suggest possible initial benchmarks for patient radiation dose for certain interventional cardiology procedures in the U.S. Our data suggest that patient radiation dose in these procedures is not optimized in U.S. practice.« less
Constructing Benchmark Databases and Protocols for Medical Image Analysis: Diabetic Retinopathy

PubMed Central

Kauppi, Tomi; Kämäräinen, Joni-Kristian; Kalesnykiene, Valentina; Sorri, Iiris; Uusitalo, Hannu; Kälviäinen, Heikki

2013-01-01

We address the performance evaluation practices for developing medical image analysis methods, in particular, how to establish and share databases of medical images with verified ground truth and solid evaluation protocols. Such databases support the development of better algorithms, execution of profound method comparisons, and, consequently, technology transfer from research laboratories to clinical practice. For this purpose, we propose a framework consisting of reusable methods and tools for the laborious task of constructing a benchmark database. We provide a software tool for medical image annotation helping to collect class label, spatial span, and expert's confidence on lesions and a method to appropriately combine the manual segmentations from multiple experts. The tool and all necessary functionality for method evaluation are provided as public software packages. As a case study, we utilized the framework and tools to establish the DiaRetDB1 V2.1 database for benchmarking diabetic retinopathy detection algorithms. The database contains a set of retinal images, ground truth based on information from multiple experts, and a baseline algorithm for the detection of retinopathy lesions. PMID:23956787
Roofline model toolkit: A practical tool for architectural and program analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian

We present preliminary results of the Roofline Toolkit for multicore, many core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measuremore » sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.« less

New prompt fission gamma-ray spectral data from 239Pu(nth, f) in response to a high priority request from OECD Nuclear Energy Agency

NASA Astrophysics Data System (ADS)

Gatera, Angélique; Belgya, Tamás; Geerts, Wouter; Göök, Alf; Hambsch, Franz-Josef; Lebois, Matthieu; Maróti, Boglárka; Oberstedt, Stephan; Oberstedt, Andreas; Postelt, Frederik; Qi, Liqiang; Szentmiklósi, Laszló; Vidali, Marzio; Zeiser, Fabio

2017-09-01

Benchmark reactor calculations have revealed an underestimation of γ-heat following fission of up to 28%. To improve the modelling of new nuclear reactors, the OECD/NEA initiated a nuclear data High Priority Request List (HPRL) entry for the major isotopes (235U, 239Pu). In response to that HPRL entry, we executed a dedicated measurement program on prompt fission γ-rays employing state-of-the-art lanthanum bromide (LaBr3) detectors with superior timing and good energy resolution. Our new results from 252Cf(sf), 235U(nth,f) and 241Pu(nth,f) provide prompt fission γ-ray spectra characteristics : average number of photons per fission, average total energy per fission and mean photon energy; all within 2% of uncertainty. We present preliminary results on 239Pu(nth,f), recently measured at the Budapest Neutron Centre and supported by the CHANDA Trans-national Access Activity, as well as discussing our different published results in comparison to the historical data and what it says about the discrepancy observed in the benchmark calculations.
Toward Automated Benchmarking of Atomistic Force Fields: Neat Liquid Densities and Static Dielectric Constants from the ThermoML Data Archive.

PubMed

Beauchamp, Kyle A; Behr, Julie M; Rustenburg, Ariën S; Bayly, Christopher I; Kroenlein, Kenneth; Chodera, John D

2015-10-08

Atomistic molecular simulations are a powerful way to make quantitative predictions, but the accuracy of these predictions depends entirely on the quality of the force field employed. Although experimental measurements of fundamental physical properties offer a straightforward approach for evaluating force field quality, the bulk of this information has been tied up in formats that are not machine-readable. Compiling benchmark data sets of physical properties from non-machine-readable sources requires substantial human effort and is prone to the accumulation of human errors, hindering the development of reproducible benchmarks of force-field accuracy. Here, we examine the feasibility of benchmarking atomistic force fields against the NIST ThermoML data archive of physicochemical measurements, which aggregates thousands of experimental measurements in a portable, machine-readable, self-annotating IUPAC-standard format. As a proof of concept, we present a detailed benchmark of the generalized Amber small-molecule force field (GAFF) using the AM1-BCC charge model against experimental measurements (specifically, bulk liquid densities and static dielectric constants at ambient pressure) automatically extracted from the archive and discuss the extent of data available for use in larger scale (or continuously performed) benchmarks. The results of even this limited initial benchmark highlight a general problem with fixed-charge force fields in the representation low-dielectric environments, such as those seen in binding cavities or biological membranes.
Developing a Benchmarking Process in Perfusion: A Report of the Perfusion Downunder Collaboration

PubMed Central

Baker, Robert A.; Newland, Richard F.; Fenton, Carmel; McDonald, Michael; Willcox, Timothy W.; Merry, Alan F.

2012-01-01

Abstract: Improving and understanding clinical practice is an appropriate goal for the perfusion community. The Perfusion Downunder Collaboration has established a multi-center perfusion focused database aimed at achieving these goals through the development of quantitative quality indicators for clinical improvement through benchmarking. Data were collected using the Perfusion Downunder Collaboration database from procedures performed in eight Australian and New Zealand cardiac centers between March 2007 and February 2011. At the Perfusion Downunder Meeting in 2010, it was agreed by consensus, to report quality indicators (QI) for glucose level, arterial outlet temperature, and pCO2 management during cardiopulmonary bypass. The values chosen for each QI were: blood glucose ≥4 mmol/L and ≤10 mmol/L; arterial outlet temperature ≤37°C; and arterial blood gas pCO2 ≥ 35 and ≤45 mmHg. The QI data were used to derive benchmarks using the Achievable Benchmark of Care (ABC™) methodology to identify the incidence of QIs at the best performing centers. Five thousand four hundred and sixty-five procedures were evaluated to derive QI and benchmark data. The incidence of the blood glucose QI ranged from 37–96% of procedures, with a benchmark value of 90%. The arterial outlet temperature QI occurred in 16–98% of procedures with the benchmark of 94%; while the arterial pCO2 QI occurred in 21–91%, with the benchmark value of 80%. We have derived QIs and benchmark calculations for the management of several key aspects of cardiopulmonary bypass to provide a platform for improving the quality of perfusion practice. PMID:22730861
EPA Corporate GHG Goal Evaluation Model

EPA Pesticide Factsheets

The EPA Corporate GHG Goal Evaluation Model provides companies with a transparent and publicly available benchmarking resource to help evaluate and establish new or existing GHG goals that go beyond business as usual for their individual sectors.
Present Status and Extensions of the Monte Carlo Performance Benchmark

NASA Astrophysics Data System (ADS)

Hoogenboom, J. Eduard; Petrovic, Bojan; Martin, William R.

2014-06-01

The NEA Monte Carlo Performance benchmark started in 2011 aiming to monitor over the years the abilities to perform a full-size Monte Carlo reactor core calculation with a detailed power production for each fuel pin with axial distribution. This paper gives an overview of the contributed results thus far. It shows that reaching a statistical accuracy of 1 % for most of the small fuel zones requires about 100 billion neutron histories. The efficiency of parallel execution of Monte Carlo codes on a large number of processor cores shows clear limitations for computer clusters with common type computer nodes. However, using true supercomputers the speedup of parallel calculations is increasing up to large numbers of processor cores. More experience is needed from calculations on true supercomputers using large numbers of processors in order to predict if the requested calculations can be done in a short time. As the specifications of the reactor geometry for this benchmark test are well suited for further investigations of full-core Monte Carlo calculations and a need is felt for testing other issues than its computational performance, proposals are presented for extending the benchmark to a suite of benchmark problems for evaluating fission source convergence for a system with a high dominance ratio, for coupling with thermal-hydraulics calculations to evaluate the use of different temperatures and coolant densities and to study the correctness and effectiveness of burnup calculations. Moreover, other contemporary proposals for a full-core calculation with realistic geometry and material composition will be discussed.
A Privacy-Preserving Platform for User-Centric Quantitative Benchmarking

NASA Astrophysics Data System (ADS)

Herrmann, Dominik; Scheuer, Florian; Feustel, Philipp; Nowey, Thomas; Federrath, Hannes

We propose a centralised platform for quantitative benchmarking of key performance indicators (KPI) among mutually distrustful organisations. Our platform offers users the opportunity to request an ad-hoc benchmarking for a specific KPI within a peer group of their choice. Architecture and protocol are designed to provide anonymity to its users and to hide the sensitive KPI values from other clients and the central server. To this end, we integrate user-centric peer group formation, exchangeable secure multi-party computation protocols, short-lived ephemeral key pairs as pseudonyms, and attribute certificates. We show by empirical evaluation of a prototype that the performance is acceptable for reasonably sized peer groups.
BACT Simulation User Guide (Version 7.0)

NASA Technical Reports Server (NTRS)

Waszak, Martin R.

1997-01-01

This report documents the structure and operation of a simulation model of the Benchmark Active Control Technology (BACT) Wind-Tunnel Model. The BACT system was designed, built, and tested at NASA Langley Research Center as part of the Benchmark Models Program and was developed to perform wind-tunnel experiments to obtain benchmark quality data to validate computational fluid dynamics and computational aeroelasticity codes, to verify the accuracy of current aeroservoelasticity design and analysis tools, and to provide an active controls testbed for evaluating new and innovative control algorithms for flutter suppression and gust load alleviation. The BACT system has been especially valuable as a control system testbed.
Child-Resistant Packaging for E-Liquid: A Review of US State Legislation.

PubMed

Frey, Leslie T; Tilburg, William C

2016-02-01

A growing number of states have introduced or enacted legislation requiring child-resistant packaging for e-liquid containers; however, these laws involve varying terms, packaging standards, and enforcement provisions, raising concerns about their effectiveness. We evaluated bills against 4 benchmarks: broad product definitions that contemplate future developments in the market, citations to a specific packaging standard, stated penalties for violations, and express grants of authority to a state entity to enforce the packaging requirements. Our findings showed that 3 states meet all 4 benchmarks in their enacted legislation. We encourage states to consider these benchmarks when revising statutes or drafting future legislation.
Child-Resistant Packaging for E-Liquid: A Review of US State Legislation

PubMed Central

Tilburg, William C.

2016-01-01

A growing number of states have introduced or enacted legislation requiring child-resistant packaging for e-liquid containers; however, these laws involve varying terms, packaging standards, and enforcement provisions, raising concerns about their effectiveness. We evaluated bills against 4 benchmarks: broad product definitions that contemplate future developments in the market, citations to a specific packaging standard, stated penalties for violations, and express grants of authority to a state entity to enforce the packaging requirements. Our findings showed that 3 states meet all 4 benchmarks in their enacted legislation. We encourage states to consider these benchmarks when revising statutes or drafting future legislation. PMID:26691114
The Impact of the Fountas and Pinnell Benchmark Assessment System on Third Grade South Carolina Ready English Language Arts Scores

ERIC Educational Resources Information Center

Harrington, Shanika

2017-01-01

The purpose of this research study was to evaluate the impact of the district's use of the Fountas and Pinnell Benchmark Assessment System on 3rd grade students' reading achievement as measured by the SC READY ELA test. Educators are increasingly using assessment data in determining students' knowledge and progress. Brady, 2011 stated that…
Restaurant Energy Use Benchmarking Guideline

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hedrick, R.; Smith, V.; Field, K.

2011-07-01

A significant operational challenge for food service operators is defining energy use benchmark metrics to compare against the performance of individual stores. Without metrics, multiunit operators and managers have difficulty identifying which stores in their portfolios require extra attention to bring their energy performance in line with expectations. This report presents a method whereby multiunit operators may use their own utility data to create suitable metrics for evaluating their operations.
Evaluation of School Library Media Centers: Demonstrating Quality.

ERIC Educational Resources Information Center

Everhart, Nancy

2003-01-01

Discusses ways to evaluate school library media programs and how to demonstrate quality. Topics include how principals evaluate programs; sources of evaluative data; national, state, and local instruments; surveys and interviews; Colorado benchmarks; evaluating the use of electronic resources; and computer reporting options. (LRW)
Automatic Keyword Extraction from Individual Documents

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rose, Stuart J.; Engel, David W.; Cramer, Nicholas O.

2010-05-03

This paper introduces a novel and domain-independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. We describe the method’s configuration parameters and algorithm, and present an evaluation on a benchmark corpus of technical abstracts. We also present a method for generating lists of stop words for specific corpora and domains, and evaluate its ability to improve keyword extraction on the benchmark corpus. Finally, we apply our method of automatic keyword extraction to a corpus of news articles and define metrics for characterizing the exclusivity, essentiality, and generality of extracted keywords within a corpus.
Benchmarking and Hardware-In-The-Loop Operation of a ...

EPA Pesticide Factsheets

Engine Performance evaluation in support of LD MTE. EPA used elements of its ALPHA model to apply hardware-in-the-loop (HIL) controls to the SKYACTIV engine test setup to better understand how the engine would operate in a chassis test after combined with future leading edge technologies, advanced high-efficiency transmission, reduced mass, and reduced roadload. Predict future vehicle performance with Atkinson engine. As part of its technology assessment for the upcoming midterm evaluation of the 2017-2025 LD vehicle GHG emissions regulation, EPA has been benchmarking engines and transmissions to generate inputs for use in its ALPHA model
Conceptual Soundness, Metric Development, Benchmarking, and Targeting for PATH Subprogram Evaluation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mosey. G.; Doris, E.; Coggeshall, C.

The objective of this study is to evaluate the conceptual soundness of the U.S. Department of Housing and Urban Development (HUD) Partnership for Advancing Technology in Housing (PATH) program's revised goals and establish and apply a framework to identify and recommend metrics that are the most useful for measuring PATH's progress. This report provides an evaluative review of PATH's revised goals, outlines a structured method for identifying and selecting metrics, proposes metrics and benchmarks for a sampling of individual PATH programs, and discusses other metrics that potentially could be developed that may add value to the evaluation process. The frameworkmore » and individual program metrics can be used for ongoing management improvement efforts and to inform broader program-level metrics for government reporting requirements.« less
Simulation of Benchmark Cases with the Terminal Area Simulation System (TASS)

NASA Technical Reports Server (NTRS)

Ahmad, Nash'at; Proctor, Fred

2011-01-01

The hydrodynamic core of the Terminal Area Simulation System (TASS) is evaluated against different benchmark cases. In the absence of closed form solutions for the equations governing atmospheric flows, the models are usually evaluated against idealized test cases. Over the years, various authors have suggested a suite of these idealized cases which have become standards for testing and evaluating the dynamics and thermodynamics of atmospheric flow models. In this paper, simulations of three such cases are described. In addition, the TASS model is evaluated against a test case that uses an exact solution of the Navier-Stokes equations. The TASS results are compared against previously reported simulations of these banchmark cases in the literature. It is demonstrated that the TASS model is highly accurate, stable and robust.
Determining the sample size required to establish whether a medical device is non-inferior to an external benchmark.

PubMed

Sayers, Adrian; Crowther, Michael J; Judge, Andrew; Whitehouse, Michael R; Blom, Ashley W

2017-08-28

The use of benchmarks to assess the performance of implants such as those used in arthroplasty surgery is a widespread practice. It provides surgeons, patients and regulatory authorities with the reassurance that implants used are safe and effective. However, it is not currently clear how or how many implants should be statistically compared with a benchmark to assess whether or not that implant is superior, equivalent, non-inferior or inferior to the performance benchmark of interest.We aim to describe the methods and sample size required to conduct a one-sample non-inferiority study of a medical device for the purposes of benchmarking. Simulation study. Simulation study of a national register of medical devices. We simulated data, with and without a non-informative competing risk, to represent an arthroplasty population and describe three methods of analysis (z-test, 1-Kaplan-Meier and competing risks) commonly used in surgical research. We evaluate the performance of each method using power, bias, root-mean-square error, coverage and CI width. 1-Kaplan-Meier provides an unbiased estimate of implant net failure, which can be used to assess if a surgical device is non-inferior to an external benchmark. Small non-inferiority margins require significantly more individuals to be at risk compared with current benchmarking standards. A non-inferiority testing paradigm provides a useful framework for determining if an implant meets the required performance defined by an external benchmark. Current contemporary benchmarking standards have limited power to detect non-inferiority, and substantially larger samples sizes, in excess of 3200 procedures, are required to achieve a power greater than 60%. It is clear when benchmarking implant performance, net failure estimated using 1-KM is preferential to crude failure estimated by competing risk models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection.

PubMed

Li, Zhucui; Lu, Yan; Guo, Yufeng; Cao, Haijie; Wang, Qinhong; Shui, Wenqing

2018-10-31

Data analysis represents a key challenge for untargeted metabolomics studies and it commonly requires extensive processing of more than thousands of metabolite peaks included in raw high-resolution MS data. Although a number of software packages have been developed to facilitate untargeted data processing, they have not been comprehensively scrutinized in the capability of feature detection, quantification and marker selection using a well-defined benchmark sample set. In this study, we acquired a benchmark dataset from standard mixtures consisting of 1100 compounds with specified concentration ratios including 130 compounds with significant variation of concentrations. Five software evaluated here (MS-Dial, MZmine 2, XCMS, MarkerView, and Compound Discoverer) showed similar performance in detection of true features derived from compounds in the mixtures. However, significant differences between untargeted metabolomics software were observed in relative quantification of true features in the benchmark dataset. MZmine 2 outperformed the other software in terms of quantification accuracy and it reported the most true discriminating markers together with the fewest false markers. Furthermore, we assessed selection of discriminating markers by different software using both the benchmark dataset and a real-case metabolomics dataset to propose combined usage of two software for increasing confidence of biomarker identification. Our findings from comprehensive evaluation of untargeted metabolomics software would help guide future improvements of these widely used bioinformatics tools and enable users to properly interpret their metabolomics results. Copyright © 2018 Elsevier B.V. All rights reserved.
Linking log files with dosimetric accuracy--A multi-institutional study on quality assurance of volumetric modulated arc therapy.

PubMed

Pasler, Marlies; Kaas, Jochem; Perik, Thijs; Geuze, Job; Dreindl, Ralf; Künzler, Thomas; Wittkamper, Frits; Georg, Dietmar

2015-12-01

To systematically evaluate machine specific quality assurance (QA) for volumetric modulated arc therapy (VMAT) based on log files by applying a dynamic benchmark plan. A VMAT benchmark plan was created and tested on 18 Elekta linacs (13 MLCi or MLCi2, 5 Agility) at 4 different institutions. Linac log files were analyzed and a delivery robustness index was introduced. For dosimetric measurements an ionization chamber array was used. Relative dose deviations were assessed by mean gamma for each control point and compared to the log file evaluation. Fourteen linacs delivered the VMAT benchmark plan, while 4 linacs failed by consistently terminating the delivery. The mean leaf error (±1SD) was 0.3±0.2 mm for all linacs. Large MLC maximum errors up to 6.5 mm were observed at reversal positions. Delivery robustness index accounting for MLC position correction (0.8-1.0) correlated with delivery time (80-128 s) and depended on dose rate performance. Dosimetric evaluation indicated in general accurate plan reproducibility with γ(mean)(±1 SD)=0.4±0.2 for 1 mm/1%. However single control point analysis revealed larger deviations and attributed well to log file analysis. The designed benchmark plan helped identify linac related malfunctions in dynamic mode for VMAT. Log files serve as an important additional QA measure to understand and visualize dynamic linac parameters. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Towards Systematic Benchmarking of Climate Model Performance

NASA Astrophysics Data System (ADS)

Gleckler, P. J.

2014-12-01

The process by which climate models are evaluated has evolved substantially over the past decade, with the Coupled Model Intercomparison Project (CMIP) serving as a centralizing activity for coordinating model experimentation and enabling research. Scientists with a broad spectrum of expertise have contributed to the CMIP model evaluation process, resulting in many hundreds of publications that have served as a key resource for the IPCC process. For several reasons, efforts are now underway to further systematize some aspects of the model evaluation process. First, some model evaluation can now be considered routine and should not require "re-inventing the wheel" or a journal publication simply to update results with newer models. Second, the benefit of CMIP research to model development has not been optimal because the publication of results generally takes several years and is usually not reproducible for benchmarking newer model versions. And third, there are now hundreds of model versions and many thousands of simulations, but there is no community-based mechanism for routinely monitoring model performance changes. An important change in the design of CMIP6 can help address these limitations. CMIP6 will include a small set standardized experiments as an ongoing exercise (CMIP "DECK": ongoing Diagnostic, Evaluation and Characterization of Klima), so that modeling groups can submit them at any time and not be overly constrained by deadlines. In this presentation, efforts to establish routine benchmarking of existing and future CMIP simulations will be described. To date, some benchmarking tools have been made available to all CMIP modeling groups to enable them to readily compare with CMIP5 simulations during the model development process. A natural extension of this effort is to make results from all CMIP simulations widely available, including the results from newer models as soon as the simulations become available for research. Making the results from routine performance tests readily accessible will help advance a more transparent model evaluation process.

Quality assessment in head and neck oncologic surgery in a Brazilian cancer center compared with MD Anderson Cancer Center benchmarks.

PubMed

Lira, Renan Bezerra; de Carvalho, André Ywata; de Carvalho, Genival Barbosa; Lewis, Carol M; Weber, Randal S; Kowalski, Luiz Paulo

2016-07-01

Quality assessment is a major tool for evaluation of health care delivery. In head and neck surgery, the University of Texas MD Anderson Cancer Center (MD Anderson) has defined quality standards by publishing benchmarks. We conducted an analysis of 360 head and neck surgeries performed at the AC Camargo Cancer Center (AC Camargo). The procedures were stratified into low-acuity procedures (LAPs) or high-acuity procedures (HAPs) and outcome indicators where compared to MD Anderson benchmarks. In the 360 cases, there were 332 LAPs (92.2%) and 28 HAPs (7.8%). Patients with any comorbid condition had a higher incidence of negative outcome indicators (p = .005). In the LAPs, we achieved the MD Anderson benchmarks in all outcome indicators. In HAPs, the rate of surgical site infection and length of hospital stay were higher than what is established by the benchmarks. Quality assessment of head and neck surgery is possible and should be disseminated, improving effectiveness in health care delivery. © 2015 Wiley Periodicals, Inc. Head Neck 38: 1002-1007, 2016. © 2015 Wiley Periodicals, Inc.
Using a visual plate waste study to monitor menu performance.

PubMed

Connors, Priscilla L; Rozell, Sarah B

2004-01-01

Two visual plate waste studies were conducted in 1-week phases over a 1-year period in an acute care hospital. A total of 383 trays were evaluated in the first phase and 467 in the second. Food items were ranked for consumption from a low (1) to high (6) score, with a score of 4.0 set as the benchmark denoting a minimum level of acceptable consumption. In the first phase two entrees, four starches, all of the vegetables, sliced white bread, and skim milk scored below the benchmark. As a result six menu items were replaced and one was modified. In the second phase all entrees scored at or above 4.0, as did seven vegetables, and a dinner roll that replaced sliced white bread. Skim milk continued to score below the benchmark. A visual plate waste study assists in benchmarking performance, planning menu changes, and assessing effectiveness.
Closed-Loop Neuromorphic Benchmarks

PubMed Central

Stewart, Terrence C.; DeWolf, Travis; Kleinhans, Ashley; Eliasmith, Chris

2015-01-01

Evaluating the effectiveness and performance of neuromorphic hardware is difficult. It is even more difficult when the task of interest is a closed-loop task; that is, a task where the output from the neuromorphic hardware affects some environment, which then in turn affects the hardware's future input. However, closed-loop situations are one of the primary potential uses of neuromorphic hardware. To address this, we present a methodology for generating closed-loop benchmarks that makes use of a hybrid of real physical embodiment and a type of “minimal” simulation. Minimal simulation has been shown to lead to robust real-world performance, while still maintaining the practical advantages of simulation, such as making it easy for the same benchmark to be used by many researchers. This method is flexible enough to allow researchers to explicitly modify the benchmarks to identify specific task domains where particular hardware excels. To demonstrate the method, we present a set of novel benchmarks that focus on motor control for an arbitrary system with unknown external forces. Using these benchmarks, we show that an error-driven learning rule can consistently improve motor control performance across a randomly generated family of closed-loop simulations, even when there are up to 15 interacting joints to be controlled. PMID:26696820
CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction

PubMed Central

Puton, Tomasz; Kozlowski, Lukasz P.; Rother, Kristian M.; Bujnicki, Janusz M.

2013-01-01

We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on the average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On the average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks. PMID:23435231
An evidence-based approach to benchmarking the fairness of health-sector reform in developing countries.

PubMed Central

Daniels, Norman; Flores, Walter; Pannarunothai, Supasit; Ndumbe, Peter N.; Bryant, John H.; Ngulube, T. J.; Wang, Yuankun

2005-01-01

The Benchmarks of Fairness instrument is an evidence-based policy tool developed in generic form in 2000 for evaluating the effects of health-system reforms on equity, efficiency and accountability. By integrating measures of these effects on the central goal of fairness, the approach fills a gap that has hampered reform efforts for more than two decades. Over the past three years, projects in developing countries on three continents have adapted the generic version of these benchmarks for use at both national and subnational levels. Interdisciplinary teams of managers, providers, academics and advocates agree on the relevant criteria for assessing components of fairness and, depending on which aspects of reform they wish to evaluate, select appropriate indicators that rely on accessible information; they also agree on scoring rules for evaluating the diverse changes in the indicators. In contrast to a comprehensive index that aggregates all measured changes into a single evaluation or rank, the pattern of changes revealed by the benchmarks is used to inform policy deliberation aboutwhich aspects of the reforms have been successfully implemented, and it also allows for improvements to be made in the reforms. This approach permits useful evidence about reform to be gathered in settings where existing information is underused and where there is a weak information infrastructure. Brief descriptions of early results from Cameroon, Ecuador, Guatemala, Thailand and Zambia demonstrate that the method can produce results that are useful for policy and reveal the variety of purposes to which the approach can be put. Collaboration across sites can yield a catalogue of indicators that will facilitate further work. PMID:16175828
Curriculum-Based Measurement of Oral Reading: An Evaluation of Growth Rates and Seasonal Effects among Students Served in General and Special Education

ERIC Educational Resources Information Center

Christ, Theodore J.; Silberglitt, Benjamin; Yeo, Seungsoo; Cormier, Damien

2010-01-01

Curriculum-based measurement of oral reading (CBM-R) is often used to benchmark growth in the fall, winter, and spring. CBM-R is also used to set goals and monitor student progress between benchmarking occasions. The results of previous research establish an expectation that weekly growth on CBM-R tasks is consistently linear throughout the…
Development of risk-based nanomaterial groups for occupational exposure control

NASA Astrophysics Data System (ADS)

Kuempel, E. D.; Castranova, V.; Geraci, C. L.; Schulte, P. A.

2012-09-01

Given the almost limitless variety of nanomaterials, it will be virtually impossible to assess the possible occupational health hazard of each nanomaterial individually. The development of science-based hazard and risk categories for nanomaterials is needed for decision-making about exposure control practices in the workplace. A possible strategy would be to select representative (benchmark) materials from various mode of action (MOA) classes, evaluate the hazard and develop risk estimates, and then apply a systematic comparison of new nanomaterials with the benchmark materials in the same MOA class. Poorly soluble particles are used here as an example to illustrate quantitative risk assessment methods for possible benchmark particles and occupational exposure control groups, given mode of action and relative toxicity. Linking such benchmark particles to specific exposure control bands would facilitate the translation of health hazard and quantitative risk information to the development of effective exposure control practices in the workplace. A key challenge is obtaining sufficient dose-response data, based on standard testing, to systematically evaluate the nanomaterials' physical-chemical factors influencing their biological activity. Categorization processes involve both science-based analyses and default assumptions in the absence of substance-specific information. Utilizing data and information from related materials may facilitate initial determinations of exposure control systems for nanomaterials.
Benchmarking Gas Path Diagnostic Methods: A Public Approach

NASA Technical Reports Server (NTRS)

Simon, Donald L.; Bird, Jeff; Davison, Craig; Volponi, Al; Iverson, R. Eugene

2008-01-01

Recent technology reviews have identified the need for objective assessments of engine health management (EHM) technology. The need is two-fold: technology developers require relevant data and problems to design and validate new algorithms and techniques while engine system integrators and operators need practical tools to direct development and then evaluate the effectiveness of proposed solutions. This paper presents a publicly available gas path diagnostic benchmark problem that has been developed by the Propulsion and Power Systems Panel of The Technical Cooperation Program (TTCP) to help address these needs. The problem is coded in MATLAB (The MathWorks, Inc.) and coupled with a non-linear turbofan engine simulation to produce "snap-shot" measurements, with relevant noise levels, as if collected from a fleet of engines over their lifetime of use. Each engine within the fleet will experience unique operating and deterioration profiles, and may encounter randomly occurring relevant gas path faults including sensor, actuator and component faults. The challenge to the EHM community is to develop gas path diagnostic algorithms to reliably perform fault detection and isolation. An example solution to the benchmark problem is provided along with associated evaluation metrics. A plan is presented to disseminate this benchmark problem to the engine health management technical community and invite technology solutions.
Benchmarking and Self-Assessment in the Wine Industry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Galitsky, Christina; Radspieler, Anthony; Worrell, Ernst

2005-12-01

Not all industrial facilities have the staff or theopportunity to perform a detailed audit of their operations. The lack ofknowledge of energy efficiency opportunities provides an importantbarrier to improving efficiency. Benchmarking programs in the U.S. andabroad have shown to improve knowledge of the energy performance ofindustrial facilities and buildings and to fuel energy managementpractices. Benchmarking provides a fair way to compare the energyintensity of plants, while accounting for structural differences (e.g.,the mix of products produced, climate conditions) between differentfacilities. In California, the winemaking industry is not only one of theeconomic pillars of the economy; it is also a large energymore » consumer, witha considerable potential for energy-efficiency improvement. LawrenceBerkeley National Laboratory and Fetzer Vineyards developed the firstbenchmarking tool for the California wine industry called "BEST(Benchmarking and Energy and water Savings Tool) Winery". BEST Wineryenables a winery to compare its energy efficiency to a best practicereference winery. Besides overall performance, the tool enables the userto evaluate the impact of implementing efficiency measures. The toolfacilitates strategic planning of efficiency measures, based on theestimated impact of the measures, their costs and savings. The tool willraise awareness of current energy intensities and offer an efficient wayto evaluate the impact of future efficiency measures.« less
A benchmark for subduction zone modeling

NASA Astrophysics Data System (ADS)

van Keken, P.; King, S.; Peacock, S.

2003-04-01

Our understanding of subduction zones hinges critically on the ability to discern its thermal structure and dynamics. Computational modeling has become an essential complementary approach to observational and experimental studies. The accurate modeling of subduction zones is challenging due to the unique geometry, complicated rheological description and influence of fluid and melt formation. The complicated physics causes problems for the accurate numerical solution of the governing equations. As a consequence it is essential for the subduction zone community to be able to evaluate the ability and limitations of various modeling approaches. The participants of a workshop on the modeling of subduction zones, held at the University of Michigan at Ann Arbor, MI, USA in 2002, formulated a number of case studies to be developed into a benchmark similar to previous mantle convection benchmarks (Blankenbach et al., 1989; Busse et al., 1991; Van Keken et al., 1997). Our initial benchmark focuses on the dynamics of the mantle wedge and investigates three different rheologies: constant viscosity, diffusion creep, and dislocation creep. In addition we investigate the ability of codes to accurate model dynamic pressure and advection dominated flows. Proceedings of the workshop and the formulation of the benchmark are available at www.geo.lsa.umich.edu/~keken/subduction02.html We strongly encourage interested research groups to participate in this benchmark. At Nice 2003 we will provide an update and first set of benchmark results. Interested researchers are encouraged to contact one of the authors for further details.
Use the results of measurements on KBR facility for testing of neutron data of main structural materials for fast reactors

NASA Astrophysics Data System (ADS)

Koscheev, Vladimir; Manturov, Gennady; Pronyaev, Vladimir; Rozhikhin, Evgeny; Semenov, Mikhail; Tsibulya, Anatoly

2017-09-01

Several k∞ experiments were performed on the KBR critical facility at the Institute of Physics and Power Engineering (IPPE), Obninsk, Russia during the 1970s and 80s for study of neutron absorption properties of Cr, Mn, Fe, Ni, Zr, and Mo. Calculations of these benchmarks with almost any modern evaluated nuclear data libraries demonstrate bad agreement with the experiment. Neutron capture cross sections of the odd isotopes of Cr, Mn, Fe, and Ni in the ROSFOND-2010 library have been reevaluated and another evaluation of the Zr nuclear data has been adopted. Use of the modified nuclear data for Cr, Mn, Fe, Ni, and Zr leads to significant improvement of the C/E ratio for the KBR assemblies. Also a significant improvement in agreement between calculated and evaluated values for benchmarks with Fe reflectors was observed. C/E results obtained with the modified ROSFOND library for complex benchmark models that are highly sensitive to the cross sections of structural materials are no worse than results obtained with other major evaluated data libraries. Possible improvement in results by decreasing the capture cross section for Zr and Mo at the energies above 1 keV is indicated.
Use of integral experiments in support to the validation of JEFF-3.2 nuclear data evaluation

NASA Astrophysics Data System (ADS)

Leclaire, Nicolas; Cochet, Bertrand; Jinaphanh, Alexis; Haeck, Wim

2017-09-01

For many years now, IRSN has developed its own Monte Carlo continuous energy capability, which allows testing various nuclear data libraries. In that prospect, a validation database of 1136 experiments was built from cases used for the validation of the APOLLO2-MORET 5 multigroup route of the CRISTAL V2.0 package. In this paper, the keff obtained for more than 200 benchmarks using the JEFF-3.1.1 and JEFF-3.2 libraries are compared to benchmark keff values and main discrepancies are analyzed regarding the neutron spectrum. Special attention is paid on benchmarks for which the results have been highly modified between both JEFF-3 versions.
Experimental flutter boundaries with unsteady pressure distributions for the NACA 0012 Benchmark Model

NASA Technical Reports Server (NTRS)

Rivera, Jose A., Jr.; Dansberry, Bryan E.; Farmer, Moses G.; Eckstrom, Clinton V.; Seidel, David A.; Bennett, Robert M.

1991-01-01

The Structural Dynamics Div. at NASA-Langley has started a wind tunnel activity referred to as the Benchmark Models Program. The objective is to acquire test data that will be useful for developing and evaluating aeroelastic type Computational Fluid Dynamics codes currently in use or under development. The progress is described which was achieved in testing the first model in the Benchmark Models Program. Experimental flutter boundaries are presented for a rigid semispan model (NACA 0012 airfoil section) mounted on a flexible mount system. Also, steady and unsteady pressure measurements taken at the flutter condition are presented. The pressure data were acquired over the entire model chord located at the 60 pct. span station.
Benchmarking Discount Rate in Natural Resource Damage Assessment with Risk Aversion.

PubMed

Wu, Desheng; Chen, Shuzhen

2017-08-01

Benchmarking a credible discount rate is of crucial importance in natural resource damage assessment (NRDA) and restoration evaluation. This article integrates a holistic framework of NRDA with prevailing low discount rate theory, and proposes a discount rate benchmarking decision support system based on service-specific risk aversion. The proposed approach has the flexibility of choosing appropriate discount rates for gauging long-term services, as opposed to decisions based simply on duration. It improves injury identification in NRDA since potential damages and side-effects to ecosystem services are revealed within the service-specific framework. A real embankment case study demonstrates valid implementation of the method. © 2017 Society for Risk Analysis.
A benchmark for comparison of dental radiography analysis algorithms.

PubMed

Wang, Ching-Wei; Huang, Cheng-Ta; Lee, Jia-Hong; Li, Chung-Hsing; Chang, Sheng-Wei; Siao, Ming-Jhih; Lai, Tat-Ming; Ibragimov, Bulat; Vrtovec, Tomaž; Ronneberger, Olaf; Fischer, Philipp; Cootes, Tim F; Lindner, Claudia

2016-07-01

Dental radiography plays an important role in clinical diagnosis, treatment and surgery. In recent years, efforts have been made on developing computerized dental X-ray image analysis systems for clinical usages. A novel framework for objective evaluation of automatic dental radiography analysis algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2015 Bitewing Radiography Caries Detection Challenge and Cephalometric X-ray Image Analysis Challenge. In this article, we present the datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark. The main contributions of the challenge include the creation of the dental anatomy data repository of bitewing radiographs, the creation of the anatomical abnormality classification data repository of cephalometric radiographs, and the definition of objective quantitative evaluation for comparison and ranking of the algorithms. With this benchmark, seven automatic methods for analysing cephalometric X-ray image and two automatic methods for detecting bitewing radiography caries have been compared, and detailed quantitative evaluation results are presented in this paper. Based on the quantitative evaluation results, we believe automatic dental radiography analysis is still a challenging and unsolved problem. The datasets and the evaluation software will be made available to the research community, further encouraging future developments in this field. (http://www-o.ntust.edu.tw/~cweiwang/ISBI2015/). Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Benchmarking of dynamic simulation predictions in two software platforms using an upper limb musculoskeletal model

PubMed Central

Saul, Katherine R.; Hu, Xiao; Goehler, Craig M.; Vidt, Meghan E.; Daly, Melissa; Velisar, Anca; Murray, Wendy M.

2014-01-01

Several opensource or commercially available software platforms are widely used to develop dynamic simulations of movement. While computational approaches are conceptually similar across platforms, technical differences in implementation may influence output. We present a new upper limb dynamic model as a tool to evaluate potential differences in predictive behavior between platforms. We evaluated to what extent differences in technical implementations in popular simulation software environments result in differences in kinematic predictions for single and multijoint movements using EMG- and optimization-based approaches for deriving control signals. We illustrate the benchmarking comparison using SIMM-Dynamics Pipeline-SD/Fast and OpenSim platforms. The most substantial divergence results from differences in muscle model and actuator paths. This model is a valuable resource and is available for download by other researchers. The model, data, and simulation results presented here can be used by future researchers to benchmark other software platforms and software upgrades for these two platforms. PMID:24995410
Benchmarking of dynamic simulation predictions in two software platforms using an upper limb musculoskeletal model.

PubMed

Saul, Katherine R; Hu, Xiao; Goehler, Craig M; Vidt, Meghan E; Daly, Melissa; Velisar, Anca; Murray, Wendy M

2015-01-01

Several opensource or commercially available software platforms are widely used to develop dynamic simulations of movement. While computational approaches are conceptually similar across platforms, technical differences in implementation may influence output. We present a new upper limb dynamic model as a tool to evaluate potential differences in predictive behavior between platforms. We evaluated to what extent differences in technical implementations in popular simulation software environments result in differences in kinematic predictions for single and multijoint movements using EMG- and optimization-based approaches for deriving control signals. We illustrate the benchmarking comparison using SIMM-Dynamics Pipeline-SD/Fast and OpenSim platforms. The most substantial divergence results from differences in muscle model and actuator paths. This model is a valuable resource and is available for download by other researchers. The model, data, and simulation results presented here can be used by future researchers to benchmark other software platforms and software upgrades for these two platforms.
NASA wide electronic publishing system: Electronic printing and duplicating. Stage 3 evaluation report

NASA Technical Reports Server (NTRS)

Tuey, Richard C.; Moore, Fred W.; Ryan, Christine A.

1995-01-01

The report is presented in four sections: The Introduction describes the duplicating configuration under evaluation and the Background contains a chronological description of the evaluation segmented by phases 1 and 2. This section includes the evaluation schedule, printing and duplicating requirements, storage and communication requirements, electronic publishing system configuration, existing processes and proposed processes, billing rates, costs and productivity analysis, and the return on investment based upon the data gathered to date. The third section contains the phase 1 comparative cost and productivity analysis. This analysis demonstrated that LaRC should proceed with a 90-day evaluation of the DocuTech and follow with a phase 2 cycle to actually demonstrate that the proposed system would meet the needs of LaRC's printing and duplicating requirements, benchmark results, cost comparisons, benchmark observations, and recommendations. These are documented after the recommendations.
Reliable B Cell Epitope Predictions: Impacts of Method Development and Improved Benchmarking

PubMed Central

Kringelum, Jens Vindahl; Lundegaard, Claus; Lund, Ole; Nielsen, Morten

2012-01-01

The interaction between antibodies and antigens is one of the most important immune system mechanisms for clearing infectious organisms from the host. Antibodies bind to antigens at sites referred to as B-cell epitopes. Identification of the exact location of B-cell epitopes is essential in several biomedical applications such as; rational vaccine design, development of disease diagnostics and immunotherapeutics. However, experimental mapping of epitopes is resource intensive making in silico methods an appealing complementary approach. To date, the reported performance of methods for in silico mapping of B-cell epitopes has been moderate. Several issues regarding the evaluation data sets may however have led to the performance values being underestimated: Rarely, all potential epitopes have been mapped on an antigen, and antibodies are generally raised against the antigen in a given biological context not against the antigen monomer. Improper dealing with these aspects leads to many artificial false positive predictions and hence to incorrect low performance values. To demonstrate the impact of proper benchmark definitions, we here present an updated version of the DiscoTope method incorporating a novel spatial neighborhood definition and half-sphere exposure as surface measure. Compared to other state-of-the-art prediction methods, Discotope-2.0 displayed improved performance both in cross-validation and in independent evaluations. Using DiscoTope-2.0, we assessed the impact on performance when using proper benchmark definitions. For 13 proteins in the training data set where sufficient biological information was available to make a proper benchmark redefinition, the average AUC performance was improved from 0.791 to 0.824. Similarly, the average AUC performance on an independent evaluation data set improved from 0.712 to 0.727. Our results thus demonstrate that given proper benchmark definitions, B-cell epitope prediction methods achieve highly significant predictive performances suggesting these tools to be a powerful asset in rational epitope discovery. The updated version of DiscoTope is available at www.cbs.dtu.dk/services/DiscoTope-2.0. PMID:23300419
Benchmarking is associated with improved quality of care in type 2 diabetes: the OPTIMISE randomized, controlled trial.

PubMed

Hermans, Michel P; Elisaf, Moses; Michel, Georges; Muls, Erik; Nobels, Frank; Vandenberghe, Hans; Brotons, Carlos

2013-11-01

To assess prospectively the effect of benchmarking on quality of primary care for patients with type 2 diabetes by using three major modifiable cardiovascular risk factors as critical quality indicators. Primary care physicians treating patients with type 2 diabetes in six European countries were randomized to give standard care (control group) or standard care with feedback benchmarked against other centers in each country (benchmarking group). In both groups, laboratory tests were performed every 4 months. The primary end point was the percentage of patients achieving preset targets of the critical quality indicators HbA1c, LDL cholesterol, and systolic blood pressure (SBP) after 12 months of follow-up. Of 4,027 patients enrolled, 3,996 patients were evaluable and 3,487 completed 12 months of follow-up. Primary end point of HbA1c target was achieved in the benchmarking group by 58.9 vs. 62.1% in the control group (P = 0.398) after 12 months; 40.0 vs. 30.1% patients met the SBP target (P < 0.001); 54.3 vs. 49.7% met the LDL cholesterol target (P = 0.006). Percentages of patients meeting all three targets increased during the study in both groups, with a statistically significant increase observed in the benchmarking group. The percentage of patients achieving all three targets at month 12 was significantly larger in the benchmarking group than in the control group (12.5 vs. 8.1%; P < 0.001). In this prospective, randomized, controlled study, benchmarking was shown to be an effective tool for increasing achievement of critical quality indicators and potentially reducing patient cardiovascular residual risk profile.

Benchmarking Is Associated With Improved Quality of Care in Type 2 Diabetes

PubMed Central

Hermans, Michel P.; Elisaf, Moses; Michel, Georges; Muls, Erik; Nobels, Frank; Vandenberghe, Hans; Brotons, Carlos

2013-01-01

OBJECTIVE To assess prospectively the effect of benchmarking on quality of primary care for patients with type 2 diabetes by using three major modifiable cardiovascular risk factors as critical quality indicators. RESEARCH DESIGN AND METHODS Primary care physicians treating patients with type 2 diabetes in six European countries were randomized to give standard care (control group) or standard care with feedback benchmarked against other centers in each country (benchmarking group). In both groups, laboratory tests were performed every 4 months. The primary end point was the percentage of patients achieving preset targets of the critical quality indicators HbA1c, LDL cholesterol, and systolic blood pressure (SBP) after 12 months of follow-up. RESULTS Of 4,027 patients enrolled, 3,996 patients were evaluable and 3,487 completed 12 months of follow-up. Primary end point of HbA1c target was achieved in the benchmarking group by 58.9 vs. 62.1% in the control group (P = 0.398) after 12 months; 40.0 vs. 30.1% patients met the SBP target (P < 0.001); 54.3 vs. 49.7% met the LDL cholesterol target (P = 0.006). Percentages of patients meeting all three targets increased during the study in both groups, with a statistically significant increase observed in the benchmarking group. The percentage of patients achieving all three targets at month 12 was significantly larger in the benchmarking group than in the control group (12.5 vs. 8.1%; P < 0.001). CONCLUSIONS In this prospective, randomized, controlled study, benchmarking was shown to be an effective tool for increasing achievement of critical quality indicators and potentially reducing patient cardiovascular residual risk profile. PMID:23846810
Performance criteria and quality indicators for the post-analytical phase.

PubMed

Sciacovelli, Laura; Aita, Ada; Padoan, Andrea; Pelloso, Michela; Antonelli, Giorgia; Piva, Elisa; Chiozza, Maria Laura; Plebani, Mario

2016-07-01

Quality indicators (QIs) used as performance measurements are an effective tool in accurately estimating quality, identifying problems that may need to be addressed, and monitoring the processes over time. In Laboratory Medicine, QIs should cover all steps of the testing process, as error studies have confirmed that most errors occur in the pre- and post-analytical phase of testing. Aim of the present study is to provide preliminary results on QIs and related performance criteria in the post-analytical phase. This work was conducted according to a previously described study design based on the voluntary participation of clinical laboratories in the project on QIs of the Working Group "Laboratory Errors and Patient Safety" (WG-LEPS) of the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC). Overall, data collected highlighted an improvement or stability in performances over time for all reported indicators thus demonstrating that the use of QIs is effective in the quality improvement strategy. Moreover, QIs data are an important source for defining the state-of-the-art concerning the error rate in the total testing process. The definition of performance specifications based on the state-of-the-art, as suggested by consensus documents, is a valuable benchmark point in evaluating the performance of each laboratory. Laboratory tests play a relevant role in the monitoring and evaluation of the efficacy of patient outcome thus assisting clinicians in decision-making. Laboratory performance evaluation is therefore crucial to providing patients with safe, effective and efficient care.
Porting a Hall MHD Code to a Graphic Processing Unit

NASA Technical Reports Server (NTRS)

Dorelli, John C.

2011-01-01

We present our experience porting a Hall MHD code to a Graphics Processing Unit (GPU). The code is a 2nd order accurate MUSCL-Hancock scheme which makes use of an HLL Riemann solver to compute numerical fluxes and second-order finite differences to compute the Hall contribution to the electric field. The divergence of the magnetic field is controlled with Dedner?s hyperbolic divergence cleaning method. Preliminary benchmark tests indicate a speedup (relative to a single Nehalem core) of 58x for a double precision calculation. We discuss scaling issues which arise when distributing work across multiple GPUs in a CPU-GPU cluster.
Benchmarking short sequence mapping tools

PubMed Central

2013-01-01

Background The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the current proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. Results We applied our benchmarking tests on 9 well known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. Conclusion The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify his needs in order to choose the tool that provides the best results. PMID:23758764
Assessing potential health risks to fish and humans using mercury concentrations in inland fish from across western Canada and the United States

USGS Publications Warehouse

Lepak, Jesse M.; Hooten, Mevin B.; Eagles-Smith, Collin A.; Tate, Michael T.; Lutz, Michelle A.; Ackerman, Joshua T.; Willacker, James J.; Jackson, Allyson K.; Evers, David C.; Wiener, James G.; Pritz, Colleen Flanagan; Davis, Jay

2016-01-01

Fish represent high quality protein and nutrient sources, but Hg contamination is ubiquitous in aquatic ecosystems and can pose health risks to fish and their consumers. Potential health risks posed to fish and humans by Hg contamination in fish were assessed in western Canada and the United States. A large compilation of inland fish Hg concentrations was evaluated in terms of potential health risk to the fish themselves, health risk to predatory fish that consume Hg contaminated fish, and to humans that consume Hg contaminated fish. The probability that a fish collected from a given location would exceed a Hg concentration benchmark relevant to a health risk was calculated. These exceedance probabilities and their associated uncertainties were characterized for fish of multiple size classes at multiple health-relevant benchmarks. The approach was novel and allowed for the assessment of the potential for deleterious health effects in fish and humans associated with Hg contamination in fish across this broad study area. Exceedance probabilities were relatively common at low Hg concentration benchmarks, particularly for fish in larger size classes. Specifically, median exceedances for the largest size classes of fish evaluated at the lowest Hg concentration benchmarks were 0.73 (potential health risks to fish themselves), 0.90 (potential health risk to predatory fish that consume Hg contaminated fish), and 0.97 (potential for restricted fish consumption by humans), but diminished to essentially zero at the highest benchmarks and smallest fish size classes. Exceedances of benchmarks are likely to have deleterious health effects on fish and limit recommended amounts of fish humans consume in western Canada and the United States. Results presented here are not intended to subvert or replace local fish Hg data or consumption advice, but provide a basis for identifying areas of potential health risk and developing more focused future research and monitoring efforts.
Incremental cost effectiveness evaluation in clinical research.

PubMed

Krummenauer, Frank; Landwehr, I

2005-01-28

The health economic evaluation of therapeutic and diagnostic strategies is of increasing importance in clinical research. Therefore also clinical trialists have to involve health economic aspects more frequently. However, whereas they are quite familiar with classical effect measures in clinical trials, the corresponding parameters in health economic evaluation of therapeutic and diagnostic procedures are still not this common. The concepts of incremental cost effectiveness ratios (ICERs) and incremental net health benefit (INHB) will be illustrated and contrasted along the cost effectiveness evaluation of cataract surgery with monofocal and multifocal intraocular lenses. ICERs relate the costs of a treatment to its clinical benefit in terms of a ratio expression (indexed as Euro per clinical benefit unit). Therefore ICERs can be directly compared to a pre-specified willingness to pay (WTP) benchmark, which represents the maximum costs, health insurers would invest to achieve one clinical benefit unit. INHBs estimate a treatment's net clinical benefit after accounting for its cost increase versus an established therapeutic standard. Resource allocation rules can be formulated by means of both effect measures. Both the ICER and the INHB approach enable the definition of directional resource allocation rules. The allocation decisions arising from these rules are identical, as long as the willingness to pay benchmark is fixed in advance. Therefore both strategies crucially call for a priori determination of both the underlying clinical benefit endpoint (such as gain in vision lines after cataract surgery or gain in quality-adjusted life years) and the corresponding willingness to pay benchmark. The use of incremental cost effectiveness and net health benefit estimates provides a rationale for health economic allocation discussions and founding decisions. It implies the same requirements on trial protocols as yet established for clinical trials, that is the a priori definition of primary hypotheses (formulated as an allocation rule involving a pre-specified willingness to pay benchmark) and the primary clinical benefit endpoint (as a rationale for effectiveness evaluation).
Benchmarking government action for obesity prevention--an innovative advocacy strategy.

PubMed

Martin, J; Peeters, A; Honisett, S; Mavoa, H; Swinburn, B; de Silva-Sanigorski, A

2014-01-01

Successful obesity prevention will require a leading role for governments, but internationally they have been slow to act. League tables of benchmark indicators of action can be a valuable advocacy and evaluation tool. To develop a benchmarking tool for government action on obesity prevention, implement it across Australian jurisdictions and to publicly award the best and worst performers. A framework was developed which encompassed nine domains, reflecting best practice government action on obesity prevention: whole-of-government approaches; marketing restrictions; access to affordable, healthy food; school food and physical activity; food in public facilities; urban design and transport; leisure and local environments; health services, and; social marketing. A scoring system was used by non-government key informants to rate the performance of their government. National rankings were generated and the results were communicated to all Premiers/Chief Ministers, the media and the national obesity research and practice community. Evaluation of the initial tool in 2010 showed it to be feasible to implement and able to discriminate the better and worse performing governments. Evaluation of the rubric in 2011 confirmed this to be a robust and useful method. In relation to government action, the best performing governments were those with whole-of-government approaches, had extended common initiatives and demonstrated innovation and strong political will. This new benchmarking tool, the Obesity Action Award, has enabled identification of leading government action on obesity prevention and the key characteristics associated with their success. We recommend this tool for other multi-state/country comparisons. Copyright © 2013 Asian Oceanian Association for the Study of Obesity. Published by Elsevier Ltd. All rights reserved.
Evolutionary algorithms for the optimal management of coastal groundwater: A comparative study toward future challenges

NASA Astrophysics Data System (ADS)

Ketabchi, Hamed; Ataie-Ashtiani, Behzad

2015-01-01

This paper surveys the literature associated with the application of evolutionary algorithms (EAs) in coastal groundwater management problems (CGMPs). This review demonstrates that previous studies were mostly relied on the application of limited and particular EAs, mainly genetic algorithm (GA) and its variants, to a number of specific problems. The exclusive investigation of these problems is often not the representation of the variety of feasible processes may be occurred in coastal aquifers. In this study, eight EAs are evaluated for CGMPs. The considered EAs are: GA, continuous ant colony optimization (CACO), particle swarm optimization (PSO), differential evolution (DE), artificial bee colony optimization (ABC), harmony search (HS), shuffled complex evolution (SCE), and simplex simulated annealing (SIMPSA). The first application of PSO, ABC, HS, and SCE in CGMPs is reported here. Moreover, the four benchmark problems with different degree of difficulty and variety are considered to address the important issues of groundwater resources in coastal regions. Hence, the wide ranges of popular objective functions and constraints with the number of decision variables ranging from 4 to 15 are included. These benchmark problems are applied in the combined simulation-optimization model to examine the optimization scenarios. Some preliminary experiments are performed to select the most efficient parameters values for EAs to set a fair comparison. The specific capabilities of each EA toward CGMPs in terms of results quality and required computational time are compared. The evaluation of the results highlights EA's applicability in CGMPs, besides the remarkable strengths and weaknesses of them. The comparisons show that SCE, CACO, and PSO yield superior solutions among the EAs according to the quality of solutions whereas ABC presents the poor performance. CACO provides the better solutions (up to 17%) than the worst EA (ABC) for the problem with the highest decision variables and more complexity. In terms of computational time, PSO and SIMPSA are the fastest. SCE needs the highest computational time, even up to four times in comparison to the fastest EAs. CACO and PSO can be recommended for application in CGMPs, in terms of both abovementioned criteria.
MoleculeNet: a benchmark for molecular machine learning† †Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc02664a

PubMed Central

Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl

2017-01-01

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm. PMID:29629118
Encoding color information for visual tracking: Algorithms and benchmark.

PubMed

Liang, Pengpeng; Blasch, Erik; Ling, Haibin

2015-12-01

While color information is known to provide rich discriminative clues for visual inference, most modern visual trackers limit themselves to the grayscale realm. Despite recent efforts to integrate color in tracking, there is a lack of comprehensive understanding of the role color information can play. In this paper, we attack this problem by conducting a systematic study from both the algorithm and benchmark perspectives. On the algorithm side, we comprehensively encode 10 chromatic models into 16 carefully selected state-of-the-art visual trackers. On the benchmark side, we compile a large set of 128 color sequences with ground truth and challenge factor annotations (e.g., occlusion). A thorough evaluation is conducted by running all the color-encoded trackers, together with two recently proposed color trackers. A further validation is conducted on an RGBD tracking benchmark. The results clearly show the benefit of encoding color information for tracking. We also perform detailed analysis on several issues, including the behavior of various combinations between color model and visual tracker, the degree of difficulty of each sequence for tracking, and how different challenge factors affect the tracking performance. We expect the study to provide the guidance, motivation, and benchmark for future work on encoding color in visual tracking.
ENDF/B-VII.1 Neutron Cross Section Data Testing with Critical Assembly Benchmarks and Reactor Experiments

NASA Astrophysics Data System (ADS)

Kahler, A. C.; MacFarlane, R. E.; Mosteller, R. D.; Kiedrowski, B. C.; Frankle, S. C.; Chadwick, M. B.; McKnight, R. D.; Lell, R. M.; Palmiotti, G.; Hiruta, H.; Herman, M.; Arcilla, R.; Mughabghab, S. F.; Sublet, J. C.; Trkov, A.; Trumbull, T. H.; Dunn, M.

2011-12-01

The ENDF/B-VII.1 library is the latest revision to the United States' Evaluated Nuclear Data File (ENDF). The ENDF library is currently in its seventh generation, with ENDF/B-VII.0 being released in 2006. This revision expands upon that library, including the addition of new evaluated files (was 393 neutron files previously, now 423 including replacement of elemental vanadium and zinc evaluations with isotopic evaluations) and extension or updating of many existing neutron data files. Complete details are provided in the companion paper [M. B. Chadwick et al., "ENDF/B-VII.1 Nuclear Data for Science and Technology: Cross Sections, Covariances, Fission Product Yields and Decay Data," Nuclear Data Sheets, 112, 2887 (2011)]. This paper focuses on how accurately application libraries may be expected to perform in criticality calculations with these data. Continuous energy cross section libraries, suitable for use with the MCNP Monte Carlo transport code, have been generated and applied to a suite of nearly one thousand critical benchmark assemblies defined in the International Criticality Safety Benchmark Evaluation Project's International Handbook of Evaluated Criticality Safety Benchmark Experiments. This suite covers uranium and plutonium fuel systems in a variety of forms such as metallic, oxide or solution, and under a variety of spectral conditions, including unmoderated (i.e., bare), metal reflected and water or other light element reflected. Assembly eigenvalues that were accurately predicted with ENDF/B-VII.0 cross sections such as unmoderated and uranium reflected 235U and 239Pu assemblies, HEU solution systems and LEU oxide lattice systems that mimic commercial PWR configurations continue to be accurately calculated with ENDF/B-VII.1 cross sections, and deficiencies in predicted eigenvalues for assemblies containing selected materials, including titanium, manganese, cadmium and tungsten are greatly reduced. Improvements are also confirmed for selected actinide reaction rates such as 236U, 238,242Pu and 241,243Am capture in fast systems. Other deficiencies, such as the overprediction of Pu solution system critical eigenvalues and a decreasing trend in calculated eigenvalue for 233U fueled systems as a function of Above-Thermal Fission Fraction remain. The comprehensive nature of this critical benchmark suite and the generally accurate calculated eigenvalues obtained with ENDF/B-VII.1 neutron cross sections support the conclusion that this is the most accurate general purpose ENDF/B cross section library yet released to the technical community.
Structural Benchmark Tests of Composite Combustion Chamber Support Completed

NASA Technical Reports Server (NTRS)

Krause, David L.; Thesken, John C.; Shin, E. Eugene; Sutter, James K.

2005-01-01

A series of mechanical load tests was completed on several novel design concepts for extremely lightweight combustion chamber support structures at the NASA Glenn Research Center (http://www.nasa.gov/glenn/). The tests included compliance evaluation, preliminary proof loadings, high-strain cyclic testing, and finally residual strength testing of each design (see the photograph on the left). Loads were applied with single rollers (see the photograph on the right) or pressure plates (not shown) located midspan on each side to minimize the influence of contact stresses on corner deformation measurements. Where rollers alone were used, a more severe structural loading was produced than the corresponding equal-force pressure loading: the maximum transverse shear force existed over the entire length of each side, and the corner bending moments were greater than for a distributed (pressure) loading. Failure modes initiating at the corner only provided a qualitative indication of the performance limitations since the stress state was not identical to internal pressure. Configurations were tested at both room and elevated temperatures. Experimental results were used to evaluate analytical prediction tools and finite-element methodologies for future work, and they were essential to provide insight into the deformation at the corners. The tests also were used to assess fabrication and bonding details for the complicated structures. They will be used to further optimize the design of the support structures for weight performance and the efficacy of corner reinforcement.
The iEvaluate OSD Guidelines and Exemplars: A Disability Services Evaluation Tool

ERIC Educational Resources Information Center

Dukes, Lyman, III

2011-01-01

Program evaluation is rapidly becoming the norm in higher education and this includes disability services. Postsecondary institutions increasingly encourage disability service programs to demonstrate accountability specified through appropriate benchmarks. However, professionals in disability service offices typically report that while they…
Interactive visual optimization and analysis for RFID benchmarking.

PubMed

Wu, Yingcai; Chung, Ka-Kei; Qu, Huamin; Yuan, Xiaoru; Cheung, S C

2009-01-01

Radio frequency identification (RFID) is a powerful automatic remote identification technique that has wide applications. To facilitate RFID deployment, an RFID benchmarking instrument called aGate has been invented to identify the strengths and weaknesses of different RFID technologies in various environments. However, the data acquired by aGate are usually complex time varying multidimensional 3D volumetric data, which are extremely challenging for engineers to analyze. In this paper, we introduce a set of visualization techniques, namely, parallel coordinate plots, orientation plots, a visual history mechanism, and a 3D spatial viewer, to help RFID engineers analyze benchmark data visually and intuitively. With the techniques, we further introduce two workflow procedures (a visual optimization procedure for finding the optimum reader antenna configuration and a visual analysis procedure for comparing the performance and identifying the flaws of RFID devices) for the RFID benchmarking, with focus on the performance analysis of the aGate system. The usefulness and usability of the system are demonstrated in the user evaluation.
Time and frequency structure of causal correlation networks in the China bond market

NASA Astrophysics Data System (ADS)

Wang, Zhongxing; Yan, Yan; Chen, Xiaosong

2017-07-01

There are more than eight hundred interest rates published in the China bond market every day. Identifying the benchmark interest rates that have broad influences on most other interest rates is a major concern for economists. In this paper, a multi-variable Granger causality test is developed and applied to construct a directed network of interest rates, whose important nodes, regarded as key interest rates, are evaluated with CheiRank scores. The results indicate that repo rates are the benchmark of short-term rates, the central bank bill rates are in the core position of mid-term interest rates network, and treasury bond rates lead the long-term bond rates. The evolution of benchmark interest rates from 2008 to 2014 is also studied, and it is found that SHIBOR has generally become the benchmark interest rate in China. In the frequency domain we identify the properties of information flows between interest rates, and the result confirms the existence of market segmentation in the China bond market.
Benchmarking FEniCS for mantle convection simulations

NASA Astrophysics Data System (ADS)

Vynnytska, L.; Rognes, M. E.; Clark, S. R.

2013-01-01

This paper evaluates the usability of the FEniCS Project for mantle convection simulations by numerical comparison to three established benchmarks. The benchmark problems all concern convection processes in an incompressible fluid induced by temperature or composition variations, and cover three cases: (i) steady-state convection with depth- and temperature-dependent viscosity, (ii) time-dependent convection with constant viscosity and internal heating, and (iii) a Rayleigh-Taylor instability. These problems are modeled by the Stokes equations for the fluid and advection-diffusion equations for the temperature and composition. The FEniCS Project provides a novel platform for the automated solution of differential equations by finite element methods. In particular, it offers a significant flexibility with regard to modeling and numerical discretization choices; we have here used a discontinuous Galerkin method for the numerical solution of the advection-diffusion equations. Our numerical results are in agreement with the benchmarks, and demonstrate the applicability of both the discontinuous Galerkin method and FEniCS for such applications.
Analyzing the BBOB results by means of benchmarking concepts.

PubMed

Mersmann, O; Preuss, M; Trautmann, H; Bischl, B; Weihs, C

2015-01-01

We present methods to answer two basic questions that arise when benchmarking optimization algorithms. The first one is: which algorithm is the "best" one? and the second one is: which algorithm should I use for my real-world problem? Both are connected and neither is easy to answer. We present a theoretical framework for designing and analyzing the raw data of such benchmark experiments. This represents a first step in answering the aforementioned questions. The 2009 and 2010 BBOB benchmark results are analyzed by means of this framework and we derive insight regarding the answers to the two questions. Furthermore, we discuss how to properly aggregate rankings from algorithm evaluations on individual problems into a consensus, its theoretical background and which common pitfalls should be avoided. Finally, we address the grouping of test problems into sets with similar optimizer rankings and investigate whether these are reflected by already proposed test problem characteristics, finding that this is not always the case.
A new DWT/MC/DPCM video compression framework based on EBCOT

NASA Astrophysics Data System (ADS)

Mei, L. M.; Wu, H. R.; Tan, D. M.

2005-07-01

A novel Discrete Wavelet Transform (DWT)/Motion Compensation (MC)/Differential Pulse Code Modulation (DPCM) video compression framework is proposed in this paper. Although the Discrete Cosine Transform (DCT)/MC/DPCM is the mainstream framework for video coders in industry and international standards, the idea of DWT/MC/DPCM has existed for more than one decade in the literature and the investigation is still undergoing. The contribution of this work is twofold. Firstly, the Embedded Block Coding with Optimal Truncation (EBCOT) is used here as the compression engine for both intra- and inter-frame coding, which provides good compression ratio and embedded rate-distortion (R-D) optimization mechanism. This is an extension of the EBCOT application from still images to videos. Secondly, this framework offers a good interface for the Perceptual Distortion Measure (PDM) based on the Human Visual System (HVS) where the Mean Squared Error (MSE) can be easily replaced with the PDM in the R-D optimization. Some of the preliminary results are reported here. They are also compared with benchmarks such as MPEG-2 and MPEG-4 version 2. The results demonstrate that under specified condition the proposed coder outperforms the benchmarks in terms of rate vs. distortion.
A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Madduri, Kamesh; Ediger, David; Jiang, Karl

2009-02-15

We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 millionmore » vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Madduri, Kamesh; Ediger, David; Jiang, Karl

2009-05-29

We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor.more » For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less

Performance Monitoring of Distributed Data Processing Systems

NASA Technical Reports Server (NTRS)

Ojha, Anand K.

2000-01-01

Test and checkout systems are essential components in ensuring safety and reliability of aircraft and related systems for space missions. A variety of systems, developed over several years, are in use at the NASA/KSC. Many of these systems are configured as distributed data processing systems with the functionality spread over several multiprocessor nodes interconnected through networks. To be cost-effective, a system should take the least amount of resource and perform a given testing task in the least amount of time. There are two aspects of performance evaluation: monitoring and benchmarking. While monitoring is valuable to system administrators in operating and maintaining, benchmarking is important in designing and upgrading computer-based systems. These two aspects of performance evaluation are the foci of this project. This paper first discusses various issues related to software, hardware, and hybrid performance monitoring as applicable to distributed systems, and specifically to the TCMS (Test Control and Monitoring System). Next, a comparison of several probing instructions are made to show that the hybrid monitoring technique developed by the NIST (National Institutes for Standards and Technology) is the least intrusive and takes only one-fourth of the time taken by software monitoring probes. In the rest of the paper, issues related to benchmarking a distributed system have been discussed and finally a prescription for developing a micro-benchmark for the TCMS has been provided.
Evaluation of triclosan in Minnesota lakes and rivers: Part II - human health risk assessment.

PubMed

Yost, Lisa J; Barber, Timothy R; Gentry, P Robinan; Bock, Michael J; Lyndall, Jennifer L; Capdevielle, Marie C; Slezak, Brian P

2017-08-01

Triclosan, an antimicrobial compound found in consumer products, has been detected in low concentrations in Minnesota municipal wastewater treatment plant (WWTP) effluent. This assessment evaluates potential health risks for exposure of adults and children to triclosan in Minnesota surface water, sediments, and fish. Potential exposures via fish consumption are considered for recreational or subsistence-level consumers. This assessment uses two chronic oral toxicity benchmarks, which bracket other available toxicity values. The first benchmark is a lower bound on a benchmark dose associated with a 10% risk (BMDL 10 ) of 47mg per kilogram per day (mg/kg-day) for kidney effects in hamsters. This value was identified as the most sensitive endpoint and species in a review by Rodricks et al. (2010) and is used herein to derive an estimated reference dose (RfD (Rodricks) ) of 0.47mg/kg-day. The second benchmark is a reference dose (RfD) of 0.047mg/kg-day derived from a no observed adverse effect level (NOAEL) of 10mg/kg-day for hepatic and hematopoietic effects in mice (Minnesota Department of Health [MDH] 2014). Based on conservative assumptions regarding human exposures to triclosan, calculated risk estimates are far below levels of concern. These estimates are likely to overestimate risks for potential receptors, particularly because sample locations were generally biased towards known discharges (i.e., WWTP effluent). Copyright © 2017 Elsevier Inc. All rights reserved.
Vacuum simulation and characterization for the Linac4 H- source

NASA Astrophysics Data System (ADS)

Pasquino, C.; Chiggiato, P.; Michet, A.; Hansen, J.; Lettry, J.

2013-02-01

At CERN, the 160 MeV H- Linac4 will soon replace the 50 MeV proton Linac2. In the H- source two major sources of gas are identified. The first is the pulsed injection at about 0.1 mbar in the plasma chamber. The second is the constant H2 injection up to 10-5 mbar in the LEBT for beam space charge compensation. In addition, the outgassing of materials exposed to vacuum can play an important role in contamination control and global gas balance. To evaluate the time dependent partial pressure profiles in the H- ion source and the RFQ, electrical network - vacuum analogy and test particle Monte Carlo simulation have been used. The simulation outcome indicates that the pressure requirements are in the reach of the proposed vacuum pumping system. Preliminary results show good agreement between the experimental and the simulated pressure profiles; a calibration campaign is in progress to fully benchmark the implemented calculations. Systematic outgassing rate measurements are on-going for critical components in the ion source and RFQ. Amongst them those for the Cu-coated SmCo magnet located in the vacuum system of the biased electron dump electrode, show results lower to stainless steel at room temperature.
Peracetic acid for secondary effluent disinfection: a comprehensive performance assessment.

PubMed

Antonelli, M; Turolla, A; Mezzanotte, V; Nurizzo, C

2013-01-01

The paper is a review of previous research on secondary effluent disinfection by peracetic acid (PAA) integrated with new data about the effect of a preliminary flash-mixing step. The process was studied at bench and pilot scale to assess its performance for discharge in surface water and agricultural reuse (target microorganisms: Escherichia coli and faecal coliform bacteria). The purposes of the research were: (1) determining PAA decay and disinfection kinetics as a function of operating parameters, (2) evaluating PAA suitability as a disinfectant, (3) assessing long-term disinfection efficiency, (4) investigating disinfected effluent biological toxicity on some aquatic indicator organisms (Vibrio fischeri, Daphnia magna and Selenastrum capricornutum), (5) comparing PAA with conventional disinfectants (sodium hypochlorite, UV irradiation). PAA disinfection was capable of complying with Italian regulations on reuse (10 CFU/100 mL for E. coli) and was competitive with benchmarks. No regrowth phenomena were observed, as long as needed for agricultural reuse (29 h after disinfection), even at negligible concentrations of residual disinfectant. The toxic effect of PAA on the aquatic environment was due to the residual disinfectant in the water, rather than to chemical modification of the effluent.
Hollow fiber: a biophotonic implant for live cells

NASA Astrophysics Data System (ADS)

Silvestre, Oscar F.; Holton, Mark D.; Summers, Huw D.; Smith, Paul J.; Errington, Rachel J.

2009-02-01

The technical objective of this study has been to design, build and validate biocompatible hollow fiber implants based on fluorescence with integrated biophotonics components to enable in fiber kinetic cell based assays. A human osteosarcoma in vitro cell model fiber system has been established with validation studies to determine in fiber cell growth, cell cycle analysis and organization in normal and drug treated conditions. The rationale for implant development have focused on developing benchmark concepts in standard monolayer tissue culture followed by the development of in vitro hollow fiber designs; encompassing imaging with and without integrated biophotonics. Furthermore the effect of introducing targetable biosensors into the encapsulated tumor implant such as quantum dots for informing new detection readouts and possible implant designs have been evaluated. A preliminary micro/macro imaging approach has been undertaken, that could provide a mean to track distinct morphological changes in cells growing in a 3D matrix within the fiber which affect the light scattering properties of the implant. Parallel engineering studies have showed the influence of the optical properties of the fiber polymer wall in all imaging modes. Taken all together, we show the basic foundation and the opportunities for multi-modal imaging within an in vitro implant format.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sample, B.E. Opresko, D.M. Suter, G.W.

Ecological risks of environmental contaminants are evaluated by using a two-tiered process. In the first tier, a screening assessment is performed where concentrations of contaminants in the environment are compared to no observed adverse effects level (NOAEL)-based toxicological benchmarks. These benchmarks represent concentrations of chemicals (i.e., concentrations presumed to be nonhazardous to the biota) in environmental media (water, sediment, soil, food, etc.). While exceedance of these benchmarks does not indicate any particular level or type of risk, concentrations below the benchmarks should not result in significant effects. In practice, when contaminant concentrations in food or water resources are less thanmore » these toxicological benchmarks, the contaminants may be excluded from further consideration. However, if the concentration of a contaminant exceeds a benchmark, that contaminant should be retained as a contaminant of potential concern (COPC) and investigated further. The second tier in ecological risk assessment, the baseline ecological risk assessment, may use toxicological benchmarks as part of a weight-of-evidence approach (Suter 1993). Under this approach, based toxicological benchmarks are one of several lines of evidence used to support or refute the presence of ecological effects. Other sources of evidence include media toxicity tests, surveys of biota (abundance and diversity), measures of contaminant body burdens, and biomarkers. This report presents NOAEL- and lowest observed adverse effects level (LOAEL)-based toxicological benchmarks for assessment of effects of 85 chemicals on 9 representative mammalian wildlife species (short-tailed shrew, little brown bat, meadow vole, white-footed mouse, cottontail rabbit, mink, red fox, and whitetail deer) or 11 avian wildlife species (American robin, rough-winged swallow, American woodcock, wild turkey, belted kingfisher, great blue heron, barred owl, barn owl, Cooper's hawk, and red-tailed hawk, osprey) (scientific names for both the mammalian and avian species are presented in Appendix B). [In this document, NOAEL refers to both dose (mg contaminant per kg animal body weight per day) and concentration (mg contaminant per kg of food or L of drinking water)]. The 20 wildlife species were chosen because they are widely distributed and provide a representative range of body sizes and diets. The chemicals are some of those that occur at U.S. Department of Energy (DOE) waste sites. The NOAEL-based benchmarks presented in this report represent values believed to be nonhazardous for the listed wildlife species; LOAEL-based benchmarks represent threshold levels at which adverse effects are likely to become evident. These benchmarks consider contaminant exposure through oral ingestion of contaminated media only. Exposure through inhalation and/or direct dermal exposure are not considered in this report.« less
Benchmark and Framework for Encouraging Research on Multi-Threaded Testing Tools

NASA Technical Reports Server (NTRS)

Havelund, Klaus; Stoller, Scott D.; Ur, Shmuel

2003-01-01

A problem that has been getting prominence in testing is that of looking for intermittent bugs. Multi-threaded code is becoming very common, mostly on the server side. As there is no silver bullet solution, research focuses on a variety of partial solutions. In this paper (invited by PADTAD 2003) we outline a proposed project to facilitate research. The project goals are as follows. The first goal is to create a benchmark that can be used to evaluate different solutions. The benchmark, apart from containing programs with documented bugs, will include other artifacts, such as traces, that are useful for evaluating some of the technologies. The second goal is to create a set of tools with open API s that can be used to check ideas without building a large system. For example an instrumentor will be available, that could be used to test temporal noise making heuristics. The third goal is to create a focus for the research in this area around which a community of people who try to solve similar problems with different techniques, could congregate.
The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS).

PubMed

Menze, Bjoern H; Jakab, Andras; Bauer, Stefan; Kalpathy-Cramer, Jayashree; Farahani, Keyvan; Kirby, Justin; Burren, Yuliya; Porz, Nicole; Slotboom, Johannes; Wiest, Roland; Lanczi, Levente; Gerstner, Elizabeth; Weber, Marc-André; Arbel, Tal; Avants, Brian B; Ayache, Nicholas; Buendia, Patricia; Collins, D Louis; Cordier, Nicolas; Corso, Jason J; Criminisi, Antonio; Das, Tilak; Delingette, Hervé; Demiralp, Çağatay; Durst, Christopher R; Dojat, Michel; Doyle, Senan; Festa, Joana; Forbes, Florence; Geremia, Ezequiel; Glocker, Ben; Golland, Polina; Guo, Xiaotao; Hamamci, Andac; Iftekharuddin, Khan M; Jena, Raj; John, Nigel M; Konukoglu, Ender; Lashkari, Danial; Mariz, José Antonió; Meier, Raphael; Pereira, Sérgio; Precup, Doina; Price, Stephen J; Raviv, Tammy Riklin; Reza, Syed M S; Ryan, Michael; Sarikaya, Duygu; Schwartz, Lawrence; Shin, Hoo-Chang; Shotton, Jamie; Silva, Carlos A; Sousa, Nuno; Subbanna, Nagesh K; Szekely, Gabor; Taylor, Thomas J; Thomas, Owen M; Tustison, Nicholas J; Unal, Gozde; Vasseur, Flor; Wintermark, Max; Ye, Dong Hye; Zhao, Liang; Zhao, Binsheng; Zikic, Darko; Prastawa, Marcel; Reyes, Mauricio; Van Leemput, Koen

2015-10-01

In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients-manually annotated by up to four raters-and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%-85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.
The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)

PubMed Central

Jakab, Andras; Bauer, Stefan; Kalpathy-Cramer, Jayashree; Farahani, Keyvan; Kirby, Justin; Burren, Yuliya; Porz, Nicole; Slotboom, Johannes; Wiest, Roland; Lanczi, Levente; Gerstner, Elizabeth; Weber, Marc-André; Arbel, Tal; Avants, Brian B.; Ayache, Nicholas; Buendia, Patricia; Collins, D. Louis; Cordier, Nicolas; Corso, Jason J.; Criminisi, Antonio; Das, Tilak; Delingette, Hervé; Demiralp, Çağatay; Durst, Christopher R.; Dojat, Michel; Doyle, Senan; Festa, Joana; Forbes, Florence; Geremia, Ezequiel; Glocker, Ben; Golland, Polina; Guo, Xiaotao; Hamamci, Andac; Iftekharuddin, Khan M.; Jena, Raj; John, Nigel M.; Konukoglu, Ender; Lashkari, Danial; Mariz, José António; Meier, Raphael; Pereira, Sérgio; Precup, Doina; Price, Stephen J.; Raviv, Tammy Riklin; Reza, Syed M. S.; Ryan, Michael; Sarikaya, Duygu; Schwartz, Lawrence; Shin, Hoo-Chang; Shotton, Jamie; Silva, Carlos A.; Sousa, Nuno; Subbanna, Nagesh K.; Szekely, Gabor; Taylor, Thomas J.; Thomas, Owen M.; Tustison, Nicholas J.; Unal, Gozde; Vasseur, Flor; Wintermark, Max; Ye, Dong Hye; Zhao, Liang; Zhao, Binsheng; Zikic, Darko; Prastawa, Marcel; Reyes, Mauricio; Van Leemput, Koen

2016-01-01

In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients—manually annotated by up to four raters—and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%–85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource. PMID:25494501
The CARIPANDA project: Climate change and water resources in the Adamello Natural Park of Italy

NASA Astrophysics Data System (ADS)

Bocchiola, D.

2009-04-01

The three years (2007-2009) CARIPANDA project funded by the Cariplo Foundation of Italy is aimed to evaluate scenarios for water resources in the Adamello natural Park of Italy in a window of 50 years or so (until 2050). The project is led by Ente Parco Adamello and involves Politecnico di Milano, Università Statale di Milano, Università di Brescia, and ARPA Lombardia as scientific partners, while ENEL hydropower Company of Italy joins the project as stake holder. The Adamello Natural Park is a noteworthy resource in the Italian Alps. The Adamello Group is made of several glacierized areas (c. 24 km2), of both debris covered and free ice types, including the widest Italian Glacier, named Adamello, spreading on an area of about c. 18 km2. Also the Adamello Natural Reserve, covering 217 km2 inside the Adamello Park and including the Adamello glaciers, hosts a number of high altitude safeguarded vegetal and animal species, the safety of which is a primary task of the Reserve. Project's activity involves analysis of local climate trend, field campaigns on glaciers, hydrological modelling and remote sensing of snow and ice covered areas, aimed to build a consistent model of the present hydrological conditions and of the areas. Then, properly tailored climate change projections for the area, obtained using local data driven downscaling of climate change projections from GCMs model, are used to infer the likely response to expected climate change conditions. With two years in the project now some preliminary findings can be highlighted and some preliminary trend analysis carried out. The proposed poster provides a resume of the main results of the project insofar, of interest as a benchmark for similar ongoing and foregoing projects about climate change impact on European mountainous natural areas.
Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies.

PubMed

Boulesteix, Anne-Laure; Wilson, Rory; Hapfelmeier, Alexander

2017-09-09

The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. We suggest that benchmark studies-a method of assessment of statistical methods using real-world datasets-might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.
Dose specification for hippocampal sparing whole brain radiotherapy (HS WBRT): considerations from the UK HIPPO trial QA programme.

PubMed

Megias, Daniel; Phillips, Mark; Clifton-Hadley, Laura; Harron, Elizabeth; Eaton, David J; Sanghera, Paul; Whitfield, Gillian

2017-03-01

The HIPPO trial is a UK randomized Phase II trial of hippocampal sparing (HS) vs conventional whole-brain radiotherapy after surgical resection or radiosurgery in patients with favourable prognosis with 1-4 brain metastases. Each participating centre completed a planning benchmark case as part of the dedicated radiotherapy trials quality assurance programme (RTQA), promoting the safe and effective delivery of HS intensity-modulated radiotherapy (IMRT) in a multicentre trial setting. Submitted planning benchmark cases were reviewed using visualization for radiotherapy software (VODCA) evaluating plan quality and compliance in relation to the HIPPO radiotherapy planning and delivery guidelines. Comparison of the planning benchmark data highlighted a plan specified using dose to medium as an outlier by comparison with those specified using dose to water. Further evaluation identified that the reported plan statistics for dose to medium were lower as a result of the dose calculated at regions of PTV inclusive of bony cranium being lower relative to brain. Specification of dose to water or medium remains a source of potential ambiguity and it is essential that as part of a multicentre trial, consideration is given to reported differences, particularly in the presence of bone. Evaluation of planning benchmark data as part of an RTQA programme has highlighted an important feature of HS IMRT dosimetry dependent on dose being specified to water or medium, informing the development and undertaking of HS IMRT as part of the HIPPO trial. Advances in knowledge: The potential clinical impact of differences between dose to medium and dose to water are demonstrated for the first time, in the setting of HS whole-brain radiotherapy.
Aluminum Data Measurements and Evaluation for Criticality Safety Applications

NASA Astrophysics Data System (ADS)

Leal, L. C.; Guber, K. H.; Spencer, R. R.; Derrien, H.; Wright, R. Q.

2002-12-01

The Defense Nuclear Facility Safety Board (DNFSB) Recommendation 93-2 motivated the US Department of Energy (DOE) to develop a comprehensive criticality safety program to maintain and to predict the criticality of systems throughout the DOE complex. To implement the response to the DNFSB Recommendation 93-2, a Nuclear Criticality Safety Program (NCSP) was created including the following tasks: Critical Experiments, Criticality Benchmarks, Training, Analytical Methods, and Nuclear Data. The Nuclear Data portion of the NCSP consists of a variety of differential measurements performed at the Oak Ridge Electron Linear Accelerator (ORELA) at the Oak Ridge National Laboratory (ORNL), data analysis and evaluation using the generalized least-squares fitting code SAMMY in the resolved, unresolved, and high energy ranges, and the development and benchmark testing of complete evaluations for a nuclide for inclusion into the Evaluated Nuclear Data File (ENDF/B). This paper outlines the work performed at ORNL to measure, evaluate, and test the nuclear data for aluminum for applications in criticality safety problems.
Validation of tungsten cross sections in the neutron energy region up to 100 keV

NASA Astrophysics Data System (ADS)

Pigni, Marco T.; Žerovnik, Gašper; Leal, Luiz. C.; Trkov, Andrej

2017-09-01

Following a series of recent cross section evaluations on tungsten isotopes performed at Oak Ridge National Laboratory (ORNL), this paper presents the validation work carried out to test the performance of the evaluated cross sections based on lead-slowing-down (LSD) benchmarks conducted in Grenoble. ORNL completed the resonance parameter evaluation of four tungsten isotopes - 182,183,184,186W - in August 2014 and submitted it as an ENDF-compatible file to be part of the next release of the ENDF/B-VIII.0 nuclear data library. The evaluations were performed with support from the US Nuclear Criticality Safety Program in an effort to provide improved tungsten cross section and covariance data for criticality safety sensitivity analyses. The validation analysis based on the LSD benchmarks showed an improved agreement with the experimental response when the ORNL tungsten evaluations were included in the ENDF/B-VII.1 library. Comparison with the results obtained with the JEFF-3.2 nuclear data library are also discussed.
Numerical Prediction of Signal for Magnetic Flux Leakage Benchmark Task

NASA Astrophysics Data System (ADS)

Lunin, V.; Alexeevsky, D.

2003-03-01

Numerical results predicted by the finite element method based code are presented. The nonlinear magnetic time-dependent benchmark problem proposed by the World Federation of Nondestructive Evaluation Centers, involves numerical prediction of normal (radial) component of the leaked field in the vicinity of two practically rectangular notches machined on a rotating steel pipe (with known nonlinear magnetic characteristic). One notch is located on external surface of pipe and other is on internal one, and both are oriented axially.
Evaluating the Information Power Grid using the NAS Grid Benchmarks

NASA Technical Reports Server (NTRS)

VanderWijngaartm Rob F.; Frumkin, Michael A.

2004-01-01

The NAS Grid Benchmarks (NGB) are a collection of synthetic distributed applications designed to rate the performance and functionality of computational grids. We compare several implementations of the NGB to determine programmability and efficiency of NASA's Information Power Grid (IPG), whose services are mostly based on the Globus Toolkit. We report on the overheads involved in porting existing NGB reference implementations to the IPG. No changes were made to the component tasks of the NGB can still be improved.
[Does implementation of benchmarking in quality circles improve the quality of care of patients with asthma and reduce drug interaction?].

PubMed

Kaufmann-Kolle, Petra; Szecsenyi, Joachim; Broge, Björn; Haefeli, Walter Emil; Schneider, Antonius

2011-01-01

The purpose of this cluster-randomised controlled trial was to evaluate the efficacy of quality circles (QCs) working either with general data-based feedback or with an open benchmark within the field of asthma care and drug-drug interactions. Twelve QCs, involving 96 general practitioners from 85 practices, were randomised. Six QCs worked with traditional anonymous feedback and six with an open benchmark. Two QC meetings supported with feedback reports were held covering the topics "drug-drug interactions" and "asthma"; in both cases discussions were guided by a trained moderator. Outcome measures included health-related quality of life and patient satisfaction with treatment, asthma severity and number of potentially inappropriate drug combinations as well as the general practitioners' satisfaction in relation to the performance of the QC. A significant improvement in the treatment of asthma was observed in both trial arms. However, there was only a slight improvement regarding inappropriate drug combinations. There were no relevant differences between the group with open benchmark (B-QC) and traditional quality circles (T-QC). The physicians' satisfaction with the QC performance was significantly higher in the T-QCs. General practitioners seem to take a critical perspective about open benchmarking in quality circles. Caution should be used when implementing benchmarking in a quality circle as it did not improve healthcare when compared to the traditional procedure with anonymised comparisons. Copyright © 2011. Published by Elsevier GmbH.
Gaming in risk-adjusted mortality rates: effect of misclassification of risk factors in the benchmarking of cardiac surgery risk-adjusted mortality rates.

PubMed

Siregar, Sabrina; Groenwold, Rolf H H; Versteegh, Michel I M; Noyez, Luc; ter Burg, Willem Jan P P; Bots, Michiel L; van der Graaf, Yolanda; van Herwerden, Lex A

2013-03-01

Upcoding or undercoding of risk factors could affect the benchmarking of risk-adjusted mortality rates. The aim was to investigate the effect of misclassification of risk factors on the benchmarking of mortality rates after cardiac surgery. A prospective cohort was used comprising all adult cardiac surgery patients in all 16 cardiothoracic centers in The Netherlands from January 1, 2007, to December 31, 2009. A random effects model, including the logistic European system for cardiac operative risk evaluation (EuroSCORE) was used to benchmark the in-hospital mortality rates. We simulated upcoding and undercoding of 5 selected variables in the patients from 1 center. These patients were selected randomly (nondifferential misclassification) or by the EuroSCORE (differential misclassification). In the random patients, substantial misclassification was required to affect benchmarking: a 1.8-fold increase in prevalence of the 4 risk factors changed an underperforming center into an average performing one. Upcoding of 1 variable required even more. When patients with the greatest EuroSCORE were upcoded (ie, differential misclassification), a 1.1-fold increase was sufficient: moderate left ventricular function from 14.2% to 15.7%, poor left ventricular function from 8.4% to 9.3%, recent myocardial infarction from 7.9% to 8.6%, and extracardiac arteriopathy from 9.0% to 9.8%. Benchmarking using risk-adjusted mortality rates can be manipulated by misclassification of the EuroSCORE risk factors. Misclassification of random patients or of single variables will have little effect. However, limited upcoding of multiple risk factors in high-risk patients can greatly influence benchmarking. To minimize "gaming," the prevalence of all risk factors should be carefully monitored. Copyright © 2013 The American Association for Thoracic Surgery. Published by Mosby, Inc. All rights reserved.
Performance benchmark of LHCb code on state-of-the-art x86 architectures

NASA Astrophysics Data System (ADS)

Campora Perez, D. H.; Neufeld, N.; Schwemmer, R.

2015-12-01

For Run 2 of the LHC, LHCb is replacing a significant part of its event filter farm with new compute nodes. For the evaluation of the best performing solution, we have developed a method to convert our high level trigger application into a stand-alone, bootable benchmark image. With additional instrumentation we turned it into a self-optimising benchmark which explores techniques such as late forking, NUMA balancing and optimal number of threads, i.e. it automatically optimises box-level performance. We have run this procedure on a wide range of Haswell-E CPUs and numerous other architectures from both Intel and AMD, including also the latest Intel micro-blade servers. We present results in terms of performance, power consumption, overheads and relative cost.
There is no one-size-fits-all product for InSAR; on the inclusion of contextual information for geodetically-proof InSAR data products

NASA Astrophysics Data System (ADS)

Hanssen, R. F.

2017-12-01

In traditional geodesy, one is interested in determining the coordinates, or the change in coordinates, of predefined benchmarks. These benchmarks are clearly identifiable and are especially established to be representative of the signal of interest. This holds, e.g., for leveling benchmarks, for triangulation/trilateration benchmarks, and for GNSS benchmarks. The desired coordinates are not identical to the basic measurements, and need to be estimated using robust estimation procedures, where the stochastic nature of the measurements is taken into account. For InSAR, however, the `benchmarks' are not predefined. In fact, usually we do not know where an effective benchmark is located, even though we can determine its dynamic behavior pretty well. This poses several significant problems. First, we cannot describe the quality of the measurements, unless we already know the dynamic behavior of the benchmark. Second, if we don't know the quality of the measurements, we cannot compute the quality of the estimated parameters. Third, rather harsh assumptions need to be made to produce a result. These (usually implicit) assumptions differ between processing operators and the used software, and are severely affected by the amount of available data. Fourth, the `relative' nature of the final estimates is usually not explicitly stated, which is particularly problematic for non-expert users. Finally, whereas conventional geodesy applies rigorous testing to check for measurement or model errors, this is hardly ever done in InSAR-geodesy. These problems make it rather impossible to provide a precise, reliable, repeatable, and `universal' InSAR product or service. Here we evaluate the requirements and challenges to move towards InSAR as a geodetically-proof product. In particular this involves the explicit inclusion of contextual information, as well as InSAR procedures, standards and a technical protocol, supported by the International Association of Geodesy and the international scientific community.

Structural weights analysis of advanced aerospace vehicles using finite element analysis

NASA Technical Reports Server (NTRS)

Bush, Lance B.; Lentz, Christopher A.; Rehder, John J.; Naftel, J. Chris; Cerro, Jeffrey A.

1989-01-01

A conceptual/preliminary level structural design system has been developed for structural integrity analysis and weight estimation of advanced space transportation vehicles. The system includes a three-dimensional interactive geometry modeler, a finite element pre- and post-processor, a finite element analyzer, and a structural sizing program. Inputs to the system include the geometry, surface temperature, material constants, construction methods, and aerodynamic and inertial loads. The results are a sized vehicle structure capable of withstanding the static loads incurred during assembly, transportation, operations, and missions, and a corresponding structural weight. An analysis of the Space Shuttle external tank is included in this paper as a validation and benchmark case of the system.
Risk Assessment Update: Russian Segment

NASA Technical Reports Server (NTRS)

Christiansen, Eric; Lear, Dana; Hyde, James; Bjorkman, Michael; Hoffman, Kevin

2012-01-01

BUMPER-II version 1.95j source code was provided to RSC-E- and Khrunichev at January 2012 MMOD TIM in Moscow. MEMCxP and ORDEM 3.0 environments implemented as external data files. NASA provided a sample ORDEM 3.0 g."key" & "daf" environment file set for demonstration and benchmarking BUMPER -II v1.95j installation at the Jan-12 TIM. ORDEM 3.0 has been completed and is currently in beta testing. NASA will provide a preliminary set of ORDEM 3.0 ".key" & ".daf" environment files for the years 2012 through 2028. Bumper output files produced using the new ORDEM 3.0 data files are intended for internal use only, not for requirements verification. Output files will contain these words ORDEM FILE DESCRIPTION = PRELIMINARY VERSION: not for production. The projectile density term in many BUMPER-II ballistic limit equations will need to be updated. Cube demo scripts and output files delivered at the Jan-12 TIM have been updated for the new ORDEM 3.0 data files. Risk assessment results based on ORDEM 3.0 and MEM will be presented for the Russian Segment (RS) of ISS.
A benchmark for vehicle detection on wide area motion imagery

NASA Astrophysics Data System (ADS)

Catrambone, Joseph; Amzovski, Ismail; Liang, Pengpeng; Blasch, Erik; Sheaff, Carolyn; Wang, Zhonghai; Chen, Genshe; Ling, Haibin

2015-05-01

Wide area motion imagery (WAMI) has been attracting an increased amount of research attention due to its large spatial and temporal coverage. An important application includes moving target analysis, where vehicle detection is often one of the first steps before advanced activity analysis. While there exist many vehicle detection algorithms, a thorough evaluation of them on WAMI data still remains a challenge mainly due to the lack of an appropriate benchmark data set. In this paper, we address a research need by presenting a new benchmark for wide area motion imagery vehicle detection data. The WAMI benchmark is based on the recently available Wright-Patterson Air Force Base (WPAFB09) dataset and the Temple Resolved Uncertainty Target History (TRUTH) associated target annotation. Trajectory annotations were provided in the original release of the WPAFB09 dataset, but detailed vehicle annotations were not available with the dataset. In addition, annotations of static vehicles, e.g., in parking lots, are also not identified in the original release. Addressing these issues, we re-annotated the whole dataset with detailed information for each vehicle, including not only a target's location, but also its pose and size. The annotated WAMI data set should be useful to community for a common benchmark to compare WAMI detection, tracking, and identification methods.
Endobronchial ultrasound-guided transbronchial needle aspiration: performance of biomedical scientists on rapid on-site evaluation and preliminary diagnosis.

PubMed

Schacht, M J; Toustrup, C B; Madsen, L B; Martiny, M S; Larsen, B B; Simonsen, J T

2016-10-01

Rapid on-site evaluation (ROSE) of endobronchial ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) followed by a subsequent preliminary adequacy assessment and a preliminary diagnosis, was performed at Aarhus University Hospital by biomedical scientists (BMS). The aim of this study was to evaluate the BMS accuracy of ROSE adequacy assessment, the preliminary adequacy assessment and the preliminary diagnosis as compared with the cytopathologist-rendered final adequacy assessment and final diagnosis. The BMS-rendered assessments for 717 sites from 319 consecutive patients over a 4-month period were compared with the cytopathologist-rendered assessments. Comparisons of adequacy and preliminary diagnoses were based on inter-observer Cohen's Kappa coefficient with a 95% confidence interval (CI). Strong correlations between ROSE and final adequacy assessments [Kappa coefficient of 0.90 (CI: 0.85-0.96)] and between the preliminary and final adequacy assessments [Kappa coefficient of 0.93 (CI: 0.87-0.99)] were found. As for the correlation between the preliminary and final diagnoses, the Kappa coefficient was 0.99 (CI: 0.98-1). Both ROSE and preliminary adequacy assessments as well as preliminary diagnoses, all performed by BMS, were highly accurate when compared with the final assessment by the cytopathologist. © 2016 John Wiley & Sons Ltd.
Building Bridges Between Geoscience and Data Science through Benchmark Data Sets

NASA Astrophysics Data System (ADS)

Thompson, D. R.; Ebert-Uphoff, I.; Demir, I.; Gel, Y.; Hill, M. C.; Karpatne, A.; Güereque, M.; Kumar, V.; Cabral, E.; Smyth, P.

2017-12-01

The changing nature of observational field data demands richer and more meaningful collaboration between data scientists and geoscientists. Thus, among other efforts, the Working Group on Case Studies of the NSF-funded RCN on Intelligent Systems Research To Support Geosciences (IS-GEO) is developing a framework to strengthen such collaborations through the creation of benchmark datasets. Benchmark datasets provide an interface between disciplines without requiring extensive background knowledge. The goals are to create (1) a means for two-way communication between geoscience and data science researchers; (2) new collaborations, which may lead to new approaches for data analysis in the geosciences; and (3) a public, permanent repository of complex data sets, representative of geoscience problems, useful to coordinate efforts in research and education. The group identified 10 key elements and characteristics for ideal benchmarks. High impact: A problem with high potential impact. Active research area: A group of geoscientists should be eager to continue working on the topic. Challenge: The problem should be challenging for data scientists. Data science generality and versatility: It should stimulate development of new general and versatile data science methods. Rich information content: Ideally the data set provides stimulus for analysis at many different levels. Hierarchical problem statement: A hierarchy of suggested analysis tasks, from relatively straightforward to open-ended tasks. Means for evaluating success: Data scientists and geoscientists need means to evaluate whether the algorithms are successful and achieve intended purpose. Quick start guide: Introduction for data scientists on how to easily read the data to enable rapid initial data exploration. Geoscience context: Summary for data scientists of the specific data collection process, instruments used, any pre-processing and the science questions to be answered. Citability: A suitable identifier to facilitate tracking the use of the benchmark later on, e.g. allowing search engines to find all research papers using it. A first sample benchmark developed in collaboration with the Jet Propulsion Laboratory (JPL) deals with the automatic analysis of imaging spectrometer data to detect significant methane sources in the atmosphere.
Performance evaluation of tile-based Fisher Ratio analysis using a benchmark yeast metabolome dataset.

PubMed

Watson, Nathanial E; Parsons, Brendon A; Synovec, Robert E

2016-08-12

Performance of tile-based Fisher Ratio (F-ratio) data analysis, recently developed for discovery-based studies using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC×GC-TOFMS), is evaluated with a metabolomics dataset that had been previously analyzed in great detail, but while taking a brute force approach. The previously analyzed data (referred to herein as the benchmark dataset) were intracellular extracts from Saccharomyces cerevisiae (yeast), either metabolizing glucose (repressed) or ethanol (derepressed), which define the two classes in the discovery-based analysis to find metabolites that are statistically different in concentration between the two classes. Beneficially, this previously analyzed dataset provides a concrete means to validate the tile-based F-ratio software. Herein, we demonstrate and validate the significant benefits of applying tile-based F-ratio analysis. The yeast metabolomics data are analyzed more rapidly in about one week versus one year for the prior studies with this dataset. Furthermore, a null distribution analysis is implemented to statistically determine an adequate F-ratio threshold, whereby the variables with F-ratio values below the threshold can be ignored as not class distinguishing, which provides the analyst with confidence when analyzing the hit table. Forty-six of the fifty-four benchmarked changing metabolites were discovered by the new methodology while consistently excluding all but one of the benchmarked nineteen false positive metabolites previously identified. Copyright © 2016 Elsevier B.V. All rights reserved.
ARABIC TRANSLATION AND ADAPTATION OF THE HOSPITAL CONSUMER ASSESSMENT OF HEALTHCARE PROVIDERS AND SYSTEMS (HCAHPS) PATIENT SATISFACTION SURVEY INSTRUMENT.

PubMed

Dockins, James; Abuzahrieh, Ramzi; Stack, Martin

2015-01-01

To translate and adapt an effective, validated, benchmarked, and widely used patient satisfaction measurement tool for use with an Arabic-speaking population. Translation of survey's items, survey administration process development, evaluation of reliability, and international benchmarking Three hundred-bed tertiary care hospital in Jeddah, Saudi Arabia. 645 patients discharged during 2011 from the hospital's inpatient care units. INTERVENTIONS; The Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) instrument was translated into Arabic, a randomized weekly sample of patients was selected, and the survey was administered via telephone during 2011 to patients or their relatives. Scores were compiled for each of the HCAHPS questions and then for each of the six HCAHPS clinical composites, two non-clinical items, and two global items. Clinical composite scores, as well as the two non-clinical and two global items were analyzed for the 645 respondents. Clinical composites were analyzed using Spearman's correlation coefficient and Cronbach's alpha to demonstrate acceptable internal consistency for these items and scales demonstrated acceptable internal consistency for the clinical composites. (Spearman's correlation coefficient = 0.327 - 0.750, P < 0.01; Cronbach's alpha = 0.516 - 0.851) All ten HCAHPS measures were compared quarterly to US national averages with results that closely paralleled the US benchmarks. . The Arabic translation and adaptation of the HCAHPS is a valid, reliable, and feasible tool for evaluation and benchmarking of inpatient satisfaction in Arabic speaking populations.
Norton-Thevenin Receptance Coupling (NTRC) as a Payload Design Tool

NASA Technical Reports Server (NTRS)

Gordon, Scott; Kaufman, Dan; Majed, Arya

2017-01-01

The NASA Engineering and Safety Center (NESC) is funding a study to develop an alternate method for performing coupled loads analysis called Norton-Thevenin Receptance Coupling (NTRC). NTRC combines Receptance Coupling (RC), a frequency-domain synthesis method and Norton-Thevenin (NT) theory, an impedance based approach for simulating the interaction between dynamic systems. The goal of developing the NTRC method is to provide a tool that payload developers can use to reduce the conservatism in defining preliminary design loads, assess the impact of design changes between formal load cycles, and to perform trade studies for design optimization with a minimum amount of data required from the launch vehicle (LV) provider. NTRC also has the ability to perform parametric loads analysis where many different design configurations can be evaluated. This will result in cost and schedule benefits to the payload developer that are currently not possible under the standard coupled loads analysis (CLA) flow where typically only 2-3 official load cycles are performed by the LV provider over the life of a payload program. NTRC is not envisioned as a replacement for the official load cycles performed by the LV provider but rather as a means to address the types of design issues faced by the payload developer before and between official load cycles.The presentation provides an overview of the NTRC methodology and discusses how NTRC can be used to replicate the results from a standard LV CLA. The presentation covers the benchmarking that has been performed as part of the NESC study to demonstrate the accuracy of the technique for both frequency and time domain dynamic analyses. Future plans for benchmarking the NTRC approach against CLA results for NASAs Space Launch System (SLS) and commercial launch vehicles are discussed and the role that NTRC is envisioned to play in the payload development cycle.
Benchmark simulation model no 2: general protocol and exploratory case studies.

PubMed

Jeppsson, U; Pons, M-N; Nopens, I; Alex, J; Copp, J B; Gernaey, K V; Rosen, C; Steyer, J-P; Vanrolleghem, P A

2007-01-01

Over a decade ago, the concept of objectively evaluating the performance of control strategies by simulating them using a standard model implementation was introduced for activated sludge wastewater treatment plants. The resulting Benchmark Simulation Model No 1 (BSM1) has been the basis for a significant new development that is reported on here: Rather than only evaluating control strategies at the level of the activated sludge unit (bioreactors and secondary clarifier) the new BSM2 now allows the evaluation of control strategies at the level of the whole plant, including primary clarifier and sludge treatment with anaerobic sludge digestion. In this contribution, the decisions that have been made over the past three years regarding the models used within the BSM2 are presented and argued, with particular emphasis on the ADM1 description of the digester, the interfaces between activated sludge and digester models, the included temperature dependencies and the reject water storage. BSM2-implementations are now available in a wide range of simulation platforms and a ring test has verified their proper implementation, consistent with the BSM2 definition. This guarantees that users can focus on the control strategy evaluation rather than on modelling issues. Finally, for illustration, twelve simple operational strategies have been implemented in BSM2 and their performance evaluated. Results show that it is an interesting control engineering challenge to further improve the performance of the BSM2 plant (which is the whole idea behind benchmarking) and that integrated control (i.e. acting at different places in the whole plant) is certainly worthwhile to achieve overall improvement.
Web Site Design Benchmarking within Industry Groups.

ERIC Educational Resources Information Center

Kim, Sung-Eon; Shaw, Thomas; Schneider, Helmut

2003-01-01

Discussion of electronic commerce focuses on Web site evaluation criteria and applies them to different industry groups in Korea. Defines six categories of Web site evaluation criteria: business function, corporate credibility, contents reliability, Web site attractiveness, systematic structure, and navigation; and discusses differences between…
Global Gridded Crop Model Evaluation: Benchmarking, Skills, Deficiencies and Implications.

NASA Technical Reports Server (NTRS)

Muller, Christoph; Elliott, Joshua; Chryssanthacopoulos, James; Arneth, Almut; Balkovic, Juraj; Ciais, Philippe; Deryng, Delphine; Folberth, Christian; Glotter, Michael; Hoek, Steven;

2017-01-01

Crop models are increasingly used to simulate crop yields at the global scale, but so far there is no general framework on how to assess model performance. Here we evaluate the simulation results of 14 global gridded crop modeling groups that have contributed historic crop yield simulations for maize, wheat, rice and soybean to the Global Gridded Crop Model Intercomparison (GGCMI) of the Agricultural Model Intercomparison and Improvement Project (AgMIP). Simulation results are compared to reference data at global, national and grid cell scales and we evaluate model performance with respect to time series correlation, spatial correlation and mean bias. We find that global gridded crop models (GGCMs) show mixed skill in reproducing time series correlations or spatial patterns at the different spatial scales. Generally, maize, wheat and soybean simulations of many GGCMs are capable of reproducing larger parts of observed temporal variability (time series correlation coefficients (r) of up to 0.888 for maize, 0.673 for wheat and 0.643 for soybean at the global scale) but rice yield variability cannot be well reproduced by most models. Yield variability can be well reproduced for most major producing countries by many GGCMs and for all countries by at least some. A comparison with gridded yield data and a statistical analysis of the effects of weather variability on yield variability shows that the ensemble of GGCMs can explain more of the yield variability than an ensemble of regression models for maize and soybean, but not for wheat and rice. We identify future research needs in global gridded crop modeling and for all individual crop modeling groups. In the absence of a purely observation-based benchmark for model evaluation, we propose that the best performing crop model per crop and region establishes the benchmark for all others, and modelers are encouraged to investigate how crop model performance can be increased. We make our evaluation system accessible to all crop modelers so that other modeling groups can also test their model performance against the reference data and the GGCMI benchmark.

Robust visual tracking via multiple discriminative models with object proposals

NASA Astrophysics Data System (ADS)

Zhang, Yuanqiang; Bi, Duyan; Zha, Yufei; Li, Huanyu; Ku, Tao; Wu, Min; Ding, Wenshan; Fan, Zunlin

2018-04-01

Model drift is an important reason for tracking failure. In this paper, multiple discriminative models with object proposals are used to improve the model discrimination for relieving this problem. Firstly, the target location and scale changing are captured by lots of high-quality object proposals, which are represented by deep convolutional features for target semantics. And then, through sharing a feature map obtained by a pre-trained network, ROI pooling is exploited to wrap the various sizes of object proposals into vectors of the same length, which are used to learn a discriminative model conveniently. Lastly, these historical snapshot vectors are trained by different lifetime models. Based on entropy decision mechanism, the bad model owing to model drift can be corrected by selecting the best discriminative model. This would improve the robustness of the tracker significantly. We extensively evaluate our tracker on two popular benchmarks, the OTB 2013 benchmark and UAV20L benchmark. On both benchmarks, our tracker achieves the best performance on precision and success rate compared with the state-of-the-art trackers.
Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

PubMed Central

Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.

2018-01-01

In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888
First benchmark of the Unstructured Grid Adaptation Working Group

NASA Technical Reports Server (NTRS)

Ibanez, Daniel; Barral, Nicolas; Krakos, Joshua; Loseille, Adrien; Michal, Todd; Park, Mike

2017-01-01

Unstructured grid adaptation is a technology that holds the potential to improve the automation and accuracy of computational fluid dynamics and other computational disciplines. Difficulty producing the highly anisotropic elements necessary for simulation on complex curved geometries that satisfies a resolution request has limited this technology's widespread adoption. The Unstructured Grid Adaptation Working Group is an open gathering of researchers working on adapting simplicial meshes to conform to a metric field. Current members span a wide range of institutions including academia, industry, and national laboratories. The purpose of this group is to create a common basis for understanding and improving mesh adaptation. We present our first major contribution: a common set of benchmark cases, including input meshes and analytic metric specifications, that are publicly available to be used for evaluating any mesh adaptation code. We also present the results of several existing codes on these benchmark cases, to illustrate their utility in identifying key challenges common to all codes and important differences between available codes. Future directions are defined to expand this benchmark to mature the technology necessary to impact practical simulation workflows.
Benchmarking NNWSI flow and transport codes: COVE 1 results

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hayden, N.K.

1985-06-01

The code verification (COVE) activity of the Nevada Nuclear Waste Storage Investigations (NNWSI) Project is the first step in certification of flow and transport codes used for NNWSI performance assessments of a geologic repository for disposing of high-level radioactive wastes. The goals of the COVE activity are (1) to demonstrate and compare the numerical accuracy and sensitivity of certain codes, (2) to identify and resolve problems in running typical NNWSI performance assessment calculations, and (3) to evaluate computer requirements for running the codes. This report describes the work done for COVE 1, the first step in benchmarking some of themore » codes. Isothermal calculations for the COVE 1 benchmarking have been completed using the hydrologic flow codes SAGUARO, TRUST, and GWVIP; the radionuclide transport codes FEMTRAN and TRUMP; and the coupled flow and transport code TRACR3D. This report presents the results of three cases of the benchmarking problem solved for COVE 1, a comparison of the results, questions raised regarding sensitivities to modeling techniques, and conclusions drawn regarding the status and numerical sensitivities of the codes. 30 refs.« less
Benchmarking to Identify Practice Variation in Test Ordering: A Potential Tool for Utilization Management.

PubMed

Signorelli, Heather; Straseski, Joely A; Genzen, Jonathan R; Walker, Brandon S; Jackson, Brian R; Schmidt, Robert L

2015-01-01

Appropriate test utilization is usually evaluated by adherence to published guidelines. In many cases, medical guidelines are not available. Benchmarking has been proposed as a method to identify practice variations that may represent inappropriate testing. This study investigated the use of benchmarking to identify sites with inappropriate utilization of testing for a particular analyte. We used a Web-based survey to compare 2 measures of vitamin D utilization: overall testing intensity (ratio of total vitamin D orders to blood-count orders) and relative testing intensity (ratio of 1,25(OH)2D to 25(OH)D test orders). A total of 81 facilities contributed data. The average overall testing intensity index was 0.165, or approximately 1 vitamin D test for every 6 blood-count tests. The average relative testing intensity index was 0.055, or one 1,25(OH)2D test for every 18 of the 25(OH)D tests. Both indexes varied considerably. Benchmarking can be used as a screening tool to identify outliers that may be associated with inappropriate test utilization. Copyright© by the American Society for Clinical Pathology (ASCP).
A study on operation efficiency evaluation based on firm's financial index and benchmark selection: take China Unicom as an example

NASA Astrophysics Data System (ADS)

Wu, Zu-guang; Tian, Zhan-jun; Liu, Hui; Huang, Rui; Zhu, Guo-hua

2009-07-01

Being the only listed telecom operators of A share market, China Unicom has always been attracted many institutional investors under the concept of 3G recent years,which itself is a great technical progress expectation.Do the institutional investors or the concept of technical progress have signficant effect on the improving of firm's operating efficiency?Though reviewing the documentary about operating efficiency we find that schoolars study this problem useing the regress analyzing based on traditional production function and data envelopment analysis(DEA) and financial index anayzing and marginal function and capital labor ratio coefficient etc. All the methods mainly based on macrodata. This paper we use the micro-data of company to evaluate the operating efficiency.Using factor analyzing based on financial index and comparing the factor score of three years from 2005 to 2007, we find that China Unicom's operating efficiency is under the averge level of benchmark corporates and has't improved under the concept of 3G from 2005 to 2007.In other words,institutional investor or the conception of technical progress expectation have faint effect on the changes of China Unicom's operating efficiency. Selecting benchmark corporates as post to evaluate the operating efficiency is a characteristic of this method ,which is basicallly sipmly and direct.This method is suit for the operation efficiency evaluation of agriculture listed companies because agriculture listed also face technical progress and marketing concept such as tax-free etc.
Ground truth and benchmarks for performance evaluation

NASA Astrophysics Data System (ADS)

Takeuchi, Ayako; Shneier, Michael; Hong, Tsai Hong; Chang, Tommy; Scrapper, Christopher; Cheok, Geraldine S.

2003-09-01

Progress in algorithm development and transfer of results to practical applications such as military robotics requires the setup of standard tasks, of standard qualitative and quantitative measurements for performance evaluation and validation. Although the evaluation and validation of algorithms have been discussed for over a decade, the research community still faces a lack of well-defined and standardized methodology. The range of fundamental problems include a lack of quantifiable measures of performance, a lack of data from state-of-the-art sensors in calibrated real-world environments, and a lack of facilities for conducting realistic experiments. In this research, we propose three methods for creating ground truth databases and benchmarks using multiple sensors. The databases and benchmarks will provide researchers with high quality data from suites of sensors operating in complex environments representing real problems of great relevance to the development of autonomous driving systems. At NIST, we have prototyped a High Mobility Multi-purpose Wheeled Vehicle (HMMWV) system with a suite of sensors including a Riegl ladar, GDRS ladar, stereo CCD, several color cameras, Global Position System (GPS), Inertial Navigation System (INS), pan/tilt encoders, and odometry . All sensors are calibrated with respect to each other in space and time. This allows a database of features and terrain elevation to be built. Ground truth for each sensor can then be extracted from the database. The main goal of this research is to provide ground truth databases for researchers and engineers to evaluate algorithms for effectiveness, efficiency, reliability, and robustness, thus advancing the development of algorithms.
Evaluation of four seagrass species as early warning indicators for nitrogen overloading: Implications for eutrophic evaluation and ecosystem management.

PubMed

Yang, Xiaolong; Zhang, Peidong; Li, Wentao; Hu, Chengye; Zhang, Xiumei; He, Pingguo

2018-04-23

Seagrasses are major coastal primary producers and are widely distributed on coasts worldwide. Seagrasses show sensitivity to environmental stress due to their high phenotypic plasticity, and therefore, we evaluated the use of constituent elements in four dominant seagrass species as early warning indicators for nitrogen eutrophication of coastal regions. A meta-analysis was conducted with published data to develop a global benchmark for the selected indicator, which was used to evaluate nitrogen loading at a global scale. A case study at three bays was subsequently conducted to test for local-scale differences in leaf C/N ratios in four seagrasses. Additionally, morphological and physiological metrics of seagrasses were measured from the three locations under varied nitrogen levels to develop further assessment indexes. The benchmark and local study showed that leaf C/N ratios of Zostera marina were sensitive to nitrogen discharge, which could be a highly valuable early warning indicator on a global scale. Moreover, the threshold value of seagrass leaf C/N was determined according to the benchmark to differentiate eutrophic and low nitrogen levels at a local scale. Of the eight phenotypic metrics measured, leaf width, total chlorophyll (a + b), chlorophyll ratio (a/b), and starch in the rhizome were the most effective at discriminating between the three locations and could also be promising indicators for monitoring eutrophication. Copyright © 2018. Published by Elsevier B.V.
Toward a standard for the evaluation of PET-Auto-Segmentation methods following the recommendations of AAPM task group No. 211: Requirements and implementation.

PubMed

Berthon, Beatrice; Spezi, Emiliano; Galavis, Paulina; Shepherd, Tony; Apte, Aditya; Hatt, Mathieu; Fayad, Hadi; De Bernardi, Elisabetta; Soffientini, Chiara D; Ross Schmidtlein, C; El Naqa, Issam; Jeraj, Robert; Lu, Wei; Das, Shiva; Zaidi, Habib; Mawlawi, Osama R; Visvikis, Dimitris; Lee, John A; Kirov, Assen S

2017-08-01

The aim of this paper is to define the requirements and describe the design and implementation of a standard benchmark tool for evaluation and validation of PET-auto-segmentation (PET-AS) algorithms. This work follows the recommendations of Task Group 211 (TG211) appointed by the American Association of Physicists in Medicine (AAPM). The recommendations published in the AAPM TG211 report were used to derive a set of required features and to guide the design and structure of a benchmarking software tool. These items included the selection of appropriate representative data and reference contours obtained from established approaches and the description of available metrics. The benchmark was designed in a way that it could be extendable by inclusion of bespoke segmentation methods, while maintaining its main purpose of being a standard testing platform for newly developed PET-AS methods. An example of implementation of the proposed framework, named PETASset, was built. In this work, a selection of PET-AS methods representing common approaches to PET image segmentation was evaluated within PETASset for the purpose of testing and demonstrating the capabilities of the software as a benchmark platform. A selection of clinical, physical, and simulated phantom data, including "best estimates" reference contours from macroscopic specimens, simulation template, and CT scans was built into the PETASset application database. Specific metrics such as Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity (S), were included to allow the user to compare the results of any given PET-AS algorithm to the reference contours. In addition, a tool to generate structured reports on the evaluation of the performance of PET-AS algorithms against the reference contours was built. The variation of the metric agreement values with the reference contours across the PET-AS methods evaluated for demonstration were between 0.51 and 0.83, 0.44 and 0.86, and 0.61 and 1.00 for DSC, PPV, and the S metric, respectively. Examples of agreement limits were provided to show how the software could be used to evaluate a new algorithm against the existing state-of-the art. PETASset provides a platform that allows standardizing the evaluation and comparison of different PET-AS methods on a wide range of PET datasets. The developed platform will be available to users willing to evaluate their PET-AS methods and contribute with more evaluation datasets. © 2017 The Authors. Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.

76 FR 44935 - Draft Guidance for Industry and Food and Drug Administration Staff; 510(k) Device Modifications...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-07-27

.... In August 2010, the Center for Devices and Radiological Health (CDRH) published two documents in.... These documents are titled ``CDRH Preliminary Internal Evaluations--Volume I: 510(k) Working Group Preliminary Report and Recommendations'' and ``CDRH Preliminary Internal Evaluations--Volume II: Task Force on...
Evaluation of the synoptic and mesoscale predictive capabilities of a mesoscale atmospheric simulation system

NASA Technical Reports Server (NTRS)

Koch, S. E.; Skillman, W. C.; Kocin, P. J.; Wetzel, P. J.; Brill, K.; Keyser, D. A.; Mccumber, M. C.

1983-01-01

The overall performance characteristics of a limited area, hydrostatic, fine (52 km) mesh, primitive equation, numerical weather prediction model are determined in anticipation of satellite data assimilations with the model. The synoptic and mesoscale predictive capabilities of version 2.0 of this model, the Mesoscale Atmospheric Simulation System (MASS 2.0), were evaluated. The two part study is based on a sample of approximately thirty 12h and 24h forecasts of atmospheric flow patterns during spring and early summer. The synoptic scale evaluation results benchmark the performance of MASS 2.0 against that of an operational, synoptic scale weather prediction model, the Limited area Fine Mesh (LFM). The large sample allows for the calculation of statistically significant measures of forecast accuracy and the determination of systematic model errors. The synoptic scale benchmark is required before unsmoothed mesoscale forecast fields can be seriously considered.
A heuristic approach to handle capacitated facility location problem evaluated using clustering internal evaluation

NASA Astrophysics Data System (ADS)

Sutanto, G. R.; Kim, S.; Kim, D.; Sutanto, H.

2018-03-01

One of the problems in dealing with capacitated facility location problem (CFLP) is occurred because of the difference between the capacity numbers of facilities and the number of customers that needs to be served. A facility with small capacity may result in uncovered customers. These customers need to be re-allocated to another facility that still has available capacity. Therefore, an approach is proposed to handle CFLP by using k-means clustering algorithm to handle customers’ allocation. And then, if customers’ re-allocation is needed, is decided by the overall average distance between customers and the facilities. This new approach is benchmarked to the existing approach by Liao and Guo which also use k-means clustering algorithm as a base idea to decide the facilities location and customers’ allocation. Both of these approaches are benchmarked by using three clustering evaluation methods with connectedness, compactness, and separations factors.
ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms

NASA Astrophysics Data System (ADS)

Pilla, Laércio L.; Bozzetti, Tiago C.; Castro, Márcio; Navaux, Philippe O. A.; Méhaut, Jean-François

2015-10-01

Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that enables the variation of a vast range of parameters that affect performance. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose a specific algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use in this paper with an evaluation of Charm++ periodic load balancers that stresses their characteristics.
RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms

NASA Astrophysics Data System (ADS)

Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
Organic contaminants, trace and major elements, and nutrients in water and sediment sampled in response to the Deepwater Horizon oil spill

USGS Publications Warehouse

Nowell, Lisa H.; Ludtke, Amy S.; Mueller, David K.; Scott, Jonathon C.

2011-01-01

Considering all the information evaluated in this report, there were significant differences between pre-landfall and post-landfall samples for PAH concentrations in sediment. Pre-landfall and post-landfall samples did not differ significantly in concentrations or benchmark exceedances for most organics in water or trace elements in sediment. For trace elements in water, aquatic-life benchmarks were exceeded in almost 50 percent of samples, but the high and variable analytical reporting levels precluded statistical comparison of benchmark exceedances between sampling periods. Concentrations of several PAH compounds in sediment were significantly higher in post-landfall samples than pre-landfall samples, and five of seven sites with the largest differences in PAH concentrations also had diagnostic geochemical evidence of Deepwater Horizon Macondo-1 oil from Rosenbauer and others (2010).
Maximal Unbiased Benchmarking Data Sets for Human Chemokine Receptors and Comparative Analysis.

PubMed

Xia, Jie; Reid, Terry-Elinor; Wu, Song; Zhang, Liangren; Wang, Xiang Simon

2018-05-29

Chemokine receptors (CRs) have long been druggable targets for the treatment of inflammatory diseases and HIV-1 infection. As a powerful technique, virtual screening (VS) has been widely applied to identifying small molecule leads for modern drug targets including CRs. For rational selection of a wide variety of VS approaches, ligand enrichment assessment based on a benchmarking data set has become an indispensable practice. However, the lack of versatile benchmarking sets for the whole CRs family that are able to unbiasedly evaluate every single approach including both structure- and ligand-based VS somewhat hinders modern drug discovery efforts. To address this issue, we constructed Maximal Unbiased Benchmarking Data sets for human Chemokine Receptors (MUBD-hCRs) using our recently developed tools of MUBD-DecoyMaker. The MUBD-hCRs encompasses 13 subtypes out of 20 chemokine receptors, composed of 404 ligands and 15756 decoys so far and is readily expandable in the future. It had been thoroughly validated that MUBD-hCRs ligands are chemically diverse while its decoys are maximal unbiased in terms of "artificial enrichment", "analogue bias". In addition, we studied the performance of MUBD-hCRs, in particular CXCR4 and CCR5 data sets, in ligand enrichment assessments of both structure- and ligand-based VS approaches in comparison with other benchmarking data sets available in the public domain and demonstrated that MUBD-hCRs is very capable of designating the optimal VS approach. MUBD-hCRs is a unique and maximal unbiased benchmarking set that covers major CRs subtypes so far.
NDEC: A NEA platform for nuclear data testing, verification and benchmarking

NASA Astrophysics Data System (ADS)

Díez, C. J.; Michel-Sendis, F.; Cabellos, O.; Bossant, M.; Soppera, N.

2017-09-01

The selection, testing, verification and benchmarking of evaluated nuclear data consists, in practice, in putting an evaluated file through a number of checking steps where different computational codes verify that the file and the data it contains complies with different requirements. These requirements range from format compliance to good performance in application cases, while at the same time physical constraints and the agreement with experimental data are verified. At NEA, the NDEC (Nuclear Data Evaluation Cycle) platform aims at providing, in a user friendly interface, a thorough diagnose of the quality of a submitted evaluated nuclear data file. Such diagnose is based on the results of different computational codes and routines which carry out the mentioned verifications, tests and checks. NDEC also searches synergies with other existing NEA tools and databases, such as JANIS, DICE or NDaST, including them into its working scheme. Hence, this paper presents NDEC, its current development status and its usage in the JEFF nuclear data project.
Development of a benchmark factor to detect wrinkles in bending parts

NASA Astrophysics Data System (ADS)

Engel, Bernd; Zehner, Bernd-Uwe; Mathes, Christian; Kuhnhen, Christopher

2013-12-01

The rotary draw bending process finds special use in the bending of parts with small bending radii. Due to the support of the forming zone during the bending process, semi-finished products with small wall thicknesses can be bent. One typical quality characteristic is the emergence of corrugations and wrinkles at the inside arc. Presently, the standard for the evaluation of wrinkles is insufficient. The wrinkles' distribution along the longitudinal axis of the tube results in an average value [1]. An evaluation of the wrinkles is not carried out. Due to the lack of an adequate basis of assessment, coordination problems between customers and suppliers occur. They result from an imprecision caused by the lack of quantitative evaluability of the geometric deviations at the inside arc. The benchmark factor for the inside arc presented in this article is an approach to holistically evaluate the geometric deviations at the inside arc. The classification of geometric deviations is carried out according to the area of the geometric characteristics and the respective flank angles.
Subsidence Rates in Southeast Texas as Determined by RTK GNSS Measurements of Preexisting Survey Markers

NASA Astrophysics Data System (ADS)

Kruger, J. M.

2013-12-01

This study determines the rates of subsidence or uplift in coastal areas of SE Texas by comparing recent GNSS measurements to the original orthometric heights of previously installed National Geodetic Survey (NGS) benchmarks. Understanding subsidence rates in coastal areas of SE Texas is critical when determining its vulnerability to local sea level rise and flooding, as well as for accurate survey control. The counties covered are Chambers, Galveston, Hardin, Jefferson, Liberty, Orange, and parts of Jasper and Newton counties. These counties lie between an earlier subsidence study conducted in Louisiana and an ongoing subsidence study of several counties around the Houston metropolitan area. The resurveying methods used in this RTK GNSS study allow a large area to be covered relatively quickly with enough detail to determine subsidence rates that are averaged over several decades. This information can be used to place more targeted GNSS observation stations in areas that appear to be rapidly subsiding. By continuously, or periodically, measuring the elevations at these targeted stations, current subsidence rates can be determined more accurately and at lower cost than by scattering a large number of GNSS stations over a wide area. This study was conducted using a Trimble R8 GNSS system on all NGS benchmarks that were found in the study area. Differential corrections were applied in real time using a VRS network of base stations. This system yields a nominal vertical accuracy of 1.5 to 2.0 cm for each 2 to 5 minute reading. Usually three of these readings were measured on each benchmark and averaged for the final result. A total of 367 benchmarks were resurveyed, most of which were suitable for vertical change rate calculations. Original NGS elevations were subtracted from the new elevations and divided by the time between the two elevation measurements to determine the average subsidence or uplift rate of the benchmark. Benchmarks used for determining the vertical change rates were monumented between1931 and 2006, thus yielding rates averaged for 5 to 80 years. Besides the errors inherent in RTK GNSS measurements, other sources of error for vertical change rates include inaccuracies in the original elevations published by the NGS and uncertainties about the year in which those original elevations were measured. Initial results show as much as -0.86 m of subsidence over a 58 year period on one benchmark in Jefferson County 30 km north of the coast, and up to +0.23 m of uplift over a 60 year period on one benchmark in Jasper County approximately 130 km north of the coast. Overall, preliminary results of the study show near zero vertical change rates to a maximum of -15.3 mm/yr subsidence in Chambers, Galveston, Liberty, and Jefferson counties, with the highest rates of subsidence in Jefferson and Chambers counties. Parts of Galveston, Orange, and Jasper counties show subsidence rates up to -9.1 mm/yr, but also show uplift rates up to +4.8 mm/yr. Potential causes of vertical change in the study area include expansion or contraction of near-surface clays due to changes in water content, compaction of near-surface to deeper sediments, growth faulting, groundwater, oil, or natural gas extraction or injection, and to a much smaller extent, tectonic effects.
Better Medicare Cost Report data are needed to help hospitals benchmark costs and performance.

PubMed

Magnus, S A; Smith, D G

2000-01-01

To evaluate costs and achieve cost control in the face of new technology and demands for efficiency from both managed care and governmental payers, hospitals need to benchmark their costs against those of other comparable hospitals. Since they typically use Medicare Cost Report (MCR) data for this purpose, a variety of cost accounting problems with the MCR may hamper hospitals' understanding of their relative costs and performance. Managers and researchers alike need to investigate the validity, accuracy, and timeliness of the MCR's cost accounting data.
Benchmark tests of JENDL-3.2 for thermal and fast reactors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Takano, Hideki; Akie, Hiroshi; Kikuchi, Yasuyuki

1994-12-31

Benchmark calculations for a variety of thermal and fast reactors have been performed by using the newly evaluated JENDL-3 Version-2 (JENDL-3.2) file. In the thermal reactor calculations for the uranium and plutonium fueled cores of TRX and TCA, the k{sub eff} and lattice parameters were well predicted. The fast reactor calculations for ZPPR-9 and FCA assemblies showed that the k{sub eff} reactivity worths of Doppler, sodium void and control rod, and reaction rate distribution were in a very good agreement with the experiments.
Engine dynamic analysis with general nonlinear finite element codes. Part 2: Bearing element implementation overall numerical characteristics and benchmaking

NASA Technical Reports Server (NTRS)

Padovan, J.; Adams, M.; Fertis, J.; Zeid, I.; Lam, P.

1982-01-01

Finite element codes are used in modelling rotor-bearing-stator structure common to the turbine industry. Engine dynamic simulation is used by developing strategies which enable the use of available finite element codes. benchmarking the elements developed are benchmarked by incorporation into a general purpose code (ADINA); the numerical characteristics of finite element type rotor-bearing-stator simulations are evaluated through the use of various types of explicit/implicit numerical integration operators. Improving the overall numerical efficiency of the procedure is improved.
Open-source platform to benchmark fingerprints for ligand-based virtual screening

PubMed Central

2013-01-01

Similarity-search methods using molecular fingerprints are an important tool for ligand-based virtual screening. A huge variety of fingerprints exist and their performance, usually assessed in retrospective benchmarking studies using data sets with known actives and known or assumed inactives, depends largely on the validation data sets used and the similarity measure used. Comparing new methods to existing ones in any systematic way is rather difficult due to the lack of standard data sets and evaluation procedures. Here, we present a standard platform for the benchmarking of 2D fingerprints. The open-source platform contains all source code, structural data for the actives and inactives used (drawn from three publicly available collections of data sets), and lists of randomly selected query molecules to be used for statistically valid comparisons of methods. This allows the exact reproduction and comparison of results for future studies. The results for 12 standard fingerprints together with two simple baseline fingerprints assessed by seven evaluation methods are shown together with the correlations between methods. High correlations were found between the 12 fingerprints and a careful statistical analysis showed that only the two baseline fingerprints were different from the others in a statistically significant way. High correlations were also found between six of the seven evaluation methods, indicating that despite their seeming differences, many of these methods are similar to each other. PMID:23721588
The solution of the point kinetics equations via converged accelerated Taylor series (CATS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ganapol, B.; Picca, P.; Previti, A.

This paper deals with finding accurate solutions of the point kinetics equations including non-linear feedback, in a fast, efficient and straightforward way. A truncated Taylor series is coupled to continuous analytical continuation to provide the recurrence relations to solve the ordinary differential equations of point kinetics. Non-linear (Wynn-epsilon) and linear (Romberg) convergence accelerations are employed to provide highly accurate results for the evaluation of Taylor series expansions and extrapolated values of neutron and precursor densities at desired edits. The proposed Converged Accelerated Taylor Series, or CATS, algorithm automatically performs successive mesh refinements until the desired accuracy is obtained, making usemore » of the intermediate results for converged initial values at each interval. Numerical performance is evaluated using case studies available from the literature. Nearly perfect agreement is found with the literature results generally considered most accurate. Benchmark quality results are reported for several cases of interest including step, ramp, zigzag and sinusoidal prescribed insertions and insertions with adiabatic Doppler feedback. A larger than usual (9) number of digits is included to encourage honest benchmarking. The benchmark is then applied to the enhanced piecewise constant algorithm (EPCA) currently being developed by the second author. (authors)« less
Bio-inspired benchmark generator for extracellular multi-unit recordings

PubMed Central

Mondragón-González, Sirenia Lizbeth; Burguière, Eric

2017-01-01

The analysis of multi-unit extracellular recordings of brain activity has led to the development of numerous tools, ranging from signal processing algorithms to electronic devices and applications. Currently, the evaluation and optimisation of these tools are hampered by the lack of ground-truth databases of neural signals. These databases must be parameterisable, easy to generate and bio-inspired, i.e. containing features encountered in real electrophysiological recording sessions. Towards that end, this article introduces an original computational approach to create fully annotated and parameterised benchmark datasets, generated from the summation of three components: neural signals from compartmental models and recorded extracellular spikes, non-stationary slow oscillations, and a variety of different types of artefacts. We present three application examples. (1) We reproduced in-vivo extracellular hippocampal multi-unit recordings from either tetrode or polytrode designs. (2) We simulated recordings in two different experimental conditions: anaesthetised and awake subjects. (3) Last, we also conducted a series of simulations to study the impact of different level of artefacts on extracellular recordings and their influence in the frequency domain. Beyond the results presented here, such a benchmark dataset generator has many applications such as calibration, evaluation and development of both hardware and software architectures. PMID:28233819
Evaluation of triclosan in Minnesota lakes and rivers: Part I - ecological risk assessment.

PubMed

Lyndall, Jennifer; Barber, Timothy; Mahaney, Wendy; Bock, Michael; Capdevielle, Marie

2017-08-01

Triclosan, an antimicrobial compound found in consumer products, may be introduced into the aquatic environment via residual concentrations in municipal wastewater treatment effluent. We conducted an aquatic risk assessment that incorporated the available measured triclosan data from Minnesota lakes and rivers. Although only data reported from Minnesota were considered in the risk assessment, the developed toxicity benchmarks can be applied to other environments. The data were evaluated using a series of environmental fate models to ensure the data were internally consistent and to fill any data gaps. Triclosan was not detected in over 75% of the 567 surface water and sediment samples. Measured environmental data were used to model the predicted environmental exposures to triclosan in surface water, surface sediment, and biota tissues. Toxicity benchmarks based on fatty acid synthesis inhibition and narcosis were determined for aquatic organisms based, in part, on a species sensitivity distribution of chronic toxicity thresholds from the available literature. Predicted and measured environmental concentrations for surface water, sediment, and tissue were below the effects benchmarks, indicating that exposure to triclosan in Minnesota lakes and rivers would not pose an unacceptable risk to aquatic organisms. Copyright © 2017 Elsevier Inc. All rights reserved.
Model benchmarking and reference signals for angled-beam shear wave ultrasonic nondestructive evaluation (NDE) inspections

NASA Astrophysics Data System (ADS)

Aldrin, John C.; Hopkins, Deborah; Datuin, Marvin; Warchol, Mark; Warchol, Lyudmila; Forsyth, David S.; Buynak, Charlie; Lindgren, Eric A.

2017-02-01

For model benchmark studies, the accuracy of the model is typically evaluated based on the change in response relative to a selected reference signal. The use of a side drilled hole (SDH) in a plate was investigated as a reference signal for angled beam shear wave inspection for aircraft structure inspections of fastener sites. Systematic studies were performed with varying SDH depth and size, and varying the ultrasonic probe frequency, focal depth, and probe height. Increased error was observed with the simulation of angled shear wave beams in the near-field. Even more significant, asymmetry in real probes and the inherent sensitivity of signals in the near-field to subtle test conditions were found to provide a greater challenge with achieving model agreement. To achieve quality model benchmark results for this problem, it is critical to carefully align the probe with the part geometry, to verify symmetry in probe response, and ideally avoid using reference signals from the near-field response. Suggested reference signals for angled beam shear wave inspections include using the `through hole' corner specular reflection signal and the full skip' signal off of the far wall from the side drilled hole.
Electrochemical Characterization Laboratory | Energy Systems Integration

Science.gov Websites

proton exchange membrane fuel cells. Photo of an NREL researcher evaluating catalyst activity in the the following capabilities: Determination and benchmarking of novel electrocatalyst activity
Towards unbiased benchmarking of evolutionary and hybrid algorithms for real-valued optimisation

NASA Astrophysics Data System (ADS)

MacNish, Cara

2007-12-01

Randomised population-based algorithms, such as evolutionary, genetic and swarm-based algorithms, and their hybrids with traditional search techniques, have proven successful and robust on many difficult real-valued optimisation problems. This success, along with the readily applicable nature of these techniques, has led to an explosion in the number of algorithms and variants proposed. In order for the field to advance it is necessary to carry out effective comparative evaluations of these algorithms, and thereby better identify and understand those properties that lead to better performance. This paper discusses the difficulties of providing benchmarking of evolutionary and allied algorithms that is both meaningful and logistically viable. To be meaningful the benchmarking test must give a fair comparison that is free, as far as possible, from biases that favour one style of algorithm over another. To be logistically viable it must overcome the need for pairwise comparison between all the proposed algorithms. To address the first problem, we begin by attempting to identify the biases that are inherent in commonly used benchmarking functions. We then describe a suite of test problems, generated recursively as self-similar or fractal landscapes, designed to overcome these biases. For the second, we describe a server that uses web services to allow researchers to 'plug in' their algorithms, running on their local machines, to a central benchmarking repository.

Benchmarking B-Cell Epitope Prediction with Quantitative Dose-Response Data on Antipeptide Antibodies: Towards Novel Pharmaceutical Product Development

PubMed Central

Caoili, Salvador Eugenio C.

2014-01-01

B-cell epitope prediction can enable novel pharmaceutical product development. However, a mechanistically framed consensus has yet to emerge on benchmarking such prediction, thus presenting an opportunity to establish standards of practice that circumvent epistemic inconsistencies of casting the epitope prediction task as a binary-classification problem. As an alternative to conventional dichotomous qualitative benchmark data, quantitative dose-response data on antibody-mediated biological effects are more meaningful from an information-theoretic perspective in the sense that such effects may be expressed as probabilities (e.g., of functional inhibition by antibody) for which the Shannon information entropy (SIE) can be evaluated as a measure of informativeness. Accordingly, half-maximal biological effects (e.g., at median inhibitory concentrations of antibody) correspond to maximally informative data while undetectable and maximal biological effects correspond to minimally informative data. This applies to benchmarking B-cell epitope prediction for the design of peptide-based immunogens that elicit antipeptide antibodies with functionally relevant cross-reactivity. Presently, the Immune Epitope Database (IEDB) contains relatively few quantitative dose-response data on such cross-reactivity. Only a small fraction of these IEDB data is maximally informative, and many more of them are minimally informative (i.e., with zero SIE). Nevertheless, the numerous qualitative data in IEDB suggest how to overcome the paucity of informative benchmark data. PMID:24949474
Requirements for benchmarking personal image retrieval systems

NASA Astrophysics Data System (ADS)

Bouguet, Jean-Yves; Dulong, Carole; Kozintsev, Igor; Wu, Yi

2006-01-01

It is now common to have accumulated tens of thousands of personal ictures. Efficient access to that many pictures can only be done with a robust image retrieval system. This application is of high interest to Intel processor architects. It is highly compute intensive, and could motivate end users to upgrade their personal computers to the next generations of processors. A key question is how to assess the robustness of a personal image retrieval system. Personal image databases are very different from digital libraries that have been used by many Content Based Image Retrieval Systems.1 For example a personal image database has a lot of pictures of people, but a small set of different people typically family, relatives, and friends. Pictures are taken in a limited set of places like home, work, school, and vacation destination. The most frequent queries are searched for people, and for places. These attributes, and many others affect how a personal image retrieval system should be benchmarked, and benchmarks need to be different from existing ones based on art images, or medical images for examples. The attributes of the data set do not change the list of components needed for the benchmarking of such systems as specified in2: - data sets - query tasks - ground truth - evaluation measures - benchmarking events. This paper proposed a way to build these components to be representative of personal image databases, and of the corresponding usage models.
Study rationale and design of OPTIMISE, a randomised controlled trial on the effect of benchmarking on quality of care in type 2 diabetes mellitus.

PubMed

Nobels, Frank; Debacker, Noëmi; Brotons, Carlos; Elisaf, Moses; Hermans, Michel P; Michel, Georges; Muls, Erik

2011-09-22

To investigate the effect of physician- and patient-specific feedback with benchmarking on the quality of care in adults with type 2 diabetes mellitus (T2DM). Study centres in six European countries were randomised to either a benchmarking or control group. Physicians in both groups received feedback on modifiable outcome indicators (glycated haemoglobin [HbA1c], glycaemia, total cholesterol, high density lipoprotein-cholesterol, low density lipoprotein [LDL]-cholesterol and triglycerides) for each patient at 0, 4, 8 and 12 months, based on the four times yearly control visits recommended by international guidelines. The benchmarking group also received comparative results on three critical quality indicators of vascular risk (HbA1c, LDL-cholesterol and systolic blood pressure [SBP]), checked against the results of their colleagues from the same country, and versus pre-set targets. After 12 months of follow up, the percentage of patients achieving the pre-determined targets for the three critical quality indicators will be assessed in the two groups. Recruitment was completed in December 2008 with 3994 evaluable patients. This paper discusses the study rationale and design of OPTIMISE, a randomised controlled study, that will help assess whether benchmarking is a useful clinical tool for improving outcomes in T2DM in primary care. NCT00681850.
Study rationale and design of OPTIMISE, a randomised controlled trial on the effect of benchmarking on quality of care in type 2 diabetes mellitus

PubMed Central

2011-01-01

Background To investigate the effect of physician- and patient-specific feedback with benchmarking on the quality of care in adults with type 2 diabetes mellitus (T2DM). Methods Study centres in six European countries were randomised to either a benchmarking or control group. Physicians in both groups received feedback on modifiable outcome indicators (glycated haemoglobin [HbA1c], glycaemia, total cholesterol, high density lipoprotein-cholesterol, low density lipoprotein [LDL]-cholesterol and triglycerides) for each patient at 0, 4, 8 and 12 months, based on the four times yearly control visits recommended by international guidelines. The benchmarking group also received comparative results on three critical quality indicators of vascular risk (HbA1c, LDL-cholesterol and systolic blood pressure [SBP]), checked against the results of their colleagues from the same country, and versus pre-set targets. After 12 months of follow up, the percentage of patients achieving the pre-determined targets for the three critical quality indicators will be assessed in the two groups. Results Recruitment was completed in December 2008 with 3994 evaluable patients. Conclusions This paper discusses the study rationale and design of OPTIMISE, a randomised controlled study, that will help assess whether benchmarking is a useful clinical tool for improving outcomes in T2DM in primary care. Trial registration NCT00681850 PMID:21939502
OPTIMIZATION OF MUD HAMMER DRILLING PERFORMANCE - A PROGRAM TO BENCHMARK THE VIABILITY OF ADVANCED MUD HAMMER DRILLING

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gordon Tibbitts; Arnis Judzis

2001-04-01

This document details the progress to date on the OPTIMIZATION OF MUD HAMMER DRILLING PERFORMANCE -- A PROGRAM TO BENCHMARK THE VIABILITY OF ADVANCED MUD HAMMER DRILLING contract for the quarter starting January 2001 through March 2001. Accomplishments to date include the following: (1) On January 9th of 2001, details of the Mud Hammer Drilling Performance Testing Project were presented at a ''kick-off'' meeting held in Morgantown. (2) A preliminary test program was formulated and prepared for presentation at a meeting of the advisory board in Houston on the 8th of February. (3) The meeting was held with the advisorymore » board reviewing the test program in detail. (4) Consensus was achieved and the approved test program was initiated after thorough discussion. (5) This new program outlined the details of the drilling tests as well as scheduling the test program for the weeks of 14th and 21st of May 2001. (6) All the tasks were initiated for a completion to coincide with the test schedule. (7) By the end of March the hardware had been designed and the majority was either being fabricated or completed. (8) The rock was received and cored into cylinders.« less
A comparison of international occupational therapy competencies: implications for Australian standards in the new millennium.

PubMed

Rodger, Sylvia; Clark, Michele; Banks, Rebecca; O'Brien, Mia; Martinez, Kay

2009-12-01

A timely evaluation of the Australian Competency Standards for Entry-Level Occupational Therapists (1994) was conducted. This thorough investigation comprised a literature review exploring the concept of competence and the applications of competency standards; systematic benchmarking of the Australian Occupational Therapy Competency Standards (OT AUSTRALIA, 1994) against other national and international competency standards and other affiliated documents, from occupational therapy and other cognate disciplines; and extensive nationwide consultation with the professional community. This paper explores and examines the similarities and disparities between occupational therapy competency standards documents available in English from Australia and other countries. An online search for national occupational therapy competency standards located 10 documents, including the Australian competencies. Four 'frameworks' were created to categorise the documents according to their conceptual underpinnings: Technical-Prescriptive, Enabling, Educational and Meta-Cognitive. Other characteristics that appeared to impact the design, content and implementation of competency standards, including definitions of key concepts, authorship, national and cultural priorities, scope of services, intended use and review mechanisms, were revealed. The proposed 'frameworks' and identification of influential characteristics provided a 'lens' through which to understand and evaluate competency standards. While consistent application of and attention to some of these characteristics appear to consolidate and affirm the authority of competency standards, it is suggested that the national context should be a critical determinant of the design and content of the final document. The Australian Occupational Therapy Competency Standards (OT AUSTRALIA, 1994) are critiqued accordingly, and preliminary recommendations for revision are proposed.
Design and Application of a Community Land Benchmarking System for Earth System Models

NASA Astrophysics Data System (ADS)

Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Koven, C. D.; Kluzek, E. B.; Mao, J.; Randerson, J. T.

2015-12-01

Benchmarking has been widely used to assess the ability of climate models to capture the spatial and temporal variability of observations during the historical era. For the carbon cycle and terrestrial ecosystems, the design and development of an open-source community platform has been an important goal as part of the International Land Model Benchmarking (ILAMB) project. Here we developed a new benchmarking software system that enables the user to specify the models, benchmarks, and scoring metrics, so that results can be tailored to specific model intercomparison projects. Evaluation data sets included soil and aboveground carbon stocks, fluxes of energy, carbon and water, burned area, leaf area, and climate forcing and response variables. We used this system to evaluate simulations from the 5th Phase of the Coupled Model Intercomparison Project (CMIP5) with prognostic atmospheric carbon dioxide levels over the period from 1850 to 2005 (i.e., esmHistorical simulations archived on the Earth System Grid Federation). We found that the multi-model ensemble had a high bias in incoming solar radiation across Asia, likely as a consequence of incomplete representation of aerosol effects in this region, and in South America, primarily as a consequence of a low bias in mean annual precipitation. The reduced precipitation in South America had a larger influence on gross primary production than the high bias in incoming light, and as a consequence gross primary production had a low bias relative to the observations. Although model to model variations were large, the multi-model mean had a positive bias in atmospheric carbon dioxide that has been attributed in past work to weak ocean uptake of fossil emissions. In mid latitudes of the northern hemisphere, most models overestimate latent heat fluxes in the early part of the growing season, and underestimate these fluxes in mid-summer and early fall, whereas sensible heat fluxes show the opposite trend.
Selecting Students at Risk of Academic Difficulties

ERIC Educational Resources Information Center

Cummings, Kelli D.; Smolkowski, Keith

2015-01-01

This paper aims to translate for practitioners the principles and methods for evaluating screening measures in education, including benchmark goals and cut points, from our technical manuscript "Evaluation of Diagnostic Systems: The Selection of Students at Risk of Academic Difficulties" (this issue). We offer a brief description of…
A Journal-Neutral Ratio for Marketing Faculty Scholarship Assessment

ERIC Educational Resources Information Center

Elbeck, Matt; Baruca, Arne

2015-01-01

This article proposes a journal-neutral Publication to Citation Ratio (PCR) to complement qualitative methods to evaluate a marketing educator's scholarship for reappointment, promotion, tenure, and post-tenure review (RPTP) decisions. We empirically establish a minimum time period to evaluate scholarship data, then benchmark publication and…
Highly Enriched Uranium Metal Cylinders Surrounded by Various Reflector Materials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernard Jones; J. Blair Briggs; Leland Monteirth

A series of experiments was performed at Los Alamos Scientific Laboratory in 1958 to determine critical masses of cylinders of Oralloy (Oy) reflected by a number of materials. The experiments were all performed on the Comet Universal Critical Assembly Machine, and consisted of discs of highly enriched uranium (93.3 wt.% 235U) reflected by half-inch and one-inch-thick cylindrical shells of various reflector materials. The experiments were performed by members of Group N-2, particularly K. W. Gallup, G. E. Hansen, H. C. Paxton, and R. H. White. This experiment was intended to ascertain critical masses for criticality safety purposes, as well asmore » to compare neutron transport cross sections to those obtained from danger coefficient measurements with the Topsy Oralloy-Tuballoy reflected and Godiva unreflected critical assemblies. The reflector materials examined in this series of experiments are as follows: magnesium, titanium, aluminum, graphite, mild steel, nickel, copper, cobalt, molybdenum, natural uranium, tungsten, beryllium, aluminum oxide, molybdenum carbide, and polythene (polyethylene). Also included are two special configurations of composite beryllium and iron reflectors. Analyses were performed in which uncertainty associated with six different parameters was evaluated; namely, extrapolation to the uranium critical mass, uranium density, 235U enrichment, reflector density, reflector thickness, and reflector impurities. In addition to the idealizations made by the experimenters (removal of the platen and diaphragm), two simplifications were also made to the benchmark models that resulted in a small bias and additional uncertainty. First of all, since impurities in core and reflector materials are only estimated, they are not included in the benchmark models. Secondly, the room, support structure, and other possible surrounding equipment were not included in the model. Bias values that result from these two simplifications were determined and associated uncertainty in the bias values were included in the overall uncertainty in benchmark keff values. Bias values were very small, ranging from 0.0004 ?k low to 0.0007 ?k low. Overall uncertainties range from ? 0.0018 to ? 0.0030. Major contributors to the overall uncertainty include uncertainty in the extrapolation to the uranium critical mass and the uranium density. Results are summarized in Figure 1. Figure 1. Experimental, Benchmark-Model, and MCNP/KENO Calculated Results The 32 configurations described and evaluated under ICSBEP Identifier HEU-MET-FAST-084 are judged to be acceptable for use as criticality safety benchmark experiments and should be valuable integral benchmarks for nuclear data testing of the various reflector materials. Details of the benchmark models, uncertainty analyses, and final results are given in this paper.« less
Preliminary Computational Analysis of the (HIRENASD) Configuration in Preparation for the Aeroelastic Prediction Workshop

NASA Technical Reports Server (NTRS)

Chwalowski, Pawel; Florance, Jennifer P.; Heeg, Jennifer; Wieseman, Carol D.; Perry, Boyd P.

2011-01-01

This paper presents preliminary computational aeroelastic analysis results generated in preparation for the first Aeroelastic Prediction Workshop (AePW). These results were produced using FUN3D software developed at NASA Langley and are compared against the experimental data generated during the HIgh REynolds Number Aero- Structural Dynamics (HIRENASD) Project. The HIRENASD wind-tunnel model was tested in the European Transonic Windtunnel in 2006 by Aachen University0s Department of Mechanics with funding from the German Research Foundation. The computational effort discussed here was performed (1) to obtain a preliminary assessment of the ability of the FUN3D code to accurately compute physical quantities experimentally measured on the HIRENASD model and (2) to translate the lessons learned from the FUN3D analysis of HIRENASD into a set of initial guidelines for the first AePW, which includes test cases for the HIRENASD model and its experimental data set. This paper compares the computational and experimental results obtained at Mach 0.8 for a Reynolds number of 7 million based on chord, corresponding to the HIRENASD test conditions No. 132 and No. 159. Aerodynamic loads and static aeroelastic displacements are compared at two levels of the grid resolution. Harmonic perturbation numerical results are compared with the experimental data using the magnitude and phase relationship between pressure coefficients and displacement. A dynamic aeroelastic numerical calculation is presented at one wind-tunnel condition in the form of the time history of the generalized displacements. Additional FUN3D validation results are also presented for the AGARD 445.6 wing data set. This wing was tested in the Transonic Dynamics Tunnel and is commonly used in the preliminary benchmarking of computational aeroelastic software.
Forest resource information system

NASA Technical Reports Server (NTRS)

Mroczynski, R. P. (Principal Investigator)

1978-01-01

The author has identified the following significant results. A benchmark classification evaluation framework was implemented. The FRIS preprocessing activities were refined. Potential geo-based referencing systems were identified as components of FRIS.
7 CFR 3430.32 - Preliminary application review.

Code of Federal Regulations, 2010 CFR

2010-01-01

..., EDUCATION, AND EXTENSION SERVICE, DEPARTMENT OF AGRICULTURE COMPETITIVE AND NONCOMPETITIVE NON-FORMULA... Evaluation § 3430.32 Preliminary application review. Prior to technical examination, a preliminary review...
Groundwater-quality data for the Madera/Chowchilla–Kings shallow aquifer study unit, 2013–14: Results from the California GAMA Program

USGS Publications Warehouse

Shelton, Jennifer L.; Fram, Miranda S.

2017-02-03

Groundwater quality in the 2,390-square-mile Madera/Chowchilla–Kings Shallow Aquifer study unit was investigated by the U.S. Geological Survey from August 2013 to April 2014 as part of the California State Water Resources Control Board Groundwater Ambient Monitoring and Assessment Program’s Priority Basin Project. The study was designed to provide a statistically unbiased, spatially distributed assessment of untreated groundwater quality in the shallow aquifer systems of the Madera, Chowchilla, and Kings subbasins of the San Joaquin Valley groundwater basin. The shallow aquifer system corresponds to the part of the aquifer system generally used by domestic wells and is shallower than the part of the aquifer system generally used by public-supply wells. This report presents the data collected for the study and a brief preliminary description of the results.Groundwater samples were collected from 77 wells and were analyzed for organic constituents, inorganic constituents, selected isotopic and age-dating tracers, and microbial indicators. Most of the wells sampled for this study were private domestic wells. Unlike groundwater from public-supply wells, the groundwater from private domestic wells is not regulated for quality in California and is rarely analyzed for water-quality constituents. To provide context for the sampling results, however, concentrations of constituents measured in the untreated groundwater were compared with regulatory and non-regulatory benchmarks established for drinking-water quality by the U.S. Environmental Protection Agency, the State of California, and the U.S. Geological Survey.Of the 319 organic constituents assessed in this study (90 volatile organic compounds and 229 pesticides and pesticide degradates), 17 volatile organic compounds and 23 pesticides and pesticide degradates were detected in groundwater samples; concentrations of all but 2 were less than the respective benchmarks. The fumigants 1,2-dibromo-3-chloropropane (DBCP) and 1,2-dibromoethane (EDB) were detected at concentrations above their respective regulatory benchmarks in samples from a total of four wells.Most detections of inorganic constituents were at concentrations or activities less than the respective benchmark levels. Five inorganic constituents were detected in groundwater samples from one or more wells at concentrations or activities greater than their respective regulatory, health-based benchmarks: arsenic, uranium, nitrate, adjusted gross alpha particle activity, and gross beta particle activity. Four inorganic constituents were detected in samples from one or more wells at concentrations or activities greater than their respective non-regulatory, health-based benchmarks: manganese, molybdenum, vanadium, and radon-222. Three inorganic constituents were detected in groundwater samples from one or more wells at concentrations greater than their respective non-regulatory, aesthetic-based benchmarks: iron, sulfate, and total dissolved solids.Microbial indicators (Escherichia coli, total coliform, and enterococci) were analyzed for presence or absence. The presence of Escherichia coli (E. coli) was not detected; the presence of total coliform was detected in samples from 10 of the 72 grid wells for which it was analyzed, and the presence of enterococci was detected in samples from 5 of the 73 grid wells analyzed.
GEN-IV Benchmarking of Triso Fuel Performance Models under accident conditions modeling input data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Collin, Blaise Paul

This document presents the benchmark plan for the calculation of particle fuel performance on safety testing experiments that are representative of operational accidental transients. The benchmark is dedicated to the modeling of fission product release under accident conditions by fuel performance codes from around the world, and the subsequent comparison to post-irradiation experiment (PIE) data from the modeled heating tests. The accident condition benchmark is divided into three parts: • The modeling of a simplified benchmark problem to assess potential numerical calculation issues at low fission product release. • The modeling of the AGR-1 and HFR-EU1bis safety testing experiments. •more » The comparison of the AGR-1 and HFR-EU1bis modeling results with PIE data. The simplified benchmark case, thereafter named NCC (Numerical Calculation Case), is derived from “Case 5” of the International Atomic Energy Agency (IAEA) Coordinated Research Program (CRP) on coated particle fuel technology [IAEA 2012]. It is included so participants can evaluate their codes at low fission product release. “Case 5” of the IAEA CRP-6 showed large code-to-code discrepancies in the release of fission products, which were attributed to “effects of the numerical calculation method rather than the physical model” [IAEA 2012]. The NCC is therefore intended to check if these numerical effects subsist. The first two steps imply the involvement of the benchmark participants with a modeling effort following the guidelines and recommendations provided by this document. The third step involves the collection of the modeling results by Idaho National Laboratory (INL) and the comparison of these results with the available PIE data. The objective of this document is to provide all necessary input data to model the benchmark cases, and to give some methodology guidelines and recommendations in order to make all results suitable for comparison with each other. The participants should read this document thoroughly to make sure all the data needed for their calculations is provided in the document. Missing data will be added to a revision of the document if necessary. 09/2016: Tables 6 and 8 updated. AGR-2 input data added« less
Generation IV benchmarking of TRISO fuel performance models under accident conditions: Modeling input data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Collin, Blaise P.

2014-09-01

This document presents the benchmark plan for the calculation of particle fuel performance on safety testing experiments that are representative of operational accidental transients. The benchmark is dedicated to the modeling of fission product release under accident conditions by fuel performance codes from around the world, and the subsequent comparison to post-irradiation experiment (PIE) data from the modeled heating tests. The accident condition benchmark is divided into three parts: the modeling of a simplified benchmark problem to assess potential numerical calculation issues at low fission product release; the modeling of the AGR-1 and HFR-EU1bis safety testing experiments; and, the comparisonmore » of the AGR-1 and HFR-EU1bis modeling results with PIE data. The simplified benchmark case, thereafter named NCC (Numerical Calculation Case), is derived from ''Case 5'' of the International Atomic Energy Agency (IAEA) Coordinated Research Program (CRP) on coated particle fuel technology [IAEA 2012]. It is included so participants can evaluate their codes at low fission product release. ''Case 5'' of the IAEA CRP-6 showed large code-to-code discrepancies in the release of fission products, which were attributed to ''effects of the numerical calculation method rather than the physical model''[IAEA 2012]. The NCC is therefore intended to check if these numerical effects subsist. The first two steps imply the involvement of the benchmark participants with a modeling effort following the guidelines and recommendations provided by this document. The third step involves the collection of the modeling results by Idaho National Laboratory (INL) and the comparison of these results with the available PIE data. The objective of this document is to provide all necessary input data to model the benchmark cases, and to give some methodology guidelines and recommendations in order to make all results suitable for comparison with each other. The participants should read this document thoroughly to make sure all the data needed for their calculations is provided in the document. Missing data will be added to a revision of the document if necessary.« less
Monte Carlo source simulation technique for solution of interference reactions in INAA experiments: a preliminary report

NASA Astrophysics Data System (ADS)

Allaf, M. Athari; Shahriari, M.; Sohrabpour, M.

2004-04-01

A new method using Monte Carlo source simulation of interference reactions in neutron activation analysis experiments has been developed. The neutron spectrum at the sample location has been simulated using the Monte Carlo code MCNP and the contributions of different elements to produce a specified gamma line have been determined. The produced response matrix has been used to measure peak areas and the sample masses of the elements of interest. A number of benchmark experiments have been performed and the calculated results verified against known values. The good agreement obtained between the calculated and known values suggests that this technique may be useful for the elimination of interference reactions in neutron activation analysis.
GENASIS: General Astrophysical Simulation System. I. Refinable Mesh and Nonrelativistic Hydrodynamics

NASA Astrophysics Data System (ADS)

Cardall, Christian Y.; Budiardja, Reuben D.; Endeve, Eirik; Mezzacappa, Anthony

2014-02-01

GenASiS (General Astrophysical Simulation System) is a new code being developed initially and primarily, though by no means exclusively, for the simulation of core-collapse supernovae on the world's leading capability supercomputers. This paper—the first in a series—demonstrates a centrally refined coordinate patch suitable for gravitational collapse and documents methods for compressible nonrelativistic hydrodynamics. We benchmark the hydrodynamics capabilities of GenASiS against many standard test problems; the results illustrate the basic competence of our implementation, demonstrate the strengths and limitations of the HLLC relative to the HLL Riemann solver in a number of interesting cases, and provide preliminary indications of the code's ability to scale and to function with cell-by-cell fixed-mesh refinement.
Energy saving in WWTP: Daily benchmarking under uncertainty and data availability limitations.

PubMed

Torregrossa, D; Schutz, G; Cornelissen, A; Hernández-Sancho, F; Hansen, J

2016-07-01

Efficient management of Waste Water Treatment Plants (WWTPs) can produce significant environmental and economic benefits. Energy benchmarking can be used to compare WWTPs, identify targets and use these to improve their performance. Different authors have performed benchmark analysis on monthly or yearly basis but their approaches suffer from a time lag between an event, its detection, interpretation and potential actions. The availability of on-line measurement data on many WWTPs should theoretically enable the decrease of the management response time by daily benchmarking. Unfortunately this approach is often impossible because of limited data availability. This paper proposes a methodology to perform a daily benchmark analysis under database limitations. The methodology has been applied to the Energy Online System (EOS) developed in the framework of the project "INNERS" (INNovative Energy Recovery Strategies in the urban water cycle). EOS calculates a set of Key Performance Indicators (KPIs) for the evaluation of energy and process performances. In EOS, the energy KPIs take in consideration the pollutant load in order to enable the comparison between different plants. For example, EOS does not analyse the energy consumption but the energy consumption on pollutant load. This approach enables the comparison of performances for plants with different loads or for a single plant under different load conditions. The energy consumption is measured by on-line sensors, while the pollutant load is measured in the laboratory approximately every 14 days. Consequently, the unavailability of the water quality parameters is the limiting factor in calculating energy KPIs. In this paper, in order to overcome this limitation, the authors have developed a methodology to estimate the required parameters and manage the uncertainty in the estimation. By coupling the parameter estimation with an interval based benchmark approach, the authors propose an effective, fast and reproducible way to manage infrequent inlet measurements. Its use enables benchmarking on a daily basis and prepares the ground for further investigation. Copyright © 2016 Elsevier Inc. All rights reserved.
Low Molecular Weight Polymethacrylates as Multi-Functional Lubricant Additives

DOE PAGES

Cosimbescu, Lelia; Vellore, Azhar; Shantini Ramasamy, Uma; ...

2018-04-24

In this study, low molecular weight, moderately polar polymethacrylate polymers are explored as potential multi-functional lubricant additives. The performance of these novel additives in base oil is evaluated in terms of their viscosity index, shear stability, and friction-and-wear. The new compounds are compared to two benchmarks, a typical polymeric viscosity modifier and a fully-formulated oil. Results show that the best performing of the new polymers exhibit viscosity index and friction comparable to that of both benchmarks, far superior shear stability to either benchmark (as much as 15x lower shear loss), and wear reduction significantly better than a typical viscosity modifiermore » (lower wear volume by a factor of 2-3). The findings also suggest that the polarity and molecular weight of the polymers affect their performance which suggests future synthetic strategies may enable this new class of additives to replace multiple additives in typical lubricant formulations.« less

Benchmarking the efficiency of the Chilean water and sewerage companies: a double-bootstrap approach.

PubMed

Molinos-Senante, María; Donoso, Guillermo; Sala-Garrido, Ramon; Villegas, Andrés

2018-03-01

Benchmarking the efficiency of water companies is essential to set water tariffs and to promote their sustainability. In doing so, most of the previous studies have applied conventional data envelopment analysis (DEA) models. However, it is a deterministic method that does not allow to identify environmental factors influencing efficiency scores. To overcome this limitation, this paper evaluates the efficiency of a sample of Chilean water and sewerage companies applying a double-bootstrap DEA model. Results evidenced that the ranking of water and sewerage companies changes notably whether efficiency scores are computed applying conventional or double-bootstrap DEA models. Moreover, it was found that the percentage of non-revenue water and customer density are factors influencing the efficiency of Chilean water and sewerage companies. This paper illustrates the importance of using a robust and reliable method to increase the relevance of benchmarking tools.
Low Molecular Weight Polymethacrylates as Multi-Functional Lubricant Additives

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cosimbescu, Lelia; Vellore, Azhar; Shantini Ramasamy, Uma

In this study, low molecular weight, moderately polar polymethacrylate polymers are explored as potential multi-functional lubricant additives. The performance of these novel additives in base oil is evaluated in terms of their viscosity index, shear stability, and friction-and-wear. The new compounds are compared to two benchmarks, a typical polymeric viscosity modifier and a fully-formulated oil. Results show that the best performing of the new polymers exhibit viscosity index and friction comparable to that of both benchmarks, far superior shear stability to either benchmark (as much as 15x lower shear loss), and wear reduction significantly better than a typical viscosity modifiermore » (lower wear volume by a factor of 2-3). The findings also suggest that the polarity and molecular weight of the polymers affect their performance which suggests future synthetic strategies may enable this new class of additives to replace multiple additives in typical lubricant formulations.« less
NACA0012 benchmark model experimental flutter results with unsteady pressure distributions

NASA Technical Reports Server (NTRS)

Rivera, Jose A., Jr.; Dansberry, Bryan E.; Bennett, Robert M.; Durham, Michael H.; Silva, Walter A.

1992-01-01

The Structural Dynamics Division at NASA Langley Research Center has started a wind tunnel activity referred to as the Benchmark Models Program. The primary objective of this program is to acquire measured dynamic instability and corresponding pressure data that will be useful for developing and evaluating aeroelastic type computational fluid dynamics codes currently in use or under development. The program is a multi-year activity that will involve testing of several different models to investigate various aeroelastic phenomena. This paper describes results obtained from a second wind tunnel test of the first model in the Benchmark Models Program. This first model consisted of a rigid semispan wing having a rectangular planform and a NACA 0012 airfoil shape which was mounted on a flexible two degree of freedom mount system. Experimental flutter boundaries and corresponding unsteady pressure distribution data acquired over two model chords located at the 60 and 95 percent span stations are presented.
A Bayesian approach to traffic light detection and mapping

NASA Astrophysics Data System (ADS)

Hosseinyalamdary, Siavash; Yilmaz, Alper

2017-03-01

Automatic traffic light detection and mapping is an open research problem. The traffic lights vary in color, shape, geolocation, activation pattern, and installation which complicate their automated detection. In addition, the image of the traffic lights may be noisy, overexposed, underexposed, or occluded. In order to address this problem, we propose a Bayesian inference framework to detect and map traffic lights. In addition to the spatio-temporal consistency constraint, traffic light characteristics such as color, shape and height is shown to further improve the accuracy of the proposed approach. The proposed approach has been evaluated on two benchmark datasets and has been shown to outperform earlier studies. The results show that the precision and recall rates for the KITTI benchmark are 95.78 % and 92.95 % respectively and the precision and recall rates for the LARA benchmark are 98.66 % and 94.65 % .
The Role of Focus Groups with Other Performance Measurement Methods.

ERIC Educational Resources Information Center

Hart, Elizabeth

Huddersfield University Library (England) has undertaken a wide range of evaluative studies of its services and systems, using various data collection techniques such as: user surveys; exit interviews; online and CD-ROM analysis; benchmarking; user groups; staffing and staff development evaluation; suggestion sheets; student project work; group…
Assessment and Evaluation Methods for Access Services

ERIC Educational Resources Information Center

Long, Dallas

2014-01-01

This article serves as a primer for assessment and evaluation design by describing the range of methods commonly employed in library settings. Quantitative methods, such as counting and benchmarking measures, are useful for investigating the internal operations of an access services department in order to identify workflow inefficiencies or…
Evaluative Usage-Based Metrics for the Selection of E-Journals.

ERIC Educational Resources Information Center

Hahn, Karla L.; Faulkner, Lila A.

2002-01-01

Explores electronic journal usage statistics and develops three metrics and three benchmarks based on those metrics. Topics include earlier work that assessed the value of print journals and was modified for the electronic format; the evaluation of potential purchases; and implications for standards development, including the need for content…
The Domain Five Observation Instrument: A Competency-Based Coach Evaluation Tool

ERIC Educational Resources Information Center

Shangraw, Rebecca

2017-01-01

The Domain Five Observation Instrument (DFOI) is a competency-based observation instrument recommended for sport leaders or researchers who wish to evaluate coaches' instructional behaviors. The DFOI includes 10 behavior categories and four timed categories that encompass 34 observable instructional benchmarks outlined in domain five of the…
Evaluating the Effectiveness of Student Assistance Programs in Pennsylvania.

ERIC Educational Resources Information Center

Fertman, Carl I.; Fichter, Cele; Schlesinger, Jo; Tarasevich, Susan; Wald, Holly; Zhang, Xiaoyan

2001-01-01

This evaluation of the Pennsylvania Student Assistance Program (SAP) was conducted to determine overall efficacy of SAPs and, more specifically, how SAP is currently implemented. Findings indicate that SAP is being implemented as designed. Recommended is the development of benchmarks and indicators focusing on best SAP practices and effectiveness.…
Revamping Teacher Evaluation

ERIC Educational Resources Information Center

Zatynski, Mandy

2012-01-01

In the past two years, as concerns over teacher quality have swelled, teacher evaluation has emerged as a crucial tool for principals and other administrators to improve instructor performance. More states are seeking federal waivers to the stringent benchmarks of No Child Left Behind; others are vying for Race to the Top funds. Both require…
EVALUATION OF A CRYPTOSPORIDIUM INTERNAL STANDARD FOR DETERMINING RECOVERY WITH ENVIRONMENTAL PROTECTION AGENCY METHOD 1623

EPA Science Inventory

The current benchmark method for detecting Cryptosporidium oocysts in water is the U.S. Environmental Protection Agency (U.S. EPA) Method 1623. Studies evaluating this method report that recoveries are highly variable and dependent upon laboratory, water sample, and analyst. Ther...
Assessing Civic Competence against the Normative Benchmark of Considered Opinions

ERIC Educational Resources Information Center

Kim, Jin Woo

2017-01-01

There is widespread skepticism about civic competence. Some question if citizens are informed enough to make considered decisions. Others doubt citizens' ability to rationally evaluate relevant evidence and update their opinions even when they have necessary information. The purpose of my dissertation is to critically evaluate this literature and…
Highway Vehicle Retrofit Evaluation : Phase I. Analysis and Preliminary Evaluation Results. Volume 1. Sections 1 through 3.

DOT National Transportation Integrated Search

1975-11-01

This report in two volumes presents an analysis and preliminary evaluation of selected used-car and light-truck fuel economy retrofit devices. In particular, information is provided that depicts the performance characteristics of retrofit devices tha...
Preliminary seismic evaluation and ranking of bridges along I-24 in Western Kentucky.

DOT National Transportation Integrated Search

2006-09-01

This study represents one of the Seismic Evaluation of I-24 Bridges investigative series. The focus is on preliminary seismic evaluation and ranking of bridges according to their seismic vulnerability. Bridges along I-24 are considered in this invest...
Validating Cellular Automata Lava Flow Emplacement Algorithms with Standard Benchmarks

NASA Astrophysics Data System (ADS)

Richardson, J. A.; Connor, L.; Charbonnier, S. J.; Connor, C.; Gallant, E.

2015-12-01

A major existing need in assessing lava flow simulators is a common set of validation benchmark tests. We propose three levels of benchmarks which test model output against increasingly complex standards. First, imulated lava flows should be morphologically identical, given changes in parameter space that should be inconsequential, such as slope direction. Second, lava flows simulated in simple parameter spaces can be tested against analytical solutions or empirical relationships seen in Bingham fluids. For instance, a lava flow simulated on a flat surface should produce a circular outline. Third, lava flows simulated over real world topography can be compared to recent real world lava flows, such as those at Tolbachik, Russia, and Fogo, Cape Verde. Success or failure of emplacement algorithms in these validation benchmarks can be determined using a Bayesian approach, which directly tests the ability of an emplacement algorithm to correctly forecast lava inundation. Here we focus on two posterior metrics, P(A|B) and P(¬A|¬B), which describe the positive and negative predictive value of flow algorithms. This is an improvement on less direct statistics such as model sensitivity and the Jaccard fitness coefficient. We have performed these validation benchmarks on a new, modular lava flow emplacement simulator that we have developed. This simulator, which we call MOLASSES, follows a Cellular Automata (CA) method. The code is developed in several interchangeable modules, which enables quick modification of the distribution algorithm from cell locations to their neighbors. By assessing several different distribution schemes with the benchmark tests, we have improved the performance of MOLASSES to correctly match early stages of the 2012-3 Tolbachik Flow, Kamchakta Russia, to 80%. We also can evaluate model performance given uncertain input parameters using a Monte Carlo setup. This illuminates sensitivity to model uncertainty.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Mackillop, William J., E-mail: william.mackillop@krcc.on.ca; Kong, Weidong; Brundage, Michael

Purpose: Estimates of the appropriate rate of use of radiation therapy (RT) are required for planning and monitoring access to RT. Our objective was to compare estimates of the appropriate rate of use of RT derived from mathematical models, with the rate observed in a population of patients with optimal access to RT. Methods and Materials: The rate of use of RT within 1 year of diagnosis (RT{sub 1Y}) was measured in the 134,541 cases diagnosed in Ontario between November 2009 and October 2011. The lifetime rate of use of RT (RT{sub LIFETIME}) was estimated by the multicohort utilization tablemore » method. Poisson regression was used to evaluate potential barriers to access to RT and to identify a benchmark subpopulation with unimpeded access to RT. Rates of use of RT were measured in the benchmark subpopulation and compared with published evidence-based estimates of the appropriate rates. Results: The benchmark rate for RT{sub 1Y}, observed under conditions of optimal access, was 33.6% (95% confidence interval [CI], 33.0%-34.1%), and the benchmark for RT{sub LIFETIME} was 41.5% (95% CI, 41.2%-42.0%). Benchmarks for RT{sub LIFETIME} for 4 of 5 selected sites and for all cancers combined were significantly lower than the corresponding evidence-based estimates. Australian and Canadian evidence-based estimates of RT{sub LIFETIME} for 5 selected sites differed widely. RT{sub LIFETIME} in the overall population of Ontario was just 7.9% short of the benchmark but 20.9% short of the Australian evidence-based estimate of the appropriate rate. Conclusions: Evidence-based estimates of the appropriate lifetime rate of use of RT may overestimate the need for RT in Ontario.« less
ELAPSE - NASA AMES LISP AND ADA BENCHMARK SUITE: EFFICIENCY OF LISP AND ADA PROCESSING - A SYSTEM EVALUATION

NASA Technical Reports Server (NTRS)

Davis, G. J.

1994-01-01

One area of research of the Information Sciences Division at NASA Ames Research Center is devoted to the analysis and enhancement of processors and advanced computer architectures, specifically in support of automation and robotic systems. To compare systems' abilities to efficiently process Lisp and Ada, scientists at Ames Research Center have developed a suite of non-parallel benchmarks called ELAPSE. The benchmark suite was designed to test a single computer's efficiency as well as alternate machine comparisons on Lisp, and/or Ada languages. ELAPSE tests the efficiency with which a machine can execute the various routines in each environment. The sample routines are based on numeric and symbolic manipulations and include two-dimensional fast Fourier transformations, Cholesky decomposition and substitution, Gaussian elimination, high-level data processing, and symbol-list references. Also included is a routine based on a Bayesian classification program sorting data into optimized groups. The ELAPSE benchmarks are available for any computer with a validated Ada compiler and/or Common Lisp system. Of the 18 routines that comprise ELAPSE, provided within this package are 14 developed or translated at Ames. The others are readily available through literature. The benchmark that requires the most memory is CHOLESKY.ADA. Under VAX/VMS, CHOLESKY.ADA requires 760K of main memory. ELAPSE is available on either two 5.25 inch 360K MS-DOS format diskettes (standard distribution) or a 9-track 1600 BPI ASCII CARD IMAGE format magnetic tape. The contents of the diskettes are compressed using the PKWARE archiving tools. The utility to unarchive the files, PKUNZIP.EXE, is included. The ELAPSE benchmarks were written in 1990. VAX and VMS are trademarks of Digital Equipment Corporation. MS-DOS is a registered trademark of Microsoft Corporation.
A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge

PubMed Central

Gururaj, Anupama E.; Chen, Xiaoling; Pournejati, Saeid; Alter, George; Hersh, William R.; Demner-Fushman, Dina; Ohno-Machado, Lucila

2017-01-01

Abstract The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval. Database URL: https://biocaddie.org/benchmark-data PMID:29220453
Academic Productivity in Psychiatry: Benchmarks for the H-Index.

PubMed

MacMaster, Frank P; Swansburg, Rose; Rittenbach, Katherine

2017-08-01

Bibliometrics play an increasingly critical role in the assessment of faculty for promotion and merit increases. Bibliometrics is the statistical analysis of publications, aimed at evaluating their impact. The objective of this study is to describe h-index and citation benchmarks in academic psychiatry. Faculty lists were acquired from online resources for all academic departments of psychiatry listed as having residency training programs in Canada (as of June 2016). Potential authors were then searched on Web of Science (Thomson Reuters) for their corresponding h-index and total number of citations. The sample included 1683 faculty members in academic psychiatry departments. Restricted to those with a rank of assistant, associate, or full professor resulted in 1601 faculty members (assistant = 911, associate = 387, full = 303). h-index and total citations differed significantly by academic rank. Both were highest in the full professor rank, followed by associate, then assistant. The range in each, however, was large. This study provides the initial benchmarks for the h-index and total citations in academic psychiatry. Regardless of any controversies or criticisms of bibliometrics, they are increasingly influencing promotion, merit increases, and grant support. As such, benchmarking by specialties is needed in order to provide needed context.
Yoga for military service personnel with PTSD: A single arm study.

PubMed

Johnston, Jennifer M; Minami, Takuya; Greenwald, Deborah; Li, Chieh; Reinhardt, Kristen; Khalsa, Sat Bir S

2015-11-01

This study evaluated the effects of yoga on posttraumatic stress disorder (PTSD) symptoms, resilience, and mindfulness in military personnel. Participants completing the yoga intervention were 12 current or former military personnel who met the Diagnostic and Statistical Manual for Mental Disorders-Fourth Edition-Text Revision (DSM-IV-TR) diagnostic criteria for PTSD. Results were also benchmarked against other military intervention studies of PTSD using the Clinician Administered PTSD Scale (CAPS; Blake et al., 2000) as an outcome measure. Results of within-subject analyses supported the study's primary hypothesis that yoga would reduce PTSD symptoms (d = 0.768; t = 2.822; p = .009) but did not support the hypothesis that yoga would significantly increase mindfulness (d = 0.392; t = -0.9500; p = .181) and resilience (d = 0.270; t = -1.220; p = .124) in this population. Benchmarking results indicated that, as compared with the aggregated treatment benchmark (d = 1.074) obtained from published clinical trials, the current study's treatment effect (d = 0.768) was visibly lower, and compared with the waitlist control benchmark (d = 0.156), the treatment effect in the current study was visibly higher. (c) 2015 APA, all rights reserved).

Antibody-protein interactions: benchmark datasets and prediction tools evaluation

PubMed Central

Ponomarenko, Julia V; Bourne, Philip E

2007-01-01

Background The ability to predict antibody binding sites (aka antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification X-ray crystallography is one of the most reliable methods. Using these experimental data computational methods exist for B-cell epitope prediction. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods. Results Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web-servers developed for antibody and protein binding sites prediction have been evaluated. In no method did performance exceed a 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for ConSurf, DiscoTope, and PPI-PRED methods and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper. Conclusion It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. Notwithstanding, overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It is an open question as to whether ultimately discriminatory features can be found. PMID:17910770
SPACE PROPULSION SYSTEM PHASED-MISSION PROBABILITY ANALYSIS USING CONVENTIONAL PRA METHODS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Curtis Smith; James Knudsen

As part of a series of papers on the topic of advance probabilistic methods, a benchmark phased-mission problem has been suggested. This problem consists of modeling a space mission using an ion propulsion system, where the mission consists of seven mission phases. The mission requires that the propulsion operate for several phases, where the configuration changes as a function of phase. The ion propulsion system itself consists of five thruster assemblies and a single propellant supply, where each thruster assembly has one propulsion power unit and two ion engines. In this paper, we evaluate the probability of mission failure usingmore » the conventional methodology of event tree/fault tree analysis. The event tree and fault trees are developed and analyzed using Systems Analysis Programs for Hands-on Integrated Reliability Evaluations (SAPHIRE). While the benchmark problem is nominally a "dynamic" problem, in our analysis the mission phases are modeled in a single event tree to show the progression from one phase to the next. The propulsion system is modeled in fault trees to account for the operation; or in this case, the failure of the system. Specifically, the propulsion system is decomposed into each of the five thruster assemblies and fed into the appropriate N-out-of-M gate to evaluate mission failure. A separate fault tree for the propulsion system is developed to account for the different success criteria of each mission phase. Common-cause failure modeling is treated using traditional (i.e., parametrically) methods. As part of this paper, we discuss the overall results in addition to the positive and negative aspects of modeling dynamic situations with non-dynamic modeling techniques. One insight from the use of this conventional method for analyzing the benchmark problem is that it requires significant manual manipulation to the fault trees and how they are linked into the event tree. The conventional method also requires editing the resultant cut sets to obtain the correct results. While conventional methods may be used to evaluate a dynamic system like that in the benchmark, the level of effort required may preclude its use on real-world problems.« less
Benchmarking reference services: step by step.

PubMed

Buchanan, H S; Marshall, J G

1996-01-01

This article is a companion to an introductory article on benchmarking published in an earlier issue of Medical Reference Services Quarterly. Librarians interested in benchmarking often ask the following questions: How do I determine what to benchmark; how do I form a benchmarking team; how do I identify benchmarking partners; what's the best way to collect and analyze benchmarking information; and what will I do with the data? Careful planning is a critical success factor of any benchmarking project, and these questions must be answered before embarking on a benchmarking study. This article summarizes the steps necessary to conduct benchmarking research. Relevant examples of each benchmarking step are provided.
DEVELOPING ECOLOGICAL SOIL SCREENING LEVELS: BENCHMARK VALUES FOR SOIL INVERTEBRATES, PLANTS, AND MICROBIAL FUNCTIONS

EPA Science Inventory

Soils are repositories for environmental contaminants (COCs) in terrestrial ecosystems. Time, effort, and money repeatedly are invested in literature-based evaluations of potential soil-ecotoxicity...
Auditing and benchmarks in screening and diagnostic mammography.

PubMed

Feig, Stephen A

2007-09-01

Radiologists can use outcome data such as cancer size and stage to determine how well their own practice provides benefit to their patients and can use measures such as screening recall rates and positive predictive values to assess how well adverse consequences are being contained. New data on national benchmarks for screening and diagnostic mammography in the United States allow radiologists to evaluate their own performance with respect to their peers. This article discusses recommended outcome values in the United States and Europe, current Mammography Quality Standards Act audit requirements, and Institute of Medicine proposals for future requirements.
Preliminary evaluation of synthetic speech

DOT National Transportation Integrated Search

1972-08-01

The report briefly discusses the methods for storing and generating synthetic speech and a preliminary evaluation of the intelligibility of a speech synthesizer having a 75-word vocabulary selected for air traffic control messages. A program is sugge...
Analysis of dosimetry from the H.B. Robinson unit 2 pressure vessel benchmark using RAPTOR-M3G and ALPAN

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fischer, G.A.

2011-07-01

Document available in abstract form only, full text of document follows: The dosimetry from the H. B. Robinson Unit 2 Pressure Vessel Benchmark is analyzed with a suite of Westinghouse-developed codes and data libraries. The radiation transport from the reactor core to the surveillance capsule and ex-vessel locations is performed by RAPTOR-M3G, a parallel deterministic radiation transport code that calculates high-resolution neutron flux information in three dimensions. The cross-section library used in this analysis is the ALPAN library, an Evaluated Nuclear Data File (ENDF)/B-VII.0-based library designed for reactor dosimetry and fluence analysis applications. Dosimetry is evaluated with the industry-standard SNLRMLmore » reactor dosimetry cross-section data library. (authors)« less
34 CFR 636.21 - What selection criteria does the Secretary use to evaluate an application?

Code of Federal Regulations, 2011 CFR

2011-07-01

...) Agencies of local government. (ii) Public and private elementary and secondary schools. (iii) Business... implementation strategy for each key project component activity is— (i) Comprehensive; (ii) Based on a sound... operation; (5) Describe a time-line chart that relates key evaluation processes and benchmarks to other...
34 CFR 636.21 - What selection criteria does the Secretary use to evaluate an application?

Code of Federal Regulations, 2014 CFR

2014-07-01

...) Agencies of local government. (ii) Public and private elementary and secondary schools. (iii) Business... implementation strategy for each key project component activity is— (i) Comprehensive; (ii) Based on a sound... operation; (5) Describe a time-line chart that relates key evaluation processes and benchmarks to other...
34 CFR 636.21 - What selection criteria does the Secretary use to evaluate an application?

Code of Federal Regulations, 2013 CFR

2013-07-01

...) Agencies of local government. (ii) Public and private elementary and secondary schools. (iii) Business... implementation strategy for each key project component activity is— (i) Comprehensive; (ii) Based on a sound... operation; (5) Describe a time-line chart that relates key evaluation processes and benchmarks to other...
34 CFR 636.21 - What selection criteria does the Secretary use to evaluate an application?

Code of Federal Regulations, 2012 CFR

2012-07-01

...) Agencies of local government. (ii) Public and private elementary and secondary schools. (iii) Business... implementation strategy for each key project component activity is— (i) Comprehensive; (ii) Based on a sound... operation; (5) Describe a time-line chart that relates key evaluation processes and benchmarks to other...
Implementing Cognitive Behavioral Therapy for Chronic Fatigue Syndrome in a Mental Health Center: A Benchmarking Evaluation

ERIC Educational Resources Information Center

Scheeres, Korine; Wensing, Michel; Knoop, Hans; Bleijenberg, Gijs

2008-01-01

Objective: This study evaluated the success of implementing cognitive behavioral therapy (CBT) for chronic fatigue syndrome (CFS) in a representative clinical practice setting and compared the patient outcomes with those of previously published randomized controlled trials (RCTs) of CBT for CFS. Method: The implementation interventions were the…
Selecting a Relational Database Management System for Library Automation Systems.

ERIC Educational Resources Information Center

Shekhel, Alex; O'Brien, Mike

1989-01-01

Describes the evaluation of four relational database management systems (RDBMSs) (Informix Turbo, Oracle 6.0 TPS, Unify 2000 and Relational Technology's Ingres 5.0) to determine which is best suited for library automation. The evaluation criteria used to develop a benchmark specifically designed to test RDBMSs for libraries are discussed. (CLB)
Special Education Professional Standards: How Important Are They in the Context of Teacher Performance Evaluation?

ERIC Educational Resources Information Center

Woolf, Sara B.

2015-01-01

Teacher performance evaluation represents a high stakes issue as evidenced by its pivotal emphasis in national and local education reform initiatives and federal policy levers. National, state, and local education leaders continue to experience unprecedented pressure to adopt standardized benchmarks to reflect and link student achievement data to…
Performance evaluation of structure based and ligand based virtual screening methods on ten selected anti-cancer targets.

PubMed

Ramasamy, Thilagavathi; Selvam, Chelliah

2015-10-15

Virtual screening has become an important tool in drug discovery process. Structure based and ligand based approaches are generally used in virtual screening process. To date, several benchmark sets for evaluating the performance of the virtual screening tool are available. In this study, our aim is to compare the performance of both structure based and ligand based virtual screening methods. Ten anti-cancer targets and their corresponding benchmark sets from 'Demanding Evaluation Kits for Objective In silico Screening' (DEKOIS) library were selected. X-ray crystal structures of protein-ligand complexes were selected based on their resolution. Openeye tools such as FRED, vROCS were used and the results were carefully analyzed. At EF1%, vROCS produced better results but at EF5% and EF10%, both FRED and ROCS produced almost similar results. It was noticed that the enrichment factor values were decreased while going from EF1% to EF5% and EF10% in many cases. Published by Elsevier Ltd.
Service profiling and outcomes benchmarking using the CORE-OM: toward practice-based evidence in the psychological therapies. Clinical Outcomes in Routine Evaluation-Outcome Measures.

PubMed

Barkham, M; Margison, F; Leach, C; Lucock, M; Mellor-Clark, J; Evans, C; Benson, L; Connell, J; Audin, K; McGrath, G

2001-04-01

To complement the evidence-based practice paradigm, the authors argued for a core outcome measure to provide practice-based evidence for the psychological therapies. Utility requires instruments that are acceptable scientifically, as well as to service users, and a coordinated implementation of the measure at a national level. The development of the Clinical Outcomes in Routine Evaluation-Outcome Measure (CORE-OM) is summarized. Data are presented across 39 secondary-care services (n = 2,710) and within an intensively evaluated single service (n = 1,455). Results suggest that the CORE-OM is a valid and reliable measure for multiple settings and is acceptable to users and clinicians as well as policy makers. Baseline data levels of patient presenting problem severity, including risk, are reported in addition to outcome benchmarks that use the concept of reliable and clinically significant change. Basic quality improvement in outcomes for a single service is considered.
Evaluating Inequality or Injustice in Water Use for Food

NASA Astrophysics Data System (ADS)

D'Odorico, P.; Carr, J. A.; Seekell, D. A.

2014-12-01

Water availability and population density distributions are uneven and therefore inequality exists in human access to freshwater resources; but is this inequality unjust or only regrettable? To examine this question we formulated and evaluated elementary principles of water ethics relative to human rights for water and explored the need for global trade to improve societal access to water by transferring plant and animal commodities and the "virtual water" embedded in them. We defined human welfare benchmarks and evaluated country specific patterns of water use for food with, and without trade, over a 25-year period in order to elucidate the influence of trade and inequality on equability of water use. We found that trade improves mean water use and wellbeing, when related to human welfare benchmarks, suggesting that inequality is regrettable but not necessarily unjust. However, trade has not significantly contributed to redressing inequality. Hence, directed trade decisions can improve future conditions of water and food scarcity through reduced inequality.
Preliminary evaluation of user terminals for an automated pilot briefing system

DOT National Transportation Integrated Search

1976-08-01

This report describes a preliminary evaluation of various user terminal concepts for an automated aviation pilot weather briefing and flight plan filing system. Terminals embodying differing operational concepts were used by volunteer general aviatio...
First Conclusions of the WPEC/Subgroup-22 Nuclear Data for Improved LEU-LWR Reactivity Predictions

NASA Astrophysics Data System (ADS)

Courcelle, Arnaud

2005-05-01

This paper is a summary of a collective work in the framework of the Working Party in International Nuclear Data Evaluation and Co-operation (WPEC) to investigate the reasons for systematic reactivity underprediction of thermal LEU-LWR (Low-Enriched Uranium, Light-Water Reactor). This keff underprediction (≈ -500 pcm) is observed with the most recent nuclear data libraries (ENDF/B-VI.8, JENDL3.3 and JEFF3.0) This report reviews the evaluation work performed at several laboratories [Oak Ridge National Laboratory (ORNL), Los Alamos National Laboratory (LANL), Commissariat a l'énergie atomique de Bruyeres-Le-Chatel (CEA-BRC), International Atomic Energy Agency (IAEA)] as well as the integral tests (mainly at LANL, Knoll Atomic Power Laboratory (KAPL), Bettis Atomic Power Laboratory (BAPL), Nuclear Research and Consultancy Group NRG-Petten, CEA and IAEA) of the successive versions of the new evaluated files. The present status of the work can be summarized as follows: • Improved evaluations of 238U inelastic data proposed by LANL and CEA-BRC were tested against integral benchmarks and partially improve the reactivity prediction. • The thermal capture cross-section of 238U has been revised, and a new evaluation of 238U resonance parameters, up to 20 keV, is in progress at ORNL. Integral tests have ensured that the modifications of 238U capture cross-section in the thermal and resolved range were still compatible with 238U integral measurements (238U capture rate ratios measured in critical facilities and 239Pu build-up prediction in a depleted pressurized water reactor (PWR) assembly). It is demonstrated that the combination of the new inelastic data (LANL or BRC) with the preliminary ORNL resonance parameter set gives a good correction of the reactivity under-estimation. The provisional conclusions of this collective work are expected to contribute toward the improvement of the future versions of nuclear data libraries.
A Preliminary Look at Systems of Evaluating Teachers in Institutions of Higher Education and at Evaluation Methods

ERIC Educational Resources Information Center

Chinese Education and Society, 2005

2005-01-01

This article takes a preliminary look at the fundamental characteristics of systems and methods of evaluating teachers in institutions of higher education as well as at planning and implementing them. Its purpose is to deepen one's knowledge of such evaluation and improve the way it is carried out in practice. The article discusses the…

Limitations of Community College Benchmarking and Benchmarks

ERIC Educational Resources Information Center

Bers, Trudy H.

2006-01-01

This chapter distinguishes between benchmarks and benchmarking, describes a number of data and cultural limitations to benchmarking projects, and suggests that external demands for accountability are the dominant reason for growing interest in benchmarking among community colleges.
A Novel Performance Evaluation Methodology for Single-Target Trackers.

PubMed

Kristan, Matej; Matas, Jiri; Leonardis, Ales; Vojir, Tomas; Pflugfelder, Roman; Fernandez, Gustavo; Nebehay, Georg; Porikli, Fatih; Cehovin, Luka

2016-11-01

This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully-annotated dataset with per-frame annotations with several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most sophistically constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art since they outperform the standard baselines, resulting in a highly-challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison a new performance visualization technique is proposed.
ARRAYS OF BOTTLES OF PLUTONIUM NITRATE SOLUTION

DOE Office of Scientific and Technical Information (OSTI.GOV)

Margaret A. Marshall

2012-09-01

In October and November of 1981 thirteen approaches-to-critical were performed on a remote split table machine (RSTM) in the Critical Mass Laboratory of Pacific Northwest Laboratory (PNL) in Richland, Washington using planar arrays of polyethylene bottles filled with plutonium (Pu) nitrate solution. Arrays of up to sixteen bottles were used to measure the critical number of bottles and critical array spacing with a tight fitting Plexiglas® reflector on all sides of the arrays except the top. Some experiments used Plexiglas shells fitted around each bottles to determine the effect of moderation on criticality. Each bottle contained approximately 2.4 L ofmore » Pu(NO3)4 solution with a Pu content of 105 g Pu/L and a free acid molarity H+ of 5.1. The plutonium was of low 240Pu (2.9 wt.%) content. These experiments were sponsored by Rockwell Hanford Operations because of the lack of experimental data on the criticality of arrays of bottles of Pu solution such as might be found in storage and handling at the Purex Facility at Hanford. The results of these experiments were used “to provide benchmark data to validate calculational codes used in criticality safety assessments of [the] plant configurations” (Ref. 1). Data for this evaluation were collected from the published report (Ref. 1), the approach to critical logbook, the experimenter’s logbook, and communication with the primary experimenter, B. Michael Durst. Of the 13 experiments preformed 10 were evaluated. One of the experiments was not evaluated because it had been thrown out by the experimenter, one was not evaluated because it was a repeat of another experiment and the third was not evaluated because it reported the critical number of bottles as being greater than 25. Seven of the thirteen evaluated experiments were determined to be acceptable benchmark experiments. A similar experiment using uranyl nitrate was benchmarked as U233-SOL-THERM-014.« less
Performance Evaluation and Benchmarking of Intelligent Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Madhavan, Raj; Messina, Elena; Tunstel, Edward

To design and develop capable, dependable, and affordable intelligent systems, their performance must be measurable. Scientific methodologies for standardization and benchmarking are crucial for quantitatively evaluating the performance of emerging robotic and intelligent systems technologies. There is currently no accepted standard for quantitatively measuring the performance of these systems against user-defined requirements; and furthermore, there is no consensus on what objective evaluation procedures need to be followed to understand the performance of these systems. The lack of reproducible and repeatable test methods has precluded researchers working towards a common goal from exchanging and communicating results, inter-comparing system performance, and leveragingmore » previous work that could otherwise avoid duplication and expedite technology transfer. Currently, this lack of cohesion in the community hinders progress in many domains, such as manufacturing, service, healthcare, and security. By providing the research community with access to standardized tools, reference data sets, and open source libraries of solutions, researchers and consumers will be able to evaluate the cost and benefits associated with intelligent systems and associated technologies. In this vein, the edited book volume addresses performance evaluation and metrics for intelligent systems, in general, while emphasizing the need and solutions for standardized methods. To the knowledge of the editors, there is not a single book on the market that is solely dedicated to the subject of performance evaluation and benchmarking of intelligent systems. Even books that address this topic do so only marginally or are out of date. The research work presented in this volume fills this void by drawing from the experiences and insights of experts gained both through theoretical development and practical implementation of intelligent systems in a variety of diverse application domains. The book presents a detailed and coherent picture of state-of-the-art, recent developments, and further research areas in intelligent systems.« less
Interlaboratory Study Characterizing a Yeast Performance Standard for Benchmarking LC-MS Platform Performance*

PubMed Central

Paulovich, Amanda G.; Billheimer, Dean; Ham, Amy-Joan L.; Vega-Montoto, Lorenzo; Rudnick, Paul A.; Tabb, David L.; Wang, Pei; Blackman, Ronald K.; Bunk, David M.; Cardasis, Helene L.; Clauser, Karl R.; Kinsinger, Christopher R.; Schilling, Birgit; Tegeler, Tony J.; Variyath, Asokan Mulayath; Wang, Mu; Whiteaker, Jeffrey R.; Zimmerman, Lisa J.; Fenyo, David; Carr, Steven A.; Fisher, Susan J.; Gibson, Bradford W.; Mesri, Mehdi; Neubert, Thomas A.; Regnier, Fred E.; Rodriguez, Henry; Spiegelman, Cliff; Stein, Stephen E.; Tempst, Paul; Liebler, Daniel C.

2010-01-01

Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix, thereby providing a metric to evaluate and minimize preanalytical and analytical variation in comparative proteomics experiments. PMID:19858499
The Dutch Hospital Standardised Mortality Ratio (HSMR) method and cardiac surgery: benchmarking in a national cohort using hospital administration data versus a clinical database

PubMed Central

Siregar, S; Pouw, M E; Moons, K G M; Versteegh, M I M; Bots, M L; van der Graaf, Y; Kalkman, C J; van Herwerden, L A; Groenwold, R H H

2014-01-01

Objective To compare the accuracy of data from hospital administration databases and a national clinical cardiac surgery database and to compare the performance of the Dutch hospital standardised mortality ratio (HSMR) method and the logistic European System for Cardiac Operative Risk Evaluation, for the purpose of benchmarking of mortality across hospitals. Methods Information on all patients undergoing cardiac surgery between 1 January 2007 and 31 December 2010 in 10 centres was extracted from The Netherlands Association for Cardio-Thoracic Surgery database and the Hospital Discharge Registry. The number of cardiac surgery interventions was compared between both databases. The European System for Cardiac Operative Risk Evaluation and hospital standardised mortality ratio models were updated in the study population and compared using the C-statistic, calibration plots and the Brier-score. Results The number of cardiac surgery interventions performed could not be assessed using the administrative database as the intervention code was incorrect in 1.4–26.3%, depending on the type of intervention. In 7.3% no intervention code was registered. The updated administrative model was inferior to the updated clinical model with respect to discrimination (c-statistic of 0.77 vs 0.85, p<0.001) and calibration (Brier Score of 2.8% vs 2.6%, p<0.001, maximum score 3.0%). Two average performing hospitals according to the clinical model became outliers when benchmarking was performed using the administrative model. Conclusions In cardiac surgery, administrative data are less suitable than clinical data for the purpose of benchmarking. The use of either administrative or clinical risk-adjustment models can affect the outlier status of hospitals. Risk-adjustment models including procedure-specific clinical risk factors are recommended. PMID:24334377
Design, synthesis and preliminary antimicrobial evaluation of n-alkyl chain tethered c-5 functionalized bis-isatins

USDA-ARS?s Scientific Manuscript database

A series of N-alkyl tethered C-5 functionalized bis-isatins were synthesized and evaluated for antimicrobial activity against pathogenic microorganisms. The preliminary evaluation studies revealed the compound 4t, with an optimal combination of bromo-substituent at the C-5 position of isatin ring al...
A Preliminary Evaluation of Reach: Training Early Childhood Teachers to Support Children's Social and Emotional Development

ERIC Educational Resources Information Center

Conners-Burrow, Nicola A.; Patrick, Terese; Kyzer, Angela; McKelvey, Lorraine

2017-01-01

This paper describes the development, implementation and preliminary evaluation of the Reaching Educators and Children (REACH) program, a training and coaching intervention designed to increase the capacity of early childhood teachers to support children's social and emotional development. We evaluated REACH with 139 teachers of toddler and…
Testing for sustainable preservatives

USDA-ARS?s Scientific Manuscript database

Rising antimicrobial resistance and heath concerns of common antimicrobials warrants the development of new, safer antimicrobial agents. A rapid screening protocol was developed to assess the antimicrobial properties of natural and synthetic substances. Benchmark substances were evaluated against re...
78 FR 54869 - Fisheries of the Gulf of Mexico; Southeast Data, Assessment, and Review (SEDAR); Public Meetings

Federal Register 2010, 2011, 2012, 2013, 2014

2013-09-06

... NOAA Fisheries Southeast Regional Office, Highly Migratory Species Management Division, and Southeast... describes the fisheries, evaluates the status of the stock, estimates biological benchmarks, projects future...
Proposed re-evaluation of the 154Eu thermal ( n, γ) capture cross-section based on spent fuel benchmarking studies

DOE PAGES

Skutnik, Steven E.

2016-09-22

154Eu is a nuclide of considerable importance to both non-destructive measurements of used nuclear fuel assembly burnup as well as for calculating the radiation source term for used fuel storage and transportation. But, recent evidence from code validation studies of spent fuel benchmarks have revealed evidence of a systemic bias in predicted 154Eu inventories when using ENDF/B-VII.0 and ENDF/B-VII.1 nuclear data libraries, wherein Eu-154 is consistently over-predicted on the order of 10% or more. Further, this bias is found to correlate with sample burnup, resulting in a larger departure from experimental measurements for higher sample burnups. Here, the bias in Eu-154 is characterized across eleven spent fuel destructive assay benchmarks from five different assemblies. Based on these studies, possible amendments to the ENDF/B-VII.0 and VII.1 evaluations of the 154Eu (n,γ) 155Eu are explored. By amending the location of the first resolved resonance for the 154Eu radiative capture cross-section (centered at 0.195 eV in ENDF/B-VII.0 and VII.1) to 0.188 eV and adjusting the neutron capture width proportional tomore » $$\\sqrt1/E$$, the amended cross-section evaluation was found to reduce the bias in predicted 154Eu inventories by approximately 5–7%. And while the amended capture cross-section still results in a residual over-prediction of 154Eu (ranging from 2% to 9%), the effect is substantially attenuated compared with the nominal ENDF/B-VII.0 and VII.1 evaluations.« less
Proposed re-evaluation of the 154Eu thermal ( n, γ) capture cross-section based on spent fuel benchmarking studies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Skutnik, Steven E.

154Eu is a nuclide of considerable importance to both non-destructive measurements of used nuclear fuel assembly burnup as well as for calculating the radiation source term for used fuel storage and transportation. But, recent evidence from code validation studies of spent fuel benchmarks have revealed evidence of a systemic bias in predicted 154Eu inventories when using ENDF/B-VII.0 and ENDF/B-VII.1 nuclear data libraries, wherein Eu-154 is consistently over-predicted on the order of 10% or more. Further, this bias is found to correlate with sample burnup, resulting in a larger departure from experimental measurements for higher sample burnups. Here, the bias in Eu-154 is characterized across eleven spent fuel destructive assay benchmarks from five different assemblies. Based on these studies, possible amendments to the ENDF/B-VII.0 and VII.1 evaluations of the 154Eu (n,γ) 155Eu are explored. By amending the location of the first resolved resonance for the 154Eu radiative capture cross-section (centered at 0.195 eV in ENDF/B-VII.0 and VII.1) to 0.188 eV and adjusting the neutron capture width proportional tomore » $$\\sqrt1/E$$, the amended cross-section evaluation was found to reduce the bias in predicted 154Eu inventories by approximately 5–7%. And while the amended capture cross-section still results in a residual over-prediction of 154Eu (ranging from 2% to 9%), the effect is substantially attenuated compared with the nominal ENDF/B-VII.0 and VII.1 evaluations.« less
Evaluation of the Relative Citation Ratio, a New National Institutes of Health-Supported Bibliometric Measure of Research Productivity, among Academic Radiation Oncologists.

PubMed

Rock, Calvin B; Prabhu, Arpan V; Fuller, C David; Thomas, Charles R; Holliday, Emma B

2018-03-01

Publication metrics are useful in evaluating academic faculty for awarding grants, recruitment, and promotion. A new metric, the relative citation ratio (RCR), was recently released by the National Institutes of Health (NIH); however, no benchmark data yet exist. We sought to create benchmark data for physician faculty in academic radiation oncology (RO) and analyze correlations associated with increased academic productivity. Citation database searches were performed for all US radiation oncologists affiliated with academic RO programs. Gender, NIH funding, career duration, academic rank, RCR, and weighted RCR were collected for each faculty. RCR and weighted RCR were calculated and compared between each subgroup of interest. RCR percentiles were also created for reference. A total of 1,299 RO physician faculty members from 75 institutions were included in the analysis. Overall, RO physician were very productive and influential with a mean RCR of 1.57 ± 1.53 SD and median RCR (interquartile range) of 1.32 (0.87-1.94). Academic rank, career duration, and NIH funding were associated with increased mean RCR and weighted RCR. Male gender and having a PhD were associated with an increased weighted RCR but not an increased mean RCR. Current academic radiation oncologists have a high mean RCR value relative to the benchmark NIH RCR value of 1. All subgroups analyzed had an RCR value above 1 with professor or chair and previous NIH funding having the highest RCR and weighted RCR values overall. These data may be useful for self-evaluation of ROs as well as evaluation of faculty by institutional and departmental leaders. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.
Three-Dimensional Cellular Structures Enhanced By Shape Memory Alloys

NASA Technical Reports Server (NTRS)

Nathal, Michael V.; Krause, David L.; Wilmoth, Nathan G.; Bednarcyk, Brett A.; Baker, Eric H.

2014-01-01

This research effort explored lightweight structural concepts married with advanced smart materials to achieve a wide variety of benefits in airframe and engine components. Lattice block structures were cast from an aerospace structural titanium alloy Ti-6Al-4V and a NiTi shape memory alloy (SMA), and preliminary properties have been measured. A finite element-based modeling approach that can rapidly and accurately capture the deformation response of lattice architectures was developed. The Ti-6-4 and SMA material behavior was calibrated via experimental tests of ligaments machined from the lattice. Benchmark testing of complete lattice structures verified the main aspects of the model as well as demonstrated the advantages of the lattice structure. Shape memory behavior of a sample machined from a lattice block was also demonstrated.
Modeling of enterprise information systems implementation: a preliminary investigation

NASA Astrophysics Data System (ADS)

Yusuf, Yahaya Y.; Abthorpe, M. S.; Gunasekaran, Angappa; Al-Dabass, D.; Onuh, Spencer

2001-10-01

The business enterprise has never been in greater need of Agility and the current trend will continue unabated well into the future. It is now recognized that information system is both the foundation and a necessary condition for increased responsiveness. A successful implementation of Enterprise Resource Planning (ERP) can help a company to move towards delivering on its competitive objectives as it enables suppliers to reach out to customers beyond the borders of traditional market defined by geography. The cost of implementation, even when it is successful, could be significant. Bearing in mind the potential strategic benefits, it is important that the implementation project is managed effectively. To this end a project cost model against which to benchmark ongoing project expenditure versus activities completed has been proposed in this paper.
Benchmarking physician performance, part 2.

PubMed

Collier, David A; Collier, Cindy Eddins; Kelly, Thomas M

2006-01-01

Part 1 of this article (January-February 2006) reviewed ways of measuring the work of physicians through methods such as data envelopment analysis (DEA) and relative value units (RVUs). These techniques provide insights into: 1. Who are the best-performing physicians? 2. Who are the underperforming physicians? 3. How can underperforming physicians improve? 4. What are the underperformers' performance targets? 5. How do you deal with full- and part-time physicians in a university setting? Part 2 compares the performance of 16 primary care physicians in the same medical specialty using DEA efficiency scores. DEA is capable of modeling multiple criteria and automatically determines the relative weights of each performance measure. This research also provides a preliminary framework for how work measurement and DEA can be used as a basis for a medical team or physician compensation system.
Preliminary evaluation of diabatic heating distribution from FGGE level 3b analysis data

NASA Technical Reports Server (NTRS)

Kasahara, A.; Mizzi, A. P.

1985-01-01

A method is presented for calculating the global distribution of diabatic heating rate. Preliminary results of global heating rate evaluated from the European center for Medium Range Weather Forecasts Level IIIb analysis data is also presented.
Notification: Preliminary Research to Evaluate EPA's Response to Contamination From Historical Lead Smelters

EPA Pesticide Factsheets

Project #OPE-FY13-0023, June 6, 2013. The U.S. Environmental Protection Agency’s Office of Inspector General plans to begin preliminary research to evaluate the EPA’s response to contamination from historical lead smelters.
Preliminary evaluation of advanced air bag field performance using event data recorders

DOT National Transportation Integrated Search

2008-08-31

This report describes a preliminary evaluation of the field performance of occupant restraint systems designed with advanced air bag features including those specified in the Federal Motor Vehicle Safety Standard No. 208 for advanced air bags, throug...
Status of groundwater quality in the Upper Santa Ana Watershed, November 2006--March 2007--California GAMA Priority Basin Project

USGS Publications Warehouse

Kent, Robert; Belitz, Kenneth

2012-01-01

Groundwater quality in the approximately 1,000-square-mile (2,590-square-kilometer) Upper Santa Ana Watershed (USAW) study unit was investigated as part of the Priority Basin Project of the Groundwater Ambient Monitoring and Assessment (GAMA) Program. The study unit is located in southern California in Riverside and San Bernardino Counties. The GAMA Priority Basin Project is being conducted by the California State Water Resources Control Board in collaboration with the U.S. Geological Survey and the Lawrence Livermore National Laboratory. The GAMA USAW study was designed to provide a spatially unbiased assessment of untreated groundwater quality within the primary aquifer systems in the study unit. The primary aquifer systems (hereinafter, primary aquifers) are defined as the perforation interval of wells listed in the California Department of Public Health (CDPH) database for the USAW study unit. The quality of groundwater in shallower or deeper water-bearing zones may differ from that in the primary aquifers; shallower groundwater may be more vulnerable to surficial contamination. The assessment is based on water-quality and ancillary data collected by the U.S. Geological Survey (USGS) from 90 wells during November 2006 through March 2007, and water-quality data from the CDPH database. The status of the current quality of the groundwater resource was assessed based on data from samples analyzed for volatile organic compounds (VOCs), pesticides, and naturally occurring inorganic constituents, such as major ions and trace elements. The status assessment is intended to characterize the quality of groundwater resources within the primary aquifers of the USAW study unit, not the treated drinking water delivered to consumers by water purveyors. Relative-concentrations (sample concentration divided by the health- or aesthetic-based benchmark concentration) were used for evaluating groundwater quality for those constituents that have Federal or California regulatory or non-regulatory benchmarks for drinking-water quality. A relative-concentration greater than (>) 1.0 indicates a concentration above a benchmark, and a relative-concentration less than or equal to (≤) 1.0 indicates a concentration equal to or less than a benchmark. Organic and special-interest constituent relative-concentrations were classified as "high" (> 1.0), "moderate" (0.1 1.0), "moderate" (0.5 < relative-concentration ≤ 1.0), or "low" ( ≤ 0.5). Aquifer-scale proportion was used as the primary metric in the status assessment for evaluating regional-scale groundwater quality. Aquifer-scale proportions are defined as the percentage of the area of the primary aquifer system with concentrations above or below specified thresholds relative to regulatory or aesthetic benchmarks. High aquifer-scale proportion is defined as the percentage of the area of the primary aquifers with a relative-concentration greater than 1.0 for a particular constituent or class of constituents; percentage is based on an areal, rather than a volumetric basis. Moderate and low aquifer-scale proportions were defined as the percentage of the primary aquifers with moderate and low relative-concentrations, respectively. Two statistical approaches—grid-based and spatially weighted—were used to evaluate aquifer-scale proportions for individual constituents and classes of constituents. Grid-based and spatially weighted estimates were comparable in the USAW study unit (within 90-percent confidence intervals). Inorganic constituents with human-health benchmarks had relative-concentrations that were high in 32.9 percent of the primary aquifers, moderate in 29.3 percent, and low in 37.8 percent. The high aquifer-scale proportion of these inorganic constituents primarily reflected high aquifer-scale proportions of nitrate (high relative-concentration in 25.3 percent of the aquifer), although seven other inorganic constituents with human-health benchmarks also were detected at high relative-concentrations in some percentage of the aquifer: arsenic, boron, fluoride, gross alpha activity, molybdenum, uranium, and vanadium. Perchlorate, as a constituent of special interest, was evaluated separately from other inorganic constituents, and had high relative-concentrations in 11.1 percent, moderate in 53.3 percent, and low or not detected in 35.6 percent of the primary aquifers. In contrast to the inorganic constituents, relative-concentrations of organic constituents (one or more) were high in 6.7 percent, moderate in 11.1 percent, and low or not detected in 82.2 percent of the primary aquifers. Of the 237 organic and special-interest constituents analyzed for, 39 constituents were detected (21 VOCs, 13 pesticides, 3 pharmaceuticals, and 2 constituents of special interest). All of the detected VOCs had health-based benchmarks, and five of these—1,1-dichloroethene, 1,2-dibromo-3-chloropropane (DBCP), tetrachloroethene (PCE), carbon tetrachloride, and trichloroethene (TCE)—were detected in at least one sample at a concentration above a benchmark (high relative-concentration). Seven of the 13 pesticides had health-based benchmarks, and none were detected above these benchmarks (no high relative-concentrations). Pharmaceuticals do not have health-based benchmarks. Thirteen organic constituents were frequently detected (detected in at least 10 percent of samples without regard to relative-concentrations): bromodichloromethane, chloroform, cis-1,2-dichloroethene, 1,1-dichloroethene, dichlorodifluoromethane (CFC-12), methyl tert-butyl ether (MTBE), PCE, TCE, trichlorofluoromethane (CFC-11), atrazine, bromacil, diuron, and simazine.

Health risk assessment of organic micropollutants in greywater for potable reuse.

PubMed

Etchepare, Ramiro; van der Hoek, Jan Peter

2015-04-01

In light of the increasing interest in development of sustainable potable reuse systems, additional research is needed to elucidate the risks of producing drinking water from new raw water sources. This article investigates the presence and potential health risks of organic micropollutants in greywater, a potential new source for potable water production introduced in this work. An extensive literature survey reveals that almost 280 organic micropollutants have been detected in greywater. A three-tiered approach is applied for the preliminary health risk assessment of these chemicals. Benchmark values are derived from established drinking water standards for compounds grouped in Tier 1, from literature toxicological data for compounds in Tier 2, and from a Threshold of Toxicological Concern approach for compounds in Tier 3. A risk quotient is estimated by comparing the maximum concentration levels reported in greywater to the benchmark values. The results show that for the majority of compounds, risk quotient values were below 0.2, which suggests they would not pose appreciable concern to human health over a lifetime exposure to potable water. Fourteen compounds were identified with risk quotients above 0.2 which may warrant further investigation if greywater is used as a source for potable reuse. The present findings are helpful in prioritizing upcoming greywater quality monitoring and defining the goals of multiple barriers treatment in future water reclamation plants for potable water production. Copyright © 2014 Elsevier Ltd. All rights reserved.
Towards a sharp-interface volume-of-fluid methodology for modeling evaporation

NASA Astrophysics Data System (ADS)

Pathak, Ashish; Raessi, Mehdi

2017-11-01

In modeling evaporation, the diffuse-interface (one-domain) formulation yields inaccurate results. Recent efforts approaching the problem via a sharp-interface (two-domain) formulation have shown significant improvements. The reasons behind their better performance are discussed in the present work. All available sharp-interface methods, however, exclusively employ the level-set. In the present work, we develop a sharp-interface evaporation model in a volume-of-fluid (VOF) framework in order to leverage its mass-conserving property as well as its ability to handle large topographical changes. We start with a critical review of the assumptions underlying the mathematical equations governing evaporation. For example, it is shown that the assumption of incompressibility can only be applied in special circumstances. The famous D2 law used for benchmarking is valid exclusively to steady-state test problems. Transient is present over significant lifetime of a micron-size droplet. Therefore, a 1D spherical fully transient model is developed to provide a benchmark transient solution. Finally, a 3D Cartesian Navier-Stokes evaporation solver is developed. Some preliminary validation test-cases are presented for static and moving drop evaporation. This material is based upon work supported by the Department of Energy, Office of Energy Efficiency and Renewable Energy and the Department of Defense, Tank and Automotive Research, Development, and Engineering Center, under Award Number DEEE0007292.
Visco-Resistive MHD Modeling Benchmark of Forced Magnetic Reconnection

NASA Astrophysics Data System (ADS)

Beidler, M. T.; Hegna, C. C.; Sovinec, C. R.; Callen, J. D.; Ferraro, N. M.

2016-10-01

The presence of externally-applied 3D magnetic fields can affect important phenomena in tokamaks, including mode locking, disruptions, and edge localized modes. External fields penetrate into the plasma and can lead to forced magnetic reconnection (FMR), and hence magnetic islands, on resonant surfaces if the local plasma rotation relative to the external field is slow. Preliminary visco-resistive MHD simulations of FMR in a slab geometry are consistent with theory. Specifically, linear simulations exhibit proper scaling of the penetrated field with resistivity, viscosity, and flow, and nonlinear simulations exhibit a bifurcation from a flow-screened to a field-penetrated, magnetic island state as the external field is increased, due to the 3D electromagnetic force. These results will be compared to simulations of FMR in a circular cross-section, cylindrical geometry by way of a benchmark between the NIMROD and M3D-C1 extended-MHD codes. Because neither this geometry nor the MHD model has the physics of poloidal flow damping, the theory of will be expanded to include poloidal flow effects. The resulting theory will be tested with linear and nonlinear simulations that vary the resistivity, viscosity, flow, and external field. Supported by OFES DoE Grants DE-FG02-92ER54139, DE-FG02-86ER53218, DE-AC02-09CH11466, and the SciDAC Center for Extended MHD Modeling.
Evaluating the effects of dam breach methodologies on Consequence Estimation through Sensitivity Analysis

NASA Astrophysics Data System (ADS)

Kalyanapu, A. J.; Thames, B. A.

2013-12-01

Dam breach modeling often includes application of models that are sophisticated, yet computationally intensive to compute flood propagation at high temporal and spatial resolutions. This results in a significant need for computational capacity that requires development of newer flood models using multi-processor and graphics processing techniques. Recently, a comprehensive benchmark exercise titled the 12th Benchmark Workshop on Numerical Analysis of Dams, is organized by the International Commission on Large Dams (ICOLD) to evaluate the performance of these various tools used for dam break risk assessment. The ICOLD workshop is focused on estimating the consequences of failure of a hypothetical dam near a hypothetical populated area with complex demographics, and economic activity. The current study uses this hypothetical case study and focuses on evaluating the effects of dam breach methodologies on consequence estimation and analysis. The current study uses ICOLD hypothetical data including the topography, dam geometric and construction information, land use/land cover data along with socio-economic and demographic data. The objective of this study is to evaluate impacts of using four different dam breach methods on the consequence estimates used in the risk assessments. The four methodologies used are: i) Froehlich (1995), ii) MacDonald and Langridge-Monopolis 1984 (MLM), iii) Von Thun and Gillete 1990 (VTG), and iv) Froehlich (2008). To achieve this objective, three different modeling components were used. First, using the HEC-RAS v.4.1, dam breach discharge hydrographs are developed. These hydrographs are then provided as flow inputs into a two dimensional flood model named Flood2D-GPU, which leverages the computer's graphics card for much improved computational capabilities of the model input. Lastly, outputs from Flood2D-GPU, including inundated areas, depth grids, velocity grids, and flood wave arrival time grids, are input into HEC-FIA, which provides the consequence assessment for the solution to the problem statement. For the four breach methodologies, a sensitivity analysis of four breach parameters, breach side slope (SS), breach width (Wb), breach invert elevation (Elb), and time of failure (tf), is conducted. Up to, 68 simulations are computed to produce breach hydrographs in HEC-RAS for input into Flood2D-GPU. The Flood2D-GPU simulation results were then post-processed in HEC-FIA to evaluate: Total Population at Risk (PAR), 14-yr and Under PAR (PAR14-), 65-yr and Over PAR (PAR65+), Loss of Life (LOL) and Direct Economic Impact (DEI). The MLM approach resulted in wide variability in simulated minimum and maximum values of PAR, PAR 65+ and LOL estimates. For PAR14- and DEI, Froehlich (1995) resulted in lower values while MLM resulted in higher estimates. This preliminary study demonstrated the relative performance of four commonly used dam breach methodologies and their impacts on consequence estimation.
Performance Comparison of NAMI DANCE and FLOW-3D® Models in Tsunami Propagation, Inundation and Currents using NTHMP Benchmark Problems

NASA Astrophysics Data System (ADS)

Velioglu Sogut, Deniz; Yalciner, Ahmet Cevdet

2018-06-01

Field observations provide valuable data regarding nearshore tsunami impact, yet only in inundation areas where tsunami waves have already flooded. Therefore, tsunami modeling is essential to understand tsunami behavior and prepare for tsunami inundation. It is necessary that all numerical models used in tsunami emergency planning be subject to benchmark tests for validation and verification. This study focuses on two numerical codes, NAMI DANCE and FLOW-3D®, for validation and performance comparison. NAMI DANCE is an in-house tsunami numerical model developed by the Ocean Engineering Research Center of Middle East Technical University, Turkey and Laboratory of Special Research Bureau for Automation of Marine Research, Russia. FLOW-3D® is a general purpose computational fluid dynamics software, which was developed by scientists who pioneered in the design of the Volume-of-Fluid technique. The codes are validated and their performances are compared via analytical, experimental and field benchmark problems, which are documented in the ``Proceedings and Results of the 2011 National Tsunami Hazard Mitigation Program (NTHMP) Model Benchmarking Workshop'' and the ``Proceedings and Results of the NTHMP 2015 Tsunami Current Modeling Workshop". The variations between the numerical solutions of these two models are evaluated through statistical error analysis.
Benchmarking a Soil Moisture Data Assimilation System for Agricultural Drought Monitoring

NASA Technical Reports Server (NTRS)

Hun, Eunjin; Crow, Wade T.; Holmes, Thomas; Bolten, John

2014-01-01

Despite considerable interest in the application of land surface data assimilation systems (LDAS) for agricultural drought applications, relatively little is known about the large-scale performance of such systems and, thus, the optimal methodological approach for implementing them. To address this need, this paper evaluates an LDAS for agricultural drought monitoring by benchmarking individual components of the system (i.e., a satellite soil moisture retrieval algorithm, a soil water balance model and a sequential data assimilation filter) against a series of linear models which perform the same function (i.e., have the same basic inputoutput structure) as the full system component. Benchmarking is based on the calculation of the lagged rank cross-correlation between the normalized difference vegetation index (NDVI) and soil moisture estimates acquired for various components of the system. Lagged soil moistureNDVI correlations obtained using individual LDAS components versus their linear analogs reveal the degree to which non-linearities andor complexities contained within each component actually contribute to the performance of the LDAS system as a whole. Here, a particular system based on surface soil moisture retrievals from the Land Parameter Retrieval Model (LPRM), a two-layer Palmer soil water balance model and an Ensemble Kalman filter (EnKF) is benchmarked. Results suggest significant room for improvement in each component of the system.
Thermal Performance Benchmarking: Annual Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreno, Gilbert

2016-04-08

The goal for this project is to thoroughly characterize the performance of state-of-the-art (SOA) automotive power electronics and electric motor thermal management systems. Information obtained from these studies will be used to: Evaluate advantages and disadvantages of different thermal management strategies; establish baseline metrics for the thermal management systems; identify methods of improvement to advance the SOA; increase the publicly available information related to automotive traction-drive thermal management systems; help guide future electric drive technologies (EDT) research and development (R&D) efforts. The performance results combined with component efficiency and heat generation information obtained by Oak Ridge National Laboratory (ORNL) maymore » then be used to determine the operating temperatures for the EDT components under drive-cycle conditions. In FY15, the 2012 Nissan LEAF power electronics and electric motor thermal management systems were benchmarked. Testing of the 2014 Honda Accord Hybrid power electronics thermal management system started in FY15; however, due to time constraints it was not possible to include results for this system in this report. The focus of this project is to benchmark the thermal aspects of the systems. ORNL's benchmarking of electric and hybrid electric vehicle technology reports provide detailed descriptions of the electrical and packaging aspects of these automotive systems.« less
GoWeb: a semantic search engine for the life science web.

PubMed

Dietze, Heiko; Schroeder, Michael

2009-10-01

Current search engines are keyword-based. Semantic technologies promise a next generation of semantic search engines, which will be able to answer questions. Current approaches either apply natural language processing to unstructured text or they assume the existence of structured statements over which they can reason. Here, we introduce a third approach, GoWeb, which combines classical keyword-based Web search with text-mining and ontologies to navigate large results sets and facilitate question answering. We evaluate GoWeb on three benchmarks of questions on genes and functions, on symptoms and diseases, and on proteins and diseases. The first benchmark is based on the BioCreAtivE 1 Task 2 and links 457 gene names with 1352 functions. GoWeb finds 58% of the functional GeneOntology annotations. The second benchmark is based on 26 case reports and links symptoms with diseases. GoWeb achieves 77% success rate improving an existing approach by nearly 20%. The third benchmark is based on 28 questions in the TREC genomics challenge and links proteins to diseases. GoWeb achieves a success rate of 79%. GoWeb's combination of classical Web search with text-mining and ontologies is a first step towards answering questions in the biomedical domain. GoWeb is online at: http://www.gopubmed.org/goweb.
Methodology and Data Sources for Assessing Extreme Charging Events within the Earth's Magnetosphere

NASA Astrophysics Data System (ADS)

Parker, L. N.; Minow, J. I.; Talaat, E. R.

2016-12-01

Spacecraft surface and internal charging is a potential threat to space technologies because electrostatic discharges on, or within, charged spacecraft materials can result in a number of adverse impacts to spacecraft systems. The Space Weather Action Plan (SWAP) ionizing radiation benchmark team recognized that spacecraft charging will need to be considered to complete the ionizing radiation benchmarks in order to evaluate the threat of charging to critical space infrastructure operating within the near-Earth ionizing radiation environments. However, the team chose to defer work on the lower energy charging environments and focus the initial benchmark efforts on the higher energy galactic cosmic ray, solar energetic particle, and trapped radiation belt particle environments of concern for radiation dose and single event effects in humans and hardware. Therefore, an initial set of 1 in 100 year spacecraft charging environment benchmarks remains to be defined to meet the SWAP goals. This presentation will discuss the available data sources and a methodology to assess the 1 in 100 year extreme space weather events that drive surface and internal charging threats to spacecraft. Environments to be considered are the hot plasmas in the outer magnetosphere during geomagnetic storms, relativistic electrons in the outer radiation belt, and energetic auroral electrons in low Earth orbit at high latitudes.
Evaluation of the Pool Critical Assembly Benchmark with Explicitly-Modeled Geometry using MCNP6

DOE PAGES

Kulesza, Joel A.; Martz, Roger Lee

2017-03-01

Despite being one of the most widely used benchmarks for qualifying light water reactor (LWR) radiation transport methods and data, no benchmark calculation of the Oak Ridge National Laboratory (ORNL) Pool Critical Assembly (PCA) pressure vessel wall benchmark facility (PVWBF) using MCNP6 with explicitly modeled core geometry exists. As such, this paper provides results for such an analysis. First, a criticality calculation is used to construct the fixed source term. Next, ADVANTG-generated variance reduction parameters are used within the final MCNP6 fixed source calculations. These calculations provide unadjusted dosimetry results using three sets of dosimetry reaction cross sections of varyingmore » ages (those packaged with MCNP6, from the IRDF-2002 multi-group library, and from the ACE-formatted IRDFF v1.05 library). These results are then compared to two different sets of measured reaction rates. The comparison agrees in an overall sense within 2% and on a specific reaction- and dosimetry location-basis within 5%. Except for the neptunium dosimetry, the individual foil raw calculation-to-experiment comparisons usually agree within 10% but is typically greater than unity. Finally, in the course of developing these calculations, geometry that has previously not been completely specified is provided herein for the convenience of future analysts.« less
Development and Experimental Benchmark of Simulations to Predict Used Nuclear Fuel Cladding Temperatures during Drying and Transfer Operations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Greiner, Miles

Radial hydride formation in high-burnup used fuel cladding has the potential to radically reduce its ductility and suitability for long-term storage and eventual transport. To avoid this formation, the maximum post-reactor temperature must remain sufficiently low to limit the cladding hoop stress, and so that hydrogen from the existing circumferential hydrides will not dissolve and become available to re-precipitate into radial hydrides under the slow cooling conditions during drying, transfer and early dry-cask storage. The objective of this research is to develop and experimentallybenchmark computational fluid dynamics simulations of heat transfer in post-pool-storage drying operations, when high-burnup fuel cladding ismore » likely to experience its highest temperature. These benchmarked tools can play a key role in evaluating dry cask storage systems for extended storage of high-burnup fuels and post-storage transportation, including fuel retrievability. The benchmarked tools will be used to aid the design of efficient drying processes, as well as estimate variations of surface temperatures as a means of inferring helium integrity inside the canister or cask. This work will be conducted effectively because the principal investigator has experience developing these types of simulations, and has constructed a test facility that can be used to benchmark them.« less
Risk assessment for consumer exposure to toluene diisocyanate (TDI) derived from polyurethane flexible foam.

PubMed

Arnold, Scott M; Collins, Michael A; Graham, Cynthia; Jolly, Athena T; Parod, Ralph J; Poole, Alan; Schupp, Thomas; Shiotsuka, Ronald N; Woolhiser, Michael R

2012-12-01

Polyurethanes (PU) are polymers made from diisocyanates and polyols for a variety of consumer products. It has been suggested that PU foam may contain trace amounts of residual toluene diisocyanate (TDI) monomers and present a health risk. To address this concern, the exposure scenario and health risks posed by sleeping on a PU foam mattress were evaluated. Toxicity benchmarks for key non-cancer endpoints (i.e., irritation, sensitization, respiratory tract effects) were determined by dividing points of departure by uncertainty factors. The cancer benchmark was derived using the USEPA Benchmark Dose Software. Results of previous migration and emission data of TDI from PU foam were combined with conservative exposure factors to calculate upper-bound dermal and inhalation exposures to TDI as well as a lifetime average daily dose to TDI from dermal exposure. For each non-cancer endpoint, the toxicity benchmark was divided by the calculated exposure to determine the margin of safety (MOS), which ranged from 200 (respiratory tract) to 3×10(6) (irritation). Although available data indicate TDI is not carcinogenic, a theoretical excess cancer risk (1×10(-7)) was calculated. We conclude from this assessment that sleeping on a PU foam mattress does not pose TDI-related health risks to consumers. Copyright © 2012 Elsevier Inc. All rights reserved.
Validating vignette and conjoint survey experiments against real-world behavior

PubMed Central

Hainmueller, Jens; Hangartner, Dominik; Yamamoto, Teppei

2015-01-01

Survey experiments, like vignette and conjoint analyses, are widely used in the social sciences to elicit stated preferences and study how humans make multidimensional choices. However, there is a paucity of research on the external validity of these methods that examines whether the determinants that explain hypothetical choices made by survey respondents match the determinants that explain what subjects actually do when making similar choices in real-world situations. This study compares results from conjoint and vignette analyses on which immigrant attributes generate support for naturalization with closely corresponding behavioral data from a natural experiment in Switzerland, where some municipalities used referendums to decide on the citizenship applications of foreign residents. Using a representative sample from the same population and the official descriptions of applicant characteristics that voters received before each referendum as a behavioral benchmark, we find that the effects of the applicant attributes estimated from the survey experiments perform remarkably well in recovering the effects of the same attributes in the behavioral benchmark. We also find important differences in the relative performances of the different designs. Overall, the paired conjoint design, where respondents evaluate two immigrants side by side, comes closest to the behavioral benchmark; on average, its estimates are within 2% percentage points of the effects in the behavioral benchmark. PMID:25646415
All inclusive benchmarking.

PubMed

Ellis, Judith

2006-07-01

The aim of this article is to review published descriptions of benchmarking activity and synthesize benchmarking principles to encourage the acceptance and use of Essence of Care as a new benchmarking approach to continuous quality improvement, and to promote its acceptance as an integral and effective part of benchmarking activity in health services. The Essence of Care, was launched by the Department of Health in England in 2001 to provide a benchmarking tool kit to support continuous improvement in the quality of fundamental aspects of health care, for example, privacy and dignity, nutrition and hygiene. The tool kit is now being effectively used by some frontline staff. However, use is inconsistent, with the value of the tool kit, or the support clinical practice benchmarking requires to be effective, not always recognized or provided by National Health Service managers, who are absorbed with the use of quantitative benchmarking approaches and measurability of comparative performance data. This review of published benchmarking literature, was obtained through an ever-narrowing search strategy commencing from benchmarking within quality improvement literature through to benchmarking activity in health services and including access to not only published examples of benchmarking approaches and models used but the actual consideration of web-based benchmarking data. This supported identification of how benchmarking approaches have developed and been used, remaining true to the basic benchmarking principles of continuous improvement through comparison and sharing (Camp 1989). Descriptions of models and exemplars of quantitative and specifically performance benchmarking activity in industry abound (Camp 1998), with far fewer examples of more qualitative and process benchmarking approaches in use in the public services and then applied to the health service (Bullivant 1998). The literature is also in the main descriptive in its support of the effectiveness of benchmarking activity and although this does not seem to have restricted its popularity in quantitative activity, reticence about the value of the more qualitative approaches, for example Essence of Care, needs to be overcome in order to improve the quality of patient care and experiences. The perceived immeasurability and subjectivity of Essence of Care and clinical practice benchmarks means that these benchmarking approaches are not always accepted or supported by health service organizations as valid benchmarking activity. In conclusion, Essence of Care benchmarking is a sophisticated clinical practice benchmarking approach which needs to be accepted as an integral part of health service benchmarking activity to support improvement in the quality of patient care and experiences.
Evaluating Bias of Sequential Mixed-Mode Designs against Benchmark Surveys

ERIC Educational Resources Information Center

Klausch, Thomas; Schouten, Barry; Hox, Joop J.

2017-01-01

This study evaluated three types of bias--total, measurement, and selection bias (SB)--in three sequential mixed-mode designs of the Dutch Crime Victimization Survey: telephone, mail, and web, where nonrespondents were followed up face-to-face (F2F). In the absence of true scores, all biases were estimated as mode effects against two different…
Evaluating Biology Achievement Scores in an ICT Integrated PBL Environment

ERIC Educational Resources Information Center

Osman, Kamisah; Kaur, Simranjeet Judge

2014-01-01

Students' achievement in Biology is often looked up as a benchmark to evaluate the mode of teaching and learning in higher education. Problem-based learning (PBL) is an approach that focuses on students' solving a problem through collaborative groups. There were eighty samples involved in this study. The samples were divided into three groups: ICT…
Provenance Store Evaluation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paulson, Patrick R.; Gibson, Tara D.; Schuchardt, Karen L.

2008-03-01

Requirements for the provenance store and access API are developed. Existing RDF stores and APIs are evaluated against the requirements and performance benchmarks. The team’s conclusion is to use MySQL as a database backend, with a possible move to Oracle in the near-term future. Both Jena and Sesame’s APIs will be supported, but new code will use the Jena API
Rasch Model Analysis on the Effectiveness of Early Evaluation Questions as a Benchmark for New Students Ability

ERIC Educational Resources Information Center

Arsad, Norhana; Kamal, Noorfazila; Ayob, Afida; Sarbani, Nizaroyani; Tsuey, Chong Sheau; Misran, Norbahiah; Husain, Hafizah

2013-01-01

This paper discusses the effectiveness of the early evaluation questions conducted to determine the academic ability of the new students in the Department of Electrical, Electronics and Systems Engineering. Questions designed are knowledge based--on what the students have learned during their pre-university level. The results show students have…
Letterland Evaluation: 2013-14. Eye on Evaluation. D&A Report No. 14.18

ERIC Educational Resources Information Center

Paeplow, Colleen

2015-01-01

In 2013-14, Letterland had strong implementation, with moderate to high fidelity within approximately 90% of Wake County Public School System (WCPSS) K-1 classrooms. The impact of Letterland on students' reading achievement was neutral to positive. A significantly higher percentage of WCPSS kindergarten students were at or above benchmark mid-year…
Learning How to Play Ball: Applying Sabermetric Thinking to Benchmarking in Higher Education

ERIC Educational Resources Information Center

Levy, Gary D.

2012-01-01

Although the notion is certainly cliched, baseball often serves as an excellent metaphor for life. Some of the methodologies currently being used to measure, evaluate, manage, and even play baseball may serve as references for ways that higher education may be measured, evaluated, managed, and played. This chapter proposes and presents…

Some links on this page may take you to non-federal websites. Their policies may differ from this site.