Sample records for metric development benchmarking

  1. Issues in Benchmark Metric Selection

    NASA Astrophysics Data System (ADS)

    Crolotte, Alain

    It is true that a metric can influence a benchmark, but will esoteric metrics create more problems than they solve? We answer this question affirmatively by examining the case of the TPC-D metric, which used the much-debated geometric mean for the single-stream test. We show how this simple choice influenced the benchmark and its conduct and, to some extent, DBMS development. After examining other alternatives, our conclusion is that the “real” measure for a decision-support benchmark is the arithmetic mean.
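
    As a rough illustration of the trade-off discussed above (a sketch with made-up query times, not TPC-D data): the geometric mean weights every query equally, so halving an already-fast query moves it noticeably, while the arithmetic mean stays pinned to the one slow query.

    # Arithmetic vs. geometric mean over per-query times (hypothetical values).
    from math import prod

    times = [1.0, 2.0, 4.0, 400.0]  # seconds per query

    def arithmetic_mean(xs):
        return sum(xs) / len(xs)

    def geometric_mean(xs):
        return prod(xs) ** (1.0 / len(xs))

    print(arithmetic_mean(times))   # 101.75 -- dominated by the 400 s query
    print(geometric_mean(times))    # ~7.52  -- all queries weighted equally

    # Halving the fastest query barely changes total elapsed time, yet it
    # lowers the geometric mean by ~16%, rewarding tuning of cheap queries.
    times[0] = 0.5
    print(arithmetic_mean(times))   # 101.625
    print(geometric_mean(times))    # ~6.32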

  2. Conceptual Soundness, Metric Development, Benchmarking, and Targeting for PATH Subprogram Evaluation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mosey, G.; Doris, E.; Coggeshall, C.

    The objective of this study is to evaluate the conceptual soundness of the U.S. Department of Housing and Urban Development (HUD) Partnership for Advancing Technology in Housing (PATH) program's revised goals and establish and apply a framework to identify and recommend metrics that are the most useful for measuring PATH's progress. This report provides an evaluative review of PATH's revised goals, outlines a structured method for identifying and selecting metrics, proposes metrics and benchmarks for a sampling of individual PATH programs, and discusses other metrics that potentially could be developed that may add value to the evaluation process. The framework and individual program metrics can be used for ongoing management improvement efforts and to inform broader program-level metrics for government reporting requirements.

  3. A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design.

    PubMed

    Ó Conchúir, Shane; Barlow, Kyle A; Pache, Roland A; Ollikainen, Noah; Kundert, Kale; O'Meara, Matthew J; Smith, Colin A; Kortemme, Tanja

    2015-01-01

    The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark. We aim to provide such a resource for protocols in macromolecular modeling and design. We present a freely accessible web resource (https://kortemmelab.ucsf.edu/benchmarks) to guide the development of protocols for protein modeling and design. The site provides benchmark datasets and metrics to compare the performance of a variety of modeling protocols using different computational sampling methods and energy functions, providing a "best practice" set of parameters for each method. Each benchmark has an associated downloadable benchmark capture archive containing the input files, analysis scripts, and tutorials for running the benchmark. The captures may be run with any suitable modeling method; we supply command lines for running the benchmarks using the Rosetta software suite. We have compiled initial benchmarks for the resource spanning three key areas: prediction of energetic effects of mutations, protein design, and protein structure prediction, each with associated state-of-the-art modeling protocols. With the help of the wider macromolecular modeling community, we hope to expand the variety of benchmarks included on the website and continue to evaluate new iterations of current methods as they become available.

  4. Evaluative Usage-Based Metrics for the Selection of E-Journals.

    ERIC Educational Resources Information Center

    Hahn, Karla L.; Faulkner, Lila A.

    2002-01-01

    Explores electronic journal usage statistics and develops three metrics and three benchmarks based on those metrics. Topics include earlier work that assessed the value of print journals and was modified for the electronic format; the evaluation of potential purchases; and implications for standards development, including the need for content…

  5. HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Adams, Mark; Brown, Jed; Shalf, John

    2014-05-05

    This document provides an overview of the benchmark, HPGMG, for ranking large-scale general-purpose computers for use on the Top500 list [8]. We provide a rationale for replacing the current metric, HPL; background on the Top500 list and the challenges of developing such a metric; a discussion of our design philosophy and methodology; and an overview of the benchmark specification. The primary documentation, with maintained details on the specification, can be found at hpgmg.org, and the wiki and benchmark code can be found in the repository https://bitbucket.org/hpgmg/hpgmg.

  6. Metric Evaluation Pipeline for 3d Modeling of Urban Scenes

    NASA Astrophysics Data System (ADS)

    Bosch, M.; Leichtman, A.; Chilcott, D.; Goldberg, H.; Brown, M.

    2017-05-01

    Publicly available benchmark data and metric evaluation approaches have been instrumental in enabling research to advance state-of-the-art methods for remote sensing applications in urban 3D modeling. Most publicly available benchmark datasets have consisted of high-resolution airborne imagery and lidar suitable for 3D modeling on a relatively modest scale. To enable research in larger scale 3D mapping, we have recently released a public benchmark dataset with multi-view commercial satellite imagery and metrics to compare 3D point clouds with lidar ground truth. We now define a more complete metric evaluation pipeline, developed as publicly available open source software, to assess semantically labeled 3D models of complex urban scenes derived from multi-view commercial satellite imagery. Evaluation metrics in our pipeline include horizontal and vertical accuracy and completeness, volumetric completeness and correctness, perceptual quality, and model simplicity. Sources of ground truth include airborne lidar and overhead imagery, and we demonstrate a semi-automated process for producing accurate ground truth shape files to characterize building footprints. We validate our current metric evaluation pipeline using 3D models produced using open source multi-view stereo methods. Data and software are made publicly available to enable further research and planned benchmarking activities.
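
    A minimal sketch of the threshold-based completeness and correctness idea described above (the 1 m tolerance, the KD-tree nearest-neighbor search, and the synthetic points are assumptions for illustration, not the pipeline's actual implementation):

    import numpy as np
    from scipy.spatial import cKDTree

    def completeness_correctness(model_pts, truth_pts, tol=1.0):
        """model_pts, truth_pts: (N, 3) xyz arrays; tol in meters."""
        d_truth, _ = cKDTree(model_pts).query(truth_pts)   # truth -> model
        d_model, _ = cKDTree(truth_pts).query(model_pts)   # model -> truth
        completeness = float(np.mean(d_truth <= tol))  # ground truth explained
        correctness = float(np.mean(d_model <= tol))   # model supported by truth
        return completeness, correctness

    rng = np.random.default_rng(0)
    truth = rng.uniform(0.0, 100.0, size=(5000, 3))
    model = truth + rng.normal(0.0, 0.5, size=truth.shape)  # noisy reconstruction
    print(completeness_correctness(model, truth))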

  7. Benchmarking Diagnostic Algorithms on an Electrical Power System Testbed

    NASA Technical Reports Server (NTRS)

    Kurtoglu, Tolga; Narasimhan, Sriram; Poll, Scott; Garcia, David; Wright, Stephanie

    2009-01-01

    Diagnostic algorithms (DAs) are key to enabling automated health management. These algorithms are designed to detect and isolate anomalies of either a component or the whole system based on observations received from sensors. In recent years a wide range of algorithms, both model-based and data-driven, have been developed to increase autonomy and improve system reliability and affordability. However, the lack of support to perform systematic benchmarking of these algorithms continues to create barriers for effective development and deployment of diagnostic technologies. In this paper, we present our efforts to benchmark a set of DAs on a common platform using a framework that was developed to evaluate and compare various performance metrics for diagnostic technologies. The diagnosed system is an electrical power system, namely the Advanced Diagnostics and Prognostics Testbed (ADAPT) developed and located at the NASA Ames Research Center. The paper presents the fundamentals of the benchmarking framework, the ADAPT system, description of faults and data sets, the metrics used for evaluation, and an in-depth analysis of benchmarking results obtained from testing ten diagnostic algorithms on the ADAPT electrical power system testbed.
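
    Two of the kinds of performance measures such frameworks typically evaluate, sketched in hedged form (the function names and scenario values are illustrative, not the ADAPT framework's actual metric definitions):

    def detection_delay(fault_time, first_alarm_times):
        """Mean time from fault injection to first detection; None = missed."""
        delays = [t - fault_time for t in first_alarm_times if t is not None]
        return sum(delays) / len(delays) if delays else float("inf")

    def false_alarm_fraction(alarm_times, fault_time):
        """Fraction of alarms raised before any fault was injected."""
        return sum(1 for t in alarm_times if t < fault_time) / max(len(alarm_times), 1)

    # One simulated run per DA: fault injected at t = 100 s.
    print(detection_delay(100.0, [103.2, 101.7, None]))       # 2.45 s, one miss
    print(false_alarm_fraction([42.0, 101.7, 103.2], 100.0))  # ~0.33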

  8. Benchmarking infrastructure for mutation text mining

    PubMed Central

    2014-01-01

    Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
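
    To make the RDF/SPARQL idea concrete, here is a toy sketch of counting exact matches between gold and system annotations (the ex: predicates are invented placeholders, not the project's actual OWL schema):

    import rdflib

    g = rdflib.Graph()
    g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:doc1 ex:gold "E545K" ; ex:system "E545K" .
    ex:doc2 ex:gold "H1047R" ; ex:system "H1047L" .
    """, format="turtle")

    query = """
    PREFIX ex: <http://example.org/>
    SELECT (COUNT(?doc) AS ?tp) WHERE {
      ?doc ex:gold ?m ; ex:system ?m .   # same mutation from both sources
    }
    """
    for row in g.query(query):
        print("true positives:", row.tp)   # 1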

  9. Benchmarking infrastructure for mutation text mining.

    PubMed

    Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

    2014-02-25

    Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.

  10. PDS: A Performance Database Server

    DOE PAGES

    Berry, Michael W.; Dongarra, Jack J.; Larose, Brian H.; ...

    1994-01-01

    The process of gathering, archiving, and distributing computer benchmark data is a cumbersome task usually performed by computer users and vendors with little coordination. Most important, there is no publicly available central depository of performance data for all ranges of machines from personal computers to supercomputers. We present an Internet-accessible performance database server (PDS) that can be used to extract current benchmark data and literature. As an extension to the X-Windows-based user interface (Xnetlib) to the Netlib archival system, PDS provides an on-line catalog of public domain computer benchmarks such as the LINPACK benchmark, Perfect benchmarks, and the NAS parallel benchmarks. PDS does not reformat or present the benchmark data in any way that conflicts with the original methodology of any particular benchmark; it is thereby devoid of any subjective interpretations of machine performance. We believe that all branches (research laboratories, academia, and industry) of the general computing community can use this facility to archive performance metrics and make them readily available to the public. PDS can provide a more manageable approach to the development and support of a large dynamic database of published performance metrics.

  11. Beyond Benchmarking: Value-Adding Metrics

    ERIC Educational Resources Information Center

    Fitz-enz, Jac

    2007-01-01

    HR metrics has grown up a bit over the past two decades, moving away from simple benchmarking practices and toward a more inclusive approach to measuring institutional performance and progress. In this article, the acknowledged "father" of human capital performance benchmarking provides an overview of several aspects of today's HR metrics…

  12. Pollutant Emissions and Energy Efficiency under Controlled Conditions for Household Biomass Cookstoves and Implications for Metrics Useful in Setting International Test Standards

    EPA Science Inventory

    Realistic metrics and methods for testing household biomass cookstoves are required to develop standards needed by international policy makers, donors, and investors. Application of consistent test practices allows emissions and energy efficiency performance to be benchmarked and...

  13. Coreference Resolution With Reconcile

    DTIC Science & Technology

    2010-07-01

    evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental... scores vary wildly across data sets, evaluation metrics, and system configurations. We believe that one root cause of these disparities is the high...

  14. Measuring Distribution Performance? Benchmarking Warrants Your Attention

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ericson, Sean J; Alvarez, Paul

    Identifying, designing, and measuring performance metrics is critical to securing customer value, but can be a difficult task. This article examines the use of benchmarks based on publicly available performance data to set challenging, yet fair, metrics and targets.

  15. Proficiency performance benchmarks for removal of simulated brain tumors using a virtual reality simulator NeuroTouch.

    PubMed

    AlZhrani, Gmaan; Alotaibi, Fahad; Azarnoush, Hamed; Winkler-Schwartz, Alexander; Sabbagh, Abdulrahman; Bajunaid, Khalid; Lajoie, Susanne P; Del Maestro, Rolando F

    2015-01-01

    Assessment of neurosurgical technical skills involved in the resection of cerebral tumors in operative environments is complex. Educators emphasize the need to develop and use objective and meaningful assessment tools that are reliable and valid for assessing trainees' progress in acquiring surgical skills. The purpose of this study was to develop proficiency performance benchmarks for a newly proposed set of objective measures (metrics) of neurosurgical technical skills performance during simulated brain tumor resection using a new virtual reality simulator (NeuroTouch). Each participant performed the resection of 18 simulated brain tumors of different complexity using the NeuroTouch platform. Surgical performance was computed using Tier 1 and Tier 2 metrics derived from NeuroTouch simulator data consisting of (1) safety metrics, including (a) volume of surrounding simulated normal brain tissue removed, (b) sum of forces utilized, and (c) maximum force applied during tumor resection; (2) quality of operation metric, which involved the percentage of tumor removed; and (3) efficiency metrics, including (a) instrument total tip path lengths and (b) frequency of pedal activation. All studies were conducted in the Neurosurgical Simulation Research Centre, Montreal Neurological Institute and Hospital, McGill University, Montreal, Canada. A total of 33 participants were recruited, including 17 experts (board-certified neurosurgeons) and 16 novices (7 senior and 9 junior neurosurgery residents). The results demonstrated that "expert" neurosurgeons resected less surrounding simulated normal brain tissue and less tumor tissue than residents. These data are consistent with the concept that "experts" focused more on safety of the surgical procedure compared with novices. By analyzing experts' neurosurgical technical skills performance on these different metrics, we were able to establish benchmarks for goal proficiency performance training of neurosurgery residents. This study furthers our understanding of expert neurosurgical performance during the resection of simulated virtual reality tumors and provides neurosurgical trainees with predefined proficiency performance benchmarks designed to maximize the learning of specific surgical technical skills. Copyright © 2015 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
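
    A hedged sketch of how two of the listed efficiency and safety metrics could be computed from raw simulator samples (array shapes, units, and sampling rates are assumptions; NeuroTouch's internal definitions may differ):

    import numpy as np

    def tip_path_length(tip_xyz):
        """Total instrument tip path: sum of distances between samples, (N, 3)."""
        return float(np.sum(np.linalg.norm(np.diff(tip_xyz, axis=0), axis=1)))

    def max_force(force_xyz):
        """Maximum instantaneous force magnitude over the resection, (N, 3)."""
        return float(np.max(np.linalg.norm(force_xyz, axis=1)))

    rng = np.random.default_rng(1)
    tip = np.cumsum(rng.normal(0.0, 0.001, size=(500, 3)), axis=0)  # meters
    force = rng.normal(0.0, 0.2, size=(500, 3))                     # newtons
    print(tip_path_length(tip), max_force(force))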

  16. Restaurant Energy Use Benchmarking Guideline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hedrick, R.; Smith, V.; Field, K.

    2011-07-01

    A significant operational challenge for food service operators is defining energy use benchmark metrics to compare against the performance of individual stores. Without metrics, multiunit operators and managers have difficulty identifying which stores in their portfolios require extra attention to bring their energy performance in line with expectations. This report presents a method whereby multiunit operators may use their own utility data to create suitable metrics for evaluating their operations.

  17. International land Model Benchmarking (ILAMB) Package v002.00

    DOE Data Explorer

    Collier, Nathaniel [Oak Ridge National Laboratory; Hoffman, Forrest M. [Oak Ridge National Laboratory; Mu, Mingquan [University of California, Irvine; Randerson, James T. [University of California, Irvine; Riley, William J. [Lawrence Berkeley National Laboratory

    2016-05-09

    As a contribution to the International Land Model Benchmarking (ILAMB) Project, we are providing new analysis approaches, benchmarking tools, and science leadership. The goal of ILAMB is to assess and improve the performance of land models through international cooperation and to inform the design of new measurement campaigns and field studies to reduce uncertainties associated with key biogeochemical processes and feedbacks. ILAMB is expected to be a primary analysis tool for CMIP6 and future model-data intercomparison experiments. This team has developed initial prototype benchmarking systems for ILAMB, which will be improved and extended to include ocean model metrics and diagnostics.

  18. International land Model Benchmarking (ILAMB) Package v001.00

    DOE Data Explorer

    Mu, Mingquan [University of California, Irvine; Randerson, James T. [University of California, Irvine; Riley, William J. [Lawrence Berkeley National Laboratory; Hoffman, Forrest M. [Oak Ridge National Laboratory

    2016-05-02

    As a contribution to the International Land Model Benchmarking (ILAMB) Project, we are providing new analysis approaches, benchmarking tools, and science leadership. The goal of ILAMB is to assess and improve the performance of land models through international cooperation and to inform the design of new measurement campaigns and field studies to reduce uncertainties associated with key biogeochemical processes and feedbacks. ILAMB is expected to be a primary analysis tool for CMIP6 and future model-data intercomparison experiments. This team has developed initial prototype benchmarking systems for ILAMB, which will be improved and extended to include ocean model metrics and diagnostics.

  19. Competency based training in robotic surgery: benchmark scores for virtual reality robotic simulation.

    PubMed

    Raison, Nicholas; Ahmed, Kamran; Fossati, Nicola; Buffi, Nicolò; Mottrie, Alexandre; Dasgupta, Prokar; Van Der Poel, Henk

    2017-05-01

    To develop benchmark scores of competency for use within a competency based virtual reality (VR) robotic training curriculum. This longitudinal, observational study analysed results from nine European Association of Urology hands-on-training courses in VR simulation. In all, 223 participants ranging from novice to expert robotic surgeons completed 1565 exercises. Competency was set at 75% of the mean expert score. Benchmark scores for all general performance metrics generated by the simulator were calculated. Assessment exercises were selected by expert consensus and through learning-curve analysis. Three basic skill and two advanced skill exercises were identified. Benchmark scores based on expert performance offered viable targets for novice and intermediate trainees in robotic surgery. Novice participants met the competency standards for most basic skill exercises; however, advanced exercises were significantly more challenging. Intermediate participants performed better across the seven metrics but still did not achieve the benchmark standard in the more difficult exercises. Benchmark scores derived from expert performances offer relevant and challenging scores for trainees to achieve during VR simulation training. Objective feedback allows both participants and trainers to monitor educational progress and ensures that training remains effective. Furthermore, the well-defined goals set through benchmarking offer clear targets for trainees and enable training to move to a more efficient competency based curriculum. © 2016 The Authors BJU International © 2016 BJU International Published by John Wiley & Sons Ltd.
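
    The competency rule stated above reduces to a one-line computation; a sketch with hypothetical simulator scores (not study data):

    # Competency benchmark = 75% of the mean expert score (per the abstract).
    expert_scores = [88.0, 92.5, 79.0, 95.0, 85.5]   # hypothetical expert results

    benchmark = 0.75 * (sum(expert_scores) / len(expert_scores))
    print(f"competency benchmark: {benchmark:.1f}")  # 66.0

    trainee_score = 71.2
    print("competent" if trainee_score >= benchmark else "below benchmark")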

  20. Benchmarking Investments in Advancement: Results of the Inaugural CASE Advancement Investment Metrics Study (AIMS). CASE White Paper

    ERIC Educational Resources Information Center

    Kroll, Judith A.

    2012-01-01

    The inaugural Advancement Investment Metrics Study, or AIMS, benchmarked investments and staffing in each of the advancement disciplines (advancement services, alumni relations, communications and marketing, fundraising and advancement management) as well as the return on the investment in fundraising specifically. This white paper reports on the…

  1. A Machine-to-Machine protocol benchmark for eHealth applications - Use case: Respiratory rehabilitation.

    PubMed

    Talaminos-Barroso, Alejandro; Estudillo-Valderrama, Miguel A; Roa, Laura M; Reina-Tosina, Javier; Ortega-Ruiz, Francisco

    2016-06-01

    M2M (Machine-to-Machine) communications are one of the main pillars of the Internet of Things (IoT) paradigm and are opening new opportunities for the eHealth business. Nevertheless, the large number of M2M protocols currently available hinders the selection of a solution that satisfies the requirements eHealth applications can demand. The objectives of this work were, first, to develop a tool that provides a benchmarking analysis for objectively selecting among the most relevant M2M protocols for eHealth solutions, and second, to validate the tool with a particular use case: respiratory rehabilitation. A software tool, called the Distributed Computing Framework (DFC), was designed and developed to execute the benchmarking tests and to facilitate deployment in environments with a large number of machines, independently of the protocol and performance metrics selected. The DDS, MQTT, CoAP, JMS, AMQP, and XMPP protocols were evaluated against specific performance metrics, including CPU usage, memory usage, bandwidth consumption, latency, and jitter. The results validated a use case, respiratory rehabilitation of chronic obstructive pulmonary disease (COPD) patients, in two scenarios with different requirements: home-based and ambulatory. The results of the benchmark comparison can guide eHealth developers in the choice of M2M technologies. In this regard, the framework presented is a simple and powerful tool for deploying benchmark tests under specific environments and conditions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
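
    A generic sketch of how two of the cited metrics, latency and jitter, can be measured by a benchmarking harness (a plain TCP echo exchange stands in for an M2M protocol client here; this is not the DFC tool):

    import socket
    import statistics
    import time

    def latency_jitter(host, port, n=50, payload=b"ping"):
        """Round-trip latencies over n request/response exchanges."""
        samples = []
        with socket.create_connection((host, port), timeout=2) as s:
            for _ in range(n):
                t0 = time.perf_counter()
                s.sendall(payload)
                s.recv(len(payload))
                samples.append(time.perf_counter() - t0)
        return {
            "mean_latency_ms": 1000 * statistics.mean(samples),
            "jitter_ms": 1000 * statistics.stdev(samples),  # jitter as stddev
        }

    # Usage (requires a local echo server): print(latency_jitter("127.0.0.1", 7000))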

  2. The development of a virtual reality training curriculum for colonoscopy.

    PubMed

    Sugden, Colin; Aggarwal, Rajesh; Banerjee, Amrita; Haycock, Adam; Thomas-Gibson, Siwan; Williams, Christopher B; Darzi, Ara

    2012-07-01

    The development of a structured virtual reality (VR) training curriculum for colonoscopy using high-fidelity simulation. Colonoscopy requires detailed knowledge and technical skill. Changes to working practices in recent times have reduced the availability of traditional training opportunities. Much might, therefore, be achieved by applying novel technologies such as VR simulation to colonoscopy. Scientifically developed device-specific curricula aim to maximize the yield of laboratory-based training by focusing on validated modules and linking progression to the attainment of benchmarked proficiency criteria. Fifty participants comprised of 30 novices (<10 colonoscopies), 10 intermediates (100 to 500 colonoscopies), and 10 experienced (>500 colonoscopies) colonoscopists were recruited to participate. Surrogates of proficiency, such as number of procedures undertaken, determined prospective allocation to 1 of 3 groups (novice, intermediate, and experienced). Construct validity and learning value (comparison between groups and within groups respectively) for each task and metric on the chosen simulator model determined suitability for inclusion in the curriculum. Eight tasks in possession of construct validity and significant learning curves were included in the curriculum: 3 abstract tasks, 4 part-procedural tasks, and 1 procedural task. The whole-procedure task was valid for 11 metrics including the following: "time taken to complete the task" (1238, 343, and 293 s; P < 0.001) and "insertion length with embedded tip" (23.8, 3.6, and 4.9 cm; P = 0.005). Learning curves consistently plateaued at or beyond the ninth attempt. Valid metrics were used to define benchmarks, derived from the performance of the experienced cohort, for each included task. A comprehensive, stratified, benchmarked, whole-procedure curriculum has been developed for a modern high-fidelity VR colonoscopy simulator.

  3. Aircraft Engine Gas Path Diagnostic Methods: Public Benchmarking Results

    NASA Technical Reports Server (NTRS)

    Simon, Donald L.; Borguet, Sebastien; Leonard, Olivier; Zhang, Xiaodong (Frank)

    2013-01-01

    Recent technology reviews have identified the need for objective assessments of aircraft engine health management (EHM) technologies. To help address this issue, a gas path diagnostic benchmark problem has been created and made publicly available. This software tool, referred to as the Propulsion Diagnostic Method Evaluation Strategy (ProDiMES), has been constructed based on feedback provided by the aircraft EHM community. It provides a standard benchmark problem enabling users to develop, evaluate and compare diagnostic methods. This paper will present an overview of ProDiMES along with a description of four gas path diagnostic methods developed and applied to the problem. These methods, which include analytical and empirical diagnostic techniques, will be described and associated blind-test-case metric results will be presented and compared. Lessons learned along with recommendations for improving the public benchmarking processes will also be presented and discussed.

  4. Mean Abnormal Result Rate: Proof of Concept of a New Metric for Benchmarking Selectivity in Laboratory Test Ordering.

    PubMed

    Naugler, Christopher T; Guo, Maggie

    2016-04-01

    There is a need to develop and validate new metrics to assess the appropriateness of laboratory test requests. The mean abnormal result rate (MARR) is a proposed measure of ordering selectivity, the premise being that higher mean abnormal rates represent more selective test ordering. As a validation of this metric, we compared the abnormal rate of lab tests with the number of tests ordered on the same requisition. We hypothesized that requisitions with larger numbers of requested tests represent less selective test ordering and would therefore have a lower overall abnormal rate. We examined 3,864,083 tests ordered on 451,895 requisitions and found that the MARR decreased from about 25% if one test was ordered to about 7% if nine or more tests were ordered, consistent with less selectivity when more tests were ordered. We then examined the MARR for community-based testing for 1,340 family physicians and found both a wide variation in MARR and an inverse relationship between the total tests ordered per year per physician and the physician-specific MARR. The proposed metric represents a new utilization metric for benchmarking relative selectivity of test orders among physicians. © American Society for Clinical Pathology, 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
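
    The metric itself is simple to compute; a sketch over hypothetical requisition records, grouped per ordering physician as in the community-based analysis above:

    from collections import defaultdict

    # (physician, tests ordered on requisition, abnormal results returned)
    requisitions = [
        ("dr_a", 1, 1), ("dr_a", 2, 1), ("dr_a", 9, 1),
        ("dr_b", 1, 0), ("dr_b", 12, 1),
    ]

    totals = defaultdict(lambda: [0, 0])   # physician -> [abnormal, ordered]
    for doc, ordered, abnormal in requisitions:
        totals[doc][0] += abnormal
        totals[doc][1] += ordered

    for doc, (abnormal, ordered) in totals.items():
        print(doc, f"MARR = {abnormal / ordered:.1%}")  # dr_a 25.0%, dr_b 7.7%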

  5. Toward benchmarking in catalysis science: Best practices, challenges, and opportunities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bligaard, Thomas; Bullock, R. Morris; Campbell, Charles T.

    Benchmarking is a community-based and (preferably) community-driven activity involving consensus-based decisions on how to make reproducible, fair, and relevant assessments. In catalysis science, important catalyst performance metrics include activity, selectivity, and the deactivation profile, which enable comparisons between new and standard catalysts. Benchmarking also requires careful documentation, archiving, and sharing of methods and measurements, to ensure that the full value of research data can be realized. Beyond these goals, benchmarking presents unique opportunities to advance and accelerate understanding of complex reaction systems by combining and comparing experimental information from multiple, in situ and operando techniques with theoretical insights derived from calculations characterizing model systems. This Perspective describes the origins and uses of benchmarking and its applications in computational catalysis, heterogeneous catalysis, molecular catalysis, and electrocatalysis. Finally, it discusses opportunities and challenges for future developments in these fields.

  6. Toward benchmarking in catalysis science: Best practices, challenges, and opportunities

    DOE PAGES

    Bligaard, Thomas; Bullock, R. Morris; Campbell, Charles T.; ...

    2016-03-07

    Benchmarking is a community-based and (preferably) community-driven activity involving consensus-based decisions on how to make reproducible, fair, and relevant assessments. In catalysis science, important catalyst performance metrics include activity, selectivity, and the deactivation profile, which enable comparisons between new and standard catalysts. Benchmarking also requires careful documentation, archiving, and sharing of methods and measurements, to ensure that the full value of research data can be realized. Beyond these goals, benchmarking presents unique opportunities to advance and accelerate understanding of complex reaction systems by combining and comparing experimental information from multiple, in situ and operando techniques with theoretical insights derived from calculations characterizing model systems. This Perspective describes the origins and uses of benchmarking and its applications in computational catalysis, heterogeneous catalysis, molecular catalysis, and electrocatalysis. Finally, it discusses opportunities and challenges for future developments in these fields.

  7. How to Advance TPC Benchmarks with Dependability Aspects

    NASA Astrophysics Data System (ADS)

    Almeida, Raquel; Poess, Meikel; Nambiar, Raghunath; Patil, Indira; Vieira, Marco

    Transactional systems are the core of the information systems of most organizations. Although there is general acknowledgement that failures in these systems often entail significant impact both on the proceeds and reputation of companies, the benchmarks developed and managed by the Transaction Processing Performance Council (TPC) still maintain their focus on reporting bare performance. Each TPC benchmark has to pass a list of dependability-related tests (to verify ACID properties), but not all benchmarks require measuring the performance of that recovery. While TPC-E measures the recovery time after some system failures, TPC-H and TPC-C only require functional correctness of such recovery. Consequently, systems used in TPC benchmarks are tuned mostly for performance. In this paper we argue that today's systems should be tuned for a more comprehensive suite of dependability tests, and that a dependability metric should be part of TPC benchmark publications. The paper discusses WHY and HOW this can be achieved. Two approaches are introduced and discussed: augmenting each TPC benchmark in a customized way, by extending each specification individually; and pursuing a more unified approach, defining a generic specification that could be adjoined to any TPC benchmark.

  8. The voice of the customer--Part 2: Benchmarking battery chargers against the Consumer's Ideal Product.

    PubMed

    Bauer, S M; Lane, J P; Stone, V I; Unnikrishnan, N

    1998-01-01

    The Rehabilitation Engineering Research Center on Technology Evaluation and Transfer is exploring how the end users of assistive technology devices define the ideal device. This work is called the Consumer Ideal Product program. In this work, end users identify and establish the importance of a broad range of product design features, along with the related product support and service provided by manufacturers and vendors. This paper describes a method for systematically transforming end-user defined requirements into a form that is useful and accessible to product designers, manufacturers, and vendors. In particular, product requirements, importance weightings, and metrics are developed from the Consumer Ideal Product battery charger outcomes. Six battery chargers are benchmarked against these product requirements using the metrics developed. The results suggest improvements for each product's design, service, and support. Overall, the six chargers meet roughly 45-75% of the ideal product's requirements. Many of the suggested improvements are low-cost changes that, if adopted, could provide companies a competitive advantage in the marketplace.

  9. SU-E-T-776: Use of Quality Metrics for a New Hypo-Fractionated Pre-Surgical Mesothelioma Protocol

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Richardson, S; Mehta, V

    Purpose: The “SMART” (Surgery for Mesothelioma After Radiation Therapy) approach involves hypo-fractionated radiotherapy of the lung pleura to 25 Gy over 5 days followed by surgical resection within 7 days. Early clinical results suggest that this approach is very promising, but also logistically challenging due to the multidisciplinary involvement. Due to the compressed schedule, high dose, and shortened planning time, the delivery of the planned doses was monitored for safety with quality metric software. Methods: Hypo-fractionated IMRT treatment plans were developed for all patients and exported to Quality Reports™ software. Plan quality metrics or PQMs™ were created to calculate an objective scoring function for each plan. This allows for an objective assessment of the quality of the plan and a benchmark for plan improvement for subsequent patients. The priorities of various components were incorporated based on similar hypo-fractionated protocols such as lung SBRT treatments. Results: Five patients have been treated at our institution using this approach. The plans were developed, QA performed, and ready within 5 days of simulation. Plan quality metrics utilized in scoring included doses to OAR and target coverage. All patients tolerated treatment well and proceeded to surgery as scheduled. Reported toxicity included grade 1 nausea (n=1), grade 1 esophagitis (n=1), and grade 2 fatigue (n=3). One patient had recurrent fluid accumulation following surgery. No patients experienced any pulmonary toxicity prior to surgery. Conclusion: An accelerated course of pre-operative high-dose radiation for mesothelioma is an innovative and promising new protocol. Without historical data, one must proceed cautiously and monitor the data carefully. The development of quality metrics and scoring functions for these treatments allows us to benchmark our plans and monitor improvement. If subsequent toxicities occur, these will be easy to investigate and incorporate into the metrics. This will improve the safe delivery of large doses for these patients.

  10. Collected notes from the Benchmarks and Metrics Workshop

    NASA Technical Reports Server (NTRS)

    Drummond, Mark E.; Kaelbling, Leslie P.; Rosenschein, Stanley J.

    1991-01-01

    In recent years there has been a proliferation of proposals in the artificial intelligence (AI) literature for integrated agent architectures. Each architecture offers an approach to the general problem of constructing an integrated agent. Unfortunately, the ways in which one architecture might be considered better than another are not always clear. There has been a growing realization that many of the positive and negative aspects of an architecture become apparent only when experimental evaluation is performed and that to progress as a discipline, we must develop rigorous experimental methods. In addition to the intrinsic intellectual interest of experimentation, rigorous performance evaluation of systems is also a crucial practical concern to our research sponsors. DARPA, NASA, and AFOSR (among others) are actively searching for better ways of experimentally evaluating alternative approaches to building intelligent agents. One tool for experimental evaluation involves testing systems on benchmark tasks in order to assess their relative performance. As part of a joint DARPA and NASA funded project, NASA-Ames and Teleos Research are carrying out a research effort to establish a set of benchmark tasks and evaluation metrics by which the performance of agent architectures may be determined. As part of this project, we held a workshop on Benchmarks and Metrics at the NASA Ames Research Center on June 25, 1990. The objective of the workshop was to foster early discussion on this important topic. We did not achieve a consensus, nor did we expect to. Collected here is some of the information that was exchanged at the workshop. Given here is an outline of the workshop, a list of the participants, notes taken on the white-board during open discussions, position papers/notes from some participants, and copies of slides used in the presentations.

  11. Installed Cost Benchmarks and Deployment Barriers for Residential Solar Photovoltaics with Energy Storage: Q1 2016

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ardani, Kristen; O'Shaughnessy, Eric; Fu, Ran

    2016-12-01

    In this report, we fill a gap in the existing knowledge about PV-plus-storage system costs and value by providing detailed component- and system-level installed cost benchmarks for residential systems. We also examine other barriers to increased deployment of PV-plus-storage systems in the residential sector. The results are meant to help technology manufacturers, installers, and other stakeholders identify cost-reduction opportunities and inform decision makers about regulatory, policy, and market characteristics that impede solar plus storage deployment. In addition, our periodic cost benchmarks will document progress in cost reductions over time. To analyze costs for PV-plus-storage systems deployed in the first quarter of 2016, we adapt the National Renewable Energy Laboratory's component- and system-level cost-modeling methods for standalone PV. In general, we attempt to model best-in-class installation techniques and business operations from an installed-cost perspective. In addition to our original analysis, model development, and review of published literature, we derive inputs for our model and validate our draft results via interviews with industry and subject-matter experts. One challenge to analyzing the costs of PV-plus-storage systems is choosing an appropriate cost metric. Unlike standalone PV, energy storage lacks universally accepted cost metrics, such as dollars per watt of installed capacity and lifetime levelized cost of energy. We explain the difficulty of arriving at a standard approach for reporting storage costs and then provide the rationale for using the total installed costs of a standard PV-plus-storage system as our primary metric, rather than using a system-size-normalized metric.

  12. EVA Human Health and Performance Benchmarking Study Overview and Development of a Microgravity Protocol

    NASA Technical Reports Server (NTRS)

    Norcross, Jason; Jarvis, Sarah; Bekdash, Omar; Cupples, Scott; Abercromby, Andrew

    2017-01-01

    The primary objective of this study is to develop a protocol to reliably characterize human health and performance metrics for individuals working inside various EVA suits under realistic spaceflight conditions. Expected results and methodologies developed during this study will provide the baseline benchmarking data and protocols with which future EVA suits and suit configurations (e.g., varied pressure, mass, center of gravity [CG]) and different test subject populations (e.g., deconditioned crewmembers) may be reliably assessed and compared. Results may also be used, in conjunction with subsequent testing, to inform fitness-for-duty standards, as well as design requirements and operations concepts for future EVA suits and other exploration systems.

  13. Benchmarking Usage Statistics in Collection Management Decisions for Serials

    ERIC Educational Resources Information Center

    Tucker, Cory

    2009-01-01

    Usage statistics are an important metric for making decisions on serials. Although the University of Nevada, Las Vegas (UNLV) Libraries have been collecting usage statistics, the statistics had not frequently been used to make decisions and had not been included in collection development policy. After undergoing a collection assessment, the…

  14. The National Practice Benchmark for Oncology: 2015 Report for 2014 Data

    PubMed Central

    Balch, Carla; Ogle, John D.

    2016-01-01

    The National Practice Benchmark (NPB) is a unique tool used to measure oncology practices against others across the country in a meaningful way despite variations in practice demographics, size, and setting. In today’s challenging economic environment, each practice positions service offerings and competitive advantages to attract patients. Although the data in the NPB report are primarily reported by community oncology practices, the business structure and arrangements with regional health care systems are also reflected in the benchmark report. The ability to produce detailed metrics is an accomplishment of excellence in business and clinical management. With these metrics, a practice should be able to measure and analyze its current business practices and make appropriate changes, if necessary. In this report, we build on the foundation initially established by Oncology Metrics (acquired by Flatiron Health in 2014) over years of data collection and refine definitions to deliver the NPB, which is uniquely meaningful in the oncology market. PMID:27006357

  15. Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.

    In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and, on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.

  16. Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

    DOE PAGES

    Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; ...

    2015-10-09

    In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and, on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.
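
    A sketch of the exact k-NN classification step given precomputed max-CMO distances (computing the distances themselves requires the authors' contact map alignment machinery and is not reproduced here; the labels and distances below are synthetic):

    import numpy as np
    from collections import Counter

    def knn_classify(query_dists, labels, k=5):
        """query_dists[i] = metric distance from the query to structure i."""
        nearest = np.argsort(query_dists)[:k]
        votes = Counter(labels[i] for i in nearest)
        return votes.most_common(1)[0][0]

    labels = ["a.1", "b.2", "a.1", "a.1", "b.2", "c.3"]  # superfamily labels
    query_dists = np.array([0.12, 0.80, 0.15, 0.40, 0.95, 0.70])
    print(knn_classify(query_dists, labels, k=3))  # "a.1"

    # Because max-CMO is a true metric, the triangle inequality lets an exact
    # search prune candidates without computing every pairwise distance.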

  17. EVA Health and Human Performance Benchmarking Study

    NASA Technical Reports Server (NTRS)

    Abercromby, A. F.; Norcross, J.; Jarvis, S. L.

    2016-01-01

    Multiple HRP Risks and Gaps require detailed characterization of human health and performance during exploration extravehicular activity (EVA) tasks; however, a rigorous and comprehensive methodology for characterizing and comparing the health and human performance implications of current and future EVA spacesuit designs does not exist. This study will identify and implement functional tasks and metrics, both objective and subjective, that are relevant to health and human performance, such as metabolic expenditure, suit fit, discomfort, suited postural stability, cognitive performance, and potentially biochemical responses for humans working inside different EVA suits doing functional tasks under the appropriate simulated reduced gravity environments. This study will provide health and human performance benchmark data for humans working in current EVA suits (EMU, Mark III, and Z2) as well as shirtsleeves using a standard set of tasks and metrics with quantified reliability. Results and methodologies developed during this test will provide benchmark data against which future EVA suits, and different suit configurations (eg, varied pressure, mass, CG) may be reliably compared in subsequent tests. Results will also inform fitness for duty standards as well as design requirements and operations concepts for future EVA suits and other exploration systems.

  18. Identifying Drug-Target Interactions with Decision Templates.

    PubMed

    Yan, Xiao-Ying; Zhang, Shao-Wu

    2018-01-01

    During the development of new drugs, identification of drug-target interactions (DTIs) is a primary concern. However, chemical and biological experiments are limited in coverage and carry a huge cost in both time and money. Based on drug similarity and target similarity, chemogenomic methods can predict potential DTIs on a large scale without requiring target structures or ligand entries. However, existing similarity metrics do not fully reflect the cases in which drugs with variant structures interact with common targets, or targets with dissimilar sequences interact with the same drugs; moreover, although several similarity metrics have been developed to predict DTIs, naïve combinations of multiple (especially heterogeneous) similarity metrics do not sufficiently exploit them. In this paper, based on Gene Ontology and pathway annotation, we introduce two novel target similarity metrics to address these issues. More importantly, we propose a more effective strategy, based on decision templates, to integrate multiple classifiers designed with multiple similarity metrics. In the scenarios of predicting existing targets for new drugs and predicting approved drugs for new protein targets, results on the DTI benchmark datasets show that our target similarity metrics enhance predictive accuracy in both scenarios, and that the decision-template fusion of multiple classifiers has better predictive power than a naïve combination of multiple similarity metrics. Compared with two other state-of-the-art approaches on four popular benchmark datasets of binary drug-target interactions, our method achieves the best results in terms of AUC and AUPR for predicting available targets for new drugs (S2) and approved drugs for new protein targets (S3). These results demonstrate that our method can effectively predict drug-target interactions. The software package is freely available at https://github.com/NwpuSY/DT_all.git for academic users. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
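
    A minimal sketch of decision-template fusion in the style used above (Kuncheva-style: average each class's decision profiles on training data, then label a new sample by its nearest template; the toy supports below are synthetic, not the paper's classifiers or similarity metrics):

    import numpy as np

    def fit_templates(profiles, y, n_classes):
        """profiles: (N, L, C) supports from L classifiers over C classes."""
        return np.stack([profiles[y == c].mean(axis=0) for c in range(n_classes)])

    def predict(templates, profile):
        """Nearest template by Euclidean distance over the (L, C) profile."""
        dists = np.linalg.norm(templates - profile, axis=(1, 2))
        return int(np.argmin(dists))

    rng = np.random.default_rng(3)
    y = rng.integers(0, 2, size=100)        # 1 = interacting drug-target pair
    profiles = rng.random((100, 3, 2))      # 3 base classifiers, 2 classes
    profiles[y == 1, :, 1] += 0.5           # boost class-1 support for positives
    templates = fit_templates(profiles, y, n_classes=2)
    print(predict(templates, profiles[0]), int(y[0]))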

  19. An Integrated Development Environment for Adiabatic Quantum Programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Humble, Travis S; McCaskey, Alex; Bennink, Ryan S

    2014-01-01

    Adiabatic quantum computing is a promising route to the computational power afforded by quantum information processing. The recent availability of adiabatic hardware raises the question of how well quantum programs perform. Benchmarking behavior is challenging since the multiple steps to synthesize an adiabatic quantum program are highly tunable. We present an adiabatic quantum programming environment called JADE that provides control over all the steps taken during program development. JADE captures the workflow needed to rigorously benchmark performance while also allowing a variety of problem types, programming techniques, and processor configurations. We have also integrated JADE with a quantum simulation engine that enables program profiling using numerical calculation. The computational engine supports plug-ins for simulation methodologies tailored to various metrics and computing resources. We present the design, integration, and deployment of JADE and discuss its use for benchmarking adiabatic quantum programs.

  20. Quality Metrics in Neonatal and Pediatric Critical Care Transport: A National Delphi Project.

    PubMed

    Schwartz, Hamilton P; Bigham, Michael T; Schoettker, Pamela J; Meyer, Keith; Trautman, Michael S; Insoft, Robert M

    2015-10-01

    The transport of neonatal and pediatric patients to tertiary care facilities for specialized care demands monitoring the quality of care delivered during transport and its impact on patient outcomes. In 2011, pediatric transport teams in Ohio met to identify quality indicators permitting comparisons among programs. However, no set of national consensus quality metrics exists for benchmarking transport teams. The aim of this project was to achieve national consensus on appropriate neonatal and pediatric transport quality metrics. Modified Delphi technique. The first round of consensus determination was via electronic mail survey, followed by rounds of consensus determination in-person at the American Academy of Pediatrics Section on Transport Medicine's 2012 Quality Metrics Summit. All attendees of the American Academy of Pediatrics Section on Transport Medicine Quality Metrics Summit, conducted on October 21-23, 2012, in New Orleans, LA, were eligible to participate. Candidate quality metrics were identified through literature review and those metrics currently tracked by participating programs. Participants were asked in a series of rounds to identify "very important" quality metrics for transport. It was determined a priori that consensus on a metric's importance was achieved when at least 70% of respondents were in agreement. This is consistent with other Delphi studies. Eighty-two candidate metrics were considered initially. Ultimately, 12 metrics achieved consensus as "very important" to transport. These include metrics related to airway management, team mobilization time, patient and crew injuries, and adverse patient care events. Definitions were assigned to the 12 metrics to facilitate uniform data tracking among programs. The authors succeeded in achieving consensus among a diverse group of national transport experts on 12 core neonatal and pediatric transport quality metrics. We propose that transport teams across the country use these metrics to benchmark and guide their quality improvement activities.

  1. The software product assurance metrics study: JPL's software systems quality and productivity

    NASA Technical Reports Server (NTRS)

    Bush, Marilyn W.

    1989-01-01

    The findings are reported of the Jet Propulsion Laboratory (JPL)/Software Product Assurance (SPA) Metrics Study, conducted as part of a larger JPL effort to improve software quality and productivity. Until recently, no comprehensive data had been assembled on how JPL manages and develops software-intensive systems. The first objective was to collect data on software development from as many projects and for as many years as possible. Results from five projects are discussed. These results reflect 15 years of JPL software development, representing over 100 data points (systems and subsystems), over a third of a billion dollars, over four million lines of code and 28,000 person months. Analysis of this data provides a benchmark for gauging the effectiveness of past, present and future software development work. In addition, the study is meant to encourage projects to record existing metrics data and to gather future data. The SPA long term goal is to integrate the collection of historical data and ongoing project data with future project estimations.

  2. Interlaboratory Study Characterizing a Yeast Performance Standard for Benchmarking LC-MS Platform Performance*

    PubMed Central

    Paulovich, Amanda G.; Billheimer, Dean; Ham, Amy-Joan L.; Vega-Montoto, Lorenzo; Rudnick, Paul A.; Tabb, David L.; Wang, Pei; Blackman, Ronald K.; Bunk, David M.; Cardasis, Helene L.; Clauser, Karl R.; Kinsinger, Christopher R.; Schilling, Birgit; Tegeler, Tony J.; Variyath, Asokan Mulayath; Wang, Mu; Whiteaker, Jeffrey R.; Zimmerman, Lisa J.; Fenyo, David; Carr, Steven A.; Fisher, Susan J.; Gibson, Bradford W.; Mesri, Mehdi; Neubert, Thomas A.; Regnier, Fred E.; Rodriguez, Henry; Spiegelman, Cliff; Stein, Stephen E.; Tempst, Paul; Liebler, Daniel C.

    2010-01-01

    Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix, thereby providing a metric to evaluate and minimize preanalytical and analytical variation in comparative proteomics experiments. PMID:19858499

  3. Improved product energy intensity benchmarking metrics for thermally concentrated food products.

    PubMed

    Walker, Michael E; Arnold, Craig S; Lettieri, David J; Hutchins, Margot J; Masanet, Eric

    2014-10-21

    Product energy intensity (PEI) metrics allow industry and policymakers to quantify manufacturing energy requirements on a product-output basis. However, complexities can arise for benchmarking of thermally concentrated products, particularly in the food processing industry, due to differences in outlet composition, feed material composition, and processing technology. This study analyzes tomato paste as a typical, high-volume concentrated product using a thermodynamics-based model. Results show that PEI for tomato pastes and purees varies from 1200 to 9700 kJ/kg over the range of 8%-40% outlet solids concentration for a 3-effect evaporator, and 980-7000 kJ/kg for a 5-effect evaporator. Further, the PEI for producing paste at 31% outlet solids concentration in a 3-effect evaporator varies from 13,000 kJ/kg at 3% feed solids concentration to 5900 kJ/kg at 6%; for a 5-effect evaporator, the variation is from 9200 kJ/kg at 3%, to 4300 kJ/kg at 6%. Methods to compare the PEI of different product concentrations on a standard basis are evaluated. This paper also presents methods to develop PEI benchmark values for multiple plants. These results focus on the case of a tomato paste processing facility, but can be extended to other products and industries that utilize thermal concentration.
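    The dependence of PEI on feed solids, outlet solids, and the number of evaporator effects follows from a solids mass balance; a simplified Python sketch is given below. The latent heat value and the idealized assumption of n kg of water evaporated per kg of steam in an n-effect evaporator are illustrative, and the sketch will not reproduce the paper's plant-level figures.

        # Simplified PEI sketch for thermal concentration: a solids mass balance
        # gives the water evaporated per kg of product, and the steam economy of
        # an n-effect evaporator is idealized as n kg water per kg steam.
        LATENT_HEAT = 2260.0  # kJ per kg of water evaporated (approx., near 100 C)

        def pei_kj_per_kg(feed_solids: float, outlet_solids: float, n_effects: int) -> float:
            feed_mass = outlet_solids / feed_solids  # kg feed per kg product
            water_evaporated = feed_mass - 1.0       # kg water removed per kg product
            return water_evaporated * LATENT_HEAT / n_effects

        # Paste at 31% solids from a 3% feed in a 3-effect evaporator:
        print(round(pei_kj_per_kg(0.03, 0.31, 3)))   # prints 7031 (kJ/kg) under these idealizations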

  4. Best practices from WisDOT mega and ARRA projects : statistical analysis and % time vs. % cost metrics.

    DOT National Transportation Integrated Search

    2012-03-01

    This study was undertaken to: 1) apply a benchmarking process to identify best practices within four areas of Wisconsin Department of Transportation (WisDOT) construction management and 2) analyze two performance metrics, % Cost vs. % Time, tracked by t...

  5. Toward a standard for the evaluation of PET-Auto-Segmentation methods following the recommendations of AAPM task group No. 211: Requirements and implementation.

    PubMed

    Berthon, Beatrice; Spezi, Emiliano; Galavis, Paulina; Shepherd, Tony; Apte, Aditya; Hatt, Mathieu; Fayad, Hadi; De Bernardi, Elisabetta; Soffientini, Chiara D; Ross Schmidtlein, C; El Naqa, Issam; Jeraj, Robert; Lu, Wei; Das, Shiva; Zaidi, Habib; Mawlawi, Osama R; Visvikis, Dimitris; Lee, John A; Kirov, Assen S

    2017-08-01

    The aim of this paper is to define the requirements and describe the design and implementation of a standard benchmark tool for evaluation and validation of PET-auto-segmentation (PET-AS) algorithms. This work follows the recommendations of Task Group 211 (TG211) appointed by the American Association of Physicists in Medicine (AAPM). The recommendations published in the AAPM TG211 report were used to derive a set of required features and to guide the design and structure of a benchmarking software tool. These items included the selection of appropriate representative data and reference contours obtained from established approaches and the description of available metrics. The benchmark was designed to be extendable through the inclusion of bespoke segmentation methods, while maintaining its main purpose of being a standard testing platform for newly developed PET-AS methods. An example implementation of the proposed framework, named PETASset, was built. In this work, a selection of PET-AS methods representing common approaches to PET image segmentation was evaluated within PETASset for the purpose of testing and demonstrating the capabilities of the software as a benchmark platform. A selection of clinical, physical, and simulated phantom data, including "best estimates" reference contours from macroscopic specimens, simulation templates, and CT scans, was built into the PETASset application database. Specific metrics such as the Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity (S) were included to allow the user to compare the results of any given PET-AS algorithm to the reference contours. In addition, a tool to generate structured reports on the evaluation of the performance of PET-AS algorithms against the reference contours was built. Across the PET-AS methods evaluated for demonstration, agreement with the reference contours ranged between 0.51 and 0.83, 0.44 and 0.86, and 0.61 and 1.00 for the DSC, PPV, and S metrics, respectively. Examples of agreement limits were provided to show how the software could be used to evaluate a new algorithm against the existing state of the art. PETASset provides a platform for standardizing the evaluation and comparison of different PET-AS methods on a wide range of PET datasets. The developed platform will be available to users wishing to evaluate their PET-AS methods and to contribute additional evaluation datasets. © 2017 The Authors. Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
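    The three overlap metrics named above are standard; a minimal Python sketch computing DSC, PPV, and S for a pair of binary masks (illustrative arrays, not PETASset code) is:

        import numpy as np

        def overlap_metrics(seg: np.ndarray, ref: np.ndarray) -> dict:
            seg, ref = seg.astype(bool), ref.astype(bool)
            tp = np.logical_and(seg, ref).sum()      # voxels in both contours
            fp = np.logical_and(seg, ~ref).sum()     # segmented but not in reference
            fn = np.logical_and(~seg, ref).sum()     # reference voxels missed
            return {
                "DSC": 2 * tp / (2 * tp + fp + fn),  # Dice Similarity Coefficient
                "PPV": tp / (tp + fp),               # Positive Predictive Value
                "S": tp / (tp + fn),                 # Sensitivity
            }

        seg = np.zeros((10, 10), dtype=bool); seg[2:7, 2:7] = True
        ref = np.zeros((10, 10), dtype=bool); ref[3:8, 3:8] = True
        print(overlap_metrics(seg, ref))  # all three equal 0.64 for these masks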

  6. A Simple Graphical Method for Quantification of Disaster Management Surge Capacity Using Computer Simulation and Process-control Tools.

    PubMed

    Franc, Jeffrey Michael; Ingrassia, Pier Luigi; Verde, Manuela; Colombo, Davide; Della Corte, Francesco

    2015-02-01

    Surge capacity, or the ability to manage an extraordinary volume of patients, is fundamental for hospital management of mass-casualty incidents. However, quantification of surge capacity is difficult and no universal standard for its measurement has emerged, nor has a standardized statistical method been advocated. As mass-casualty incidents are rare, simulation may represent a viable alternative to measure surge capacity. Hypothesis/Problem: The objective of the current study was to develop a statistical method for the quantification of surge capacity using a combination of computer simulation and simple process-control statistical tools. Length-of-stay (LOS) and patient volume (PV) were used as metrics. The use of this method was then demonstrated on a subsequent computer simulation of an emergency department (ED) response to a mass-casualty incident. In the derivation phase, 357 participants in five countries performed 62 computer simulations of an ED response to a mass-casualty incident. Benchmarks for ED response were derived from these simulations, including LOS and PV metrics for triage, bed assignment, physician assessment, and disposition. In the application phase, 13 students of the European Master in Disaster Medicine (EMDM) program completed the same simulation scenario, and the results were compared to the standards obtained in the derivation phase. Patient-volume metrics included number of patients to be triaged, assigned to rooms, assessed by a physician, and disposed. Length-of-stay metrics included median time to triage, room assignment, physician assessment, and disposition. Simple graphical methods were used to compare the application phase group to the derived benchmarks using process-control statistical tools. The group in the application phase failed to meet the indicated standard for LOS from admission to disposition decision. This study demonstrates how simulation software can be used to derive values for objective benchmarks of ED surge capacity using PV and LOS metrics. These objective metrics can then be applied to other simulation groups using simple graphical process-control tools to provide a numeric measure of surge capacity. Repeated use in simulations of actual EDs may represent a potential means of objectively quantifying disaster management surge capacity. It is hoped that the described statistical method, which is simple and reusable, will be useful for investigators in this field to apply to their own research.

  7. Evaluation of four seagrass species as early warning indicators for nitrogen overloading: Implications for eutrophic evaluation and ecosystem management.

    PubMed

    Yang, Xiaolong; Zhang, Peidong; Li, Wentao; Hu, Chengye; Zhang, Xiumei; He, Pingguo

    2018-04-23

    Seagrasses are major coastal primary producers and are widely distributed on coasts worldwide. Seagrasses show sensitivity to environmental stress due to their high phenotypic plasticity, and therefore, we evaluated the use of constituent elements in four dominant seagrass species as early warning indicators for nitrogen eutrophication of coastal regions. A meta-analysis was conducted with published data to develop a global benchmark for the selected indicator, which was used to evaluate nitrogen loading at a global scale. A case study at three bays was subsequently conducted to test for local-scale differences in leaf C/N ratios in four seagrasses. Additionally, morphological and physiological metrics of seagrasses were measured from the three locations under varied nitrogen levels to develop further assessment indexes. The benchmark and local study showed that leaf C/N ratios of Zostera marina were sensitive to nitrogen discharge, which could be a highly valuable early warning indicator on a global scale. Moreover, the threshold value of seagrass leaf C/N was determined according to the benchmark to differentiate eutrophic and low nitrogen levels at a local scale. Of the eight phenotypic metrics measured, leaf width, total chlorophyll (a + b), chlorophyll ratio (a/b), and starch in the rhizome were the most effective at discriminating between the three locations and could also be promising indicators for monitoring eutrophication. Copyright © 2018. Published by Elsevier B.V.

  8. Benchmarking homogenization algorithms for monthly data

    NASA Astrophysics Data System (ADS)

    Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M. J.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratianni, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.; Willett, K.

    2013-09-01

    The COST (European Cooperation in Science and Technology) Action ES0601: Advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies. The algorithms were validated against a realistic benchmark dataset. Participants provided 25 separate homogenized contributions as part of the blind study as well as 22 additional solutions submitted after the details of the imposed inhomogeneities were revealed. These homogenized datasets were assessed by a number of performance metrics including i) the centered root mean square error relative to the true homogeneous values at various averaging scales, ii) the error in linear trend estimates and iii) traditional contingency skill scores. The metrics were computed both using the individual station series as well as the network average regional series. The performance of the contributions depends significantly on the error metric considered. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data. Moreover, state-of-the-art relative homogenization algorithms developed to work with an inhomogeneous reference are shown to perform best. The study showed that automatic algorithms can now perform as well as manual ones.
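    Of the listed metrics, the centered root mean square error is the most direct to state. The Python sketch below shows one common formulation (anomalies are compared, so a constant offset does not count as error), on illustrative data rather than the HOME benchmark itself; the exact HOME implementation may differ in detail.

        import numpy as np

        def centered_rmse(homogenized: np.ndarray, truth: np.ndarray) -> float:
            # Remove each series' mean so only departures from the truth's shape count.
            a = homogenized - homogenized.mean()
            b = truth - truth.mean()
            return float(np.sqrt(np.mean((a - b) ** 2)))

        rng = np.random.default_rng(0)
        truth = rng.normal(10.0, 1.0, 600)                    # 50 years of monthly values
        homogenized = truth + 0.3 + rng.normal(0, 0.2, 600)   # constant offset + residual noise
        print(centered_rmse(homogenized, truth))              # offset removed; ~0.2 remains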

  9. Establishing benchmarks and metrics for disruptive technologies, inappropriate and obsolete tests in the clinical laboratory.

    PubMed

    Kiechle, Frederick L; Arcenas, Rodney C; Rogers, Linda C

    2014-01-01

    Benchmarks and metrics related to laboratory test utilization are based on evidence-based medical literature that may suffer from a positive publication bias. Guidelines are only as good as the data reviewed to create them. Disruptive technologies require time for appropriate use to be established before utilization review will be meaningful. Metrics include monitoring the use of obsolete tests and the inappropriate use of lab tests. Test utilization by clients in a hospital outreach program can be used to monitor the impact of new clients on lab workload. A multi-disciplinary laboratory utilization committee is the most effective tool for modifying bad habits, reviewing and approving new tests for the lab formulary, or sending them out to a reference lab. Copyright © 2013 Elsevier B.V. All rights reserved.

  10. Benchmarking Gas Path Diagnostic Methods: A Public Approach

    NASA Technical Reports Server (NTRS)

    Simon, Donald L.; Bird, Jeff; Davison, Craig; Volponi, Al; Iverson, R. Eugene

    2008-01-01

    Recent technology reviews have identified the need for objective assessments of engine health management (EHM) technology. The need is two-fold: technology developers require relevant data and problems to design and validate new algorithms and techniques while engine system integrators and operators need practical tools to direct development and then evaluate the effectiveness of proposed solutions. This paper presents a publicly available gas path diagnostic benchmark problem that has been developed by the Propulsion and Power Systems Panel of The Technical Cooperation Program (TTCP) to help address these needs. The problem is coded in MATLAB (The MathWorks, Inc.) and coupled with a non-linear turbofan engine simulation to produce "snap-shot" measurements, with relevant noise levels, as if collected from a fleet of engines over their lifetime of use. Each engine within the fleet will experience unique operating and deterioration profiles, and may encounter randomly occurring relevant gas path faults including sensor, actuator and component faults. The challenge to the EHM community is to develop gas path diagnostic algorithms to reliably perform fault detection and isolation. An example solution to the benchmark problem is provided along with associated evaluation metrics. A plan is presented to disseminate this benchmark problem to the engine health management technical community and invite technology solutions.
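    Scoring a solution to such a benchmark typically reduces to comparing diagnosed conditions against the known injected faults. The Python sketch below is a hedged illustration of detection, false-alarm, and isolation rates of that general kind; it is not the evaluation metric set defined in the benchmark problem itself.

        def diagnostic_scores(true_labels, predicted_labels, no_fault="none"):
            faulty = [(t, p) for t, p in zip(true_labels, predicted_labels) if t != no_fault]
            healthy = [(t, p) for t, p in zip(true_labels, predicted_labels) if t == no_fault]
            detected = sum(1 for t, p in faulty if p != no_fault)       # any fault flagged
            false_alarms = sum(1 for t, p in healthy if p != no_fault)  # healthy flagged faulty
            isolated = sum(1 for t, p in faulty if p == t)              # correct fault named
            return {
                "detection_rate": detected / len(faulty),
                "false_alarm_rate": false_alarms / len(healthy),
                "isolation_rate": isolated / detected if detected else 0.0,
            }

        truth = ["none", "sensor", "actuator", "none", "component"]
        preds = ["none", "sensor", "none", "sensor", "component"]
        print(diagnostic_scores(truth, preds))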

  11. Evaluating Soil Health Using Remotely Sensed Evapotranspiration on the Benchmark Barnes Soils of North Dakota

    NASA Astrophysics Data System (ADS)

    Bohn, Meyer; Hopkins, David; Steele, Dean; Tuscherer, Sheldon

    2017-04-01

    The benchmark Barnes soil series is an extensive upland Hapludoll of the northern Great Plains that is both economically and ecologically vital to the region. Effects of tillage erosion coupled with wind and water erosion have degraded Barnes soil quality, but with unknown extent, distribution, or severity. Evidence of soil degradation documented over a half century warrants testing the assumption of productivity. Soil resilience is linked to several dynamic soil properties, and National Cooperative Soil Survey initiatives are now focused on identifying those properties for benchmark soils. Quantification of soil degradation depends on a reliable method for broad-scale evaluation. The soil survey community is currently developing rapid and widespread soil property assessment technologies. Improvements in satellite-based remote sensing and image analysis software have stimulated the application of broad-scale resource assessment. Furthermore, these technologies have fostered refinement of land-based surface energy balance algorithms, i.e., the Mapping Evapotranspiration at High Resolution with Internalized Calibration (METRIC) algorithm for evapotranspiration (ET) mapping. The hypothesis of this study is that ET mapping technology can differentiate soil function on extensive landscapes and identify degraded areas. A recent soil change study in eastern North Dakota resampled legacy Barnes pedons sampled prior to 1960 and found significant decreases in organic carbon. An ancillary study showed that ET estimates from METRIC decreased with Barnes erosion class severity. An ET raster map has been developed for three eastern North Dakota counties using METRIC and Landsat 5 imagery. ET pixel candidates on major Barnes soil map units were stratified into tertiles and classified as ranked ET subdivisions. A sampling population of randomly selected points stratified by ET class and county proportion was established. Morphologic and chemical data will be recorded at each sampling site to test whether soil properties correlate to ET, thus serving as a non-biased proxy for soil health.

  12. The future of simulation technologies for complex cardiovascular procedures.

    PubMed

    Cates, Christopher U; Gallagher, Anthony G

    2012-09-01

    Changing work practices and the evolution of more complex interventions in cardiovascular medicine are forcing a paradigm shift in the way doctors are trained. Implantable cardioverter defibrillator (ICD), transcatheter aortic valve implantation (TAVI), carotid artery stenting (CAS), and acute stroke intervention procedures are forcing these changes at a faster pace than in other disciplines. As a consequence, cardiovascular medicine has had to develop a sophisticated understanding of precisely what is meant by 'training' and 'skill'. An evolving conclusion is that procedure training on a virtual reality (VR) simulator presents a viable current solution. These simulations should characterize the important performance characteristics of procedural skill, with metrics derived from, and benchmarked to, experienced operators (i.e., a level of proficiency). Simulation training is optimal with metric-based feedback, particularly formative trainee error assessments, proximate to their performance. In prospective, randomized studies, learners who trained to a benchmarked proficiency level on the simulator performed significantly better than learners who were traditionally trained. In addition, cardiovascular medicine now has available the most sophisticated virtual reality simulators in medicine, and these have been used for the roll-out of interventions such as CAS in the USA and globally, with cardiovascular society and industry partnered training programmes. The Food and Drug Administration has advocated the use of VR simulation as part of the approval of new devices, and the American Board of Internal Medicine has adopted simulation as part of its maintenance of certification. Simulation is rapidly becoming a mainstay of cardiovascular education, training, certification, and the safe adoption of new technology. If cardiovascular medicine is to continue to lead in the adoption and integration of simulation, it must take a proactive position in developing a metric-based simulation curriculum, adopting proficiency benchmark definitions, and committing the resources needed to continue to lead this revolution in physician training.

  13. Aluminum-Mediated Formation of Cyclic Carbonates: Benchmarking Catalytic Performance Metrics.

    PubMed

    Rintjema, Jeroen; Kleij, Arjan W

    2017-03-22

    We report a comparative study on the activity of a series of fifteen binary catalysts derived from various reported aluminum-based complexes. A benchmarking of their initial rates in the coupling of various terminal and internal epoxides in the presence of three different nucleophilic additives was carried out, providing for the first time a useful comparison of activity metrics in the area of cyclic organic carbonate formation. These investigations provide a useful framework for realistically valorizing relative reactivities and for identifying which features are important when considering the ideal operational window of each binary catalyst system. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Healthcare Energy Efficiency Research and Development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Black, Douglas R.; Lai, Judy; Lanzisera, Steven M

    2011-01-31

    Hospitals are known to be among the most energy intensive commercial buildings in California. Estimates of energy end-uses (e.g., for heating, cooling, lighting) in hospitals are uncertain for lack of information about hospital-specific mechanical system operations and process loads. Lawrence Berkeley National Laboratory developed and demonstrated a benchmarking system designed specifically for hospitals. Version 1.0 featured metrics to assess energy performance for the broad variety of ventilation and thermal systems present in California hospitals; it required moderate to extensive sub-metering or supplemental monitoring. In this new project, we developed a companion handbook with detailed equations that can be used to convert data from energy and other sensors, whether added to or already part of hospital heating, ventilation, and cooling systems, into the metrics described in the benchmarking document. This report additionally includes a case study and guidance on incorporating metering into designs for new hospitals, renovations, and retrofits. Despite widespread concern that this end-use is large and growing, there is limited reliable information about energy use by distributed medical equipment and other miscellaneous electrical loads in hospitals. This report proposes a framework for quantifying the aggregate energy use of medical equipment and miscellaneous loads. Novel approaches are suggested and tried in an attempt to obtain data to support this framework.

  15. Short-Term Field Study Programs: A Holistic and Experiential Approach to Learning

    ERIC Educational Resources Information Center

    Long, Mary M.; Sandler, Dennis M.; Topol, Martin T.

    2017-01-01

    For business schools, AACSB and Middle States' call for more experiential learning is one reason to provide study abroad programs. Universities must attend to the demand for continuous improvement and employ metrics to benchmark and evaluate their relative standing among peer institutions. One such benchmark is the National Survey of Student…

  16. Taking Aims: New CASE Study Benchmarks Advancement Investments and Returns

    ERIC Educational Resources Information Center

    Goldsmith, Rae

    2012-01-01

    Advancement professionals have always been thirsty for information that will help them understand how their programs compare with those of their peers. But in recent years the demand for benchmarking data has exploded as budgets have become leaner, leaders have become more business minded, and terms like "performance metrics and return on…

  17. Can Human Capital Metrics Effectively Benchmark Higher Education with For-Profit Companies?

    ERIC Educational Resources Information Center

    Hagedorn, Kathy; Forlaw, Blair

    2007-01-01

    Last fall, Saint Louis University participated in St. Louis, Missouri's, first Human Capital Performance Study alongside several of the region's largest for-profit employers. The university also participated this year in the benchmarking of employee engagement factors conducted by the St. Louis Business Journal in its effort to quantify and select…

  18. First benchmark of the Unstructured Grid Adaptation Working Group

    NASA Technical Reports Server (NTRS)

    Ibanez, Daniel; Barral, Nicolas; Krakos, Joshua; Loseille, Adrien; Michal, Todd; Park, Mike

    2017-01-01

    Unstructured grid adaptation is a technology that holds the potential to improve the automation and accuracy of computational fluid dynamics and other computational disciplines. Difficulty producing the highly anisotropic elements necessary for simulation on complex curved geometries that satisfy a resolution request has limited this technology's widespread adoption. The Unstructured Grid Adaptation Working Group is an open gathering of researchers working on adapting simplicial meshes to conform to a metric field. Current members span a wide range of institutions including academia, industry, and national laboratories. The purpose of this group is to create a common basis for understanding and improving mesh adaptation. We present our first major contribution: a common set of benchmark cases, including input meshes and analytic metric specifications, that are publicly available to be used for evaluating any mesh adaptation code. We also present the results of several existing codes on these benchmark cases, to illustrate their utility in identifying key challenges common to all codes and important differences between available codes. Future directions are defined to expand this benchmark to mature the technology necessary to impact practical simulation workflows.

  19. A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions

    NASA Astrophysics Data System (ADS)

    Gide, Milind S.; Karam, Lina J.

    2016-08-01

    With the increased focus on visual attention (VA) over the last decade, a large number of computational visual saliency methods have been developed. These models are traditionally evaluated by using performance evaluation metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though a considerable number of such metrics have been proposed in the literature, they have notable shortcomings. In this work, we discuss shortcomings in existing metrics through illustrative examples and propose a new metric that uses local weights based on fixation density, which overcomes these flaws. To compare the performance of our proposed metric at assessing the quality of saliency prediction with other existing metrics, we construct a ground-truth subjective database in which saliency maps obtained from 17 different VA models are evaluated by 16 human observers on a 5-point categorical scale in terms of their visual resemblance with corresponding ground-truth fixation density maps obtained from eye-tracking data. The metrics are evaluated by correlating metric scores with the human subjective ratings. The correlation results show that the proposed evaluation metric outperforms all other popular existing metrics. Additionally, the constructed database and corresponding subjective ratings provide insight into which existing and future metrics are better at estimating the quality of saliency prediction and can be used as a benchmark.
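    The evaluation strategy described, correlating metric scores with mean human ratings, reduces to a rank correlation. A minimal Python sketch with placeholder scores and ratings (not the paper's data):

        from scipy.stats import spearmanr

        metric_scores = [0.62, 0.48, 0.91, 0.33, 0.75]   # one score per saliency map
        human_ratings = [3.4, 2.9, 4.6, 1.8, 3.9]        # mean 5-point subjective ratings

        rho, pval = spearmanr(metric_scores, human_ratings)
        print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")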

  20. Measuring the gap: quantifying and comparing local health inequalities.

    PubMed

    Low, Anne; Low, Allan

    2004-12-01

    Primary Care Trusts (PCTs) and Local Strategic Partnerships (LSPs) are being asked to assess local health inequalities in order to prioritize local action, to set local targets for reducing levels of health inequality, and to demonstrate measurable progress. Despite this, little guidance has been provided on how to quantify health inequalities within PCTs and LSPs. This paper advocates the use of a metric, the slope index of inequality, which provides a consistent measure of health inequalities across local populations. The metric can be presented as a relative gap, which is easily understood and enables levels of inequality to be compared between health conditions, lifestyles and rates of service provision at any one time, or across different time periods. The metric is applied to Sunderland Teaching PCT, using routine data sources. Examples of the results and their uses are presented. It is suggested that more widespread use of the metric could enable levels of health inequalities to be compared across PCTs and lead to the development of local health inequality and inequity benchmarks.
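    A common formulation of the slope index of inequality regresses the health outcome on the population-weighted midpoint of each area's deprivation rank; the slope is the modelled absolute gap across the whole deprivation range. The Python sketch below uses illustrative figures, not Sunderland data, and may differ in detail from the paper's implementation.

        import numpy as np

        population = np.array([12_000, 18_000, 15_000, 20_000, 10_000])  # least to most deprived
        rate = np.array([80.0, 95.0, 110.0, 130.0, 150.0])               # e.g. deaths per 100,000

        share = population / population.sum()
        midpoint = np.cumsum(share) - share / 2      # midpoint of each group's cumulative range

        # Weighted least squares for intercept and slope.
        X = np.column_stack([np.ones_like(midpoint), midpoint])
        W = np.diag(share)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ rate)
        print(f"SII = {beta[1]:.1f} per 100,000")    # modelled gap, least vs most deprived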

  1. Benchmarking in pathology: development of a benchmarking complexity unit and associated key performance indicators.

    PubMed

    Neil, Amanda; Pfeffer, Sally; Burnett, Leslie

    2013-01-01

    This paper details the development of a new type of pathology laboratory productivity unit, the benchmarking complexity unit (BCU). The BCU provides a comparative index of laboratory efficiency, regardless of test mix. It also enables estimation of how much complex pathology a laboratory performs, and the identification of peer organisations for the purposes of comparison and benchmarking. The BCU is based on the theory that wage rates reflect productivity at the margin. A weighting factor for the ratio of medical to technical staff time was dynamically calculated based on actual participant site data. Given this weighting, a complexity value for each test, at each site, was calculated. The median complexity value (number of BCUs) for each test across all participating sites was taken as its complexity value for the Benchmarking in Pathology Program. The BCU allowed implementation of an unbiased comparison unit and test listing that was found to be a robust indicator of the relative complexity of each test. Employing the BCU data, a number of Key Performance Indicators (KPIs) were developed, including three that address comparative organisational complexity, analytical depth and performance efficiency, respectively. Peer groups were also established using the BCU combined with simple organisational and environmental metrics. The BCU has enabled productivity statistics to be compared between organisations. The BCU corrects for differences in test mix and workload complexity of different organisations and also allows for objective stratification into peer groups.
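    The construction can be sketched in a few lines of Python; the weighting formula and figures below are assumptions for illustration, not the program's actual calculation.

        import statistics

        # Assume each site reports technical and medical staff minutes per test, and a
        # wage-derived weight w converts medical minutes into technical-minute equivalents.
        w = 3.2  # illustrative medical:technical wage ratio, derived from site data

        def complexity(technical_min: float, medical_min: float) -> float:
            return technical_min + w * medical_min

        # Per-site complexity values for one test; the median across sites becomes
        # the test's BCU value.
        site_values = [complexity(20, 4), complexity(25, 3), complexity(18, 6)]
        print(statistics.median(site_values))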

  2. Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

    PubMed Central

    Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.

    2018-01-01

    In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performance, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888

  3. Multiscale benchmarking of drug delivery vectors.

    PubMed

    Summers, Huw D; Ware, Matthew J; Majithia, Ravish; Meissner, Kenith E; Godin, Biana; Rees, Paul

    2016-10-01

    Cross-system comparisons of drug delivery vectors are essential to ensure optimal design. An in-vitro experimental protocol is presented that separates the role of the delivery vector from that of its cargo in determining the cell response, thus allowing quantitative comparison of different systems. The technique is validated through benchmarking of the dose-response of human fibroblast cells exposed to the cationic molecule polyethylene imine (PEI), delivered as a free molecule and as cargo on the surface of CdSe nanoparticles and silica microparticles. The exposure metrics are converted to a delivered dose, with the transport properties of the different scale systems characterized by a delivery time, τ. The benchmarking highlights an agglomeration of the free PEI molecules into micron-sized clusters and identifies the metric determining cell death as the total number of PEI molecules presented to cells, determined by the delivery vector dose and the surface density of the cargo. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. Benchmarking Using Basic DBMS Operations

    NASA Astrophysics Data System (ADS)

    Crolotte, Alain; Ghazal, Ahmad

    The TPC-H benchmark proved to be successful in the decision support area. Many commercial database vendors and their related hardware vendors used this benchmark to show the superiority and competitive edge of their products. However, over time, TPC-H became less representative of industry trends as vendors kept tuning their databases to this benchmark-specific workload. In this paper, we present XMarq, a simple benchmark framework that can be used to compare various software/hardware combinations. Our benchmark model is currently composed of 25 queries that measure the performance of basic operations such as scans, aggregations, joins and index access. This benchmark model is based on the TPC-H data model due to its maturity and well-understood data generation capability. We also propose metrics to evaluate single-system performance and compare two systems. Finally, we illustrate the effectiveness of this model by showing experimental results comparing two systems under different conditions.

  5. Using Publication Metrics to Highlight Academic Productivity and Research Impact

    PubMed Central

    Carpenter, Christopher R.; Cone, David C.; Sarli, Cathy C.

    2016-01-01

    This article provides a broad overview of widely available measures of academic productivity and impact using publication data and highlights uses of these metrics for various purposes. Metrics based on publication data include measures such as number of publications, number of citations, the journal impact factor score, and the h-index, as well as emerging document-level metrics. Publication metrics can be used for a variety of purposes, including tenure and promotion, grant applications and renewal reports, benchmarking, recruiting efforts, and administrative purposes for departmental or university performance reports. The authors also highlight practical applications of measuring and reporting academic productivity and impact to emphasize and promote individual investigators, grant applications, or department output. PMID:25308141
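    Of the metrics named, the h-index is the easiest to make concrete: the largest h such that the author has h papers with at least h citations each. A minimal Python sketch:

        def h_index(citations: list[int]) -> int:
            # Rank papers by citations, descending; count ranks where citations >= rank.
            ranked = sorted(citations, reverse=True)
            return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)

        print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with >= 4 citations each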

  6. Simulation-based comprehensive benchmarking of RNA-seq aligners

    PubMed Central

    Baruzzo, Giacomo; Hayer, Katharina E; Kim, Eun Ji; Di Camillo, Barbara; FitzGerald, Garret A; Grant, Gregory R

    2018-01-01

    Alignment is the first step in most RNA-seq analysis pipelines, and the accuracy of downstream analyses depends heavily on it. Unlike most steps in the pipeline, alignment is particularly amenable to benchmarking with simulated data. We performed a comprehensive benchmarking of 14 common splice-aware aligners for base, read, and exon junction-level accuracy and compared default with optimized parameters. We found that performance varied by genome complexity, and accuracy and popularity were poorly correlated. The most widely cited tool underperforms for most metrics, particularly when using default settings. PMID:27941783
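    Simulation-based benchmarking works because each read's true origin is known. The sketch below is a hedged simplification of the paper's base-, read-, and junction-level measures, reducing read-level accuracy to precision and recall against simulated truth; the data structures are illustrative.

        def read_level_accuracy(truth: dict, aligned: dict) -> dict:
            # truth and aligned map read IDs to (chromosome, position) tuples.
            correct = sum(1 for r, pos in aligned.items() if truth.get(r) == pos)
            return {
                "precision": correct / len(aligned),  # of reported alignments, how many correct
                "recall": correct / len(truth),       # of simulated reads, how many recovered
            }

        truth = {"r1": ("chr1", 100), "r2": ("chr1", 500), "r3": ("chr2", 42)}
        aligned = {"r1": ("chr1", 100), "r2": ("chr1", 512)}
        print(read_level_accuracy(truth, aligned))  # precision 0.5, recall ~0.33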

  7. SP2Bench: A SPARQL Performance Benchmark

    NASA Astrophysics Data System (ADS)

    Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg

    A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is settled in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror vital key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.

  8. A Question of Accountability: Looking beyond Federal Mandates for Metrics That Accurately Benchmark Community College Success

    ERIC Educational Resources Information Center

    Joch, Alan

    2014-01-01

    The need for increased accountability in higher education and, specifically, the nation's community colleges-is something most educators can agree on. The challenge has, and continues to be, finding a system of metrics that meets the unique needs of two-year institutions versus their four-year-counterparts. Last summer, President Obama unveiled…

  9. Benchmarking a geostatistical procedure for the homogenisation of annual precipitation series

    NASA Astrophysics Data System (ADS)

    Caineta, Júlio; Ribeiro, Sara; Henriques, Roberto; Soares, Amílcar; Costa, Ana Cristina

    2014-05-01

    The European project COST Action ES0601, Advances in homogenisation methods of climate series: an integrated approach (HOME), has brought to attention the importance of establishing reliable homogenisation methods for climate data. In order to achieve that, a benchmark data set, containing monthly and daily temperature and precipitation data, was created to be used as a comparison basis for the effectiveness of those methods. Several contributions were submitted and evaluated by a number of performance metrics, validating the results against realistic inhomogeneous data. HOME also led to the development of new homogenisation software packages, which included feedback and lessons learned during the project. Preliminary studies have suggested a geostatistical stochastic approach, which uses Direct Sequential Simulation (DSS), as a promising methodology for the homogenisation of precipitation data series. Based on the spatial and temporal correlation between the neighbouring stations, DSS calculates local probability density functions at a candidate station to detect inhomogeneities. The purpose of the current study is to test and compare this geostatistical approach with the methods previously presented in the HOME project, using surrogate precipitation series from the HOME benchmark data set. The benchmark data set contains monthly precipitation surrogate series, from which annual precipitation data series were derived. These annual precipitation series were subject to exploratory analysis and to a thorough variography study. The geostatistical approach was then applied to the data set, based on different scenarios for the spatial continuity. Implementing this procedure also promoted the development of a computer program that aims to assist in the homogenisation of climate data, while minimising user interaction. Finally, in order to compare the effectiveness of this methodology with the homogenisation methods submitted during the HOME project, the obtained results were evaluated using the same performance metrics. This comparison opens new perspectives for the development of an innovative procedure based on the geostatistical stochastic approach. Acknowledgements: The authors gratefully acknowledge the financial support of "Fundação para a Ciência e Tecnologia" (FCT), Portugal, through the research project PTDC/GEO-MET/4026/2012 ("GSIMCLI - Geostatistical simulation with local distributions for the homogenization and interpolation of climate data").

  10. Benchmarking in Academic Pharmacy Departments

    PubMed Central

    Chisholm-Burns, Marie; Nappi, Jean; Gubbins, Paul O.; Ross, Leigh Ann

    2010-01-01

    Benchmarking in academic pharmacy, and recommendations for the potential uses of benchmarking in academic pharmacy departments are discussed in this paper. Benchmarking is the process by which practices, procedures, and performance metrics are compared to an established standard or best practice. Many businesses and industries use benchmarking to compare processes and outcomes, and ultimately plan for improvement. Institutions of higher learning have embraced benchmarking practices to facilitate measuring the quality of their educational and research programs. Benchmarking is used internally as well to justify the allocation of institutional resources or to mediate among competing demands for additional program staff or space. Surveying all chairs of academic pharmacy departments to explore benchmarking issues such as department size and composition, as well as faculty teaching, scholarly, and service productivity, could provide valuable information. To date, attempts to gather this data have had limited success. We believe this information is potentially important, urge that efforts to gather it should be continued, and offer suggestions to achieve full participation. PMID:21179251

  11. Benchmarking in academic pharmacy departments.

    PubMed

    Bosso, John A; Chisholm-Burns, Marie; Nappi, Jean; Gubbins, Paul O; Ross, Leigh Ann

    2010-10-11

    Benchmarking in academic pharmacy, and recommendations for the potential uses of benchmarking in academic pharmacy departments are discussed in this paper. Benchmarking is the process by which practices, procedures, and performance metrics are compared to an established standard or best practice. Many businesses and industries use benchmarking to compare processes and outcomes, and ultimately plan for improvement. Institutions of higher learning have embraced benchmarking practices to facilitate measuring the quality of their educational and research programs. Benchmarking is used internally as well to justify the allocation of institutional resources or to mediate among competing demands for additional program staff or space. Surveying all chairs of academic pharmacy departments to explore benchmarking issues such as department size and composition, as well as faculty teaching, scholarly, and service productivity, could provide valuable information. To date, attempts to gather this data have had limited success. We believe this information is potentially important, urge that efforts to gather it should be continued, and offer suggestions to achieve full participation.

  12. Benchmarking homogenization algorithms for monthly data

    NASA Astrophysics Data System (ADS)

    Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M. J.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratianni, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.

    2012-01-01

    The COST (European Cooperation in Science and Technology) Action ES0601: advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies and because they represent two important types of statistics (additive and multiplicative). The algorithms were validated against a realistic benchmark dataset. The benchmark contains real inhomogeneous data as well as simulated data with inserted inhomogeneities. Random independent break-type inhomogeneities with normally distributed breakpoint sizes were added to the simulated datasets. To approximate real world conditions, breaks were introduced that occur simultaneously in multiple station series within a simulated network of station data. The simulated time series also contained outliers, missing data periods and local station trends. Further, a stochastic nonlinear global (network-wide) trend was added. Participants provided 25 separate homogenized contributions as part of the blind study. After the deadline at which details of the imposed inhomogeneities were revealed, 22 additional solutions were submitted. These homogenized datasets were assessed by a number of performance metrics including (i) the centered root mean square error relative to the true homogeneous value at various averaging scales, (ii) the error in linear trend estimates and (iii) traditional contingency skill scores. The metrics were computed both using the individual station series as well as the network average regional series. The performance of the contributions depends significantly on the error metric considered. Contingency scores by themselves are not very informative. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data. Training the users on homogenization software was found to be very important. Moreover, state-of-the-art relative homogenization algorithms developed to work with an inhomogeneous reference are shown to perform best. The study showed that automatic algorithms can perform as well as manual ones.

  13. Benchmarking monthly homogenization algorithms

    NASA Astrophysics Data System (ADS)

    Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratianni, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.

    2011-08-01

    The COST (European Cooperation in Science and Technology) Action ES0601: Advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies and because they represent two important types of statistics (additive and multiplicative). The algorithms were validated against a realistic benchmark dataset. The benchmark contains real inhomogeneous data as well as simulated data with inserted inhomogeneities. Random break-type inhomogeneities were added to the simulated datasets modeled as a Poisson process with normally distributed breakpoint sizes. To approximate real world conditions, breaks were introduced that occur simultaneously in multiple station series within a simulated network of station data. The simulated time series also contained outliers, missing data periods and local station trends. Further, a stochastic nonlinear global (network-wide) trend was added. Participants provided 25 separate homogenized contributions as part of the blind study as well as 22 additional solutions submitted after the details of the imposed inhomogeneities were revealed. These homogenized datasets were assessed by a number of performance metrics including (i) the centered root mean square error relative to the true homogeneous value at various averaging scales, (ii) the error in linear trend estimates and (iii) traditional contingency skill scores. The metrics were computed both using the individual station series as well as the network average regional series. The performance of the contributions depends significantly on the error metric considered. Contingency scores by themselves are not very informative. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data. Training was found to be very important. Moreover, state-of-the-art relative homogenization algorithms developed to work with an inhomogeneous reference are shown to perform best. The study showed that automatic algorithms can now perform as well as manual ones.

  14. OWL2 benchmarking for the evaluation of knowledge based systems.

    PubMed

    Khan, Sher Afgun; Qadir, Muhammad Abdul; Abbas, Muhammad Azeem; Afzal, Muhammad Tanvir

    2017-01-01

    OWL2 semantics are becoming increasingly popular for real-world domain applications such as gene engineering and health management information systems (MIS). The present work identifies a research gap: negligible attention has been paid to the performance evaluation of knowledge base systems (KBS) using OWL2 semantics. To fill this gap, an OWL2 benchmark for the evaluation of KBS is proposed. The proposed benchmark addresses the foundational blocks of an ontology benchmark, i.e., data schema, workload, and performance metrics. The proposed benchmark is tested on memory-based, file-based, relational-database, and graph-based KBS for performance and scalability measures. The results show that the proposed benchmark is able to evaluate the behaviour of different state-of-the-art KBS on OWL2 semantics. On the basis of the results, end users (i.e., domain experts) can select a suitable KBS appropriate for their domain.

  15. A suite of standard post-tagging evaluation metrics can help assess tag retention for field-based fish telemetry research

    USGS Publications Warehouse

    Gerber, Kayla M.; Mather, Martha E.; Smith, Joseph M.

    2017-01-01

    Telemetry can inform many scientific and research questions if a context exists for integrating individual studies into the larger body of literature. Creating cumulative distributions of post-tagging evaluation metrics would allow individual researchers to relate their telemetry data to other studies. Widespread reporting of standard metrics is a precursor to the calculation of benchmarks for these distributions (e.g., mean, SD, 95% CI). Here we illustrate five types of standard post-tagging evaluation metrics using acoustically tagged Blue Catfish (Ictalurus furcatus) released into a Kansas reservoir. These metrics included: (1) percent of tagged fish detected overall, (2) percent of tagged fish detected daily using abacus plot data, (3) average number of (and percent of available) receiver sites visited, (4) date of last movement between receiver sites (and percent of tagged fish moving during that time period), and (5) number (and percent) of fish that egressed through exit gates. These metrics were calculated for one to three time periods: early (<10 d), during (weekly), and at the end of the study (5 months). Over three-quarters of our tagged fish were detected early (85%) and at the end (85%) of the study. Using abacus plot data, all tagged fish (100%) were detected at least one day and 96% were detected for > 5 days early in the study. On average, tagged Blue Catfish visited 9 (50%) and 13 (72%) of 18 within-reservoir receivers early and at the end of the study, respectively. At the end of the study, 73% of all tagged fish were detected moving between receivers. Creating statistical benchmarks for individual metrics can provide useful reference points. In addition, combining multiple metrics can inform ecology and research design. Consequently, individual researchers and the field of telemetry research can benefit from widespread, detailed, and standard reporting of post-tagging detection metrics.
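    Metrics (1) and (3) in this list reduce to simple set operations over detection records; a minimal Python sketch with illustrative data follows.

        # Illustrative detection records: (fish_id, receiver_id) pairs.
        detections = [("f1", "r1"), ("f1", "r2"), ("f2", "r1"), ("f1", "r5")]
        tagged_fish = {"f1", "f2", "f3"}
        n_receivers = 18

        detected = {fish for fish, _ in detections}
        pct_detected = 100 * len(detected) / len(tagged_fish)

        sites_per_fish = {f: {r for fish, r in detections if fish == f} for f in detected}
        avg_sites = sum(len(s) for s in sites_per_fish.values()) / len(sites_per_fish)

        print(f"{pct_detected:.0f}% of tagged fish detected")            # 67%
        print(f"{avg_sites:.1f} sites visited on average "
              f"({100 * avg_sites / n_receivers:.0f}% of available)")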

  16. Benchmarking the ATLAS software through the Kit Validation engine

    NASA Astrophysics Data System (ADS)

    De Salvo, Alessandro; Brasolin, Franco

    2010-04-01

    Measuring experiment software performance is very important in order to choose the most effective resources and to discover bottlenecks in the code implementation. In this work we present the benchmark techniques used to measure the ATLAS software performance through the ATLAS offline testing engine Kit Validation and the online portal Global Kit Validation. The performance measurements, the data collection, and the online analysis and display of the results will be presented. The results of the measurement on different platforms and architectures will be shown, giving a full report on the CPU power and memory consumption of the Monte Carlo generation, simulation, digitization and reconstruction of the most CPU-intensive channels. The impact of multi-core computing on the ATLAS software performance will also be presented, comparing the behavior of different architectures when increasing the number of concurrent processes. The benchmark techniques described in this paper have been used in the HEPiX group since the beginning of 2008 to help define the performance metrics for High Energy Physics applications, based on the real experiment software.

  17. Implementing Data Definition Consistency for Emergency Department Operations Benchmarking and Research.

    PubMed

    Yiadom, Maame Yaa A B; Scheulen, James; McWade, Conor M; Augustine, James J

    2016-07-01

    The objective was to obtain a commitment to adopt a common set of definitions for emergency department (ED) demographic, clinical process, and performance metrics among the ED Benchmarking Alliance (EDBA), ED Operations Study Group (EDOSG), and Academy of Academic Administrators of Emergency Medicine (AAAEM) by 2017. A retrospective cross-sectional analysis of available data from three ED operations benchmarking organizations supported a negotiation to use a set of common metrics with identical definitions. During a 1.5-day meeting, structured according to social change theories of information exchange, self-interest, and interdependence, common definitions were identified and negotiated using the EDBA's published definitions as a starting point for discussion. Methods of process analysis theory were used in the 8 weeks following the meeting to achieve official consensus on definitions. These two lists were submitted to the organizations' leadership for implementation approval. A total of 374 unique measures were identified, of which 57 (15%) were shared by at least two organizations. Fourteen (4%) were common to all three organizations. In addition to agreement on definitions for the 14 measures used by all three organizations, agreement was reached on universal definitions for 17 of the 57 measures shared by at least two organizations. The negotiation outcome was a list of 31 measures with universal definitions to be adopted by each organization by 2017. The use of negotiation, social change, and process analysis theories achieved the adoption of universal definitions among the EDBA, EDOSG, and AAAEM. This will impact performance benchmarking for nearly half of US EDs. It initiates a formal commitment to utilize standardized metrics, and it moves consistent reporting of ED operations metrics from consensus to implementation. This work advances our ability to more accurately characterize variation in ED care delivery models, resource utilization, and performance. In addition, it permits future aggregation of these three data sets, thus facilitating the creation of more robust ED operations research data sets unified by a universal language. Negotiation, social change, and process analysis principles can be used to advance the adoption of additional definitions. © 2016 by the Society for Academic Emergency Medicine.

  18. Driving personalized medicine: capturing maximum net present value and optimal return on investment.

    PubMed

    Roth, Mollie; Keeling, Peter; Smart, Dave

    2010-01-01

    For personalized medicine to fulfill its promise, a closer focus on the work being carried out today, and on the foundation it will provide for that future, is imperative. While big-picture perspectives on this still-nascent shift in the drug-development process are important, it is more important that today's work on the first wave of targeted therapies be used to build specific benchmarking and financial models against which further such therapies may be more effectively developed. Today's drug-development teams need a robust tool to identify the exact drivers that will ensure the successful launch and rapid adoption of targeted therapies, and financial metrics to determine the appropriate resource levels to power those drivers. This special report describes one such benchmarking and financial model, designed specifically for the personalized medicine field, and explains how the use of this or similar models can help capture the maximum net present value of targeted therapies and realize an optimal return on investment.
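
    The report's model itself is not reproduced here, but the core metric it optimizes, net present value, is standard; below is a minimal, generic NPV calculation with hypothetical cash flows, not figures from the report.

    ```python
    def npv(rate, cashflows):
        """Net present value: cashflows discounted at `rate`; cashflows[0] is at t=0."""
        return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))

    # Hypothetical targeted-therapy program (units of $M): upfront development
    # spend, then revenues ramping as diagnostic-driven adoption grows.
    print(round(npv(0.10, [-500.0, 80.0, 150.0, 220.0, 260.0, 280.0]), 1))
    ```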

  19. Validating Cellular Automata Lava Flow Emplacement Algorithms with Standard Benchmarks

    NASA Astrophysics Data System (ADS)

    Richardson, J. A.; Connor, L.; Charbonnier, S. J.; Connor, C.; Gallant, E.

    2015-12-01

    A major existing need in assessing lava flow simulators is a common set of validation benchmark tests. We propose three levels of benchmarks which test model output against increasingly complex standards. First, simulated lava flows should be morphologically identical given changes in parameter space that should be inconsequential, such as slope direction. Second, lava flows simulated in simple parameter spaces can be tested against analytical solutions or empirical relationships seen in Bingham fluids. For instance, a lava flow simulated on a flat surface should produce a circular outline. Third, lava flows simulated over real-world topography can be compared to recent real-world lava flows, such as those at Tolbachik, Russia, and Fogo, Cape Verde. Success or failure of emplacement algorithms in these validation benchmarks can be determined using a Bayesian approach, which directly tests the ability of an emplacement algorithm to correctly forecast lava inundation. Here we focus on two posterior metrics, P(A|B) and P(¬A|¬B), which describe the positive and negative predictive value of flow algorithms. This is an improvement on less direct statistics such as model sensitivity and the Jaccard fitness coefficient. We have performed these validation benchmarks on a new, modular lava flow emplacement simulator that we have developed. This simulator, which we call MOLASSES, follows a Cellular Automata (CA) method. The code is developed in several interchangeable modules, which enables quick modification of the distribution algorithm from cell locations to their neighbors. By assessing several different distribution schemes with the benchmark tests, we have improved the performance of MOLASSES to an 80% match with the early stages of the 2012-2013 Tolbachik flow, Kamchatka, Russia. We can also evaluate model performance given uncertain input parameters using a Monte Carlo setup, which illuminates the model's sensitivity to input uncertainty.
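
    A minimal sketch of the two posterior metrics named above, computed from boolean inundation grids; this is one plausible reading of P(A|B) (observed inundation given forecast inundation) and P(¬A|¬B), with the Jaccard coefficient included for comparison. It is illustrative only, not the MOLASSES validation code.

    ```python
    import numpy as np

    def inundation_metrics(simulated, observed):
        """Posterior validation metrics for a lava-flow inundation forecast."""
        sim = np.asarray(simulated, bool)
        obs = np.asarray(observed, bool)
        p_hit = (sim & obs).sum() / sim.sum()              # P(A|B): true inundation given forecast
        p_safe = (~sim & ~obs).sum() / (~sim).sum()        # P(~A|~B): truly dry given forecast dry
        jaccard = (sim & obs).sum() / (sim | obs).sum()    # overlap fitness, for comparison
        return p_hit, p_safe, jaccard

    # Toy 1-D "grids": True marks inundated cells.
    sim = np.array([True, True, True, False, False, False])
    obs = np.array([True, True, False, True, False, False])
    print(inundation_metrics(sim, obs))
    ```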

  20. Congenital Heart Surgery Case Mix Across North American Centers and Impact on Performance Assessment.

    PubMed

    Pasquali, Sara K; Wallace, Amelia S; Gaynor, J William; Jacobs, Marshall L; O'Brien, Sean M; Hill, Kevin D; Gaies, Michael G; Romano, Jennifer C; Shahian, David M; Mayer, John E; Jacobs, Jeffrey P

    2016-11-01

    Performance assessment in congenital heart surgery is challenging due to the wide heterogeneity of disease. We describe current case mix across centers, evaluate methodology inclusive of all cardiac operations versus the more homogeneous subset of Society of Thoracic Surgeons benchmark operations, and describe implications regarding performance assessment. Centers (n = 119) participating in the Society of Thoracic Surgeons Congenital Heart Surgery Database (2010 through 2014) were included. Index operation type and frequency across centers were described. Center performance (risk-adjusted operative mortality) was evaluated and classified when including the benchmark versus all eligible operations. Overall, 207 types of operations were performed during the study period (112,140 total cases). Few operations were performed across all centers; only 25% were performed at least once by 75% or more of centers. There was 7.9-fold variation across centers in the proportion of total cases comprising high-complexity cases (STAT 5). In contrast, the benchmark operations made up 36% of cases, and all but 2 were performed by at least 90% of centers. When evaluating performance based on benchmark versus all operations, 15% of centers changed performance classification; 85% remained unchanged. Benchmark versus all operation methodology was associated with lower power, with 35% versus 78% of centers meeting sample size thresholds. There is wide variation in congenital heart surgery case mix across centers. Metrics based on benchmark versus all operations are associated with strengths (less heterogeneity) and weaknesses (lower power), and lead to differing performance classification for some centers. These findings have implications for ongoing efforts to optimize performance assessment, including choice of target population and appropriate interpretation of reported metrics. Copyright © 2016 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.

  1. Establishing objective benchmarks in robotic virtual reality simulation at the level of a competent surgeon using the RobotiX Mentor simulator.

    PubMed

    Watkinson, William; Raison, Nicholas; Abe, Takashige; Harrison, Patrick; Khan, Shamim; Van der Poel, Henk; Dasgupta, Prokar; Ahmed, Kamran

    2018-05-01

    To establish objective benchmarks at the level of a competent robotic surgeon across different exercises and metrics for the RobotiX Mentor virtual reality (VR) simulator, suitable for use within a robotic surgical training curriculum. This retrospective observational study, conducted at King's College London, analysed results from multiple data sources, all of which used the RobotiX Mentor VR simulator. A total of 123 participants with experience ranging from novice to expert completed three basic skill exercises and two advanced skill exercises: 84 novices, 26 beginner intermediates, 9 advanced intermediates, and 4 experts. Competency was defined as the 25th centile of the mean advanced-intermediate score. Objective benchmarks derived in this way provided suitably challenging yet achievable targets for training surgeons; the disparity in scores was greatest for the advanced exercises. Novice surgeons are able to achieve the benchmarks across all exercises in the majority of metrics. We have successfully created this proof-of-concept study, which requires validation in a larger cohort. Objective benchmarks obtained from the 25th centile of the mean scores of advanced intermediates provide clinically relevant benchmarks at the standard of a competent robotic surgeon that are challenging yet attainable. They can be used within a VR training curriculum, allowing participants to track and monitor their progress through five exercises in a structured and progressive manner, providing clearly defined targets and ensuring that a universal training standard is achieved across training surgeons. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
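
    Computing such a benchmark is straightforward; the sketch below uses hypothetical per-exercise scores (each value one advanced intermediate's mean score) rather than data from the study.

    ```python
    import numpy as np

    def competency_benchmarks(scores_by_exercise):
        """Benchmark per exercise: 25th percentile of advanced-intermediate mean scores."""
        return {exercise: float(np.percentile(scores, 25))
                for exercise, scores in scores_by_exercise.items()}

    # Hypothetical mean scores for nine advanced intermediates on one exercise.
    print(competency_benchmarks({"ring-and-rail": [62, 71, 75, 68, 80, 77, 73, 69, 74]}))
    ```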

  2. High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems

    DOE PAGES

    Dongarra, Jack; Heroux, Michael A.; Luszczek, Piotr

    2015-08-17

    Here, we describe a new high-performance conjugate-gradient (HPCG) benchmark. HPCG is composed of computations and data-access patterns commonly found in scientific applications. HPCG strives for a closer correlation with existing codes from the computational science domain and aims to be representative of their performance. Furthermore, HPCG is meant to help drive computer system design and implementation in directions that will better support future performance improvement.
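
    For illustration, the kernel at the heart of the benchmark is the textbook conjugate-gradient iteration sketched below; the real HPCG additionally exercises sparse matrix storage and a multigrid preconditioner, so this dense toy version only shows the loop being timed.

    ```python
    import numpy as np

    def conjugate_gradient(A, b, tol=1e-8, max_iter=500):
        """Textbook CG for a symmetric positive-definite system Ax = b."""
        x = np.zeros_like(b)
        r = b - A @ x                    # residual
        p = r.copy()                     # search direction
        rs = r @ r
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rs / (p @ Ap)        # step length along p
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p    # new direction, conjugate to previous ones
            rs = rs_new
        return x

    # Small SPD example: M^T M + I is symmetric positive definite.
    M = np.random.rand(50, 50)
    A = M.T @ M + np.eye(50)
    b = np.random.rand(50)
    print(np.allclose(A @ conjugate_gradient(A, b), b, atol=1e-6))
    ```

    Unlike the dense solves in LINPACK-style benchmarks, this memory-bound iteration rewards balanced memory systems rather than peak floating-point rate, which is the design point the abstract describes.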

  3. How well does your model capture the terrestrial ecosystem dynamics of the Arctic-Boreal Region?

    NASA Astrophysics Data System (ADS)

    Stofferahn, E.; Fisher, J. B.; Hayes, D. J.; Huntzinger, D. N.; Schwalm, C.

    2016-12-01

    The Arctic-Boreal Region (ABR) is a major source of uncertainty for terrestrial biosphere model (TBM) simulations. These uncertainties stem from a lack of observational data from the region, which affects the parameterizations of cold-environment processes in the models. Addressing these uncertainties requires a coordinated effort of data collection and integration of the following key indicators of the ABR ecosystem: disturbance, flora/fauna and related ecosystem function, carbon pools and biogeochemistry, permafrost, and hydrology. We are developing a model-data integration framework for NASA's Arctic Boreal Vulnerability Experiment (ABoVE), wherein data collection is driven by matching observations and model outputs to the key ABoVE indicators. The data are used as reference datasets for a benchmarking system that evaluates TBM performance with respect to ABR processes. The benchmarking system utilizes performance metrics to identify intra-model and inter-model strengths and weaknesses, which in turn provides guidance to model development teams for reducing uncertainties in TBM simulations of the ABR. The system is directly connected to the International Land Model Benchmarking (ILAMB) system as an ABR-focused application.

  4. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    NASA Astrophysics Data System (ADS)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power in the same way that increases in clock speed helped applications run faster. For Computational ElectroMagnetics (CEM) software developers, however, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
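
    As a concrete illustration of why raw FLOPS is an incomplete metric, the generic sketch below estimates achieved GFLOP/s from a dense matrix multiply; varying `n` exposes cache and memory-bandwidth effects that a single FLOPS number hides. This is not part of HOBBIES.

    ```python
    import time
    import numpy as np

    def matmul_gflops(n=1024):
        """Achieved GFLOP/s of an n x n matrix multiply (~2*n**3 floating-point ops)."""
        a, b = np.random.rand(n, n), np.random.rand(n, n)
        t0 = time.perf_counter()
        a @ b
        elapsed = time.perf_counter() - t0
        return 2 * n**3 / elapsed / 1e9

    # The achieved rate varies with problem size as cache and memory
    # bandwidth come into play, not just the CPU's peak FP rate.
    for n in (256, 1024, 4096):
        print(n, round(matmul_gflops(n), 1))
    ```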

  5. The MPC&A Questionnaire

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Powell, Danny H; Elwood Jr, Robert H

    The questionnaire is the instrument used for recording performance data on the nuclear material protection, control, and accountability (MPC&A) system at a nuclear facility. The performance information provides a basis for evaluating the effectiveness of the MPC&A system. The goal for the questionnaire is to provide an accurate representation of the performance of the MPC&A system as it currently exists in the facility. Performance grades for all basic MPC&A functions should realistically reflect the actual level of performance at the time the survey is conducted. The questionnaire was developed after testing and benchmarking the material control and accountability (MC&A) system effectiveness tool (MSET) in the United States. The benchmarking exercise at the Idaho National Laboratory (INL) proved extremely valuable for improving the content and quality of the early versions of the questionnaire. Members of the INL benchmark team identified many areas of the questionnaire where questions should be clarified and areas where additional questions should be incorporated. The questionnaire addresses all elements of the MC&A system. Specific parts pertain to the foundation for the facility's overall MPC&A system, and other parts pertain to the specific functions of the operational MPC&A system. The questionnaire includes performance metrics for each of the basic functions or tasks performed in the operational MPC&A system. All of those basic functions or tasks are represented as basic events in the MPC&A fault tree. Performance metrics are to be used during completion of the questionnaire to report what is actually being done in relation to what should be done in the performance of MPC&A functions.

  6. A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

    DOE PAGES

    Siddique, Nafiul A.; Grubel, Patricia A.; Badawy, Abdel-Hameed A.; ...

    2017-09-20

    Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance, and significant research has therefore been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is also significantly important. We investigate application locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insight into dynamic application behavior for parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction, and that on average a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization, as well as conventional performance metrics, to give a holistic understanding of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that variable cache line size can result in better performance and can also conserve power.
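
    Cache-line utilization, the metric at the center of this study, can be demonstrated with a toy LRU model: track which bytes of each resident line are touched and record the touched fraction at eviction. The sketch below is a generic illustration under assumed parameters (64-byte lines, a fully associative LRU cache), not the authors' Structural Simulation Toolkit component.

    ```python
    from collections import OrderedDict

    LINE_BYTES = 64

    def mean_line_utilization(addresses, num_lines=512):
        """Average fraction of a cache line's bytes touched before eviction,
        for a byte-address trace on a toy fully-associative LRU cache."""
        cache = OrderedDict()            # line tag -> set of byte offsets touched
        utilizations = []
        for addr in addresses:
            tag, offset = divmod(addr, LINE_BYTES)
            if tag in cache:
                cache.move_to_end(tag)   # refresh LRU position
            elif len(cache) >= num_lines:
                _, touched = cache.popitem(last=False)        # evict least recently used
                utilizations.append(len(touched) / LINE_BYTES)
            cache.setdefault(tag, set()).add(offset)
        utilizations.extend(len(t) / LINE_BYTES for t in cache.values())
        return sum(utilizations) / len(utilizations)

    # A stride-8 byte trace touches 8 distinct bytes per 64-byte line,
    # so utilization comes out near 0.125.
    print(mean_line_utilization(range(0, 1_000_000, 8), num_lines=64))
    ```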

  8. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

    PubMed

    Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

    2011-08-15

    Over the last decade, numerous methods have been developed for the inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods, available to the community as open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. Contact: dario.floreano@epfl.ch.
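
    The evaluation metrics named here are standard ranked-prediction measures; a minimal sketch of how a network prediction would be scored, with made-up edge labels and confidence scores rather than DREAM data:

    ```python
    import numpy as np
    from sklearn.metrics import average_precision_score, roc_auc_score

    # y_true: 1 where an edge exists in the gold-standard network; scores: the
    # confidence an inference method assigned to each candidate edge.
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    scores = np.array([0.9, 0.4, 0.8, 0.3, 0.2, 0.6, 0.7, 0.1])
    print("AUPR :", average_precision_score(y_true, scores))   # precision-recall summary
    print("AUROC:", roc_auc_score(y_true, scores))             # ROC summary
    ```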

  9. Comparative modeling and benchmarking data sets for human histone deacetylases and sirtuin families.

    PubMed

    Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-02-23

    Histone deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases, and other types of diseases. Virtual screening (VS) has become a fairly effective approach for the discovery of novel and highly selective histone deacetylase inhibitors (HDACIs). To facilitate the process, we constructed maximal unbiased benchmarking data sets for HDACs (MUBD-HDACs) using our recently published methods, originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs cover all four classes, including Class III (the Sirtuins family), and 14 HDAC isoforms, comprising 631 inhibitors and 24609 unbiased decoys. The ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matched with the ligands and maximally unbiased in terms of "artificial enrichment" and "analogue bias". We also conducted comparative studies with the DUD-E and DEKOIS 2.0 sets against the HDAC2 and HDAC8 targets and demonstrate that the MUBD-HDACs are unique in that they can be applied without bias to both LBVS and SBVS approaches. In addition, we defined a novel metric, NLBScore, to detect the "2D bias" and "LBVS favorable" effects within benchmarking sets. In summary, the MUBD-HDACs are the only comprehensive and maximally unbiased benchmark data sets for HDACs (including Sirtuins) available so far. MUBD-HDACs are freely available at http://www.xswlab.org/ .

  10. Parameterized centrality metric for network analysis

    NASA Astrophysics Data System (ADS)

    Ghosh, Rumi; Lerman, Kristina

    2011-06-01

    A variety of metrics have been proposed to measure the relative importance of nodes in a network. One of these, alpha-centrality [P. Bonacich, Am. J. Sociol. 92, 1170 (1987)], measures the number of attenuated paths that exist between nodes. We introduce a normalized version of this metric and use it to study network structure, for example, to rank nodes and find community structure of the network. Specifically, we extend the modularity-maximization method for community detection to use this metric as the measure of node connectivity. Normalized alpha-centrality is a powerful tool for network analysis, since it contains a tunable parameter that sets the length scale of interactions. Studying how rankings and discovered communities change when this parameter is varied allows us to identify locally and globally important nodes and structures. We apply the proposed metric to several benchmark networks and show that it leads to better insights into network structure than alternative metrics.
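
    A minimal sketch of alpha-centrality as defined by Bonacich, with a simple sum-to-one normalization; the paper's normalized variant may differ in detail, so treat this as illustrative.

    ```python
    import numpy as np

    def alpha_centrality(adj, alpha=0.1, e=None):
        """Alpha-centrality x = (I - alpha * A^T)^(-1) e for adjacency matrix A.

        The series converges for alpha < 1/lambda_max(A); `alpha` tunes how
        strongly longer (attenuated) paths count, `e` is exogenous importance."""
        n = adj.shape[0]
        e = np.ones(n) if e is None else np.asarray(e, float)
        x = np.linalg.solve(np.eye(n) - alpha * adj.T, e)
        return x / x.sum()                   # normalized so scores sum to 1

    # Directed 3-cycle plus a pendant node feeding node 0.
    A = np.array([[0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0]], float)
    print(alpha_centrality(A, alpha=0.3))
    ```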

  11. Benchmarking short sequence mapping tools

    PubMed Central

    2013-01-01

    Background The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, current tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked when comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all of these aspects. In this work, we introduce a benchmarking suite to extensively analyze mapping tools with respect to various aspects and provide an objective comparison. Results We applied our benchmarking tests to 9 well-known mapping tools, namely Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST), using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. Conclusion The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify their needs in order to choose the tool that provides the best results. PMID:23758764

  12. RBscore&NBench: a high-level web server for nucleic acid binding residues prediction with a large-scale benchmarking database.

    PubMed

    Miao, Zhichao; Westhof, Eric

    2016-07-08

    RBscore&NBench combines a web server, RBscore, and a database, NBench. RBscore predicts RNA- or DNA-binding residues in proteins and visualizes the prediction scores and features on protein structures. The scoring scheme of RBscore directly links feature values to nucleic acid binding probabilities and illustrates the nucleic acid binding energy funnel on the protein surface. To avoid biases from the dataset, the binding site definition, and the assessment metric, we compared RBscore with 18 web servers and 3 stand-alone programs on 41 datasets, which demonstrated the high and stable accuracy of RBscore. This comprehensive comparison led us to develop a benchmark database named NBench. The web server is available at: http://ahsoka.u-strasbg.fr/rbscorenbench/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. A concept paper: using the outcomes of common surgical conditions as quality metrics to benchmark district surgical services in South Africa as part of a systematic quality improvement programme.

    PubMed

    Clarke, Damian L; Kong, Victor Y; Handley, Jonathan; Aldous, Colleen

    2013-07-31

    The fourth, fifth and sixth Millennium Development Goals relate directly to improving global healthcare and health outcomes. The focus is to improve global health outcomes by reducing maternal and childhood mortality and the burden of infectious diseases such as HIV/AIDS, tuberculosis and malaria. Specific targets and time frames have been set for these diseases. There is, however, no specific mention of surgically treated diseases in these goals, reflecting a bias that is slowly changing with emerging consensus that surgical care is an integral part of primary healthcare systems in the developing world. The disparities between the developed and developing world in terms of wealth and social indicators are reflected in disparities in access to surgical care. Health administrators must develop plans and strategies to reduce these disparities. However, any strategic plan that addresses deficits in healthcare must have a system of metrics that benchmarks the current quality of care so that specific improvement targets may be set. This concept paper outlines the role of surgical services in a primary healthcare system, highlights the ongoing disparities in access to surgical care and outcomes of surgical care, discusses the importance of a systems-based approach to healthcare and quality improvement, and reviews the current state of surgical care at district hospitals in South Africa. Finally, it proposes that the results from a recently published study on acute appendicitis, as well as data from a number of other common surgical conditions, can provide measurable outcomes across a healthcare system and so act as an indicator for judging improvements in surgical care. This would provide a framework for introducing the routine collection of these outcomes as an epidemiological health policy tool.

  14. Prediction Models for 30-Day Mortality and Complications After Total Knee and Hip Arthroplasties for Veteran Health Administration Patients With Osteoarthritis.

    PubMed

    Harris, Alex Hs; Kuo, Alfred C; Bowe, Thomas; Gupta, Shalini; Nordin, David; Giori, Nicholas J

    2018-05-01

    Statistical models to preoperatively predict patients' risk of death and major complications after total joint arthroplasty (TJA) could improve the quality of preoperative management and informed consent. Although risk models for TJA exist, they have limitations, including poor transparency and/or unknown or poor performance. Thus, it is currently impossible to know how well currently available models predict short-term complications after TJA, or whether newly developed models are more accurate. We sought to develop and cross-validate predictive risk models and to report details and performance metrics as benchmarks. Over 90 preoperative variables were used as candidate predictors of death and major complications within 30 days for Veterans Health Administration patients with osteoarthritis who underwent TJA. Data were split into 3 samples: for selection of model tuning parameters, for model development, and for cross-validation. C-indexes (discrimination) and calibration plots were produced. A total of 70,569 patients diagnosed with osteoarthritis who received primary TJA were included. C-statistics and bootstrapped confidence intervals for the cross-validation of the boosted regression models were highest for cardiac complications (0.75; 0.71-0.79) and 30-day mortality (0.73; 0.66-0.79) and lowest for deep vein thrombosis (0.59; 0.55-0.64) and return to the operating room (0.60; 0.57-0.63). Moderately accurate predictive models of 30-day mortality and cardiac complications after TJA in Veterans Health Administration patients were developed and internally cross-validated. By reporting model coefficients and performance metrics, other model developers can test these models on new samples and have a procedure- and indication-specific benchmark to surpass. Published by Elsevier Inc.
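
    For a binary outcome such as 30-day mortality, the C-index reported above is equivalent to the area under the ROC curve; a generic sketch with a bootstrap confidence interval, using toy data rather than the VHA cohort:

    ```python
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def c_index_with_ci(y, p, n_boot=1000, seed=0):
        """C-statistic (AUROC) for a binary outcome, with a bootstrap 95% CI."""
        y, p = np.asarray(y), np.asarray(p)
        rng = np.random.default_rng(seed)
        boots = []
        while len(boots) < n_boot:
            idx = rng.integers(0, len(y), len(y))
            if y[idx].min() != y[idx].max():      # resample must contain both outcomes
                boots.append(roc_auc_score(y[idx], p[idx]))
        lo, hi = np.percentile(boots, [2.5, 97.5])
        return roc_auc_score(y, p), (lo, hi)

    # Toy example: predicted 30-day mortality risks; 1 = death within 30 days.
    print(c_index_with_ci([0, 0, 1, 0, 1, 0, 0, 1],
                          [0.1, 0.3, 0.8, 0.2, 0.6, 0.4, 0.1, 0.7]))
    ```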

  15. The financial attractiveness assessment of large waste management projects registered as clean development mechanism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bufoni, André Luiz, E-mail: bufoni@facc.ufrj.br; Oliveira, Luciano Basto; Rosa, Luiz Pinguelli

    Highlights: • Projects are not financially attractive without registration as CDMs. • WM benchmarks and indicators are converging and reducing in variance. • A sensitivity analysis reveals that revenue has more of an effect on the financial results. • Results indicate that an extensive database would reduce WM project risk and capital costs. • Disclosure standards would make information more comparable worldwide. - Abstract: This study illustrates the financial analyses for the demonstration and assessment of additionality presented in the project design documents (PDD) and enclosed documents of the 431 large Clean Development Mechanism (CDM) projects classified in the 'waste handling and disposal' sector (13) over the past ten years (2004–2014). The expected certified emission reductions (CER) of these projects total 63.54 million metric tons of CO₂eq, with eight countries accounting for 311 projects and 43.36 million metric tons. All of the projects declare themselves 'not financially attractive' without CER, with an estimated sum of negative results of approximately half a billion US$. The results indicate that WM benchmarks and indicators are converging and reducing in variance, and the sensitivity analysis reveals that revenues have a greater effect on the financial results. This work concludes that an extensive financial database with simple standards for disclosure would greatly diminish statement problems and make information more comparable, reducing the risk and capital costs of WM projects.

  16. Thermal Performance Benchmarking: Annual Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreno, Gilbert

    2016-04-08

    The goal for this project is to thoroughly characterize the performance of state-of-the-art (SOA) automotive power electronics and electric motor thermal management systems. Information obtained from these studies will be used to: evaluate advantages and disadvantages of different thermal management strategies; establish baseline metrics for the thermal management systems; identify methods of improvement to advance the SOA; increase the publicly available information related to automotive traction-drive thermal management systems; and help guide future electric drive technologies (EDT) research and development (R&D) efforts. The performance results, combined with component efficiency and heat generation information obtained by Oak Ridge National Laboratory (ORNL), may then be used to determine the operating temperatures for the EDT components under drive-cycle conditions. In FY15, the 2012 Nissan LEAF power electronics and electric motor thermal management systems were benchmarked. Testing of the 2014 Honda Accord Hybrid power electronics thermal management system started in FY15; however, due to time constraints it was not possible to include results for this system in this report. The focus of this project is to benchmark the thermal aspects of the systems. ORNL's benchmarking of electric and hybrid electric vehicle technology reports provide detailed descriptions of the electrical and packaging aspects of these automotive systems.

  17. Society of Vascular and Interventional Neurology (SVIN) Stroke Interventional Laboratory Consensus (SILC) Criteria: A 7M Management Approach to Developing a Stroke Interventional Laboratory in the Era of Stroke Thrombectomy for Large Vessel Occlusions

    PubMed Central

    Shams, Tanzila; Zaidat, Osama; Yavagal, Dileep; Xavier, Andrew; Jovin, Tudor; Janardhan, Vallabh

    2016-01-01

    Brain attack care is rapidly evolving with cutting-edge stroke interventions similar to the growth of heart attack care with cardiac interventions in the last two decades. As the field of stroke intervention is growing exponentially globally, there is clearly an unmet need to standardize stroke interventional laboratories for safe, effective, and timely stroke care. Towards this goal, the Society of Vascular and Interventional Neurology (SVIN) Writing Committee has developed the Stroke Interventional Laboratory Consensus (SILC) criteria using a 7M management approach for the development and standardization of each stroke interventional laboratory within stroke centers. The SILC criteria include: (1) manpower: personnel including roles of medical and administrative directors, attending physicians, fellows, physician extenders, and all the key stakeholders in the stroke chain of survival; (2) machines: resources needed in terms of physical facilities, and angiography equipment; (3) materials: medical device inventory, medications, and angiography supplies; (4) methods: standardized protocols for stroke workflow optimization; (5) metrics (volume): existing credentialing criteria for facilities and stroke interventionalists; (6) metrics (quality): benchmarks for quality assurance; (7) metrics (safety): radiation and procedural safety practices. PMID:27610118

  18. EnergyIQ

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    MILLS, EVAN; MATTHE, PAUL; STOUFER, MARTIN

    2016-10-06

    EnergyIQ, the first "action-oriented" benchmarking tool for non-residential buildings, provides a standardized opportunity assessment based on benchmarking results, along with decision-support information to help refine action plans. EnergyIQ offers a wide array of benchmark metrics, with visual as well as tabular display. These include energy, costs, greenhouse-gas emissions, and a large array of characteristics (e.g., building components or operational strategies). The tool supports cross-sectional benchmarking, for comparing the user's building to its peers at one point in time, as well as longitudinal benchmarking, for tracking the performance of an individual building or enterprise portfolio over time. Based on user inputs, the tool generates a list of opportunities and recommended actions. Users can then explore the "Decision Support" module for helpful information on how to refine action plans, create design-intent documentation, and implement improvements; this includes information on best practices, links to other energy analysis tools, and more. A variety of databases are available within EnergyIQ from which users can specify peer groups for comparison. Using the tool, these data can be visually browsed and used as a backdrop against which to view a variety of energy benchmarking metrics for the user's own building. Users can save their project information and return at a later date to continue their exploration. The initial database is the CA Commercial End-Use Survey (CEUS), which provides details on energy use and characteristics for about 2800 buildings (and 62 building types). CEUS is likely the most thorough survey of its kind ever conducted. The tool is built as a web service. The EnergyIQ web application is written in JSP with pervasive use of JavaScript and CSS2. EnergyIQ also supports a SOAP-based web service to allow the flow of queries and data to occur with non-browser implementations. Data are stored in an Oracle 10g database. References: Mills, Mathew, Brook and Piette. 2008. "Action Oriented Benchmarking: Concepts and Tools." Energy Engineering, Vol. 105, No. 4, pp. 21-40. LBNL-358E; Mathew, Mills, Bourassa, Brook. 2008. "Action-Oriented Benchmarking: Using the CEUS Database to Benchmark Commercial Buildings in California." Energy Engineering, Vol. 105, No. 5, pp. 6-18. LBNL-502E.
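
    Cross-sectional benchmarking of the kind described here reduces to placing a building within the distribution of a peer group; a generic sketch with hypothetical EUI figures, not CEUS data or EnergyIQ code:

    ```python
    import numpy as np

    def benchmark_percentile(building_eui, peer_euis):
        """Percentile rank of a building's energy use intensity (EUI, kWh/m2/yr)
        within its peer group; a lower percentile means a better performer."""
        peers = np.sort(np.asarray(peer_euis, float))
        return 100.0 * np.searchsorted(peers, building_eui, side="right") / peers.size

    # Hypothetical office-building peer group.
    print(benchmark_percentile(182.0, [140, 155, 170, 180, 195, 210, 240, 260]))
    ```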

  20. Electric load shape benchmarking for small- and medium-sized commercial buildings

    DOE PAGES

    Luo, Xuan; Hong, Tianzhen; Chen, Yixing; ...

    2017-07-28

    Small- and medium-sized commercial building owners and utility managers often look for opportunities for energy cost savings through energy efficiency and energy waste minimization. However, they currently lack easy access to low-cost tools that help interpret the massive amount of data needed to improve understanding of their energy use behaviors. Benchmarking is one of the techniques used in energy audits to identify which buildings are priorities for an energy analysis. Traditional energy performance indicators, such as energy use intensity (annual energy per unit of floor area), consider only the total annual energy consumption, without considering the fluctuation of energy use behavior over time, which reveals time-of-use information and represents distinct energy use behaviors during different time spans. To fill the gap, this study developed a general statistical method that uses 24-hour electric load shape benchmarking to compare a building or business/tenant space against peers. Specifically, the study developed new forms of benchmarking metrics and data analysis methods to infer the energy performance of a building based on its load shape. We first performed a data experiment with collected smart meter data from over 2,000 small- and medium-sized businesses in California. We then conducted a cluster analysis of the source data, and determined and interpreted the load shape features and parameters with peer group analysis. Finally, we implemented the load shape benchmarking feature in an open-access web-based toolkit (the Commercial Building Energy Saver) to provide straightforward and practical recommendations to users. The analysis techniques are generic and flexible for future datasets of other building types and in other utility territories.
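
    The cluster analysis step can be illustrated generically: peak-normalize each daily 24-hour profile so that shape, rather than magnitude, drives the grouping, then cluster. A sketch under those assumptions (k-means is our choice here; the paper's exact algorithm and features may differ):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def load_shape_clusters(daily_profiles, k=4, seed=0):
        """Group daily 24-hour load profiles into k typical shapes.

        daily_profiles: array of shape (n_days, 24), hourly kWh readings."""
        X = np.asarray(daily_profiles, float)
        X = X / X.max(axis=1, keepdims=True)     # peak-normalize: compare shapes, not sizes
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        return km.labels_, km.cluster_centers_   # day-to-cluster map and typical shapes
    ```

    The cluster centers then serve as the "typical" load shapes against which a new building's profile can be benchmarked.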

  1. Comparative Modeling and Benchmarking Data Sets for Human Histone Deacetylases and Sirtuin Families

    PubMed Central

    Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Histone Deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases and other types of diseases. Virtual screening (VS) has become a fairly effective approach for drug discovery of novel and highly selective Histone Deacetylase Inhibitors (HDACIs). To facilitate the process, we constructed the Maximal Unbiased Benchmarking Data Sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs cover all 4 Classes including Class III (Sirtuins family) and 14 HDAC isoforms, composed of 631 inhibitors and 24,609 unbiased decoys. Its ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matching with ligands and maximally unbiased in terms of “artificial enrichment” and “analogue bias”. We also conducted comparative studies with DUD-E and DEKOIS 2.0 sets against HDAC2 and HDAC8 targets, and demonstrate that our MUBD-HDACs are unique in that they can be applied without bias to both LBVS and SBVS approaches. In addition, we defined a novel metric, i.e. NLBScore, to detect the “2D bias” and “LBVS favorable” effect within the benchmarking sets. In summary, the MUBD-HDACs are the only comprehensive and maximally unbiased benchmark data sets for HDACs (including Sirtuins) that are available so far. MUBD-HDACs are freely available at http://www.xswlab.org/. PMID:25633490

  2. NASA Software Engineering Benchmarking Study

    NASA Technical Reports Server (NTRS)

    Rarick, Heather L.; Godfrey, Sara H.; Kelly, John C.; Crumbley, Robert T.; Wifl, Joel M.

    2013-01-01

    To identify best practices for the improvement of software engineering on projects, NASA's Offices of Chief Engineer (OCE) and Safety and Mission Assurance (OSMA) formed a team led by Heather Rarick and Sally Godfrey to conduct this benchmarking study. The primary goals of the study are to identify best practices that: improve the management and technical development of software-intensive systems; have a track record of successful deployment by aerospace industries, universities [including research and development (R&D) laboratories], and defense services, as well as NASA's own component Centers; and identify candidate solutions for NASA's software issues. Beginning in the late fall of 2010, focus topics were chosen and interview questions were developed, based on the NASA top software challenges. Between February 2011 and November 2011, the Benchmark Team interviewed a total of 18 organizations, consisting of five NASA Centers, five industry organizations, four defense services organizations, and four university or university R&D laboratory organizations. A software assurance representative also participated in each of the interviews to focus on assurance and software safety best practices. Interviewees provided a wealth of information on each topic area, including: software policy, software acquisition, software assurance, testing, training, maintaining rigor in small projects, metrics, and use of the Capability Maturity Model Integration (CMMI) framework, as well as a number of special topics that came up in the discussions. NASA's software engineering practices compared favorably with the external organizations in most benchmark areas, but in every topic there were ways in which NASA could improve its practices. Compared to defense services organizations and some of the industry organizations, one of NASA's notable weaknesses involved communication with contractors regarding its policies and requirements for acquired software. One of NASA's strengths was its software assurance practices, which seemed to rate well in comparison to the other organizational groups and also seemed to include a larger scope of activities. An unexpected benefit of the software benchmarking study was the identification of many opportunities for collaboration in areas including metrics, training, sharing of CMMI experiences and resources such as instructors and CMMI Lead Appraisers, and even sharing of assets such as documented processes. A further unexpected benefit of the study was the feedback on NASA practices that was received from some of the organizations interviewed. From that feedback, other potential areas where NASA could improve were highlighted, such as accuracy of software cost estimation and budgetary practices. The detailed report contains discussion of the practices noted in each of the topic areas, as well as a summary of observations and recommendations from each of the topic areas. The resulting 24 recommendations from the topic areas were then consolidated to eliminate duplication and culled into a set of 14 suggested actionable recommendations. This final set of actionable recommendations, listed below, comprises items that can be implemented to improve NASA's software engineering practices and to help address many of the items on the NASA top software engineering issues list. 1. Develop and implement standard contract language for software procurements. 2. Advance accurate and trusted software cost estimates for both procured and in-house software, and improve the capture of actual cost data to facilitate further improvements. 3. Establish a consistent set of objectives and expectations, specifically types of metrics at the Agency level, so key trends and models can be identified and used to continuously improve software processes and each software development effort. 4. Maintain the CMMI Maturity Level requirement for critical NASA projects and use CMMI to measure organizations developing software for NASA. 5. Consolidate, collect, and, if needed, develop common process principles and other assets across the Agency in order to provide more consistency in software development and acquisition practices and to reduce the overall cost of maintaining or increasing current NASA CMMI maturity levels. 6. Provide additional support for small projects that includes: (a) guidance for appropriate tailoring of requirements for small projects, (b) availability of suitable tools, including support tool set-up and training, and (c) training for small project personnel, assurance personnel, and technical authorities on the acceptable options for tailoring requirements and performing assurance on small projects. 7. Develop software training classes for the more experienced software engineers using on-line training, videos, or small separate modules of training that can be accommodated as needed throughout a project. 8. Create guidelines to structure non-classroom training opportunities such as mentoring, peer reviews, lessons-learned sessions, and on-the-job training. 9. Develop a set of predictive software defect data and a process for assessing software testing metric data against it. 10. Assess Agency-wide licenses for commonly used software tools. 11. Fill the knowledge gap in common software engineering practices for new hires and co-ops. 12. Work through the Science, Technology, Engineering and Mathematics (STEM) program with universities to strengthen education in the use of common software engineering practices and standards. 13. Follow up this benchmark study with a deeper look into what both internal and external organizations perceive as the scope of software assurance, the value they expect to obtain from it, and the shortcomings they experience in current practice. 14. Continue interactions with the external software engineering environment through collaborations, knowledge sharing, and benchmarking.

  3. Benchmark Credentialing Results for NRG-BR001: The First National Cancer Institute-Sponsored Trial of Stereotactic Body Radiation Therapy for Multiple Metastases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Al-Hallaq, Hania A., E-mail: halhallaq@radonc.uchicago.edu; Chmura, Steven J.; Salama, Joseph K.

    Purpose: The NRG-BR001 trial is the first National Cancer Institute–sponsored trial to treat multiple (range 2-4) extracranial metastases with stereotactic body radiation therapy. Benchmark credentialing is required to ensure adherence to this complex protocol, in particular, for metastases in close proximity. The present report summarizes the dosimetric results and approval rates. Methods and Materials: The benchmark used anonymized data from a patient with bilateral adrenal metastases, separated by <5 cm of normal tissue. Because the planning target volume (PTV) overlaps with organs at risk (OARs), institutions must use the planning priority guidelines to balance PTV coverage (45 Gy in 3 fractions) against OAR sparing. Submitted plans were processed by the Imaging and Radiation Oncology Core and assessed by the protocol co-chairs by comparing the doses to targets, OARs, and conformity metrics using nonparametric tests. Results: Of 63 benchmarks submitted through October 2015, 94% were approved, with 51% approved at the first attempt. Most used volumetric arc therapy (VMAT) (78%), a single plan for both PTVs (90%), and prioritized the PTV over the stomach (75%). The median dose to 95% of the volume was 44.8 ± 1.0 Gy and 44.9 ± 1.0 Gy for the right and left PTV, respectively. The median dose to 0.03 cm³ was 14.2 ± 2.2 Gy to the spinal cord and 46.5 ± 3.1 Gy to the stomach. Plans that spared the stomach significantly reduced the dose to the left PTV and stomach. Conformity metrics were significantly better for single plans that simultaneously treated both PTVs with VMAT, intensity modulated radiation therapy, or 3-dimensional conformal radiation therapy compared with separate plans. No significant differences existed in the dose at 2 cm from the PTVs. Conclusions: Although most plans used VMAT, the range of conformity and dose falloff was large. The decision to prioritize either OARs or PTV coverage varied considerably, suggesting that the toxicity outcomes in the trial could be affected. Several benchmarks met the dose-volume histogram metrics but produced unacceptable plans owing to low conformity. Dissemination of a frequently-asked-questions document improved the approval rate at the first attempt. Benchmark credentialing was found to be a valuable tool for educating institutions about the protocol requirements.

  4. Benchmark Credentialing Results for NRG-BR001: The First National Cancer Institute-Sponsored Trial of Stereotactic Body Radiation Therapy for Multiple Metastases.

    PubMed

    Al-Hallaq, Hania A; Chmura, Steven J; Salama, Joseph K; Lowenstein, Jessica R; McNulty, Susan; Galvin, James M; Followill, David S; Robinson, Clifford G; Pisansky, Thomas M; Winter, Kathryn A; White, Julia R; Xiao, Ying; Matuszak, Martha M

    2017-01-01

    The NRG-BR001 trial is the first National Cancer Institute-sponsored trial to treat multiple (range 2-4) extracranial metastases with stereotactic body radiation therapy. Benchmark credentialing is required to ensure adherence to this complex protocol, in particular, for metastases in close proximity. The present report summarizes the dosimetric results and approval rates. The benchmark used anonymized data from a patient with bilateral adrenal metastases, separated by <5 cm of normal tissue. Because the planning target volume (PTV) overlaps with organs at risk (OARs), institutions must use the planning priority guidelines to balance PTV coverage (45 Gy in 3 fractions) against OAR sparing. Submitted plans were processed by the Imaging and Radiation Oncology Core and assessed by the protocol co-chairs by comparing the doses to targets, OARs, and conformity metrics using nonparametric tests. Of 63 benchmarks submitted through October 2015, 94% were approved, with 51% approved at the first attempt. Most used volumetric arc therapy (VMAT) (78%), a single plan for both PTVs (90%), and prioritized the PTV over the stomach (75%). The median dose to 95% of the volume was 44.8 ± 1.0 Gy and 44.9 ± 1.0 Gy for the right and left PTV, respectively. The median dose to 0.03 cm³ was 14.2 ± 2.2 Gy to the spinal cord and 46.5 ± 3.1 Gy to the stomach. Plans that spared the stomach significantly reduced the dose to the left PTV and stomach. Conformity metrics were significantly better for single plans that simultaneously treated both PTVs with VMAT, intensity modulated radiation therapy, or 3-dimensional conformal radiation therapy compared with separate plans. No significant differences existed in the dose at 2 cm from the PTVs. Although most plans used VMAT, the range of conformity and dose falloff was large. The decision to prioritize either OARs or PTV coverage varied considerably, suggesting that the toxicity outcomes in the trial could be affected. Several benchmarks met the dose-volume histogram metrics but produced unacceptable plans owing to low conformity. Dissemination of a frequently-asked-questions document improved the approval rate at the first attempt. Benchmark credentialing was found to be a valuable tool for educating institutions about the protocol requirements. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. Benchmarking the Wilmer general eye services clinics: baseline metrics for surgical and outpatient clinic volume in an educational environment.

    PubMed

    Singman, Eric; Srikumaran, Divya; Hackett, Kathy; Kaplan, Brian; Jun, Albert; Preece, Derek; Ramulu, Pradeep

    2016-01-27

    The Wilmer General Eye Services (GES) at the Johns Hopkins Hospital is the clinic where residents provide supervised comprehensive medical and surgical care to ophthalmology patients. The clinic schedule and supervision structure allow for a progressive increase in trainee responsibility, with graduated autonomy and longitudinal continuity of care over the three years of ophthalmology residency training. This study sought to determine the number of cases the GES contributes to residents' surgical experience. In addition, it was intended to create benchmarks for patient volumes, cataract surgery yield, and room utilization as part of an educational initiative to introduce residents to metrics important for practice management. The electronic surgical posting system database was explored to determine the number of cases scheduled for patients seen by residents in the GES. In addition, aggregated residents' self-reported Accreditation Council for Graduate Medical Education (ACGME) surgical logs were collected for comparison. Finally, transactional databases were queried to determine clinic volumes of new and established patients. The proportion of resident surgeries (first surgeon and assistant) provided by GES patients, cataract surgery yield, and new patient rates were calculated. Data were collected from July 1, 2014 through March 31, 2015 for all 16 residents (6 third-year, 5 second-year, and 5 first-year). The percentages of cataract, oculoplastics, cornea, and glaucoma surgeries in which a resident was first surgeon and the patient came from the GES were 91.3%, 76.1%, 65.6%, and 93.9%, respectively. The new patient rate was 28.1% and room utilization was 50.4%. Cataract surgery yield was 29.2%. The GES provides a significant proportion of primary surgeon opportunities for the residents and, in some instances, the majority of cases. Compared to benchmarks available for private practices, the new patient rate is high while the cataract surgery yield is low. The room utilization is lower than the 85% preferred by the hospital system. These are the first benchmarks of this type for an academic resident ophthalmology practice in the United States. Our study suggests that resident-hosted clinics can provide the majority of surgical opportunities for ophthalmology trainees, particularly with regard to cataract cases. However, because our study is the first academic resident practice to publish metrics of the type used in private practices, it is impossible to determine where our clinic stands compared to other training programs. Therefore, the authors strongly encourage ophthalmology training programs to explore and publish practice metrics. This will permit the creation of a benchmarking program that could be used to quantify efforts at enhancing ophthalmic resident education.

  6. Combustion

    NASA Technical Reports Server (NTRS)

    Bulzan, Dan

    2007-01-01

    An overview of the emissions related research being conducted as part of the Fundamental Aeronautics Subsonics Fixed Wing Project is presented. The overview includes project metrics, milestones, and descriptions of major research areas. The overview also includes information on some of the emissions research being conducted under NASA Research Announcements. Objective: Development of comprehensive detailed and reduced kinetic mechanisms of jet fuels for chemically-reacting flow modeling. Scientific Challenges: 1) Developing experimental facilities capable of handling higher hydrocarbons and providing benchmark combustion data. 2) Determining and understanding ignition and combustion characteristics, such as laminar flame speeds, extinction stretch rates, and autoignition delays, of jet fuels and hydrocarbons relevant to jet surrogates. 3) Developing comprehensive kinetic models for jet fuels.

  7. Measuring FLOPS Using Hardware Performance Counter Technologies on LC systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahn, D H

    2008-09-05

    FLOPS (FLoating-point Operations Per Second) is a commonly used performance metric for scientific programs that rely heavily on floating-point (FP) calculations. The metric is based on the number of FP operations rather than instructions, thereby facilitating a fair comparison between different machines. A well-known use of this metric is the LINPACK benchmark that is used to generate the Top500 list. It measures how fast a computer solves a dense N by N system of linear equations Ax=b, which requires a known number of FP operations, and reports the result in millions of FP operations per second (MFLOPS). While running a benchmark with known FP workloads can provide insightful information about the efficiency of a machine's FP pipelines in relation to other machines, measuring FLOPS of an arbitrary scientific application in a platform-independent manner is nontrivial. The goal of this paper is twofold. First, we explore the FP microarchitectures of key processors that are underpinning the LC machines. Second, we present the hardware performance monitoring counter-based measurement techniques that a user can use to get the native FLOPS of his or her program, which are practical solutions readily available on LC platforms. By nature, however, these native FLOPS metrics are not directly comparable across different machines mainly because FP operations are not consistent across microarchitectures. Thus, the first goal of this paper represents the base reference by which a user can interpret the measured FLOPS more judiciously.
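
    The LINPACK-style measurement described above divides a known operation count by wall time. A minimal sketch of that idea, assuming the conventional 2/3·N³ + 2·N² operation count for solving Ax=b by LU factorization (this illustrates the benchmark-style measurement, not the paper's hardware-counter technique):

```python
import time
import numpy as np

def linpack_style_mflops(n=2000):
    """Time a dense solve of Ax=b and convert to MFLOPS using the
    conventional LINPACK operation count 2/3*n^3 + 2*n^2."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    np.linalg.solve(A, b)          # LU factorization plus triangular solves
    elapsed = time.perf_counter() - t0
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / elapsed / 1e6

print(f"{linpack_style_mflops():.0f} MFLOPS")
```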

  8. A state-of-the-art review on segmentation algorithms in intravascular ultrasound (IVUS) images.

    PubMed

    Katouzian, Amin; Angelini, Elsa D; Carlier, Stéphane G; Suri, Jasjit S; Navab, Nassir; Laine, Andrew F

    2012-09-01

    Over the past two decades, intravascular ultrasound (IVUS) image segmentation has remained a challenge for researchers while the use of this imaging modality is rapidly growing in catheterization procedures and in research studies. IVUS provides cross-sectional grayscale images of the arterial wall and the extent of atherosclerotic plaques with high spatial resolution in real time. In this paper, we review recently developed image processing methods for the detection of media-adventitia and luminal borders in IVUS images acquired with different transducers operating at frequencies ranging from 20 to 45 MHz. We discuss methodological challenges, lack of diversity in reported datasets, and weaknesses of quantification metrics that make IVUS segmentation still an open problem despite all efforts. In conclusion, we call for a common reference database, validation metrics, and ground-truth definition with which new and existing algorithms could be benchmarked.
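
    When a common reference database exists, segmentations are typically benchmarked with overlap metrics against expert tracings. A small illustration with the usual Dice and Jaccard measures; the masks here are synthetic stand-ins for lumen or media-adventitia regions:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def jaccard(a, b):
    """Jaccard index (intersection over union) between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

# Hypothetical 2-D masks: algorithm output vs. an expert tracing
auto = np.zeros((64, 64), bool); auto[10:40, 10:40] = True
manual = np.zeros((64, 64), bool); manual[12:42, 12:42] = True
print(f"Dice={dice(auto, manual):.3f}, Jaccard={jaccard(auto, manual):.3f}")
```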

  9. The Creation of a Pediatric Hospital Medicine Dashboard: Performance Assessment for Improvement.

    PubMed

    Fox, Lindsay Anne; Walsh, Kathleen E; Schainker, Elisabeth G

    2016-07-01

    Leaders of pediatric hospital medicine (PHM) recommended a clinical dashboard to monitor clinical practice and make improvements. To date, however, no programs report implementing a dashboard including the proposed broad range of metrics across multiple sites. We sought to (1) develop and populate a clinical dashboard to demonstrate productivity, quality, group sustainability, and value added for an academic division of PHM across 4 inpatient sites; (2) share dashboard data with division members and administrations to improve performance and guide program development; and (3) revise the dashboard to optimize its utility. Division members proposed a dashboard based on PHM recommendations. We assessed feasibility of data collection and defined and modified metrics to enable collection of comparable data across sites. We gathered data and shared the results with division members and administrations. We collected quarterly and annual data from October 2011 to September 2013. We found comparable metrics across all sites for descriptive, productivity, group sustainability, and value-added domains; only 72% of all quality metrics were tracked in a comparable fashion. After sharing the data, we saw increased timeliness of nursery discharges and an increase in hospital committee participation and grant funding. PHM dashboards have the potential to guide program development, mobilize faculty to improve care, and demonstrate program value to stakeholders. Dashboard implementation at other institutions and data sharing across sites may help to better define and strengthen the field of PHM by creating benchmarks and help improve the quality of pediatric hospital care. Copyright © 2016 by the American Academy of Pediatrics.

  10. Metrics for Assessing the Quality of Groundwater Used for Public Supply, CA, USA: Equivalent-Population and Area.

    PubMed

    Belitz, Kenneth; Fram, Miranda S; Johnson, Tyler D

    2015-07-21

    Data from 11,000 public supply wells in 87 study areas were used to assess the quality of nearly all of the groundwater used for public supply in California. Two metrics were developed for quantifying groundwater quality: area with high concentrations (km² or proportion) and equivalent-population relying upon groundwater with high concentrations (number of people or proportion). Concentrations are considered high if they are above a human-health benchmark. When expressed as proportions, the metrics are area-weighted and population-weighted detection frequencies. On a statewide-scale, about 20% of the groundwater used for public supply has high concentrations for one or more constituents (23% by area and 18% by equivalent-population). On the basis of both area and equivalent-population, trace elements are more prevalent at high concentrations than either nitrate or organic compounds at the statewide-scale, in eight of nine hydrogeologic provinces, and in about three-quarters of the study areas. At a statewide-scale, nitrate is more prevalent than organic compounds based on area, but not on the basis of equivalent-population. The approach developed for this paper, unlike many studies, recognizes the importance of appropriately weighting information when changing scales, and is broadly applicable to other areas.
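
    The two weighting schemes are simple to state. A hedged sketch with invented numbers shows how the same set of study areas can score very differently by area than by equivalent-population:

```python
import numpy as np

# Hypothetical study areas: area (km^2), population served by groundwater,
# and whether any constituent exceeds its human-health benchmark there.
area_km2   = np.array([120.0, 300.0, 80.0, 500.0])
population = np.array([40_000, 10_000, 250_000, 5_000])
high_conc  = np.array([True, False, True, False])

# Area-weighted and equivalent-population-weighted detection frequencies
area_weighted = area_km2[high_conc].sum() / area_km2.sum()
pop_weighted  = population[high_conc].sum() / population.sum()
print(f"by area: {area_weighted:.1%}, by population: {pop_weighted:.1%}")
```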

  11. Development of a Quantitative Decision Metric for Selecting the Most Suitable Discretization Method for SN Transport Problems

    NASA Astrophysics Data System (ADS)

    Schunert, Sebastian

    In this work we develop a quantitative decision metric for spatial discretization methods of the SN equations. The quantitative decision metric utilizes performance data from selected test problems to compute a fitness score that is used for the selection of the most suitable discretization method for a particular SN transport application. The fitness score is aggregated as a weighted geometric mean of single performance indicators representing various performance aspects relevant to the user. Thus, the fitness function can be adjusted to the particular needs of the code practitioner by adding or removing single performance indicators or changing their importance via the supplied weights. Within this work a special, broad class of methods is considered, referred to as nodal methods. This class naturally comprises the DGFEM methods of all function space families. Within this work it is also shown that the Higher Order Diamond Difference (HODD) method is a nodal method. Building on earlier findings that the Arbitrarily High Order Method of the Nodal type (AHOTN) is also a nodal method, a generalized finite-element framework is created to yield as special cases various methods that were developed independently using profoundly different formalisms. A selection of test problems, each related to a particular performance aspect, is considered: a Method of Manufactured Solutions (MMS) test suite for assessing accuracy and execution time, Lathrop's test problem for assessing resilience against the occurrence of negative fluxes, and a simple, homogeneous cube test problem to verify whether a method possesses the thick diffusion limit. The contending methods are implemented as efficiently as possible under a common SN transport code framework to level the playing field for a fair comparison of their computational load. Numerical results are presented for all three test problems, and each method's performance is rated qualitatively on each aspect separately: accuracy/efficiency, resilience against negative fluxes, and possession of the thick diffusion limit. The choice of the most efficient method depends on the utilized error norm: in Lp error norms higher order methods such as the AHOTN method of order three perform best, while for computing integral quantities the linear nodal (LN) method is most efficient. The most resilient method against the occurrence of negative fluxes is the simple corner balance (SCB) method. A validation of the quantitative decision metric is performed based on the NEA box-in-box suite of test problems. The validation exercise comprises two stages: first, prediction of the contending methods' performance via the decision metric, and second, computation of the actual scores based on data obtained from the NEA benchmark problem. The comparison of predicted and actual scores via a penalty function (the ratio of the predicted best performer's score to the actual best score) completes the validation exercise. It is found that the decision metric is capable of very accurate predictions (penalty < 10%) in more than 83% of the considered cases and features penalties up to 20% for the remaining cases. An exception to this rule is the third test case, NEA-III, intentionally set up to incorporate a poor match between the benchmark and the "data" problems. However, even under these worst-case conditions the decision metric's suggestions are never detrimental. Suggestions for improving the decision metric's accuracy are to increase the pool of employed data, to refine the mapping of a given configuration to a case in the database, and to better characterize the desired target quantities.
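
    The aggregation step can be sketched compactly. The indicator values, weights, and the penalty reading below are illustrative interpretations of the abstract, not the dissertation's actual numbers:

```python
import numpy as np

def fitness_score(indicators, weights):
    """Weighted geometric mean of single performance indicators.

    indicators: positive scores, higher is better (e.g. accuracy,
    runtime efficiency, robustness); weights express their importance
    to the practitioner and are assumed to sum to 1.
    """
    ind = np.asarray(indicators, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.exp(np.sum(w * np.log(ind))))

# Hypothetical scores for two contending discretizations
method_a = fitness_score([0.95, 0.60, 0.90], [0.5, 0.3, 0.2])
method_b = fitness_score([0.70, 0.95, 0.85], [0.5, 0.3, 0.2])

# One plausible reading of the validation penalty: how far the metric's
# predicted best performer falls short of the actual best (0 = correct).
penalty = 1.0 - min(method_a, method_b) / max(method_a, method_b)
```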

  12. Nontraditional Student Graduation Rate Benchmarks

    ERIC Educational Resources Information Center

    Miller, Nathan B.

    2014-01-01

    The prominence of discourse on postsecondary degree completion, student persistence, and retention has increased in the national dialogue. Heightened attention to college completion rates by the federal government and pressure to tie state funding to performance metrics associated with graduation rates are catalysts for the discussion.…

  13. Measuring the FMCSA's safety objectives from March 2000 to September 2004.

    DOT National Transportation Integrated Search

    2006-01-01

    The Volpe Center was requested by FMCSA to establish metrics and benchmarks against which to assess progress in attaining the FMCSA safety objectives. This was to be done objectively, emphasizing the use of SafeStat information. SafeStat (short for M...

  14. Mechanistic Sediment Quality Guidelines Based on Contaminant Bioavailability: Equilibrium Partitioning Sediment Benchmarks

    EPA Science Inventory

    Globally, billions of metric tons of contaminated sediments are present in aquatic systems representing a potentially significant ecological risk. Estimated costs to manage (i.e., remediate and monitor) these sediments are in the billions of U.S. dollars. Biologically-based app...

  15. Benchmarking the performance of fixed-image receptor digital radiography systems. Part 2: system performance metric.

    PubMed

    Lee, Kam L; Bernardo, Michael; Ireland, Timothy A

    2016-06-01

    This is part two of a two-part study in benchmarking system performance of fixed digital radiographic systems. The study compares the system performance of seven fixed digital radiography systems based on quantitative metrics like modulation transfer function (sMTF), normalised noise power spectrum (sNNPS), detective quantum efficiency (sDQE) and entrance surface air kerma (ESAK). It was found that the most efficient image receptors (greatest sDQE) were not necessarily operating at the lowest ESAK. In part one of this study, sMTF is shown to depend on system configuration while sNNPS is shown to be relatively consistent across systems. Systems are ranked on their signal-to-noise ratio efficiency (sDQE) and their ESAK. Systems using the same equipment configuration do not necessarily have the same system performance. This implies radiographic practice at the site will have an impact on the overall system performance. In general, systems are more dose efficient at low dose settings.
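
    The abstract does not reproduce the formulas, but in this literature the three quantities are commonly related through the spatial-frequency-dependent detective quantum efficiency, with q̄ the incident photon fluence. A standard form, stated here for reference rather than quoted from the paper, is:

```latex
\mathrm{DQE}(f) \;=\; \frac{\mathrm{MTF}^{2}(f)}{\bar{q}\,\cdot\,\mathrm{NNPS}(f)}
```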

  16. Performance Benchmarks for Scholarly Metrics Associated with Fisheries and Wildlife Faculty

    PubMed Central

    Swihart, Robert K.; Sundaram, Mekala; Höök, Tomas O.; DeWoody, J. Andrew; Kellner, Kenneth F.

    2016-01-01

    Research productivity and impact are often considered in professional evaluations of academics, and performance metrics based on publications and citations increasingly are used in such evaluations. To promote evidence-based and informed use of these metrics, we collected publication and citation data for 437 tenure-track faculty members at 33 research-extensive universities in the United States belonging to the National Association of University Fisheries and Wildlife Programs. For each faculty member, we computed 8 commonly used performance metrics based on numbers of publications and citations, and recorded covariates including academic age (time since Ph.D.), sex, percentage of appointment devoted to research, and the sub-disciplinary research focus. Standardized deviance residuals from regression models were used to compare faculty after accounting for variation in performance due to these covariates. We also aggregated residuals to enable comparison across universities. Finally, we tested for temporal trends in citation practices to assess whether the “law of constant ratios”, used to enable comparison of performance metrics between disciplines that differ in citation and publication practices, applied to fisheries and wildlife sub-disciplines when mapped to Web of Science Journal Citation Report categories. Our regression models reduced deviance by ¼ to ½. Standardized residuals for each faculty member, when combined across metrics as a simple average or weighted via factor analysis, produced similar results in terms of performance based on percentile rankings. Significant variation was observed in scholarly performance across universities, after accounting for the influence of covariates. In contrast to findings for other disciplines, normalized citation ratios for fisheries and wildlife sub-disciplines increased across years. Increases were comparable for all sub-disciplines except ecology. We discuss the advantages and limitations of our methods, illustrate their use when applied to new data, and suggest future improvements. Our benchmarking approach may provide a useful tool to augment detailed, qualitative assessment of performance. PMID:27152838
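
    The covariate-adjustment step can be sketched with standard tools. A hedged illustration on synthetic data (the authors' actual models and covariates are richer): fit a count regression, take deviance residuals as covariate-adjusted performance, and rank them.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical faculty data: publication counts plus covariates.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "pubs": rng.poisson(20, size=200),
    "age": rng.uniform(2, 35, size=200),            # years since Ph.D.
    "research_pct": rng.uniform(10, 90, size=200),  # % appointment in research
})

# Model expected productivity from covariates, then compare faculty via
# standardized deviance residuals (performance net of the covariates).
fit = smf.glm("pubs ~ age + research_pct", data=df,
              family=sm.families.Poisson()).fit()
resid = pd.Series(fit.resid_deviance)
df["std_resid"] = (resid - resid.mean()) / resid.std()
df["percentile"] = df["std_resid"].rank(pct=True) * 100
```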

  17. A benchmark for reaction coordinates in the transition path ensemble

    PubMed Central

    2016-01-01

    The molecular mechanism of a reaction is embedded in its transition path ensemble, the complete collection of reactive trajectories. Utilizing the information in the transition path ensemble alone, we developed a novel metric, which we termed the emergent potential energy, for distinguishing reaction coordinates from the bath modes. The emergent potential energy can be understood as the average energy cost for making a displacement of a coordinate in the transition path ensemble. Whereas displacing a bath mode incurs essentially no cost, moving the reaction coordinate costs significantly. Based on some general assumptions about the behavior of reaction and bath coordinates in the transition path ensemble, we proved theoretically with statistical mechanics that the emergent potential energy could serve as a benchmark of reaction coordinates and demonstrated its effectiveness by applying it to a prototypical system of biomolecular dynamics. Using the emergent potential energy as guidance, we developed a committor-free and intuition-independent method for identifying reaction coordinates in complex systems. We expect this method to be applicable to a wide range of reaction processes in complex biomolecular systems. PMID:27059559

  18. Best practices from WisDOT mega and ARRA projects--request for information: benchmarks and metrics.

    DOT National Transportation Integrated Search

    2012-03-01

    Successful highway construction is measured by cost, time, safety, and quality. One further measure of success is the quantity of Request for Information's (RFI) submitted and their impact. An RFI is a formal written procedure initiated by the contra...

  19. Local coding based matching kernel method for image classification.

    PubMed

    Song, Yan; McLoughlin, Ian Vince; Dai, Li-Rong

    2014-01-01

    This paper focuses on how to effectively and efficiently measure visual similarity for local feature-based representation. Among existing methods, metrics based on Bag of Visual Words (BoV) techniques are efficient and conceptually simple, at the expense of effectiveness. By contrast, kernel-based metrics are more effective, but at the cost of greater computational complexity and increased storage requirements. We show that a unified visual matching framework can be developed to encompass both BoV and kernel-based metrics, in which the local kernel plays an important role between feature pairs or between features and their reconstruction. Generally, local kernels are defined using Euclidean distance or its derivatives, based either explicitly or implicitly on an assumption of Gaussian noise. However, local features such as SIFT and HoG often follow a heavy-tailed distribution, which tends to undermine the motivation behind Euclidean metrics. Motivated by recent advances in feature coding techniques, a novel, efficient local coding based matching kernel (LCMK) method is proposed. This exploits the manifold structure in Hilbert space derived from local kernels. The proposed method combines the advantages of both BoV and kernel-based metrics and achieves linear computational complexity. This enables efficient and scalable visual matching to be performed on large-scale image sets. To evaluate the effectiveness of the proposed LCMK method, we conduct extensive experiments with widely used benchmark datasets, including 15-Scenes, Caltech101/256, and PASCAL VOC 2007 and 2011. Experimental results confirm the effectiveness of the relatively efficient LCMK method.

  1. [Clinical trial data management and quality metrics system].

    PubMed

    Chen, Zhao-hua; Huang, Qin; Deng, Ya-zhong; Zhang, Yue; Xu, Yu; Yu, Hao; Liu, Zong-fan

    2015-11-01

    A data quality management system is essential to ensure accurate, complete, consistent, and reliable data collection in clinical research. This paper is devoted to various choices of data quality metrics. They are categorized by study status, e.g. study start-up, conduct, and close-out. In each category, metrics for different purposes are listed according to ALCOA+ principles such as completeness, accuracy, timeliness, traceability, etc. Some frequently used general quality metrics are also introduced. This paper provides as much detail as possible for each metric, including its definition, purpose, evaluation, referenced benchmark, and recommended targets, in favor of real practice. It is important that sponsors and data management service providers establish a robust, integrated clinical trial data quality management system to ensure sustainably high quality of clinical trial deliverables. It will also support enterprise-level data evaluation and benchmarking of data quality across projects, sponsors, and data management service providers by using objective metrics from real clinical trials. We hope this will be a significant input to accelerate the improvement of clinical trial data quality in the industry.

  2. Automatic Keyword Extraction from Individual Documents

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rose, Stuart J.; Engel, David W.; Cramer, Nicholas O.

    2010-05-03

    This paper introduces a novel and domain-independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. We describe the method’s configuration parameters and algorithm, and present an evaluation on a benchmark corpus of technical abstracts. We also present a method for generating lists of stop words for specific corpora and domains, and evaluate its ability to improve keyword extraction on the benchmark corpus. Finally, we apply our method of automatic keyword extraction to a corpus of news articles and define metrics for characterizing the exclusivity, essentiality, and generality of extracted keywords within a corpus.
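
    A compact sketch of this style of extraction follows, with candidate phrases taken as stopword- and punctuation-delimited word sequences and scored by a common degree-to-frequency word score. The stopword list is abbreviated, and the paper's exact scoring may differ:

```python
import re
from collections import defaultdict

# Abbreviated stopword list; a real run would use a full, corpus-derived list.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of",
             "on", "over", "the", "to", "with"}

def extract_keywords(text, top_n=3):
    # Candidate phrases: maximal runs of non-stopwords inside each
    # punctuation-delimited fragment.
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # Word score: degree (co-occurrence within phrases, incl. self) / frequency.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}

    ranked = sorted(phrases, key=lambda p: sum(word_score[w] for w in p),
                    reverse=True)
    return [" ".join(p) for p in ranked[:top_n]]

print(extract_keywords("Compatibility of systems of linear constraints "
                       "over the set of natural numbers."))
```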

  3. Benchmarking the neurology practice.

    PubMed

    Henderson, William S

    2010-05-01

    A medical practice, whether operated by a solo physician or by a group, is a business. For a neurology practice to be successful, it must meet performance measures that ensure its viability. The best method of doing this is to benchmark the practice, both against itself over time and against other practices. Crucial medical practice metrics that should be measured are financial performance, staffing efficiency, physician productivity, and patient access. Such measures assist a physician or practice in achieving the goals and objectives that each determines are important to providing quality health care to patients. Copyright 2010 Elsevier Inc. All rights reserved.

  4. Propulsion Diagnostic Method Evaluation Strategy (ProDiMES) User's Guide

    NASA Technical Reports Server (NTRS)

    Simon, Donald L.

    2010-01-01

    This report is a User's Guide for the Propulsion Diagnostic Method Evaluation Strategy (ProDiMES). ProDiMES is a standard benchmarking problem and a set of evaluation metrics to enable the comparison of candidate aircraft engine gas path diagnostic methods. This Matlab (The Mathworks, Inc.) based software tool enables users to independently develop and evaluate diagnostic methods. Additionally, a set of blind test case data is distributed as part of the software. This will enable the side-by-side comparison of diagnostic approaches developed by multiple users. The User's Guide describes the various components of ProDiMES and provides instructions for the installation and operation of the tool.

  5. Validation and Verification of Operational Land Analysis Activities at the Air Force Weather Agency

    NASA Technical Reports Server (NTRS)

    Shaw, Michael; Kumar, Sujay V.; Peters-Lidard, Christa D.; Cetola, Jeffrey

    2012-01-01

    The NASA-developed Land Information System (LIS) is the Air Force Weather Agency's (AFWA) operational Land Data Assimilation System (LDAS), combining real-time precipitation observations and analyses, global forecast model data, vegetation, terrain, and soil parameters with the community Noah land surface model, along with other hydrology module options, to generate profile analyses of global soil moisture, soil temperature, and other important land surface characteristics. A range of satellite data products and surface observations is used to generate the land analysis products at global 1/4-degree spatial resolution, with model analyses generated at 3-hour intervals. AFWA recognizes the importance of operational benchmarking and uncertainty characterization for land surface modeling and is developing standard methods, software, and metrics to verify and/or validate LIS output products. To facilitate this and other needs for land analysis activities at AFWA, the Model Evaluation Toolkit (MET) -- a joint product of the National Center for Atmospheric Research Developmental Testbed Center (NCAR DTC), AFWA, and the user community -- and the Land surface Verification Toolkit (LVT), developed at the Goddard Space Flight Center (GSFC), have been adapted to the operational benchmarking needs of AFWA's land characterization activities.

  6. Applicability domains for classification problems: benchmarking of distance to models for AMES mutagenicity set

    EPA Science Inventory

    For QSAR and QSPR modeling of biological and physicochemical properties, estimating the accuracy of predictions is a critical problem. The “distance to model” (DM) can be defined as a metric that defines the similarity between the training set molecules and the test set compound ...

  7. UAV Cameras: Overview and Geometric Calibration Benchmark

    NASA Astrophysics Data System (ADS)

    Cramer, M.; Przybilla, H.-J.; Zurhorst, A.

    2017-08-01

    Different UAV platforms and sensors are already used in mapping, many of them equipped with (sometimes modified) cameras as known from the consumer market. Even though these systems normally fulfil their requested mapping accuracy, the question arises: which system performs best? This calls for a benchmark to check selected UAV-based camera systems in well-defined, reproducible environments. Such a benchmark is attempted in this work. Nine different cameras used on UAV platforms, representing typical camera classes, are considered. The focus here is on geometry, which is tightly linked to the process of geometric calibration of the system. In most applications the calibration is performed in-situ, i.e. calibration parameters are obtained as part of the project data itself. This is often motivated by the fact that consumer cameras do not maintain constant geometry and thus cannot be regarded as metric cameras. Still, some of the commercial systems are quite stable over time, as has been proven from repeated (terrestrial) calibration runs. Already (pre-)calibrated systems may offer advantages, especially when the block geometry of the project does not allow for a stable and sufficient in-situ calibration. Especially in such scenarios, close-to-metric UAV cameras may have advantages. Empirical airborne test flights in a calibration field have shown how block geometry influences the estimated calibration parameters and how consistently the parameters from lab calibration can be reproduced.
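
    Geometric calibration, whether in the lab or in-situ, estimates interior orientation (focal length, principal point) and lens distortion. A generic sketch using OpenCV's chessboard calibration; the image folder, board geometry, and square size are hypothetical, and the paper's benchmark uses photogrammetric calibration-field flights rather than this exact procedure:

```python
import glob
import cv2
import numpy as np

# Hypothetical calibration images of a 9x6 chessboard with 25 mm squares.
pattern, square_m = (9, 6), 0.025
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_m

obj_pts, img_pts, image_size = [], [], None
for path in glob.glob("calib_images/*.jpg"):  # hypothetical folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)
        image_size = gray.shape[::-1]

# Estimates the camera matrix and distortion coefficients; the RMS
# reprojection error indicates how stably "metric" the camera behaves.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, image_size, None, None)
print(f"RMS reprojection error: {rms:.3f} px")
```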

  8. Identifying Seizure Onset Zone From the Causal Connectivity Inferred Using Directed Information

    NASA Astrophysics Data System (ADS)

    Malladi, Rakesh; Kalamangalam, Giridhar; Tandon, Nitin; Aazhang, Behnaam

    2016-10-01

    In this paper, we developed a model-based and a data-driven estimator for directed information (DI) to infer the causal connectivity graph between electrocorticographic (ECoG) signals recorded from the brain and to identify the seizure onset zone (SOZ) in epileptic patients. Directed information, an information-theoretic quantity, is a general metric to infer causal connectivity between time series and is not restricted to a particular class of models, unlike the popular metrics based on Granger causality or transfer entropy. The proposed estimators are shown to be almost surely convergent. Causal connectivity between ECoG electrodes in five epileptic patients is inferred using the proposed DI estimators, after validating their performance on simulated data. We then proposed a model-based and a data-driven SOZ identification algorithm to identify the SOZ from the causal connectivity inferred using the model-based and data-driven DI estimators, respectively. The data-driven SOZ identification outperforms the model-based SOZ identification algorithm when benchmarked against visual analysis by a neurologist, the current clinical gold standard. The causal connectivity analysis presented here is a first step toward developing novel non-surgical treatments for epilepsy.
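
    For reference, the directed information from a length-N sequence X^N to Y^N follows Massey's definition as a sum of causally conditioned mutual information terms:

```latex
I\left(X^{N} \to Y^{N}\right) \;=\; \sum_{n=1}^{N} I\left(X^{n};\, Y_{n} \mid Y^{n-1}\right),
\qquad X^{n} = (X_{1}, \dots, X_{n})
```

    Unlike mutual information, this quantity is asymmetric in X and Y, which is what lets it orient the edges of the inferred connectivity graph.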

  9. The financial attractiveness assessment of large waste management projects registered as clean development mechanism.

    PubMed

    Bufoni, André Luiz; Oliveira, Luciano Basto; Rosa, Luiz Pinguelli

    2015-09-01

    This study examines the financial analyses used for the demonstration and assessment of additionality in the project design documents (PDD) and enclosed documentation of the 431 large Clean Development Mechanism (CDM) projects classified in the 'waste handling and disposal' sector (13) over the past ten years (2004-2014). The expected certified emission reductions (CER) of these projects total 63.54 million metric tons of CO2eq, with eight countries accounting for 311 projects and 43.36 million metric tons. All of the projects declare themselves 'not financially attractive' without CER revenue, with estimated negative results summing to approximately half a billion US$. The results indicate that WM benchmarks and indicators are converging and decreasing in variance, and the sensitivity analysis reveals that revenues have a greater effect on the financial results. This work concludes that an extensive financial database with simple standards for disclosure would greatly diminish statement problems and make information more comparable, reducing the risk and capital costs of WM projects. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Optimizing Blasting’s Air Overpressure Prediction Model using Swarm Intelligence

    NASA Astrophysics Data System (ADS)

    Nur Asmawisham Alel, Mohd; Ruben Anak Upom, Mark; Asnida Abdullah, Rini; Hazreek Zainal Abidin, Mohd

    2018-04-01

    Air overpressure (AOp) resulting from blasting can cause damage and nuisance to nearby civilians. Thus, it is important to be able to predict AOp accurately. In this study, 8 different Artificial Neural Network (ANN) models were developed to predict AOp. The ANN models were trained using different variants of the Particle Swarm Optimization (PSO) algorithm. AOp predictions were also made using an empirical equation, as suggested by the United States Bureau of Mines (USBM), to serve as a benchmark. In order to develop the models, 76 blasting operations in Hulu Langat were investigated. All the ANN models were found to outperform the USBM equation in three performance metrics: root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²). Using a performance ranking method, MSO-Rand-Mut was determined to be the best prediction model for AOp, with RMSE = 2.18, MAPE = 1.73%, and R² = 0.97. The result shows that ANN models trained using PSO are capable of predicting AOp with great accuracy.
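
    The three reported metrics are straightforward to compute. A minimal sketch with hypothetical measured and predicted AOp values:

```python
import numpy as np

def rmse(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mape(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs((y - yhat) / y)) * 100)  # in percent

def r2(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2))

# Hypothetical measured vs. predicted AOp values (dB)
measured  = [120.1, 115.4, 130.2, 118.7]
predicted = [121.0, 114.8, 128.9, 119.5]
print(rmse(measured, predicted), mape(measured, predicted), r2(measured, predicted))
```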

  11. Minimum Transendothelial Electrical Resistance Thresholds for the Study of Small and Large Molecule Drug Transport in a Human in Vitro Blood-Brain Barrier Model.

    PubMed

    Mantle, Jennifer L; Min, Lie; Lee, Kelvin H

    2016-12-05

    A human cell-based in vitro model that can accurately predict drug penetration into the brain, as well as metrics to assess these in vitro models, are valuable for the development of new therapeutics. Here, human induced pluripotent stem cells (hPSCs) are differentiated into a polarized monolayer that expresses blood-brain barrier (BBB)-specific proteins and has transendothelial electrical resistance (TEER) values greater than 2500 Ω·cm². By assessing the permeabilities of several known drugs, a benchmarking system to evaluate brain permeability of drugs was established. Furthermore, relationships between TEER and permeability to both small and large molecules were established, demonstrating that different minimum TEER thresholds must be achieved to study the brain transport of these two classes of drugs. This work demonstrates that this hPSC-derived BBB model exhibits an in vivo-like phenotype, and the benchmarks established here are useful for assessing the functionality of other in vitro BBB models.

  12. Design and Application of a Community Land Benchmarking System for Earth System Models

    NASA Astrophysics Data System (ADS)

    Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Koven, C. D.; Kluzek, E. B.; Mao, J.; Randerson, J. T.

    2015-12-01

    Benchmarking has been widely used to assess the ability of climate models to capture the spatial and temporal variability of observations during the historical era. For the carbon cycle and terrestrial ecosystems, the design and development of an open-source community platform has been an important goal as part of the International Land Model Benchmarking (ILAMB) project. Here we developed a new benchmarking software system that enables the user to specify the models, benchmarks, and scoring metrics, so that results can be tailored to specific model intercomparison projects. Evaluation data sets included soil and aboveground carbon stocks, fluxes of energy, carbon and water, burned area, leaf area, and climate forcing and response variables. We used this system to evaluate simulations from the 5th Phase of the Coupled Model Intercomparison Project (CMIP5) with prognostic atmospheric carbon dioxide levels over the period from 1850 to 2005 (i.e., esmHistorical simulations archived on the Earth System Grid Federation). We found that the multi-model ensemble had a high bias in incoming solar radiation across Asia, likely as a consequence of incomplete representation of aerosol effects in this region, and in South America, primarily as a consequence of a low bias in mean annual precipitation. The reduced precipitation in South America had a larger influence on gross primary production than the high bias in incoming light, and as a consequence gross primary production had a low bias relative to the observations. Although model to model variations were large, the multi-model mean had a positive bias in atmospheric carbon dioxide that has been attributed in past work to weak ocean uptake of fossil emissions. In mid latitudes of the northern hemisphere, most models overestimate latent heat fluxes in the early part of the growing season, and underestimate these fluxes in mid-summer and early fall, whereas sensible heat fluxes show the opposite trend.

  13. A comprehensive space management model for facilitating programmatic research.

    PubMed

    Libecap, Ann; Wormsley, Steven; Cress, Anne; Matthews, Mary; Souza, Angie; Joiner, Keith A

    2008-03-01

    In FY04, the authors developed and implemented models to manage existing and incremental research space, and to facilitate programmatic research, at the University of Arizona College of Medicine. Benchmarks were set for recovery of total sponsored research dollars and for facilities and administrative (F&A) dollars/net square foot (nsf) of space, based on college-wide metrics. Benchmarks were applied to units (departments, centers), rather than to individual faculty. Performance relative to the benchmark was assessed using three-year moving averages, and applied to existing blocks of space. Space was recaptured or allocated, in all cases to programmatic themes, using uniform policies. F&A revenues were returned on the basis of performance relative to a benchmark. During the first two years after implementation of the model (FY05 and FY06), and for the 24 units occupying research space, median total sponsored research revenue/nsf increased from $393.96 to $474.46 (20.4%), and median F&A revenue/nsf increased from $57.42 to $91.86 (60.0%). These large increases in median values are driven primarily from redistribution and recapturing of space. Recruiting policies for unit heads were developed to facilitate joint hires among units. In combination, these policies created a comprehensive space management model for facilitating programmatic research. Although challenges remain in implementing the programmatic recruitment strategy, and selected modifications to the original policy were introduced later (e.g., research space for newly recruited junior faculty is now exempted from calculations for three years), overall, the models have created a climate of transparency that is now accepted and that allows efficient and equitable management of research space.

  14. Emergency department performance measures updates: proceedings of the 2014 emergency department benchmarking alliance consensus summit.

    PubMed

    Wiler, Jennifer L; Welch, Shari; Pines, Jesse; Schuur, Jeremiah; Jouriles, Nick; Stone-Griffith, Suzanne

    2015-05-01

    The objective was to review and update key definitions and metrics for emergency department (ED) performance and operations. Forty-five emergency medicine leaders convened for the Third Performance Measures and Benchmarking Summit held in Las Vegas, February 21-22, 2014. Prior to arrival, attendees were assigned to workgroups to review, revise, and update the definitions and vocabulary being used to communicate about ED performance and operations. They were provided with the prior definitions of those consensus summits that were published in 2006 and 2010. Other published definitions from key stakeholders in emergency medicine and health care were also reviewed and circulated. At the summit, key terminology and metrics were discussed and debated. Workgroups communicated online, via teleconference, and finally in a face-to-face meeting to reach consensus regarding their recommendations. Recommendations were then posted and open to a 30-day comment period. Participants then reanalyzed the recommendations, and modifications were made based on consensus. A comprehensive dictionary of ED terminology related to ED performance and operation was developed. This article includes definitions of operating characteristics and internal and external factors relevant to the stratification and categorization of EDs. Time stamps, time intervals, and measures of utilization were defined. Definitions of processes and staffing measures are also presented. Definitions were harmonized with performance measures put forth by the Centers for Medicare and Medicaid Services (CMS) for consistency. Standardized definitions are necessary to improve the comparability of EDs nationally for operations research and practice. More importantly, clear precise definitions describing ED operations are needed for incentive-based pay-for-performance models like those developed by CMS. This document provides a common language for front-line practitioners, managers, health policymakers, and researchers. © 2015 by the Society for Academic Emergency Medicine.

  15. Kaiser Permanente's performance improvement system, Part 1: From benchmarking to executing on strategic priorities.

    PubMed

    Schilling, Lisa; Chase, Alide; Kehrli, Sommer; Liu, Amy Y; Stiefel, Matt; Brentari, Ruth

    2010-11-01

    By 2004, senior leaders at Kaiser Permanente, the largest not-for-profit health plan in the United States, recognizing variations across service areas in quality, safety, service, and efficiency, began developing a performance improvement (PI) system to realize best-in-class quality performance across all 35 medical centers. MEASURING SYSTEMWIDE PERFORMANCE: In 2005, a Web-based data dashboard, "Big Q," which tracks the performance of each medical center and service area against external benchmarks and internal goals, was created. PLANNING FOR PI AND BENCHMARKING PERFORMANCE: In 2006, Kaiser Permanente national and regional leaders continued planning the PI system, and in 2007, quality, medical group, operations, and information technology leaders benchmarked five high-performing organizations to identify the capabilities required to achieve consistent best-in-class organizational performance. THE PI SYSTEM: The PI system addresses six capabilities: leadership priority setting, a systems approach to improvement, measurement capability, a learning organization, improvement capacity, and a culture of improvement. PI "deep experts" (mentors) consult with national, regional, and local leaders, and more than 500 improvement advisors are trained to manage portfolios of 90-120 day improvement initiatives at medical centers. Between the second quarter of 2008 and the first quarter of 2009, performance across all Kaiser Permanente medical centers improved on the Big Q metrics. The lessons learned in implementing and sustaining PI as it becomes fully integrated into all levels of Kaiser Permanente can be generalized to other health care systems, hospitals, and other health care organizations.

  16. Evaluation of metrics for benchmarking antimicrobial use in the UK dairy industry.

    PubMed

    Mills, Harriet L; Turner, Andrea; Morgans, Lisa; Massey, Jonathan; Schubert, Hannah; Rees, Gwen; Barrett, David; Dowsey, Andrew; Reyher, Kristen K

    2018-03-31

    The issue of antimicrobial resistance is of global concern across human and animal health. In 2016, the UK government committed to new targets for reducing antimicrobial use (AMU) in livestock. Although a number of metrics for quantifying AMU are defined in the literature, all give slightly different interpretations. This paper evaluates a selection of metrics for AMU in the dairy industry: total mg, total mg/kg, daily dose and daily course metrics. Although the focus is on their application to the dairy industry, the metrics and issues discussed are relevant across livestock sectors. In order to be used widely, a metric should be understandable and relevant to the veterinarians and farmers who are prescribing and using antimicrobials. This means that clear methods, assumptions (and possible biases), standardised values and exceptions should be published for all metrics. Particularly relevant are assumptions around the number and weight of cattle at risk of treatment and definitions of dose rates and course lengths; incorrect assumptions can mean metrics over-represent or under-represent AMU. The authors recommend that the UK dairy industry work towards the UK-specific metrics using the UK-specific medicine dose and course regimens as well as cattle weights in order to monitor trends nationally. © British Veterinary Association (unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
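
    As an illustration of why the assumptions matter, here is a hedged sketch of a total mg/kg calculation. Every number, including the standard cow weight, is an assumption of exactly the kind the authors argue must be published alongside the metric:

```python
# Hedged sketch of a total mg/kg metric: total mg of active ingredient
# administered over a period, divided by the estimated liveweight at risk.
treatments_mg = [120_000, 45_000, 60_000]  # mg active ingredient per product
total_mg = sum(treatments_mg)

n_cattle_at_risk = 150                     # assumed herd size at risk
assumed_weight_kg = 425                    # assumed standard adult cow weight

# A different weight assumption directly rescales the reported AMU,
# which is one of the biases the paper highlights.
mg_per_kg = total_mg / (n_cattle_at_risk * assumed_weight_kg)
print(f"{mg_per_kg:.2f} mg/kg")
```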

  17. The mass storage testing laboratory at GSFC

    NASA Technical Reports Server (NTRS)

    Venkataraman, Ravi; Williams, Joel; Michaud, David; Gu, Heng; Kalluri, Atri; Hariharan, P. C.; Kobler, Ben; Behnke, Jeanne; Peavey, Bernard

    1998-01-01

    Industry-wide benchmarks exist for measuring the performance of processors (SPECmarks), and of database systems (Transaction Processing Council). Despite storage having become the dominant item in computing and IT (Information Technology) budgets, no such common benchmark is available in the mass storage field. Vendors and consultants provide services and tools for capacity planning and sizing, but these do not account for the complete set of metrics needed in today's archives. The availability of automated tape libraries, high-capacity RAID systems, and high- bandwidth interconnectivity between processor and peripherals has led to demands for services which traditional file systems cannot provide. File Storage and Management Systems (FSMS), which began to be marketed in the late 80's, have helped to some extent with large tape libraries, but their use has introduced additional parameters affecting performance. The aim of the Mass Storage Test Laboratory (MSTL) at Goddard Space Flight Center is to develop a test suite that includes not only a comprehensive check list to document a mass storage environment but also benchmark code. Benchmark code is being tested which will provide measurements for both baseline systems, i.e. applications interacting with peripherals through the operating system services, and for combinations involving an FSMS. The benchmarks are written in C, and are easily portable. They are initially being aimed at the UNIX Open Systems world. Measurements are being made using a Sun Ultra 170 Sparc with 256MB memory running Solaris 2.5.1 with the following configuration: 4mm tape stacker on SCSI 2 Fast/Wide; 4GB disk device on SCSI 2 Fast/Wide; and Sony Petaserve on Fast/Wide differential SCSI 2.

  18. Community-based benchmarking of the CMIP DECK experiments

    NASA Astrophysics Data System (ADS)

    Gleckler, P. J.

    2015-12-01

    A diversity of community-based efforts are independently developing "diagnostic packages" with little or no coordination between them. A short list of examples includes NCAR's Climate Variability Diagnostics Package (CVDP), ORNL's International Land Model Benchmarking (ILAMB), LBNL's Toolkit for Extreme Climate Analysis (TECA), PCMDI's Metrics Package (PMP), the EU EMBRACE ESMValTool, the WGNE MJO diagnostics package, and CFMIP diagnostics. The full value of these efforts cannot be realized without some coordination. As a first step, a WCRP effort has initiated a catalog to document candidate packages that could potentially be applied in a "repeat-use" fashion to all simulations contributed to the CMIP DECK (Diagnostic, Evaluation and Characterization of Klima) experiments. Some coordination of community-based diagnostics has the additional potential to improve how CMIP modeling groups analyze their simulations during model development. The fact that most modeling groups now maintain a "CMIP compliant" data stream means that, in principle, they could readily adopt a set of well-organized diagnostic capabilities specifically designed to operate on CMIP DECK experiments without much effort. Ultimately, a detailed listing of and access to analysis codes that are demonstrated to work "out of the box" with CMIP data could enable model developers (and others) to select the codes they wish to implement in-house, potentially enabling more systematic evaluation during the model development process.

  19. Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics.

    PubMed

    Mahmood, Khalid; Jung, Chol-Hee; Philip, Gayle; Georgeson, Peter; Chung, Jessica; Pope, Bernard J; Park, Daniel J

    2017-05-16

    Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as previously reported. The use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.
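
    The reported comparisons rest on the area under the receiver operating characteristic curve. A minimal sketch of how a tool would be scored against an assay-determined benchmark; the labels and scores are invented:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical assay-based benchmark: 1 = variant shown functionally
# damaging in the assay, 0 = tolerated; scores = a tool's predictions.
truth  = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
scores = [0.91, 0.65, 0.48, 0.72, 0.55, 0.20, 0.33, 0.10, 0.44, 0.80]

print(f"AUROC = {roc_auc_score(truth, scores):.2f}")
```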

  20. First Steps Toward a Quality of Climate Finance Scorecard (QUODA-CF): Creating a Comparative Index to Assess International Climate Finance Contributions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sierra, Katherine; Roberts, Timmons; de Nevers, Michele

    Are climate finance contributor countries, multilateral aid agencies, and specialized funds using widely accepted best practices in foreign assistance? How is it possible to measure and compare international climate finance contributions when there are as yet no established metrics or agreed definitions of the quality of climate finance? As a subjective metric, quality can mean different things to different stakeholders, and donor countries, recipients, and institutional actors may place quality across a broad spectrum of objectives. This subjectivity makes the assessment of the quality of climate finance contributions a useful and necessary exercise, but one that has many challenges. This work seeks to advance the development of common definitions and metrics for the quality of climate finance, to understand what we can about those areas where climate finance information is available, and to shine a light on the areas where there is a severe dearth of data. Allowing for comparisons of the use of best practices across funding institutions in the climate sector could begin a process of benchmarking performance, fostering learning across institutions, and driving improvements when incorporated into the internal evaluation protocols of those institutions. In the medium term, this kind of benchmarking and transparency could support fundraising in contributor countries and help build trust with recipient countries. As a feasibility study, this paper attempts to outline the importance of assessing international climate finance contributions while describing the difficulties in arriving at universally agreed measurements and indicators for assessment. In many cases, data are neither readily available nor complete, and there is no consensus on what should be included. A number of indicators are proposed in this study as a starting point with which to analyze voluntary contributions, but in some cases their methodologies are not complete, and further research is required for a robust measurement tool to be created.

  1. Comparing Two CBM Maze Selection Tools: Considering Scoring and Interpretive Metrics for Universal Screening

    ERIC Educational Resources Information Center

    Ford, Jeremy W.; Missall, Kristen N.; Hosp, John L.; Kuhle, Jennifer L.

    2016-01-01

    Advances in maze selection curriculum-based measurement have led to several published tools with technical information for interpretation (e.g., norms, benchmarks, cut-scores, classification accuracy) that have increased their usefulness for universal screening. A range of scoring practices have emerged for evaluating student performance on maze…

  2. Evaluating MoE and its Uncertainty and Variability for Food Contaminants (EuroTox presentation)

    EPA Science Inventory

    Margin of Exposure (MoE) is a metric for quantifying the relationship between exposure and hazard. Ideally, it is the ratio of the dose associated with hazard to an estimate of exposure. For example, hazard may be characterized by a benchmark dose (BMD), and, for food contami...
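
    In the notation of this abstract, the metric is simply the ratio

```latex
\mathrm{MoE} \;=\; \frac{\mathrm{BMD}}{\text{exposure estimate}}
```

    so larger values indicate a wider margin between the hazard-associated dose and the expected exposure.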

  3. Short-Term Solar Forecasting Performance of Popular Machine Learning Algorithms: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Florita, Anthony R; Elgindy, Tarek; Hodge, Brian S

    A framework for assessing the performance of short-term solar forecasting is presented in conjunction with a range of numerical results using global horizontal irradiation (GHI) from the open-source Surface Radiation Budget (SURFRAD) data network. A suite of popular machine learning algorithms is compared according to a set of statistically distinct metrics and benchmarked against the persistence-of-cloudiness forecast and a cloud motion forecast. Results show significant improvement compared to the benchmarks with trade-offs among the machine learning algorithms depending on the desired error metric. Training inputs include time series observations of GHI for a history of years, historical weather and atmospheric measurements, and corresponding date and time stamps such that training sensitivities might be inferred. Prediction outputs are GHI forecasts for 1, 2, 3, and 4 hours ahead of the issue time, and they are made for every month of the year for 7 locations. Photovoltaic power and energy outputs can then be made using the solar forecasts to better understand power system impacts.
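
    The persistence-of-cloudiness benchmark mentioned above is easy to state: assume the current clearness index (observed GHI divided by clear-sky GHI) persists, then rescale by the clear-sky irradiance expected at the forecast time. A minimal sketch with hypothetical values:

```python
def persistence_of_cloudiness(ghi_now, clearsky_now, clearsky_future):
    """Benchmark forecast: the current clearness index is assumed to
    persist to the forecast horizon, then rescaled by the clear-sky
    GHI expected at that future time."""
    kt = ghi_now / max(clearsky_now, 1e-6)  # current clearness index
    return kt * clearsky_future

# Hypothetical values in W/m^2: partly cloudy now, forecast 2 h ahead
print(persistence_of_cloudiness(ghi_now=420.0,
                                clearsky_now=700.0,
                                clearsky_future=550.0))  # -> 330.0
```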

  4. SU-F-T-231: Improving the Efficiency of a Radiotherapy Peer-Review System for Quality Assurance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hsu, S; Basavatia, A; Garg, M

    Purpose: To improve the efficiency of a radiotherapy peer-review system using a commercially available software application for plan quality evaluation and documentation. Methods: A commercial application, FullAccess (Radialogica LLC, Version 1.4.4), was implemented in a Citrix platform for the peer-review process and patient documentation. This application can display images, isodose lines, and dose-volume histograms and create plan reports for the peer-review process. Dose metrics in the report can also be benchmarked for plan quality evaluation. Site-specific templates were generated based on departmental treatment planning policies and procedures for each disease site, which generally follow RTOG protocols as well as published prospective clinical trial data, including both conventional fractionation and hypo-fractionation schema. Once a plan is ready for review, the planner exports the plan to FullAccess, applies the site-specific template, and presents the report for plan review. The plan is still reviewed in the treatment planning system, as that is the legal record. Upon the physician's approval of a plan, the plan is packaged for peer review with the plan report, and dose metrics are saved to the database. Results: The reports show dose metrics of PTVs and critical organs for the plans and also indicate whether or not the metrics are within tolerance. Graphical indicators (green, yellow, and red lights) show whether planning objectives have been met. In addition, benchmarking statistics are collected to see where the current plan falls compared with all historical plans on each metric. All physicians in peer review can easily verify constraints from these reports. Conclusion: We have demonstrated an improvement to a radiotherapy peer-review system that allows physicians to easily verify planning constraints for different disease sites and fractionation schema, allows for standardization in the clinic to ensure that departmental policies are maintained, and builds a comprehensive database for potential clinical outcome evaluation.

  5. Metrics for the Diurnal Cycle of Precipitation: Toward Routine Benchmarks for Climate Models

    DOE PAGES

    Covey, Curt; Gleckler, Peter J.; Doutriaux, Charles; ...

    2016-06-08

    In this paper, metrics are proposed (that is, a few summary statistics that condense large amounts of data from observations or model simulations) encapsulating the diurnal cycle of precipitation. Vector area averaging of Fourier amplitude and phase produces useful information in a reasonably small number of harmonic dial plots, a procedure familiar from atmospheric tide research. The metrics cover most of the globe but down-weight high-latitude wintertime ocean areas where baroclinic waves are most prominent. This enables intercomparison of a large number of climate models with observations and with each other. The diurnal cycle of precipitation has features not encountered in typical climate model intercomparisons, notably the absence of meaningful “average model” results that can be displayed in a single two-dimensional map. Displaying one map per model guides development of the metrics proposed here by making it clear that land and ocean areas must be averaged separately, but interpreting maps from all models becomes problematic as the size of a multimodel ensemble increases. Global diurnal metrics provide quick comparisons with observations and among models, using the most recent version of the Coupled Model Intercomparison Project (CMIP). This includes, for the first time in CMIP, spatial resolutions comparable to global satellite observations. Finally, consistent with earlier studies of resolution versus parameterization of the diurnal cycle, the longstanding tendency of models to produce rainfall too early in the day persists in the high-resolution simulations, as expected if the error is due to subgrid-scale physics.
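
    The vector-averaging step can be made concrete with a short sketch: extract the first diurnal Fourier harmonic of each grid cell's 24-hour composite, then average the complex amplitudes with cos(latitude) area weights so that amplitude and phase combine coherently. The composite construction and land/ocean masking are assumed to happen upstream; numpy is used here and the function names are illustrative.

        import numpy as np

        def first_diurnal_harmonic(hourly_composite):
            """Complex first harmonic of a 24-value mean diurnal cycle.
            abs() gives amplitude; the angle encodes the hour of maximum."""
            c = 2 * np.fft.rfft(hourly_composite)[1] / 24
            hour_of_max = (-np.angle(c)) * 24 / (2 * np.pi) % 24
            return c, np.abs(c), hour_of_max

        def vector_area_average(c_cells, lat_deg):
            """Area-weighted mean of the complex harmonics of many cells."""
            w = np.cos(np.deg2rad(lat_deg))   # area weight ~ cos(latitude)
            return np.sum(w * c_cells) / np.sum(w)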

  6. NASA Software Engineering Benchmarking Effort

    NASA Technical Reports Server (NTRS)

    Godfrey, Sally; Rarick, Heather

    2012-01-01

    Benchmarking was very interesting and provided a wealth of information: (1) we saw potential solutions to some of our "top 10" issues, and (2) we gained an assessment of where NASA stands in relation to other aerospace/defense groups. We formed new contacts and potential collaborations: (1) several organizations sent us examples of their templates and processes, and (2) many of the organizations were interested in future collaboration, such as sharing of training, metrics, Capability Maturity Model Integration (CMMI) appraisers, and instructors. We also received feedback from some of our contractors/partners: (1) they expressed a desire to participate in our training and to provide feedback on procedures, and (2) they welcomed the opportunity to provide feedback on working with NASA.

  7. Benchmarking Brain-Computer Interfaces Outside the Laboratory: The Cybathlon 2016

    PubMed Central

    Novak, Domen; Sigrist, Roland; Gerig, Nicolas J.; Wyss, Dario; Bauer, René; Götz, Ulrich; Riener, Robert

    2018-01-01

    This paper presents a new approach to benchmarking brain-computer interfaces (BCIs) outside the lab. A computer game was created that mimics a real-world application of assistive BCIs, with the main outcome metric being the time needed to complete the game. This approach was used at the Cybathlon 2016, a competition for people with disabilities who use assistive technology to achieve tasks. The paper summarizes the technical challenges of BCIs, describes the design of the benchmarking game, then describes the rules for acceptable hardware, software and inclusion of human pilots in the BCI competition at the Cybathlon. The 11 participating teams, their approaches, and their results at the Cybathlon are presented. Though the benchmarking procedure has some limitations (for instance, we were unable to identify any factors that clearly contribute to BCI performance), it can be successfully used to analyze BCI performance in realistic, less structured conditions. In the future, the parameters of the benchmarking game could be modified to better mimic different applications (e.g., the need to use some commands more frequently than others). Furthermore, the Cybathlon has the potential to showcase such devices to the general public. PMID:29375294

  8. MoleculeNet: a benchmark for molecular machine learning

    PubMed Central

    Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl

    2017-01-01

    Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm. PMID:29629118

  9. Hospital readiness for health information exchange: development of metrics associated with successful collaboration for quality improvement.

    PubMed

    Korst, Lisa M; Aydin, Carolyn E; Signer, Jordana M K; Fink, Arlene

    2011-08-01

    The development of readiness metrics for organizational participation in health information exchange is critical for monitoring progress toward, and achievement of, successful inter-organizational collaboration. In preparation for the development of a tool to measure readiness for data-sharing, we tested whether organizational capacities known to be related to readiness were associated with successful participation in an American data-sharing collaborative for quality improvement. The study used a cross-sectional design, with an online survey of hospitals in a large, mature data-sharing collaborative organized for benchmarking and improvement in nursing care quality. Factor analysis was used to identify salient constructs, and identified factors were analyzed with respect to "successful" participation. "Success" was defined as the incorporation of comparative performance data into the hospital dashboard. The most important factor in predicting success comprised survey items measuring the strength of organizational leadership in fostering a culture of quality improvement (QI Leadership): (1) presence of a supportive hospital executive; (2) the extent to which a hospital values data; (3) the presence of leaders' vision for how the collaborative advances the hospital's strategic goals; (4) hospital use of the collaborative data to track quality outcomes; and (5) staff recognition of a strong mandate for collaborative participation (α=0.84, correlation with Success 0.68 [P<0.0001]). The data emphasize the importance of hospital QI Leadership in collaboratives that aim to share data for QI or safety purposes. Such metrics should prove useful in the planning and development of this complex form of inter-organizational collaboration. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  10. An international land-biosphere model benchmarking activity for the IPCC Fifth Assessment Report (AR5)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoffman, Forrest M; Randerson, James T; Thornton, Peter E

    2009-12-01

    The need to capture important climate feedbacks in general circulation models (GCMs) has resulted in efforts to include atmospheric chemistry and land and ocean biogeochemistry in the next generation of production climate models, called Earth System Models (ESMs). While many terrestrial and ocean carbon models have been coupled to GCMs, recent work has shown that such models can yield a wide range of results (Friedlingstein et al., 2006). This work suggests that a more rigorous set of global offline and partially coupled experiments, along with detailed analyses of processes and comparisons with measurements, is needed. The Carbon-Land Model Intercomparison Project (C-LAMP) was designed to meet this need by providing a simulation protocol and model performance metrics based upon comparisons against best-available satellite- and ground-based measurements (Hoffman et al., 2007). Recently, a similar effort in Europe, called the International Land Model Benchmark (ILAMB) Project, was begun to assess the performance of European land surface models. These two projects will now serve as prototypes for a proposed international land-biosphere model benchmarking activity for those models participating in the IPCC Fifth Assessment Report (AR5). Initially used for model validation of terrestrial biogeochemistry models in the NCAR Community Land Model (CLM), C-LAMP incorporates a simulation protocol for both offline and partially coupled simulations using a prescribed historical trajectory of atmospheric CO2 concentrations. Models are confronted with data through comparisons against AmeriFlux site measurements, MODIS satellite observations, NOAA Globalview flask records, TRANSCOM inversions, and Free Air CO2 Enrichment (FACE) site measurements. Both sets of experiments have been performed using two different terrestrial biogeochemistry modules coupled to the CLM version 3 in the Community Climate System Model version 3 (CCSM3): the CASA model of Fung et al. and the carbon-nitrogen (CN) model of Thornton. Comparisons of the CLM3 offline results against observational datasets have been performed and are described in Randerson et al. (2009). CLM version 4 has been evaluated using C-LAMP, showing improvement in many of the metrics. Efforts are now underway to initiate a Nitrogen-Land Model Intercomparison Project (N-LAMP) to better constrain the effects of the nitrogen cycle in biosphere models. We will present new results from C-LAMP for CLM4, initial N-LAMP developments, and the proposed land-biosphere model benchmarking activity.

  11. Investigating emergency room service quality using lean manufacturing.

    PubMed

    Abdelhadi, Abdelhakim

    2015-01-01

    The purpose of this paper is to investigate a lean manufacturing metric called Takt time as a benchmark measure of a public hospital's service quality. Lean manufacturing is an established managerial philosophy with a proven track record in industry. The Takt time metric is applied to compare the relative efficiency of two emergency departments (EDs) belonging to the same public hospital; outcomes guide managers to improve patient services and increase hospital performance. The study focuses on patient treatment lead time within the hospital's two EDs (one department serves male and the other female patients). Findings show that Takt time can be used as an effective measure of service efficiency: analyzing relative efficiency identifies bottlenecks in different departments providing the same services. The paper presents a new procedure to compare relative efficiency between two EDs, and it can be applied to any healthcare facility.
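
    Takt time itself is a one-line calculation: available working time divided by the demand it must absorb. The sketch below uses hypothetical 24-hour figures to illustrate the comparison the paper describes; the numbers are not from the study.

        def takt_time(available_minutes, patients_served):
            """Pace (minutes per patient) required to keep up with demand."""
            return available_minutes / patients_served

        # Hypothetical daily demand for the two emergency departments.
        male_ed_takt = takt_time(24 * 60, 96)     # 15.0 minutes per patient
        female_ed_takt = takt_time(24 * 60, 80)   # 18.0 minutes per patient

    Comparing each department's measured treatment lead time against its takt time then highlights the relatively less efficient department and its bottlenecks.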

  12. U.S. Residential Photovoltaic (PV) System Prices, Q4 2013 Benchmarks: Cash Purchase, Fair Market Value, and Prepaid Lease Transaction Prices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davidson, C.; James, T. L.; Margolis, R.

    The price of photovoltaic (PV) systems in the United States (i.e., the cost to the system owner) has dropped precipitously in recent years, led by substantial reductions in global PV module prices. This report provides a Q4 2013 update for residential PV systems, based on an objective methodology that closely approximates the book value of a PV system. Several cases are benchmarked to represent common variation in business models, labor rates, and module choice. We estimate a weighted-average cash purchase price of $3.29/W for modeled standard-efficiency, polycrystalline-silicon residential PV systems installed in the United States. This is a 46% decline from the 2013-dollar-adjusted price reported in the Q4 2010 benchmark report. In addition, this report frames the cash purchase price in the context of key price metrics relevant to the continually evolving landscape of third-party-owned PV systems by benchmarking the minimum sustainable lease price and the fair market value of residential PV systems.

  13. The National Practice Benchmark for oncology, 2014 report on 2013 data.

    PubMed

    Towle, Elaine L; Barr, Thomas R; Senese, James L

    2014-11-01

    The National Practice Benchmark (NPB) is a unique tool to measure oncology practices against others across the country in a way that allows meaningful comparisons despite differences in practice size or setting. In today's economic environment every oncology practice, regardless of business structure or affiliation, should be able to produce, monitor, and benchmark basic metrics to meet current business pressures for increased efficiency and efficacy of care. Although we recognize that the NPB survey results do not capture the experience of all oncology practices, practices that can and do participate demonstrate exceptional managerial capability, and this year those practices are recognized for their participation. In this report, we continue to emphasize the methodology introduced last year in which we reported medical revenue net of the cost of the drugs as net medical revenue for the hematology/oncology product line. The effect of this is to capture only the gross margin attributable to drugs as revenue. New this year, we introduce six measures of clinical data density and expand the radiation oncology benchmarks. Copyright © 2014 by American Society of Clinical Oncology.

  14. On the relationship between tumour growth rate and survival in non-small cell lung cancer.

    PubMed

    Mistry, Hitesh B

    2017-01-01

    A recurrent question within oncology drug development is predicting phase III outcome for a new treatment using early clinical data. One approach to this problem has been to derive metrics from mathematical models that describe tumour size dynamics, termed re-growth rate and time to tumour re-growth. They have been shown to be strong predictors of overall survival in numerous studies, but there is debate about how these metrics are derived and whether they are more predictive than empirical end-points. This work explores the issues raised in using model-derived metrics as predictors for survival analyses. Re-growth rate and time to tumour re-growth were calculated for three large clinical studies by forward and reverse alignment. The latter involves re-aligning patients to their time of progression; hence, it accounts for the time taken to estimate re-growth rate and time to tumour re-growth, but it also assesses whether these predictors correlate with survival from the time of progression. I found that neither re-growth rate nor time to tumour re-growth correlated with survival using reverse alignment. This suggests that the dynamics of tumours up until disease progression have no relationship to survival post progression. For prediction of a phase III trial, I found the metrics performed no better than empirical end-points. These results highlight that care must be taken when relating the dynamics of tumour imaging to survival and that benchmarking new approaches against existing ones is essential.

  15. Informatics in radiology: Efficiency metrics for imaging device productivity.

    PubMed

    Hu, Mengqi; Pavlicek, William; Liu, Patrick T; Zhang, Muhong; Langer, Steve G; Wang, Shanshan; Place, Vicki; Miranda, Rafael; Wu, Teresa Tong

    2011-01-01

    Acute awareness of the costs associated with medical imaging equipment is an ever-present aspect of the current healthcare debate. However, the monitoring of productivity associated with expensive imaging devices is likely to be labor intensive, relies on summary statistics, and lacks accepted and standardized benchmarks of efficiency. In the context of the general Six Sigma DMAIC (define, measure, analyze, improve, and control) process, a World Wide Web-based productivity tool called the Imaging Exam Time Monitor was developed to accurately and remotely monitor imaging efficiency with use of Digital Imaging and Communications in Medicine (DICOM) combined with a picture archiving and communication system. Five device efficiency metrics (examination duration, table utilization, interpatient time, appointment interval time, and interseries time) were derived from DICOM values. These metrics allow the standardized measurement of productivity, to facilitate the comparative evaluation of imaging equipment use and ongoing efforts to improve efficiency. A relational database was constructed to store patient imaging data, along with device- and examination-related data. The database provides full access to ad hoc queries and can automatically generate detailed reports for administrative and business use, thereby allowing staff to monitor data for trends and to better identify possible changes that could lead to improved productivity and reduced costs in association with imaging services. © RSNA, 2011.
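
    A hedged sketch of how such metrics can be computed once acquisition timestamps are pulled from DICOM date (DA, YYYYMMDD) and time (TM, HHMMSS with optional fractional seconds) attributes; the attribute extraction itself is assumed to happen elsewhere, and the function names are illustrative rather than the tool's actual code.

        from datetime import datetime

        def parse_dicom_datetime(da, tm):
            """Combine DICOM DA ('20110315') and TM ('142530.125') values."""
            return datetime.strptime(da + tm.split(".")[0], "%Y%m%d%H%M%S")

        def examination_duration(acquisition_times):
            """First-to-last image acquisition within one examination."""
            return max(acquisition_times) - min(acquisition_times)

        def interpatient_time(previous_exam_end, next_exam_start):
            """Idle gap between consecutive examinations on one device."""
            return next_exam_start - previous_exam_end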

  16. Improving benchmarking by using an explicit framework for the development of composite indicators: an example using pediatric quality of care

    PubMed Central

    2010-01-01

    Background The measurement of healthcare provider performance is becoming more widespread. Physicians have been guarded about performance measurement, in part because the methodology for comparative measurement of care quality is underdeveloped. Comprehensive quality improvement will require comprehensive measurement, implying the aggregation of multiple quality metrics into composite indicators. Objective To present a conceptual framework to develop comprehensive, robust, and transparent composite indicators of pediatric care quality, and to highlight aspects specific to quality measurement in children. Methods We reviewed the scientific literature on composite indicator development, health systems, and quality measurement in the pediatric healthcare setting. Frameworks were selected for explicitness and applicability to a hospital-based measurement system. Results We synthesized various frameworks into a comprehensive model for the development of composite indicators of quality of care. Among its key premises, the model proposes identifying structural, process, and outcome metrics for each of the Institute of Medicine's six domains of quality (safety, effectiveness, efficiency, patient-centeredness, timeliness, and equity) and presents a step-by-step framework for embedding the quality of care measurement model into composite indicator development. Conclusions The framework presented offers researchers an explicit path to composite indicator development. Without a scientifically robust and comprehensive approach to measurement of the quality of healthcare, performance measurement will ultimately fail to achieve its quality improvement goals. PMID:20181129

  17. Measures of International Manufacturing and Trade of Clean Energy Technologies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Engel-Cox, Jill; Sandor, Debbie; Keyser, David

    The technologies that produce clean energy, such as solar photovoltaic panels and lithium ion batteries for electric vehicles, are globally manufactured and traded. As demand and deployment of these technologies grow exponentially, the innovation needed to reach significant economies of scale and drive down energy production costs lies less in the technology itself and more in its manufacturing. Manufacturing innovations and other manufacturing decisions can reduce the costs of labor, materials, equipment, operations, and transportation across all the links in the supply chain. To better understand the manufacturing aspect of the clean energy economy, we have developed key metrics for systematically measuring and benchmarking international manufacturing of clean energy technologies. The metrics are: trade, market size, manufacturing value-added, and manufacturing capacity and production. These metrics were applied to twelve global economies and four representative technologies: wind turbine components, crystalline silicon solar photovoltaic modules, vehicle lithium ion battery cells, and light emitting diode packages for efficient lighting and other consumer products. The results indicated that clean energy technologies are being developed via complex, dynamic, and global supply chains, with individual economies benefiting from different technologies and links in the supply chain, through both domestic manufacturing and global trade.

  18. Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking.

    PubMed

    Yu, Jun; Yang, Xiaokang; Gao, Fei; Tao, Dacheng

    2017-12-01

    How do we retrieve images accurately? And how do we rank a group of images precisely and efficiently for specific queries? These problems are critical for researchers and engineers developing a novel image search engine. First, it is important to obtain an appropriate description that effectively represents the images. In this paper, multimodal features are considered for describing images. The images' unique properties are reflected by visual features, which are correlated with each other. However, semantic gaps always exist between images' visual features and their semantics. Therefore, we utilize click features to reduce the semantic gap. The second key issue is learning an appropriate distance metric to combine these multimodal features. This paper develops a novel deep multimodal distance metric learning (Deep-MDML) method. A structured ranking model is adopted to utilize both visual and click features in distance metric learning (DML). Specifically, images and their related ranking results are first collected to form the training set. Multimodal features, including click and visual features, are collected with these images. Next, a group of autoencoders is applied to obtain an initial distance metric in different visual spaces, and an MDML method is used to assign optimal weights for the different modalities. Then, we conduct alternating optimization to train the ranking model, which is used for the ranking of new queries with click features. Compared with existing image ranking methods, the proposed method adopts a new ranking model that uses multimodal features, including click features and visual features, in DML. We conducted experiments to analyze the proposed Deep-MDML on two benchmark data sets, and the results validate the effectiveness of the method.

  1. Incorporating big data into treatment plan evaluation: Development of statistical DVH metrics and visualization dashboards.

    PubMed

    Mayo, Charles S; Yao, John; Eisbruch, Avraham; Balter, James M; Litzenberg, Dale W; Matuszak, Martha M; Kessler, Marc L; Weyburn, Grant; Anderson, Carlos J; Owen, Dawn; Jackson, William C; Haken, Randall Ten

    2017-01-01

    To develop statistical dose-volume histogram (DVH)-based metrics and a visualization method to quantify the comparison of treatment plans with historical experience and among different institutions. The descriptive statistical summary (i.e., median, first and third quartiles, and 95% confidence intervals) of volume-normalized DVH curve sets of past experiences was visualized through the creation of statistical DVH plots. Detailed distribution parameters were calculated and stored in JavaScript Object Notation files to facilitate management, including transfer and potential multi-institutional comparisons. In the treatment plan evaluation, structure DVH curves were scored against computed statistical DVHs and weighted experience scores (WESs). Individual, clinically used, DVH-based metrics were integrated into a generalized evaluation metric (GEM) as a priority-weighted sum of normalized incomplete gamma functions. Historical treatment plans for 351 patients with head and neck cancer, 104 with prostate cancer who were treated with conventional fractionation, and 94 with liver cancer who were treated with stereotactic body radiation therapy were analyzed to demonstrate the usage of statistical DVH, WES, and GEM in a plan evaluation. A shareable dashboard plugin was created to display statistical DVHs and to integrate GEM and WES scores into a clinical plan evaluation within the treatment planning system. Benchmarking with normal tissue complication probability scores was carried out to compare the behavior of GEM and WES scores. DVH curves from historical treatment plans were characterized and presented, with difficult-to-spare structures (i.e., frequently compromised organs at risk) identified. Quantitative evaluations by GEM and/or WES compared favorably with the normal tissue complication probability Lyman-Kutcher-Burman model, transforming a set of discrete threshold-priority limits into a continuous model reflecting physician objectives and historical experience. Statistical DVH offers an easy-to-read, detailed, and comprehensive way to visualize the quantitative comparison with historical experiences and among institutions. WES and GEM metrics offer a flexible means of incorporating discrete threshold-prioritizations and historic context into a set of standardized scoring metrics. Together, they provide a practical approach for incorporating big data into clinical practice for treatment plan evaluations.
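
    The abstract fixes the functional form of GEM (a priority-weighted sum of normalized incomplete gamma functions) but not its parameterization, so the shape parameter and threshold scaling below are assumptions; scipy.special.gammainc is the regularized (normalized) lower incomplete gamma function.

        import numpy as np
        from scipy.special import gammainc

        def gem_score(values, thresholds, priorities, shape=2.0):
            """Priority-weighted sum of normalized incomplete gamma terms.

            values     : achieved DVH metrics (e.g., mean dose, D95)
            thresholds : clinical limit for each metric
            priorities : clinical priority weight for each metric
            shape      : assumed gamma shape parameter (not from the paper)
            """
            x = np.asarray(values, dtype=float) / np.asarray(thresholds, dtype=float)
            terms = gammainc(shape, shape * x)   # smooth penalty in [0, 1)
            w = np.asarray(priorities, dtype=float)
            return float(np.sum(w * terms) / np.sum(w))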

  2. Benchmarking Big Data Systems and the BigData Top100 List.

    PubMed

    Baru, Chaitanya; Bhandarkar, Milind; Nambiar, Raghunath; Poess, Meikel; Rabl, Tilmann

    2013-03-01

    "Big data" has become a major force of innovation across enterprises of all sizes. New platforms with increasingly more features for managing big datasets are being announced almost on a weekly basis. Yet, there is currently a lack of any means of comparability among such platforms. While the performance of traditional database systems is well understood and measured by long-established institutions such as the Transaction Processing Performance Council (TCP), there is neither a clear definition of the performance of big data systems nor a generally agreed upon metric for comparing these systems. In this article, we describe a community-based effort for defining a big data benchmark. Over the past year, a Big Data Benchmarking Community has become established in order to fill this void. The effort focuses on defining an end-to-end application-layer benchmark for measuring the performance of big data applications, with the ability to easily adapt the benchmark specification to evolving challenges in the big data space. This article describes the efforts that have been undertaken thus far toward the definition of a BigData Top100 List. While highlighting the major technical as well as organizational challenges, through this article, we also solicit community input into this process.

  3. Closing the College Graduation Gap: National College Access and Success Benchmarking Report

    ERIC Educational Resources Information Center

    DeBaun, Bill; Melnick, Sara; Morgan, Elizabeth

    2016-01-01

    This report, the first of an annual series, establishes meaningful metrics about the outcomes of students served by college access and success programs. Using data collected from 24 college access programs, enrollment and graduation rates for the high school classes of 2007, 2008, and 2009 and an enrollment rate for the high school class of 2013…

  4. Framework for performance evaluation of face, text, and vehicle detection and tracking in video: data, metrics, and protocol.

    PubMed

    Kasturi, Rangachar; Goldgof, Dmitry; Soundararajan, Padmanabhan; Manohar, Vasant; Garofolo, John; Bowers, Rachel; Boonstra, Matthew; Korzhova, Valentina; Zhang, Jing

    2009-02-01

    Common benchmark data sets, standardized performance metrics, and baseline algorithms have demonstrated considerable impact on research and development in a variety of application domains. These resources provide both consumers and developers of technology with a common framework to objectively compare the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video: specifically for face, text, and vehicle objects. This framework includes the source video data, ground-truth annotations (along with guidelines for annotation), performance metrics, evaluation protocols, and tools including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each data clip is approximately 2.5 minutes long and has been completely spatially/temporally annotated at the I-frame level. Each task/domain, therefore, has an associated annotated corpus of approximately 450,000 frames. The scope of such annotation is unprecedented and was designed to begin to support the necessary quantities of data for robust machine learning approaches, as well as a statistically significant comparison of the performance of algorithms. The goal of this work was to systematically address the challenges of object detection and tracking through a common evaluation framework that permits a meaningful objective comparison of techniques, provides the research community with sufficient data for the exploration of automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes useful lasting resources of a scale and magnitude that will prove to be extremely useful to the computer vision research community for years to come.

  5. Revisiting the PLUMBER Experiments from a Process-Diagnostics Perspective

    NASA Astrophysics Data System (ADS)

    Nearing, G. S.; Ruddell, B. L.; Clark, M. P.; Nijssen, B.; Peters-Lidard, C. D.

    2017-12-01

    The PLUMBER benchmarking experiments [1] showed that some of the most sophisticated land models (CABLE, CH-TESSEL, COLA-SSiB, ISBA-SURFEX, JULES, Mosaic, Noah, ORCHIDEE) were outperformed - in simulations of half-hourly surface energy fluxes - by instantaneous, out-of-sample, and globally-stationary regressions with no state memory. One criticism of PLUMBER is that the benchmarking methodology was not derived formally, so that applying a similar methodology with different performance metrics can result in qualitatively different results. Another common criticism of model intercomparison projects in general is that they offer little insight into process-level deficiencies in the models, and therefore are of marginal value for helping to improve the models. We address both of these issues by proposing a formal benchmarking methodology that also yields a formal and quantitative method for process-level diagnostics. We apply this to the PLUMBER experiments to show that (1) the PLUMBER conclusions were generally correct - the models use only a fraction of the information available to them from met forcing data (<50% by our analysis), and (2) all of the land models investigated by PLUMBER have similar process-level error structures, and therefore together do not represent a meaningful sample of structural or epistemic uncertainty. We conclude by suggesting two ways to improve the experimental design of model intercomparison and/or model benchmarking studies like PLUMBER. First, PLUMBER did not report model parameter values, and it is necessary to know these values to separate parameter uncertainty from structural uncertainty. This is a first order requirement if we want to use intercomparison studies to provide feedback to model development. Second, technical documentation of land models is inadequate. Future model intercomparison projects should begin with a collaborative effort by model developers to document specific differences between model structures. This could be done in a reproducible way using a unified, process-flexible system like SUMMA [2]. [1] Best, M.J. et al. (2015) 'The plumbing of land surface models: benchmarking model performance', J. Hydrometeor. [2] Clark, M.P. et al. (2015) 'A unified approach for process-based hydrologic modeling: 1. Modeling concept', Water Resour. Res.

  6. Machine characterization based on an abstract high-level language machine

    NASA Technical Reports Server (NTRS)

    Saavedra-Barrera, Rafael H.; Smith, Alan Jay; Miya, Eugene

    1989-01-01

    Measurements are presented for a large number of machines ranging from small workstations to supercomputers. The authors combine these measurements into groups of parameters which relate to specific aspects of the machine implementation, and use these groups to provide overall machine characterizations. The authors also define the concept of pershapes, which represent the level of performance of a machine for different types of computation. A metric based on pershapes is introduced that provides a quantitative way of measuring how similar two machines are in terms of their performance distributions. The metric is related to the extent to which pairs of machines have varying relative performance levels depending on which benchmark is used.
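
    The abstract does not give the pershape metric's formula, so the following is only a plausible stand-in under that caveat: normalize each machine's per-benchmark performance vector into a distribution and score the overlap, which equals 1.0 when the two machines' relative performance levels coincide on every benchmark.

        import numpy as np

        def performance_similarity(perf_a, perf_b):
            """Overlap of two machines' normalized performance distributions."""
            a = np.asarray(perf_a, dtype=float)
            b = np.asarray(perf_b, dtype=float)
            a, b = a / a.sum(), b / b.sum()
            return 1.0 - 0.5 * np.abs(a - b).sum()   # total-variation overlap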

  7. Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking

    PubMed Central

    2012-01-01

    A key metric to assess molecular docking remains ligand enrichment against challenging decoys. Whereas the directory of useful decoys (DUD) has been widely used, clear areas for optimization have emerged. Here we describe an improved benchmarking set that includes more diverse targets such as GPCRs and ion channels, totaling 102 proteins with 22886 clustered ligands drawn from ChEMBL, each with 50 property-matched decoys drawn from ZINC. To ensure chemotype diversity, we cluster each target’s ligands by their Bemis–Murcko atomic frameworks. We add net charge to the matched physicochemical properties and include only the most dissimilar decoys, by topology, from the ligands. An online automated tool (http://decoys.docking.org) generates these improved matched decoys for user-supplied ligands. We test this data set by docking all 102 targets, using the results to improve the balance between ligand desolvation and electrostatics in DOCK 3.6. The complete DUD-E benchmarking set is freely available at http://dude.docking.org. PMID:22716043

  8. Benchmarking Global Food Safety Performances: The Era of Risk Intelligence.

    PubMed

    Le Vallée, Jean-Charles; Charlebois, Sylvain

    2015-10-01

    Food safety data segmentation and limitations hamper the world's ability to select, build up, monitor, and evaluate food safety performance. Currently, there is no metric that captures the entire food safety system, and performance data are not collected strategically on a global scale. Therefore, food safety benchmarking is essential not only to help monitor ongoing performance but also to inform continued food safety system design, adoption, and implementation toward more efficient and effective food safety preparedness, responsiveness, and accountability. This comparative study identifies and evaluates common elements among global food safety systems. It provides an overall world ranking of food safety performance for 17 Organisation for Economic Co-Operation and Development (OECD) countries, illustrated by 10 indicators organized across three food safety risk governance domains: risk assessment (chemical risks, microbial risks, and national reporting on food consumption), risk management (national food safety capacities, food recalls, food traceability, and radionuclides standards), and risk communication (allergenic risks, labeling, and public trust). Results show all countries have very high food safety standards, but Canada and Ireland, followed by France, earned excellent grades relative to their peers. However, any subsequent global ranking study should consider the development of survey instruments to gather adequate and comparable national evidence on food safety.

  9. Image segmentation with a novel regularized composite shape prior based on surrogate study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Tingting, E-mail: tingtingzhao@mednet.ucla.edu; Ruan, Dan, E-mail: druan@mednet.ucla.edu

    Purpose: Incorporating training into image segmentation is a good approach to achieve additional robustness. This work aims to develop an effective strategy to utilize shape prior knowledge, so that the segmentation label evolution can be driven toward the desired global optimum. Methods: In the variational image segmentation framework, a regularization for the composite shape prior is designed to incorporate the geometric relevance of individual training data to the target, which is inferred by an image-based surrogate relevance metric. Specifically, this regularization is imposed on the linear weights of composite shapes and serves as a hyperprior. The overall problem is formulated in a unified optimization setting and a variational block-descent algorithm is derived. Results: The performance of the proposed scheme is assessed in both corpus callosum segmentation from an MR image set and clavicle segmentation based on CT images. The resulting shape composition provides a proper preference for the geometrically relevant training data. A paired Wilcoxon signed rank test demonstrates statistically significant improvement of image segmentation accuracy when compared to a multiatlas label fusion method and three other benchmark active contour schemes. Conclusions: This work has developed a novel composite shape prior regularization, which achieves superior segmentation performance over typical benchmark schemes.

  10. Student Progress to Graduation in New York City High Schools: A Metric Designed by New Visions for Public Schools. Part I: Core Components

    ERIC Educational Resources Information Center

    Fairchild, Susan; Gunton, Brad; Donohue, Beverly; Berry, Carolyn; Genn, Ruth; Knevals, Jessica

    2011-01-01

    Students who achieve critical academic benchmarks such as high attendance rates, continuous levels of credit accumulation, and high grades have a greater likelihood of success throughout high school and beyond. However, keeping students on track toward meeting graduation requirements and quickly identifying students who are at risk of falling off…

  11. Demand Forecasting: An Evaluation of DODs Accuracy Metric and Navys Procedures

    DTIC Science & Technology

    2016-06-01

    Keywords: inventory management improvement plan, mean of absolute scaled error, lead-time adjusted squared error, forecast accuracy, benchmarking, naïve method.

  12. Model evaluation using a community benchmarking system for land surface models

    NASA Astrophysics Data System (ADS)

    Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Kluzek, E. B.; Koven, C. D.; Randerson, J. T.

    2014-12-01

    Evaluation of atmosphere, ocean, sea ice, and land surface models is an important step in identifying deficiencies in Earth system models and developing improved estimates of future change. For the land surface and carbon cycle, the design of an open-source system has been an important objective of the International Land Model Benchmarking (ILAMB) project. Here we evaluated CMIP5 and CLM models using a benchmarking system that enables users to specify models, data sets, and scoring systems so that results can be tailored to specific model intercomparison projects. Our scoring system used information from four different aspects of global datasets, including climatological mean spatial patterns, seasonal cycle dynamics, interannual variability, and long-term trends. Variable-to-variable comparisons enable investigation of the mechanistic underpinnings of model behavior, and allow for some control of biases in model drivers. Graphics modules allow users to evaluate model performance at local, regional, and global scales. Use of modular structures makes it relatively easy for users to add new variables, diagnostic metrics, benchmarking datasets, or model simulations. Diagnostic results are automatically organized into HTML files, so users can conveniently share results with colleagues. We used this system to evaluate atmospheric carbon dioxide, burned area, global biomass and soil carbon stocks, net ecosystem exchange, gross primary production, ecosystem respiration, terrestrial water storage, evapotranspiration, and surface radiation from CMIP5 historical and ESM historical simulations. We found that the multi-model mean often performed better than many of the individual models for most variables. We plan to publicly release a stable version of the software during fall of 2014 that has land surface, carbon cycle, hydrology, radiation and energy cycle components.
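
    A minimal sketch of folding the four scored aspects into one per-variable score, as described above; the equal default weighting and the dictionary keys are assumptions, not the actual ILAMB configuration.

        import numpy as np

        def variable_score(aspect_scores, weights=None):
            """Combine aspect scores (each in [0, 1]) for one variable.

            aspect_scores : e.g. {'mean_state': 0.7, 'seasonal_cycle': 0.6,
                            'interannual_variability': 0.5, 'trend': 0.8}
            """
            keys = sorted(aspect_scores)
            s = np.array([aspect_scores[k] for k in keys], dtype=float)
            w = (np.ones_like(s) if weights is None
                 else np.array([weights[k] for k in keys], dtype=float))
            return float(np.sum(w * s) / np.sum(w))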

  13. BIOSSES: a semantic sentence similarity estimation system for the biomedical domain.

    PubMed

    Sogancioglu, Gizem; Öztürk, Hakime; Özgür, Arzucan

    2017-07-15

    The amount of information available in textual format is rapidly increasing in the biomedical domain. Therefore, natural language processing (NLP) applications are becoming increasingly important to facilitate the retrieval and analysis of these data. Computing the semantic similarity between sentences is an important component in many NLP tasks, including text retrieval and summarization. A number of approaches have been proposed for semantic sentence similarity estimation for generic English. However, our experiments showed that such approaches do not effectively cover biomedical knowledge and produce poor results for biomedical text. We propose several approaches for sentence-level semantic similarity computation in the biomedical domain, including string similarity measures and measures based on the distributed vector representations of sentences learned in an unsupervised manner from a large biomedical corpus. In addition, ontology-based approaches are presented that utilize general and domain-specific ontologies. Finally, a supervised regression based model is developed that effectively combines the different similarity computation metrics. A benchmark data set consisting of 100 sentence pairs from the biomedical literature was manually annotated by five human experts and used for evaluating the proposed methods. The experiments showed that the supervised semantic sentence similarity computation approach obtained the best performance (0.836 correlation with gold standard human annotations) and improved over the state-of-the-art domain-independent systems by up to 42.6% in terms of the Pearson correlation metric. A web-based system for biomedical semantic sentence similarity computation, the source code, and the annotated benchmark data set are available at http://tabilab.cmpe.boun.edu.tr/BIOSSES/. Contact: gizemsogancioglu@gmail.com or arzucan.ozgur@boun.edu.tr. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  14. Exposure-response relationship and risk assessment for cognitive deficits in early welding-induced manganism.

    PubMed

    Park, Robert M; Bowler, Rosemarie M; Roels, Harry A

    2009-10-01

    The exposure-response relationship for manganese (Mn)-induced adverse nervous system effects is not well described. Symptoms and neuropsychological deficits associated with early manganism were previously reported for welders constructing bridge piers during 2003 to 2004. A reanalysis using improved exposure, work history information, and diverse exposure metrics is presented here. Ten neuropsychological performance measures were examined, including working memory index (WMI), verbal intelligence quotient, design fluency, Stroop color word test, Rey-Osterrieth Complex Figure, and Auditory Consonant Trigram tests. Mn blood levels and air sampling data in the form of both personal and area samples were available. The exposure metrics used were cumulative exposure to Mn, body burden assuming simple first-order kinetics for Mn elimination, and cumulative burden (effective dose). Benchmark doses were calculated. Burden with a half-life of about 150 days was the best predictor of blood Mn. WMI performance declined by 3.6 (normal = 100, SD = 15) for each 1.0 mg/m3 x mo exposure (P = 0.02, one tailed). At the group mean exposure metric (burden; half-life = 275 days), WMI performance was at the lowest 17th percentile of normal, and at the maximum observed metric, performance was at the lowest 2.5 percentiles. Four other outcomes also exhibited statistically significant associations (verbal intelligence quotient, verbal comprehension index, design fluency, Stroop color word test); no dose-rate effect was observed for three of the five outcomes. A risk assessment performed for the five stronger effects, choosing various percentiles of normal performance to represent impairment, identified benchmark doses for a 2-year exposure leading to 5% excess impairment prevalence in the range of 0.03 to 0.15 mg/m3, or 30 to 150 microg/m3, total Mn in air, levels that are far below those permitted by current occupational standards. More than one-third of workers would be impaired after working 2 years at 0.2 mg/m3 Mn (the current threshold limit value).

  15. In the pursuit of a semantic similarity metric based on UMLS annotations for articles in PubMed Central Open Access.

    PubMed

    Garcia Castro, Leyla Jael; Berlanga, Rafael; Garcia, Alexander

    2015-10-01

    Although full-text articles are provided by the publishers in electronic formats, it remains a challenge to find related work beyond the title and abstract context. Identifying related articles based on their abstract is indeed a good starting point; this process is straightforward and does not consume as many resources as full-text based similarity would require. However, further analyses may require in-depth understanding of the full content. Two articles with highly related abstracts can be substantially different regarding the full content. How similarity differs when considering title-and-abstract versus full-text, and which semantic similarity metric provides better results when dealing with full-text articles, are the main issues addressed in this manuscript. We have benchmarked three similarity metrics (BM25, PMRA, and Cosine) in order to determine which one performs best when using concept-based annotations on full-text documents. We also evaluated variations in similarity values based on title-and-abstract against those relying on full-text. Our test dataset comprises the Genomics track article collection from the 2005 Text Retrieval Conference. Initially, we used entity recognition software to semantically annotate titles and abstracts as well as full-text with concepts defined in the Unified Medical Language System (UMLS®). For each article, we created a document profile, i.e., a set of identified concepts, term frequency, and inverse document frequency; we then applied the various similarity metrics to those document profiles. We considered correlation, precision, recall, and F1 in order to determine which similarity metric performs best with concept-based annotations. For those full-text articles available in PubMed Central Open Access (PMC-OA), we also performed dispersion analyses in order to understand how similarity varies when considering full-text articles. We have found that the PubMed Related Articles similarity metric is the most suitable for full-text articles annotated with UMLS concepts. For similarity values above 0.8, all metrics exhibited an F1 around 0.2 and a recall around 0.1; BM25 showed the highest precision, close to 1; in all cases the concept-based metrics performed better than the word-stem-based one. Our experiments show that similarity values vary when considering only title-and-abstract versus full-text similarity. Therefore, analyses based on full-text become useful when a given research question requires going beyond title and abstract, particularly regarding connectivity across articles. Visualization available at ljgarcia.github.io/semsim.benchmark/, data available at http://dx.doi.org/10.5281/zenodo.13323. Copyright © 2015 Elsevier Inc. All rights reserved.
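
    Of the three metrics, the cosine variant is the simplest to state: treat each document profile as a weighted vector of UMLS concept identifiers (CUIs) and measure the angle between vectors. A minimal sketch follows; the CUIs and weights are hypothetical, and BM25 and PMRA use different weighting schemes not shown here.

        import math
        from collections import Counter

        def cosine_similarity(profile_a, profile_b):
            """Cosine similarity of two concept profiles (CUI -> weight)."""
            dot = sum(w * profile_b[c] for c, w in profile_a.items() if c in profile_b)
            norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
            norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
            return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

        doc1 = Counter({"C0017337": 2.1, "C0033684": 0.9})   # hypothetical weights
        doc2 = Counter({"C0017337": 1.4, "C0599894": 1.2})
        print(cosine_similarity(doc1, doc2))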

  16. Validation of neural spike sorting algorithms without ground-truth information.

    PubMed

    Barnett, Alex H; Magland, Jeremy F; Greengard, Leslie F

    2016-05-01

    The throughput of electrophysiological recording is growing rapidly, allowing thousands of simultaneous channels, and there is a growing variety of spike sorting algorithms designed to extract neural firing events from such data. This creates an urgent need for standardized, automatic evaluation of the quality of neural units output by such algorithms. We introduce a suite of validation metrics that assess the credibility of a given automatic spike sorting algorithm applied to a given dataset. By rerunning the spike sorter two or more times, the metrics measure stability under various perturbations consistent with variations in the data itself, making no assumptions about the internal workings of the algorithm, and minimal assumptions about the noise. We illustrate the new metrics on standard sorting algorithms applied to both in vivo and ex vivo recordings, including a time series with overlapping spikes. We compare the metrics to existing quality measures, and to ground-truth accuracy in simulated time series. We provide a software implementation. Metrics have until now relied on ground-truth, simulated data, internal algorithm variables (e.g. cluster separation), or refractory violations. By contrast, by standardizing the interface, our metrics assess the reliability of any automatic algorithm without reference to internal variables (e.g. feature space) or physiological criteria. Stability is a prerequisite for reproducibility of results. Such metrics could reduce the significant human labor currently spent on validation, and should form an essential part of large-scale automated spike sorting and systematic benchmarking of algorithms. Copyright © 2016 Elsevier B.V. All rights reserved.
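
    The matching step behind such run-to-run stability metrics can be sketched as follows: given unit labels from two runs of a sorter over the same detected events, match units by maximal overlap and report the fraction of events on which the runs agree. This is a simplification of the paper's approach, which perturbs the recording itself; aligning events across runs is assumed solved here.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def run_to_run_agreement(labels_a, labels_b):
            """Best-case fraction of events assigned to matched units."""
            a, b = np.asarray(labels_a), np.asarray(labels_b)
            units_a, units_b = np.unique(a), np.unique(b)
            confusion = np.array([[np.sum((a == ua) & (b == ub)) for ub in units_b]
                                  for ua in units_a])
            rows, cols = linear_sum_assignment(-confusion)   # maximize overlap
            return confusion[rows, cols].sum() / a.size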

  17. Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances.

    PubMed

    Sáez, Carlos; Robles, Montserrat; García-Gómez, Juan M

    2017-02-01

    Biomedical data may be composed of individuals generated from distinct, meaningful sources. Due to possible contextual biases in the processes that generate data, there may exist an undesirable and unexpected variability among the probability distribution functions (PDFs) of the source subsamples, which, when uncontrolled, may lead to inaccurate or unreproducible research results. Classical statistical methods may have difficulty uncovering such variability when dealing with multi-modal, multi-type, multivariate data. This work proposes two metrics for the analysis of stability among multiple data sources, robust to the aforementioned conditions and defined in the context of data quality assessment: a global probabilistic deviation metric and a source probabilistic outlyingness metric. The first provides a bounded degree of the global multi-source variability, designed as an estimator equivalent to the notion of normalized standard deviation of PDFs. The second provides a bounded degree of the dissimilarity of each source to a latent central distribution. The metrics are based on the projection of a simplex geometrical structure constructed from the Jensen-Shannon distances among the sources' PDFs. The metrics have been evaluated and demonstrated their correct behaviour on a simulated benchmark and with real multi-source biomedical data using the UCI Heart Disease data set. Biomedical data quality assessment based on the proposed stability metrics may improve the efficiency and effectiveness of biomedical data exploitation and research.
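
    The building block of both metrics is the matrix of pairwise Jensen-Shannon distances among the source PDFs, sketched below over a common discretized support; the simplex projection built on top of these distances is not reproduced here.

        import numpy as np
        from scipy.spatial.distance import jensenshannon

        def pairwise_js_distances(source_pdfs):
            """Symmetric matrix of JS distances (base 2, bounded by 1)."""
            n = len(source_pdfs)
            d = np.zeros((n, n))
            for i in range(n):
                for j in range(i + 1, n):
                    d[i, j] = d[j, i] = jensenshannon(source_pdfs[i],
                                                      source_pdfs[j], base=2)
            return d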

  18. An Unbiased Method To Build Benchmarking Sets for Ligand-Based Virtual Screening and its Application To GPCRs

    PubMed Central

    2015-01-01

    Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus has been placed on structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is a great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date, ready-to-apply data sets for LBVS are fairly limited, and the direct use of benchmarking sets designed for SBVS could introduce biases into the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCR targets. To be more specific, our method can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize the spatially random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the “artificial enrichment” and “analogue bias” of a published GPCR benchmarking set, i.e., the GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed the important issue of the ratio of decoys per ligand and found that within a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD. PMID:24749745

  19. An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs.

    PubMed

    Xia, Jie; Jin, Hongwei; Liu, Zhenming; Zhang, Liangren; Wang, Xiang Simon

    2014-05-27

    Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus has been placed on structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is a great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date, ready-to-apply data sets for LBVS are fairly limited, and the direct use of benchmarking sets designed for SBVS could introduce biases into the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCR targets. To be more specific, our method can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize the spatially random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the "artificial enrichment" and "analogue bias" of a published GPCR benchmarking set, i.e., the GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed the important issue of the ratio of decoys per ligand and found that within a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD.
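
    The evaluation protocol described in these two records, leave-one-out cross-validation with the mean AUC of ROC curves, can be sketched in a few lines. The following Python example is a hedged illustration with synthetic fingerprints, not the authors' code; tanimoto stands in for any LBVS similarity function, and the 39:1 decoy-to-ligand ratio mirrors the GLL/GDD convention mentioned above.

        # LOO evaluation: each ligand in turn is the query; remaining ligands
        # are actives, decoys are inactives; performance is the mean ROC AUC.
        import numpy as np
        from sklearn.metrics import roc_auc_score

        def tanimoto(a, b):
            return np.sum(a & b) / max(np.sum(a | b), 1)

        def loo_mean_auc(ligands, decoys, similarity):
            aucs = []
            for q in range(len(ligands)):
                actives = [ligands[i] for i in range(len(ligands)) if i != q]
                scores = [similarity(ligands[q], x) for x in actives + list(decoys)]
                labels = [1] * len(actives) + [0] * len(decoys)
                aucs.append(roc_auc_score(labels, scores))
            return float(np.mean(aucs))

        rng = np.random.default_rng(1)
        ligands = rng.random((10, 128)) < 0.3   # toy binary fingerprints
        decoys = rng.random((390, 128)) < 0.15  # 39 decoys per ligand
        print(round(loo_mean_auc(ligands, decoys, tanimoto), 3))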

  20. Accuracy of ab initio electron correlation and electron densities in vanadium dioxide

    DOE PAGES

    Kylänpää, Ilkka; Balachandran, Janakiraman; Ganesh, Panchapakesan; ...

    2017-11-27

    Here, diffusion quantum Monte Carlo results are used as a reference to analyze properties related to phase stability and magnetism in vanadium dioxide computed with various formulations of density functional theory. We introduce metrics related to energetics, electron densities and spin densities that give us insight on both local and global variations in the antiferromagnetic M1 and R phases. Importantly, these metrics can address contributions arising from the challenging description of the 3d orbital physics in this material. We observe that the best description of energetics between the structural phases does not correspond to the best accuracy in the charge density, which is consistent with observations made recently by Medvedev et al. in the context of isolated atoms. However, we do find evidence that an accurate spin density connects to correct energetic ordering of different magnetic states in VO₂, although local, semilocal, and meta-GGA functionals tend to erroneously favor demagnetization of the vanadium sites. The recently developed SCAN functional stands out as remaining nearly balanced in terms of magnetization across the M1-R transition and correctly predicting the ground state crystal structure. In addition to ranking current density functionals, our reference energies and densities serve as important benchmarks for future functional development.

  1. Accuracy of ab initio electron correlation and electron densities in vanadium dioxide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kylänpää, Ilkka; Balachandran, Janakiraman; Ganesh, Panchapakesan

    Here, diffusion quantum Monte Carlo results are used as a reference to analyze properties related to phase stability and magnetism in vanadium dioxide computed with various formulations of density functional theory. We introduce metrics related to energetics, electron densities and spin densities that give us insight on both local and global variations in the antiferromagnetic M1 and R phases. Importantly, these metrics can address contributions arising from the challenging description of the 3d orbital physics in this material. We observe that the best description of energetics between the structural phases does not correspond to the best accuracy in the charge density, which is consistent with observations made recently by Medvedev et al. in the context of isolated atoms. However, we do find evidence that an accurate spin density connects to correct energetic ordering of different magnetic states in VO₂, although local, semilocal, and meta-GGA functionals tend to erroneously favor demagnetization of the vanadium sites. The recently developed SCAN functional stands out as remaining nearly balanced in terms of magnetization across the M1-R transition and correctly predicting the ground state crystal structure. In addition to ranking current density functionals, our reference energies and densities serve as important benchmarks for future functional development.

  2. Intelligent Agent Architectures: Reactive Planning Testbed

    NASA Technical Reports Server (NTRS)

    Rosenschein, Stanley J.; Kahn, Philip

    1993-01-01

    An Integrated Agent Architecture (IAA) is a framework or paradigm for constructing intelligent agents. Intelligent agents are collections of sensors, computers, and effectors that interact with their environments in real time in goal-directed ways. Because of the complexity involved in designing intelligent agents, it has been found useful to approach the construction of agents with some organizing principle, theory, or paradigm that gives shape to the agent's components and structures their relationships. Given the wide variety of approaches being taken in the field, the question naturally arises: Is there a way to compare and evaluate these approaches? The purpose of the present work is to develop common benchmark tasks and evaluation metrics to which intelligent agents, including complex robotic agents, constructed using various architectural approaches can be subjected.

  3. Identification of metabolic pathways using pathfinding approaches: a systematic review.

    PubMed

    Abd Algfoor, Zeyad; Shahrizal Sunar, Mohd; Abdullah, Afnizanfaizal; Kolivand, Hoshang

    2017-03-01

    Metabolic pathways have become increasingly available for various microorganisms. Such pathways have spurred the development of a wide array of computational tools, in particular, mathematical pathfinding approaches. This article facilitates the understanding of computational analysis of metabolic pathways in genomics. Moreover, stoichiometric and pathfinding approaches in metabolic pathway analysis are discussed. Three major types of studies are elaborated: stoichiometric identification models, pathway-based graph analysis, and pathfinding approaches in cellular metabolism. Furthermore, evaluation of pathway outcomes with mathematical benchmarking metrics is provided. This review should lead to a better understanding of metabolic behavior in living cells from the perspective of computational pathfinding approaches. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
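
    As a minimal illustration of the pathfinding approaches this review surveys, the following Python sketch runs a weighted shortest-path search over a toy metabolite graph; the compounds and edge weights are invented for the example, and real applications typically weight edges by atom-mapping or thermodynamic penalties.

        # Shortest-path search over a toy metabolic network with networkx.
        import networkx as nx

        G = nx.DiGraph()
        G.add_weighted_edges_from([
            ("glucose", "g6p", 1.0),
            ("g6p", "f6p", 1.0),
            ("f6p", "fbp", 1.0),
            ("fbp", "pyruvate", 3.0),
            ("g6p", "6pg", 1.5),      # pentose phosphate branch
            ("6pg", "pyruvate", 4.0),
        ])
        path = nx.shortest_path(G, "glucose", "pyruvate", weight="weight")
        print(path)  # ['glucose', 'g6p', 'f6p', 'fbp', 'pyruvate']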

  4. Complete graph model for community detection

    NASA Astrophysics Data System (ADS)

    Sun, Peng Gang; Sun, Xiya

    2017-04-01

    Community detection poses a number of challenging problems and has attracted attention for many years. This paper develops a new framework that measures the interior and the exterior of a community with the same metric, the complete graph model. In particular, the exterior is modeled as a complete bipartite graph. We partition a network into subnetworks by maximizing the difference between the interior and the exterior of the subnetworks. In addition, we compare our approach with several state-of-the-art methods on computer-generated networks based on the LFR benchmark as well as on real-world networks. The experimental results indicate that our approach obtains better results for community detection, is capable of splitting irregular networks, and achieves perfect results on the karate network and the dolphin network.
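
    The comparison protocol on LFR benchmarks can be sketched briefly. The Python example below is an illustration only, with greedy modularity maximization standing in for the paper's complete-graph method: it generates an LFR graph with planted communities and checks how many communities a detector recovers.

        # Generate an LFR benchmark graph and compare a detected partition
        # against the planted ground-truth communities.
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        G = nx.LFR_benchmark_graph(250, 3, 1.5, 0.1, average_degree=5,
                                   min_community=20, seed=10)
        truth = {frozenset(G.nodes[v]["community"]) for v in G}
        found = greedy_modularity_communities(G)
        print(f"planted: {len(truth)} communities, detected: {len(found)}")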

  5. Benchmarking Terrestrial Ecosystem Models in the South Central US

    NASA Astrophysics Data System (ADS)

    Kc, M.; Winton, K.; Langston, M. A.; Luo, Y.

    2016-12-01

    Ecosystem services and products are the foundation of sustainability for the regional and global economy, since we depend directly or indirectly on ecosystem services such as food, livestock, water, air, and wildlife. It has been increasingly recognized that, for sustainability concerns, conservation problems need to be addressed in the context of entire ecosystems. This approach is even more vital in the 21st century, with a rapidly increasing human population and rapid changes in the global environment. This study was conducted to assess the state of the science of ecosystem models in the South-Central region of the US. The ecosystem models were benchmarked using the ILAMB diagnostic package, developed as a result of the International Land Model Benchmarking (ILAMB) project, in four main categories: Ecosystem and Carbon Cycle, Hydrology Cycle, Radiation and Energy Cycle, and Climate Forcings. A cumulative assessment was generated by weighting seven different skill assessment metrics for the ecosystem models. This synthesis of the current state of the science of ecosystem modeling in the South-Central US will be highly useful for coupling these models with climate, agronomic, hydrologic, economic, or management models to better represent ecosystem dynamics as affected by climate change and human activities, and hence to obtain more reliable predictions of future ecosystem functions and services in the region. Better understanding of such processes will increase our ability to predict ecosystem responses and feedbacks to environmental and human-induced change in the region, so that decision makers can make informed management decisions about the ecosystem.

  6. Toward Alternative Metrics for Measuring Performance within Operational Contracting Squadrons: An Application of Benchmarking Techniques

    DTIC Science & Technology

    1993-09-01


  7. Increasing awareness among fluid milk processors of the economic feasibility of energy efficiency projects, and encouraging their adoption through access to benchmarking and other decision-support tools

    USDA-ARS?s Scientific Manuscript database

    Based on a study done by Thoma et al. (2010) the energy used in fluid milk processing in the United States of America is responsible for approximately 2 million metric tons of greenhouse gas (GHG) emissions within the total life cycle of milk. These emissions come from electricity use (about 75 perc...

  8. Influence of sediment chemistry and sediment toxicity on macroinvertebrate communities across 99 wadable streams of the Midwestern USA

    USGS Publications Warehouse

    Moran, Patrick W.; Nowell, Lisa H.; Kemble, Nile E.; Mahler, Barbara J.; Waite, Ian R.; Van Metre, Peter C.

    2017-01-01

    Simultaneous assessment of sediment chemistry, sediment toxicity, and macroinvertebrate communities can provide multiple lines of evidence when investigating relations between sediment contaminants and ecological degradation. These three measures were evaluated at 99 wadable stream sites across 11 states in the Midwestern United States during the summer of 2013 to assess sediment pollution across a large agricultural landscape. This evaluation considers an extensive suite of sediment chemistry analyses totaling 274 analytes (polycyclic aromatic hydrocarbons, organochlorine compounds, polychlorinated biphenyls, polybrominated diphenyl ethers, trace elements, and current-use pesticides) and a mixture assessment based on the ratios of detected compounds to available effects-based benchmarks. The sediments were tested for toxicity with the amphipod Hyalella azteca (28-d exposure), the midge Chironomus dilutus (10-d), and, at a few sites, the freshwater mussel Lampsilis siliquoidea (28-d). Sediment concentrations, normalized to organic carbon content, infrequently exceeded benchmarks for aquatic health, which was generally consistent with low rates of observed toxicity. However, the benchmark-based mixture score and the pyrethroid insecticide bifenthrin were significantly related to observed sediment toxicity. The sediment mixture score and bifenthrin were also significant predictors of the upper limits of several univariate measures of the macroinvertebrate community (EPT percent, MMI (Macroinvertebrate Multimetric Index) score, and Ephemeroptera and Trichoptera richness) using quantile regression. Multivariate pattern matching (Mantel-like tests) of macroinvertebrate species per site against the identified contaminant metrics and sediment toxicity also indicates that the sediment mixture score and bifenthrin have a weak, albeit significant, influence on the observed invertebrate community composition. Together, these three lines of evidence (toxicity tests, univariate metrics, and multivariate community analysis) suggest that elevated contaminant concentrations in sediments, in particular bifenthrin, are limiting macroinvertebrate communities in several of these Midwestern streams.
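
    A benchmark-ratio mixture score of the general kind described above can be sketched simply; in the Python example below the analytes, concentrations, and benchmark values are invented for illustration, and the study's exact formulation may differ.

        # Mixture score = sum of detected concentration / benchmark ratios.
        def mixture_score(concentrations, benchmarks):
            return sum(c / benchmarks[a] for a, c in concentrations.items()
                       if a in benchmarks and c > 0)

        site = {"bifenthrin": 4.2, "chlorpyrifos": 0.8, "PCB_total": 12.0}
        bench = {"bifenthrin": 0.5, "chlorpyrifos": 2.0, "PCB_total": 676.0}
        print(round(mixture_score(site, bench), 2))  # dominated by bifenthrin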

  9. Evaluation of the Relative Citation Ratio, a New National Institutes of Health-Supported Bibliometric Measure of Research Productivity, among Academic Radiation Oncologists.

    PubMed

    Rock, Calvin B; Prabhu, Arpan V; Fuller, C David; Thomas, Charles R; Holliday, Emma B

    2018-03-01

    Publication metrics are useful in evaluating academic faculty for awarding grants, recruitment, and promotion. A new metric, the relative citation ratio (RCR), was recently released by the National Institutes of Health (NIH); however, no benchmark data yet exist. We sought to create benchmark data for physician faculty in academic radiation oncology (RO) and analyze correlations associated with increased academic productivity. Citation database searches were performed for all US radiation oncologists affiliated with academic RO programs. Gender, NIH funding, career duration, academic rank, RCR, and weighted RCR were collected for each faculty member. RCR and weighted RCR were calculated and compared between each subgroup of interest. RCR percentiles were also created for reference. A total of 1,299 RO physician faculty members from 75 institutions were included in the analysis. Overall, RO physicians were very productive and influential, with a mean RCR of 1.57 ± 1.53 SD and a median RCR (interquartile range) of 1.32 (0.87-1.94). Academic rank, career duration, and NIH funding were associated with increased mean RCR and weighted RCR. Male gender and having a PhD were associated with an increased weighted RCR but not an increased mean RCR. Current academic radiation oncologists have a high mean RCR value relative to the benchmark NIH RCR value of 1. All subgroups analyzed had an RCR value above 1, with professors or chairs and those with previous NIH funding having the highest RCR and weighted RCR values overall. These data may be useful for self-evaluation by radiation oncologists as well as for evaluation of faculty by institutional and departmental leaders. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  10. On the distribution of career longevity and the evolution of home-run prowess in professional baseball

    NASA Astrophysics Data System (ADS)

    Petersen, Alexander M.; Jung, Woo-Sung; Stanley, H. Eugene

    2008-09-01

    Statistical analysis is a major aspect of baseball, from player averages to historical benchmarks and records. Much of baseball fanfare is based around players exceeding the norm, some in a single game and others over a long career. Career statistics serve as a metric for classifying players and establishing their historical legacy. However, the concept of records and benchmarks assumes that the level of competition in baseball is stationary in time. Here we show that power law probability density functions, a hallmark of many complex systems that are driven by competition, govern career longevity in baseball. We also find similar power laws in the density functions of all major performance metrics for pitchers and batters. The use of performance-enhancing drugs has a dark history, emerging as a problem for both amateur and professional sports. We find statistical evidence consistent with performance-enhancing drugs in the analysis of home runs hit by players in the last 25 years. This is corroborated by the findings of the Mitchell Report (2007), a two-year investigation into the use of illegal steroids in Major League Baseball, which recently revealed that over 5 percent of Major League Baseball players tested positive for performance-enhancing drugs in an anonymous 2003 survey.
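
    For readers who want to reproduce the flavor of such an analysis, the following Python sketch estimates a power-law exponent by maximum likelihood (the standard continuous-power-law estimator with lower cutoff xmin, not necessarily the authors' exact procedure) on synthetic career-length data.

        # MLE for a continuous power law: alpha = 1 + n / sum(ln(x / xmin)).
        import numpy as np

        def fit_powerlaw_alpha(x, xmin):
            tail = np.asarray(x, dtype=float)
            tail = tail[tail >= xmin]
            return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

        rng = np.random.default_rng(7)
        alpha_true, xmin = 2.0, 1.0
        careers = xmin * (1 - rng.random(10000)) ** (-1 / (alpha_true - 1))
        print(round(fit_powerlaw_alpha(careers, xmin), 2))  # close to 2.0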

  11. Large scale study of multiple-molecule queries

    PubMed Central

    2009-01-01

    Background In ligand-based screening, as well as in other chemoinformatics applications, one seeks to effectively search large repositories of molecules in order to retrieve molecules that are typically similar to a single lead molecule. However, in some cases, multiple molecules from the same family are available to seed the query and search for other members of the same family. Multiple-molecule query methods have been less studied than single-molecule query methods. Furthermore, previous studies have relied on proprietary data and sometimes have not used proper cross-validation methods to assess the results. In contrast, here we develop and compare multiple-molecule query methods using several large publicly available data sets and background sets. We also create a framework based on a strict cross-validation protocol to allow unbiased benchmarking for direct comparison in future studies across several performance metrics. Results Fourteen different multiple-molecule query methods were defined and benchmarked using: (1) 41 publicly available data sets of related molecules with similar biological activity; and (2) publicly available background data sets consisting of up to 175,000 molecules randomly extracted from the ChemDB database and other sources. Eight of the fourteen methods were parameter free, and six of them fit one or two free parameters to the data using a careful cross-validation protocol. All the methods were assessed and compared for their ability to retrieve members of the same family against the background data set by using several performance metrics including the Area Under the Accumulation Curve (AUAC), Area Under the Curve (AUC), F1-measure, and BEDROC metrics. Consistent with the previous literature, the best parameter-free methods are the MAX-SIM and MIN-RANK methods, which score a molecule against a family by the maximum similarity, or minimum ranking, obtained across the family. One new parameterized method introduced in this study and two previously defined methods, the Exponential Tanimoto Discriminant (ETD), the Tanimoto Power Discriminant (TPD), and the Binary Kernel Discriminant (BKD), outperform most other methods but are more complex, requiring one or two parameters to be fit to the data. Conclusion Fourteen methods for multiple-molecule querying of chemical databases, including the novel ETD and TPD methods, are validated using publicly available data sets, standard cross-validation protocols, and established metrics. The best results are obtained with ETD, TPD, BKD, MAX-SIM, and MIN-RANK. These results can be replicated and compared with the results of future studies using data freely downloadable from http://cdb.ics.uci.edu/. PMID:20298525
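
    The two best parameter-free methods named in the results are straightforward to sketch. The Python example below assumes binary fingerprints and Tanimoto similarity (an illustration, not the study's code): MAX-SIM scores a candidate by its maximum similarity to any family member, and MIN-RANK by the best rank it achieves in any member's similarity ranking.

        # MAX-SIM and MIN-RANK scoring for multiple-molecule queries.
        import numpy as np

        def tanimoto(a, b):
            return np.sum(a & b) / max(np.sum(a | b), 1)

        def max_sim(candidate, family):
            return max(tanimoto(candidate, f) for f in family)

        def min_rank(candidates, family):
            sims = np.array([[tanimoto(c, f) for c in candidates]
                             for f in family])
            ranks = (-sims).argsort(axis=1).argsort(axis=1)  # 0 = best
            return ranks.min(axis=0)  # best rank across the family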

  12. Structural Life and Reliability Metrics: Benchmarking and Verification of Probabilistic Life Prediction Codes

    NASA Technical Reports Server (NTRS)

    Litt, Jonathan S.; Soditus, Sherry; Hendricks, Robert C.; Zaretsky, Erwin V.

    2002-01-01

    Over the past two decades there has been considerable effort by NASA Glenn and others to develop probabilistic codes to predict with reasonable engineering certainty the life and reliability of critical components in rotating machinery and, more specifically, in the rotating sections of airbreathing and rocket engines. These codes have, to a very limited extent, been verified with relatively small bench rig type specimens under uniaxial loading. Because of the small and very narrow database the acceptance of these codes within the aerospace community has been limited. An alternate approach to generating statistically significant data under complex loading and environments simulating aircraft and rocket engine conditions is to obtain, catalog and statistically analyze actual field data. End users of the engines, such as commercial airlines and the military, record and store operational and maintenance information. This presentation describes a cooperative program between the NASA GRC, United Airlines, USAF Wright Laboratory, U.S. Army Research Laboratory and Australian Aeronautical & Maritime Research Laboratory to obtain and analyze these airline data for selected components such as blades, disks and combustors. These airline data will be used to benchmark and compare existing life prediction codes.

  13. An integrity measure to benchmark quantum error correcting memories

    NASA Astrophysics Data System (ADS)

    Xu, Xiaosi; de Beaudrap, Niel; O'Gorman, Joe; Benjamin, Simon C.

    2018-02-01

    Rapidly developing experiments across multiple platforms now aim to realise small quantum codes, and so demonstrate a memory within which a logical qubit can be protected from noise. There is a need to benchmark the achievements in these diverse systems, and to compare the inherent power of the codes they rely upon. We describe a recently introduced performance measure called integrity, which relates to the probability that an ideal agent will successfully ‘guess’ the state of a logical qubit after a period of storage in the memory. Integrity is straightforward to evaluate experimentally without state tomography and it can be related to various established metrics such as the logical fidelity and the pseudo-threshold. We offer a set of experimental milestones that are steps towards demonstrating unconditionally superior encoded memories. Using intensive numerical simulations we compare memories based on the five-qubit code, the seven-qubit Steane code, and a nine-qubit code which is the smallest instance of a surface code; we assess both the simple and fault-tolerant implementations of each. While the ‘best’ code upon which to base a memory does vary according to the nature and severity of the noise, nevertheless certain trends emerge.

  14. Quality Assurance Assessment of Diagnostic and Radiation Therapy–Simulation CT Image Registration for Head and Neck Radiation Therapy: Anatomic Region of Interest–based Comparison of Rigid and Deformable Algorithms

    PubMed Central

    Mohamed, Abdallah S. R.; Ruangskul, Manee-Naad; Awan, Musaddiq J.; Baron, Charles A.; Kalpathy-Cramer, Jayashree; Castillo, Richard; Castillo, Edward; Guerrero, Thomas M.; Kocak-Uzel, Esengul; Yang, Jinzhong; Court, Laurence E.; Kantor, Michael E.; Gunn, G. Brandon; Colen, Rivka R.; Frank, Steven J.; Garden, Adam S.; Rosenthal, David I.

    2015-01-01

    Purpose To develop a quality assurance (QA) workflow by using a robust, curated, manually segmented anatomic region-of-interest (ROI) library as a benchmark for quantitative assessment of different image registration techniques used for head and neck radiation therapy–simulation computed tomography (CT) with diagnostic CT coregistration. Materials and Methods Radiation therapy–simulation CT images and diagnostic CT images in 20 patients with head and neck squamous cell carcinoma treated with curative-intent intensity-modulated radiation therapy between August 2011 and May 2012 were retrospectively retrieved with institutional review board approval. Sixty-eight reference anatomic ROIs with gross tumor and nodal targets were then manually contoured on images from each examination. Diagnostic CT images were registered with simulation CT images rigidly and by using four deformable image registration (DIR) algorithms: atlas based, B-spline, demons, and optical flow. The resultant deformed ROIs were compared with manually contoured reference ROIs by using similarity coefficient metrics (ie, Dice similarity coefficient) and surface distance metrics (ie, 95% maximum Hausdorff distance). The nonparametric Steel test with control was used to compare different DIR algorithms with rigid image registration (RIR) by using the post hoc Wilcoxon signed-rank test for stratified metric comparison. Results A total of 2720 anatomic and 50 tumor and nodal ROIs were delineated. All DIR algorithms showed improved performance over RIR for anatomic and target ROI conformance, as shown for most comparison metrics (Steel test, P < .008 after Bonferroni correction). The performance of different algorithms varied substantially with stratification by specific anatomic structures or category and simulation CT section thickness. Conclusion Development of a formal ROI-based QA workflow for registration assessment demonstrated improved performance with DIR techniques over RIR. After QA, DIR implementation should be the standard for head and neck diagnostic CT and simulation CT alignment, especially for target delineation. © RSNA, 2014 Online supplemental material is available for this article. PMID:25380454
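
    For reference, the two headline comparison metrics used in this QA workflow can be sketched for binary masks and contour point sets; the Python example below is illustrative only and is not the authors' implementation.

        # Dice similarity coefficient and a percentile Hausdorff distance.
        import numpy as np
        from scipy.spatial.distance import cdist

        def dice(a, b):
            """a, b: boolean arrays of equal shape."""
            return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

        def hausdorff_95(pts_a, pts_b):
            """95th-percentile symmetric Hausdorff distance, point sets (N, 3)."""
            d = cdist(pts_a, pts_b)
            return max(np.percentile(d.min(axis=1), 95),
                       np.percentile(d.min(axis=0), 95))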

  15. Developments in Surge Research Priorities: A Systematic Review of the Literature Following the Academic Emergency Medicine Consensus Conference, 2007-2015.

    PubMed

    Morton, Melinda J; DeAugustinis, Matthew L; Velasquez, Christina A; Singh, Sonal; Kelen, Gabor D

    2015-11-01

    In 2006, Academic Emergency Medicine (AEM) published a special issue summarizing the proceedings of the AEM consensus conference on the "Science of Surge." One major goal of the conference was to establish research priorities in the field of "disasters" surge. For this review, we wished to determine the progress toward the conference's identified research priorities: 1) defining criteria and methods for allocation of scarce resources, 2) identifying effective triage protocols, 3) determining decision-makers and means to evaluate response efficacy, 4) developing communication and information sharing strategies, and 5) identifying methods for evaluating workforce needs. Specific criteria were developed in conjunction with library search experts. PubMed, Embase, Web of Science, Scopus, and the Cochrane Library databases were queried for peer-reviewed articles from 2007 to 2015 addressing scientific advances related to the above five research priorities identified by AEM consensus conference. Abstracts and foreign language articles were excluded. Only articles with quantitative data on predefined outcomes were included; consensus panel recommendations on the above priorities were also included for the purposes of this review. Included study designs were randomized controlled trials, prospective, retrospective, qualitative (consensus panel), observational, cohort, case-control, or controlled before-and-after studies. Quality assessment was performed using a standardized tool for quantitative studies. Of the 2,484 unique articles identified by the search strategy, 313 articles appeared to be related to disaster surge. Following detailed text review, 50 articles with quantitative data and 11 concept papers (consensus conference recommendations) addressed at least one AEM consensus conference surge research priority. Outcomes included validation of the benchmark of 500 beds/million of population for disaster surge capacity, effectiveness of simulation- and Internet-based tools for forecasting of hospital and regional demand during disasters, effectiveness of reverse triage approaches, development of new disaster surge metrics, validation of mass critical care approaches (altered standards of care), use of telemedicine, and predictions of optimal hospital staffing levels for disaster surge events. Simulation tools appeared to provide some of the highest quality research. Disaster simulation studies have arguably revolutionized the study of disaster surge in the intervening years since the 2006 AEM Science of Surge conference, helping to validate some previously known disaster surge benchmarks and to generate new surge metrics. Use of reverse triage approaches and altered standards of care, as well as Internet-based tools such as Google Flu Trends, have also proven effective. However, there remains significant work to be done toward standardizing research methodologies and outcomes, as well as validating disaster surge metrics. © 2015 by the Society for Academic Emergency Medicine.

  16. Particle image velocimetry correlation signal-to-noise ratio metrics and measurement uncertainty quantification

    NASA Astrophysics Data System (ADS)

    Xue, Zhenyu; Charonko, John J.; Vlachos, Pavlos P.

    2014-11-01

    In particle image velocimetry (PIV) the measurement signal is contained in the recorded intensity of the particle image pattern superimposed on a variety of noise sources. The signal-to-noise ratio (SNR) strength governs the resulting PIV cross correlation and ultimately the accuracy and uncertainty of the resulting PIV measurement. Hence we posit that correlation SNR metrics calculated from the correlation plane can be used to quantify the quality of the correlation and the resulting uncertainty of an individual measurement. In this paper we extend the original work by Charonko and Vlachos and present a framework for evaluating the correlation SNR using a set of different metrics, which in turn are used to develop models for uncertainty estimation. Several corrections have been applied in this work. The SNR metrics and corresponding models presented herein are expanded to be applicable to both standard and filtered correlations by applying a subtraction of the minimum correlation value to remove the effect of the background image noise. In addition, the notion of a ‘valid’ measurement is redefined with respect to the correlation peak width in order to be consistent with uncertainty quantification principles and distinct from an ‘outlier’ measurement. Finally the type and significance of the error distribution function is investigated. These advancements lead to more robust and reliable uncertainty estimation models compared with the original work by Charonko and Vlachos. The models are tested against both synthetic benchmark data as well as experimental measurements. In this work, U68.5 uncertainties are estimated at the 68.5% confidence level while U95 uncertainties are estimated at the 95% confidence level. For all cases the resulting calculated coverage factors approximate the expected theoretical confidence intervals, thus demonstrating the applicability of these new models for estimation of uncertainty for individual PIV measurements.
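
    One member of the family of correlation-plane SNR metrics studied here, the primary peak ratio, can be sketched compactly; the Python example below is an illustration under simplifying assumptions (3x3 local-maximum detection), not the paper's implementation.

        # Primary peak ratio: tallest correlation peak over second tallest,
        # after subtracting the plane minimum to remove background offset.
        import numpy as np
        from scipy.ndimage import maximum_filter

        def primary_peak_ratio(corr):
            c = corr - corr.min()
            peaks = (c == maximum_filter(c, size=3)) & (c > 0)
            vals = np.sort(c[peaks])[::-1]
            return vals[0] / vals[1] if len(vals) > 1 else np.inf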

  17. Evaluation Metrics for the Paragon XP/S-15

    NASA Technical Reports Server (NTRS)

    Traversat, Bernard; McNab, David; Nitzberg, Bill; Fineberg, Sam; Blaylock, Bruce T. (Technical Monitor)

    1993-01-01

    On February 17, 1993, the Numerical Aerodynamic Simulation (NAS) facility located at the NASA Ames Research Center installed a 224-node Intel Paragon XP/S-15 system. After its installation, the Paragon was found to be in a very immature state and was unable to support the NAS user workload, composed of a wide range of development and production activities. As a first step towards addressing this problem, we implemented a set of metrics to objectively monitor the system as operating system and hardware upgrades were installed. The metrics were designed to measure four aspects of the system that we consider essential to support our workload: availability, utilization, functionality, and performance. This report presents the metrics collected from February 1993 to August 1993. Since its installation, the Paragon's availability has improved from a low of 15% uptime to a high of 80%, while its utilization has remained low. Functionality and performance have improved from merely running one of the NAS Parallel Benchmarks to running all of them faster (between 1 and 2 times) than the iPSC/860. In spite of the progress accomplished, fundamental limitations of the Paragon operating system are restricting the Paragon from supporting the NAS workload. The maximum operating system message passing (NORMA IPC) bandwidth was measured at 11 Mbytes/s, well below the peak hardware bandwidth (175 Mbytes/s), limiting overall virtual memory and Unix services (i.e., disk and HiPPI I/O) performance. The high NX application message passing latency (184 microseconds), three times that of the iPSC/860, was found to significantly degrade the performance of applications relying on small message sizes. The amount of memory available for an application was found to be approximately 10 Mbytes per node, indicating that the OS is taking more space than anticipated (6 Mbytes per node).

  18. Comparative performance evaluation of automated segmentation methods of hippocampus from magnetic resonance images of temporal lobe epilepsy patients.

    PubMed

    Hosseini, Mohammad-Parsa; Nazem-Zadeh, Mohammad-Reza; Pompili, Dario; Jafari-Khouzani, Kourosh; Elisevich, Kost; Soltanian-Zadeh, Hamid

    2016-01-01

    Segmentation of the hippocampus from magnetic resonance (MR) images is a key task in the evaluation of mesial temporal lobe epilepsy (mTLE) patients. Several automated algorithms have been proposed although manual segmentation remains the benchmark. Choosing a reliable algorithm is problematic since structural definition pertaining to multiple edges, missing and fuzzy boundaries, and shape changes varies among mTLE subjects. Lack of statistical references and guidance for quantifying the reliability and reproducibility of automated techniques has further detracted from automated approaches. The purpose of this study was to develop a systematic and statistical approach using a large dataset for the evaluation of automated methods and establish a method that would achieve results better approximating those attained by manual tracing in the epileptogenic hippocampus. A template database of 195 (81 males, 114 females; age range 32-67 yr, mean 49.16 yr) MR images of mTLE patients was used in this study. Hippocampal segmentation was accomplished manually and by two well-known tools (FreeSurfer and HAMMER) and two previously published methods developed at the authors' institution [Automatic brain structure segmentation (ABSS) and LocalInfo]. To establish which method was better performing for mTLE cases, several voxel-based, distance-based, and volume-based performance metrics were considered. Statistical validations of the results using automated techniques were compared with the results of benchmark manual segmentation. Extracted metrics were analyzed to find the method that provided results most similar to the benchmark. Among the four automated methods, ABSS generated the most accurate results. For this method, the Dice coefficient was 5.13%, 14.10%, and 16.67% higher, Hausdorff was 22.65%, 86.73%, and 69.58% lower, precision was 4.94%, -4.94%, and 12.35% higher, and the root mean square (RMS) was 19.05%, 61.90%, and 65.08% lower than LocalInfo, FreeSurfer, and HAMMER, respectively. The Bland-Altman similarity analysis revealed a low bias for the ABSS and LocalInfo techniques compared to the others. The ABSS method for automated hippocampal segmentation outperformed other methods, best approximating what could be achieved by manual tracing. This study also shows that four categories of input data can cause automated segmentation methods to fail. They include incomplete studies, artifact, low signal-to-noise ratio, and inhomogeneity. Different scanner platforms and pulse sequences were considered as means by which to improve reliability of the automated methods. Other modifications were specially devised to enhance a particular method assessed in this study.
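
    The Bland-Altman analysis mentioned above reduces to a mean difference (bias) and limits of agreement; the Python sketch below illustrates it on synthetic volumes (not study data).

        # Bland-Altman bias and 95% limits of agreement for paired volumes.
        import numpy as np

        def bland_altman(auto_vol, manual_vol):
            diff = np.asarray(auto_vol) - np.asarray(manual_vol)
            bias = diff.mean()
            half_width = 1.96 * diff.std(ddof=1)
            return bias, (bias - half_width, bias + half_width)

        rng = np.random.default_rng(3)
        manual = rng.normal(3200, 400, 195)        # mm^3, 195 subjects
        auto = manual + rng.normal(-40, 120, 195)  # small negative bias
        bias, (lo, hi) = bland_altman(auto, manual)
        print(f"bias {bias:.0f} mm^3, limits of agreement [{lo:.0f}, {hi:.0f}]")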

  19. Comparative performance evaluation of automated segmentation methods of hippocampus from magnetic resonance images of temporal lobe epilepsy patients

    PubMed Central

    Hosseini, Mohammad-Parsa; Nazem-Zadeh, Mohammad-Reza; Pompili, Dario; Jafari-Khouzani, Kourosh; Elisevich, Kost; Soltanian-Zadeh, Hamid

    2016-01-01

    Purpose: Segmentation of the hippocampus from magnetic resonance (MR) images is a key task in the evaluation of mesial temporal lobe epilepsy (mTLE) patients. Several automated algorithms have been proposed although manual segmentation remains the benchmark. Choosing a reliable algorithm is problematic since structural definition pertaining to multiple edges, missing and fuzzy boundaries, and shape changes varies among mTLE subjects. Lack of statistical references and guidance for quantifying the reliability and reproducibility of automated techniques has further detracted from automated approaches. The purpose of this study was to develop a systematic and statistical approach using a large dataset for the evaluation of automated methods and establish a method that would achieve results better approximating those attained by manual tracing in the epileptogenic hippocampus. Methods: A template database of 195 (81 males, 114 females; age range 32–67 yr, mean 49.16 yr) MR images of mTLE patients was used in this study. Hippocampal segmentation was accomplished manually and by two well-known tools (FreeSurfer and HAMMER) and two previously published methods developed at the authors' institution [Automatic brain structure segmentation (ABSS) and LocalInfo]. To establish which method was better performing for mTLE cases, several voxel-based, distance-based, and volume-based performance metrics were considered. Statistical validations of the results using automated techniques were compared with the results of benchmark manual segmentation. Extracted metrics were analyzed to find the method that provided results most similar to the benchmark. Results: Among the four automated methods, ABSS generated the most accurate results. For this method, the Dice coefficient was 5.13%, 14.10%, and 16.67% higher, Hausdorff was 22.65%, 86.73%, and 69.58% lower, precision was 4.94%, −4.94%, and 12.35% higher, and the root mean square (RMS) was 19.05%, 61.90%, and 65.08% lower than LocalInfo, FreeSurfer, and HAMMER, respectively. The Bland–Altman similarity analysis revealed a low bias for the ABSS and LocalInfo techniques compared to the others. Conclusions: The ABSS method for automated hippocampal segmentation outperformed other methods, best approximating what could be achieved by manual tracing. This study also shows that four categories of input data can cause automated segmentation methods to fail. They include incomplete studies, artifact, low signal-to-noise ratio, and inhomogeneity. Different scanner platforms and pulse sequences were considered as means by which to improve reliability of the automated methods. Other modifications were specially devised to enhance a particular method assessed in this study. PMID:26745947

  20. On the predictability of land surface fluxes from meteorological variables

    NASA Astrophysics Data System (ADS)

    Haughton, Ned; Abramowitz, Gab; Pitman, Andy J.

    2018-01-01

    Previous research has shown that land surface models (LSMs) are performing poorly when compared with relatively simple empirical models over a wide range of metrics and environments. Atmospheric driving data appear to provide information about land surface fluxes that LSMs are not fully utilising. Here, we further quantify the information available in the meteorological forcing data that are used by LSMs for predicting land surface fluxes, by interrogating FLUXNET data, and extending the benchmarking methodology used in previous experiments. We show that substantial performance improvement is possible for empirical models using meteorological data alone, with no explicit vegetation or soil properties, thus setting lower bounds on a priori expectations on LSM performance. The process also identifies key meteorological variables that provide predictive power. We provide an ensemble of empirical benchmarks that are simple to reproduce and provide a range of behaviours and predictive performance, acting as a baseline benchmark set for future studies. We reanalyse previously published LSM simulations and show that there is more diversity between LSMs than previously indicated, although it remains unclear why LSMs are broadly performing so much worse than simple empirical models.
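
    An empirical benchmark of the kind used here can be as simple as a regression from meteorological drivers to a flux; the Python sketch below uses synthetic placeholder data (not FLUXNET files) to show the pattern of fitting on one period and scoring out of sample.

        # Empirical lower-bound benchmark: predict latent heat flux from
        # meteorological forcing alone with a linear regression.
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(5)
        n = 2000
        swdown = rng.uniform(0, 1000, n)   # shortwave down, W/m^2
        tair = rng.uniform(270, 310, n)    # air temperature, K
        rh = rng.uniform(0.2, 1.0, n)      # relative humidity
        qle = 0.4 * swdown * rh + rng.normal(0, 20, n)  # synthetic flux

        X = np.column_stack([swdown, tair, rh])
        model = LinearRegression().fit(X[:1500], qle[:1500])
        print(f"out-of-sample R^2: {model.score(X[1500:], qle[1500:]):.2f}")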

  1. Building disaster-resilient micro enterprises in the developing world.

    PubMed

    Prasad, Sameer; Su, Hung-Chung; Altay, Nezih; Tata, Jasmine

    2015-07-01

    Family-owned micro enterprises operating within the informal sector of most developing countries provide millions of citizens with a livelihood and are the economic backbone of many communities. Yet the turbulence that emanates up or down respective supply chains following a disaster can cause these entities to fail. This study develops a model that recognises the relative weakness of micro enterprises to such disaster-related shocks. The model proposes that micro enterprises can moderate the effect of such shocks by creating resilience through cognitive preparation, continuous learning, and the generation of various forms of social capital (cognitive, relational, and structural). The propositions of the model are established through an extensive literature review, coupled with examples drawn from the documents of humanitarian agencies performing disaster relief work in India. The model also serves as a preliminary basis from which to derive metrics to set benchmarks or to assess a micro enterprise's ability to survive disaster-related shocks. © 2015 The Author(s). Disasters © Overseas Development Institute, 2015.

  2. Task 28: Web Accessible APIs in the Cloud Trade Study

    NASA Technical Reports Server (NTRS)

    Gallagher, James; Habermann, Ted; Jelenak, Aleksandar; Lee, Joe; Potter, Nathan; Yang, Muqun

    2017-01-01

    This study explored three candidate architectures for serving NASA Earth Science Hierarchical Data Format Version 5 (HDF5) data via Hyrax running on Amazon Web Services (AWS). We studied the cost and performance of each architecture using several representative use cases. The objectives of the project were to: (1) conduct a trade study to identify one or more high-performance integrated solutions for storing and retrieving NASA HDF5 and Network Common Data Format Version 4 (netCDF4) data in a cloud (web object store) environment, with Amazon Web Services (AWS) Simple Storage Service (S3) as the target environment; (2) conduct the level of software development needed to properly evaluate solutions in the trade study and to obtain the benchmarking metrics required as input to a government decision on potential follow-on prototyping; and (3) develop a cloud cost model for the preferred data storage solution (or solutions) that accounts for different granulation and aggregation schemes as well as cost and performance trades.

  3. Low-temperature magnetotransport in Si/SiGe heterostructures on 300 mm Si wafers

    NASA Astrophysics Data System (ADS)

    Scappucci, Giordano; Yeoh, L.; Sabbagh, D.; Sammak, A.; Boter, J.; Droulers, G.; Kalhor, N.; Brousse, D.; Veldhorst, M.; Vandersypen, L. M. K.; Thomas, N.; Roberts, J.; Pillarisetty, R.; Amin, P.; George, H. C.; Singh, K. J.; Clarke, J. S.

    Undoped Si/SiGe heterostructures are a promising material stack for the development of spin qubits in silicon. Deploying a qubit into high-volume manufacturing in a quantum computer requires stringent control over substrate uniformity and quality. Electron mobility and valley splitting are two key electrical metrics of substrate quality relevant for qubits. Here we present low-temperature magnetotransport measurements of strained Si quantum wells with mobilities in excess of 100,000 cm2/Vs, fabricated on 300 mm wafers within the framework of advanced semiconductor manufacturing. These results are benchmarked against those obtained in Si quantum wells deposited on 100 mm Si wafers in an academic research environment. To ensure rapid progress in quantum well quality we have implemented fast feedback loops from materials growth, to heterostructure FET fabrication, to low-temperature characterisation. On this topic we will present recent progress in developing a cryogenic platform for high-throughput magnetotransport measurements.

  4. The fractured landscape of RNA-seq alignment: the default in our STARs.

    PubMed

    Ballouz, Sara; Dobin, Alexander; Gingeras, Thomas R; Gillis, Jesse

    2018-06-01

    Many tools are available for RNA-seq alignment and expression quantification, and their comparative value is hard to establish. Benchmarking assessments often highlight methods' good performance, but either focus on model data or fail to explain variation in performance. This leaves us to ask, what is the most meaningful way to assess different alignment choices? And importantly, where is there room for progress? In this work, we explore the answers to these two questions by performing an exhaustive assessment of the STAR aligner. We assess STAR's performance across a range of alignment parameters using common metrics, and then on biologically focused tasks. We find technical metrics such as fraction mapping or expression profile correlation to be uninformative, capturing properties unlikely to have any role in biological discovery. Surprisingly, we find that changes in alignment parameters within a wide range have little impact on both technical and biological performance. Yet, when performance finally does break, it happens in difficult regions, such as X-Y paralogs and MHC genes. We believe improved reporting by developers will help establish where results are likely to be robust or fragile, providing a better baseline to establish where methodological progress can still occur.

  5. The performance of differential VLBI delay during interplanetary cruise

    NASA Technical Reports Server (NTRS)

    Moultrie, B.; Wolff, P. J.; Taylor, T. H.

    1984-01-01

    Project Voyager radio metric data are used to evaluate the orbit determination abilities of several data strategies during spacecraft interplanetary cruise. Benchmark performance is established with an operational data strategy of conventional coherent Doppler, coherent range, and explicitly differenced range data from two intercontinental baselines to ameliorate the low-declination singularity of the Doppler data. Employing a Voyager operations trajectory as a reference, the performance of the operational data strategy is compared to the performances of data strategies using differential VLBI delay data (spacecraft delay minus quasar delay) in combination with the aforementioned conventional data types. The comparison of strategy performances indicates that high-accuracy cruise orbit determination can be achieved with a data strategy employing differential VLBI delay data, where the quantity of coherent radio metric data has been greatly reduced.

  6. Performance Evaluation and Benchmarking of Intelligent Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madhavan, Raj; Messina, Elena; Tunstel, Edward

    To design and develop capable, dependable, and affordable intelligent systems, their performance must be measurable. Scientific methodologies for standardization and benchmarking are crucial for quantitatively evaluating the performance of emerging robotic and intelligent systems technologies. There is currently no accepted standard for quantitatively measuring the performance of these systems against user-defined requirements; and furthermore, there is no consensus on what objective evaluation procedures need to be followed to understand the performance of these systems. The lack of reproducible and repeatable test methods has precluded researchers working towards a common goal from exchanging and communicating results, inter-comparing system performance, and leveraging previous work that could otherwise avoid duplication and expedite technology transfer. Currently, this lack of cohesion in the community hinders progress in many domains, such as manufacturing, service, healthcare, and security. By providing the research community with access to standardized tools, reference data sets, and open source libraries of solutions, researchers and consumers will be able to evaluate the cost and benefits associated with intelligent systems and associated technologies. In this vein, the edited book volume addresses performance evaluation and metrics for intelligent systems, in general, while emphasizing the need and solutions for standardized methods. To the knowledge of the editors, there is not a single book on the market that is solely dedicated to the subject of performance evaluation and benchmarking of intelligent systems. Even books that address this topic do so only marginally or are out of date. The research work presented in this volume fills this void by drawing from the experiences and insights of experts gained both through theoretical development and practical implementation of intelligent systems in a variety of diverse application domains. The book presents a detailed and coherent picture of state-of-the-art, recent developments, and further research areas in intelligent systems.

  7. Moment-based metrics for global sensitivity analysis of hydrological systems

    NASA Astrophysics Data System (ADS)

    Dell'Oca, Aronne; Riva, Monica; Guadagnini, Alberto

    2017-12-01

    We propose new metrics to assist global sensitivity analysis, GSA, of hydrological and Earth systems. Our approach allows assessing the impact of uncertain parameters on main features of the probability density function, pdf, of a target model output, y. These include the expected value of y, the spread around the mean and the degree of symmetry and tailedness of the pdf of y. Since reliable assessment of higher-order statistical moments can be computationally demanding, we couple our GSA approach with a surrogate model, approximating the full model response at a reduced computational cost. Here, we consider the generalized polynomial chaos expansion (gPCE), other model reduction techniques being fully compatible with our theoretical framework. We demonstrate our approach through three test cases, including an analytical benchmark, a simplified scenario mimicking pumping in a coastal aquifer and a laboratory-scale conservative transport experiment. Our results allow ascertaining which parameters can impact some moments of the model output pdf while being uninfluential to others. We also investigate the error associated with the evaluation of our sensitivity metrics by replacing the original system model through a gPCE. Our results indicate that the construction of a surrogate model with increasing level of accuracy might be required depending on the statistical moment considered in the GSA. The approach is fully compatible with (and can assist the development of) analysis techniques employed in the context of reduction of model complexity, model calibration, design of experiment, uncertainty quantification and risk assessment.
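
    The moment-based viewpoint can be illustrated without the gPCE machinery. The Python sketch below (a toy, binned approximation, not the authors' method) measures how much the conditional mean of the output moves as each parameter is fixed, relative to the total variance.

        # Binned estimate of a first-moment (Sobol-like) sensitivity index.
        import numpy as np

        def moment_sensitivity(model, sampler, n=20000, bins=20):
            x = sampler(n)            # (n, n_params) parameter samples
            y = model(x)
            edges = np.linspace(0, 1, bins + 1)[1:-1]
            out = {}
            for p in range(x.shape[1]):
                idx = np.digitize(x[:, p], np.quantile(x[:, p], edges))
                cond_means = np.array([y[idx == b].mean() for b in range(bins)])
                out[p] = float(cond_means.var() / y.var())
            return out

        model = lambda x: x[:, 0] ** 2 + 0.1 * x[:, 1]
        sampler = lambda n: np.random.default_rng(2).uniform(-1, 1, (n, 2))
        print(moment_sensitivity(model, sampler))  # parameter 0 dominates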

  8. Citizen science: A new perspective to advance spatial pattern evaluation in hydrology.

    PubMed

    Koch, Julian; Stisen, Simon

    2017-01-01

    Citizen science opens new pathways that can complement traditional scientific practice. Intuition and reasoning often make humans more effective than computer algorithms in various realms of problem solving. In particular, a simple visual comparison of spatial patterns is a task where humans are often considered to be more reliable than computer algorithms. In practice, however, science still largely depends on computer-based solutions, which bring benefits such as speed and the possibility of automating processes. Nevertheless, human vision can be harnessed to evaluate the reliability of algorithms which are tailored to quantify similarity in spatial patterns. We established a citizen science project that employs human perception to rate similarity and dissimilarity between simulated spatial patterns of several scenarios of a hydrological catchment model. In total, more than 2,500 volunteers provided over 43,000 classifications of 1,095 individual subjects. We investigate the capability of a set of advanced statistical performance metrics to mimic the human perception to distinguish between similarity and dissimilarity. Results suggest that more complex metrics are not necessarily better at emulating human perception, but they clearly provide auxiliary information that is valuable for model diagnostics. The metrics clearly differ in their ability to unambiguously distinguish between similar and dissimilar patterns, which is regarded as a key feature of a reliable metric. The obtained dataset can provide an insightful benchmark for the community to test novel spatial metrics.

  9. Maximizing your return on people.

    PubMed

    Bassi, Laurie; McMurrer, Daniel

    2007-03-01

    Though most traditional HR performance metrics don't predict organizational performance, alternatives simply have not existed--until now. During the past ten years, researchers Laurie Bassi and Daniel McMurrer have worked to develop a system that allows executives to assess human capital management (HCM) and to use those metrics both to predict organizational performance and to guide organizations' investments in people. The new framework is based on a core set of HCM drivers that fall into five major categories: leadership practices, employee engagement, knowledge accessibility, workforce optimization, and organizational learning capacity. By employing rigorously designed surveys to score a company on the range of HCM practices across the five categories, it's possible to benchmark organizational HCM capabilities, identify HCM strengths and weaknesses, and link improvements or back-sliding in specific HCM practices with improvements or shortcomings in organizational performance. The process requires determining a "maturity" score for each practice, based on a scale of 1 (low) to 5 (high). Over time, evolving maturity scores from multiple surveys can reveal progress in each of the HCM practices and help a company decide where to focus improvement efforts that will have a direct impact on performance. The authors draw from their work with American Standard, South Carolina's Beaufort County School District, and a bevy of financial firms to show how improving HCM scores led to increased sales, safety, academic test scores, and stock returns. Bassi and McMurrer urge HR departments to move beyond the usual metrics and begin using HCM measurement tools to gauge how well people are managed and developed throughout the organization. In this new role, according to the authors, HR can take on strategic responsibility and ensure that superior human capital management becomes central to the organization's culture.

  10. Access to a simulator is not enough: the benefits of virtual reality training based on peer-group-derived benchmarks--a randomized controlled trial.

    PubMed

    von Websky, Martin W; Raptis, Dimitri A; Vitz, Martina; Rosenthal, Rachel; Clavien, P A; Hahnloser, Dieter

    2013-11-01

    Virtual reality (VR) simulators are widely used to familiarize surgical novices with laparoscopy, but VR training methods differ in efficacy. In the present trial, self-controlled basic VR training (SC-training) was tested against training based on peer-group-derived benchmarks (PGD-training). Novice laparoscopic residents were randomized into an SC group (n = 34) and a group using PGD benchmarks (n = 34) for basic laparoscopic training. After completing basic training, both groups performed 60 VR laparoscopic cholecystectomies for performance analysis. Primary endpoints were simulator metrics; secondary endpoints were program adherence, trainee motivation, and training efficacy. Altogether, 66 residents completed basic training, and 3,837 of 3,960 (96.8%) cholecystectomies were available for analysis. Course adherence was good, with only two dropouts, both in the SC group. The PGD group spent more time and repetitions in basic training until the benchmarks were reached and subsequently showed better performance in the readout cholecystectomies: median time to gallbladder extraction was 520 s (IQR 354-738 s) with SC-training versus 390 s (IQR 278-536 s) in the PGD group (p < 0.001), compared with 215 s (IQR 175-276 s) for experts. Path length of the right instrument also showed significant differences, again with the PGD-training group being more efficient. Basic VR laparoscopic training based on PGD benchmarks with external assessment is superior to SC training, resulting in higher trainee motivation and better performance in simulated laparoscopic cholecystectomies. We recommend such a basic course based on PGD benchmarks before advancing to more elaborate VR training.

  11. Uncertainty in Earth System Models: Benchmarks for Ocean Model Performance and Validation

    NASA Astrophysics Data System (ADS)

    Ogunro, O. O.; Elliott, S.; Collier, N.; Wingenter, O. W.; Deal, C.; Fu, W.; Hoffman, F. M.

    2017-12-01

    The mean ocean CO2 sink is a major component of the global carbon budget, with marine reservoirs holding about fifty times more carbon than the atmosphere. Phytoplankton play a significant role in the net carbon sink through photosynthesis and drawdown, such that about a quarter of anthropogenic CO2 emissions end up in the ocean. Biology greatly increases the efficiency of marine environments in CO2 uptake and ultimately reduces the impact of the persistent rise in atmospheric concentrations. However, a number of challenges remain in appropriate representation of marine biogeochemical processes in Earth System Models (ESM). These threaten to undermine the community effort to quantify seasonal to multidecadal variability in ocean uptake of atmospheric CO2. In a bid to improve analyses of marine contributions to climate-carbon cycle feedbacks, we have developed new analysis methods and biogeochemistry metrics as part of the International Ocean Model Benchmarking (IOMB) effort. Our intent is to meet the growing diagnostic and benchmarking needs of ocean biogeochemistry models. The resulting software package has been employed to validate DOE ocean biogeochemistry results by comparison with observational datasets. Several other international ocean models contributing results to the fifth phase of the Coupled Model Intercomparison Project (CMIP5) were analyzed simultaneously. Our comparisons suggest that the biogeochemical processes determining CO2 entry into the global ocean are not well represented in most ESMs. Polar regions continue to show notable biases in many critical biogeochemical and physical oceanographic variables. Some of these disparities could have first order impacts on the conversion of atmospheric CO2 to organic carbon. In addition, single forcing simulations show that the current ocean state can be partly explained by the uptake of anthropogenic emissions. Combined effects of two or more of these forcings on ocean biogeochemical cycles and ecosystems are challenging to predict since additive or antagonistic effects may occur. A benchmarking tool for accurate assessment and validation of marine biogeochemical outputs will be indispensable as the model community continues to improve ESM developments. It will provide a first order tool in understanding climate-carbon cycle feedbacks.

  12. Achieving Climate Change Absolute Accuracy in Orbit

    NASA Technical Reports Server (NTRS)

    Wielicki, Bruce A.; Young, D. F.; Mlynczak, M. G.; Thome, K. J.; Leroy, S.; Corliss, J.; Anderson, J. G.; Ao, C. O.; Bantges, R.; Best, F.; et al.

    2013-01-01

    The Climate Absolute Radiance and Refractivity Observatory (CLARREO) mission will provide a calibration laboratory in orbit for the purpose of accurately measuring and attributing climate change. CLARREO measurements establish new climate change benchmarks with high absolute radiometric accuracy and high statistical confidence across a wide range of essential climate variables. CLARREO's inherently high absolute accuracy will be verified and traceable on orbit to Système Internationale (SI) units. The benchmarks established by CLARREO will be critical for assessing changes in the Earth system and climate model predictive capabilities for decades into the future as society works to meet the challenge of optimizing strategies for mitigating and adapting to climate change. The CLARREO benchmarks are derived from measurements of the Earth's thermal infrared spectrum (5-50 micron), the spectrum of solar radiation reflected by the Earth and its atmosphere (320-2300 nm), and radio occultation refractivity from which accurate temperature profiles are derived. The mission has the ability to provide new spectral fingerprints of climate change, as well as to provide the first orbiting radiometer with accuracy sufficient to serve as the reference transfer standard for other space sensors, in essence serving as a "NIST [National Institute of Standards and Technology] in orbit." CLARREO will greatly improve the accuracy and relevance of a wide range of space-borne instruments for decadal climate change. Finally, CLARREO has developed new metrics and methods for determining the accuracy requirements of climate observations for a wide range of climate variables and uncertainty sources. These methods should be useful for improving our understanding of observing requirements for most climate change observations.

  13. Formulation of a parametric systems design framework for disaster response planning

    NASA Astrophysics Data System (ADS)

    Mma, Stephanie Weiya

    The occurrence of devastating natural disasters in the past several years has prompted communities, responding organizations, and governments to seek ways to improve disaster preparedness capabilities locally, regionally, nationally, and internationally. The holistic design approach used in the aerospace and industrial engineering fields enables efficient allocation of resources by applying parametric changes within a particular design to improve performance metrics toward selected standards. In this research, that methodology is applied to disaster preparedness, using a community's time to restoration after a disaster as the response metric. A review of the responses to Hurricane Katrina and the 2010 Haiti earthquake, among other prominent disasters, provides observations leading to some benchmarking of current capability. A need for holistic assessment and planning exists for communities, but the current response planning infrastructure lacks a standardized framework and standardized assessment metrics. Within the humanitarian logistics community, several different metrics exist that enable quantification and measurement of a particular area's vulnerability. These metrics, combined with design and planning methodologies from related fields, such as engineering product design, military response planning, and business process redesign, provide insight and a framework from which to begin developing a methodology for holistic disaster response planning. The developed methodology was applied to the communities of Shelby County, TN, and pre-Hurricane-Katrina Orleans Parish, LA. Available literature and reliable media sources provide information about the values of system parameters within the decomposition of the community aspects and about relationships among the parameters. Each community was represented as a system dynamics model and tested under two-, five-, and ten-year improvement plans for Preparedness, Response, and Development capabilities, and combinations of these capabilities. For both Shelby County and Orleans Parish, the Response improvement plan reduced restoration time the most. For the combined capabilities, Shelby County experienced the greatest reduction in restoration time with the implementation of Development and Response capability improvements, while for Orleans Parish it was the Preparedness and Response capability improvements. Optimization of restoration time over the community parameters was tested using a Particle Swarm Optimization algorithm, as sketched below. Fifty optimized restoration times were generated with the algorithm and ranked using the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). The optimization results indicate that the greatest reduction in restoration time for a community is achieved with a particular combination of different parameter values rather than by maximizing each parameter.
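
    The optimization step lends itself to a compact sketch. Below is a minimal Particle Swarm Optimization loop of the kind the thesis describes; `restoration_time` is a hypothetical stand-in for the system dynamics model, and the bounds and hyperparameters are assumptions, not values from the work.

```python
import numpy as np

rng = np.random.default_rng(0)

def restoration_time(x):
    # Hypothetical stand-in for the system dynamics model: maps normalized
    # community parameter values in [0, 1] to a time-to-restoration score
    # (lower is better).
    return np.sum((x - 0.7) ** 2, axis=-1)

def pso(objective, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(0.0, 1.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), objective(pos)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.uniform(size=(2, n_particles, dim))
        # Pull each particle toward its own best and the swarm's best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        val = objective(pos)
        better = val < pbest_val
        pbest[better], pbest_val[better] = pos[better], val[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())

best_params, best_time = pso(restoration_time, dim=5)
```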

  14. Funding AIDS programmes in the era of shared responsibility: an analysis of domestic spending in 12 low-income and middle-income countries.

    PubMed

    Resch, Stephen; Ryckman, Theresa; Hecht, Robert

    2015-01-01

    As the incomes of many AIDS-burdened countries grow and donors' budgets for helping to fight the disease tighten, national governments and external funding partners increasingly face the following question: what is the capacity of countries that are highly affected by AIDS to finance their responses from domestic sources, and how might this affect the level of donor support? In this study, we attempt to answer this question. We propose metrics to estimate domestic AIDS financing, using methods related to national prioritisation of health spending, disease burden, and economic growth. We apply these metrics to 12 countries in sub-Saharan Africa with a high prevalence of HIV/AIDS, generating scenarios of possible future domestic expenditure. We compare the results with total AIDS financing requirements to calculate the size of the resulting funding gaps and the implications for donors. Nearly all 12 countries studied fall short of the proposed expenditure benchmarks. If they met these benchmarks fully, domestic spending on AIDS would increase by 2·5 times, from US$2·1 billion to $5·1 billion annually, covering 64% of estimated future funding requirements and leaving a gap of around a third of the total $7·9 billion needed. Although upper-middle-income countries, such as Botswana, Namibia, and South Africa, would become financially self-reliant, lower-income countries, such as Mozambique and Ethiopia, would remain heavily dependent on donor funds. The proposed metrics could be useful to stimulate further analysis and discussion around domestic spending on AIDS and corresponding donor contributions, and to structure financial agreements between recipient country governments and donors. Coupled with improved resource tracking, such metrics could enhance transparency and accountability for efficient use of money and maximise the effect of available funding to prevent HIV infections and save lives. Funding: US Centers for Disease Control and Prevention. Copyright © 2015 Hecht et al. Open Access article distributed under the terms of CC BY-NC-ND.
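
    The headline figures can be checked with simple arithmetic; the helper below (names hypothetical) reproduces them from the abstract's own numbers.

```python
def funding_gap(domestic_now, domestic_benchmark, total_required):
    """All arguments in billions of USD per year."""
    return {
        "scale_up": domestic_benchmark / domestic_now,    # ~2.4x (quoted as 2.5)
        "coverage": domestic_benchmark / total_required,  # ~64%
        "gap": total_required - domestic_benchmark,       # ~$2.8B, about a third
    }

print(funding_gap(2.1, 5.1, 7.9))
```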

  15. Emergency department operations dictionary: results of the second performance measures and benchmarking summit.

    PubMed

    Welch, Shari J; Stone-Griffith, Suzanne; Asplin, Brent; Davidson, Steven J; Augustine, James; Schuur, Jeremiah D

    2011-05-01

    The public, payers, hospitals, and the Centers for Medicare and Medicaid Services (CMS) are demanding that emergency departments (EDs) measure and improve performance, but this cannot be done unless the terms used in ED operations are defined. On February 24, 2010, 32 stakeholders from 13 professional organizations met in Salt Lake City, Utah, to standardize ED operations metrics and definitions, which are presented in this consensus paper. Emergency medicine (EM) experts attending the Second Performance Measures and Benchmarking Summit reviewed, expanded, and updated key definitions for ED operations. Prior to the meeting, participants were provided with the definitions created at the first summit in 2006 and relevant documents from other organizations, and were asked to identify gaps and limitations in the original work. Those responses were used to devise a plan to revise and update the definitions. At the summit, attendees discussed and debated key terminology, and workgroups were created to draft a more comprehensive document. The results have been crafted into two reference documents: one for metrics and one for the operations dictionary presented here. The ED Operations Dictionary defines ED spaces, processes, patient populations, and new ED roles. Common definitions of key terms will improve the ability to compare ED operations research and practice and provide a common language for frontline practitioners, managers, and researchers. © 2011 by the Society for Academic Emergency Medicine.

  16. Ranking metrics in gene set enrichment analysis: do they matter?

    PubMed

    Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

    2017-05-12

    There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter that can affect the final result is the choice of a metric for the ranking of genes, and applying a default ranking metric may lead to poor results. In this work, 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics, including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using the k-means clustering algorithm, a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate, and computational load was established: the absolute value of the Moderated Welch Test statistic, the Minimum Significant Difference, the absolute value of the Signal-to-Noise ratio, and the Baumgartner-Weiss-Schindler test statistic. For false positive rate estimation, all selected ranking metrics were robust with respect to sample size. For sensitivity, the absolute value of the Moderated Welch Test statistic and the absolute value of the Signal-to-Noise ratio gave stable results, while the Baumgartner-Weiss-Schindler test and the Minimum Significant Difference showed better results for larger sample sizes. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised, implemented in MATLAB, and made available at https://github.com/ZAEDPolSl/MrGSEA . The choice of ranking metric in Gene Set Enrichment Analysis has a critical impact on the results of pathway enrichment analysis. The absolute value of the Moderated Welch Test has the best overall sensitivity and the Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, the Baumgartner-Weiss-Schindler test statistic gives better outcomes; it also finds more enriched pathways than the other tested metrics, which may lead to new biological discoveries.
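
    Two of the winning ranking metrics are easy to state precisely. The sketch below computes, per gene, the absolute signal-to-noise ratio and the absolute Welch t statistic; note that the paper's Moderated Welch Test adds a variance-moderation term that is not reproduced here, so this is an unmoderated approximation.

```python
import numpy as np
from scipy import stats

def ranking_scores(expr, in_group_a):
    """Per-gene ranking metrics for a GSEA-style analysis.

    expr: (n_genes, n_samples) expression array.
    in_group_a: boolean mask over samples marking condition A.
    Returns |signal-to-noise| and |Welch t|; the paper's Moderated Welch
    Test additionally shrinks the variance estimates (not done here).
    """
    a, b = expr[:, in_group_a], expr[:, ~in_group_a]
    s2n = np.abs((a.mean(1) - b.mean(1)) /
                 (a.std(1, ddof=1) + b.std(1, ddof=1)))
    t, _ = stats.ttest_ind(a, b, axis=1, equal_var=False)
    return s2n, np.abs(t)
```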

  17. Benchmarking Evaluation Results for Prototype Extravehicular Activity Gloves

    NASA Technical Reports Server (NTRS)

    Aitchison, Lindsay; McFarland, Shane

    2012-01-01

    The Space Suit Assembly (SSA) Development Team at NASA Johnson Space Center has invested heavily in the advancement of rear-entry planetary exploration suit design but largely deferred development of extravehicular activity (EVA) glove designs, accepting the risk of using the current flight gloves, Phase VI, for unique mission scenarios outside the Space Shuttle and International Space Station (ISS) Program realm of experience. However, as design reference missions mature, the risks of using heritage hardware have highlighted the need for developing robust new glove technologies. To address the technology gap, the NASA Game-Changing Technology group provided start-up funding for the High Performance EVA Glove (HPEG) Project in the spring of 2012. The overarching goal of the HPEG Project is to develop a robust glove design that increases human performance during EVA and creates a pathway for future implementation of emergent technologies, with specific aims of increasing pressurized mobility to 60% of barehanded capability, increasing durability by 100%, and decreasing the potential of gloves to cause injury during use. The HPEG Project focused initial efforts on identifying potential new technologies and benchmarking the performance of current state-of-the-art gloves to identify trends in design and fit, and to establish standards and metrics against which emerging technologies can be assessed at both the component and assembly levels. The first of the benchmarking tests evaluated the quantitative mobility performance and subjective fit of four prototype gloves developed by Flagsuit LLC, Final Frontier Design, ILC Dover, and David Clark Company as compared to the Phase VI. All of the companies were asked to design and fabricate gloves to the same set of NASA-provided hand measurements (which corresponded to a single size of Phase VI glove) and to focus their efforts on improving mobility in the metacarpophalangeal and carpometacarpal joints. Four test subjects matching the design hand anthropometry completed range-of-motion, grip/pinch strength, dexterity, and fit evaluations for each glove design in both the unpressurized and pressurized conditions. This paper provides a comparison of the test results along with a detailed description of the hardware and test methodologies used.

  18. Soft-core processor study for node-based architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van Houten, Jonathan Roger; Jarosz, Jason P.; Welch, Benjamin James

    2008-09-01

    Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power, and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable Field Programmable Gate Array (FPGA) based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary on these nodes, with varying degrees of mission-specific performance requirements. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hard-core processor built into the FPGA or as a soft-core processor built out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA-based processors for use in future NBA systems: two soft cores (MicroBlaze and the non-fault-tolerant LEON) and one hard core (PowerPC 405). Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration. Cache configurations impacted the results greatly; for optimal processor efficiency it is necessary to enable caches on the processors. Processor caches carry a penalty, however: cache error mitigation is necessary when operating in a radiation environment.

  19. Sparse Contextual Activation for Efficient Visual Re-Ranking.

    PubMed

    Bai, Song; Bai, Xiang

    2016-03-01

    In this paper, we propose an extremely efficient algorithm for visual re-ranking. By considering the original pairwise distances in the contextual space, we develop a feature vector called sparse contextual activation (SCA) that encodes the local distribution of an image. The re-ranking task can then be accomplished simply by vector comparison under the generalized Jaccard metric, which has its theoretical grounding in fuzzy set theory. To improve the time efficiency of the re-ranking procedure, an inverted index is introduced to speed up the computation of the generalized Jaccard metric; as a result, the average re-ranking time for a query can be kept within 1 ms. Furthermore, inspired by query expansion, we also develop an additional method called local consistency enhancement on the proposed SCA to improve retrieval performance in an unsupervised manner. Because retrieval performance using a single feature may not be satisfactory, we also exploit a robust feature fusion algorithm based on SCA that preserves its high time efficiency while fusing multiple complementary features for accurate retrieval. We assess the proposed method in various visual re-ranking tasks. Experimental results on the Princeton shape benchmark (3D object), WM-SRHEC07 (3D competition), YAEL data set B (face), MPEG-7 data set (shape), and Ukbench data set (image) demonstrate the effectiveness and efficiency of SCA.
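
    The generalized Jaccard metric at the heart of SCA has a direct definition. A minimal sketch for dense non-negative vectors follows; for the sparse vectors used in SCA, only dimensions where both entries are nonzero contribute to the numerator, which is what makes an inverted index effective.

```python
import numpy as np

def generalized_jaccard(x: np.ndarray, y: np.ndarray) -> float:
    """Generalized Jaccard similarity for non-negative vectors:
    sum_i min(x_i, y_i) / sum_i max(x_i, y_i). Reduces to the classic
    Jaccard index when x and y are binary indicator vectors."""
    den = np.maximum(x, y).sum()
    return float(np.minimum(x, y).sum() / den) if den > 0 else 0.0
```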

  20. A multi-center study benchmarks software tools for label-free proteome quantification

    PubMed Central

    Gillet, Ludovic C; Bernhardt, Oliver M.; MacLean, Brendan; Röst, Hannes L.; Tate, Stephen A.; Tsou, Chih-Chiang; Reiter, Lukas; Distler, Ute; Rosenberger, George; Perez-Riverol, Yasset; Nesvizhskii, Alexey I.; Aebersold, Ruedi; Tenzer, Stefan

    2016-01-01

    The consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH 2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from SWATH-MS (sequential window acquisition of all theoretical fragment ion spectra), a method that uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test datasets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation-window setups. For consistent evaluation we developed LFQbench, an R package to calculate metrics of precision and accuracy in label-free quantitative MS, and report the identification performance, robustness and specificity of each software tool. Our reference datasets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics. PMID:27701404
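
    The core LFQbench idea of scoring precision and accuracy against a known mixing ratio can be sketched compactly. The snippet below is a Python paraphrase under stated assumptions (per-protein log2 ratios for one spiked species), not the actual R package's implementation.

```python
import numpy as np

def lfq_metrics(log2_ratios: np.ndarray, expected_log2: float) -> dict:
    """LFQbench-style summary for one species in a hybrid proteome sample:
    accuracy = median deviation of observed per-protein log2 ratios from
    the known mixing ratio; precision = spread of those ratios."""
    r = log2_ratios[np.isfinite(log2_ratios)]
    return {
        "accuracy_median_dev": float(np.median(r) - expected_log2),
        "precision_std": float(r.std(ddof=1)),
        "n_quantified": int(r.size),
    }

# e.g. for yeast proteins spiked at 2:1 between samples A and B:
# lfq_metrics(np.log2(intensity_A / intensity_B), expected_log2=1.0)
```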

  1. A multicenter study benchmarks software tools for label-free proteome quantification.

    PubMed

    Navarro, Pedro; Kuharev, Jörg; Gillet, Ludovic C; Bernhardt, Oliver M; MacLean, Brendan; Röst, Hannes L; Tate, Stephen A; Tsou, Chih-Chiang; Reiter, Lukas; Distler, Ute; Rosenberger, George; Perez-Riverol, Yasset; Nesvizhskii, Alexey I; Aebersold, Ruedi; Tenzer, Stefan

    2016-11-01

    Consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH 2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from sequential window acquisition of all theoretical fragment-ion spectra (SWATH)-MS, which uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test data sets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation-window setups. For consistent evaluation, we developed LFQbench, an R package, to calculate metrics of precision and accuracy in label-free quantitative MS and report the identification performance, robustness and specificity of each software tool. Our reference data sets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics.

  2. Assessing the quality of restored images in optical long-baseline interferometry

    NASA Astrophysics Data System (ADS)

    Gomes, Nuno; Garcia, Paulo J. V.; Thiébaut, Éric

    2017-03-01

    Assessing the quality of aperture synthesis maps is relevant for benchmarking image reconstruction algorithms, for the scientific exploitation of data from optical long-baseline interferometers, and for the design/upgrade of new/existing interferometric imaging facilities. Although metrics have been proposed in these contexts, no systematic study has been conducted on the selection of a robust metric for quality assessment. This article addresses the question: what is the best metric to assess the quality of a reconstructed image? It starts by considering several metrics and selecting a few based on general properties. Then, a variety of image reconstruction cases are considered. The observational scenarios are phase closure and phase referencing at the Very Large Telescope Interferometer (VLTI), for combinations of two, three, four and six telescopes. End-to-end image reconstruction is accomplished with the MIRA software, and several merit functions are put to the test. It is found that convolution by an effective point spread function is required for proper image quality assessment. The effective angular resolution of the images is superior to the naive expectation based on the maximum frequency sampled by the array; this is due to the prior information used in the aperture synthesis algorithm and to the nature of the objects considered. The ℓ1-norm is the most robust of all considered metrics because, being linear, it is less sensitive to image smoothing by high regularization levels. For the cases considered, this metric allows the implementation of automatic quality assessment of reconstructed images, with a performance similar to human selection.
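
    The recommended procedure, comparing the reconstruction to the ground truth convolved with an effective point spread function, can be sketched as follows; the Gaussian PSF and the flux normalization are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def l1_image_distance(reconstruction, truth, effective_fwhm_px):
    """l1-norm image-quality metric: the ground-truth object is first
    convolved with an effective PSF (assumed Gaussian here) so that both
    images live at the same angular resolution."""
    sigma = effective_fwhm_px / 2.355  # FWHM -> Gaussian sigma
    reference = gaussian_filter(truth.astype(float), sigma)
    # Normalize total flux so the metric measures structure, not scale.
    reference /= reference.sum()
    image = reconstruction / reconstruction.sum()
    return float(np.abs(image - reference).sum())
```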

  3. Texture classification using non-Euclidean Minkowski dilation

    NASA Astrophysics Data System (ADS)

    Florindo, Joao B.; Bruno, Odemir M.

    2018-03-01

    This study presents a new method to extract meaningful descriptors of gray-scale texture images using Minkowski morphological dilation based on the Lp metric. The proposed approach is motivated by the success previously achieved by Bouligand-Minkowski fractal descriptors on texture classification. In essence, such descriptors are directly derived from the morphological dilation of a three-dimensional representation of the gray-level pixels using the classical Euclidean metric. Here, we generalize the dilation to other values of p in the Lp metric (the Euclidean metric is the particular case p = 2) and obtain the descriptors from the cumulative distribution of the distance transform computed over the texture image. The proposed method is compared to other state-of-the-art approaches (such as local binary patterns and textons) in the classification of two benchmark data sets (UIUC and Outex). The proposed descriptors outperformed all the other approaches in terms of the rate of correctly classified images. These results suggest the potential of the descriptors in this type of task, with a wide range of possible applications to real-world problems.
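
    A brute-force sketch of the generalized dilation descriptors may help make the construction concrete: the image is lifted to a 3-D surface, and for each radius the Lp dilation volume is counted. This is an assumption-laden toy (exhaustive distance computation, integer lattice), practical only for very small images; the published method computes the Lp distance transform far more efficiently.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lp_dilation_descriptors(gray, p, radii):
    """Toy Bouligand-Minkowski-style descriptors under the Lp metric.

    The image is lifted to a 3-D point cloud (x, y, z = gray level). For
    each radius r, the descriptor is the log of the number of lattice
    points within Lp distance r of that surface (the dilation volume).
    Exhaustive distance computation: small images only.
    """
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    surface = np.column_stack([xs.ravel(), ys.ravel(), gray.ravel()])
    zmax = int(gray.max() + max(radii)) + 1
    gx, gy, gz = np.mgrid[0:w, 0:h, 0:zmax]
    lattice = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
    d = cdist(lattice, surface, metric="minkowski", p=p).min(axis=1)
    return [float(np.log((d <= r).sum())) for r in radii]
```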

  4. Metrics for Success in the Preservation of Scientific Data at the STFC Centre for Environmental Data Archival (CEDA).

    NASA Astrophysics Data System (ADS)

    Lawrence, B.; Pepler, S.

    2009-04-01

    CEDA (http://www.ceda.ac.uk) hosts three main data centres: the British Atmospheric Data Centre (http://badc.nerc.ac.uk), the NERC Earth Observation Data Centre (http://neodc.nerc.ac.uk), and the Intergovernmental Panel on Climate Change Dedicated Data Centre (http://ipcc-data.org), as well as components of many national and international projects. CEDA receives both core funding (from the UK Natural Environment Research Council) and per-project funding (from a variety of sources). However, all funders require metrics assessing success. In the case of preservation it is hard to measure success: usage alone is not enough, since next year someone may use currently unused data if it is well preserved, and so it is the act of preservation itself which marks success. Even where data is accessed, it is not necessarily used. Hence at CEDA we have three key foci in our approach to metrics: measuring direct website access, benchmarking procedures against best practice, and, hopefully soon, recording data citation. In this presentation we cover how we are addressing each of these three areas.

  5. Quantifying density cues in grouping displays.

    PubMed

    Machilsen, Bart; Wagemans, Johan; Demeyer, Maarten

    2016-09-01

    Perceptual grouping processes are typically studied using sparse displays of spatially separated elements. Unless the grouping cue of interest is a proximity cue, researchers will want to ascertain that such a cue is absent from the display. Various solutions to this problem have been employed in the literature; however, no validation of these methods exists. Here, we test a number of local density metrics both through their performance as constrained ideal observer models, and through a comparison with a large dataset of human detection trials. We conclude that for the selection of stimuli without a density cue, the Voronoi density metric is preferable, especially if combined with a measurement of the distance to each element's nearest neighbor. We offer the entirety of the dataset as a benchmark for the evaluation of future, possibly improved, metrics. With regard to human processes of grouping by proximity, we found observers to be insensitive to target groupings that are more sparse than the surrounding distractor elements, and less sensitive to regularity cues in element positioning than to local clusterings of target elements. Copyright © 2015 Elsevier Ltd. All rights reserved.
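
    The Voronoi density metric favored by the authors is straightforward to compute with SciPy. In this sketch (our construction, not the paper's code), each element's local density is the inverse area of its Voronoi cell, and the recommended companion measurement is the nearest-neighbor distance.

```python
import numpy as np
from scipy.spatial import Voronoi, cKDTree

def voronoi_density(points):
    """Local density = inverse area of each element's Voronoi cell.
    Unbounded cells at the display border are returned as NaN."""
    vor = Voronoi(points)
    density = np.full(len(points), np.nan)
    for i, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if len(region) == 0 or -1 in region:
            continue  # open cell touching the hull
        x, y = vor.vertices[region, 0], vor.vertices[region, 1]
        # Shoelace formula (Qhull lists 2-D region vertices in order).
        area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
        density[i] = 1.0 / area
    return density

def nearest_neighbor_distances(points):
    d, _ = cKDTree(points).query(points, k=2)
    return d[:, 1]  # d[:, 0] is each point's zero distance to itself
```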

  6. Benchmarking density functionals for hydrogen-helium mixtures with quantum Monte Carlo: Energetics, pressures, and forces

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clay, Raymond C.; Holzmann, Markus; Ceperley, David M.

    An accurate understanding of the phase diagram of dense hydrogen and helium mixtures is a crucial component in the construction of accurate models of Jupiter, Saturn, and Jovian extrasolar planets. Though DFT-based first-principles methods have the potential to provide the accuracy and computational efficiency required for this task, recent benchmarking in hydrogen has shown that achieving this accuracy requires a judicious choice of functional and a quantification of the errors introduced. In this work, we present a quantum Monte Carlo based benchmarking study of a wide range of density functionals for use in hydrogen-helium mixtures at thermodynamic conditions relevant for Jovian planets. Not only do we continue our program of benchmarking energetics and pressures, but we deploy QMC-based force estimators and use them to gain insights into how well the local liquid structure is captured by different density functionals. We find that TPSS, BLYP, and vdW-DF are the most accurate functionals by most metrics, and that the enthalpy, energy, and pressure errors are very well behaved as a function of helium concentration. Beyond this, we highlight and analyze the major error trends and relative differences exhibited by the major classes of functionals, and estimate the magnitudes of these effects when possible.

  7. Benchmarking density functionals for hydrogen-helium mixtures with quantum Monte Carlo: Energetics, pressures, and forces

    DOE PAGES

    Clay, Raymond C.; Holzmann, Markus; Ceperley, David M.; ...

    2016-01-19

    An accurate understanding of the phase diagram of dense hydrogen and helium mixtures is a crucial component in the construction of accurate models of Jupiter, Saturn, and Jovian extrasolar planets. Though DFT-based first-principles methods have the potential to provide the accuracy and computational efficiency required for this task, recent benchmarking in hydrogen has shown that achieving this accuracy requires a judicious choice of functional and a quantification of the errors introduced. In this work, we present a quantum Monte Carlo based benchmarking study of a wide range of density functionals for use in hydrogen-helium mixtures at thermodynamic conditions relevant for Jovian planets. Not only do we continue our program of benchmarking energetics and pressures, but we deploy QMC-based force estimators and use them to gain insights into how well the local liquid structure is captured by different density functionals. We find that TPSS, BLYP, and vdW-DF are the most accurate functionals by most metrics, and that the enthalpy, energy, and pressure errors are very well behaved as a function of helium concentration. Beyond this, we highlight and analyze the major error trends and relative differences exhibited by the major classes of functionals, and estimate the magnitudes of these effects when possible.

  8. Stratification of unresponsive patients by an independently validated index of brain complexity

    PubMed Central

    Casarotto, Silvia; Comanducci, Angela; Rosanova, Mario; Sarasso, Simone; Fecchio, Matteo; Napolitani, Martino; Pigorini, Andrea; Casali, Adenauer G.; Trimarchi, Pietro D.; Boly, Melanie; Gosseries, Olivia; Bodart, Olivier; Curto, Francesco; Landi, Cristina; Mariotti, Maurizio; Devalle, Guya; Laureys, Steven; Tononi, Giulio

    2016-01-01

    Objective: Validating objective, brain-based indices of consciousness in behaviorally unresponsive patients represents a challenge due to the impossibility of obtaining independent evidence through subjective reports. Here we address this problem by first validating a promising metric of consciousness, the Perturbational Complexity Index (PCI), in a benchmark population who could confirm the presence or absence of consciousness through subjective reports, and then applying the same index to patients with disorders of consciousness (DOCs). Methods: The benchmark population encompassed 150 healthy controls and communicative brain-injured subjects in various states of conscious wakefulness, disconnected consciousness, and unconsciousness. Receiver operating characteristic curve analysis was performed to define an optimal cutoff for discriminating between the conscious and unconscious conditions. This cutoff was then applied to a cohort of noncommunicative DOC patients (38 in a minimally conscious state [MCS] and 43 in a vegetative state [VS]). Results: We found an empirical cutoff that discriminated with 100% sensitivity and specificity between the conscious and the unconscious conditions in the benchmark population. This cutoff resulted in a sensitivity of 94.7% in detecting MCS and allowed the identification of a number of unresponsive VS patients (9 of 43) with high values of PCI, overlapping with the distribution of the benchmark conscious condition. Interpretation: Given its high sensitivity and specificity in the benchmark and MCS populations, PCI offers a reliable, independently validated stratification of unresponsive patients that has important physiopathological and therapeutic implications. In particular, the high-PCI subgroup of VS patients may retain a capacity for consciousness that is not expressed in behavior. Ann Neurol 2016;80:718-729. PMID:27717082
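
    The cutoff-selection step can be sketched simply. The snippet below scans candidate PCI cutoffs and returns the one maximizing Youden's J (sensitivity + specificity - 1); when a perfectly separating cutoff exists, as reported for the benchmark population, this recovers it. It is a sketch of the ROC logic only, not the study's analysis code.

```python
import numpy as np

def optimal_cutoff(pci, conscious):
    """Scan candidate cutoffs on the index and return the one maximizing
    Youden's J = sensitivity + specificity - 1 (an ROC-based choice).
    pci: array of index values; conscious: boolean ground-truth labels."""
    best_cut, best_j = None, -np.inf
    for c in np.unique(pci):
        pred = pci >= c
        sens = (pred & conscious).sum() / conscious.sum()
        spec = (~pred & ~conscious).sum() / (~conscious).sum()
        if sens + spec - 1 > best_j:
            best_cut, best_j = float(c), sens + spec - 1
    return best_cut, best_j
```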

  9. Evaluation of metrics and baselines for tracking greenhouse gas emissions trends: Recommendations for the California climate action registry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Price, Lynn; Murtishaw, Scott; Worrell, Ernst

    2003-06-01

    Executive Summary: The California Climate Action Registry, which was initially established in 2000 and began operation in Fall 2002, is a voluntary registry for recording annual greenhouse gas (GHG) emissions. The purpose of the Registry is to assist California businesses and organizations in their efforts to inventory and document emissions in order to establish a baseline and to document early actions to increase energy efficiency and decrease GHG emissions. The State of California has committed to use its "best efforts" to ensure that entities that establish GHG emissions baselines and register their emissions will receive "appropriate consideration under any future international, federal, or state regulatory scheme relating to greenhouse gas emissions." Reporting of GHG emissions involves documentation of both "direct" emissions from sources that are under the entity's control and indirect emissions controlled by others. Electricity generated by an off-site power source is considered to be an indirect GHG emission and is required to be included in the entity's report. Registry participants include businesses, non-profit organizations, municipalities, state agencies, and other entities. Participants are required to register the GHG emissions of all operations in California, and are encouraged to report nationwide. For the first three years of participation, the Registry only requires the reporting of carbon dioxide (CO2) emissions, although participants are encouraged to report the remaining five Kyoto Protocol GHGs (CH4, N2O, HFCs, PFCs, and SF6). After three years, reporting of all six Kyoto GHG emissions is required. The enabling legislation for the Registry (SB 527) requires total GHG emissions to be registered and requires reporting of "industry-specific metrics" once such metrics have been adopted by the Registry. The Ernest Orlando Lawrence Berkeley National Laboratory (Berkeley Lab) was asked to provide technical assistance to the California Energy Commission (Energy Commission) related to the Registry in three areas: (1) assessing the availability and usefulness of industry-specific metrics, (2) evaluating various methods for establishing baselines for calculating GHG emissions reductions related to specific actions taken by Registry participants, and (3) establishing methods for calculating electricity CO2 emission factors. The third area of research was completed in 2002 and is documented in Estimating Carbon Dioxide Emissions Factors for the California Electric Power Sector (Marnay et al., 2002). This report documents our findings related to the first two areas of research. For the first area of research, the overall objective was to evaluate the metrics, such as emissions per economic unit or emissions per unit of production, that can be used to report GHG emissions trends for potential Registry participants. This research began with an effort to identify methodologies, benchmarking programs, inventories, protocols, and registries that use industry-specific metrics to track trends in energy use or GHG emissions, in order to determine what types of metrics have already been developed. The next step in developing industry-specific metrics was to assess the availability of data needed to determine metric development priorities.
    Berkeley Lab also determined the relative importance of different potential Registry participant categories in order to assess the availability of sectoral or industry-specific metrics, and then identified industry-specific metrics in use around the world. While a plethora of metrics was identified, no single metric was found that adequately tracks trends in GHG emissions while maintaining confidentiality of data. As a result of this review, Berkeley Lab recommends the development of a GHG intensity index as a new metric for reporting and tracking GHG emissions trends. Such an index could provide an industry-specific means of reporting and tracking GHG emissions trends that accurately reflects year-to-year changes while protecting proprietary data: it would give Registry participants a way to demonstrate improvements in their energy use and GHG emissions per unit of production without divulging specific values. For the second research area, Berkeley Lab evaluated various methods used to calculate baselines for documentation of energy consumption or GHG emissions reductions, noting those that use industry-specific metrics. Accounting for actions to reduce GHGs can be done on a project-by-project basis or on an entity basis. Establishing project-related baselines for mitigation efforts has been widely discussed in the context of two of the so-called "flexible mechanisms" of the Kyoto Protocol to the United Nations Framework Convention on Climate Change: Joint Implementation (JI) and the Clean Development Mechanism (CDM).
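
    The recommended GHG intensity index lends itself to a compact formalization. One plausible reading (the base-year normalization is our assumption, not a detail given in the report summary) is

```latex
I_t = \frac{E_t / P_t}{E_0 / P_0} \times 100,
```

    where E_t is the entity's total GHG emissions in year t, P_t its production output, and year 0 a fixed base year. An entity can then report the trend I_t (values below 100 indicating improved intensity) without divulging the proprietary quantities E_t and P_t themselves.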

  10. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets

    PubMed Central

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-01-01

    Purpose: With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and interinstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, combining C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. Results: The approach provides the data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
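
    The filtering step combines standard tests available in SciPy. The sketch below (a paraphrase under stated assumptions, not the authors' C#/R implementation) evaluates one candidate dose threshold: Fisher's exact test on the 2x2 event-by-threshold table, plus Welch t and Kolmogorov-Smirnov tests comparing dose distributions between patients with and without the event.

```python
import numpy as np
from scipy import stats

def dose_response_tests(doses, events, threshold):
    """One candidate dose threshold, three complementary tests.

    doses: per-patient dose metric; events: boolean outcome indicator.
    Fisher's exact test uses the 2x2 event-by-threshold table; Welch t
    and Kolmogorov-Smirnov compare dose distributions between patients
    with and without the event."""
    above = doses >= threshold
    table = [[int((events & above).sum()), int((events & ~above).sum())],
             [int((~events & above).sum()), int((~events & ~above).sum())]]
    _, p_fisher = stats.fisher_exact(table)
    _, p_welch = stats.ttest_ind(doses[events], doses[~events],
                                 equal_var=False)
    _, p_ks = stats.ks_2samp(doses[events], doses[~events])
    return {"fisher": p_fisher, "welch": p_welch, "ks": p_ks}
```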

  11. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets.

    PubMed

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-11-01

    With emergence of clinical outcomes databases as tools utilized routinely within institutions, comes need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code was evaluated using benchmark data sets. The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. The work demonstrates the viability of the design approach and the software tool for analysis of large data sets.

  12. Intelligent Luminance Control of Lighting Systems Based on Imaging Sensor Feedback

    PubMed Central

    Liu, Haoting; Zhou, Qianxiang; Yang, Jin; Jiang, Ting; Liu, Zhizhen; Li, Jie

    2017-01-01

    An imaging sensor-based intelligent Light Emitting Diode (LED) lighting system for desk use is proposed. In contrast to traditional intelligent lighting systems, such as those based on photosensitive resistance sensors or infrared sensors, an imaging sensor can achieve a finer perception of the environmental light and thus guide more precise lighting control. Before the system operates, a large amount of typical imaging data for the desk lighting application is first accumulated. Second, a series of subjective and objective Lighting Effect Evaluation Metrics (LEEMs) are defined and assessed for these datasets, from which cluster benchmarks of the objective LEEMs are obtained. Third, both a single-LEEM-based control and a multiple-LEEMs-based control are developed to realize optimal luminance tuning. In operation, the system first captures the lighting image using a wearable camera, then computes the objective LEEMs of the captured image and compares them with the cluster benchmarks of the objective LEEMs. Finally, the single-LEEM-based or the multiple-LEEMs-based control is applied to achieve an optimal lighting effect. Extensive experimental results show that the proposed system can tune the LED lamp automatically in response to changes in environmental luminance. PMID:28208781
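
    The single-LEEM control loop reduces to a simple feedback iteration. The sketch below is a hypothetical rendering: all the callables and the gain/tolerance constants are assumptions about the system's interfaces, not APIs from the paper.

```python
def single_leem_step(capture_image, compute_leem, set_led_level,
                     benchmark, level, gain=0.1, tol=0.05):
    """One iteration of a single-LEEM feedback loop: capture an image,
    score it, compare the score with the cluster benchmark, and nudge
    the LED drive level toward the target (all interfaces hypothetical)."""
    score = compute_leem(capture_image())
    error = benchmark - score
    if abs(error) > tol:
        # Proportional correction, clamped to the valid drive range.
        level = min(max(level + gain * error, 0.0), 1.0)
        set_led_level(level)
    return level
```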

  13. Intelligent Luminance Control of Lighting Systems Based on Imaging Sensor Feedback.

    PubMed

    Liu, Haoting; Zhou, Qianxiang; Yang, Jin; Jiang, Ting; Liu, Zhizhen; Li, Jie

    2017-02-09

    An imaging sensor-based intelligent Light Emitting Diode (LED) lighting system for desk use is proposed. In contrast to traditional intelligent lighting systems, such as those based on photosensitive resistance sensors or infrared sensors, an imaging sensor can achieve a finer perception of the environmental light and thus guide more precise lighting control. Before the system operates, a large amount of typical imaging data for the desk lighting application is first accumulated. Second, a series of subjective and objective Lighting Effect Evaluation Metrics (LEEMs) are defined and assessed for these datasets, from which cluster benchmarks of the objective LEEMs are obtained. Third, both a single-LEEM-based control and a multiple-LEEMs-based control are developed to realize optimal luminance tuning. In operation, the system first captures the lighting image using a wearable camera, then computes the objective LEEMs of the captured image and compares them with the cluster benchmarks of the objective LEEMs. Finally, the single-LEEM-based or the multiple-LEEMs-based control is applied to achieve an optimal lighting effect. Extensive experimental results show that the proposed system can tune the LED lamp automatically in response to changes in environmental luminance.

  14. Diagnosing the Causes and Severity of One-sided Message Contention

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tallent, Nathan R.; Vishnu, Abhinav; van Dam, Hubertus

    Two trends suggest network contention for one-sided messages is poised to become a performance problem that concerns application developers: an increased interest in one-sided programming models and a rising ratio of hardware threads to network injection bandwidth. Unfortunately, it is difficult to reason about network contention and one-sided messages because one-sided tasks can either decrease or increase contention. We present effective and portable techniques for diagnosing the causes and severity of one-sided message contention. To detect that a message is affected by contention, we maintain statistics representing instantaneous (non-local) network resource demand. Using lightweight measurement and modeling, we identify the portion of a message's latency that is due to contention and whether contention occurs at the initiator or target. We attribute these metrics to program statements in their full static and dynamic context. We characterize contention for an important computational chemistry benchmark on InfiniBand, Cray Aries, and IBM Blue Gene/Q interconnects. We pinpoint the sources of contention, estimate their severity, and show that when message delivery time deviates from an ideal model, there are other messages contending for the same network links. With a small change to the benchmark, we reduce contention up to 50% and improve total runtime as much as 20%.
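
    The attribution idea, measuring the excess of observed latency over an ideal no-contention model, can be captured in a few lines. The sketch below is our simplification with an assumed linear latency model; the paper's model and calibration are more involved.

```python
def contention_breakdown(measured_us, message_bytes,
                         base_latency_us, bytes_per_us):
    """Split observed one-sided message latency into an ideal part and a
    contention part: ideal = base + size/bandwidth; the excess over the
    ideal model is attributed to contention. A simplified linear model
    with assumed calibration constants, not the paper's exact model."""
    ideal = base_latency_us + message_bytes / bytes_per_us
    contention = max(measured_us - ideal, 0.0)
    return {"ideal_us": ideal,
            "contention_us": contention,
            "contention_frac": contention / measured_us}
```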

  15. Citizen science: A new perspective to advance spatial pattern evaluation in hydrology

    PubMed Central

    Stisen, Simon

    2017-01-01

    Citizen science opens new pathways that can complement traditional scientific practice. Intuition and reasoning often make humans more effective than computer algorithms in various realms of problem solving. In particular, a simple visual comparison of spatial patterns is a task where humans are often considered more reliable than computer algorithms. In practice, however, science still largely depends on computer-based solutions, which bring benefits such as speed and the possibility to automate processes. Human vision can nevertheless be harnessed to evaluate the reliability of algorithms that are tailored to quantify similarity in spatial patterns. We established a citizen science project that employs human perception to rate similarity and dissimilarity between simulated spatial patterns from several scenarios of a hydrological catchment model. In total, more than 2,500 volunteers provided over 43,000 classifications of 1,095 individual subjects. We investigate the capability of a set of advanced statistical performance metrics to mimic the human ability to distinguish between similarity and dissimilarity. Results suggest that more complex metrics are not necessarily better at emulating human perception, but they clearly provide auxiliary information that is valuable for model diagnostics. The metrics differ markedly in their ability to unambiguously distinguish between similar and dissimilar patterns, which is regarded as a key feature of a reliable metric. The resulting dataset can serve as an insightful benchmark for the community to test novel spatial metrics. PMID:28558050

  16. Benchmarking an unstructured grid sediment model in an energetic estuary

    DOE PAGES

    Lopez, Jesse E.; Baptista, António M.

    2016-12-14

    A sediment model coupled to the hydrodynamic model SELFE is validated against a benchmark combining a set of idealized tests and an application to a field-data-rich energetic estuary. After sensitivity studies, model results for the idealized tests largely agree with previously reported results from other models, in addition to analytical, semi-analytical, or laboratory results. Results for suspended sediment in an open-channel test with a fixed bottom are sensitive to the turbulence closure and the treatment of the hydrodynamic bottom boundary. Results for the migration of a trench are very sensitive to critical stress and erosion rate, but largely insensitive to turbulence closure. The model is able to qualitatively represent sediment dynamics associated with estuarine turbidity maxima in an idealized estuary. Applied to the Columbia River estuary, the model qualitatively captures sediment dynamics observed by fixed stations and shipborne profiles. Representation of the vertical structure of suspended sediment degrades when stratification is underpredicted. Across all tests, skill metrics for suspended sediments lag those for hydrodynamics, even when the dynamics are qualitatively represented. The benchmark is fully documented in an openly available repository to encourage unambiguous comparisons against other models.

  17. Future computing platforms for science in a power constrained era

    DOE PAGES

    Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; ...

    2015-12-23

    Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have conducted a broad survey of current and emerging architectures available on the market, including x86-64 variants, ARMv7 32-bit, ARMv8 64-bit, Many-Core and GPU solutions, as well as newer System-on-Chip (SoC) solutions. We compare performance and energy efficiency using an evolving set of standardized HEP-related benchmarks and power measurement techniques we have been developing. In conclusion, we evaluate the potential for use of such computing solutions in the context of DHTC systems, such as the Worldwide LHC Computing Grid (WLCG).

  18. Microsupercapacitors as miniaturized energy-storage components for on-chip electronics

    NASA Astrophysics Data System (ADS)

    Kyeremateng, Nana Amponsah; Brousse, Thierry; Pech, David

    2017-01-01

    The push towards miniaturized electronics calls for the development of miniaturized energy-storage components that can enable sustained, autonomous operation of electronic devices for applications such as wearable gadgets and wireless sensor networks. Microsupercapacitors have been targeted as a viable route for this purpose, because, though storing less energy than microbatteries, they can be charged and discharged much more rapidly and have an almost unlimited lifetime. In this Review, we discuss the progress and the prospects of integrated miniaturized supercapacitors. In particular, we discuss their power performances and emphasize the need of a three-dimensional design to boost their energy-storage capacity. This is obtainable, for example, through self-supported nanostructured electrodes. We also critically evaluate the performance metrics currently used in the literature to characterize microsupercapacitors and offer general guidelines to benchmark performances towards prospective applications.

  19. Microsupercapacitors as miniaturized energy-storage components for on-chip electronics.

    PubMed

    Kyeremateng, Nana Amponsah; Brousse, Thierry; Pech, David

    2017-01-01

    The push towards miniaturized electronics calls for the development of miniaturized energy-storage components that can enable sustained, autonomous operation of electronic devices for applications such as wearable gadgets and wireless sensor networks. Microsupercapacitors have been targeted as a viable route for this purpose, because, though storing less energy than microbatteries, they can be charged and discharged much more rapidly and have an almost unlimited lifetime. In this Review, we discuss the progress and the prospects of integrated miniaturized supercapacitors. In particular, we discuss their power performances and emphasize the need of a three-dimensional design to boost their energy-storage capacity. This is obtainable, for example, through self-supported nanostructured electrodes. We also critically evaluate the performance metrics currently used in the literature to characterize microsupercapacitors and offer general guidelines to benchmark performances towards prospective applications.

  20. Weighing the impact (factor) of publishing in veterinary journals.

    PubMed

    Christopher, Mary M

    2015-06-01

    The journal in which you publish your research can have a major influence on the perceived value of your work and on your ability to reach certain audiences. The impact factor, a widely used metric of journal quality and prestige, has evolved into a benchmark of quality for institutions and graduate programs and, inappropriately, as a proxy for the quality of individual authors and articles, affecting tenure, promotion, and funding decisions. As a result, despite its many limitations, publishing decisions by authors often are based solely on a journal's impact factor. This can disadvantage journals in small disciplines, such as veterinary medicine, and limit the ability of authors to reach key audiences. In this article, factors that can influence the impact factor of a journal and its applicability, including precision, citation practices, article type, editorial policies, and size of the research community will be reviewed. The value and importance of veterinary journals such as the Journal of Veterinary Cardiology for reaching relevant audiences and for helping shape disciplinary specialties and influence clinical practice will also be discussed. Lastly, the efforts underway to develop alternative measures to assess the scientific quality of individual authors and articles, such as article-level metrics, as well as institutional measures of the economic and social impact of biomedical research will be considered. Judicious use of the impact factor and the implementation of new metrics for assessing the quality and societal relevance of veterinary research articles will benefit both authors and journals.
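
    For readers unfamiliar with the metric, the two-year journal impact factor is a simple ratio; the worked example below uses invented numbers.

    ```python
    # Two-year impact factor: citations received in year Y to items published
    # in years Y-1 and Y-2, divided by the number of citable items published
    # in those two years. Numbers are invented.
    citations_to_prev_two_years = 420   # citations in 2015 to 2013-2014 articles
    citable_items_prev_two_years = 300  # articles + reviews published 2013-2014

    impact_factor = citations_to_prev_two_years / citable_items_prev_two_years
    print(f"impact factor: {impact_factor:.2f}")  # -> 1.40
    ```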

  1. Benchmarks: The Development of a New Approach to Student Evaluation.

    ERIC Educational Resources Information Center

    Larter, Sylvia

    The Toronto Board of Education Benchmarks are libraries of reference materials that demonstrate student achievement at various levels. Each library contains video benchmarks, print benchmarks, a staff handbook, and summary and introductory documents. This book is about the development and the history of the benchmark program. It has taken over 3…

  2. Applying ILAMB to data from several generations of the Community Land Model to assess the relative contribution of model improvements and forcing uncertainty to model-data agreement

    NASA Astrophysics Data System (ADS)

    Lawrence, D. M.; Fisher, R.; Koven, C.; Oleson, K. W.; Swenson, S. C.; Hoffman, F. M.; Randerson, J. T.; Collier, N.; Mu, M.

    2017-12-01

    The International Land Model Benchmarking (ILAMB) project is a model-data intercomparison and integration project designed to assess and help improve land models. The current package includes assessment of more than 25 land variables across more than 60 global, regional, and site-level (e.g., FLUXNET) datasets. ILAMB employs a broad range of metrics including RMSE, mean error, spatial distributions, interannual variability, and functional relationships. Here, we apply ILAMB to assess several generations of the Community Land Model (CLM4, CLM4.5, and CLM5). Encouragingly, CLM5, which is the result of model development over the last several years by more than 50 researchers from 15 different institutions, shows broad improvements across many ILAMB metrics including LAI, GPP, vegetation carbon stocks, and the historical net ecosystem carbon balance, among others. We will also show that considerable uncertainty arises from the historical climate forcing data used (GSWP3v1 and CRUNCEPv7). ILAMB score variations due to forcing data can be as large for many variables as those due to model structural differences. Strengths and weaknesses and persistent biases across model generations will also be presented.
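
    A minimal sketch of two of the metric families named above (RMSE and mean error/bias); this is illustrative and is not the ILAMB package's own scoring code.

    ```python
    import numpy as np

    def rmse(model, obs):
        # root-mean-square error between model output and observations
        return float(np.sqrt(np.mean((model - obs) ** 2)))

    def bias(model, obs):
        # mean error (positive = model overestimates)
        return float(np.mean(model - obs))

    obs = np.array([2.1, 2.4, 3.0, 2.8])   # e.g. observed GPP, invented values
    mod = np.array([1.9, 2.6, 3.3, 2.5])   # e.g. model output, invented values

    print(f"RMSE: {rmse(mod, obs):.3f}  bias: {bias(mod, obs):.3f}")
    ```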

  3. Optimizing Radiation Doses for Computed Tomography Across Institutions: Dose Auditing and Best Practices.

    PubMed

    Demb, Joshua; Chu, Philip; Nelson, Thomas; Hall, David; Seibert, Anthony; Lamba, Ramit; Boone, John; Krishnam, Mayil; Cagnon, Christopher; Bostani, Maryam; Gould, Robert; Miglioretti, Diana; Smith-Bindman, Rebecca

    2017-06-01

    Radiation doses for computed tomography (CT) vary substantially across institutions. To assess the impact of institutional-level audit and collaborative efforts to share best practices on CT radiation doses across 5 University of California (UC) medical centers. In this before/after interventional study, we prospectively collected radiation dose metrics on all diagnostic CT examinations performed between October 1, 2013, and December 31, 2014, at 5 medical centers. Using data from January to March (baseline), we created audit reports detailing the distribution of radiation dose metrics for chest, abdomen, and head CT scans. In April, we shared reports with the medical centers and invited radiology professionals from the centers to a 1.5-day in-person meeting to review reports and share best practices. We calculated changes in mean effective dose 12 weeks before and after the audits and meeting, excluding a 12-week implementation period when medical centers could make changes. We compared proportions of examinations exceeding previously published benchmarks at baseline and following the audit and meeting, and calculated changes in proportion of examinations exceeding benchmarks. Of 158 274 diagnostic CT scans performed in the study period, 29 594 CT scans were performed in the 3 months before and 32 839 CT scans were performed 12 to 24 weeks after the audit and meeting. Reductions in mean effective dose were considerable for chest and abdomen. Mean effective dose for chest CT decreased from 13.2 to 10.7 mSv (18.9% reduction; 95% CI, 18.0%-19.8%). Reductions at individual medical centers ranged from 3.8% to 23.5%. The mean effective dose for abdominal CT decreased from 20.0 to 15.0 mSv (25.0% reduction; 95% CI, 24.3%-25.8%). Reductions at individual medical centers ranged from 10.8% to 34.7%. The number of CT scans that had an effective dose measurement that exceeded benchmarks was reduced considerably by 48% and 54% for chest and abdomen, respectively. After the audit and meeting, head CT doses varied less, although some institutions increased and some decreased mean head CT doses and the proportion above benchmarks. Reviewing institutional doses and sharing dose-optimization best practices resulted in lower radiation doses for chest and abdominal CT and more consistent doses for head CT.
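
    The audit arithmetic reported above (percent change in mean effective dose, share of exams above a published benchmark) reduces to a few lines; the sketch below uses simulated doses whose means roughly mirror the chest figures, and a hypothetical benchmark value, not the study's data.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    before = rng.gamma(shape=4.0, scale=3.3, size=1000)  # mSv, simulated (mean ~13.2)
    after = rng.gamma(shape=4.0, scale=2.7, size=1000)   # mSv, simulated (mean ~10.8)

    benchmark_mSv = 25.0  # hypothetical published benchmark
    reduction = 100 * (before.mean() - after.mean()) / before.mean()
    print(f"mean dose reduction: {reduction:.1f}%")
    print(f"share above benchmark, before: {(before > benchmark_mSv).mean():.1%}")
    print(f"share above benchmark, after:  {(after > benchmark_mSv).mean():.1%}")
    ```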

  4. Deep Correlated Holistic Metric Learning for Sketch-Based 3D Shape Retrieval.

    PubMed

    Dai, Guoxian; Xie, Jin; Fang, Yi

    2018-07-01

    How to effectively retrieve desired 3D models with simple queries is a long-standing problem in the computer vision community. The model-based approach is quite straightforward but nontrivial, since people do not always have the desired 3D query model at hand. Recently, wide-screen electronic devices have become prevalent in our daily lives, which makes sketch-based 3D shape retrieval a promising candidate due to its simplicity and efficiency. The main challenge of the sketch-based approach is the huge modality gap between sketch and 3D shape. In this paper, we propose a novel deep correlated holistic metric learning (DCHML) method to mitigate the discrepancy between the sketch and 3D shape domains. The proposed DCHML trains two distinct deep neural networks (one for each domain) jointly, learning two deep nonlinear transformations that map features from both domains into a new feature space. The proposed loss, including a discriminative loss and a correlation loss, aims to increase the discrimination of features within each domain as well as the correlation between different domains. In the new feature space, the discriminative loss minimizes the intra-class distance of the deep transformed features and maximizes the inter-class distance of the deep transformed features to a large margin within each domain, while the correlation loss focuses on mitigating the distribution discrepancy across different domains. Unlike existing deep metric learning methods, which apply a loss only at the output layer, our proposed DCHML is trained with losses at both the hidden layer and the output layer, further improving performance by encouraging the hidden-layer features to have the desired properties as well. Our proposed method is evaluated on three benchmarks, including the 3D Shape Retrieval Contest 2013, 2014, and 2016 benchmarks, and the experimental results demonstrate the superiority of our proposed method over the state-of-the-art methods.
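
    A toy sketch of the two loss ingredients described above, on plain feature vectors: a margin-based discriminative term (pull same-class pairs together, push different-class pairs beyond a margin) and a simple alignment term penalizing the gap between the two domains' mean embeddings. This is schematic, not the paper's DCHML training code.

    ```python
    import numpy as np

    def discriminative_loss(f1, f2, same_class, margin=1.0):
        # contrastive-style term: squared distance for same-class pairs,
        # squared margin violation for different-class pairs
        d = np.linalg.norm(f1 - f2)
        return d ** 2 if same_class else max(0.0, margin - d) ** 2

    def domain_alignment_loss(sketch_feats, shape_feats):
        # crude proxy for a cross-domain correlation/discrepancy term:
        # squared distance between the domains' mean embeddings
        gap = sketch_feats.mean(axis=0) - shape_feats.mean(axis=0)
        return float(gap @ gap)

    sketch = np.array([[0.1, 0.9], [0.2, 0.8]])  # embedded sketches, invented
    shape = np.array([[0.3, 0.7], [0.4, 0.6]])   # embedded 3D shapes, invented
    print(discriminative_loss(sketch[0], shape[0], same_class=True))
    print(domain_alignment_loss(sketch, shape))
    ```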

  5. Benchmarking--Measuring and Comparing for Continuous Improvement.

    ERIC Educational Resources Information Center

    Henczel, Sue

    2002-01-01

    Discussion of benchmarking focuses on the use of internal and external benchmarking by special librarians. Highlights include defining types of benchmarking; historical development; benefits, including efficiency, improved performance, increased competitiveness, and better decision making; problems, including inappropriate adaptation; developing a…

  6. Measurement of the Inter-Rater Reliability Rate Is Mandatory for Improving the Quality of a Medical Database: Experience with the Paulista Lung Cancer Registry.

    PubMed

    Lauricella, Leticia L; Costa, Priscila B; Salati, Michele; Pego-Fernandes, Paulo M; Terra, Ricardo M

    2018-06-01

    Database quality measurement should be considered a mandatory step to ensure an adequate level of confidence in data used for research and quality improvement. Several metrics have been described in the literature, but no standardized approach has been established. We aimed to describe a methodological approach applied to measure the quality and inter-rater reliability of a regional multicentric thoracic surgical database (Paulista Lung Cancer Registry). Data from the first 3 years of the Paulista Lung Cancer Registry underwent an audit process with 3 metrics: completeness, consistency, and inter-rater reliability. The first 2 methods were applied to the whole data set, and the last method was calculated using 100 cases randomized for direct auditing. Inter-rater reliability was evaluated using percentage of agreement between the data collector and auditor and through calculation of Cohen's κ and intraclass correlation. The overall completeness per section ranged from 0.88 to 1.00, and the overall consistency was 0.96. Inter-rater reliability showed many variables with high disagreement (>10%). For numerical variables, intraclass correlation was a better metric than inter-rater reliability. Cohen's κ showed that most variables had moderate to substantial agreement. The methodological approach applied to the Paulista Lung Cancer Registry showed that completeness and consistency metrics did not sufficiently reflect the real quality status of a database. The inter-rater reliability associated with κ and intraclass correlation was a better quality metric than completeness and consistency metrics because it could determine the reliability of specific variables used in research or benchmark reports. This report can be a paradigm for future studies of data quality measurement.
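
    Cohen's κ, the agreement statistic used in the audit, can be computed directly from a confusion table of collector-versus-auditor codings for one categorical variable; the counts below are invented.

    ```python
    def cohens_kappa(confusion):
        # kappa = (observed agreement - chance agreement) / (1 - chance agreement)
        total = sum(sum(row) for row in confusion)
        p_observed = sum(confusion[i][i] for i in range(len(confusion))) / total
        row_sums = [sum(row) for row in confusion]
        col_sums = [sum(col) for col in zip(*confusion)]
        p_expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
        return (p_observed - p_expected) / (1 - p_expected)

    # rows: data collector's coding, columns: auditor's coding (invented counts)
    confusion = [[40, 5],
                 [3, 52]]
    print(f"kappa: {cohens_kappa(confusion):.3f}")  # ~0.84, substantial agreement
    ```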

  7. Designing for Annual Spacelift Performance

    NASA Technical Reports Server (NTRS)

    McCleskey, Carey M.; Zapata, Edgar

    2017-01-01

    This paper presents a methodology for approaching space launch system design from a total architectural point of view. This different approach to conceptual design is contrasted with traditional approaches that focus on a single set of metrics for flight system performance, i.e., payload lift per flight, vehicle mass, specific impulse, etc. The approach presented works with a larger set of metrics, including annual system lift, or "spacelift" performance. Spacelift performance is more inclusive of the flight production capability of the total architecture, i.e., the flight and ground systems working together as a whole to produce flights on a repeated basis. In the proposed methodology, spacelift performance becomes an important design-for-support parameter for flight system concepts and truly advanced spaceport architectures of the future. The paper covers examples of existing system spacelift performance as benchmarks, points out specific attributes of space transportation systems that must be greatly improved over these existing designs, and outlines current activity in this area.

  8. Performance assessment of static lead-lag feedforward controllers for disturbance rejection in PID control loops.

    PubMed

    Yu, Zhenpeng; Wang, Jiandong

    2016-09-01

    This paper assesses the performance of feedforward controllers for disturbance rejection in univariate feedback plus feedforward control loops. The structures of feedback and feedforward controllers are confined to proportional-integral-derivative and static-lead-lag forms, respectively, and the effects of feedback controllers are not considered. The integral squared error (ISE) and total squared variation (TSV) are used as performance metrics. A performance index is formulated by comparing the current ISE and TSV metrics to their own lower bounds as performance benchmarks. A controller performance assessment (CPA) method is proposed to calculate the performance index from measurements. The proposed CPA method resolves two critical limitations in the existing CPA methods, in order to be consistent with industrial scenarios. Numerical and experimental examples illustrate the effectiveness of the obtained results.
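
    A minimal sketch of the two metrics named above, computed from a sampled disturbance-response error signal: ISE integrates squared error over time, and TSV (total squared variation) sums squared successive differences. The signal is synthetic, and the paper's lower-bound benchmarks are not reproduced here.

    ```python
    import numpy as np

    dt = 0.1
    t = np.arange(0.0, 10.0, dt)
    e = np.exp(-0.5 * t) * np.sin(2.0 * t)  # illustrative decaying error signal

    ise = float(np.sum(e ** 2) * dt)        # integral squared error (discretized)
    tsv = float(np.sum(np.diff(e) ** 2))    # total squared variation
    print(f"ISE: {ise:.4f}  TSV: {tsv:.4f}")
    ```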

  9. I/O Performance Characterization of Lustre and NASA Applications on Pleiades

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Rappleye, Jason; Chang, Johnny; Barker, David Peter; Biswas, Rupak; Mehrotra, Piyush

    2012-01-01

    In this paper we study the performance of the Lustre file system using five scientific and engineering applications representative of the NASA workload on large-scale supercomputing systems such as NASA's Pleiades. In order to facilitate the collection of Lustre performance metrics, we have developed a software tool that exports a wide variety of client and server-side metrics using SGI's Performance Co-Pilot (PCP), and generates a human-readable report on key metrics at the end of a batch job. These performance metrics are (a) amount of data read and written, (b) number of files opened and closed, and (c) remote procedure call (RPC) size distribution (4 KB to 1024 KB, in powers of 2) for I/O operations. RPC size distribution measures the efficiency of the Lustre client and can pinpoint problems such as small write sizes, disk fragmentation, etc. These extracted statistics are useful in determining the I/O pattern of the application and can assist in identifying possible improvements for users' applications. Information on the number of file operations enables a scientist to optimize the I/O performance of their applications. The amount of I/O data helps users choose the optimal stripe size and stripe count to enhance I/O performance. In this paper, we demonstrate the usefulness of this tool on Pleiades for five production-quality NASA scientific and engineering applications. We compare the latency of read and write operations under Lustre to that with NFS by tracing system calls and signals. We also investigate the read and write policies and study the effect of page cache size on I/O operations. We examine the performance impact of Lustre stripe size and stripe count along with a performance evaluation of file-per-process and single-shared-file access by all the processes for the NASA workload using the parameterized IOR benchmark.
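
    The RPC size-distribution metric reduces to bucketing each I/O RPC into power-of-two bins from 4 KB to 1024 KB; the sketch below uses invented sizes and is not the PCP-based tool itself. A distribution dominated by the small bins hints at small writes or fragmentation.

    ```python
    import math
    from collections import Counter

    rpc_sizes_kb = [4, 7, 1024, 900, 512, 64, 5, 1024, 250, 16]  # invented

    def bin_kb(size_kb):
        # round up to the next power of two, clamped to the 4..1024 KB range
        return min(max(2 ** math.ceil(math.log2(size_kb)), 4), 1024)

    counts = Counter(bin_kb(s) for s in rpc_sizes_kb)
    total = len(rpc_sizes_kb)
    for b in sorted(counts):
        print(f"{b:5d} KB: {counts[b] / total:.0%}")
    ```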

  10. Accuracy of ab initio electron correlation and electron densities in vanadium dioxide

    NASA Astrophysics Data System (ADS)

    Kylänpää, Ilkka; Balachandran, Janakiraman; Ganesh, Panchapakesan; Heinonen, Olle; Kent, Paul R. C.; Krogel, Jaron T.

    2017-11-01

    Diffusion quantum Monte Carlo results are used as a reference to analyze properties related to phase stability and magnetism in vanadium dioxide computed with various formulations of density functional theory. We introduce metrics related to energetics, electron densities and spin densities that give us insight on both local and global variations in the antiferromagnetic M1 and R phases. Importantly, these metrics can address contributions arising from the challenging description of the 3d orbital physics in this material. We observe that the best description of energetics between the structural phases does not correspond to the best accuracy in the charge density, which is consistent with observations made recently by Medvedev et al. [Science 355, 371 (2017), 10.1126/science.aag0410] in the context of isolated atoms. However, we do find evidence that an accurate spin density connects to correct energetic ordering of different magnetic states in VO2, although local, semilocal, and meta-GGA functionals tend to erroneously favor demagnetization of the vanadium sites. The recently developed SCAN functional stands out as remaining nearly balanced in terms of magnetization across the M1-R transition and correctly predicting the ground state crystal structure. In addition to ranking current density functionals, our reference energies and densities serve as important benchmarks for future functional development. With our reference data, the accuracy of both the energy and the electron density can be monitored simultaneously, which is useful for functional development. So far, this kind of detailed high-accuracy reference data for correlated materials has been absent from the literature.

  11. Drivers of Dashboard Development (3-D): A Curricular Continuous Quality Improvement Approach.

    PubMed

    Shroyer, A Laurie; Lu, Wei-Hsin; Chandran, Latha

    2016-04-01

    Undergraduate medical education (UME) programs are seeking systematic ways to monitor and manage their educational performance metrics and document their achievement of external goals (e.g., Liaison Committee on Medical Education [LCME] accreditation requirements) and internal objectives (institution-specific metrics). In other continuous quality improvement (CQI) settings, summary dashboard reports have been used to evaluate and improve performance. The Stony Brook University School of Medicine UME leadership team developed and implemented summary dashboard performance reports in 2009 to document LCME standards/criteria compliance, evaluate medical student performance, and identify progress in attaining institutional curricular goals and objectives. Key performance indicators (KPIs) and benchmarks were established and have been routinely monitored as part of the novel Drivers of Dashboard Development (3-D) approach to curricular CQI. The systematic 3-D approach has had positive CQI impacts. Substantial improvements over time have been documented in KPIs including timeliness of clerkship grades, midclerkship feedback, student mistreatment policy awareness, and student satisfaction. Stakeholder feedback indicates that the dashboards have provided useful information guiding data-driven curricular changes, such as integrating clinician-scientists as lecturers in basic science courses to clarify the clinical relevance of specific topics. Gaining stakeholder acceptance of the 3-D approach required clear communication of preestablished targets and annual meetings with department leaders and course/clerkship directors. The 3-D approach may be considered by UME programs as a template for providing faculty and leadership with a CQI framework to establish shared goals, document compliance, report accomplishments, enrich communications, facilitate decisions, and improve performance.

  12. Developing integrated benchmarks for DOE performance measurement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barancik, J.I.; Kramer, C.F.; Thode, Jr. H.C.

    1992-09-30

    The objectives of this task were to describe and evaluate selected existing sources of information on occupational safety and health, with emphasis on hazard and exposure assessment, abatement, training, reporting, and control, identifying exposure and outcome factors in preparation for developing DOE performance benchmarks. Existing resources and methodologies were assessed for their potential use as practical performance benchmarks. Strengths and limitations of current data resources were identified. Guidelines were outlined for developing new or improved performance factors, which then could become the basis for selecting performance benchmarks. Databases for non-DOE comparison populations were identified so that DOE performance could be assessed relative to non-DOE occupational and industrial groups. Systems approaches were described which can be used to link hazards and exposure, event occurrence, and adverse outcome factors, as needed to generate valid, reliable, and predictive performance benchmarks. Databases were identified which contain information relevant to one or more performance assessment categories. A list of 72 potential performance benchmarks was prepared to illustrate the kinds of information that can be produced through a benchmark development program. Current information resources which may be used to develop potential performance benchmarks are limited. There is a need to develop an occupational safety and health information and data system in DOE, which is capable of incorporating demonstrated and documented performance benchmarks prior to, or concurrent with, the development of hardware and software. A key to the success of this systems approach is rigorous development and demonstration of performance benchmark equivalents to users of such data before system hardware and software commitments are institutionalized.

  13. Issues to consider in the derivation of water quality benchmarks for the protection of aquatic life.

    PubMed

    Schneider, Uwe

    2014-01-01

    While water quality benchmarks for the protection of aquatic life have been in use in some jurisdictions for several decades (USA, Canada, several European countries), more and more countries are now setting up their own national water quality benchmark development programs. In doing so, they either adopt an existing method from another jurisdiction, update an existing approach, or develop their own new derivation method. Each approach has its own advantages and disadvantages, and many issues have to be addressed when setting up a water quality benchmark development program or when deriving a water quality benchmark. Each of these tasks requires a special expertise. They may seem simple, but are complex in their details. The intention of this paper was to provide some guidance for this process of water quality benchmark development at the program level, for the derivation methodology development, and in the actual benchmark derivation step, as well as to point out some issues (notably the inclusion of adapted populations and cryptic species, and points to consider in the use of the species sensitivity distribution approach) and future opportunities (an international data repository and international collaboration in water quality benchmark development).
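
    One of the issues flagged above concerns the species sensitivity distribution (SSD) approach; a common variant fits a log-normal distribution to species toxicity endpoints and reads off the HC5, the concentration expected to protect 95% of species. The sketch below uses invented endpoints and a plain normal-quantile fit, not any jurisdiction's prescribed procedure.

    ```python
    import numpy as np

    # invented single-species toxicity endpoints (ug/L)
    toxicity_ug_per_L = np.array([12.0, 30.0, 55.0, 80.0, 150.0, 400.0])

    log_vals = np.log10(toxicity_ug_per_L)
    mu, sigma = log_vals.mean(), log_vals.std(ddof=1)

    z_5 = -1.645                       # 5th percentile of the standard normal
    hc5 = 10 ** (mu + z_5 * sigma)     # hazardous concentration for 5% of species
    print(f"HC5: {hc5:.1f} ug/L")
    ```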

  14. Relational Agreement Measures for Similarity Searching of Cheminformatic Data Sets.

    PubMed

    Rivera-Borroto, Oscar Miguel; García-de la Vega, José Manuel; Marrero-Ponce, Yovani; Grau, Ricardo

    2016-01-01

    Research on similarity searching of cheminformatic data sets has focused on similarity measures using fingerprints. However, nominal scales are the least informative of all metric scales, increasing the number of tied similarity scores and decreasing the effectiveness of the retrieval engines. Tanimoto's coefficient has been claimed to be the most prominent measure for this task. Nevertheless, this field is far from being exhausted, since the no-free-lunch theorem of computer science predicts that "no similarity measure has overall superiority over the population of data sets". We introduce 12 relational agreement (RA) coefficients for seven metric scales, which are integrated within a group fusion-based similarity searching algorithm. These similarity measures are compared to a reference panel of 21 proximity quantifiers over 17 benchmark data sets (MUV), by using informative descriptors, a feature selection stage, a suitable performance metric, and powerful comparison tests. In this stage, RA coefficients perform favourably with respect to the state-of-the-art proximity measures. Afterward, the RA-based method outperforms another four nearest neighbor searching algorithms over the same data domains. In a third validation stage, RA measures are successfully applied to the virtual screening of the NCI data set. Finally, we discuss a possible molecular interpretation for these similarity variants.
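
    The Tanimoto coefficient, the baseline similarity measure discussed above, has a one-line definition on binary fingerprints; the paper's relational agreement coefficients are not reproduced here.

    ```python
    def tanimoto(fp_a, fp_b):
        # |A & B| / |A | B| for fingerprints given as sets of "on" bit positions
        a, b = set(fp_a), set(fp_b)
        shared = len(a & b)
        return shared / (len(a) + len(b) - shared)

    # illustrative fingerprints
    query = [1, 4, 9, 23, 57]
    candidate = [1, 4, 23, 42]
    print(f"Tanimoto: {tanimoto(query, candidate):.3f}")  # 3 / (5 + 4 - 3) = 0.5
    ```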

  15. Enhanced monitoring of the temporal and spatial relationships between water demand and water availability

    NASA Astrophysics Data System (ADS)

    Schneider, C. A.; Aggett, G. R.; Hattendorf, M. J.

    2007-12-01

    Better information on evapotranspiration (ET) is essential to better understanding of consumptive use of water by crops. RTi is using NASA Earth-sun System research results and METRIC (Mapping ET at high Resolution with Internalized Calibration) to increase the repeatability and accuracy of consumptive use estimates. METRIC, an image-processing model for calculating ET as a residual of the surface energy balance, utilizes the thermal band on various satellite remote sensors. Calculating actual ET from satellites can avoid many of the assumptions driving other methods of calculating ET over a large area. Because it is physically based and does not rely on explicit knowledge of crop type in the field, a large potential source of error should be eliminated. This paper assesses sources of error in current operational estimates of ET for an area of the South Platte irrigated lands of Colorado, and benchmarks potential improvements in the accuracy of ET estimates gained using METRIC, as well as the processing efficiency of consumptive use demand for large irrigated lands. Examples highlighting how better water planning decisions and water management can be achieved via enhanced monitoring of the temporal and spatial relationships between water demand and water availability are provided.
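
    Models in the METRIC family compute ET as the residual of the surface energy balance, LE = Rn - G - H; the sketch below converts that residual to an instantaneous ET rate using illustrative flux values, not values derived from imagery.

    ```python
    rn = 600.0   # net radiation (W/m^2), illustrative
    g = 60.0     # soil heat flux (W/m^2), illustrative
    h = 150.0    # sensible heat flux (W/m^2), illustrative

    le = rn - g - h                      # latent heat flux as the residual (W/m^2)
    lambda_v = 2.45e6                    # latent heat of vaporization (J/kg)
    et_mm_per_hr = le / lambda_v * 3600  # kg/m^2/s equals mm/s of water; times 3600 s
    print(f"LE: {le:.0f} W/m^2  ET: {et_mm_per_hr:.2f} mm/hr")
    ```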

  16. Adapting generalization tools to physiographic diversity for the united states national hydrography dataset

    USGS Publications Warehouse

    Buttenfield, B.P.; Stanislawski, L.V.; Brewer, C.A.

    2011-01-01

    This paper reports on generalization and data modeling to create reduced scale versions of the National Hydrographic Dataset (NHD) for dissemination through The National Map, the primary data delivery portal for USGS. Our approach distinguishes local differences in physiographic factors, to demonstrate that knowledge about varying terrain (mountainous, hilly or flat) and varying climate (dry or humid) can support decisions about algorithms, parameters, and processing sequences to create generalized, smaller scale data versions which preserve distinct hydrographic patterns in these regions. We work with multiple subbasins of the NHD that provide a range of terrain and climate characteristics. Specifically tailored generalization sequences are used to create simplified versions of the high resolution data, which was compiled for 1:24,000 scale mapping. Results are evaluated cartographically and metrically against a medium resolution benchmark version compiled for 1:100,000, developing coefficients of linear and areal correspondence.

  17. Measuring distance “as the horse runs”: Cross-scale comparison of terrain-based metrics

    USGS Publications Warehouse

    Buttenfield, Barbara P.; Ghandehari, M; Leyk, S; Stanislawski, Larry V.; Brantley, M E; Qiang, Yi

    2016-01-01

    Distance metrics play significant roles in spatial modeling tasks, such as flood inundation (Tucker and Hancock 2010), stream extraction (Stanislawski et al. 2015), power line routing (Kiessling et al. 2003) and analysis of surface pollutants such as nitrogen (Harms et al. 2009). Avalanche risk is based on slope, aspect, and curvature, all directly computed from distance metrics (Gutiérrez 2012). Distance metrics anchor variogram analysis, kernel estimation, and spatial interpolation (Cressie 1993). Several approaches are employed to measure distance. Planar metrics measure straight-line distance between two points (“as the crow flies”) and are simple and intuitive, but suffer from uncertainties. Planar metrics assume that Digital Elevation Model (DEM) pixels are rigid and flat, as tiny facets of ceramic tile approximating a continuous terrain surface. In truth, terrain can bend, twist and undulate within each pixel. Working with Light Detection and Ranging (lidar) data or High Resolution Topography to achieve precise measurements presents challenges, as filtering can eliminate or distort significant features (Passalacqua et al. 2015). The current availability of lidar data is far from comprehensive in developed nations, and non-existent in many rural and undeveloped regions. Notwithstanding computational advances, distance estimation on DEMs has never been systematically assessed, due to assumptions that improvements are so small that surface adjustment is unwarranted. For individual pixels, inaccuracies may be small, but additive effects can propagate dramatically, especially in regional models (e.g., disaster evacuation) or global models (e.g., sea level rise) where pixels span dozens to hundreds of kilometers (Usery et al. 2003). Such models are increasingly common, lending compelling reasons to understand shortcomings in the use of planar distance metrics. Researchers have studied curvature-based terrain modeling. Jenny et al. (2011) use curvature to generate hierarchical terrain models. Schneider (2001) creates a ‘plausibility’ metric for DEM-extracted structure lines. d’Oleire-Oltmanns et al. (2014) adopt object-based image processing as an alternative to working with DEMs, acknowledging that the pre-processing involved in converting terrain into an object model is computationally intensive, and likely infeasible for some applications. This paper compares planar distance with surface-adjusted distance, evolving from distance “as the crow flies” to distance “as the horse runs”. Several methods are compared for DEMs spanning a range of resolutions for the study area and validated against a 3 meter (m) lidar data benchmark. Error magnitudes vary with pixel size and with the method of surface adjustment. The rate of error increase may also vary with landscape type (terrain roughness, precipitation regimes and land settlement patterns). Cross-scale analysis for a single study area is reported here. Additional areas will be presented at the conference.
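
    The contrast between planar and surface-adjusted distance can be made concrete with a short profile across DEM cells: the planar length is the cell size times the number of steps, while the surface-adjusted length accumulates 3D segment lengths. The elevations and cell size below are invented.

    ```python
    import math

    cell_m = 30.0
    elev_m = [120.0, 135.0, 128.0, 160.0, 158.0]  # elevations along a transect

    # planar ("as the crow flies") vs. surface-adjusted ("as the horse runs")
    planar = cell_m * (len(elev_m) - 1)
    surface = sum(
        math.hypot(cell_m, elev_m[i + 1] - elev_m[i])
        for i in range(len(elev_m) - 1)
    )
    print(f"planar:  {planar:.1f} m")
    print(f"surface: {surface:.1f} m  (+{100 * (surface / planar - 1):.1f}%)")
    ```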

  18. Quantifying China's regional economic complexity

    NASA Astrophysics Data System (ADS)

    Gao, Jian; Zhou, Tao

    2018-02-01

    China has experienced an outstanding economic expansion during the past decades; however, literature on non-monetary metrics that reveal the status of China's regional economic development is still lacking. In this paper, we fill this gap by quantifying the economic complexity of China's provinces by analyzing 25 years of firm data. First, we estimate the regional economic complexity index (ECI), and show that the overall time evolution of provinces' ECI is relatively stable and slow. Then, after linking ECI to economic development and income inequality, we find that the explanatory power of ECI is positive for the former but negative for the latter. Next, we compare different measures of economic diversity and explore their relationships with monetary macroeconomic indicators. Results show that the ECI index and the non-linear-iteration-based Fitness index are comparable, and both have stronger explanatory power than other benchmark measures. Further multivariate regressions suggest the robustness of our results after controlling for other socioeconomic factors. Our work moves a step forward towards better understanding China's regional economic development and non-monetary macroeconomic indicators.
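
    One widely used way to compute an ECI-like score is the "method of reflections," which alternately averages region diversity and activity ubiquity over a binary region-activity matrix. The toy sketch below uses an invented matrix and a fixed small number of iterations, and does not reproduce the paper's firm-level data or exact normalization.

    ```python
    import numpy as np

    # rows: regions, columns: activities; 1 = region hosts the activity (invented)
    M = np.array([[1, 1, 1, 0],
                  [1, 1, 0, 0],
                  [0, 1, 0, 1]], dtype=float)

    k_region0 = M.sum(axis=1)    # diversity of each region (order 0)
    k_activity0 = M.sum(axis=0)  # ubiquity of each activity (order 0)
    k_region, k_activity = k_region0.copy(), k_activity0.copy()

    for _ in range(6):  # a few reflections; many more over-smooth toward uniformity
        k_region, k_activity = (M @ k_activity) / k_region0, (M.T @ k_region) / k_activity0

    # standardize to get an ECI-like score per region
    eci_like = (k_region - k_region.mean()) / k_region.std()
    print(np.round(eci_like, 2))
    ```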

  19. 40 CFR 141.709 - Developing the disinfection profile and benchmark.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 23 2011-07-01 2011-07-01 false Developing the disinfection profile... Cryptosporidium Disinfection Profiling and Benchmarking Requirements § 141.709 Developing the disinfection profile and benchmark. (a) Systems required to develop disinfection profiles under § 141.708 must follow the...

  20. 40 CFR 141.709 - Developing the disinfection profile and benchmark.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 23 2014-07-01 2014-07-01 false Developing the disinfection profile... Cryptosporidium Disinfection Profiling and Benchmarking Requirements § 141.709 Developing the disinfection profile and benchmark. (a) Systems required to develop disinfection profiles under § 141.708 must follow the...

  1. 40 CFR 141.709 - Developing the disinfection profile and benchmark.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 24 2012-07-01 2012-07-01 false Developing the disinfection profile... Cryptosporidium Disinfection Profiling and Benchmarking Requirements § 141.709 Developing the disinfection profile and benchmark. (a) Systems required to develop disinfection profiles under § 141.708 must follow the...

  2. 40 CFR 141.709 - Developing the disinfection profile and benchmark.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 24 2013-07-01 2013-07-01 false Developing the disinfection profile... Cryptosporidium Disinfection Profiling and Benchmarking Requirements § 141.709 Developing the disinfection profile and benchmark. (a) Systems required to develop disinfection profiles under § 141.708 must follow the...

  3. Logistics, Costs, and GHG Impacts of Utility-Scale Co-Firing with 20% Biomass

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nichol, Corrie Ian

    This study analyzes the possibility that biopower in the U.S. is a cost-competitive option to significantly reduce greenhouse gas emissions. In 2009, net greenhouse gas (GHG) emitted in the United States was equivalent to 5,618 million metric tons CO2, up 5.6% from 1990 (EPA 2011). Coal-fired power generation accounted for 1,748 million metric tons of this total. Intuitively, life-cycle CO2 emissions in the power sector could be reduced by substituting renewable biomass for coal. If just 20% of the coal combusted in 2009 had been replaced with biomass, CO2 emissions would have been reduced by 350 million metric tons, or about 6% of net annual GHG emission. This would have required approximately 225 million tons of dry biomass. Such an ambitious fuel substitution would require development of a biomass feedstock production and supply system comparable in scale to that for coal. This material would need to meet stringent specifications to ensure reliable conveyance to boiler burners, efficient combustion, and no adverse impact on heat transfer surfaces and flue gas cleanup operations. Therefore, this report addresses the potential cost/benefit tradeoffs of co-firing 20% specification-qualified biomass (on an energy content basis) in large U.S. coal-fired power plants. The dependence and sensitivity of feedstock cost on source of material, location, supply distance, and demand pressure was established. Subsequently, the dependence of levelized cost of electricity (LCOE) on feedstock costs, power plant feed system retrofit, and impact on boiler performance was determined. An overall life-cycle assessment (LCA) of greenhouse gas emissions savings was next evaluated and compared to wind and solar energy to benchmark the leading alternatives for meeting renewable portfolio standards (RPS).

  4. Quantifying Groundwater Quality at a Regional Scale: Establishing a Foundation for Economic and Health Assessments

    NASA Astrophysics Data System (ADS)

    Belitz, K.

    2015-12-01

    What is the value of clean groundwater? Might one aquifer be considered more valuable than another? To help address these and similar questions, we propose that aquifers be assessed by two metrics: 1) the contaminated area of an aquifer, defined by high concentrations (km2 or proportion); and 2) the equivalent-population potentially impacted by that contamination (number of people or proportion). Concentrations are considered high if they are above a human health benchmark. The two metrics provide a quantitative basis for assessment at the aquifer scale, rather than the well scale. This approach has been applied to groundwater used for public supply in California (Belitz and others, 2015). The assessment distinguishes between population (34 million, 2000 census) and equivalent-population (11 million) because public drinking water supplies can be a mix of surface water and groundwater. The assessment was conducted in 87 study areas, which account for nearly 100% of the groundwater used for public supply. The area-metric, when expressed as a proportion, is useful for identifying where a particular contaminant or class of contaminants might be a cause for concern. In CA, there are 38 study areas where the area-metric ≥ 25% for one or more contaminants; in 7 of these, the area-metric ≥ 50%. Naturally-occurring trace elements, such as arsenic and uranium, are the most prevalent contaminant class in 72 study areas. Nitrate is most prevalent at high concentrations in 11 study areas, and organic compounds in 4. By the area-metric, 23% of the groundwater used for public supply in CA has high concentrations of one or more contaminants (20,000 of 89,000 km2 assessed). The population-metric, when expressed as a number of people, identifies the potential impact of groundwater contamination. There are 33 CA study areas where the population-metric exceeds 10,000 people (equivalent population multiplied by detection frequency of wells with high concentrations). The population-metric exceeds 50,000 people in 10 study areas. On a statewide basis, the population metric is 2 million people (18% of 11 million equivalent-people). The proposed assessment approach is independent of scale, allows for consistent comparison across regions, and provides a foundation for subsequent economic or health assessments.

  5. Development of a proficiency-based virtual reality simulation training curriculum for laparoscopic appendicectomy.

    PubMed

    Sirimanna, Pramudith; Gladman, Marc A

    2017-10-01

    Proficiency-based virtual reality (VR) training curricula improve intraoperative performance, but have not been developed for laparoscopic appendicectomy (LA). This study aimed to develop an evidence-based training curriculum for LA. A total of 10 experienced (>50 LAs), eight intermediate (10-30 LAs) and 20 inexperienced (<10 LAs) operators performed guided and unguided LA tasks on a high-fidelity VR simulator using internationally relevant techniques. The ability to differentiate levels of experience (construct validity) was measured using simulator-derived metrics. Learning curves were analysed. Proficiency benchmarks were defined by the performance of the experienced group. Intermediate and experienced participants completed a questionnaire to evaluate the realism (face validity) and relevance (content validity). Of 18 surgeons, 16 (89%) considered the VR model to be visually realistic and 17 (95%) believed that it was representative of actual practice. All 'guided' modules demonstrated construct validity (P < 0.05), with learning curves that plateaued between sessions 6 and 9 (P < 0.01). When comparing inexperienced to intermediates to experienced, the 'unguided' LA module demonstrated construct validity for economy of motion (5.00 versus 7.17 versus 7.84, respectively; P < 0.01) and task time (864.5 s versus 477.2 s versus 352.1 s, respectively, P < 0.01). Construct validity was also confirmed for number of movements, path length and idle time. Validated modules were used for curriculum construction, with proficiency benchmarks used as performance goals. A VR LA model was realistic and representative of actual practice and was validated as a training and assessment tool. Consequently, the first evidence-based internationally applicable training curriculum for LA was constructed, which facilitates skill acquisition to proficiency.

  6. PMLB: a large benchmark suite for machine learning evaluation and comparison.

    PubMed

    Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H

    2017-01-01

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
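
    A sketch of pulling one PMLB dataset and scoring a baseline model; it assumes the pmlb and scikit-learn packages are installed and that fetch_data exposes the return_X_y convenience flag (check the PMLB documentation for your version).

    ```python
    from pmlb import fetch_data
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # download (and cache) one benchmark dataset from the PMLB collection
    X, y = fetch_data("mushroom", return_X_y=True)

    # score a simple baseline; any classifier could be swapped in here
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
    ```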

  7. Telescience Resource Kit (TReK)

    NASA Technical Reports Server (NTRS)

    Lippincott, Jeff

    2015-01-01

    Telescience Resource Kit (TReK) is one of the Huntsville Operations Support Center (HOSC) remote operations solutions. It can be used to monitor and control International Space Station (ISS) payloads from anywhere in the world. It is comprised of a suite of software applications and libraries that provide generic data system capabilities and access to HOSC services. The TReK Software has been operational since 2000. A new cross-platform version of TReK is under development. The new software is being released in phases during the 2014-2016 timeframe. The TReK Release 3.x series of software is the original TReK software that has been operational since 2000. This software runs on Windows. It contains capabilities to support traditional telemetry and commanding using CCSDS (Consultative Committee for Space Data Systems) packets. The TReK Release 4.x series of software is the new cross platform software. It runs on Windows and Linux. The new TReK software will support communication using standard IP protocols and traditional telemetry and commanding. All the software listed above is compatible and can be installed and run together on Windows. The new TReK software contains a suite of software that can be used by payload developers on the ground and onboard (TReK Toolkit). TReK Toolkit is a suite of lightweight libraries and utility applications for use onboard and on the ground. TReK Desktop is the full suite of TReK software -most useful on the ground. When TReK Desktop is released, the TReK installation program will provide the option to choose just the TReK Toolkit portion of the software or the full TReK Desktop suite. The ISS program is providing the TReK Toolkit software as a generic flight software capability offered as a standard service to payloads. TReK Software Verification was conducted during the April/May 2015 timeframe. Payload teams using the TReK software onboard can reference the TReK software verification. TReK will be demonstrated on-orbit running on an ISS provided T61p laptop. Target Timeframe: September 2015 -2016. The on-orbit demonstration will collect benchmark metrics, and will be used in the future to provide live demonstrations during ISS Payload Conferences. Benchmark metrics and demonstrations will address the protocols described in SSP 52050-0047 Ku Forward section 3.3.7. (Associated term: CCSDS File Delivery Protocol (CFDP)).

  8. The Plumbing of Land Surface Models: Is Poor Performance a Result of Methodology or Data Quality?

    NASA Technical Reports Server (NTRS)

    Haughton, Ned; Abramowitz, Gab; Pitman, Andy J.; Or, Dani; Best, Martin J.; Johnson, Helen R.; Balsamo, Gianpaolo; Boone, Aaron; Cuntz, Matthais; Decharme, Bertrand; hide

    2016-01-01

    The PALS Land sUrface Model Benchmarking Evaluation pRoject (PLUMBER) illustrated the value of prescribing a priori performance targets in model intercomparisons. It showed that the performance of turbulent energy flux predictions from different land surface models, at a broad range of flux tower sites using common evaluation metrics, was on average worse than relatively simple empirical models. For sensible heat fluxes, all land surface models were outperformed by a linear regression against downward shortwave radiation. For latent heat flux, all land surface models were outperformed by a regression against downward shortwave, surface air temperature and relative humidity. These results are explored here in greater detail and possible causes are investigated. We examine whether particular metrics or sites unduly influence the collated results, whether results change according to time-scale aggregation and whether a lack of energy conservation in flux tower data gives the empirical models an unfair advantage in the intercomparison. We demonstrate that energy conservation in the observational data is not responsible for these results. We also show that the partitioning between sensible and latent heat fluxes in LSMs, rather than the calculation of available energy, is the cause of the original findings. Finally, we present evidence suggesting that the nature of this partitioning problem is likely shared among all contributing LSMs. While we do not find a single candidate explanation for why land surface models perform poorly relative to empirical benchmarks in PLUMBER, we do exclude multiple possible explanations and provide guidance on where future research should focus.
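
    The empirical benchmarks in PLUMBER are deliberately simple. The sketch below shows the flavor of the sensible-heat benchmark: a linear regression of sensible heat flux against downward shortwave radiation, fit to (here synthetic) flux tower data and then scored like any model.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    sw_down = rng.uniform(0, 1000, 500)                    # W/m^2, synthetic forcing
    qh_obs = 0.15 * sw_down - 20 + rng.normal(0, 25, 500)  # synthetic "observed" Qh

    # fit the one-predictor empirical benchmark and evaluate it like a model
    slope, intercept = np.polyfit(sw_down, qh_obs, 1)
    qh_pred = slope * sw_down + intercept
    rmse = np.sqrt(np.mean((qh_pred - qh_obs) ** 2))
    print(f"Qh ~ {slope:.3f}*SWdown + {intercept:.1f}, RMSE {rmse:.1f} W/m^2")
    ```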

  9. Using Participatory Action Research to Study the Implementation of Career Development Benchmarks at a New Zealand University

    ERIC Educational Resources Information Center

    Furbish, Dale S.; Bailey, Robyn; Trought, David

    2016-01-01

    Benchmarks for career development services at tertiary institutions have been developed by Careers New Zealand. The benchmarks are intended to provide standards derived from international best practices to guide career development services. A new career development service was initiated at a large New Zealand university just after the benchmarks…

  10. Translating Research on Myoelectric Control into Clinics-Are the Performance Assessment Methods Adequate?

    PubMed

    Vujaklija, Ivan; Roche, Aidan D; Hasenoehrl, Timothy; Sturma, Agnes; Amsuess, Sebastian; Farina, Dario; Aszmann, Oskar C

    2017-01-01

    Missing an upper limb dramatically impairs daily-life activities. Efforts in overcoming the issues arising from this disability have been made in both academia and industry, although their clinical outcome is still limited. Translation of prosthetic research into clinics has been challenging because of the difficulties in meeting the necessary requirements of the market. In this perspective article, we suggest that one relevant factor determining the relatively small clinical impact of myocontrol algorithms for upper limb prostheses is the limit of commonly used laboratory performance metrics. The laboratory conditions, in which the majority of the solutions are being evaluated, fail to sufficiently replicate real-life challenges. We qualitatively support this argument with representative data from seven transradial amputees. Their ability to control a myoelectric prosthesis was tested by measuring the accuracy of offline EMG signal classification, as a typical laboratory performance metric, as well as with clinical scores when performing standard tests of daily living. Despite all subjects reaching relatively high classification accuracy offline, their clinical scores varied greatly and were not strongly predicted by classification accuracy. We therefore support the suggestion to test myocontrol systems using clinical tests on amputees, fully fitted with sockets and prostheses highly resembling the systems they would use in daily living, as an evaluation benchmark. Agreement on this level of testing for systems developed in research laboratories would facilitate clinically relevant progress in this field.
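
    The "offline classification accuracy" metric in question is typically computed by cross-validating a classifier (often LDA) on windowed EMG features; the sketch below uses random noise in place of real recordings, so the score hovers near chance.

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 16))    # 300 windows x 16 EMG features (noise stand-in)
    y = rng.integers(0, 5, size=300)  # 5 motion classes, invented labels

    acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
    print(f"offline accuracy: {acc:.2f} (chance ~ 0.20)")
    ```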

  11. Spring onset variations and long-term trends from new hemispheric-scale products and remote sensing

    NASA Astrophysics Data System (ADS)

    Dye, D. G.; Li, X.; Ault, T.; Zurita-Milla, R.; Schwartz, M. D.

    2015-12-01

    Spring onset is commonly characterized by plant phenophase changes among a variety of biophysical transitions and has important implications for natural and man-managed ecosystems. Here, we present a new integrated analysis of variability in gridded Northern Hemisphere spring onset metrics. We developed a set of hemispheric temperature-based spring indices spanning 1920-2013. As these were derived solely from meteorological data, they are used as a benchmark for isolating the climate system's role in modulating spring "green up" estimated from the annual cycle of normalized difference vegetation index (NDVI). Spatial patterns of interannual variations, teleconnections, and long-term trends were also analyzed in all metrics. At mid-to-high latitudes, all indices exhibit larger variability at interannual to decadal time scales than at spatial scales of a few kilometers. Trends of spring onset vary across space and time. However, compared to long-term trend, interannual to decadal variability generally accounts for a larger portion of the total variance in spring onset timing. Therefore, spring onset trends identified from short existing records may be aliased by decadal climate variations due to their limited temporal depth, even when these records span the entire satellite era. Based on our findings, we also demonstrated that our indices have skill in representing ecosystem-level spring phenology and may have important implications in understanding relationships between phenology, atmosphere dynamics and climate variability.

  12. Information-Theoretic Benchmarking of Land Surface Models

    NASA Astrophysics Data System (ADS)

    Nearing, Grey; Mocko, David; Kumar, Sujay; Peters-Lidard, Christa; Xia, Youlong

    2016-04-01

    Benchmarking is a type of model evaluation that compares model performance against a baseline metric that is derived, typically, from a different existing model. Statistical benchmarking was used to qualitatively show that land surface models do not fully utilize information in boundary conditions [1] several years before Gong et al. [2] discovered the particular type of benchmark that makes it possible to *quantify* the amount of information lost by an incorrect or imperfect model structure. This theoretical development laid the foundation for a formal theory of model benchmarking [3]. Here we extend that theory to separate uncertainty contributions from the three major components of dynamical systems models [4]: model structures, model parameters, and boundary conditions, the last of which describe the time-dependent details of each prediction scenario. The key to this new development is the use of large-sample [5] data sets that span multiple soil types, climates, and biomes, which allows us to segregate uncertainty due to parameters from the two other sources. The benefit of this approach for uncertainty quantification and segregation is that it does not rely on Bayesian priors (although it is strictly coherent with Bayes' theorem and with probability theory), and therefore the partitioning of uncertainty into different components is *not* dependent on any a priori assumptions. We apply this methodology to assess the information use efficiency of the four land surface models that comprise the North American Land Data Assimilation System (Noah, Mosaic, SAC-SMA, and VIC). Specifically, we looked at the ability of these models to estimate soil moisture and latent heat fluxes. We found that in the case of soil moisture, about 25% of net information loss was from boundary conditions, around 45% was from model parameters, and 30-40% was from the model structures. In the case of latent heat flux, boundary conditions contributed about 50% of net uncertainty, and model structures contributed about 40%. There was relatively little difference between the different models. 1. G. Abramowitz, R. Leuning, M. Clark, A. Pitman, Evaluating the performance of land surface models. Journal of Climate 21, (2008). 2. W. Gong, H. V. Gupta, D. Yang, K. Sricharan, A. O. Hero, Estimating Epistemic & Aleatory Uncertainties During Hydrologic Modeling: An Information Theoretic Approach. Water Resources Research 49, 2253-2273 (2013). 3. G. S. Nearing, H. V. Gupta, The quantity and quality of information in hydrologic models. Water Resources Research 51, 524-538 (2015). 4. H. V. Gupta, G. S. Nearing, Using models and data to learn: A systems theoretic perspective on the future of hydrological science. Water Resources Research 50(6), 5351-5359 (2014). 5. H. V. Gupta et al., Large-sample hydrology: a need to balance depth with breadth. Hydrology and Earth System Sciences Discussions 10, 9147-9189 (2013).

  13. Self-supervised online metric learning with low rank constraint for scene categorization.

    PubMed

    Cong, Yang; Liu, Ji; Yuan, Junsong; Luo, Jiebo

    2013-08-01

    Conventional visual recognition systems usually train an image classifier in a batch mode with all training data provided in advance. However, in many practical applications, only a small number of training samples are available in the beginning and many more would come sequentially during online recognition. Because the image data characteristics could change over time, it is important for the classifier to adapt to the new data incrementally. In this paper, we present an online metric learning method to address the online scene recognition problem via adaptive similarity measurement. Given a number of labeled data followed by a sequential input of unseen testing samples, the similarity metric is learned to maximize the margin of the distance among different classes of samples. By considering the low rank constraint, our online metric learning model not only provides competitive performance compared with the state-of-the-art methods, but also guarantees convergence. A bi-linear graph is also defined to model the pair-wise similarity, and an unseen sample is labeled depending on the graph-based label propagation, while the model can also self-update using the more confident new samples. With the ability to learn online, our methodology can handle large-scale streaming video data with incremental self-updating. We apply our model to online scene categorization, and experiments on various benchmark datasets and comparisons with state-of-the-art methods demonstrate the effectiveness and efficiency of our algorithm.

  14. Evaluating appropriate red blood cell transfusions: a quality audit at 10 Ontario hospitals to determine the optimal measure for assessing appropriateness.

    PubMed

    Spradbrow, Jordan; Cohen, Robert; Lin, Yulia; Armali, Chantal; Collins, Allison; Cserti-Gazdewich, Christine; Lieberman, Lani; Pavenski, Katerina; Pendergrast, Jacob; Webert, Kathryn; Callum, Jeannie

    2016-10-01

    Evaluating the appropriateness of red blood cell (RBC) transfusion requires labor-intensive medical chart audits and expert adjudication. We sought to determine the appropriateness of RBC transfusions at 10 hospitals using retrospective chart review and to determine whether simple metrics (proportion of single-unit transfusions, RBCs/100 acute inpatient days, proportion of transfusions with pretransfusion hemoglobin <80 g/L or posttransfusion hemoglobin <90 g/L) could be used as surrogate markers of appropriateness by comparing their values with the results of the audit. An initial block of 30 RBC units was dually adjudicated for appropriateness, followed by additional blocks of 10 units until the difference between the cumulative percentage of appropriate RBC units in the preceding block and the final block was <3%. Pearson correlation tests were used to evaluate associations between the metrics and the percentages of appropriate transfusions per hospital. Two-by-two tables were used to assess the utility of the metrics for classifying transfusions by appropriateness. Of the 498 units audited, 78% were adjudicated as appropriate (κ = 0.9603), with significant variability between institutions (p < 0.0001). Fifty audits or fewer were required at nine of the institutions. The values of the metrics were not found to have significant correlations with appropriateness, and the metric that misclassified the smallest proportion of transfusions was pretransfusion hemoglobin <80 g/L, at 24%. Our findings suggest that a chart audit of 50 RBC transfusions, with adjudication using robust criteria, is the optimal means of evaluating RBC transfusion appropriateness at an institution for benchmarking and quality-improvement initiatives. © 2016 AABB.

  15. Feasibility of Turing-Style Tests for Autonomous Aerial Vehicle "Intelligence"

    NASA Technical Reports Server (NTRS)

    Young, Larry A.

    2007-01-01

    A new approach is suggested to define and evaluate key metrics for autonomous aerial vehicle performance. This approach entails the conceptual definition of a "Turing Test" for UAVs. Such a "UAV Turing test" would be conducted by means of mission simulations and/or tailored flight demonstrations of vehicles under the guidance of their autonomous system software. These autonomous vehicle mission simulations and flight demonstrations would also have to be benchmarked against missions "flown" with pilots/human-operators in the loop. In turn, scoring criteria for such testing could be based both on quantitative mission success metrics (unique to each mission) and on analogous "handling quality" metrics similar to the well-known Cooper-Harper pilot ratings used for manned aircraft. Autonomous aerial vehicles would be considered to have successfully passed this "UAV Turing Test" if the aggregate mission success metrics and handling qualities for the autonomous aerial vehicle matched or exceeded the equivalent metrics for missions conducted with pilots/human-operators in the loop. Alternatively, an independent, knowledgeable observer could provide the "UAV Turing Test" ratings of whether a vehicle is autonomous or "piloted." This observer ideally would, in the more sophisticated mission simulations, also have the enhanced capability of being able to override the scripted mission scenario and instigate failure modes and changes of flight profile/plans. If a majority of mission tasks are rated as "piloted" by the observer, when in reality the vehicle/simulation is fully or semi-autonomously controlled, then the vehicle/simulation "passes" the "UAV Turing Test." In this regard, this second "UAV Turing Test" approach is more consistent with Turing's original "imitation game" proposal. The overall feasibility, and important considerations and limitations, of such an approach for judging/evaluating autonomous aerial vehicle "intelligence" will be discussed from a theoretical perspective.

  16. Academic Productivity of Neurosurgeons Working in the United Kingdom: Insights from the H-Index and Its Variants.

    PubMed

    Jamjoom, Aimun A B; Wiggins, A N; Loan, J J M; Emelifeoneu, J; Fouyas, I P; Brennan, P M

    2016-02-01

    Academic metrics can be used to compare the productivity of researchers. We aimed to use a variety of bibliometric parameters to assess the productivity of neurosurgeons working in the United Kingdom. Neurosurgical consultants working in the United Kingdom were identified using the Society of British Neurosurgeons' Audit Programme website. Baseline data collected included year of entry to the specialist register, academic position, and award of a higher degree. Google Scholar was used to compute a range of academic metrics for each consultant, including the h-index, hi-norm, e-index, and g-index. Non-parametric tests were used to compare median results. Median metrics for the whole cohort were: h-index (5), hi-norm (3), g-index (10.4), and e-index (9). The top three units based on h-index were Addenbrookes (13), Great Ormond Street (12.5), and Queen Square (11.5). The h-index correlated with academic position [Prof (17.5), Senior Lecturer (10.5), and non-academic (5); P < 0.0001], higher degree [PhD (10), MD (6), and none (4.5); P < 0.0001], and consultant experience [>10 years (7), <10 years (4); P < 0.0001]. No difference was found based on gender [male (5), female (4); P = 0.12]. The same trends were seen across the other metrics: hi-norm, e-index, and g-index. This study details the academic impact of United Kingdom-based neurosurgeons through the analysis of a number of citation metrics. It provides a benchmark bibliometric profile, and we advocate future comparative assessments as a means to assess the impact of, and to guide, academic policy. Copyright © 2016 Elsevier Inc. All rights reserved.
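
    The indices used here have simple standard definitions, sketched below from a list of per-paper citation counts (the counts are invented; the hi-norm additionally divides each paper's citations by its author count before computing h, which would require per-paper author data not shown here).

    ```python
    def h_index(citations):
        # largest h such that h papers each have at least h citations
        c = sorted(citations, reverse=True)
        return sum(1 for i, ci in enumerate(c, start=1) if ci >= i)

    def g_index(citations):
        c, total, g = sorted(citations, reverse=True), 0, 0
        for i, ci in enumerate(c, start=1):
            total += ci
            if total >= i * i:    # top g papers have >= g^2 citations in total
                g = i
        return g

    def e_index(citations):
        c = sorted(citations, reverse=True)
        h = h_index(c)
        return (sum(c[:h]) - h * h) ** 0.5   # excess citations beyond the h-core

    def m_quotient(citations, years_since_first_paper):
        return h_index(citations) / years_since_first_paper

    papers = [45, 32, 20, 11, 8, 6, 5, 3, 1, 0]   # invented citation counts
    print(h_index(papers), g_index(papers), round(e_index(papers), 2))   # 6 10 9.27
    ```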

  17. Sigma metrics as a tool for evaluating the performance of internal quality control in a clinical chemistry laboratory.

    PubMed

    Kumar, B Vinodh; Mohan, Thuthi

    2018-01-01

    Six Sigma is one of the most popular quality management tools employed for process improvement. Six Sigma methods are usually applied when the outcome of a process can be measured. This study was done to assess the performance of individual biochemical parameters on a sigma scale by calculating sigma metrics for each parameter, and to follow Westgard guidelines in selecting the appropriate Westgard rules and levels of internal quality control (IQC) to be processed so as to improve target analyte performance based on the sigma metrics. This is a retrospective study; the data required were extracted for July 2015 through June 2016 from a secondary-care government hospital, Chennai. The data obtained for the study are the IQC coefficient of variation percentage (CV%) and the External Quality Assurance Scheme (EQAS) bias% for 16 biochemical parameters. For the level 1 IQC, four analytes (alkaline phosphatase, magnesium, triglyceride, and high-density lipoprotein cholesterol) showed an ideal performance of ≥6 sigma, and five analytes (urea, total bilirubin, albumin, cholesterol, and potassium) showed an average performance of <3 sigma; for the level 2 IQC, the same four analytes from level 1 showed a performance of ≥6 sigma, and four analytes (urea, albumin, cholesterol, and potassium) showed an average performance of <3 sigma. For all analytes below 6 sigma, the quality goal index (QGI) was <0.8, indicating imprecision as the area requiring improvement, except cholesterol, whose QGI >1.2 indicated inaccuracy. This study shows that sigma metrics are a good quality tool to assess the analytical performance of a clinical chemistry laboratory. Sigma metric analysis thus provides a benchmark for the laboratory to design a protocol for IQC, address poor assay performance, and assess the efficiency of existing laboratory processes.
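
    The two quantities driving this kind of analysis have standard Westgard-style forms; a minimal sketch with a hypothetical analyte (the allowable total error, bias, and CV values below are invented).

    ```python
    def sigma_metric(tea_pct, bias_pct, cv_pct):
        # Sigma = (TEa - |bias|) / CV, with all three quantities in percent
        return (tea_pct - abs(bias_pct)) / cv_pct

    def quality_goal_index(bias_pct, cv_pct):
        # QGI < 0.8 suggests imprecision dominates; QGI > 1.2 suggests inaccuracy
        return abs(bias_pct) / (1.5 * cv_pct)

    # hypothetical analyte: allowable total error 10%, bias 2%, CV 1.2%
    print(round(sigma_metric(10, 2, 1.2), 1))     # 6.7 -> "ideal" (>= 6 sigma)
    print(round(quality_goal_index(2, 1.2), 2))   # 1.11
    ```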

  18. Benchmark Evaluation of Start-Up and Zero-Power Measurements at the High-Temperature Engineering Test Reactor

    DOE PAGES

    Bess, John D.; Fujimoto, Nozomu

    2014-10-09

    Benchmark models were developed to evaluate six cold-critical and two warm-critical, zero-power measurements of the HTTR. Additional measurements of a fully loaded subcritical configuration, core excess reactivity, shutdown margins, six isothermal temperature coefficients, and axial reaction-rate distributions were also evaluated as acceptable benchmark experiments. Insufficient information is publicly available to develop finely detailed models of the HTTR, as much of the design information is still proprietary. However, the uncertainties in the benchmark models are judged to be of sufficient magnitude to encompass any biases and bias uncertainties incurred through the simplification process used to develop the benchmark models. Dominant uncertainties in the experimental keff for all core configurations come from uncertainties in the impurity content of the various graphite blocks that comprise the HTTR. Monte Carlo calculations of keff are between approximately 0.9% and 2.7% greater than the benchmark values. Reevaluation of the HTTR models as additional information becomes available could improve the quality of this benchmark and possibly reduce the computational biases. High-quality characterization of graphite impurities would significantly improve the quality of the HTTR benchmark assessment. Simulations of the other reactor physics measurements are in good agreement with the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.

  19. 40 CFR 141.540 - Who has to develop a disinfection benchmark?

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 23 2014-07-01 2014-07-01 false Who has to develop a disinfection... Disinfection-Systems Serving Fewer Than 10,000 People Disinfection Benchmark § 141.540 Who has to develop a disinfection benchmark? If you are a subpart H system required to develop a disinfection profile under §§ 141...

  20. 40 CFR 141.540 - Who has to develop a disinfection benchmark?

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 24 2013-07-01 2013-07-01 false Who has to develop a disinfection... Disinfection-Systems Serving Fewer Than 10,000 People Disinfection Benchmark § 141.540 Who has to develop a disinfection benchmark? If you are a subpart H system required to develop a disinfection profile under §§ 141...

  1. 40 CFR 141.540 - Who has to develop a disinfection benchmark?

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 23 2011-07-01 2011-07-01 false Who has to develop a disinfection... Disinfection-Systems Serving Fewer Than 10,000 People Disinfection Benchmark § 141.540 Who has to develop a disinfection benchmark? If you are a subpart H system required to develop a disinfection profile under §§ 141...

  2. Assessment of transport performance index for urban transport development strategies — Incorporating residents' preferences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ambarwati, Lasmini, E-mail: L.Ambarwati@tudelft.nl; Department of Civil Engineering, Brawijaya University; Verhaeghe, Robert, E-mail: R.Verhaeghe@tudelft.nl

    The performance of urban transport depends on a variety of factors related to metropolitan structure, in particular the patterns of commuting, roads, and public transport (PT) systems. To evaluate urban transport planning efforts, there is a need for a metric expressing the aggregate performance of the city's transport systems that relates to residents' preferences. Existing metrics have typically focused on measures expressing the proximity of job locations to residences. A Transport Performance Index (TPI) is proposed in which the total cost of the transportation system (operational and environmental costs) is divided by the willingness to pay (WTP) for transport plus the willingness to accept (WTA) the environmental effects on residents. Transport operational and environmental costs are derived from a simulation of all transport systems for particular designs of spatial development; willingness to pay for transport and willingness to accept the environmental effects are derived from surveys among residents. Simulations were performed of Surabaya's spatial structure and public transport expansion. The results indicate that the current TPI is high and will double by 2030. With a hypothetical polycentric city structure and an adjusted job-housing balance, a lower index results because of the improvements in urban transport performance. A low index means that residents obtain much benefit from the alternative proposed. This illustrates the importance of residents' preferences in urban spatial planning for achieving efficient urban transport. Applying the index suggests that city authorities should provide fair and equitable public transport systems for suburban residents in the effort to control urban sprawl. The index is a useful tool and a prospective benchmark for measuring sustainability in relation to urban development.
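
    In symbols, the index as described in the abstract is the ratio below (a sketch of the abstract's wording, not the authors' exact notation); a lower TPI means residents receive more value per unit of system cost.

    ```latex
    \mathrm{TPI} \;=\; \frac{C_{\mathrm{operational}} + C_{\mathrm{environmental}}}{\mathrm{WTP} + \mathrm{WTA}}
    ```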

  3. Use of Traditional and Novel Methods to Evaluate the Influence of an EVA Glove on Hand Performance

    NASA Technical Reports Server (NTRS)

    Benson, Elizabeth A.; England, Scott A.; Mesloh, Miranda; Thompson, Shelby; Rajulu, Sudhakar

    2010-01-01

    The gloved hand is one of an astronaut's primary means of interacting with the environment, and any restrictions imposed by the glove can strongly affect performance during extravehicular activity (EVA). Glove restrictions have been the subject of study for decades, yet previous studies have generally been unsuccessful in quantifying glove mobility and tactility. Past studies have tended to focus on the dexterity, strength, and functional performance of the gloved hand; this provides only a circumscribed analysis of the impact of each type of restriction on the glove's overall capability. The aim of this study was to develop novel capabilities to provide metrics for mobility and tactility that can be used to assess the performance of a glove in a way that could enable designers and engineers to improve their current designs. A series of evaluations was performed to compare unpressurized and pressurized (4.3 psi) gloved conditions with the ungloved condition. A second series of evaluations was performed with the Thermal Micrometeoroid Garment (TMG) removed. This series of tests provided interesting insight into how much of an effect the TMG has on gloved mobility: in some cases, the presence of the TMG restricted glove mobility as much as pressurization did. Previous hypotheses had assumed that the TMG would have a much lower impact on mobility, but these results suggest that an improvement in the design of the TMG could have a significant impact on glove performance. Tactility testing illustrated the effect of glove pressurization, provided insight into the design of hardware that interfaces with the glove, and highlighted areas of concern. The metrics developed in this study served to benchmark the Phase VI EVA glove and to develop requirements for the next-generation glove for the Constellation program.

  4. Structural Benchmark Creep Testing for Microcast MarM-247 Advanced Stirling Convertor E2 Heater Head Test Article SN18

    NASA Technical Reports Server (NTRS)

    Krause, David L.; Brewer, Ethan J.; Pawlik, Ralph

    2013-01-01

    This report provides test methodology details and qualitative results for the first structural benchmark creep test of an Advanced Stirling Convertor (ASC) heater head of ASC-E2 design heritage. The test article was recovered from a flight-like Microcast MarM-247 heater head specimen previously used in helium permeability testing. The test article was utilized for benchmark creep test rig preparation, wall thickness and diametral laser scan hardware metrological developments, and induction heater custom coil experiments. In addition, a benchmark creep test was performed, terminated after one week when through-thickness cracks propagated at thermocouple weld locations. Following this, it was used to develop a unique temperature measurement methodology using contact thermocouples, thereby enabling future benchmark testing to be performed without the use of conventional welded thermocouples, proven problematic for the alloy. This report includes an overview of heater head structural benchmark creep testing, the origin of this particular test article, test configuration developments accomplished using the test article, creep predictions for its benchmark creep test, qualitative structural benchmark creep test results, and a short summary.

  5. A resilience-oriented approach for quantitatively assessing recurrent spatial-temporal congestion on urban roads.

    PubMed

    Tang, Junqing; Heinimann, Hans Rudolf

    2018-01-01

    Traffic congestion brings not only delay and inconvenience but also other associated national concerns, such as greenhouse gases, air pollutants, and road safety issues and risks. Identification, measurement, tracking, and control of urban recurrent congestion are vital for building a livable and smart community. A considerable body of work has contributed to tackling the problem. Several methods, such as time-based approaches and level of service, can be effective for characterizing congestion on urban streets. However, studies with systemic perspectives have been rare in congestion quantification. Resilience, on the other hand, is an emerging concept that focuses on comprehensive systemic performance and characterizes the ability of a system to cope with disturbance and to recover its functionality. In this paper, we treated recurrent congestion as an internal disturbance and proposed a modified metric inspired by the well-applied "R4" resilience-triangle framework. We constructed the metric with generic dimensions from both resilience engineering and transport science to quantify recurrent congestion based on spatial-temporal traffic patterns, and we compared it with two other approaches in freeway and signal-controlled arterial cases. Results showed that the metric can effectively capture congestion patterns in the study area and provides a quantitative benchmark for comparison. They also suggested not only good comparative performance in measuring congestion strength but also the capability to account for the discharging process of congestion. Sensitivity tests showed that the proposed metric is robust against parameter perturbation within the Robustness Range (RR), though the number of identified congestion patterns can be influenced by the existence of ϵ. In addition, the Elasticity Threshold (ET) and the spatial dimension of the cell-based platform significantly affect the congestion results, in both the number of patterns detected and their intensity. By tackling this conventional problem with an emerging concept, our metric provides a systemic alternative approach and enriches the toolbox for congestion assessment. Future work will be conducted on a larger scale with multiplex scenarios in various traffic conditions.
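
    To make the resilience-triangle idea concrete, here is a generic sketch (assuming a normalized performance signal, not the paper's exact metric): the loss is the area between the baseline and the degraded performance curve, normalized by the window size.

    ```python
    import numpy as np

    def resilience_loss(t, p, p0=1.0):
        """Normalized area of the resilience triangle: integrated loss of
        performance (p0 - p(t)) over the window, divided by p0 * duration."""
        t, p = np.asarray(t, float), np.asarray(p, float)
        gap = p0 - p
        area = np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(t))  # trapezoid rule
        return area / (p0 * (t[-1] - t[0]))

    # toy example: performance dips during a congestion episode, then recovers
    t = [0, 10, 20, 30, 40, 50, 60]            # minutes
    p = [1.0, 0.9, 0.5, 0.3, 0.5, 0.9, 1.0]    # normalized travel speed
    print(round(resilience_loss(t, p), 3))     # -> 0.317
    ```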

  6. Benchmarks and Quality Assurance for Online Course Development in Higher Education

    ERIC Educational Resources Information Center

    Wang, Hong

    2008-01-01

    As online education has entered the mainstream of U.S. higher education, quality assurance in online course development has become a critical topic in distance education. This short article summarizes the major benchmarks related to online course development, listing and comparing the benchmarks of the National Education Association (NEA),…

  7. Factors associated with increased academic productivity among US academic radiation oncology faculty.

    PubMed

    Zhang, Catherine; Murata, Stephen; Murata, Mark; Fuller, Clifton David; Thomas, Charles R; Choi, Mehee; Holliday, Emma B

    Publication productivity metrics can help evaluate academic faculty for hiring, promotion, grants, and awards; however, limited benchmarking data exist, which makes intra- and interdepartmental comparisons difficult. Therefore, we sought to evaluate the scholarly activity of physician faculty at academic radiation oncology (RO) departments and establish factors associated with increased academic productivity. Citation database searches were performed for all physician faculty in US residency-affiliated academic RO departments. Demographics, National Institutes of Health (NIH) funding, and bibliometrics (number of publications, Hirsch [h] index, and m-index [h-index divided by the number of years since first publication]) were collected and stratified by academic rank. Senior academic rank was defined as full professor, professor, and/or chair; junior academic rank was defined as all others. Logistic regression was performed to determine the association of academic rank and other factors with h- and m-indices. A total of 1191 academic RO physician faculty from 75 institutions were included in the analysis. The mean (standard deviation) number of publications and h- and m-indices were 48.2 (71.2), 14.5 (15), and 0.86 (0.83), respectively. The median (interquartile range) number of publications and h- and m-indices were 20 (6-61), 9 (4-20), and 0.69 (0.38-1.10), respectively. Recursive partitioning analysis revealed a statistically significant numeric h-index threshold of 21 between junior and senior faculty (LogWorth 114; receiver operating characteristic, 0.828). Senior faculty status, receipt of NIH funding, and a larger department size were associated with increased h- and m-indices. Current academic RO departments have relatively high objective metrics of scholastic productivity compared with prior benchmarking analyses of RO departments and with published metrics from other academic medicine subspecialties. An h-index of 21 or greater was associated with senior faculty status. Additionally, receipt of NIH funding and greater departmental size were associated with a higher h-index. These data may be of interest to faculty preparing for promotion or award applications as well as institutional leadership evaluating their departments. Copyright © 2016 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.

  8. Results Oriented Benchmarking: The Evolution of Benchmarking at NASA from Competitive Comparisons to World Class Space Partnerships

    NASA Technical Reports Server (NTRS)

    Bell, Michael A.

    1999-01-01

    Informal benchmarking using personal or professional networks has taken place for many years at the Kennedy Space Center (KSC). The National Aeronautics and Space Administration (NASA) recognized early on the need to formalize the benchmarking process for better utilization of resources and improved benchmarking performance. The need to compete in a faster, better, cheaper environment has been the catalyst for formalizing these efforts. A pioneering benchmarking consortium was chartered at KSC in January 1994. The consortium, known as the Kennedy Benchmarking Clearinghouse (KBC), is a collaborative effort of NASA and all major KSC contractors. The charter of this consortium is to facilitate effective benchmarking and to leverage the resulting quality improvements across KSC. The KBC acts as a resource with experienced facilitators and a proven process. One of the initial actions of the KBC was to develop a holistic methodology for Center-wide benchmarking. This approach integrates the best features of proven benchmarking models (i.e., Camp, Spendolini, Watson, and Balm). This cost-effective alternative to conventional benchmarking approaches has provided a foundation for consistent benchmarking at KSC through the development of common terminology, tools, and techniques. Through these efforts, a foundation and infrastructure have been built that allow short-duration benchmarking studies yielding results gleaned from world-class partners that can be readily implemented. The KBC has been recognized with the Silver Medal Award (in the applied research category) from the International Benchmarking Clearinghouse.

  9. Utility of different glycemic control metrics for optimizing management of diabetes.

    PubMed

    Kohnert, Klaus-Dieter; Heinke, Peter; Vogt, Lutz; Salzsieder, Eckhard

    2015-02-15

    The benchmark for assessing quality of long-term glycemic control and adjustment of therapy is currently glycated hemoglobin (HbA1c). Despite its importance as an indicator for the development of diabetic complications, recent studies have revealed that this metric has some limitations; it conveys a rather complex message, which has to be taken into consideration for diabetes screening and treatment. On the basis of recent clinical trials, the relationship between HbA1c and cardiovascular outcomes in long-standing diabetes has been called into question. It becomes obvious that other surrogate markers and biomarkers are needed to better predict cardiovascular diabetes complications and assess efficiency of therapy. Glycated albumin, fructosamine, and 1,5-anhydroglucitol have received growing interest as alternative markers of glycemic control. In addition to measures of hyperglycemia, advanced glucose monitoring methods became available. An indispensable adjunct to HbA1c in routine diabetes care is self-monitoring of blood glucose. This monitoring method is now widely used, as it provides immediate feedback to patients on short-term changes, involving fasting, preprandial, and postprandial glucose levels. Beyond the traditional metrics, glycemic variability has been identified as a predictor of hypoglycemia, and it might also be implicated in the pathogenesis of vascular diabetes complications. Assessment of glycemic variability is thus important, but exact quantification requires frequently sampled glucose measurements. In order to optimize diabetes treatment, there is a need both for key metrics of glycemic control on a day-to-day basis and for more advanced, user-friendly monitoring methods. In addition to traditional discontinuous glucose testing, continuous glucose sensing has become a useful tool to reveal insufficient glycemic management. This new technology is particularly effective in patients with complicated diabetes and provides the opportunity to characterize glucose dynamics. Several continuous glucose monitoring (CGM) systems, which have shown usefulness in clinical practice, are presently on the market. They can broadly be divided into systems providing retrospective or real-time information on glucose patterns. The widespread clinical application of CGM is still hampered by the lack of generally accepted measures for assessment of glucose profiles and standardized reporting of glucose data. In this article, we will discuss advantages and limitations of various metrics for glycemic control as well as possibilities for evaluation of glucose data, with a special focus on glycemic variability and application of CGM to improve individual diabetes management.
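
    As a minimal illustration of quantifying glycemic variability from sampled glucose values: the coefficient of variation below is only one of several measures in use (metrics such as MAGE or CONGA are not shown), and the trace is invented.

    ```python
    import numpy as np

    def glycemic_variability(glucose):
        g = np.asarray(glucose, dtype=float)
        mean, sd = g.mean(), g.std(ddof=1)
        return {"mean": round(mean, 2), "sd": round(sd, 2),
                "cv_pct": round(100.0 * sd / mean, 1)}

    # invented one-day trace of glucose readings (mmol/L)
    trace = [5.6, 6.1, 7.8, 9.4, 8.2, 6.5, 5.9, 4.8, 5.2, 6.0]
    print(glycemic_variability(trace))
    ```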

  10. Latent uncertainties of the precalculated track Monte Carlo method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Renaud, Marc-André; Seuntjens, Jan; Roberge, David

    Purpose: While significant progress has been made in speeding up Monte Carlo (MC) dose calculation methods, they remain too time-consuming for the purpose of inverse planning. To achieve clinically usable calculation speeds, a precalculated Monte Carlo (PMC) algorithm for proton and electron transport was developed to run on graphics processing units (GPUs). The algorithm utilizes pregenerated particle track data from conventional MC codes for different materials, such as water, bone, and lung, to produce dose distributions in voxelized phantoms. While PMC methods have been described in the past, an explicit quantification of the latent uncertainty arising from the limited number of unique tracks in the pregenerated track bank is missing from the literature. With a proper uncertainty analysis, an optimal number of tracks in the pregenerated track bank can be selected for a desired dose calculation uncertainty. Methods: Particle tracks were pregenerated for electrons and protons using EGSnrc and GEANT4 and saved in a database. The PMC algorithm for track selection, rotation, and transport was implemented on the Compute Unified Device Architecture (CUDA) 4.0 programming framework. PMC dose distributions were calculated in a variety of media and compared to benchmark dose distributions simulated with the corresponding general-purpose MC codes under the same conditions. A latent uncertainty metric was defined, and analysis was performed by varying the pregenerated track bank size and the number of simulated primary particle histories and comparing dose values to a "ground truth" benchmark dose distribution calculated to 0.04% average uncertainty in voxels with dose greater than 20% of Dmax. Efficiency metrics were calculated against benchmark MC codes on a single CPU core with no variance reduction. Results: Dose distributions generated using PMC and benchmark MC codes were compared and found to be within 2% of each other in voxels with dose values greater than 20% of the maximum dose. In proton calculations, a small (≤1 mm) distance-to-agreement error was observed at the Bragg peak. Latent uncertainty was characterized for electrons and found to follow a Poisson distribution with the number of unique tracks per energy. A track bank of 12 energies and 60,000 unique tracks per pregenerated energy in water had a size of 2.4 GB and achieved a latent uncertainty of approximately 1% at an optimal efficiency gain over DOSXYZnrc. Larger track banks produced a lower latent uncertainty at the cost of increased memory consumption. Using an NVIDIA GTX 590, efficiency analysis showed an 807× efficiency increase over DOSXYZnrc for 16 MeV electrons in water and 508× for 16 MeV electrons in bone. Conclusions: The PMC method can calculate dose distributions for electrons and protons to a statistical uncertainty of 1% with a large efficiency gain over conventional MC codes. Before performing clinical dose calculations, models to calculate dose contributions from uncharged particles must be implemented. Following the successful implementation of these models, the PMC method will be evaluated as a candidate for inverse planning of modulated electron radiation therapy and scanned proton beams.
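
    The Poisson behavior reported for the latent uncertainty implies the familiar square-root scaling with the number N of unique tracks per energy (a sketch, not the authors' fitted model; the constant a would depend on particle type, energy, and medium):

    ```latex
    \sigma_{\mathrm{latent}}(N) \;\approx\; \frac{a}{\sqrt{N}}
    ```

    This is why doubling the track bank buys diminishing reductions in latent uncertainty at a linearly growing memory cost.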

  11. Latent uncertainties of the precalculated track Monte Carlo method.

    PubMed

    Renaud, Marc-André; Roberge, David; Seuntjens, Jan

    2015-01-01

    While significant progress has been made in speeding up Monte Carlo (MC) dose calculation methods, they remain too time-consuming for the purpose of inverse planning. To achieve clinically usable calculation speeds, a precalculated Monte Carlo (PMC) algorithm for proton and electron transport was developed to run on graphics processing units (GPUs). The algorithm utilizes pregenerated particle track data from conventional MC codes for different materials, such as water, bone, and lung, to produce dose distributions in voxelized phantoms. While PMC methods have been described in the past, an explicit quantification of the latent uncertainty arising from the limited number of unique tracks in the pregenerated track bank is missing from the literature. With a proper uncertainty analysis, an optimal number of tracks in the pregenerated track bank can be selected for a desired dose calculation uncertainty. Particle tracks were pregenerated for electrons and protons using EGSnrc and GEANT4 and saved in a database. The PMC algorithm for track selection, rotation, and transport was implemented on the Compute Unified Device Architecture (CUDA) 4.0 programming framework. PMC dose distributions were calculated in a variety of media and compared to benchmark dose distributions simulated with the corresponding general-purpose MC codes under the same conditions. A latent uncertainty metric was defined, and analysis was performed by varying the pregenerated track bank size and the number of simulated primary particle histories and comparing dose values to a "ground truth" benchmark dose distribution calculated to 0.04% average uncertainty in voxels with dose greater than 20% of Dmax. Efficiency metrics were calculated against benchmark MC codes on a single CPU core with no variance reduction. Dose distributions generated using PMC and benchmark MC codes were compared and found to be within 2% of each other in voxels with dose values greater than 20% of the maximum dose. In proton calculations, a small (≤1 mm) distance-to-agreement error was observed at the Bragg peak. Latent uncertainty was characterized for electrons and found to follow a Poisson distribution with the number of unique tracks per energy. A track bank of 12 energies and 60,000 unique tracks per pregenerated energy in water had a size of 2.4 GB and achieved a latent uncertainty of approximately 1% at an optimal efficiency gain over DOSXYZnrc. Larger track banks produced a lower latent uncertainty at the cost of increased memory consumption. Using an NVIDIA GTX 590, efficiency analysis showed an 807× efficiency increase over DOSXYZnrc for 16 MeV electrons in water and 508× for 16 MeV electrons in bone. The PMC method can calculate dose distributions for electrons and protons to a statistical uncertainty of 1% with a large efficiency gain over conventional MC codes. Before performing clinical dose calculations, models to calculate dose contributions from uncharged particles must be implemented. Following the successful implementation of these models, the PMC method will be evaluated as a candidate for inverse planning of modulated electron radiation therapy and scanned proton beams.

  12. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  13. An exploratory survey of methods used to develop measures of performance

    NASA Astrophysics Data System (ADS)

    Hamner, Kenneth L.; Lafleur, Charles A.

    1993-09-01

    Nonmanufacturing organizations are being challenged to provide high-quality products and services to their customers, with an emphasis on continuous process improvement. Measures of performance, referred to as metrics, can be used to foster process improvement. The application of performance measurement to nonmanufacturing processes can be very difficult. This research explored methods used to develop metrics in nonmanufacturing organizations. Several methods were formally defined in the literature, and the researchers used a two-step screening process to determine that the OMB Generic Method was most likely to produce high-quality metrics. The OMB Generic Method was then used to develop metrics. A few other metric development methods were found in use at nonmanufacturing organizations. The researchers interviewed participants in metric development efforts to determine their satisfaction and to have them identify the strengths and weaknesses of, and recommended improvements to, the metric development methods used. Analysis of participants' responses allowed the researchers to identify the key components of a sound metric development method. Those components were incorporated into a proposed metric development method based on the OMB Generic Method, which should be more likely to produce high-quality metrics that result in continuous process improvement.

  14. Trade Study: Storing NASA HDF5/netCDF-4 Data in the Amazon Cloud and Retrieving Data via the Hyrax Data Server

    NASA Technical Reports Server (NTRS)

    Habermann, Ted; Gallagher, James; Jelenak, Aleksandar; Potter, Nathan; Lee, Joe; Yang, Kent

    2017-01-01

    This study explored three candidate architectures with different types of objects and access paths for serving NASA Earth Science HDF5 data via Hyrax running on Amazon Web Services (AWS). We studied the cost and performance of each architecture using several representative use cases. The objectives of the study were to (1) conduct a trade study to identify one or more high-performance integrated solutions for storing and retrieving NASA HDF5 and netCDF-4 data in a cloud (web object store) environment, with the Amazon Web Services (AWS) Simple Storage Service (S3) as the target environment; (2) conduct the level of software development needed to properly evaluate solutions in the trade study and to obtain the benchmarking metrics required as input to a government decision on potential follow-on prototyping; and (3) develop a cloud cost model for the preferred data storage solution (or solutions) that accounts for different granulation and aggregation schemes as well as cost and performance trades. We describe the three architectures and the use cases, along with performance results and recommendations for further work.
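
    As an illustration of one of the access paths such a study compares, HDF5 files in S3 can be read directly through a file-like object. This sketch assumes the s3fs and h5py packages; the bucket and key are placeholders, not the study's actual data.

    ```python
    import h5py
    import s3fs

    # Anonymous read of a (hypothetical) public object; bucket/key are placeholders.
    fs = s3fs.S3FileSystem(anon=True)
    with fs.open("s3://example-nasa-bucket/sample_granule.h5", "rb") as f:
        with h5py.File(f, "r") as h5:
            for name in h5:   # list top-level datasets and groups
                obj = h5[name]
                print(name, obj.shape if hasattr(obj, "shape") else "(group)")
    ```

    Whether such direct object reads beat a server-mediated path like Hyrax depends heavily on granule size and aggregation, which is exactly the cost/performance trade the study examines.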

  15. State of emergency preparedness for US health insurance plans.

    PubMed

    Merchant, Raina M; Finne, Kristen; Lardy, Barbara; Veselovskiy, German; Korba, Casey; Margolis, Gregg S; Lurie, Nicole

    2015-01-01

    Health insurance plans serve a critical role in public health emergencies, yet little has been published about their collective emergency preparedness practices and policies. We evaluated, on a national scale, the state of health insurance plans' emergency preparedness and policies via a survey of health insurance plans. We queried members of America's Health Insurance Plans, the national trade association representing the health insurance industry, about issues related to emergency preparedness: infrastructure, adaptability, connectedness, and best practices. Of 137 health insurance plans queried, 63% responded, representing 190.6 million members and 81% of US plan enrollment. All respondents had emergency plans for business continuity, and most (85%) had infrastructure for emergency teams. Some health plans had also established benchmarks for preparedness (eg, response time). Regarding adaptability, 85% had protocols to extend claim filing time and 71% could temporarily suspend prior medical authorization rules. Regarding connectedness, many plans shared their contingency plans with health officials but often cited challenges in identifying regulatory agency contacts. Some health insurance plans had specific policies for assisting individuals dependent on durable medical equipment or home healthcare. Many plans (60%) expressed interest in sharing best practices. Health insurance plans are prioritizing emergency preparedness. We identified 6 policy modifications that health insurance plans could undertake to potentially improve healthcare system preparedness: establishing metrics and benchmarks for emergency preparedness; identifying disaster-specific policy modifications; enhancing stakeholder connectedness; considering digital strategies to enhance communication; improving support and access for special-needs individuals; and developing regular forums for knowledge exchange about emergency preparedness.

  16. Benchmark problems for numerical implementations of phase field models

    DOE PAGES

    Jokisaari, A. M.; Voorhees, P. W.; Guyer, J. E.; ...

    2016-10-01

    Here, we present the first set of benchmark problems for phase field models that are being developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST). While many scientific research areas use a limited set of well-established software, the growing phase field community continues to develop a wide variety of codes and lacks benchmark problems to consistently evaluate the numerical performance of new implementations. Phase field modeling has become significantly more popular as computational power has increased and is now becoming mainstream, driving the need for benchmark problems to validate and verify new implementations. We follow the example set by the micromagnetics community to develop an evolving set of benchmark problems that test the usability, computational resources, numerical capabilities, and physical scope of phase field simulation codes. In this paper, we propose two benchmark problems that cover the physics of solute diffusion and of growth and coarsening of a second phase, via a simple spinodal decomposition model and a more complex Ostwald ripening model. We demonstrate the utility of benchmark problems by comparing the results of simulations performed with two different adaptive time-stepping techniques, and we discuss the needs of future benchmark problems. The development of benchmark problems will enable the results of quantitative phase field models to be confidently incorporated into integrated computational materials science and engineering (ICME), an important goal of the Materials Genome Initiative.
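
    For flavor, here is a generic explicit finite-difference spinodal decomposition (Cahn-Hilliard) step on a periodic grid with a double-well free energy. This is a sketch, not the CHiMaD/NIST benchmark specification, and all parameter values are invented.

    ```python
    import numpy as np

    def lap(f, dx):
        # 5-point Laplacian with periodic boundary conditions
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f) / dx**2

    def cahn_hilliard_step(c, dt, dx, W=1.0, kappa=1.0, M=1.0):
        # chemical potential for the double well f(c) = W * c^2 * (1 - c)^2
        mu = 2.0 * W * c * (1.0 - c) * (1.0 - 2.0 * c) - kappa * lap(c, dx)
        return c + dt * M * lap(mu, dx)   # explicit Euler update

    rng = np.random.default_rng(0)
    c = 0.5 + 0.01 * rng.standard_normal((128, 128))   # near-critical quench
    for _ in range(5000):
        c = cahn_hilliard_step(c, dt=0.01, dx=1.0)
    ```

    Even this toy version shows why the benchmarks compare time-stepping techniques: the explicit update forces a very small dt, which adaptive schemes are designed to escape.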

  17. Benchmarking and Its Relevance to the Library and Information Sector. Interim Findings of "Best Practice Benchmarking in the Library and Information Sector," a British Library Research and Development Department Project.

    ERIC Educational Resources Information Center

    Kinnell, Margaret; Garrod, Penny

    This British Library Research and Development Department study assesses current activities and attitudes toward quality management in library and information services (LIS) in the academic sector as well as the commercial/industrial sector. Definitions and types of benchmarking are described, and the relevance of benchmarking to LIS is evaluated.…

  18. Reliability of hospital cost profiles in inpatient surgery.

    PubMed

    Grenda, Tyler R; Krell, Robert W; Dimick, Justin B

    2016-02-01

    With increased policy emphasis on shifting risk from payers to providers through mechanisms such as bundled payments and accountable care organizations, hospitals are increasingly in need of metrics to understand their costs relative to peers. However, it is unclear whether Medicare payments for surgery can reliably compare hospital costs. We used national Medicare data to assess patients undergoing colectomy, pancreatectomy, and open incisional hernia repair from 2009 to 2010 (n = 339,882 patients). We first calculated risk-adjusted hospital total episode payments for each procedure. We then used hierarchical modeling techniques to estimate the reliability of total episode payments for each procedure and explored the impact of hospital caseload on payment reliability. Finally, we quantified the number of hospitals meeting published reliability benchmarks. Mean risk-adjusted total episode payments ranged from $13,262 (standard deviation [SD] $14,523) for incisional hernia repair to $25,055 (SD $22,549) for pancreatectomy. The reliability of hospital episode payments varied widely across procedures and depended on sample size. For example, mean episode payment reliability for colectomy (mean caseload, 157) was 0.80 (SD 0.18), whereas for pancreatectomy (mean caseload, 13) the mean reliability was 0.45 (SD 0.27). Many hospitals met published reliability benchmarks for each procedure. For example, 90% of hospitals met reliability benchmarks for colectomy, 40% for pancreatectomy, and 66% for incisional hernia repair. Episode payments for inpatient surgery are a reliable measure of hospital costs for commonly performed procedures, but are less reliable for lower volume operations. These findings suggest that hospital cost profiles based on Medicare claims data may be used to benchmark efficiency, especially for more common procedures. Copyright © 2016 Elsevier Inc. All rights reserved.
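
    The caseload dependence reported here is what the standard hierarchical-model formula for reliability predicts (a sketch of the usual definition, not the authors' exact model), with n the hospital's caseload:

    ```latex
    \text{reliability} \;=\; \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}/n}
    ```

    As n shrinks, the within-hospital noise term dominates, which is consistent with low-volume pancreatectomy (mean caseload 13) showing far lower reliability than colectomy (mean caseload 157).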

  19. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

    1991-01-01

    A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  20. Characterizing Wheel-Soil Interaction Loads Using Meshfree Finite Element Methods: A Sensitivity Analysis for Design Trade Studies

    NASA Technical Reports Server (NTRS)

    Contreras, Michael T.; Trease, Brian P.; Bojanowski, Cezary; Kulak, Ronald F.

    2013-01-01

    A wheel experiencing sinkage and slippage events poses a high risk to planetary rover missions, as evidenced by the mobility challenges endured by the Mars Exploration Rover (MER) project. Current wheel design practice utilizes loads derived from a series of events in the life cycle of the rover which do not include (1) failure metrics related to wheel sinkage and slippage and (2) performance trade-offs based on grouser placement/orientation. Wheel designs are rigorously tested experimentally through a variety of drive scenarios and simulated soil environments; however, a robust simulation capability is still in development due to the myriad of complex interaction phenomena that contribute to wheel sinkage and slippage conditions, such as soil composition, large-deformation soil behavior, wheel geometry, nonlinear contact forces, terrain irregularity, etc. For the purposes of modeling wheel sinkage and slippage at an engineering scale, meshfree finite element approaches enable simulations that capture sufficient detail of wheel-soil interaction while remaining computationally feasible. This study implements the JPL wheel-soil benchmark problem in the commercial code environment utilizing the large-deformation modeling capability of Smooth Particle Hydrodynamics (SPH) meshfree methods. The nominal, benchmark wheel-soil interaction model that produces numerically stable and physically realistic results is presented, and simulations are shown for both wheel traverse and wheel sinkage cases. A sensitivity analysis developing the capability and framework for future flight applications is conducted to illustrate the importance of perturbations to critical material properties and parameters. Implementation of the proposed soil-wheel interaction simulation capability and associated sensitivity framework has the potential to reduce experimentation cost and improve the early-stage wheel design process.

  1. The Importance of Non-accessible Crosslinks and Solvent Accessible Surface Distance in Modeling Proteins with Restraints From Crosslinking Mass Spectrometry*

    PubMed Central

    Bullock, Joshua Matthew Allen; Schwab, Jannik; Thalassinos, Konstantinos; Topf, Maya

    2016-01-01

    Crosslinking mass spectrometry (XL-MS) is becoming an increasingly popular technique for modeling protein monomers and complexes. The distance restraints garnered from these experiments can be used alone or as part of an integrative modeling approach incorporating data from many sources. However, modeling practices are varied, and the difference in their usefulness is not clear. Here, we develop a new scoring procedure for models based on crosslink data, the Matched and Nonaccessible Crosslink score (MNXL). We compare its performance with that of other commonly used scoring functions (Number of Violations and Sum of Violation Distances) on a benchmark of 14 protein domains, each with 300 corresponding models (at various levels of quality) and associated, previously published experimental crosslinks (XLdb). The distances between crosslinked lysines are calculated either as Euclidean distances or as Solvent Accessible Surface Distances (SASD) using a newly developed method (Jwalk). MNXL takes into account whether a crosslink is non-accessible, i.e., whether an experimentally observed crosslink has no corresponding SASD in a model because of buried lysines. This consideration alone is shown to have a significant impact on modeling performance and is a concept that is not captured when only Euclidean distances are used. Additionally, a comparison between modeling with SASD and with Euclidean distance shows that SASD is superior, even when factoring out the effect of the non-accessible crosslinks. Our benchmarking also shows that MNXL outperforms the other tested scoring functions in terms of precision and correlation to Cα-RMSD from the crystal structure. We finally test MNXL at different levels of crosslink recovery (i.e., the percentage of crosslinks experimentally observed out of all theoretical ones) and set a target recovery of ∼20%, after which performance plateaus. PMID:27150526
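
    A toy illustration of the scoring idea (matched crosslinks rewarded, non-accessible ones penalized); the cutoff and penalty values are invented, and this is not the published MNXL definition.

    ```python
    def crosslink_score(sasd_by_link, cutoff=33.0, nonaccessible_penalty=-0.1):
        """Toy MNXL-flavored score (illustrative only; not the published MNXL).

        sasd_by_link maps each experimental crosslink to its solvent accessible
        surface distance (SASD, in angstroms) in the model, or to None when a
        lysine is buried and no SASD exists (a 'non-accessible' crosslink).
        """
        score = 0.0
        for sasd in sasd_by_link.values():
            if sasd is None:
                score += nonaccessible_penalty   # penalize non-accessible links
            elif sasd <= cutoff:
                score += 1.0                     # reward matched links
        return score

    print(crosslink_score({"K12-K47": 21.5, "K47-K99": None, "K3-K120": 40.2}))  # 0.9
    ```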

  2. Clinical Trial Assessment of Infrastructure Matrix Tool to Improve the Quality of Research Conduct in the Community.

    PubMed

    Dimond, Eileen P; Zon, Robin T; Weiner, Bryan J; St Germain, Diane; Denicoff, Andrea M; Dempsey, Kandie; Carrigan, Angela C; Teal, Randall W; Good, Marjorie J; McCaskill-Stevens, Worta; Grubbs, Stephen S

    2016-01-01

    Several publications have described minimum standards and exemplary attributes for clinical trial sites to improve research quality. The National Cancer Institute (NCI) Community Cancer Centers Program (NCCCP) developed the clinical trial Best Practice Matrix tool to facilitate research program improvements through annual self-assessments and benchmarking. The tool identified nine attributes, each with three progressive levels, to score clinical trial infrastructural elements from less to more exemplary. The NCCCP sites correlated tool use with research program improvements, and the NCI pursued a formative evaluation to refine the interpretability and measurability of the tool. From 2011 to 2013, 21 NCCCP sites self-assessed their programs with the tool annually. During 2013 to 2014, NCI collaborators conducted a five-step formative evaluation of the matrix tool. Sites reported significant increases in level-three scores across the original nine attributes combined (P<.001). Two specific attributes exhibited significant change: clinical trial portfolio diversity and management (P=.0228) and clinical trial communication (P=.0281). The formative evaluation led to revisions, including renaming the Best Practice Matrix as the Clinical Trial Assessment of Infrastructure Matrix (CT AIM), expanding infrastructural attributes from nine to 11, clarifying metrics, and developing a new scoring tool. Broad community input, cognitive interviews, and pilot testing improved the usability and functionality of the tool. Research programs are encouraged to use the CT AIM to assess and improve site infrastructure. Experience within the NCCCP suggests that the CT AIM is useful for improving quality, benchmarking research performance, reporting progress, and communicating program needs with institutional leaders. The tool model may also be useful in disciplines beyond oncology.

  3. Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Dimpsey, Robert Tod

    1992-01-01

    In practice, the performance evaluation of supercomputers is still substantially driven by single-point estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time-varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, that characterize performance on a dedicated machine ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for their applications. In this type of environment, the most important performance metric is the completion or response time of a given application. However, few evaluation efforts have addressed this issue.

  4. Patient radiation doses in interventional cardiology in the U.S.: Advisory data sets and possible initial values for U.S. reference levels

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, Donald L.; Hilohi, C. Michael; Spelic, David C.

    2012-10-15

    Purpose: To determine patient radiation doses from interventional cardiology procedures in the U.S. and to suggest possible initial values for U.S. benchmarks for patient radiation dose from selected interventional cardiology procedures [fluoroscopically guided diagnostic cardiac catheterization and percutaneous coronary intervention (PCI)]. Methods: Patient radiation dose metrics were derived from analysis of data from the 2008 to 2009 Nationwide Evaluation of X-ray Trends (NEXT) survey of cardiac catheterization. This analysis used deidentified data and did not require review by an IRB. Data from 171 facilities in 30 states were analyzed. The distributions (percentiles) of radiation dose metrics were determined for diagnostic cardiac catheterizations, PCI, and combined diagnostic and PCI procedures. Confidence intervals for these dose distributions were determined using bootstrap resampling. Results: Percentile distributions (advisory data sets) and possible preliminary U.S. reference levels (based on the 75th percentile of the dose distributions) are provided for cumulative air kerma at the reference point (Ka,r), cumulative air kerma-area product (PKA), fluoroscopy time, and number of cine runs. Dose distributions are sufficiently detailed to permit dose audits as described in National Council on Radiation Protection and Measurements Report No. 168. Fluoroscopy times are consistent with those observed in European studies, but PKA is higher in the U.S. Conclusions: Sufficient data exist to suggest possible initial benchmarks for patient radiation dose for certain interventional cardiology procedures in the U.S. Our data suggest that patient radiation dose in these procedures is not optimized in U.S. practice.

  5. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set.

    PubMed

    Lenselink, Eelke B; Ten Dijke, Niels; Bongers, Brandon; Papadatos, George; van Vlijmen, Herman W T; Kowalczyk, Wojtek; IJzerman, Adriaan P; van Westen, Gerard J P

    2017-08-14

    The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies, and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective execution. Deep Neural Networks are the top performing classifiers, highlighting the added value of Deep Neural Networks over other more conventional methods. Moreover, the best method ('DNN_PCM') performed significantly better, at almost one standard deviation above the mean performance. Furthermore, Multi-task and PCM implementations were shown to improve performance over single task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations below the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized 'DNN_PCM'). Here, a standardized set for testing and evaluating different machine learning algorithms in the context of multi-task learning is offered, with both the data and the protocols provided.
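
    Of the two standardized metrics named above, the Matthews Correlation Coefficient is compact enough to state directly; a generic sketch from confusion-matrix counts (the counts are hypothetical, and this is not the study's evaluation pipeline):

        import math

        def mcc(tp, fp, tn, fn):
            """Matthews Correlation Coefficient from confusion-matrix counts."""
            denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
            return (tp * tn - fp * fn) / denom if denom else 0.0

        # Hypothetical active/inactive predictions for one target:
        print(f"MCC = {mcc(tp=80, fp=20, tn=890, fn=10):.3f}")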

  6. 40 CFR 141.172 - Disinfection profiling and benchmarking.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... benchmarking. 141.172 Section 141.172 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED... Disinfection-Systems Serving 10,000 or More People § 141.172 Disinfection profiling and benchmarking. (a... sanitary surveys conducted by the State. (c) Disinfection benchmarking. (1) Any system required to develop...

  7. Establishing Benchmarks for Outcome Indicators: A Statistical Approach to Developing Performance Standards.

    ERIC Educational Resources Information Center

    Henry, Gary T.; And Others

    1992-01-01

    A statistical technique is presented for developing performance standards based on benchmark groups. The benchmark groups are selected using a multivariate technique that relies on a squared Euclidean distance method. For each observation unit (a school district in the example), a unique comparison group is selected. (SLD)
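
    A minimal sketch of this kind of benchmark-group selection, assuming standardized district features (the feature names and values are illustrative, not from the paper):

        def sq_dist(a, b):
            """Squared Euclidean distance between two feature vectors."""
            return sum((x - y) ** 2 for x, y in zip(a, b))

        def benchmark_group(target, districts, k=10):
            """Return the k districts most similar to the target district.
            Each district is (name, feature_vector); features should be
            standardized so no single variable dominates the distance."""
            return sorted(districts, key=lambda d: sq_dist(d[1], target))[:k]

        districts = [("A", [0.1, -0.3, 1.2]), ("B", [0.0, 0.1, 1.0]), ("C", [2.0, 1.5, -0.7])]
        print(benchmark_group([0.0, 0.0, 1.0], districts, k=2))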

  8. Off-Line Evaluation of Mobile-Centric Indoor Positioning Systems: The Experiences from the 2017 IPIN Competition

    PubMed Central

    Moreira, Adriano; Lungenstrass, Tomás; Lu, Wei-Chung; Seco, Fernando; Nicolau, Maria João; Farina, Joaquín; Morales, Juan Pablo; Lu, Wen-Chen; Cheng, Ho-Ti; Yang, Shi-Shen

    2018-01-01

    The development of indoor positioning solutions using smartphones is a growing activity with an enormous potential for everyday life and professional applications. The research activities on this topic concentrate on the development of new positioning solutions that are tested in specific environments under their own evaluation metrics. To explore the real positioning quality of smartphone-based solutions and their capabilities for seamlessly adapting to different scenarios, fair evaluation frameworks are needed. The design of competitions using extensive pre-recorded datasets is a valid way to generate open data for comparing the different solutions created by research teams. In this paper, we discuss the details of the 2017 IPIN indoor localization competition, the different datasets created, the teams participating in the event, and the results they obtained. We compare these results with other competition-based approaches (Microsoft and Perf-loc) and on-line evaluation web sites. The lessons learned by organising these competitions and the benefits for the community are addressed throughout the paper. Our analysis paves the way for future developments in the standardization of evaluations and for creating a widely-adopted benchmark strategy for researchers and companies in the field. PMID:29415508

  9. Off-Line Evaluation of Mobile-Centric Indoor Positioning Systems: The Experiences from the 2017 IPIN Competition.

    PubMed

    Torres-Sospedra, Joaquín; Jiménez, Antonio R; Moreira, Adriano; Lungenstrass, Tomás; Lu, Wei-Chung; Knauth, Stefan; Mendoza-Silva, Germán Martín; Seco, Fernando; Pérez-Navarro, Antoni; Nicolau, Maria João; Costa, António; Meneses, Filipe; Farina, Joaquín; Morales, Juan Pablo; Lu, Wen-Chen; Cheng, Ho-Ti; Yang, Shi-Shen; Fang, Shih-Hau; Chien, Ying-Ren; Tsao, Yu

    2018-02-06

    The development of indoor positioning solutions using smartphones is a growing activity with an enormous potential for everyday life and professional applications. The research activities on this topic concentrate on the development of new positioning solutions that are tested in specific environments under their own evaluation metrics. To explore the real positioning quality of smartphone-based solutions and their capabilities for seamlessly adapting to different scenarios, fair evaluation frameworks are needed. The design of competitions using extensive pre-recorded datasets is a valid way to generate open data for comparing the different solutions created by research teams. In this paper, we discuss the details of the 2017 IPIN indoor localization competition, the different datasets created, the teams participating in the event, and the results they obtained. We compare these results with other competition-based approaches (Microsoft and Perf-loc) and on-line evaluation web sites. The lessons learned by organising these competitions and the benefits for the community are addressed throughout the paper. Our analysis paves the way for future developments in the standardization of evaluations and for creating a widely-adopted benchmark strategy for researchers and companies in the field.

  10. Comparison of two-dimensional and three-dimensional simulations of dense nonaqueous phase liquids (DNAPLs): Migration and entrapment in a nonuniform permeability field

    NASA Astrophysics Data System (ADS)

    Christ, John A.; Lemke, Lawrence D.; Abriola, Linda M.

    2005-01-01

    The influence of reduced dimensionality (two-dimensional (2-D) versus 3-D) on predictions of dense nonaqueous phase liquid (DNAPL) infiltration and entrapment in statistically homogeneous, nonuniform permeability fields was investigated using the University of Texas Chemical Compositional Simulator (UTCHEM), a 3-D numerical multiphase simulator. Hysteretic capillary pressure-saturation and relative permeability relationships implemented in UTCHEM were benchmarked against those of another lab-tested simulator, the Michigan-Vertical and Lateral Organic Redistribution (M-VALOR). Simulation of a tetrachloroethene spill in 16 field-scale aquifer realizations generated DNAPL saturation distributions with approximately equivalent distribution metrics in two and three dimensions, with 2-D simulations generally resulting in slightly higher maximum saturations and increased vertical spreading. Variability in 2-D and 3-D distribution metrics across the set of realizations was shown to be correlated at a significance level of 95-99%. Neither spill volume nor release rate appeared to affect these conclusions. Variability in the permeability field did affect spreading metrics by increasing the horizontal spreading in 3-D more than in 2-D in more heterogeneous media simulations. The assumption of isotropic horizontal spatial statistics resulted, on average, in symmetric 3-D saturation distribution metrics in the horizontal directions. The practical implication of this study is that for statistically homogeneous, nonuniform aquifers, 2-D simulations of saturation distributions are good approximations to those obtained in 3-D. However, additional work will be needed to explore the influence of dimensionality on simulated DNAPL dissolution.

  11. Development and application of freshwater sediment-toxicity benchmarks for currently used pesticides

    USGS Publications Warehouse

    Nowell, Lisa H.; Norman, Julia E.; Ingersoll, Christopher G.; Moran, Patrick W.

    2016-01-01

    Sediment-toxicity benchmarks are needed to interpret the biological significance of currently used pesticides detected in whole sediments. Two types of freshwater sediment benchmarks for pesticides were developed using spiked-sediment bioassay (SSB) data from the literature. These benchmarks can be used to interpret sediment-toxicity data or to assess the potential toxicity of pesticides in whole sediment. The Likely Effect Benchmark (LEB) defines a pesticide concentration in whole sediment above which there is a high probability of adverse effects on benthic invertebrates, and the Threshold Effect Benchmark (TEB) defines a concentration below which adverse effects are unlikely. For compounds without available SSBs, benchmarks were estimated using equilibrium partitioning (EqP). When a sediment sample contains a pesticide mixture, benchmark quotients can be summed for all detected pesticides to produce an indicator of potential toxicity for that mixture. Benchmarks were developed for 48 pesticide compounds using SSB data and 81 compounds using the EqP approach. In an example application, data for pesticides measured in sediment from 197 streams across the United States were evaluated using these benchmarks, and compared to measured toxicity from whole-sediment toxicity tests conducted with the amphipod Hyalella azteca (28-d exposures) and the midge Chironomus dilutus (10-d exposures). Amphipod survival, weight, and biomass were significantly and inversely related to summed benchmark quotients, whereas midge survival, weight, and biomass showed no relationship to benchmarks. Samples with LEB exceedances were rare (n = 3), but all were toxic to amphipods (i.e., significantly different from control). Significant toxicity to amphipods was observed for 72% of samples exceeding one or more TEBs, compared to 18% of samples below all TEBs. Factors affecting toxicity below TEBs may include the presence of contaminants other than pesticides, physical/chemical characteristics of sediment, and uncertainty in TEB values. Additional evaluations of benchmarks in relation to sediment chemistry and toxicity are ongoing.
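
    The mixture indicator described here reduces to summing concentration-to-benchmark ratios; a minimal sketch (the compound names and TEB values are hypothetical, not the published benchmarks):

        # Benchmark quotient: measured concentration / benchmark concentration.
        # Quotients are summed over all detected pesticides in a sample.
        def summed_quotient(concentrations, benchmarks):
            return sum(
                concentrations[p] / benchmarks[p]
                for p in concentrations
                if p in benchmarks
            )

        teb = {"bifenthrin": 0.52, "chlorpyrifos": 0.30}     # hypothetical TEBs
        sample = {"bifenthrin": 0.26, "chlorpyrifos": 0.45}  # measured values
        q = summed_quotient(sample, teb)
        print(f"sum of TEB quotients = {q:.2f}",
              "-> potential toxicity" if q > 1 else "")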

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lopez, Jesse E.; Baptista, António M.

    A sediment model coupled to the hydrodynamic model SELFE is validated against a benchmark combining a set of idealized tests and an application to a field-data rich energetic estuary. After sensitivity studies, model results for the idealized tests largely agree with previously reported results from other models in addition to analytical, semi-analytical, or laboratory results. Results of suspended sediment in an open channel test with fixed bottom are sensitive to turbulence closure and treatment for hydrodynamic bottom boundary. Results for the migration of a trench are very sensitive to critical stress and erosion rate, but largely insensitive to turbulence closure. The model is able to qualitatively represent sediment dynamics associated with estuarine turbidity maxima in an idealized estuary. Applied to the Columbia River estuary, the model qualitatively captures sediment dynamics observed by fixed stations and shipborne profiles. Representation of the vertical structure of suspended sediment degrades when stratification is underpredicted. Across all tests, skill metrics of suspended sediments lag those of hydrodynamics even when qualitatively representing dynamics. The benchmark is fully documented in an openly available repository to encourage unambiguous comparisons against other models.

  13. A general theory of effect size, and its consequences for defining the benchmark response (BMR) for continuous endpoints.

    PubMed

    Slob, Wout

    2017-04-01

    A general theory of effect size for continuous data predicts a relationship between maximum response and within-group variation of biological parameters, which is empirically confirmed by results from dose-response analyses of 27 different biological parameters. The theory shows how effect sizes observed in distinct biological parameters can be compared and provides a basis for a generic definition of small, intermediate, and large effects. While the theory is useful for experimental science in general, it has specific consequences for risk assessment: it settles the current debate on the appropriate metric for the benchmark response (BMR) in continuous data. The theory shows that scaling the BMR, expressed as a percent change in means, to the maximum response (in the way specified) automatically takes "natural variability" into account. Thus, the theory supports the underlying rationale of the BMR of 1 SD. For various reasons, it is, however, recommended to use a BMR in terms of a percent change that is scaled to maximum response and/or within-group variation (averaged over studies), as a single harmonized approach.
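
    As a toy illustration of why a BMR tied to within-group variation yields endpoint-specific percent changes (illustrative arithmetic only, not the paper's derivation):

        def bmr_percent_from_sd(control_mean, within_group_sd):
            """Percent change in the mean corresponding to a 1-SD effect."""
            return 100.0 * within_group_sd / control_mean

        # An endpoint with tight natural variability yields a small percent-change BMR:
        print(bmr_percent_from_sd(control_mean=50.0, within_group_sd=2.5))   # 5.0
        # A noisier endpoint yields a proportionally larger one:
        print(bmr_percent_from_sd(control_mean=50.0, within_group_sd=10.0))  # 20.0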

  14. Benchmarking the performance of fixed-image receptor digital radiographic systems part 1: a novel method for image quality analysis.

    PubMed

    Lee, Kam L; Ireland, Timothy A; Bernardo, Michael

    2016-06-01

    This is the first part of a two-part study in benchmarking the performance of fixed digital radiographic general X-ray systems. This paper concentrates on reporting findings related to quantitative analysis techniques used to establish comparative image quality metrics. A systematic technical comparison of the evaluated systems is presented in part two of this study. A novel quantitative image quality analysis method is presented with technical considerations addressed for peer review. The novel method was applied to seven general radiographic systems with four different makes of radiographic image receptor (12 image receptors in total). For the System Modulation Transfer Function (sMTF), the use of a grid was found to reduce veiling glare and decrease roll-off. The major contributor to sMTF degradation was found to be focal spot blurring. For the System Normalised Noise Power Spectrum (sNNPS), it was found that all systems examined had similar sNNPS responses. A mathematical model is presented to explain how the use of a stationary grid may cause a difference between horizontal and vertical sNNPS responses.

  15. Robust Visual Tracking Revisited: From Correlation Filter to Template Matching.

    PubMed

    Liu, Fanghui; Gong, Chen; Huang, Xiaolin; Zhou, Tao; Yang, Jie; Tao, Dacheng

    2018-06-01

    In this paper, we propose a novel matching-based tracker by investigating the relationship between template matching and the recently popular correlation filter based trackers (CFTs). Compared to the correlation operation in CFTs, a sophisticated similarity metric termed mutual buddies similarity is proposed to exploit the relationship of multiple reciprocal nearest neighbors for target matching. By doing so, our tracker obtains powerful discriminative ability in distinguishing target from background, as demonstrated by both empirical and theoretical analyses. Besides, instead of utilizing a single template with the improper updating scheme used in CFTs, we design a novel online template updating strategy named memory, which aims to select a certain amount of representative and reliable tracking results in history to construct the current stable and expressive template set. This scheme helps the proposed tracker to comprehensively understand the target appearance variations and to recall stable historical results. Both qualitative and quantitative evaluations on two benchmarks suggest that the proposed tracking method performs favorably against some recently developed CFTs and other competitive trackers.

  16. DEKOIS: demanding evaluation kits for objective in silico screening--a versatile tool for benchmarking docking programs and scoring functions.

    PubMed

    Vogel, Simon M; Bauer, Matthias R; Boeckler, Frank M

    2011-10-24

    For widely applied in silico screening techniques, success depends on the rational selection of an appropriate method. We herein present a fast, versatile, and robust method to construct demanding evaluation kits for objective in silico screening (DEKOIS). This automated process enables creating tailor-made decoy sets for any given sets of bioactives. It facilitates a target-dependent validation of docking algorithms and scoring functions, helping to save time and resources. We have developed metrics for assessing and improving decoy set quality and employ them to investigate how decoy embedding affects docking. We demonstrate that screening performance is target-dependent and can be impaired by latent actives in the decoy set (LADS) or enhanced by poor decoy embedding. The presented method allows extending and complementing the collection of publicly available high quality decoy sets toward new target space. All present and future DEKOIS data sets will be made accessible at www.dekois.com.

  17. QUAST: quality assessment tool for genome assemblies.

    PubMed

    Gurevich, Alexey; Saveliev, Vladislav; Vyahhi, Nikolay; Tesler, Glenn

    2013-04-15

    Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers have been developed, but none is yet a recognized benchmark. Further, most existing methods for comparing assemblies are only applicable to new assemblies of finished genomes; the problem of evaluating assemblies of previously unsequenced species has not been adequately considered. Here, we present QUAST, a quality assessment tool for evaluating and comparing genome assemblies. This tool improves on leading assembly comparison software with new ideas and quality metrics. QUAST can evaluate assemblies both with and without a reference genome. QUAST produces many reports, summary tables, and plots to help scientists in their research and in their publications. In this study, we used QUAST to compare several genome assemblers on three datasets. QUAST tables and plots for all of them are available in the Supplementary Material, and interactive versions of these reports are on the QUAST website. http://bioinf.spbau.ru/quast . Supplementary data are available at Bioinformatics online.
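
    Contig N50 is representative of the quality metrics such a tool reports; a generic sketch (not QUAST's own implementation):

        def n50(contig_lengths):
            """Length L such that contigs of length >= L cover at least
            half of the total assembly length."""
            total = sum(contig_lengths)
            running = 0
            for length in sorted(contig_lengths, reverse=True):
                running += length
                if running * 2 >= total:
                    return length

        print(n50([100, 200, 300, 400, 500]))  # -> 400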

  18. A Plan for Academic Biobank Solvency-Leveraging Resources and Applying Business Processes to Improve Sustainability.

    PubMed

    Uzarski, Diane; Burke, James; Turner, Barbara; Vroom, James; Short, Nancy

    2015-10-01

    Researcher-initiated biobanks based at academic institutions contribute valuable biomarker and translational research advances to medicine. With many legacy banks once supported by federal funding, reductions in fiscal support threaten the future of existing and new biobanks. When the Brain Bank at Duke University's Bryan Alzheimer's Disease Center (ADRC) faced a funding crisis, a collaborative, multidisciplinary team embarked on a 2-year biobank sustainability project utilizing a comprehensive business strategy, dedicated project management, and a systems approach involving many Duke University entities. By synthesizing and applying existing knowledge, Duke Translational Medicine Institute created and launched a business model that can be adjusted and applied to legacy and start-up academic biobanks. This model provides a path to identify new funding mechanisms, while also emphasizing improved communication, business development, and a focus on collaborating with industry to improve access to biospecimens. Benchmarks for short-term Brain Bank stabilization have been successfully attained, and the evaluation of long-term sustainability metrics is ongoing. © 2015 Wiley Periodicals, Inc.

  19. A Plan for Academic Biobank Solvency—Leveraging Resources and Applying Business Processes to Improve Sustainability

    PubMed Central

    Burke, James; Turner, Barbara; Vroom, James; Short, Nancy

    2015-01-01

    Researcher-initiated biobanks based at academic institutions contribute valuable biomarker and translational research advances to medicine. With many legacy banks once supported by federal funding, reductions in fiscal support threaten the future of existing and new biobanks. When the Brain Bank at Duke University's Bryan Alzheimer's Disease Center (ADRC) faced a funding crisis, a collaborative, multidisciplinary team embarked on a 2-year biobank sustainability project utilizing a comprehensive business strategy, dedicated project management, and a systems approach involving many Duke University entities. By synthesizing and applying existing knowledge, Duke Translational Medicine Institute created and launched a business model that can be adjusted and applied to legacy and start-up academic biobanks. This model provides a path to identify new funding mechanisms, while also emphasizing improved communication, business development, and a focus on collaborating with industry to improve access to biospecimens. Benchmarks for short-term Brain Bank stabilization have been successfully attained, and the evaluation of long-term sustainability metrics is ongoing. PMID:25996355

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sandor, Debra; Chung, Donald; Keyser, David

    This report documents the CEMAC methodologies for developing and reporting annual global clean energy manufacturing benchmarks. The report reviews previously published manufacturing benchmark reports and foundational data, establishes a framework for benchmarking clean energy technologies, describes the CEMAC benchmark analysis methodologies, and describes the application of the methodologies to the manufacturing of four specific clean energy technologies.

  1. Benchmarking for Higher Education.

    ERIC Educational Resources Information Center

    Jackson, Norman, Ed.; Lund, Helen, Ed.

    The chapters in this collection explore the concept of benchmarking as it is being used and developed in higher education (HE). Case studies and reviews show how universities in the United Kingdom are using benchmarking to aid in self-regulation and self-improvement. The chapters are: (1) "Introduction to Benchmarking" (Norman Jackson…

  2. Summer 2016

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mendoza, Paul Michael

    2016-08-31

    The project seeks to develop applications to automate MCNP criticality benchmark execution; to create a dataset containing static benchmark information; to combine MCNP output with benchmark information; and to fit and visually represent the data.

  3. Development of a Computer-based Benchmarking and Analytical Tool. Benchmarking and Energy & Water Savings Tool in Dairy Plants (BEST-Dairy)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Tengfang; Flapper, Joris; Ke, Jing

    The overall goal of the project is to develop a computer-based benchmarking and energy and water savings tool (BEST-Dairy) for use in the California dairy industry – including four dairy processes – cheese, fluid milk, butter, and milk powder.

  4. Rural Elementary Students' Understanding of Science and Agricultural Education Benchmarks Related to Meat and Livestock.

    ERIC Educational Resources Information Center

    Meischen, Deanna L.; Trexler, Cary J.

    2003-01-01

    Seven fifth-graders developed concept maps depicting their knowledge of meat product development. Despite their rural background, they lacked understanding of agriculture concepts and had mixed knowledge of agricultural literacy benchmarks concerning food products. Their language did not reflect scientific terminology in the benchmarks. (Contains…

  5. Measuring Information Security: Guidelines to Build Metrics

    NASA Astrophysics Data System (ADS)

    von Faber, Eberhard

    Measuring information security is a genuine interest of security managers. With metrics, they can raise their security organization's visibility and standing within the enterprise or public authority as a whole. Organizations using information technology need to use security metrics. Despite the clear demand and advantages, security metrics are often poorly developed, or ineffective parameters are collected and analysed. This paper describes best practices for the development of security metrics. First, attention is drawn to motivation, showing both requirements and benefits. The main body of this paper lists things which need to be observed (characteristics of metrics), things which can be measured (how measurements can be conducted), and steps for the development and implementation of metrics (procedures and planning). Analysis and communication are also key when using security metrics. Examples are given in order to develop a better understanding. The author wants to resume, continue, and develop the discussion about a topic which is, or increasingly will be, a critical factor of success for security managers in larger organizations.

  6. Using Benchmarking To Influence Tuition and Fee Decisions.

    ERIC Educational Resources Information Center

    Hubbell, Loren W. Loomis; Massa, Robert J.; Lapovsky, Lucie

    2002-01-01

    Discusses the use of benchmarking in managing enrollment. Using a case study, illustrates how benchmarking can help administrators develop strategies for planning and implementing admissions and pricing practices. (EV)

  7. Automated, contour-based tracking and analysis of cell behaviour over long time scales in environments of varying complexity and cell density.

    PubMed

    Baker, Richard M; Brasch, Megan E; Manning, M Lisa; Henderson, James H

    2014-08-06

    Understanding single and collective cell motility in model environments is foundational to many current research efforts in biology and bioengineering. To elucidate subtle differences in cell behaviour despite cell-to-cell variability, we introduce an algorithm for tracking large numbers of cells for long time periods and present a set of physics-based metrics that quantify differences in cell trajectories. Our algorithm, termed automated contour-based tracking for in vitro environments (ACTIVE), was designed for adherent cell populations subject to nuclear staining or transfection. ACTIVE is distinct from existing tracking software because it accommodates both variability in image intensity and multi-cell interactions, such as divisions and occlusions. When applied to low-contrast images from live-cell experiments, ACTIVE reduced error in analysing cell occlusion events by as much as 43% compared with a benchmark-tracking program while simultaneously tracking cell divisions and resulting daughter-daughter cell relationships. The large dataset generated by ACTIVE allowed us to develop metrics that capture subtle differences between cell trajectories on different substrates. We present cell motility data for thousands of cells studied at varying densities on shape-memory-polymer-based nanotopographies and identify several quantitative differences, including an unanticipated difference between two 'control' substrates. We expect that ACTIVE will be immediately useful to researchers who require accurate, long-time-scale motility data for many cells. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
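
    One example of a physics-based trajectory metric of the kind described is the mean squared displacement; a generic sketch (ACTIVE's actual metric set is not reproduced here):

        def mean_squared_displacement(track, lag):
            """MSD at a given time lag for one cell track of (x, y) positions."""
            pairs = [(track[i], track[i + lag]) for i in range(len(track) - lag)]
            return sum((x2 - x1) ** 2 + (y2 - y1) ** 2
                       for (x1, y1), (x2, y2) in pairs) / len(pairs)

        track = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 2)]  # toy positions per frame
        print([round(mean_squared_displacement(track, lag), 2) for lag in (1, 2)])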

  8. A reference standard-based quality assurance program for radiology.

    PubMed

    Liu, Patrick T; Johnson, C Daniel; Miranda, Rafael; Patel, Maitray D; Phillips, Carrie J

    2010-01-01

    The authors have developed a comprehensive radiology quality assurance (QA) program that evaluates radiology interpretations and procedures by comparing them with reference standards. Performance metrics are calculated and then compared with benchmarks or goals on the basis of published multicenter data and meta-analyses. Additional workload for physicians is kept to a minimum by having trained allied health staff members perform the comparisons of radiology reports with the reference standards. The performance metrics tracked by the QA program include the accuracy of CT colonography for detecting polyps, the false-negative rate for mammographic detection of breast cancer, the accuracy of CT angiography detection of coronary artery stenosis, the accuracy of meniscal tear detection on MRI, the accuracy of carotid artery stenosis detection on MR angiography, the accuracy of parathyroid adenoma detection by parathyroid scintigraphy, the success rate for obtaining cortical tissue on ultrasound-guided core biopsies of pelvic renal transplants, and the technical success rate for peripheral arterial angioplasty procedures. In contrast with peer-review programs, this reference standard-based QA program minimizes the possibilities of reviewer bias and erroneous second reviewer interpretations. The more objective assessment of performance afforded by the QA program will provide data that can easily be used for education and management conferences, research projects, and multicenter evaluations. Additionally, such performance data could be used by radiology departments to demonstrate their value over nonradiology competitors to referring clinicians, hospitals, patients, and third-party payers. Copyright 2010 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  9. Developing Benchmarks for Solar Radio Bursts

    NASA Astrophysics Data System (ADS)

    Biesecker, D. A.; White, S. M.; Gopalswamy, N.; Black, C.; Domm, P.; Love, J. J.; Pierson, J.

    2016-12-01

    Solar radio bursts can interfere with radar, communication, and tracking signals. In severe cases, radio bursts can inhibit the successful use of radio communications and disrupt a wide range of systems that are reliant on Position, Navigation, and Timing services on timescales ranging from minutes to hours across wide areas on the dayside of Earth. The White House's Space Weather Action Plan has asked for solar radio burst intensity benchmarks for an event occurrence frequency of 1 in 100 years and also a theoretical maximum intensity benchmark. The solar radio benchmark team was also asked to define the wavelength/frequency bands of interest. The benchmark team developed preliminary (phase 1) benchmarks for the VHF (30-300 MHz), UHF (300-3000 MHz), GPS (1176-1602 MHz), F10.7 (2800 MHz), and Microwave (4000-20000 MHz) bands. The preliminary benchmarks were derived based on previously published work. Limitations in the published work will be addressed in phase 2 of the benchmark process. In addition, deriving theoretical maxima requires additional work, where doing so is even possible, in order to meet the Action Plan objectives. In this presentation, we will present the phase 1 benchmarks and the basis used to derive them. We will also present the work that needs to be done in order to complete the final, or phase 2, benchmarks.

  10. Implementing effective and sustainable multidisciplinary clinical thoracic oncology programs

    PubMed Central

    Freeman, Richard K.; Krasna, Mark J.

    2015-01-01

    Three models of care are described, including two models of multidisciplinary care for thoracic malignancies. The pros and cons of each model are discussed, the evidence supporting each is reviewed, and the need for more (and better) research into care delivery models is highlighted. Key stakeholders in thoracic oncology care delivery outcomes are identified, and the need to consider stakeholder perspectives in designing, validating, and implementing multidisciplinary programs as a vehicle for quality improvement in thoracic oncology is emphasized. The importance of reconciling stakeholder perspectives and of identifying meaningful stakeholder-relevant benchmarks is also emphasized. Metrics for measuring program implementation and overall success are proposed. PMID:26380186

  11. Implementing effective and sustainable multidisciplinary clinical thoracic oncology programs.

    PubMed

    Osarogiagbon, Raymond U; Freeman, Richard K; Krasna, Mark J

    2015-08-01

    Three models of care are described, including two models of multidisciplinary care for thoracic malignancies. The pros and cons of each model are discussed, the evidence supporting each is reviewed, and the need for more (and better) research into care delivery models is highlighted. Key stakeholders in thoracic oncology care delivery outcomes are identified, and the need to consider stakeholder perspectives in designing, validating, and implementing multidisciplinary programs as a vehicle for quality improvement in thoracic oncology is emphasized. The importance of reconciling stakeholder perspectives and of identifying meaningful stakeholder-relevant benchmarks is also emphasized. Metrics for measuring program implementation and overall success are proposed.

  12. Information filtering based on corrected redundancy-eliminating mass diffusion.

    PubMed

    Zhu, Xuzhen; Yang, Yujie; Chen, Guilin; Medo, Matus; Tian, Hui; Cai, Shi-Min

    2017-01-01

    Methods used in information filtering and recommendation often rely on quantifying the similarity between objects or users. The similarity metrics used often suffer from similarity redundancies arising from correlations between objects' attributes. Based on an unweighted, undirected object-user bipartite network, we propose a Corrected Redundancy-Eliminating similarity index (CRE), which is based on a spreading process on the network. Extensive experiments on three benchmark data sets (MovieLens, Netflix, and Amazon) show that when used in recommendation, the CRE yields significant improvements in terms of recommendation accuracy and diversity. A detailed analysis is presented to unveil the origins of the observed differences between the CRE and mainstream similarity indices.
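
    The CRE index builds on mass diffusion over the bipartite network; a sketch of the plain mass-diffusion (ProbS) baseline it corrects (the redundancy-eliminating correction terms themselves are not reproduced here):

        import numpy as np

        def probs_transfer(A):
            """Object-to-object transfer matrix for plain mass diffusion.
            A[u, o] = 1 if user u collected object o; assumes no
            zero-degree users or objects. W[i, j] is the fraction of
            object j's resource that ends up on object i after a
            user -> object -> user two-step spreading."""
            k_user = A.sum(axis=1)   # user degrees
            k_obj = A.sum(axis=0)    # object degrees
            return (A / k_user[:, None]).T @ (A / k_obj[None, :])

        A = np.array([[1, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 1, 1]], dtype=float)
        W = probs_transfer(A)
        scores = W @ A[0]            # recommendation scores for user 0
        print(np.round(scores, 3))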

  13. Quantification and characterization of leakage errors

    NASA Astrophysics Data System (ADS)

    Wood, Christopher J.; Gambetta, Jay M.

    2018-03-01

    We present a general framework for the quantification and characterization of leakage errors that result when a quantum system is encoded in the subspace of a larger system. To do this we introduce metrics for quantifying the coherent and incoherent properties of the resulting errors and we illustrate this framework with several examples relevant to superconducting qubits. In particular, we propose two quantities, the leakage and seepage rates, which together with average gate fidelity allow for characterizing the average performance of quantum gates in the presence of leakage and show how the randomized benchmarking protocol can be modified to enable the robust estimation of all three quantities for a Clifford gate set.

  14. Understanding Acceptance of Software Metrics--A Developer Perspective

    ERIC Educational Resources Information Center

    Umarji, Medha

    2009-01-01

    Software metrics are measures of software products and processes. Metrics are widely used by software organizations to help manage projects, improve product quality and increase efficiency of the software development process. However, metrics programs tend to have a high failure rate in organizations, and developer pushback is one of the sources…

  15. Towards a physics on fractals: Differential vector calculus in three-dimensional continuum with fractal metric

    NASA Astrophysics Data System (ADS)

    Balankin, Alexander S.; Bory-Reyes, Juan; Shapiro, Michael

    2016-02-01

    One way to deal with physical problems on nowhere differentiable fractals is the mapping of these problems into the corresponding problems for continuum with a proper fractal metric. On this way different definitions of the fractal metric were suggested to account for the essential fractal features. In this work we develop the metric differential vector calculus in a three-dimensional continuum with a non-Euclidean metric. The metric differential forms and Laplacian are introduced, fundamental identities for metric differential operators are established and integral theorems are proved by employing the metric version of the quaternionic analysis for the Moisil-Teodoresco operator, which has been introduced and partially developed in this paper. The relations between the metric and conventional operators are revealed. It should be emphasized that the metric vector calculus developed in this work provides a comprehensive mathematical formalism for the continuum with any suitable definition of fractal metric. This offers a novel tool to study physics on fractals.

  16. Comparing Methods for Prioritising Protected Areas for Investment: A Case Study Using Madagascar’s Dry Forest Reptiles

    PubMed Central

    Gardner, Charlie J.; Raxworthy, Christopher J.; Metcalfe, Kristian; Raselimanana, Achille P.; Smith, Robert J.; Davies, Zoe G.

    2015-01-01

    There are insufficient resources available to manage the world’s existing protected area portfolio effectively, so the most important sites should be prioritised in investment decision-making. Sophisticated conservation planning and assessment tools developed to identify locations for new protected areas can provide an evidence base for such prioritisations, yet decision-makers in many countries lack the institutional support and necessary capacity to use the associated software. As such, simple heuristic approaches such as species richness or number of threatened species are generally adopted to inform prioritisation decisions. However, their performance has never been tested. Using the reptile fauna of Madagascar’s dry forests as a case study, we evaluate the performance of four site prioritisation protocols used to rank the conservation value of 22 established and candidate protected areas. We compare the results to a benchmark produced by the widely-used systematic conservation planning software Zonation. The four indices scored sites on the basis of: i) species richness; ii) an index based on species’ Red List status; iii) irreplaceability (a key metric in systematic conservation planning); and, iv) a novel Conservation Value Index (CVI), which incorporates species-level information on endemism, representation in the protected area system, tolerance of habitat degradation and hunting/collection pressure. Rankings produced by the four protocols were positively correlated to the results of Zonation, particularly amongst high-scoring sites, but CVI and Irreplaceability performed better than Species Richness and the Red List Index. Given the technological capacity constraints experienced by decision-makers in the developing world, our findings suggest that heuristic metrics can represent a useful alternative to more sophisticated analyses, especially when they integrate species-specific information related to extinction risk. However, this can require access to, and understanding of, more complex species data. PMID:26162073

  17. Methods for detrending success metrics to account for inflationary and deflationary factors*

    NASA Astrophysics Data System (ADS)

    Petersen, A. M.; Penner, O.; Stanley, H. E.

    2011-01-01

    Time-dependent economic, technological, and social factors can artificially inflate or deflate quantitative measures of career success. Here we develop and test a statistical method for normalizing career success metrics across time-dependent factors. In particular, this method addresses the long-standing question: how do we compare the career achievements of professional athletes from different historical eras? Developing an objective approach will be of particular importance over the next decade as major league baseball (MLB) players from the "steroids era" become eligible for Hall of Fame induction. Some experts are calling for asterisks (*) to be placed next to the career statistics of athletes found guilty of using performance-enhancing drugs (PED). Here we address this issue, as well as the general problem of comparing statistics from distinct eras, by detrending the seasonal statistics of professional baseball players. We detrend player statistics by normalizing achievements to seasonal averages, which accounts for changes in relative player ability resulting from a range of factors. Our methods are general and can be extended to various arenas of competition where time-dependent factors play a key role. For five statistical categories, we compare the probability density function (pdf) of detrended career statistics to the pdf of raw career statistics calculated for all player careers in the 90-year period 1920-2009. We find that the functional form of these pdfs is stationary under detrending. This stationarity implies that the statistical regularity observed in the right-skewed distributions for longevity and success in professional sports arises from both the wide range of intrinsic talent among athletes and the underlying nature of competition. We fit the pdfs for career success with the Gamma distribution in order to calculate objective benchmarks, based on extreme statistics, which can be used for the identification of extraordinary careers.
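
    The detrending step itself is a simple normalization; a minimal sketch (the league-average values below are hypothetical placeholders, not the paper's dataset):

        def detrend(player_seasons, league_avg_by_year):
            """Express each seasonal total relative to that season's league
            average, removing era-dependent inflation or deflation."""
            return {
                year: total / league_avg_by_year[year]
                for year, total in player_seasons.items()
            }

        hr = {1927: 60, 2001: 73}             # two famous home-run seasons
        league_avg = {1927: 4.1, 2001: 14.9}  # hypothetical per-player averages
        print({y: round(v, 1) for y, v in detrend(hr, league_avg).items()})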

  18. Failure to Rescue Rates After Coronary Artery Bypass Grafting: An Analysis From The Society of Thoracic Surgeons Adult Cardiac Surgery Database.

    PubMed

    Edwards, Fred H; Ferraris, Victor A; Kurlansky, Paul A; Lobdell, Kevin W; He, Xia; O'Brien, Sean M; Furnary, Anthony P; Rankin, J Scott; Vassileva, Christina M; Fazzalari, Frank L; Magee, Mitchell J; Badhwar, Vinay; Xian, Ying; Jacobs, Jeffrey P; Wyler von Ballmoos, Moritz C; Shahian, David M

    2016-08-01

    Failure to rescue (FTR) is increasingly recognized as an important quality indicator in surgery. The Society of Thoracic Surgeons National Database was used to develop FTR metrics and a predictive FTR model for coronary artery bypass grafting (CABG). The study included 604,154 patients undergoing isolated CABG at 1,105 centers from January 2010 to January 2014. FTR was defined as death after four complications: stroke, renal failure, reoperation, and prolonged ventilation. FTR was determined for each complication and a composite of the four complications. A statistical model to predict FTR was developed. FTR rates were 22.3% for renal failure, 16.4% for stroke, 12.4% for reoperation, 12.1% for prolonged ventilation, and 10.5% for the composite. Mortality increased with multiple complications and with specific combinations of complications. The multivariate risk model for prediction of FTR demonstrated a C index of 0.792 and was well calibrated, with a 1.0% average difference between observed/expected (O/E) FTR rates. With centers grouped into mortality terciles, complication rates increased modestly (11.4% to 15.7%), but FTR rates more than doubled (6.8% to 13.9%) from the lowest to highest terciles. Centers in the lowest complication rate tercile had an FTR O/E of 1.14, whereas centers in the highest complication rate tercile had an FTR O/E of 0.91. CABG mortality rates vary directly with FTR, but complication rates have little relation to death. FTR rates derived from The Society of Thoracic Surgeons data can serve as national benchmarks. Predicted FTR rates may facilitate patient counseling, and FTR O/E ratios have promise as valuable quality metrics. Copyright © 2016 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.
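
    The two headline quantities reduce to simple ratios; a minimal sketch with illustrative counts (the Society of Thoracic Surgeons risk model itself is not reproduced):

        def ftr_rate(deaths_after_complication, patients_with_complication):
            """Failure to rescue: share of patients with a complication who die."""
            return deaths_after_complication / patients_with_complication

        def oe_ratio(observed_ftr, expected_ftr):
            """O/E > 1: worse rescue performance than the risk model predicts."""
            return observed_ftr / expected_ftr

        obs = ftr_rate(deaths_after_complication=42, patients_with_complication=400)
        print(f"FTR = {obs:.1%}, O/E = {oe_ratio(obs, expected_ftr=0.105):.2f}")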

  19. A System-Wide Approach to Physician Efficiency and Utilization Rates for Non-Operating Room Anesthesia Sites.

    PubMed

    Tsai, Mitchell H; Huynh, Tinh T; Breidenstein, Max W; O'Donnell, Stephen E; Ehrenfeld, Jesse M; Urman, Richard D

    2017-07-01

    There has been little work on the development or application of operating room (OR) management metrics to non-operating room anesthesia (NORA) sites. This is in contrast to the well-developed management framework for the OR itself. We hypothesized that by adopting the concept of physician efficiency, we could determine the applicability of this clinical productivity benchmark to physicians providing services for NORA cases at a tertiary care center. We conducted a retrospective data analysis of NORA sites at an academic, rural hospital, including both adult and pediatric patients. Using the time stamps from WiseOR® (Palo Alto, CA), we calculated site utilization and physician efficiency for each day. We defined scheduling efficiency (SE) as the number of staffed anesthesiologists divided by the number of staffed sites and stratified the data into three categories (SE < 1, SE = 1, and SE > 1). The mean physician efficiency was 0.293 (95% CI, [0.281, 0.305]), and the mean site utilization was 0.328 (95% CI, [0.314, 0.343]). When days were stratified by scheduling efficiency (SE < 1, = 1, or > 1), we found differences between physician efficiency and site utilization. On days where scheduling efficiency was less than 1, that is, when there were more sites than physicians, mean physician efficiency (95% CI, [0.326, 0.402]) was higher than mean site utilization (95% CI, [0.250, 0.296]). We demonstrate that scheduling efficiency and physician efficiency as OR management metrics diverge when anesthesiologists travel between NORA sites. When the opportunity to scale operational efficiencies is limited, increasing scheduling efficiency by incorporating different NORA sites into a "block" allocation on any given day may be the only suitable tactical alternative.
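
    The stratifying metrics are simple ratios; a minimal sketch (the input values are illustrative, not the WiseOR data):

        def scheduling_efficiency(staffed_anesthesiologists, staffed_sites):
            """SE < 1: more sites than physicians; SE > 1: the reverse."""
            return staffed_anesthesiologists / staffed_sites

        def site_utilization(case_minutes, staffed_minutes):
            """Fraction of staffed site time filled by cases."""
            return case_minutes / staffed_minutes

        print(scheduling_efficiency(3, 4))   # 0.75 -> an SE < 1 day
        print(site_utilization(case_minutes=1260, staffed_minutes=4 * 10 * 60))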

  20. The Craft of Benchmarking: Finding and Utilizing District-Level, Campus-Level, and Program-Level Standards.

    ERIC Educational Resources Information Center

    McGregor, Ellen N.; Attinasi, Louis C., Jr.

    This paper describes the processes involved in selecting peer institutions for appropriate benchmarking using national databases (NCES-IPEDS). Benchmarking involves the identification of peer institutions and/or best practices in specific operational areas for the purpose of developing standards. The benchmarking process was born in the early…

  1. The Applicability of Proposed Object-Oriented Metrics to Developer Feedback in Time to Impact Development

    NASA Technical Reports Server (NTRS)

    Neal, Ralph D.

    1996-01-01

    This paper looks closely at each of the software metrics generated by the McCabe Object-Oriented Tool(TM) and its ability to convey timely information to developers. The metrics are examined for meaningfulness in terms of the scale assignable to the metric by the rules of measurement theory and the software dimension being measured. Recommendations are made as to the proper use of each metric and its ability to influence development at an early stage. The metrics of the McCabe Object-Oriented Tool(TM) set were selected because of the tool's use in a couple of NASA IV&V projects.

  2. NASA metric transition plan

    NASA Technical Reports Server (NTRS)

    1992-01-01

    NASA science publications have used the metric system of measurement since 1970. Although NASA has maintained a metric use policy since 1979, practical constraints have restricted actual use of metric units. In 1988, an amendment to the Metric Conversion Act of 1975 required the Federal Government to adopt the metric system except where impractical. In response to Public Law 100-418 and Executive Order 12770, NASA revised its metric use policy and developed this Metric Transition Plan. NASA's goal is to use the metric system for program development and functional support activities to the greatest practical extent by the end of 1995. The introduction of the metric system into new flight programs will determine the pace of the metric transition. Transition of institutional capabilities and support functions will be phased to enable use of the metric system in flight program development and operations. Externally oriented elements of this plan will introduce and actively support use of the metric system in education, public information, and small business programs. The plan also establishes a procedure for evaluating and approving waivers and exceptions to the required use of the metric system for new programs. Coordination with other Federal agencies and departments (through the Interagency Council on Metric Policy) and industry (directly and through professional societies and interest groups) will identify sources of external support and minimize duplication of effort.

  3. Carbon footprint of a music festival

    NASA Astrophysics Data System (ADS)

    Schafer, K. V.

    2009-12-01

    In an effort to curb CO2 and, by extension, greenhouse gas emissions, various initiatives have been taken at the state, national, and international levels. However, benchmarks and metrics are not clearly defined for CO2 and CO2-equivalent accounting. The objective of this study is to estimate the carbon footprint of the Lincoln Park Music Festival, which occurs annually in Newark, NJ. This festival runs for three days each summer and consists of music, food vendors, merchandise, and a green marketplace. To determine the carbon footprint generated by transportation, surveys of participants were analyzed. Of the approximately 40,000 participants in 2009, 3.3% were surveyed. About 30% of respondents commuted to the festival by car, with an average traveling distance of 10 miles. Transportation amounted to an estimated 188 metric tons of CO2 emissions for all three days combined. Trash at the music festival was weighed, its components estimated, and the potential CO2 emission calculated if incinerated. 63% of the trash was found to be carbon based, the equivalent of three metric tons of CO2 if incinerated. The majority of the trash (>60%) could have been recycled, thus significantly reducing the carbon footprint. To limit the carbon footprint of this festival, alternative transport options would be advisable, as transport accounted for the largest proportion of the festival's carbon footprint.
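
    The transport estimate follows from simple arithmetic; a sketch with a generic passenger-car emission factor (the survey's exact factors and occupancy assumptions are not stated above, so the constants are placeholders and the output will not exactly reproduce the 188-ton figure):

        def transport_co2_tonnes(attendees, car_share, round_trip_miles,
                                 kg_co2_per_mile=0.4):
            """Rough CO2 from car travel; 0.4 kg/mile is a placeholder factor."""
            return attendees * car_share * round_trip_miles * kg_co2_per_mile / 1000.0

        # ~40,000 attendees, ~30% by car, ~10 miles each way:
        print(f"{transport_co2_tonnes(40000, 0.30, 2 * 10):.0f} metric tons CO2")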

  4. Validation of tsunami inundation model TUNA-RP using OAR-PMEL-135 benchmark problem set

    NASA Astrophysics Data System (ADS)

    Koh, H. L.; Teh, S. Y.; Tan, W. K.; Kh'ng, X. Y.

    2017-05-01

    A standard set of benchmark problems, known as OAR-PMEL-135, was developed by the US National Tsunami Hazard Mitigation Program for tsunami inundation model validation. Any tsunami inundation model must be tested for its accuracy and capability using this standard set of benchmark problems before it can be gainfully used for inundation simulation. The authors have previously developed an in-house tsunami inundation model known as TUNA-RP. This inundation model solves the two-dimensional nonlinear shallow water equations coupled with a wet-dry moving boundary algorithm. This paper presents the validation of TUNA-RP against the solutions provided in the OAR-PMEL-135 benchmark problem set. This benchmark validation testing shows that TUNA-RP can indeed perform inundation simulation with accuracy consistent with that of the tested benchmark problem set.

  5. Developing and Trialling an independent, scalable and repeatable IT-benchmarking procedure for healthcare organisations.

    PubMed

    Liebe, J D; Hübner, U

    2013-01-01

    Continuous improvements of IT performance in healthcare organisations require actionable performance indicators, regularly conducted, independent measurements, and meaningful and scalable reference groups. Existing IT-benchmarking initiatives have focussed on the development of reliable and valid indicators, but less on how to implement an environment for conducting easily repeatable and scalable IT-benchmarks. This study aims at developing and trialling a procedure that meets the afore-mentioned requirements. We chose a well-established, regularly conducted (inter-)national IT survey of healthcare organisations (IT-Report Healthcare) as the environment and invited the participants of the 2011 survey (hospital CIOs) to enter a benchmark. The 61 structural and functional performance indicators covered, among other things, the implementation status and integration of IT systems and functions, global user satisfaction, and the resources of the IT department. Healthcare organisations were grouped by size and ownership. The benchmark results were made available electronically, and feedback on the use of these results was requested after several months. Fifty-nine hospitals participated in the benchmarking. Reference groups consisted of up to 141 members depending on the number of beds (size) and the ownership (public vs. private). A total of 122 charts showing single-indicator frequency views were sent to each participant. The evaluation showed that 94.1% of the CIOs who participated in the evaluation considered this benchmarking beneficial and reported that they would enter again. Based on the feedback of the participants, we developed two additional views that provide a more consolidated picture. The results demonstrate that establishing an independent, easily repeatable, and scalable IT-benchmarking procedure is possible and was deemed desirable. Based on these encouraging results, a new benchmarking round which includes process indicators is currently being conducted.

  6. Evaluation of control strategies using an oxidation ditch benchmark.

    PubMed

    Abusam, A; Keesman, K J; Spanjers, H; van Straten, G; Meinema, K

    2002-01-01

    This paper presents validation and implementation results of a benchmark developed for a specific full-scale oxidation ditch wastewater treatment plant. A benchmark is a standard simulation procedure that can be used as a tool in evaluating various control strategies proposed for wastewater treatment plants. It is based on model and performance criteria development. Testing of this benchmark, by comparing benchmark predictions to real measurements of the electrical energy consumptions and amounts of disposed sludge for a specific oxidation ditch WWTP, has shown that it can (reasonably) be used for evaluating the performance of this WWTP. Subsequently, the validated benchmark was then used in evaluating some basic and advanced control strategies. Some of the interesting results obtained are the following: (i) influent flow splitting ratio, between the first and the fourth aerated compartments of the ditch, has no significant effect on the TN concentrations in the effluent, and (ii) for evaluation of long-term control strategies, future benchmarks need to be able to assess settlers' performance.

  7. GalaxyGPCRloop: Template-Based and Ab Initio Structure Sampling of the Extracellular Loops of G-Protein-Coupled Receptors.

    PubMed

    Won, Jonghun; Lee, Gyu Rie; Park, Hahnbeom; Seok, Chaok

    2018-06-07

    The second extracellular loops (ECL2s) of G-protein-coupled receptors (GPCRs) are often involved in GPCR functions, and their structures have important implications in drug discovery. However, structure prediction of ECL2 is difficult because of its long length and the structural diversity among different GPCRs. In this study, a new ECL2 conformational sampling method involving both template-based and ab initio sampling was developed. Inspired by the observation of similar ECL2 structures of closely related GPCRs, a template-based sampling method employing loop structure templates selected from the structure database was developed. A new metric for evaluating similarity of the target loop to templates was introduced for template selection. An ab initio loop sampling method was also developed to treat cases without highly similar templates. The ab initio method is based on the previously developed fragment assembly and loop closure method. A new sampling component that takes advantage of secondary structure prediction was added. In addition, a conserved disulfide bridge restraining ECL2 conformation was predicted and analytically incorporated into sampling, reducing the effective dimension of the conformational search space. The sampling method was combined with an existing energy function for comparison with previously reported loop structure prediction methods, and the benchmark test demonstrated outstanding performance.

  8. Screening Breast MRI Outcomes in Routine Clinical Practice: Comparison to BI-RADS Benchmarks.

    PubMed

    Strigel, Roberta M; Rollenhagen, Jennifer; Burnside, Elizabeth S; Elezaby, Mai; Fowler, Amy M; Kelcz, Frederick; Salkowski, Lonie; DeMartini, Wendy B

    2017-04-01

    The BI-RADS Atlas 5th Edition includes screening breast magnetic resonance imaging (MRI) outcome benchmarks. However, the metrics are from expert practices and clinical trials of women with hereditary breast cancer predispositions, and it is unknown if they are appropriate for routine practice. We evaluated screening breast MRI audit outcomes in routine practice across a spectrum of elevated risk patients. This Institutional Review Board-approved, Health Insurance Portability and Accountability Act-compliant retrospective study included all consecutive screening breast MRI examinations from July 1, 2010 to June 30, 2013. Examination indications were categorized as gene mutation carrier (GMC), personal history (PH) breast cancer, family history (FH) breast cancer, chest radiation, and atypia/lobular carcinoma in situ (LCIS). Outcomes were determined by pathology and/or ≥12 months clinical and/or imaging follow-up. We calculated abnormal interpretation rate (AIR), cancer detection rate (CDR), positive predictive value of recommendation for tissue diagnosis (PPV2) and biopsy performed (PPV3), and median size and percentage of node-negative invasive cancers. Eight hundred and sixty examinations were performed in 566 patients with a mean age of 47 years. Indications were 367 of 860 (42.7%) FH, 365 of 860 (42.4%) PH, 106 of 860 (12.3%) GMC, 14 of 860 (1.6%) chest radiation, and 8 of 860 (0.9%) atypia/LCIS. The AIR was 134 of 860 (15.6%). Nineteen cancers were identified (13 invasive, 4 DCIS, 2 lymph nodes), resulting in a CDR of 19 of 860 (22.1 per 1000), a PPV2 of 19 of 88 (21.6%), and a PPV3 of 19 of 80 (23.8%). Of the 13 invasive breast cancers, the median size was 10 mm, and 8 of 13 (61.5%) were node negative. Performance outcomes of screening breast MRI in routine clinical practice across a spectrum of elevated risk patients met the American College of Radiology Breast Imaging Reporting and Data System benchmarks, supporting broad application of these metrics. The indication of a personal history of treated breast cancer accounted for a large proportion (42%) of our screening examinations, with breast MRI performance in this population at least comparable to that of other screening indications. Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
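
    The audit metrics follow directly from the reported counts; a minimal sketch (Python) using the standard definitions, which reproduces the abstract's figures (function and argument names are ours):

    ```python
    # Screening MRI audit metrics from raw counts (standard BI-RADS-style
    # definitions; reproduces the abstract's reported values).
    def audit_metrics(n_exams, n_abnormal, n_cancers, n_reco_biopsy, n_biopsied):
        return {
            "AIR_%": 100.0 * n_abnormal / n_exams,          # abnormal interpretation rate
            "CDR_per_1000": 1000.0 * n_cancers / n_exams,   # cancer detection rate
            "PPV2_%": 100.0 * n_cancers / n_reco_biopsy,    # PPV, biopsy recommended
            "PPV3_%": 100.0 * n_cancers / n_biopsied,       # PPV, biopsy performed
        }

    print(audit_metrics(860, 134, 19, 88, 80))
    # ≈ AIR 15.6%, CDR 22.1 per 1000, PPV2 21.6%, PPV3 23.8%
    ```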

  9. Screening Breast MRI Outcomes in Routine Clinical Practice: Comparison to BI-RADS Benchmarks

    PubMed Central

    Strigel, Roberta M.; Rollenhagen, Jennifer; Burnside, Elizabeth S.; Elezaby, Mai; Fowler, Amy M.; Kelcz, Frederick; Salkowski, Lonie; DeMartini, Wendy B.

    2017-01-01

    Rationale and Objectives: The BI-RADS Atlas 5th Edition includes screening breast magnetic resonance imaging (MRI) outcome benchmarks. However, the metrics are from expert practices and clinical trials of women with hereditary breast cancer predispositions, and it is unknown if they are appropriate for routine practice. We evaluated screening breast MRI audit outcomes in routine practice across a spectrum of elevated risk patients. Materials and Methods: This Institutional Review Board-approved, Health Insurance Portability and Accountability Act-compliant retrospective study included all consecutive screening breast MRI examinations from July 1, 2010 to June 30, 2013. Examination indications were categorized as gene mutation carrier (GMC), personal history (PH) breast cancer, family history (FH) breast cancer, chest radiation, and atypia/lobular carcinoma in situ (LCIS). Outcomes were determined by pathology and/or ≥12 months clinical and/or imaging follow-up. We calculated abnormal interpretation rate (AIR), cancer detection rate (CDR), positive predictive value of recommendation for tissue diagnosis (PPV2) and biopsy performed (PPV3), and median size and percentage of node-negative invasive cancers. Results: Eight hundred and sixty examinations were performed in 566 patients with a mean age of 47 years. Indications were 367 of 860 (42.7%) FH, 365 of 860 (42.4%) PH, 106 of 860 (12.3%) GMC, 14 of 860 (1.6%) chest radiation, and 8 of 860 (0.9%) atypia/LCIS. The AIR was 134 of 860 (15.6%). Nineteen cancers were identified (13 invasive, 4 DCIS, 2 lymph nodes), resulting in a CDR of 19 of 860 (22.1 per 1000), a PPV2 of 19 of 88 (21.6%), and a PPV3 of 19 of 80 (23.8%). Of the 13 invasive breast cancers, the median size was 10 mm, and 8 of 13 (61.5%) were node negative. Conclusions: Performance outcomes of screening breast MRI in routine clinical practice across a spectrum of elevated risk patients met the American College of Radiology Breast Imaging Reporting and Data System benchmarks, supporting broad application of these metrics. The indication of a personal history of treated breast cancer accounted for a large proportion (42%) of our screening examinations, with breast MRI performance in this population at least comparable to that of other screening indications. PMID:27986508

  10. Unprecedented 21st century drought risk in the American Southwest and Central Plains

    PubMed Central

    Cook, Benjamin I.; Ault, Toby R.; Smerdon, Jason E.

    2015-01-01

    In the Southwest and Central Plains of Western North America, climate change is expected to increase drought severity in the coming decades. These regions nevertheless experienced extended Medieval-era droughts that were more persistent than any historical event, providing crucial targets in the paleoclimate record for benchmarking the severity of future drought risks. We use an empirical drought reconstruction and three soil moisture metrics from 17 state-of-the-art general circulation models to show that these models project significantly drier conditions in the latter half of the 21st century compared to the 20th century and earlier paleoclimatic intervals. This desiccation is consistent across most of the models and moisture balance variables, indicating a coherent and robust drying response to warming despite the diversity of models and metrics analyzed. Notably, future drought risk will likely exceed even the driest centuries of the Medieval Climate Anomaly (1100–1300 CE) in both moderate (RCP 4.5) and high (RCP 8.5) future emissions scenarios, leading to drought conditions unprecedented in the last millennium. PMID:26601131

  11. Unprecedented 21st century drought risk in the American Southwest and Central Plains.

    PubMed

    Cook, Benjamin I; Ault, Toby R; Smerdon, Jason E

    2015-02-01

    In the Southwest and Central Plains of Western North America, climate change is expected to increase drought severity in the coming decades. These regions nevertheless experienced extended Medieval-era droughts that were more persistent than any historical event, providing crucial targets in the paleoclimate record for benchmarking the severity of future drought risks. We use an empirical drought reconstruction and three soil moisture metrics from 17 state-of-the-art general circulation models to show that these models project significantly drier conditions in the latter half of the 21st century compared to the 20th century and earlier paleoclimatic intervals. This desiccation is consistent across most of the models and moisture balance variables, indicating a coherent and robust drying response to warming despite the diversity of models and metrics analyzed. Notably, future drought risk will likely exceed even the driest centuries of the Medieval Climate Anomaly (1100-1300 CE) in both moderate (RCP 4.5) and high (RCP 8.5) future emissions scenarios, leading to drought conditions unprecedented in the last millennium.

  12. Quantum catastrophes: a case study

    NASA Astrophysics Data System (ADS)

    Znojil, Miloslav

    2012-11-01

    The bound-state spectrum of a Hamiltonian H is assumed real in a non-empty domain D of physical values of parameters. This means that for these parameters, H may be called crypto-Hermitian, i.e. made Hermitian via an ad hoc choice of the inner product in the physical Hilbert space of quantum bound states (i.e. via an ad hoc construction of the operator Θ called the metric). The name quantum catastrophe is then assigned to the N-tuple-exceptional-point crossing, i.e. to the scenario in which we leave the domain D along such a path that at the boundary of D, an N-plet of bound-state energies degenerates and, subsequently, complexifies. At any fixed N ⩾ 2, this process is simulated via an N × N benchmark effective matrix Hamiltonian H, which is assigned a closed-form metric made unique via an N-extrapolation-friendliness requirement. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Quantum physics with non-Hermitian operators’.
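
    For orientation, the "metric" Θ mentioned here plays the standard quasi-Hermitian role; the compact statement below is textbook background rather than a result of this record:

    ```latex
    % H is crypto-(quasi-)Hermitian with respect to a positive-definite
    % metric \Theta: Hermitian conjugation is intertwined by \Theta,
    \[
      H^{\dagger}\,\Theta = \Theta\,H, \qquad \Theta = \Theta^{\dagger} > 0,
    \]
    % so that H is self-adjoint in the amended inner product
    % \langle\psi|\phi\rangle_{\Theta} := \langle\psi|\Theta|\phi\rangle,
    % and h = \Theta^{1/2} H \Theta^{-1/2} is Hermitian in the usual sense.
    ```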

  13. A proteomics performance standard to support measurement quality in proteomics.

    PubMed

    Beasley-Green, Ashley; Bunk, David; Rudnick, Paul; Kilpatrick, Lisa; Phinney, Karen

    2012-04-01

    The emergence of MS-based proteomic platforms as a prominent technology utilized in biochemical and biomedical research has increased the need for high-quality MS measurements. To address this need, National Institute of Standards and Technology (NIST) reference material (RM) 8323 yeast protein extract is introduced as a proteomics quality control material for benchmarking the preanalytical and analytical performance of proteomics-based experimental workflows. RM 8323 yeast protein extract is based upon the well-characterized eukaryote Saccharomyces cerevisiae and can be utilized in the design and optimization of proteomics-based methodologies from sample preparation to data analysis. To demonstrate its utility as a proteomics quality control material, we coupled LC-MS/MS measurements of RM 8323 with the NIST MS Quality Control (MSQC) performance metrics to quantitatively assess the LC-MS/MS instrumentation parameters that influence measurement accuracy, repeatability, and reproducibility. Due to the complexity of the yeast proteome, we also demonstrate how NIST RM 8323, along with the NIST MSQC performance metrics, can be used in the evaluation and optimization of proteomics-based sample preparation methods. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. The Medical Library Association Benchmarking Network: development and implementation.

    PubMed

    Dudden, Rosalind Farnam; Corcoran, Kate; Kaplan, Janice; Magouirk, Jeff; Rand, Debra C; Smith, Bernie Todd

    2006-04-01

    This article explores the development and implementation of the Medical Library Association (MLA) Benchmarking Network from the initial idea and test survey, to the implementation of a national survey in 2002, to the establishment of a continuing program in 2004. Started as a program for hospital libraries, it has expanded to include other nonacademic health sciences libraries. The activities and timelines of MLA's Benchmarking Network task forces and editorial board from 1998 to 2004 are described. The Benchmarking Network task forces successfully developed an extensive questionnaire with parameters of size and measures of library activity and published a report of the data collected by September 2002. The data were available to all MLA members in the form of aggregate tables. Utilization of Web-based technologies proved feasible for data intake and interactive display. A companion article analyzes and presents some of the data. MLA has continued to develop the Benchmarking Network with the completion of a second survey in 2004. The Benchmarking Network has provided many small libraries with comparative data to present to their administrators. It is a challenge for the future to convince all MLA members to participate in this valuable program.

  15. The Medical Library Association Benchmarking Network: development and implementation*

    PubMed Central

    Dudden, Rosalind Farnam; Corcoran, Kate; Kaplan, Janice; Magouirk, Jeff; Rand, Debra C.; Smith, Bernie Todd

    2006-01-01

    Objective: This article explores the development and implementation of the Medical Library Association (MLA) Benchmarking Network from the initial idea and test survey, to the implementation of a national survey in 2002, to the establishment of a continuing program in 2004. Started as a program for hospital libraries, it has expanded to include other nonacademic health sciences libraries. Methods: The activities and timelines of MLA's Benchmarking Network task forces and editorial board from 1998 to 2004 are described. Results: The Benchmarking Network task forces successfully developed an extensive questionnaire with parameters of size and measures of library activity and published a report of the data collected by September 2002. The data were available to all MLA members in the form of aggregate tables. Utilization of Web-based technologies proved feasible for data intake and interactive display. A companion article analyzes and presents some of the data. MLA has continued to develop the Benchmarking Network with the completion of a second survey in 2004. Conclusions: The Benchmarking Network has provided many small libraries with comparative data to present to their administrators. It is a challenge for the future to convince all MLA members to participate in this valuable program. PMID:16636702

  16. Sigma metrics as a tool for evaluating the performance of internal quality control in a clinical chemistry laboratory

    PubMed Central

    Kumar, B. Vinodh; Mohan, Thuthi

    2018-01-01

    OBJECTIVE: Six Sigma is one of the most popular quality management system tools employed for process improvement. Six Sigma methods are usually applied when the outcome of a process can be measured. This study was done to assess the performance of individual biochemical parameters on a sigma scale by calculating the sigma metrics for individual parameters, and to follow the Westgard guidelines in selecting the appropriate Westgard rules and levels of internal quality control (IQC) to be processed to improve target analyte performance based on the sigma metrics. MATERIALS AND METHODS: This is a retrospective study; the data required for the study were extracted between July 2015 and June 2016 from a Secondary Care Government Hospital, Chennai. The data used were the IQC coefficient of variation (CV%) and the External Quality Assurance Scheme (EQAS) bias% for 16 biochemical parameters. RESULTS: For the level 1 IQC, four analytes (alkaline phosphatase, magnesium, triglyceride, and high-density lipoprotein-cholesterol) showed an ideal performance of ≥6 sigma, and five analytes (urea, total bilirubin, albumin, cholesterol, and potassium) showed an average performance of <3 sigma; for the level 2 IQC, the same four analytes as in level 1 showed a performance of ≥6 sigma, and four analytes (urea, albumin, cholesterol, and potassium) showed an average performance of <3 sigma. For all analytes below 6 sigma, the quality goal index (QGI) was <0.8, indicating imprecision as the area requiring improvement, except for cholesterol, whose QGI of >1.2 indicated inaccuracy. CONCLUSION: This study shows that the sigma metric is a good quality tool for assessing the analytical performance of a clinical chemistry laboratory. Sigma metric analysis thus provides a benchmark for the laboratory to design a protocol for IQC, address poor assay performance, and assess the efficiency of existing laboratory processes. PMID:29692587
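
    The abstract relies on the standard Westgard formulas without spelling them out; a minimal sketch (Python), where the total allowable error (TEa) is a method- and program-dependent input and the example numbers are hypothetical:

    ```python
    # Standard Westgard sigma-metric and quality goal index formulas,
    # all arguments in percent.
    def sigma_metric(tea_pct, bias_pct, cv_pct):
        """Sigma = (TEa - |bias|) / CV."""
        return (tea_pct - abs(bias_pct)) / cv_pct

    def quality_goal_index(bias_pct, cv_pct):
        """QGI = |bias| / (1.5 * CV): <0.8 flags imprecision, >1.2 inaccuracy."""
        return abs(bias_pct) / (1.5 * cv_pct)

    # Hypothetical analyte: TEa 10%, bias 2%, CV 1.3%
    print(round(sigma_metric(10, 2, 1.3), 1))    # 6.2 -> "ideal" zone
    print(round(quality_goal_index(2, 1.3), 2))  # 1.03 -> neither flag
    ```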

  17. Metrication study for large space telescope

    NASA Technical Reports Server (NTRS)

    Creswick, F. A.; Weller, A. E.

    1973-01-01

    Various approaches which could be taken in developing a metric-system design for the Large Space Telescope were investigated, considering potential penalties on development cost and time, commonality with other satellite programs, and the contribution to national goals for conversion to the metric system of units. Information on the problems, potential approaches, and impacts of metrication was collected from published reports on previous aerospace-industry metrication-impact studies and through numerous telephone interviews. The recommended approach to LST metrication formulated in this study calls for new components and subsystems to be designed in metric-module dimensions, but U.S. customary practice is allowed where U.S. metric standards and metric components are not available or would be unsuitable. Electrical/electronic-system design, which is presently largely metric, is considered exempt from further metrication. An important guideline is that metric design and fabrication should in no way compromise the effectiveness of the LST equipment.

  18. Metrication report to the Congress. 1991 activities and 1992 plans

    NASA Technical Reports Server (NTRS)

    1991-01-01

    During 1991, NASA approved a revised metric use policy and developed a NASA Metric Transition Plan. This Plan targets the end of 1995 for completion of NASA's metric initiatives. This Plan also identifies future programs that NASA anticipates will use the metric system of measurement. Field installations began metric transition studies in 1991 and will complete them in 1992. Half of NASA's Space Shuttle payloads for 1991, and almost all such payloads for 1992, have some metric-based elements. In 1992, NASA will begin assessing requirements for space-quality piece parts fabricated to U.S. metric standards, leading to development and qualification of high priority parts.

  19. Life-cycle assessment of a biogas power plant with application of different climate metrics and inclusion of near-term climate forcers.

    PubMed

    Iordan, Cristina; Lausselet, Carine; Cherubini, Francesco

    2016-12-15

    This study assesses the environmental sustainability of electricity production through anaerobic co-digestion of sewage sludge and organic wastes. The analysis relies on primary data from a biogas plant, supplemented with data from the literature. The climate impact assessment includes emissions of near-term climate forcers (NTCFs) like ozone precursors and aerosols, which are frequently overlooked in Life Cycle Assessment (LCA), and the application of a suite of different emission metrics, based on either the Global Warming Potential (GWP) or the Global Temperature change Potential (GTP) with a time horizon (TH) of 20 or 100 years. The environmental performance of the biogas system is benchmarked against a conventional fossil fuel system. We also investigate the sensitivity of the system to critical parameters and provide five different scenarios in a sensitivity analysis. Hotspots are the management of the digestate (mainly due to the open storage) and methane (CH4) losses during the anaerobic co-digestion. Results are sensitive to the type of climate metric used. The impacts range from 52 up to 116 g CO2-eq./MJ electricity when using GTP100 and GWP20, respectively. This difference is mostly due to the varying contribution from CH4 emissions. The influence of NTCFs is about 6% for GWP100 (worst case) and rises to 31% for GWP20 (best case). The biogas system has a lower performance than the fossil reference system for the acidification and particulate matter formation potentials. We argue for an active consideration of NTCFs in LCA and a critical reflection on the climate metrics to be used, as these aspects can significantly affect the final outcomes. Copyright © 2016 Elsevier Ltd. All rights reserved.
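
    How the choice of metric swings the headline number can be made concrete; a sketch (Python) where the characterization factors are illustrative round numbers of IPCC AR5 magnitude and the emission inventory is hypothetical, not the paper's data:

    ```python
    # Illustrative characterization factors (order of IPCC AR5 values);
    # the paper's exact factors may differ.
    METRICS = {
        "GWP20":  {"CO2": 1.0, "CH4": 84.0, "N2O": 264.0},
        "GWP100": {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0},
        "GTP100": {"CO2": 1.0, "CH4": 4.0,  "N2O": 234.0},
    }

    def co2_eq(emissions_g_per_mj, metric):
        """Aggregate per-gas emissions (g gas/MJ) into g CO2-eq./MJ."""
        factors = METRICS[metric]
        return sum(mass * factors[gas] for gas, mass in emissions_g_per_mj.items())

    inventory = {"CO2": 40.0, "CH4": 0.7, "N2O": 0.01}  # hypothetical biogas system
    for m in METRICS:
        print(m, round(co2_eq(inventory, m), 1), "g CO2-eq./MJ")
    # The CH4 term dominates the spread between metrics, as in the abstract.
    ```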

  20. The mark of vegetation change on Earth's surface energy balance: data-driven diagnostics and model validation

    NASA Astrophysics Data System (ADS)

    Cescatti, A.; Duveiller, G.; Hooker, J.

    2017-12-01

    Changing vegetation cover not only affects the atmospheric concentration of greenhouse gases but also alters the radiative and non-radiative properties of the surface. The result of competing biophysical processes on Earth's surface energy balance varies spatially and seasonally, and can lead to warming or cooling depending on the specific vegetation change and on the background climate. To date these effects are not accounted for in land-based climate policies because of the complexity of the phenomena, contrasting model predictions, and the lack of global data-driven assessments. To overcome the limitations of available observation-based diagnostics and of the ongoing model inter-comparison, here we present a new benchmarking dataset derived from satellite remote sensing. This global dataset provides the potential changes induced by multiple vegetation transitions on the single terms of the surface energy balance. We used this dataset for two major goals: (1) to quantify the impact of actual vegetation changes that occurred during the decade 2000-2010, showing the overwhelming role of tropical deforestation in warming the surface by reducing evapotranspiration despite the concurrent brightening of the Earth; and (2) to benchmark a series of ESMs against data-driven metrics of the land cover change impacts on the various terms of the surface energy budget and on the surface temperature. We anticipate that the dataset could also be used to evaluate future scenarios of land cover change and to develop the monitoring, reporting and verification guidelines required for the implementation of mitigation plans that account for biophysical land processes.

  1. EPA's Benchmark Dose Modeling Software

    EPA Science Inventory

    The EPA developed the Benchmark Dose Software (BMDS) as a tool to help Agency risk assessors apply benchmark dose (BMD) methods in EPA's human health risk assessment (HHRA) documents. The application of BMD methods overcomes many well-known limitations ...

  2. Disaster metrics: quantitative benchmarking of hospital surge capacity in trauma-related multiple casualty events.

    PubMed

    Bayram, Jamil D; Zuabi, Shawki; Subbarao, Italo

    2011-06-01

    Hospital surge capacity in multiple casualty events (MCE) is the core of hospital medical response, and an integral part of the total medical capacity of the community affected. To date, however, there has been no consensus regarding the definition or quantification of hospital surge capacity. The first objective of this study was to quantitatively benchmark the various components of hospital surge capacity pertaining to the care of critically and moderately injured patients in trauma-related MCE. The second objective was to illustrate the applications of those quantitative parameters in local, regional, national, and international disaster planning; in the distribution of patients to various hospitals by prehospital medical services; and in the decision-making process for ambulance diversion. A 2-step approach was adopted in the methodology of this study. First, an extensive literature search was performed, followed by mathematical modeling. Quantitative studies on hospital surge capacity for trauma injuries were used as the framework for our model. The North Atlantic Treaty Organization triage categories (T1-T4) were used in the modeling process for simplicity. Hospital Acute Care Surge Capacity (HACSC) was defined as the maximum number of critical (T1) and moderate (T2) casualties a hospital can adequately care for per hour, after recruiting all possible additional medical assets. HACSC was modeled as the number of emergency department beds (#EDB) divided by the emergency department time (EDT); HACSC = #EDB/EDT. In trauma-related MCE, the EDT was quantitatively benchmarked at 2.5 hours. Because most of the critical and moderate casualties requiring admission arrive at hospitals within a 6-hour period (by definition), the hospital bed surge capacity must match the HACSC at 6 hours to ensure coordinated care, and it was mathematically benchmarked at 18% of the staffed hospital bed capacity. Defining and quantitatively benchmarking the different components of hospital surge capacity is vital to hospital preparedness in MCE. Prospective studies of our mathematical model are needed to verify its applicability, generalizability, and validity.
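
    The abstract's relations are simple enough to state directly; a short sketch (Python) of HACSC and the 18% bed-surge benchmark, with hypothetical inputs:

    ```python
    # Surge-capacity relations as given in the abstract.
    def hacsc(ed_beds, ed_time_hours=2.5):
        """Hospital Acute Care Surge Capacity: T1+T2 casualties per hour;
        EDT is benchmarked at 2.5 h for trauma-related MCE."""
        return ed_beds / ed_time_hours

    def hospital_bed_surge_capacity(staffed_beds):
        """Benchmarked at 18% of staffed hospital bed capacity."""
        return 0.18 * staffed_beds

    # Hypothetical 30-bed emergency department in a 500-bed hospital:
    print(hacsc(30))                         # 12.0 casualties/hour
    print(hospital_bed_surge_capacity(500))  # 90.0 beds
    ```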

  3. Rethinking the reference collection: exploring benchmarks and e-book availability.

    PubMed

    Husted, Jeffrey T; Czechowski, Leslie J

    2012-01-01

    Librarians in the Health Sciences Library System at the University of Pittsburgh explored the possibility of developing an electronic reference collection that would replace the print reference collection, thus providing access to these valuable materials to a widely dispersed user population. The librarians evaluated the print reference collection and standard collection development lists as potential benchmarks for the electronic collection, and they determined which books were available in electronic format. They decided that the low availability of electronic versions of titles in each benchmark group rendered the creation of an electronic reference collection using either benchmark impractical.

  4. Metrication report to the Congress

    NASA Technical Reports Server (NTRS)

    1991-01-01

    NASA's principal metrication accomplishments for FY 1990 were establishment of metrication policy for major programs, development of an implementing instruction for overall metric policy and initiation of metrication planning for the major program offices. In FY 1991, development of an overall NASA plan and individual program office plans will be completed, requirement assessments will be performed for all support areas, and detailed assessment and transition planning will be undertaken at the institutional level. Metric feasibility decisions on a number of major programs are expected over the next 18 months.

  5. Metrication report to the Congress

    NASA Technical Reports Server (NTRS)

    1990-01-01

    The principal NASA metrication activities for FY 1989 were a revision of NASA metric policy and evaluation of the impact of using the metric system of measurement for the design and construction of the Space Station Freedom. Additional studies provided a basis for focusing follow-on activity. In FY 1990, emphasis will shift to implementation of metric policy and development of a long-range metrication plan. The report which follows addresses Policy Development, Planning and Program Evaluation, and Supporting Activities for the past and coming year.

  6. Evaluation of CHO Benchmarks on the Arria 10 FPGA using Intel FPGA SDK for OpenCL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Zheming; Yoshii, Kazutomo; Finkel, Hal

    The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing system. OpenCL extends the C-based programming language for developing portable codes on different platforms such as CPU, Graphics processing units (GPUs), Digital Signal Processors (DSPs) and Field Programmable Gate Arrays (FPGAs). The Intel FPGA SDK for OpenCL is a suite of tools that allows developers to abstract away the complex FPGA-based development flow for a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes themore » FPGA-based development more accessible to software users as the needs for hybrid computing using CPUs and FPGAs are increasing. It can also significantly reduce the hardware development time as users can evaluate different ideas with high-level language without deep FPGA domain knowledge. Benchmarking of OpenCL-based framework is an effective way for analyzing the performance of system by studying the execution of the benchmark applications. CHO is a suite of benchmark applications that provides support for OpenCL [1]. The authors presented CHO as an OpenCL port of the CHStone benchmark. Using Altera OpenCL (AOCL) compiler to synthesize the benchmark applications, they listed the resource usage and performance of each kernel that can be successfully synthesized by the compiler. In this report, we evaluate the resource usage and performance of the CHO benchmark applications using the Intel FPGA SDK for OpenCL and Nallatech 385A FPGA board that features an Arria 10 FPGA device. The focus of the report is to have a better understanding of the resource usage and performance of the kernel implementations using Arria-10 FPGA devices compared to Stratix-5 FPGA devices. In addition, we also gain knowledge about the limitations of the current compiler when it fails to synthesize a benchmark application.« less

  7. Early Warning Look Ahead Metrics: The Percent Milestone Backlog Metric

    NASA Technical Reports Server (NTRS)

    Shinn, Stephen A.; Anderson, Timothy P.

    2017-01-01

    All complex development projects experience delays and corresponding backlogs of their project control milestones during their acquisition lifecycles. NASA Goddard Space Flight Center (GSFC) Flight Projects Directorate (FPD) teamed with The Aerospace Corporation (Aerospace) to develop a collection of Early Warning Look Ahead metrics that would provide GSFC leadership with some independent indication of the programmatic health of GSFC flight projects. As part of the collection of Early Warning Look Ahead metrics, the Percent Milestone Backlog metric is particularly revealing, and has utility as a stand-alone execution performance monitoring tool. This paper describes the purpose, development methodology, and utility of the Percent Milestone Backlog metric. The other four Early Warning Look Ahead metrics are also briefly discussed. Finally, an example of the use of the Percent Milestone Backlog metric in providing actionable insight is described, along with examples of its potential use in other commodities.
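
    The record does not define the metric's formula, so the following reading is an explicit assumption: the Percent Milestone Backlog is taken here as the share of milestones baselined to date that remain open (Python):

    ```python
    # Assumption: percent milestone backlog = open milestones past their
    # baseline date, as a share of all milestones baselined to date.
    from datetime import date

    def percent_milestone_backlog(milestones, as_of):
        """milestones: list of (baseline_date, completed: bool) tuples."""
        due = [m for m in milestones if m[0] <= as_of]
        if not due:
            return 0.0
        backlog = [m for m in due if not m[1]]
        return 100.0 * len(backlog) / len(due)

    plan = [(date(2017, 1, 15), True), (date(2017, 2, 1), False), (date(2017, 3, 1), False)]
    print(percent_milestone_backlog(plan, date(2017, 2, 15)))  # 50.0
    ```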

  8. A benchmarking program to reduce red blood cell outdating: implementation, evaluation, and a conceptual framework.

    PubMed

    Barty, Rebecca L; Gagliardi, Kathleen; Owens, Wendy; Lauzon, Deborah; Scheuermann, Sheena; Liu, Yang; Wang, Grace; Pai, Menaka; Heddle, Nancy M

    2015-07-01

    Benchmarking is a quality improvement tool that compares an organization's performance to that of its peers for selected indicators, to improve practice. Processes to develop evidence-based benchmarks for red blood cell (RBC) outdating in Ontario hospitals, based on RBC hospital disposition data from Canadian Blood Services, have been previously reported. These benchmarks were implemented in 160 hospitals provincewide with a multifaceted approach, which included hospital education, inventory management tools and resources, summaries of best practice recommendations, recognition of high-performing sites, and audit tools on the Transfusion Ontario website (http://transfusionontario.org). In this study we describe the implementation process and the impact of the benchmarking program on RBC outdating. A conceptual framework for continuous quality improvement of a benchmarking program was also developed. The RBC outdating rate for all hospitals trended downward continuously from April 2006 to February 2012, irrespective of hospitals' transfusion rates or their distance from the blood supplier. The highest annual outdating rate was 2.82%, at the beginning of the observation period. Each year brought further reductions, with a nadir outdating rate of 1.02% achieved in 2011. The key elements of the successful benchmarking strategy included dynamic targets, a comprehensive and evidence-based implementation strategy, ongoing information sharing, and a robust data system to track information. The Ontario benchmarking program for RBC outdating resulted in continuous and sustained quality improvement. Our conceptual iterative framework for benchmarking provides a guide for institutions implementing a benchmarking program. © 2015 AABB.

  9. Metrication in a global environment

    NASA Technical Reports Server (NTRS)

    Aberg, J.

    1994-01-01

    A brief history of the development of the metric system of measurement is given. The need for the U.S. to implement the 'SI' metric system in international markets, especially in aerospace and general trade, is discussed. Experiences with metric implementation at the local, national, and international levels are included.

  10. Using a health promotion model to promote benchmarking.

    PubMed

    Welby, Jane

    2006-07-01

    The North East (England) Neonatal Benchmarking Group has been established for almost a decade and has researched and developed a substantial number of evidence-based benchmarks. With no firm evidence that these were being used, or that there was any standardisation of neonatal care throughout the region, the group embarked on a programme to review the benchmarks and determine what evidence-based guidelines were needed to support standardisation. A health promotion planning model was used by one subgroup to structure the programme; it enabled all members of the subgroup to engage in the review process and provided the motivation and supporting documentation for implementation of changes in practice. The need for a regional guideline development group to complement the activity of the benchmarking group is being addressed.

  11. Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

    NASA Technical Reports Server (NTRS)

    Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

    1996-01-01

    This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.

  12. An Approach for Performance Assessments of Extravehicular Activity Gloves

    NASA Technical Reports Server (NTRS)

    Aitchison, Lindsay; Benson, Elizabeth

    2014-01-01

    The Space Suit Assembly (SSA) Development Team at NASA Johnson Space Center has invested heavily in the advancement of rear-entry planetary exploration suit design but largely deferred development of extravehicular activity (EVA) glove designs, accepting the risk of using the current flight gloves, Phase VI, for unique mission scenarios outside the Space Shuttle and International Space Station (ISS) Program realm of experience. However, as design reference missions mature, the risks of using heritage hardware have highlighted the need for developing robust new glove technologies. To address the technology gap, the NASA Game-Changing Technology group provided start-up funding for the High Performance EVA Glove (HPEG) Project in the spring of 2012. The overarching goal of the HPEG Project is to develop a robust glove design that increases human performance during EVA and creates a pathway for future implementation of emergent technologies, with specific aims of increasing pressurized mobility to 60% of barehanded capability, increasing durability by 100%, and decreasing the potential of gloves to cause injury during use. The HPEG Project focused initial efforts on identifying potential new technologies and benchmarking the performance of current state-of-the-art gloves to identify trends in design and fit, leading to standards and metrics against which emerging technologies can be assessed at both the component and assembly levels. The first of the benchmarking tests evaluated the quantitative mobility performance and subjective fit of two sets of prototype EVA gloves, developed by ILC Dover and David Clark Company, as compared to the Phase VI. Both companies were asked to design and fabricate gloves to the same set of NASA-provided hand measurements (which corresponded to a single size of Phase VI glove) and to focus their efforts on improving mobility in the metacarpophalangeal and carpometacarpal joints. Four test subjects representing the design-to hand anthropometry completed range of motion, grip/pinch strength, dexterity, and fit evaluations for each glove design in pressurized conditions, with and without thermal micrometeoroid garments (TMG) installed. This paper provides a detailed description of the hardware and test methodologies used and lessons learned.

  13. Benchmarking reference services: an introduction.

    PubMed

    Marshall, J G; Buchanan, H S

    1995-01-01

    Benchmarking is based on the common sense idea that someone else, either inside or outside of libraries, has found a better way of doing certain things and that your own library's performance can be improved by finding out how others do things and adopting the best practices you find. Benchmarking is one of the tools used for achieving continuous improvement in Total Quality Management (TQM) programs. Although benchmarking can be done on an informal basis, TQM puts considerable emphasis on formal data collection and performance measurement. Used to its full potential, benchmarking can provide a common measuring stick to evaluate process performance. This article introduces the general concept of benchmarking, linking it whenever possible to reference services in health sciences libraries. Data collection instruments that have potential application in benchmarking studies are discussed and the need to develop common measurement tools to facilitate benchmarking is emphasized.

  14. EPA and EFSA approaches for Benchmark Dose modeling

    EPA Science Inventory

    Benchmark dose (BMD) modeling has become the preferred approach in the analysis of toxicological dose-response data for the purpose of deriving human health toxicity values. The software packages most often used are Benchmark Dose Software (BMDS, developed by EPA) and PROAST (de...

  15. Information filtering based on corrected redundancy-eliminating mass diffusion

    PubMed Central

    Zhu, Xuzhen; Yang, Yujie; Chen, Guilin; Medo, Matus; Tian, Hui

    2017-01-01

    Methods used in information filtering and recommendation often rely on quantifying the similarity between objects or users. The similarity metrics used often suffer from similarity redundancies arising from correlations between objects' attributes. Based on an unweighted, undirected object-user bipartite network, we propose a Corrected Redundancy-Eliminating similarity index (CRE), which is based on a spreading process on the network. Extensive experiments on three benchmark data sets (MovieLens, Netflix, and Amazon) show that when used in recommendation, the CRE yields significant improvements in terms of recommendation accuracy and diversity. A detailed analysis is presented to unveil the origins of the observed differences between the CRE and mainstream similarity indices. PMID:28749976
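
    The CRE's redundancy correction is not specified in this record; for orientation, a minimal sketch (Python/NumPy) of the plain mass-diffusion (ProbS) spreading process on a user-object bipartite network that such indices build on:

    ```python
    import numpy as np

    # Plain mass diffusion (ProbS) on a users x objects 0/1 adjacency matrix A:
    # W[a, b] = (1/k_b) * sum_u A[u, a] * A[u, b] / k_u, i.e. resource spreads
    # from object b to its users and back to object a. The CRE's redundancy
    # correction (not given in the abstract) would modify W; this is only the
    # base process.
    def mass_diffusion(A):
        A = np.asarray(A, dtype=float)
        k_usr = np.maximum(A.sum(axis=1), 1)   # user degrees (guard zeros)
        k_obj = np.maximum(A.sum(axis=0), 1)   # object degrees
        return (A.T / k_usr[None, :]) @ A / k_obj[None, :]

    A = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1]])
    W = mass_diffusion(A)
    scores = W @ A[0]          # recommendation scores for user 0
    print(np.round(scores, 2))
    ```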

  16. Connecting to young adults: an online social network survey of beliefs and attitudes associated with prescription opioid misuse among college students.

    PubMed

    Lord, Sarah; Brevard, Julie; Budman, Simon

    2011-01-01

    A survey of motives and attitudes associated with patterns of nonmedical prescription opioid medication use among college students was conducted on Facebook, a popular online social networking Web site. Response metrics for a 2-week random advertisement post, targeting students who had misused prescription medications, surpassed typical benchmarks for online marketing campaigns and yielded 527 valid surveys. Respondent characteristics, substance use patterns, and use motives were consistent with other surveys of prescription opioid use among college populations. Results support the potential of online social networks to serve as powerful vehicles to connect with college-aged populations about their drug use. Limitations of the study are noted.

  17. Transaction Processing Performance Council (TPC): State of the Council 2010

    NASA Astrophysics Data System (ADS)

    Nambiar, Raghunath; Wakou, Nicholas; Carman, Forrest; Majdalany, Michael

    The Transaction Processing Performance Council (TPC) is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable performance data to the industry. Established in August 1988, the TPC has been integral in shaping the landscape of modern transaction processing and database benchmarks over the past twenty-two years. This paper provides an overview of the TPC's existing benchmark standards and specifications, introduces two new TPC benchmarks under development, and examines the TPC's active involvement in the early creation of additional future benchmarks.

  18. Basis for the development of sustainable optimisation indicators for activated sludge wastewater treatment plants in the Republic of Ireland.

    PubMed

    Gordon, G T; McCann, B P

    2015-01-01

    This paper describes the basis of a stakeholder-based sustainable optimisation indicator (SOI) system to be developed for small-to-medium sized activated sludge (AS) wastewater treatment plants (WwTPs) in the Republic of Ireland (ROI). Key technical publications relating to best practice plant operation, performance audits and optimisation, and indicator and benchmarking systems for wastewater services are identified. Optimisation studies were developed at a number of Irish AS WwTPs and key findings are presented. A national AS WwTP manager/operator survey was carried out to verify the applied operational findings and identify the key operator stakeholder requirements for this proposed SOI system. It was found that most plants require more consistent operational data-based decision-making, monitoring and communication structures to facilitate optimised, sustainable and continuous performance improvement. The applied optimisation and stakeholder consultation phases form the basis of the proposed stakeholder-based SOI system. This system will allow for continuous monitoring and rating of plant performance, facilitate optimised operation and encourage the prioritisation of performance improvement through tracking key operational metrics. Plant optimisation has become a major focus due to the transfer of all ROI water services to a national water utility from individual local authorities and the implementation of the EU Water Framework Directive.

  19. Engineering department physical plant staffing requirements.

    PubMed

    Cole, C

    1997-05-01

    There is a considerable effort in the health care arena to establish credible engineering manpower yardsticks that are universally applicable as a benchmark. This document presents one facility's own benchmark criteria, which can be used to help develop either internal or competitive benchmarking comparisons.

  20. Examining the impact of succession management practices on organizational performance: A national study of U.S. hospitals.

    PubMed

    Groves, Kevin S

    2017-08-03

    Spearheaded by the industry's transition from volume- to value-based care, the health care reform movement has spurred both unprecedented challenges and opportunities for developing more effective and sustainable health care delivery organizations. Whereas the formidable challenges of leading hospitals and health systems have been widely discussed, including reimbursement degradation, the rapidly aging workforce, and the imminent wave of executive retirements, the opportunity to leverage succession management and talent development capabilities to overcome these challenges has been largely overlooked. To address this key research and practice need, this multiphase study develops and validates an assessment of succession management practices for health care organizations. Utilizing data collected from two national samples of hospital organizations, the results provide a 32-item succession management assessment comprising seven distinct sets of succession management practices. The results indicate that succession management practices are strongly associated with multiple hospital performance metrics, including patient satisfaction and Medicare Spending per Beneficiary, leadership bench strength, and internal/external placement rate for executive level positions. The author concludes this article with a discussion of several practical implications for health care executives and boards, including employing the succession management assessment for diagnosing development opportunities, benchmarking succession planning and talent development practices against similar hospitals or health systems, and elevating the profile of succession management as a strategic priority in today's increasingly uncertain health care landscape.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Munro, J.F.; Kristal, J.; Thompson, G.

    The Office of Environmental Management is bringing Headquarters and the Field together to implement process improvements throughout the Complex through a systematic process of organizational learning called benchmarking. Simply stated, benchmarking is a process of continuously comparing and measuring practices, processes, or methodologies with those of other private and public organizations. The EM benchmarking program, which began as the result of a recommendation from Xerox Corporation, is building trust and removing barriers to performance enhancement across the DOE organization. The EM benchmarking program is designed to be field-centered, with Headquarters providing facilitatory and integrative functions on an "as needed" basis. One of the main goals of the program is to assist Field Offices and their associated M&O/M&I contractors in developing the capabilities to do benchmarking for themselves. In this regard, a central precept is that in order to realize tangible performance benefits, program managers and staff (the ones closest to the work) must take ownership of the studies. This avoids the "check the box" mentality associated with some third-party studies. This workshop will provide participants with a basic level of understanding of why the EM benchmarking team was developed and the nature and scope of its mission. Participants will also begin to understand the types of study levels and the particular methodology the EM benchmarking team is using to conduct studies. The EM benchmarking team will also encourage discussion on ways that DOE (both Headquarters and the Field) can team with its M&O/M&I contractors to conduct additional benchmarking studies. This "introduction to benchmarking" is intended to create a desire to know more and a greater appreciation of how benchmarking processes could be creatively employed to enhance performance.

  2. Benchmarking specialty hospitals, a scoping review on theory and practice.

    PubMed

    Wind, A; van Harten, W H

    2017-04-04

    Although benchmarking may improve hospital processes, research on this subject is limited. The aim of this study was to provide an overview of publications on benchmarking in specialty hospitals and a description of study characteristics. We searched PubMed and EMBASE for articles published in English in the last 10 years. Eligible articles described a project stating benchmarking as its objective and involving a specialty hospital or specific patient category, or dealt with the methodology or evaluation of benchmarking. Of 1,817 articles identified in total, 24 were included in the study. Articles were categorized into: pathway benchmarking, institutional benchmarking, articles on benchmarking methodology or evaluation, and benchmarking using a patient registry. There was a large degree of variability: (1) study designs were mostly descriptive and retrospective; (2) not all studies generated and showed data in sufficient detail; and (3) there was variety in whether a benchmarking model was merely described or whether quality improvement as a consequence of the benchmark was reported upon. Most of the studies that described a benchmark model described the use of benchmarking partners from the same industry category, sometimes from all over the world. Benchmarking seems to be more developed in eye hospitals, emergency departments, and oncology specialty hospitals. Some studies showed promising improvement effects. However, the majority of the articles lacked a structured design and did not report on benchmark outcomes. In order to evaluate the effectiveness of benchmarking for improving quality in specialty hospitals, robust and structured designs are needed, including a follow-up to check whether the benchmark study has led to improvements.

  3. Investing in innovation: trade-offs in the costs and cost-efficiency of school feeding using community-based kitchens in Bangladesh.

    PubMed

    Gelli, Aulo; Suwa, Yuko

    2014-09-01

    School feeding programs have been a key response to the recent food and economic crises and function to some degree in nearly every country in the world. However, school feeding programs are complex and exhibit different, context-specific models or configurations. To examine the trade-offs, including the costs and cost-efficiency, of an innovative cluster kitchen implementation model in Bangladesh using a standardized framework. A supply chain framework based on international standards was used to provide benchmarks for meaningful comparisons across models. Implementation processes specific to the program in Bangladesh were mapped against this reference to provide a basis for standardized performance measures. Qualitative and quantitative data on key metrics were collected retrospectively using semistructured questionnaires following an ingredients approach, including both financial and economic costs. Costs were standardized to a 200-feeding-day year and 700 kcal daily. The cluster kitchen model had similarities with the semidecentralized model and outsourced models in the literature, the main differences involving implementation scale, scale of purchasing volumes, and frequency of purchasing. Two important features stand out in terms of implementation: the nutritional quality of meals and the level of community involvement. The standardized full cost per child per year was US$110. Despite the nutritious content of the meals, the overall cost-efficiency in cost per nutrient output was lower than the benchmark for centralized programs, due mainly to support and start-up costs. Cluster kitchens provide an example of an innovative implementation model, combining an emphasis on quality meal delivery with strong community engagement. However, the standardized costs-per child were above the average benchmarks for both low-and middle-income countries. In contrast to the existing benchmark data from mature, centralized models, the main cost drivers of the program were associated with support and start-up activities. Further research is required to better understand changes in cost drivers as programs mature.
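
    The record states that costs were standardized to a 200-feeding-day year delivering 700 kcal daily but does not give the adjustment formula; the linear scaling below is our assumption, with hypothetical inputs (Python):

    ```python
    # Assumed linear standardization of cost per child to a reference year of
    # 200 feeding days at 700 kcal/day; the paper's actual adjustment may
    # differ (e.g., treating fixed and variable costs separately).
    def standardized_cost(cost_per_child, feeding_days, kcal_per_day,
                          ref_days=200, ref_kcal=700):
        return cost_per_child * (ref_days / feeding_days) * (ref_kcal / kcal_per_day)

    # Hypothetical program: US$95/child/year over 180 days at 650 kcal/day
    print(round(standardized_cost(95.0, 180, 650), 2))  # 113.68
    ```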

  4. Regional restoration benchmarks for Acropora cervicornis

    NASA Astrophysics Data System (ADS)

    Schopmeyer, Stephanie A.; Lirman, Diego; Bartels, Erich; Gilliam, David S.; Goergen, Elizabeth A.; Griffin, Sean P.; Johnson, Meaghan E.; Lustic, Caitlin; Maxwell, Kerry; Walter, Cory S.

    2017-12-01

    Coral gardening plays an important role in the recovery of depleted populations of threatened Acropora cervicornis in the Caribbean. Over the past decade, high survival coupled with fast growth of in situ nursery corals have allowed practitioners to create healthy and genotypically diverse nursery stocks. Currently, thousands of corals are propagated and outplanted onto degraded reefs on a yearly basis, representing a substantial increase in the abundance, biomass, and overall footprint of A. cervicornis. Here, we combined an extensive dataset collected by restoration practitioners to document early (1-2 yr) restoration success metrics in Florida and Puerto Rico, USA. By reporting region-specific data on the impacts of fragment collection on donor colonies, survivorship and productivity of nursery corals, and survivorship and productivity of outplanted corals during normal conditions, we provide the basis for a stop-light indicator framework for new or existing restoration programs to evaluate their performance. We show that current restoration methods are very effective, that no excess damage is caused to donor colonies, and that once outplanted, corals behave just as wild colonies. We also provide science-based benchmarks that can be used by programs to evaluate successes and challenges of their efforts, and to make modifications where needed. We propose that up to 10% of the biomass can be collected from healthy, large A. cervicornis donor colonies for nursery propagation. We also propose the following benchmarks for the first year of activities for A. cervicornis restoration: (1) >75% live tissue cover on donor colonies; (2) >80% survivorship of nursery corals; and (3) >70% survivorship of outplanted corals. Finally, we report productivity means of 4.4 cm yr-1 for nursery corals and 4.8 cm yr-1 for outplants as a frame of reference for ranking performance within programs. Such benchmarks, and potential subsequent adaptive actions, are needed to fully assess the long-term success of coral restoration and species recovery programs.
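
    The benchmarks above lend themselves to a simple stop-light check; a sketch (Python) where the year-1 thresholds come from the abstract but the yellow/red cutoffs below a missed benchmark are our assumption:

    ```python
    # Year-1 benchmarks from the abstract; the 10-point "yellow" margin for a
    # near-miss is an assumption, not part of the published framework.
    BENCHMARKS = {
        "donor_live_tissue_%": 75.0,
        "nursery_survivorship_%": 80.0,
        "outplant_survivorship_%": 70.0,
    }

    def stoplight(observed, benchmark, yellow_margin=10.0):
        if observed >= benchmark:
            return "green"
        if observed >= benchmark - yellow_margin:
            return "yellow"
        return "red"

    program = {"donor_live_tissue_%": 82.0,
               "nursery_survivorship_%": 77.0,
               "outplant_survivorship_%": 55.0}
    for metric, value in program.items():
        print(metric, stoplight(value, BENCHMARKS[metric]))
    # green, yellow, red respectively
    ```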

  5. Benchmark Shock Tube Experiments for Radiative Heating Relevant to Earth Re-Entry

    NASA Technical Reports Server (NTRS)

    Brandis, A. M.; Cruden, B. A.

    2017-01-01

    Detailed spectrally and spatially resolved radiance has been measured in the Electric Arc Shock Tube (EAST) facility for conditions relevant to high speed entry into a variety of atmospheres, including Earth, Venus, Titan, Mars and the Outer Planets. The tests that measured radiation relevant for Earth re-entry are the focus of this work and are taken from campaigns 47, 50, 52 and 57. These tests covered conditions from 8 km/s to 15.5 km/s at initial pressures ranging from 0.05 Torr to 1 Torr, of which shots at 0.1 and 0.2 Torr are analyzed in this paper. These conditions cover a range of points of interest for potential flight missions, including return from Low Earth Orbit, the Moon and Mars. The large volume of testing available from EAST is useful for statistical analysis of radiation data, but is problematic for identifying representative experiments for performing detailed analysis. Therefore, the intent of this paper is to select a subset of benchmark test data that can be considered for further detailed study. These benchmark shots are intended to provide more accessible data sets for future code validation studies and facility-to-facility comparisons. The shots that have been selected as benchmark data are the ones in closest agreement with a line of best fit through all of the EAST results, whilst also showing the best experimental characteristics, such as test time and convergence to equilibrium. The EAST data are presented in different formats for analysis. These data include the spectral radiance at equilibrium, the spatial dependence of radiance over defined wavelength ranges, and the mean non-equilibrium spectral radiance (the so-called 'spectral non-equilibrium metric'). All the information needed to simulate each experimental trace, including free-stream conditions, shock time-of-arrival (i.e. x-t) relation, and the spectral and spatial resolution functions, is provided.

  6. Reliable nanomaterial classification of powders using the volume-specific surface area method

    NASA Astrophysics Data System (ADS)

    Wohlleben, Wendel; Mielke, Johannes; Bianchin, Alvise; Ghanem, Antoine; Freiberger, Harald; Rauscher, Hubert; Gemeinert, Marion; Hodoroaba, Vasile-Dan

    2017-02-01

    The volume-specific surface area (VSSA) of a particulate material is one of two apparently very different metrics recommended by the European Commission for a definition of "nanomaterial" for regulatory purposes: specifically, the VSSA metric may classify nanomaterials and non-nanomaterials differently than the median size in number metrics, depending on the chemical composition, size, polydispersity, shape, porosity, and aggregation of the particles in the powder. Here we evaluate the extent of agreement between classification by electron microscopy (EM) and classification by VSSA on a large set of diverse particulate substances that represent all the anticipated challenges except mixtures of different substances. EM and VSSA are determined in multiple labs to assess also the level of reproducibility. Based on the results obtained on highly characterized benchmark materials from the NanoDefine EU FP7 project, we derive a tiered screening strategy for the purpose of implementing the definition of nanomaterials. We finally apply the screening strategy to further industrial materials, which were classified correctly and left only borderline cases for EM. On platelet-shaped nanomaterials, VSSA is essential to prevent false-negative classification by EM. On porous materials, approaches involving extended adsorption isotherms prevent false positive classification by VSSA. We find no false negatives by VSSA, neither in Tier 1 nor in Tier 2, despite real-world industrial polydispersity and diverse composition, shape, and coatings. The VSSA screening strategy is recommended for inclusion in a technical guidance for the implementation of the definition.
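
    The VSSA screening logic rests on two standard relations that the abstract leaves implicit; a sketch (Python) under the usual assumptions (a BET mass-specific surface area converted with skeletal density; monodisperse spheres for the equivalent diameter):

    ```python
    # VSSA [m^2/cm^3] = BET specific surface area [m^2/g] x density [g/cm^3].
    # For monodisperse spheres, VSSA = 6000 / d with d in nm, so the EC
    # screening threshold of VSSA > 60 m^2/cm^3 corresponds to d < 100 nm.
    def vssa(ssa_bet_m2_per_g, density_g_per_cm3):
        return ssa_bet_m2_per_g * density_g_per_cm3

    def equivalent_sphere_diameter_nm(vssa_m2_per_cm3):
        return 6000.0 / vssa_m2_per_cm3

    v = vssa(30.0, 4.0)  # hypothetical dense oxide powder
    print(v, equivalent_sphere_diameter_nm(v))  # 120.0 m2/cm3 -> 50.0 nm (nano)
    ```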

  7. Developing a Security Metrics Scorecard for Healthcare Organizations.

    PubMed

    Elrefaey, Heba; Borycki, Elizabeth; Kushniruk, Andrea

    2015-01-01

    In healthcare, information security is a key aspect of protecting a patient's privacy and ensuring systems availability to support patient care. Security managers need to measure the performance of security systems and this can be achieved by using evidence-based metrics. In this paper, we describe the development of an evidence-based security metrics scorecard specific to healthcare organizations. Study participants were asked to comment on the usability and usefulness of a prototype of a security metrics scorecard that was developed based on current research in the area of general security metrics. Study findings revealed that scorecards need to be customized for the healthcare setting in order for the security information to be useful and usable in healthcare organizations. The study findings resulted in the development of a security metrics scorecard that matches the healthcare security experts' information requirements.

  8. Making Benchmark Testing Work

    ERIC Educational Resources Information Center

    Herman, Joan L.; Baker, Eva L.

    2005-01-01

    Many schools are moving to develop benchmark tests to monitor their students' progress toward state standards throughout the academic year. Benchmark tests can provide the ongoing information that schools need to guide instructional programs and to address student learning problems. The authors discuss six criteria that educators can use to…

  9. Workshop Outline and Training Materials for Adult Learners Developed During the Metric Leader Training Project for Maine.

    ERIC Educational Resources Information Center

    Butzow, John W.; Yvon, Bernard R.

    Outlines of five sessions of the Metric Leader Training Project for Maine are included in this publication. Topics are: (1) Introduction - Length - Temperature; (2) Volume/Capacity - Mass/Weight; (3) Metric Advocacy - Length - Area - Temperature; (4) Metric Education Resources; and (5) Metric Education Planning - Metric Olympics - Final…

  10. Comprehensive comparison of gap filling techniques for eddy covariance net carbon fluxes

    NASA Astrophysics Data System (ADS)

    Moffat, A. M.; Papale, D.; Reichstein, M.; Hollinger, D. Y.; Richardson, A. D.; Barr, A. G.; Beckstein, C.; Braswell, B. H.; Churkina, G.; Desai, A. R.; Falge, E.; Gove, J. H.; Heimann, M.; Hui, D.; Jarvis, A. J.; Kattge, J.; Noormets, A.; Stauch, V. J.

    2007-12-01

    We review fifteen techniques for estimating missing values of net ecosystem CO2 exchange (NEE) in eddy covariance time series and evaluate their performance for different artificial gap scenarios based on a set of ten benchmark datasets from six forested sites in Europe. The goal of gap filling is the reproduction of the NEE time series, and hence the present work focuses on estimating missing NEE values, not on editing or removal of suspect values in these time series due to systematic errors in the measurements (e.g. nighttime flux, advection). The gap filling was examined by generating fifty secondary datasets with artificial gaps (ranging in length from single half-hours to twelve consecutive days) for each benchmark dataset and evaluating the performance with a variety of statistical metrics. The performance of the gap filling varied among sites and depended on the level of aggregation (native half-hourly time step versus daily); long gaps were more difficult to fill than short gaps, and differences among the techniques were more pronounced during the day than at night. The non-linear regression techniques (NLRs), the look-up table (LUT), marginal distribution sampling (MDS), and the semi-parametric model (SPM) generally showed good overall performance. The artificial neural network based techniques (ANNs) were generally, if only slightly, superior to the other techniques. The simple interpolation technique of mean diurnal variation (MDV) showed a moderate but consistent performance. Several sophisticated techniques, the dual unscented Kalman filter (UKF), the multiple imputation method (MIM), the terrestrial biosphere model (BETHY), but also one of the ANNs and one of the NLRs, showed high biases which resulted in a low reliability of the annual sums, indicating that additional development might be needed. An uncertainty analysis comparing the estimated random error in the ten benchmark datasets with the artificial gap residuals suggested that the techniques are already at or very close to the noise limit of the measurements. Based on the techniques and site data examined here, the effect of gap filling on the annual sums of NEE is modest, with most techniques falling within a range of ±25 g C m-2 y-1.
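
    Of the techniques compared, mean diurnal variation (MDV) is the easiest to make concrete: each missing half-hour is replaced by the mean of the available values at the same time of day within a window of adjacent days. A minimal sketch of that idea (the window size and half-hourly step here are assumptions, not the paper's exact settings):

        import numpy as np

        def fill_mdv(nee, steps_per_day=48, window_days=7):
            """Fill NaN gaps with the mean of same-time-of-day values
            within +/- window_days around the gap."""
            filled = nee.copy()
            n = len(nee)
            for i in np.flatnonzero(np.isnan(nee)):
                slots = np.arange(i - window_days * steps_per_day,
                                  i + window_days * steps_per_day + 1,
                                  steps_per_day)
                slots = slots[(slots >= 0) & (slots < n)]
                vals = nee[slots]
                vals = vals[~np.isnan(vals)]
                if vals.size:
                    filled[i] = vals.mean()
            return filled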

  11. Benchmarking quantitative label-free LC-MS data processing workflows using a complex spiked proteomic standard dataset.

    PubMed

    Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Van Dorssaeler, Alain; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne

    2016-01-30

    Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear as an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed us to finely assess their performance in terms of sensitivity and false discovery rate, by measuring the numbers of true and false positives (UPS1 and yeast background proteins, respectively, found as differential). The spiked standard dataset has been deposited to the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated in their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performance of the data processing workflow. We provide here such a controlled standard dataset and used it to evaluate the performance of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, testing new algorithms for label-free quantitative analysis, or evaluating downstream statistical methods. Copyright © 2015 Elsevier B.V. All rights reserved.
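
    Because every variant protein in the spiked design is known in advance, scoring a workflow reduces to set arithmetic over the list of proteins it reports as differential. A minimal sketch, assuming identifier sets as inputs:

        def evaluate_workflow(reported_differential, ups1_ids):
            """True positives are UPS1 proteins reported as differential;
            everything else reported (yeast background) is a false positive."""
            tp = sum(1 for p in reported_differential if p in ups1_ids)
            fp = len(reported_differential) - tp
            sensitivity = tp / len(ups1_ids)
            fdr = fp / len(reported_differential) if reported_differential else 0.0
            return sensitivity, fdr

        # e.g. a workflow reporting 40 UPS1 proteins and 5 yeast proteins:
        # sensitivity = 40/48 ~ 0.83, FDR = 5/45 ~ 0.11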

  12. Overall ED efficiency is associated with decreased time to percutaneous coronary intervention for ST-segment elevation myocardial infarction.

    PubMed

    Jones, Christopher W; Sonnad, Seema S; Augustine, James J; Reese, Charles L

    2014-10-01

    Performance of percutaneous coronary intervention (PCI) within 90 minutes of hospital arrival for ST-segment elevation myocardial infarction patients is a commonly cited clinical quality measure. The Centers for Medicare and Medicaid Services use this measure to adjust hospital reimbursement via the Value-Based Purchasing Program. This study investigated the relationship between hospital performance on this quality measure and emergency department (ED) operational efficiency. Hospital-level data from the Centers for Medicare and Medicaid Services on PCI quality measure performance were linked to information on operational performance from 272 US EDs obtained from the Emergency Department Benchmarking Alliance annual operations survey. Standard metrics of ED size, acuity, and efficiency were compared across hospitals grouped by performance on the door-to-balloon time quality measure. Mean hospital performance on the 90-minute arrival to PCI measure was 94.0% (range, 42-100). Among hospitals failing to achieve the door-to-balloon time performance standard, median ED length of stay was 209 minutes, compared with 173 minutes among those hospitals meeting the benchmark standard (P < .001). Similarly, median time from ED patient arrival to physician evaluation was 39 minutes for hospitals below the performance standard and 23 minutes for hospitals at the benchmark standard (P < .001). Markers of ED size and acuity, including annual patient volume, admission rate, and the percentage of patients arriving via ambulance did not vary with door-to-balloon time. Better performance on measures associated with ED efficiency is associated with more timely PCI performance. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. Compounded effects of heat waves and droughts over the Western Electricity Grid: spatio-temporal scales of impacts and predictability toward mitigation and adaptation.

    NASA Astrophysics Data System (ADS)

    Voisin, N.; Kintner-Meyer, M.; Skaggs, R.; Xie, Y.; Wu, D.; Nguyen, T. B.; Fu, T.; Zhou, T.

    2016-12-01

    Heat waves and droughts are projected to become more frequent and intense. Each of these extreme climate events has, in the past, increased electricity demand and constrained electricity generation, challenging power system operations. Our aim here is to understand their compounding effects under historical conditions. We present a benchmark of Western US grid performance under 55 years of historical climate, including droughts, using 2010 levels of water demand and water management infrastructure, and 2010-level electricity grid infrastructure and operations. We leverage CMIP5 historical hydrology simulations and force a large-scale river-routing and reservoir model with 2010-level sectoral water demands. The regulated flow at each water-dependent generating plant is processed to adjust the water-dependent electricity generation parameterization in a production cost model that represents 2010-level power system operations with hourly 2010 energy demand. The resulting benchmark includes a risk distribution of several grid performance metrics (unserved energy, production cost, carbon emissions) as a function of inter-annual variability in regional water availability and predictability using large-scale climate oscillations. In the second part of the presentation, we describe an approach to map historical heat waves onto this benchmark grid performance using a building energy demand model. The impact of heat waves, combined with the impact of droughts, is explored at multiple scales to understand the compounding effects. Vulnerabilities of the power generation and transmission systems are highlighted to guide future adaptation.

  14. Multisociety consensus quality improvement guidelines for intraarterial catheter-directed treatment of acute ischemic stroke, from the American Society of Neuroradiology, Canadian Interventional Radiology Association, Cardiovascular and Interventional Radiological Society of Europe, Society for Cardiovascular Angiography and Interventions, Society of Interventional Radiology, Society of NeuroInterventional Surgery, European Society of Minimally Invasive Neurological Therapy, and Society of Vascular and Interventional Neurology.

    PubMed

    Sacks, David; Black, Carl M; Cognard, Christophe; Connors, John J; Frei, Donald; Gupta, Rishi; Jovin, Tudor G; Kluck, Bryan; Meyers, Philip M; Murphy, Kieran J; Ramee, Stephen; Rüfenacht, Daniel A; Bernadette Stallmeyer, M J; Vorwerk, Dierk

    2013-02-01

    In this international multispecialty document, quality benchmarks for processes of care and clinical outcomes are defined. It is intended that these benchmarks be used in a quality assurance program to assess and improve processes and outcomes in acute stroke revascularization. Members of the writing group were appointed by the American Society of Neuroradiology, Canadian Interventional Radiology Association, Cardiovascular and Interventional Radiological Society of Europe, Society for Cardiovascular Angiography and Interventions, Society of Interventional Radiology, Society of NeuroInterventional Surgery, European Society of Minimally Invasive Neurological Therapy, and Society of Vascular and Interventional Neurology. The writing group reviewed the relevant literature from 1986 through February 2012 to create an evidence table summarizing processes and outcomes of care. Performance metrics and thresholds were then created by consensus, and the guideline was approved by the sponsoring societies; it is intended that it be fully updated in 3 years. The benchmarks include process measures of time to imaging, arterial puncture, and revascularization, as well as measures of clinical outcome up to 90 days. Quality improvement guidelines are provided for endovascular acute ischemic stroke revascularization procedures. Copyright © 2013 SIR. Published by Elsevier Inc. All rights reserved.

  15. Application of Benchmark Dose Methodology to a Variety of Endpoints and Exposures

    EPA Science Inventory

    This latest beta version (1.1b) of the U.S. Environmental Protection Agency (EPA) Benchmark Dose Software (BMDS) is being distributed for public comment. The BMDS system is being developed as a tool to facilitate the application of benchmark dose (BMD) methods to EPA hazardous p...

  16. The NAS kernel benchmark program

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.; Barton, J. T.

    1985-01-01

    A collection of benchmark test kernels that measure supercomputer performance has been developed for the use of the NAS (Numerical Aerodynamic Simulation) program at the NASA Ames Research Center. This benchmark program is described in detail and the specific ground rules are given for running the program as a performance test.

  17. Benchmarking Ada tasking on tightly coupled multiprocessor architectures

    NASA Technical Reports Server (NTRS)

    Collard, Philippe; Goforth, Andre; Marquardt, Matthew

    1989-01-01

    The development of benchmarks and performance measures for parallel Ada tasking is reported with emphasis on the macroscopic behavior of the benchmark across a set of load parameters. The application chosen for the study was the NASREM model for telerobot control, relevant to many NASA missions. The results of the study demonstrate the potential of parallel Ada in accomplishing the task of developing a control system for a system such as the Flight Telerobotic Servicer using the NASREM framework.

  18. Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives.

    PubMed

    Lagarde, Nathalie; Zagury, Jean-François; Montes, Matthieu

    2015-07-27

    Virtual screening methods are commonly used nowadays in drug discovery processes. However, to ensure their reliability, they have to be carefully evaluated. This evaluation is often performed retrospectively, notably by studying the enrichment of benchmarking data sets. For this purpose, numerous benchmarking data sets were developed over the years, and the resulting improvements led to the availability of high-quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.

  19. Thermal Performance Benchmarking

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feng, Xuhui; Moreno, Gilbert; Bennion, Kevin

    2016-06-07

    The goal for this project is to thoroughly characterize the thermal performance of state-of-the-art (SOA) in-production automotive power electronics and electric motor thermal management systems. Information obtained from these studies will be used to: evaluate advantages and disadvantages of different thermal management strategies; establish baseline metrics for the thermal management systems; identify methods of improvement to advance the SOA; increase the publicly available information related to automotive traction-drive thermal management systems; help guide future electric drive technologies (EDT) research and development (R&D) efforts. The thermal performance results combined with component efficiency and heat generation information obtained by Oak Ridge National Laboratory (ORNL) may then be used to determine the operating temperatures for the EDT components under drive-cycle conditions. In FY16, the 2012 Nissan LEAF power electronics and 2014 Honda Accord Hybrid power electronics thermal management system were characterized. Comparison of the two power electronics thermal management systems was also conducted to provide insight into the various cooling strategies to understand the current SOA in thermal management for automotive power electronics and electric motors.

  1. Equality bias impairs collective decision-making across cultures.

    PubMed

    Mahmoodi, Ali; Bang, Dan; Olsen, Karsten; Zhao, Yuanyuan Aimee; Shi, Zhenhao; Broberg, Kristina; Safavi, Shervin; Han, Shihui; Nili Ahmadabadi, Majid; Frith, Chris D; Roepstorff, Andreas; Rees, Geraint; Bahrami, Bahador

    2015-03-24

    We tend to think that everyone deserves an equal say in a debate. This seemingly innocuous assumption can be damaging when we make decisions together as part of a group. To make optimal decisions, group members should weight their differing opinions according to how competent they are relative to one another; whenever they differ in competence, an equal weighting is suboptimal. Here, we asked how people deal with individual differences in competence in the context of a collective perceptual decision-making task. We developed a metric for estimating how participants weight their partner's opinion relative to their own and compared this weighting to an optimal benchmark. Replicated across three countries (Denmark, Iran, and China), we show that participants assigned nearly equal weights to each other's opinions regardless of true differences in their competence, even when informed by explicit feedback about their competence gap or under monetary incentives to maximize collective accuracy. This equality bias, whereby people behave as if they are as good or as bad as their partner, is particularly costly for a group when a competence gap separates its members.

  2. Filtering Gene Ontology semantic similarity for identifying protein complexes in large protein interaction networks.

    PubMed

    Wang, Jian; Xie, Dong; Lin, Hongfei; Yang, Zhihao; Zhang, Yijia

    2012-06-21

    Protein complexes are of particular importance in many biological processes, and various computational approaches have been developed to identify complexes from protein-protein interaction (PPI) networks. However, the high false-positive rate of PPIs makes this identification challenging. A protein semantic similarity measure is proposed in this study, based on the ontology structure of Gene Ontology (GO) terms and GO annotations, to estimate the reliability of interactions in PPI networks. Interaction pairs with low GO semantic similarity are removed from the network as unreliable interactions. Then, a cluster-expanding algorithm is used to detect complexes with core-attachment structure on the filtered network. Our method is applied to three different yeast PPI networks. The effectiveness of our method is examined on two benchmark complex datasets. Experimental results show that our method performed better than other state-of-the-art approaches in most evaluation metrics. The method detects protein complexes from large-scale PPI networks by filtering on GO semantic similarity. Removing interactions with low GO similarity significantly improves the performance of complex identification. The expanding strategy is also effective at identifying attachment proteins of complexes.
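
    The filtering step described above is straightforward to express in code: compute a semantic similarity for each interacting pair and drop edges that fall below a threshold before clustering. A minimal sketch, assuming the similarity function is supplied (the paper proposes its own GO-based measure; the 0.4 cut-off here is illustrative):

        import networkx as nx

        def filter_ppi(ppi: nx.Graph, go_sim, threshold=0.4) -> nx.Graph:
            """Remove interactions whose GO semantic similarity is below
            the threshold; go_sim(u, v) -> float is an assumed callable."""
            g = ppi.copy()
            unreliable = [(u, v) for u, v in g.edges() if go_sim(u, v) < threshold]
            g.remove_edges_from(unreliable)
            return g   # complexes are then detected on the filtered network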

  3. QUAST: quality assessment tool for genome assemblies

    PubMed Central

    Gurevich, Alexey; Saveliev, Vladislav; Vyahhi, Nikolay; Tesler, Glenn

    2013-01-01

    Summary: Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers have been developed, but none is yet a recognized benchmark. Further, most existing methods for comparing assemblies are only applicable to new assemblies of finished genomes; the problem of evaluating assemblies of previously unsequenced species has not been adequately considered. Here, we present QUAST—a quality assessment tool for evaluating and comparing genome assemblies. This tool improves on leading assembly comparison software with new ideas and quality metrics. QUAST can evaluate assemblies both with a reference genome, as well as without a reference. QUAST produces many reports, summary tables and plots to help scientists in their research and in their publications. In this study, we used QUAST to compare several genome assemblers on three datasets. QUAST tables and plots for all of them are available in the Supplementary Material, and interactive versions of these reports are on the QUAST website. Availability: http://bioinf.spbau.ru/quast Contact: gurevich@bioinf.spbau.ru Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23422339

  4. Implementation and verification of global optimization benchmark problems

    NASA Astrophysics Data System (ADS)

    Posypkin, Mikhail; Usov, Alexander

    2017-12-01

    The paper considers the implementation and verification of a test suite containing 150 benchmarks for global deterministic box-constrained optimization. A C++ library for describing standard mathematical expressions was developed for this purpose. The library automates the process of generating the value of a function and its gradient at a given point, as well as interval estimates of the function and its gradient on a given box, from a single description. Based on this functionality, we have developed a collection of tests for automatic verification of the proposed benchmarks. The verification showed that the literature contains mistakes in the benchmark descriptions. The library and the test suite are available for download and can be used freely.
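
    The single-description idea makes such verification mechanical: an analytic gradient produced from the expression library can be cross-checked against finite differences at sample points. A minimal sketch of that kind of check (the library itself is C++; this illustration uses Python and the Rosenbrock benchmark):

        import numpy as np

        def check_gradient(f, grad, x, h=1e-6, tol=1e-4):
            """Compare an analytic gradient with central differences."""
            x = np.asarray(x, dtype=float)
            num = np.empty_like(x)
            for i in range(x.size):
                e = np.zeros_like(x)
                e[i] = h
                num[i] = (f(x + e) - f(x - e)) / (2 * h)
            return np.max(np.abs(num - grad(x))) < tol

        f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
        g = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                                200 * (x[1] - x[0]**2)])
        print(check_gradient(f, g, [0.3, -0.7]))   # True if the two agree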

  5. Metrics Handbook (Air Force Systems Command)

    NASA Astrophysics Data System (ADS)

    1991-08-01

    The handbook is designed to help one develop and use good metrics. It is intended to provide sufficient information to begin developing metrics for objectives, processes, and tasks, and to steer one toward appropriate actions based on the data one collects. It should be viewed as a road map to assist one in arriving at meaningful metrics and to assist in continuous process improvement.

  6. 40 CFR 141.540 - Who has to develop a disinfection benchmark?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    Title 40 (Protection of Environment), Water Programs (continued), National Primary Drinking Water Regulations, Enhanced Filtration and Disinfection--Systems Serving Fewer Than 10,000 People, Disinfection Benchmark, § 141.540: Who has to develop a disinfection benchmark? (40 CFR, revised as of 2010-07-01.)

  7. International E-Benchmarking: Flexible Peer Development of Authentic Learning Principles in Higher Education

    ERIC Educational Resources Information Center

    Leppisaari, Irja; Vainio, Leena; Herrington, Jan; Im, Yeonwook

    2011-01-01

    More and more, social technologies and virtual work methods are facilitating new ways of crossing boundaries in professional development and international collaborations. This paper examines the peer development of higher education teachers through the experiences of the IVBM project (International Virtual Benchmarking, 2009-2010). The…

  8. LASL benchmark performance 1978. [CDC STAR-100, 6600, 7600, Cyber 73, and CRAY-1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McKnight, A.L.

    1979-08-01

    This report presents the results of running several benchmark programs on a CDC STAR-100, a Cray Research CRAY-1, a CDC 6600, a CDC 7600, and a CDC Cyber 73. The benchmark effort included CRAY-1s at several installations running different operating systems and compilers. This benchmark is part of an ongoing program at Los Alamos Scientific Laboratory to collect performance data and monitor the development trend of supercomputers. 3 tables.

  9. Developing an automated database for monitoring ultrasound- and computed tomography-guided procedure complications and diagnostic yield.

    PubMed

    Itri, Jason N; Jones, Lisa P; Kim, Woojin; Boonn, William W; Kolansky, Ana S; Hilton, Susan; Zafar, Hanna M

    2014-04-01

    Monitoring complications and diagnostic yield for image-guided procedures is an important component of maintaining high quality patient care promoted by professional societies in radiology and accreditation organizations such as the American College of Radiology (ACR) and Joint Commission. These outcome metrics can be used as part of a comprehensive quality assurance/quality improvement program to reduce variation in clinical practice, provide opportunities to engage in practice quality improvement, and contribute to developing national benchmarks and standards. The purpose of this article is to describe the development and successful implementation of an automated web-based software application to monitor procedural outcomes for US- and CT-guided procedures in an academic radiology department. The open source tools PHP: Hypertext Preprocessor (PHP) and MySQL were used to extract relevant procedural information from the Radiology Information System (RIS), auto-populate the procedure log database, and develop a user interface that generates real-time reports of complication rates and diagnostic yield by site and by operator. Utilizing structured radiology report templates resulted in significantly improved accuracy of information auto-populated from radiology reports, as well as greater compliance with manual data entry. An automated web-based procedure log database is an effective tool to reliably track complication rates and diagnostic yield for US- and CT-guided procedures performed in a radiology department.
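
    The two outcome metrics themselves are simple ratios over the procedure log, which is what makes them easy to automate once report fields are structured. A minimal pandas sketch of the aggregation (the column names and values are hypothetical, not the authors' schema):

        import pandas as pd

        log = pd.DataFrame({
            "operator":     ["A", "A", "B", "B", "B"],
            "complication": [0, 1, 0, 0, 0],     # 1 = complication recorded
            "diagnostic":   [1, 1, 1, 0, 1],     # 1 = biopsy/aspirate diagnostic
        })

        report = log.groupby("operator").agg(
            procedures=("complication", "size"),
            complication_rate=("complication", "mean"),
            diagnostic_yield=("diagnostic", "mean"),
        )
        print(report)   # a per-operator quality dashboard, in miniature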

  10. Temporal stability in human interaction networks

    NASA Astrophysics Data System (ADS)

    Fabbri, Renato; Fabbri, Ricardo; Antunes, Deborah Christina; Pisani, Marilia Mello; de Oliveira, Osvaldo Novais

    2017-11-01

    This paper reports on stable (or invariant) properties of human interaction networks, with benchmarks derived from public email lists. Activity, recognized through messages sent, was observed along time and topology, in snapshots of a timeline and at different scales. Our analysis shows that activity is practically the same for all networks across timescales ranging from seconds to months. The principal components of the participants in the topological metrics space remain practically unchanged as different sets of messages are considered. The activity of participants follows the expected scale-free trace, thus yielding the hub, intermediary and peripheral classes of vertices by comparison against the Erdős-Rényi model. The relative sizes of these three sectors are essentially the same for all email lists and the same along time. Typically, < 15% of the vertices are hubs, 15%-45% are intermediary and > 45% are peripheral vertices. Similar results for the distribution of participants in the three sectors and for the relative importance of the topological metrics were obtained for 12 additional networks from Facebook, Twitter and ParticipaBR. These properties are consistent with the literature and may be general for human interaction networks, which has important implications for establishing a typology of participants based on quantitative criteria.
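
    One way to make the three-sector split concrete is to compare each vertex's degree with the binomial degree distribution of an Erdős-Rényi graph having the same number of vertices and edges: vertices in the upper tail are hubs, those below the lower tail are peripheral. A rough sketch of that comparison (the 5% tail probability is an assumption; the paper's exact criterion may differ):

        import networkx as nx
        from scipy.stats import binom

        def sectors(g: nx.Graph, tail=0.05):
            """Label vertices hub / intermediary / peripheral against the
            Erdos-Renyi degree distribution with matched n and m."""
            n, m = g.number_of_nodes(), g.number_of_edges()
            p = 2 * m / (n * (n - 1))
            lo = binom.ppf(tail, n - 1, p)
            hi = binom.ppf(1 - tail, n - 1, p)
            return {v: ("peripheral" if k < lo else "hub" if k > hi else "intermediary")
                    for v, k in g.degree()}

        labels = sectors(nx.barabasi_albert_graph(500, 3))   # scale-free test graph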

  11. 75 FR 26057 - Mandatory Reliability Standards for the Calculation of Available Transfer Capability, Capacity...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-11

    ... Requirement R3.1 of MOD-001-1. C. Benchmarking 14. In the Final Rule, the Commission directed the ERO to develop benchmarking and updating requirements for the MOD Reliability Standards to measure modeled... requirements should specify the frequency for benchmarking and updating the available transfer and flowgate...

  12. Learning Through Benchmarking: Developing a Relational, Prospective Approach to Benchmarking ICT in Learning and Teaching

    ERIC Educational Resources Information Center

    Ellis, Robert A.; Moore, Roger R.

    2006-01-01

    This study discusses benchmarking the use of information and communication technologies (ICT) in teaching and learning between two universities with different missions: one an Australian campus-based metropolitan university and the other a British distance-education provider. It argues that the differences notwithstanding, it is possible to…

  13. Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance.

    PubMed

    Chaput, Ludovic; Martinez-Sanz, Juan; Saettel, Nicolas; Mouawad, Liliane

    2016-01-01

    In a structure-based virtual screening, the choice of the docking program is essential for the success of hit identification. Benchmarks are meant to help in guiding this choice, especially when undertaken on a large variety of protein targets. Here, the performance of four popular virtual screening programs, Gold, Glide, Surflex and FlexX, is compared using the Directory of Useful Decoys-Enhanced database (DUD-E), which includes 102 targets with an average of 224 ligands per target and 50 decoys per ligand, generated to avoid biases in the benchmarking. Then, a relationship between these program performances and the properties of the targets or the small molecules was investigated. The comparison was based on two metrics, with three different parameters each. The BEDROC scores with α = 80.5 indicated that, on the overall database, Glide succeeded (score > 0.5) for 30 targets, Gold for 27, FlexX for 14 and Surflex for 11. The performance did not depend on the hydrophobicity or the openness of the protein cavities, nor on the families to which the proteins belong. However, despite the care taken in the construction of the DUD-E database, the small differences that remain between the actives and the decoys likely explain the successes of Gold, Surflex and FlexX. Moreover, the similarity between the actives of a target and its crystal structure ligand seems to be at the basis of the good performance of Glide. When all targets with significant biases are removed from the benchmarking, a subset of 47 targets remains, for which Glide succeeded for only 5 targets, Gold for 4, and FlexX and Surflex for 2. The dramatic performance drop of all four programs when the biases are removed shows that we should beware of virtual screening benchmarks, because good performances may be due to wrong reasons. Therefore, benchmarking would hardly provide guidelines for virtual screening experiments, despite the tendency that is maintained, i.e., Glide and Gold display better performance than FlexX and Surflex. We recommend always using several programs and combining their results. Graphical Abstract: Summary of the results obtained by virtual screening with the four programs, Glide, Gold, Surflex and FlexX, on the 102 targets of the DUD-E database. The percentages of targets with successful results, i.e., with BEDROC(α = 80.5) > 0.5, are shown in blue when the entire database is considered, and in red when targets with biased chemical libraries are removed.
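
    The success criterion above is the early-recognition metric BEDROC of Truchon and Bayly (2007), which exponentially rewards actives ranked near the top of the screened list. A minimal sketch computing it from the 1-based ranks of the actives (my own rendering of the published formula, not the authors' code):

        import math

        def bedroc(active_ranks, n_total, alpha=80.5):
            """BEDROC from the ranks of actives in the score-sorted list."""
            n = len(active_ranks)
            ra = n / n_total
            rie_num = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / n
            rie_den = (1.0 / n_total) * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
            rie = rie_num / rie_den
            factor = ra * math.sinh(alpha / 2) / (
                math.cosh(alpha / 2) - math.cosh(alpha / 2 - alpha * ra))
            return rie * factor + 1.0 / (1 - math.exp(alpha * (1 - ra)))

        # perfect early recognition of 10 actives among 5000 compounds -> ~1.0
        print(round(bedroc(list(range(1, 11)), 5000), 3))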

  14. Development and Application of Benchmark Examples for Mixed-Mode I/II Quasi-Static Delamination Propagation Predictions

    NASA Technical Reports Server (NTRS)

    Krueger, Ronald

    2012-01-01

    The development of benchmark examples for quasi-static delamination propagation prediction is presented. The example is based on a finite element model of the Mixed-Mode Bending (MMB) specimen for 50% mode II. The benchmarking is demonstrated for Abaqus/Standard; however, the example is independent of the analysis software used and allows the assessment of the automated delamination propagation prediction capability in commercial finite element codes based on the virtual crack closure technique (VCCT). First, a quasi-static benchmark example was created for the specimen. Second, starting from an initially straight front, the delamination was allowed to propagate under quasi-static loading. Third, the load-displacement as well as delamination length versus applied load/displacement relationships from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Overall, the results are encouraging, but further assessment for mixed-mode delamination fatigue onset and growth is required.
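
    At the core of VCCT is the assumption that the energy released in advancing the crack by one element length equals the work needed to close it, computed from the crack-tip nodal forces and the relative displacements of the node pair just behind the tip. A textbook two-dimensional sketch of that calculation (not the Abaqus implementation; the inputs are assumed to come from a finite element solution):

        def vcct_2d(fx_tip, fy_tip, du_x, du_y, da, width=1.0):
            """Mode components of the energy release rate by VCCT:
            fx_tip, fy_tip  - shear/opening forces at the crack-tip node
            du_x, du_y      - relative sliding/opening displacements of the
                              node pair one element length (da) behind the tip."""
            g2 = fx_tip * du_x / (2.0 * da * width)   # mode II (shear)
            g1 = fy_tip * du_y / (2.0 * da * width)   # mode I (opening)
            gt = g1 + g2
            return g1, g2, (g2 / gt if gt else 0.0)   # GI, GII, mixed-mode ratio

        # for the 50% mode II MMB benchmark one expects GII/GT near 0.5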

  15. Potential uncertainty reduction in model-averaged benchmark dose estimates informed by an additional dose study.

    PubMed

    Shao, Kan; Small, Mitchell J

    2011-10-01

    A methodology is presented for assessing the information value of an additional dosage experiment in existing bioassay studies. The analysis demonstrates the potential reduction in the uncertainty of toxicity metrics derived from expanded studies, providing insights for future studies. Bayesian methods are used to fit alternative dose-response models using Markov chain Monte Carlo (MCMC) simulation for parameter estimation and Bayesian model averaging (BMA) is used to compare and combine the alternative models. BMA predictions for benchmark dose (BMD) are developed, with uncertainty in these predictions used to derive the lower bound BMDL. The MCMC and BMA results provide a basis for a subsequent Monte Carlo analysis that backcasts the dosage where an additional test group would have been most beneficial in reducing the uncertainty in the BMD prediction, along with the magnitude of the expected uncertainty reduction. Uncertainty reductions are measured in terms of reduced interval widths of predicted BMD values and increases in BMDL values that occur as a result of this reduced uncertainty. The methodology is illustrated using two existing data sets for TCDD carcinogenicity, fitted with two alternative dose-response models (logistic and quantal-linear). The example shows that an additional dose at a relatively high value would have been most effective for reducing the uncertainty in BMA BMD estimates, with predicted reductions in the widths of uncertainty intervals of approximately 30%, and expected increases in BMDL values of 5-10%. The results demonstrate that dose selection for studies that subsequently inform dose-response models can benefit from consideration of how these models will be fit, combined, and interpreted. © 2011 Society for Risk Analysis.
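
    The BMA step itself can be pictured as pooling the MCMC draws of the BMD from each candidate model in proportion to its posterior model weight, then reading the BMDL off the lower tail of the pooled distribution. A schematic sketch under those assumptions (the samples, weights, and 5% tail below are illustrative, not the paper's exact settings):

        import numpy as np

        def bma_bmd(bmd_samples_by_model, model_weights, tail=0.05, size=10000):
            """Pool per-model MCMC samples of the BMD by posterior model
            weight; return the model-averaged BMD (median) and BMDL."""
            rng = np.random.default_rng(0)
            pooled = np.concatenate([
                rng.choice(np.asarray(s), size=int(round(w * size)), replace=True)
                for s, w in zip(bmd_samples_by_model, model_weights)])
            return np.median(pooled), np.quantile(pooled, tail)

        # e.g. logistic vs quantal-linear fits with weights 0.7 / 0.3
        rng = np.random.default_rng(1)
        bmd, bmdl = bma_bmd([rng.gamma(9.0, 1.0, 5000), rng.gamma(6.0, 1.2, 5000)],
                            [0.7, 0.3])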

  16. Develop metrics of tire debris on Texas highways : [project summary].

    DOT National Transportation Integrated Search

    2017-05-01

    This research developed metrics on the amount and characteristics of tire debris generated on Texas highways. These metrics provide numerical, data-based rates for districts to anticipate the amounts and characteristics of tire debris and to plan rem...

  17. The Earthquake Source Inversion Validation (SIV) - Project: Summary, Status, Outlook

    NASA Astrophysics Data System (ADS)

    Mai, P. M.

    2017-12-01

    Finite-fault earthquake source inversions infer the (time-dependent) displacement on the rupture surface from geophysical data. The resulting earthquake source models document the complexity of the rupture process. However, this kinematic source inversion is ill-posed and returns non-unique solutions, as seen for instance in multiple source models for the same earthquake, obtained by different research teams, that often exhibit remarkable dissimilarities. To address the uncertainties in earthquake-source inversions and to understand strengths and weaknesses of various methods, the Source Inversion Validation (SIV) project developed a set of forward-modeling exercises and inversion benchmarks. Several research teams then use these validation exercises to test their codes and methods, but also to develop and benchmark new approaches. In this presentation I will summarize the SIV strategy, the existing benchmark exercises and corresponding results. Using various waveform-misfit criteria and newly developed statistical comparison tools to quantify source-model (dis)similarities, the SIV platform is able to rank solutions and identify particularly promising source inversion approaches. Existing SIV exercises (with related data and descriptions) and all computational tools remain available via the open online collaboration platform; additional exercises and benchmark tests will be uploaded once they are fully developed. I encourage source modelers to use the SIV benchmarks for developing and testing new methods. The SIV efforts have already led to several promising new techniques for tackling the earthquake-source imaging problem. I expect that future SIV benchmarks will provide further innovations and insights into earthquake source kinematics that will ultimately help to better understand the dynamics of the rupture process.

  18. Optimization of Deep Drilling Performance - Development and Benchmark Testing of Advanced Diamond Product Drill Bits & HP/HT Fluids to Significantly Improve Rates of Penetration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alan Black; Arnis Judzis

    2005-09-30

    This document details the progress to date on the OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS AND HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION contract for the year starting October 2004 through September 2005. The industry cost-shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through a team development of aggressive diamond product drill bit-fluid system technologies. The overall objectives are as follows: Phase 1--Benchmark ''best in class'' diamond and other product drilling bits and fluids and develop concepts for a next level of deep drilling performance; Phase 2--Develop advanced smart bit-fluid prototypes and test at large scale; and Phase 3--Field trial smart bit-fluid concepts, modify as necessary and commercialize products. As of the report date, TerraTek has concluded all Phase 1 testing and is planning Phase 2 development.

  19. Optical modulation in silicon-vanadium dioxide photonic structures

    NASA Astrophysics Data System (ADS)

    Miller, Kevin J.; Hallman, Kent A.; Haglund, Richard F.; Weiss, Sharon M.

    2017-08-01

    All-optical modulators are likely to play an important role in future chip-scale information processing systems. In this work, through simulations, we investigate the potential of a recently reported vanadium dioxide (VO2) embedded silicon waveguide structure for ultrafast all-optical signal modulation. With a VO2 length of only 200 nm, finite-difference time-domain simulations suggest broadband (200 nm) operation with a modulation greater than 12 dB and an insertion loss of less than 3 dB. Predicted performance metrics, including modulation speed, modulation depth, optical bandwidth, insertion loss, device footprint, and energy consumption of the proposed Si-VO2 all-optical modulator are benchmarked against those of current state-of-the-art all-optical modulators with in-plane optical excitation.
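
    The two decibel figures quoted above are ratios of optical powers: modulation depth compares the transmitted power in the on and off states, and insertion loss compares the on-state output with the input. A small worked example (the power values are hypothetical):

        import math

        def modulator_metrics(p_on_mw, p_off_mw, p_in_mw):
            depth = 10 * math.log10(p_on_mw / p_off_mw)   # modulation depth, dB
            loss = 10 * math.log10(p_in_mw / p_on_mw)     # insertion loss, dB
            return depth, loss

        # >12 dB modulation means the transmitted power drops ~16x on switching;
        # 3 dB insertion loss is a 2x through-loss.
        print(modulator_metrics(0.5, 0.03, 1.0))   # ~ (12.2, 3.0)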

  20. Connecting to Young Adults: An Online Social Network Survey of Beliefs and Attitudes Associated With Prescription Opioid Misuse Among College Students

    PubMed Central

    Lord, Sarah; Brevard, Julie; Budman, Simon

    2011-01-01

    A survey of motives and attitudes associated with patterns of nonmedical prescription opioid medication use among college students was conducted on Facebook, a popular online social networking Web site. Response metrics for a 2-week random advertisement post, targeting students who had misused prescription medications, surpassed typical benchmarks for online marketing campaigns and yielded 527 valid surveys. Respondent characteristics, substance use patterns, and use motives were consistent with other surveys of prescription opioid use among college populations. Results support the potential of online social networks to serve as powerful vehicles to connect with college-aged populations about their drug use. Limitations of the study are noted. PMID:21190407

  1. Identification of a Lead Candidate in the Search for Carbene-Stabilised Homoaromatics.

    PubMed

    Mattock, James D; Vargas, Alfredo; Dewhurst, Rian D

    2015-11-16

    The effect of carbenes as Lewis donor groups on the homoaromaticity of mono- and bicyclic organic molecules is surveyed. The search for viable carbene-stabilised homoaromatics resulted in a large number of rejected candidates as well as nine promising candidates that were further analysed for their homoaromaticity using a number of metrics. Of these, five appeared to show modest homoaromaticity, whereas another compound showed a level of homoaromaticity comparable with the homotropylium cation benchmark compound. Isoelectronic analogues and constitutional isomers of the lead compound were investigated; however, none of these showed comparable homoaromaticity. The implications of these calculations for the design of donor-stabilised homoaromatics are discussed. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Empirical Evaluation of Hunk Metrics as Bug Predictors

    NASA Astrophysics Data System (ADS)

    Ferzund, Javed; Ahsan, Syed Nadeem; Wotawa, Franz

    Reducing the number of bugs is a crucial issue during software development and maintenance. Software process and product metrics are good indicators of software complexity. These metrics have been used to build bug predictor models to help developers maintain the quality of software. In this paper we empirically evaluate the use of hunk metrics as predictors of bugs. We present a technique for bug prediction that works at the smallest units of code change, called hunks. We build bug prediction models using random forests, an efficient machine-learning classifier. Hunk metrics are used to train the classifier, and each hunk metric is evaluated for its bug prediction capabilities. Our classifier can classify individual hunks as buggy or bug-free with 86% accuracy, 83% buggy-hunk precision and 77% buggy-hunk recall. We find that history-based and change-level hunk metrics are better predictors of bugs than code-level hunk metrics.
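
    The modeling setup is a standard supervised classification problem: one row per hunk, one column per hunk metric, and a buggy/bug-free label. A minimal sketch with scikit-learn on synthetic stand-in data (the feature matrix below is random, not the paper's corpus):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score, precision_score, recall_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(42)
        X = rng.normal(size=(1000, 12))      # 12 hypothetical hunk metrics
        y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.8, size=1000)) > 0.7

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        print(accuracy_score(y_te, pred),    # overall accuracy
              precision_score(y_te, pred),   # "buggy-hunk precision"
              recall_score(y_te, pred))      # "buggy-hunk recall"
        print(clf.feature_importances_)      # per-metric predictive contribution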

  3. KIC 12557548 and Similar Stars as SETI Targets

    NASA Astrophysics Data System (ADS)

    Star Cartier, Kimberly Michelle

    2015-01-01

    This project aims to construct a robust information-theoretic metric to quantify anomalous transit light curves and compare regular and irregular transits in a reproducible way. Using this metric we can distinguish natural transits from predicted extraterrestrial intelligence (ETI) communication that utilizes transiting mega-structures to alter the transit shape and depth in a measurable way. KIC-12557548b (KIC-1255b) is such an anomalous planet, with highly variable consecutive transit depths and shapes that have been explained by Rappaport et al. (2012) and Croll et al. (2014) as due to a disintegrating sub-Mercury sized planet with a debris tail encompassing the planetary orbit. However, Arnold (2005) and later Forgan (2013) presented models showing that planet-sized, non-circular artificial structures transiting their host star could be identified as non-natural by light curves anomalous in their duration and asymmetry, as in the case of KIC-1255b. If such mega-engineering structures were able to alter their aspects on orbital timescales, the resulting transit depths could be used to transmit information at low bandwidth. We use KIC-1255b as a benchmark case for separating anomalous transit signals that resemble ETI predictions but are naturally occurring. To do this, we use the Kullback-Leibler (KL) divergence of the KIC-1255b transit depth time series to quantify the entropy of the transit depth series. We calibrate our relative entropy metric by calculating the KL divergence of the Kepler-5b transits, which are markedly constant compared to KIC-1255b. Artificially generated transit depth time series data using Arnold's beacons allow us to calculate the KL divergence of predicted ETI communications and show that while KIC-1255b might match ETI predictions of shape and depth variations, the entropy content of the datasets is distinct by our metric. Thus we can use the entropy metric to test other cases of anomalous transits to separate out those transiting planets that can be explained through natural models and those for which an ETI hypothesis might be entertained.
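
    The KL divergence between two transit-depth series can be estimated by histogramming both onto a common grid and summing p log(p/q). A minimal sketch of such an estimator (the binning and smoothing choices are assumptions, not the project's exact procedure):

        import numpy as np

        def kl_divergence(depths_p, depths_q, bins=20, eps=1e-10):
            """Histogram-based KL divergence D(P || Q) between two
            transit-depth distributions on a shared range."""
            lo = min(depths_p.min(), depths_q.min())
            hi = max(depths_p.max(), depths_q.max())
            p, _ = np.histogram(depths_p, bins=bins, range=(lo, hi))
            q, _ = np.histogram(depths_q, bins=bins, range=(lo, hi))
            p = p / p.sum() + eps
            q = q / q.sum() + eps
            return float(np.sum(p * np.log(p / q)))

        # a near-constant depth series (Kepler-5b-like) diverges strongly
        # from a highly variable KIC-1255b-like series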

  4. Solution of the neutronics code dynamic benchmark by finite element method

    NASA Astrophysics Data System (ADS)

    Avvakumov, A. V.; Vabishchevich, P. N.; Vasilev, A. O.; Strizhov, V. F.

    2016-10-01

    The objective is to analyze the dynamic benchmark developed by Atomic Energy Research for the verification of best-estimate neutronics codes. The benchmark scenario includes asymmetrical ejection of a control rod in a water-type hexagonal reactor at hot zero power. A simple Doppler feedback mechanism assuming adiabatic fuel temperature heating is proposed. The finite element method on triangular calculation grids is used to solve the three-dimensional neutron kinetics problem. The software has been developed using the engineering and scientific calculation library FEniCS. The matrix spectral problem is solved using the scalable and flexible toolkit SLEPc. The solution accuracy of the dynamic benchmark is analyzed by refining the calculation grid and varying the degree of the finite elements.
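
    After FEM discretisation, the initial steady state reduces to a generalized eigenvalue problem of the form F*phi = k*M*phi, where M collects the neutron loss terms, F is the fission source operator, and the dominant eigenvalue k is the multiplication factor. The paper solves this with SLEPc; the sketch below uses dense SciPy matrices purely as a stand-in to show the structure (the matrices are random placeholders, not a discretised reactor):

        import numpy as np
        from scipy.linalg import eig

        n = 100
        rng = np.random.default_rng(1)
        M = 4.0 * np.eye(n) + 0.1 * rng.random((n, n))   # loss/migration operator
        F = 0.1 * rng.random((n, n))                     # fission source operator

        w, v = eig(F, M)       # solves F*phi = k*M*phi
        k_eff = w.real.max()   # dominant eigenvalue ~ multiplication factor
        print(k_eff)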

  5. Automatic extraction and visualization of object-oriented software design metrics

    NASA Astrophysics Data System (ADS)

    Lakshminarayana, Anuradha; Newman, Timothy S.; Li, Wei; Talburt, John

    2000-02-01

    Software visualization is a graphical representation of software characteristics and behavior. Certain modes of software visualization can be useful in isolating problems and identifying unanticipated behavior. In this paper we present a new approach to aid understanding of object-oriented software through 3D visualization of software metrics that can be extracted from the design phase of software development. The focus of the paper is a metric extraction method and a new collection of glyphs for multi-dimensional metric visualization. Our approach utilizes the extensibility interface of a popular CASE tool to access and automatically extract the metrics from Unified Modeling Language class diagrams. Following the extraction of the design metrics, 3D visualizations of these metrics are generated for each class in the design, utilizing intuitively meaningful 3D glyphs that are representative of the ensemble of metrics. Extraction and visualization of design metrics can aid software developers in the early study and understanding of design complexity.

  6. Implementing Cognitive Strategy Instruction across the School: The Benchmark Manual for Teachers.

    ERIC Educational Resources Information Center

    Gaskins, Irene; Elliot, Thorne

    Improving reading instruction has been the primary focus at the Benchmark School in Media, Pennsylvania. This book describes the various phases of Benchmark's development of a program to create strategic learners, thinkers, and problem solvers across the curriculum. The goal is to provide teachers and administrators with a handbook that can be…

  7. Elementary School Students' Science Talk Ability in Inquiry-Oriented Settings in Taiwan: Test Development, Verification, and Performance Benchmarks

    ERIC Educational Resources Information Center

    Lin, Sheau-Wen; Liu, Yu; Chen, Shin-Feng; Wang, Jing-Ru; Kao, Huey-Lien

    2016-01-01

    The purpose of this study was to develop a computer-based measure of elementary students' science talk and to report students' benchmarks. The development procedure had three steps: defining the framework of the test, collecting and identifying key reference sets of science talk, and developing and verifying the science talk instrument. The…

  8. BENCHMARKING SUSTAINABILITY ENGINEERING EDUCATION

    EPA Science Inventory

    The goals of this project are to develop and apply a methodology for benchmarking curricula in sustainability engineering and to identify individuals active in sustainability engineering education.

  9. Development of Benchmark Examples for Quasi-Static Delamination Propagation and Fatigue Growth Predictions

    NASA Technical Reports Server (NTRS)

    Krueger, Ronald

    2012-01-01

    The development of benchmark examples for quasi-static delamination propagation and cyclic delamination onset and growth prediction is presented and demonstrated for Abaqus/Standard. The example is based on a finite element model of a Double-Cantilever Beam specimen. The example is independent of the analysis software used and allows the assessment of the automated delamination propagation, onset and growth prediction capabilities in commercial finite element codes based on the virtual crack closure technique (VCCT). First, a quasi-static benchmark example was created for the specimen. Second, based on the static results, benchmark examples for cyclic delamination growth were created. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Fourth, starting from an initially straight front, the delamination was allowed to grow under cyclic loading. The number of cycles to delamination onset and the number of cycles during delamination growth for each growth increment were obtained from the automated analysis and compared to the benchmark examples. Again, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Selecting the appropriate input parameters, however, was not straightforward and often required an iterative procedure. Overall the results are encouraging, but further assessment for mixed-mode delamination is required.

  10. Development of Benchmark Examples for Static Delamination Propagation and Fatigue Growth Predictions

    NASA Technical Reports Server (NTRS)

    Kruger, Ronald

    2011-01-01

    The development of benchmark examples for static delamination propagation and cyclic delamination onset and growth prediction is presented and demonstrated for a commercial code. The example is based on a finite element model of an End-Notched Flexure (ENF) specimen. The example is independent of the analysis software used and allows the assessment of the automated delamination propagation, onset and growth prediction capabilities in commercial finite element codes based on the virtual crack closure technique (VCCT). First, static benchmark examples were created for the specimen. Second, based on the static results, benchmark examples for cyclic delamination growth were created. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Fourth, starting from an initially straight front, the delamination was allowed to grow under cyclic loading. The number of cycles to delamination onset and the number of cycles during stable delamination growth for each growth increment were obtained from the automated analysis and compared to the benchmark examples. Again, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with the input parameters of the particular implementation. Selecting the appropriate input parameters, however, was not straightforward and often required an iterative procedure. Overall, the results are encouraging but further assessment for mixed-mode delamination is required.

  11. Development and Application of Benchmark Examples for Mode II Static Delamination Propagation and Fatigue Growth Predictions

    NASA Technical Reports Server (NTRS)

    Krueger, Ronald

    2011-01-01

    The development of benchmark examples for static delamination propagation and cyclic delamination onset and growth prediction is presented and demonstrated for a commercial code. The example is based on a finite element model of an End-Notched Flexure (ENF) specimen. The example is independent of the analysis software used and allows the assessment of the automated delamination propagation, onset and growth prediction capabilities in commercial finite element codes based on the virtual crack closure technique (VCCT). First, static benchmark examples were created for the specimen. Second, based on the static results, benchmark examples for cyclic delamination growth were created. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Fourth, starting from an initially straight front, the delamination was allowed to grow under cyclic loading. The number of cycles to delamination onset and the number of cycles during delamination growth for each growth increment were obtained from the automated analysis and compared to the benchmark examples. Again, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Selecting the appropriate input parameters, however, was not straightforward and often required an iterative procedure. Overall the results are encouraging, but further assessment for mixed-mode delamination is required.

  12. Assessment of competency in endoscopy: establishing and validating generalizable competency benchmarks for colonoscopy.

    PubMed

    Sedlack, Robert E; Coyle, Walter J

    2016-03-01

    The Mayo Colonoscopy Skills Assessment Tool (MCSAT) has previously been used to describe learning curves and competency benchmarks for colonoscopy; however, these data were limited to a single training center. The newer Assessment of Competency in Endoscopy (ACE) tool is a refinement of the MCSAT tool put forth by the Training Committee of the American Society for Gastrointestinal Endoscopy, intended to include additional important quality metrics. The goal of this study is to validate the changes made by updating this tool and establish more generalizable and reliable learning curves and competency benchmarks for colonoscopy by examining a larger national cohort of trainees. In a prospective, multicenter trial, gastroenterology fellows at all stages of training had their core cognitive and motor skills in colonoscopy assessed by staff. Evaluations occurred at set intervals of every 50 procedures throughout the 2013 to 2014 academic year. Skills were graded by using the ACE tool, which uses a 4-point grading scale defining the continuum from novice to competent. Average learning curves for each skill were established at each interval in training and competency benchmarks for each skill were established using the contrasting groups method. Ninety-three gastroenterology fellows at 10 U.S. academic institutions had 1061 colonoscopies assessed by using the ACE tool. Average scores of 3.5 were found to be inclusive of all minimal competency thresholds identified for each core skill. Cecal intubation times of less than 15 minutes and independent cecal intubation rates of 90% were also identified as additional competency thresholds during analysis. The average fellow achieved all cognitive and motor skill endpoints by 250 procedures, with >90% surpassing these thresholds by 300 procedures. Nationally generalizable learning curves for colonoscopy skills in gastroenterology fellows are described. Average ACE scores of 3.5, cecal intubation rates of 90%, and intubation times less than 15 minutes are recommended as minimal competency criteria. On average, it takes 250 procedures to achieve competence in colonoscopy. The thresholds found in this multicenter cohort by using the ACE tool are nearly identical to the previously established MCSAT benchmarks and are consistent with recent gastroenterology training recommendations but far higher than current training requirements in other specialties. Copyright © 2016 American Society for Gastrointestinal Endoscopy. Published by Elsevier Inc. All rights reserved.
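
    As a rough illustration of the contrasting groups method mentioned above, the sketch below fits normal densities to the scores of two expert-labelled trainee groups and takes the crossing point of the densities as the cut score; the scores are invented, not the study's data.

        import numpy as np
        from scipy.optimize import brentq
        from scipy.stats import norm

        # Hypothetical ACE-style scores from trainees judged "not yet
        # competent" and "competent" by expert raters.
        not_competent = np.array([2.1, 2.4, 2.6, 2.8, 3.0, 3.1])
        competent = np.array([3.3, 3.5, 3.6, 3.7, 3.8, 4.0])

        mu0, sd0 = not_competent.mean(), not_competent.std(ddof=1)
        mu1, sd1 = competent.mean(), competent.std(ddof=1)

        # The cut score sits where the two fitted normal densities intersect.
        cut = brentq(lambda x: norm.pdf(x, mu0, sd0) - norm.pdf(x, mu1, sd1),
                     mu0, mu1)
        print(f"suggested competency benchmark: {cut:.2f}")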

  13. Rice by Weight, Other Produce by Bulk, and Snared Iguanas at So Much Per One. A Talk on Measurement Standards and on Metric Conversion.

    ERIC Educational Resources Information Center

    Allen, Harold Don

    This script for a short radio broadcast on measurement standards and metric conversion begins by tracing the rise of the metric system in the international marketplace. Metric units are identified and briefly explained. Arguments for conversion to metric measures are presented. The history of the development and acceptance of the metric system is…

  14. Surveillance metrics sensitivity study.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hamada, Michael S.; Bierbaum, Rene Lynn; Robertson, Alix A.

    2011-09-01

    In September of 2009, a Tri-Lab team was formed to develop a set of metrics relating to the NNSA nuclear weapon surveillance program. The purpose of the metrics was to develop a more quantitative and/or qualitative metric(s) describing the results of realized or non-realized surveillance activities on our confidence in reporting reliability and assessing the stockpile. As a part of this effort, a statistical sub-team investigated various techniques and developed a complementary set of statistical metrics that could serve as a foundation for characterizing aspects of meeting the surveillance program objectives. The metrics are a combination of tolerance limit calculations and power calculations, intending to answer level-of-confidence type questions with respect to the ability to detect certain undesirable behaviors (catastrophic defects, margin insufficiency defects, and deviations from a model). Note that the metrics are not intended to gauge product performance but instead the adequacy of surveillance. This report gives a short description of the four metric types that were explored and the results of a sensitivity study conducted to investigate their behavior for various inputs. The results of the sensitivity study can be used to set the risk parameters that specify the level of stockpile problem that the surveillance program should be addressing.
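
    The report's actual formulas are not reproduced here, but a minimal sketch of the kind of level-of-confidence calculation such metrics involve, for simple attribute (pass/fail) sampling, might look as follows; the defect rates and confidence levels are illustrative assumptions.

        import math

        def detection_power(n, p):
            """Probability that a random sample of n units catches at least
            one defective unit when the true defect fraction is p."""
            return 1.0 - (1.0 - p) ** n

        def required_sample_size(p, confidence):
            """Smallest n giving the stated confidence of seeing >= 1 defect."""
            return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

        print(detection_power(n=50, p=0.05))                  # ~0.92
        print(required_sample_size(p=0.05, confidence=0.95))  # 59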

  15. Surveillance Metrics Sensitivity Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bierbaum, R; Hamada, M; Robertson, A

    2011-11-01

    In September of 2009, a Tri-Lab team was formed to develop a set of metrics relating to the NNSA nuclear weapon surveillance program. The purpose of the metrics was to develop a more quantitative and/or qualitative metric(s) describing the results of realized or non-realized surveillance activities on our confidence in reporting reliability and assessing the stockpile. As a part of this effort, a statistical sub-team investigated various techniques and developed a complementary set of statistical metrics that could serve as a foundation for characterizing aspects of meeting the surveillance program objectives. The metrics are a combination of tolerance limit calculations and power calculations, intending to answer level-of-confidence type questions with respect to the ability to detect certain undesirable behaviors (catastrophic defects, margin insufficiency defects, and deviations from a model). Note that the metrics are not intended to gauge product performance but instead the adequacy of surveillance. This report gives a short description of the four metric types that were explored and the results of a sensitivity study conducted to investigate their behavior for various inputs. The results of the sensitivity study can be used to set the risk parameters that specify the level of stockpile problem that the surveillance program should be addressing.

  16. Benchmarking health IT among OECD countries: better data for better policy

    PubMed Central

    Adler-Milstein, Julia; Ronchi, Elettra; Cohen, Genna R; Winn, Laura A Pannella; Jha, Ashish K

    2014-01-01

    Objective To develop benchmark measures of health information and communication technology (ICT) use to facilitate cross-country comparisons and learning. Materials and methods The effort is led by the Organisation for Economic Co-operation and Development (OECD). Approaches to definition and measurement within four ICT domains were compared across seven OECD countries in order to identify functionalities in each domain. These informed a set of functionality-based benchmark measures, which were refined in collaboration with representatives from more than 20 OECD and non-OECD countries. We report on progress to date and remaining work to enable countries to begin to collect benchmark data. Results The four benchmarking domains include provider-centric electronic record, patient-centric electronic record, health information exchange, and tele-health. There was broad agreement on functionalities in the provider-centric electronic record domain (eg, entry of core patient data, decision support), and less agreement in the other three domains in which country representatives worked to select benchmark functionalities. Discussion Many countries are working to implement ICTs to improve healthcare system performance. Although many countries are looking to others as potential models, the lack of consistent terminology and approach has made cross-national comparisons and learning difficult. Conclusions As countries develop and implement strategies to increase the use of ICTs to promote health goals, there is a historic opportunity to enable cross-country learning. To facilitate this learning and reduce the chances that individual countries flounder, a common understanding of health ICT adoption and use is needed. The OECD-led benchmarking process is a crucial step towards achieving this. PMID:23721983

  17. Benchmarking health IT among OECD countries: better data for better policy.

    PubMed

    Adler-Milstein, Julia; Ronchi, Elettra; Cohen, Genna R; Winn, Laura A Pannella; Jha, Ashish K

    2014-01-01

    To develop benchmark measures of health information and communication technology (ICT) use to facilitate cross-country comparisons and learning. The effort is led by the Organisation for Economic Co-operation and Development (OECD). Approaches to definition and measurement within four ICT domains were compared across seven OECD countries in order to identify functionalities in each domain. These informed a set of functionality-based benchmark measures, which were refined in collaboration with representatives from more than 20 OECD and non-OECD countries. We report on progress to date and remaining work to enable countries to begin to collect benchmark data. The four benchmarking domains include provider-centric electronic record, patient-centric electronic record, health information exchange, and tele-health. There was broad agreement on functionalities in the provider-centric electronic record domain (eg, entry of core patient data, decision support), and less agreement in the other three domains in which country representatives worked to select benchmark functionalities. Many countries are working to implement ICTs to improve healthcare system performance. Although many countries are looking to others as potential models, the lack of consistent terminology and approach has made cross-national comparisons and learning difficult. As countries develop and implement strategies to increase the use of ICTs to promote health goals, there is a historic opportunity to enable cross-country learning. To facilitate this learning and reduce the chances that individual countries flounder, a common understanding of health ICT adoption and use is needed. The OECD-led benchmarking process is a crucial step towards achieving this.

  18. Advanced Life Support Research and Technology Development Metric

    NASA Technical Reports Server (NTRS)

    Hanford, A. J.

    2004-01-01

    The Metric is one of several measures employed by NASA to assess the Agency's progress as mandated by the United States Congress and the Office of Management and Budget. Because any measure must have a reference point, whether explicitly defined or implied, the Metric is a comparison between a selected ALS Project life support system and an equivalently detailed life support system using technology from the Environmental Control and Life Support System (ECLSS) for the International Space Station (ISS). This document provides the official calculation of the Advanced Life Support (ALS) Research and Technology Development Metric (the Metric) for Fiscal Year 2004. The values are primarily based on Systems Integration, Modeling, and Analysis (SIMA) Element approved software tools or reviewed and approved reference documents. For Fiscal Year 2004, the Advanced Life Support Research and Technology Development Metric value is 2.03 for an Orbiting Research Facility and 1.62 for an Independent Exploration Mission.

  19. Comprehensive Metric Education Project: Implementing Metrics at a District Level Administrative Guide.

    ERIC Educational Resources Information Center

    Borelli, Michael L.

    This document details the administrative issues associated with guiding a school district through its metrication efforts. Issues regarding staff development, curriculum development, and the acquisition of instructional resources are considered. Alternative solutions are offered. Finally, an overall implementation strategy is discussed with…

  20. Principles for Developing Benchmark Criteria for Staff Training in Responsible Gambling.

    PubMed

    Oehler, Stefan; Banzer, Raphaela; Gruenerbl, Agnes; Malischnig, Doris; Griffiths, Mark D; Haring, Christian

    2017-03-01

    One approach to minimizing the negative consequences of excessive gambling is staff training to reduce the rate of development of new cases of harm or disorder among customers. The primary goal of the present study was to assess suitable benchmark criteria for the training of gambling employees at casinos and lottery retailers. The study utilised the Delphi Method, a survey with one qualitative and two quantitative phases. A total of 21 invited international experts in the responsible gambling field participated in all three phases. A total of 75 performance indicators were outlined and assigned to six categories: (1) criteria of content, (2) modelling, (3) qualification of trainer, (4) framework conditions, (5) sustainability and (6) statistical indicators. Nine of the 75 indicators were rated as very important by 90% or more of the experts. Unanimous support for importance was given to indicators such as (1) comprehensibility and (2) concrete action guidance for handling problem gamblers. Additionally, the study examined the implementation of benchmarking, when it should be conducted, and who should be responsible. Results indicated that benchmarking should be conducted regularly, every 1-2 years, and that one institution should be clearly defined and primarily responsible for benchmarking. The results of the present study provide the basis for developing benchmark criteria for staff training in responsible gambling.

  1. A benchmarking method to measure dietary absorption efficiency of chemicals by fish.

    PubMed

    Xiao, Ruiyang; Adolfsson-Erici, Margaretha; Åkerman, Gun; McLachlan, Michael S; MacLeod, Matthew

    2013-12-01

    Understanding the dietary absorption efficiency of chemicals in the gastrointestinal tract of fish is important from both a scientific and a regulatory point of view. However, reported fish absorption efficiencies for well-studied chemicals are highly variable. In the present study, the authors developed and exploited an internal chemical benchmarking method that has the potential to reduce uncertainty and variability and, thus, to improve the precision of measurements of fish absorption efficiency. The authors applied the benchmarking method to measure the gross absorption efficiency for 15 chemicals with a wide range of physicochemical properties and structures. They selected 2,2',5,6'-tetrachlorobiphenyl (PCB53) and decabromodiphenyl ethane as absorbable and nonabsorbable benchmarks, respectively. Quantities of chemicals determined in fish were benchmarked to the fraction of PCB53 recovered in fish, and quantities of chemicals determined in feces were benchmarked to the fraction of decabromodiphenyl ethane recovered in feces. The performance of the benchmarking procedure was evaluated based on the recovery of the test chemicals and precision of absorption efficiency from repeated tests. Benchmarking did not improve the precision of the measurements; after benchmarking, however, the median recovery for 15 chemicals was 106%, and variability of recoveries was reduced compared with before benchmarking, suggesting that benchmarking could account for incomplete extraction of chemical in fish and incomplete collection of feces from different tests. © 2013 SETAC.
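
    A minimal sketch of the internal-benchmarking correction described above: each measured quantity is scaled by the recovered fraction of the corresponding benchmark chemical. The function name and numbers are hypothetical.

        def benchmark_correct(measured, benchmark_recovered):
            """Scale a measured quantity by the recovered fraction of the
            benchmark chemical, assuming both behave alike in the matrix."""
            return measured / benchmark_recovered

        # e.g. if only 80% of the absorbable benchmark (PCB53) were recovered
        # in fish, a test chemical measured at 12.0 units is corrected upward:
        print(benchmark_correct(measured=12.0, benchmark_recovered=0.80))  # 15.0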

  2. AN OVERVIEW OF THE DEVELOPMENT, STATUS, AND APPLICATION OF EQUILIBRIUM PARTITIONING SEDIMENT BENCHMARKS FOR PAH MIXTURES

    EPA Science Inventory

    This article provides an overview of the development, theoretical basis, regulatory status, and application of the U.S. Environmental Protection Agency's (USEPA's) Equilibrium Partitioning Sediment Benchmarks (ESBs) for PAH mixtures. ESBs are compared to other sediment quality g...

  3. Sparganothis fruitworm degree-day benchmarks provide key treatment timings for cranberry IPM

    USDA-ARS?s Scientific Manuscript database

    Degree-day benchmarks indicate discrete biological events in the development of insect pests. For the Sparganothis fruitworm, we have isolated all key development events and linked them to degree-day accumulations. These degree-day accumulations can greatly improve treatment timings for cranberry ...
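
    For readers unfamiliar with the underlying arithmetic, a sketch of the simple averaging method of degree-day accumulation follows; the base temperature is a placeholder, not the Sparganothis value.

        def daily_degree_days(t_max, t_min, t_base):
            """Degree-days accrued in one day by the simple averaging method."""
            return max(0.0, (t_max + t_min) / 2.0 - t_base)

        def accumulate(daily_temps, t_base):
            """Running degree-day total from a list of (t_max, t_min) pairs."""
            total, totals = 0.0, []
            for t_max, t_min in daily_temps:
                total += daily_degree_days(t_max, t_min, t_base)
                totals.append(total)
            return totals

        print(accumulate([(24, 12), (27, 15), (21, 9)], t_base=10))  # [8.0, 19.0, 24.0]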

  4. Metric Education; A Position Paper for Vocational, Technical and Adult Education.

    ERIC Educational Resources Information Center

    Cooper, Gloria S.; And Others

    Part of an Office of Education three-year project on metric education, the position paper is intended to alert and prepare teachers, curriculum developers, and administrators in vocational, technical, and adult education to the changeover to the metric system. The five chapters cover issues in metric education, what the metric system is all…

  5. Attack-Resistant Trust Metrics

    NASA Astrophysics Data System (ADS)

    Levien, Raph

    The Internet is an amazingly powerful tool for connecting people together, unmatched in human history. Yet, with that power comes great potential for spam and abuse. Trust metrics attempt to compute which people are trustworthy and which are likely attackers. This chapter presents two specific trust metrics developed and deployed on the Advogato website, a community blog for free software developers. This real-world experience demonstrates that the trust metrics fulfilled their goals, but that for good results, it is important to match the assumptions of the abstract trust metric computation to the real-world implementation.

  6. Developing image processing meta-algorithms with data mining of multiple metrics.

    PubMed

    Leung, Kelvin; Cunha, Alexandre; Toga, A W; Parker, D Stott

    2014-01-01

    People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation.
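
    A hedged sketch of the meta-algorithm idea: score every candidate result with a battery of metrics, then aggregate per-metric ranks to select a winner. The candidate names and metric values are invented, not from the paper.

        import numpy as np

        # Candidate registration results scored by a battery of metrics where
        # lower is better (e.g. mean-squared error, one minus mutual information).
        candidates = {
            "reg_A": [0.12, 0.30],
            "reg_B": [0.10, 0.27],
            "reg_C": [0.15, 0.28],
        }
        names = list(candidates)
        scores = np.array([candidates[n] for n in names])

        # Rank the candidates under every metric, then average the ranks.
        ranks = scores.argsort(axis=0).argsort(axis=0)
        print(names[int(ranks.mean(axis=1).argmin())])  # reg_B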

  7. Medical school benchmarking - from tools to programmes.

    PubMed

    Wilkinson, Tim J; Hudson, Judith N; Mccoll, Geoffrey J; Hu, Wendy C Y; Jolly, Brian C; Schuwirth, Lambert W T

    2015-02-01

    Benchmarking among medical schools is essential, but may result in unwanted effects. To apply a conceptual framework to selected benchmarking activities of medical schools. We present an analogy between the effects of assessment on student learning and the effects of benchmarking on medical school educational activities. A framework by which benchmarking can be evaluated was developed and applied to key current benchmarking activities in Australia and New Zealand. The analogy generated a conceptual framework that tested five questions to be considered in relation to benchmarking: what is the purpose? what are the attributes of value? what are the best tools to assess the attributes of value? what happens to the results? and what is the likely "institutional impact" of the results? If the activities were compared against a blueprint of desirable medical graduate outcomes, notable omissions would emerge. Medical schools should benchmark their performance on a range of educational activities to ensure quality improvement and to assure stakeholders that standards are being met. Although benchmarking potentially has positive benefits, it could also result in perverse incentives with unforeseen and detrimental effects on learning if it is undertaken using only a few selected assessment tools.

  8. Single-Point Mutation with a Rotamer Library Toolkit: Toward Protein Engineering.

    PubMed

    Pottel, Joshua; Moitessier, Nicolas

    2015-12-28

    Protein engineers have long been hard at work to harness biocatalysts as a natural source of regio-, stereo-, and chemoselectivity in order to carry out chemistry (reactions and/or substrates) not previously achieved with these enzymes. The extreme labor demands and exponential number of mutation combinations have induced computational advances in this domain. The first step in our virtual approach is to predict the correct conformations upon mutation of residues (i.e., rebuilding side chains). For this purpose, we opted for a combination of molecular mechanics and statistical data. In this work, we have developed automated computational tools to extract protein structural information and created conformational libraries for each amino acid dependent on a variable number of parameters (e.g., resolution, flexibility, secondary structure). We have also developed the necessary tool to apply the mutation and optimize the conformation accordingly. For side-chain conformation prediction, we obtained overall average root-mean-square deviations (RMSDs) of 0.91 and 1.01 Å for the 18 flexible natural amino acids within two distinct sets of over 3000 and 1500 side-chain residues, respectively. The commonly used dihedral angle differences were also evaluated and performed worse than the state of the art. These two metrics are also compared. Furthermore, we generated a family-specific library for kinases that produced an average 2% lower RMSD upon side-chain reconstruction and a residue-specific library that yielded a 17% improvement. Ultimately, since our protein engineering outlook involves using our docking software, Fitted/Impacts, we applied our mutation protocol to a benchmarked data set for self- and cross-docking. Our side-chain reconstruction does not hinder our docking software, demonstrating differences in pose prediction accuracy of approximately 2% (RMSD cutoff metric) for a set of over 200 protein/ligand structures. Similarly, when docking to a set of over 100 kinases, side-chain reconstruction (using both general and biased conformation libraries) had minimal detriment to the docking accuracy.
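
    The RMSD metric used above can be stated compactly; the sketch below assumes the predicted and observed atoms are already ordered and superposed, with invented coordinates in angstroms.

        import numpy as np

        def rmsd(coords_a, coords_b):
            """Root-mean-square deviation between two (n_atoms, 3) arrays."""
            diff = np.asarray(coords_a) - np.asarray(coords_b)
            return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

        predicted = [[1.0, 0.0, 0.0], [2.1, 0.2, 0.0]]
        observed = [[1.1, 0.0, 0.0], [2.0, 0.0, 0.1]]
        print(round(rmsd(predicted, observed), 3))  # ~0.187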

  9. A Combination of Traditional and Novel Methods Used to Evaluate the Impact of an EVA Glove on Hand Performance

    NASA Technical Reports Server (NTRS)

    Rajulu, Sudhakar; Benson, Elizabeth; England, Scott; Mesloh, Miranda; Thompson, Shelby

    2009-01-01

    The gloved hand is an astronaut's primary means of interacting with the environment, so performance on an EVA is strongly impacted by any restrictions imposed by the glove. As a result, these restrictions have been the subject of study for decades. However, previous studies have generally been unsuccessful in quantifying glove mobility and tactility. Instead, studies have tended to focus on the dexterity, strength and functional performance of the gloved hand. Therefore, it has been difficult to judge the impact of each type of restriction on the glove's overall capability. The lack of basic information on glove mobility, in particular, is related to the difficulty in instrumenting a gloved hand to allow an accurate evaluation. However, the current study aims at developing novel technological capabilities to provide metrics for mobility and tactility that can be used to assess the performance of a glove in a way that could enable designers and engineers to improve upon their current designs. A series of evaluations were performed in ungloved, unpressurized and pressurized (4.3 psi) conditions, to allow a comparison across pressures and to the baseline barehanded condition. In addition, a subset of the testing was also performed with the Thermal Micrometeoroid Garment (TMG) removed. This test case in particular provided some interesting insight into how much of an impact the TMG has on gloved mobility -- in some cases, as much as pressurization of the glove. Previous rule-of-thumb estimates had assumed that the TMG would have a much lower impact on mobility, while these results suggest that an improvement in the TMG could actually have a significant impact on glove performance. Similarly, tactility testing illustrated the impact of glove pressurization on tactility and provided insight on the design of interfaces to the glove. The metrics described in this paper have been used to benchmark the Phase VI EVA glove and to develop requirements for the next generation glove for the Constellation program.

  10. Standardised Benchmarking in the Quest for Orthologs

    PubMed Central

    Altenhoff, Adrian M.; Boeckmann, Brigitte; Capella-Gutierrez, Salvador; Dalquen, Daniel A.; DeLuca, Todd; Forslund, Kristoffer; Huerta-Cepas, Jaime; Linard, Benjamin; Pereira, Cécile; Pryszcz, Leszek P.; Schreiber, Fabian; Sousa da Silva, Alan; Szklarczyk, Damian; Train, Clément-Marie; Bork, Peer; Lecompte, Odile; von Mering, Christian; Xenarios, Ioannis; Sjölander, Kimmen; Juhl Jensen, Lars; Martin, Maria J.; Muffato, Matthieu; Gabaldón, Toni; Lewis, Suzanna E.; Thomas, Paul D.; Sonnhammer, Erik; Dessimoz, Christophe

    2016-01-01

    The identification of evolutionarily related genes across different species—orthologs in particular—forms the backbone of many comparative, evolutionary, and functional genomic analyses. Achieving high accuracy in orthology inference is thus essential. Yet the true evolutionary history of genes, required to ascertain orthology, is generally unknown. Furthermore, orthologs are used for very different applications across different phyla, with different requirements in terms of the precision-recall trade-off. As a result, assessing the performance of orthology inference methods remains difficult for both users and method developers. Here, we present a community effort to establish standards in orthology benchmarking and facilitate orthology benchmarking through an automated web-based service (http://orthology.benchmarkservice.org). Using this new service, we characterise the performance of 15 well-established orthology inference methods and resources on a battery of 20 different benchmarks. Standardised benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimal requirement for new tools and resources, and guides the development of more accurate orthology inference methods. PMID:27043882

  11. Using benchmarks for radiation testing of microprocessors and FPGAs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quinn, Heather; Robinson, William H.; Rech, Paolo

    Performance benchmarks have been used over the years to compare different systems. These benchmarks can be useful for researchers trying to determine how changes to the technology, architecture, or compiler affect the system's performance. No such standard exists for systems deployed into high radiation environments, making it difficult to assess whether changes in the fabrication process, circuitry, architecture, or software affect reliability or radiation sensitivity. In this paper, we propose a benchmark suite for high-reliability systems that is designed for field-programmable gate arrays and microprocessors. As a result, we describe the development process and report neutron test data for the hardware and software benchmarks.

  12. Using benchmarks for radiation testing of microprocessors and FPGAs

    DOE PAGES

    Quinn, Heather; Robinson, William H.; Rech, Paolo; ...

    2015-12-17

    Performance benchmarks have been used over the years to compare different systems. These benchmarks can be useful for researchers trying to determine how changes to the technology, architecture, or compiler affect the system's performance. No such standard exists for systems deployed into high radiation environments, making it difficult to assess whether changes in the fabrication process, circuitry, architecture, or software affect reliability or radiation sensitivity. In this paper, we propose a benchmark suite for high-reliability systems that is designed for field-programmable gate arrays and microprocessors. As a result, we describe the development process and report neutron test data for the hardware and software benchmarks.

  13. Developing a Benchmarking Process in Perfusion: A Report of the Perfusion Downunder Collaboration

    PubMed Central

    Baker, Robert A.; Newland, Richard F.; Fenton, Carmel; McDonald, Michael; Willcox, Timothy W.; Merry, Alan F.

    2012-01-01

    Improving and understanding clinical practice is an appropriate goal for the perfusion community. The Perfusion Downunder Collaboration has established a multi-center, perfusion-focused database aimed at achieving these goals through the development of quantitative quality indicators for clinical improvement through benchmarking. Data were collected using the Perfusion Downunder Collaboration database from procedures performed in eight Australian and New Zealand cardiac centers between March 2007 and February 2011. At the Perfusion Downunder Meeting in 2010, it was agreed by consensus to report quality indicators (QI) for glucose level, arterial outlet temperature, and pCO2 management during cardiopulmonary bypass. The values chosen for each QI were: blood glucose ≥4 mmol/L and ≤10 mmol/L; arterial outlet temperature ≤37°C; and arterial blood gas pCO2 ≥35 and ≤45 mmHg. The QI data were used to derive benchmarks using the Achievable Benchmark of Care (ABC™) methodology to identify the incidence of QIs at the best performing centers. Five thousand four hundred and sixty-five procedures were evaluated to derive QI and benchmark data. The incidence of the blood glucose QI ranged from 37–96% of procedures, with a benchmark value of 90%. The arterial outlet temperature QI occurred in 16–98% of procedures, with a benchmark of 94%; the arterial pCO2 QI occurred in 21–91%, with a benchmark value of 80%. We have derived QIs and benchmark calculations for the management of several key aspects of cardiopulmonary bypass to provide a platform for improving the quality of perfusion practice. PMID:22730861
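
    A sketch of the Achievable Benchmark of Care (ABC) calculation as it is commonly described: rank centres by an adjusted success fraction, pool the top centres until they cover at least 10% of all cases, and take the pooled rate as the benchmark. The centre data are invented, and this illustrates the general method rather than the Collaboration's exact computation.

        centres = [  # (cases meeting the quality indicator, total cases)
            (180, 200), (90, 100), (300, 400), (50, 100), (120, 300),
        ]

        total_cases = sum(n for _, n in centres)
        # A Bayesian-style adjustment keeps tiny centres from dominating.
        ranked = sorted(centres, key=lambda cn: (cn[0] + 1) / (cn[1] + 2),
                        reverse=True)

        covered, hits, denom = 0, 0, 0
        for x, n in ranked:
            hits, denom, covered = hits + x, denom + n, covered + n
            if covered >= 0.1 * total_cases:
                break

        print(f"ABC benchmark: {hits / denom:.1%}")  # 90.0% for these data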

  14. Key performance indicators in British military trauma.

    PubMed

    Stannard, Adam; Tai, Nigel R; Bowley, Douglas M; Midwinter, Mark; Hodgetts, Tim J

    2008-08-01

    Key performance indicators (KPI) are tools for assessing process and outcome in systems of health care provision and are an essential component in performance improvement. Although KPI have been used in British military trauma for 10 years, they remain poorly defined and are derived from civilian metrics that do not adjust for the realities of field trauma care. Our aim was to modify current trauma KPI to ensure they more faithfully reflect both the military setting and contemporary evidence in order to both aid accurate calibration of the performance of the British Defence Medical Services and act as a driver for performance improvement. A workshop was convened that was attended by senior, experienced doctors and nurses from all disciplines of trauma care in the British military. "Speciality-specific" KPI were developed by interest groups using evidence-based data where available and collective experience where this was lacking. In a final discussion these were streamlined into 60 KPI covering each phase of trauma management. The introduction of these KPI sets a number of important benchmarks by which British military trauma can be measured. As part of a performance improvement programme, these will allow closer monitoring of our performance and assist efforts to develop, train, and resource British military trauma providers.

  15. Classifying indicators of quality: a collaboration between Dutch and English regulators.

    PubMed

    Mears, Alex; Vesseur, Jan; Hamblin, Richard; Long, Paul; Den Ouden, Lya

    2011-12-01

    Many approaches to measuring quality in healthcare exist, generally employing indicators or metrics. While there are important differences, most of these approaches share three key areas of measurement: safety, effectiveness and patient experience. The European Partnership for Supervisory Organisations in Health Services and Social Care (EPSO) exists as a working group and discussion forum for European regulators. This group undertook to identify a common framework within which European approaches to indicators could be compared. A framework was developed to classify indicators, using four sets of criteria: conceptualization of quality; Donabedian definition (structure, process, outcome); data type (derivable, collectable from routine sources, special collections, samples); and data use (judgement, singular or part of a framework; benchmarking; risk assessment). Indicators from English and Dutch hospital measurement programmes were put into the framework, showing areas of agreement and levels of comparability. In the first instance, results are only illustrative. The EPSO has been a powerful driver for undertaking cross-European research, and this project is the first of many to take advantage of the access to international expertise. It has shown that through development of a framework that deconstructs national indicators, commonalities can be identified. Future work will attempt to incorporate other nations' indicators, and attempt cross-national comparison.

  16. Piloting a Process Maturity Model as an e-Learning Benchmarking Method

    ERIC Educational Resources Information Center

    Petch, Jim; Calverley, Gayle; Dexter, Hilary; Cappelli, Tim

    2007-01-01

    As part of a national e-learning benchmarking initiative of the UK Higher Education Academy, the University of Manchester is carrying out a pilot study of a method to benchmark e-learning in an institution. The pilot was designed to evaluate the operational viability of a method based on the e-Learning Maturity Model developed at the University of…

  17. Journal Benchmarking for Strategic Publication Management and for Improving Journal Positioning in the World Ranking Systems

    ERIC Educational Resources Information Center

    Moskovkin, Vladimir M.; Bocharova, Emilia A.; Balashova, Oksana V.

    2014-01-01

    Purpose: The purpose of this paper is to introduce and develop the methodology of journal benchmarking. Design/Methodology/Approach: The journal benchmarking method is understood to be an analytic procedure of continuous monitoring and comparing of the advance of specific journal(s) against that of competing journals in the same subject area,…

  18. Developing a benchmark for emotional analysis of music

    PubMed Central

    Yang, Yi-Hsuan; Soleymani, Mohammad

    2017-01-01

    The music emotion recognition (MER) field has rapidly expanded in the last decade. Many new methods and new audio features have been developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the diversity of data representations and the scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, the MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons with 2 Hz time resolution). Using DEAM, we organized the ‘Emotion in Music’ task at the MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature sets. We also describe the design of the benchmark, the evaluation procedures and the data cleaning and transformations that we suggest. The results from the benchmark suggest that recurrent-neural-network-based approaches combined with large feature sets work best for dynamic MER. PMID:28282400

  19. Comprehensive Benchmark Suite for Simulation of Particle Laden Flows Using the Discrete Element Method with Performance Profiles from the Multiphase Flow with Interface eXchanges (MFiX) Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Peiyuan; Brown, Timothy; Fullmer, William D.

    Five benchmark problems are developed and simulated with the computational fluid dynamics and discrete element model code MFiX. The benchmark problems span dilute and dense regimes, consider statistically homogeneous and inhomogeneous (both clusters and bubbles) particle concentrations and a range of particle and fluid dynamic computational loads. Several variations of the benchmark problems are also discussed to extend the computational phase space to cover granular (particles only), bidisperse and heat transfer cases. A weak scaling analysis is performed for each benchmark problem and, in most cases, the scalability of the code appears reasonable up to approximately 10^3 cores. Profiling of the benchmark problems indicates that the most substantial computational time is being spent on particle-particle force calculations, drag force calculations and interpolating between discrete particle and continuum fields. Hardware performance analysis was also carried out, showing significant Level 2 cache miss ratios and a rather low degree of vectorization. These results are intended to serve as a baseline for future developments to the code as well as a preliminary indicator of where to best focus performance optimizations.

  20. Developing a benchmark for emotional analysis of music.

    PubMed

    Aljanaki, Anna; Yang, Yi-Hsuan; Soleymani, Mohammad

    2017-01-01

    The music emotion recognition (MER) field has rapidly expanded in the last decade. Many new methods and new audio features have been developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the diversity of data representations and the scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, the MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons with 2 Hz time resolution). Using DEAM, we organized the 'Emotion in Music' task at the MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature sets. We also describe the design of the benchmark, the evaluation procedures and the data cleaning and transformations that we suggest. The results from the benchmark suggest that recurrent-neural-network-based approaches combined with large feature sets work best for dynamic MER.

  1. Benchmarking gate-based quantum computers

    NASA Astrophysics Data System (ADS)

    Michielsen, Kristel; Nocon, Madita; Willsch, Dennis; Jin, Fengping; Lippert, Thomas; De Raedt, Hans

    2017-11-01

    With the advent of public access to small gate-based quantum processors, it becomes necessary to develop a benchmarking methodology such that independent researchers can validate the operation of these processors. We explore the usefulness of a number of simple quantum circuits as benchmarks for gate-based quantum computing devices and show that circuits performing identity operations are very simple, scalable and sensitive to gate errors and are therefore very well suited for this task. We illustrate the procedure by presenting benchmark results for the IBM Quantum Experience, a cloud-based platform for gate-based quantum computing.
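
    A minimal sketch of the identity-circuit idea: a random gate sequence followed by its inverse returns the initial state on a perfect device, so any loss of return probability exposes gate error. This simulation is purely illustrative and does not use the IBM Quantum Experience platform.

        import numpy as np

        H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
        S = np.array([[1, 0], [0, 1j]])
        gates = [H, S]

        rng = np.random.default_rng(0)
        seq = [gates[i] for i in rng.integers(0, len(gates), size=20)]
        circuit = seq + [g.conj().T for g in reversed(seq)]  # net identity

        state = np.array([1.0, 0.0], dtype=complex)
        for g in circuit:
            state = g @ state

        print(abs(state[0]) ** 2)  # ~1.0 on an ideal (noise-free) device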

  2. Evaluating and Improving Automatic Sleep Spindle Detection by Using Multi-Objective Evolutionary Algorithms

    PubMed Central

    Liu, Min-Yin; Huang, Adam; Huang, Norden E.

    2017-01-01

    Sleep spindles are brief bursts of brain activity in the sigma frequency range (11–16 Hz) measured by electroencephalography (EEG) mostly during non-rapid eye movement (NREM) stage 2 sleep. These oscillations are of great biological and clinical interest because they potentially play an important role in identifying and characterizing the processes of various neurological disorders. Conventionally, sleep spindles are identified by expert sleep clinicians via visual inspection of EEG signals. The process is laborious and the results are inconsistent among different experts. To resolve the problem, numerous computerized methods have been developed to automate the process of sleep spindle identification. Still, the performance of these automated sleep spindle detection methods varies from study to study. There are two reasons: (1) the lack of common benchmark databases, and (2) the lack of commonly accepted evaluation metrics. In this study, we focus on tackling the second problem by proposing to evaluate the performance of a spindle detector in a multi-objective optimization context and hypothesize that using the resultant Pareto fronts for deriving evaluation metrics will improve automatic sleep spindle detection. We use a popular multi-objective evolutionary algorithm (MOEA), the Strength Pareto Evolutionary Algorithm (SPEA2), to optimize six existing frequency-based sleep spindle detection algorithms. They include three Fourier, one continuous wavelet transform (CWT), and two Hilbert-Huang transform (HHT) based algorithms. We also explore three hybrid approaches. Trained and tested on the open-access DREAMS and MASS databases, two new hybrid methods combining Fourier with HHT algorithms show significant performance improvement, with F1-scores of 0.726–0.737. PMID:28572762
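
    For reference, the F1-score reported above is the harmonic mean of precision and recall over detected versus expert-annotated events; the counts in this sketch are invented.

        def f1_score(true_pos, false_pos, false_neg):
            """F1 = harmonic mean of precision and recall."""
            precision = true_pos / (true_pos + false_pos)
            recall = true_pos / (true_pos + false_neg)
            return 2 * precision * recall / (precision + recall)

        # e.g. 73 correctly detected spindles, 20 spurious, 30 missed:
        print(round(f1_score(73, 20, 30), 3))  # ~0.745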

  3. Development of a Multidisciplinary Approach to Assess Sustainability

    EPA Science Inventory

    There are a number of established, scientifically supported metrics of sustainability. Many of the metrics are data intensive and require extensive effort to collect data and compute the metrics. Moreover, individual metrics do not capture all aspects of a system that are relevan...

  4. Assessment of quality outcomes for robotic pancreaticoduodenectomy: identification of the learning curve.

    PubMed

    Boone, Brian A; Zenati, Mazen; Hogg, Melissa E; Steve, Jennifer; Moser, Arthur James; Bartlett, David L; Zeh, Herbert J; Zureikat, Amer H

    2015-05-01

    Quality assessment is an important instrument to ensure optimal surgical outcomes, particularly during the adoption of new surgical technology. The use of the robotic platform for complex pancreatic resections, such as the pancreaticoduodenectomy, requires close monitoring of outcomes during its implementation phase to ensure patient safety is maintained and the learning curve identified. To report the results of a quality analysis and learning curve during the implementation of robotic pancreaticoduodenectomy (RPD). A retrospective review of a prospectively maintained database of 200 consecutive patients who underwent RPD in a large academic center from October 3, 2008, through March 1, 2014, was evaluated for important metrics of quality. Patients were analyzed in groups of 20 to minimize demographic differences and optimize the ability to detect statistically meaningful changes in performance. Robotic pancreaticoduodenectomy. Optimization of perioperative outcome parameters. No statistical differences in mortality rates or major morbidity were noted during the study. Statistical improvements in estimated blood loss and conversions to open surgery occurred after 20 cases (600 mL vs 250 mL [P = .002] and 35.0% vs 3.3% [P < .001], respectively), incidence of pancreatic fistula after 40 cases (27.5% vs 14.4%; P = .04), and operative time after 80 cases (581 minutes vs 417 minutes [P < .001]). Complication rates, lengths of stay, and readmission rates showed continuous improvement that did not reach statistical significance. Outcomes for the last 120 cases (representing optimized metrics beyond the learning curve) included a mean operative time of 417 minutes, median estimated blood loss of 250 mL, a conversion rate of 3.3%, 90-day mortality of 3.3%, a clinically significant (grade B/C) pancreatic fistula rate of 6.9%, and a median length of stay of 9 days. Continuous assessment of quality metrics allows for safe implementation of RPD. We identified several inflexion points corresponding to optimization of performance metrics for RPD that can be used as benchmarks for surgeons who are adopting this technology.

  5. The demographic impact and development benefits of meeting demand for family planning with modern contraceptive methods.

    PubMed

    Goodkind, Daniel; Lollock, Lisa; Choi, Yoonjoung; McDevitt, Thomas; West, Loraine

    2018-01-01

    Meeting demand for family planning can facilitate progress towards all major themes of the United Nations Sustainable Development Goals (SDGs): people, planet, prosperity, peace, and partnership. Many policymakers have embraced a benchmark goal that at least 75% of the demand for family planning in all countries be satisfied with modern contraceptive methods by the year 2030. This study examines the demographic impact (and development implications) of achieving the 75% benchmark in 13 developing countries that are expected to be the furthest from achieving that benchmark. Estimation of the demographic impact of achieving the 75% benchmark requires three steps in each country: 1) translate contraceptive prevalence assumptions (with and without intervention) into future fertility levels based on biometric models, 2) incorporate each pair of fertility assumptions into separate population projections, and 3) compare the demographic differences between the two population projections. Data are drawn from the United Nations, the US Census Bureau, and Demographic and Health Surveys. The demographic impact of meeting the 75% benchmark is examined via projected differences in fertility rates (average expected births per woman's reproductive lifetime), total population, growth rates, age structure, and youth dependency. On average, meeting the benchmark would imply a 16 percentage point increase in modern contraceptive prevalence by 2030 and a 20% decline in youth dependency, which portends a potential demographic dividend to spur economic growth. Improvements in meeting the demand for family planning with modern contraceptive methods can bring substantial benefits to developing countries. To our knowledge, this is the first study to show formally how such improvements can alter population size and age structure. Declines in youth dependency portend a demographic dividend, an added bonus to the already well-known benefits of meeting existing demands for family planning.
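
    A deliberately stylized sketch of steps 2 and 3 (not the study's cohort-component model): project a toy two-age-group population under two fertility assumptions and compare the resulting youth-dependency ratios. All rates are invented.

        def project(young, working, fertility, s_young=0.95, s_working=0.5,
                    steps=3):
            """Toy two-age-group projection; one step is one generation length."""
            for _ in range(steps):
                young, working = (working * fertility,
                                  young * s_young + working * s_working)
            return young / working  # youth-dependency ratio

        baseline = project(100.0, 200.0, fertility=0.9)  # demand unmet
        scenario = project(100.0, 200.0, fertility=0.7)  # 75% benchmark met
        print(f"youth dependency: {baseline:.2f} vs {scenario:.2f}")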

  6. All inclusive benchmarking.

    PubMed

    Ellis, Judith

    2006-07-01

    The aim of this article is to review published descriptions of benchmarking activity and synthesize benchmarking principles to encourage the acceptance and use of Essence of Care as a new benchmarking approach to continuous quality improvement, and to promote its acceptance as an integral and effective part of benchmarking activity in health services. The Essence of Care was launched by the Department of Health in England in 2001 to provide a benchmarking tool kit to support continuous improvement in the quality of fundamental aspects of health care, for example, privacy and dignity, nutrition and hygiene. The tool kit is now being effectively used by some frontline staff. However, use is inconsistent, with the value of the tool kit, or the support clinical practice benchmarking requires to be effective, not always recognized or provided by National Health Service managers, who are absorbed with the use of quantitative benchmarking approaches and the measurability of comparative performance data. This review of published benchmarking literature was conducted through an ever-narrowing search strategy, commencing from benchmarking within the quality improvement literature and moving through to benchmarking activity in health services, and it considered not only published examples of benchmarking approaches and models but also web-based benchmarking data. The review supported identification of how benchmarking approaches have developed and been used, remaining true to the basic benchmarking principles of continuous improvement through comparison and sharing (Camp 1989). Descriptions of models and exemplars of quantitative and specifically performance benchmarking activity in industry abound (Camp 1998), with far fewer examples of more qualitative and process benchmarking approaches in use in the public services and then applied to the health service (Bullivant 1998). The literature is, in the main, descriptive in its support of the effectiveness of benchmarking activity, and although this does not seem to have restricted the popularity of quantitative approaches, reticence about the value of the more qualitative approaches, for example Essence of Care, needs to be overcome in order to improve the quality of patient care and experiences. The perceived immeasurability and subjectivity of Essence of Care and clinical practice benchmarks mean that these benchmarking approaches are not always accepted or supported by health service organizations as valid benchmarking activity. In conclusion, Essence of Care benchmarking is a sophisticated clinical practice benchmarking approach which needs to be accepted as an integral part of health service benchmarking activity to support improvement in the quality of patient care and experiences.

  7. Development of quality metrics for ambulatory pediatric cardiology: Infection prevention.

    PubMed

    Johnson, Jonathan N; Barrett, Cindy S; Franklin, Wayne H; Graham, Eric M; Halnon, Nancy J; Hattendorf, Brandy A; Krawczeski, Catherine D; McGovern, James J; O'Connor, Matthew J; Schultz, Amy H; Vinocur, Jeffrey M; Chowdhury, Devyani; Anderson, Jeffrey B

    2017-12-01

    In 2012, the American College of Cardiology's (ACC) Adult Congenital and Pediatric Cardiology Council established a program to develop quality metrics to guide ambulatory practices for pediatric cardiology. The council chose five areas on which to focus their efforts; chest pain, Kawasaki Disease, tetralogy of Fallot, transposition of the great arteries after arterial switch, and infection prevention. Here, we sought to describe the process, evaluation, and results of the Infection Prevention Committee's metric design process. The infection prevention metrics team consisted of 12 members from 11 institutions in North America. The group agreed to work on specific infection prevention topics including antibiotic prophylaxis for endocarditis, rheumatic fever, and asplenia/hyposplenism; influenza vaccination and respiratory syncytial virus prophylaxis (palivizumab); preoperative methods to reduce intraoperative infections; vaccinations after cardiopulmonary bypass; hand hygiene; and testing to identify splenic function in patients with heterotaxy. An extensive literature review was performed. When available, previously published guidelines were used fully in determining metrics. The committee chose eight metrics to submit to the ACC Quality Metric Expert Panel for review. Ultimately, metrics regarding hand hygiene and influenza vaccination recommendation for patients did not pass the RAND analysis. Both endocarditis prophylaxis metrics and the RSV/palivizumab metric passed the RAND analysis but fell out during the open comment period. Three metrics passed all analyses, including those for antibiotic prophylaxis in patients with heterotaxy/asplenia, for influenza vaccination compliance in healthcare personnel, and for adherence to recommended regimens of secondary prevention of rheumatic fever. The lack of convincing data to guide quality improvement initiatives in pediatric cardiology is widespread, particularly in infection prevention. Despite this, three metrics were able to be developed for use in the ACC's quality efforts for ambulatory practice. © 2017 Wiley Periodicals, Inc.

  8. Municipal water consumption forecast accuracy

    NASA Astrophysics Data System (ADS)

    Fullerton, Thomas M.; Molina, Angel L.

    2010-06-01

    Municipal water consumption planning is an active area of research because of infrastructure construction and maintenance costs, supply constraints, and water quality assurance. In spite of that, relatively few water forecast accuracy assessments have been completed to date, although some internal documentation may exist as part of the proprietary "grey literature." This study utilizes a data set of previously published municipal consumption forecasts to partially fill that gap in the empirical water economics literature. Previously published municipal water econometric forecasts for three public utilities are examined for predictive accuracy against two random walk benchmarks commonly used in regional analyses. Descriptive metrics used to quantify forecast accuracy include root-mean-square error and Theil inequality statistics. Formal statistical assessments are completed using four-pronged error differential regression F tests. Similar to studies for other metropolitan econometric forecasts in areas with similar demographic and labor market characteristics, model predictive performances for the municipal water aggregates in this effort are mixed for each of the municipalities included in the sample. Given the competitiveness of the benchmarks, analysts should employ care when utilizing econometric forecasts of municipal water consumption for planning purposes, comparing them to recent historical observations and trends to ensure reliability. Comparative results using data from other markets, including regions facing differing labor and demographic conditions, would also be helpful.
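
    A sketch of the two accuracy metrics named above, with the Theil statistic taken in one common ratio form (model RMSE over random-walk RMSE); the series are invented.

        import numpy as np

        def rmse(actual, predicted):
            actual, predicted = np.asarray(actual), np.asarray(predicted)
            return float(np.sqrt(((actual - predicted) ** 2).mean()))

        actual = [100, 104, 103, 108, 110]  # observed consumption
        model = [101, 103, 105, 107, 111]   # econometric forecasts
        naive = [99, 100, 104, 103, 108]    # random walk: last observation

        u = rmse(actual, model) / rmse(actual, naive)
        print(f"Theil's U = {u:.2f}  (< 1: model beats the random walk)")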

  9. What Randomized Benchmarking Actually Measures

    DOE PAGES

    Proctor, Timothy; Rudinger, Kenneth; Young, Kevin; ...

    2017-09-28

    Randomized benchmarking (RB) is widely used to measure an error rate of a set of quantum gates, by performing random circuits that would do nothing if the gates were perfect. In the limit of no finite-sampling error, the exponential decay rate of the observable survival probabilities, versus circuit length, yields a single error metric r. For Clifford gates with arbitrary small errors described by process matrices, r was believed to reliably correspond to the mean, over all Clifford gates, of the average gate infidelity between the imperfect gates and their ideal counterparts. We show that this quantity is not a well-defined property of a physical gate set. It depends on the representations used for the imperfect and ideal gates, and the variant typically computed in the literature can differ from r by orders of magnitude. We present new theories of the RB decay that are accurate for all small errors describable by process matrices, and show that the RB decay curve is a simple exponential for all such errors. These theories allow explicit computation of the error rate that RB measures (r), but as far as we can tell it does not correspond to the infidelity of a physically allowed (completely positive) representation of the imperfect gates.
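
    A minimal sketch of how the RB error metric r is typically extracted: fit the survival probabilities to P(m) = A·p^m + B and set r = (1 − p)(d − 1)/d, with d = 2 for a single qubit. The survival data here are invented.

        import numpy as np
        from scipy.optimize import curve_fit

        def decay(m, A, p, B):
            return A * p ** m + B

        lengths = np.array([1, 2, 4, 8, 16, 32, 64])
        survival = np.array([0.99, 0.98, 0.96, 0.93, 0.86, 0.76, 0.61])

        (A, p, B), _ = curve_fit(decay, lengths, survival, p0=[0.5, 0.98, 0.5])
        d = 2  # Hilbert-space dimension for one qubit
        r = (1 - p) * (d - 1) / d
        print(f"RB error rate r = {r:.4f}")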

  10. Development and Application of Explicitly Correlated Wave Function Based Methods for the Investigation of Optical Properties of Semiconductor Nanomaterials

    NASA Astrophysics Data System (ADS)

    Elward, Jennifer Mary

    Semiconductor nanoparticles, or quantum dots (QDs), are well known to have unique optical and electronic properties. These properties can be controlled and tailored as a function of several influential factors, including but not limited to the particle size and shape, effect of composition and heterojunction as well as the effect of ligand on the particle surface. This customizable nature leads to extensive experimental and theoretical research on the capabilities of these quantum dots for many application purposes. However, in order to be able to understand and thus further the development of these materials, one must first understand the fundamental interaction within these nanoparticles. In this thesis, I have developed a theoretical method which is called electron-hole explicitly correlated Hartree-Fock (eh-XCHF). It is a variational method for solving the electron-hole Schrodinger equation and has been used in this work to study electron-hole interaction in semiconductor quantum dots. The method was benchmarked with respect to a parabolic quantum dot system, and ground state energy and electron-hole recombination probability were computed. Both of these properties were found to be in good agreement with expected results. Upon successful benchmarking, I have applied the eh-XCHF method to study optical properties of several quantum dot systems including the effect of dot size on exciton binding energy and recombination probability in a CdSe quantum dot, the effect of shape on a CdSe quantum dot, the effect of heterojunction on a CdSe/ZnS quantum dot and the effect of quantum dot-biomolecule interaction within a CdSe-firefly Luciferase protein conjugate system. As metrics for assessing the effect of these influencers on the electron-hole interaction, the exciton binding energy, electron-hole recombination probability and the average electron-hole separation distance have been computed. These excitonic properties have been found to be strongly influenced by the changing composition of the particle. It has also been found through this work that the explicitly correlated method performs very well when computing these properties as it provides a feasible computational route to compare to both experimental and other theoretical results.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biddy, Mary J.; Davis, Ryan; Humbird, David

    Biorefinery process development relies on techno-economic analysis (TEA) to identify primary cost drivers, prioritize research directions, and mitigate technical risk for scale-up through development of detailed process designs. Here, we conduct TEA of a model 2000 dry metric ton-per-day lignocellulosic biorefinery that employs a two-step pretreatment and enzymatic hydrolysis to produce biomass-derived sugars, followed by biological lipid production, lipid recovery, and catalytic hydrotreating to produce renewable diesel blendstock (RDB). On the basis of projected near-term technical feasibility of these steps, we predict that RDB could be produced at a minimum fuel selling price (MFSP) of USD $9.55/gasoline-gallon-equivalent (GGE), predicated on the need for improvements in lipid productivity and yield beyond current benchmark performance. This cost is significant given the limitations in scale and high costs for aerobic cultivation of oleaginous microbes and subsequent lipid extraction/recovery. In light of this predicted cost, we developed an alternative pathway which demonstrates that RDB costs could be substantially reduced in the near term if upgradeable fractions of biomass, in this case hemicellulose-derived sugars, are diverted to coproducts of sufficient value and market size; here, we use succinic acid as an example coproduct. The coproduction model predicts an MFSP of USD $5.28/GGE when leaving conversion and yield parameters unchanged for the fuel production pathway, changing the biorefinery's output from 24 MM GGE/year of RDB to 15 MM GGE/year of RDB plus 0.13 MM tons of succinic acid per year. Additional analysis demonstrates that beyond the near-term projections assumed in the models here, further reductions in the MFSP toward $2-3/GGE (which would be competitive with fossil-based hydrocarbon fuels) are possible with additional transformational improvements in the fuel and coproduct trains, especially in terms of carbon efficiency to both fuels and coproducts, recovery and purification of fuels and coproducts, and coproduct selection and price. Overall, this analysis documents potential economics for both a hydrocarbon fuel and bioproduct process pathway and highlights prioritized research directions beyond the current benchmark to enable hydrocarbon fuel production via an oleaginous microbial platform with simultaneous coproduct manufacturing from lignocellulosic biomass.
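
    At its crudest, an MFSP estimate is a cost balance: annualized capital plus operating costs, less coproduct revenue, divided by annual fuel output. The sketch below uses that one-line simplification with invented numbers purely to illustrate the direction of the coproduct effect; the study itself uses a full discounted-cash-flow analysis, which this does not reproduce.

        # Highly simplified MFSP sketch (a cost balance, not a discounted-cash-flow
        # analysis). All function names and numbers are hypothetical.
        def mfsp_per_gge(total_capital, capital_recovery_factor,
                         annual_opex, coproduct_revenue, annual_gge):
            """Fuel selling price ($/GGE) that balances annualized costs."""
            annualized_costs = total_capital * capital_recovery_factor + annual_opex
            return (annualized_costs - coproduct_revenue) / annual_gge

        # Illustrative inputs only (not taken from the paper).
        base = mfsp_per_gge(6.0e8, 0.1, 1.2e8, 0.0, 24e6)
        coprod = mfsp_per_gge(6.0e8, 0.1, 1.2e8, 9.0e7, 15e6)  # coproduct credit, less fuel
        print(f"base case ~ ${base:.2f}/GGE, coproduct case ~ ${coprod:.2f}/GGE")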

  12. Cross-Evaluation of Degree Programmes in Higher Education

    ERIC Educational Resources Information Center

    Kettunen, Juha

    2010-01-01

    Purpose: This study seeks to develop and describe the benchmarking approach of enhancement-led evaluation in higher education and to present a cross-evaluation process for degree programmes. Design/methodology/approach: The benchmarking approach produces useful information for the development of degree programmes based on self-evaluation,…

  13. Establishing Language Benchmarks for Children with Typically Developing Language and Children with Language Impairment

    ERIC Educational Resources Information Center

    Schmitt, Mary Beth; Logan, Jessica A. R.; Tambyraja, Sherine R.; Farquharson, Kelly; Justice, Laura M.

    2017-01-01

    Purpose: Practitioners, researchers, and policymakers (i.e., stakeholders) have vested interests in children's language growth yet currently do not have empirically driven methods for measuring such outcomes. The present study established language benchmarks for children with typically developing language (TDL) and children with language…

  14. RESULTS OF QA/QC TESTING OF EPA BENCHMARK DOSE SOFTWARE VERSION 1.2

    EPA Science Inventory

    EPA is developing benchmark dose software (BMDS) to support cancer and non-cancer dose-response assessments. Following the recent public review of BMDS version 1.1b, EPA developed a Hill model for evaluating continuous data, and improved the user interface and Multistage, Polyno...

  15. Objective Methodology to Assess Meaningful Research Productivity by Orthopaedic Residency Departments: Validation Against Widely Distributed Ranking Metrics and Published Surrogates.

    PubMed

    Jones, Louis B; Goel, Sameer; Hung, Leroy Y; Graves, Matthew L; Spitler, Clay A; Russell, George V; Bergin, Patrick F

    2018-04-01

    The mission of any academic orthopaedic training program can be divided into 3 general areas of focus: clinical care, academic performance, and research. Clinical care is evaluated on clinical volume, patient outcomes, and patient satisfaction, and is becoming increasingly focused on data-driven quality metrics. Academic performance of a department can be used to motivate individual surgeons, but objective measures are used to define a residency program. Annual in-service examinations serve as a marker of resident knowledge base, and board pass rates are clearly scrutinized. Research productivity, however, has proven harder to objectively quantify. In an effort to improve transparency and better account for conflicts of interest, bias, and self-citation, multiple bibliometric measures have been developed. Rather than using individuals' research productivity as a surrogate for departmental research, we sought to establish an objective methodology to better assess a residency program's ability to conduct meaningful research. In this study, we describe a process to assess the number and quality of publications produced by an orthopaedic residency department. This would allow chairmen and program directors to benchmark their current production and set measurable goals for future research investment. The main goal of the benchmarking system is to create an "h-index" for residency programs. To do this, we needed to create a list of relevant articles in the orthopaedic literature. We used the Journal Citation Reports, which lists all orthopaedic journals that are given an impact factor rating every year. When we accessed the Journal Citation Reports database, there were 72 journals included in the orthopaedic literature section. To ensure that only relevant, impactful journals were included, we selected journals with an impact factor greater than 0.95 and an Eigenfactor Score greater than 0.00095; after excluding journals not meeting these criteria, we were left with 45 journals. We performed a Scopus search of these journals over a 10-year period and created a database of articles and their affiliated institutions. We performed several iterations of this search to maximize the capture of articles attributed to institutions with multiple names. From this extensive database, we were able to analyze all allopathic US residency programs based on their quality research productivity. We believe this is a novel methodology that creates a system by which residency program chairmen and directors can assess progress over time and make accurate comparisons with other programs.
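
    The program-level "h-index" the authors propose follows the usual definition: the largest h such that h of the program's articles have at least h citations each. A minimal sketch with invented citation counts (the authors' actual computation runs over their Scopus-derived database, which is not reproduced here):

        # Sketch of an h-index computation for one program's article list.
        def h_index(citations):
            """Largest h such that at least h papers have >= h citations."""
            counts = sorted(citations, reverse=True)
            h = 0
            for i, c in enumerate(counts, start=1):
                if c >= i:
                    h = i
                else:
                    break
            return h

        # Hypothetical citation counts for one residency program's articles.
        print(h_index([42, 18, 11, 9, 7, 5, 4, 2, 1, 0]))  # -> 5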

  16. Systematic Benchmarking of Diagnostic Technologies for an Electrical Power System

    NASA Technical Reports Server (NTRS)

    Kurtoglu, Tolga; Jensen, David; Poll, Scott

    2009-01-01

    Automated health management is a critical functionality for complex aerospace systems. A wide variety of diagnostic algorithms have been developed to address this technical challenge. Unfortunately, the lack of support for large-scale V&V (verification and validation) of diagnostic technologies continues to create barriers to effective development and deployment of such algorithms for aerospace vehicles. In this paper, we describe a formal framework developed for benchmarking of diagnostic technologies. The diagnosed system is the Advanced Diagnostics and Prognostics Testbed (ADAPT), a real-world electrical power system (EPS) developed and maintained at the NASA Ames Research Center. The benchmarking approach provides a systematic, empirical basis for testing diagnostic software and is used to assess the performance of different diagnostic algorithms.

  17. A Teacher's Guide to Metrics. A Series of In-Service Booklets Designed for Adult Educators.

    ERIC Educational Resources Information Center

    Wendel, Robert, Ed.; And Others

    This series of seven booklets is designed to train teachers of adults in metrication, as a prerequisite to offering metrics in adult basic education and general educational development programs. The seven booklets provide a guide representing an integration of metric teaching methods and metric materials to place the adult in an active learning…

  18. Development, Validation and Integration of the ATLAS Trigger System Software in Run 2

    NASA Astrophysics Data System (ADS)

    Keyes, Robert; ATLAS Collaboration

    2017-10-01

    The trigger system of the ATLAS detector at the LHC is a combination of hardware, firmware, and software, associated with various sub-detectors that must seamlessly cooperate in order to select one collision of interest out of every 40,000 delivered by the LHC every millisecond. These proceedings discuss the challenges, organization, and work flow of the ongoing trigger software development, validation, and deployment. The goal of this development is to ensure that the most up-to-date algorithms are used to optimize the performance of the experiment. The goal of the validation is to ensure the reliability and predictability of the software performance. Integration tests are carried out to ensure that the software deployed to the online trigger farm during data-taking runs as desired. Trigger software is validated by emulating online conditions using a benchmark run and mimicking the reconstruction that occurs during normal data-taking. This exercise is computationally demanding and thus runs on the ATLAS high performance computing grid with high priority. Performance metrics ranging from low-level memory and CPU requirements to distributions and efficiencies of high-level physics quantities are visualized and validated by a range of experts. This is a multifaceted critical task that ties together many aspects of the experimental effort and thus directly influences the overall performance of the ATLAS experiment.

  19. Research Overview and Analysis.

    DTIC Science & Technology

    1982-04-01

    they have the infrastructure in place to respond to a significant increase in demand for the development of metric standards should such a demand... Conversion of Standards: The Views of Nine Selected Major Standards Development Bodies, U.S. Metric Board, in press (1982). A Study of Metric Conversion... Reports Developed under Contract AA-80-SAC-XB604 4--Office of Public Awareness and Education 5--Three Reports Developed under Contract AA-80-SAC-X8602

  20. Development of a multidisciplinary approach to assess regional sustainability

    EPA Science Inventory

    There are a number of established, scientifically supported metrics of sustainability. Many of the metrics are data intensive and require extensive effort to collect data and compute the metrics. Moreover, individual metrics do not capture all aspects of a system that are relev...

  1. Assessment of Static Delamination Propagation Capabilities in Commercial Finite Element Codes Using Benchmark Analysis

    NASA Technical Reports Server (NTRS)

    Orifici, Adrian C.; Krueger, Ronald

    2010-01-01

    With capabilities for simulating delamination growth in composite materials becoming available, the need for benchmarking and assessing these capabilities is critical. In this study, benchmark analyses were performed to assess the delamination propagation simulation capabilities of the VCCT implementations in Marc™ and MD Nastran™. Benchmark delamination growth results for Double Cantilever Beam, Single Leg Bending and End Notched Flexure specimens were generated using a numerical approach. This numerical approach was developed previously, and involves comparing results from a series of analyses at different delamination lengths to a single analysis with automatic crack propagation. Specimens were analyzed with three-dimensional and two-dimensional models, and compared with previous analyses using Abaqus. The results demonstrated that the VCCT implementations in Marc™ and MD Nastran™ were capable of accurately replicating the benchmark delamination growth results, and that the use of the numerical benchmarks offers advantages over benchmarking using experimental and analytical results.

  2. Can data-driven benchmarks be used to set the goals of healthy people 2010?

    PubMed Central

    Allison, J; Kiefe, C I; Weissman, N W

    1999-01-01

    OBJECTIVES: Expert panels determined the public health goals of Healthy People 2000 subjectively. The present study examined whether data-driven benchmarks provide a better alternative. METHODS: We developed the "pared-mean" method to define from data the best achievable health care practices. We calculated the pared-mean benchmark for screening mammography from the 1994 National Health Interview Survey, using the metropolitan statistical area as the "provider" unit. Beginning with the best-performing provider and adding providers in descending sequence, we established the minimum provider subset that included at least 10% of all women surveyed on this question. The pared-mean benchmark is then the proportion of women in this subset who received mammography. RESULTS: The pared-mean benchmark for screening mammography was 71%, compared with the Healthy People 2000 goal of 60%. CONCLUSIONS: For Healthy People 2010, benchmarks derived from data reflecting the best available care provide viable alternatives to consensus-derived targets. We are currently pursuing additional refinements to the data-driven pared-mean benchmark approach. PMID:9987466
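
    A minimal sketch of the pared-mean computation as described: rank provider units by performance, pool the best performers until they cover at least 10% of the surveyed population, and report the pooled proportion. The data below are invented, not the NHIS survey values.

        # Sketch of the pared-mean benchmark over (n_surveyed, n_received) pairs.
        def pared_mean_benchmark(providers, coverage=0.10):
            """providers: list of (n_surveyed, n_received) per provider unit."""
            total = sum(n for n, _ in providers)
            # Best-performing providers first (highest proportion receiving care).
            ranked = sorted(providers, key=lambda p: p[1] / p[0], reverse=True)
            covered = received = 0
            for n, r in ranked:
                covered += n
                received += r
                if covered >= coverage * total:  # minimum subset covering >= 10%
                    break
            return received / covered

        # Hypothetical (n_women_surveyed, n_screened) per metropolitan statistical area.
        msas = [(500, 400), (800, 560), (1200, 720), (3000, 1500), (4500, 1800)]
        print(f"pared-mean benchmark = {pared_mean_benchmark(msas):.2f}")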

  3. Benchmarking to improve the quality of cystic fibrosis care.

    PubMed

    Schechter, Michael S

    2012-11-01

    Benchmarking involves the ascertainment of healthcare programs with the most favorable outcomes as a means to identify and spread effective strategies for delivery of care. The recent interest in the development of patient registries for patients with cystic fibrosis (CF) has been fueled in part by an interest in using them to facilitate benchmarking. This review summarizes reports of how benchmarking has been operationalized in attempts to improve CF care. Although certain goals of benchmarking can be accomplished with an exclusive focus on registry data analysis, benchmarking programs in Germany and the United States have supplemented these data analyses with exploratory interactions and discussions to better understand successful approaches to care and encourage their spread throughout the care network. Benchmarking allows the discovery and facilitates the spread of effective approaches to care. It provides a pragmatic alternative to traditional research methods such as randomized controlled trials, offering insights into methods that optimize delivery of care and allowing judgments about the relative effectiveness of different therapeutic approaches.

  4. Software metrics: The key to quality software on the NCC project

    NASA Technical Reports Server (NTRS)

    Burns, Patricia J.

    1993-01-01

    Network Control Center (NCC) Project metrics are captured during the implementation and testing phases of the NCCDS software development lifecycle. The metrics data collection and reporting function has interfaces with all elements of the NCC project. Close collaboration with all project elements has resulted in the development of a defined and repeatable set of metrics processes. The resulting data are used to plan and monitor release activities on a weekly basis. The use of graphical outputs facilitates the interpretation of progress and status. The successful application of metrics throughout the NCC project has been instrumental in the delivery of quality software. The use of metrics on the NCC Project supports the needs of the technical and managerial staff. This paper describes the project, the functions supported by metrics, the data that are collected and reported, how the data are used, and the improvements in the quality of deliverable software since the metrics processes and products have been in use.

  5. A Validation of Object-Oriented Design Metrics as Quality Indicators

    NASA Technical Reports Server (NTRS)

    Basili, Victor R.; Briand, Lionel C.; Melo, Walcelio

    1997-01-01

    This paper presents the results of a study in which we empirically investigated the suite of object-oriented (OO) design metrics introduced in another work. More specifically, our goal is to assess these metrics as predictors of fault-prone classes and, therefore, determine whether they can be used as early quality indicators. This study is complementary to the work described where the same suite of metrics had been used to assess frequencies of maintenance changes to classes. To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements. All eight projects were developed using a sequential life cycle model, a well-known OO analysis/design method and the C++ programming language. Based on empirical and quantitative analysis, the advantages and drawbacks of these OO metrics are discussed. Several of Chidamber and Kemerer's OO metrics appear to be useful to predict class fault-proneness during the early phases of the life-cycle. Also, on our data set, they are better predictors than 'traditional' code metrics, which can only be collected at a later phase of the software development processes.
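
    As an illustration of the validation idea, one can treat class-level metric values as features and fit a classifier for fault-proneness. The sketch below uses invented values for the Chidamber and Kemerer suite and a plain logistic regression; it is not the study's actual model or data.

        # Sketch: predicting fault-prone classes from OO design metrics (invented data).
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Columns: WMC, DIT, NOC, CBO, RFC, LCOM (Chidamber & Kemerer suite).
        X = np.array([
            [12, 2, 0,  4, 20,  5],
            [45, 4, 3, 15, 80, 40],
            [ 8, 1, 0,  2, 10,  1],
            [60, 5, 6, 22, 95, 70],
            [20, 3, 1,  7, 30, 12],
            [50, 4, 2, 18, 85, 55],
        ])
        y = np.array([0, 1, 0, 1, 0, 1])  # 1 = fault-prone class

        model = LogisticRegression(max_iter=1000).fit(X, y)
        new_class = np.array([[40, 4, 2, 14, 70, 35]])
        print(f"P(fault-prone) ~ {model.predict_proba(new_class)[0, 1]:.2f}")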

  6. Metric analysis and data validation across FORTRAN projects

    NASA Technical Reports Server (NTRS)

    Basili, Victor R.; Selby, Richard W., Jr.; Phillips, Tsai-Yun

    1983-01-01

    The desire to predict the effort in developing software, or to explain its quality, has led to the proposal of several metrics. As a step toward validating these metrics, the Software Engineering Laboratory (SEL) has analyzed the software science metrics, cyclomatic complexity, and various standard program measures for their relation to effort (including design through acceptance testing), development errors (both discrete and weighted according to the amount of time to locate and fix), and one another. The data investigated are collected from a FORTRAN project environment and examined across several projects at once, within individual projects, and by reporting accuracy checks demonstrating the need to validate a database. The metrics' correlations with actual effort seem to be strongest when the data come from individual programmers or certain validated projects. For modules developed entirely by individual programmers, the validity ratios induce a statistically significant ordering of several of the metrics' correlations. When comparing the strongest correlations, neither software science's E metric, cyclomatic complexity, nor source lines of code appears to relate convincingly better with effort than the others.

  7. A Validation of Object-Oriented Design Metrics

    NASA Technical Reports Server (NTRS)

    Basili, Victor R.; Briand, Lionel; Melo, Walcelio L.

    1995-01-01

    This paper presents the results of a study conducted at the University of Maryland in which we experimentally investigated the suite of Object-Oriented (OO) design metrics introduced by [Chidamber and Kemerer, 1994]. In order to do this, we assessed these metrics as predictors of fault-prone classes. This study is complementary to [Li and Henry, 1993], where the same suite of metrics had been used to assess frequencies of maintenance changes to classes. To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements. All eight projects were developed using a sequential life cycle model, a well-known OO analysis/design method and the C++ programming language. Based on experimental results, the advantages and drawbacks of these OO metrics are discussed and suggestions for improvement are provided. Several of Chidamber and Kemerer's OO metrics appear to be adequate to predict class fault-proneness during the early phases of the life-cycle. We also showed that they are, on our data set, better predictors than "traditional" code metrics, which can only be collected at a later phase of the software development processes.

  8. Developing Image Processing Meta-Algorithms with Data Mining of Multiple Metrics

    PubMed Central

    Cunha, Alexandre; Toga, A. W.; Parker, D. Stott

    2014-01-01

    People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation. PMID:24653748
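
    A minimal sketch of the meta-algorithm idea: score each candidate result with a battery of metrics, rank the candidates under each metric, and select the candidate with the best mean rank. The two intensity-based metrics below are simple stand-ins, not the paper's full battery.

        # Sketch: select the best of several image-processing results by mining
        # a battery of metrics (invented data; all metrics oriented lower-is-better).
        import numpy as np

        def mse(a, b):
            return float(np.mean((a - b) ** 2))

        def neg_ncc(a, b):
            """Negated normalized cross-correlation, so lower is better."""
            a0, b0 = a - a.mean(), b - b.mean()
            return -float((a0 * b0).sum() / (np.linalg.norm(a0) * np.linalg.norm(b0)))

        def best_by_mean_rank(reference, candidates, metrics=(mse, neg_ncc)):
            scores = np.array([[m(reference, c) for c in candidates] for m in metrics])
            ranks = scores.argsort(axis=1).argsort(axis=1)  # per-metric ranks, 0 = best
            return int(ranks.mean(axis=0).argmin())

        rng = np.random.default_rng(0)
        ref = rng.random((32, 32))
        cands = [ref + rng.normal(0, s, ref.shape) for s in (0.05, 0.2, 0.5)]
        print("selected candidate:", best_by_mean_rank(ref, cands))  # least noisy wins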

  9. Evaluation of an Integrated Framework for Biodiversity with a New Metric for Functional Dispersion

    PubMed Central

    Presley, Steven J.; Scheiner, Samuel M.; Willig, Michael R.

    2014-01-01

    Growing interest in understanding ecological patterns from phylogenetic and functional perspectives has driven the development of metrics that capture variation in evolutionary histories or ecological functions of species. Recently, an integrated framework based on Hill numbers was developed that measures three dimensions of biodiversity based on abundance, phylogeny and function of species. This framework is highly flexible, allowing comparison of those diversity dimensions, including different aspects of a single dimension and their integration into a single measure. The behavior of those metrics with regard to variation in data structure has not been explored in detail, yet is critical for ensuring an appropriate match between the concept and its measurement. We evaluated how each metric responds to particular data structures and developed a new metric for functional biodiversity. The phylogenetic metric is sensitive to variation in the topology of phylogenetic trees, including variation in the relative lengths of basal, internal and terminal branches. In contrast, the functional metric exhibited multiple shortcomings: (1) species that are functionally redundant contribute nothing to functional diversity and (2) a single highly distinct species causes functional diversity to approach the minimum possible value. We introduced an alternative, improved metric based on functional dispersion that solves both of these problems. In addition, the new metric exhibited more desirable behavior when based on multiple traits. PMID:25148103
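
    The abundance-based backbone of the Hill-number framework is qD = (Σᵢ pᵢ^q)^{1/(1−q)}, with the q → 1 limit equal to the exponential of Shannon entropy. A minimal sketch of that dimension only (the phylogenetic and functional extensions, including the new dispersion-based metric, are not reproduced here):

        # Sketch: Hill numbers for a community of relative abundances.
        import numpy as np

        def hill_number(abundances, q):
            p = np.asarray(abundances, dtype=float)
            p = p / p.sum()
            p = p[p > 0]
            if np.isclose(q, 1.0):
                return float(np.exp(-(p * np.log(p)).sum()))  # q -> 1 limit
            return float((p ** q).sum() ** (1.0 / (1.0 - q)))

        community = [50, 30, 10, 5, 5]  # hypothetical species abundances
        for q in (0, 1, 2):             # richness, Shannon, inverse-Simpson orders
            print(f"q={q}: {hill_number(community, q):.2f}")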

  10. Evaluating software development characteristics: Assessment of software measures in the Software Engineering Laboratory. [reliability engineering

    NASA Technical Reports Server (NTRS)

    Basili, V. R.

    1981-01-01

    Work on metrics is discussed. Factors that affect software quality are reviewed. Metrics are discussed in terms of criteria achievement, reliability, and fault tolerance. Subjective and objective metrics are distinguished. Product/process and cost/quality metrics are characterized and discussed.

  11. Integral Full Core Multi-Physics PWR Benchmark with Measured Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Forget, Benoit; Smith, Kord; Kumar, Shikhar

    In recent years, the importance of modeling and simulation has been highlighted extensively in the DOE research portfolio, with concrete examples in nuclear engineering in the CASL and NEAMS programs. These research efforts and similar efforts worldwide aim at the development of high-fidelity multi-physics analysis tools for the simulation of current and next-generation nuclear power reactors. Like all analysis tools, verification and validation is essential to guarantee proper functioning of the software and methods employed. The current approach relies mainly on the validation of single-physics phenomena (e.g. critical experiments, flow loops, etc.), and there is a lack of relevant multiphysics benchmark measurements that are necessary to validate high-fidelity methods being developed today. This work introduces a new multi-cycle full-core Pressurized Water Reactor (PWR) depletion benchmark based on two operational cycles of a commercial nuclear power plant that provides a detailed description of fuel assemblies, burnable absorbers, in-core fission detectors, core loading and re-loading patterns. This benchmark enables analysts to develop extremely detailed reactor core models that can be used for testing and validation of coupled neutron transport, thermal-hydraulics, and fuel isotopic depletion. The benchmark also provides measured reactor data for Hot Zero Power (HZP) physics tests, boron letdown curves, and three-dimensional in-core flux maps from 58 instrumented assemblies. The benchmark description is now available online and has been used by many groups. However, much work remains to be done on the quantification of uncertainties and modeling sensitivities. This work aims to address these deficiencies and make this benchmark a true non-proprietary international benchmark for the validation of high-fidelity tools. This report details the BEAVRS uncertainty quantification for the first two cycles of operation and serves as the final report of the project.

  12. Benchmarking: contexts and details matter.

    PubMed

    Zheng, Siyuan

    2017-07-05

    Benchmarking is an essential step in the development of computational tools. We take this opportunity to pitch in our opinions on tool benchmarking, in light of two correspondence articles published in Genome Biology. Please see the related Li et al. and Newman et al. correspondence articles: www.dx.doi.org/10.1186/s13059-017-1256-5 and www.dx.doi.org/10.1186/s13059-017-1257-4.

  13. Bilingual Metric Education Modules for Postsecondary and Adult Vocational Education. Final Report.

    ERIC Educational Resources Information Center

    Ellis Associates, Inc., College Park, MD.

    A project was conducted to develop three metric education modules for use with bilingual (Spanish and English) students in postsecondary and adult vocational education programs. Developed for the first section of each module, five instructional units cover basic metric concepts: (1) measuring length and finding area, (2) measuring volume, (3)…

  14. A Classification Scheme for Smart Manufacturing Systems’ Performance Metrics

    PubMed Central

    Lee, Y. Tina; Kumaraguru, Senthilkumaran; Jain, Sanjay; Robinson, Stefanie; Helu, Moneer; Hatim, Qais Y.; Rachuri, Sudarsan; Dornfeld, David; Saldana, Christopher J.; Kumara, Soundar

    2017-01-01

    This paper proposes a classification scheme for performance metrics for smart manufacturing systems. The discussion focuses on three such metrics: agility, asset utilization, and sustainability. For each of these metrics, we discuss classification themes, which we then use to develop a generalized classification scheme. In addition to the themes, we discuss a conceptual model that may form the basis for the information necessary for performance evaluations. Finally, we present future challenges in developing robust performance-measurement systems for real-time, data-intensive enterprises. PMID:28785744

  15. A software quality model and metrics for risk assessment

    NASA Technical Reports Server (NTRS)

    Hyatt, L.; Rosenberg, L.

    1996-01-01

    A software quality model and its associated attributes are defined and used as the basis for a discussion on risk. Specific quality goals and attributes are selected based on their importance to a software development project and their ability to be quantified. Risks that can be determined by the model's metrics are identified. A core set of metrics relating to the software development process and its products is defined. Measurements for each metric and their usability and applicability are discussed.

  16. Benchmarking in pathology: development of an activity-based costing model.

    PubMed

    Burnett, Leslie; Wilson, Roger; Pfeffer, Sally; Lowry, John

    2012-12-01

    Benchmarking in Pathology (BiP) allows pathology laboratories to determine the unit cost of all laboratory tests and procedures, and also provides organisational productivity indices allowing comparisons of performance with other BiP participants. We describe 14 years of progressive enhancement to a BiP program, including the implementation of 'avoidable costs' as the accounting basis for allocation of costs, rather than previous approaches using 'total costs'. A hierarchical, tree-structured activity-based costing model distributes 'avoidable costs' attributable to the pathology activities component of a pathology laboratory operation. The hierarchical tree model permits costs to be allocated across multiple laboratory sites and organisational structures. This has enabled benchmarking on a number of levels, including test profiles and non-testing related workload activities. Methods for dealing with variable cost inputs, allocation of indirect costs using imputation techniques, panels of tests, and blood-bank record keeping have been successfully integrated into the costing model. A variety of laboratory management reports are produced, including the 'cost per test' of each pathology 'test' output. Benchmarking comparisons may be undertaken at any and all of the 'cost per test' and 'cost per Benchmarking Complexity Unit' levels, the 'discipline/department' (sub-specialty) level, or the overall laboratory/site and organisational levels. We have completed development of a national BiP program. An activity-based costing methodology based on avoidable costs overcomes many problems of previous benchmarking studies based on total costs. The use of benchmarking complexity adjustment permits correction for varying test-mix and diagnostic complexity between laboratories. Use of iterative communication strategies with program participants can overcome many obstacles and lead to innovations.
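
    As an illustration of the hierarchical tree-structured allocation, the sketch below pushes a pool of avoidable costs down a tree of cost centres in proportion to each child's activity weight and reports a cost per test at the leaves. The tree, weights, and dollar figures are invented, not the BiP model's.

        # Sketch: allocate an avoidable-cost pool down a cost-centre tree.
        def allocate(node, pool):
            """node: {'name': str, 'weight': float, 'children': [...], 'volume': int}"""
            if not node.get("children"):
                return {node["name"]: pool / node["volume"]}  # cost per test at a leaf
            total_w = sum(c["weight"] for c in node["children"])
            out = {}
            for c in node["children"]:
                # Each child receives a share of the pool proportional to its weight.
                out.update(allocate(c, pool * c["weight"] / total_w))
            return out

        lab = {"name": "laboratory", "children": [
            {"name": "chemistry", "weight": 3.0, "children": [
                {"name": "glucose", "weight": 1.0, "volume": 90000},
                {"name": "lipid_panel", "weight": 2.0, "volume": 30000},
            ]},
            {"name": "haematology", "weight": 1.0, "volume": 40000},
        ]}
        print(allocate(lab, 1_000_000.0))  # unit costs from a $1M avoidable-cost pool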

  17. Benchmarking: applications to transfusion medicine.

    PubMed

    Apelseth, Torunn Oveland; Molnar, Laura; Arnold, Emmy; Heddle, Nancy M

    2012-10-01

    Benchmarking is a structured, continuous, collaborative process in which comparisons for selected indicators are used to identify factors that, when implemented, will improve transfusion practices. This study aimed to identify transfusion medicine studies reporting on benchmarking, summarize the benchmarking approaches used, and identify important considerations for moving the concept of benchmarking forward in the field of transfusion medicine. A systematic review of published literature was performed to identify transfusion medicine-related studies that compared at least 2 separate institutions or regions with the intention of benchmarking, focusing on 4 areas: blood utilization, safety, operational aspects, and blood donation. Forty-five studies were included: blood utilization (n = 35), safety (n = 5), operational aspects of transfusion medicine (n = 5), and blood donation (n = 0). Based on predefined criteria, 7 publications were classified as benchmarking, 2 as trending, and 36 as single-event studies. Three models of benchmarking are described: (1) a regional benchmarking program that collects and links relevant data from existing electronic sources, (2) a sentinel site model where data from a limited number of sites are collected, and (3) an institution-initiated model where a site identifies indicators of interest and approaches other institutions. Benchmarking approaches are needed in the field of transfusion medicine. Major challenges include defining best practices and developing cost-effective methods of data collection. For those interested in initiating a benchmarking program, the sentinel site model may be most effective and sustainable as a starting point, although the regional model would be the ideal goal.

  18. Utilizing a Trauma Systems Approach to Benchmark and Improve Combat Casualty Care

    DTIC Science & Technology

    2010-07-01

    modern battlefield utilizing evidence-based medicine. The development of injury care benchmarks enhanced the evolution of the combat casualty care performance improvement process within the trauma system.

  19. Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.

    PubMed

    Liu, Zhihai; Su, Minyi; Han, Li; Liu, Jie; Yang, Qifan; Li, Yan; Wang, Renxiao

    2017-02-21

    In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions have been developed. Regardless of their technical differences, scoring functions all need data sets combining protein-ligand complex structures and binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in terms of size and quality. On the other hand, standard metrics for evaluating scoring functions used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect the genuine quality of scoring functions. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions. In this Account, we describe our long-lasting efforts to overcome these obstacles, which involve two related projects. In the first project, we have created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in PDB. Data sets provided by PDBbind have been applied to many computational and statistical studies on protein-ligand interaction and various other subjects. In particular, it has become a major data resource for scoring function development. In the second project, we have established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so scoring functions can be tested in a relatively pure context to reflect their quality. In our latest work on this track, i.e. CASF-2013, the performance of a scoring function was quantified in four aspects: "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set containing 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions was tested as a demonstration. Importantly, CASF is designed to be an open-access benchmark, with which scoring functions developed by different researchers can be compared on the same grounds. Indeed, it has become a popular choice for scoring function validation in recent years. Despite the considerable progress that has been made so far, the performance of today's scoring functions still does not meet people's expectations in many aspects. There is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some obstacles underlying scoring function development so that the researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark in the future to keep them as useful community resources.
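
    As an illustration of one of the four CASF performance tests, "scoring power" is typically quantified as the correlation between predicted binding scores and experimental binding data over the test set. A minimal sketch with invented values (CASF's full protocol also covers ranking, docking, and screening power, which are not shown):

        # Sketch: scoring-power style evaluation via Pearson correlation.
        import numpy as np
        from scipy.stats import pearsonr

        # Hypothetical experimental pK values and a scoring function's predictions.
        experimental = np.array([2.1, 3.5, 4.0, 5.2, 6.8, 7.4, 8.9, 9.6])
        predicted = np.array([2.8, 3.1, 4.6, 4.9, 6.1, 7.9, 8.2, 9.9])

        r, _ = pearsonr(experimental, predicted)
        print(f"scoring power (Pearson r) = {r:.3f}")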

  20. Seismo-acoustic ray model benchmarking against experimental tank data.

    PubMed

    Camargo Rodríguez, Orlando; Collis, Jon M; Simpson, Harry J; Ey, Emanuel; Schneiderwind, Joseph; Felisberto, Paulo

    2012-08-01

    Acoustic predictions of the recently developed traceo ray model, which accounts for bottom shear properties, are benchmarked against experimental tank data from the EPEE-1 and EPEE-2 (Elastic Parabolic Equation Experiment) experiments. Both experiments are representative of signal propagation in a Pekeris-like shallow-water waveguide over a non-flat isotropic elastic bottom, where significant interaction of the signal with the bottom can be expected. The benchmarks show, in particular, that the ray model can be as accurate as a parabolic approximation model benchmarked in similar conditions. The results of benchmarking are important, on the one hand, as a preliminary experimental validation of the model and, on the other hand, demonstrate the reliability of the ray approach for seismo-acoustic applications.
