metric development benchmarking: Topics by Science.gov

Sample records for metric development benchmarking

Issues in Benchmark Metric Selection

NASA Astrophysics Data System (ADS)

Crolotte, Alain

It is true that a metric can influence a benchmark but will esoteric metrics create more problems than they will solve? We answer this question affirmatively by examining the case of the TPC-D metric which used the much debated geometric mean for the single-stream test. We will show how a simple choice influenced the benchmark and its conduct and, to some extent, DBMS development. After examining other alternatives our conclusion is that the “real” measure for a decision-support benchmark is the arithmetic mean.
Conceptual Soundness, Metric Development, Benchmarking, and Targeting for PATH Subprogram Evaluation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mosey. G.; Doris, E.; Coggeshall, C.

The objective of this study is to evaluate the conceptual soundness of the U.S. Department of Housing and Urban Development (HUD) Partnership for Advancing Technology in Housing (PATH) program's revised goals and establish and apply a framework to identify and recommend metrics that are the most useful for measuring PATH's progress. This report provides an evaluative review of PATH's revised goals, outlines a structured method for identifying and selecting metrics, proposes metrics and benchmarks for a sampling of individual PATH programs, and discusses other metrics that potentially could be developed that may add value to the evaluation process. The frameworkmore » and individual program metrics can be used for ongoing management improvement efforts and to inform broader program-level metrics for government reporting requirements.« less
Beyond Benchmarking: Value-Adding Metrics

ERIC Educational Resources Information Center

Fitz-enz, Jac

2007-01-01

HR metrics has grown up a bit over the past two decades, moving away from simple benchmarking practices and toward a more inclusive approach to measuring institutional performance and progress. In this article, the acknowledged "father" of human capital performance benchmarking provides an overview of several aspects of today's HR metrics…
A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design.

PubMed

Ó Conchúir, Shane; Barlow, Kyle A; Pache, Roland A; Ollikainen, Noah; Kundert, Kale; O'Meara, Matthew J; Smith, Colin A; Kortemme, Tanja

2015-01-01

The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark. We aim to provide such a resource for protocols in macromolecular modeling and design. We present a freely accessible web resource (https://kortemmelab.ucsf.edu/benchmarks) to guide the development of protocols for protein modeling and design. The site provides benchmark datasets and metrics to compare the performance of a variety of modeling protocols using different computational sampling methods and energy functions, providing a "best practice" set of parameters for each method. Each benchmark has an associated downloadable benchmark capture archive containing the input files, analysis scripts, and tutorials for running the benchmark. The captures may be run with any suitable modeling method; we supply command lines for running the benchmarks using the Rosetta software suite. We have compiled initial benchmarks for the resource spanning three key areas: prediction of energetic effects of mutations, protein design, and protein structure prediction, each with associated state-of-the-art modeling protocols. With the help of the wider macromolecular modeling community, we hope to expand the variety of benchmarks included on the website and continue to evaluate new iterations of current methods as they become available.
Collected notes from the Benchmarks and Metrics Workshop

NASA Technical Reports Server (NTRS)

Drummond, Mark E.; Kaelbling, Leslie P.; Rosenschein, Stanley J.

1991-01-01

In recent years there has been a proliferation of proposals in the artificial intelligence (AI) literature for integrated agent architectures. Each architecture offers an approach to the general problem of constructing an integrated agent. Unfortunately, the ways in which one architecture might be considered better than another are not always clear. There has been a growing realization that many of the positive and negative aspects of an architecture become apparent only when experimental evaluation is performed and that to progress as a discipline, we must develop rigorous experimental methods. In addition to the intrinsic intellectual interest of experimentation, rigorous performance evaluation of systems is also a crucial practical concern to our research sponsors. DARPA, NASA, and AFOSR (among others) are actively searching for better ways of experimentally evaluating alternative approaches to building intelligent agents. One tool for experimental evaluation involves testing systems on benchmark tasks in order to assess their relative performance. As part of a joint DARPA and NASA funded project, NASA-Ames and Teleos Research are carrying out a research effort to establish a set of benchmark tasks and evaluation metrics by which the performance of agent architectures may be determined. As part of this project, we held a workshop on Benchmarks and Metrics at the NASA Ames Research Center on June 25, 1990. The objective of the workshop was to foster early discussion on this important topic. We did not achieve a consensus, nor did we expect to. Collected here is some of the information that was exchanged at the workshop. Given here is an outline of the workshop, a list of the participants, notes taken on the white-board during open discussions, position papers/notes from some participants, and copies of slides used in the presentations.
Aluminum-Mediated Formation of Cyclic Carbonates: Benchmarking Catalytic Performance Metrics.

PubMed

Rintjema, Jeroen; Kleij, Arjan W

2017-03-22

We report a comparative study on the activity of a series of fifteen binary catalysts derived from various reported aluminum-based complexes. A benchmarking of their initial rates in the coupling of various terminal and internal epoxides in the presence of three different nucleophilic additives was carried out, providing for the first time a useful comparison of activity metrics in the area of cyclic organic carbonate formation. These investigations provide a useful framework for how to realistically valorize relative reactivities and which features are important when considering the ideal operational window of each binary catalyst system. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Improved product energy intensity benchmarking metrics for thermally concentrated food products.

PubMed

Walker, Michael E; Arnold, Craig S; Lettieri, David J; Hutchins, Margot J; Masanet, Eric

2014-10-21

Product energy intensity (PEI) metrics allow industry and policymakers to quantify manufacturing energy requirements on a product-output basis. However, complexities can arise for benchmarking of thermally concentrated products, particularly in the food processing industry, due to differences in outlet composition, feed material composition, and processing technology. This study analyzes tomato paste as a typical, high-volume concentrated product using a thermodynamics-based model. Results show that PEI for tomato pastes and purees varies from 1200 to 9700 kJ/kg over the range of 8%-40% outlet solids concentration for a 3-effect evaporator, and 980-7000 kJ/kg for a 5-effect evaporator. Further, the PEI for producing paste at 31% outlet solids concentration in a 3-effect evaporator varies from 13,000 kJ/kg at 3% feed solids concentration to 5900 kJ/kg at 6%; for a 5-effect evaporator, the variation is from 9200 kJ/kg at 3%, to 4300 kJ/kg at 6%. Methods to compare the PEI of different product concentrations on a standard basis are evaluated. This paper also presents methods to develop PEI benchmark values for multiple plants. These results focus on the case of a tomato paste processing facility, but can be extended to other products and industries that utilize thermal concentration.
Performance Benchmarks for Scholarly Metrics Associated with Fisheries and Wildlife Faculty

PubMed Central

Swihart, Robert K.; Sundaram, Mekala; Höök, Tomas O.; DeWoody, J. Andrew; Kellner, Kenneth F.

2016-01-01

Research productivity and impact are often considered in professional evaluations of academics, and performance metrics based on publications and citations increasingly are used in such evaluations. To promote evidence-based and informed use of these metrics, we collected publication and citation data for 437 tenure-track faculty members at 33 research-extensive universities in the United States belonging to the National Association of University Fisheries and Wildlife Programs. For each faculty member, we computed 8 commonly used performance metrics based on numbers of publications and citations, and recorded covariates including academic age (time since Ph.D.), sex, percentage of appointment devoted to research, and the sub-disciplinary research focus. Standardized deviance residuals from regression models were used to compare faculty after accounting for variation in performance due to these covariates. We also aggregated residuals to enable comparison across universities. Finally, we tested for temporal trends in citation practices to assess whether the “law of constant ratios”, used to enable comparison of performance metrics between disciplines that differ in citation and publication practices, applied to fisheries and wildlife sub-disciplines when mapped to Web of Science Journal Citation Report categories. Our regression models reduced deviance by ¼ to ½. Standardized residuals for each faculty member, when combined across metrics as a simple average or weighted via factor analysis, produced similar results in terms of performance based on percentile rankings. Significant variation was observed in scholarly performance across universities, after accounting for the influence of covariates. In contrast to findings for other disciplines, normalized citation ratios for fisheries and wildlife sub-disciplines increased across years. Increases were comparable for all sub-disciplines except ecology. We discuss the advantages and limitations of our methods
Performance Benchmarks for Scholarly Metrics Associated with Fisheries and Wildlife Faculty.

PubMed

Swihart, Robert K; Sundaram, Mekala; Höök, Tomas O; DeWoody, J Andrew; Kellner, Kenneth F

2016-01-01

Research productivity and impact are often considered in professional evaluations of academics, and performance metrics based on publications and citations increasingly are used in such evaluations. To promote evidence-based and informed use of these metrics, we collected publication and citation data for 437 tenure-track faculty members at 33 research-extensive universities in the United States belonging to the National Association of University Fisheries and Wildlife Programs. For each faculty member, we computed 8 commonly used performance metrics based on numbers of publications and citations, and recorded covariates including academic age (time since Ph.D.), sex, percentage of appointment devoted to research, and the sub-disciplinary research focus. Standardized deviance residuals from regression models were used to compare faculty after accounting for variation in performance due to these covariates. We also aggregated residuals to enable comparison across universities. Finally, we tested for temporal trends in citation practices to assess whether the "law of constant ratios", used to enable comparison of performance metrics between disciplines that differ in citation and publication practices, applied to fisheries and wildlife sub-disciplines when mapped to Web of Science Journal Citation Report categories. Our regression models reduced deviance by ¼ to ½. Standardized residuals for each faculty member, when combined across metrics as a simple average or weighted via factor analysis, produced similar results in terms of performance based on percentile rankings. Significant variation was observed in scholarly performance across universities, after accounting for the influence of covariates. In contrast to findings for other disciplines, normalized citation ratios for fisheries and wildlife sub-disciplines increased across years. Increases were comparable for all sub-disciplines except ecology. We discuss the advantages and limitations of our methods
Benchmarking Investments in Advancement: Results of the Inaugural CASE Advancement Investment Metrics Study (AIMS). CASE White Paper

ERIC Educational Resources Information Center

Kroll, Juidith A.

2012-01-01

The inaugural Advancement Investment Metrics Study, or AIMS, benchmarked investments and staffing in each of the advancement disciplines (advancement services, alumni relations, communications and marketing, fundraising and advancement management) as well as the return on the investment in fundraising specifically. This white paper reports on the…
Mean Abnormal Result Rate: Proof of Concept of a New Metric for Benchmarking Selectivity in Laboratory Test Ordering.

PubMed

Naugler, Christopher T; Guo, Maggie

2016-04-01

There is a need to develop and validate new metrics to access the appropriateness of laboratory test requests. The mean abnormal result rate (MARR) is a proposed measure of ordering selectivity, the premise being that higher mean abnormal rates represent more selective test ordering. As a validation of this metric, we compared the abnormal rate of lab tests with the number of tests ordered on the same requisition. We hypothesized that requisitions with larger numbers of requested tests represent less selective test ordering and therefore would have a lower overall abnormal rate. We examined 3,864,083 tests ordered on 451,895 requisitions and found that the MARR decreased from about 25% if one test was ordered to about 7% if nine or more tests were ordered, consistent with less selectivity when more tests were ordered. We then examined the MARR for community-based testing for 1,340 family physicians and found both a wide variation in MARR as well as an inverse relationship between the total tests ordered per year per physician and the physician-specific MARR. The proposed metric represents a new utilization metric for benchmarking relative selectivity of test orders among physicians. © American Society for Clinical Pathology, 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Benchmarking in pathology: development of a benchmarking complexity unit and associated key performance indicators.

PubMed

Neil, Amanda; Pfeffer, Sally; Burnett, Leslie

2013-01-01

This paper details the development of a new type of pathology laboratory productivity unit, the benchmarking complexity unit (BCU). The BCU provides a comparative index of laboratory efficiency, regardless of test mix. It also enables estimation of a measure of how much complex pathology a laboratory performs, and the identification of peer organisations for the purposes of comparison and benchmarking. The BCU is based on the theory that wage rates reflect productivity at the margin. A weighting factor for the ratio of medical to technical staff time was dynamically calculated based on actual participant site data. Given this weighting, a complexity value for each test, at each site, was calculated. The median complexity value (number of BCUs) for that test across all participating sites was taken as its complexity value for the Benchmarking in Pathology Program. The BCU allowed implementation of an unbiased comparison unit and test listing that was found to be a robust indicator of the relative complexity for each test. Employing the BCU data, a number of Key Performance Indicators (KPIs) were developed, including three that address comparative organisational complexity, analytical depth and performance efficiency, respectively. Peer groups were also established using the BCU combined with simple organisational and environmental metrics. The BCU has enabled productivity statistics to be compared between organisations. The BCU corrects for differences in test mix and workload complexity of different organisations and also allows for objective stratification into peer groups.
Establishing benchmarks and metrics for disruptive technologies, inappropriate and obsolete tests in the clinical laboratory.

PubMed

Kiechle, Frederick L; Arcenas, Rodney C; Rogers, Linda C

2014-01-01

Benchmarks and metrics related to laboratory test utilization are based on evidence-based medical literature that may suffer from a positive publication bias. Guidelines are only as good as the data reviewed to create them. Disruptive technologies require time for appropriate use to be established before utilization review will be meaningful. Metrics include monitoring the use of obsolete tests and the inappropriate use of lab tests. Test utilization by clients in a hospital outreach program can be used to monitor the impact of new clients on lab workload. A multi-disciplinary laboratory utilization committee is the most effective tool for modifying bad habits, and reviewing and approving new tests for the lab formulary or by sending them out to a reference lab. Copyright © 2013 Elsevier B.V. All rights reserved.
Benchmarking infrastructure for mutation text mining

PubMed Central

2014-01-01

Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Benchmarking infrastructure for mutation text mining.

PubMed

Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

2014-02-25

Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
Metric Evaluation Pipeline for 3d Modeling of Urban Scenes

NASA Astrophysics Data System (ADS)

Bosch, M.; Leichtman, A.; Chilcott, D.; Goldberg, H.; Brown, M.

2017-05-01

Publicly available benchmark data and metric evaluation approaches have been instrumental in enabling research to advance state of the art methods for remote sensing applications in urban 3D modeling. Most publicly available benchmark datasets have consisted of high resolution airborne imagery and lidar suitable for 3D modeling on a relatively modest scale. To enable research in larger scale 3D mapping, we have recently released a public benchmark dataset with multi-view commercial satellite imagery and metrics to compare 3D point clouds with lidar ground truth. We now define a more complete metric evaluation pipeline developed as publicly available open source software to assess semantically labeled 3D models of complex urban scenes derived from multi-view commercial satellite imagery. Evaluation metrics in our pipeline include horizontal and vertical accuracy and completeness, volumetric completeness and correctness, perceptual quality, and model simplicity. Sources of ground truth include airborne lidar and overhead imagery, and we demonstrate a semi-automated process for producing accurate ground truth shape files to characterize building footprints. We validate our current metric evaluation pipeline using 3D models produced using open source multi-view stereo methods. Data and software is made publicly available to enable further research and planned benchmarking activities.
Evaluative Usage-Based Metrics for the Selection of E-Journals.

ERIC Educational Resources Information Center

Hahn, Karla L.; Faulkner, Lila A.

2002-01-01

Explores electronic journal usage statistics and develops three metrics and three benchmarks based on those metrics. Topics include earlier work that assessed the value of print journals and was modified for the electronic format; the evaluation of potential purchases; and implications for standards development, including the need for content…
HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Adams, Mark; Brown, Jed; Shalf, John

2014-05-05

This document provides an overview of the benchmark ? HPGMG ? for ranking large scale general purpose computers for use on the Top500 list [8]. We provide a rationale for the need for a replacement for the current metric HPL, some background of the Top500 list and the challenges of developing such a metric; we discuss our design philosophy and methodology, and an overview of the specification of the benchmark. The primary documentation with maintained details on the specification can be found at hpgmg.org and the Wiki and benchmark code itself can be found in the repository https://bitbucket.org/hpgmg/hpgmg.
Benchmarking Diagnostic Algorithms on an Electrical Power System Testbed

NASA Technical Reports Server (NTRS)

Kurtoglu, Tolga; Narasimhan, Sriram; Poll, Scott; Garcia, David; Wright, Stephanie

2009-01-01

Diagnostic algorithms (DAs) are key to enabling automated health management. These algorithms are designed to detect and isolate anomalies of either a component or the whole system based on observations received from sensors. In recent years a wide range of algorithms, both model-based and data-driven, have been developed to increase autonomy and improve system reliability and affordability. However, the lack of support to perform systematic benchmarking of these algorithms continues to create barriers for effective development and deployment of diagnostic technologies. In this paper, we present our efforts to benchmark a set of DAs on a common platform using a framework that was developed to evaluate and compare various performance metrics for diagnostic technologies. The diagnosed system is an electrical power system, namely the Advanced Diagnostics and Prognostics Testbed (ADAPT) developed and located at the NASA Ames Research Center. The paper presents the fundamentals of the benchmarking framework, the ADAPT system, description of faults and data sets, the metrics used for evaluation, and an in-depth analysis of benchmarking results obtained from testing ten diagnostic algorithms on the ADAPT electrical power system testbed.
Developing integrated benchmarks for DOE performance measurement

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barancik, J.I.; Kramer, C.F.; Thode, Jr. H.C.

1992-09-30

The objectives of this task were to describe and evaluate selected existing sources of information on occupational safety and health with emphasis on hazard and exposure assessment, abatement, training, reporting, and control identifying for exposure and outcome in preparation for developing DOE performance benchmarks. Existing resources and methodologies were assessed for their potential use as practical performance benchmarks. Strengths and limitations of current data resources were identified. Guidelines were outlined for developing new or improved performance factors, which then could become the basis for selecting performance benchmarks. Data bases for non-DOE comparison populations were identified so that DOE performance couldmore » be assessed relative to non-DOE occupational and industrial groups. Systems approaches were described which can be used to link hazards and exposure, event occurrence, and adverse outcome factors, as needed to generate valid, reliable, and predictive performance benchmarks. Data bases were identified which contain information relevant to one or more performance assessment categories . A list of 72 potential performance benchmarks was prepared to illustrate the kinds of information that can be produced through a benchmark development program. Current information resources which may be used to develop potential performance benchmarks are limited. There is need to develop an occupational safety and health information and data system in DOE, which is capable of incorporating demonstrated and documented performance benchmarks prior to, or concurrent with the development of hardware and software. A key to the success of this systems approach is rigorous development and demonstration of performance benchmark equivalents to users of such data before system hardware and software commitments are institutionalized.« less

Benchmarking the Wilmer general eye services clinics: baseline metrics for surgical and outpatient clinic volume in an educational environment.

PubMed

Singman, Eric; Srikumaran, Divya; Hackett, Kathy; Kaplan, Brian; Jun, Albert; Preece, Derek; Ramulu, Pradeep

2016-01-27

The Wilmer General Eye Services (GES) at the Johns Hopkins Hospital is the clinic where residents provide supervised comprehensive medical and surgical care to ophthalmology patients. The clinic schedule and supervision structure allows for a progressive increase in trainee responsibility, with graduated autonomy and longitudinal continuity of care over the three years of ophthalmology residency training. This study sought to determine the number of cases the GES contributes to the resident surgical experiences. In addition, it was intended to create benchmarks for patient volumes, cataract surgery yield and room utilization as part of an educational initiative to introduce residents to metrics important for practice management. The electronic surgical posting system database was explored to determine the numbers of cases scheduled for patients seen by residents in the GES. In addition, aggregated residents' self-reported Accreditation Council for Graduate Medical Education (ACGME) surgical logs were collected for comparison. Finally transactional databases were queried to determine clinic volumes of new and established patients. The proportion of resident surgeries (1(st) surgeon and assistant) provided by GES patients, cataract surgery yield and new patient rates were calculated. Data was collected from July 1(st), 2014 until March 31(st), 2015 for all 16 residents (6 third year, 5 second year and 5 first year). The percentage of cataract, oculoplastics, cornea and glaucoma surgeries in which a resident was 1(st) surgeon and the patient came from the GES was 91.3, 76.1, 65.6, and 93.9 respectively. The new patient rate was 28.1% and room utilization was 50.4%. Cataract surgery yield was 29.2 DISCUSSION: The GES provides a significant proportion of primary surgeon opportunities for the residents, and in some instances, the majority of cases. Compared to benchmarks available for private practices, the new patient rate is high while the cataract surgery yield is low
Restaurant Energy Use Benchmarking Guideline

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hedrick, R.; Smith, V.; Field, K.

2011-07-01

A significant operational challenge for food service operators is defining energy use benchmark metrics to compare against the performance of individual stores. Without metrics, multiunit operators and managers have difficulty identifying which stores in their portfolios require extra attention to bring their energy performance in line with expectations. This report presents a method whereby multiunit operators may use their own utility data to create suitable metrics for evaluating their operations.
International land Model Benchmarking (ILAMB) Package v002.00

DOE Data Explorer

Collier, Nathaniel [Oak Ridge National Laboratory; Hoffman, Forrest M. [Oak Ridge National Laboratory; Mu, Mingquan [University of California, Irvine; Randerson, James T. [University of California, Irvine; Riley, William J. [Lawrence Berkeley National Laboratory

2016-05-09

As a contribution to International Land Model Benchmarking (ILAMB) Project, we are providing new analysis approaches, benchmarking tools, and science leadership. The goal of ILAMB is to assess and improve the performance of land models through international cooperation and to inform the design of new measurement campaigns and field studies to reduce uncertainties associated with key biogeochemical processes and feedbacks. ILAMB is expected to be a primary analysis tool for CMIP6 and future model-data intercomparison experiments. This team has developed initial prototype benchmarking systems for ILAMB, which will be improved and extended to include ocean model metrics and diagnostics.
International land Model Benchmarking (ILAMB) Package v001.00

DOE Data Explorer

Mu, Mingquan [University of California, Irvine; Randerson, James T. [University of California, Irvine; Riley, William J. [Lawrence Berkeley National Laboratory; Hoffman, Forrest M. [Oak Ridge National Laboratory

2016-05-02

As a contribution to International Land Model Benchmarking (ILAMB) Project, we are providing new analysis approaches, benchmarking tools, and science leadership. The goal of ILAMB is to assess and improve the performance of land models through international cooperation and to inform the design of new measurement campaigns and field studies to reduce uncertainties associated with key biogeochemical processes and feedbacks. ILAMB is expected to be a primary analysis tool for CMIP6 and future model-data intercomparison experiments. This team has developed initial prototype benchmarking systems for ILAMB, which will be improved and extended to include ocean model metrics and diagnostics.
Measuring Distribution Performance? Benchmarking Warrants Your Attention

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ericson, Sean J; Alvarez, Paul

Identifying, designing, and measuring performance metrics is critical to securing customer value, but can be a difficult task. This article examines the use of benchmarks based on publicly available performance data to set challenging, yet fair, metrics and targets.
EVA Human Health and Performance Benchmarking Study Overview and Development of a Microgravity Protocol

NASA Technical Reports Server (NTRS)

Norcross, Jason; Jarvis, Sarah; Bekdash, Omar; Cupples, Scott; Abercromby, Andrew

2017-01-01

The primary objective of this study is to develop a protocol to reliably characterize human health and performance metrics for individuals working inside various EVA suits under realistic spaceflight conditions. Expected results and methodologies developed during this study will provide the baseline benchmarking data and protocols with which future EVA suits and suit configurations (e.g., varied pressure, mass, center of gravity [CG]) and different test subject populations (e.g., deconditioned crewmembers) may be reliably assessed and compared. Results may also be used, in conjunction with subsequent testing, to inform fitness-for-duty standards, as well as design requirements and operations concepts for future EVA suits and other exploration systems.
Developing Benchmarks for Solar Radio Bursts

NASA Astrophysics Data System (ADS)

Biesecker, D. A.; White, S. M.; Gopalswamy, N.; Black, C.; Domm, P.; Love, J. J.; Pierson, J.

2016-12-01

Solar radio bursts can interfere with radar, communication, and tracking signals. In severe cases, radio bursts can inhibit the successful use of radio communications and disrupt a wide range of systems that are reliant on Position, Navigation, and Timing services on timescales ranging from minutes to hours across wide areas on the dayside of Earth. The White House's Space Weather Action Plan has asked for solar radio burst intensity benchmarks for an event occurrence frequency of 1 in 100 years and also a theoretical maximum intensity benchmark. The solar radio benchmark team was also asked to define the wavelength/frequency bands of interest. The benchmark team developed preliminary (phase 1) benchmarks for the VHF (30-300 MHz), UHF (300-3000 MHz), GPS (1176-1602 MHz), F10.7 (2800 MHz), and Microwave (4000-20000) bands. The preliminary benchmarks were derived based on previously published work. Limitations in the published work will be addressed in phase 2 of the benchmark process. In addition, deriving theoretical maxima requires additional work, where it is even possible to, in order to meet the Action Plan objectives. In this presentation, we will present the phase 1 benchmarks and the basis used to derive them. We will also present the work that needs to be done in order to complete the final, or phase 2 benchmarks.
Benchmarking in Academic Pharmacy Departments

PubMed Central

Chisholm-Burns, Marie; Nappi, Jean; Gubbins, Paul O.; Ross, Leigh Ann

2010-01-01

Benchmarking in academic pharmacy, and recommendations for the potential uses of benchmarking in academic pharmacy departments are discussed in this paper. Benchmarking is the process by which practices, procedures, and performance metrics are compared to an established standard or best practice. Many businesses and industries use benchmarking to compare processes and outcomes, and ultimately plan for improvement. Institutions of higher learning have embraced benchmarking practices to facilitate measuring the quality of their educational and research programs. Benchmarking is used internally as well to justify the allocation of institutional resources or to mediate among competing demands for additional program staff or space. Surveying all chairs of academic pharmacy departments to explore benchmarking issues such as department size and composition, as well as faculty teaching, scholarly, and service productivity, could provide valuable information. To date, attempts to gather this data have had limited success. We believe this information is potentially important, urge that efforts to gather it should be continued, and offer suggestions to achieve full participation. PMID:21179251
Benchmarking in academic pharmacy departments.

PubMed

Bosso, John A; Chisholm-Burns, Marie; Nappi, Jean; Gubbins, Paul O; Ross, Leigh Ann

2010-10-11

Benchmarking in academic pharmacy, and recommendations for the potential uses of benchmarking in academic pharmacy departments are discussed in this paper. Benchmarking is the process by which practices, procedures, and performance metrics are compared to an established standard or best practice. Many businesses and industries use benchmarking to compare processes and outcomes, and ultimately plan for improvement. Institutions of higher learning have embraced benchmarking practices to facilitate measuring the quality of their educational and research programs. Benchmarking is used internally as well to justify the allocation of institutional resources or to mediate among competing demands for additional program staff or space. Surveying all chairs of academic pharmacy departments to explore benchmarking issues such as department size and composition, as well as faculty teaching, scholarly, and service productivity, could provide valuable information. To date, attempts to gather this data have had limited success. We believe this information is potentially important, urge that efforts to gather it should be continued, and offer suggestions to achieve full participation.
How to Advance TPC Benchmarks with Dependability Aspects

NASA Astrophysics Data System (ADS)

Almeida, Raquel; Poess, Meikel; Nambiar, Raghunath; Patil, Indira; Vieira, Marco

Transactional systems are the core of the information systems of most organizations. Although there is general acknowledgement that failures in these systems often entail significant impact both on the proceeds and reputation of companies, the benchmarks developed and managed by the Transaction Processing Performance Council (TPC) still maintain their focus on reporting bare performance. Each TPC benchmark has to pass a list of dependability-related tests (to verify ACID properties), but not all benchmarks require measuring their performances. While TPC-E measures the recovery time of some system failures, TPC-H and TPC-C only require functional correctness of such recovery. Consequently, systems used in TPC benchmarks are tuned mostly for performance. In this paper we argue that nowadays systems should be tuned for a more comprehensive suite of dependability tests, and that a dependability metric should be part of TPC benchmark publications. The paper discusses WHY and HOW this can be achieved. Two approaches are introduced and discussed: augmenting each TPC benchmark in a customized way, by extending each specification individually; and pursuing a more unified approach, defining a generic specification that could be adjoined to any TPC benchmark.
Benchmarking Using Basic DBMS Operations

NASA Astrophysics Data System (ADS)

Crolotte, Alain; Ghazal, Ahmad

The TPC-H benchmark proved to be successful in the decision support area. Many commercial database vendors and their related hardware vendors used these benchmarks to show the superiority and competitive edge of their products. However, over time, the TPC-H became less representative of industry trends as vendors keep tuning their database to this benchmark-specific workload. In this paper, we present XMarq, a simple benchmark framework that can be used to compare various software/hardware combinations. Our benchmark model is currently composed of 25 queries that measure the performance of basic operations such as scans, aggregations, joins and index access. This benchmark model is based on the TPC-H data model due to its maturity and well-understood data generation capability. We also propose metrics to evaluate single-system performance and compare two systems. Finally we illustrate the effectiveness of this model by showing experimental results comparing two systems under different conditions.
Benchmarking the performance of fixed-image receptor digital radiography systems. Part 2: system performance metric.

PubMed

Lee, Kam L; Bernardo, Michael; Ireland, Timothy A

2016-06-01

This is part two of a two-part study in benchmarking system performance of fixed digital radiographic systems. The study compares the system performance of seven fixed digital radiography systems based on quantitative metrics like modulation transfer function (sMTF), normalised noise power spectrum (sNNPS), detective quantum efficiency (sDQE) and entrance surface air kerma (ESAK). It was found that the most efficient image receptors (greatest sDQE) were not necessarily operating at the lowest ESAK. In part one of this study, sMTF is shown to depend on system configuration while sNNPS is shown to be relatively consistent across systems. Systems are ranked on their signal-to-noise ratio efficiency (sDQE) and their ESAK. Systems using the same equipment configuration do not necessarily have the same system performance. This implies radiographic practice at the site will have an impact on the overall system performance. In general, systems are more dose efficient at low dose settings.
Competency based training in robotic surgery: benchmark scores for virtual reality robotic simulation.

PubMed

Raison, Nicholas; Ahmed, Kamran; Fossati, Nicola; Buffi, Nicolò; Mottrie, Alexandre; Dasgupta, Prokar; Van Der Poel, Henk

2017-05-01

To develop benchmark scores of competency for use within a competency based virtual reality (VR) robotic training curriculum. This longitudinal, observational study analysed results from nine European Association of Urology hands-on-training courses in VR simulation. In all, 223 participants ranging from novice to expert robotic surgeons completed 1565 exercises. Competency was set at 75% of the mean expert score. Benchmark scores for all general performance metrics generated by the simulator were calculated. Assessment exercises were selected by expert consensus and through learning-curve analysis. Three basic skill and two advanced skill exercises were identified. Benchmark scores based on expert performance offered viable targets for novice and intermediate trainees in robotic surgery. Novice participants met the competency standards for most basic skill exercises; however, advanced exercises were significantly more challenging. Intermediate participants performed better across the seven metrics but still did not achieve the benchmark standard in the more difficult exercises. Benchmark scores derived from expert performances offer relevant and challenging scores for trainees to achieve during VR simulation training. Objective feedback allows both participants and trainers to monitor educational progress and ensures that training remains effective. Furthermore, the well-defined goals set through benchmarking offer clear targets for trainees and enable training to move to a more efficient competency based curriculum. © 2016 The Authors BJU International © 2016 BJU International Published by John Wiley & Sons Ltd.
Aircraft Engine Gas Path Diagnostic Methods: Public Benchmarking Results

NASA Technical Reports Server (NTRS)

Simon, Donald L.; Borguet, Sebastien; Leonard, Olivier; Zhang, Xiaodong (Frank)

2013-01-01

Recent technology reviews have identified the need for objective assessments of aircraft engine health management (EHM) technologies. To help address this issue, a gas path diagnostic benchmark problem has been created and made publicly available. This software tool, referred to as the Propulsion Diagnostic Method Evaluation Strategy (ProDiMES), has been constructed based on feedback provided by the aircraft EHM community. It provides a standard benchmark problem enabling users to develop, evaluate and compare diagnostic methods. This paper will present an overview of ProDiMES along with a description of four gas path diagnostic methods developed and applied to the problem. These methods, which include analytical and empirical diagnostic techniques, will be described and associated blind-test-case metric results will be presented and compared. Lessons learned along with recommendations for improving the public benchmarking processes will also be presented and discussed.
Understanding Acceptance of Software Metrics--A Developer Perspective

ERIC Educational Resources Information Center

Umarji, Medha

2009-01-01

Software metrics are measures of software products and processes. Metrics are widely used by software organizations to help manage projects, improve product quality and increase efficiency of the software development process. However, metrics programs tend to have a high failure rate in organizations, and developer pushback is one of the sources…
Benchmarking short sequence mapping tools

PubMed Central

2013-01-01

Background The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the current proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. Results We applied our benchmarking tests on 9 well known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. Conclusion The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify his needs in order to choose the tool that provides the best results. PMID:23758764
EVA Health and Human Performance Benchmarking Study

NASA Technical Reports Server (NTRS)

Abercromby, A. F.; Norcross, J.; Jarvis, S. L.

2016-01-01

Multiple HRP Risks and Gaps require detailed characterization of human health and performance during exploration extravehicular activity (EVA) tasks; however, a rigorous and comprehensive methodology for characterizing and comparing the health and human performance implications of current and future EVA spacesuit designs does not exist. This study will identify and implement functional tasks and metrics, both objective and subjective, that are relevant to health and human performance, such as metabolic expenditure, suit fit, discomfort, suited postural stability, cognitive performance, and potentially biochemical responses for humans working inside different EVA suits doing functional tasks under the appropriate simulated reduced gravity environments. This study will provide health and human performance benchmark data for humans working in current EVA suits (EMU, Mark III, and Z2) as well as shirtsleeves using a standard set of tasks and metrics with quantified reliability. Results and methodologies developed during this test will provide benchmark data against which future EVA suits, and different suit configurations (eg, varied pressure, mass, CG) may be reliably compared in subsequent tests. Results will also inform fitness for duty standards as well as design requirements and operations concepts for future EVA suits and other exploration systems.
Developing a Security Metrics Scorecard for Healthcare Organizations.

PubMed

Elrefaey, Heba; Borycki, Elizabeth; Kushniruk, Andrea

2015-01-01

In healthcare, information security is a key aspect of protecting a patient's privacy and ensuring systems availability to support patient care. Security managers need to measure the performance of security systems and this can be achieved by using evidence-based metrics. In this paper, we describe the development of an evidence-based security metrics scorecard specific to healthcare organizations. Study participants were asked to comment on the usability and usefulness of a prototype of a security metrics scorecard that was developed based on current research in the area of general security metrics. Study findings revealed that scorecards need to be customized for the healthcare setting in order for the security information to be useful and usable in healthcare organizations. The study findings resulted in the development of a security metrics scorecard that matches the healthcare security experts' information requirements.
Toward benchmarking in catalysis science: Best practices, challenges, and opportunities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bligaard, Thomas; Bullock, R. Morris; Campbell, Charles T.

Benchmarking is a community-based and (preferably) community-driven activity involving consensus-based decisions on how to make reproducible, fair, and relevant assessments. In catalysis science, important catalyst performance metrics include activity, selectivity, and the deactivation profile, which enable comparisons between new and standard catalysts. Benchmarking also requires careful documentation, archiving, and sharing of methods and measurements, to ensure that the full value of research data can be realized. Beyond these goals, benchmarking presents unique opportunities to advance and accelerate understanding of complex reaction systems by combining and comparing experimental information from multiple, in situ and operando techniques with theoretical insights derived frommore » calculations characterizing model systems. This Perspective describes the origins and uses of benchmarking and its applications in computational catalysis, heterogeneous catalysis, molecular catalysis, and electrocatalysis. As a result, it also discusses opportunities and challenges for future developments in these fields.« less
Toward benchmarking in catalysis science: Best practices, challenges, and opportunities

DOE PAGES

Bligaard, Thomas; Bullock, R. Morris; Campbell, Charles T.; ...

2016-03-07

Benchmarking is a community-based and (preferably) community-driven activity involving consensus-based decisions on how to make reproducible, fair, and relevant assessments. In catalysis science, important catalyst performance metrics include activity, selectivity, and the deactivation profile, which enable comparisons between new and standard catalysts. Benchmarking also requires careful documentation, archiving, and sharing of methods and measurements, to ensure that the full value of research data can be realized. Beyond these goals, benchmarking presents unique opportunities to advance and accelerate understanding of complex reaction systems by combining and comparing experimental information from multiple, in situ and operando techniques with theoretical insights derived frommore » calculations characterizing model systems. This Perspective describes the origins and uses of benchmarking and its applications in computational catalysis, heterogeneous catalysis, molecular catalysis, and electrocatalysis. As a result, it also discusses opportunities and challenges for future developments in these fields.« less

Evaluation of metrics for benchmarking antimicrobial use in the UK dairy industry.

PubMed

Mills, Harriet L; Turner, Andrea; Morgans, Lisa; Massey, Jonathan; Schubert, Hannah; Rees, Gwen; Barrett, David; Dowsey, Andrew; Reyher, Kristen K

2018-03-31

The issue of antimicrobial resistance is of global concern across human and animal health. In 2016, the UK government committed to new targets for reducing antimicrobial use (AMU) in livestock. Although a number of metrics for quantifying AMU are defined in the literature, all give slightly different interpretations. This paper evaluates a selection of metrics for AMU in the dairy industry: total mg, total mg/kg, daily dose and daily course metrics. Although the focus is on their application to the dairy industry, the metrics and issues discussed are relevant across livestock sectors. In order to be used widely, a metric should be understandable and relevant to the veterinarians and farmers who are prescribing and using antimicrobials. This means that clear methods, assumptions (and possible biases), standardised values and exceptions should be published for all metrics. Particularly relevant are assumptions around the number and weight of cattle at risk of treatment and definitions of dose rates and course lengths; incorrect assumptions can mean metrics over-represent or under-represent AMU. The authors recommend that the UK dairy industry work towards the UK-specific metrics using the UK-specific medicine dose and course regimens as well as cattle weights in order to monitor trends nationally. © British Veterinary Association (unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Benchmarks: The Development of a New Approach to Student Evaluation.

ERIC Educational Resources Information Center

Larter, Sylvia

The Toronto Board of Education Benchmarks are libraries of reference materials that demonstrate student achievement at various levels. Each library contains video benchmarks, print benchmarks, a staff handbook, and summary and introductory documents. This book is about the development and the history of the benchmark program. It has taken over 3…
SP2Bench: A SPARQL Performance Benchmark

NASA Astrophysics Data System (ADS)

Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg

A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is settled in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror vital key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.
The Medical Library Association Benchmarking Network: development and implementation.

PubMed

Dudden, Rosalind Farnam; Corcoran, Kate; Kaplan, Janice; Magouirk, Jeff; Rand, Debra C; Smith, Bernie Todd

2006-04-01

This article explores the development and implementation of the Medical Library Association (MLA) Benchmarking Network from the initial idea and test survey, to the implementation of a national survey in 2002, to the establishment of a continuing program in 2004. Started as a program for hospital libraries, it has expanded to include other nonacademic health sciences libraries. The activities and timelines of MLA's Benchmarking Network task forces and editorial board from 1998 to 2004 are described. The Benchmarking Network task forces successfully developed an extensive questionnaire with parameters of size and measures of library activity and published a report of the data collected by September 2002. The data were available to all MLA members in the form of aggregate tables. Utilization of Web-based technologies proved feasible for data intake and interactive display. A companion article analyzes and presents some of the data. MLA has continued to develop the Benchmarking Network with the completion of a second survey in 2004. The Benchmarking Network has provided many small libraries with comparative data to present to their administrators. It is a challenge for the future to convince all MLA members to participate in this valuable program.
40 CFR 141.709 - Developing the disinfection profile and benchmark.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 40 Protection of Environment 23 2011-07-01 2011-07-01 false Developing the disinfection profile... Cryptosporidium Disinfection Profiling and Benchmarking Requirements § 141.709 Developing the disinfection profile and benchmark. (a) Systems required to develop disinfection profiles under § 141.708 must follow the...
40 CFR 141.709 - Developing the disinfection profile and benchmark.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 40 Protection of Environment 23 2014-07-01 2014-07-01 false Developing the disinfection profile... Cryptosporidium Disinfection Profiling and Benchmarking Requirements § 141.709 Developing the disinfection profile and benchmark. (a) Systems required to develop disinfection profiles under § 141.708 must follow the...
40 CFR 141.709 - Developing the disinfection profile and benchmark.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 40 Protection of Environment 24 2012-07-01 2012-07-01 false Developing the disinfection profile... Cryptosporidium Disinfection Profiling and Benchmarking Requirements § 141.709 Developing the disinfection profile and benchmark. (a) Systems required to develop disinfection profiles under § 141.708 must follow the...
40 CFR 141.709 - Developing the disinfection profile and benchmark.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 40 Protection of Environment 24 2013-07-01 2013-07-01 false Developing the disinfection profile... Cryptosporidium Disinfection Profiling and Benchmarking Requirements § 141.709 Developing the disinfection profile and benchmark. (a) Systems required to develop disinfection profiles under § 141.708 must follow the...
Benchmarking Gas Path Diagnostic Methods: A Public Approach

NASA Technical Reports Server (NTRS)

Simon, Donald L.; Bird, Jeff; Davison, Craig; Volponi, Al; Iverson, R. Eugene

2008-01-01

Recent technology reviews have identified the need for objective assessments of engine health management (EHM) technology. The need is two-fold: technology developers require relevant data and problems to design and validate new algorithms and techniques while engine system integrators and operators need practical tools to direct development and then evaluate the effectiveness of proposed solutions. This paper presents a publicly available gas path diagnostic benchmark problem that has been developed by the Propulsion and Power Systems Panel of The Technical Cooperation Program (TTCP) to help address these needs. The problem is coded in MATLAB (The MathWorks, Inc.) and coupled with a non-linear turbofan engine simulation to produce "snap-shot" measurements, with relevant noise levels, as if collected from a fleet of engines over their lifetime of use. Each engine within the fleet will experience unique operating and deterioration profiles, and may encounter randomly occurring relevant gas path faults including sensor, actuator and component faults. The challenge to the EHM community is to develop gas path diagnostic algorithms to reliably perform fault detection and isolation. An example solution to the benchmark problem is provided along with associated evaluation metrics. A plan is presented to disseminate this benchmark problem to the engine health management technical community and invite technology solutions.
Multiscale benchmarking of drug delivery vectors.

PubMed

Summers, Huw D; Ware, Matthew J; Majithia, Ravish; Meissner, Kenith E; Godin, Biana; Rees, Paul

2016-10-01

Cross-system comparisons of drug delivery vectors are essential to ensure optimal design. An in-vitro experimental protocol is presented that separates the role of the delivery vector from that of its cargo in determining the cell response, thus allowing quantitative comparison of different systems. The technique is validated through benchmarking of the dose-response of human fibroblast cells exposed to the cationic molecule, polyethylene imine (PEI); delivered as a free molecule and as a cargo on the surface of CdSe nanoparticles and Silica microparticles. The exposure metrics are converted to a delivered dose with the transport properties of the different scale systems characterized by a delivery time, τ. The benchmarking highlights an agglomeration of the free PEI molecules into micron sized clusters and identifies the metric determining cell death as the total number of PEI molecules presented to cells, determined by the delivery vector dose and the surface density of the cargo. Copyright © 2016 Elsevier Inc. All rights reserved.
Proficiency performance benchmarks for removal of simulated brain tumors using a virtual reality simulator NeuroTouch.

PubMed

AlZhrani, Gmaan; Alotaibi, Fahad; Azarnoush, Hamed; Winkler-Schwartz, Alexander; Sabbagh, Abdulrahman; Bajunaid, Khalid; Lajoie, Susanne P; Del Maestro, Rolando F

2015-01-01

Assessment of neurosurgical technical skills involved in the resection of cerebral tumors in operative environments is complex. Educators emphasize the need to develop and use objective and meaningful assessment tools that are reliable and valid for assessing trainees' progress in acquiring surgical skills. The purpose of this study was to develop proficiency performance benchmarks for a newly proposed set of objective measures (metrics) of neurosurgical technical skills performance during simulated brain tumor resection using a new virtual reality simulator (NeuroTouch). Each participant performed the resection of 18 simulated brain tumors of different complexity using the NeuroTouch platform. Surgical performance was computed using Tier 1 and Tier 2 metrics derived from NeuroTouch simulator data consisting of (1) safety metrics, including (a) volume of surrounding simulated normal brain tissue removed, (b) sum of forces utilized, and (c) maximum force applied during tumor resection; (2) quality of operation metric, which involved the percentage of tumor removed; and (3) efficiency metrics, including (a) instrument total tip path lengths and (b) frequency of pedal activation. All studies were conducted in the Neurosurgical Simulation Research Centre, Montreal Neurological Institute and Hospital, McGill University, Montreal, Canada. A total of 33 participants were recruited, including 17 experts (board-certified neurosurgeons) and 16 novices (7 senior and 9 junior neurosurgery residents). The results demonstrated that "expert" neurosurgeons resected less surrounding simulated normal brain tissue and less tumor tissue than residents. These data are consistent with the concept that "experts" focused more on safety of the surgical procedure compared with novices. By analyzing experts' neurosurgical technical skills performance on these different metrics, we were able to establish benchmarks for goal proficiency performance training of neurosurgery residents. This
The Medical Library Association Benchmarking Network: development and implementation*

PubMed Central

Dudden, Rosalind Farnam; Corcoran, Kate; Kaplan, Janice; Magouirk, Jeff; Rand, Debra C.; Smith, Bernie Todd

2006-01-01

Objective: This article explores the development and implementation of the Medical Library Association (MLA) Benchmarking Network from the initial idea and test survey, to the implementation of a national survey in 2002, to the establishment of a continuing program in 2004. Started as a program for hospital libraries, it has expanded to include other nonacademic health sciences libraries. Methods: The activities and timelines of MLA's Benchmarking Network task forces and editorial board from 1998 to 2004 are described. Results: The Benchmarking Network task forces successfully developed an extensive questionnaire with parameters of size and measures of library activity and published a report of the data collected by September 2002. The data were available to all MLA members in the form of aggregate tables. Utilization of Web-based technologies proved feasible for data intake and interactive display. A companion article analyzes and presents some of the data. MLA has continued to develop the Benchmarking Network with the completion of a second survey in 2004. Conclusions: The Benchmarking Network has provided many small libraries with comparative data to present to their administrators. It is a challenge for the future to convince all MLA members to participate in this valuable program. PMID:16636702
Thermal Performance Benchmarking: Annual Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreno, Gilbert

2016-04-08

The goal for this project is to thoroughly characterize the performance of state-of-the-art (SOA) automotive power electronics and electric motor thermal management systems. Information obtained from these studies will be used to: Evaluate advantages and disadvantages of different thermal management strategies; establish baseline metrics for the thermal management systems; identify methods of improvement to advance the SOA; increase the publicly available information related to automotive traction-drive thermal management systems; help guide future electric drive technologies (EDT) research and development (R&D) efforts. The performance results combined with component efficiency and heat generation information obtained by Oak Ridge National Laboratory (ORNL) maymore » then be used to determine the operating temperatures for the EDT components under drive-cycle conditions. In FY15, the 2012 Nissan LEAF power electronics and electric motor thermal management systems were benchmarked. Testing of the 2014 Honda Accord Hybrid power electronics thermal management system started in FY15; however, due to time constraints it was not possible to include results for this system in this report. The focus of this project is to benchmark the thermal aspects of the systems. ORNL's benchmarking of electric and hybrid electric vehicle technology reports provide detailed descriptions of the electrical and packaging aspects of these automotive systems.« less
Benchmarking monthly homogenization algorithms

NASA Astrophysics Data System (ADS)

Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratianni, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.

2011-08-01

The COST (European Cooperation in Science and Technology) Action ES0601: Advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies and because they represent two important types of statistics (additive and multiplicative). The algorithms were validated against a realistic benchmark dataset. The benchmark contains real inhomogeneous data as well as simulated data with inserted inhomogeneities. Random break-type inhomogeneities were added to the simulated datasets modeled as a Poisson process with normally distributed breakpoint sizes. To approximate real world conditions, breaks were introduced that occur simultaneously in multiple station series within a simulated network of station data. The simulated time series also contained outliers, missing data periods and local station trends. Further, a stochastic nonlinear global (network-wide) trend was added. Participants provided 25 separate homogenized contributions as part of the blind study as well as 22 additional solutions submitted after the details of the imposed inhomogeneities were revealed. These homogenized datasets were assessed by a number of performance metrics including (i) the centered root mean square error relative to the true homogeneous value at various averaging scales, (ii) the error in linear trend estimates and (iii) traditional contingency skill scores. The metrics were computed both using the individual station series as well as the network average regional series. The performance of the contributions depends significantly on the error metric considered. Contingency scores by themselves are not very informative. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data
Advanced Life Support Research and Technology Development Metric

NASA Technical Reports Server (NTRS)

Hanford, A. J.

2004-01-01

The Metric is one of several measures employed by the NASA to assess the Agency s progress as mandated by the United States Congress and the Office of Management and Budget. Because any measure must have a reference point, whether explicitly defined or implied, the Metric is a comparison between a selected ALS Project life support system and an equivalently detailed life support system using technology from the Environmental Control and Life Support System (ECLSS) for the International Space Station (ISS). This document provides the official calculation of the Advanced Life Support (ALS) Research and Technology Development Metric (the Metric) for Fiscal Year 2004. The values are primarily based on Systems Integration, Modeling, and Analysis (SIMA) Element approved software tools or reviewed and approved reference documents. For Fiscal Year 2004, the Advanced Life Support Research and Technology Development Metric value is 2.03 for an Orbiting Research Facility and 1.62 for an Independent Exploration Mission.
A Machine-to-Machine protocol benchmark for eHealth applications - Use case: Respiratory rehabilitation.

PubMed

Talaminos-Barroso, Alejandro; Estudillo-Valderrama, Miguel A; Roa, Laura M; Reina-Tosina, Javier; Ortega-Ruiz, Francisco

2016-06-01

M2M (Machine-to-Machine) communications represent one of the main pillars of the new paradigm of the Internet of Things (IoT), and is making possible new opportunities for the eHealth business. Nevertheless, the large number of M2M protocols currently available hinders the election of a suitable solution that satisfies the requirements that can demand eHealth applications. In the first place, to develop a tool that provides a benchmarking analysis in order to objectively select among the most relevant M2M protocols for eHealth solutions. In the second place, to validate the tool with a particular use case: the respiratory rehabilitation. A software tool, called Distributed Computing Framework (DFC), has been designed and developed to execute the benchmarking tests and facilitate the deployment in environments with a large number of machines, with independence of the protocol and performance metrics selected. DDS, MQTT, CoAP, JMS, AMQP and XMPP protocols were evaluated considering different specific performance metrics, including CPU usage, memory usage, bandwidth consumption, latency and jitter. The results obtained allowed to validate a case of use: respiratory rehabilitation of chronic obstructive pulmonary disease (COPD) patients in two scenarios with different types of requirement: Home-Based and Ambulatory. The results of the benchmark comparison can guide eHealth developers in the choice of M2M technologies. In this regard, the framework presented is a simple and powerful tool for the deployment of benchmark tests under specific environments and conditions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Parameterized centrality metric for network analysis

NASA Astrophysics Data System (ADS)

Ghosh, Rumi; Lerman, Kristina

2011-06-01

A variety of metrics have been proposed to measure the relative importance of nodes in a network. One of these, alpha-centrality [P. Bonacich, Am. J. Sociol.0002-960210.1086/228631 92, 1170 (1987)], measures the number of attenuated paths that exist between nodes. We introduce a normalized version of this metric and use it to study network structure, for example, to rank nodes and find community structure of the network. Specifically, we extend the modularity-maximization method for community detection to use this metric as the measure of node connectivity. Normalized alpha-centrality is a powerful tool for network analysis, since it contains a tunable parameter that sets the length scale of interactions. Studying how rankings and discovered communities change when this parameter is varied allows us to identify locally and globally important nodes and structures. We apply the proposed metric to several benchmark networks and show that it leads to better insights into network structure than alternative metrics.
Metrics for the Diurnal Cycle of Precipitation: Toward Routine Benchmarks for Climate Models

DOE PAGES

Covey, Curt; Gleckler, Peter J.; Doutriaux, Charles; ...

2016-06-08

In this paper, metrics are proposed—that is, a few summary statistics that condense large amounts of data from observations or model simulations—encapsulating the diurnal cycle of precipitation. Vector area averaging of Fourier amplitude and phase produces useful information in a reasonably small number of harmonic dial plots, a procedure familiar from atmospheric tide research. The metrics cover most of the globe but down-weight high-latitude wintertime ocean areas where baroclinic waves are most prominent. This enables intercomparison of a large number of climate models with observations and with each other. The diurnal cycle of precipitation has features not encountered in typicalmore » climate model intercomparisons, notably the absence of meaningful “average model” results that can be displayed in a single two-dimensional map. Displaying one map per model guides development of the metrics proposed here by making it clear that land and ocean areas must be averaged separately, but interpreting maps from all models becomes problematic as the size of a multimodel ensemble increases. Global diurnal metrics provide quick comparisons with observations and among models, using the most recent version of the Coupled Model Intercomparison Project (CMIP). This includes, for the first time in CMIP, spatial resolutions comparable to global satellite observations. Finally, consistent with earlier studies of resolution versus parameterization of the diurnal cycle, the longstanding tendency of models to produce rainfall too early in the day persists in the high-resolution simulations, as expected if the error is due to subgrid-scale physics.« less
Metrics for the Diurnal Cycle of Precipitation: Toward Routine Benchmarks for Climate Models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Covey, Curt; Gleckler, Peter J.; Doutriaux, Charles

In this paper, metrics are proposed—that is, a few summary statistics that condense large amounts of data from observations or model simulations—encapsulating the diurnal cycle of precipitation. Vector area averaging of Fourier amplitude and phase produces useful information in a reasonably small number of harmonic dial plots, a procedure familiar from atmospheric tide research. The metrics cover most of the globe but down-weight high-latitude wintertime ocean areas where baroclinic waves are most prominent. This enables intercomparison of a large number of climate models with observations and with each other. The diurnal cycle of precipitation has features not encountered in typicalmore » climate model intercomparisons, notably the absence of meaningful “average model” results that can be displayed in a single two-dimensional map. Displaying one map per model guides development of the metrics proposed here by making it clear that land and ocean areas must be averaged separately, but interpreting maps from all models becomes problematic as the size of a multimodel ensemble increases. Global diurnal metrics provide quick comparisons with observations and among models, using the most recent version of the Coupled Model Intercomparison Project (CMIP). This includes, for the first time in CMIP, spatial resolutions comparable to global satellite observations. Finally, consistent with earlier studies of resolution versus parameterization of the diurnal cycle, the longstanding tendency of models to produce rainfall too early in the day persists in the high-resolution simulations, as expected if the error is due to subgrid-scale physics.« less
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

DOE Office of Scientific and Technical Information (OSTI.GOV)

Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.

In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less

Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

DOE PAGES

Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; ...

2015-10-09

In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less
First benchmark of the Unstructured Grid Adaptation Working Group

NASA Technical Reports Server (NTRS)

Ibanez, Daniel; Barral, Nicolas; Krakos, Joshua; Loseille, Adrien; Michal, Todd; Park, Mike

2017-01-01

Unstructured grid adaptation is a technology that holds the potential to improve the automation and accuracy of computational fluid dynamics and other computational disciplines. Difficulty producing the highly anisotropic elements necessary for simulation on complex curved geometries that satisfies a resolution request has limited this technology's widespread adoption. The Unstructured Grid Adaptation Working Group is an open gathering of researchers working on adapting simplicial meshes to conform to a metric field. Current members span a wide range of institutions including academia, industry, and national laboratories. The purpose of this group is to create a common basis for understanding and improving mesh adaptation. We present our first major contribution: a common set of benchmark cases, including input meshes and analytic metric specifications, that are publicly available to be used for evaluating any mesh adaptation code. We also present the results of several existing codes on these benchmark cases, to illustrate their utility in identifying key challenges common to all codes and important differences between available codes. Future directions are defined to expand this benchmark to mature the technology necessary to impact practical simulation workflows.
NASA Software Engineering Benchmarking Study

NASA Technical Reports Server (NTRS)

Rarick, Heather L.; Godfrey, Sara H.; Kelly, John C.; Crumbley, Robert T.; Wifl, Joel M.

2013-01-01

To identify best practices for the improvement of software engineering on projects, NASA's Offices of Chief Engineer (OCE) and Safety and Mission Assurance (OSMA) formed a team led by Heather Rarick and Sally Godfrey to conduct this benchmarking study. The primary goals of the study are to identify best practices that: Improve the management and technical development of software intensive systems; Have a track record of successful deployment by aerospace industries, universities [including research and development (R&D) laboratories], and defense services, as well as NASA's own component Centers; and Identify candidate solutions for NASA's software issues. Beginning in the late fall of 2010, focus topics were chosen and interview questions were developed, based on the NASA top software challenges. Between February 2011 and November 2011, the Benchmark Team interviewed a total of 18 organizations, consisting of five NASA Centers, five industry organizations, four defense services organizations, and four university or university R and D laboratory organizations. A software assurance representative also participated in each of the interviews to focus on assurance and software safety best practices. Interviewees provided a wealth of information on each topic area that included: software policy, software acquisition, software assurance, testing, training, maintaining rigor in small projects, metrics, and use of the Capability Maturity Model Integration (CMMI) framework, as well as a number of special topics that came up in the discussions. NASA's software engineering practices compared favorably with the external organizations in most benchmark areas, but in every topic, there were ways in which NASA could improve its practices. Compared to defense services organizations and some of the industry organizations, one of NASA's notable weaknesses involved communication with contractors regarding its policies and requirements for acquired software. One of NASA's strengths
Using Publication Metrics to Highlight Academic Productivity and Research Impact

PubMed Central

Carpenter, Christopher R.; Cone, David C.; Sarli, Cathy C.

2016-01-01

This article provides a broad overview of widely available measures of academic productivity and impact using publication data and highlights uses of these metrics for various purposes. Metrics based on publication data include measures such as number of publications, number of citations, the journal impact factor score, and the h-index, as well as emerging metrics based on document-level metrics. Publication metrics can be used for a variety of purposes for tenure and promotion, grant applications and renewal reports, benchmarking, recruiting efforts, and administrative purposes for departmental or university performance reports. The authors also highlight practical applications of measuring and reporting academic productivity and impact to emphasize and promote individual investigators, grant applications, or department output. PMID:25308141
Developing a benchmark for emotional analysis of music

PubMed Central

Yang, Yi-Hsuan; Soleymani, Mohammad

2017-01-01

Music emotion recognition (MER) field rapidly expanded in the last decade. Many new methods and new audio features are developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the data representation diversity and scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, a MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons with 2Hz time resolution). Using DEAM, we organized the ‘Emotion in Music’ task at MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature-sets. We also describe the design of the benchmark, the evaluation procedures and the data cleaning and transformations that we suggest. The results from the benchmark suggest that the recurrent neural network based approaches combined with large feature-sets work best for dynamic MER. PMID:28282400
Developing a benchmark for emotional analysis of music.

PubMed

Aljanaki, Anna; Yang, Yi-Hsuan; Soleymani, Mohammad

2017-01-01

Music emotion recognition (MER) field rapidly expanded in the last decade. Many new methods and new audio features are developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the data representation diversity and scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, a MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons with 2Hz time resolution). Using DEAM, we organized the 'Emotion in Music' task at MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature-sets. We also describe the design of the benchmark, the evaluation procedures and the data cleaning and transformations that we suggest. The results from the benchmark suggest that the recurrent neural network based approaches combined with large feature-sets work best for dynamic MER.
40 CFR 141.540 - Who has to develop a disinfection benchmark?

Code of Federal Regulations, 2014 CFR

2014-07-01

... 40 Protection of Environment 23 2014-07-01 2014-07-01 false Who has to develop a disinfection... Disinfection-Systems Serving Fewer Than 10,000 People Disinfection Benchmark § 141.540 Who has to develop a disinfection benchmark? If you are a subpart H system required to develop a disinfection profile under §§ 141...
40 CFR 141.540 - Who has to develop a disinfection benchmark?

Code of Federal Regulations, 2013 CFR

2013-07-01

... 40 Protection of Environment 24 2013-07-01 2013-07-01 false Who has to develop a disinfection... Disinfection-Systems Serving Fewer Than 10,000 People Disinfection Benchmark § 141.540 Who has to develop a disinfection benchmark? If you are a subpart H system required to develop a disinfection profile under §§ 141...
40 CFR 141.540 - Who has to develop a disinfection benchmark?

Code of Federal Regulations, 2011 CFR

2011-07-01

... 40 Protection of Environment 23 2011-07-01 2011-07-01 false Who has to develop a disinfection... Disinfection-Systems Serving Fewer Than 10,000 People Disinfection Benchmark § 141.540 Who has to develop a disinfection benchmark? If you are a subpart H system required to develop a disinfection profile under §§ 141...
The National Practice Benchmark for Oncology: 2015 Report for 2014 Data

PubMed Central

Balch, Carla; Ogle, John D.

2016-01-01

The National Practice Benchmark (NPB) is a unique tool used to measure oncology practices against others across the country in a meaningful way despite variations in practice demographics, size, and setting. In today’s challenging economic environment, each practice positions service offerings and competitive advantages to attract patients. Although the data in the NPB report are primarily reported by community oncology practices, the business structure and arrangements with regional health care systems are also reflected in the benchmark report. The ability to produce detailed metrics is an accomplishment of excellence in business and clinical management. With these metrics, a practice should be able to measure and analyze its current business practices and make appropriate changes, if necessary. In this report, we build on the foundation initially established by Oncology Metrics (acquired by Flatiron Health in 2014) over years of data collection and refine definitions to deliver the NPB, which is uniquely meaningful in the oncology market. PMID:27006357
Benchmarking homogenization algorithms for monthly data

NASA Astrophysics Data System (ADS)

Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M. J.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratiannil, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.; Willett, K.

2013-09-01

The COST (European Cooperation in Science and Technology) Action ES0601: Advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies. The algorithms were validated against a realistic benchmark dataset. Participants provided 25 separate homogenized contributions as part of the blind study as well as 22 additional solutions submitted after the details of the imposed inhomogeneities were revealed. These homogenized datasets were assessed by a number of performance metrics including i) the centered root mean square error relative to the true homogeneous values at various averaging scales, ii) the error in linear trend estimates and iii) traditional contingency skill scores. The metrics were computed both using the individual station series as well as the network average regional series. The performance of the contributions depends significantly on the error metric considered. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data. Moreover, state-of-the-art relative homogenization algorithms developed to work with an inhomogeneous reference are shown to perform best. The study showed that currently automatic algorithms can perform as well as manual ones.
The Applicability of Proposed Object-Oriented Metrics to Developer Feedback in Time to Impact Development

NASA Technical Reports Server (NTRS)

Neal, Ralph D.

1996-01-01

This paper looks closely at each of the software metrics generated by the McCabe object-Oriented Tool(TM) and its ability to convey timely information to developers. The metrics are examined for meaningfulness in terms of the scale assignable to the metric by the rules of measurement theory and the software dimension being measured. Recommendations are made as to the proper use of each metric and its ability to influence development at an early stage. The metrics of the McCabe Object-Oriented Tool(TM) set were selected because of the tool's use in a couple of NASA IV&V projects.
Incorporating big data into treatment plan evaluation: Development of statistical DVH metrics and visualization dashboards.

PubMed

Mayo, Charles S; Yao, John; Eisbruch, Avraham; Balter, James M; Litzenberg, Dale W; Matuszak, Martha M; Kessler, Marc L; Weyburn, Grant; Anderson, Carlos J; Owen, Dawn; Jackson, William C; Haken, Randall Ten

2017-01-01

To develop statistical dose-volume histogram (DVH)-based metrics and a visualization method to quantify the comparison of treatment plans with historical experience and among different institutions. The descriptive statistical summary (ie, median, first and third quartiles, and 95% confidence intervals) of volume-normalized DVH curve sets of past experiences was visualized through the creation of statistical DVH plots. Detailed distribution parameters were calculated and stored in JavaScript Object Notation files to facilitate management, including transfer and potential multi-institutional comparisons. In the treatment plan evaluation, structure DVH curves were scored against computed statistical DVHs and weighted experience scores (WESs). Individual, clinically used, DVH-based metrics were integrated into a generalized evaluation metric (GEM) as a priority-weighted sum of normalized incomplete gamma functions. Historical treatment plans for 351 patients with head and neck cancer, 104 with prostate cancer who were treated with conventional fractionation, and 94 with liver cancer who were treated with stereotactic body radiation therapy were analyzed to demonstrate the usage of statistical DVH, WES, and GEM in a plan evaluation. A shareable dashboard plugin was created to display statistical DVHs and integrate GEM and WES scores into a clinical plan evaluation within the treatment planning system. Benchmarking with normal tissue complication probability scores was carried out to compare the behavior of GEM and WES scores. DVH curves from historical treatment plans were characterized and presented, with difficult-to-spare structures (ie, frequently compromised organs at risk) identified. Quantitative evaluations by GEM and/or WES compared favorably with the normal tissue complication probability Lyman-Kutcher-Burman model, transforming a set of discrete threshold-priority limits into a continuous model reflecting physician objectives and historical experience
Simulation-based comprehensive benchmarking of RNA-seq aligners

PubMed Central

Baruzzo, Giacomo; Hayer, Katharina E; Kim, Eun Ji; Di Camillo, Barbara; FitzGerald, Garret A; Grant, Gregory R

2018-01-01

Alignment is the first step in most RNA-seq analysis pipelines, and the accuracy of downstream analyses depends heavily on it. Unlike most steps in the pipeline, alignment is particularly amenable to benchmarking with simulated data. We performed a comprehensive benchmarking of 14 common splice-aware aligners for base, read, and exon junction-level accuracy and compared default with optimized parameters. We found that performance varied by genome complexity, and accuracy and popularity were poorly correlated. The most widely cited tool underperforms for most metrics, particularly when using default settings. PMID:27941783
Benchmarks and Quality Assurance for Online Course Development in Higher Education

ERIC Educational Resources Information Center

Wang, Hong

2008-01-01

As online education has entered the main stream of the U.S. higher education, quality assurance in online course development has become a critical topic in distance education. This short article summarizes the major benchmarks related to online course development, listing and comparing the benchmarks of the National Education Association (NEA),…
Development of a Quantitative Decision Metric for Selecting the Most Suitable Discretization Method for SN Transport Problems

NASA Astrophysics Data System (ADS)

Schunert, Sebastian

In this work we develop a quantitative decision metric for spatial discretization methods of the SN equations. The quantitative decision metric utilizes performance data from selected test problems for computing a fitness score that is used for the selection of the most suitable discretization method for a particular SN transport application. The fitness score is aggregated as a weighted geometric mean of single performance indicators representing various performance aspects relevant to the user. Thus, the fitness function can be adjusted to the particular needs of the code practitioner by adding/removing single performance indicators or changing their importance via the supplied weights. Within this work a special, broad class of methods is considered, referred to as nodal methods. This class is naturally comprised of the DGFEM methods of all function space families. Within this work it is also shown that the Higher Order Diamond Difference (HODD) method is a nodal method. Building on earlier findings that the Arbitrarily High Order Method of the Nodal type (AHOTN) is also a nodal method, a generalized finite-element framework is created to yield as special cases various methods that were developed independently using profoundly different formalisms. A selection of test problems related to a certain performance aspect are considered: an Method of Manufactured Solutions (MMS) test suite for assessing accuracy and execution time, Lathrop's test problem for assessing resilience against occurrence of negative fluxes, and a simple, homogeneous cube test problem to verify if a method possesses the thick diffusive limit. The contending methods are implemented as efficiently as possible under a common SN transport code framework to level the playing field for a fair comparison of their computational load. Numerical results are presented for all three test problems and a qualitative rating of each method's performance is provided for each aspect: accuracy
Development of Quality Metrics in Ambulatory Pediatric Cardiology.

PubMed

Chowdhury, Devyani; Gurvitz, Michelle; Marelli, Ariane; Anderson, Jeffrey; Baker-Smith, Carissa; Diab, Karim A; Edwards, Thomas C; Hougen, Tom; Jedeikin, Roy; Johnson, Jonathan N; Karpawich, Peter; Lai, Wyman; Lu, Jimmy C; Mitchell, Stephanie; Newburger, Jane W; Penny, Daniel J; Portman, Michael A; Satou, Gary; Teitel, David; Villafane, Juan; Williams, Roberta; Jenkins, Kathy

2017-02-07

The American College of Cardiology Adult Congenital and Pediatric Cardiology (ACPC) Section had attempted to create quality metrics (QM) for ambulatory pediatric practice, but limited evidence made the process difficult. The ACPC sought to develop QMs for ambulatory pediatric cardiology practice. Five areas of interest were identified, and QMs were developed in a 2-step review process. In the first step, an expert panel, using the modified RAND-UCLA methodology, rated each QM for feasibility and validity. The second step sought input from ACPC Section members; final approval was by a vote of the ACPC Council. Work groups proposed a total of 44 QMs. Thirty-one metrics passed the RAND process and, after the open comment period, the ACPC council approved 18 metrics. The project resulted in successful development of QMs in ambulatory pediatric cardiology for a range of ambulatory domains. Copyright © 2017 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
Quality Metrics in Neonatal and Pediatric Critical Care Transport: A National Delphi Project.

PubMed

Schwartz, Hamilton P; Bigham, Michael T; Schoettker, Pamela J; Meyer, Keith; Trautman, Michael S; Insoft, Robert M

2015-10-01

The transport of neonatal and pediatric patients to tertiary care facilities for specialized care demands monitoring the quality of care delivered during transport and its impact on patient outcomes. In 2011, pediatric transport teams in Ohio met to identify quality indicators permitting comparisons among programs. However, no set of national consensus quality metrics exists for benchmarking transport teams. The aim of this project was to achieve national consensus on appropriate neonatal and pediatric transport quality metrics. Modified Delphi technique. The first round of consensus determination was via electronic mail survey, followed by rounds of consensus determination in-person at the American Academy of Pediatrics Section on Transport Medicine's 2012 Quality Metrics Summit. All attendees of the American Academy of Pediatrics Section on Transport Medicine Quality Metrics Summit, conducted on October 21-23, 2012, in New Orleans, LA, were eligible to participate. Candidate quality metrics were identified through literature review and those metrics currently tracked by participating programs. Participants were asked in a series of rounds to identify "very important" quality metrics for transport. It was determined a priori that consensus on a metric's importance was achieved when at least 70% of respondents were in agreement. This is consistent with other Delphi studies. Eighty-two candidate metrics were considered initially. Ultimately, 12 metrics achieved consensus as "very important" to transport. These include metrics related to airway management, team mobilization time, patient and crew injuries, and adverse patient care events. Definitions were assigned to the 12 metrics to facilitate uniform data tracking among programs. The authors succeeded in achieving consensus among a diverse group of national transport experts on 12 core neonatal and pediatric transport quality metrics. We propose that transport teams across the country use these metrics to
Pollutant Emissions and Energy Efficiency under Controlled Conditions for Household Biomass Cookstoves and Implications for Metrics Useful in Setting International Test Standards

EPA Science Inventory

Realistic metrics and methods for testing household biomass cookstoves are required to develop standards needed by international policy makers, donors, and investors. Application of consistent test practices allows emissions and energy efficiency performance to be benchmarked and...
Benchmarking homogenization algorithms for monthly data

NASA Astrophysics Data System (ADS)

Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M. J.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratianni, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.

2012-01-01

The COST (European Cooperation in Science and Technology) Action ES0601: advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies and because they represent two important types of statistics (additive and multiplicative). The algorithms were validated against a realistic benchmark dataset. The benchmark contains real inhomogeneous data as well as simulated data with inserted inhomogeneities. Random independent break-type inhomogeneities with normally distributed breakpoint sizes were added to the simulated datasets. To approximate real world conditions, breaks were introduced that occur simultaneously in multiple station series within a simulated network of station data. The simulated time series also contained outliers, missing data periods and local station trends. Further, a stochastic nonlinear global (network-wide) trend was added. Participants provided 25 separate homogenized contributions as part of the blind study. After the deadline at which details of the imposed inhomogeneities were revealed, 22 additional solutions were submitted. These homogenized datasets were assessed by a number of performance metrics including (i) the centered root mean square error relative to the true homogeneous value at various averaging scales, (ii) the error in linear trend estimates and (iii) traditional contingency skill scores. The metrics were computed both using the individual station series as well as the network average regional series. The performance of the contributions depends significantly on the error metric considered. Contingency scores by themselves are not very informative. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data

Development of quality metrics for ambulatory pediatric cardiology: Infection prevention.

PubMed

Johnson, Jonathan N; Barrett, Cindy S; Franklin, Wayne H; Graham, Eric M; Halnon, Nancy J; Hattendorf, Brandy A; Krawczeski, Catherine D; McGovern, James J; O'Connor, Matthew J; Schultz, Amy H; Vinocur, Jeffrey M; Chowdhury, Devyani; Anderson, Jeffrey B

2017-12-01

In 2012, the American College of Cardiology's (ACC) Adult Congenital and Pediatric Cardiology Council established a program to develop quality metrics to guide ambulatory practices for pediatric cardiology. The council chose five areas on which to focus their efforts; chest pain, Kawasaki Disease, tetralogy of Fallot, transposition of the great arteries after arterial switch, and infection prevention. Here, we sought to describe the process, evaluation, and results of the Infection Prevention Committee's metric design process. The infection prevention metrics team consisted of 12 members from 11 institutions in North America. The group agreed to work on specific infection prevention topics including antibiotic prophylaxis for endocarditis, rheumatic fever, and asplenia/hyposplenism; influenza vaccination and respiratory syncytial virus prophylaxis (palivizumab); preoperative methods to reduce intraoperative infections; vaccinations after cardiopulmonary bypass; hand hygiene; and testing to identify splenic function in patients with heterotaxy. An extensive literature review was performed. When available, previously published guidelines were used fully in determining metrics. The committee chose eight metrics to submit to the ACC Quality Metric Expert Panel for review. Ultimately, metrics regarding hand hygiene and influenza vaccination recommendation for patients did not pass the RAND analysis. Both endocarditis prophylaxis metrics and the RSV/palivizumab metric passed the RAND analysis but fell out during the open comment period. Three metrics passed all analyses, including those for antibiotic prophylaxis in patients with heterotaxy/asplenia, for influenza vaccination compliance in healthcare personnel, and for adherence to recommended regimens of secondary prevention of rheumatic fever. The lack of convincing data to guide quality improvement initiatives in pediatric cardiology is widespread, particularly in infection prevention. Despite this, three metrics were
OWL2 benchmarking for the evaluation of knowledge based systems.

PubMed

Khan, Sher Afgun; Qadir, Muhammad Abdul; Abbas, Muhammad Azeem; Afzal, Muhammad Tanvir

2017-01-01

OWL2 semantics are becoming increasingly popular for the real domain applications like Gene engineering and health MIS. The present work identifies the research gap that negligible attention has been paid to the performance evaluation of Knowledge Base Systems (KBS) using OWL2 semantics. To fulfil this identified research gap, an OWL2 benchmark for the evaluation of KBS is proposed. The proposed benchmark addresses the foundational blocks of an ontology benchmark i.e. data schema, workload and performance metrics. The proposed benchmark is tested on memory based, file based, relational database and graph based KBS for performance and scalability measures. The results show that the proposed benchmark is able to evaluate the behaviour of different state of the art KBS on OWL2 semantics. On the basis of the results, the end users (i.e. domain expert) would be able to select a suitable KBS appropriate for his domain.
Development and application of freshwater sediment-toxicity benchmarks for currently used pesticides

USGS Publications Warehouse

Nowell, Lisa H.; Norman, Julia E.; Ingersoll, Christopher G.; Moran, Patrick W.

2016-01-01

Sediment-toxicity benchmarks are needed to interpret the biological significance of currently used pesticides detected in whole sediments. Two types of freshwater sediment benchmarks for pesticides were developed using spiked-sediment bioassay (SSB) data from the literature. These benchmarks can be used to interpret sediment-toxicity data or to assess the potential toxicity of pesticides in whole sediment. The Likely Effect Benchmark (LEB) defines a pesticide concentration in whole sediment above which there is a high probability of adverse effects on benthic invertebrates, and the Threshold Effect Benchmark (TEB) defines a concentration below which adverse effects are unlikely. For compounds without available SSBs, benchmarks were estimated using equilibrium partitioning (EqP). When a sediment sample contains a pesticide mixture, benchmark quotients can be summed for all detected pesticides to produce an indicator of potential toxicity for that mixture. Benchmarks were developed for 48 pesticide compounds using SSB data and 81 compounds using the EqP approach. In an example application, data for pesticides measured in sediment from 197 streams across the United States were evaluated using these benchmarks, and compared to measured toxicity from whole-sediment toxicity tests conducted with the amphipod Hyalella azteca (28-d exposures) and the midge Chironomus dilutus (10-d exposures). Amphipod survival, weight, and biomass were significantly and inversely related to summed benchmark quotients, whereas midge survival, weight, and biomass showed no relationship to benchmarks. Samples with LEB exceedances were rare (n = 3), but all were toxic to amphipods (i.e., significantly different from control). Significant toxicity to amphipods was observed for 72% of samples exceeding one or more TEBs, compared to 18% of samples below all TEBs. Factors affecting toxicity below TEBs may include the presence of contaminants other than pesticides, physical
Benchmarking in pathology: development of an activity-based costing model.

PubMed

Burnett, Leslie; Wilson, Roger; Pfeffer, Sally; Lowry, John

2012-12-01

Benchmarking in Pathology (BiP) allows pathology laboratories to determine the unit cost of all laboratory tests and procedures, and also provides organisational productivity indices allowing comparisons of performance with other BiP participants. We describe 14 years of progressive enhancement to a BiP program, including the implementation of 'avoidable costs' as the accounting basis for allocation of costs rather than previous approaches using 'total costs'. A hierarchical tree-structured activity-based costing model distributes 'avoidable costs' attributable to the pathology activities component of a pathology laboratory operation. The hierarchical tree model permits costs to be allocated across multiple laboratory sites and organisational structures. This has enabled benchmarking on a number of levels, including test profiles and non-testing related workload activities. The development of methods for dealing with variable cost inputs, allocation of indirect costs using imputation techniques, panels of tests, and blood-bank record keeping, have been successfully integrated into the costing model. A variety of laboratory management reports are produced, including the 'cost per test' of each pathology 'test' output. Benchmarking comparisons may be undertaken at any and all of the 'cost per test' and 'cost per Benchmarking Complexity Unit' level, 'discipline/department' (sub-specialty) level, or overall laboratory/site and organisational levels. We have completed development of a national BiP program. An activity-based costing methodology based on avoidable costs overcomes many problems of previous benchmarking studies based on total costs. The use of benchmarking complexity adjustment permits correction for varying test-mix and diagnostic complexity between laboratories. Use of iterative communication strategies with program participants can overcome many obstacles and lead to innovations.
Development of Management Metrics for Research and Technology

NASA Technical Reports Server (NTRS)

Sheskin, Theodore J.

2003-01-01

Professor Ted Sheskin from CSU will be tasked to research and investigate metrics that can be used to determine the technical progress for advanced development and research tasks. These metrics will be implemented in a software environment that hosts engineering design, analysis and management tools to be used to support power system and component research work at GRC. Professor Sheskin is an Industrial Engineer and has been involved in issues related to management of engineering tasks and will use his knowledge from this area to allow extrapolation into the research and technology management area. Over the course of the summer, Professor Sheskin will develop a bibliography of management papers covering current management methods that may be applicable to research management. At the completion of the summer work we expect to have him recommend a metric system to be reviewed prior to implementation in the software environment. This task has been discussed with Professor Sheskin and some review material has already been given to him.
Compressed Sensing for Metrics Development

NASA Astrophysics Data System (ADS)

McGraw, R. L.; Giangrande, S. E.; Liu, Y.

2012-12-01

Models by their very nature tend to be sparse in the sense that they are designed, with a few optimally selected key parameters, to provide simple yet faithful representations of a complex observational dataset or computer simulation output. This paper seeks to apply methods from compressed sensing (CS), a new area of applied mathematics currently undergoing a very rapid development (see for example Candes et al., 2006), to FASTER needs for new approaches to model evaluation and metrics development. The CS approach will be illustrated for a time series generated using a few-parameter (i.e. sparse) model. A seemingly incomplete set of measurements, taken at a just few random sampling times, is then used to recover the hidden model parameters. Remarkably there is a sharp transition in the number of required measurements, beyond which both the model parameters and time series are recovered exactly. Applications to data compression, data sampling/collection strategies, and to the development of metrics for model evaluation by comparison with observation (e.g. evaluation of model predictions of cloud fraction using cloud radar observations) are presented and discussed in context of the CS approach. Cited reference: Candes, E. J., Romberg, J., and Tao, T. (2006), Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, 52, 489-509.
Develop metrics of tire debris on Texas highways : [project summary].

DOT National Transportation Integrated Search

2017-05-01

This research developed metrics on the amount and characteristics of tire debris generated on Texas highways. These metrics provide numerical, data-based rates for districts to anticipate the amounts and characteristics of tire debris and to plan rem...
Hospital readiness for health information exchange: development of metrics associated with successful collaboration for quality improvement.

PubMed

Korst, Lisa M; Aydin, Carolyn E; Signer, Jordana M K; Fink, Arlene

2011-08-01

The development of readiness metrics for organizational participation in health information exchange is critical for monitoring progress toward, and achievement of, successful inter-organizational collaboration. In preparation for the development of a tool to measure readiness for data-sharing, we tested whether organizational capacities known to be related to readiness were associated with successful participation in an American data-sharing collaborative for quality improvement. Cross-sectional design, using an on-line survey of hospitals in a large, mature data-sharing collaborative organized for benchmarking and improvement in nursing care quality. Factor analysis was used to identify salient constructs, and identified factors were analyzed with respect to "successful" participation. "Success" was defined as the incorporation of comparative performance data into the hospital dashboard. The most important factor in predicting success included survey items measuring the strength of organizational leadership in fostering a culture of quality improvement (QI Leadership): (1) presence of a supportive hospital executive; (2) the extent to which a hospital values data; (3) the presence of leaders' vision for how the collaborative advances the hospital's strategic goals; (4) hospital use of the collaborative data to track quality outcomes; and (5) staff recognition of a strong mandate for collaborative participation (α=0.84, correlation with Success 0.68 [P<0.0001]). The data emphasize the importance of hospital QI Leadership in collaboratives that aim to share data for QI or safety purposes. Such metrics should prove useful in the planning and development of this complex form of inter-organizational collaboration. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Hospital readiness for health information exchange: development of metrics associated with successful collaboration for quality improvement

PubMed Central

Korst, Lisa M.; Aydin, Carolyn E.; Signer, Jordana M. K.; Fink, Arlene

2011-01-01

Objective The development of readiness metrics for organizational participation in health information exchange is critical for monitoring progress toward, and achievement of, successful inter-organizational collaboration. In preparation for the development of a tool to measure readiness for data-sharing, we tested whether organizational capacities known to be related to readiness were associated with successful participation in an American data-sharing collaborative for quality improvement. Design Cross-sectional design, using an on-line survey of hospitals in a large, mature data-sharing collaborative organized for benchmarking and improvement in nursing care quality. Measurements Factor analysis was used to identify salient constructs, and identified factors were analyzed with respect to “successful” participation. “Success” was defined as the incorporation of comparative performance data into the hospital dashboard. Results The most important factor in predicting success included survey items measuring the strength of organizational leadership in fostering a culture of quality improvement (QI Leadership): 1) presence of a supportive hospital executive; 2) the extent to which a hospital values data; 3) the presence of leaders’ vision for how the collaborative advances the hospital’s strategic goals; 4) hospital use of the collaborative data to track quality outcomes; and 5) staff recognition of a strong mandate for collaborative participation (α = 0.84, correlation with Success 0.68 [P < 0.0001]). Conclusion The data emphasize the importance of hospital QI Leadership in collaboratives that aim to share data for QI or safety purposes. Such metrics should prove useful in the planning and development of this complex form of inter-organizational collaboration. PMID:21330191
[Clinical trial data management and quality metrics system].

PubMed

Chen, Zhao-hua; Huang, Qin; Deng, Ya-zhong; Zhang, Yue; Xu, Yu; Yu, Hao; Liu, Zong-fan

2015-11-01

Data quality management system is essential to ensure accurate, complete, consistent, and reliable data collection in clinical research. This paper is devoted to various choices of data quality metrics. They are categorized by study status, e.g. study start up, conduct, and close-out. In each category, metrics for different purposes are listed according to ALCOA+ principles such us completeness, accuracy, timeliness, traceability, etc. Some general quality metrics frequently used are also introduced. This paper contains detail information as much as possible to each metric by providing definition, purpose, evaluation, referenced benchmark, and recommended targets in favor of real practice. It is important that sponsors and data management service providers establish a robust integrated clinical trial data quality management system to ensure sustainable high quality of clinical trial deliverables. It will also support enterprise level of data evaluation and bench marking the quality of data across projects, sponsors, data management service providers by using objective metrics from the real clinical trials. We hope this will be a significant input to accelerate the improvement of clinical trial data quality in the industry.
Interlaboratory Study Characterizing a Yeast Performance Standard for Benchmarking LC-MS Platform Performance*

PubMed Central

Paulovich, Amanda G.; Billheimer, Dean; Ham, Amy-Joan L.; Vega-Montoto, Lorenzo; Rudnick, Paul A.; Tabb, David L.; Wang, Pei; Blackman, Ronald K.; Bunk, David M.; Cardasis, Helene L.; Clauser, Karl R.; Kinsinger, Christopher R.; Schilling, Birgit; Tegeler, Tony J.; Variyath, Asokan Mulayath; Wang, Mu; Whiteaker, Jeffrey R.; Zimmerman, Lisa J.; Fenyo, David; Carr, Steven A.; Fisher, Susan J.; Gibson, Bradford W.; Mesri, Mehdi; Neubert, Thomas A.; Regnier, Fred E.; Rodriguez, Henry; Spiegelman, Cliff; Stein, Stephen E.; Tempst, Paul; Liebler, Daniel C.

2010-01-01

Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix, thereby providing a metric to evaluate and minimize preanalytical and analytical variation in comparative proteomics experiments. PMID:19858499
Principles for Developing Benchmark Criteria for Staff Training in Responsible Gambling.

PubMed

Oehler, Stefan; Banzer, Raphaela; Gruenerbl, Agnes; Malischnig, Doris; Griffiths, Mark D; Haring, Christian

2017-03-01

One approach to minimizing the negative consequences of excessive gambling is staff training to reduce the rate of the development of new cases of harm or disorder within their customers. The primary goal of the present study was to assess suitable benchmark criteria for the training of gambling employees at casinos and lottery retailers. The study utilised the Delphi Method, a survey with one qualitative and two quantitative phases. A total of 21 invited international experts in the responsible gambling field participated in all three phases. A total of 75 performance indicators were outlined and assigned to six categories: (1) criteria of content, (2) modelling, (3) qualification of trainer, (4) framework conditions, (5) sustainability and (6) statistical indicators. Nine of the 75 indicators were rated as very important by 90 % or more of the experts. Unanimous support for importance was given to indicators such as (1) comprehensibility and (2) concrete action-guidance for handling with problem gamblers, Additionally, the study examined the implementation of benchmarking, when it should be conducted, and who should be responsible. Results indicated that benchmarking should be conducted every 1-2 years regularly and that one institution should be clearly defined and primarily responsible for benchmarking. The results of the present study provide the basis for developing a benchmarking for staff training in responsible gambling.
Benchmarking the neurology practice.

PubMed

Henderson, William S

2010-05-01

A medical practice, whether operated by a solo physician or by a group, is a business. For a neurology practice to be successful, it must meet performance measures that ensure its viability. The best method of doing this is to benchmark the practice, both against itself over time and against other practices. Crucial medical practice metrics that should be measured are financial performance, staffing efficiency, physician productivity, and patient access. Such measures assist a physician or practice in achieving the goals and objectives that each determines are important to providing quality health care to patients. Copyright 2010 Elsevier Inc. All rights reserved.
A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions

NASA Astrophysics Data System (ADS)

Gide, Milind S.; Karam, Lina J.

2016-08-01

With the increased focus on visual attention (VA) in the last decade, a large number of computational visual saliency methods have been developed over the past few years. These models are traditionally evaluated by using performance evaluation metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though a considerable number of such metrics have been proposed in the literature, there are notable problems in them. In this work, we discuss shortcomings in existing metrics through illustrative examples and propose a new metric that uses local weights based on fixation density which overcomes these flaws. To compare the performance of our proposed metric at assessing the quality of saliency prediction with other existing metrics, we construct a ground-truth subjective database in which saliency maps obtained from 17 different VA models are evaluated by 16 human observers on a 5-point categorical scale in terms of their visual resemblance with corresponding ground-truth fixation density maps obtained from eye-tracking data. The metrics are evaluated by correlating metric scores with the human subjective ratings. The correlation results show that the proposed evaluation metric outperforms all other popular existing metrics. Additionally, the constructed database and corresponding subjective ratings provide an insight into which of the existing metrics and future metrics are better at estimating the quality of saliency prediction and can be used as a benchmark.
Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

PubMed Central

Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.

2018-01-01

In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888
Establishing Benchmarks for Outcome Indicators: A Statistical Approach to Developing Performance Standards.

ERIC Educational Resources Information Center

Henry, Gary T.; And Others

1992-01-01

A statistical technique is presented for developing performance standards based on benchmark groups. The benchmark groups are selected using a multivariate technique that relies on a squared Euclidean distance method. For each observation unit (a school district in the example), a unique comparison group is selected. (SLD)
Advanced Life Support Research and Technology Development Metric: Fiscal Year 2003

NASA Technical Reports Server (NTRS)

Hanford, A. J.

2004-01-01

This document provides the official calculation of the Advanced Life Support (ALS) Research and Technology Development Metric (the Metric) for Fiscal Year 2003. As such, the values herein are primarily based on Systems Integration, Modeling, and Analysis (SIMA) Element approved software tools or reviewed and approved reference documents. The Metric is one of several measures employed by the National Aeronautics and Space Administration (NASA) to assess the Agency s progress as mandated by the United States Congress and the Office of Management and Budget. Because any measure must have a reference point, whether explicitly defined or implied, the Metric is a comparison between a selected ALS Project life support system and an equivalently detailed life support system using technology from the Environmental Control and Life Support System (ECLSS) for the International Space Station (ISS). More specifically, the Metric is the ratio defined by the equivalent system mass (ESM) of a life support system for a specific mission using the ISS ECLSS technologies divided by the ESM for an equivalent life support system using the best ALS technologies. As defined, the Metric should increase in value as the ALS technologies become lighter, less power intensive, and require less volume. For Fiscal Year 2003, the Advanced Life Support Research and Technology Development Metric value is 1.47 for an Orbiting Research Facility and 1.36 for an Independent Exploration Mission.
Developing image processing meta-algorithms with data mining of multiple metrics.

PubMed

Leung, Kelvin; Cunha, Alexandre; Toga, A W; Parker, D Stott

2014-01-01

People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation.
Developing Image Processing Meta-Algorithms with Data Mining of Multiple Metrics

PubMed Central

Cunha, Alexandre; Toga, A. W.; Parker, D. Stott

2014-01-01

People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show how it is possible to develop meta-algorithms that evaluate different image processing results with a number of different metrics and mine the results in an automated fashion so as to select the best results. We show that the mining of multiple metrics offers a variety of potential benefits for many image processing problems, including improved robustness and validation. PMID:24653748
Validating Cellular Automata Lava Flow Emplacement Algorithms with Standard Benchmarks

NASA Astrophysics Data System (ADS)

Richardson, J. A.; Connor, L.; Charbonnier, S. J.; Connor, C.; Gallant, E.

2015-12-01

A major existing need in assessing lava flow simulators is a common set of validation benchmark tests. We propose three levels of benchmarks which test model output against increasingly complex standards. First, imulated lava flows should be morphologically identical, given changes in parameter space that should be inconsequential, such as slope direction. Second, lava flows simulated in simple parameter spaces can be tested against analytical solutions or empirical relationships seen in Bingham fluids. For instance, a lava flow simulated on a flat surface should produce a circular outline. Third, lava flows simulated over real world topography can be compared to recent real world lava flows, such as those at Tolbachik, Russia, and Fogo, Cape Verde. Success or failure of emplacement algorithms in these validation benchmarks can be determined using a Bayesian approach, which directly tests the ability of an emplacement algorithm to correctly forecast lava inundation. Here we focus on two posterior metrics, P(A|B) and P(¬A|¬B), which describe the positive and negative predictive value of flow algorithms. This is an improvement on less direct statistics such as model sensitivity and the Jaccard fitness coefficient. We have performed these validation benchmarks on a new, modular lava flow emplacement simulator that we have developed. This simulator, which we call MOLASSES, follows a Cellular Automata (CA) method. The code is developed in several interchangeable modules, which enables quick modification of the distribution algorithm from cell locations to their neighbors. By assessing several different distribution schemes with the benchmark tests, we have improved the performance of MOLASSES to correctly match early stages of the 2012-3 Tolbachik Flow, Kamchakta Russia, to 80%. We also can evaluate model performance given uncertain input parameters using a Monte Carlo setup. This illuminates sensitivity to model uncertainty.

Can Human Capital Metrics Effectively Benchmark Higher Education with For-Profit Companies?

ERIC Educational Resources Information Center

Hagedorn, Kathy; Forlaw, Blair

2007-01-01

Last fall, Saint Louis University participated in St. Louis, Missouri's, first Human Capital Performance Study alongside several of the region's largest for-profit employers. The university also participated this year in the benchmarking of employee engagement factors conducted by the St. Louis Business Journal in its effort to quantify and select…
SU-E-T-776: Use of Quality Metrics for a New Hypo-Fractionated Pre-Surgical Mesothelioma Protocol

DOE Office of Scientific and Technical Information (OSTI.GOV)

Richardson, S; Mehta, V

Purpose: The “SMART” (Surgery for Mesothelioma After Radiation Therapy) approach involves hypo-fractionated radiotherapy of the lung pleura to 25Gy over 5 days followed by surgical resection within 7. Early clinical results suggest that this approach is very promising, but also logistically challenging due to the multidisciplinary involvement. Due to the compressed schedule, high dose, and shortened planning time, the delivery of the planned doses were monitored for safety with quality metric software. Methods: Hypo-fractionated IMRT treatment plans were developed for all patients and exported to Quality Reports™ software. Plan quality metrics or PQMs™ were created to calculate an objective scoringmore » function for each plan. This allows for an objective assessment of the quality of the plan and a benchmark for plan improvement for subsequent patients. The priorities of various components were incorporated based on similar hypo-fractionated protocols such as lung SBRT treatments. Results: Five patients have been treated at our institution using this approach. The plans were developed, QA performed, and ready within 5 days of simulation. Plan Quality metrics utilized in scoring included doses to OAR and target coverage. All patients tolerated treatment well and proceeded to surgery as scheduled. Reported toxicity included grade 1 nausea (n=1), grade 1 esophagitis (n=1), grade 2 fatigue (n=3). One patient had recurrent fluid accumulation following surgery. No patients experienced any pulmonary toxicity prior to surgery. Conclusion: An accelerated course of pre-operative high dose radiation for mesothelioma is an innovative and promising new protocol. Without historical data, one must proceed cautiously and monitor the data carefully. The development of quality metrics and scoring functions for these treatments allows us to benchmark our plans and monitor improvement. If subsequent toxicities occur, these will be easy to investigate and incorporate into
Taking Aims: New CASE Study Benchmarks Advancement Investments and Returns

ERIC Educational Resources Information Center

Goldsmith, Rae

2012-01-01

Advancement professionals have always been thirsty for information that will help them understand how their programs compare with those of their peers. But in recent years the demand for benchmarking data has exploded as budgets have become leaner, leaders have become more business minded, and terms like "performance metrics and return on…
The software product assurance metrics study: JPL's software systems quality and productivity

NASA Technical Reports Server (NTRS)

Bush, Marilyn W.

1989-01-01

The findings are reported of the Jet Propulsion Laboratory (JPL)/Software Product Assurance (SPA) Metrics Study, conducted as part of a larger JPL effort to improve software quality and productivity. Until recently, no comprehensive data had been assembled on how JPL manages and develops software-intensive systems. The first objective was to collect data on software development from as many projects and for as many years as possible. Results from five projects are discussed. These results reflect 15 years of JPL software development, representing over 100 data points (systems and subsystems), over a third of a billion dollars, over four million lines of code and 28,000 person months. Analysis of this data provides a benchmark for gauging the effectiveness of past, present and future software development work. In addition, the study is meant to encourage projects to record existing metrics data and to gather future data. The SPA long term goal is to integrate the collection of historical data and ongoing project data with future project estimations.
Automated Metrics in a Virtual-Reality Myringotomy Simulator: Development and Construct Validity.

PubMed

Huang, Caiwen; Cheng, Horace; Bureau, Yves; Ladak, Hanif M; Agrawal, Sumit K

2018-06-15

The objectives of this study were: 1) to develop and implement a set of automated performance metrics into the Western myringotomy simulator, and 2) to establish construct validity. Prospective simulator-based assessment study. The Auditory Biophysics Laboratory at Western University, London, Ontario, Canada. Eleven participants were recruited from the Department of Otolaryngology-Head & Neck Surgery at Western University: four senior otolaryngology consultants and seven junior otolaryngology residents. Educational simulation. Discrimination between expert and novice participants on five primary automated performance metrics: 1) time to completion, 2) surgical errors, 3) incision angle, 4) incision length, and 5) the magnification of the microscope. Automated performance metrics were developed, programmed, and implemented into the simulator. Participants were given a standardized simulator orientation and instructions on myringotomy and tube placement. Each participant then performed 10 procedures and automated metrics were collected. The metrics were analyzed using the Mann-Whitney U test with Bonferroni correction. All metrics discriminated senior otolaryngologists from junior residents with a significance of p < 0.002. Junior residents had 2.8 times more errors compared with the senior otolaryngologists. Senior otolaryngologists took significantly less time to completion compared with junior residents. The senior group also had significantly longer incision lengths, more accurate incision angles, and lower magnification keeping both the umbo and annulus in view. Automated quantitative performance metrics were successfully developed and implemented, and construct validity was established by discriminating between expert and novice participants.
Development and Applications of Benchmark Examples for Static Delamination Propagation Predictions

NASA Technical Reports Server (NTRS)

Krueger, Ronald

2013-01-01

The development and application of benchmark examples for the assessment of quasistatic delamination propagation capabilities was demonstrated for ANSYS (TradeMark) and Abaqus/Standard (TradeMark). The examples selected were based on finite element models of Double Cantilever Beam (DCB) and Mixed-Mode Bending (MMB) specimens. First, quasi-static benchmark results were created based on an approach developed previously. Second, the delamination was allowed to propagate under quasi-static loading from its initial location using the automated procedure implemented in ANSYS (TradeMark) and Abaqus/Standard (TradeMark). Input control parameters were varied to study the effect on the computed delamination propagation. Overall, the benchmarking procedure proved valuable by highlighting the issues associated with choosing the appropriate input parameters for the VCCT implementations in ANSYS® and Abaqus/Standard®. However, further assessment for mixed-mode delamination fatigue onset and growth is required. Additionally studies should include the assessment of the propagation capabilities in more complex specimens and on a structural level.
Early Warning Look Ahead Metrics: The Percent Milestone Backlog Metric

NASA Technical Reports Server (NTRS)

Shinn, Stephen A.; Anderson, Timothy P.

2017-01-01

All complex development projects experience delays and corresponding backlogs of their project control milestones during their acquisition lifecycles. NASA Goddard Space Flight Center (GSFC) Flight Projects Directorate (FPD) teamed with The Aerospace Corporation (Aerospace) to develop a collection of Early Warning Look Ahead metrics that would provide GSFC leadership with some independent indication of the programmatic health of GSFC flight projects. As part of the collection of Early Warning Look Ahead metrics, the Percent Milestone Backlog metric is particularly revealing, and has utility as a stand-alone execution performance monitoring tool. This paper describes the purpose, development methodology, and utility of the Percent Milestone Backlog metric. The other four Early Warning Look Ahead metrics are also briefly discussed. Finally, an example of the use of the Percent Milestone Backlog metric in providing actionable insight is described, along with examples of its potential use in other commodities.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

1993-01-01

A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Developing and Trialling an independent, scalable and repeatable IT-benchmarking procedure for healthcare organisations.

PubMed

Liebe, J D; Hübner, U

2013-01-01

Continuous improvements of IT-performance in healthcare organisations require actionable performance indicators, regularly conducted, independent measurements and meaningful and scalable reference groups. Existing IT-benchmarking initiatives have focussed on the development of reliable and valid indicators, but less on the questions about how to implement an environment for conducting easily repeatable and scalable IT-benchmarks. This study aims at developing and trialling a procedure that meets the afore-mentioned requirements. We chose a well established, regularly conducted (inter-) national IT-survey of healthcare organisations (IT-Report Healthcare) as the environment and offered the participants of the 2011 survey (CIOs of hospitals) to enter a benchmark. The 61 structural and functional performance indicators covered among others the implementation status and integration of IT-systems and functions, global user satisfaction and the resources of the IT-department. Healthcare organisations were grouped by size and ownership. The benchmark results were made available electronically and feedback on the use of these results was requested after several months. Fifty-ninehospitals participated in the benchmarking. Reference groups consisted of up to 141 members depending on the number of beds (size) and the ownership (public vs. private). A total of 122 charts showing single indicator frequency views were sent to each participant. The evaluation showed that 94.1% of the CIOs who participated in the evaluation considered this benchmarking beneficial and reported that they would enter again. Based on the feedback of the participants we developed two additional views that provide a more consolidated picture. The results demonstrate that establishing an independent, easily repeatable and scalable IT-benchmarking procedure is possible and was deemed desirable. Based on these encouraging results a new benchmarking round which includes process indicators is currently
Benchmarking the ATLAS software through the Kit Validation engine

NASA Astrophysics Data System (ADS)

De Salvo, Alessandro; Brasolin, Franco

2010-04-01

The measurement of the experiment software performance is a very important metric in order to choose the most effective resources to be used and to discover the bottlenecks of the code implementation. In this work we present the benchmark techniques used to measure the ATLAS software performance through the ATLAS offline testing engine Kit Validation and the online portal Global Kit Validation. The performance measurements, the data collection, the online analysis and display of the results will be presented. The results of the measurement on different platforms and architectures will be shown, giving a full report on the CPU power and memory consumption of the Monte Carlo generation, simulation, digitization and reconstruction of the most CPU-intensive channels. The impact of the multi-core computing on the ATLAS software performance will also be presented, comparing the behavior of different architectures when increasing the number of concurrent processes. The benchmark techniques described in this paper have been used in the HEPiX group since the beginning of 2008 to help defining the performance metrics for the High Energy Physics applications, based on the real experiment software.
Uav Cameras: Overview and Geometric Calibration Benchmark

NASA Astrophysics Data System (ADS)

Cramer, M.; Przybilla, H.-J.; Zurhorst, A.

2017-08-01

Different UAV platforms and sensors are used in mapping already, many of them equipped with (sometimes) modified cameras as known from the consumer market. Even though these systems normally fulfil their requested mapping accuracy, the question arises, which system performs best? This asks for a benchmark, to check selected UAV based camera systems in well-defined, reproducible environments. Such benchmark is tried within this work here. Nine different cameras used on UAV platforms, representing typical camera classes, are considered. The focus is laid on the geometry here, which is tightly linked to the process of geometrical calibration of the system. In most applications the calibration is performed in-situ, i.e. calibration parameters are obtained as part of the project data itself. This is often motivated because consumer cameras do not keep constant geometry, thus, cannot be seen as metric cameras. Still, some of the commercial systems are quite stable over time, as it was proven from repeated (terrestrial) calibrations runs. Already (pre-)calibrated systems may offer advantages, especially when the block geometry of the project does not allow for a stable and sufficient in-situ calibration. Especially for such scenario close to metric UAV cameras may have advantages. Empirical airborne test flights in a calibration field have shown how block geometry influences the estimated calibration parameters and how consistent the parameters from lab calibration can be reproduced.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

1991-01-01

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Development of Benchmark Examples for Static Delamination Propagation and Fatigue Growth Predictions

NASA Technical Reports Server (NTRS)

Kruger, Ronald

2011-01-01

The development of benchmark examples for static delamination propagation and cyclic delamination onset and growth prediction is presented and demonstrated for a commercial code. The example is based on a finite element model of an End-Notched Flexure (ENF) specimen. The example is independent of the analysis software used and allows the assessment of the automated delamination propagation, onset and growth prediction capabilities in commercial finite element codes based on the virtual crack closure technique (VCCT). First, static benchmark examples were created for the specimen. Second, based on the static results, benchmark examples for cyclic delamination growth were created. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Fourth, starting from an initially straight front, the delamination was allowed to grow under cyclic loading. The number of cycles to delamination onset and the number of cycles during stable delamination growth for each growth increment were obtained from the automated analysis and compared to the benchmark examples. Again, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with the input parameters of the particular implementation. Selecting the appropriate input parameters, however, was not straightforward and often required an iterative procedure. Overall, the results are encouraging but further assessment for mixed-mode delamination is required.
Development of a Benchmark Example for Delamination Fatigue Growth Prediction

NASA Technical Reports Server (NTRS)

Krueger, Ronald

2010-01-01

The development of a benchmark example for cyclic delamination growth prediction is presented and demonstrated for a commercial code. The example is based on a finite element model of a Double Cantilever Beam (DCB) specimen, which is independent of the analysis software used and allows the assessment of the delamination growth prediction capabilities in commercial finite element codes. First, the benchmark result was created for the specimen. Second, starting from an initially straight front, the delamination was allowed to grow under cyclic loading in a finite element model of a commercial code. The number of cycles to delamination onset and the number of cycles during stable delamination growth for each growth increment were obtained from the analysis. In general, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. Overall, the results are encouraging but further assessment for mixed-mode delamination is required
NASA metric transition plan

NASA Technical Reports Server (NTRS)

1992-01-01

NASA science publications have used the metric system of measurement since 1970. Although NASA has maintained a metric use policy since 1979, practical constraints have restricted actual use of metric units. In 1988, an amendment to the Metric Conversion Act of 1975 required the Federal Government to adopt the metric system except where impractical. In response to Public Law 100-418 and Executive Order 12770, NASA revised its metric use policy and developed this Metric Transition Plan. NASA's goal is to use the metric system for program development and functional support activities to the greatest practical extent by the end of 1995. The introduction of the metric system into new flight programs will determine the pace of the metric transition. Transition of institutional capabilities and support functions will be phased to enable use of the metric system in flight program development and operations. Externally oriented elements of this plan will introduce and actively support use of the metric system in education, public information, and small business programs. The plan also establishes a procedure for evaluating and approving waivers and exceptions to the required use of the metric system for new programs. Coordination with other Federal agencies and departments (through the Interagency Council on Metric Policy) and industry (directly and through professional societies and interest groups) will identify sources of external support and minimize duplication of effort.
Installed Cost Benchmarks and Deployment Barriers for Residential Solar Photovoltaics with Energy Storage: Q1 2016

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ardani, Kristen; O'Shaughnessy, Eric; Fu, Ran

2016-12-01

In this report, we fill a gap in the existing knowledge about PV-plus-storage system costs and value by providing detailed component- and system-level installed cost benchmarks for residential systems. We also examine other barriers to increased deployment of PV-plus-storage systems in the residential sector. The results are meant to help technology manufacturers, installers, and other stakeholders identify cost-reduction opportunities and inform decision makers about regulatory, policy, and market characteristics that impede solar plus storage deployment. In addition, our periodic cost benchmarks will document progress in cost reductions over time. To analyze costs for PV-plus-storage systems deployed in the first quartermore » of 2016, we adapt the National Renewable Energy Laboratory's component- and system-level cost-modeling methods for standalone PV. In general, we attempt to model best-in-class installation techniques and business operations from an installed-cost perspective. In addition to our original analysis, model development, and review of published literature, we derive inputs for our model and validate our draft results via interviews with industry and subject-matter experts. One challenge to analyzing the costs of PV-plus-storage systems is choosing an appropriate cost metric. Unlike standalone PV, energy storage lacks universally accepted cost metrics, such as dollars per watt of installed capacity and lifetime levelized cost of energy. We explain the difficulty of arriving at a standard approach for reporting storage costs and then provide the rationale for using the total installed costs of a standard PV-plus-storage system as our primary metric, rather than using a system-size-normalized metric.« less
Benchmarking a geostatistical procedure for the homogenisation of annual precipitation series

NASA Astrophysics Data System (ADS)

Caineta, Júlio; Ribeiro, Sara; Henriques, Roberto; Soares, Amílcar; Costa, Ana Cristina

2014-05-01

The European project COST Action ES0601, Advances in homogenisation methods of climate series: an integrated approach (HOME), has brought to attention the importance of establishing reliable homogenisation methods for climate data. In order to achieve that, a benchmark data set, containing monthly and daily temperature and precipitation data, was created to be used as a comparison basis for the effectiveness of those methods. Several contributions were submitted and evaluated by a number of performance metrics, validating the results against realistic inhomogeneous data. HOME also led to the development of new homogenisation software packages, which included feedback and lessons learned during the project. Preliminary studies have suggested a geostatistical stochastic approach, which uses Direct Sequential Simulation (DSS), as a promising methodology for the homogenisation of precipitation data series. Based on the spatial and temporal correlation between the neighbouring stations, DSS calculates local probability density functions at a candidate station to detect inhomogeneities. The purpose of the current study is to test and compare this geostatistical approach with the methods previously presented in the HOME project, using surrogate precipitation series from the HOME benchmark data set. The benchmark data set contains monthly precipitation surrogate series, from which annual precipitation data series were derived. These annual precipitation series were subject to exploratory analysis and to a thorough variography study. The geostatistical approach was then applied to the data set, based on different scenarios for the spatial continuity. Implementing this procedure also promoted the development of a computer program that aims to assist on the homogenisation of climate data, while minimising user interaction. Finally, in order to compare the effectiveness of this methodology with the homogenisation methods submitted during the HOME project, the obtained results
All inclusive benchmarking.

PubMed

Ellis, Judith

2006-07-01

The aim of this article is to review published descriptions of benchmarking activity and synthesize benchmarking principles to encourage the acceptance and use of Essence of Care as a new benchmarking approach to continuous quality improvement, and to promote its acceptance as an integral and effective part of benchmarking activity in health services. The Essence of Care, was launched by the Department of Health in England in 2001 to provide a benchmarking tool kit to support continuous improvement in the quality of fundamental aspects of health care, for example, privacy and dignity, nutrition and hygiene. The tool kit is now being effectively used by some frontline staff. However, use is inconsistent, with the value of the tool kit, or the support clinical practice benchmarking requires to be effective, not always recognized or provided by National Health Service managers, who are absorbed with the use of quantitative benchmarking approaches and measurability of comparative performance data. This review of published benchmarking literature, was obtained through an ever-narrowing search strategy commencing from benchmarking within quality improvement literature through to benchmarking activity in health services and including access to not only published examples of benchmarking approaches and models used but the actual consideration of web-based benchmarking data. This supported identification of how benchmarking approaches have developed and been used, remaining true to the basic benchmarking principles of continuous improvement through comparison and sharing (Camp 1989). Descriptions of models and exemplars of quantitative and specifically performance benchmarking activity in industry abound (Camp 1998), with far fewer examples of more qualitative and process benchmarking approaches in use in the public services and then applied to the health service (Bullivant 1998). The literature is also in the main descriptive in its support of the effectiveness of
Disaster metrics: quantitative benchmarking of hospital surge capacity in trauma-related multiple casualty events.

PubMed

Bayram, Jamil D; Zuabi, Shawki; Subbarao, Italo

2011-06-01

Hospital surge capacity in multiple casualty events (MCE) is the core of hospital medical response, and an integral part of the total medical capacity of the community affected. To date, however, there has been no consensus regarding the definition or quantification of hospital surge capacity. The first objective of this study was to quantitatively benchmark the various components of hospital surge capacity pertaining to the care of critically and moderately injured patients in trauma-related MCE. The second objective was to illustrate the applications of those quantitative parameters in local, regional, national, and international disaster planning; in the distribution of patients to various hospitals by prehospital medical services; and in the decision-making process for ambulance diversion. A 2-step approach was adopted in the methodology of this study. First, an extensive literature search was performed, followed by mathematical modeling. Quantitative studies on hospital surge capacity for trauma injuries were used as the framework for our model. The North Atlantic Treaty Organization triage categories (T1-T4) were used in the modeling process for simplicity purposes. Hospital Acute Care Surge Capacity (HACSC) was defined as the maximum number of critical (T1) and moderate (T2) casualties a hospital can adequately care for per hour, after recruiting all possible additional medical assets. HACSC was modeled to be equal to the number of emergency department beds (#EDB), divided by the emergency department time (EDT); HACSC = #EDB/EDT. In trauma-related MCE, the EDT was quantitatively benchmarked to be 2.5 (hours). Because most of the critical and moderate casualties arrive at hospitals within a 6-hour period requiring admission (by definition), the hospital bed surge capacity must match the HACSC at 6 hours to ensure coordinated care, and it was mathematically benchmarked to be 18% of the staffed hospital bed capacity. Defining and quantitatively benchmarking the
Electric load shape benchmarking for small- and medium-sized commercial buildings

DOE Office of Scientific and Technical Information (OSTI.GOV)

Luo, Xuan; Hong, Tianzhen; Chen, Yixing

Small- and medium-sized commercial buildings owners and utility managers often look for opportunities for energy cost savings through energy efficiency and energy waste minimization. However, they currently lack easy access to low-cost tools that help interpret the massive amount of data needed to improve understanding of their energy use behaviors. Benchmarking is one of the techniques used in energy audits to identify which buildings are priorities for an energy analysis. Traditional energy performance indicators, such as the energy use intensity (annual energy per unit of floor area), consider only the total annual energy consumption, lacking consideration of the fluctuation ofmore » energy use behavior over time, which reveals the time of use information and represents distinct energy use behaviors during different time spans. To fill the gap, this study developed a general statistical method using 24-hour electric load shape benchmarking to compare a building or business/tenant space against peers. Specifically, the study developed new forms of benchmarking metrics and data analysis methods to infer the energy performance of a building based on its load shape. We first performed a data experiment with collected smart meter data using over 2,000 small- and medium-sized businesses in California. We then conducted a cluster analysis of the source data, and determined and interpreted the load shape features and parameters with peer group analysis. Finally, we implemented the load shape benchmarking feature in an open-access web-based toolkit (the Commercial Building Energy Saver) to provide straightforward and practical recommendations to users. The analysis techniques were generic and flexible for future datasets of other building types and in other utility territories.« less

Electric load shape benchmarking for small- and medium-sized commercial buildings

DOE PAGES

Luo, Xuan; Hong, Tianzhen; Chen, Yixing; ...

2017-07-28

Small- and medium-sized commercial buildings owners and utility managers often look for opportunities for energy cost savings through energy efficiency and energy waste minimization. However, they currently lack easy access to low-cost tools that help interpret the massive amount of data needed to improve understanding of their energy use behaviors. Benchmarking is one of the techniques used in energy audits to identify which buildings are priorities for an energy analysis. Traditional energy performance indicators, such as the energy use intensity (annual energy per unit of floor area), consider only the total annual energy consumption, lacking consideration of the fluctuation ofmore » energy use behavior over time, which reveals the time of use information and represents distinct energy use behaviors during different time spans. To fill the gap, this study developed a general statistical method using 24-hour electric load shape benchmarking to compare a building or business/tenant space against peers. Specifically, the study developed new forms of benchmarking metrics and data analysis methods to infer the energy performance of a building based on its load shape. We first performed a data experiment with collected smart meter data using over 2,000 small- and medium-sized businesses in California. We then conducted a cluster analysis of the source data, and determined and interpreted the load shape features and parameters with peer group analysis. Finally, we implemented the load shape benchmarking feature in an open-access web-based toolkit (the Commercial Building Energy Saver) to provide straightforward and practical recommendations to users. The analysis techniques were generic and flexible for future datasets of other building types and in other utility territories.« less
Benchmarking for Higher Education.

ERIC Educational Resources Information Center

Jackson, Norman, Ed.; Lund, Helen, Ed.

The chapters in this collection explore the concept of benchmarking as it is being used and developed in higher education (HE). Case studies and reviews show how universities in the United Kingdom are using benchmarking to aid in self-regulation and self-improvement. The chapters are: (1) "Introduction to Benchmarking" (Norman Jackson…
NASA Software Engineering Benchmarking Effort

NASA Technical Reports Server (NTRS)

Godfrey, Sally; Rarick, Heather

2012-01-01

Benchmarking was very interesting and provided a wealth of information (1) We did see potential solutions to some of our "top 10" issues (2) We have an assessment of where NASA stands with relation to other aerospace/defense groups We formed new contacts and potential collaborations (1) Several organizations sent us examples of their templates, processes (2) Many of the organizations were interested in future collaboration: sharing of training, metrics, Capability Maturity Model Integration (CMMI) appraisers, instructors, etc. We received feedback from some of our contractors/ partners (1) Desires to participate in our training; provide feedback on procedures (2) Welcomed opportunity to provide feedback on working with NASA
Implicit Contractive Mappings in Modular Metric and Fuzzy Metric Spaces

PubMed Central

Hussain, N.; Salimi, P.

2014-01-01

The notion of modular metric spaces being a natural generalization of classical modulars over linear spaces like Lebesgue, Orlicz, Musielak-Orlicz, Lorentz, Orlicz-Lorentz, and Calderon-Lozanovskii spaces was recently introduced. In this paper we investigate the existence of fixed points of generalized α-admissible modular contractive mappings in modular metric spaces. As applications, we derive some new fixed point theorems in partially ordered modular metric spaces, Suzuki type fixed point theorems in modular metric spaces and new fixed point theorems for integral contractions. In last section, we develop an important relation between fuzzy metric and modular metric and deduce certain new fixed point results in triangular fuzzy metric spaces. Moreover, some examples are provided here to illustrate the usability of the obtained results. PMID:25003157
Implementing Data Definition Consistency for Emergency Department Operations Benchmarking and Research.

PubMed

Yiadom, Maame Yaa A B; Scheulen, James; McWade, Conor M; Augustine, James J

2016-07-01

The objective was to obtain a commitment to adopt a common set of definitions for emergency department (ED) demographic, clinical process, and performance metrics among the ED Benchmarking Alliance (EDBA), ED Operations Study Group (EDOSG), and Academy of Academic Administrators of Emergency Medicine (AAAEM) by 2017. A retrospective cross-sectional analysis of available data from three ED operations benchmarking organizations supported a negotiation to use a set of common metrics with identical definitions. During a 1.5-day meeting-structured according to social change theories of information exchange, self-interest, and interdependence-common definitions were identified and negotiated using the EDBA's published definitions as a start for discussion. Methods of process analysis theory were used in the 8 weeks following the meeting to achieve official consensus on definitions. These two lists were submitted to the organizations' leadership for implementation approval. A total of 374 unique measures were identified, of which 57 (15%) were shared by at least two organizations. Fourteen (4%) were common to all three organizations. In addition to agreement on definitions for the 14 measures used by all three organizations, agreement was reached on universal definitions for 17 of the 57 measures shared by at least two organizations. The negotiation outcome was a list of 31 measures with universal definitions to be adopted by each organization by 2017. The use of negotiation, social change, and process analysis theories achieved the adoption of universal definitions among the EDBA, EDOSG, and AAAEM. This will impact performance benchmarking for nearly half of US EDs. It initiates a formal commitment to utilize standardized metrics, and it transitions consistency in reporting ED operations metrics from consensus to implementation. This work advances our ability to more accurately characterize variation in ED care delivery models, resource utilization, and performance. In
Evaluating Soil Health Using Remotely Sensed Evapotranspiration on the Benchmark Barnes Soils of North Dakota

NASA Astrophysics Data System (ADS)

Bohn, Meyer; Hopkins, David; Steele, Dean; Tuscherer, Sheldon

2017-04-01

The benchmark Barnes soil series is an extensive upland Hapludoll of the northern Great Plains that is both economically and ecologically vital to the region. Effects of tillage erosion coupled with wind and water erosion have degraded Barnes soil quality, but with unknown extent, distribution, or severity. Evidence of soil degradation documented for a half century warrants that the assumption of productivity be tested. Soil resilience is linked to several dynamic soil properties and National Cooperative Soil Survey initiatives are now focused on identifying those properties for benchmark soils. Quantification of soil degradation is dependent on a reliable method for broad-scale evaluation. The soil survey community is currently developing rapid and widespread soil property assessment technologies. Improvements in satellite based remote-sensing and image analysis software have stimulated the application of broad-scale resource assessment. Furthermore, these technologies have fostered refinement of land-based surface energy balance algorithms, i.e. Mapping Evapotranspiration at High Resolution with Internalized Calibration (METRIC) algorithm for evapotranspiration (ET) mapping. The hypothesis of this study is that ET mapping technology can differentiate soil function on extensive landscapes and identify degraded areas. A recent soil change study in eastern North Dakota resampled legacy Barnes pedons sampled prior to 1960 and found significant decreases in organic carbon. An ancillary study showed that evapotranspiration (ET) estimates from METRIC decreased with Barnes erosion class severity. An ET raster map has been developed for three eastern North Dakota counties using METRIC and Landsat 5 imagery. ET pixel candidates on major Barnes soil map units were stratified into tertiles and classified as ranked ET subdivisions. A sampling population of randomly selected points stratified by ET class and county proportion was established. Morphologic and chemical data will
Developing Metrics in Systems Integration (ISS Program COTS Integration Model)

NASA Technical Reports Server (NTRS)

Lueders, Kathryn

2007-01-01

This viewgraph presentation reviews some of the complications in developing metrics for systems integration. Specifically it reviews a case study of how two programs within NASA try to develop and measure performance while meeting the encompassing organizational goals.
Development of Benchmark Examples for Quasi-Static Delamination Propagation and Fatigue Growth Predictions

NASA Technical Reports Server (NTRS)

Krueger, Ronald

2012-01-01

The development of benchmark examples for quasi-static delamination propagation and cyclic delamination onset and growth prediction is presented and demonstrated for Abaqus/Standard. The example is based on a finite element model of a Double-Cantilever Beam specimen. The example is independent of the analysis software used and allows the assessment of the automated delamination propagation, onset and growth prediction capabilities in commercial finite element codes based on the virtual crack closure technique (VCCT). First, a quasi-static benchmark example was created for the specimen. Second, based on the static results, benchmark examples for cyclic delamination growth were created. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Fourth, starting from an initially straight front, the delamination was allowed to grow under cyclic loading. The number of cycles to delamination onset and the number of cycles during delamination growth for each growth increment were obtained from the automated analysis and compared to the benchmark examples. Again, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Selecting the appropriate input parameters, however, was not straightforward and often required an iterative procedure. Overall the results are encouraging, but further assessment for mixed-mode delamination is required.
Using Participatory Action Research to Study the Implementation of Career Development Benchmarks at a New Zealand University

ERIC Educational Resources Information Center

Furbish, Dale S.; Bailey, Robyn; Trought, David

2016-01-01

Benchmarks for career development services at tertiary institutions have been developed by Careers New Zealand. The benchmarks are intended to provide standards derived from international best practices to guide career development services. A new career development service was initiated at a large New Zealand university just after the benchmarks…
The voice of the customer--Part 2: Benchmarking battery chargers against the Consumer's Ideal Product.

PubMed

Bauer, S M; Lane, J P; Stone, V I; Unnikrishnan, N

1998-01-01

The Rehabilitation Engineering Research Center on Technology Evaluation and Transfer is exploring how the end users of assistive technology devices define the ideal device. This work is called the Consumer Ideal Product program. In this work, end users identify and establish the importance of a broad range of product design features, along with the related product support and service provided by manufacturers and vendors. This paper describes a method for systematically transforming end-user defined requirements into a form that is useful and accessible to product designers, manufacturers, and vendors. In particular, product requirements, importance weightings, and metrics are developed from the Consumer Ideal Product battery charger outcomes. Six battery charges are benchmarked against these product requirements using the metrics developed. The results suggest improvements for each product's design, service, and support. Overall, the six chargers meet roughly 45-75% of the ideal product's requirements. Many of the suggested improvements are low-cost changes that, if adopted, could provide companies a competitive advantage in the marketplace.
40 CFR 141.540 - Who has to develop a disinfection benchmark?

Code of Federal Regulations, 2010 CFR

2010-07-01

...) WATER PROGRAMS (CONTINUED) NATIONAL PRIMARY DRINKING WATER REGULATIONS Enhanced Filtration and Disinfection-Systems Serving Fewer Than 10,000 People Disinfection Benchmark § 141.540 Who has to develop a... 40 Protection of Environment 22 2010-07-01 2010-07-01 false Who has to develop a disinfection...
Benchmarking Brain-Computer Interfaces Outside the Laboratory: The Cybathlon 2016

PubMed Central

Novak, Domen; Sigrist, Roland; Gerig, Nicolas J.; Wyss, Dario; Bauer, René; Götz, Ulrich; Riener, Robert

2018-01-01

This paper presents a new approach to benchmarking brain-computer interfaces (BCIs) outside the lab. A computer game was created that mimics a real-world application of assistive BCIs, with the main outcome metric being the time needed to complete the game. This approach was used at the Cybathlon 2016, a competition for people with disabilities who use assistive technology to achieve tasks. The paper summarizes the technical challenges of BCIs, describes the design of the benchmarking game, then describes the rules for acceptable hardware, software and inclusion of human pilots in the BCI competition at the Cybathlon. The 11 participating teams, their approaches, and their results at the Cybathlon are presented. Though the benchmarking procedure has some limitations (for instance, we were unable to identify any factors that clearly contribute to BCI performance), it can be successfully used to analyze BCI performance in realistic, less structured conditions. In the future, the parameters of the benchmarking game could be modified to better mimic different applications (e.g., the need to use some commands more frequently than others). Furthermore, the Cybathlon has the potential to showcase such devices to the general public. PMID:29375294
Developing a Benchmarking Process in Perfusion: A Report of the Perfusion Downunder Collaboration

PubMed Central

Baker, Robert A.; Newland, Richard F.; Fenton, Carmel; McDonald, Michael; Willcox, Timothy W.; Merry, Alan F.

2012-01-01

Abstract: Improving and understanding clinical practice is an appropriate goal for the perfusion community. The Perfusion Downunder Collaboration has established a multi-center perfusion focused database aimed at achieving these goals through the development of quantitative quality indicators for clinical improvement through benchmarking. Data were collected using the Perfusion Downunder Collaboration database from procedures performed in eight Australian and New Zealand cardiac centers between March 2007 and February 2011. At the Perfusion Downunder Meeting in 2010, it was agreed by consensus, to report quality indicators (QI) for glucose level, arterial outlet temperature, and pCO2 management during cardiopulmonary bypass. The values chosen for each QI were: blood glucose ≥4 mmol/L and ≤10 mmol/L; arterial outlet temperature ≤37°C; and arterial blood gas pCO2 ≥ 35 and ≤45 mmHg. The QI data were used to derive benchmarks using the Achievable Benchmark of Care (ABC™) methodology to identify the incidence of QIs at the best performing centers. Five thousand four hundred and sixty-five procedures were evaluated to derive QI and benchmark data. The incidence of the blood glucose QI ranged from 37–96% of procedures, with a benchmark value of 90%. The arterial outlet temperature QI occurred in 16–98% of procedures with the benchmark of 94%; while the arterial pCO2 QI occurred in 21–91%, with the benchmark value of 80%. We have derived QIs and benchmark calculations for the management of several key aspects of cardiopulmonary bypass to provide a platform for improving the quality of perfusion practice. PMID:22730861
Best practices from WisDOT mega and ARRA projects : statistical analysis and % time vs. % cost metrics.

DOT National Transportation Integrated Search

2012-03-01

This study was undertaken to: 1) apply a benchmarking process to identify best practices within four areas Wisconsin Department of Transportation (WisDOT) construction management and 2) analyze two performance metrics, % Cost vs. % Time, tracked by t...
Development of Benchmark Examples for Delamination Onset and Fatigue Growth Prediction

NASA Technical Reports Server (NTRS)

Krueger, Ronald

2011-01-01

An approach for assessing the delamination propagation and growth capabilities in commercial finite element codes was developed and demonstrated for the Virtual Crack Closure Technique (VCCT) implementations in ABAQUS. The Double Cantilever Beam (DCB) specimen was chosen as an example. First, benchmark results to assess delamination propagation capabilities under static loading were created using models simulating specimens with different delamination lengths. For each delamination length modeled, the load and displacement at the load point were monitored. The mixed-mode strain energy release rate components were calculated along the delamination front across the width of the specimen. A failure index was calculated by correlating the results with the mixed-mode failure criterion of the graphite/epoxy material. The calculated critical loads and critical displacements for delamination onset for each delamination length modeled were used as a benchmark. The load/displacement relationship computed during automatic propagation should closely match the benchmark case. Second, starting from an initially straight front, the delamination was allowed to propagate based on the algorithms implemented in the commercial finite element software. The load-displacement relationship obtained from the propagation analysis results and the benchmark results were compared. Good agreements could be achieved by selecting the appropriate input parameters, which were determined in an iterative procedure.
GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

PubMed

Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

2011-08-15

Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods available to the community as an open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.
Limitations of Community College Benchmarking and Benchmarks

ERIC Educational Resources Information Center

Bers, Trudy H.

2006-01-01

This chapter distinguishes between benchmarks and benchmarking, describes a number of data and cultural limitations to benchmarking projects, and suggests that external demands for accountability are the dominant reason for growing interest in benchmarking among community colleges.
Development of a Computer-based Benchmarking and Analytical Tool. Benchmarking and Energy & Water Savings Tool in Dairy Plants (BEST-Dairy)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Tengfang; Flapper, Joris; Ke, Jing

The overall goal of the project is to develop a computer-based benchmarking and energy and water savings tool (BEST-Dairy) for use in the California dairy industry – including four dairy processes – cheese, fluid milk, butter, and milk powder.
Surveillance metrics sensitivity study.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hamada, Michael S.; Bierbaum, Rene Lynn; Robertson, Alix A.

2011-09-01

In September of 2009, a Tri-Lab team was formed to develop a set of metrics relating to the NNSA nuclear weapon surveillance program. The purpose of the metrics was to develop a more quantitative and/or qualitative metric(s) describing the results of realized or non-realized surveillance activities on our confidence in reporting reliability and assessing the stockpile. As a part of this effort, a statistical sub-team investigated various techniques and developed a complementary set of statistical metrics that could serve as a foundation for characterizing aspects of meeting the surveillance program objectives. The metrics are a combination of tolerance limit calculationsmore » and power calculations, intending to answer level-of-confidence type questions with respect to the ability to detect certain undesirable behaviors (catastrophic defects, margin insufficiency defects, and deviations from a model). Note that the metrics are not intended to gauge product performance but instead the adequacy of surveillance. This report gives a short description of four metrics types that were explored and the results of a sensitivity study conducted to investigate their behavior for various inputs. The results of the sensitivity study can be used to set the risk parameters that specify the level of stockpile problem that the surveillance program should be addressing.« less
Surveillance Metrics Sensitivity Study

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bierbaum, R; Hamada, M; Robertson, A

2011-11-01

In September of 2009, a Tri-Lab team was formed to develop a set of metrics relating to the NNSA nuclear weapon surveillance program. The purpose of the metrics was to develop a more quantitative and/or qualitative metric(s) describing the results of realized or non-realized surveillance activities on our confidence in reporting reliability and assessing the stockpile. As a part of this effort, a statistical sub-team investigated various techniques and developed a complementary set of statistical metrics that could serve as a foundation for characterizing aspects of meeting the surveillance program objectives. The metrics are a combination of tolerance limit calculationsmore » and power calculations, intending to answer level-of-confidence type questions with respect to the ability to detect certain undesirable behaviors (catastrophic defects, margin insufficiency defects, and deviations from a model). Note that the metrics are not intended to gauge product performance but instead the adequacy of surveillance. This report gives a short description of four metrics types that were explored and the results of a sensitivity study conducted to investigate their behavior for various inputs. The results of the sensitivity study can be used to set the risk parameters that specify the level of stockpile problem that the surveillance program should be addressing.« less

Pragmatic quality metrics for evolutionary software development models

NASA Technical Reports Server (NTRS)

Royce, Walker

1990-01-01

Due to the large number of product, project, and people parameters which impact large custom software development efforts, measurement of software product quality is a complex undertaking. Furthermore, the absolute perspective from which quality is measured (customer satisfaction) is intangible. While we probably can't say what the absolute quality of a software product is, we can determine the relative quality, the adequacy of this quality with respect to pragmatic considerations, and identify good and bad trends during development. While no two software engineers will ever agree on an optimum definition of software quality, they will agree that the most important perspective of software quality is its ease of change. We can call this flexibility, adaptability, or some other vague term, but the critical characteristic of software is that it is soft. The easier the product is to modify, the easier it is to achieve any other software quality perspective. This paper presents objective quality metrics derived from consistent lifecycle perspectives of rework which, when used in concert with an evolutionary development approach, can provide useful insight to produce better quality per unit cost/schedule or to achieve adequate quality more efficiently. The usefulness of these metrics is evaluated by applying them to a large, real world, Ada project.
Results Oriented Benchmarking: The Evolution of Benchmarking at NASA from Competitive Comparisons to World Class Space Partnerships

NASA Technical Reports Server (NTRS)

Bell, Michael A.

1999-01-01

Informal benchmarking using personal or professional networks has taken place for many years at the Kennedy Space Center (KSC). The National Aeronautics and Space Administration (NASA) recognized early on, the need to formalize the benchmarking process for better utilization of resources and improved benchmarking performance. The need to compete in a faster, better, cheaper environment has been the catalyst for formalizing these efforts. A pioneering benchmarking consortium was chartered at KSC in January 1994. The consortium known as the Kennedy Benchmarking Clearinghouse (KBC), is a collaborative effort of NASA and all major KSC contractors. The charter of this consortium is to facilitate effective benchmarking, and leverage the resulting quality improvements across KSC. The KBC acts as a resource with experienced facilitators and a proven process. One of the initial actions of the KBC was to develop a holistic methodology for Center-wide benchmarking. This approach to Benchmarking integrates the best features of proven benchmarking models (i.e., Camp, Spendolini, Watson, and Balm). This cost-effective alternative to conventional Benchmarking approaches has provided a foundation for consistent benchmarking at KSC through the development of common terminology, tools, and techniques. Through these efforts a foundation and infrastructure has been built which allows short duration benchmarking studies yielding results gleaned from world class partners that can be readily implemented. The KBC has been recognized with the Silver Medal Award (in the applied research category) from the International Benchmarking Clearinghouse.
Developing a Benchmark Tool for Sustainable Consumption: An Iterative Process

ERIC Educational Resources Information Center

Heiskanen, E.; Timonen, P.; Nissinen, A.; Gronroos, J.; Honkanen, A.; Katajajuuri, J. -M.; Kettunen, J.; Kurppa, S.; Makinen, T.; Seppala, J.; Silvenius, F.; Virtanen, Y.; Voutilainen, P.

2007-01-01

This article presents the development process of a consumer-oriented, illustrative benchmarking tool enabling consumers to use the results of environmental life cycle assessment (LCA) to make informed decisions. LCA provides a wealth of information on the environmental impacts of products, but its results are very difficult to present concisely…
Comparative Modeling and Benchmarking Data Sets for Human Histone Deacetylases and Sirtuin Families

PubMed Central

Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

2015-01-01

Histone Deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases and other types of diseases. Virtual screening (VS) has become fairly effective approaches for drug discovery of novel and highly selective Histone Deacetylases Inhibitors (HDACIs). To facilitate the process, we constructed the Maximal Unbiased Benchmarking Data Sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs covers all 4 Classes including Class III (Sirtuins family) and 14 HDACs isoforms, composed of 631 inhibitors and 24,609 unbiased decoys. Its ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matching with ligands and maximal unbiased in terms of “artificial enrichment” and “analogue bias”. We also conducted comparative studies with DUD-E and DEKOIS 2.0 sets against HDAC2 and HDAC8 targets, and demonstrate that our MUBD-HDACs is unique in that it can be applied unbiasedly to both LBVS and SBVS approaches. In addition, we defined a novel metric, i.e. NLBScore, to detect the “2D bias” and “LBVS favorable” effect within the benchmarking sets. In summary, MUBD-HDACs is the only comprehensive and maximal-unbiased benchmark data sets for HDACs (including Sirtuins) that is available so far. MUBD-HDACs is freely available at http://www.xswlab.org/. PMID:25633490
Comparative modeling and benchmarking data sets for human histone deacetylases and sirtuin families.

PubMed

Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

2015-02-23

Histone deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases, and other types of diseases. Virtual screening (VS) has become fairly effective approaches for drug discovery of novel and highly selective histone deacetylase inhibitors (HDACIs). To facilitate the process, we constructed maximal unbiased benchmarking data sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs cover all four classes including Class III (Sirtuins family) and 14 HDAC isoforms, composed of 631 inhibitors and 24609 unbiased decoys. Its ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matching with ligands and maximal unbiased in terms of "artificial enrichment" and "analogue bias". We also conducted comparative studies with DUD-E and DEKOIS 2.0 sets against HDAC2 and HDAC8 targets and demonstrate that our MUBD-HDACs are unique in that they can be applied unbiasedly to both LBVS and SBVS approaches. In addition, we defined a novel metric, i.e. NLBScore, to detect the "2D bias" and "LBVS favorable" effect within the benchmarking sets. In summary, MUBD-HDACs are the only comprehensive and maximal-unbiased benchmark data sets for HDACs (including Sirtuins) that are available so far. MUBD-HDACs are freely available at http://www.xswlab.org/ .
Benchmarking--Measuring and Comparing for Continuous Improvement.

ERIC Educational Resources Information Center

Henczel, Sue

2002-01-01

Discussion of benchmarking focuses on the use of internal and external benchmarking by special librarians. Highlights include defining types of benchmarking; historical development; benefits, including efficiency, improved performance, increased competitiveness, and better decision making; problems, including inappropriate adaptation; developing a…
Metrication report to the Congress

NASA Technical Reports Server (NTRS)

1990-01-01

The principal NASA metrication activities for FY 1989 were a revision of NASA metric policy and evaluation of the impact of using the metric system of measurement for the design and construction of the Space Station Freedom. Additional studies provided a basis for focusing follow-on activity. In FY 1990, emphasis will shift to implementation of metric policy and development of a long-range metrication plan. The report which follows addresses Policy Development, Planning and Program Evaluation, and Supporting Activities for the past and coming year.
Benchmarks of fairness for health care reform: a policy tool for developing countries.

PubMed Central

Daniels, N.; Bryant, J.; Castano, R. A.; Dantes, O. G.; Khan, K. S.; Pannarunothai, S.

2000-01-01

Teams of collaborators from Colombia, Mexico, Pakistan, and Thailand have adapted a policy tool originally developed for evaluating health insurance reforms in the United States into "benchmarks of fairness" for assessing health system reform in developing countries. We describe briefly the history of the benchmark approach, the tool itself, and the uses to which it may be put. Fairness is a wide term that includes exposure to risk factors, access to all forms of care, and to financing. It also includes efficiency of management and resource allocation, accountability, and patient and provider autonomy. The benchmarks standardize the criteria for fairness. Reforms are then evaluated by scoring according to the degree to which they improve the situation, i.e. on a scale of -5 to 5, with zero representing the status quo. The object is to promote discussion about fairness across the disciplinary divisions that keep policy analysts and the public from understanding how trade-offs between different effects of reforms can affect the overall fairness of the reform. The benchmarks can be used at both national and provincial or district levels, and we describe plans for such uses in the collaborating sites. A striking feature of the adaptation process is that there was wide agreement on this ethical framework among the collaborating sites despite their large historical, political and cultural differences. PMID:10916911
Metrication report to the Congress

NASA Technical Reports Server (NTRS)

1991-01-01

NASA's principal metrication accomplishments for FY 1990 were establishment of metrication policy for major programs, development of an implementing instruction for overall metric policy and initiation of metrication planning for the major program offices. In FY 1991, development of an overall NASA plan and individual program office plans will be completed, requirement assessments will be performed for all support areas, and detailed assessment and transition planning will be undertaken at the institutional level. Metric feasibility decisions on a number of major programs are expected over the next 18 months.
Attack-Resistant Trust Metrics

NASA Astrophysics Data System (ADS)

Levien, Raph

The Internet is an amazingly powerful tool for connecting people together, unmatched in human history. Yet, with that power comes great potential for spam and abuse. Trust metrics are an attempt to compute the set of which people are trustworthy and which are likely attackers. This chapter presents two specific trust metrics developed and deployed on the Advogato Website, which is a community blog for free software developers. This real-world experience demonstrates that the trust metrics fulfilled their goals, but that for good results, it is important to match the assumptions of the abstract trust metric computation to the real-world implementation.
Develop metrics of tire debris on Texas highways : technical report.

DOT National Transportation Integrated Search

2017-05-01

This research effort estimated the amount, characteristics, costs, and safety implications of tire debris on Texas highways. The metrics developed by this research are based on several sources of data, including a statewide survey of debris removal p...
Benchmarking reference services: an introduction.

PubMed

Marshall, J G; Buchanan, H S

1995-01-01

Benchmarking is based on the common sense idea that someone else, either inside or outside of libraries, has found a better way of doing certain things and that your own library's performance can be improved by finding out how others do things and adopting the best practices you find. Benchmarking is one of the tools used for achieving continuous improvement in Total Quality Management (TQM) programs. Although benchmarking can be done on an informal basis, TQM puts considerable emphasis on formal data collection and performance measurement. Used to its full potential, benchmarking can provide a common measuring stick to evaluate process performance. This article introduces the general concept of benchmarking, linking it whenever possible to reference services in health sciences libraries. Data collection instruments that have potential application in benchmarking studies are discussed and the need to develop common measurement tools to facilitate benchmarking is emphasized.
Engineering performance metrics

NASA Astrophysics Data System (ADS)

Delozier, R.; Snyder, N.

1993-03-01

Implementation of a Total Quality Management (TQM) approach to engineering work required the development of a system of metrics which would serve as a meaningful management tool for evaluating effectiveness in accomplishing project objectives and in achieving improved customer satisfaction. A team effort was chartered with the goal of developing a system of engineering performance metrics which would measure customer satisfaction, quality, cost effectiveness, and timeliness. The approach to developing this system involved normal systems design phases including, conceptual design, detailed design, implementation, and integration. The lessons teamed from this effort will be explored in this paper. These lessons learned may provide a starting point for other large engineering organizations seeking to institute a performance measurement system accomplishing project objectives and in achieving improved customer satisfaction. To facilitate this effort, a team was chartered to assist in the development of the metrics system. This team, consisting of customers and Engineering staff members, was utilized to ensure that the needs and views of the customers were considered in the development of performance measurements. The development of a system of metrics is no different than the development of any type of system. It includes the steps of defining performance measurement requirements, measurement process conceptual design, performance measurement and reporting system detailed design, and system implementation and integration.
Metrication in a global environment

NASA Technical Reports Server (NTRS)

Aberg, J.

1994-01-01

A brief history about the development of the metric system of measurement is given. The need for the U.S. to implement the 'SI' metric system in the international markets, especially in the aerospace and general trade, is discussed. Development of metric implementation and experiences locally, nationally, and internationally are included.
A suite of standard post-tagging evaluation metrics can help assess tag retention for field-based fish telemetry research

USGS Publications Warehouse

Gerber, Kayla M.; Mather, Martha E.; Smith, Joseph M.

2017-01-01

Telemetry can inform many scientific and research questions if a context exists for integrating individual studies into the larger body of literature. Creating cumulative distributions of post-tagging evaluation metrics would allow individual researchers to relate their telemetry data to other studies. Widespread reporting of standard metrics is a precursor to the calculation of benchmarks for these distributions (e.g., mean, SD, 95% CI). Here we illustrate five types of standard post-tagging evaluation metrics using acoustically tagged Blue Catfish (Ictalurus furcatus) released into a Kansas reservoir. These metrics included: (1) percent of tagged fish detected overall, (2) percent of tagged fish detected daily using abacus plot data, (3) average number of (and percent of available) receiver sites visited, (4) date of last movement between receiver sites (and percent of tagged fish moving during that time period), and (5) number (and percent) of fish that egressed through exit gates. These metrics were calculated for one to three time periods: early (<10 d), during (weekly), and at the end of the study (5 months). Over three-quarters of our tagged fish were detected early (85%) and at the end (85%) of the study. Using abacus plot data, all tagged fish (100%) were detected at least one day and 96% were detected for > 5 days early in the study. On average, tagged Blue Catfish visited 9 (50%) and 13 (72%) of 18 within-reservoir receivers early and at the end of the study, respectively. At the end of the study, 73% of all tagged fish were detected moving between receivers. Creating statistical benchmarks for individual metrics can provide useful reference points. In addition, combining multiple metrics can inform ecology and research design. Consequently, individual researchers and the field of telemetry research can benefit from widespread, detailed, and standard reporting of post-tagging detection metrics.
Performance Evaluation and Benchmarking of Intelligent Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Madhavan, Raj; Messina, Elena; Tunstel, Edward

To design and develop capable, dependable, and affordable intelligent systems, their performance must be measurable. Scientific methodologies for standardization and benchmarking are crucial for quantitatively evaluating the performance of emerging robotic and intelligent systems technologies. There is currently no accepted standard for quantitatively measuring the performance of these systems against user-defined requirements; and furthermore, there is no consensus on what objective evaluation procedures need to be followed to understand the performance of these systems. The lack of reproducible and repeatable test methods has precluded researchers working towards a common goal from exchanging and communicating results, inter-comparing system performance, and leveragingmore » previous work that could otherwise avoid duplication and expedite technology transfer. Currently, this lack of cohesion in the community hinders progress in many domains, such as manufacturing, service, healthcare, and security. By providing the research community with access to standardized tools, reference data sets, and open source libraries of solutions, researchers and consumers will be able to evaluate the cost and benefits associated with intelligent systems and associated technologies. In this vein, the edited book volume addresses performance evaluation and metrics for intelligent systems, in general, while emphasizing the need and solutions for standardized methods. To the knowledge of the editors, there is not a single book on the market that is solely dedicated to the subject of performance evaluation and benchmarking of intelligent systems. Even books that address this topic do so only marginally or are out of date. The research work presented in this volume fills this void by drawing from the experiences and insights of experts gained both through theoretical development and practical implementation of intelligent systems in a variety of diverse application domains. The book
Metrication study for large space telescope

NASA Technical Reports Server (NTRS)

Creswick, F. A.; Weller, A. E.

1973-01-01

Various approaches which could be taken in developing a metric-system design for the Large Space Telescope, considering potential penalties on development cost and time, commonality with other satellite programs, and contribution to national goals for conversion to the metric system of units were investigated. Information on the problems, potential approaches, and impacts of metrication was collected from published reports on previous aerospace-industry metrication-impact studies and through numerous telephone interviews. The recommended approach to LST metrication formulated in this study cells for new components and subsystems to be designed in metric-module dimensions, but U.S. customary practice is allowed where U.S. metric standards and metric components are not available or would be unsuitable. Electrical/electronic-system design, which is presently largely metric, is considered exempt from futher metrication. An important guideline is that metric design and fabrication should in no way compromise the effectiveness of the LST equipment.
Structural Life and Reliability Metrics: Benchmarking and Verification of Probabilistic Life Prediction Codes

NASA Technical Reports Server (NTRS)

Litt, Jonathan S.; Soditus, Sherry; Hendricks, Robert C.; Zaretsky, Erwin V.

2002-01-01

Over the past two decades there has been considerable effort by NASA Glenn and others to develop probabilistic codes to predict with reasonable engineering certainty the life and reliability of critical components in rotating machinery and, more specifically, in the rotating sections of airbreathing and rocket engines. These codes have, to a very limited extent, been verified with relatively small bench rig type specimens under uniaxial loading. Because of the small and very narrow database the acceptance of these codes within the aerospace community has been limited. An alternate approach to generating statistically significant data under complex loading and environments simulating aircraft and rocket engine conditions is to obtain, catalog and statistically analyze actual field data. End users of the engines, such as commercial airlines and the military, record and store operational and maintenance information. This presentation describes a cooperative program between the NASA GRC, United Airlines, USAF Wright Laboratory, U.S. Army Research Laboratory and Australian Aeronautical & Maritime Research Laboratory to obtain and analyze these airline data for selected components such as blades, disks and combustors. These airline data will be used to benchmark and compare existing life prediction codes.
Making Benchmark Testing Work

ERIC Educational Resources Information Center

Herman, Joan L.; Baker, Eva L.

2005-01-01

Many schools are moving to develop benchmark tests to monitor their students' progress toward state standards throughout the academic year. Benchmark tests can provide the ongoing information that schools need to guide instructional programs and to address student learning problems. The authors discuss six criteria that educators can use to…
The National Practice Benchmark for oncology, 2014 report on 2013 data.

PubMed

Towle, Elaine L; Barr, Thomas R; Senese, James L

2014-11-01

The National Practice Benchmark (NPB) is a unique tool to measure oncology practices against others across the country in a way that allows meaningful comparisons despite differences in practice size or setting. In today's economic environment every oncology practice, regardless of business structure or affiliation, should be able to produce, monitor, and benchmark basic metrics to meet current business pressures for increased efficiency and efficacy of care. Although we recognize that the NPB survey results do not capture the experience of all oncology practices, practices that can and do participate demonstrate exceptional managerial capability, and this year those practices are recognized for their participation. In this report, we continue to emphasize the methodology introduced last year in which we reported medical revenue net of the cost of the drugs as net medical revenue for the hematology/oncology product line. The effect of this is to capture only the gross margin attributable to drugs as revenue. New this year, we introduce six measures of clinical data density and expand the radiation oncology benchmarks. Copyright © 2014 by American Society of Clinical Oncology.

Benchmarking Big Data Systems and the BigData Top100 List.

PubMed

Baru, Chaitanya; Bhandarkar, Milind; Nambiar, Raghunath; Poess, Meikel; Rabl, Tilmann

2013-03-01

"Big data" has become a major force of innovation across enterprises of all sizes. New platforms with increasingly more features for managing big datasets are being announced almost on a weekly basis. Yet, there is currently a lack of any means of comparability among such platforms. While the performance of traditional database systems is well understood and measured by long-established institutions such as the Transaction Processing Performance Council (TCP), there is neither a clear definition of the performance of big data systems nor a generally agreed upon metric for comparing these systems. In this article, we describe a community-based effort for defining a big data benchmark. Over the past year, a Big Data Benchmarking Community has become established in order to fill this void. The effort focuses on defining an end-to-end application-layer benchmark for measuring the performance of big data applications, with the ability to easily adapt the benchmark specification to evolving challenges in the big data space. This article describes the efforts that have been undertaken thus far toward the definition of a BigData Top100 List. While highlighting the major technical as well as organizational challenges, through this article, we also solicit community input into this process.
WWTP dynamic disturbance modelling--an essential module for long-term benchmarking development.

PubMed

Gernaey, K V; Rosen, C; Jeppsson, U

2006-01-01

Intensive use of the benchmark simulation model No. 1 (BSM1), a protocol for objective comparison of the effectiveness of control strategies in biological nitrogen removal activated sludge plants, has also revealed a number of limitations. Preliminary definitions of the long-term benchmark simulation model No. 1 (BSM1_LT) and the benchmark simulation model No. 2 (BSM2) have been made to extend BSM1 for evaluation of process monitoring methods and plant-wide control strategies, respectively. Influent-related disturbances for BSM1_LT/BSM2 are to be generated with a model, and this paper provides a general overview of the modelling methods used. Typical influent dynamic phenomena generated with the BSM1_LT/BSM2 influent disturbance model, including diurnal, weekend, seasonal and holiday effects, as well as rainfall, are illustrated with simulation results. As a result of the work described in this paper, a proposed influent model/file has been released to the benchmark developers for evaluation purposes. Pending this evaluation, a final BSM1_LT/BSM2 influent disturbance model definition is foreseen. Preliminary simulations with dynamic influent data generated by the influent disturbance model indicate that default BSM1 activated sludge plant control strategies will need extensions for BSM1_LT/BSM2 to efficiently handle 1 year of influent dynamics.
High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems

DOE PAGES

Dongarra, Jack; Heroux, Michael A.; Luszczek, Piotr

2015-08-17

Here, we describe a new high-performance conjugate-gradient (HPCG) benchmark. HPCG is composed of computations and data-access patterns commonly found in scientific applications. HPCG strives for a better correlation to existing codes from the computational science domain and to be representative of their performance. Furthermore, HPCG is meant to help drive the computer system design and implementation in directions that will better impact future performance improvement.
International E-Benchmarking: Flexible Peer Development of Authentic Learning Principles in Higher Education

ERIC Educational Resources Information Center

Leppisaari, Irja; Vainio, Leena; Herrington, Jan; Im, Yeonwook

2011-01-01

More and more, social technologies and virtual work methods are facilitating new ways of crossing boundaries in professional development and international collaborations. This paper examines the peer development of higher education teachers through the experiences of the IVBM project (International Virtual Benchmarking, 2009-2010). The…
METRICS DEVELOPMENT FOR PATENTS.

PubMed

Veiga, Daniela Francescato; Ferreira, Lydia Masako

2015-01-01

To develop a proposal for metrics for patents to be applied in assessing the postgraduate programs of Medicine III - Capes. From the reading and analysis of the 2013 area documents of all the 48 areas of Capes, a proposal for metrics for patents was developed to be applied in Medicine III programs. Except for the areas Biotechnology, Food Science, Biological Sciences III, Physical Education, Engineering I, III and IV and Interdisciplinary, most areas do not adopt a scoring system for patents. The proposal developed was based on the criteria of Biotechnology, with adaptations. In general, it will be valued, in ascending order, the deposit, the granting and licensing/production. It will also be assigned higher scores to patents registered abroad and whenever there is a participation of students. This proposal can be applied to the item Intellectual Production of the evaluation form, in subsection Technical Production/Patents. The percentage of 10% for academic programs and 40% for Masters Professionals should be maintained. The program will be scored as Very Good when it reaches 400 points or over; Good, between 200 and 399 points; Regular, between 71 and 199 points; Weak up to 70 points; Insufficient, no punctuation. Desenvolver uma proposta de métricas para patentes a serem aplicadas na avaliação dos Programas de Pós-Graduação da Área Medicina III - Capes. A partir da leitura e análise dos documentos de área de 2013 de todas as 48 Áreas da Capes, desenvolveu-se uma proposta de métricas para patentes, a ser aplicada na avaliação dos programas da área. Constatou-se que, com exceção das áreas Biotecnologia, Ciência de Alimentos, Ciências Biológicas III, Educação Física, Engenharias I, III e IV e Interdisciplinar, a maioria não adota sistema de pontuação para patentes. A proposta desenvolvida baseou-se nos critérios da Biotecnologia, com adaptações. De uma forma geral, foi valorizado, em ordem crescente, o depósito, a concessão e o
Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking.

PubMed

Yu, Jun; Yang, Xiaokang; Gao, Fei; Tao, Dacheng

2017-12-01

How do we retrieve images accurately? Also, how do we rank a group of images precisely and efficiently for specific queries? These problems are critical for researchers and engineers to generate a novel image searching engine. First, it is important to obtain an appropriate description that effectively represent the images. In this paper, multimodal features are considered for describing images. The images unique properties are reflected by visual features, which are correlated to each other. However, semantic gaps always exist between images visual features and semantics. Therefore, we utilize click feature to reduce the semantic gap. The second key issue is learning an appropriate distance metric to combine these multimodal features. This paper develops a novel deep multimodal distance metric learning (Deep-MDML) method. A structured ranking model is adopted to utilize both visual and click features in distance metric learning (DML). Specifically, images and their related ranking results are first collected to form the training set. Multimodal features, including click and visual features, are collected with these images. Next, a group of autoencoders is applied to obtain initially a distance metric in different visual spaces, and an MDML method is used to assign optimal weights for different modalities. Next, we conduct alternating optimization to train the ranking model, which is used for the ranking of new queries with click features. Compared with existing image ranking methods, the proposed method adopts a new ranking model to use multimodal features, including click features and visual features in DML. We operated experiments to analyze the proposed Deep-MDML in two benchmark data sets, and the results validate the effects of the method.
Developing a Metrics-Based Online Strategy for Libraries

ERIC Educational Resources Information Center

Pagano, Joe

2009-01-01

Purpose: The purpose of this paper is to provide an introduction to the various web metrics tools that are available, and to indicate how these might be used in libraries. Design/methodology/approach: The paper describes ways in which web metrics can be used to inform strategic decision making in libraries. Findings: A framework of possible web…
Benchmarking expert system tools

NASA Technical Reports Server (NTRS)

Riley, Gary

1988-01-01

As part of its evaluation of new technologies, the Artificial Intelligence Section of the Mission Planning and Analysis Div. at NASA-Johnson has made timing tests of several expert system building tools. Among the production systems tested were Automated Reasoning Tool, several versions of OPS5, and CLIPS (C Language Integrated Production System), an expert system builder developed by the AI section. Also included in the test were a Zetalisp version of the benchmark along with four versions of the benchmark written in Knowledge Engineering Environment, an object oriented, frame based expert system tool. The benchmarks used for testing are studied.
EPA's Benchmark Dose Modeling Software

EPA Science Inventory

The EPA developed the Benchmark Dose Software (BMDS) as a tool to help Agency risk assessors facilitate applying benchmark dose (BMD) method’s to EPA’s human health risk assessment (HHRA) documents. The application of BMD methods overcomes many well know limitations ...
Toward a benchmark material in aerogel development

NASA Astrophysics Data System (ADS)

Sibille, Laurent; Cronise, Raymond J.; Noever, David A.; Hunt, Arlon J.

1996-03-01

Discovered in the thirties, aerogels constitute today the lightest solids known while exhibiting outstanding thermal and noise insulation properties in air and vacuum. In a far-reaching collaboration, the Space Science Laboratory at NASA Marshall Space Flight Center and the Microstructured Materials Group at Lawrence Berkeley National Laboratory are engaged in a two-fold research effort aiming at characterizing the microstructure of silica aerogels and the development of benchmark samples through the use of in-orbit microgravity environment. Absence of density-driven convection flows and sedimentation is sought to produce aerogel samples with narrow distribution of pore sizes, thus largely improving transparency of the material in the visible range. Furthermore, highly isotropic distribution of doping materials are attainable even in large gels grown in microgravity. Aerospace companies (cryogenic tanks insulation and high temperature insulation of space vehicles), insulation manufacturers (household and industrial applications) as well as pharmaceutical companies (biosensors) are potential end-users of this rapidly developing technology.
Deriving phenological metrics from NDVI through an open source tool developed in QGIS

NASA Astrophysics Data System (ADS)

Duarte, Lia; Teodoro, A. C.; Gonçalves, Hernãni

2014-10-01

Vegetation indices have been commonly used over the past 30 years for studying vegetation characteristics using images collected by remote sensing satellites. One of the most commonly used is the Normalized Difference Vegetation Index (NDVI). The various stages that green vegetation undergoes during a complete growing season can be summarized through time-series analysis of NDVI data. The analysis of such time-series allow for extracting key phenological variables or metrics of a particular season. These characteristics may not necessarily correspond directly to conventional, ground-based phenological events, but do provide indications of ecosystem dynamics. A complete list of the phenological metrics that can be extracted from smoothed, time-series NDVI data is available in the USGS online resources (http://phenology.cr.usgs.gov/methods_deriving.php).This work aims to develop an open source application to automatically extract these phenological metrics from a set of satellite input data. The main advantage of QGIS for this specific application relies on the easiness and quickness in developing new plug-ins, using Python language, based on the experience of the research group in other related works. QGIS has its own application programming interface (API) with functionalities and programs to develop new features. The toolbar developed for this application was implemented using the plug-in NDVIToolbar.py. The user introduces the raster files as input and obtains a plot and a report with the metrics. The report includes the following eight metrics: SOST (Start Of Season - Time) corresponding to the day of the year identified as having a consistent upward trend in the NDVI time series; SOSN (Start Of Season - NDVI) corresponding to the NDVI value associated with SOST; EOST (End of Season - Time) which corresponds to the day of year identified at the end of a consistent downward trend in the NDVI time series; EOSN (End of Season - NDVI) corresponding to the NDVI value
Benchmarking and Its Relevance to the Library and Information Sector. Interim Findings of "Best Practice Benchmarking in the Library and Information Sector," a British Library Research and Development Department Project.

ERIC Educational Resources Information Center

Kinnell, Margaret; Garrod, Penny

This British Library Research and Development Department study assesses current activities and attitudes toward quality management in library and information services (LIS) in the academic sector as well as the commercial/industrial sector. Definitions and types of benchmarking are described, and the relevance of benchmarking to LIS is evaluated.…
Measuring Information Security: Guidelines to Build Metrics

NASA Astrophysics Data System (ADS)

von Faber, Eberhard

Measuring information security is a genuine interest of security managers. With metrics they can develop their security organization's visibility and standing within the enterprise or public authority as a whole. Organizations using information technology need to use security metrics. Despite the clear demands and advantages, security metrics are often poorly developed or ineffective parameters are collected and analysed. This paper describes best practices for the development of security metrics. First attention is drawn to motivation showing both requirements and benefits. The main body of this paper lists things which need to be observed (characteristic of metrics), things which can be measured (how measurements can be conducted) and steps for the development and implementation of metrics (procedures and planning). Analysis and communication is also key when using security metrics. Examples are also given in order to develop a better understanding. The author wants to resume, continue and develop the discussion about a topic which is or increasingly will be a critical factor of success for any security managers in larger organizations.
Benchmarking: applications to transfusion medicine.

PubMed

Apelseth, Torunn Oveland; Molnar, Laura; Arnold, Emmy; Heddle, Nancy M

2012-10-01

Benchmarking is as a structured continuous collaborative process in which comparisons for selected indicators are used to identify factors that, when implemented, will improve transfusion practices. This study aimed to identify transfusion medicine studies reporting on benchmarking, summarize the benchmarking approaches used, and identify important considerations to move the concept of benchmarking forward in the field of transfusion medicine. A systematic review of published literature was performed to identify transfusion medicine-related studies that compared at least 2 separate institutions or regions with the intention of benchmarking focusing on 4 areas: blood utilization, safety, operational aspects, and blood donation. Forty-five studies were included: blood utilization (n = 35), safety (n = 5), operational aspects of transfusion medicine (n = 5), and blood donation (n = 0). Based on predefined criteria, 7 publications were classified as benchmarking, 2 as trending, and 36 as single-event studies. Three models of benchmarking are described: (1) a regional benchmarking program that collects and links relevant data from existing electronic sources, (2) a sentinel site model where data from a limited number of sites are collected, and (3) an institutional-initiated model where a site identifies indicators of interest and approaches other institutions. Benchmarking approaches are needed in the field of transfusion medicine. Major challenges include defining best practices and developing cost-effective methods of data collection. For those interested in initiating a benchmarking program, the sentinel site model may be most effective and sustainable as a starting point, although the regional model would be the ideal goal. Copyright © 2012 Elsevier Inc. All rights reserved.
Development and Application of Benchmark Examples for Mixed-Mode I/II Quasi-Static Delamination Propagation Predictions

NASA Technical Reports Server (NTRS)

Krueger, Ronald

2012-01-01

The development of benchmark examples for quasi-static delamination propagation prediction is presented. The example is based on a finite element model of the Mixed-Mode Bending (MMB) specimen for 50% mode II. The benchmarking is demonstrated for Abaqus/Standard, however, the example is independent of the analysis software used and allows the assessment of the automated delamination propagation prediction capability in commercial finite element codes based on the virtual crack closure technique (VCCT). First, a quasi-static benchmark example was created for the specimen. Second, starting from an initially straight front, the delamination was allowed to propagate under quasi-static loading. Third, the load-displacement as well as delamination length versus applied load/displacement relationships from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Overall, the results are encouraging, but further assessment for mixed-mode delamination fatigue onset and growth is required.
Evaluation Metrics for the Paragon XP/S-15

NASA Technical Reports Server (NTRS)

Traversat, Bernard; McNab, David; Nitzberg, Bill; Fineberg, Sam; Blaylock, Bruce T. (Technical Monitor)

1993-01-01

On February 17th 1993, the Numerical Aerodynamic Simulation (NAS) facility located at the NASA Ames Research Center installed a 224 node Intel Paragon XP/S-15 system. After its installation, the Paragon was found to be in a very immature state and was unable to support a NAS users' workload, composed of a wide range of development and production activities. As a first step towards addressing this problem, we implemented a set of metrics to objectively monitor the system as operating system and hardware upgrades were installed. The metrics were designed to measure four aspects of the system that we consider essential to support our workload: availability, utilization, functionality, and performance. This report presents the metrics collected from February 1993 to August 1993. Since its installation, the Paragon availability has improved from a low of 15% uptime to a high of 80%, while its utilization has remained low. Functionality and performance have improved from merely running one of the NAS Parallel Benchmarks to running all of them faster (between 1 and 2 times) than on the iPSC/860. In spite of the progress accomplished, fundamental limitations of the Paragon operating system are restricting the Paragon from supporting the NAS workload. The maximum operating system message passing (NORMA IPC) bandwidth was measured at 11 Mbytes/s, well below the peak hardware bandwidth (175 Mbytes/s), limiting overall virtual memory and Unix services (i.e. Disk and HiPPI I/O) performance. The high NX application message passing latency (184 microns), three times than on the iPSC/860, was found to significantly degrade performance of applications relying on small message sizes. The amount of memory available for an application was found to be approximately 10 Mbytes per node, indicating that the OS is taking more space than anticipated (6 Mbytes per node).
A concept paper: using the outcomes of common surgical conditions as quality metrics to benchmark district surgical services in South Africa as part of a systematic quality improvement programme.

PubMed

Clarke, Damian L; Kong, Victor Y; Handley, Jonathan; Aldous, Colleen

2013-07-31

The fourth, fifth and sixth Millennium Development Goals relate directly to improving global healthcare and health outcomes. The focus is to improve global health outcomes by reducing maternal and childhood mortality and the burden of infectious diseases such as HIV/AIDS, tuberculosis and malaria. Specific targets and time frames have been set for these diseases. There is, however, no specific mention of surgically treated diseases in these goals, reflecting a bias that is slowly changing with emerging consensus that surgical care is an integral part of primary healthcare systems in the developing world. The disparities between the developed and developing world in terms of wealth and social indicators are reflected in disparities in access to surgical care. Health administrators must develop plans and strategies to reduce these disparities. However, any strategic plan that addresses deficits in healthcare must have a system of metrics, which benchmark the current quality of care so that specific improvement targets may be set.This concept paper outlines the role of surgical services in a primary healthcare system, highlights the ongoing disparities in access to surgical care and outcomes of surgical care, discusses the importance of a systems-based approach to healthcare and quality improvement, and reviews the current state of surgical care at district hospitals in South Africa. Finally, it proposes that the results from a recently published study on acute appendicitis, as well as data from a number of other common surgical conditions, can provide measurable outcomes across a healthcare system and so act as an indicator for judging improvements in surgical care. This would provide a framework for the introduction of collection of these outcomes as a routine epidemiological health policy tool.
Development and Application of Benchmark Examples for Mode II Static Delamination Propagation and Fatigue Growth Predictions

NASA Technical Reports Server (NTRS)

Krueger, Ronald

2011-01-01

The development of benchmark examples for static delamination propagation and cyclic delamination onset and growth prediction is presented and demonstrated for a commercial code. The example is based on a finite element model of an End-Notched Flexure (ENF) specimen. The example is independent of the analysis software used and allows the assessment of the automated delamination propagation, onset and growth prediction capabilities in commercial finite element codes based on the virtual crack closure technique (VCCT). First, static benchmark examples were created for the specimen. Second, based on the static results, benchmark examples for cyclic delamination growth were created. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Fourth, starting from an initially straight front, the delamination was allowed to grow under cyclic loading. The number of cycles to delamination onset and the number of cycles during delamination growth for each growth increment were obtained from the automated analysis and compared to the benchmark examples. Again, good agreement between the results obtained from the growth analysis and the benchmark results could be achieved by selecting the appropriate input parameters. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Selecting the appropriate input parameters, however, was not straightforward and often required an iterative procedure. Overall the results are encouraging, but further assessment for mixed-mode delamination is required.
Implementation of Benchmarking Transportation Logistics Practices and Future Benchmarking Organizations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thrower, A.W.; Patric, J.; Keister, M.

2008-07-01

The purpose of the Office of Civilian Radioactive Waste Management's (OCRWM) Logistics Benchmarking Project is to identify established government and industry practices for the safe transportation of hazardous materials which can serve as a yardstick for design and operation of OCRWM's national transportation system for shipping spent nuclear fuel and high-level radioactive waste to the proposed repository at Yucca Mountain, Nevada. The project will present logistics and transportation practices and develop implementation recommendations for adaptation by the national transportation system. This paper will describe the process used to perform the initial benchmarking study, highlight interim findings, and explain how thesemore » findings are being implemented. It will also provide an overview of the next phase of benchmarking studies. The benchmarking effort will remain a high-priority activity throughout the planning and operational phases of the transportation system. The initial phase of the project focused on government transportation programs to identify those practices which are most clearly applicable to OCRWM. These Federal programs have decades of safe transportation experience, strive for excellence in operations, and implement effective stakeholder involvement, all of which parallel OCRWM's transportation mission and vision. The initial benchmarking project focused on four business processes that are critical to OCRWM's mission success, and can be incorporated into OCRWM planning and preparation in the near term. The processes examined were: transportation business model, contract management/out-sourcing, stakeholder relations, and contingency planning. More recently, OCRWM examined logistics operations of AREVA NC's Business Unit Logistics in France. The next phase of benchmarking will focus on integrated domestic and international commercial radioactive logistic operations. The prospective companies represent large scale shippers and have vast
Metric integration architecture for product development

NASA Astrophysics Data System (ADS)

Sieger, David B.

1997-06-01

Present-day product development endeavors utilize the concurrent engineering philosophy as a logical means for incorporating a variety of viewpoints into the design of products. Since this approach provides no explicit procedural provisions, it is necessary to establish at least a mental coupling with a known design process model. The central feature of all such models is the management and transformation of information. While these models assist in structuring the design process, characterizing the basic flow of operations that are involved, they provide no guidance facilities. The significance of this feature, and the role it plays in the time required to develop products, is increasing in importance due to the inherent process dynamics, system/component complexities, and competitive forces. The methodology presented in this paper involves the use of a hierarchical system structure, discrete event system specification (DEVS), and multidimensional state variable based metrics. This approach is unique in its capability to quantify designer's actions throughout product development, provide recommendations about subsequent activity selection, and coordinate distributed activities of designers and/or design teams across all design stages. Conceptual design tool implementation results are used to demonstrate the utility of this technique in improving the incremental decision making process.

Informatics in radiology: Efficiency metrics for imaging device productivity.

PubMed

Hu, Mengqi; Pavlicek, William; Liu, Patrick T; Zhang, Muhong; Langer, Steve G; Wang, Shanshan; Place, Vicki; Miranda, Rafael; Wu, Teresa Tong

2011-01-01

Acute awareness of the costs associated with medical imaging equipment is an ever-present aspect of the current healthcare debate. However, the monitoring of productivity associated with expensive imaging devices is likely to be labor intensive, relies on summary statistics, and lacks accepted and standardized benchmarks of efficiency. In the context of the general Six Sigma DMAIC (design, measure, analyze, improve, and control) process, a World Wide Web-based productivity tool called the Imaging Exam Time Monitor was developed to accurately and remotely monitor imaging efficiency with use of Digital Imaging and Communications in Medicine (DICOM) combined with a picture archiving and communication system. Five device efficiency metrics-examination duration, table utilization, interpatient time, appointment interval time, and interseries time-were derived from DICOM values. These metrics allow the standardized measurement of productivity, to facilitate the comparative evaluation of imaging equipment use and ongoing efforts to improve efficiency. A relational database was constructed to store patient imaging data, along with device- and examination-related data. The database provides full access to ad hoc queries and can automatically generate detailed reports for administrative and business use, thereby allowing staff to monitor data for trends and to better identify possible changes that could lead to improved productivity and reduced costs in association with imaging services. © RSNA, 2011.
The NAS kernel benchmark program

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barton, J. T.

1985-01-01

A collection of benchmark test kernels that measure supercomputer performance has been developed for the use of the NAS (Numerical Aerodynamic Simulation) program at the NASA Ames Research Center. This benchmark program is described in detail and the specific ground rules are given for running the program as a performance test.
Development and evaluation of aperture-based complexity metrics using film and EPID measurements of static MLC openings

DOE Office of Scientific and Technical Information (OSTI.GOV)

Götstedt, Julia; Karlsson Hauer, Anna; Bäck, Anna, E-mail: anna.back@vgregion.se

Purpose: Complexity metrics have been suggested as a complement to measurement-based quality assurance for intensity modulated radiation therapy (IMRT) and volumetric modulated arc therapy (VMAT). However, these metrics have not yet been sufficiently validated. This study develops and evaluates new aperture-based complexity metrics in the context of static multileaf collimator (MLC) openings and compares them to previously published metrics. Methods: This study develops the converted aperture metric and the edge area metric. The converted aperture metric is based on small and irregular parts within the MLC opening that are quantified as measured distances between MLC leaves. The edge area metricmore » is based on the relative size of the region around the edges defined by the MLC. Another metric suggested in this study is the circumference/area ratio. Earlier defined aperture-based complexity metrics—the modulation complexity score, the edge metric, the ratio monitor units (MU)/Gy, the aperture area, and the aperture irregularity—are compared to the newly proposed metrics. A set of small and irregular static MLC openings are created which simulate individual IMRT/VMAT control points of various complexities. These are measured with both an amorphous silicon electronic portal imaging device and EBT3 film. The differences between calculated and measured dose distributions are evaluated using a pixel-by-pixel comparison with two global dose difference criteria of 3% and 5%. The extent of the dose differences, expressed in terms of pass rate, is used as a measure of the complexity of the MLC openings and used for the evaluation of the metrics compared in this study. The different complexity scores are calculated for each created static MLC opening. The correlation between the calculated complexity scores and the extent of the dose differences (pass rate) are analyzed in scatter plots and using Pearson’s r-values. Results: The complexity scores calculated by the
Design and development of a community carbon cycle benchmarking system for CMIP5 models

NASA Astrophysics Data System (ADS)

Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Randerson, J. T.

2013-12-01

Benchmarking has been widely used to assess the ability of atmosphere, ocean, sea ice, and land surface models to capture the spatial and temporal variability of observations during the historical period. For the carbon cycle and terrestrial ecosystems, the design and development of an open-source community platform has been an important goal as part of the International Land Model Benchmarking (ILAMB) project. Here we designed and developed a software system that enables the user to specify the models, benchmarks, and scoring systems so that results can be tailored to specific model intercomparison projects. We used this system to evaluate the performance of CMIP5 Earth system models (ESMs). Our scoring system used information from four different aspects of climate, including the climatological mean spatial pattern of gridded surface variables, seasonal cycle dynamics, the amplitude of interannual variability, and long-term decadal trends. We used this system to evaluate burned area, global biomass stocks, net ecosystem exchange, gross primary production, and ecosystem respiration from CMIP5 historical simulations. Initial results indicated that the multi-model mean often performed better than many of the individual models for most of the observational constraints.
Benchmarking and Threshold Standards in Higher Education. Staff and Educational Development Series.

ERIC Educational Resources Information Center

Smith, Helen, Ed.; Armstrong, Michael, Ed.; Brown, Sally, Ed.

This book explores the issues involved in developing standards in higher education, examining the practical issues involved in benchmarking and offering a critical analysis of the problems associated with this developmental tool. The book focuses primarily on experience in the United Kingdom (UK), but looks also at international activity in this…
Development of an Objective Space Suit Mobility Performance Metric Using Metabolic Cost and Functional Tasks

NASA Technical Reports Server (NTRS)

McFarland, Shane M.; Norcross, Jason

2016-01-01

Existing methods for evaluating EVA suit performance and mobility have historically concentrated on isolated joint range of motion and torque. However, these techniques do little to evaluate how well a suited crewmember can actually perform during an EVA. An alternative method of characterizing suited mobility through measurement of metabolic cost to the wearer has been evaluated at Johnson Space Center over the past several years. The most recent study involved six test subjects completing multiple trials of various functional tasks in each of three different space suits; the results indicated it was often possible to discern between different suit designs on the basis of metabolic cost alone. However, other variables may have an effect on real-world suited performance; namely, completion time of the task, the gravity field in which the task is completed, etc. While previous results have analyzed completion time, metabolic cost, and metabolic cost normalized to system mass individually, it is desirable to develop a single metric comprising these (and potentially other) performance metrics. This paper outlines the background upon which this single-score metric is determined to be feasible, and initial efforts to develop such a metric. Forward work includes variable coefficient determination and verification of the metric through repeated testing.
Metric Development for Continuous Process Improvement

DTIC Science & Technology

2011-03-01

observable ( Cropley , 1998). All measurement is done within a context (Morse, 2003), which is shaped by a purpose, existing knowledge, capabilities, and...performance: metrics for entrepreneurship and strategic management research. Cheltenham, UK: Edward Elgar, 2006. Print. 9. Cropley , D. H., “Towards
Translational benchmark risk analysis

PubMed Central

Piegorsch, Walter W.

2010-01-01

Translational development – in the sense of translating a mature methodology from one area of application to another, evolving area – is discussed for the use of benchmark doses in quantitative risk assessment. Illustrations are presented with traditional applications of the benchmark paradigm in biology and toxicology, and also with risk endpoints that differ from traditional toxicological archetypes. It is seen that the benchmark approach can apply to a diverse spectrum of risk management settings. This suggests a promising future for this important risk-analytic tool. Extensions of the method to a wider variety of applications represent a significant opportunity for enhancing environmental, biomedical, industrial, and socio-economic risk assessments. PMID:20953283
Ranking metrics in gene set enrichment analysis: do they matter?

PubMed

Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

2017-05-12

There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner
Development and comparison of metrics for evaluating climate models and estimation of projection uncertainty

NASA Astrophysics Data System (ADS)

Ring, Christoph; Pollinger, Felix; Kaspar-Ott, Irena; Hertig, Elke; Jacobeit, Jucundus; Paeth, Heiko

2017-04-01

The COMEPRO project (Comparison of Metrics for Probabilistic Climate Change Projections of Mediterranean Precipitation), funded by the Deutsche Forschungsgemeinschaft (DFG), is dedicated to the development of new evaluation metrics for state-of-the-art climate models. Further, we analyze implications for probabilistic projections of climate change. This study focuses on the results of 4-field matrix metrics. Here, six different approaches are compared. We evaluate 24 models of the Coupled Model Intercomparison Project Phase 3 (CMIP3), 40 of CMIP5 and 18 of the Coordinated Regional Downscaling Experiment (CORDEX). In addition to the annual and seasonal precipitation the mean temperature is analysed. We consider both 50-year trend and climatological mean for the second half of the 20th century. For the probabilistic projections of climate change A1b, A2 (CMIP3) and RCP4.5, RCP8.5 (CMIP5,CORDEX) scenarios are used. The eight main study areas are located in the Mediterranean. However, we apply our metrics to globally distributed regions as well. The metrics show high simulation quality of temperature trend and both precipitation and temperature mean for most climate models and study areas. In addition, we find high potential for model weighting in order to reduce uncertainty. These results are in line with other accepted evaluation metrics and studies. The comparison of the different 4-field approaches reveals high correlations for most metrics. The results of the metric-weighted probabilistic density functions of climate change are heterogeneous. We find for different regions and seasons both increases and decreases of uncertainty. The analysis of global study areas is consistent with the regional study areas of the Medeiterrenean.
Developing a confidence metric for the Landsat land surface temperature product

NASA Astrophysics Data System (ADS)

Laraby, Kelly G.; Schott, John R.; Raqueno, Nina

2016-05-01

Land Surface Temperature (LST) is an important Earth system data record that is useful to fields such as change detection, climate research, environmental monitoring, and smaller scale applications such as agriculture. Certain Earth-observing satellites can be used to derive this metric, and it would be extremely useful if such imagery could be used to develop a global product. Through the support of the National Aeronautics and Space Administration (NASA) and the United States Geological Survey (USGS), a LST product for the Landsat series of satellites has been developed. Currently, it has been validated for scenes in North America, with plans to expand to a trusted global product. For ideal atmospheric conditions (e.g. stable atmosphere with no clouds nearby), the LST product underestimates the surface temperature by an average of 0.26 K. When clouds are directly above or near the pixel of interest, however, errors can extend to several Kelvin. As the product approaches public release, our major goal is to develop a quality metric that will provide the user with a per-pixel map of estimated LST errors. There are several sources of error that are involved in the LST calculation process, but performing standard error propagation is a difficult task due to the complexity of the atmospheric propagation component. To circumvent this difficulty, we propose to utilize the relationship between cloud proximity and the error seen in the LST process to help develop a quality metric. This method involves calculating the distance to the nearest cloud from a pixel of interest in a scene, and recording the LST error at that location. Performing this calculation for hundreds of scenes allows us to observe the average LST error for different ranges of distances to the nearest cloud. This paper describes this process in full, and presents results for a large set of Landsat scenes.
Radiation Detection Computational Benchmark Scenarios

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shaver, Mark W.; Casella, Andrew M.; Wittman, Richard S.

2013-09-24

Modeling forms an important component of radiation detection development, allowing for testing of new detector designs, evaluation of existing equipment against a wide variety of potential threat sources, and assessing operation performance of radiation detection systems. This can, however, result in large and complex scenarios which are time consuming to model. A variety of approaches to radiation transport modeling exist with complementary strengths and weaknesses for different problems. This variety of approaches, and the development of promising new tools (such as ORNL’s ADVANTG) which combine benefits of multiple approaches, illustrates the need for a means of evaluating or comparing differentmore » techniques for radiation detection problems. This report presents a set of 9 benchmark problems for comparing different types of radiation transport calculations, identifying appropriate tools for classes of problems, and testing and guiding the development of new methods. The benchmarks were drawn primarily from existing or previous calculations with a preference for scenarios which include experimental data, or otherwise have results with a high level of confidence, are non-sensitive, and represent problem sets of interest to NA-22. From a technical perspective, the benchmarks were chosen to span a range of difficulty and to include gamma transport, neutron transport, or both and represent different important physical processes and a range of sensitivity to angular or energy fidelity. Following benchmark identification, existing information about geometry, measurements, and previous calculations were assembled. Monte Carlo results (MCNP decks) were reviewed or created and re-run in order to attain accurate computational times and to verify agreement with experimental data, when present. Benchmark information was then conveyed to ORNL in order to guide testing and development of hybrid calculations. The results of those ADVANTG calculations were then sent to
Metrics Handbook (Air Force Systems Command)

NASA Astrophysics Data System (ADS)

1991-08-01

The handbook is designed to help one develop and use good metrics. It is intended to provide sufficient information to begin developing metrics for objectives, processes, and tasks, and to steer one toward appropriate actions based on the data one collects. It should be viewed as a road map to assist one in arriving at meaningful metrics and to assist in continuous process improvement.
Design and Application of a Community Land Benchmarking System for Earth System Models

NASA Astrophysics Data System (ADS)

Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Koven, C. D.; Kluzek, E. B.; Mao, J.; Randerson, J. T.

2015-12-01

Benchmarking has been widely used to assess the ability of climate models to capture the spatial and temporal variability of observations during the historical era. For the carbon cycle and terrestrial ecosystems, the design and development of an open-source community platform has been an important goal as part of the International Land Model Benchmarking (ILAMB) project. Here we developed a new benchmarking software system that enables the user to specify the models, benchmarks, and scoring metrics, so that results can be tailored to specific model intercomparison projects. Evaluation data sets included soil and aboveground carbon stocks, fluxes of energy, carbon and water, burned area, leaf area, and climate forcing and response variables. We used this system to evaluate simulations from the 5th Phase of the Coupled Model Intercomparison Project (CMIP5) with prognostic atmospheric carbon dioxide levels over the period from 1850 to 2005 (i.e., esmHistorical simulations archived on the Earth System Grid Federation). We found that the multi-model ensemble had a high bias in incoming solar radiation across Asia, likely as a consequence of incomplete representation of aerosol effects in this region, and in South America, primarily as a consequence of a low bias in mean annual precipitation. The reduced precipitation in South America had a larger influence on gross primary production than the high bias in incoming light, and as a consequence gross primary production had a low bias relative to the observations. Although model to model variations were large, the multi-model mean had a positive bias in atmospheric carbon dioxide that has been attributed in past work to weak ocean uptake of fossil emissions. In mid latitudes of the northern hemisphere, most models overestimate latent heat fluxes in the early part of the growing season, and underestimate these fluxes in mid-summer and early fall, whereas sensible heat fluxes show the opposite trend.
40 CFR 141.172 - Disinfection profiling and benchmarking.

Code of Federal Regulations, 2011 CFR

2011-07-01

... benchmarking. 141.172 Section 141.172 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED... Disinfection-Systems Serving 10,000 or More People § 141.172 Disinfection profiling and benchmarking. (a... sanitary surveys conducted by the State. (c) Disinfection benchmarking. (1) Any system required to develop...
Benchmarking gate-based quantum computers

NASA Astrophysics Data System (ADS)

Michielsen, Kristel; Nocon, Madita; Willsch, Dennis; Jin, Fengping; Lippert, Thomas; De Raedt, Hans

2017-11-01

With the advent of public access to small gate-based quantum processors, it becomes necessary to develop a benchmarking methodology such that independent researchers can validate the operation of these processors. We explore the usefulness of a number of simple quantum circuits as benchmarks for gate-based quantum computing devices and show that circuits performing identity operations are very simple, scalable and sensitive to gate errors and are therefore very well suited for this task. We illustrate the procedure by presenting benchmark results for the IBM Quantum Experience, a cloud-based platform for gate-based quantum computing.
Model evaluation using a community benchmarking system for land surface models

NASA Astrophysics Data System (ADS)

Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Kluzek, E. B.; Koven, C. D.; Randerson, J. T.

2014-12-01

Evaluation of atmosphere, ocean, sea ice, and land surface models is an important step in identifying deficiencies in Earth system models and developing improved estimates of future change. For the land surface and carbon cycle, the design of an open-source system has been an important objective of the International Land Model Benchmarking (ILAMB) project. Here we evaluated CMIP5 and CLM models using a benchmarking system that enables users to specify models, data sets, and scoring systems so that results can be tailored to specific model intercomparison projects. Our scoring system used information from four different aspects of global datasets, including climatological mean spatial patterns, seasonal cycle dynamics, interannual variability, and long-term trends. Variable-to-variable comparisons enable investigation of the mechanistic underpinnings of model behavior, and allow for some control of biases in model drivers. Graphics modules allow users to evaluate model performance at local, regional, and global scales. Use of modular structures makes it relatively easy for users to add new variables, diagnostic metrics, benchmarking datasets, or model simulations. Diagnostic results are automatically organized into HTML files, so users can conveniently share results with colleagues. We used this system to evaluate atmospheric carbon dioxide, burned area, global biomass and soil carbon stocks, net ecosystem exchange, gross primary production, ecosystem respiration, terrestrial water storage, evapotranspiration, and surface radiation from CMIP5 historical and ESM historical simulations. We found that the multi-model mean often performed better than many of the individual models for most variables. We plan to publicly release a stable version of the software during fall of 2014 that has land surface, carbon cycle, hydrology, radiation and energy cycle components.
Development of quality metrics for ambulatory care in pediatric patients with tetralogy of Fallot.

PubMed

Villafane, Juan; Edwards, Thomas C; Diab, Karim A; Satou, Gary M; Saarel, Elizabeth; Lai, Wyman W; Serwer, Gerald A; Karpawich, Peter P; Cross, Russell; Schiff, Russell; Chowdhury, Devyani; Hougen, Thomas J

2017-12-01

The objective of this study was to develop quality metrics (QMs) relating to the ambulatory care of children after complete repair of tetralogy of Fallot (TOF). A workgroup team (WT) of pediatric cardiologists with expertise in all aspects of ambulatory cardiac management was formed at the request of the American College of Cardiology (ACC) and the Adult Congenital and Pediatric Cardiology Council (ACPC), to review published guidelines and consensus data relating to the ambulatory care of repaired TOF patients under the age of 18 years. A set of quality metrics (QMs) was proposed by the WT. The metrics went through a two-step evaluation process. In the first step, the RAND-UCLA modified Delphi methodology was employed and the metrics were voted on feasibility and validity by an expert panel. In the second step, QMs were put through an "open comments" process where feedback was provided by the ACPC members. The final QMs were approved by the ACPC council. The TOF WT formulated 9 QMs of which only 6 were submitted to the expert panel; 3 QMs passed the modified RAND-UCLA and went through the "open comments" process. Based on the feedback through the open comment process, only 1 metric was finally approved by the ACPC council. The ACPC Council was able to develop QM for ambulatory care of children with repaired TOF. These patients should have documented genetic testing for 22q11.2 deletion. However, lack of evidence in the literature made it a challenge to formulate other evidence-based QMs. © 2017 Wiley Periodicals, Inc.
Kaiser Permanente's performance improvement system, Part 1: From benchmarking to executing on strategic priorities.

PubMed

Schilling, Lisa; Chase, Alide; Kehrli, Sommer; Liu, Amy Y; Stiefel, Matt; Brentari, Ruth

2010-11-01

By 2004, senior leaders at Kaiser Permanente, the largest not-for-profit health plan in the United States, recognizing variations across service areas in quality, safety, service, and efficiency, began developing a performance improvement (PI) system to realizing best-in-class quality performance across all 35 medical centers. MEASURING SYSTEMWIDE PERFORMANCE: In 2005, a Web-based data dashboard, "Big Q," which tracks the performance of each medical center and service area against external benchmarks and internal goals, was created. PLANNING FOR PI AND BENCHMARKING PERFORMANCE: In 2006, Kaiser Permanente national and regional continued planning the PI system, and in 2007, quality, medical group, operations, and information technology leaders benchmarked five high-performing organizations to identify capabilities required to achieve consistent best-in-class organizational performance. THE PI SYSTEM: The PI system addresses the six capabilities: leadership priority setting, a systems approach to improvement, measurement capability, a learning organization, improvement capacity, and a culture of improvement. PI "deep experts" (mentors) consult with national, regional, and local leaders, and more than 500 improvement advisors are trained to manage portfolios of 90-120 day improvement initiatives at medical centers. Between the second quarter of 2008 and the first quarter of 2009, performance across all Kaiser Permanente medical centers improved on the Big Q metrics. The lessons learned in implementing and sustaining PI as it becomes fully integrated into all levels of Kaiser Permanente can be generalized to other health care systems, hospitals, and other health care organizations.
A benchmark for reaction coordinates in the transition path ensemble

PubMed Central

2016-01-01

The molecular mechanism of a reaction is embedded in its transition path ensemble, the complete collection of reactive trajectories. Utilizing the information in the transition path ensemble alone, we developed a novel metric, which we termed the emergent potential energy, for distinguishing reaction coordinates from the bath modes. The emergent potential energy can be understood as the average energy cost for making a displacement of a coordinate in the transition path ensemble. Where displacing a bath mode invokes essentially no cost, it costs significantly to move the reaction coordinate. Based on some general assumptions of the behaviors of reaction and bath coordinates in the transition path ensemble, we proved theoretically with statistical mechanics that the emergent potential energy could serve as a benchmark of reaction coordinates and demonstrated its effectiveness by applying it to a prototypical system of biomolecular dynamics. Using the emergent potential energy as guidance, we developed a committor-free and intuition-independent method for identifying reaction coordinates in complex systems. We expect this method to be applicable to a wide range of reaction processes in complex biomolecular systems. PMID:27059559

The development of a virtual reality training curriculum for colonoscopy.

PubMed

Sugden, Colin; Aggarwal, Rajesh; Banerjee, Amrita; Haycock, Adam; Thomas-Gibson, Siwan; Williams, Christopher B; Darzi, Ara

2012-07-01

The development of a structured virtual reality (VR) training curriculum for colonoscopy using high-fidelity simulation. Colonoscopy requires detailed knowledge and technical skill. Changes to working practices in recent times have reduced the availability of traditional training opportunities. Much might, therefore, be achieved by applying novel technologies such as VR simulation to colonoscopy. Scientifically developed device-specific curricula aim to maximize the yield of laboratory-based training by focusing on validated modules and linking progression to the attainment of benchmarked proficiency criteria. Fifty participants comprised of 30 novices (<10 colonoscopies), 10 intermediates (100 to 500 colonoscopies), and 10 experienced (>500 colonoscopies) colonoscopists were recruited to participate. Surrogates of proficiency, such as number of procedures undertaken, determined prospective allocation to 1 of 3 groups (novice, intermediate, and experienced). Construct validity and learning value (comparison between groups and within groups respectively) for each task and metric on the chosen simulator model determined suitability for inclusion in the curriculum. Eight tasks in possession of construct validity and significant learning curves were included in the curriculum: 3 abstract tasks, 4 part-procedural tasks, and 1 procedural task. The whole-procedure task was valid for 11 metrics including the following: "time taken to complete the task" (1238, 343, and 293 s; P < 0.001) and "insertion length with embedded tip" (23.8, 3.6, and 4.9 cm; P = 0.005). Learning curves consistently plateaued at or beyond the ninth attempt. Valid metrics were used to define benchmarks, derived from the performance of the experienced cohort, for each included task. A comprehensive, stratified, benchmarked, whole-procedure curriculum has been developed for a modern high-fidelity VR colonoscopy simulator.
Development and Application of Benchmark Examples for Mixed-Mode I/II Quasi-Static Delamination Propagation Predictions

NASA Technical Reports Server (NTRS)

Krueger, Ronald

2012-01-01

The development of benchmark examples for quasi-static delamination propagation prediction is presented and demonstrated for a commercial code. The examples are based on finite element models of the Mixed-Mode Bending (MMB) specimen. The examples are independent of the analysis software used and allow the assessment of the automated delamination propagation prediction capability in commercial finite element codes based on the virtual crack closure technique (VCCT). First, quasi-static benchmark examples were created for the specimen. Second, starting from an initially straight front, the delamination was allowed to propagate under quasi-static loading. Third, the load-displacement relationship from a propagation analysis and the benchmark results were compared, and good agreement could be achieved by selecting the appropriate input parameters. Good agreement between the results obtained from the automated propagation analysis and the benchmark results could be achieved by selecting input parameters that had previously been determined during analyses of mode I Double Cantilever Beam and mode II End Notched Flexure specimens. The benchmarking procedure proved valuable by highlighting the issues associated with choosing the input parameters of the particular implementation. Overall the results are encouraging, but further assessment for mixed-mode delamination fatigue onset and growth is required.
Information-Theoretic Benchmarking of Land Surface Models

NASA Astrophysics Data System (ADS)

Nearing, Grey; Mocko, David; Kumar, Sujay; Peters-Lidard, Christa; Xia, Youlong

2016-04-01

Benchmarking is a type of model evaluation that compares model performance against a baseline metric that is derived, typically, from a different existing model. Statistical benchmarking was used to qualitatively show that land surface models do not fully utilize information in boundary conditions [1] several years before Gong et al [2] discovered the particular type of benchmark that makes it possible to *quantify* the amount of information lost by an incorrect or imperfect model structure. This theoretical development laid the foundation for a formal theory of model benchmarking [3]. We here extend that theory to separate uncertainty contributions from the three major components of dynamical systems models [4]: model structures, model parameters, and boundary conditions describe time-dependent details of each prediction scenario. The key to this new development is the use of large-sample [5] data sets that span multiple soil types, climates, and biomes, which allows us to segregate uncertainty due to parameters from the two other sources. The benefit of this approach for uncertainty quantification and segregation is that it does not rely on Bayesian priors (although it is strictly coherent with Bayes' theorem and with probability theory), and therefore the partitioning of uncertainty into different components is *not* dependent on any a priori assumptions. We apply this methodology to assess the information use efficiency of the four land surface models that comprise the North American Land Data Assimilation System (Noah, Mosaic, SAC-SMA, and VIC). Specifically, we looked at the ability of these models to estimate soil moisture and latent heat fluxes. We found that in the case of soil moisture, about 25% of net information loss was from boundary conditions, around 45% was from model parameters, and 30-40% was from the model structures. In the case of latent heat flux, boundary conditions contributed about 50% of net uncertainty, and model structures contributed
Metrication report to the Congress

NASA Technical Reports Server (NTRS)

1989-01-01

The major NASA metrication activity of 1988 concerned the Space Station. Although the metric system was the baseline measurement system for preliminary design studies, solicitations for final design and development of the Space Station Freedom requested use of the inch-pound system because of concerns with cost impact and potential safety hazards. Under that policy, however use of the metric system would be permitted through waivers where its use was appropriate. Late in 1987, several Department of Defense decisions were made to increase commitment to the metric system, thereby broadening the potential base of metric involvement in the U.S. industry. A re-evaluation of Space Station Freedom units of measure policy was, therefore, initiated in January 1988.
Research on computer systems benchmarking

NASA Technical Reports Server (NTRS)

Smith, Alan Jay (Principal Investigator)

1996-01-01

This grant addresses the topic of research on computer systems benchmarking and is more generally concerned with performance issues in computer systems. This report reviews work in those areas during the period of NASA support under this grant. The bulk of the work performed concerned benchmarking and analysis of CPUs, compilers, caches, and benchmark programs. The first part of this work concerned the issue of benchmark performance prediction. A new approach to benchmarking and machine characterization was reported, using a machine characterizer that measures the performance of a given system in terms of a Fortran abstract machine. Another report focused on analyzing compiler performance. The performance impact of optimization in the context of our methodology for CPU performance characterization was based on the abstract machine model. Benchmark programs are analyzed in another paper. A machine-independent model of program execution was developed to characterize both machine performance and program execution. By merging these machine and program characterizations, execution time can be estimated for arbitrary machine/program combinations. The work was continued into the domain of parallel and vector machines, including the issue of caches in vector processors and multiprocessors. All of the afore-mentioned accomplishments are more specifically summarized in this report, as well as those smaller in magnitude supported by this grant.
Benchmark problems for numerical implementations of phase field models

DOE PAGES

Jokisaari, A. M.; Voorhees, P. W.; Guyer, J. E.; ...

2016-10-01

Here, we present the first set of benchmark problems for phase field models that are being developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST). While many scientific research areas use a limited set of well-established software, the growing phase field community continues to develop a wide variety of codes and lacks benchmark problems to consistently evaluate the numerical performance of new implementations. Phase field modeling has become significantly more popular as computational power has increased and is now becoming mainstream, driving the need for benchmark problems to validate and verifymore » new implementations. We follow the example set by the micromagnetics community to develop an evolving set of benchmark problems that test the usability, computational resources, numerical capabilities and physical scope of phase field simulation codes. In this paper, we propose two benchmark problems that cover the physics of solute diffusion and growth and coarsening of a second phase via a simple spinodal decomposition model and a more complex Ostwald ripening model. We demonstrate the utility of benchmark problems by comparing the results of simulations performed with two different adaptive time stepping techniques, and we discuss the needs of future benchmark problems. The development of benchmark problems will enable the results of quantitative phase field models to be confidently incorporated into integrated computational materials science and engineering (ICME), an important goal of the Materials Genome Initiative.« less
Metrication: What Can HRD Specialists Do?

ERIC Educational Resources Information Center

Short, Larry G.

1978-01-01

First discusses some features of the Metric Conversion Act which established federal support of metric system usage in the United States. Then covers the following: what HRD (Human Resources Development) specialists can do to assist their company managers during the conversion process; metric training strategies; and how to prepare for metric…
Software Quality Assurance Metrics

NASA Technical Reports Server (NTRS)

McRae, Kalindra A.

2004-01-01

Software Quality Assurance (SQA) is a planned and systematic set of activities that ensures conformance of software life cycle processes and products conform to requirements, standards and procedures. In software development, software quality means meeting requirements and a degree of excellence and refinement of a project or product. Software Quality is a set of attributes of a software product by which its quality is described and evaluated. The set of attributes includes functionality, reliability, usability, efficiency, maintainability, and portability. Software Metrics help us understand the technical process that is used to develop a product. The process is measured to improve it and the product is measured to increase quality throughout the life cycle of software. Software Metrics are measurements of the quality of software. Software is measured to indicate the quality of the product, to assess the productivity of the people who produce the product, to assess the benefits derived from new software engineering methods and tools, to form a baseline for estimation, and to help justify requests for new tools or additional training. Any part of the software development can be measured. If Software Metrics are implemented in software development, it can save time, money, and allow the organization to identify the caused of defects which have the greatest effect on software development. The summer of 2004, I worked with Cynthia Calhoun and Frank Robinson in the Software Assurance/Risk Management department. My task was to research and collect, compile, and analyze SQA Metrics that have been used in other projects that are not currently being used by the SA team and report them to the Software Assurance team to see if any metrics can be implemented in their software assurance life cycle process.
Numerical Calabi-Yau metrics

NASA Astrophysics Data System (ADS)

Douglas, Michael R.; Karp, Robert L.; Lukic, Sergio; Reinbacher, René

2008-03-01

We develop numerical methods for approximating Ricci flat metrics on Calabi-Yau hypersurfaces in projective spaces. Our approach is based on finding balanced metrics and builds on recent theoretical work by Donaldson. We illustrate our methods in detail for a one parameter family of quintics. We also suggest several ways to extend our results.
Developing of Indicators of an E-Learning Benchmarking Model for Higher Education Institutions

ERIC Educational Resources Information Center

Sae-Khow, Jirasak

2014-01-01

This study was the development of e-learning indicators used as an e-learning benchmarking model for higher education institutes. Specifically, it aimed to: 1) synthesize the e-learning indicators; 2) examine content validity by specialists; and 3) explore appropriateness of the e-learning indicators. Review of related literature included…
Community-based benchmarking of the CMIP DECK experiments

NASA Astrophysics Data System (ADS)

Gleckler, P. J.

2015-12-01

A diversity of community-based efforts are independently developing "diagnostic packages" with little or no coordination between them. A short list of examples include NCAR's Climate Variability Diagnostics Package (CVDP), ORNL's International Land Model Benchmarking (ILAMB), LBNL's Toolkit for Extreme Climate Analysis (TECA), PCMDI's Metrics Package (PMP), the EU EMBRACE ESMValTool, the WGNE MJO diagnostics package, and CFMIP diagnostics. The full value of these efforts cannot be realized without some coordination. As a first step, a WCRP effort has initiated a catalog to document candidate packages that could potentially be applied in a "repeat-use" fashion to all simulations contributed to the CMIP DECK (Diagnostic, Evaluation and Characterization of Klima) experiments. Some coordination of community-based diagnostics has the additional potential to improve how CMIP modeling groups analyze their simulations during model-development. The fact that most modeling groups now maintain a "CMIP compliant" data stream means that in principal without much effort they could readily adopt a set of well organized diagnostic capabilities specifically designed to operate on CMIP DECK experiments. Ultimately, a detailed listing of and access to analysis codes that are demonstrated to work "out of the box" with CMIP data could enable model developers (and others) to select those codes they wish to implement in-house, potentially enabling more systematic evaluation during the model development process.
Development and Implementation of a Design Metric for Systems Containing Long-Term Fluid Loops

NASA Technical Reports Server (NTRS)

Steele, John W.

2016-01-01

John Steele, a chemist and technical fellow from United Technologies Corporation, provided a water quality module to assist engineers and scientists with a metric tool to evaluate risks associated with the design of space systems with fluid loops. This design metric is a methodical, quantitative, lessons-learned based means to evaluate the robustness of a long-term fluid loop system design. The tool was developed by a cross-section of engineering disciplines who had decades of experience and problem resolution.
Medical school benchmarking - from tools to programmes.

PubMed

Wilkinson, Tim J; Hudson, Judith N; Mccoll, Geoffrey J; Hu, Wendy C Y; Jolly, Brian C; Schuwirth, Lambert W T

2015-02-01

Benchmarking among medical schools is essential, but may result in unwanted effects. To apply a conceptual framework to selected benchmarking activities of medical schools. We present an analogy between the effects of assessment on student learning and the effects of benchmarking on medical school educational activities. A framework by which benchmarking can be evaluated was developed and applied to key current benchmarking activities in Australia and New Zealand. The analogy generated a conceptual framework that tested five questions to be considered in relation to benchmarking: what is the purpose? what are the attributes of value? what are the best tools to assess the attributes of value? what happens to the results? and, what is the likely "institutional impact" of the results? If the activities were compared against a blueprint of desirable medical graduate outcomes, notable omissions would emerge. Medical schools should benchmark their performance on a range of educational activities to ensure quality improvement and to assure stakeholders that standards are being met. Although benchmarking potentially has positive benefits, it could also result in perverse incentives with unforeseen and detrimental effects on learning if it is undertaken using only a few selected assessment tools.
Learning Through Benchmarking: Developing a Relational, Prospective Approach to Benchmarking ICT in Learning and Teaching

ERIC Educational Resources Information Center

Ellis, Robert A.; Moore, Roger R.

2006-01-01

This study discusses benchmarking the use of information and communication technologies (ICT) in teaching and learning between two universities with different missions: one an Australian campus-based metropolitan university and the other a British distance-education provider. It argues that the differences notwithstanding, it is possible to…
Benchmarking: contexts and details matter.

PubMed

Zheng, Siyuan

2017-07-05

Benchmarking is an essential step in the development of computational tools. We take this opportunity to pitch in our opinions on tool benchmarking, in light of two correspondence articles published in Genome Biology.Please see related Li et al. and Newman et al. correspondence articles: www.dx.doi.org/10.1186/s13059-017-1256-5 and www.dx.doi.org/10.1186/s13059-017-1257-4.
Using Benchmarking To Influence Tuition and Fee Decisions.

ERIC Educational Resources Information Center

Hubbell, Loren W. Loomis; Massa, Robert J.; Lapovsky, Lucie

2002-01-01

Discusses the use of benchmarking in managing enrollment. Using a case study, illustrates how benchmarking can help administrators develop strategies for planning and implementing admissions and pricing practices. (EV)
Benchmarking Terrestrial Ecosystem Models in the South Central US

NASA Astrophysics Data System (ADS)

Kc, M.; Winton, K.; Langston, M. A.; Luo, Y.

2016-12-01

Ecosystem services and products are the foundation of sustainability for regional and global economy since we are directly or indirectly dependent on the ecosystem services like food, livestock, water, air, wildlife etc. It has been increasingly recognized that for sustainability concerns, the conservation problems need to be addressed in the context of entire ecosystems. This approach is even more vital in the 21st century with formidable increasing human population and rapid changes in global environment. This study was conducted to find the state of the science of ecosystem models in the South-Central region of US. The ecosystem models were benchmarked using ILAMB diagnostic package developed as a result of International Land Model Benchmarking (ILAMB) project on four main categories; viz, Ecosystem and Carbon Cycle, Hydrology Cycle, Radiation and Energy Cycle and Climate forcings. A cumulative assessment was generated with weighted seven different skill assessment metrics for the ecosystem models. This synthesis on the current state of the science of ecosystem modeling in the South-Central region of US will be highly useful towards coupling these models with climate, agronomic, hydrologic, economic or management models to better represent ecosystem dynamics as affected by climate change and human activities; and hence gain more reliable predictions of future ecosystem functions and service in the region. Better understandings of such processes will increase our ability to predict the ecosystem responses and feedbacks to environmental and human induced change in the region so that decision makers can make an informed management decisions of the ecosystem.
Establishing objective benchmarks in robotic virtual reality simulation at the level of a competent surgeon using the RobotiX Mentor simulator.

PubMed

Watkinson, William; Raison, Nicholas; Abe, Takashige; Harrison, Patrick; Khan, Shamim; Van der Poel, Henk; Dasgupta, Prokar; Ahmed, Kamran

2018-05-01

To establish objective benchmarks at the level of a competent robotic surgeon across different exercises and metrics for the RobotiX Mentor virtual reality (VR) simulator suitable for use within a robotic surgical training curriculum. This retrospective observational study analysed results from multiple data sources, all of which used the RobotiX Mentor VR simulator. 123 participants with varying experience from novice to expert completed the exercises. Competency was established as the 25th centile of the mean advanced intermediate score. Three basic skill exercises and two advanced skill exercises were used. King's College London. 84 Novice, 26 beginner intermediates, 9 advanced intermediates and 4 experts were used in this retrospective observational study. Objective benchmarks derived from the 25th centile of the mean scores of the advanced intermediates provided suitably challenging yet also achievable targets for training surgeons. The disparity in scores was greatest for the advanced exercises. Novice surgeons are able to achieve the benchmarks across all exercises in the majority of metrics. We have successfully created this proof-of-concept study, which requires validation in a larger cohort. Objective benchmarks obtained from the 25th centile of the mean scores of advanced intermediates provide clinically relevant benchmarks at the standard of a competent robotic surgeon that are challenging yet also attainable. That can be used within a VR training curriculum allowing participants to track and monitor their progress in a structured and progressional manner through five exercises. Providing clearly defined targets, ensuring that a universal training standard has been achieved across training surgeons. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
PMLB: a large benchmark suite for machine learning evaluation and comparison.

PubMed

Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H

2017-01-01

The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
Benchmarking initiatives in the water industry.

PubMed

Parena, R; Smeets, E

2001-01-01

Customer satisfaction and service care are every day pushing professionals in the water industry to seek to improve their performance, lowering costs and increasing the provided service level. Process Benchmarking is generally recognised as a systematic mechanism of comparing one's own utility with other utilities or businesses with the intent of self-improvement by adopting structures or methods used elsewhere. The IWA Task Force on Benchmarking, operating inside the Statistics and Economics Committee, has been committed to developing a general accepted concept of Process Benchmarking to support water decision-makers in addressing issues of efficiency. In a first step the Task Force disseminated among the Committee members a questionnaire focused on providing suggestions about the kind, the evolution degree and the main concepts of Benchmarking adopted in the represented Countries. A comparison among the guidelines adopted in The Netherlands and Scandinavia has recently challenged the Task Force in drafting a methodology for a worldwide process benchmarking in water industry. The paper provides a framework of the most interesting benchmarking experiences in the water sector and describes in detail both the final results of the survey and the methodology focused on identification of possible improvement areas.

Establishing Language Benchmarks for Children with Typically Developing Language and Children with Language Impairment

ERIC Educational Resources Information Center

Schmitt, Mary Beth; Logan, Jessica A. R.; Tambyraja, Sherine R.; Farquharson, Kelly; Justice, Laura M.

2017-01-01

Purpose: Practitioners, researchers, and policymakers (i.e., stakeholders) have vested interests in children's language growth yet currently do not have empirically driven methods for measuring such outcomes. The present study established language benchmarks for children with typically developing language (TDL) and children with language…
Evaluation of Two Crew Module Boilerplate Tests Using Newly Developed Calibration Metrics

NASA Technical Reports Server (NTRS)

Horta, Lucas G.; Reaves, Mercedes C.

2012-01-01

The paper discusses a application of multi-dimensional calibration metrics to evaluate pressure data from water drop tests of the Max Launch Abort System (MLAS) crew module boilerplate. Specifically, three metrics are discussed: 1) a metric to assess the probability of enveloping the measured data with the model, 2) a multi-dimensional orthogonality metric to assess model adequacy between test and analysis, and 3) a prediction error metric to conduct sensor placement to minimize pressure prediction errors. Data from similar (nearly repeated) capsule drop tests shows significant variability in the measured pressure responses. When compared to expected variability using model predictions, it is demonstrated that the measured variability cannot be explained by the model under the current uncertainty assumptions.
Human Health Benchmarks for Pesticides

EPA Pesticide Factsheets

Advanced testing methods now allow pesticides to be detected in water at very low levels. These small amounts of pesticides detected in drinking water or source water for drinking water do not necessarily indicate a health risk. The EPA has developed human health benchmarks for 363 pesticides to enable our partners to better determine whether the detection of a pesticide in drinking water or source waters for drinking water may indicate a potential health risk and to help them prioritize monitoring efforts.The table below includes benchmarks for acute (one-day) and chronic (lifetime) exposures for the most sensitive populations from exposure to pesticides that may be found in surface or ground water sources of drinking water. The table also includes benchmarks for 40 pesticides in drinking water that have the potential for cancer risk. The HHBP table includes pesticide active ingredients for which Health Advisories or enforceable National Primary Drinking Water Regulations (e.g., maximum contaminant levels) have not been developed.
Evaluation of control strategies using an oxidation ditch benchmark.

PubMed

Abusam, A; Keesman, K J; Spanjers, H; van, Straten G; Meinema, K

2002-01-01

This paper presents validation and implementation results of a benchmark developed for a specific full-scale oxidation ditch wastewater treatment plant. A benchmark is a standard simulation procedure that can be used as a tool in evaluating various control strategies proposed for wastewater treatment plants. It is based on model and performance criteria development. Testing of this benchmark, by comparing benchmark predictions to real measurements of the electrical energy consumptions and amounts of disposed sludge for a specific oxidation ditch WWTP, has shown that it can (reasonably) be used for evaluating the performance of this WWTP. Subsequently, the validated benchmark was then used in evaluating some basic and advanced control strategies. Some of the interesting results obtained are the following: (i) influent flow splitting ratio, between the first and the fourth aerated compartments of the ditch, has no significant effect on the TN concentrations in the effluent, and (ii) for evaluation of long-term control strategies, future benchmarks need to be able to assess settlers' performance.
BENCHMARKING SUSTAINABILITY ENGINEERING EDUCATION

EPA Science Inventory

The goals of this project are to develop and apply a methodology for benchmarking curricula in sustainability engineering and to identify individuals active in sustainability engineering education.
Benchmarking hypercube hardware and software

NASA Technical Reports Server (NTRS)

Grunwald, Dirk C.; Reed, Daniel A.

1986-01-01

It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns.
Career performance trajectories of Olympic swimmers: benchmarks for talent development.

PubMed

Allen, Sian V; Vandenbogaerde, Tom J; Hopkins, William G

2014-01-01

The age-related progression of elite athletes to their career-best performances can provide benchmarks for talent development. The purpose of this study was to model career performance trajectories of Olympic swimmers to develop these benchmarks. We searched the Web for annual best times of swimmers who were top 16 in pool events at the 2008 or 2012 Olympics, from each swimmer's earliest available competitive performance through to 2012. There were 6959 times in the 13 events for each sex, for 683 swimmers, with 10 ± 3 performances per swimmer (mean ± s). Progression to peak performance was tracked with individual quadratic trajectories derived using a mixed linear model that included adjustments for better performance in Olympic years and for the use of full-body polyurethane swimsuits in 2009. Analysis of residuals revealed appropriate fit of quadratic trends to the data. The trajectories provided estimates of age of peak performance and the duration of the age window of trivial improvement and decline around the peak. Men achieved peak performance later than women (24.2 ± 2.1 vs. 22.5 ± 2.4 years), while peak performance occurred at later ages for the shorter distances for both sexes (∼1.5-2.0 years between sprint and distance-event groups). Men and women had a similar duration in the peak-performance window (2.6 ± 1.5 years) and similar progressions to peak performance over four years (2.4 ± 1.2%) and eight years (9.5 ± 4.8%). These data provide performance targets for swimmers aiming to achieve elite-level performance.
Metrics in method engineering

NASA Astrophysics Data System (ADS)

Brinkkemper, S.; Rossi, M.

1994-12-01

As customizable computer aided software engineering (CASE) tools, or CASE shells, have been introduced in academia and industry, there has been a growing interest into the systematic construction of methods and their support environments, i.e. method engineering. To aid the method developers and method selectors in their tasks, we propose two sets of metrics, which measure the complexity of diagrammatic specification techniques on the one hand, and of complete systems development methods on the other hand. Proposed metrics provide a relatively fast and simple way to analyze the technique (or method) properties, and when accompanied with other selection criteria, can be used for estimating the cost of learning the technique and the relative complexity of a technique compared to others. To demonstrate the applicability of the proposed metrics, we have applied them to 34 techniques and 15 methods.
Benchmark Airport Charges

NASA Technical Reports Server (NTRS)

deWit, A.; Cohn, N.

1999-01-01

The Netherlands Directorate General of Civil Aviation (DGCA) commissioned Hague Consulting Group (HCG) to complete a benchmark study of airport charges at twenty eight airports in Europe and around the world, based on 1996 charges. This study followed previous DGCA research on the topic but included more airports in much more detail. The main purpose of this new benchmark study was to provide insight into the levels and types of airport charges worldwide and into recent changes in airport charge policy and structure, This paper describes the 1996 analysis. It is intended that this work be repeated every year in order to follow developing trends and provide the most up-to-date information possible.
Benchmark Airport Charges

NASA Technical Reports Server (NTRS)

de Wit, A.; Cohn, N.

1999-01-01

The Netherlands Directorate General of Civil Aviation (DGCA) commissioned Hague Consulting Group (HCG) to complete a benchmark study of airport charges at twenty eight airports in Europe and around the world, based on 1996 charges. This study followed previous DGCA research on the topic but included more airports in much more detail. The main purpose of this new benchmark study was to provide insight into the levels and types of airport charges worldwide and into recent changes in airport charge policy and structure. This paper describes the 1996 analysis. It is intended that this work be repeated every year in order to follow developing trends and provide the most up-to-date information possible.
Guidelines for Development and Use of Mobile Metric Education Laboratories.

ERIC Educational Resources Information Center

Carr, Edwin M.; And Others

Information is provided for projects on metric education involving the use of motor vehicles or vans as mobile laboratories or demonstration units. Included are various types and functions of mobile education facilities in common use in recent years in both mathematics and non-mathematics areas, with descriptions of several current metric mobile…
A Question of Accountability: Looking beyond Federal Mandates for Metrics That Accurately Benchmark Community College Success

ERIC Educational Resources Information Center

Joch, Alan

2014-01-01

The need for increased accountability in higher education and, specifically, the nation's community colleges-is something most educators can agree on. The challenge has, and continues to be, finding a system of metrics that meets the unique needs of two-year institutions versus their four-year-counterparts. Last summer, President Obama unveiled…
Benchmarking Evaluation Results for Prototype Extravehicular Activity Gloves

NASA Technical Reports Server (NTRS)

Aitchison, Lindsay; McFarland, Shane

2012-01-01

The Space Suit Assembly (SSA) Development Team at NASA Johnson Space Center has invested heavily in the advancement of rear-entry planetary exploration suit design but largely deferred development of extravehicular activity (EVA) glove designs, and accepted the risk of using the current flight gloves, Phase VI, for unique mission scenarios outside the Space Shuttle and International Space Station (ISS) Program realm of experience. However, as design reference missions mature, the risks of using heritage hardware have highlighted the need for developing robust new glove technologies. To address the technology gap, the NASA Game-Changing Technology group provided start-up funding for the High Performance EVA Glove (HPEG) Project in the spring of 2012. The overarching goal of the HPEG Project is to develop a robust glove design that increases human performance during EVA and creates pathway for future implementation of emergent technologies, with specific aims of increasing pressurized mobility to 60% of barehanded capability, increasing the durability by 100%, and decreasing the potential of gloves to cause injury during use. The HPEG Project focused initial efforts on identifying potential new technologies and benchmarking the performance of current state of the art gloves to identify trends in design and fit leading to establish standards and metrics against which emerging technologies can be assessed at both the component and assembly levels. The first of the benchmarking tests evaluated the quantitative mobility performance and subjective fit of four prototype gloves developed by Flagsuit LLC, Final Frontier Designs, LLC Dover, and David Clark Company as compared to the Phase VI. All of the companies were asked to design and fabricate gloves to the same set of NASA provided hand measurements (which corresponded to a single size of Phase Vi glove) and focus their efforts on improving mobility in the metacarpal phalangeal and carpometacarpal joints. Four test
Benchmarking and Performance Measurement.

ERIC Educational Resources Information Center

Town, J. Stephen

This paper defines benchmarking and its relationship to quality management, describes a project which applied the technique in a library context, and explores the relationship between performance measurement and benchmarking. Numerous benchmarking methods contain similar elements: deciding what to benchmark; identifying partners; gathering…
Using a health promotion model to promote benchmarking.

PubMed

Welby, Jane

2006-07-01

The North East (England) Neonatal Benchmarking Group has been established for almost a decade and has researched and developed a substantial number of evidence-based benchmarks. With no firm evidence that these were being used or that there was any standardisation of neonatal care throughout the region, the group embarked on a programme to review the benchmarks and determine what evidence-based guidelines were needed to support standardisation. A health promotion planning model was used by one subgroup to structure the programme; it enabled all members of the sub group to engage in the review process and provided the motivation and supporting documentation for implementation of changes in practice. The need for a regional guideline development group to complement the activity of the benchmarking group is being addressed.
Moment-based metrics for global sensitivity analysis of hydrological systems

NASA Astrophysics Data System (ADS)

Dell'Oca, Aronne; Riva, Monica; Guadagnini, Alberto

2017-12-01

We propose new metrics to assist global sensitivity analysis, GSA, of hydrological and Earth systems. Our approach allows assessing the impact of uncertain parameters on main features of the probability density function, pdf, of a target model output, y. These include the expected value of y, the spread around the mean and the degree of symmetry and tailedness of the pdf of y. Since reliable assessment of higher-order statistical moments can be computationally demanding, we couple our GSA approach with a surrogate model, approximating the full model response at a reduced computational cost. Here, we consider the generalized polynomial chaos expansion (gPCE), other model reduction techniques being fully compatible with our theoretical framework. We demonstrate our approach through three test cases, including an analytical benchmark, a simplified scenario mimicking pumping in a coastal aquifer and a laboratory-scale conservative transport experiment. Our results allow ascertaining which parameters can impact some moments of the model output pdf while being uninfluential to others. We also investigate the error associated with the evaluation of our sensitivity metrics by replacing the original system model through a gPCE. Our results indicate that the construction of a surrogate model with increasing level of accuracy might be required depending on the statistical moment considered in the GSA. The approach is fully compatible with (and can assist the development of) analysis techniques employed in the context of reduction of model complexity, model calibration, design of experiment, uncertainty quantification and risk assessment.
Reliable B Cell Epitope Predictions: Impacts of Method Development and Improved Benchmarking

PubMed Central

Kringelum, Jens Vindahl; Lundegaard, Claus; Lund, Ole; Nielsen, Morten

2012-01-01

The interaction between antibodies and antigens is one of the most important immune system mechanisms for clearing infectious organisms from the host. Antibodies bind to antigens at sites referred to as B-cell epitopes. Identification of the exact location of B-cell epitopes is essential in several biomedical applications such as; rational vaccine design, development of disease diagnostics and immunotherapeutics. However, experimental mapping of epitopes is resource intensive making in silico methods an appealing complementary approach. To date, the reported performance of methods for in silico mapping of B-cell epitopes has been moderate. Several issues regarding the evaluation data sets may however have led to the performance values being underestimated: Rarely, all potential epitopes have been mapped on an antigen, and antibodies are generally raised against the antigen in a given biological context not against the antigen monomer. Improper dealing with these aspects leads to many artificial false positive predictions and hence to incorrect low performance values. To demonstrate the impact of proper benchmark definitions, we here present an updated version of the DiscoTope method incorporating a novel spatial neighborhood definition and half-sphere exposure as surface measure. Compared to other state-of-the-art prediction methods, Discotope-2.0 displayed improved performance both in cross-validation and in independent evaluations. Using DiscoTope-2.0, we assessed the impact on performance when using proper benchmark definitions. For 13 proteins in the training data set where sufficient biological information was available to make a proper benchmark redefinition, the average AUC performance was improved from 0.791 to 0.824. Similarly, the average AUC performance on an independent evaluation data set improved from 0.712 to 0.727. Our results thus demonstrate that given proper benchmark definitions, B-cell epitope prediction methods achieve highly significant
XWeB: The XML Warehouse Benchmark

NASA Astrophysics Data System (ADS)

Mahboubi, Hadj; Darmont, Jérôme

With the emergence of XML as a standard for representing business data, new decision support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision support benchmark TPC-H. It is mainly composed of a test data warehouse that is based on a unified reference model for XML warehouses and that features XML-specific structures, and its associate XQuery decision support workload. XWeB's usage is illustrated by experiments on several XML database management systems.
A Validation of Object-Oriented Design Metrics

NASA Technical Reports Server (NTRS)

Basili, Victor R.; Briand, Lionel; Melo, Walcelio L.

1995-01-01

This paper presents the results of a study conducted at the University of Maryland in which we experimentally investigated the suite of Object-Oriented (00) design metrics introduced by [Chidamber and Kemerer, 1994]. In order to do this, we assessed these metrics as predictors of fault-prone classes. This study is complementary to [Lieand Henry, 1993] where the same suite of metrics had been used to assess frequencies of maintenance changes to classes. To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements. All eight projects were developed using a sequential life cycle model, a well-known 00 analysis/design method and the C++ programming language. Based on experimental results, the advantages and drawbacks of these 00 metrics are discussed and suggestions for improvement are provided. Several of Chidamber and Kemerer's 00 metrics appear to be adequate to predict class fault-proneness during the early phases of the life-cycle. We also showed that they are, on our data set, better predictors than "traditional" code metrics, which can only be collected at a later phase of the software development processes.
Integrated framework for developing search and discrimination metrics

NASA Astrophysics Data System (ADS)

Copeland, Anthony C.; Trivedi, Mohan M.

1997-06-01

This paper presents an experimental framework for evaluating target signature metrics as models of human visual search and discrimination. This framework is based on a prototype eye tracking testbed, the Integrated Testbed for Eye Movement Studies (ITEMS). ITEMS determines an observer's visual fixation point while he studies a displayed image scene, by processing video of the observer's eye. The utility of this framework is illustrated with an experiment using gray-scale images of outdoor scenes that contain randomly placed targets. Each target is a square region of a specific size containing pixel values from another image of an outdoor scene. The real-world analogy of this experiment is that of a military observer looking upon the sensed image of a static scene to find camouflaged enemy targets that are reported to be in the area. ITEMS provides the data necessary to compute various statistics for each target to describe how easily the observers located it, including the likelihood the target was fixated or identified and the time required to do so. The computed values of several target signature metrics are compared to these statistics, and a second-order metric based on a model of image texture was found to be the most highly correlated.

ICSBEP Benchmarks For Nuclear Data Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Briggs, J. Blair

2005-05-24

The International Criticality Safety Benchmark Evaluation Project (ICSBEP) was initiated in 1992 by the United States Department of Energy. The ICSBEP became an official activity of the Organization for Economic Cooperation and Development (OECD) -- Nuclear Energy Agency (NEA) in 1995. Representatives from the United States, United Kingdom, France, Japan, the Russian Federation, Hungary, Republic of Korea, Slovenia, Serbia and Montenegro (formerly Yugoslavia), Kazakhstan, Spain, Israel, Brazil, Poland, and the Czech Republic are now participating. South Africa, India, China, and Germany are considering participation. The purpose of the ICSBEP is to identify, evaluate, verify, and formally document a comprehensive andmore » internationally peer-reviewed set of criticality safety benchmark data. The work of the ICSBEP is published as an OECD handbook entitled ''International Handbook of Evaluated Criticality Safety Benchmark Experiments.'' The 2004 Edition of the Handbook contains benchmark specifications for 3331 critical or subcritical configurations that are intended for use in validation efforts and for testing basic nuclear data. New to the 2004 Edition of the Handbook is a draft criticality alarm / shielding type benchmark that should be finalized in 2005 along with two other similar benchmarks. The Handbook is being used extensively for nuclear data testing and is expected to be a valuable resource for code and data validation and improvement efforts for decades to come. Specific benchmarks that are useful for testing structural materials such as iron, chromium, nickel, and manganese; beryllium; lead; thorium; and 238U are highlighted.« less
Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances.

PubMed

Sáez, Carlos; Robles, Montserrat; García-Gómez, Juan M

2017-02-01

Biomedical data may be composed of individuals generated from distinct, meaningful sources. Due to possible contextual biases in the processes that generate data, there may exist an undesirable and unexpected variability among the probability distribution functions (PDFs) of the source subsamples, which, when uncontrolled, may lead to inaccurate or unreproducible research results. Classical statistical methods may have difficulties to undercover such variabilities when dealing with multi-modal, multi-type, multi-variate data. This work proposes two metrics for the analysis of stability among multiple data sources, robust to the aforementioned conditions, and defined in the context of data quality assessment. Specifically, a global probabilistic deviation and a source probabilistic outlyingness metrics are proposed. The first provides a bounded degree of the global multi-source variability, designed as an estimator equivalent to the notion of normalized standard deviation of PDFs. The second provides a bounded degree of the dissimilarity of each source to a latent central distribution. The metrics are based on the projection of a simplex geometrical structure constructed from the Jensen-Shannon distances among the sources PDFs. The metrics have been evaluated and demonstrated their correct behaviour on a simulated benchmark and with real multi-source biomedical data using the UCI Heart Disease data set. The biomedical data quality assessment based on the proposed stability metrics may improve the efficiency and effectiveness of biomedical data exploitation and research.
Developing Student Character through Disciplinary Curricula: An Analysis of UK QAA Subject Benchmark Statements

ERIC Educational Resources Information Center

Quinlan, Kathleen M.

2016-01-01

What aspects of student character are expected to be developed through disciplinary curricula? This paper examines the UK written curriculum through an analysis of the Quality Assurance Agency's subject benchmark statements for the most popular subjects studied in the UK. It explores the language, principles and intended outcomes that suggest…
Regional restoration benchmarks for Acropora cervicornis

NASA Astrophysics Data System (ADS)

Schopmeyer, Stephanie A.; Lirman, Diego; Bartels, Erich; Gilliam, David S.; Goergen, Elizabeth A.; Griffin, Sean P.; Johnson, Meaghan E.; Lustic, Caitlin; Maxwell, Kerry; Walter, Cory S.

2017-12-01

Coral gardening plays an important role in the recovery of depleted populations of threatened Acropora cervicornis in the Caribbean. Over the past decade, high survival coupled with fast growth of in situ nursery corals have allowed practitioners to create healthy and genotypically diverse nursery stocks. Currently, thousands of corals are propagated and outplanted onto degraded reefs on a yearly basis, representing a substantial increase in the abundance, biomass, and overall footprint of A. cervicornis. Here, we combined an extensive dataset collected by restoration practitioners to document early (1-2 yr) restoration success metrics in Florida and Puerto Rico, USA. By reporting region-specific data on the impacts of fragment collection on donor colonies, survivorship and productivity of nursery corals, and survivorship and productivity of outplanted corals during normal conditions, we provide the basis for a stop-light indicator framework for new or existing restoration programs to evaluate their performance. We show that current restoration methods are very effective, that no excess damage is caused to donor colonies, and that once outplanted, corals behave just as wild colonies. We also provide science-based benchmarks that can be used by programs to evaluate successes and challenges of their efforts, and to make modifications where needed. We propose that up to 10% of the biomass can be collected from healthy, large A. cervicornis donor colonies for nursery propagation. We also propose the following benchmarks for the first year of activities for A. cervicornis restoration: (1) >75% live tissue cover on donor colonies; (2) >80% survivorship of nursery corals; and (3) >70% survivorship of outplanted corals. Finally, we report productivity means of 4.4 cm yr-1 for nursery corals and 4.8 cm yr-1 for outplants as a frame of reference for ranking performance within programs. Such benchmarks, and potential subsequent adaptive actions, are needed to fully assess the
Empirical Evaluation of Hunk Metrics as Bug Predictors

NASA Astrophysics Data System (ADS)

Ferzund, Javed; Ahsan, Syed Nadeem; Wotawa, Franz

Reducing the number of bugs is a crucial issue during software development and maintenance. Software process and product metrics are good indicators of software complexity. These metrics have been used to build bug predictor models to help developers maintain the quality of software. In this paper we empirically evaluate the use of hunk metrics as predictor of bugs. We present a technique for bug prediction that works at smallest units of code change called hunks. We build bug prediction models using random forests, which is an efficient machine learning classifier. Hunk metrics are used to train the classifier and each hunk metric is evaluated for its bug prediction capabilities. Our classifier can classify individual hunks as buggy or bug-free with 86 % accuracy, 83 % buggy hunk precision and 77% buggy hunk recall. We find that history based and change level hunk metrics are better predictors of bugs than code level hunk metrics.
A Seafloor Benchmark for 3-dimensional Geodesy

NASA Astrophysics Data System (ADS)

Chadwell, C. D.; Webb, S. C.; Nooner, S. L.

2014-12-01

We have developed an inexpensive, permanent seafloor benchmark to increase the longevity of seafloor geodetic measurements. The benchmark provides a physical tie to the sea floor lasting for decades (perhaps longer) on which geodetic sensors can be repeatedly placed and removed with millimeter resolution. Global coordinates estimated with seafloor geodetic techniques will remain attached to the benchmark allowing for the interchange of sensors as they fail or become obsolete, or for the sensors to be removed and used elsewhere, all the while maintaining a coherent series of positions referenced to the benchmark. The benchmark has been designed to free fall from the sea surface with transponders attached. The transponder can be recalled via an acoustic command sent from the surface to release from the benchmark and freely float to the sea surface for recovery. The duration of the sensor attachment to the benchmark will last from a few days to a few years depending on the specific needs of the experiment. The recovered sensors are then available to be reused at other locations, or again at the same site in the future. Three pins on the sensor frame mate precisely and unambiguously with three grooves on the benchmark. To reoccupy a benchmark a Remotely Operated Vehicle (ROV) uses its manipulator arm to place the sensor pins into the benchmark grooves. In June 2014 we deployed four benchmarks offshore central Oregon. We used the ROV Jason to successfully demonstrate the removal and replacement of packages onto the benchmark. We will show the benchmark design and its operational capabilities. Presently models of megathrust slip within the Cascadia Subduction Zone (CSZ) are mostly constrained by the sub-aerial GPS vectors from the Plate Boundary Observatory, a part of Earthscope. More long-lived seafloor geodetic measures are needed to better understand the earthquake and tsunami risk associated with a large rupture of the thrust fault within the Cascadia subduction zone
Sharp metric obstructions for quasi-Einstein metrics

NASA Astrophysics Data System (ADS)

Case, Jeffrey S.

2013-02-01

Using the tractor calculus to study smooth metric measure spaces, we adapt results of Gover and Nurowski to give sharp metric obstructions to the existence of quasi-Einstein metrics on suitably generic manifolds. We do this by introducing an analogue of the Weyl tractor W to the setting of smooth metric measure spaces. The obstructions we obtain can be realized as tensorial invariants which are polynomial in the Riemann curvature tensor and its divergence. By taking suitable limits of their tensorial forms, we then find obstructions to the existence of static potentials, generalizing to higher dimensions a result of Bartnik and Tod, and to the existence of potentials for gradient Ricci solitons.
Metric Madness

ERIC Educational Resources Information Center

Kroon, Cindy D.

2007-01-01

Created for a Metric Day activity, Metric Madness is a board game for two to four players. Students review and practice metric vocabulary, measurement, and calculations by playing the game. Playing time is approximately twenty to thirty minutes.
Make It Metric.

ERIC Educational Resources Information Center

Camilli, Thomas

Measurement is perhaps the most frequently used form of mathematics. This book presents activities for learning about the metric system designed for upper intermediate and junior high levels. Discussions include: why metrics, history of metrics, changing to a metric world, teaching tips, and formulas. Activities presented are: metrics all around…
Fighter agility metrics, research, and test

NASA Technical Reports Server (NTRS)

Liefer, Randall K.; Valasek, John; Eggold, David P.

1990-01-01

Proposed new metrics to assess fighter aircraft agility are collected and analyzed. A framework for classification of these new agility metrics is developed and applied. A completed set of transient agility metrics is evaluated with a high fidelity, nonlinear F-18 simulation provided by the NASA Dryden Flight Research Center. Test techniques and data reduction methods are proposed. A method of providing cuing information to the pilot during flight test is discussed. The sensitivity of longitudinal and lateral agility metrics to deviations from the pilot cues is studied in detail. The metrics are shown to be largely insensitive to reasonable deviations from the nominal test pilot commands. Instrumentation required to quantify agility via flight test is also considered. With one exception, each of the proposed new metrics may be measured with instrumentation currently available. Simulation documentation and user instructions are provided in an appendix.
Metrics for Assessing the Quality of Groundwater Used for Public Supply, CA, USA: Equivalent-Population and Area.

PubMed

Belitz, Kenneth; Fram, Miranda S; Johnson, Tyler D

2015-07-21

Data from 11,000 public supply wells in 87 study areas were used to assess the quality of nearly all of the groundwater used for public supply in California. Two metrics were developed for quantifying groundwater quality: area with high concentrations (km(2) or proportion) and equivalent-population relying upon groundwater with high concentrations (number of people or proportion). Concentrations are considered high if they are above a human-health benchmark. When expressed as proportions, the metrics are area-weighted and population-weighted detection frequencies. On a statewide-scale, about 20% of the groundwater used for public supply has high concentrations for one or more constituents (23% by area and 18% by equivalent-population). On the basis of both area and equivalent-population, trace elements are more prevalent at high concentrations than either nitrate or organic compounds at the statewide-scale, in eight of nine hydrogeologic provinces, and in about three-quarters of the study areas. At a statewide-scale, nitrate is more prevalent than organic compounds based on area, but not on the basis of equivalent-population. The approach developed for this paper, unlike many studies, recognizes the importance of appropriately weighting information when changing scales, and is broadly applicable to other areas.
Benchmark Credentialing Results for NRG-BR001: The First National Cancer Institute-Sponsored Trial of Stereotactic Body Radiation Therapy for Multiple Metastases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Al-Hallaq, Hania A., E-mail: halhallaq@radonc.uchicago.edu; Chmura, Steven J.; Salama, Joseph K.

Purpose: The NRG-BR001 trial is the first National Cancer Institute–sponsored trial to treat multiple (range 2-4) extracranial metastases with stereotactic body radiation therapy. Benchmark credentialing is required to ensure adherence to this complex protocol, in particular, for metastases in close proximity. The present report summarizes the dosimetric results and approval rates. Methods and Materials: The benchmark used anonymized data from a patient with bilateral adrenal metastases, separated by <5 cm of normal tissue. Because the planning target volume (PTV) overlaps with organs at risk (OARs), institutions must use the planning priority guidelines to balance PTV coverage (45 Gy in 3 fractions) againstmore » OAR sparing. Submitted plans were processed by the Imaging and Radiation Oncology Core and assessed by the protocol co-chairs by comparing the doses to targets, OARs, and conformity metrics using nonparametric tests. Results: Of 63 benchmarks submitted through October 2015, 94% were approved, with 51% approved at the first attempt. Most used volumetric arc therapy (VMAT) (78%), a single plan for both PTVs (90%), and prioritized the PTV over the stomach (75%). The median dose to 95% of the volume was 44.8 ± 1.0 Gy and 44.9 ± 1.0 Gy for the right and left PTV, respectively. The median dose to 0.03 cm{sup 3} was 14.2 ± 2.2 Gy to the spinal cord and 46.5 ± 3.1 Gy to the stomach. Plans that spared the stomach significantly reduced the dose to the left PTV and stomach. Conformity metrics were significantly better for single plans that simultaneously treated both PTVs with VMAT, intensity modulated radiation therapy, or 3-dimensional conformal radiation therapy compared with separate plans. No significant differences existed in the dose at 2 cm from the PTVs. Conclusions: Although most plans used VMAT, the range of conformity and dose falloff was large. The decision to prioritize either OARs or PTV coverage varied considerably, suggesting
Benchmark Credentialing Results for NRG-BR001: The First National Cancer Institute-Sponsored Trial of Stereotactic Body Radiation Therapy for Multiple Metastases.

PubMed

Al-Hallaq, Hania A; Chmura, Steven J; Salama, Joseph K; Lowenstein, Jessica R; McNulty, Susan; Galvin, James M; Followill, David S; Robinson, Clifford G; Pisansky, Thomas M; Winter, Kathryn A; White, Julia R; Xiao, Ying; Matuszak, Martha M

2017-01-01

The NRG-BR001 trial is the first National Cancer Institute-sponsored trial to treat multiple (range 2-4) extracranial metastases with stereotactic body radiation therapy. Benchmark credentialing is required to ensure adherence to this complex protocol, in particular, for metastases in close proximity. The present report summarizes the dosimetric results and approval rates. The benchmark used anonymized data from a patient with bilateral adrenal metastases, separated by <5 cm of normal tissue. Because the planning target volume (PTV) overlaps with organs at risk (OARs), institutions must use the planning priority guidelines to balance PTV coverage (45 Gy in 3 fractions) against OAR sparing. Submitted plans were processed by the Imaging and Radiation Oncology Core and assessed by the protocol co-chairs by comparing the doses to targets, OARs, and conformity metrics using nonparametric tests. Of 63 benchmarks submitted through October 2015, 94% were approved, with 51% approved at the first attempt. Most used volumetric arc therapy (VMAT) (78%), a single plan for both PTVs (90%), and prioritized the PTV over the stomach (75%). The median dose to 95% of the volume was 44.8 ± 1.0 Gy and 44.9 ± 1.0 Gy for the right and left PTV, respectively. The median dose to 0.03 cm 3 was 14.2 ± 2.2 Gy to the spinal cord and 46.5 ± 3.1 Gy to the stomach. Plans that spared the stomach significantly reduced the dose to the left PTV and stomach. Conformity metrics were significantly better for single plans that simultaneously treated both PTVs with VMAT, intensity modulated radiation therapy, or 3-dimensional conformal radiation therapy compared with separate plans. No significant differences existed in the dose at 2 cm from the PTVs. Although most plans used VMAT, the range of conformity and dose falloff was large. The decision to prioritize either OARs or PTV coverage varied considerably, suggesting that the toxicity outcomes in the trial could be affected. Several
Benchmarking an unstructured grid sediment model in an energetic estuary

DOE PAGES

Lopez, Jesse E.; Baptista, António M.

2016-12-14

A sediment model coupled to the hydrodynamic model SELFE is validated against a benchmark combining a set of idealized tests and an application to a field-data rich energetic estuary. After sensitivity studies, model results for the idealized tests largely agree with previously reported results from other models in addition to analytical, semi-analytical, or laboratory results. Results of suspended sediment in an open channel test with fixed bottom are sensitive to turbulence closure and treatment for hydrodynamic bottom boundary. Results for the migration of a trench are very sensitive to critical stress and erosion rate, but largely insensitive to turbulence closure.more » The model is able to qualitatively represent sediment dynamics associated with estuarine turbidity maxima in an idealized estuary. Applied to the Columbia River estuary, the model qualitatively captures sediment dynamics observed by fixed stations and shipborne profiles. Representation of the vertical structure of suspended sediment degrades when stratification is underpredicted. Across all tests, skill metrics of suspended sediments lag those of hydrodynamics even when qualitatively representing dynamics. The benchmark is fully documented in an openly available repository to encourage unambiguous comparisons against other models.« less
Metric analysis and data validation across FORTRAN projects

NASA Technical Reports Server (NTRS)

Basili, Victor R.; Selby, Richard W., Jr.; Phillips, Tsai-Yun

1983-01-01

The desire to predict the effort in developing or explaining the quality of software has led to the proposal of several metrics. As a step toward validating these metrics, the Software Engineering Laboratory (SEL) has analyzed the software science metrics, cyclomatic complexity, and various standard program measures for their relation to effort (including design through acceptance testing), development errors (both discrete and weighted according to the amount of time to locate and fix), and one another. The data investigated are collected from a project FORTRAN environment and examined across several projects at once, within individual projects and by reporting accuracy checks demonstrating the need to validate a database. When the data comes from individual programmers or certain validated projects, the metrics' correlations with actual effort seem to be strongest. For modules developed entirely by individual programmers, the validity ratios induce a statistically significant ordering of several of the metrics' correlations. When comparing the strongest correlations, neither software science's E metric cyclomatic complexity not source lines of code appears to relate convincingly better with effort than the others.
Metrication report to the Congress. 1991 activities and 1992 plans

NASA Technical Reports Server (NTRS)

1991-01-01

During 1991, NASA approved a revised metric use policy and developed a NASA Metric Transition Plan. This Plan targets the end of 1995 for completion of NASA's metric initiatives. This Plan also identifies future programs that NASA anticipates will use the metric system of measurement. Field installations began metric transition studies in 1991 and will complete them in 1992. Half of NASA's Space Shuttle payloads for 1991, and almost all such payloads for 1992, have some metric-based elements. In 1992, NASA will begin assessing requirements for space-quality piece parts fabricated to U.S. metric standards, leading to development and qualification of high priority parts.
EPA and EFSA approaches for Benchmark Dose modeling

EPA Science Inventory

Benchmark dose (BMD) modeling has become the preferred approach in the analysis of toxicological dose-response data for the purpose of deriving human health toxicity values. The software packages most often used are Benchmark Dose Software (BMDS, developed by EPA) and PROAST (de...
Metrication: A Guide for Consumers.

ERIC Educational Resources Information Center

Consumer and Corporate Affairs Dept., Ottawa (Ontario).

The widespread use of the metric system by most of the major industrial powers of the world has prompted the Canadian government to investigate and consider use of the system. This booklet was developed to aid the consuming public in Canada in gaining some knowledge of metrication and how its application would affect their present economy.…
Standardised Benchmarking in the Quest for Orthologs

PubMed Central

Altenhoff, Adrian M.; Boeckmann, Brigitte; Capella-Gutierrez, Salvador; Dalquen, Daniel A.; DeLuca, Todd; Forslund, Kristoffer; Huerta-Cepas, Jaime; Linard, Benjamin; Pereira, Cécile; Pryszcz, Leszek P.; Schreiber, Fabian; Sousa da Silva, Alan; Szklarczyk, Damian; Train, Clément-Marie; Bork, Peer; Lecompte, Odile; von Mering, Christian; Xenarios, Ioannis; Sjölander, Kimmen; Juhl Jensen, Lars; Martin, Maria J.; Muffato, Matthieu; Gabaldón, Toni; Lewis, Suzanna E.; Thomas, Paul D.; Sonnhammer, Erik; Dessimoz, Christophe

2016-01-01

The identification of evolutionarily related genes across different species—orthologs in particular—forms the backbone of many comparative, evolutionary, and functional genomic analyses. Achieving high accuracy in orthology inference is thus essential. Yet the true evolutionary history of genes, required to ascertain orthology, is generally unknown. Furthermore, orthologs are used for very different applications across different phyla, with different requirements in terms of the precision-recall trade-off. As a result, assessing the performance of orthology inference methods remains difficult for both users and method developers. Here, we present a community effort to establish standards in orthology benchmarking and facilitate orthology benchmarking through an automated web-based service (http://orthology.benchmarkservice.org). Using this new service, we characterise the performance of 15 well-established orthology inference methods and resources on a battery of 20 different benchmarks. Standardised benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimal requirement for new tools and resources, and guides the development of more accurate orthology inference methods. PMID:27043882
Benchmark Dose Software (BMDS) Development and ...

EPA Pesticide Factsheets

This report is intended to provide an overview of beta version 1.0 of the implementation of a model of repeated measures data referred to as the Toxicodiffusion model. The implementation described here represents the first steps towards integration of the Toxicodiffusion model into the EPA benchmark dose software (BMDS). This version runs from within BMDS 2.0 using an option screen for making model selection, as is done for other models in the BMDS 2.0 suite. This report is intended to provide an overview of beta version 1.0 of the implementation of a model of repeated measures data referred to as the Toxicodiffusion model.

Developing a Metric for the Cost of Green House Gas Abatement

DOT National Transportation Integrated Search

2017-02-28

The authors introduce the levelized cost of carbon (LCC), a metric that can be used to evaluate MassDOT CO2 abatement projects in terms of their cost-effectiveness. The study presents ways in which the metric can be used to rank projects. The data ar...
Standardised metrics for global surgical surveillance.

PubMed

Weiser, Thomas G; Makary, Martin A; Haynes, Alex B; Dziekan, Gerald; Berry, William R; Gawande, Atul A

2009-09-26

Public health surveillance relies on standardised metrics to evaluate disease burden and health system performance. Such metrics have not been developed for surgical services despite increasing volume, substantial cost, and high rates of death and disability associated with surgery. The Safe Surgery Saves Lives initiative of WHO's Patient Safety Programme has developed standardised public health metrics for surgical care that are applicable worldwide. We assembled an international panel of experts to develop and define metrics for measuring the magnitude and effect of surgical care in a population, while taking into account economic feasibility and practicability. This panel recommended six measures for assessing surgical services at a national level: number of operating rooms, number of operations, number of accredited surgeons, number of accredited anaesthesia professionals, day-of-surgery death ratio, and postoperative in-hospital death ratio. We assessed the feasibility of gathering such statistics at eight diverse hospitals in eight countries and incorporated them into the WHO Guidelines for Safe Surgery, in which methods for data collection, analysis, and reporting are outlined.
Energy-Based Metrics for Arthroscopic Skills Assessment.

PubMed

Poursartip, Behnaz; LeBel, Marie-Eve; McCracken, Laura C; Escoto, Abelardo; Patel, Rajni V; Naish, Michael D; Trejos, Ana Luisa

2017-08-05

Minimally invasive skills assessment methods are essential in developing efficient surgical simulators and implementing consistent skills evaluation. Although numerous methods have been investigated in the literature, there is still a need to further improve the accuracy of surgical skills assessment. Energy expenditure can be an indication of motor skills proficiency. The goals of this study are to develop objective metrics based on energy expenditure, normalize these metrics, and investigate classifying trainees using these metrics. To this end, different forms of energy consisting of mechanical energy and work were considered and their values were divided by the related value of an ideal performance to develop normalized metrics. These metrics were used as inputs for various machine learning algorithms including support vector machines (SVM) and neural networks (NNs) for classification. The accuracy of the combination of the normalized energy-based metrics with these classifiers was evaluated through a leave-one-subject-out cross-validation. The proposed method was validated using 26 subjects at two experience levels (novices and experts) in three arthroscopic tasks. The results showed that there are statistically significant differences between novices and experts for almost all of the normalized energy-based metrics. The accuracy of classification using SVM and NN methods was between 70% and 95% for the various tasks. The results show that the normalized energy-based metrics and their combination with SVM and NN classifiers are capable of providing accurate classification of trainees. The assessment method proposed in this study can enhance surgical training by providing appropriate feedback to trainees about their level of expertise and can be used in the evaluation of proficiency.
Toward Scalable Benchmarks for Mass Storage Systems

NASA Technical Reports Server (NTRS)

Miller, Ethan L.

1996-01-01

This paper presents guidelines for the design of a mass storage system benchmark suite, along with preliminary suggestions for programs to be included. The benchmarks will measure both peak and sustained performance of the system as well as predicting both short- and long-term behavior. These benchmarks should be both portable and scalable so they may be used on storage systems from tens of gigabytes to petabytes or more. By developing a standard set of benchmarks that reflect real user workload, we hope to encourage system designers and users to publish performance figures that can be compared with those of other systems. This will allow users to choose the system that best meets their needs and give designers a tool with which they can measure the performance effects of improvements to their systems.
Benchmarking in emergency health systems.

PubMed

Kennedy, Marcus P; Allen, Jacqueline; Allen, Greg

2002-12-01

This paper discusses the role of benchmarking as a component of quality management. It describes the historical background of benchmarking, its competitive origin and the requirement in today's health environment for a more collaborative approach. The classical 'functional and generic' types of benchmarking are discussed with a suggestion to adopt a different terminology that describes the purpose and practicalities of benchmarking. Benchmarking is not without risks. The consequence of inappropriate focus and the need for a balanced overview of process is explored. The competition that is intrinsic to benchmarking is questioned and the negative impact it may have on improvement strategies in poorly performing organizations is recognized. The difficulty in achieving cross-organizational validity in benchmarking is emphasized, as is the need to scrutinize benchmarking measures. The cost effectiveness of benchmarking projects is questioned and the concept of 'best value, best practice' in an environment of fixed resources is examined.
AN OVERVIEW OF THE DEVELOPMENT, STATUS, AND APPLICATION OF EQUILIBRIUM PARTITIONING SEDIMENT BENCHMARKS FOR PAH MIXTURES

EPA Science Inventory

This article provides an overview of the development, theoretical basis, regulatory status, and application of the U.S. Environmental Protection Agency's (USEPA's)< Equilibrium Partitioning Sediment Benchmarks (ESBs) for PAH mixtures. ESBs are compared to other sediment quality g...
Prototypic Development and Evaluation of a Medium Format Metric Camera

NASA Astrophysics Data System (ADS)

Hastedt, H.; Rofallski, R.; Luhmann, T.; Rosenbauer, R.; Ochsner, D.; Rieke-Zapp, D.

2018-05-01

Engineering applications require high-precision 3D measurement techniques for object sizes that vary between small volumes (2-3 m in each direction) and large volumes (around 20 x 20 x 1-10 m). The requested precision in object space (1σ RMS) is defined to be within 0.1-0.2 mm for large volumes and less than 0.01 mm for small volumes. In particular, focussing large volume applications the availability of a metric camera would have different advantages for several reasons: 1) high-quality optical components and stabilisations allow for a stable interior geometry of the camera itself, 2) a stable geometry leads to a stable interior orientation that enables for an a priori camera calibration, 3) a higher resulting precision can be expected. With this article the development and accuracy evaluation of a new metric camera, the ALPA 12 FPS add|metric will be presented. Its general accuracy potential is tested against calibrated lengths in a small volume test environment based on the German Guideline VDI/VDE 2634.1 (2002). Maximum length measurement errors of less than 0.025 mm are achieved with different scenarios having been tested. The accuracy potential for large volumes is estimated within a feasibility study on the application of photogrammetric measurements for the deformation estimation on a large wooden shipwreck in the German Maritime Museum. An accuracy of 0.2 mm-0.4 mm is reached for a length of 28 m (given by a distance from a lasertracker network measurement). All analyses have proven high stabilities of the interior orientation of the camera and indicate the applicability for a priori camera calibration for subsequent 3D measurements.
Benchmarking reference services: step by step.

PubMed

Buchanan, H S; Marshall, J G

1996-01-01

This article is a companion to an introductory article on benchmarking published in an earlier issue of Medical Reference Services Quarterly. Librarians interested in benchmarking often ask the following questions: How do I determine what to benchmark; how do I form a benchmarking team; how do I identify benchmarking partners; what's the best way to collect and analyze benchmarking information; and what will I do with the data? Careful planning is a critical success factor of any benchmarking project, and these questions must be answered before embarking on a benchmarking study. This article summarizes the steps necessary to conduct benchmarking research. Relevant examples of each benchmarking step are provided.
Performation Metrics Development Analysis for Information and Communications Technology Outsourcing: A Case Study

ERIC Educational Resources Information Center

Travis, James L., III

2014-01-01

This study investigated how and to what extent the development and use of the OV-5a operational architecture decomposition tree (OADT) from the Department of Defense (DoD) Architecture Framework (DoDAF) affects requirements analysis with respect to complete performance metrics for performance-based services acquisition of ICT under rigid…
Toward a perceptual video-quality metric

NASA Astrophysics Data System (ADS)

Watson, Andrew B.

1998-07-01

The advent of widespread distribution of digital video creates a need for automated methods for evaluating the visual quality of digital video. This is particularly so since most digital video is compressed using lossy methods, which involve the controlled introduction of potentially visible artifacts. Compounding the problem is the bursty nature of digital video, which requires adaptive bit allocation based on visual quality metrics, and the economic need to reduce bit-rate to the lowest level that yields acceptable quality. In previous work, we have developed visual quality metrics for evaluating, controlling,a nd optimizing the quality of compressed still images. These metrics incorporate simplified models of human visual sensitivity to spatial and chromatic visual signals. Here I describe a new video quality metric that is an extension of these still image metrics into the time domain. Like the still image metrics, it is based on the Discrete Cosine Transform. An effort has been made to minimize the amount of memory and computation required by the metric, in order that might be applied in the widest range of applications. To calibrate the basic sensitivity of this metric to spatial and temporal signals we have made measurements of visual thresholds for temporally varying samples of DCT quantization noise.
Implementation and verification of global optimization benchmark problems

NASA Astrophysics Data System (ADS)

Posypkin, Mikhail; Usov, Alexander

2017-12-01

The paper considers the implementation and verification of a test suite containing 150 benchmarks for global deterministic box-constrained optimization. A C++ library for describing standard mathematical expressions was developed for this purpose. The library automate the process of generating the value of a function and its' gradient at a given point and the interval estimates of a function and its' gradient on a given box using a single description. Based on this functionality, we have developed a collection of tests for an automatic verification of the proposed benchmarks. The verification has shown that literary sources contain mistakes in the benchmarks description. The library and the test suite are available for download and can be used freely.
GRC GSFC TDRSS Waveform Metrics Report

NASA Technical Reports Server (NTRS)

Mortensen, Dale J.

2013-01-01

The report presents software metrics and porting metrics for the GGT Waveform. The porting was from a ground-based COTS SDR, the SDR-3000, to the CoNNeCT JPL SDR. The report does not address any of the Operating Environment (OE) software development, nor the original TDRSS waveform development at GSFC for the COTS SDR. With regard to STRS, the report presents compliance data and lessons learned.
Comprehensive Metric Education Project: Implementing Metrics at a District Level Administrative Guide.

ERIC Educational Resources Information Center

Borelli, Michael L.

This document details the administrative issues associated with guiding a school district through its metrication efforts. Issues regarding staff development, curriculum development, and the acquisition of instructional resources are considered. Alternative solutions are offered. Finally, an overall implementation strategy is discussed with…
A benchmark for subduction zone modeling

NASA Astrophysics Data System (ADS)

van Keken, P.; King, S.; Peacock, S.

2003-04-01

Our understanding of subduction zones hinges critically on the ability to discern its thermal structure and dynamics. Computational modeling has become an essential complementary approach to observational and experimental studies. The accurate modeling of subduction zones is challenging due to the unique geometry, complicated rheological description and influence of fluid and melt formation. The complicated physics causes problems for the accurate numerical solution of the governing equations. As a consequence it is essential for the subduction zone community to be able to evaluate the ability and limitations of various modeling approaches. The participants of a workshop on the modeling of subduction zones, held at the University of Michigan at Ann Arbor, MI, USA in 2002, formulated a number of case studies to be developed into a benchmark similar to previous mantle convection benchmarks (Blankenbach et al., 1989; Busse et al., 1991; Van Keken et al., 1997). Our initial benchmark focuses on the dynamics of the mantle wedge and investigates three different rheologies: constant viscosity, diffusion creep, and dislocation creep. In addition we investigate the ability of codes to accurate model dynamic pressure and advection dominated flows. Proceedings of the workshop and the formulation of the benchmark are available at www.geo.lsa.umich.edu/~keken/subduction02.html We strongly encourage interested research groups to participate in this benchmark. At Nice 2003 we will provide an update and first set of benchmark results. Interested researchers are encouraged to contact one of the authors for further details.
Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking

PubMed Central

2012-01-01

A key metric to assess molecular docking remains ligand enrichment against challenging decoys. Whereas the directory of useful decoys (DUD) has been widely used, clear areas for optimization have emerged. Here we describe an improved benchmarking set that includes more diverse targets such as GPCRs and ion channels, totaling 102 proteins with 22886 clustered ligands drawn from ChEMBL, each with 50 property-matched decoys drawn from ZINC. To ensure chemotype diversity, we cluster each target’s ligands by their Bemis–Murcko atomic frameworks. We add net charge to the matched physicochemical properties and include only the most dissimilar decoys, by topology, from the ligands. An online automated tool (http://decoys.docking.org) generates these improved matched decoys for user-supplied ligands. We test this data set by docking all 102 targets, using the results to improve the balance between ligand desolvation and electrostatics in DOCK 3.6. The complete DUD-E benchmarking set is freely available at http://dude.docking.org. PMID:22716043
Development of Cardiovascular and Neurodevelopmental Metrics as Sublethal Endpoints for the Fish Embryo Toxicity Test.

PubMed

Krzykwa, Julie C; Olivas, Alexis; Jeffries, Marlo K Sellin

2018-06-19

The fathead minnow fish embryo toxicity (FET) test has been proposed as a more humane alternative to current toxicity testing methods, as younger organisms are thought to experience less distress during toxicant exposure. However, the FET test protocol does not include endpoints that allow for the prediction of sublethal adverse outcomes, limiting its utility relative to other test types. Researchers have proposed the development of sublethal endpoints for the FET test to increase its utility. The present study 1) developed methods for previously unmeasured sublethal metrics in fathead minnows (i.e., spontaneous contraction frequency and heart rate) and 2) investigated the responsiveness of several sublethal endpoints related to growth (wet weight, length, and growth-related gene expression), neurodevelopment (spontaneous contraction frequency, and neurodevelopmental gene expression), and cardiovascular function and development (pericardial area, eye size and cardiovascular related gene expression) as additional FET test metrics using the model toxicant 3,4-dichloroaniline. Of the growth, neurological and cardiovascular endpoints measured, length, eye size and pericardial area were found to more responsive than the other endpoints, respectively. Future studies linking alterations in these endpoints to longer-term adverse impacts are needed to fully evaluate the predictive power of these metrics in chemical and whole effluent toxicity testing. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
MoMaS reactive transport benchmark using PFLOTRAN

NASA Astrophysics Data System (ADS)

Park, H.

2017-12-01

MoMaS benchmark was developed to enhance numerical simulation capability for reactive transport modeling in porous media. The benchmark was published in late September of 2009; it is not taken from a real chemical system, but realistic and numerically challenging tests. PFLOTRAN is a state-of-art massively parallel subsurface flow and reactive transport code that is being used in multiple nuclear waste repository projects at Sandia National Laboratories including Waste Isolation Pilot Plant and Used Fuel Disposition. MoMaS benchmark has three independent tests with easy, medium, and hard chemical complexity. This paper demonstrates how PFLOTRAN is applied to this benchmark exercise and shows results of the easy benchmark test case which includes mixing of aqueous components and surface complexation. Surface complexations consist of monodentate and bidentate reactions which introduces difficulty in defining selectivity coefficient if the reaction applies to a bulk reference volume. The selectivity coefficient becomes porosity dependent for bidentate reaction in heterogeneous porous media. The benchmark is solved by PFLOTRAN with minimal modification to address the issue and unit conversions were made properly to suit PFLOTRAN.
Defining Sustainability Metric Targets in an Institutional Setting

ERIC Educational Resources Information Center

Rauch, Jason N.; Newman, Julie

2009-01-01

Purpose: The purpose of this paper is to expand on the development of university and college sustainability metrics by implementing an adaptable metric target strategy. Design/methodology/approach: A combined qualitative and quantitative methodology is derived that both defines what a sustainable metric target might be and describes the path a…
The KMAT: Benchmarking Knowledge Management.

ERIC Educational Resources Information Center

de Jager, Martha

Provides an overview of knowledge management and benchmarking, including the benefits and methods of benchmarking (e.g., competitive, cooperative, collaborative, and internal benchmarking). Arthur Andersen's KMAT (Knowledge Management Assessment Tool) is described. The KMAT is a collaborative benchmarking tool, designed to help organizations make…
METRICS DEVELOPMENT FOR THE QUALIS OF SOFTWARE TECHNICAL PRODUCTION.

PubMed

Scarpi, Marinho Jorge

2015-01-01

To recommend metrics to qualify software production and to propose guidelines for the CAPES quadrennial evaluation of the Post-Graduation Programs of Medicine III about this issue. Identification of the development process quality features, of the product attributes and of the software use, determined by Brazilian Association of Technical Standards (ABNT), International Organization Standardization (ISO) and International Electrotechnical (IEC), important in the perspective of the CAPES Medicine III Area correlate users, basing the creation proposal of metrics aiming to be used on four-year evaluation of Medicine III. The in use software quality perception by the user results from the provided effectiveness, productivity, security and satisfaction that originate from its characteristics of functionality, reliability, usability, efficiency, maintainability and portability (in use metrics quality). This perception depends on the specific use scenario. The software metrics should be included in the intellectual production of the program, considering the system behavior measurements results obtained by users' performance evaluation through out the favorable responses punctuation sum for the six in use metrics quality (27 sub-items, 0 to 2 points each) and for quality perception proof (four items, 0 to 10 points each). It will be considered as very good (VG) 85 to 94 points; good (G) 75 to 84 points; regular (R) 65 to 74 points; weak (W) 55 to 64 points; poor (P) <55 points. Recomendar métrica para qualificar a produção de software propondo diretrizes para a avaliação dos Programas de Pós-Graduação da Medicina III. Identificação das características de qualidade para o processo de desenvolvimento, para os atributos do produto e para o uso de software, determinadas pela Associação Brasileira de Normas Técnicas (ABNT), International Organization Standardization (ISO) e International Electrotechnical (IEC), importantes na perspectiva dos usuários correlatos

Saving Lives at Birth; development of a retrospective theory of change, impact framework and prioritised metrics.

PubMed

Lalli, Marek; Ruysen, Harriet; Blencowe, Hannah; Yee, Kristen; Clune, Karen; DeSilva, Mary; Leffler, Marissa; Hillman, Emily; El-Noush, Haitham; Mulligan, Jo; Murray, Jeffrey C; Silver, Karlee; Lawn, Joy E

2018-01-29

Grand Challenges for international health and development initiatives have received substantial funding to tackle unsolved problems; however, evidence of their effectiveness in achieving change is lacking. A theory of change may provide a useful tool to track progress towards desired outcomes. The Saving Lives at Birth partnership aims to address inequities in maternal-newborn survival through the provision of strategic investments for the development, testing and transition-to-scale of ground-breaking prevention and treatment approaches with the potential to leapfrog conventional healthcare approaches in low resource settings. We aimed to develop a theory of change and impact framework with prioritised metrics to map the initiative's contribution towards overall goals, and to measure progress towards improved outcomes around the time of birth. A theory of change and impact framework was developed retrospectively, drawing on expertise across the partnership and stakeholders. This included a document and literature review, and wide consultation, with feedback from stakeholders at all stages. Possible indicators were reviewed from global maternal-newborn health-related partner initiatives, priority indicator lists, and project indicators from current innovators. These indicators were scored across five domains to prioritise those most relevant and feasible for Saving Lives at Birth. These results informed the identification of the prioritised metrics for the initiative. The pathway to scale through Saving Lives at Birth is articulated through a theory of change and impact framework, which also highlight the roles of different actors involved in the programme. A prioritised metrics toolkit, including ten core impact indicators and five additional process indicators, complement the theory of change. The retrospective nature of this development enabled structured reflection of the program mechanics, allowing for inclusion of learning from the first four rounds of the
Object-oriented productivity metrics

NASA Technical Reports Server (NTRS)

Connell, John L.; Eller, Nancy

1992-01-01

Software productivity metrics are useful for sizing and costing proposed software and for measuring development productivity. Estimating and measuring source lines of code (SLOC) has proven to be a bad idea because it encourages writing more lines of code and using lower level languages. Function Point Analysis is an improved software metric system, but it is not compatible with newer rapid prototyping and object-oriented approaches to software development. A process is presented here for counting object-oriented effort points, based on a preliminary object-oriented analysis. It is proposed that this approach is compatible with object-oriented analysis, design, programming, and rapid prototyping. Statistics gathered on actual projects are presented to validate the approach.
Benchmarks for target tracking

NASA Astrophysics Data System (ADS)

Dunham, Darin T.; West, Philip D.

2011-09-01

The term benchmark originates from the chiseled horizontal marks that surveyors made, into which an angle-iron could be placed to bracket ("bench") a leveling rod, thus ensuring that the leveling rod can be repositioned in exactly the same place in the future. A benchmark in computer terms is the result of running a computer program, or a set of programs, in order to assess the relative performance of an object by running a number of standard tests and trials against it. This paper will discuss the history of simulation benchmarks that are being used by multiple branches of the military and agencies of the US government. These benchmarks range from missile defense applications to chemical biological situations. Typically, a benchmark is used with Monte Carlo runs in order to tease out how algorithms deal with variability and the range of possible inputs. We will also describe problems that can be solved by a benchmark.
Benchmarking specialty hospitals, a scoping review on theory and practice.

PubMed

Wind, A; van Harten, W H

2017-04-04

Although benchmarking may improve hospital processes, research on this subject is limited. The aim of this study was to provide an overview of publications on benchmarking in specialty hospitals and a description of study characteristics. We searched PubMed and EMBASE for articles published in English in the last 10 years. Eligible articles described a project stating benchmarking as its objective and involving a specialty hospital or specific patient category; or those dealing with the methodology or evaluation of benchmarking. Of 1,817 articles identified in total, 24 were included in the study. Articles were categorized into: pathway benchmarking, institutional benchmarking, articles on benchmark methodology or -evaluation and benchmarking using a patient registry. There was a large degree of variability:(1) study designs were mostly descriptive and retrospective; (2) not all studies generated and showed data in sufficient detail; and (3) there was variety in whether a benchmarking model was just described or if quality improvement as a consequence of the benchmark was reported upon. Most of the studies that described a benchmark model described the use of benchmarking partners from the same industry category, sometimes from all over the world. Benchmarking seems to be more developed in eye hospitals, emergency departments and oncology specialty hospitals. Some studies showed promising improvement effects. However, the majority of the articles lacked a structured design, and did not report on benchmark outcomes. In order to evaluate the effectiveness of benchmarking to improve quality in specialty hospitals, robust and structured designs are needed including a follow up to check whether the benchmark study has led to improvements.
Requirement Metrics for Risk Identification

NASA Technical Reports Server (NTRS)

Hammer, Theodore; Huffman, Lenore; Wilson, William; Rosenberg, Linda; Hyatt, Lawrence

1996-01-01

The Software Assurance Technology Center (SATC) is part of the Office of Mission Assurance of the Goddard Space Flight Center (GSFC). The SATC's mission is to assist National Aeronautics and Space Administration (NASA) projects to improve the quality of software which they acquire or develop. The SATC's efforts are currently focused on the development and use of metric methodologies and tools that identify and assess risks associated with software performance and scheduled delivery. This starts at the requirements phase, where the SATC, in conjunction with software projects at GSFC and other NASA centers is working to identify tools and metric methodologies to assist project managers in identifying and mitigating risks. This paper discusses requirement metrics currently being used at NASA in a collaborative effort between the SATC and the Quality Assurance Office at GSFC to utilize the information available through the application of requirements management tools.
Data Race Benchmark Collection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua

2017-03-21

This project is a benchmark suite of Open-MP parallel codes that have been checked for data races. The programs are marked to show which do and do not have races. This allows them to be leveraged while testing and developing race detection tools.
24 CFR 84.15 - Metric system of measurement.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 24 Housing and Urban Development 1 2014-04-01 2014-04-01 false Metric system of measurement. 84.15... EDUCATION, HOSPITALS, AND OTHER NON-PROFIT ORGANIZATIONS Pre-Award Requirements § 84.15 Metric system of...) declares that the metric system is the preferred measurement system for U.S. trade and commerce. The Act...
24 CFR 84.15 - Metric system of measurement.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 24 Housing and Urban Development 1 2011-04-01 2011-04-01 false Metric system of measurement. 84.15... EDUCATION, HOSPITALS, AND OTHER NON-PROFIT ORGANIZATIONS Pre-Award Requirements § 84.15 Metric system of...) declares that the metric system is the preferred measurement system for U.S. trade and commerce. The Act...
24 CFR 84.15 - Metric system of measurement.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 24 Housing and Urban Development 1 2013-04-01 2013-04-01 false Metric system of measurement. 84.15... EDUCATION, HOSPITALS, AND OTHER NON-PROFIT ORGANIZATIONS Pre-Award Requirements § 84.15 Metric system of...) declares that the metric system is the preferred measurement system for U.S. trade and commerce. The Act...
24 CFR 84.15 - Metric system of measurement.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 24 Housing and Urban Development 1 2012-04-01 2012-04-01 false Metric system of measurement. 84.15... EDUCATION, HOSPITALS, AND OTHER NON-PROFIT ORGANIZATIONS Pre-Award Requirements § 84.15 Metric system of...) declares that the metric system is the preferred measurement system for U.S. trade and commerce. The Act...
24 CFR 84.15 - Metric system of measurement.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Metric system of measurement. 84.15... EDUCATION, HOSPITALS, AND OTHER NON-PROFIT ORGANIZATIONS Pre-Award Requirements § 84.15 Metric system of...) declares that the metric system is the preferred measurement system for U.S. trade and commerce. The Act...
An Integrated Development Environment for Adiabatic Quantum Programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Humble, Travis S; McCaskey, Alex; Bennink, Ryan S

2014-01-01

Adiabatic quantum computing is a promising route to the computational power afforded by quantum information processing. The recent availability of adiabatic hardware raises the question of how well quantum programs perform. Benchmarking behavior is challenging since the multiple steps to synthesize an adiabatic quantum program are highly tunable. We present an adiabatic quantum programming environment called JADE that provides control over all the steps taken during program development. JADE captures the workflow needed to rigorously benchmark performance while also allowing a variety of problem types, programming techniques, and processor configurations. We have also integrated JADE with a quantum simulation enginemore » that enables program profiling using numerical calculation. The computational engine supports plug-ins for simulation methodologies tailored to various metrics and computing resources. We present the design, integration, and deployment of JADE and discuss its use for benchmarking adiabatic quantum programs.« less
An international land-biosphere model benchmarking activity for the IPCC Fifth Assessment Report (AR5)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoffman, Forrest M; Randerson, James T; Thornton, Peter E

2009-12-01

the carbon-nitrogen (CN) model of Thornton. Comparisons of the CLM3 offline results against observational datasets have been performed and are described in Randerson et al. (2009). CLM version 4 has been evaluated using C-LAMP, showing improvement in many of the metrics. Efforts are now underway to initiate a Nitrogen-Land Model Intercomparison Project (N-LAMP) to better constrain the effects of the nitrogen cycle in biosphere models. Presented will be new results from C-LAMP for CLM4, initial N-LAMP developments, and the proposed land-biosphere model benchmarking activity.« less
Benchmarking Global Food Safety Performances: The Era of Risk Intelligence.

PubMed

Valleé, Jean-Charles Le; Charlebois, Sylvain

2015-10-01

Food safety data segmentation and limitations hamper the world's ability to select, build up, monitor, and evaluate food safety performance. Currently, there is no metric that captures the entire food safety system, and performance data are not collected strategically on a global scale. Therefore, food safety benchmarking is essential not only to help monitor ongoing performance but also to inform continued food safety system design, adoption, and implementation toward more efficient and effective food safety preparedness, responsiveness, and accountability. This comparative study identifies and evaluates common elements among global food safety systems. It provides an overall world ranking of food safety performance for 17 Organisation for Economic Co-Operation and Development (OECD) countries, illustrated by 10 indicators organized across three food safety risk governance domains: risk assessment (chemical risks, microbial risks, and national reporting on food consumption), risk management (national food safety capacities, food recalls, food traceability, and radionuclides standards), and risk communication (allergenic risks, labeling, and public trust). Results show all countries have very high food safety standards, but Canada and Ireland, followed by France, earned excellent grades relative to their peers. However, any subsequent global ranking study should consider the development of survey instruments to gather adequate and comparable national evidence on food safety.
What Research Says to the Teacher: Metric Education.

ERIC Educational Resources Information Center

Goldbecker, Sheralyn S.

How measurement systems developed is briefly reviewed, followed by comments on the international conversion to the metric system and a lengthier discussion of the history of the metric controversy in the U.S. Statements made by supporters of the customary and metric systems are listed. The role of education is detailed in terms of teacher…
A new numerical benchmark of a freshwater lens

NASA Astrophysics Data System (ADS)

Stoeckl, L.; Walther, M.; Graf, T.

2016-04-01

A numerical benchmark for 2-D variable-density flow and solute transport in a freshwater lens is presented. The benchmark is based on results of laboratory experiments conducted by Stoeckl and Houben (2012) using a sand tank on the meter scale. This benchmark describes the formation and degradation of a freshwater lens over time as it can be found under real-world islands. An error analysis gave the appropriate spatial and temporal discretization of 1 mm and 8.64 s, respectively. The calibrated parameter set was obtained using the parameter estimation tool PEST. Comparing density-coupled and density-uncoupled results showed that the freshwater-saltwater interface position is strongly dependent on density differences. A benchmark that adequately represents saltwater intrusion and that includes realistic features of coastal aquifers or freshwater lenses was lacking. This new benchmark was thus developed and is demonstrated to be suitable to test variable-density groundwater models applied to saltwater intrusion investigations.
Benchmarking to improve the quality of cystic fibrosis care.

PubMed

Schechter, Michael S

2012-11-01

Benchmarking involves the ascertainment of healthcare programs with most favorable outcomes as a means to identify and spread effective strategies for delivery of care. The recent interest in the development of patient registries for patients with cystic fibrosis (CF) has been fueled in part by an interest in using them to facilitate benchmarking. This review summarizes reports of how benchmarking has been operationalized in attempts to improve CF care. Although certain goals of benchmarking can be accomplished with an exclusive focus on registry data analysis, benchmarking programs in Germany and the United States have supplemented these data analyses with exploratory interactions and discussions to better understand successful approaches to care and encourage their spread throughout the care network. Benchmarking allows the discovery and facilitates the spread of effective approaches to care. It provides a pragmatic alternative to traditional research methods such as randomized controlled trials, providing insights into methods that optimize delivery of care and allowing judgments about the relative effectiveness of different therapeutic approaches.
MIPS bacterial genomes functional annotation benchmark dataset.

PubMed

Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Wernen

2005-05-15

Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab
Development and application of a novel metric to assess effectiveness of biomedical data

PubMed Central

Bloom, Gregory C; Eschrich, Steven; Hang, Gang; Schabath, Matthew B; Bhansali, Neera; Hoerter, Andrew M; Morgan, Scott; Fenstermacher, David A

2013-01-01

Objective Design a metric to assess the comparative effectiveness of biomedical data elements within a study that incorporates their statistical relatedness to a given outcome variable as well as a measurement of the quality of their underlying data. Materials and methods The cohort consisted of 874 patients with adenocarcinoma of the lung, each with 47 clinical data elements. The p value for each element was calculated using the Cox proportional hazard univariable regression model with overall survival as the endpoint. An attribute or A-score was calculated by quantification of an element's four quality attributes; Completeness, Comprehensiveness, Consistency and Overall-cost. An effectiveness or E-score was obtained by calculating the conditional probabilities of the p-value and A-score within the given data set with their product equaling the effectiveness score (E-score). Results The E-score metric provided information about the utility of an element beyond an outcome-related p value ranking. E-scores for elements age-at-diagnosis, gender and tobacco-use showed utility above what their respective p values alone would indicate due to their relative ease of acquisition, that is, higher A-scores. Conversely, elements surgery-site, histologic-type and pathological-TNM stage were down-ranked in comparison to their p values based on lower A-scores caused by significantly higher acquisition costs. Conclusions A novel metric termed E-score was developed which incorporates standard statistics with data quality metrics and was tested on elements from a large lung cohort. Results show that an element's underlying data quality is an important consideration in addition to p value correlation to outcome when determining the element's clinical or research utility in a study. PMID:23975264
Candidate control design metrics for an agile fighter

NASA Technical Reports Server (NTRS)

Murphy, Patrick C.; Bailey, Melvin L.; Ostroff, Aaron J.

1991-01-01

Success in the fighter combat environment of the future will certainly demand increasing capability from aircraft technology. These advanced capabilities in the form of superagility and supermaneuverability will require special design techniques which translate advanced air combat maneuvering requirements into design criteria. Control design metrics can provide some of these techniques for the control designer. Thus study presents an overview of control design metrics and investigates metrics for advanced fighter agility. The objectives of various metric users, such as airframe designers and pilots, are differentiated from the objectives of the control designer. Using an advanced fighter model, metric values are documented over a portion of the flight envelope through piloted simulation. These metric values provide a baseline against which future control system improvements can be compared and against which a control design methodology can be developed. Agility is measured for axial, pitch, and roll axes. Axial metrics highlight acceleration and deceleration capabilities under different flight loads and include specific excess power measurements to characterize energy meneuverability. Pitch metrics cover both body-axis and wind-axis pitch rates and accelerations. Included in pitch metrics are nose pointing metrics which highlight displacement capability between the nose and the velocity vector. Roll metrics (or torsion metrics) focus on rotational capability about the wind axis.

Protein Models Docking Benchmark 2

PubMed Central

Anishchenko, Ivan; Kundrotas, Petras J.; Tuzikov, Alexander V.; Vakser, Ilya A.

2015-01-01

Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu. PMID:25712716
An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs.

PubMed

Xia, Jie; Jin, Hongwei; Liu, Zhenming; Zhang, Liangren; Wang, Xiang Simon

2014-05-27

Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could bring the biases to the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCRs targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the "artificial enrichment" and "analogue bias" of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD.
Using benchmarks for radiation testing of microprocessors and FPGAs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Quinn, Heather; Robinson, William H.; Rech, Paolo

Performance benchmarks have been used over the years to compare different systems. These benchmarks can be useful for researchers trying to determine how changes to the technology, architecture, or compiler affect the system's performance. No such standard exists for systems deployed into high radiation environments, making it difficult to assess whether changes in the fabrication process, circuitry, architecture, or software affect reliability or radiation sensitivity. In this paper, we propose a benchmark suite for high-reliability systems that is designed for field-programmable gate arrays and microprocessors. As a result, we describe the development process and report neutron test data for themore » hardware and software benchmarks.« less
Using benchmarks for radiation testing of microprocessors and FPGAs

DOE PAGES

Quinn, Heather; Robinson, William H.; Rech, Paolo; ...

2015-12-17

Performance benchmarks have been used over the years to compare different systems. These benchmarks can be useful for researchers trying to determine how changes to the technology, architecture, or compiler affect the system's performance. No such standard exists for systems deployed into high radiation environments, making it difficult to assess whether changes in the fabrication process, circuitry, architecture, or software affect reliability or radiation sensitivity. In this paper, we propose a benchmark suite for high-reliability systems that is designed for field-programmable gate arrays and microprocessors. As a result, we describe the development process and report neutron test data for themore » hardware and software benchmarks.« less
Benchmarking and the laboratory

PubMed Central

Galloway, M; Nadin, L

2001-01-01

This article describes how benchmarking can be used to assess laboratory performance. Two benchmarking schemes are reviewed, the Clinical Benchmarking Company's Pathology Report and the College of American Pathologists' Q-Probes scheme. The Clinical Benchmarking Company's Pathology Report is undertaken by staff based in the clinical management unit, Keele University with appropriate input from the professional organisations within pathology. Five annual reports have now been completed. Each report is a detailed analysis of 10 areas of laboratory performance. In this review, particular attention is focused on the areas of quality, productivity, variation in clinical practice, skill mix, and working hours. The Q-Probes scheme is part of the College of American Pathologists programme in studies of quality assurance. The Q-Probes scheme and its applicability to pathology in the UK is illustrated by reviewing two recent Q-Probe studies: routine outpatient test turnaround time and outpatient test order accuracy. The Q-Probes scheme is somewhat limited by the small number of UK laboratories that have participated. In conclusion, as a result of the government's policy in the UK, benchmarking is here to stay. Benchmarking schemes described in this article are one way in which pathologists can demonstrate that they are providing a cost effective and high quality service. Key Words: benchmarking • pathology PMID:11477112
Benchmarking Ada tasking on tightly coupled multiprocessor architectures

NASA Technical Reports Server (NTRS)

Collard, Philippe; Goforth, Andre; Marquardt, Matthew

1989-01-01

The development of benchmarks and performance measures for parallel Ada tasking is reported with emphasis on the macroscopic behavior of the benchmark across a set of load parameters. The application chosen for the study was the NASREM model for telerobot control, relevant to many NASA missions. The results of the study demonstrate the potential of parallel Ada in accomplishing the task of developing a control system for a system such as the Flight Telerobotic Servicer using the NASREM framework.
Overview of TPC Benchmark E: The Next Generation of OLTP Benchmarks

NASA Astrophysics Data System (ADS)

Hogan, Trish

Set to replace the aging TPC-C, the TPC Benchmark E is the next generation OLTP benchmark, which more accurately models client database usage. TPC-E addresses the shortcomings of TPC-C. It has a much more complex workload, requires the use of RAID-protected storage, generates much less I/O, and is much cheaper and easier to set up, run, and audit. After a period of overlap, it is expected that TPC-E will become the de facto OLTP benchmark.
Automatic extraction and visualization of object-oriented software design metrics

NASA Astrophysics Data System (ADS)

Lakshminarayana, Anuradha; Newman, Timothy S.; Li, Wei; Talburt, John

2000-02-01

Software visualization is a graphical representation of software characteristics and behavior. Certain modes of software visualization can be useful in isolating problems and identifying unanticipated behavior. In this paper we present a new approach to aid understanding of object- oriented software through 3D visualization of software metrics that can be extracted from the design phase of software development. The focus of the paper is a metric extraction method and a new collection of glyphs for multi- dimensional metric visualization. Our approach utilize the extensibility interface of a popular CASE tool to access and automatically extract the metrics from Unified Modeling Language class diagrams. Following the extraction of the design metrics, 3D visualization of these metrics are generated for each class in the design, utilizing intuitively meaningful 3D glyphs that are representative of the ensemble of metrics. Extraction and visualization of design metrics can aid software developers in the early study and understanding of design complexity.
Testing, Requirements, and Metrics

NASA Technical Reports Server (NTRS)

Rosenberg, Linda; Hyatt, Larry; Hammer, Theodore F.; Huffman, Lenore; Wilson, William

1998-01-01

The criticality of correct, complete, testable requirements is a fundamental tenet of software engineering. Also critical is complete requirements based testing of the final product. Modern tools for managing requirements allow new metrics to be used in support of both of these critical processes. Using these tools, potential problems with the quality of the requirements and the test plan can be identified early in the life cycle. Some of these quality factors include: ambiguous or incomplete requirements, poorly designed requirements databases, excessive or insufficient test cases, and incomplete linkage of tests to requirements. This paper discusses how metrics can be used to evaluate the quality of the requirements and test to avoid problems later. Requirements management and requirements based testing have always been critical in the implementation of high quality software systems. Recently, automated tools have become available to support requirements management. At NASA's Goddard Space Flight Center (GSFC), automated requirements management tools are being used on several large projects. The use of these tools opens the door to innovative uses of metrics in characterizing test plan quality and assessing overall testing risks. In support of these projects, the Software Assurance Technology Center (SATC) is working to develop and apply a metrics program that utilizes the information now available through the application of requirements management tools. Metrics based on this information provides real-time insight into the testing of requirements and these metrics assist the Project Quality Office in its testing oversight role. This paper discusses three facets of the SATC's efforts to evaluate the quality of the requirements and test plan early in the life cycle, thus preventing costly errors and time delays later.
Elementary School Students' Science Talk Ability in Inquiry-Oriented Settings in Taiwan: Test Development, Verification, and Performance Benchmarks

ERIC Educational Resources Information Center

Lin, Sheau-Wen; Liu, Yu; Chen, Shin-Feng; Wang, Jing-Ru; Kao, Huey-Lien

2016-01-01

The purpose of this study was to develop a computer-based measure of elementary students' science talk and to report students' benchmarks. The development procedure had three steps: defining the framework of the test, collecting and identifying key reference sets of science talk, and developing and verifying the science talk instrument. The…
Healthcare Energy Efficiency Research and Development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Black, Douglas R.; Lai, Judy; Lanzisera, Steven M

2011-01-31

Hospitals are known to be among the most energy intensive commercial buildings in California. Estimates of energy end-uses (e.g. for heating, cooling, lighting, etc.) in hospitals are uncertain for lack of information about hospital-specific mechanical system operations and process loads. Lawrence Berkeley National Laboratory developed and demonstrated a benchmarking system designed specifically for hospitals. Version 1.0 featured metrics to assess energy performance for the broad variety of ventilation and thermal systems that are present in California hospitals. It required moderate to extensive sub-metering or supplemental monitoring. In this new project, we developed a companion handbook with detailed equations that canmore » be used toconvert data from energy and other sensors that may be added to or already part of hospital heating, ventilation and cooling systems into metrics described in the benchmarking document.This report additionally includes a case study and guidance on including metering into designs for new hospitals, renovations and retrofits. Despite widespread concern that this end-use is large and growing, there is limited reliable information about energy use by distributed medical equipment and other miscellaneouselectrical loads in hospitals. This report proposes a framework for quantifying aggregate energy use of medical equipment and miscellaneous loads. Novel approaches are suggested and tried in an attempt to obtain data to support this framework.« less
The MCNP6 Analytic Criticality Benchmark Suite

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, Forrest B.

2016-06-16

Analytical benchmarks provide an invaluable tool for verifying computer codes used to simulate neutron transport. Several collections of analytical benchmark problems [1-4] are used routinely in the verification of production Monte Carlo codes such as MCNP® [5,6]. Verification of a computer code is a necessary prerequisite to the more complex validation process. The verification process confirms that a code performs its intended functions correctly. The validation process involves determining the absolute accuracy of code results vs. nature. In typical validations, results are computed for a set of benchmark experiments using a particular methodology (code, cross-section data with uncertainties, and modeling)more » and compared to the measured results from the set of benchmark experiments. The validation process determines bias, bias uncertainty, and possibly additional margins. Verification is generally performed by the code developers, while validation is generally performed by code users for a particular application space. The VERIFICATION_KEFF suite of criticality problems [1,2] was originally a set of 75 criticality problems found in the literature for which exact analytical solutions are available. Even though the spatial and energy detail is necessarily limited in analytical benchmarks, typically to a few regions or energy groups, the exact solutions obtained can be used to verify that the basic algorithms, mathematics, and methods used in complex production codes perform correctly. The present work has focused on revisiting this benchmark suite. A thorough review of the problems resulted in discarding some of them as not suitable for MCNP benchmarking. For the remaining problems, many of them were reformulated to permit execution in either multigroup mode or in the normal continuous-energy mode for MCNP. Execution of the benchmarks in continuous-energy mode provides a significant advance to MCNP verification methods.« less
Advanced Life Support System Value Metric

NASA Technical Reports Server (NTRS)

Jones, Harry W.; Rasky, Daniel J. (Technical Monitor)

1999-01-01

The NASA Advanced Life Support (ALS) Program is required to provide a performance metric to measure its progress in system development. Extensive discussions within the ALS program have led to the following approach. The Equivalent System Mass (ESM) metric has been traditionally used and provides a good summary of the weight, size, and power cost factors of space life support equipment. But ESM assumes that all the systems being traded off exactly meet a fixed performance requirement, so that the value and benefit (readiness, performance, safety, etc.) of all the different systems designs are considered to be exactly equal. This is too simplistic. Actual system design concepts are selected using many cost and benefit factors and the system specification is defined after many trade-offs. The ALS program needs a multi-parameter metric including both the ESM and a System Value Metric (SVM). The SVM would include safety, maintainability, reliability, performance, use of cross cutting technology, and commercialization potential. Another major factor in system selection is technology readiness level (TRL), a familiar metric in ALS. The overall ALS system metric that is suggested is a benefit/cost ratio, SVM/[ESM + function (TRL)], with appropriate weighting and scaling. The total value is given by SVM. Cost is represented by higher ESM and lower TRL. The paper provides a detailed description and example application of a suggested System Value Metric and an overall ALS system metric.
Degraded visual environment image/video quality metrics

NASA Astrophysics Data System (ADS)

Baumgartner, Dustin D.; Brown, Jeremy B.; Jacobs, Eddie L.; Schachter, Bruce J.

2014-06-01

A number of image quality metrics (IQMs) and video quality metrics (VQMs) have been proposed in the literature for evaluating techniques and systems for mitigating degraded visual environments. Some require both pristine and corrupted imagery. Others require patterned target boards in the scene. None of these metrics relates well to the task of landing a helicopter in conditions such as a brownout dust cloud. We have developed and used a variety of IQMs and VQMs related to the pilot's ability to detect hazards in the scene and to maintain situational awareness. Some of these metrics can be made agnostic to sensor type. Not only are the metrics suitable for evaluating algorithm and sensor variation, they are also suitable for choosing the most cost effective solution to improve operating conditions in degraded visual environments.
USING BROAD-SCALE METRICS TO DEVELOP INDICATORS OF WATERSHED VULNERABILITY IN THE OZARK MOUNTAINS (USA)

EPA Science Inventory

Multiple broad-scale landscape metrics were tested as potential indicators of total phosphorus (TP) concentration, total ammonia (TA) concentration, and Escherichia coli (E. coli) bacteria count, among 244 sub-watersheds in the Ozark Mountains (USA). Indicator models were develop...
A newly developed dispersal metric indicates the succession of benthic invertebrates in restored rivers.

PubMed

Li, Fengqing; Sundermann, Andrea; Stoll, Stefan; Haase, Peter

2016-11-01

Dispersal capacity plays a fundamental role in the riverine benthic invertebrate colonization of new habitats that emerges following flash floods or restoration. However, an appropriate measure of dispersal capacity for benthic invertebrates is still lacking. The dispersal of benthic invertebrates occurs mainly during the aquatic (larval) and aerial (adult) life stages, and the dispersal of each stage can be further subdivided into active and passive modes. Based on these four possible dispersal modes, we first developed a metric (which is very similar to the well-known and widely used saprobic index) to estimate the dispersal capacity for 802 benthic invertebrate taxa by incorporating a weight for each mode. Second, we tested this metric using benthic invertebrate community data from a) 23 large restored river sites with substantial improvements of river bottom habitats dating back 1 to 10years, b) 23 unrestored sites very close to the restored sites, and c) 298 adjacent surrounding sites (mean±standard deviation: 13.0±9.5 per site) within a distance of up to 5km for each restored site in the low mountain and lowland areas of Germany. We hypothesize that our metric will reflect the temporal succession process of benthic invertebrate communities colonizing the restored sites, whereas no temporal changes are expected in the unrestored and surrounding sites. By applying our metric to these three river treatment categories, we found that the average dispersal capacity of benthic invertebrate communities in the restored sites significantly decreased in the early years following restoration, whereas there were no changes in either the unrestored or the surrounding sites. After all taxa had been divided into quartiles representing weak to strong dispersers, this pattern became even more obvious; strong dispersers colonized the restored sites during the first year after restoration and then significantly decreased over time, whereas weak dispersers continued to increase
Uncertainty in Earth System Models: Benchmarks for Ocean Model Performance and Validation

NASA Astrophysics Data System (ADS)

Ogunro, O. O.; Elliott, S.; Collier, N.; Wingenter, O. W.; Deal, C.; Fu, W.; Hoffman, F. M.

2017-12-01

The mean ocean CO2 sink is a major component of the global carbon budget, with marine reservoirs holding about fifty times more carbon than the atmosphere. Phytoplankton play a significant role in the net carbon sink through photosynthesis and drawdown, such that about a quarter of anthropogenic CO2 emissions end up in the ocean. Biology greatly increases the efficiency of marine environments in CO2 uptake and ultimately reduces the impact of the persistent rise in atmospheric concentrations. However, a number of challenges remain in appropriate representation of marine biogeochemical processes in Earth System Models (ESM). These threaten to undermine the community effort to quantify seasonal to multidecadal variability in ocean uptake of atmospheric CO2. In a bid to improve analyses of marine contributions to climate-carbon cycle feedbacks, we have developed new analysis methods and biogeochemistry metrics as part of the International Ocean Model Benchmarking (IOMB) effort. Our intent is to meet the growing diagnostic and benchmarking needs of ocean biogeochemistry models. The resulting software package has been employed to validate DOE ocean biogeochemistry results by comparison with observational datasets. Several other international ocean models contributing results to the fifth phase of the Coupled Model Intercomparison Project (CMIP5) were analyzed simultaneously. Our comparisons suggest that the biogeochemical processes determining CO2 entry into the global ocean are not well represented in most ESMs. Polar regions continue to show notable biases in many critical biogeochemical and physical oceanographic variables. Some of these disparities could have first order impacts on the conversion of atmospheric CO2 to organic carbon. In addition, single forcing simulations show that the current ocean state can be partly explained by the uptake of anthropogenic emissions. Combined effects of two or more of these forcings on ocean biogeochemical cycles and ecosystems
Developing Evidence for Action on the Postgraduate Experience: An Effective Local Instrument to Move beyond Benchmarking

ERIC Educational Resources Information Center

Sampson, K. A.; Johnston, L.; Comer, K.; Brogt, E.

2016-01-01

Summative and benchmarking surveys to measure the postgraduate student research experience are well reported in the literature. While useful, we argue that local instruments that provide formative resources with an academic development focus are also required. If higher education institutions are to move beyond the identification of issues and…
It's A Metric World.

ERIC Educational Resources Information Center

Alabama State Dept. of Education, Montgomery. Div. of Instructional Services.

Topics covered in the first part of this document include eight advantages of the metric system; a summary of metric instruction; the International System of Units (SI) style and usage; metric decimal tables; the metric system; and conversion tables. An alphabetized list of organizations which market metric materials for educators is provided with…
Particle image velocimetry correlation signal-to-noise ratio metrics and measurement uncertainty quantification

NASA Astrophysics Data System (ADS)

Xue, Zhenyu; Charonko, John J.; Vlachos, Pavlos P.

2014-11-01

In particle image velocimetry (PIV) the measurement signal is contained in the recorded intensity of the particle image pattern superimposed on a variety of noise sources. The signal-to-noise-ratio (SNR) strength governs the resulting PIV cross correlation and ultimately the accuracy and uncertainty of the resulting PIV measurement. Hence we posit that correlation SNR metrics calculated from the correlation plane can be used to quantify the quality of the correlation and the resulting uncertainty of an individual measurement. In this paper we extend the original work by Charonko and Vlachos and present a framework for evaluating the correlation SNR using a set of different metrics, which in turn are used to develop models for uncertainty estimation. Several corrections have been applied in this work. The SNR metrics and corresponding models presented herein are expanded to be applicable to both standard and filtered correlations by applying a subtraction of the minimum correlation value to remove the effect of the background image noise. In addition, the notion of a ‘valid’ measurement is redefined with respect to the correlation peak width in order to be consistent with uncertainty quantification principles and distinct from an ‘outlier’ measurement. Finally the type and significance of the error distribution function is investigated. These advancements lead to more robust and reliable uncertainty estimation models compared with the original work by Charonko and Vlachos. The models are tested against both synthetic benchmark data as well as experimental measurements. In this work, {{U}68.5} uncertainties are estimated at the 68.5% confidence level while {{U}95} uncertainties are estimated at 95% confidence level. For all cases the resulting calculated coverage factors approximate the expected theoretical confidence intervals, thus demonstrating the applicability of these new models for estimation of uncertainty for individual PIV measurements.

Related Critical Psychometric Issues and Their Resolutions during Development of PE Metrics

ERIC Educational Resources Information Center

Fox, Connie; Zhu, Weimo; Park, Youngsik; Fisette, Jennifer L.; Graber, Kim C.; Dyson, Ben; Avery, Marybell; Franck, Marian; Placek, Judith H.; Rink, Judy; Raynes, De

2011-01-01

In addition to validity and reliability evidence, other psychometric qualities of the PE Metrics assessments needed to be examined. This article describes how those critical psychometric issues were addressed during the PE Metrics assessment bank construction. Specifically, issues included (a) number of items or assessments needed, (b) training…
Rethinking the reference collection: exploring benchmarks and e-book availability.

PubMed

Husted, Jeffrey T; Czechowski, Leslie J

2012-01-01

Librarians in the Health Sciences Library System at the University of Pittsburgh explored the possibility of developing an electronic reference collection that would replace the print reference collection, thus providing access to these valuable materials to a widely dispersed user population. The librarians evaluated the print reference collection and standard collection development lists as potential benchmarks for the electronic collection, and they determined which books were available in electronic format. They decided that the low availability of electronic versions of titles in each benchmark group rendered the creation of an electronic reference collection using either benchmark impractical.
Development of water quality criteria and screening benchmarks for 2,4,6 trinitrotoluene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Talmage, S.S.; Opresko, D.M.

1995-12-31

Munitions compounds and their degradation products are present at many Army Ammunition Plant Superfund sites. Neither Water Quality Criteria (WQC) for aquatic organisms nor safe soil levels for terrestrial plants and animals have been developed for munitions compounds including trinitrotoluene (TNT). Data are available for the calculation of an acute WQC for TNT according to US EPA guidelines but are insufficient to calculate a chronic criterion. However, available data can be used to determine a Secondary Chronic Value (SCV) and to determine lowest chronic values for fish and daphnids (used by EPA in the absence of criteria). Based on datamore » from eight genera of aquatic organisms, an acute WOC of 0.566 mg/L was calculated. Using available data, a SCV of 0.137 mg/L was calculated. Lowest chronic values for fish and for daphnids are 0.04 mg/L and 1.03 mg/L, respectively. The lowest concentration that affected the growth of aquatic plants was 1.0 mg/L. For terrestrial animals, data from studies of laboratory animals can be extrapolated to derive screening benchmarks in the same way in which human toxicity values are derived from laboratory animal data. For terrestrial animals, a no-observed-adverse-effect-level (NOAEL) for reproductive effects of 1.60 mg/kg/day was determined from a subchronic laboratory feeding study with rats. By scaling the test NOAEL on the basis of differences in body size, screening benchmarks were calculated for oral intake for selected mammalian wildlife species. Screening benchmarks were also derived for protection of benthic organisms in sediment, for soil invertebrates, and for terrestrial plants.« less
Advanced Life Support System Value Metric

NASA Technical Reports Server (NTRS)

Jones, Harry W.; Arnold, James O. (Technical Monitor)

1999-01-01

The NASA Advanced Life Support (ALS) Program is required to provide a performance metric to measure its progress in system development. Extensive discussions within the ALS program have reached a consensus. The Equivalent System Mass (ESM) metric has been traditionally used and provides a good summary of the weight, size, and power cost factors of space life support equipment. But ESM assumes that all the systems being traded off exactly meet a fixed performance requirement, so that the value and benefit (readiness, performance, safety, etc.) of all the different systems designs are exactly equal. This is too simplistic. Actual system design concepts are selected using many cost and benefit factors and the system specification is then set accordingly. The ALS program needs a multi-parameter metric including both the ESM and a System Value Metric (SVM). The SVM would include safety, maintainability, reliability, performance, use of cross cutting technology, and commercialization potential. Another major factor in system selection is technology readiness level (TRL), a familiar metric in ALS. The overall ALS system metric that is suggested is a benefit/cost ratio, [SVM + TRL]/ESM, with appropriate weighting and scaling. The total value is the sum of SVM and TRL. Cost is represented by ESM. The paper provides a detailed description and example application of the suggested System Value Metric.
Automation of Ocean Product Metrics

DTIC Science & Technology

2008-09-30

Presented in: Ocean Sciences 2008 Conf., 5 Mar 2008. Shriver, J., J. D. Dykes, and J. Fabre: Automation of Operational Ocean Product Metrics. Presented in 2008 EGU General Assembly , 14 April 2008. 9 ...processing (multiple data cuts per day) and multiple-nested models. Routines for generating automated evaluations of model forecast statistics will be...developed and pre-existing tools will be collected to create a generalized tool set, which will include user-interface tools to the metrics data
FireHose Streaming Benchmarks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karl Anderson, Steve Plimpton

2015-01-27

The FireHose Streaming Benchmarks are a suite of stream-processing benchmarks defined to enable comparison of streaming software and hardware, both quantitatively vis-a-vis the rate at which they can process data, and qualitatively by judging the effort involved to implement and run the benchmarks. Each benchmark has two parts. The first is a generator which produces and outputs datums at a high rate in a specific format. The second is an analytic which reads the stream of datums and is required to perform a well-defined calculation on the collection of datums, typically to find anomalous datums that have been created inmore » the stream by the generator. The FireHose suite provides code for the generators, sample code for the analytics (which users are free to re-implement in their own custom frameworks), and a precise definition of each benchmark calculation.« less
An evidence-based approach to benchmarking the fairness of health-sector reform in developing countries.

PubMed Central

Daniels, Norman; Flores, Walter; Pannarunothai, Supasit; Ndumbe, Peter N.; Bryant, John H.; Ngulube, T. J.; Wang, Yuankun

2005-01-01

The Benchmarks of Fairness instrument is an evidence-based policy tool developed in generic form in 2000 for evaluating the effects of health-system reforms on equity, efficiency and accountability. By integrating measures of these effects on the central goal of fairness, the approach fills a gap that has hampered reform efforts for more than two decades. Over the past three years, projects in developing countries on three continents have adapted the generic version of these benchmarks for use at both national and subnational levels. Interdisciplinary teams of managers, providers, academics and advocates agree on the relevant criteria for assessing components of fairness and, depending on which aspects of reform they wish to evaluate, select appropriate indicators that rely on accessible information; they also agree on scoring rules for evaluating the diverse changes in the indicators. In contrast to a comprehensive index that aggregates all measured changes into a single evaluation or rank, the pattern of changes revealed by the benchmarks is used to inform policy deliberation aboutwhich aspects of the reforms have been successfully implemented, and it also allows for improvements to be made in the reforms. This approach permits useful evidence about reform to be gathered in settings where existing information is underused and where there is a weak information infrastructure. Brief descriptions of early results from Cameroon, Ecuador, Guatemala, Thailand and Zambia demonstrate that the method can produce results that are useful for policy and reveal the variety of purposes to which the approach can be put. Collaboration across sites can yield a catalogue of indicators that will facilitate further work. PMID:16175828
Public Interest Energy Research (PIER) Program Development of a Computer-based Benchmarking and Analytical Tool. Benchmarking and Energy & Water Savings Tool in Dairy Plants (BEST-Dairy)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Tengfang; Flapper, Joris; Ke, Jing

The overall goal of the project is to develop a computer-based benchmarking and energy and water savings tool (BEST-Dairy) for use in the California dairy industry - including four dairy processes - cheese, fluid milk, butter, and milk powder. BEST-Dairy tool developed in this project provides three options for the user to benchmark each of the dairy product included in the tool, with each option differentiated based on specific detail level of process or plant, i.e., 1) plant level; 2) process-group level, and 3) process-step level. For each detail level, the tool accounts for differences in production and other variablesmore » affecting energy use in dairy processes. The dairy products include cheese, fluid milk, butter, milk powder, etc. The BEST-Dairy tool can be applied to a wide range of dairy facilities to provide energy and water savings estimates, which are based upon the comparisons with the best available reference cases that were established through reviewing information from international and national samples. We have performed and completed alpha- and beta-testing (field testing) of the BEST-Dairy tool, through which feedback from voluntary users in the U.S. dairy industry was gathered to validate and improve the tool's functionality. BEST-Dairy v1.2 was formally published in May 2011, and has been made available for free downloads from the internet (i.e., http://best-dairy.lbl.gov). A user's manual has been developed and published as the companion documentation for use with the BEST-Dairy tool. In addition, we also carried out technology transfer activities by engaging the dairy industry in the process of tool development and testing, including field testing, technical presentations, and technical assistance throughout the project. To date, users from more than ten countries in addition to those in the U.S. have downloaded the BEST-Dairy from the LBNL website. It is expected that the use of BEST-Dairy tool will advance understanding of energy and
Validation metrics for turbulent plasma transport

DOE PAGES

Holland, C.

2016-06-22

Developing accurate models of plasma dynamics is essential for confident predictive modeling of current and future fusion devices. In modern computer science and engineering, formal verification and validation processes are used to assess model accuracy and establish confidence in the predictive capabilities of a given model. This paper provides an overview of the key guiding principles and best practices for the development of validation metrics, illustrated using examples from investigations of turbulent transport in magnetically confined plasmas. Particular emphasis is given to the importance of uncertainty quantification and its inclusion within the metrics, and the need for utilizing synthetic diagnosticsmore » to enable quantitatively meaningful comparisons between simulation and experiment. As a starting point, the structure of commonly used global transport model metrics and their limitations is reviewed. An alternate approach is then presented, which focuses upon comparisons of predicted local fluxes, fluctuations, and equilibrium gradients against observation. Furthermore, the utility of metrics based upon these comparisons is demonstrated by applying them to gyrokinetic predictions of turbulent transport in a variety of discharges performed on the DIII-D tokamak, as part of a multi-year transport model validation activity.« less
Validation metrics for turbulent plasma transport

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holland, C.

Developing accurate models of plasma dynamics is essential for confident predictive modeling of current and future fusion devices. In modern computer science and engineering, formal verification and validation processes are used to assess model accuracy and establish confidence in the predictive capabilities of a given model. This paper provides an overview of the key guiding principles and best practices for the development of validation metrics, illustrated using examples from investigations of turbulent transport in magnetically confined plasmas. Particular emphasis is given to the importance of uncertainty quantification and its inclusion within the metrics, and the need for utilizing synthetic diagnosticsmore » to enable quantitatively meaningful comparisons between simulation and experiment. As a starting point, the structure of commonly used global transport model metrics and their limitations is reviewed. An alternate approach is then presented, which focuses upon comparisons of predicted local fluxes, fluctuations, and equilibrium gradients against observation. Furthermore, the utility of metrics based upon these comparisons is demonstrated by applying them to gyrokinetic predictions of turbulent transport in a variety of discharges performed on the DIII-D tokamak, as part of a multi-year transport model validation activity.« less
Development and Experimental Benchmark of Simulations to Predict Used Nuclear Fuel Cladding Temperatures during Drying and Transfer Operations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Greiner, Miles

Radial hydride formation in high-burnup used fuel cladding has the potential to radically reduce its ductility and suitability for long-term storage and eventual transport. To avoid this formation, the maximum post-reactor temperature must remain sufficiently low to limit the cladding hoop stress, and so that hydrogen from the existing circumferential hydrides will not dissolve and become available to re-precipitate into radial hydrides under the slow cooling conditions during drying, transfer and early dry-cask storage. The objective of this research is to develop and experimentallybenchmark computational fluid dynamics simulations of heat transfer in post-pool-storage drying operations, when high-burnup fuel cladding ismore » likely to experience its highest temperature. These benchmarked tools can play a key role in evaluating dry cask storage systems for extended storage of high-burnup fuels and post-storage transportation, including fuel retrievability. The benchmarked tools will be used to aid the design of efficient drying processes, as well as estimate variations of surface temperatures as a means of inferring helium integrity inside the canister or cask. This work will be conducted effectively because the principal investigator has experience developing these types of simulations, and has constructed a test facility that can be used to benchmark them.« less
Math Roots: The Beginnings of the Metric System

ERIC Educational Resources Information Center

Johnson, Art; Norris, Kit; Adams,Thomasina Lott, Ed.

2007-01-01

This article reviews the history of the metric system, from a proposal of a sixteenth-century mathematician to its implementation in Revolutionary France some 200 years later. Recent developments in the metric system are also discussed.
An Unbiased Method To Build Benchmarking Sets for Ligand-Based Virtual Screening and its Application To GPCRs

PubMed Central

2015-01-01

Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could bring the biases to the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCRs targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the “artificial enrichment” and “analogue bias” of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD. PMID:24749745
Deep Correlated Holistic Metric Learning for Sketch-Based 3D Shape Retrieval.

PubMed

Dai, Guoxian; Xie, Jin; Fang, Yi

2018-07-01

How to effectively retrieve desired 3D models with simple queries is a long-standing problem in computer vision community. The model-based approach is quite straightforward but nontrivial, since people could not always have the desired 3D query model available by side. Recently, large amounts of wide-screen electronic devices are prevail in our daily lives, which makes the sketch-based 3D shape retrieval a promising candidate due to its simpleness and efficiency. The main challenge of sketch-based approach is the huge modality gap between sketch and 3D shape. In this paper, we proposed a novel deep correlated holistic metric learning (DCHML) method to mitigate the discrepancy between sketch and 3D shape domains. The proposed DCHML trains two distinct deep neural networks (one for each domain) jointly, which learns two deep nonlinear transformations to map features from both domains into a new feature space. The proposed loss, including discriminative loss and correlation loss, aims to increase the discrimination of features within each domain as well as the correlation between different domains. In the new feature space, the discriminative loss minimizes the intra-class distance of the deep transformed features and maximizes the inter-class distance of the deep transformed features to a large margin within each domain, while the correlation loss focused on mitigating the distribution discrepancy across different domains. Different from existing deep metric learning methods only with loss at the output layer, our proposed DCHML is trained with loss at both hidden layer and output layer to further improve the performance by encouraging features in the hidden layer also with desired properties. Our proposed method is evaluated on three benchmarks, including 3D Shape Retrieval Contest 2013, 2014, and 2016 benchmarks, and the experimental results demonstrate the superiority of our proposed method over the state-of-the-art methods.
Word Maturity: A New Metric for Word Knowledge

ERIC Educational Resources Information Center

Landauer, Thomas K.; Kireyev, Kirill; Panaccione, Charles

2011-01-01

A new metric, Word Maturity, estimates the development by individual students of knowledge of every word in a large corpus. The metric is constructed by Latent Semantic Analysis modeling of word knowledge as a function of the reading that a simulated learner has done and is calibrated by its developing closeness in information content to that of a…
Development and validation of trauma surgical skills metrics: Preliminary assessment of performance after training.

PubMed

Shackelford, Stacy; Garofalo, Evan; Shalin, Valerie; Pugh, Kristy; Chen, Hegang; Pasley, Jason; Sarani, Babak; Henry, Sharon; Bowyer, Mark; Mackenzie, Colin F

2015-07-01

Maintaining trauma-specific surgical skills is an ongoing challenge for surgical training programs. An objective assessment of surgical skills is needed. We hypothesized that a validated surgical performance assessment tool could detect differences following a training intervention. We developed surgical performance assessment metrics based on discussion with expert trauma surgeons, video review of 10 experts and 10 novice surgeons performing three vascular exposure procedures and lower extremity fasciotomy on cadavers, and validated the metrics with interrater reliability testing by five reviewers blinded to level of expertise and a consensus conference. We tested these performance metrics in 12 surgical residents (Year 3-7) before and 2 weeks after vascular exposure skills training in the Advanced Surgical Skills for Exposure in Trauma (ASSET) course. Performance was assessed in three areas as follows: knowledge (anatomic, management), procedure steps, and technical skills. Time to completion of procedures was recorded, and these metrics were combined into a single performance score, the Trauma Readiness Index (TRI). Wilcoxon matched-pairs signed-ranks test compared pretraining/posttraining effects. Mean time to complete procedures decreased by 4.3 minutes (from 13.4 minutes to 9.1 minutes). The performance component most improved by the 1-day skills training was procedure steps, completion of which increased by 21%. Technical skill scores improved by 12%. Overall knowledge improved by 3%, with 18% improvement in anatomic knowledge. TRI increased significantly from 50% to 64% with ASSET training. Interrater reliability of the surgical performance assessment metrics was validated with single intraclass correlation coefficient of 0.7 to 0.98. A trauma-relevant surgical performance assessment detected improvements in specific procedure steps and anatomic knowledge taught during a 1-day course, quantified by the TRI. ASSET training reduced time to complete vascular
Using TRACI for Sustainability Metrics

EPA Science Inventory

TRACI, the Tool for the Reduction and Assessment of Chemical and other environmental Impacts, has been developed for sustainability metrics, life cycle impact assessment, and product and process design impact assessment for developing increasingly sustainable products, processes,...
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

NASA Technical Reports Server (NTRS)

Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

1996-01-01

This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
On Applying the Prognostic Performance Metrics

NASA Technical Reports Server (NTRS)

Saxena, Abhinav; Celaya, Jose; Saha, Bhaskar; Saha, Sankalita; Goebel, Kai

2009-01-01

Prognostics performance evaluation has gained significant attention in the past few years. As prognostics technology matures and more sophisticated methods for prognostic uncertainty management are developed, a standardized methodology for performance evaluation becomes extremely important to guide improvement efforts in a constructive manner. This paper is in continuation of previous efforts where several new evaluation metrics tailored for prognostics were introduced and were shown to effectively evaluate various algorithms as compared to other conventional metrics. Specifically, this paper presents a detailed discussion on how these metrics should be interpreted and used. Several shortcomings identified, while applying these metrics to a variety of real applications, are also summarized along with discussions that attempt to alleviate these problems. Further, these metrics have been enhanced to include the capability of incorporating probability distribution information from prognostic algorithms as opposed to evaluation based on point estimates only. Several methods have been suggested and guidelines have been provided to help choose one method over another based on probability distribution characteristics. These approaches also offer a convenient and intuitive visualization of algorithm performance with respect to some of these new metrics like prognostic horizon and alpha-lambda performance, and also quantify the corresponding performance while incorporating the uncertainty information.
Seismo-acoustic ray model benchmarking against experimental tank data.

PubMed

Camargo Rodríguez, Orlando; Collis, Jon M; Simpson, Harry J; Ey, Emanuel; Schneiderwind, Joseph; Felisberto, Paulo

2012-08-01

Acoustic predictions of the recently developed traceo ray model, which accounts for bottom shear properties, are benchmarked against tank experimental data from the EPEE-1 and EPEE-2 (Elastic Parabolic Equation Experiment) experiments. Both experiments are representative of signal propagation in a Pekeris-like shallow-water waveguide over a non-flat isotropic elastic bottom, where significant interaction of the signal with the bottom can be expected. The benchmarks show, in particular, that the ray model can be as accurate as a parabolic approximation model benchmarked in similar conditions. The results of benchmarking are important, on one side, as a preliminary experimental validation of the model and, on the other side, demonstrates the reliability of the ray approach for seismo-acoustic applications.

Coverage Metrics for Model Checking

NASA Technical Reports Server (NTRS)

Penix, John; Visser, Willem; Norvig, Peter (Technical Monitor)

2001-01-01

When using model checking to verify programs in practice, it is not usually possible to achieve complete coverage of the system. In this position paper we describe ongoing research within the Automated Software Engineering group at NASA Ames on the use of test coverage metrics to measure partial coverage and provide heuristic guidance for program model checking. We are specifically interested in applying and developing coverage metrics for concurrent programs that might be used to support certification of next generation avionics software.
A software quality model and metrics for risk assessment

NASA Technical Reports Server (NTRS)

Hyatt, L.; Rosenberg, L.

1996-01-01

A software quality model and its associated attributes are defined and used as the model for the basis for a discussion on risk. Specific quality goals and attributes are selected based on their importance to a software development project and their ability to be quantified. Risks that can be determined by the model's metrics are identified. A core set of metrics relating to the software development process and its products is defined. Measurements for each metric and their usability and applicability are discussed.
Metric Conversion

Atmospheric Science Data Center

2013-03-12

Metric Weights and Measures The metric system is based on 10s. For example, 10 millimeters = 1 centimeter, 10 ... Special Publications: NIST Guide to SI Units: Conversion Factors NIST Guide to SI Units: Conversion Factors listed ...
Benchmarking, benchmarks, or best practices? Applying quality improvement principles to decrease surgical turnaround time.

PubMed

Mitchell, L

1996-01-01

The processes of benchmarking, benchmark data comparative analysis, and study of best practices are distinctly different. The study of best practices is explained with an example based on the Arthur Andersen & Co. 1992 "Study of Best Practices in Ambulatory Surgery". The results of a national best practices study in ambulatory surgery were used to provide our quality improvement team with the goal of improving the turnaround time between surgical cases. The team used a seven-step quality improvement problem-solving process to improve the surgical turnaround time. The national benchmark for turnaround times between surgical cases in 1992 was 13.5 minutes. The initial turnaround time at St. Joseph's Medical Center was 19.9 minutes. After the team implemented solutions, the time was reduced to an average of 16.3 minutes, an 18% improvement. Cost-benefit analysis showed a potential enhanced revenue of approximately $300,000, or a potential savings of $10,119. Applying quality improvement principles to benchmarking, benchmarks, or best practices can improve process performance. Understanding which form of benchmarking the institution wishes to embark on will help focus a team and use appropriate resources. Communicating with professional organizations that have experience in benchmarking will save time and money and help achieve the desired results.
Optimization of Deep Drilling Performance - Development and Benchmark Testing of Advanced Diamond Product Drill Bits & HP/HT Fluids to Significantly Improve Rates of Penetration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Alan Black; Arnis Judzis

2005-09-30

This document details the progress to date on the OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS AND HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION contract for the year starting October 2004 through September 2005. The industry cost shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through a team development of aggressive diamond product drill bit--fluid system technologies. Overall the objectives are as follows: Phase 1--Benchmark ''best in class'' diamond and other product drilling bits and fluids and develop concepts for amore » next level of deep drilling performance; Phase 2--Develop advanced smart bit-fluid prototypes and test at large scale; and Phase 3--Field trial smart bit--fluid concepts, modify as necessary and commercialize products. As of report date, TerraTek has concluded all Phase 1 testing and is planning Phase 2 development.« less
Integral Full Core Multi-Physics PWR Benchmark with Measured Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Forget, Benoit; Smith, Kord; Kumar, Shikhar

In recent years, the importance of modeling and simulation has been highlighted extensively in the DOE research portfolio with concrete examples in nuclear engineering with the CASL and NEAMS programs. These research efforts and similar efforts worldwide aim at the development of high-fidelity multi-physics analysis tools for the simulation of current and next-generation nuclear power reactors. Like all analysis tools, verification and validation is essential to guarantee proper functioning of the software and methods employed. The current approach relies mainly on the validation of single physic phenomena (e.g. critical experiment, flow loops, etc.) and there is a lack of relevantmore » multiphysics benchmark measurements that are necessary to validate high-fidelity methods being developed today. This work introduces a new multi-cycle full-core Pressurized Water Reactor (PWR) depletion benchmark based on two operational cycles of a commercial nuclear power plant that provides a detailed description of fuel assemblies, burnable absorbers, in-core fission detectors, core loading and re-loading patterns. This benchmark enables analysts to develop extremely detailed reactor core models that can be used for testing and validation of coupled neutron transport, thermal-hydraulics, and fuel isotopic depletion. The benchmark also provides measured reactor data for Hot Zero Power (HZP) physics tests, boron letdown curves, and three-dimensional in-core flux maps from 58 instrumented assemblies. The benchmark description is now available online and has been used by many groups. However, much work remains to be done on the quantification of uncertainties and modeling sensitivities. This work aims to address these deficiencies and make this benchmark a true non-proprietary international benchmark for the validation of high-fidelity tools. This report details the BEAVRS uncertainty quantification for the first two cycle of operations and serves as the final report of the
Quality metrics for sensor images

NASA Technical Reports Server (NTRS)

Ahumada, AL

1993-01-01

Methods are needed for evaluating the quality of augmented visual displays (AVID). Computational quality metrics will help summarize, interpolate, and extrapolate the results of human performance tests with displays. The FLM Vision group at NASA Ames has been developing computational models of visual processing and using them to develop computational metrics for similar problems. For example, display modeling systems use metrics for comparing proposed displays, halftoning optimizing methods use metrics to evaluate the difference between the halftone and the original, and image compression methods minimize the predicted visibility of compression artifacts. The visual discrimination models take as input two arbitrary images A and B and compute an estimate of the probability that a human observer will report that A is different from B. If A is an image that one desires to display and B is the actual displayed image, such an estimate can be regarded as an image quality metric reflecting how well B approximates A. There are additional complexities associated with the problem of evaluating the quality of radar and IR enhanced displays for AVID tasks. One important problem is the question of whether intruding obstacles are detectable in such displays. Although the discrimination model can handle detection situations by making B the original image A plus the intrusion, this detection model makes the inappropriate assumption that the observer knows where the intrusion will be. Effects of signal uncertainty need to be added to our models. A pilot needs to make decisions rapidly. The models need to predict not just the probability of a correct decision, but the probability of a correct decision by the time the decision needs to be made. That is, the models need to predict latency as well as accuracy. Luce and Green have generated models for auditory detection latencies. Similar models are needed for visual detection. Most image quality models are designed for static imagery
Software development predictors, error analysis, reliability models and software metric analysis

NASA Technical Reports Server (NTRS)

Basili, Victor

1983-01-01

The use of dynamic characteristics as predictors for software development was studied. It was found that there are some significant factors that could be useful as predictors. From a study on software errors and complexity, it was shown that meaningful results can be obtained which allow insight into software traits and the environment in which it is developed. Reliability models were studied. The research included the field of program testing because the validity of some reliability models depends on the answers to some unanswered questions about testing. In studying software metrics, data collected from seven software engineering laboratory (FORTRAN) projects were examined and three effort reporting accuracy checks were applied to demonstrate the need to validate a data base. Results are discussed.
Evaluation metrics for bone segmentation in ultrasound

NASA Astrophysics Data System (ADS)

Lougheed, Matthew; Fichtinger, Gabor; Ungi, Tamas

2015-03-01

Tracked ultrasound is a safe alternative to X-ray for imaging bones. The interpretation of bony structures is challenging as ultrasound has no specific intensity characteristic of bones. Several image segmentation algorithms have been devised to identify bony structures. We propose an open-source framework that would aid in the development and comparison of such algorithms by quantitatively measuring segmentation performance in the ultrasound images. True-positive and false-negative metrics used in the framework quantify algorithm performance based on correctly segmented bone and correctly segmented boneless regions. Ground-truth for these metrics are defined manually and along with the corresponding automatically segmented image are used for the performance analysis. Manually created ground truth tests were generated to verify the accuracy of the analysis. Further evaluation metrics for determining average performance per slide and standard deviation are considered. The metrics provide a means of evaluating accuracy of frames along the length of a volume. This would aid in assessing the accuracy of the volume itself and the approach to image acquisition (positioning and frequency of frame). The framework was implemented as an open-source module of the 3D Slicer platform. The ground truth tests verified that the framework correctly calculates the implemented metrics. The developed framework provides a convenient way to evaluate bone segmentation algorithms. The implementation fits in a widely used application for segmentation algorithm prototyping. Future algorithm development will benefit by monitoring the effects of adjustments to an algorithm in a standard evaluation framework.
Metrics for the NASA Airspace Systems Program

NASA Technical Reports Server (NTRS)

Smith, Jeremy C.; Neitzke, Kurt W.

2009-01-01

This document defines an initial set of metrics for use by the NASA Airspace Systems Program (ASP). ASP consists of the NextGen-Airspace Project and the NextGen-Airportal Project. The work in each project is organized along multiple, discipline-level Research Focus Areas (RFAs). Each RFA is developing future concept elements in support of the Next Generation Air Transportation System (NextGen), as defined by the Joint Planning and Development Office (JPDO). In addition, a single, system-level RFA is responsible for integrating concept elements across RFAs in both projects and for assessing system-wide benefits. The primary purpose of this document is to define a common set of metrics for measuring National Airspace System (NAS) performance before and after the introduction of ASP-developed concepts for NextGen as the system handles increasing traffic. The metrics are directly traceable to NextGen goals and objectives as defined by the JPDO and hence will be used to measure the progress of ASP research toward reaching those goals. The scope of this document is focused on defining a common set of metrics for measuring NAS capacity, efficiency, robustness, and safety at the system-level and at the RFA-level. Use of common metrics will focus ASP research toward achieving system-level performance goals and objectives and enable the discipline-level RFAs to evaluate the impact of their concepts at the system level.
An integrity measure to benchmark quantum error correcting memories

NASA Astrophysics Data System (ADS)

Xu, Xiaosi; de Beaudrap, Niel; O'Gorman, Joe; Benjamin, Simon C.

2018-02-01

Rapidly developing experiments across multiple platforms now aim to realise small quantum codes, and so demonstrate a memory within which a logical qubit can be protected from noise. There is a need to benchmark the achievements in these diverse systems, and to compare the inherent power of the codes they rely upon. We describe a recently introduced performance measure called integrity, which relates to the probability that an ideal agent will successfully ‘guess’ the state of a logical qubit after a period of storage in the memory. Integrity is straightforward to evaluate experimentally without state tomography and it can be related to various established metrics such as the logical fidelity and the pseudo-threshold. We offer a set of experimental milestones that are steps towards demonstrating unconditionally superior encoded memories. Using intensive numerical simulations we compare memories based on the five-qubit code, the seven-qubit Steane code, and a nine-qubit code which is the smallest instance of a surface code; we assess both the simple and fault-tolerant implementations of each. While the ‘best’ code upon which to base a memory does vary according to the nature and severity of the noise, nevertheless certain trends emerge.
Experimental constraints on metric and non-metric theories of gravity

NASA Technical Reports Server (NTRS)

Will, Clifford M.

1989-01-01

Experimental constraints on metric and non-metric theories of gravitation are reviewed. Tests of the Einstein Equivalence Principle indicate that only metric theories of gravity are likely to be viable. Solar system experiments constrain the parameters of the weak field, post-Newtonian limit to be close to the values predicted by general relativity. Future space experiments will provide further constraints on post-Newtonian gravity.
RBscore&NBench: a high-level web server for nucleic acid binding residues prediction with a large-scale benchmarking database.

PubMed

Miao, Zhichao; Westhof, Eric

2016-07-08

RBscore&NBench combines a web server, RBscore and a database, NBench. RBscore predicts RNA-/DNA-binding residues in proteins and visualizes the prediction scores and features on protein structures. The scoring scheme of RBscore directly links feature values to nucleic acid binding probabilities and illustrates the nucleic acid binding energy funnel on the protein surface. To avoid dataset, binding site definition and assessment metric biases, we compared RBscore with 18 web servers and 3 stand-alone programs on 41 datasets, which demonstrated the high and stable accuracy of RBscore. A comprehensive comparison led us to develop a benchmark database named NBench. The web server is available on: http://ahsoka.u-strasbg.fr/rbscorenbench/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Benchmark Evaluation of HTR-PROTEUS Pebble Bed Experimental Program

DOE PAGES

Bess, John D.; Montierth, Leland; Köberl, Oliver; ...

2014-10-09

Benchmark models were developed to evaluate 11 critical core configurations of the HTR-PROTEUS pebble bed experimental program. Various additional reactor physics measurements were performed as part of this program; currently only a total of 37 absorber rod worth measurements have been evaluated as acceptable benchmark experiments for Cores 4, 9, and 10. Dominant uncertainties in the experimental keff for all core configurations come from uncertainties in the ²³⁵U enrichment of the fuel, impurities in the moderator pebbles, and the density and impurity content of the radial reflector. Calculations of k eff with MCNP5 and ENDF/B-VII.0 neutron nuclear data aremore » greater than the benchmark values but within 1% and also within the 3σ uncertainty, except for Core 4, which is the only randomly packed pebble configuration. Repeated calculations of k eff with MCNP6.1 and ENDF/B-VII.1 are lower than the benchmark values and within 1% (~3σ) except for Cores 5 and 9, which calculate lower than the benchmark eigenvalues within 4σ. The primary difference between the two nuclear data libraries is the adjustment of the absorption cross section of graphite. Simulations of the absorber rod worth measurements are within 3σ of the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less
Benchmarking on Tsunami Currents with ComMIT

NASA Astrophysics Data System (ADS)

Sharghi vand, N.; Kanoglu, U.

2015-12-01

There were no standards for the validation and verification of tsunami numerical models before 2004 Indian Ocean tsunami. Even, number of numerical models has been used for inundation mapping effort, evaluation of critical structures, etc. without validation and verification. After 2004, NOAA Center for Tsunami Research (NCTR) established standards for the validation and verification of tsunami numerical models (Synolakis et al. 2008 Pure Appl. Geophys. 165, 2197-2228), which will be used evaluation of critical structures such as nuclear power plants against tsunami attack. NCTR presented analytical, experimental and field benchmark problems aimed to estimate maximum runup and accepted widely by the community. Recently, benchmark problems were suggested by the US National Tsunami Hazard Mitigation Program Mapping & Modeling Benchmarking Workshop: Tsunami Currents on February 9-10, 2015 at Portland, Oregon, USA (http://nws.weather.gov/nthmp/index.html). These benchmark problems concentrated toward validation and verification of tsunami numerical models on tsunami currents. Three of the benchmark problems were: current measurement of the Japan 2011 tsunami in Hilo Harbor, Hawaii, USA and in Tauranga Harbor, New Zealand, and single long-period wave propagating onto a small-scale experimental model of the town of Seaside, Oregon, USA. These benchmark problems were implemented in the Community Modeling Interface for Tsunamis (ComMIT) (Titov et al. 2011 Pure Appl. Geophys. 168, 2121-2131), which is a user-friendly interface to the validated and verified Method of Splitting Tsunami (MOST) (Titov and Synolakis 1995 J. Waterw. Port Coastal Ocean Eng. 121, 308-316) model and is developed by NCTR. The modeling results are compared with the required benchmark data, providing good agreements and results are discussed. Acknowledgment: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant
A Classification Scheme for Smart Manufacturing Systems’ Performance Metrics

PubMed Central

Lee, Y. Tina; Kumaraguru, Senthilkumaran; Jain, Sanjay; Robinson, Stefanie; Helu, Moneer; Hatim, Qais Y.; Rachuri, Sudarsan; Dornfeld, David; Saldana, Christopher J.; Kumara, Soundar

2017-01-01

This paper proposes a classification scheme for performance metrics for smart manufacturing systems. The discussion focuses on three such metrics: agility, asset utilization, and sustainability. For each of these metrics, we discuss classification themes, which we then use to develop a generalized classification scheme. In addition to the themes, we discuss a conceptual model that may form the basis for the information necessary for performance evaluations. Finally, we present future challenges in developing robust, performance-measurement systems for real-time, data-intensive enterprises. PMID:28785744
Benchmarking B-Cell Epitope Prediction with Quantitative Dose-Response Data on Antipeptide Antibodies: Towards Novel Pharmaceutical Product Development

PubMed Central

Caoili, Salvador Eugenio C.

2014-01-01

B-cell epitope prediction can enable novel pharmaceutical product development. However, a mechanistically framed consensus has yet to emerge on benchmarking such prediction, thus presenting an opportunity to establish standards of practice that circumvent epistemic inconsistencies of casting the epitope prediction task as a binary-classification problem. As an alternative to conventional dichotomous qualitative benchmark data, quantitative dose-response data on antibody-mediated biological effects are more meaningful from an information-theoretic perspective in the sense that such effects may be expressed as probabilities (e.g., of functional inhibition by antibody) for which the Shannon information entropy (SIE) can be evaluated as a measure of informativeness. Accordingly, half-maximal biological effects (e.g., at median inhibitory concentrations of antibody) correspond to maximally informative data while undetectable and maximal biological effects correspond to minimally informative data. This applies to benchmarking B-cell epitope prediction for the design of peptide-based immunogens that elicit antipeptide antibodies with functionally relevant cross-reactivity. Presently, the Immune Epitope Database (IEDB) contains relatively few quantitative dose-response data on such cross-reactivity. Only a small fraction of these IEDB data is maximally informative, and many more of them are minimally informative (i.e., with zero SIE). Nevertheless, the numerous qualitative data in IEDB suggest how to overcome the paucity of informative benchmark data. PMID:24949474
Workshop Outline and Training Materials for Adult Learners Developed During the Metric Leader Training Project for Maine.

ERIC Educational Resources Information Center

Butzow, John W.; Yvon, Bernard R.

Outlines of five sessions of the Metric Leader Training Project for Maine are included in this publication. Topics are: (1) Introduction - Length - Temperature; (2) Volume/Capacity - Mass/Weight; (3) Metric Advocacy - Length - Area - Temperature; (4) Metric Education Resources; and (5) Metric Education Planning - Metric Olympics - Final…
Self-supervised online metric learning with low rank constraint for scene categorization.

PubMed

Cong, Yang; Liu, Ji; Yuan, Junsong; Luo, Jiebo

2013-08-01

Conventional visual recognition systems usually train an image classifier in a bath mode with all training data provided in advance. However, in many practical applications, only a small amount of training samples are available in the beginning and many more would come sequentially during online recognition. Because the image data characteristics could change over time, it is important for the classifier to adapt to the new data incrementally. In this paper, we present an online metric learning method to address the online scene recognition problem via adaptive similarity measurement. Given a number of labeled data followed by a sequential input of unseen testing samples, the similarity metric is learned to maximize the margin of the distance among different classes of samples. By considering the low rank constraint, our online metric learning model not only can provide competitive performance compared with the state-of-the-art methods, but also guarantees convergence. A bi-linear graph is also defined to model the pair-wise similarity, and an unseen sample is labeled depending on the graph-based label propagation, while the model can also self-update using the more confident new samples. With the ability of online learning, our methodology can well handle the large-scale streaming video data with the ability of incremental self-updating. We evaluate our model to online scene categorization and experiments on various benchmark datasets and comparisons with state-of-the-art methods demonstrate the effectiveness and efficiency of our algorithm.
Benchmarking Tool Kit.

ERIC Educational Resources Information Center

Canadian Health Libraries Association.

Nine Canadian health libraries participated in a pilot test of the Benchmarking Tool Kit between January and April, 1998. Although the Tool Kit was designed specifically for health libraries, the content and approach are useful to other types of libraries as well. Used to its full potential, benchmarking can provide a common measuring stick to…

In the pursuit of a semantic similarity metric based on UMLS annotations for articles in PubMed Central Open Access.

PubMed

Garcia Castro, Leyla Jael; Berlanga, Rafael; Garcia, Alexander

2015-10-01

Although full-text articles are provided by the publishers in electronic formats, it remains a challenge to find related work beyond the title and abstract context. Identifying related articles based on their abstract is indeed a good starting point; this process is straightforward and does not consume as many resources as full-text based similarity would require. However, further analyses may require in-depth understanding of the full content. Two articles with highly related abstracts can be substantially different regarding the full content. How similarity differs when considering title-and-abstract versus full-text and which semantic similarity metric provides better results when dealing with full-text articles are the main issues addressed in this manuscript. We have benchmarked three similarity metrics - BM25, PMRA, and Cosine, in order to determine which one performs best when using concept-based annotations on full-text documents. We also evaluated variations in similarity values based on title-and-abstract against those relying on full-text. Our test dataset comprises the Genomics track article collection from the 2005 Text Retrieval Conference. Initially, we used an entity recognition software to semantically annotate titles and abstracts as well as full-text with concepts defined in the Unified Medical Language System (UMLS®). For each article, we created a document profile, i.e., a set of identified concepts, term frequency, and inverse document frequency; we then applied various similarity metrics to those document profiles. We considered correlation, precision, recall, and F1 in order to determine which similarity metric performs best with concept-based annotations. For those full-text articles available in PubMed Central Open Access (PMC-OA), we also performed dispersion analyses in order to understand how similarity varies when considering full-text articles. We have found that the PubMed Related Articles similarity metric is the most suitable for
Properties of C-metric spaces

NASA Astrophysics Data System (ADS)

Croitoru, Anca; Apreutesei, Gabriela; Mastorakis, Nikos E.

2017-09-01

The subject of this paper belongs to the theory of approximate metrics [23]. An approximate metric on X is a real application defined on X × X that satisfies only a part of the metric axioms. In a recent paper [23], we introduced a new type of approximate metric, named C-metric, that is an application which satisfies only two metric axioms: symmetry and triangular inequality. The remarkable fact in a C-metric space is that a topological structure induced by the C-metric can be defined. The innovative idea of this paper is that we obtain some convergence properties of a C-metric space in the absence of a metric. In this paper we investigate C-metric spaces. The paper is divided into four sections. Section 1 is for Introduction. In Section 2 we recall some concepts and preliminary results. In Section 3 we present some properties of C-metric spaces, such as convergence properties, a canonical decomposition and a C-fixed point theorem. Finally, in Section 4 some conclusions are highlighted.
Benchmark matrix and guide: Part II.

PubMed

1991-01-01

In the last issue of the Journal of Quality Assurance (September/October 1991, Volume 13, Number 5, pp. 14-19), the benchmark matrix developed by Headquarters Air Force Logistics Command was published. Five horizontal levels on the matrix delineate progress in TQM: business as usual, initiation, implementation, expansion, and integration. The six vertical categories that are critical to the success of TQM are leadership, structure, training, recognition, process improvement, and customer focus. In this issue, "Benchmark Matrix and Guide: Part II" will show specifically how to apply the categories of leadership, structure, and training to the benchmark matrix progress levels. At the intersection of each category and level, specific behavior objectives are listed with supporting behaviors and guidelines. Some categories will have objectives that are relatively easy to accomplish, allowing quick progress from one level to the next. Other categories will take considerable time and effort to complete. In the next issue, Part III of this series will focus on recognition, process improvement, and customer focus.
Sustainability Assessment of a Military Installation: A Template for Developing a Mission Sustainability Framework, Goals, Metrics and Reporting System

DTIC Science & Technology

2009-08-01

integration across base MSF Category: Neighbors and Stakeholders (NS) No. Conceptual Metric No. Conceptual Metric NS1 “ Walkable ” on-base community...34 Walkable " on- base community design 1 " Walkable " community Design – on-base: clustering of facilities, presence of sidewalks, need for car...access to public transit LEED for Neighborhood Development (ND) 0-100 index based on score of walkable community indicators Adapt LEED-ND
Distance Metric Learning via Iterated Support Vector Machines.

PubMed

Zuo, Wangmeng; Wang, Faqiang; Zhang, David; Lin, Liang; Huang, Yuchi; Meng, Deyu; Zhang, Lei

2017-07-11

Distance metric learning aims to learn from the given training data a valid distance metric, with which the similarity between data samples can be more effectively evaluated for classification. Metric learning is often formulated as a convex or nonconvex optimization problem, while most existing methods are based on customized optimizers and become inefficient for large scale problems. In this paper, we formulate metric learning as a kernel classification problem with the positive semi-definite constraint, and solve it by iterated training of support vector machines (SVMs). The new formulation is easy to implement and efficient in training with the off-the-shelf SVM solvers. Two novel metric learning models, namely Positive-semidefinite Constrained Metric Learning (PCML) and Nonnegative-coefficient Constrained Metric Learning (NCML), are developed. Both PCML and NCML can guarantee the global optimality of their solutions. Experiments are conducted on general classification, face verification and person re-identification to evaluate our methods. Compared with the state-of-the-art approaches, our methods can achieve comparable classification accuracy and are efficient in training.
About Using the Metric System.

ERIC Educational Resources Information Center

Illinois State Office of Education, Springfield.

This booklet contains a brief introduction to the use of the metric system. Topics covered include: (1) what is the metric system; (2) how to think metric; (3) some advantages of the metric system; (4) basics of the metric system; (5) how to measure length, area, volume, mass and temperature the metric way; (6) some simple calculations using…
Thought Experiment to Examine Benchmark Performance for Fusion Nuclear Data

NASA Astrophysics Data System (ADS)

Murata, Isao; Ohta, Masayuki; Kusaka, Sachie; Sato, Fuminobu; Miyamaru, Hiroyuki

2017-09-01

There are many benchmark experiments carried out so far with DT neutrons especially aiming at fusion reactor development. These integral experiments seemed vaguely to validate the nuclear data below 14 MeV. However, no precise studies exist now. The author's group thus started to examine how well benchmark experiments with DT neutrons can play a benchmarking role for energies below 14 MeV. Recently, as a next phase, to generalize the above discussion, the energy range was expanded to the entire region. In this study, thought experiments with finer energy bins have thus been conducted to discuss how to generally estimate performance of benchmark experiments. As a result of thought experiments with a point detector, the sensitivity for a discrepancy appearing in the benchmark analysis is "equally" due not only to contribution directly conveyed to the deterctor, but also due to indirect contribution of neutrons (named (A)) making neutrons conveying the contribution, indirect controbution of neutrons (B) making the neutrons (A) and so on. From this concept, it would become clear from a sensitivity analysis in advance how well and which energy nuclear data could be benchmarked with a benchmark experiment.
Landscape pattern metrics and regional assessment

USGS Publications Warehouse

O'Neill, R. V.; Riitters, K.H.; Wickham, J.D.; Jones, K.B.

1999-01-01

The combination of remote imagery data, geographic information systems software, and landscape ecology theory provides a unique basis for monitoring and assessing large-scale ecological systems. The unique feature of the work has been the need to develop and interpret quantitative measures of spatial pattern-the landscape indices. This article reviews what is known about the statistical properties of these pattern metrics and suggests some additional metrics based on island biogeography, percolation theory, hierarchy theory, and economic geography. Assessment applications of this approach have required interpreting the pattern metrics in terms of specific environmental endpoints, such as wildlife and water quality, and research into how to represent synergystic effects of many overlapping sources of stress.
Developing a Common Metric for Evaluating Police Performance in Deadly Force Situations

DTIC Science & Technology

2012-08-27

2005).“Police Inservice Deadly Force Training and Requalification in Washington State.” Law Enforcement Executive Forum, 5(2):67-86. NIJ Metric...OF: EXECUTIVE SUMMARY Background There is a critical lack of scientific evidence about whether deadly force management, accountability and training ...Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 15. SUBJECT TERMS training metrics develoment, deadly encounters
Issues to consider in the derivation of water quality benchmarks for the protection of aquatic life.

PubMed

Schneider, Uwe

2014-01-01

While water quality benchmarks for the protection of aquatic life have been in use in some jurisdictions for several decades (USA, Canada, several European countries), more and more countries are now setting up their own national water quality benchmark development programs. In doing so, they either adopt an existing method from another jurisdiction, update on an existing approach, or develop their own new derivation method. Each approach has its own advantages and disadvantages, and many issues have to be addressed when setting up a water quality benchmark development program or when deriving a water quality benchmark. Each of these tasks requires a special expertise. They may seem simple, but are complex in their details. The intention of this paper was to provide some guidance for this process of water quality benchmark development on the program level, for the derivation methodology development, and in the actual benchmark derivation step, as well as to point out some issues (notably the inclusion of adapted populations and cryptic species and points to consider in the use of the species sensitivity distribution approach) and future opportunities (an international data repository and international collaboration in water quality benchmark development).
Utility of different glycemic control metrics for optimizing management of diabetes.

PubMed

Kohnert, Klaus-Dieter; Heinke, Peter; Vogt, Lutz; Salzsieder, Eckhard

2015-02-15

The benchmark for assessing quality of long-term glycemic control and adjustment of therapy is currently glycated hemoglobin (HbA1c). Despite its importance as an indicator for the development of diabetic complications, recent studies have revealed that this metric has some limitations; it conveys a rather complex message, which has to be taken into consideration for diabetes screening and treatment. On the basis of recent clinical trials, the relationship between HbA1c and cardiovascular outcomes in long-standing diabetes has been called into question. It becomes obvious that other surrogate and biomarkers are needed to better predict cardiovascular diabetes complications and assess efficiency of therapy. Glycated albumin, fructosamin, and 1,5-anhydroglucitol have received growing interest as alternative markers of glycemic control. In addition to measures of hyperglycemia, advanced glucose monitoring methods became available. An indispensible adjunct to HbA1c in routine diabetes care is self-monitoring of blood glucose. This monitoring method is now widely used, as it provides immediate feedback to patients on short-term changes, involving fasting, preprandial, and postprandial glucose levels. Beyond the traditional metrics, glycemic variability has been identified as a predictor of hypoglycemia, and it might also be implicated in the pathogenesis of vascular diabetes complications. Assessment of glycemic variability is thus important, but exact quantification requires frequently sampled glucose measurements. In order to optimize diabetes treatment, there is a need for both key metrics of glycemic control on a day-to-day basis and for more advanced, user-friendly monitoring methods. In addition to traditional discontinuous glucose testing, continuous glucose sensing has become a useful tool to reveal insufficient glycemic management. This new technology is particularly effective in patients with complicated diabetes and provides the opportunity to characterize
Validation metrics for turbulent plasma transport

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holland, C., E-mail: chholland@ucsd.edu

Developing accurate models of plasma dynamics is essential for confident predictive modeling of current and future fusion devices. In modern computer science and engineering, formal verification and validation processes are used to assess model accuracy and establish confidence in the predictive capabilities of a given model. This paper provides an overview of the key guiding principles and best practices for the development of validation metrics, illustrated using examples from investigations of turbulent transport in magnetically confined plasmas. Particular emphasis is given to the importance of uncertainty quantification and its inclusion within the metrics, and the need for utilizing synthetic diagnosticsmore » to enable quantitatively meaningful comparisons between simulation and experiment. As a starting point, the structure of commonly used global transport model metrics and their limitations is reviewed. An alternate approach is then presented, which focuses upon comparisons of predicted local fluxes, fluctuations, and equilibrium gradients against observation. The utility of metrics based upon these comparisons is demonstrated by applying them to gyrokinetic predictions of turbulent transport in a variety of discharges performed on the DIII-D tokamak [J. L. Luxon, Nucl. Fusion 42, 614 (2002)], as part of a multi-year transport model validation activity.« less
Benchmark concentrations for methyl mercury obtained from the 9-year follow-up of the Seychelles Child Development Study.

PubMed

van Wijngaarden, Edwin; Beck, Christopher; Shamlaye, Conrad F; Cernichiari, Elsa; Davidson, Philip W; Myers, Gary J; Clarkson, Thomas W

2006-09-01

Methyl mercury (MeHg) is highly toxic to the developing nervous system. Human exposure is mainly from fish consumption since small amounts are present in all fish. Findings of developmental neurotoxicity following high-level prenatal exposure to MeHg raised the question of whether children whose mothers consumed fish contaminated with background levels during pregnancy are at an increased risk of impaired neurological function. Benchmark doses determined from studies in New Zealand, and the Faroese and Seychelles Islands indicate that a level of 4-25 parts per million (ppm) measured in maternal hair may carry a risk to the infant. However, there are numerous sources of uncertainty that could affect the derivation of benchmark doses, and it is crucial to continue to investigate the most appropriate derivation of safe consumption levels. Earlier, we published the findings from benchmark analyses applied to the data collected on the Seychelles main cohort at the 66-month follow-up period. Here, we expand on the main cohort analyses by determining the benchmark doses (BMD) of MeHg level in maternal hair based on 643 Seychellois children for whom 26 different neurobehavioral endpoints were measured at 9 years of age. Dose-response models applied to these continuous endpoints incorporated a variety of covariates and included the k-power model, the Weibull model, and the logistic model. The average 95% lower confidence limit of the BMD (BMDL) across all 26 endpoints varied from 20.1 ppm (range=17.2-22.5) for the logistic model to 20.4 ppm (range=17.9-23.0) for the k-power model. These estimates are somewhat lower than those obtained after 66 months of follow-up. The Seychelles Child Development Study continues to provide a firm scientific basis for the derivation of safe levels of MeHg consumption.
MoleculeNet: a benchmark for molecular machine learning† †Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc02664a

PubMed Central

Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl

2017-01-01

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm. PMID:29629118
A Validation of Object-Oriented Design Metrics as Quality Indicators

NASA Technical Reports Server (NTRS)

Basili, Victor R.; Briand, Lionel C.; Melo, Walcelio

1997-01-01

This paper presents the results of a study in which we empirically investigated the suits of object-oriented (00) design metrics introduced in another work. More specifically, our goal is to assess these metrics as predictors of fault-prone classes and, therefore, determine whether they can be used as early quality indicators. This study is complementary to the work described where the same suite of metrics had been used to assess frequencies of maintenance changes to classes. To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements. All eight projects were developed using a sequential life cycle model, a well-known 00 analysis/design method and the C++ programming language. Based on empirical and quantitative analysis, the advantages and drawbacks of these 00 metrics are discussed. Several of Chidamber and Kamerer's 00 metrics appear to be useful to predict class fault-proneness during the early phases of the life-cycle. Also, on our data set, they are better predictors than 'traditional' code metrics, which can only be collected at a later phase of the software development processes.
Benchmarking density functionals for hydrogen-helium mixtures with quantum Monte Carlo: Energetics, pressures, and forces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clay, Raymond C.; Holzmann, Markus; Ceperley, David M.

An accurate understanding of the phase diagram of dense hydrogen and helium mixtures is a crucial component in the construction of accurate models of Jupiter, Saturn, and Jovian extrasolar planets. Though DFT based rst principles methods have the potential to provide the accuracy and computational e ciency required for this task, recent benchmarking in hydrogen has shown that achieving this accuracy requires a judicious choice of functional, and a quanti cation of the errors introduced. In this work, we present a quantum Monte Carlo based benchmarking study of a wide range of density functionals for use in hydrogen-helium mixtures atmore » thermodynamic conditions relevant for Jovian planets. Not only do we continue our program of benchmarking energetics and pressures, but we deploy QMC based force estimators and use them to gain insights into how well the local liquid structure is captured by di erent density functionals. We nd that TPSS, BLYP and vdW-DF are the most accurate functionals by most metrics, and that the enthalpy, energy, and pressure errors are very well behaved as a function of helium concentration. Beyond this, we highlight and analyze the major error trends and relative di erences exhibited by the major classes of functionals, and estimate the magnitudes of these e ects when possible.« less
Benchmarking density functionals for hydrogen-helium mixtures with quantum Monte Carlo: Energetics, pressures, and forces

DOE PAGES

Clay, Raymond C.; Holzmann, Markus; Ceperley, David M.; ...

2016-01-19

An accurate understanding of the phase diagram of dense hydrogen and helium mixtures is a crucial component in the construction of accurate models of Jupiter, Saturn, and Jovian extrasolar planets. Though DFT based rst principles methods have the potential to provide the accuracy and computational e ciency required for this task, recent benchmarking in hydrogen has shown that achieving this accuracy requires a judicious choice of functional, and a quanti cation of the errors introduced. In this work, we present a quantum Monte Carlo based benchmarking study of a wide range of density functionals for use in hydrogen-helium mixtures atmore » thermodynamic conditions relevant for Jovian planets. Not only do we continue our program of benchmarking energetics and pressures, but we deploy QMC based force estimators and use them to gain insights into how well the local liquid structure is captured by di erent density functionals. We nd that TPSS, BLYP and vdW-DF are the most accurate functionals by most metrics, and that the enthalpy, energy, and pressure errors are very well behaved as a function of helium concentration. Beyond this, we highlight and analyze the major error trends and relative di erences exhibited by the major classes of functionals, and estimate the magnitudes of these e ects when possible.« less
The LSST metrics analysis framework (MAF)

NASA Astrophysics Data System (ADS)

Jones, R. L.; Yoachim, Peter; Chandrasekharan, Srinivasan; Connolly, Andrew J.; Cook, Kem H.; Ivezic, Željko; Krughoff, K. S.; Petry, Catherine; Ridgway, Stephen T.

2014-07-01

We describe the Metrics Analysis Framework (MAF), an open-source python framework developed to provide a user-friendly, customizable, easily-extensible set of tools for analyzing data sets. MAF is part of the Large Synoptic Survey Telescope (LSST) Simulations effort. Its initial goal is to provide a tool to evaluate LSST Operations Simulation (OpSim) simulated surveys to help understand the effects of telescope scheduling on survey performance, however MAF can be applied to a much wider range of datasets. The building blocks of the framework are Metrics (algorithms to analyze a given quantity of data), Slicers (subdividing the overall data set into smaller data slices as relevant for each Metric), and Database classes (to access the dataset and read data into memory). We describe how these building blocks work together, and provide an example of using MAF to evaluate different dithering strategies. We also outline how users can write their own custom Metrics and use these within the framework.
Metric Education; A Position Paper for Vocational, Technical and Adult Education.

ERIC Educational Resources Information Center

Cooper, Gloria S.; And Others

Part of an Office of Education three-year project on metric education, the position paper is intended to alert and prepare teachers, curriculum developers, and administrators in vocational, technical, and adult education to the change over to the metric system. The five chapters cover issues in metric education, what the metric system is all…
U.S. Residential Photovoltaic (PV) System Prices, Q4 2013 Benchmarks: Cash Purchase, Fair Market Value, and Prepaid Lease Transaction Prices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Davidson, C.; James, T. L.; Margolis, R.

The price of photovoltaic (PV) systems in the United States (i.e., the cost to the system owner) has dropped precipitously in recent years, led by substantial reductions in global PV module prices. This report provides a Q4 2013 update for residential PV systems, based on an objective methodology that closely approximates the book value of a PV system. Several cases are benchmarked to represent common variation in business models, labor rates, and module choice. We estimate a weighted-average cash purchase price of $3.29/W for modeled standard-efficiency, polycrystalline-silicon residential PV systems installed in the United States. This is a 46% declinemore » from the 2013-dollar-adjusted price reported in the Q4 2010 benchmark report. In addition, this report frames the cash purchase price in the context of key price metrics relevant to the continually evolving landscape of third-party-owned PV systems by benchmarking the minimum sustainable lease price and the fair market value of residential PV systems.« less

DEVELOPMENT OF METRICS FOR PROTOCOLS AND OTHER TECHNICAL PRODUCTS.

PubMed

Veiga, Daniela Francescato; Ferreira, Lydia Masako

2015-01-01

To develop a proposal for metrics for protocols and other technical products to be applied in assessing the Postgraduate Programs of Medicine III - Capes. The 2013 area documents of all the 48 Capes areas were read. From the analysis of the criteria used by the areas at the 2013's Triennal Assessment, a proposal for metrics for protocols and other technical products was developed to be applied in assessing the Postgraduate Programs of Medicine III. This proposal was based on the criteria of Biological Sciences I and Interdisciplinary areas. Only seven areas have described a scoring system for technical products. The products considered and the scoring varied widely. Due to the wide range of different technical products which could be considered relevant, and that would not be punctuated if they were not previously specified, it was developed, for the Medicine III, a proposal for metrics in which five specific criteria to be analyzed: Demand, Relevance/Impact, Scope, Complexity and Adherence to the Program. Based on these criteria, each product can receive 10 to 100 points. This proposal can be applied to the item Intellectual Production of the evaluation form, in subsection "Technical production, patents and other relevant production". The program will be scored as Very Good when it reaches mean ≥150 points/permanent professor/quadrennium; Good, mean between 100 and 149 points; Regular, mean between 60 and 99 points; Weak mean between 30 and 59 points; Insufficient, up to 29 points/permanent professor/quadrennium. Desenvolver proposta de métricas para protocolos e outras produções técnicas a serem aplicadas na avaliação dos Programas de Pós-Graduação da Área Medicina III da Capes. Foram lidos os documentos de área de 2013 de todas as 48 Áreas da Capes. A partir da análise dos critérios utilizados por elas na avaliação trienal 2013, foi desenvolvida uma proposta de métricas para protocolos e outras produções técnicas. Esta proposta foi baseada
Software metrics: The key to quality software on the NCC project

NASA Technical Reports Server (NTRS)

Burns, Patricia J.

1993-01-01

Network Control Center (NCC) Project metrics are captured during the implementation and testing phases of the NCCDS software development lifecycle. The metrics data collection and reporting function has interfaces with all elements of the NCC project. Close collaboration with all project elements has resulted in the development of a defined and repeatable set of metrics processes. The resulting data are used to plan and monitor release activities on a weekly basis. The use of graphical outputs facilitates the interpretation of progress and status. The successful application of metrics throughout the NCC project has been instrumental in the delivery of quality software. The use of metrics on the NCC Project supports the needs of the technical and managerial staff. This paper describes the project, the functions supported by metrics, the data that are collected and reported, how the data are used, and the improvements in the quality of deliverable software since the metrics processes and products have been in use.
What Randomized Benchmarking Actually Measures

DOE PAGES

Proctor, Timothy; Rudinger, Kenneth; Young, Kevin; ...

2017-09-28

Randomized benchmarking (RB) is widely used to measure an error rate of a set of quantum gates, by performing random circuits that would do nothing if the gates were perfect. In the limit of no finite-sampling error, the exponential decay rate of the observable survival probabilities, versus circuit length, yields a single error metric r. For Clifford gates with arbitrary small errors described by process matrices, r was believed to reliably correspond to the mean, over all Clifford gates, of the average gate infidelity between the imperfect gates and their ideal counterparts. We show that this quantity is not amore » well-defined property of a physical gate set. It depends on the representations used for the imperfect and ideal gates, and the variant typically computed in the literature can differ from r by orders of magnitude. We present new theories of the RB decay that are accurate for all small errors describable by process matrices, and show that the RB decay curve is a simple exponential for all such errors. Here, these theories allow explicit computation of the error rate that RB measures (r), but as far as we can tell it does not correspond to the infidelity of a physically allowed (completely positive) representation of the imperfect gates.« less
Evaluation Metrics for Biostatistical and Epidemiological Collaborations

PubMed Central

Rubio, Doris McGartland; del Junco, Deborah J.; Bhore, Rafia; Lindsell, Christopher J.; Oster, Robert A.; Wittkowski, Knut M.; Welty, Leah J.; Li, Yi-Ju; DeMets, Dave

2011-01-01

Increasing demands for evidence-based medicine and for the translation of biomedical research into individual and public health benefit have been accompanied by the proliferation of special units that offer expertise in biostatistics, epidemiology, and research design (BERD) within academic health centers. Objective metrics that can be used to evaluate, track, and improve the performance of these BERD units are critical to their successful establishment and sustainable future. To develop a set of reliable but versatile metrics that can be adapted easily to different environments and evolving needs, we consulted with members of BERD units from the consortium of academic health centers funded by the Clinical and Translational Science Award Program of the National Institutes of Health. Through a systematic process of consensus building and document drafting, we formulated metrics that covered the three identified domains of BERD practices: the development and maintenance of collaborations with clinical and translational science investigators, the application of BERD-related methods to clinical and translational research, and the discovery of novel BERD-related methodologies. In this article, we describe the set of metrics and advocate their use for evaluating BERD practices. The routine application, comparison of findings across diverse BERD units, and ongoing refinement of the metrics will identify trends, facilitate meaningful changes, and ultimately enhance the contribution of BERD activities to biomedical research. PMID:21284015
Evaluation metrics for biostatistical and epidemiological collaborations.

PubMed

Rubio, Doris McGartland; Del Junco, Deborah J; Bhore, Rafia; Lindsell, Christopher J; Oster, Robert A; Wittkowski, Knut M; Welty, Leah J; Li, Yi-Ju; Demets, Dave

2011-10-15

Increasing demands for evidence-based medicine and for the translation of biomedical research into individual and public health benefit have been accompanied by the proliferation of special units that offer expertise in biostatistics, epidemiology, and research design (BERD) within academic health centers. Objective metrics that can be used to evaluate, track, and improve the performance of these BERD units are critical to their successful establishment and sustainable future. To develop a set of reliable but versatile metrics that can be adapted easily to different environments and evolving needs, we consulted with members of BERD units from the consortium of academic health centers funded by the Clinical and Translational Science Award Program of the National Institutes of Health. Through a systematic process of consensus building and document drafting, we formulated metrics that covered the three identified domains of BERD practices: the development and maintenance of collaborations with clinical and translational science investigators, the application of BERD-related methods to clinical and translational research, and the discovery of novel BERD-related methodologies. In this article, we describe the set of metrics and advocate their use for evaluating BERD practices. The routine application, comparison of findings across diverse BERD units, and ongoing refinement of the metrics will identify trends, facilitate meaningful changes, and ultimately enhance the contribution of BERD activities to biomedical research. Copyright © 2011 John Wiley & Sons, Ltd.
Construction of self-dual codes in the Rosenbloom-Tsfasman metric

NASA Astrophysics Data System (ADS)

Krisnawati, Vira Hari; Nisa, Anzi Lina Ukhtin

2017-12-01

Linear code is a very basic code and very useful in coding theory. Generally, linear code is a code over finite field in Hamming metric. Among the most interesting families of codes, the family of self-dual code is a very important one, because it is the best known error-correcting code. The concept of Hamming metric is develop into Rosenbloom-Tsfasman metric (RT-metric). The inner product in RT-metric is different from Euclid inner product that is used to define duality in Hamming metric. Most of the codes which are self-dual in Hamming metric are not so in RT-metric. And, generator matrix is very important to construct a code because it contains basis of the code. Therefore in this paper, we give some theorems and methods to construct self-dual codes in RT-metric by considering properties of the inner product and generator matrix. Also, we illustrate some examples for every kind of the construction.
The Isprs Benchmark on Indoor Modelling

NASA Astrophysics Data System (ADS)

Khoshelham, K.; Díaz Vilariño, L.; Peter, M.; Kang, Z.; Acharya, D.

2017-09-01

Automated generation of 3D indoor models from point cloud data has been a topic of intensive research in recent years. While results on various datasets have been reported in literature, a comparison of the performance of different methods has not been possible due to the lack of benchmark datasets and a common evaluation framework. The ISPRS benchmark on indoor modelling aims to address this issue by providing a public benchmark dataset and an evaluation framework for performance comparison of indoor modelling methods. In this paper, we present the benchmark dataset comprising several point clouds of indoor environments captured by different sensors. We also discuss the evaluation and comparison of indoor modelling methods based on manually created reference models and appropriate quality evaluation criteria. The benchmark dataset is available for download at: benchmark-on-indoor-modelling.html"target="_blank">http://www2.isprs.org/commissions/comm4/wg5/benchmark-on-indoor-modelling.html.
22 CFR 226.15 - Metric system of measurement.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Metric system of measurement. 226.15 Section 226.15 Foreign Relations AGENCY FOR INTERNATIONAL DEVELOPMENT ADMINISTRATION OF ASSISTANCE AWARDS TO U.S. NON-GOVERNMENTAL ORGANIZATIONS Pre-award Requirements § 226.15 Metric system of measurement. (a...
The metrics and correlates of physician migration from Africa.

PubMed

Arah, Onyebuchi A

2007-05-17

Physician migration from poor to rich countries is considered an important contributor to the growing health workforce crisis in the developing world. This is particularly true for Africa. The perceived magnitude of such migration for each source country might, however, depend on the choice of metrics used in the analysis. This study examined the influence of choice of migration metrics on the rankings of African countries that suffered the most physician migration, and investigated the correlates of physician migration. Ranking and correlational analyses were conducted on African physician migration data adjusted for bilateral net flows, and supplemented with developmental, economic and health system data. The setting was the 53 African birth countries of African-born physicians working in nine wealthier destination countries. Three metrics of physician migration were used: total number of physician émigrés; emigration fraction defined as the proportion of the potential physician pool working in destination countries; and physician migration density defined as the number of physician émigrés per 1000 population of the African source country. Rankings based on any of the migration metrics differed substantially from those based on the other two metrics. Although the emigration fraction and physician migration density metrics gave proportionality to the migration crisis, only the latter was consistently associated with source countries' workforce capacity, health, health spending, economic and development characteristics. As such, higher physician migration density was seen among African countries with relatively higher health workforce capacity (0.401 < or = r < or = 0.694, p < or = 0.011), health status, health spending, and development. The perceived magnitude of physician migration is sensitive to the choice of metrics. Complementing the emigration fraction, the physician migration density is a metric which gives a different but proportionate picture of
Structural texture similarity metrics for image analysis and retrieval.

PubMed

Zujovic, Jana; Pappas, Thrasyvoulos N; Neuhoff, David L

2013-07-01

We develop new metrics for texture similarity that accounts for human visual perception and the stochastic nature of textures. The metrics rely entirely on local image statistics and allow substantial point-by-point deviations between textures that according to human judgment are essentially identical. The proposed metrics extend the ideas of structural similarity and are guided by research in texture analysis-synthesis. They are implemented using a steerable filter decomposition and incorporate a concise set of subband statistics, computed globally or in sliding windows. We conduct systematic tests to investigate metric performance in the context of "known-item search," the retrieval of textures that are "identical" to the query texture. This eliminates the need for cumbersome subjective tests, thus enabling comparisons with human performance on a large database. Our experimental results indicate that the proposed metrics outperform peak signal-to-noise ratio (PSNR), structural similarity metric (SSIM) and its variations, as well as state-of-the-art texture classification metrics, using standard statistical measures.
The Metric System--An Overview.

ERIC Educational Resources Information Center

Hovey, Larry; Hovey, Kathi

1983-01-01

Sections look at: (1) Historical Perspective; (2) Naming the New System; (3) The Metric Units; (4) Measuring Larger and Smaller Amounts; (5) Advantage of Using the Metric System; (6) Metric Symbols; (7) Conversion from Metric to Customary System; (8) General Hints for Helping Children Understand; and (9) Current Status of Metric Conversion. (MP)
Observable traces of non-metricity: New constraints on metric-affine gravity

NASA Astrophysics Data System (ADS)

Delhom-Latorre, Adrià; Olmo, Gonzalo J.; Ronco, Michele

2018-05-01

Relaxing the Riemannian condition to incorporate geometric quantities such as torsion and non-metricity may allow to explore new physics associated with defects in a hypothetical space-time microstructure. Here we show that non-metricity produces observable effects in quantum fields in the form of 4-fermion contact interactions, thereby allowing us to constrain the scale of non-metricity to be greater than 1 TeV by using results on Bahbah scattering. Our analysis is carried out in the framework of a wide class of theories of gravity in the metric-affine approach. The bound obtained represents an improvement of several orders of magnitude to previous experimental constraints.
Measuring distance “as the horse runs”: Cross-scale comparison of terrain-based metrics

USGS Publications Warehouse

Buttenfield, Barbara P.; Ghandehari, M; Leyk, S; Stanislawski, Larry V.; Brantley, M E; Qiang, Yi

2016-01-01

Distance metrics play significant roles in spatial modeling tasks, such as flood inundation (Tucker and Hancock 2010), stream extraction (Stanislawski et al. 2015), power line routing (Kiessling et al. 2003) and analysis of surface pollutants such as nitrogen (Harms et al. 2009). Avalanche risk is based on slope, aspect, and curvature, all directly computed from distance metrics (Gutiérrez 2012). Distance metrics anchor variogram analysis, kernel estimation, and spatial interpolation (Cressie 1993). Several approaches are employed to measure distance. Planar metrics measure straight line distance between two points (“as the crow flies”) and are simple and intuitive, but suffer from uncertainties. Planar metrics assume that Digital Elevation Model (DEM) pixels are rigid and flat, as tiny facets of ceramic tile approximating a continuous terrain surface. In truth, terrain can bend, twist and undulate within each pixel.Work with Light Detection and Ranging (lidar) data or High Resolution Topography to achieve precise measurements present challenges, as filtering can eliminate or distort significant features (Passalacqua et al. 2015). The current availability of lidar data is far from comprehensive in developed nations, and non-existent in many rural and undeveloped regions. Notwithstanding computational advances, distance estimation on DEMs has never been systematically assessed, due to assumptions that improvements are so small that surface adjustment is unwarranted. For individual pixels inaccuracies may be small, but additive effects can propagate dramatically, especially in regional models (e.g., disaster evacuation) or global models (e.g., sea level rise) where pixels span dozens to hundreds of kilometers (Usery et al 2003). Such models are increasingly common, lending compelling reasons to understand shortcomings in the use of planar distance metrics. Researchers have studied curvature-based terrain modeling. Jenny et al. (2011) use curvature to generate
Austin Community College Benchmarking Update.

ERIC Educational Resources Information Center

Austin Community Coll., TX. Office of Institutional Effectiveness.

Austin Community College contracted with MGT of America, Inc. in spring 1999 to develop a peer and benchmark (best) practices analysis on key indicators. These indicators were updated in spring 2002 using data from eight Texas community colleges and four non-Texas institutions that represent large, comprehensive, urban community colleges, similar…
A large-scale benchmark of gene prioritization methods.

PubMed

Guala, Dimitri; Sonnhammer, Erik L L

2017-04-21

In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology(GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
NASA metrication activities

NASA Technical Reports Server (NTRS)

Vlannes, P. N.

1978-01-01

NASA's organization and policy for metrification, history from 1964, NASA participation in Federal agency activities, interaction with nongovernmental metrication organizations, and the proposed metrication assessment study are reviewed.
Benchmarking health IT among OECD countries: better data for better policy

PubMed Central

Adler-Milstein, Julia; Ronchi, Elettra; Cohen, Genna R; Winn, Laura A Pannella; Jha, Ashish K

2014-01-01

Objective To develop benchmark measures of health information and communication technology (ICT) use to facilitate cross-country comparisons and learning. Materials and methods The effort is led by the Organisation for Economic Co-operation and Development (OECD). Approaches to definition and measurement within four ICT domains were compared across seven OECD countries in order to identify functionalities in each domain. These informed a set of functionality-based benchmark measures, which were refined in collaboration with representatives from more than 20 OECD and non-OECD countries. We report on progress to date and remaining work to enable countries to begin to collect benchmark data. Results The four benchmarking domains include provider-centric electronic record, patient-centric electronic record, health information exchange, and tele-health. There was broad agreement on functionalities in the provider-centric electronic record domain (eg, entry of core patient data, decision support), and less agreement in the other three domains in which country representatives worked to select benchmark functionalities. Discussion Many countries are working to implement ICTs to improve healthcare system performance. Although many countries are looking to others as potential models, the lack of consistent terminology and approach has made cross-national comparisons and learning difficult. Conclusions As countries develop and implement strategies to increase the use of ICTs to promote health goals, there is a historic opportunity to enable cross-country learning. To facilitate this learning and reduce the chances that individual countries flounder, a common understanding of health ICT adoption and use is needed. The OECD-led benchmarking process is a crucial step towards achieving this. PMID:23721983
Benchmarking health IT among OECD countries: better data for better policy.

PubMed

Adler-Milstein, Julia; Ronchi, Elettra; Cohen, Genna R; Winn, Laura A Pannella; Jha, Ashish K

2014-01-01

To develop benchmark measures of health information and communication technology (ICT) use to facilitate cross-country comparisons and learning. The effort is led by the Organisation for Economic Co-operation and Development (OECD). Approaches to definition and measurement within four ICT domains were compared across seven OECD countries in order to identify functionalities in each domain. These informed a set of functionality-based benchmark measures, which were refined in collaboration with representatives from more than 20 OECD and non-OECD countries. We report on progress to date and remaining work to enable countries to begin to collect benchmark data. The four benchmarking domains include provider-centric electronic record, patient-centric electronic record, health information exchange, and tele-health. There was broad agreement on functionalities in the provider-centric electronic record domain (eg, entry of core patient data, decision support), and less agreement in the other three domains in which country representatives worked to select benchmark functionalities. Many countries are working to implement ICTs to improve healthcare system performance. Although many countries are looking to others as potential models, the lack of consistent terminology and approach has made cross-national comparisons and learning difficult. As countries develop and implement strategies to increase the use of ICTs to promote health goals, there is a historic opportunity to enable cross-country learning. To facilitate this learning and reduce the chances that individual countries flounder, a common understanding of health ICT adoption and use is needed. The OECD-led benchmarking process is a crucial step towards achieving this.
Towards a Visual Quality Metric for Digital Video

NASA Technical Reports Server (NTRS)

Watson, Andrew B.

1998-01-01

The advent of widespread distribution of digital video creates a need for automated methods for evaluating visual quality of digital video. This is particularly so since most digital video is compressed using lossy methods, which involve the controlled introduction of potentially visible artifacts. Compounding the problem is the bursty nature of digital video, which requires adaptive bit allocation based on visual quality metrics. In previous work, we have developed visual quality metrics for evaluating, controlling, and optimizing the quality of compressed still images. These metrics incorporate simplified models of human visual sensitivity to spatial and chromatic visual signals. The challenge of video quality metrics is to extend these simplified models to temporal signals as well. In this presentation I will discuss a number of the issues that must be resolved in the design of effective video quality metrics. Among these are spatial, temporal, and chromatic sensitivity and their interactions, visual masking, and implementation complexity. I will also touch on the question of how to evaluate the performance of these metrics.
Benchmark Evaluation of Start-Up and Zero-Power Measurements at the High-Temperature Engineering Test Reactor

DOE PAGES

Bess, John D.; Fujimoto, Nozomu

2014-10-09

Benchmark models were developed to evaluate six cold-critical and two warm-critical, zero-power measurements of the HTTR. Additional measurements of a fully-loaded subcritical configuration, core excess reactivity, shutdown margins, six isothermal temperature coefficients, and axial reaction-rate distributions were also evaluated as acceptable benchmark experiments. Insufficient information is publicly available to develop finely-detailed models of the HTTR as much of the design information is still proprietary. However, the uncertainties in the benchmark models are judged to be of sufficient magnitude to encompass any biases and bias uncertainties incurred through the simplification process used to develop the benchmark models. Dominant uncertainties in themore » experimental keff for all core configurations come from uncertainties in the impurity content of the various graphite blocks that comprise the HTTR. Monte Carlo calculations of keff are between approximately 0.9 % and 2.7 % greater than the benchmark values. Reevaluation of the HTTR models as additional information becomes available could improve the quality of this benchmark and possibly reduce the computational biases. High-quality characterization of graphite impurities would significantly improve the quality of the HTTR benchmark assessment. Simulation of the other reactor physics measurements are in good agreement with the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.« less

Metrics for Offline Evaluation of Prognostic Performance

NASA Technical Reports Server (NTRS)

Saxena, Abhinav; Celaya, Jose; Saha, Bhaskar; Saha, Sankalita; Goebel, Kai

2010-01-01

Prognostic performance evaluation has gained significant attention in the past few years. Currently, prognostics concepts lack standard definitions and suffer from ambiguous and inconsistent interpretations. This lack of standards is in part due to the varied end-user requirements for different applications, time scales, available information, domain dynamics, etc. to name a few. The research community has used a variety of metrics largely based on convenience and their respective requirements. Very little attention has been focused on establishing a standardized approach to compare different efforts. This paper presents several new evaluation metrics tailored for prognostics that were recently introduced and were shown to effectively evaluate various algorithms as compared to other conventional metrics. Specifically, this paper presents a detailed discussion on how these metrics should be interpreted and used. These metrics have the capability of incorporating probabilistic uncertainty estimates from prognostic algorithms. In addition to quantitative assessment they also offer a comprehensive visual perspective that can be used in designing the prognostic system. Several methods are suggested to customize these metrics for different applications. Guidelines are provided to help choose one method over another based on distribution characteristics. Various issues faced by prognostics and its performance evaluation are discussed followed by a formal notational framework to help standardize subsequent developments.
Mastering Metrics

ERIC Educational Resources Information Center

Parrot, Annette M.

2005-01-01

By the time students reach a middle school science course, they are expected to make measurements using the metric system. However, most are not practiced in its use, as their experience in metrics is often limited to one unit they were taught in elementary school. This lack of knowledge is not wholly the fault of formal education. Although the…
Closed-Loop Neuromorphic Benchmarks

PubMed Central

Stewart, Terrence C.; DeWolf, Travis; Kleinhans, Ashley; Eliasmith, Chris

2015-01-01

Evaluating the effectiveness and performance of neuromorphic hardware is difficult. It is even more difficult when the task of interest is a closed-loop task; that is, a task where the output from the neuromorphic hardware affects some environment, which then in turn affects the hardware's future input. However, closed-loop situations are one of the primary potential uses of neuromorphic hardware. To address this, we present a methodology for generating closed-loop benchmarks that makes use of a hybrid of real physical embodiment and a type of “minimal” simulation. Minimal simulation has been shown to lead to robust real-world performance, while still maintaining the practical advantages of simulation, such as making it easy for the same benchmark to be used by many researchers. This method is flexible enough to allow researchers to explicitly modify the benchmarks to identify specific task domains where particular hardware excels. To demonstrate the method, we present a set of novel benchmarks that focus on motor control for an arbitrary system with unknown external forces. Using these benchmarks, we show that an error-driven learning rule can consistently improve motor control performance across a randomly generated family of closed-loop simulations, even when there are up to 15 interacting joints to be controlled. PMID:26696820
Pynamic: the Python Dynamic Benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, G L; Ahn, D H; de Supinksi, B R

2007-07-10

Python is widely used in scientific computing to facilitate application development and to support features such as computational steering. Making full use of some of Python's popular features, which improve programmer productivity, leads to applications that access extremely high numbers of dynamically linked libraries (DLLs). As a result, some important Python-based applications severely stress a system's dynamic linking and loading capabilities and also cause significant difficulties for most development environment tools, such as debuggers. Furthermore, using the Python paradigm for large scale MPI-based applications can create significant file IO and further stress tools and operating systems. In this paper, wemore » present Pynamic, the first benchmark program to support configurable emulation of a wide-range of the DLL usage of Python-based applications for large scale systems. Pynamic has already accurately reproduced system software and tool issues encountered by important large Python-based scientific applications on our supercomputers. Pynamic provided insight for our system software and tool vendors, and our application developers, into the impact of several design decisions. As we describe the Pynamic benchmark, we will highlight some of the issues discovered in our large scale system software and tools using Pynamic.« less
Performance metrics for the evaluation of hyperspectral chemical identification systems

NASA Astrophysics Data System (ADS)

Truslow, Eric; Golowich, Steven; Manolakis, Dimitris; Ingle, Vinay

2016-02-01

Remote sensing of chemical vapor plumes is a difficult but important task for many military and civilian applications. Hyperspectral sensors operating in the long-wave infrared regime have well-demonstrated detection capabilities. However, the identification of a plume's chemical constituents, based on a chemical library, is a multiple hypothesis testing problem which standard detection metrics do not fully describe. We propose using an additional performance metric for identification based on the so-called Dice index. Our approach partitions and weights a confusion matrix to develop both the standard detection metrics and identification metric. Using the proposed metrics, we demonstrate that the intuitive system design of a detector bank followed by an identifier is indeed justified when incorporating performance information beyond the standard detection metrics.
Development of Metric for Measuring the Impact of RD&D Funding on GTO's Geothermal Exploration Goals (Presentation)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jenne, S.; Young, K. R.; Thorsteinsson, H.

The Department of Energy's Geothermal Technologies Office (GTO) provides RD&D funding for geothermal exploration technologies with the goal of lowering the risks and costs of geothermal development and exploration. In 2012, NREL was tasked with developing a metric to measure the impacts of this RD&D funding on the cost and time required for exploration activities. The development of this metric included collecting cost and time data for exploration techniques, creating a baseline suite of exploration techniques to which future exploration and cost and time improvements could be compared, and developing an online tool for graphically showing potential project impacts (allmore » available at http://en.openei.org/wiki/Gateway:Geothermal). The conference paper describes the methodology used to define the baseline exploration suite of techniques (baseline), as well as the approach that was used to create the cost and time data set that populates the baseline. The resulting product, an online tool for measuring impact, and the aggregated cost and time data are available on the Open EI website for public access (http://en.openei.org).« less
Benchmarking strategies for measuring the quality of healthcare: problems and prospects.

PubMed

Lovaglio, Pietro Giorgio

2012-01-01

Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare and implementing benchmarking strategies. Besides offering accreditation and certification processes, recent approaches measure the performance of healthcare institutions in order to evaluate their effectiveness, defined as the capacity to provide treatment that modifies and improves the patient's state of health. This paper, dealing with hospital effectiveness, focuses on research methods for effectiveness analyses within a strategy comparing different healthcare institutions. The paper, after having introduced readers to the principle debates on benchmarking strategies, which depend on the perspective and type of indicators used, focuses on the methodological problems related to performing consistent benchmarking analyses. Particularly, statistical methods suitable for controlling case-mix, analyzing aggregate data, rare events, and continuous outcomes measured with error are examined. Specific challenges of benchmarking strategies, such as the risk of risk adjustment (case-mix fallacy, underreporting, risk of comparing noncomparable hospitals), selection bias, and possible strategies for the development of consistent benchmarking analyses, are discussed. Finally, to demonstrate the feasibility of the illustrated benchmarking strategies, an application focused on determining regional benchmarks for patient satisfaction (using 2009 Lombardy Region Patient Satisfaction Questionnaire) is proposed.
Fault Management Metrics

NASA Technical Reports Server (NTRS)

Johnson, Stephen B.; Ghoshal, Sudipto; Haste, Deepak; Moore, Craig

2017-01-01

This paper describes the theory and considerations in the application of metrics to measure the effectiveness of fault management. Fault management refers here to the operational aspect of system health management, and as such is considered as a meta-control loop that operates to preserve or maximize the system's ability to achieve its goals in the face of current or prospective failure. As a suite of control loops, the metrics to estimate and measure the effectiveness of fault management are similar to those of classical control loops in being divided into two major classes: state estimation, and state control. State estimation metrics can be classified into lower-level subdivisions for detection coverage, detection effectiveness, fault isolation and fault identification (diagnostics), and failure prognosis. State control metrics can be classified into response determination effectiveness and response effectiveness. These metrics are applied to each and every fault management control loop in the system, for each failure to which they apply, and probabilistically summed to determine the effectiveness of these fault management control loops to preserve the relevant system goals that they are intended to protect.
Transport impacts on atmosphere and climate: Metrics

NASA Astrophysics Data System (ADS)

Fuglestvedt, J. S.; Shine, K. P.; Berntsen, T.; Cook, J.; Lee, D. S.; Stenke, A.; Skeie, R. B.; Velders, G. J. M.; Waitz, I. A.

2010-12-01

The transport sector emits a wide variety of gases and aerosols, with distinctly different characteristics which influence climate directly and indirectly via chemical and physical processes. Tools that allow these emissions to be placed on some kind of common scale in terms of their impact on climate have a number of possible uses such as: in agreements and emission trading schemes; when considering potential trade-offs between changes in emissions resulting from technological or operational developments; and/or for comparing the impact of different environmental impacts of transport activities. Many of the non-CO 2 emissions from the transport sector are short-lived substances, not currently covered by the Kyoto Protocol. There are formidable difficulties in developing metrics and these are particularly acute for such short-lived species. One difficulty concerns the choice of an appropriate structure for the metric (which may depend on, for example, the design of any climate policy it is intended to serve) and the associated value judgements on the appropriate time periods to consider; these choices affect the perception of the relative importance of short- and long-lived species. A second difficulty is the quantification of input parameters (due to underlying uncertainty in atmospheric processes). In addition, for some transport-related emissions, the values of metrics (unlike the gases included in the Kyoto Protocol) depend on where and when the emissions are introduced into the atmosphere - both the regional distribution and, for aircraft, the distribution as a function of altitude, are important. In this assessment of such metrics, we present Global Warming Potentials (GWPs) as these have traditionally been used in the implementation of climate policy. We also present Global Temperature Change Potentials (GTPs) as an alternative metric, as this, or a similar metric may be more appropriate for use in some circumstances. We use radiative forcings and lifetimes
Towards a physics on fractals: Differential vector calculus in three-dimensional continuum with fractal metric

NASA Astrophysics Data System (ADS)

Balankin, Alexander S.; Bory-Reyes, Juan; Shapiro, Michael

2016-02-01

One way to deal with physical problems on nowhere differentiable fractals is the mapping of these problems into the corresponding problems for continuum with a proper fractal metric. On this way different definitions of the fractal metric were suggested to account for the essential fractal features. In this work we develop the metric differential vector calculus in a three-dimensional continuum with a non-Euclidean metric. The metric differential forms and Laplacian are introduced, fundamental identities for metric differential operators are established and integral theorems are proved by employing the metric version of the quaternionic analysis for the Moisil-Teodoresco operator, which has been introduced and partially developed in this paper. The relations between the metric and conventional operators are revealed. It should be emphasized that the metric vector calculus developed in this work provides a comprehensive mathematical formalism for the continuum with any suitable definition of fractal metric. This offers a novel tool to study physics on fractals.
Validation of tsunami inundation model TUNA-RP using OAR-PMEL-135 benchmark problem set

NASA Astrophysics Data System (ADS)

Koh, H. L.; Teh, S. Y.; Tan, W. K.; Kh'ng, X. Y.

2017-05-01

A standard set of benchmark problems, known as OAR-PMEL-135, is developed by the US National Tsunami Hazard Mitigation Program for tsunami inundation model validation. Any tsunami inundation model must be tested for its accuracy and capability using this standard set of benchmark problems before it can be gainfully used for inundation simulation. The authors have previously developed an in-house tsunami inundation model known as TUNA-RP. This inundation model solves the two-dimensional nonlinear shallow water equations coupled with a wet-dry moving boundary algorithm. This paper presents the validation of TUNA-RP against the solutions provided in the OAR-PMEL-135 benchmark problem set. This benchmark validation testing shows that TUNA-RP can indeed perform inundation simulation with accuracy consistent with that in the tested benchmark problem set.
Development of a perceptually calibrated objective metric of noise

NASA Astrophysics Data System (ADS)

Keelan, Brian W.; Jin, Elaine W.; Prokushkin, Sergey

2011-01-01

A system simulation model was used to create scene-dependent noise masks that reflect current performance of mobile phone cameras. Stimuli with different overall magnitudes of noise and with varying mixtures of red, green, blue, and luminance noises were included in the study. Eleven treatments in each of ten pictorial scenes were evaluated by twenty observers using the softcopy ruler method. In addition to determining the quality loss function in just noticeable differences (JNDs) for the average observer and scene, transformations for different combinations of observer sensitivity and scene susceptibility were derived. The psychophysical results were used to optimize an objective metric of isotropic noise based on system noise power spectra (NPS), which were integrated over a visual frequency weighting function to yield perceptually relevant variances and covariances in CIE L*a*b* space. Because the frequency weighting function is expressed in terms of cycles per degree at the retina, it accounts for display pixel size and viewing distance effects, so application-specific predictions can be made. Excellent results were obtained using only L* and a* variances and L*a* covariance, with relative weights of 100, 5, and 12, respectively. The positive a* weight suggests that the luminance (photopic) weighting is slightly narrow on the long wavelength side for predicting perceived noisiness. The L*a* covariance term, which is normally negative, reflects masking between L* and a* noise, as confirmed in informal evaluations. Test targets in linear sRGB and rendered L*a*b* spaces for each treatment are available at http://www.aptina.com/ImArch/ to enable other researchers to test metrics of their own design and calibrate them to JNDs of quality loss without performing additional observer experiments. Such JND-calibrated noise metrics are particularly valuable for comparing the impact of noise and other attributes, and for computing overall image quality.
Countermeasure development using a formalised metric-based process

NASA Astrophysics Data System (ADS)

Barker, Laurence

2008-10-01

Guided weapons, are a potent threat to both air and surface platforms; to protect the platform, Countermeasures are often used to disrupt the operation of the tracking system. Development of effective techniques to defeat the guidance sensors is a complex activity. The countermeasure often responds to the behaviour of a responsive sensor system, creating a "closed loop" interaction. Performance assessment is difficult, and determining that enough knowledge exists to make a case that a platform is adequately protected is challenging. A set of metrics known as Countermeasure Confidence Levels (CCL) is described. These set out a measure of confidence in prediction of Countermeasure performance. The CCL scale provides, for the first time, a method to determine whether enough evidence exists to support development activity and introduction to operational service. Application of the CCL scale to development of a hypothetical countermeasure is described. This tracks how the countermeasure is matured from initial concept to in-service application. The purpose of each stage is described, together with a description of what work is likely to be needed. This will involve timely use of analysis, simulation, laboratory work and field testing. The use of the CCL scale at key decision points is described. These include procurement decision points, and entry-to-service decisions. Each stage requires collection of evidence of effectiveness. Completeness of the available evidence can be assessed, and duplication can be avoided. Read-across between concepts, weapon systems and platforms can be addressed and the impact of technology insertion can be assessed.
Solution of the neutronics code dynamic benchmark by finite element method

NASA Astrophysics Data System (ADS)

Avvakumov, A. V.; Vabishchevich, P. N.; Vasilev, A. O.; Strizhov, V. F.

2016-10-01

The objective is to analyze the dynamic benchmark developed by Atomic Energy Research for the verification of best-estimate neutronics codes. The benchmark scenario includes asymmetrical ejection of a control rod in a water-type hexagonal reactor at hot zero power. A simple Doppler feedback mechanism assuming adiabatic fuel temperature heating is proposed. The finite element method on triangular calculation grids is used to solve the three-dimensional neutron kinetics problem. The software has been developed using the engineering and scientific calculation library FEniCS. The matrix spectral problem is solved using the scalable and flexible toolkit SLEPc. The solution accuracy of the dynamic benchmark is analyzed by condensing calculation grid and varying degree of finite elements.
A benchmarking method to measure dietary absorption efficiency of chemicals by fish.

PubMed

Xiao, Ruiyang; Adolfsson-Erici, Margaretha; Åkerman, Gun; McLachlan, Michael S; MacLeod, Matthew

2013-12-01

Understanding the dietary absorption efficiency of chemicals in the gastrointestinal tract of fish is important from both a scientific and a regulatory point of view. However, reported fish absorption efficiencies for well-studied chemicals are highly variable. In the present study, the authors developed and exploited an internal chemical benchmarking method that has the potential to reduce uncertainty and variability and, thus, to improve the precision of measurements of fish absorption efficiency. The authors applied the benchmarking method to measure the gross absorption efficiency for 15 chemicals with a wide range of physicochemical properties and structures. They selected 2,2',5,6'-tetrachlorobiphenyl (PCB53) and decabromodiphenyl ethane as absorbable and nonabsorbable benchmarks, respectively. Quantities of chemicals determined in fish were benchmarked to the fraction of PCB53 recovered in fish, and quantities of chemicals determined in feces were benchmarked to the fraction of decabromodiphenyl ethane recovered in feces. The performance of the benchmarking procedure was evaluated based on the recovery of the test chemicals and precision of absorption efficiency from repeated tests. Benchmarking did not improve the precision of the measurements; after benchmarking, however, the median recovery for 15 chemicals was 106%, and variability of recoveries was reduced compared with before benchmarking, suggesting that benchmarking could account for incomplete extraction of chemical in fish and incomplete collection of feces from different tests. © 2013 SETAC.
Double metric, generalized metric, and α' -deformed double field theory

NASA Astrophysics Data System (ADS)

Hohm, Olaf; Zwiebach, Barton

2016-03-01

We relate the unconstrained "double metric" of the "α' -geometry" formulation of double field theory to the constrained generalized metric encoding the spacetime metric and b -field. This is achieved by integrating out auxiliary field components of the double metric in an iterative procedure that induces an infinite number of higher-derivative corrections. As an application, we prove that, to first order in α' and to all orders in fields, the deformed gauge transformations are Green-Schwarz-deformed diffeomorphisms. We also prove that to first order in α' the spacetime action encodes precisely the Green-Schwarz deformation with Chern-Simons forms based on the torsionless gravitational connection. This seems to be in tension with suggestions in the literature that T-duality requires a torsionful connection, but we explain that these assertions are ambiguous since actions that use different connections are related by field redefinitions.
Processor Emulator with Benchmark Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lloyd, G. Scott; Pearce, Roger; Gokhale, Maya

2015-11-13

A processor emulator and a suite of benchmark applications have been developed to assist in characterizing the performance of data-centric workloads on current and future computer architectures. Some of the applications have been collected from other open source projects. For more details on the emulator and an example of its usage, see reference [1].
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Evaluation of metrics and baselines for tracking greenhouse gas emissions trends: Recommendations for the California climate action registry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Price, Lynn; Murtishaw, Scott; Worrell, Ernst

2003-06-01

California Energy Commission (Energy Commission) related to the Registry in three areas: (1) assessing the availability and usefulness of industry-specific metrics, (2) evaluating various methods for establishing baselines for calculating GHG emissions reductions related to specific actions taken by Registry participants, and (3) establishing methods for calculating electricity CO2 emission factors. The third area of research was completed in 2002 and is documented in Estimating Carbon Dioxide Emissions Factors for the California Electric Power Sector (Marnay et al., 2002). This report documents our findings related to the first areas of research. For the first area of research, the overall objective was to evaluate the metrics, such as emissions per economic unit or emissions per unit of production that can be used to report GHG emissions trends for potential Registry participants. This research began with an effort to identify methodologies, benchmarking programs, inventories, protocols, and registries that u se industry-specific metrics to track trends in energy use or GHG emissions in order to determine what types of metrics have already been developed. The next step in developing industry-specific metrics was to assess the availability of data needed to determine metric development priorities. Berkeley Lab also determined the relative importance of different potential Registry participant categories in order to asses s the availability of sectoral or industry-specific metrics and then identified industry-specific metrics in use around the world. While a plethora of metrics was identified, no one metric that adequately tracks trends in GHG emissions while maintaining confidentiality of data was identified. As a result of this review, Berkeley Lab recommends the development of a GHG intensity index as a new metric for reporting and tracking GHG emissions trends.Such an index could provide an industry-specific metric for reporting and tracking GHG emissions trends to
R&D100: Lightweight Distributed Metric Service

ScienceCinema

Gentile, Ann; Brandt, Jim; Tucker, Tom; Showerman, Mike

2018-06-12

On today's High Performance Computing platforms, the complexity of applications and configurations makes efficient use of resources difficult. The Lightweight Distributed Metric Service (LDMS) is monitoring software developed by Sandia National Laboratories to provide detailed metrics of system performance. LDMS provides collection, transport, and storage of data from extreme-scale systems at fidelities and timescales to provide understanding of application and system performance with no statistically significant impact on application performance.

R&D100: Lightweight Distributed Metric Service

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gentile, Ann; Brandt, Jim; Tucker, Tom

2015-11-19

On today's High Performance Computing platforms, the complexity of applications and configurations makes efficient use of resources difficult. The Lightweight Distributed Metric Service (LDMS) is monitoring software developed by Sandia National Laboratories to provide detailed metrics of system performance. LDMS provides collection, transport, and storage of data from extreme-scale systems at fidelities and timescales to provide understanding of application and system performance with no statistically significant impact on application performance.
Length of stay benchmarks for inpatient rehabilitation after stroke.

PubMed

Meyer, Matthew; Britt, Eileen; McHale, Heather A; Teasell, Robert

2012-01-01

In Canada, no standardized benchmarks for length of stay (LOS) have been established for post-stroke inpatient rehabilitation. This paper describes the development of a severity specific median length of stay benchmarking strategy, assessment of its impact after one year of implementation in a Canadian rehabilitation hospital, and establishment of updated benchmarks that may be useful for comparison with other facilities across Canada. Patient data were retrospectively assessed for all patients admitted to a single post-acute stroke rehabilitation unit in Ontario, Canada between April 2005 and March 2008. Rehabilitation Patient Groups (RPGs) were used to establish stratified median length of stay benchmarks for each group that were incorporated into team rounds beginning in October 2009. Benchmark impact was assessed using mean LOS, FIM(®) gain, and discharge destination for each RPG group, collected prospectively for one year, compared against similar information from the previous calendar year. Benchmarks were then adjusted accordingly for future use. Between October 2009 and September 2010, a significant reduction in average LOS was noted compared to the previous year (35.3 vs. 41.2 days; p < 0.05). Reductions in LOS were noted in each RPG group including statistically significant reductions in 4 of the 7 groups. As intended, reductions in LOS were achieved with no significant reduction in mean FIM(®) gain or proportion of patients discharged home compared to the previous year. Adjusted benchmarks for LOS ranged from 13 to 48 days depending on the RPG group. After a single year of implementation, severity specific benchmarks helped the rehabilitation team reduce LOS while maintaining the same levels of functional gain and achieving the same rate of discharge to the community. © 2012 Informa UK, Ltd.
Development of Methodologies, Metrics, and Tools for Investigating Human-Robot Interaction in Space Robotics

NASA Technical Reports Server (NTRS)

Ezer, Neta; Zumbado, Jennifer Rochlis; Sandor, Aniko; Boyer, Jennifer

2011-01-01

Human-robot systems are expected to have a central role in future space exploration missions that extend beyond low-earth orbit [1]. As part of a directed research project funded by NASA s Human Research Program (HRP), researchers at the Johnson Space Center have started to use a variety of techniques, including literature reviews, case studies, knowledge capture, field studies, and experiments to understand critical human-robot interaction (HRI) variables for current and future systems. Activities accomplished to date include observations of the International Space Station s Special Purpose Dexterous Manipulator (SPDM), Robonaut, and Space Exploration Vehicle (SEV), as well as interviews with robotics trainers, robot operators, and developers of gesture interfaces. A survey of methods and metrics used in HRI was completed to identify those most applicable to space robotics. These methods and metrics included techniques and tools associated with task performance, the quantification of human-robot interactions and communication, usability, human workload, and situation awareness. The need for more research in areas such as natural interfaces, compensations for loss of signal and poor video quality, psycho-physiological feedback, and common HRI testbeds were identified. The initial findings from these activities and planned future research are discussed. Human-robot systems are expected to have a central role in future space exploration missions that extend beyond low-earth orbit [1]. As part of a directed research project funded by NASA s Human Research Program (HRP), researchers at the Johnson Space Center have started to use a variety of techniques, including literature reviews, case studies, knowledge capture, field studies, and experiments to understand critical human-robot interaction (HRI) variables for current and future systems. Activities accomplished to date include observations of the International Space Station s Special Purpose Dexterous Manipulator
Framework for performance evaluation of face, text, and vehicle detection and tracking in video: data, metrics, and protocol.

PubMed

Kasturi, Rangachar; Goldgof, Dmitry; Soundararajan, Padmanabhan; Manohar, Vasant; Garofolo, John; Bowers, Rachel; Boonstra, Matthew; Korzhova, Valentina; Zhang, Jing

2009-02-01

Common benchmark data sets, standardized performance metrics, and baseline algorithms have demonstrated considerable impact on research and development in a variety of application domains. These resources provide both consumers and developers of technology with a common framework to objectively compare the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video: specifically for face, text, and vehicle objects. This framework includes the source video data, ground-truth annotations (along with guidelines for annotation), performance metrics, evaluation protocols, and tools including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each data clip is approximately 2.5 minutes long and has been completely spatially/temporally annotated at the I-frame level. Each task/domain, therefore, has an associated annotated corpus of approximately 450,000 frames. The scope of such annotation is unprecedented and was designed to begin to support the necessary quantities of data for robust machine learning approaches, as well as a statistically significant comparison of the performance of algorithms. The goal of this work was to systematically address the challenges of object detection and tracking through a common evaluation framework that permits a meaningful objective comparison of techniques, provides the research community with sufficient data for the exploration of automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes useful lasting resources of a scale and magnitude that will prove to be extremely useful to the computer vision research community for years to come.
HS06 Benchmark for an ARM Server

NASA Astrophysics Data System (ADS)

Kluth, Stefan

2014-06-01

We benchmarked an ARM cortex-A9 based server system with a four-core CPU running at 1.1 GHz. The system used Ubuntu 12.04 as operating system and the HEPSPEC 2006 (HS06) benchmarking suite was compiled natively with gcc-4.4 on the system. The benchmark was run for various settings of the relevant gcc compiler options. We did not find significant influence from the compiler options on the benchmark result. The final HS06 benchmark result is 10.4.
DEVELOPMENT OF METRICS FOR TECHNICAL PRODUCTION: QUALIS BOOKS AND BOOK CHAPTERS.

PubMed

Ribas-Filho, Jurandir Marcondes; Malafaia, Osvaldo; Czeczko, Nicolau Gregori; Ribas, Carmen A P Marcondes; Nassif, Paulo Afonso Nunes

2015-01-01

To propose metrics to qualify the publication in books and chapters, and from there, establish guidance for the evaluation of the Medicine III programs. Analysis of some of the 2013 area documents focusing this issue. Were analyzed the following areas: Computer Science; Biotechnology; Biological Sciences I; Public Health; Medicine I. Except for the Medicine I, which has not adopted the metric for books and chapters, all other programs established metrics within the intellectual production, although with unequal percentages. It´s desirable to include metrics for books and book chapters in the intellectual production of post-graduate programs in Area Document with percentage-value of 5% in publications of Medicine III programs. Propor a métrica para qualificar a produção veiculada através de livros e capítulos e, a partir daí, estabelecer orientação para a avaliação dos programas de pós-graduação da Medicina III. Análise dos documentos de área de 2013 dos programas de pós-graduação senso estrito das áreas: Ciência da Computação; Biotecnologia; Ciências Biológicas I; Saúde Coletiva; Medicina I. Excetuando-se o programa da Medicina I, que não adotou a métrica para classificação de livros e capítulos, todos os demais estabeleceram-na dentro da sua produção intelectual, embora com percentuais desiguais. É desejável inserir a métrica de livros e capitulos de livros na produção intelectual do Documento de Área dos programas, ortorgando a ela percentual de 5% das publicações qualificadas dos programas da Medicina III.
NAS Grid Benchmarks. 1.0

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob; Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)

2002-01-01

We provide a paper-and-pencil specification of a benchmark suite for computational grids. It is based on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks (NPB) and is called the NAS Grid Benchmarks (NGB). NGB problems are presented as data flow graphs encapsulating an instance of a slightly modified NPB task in each graph node, which communicates with other nodes by sending/receiving initialization data. Like NPB, NGB specifies several different classes (problem sizes). In this report we describe classes S, W, and A, and provide verification values for each. The implementor has the freedom to choose any language, grid environment, security model, fault tolerance/error correction mechanism, etc., as long as the resulting implementation passes the verification test and reports the turnaround time of the benchmark.
Metrics for linear kinematic features in sea ice

NASA Astrophysics Data System (ADS)

Levy, G.; Coon, M.; Sulsky, D.

2006-12-01

The treatment of leads as cracks or discontinuities (see Coon et al. presentation) requires some shift in the procedure of evaluation and comparison of lead-resolving models and their validation against observations. Common metrics used to evaluate ice model skills are by and large an adaptation of a least square "metric" adopted from operational numerical weather prediction data assimilation systems and are most appropriate for continuous fields and Eilerian systems where the observations and predictions are commensurate. However, this class of metrics suffers from some flaws in areas of sharp gradients and discontinuities (e.g., leads) and when Lagrangian treatments are more natural. After a brief review of these metrics and their performance in areas of sharp gradients, we present two new metrics specifically designed to measure model accuracy in representing linear features (e.g., leads). The indices developed circumvent the requirement that both the observations and model variables be commensurate (i.e., measured with the same units) by considering the frequencies of the features of interest/importance. We illustrate the metrics by scoring several hypothetical "simulated" discontinuity fields against the lead interpreted from RGPS observations.
Changing to the Metric System.

ERIC Educational Resources Information Center

Chambers, Donald L.; Dowling, Kenneth W.

This report examines educational aspects of the conversion to the metric system of measurement in the United States. Statements of positions on metrication and basic mathematical skills are given from various groups. Base units, symbols, prefixes, and style of the metric system are outlined. Guidelines for teaching metric concepts are given,…
The Zoo, Benchmarks & You: How To Reach the Oregon State Benchmarks with Zoo Resources.

ERIC Educational Resources Information Center

2002

This document aligns Oregon state educational benchmarks and standards with Oregon Zoo resources. Benchmark areas examined include English, mathematics, science, social studies, and career and life roles. Brief descriptions of the programs offered by the zoo are presented. (SOE)
Software metrics: Software quality metrics for distributed systems. [reliability engineering

NASA Technical Reports Server (NTRS)

Post, J. V.

1981-01-01

Software quality metrics was extended to cover distributed computer systems. Emphasis is placed on studying embedded computer systems and on viewing them within a system life cycle. The hierarchy of quality factors, criteria, and metrics was maintained. New software quality factors were added, including survivability, expandability, and evolvability.
Implementing the Metric System in Agricultural Occupations. Metric Implementation Guide.

ERIC Educational Resources Information Center

Gilmore, Hal M.; And Others

Addressed to the agricultural education teacher, this guide is intended to provide appropriate information, viewpoints, and attitudes regarding the metric system and to make suggestions regarding presentation of the material in the classroom. An introductory section on teaching suggestions emphasizes the need for a "think metric" approach made up…
Implementing the Metric System in Health Occupations. Metric Implementation Guide.

ERIC Educational Resources Information Center

Banks, Wilson P.; And Others

Addressed to the health occupations education teacher, this guide is intended to provide appropriate information, viewpoints, and attitudes regarding the metric system and to make suggestions regarding presentation of the material in the classroom. An introductory section on teaching suggestions emphasizes the need for a "think metric" approach…
lakemorpho: Calculating lake morphometry metrics in R.

PubMed

Hollister, Jeffrey; Stachelek, Joseph

2017-01-01

Metrics describing the shape and size of lakes, known as lake morphometry metrics, are important for any limnological study. In cases where a lake has long been the subject of study these data are often already collected and are openly available. Many other lakes have these data collected, but access is challenging as it is often stored on individual computers (or worse, in filing cabinets) and is available only to the primary investigators. The vast majority of lakes fall into a third category in which the data are not available. This makes broad scale modelling of lake ecology a challenge as some of the key information about in-lake processes are unavailable. While this valuable in situ information may be difficult to obtain, several national datasets exist that may be used to model and estimate lake morphometry. In particular, digital elevation models and hydrography have been shown to be predictive of several lake morphometry metrics. The R package lakemorpho has been developed to utilize these data and estimate the following morphometry metrics: surface area, shoreline length, major axis length, minor axis length, major and minor axis length ratio, shoreline development, maximum depth, mean depth, volume, maximum lake length, mean lake width, maximum lake width, and fetch. In this software tool article we describe the motivation behind developing lakemorpho , discuss the implementation in R, and describe the use of lakemorpho with an example of a typical use case.
The Concepts "Benchmarks and Benchmarking" Used in Education Planning: Teacher Education as Example

ERIC Educational Resources Information Center

Steyn, H. J.

2015-01-01

Planning in education is a structured activity that includes several phases and steps that take into account several kinds of information (Steyn, Steyn, De Waal & Wolhuter, 2002: 146). One of the sets of information that are usually considered is the (so-called) "benchmarks" and "benchmarking" regarding the focus of a…
Metrics in Career Education.

ERIC Educational Resources Information Center

Lindbeck, John R.

The United States is rapidly becoming a metric nation. Industry, education, business, and government are all studying the issue of metrication to learn how they can prepare for it. The book is designed to help teachers and students in career education programs learn something about metrics. Presented in an easily understood manner, the textbook's…
Newman-Penrose constants of the Kerr-Newman metric

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gong Xuefei; Shang Yu; Bai Shan

The Newman-Unti formalism of the Kerr-Newman metric near future null infinity is developed, with which the Newman-Penrose constants for both the gravitational and electromagnetic fields of the Kerr-Newman metric are computed and shown to be zero. The multipole structure near future null infinity in the sense of Janis-Newman of the Kerr-Newman metric is then further studied. It is found that up to the 2{sup 4}-pole, modulo a constant dependent upon the order of the pole, these multipole moments agree with those of Geroch-Hansen multipole moments defined at spatial infinity.
Validation of a Quality Management Metric

DTIC Science & Technology

2000-09-01

quality management metric (QMM) was used to measure the performance of ten software managers on Department of Defense (DoD) software development programs. Informal verification and validation of the metric compared the QMM score to an overall program success score for the entire program and yielded positive correlation. The results of applying the QMM can be used to characterize the quality of software management and can serve as a template to improve software management performance. Future work includes further refining the QMM, applying the QMM scores to provide feedback
Benchmarking Strategies for Measuring the Quality of Healthcare: Problems and Prospects

PubMed Central

Lovaglio, Pietro Giorgio

2012-01-01

Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare and implementing benchmarking strategies. Besides offering accreditation and certification processes, recent approaches measure the performance of healthcare institutions in order to evaluate their effectiveness, defined as the capacity to provide treatment that modifies and improves the patient's state of health. This paper, dealing with hospital effectiveness, focuses on research methods for effectiveness analyses within a strategy comparing different healthcare institutions. The paper, after having introduced readers to the principle debates on benchmarking strategies, which depend on the perspective and type of indicators used, focuses on the methodological problems related to performing consistent benchmarking analyses. Particularly, statistical methods suitable for controlling case-mix, analyzing aggregate data, rare events, and continuous outcomes measured with error are examined. Specific challenges of benchmarking strategies, such as the risk of risk adjustment (case-mix fallacy, underreporting, risk of comparing noncomparable hospitals), selection bias, and possible strategies for the development of consistent benchmarking analyses, are discussed. Finally, to demonstrate the feasibility of the illustrated benchmarking strategies, an application focused on determining regional benchmarks for patient satisfaction (using 2009 Lombardy Region Patient Satisfaction Questionnaire) is proposed. PMID:22666140
Implementing the Metric System in Business Occupations. Metric Implementation Guide.

ERIC Educational Resources Information Center

Retzer, Kenneth A.; And Others

Addressed to the business education teacher, this guide is intended to provide appropriate information, viewpoints, and attitudes regarding the metric system and to make suggestions regarding presentation of the material in the classroom. An introductory section on teaching suggestions emphasizes the need for a "think metric" approach made up of…

Implementing the Metric System in Industrial Occupations. Metric Implementation Guide.

ERIC Educational Resources Information Center

Retzer, Kenneth A.

Addressed to the industrial education teacher, this guide is intended to provide appropriate information, viewpoints, and attitudes regarding the metric system and to make suggestions regarding presentation of the material in the classroom. An introductory section on teaching suggestions emphasizes the need for a "think metric" approach made up of…
The metric system: An introduction

NASA Astrophysics Data System (ADS)

Lumley, Susan M.

On 13 Jul. 1992, Deputy Director Duane Sewell restated the Laboratory's policy on conversion to the metric system which was established in 1974. Sewell's memo announced the Laboratory's intention to continue metric conversion on a reasonable and cost effective basis. Copies of the 1974 and 1992 Administrative Memos are contained in the Appendix. There are three primary reasons behind the Laboratory's conversion to the metric system. First, Public Law 100-418, passed in 1988, states that by the end of fiscal year 1992 the Federal Government must begin using metric units in grants, procurements, and other business transactions. Second, on 25 Jul. 1991, President George Bush signed Executive Order 12770 which urged Federal agencies to expedite conversion to metric units. Third, the contract between the University of California and the Department of Energy calls for the Laboratory to convert to the metric system. Thus, conversion to the metric system is a legal requirement and a contractual mandate with the University of California. Public Law 100-418 and Executive Order 12770 are discussed in more detail later in this section, but first they examine the reasons behind the nation's conversion to the metric system. The second part of this report is on applying the metric system.
Summer temperature metrics for predicting brook trout (Salvelinus fontinalis) distribution in streams

USGS Publications Warehouse

Parrish, Donna; Butryn, Ryan S.; Rizzo, Donna M.

2012-01-01

We developed a methodology to predict brook trout (Salvelinus fontinalis) distribution using summer temperature metrics as predictor variables. Our analysis used long-term fish and hourly water temperature data from the Dog River, Vermont (USA). Commonly used metrics (e.g., mean, maximum, maximum 7-day maximum) tend to smooth the data so information on temperature variation is lost. Therefore, we developed a new set of metrics (called event metrics) to capture temperature variation by describing the frequency, area, duration, and magnitude of events that exceeded a user-defined temperature threshold. We used 16, 18, 20, and 22°C. We built linear discriminant models and tested and compared the event metrics against the commonly used metrics. Correct classification of the observations was 66% with event metrics and 87% with commonly used metrics. However, combined event and commonly used metrics correctly classified 92%. Of the four individual temperature thresholds, it was difficult to assess which threshold had the “best” accuracy. The 16°C threshold had slightly fewer misclassifications; however, the 20°C threshold had the fewest extreme misclassifications. Our method leveraged the volumes of existing long-term data and provided a simple, systematic, and adaptable framework for monitoring changes in fish distribution, specifically in the case of irregular, extreme temperature events.
Benchmarking neuromorphic vision: lessons learnt from computer vision

PubMed Central

Tan, Cheston; Lallee, Stephane; Orchard, Garrick

2015-01-01

Neuromorphic Vision sensors have improved greatly since the first silicon retina was presented almost three decades ago. They have recently matured to the point where they are commercially available and can be operated by laymen. However, despite improved availability of sensors, there remains a lack of good datasets, while algorithms for processing spike-based visual data are still in their infancy. On the other hand, frame-based computer vision algorithms are far more mature, thanks in part to widely accepted datasets which allow direct comparison between algorithms and encourage competition. We are presented with a unique opportunity to shape the development of Neuromorphic Vision benchmarks and challenges by leveraging what has been learnt from the use of datasets in frame-based computer vision. Taking advantage of this opportunity, in this paper we review the role that benchmarks and challenges have played in the advancement of frame-based computer vision, and suggest guidelines for the creation of Neuromorphic Vision benchmarks and challenges. We also discuss the unique challenges faced when benchmarking Neuromorphic Vision algorithms, particularly when attempting to provide direct comparison with frame-based computer vision. PMID:26528120
Resilience Metrics for the Electric Power System: A Performance-Based Approach.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vugrin, Eric D.; Castillo, Andrea R; Silva-Monroy, Cesar Augusto

Grid resilience is a concept related to a power system's ability to continue operating and delivering power even in the event that low probability, high-consequence disruptions such as hurricanes, earthquakes, and cyber-attacks occur. Grid resilience objectives focus on managing and, ideally, minimizing potential consequences that occur as a result of these disruptions. Currently, no formal grid resilience definitions, metrics, or analysis methods have been universally accepted. This document describes an effort to develop and describe grid resilience metrics and analysis methods. The metrics and methods described herein extend upon the Resilience Analysis Process (RAP) developed by Watson et al. formore » the 2015 Quadrennial Energy Review. The extension allows for both outputs from system models and for historical data to serve as the basis for creating grid resilience metrics and informing grid resilience planning and response decision-making. This document describes the grid resilience metrics and analysis methods. Demonstration of the metrics and methods is shown through a set of illustrative use cases.« less
Metric Issues for Small Business.

DTIC Science & Technology

1981-08-01

time, it seems that small business is meeting the problens (f ()ijutarv, metric conversion within its own resources ( manager ",a1, t, ,’bral, an...AD-AI07 861 UNITED STATES METRIC BOARD ARLINGTON VA FIG 5/3 METRIC ISSUES FOR SMALL BUSINESS . (U) UNCASIF AUG a I EHEHEi-17i i/ll/l///i//,° i MENNEN...METRIC ISSUES FOR SMALL BUSINESS EXECUTIVE SUMMARY DTIC ifELECT A This o@~ant )u enapo.I fW pu~~w Z. Ias ,!W% si~ts La-i UNITED STATES METRIC BOARD
Human Performance Optimization Metrics: Consensus Findings, Gaps, and Recommendations for Future Research.

PubMed

Nindl, Bradley C; Jaffin, Dianna P; Dretsch, Michael N; Cheuvront, Samuel N; Wesensten, Nancy J; Kent, Michael L; Grunberg, Neil E; Pierce, Joseph R; Barry, Erin S; Scott, Jonathan M; Young, Andrew J; OʼConnor, Francis G; Deuster, Patricia A

2015-11-01

Human performance optimization (HPO) is defined as "the process of applying knowledge, skills and emerging technologies to improve and preserve the capabilities of military members, and organizations to execute essential tasks." The lack of consensus for operationally relevant and standardized metrics that meet joint military requirements has been identified as the single most important gap for research and application of HPO. In 2013, the Consortium for Health and Military Performance hosted a meeting to develop a toolkit of standardized HPO metrics for use in military and civilian research, and potentially for field applications by commanders, units, and organizations. Performance was considered from a holistic perspective as being influenced by various behaviors and barriers. To accomplish the goal of developing a standardized toolkit, key metrics were identified and evaluated across a spectrum of domains that contribute to HPO: physical performance, nutritional status, psychological status, cognitive performance, environmental challenges, sleep, and pain. These domains were chosen based on relevant data with regard to performance enhancers and degraders. The specific objectives at this meeting were to (a) identify and evaluate current metrics for assessing human performance within selected domains; (b) prioritize metrics within each domain to establish a human performance assessment toolkit; and (c) identify scientific gaps and the needed research to more effectively assess human performance across domains. This article provides of a summary of 150 total HPO metrics across multiple domains that can be used as a starting point-the beginning of an HPO toolkit: physical fitness (29 metrics), nutrition (24 metrics), psychological status (36 metrics), cognitive performance (35 metrics), environment (12 metrics), sleep (9 metrics), and pain (5 metrics). These metrics can be particularly valuable as the military emphasizes a renewed interest in Human Dimension efforts
The Craft of Benchmarking: Finding and Utilizing District-Level, Campus-Level, and Program-Level Standards.

ERIC Educational Resources Information Center

McGregor, Ellen N.; Attinasi, Louis C., Jr.

This paper describes the processes involved in selecting peer institutions for appropriate benchmarking using national databases (NCES-IPEDS). Benchmarking involves the identification of peer institutions and/or best practices in specific operational areas for the purpose of developing standards. The benchmarking process was borne in the early…
HPC Analytics Support. Requirements for Uncertainty Quantification Benchmarks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paulson, Patrick R.; Purohit, Sumit; Rodriguez, Luke R.

2015-05-01

This report outlines techniques for extending benchmark generation products so they support uncertainty quantification by benchmarked systems. We describe how uncertainty quantification requirements can be presented to candidate analytical tools supporting SPARQL. We describe benchmark data sets for evaluating uncertainty quantification, as well as an approach for using our benchmark generator to produce data sets for generating benchmark data sets.
BENCHMARK DOSE TECHNICAL GUIDANCE DOCUMENT ...

EPA Pesticide Factsheets

The purpose of this document is to provide guidance for the Agency on the application of the benchmark dose approach in determining the point of departure (POD) for health effects data, whether a linear or nonlinear low dose extrapolation is used. The guidance includes discussion on computation of benchmark doses and benchmark concentrations (BMDs and BMCs) and their lower confidence limits, data requirements, dose-response analysis, and reporting requirements. This guidance is based on today's knowledge and understanding, and on experience gained in using this approach.
Benchmarking clinical photography services in the NHS.

PubMed

Arbon, Giles

2015-01-01

Benchmarking is used in services across the National Health Service (NHS) using various benchmarking programs. Clinical photography services do not have a program in place and services have to rely on ad hoc surveys of other services. A trial benchmarking exercise was undertaken with 13 services in NHS Trusts. This highlights valuable data and comparisons that can be used to benchmark and improve services throughout the profession.
Method and system for benchmarking computers

DOEpatents

Gustafson, John L.

1993-09-14

A testing system and method for benchmarking computer systems. The system includes a store containing a scalable set of tasks to be performed to produce a solution in ever-increasing degrees of resolution as a larger number of the tasks are performed. A timing and control module allots to each computer a fixed benchmarking interval in which to perform the stored tasks. Means are provided for determining, after completion of the benchmarking interval, the degree of progress through the scalable set of tasks and for producing a benchmarking rating relating to the degree of progress for each computer.
Green Chemistry Metrics with Special Reference to Green Analytical Chemistry.

PubMed

Tobiszewski, Marek; Marć, Mariusz; Gałuszka, Agnieszka; Namieśnik, Jacek

2015-06-12

The concept of green chemistry is widely recognized in chemical laboratories. To properly measure an environmental impact of chemical processes, dedicated assessment tools are required. This paper summarizes the current state of knowledge in the field of development of green chemistry and green analytical chemistry metrics. The diverse methods used for evaluation of the greenness of organic synthesis, such as eco-footprint, E-Factor, EATOS, and Eco-Scale are described. Both the well-established and recently developed green analytical chemistry metrics, including NEMI labeling and analytical Eco-scale, are presented. Additionally, this paper focuses on the possibility of the use of multivariate statistics in evaluation of environmental impact of analytical procedures. All the above metrics are compared and discussed in terms of their advantages and disadvantages. The current needs and future perspectives in green chemistry metrics are also discussed.
Benchmark Shock Tube Experiments for Radiative Heating Relevant to Earth Re-Entry

NASA Technical Reports Server (NTRS)

Brandis, A. M.; Cruden, B. A.

2017-01-01

Detailed spectrally and spatially resolved radiance has been measured in the Electric Arc Shock Tube (EAST) facility for conditions relevant to high speed entry into a variety of atmospheres, including Earth, Venus, Titan, Mars and the Outer Planets. The tests that measured radiation relevant for Earth re-entry are the focus of this work and are taken from campaigns 47, 50, 52 and 57. These tests covered conditions from 8 km/s to 15.5 km/s at initial pressures ranging from 0.05 Torr to 1 Torr, of which shots at 0.1 and 0.2 Torr are analyzed in this paper. These conditions cover a range of points of interest for potential fight missions, including return from Low Earth Orbit, the Moon and Mars. The large volume of testing available from EAST is useful for statistical analysis of radiation data, but is problematic for identifying representative experiments for performing detailed analysis. Therefore, the intent of this paper is to select a subset of benchmark test data that can be considered for further detailed study. These benchmark shots are intended to provide more accessible data sets for future code validation studies and facility-to-facility comparisons. The shots that have been selected as benchmark data are the ones in closest agreement to a line of best fit through all of the EAST results, whilst also showing the best experimental characteristics, such as test time and convergence to equilibrium. The EAST data are presented in different formats for analysis. These data include the spectral radiance at equilibrium, the spatial dependence of radiance over defined wavelength ranges and the mean non-equilibrium spectral radiance (so-called 'spectral non-equilibrium metric'). All the information needed to simulate each experimental trace, including free-stream conditions, shock time of arrival (i.e. x-t) relation, and the spectral and spatial resolution functions, are provided.
7 CFR 1794.4 - Metric units.

Code of Federal Regulations, 2014 CFR

2014-01-01

... environmental documents using non-metric equivalents with one of the following two options; metric units in parentheses immediately following the non-metric equivalents or a metric conversion table as an appendix...
7 CFR 1794.4 - Metric units.

Code of Federal Regulations, 2010 CFR

2010-01-01

... environmental documents using non-metric equivalents with one of the following two options; metric units in parentheses immediately following the non-metric equivalents or a metric conversion table as an appendix...
7 CFR 1794.4 - Metric units.

Code of Federal Regulations, 2013 CFR

2013-01-01

... environmental documents using non-metric equivalents with one of the following two options; metric units in parentheses immediately following the non-metric equivalents or a metric conversion table as an appendix...
7 CFR 1794.4 - Metric units.

Code of Federal Regulations, 2011 CFR

2011-01-01

... environmental documents using non-metric equivalents with one of the following two options; metric units in parentheses immediately following the non-metric equivalents or a metric conversion table as an appendix...
7 CFR 1794.4 - Metric units.

Code of Federal Regulations, 2012 CFR

2012-01-01

... environmental documents using non-metric equivalents with one of the following two options; metric units in parentheses immediately following the non-metric equivalents or a metric conversion table as an appendix...
Application of Benchmark Dose Methodology to a Variety of Endpoints and Exposures

EPA Science Inventory

This latest beta version (1.1b) of the U.S. Environmental Protection Agency (EPA) Benchmark Dose Software (BMDS) is being distributed for public comment. The BMDS system is being developed as a tool to facilitate the application of benchmark dose (BMD) methods to EPA hazardous p...

Specification-based software sizing: An empirical investigation of function metrics

NASA Technical Reports Server (NTRS)

Jeffery, Ross; Stathis, John

1993-01-01

For some time the software industry has espoused the need for improved specification-based software size metrics. This paper reports on a study of nineteen recently developed systems in a variety of application domains. The systems were developed by a single software services corporation using a variety of languages. The study investigated several metric characteristics. It shows that: earlier research into inter-item correlation within the overall function count is partially supported; a priori function counts, in themself, do not explain the majority of the effort variation in software development in the organization studied; documentation quality is critical to accurate function identification; and rater error is substantial in manual function counting. The implication of these findings for organizations using function based metrics are explored.
Do-It-Yourself Metrics

ERIC Educational Resources Information Center

Klubeck, Martin; Langthorne, Michael; Padgett, Don

2006-01-01

Something new is on the horizon, and depending on one's role on campus, it might be storm clouds or a cleansing shower. Either way, no matter how hard one tries to avoid it, sooner rather than later he/she will have to deal with metrics. Metrics do not have to cause fear and resistance. Metrics can, and should, be a powerful tool for improvement.…
The LSST Metrics Analysis Framework (MAF)

NASA Astrophysics Data System (ADS)

Jones, R. Lynne; Yoachim, Peter; Chandrasekharan, Srinivasan; Connolly, Andrew J.; Cook, Kem H.; Ivezic, Zeljko; Krughoff, K. Simon; Petry, Catherine E.; Ridgway, Stephen T.

2015-01-01

Studying potential observing strategies or cadences for the Large Synoptic Survey Telescope (LSST) is a complicated but important problem. To address this, LSST has created an Operations Simulator (OpSim) to create simulated surveys, including realistic weather and sky conditions. Analyzing the results of these simulated surveys for the wide variety of science cases to be considered for LSST is, however, difficult. We have created a Metric Analysis Framework (MAF), an open-source python framework, to be a user-friendly, customizable and easily extensible tool to help analyze the outputs of the OpSim.MAF reads the pointing history of the LSST generated by the OpSim, then enables the subdivision of these pointings based on position on the sky (RA/Dec, etc.) or the characteristics of the observations (e.g. airmass or sky brightness) and a calculation of how well these observations meet a specified science objective (or metric). An example simple metric could be the mean single visit limiting magnitude for each position in the sky; a more complex metric might be the expected astrometric precision. The output of these metrics can be generated for a full survey, for specified time intervals, or for regions of the sky, and can be easily visualized using a web interface.An important goal for MAF is to facilitate analysis of the OpSim outputs for a wide variety of science cases. A user can often write a new metric to evaluate OpSim for new science goals in less than a day once they are familiar with the framework. Some of these new metrics are illustrated in the accompanying poster, "Analyzing Simulated LSST Survey Performance With MAF".While MAF has been developed primarily for application to OpSim outputs, it can be applied to any dataset. The most obvious examples are examining pointing histories of other survey projects or telescopes, such as CFHT.
Piloting a Process Maturity Model as an e-Learning Benchmarking Method

ERIC Educational Resources Information Center

Petch, Jim; Calverley, Gayle; Dexter, Hilary; Cappelli, Tim

2007-01-01

As part of a national e-learning benchmarking initiative of the UK Higher Education Academy, the University of Manchester is carrying out a pilot study of a method to benchmark e-learning in an institution. The pilot was designed to evaluate the operational viability of a method based on the e-Learning Maturity Model developed at the University of…
Comparison of 3D displays using objective metrics

NASA Astrophysics Data System (ADS)

Havig, Paul; McIntire, John; Dixon, Sharon; Moore, Jason; Reis, George

2008-04-01

Previously, we (Havig, Aleva, Reis, Moore, and McIntire, 2007) presented a taxonomy for the development of three-dimensional (3D) displays. We proposed three levels of metrics: objective (in which physical measurements are made of the display), subjective (Likert-type rating scales to show preferences of the display), and subjective-objective (performance metrics in which one shows how the 3D display may be more or less useful than a 2D display or a different 3D display). We concluded that for each level of metric, drawing practical comparisons among currently disparate 3D displays is difficult. In this paper we attempt to define more clearly the objective metrics for 3D displays. We set out to collect and measure physical attributes of several 3D displays and compare the results. We discuss our findings in terms of both difficulties in making the measurements in the first place, due to the physical set-up of the display, to issues in comparing the results we found and comparing how similar (or dissimilar) two 3D displays may or may not be. We conclude by discussing the next steps in creating objective metrics for three-dimensional displays as well as a proposed way ahead for the other two levels of metrics based on our findings.
Hyperkahler metrics on focus-focus fibrations

NASA Astrophysics Data System (ADS)

Zhao, Jie

In this thesis, we focus on the study of hyperkahler metric in four dimensional cases, and practice GMN's construction of hyperkahler metric on focus-focus fibrations. We explicitly compute the action-angle coordinates on the local model of focus-focus fibration, and show its semi-global invariant should be harmonic to admit a compatible holomorphic 2-form. Then we study the canonical semi-flat metric on it. After the instanton correction inspired by physics, we get a family of the generalized Ooguri-Vafa metric on focus-focus fibrations, which becomes more local examples of explicit hyperkahler metric in four dimensional cases. In addition, we also make some exploration of the Ooguri-Vafa metric in the thesis. We study the potential function of the Ooguri-Vafa metric, and prove that its nodal set is a cylinder of bounded radius 1 < R < 1. As a result, we get that only on a finite neighborhood of the singular fibre the Ooguri-Vafa metric is a hyperkahler metric. Finally, we give some estimates for the diameter of the fibration under the Oogui-Vafa metric, which confirms that the Oogui-Vafa metric is not complete. The new family of metric constructed in the thesis, we think, will provide more examples to further study of Lagrangian fibrations and mirror symmetry in future.
Holographic Spherically Symmetric Metrics

NASA Astrophysics Data System (ADS)

Petri, Michael

The holographic principle (HP) conjectures, that the maximum number of degrees of freedom of any realistic physical system is proportional to the system's boundary area. The HP has its roots in the study of black holes. It has recently been applied to cosmological solutions. In this article we apply the HP to spherically symmetric static space-times. We find that any regular spherically symmetric object saturating the HP is subject to tight constraints on the (interior) metric, energy-density, temperature and entropy-density. Whenever gravity can be described by a metric theory, gravity is macroscopically scale invariant and the laws of thermodynamics hold locally and globally, the (interior) metric of a regular holographic object is uniquely determined up to a constant factor and the interior matter-state must follow well defined scaling relations. When the metric theory of gravity is general relativity, the interior matter has an overall string equation of state (EOS) and a unique total energy-density. Thus the holographic metric derived in this article can serve as simple interior 4D realization of Mathur's string fuzzball proposal. Some properties of the holographic metric and its possible experimental verification are discussed. The geodesics of the holographic metric describe an isotropically expanding (or contracting) universe with a nearly homogeneous matter-distribution within the local Hubble volume. Due to the overall string EOS the active gravitational mass-density is zero, resulting in a coasting expansion with Ht = 1, which is compatible with the recent GRB-data.
Think Metric

USGS Publications Warehouse

,

1978-01-01

The International System of Units, as the metric system is officially called, provides for a single "language" to describe weights and measures over the world. We in the United States together with the people of Brunei, Burma, and Yemen are the only ones who have not put this convenient system into effect. In the passage of the Metric Conversion Act of 1975, Congress determined that we also will adopt it, but the transition will be voluntary.
Semantic Metrics for Analysis of Software

NASA Technical Reports Server (NTRS)

Etzkorn, Letha H.; Cox, Glenn W.; Farrington, Phil; Utley, Dawn R.; Ghalston, Sampson; Stein, Cara

2005-01-01

A recently conceived suite of object-oriented software metrics focus is on semantic aspects of software, in contradistinction to traditional software metrics, which focus on syntactic aspects of software. Semantic metrics represent a more human-oriented view of software than do syntactic metrics. The semantic metrics of a given computer program are calculated by use of the output of a knowledge-based analysis of the program, and are substantially more representative of software quality and more readily comprehensible from a human perspective than are the syntactic metrics.
Handbook of aircraft noise metrics

NASA Technical Reports Server (NTRS)

Bennett, R. L.; Pearsons, K. S.

1981-01-01

Information is presented on 22 noise metrics that are associated with the measurement and prediction of the effects of aircraft noise. Some of the instantaneous frequency weighted sound level measures, such as A-weighted sound level, are used to provide multiple assessment of the aircraft noise level. Other multiple event metrics, such as day-night average sound level, were designed to relate sound levels measured over a period of time to subjective responses in an effort to determine compatible land uses and aid in community planning. The various measures are divided into: (1) instantaneous sound level metrics; (2) duration corrected single event metrics; (3) multiple event metrics; and (4) speech communication metrics. The scope of each measure is examined in terms of its: definition, purpose, background, relationship to other measures, calculation method, example, equipment, references, and standards.
Handbook of aircraft noise metrics

NASA Astrophysics Data System (ADS)

Bennett, R. L.; Pearsons, K. S.

1981-03-01

Information is presented on 22 noise metrics that are associated with the measurement and prediction of the effects of aircraft noise. Some of the instantaneous frequency weighted sound level measures, such as A-weighted sound level, are used to provide multiple assessment of the aircraft noise level. Other multiple event metrics, such as day-night average sound level, were designed to relate sound levels measured over a period of time to subjective responses in an effort to determine compatible land uses and aid in community planning. The various measures are divided into: (1) instantaneous sound level metrics; (2) duration corrected single event metrics; (3) multiple event metrics; and (4) speech communication metrics. The scope of each measure is examined in terms of its: definition, purpose, background, relationship to other measures, calculation method, example, equipment, references, and standards.
Advanced Metrics for Assessing Holistic Care: The "Epidaurus 2" Project.

PubMed

Foote, Frederick O; Benson, Herbert; Berger, Ann; Berman, Brian; DeLeo, James; Deuster, Patricia A; Lary, David J; Silverman, Marni N; Sternberg, Esther M

2018-01-01

In response to the challenge of military traumatic brain injury and posttraumatic stress disorder, the US military developed a wide range of holistic care modalities at the new Walter Reed National Military Medical Center, Bethesda, MD, from 2001 to 2017, guided by civilian expert consultation via the Epidaurus Project. These projects spanned a range from healing buildings to wellness initiatives and healing through nature, spirituality, and the arts. The next challenge was to develop whole-body metrics to guide the use of these therapies in clinical care. Under the "Epidaurus 2" Project, a national search produced 5 advanced metrics for measuring whole-body therapeutic effects: genomics, integrated stress biomarkers, language analysis, machine learning, and "Star Glyphs." This article describes the metrics, their current use in guiding holistic care at Walter Reed, and their potential for operationalizing personalized care, patient self-management, and the improvement of public health. Development of these metrics allows the scientific integration of holistic therapies with organ-system-based care, expanding the powers of medicine.
Emergency department performance measures updates: proceedings of the 2014 emergency department benchmarking alliance consensus summit.

PubMed

Wiler, Jennifer L; Welch, Shari; Pines, Jesse; Schuur, Jeremiah; Jouriles, Nick; Stone-Griffith, Suzanne

2015-05-01

The objective was to review and update key definitions and metrics for emergency department (ED) performance and operations. Forty-five emergency medicine leaders convened for the Third Performance Measures and Benchmarking Summit held in Las Vegas, February 21-22, 2014. Prior to arrival, attendees were assigned to workgroups to review, revise, and update the definitions and vocabulary being used to communicate about ED performance and operations. They were provided with the prior definitions of those consensus summits that were published in 2006 and 2010. Other published definitions from key stakeholders in emergency medicine and health care were also reviewed and circulated. At the summit, key terminology and metrics were discussed and debated. Workgroups communicated online, via teleconference, and finally in a face-to-face meeting to reach consensus regarding their recommendations. Recommendations were then posted and open to a 30-day comment period. Participants then reanalyzed the recommendations, and modifications were made based on consensus. A comprehensive dictionary of ED terminology related to ED performance and operation was developed. This article includes definitions of operating characteristics and internal and external factors relevant to the stratification and categorization of EDs. Time stamps, time intervals, and measures of utilization were defined. Definitions of processes and staffing measures are also presented. Definitions were harmonized with performance measures put forth by the Centers for Medicare and Medicaid Services (CMS) for consistency. Standardized definitions are necessary to improve the comparability of EDs nationally for operations research and practice. More importantly, clear precise definitions describing ED operations are needed for incentive-based pay-for-performance models like those developed by CMS. This document provides a common language for front-line practitioners, managers, health policymakers, and researchers. �
Status and Outlook of Metric Conversion of Standards: The Views of Nine Selected Major Standards Development Bodies.

DTIC Science & Technology

1982-05-01

insufficient need for a hard metric version of the ASME Boiler and Pressure Vessel Code and industry would not support the metric version. The Code Is not...aircraft industry is concerned with certification requirements in metric units. The inch-pound Boiler and Pressure Vessel Code is the current standard
Benchmark duration of work hours for development of fatigue symptoms in Japanese workers with adjustment for job-related stress.

PubMed

Suwazono, Yasushi; Dochi, Mirei; Kobayashi, Etsuko; Oishi, Mitsuhiro; Okubo, Yasushi; Tanaka, Kumihiko; Sakata, Kouichi

2008-12-01

The objective of this study was to calculate benchmark durations and lower 95% confidence limits for benchmark durations of working hours associated with subjective fatigue symptoms by applying the benchmark dose approach while adjusting for job-related stress using multiple logistic regression analyses. A self-administered questionnaire was completed by 3,069 male and 412 female daytime workers (age 18-67 years) in a Japanese steel company. The eight dependent variables in the Cumulative Fatigue Symptoms Index were decreased vitality, general fatigue, physical disorders, irritability, decreased willingness to work, anxiety, depressive feelings, and chronic tiredness. Independent variables were daily working hours, four subscales (job demand, job control, interpersonal relationship, and job suitability) of the Brief Job Stress Questionnaire, and other potential covariates. Using significant parameters for working hours and those for other covariates, the benchmark durations of working hours were calculated for the corresponding Index property. Benchmark response was set at 5% or 10%. Assuming a condition of worst job stress, the benchmark duration/lower 95% confidence limit for benchmark duration of working hours per day with a benchmark response of 5% or 10% were 10.0/9.4 or 11.7/10.7 (irritability) and 9.2/8.9 or 10.4/9.8 (chronic tiredness) in men and 8.9/8.4 or 9.8/8.9 (chronic tiredness) in women. The threshold amounts of working hours for fatigue symptoms under the worst job-related stress were very close to the standard daily working hours in Japan. The results strongly suggest that special attention should be paid to employees whose working hours exceed threshold amounts based on individual levels of job-related stress.
A metric for success

NASA Astrophysics Data System (ADS)

Carver, Gary P.

1994-05-01

The federal agencies are working with industry to ease adoption of the metric system. The goal is to help U.S. industry compete more successfully in the global marketplace, increase exports, and create new jobs. The strategy is to use federal procurement, financial assistance, and other business-related activities to encourage voluntary conversion. Based upon the positive experiences of firms and industries that have converted, federal agencies have concluded that metric use will yield long-term benefits that are beyond any one-time costs or inconveniences. It may be time for additional steps to move the Nation out of its dual-system comfort zone and continue to progress toward metrication. This report includes 'Metric Highlights in U.S. History'.
42 CFR 440.330 - Benchmark health benefits coverage.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 42 Public Health 4 2012-10-01 2012-10-01 false Benchmark health benefits coverage. 440.330 Section 440.330 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN... Benchmark-Equivalent Coverage § 440.330 Benchmark health benefits coverage. Benchmark coverage is health...
Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives.

PubMed

Lagarde, Nathalie; Zagury, Jean-François; Montes, Matthieu

2015-07-27

Virtual screening methods are commonly used nowadays in drug discovery processes. However, to ensure their reliability, they have to be carefully evaluated. The evaluation of these methods is often realized in a retrospective way, notably by studying the enrichment of benchmarking data sets. To this purpose, numerous benchmarking data sets were developed over the years, and the resulting improvements led to the availability of high quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.
A benchmarking program to reduce red blood cell outdating: implementation, evaluation, and a conceptual framework.

PubMed

Barty, Rebecca L; Gagliardi, Kathleen; Owens, Wendy; Lauzon, Deborah; Scheuermann, Sheena; Liu, Yang; Wang, Grace; Pai, Menaka; Heddle, Nancy M

2015-07-01

Benchmarking is a quality improvement tool that compares an organization's performance to that of its peers for selected indicators, to improve practice. Processes to develop evidence-based benchmarks for red blood cell (RBC) outdating in Ontario hospitals, based on RBC hospital disposition data from Canadian Blood Services, have been previously reported. These benchmarks were implemented in 160 hospitals provincewide with a multifaceted approach, which included hospital education, inventory management tools and resources, summaries of best practice recommendations, recognition of high-performing sites, and audit tools on the Transfusion Ontario website (http://transfusionontario.org). In this study we describe the implementation process and the impact of the benchmarking program on RBC outdating. A conceptual framework for continuous quality improvement of a benchmarking program was also developed. The RBC outdating rate for all hospitals trended downward continuously from April 2006 to February 2012, irrespective of hospitals' transfusion rates or their distance from the blood supplier. The highest annual outdating rate was 2.82%, at the beginning of the observation period. Each year brought further reductions, with a nadir outdating rate of 1.02% achieved in 2011. The key elements of the successful benchmarking strategy included dynamic targets, a comprehensive and evidence-based implementation strategy, ongoing information sharing, and a robust data system to track information. The Ontario benchmarking program for RBC outdating resulted in continuous and sustained quality improvement. Our conceptual iterative framework for benchmarking provides a guide for institutions implementing a benchmarking program. © 2015 AABB.
New Objective Refraction Metric Based on Sphere Fitting to the Wavefront.

PubMed

Jaskulski, Mateusz; Martínez-Finkelshtein, Andreí; López-Gil, Norberto

2017-01-01

To develop an objective refraction formula based on the ocular wavefront error (WFE) expressed in terms of Zernike coefficients and pupil radius, which would be an accurate predictor of subjective spherical equivalent (SE) for different pupil sizes. A sphere is fitted to the ocular wavefront at the center and at a variable distance, t . The optimal fitting distance, t opt , is obtained empirically from a dataset of 308 eyes as a function of objective refraction pupil radius, r 0 , and used to define the formula of a new wavefront refraction metric (MTR). The metric is tested in another, independent dataset of 200 eyes. For pupil radii r 0 ≤ 2 mm, the new metric predicts the equivalent sphere with similar accuracy (<0.1D), however, for r 0 > 2 mm, the mean error of traditional metrics can increase beyond 0.25D, and the MTR remains accurate. The proposed metric allows clinicians to obtain an accurate clinical spherical equivalent value without rescaling/refitting of the wavefront coefficients. It has the potential to be developed into a metric which will be able to predict full spherocylindrical refraction for the desired illumination conditions and corresponding pupil size.

Quality metrics in high-dimensional data visualization: an overview and systematization.

PubMed

Bertini, Enrico; Tatu, Andrada; Keim, Daniel

2011-12-01

In this paper, we present a systematization of techniques that use quality metrics to help in the visual exploration of meaningful patterns in high-dimensional data. In a number of recent papers, different quality metrics are proposed to automate the demanding search through large spaces of alternative visualizations (e.g., alternative projections or ordering), allowing the user to concentrate on the most promising visualizations suggested by the quality metrics. Over the last decade, this approach has witnessed a remarkable development but few reflections exist on how these methods are related to each other and how the approach can be developed further. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, visualization techniques, and the process itself. The process is described through a reworked version of the well-known information visualization pipeline. We demonstrate the usefulness of our model by applying it to several existing approaches that use quality metrics, and we provide reflections on implications of our model for future research. © 2010 IEEE
Prioritizing Urban Habitats for Connectivity Conservation: Integrating Centrality and Ecological Metrics.

PubMed

Poodat, Fatemeh; Arrowsmith, Colin; Fraser, David; Gordon, Ascelin

2015-09-01

Connectivity among fragmented areas of habitat has long been acknowledged as important for the viability of biological conservation, especially within highly modified landscapes. Identifying important habitat patches in ecological connectivity is a priority for many conservation strategies, and the application of 'graph theory' has been shown to provide useful information on connectivity. Despite the large number of metrics for connectivity derived from graph theory, only a small number have been compared in terms of the importance they assign to nodes in a network. This paper presents a study that aims to define a new set of metrics and compares these with traditional graph-based metrics, used in the prioritization of habitat patches for ecological connectivity. The metrics measured consist of "topological" metrics, "ecological metrics," and "integrated metrics," Integrated metrics are a combination of topological and ecological metrics. Eight metrics were applied to the habitat network for the fat-tailed dunnart within Greater Melbourne, Australia. A non-directional network was developed in which nodes were linked to adjacent nodes. These links were then weighted by the effective distance between patches. By applying each of the eight metrics for the study network, nodes were ranked according to their contribution to the overall network connectivity. The structured comparison revealed the similarity and differences in the way the habitat for the fat-tailed dunnart was ranked based on different classes of metrics. Due to the differences in the way the metrics operate, a suitable metric should be chosen that best meets the objectives established by the decision maker.
Phase field benchmark problems for dendritic growth and linear elasticity

DOE PAGES

Jokisaari, Andrea M.; Voorhees, P. W.; Guyer, Jonathan E.; ...

2018-03-26

We present the second set of benchmark problems for phase field models that are being jointly developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST) along with input from other members in the phase field community. As the integrated computational materials engineering (ICME) approach to materials design has gained traction, there is an increasing need for quantitative phase field results. New algorithms and numerical implementations increase computational capabilities, necessitating standard problems to evaluate their impact on simulated microstructure evolution as well as their computational performance. We propose one benchmark problem formore » solidifiication and dendritic growth in a single-component system, and one problem for linear elasticity via the shape evolution of an elastically constrained precipitate. We demonstrate the utility and sensitivity of the benchmark problems by comparing the results of 1) dendritic growth simulations performed with different time integrators and 2) elastically constrained precipitate simulations with different precipitate sizes, initial conditions, and elastic moduli. As a result, these numerical benchmark problems will provide a consistent basis for evaluating different algorithms, both existing and those to be developed in the future, for accuracy and computational efficiency when applied to simulate physics often incorporated in phase field models.« less
Phase field benchmark problems for dendritic growth and linear elasticity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jokisaari, Andrea M.; Voorhees, P. W.; Guyer, Jonathan E.

We present the second set of benchmark problems for phase field models that are being jointly developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST) along with input from other members in the phase field community. As the integrated computational materials engineering (ICME) approach to materials design has gained traction, there is an increasing need for quantitative phase field results. New algorithms and numerical implementations increase computational capabilities, necessitating standard problems to evaluate their impact on simulated microstructure evolution as well as their computational performance. We propose one benchmark problem formore » solidifiication and dendritic growth in a single-component system, and one problem for linear elasticity via the shape evolution of an elastically constrained precipitate. We demonstrate the utility and sensitivity of the benchmark problems by comparing the results of 1) dendritic growth simulations performed with different time integrators and 2) elastically constrained precipitate simulations with different precipitate sizes, initial conditions, and elastic moduli. As a result, these numerical benchmark problems will provide a consistent basis for evaluating different algorithms, both existing and those to be developed in the future, for accuracy and computational efficiency when applied to simulate physics often incorporated in phase field models.« less
Scholarly Metrics Baseline: A Survey of Faculty Knowledge, Use, and Opinion about Scholarly Metrics

ERIC Educational Resources Information Center

DeSanto, Dan; Nichols, Aaron

2017-01-01

This article presents the results of a faculty survey conducted at the University of Vermont during academic year 2014-2015. The survey asked faculty about: familiarity with scholarly metrics, metric-seeking habits, help-seeking habits, and the role of metrics in their department's tenure and promotion process. The survey also gathered faculty…
The Publications Tracking and Metrics Program at NOAO: Challenges and Opportunities

NASA Astrophysics Data System (ADS)

Hunt, Sharon

2015-08-01

The National Optical Astronomy Observatory (NOAO) is the U.S. national research and development center for ground-based nighttime astronomy. The NOAO librarian manages the organization’s publications tracking and metrics program, which consists of three components: identifying publications, organizing citation data, and disseminating publications information. We are developing methods to streamline these tasks, better organize our data, provide greater accessibility to publications data, and add value to our services.Our publications tracking process is complex, as we track refereed publications citing data from several sources: NOAO telescopes at two observatory sites, telescopes of consortia in which NOAO participates, the NOAO Science Archive, and NOAO-granted community-access time on non-NOAO telescopes. We also identify and document our scientific staff publications. In addition, several individuals contribute publications data.In the past year, we made several changes in our publications tracking and metrics program. To better organize our data and streamline the creation of reports and metrics, we created a MySQL publications database. When designing this relational database, we considered ease of use, the ability to incorporate data from various sources, efficiency in data inputting and sorting, and potential for growth. We also considered the types of metrics we wished to generate from our publications data based on our target audiences and the messages we wanted to convey. To increase accessibility and dissemination of publications information, we developed a publications section on the library’s website, with citation lists, acknowledgements guidelines, and metrics. We are now developing a searchable online database for our website using PHP.The publications tracking and metrics program has provided many opportunities for the library to market its services and contribute to the organization’s mission. As we make decisions on collecting, organizing
Say "Yes" to Metric Measure.

ERIC Educational Resources Information Center

Monroe, Eula Ewing; Nelson, Marvin N.

2000-01-01

Provides a brief history of the metric system. Discusses the infrequent use of the metric measurement system in the United States, why conversion from the customary system to the metric system is difficult, and the need for change. (Contains 14 resources.) (ASK)
Parallel Ada benchmarks for the SVMS

NASA Technical Reports Server (NTRS)

Collard, Philippe E.

1990-01-01

The use of parallel processing paradigm to design and develop faster and more reliable computers appear to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with the version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed that would measure Ada tasking efficiency on parallel architectures as well as determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools in the development of the SVMS architecture.
Multisensor benchmark data for riot control

NASA Astrophysics Data System (ADS)

Jäger, Uwe; Höpken, Marc; Dürr, Bernhard; Metzler, Jürgen; Willersinn, Dieter

2008-10-01

Quick and precise response is essential for riot squads when coping with escalating violence in crowds. Often it is just a single person, known as the leader of the gang, who instigates other people and thus is responsible of excesses. Putting this single person out of action in most cases leads to a de-escalating situation. Fostering de-escalations is one of the main tasks of crowd and riot control. To do so, extensive situation awareness is mandatory for the squads and can be promoted by technical means such as video surveillance using sensor networks. To develop software tools for situation awareness appropriate input data with well-known quality is needed. Furthermore, the developer must be able to measure algorithm performance and ongoing improvements. Last but not least, after algorithm development has finished and marketing aspects emerge, meeting of specifications must be proved. This paper describes a multisensor benchmark which exactly serves this purpose. We first define the underlying algorithm task. Then we explain details about data acquisition and sensor setup and finally we give some insight into quality measures of multisensor data. Currently, the multisensor benchmark described in this paper is applied to the development of basic algorithms for situational awareness, e.g. tracking of individuals in a crowd.
NAS Parallel Benchmark Results 11-96. 1.0

NASA Technical Reports Server (NTRS)

Bailey, David H.; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

The NAS Parallel Benchmarks have been developed at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a "pencil and paper" fashion. In other words, the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. These results represent the best results that have been reported to us by the vendors for the specific 3 systems listed. In this report, we present new NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin200, and SGI Origin2000. We also report High Performance Fortran (HPF) based NPB results for IBM SP2 Wide Nodes, HP/Convex Exemplar SPP2000, and SGI/CRAY T3D. These results have been submitted by Applied Parallel Research (APR) and Portland Group Inc. (PGI). We also present sustained performance per dollar for Class B LU, SP and BT benchmarks.
The Grad-Shafranov Reconstruction of Toroidal Magnetic Flux Ropes: Method Development and Benchmark Studies

NASA Astrophysics Data System (ADS)

Hu, Qiang

2017-09-01

We develop an approach of the Grad-Shafranov (GS) reconstruction for toroidal structures in space plasmas, based on in situ spacecraft measurements. The underlying theory is the GS equation that describes two-dimensional magnetohydrostatic equilibrium, as widely applied in fusion plasmas. The geometry is such that the arbitrary cross-section of the torus has rotational symmetry about the rotation axis, Z, with a major radius, r0. The magnetic field configuration is thus determined by a scalar flux function, Ψ, and a functional F that is a single-variable function of Ψ. The algorithm is implemented through a two-step approach: i) a trial-and-error process by minimizing the residue of the functional F(Ψ) to determine an optimal Z-axis orientation, and ii) for the chosen Z, a χ2 minimization process resulting in a range of r0. Benchmark studies of known analytic solutions to the toroidal GS equation with noise additions are presented to illustrate the two-step procedure and to demonstrate the performance of the numerical GS solver, separately. For the cases presented, the errors in Z and r0 are 9° and 22%, respectively, and the relative percent error in the numerical GS solutions is smaller than 10%. We also make public the computer codes for these implementations and benchmark studies.
Rural Elementary Students' Understanding of Science and Agricultural Education Benchmarks Related to Meat and Livestock.

ERIC Educational Resources Information Center

Meischen, Deanna L.; Trexler, Cary J.

2003-01-01

Seven fifth-graders developed concept maps depicting their knowledge of meat product development. Despite their rural background, they lacked understanding of agriculture concepts and had mixed knowledge of agricultural literacy benchmarks concerning food products. Their language did not reflect scientific terminology in the benchmarks. (Contains…
A call for benchmarking transposable element annotation methods.

PubMed

Hoen, Douglas R; Hickey, Glenn; Bourque, Guillaume; Casacuberta, Josep; Cordaux, Richard; Feschotte, Cédric; Fiston-Lavier, Anna-Sophie; Hua-Van, Aurélie; Hubley, Robert; Kapusta, Aurélie; Lerat, Emmanuelle; Maumus, Florian; Pollock, David D; Quesneville, Hadi; Smit, Arian; Wheeler, Travis J; Bureau, Thomas E; Blanchette, Mathieu

2015-01-01

DNA derived from transposable elements (TEs) constitutes large parts of the genomes of complex eukaryotes, with major impacts not only on genomic research but also on how organisms evolve and function. Although a variety of methods and tools have been developed to detect and annotate TEs, there are as yet no standard benchmarks-that is, no standard way to measure or compare their accuracy. This lack of accuracy assessment calls into question conclusions from a wide range of research that depends explicitly or implicitly on TE annotation. In the absence of standard benchmarks, toolmakers are impeded in improving their tools, annotators cannot properly assess which tools might best suit their needs, and downstream researchers cannot judge how accuracy limitations might impact their studies. We therefore propose that the TE research community create and adopt standard TE annotation benchmarks, and we call for other researchers to join the authors in making this long-overdue effort a success.
A College Level Program for Instructing the Metric System.

ERIC Educational Resources Information Center

Thrall, Marjorie A.

Although the Congress of the United States has enacted legislation calling for a conversion to the metric system by October 1992, recent government reports suggest that the country may not be prepared to meet that deadline. In an effort to develop a learning module for instructing community college students in the application of the metric system,…
Metrics, The Measure of Your Future: Evaluation Report, 1977.

ERIC Educational Resources Information Center

North Carolina State Dept. of Public Instruction, Raleigh. Div. of Development.

The primary goal of the Metric Education Project was the systematic development of a replicable educational model to facilitate the system-wide conversion to the metric system during the next five to ten years. This document is an evaluation of that project. Three sets of statistical evidence exist to support the fact that the project has been…
Benchmarking for On-Scalp MEG Sensors.

PubMed

Xie, Minshu; Schneiderman, Justin F; Chukharkin, Maxim L; Kalabukhov, Alexei; Riaz, Bushra; Lundqvist, Daniel; Whitmarsh, Stephen; Hamalainen, Matti; Jousmaki, Veikko; Oostenveld, Robert; Winkler, Dag

2017-06-01

We present a benchmarking protocol for quantitatively comparing emerging on-scalp magnetoencephalography (MEG) sensor technologies to their counterparts in state-of-the-art MEG systems. As a means of validation, we compare a high-critical-temperature superconducting quantum interference device (high T c SQUID) with the low- T c SQUIDs of an Elekta Neuromag TRIUX system in MEG recordings of auditory and somatosensory evoked fields (SEFs) on one human subject. We measure the expected signal gain for the auditory-evoked fields (deeper sources) and notice some unfamiliar features in the on-scalp sensor-based recordings of SEFs (shallower sources). The experimental results serve as a proof of principle for the benchmarking protocol. This approach is straightforward, general to various on-scalp MEG sensors, and convenient to use on human subjects. The unexpected features in the SEFs suggest on-scalp MEG sensors may reveal information about neuromagnetic sources that is otherwise difficult to extract from state-of-the-art MEG recordings. As the first systematically established on-scalp MEG benchmarking protocol, magnetic sensor developers can employ this method to prove the utility of their technology in MEG recordings. Further exploration of the SEFs with on-scalp MEG sensors may reveal unique information about their sources.
Systematic Benchmarking of Diagnostic Technologies for an Electrical Power System

NASA Technical Reports Server (NTRS)

Kurtoglu, Tolga; Jensen, David; Poll, Scott

2009-01-01

Automated health management is a critical functionality for complex aerospace systems. A wide variety of diagnostic algorithms have been developed to address this technical challenge. Unfortunately, the lack of support to perform large-scale V&V (verification and validation) of diagnostic technologies continues to create barriers to effective development and deployment of such algorithms for aerospace vehicles. In this paper, we describe a formal framework developed for benchmarking of diagnostic technologies. The diagnosed system is the Advanced Diagnostics and Prognostics Testbed (ADAPT), a real-world electrical power system (EPS), developed and maintained at the NASA Ames Research Center. The benchmarking approach provides a systematic, empirical basis to the testing of diagnostic software and is used to provide performance assessment for different diagnostic algorithms.
Introduction to Metrics.

ERIC Educational Resources Information Center

Edgecomb, Philip L.; Shapiro, Marion

Addressed to vocational, or academic middle or high school students, this book reviews mathematics fundamentals using metric units of measurement. It utilizes a common-sense approach to the degree of accuracy needed in solving actual trade and every-day problems. Stress is placed on reading off metric measurements from a ruler or tape, and on…
Metric System Unit.

ERIC Educational Resources Information Center

Maggi, Gayle J. B.

Thirty-six lessons for introducing the metric system are outlined. Appropriate grade level is not specified. The metric lessons suggested include 13 lessons on length, 7 lessons on mass, 11 lessons on capacity, and 5 lessons on temperature. Each lesson includes a list of needed materials, a statement of the lesson purpose, and suggested…
Algal bioassessment metrics for wadeable streams and rivers of Maine, USA

USGS Publications Warehouse

Danielson, Thomas J.; Loftin, Cynthia S.; Tsomides, Leonidas; DiFranco, Jeanne L.; Connors, Beth

2011-01-01

Many state water-quality agencies use biological assessment methods based on lotic fish and macroinvertebrate communities, but relatively few states have incorporated algal multimetric indices into monitoring programs. Algae are good indicators for monitoring water quality because they are sensitive to many environmental stressors. We evaluated benthic algal community attributes along a landuse gradient affecting wadeable streams and rivers in Maine, USA, to identify potential bioassessment metrics. We collected epilithic algal samples from 193 locations across the state. We computed weighted-average optima for common taxa for total P, total N, specific conductance, % impervious cover, and % developed watershed, which included all land use that is no longer forest or wetland. We assigned Maine stream tolerance values and categories (sensitive, intermediate, tolerant) to taxa based on their optima and responses to watershed disturbance. We evaluated performance of algal community metrics used in multimetric indices from other regions and novel metrics based on Maine data. Metrics specific to Maine data, such as the relative richness of species characterized as being sensitive in Maine, were more correlated with % developed watershed than most metrics used in other regions. Few community-structure attributes (e.g., species richness) were useful metrics in Maine. Performance of algal bioassessment models would be improved if metrics were evaluated with attributes of local data before inclusion in multimetric indices or statistical models. ?? 2011 by The North American Benthological Society.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.