Metrics for Performance Evaluation of Patient Exercises during Physical Therapy.
Vakanski, Aleksandar; Ferguson, Jake M; Lee, Stephen
2017-06-01
The article proposes a set of metrics for evaluation of patient performance in physical therapy exercises. A taxonomy is employed that classifies the metrics into quantitative and qualitative categories, based on the level of abstraction of the captured motion sequences. Further, the quantitative metrics are classified into model-less and model-based metrics, according to whether the evaluation employs the raw measurements of patient-performed motions or a mathematical model of the motions. The reviewed metrics include root-mean-square distance, Kullback-Leibler divergence, log-likelihood, heuristic consistency, Fugl-Meyer Assessment, and others. The metrics are evaluated for a set of five human motions captured with a Kinect sensor. The metrics can potentially be integrated into a system that employs machine learning for modelling and assessment of the consistency of patient performance in a home-based therapy setting. Automated performance evaluation can overcome the inherent subjectivity of human-performed therapy assessment, increase adherence to prescribed therapy plans, and reduce healthcare costs.
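As a rough illustration of the model-less quantitative metrics listed above, the sketch below computes a root-mean-square distance and a simple Kullback-Leibler divergence between a patient motion sequence and a reference sequence. The array shapes, the per-joint Gaussian assumption for the KL term, and the toy data are illustrative assumptions, not the article's exact procedure.

```python
import numpy as np

def rms_distance(patient, reference):
    """Root-mean-square distance between two equal-length joint-angle sequences.
    Both arrays are assumed to have shape (frames, joints)."""
    return np.sqrt(np.mean((patient - reference) ** 2))

def gaussian_kl(patient, reference):
    """KL divergence between per-joint Gaussian fits of the two sequences
    (an illustrative simplification of a model-based comparison)."""
    mu_p, var_p = patient.mean(axis=0), patient.var(axis=0) + 1e-9
    mu_r, var_r = reference.mean(axis=0), reference.var(axis=0) + 1e-9
    kl = 0.5 * (np.log(var_r / var_p) + (var_p + (mu_p - mu_r) ** 2) / var_r - 1.0)
    return kl.sum()

# Hypothetical data: 100 frames x 15 joint angles
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 15))
patient = reference + rng.normal(scale=0.2, size=(100, 15))
print(rms_distance(patient, reference), gaussian_kl(patient, reference))
```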
Model Performance Evaluation and Scenario Analysis (MPESA) Tutorial
This tool consists of two parts: model performance evaluation and scenario analysis (MPESA). The model performance evaluation consists of two components: model performance evaluation metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit measures that capture magnitude only, sequence only, and combined magnitude and sequence errors.
On Applying the Prognostic Performance Metrics
NASA Technical Reports Server (NTRS)
Saxena, Abhinav; Celaya, Jose; Saha, Bhaskar; Saha, Sankalita; Goebel, Kai
2009-01-01
Prognostics performance evaluation has gained significant attention in the past few years. As prognostics technology matures and more sophisticated methods for prognostic uncertainty management are developed, a standardized methodology for performance evaluation becomes extremely important to guide improvement efforts in a constructive manner. This paper is in continuation of previous efforts where several new evaluation metrics tailored for prognostics were introduced and were shown to effectively evaluate various algorithms as compared to other conventional metrics. Specifically, this paper presents a detailed discussion on how these metrics should be interpreted and used. Several shortcomings identified, while applying these metrics to a variety of real applications, are also summarized along with discussions that attempt to alleviate these problems. Further, these metrics have been enhanced to include the capability of incorporating probability distribution information from prognostic algorithms as opposed to evaluation based on point estimates only. Several methods have been suggested and guidelines have been provided to help choose one method over another based on probability distribution characteristics. These approaches also offer a convenient and intuitive visualization of algorithm performance with respect to some of these new metrics like prognostic horizon and alpha-lambda performance, and also quantify the corresponding performance while incorporating the uncertainty information.
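The prognostic horizon and alpha-lambda metrics mentioned above both ask whether predictions fall within acceptable error bounds as an asset approaches end of life. A minimal point-estimate sketch of the alpha-lambda check follows; the band definition of plus or minus alpha times the true remaining useful life is a common convention assumed here, not necessarily the paper's exact form.

```python
def alpha_lambda_pass(predict_rul, t_start, t_eol, lam=0.5, alpha=0.2):
    """True if the RUL estimate made at time t_lambda lies within
    +/- alpha * (true RUL) of the ground truth (point-estimate version)."""
    t_lambda = t_start + lam * (t_eol - t_start)    # evaluation time
    true_rul = t_eol - t_lambda
    predicted_rul = predict_rul(t_lambda)           # algorithm's estimate at t_lambda
    return abs(predicted_rul - true_rul) <= alpha * true_rul

# Hypothetical algorithm that underestimates RUL by 10% at every time step
print(alpha_lambda_pass(lambda t: 0.9 * (100 - t), t_start=0, t_eol=100))
```

The distribution-aware version discussed in the paper would replace the point estimate with the probability mass of the RUL distribution falling inside the alpha band.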
Multi-objective optimization for generating a weighted multi-model ensemble
NASA Astrophysics Data System (ADS)
Lee, H.
2017-12-01
Many studies have demonstrated that multi-model ensembles generally show better skill than each ensemble member. When generating weighted multi-model ensembles, the first step is measuring the performance of individual model simulations using observations. There is a consensus on the assignment of weighting factors based on a single evaluation metric. When considering only one evaluation metric, the weighting factor for each model is proportional to a performance score or inversely proportional to an error for the model. While this conventional approach can provide appropriate combinations of multiple models, the approach confronts a big challenge when there are multiple metrics under consideration. When considering multiple evaluation metrics, it is obvious that a simple averaging of multiple performance scores or model ranks does not address the trade-off problem between conflicting metrics. So far, there seems to be no best method to generate weighted multi-model ensembles based on multiple performance metrics. The current study applies the multi-objective optimization, a mathematical process that provides a set of optimal trade-off solutions based on a range of evaluation metrics, to combining multiple performance metrics for the global climate models and their dynamically downscaled regional climate simulations over North America and generating a weighted multi-model ensemble. NASA satellite data and the Regional Climate Model Evaluation System (RCMES) software toolkit are used for assessment of the climate simulations. Overall, the performance of each model differs markedly with strong seasonal dependence. Because of the considerable variability across the climate simulations, it is important to evaluate models systematically and make future projections by assigning optimized weighting factors to the models with relatively good performance. Our results indicate that the optimally weighted multi-model ensemble always shows better performance than an arithmetic ensemble mean and may provide reliable future projections.
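To make the trade-off problem concrete, the sketch below finds the Pareto-optimal (non-dominated) models given several error metrics, which is the kind of solution set multi-objective optimization returns, and then derives ensemble weights from them. The toy error values and the inverse-error weighting rule are illustrative assumptions, not the study's actual procedure.

```python
import numpy as np

# Hypothetical per-model errors on two conflicting metrics (lower is better)
errors = np.array([
    [0.30, 0.10],   # model A
    [0.20, 0.20],   # model B
    [0.10, 0.35],   # model C
    [0.25, 0.25],   # model D (dominated by B)
])

def pareto_mask(errors):
    """Boolean mask of non-dominated rows (minimization on every column)."""
    n = len(errors)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(errors[j] <= errors[i]) and np.any(errors[j] < errors[i]):
                mask[i] = False
                break
    return mask

optimal = pareto_mask(errors)
# One simple way to turn the trade-off solutions into weights: inverse mean error
# for non-dominated models, zero weight otherwise (illustration only).
weights = np.where(optimal, 1.0 / errors.mean(axis=1), 0.0)
weights /= weights.sum()
print(optimal, weights.round(3))
```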
Metrics for Evaluation of Student Models
ERIC Educational Resources Information Center
Pelanek, Radek
2015-01-01
Researchers use many different metrics for evaluation of performance of student models. The aim of this paper is to provide an overview of commonly used metrics, to discuss properties, advantages, and disadvantages of different metrics, to summarize current practice in educational data mining, and to provide guidance for evaluation of student…
Miller, Anna N; Kozar, Rosemary; Wolinsky, Philip
2017-06-01
Reproducible metrics are needed to evaluate the delivery of orthopaedic trauma care, national care norms, and outliers. The American College of Surgeons (ACS) is uniquely positioned to collect and evaluate the data needed to evaluate orthopaedic trauma care via the Committee on Trauma and the Trauma Quality Improvement Project. We evaluated the first quality metrics the ACS has collected for orthopaedic trauma surgery to determine whether these metrics can be appropriately collected with accuracy and completeness. The metrics include the time to administration of the first dose of antibiotics for open fractures, the time to surgical irrigation and débridement of open tibial fractures, and the percentage of patients who undergo stabilization of femoral fractures at trauma centers nationwide. These metrics were analyzed to evaluate for variances in the delivery of orthopaedic care across the country. The data showed wide variances for all metrics, and many centers had an incomplete ability to collect the orthopaedic trauma care metrics. There was large variability in the results of the metrics collected among different trauma center levels, as well as among centers of a particular level. The ACS has successfully begun tracking orthopaedic trauma care performance measures, which will help inform reevaluation of the goals and continued work on data collection and improvement of patient care. Future areas of research may link these performance measures with patient outcomes, such as long-term tracking, to assess nonunion and function. This information can provide insight into center performance and its effect on patient outcomes. The ACS was able to successfully collect and evaluate the data for three metrics used to assess the quality of orthopaedic trauma care. However, additional research is needed to determine whether these metrics are suitable for evaluating orthopaedic trauma care and to establish cutoff values for each metric.
NASA Astrophysics Data System (ADS)
Camp, H. A.; Moyer, Steven; Moore, Richard K.
2010-04-01
The Night Vision and Electronic Sensors Directorate's current time-limited search (TLS) model, which makes use of the targeting task performance (TTP) metric to describe image quality, does not explicitly account for the effects of visual clutter on observer performance. The TLS model is currently based on empirical fits to describe human performance for a time of day, spectrum, and environment. Incorporating a clutter metric into the TLS model may reduce the number of these empirical fits needed. The masked target transform volume (MTTV) clutter metric has been previously presented and compared to other clutter metrics. Using real infrared imagery of rural scenes with varying levels of clutter, NVESD is currently evaluating the appropriateness of the MTTV metric. NVESD had twenty subject matter experts (SMEs) rank the amount of clutter in each scene in a series of pair-wise comparisons. MTTV metric values were calculated and then compared to the SME observers' rankings. The MTTV metric ranked the clutter in a similar manner to the SME evaluation, suggesting that the MTTV metric may emulate SME response. This paper is a first step in quantifying clutter and measuring agreement with subjective human evaluation.
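A hedged sketch of the kind of agreement check described above: a rank correlation between clutter metric values and aggregated SME rankings. Spearman's correlation is assumed here as the comparison statistic; the paper's pair-wise comparison analysis may differ, and the values below are hypothetical.

```python
from scipy.stats import spearmanr

# Hypothetical MTTV values and mean SME clutter ranks for six scenes
mttv_values = [0.12, 0.45, 0.33, 0.80, 0.51, 0.27]
sme_mean_rank = [1.2, 3.8, 3.1, 5.9, 4.4, 2.3]

rho, p_value = spearmanr(mttv_values, sme_mean_rank)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```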
Metrics for evaluating performance and uncertainty of Bayesian network models
Bruce G. Marcot
2012-01-01
This paper presents a selected set of existing and new metrics for gauging Bayesian network model performance and uncertainty. Selected existing and new metrics are discussed for conducting model sensitivity analysis (variance reduction, entropy reduction, case file simulation); evaluating scenarios (influence analysis); depicting model complexity (numbers of model...
An Evaluation of the IntelliMetric[SM] Essay Scoring System
ERIC Educational Resources Information Center
Rudner, Lawrence M.; Garcia, Veronica; Welch, Catherine
2006-01-01
This report provides a two-part evaluation of the IntelliMetric[SM] automated essay scoring system based on its performance scoring essays from the Analytic Writing Assessment of the Graduate Management Admission Test[TM] (GMAT[TM]). The IntelliMetric system performance is first compared to that of individual human raters, a Bayesian system…
Nindl, Bradley C; Jaffin, Dianna P; Dretsch, Michael N; Cheuvront, Samuel N; Wesensten, Nancy J; Kent, Michael L; Grunberg, Neil E; Pierce, Joseph R; Barry, Erin S; Scott, Jonathan M; Young, Andrew J; OʼConnor, Francis G; Deuster, Patricia A
2015-11-01
Human performance optimization (HPO) is defined as "the process of applying knowledge, skills and emerging technologies to improve and preserve the capabilities of military members, and organizations to execute essential tasks." The lack of consensus for operationally relevant and standardized metrics that meet joint military requirements has been identified as the single most important gap for research and application of HPO. In 2013, the Consortium for Health and Military Performance hosted a meeting to develop a toolkit of standardized HPO metrics for use in military and civilian research, and potentially for field applications by commanders, units, and organizations. Performance was considered from a holistic perspective as being influenced by various behaviors and barriers. To accomplish the goal of developing a standardized toolkit, key metrics were identified and evaluated across a spectrum of domains that contribute to HPO: physical performance, nutritional status, psychological status, cognitive performance, environmental challenges, sleep, and pain. These domains were chosen based on relevant data with regard to performance enhancers and degraders. The specific objectives at this meeting were to (a) identify and evaluate current metrics for assessing human performance within selected domains; (b) prioritize metrics within each domain to establish a human performance assessment toolkit; and (c) identify scientific gaps and the needed research to more effectively assess human performance across domains. This article provides a summary of 150 total HPO metrics across multiple domains that can be used as a starting point, the beginning of an HPO toolkit: physical fitness (29 metrics), nutrition (24 metrics), psychological status (36 metrics), cognitive performance (35 metrics), environment (12 metrics), sleep (9 metrics), and pain (5 metrics). These metrics can be particularly valuable as the military emphasizes a renewed interest in Human Dimension efforts, and leverages science, resources, programs, and policies to optimize the performance capacities of all Service members.
Model Performance Evaluation and Scenario Analysis (MPESA) Tutorial
The model performance evaluation consists of metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit measures that capture magnitude only, sequence only, and combined magnitude and sequence errors.
NASA Astrophysics Data System (ADS)
Gide, Milind S.; Karam, Lina J.
2016-08-01
With the increased focus on visual attention (VA) in the last decade, a large number of computational visual saliency methods have been developed over the past few years. These models are traditionally evaluated by using performance evaluation metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though a considerable number of such metrics have been proposed in the literature, they exhibit notable shortcomings. In this work, we discuss shortcomings in existing metrics through illustrative examples and propose a new metric that uses local weights based on fixation density, which overcomes these flaws. To compare the performance of our proposed metric at assessing the quality of saliency prediction with other existing metrics, we construct a ground-truth subjective database in which saliency maps obtained from 17 different VA models are evaluated by 16 human observers on a 5-point categorical scale in terms of their visual resemblance with corresponding ground-truth fixation density maps obtained from eye-tracking data. The metrics are evaluated by correlating metric scores with the human subjective ratings. The correlation results show that the proposed evaluation metric outperforms all other popular existing metrics. Additionally, the constructed database and corresponding subjective ratings provide an insight into which of the existing metrics and future metrics are better at estimating the quality of saliency prediction and can be used as a benchmark.
Metrics for Offline Evaluation of Prognostic Performance
NASA Technical Reports Server (NTRS)
Saxena, Abhinav; Celaya, Jose; Saha, Bhaskar; Saha, Sankalita; Goebel, Kai
2010-01-01
Prognostic performance evaluation has gained significant attention in the past few years. Currently, prognostics concepts lack standard definitions and suffer from ambiguous and inconsistent interpretations. This lack of standards is in part due to the varied end-user requirements for different applications, time scales, available information, domain dynamics, etc. to name a few. The research community has used a variety of metrics largely based on convenience and their respective requirements. Very little attention has been focused on establishing a standardized approach to compare different efforts. This paper presents several new evaluation metrics tailored for prognostics that were recently introduced and were shown to effectively evaluate various algorithms as compared to other conventional metrics. Specifically, this paper presents a detailed discussion on how these metrics should be interpreted and used. These metrics have the capability of incorporating probabilistic uncertainty estimates from prognostic algorithms. In addition to quantitative assessment they also offer a comprehensive visual perspective that can be used in designing the prognostic system. Several methods are suggested to customize these metrics for different applications. Guidelines are provided to help choose one method over another based on distribution characteristics. Various issues faced by prognostics and its performance evaluation are discussed followed by a formal notational framework to help standardize subsequent developments.
Evaluating Algorithm Performance Metrics Tailored for Prognostics
NASA Technical Reports Server (NTRS)
Saxena, Abhinav; Celaya, Jose; Saha, Bhaskar; Saha, Sankalita; Goebel, Kai
2009-01-01
Prognostics has taken a center stage in Condition Based Maintenance (CBM) where it is desired to estimate Remaining Useful Life (RUL) of the system so that remedial measures may be taken in advance to avoid catastrophic events or unwanted downtimes. Validation of such predictions is an important but difficult proposition and a lack of appropriate evaluation methods renders prognostics meaningless. Evaluation methods currently used in the research community are not standardized and in many cases do not sufficiently assess key performance aspects expected out of a prognostics algorithm. In this paper we introduce several new evaluation metrics tailored for prognostics and show that they can effectively evaluate various algorithms as compared to other conventional metrics. Specifically, four algorithms, namely Relevance Vector Machine (RVM), Gaussian Process Regression (GPR), Artificial Neural Network (ANN), and Polynomial Regression (PR), are compared. These algorithms vary in complexity and their ability to manage uncertainty around predicted estimates. Results show that the new metrics rank these algorithms in a different manner and, depending on the requirements and constraints, suitable metrics may be chosen. Beyond these results, these metrics offer ideas about how metrics suitable to prognostics may be designed so that the evaluation procedure can be standardized.
Evaluating true BCI communication rate through mutual information and language models.
Speier, William; Arnold, Corey; Pouratian, Nader
2013-01-01
Brain-computer interface (BCI) systems are a promising means for restoring communication to patients suffering from "locked-in" syndrome. Research to improve system performance primarily focuses on means to overcome the low signal to noise ratio of electroencephalographic (EEG) recordings. However, the literature and methods are difficult to compare due to the array of evaluation metrics and assumptions underlying them, including that: 1) all characters are equally probable, 2) character selection is memoryless, and 3) errors occur completely at random. The standardization of evaluation metrics that more accurately reflect the amount of information contained in BCI language output is critical to make progress. We present a mutual information-based metric that incorporates prior information and a model of systematic errors. The parameters of a system used in one study were re-optimized, showing that the metric used in optimization significantly affects the parameter values chosen and the resulting system performance. The results of 11 BCI communication studies were then evaluated using different metrics, including those previously used in BCI literature and the newly advocated metric. Six studies' results varied based on the metric used for evaluation, and the proposed metric produced results that differed from those originally published in two of the studies. Standardizing metrics to accurately reflect the rate of information transmission is critical to properly evaluate and compare BCI communication systems and advance the field in an unbiased manner.
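The core of a mutual-information-based communication metric can be sketched from a character confusion matrix: estimate the joint distribution of intended and selected characters, compute I(intended; selected), and scale by the selection rate to obtain bits per minute. The matrix and rate below are hypothetical, and the authors' metric additionally folds in a language-model prior and systematic-error model, which are omitted here.

```python
import numpy as np

def mutual_information_bits(confusion_counts):
    """I(intended; selected) in bits, estimated from a count matrix."""
    joint = confusion_counts / confusion_counts.sum()
    px = joint.sum(axis=1, keepdims=True)           # marginal of intended characters
    py = joint.sum(axis=0, keepdims=True)           # marginal of selected characters
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

# Hypothetical 3-character confusion matrix (rows: intended, cols: selected)
counts = np.array([[45, 3, 2],
                   [4, 40, 6],
                   [1, 5, 44]], dtype=float)
selections_per_minute = 5.0                          # assumed selection rate
print(mutual_information_bits(counts) * selections_per_minute, "bits/min")
```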
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ronald Boring; Roger Lew; Thomas Ulrich
2014-03-01
As control rooms are modernized with new digital systems at nuclear power plants, it is necessary to evaluate the operator performance using these systems as part of a verification and validation process. There are no standard, predefined metrics available for assessing what is satisfactory operator interaction with new systems, especially during the early design stages of a new system. This report identifies the process and metrics for evaluating human system interfaces as part of control room modernization. The report includes background information on design and evaluation, a thorough discussion of human performance measures, and a practical example of how the process and metrics have been used as part of a turbine control system upgrade during the formative stages of design. The process and metrics are geared toward generalizability to other applications and serve as a template for utilities undertaking their own control room modernization activities.
Design and Implementation of Performance Metrics for Evaluation of Assessments Data
ERIC Educational Resources Information Center
Ahmed, Irfan; Bhatti, Arif
2016-01-01
Evocative evaluation of assessment data is essential to quantify the achievements at course and program levels. The objective of this paper is to design performance metrics and respective formulas to quantitatively evaluate the achievement of set objectives and expected outcomes at the course levels for program accreditation. Even though…
Evaluation of image deblurring methods via a classification metric
NASA Astrophysics Data System (ADS)
Perrone, Daniele; Humphreys, David; Lamb, Robert A.; Favaro, Paolo
2012-09-01
The performance of single image deblurring algorithms is typically evaluated via a certain discrepancy measure between the reconstructed image and the ideal sharp image. The choice of metric, however, has been a source of debate and has also led to alternative metrics based on human visual perception. While fixed metrics may fail to capture some small but visible artifacts, perception-based metrics may favor reconstructions with artifacts that are visually pleasant. To overcome these limitations, we propose to assess the quality of reconstructed images via a task-driven metric. In this paper we consider object classification as the task and therefore use the rate of classification as the metric to measure deblurring performance. In our evaluation we use data with different types of blur in two cases: Optical Character Recognition (OCR), where the goal is to recognise characters in a black and white image, and object classification with no restrictions on pose, illumination and orientation. Finally, we show how off-the-shelf classification algorithms benefit from working with deblurred images.
A Classification Scheme for Smart Manufacturing Systems’ Performance Metrics
Lee, Y. Tina; Kumaraguru, Senthilkumaran; Jain, Sanjay; Robinson, Stefanie; Helu, Moneer; Hatim, Qais Y.; Rachuri, Sudarsan; Dornfeld, David; Saldana, Christopher J.; Kumara, Soundar
2017-01-01
This paper proposes a classification scheme for performance metrics for smart manufacturing systems. The discussion focuses on three such metrics: agility, asset utilization, and sustainability. For each of these metrics, we discuss classification themes, which we then use to develop a generalized classification scheme. In addition to the themes, we discuss a conceptual model that may form the basis for the information necessary for performance evaluations. Finally, we present future challenges in developing robust, performance-measurement systems for real-time, data-intensive enterprises. PMID:28785744
NASA Astrophysics Data System (ADS)
Stisen, S.; Demirel, C.; Koch, J.
2017-12-01
Evaluation of performance is an integral part of model development and calibration, and it is of paramount importance when communicating modelling results to stakeholders and the scientific community. The hydrological modelling community has a comprehensive and well-tested toolbox of metrics to assess temporal model performance. On the contrary, experience with evaluating spatial performance has not kept pace with the wide availability of spatial observations and with the sophisticated model codes simulating the spatial variability of complex hydrological processes. This study aims at making a contribution towards advancing spatial pattern oriented model evaluation for distributed hydrological models. This is achieved by introducing a novel spatial performance metric which provides robust pattern performance during model calibration. The promoted SPAtial EFficiency (SPAEF) metric reflects three equally weighted components: correlation, coefficient of variation and histogram overlap. This multi-component approach is necessary in order to adequately compare spatial patterns. SPAEF, its three components individually and two alternative spatial performance metrics, i.e. connectivity analysis and fractions skill score, are tested in a spatial pattern oriented model calibration of a catchment model in Denmark. The calibration is constrained by a remote-sensing-based spatial pattern of evapotranspiration and discharge time series at two stations. Our results stress that stand-alone metrics tend to fail to provide holistic pattern information to the optimizer, which underlines the importance of multi-component metrics. The three SPAEF components are independent, which allows them to complement each other in a meaningful way. This study promotes the use of bias-insensitive metrics which allow comparing variables that are related but may differ in unit, in order to optimally exploit spatial observations made available by remote sensing platforms. We see great potential of SPAEF across environmental disciplines dealing with spatially distributed modelling.
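One widely used formulation combines the three components into a single Euclidean-distance-style score, SPAEF = 1 - sqrt((alpha-1)^2 + (beta-1)^2 + (gamma-1)^2), with alpha the pattern correlation, beta the ratio of coefficients of variation, and gamma the overlap of z-score histograms. The sketch below follows that formulation; the exact variant and histogram settings should be checked against the paper, and the fields are synthetic.

```python
import numpy as np

def spaef(sim, obs, bins=100):
    """SPAtial EFficiency between two 2-D fields (non-finite cells excluded)."""
    mask = np.isfinite(sim) & np.isfinite(obs)
    s, o = sim[mask], obs[mask]
    alpha = np.corrcoef(s, o)[0, 1]                       # pattern correlation
    beta = (s.std() / s.mean()) / (o.std() / o.mean())    # ratio of coefficients of variation
    zs, zo = (s - s.mean()) / s.std(), (o - o.mean()) / o.std()
    hist_s, edges = np.histogram(zs, bins=bins)
    hist_o, _ = np.histogram(zo, bins=edges)
    gamma = np.minimum(hist_s, hist_o).sum() / hist_o.sum()   # histogram overlap
    return 1.0 - np.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

# Hypothetical observed and simulated evapotranspiration patterns
rng = np.random.default_rng(1)
obs = rng.gamma(shape=2.0, scale=1.5, size=(50, 50))
sim = obs * 1.1 + rng.normal(scale=0.3, size=obs.shape)
print(round(spaef(sim, obs), 3))
```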
NASA Technical Reports Server (NTRS)
McFarland, Shane M.; Norcross, Jason
2016-01-01
Existing methods for evaluating EVA suit performance and mobility have historically concentrated on isolated joint range of motion and torque. However, these techniques do little to evaluate how well a suited crewmember can actually perform during an EVA. An alternative method of characterizing suited mobility through measurement of metabolic cost to the wearer has been evaluated at Johnson Space Center over the past several years. The most recent study involved six test subjects completing multiple trials of various functional tasks in each of three different space suits; the results indicated it was often possible to discern between different suit designs on the basis of metabolic cost alone. However, other variables may have an effect on real-world suited performance; namely, completion time of the task, the gravity field in which the task is completed, etc. While previous results have analyzed completion time, metabolic cost, and metabolic cost normalized to system mass individually, it is desirable to develop a single metric comprising these (and potentially other) performance metrics. This paper outlines the background upon which this single-score metric is determined to be feasible, and initial efforts to develop such a metric. Forward work includes variable coefficient determination and verification of the metric through repeated testing.
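One simple way to fold completion time, metabolic cost, and mass-normalized metabolic cost into a single score is a weighted sum of z-scored components. This is only a placeholder for the coefficient-determination work the authors describe as forward work; the weights, sign convention, and trial values below are assumed for illustration.

```python
import numpy as np

def composite_score(metrics, weights):
    """Weighted sum of z-scored metric columns. Lower raw values are assumed
    better, so the sum is negated to make higher score = better performance."""
    z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
    return -(z * weights).sum(axis=1)

# Rows: suited trials; columns: completion time [s], metabolic cost [W], cost/mass [W/kg]
trials = np.array([[310., 450., 3.1],
                   [280., 430., 3.0],
                   [355., 510., 3.6]])
weights = np.array([0.4, 0.4, 0.2])    # assumed weights, not from the study
print(composite_score(trials, weights).round(2))
```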
Evaluation schemes for video and image anomaly detection algorithms
NASA Astrophysics Data System (ADS)
Parameswaran, Shibin; Harguess, Josh; Barngrover, Christopher; Shafer, Scott; Reese, Michael
2016-05-01
Video anomaly detection is a critical research area in computer vision. It is a natural first step before applying object recognition algorithms. There are many algorithms that detect anomalies (outliers) in videos and images that have been introduced in recent years. However, these algorithms behave and perform differently based on differences in domains and tasks to which they are subjected. In order to better understand the strengths and weaknesses of outlier algorithms and their applicability in a particular domain/task of interest, it is important to measure and quantify their performance using appropriate evaluation metrics. There are many evaluation metrics that have been used in the literature such as precision curves, precision-recall curves, and receiver operating characteristic (ROC) curves. In order to construct these different metrics, it is also important to choose an appropriate evaluation scheme that decides when a proposed detection is considered a true or a false detection. Choosing the right evaluation metric and the right scheme is very critical since the choice can introduce positive or negative bias in the measuring criterion and may favor (or work against) a particular algorithm or task. In this paper, we review evaluation metrics and popular evaluation schemes that are used to measure the performance of anomaly detection algorithms on videos and imagery with one or more anomalies. We analyze the biases introduced by these by measuring the performance of an existing anomaly detection algorithm.
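The metrics named above are straightforward to compute once an evaluation scheme has labeled each proposed detection as true or false. A minimal sketch with scikit-learn, using hypothetical per-frame anomaly scores and ground-truth labels:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve, auc

# Hypothetical per-frame anomaly scores and ground-truth labels (1 = anomaly)
labels = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.9, 0.4, 0.2, 0.5, 0.1])

precision, recall, _ = precision_recall_curve(labels, scores)
fpr, tpr, _ = roc_curve(labels, scores)
print("PR-AUC:", auc(recall, precision), "ROC-AUC:", auc(fpr, tpr))
```

The choice of evaluation scheme (for example, how much spatial or temporal overlap counts as a true detection) determines the labels fed into these curves, which is exactly the source of bias the paper examines.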
Evaluating hydrological model performance using information theory-based metrics
USDA-ARS?s Scientific Manuscript database
Accuracy-based model performance metrics do not necessarily reflect the qualitative correspondence between simulated and measured streamflow time series. The objective of this work was to use information theory-based metrics to see whether they can be used as a complementary tool for hydrologic m...
Neiger, Brad L; Thackeray, Rosemary; Van Wagenen, Sarah A; Hanson, Carl L; West, Joshua H; Barnes, Michael D; Fagen, Michael C
2012-03-01
Despite the expanding use of social media, little has been published about its appropriate role in health promotion, and even less has been written about evaluation. The purpose of this article is threefold: (a) outline purposes for social media in health promotion, (b) identify potential key performance indicators associated with these purposes, and (c) propose evaluation metrics for social media related to the key performance indicators. Process evaluation is presented in this article as an overarching evaluation strategy for social media.
A condition metric for Eucalyptus woodland derived from expert evaluations.
Sinclair, Steve J; Bruce, Matthew J; Griffioen, Peter; Dodd, Amanda; White, Matthew D
2018-02-01
The evaluation of ecosystem quality is important for land-management and land-use planning. Evaluation is unavoidably subjective, and robust metrics must be based on consensus and the structured use of observations. We devised a transparent and repeatable process for building and testing ecosystem metrics based on expert data. We gathered quantitative evaluation data on the quality of hypothetical grassy woodland sites from experts. We used these data to train a model (an ensemble of 30 bagged regression trees) capable of predicting the perceived quality of similar hypothetical woodlands based on a set of 13 site variables as inputs (e.g., cover of shrubs, richness of native forbs). These variables can be measured at any site and the model implemented in a spreadsheet as a metric of woodland quality. We also investigated the number of experts required to produce an opinion data set sufficient for the construction of a metric. The model produced evaluations similar to those provided by experts, as shown by assessing the model's quality scores of expert-evaluated test sites not used to train the model. We applied the metric to 13 woodland conservation reserves and asked managers of these sites to independently evaluate their quality. To assess metric performance, we compared the model's evaluation of site quality with the managers' evaluations through multidimensional scaling. The metric performed relatively well, plotting close to the center of the space defined by the evaluators. Given the method provides data-driven consensus and repeatability, which no single human evaluator can provide, we suggest it is a valuable tool for evaluating ecosystem quality in real-world contexts. We believe our approach is applicable to any ecosystem. © 2017 State of Victoria.
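A hedged sketch of the model class described above, an ensemble of 30 bagged regression trees mapping site variables to perceived quality, using scikit-learn (whose BaggingRegressor defaults to decision-tree base estimators). The feature values and the toy scoring rule are placeholders, not the authors' expert data.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(42)
# Hypothetical training data: 200 hypothetical sites x 13 site variables
# (e.g., shrub cover, native forb richness), each scored 0-100 by experts.
X = rng.uniform(0, 1, size=(200, 13))
y = 100 * X[:, :4].mean(axis=1) + rng.normal(scale=5, size=200)  # toy scoring rule

metric = BaggingRegressor(n_estimators=30, random_state=0).fit(X, y)  # 30 bagged trees

new_site = rng.uniform(0, 1, size=(1, 13))
print("predicted woodland quality:", metric.predict(new_site)[0])
```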
Quality evaluation of motion-compensated edge artifacts in compressed video.
Leontaris, Athanasios; Cosman, Pamela C; Reibman, Amy R
2007-04-01
Little attention has been paid to an impairment common in motion-compensated video compression: the addition of high-frequency (HF) energy as motion compensation displaces blocking artifacts off block boundaries. In this paper, we employ an energy-based approach to measure this motion-compensated edge artifact, using both compressed bitstream information and decoded pixels. We evaluate the performance of our proposed metric, along with several blocking and blurring metrics, on compressed video in two ways. First, ordinal scales are evaluated through a series of expectations that a good quality metric should satisfy: the objective evaluation. Then, the best performing metrics are subjectively evaluated. The same subjective data set is finally used to obtain interval scales to gain more insight. Experimental results show that we accurately estimate the percentage of the added HF energy in compressed video.
Evaluation metrics for bone segmentation in ultrasound
NASA Astrophysics Data System (ADS)
Lougheed, Matthew; Fichtinger, Gabor; Ungi, Tamas
2015-03-01
Tracked ultrasound is a safe alternative to X-ray for imaging bones. The interpretation of bony structures is challenging as ultrasound has no specific intensity characteristic of bones. Several image segmentation algorithms have been devised to identify bony structures. We propose an open-source framework that would aid in the development and comparison of such algorithms by quantitatively measuring segmentation performance in ultrasound images. True-positive and false-negative metrics used in the framework quantify algorithm performance based on correctly segmented bone and correctly segmented boneless regions. Ground truth for these metrics is defined manually and, along with the corresponding automatically segmented image, is used for the performance analysis. Manually created ground truth tests were generated to verify the accuracy of the analysis. Further evaluation metrics for determining average performance per slide and standard deviation are considered. The metrics provide a means of evaluating the accuracy of frames along the length of a volume. This would aid in assessing the accuracy of the volume itself and the approach to image acquisition (positioning and frequency of frames). The framework was implemented as an open-source module of the 3D Slicer platform. The ground truth tests verified that the framework correctly calculates the implemented metrics. The developed framework provides a convenient way to evaluate bone segmentation algorithms. The implementation fits in a widely used application for segmentation algorithm prototyping. Future algorithm development will benefit by monitoring the effects of adjustments to an algorithm in a standard evaluation framework.
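A minimal sketch of per-frame true-positive and false-negative measures of the kind described above, computed from a manually defined ground-truth mask and an automatically segmented mask; the exact definitions used by the framework may differ, and the masks below are hypothetical.

```python
import numpy as np

def bone_segmentation_rates(ground_truth, segmented):
    """True-positive rate on bone pixels and false-negative rate (missed bone).
    Both inputs are boolean arrays of the same shape."""
    tp = np.logical_and(ground_truth, segmented).sum()
    fn = np.logical_and(ground_truth, ~segmented).sum()
    bone_pixels = ground_truth.sum()
    return tp / bone_pixels, fn / bone_pixels

# Hypothetical 4x4 masks for a single frame
gt = np.array([[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]], dtype=bool)
seg = np.array([[0, 0, 1, 0], [0, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=bool)
print(bone_segmentation_rates(gt, seg))   # averaged over frames of a volume in practice
```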
The importance of metrics for evaluating scientific performance
NASA Astrophysics Data System (ADS)
Miyakawa, Tsuyoshi
Evaluation of scientific performance is a major factor that determines the behavior of both individual researchers and the academic institutes to which they belong. Because the number of researchers heavily outweighs the number of available research posts, and the competitive funding accounts for an ever-increasing proportion of research budget, some objective indicators of research performance have gained recognition for increasing transparency and openness. It is common practice to use metrics and indices to evaluate a researcher's performance or the quality of their grant applications. Such measures include the number of publications, the number of times these papers are cited and, more recently, the h-index, which measures the number of highly-cited papers the researcher has written. However, academic institutions and funding agencies in Japan have been rather slow to adopt such metrics. In this article, I will outline some of the currently available metrics, and discuss why we need to use such objective indicators of research performance more often in Japan. I will also discuss how to promote the use of metrics and what we should keep in mind when using them, as well as their potential impact on the research community in Japan.
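For concreteness, the h-index mentioned above is the largest h such that a researcher has h papers with at least h citations each; a minimal sketch with hypothetical citation counts:

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

print(h_index([25, 8, 5, 3, 3, 1]))  # -> 3 (hypothetical citation counts)
```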
NASA Astrophysics Data System (ADS)
Koch, Julian; Cüneyd Demirel, Mehmet; Stisen, Simon
2018-05-01
The process of model evaluation is not only an integral part of model development and calibration but also of paramount importance when communicating modelling results to the scientific community and stakeholders. The modelling community has a large and well-tested toolbox of metrics to evaluate temporal model performance. In contrast, spatial performance evaluation has not kept pace with the wide availability of spatial observations and with the sophisticated model codes simulating the spatial variability of complex hydrological processes. This study makes a contribution towards advancing spatial-pattern-oriented model calibration by rigorously testing a multiple-component performance metric. The promoted SPAtial EFficiency (SPAEF) metric reflects three equally weighted components: correlation, coefficient of variation and histogram overlap. This multiple-component approach is found to be advantageous in order to achieve the complex task of comparing spatial patterns. SPAEF, its three components individually and two alternative spatial performance metrics, i.e. connectivity analysis and fractions skill score, are applied in a spatial-pattern-oriented model calibration of a catchment model in Denmark. Results suggest the importance of multiple-component metrics because stand-alone metrics tend to fail to provide holistic pattern information. The three SPAEF components are found to be independent, which allows them to complement each other in a meaningful way. In order to optimally exploit spatial observations made available by remote sensing platforms, this study suggests applying bias-insensitive metrics which further allow for a comparison of variables which are related but may differ in unit. This study applies SPAEF in the hydrological context using the mesoscale Hydrologic Model (mHM; version 5.8), but we see great potential across disciplines related to spatially distributed earth system modelling.
Performance metrics for the evaluation of hyperspectral chemical identification systems
NASA Astrophysics Data System (ADS)
Truslow, Eric; Golowich, Steven; Manolakis, Dimitris; Ingle, Vinay
2016-02-01
Remote sensing of chemical vapor plumes is a difficult but important task for many military and civilian applications. Hyperspectral sensors operating in the long-wave infrared regime have well-demonstrated detection capabilities. However, the identification of a plume's chemical constituents, based on a chemical library, is a multiple hypothesis testing problem which standard detection metrics do not fully describe. We propose using an additional performance metric for identification based on the so-called Dice index. Our approach partitions and weights a confusion matrix to develop both the standard detection metrics and identification metric. Using the proposed metrics, we demonstrate that the intuitive system design of a detector bank followed by an identifier is indeed justified when incorporating performance information beyond the standard detection metrics.
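The Dice index mentioned above can be read as a set-overlap score between the identified constituents and the true plume contents. A minimal set-based sketch follows, with hypothetical chemical names; the paper's confusion-matrix partitioning and weighting are not reproduced here.

```python
def dice_index(identified, truth):
    """2|A ∩ B| / (|A| + |B|) between identified and true constituent sets."""
    identified, truth = set(identified), set(truth)
    if not identified and not truth:
        return 1.0
    return 2 * len(identified & truth) / (len(identified) + len(truth))

# Hypothetical plume: true constituents vs. identifier output from a chemical library
print(dice_index({"SF6", "NH3", "CH4"}, {"SF6", "NH3"}))   # -> 0.8
```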
Sakieh, Yousef; Salmanmahiny, Abdolrassoul
2016-03-01
Performance evaluation is a critical step when developing land-use and cover change (LUCC) models. The present study proposes a spatially explicit model performance evaluation method, adopting a landscape metric-based approach. To quantify GEOMOD model performance, a set of composition- and configuration-based landscape metrics including number of patches, edge density, mean Euclidean nearest neighbor distance, largest patch index, class area, landscape shape index, and splitting index were employed. The model takes advantage of three decision rules including neighborhood effect, persistence of change direction, and urbanization suitability values. According to the results, while class area, largest patch index, and splitting indices demonstrated insignificant differences between spatial pattern of ground truth and simulated layers, there was a considerable inconsistency between simulation results and real dataset in terms of the remaining metrics. Specifically, simulation outputs were simplistic and the model tended to underestimate number of developed patches by producing a more compact landscape. Landscape-metric-based performance evaluation produces more detailed information (compared to conventional indices such as the Kappa index and overall accuracy) on the model's behavior in replicating spatial heterogeneity features of a landscape such as frequency, fragmentation, isolation, and density. Finally, as the main characteristic of the proposed method, landscape metrics employ the maximum potential of observed and simulated layers for a performance evaluation procedure, provide a basis for more robust interpretation of a calibration process, and also deepen modeler insight into the main strengths and pitfalls of a specific land-use change model when simulating a spatiotemporal phenomenon.
Image Navigation and Registration Performance Assessment Evaluation Tools for GOES-R ABI and GLM
NASA Technical Reports Server (NTRS)
Houchin, Scott; Porter, Brian; Graybill, Justin; Slingerland, Philip
2017-01-01
The GOES-R Flight Project has developed an Image Navigation and Registration (INR) Performance Assessment Tool Set (IPATS) for measuring Advanced Baseline Imager (ABI) and Geostationary Lightning Mapper (GLM) INR performance metrics in the post-launch period for performance evaluation and long term monitoring. IPATS utilizes a modular algorithmic design to allow user selection of data processing sequences optimized for generation of each INR metric. This novel modular approach minimizes duplication of common processing elements, thereby maximizing code efficiency and speed. Fast processing is essential given the large number of sub-image registrations required to generate INR metrics for the many images produced over a 24 hour evaluation period. This paper describes the software design and implementation of IPATS and provides preliminary test results.
Multi-mode evaluation of power-maximizing cross-flow turbine controllers
Forbush, Dominic; Cavagnaro, Robert J.; Donegan, James; ...
2017-09-21
A general method for predicting and evaluating the performance of three candidate cross-flow turbine power-maximizing controllers is presented in this paper using low-order dynamic simulation, scaled laboratory experiments, and full-scale field testing. For each testing mode and candidate controller, performance metrics quantifying energy capture (ability of a controller to maximize power), variation in torque and rotation rate (related to drive train fatigue), and variation in thrust loads (related to structural fatigue) are quantified for two purposes. First, for metrics that could be evaluated across all testing modes, we considered the accuracy with which simulation or laboratory experiments could predict performance at full scale. Second, we explored the utility of these metrics to contrast candidate controller performance. For these turbines and set of candidate controllers, energy capture was found to only differentiate controller performance in simulation, while the other explored metrics were able to predict performance of the full-scale turbine in the field with various degrees of success. Finally, effects of scale between laboratory and full-scale testing are considered, along with recommendations for future improvements to dynamic simulations and controller evaluation.
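A rough sketch of per-test metrics of the kind named above, computed from hypothetical turbine time series: energy capture as integrated power, and coefficient of variation as the variability measure for torque, rotation rate, and thrust. The exact definitions used in the paper may differ.

```python
import numpy as np

def controller_metrics(time_s, power_w, torque_nm, omega_rad_s, thrust_n):
    """Energy capture plus coefficients of variation of torque, speed, and thrust."""
    cv = lambda x: np.std(x) / np.mean(x)
    dt = time_s[1] - time_s[0]                        # uniform sampling assumed
    return {
        "energy_capture_J": float(np.sum(power_w) * dt),
        "torque_cv": cv(torque_nm),
        "rotation_rate_cv": cv(omega_rad_s),
        "thrust_cv": cv(thrust_n),
    }

rng = np.random.default_rng(3)
t = np.linspace(0, 600, 6000)                         # hypothetical 10-minute record
omega = 8 + 0.5 * rng.standard_normal(t.size)         # rotation rate [rad/s]
torque = 120 + 15 * rng.standard_normal(t.size)       # hydrodynamic torque [N*m]
thrust = 900 + 60 * rng.standard_normal(t.size)       # thrust load [N]
print(controller_metrics(t, torque * omega, torque, omega, thrust))
```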
An Underwater Color Image Quality Evaluation Metric.
Yang, Miao; Sowmya, Arcot
2015-12-01
Quality evaluation of underwater images is a key goal of underwater video image retrieval and intelligent processing. To date, no metric has been proposed for underwater color image quality evaluation (UCIQE). The special absorption and scattering characteristics of the water medium do not allow direct application of natural color image quality metrics especially to different underwater environments. In this paper, subjective testing for underwater image quality has been organized. The statistical distribution of the underwater image pixels in the CIELab color space related to subjective evaluation indicates the sharpness and colorful factors correlate well with subjective image quality perception. Based on these, a new UCIQE metric, which is a linear combination of chroma, saturation, and contrast, is proposed to quantify the non-uniform color cast, blurring, and low-contrast that characterize underwater engineering and monitoring images. Experiments are conducted to illustrate the performance of the proposed UCIQE metric and its capability to measure the underwater image enhancement results. They show that the proposed metric has comparable performance to the leading natural color image quality metrics and the underwater grayscale image quality metrics available in the literature, and can predict with higher accuracy the relative amount of degradation with similar image content in underwater environments. Importantly, UCIQE is a simple and fast solution for real-time underwater video processing. The effectiveness of the presented measure is also demonstrated by subjective evaluation. The results show better correlation between the UCIQE and the subjective mean opinion score.
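A hedged sketch of a UCIQE-style score: a linear combination of chroma standard deviation, luminance contrast, and mean saturation computed in CIELab. The coefficients are commonly cited published values but are treated here as assumptions, as is the saturation definition; scikit-image provides the colour-space conversion.

```python
import numpy as np
from skimage.color import rgb2lab

def uciqe(rgb_image, c1=0.4680, c2=0.2745, c3=0.2576):
    """UCIQE-style underwater colour image quality score (coefficients assumed)."""
    lab = rgb2lab(rgb_image)                      # rgb_image: float RGB in [0, 1]
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    chroma = np.sqrt(a ** 2 + b ** 2)
    sigma_c = chroma.std()                                    # chroma variability
    con_l = np.percentile(L, 99) - np.percentile(L, 1)        # luminance contrast
    saturation = chroma / (np.sqrt(chroma ** 2 + L ** 2) + 1e-9)
    mu_s = saturation.mean()                                  # mean saturation
    return c1 * sigma_c + c2 * con_l + c3 * mu_s

rng = np.random.default_rng(7)
print(uciqe(rng.uniform(size=(64, 64, 3))))       # hypothetical image
```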
Rudnick, Paul A.; Clauser, Karl R.; Kilpatrick, Lisa E.; Tchekhovskoi, Dmitrii V.; Neta, Pedatsur; Blonder, Nikša; Billheimer, Dean D.; Blackman, Ronald K.; Bunk, David M.; Cardasis, Helene L.; Ham, Amy-Joan L.; Jaffe, Jacob D.; Kinsinger, Christopher R.; Mesri, Mehdi; Neubert, Thomas A.; Schilling, Birgit; Tabb, David L.; Tegeler, Tony J.; Vega-Montoto, Lorenzo; Variyath, Asokan Mulayath; Wang, Mu; Wang, Pei; Whiteaker, Jeffrey R.; Zimmerman, Lisa J.; Carr, Steven A.; Fisher, Susan J.; Gibson, Bradford W.; Paulovich, Amanda G.; Regnier, Fred E.; Rodriguez, Henry; Spiegelman, Cliff; Tempst, Paul; Liebler, Daniel C.; Stein, Stephen E.
2010-01-01
A major unmet need in LC-MS/MS-based proteomics analyses is a set of tools for quantitative assessment of system performance and evaluation of technical variability. Here we describe 46 system performance metrics for monitoring chromatographic performance, electrospray source stability, MS1 and MS2 signals, dynamic sampling of ions for MS/MS, and peptide identification. Applied to data sets from replicate LC-MS/MS analyses, these metrics displayed consistent, reasonable responses to controlled perturbations. The metrics typically displayed variations less than 10% and thus can reveal even subtle differences in performance of system components. Analyses of data from interlaboratory studies conducted under a common standard operating procedure identified outlier data and provided clues to specific causes. Moreover, interlaboratory variation reflected by the metrics indicates which system components vary the most between laboratories. Application of these metrics enables rational, quantitative quality assessment for proteomics and other LC-MS/MS analytical applications. PMID:19837981
Tang, Tao; Stevenson, R Jan; Infante, Dana M
2016-10-15
Regional variation in both natural environment and human disturbance can influence performance of ecological assessments. In this study we calculated 5 types of benthic diatom multimetric indices (MMIs) with 3 different approaches to account for variation in ecological assessments. We used: site groups defined by ecoregions or diatom typologies; the same or different sets of metrics among site groups; and unmodeled or modeled MMIs, where models accounted for natural variation in metrics within site groups by calculating an expected reference condition for each metric and each site. We used data from the USEPA's National Rivers and Streams Assessment to calculate the MMIs and evaluate changes in MMI performance. MMI performance was evaluated with indices of precision, bias, responsiveness, sensitivity and relevancy which were respectively measured as MMI variation among reference sites, effects of natural variables on MMIs, difference between MMIs at reference and highly disturbed sites, percent of highly disturbed sites properly classified, and relation of MMIs to human disturbance and stressors. All 5 types of MMIs showed considerable discrimination ability. Using different metrics among ecoregions sometimes reduced precision, but it consistently increased responsiveness, sensitivity, and relevancy. Site specific metric modeling reduced bias and increased responsiveness. Combined use of different metrics among site groups and site specific modeling significantly improved MMI performance irrespective of site grouping approach. Compared to ecoregion site classification, grouping sites based on diatom typologies improved precision, but did not improve overall performance of MMIs if we accounted for natural variation in metrics with site specific models. We conclude that using different metrics among ecoregions and site specific metric modeling improve MMI performance, particularly when used together. Applications of these MMI approaches in ecological assessments introduced a tradeoff with assessment consistency when metrics differed across site groups, but they justified the convenient and consistent use of ecoregions. Copyright © 2016 Elsevier B.V. All rights reserved.
FAST COGNITIVE AND TASK ORIENTED, ITERATIVE DATA DISPLAY (FACTOID)
2017-06-01
...approaches. As a result, the following assumptions guided our efforts in developing modeling and descriptive metrics for evaluation purposes... Application Evaluation. Our analytic workflow for evaluation is to first provide descriptive statistics about applications across metrics (performance... distributions for evaluation purposes because the goal of evaluation is accurate description, not inference (e.g., prediction). Outliers depicted...
An objective method for a video quality evaluation in a 3DTV service
NASA Astrophysics Data System (ADS)
Wilczewski, Grzegorz
2015-09-01
The following article describes a proposed objective method for 3DTV video quality evaluation, the Compressed Average Image Intensity (CAII) method. Identification of the 3DTV service's content chain nodes enables the design of a versatile, objective video quality metric, based on an advanced approach to stereoscopic videostream analysis. Insights into the designed metric's mechanisms, as well as an evaluation of its performance under simulated environmental conditions, are discussed herein. As a result, the CAII metric might be effectively used in a variety of service quality assessment applications.
Video-Based Method of Quantifying Performance and Instrument Motion During Simulated Phonosurgery
Conroy, Ellen; Surender, Ketan; Geng, Zhixian; Chen, Ting; Dailey, Seth; Jiang, Jack
2015-01-01
Objectives/Hypothesis To investigate the use of the Video-Based Phonomicrosurgery Instrument Tracking System to collect instrument position data during simulated phonomicrosurgery and calculate motion metrics using these data. We used this system to determine if novice subject motion metrics improved over 1 week of training. Study Design Prospective cohort study. Methods Ten subjects performed simulated surgical tasks once per day for 5 days. Instrument position data were collected and used to compute motion metrics (path length, depth perception, and motion smoothness). Data were analyzed to determine if motion metrics improved with practice time. Task outcome was also determined each day, and relationships between task outcome and motion metrics were used to evaluate the validity of motion metrics as indicators of surgical performance. Results Significant decreases over time were observed for path length (P <.001), depth perception (P <.001), and task outcome (P <.001). No significant change was observed for motion smoothness. Significant relationships were observed between task outcome and path length (P <.001), depth perception (P <.001), and motion smoothness (P <.001). Conclusions Our system can estimate instrument trajectory and provide quantitative descriptions of surgical performance. It may be useful for evaluating phonomicrosurgery performance. Path length and depth perception may be particularly useful indicators. PMID:24737286
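A minimal sketch of kinematic metrics like those named above, computed from a tracked instrument-tip trajectory. The definitions used here (total path length, summed depth-axis excursion as a depth-perception proxy, and mean jerk magnitude as an inverse smoothness measure) are common conventions assumed for illustration, not necessarily the system's exact formulas.

```python
import numpy as np

def motion_metrics(positions_mm, dt_s):
    """positions_mm: (frames, 3) instrument tip positions sampled every dt_s seconds."""
    steps = np.diff(positions_mm, axis=0)
    path_length = np.linalg.norm(steps, axis=1).sum()         # total distance travelled
    depth_perception = np.abs(steps[:, 2]).sum()              # movement along the depth axis
    jerk = np.diff(positions_mm, n=3, axis=0) / dt_s ** 3
    motion_smoothness = np.linalg.norm(jerk, axis=1).mean()   # lower = smoother
    return path_length, depth_perception, motion_smoothness

rng = np.random.default_rng(5)
trajectory = np.cumsum(rng.normal(scale=0.5, size=(300, 3)), axis=0)  # hypothetical, 30 Hz
print(motion_metrics(trajectory, dt_s=1 / 30))
```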
NASA Technical Reports Server (NTRS)
Lee, Seung Man; Park, Chunki; Cone, Andrew Clayton; Thipphavong, David P.; Santiago, Confesor
2016-01-01
This presentation contains the analysis results of NAS-wide fast-time simulations with UAS and VFR traffic for a single day for evaluating the performance of Detect-and-Avoid (DAA) alerting and guidance systems. The purpose of this study was to help refine and validate MOPS alerting and guidance requirements. In this study, we generated plots of all performance metrics that are specified by the RTCA SC-228 Minimum Operational Performance Standards (MOPS): 1) to evaluate the sensitivity of alerting parameters on the performance metrics of each DAA alert type (Preventive, Corrective, and Warning alerts), and 2) to evaluate the effect of sensor uncertainty on DAA alerting and guidance performance.
Performance evaluation of no-reference image quality metrics for face biometric images
NASA Astrophysics Data System (ADS)
Liu, Xinwei; Pedersen, Marius; Charrier, Christophe; Bours, Patrick
2018-03-01
The accuracy of face recognition systems is significantly affected by the quality of face sample images. The recently established standardization proposes several important aspects for the assessment of face sample quality. There are many existing no-reference image quality metrics (IQMs) that are able to assess natural image quality by taking into account similar image-based quality attributes as introduced in the standardization. However, whether such metrics can assess face sample quality is rarely considered. We evaluate the performance of 13 selected no-reference IQMs on face biometrics. The experimental results show that several of them can assess face sample quality according to the system performance. We also analyze the strengths and weaknesses of different IQMs, as well as why some of them failed to assess face sample quality. Retraining an original IQM by using a face database can improve the performance of such a metric. In addition, the contribution of this paper can be used for the evaluation of IQMs on other biometric modalities; furthermore, it can be used for the development of multimodality biometric IQMs.
Restaurant Energy Use Benchmarking Guideline
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hedrick, R.; Smith, V.; Field, K.
2011-07-01
A significant operational challenge for food service operators is defining energy use benchmark metrics to compare against the performance of individual stores. Without metrics, multiunit operators and managers have difficulty identifying which stores in their portfolios require extra attention to bring their energy performance in line with expectations. This report presents a method whereby multiunit operators may use their own utility data to create suitable metrics for evaluating their operations.
Uncooperative target-in-the-loop performance with backscattered speckle-field effects
NASA Astrophysics Data System (ADS)
Kansky, Jan E.; Murphy, Daniel V.
2007-09-01
Systems utilizing target-in-the-loop (TIL) techniques for adaptive optics phase compensation rely on a metric sensor to perform a hill climbing algorithm that maximizes the far-field Strehl ratio. In uncooperative TIL, the metric signal is derived from the light backscattered from a target. In cases where the target is illuminated with a laser with sufficiently long coherence length, the potential exists for the validity of the metric sensor to be compromised by speckle-field effects. We report experimental results from a scaled laboratory designed to evaluate TIL performance in atmospheric turbulence and thermal blooming conditions where the metric sensors are influenced by varying degrees of backscatter speckle. We compare performance of several TIL configurations and metrics for cases with static speckle, and for cases with speckle fluctuations within the frequency range that the TIL system operates. The roles of metric sensor filtering and system bandwidth are discussed.
Compression performance comparison in low delay real-time video for mobile applications
NASA Astrophysics Data System (ADS)
Bivolarski, Lazar
2012-10-01
This article compares the performance of several current video coding standards under low-delay, real-time conditions in a resource-constrained environment. The comparison is performed on the same content using a mix of objective and perceptual quality metrics. The metric results for the different coding schemes are analyzed from the point of view of user perception and quality of service. Multiple standards are compared: MPEG-2, MPEG-4, and MPEG-4 AVC, as well as H.263. The metrics used in the comparison include SSIM, VQM, and DVQ. Subjective evaluation and quality of service are discussed from the point of view of perceptual metrics and their incorporation into the coding scheme development process. The performance and the correlation of results are presented as a predictor of the performance of video compression schemes.
National Quality Forum Colon Cancer Quality Metric Performance: How Are Hospitals Measuring Up?
Mason, Meredith C; Chang, George J; Petersen, Laura A; Sada, Yvonne H; Tran Cao, Hop S; Chai, Christy; Berger, David H; Massarweh, Nader N
2017-12-01
To evaluate the impact of care at high-performing hospitals on the National Quality Forum (NQF) colon cancer metrics. The NQF endorses evaluating ≥12 lymph nodes (LNs), adjuvant chemotherapy (AC) for stage III patients, and AC within 4 months of diagnosis as colon cancer quality indicators. Data on hospital-level metric performance and the association with survival are unclear. Retrospective cohort study of 218,186 patients with resected stage I to III colon cancer in the National Cancer Data Base (2004-2012). High-performing hospitals (>75% achievement) were identified by the proportion of patients achieving each measure. The association between hospital performance and survival was evaluated using Cox shared frailty modeling. Only hospital LN performance improved (15.8% in 2004 vs 80.7% in 2012; trend test, P < 0.001), with 45.9% of hospitals performing well on all 3 measures concurrently in the most recent study year. Overall, 5-year survival was 75.0%, 72.3%, 72.5%, and 69.5% for those treated at hospitals with high performance on 3, 2, 1, and 0 metrics, respectively (log-rank, P < 0.001). Care at hospitals with high metric performance was associated with lower risk of death in a dose-response fashion [0 metrics, reference; 1, hazard ratio (HR) 0.96 (0.89-1.03); 2, HR 0.92 (0.87-0.98); 3, HR 0.85 (0.80-0.90); 2 vs 1, HR 0.96 (0.91-1.01); 3 vs 1, HR 0.89 (0.84-0.93); 3 vs 2, HR 0.95 (0.89-0.95)]. Performance on metrics in combination was associated with lower risk of death [LN + AC, HR 0.86 (0.78-0.95); AC + timely AC, HR 0.92 (0.87-0.98); LN + AC + timely AC, HR 0.85 (0.80-0.90)], whereas individual measures were not [LN, HR 0.95 (0.88-1.04); AC, HR 0.95 (0.87-1.05)]. Less than half of hospitals perform well on these NQF colon cancer metrics concurrently, and high performance on individual measures is not associated with improved survival. Quality improvement efforts should shift focus from individual measures to defining composite measures encompassing the overall multimodal care pathway and capturing successful transitions from one care modality to another.
NERC Policy 10: Measurement of two generation and load balancing IOS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spicer, P.J.; Galow, G.G.
1999-11-01
Policy 10 will describe specific standards and metrics for most of the reliability functions described in the Interconnected Operations Services Working Group (IOS WG) report. The purpose of this paper is to discuss, in detail, the proposed metrics for two generation and load balancing IOSs: Regulation; Load Following. For purposes of this paper, metrics include both measurement and performance evaluation. The measurement methods discussed are included in the current draft of the proposed Policy 10. The performance evaluation method discussed is offered by the authors for consideration by the IOS ITF (Implementation Task Force) for inclusion into Policy 10.
NASA Technical Reports Server (NTRS)
Seo, Byoung-Joon; Nissly, Carl; Troy, Mitchell; Angeli, George
2010-01-01
The Normalized Point Source Sensitivity (PSSN) has previously been defined and analyzed as an On-Axis seeing-limited telescope performance metric. In this paper, we expand the scope of the PSSN definition to include Off-Axis field of view (FoV) points and apply this generalized metric for performance evaluation of the Thirty Meter Telescope (TMT). We first propose various possible choices for the PSSN definition and select one as our baseline. We show that our baseline metric has useful properties including the multiplicative feature even when considering Off-Axis FoV points, which has proven to be useful for optimizing the telescope error budget. Various TMT optical errors are considered for the performance evaluation including segment alignment and phasing, segment surface figures, temperature, and gravity, whose On-Axis PSSN values have previously been published by our group.
The compressed average image intensity metric for stereoscopic video quality assessment
NASA Astrophysics Data System (ADS)
Wilczewski, Grzegorz
2016-09-01
This article presents the design, creation, and testing of a novel metric for 3DTV video quality evaluation. The Compressed Average Image Intensity (CAII) mechanism is based on stereoscopic video content analysis, and its core functionality is to serve as a versatile tool for effective 3DTV service quality assessment. As an objective quality metric, it may be used as a reliable source of information about the actual performance of a given 3DTV system under a provider's evaluation. To test and analyze the overall performance of the CAII metric, the paper presents a comprehensive study of results gathered from several testing routines applied to a selected set of stereoscopic video samples. The designed method for stereoscopic video quality evaluation is thereby investigated across a range of synthetic visual impairments injected into the original video stream.
Lopes, Julio Cesar Dias; Dos Santos, Fábio Mendes; Martins-José, Andrelly; Augustyns, Koen; De Winter, Hans
2017-01-01
A new metric for the evaluation of model performance in the field of virtual screening and quantitative structure-activity relationship applications is described. This metric has been termed the power metric and is defined as the true positive rate divided by the sum of the true positive and false positive rates, for a given cutoff threshold. The performance of this metric is compared with alternative metrics such as the enrichment factor, the relative enrichment factor, the receiver operating curve enrichment factor, the correct classification rate, the Matthews correlation coefficient, and Cohen's kappa coefficient. The performance of this new metric is found to be quite robust with respect to variations in the applied cutoff threshold and in the ratio of the number of active compounds to the total number of compounds, while at the same time being sensitive to variations in model quality. It possesses the correct characteristics for its application in early-recognition virtual screening problems.
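As an illustration of the definition above, the following Python sketch computes the power metric for a scored compound list at a chosen cutoff; the function, data, and cutoff value are illustrative and not taken from the original study.

```python
import numpy as np

def power_metric(y_true, scores, cutoff):
    """Power metric at a score cutoff: TPR / (TPR + FPR)."""
    y_true = np.asarray(y_true)
    selected = np.asarray(scores) >= cutoff          # compounds retained at this cutoff
    tpr = selected[y_true == 1].mean()               # true positive rate
    fpr = selected[y_true == 0].mean()               # false positive rate
    return tpr / (tpr + fpr) if (tpr + fpr) > 0 else 0.0

# Example: a small hypothetical virtual-screening ranking (1 = active, 0 = inactive)
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
print(power_metric(labels, scores, cutoff=0.5))      # ~0.78 for this toy example
```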
Algal bioassessment metrics for wadeable streams and rivers of Maine, USA
Danielson, Thomas J.; Loftin, Cynthia S.; Tsomides, Leonidas; DiFranco, Jeanne L.; Connors, Beth
2011-01-01
Many state water-quality agencies use biological assessment methods based on lotic fish and macroinvertebrate communities, but relatively few states have incorporated algal multimetric indices into monitoring programs. Algae are good indicators for monitoring water quality because they are sensitive to many environmental stressors. We evaluated benthic algal community attributes along a landuse gradient affecting wadeable streams and rivers in Maine, USA, to identify potential bioassessment metrics. We collected epilithic algal samples from 193 locations across the state. We computed weighted-average optima for common taxa for total P, total N, specific conductance, % impervious cover, and % developed watershed, which included all land use that is no longer forest or wetland. We assigned Maine stream tolerance values and categories (sensitive, intermediate, tolerant) to taxa based on their optima and responses to watershed disturbance. We evaluated performance of algal community metrics used in multimetric indices from other regions and novel metrics based on Maine data. Metrics specific to Maine data, such as the relative richness of species characterized as being sensitive in Maine, were more correlated with % developed watershed than most metrics used in other regions. Few community-structure attributes (e.g., species richness) were useful metrics in Maine. Performance of algal bioassessment models would be improved if metrics were evaluated with attributes of local data before inclusion in multimetric indices or statistical models. © 2011 by The North American Benthological Society.
ERIC Educational Resources Information Center
Ramanarayanan, Vikram; Lange, Patrick; Evanini, Keelan; Molloy, Hillary; Tsuprun, Eugene; Qian, Yao; Suendermann-Oeft, David
2017-01-01
Predicting and analyzing multimodal dialog user experience (UX) metrics, such as overall call experience, caller engagement, and latency, among other metrics, in an ongoing manner is important for evaluating such systems. We investigate automated prediction of multiple such metrics collected from crowdsourced interactions with an open-source,…
2010-07-01
applicants and is pursuing further research on the WPA. An operational test and evaluation (IOT&E) has been initiated to evaluate the new screen... initial operational test and evaluation (IOT&E) starting in fall 2009. EXPANDED ENLISTMENT ELIGIBILITY METRICS (EEEM): RECOMMENDATIONS ON A NON... Evaluation of a Performance Screen for IOT&E... Approach
Interaction Metrics for Feedback Control of Sound Radiation from Stiffened Panels
NASA Technical Reports Server (NTRS)
Cabell, Randolph H.; Cox, David E.; Gibbs, Gary P.
2003-01-01
Interaction metrics developed for the process control industry are used to evaluate decentralized control of sound radiation from bays on an aircraft fuselage. The metrics are applied to experimentally measured frequency response data from a model of an aircraft fuselage. The purpose is to understand how coupling between multiple bays of the fuselage can destabilize or limit the performance of a decentralized active noise control system. The metrics quantitatively verify observations from a previous experiment, in which decentralized controllers performed worse than centralized controllers. The metrics do not appear to be useful for explaining control spillover which was observed in a previous experiment.
Evaluating the Performance of the IEEE Standard 1366 Method for Identifying Major Event Days
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eto, Joseph H.; LaCommare, Kristina Hamachi; Sohn, Michael D.
IEEE Standard 1366 offers a method for segmenting reliability performance data to isolate the effects of major events from the underlying year-to-year trends in reliability. Recent analysis by the IEEE Distribution Reliability Working Group (DRWG) has found that reliability performance of some utilities differs from the expectations that helped guide the development of the Standard 1366 method. This paper proposes quantitative metrics to evaluate the performance of the Standard 1366 method in identifying major events and in reducing year-to-year variability in utility reliability. The metrics are applied to a large sample of utility-reported reliability data to assess performance of the method with alternative specifications that have been considered by the DRWG. We find that none of the alternatives perform uniformly 'better' than the current Standard 1366 method. That is, none of the modifications uniformly lowers the year-to-year variability in System Average Interruption Duration Index without major events. Instead, for any given alternative, while it may lower the value of this metric for some utilities, it also increases it for other utilities (sometimes dramatically). Thus, we illustrate some of the trade-offs that must be considered in using the Standard 1366 method and highlight the usefulness of the metrics we have proposed in conducting these evaluations.
DeJournett, Jeremy; DeJournett, Leon
2017-11-01
Effective glucose control in the intensive care unit (ICU) setting has the potential to decrease morbidity and mortality rates and thereby decrease health care expenditures. To evaluate what constitutes effective glucose control, typically several metrics are reported, including time in range, time in mild and severe hypoglycemia, coefficient of variation, and others. To date, there is no one metric that combines all of these individual metrics to give a number indicative of overall performance. We proposed a composite metric that combines 5 commonly reported metrics, and we used this composite metric to compare 6 glucose controllers. We evaluated the following controllers: Ideal Medical Technologies (IMT) artificial-intelligence-based controller, Yale protocol, Glucommander, Wintergerst et al PID controller, GRIP, and NICE-SUGAR. We evaluated each controller across 80 simulated patients, 4 clinically relevant exogenous dextrose infusions, and one nonclinical infusion as a test of the controller's ability to handle difficult situations. This gave a total of 2400 5-day simulations, and 585 604 individual glucose values for analysis. We used a random walk sensor error model that gave a 10% MARD. For each controller, we calculated severe hypoglycemia (<40 mg/dL), mild hypoglycemia (40-69 mg/dL), normoglycemia (70-140 mg/dL), hyperglycemia (>140 mg/dL), and coefficient of variation (CV), as well as our novel controller metric. For the controllers tested, we achieved the following median values for our novel controller scoring metric: IMT: 88.1, YALE: 46.7, GLUC: 47.2, PID: 50, GRIP: 48.2, NICE: 46.4. The novel scoring metric employed in this study shows promise as a means for evaluating new and existing ICU-based glucose controllers, and it could be used in the future to compare results of glucose control studies in critical care. The IMT AI-based glucose controller demonstrated the most consistent performance results based on this new metric.
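The composite score itself is not specified in the abstract, but the five commonly reported metrics it combines are straightforward to compute from a glucose trace. A minimal Python sketch, assuming glucose values in mg/dL and the ranges quoted above (the simulated trace is illustrative only):

```python
import numpy as np

def glucose_metrics(glucose_mg_dl):
    """Per-patient summary metrics from a glucose trace (mg/dL)."""
    g = np.asarray(glucose_mg_dl, dtype=float)
    pct = lambda mask: 100.0 * mask.mean()
    return {
        "severe_hypo_pct": pct(g < 40),                   # <40 mg/dL
        "mild_hypo_pct": pct((g >= 40) & (g < 70)),       # 40-69 mg/dL
        "normo_pct": pct((g >= 70) & (g <= 140)),         # 70-140 mg/dL
        "hyper_pct": pct(g > 140),                        # >140 mg/dL
        "cv_pct": 100.0 * g.std() / g.mean(),             # coefficient of variation
    }

# Example: a simulated 5-day trace sampled hourly (values are illustrative)
rng = np.random.default_rng(0)
trace = rng.normal(120, 25, size=5 * 24)
print(glucose_metrics(trace))
```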
Towards a Visual Quality Metric for Digital Video
NASA Technical Reports Server (NTRS)
Watson, Andrew B.
1998-01-01
The advent of widespread distribution of digital video creates a need for automated methods for evaluating visual quality of digital video. This is particularly so since most digital video is compressed using lossy methods, which involve the controlled introduction of potentially visible artifacts. Compounding the problem is the bursty nature of digital video, which requires adaptive bit allocation based on visual quality metrics. In previous work, we have developed visual quality metrics for evaluating, controlling, and optimizing the quality of compressed still images. These metrics incorporate simplified models of human visual sensitivity to spatial and chromatic visual signals. The challenge of video quality metrics is to extend these simplified models to temporal signals as well. In this presentation I will discuss a number of the issues that must be resolved in the design of effective video quality metrics. Among these are spatial, temporal, and chromatic sensitivity and their interactions, visual masking, and implementation complexity. I will also touch on the question of how to evaluate the performance of these metrics.
CUQI: cardiac ultrasound video quality index
Razaak, Manzoor; Martini, Maria G.
2016-01-01
Medical images and videos are now increasingly part of modern telecommunication applications, including telemedicine applications, favored by advancements in video compression and communication technologies. Medical video quality evaluation is essential for modern applications since compression and transmission processes often compromise the video quality. Several state-of-the-art video quality metrics used for quality evaluation assess the perceptual quality of the video. For a medical video, assessing quality in terms of “diagnostic” value rather than “perceptual” quality is more important. We present a diagnostic-quality–oriented video quality metric for quality evaluation of cardiac ultrasound videos. Cardiac ultrasound videos are characterized by rapid repetitive cardiac motions and distinct structural information characteristics that are explored by the proposed metric. The cardiac ultrasound video quality index, the proposed metric, is a full-reference metric and uses the motion and edge information of the cardiac ultrasound video to evaluate the video quality. The metric was evaluated for its performance in approximating the quality of cardiac ultrasound videos by testing its correlation with the subjective scores of medical experts. The results of our tests showed that the metric has high correlation with medical expert opinions and in several cases outperforms the state-of-the-art video quality metrics considered in our tests. PMID:27014715
Hussain, Husniza; Khalid, Norhayati Mustafa; Selamat, Rusidah; Wan Nazaimoon, Wan Mohamud
2013-09-01
The urinary iodine micromethod (UIMM) is a modification of the conventional method and its performance needs evaluation. UIMM performance was evaluated using the method validation and 2008 Iodine Deficiency Disorders survey data obtained from four urinary iodine (UI) laboratories. Method acceptability tests and Sigma quality metrics were determined using total allowable errors (TEas) set by two external quality assurance (EQA) providers. UIMM obeyed various method acceptability test criteria with some discrepancies at low concentrations. Method validation data calculated against the UI Quality Program (TUIQP) TEas showed that the Sigma metrics were at 2.75, 1.80, and 3.80 for 51±15.50 µg/L, 108±32.40 µg/L, and 149±38.60 µg/L UI, respectively. External quality control (EQC) data showed that the performance of the laboratories was within Sigma metrics of 0.85-1.12, 1.57-4.36, and 1.46-4.98 at 46.91±7.05 µg/L, 135.14±13.53 µg/L, and 238.58±17.90 µg/L, respectively. No laboratory showed a calculated total error (TEcalc)
Gamut Volume Index: a color preference metric based on meta-analysis and optimized colour samples.
Liu, Qiang; Huang, Zheng; Xiao, Kaida; Pointer, Michael R; Westland, Stephen; Luo, M Ronnier
2017-07-10
A novel metric named Gamut Volume Index (GVI) is proposed for evaluating the colour preference of lighting. This metric is based on the absolute gamut volume of optimized colour samples. The optimal colour set of the proposed metric was obtained by optimizing the weighted average correlation between the metric predictions and the subjective ratings for 8 psychophysical studies. The performance of 20 typical colour metrics was also investigated, which included colour difference based metrics, gamut based metrics, memory based metrics as well as combined metrics. It was found that the proposed GVI outperformed the existing counterparts, especially for the conditions where correlated colour temperatures differed.
EVALUATION OF METRIC PRECISION FOR A RIPARIAN FOREST SURVEY
This paper evaluates the performance of a protocol to monitor riparian forests in western Oregon based on the quality of the data obtained from a recent field survey. Precision and accuracy are the criteria used to determine the quality of 19 field metrics. The field survey con...
NASA Astrophysics Data System (ADS)
Grieggs, Samuel M.; McLaughlin, Michael J.; Ezekiel, Soundararajan; Blasch, Erik
2015-06-01
As technology and internet use grow at an exponential rate, video and imagery data are becoming increasingly important. Various techniques such as Wide Area Motion Imagery (WAMI), Full Motion Video (FMV), and Hyperspectral Imaging (HSI) are used to collect motion data and extract relevant information. Detecting and identifying a particular object in imagery data is an important step in understanding visual imagery, for example in content-based image retrieval (CBIR). Imagery data are segmented, automatically analyzed, and stored in a dynamic and robust database. In our system, we seek to utilize image fusion methods, which require quality metrics. Many Image Fusion (IF) algorithms have been proposed, but only a few metrics are used to evaluate their performance. In this paper, we seek a robust, objective metric to evaluate the performance of IF algorithms that compares the outcome of a given algorithm to ground truth and reports several types of errors. Given the ground truth for motion imagery data, it computes detection failure, false alarm, precision, and recall metrics, background and foreground region statistics, as well as splits and merges of foreground regions. Using the Structural Similarity Index (SSIM), Mutual Information (MI), and entropy metrics, experimental results demonstrate the effectiveness of the proposed methodology for object detection, activity exploitation, and CBIR.
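For readers who want to reproduce this style of evaluation, the sketch below shows one common way to compute SSIM, Shannon entropy, and histogram-based mutual information between a fused image and ground truth in Python; it is a generic illustration, not the authors' implementation, and the synthetic images stand in for real data.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def entropy(img, bins=256):
    """Shannon entropy of the intensity histogram (bits)."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(a, b, bins=64):
    """Histogram-based mutual information between two images (bits)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))

# Toy example with synthetic images standing in for ground truth and a fused result
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(128, 128)).astype(float)
fused = np.clip(reference + rng.normal(0, 10, size=reference.shape), 0, 255)
print("SSIM:", ssim(fused, reference, data_range=255))
print("Entropy (fused):", entropy(fused))
print("MI(fused, reference):", mutual_information(fused, reference))
```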
Combining control input with flight path data to evaluate pilot performance in transport aircraft.
Ebbatson, Matt; Harris, Don; Huddlestone, John; Sears, Rodney
2008-11-01
When deriving an objective assessment of piloting performance from flight data records, it is common to employ metrics which purely evaluate errors in flight path parameters. The adequacy of pilot performance is evaluated from the flight path of the aircraft. However, in large jet transport aircraft these measures may be insensitive and require supplementing with frequency-based measures of control input parameters. Flight path and control input data were collected from pilots undertaking a jet transport aircraft conversion course during a series of symmetric and asymmetric approaches in a flight simulator. The flight path data were analyzed for deviations around the optimum flight path while flying an instrument landing approach. Manipulation of the flight controls was subject to analysis using a series of power spectral density measures. The flight path metrics showed no significant differences in performance between the symmetric and asymmetric approaches. However, control input frequency domain measures revealed that the pilots employed highly different control strategies in the pitch and yaw axes. The results demonstrate that to evaluate pilot performance fully in large aircraft, it is necessary to employ performance metrics targeted at both the outer control loop (flight path) and the inner control loop (flight control) parameters in parallel, evaluating both the product and process of a pilot's performance.
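As a sketch of the frequency-domain analysis described above, a Welch power spectral density of a control-input trace can be computed with SciPy; the sampling rate, synthetic signal, and summary statistics below are illustrative assumptions, not the study's actual measures.

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid

# Hypothetical control-column (pitch) input sampled at 20 Hz during an approach
fs = 20.0
t = np.arange(0, 120, 1 / fs)
rng = np.random.default_rng(1)
stick = 0.2 * np.sin(2 * np.pi * 0.3 * t) + 0.05 * rng.standard_normal(t.size)

# Welch estimate of the power spectral density of the control input
f, pxx = welch(stick, fs=fs, nperseg=512)

# Simple frequency-domain summaries of control activity
total_power = trapezoid(pxx, f)
mean_freq = trapezoid(f * pxx, f) / total_power      # power-weighted mean frequency
print(f"total power {total_power:.4f}, mean frequency {mean_freq:.2f} Hz")
```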
Relevance of motion-related assessment metrics in laparoscopic surgery.
Oropesa, Ignacio; Chmarra, Magdalena K; Sánchez-González, Patricia; Lamata, Pablo; Rodrigues, Sharon P; Enciso, Silvia; Sánchez-Margallo, Francisco M; Jansen, Frank-Willem; Dankelman, Jenny; Gómez, Enrique J
2013-06-01
Motion metrics have become an important source of information when addressing the assessment of surgical expertise. However, their direct relationship with the different surgical skills has not been fully explored. The purpose of this study is to investigate the relevance of motion-related metrics in the evaluation processes of basic psychomotor laparoscopic skills and their correlation with the different abilities sought to measure. A framework for task definition and metric analysis is proposed. An explorative survey was first conducted with a board of experts to identify metrics to assess basic psychomotor skills. Based on the output of that survey, 3 novel tasks for surgical assessment were designed. Face and construct validation was performed, with focus on motion-related metrics. Tasks were performed by 42 participants (16 novices, 22 residents, and 4 experts). Movements of the laparoscopic instruments were registered with the TrEndo tracking system and analyzed. Time, path length, and depth showed construct validity for all 3 tasks. Motion smoothness and idle time also showed validity for tasks involving bimanual coordination and tasks requiring a more tactical approach, respectively. Additionally, motion smoothness and average speed showed a high internal consistency, proving them to be the most task-independent of all the metrics analyzed. Motion metrics are complementary and valid for assessing basic psychomotor skills, and their relevance depends on the skill being evaluated. A larger clinical implementation, combined with quality performance information, will give more insight on the relevance of the results shown in this study.
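The motion metrics named above can be approximated from a recorded instrument-tip trajectory. The Python sketch below uses common definitions (path length as summed Euclidean distance, depth as motion along the viewing axis, smoothness via mean squared jerk); these are illustrative and not necessarily identical to the TrEndo implementations.

```python
import numpy as np

def motion_metrics(xyz, fs=30.0):
    """Basic kinematic metrics from a 3-D instrument-tip trajectory.

    xyz: array of shape (n_samples, 3) in millimetres; fs: sampling rate in Hz.
    """
    xyz = np.asarray(xyz, dtype=float)
    dt = 1.0 / fs
    steps = np.diff(xyz, axis=0)
    path_length = np.linalg.norm(steps, axis=1).sum()        # total distance travelled
    depth = np.abs(np.diff(xyz[:, 2])).sum()                 # movement along viewing axis
    vel = steps / dt
    jerk = np.diff(vel, n=2, axis=0) / dt**2
    smoothness = np.mean(np.sum(jerk**2, axis=1))            # mean squared jerk; lower = smoother
    return path_length, depth, smoothness

# Example on a synthetic random-walk trajectory
rng = np.random.default_rng(2)
traj = np.cumsum(rng.normal(0, 0.5, size=(300, 3)), axis=0)
print(motion_metrics(traj))
```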
Zone calculation as a tool for assessing performance outcome in laparoscopic suturing.
Buckley, Christina E; Kavanagh, Dara O; Nugent, Emmeline; Ryan, Donncha; Traynor, Oscar J; Neary, Paul C
2015-06-01
Simulator performance is measured by metrics, which are valued as an objective way of assessing trainees. Certain procedures such as laparoscopic suturing, however, may not be suitable for assessment under traditionally formulated metrics. Our aim was to assess if our new metric is a valid method of assessing laparoscopic suturing. A software program was developed in order to create a new metric, which would calculate the percentage of time spent operating within pre-defined areas called "zones." Twenty-five candidates (medical students N = 10, surgical residents N = 10, and laparoscopic experts N = 5) performed the laparoscopic suturing task on the ProMIS III® simulator. New metrics of "in-zone" and "out-zone" scores as well as traditional metrics of time, path length, and smoothness were generated. Performance was also assessed by two blinded observers using the OSATS and FLS rating scales. This novel metric was evaluated by comparing it to both traditional metrics and subjective scores. There was a significant difference in the average in-zone and out-zone scores between all three experience groups (p < 0.05). The new zone metric scores correlated significantly with the subjective-blinded observer scores of OSATS and FLS (p = 0.0001). The new zone metric scores also correlated significantly with the traditional metrics of path length, time, and smoothness (p < 0.05). The new metric is a valid tool for assessing laparoscopic suturing objectively. This could be incorporated into a competency-based curriculum to monitor resident progression in the simulated setting.
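A minimal sketch of the zone idea, assuming axis-aligned box zones and a sampled instrument-tip position stream; the published software's actual zone geometry is not described in the abstract, so the details here are illustrative.

```python
import numpy as np

def zone_scores(positions, zones):
    """Percentage of samples falling inside any predefined 'zone'.

    positions: (n, 3) instrument-tip samples; zones: list of (lower, upper)
    corner pairs defining axis-aligned boxes.
    """
    positions = np.asarray(positions, dtype=float)
    in_any = np.zeros(len(positions), dtype=bool)
    for lo, hi in zones:
        lo, hi = np.asarray(lo), np.asarray(hi)
        in_any |= np.all((positions >= lo) & (positions <= hi), axis=1)
    in_pct = 100.0 * in_any.mean()
    return in_pct, 100.0 - in_pct        # in-zone %, out-zone %

# Example with synthetic positions and two hypothetical zones (units arbitrary)
rng = np.random.default_rng(3)
samples = rng.uniform(0, 100, size=(500, 3))
zones = [((20, 20, 20), (60, 60, 60)), ((70, 70, 0), (100, 100, 30))]
print(zone_scores(samples, zones))
```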
Metric for evaluation of filter efficiency in spectral cameras.
Nahavandi, Alireza Mahmoudi; Tehran, Mohammad Amani
2016-11-10
Although metric functions that show the performance of a colorimetric imaging device have been investigated, a metric for performance analysis of a set of filters in wideband filter-based spectral cameras has rarely been studied. Based on a generalization of Vora's Measure of Goodness (MOG) and the spanning theorem, a single function metric that estimates the effectiveness of a filter set is introduced. The improved metric, named MMOG, varies between one, for a perfect, and zero, for the worst possible set of filters. Results showed that MMOG exhibits a trend that is more similar to the mean square of spectral reflectance reconstruction errors than does Vora's MOG index, and it is robust to noise in the imaging system. MMOG as a single metric could be exploited for further analysis of manufacturing errors.
Analysis of Skeletal Muscle Metrics as Predictors of Functional Task Performance
NASA Technical Reports Server (NTRS)
Ryder, Jeffrey W.; Buxton, Roxanne E.; Redd, Elizabeth; Scott-Pandorf, Melissa; Hackney, Kyle J.; Fiedler, James; Ploutz-Snyder, Robert J.; Bloomberg, Jacob J.; Ploutz-Snyder, Lori L.
2010-01-01
PURPOSE: The ability to predict task performance using physiological performance metrics is vital to ensure that astronauts can execute their jobs safely and effectively. This investigation used a weighted suit to evaluate task performance at various ratios of strength, power, and endurance to body weight. METHODS: Twenty subjects completed muscle performance tests and functional tasks representative of those that would be required of astronauts during planetary exploration (see table for specific tests/tasks). Subjects performed functional tasks while wearing a weighted suit with additional loads ranging from 0-120% of initial body weight. Performance metrics were time to completion for all tasks except hatch opening, which consisted of total work. Task performance metrics were plotted against muscle metrics normalized to "body weight" (subject weight + external load; BW) for each trial. Fractional polynomial regression was used to model the relationship between muscle and task performance. CONCLUSION: LPMIF/BW is the best predictor of performance for predominantly lower-body tasks that are ambulatory and of short duration. LPMIF/BW is a very practical predictor of occupational task performance as it is quick and relatively safe to perform. Accordingly, bench press work best predicts hatch-opening work performance.
Return on investment in healthcare leadership development programs.
Jeyaraman, Maya M; Qadar, Sheikh Muhammad Zeeshan; Wierzbowski, Aleksandra; Farshidfar, Farnaz; Lys, Justin; Dickson, Graham; Grimes, Kelly; Phillips, Leah A; Mitchell, Jonathan I; Van Aerde, John; Johnson, Dave; Krupka, Frank; Zarychanski, Ryan; Abou-Setta, Ahmed M
2018-02-05
Purpose Strong leadership has been shown to foster change, including loyalty, improved performance and decreased error rates, but there is a dearth of evidence on effectiveness of leadership development programs. To ensure a return on the huge investments made, evidence-based approaches are needed to assess the impact of leadership on health-care establishments. As a part of a pan-Canadian initiative to design an effective evaluative instrument, the purpose of this paper was to identify and summarize evidence on health-care outcomes/return on investment (ROI) indicators and metrics associated with leadership quality, leadership development programs and existing evaluative instruments. Design/methodology/approach The authors performed a scoping review using the Arksey and O'Malley framework, searching eight databases from 2006 through June 2016. Findings Of 11,868 citations screened, the authors included 223 studies reporting on health-care outcomes/ROI indicators and metrics associated with leadership quality (73 studies), leadership development programs (138 studies) and existing evaluative instruments (12 studies). The extracted ROI indicators and metrics have been summarized in detail. Originality/value This review provides a snapshot in time of the current evidence on ROI indicators and metrics associated with leadership. Summarized ROI indicators and metrics can be used to design an effective evaluative instrument to assess the impact of leadership on health-care organizations.
Meaningful Assessment of Robotic Surgical Style using the Wisdom of Crowds.
Ershad, M; Rege, R; Fey, A Majewicz
2018-07-01
Quantitative assessment of surgical skills is an important aspect of surgical training; however, the proposed metrics are sometimes difficult to interpret and may not capture the stylistic characteristics that define expertise. This study proposes a methodology for evaluating surgical skill based on metrics associated with stylistic adjectives, and evaluates the ability of this method to differentiate expertise levels. We recruited subjects from different expertise levels to perform training tasks on a surgical simulator. A lexicon of contrasting adjective pairs, based on important skills for robotic surgery and inspired by the global evaluative assessment of robotic skills tool, was developed. To validate the use of stylistic adjectives for surgical skill assessment, posture videos of the subjects performing the task, as well as videos of the task, were rated by crowd-workers. Metrics associated with each adjective were found using kinematic and physiological measurements through correlation with the crowd-sourced adjective assignment ratings. To evaluate the chosen metrics' ability to distinguish expertise levels, two classifiers were trained and tested using these metrics. Crowd-assignment ratings for all adjectives were significantly correlated with expertise levels. The results indicate that the naive Bayes classifier performs best, with an accuracy of [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] when classifying into four, three, and two levels of expertise, respectively. The proposed method is effective at mapping understandable adjectives of expertise to the stylistic movements and physiological response of trainees.
Automated Assessment of Visual Quality of Digital Video
NASA Technical Reports Server (NTRS)
Watson, Andrew B.; Ellis, Stephen R. (Technical Monitor)
1997-01-01
The advent of widespread distribution of digital video creates a need for automated methods for evaluating visual quality of digital video. This is particularly so since most digital video is compressed using lossy methods, which involve the controlled introduction of potentially visible artifacts. Compounding the problem is the bursty nature of digital video, which requires adaptive bit allocation based on visual quality metrics. In previous work, we have developed visual quality metrics for evaluating, controlling, and optimizing the quality of compressed still images[1-4]. These metrics incorporate simplified models of human visual sensitivity to spatial and chromatic visual signals. The challenge of video quality metrics is to extend these simplified models to temporal signals as well. In this presentation I will discuss a number of the issues that must be resolved in the design of effective video quality metrics. Among these are spatial, temporal, and chromatic sensitivity and their interactions, visual masking, and implementation complexity. I will also touch on the question of how to evaluate the performance of these metrics.
Regime-based evaluation of cloudiness in CMIP5 models
NASA Astrophysics Data System (ADS)
Jin, Daeho; Oreopoulos, Lazaros; Lee, Dongmin
2017-01-01
The concept of cloud regimes (CRs) is used to develop a framework for evaluating the cloudiness of 12 fifth Coupled Model Intercomparison Project (CMIP5) models. Reference CRs come from existing global International Satellite Cloud Climatology Project (ISCCP) weather states. The evaluation is made possible by the implementation in several CMIP5 models of the ISCCP simulator generating in each grid cell daily joint histograms of cloud optical thickness and cloud top pressure. Model performance is assessed with several metrics such as CR global cloud fraction (CF), CR relative frequency of occurrence (RFO), their product [long-term average total cloud amount (TCA)], cross-correlations of CR RFO maps, and a metric of resemblance between model and ISCCP CRs. In terms of CR global RFO, arguably the most fundamental metric, the models perform unsatisfactorily overall, except for CRs representing thick storm clouds. Because model CR CF is internally constrained by our method, RFO discrepancies yield also substantial TCA errors. Our results support previous findings that CMIP5 models underestimate cloudiness. The multi-model mean performs well in matching observed RFO maps for many CRs, but is still not the best for this or other metrics. When overall performance across all CRs is assessed, some models, despite shortcomings, apparently outperform Moderate Resolution Imaging Spectroradiometer cloud observations evaluated against ISCCP like another model output. Lastly, contrasting cloud simulation performance against each model's equilibrium climate sensitivity in order to gain insight on whether good cloud simulation pairs with particular values of this parameter, yields no clear conclusions.
Vehicle Integrated Prognostic Reasoner (VIPR) Metric Report
NASA Technical Reports Server (NTRS)
Cornhill, Dennis; Bharadwaj, Raj; Mylaraswamy, Dinkar
2013-01-01
This document outlines a set of metrics for evaluating the diagnostic and prognostic schemes developed for the Vehicle Integrated Prognostic Reasoner (VIPR), a system-level reasoner that encompasses the multiple levels of large, complex systems such as those for aircraft and spacecraft. VIPR health managers are organized hierarchically and operate together to derive diagnostic and prognostic inferences from symptoms and conditions reported by a set of diagnostic and prognostic monitors. For layered reasoners such as VIPR, the overall performance cannot be evaluated by metrics solely directed toward timely detection and accuracy of estimation of the faults in individual components. Among other factors, overall vehicle reasoner performance is governed by the effectiveness of the communication schemes between monitors and reasoners in the architecture, and the ability to propagate and fuse relevant information to make accurate, consistent, and timely predictions at different levels of the reasoner hierarchy. We outline an extended set of diagnostic and prognostics metrics that can be broadly categorized as evaluation measures for diagnostic coverage, prognostic coverage, accuracy of inferences, latency in making inferences, computational cost, and sensitivity to different fault and degradation conditions. We report metrics from Monte Carlo experiments using two variations of an aircraft reference model that supported both flat and hierarchical reasoning.
Rating and Ranking the Role of Bibliometrics and Webometrics in Nursing and Midwifery
Davidson, Patricia M.; Newton, Phillip J.; Ferguson, Caleb
2014-01-01
Background. Bibliometrics are an essential aspect of measuring academic and organizational performance. Aim. This review seeks to describe methods for measuring bibliometrics, identify the strengths and limitations of these methodologies, outline strategies for interpretation, summarise the evaluation of nursing and midwifery performance, identify implications for metrics of evaluation, and specify the implications for nursing and midwifery, including the implications of social networking for bibliometrics and measures of individual performance. Method. A review of electronic databases CINAHL, Medline, and Scopus was undertaken using search terms such as bibliometrics, nursing, and midwifery. The reference lists of retrieved articles and Internet sources and social media platforms were also examined. Results. A number of well-established, formal ways of assessment have been identified, including h- and c-indices. Changes in publication practices and the use of the Internet have challenged traditional metrics of influence. Moreover, measuring impact beyond citation metrics is an increasing focus, with social media representing newer ways of establishing performance and impact. Conclusions. Even though a number of measures exist, no single bibliometric measure is perfect. Therefore, multiple approaches to evaluation are recommended. However, bibliometric approaches should not be the only measures upon which academic and scholarly performance are evaluated. PMID:24550691
Garfjeld Roberts, Patrick; Guyver, Paul; Baldwin, Mathew; Akhtar, Kash; Alvand, Abtin; Price, Andrew J; Rees, Jonathan L
2017-02-01
To assess the construct and face validity of ArthroS, a passive haptic VR simulator. A secondary aim was to evaluate the novel performance metrics produced by this simulator. Two groups of 30 participants, each divided into novice, intermediate or expert based on arthroscopic experience, completed three separate tasks on either the knee or shoulder module of the simulator. Performance was recorded using 12 automatically generated performance metrics and video footage of the arthroscopic procedures. The videos were blindly assessed using a validated global rating scale (GRS). Participants completed a survey about the simulator's realism and training utility. This new simulator demonstrated construct validity of its tasks when evaluated against a GRS (p ≤ 0.003 in all cases). Regarding its automatically generated performance metrics, established outputs such as time taken (p ≤ 0.001) and instrument path length (p ≤ 0.007) also demonstrated good construct validity. However, two-thirds of the proposed 'novel metrics' the simulator reports could not distinguish participants based on arthroscopic experience. Face validity assessment rated the simulator as a realistic and useful tool for trainees, but the passive haptic feedback (a key feature of this simulator) was rated as less realistic. The ArthroS simulator has good task construct validity based on established objective outputs, but some of the novel performance metrics could not distinguish between levels of surgical experience. The passive haptic feedback of the simulator also needs improvement. If simulators could offer automated and validated performance feedback, this would facilitate improvements in the delivery of training by allowing trainees to practise and self-assess.
Model Performance Evaluation and Scenario Analysis ...
This tool consists of two parts: model performance evaluation and scenario analysis (MPESA). The model performance evaluation consists of two components: model performance evaluation metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit measures that capture magnitude only, sequence only, and combined magnitude and sequence errors. The performance measures include error analysis, coefficient of determination, Nash-Sutcliffe efficiency, and a new weighted rank method. These performance metrics only provide useful information about the overall model performance. Note that MPESA is based on the separation of observed and simulated time series into magnitude and sequence components. The separation of time series into magnitude and sequence components and the reconstruction back to time series provides diagnostic insights to modelers. For example, traditional approaches lack the capability to identify if the source of uncertainty in the simulated data is due to the quality of the input data or the way the analyst adjusted the model parameters. This report presents a suite of model diagnostics that identify if mismatches between observed and simulated data result from magnitude- or sequence-related errors. MPESA offers graphical and statistical options that allow HSPF users to compare observed and simulated time series and identify the parameter values to adjust or the input data to modify. The scenario analysis part of the too
Assessing precision, bias and sigma-metrics of 53 measurands of the Alinity ci system.
Westgard, Sten; Petrides, Victoria; Schneider, Sharon; Berman, Marvin; Herzogenrath, Jörg; Orzechowski, Anthony
2017-12-01
Assay performance is dependent on the accuracy and precision of a given method. These attributes can be combined into an analytical Sigma-metric, providing a simple value for laboratorians to use in evaluating a test method's capability to meet its analytical quality requirements. Sigma-metrics were determined for 37 clinical chemistry assays, 13 immunoassays, and 3 ICT methods on the Alinity ci system. Analytical Performance Specifications were defined for the assays, following a rationale of using CLIA goals first, then Ricos Desirable goals when CLIA did not regulate the method, and then other sources if the Ricos Desirable goal was unrealistic. A precision study was conducted at Abbott on each assay using the Alinity ci system following the CLSI EP05-A2 protocol. Bias was estimated following the CLSI EP09-A3 protocol using samples with concentrations spanning the assay's measuring interval tested in duplicate on the Alinity ci system and ARCHITECT c8000 and i2000 SR systems, where testing was also performed at Abbott. Using the regression model, the %bias was estimated at an important medical decisions point. Then the Sigma-metric was estimated for each assay and was plotted on a method decision chart. The Sigma-metric was calculated using the equation: Sigma-metric=(%TEa-|%bias|)/%CV. The Sigma-metrics and Normalized Method Decision charts demonstrate that a majority of the Alinity assays perform at least at five Sigma or higher, at or near critical medical decision levels. More than 90% of the assays performed at Five and Six Sigma. None performed below Three Sigma. Sigma-metrics plotted on Normalized Method Decision charts provide useful evaluations of performance. The majority of Alinity ci system assays had sigma values >5 and thus laboratories can expect excellent or world class performance. Laboratorians can use these tools as aids in choosing high-quality products, further contributing to the delivery of excellent quality healthcare for patients. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
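The Sigma-metric equation quoted above is simple to apply; a small Python helper follows, with illustrative numbers rather than values from the study.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Analytical Sigma-metric: (%TEa - |%bias|) / %CV."""
    return (tea_pct - abs(bias_pct)) / cv_pct

# Illustrative example: a 10% total allowable error, 1.5% bias, 1.4% CV
print(sigma_metric(tea_pct=10.0, bias_pct=1.5, cv_pct=1.4))   # ~6.1, i.e. "six Sigma"
```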
Raghubar, Kimberly P; Lamba, Michael; Cecil, Kim M; Yeates, Keith Owen; Mahone, E Mark; Limke, Christina; Grosshans, David; Beckwith, Travis J; Ris, M Douglas
2018-06-01
Advances in radiation treatment (RT), specifically volumetric planning with detailed dose and volumetric data for specific brain structures, have provided new opportunities to study neurobehavioral outcomes of RT in children treated for brain tumor. The present study examined the relationship between biophysical and physical dose metrics and neurocognitive ability, namely learning and memory, 2 years post-RT in pediatric brain tumor patients. The sample consisted of 26 pediatric patients with brain tumor, 14 of whom completed neuropsychological evaluations on average 24 months post-RT. Prescribed dose and dose-volume metrics for specific brain regions were calculated including physical metrics (i.e., mean dose and maximum dose) and biophysical metrics (i.e., integral biological effective dose and generalized equivalent uniform dose). We examined the associations between dose-volume metrics (whole brain, right and left hippocampus), and performance on measures of learning and memory (Children's Memory Scale). Biophysical dose metrics were highly correlated with the physical metric of mean dose but not with prescribed dose. Biophysical metrics and mean dose, but not prescribed dose, correlated with measures of learning and memory. These preliminary findings call into question the value of prescribed dose for characterizing treatment intensity; they also suggest that biophysical dose has only a limited advantage compared to physical dose when calculated for specific regions of the brain. We discuss the implications of the findings for evaluating and understanding the relation between RT and neurocognitive functioning. © 2018 Wiley Periodicals, Inc.
Performance comparison of optical interference cancellation system architectures.
Lu, Maddie; Chang, Matt; Deng, Yanhua; Prucnal, Paul R
2013-04-10
The performance of three optics-based interference cancellation systems is compared and contrasted with each other, and with traditional electronic techniques for interference cancellation. The comparison is based on a set of common performance metrics that we have developed for this purpose. It is shown that thorough evaluation of our optical approaches takes into account the traditional notions of depth of cancellation and dynamic range, along with notions of link loss and uniformity of cancellation. Our evaluation shows that our use of optical components affords performance that surpasses traditional electronic approaches, and that the optimal choice for an optical interference canceller requires taking into account the performance metrics discussed in this paper.
Regime-Based Evaluation of Cloudiness in CMIP5 Models
NASA Technical Reports Server (NTRS)
Jin, Daeho; Oraiopoulos, Lazaros; Lee, Dong Min
2016-01-01
The concept of Cloud Regimes (CRs) is used to develop a framework for evaluating the cloudiness of 12 fifth Coupled Model Intercomparison Project (CMIP5) models. Reference CRs come from existing global International Satellite Cloud Climatology Project (ISCCP) weather states. The evaluation is made possible by the implementation in several CMIP5 models of the ISCCP simulator generating for each gridcell daily joint histograms of cloud optical thickness and cloud top pressure. Model performance is assessed with several metrics such as CR global cloud fraction (CF), CR relative frequency of occurrence (RFO), their product (long-term average total cloud amount [TCA]), cross-correlations of CR RFO maps, and a metric of resemblance between model and ISCCP CRs. In terms of CR global RFO, arguably the most fundamental metric, the models perform unsatisfactorily overall, except for CRs representing thick storm clouds. Because model CR CF is internally constrained by our method, RFO discrepancies yield also substantial TCA errors. Our findings support previous studies showing that CMIP5 models underestimate cloudiness. The multi-model mean performs well in matching observed RFO maps for many CRs, but is not the best for this or other metrics. When overall performance across all CRs is assessed, some models, despite their shortcomings, apparently outperform Moderate Resolution Imaging Spectroradiometer (MODIS) cloud observations evaluated against ISCCP as if they were another model output. Lastly, cloud simulation performance is contrasted with each model's equilibrium climate sensitivity (ECS) in order to gain insight on whether good cloud simulation pairs with particular values of this parameter.
Decision-relevant evaluation of climate models: A case study of chill hours in California
NASA Astrophysics Data System (ADS)
Jagannathan, K. A.; Jones, A. D.; Kerr, A. C.
2017-12-01
The past decade has seen a proliferation of different climate datasets with over 60 climate models currently in use. Comparative evaluation and validation of models can assist practitioners in choosing the most appropriate models for adaptation planning. However, such assessments are usually conducted for 'climate metrics' such as seasonal temperature, while sectoral decisions are often based on 'decision-relevant outcome metrics' such as growing degree days or chill hours. Since climate models predict different metrics with varying skill, the goal of this research is to conduct a bottom-up evaluation of model skill for 'outcome-based' metrics. Using chill hours (number of hours in winter months where temperature is less than 45 deg F) in Fresno, CA as a case, we assess how well different GCMs predict the historical mean and slope of chill hours, and whether and to what extent projections differ based on model selection. We then compare our results with other climate-based evaluations of the region, to identify similarities and differences. For the model skill evaluation, historically observed chill hours were compared with simulations from 27 GCMs (and multiple ensembles). Model skill scores were generated based on a statistical hypothesis test of the comparative assessment. Future projections from RCP 8.5 runs were evaluated, and a simple bias correction was also conducted. Our analysis indicates that model skill in predicting chill hour slope is dependent on its skill in predicting mean chill hours, which results from the non-linear nature of the chill metric. However, there was no clear relationship between the models that performed well for the chill hour metric and those that performed well in other temperature-based evaluations (such as winter minimum temperature or diurnal temperature range). Further, contrary to conclusions from other studies, we also found that the multi-model mean or large ensemble mean results may not always be most appropriate for this outcome metric. Our assessment sheds light on key differences between global versus local skill, and broad versus specific skill of climate models, highlighting that decision-relevant model evaluation may be crucial for providing practitioners with the best available climate information for their specific needs.
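Chill hours, as defined in the abstract, are easy to derive from hourly temperature output. A minimal Python sketch, assuming November through February as the winter months (the abstract does not list them explicitly) and synthetic input data:

```python
import numpy as np

def chill_hours(hourly_temp_f, months, winter_months=(11, 12, 1, 2)):
    """Chill hours: count of winter hours with temperature below 45 deg F.

    hourly_temp_f: hourly temperatures (deg F); months: month number (1-12)
    for each hour. The winter-month set is an assumption.
    """
    temp = np.asarray(hourly_temp_f, dtype=float)
    months = np.asarray(months)
    winter = np.isin(months, winter_months)
    return int(np.sum(winter & (temp < 45.0)))

# Example on a rough synthetic year of hourly data
rng = np.random.default_rng(4)
months = np.repeat(np.arange(1, 13), 24 * 30)
temps = rng.normal(55, 12, size=months.size)
print(chill_hours(temps, months))
```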
Evaluation of eye metrics as a detector of fatigue.
McKinley, R Andy; McIntire, Lindsey K; Schmidt, Regina; Repperger, Daniel W; Caldwell, John A
2011-08-01
This study evaluated oculometrics as a detector of fatigue in Air Force-relevant tasks after sleep deprivation. Using the metrics of total eye closure duration (PERCLOS) and approximate entropy (ApEn), the relation between these eye metrics and fatigue-induced performance decrements was investigated. One damaging influence on the successful outcome of operational military missions is sleep deprivation-induced fatigue. Consequently, there is interest in the development of reliable monitoring devices that can assess when an operator is overly fatigued. Ten civilian participants volunteered to serve in this study. Each was trained on three performance tasks: target identification, unmanned aerial vehicle landing, and the psychomotor vigilance task (PVT). Experimental testing began after 14 hr awake and continued every 2 hr until 28 hr of sleep deprivation was reached. Performance on the PVT and target identification tasks declined significantly as the level of sleep deprivation increased. These performance declines were paralleled more closely by changes in ApEn than in the PERCLOS measure. The results provide evidence that the ApEn eye metric can be used to detect fatigue in relevant military aviation tasks. Military and commercial operators could benefit from an alertness monitoring device.
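For reference, the two oculometrics named above can be sketched as follows in Python; the 80% closure threshold for PERCLOS and the ApEn parameters (m = 2, r = 0.2 times the SD) are common defaults and may differ from the study's exact settings, and the eyelid-closure trace is simulated.

```python
import numpy as np

def perclos(eyelid_closure, threshold=0.8):
    """PERCLOS: percentage of samples in which eyelid closure exceeds a threshold."""
    closure = np.asarray(eyelid_closure, dtype=float)
    return 100.0 * np.mean(closure >= threshold)

def approx_entropy(x, m=2, r=None):
    """Approximate entropy (ApEn) of a 1-D series; r defaults to 0.2 times the SD."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()
    def phi(m):
        n = len(x) - m + 1
        emb = np.array([x[i:i + m] for i in range(n)])                   # embedded vectors
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)  # Chebyshev distances
        c = (dist <= r).mean(axis=1)                                      # fraction within r
        return np.mean(np.log(c))
    return phi(m) - phi(m + 1)

# Example on a simulated eyelid-closure fraction signal (0 = fully open, 1 = closed)
rng = np.random.default_rng(5)
closure = np.clip(rng.normal(0.3, 0.25, size=600), 0, 1)
print("PERCLOS:", perclos(closure))
print("ApEn:", approx_entropy(closure[:200]))
```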
Climate Classification is an Important Factor in Assessing Hospital Performance Metrics
NASA Astrophysics Data System (ADS)
Boland, M. R.; Parhi, P.; Gentine, P.; Tatonetti, N. P.
2017-12-01
Context/Purpose: Climate is a known modulator of disease, but its impact on hospital performance metrics remains unstudied. Methods: We assess the relationship between Köppen-Geiger climate classification and hospital performance metrics, specifically 30-day mortality, as reported in Hospital Compare, and collected for the period July 2013 through June 2014 (7/1/2013 - 06/30/2014). A hospital-level multivariate linear regression analysis was performed while controlling for known socioeconomic factors to explore the relationship between all-cause mortality and climate. Hospital performance scores were obtained from 4,524 hospitals belonging to 15 distinct Köppen-Geiger climates and 2,373 unique counties. Results: Model results revealed that hospital performance metrics for mortality showed significant climate dependence (p<0.001) after adjusting for socioeconomic factors. Interpretation: Currently, hospitals are reimbursed by Governmental agencies using 30-day mortality rates along with 30-day readmission rates. These metrics allow Government agencies to rank hospitals according to their 'performance' on these metrics. Various socioeconomic factors are taken into consideration when determining individual hospitals' performance. However, no climate-based adjustment is made within the existing framework. Our results indicate that climate-based variability in 30-day mortality rates does exist even after socioeconomic confounder adjustment. It would be useful to incorporate standardized high-level climate classification systems (such as Köppen-Geiger) in future metrics. Conclusion: Climate is a significant factor in evaluating hospital 30-day mortality rates. These results demonstrate that climate classification is an important factor when comparing hospital performance across the United States.
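A hospital-level regression of the kind described can be sketched with statsmodels; the data frame, column names, and values below are entirely illustrative assumptions and are not the study's variables or data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical hospital-level table (names and values are illustrative only)
df = pd.DataFrame({
    "mortality_30d":  [11.8, 12.4, 13.1, 12.0, 11.5, 12.9, 12.2, 13.4, 11.9],
    "koppen_class":   ["Cfa", "Dfa", "Csb", "Cfa", "Csb", "Dfa", "Cfa", "Dfa", "Csb"],
    "median_income":  [54, 48, 61, 52, 63, 45, 50, 47, 58],   # county median income, $1000s
    "pct_uninsured":  [9.1, 12.3, 7.4, 10.0, 6.8, 13.5, 11.2, 12.9, 8.0],
})

# Hospital-level multivariate linear regression: Köppen-Geiger class as a
# categorical predictor, controlling for socioeconomic covariates.
model = smf.ols(
    "mortality_30d ~ C(koppen_class) + median_income + pct_uninsured", data=df
).fit()
print(model.summary())
```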
Quality Measures for Dialysis: Time for a Balanced Scorecard
2016-01-01
Recent federal legislation establishes a merit-based incentive payment system for physicians, with a scorecard for each professional. The Centers for Medicare and Medicaid Services evaluate quality of care with clinical performance measures and have used these metrics for public reporting and payment to dialysis facilities. Similar metrics may be used for the future merit-based incentive payment system. In nephrology, most clinical performance measures measure processes and intermediate outcomes of care. These metrics were developed from population studies of best practice and do not identify opportunities for individualizing care on the basis of patient characteristics and individual goals of treatment. The In-Center Hemodialysis (ICH) Consumer Assessment of Healthcare Providers and Systems (CAHPS) survey examines patients' perception of care and has entered the arena to evaluate quality of care. A balanced scorecard of quality performance should include three elements: population-based best clinical practice, patient perceptions, and individually crafted patient goals of care. PMID:26316622
Performance Benchmarks for Scholarly Metrics Associated with Fisheries and Wildlife Faculty
Swihart, Robert K.; Sundaram, Mekala; Höök, Tomas O.; DeWoody, J. Andrew; Kellner, Kenneth F.
2016-01-01
Research productivity and impact are often considered in professional evaluations of academics, and performance metrics based on publications and citations increasingly are used in such evaluations. To promote evidence-based and informed use of these metrics, we collected publication and citation data for 437 tenure-track faculty members at 33 research-extensive universities in the United States belonging to the National Association of University Fisheries and Wildlife Programs. For each faculty member, we computed 8 commonly used performance metrics based on numbers of publications and citations, and recorded covariates including academic age (time since Ph.D.), sex, percentage of appointment devoted to research, and the sub-disciplinary research focus. Standardized deviance residuals from regression models were used to compare faculty after accounting for variation in performance due to these covariates. We also aggregated residuals to enable comparison across universities. Finally, we tested for temporal trends in citation practices to assess whether the “law of constant ratios”, used to enable comparison of performance metrics between disciplines that differ in citation and publication practices, applied to fisheries and wildlife sub-disciplines when mapped to Web of Science Journal Citation Report categories. Our regression models reduced deviance by ¼ to ½. Standardized residuals for each faculty member, when combined across metrics as a simple average or weighted via factor analysis, produced similar results in terms of performance based on percentile rankings. Significant variation was observed in scholarly performance across universities, after accounting for the influence of covariates. In contrast to findings for other disciplines, normalized citation ratios for fisheries and wildlife sub-disciplines increased across years. Increases were comparable for all sub-disciplines except ecology. We discuss the advantages and limitations of our methods, illustrate their use when applied to new data, and suggest future improvements. Our benchmarking approach may provide a useful tool to augment detailed, qualitative assessment of performance. PMID:27152838
Ranking streamflow model performance based on Information theory metrics
NASA Astrophysics Data System (ADS)
Martinez, Gonzalo; Pachepsky, Yakov; Pan, Feng; Wagener, Thorsten; Nicholson, Thomas
2016-04-01
The accuracy-based model performance metrics do not necessarily reflect the qualitative correspondence between simulated and measured streamflow time series. The objective of this work was to use information theory-based metrics to see whether they can serve as a complementary tool for hydrologic model evaluation and selection. We simulated 10-year streamflow time series in five watersheds located in Texas, North Carolina, Mississippi, and West Virginia. Eight models of different complexity were applied. The information theory-based metrics were obtained after representing the time series as strings of symbols, where different symbols corresponded to different quantiles of the probability distribution of streamflow. Three metrics were computed for those strings: mean information gain, which measures the randomness of the signal; effective measure complexity, which characterizes predictability; and fluctuation complexity, which characterizes the presence of a pattern in the signal. The observed streamflow time series had smaller information content and larger complexity metrics than the precipitation time series. Watersheds served as information filters, and streamflow time series were less random and more complex than those of precipitation. This reflects the fact that the watershed acts as an information filter in the hydrologic conversion process from precipitation to streamflow. The Nash-Sutcliffe efficiency metric increased as the complexity of the models increased, but in many cases several models had efficiency values that were not statistically different from each other. In such cases, ranking models by the closeness of the information theory-based parameters in simulated and measured streamflow time series can provide an additional criterion for the evaluation of hydrologic model performance.
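A sketch of the symbolization step and one of the metrics named above, mean information gain, computed here as the block-entropy difference H(L+1) - H(L) over a quantile-based symbol alphabet; the four-symbol alphabet, word length, and synthetic gamma-distributed "streamflow" are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from collections import Counter

def symbolize(series, n_symbols=4):
    """Map a time series to quantile-based symbols 0..n_symbols-1."""
    cuts = np.quantile(series, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.digitize(series, cuts)

def block_entropy(symbols, length):
    """Shannon entropy (bits) of overlapping words of the given length."""
    words = [tuple(symbols[i:i + length]) for i in range(len(symbols) - length + 1)]
    counts = np.array(list(Counter(words).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mean_information_gain(symbols, length=1):
    """Average information gained by observing one more symbol."""
    return block_entropy(symbols, length + 1) - block_entropy(symbols, length)

rng = np.random.default_rng(1)
flow = rng.gamma(shape=2.0, scale=1.0, size=3650)   # synthetic daily "streamflow"
print(mean_information_gain(symbolize(flow)))        # near 2 bits for random 4-symbol data
```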
Nyflot, Matthew J; Yang, Fei; Byrd, Darrin; Bowen, Stephen R; Sandison, George A; Kinahan, Paul E
2015-10-01
Image heterogeneity metrics such as textural features are an active area of research for evaluating clinical outcomes with positron emission tomography (PET) imaging and other modalities. However, the effects of stochastic image acquisition noise on these metrics are poorly understood. We performed a simulation study by generating 50 statistically independent PET images of the NEMA IQ phantom with realistic noise and resolution properties. Heterogeneity metrics based on gray-level intensity histograms, co-occurrence matrices, neighborhood difference matrices, and zone size matrices were evaluated within regions of interest surrounding the lesions. The impact of stochastic variability was evaluated with percent difference from the mean of the 50 realizations, coefficient of variation and estimated sample size for clinical trials. Additionally, sensitivity studies were performed to simulate the effects of patient size and image reconstruction method on the quantitative performance of these metrics. Complex trends in variability were revealed as a function of textural feature, lesion size, patient size, and reconstruction parameters. In conclusion, the sensitivity of PET textural features to normal stochastic image variation and imaging parameters can be large and is feature-dependent. Standards are needed to ensure that prospective studies that incorporate textural features are properly designed to measure true effects that may impact clinical outcomes.
NASA Astrophysics Data System (ADS)
Marshak, William P.; Darkow, David J.; Wesler, Mary M.; Fix, Edward L.
2000-08-01
Computer-based display designers have more sensory modes and more dimensions within each sensory modality with which to encode information in a user interface than ever before. This elaboration of information presentation has made measuring display/format effectiveness and predicting display/format performance extremely difficult. A multivariate method has been devised which isolates critical information, physically measures its signal strength, and compares it with other elements of the display, which act like background noise. This Common Metric relates signal-to-noise ratios (SNRs) within each stimulus dimension, then combines SNRs across display modes, dimensions, and cognitive factors to predict display format effectiveness. Examples with their Common Metric assessment and validation against performance will be presented along with the derivation of the metric. Implications of the Common Metric for display design and evaluation will be discussed.
LIFE CYCLE DESIGN OF AMORPHOUS SILICON PHOTOVOLTAIC MODULES
The life cycle design framework was applied to photovoltaic module design. The primary objective of this project was to develop and evaluate design metrics for assessing and guiding the improvement of PV product systems. Two metrics were used to assess life cycle energy perform...
WE-E-217A-02: Methodologies for Evaluation of Standalone CAD System Performance.
Sahiner, B
2012-06-01
Standalone performance evaluation of a CAD system provides information about the abnormality detection or classification performance of the computerized system alone. Although the performance of the reader with CAD is the final step in CAD system assessment, standalone performance evaluation is an important component for several reasons: First, standalone evaluation informs the reader about the performance level of the CAD system and may have an impact on how the reader uses the system. Second, it provides essential information to the system designer for algorithm optimization during system development. Third, standalone evaluation can provide a detailed description of algorithm performance (e.g., on subgroups of the population) because a larger data set with more samples from different subgroups can be included in standalone studies compared to reader studies. Proper standalone evaluation of a CAD system involves a number of key components, some of which are shared with the assessment of reader performance with CAD. These include (1) selection of a test data set that allows performance assessment with little or no bias and acceptable uncertainty; (2) a reference standard that indicates disease status as well as the location and extent of disease; (3) a clearly defined method for labeling each CAD mark as a true-positive or false-positive; and (4) a properly selected set of metrics to summarize the accuracy of the computer marks and their corresponding scores. In this lecture, we will discuss various approaches for the key components of standalone CAD performance evaluation listed above, and present some of the recommendations and opinions from the AAPM CAD subcommittee on these issues. Learning Objectives 1. Identify basic components and metrics in the assessment of standalone CAD systems 2. Understand how each component may affect the assessed performance 3. Learn about AAPM CAD subcommittee's opinions and recommendations on factors and metrics related to the evaluation of standalone CAD system performance. © 2012 American Association of Physicists in Medicine.
Improving Climate Projections Using "Intelligent" Ensembles
NASA Technical Reports Server (NTRS)
Baker, Noel C.; Taylor, Patrick C.
2015-01-01
Recent changes in the climate system have led to growing concern, especially in communities which are highly vulnerable to resource shortages and weather extremes. There is an urgent need for better climate information to develop solutions and strategies for adapting to a changing climate. Climate models provide excellent tools for studying the current state of climate and making future projections. However, these models are subject to biases created by structural uncertainties. Performance metrics-or the systematic determination of model biases-succinctly quantify aspects of climate model behavior. Efforts to standardize climate model experiments and collect simulation data-such as the Coupled Model Intercomparison Project (CMIP)-provide the means to directly compare and assess model performance. Performance metrics have been used to show that some models reproduce present-day climate better than others. Simulation data from multiple models are often used to add value to projections by creating a consensus projection from the model ensemble, in which each model is given an equal weight. It has been shown that the ensemble mean generally outperforms any single model. It is possible to use unequal weights to produce ensemble means, in which models are weighted based on performance (called "intelligent" ensembles). Can performance metrics be used to improve climate projections? Previous work introduced a framework for comparing the utility of model performance metrics, showing that the best metrics are related to the variance of top-of-atmosphere outgoing longwave radiation. These metrics improve present-day climate simulations of Earth's energy budget using the "intelligent" ensemble method. The current project identifies several approaches for testing whether performance metrics can be applied to future simulations to create "intelligent" ensemble-mean climate projections. It is shown that certain performance metrics test key climate processes in the models, and that these metrics can be used to evaluate model quality in both current and future climate states. This information will be used to produce new consensus projections and provide communities with improved climate projections for urgent decision-making.
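The "intelligent" ensemble idea above can be sketched as a weighted mean in which each model's weight comes from its performance; the inverse-error weighting and the toy projections below are assumptions for illustration, not the specific scheme used in the study.

```python
import numpy as np

def weighted_ensemble_mean(projections, errors):
    """Combine model projections with weights inversely proportional to error.

    projections: (n_models, ...) array of model fields or time series.
    errors:      (n_models,) positive performance errors (smaller = better).
    """
    projections = np.asarray(projections, dtype=float)
    weights = 1.0 / np.asarray(errors, dtype=float)
    weights /= weights.sum()
    return np.tensordot(weights, projections, axes=1)

models = np.array([[1.0, 2.0, 3.0],    # model A projection
                   [1.5, 2.5, 3.5],    # model B projection
                   [0.5, 1.5, 2.5]])   # model C projection
errors = np.array([0.2, 0.4, 0.8])     # model A scores best, so it gets the largest weight
print(weighted_ensemble_mean(models, errors))
```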
Performance metrics for the assessment of satellite data products: an ocean color case study
Seegers, Bridget N.; Stumpf, Richard P.; Schaeffer, Blake A.; Loftin, Keith A.; Werdell, P. Jeremy
2018-01-01
Performance assessment of ocean color satellite data has generally relied on statistical metrics chosen for their common usage and the rationale for selecting certain metrics is infrequently explained. Commonly reported statistics based on mean squared errors, such as the coefficient of determination (r2), root mean square error, and regression slopes, are most appropriate for Gaussian distributions without outliers and, therefore, are often not ideal for ocean color algorithm performance assessment, which is often limited by sample availability. In contrast, metrics based on simple deviations, such as bias and mean absolute error, as well as pair-wise comparisons, often provide more robust and straightforward quantities for evaluating ocean color algorithms with non-Gaussian distributions and outliers. This study uses a SeaWiFS chlorophyll-a validation data set to demonstrate a framework for satellite data product assessment and recommends a multi-metric and user-dependent approach that can be applied within science, modeling, and resource management communities. PMID:29609296
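A sketch of the pair-wise statistics favored above: multiplicative bias and mean absolute error computed from matched satellite/in situ pairs. Working in log10 space is a common convention for chlorophyll-a and is assumed here rather than quoted from the paper; the matchup values are invented.

```python
import numpy as np

def log_bias_mae(satellite, in_situ):
    """Multiplicative bias and MAE from matched pairs, computed in log10 space."""
    ratio = np.log10(np.asarray(satellite, dtype=float) /
                     np.asarray(in_situ, dtype=float))
    bias = 10 ** np.mean(ratio)           # > 1 means the satellite overestimates
    mae = 10 ** np.mean(np.abs(ratio))    # multiplicative mean absolute error
    return bias, mae

sat = np.array([0.12, 0.45, 1.8, 3.2])   # satellite chlorophyll-a (mg m-3)
obs = np.array([0.10, 0.50, 1.5, 4.0])   # in situ matchups
print(log_bias_mae(sat, obs))
```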
Developing a Common Metric for Evaluating Police Performance in Deadly Force Situations
2012-08-27
Background: There is a critical lack of scientific evidence about whether deadly force management, accountability and training ... Subject terms: training metrics development, deadly encounters.
Evaluation of image quality metrics for the prediction of subjective best focus.
Kilintari, Marina; Pallikaris, Aristophanis; Tsiklis, Nikolaos; Ginis, Harilaos S
2010-03-01
Seven existing and three new image quality metrics were evaluated in terms of their effectiveness in predicting subjective cycloplegic refraction. Monochromatic wavefront aberrations (WA) were measured in 70 eyes using a Shack-Hartmann based device (Complete Ophthalmic Analysis System; Wavefront Sciences). Subjective cycloplegic spherocylindrical correction was obtained using a standard manifest refraction procedure. The dioptric amount required to optimize each metric was calculated and compared with the subjective refraction result. Metrics included monochromatic and polychromatic variants, as well as variants taking into consideration the Stiles and Crawford effect (SCE). WA measurements were performed using infrared light and converted to visible before all calculations. The mean difference between subjective cycloplegic and WA-derived spherical refraction ranged from 0.17 to 0.36 diopters (D), while paraxial curvature resulted in a difference of 0.68 D. Monochromatic metrics exhibited smaller mean differences between subjective cycloplegic and objective refraction. Consideration of the SCE reduced the standard deviation (SD) of the difference between subjective and objective refraction. All metrics exhibited similar performance in terms of accuracy and precision. We hypothesize that errors pertaining to the conversion between infrared and visible wavelengths rather than calculation method may be the limiting factor in determining objective best focus from near infrared WA measurements.
Cuesta-Frau, David; Miró-Martínez, Pau; Jordán Núñez, Jorge; Oltra-Crespo, Sandra; Molina Picó, Antonio
2017-08-01
This paper evaluates the performance of first-generation entropy metrics, represented by the well known and widely used Approximate Entropy (ApEn) and Sample Entropy (SampEn) metrics, and what can be considered an evolution of these, Fuzzy Entropy (FuzzyEn), in the context of Electroencephalogram (EEG) signal classification. The study uses the commonest artifacts found in real EEGs, such as white noise, and muscular, cardiac, and ocular artifacts. Using two different sets of publicly available EEG records, and a realistic range of amplitudes for interfering artifacts, this work optimises and assesses the robustness of these metrics against artifacts in terms of class segmentation probability. The results show that the qualitative behaviour of the two datasets is similar, with SampEn and FuzzyEn performing best, and that noise and muscular artifacts are the most confounding factors. In contrast, there is wide variability with regard to initialization parameters. The poor performance achieved by ApEn suggests that this metric should not be used in these contexts. Copyright © 2017 Elsevier Ltd. All rights reserved.
Shwartz, Michael; Peköz, Erol A; Burgess, James F; Christiansen, Cindy L; Rosen, Amy K; Berlowitz, Dan
2014-12-01
Two approaches are commonly used for identifying high-performing facilities on a performance measure: one, that the facility is in a top quantile (eg, quintile or quartile); and two, that a confidence interval is below (or above) the average of the measure for all facilities. This type of yes/no designation often does not do well in distinguishing high-performing from average-performing facilities. To illustrate an alternative continuous-valued metric for profiling facilities--the probability a facility is in a top quantile--and show the implications of using this metric for profiling and pay-for-performance. We created a composite measure of quality from fiscal year 2007 data based on 28 quality indicators from 112 Veterans Health Administration nursing homes. A Bayesian hierarchical multivariate normal-binomial model was used to estimate shrunken rates of the 28 quality indicators, which were combined into a composite measure using opportunity-based weights. Rates were estimated using Markov Chain Monte Carlo methods as implemented in WinBUGS. The probability metric was calculated from the simulation replications. Our probability metric allowed better discrimination of high performers than the point or interval estimate of the composite score. In a pay-for-performance program, a smaller top quantile (eg, a quintile) resulted in more resources being allocated to the highest performers, whereas a larger top quantile (eg, being above the median) distinguished less among high performers and allocated more resources to average performers. The probability metric has potential but needs to be evaluated by stakeholders in different types of delivery systems.
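A sketch of the continuous-valued profiling metric described above: given posterior simulation replications of each facility's composite score, estimate the probability that each facility falls in the top quantile. The quintile cutoff, facility count, and normal noise below are illustrative; the study's Bayesian hierarchical model is not reproduced here.

```python
import numpy as np

def prob_in_top_quantile(draws, top_fraction=0.20):
    """Probability each facility ranks in the top quantile of the composite score.

    draws: (n_replications, n_facilities) posterior simulation replications,
           where larger values mean better quality.
    """
    draws = np.asarray(draws, dtype=float)
    n_fac = draws.shape[1]
    cutoff_rank = int(np.ceil(top_fraction * n_fac))
    # Rank facilities within each replication (0 = best score).
    ranks = np.argsort(np.argsort(-draws, axis=1), axis=1)
    return (ranks < cutoff_rank).mean(axis=0)

rng = np.random.default_rng(0)
true_quality = np.linspace(0.0, 1.0, 10)                       # 10 hypothetical facilities
draws = true_quality + rng.normal(scale=0.3, size=(4000, 10))  # stand-in posterior draws
print(np.round(prob_in_top_quantile(draws), 2))
```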
Evaluating which plan quality metrics are appropriate for use in lung SBRT.
Yaparpalvi, Ravindra; Garg, Madhur K; Shen, Jin; Bodner, William R; Mynampati, Dinesh K; Gafar, Aleiya; Kuo, Hsiang-Chi; Basavatia, Amar K; Ohri, Nitin; Hong, Linda X; Kalnicki, Shalom; Tome, Wolfgang A
2018-02-01
Several dose metrics in the categories of homogeneity, coverage, conformity and gradient have been proposed in the literature for evaluating treatment plan quality. In this study, we applied these metrics to characterize and identify the plan quality metrics that would merit plan quality assessment in lung stereotactic body radiation therapy (SBRT) dose distributions. Treatment plans of 90 lung SBRT patients, comprising 91 targets, treated in our institution were retrospectively reviewed. Dose calculations were performed using the anisotropic analytical algorithm (AAA) with heterogeneity correction. A literature review of published plan quality metrics in the categories of coverage, homogeneity, conformity and gradient was performed. For each patient, using dose-volume histogram data, plan quality metric values were quantified and analysed. For the study, the radiation therapy oncology group (RTOG) defined plan quality metrics were: coverage (0.90 ± 0.08); homogeneity (1.27 ± 0.07); conformity (1.03 ± 0.07) and gradient (4.40 ± 0.80). Geometric conformity strongly correlated with conformity index (p < 0.0001). Gradient measures strongly correlated with target volume (p < 0.0001). The conformity guidelines for the prescribed dose advocated by the RTOG lung SBRT protocol were met in ≥94% of cases in all categories. The proportions of total lung volume receiving doses of 20 Gy and 5 Gy (V20 and V5) were mean 4.8% (±3.2) and 16.4% (±9.2), respectively. Based on our study analyses, we recommend the following metrics as appropriate surrogates for establishing SBRT lung plan quality guidelines: coverage % (ICRU 62), conformity (CN or CIPaddick) and gradient (R50%). Furthermore, we strongly recommend that RTOG lung SBRT protocols adopt either CN or CIPaddick in place of the prescription isodose to target volume ratio for conformity index evaluation. Advances in knowledge: Our study metrics are valuable tools for establishing lung SBRT plan quality guidelines.
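A sketch of the Paddick conformation number (CN) recommended above, computed from three volumes that are readily obtained from DVH data; the example volumes are invented.

```python
def paddick_conformation_number(target_vol_cc, piv_cc, target_in_piv_cc):
    """Paddick conformation number: CN = TV_PIV^2 / (TV * PIV).

    target_vol_cc:    target (PTV) volume
    piv_cc:           volume enclosed by the prescription isodose surface
    target_in_piv_cc: target volume covered by the prescription isodose
    CN = 1 is ideal; lower values indicate under-coverage and/or dose spillage.
    """
    return target_in_piv_cc ** 2 / (target_vol_cc * piv_cc)

# Hypothetical lung SBRT plan: 25 cc PTV, 27 cc prescription isodose volume,
# 24 cc of the PTV covered by the prescription dose.
print(round(paddick_conformation_number(25.0, 27.0, 24.0), 3))
```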
Evaluation of motion artifact metrics for coronary CT angiography.
Ma, Hongfeng; Gros, Eric; Szabo, Aniko; Baginski, Scott G; Laste, Zachary R; Kulkarni, Naveen M; Okerlund, Darin; Schmidt, Taly G
2018-02-01
This study quantified the performance of coronary artery motion artifact metrics relative to human observer ratings. Motion artifact metrics have been used as part of motion correction and best-phase selection algorithms for Coronary Computed Tomography Angiography (CCTA). However, the lack of ground truth makes it difficult to validate how well the metrics quantify the level of motion artifact. This study investigated five motion artifact metrics, including two novel metrics, using a dynamic phantom, clinical CCTA images, and an observer study that provided ground-truth motion artifact scores from a series of pairwise comparisons. Five motion artifact metrics were calculated for the coronary artery regions on both phantom and clinical CCTA images: positivity, entropy, normalized circularity, Fold Overlap Ratio (FOR), and Low-Intensity Region Score (LIRS). CT images were acquired of a dynamic cardiac phantom that simulated cardiac motion and contained six iodine-filled vessels of varying diameter and with regions of soft plaque and calcifications. Scans were repeated with different gantry start angles. Images were reconstructed at five phases of the motion cycle. Clinical images were acquired from 14 CCTA exams with patient heart rates ranging from 52 to 82 bpm. The vessel and shading artifacts were manually segmented by three readers and combined to create ground-truth artifact regions. Motion artifact levels were also assessed by readers using a pairwise comparison method to establish a ground-truth reader score. The Kendall's Tau coefficients were calculated to evaluate the statistical agreement in ranking between the motion artifact metrics and reader scores. Linear regression between the reader scores and the metrics was also performed. On phantom images, the Kendall's Tau coefficients of the five motion artifact metrics were 0.50 (normalized circularity), 0.35 (entropy), 0.82 (positivity), 0.77 (FOR), 0.77 (LIRS), where higher Kendall's Tau signifies higher agreement. The FOR, LIRS, and transformed positivity (the fourth root of the positivity) were further evaluated in the study of clinical images. The Kendall's Tau coefficients of the selected metrics were 0.59 (FOR), 0.53 (LIRS), and 0.21 (transformed positivity). In the study of clinical data, a Motion Artifact Score, defined as the product of the FOR and LIRS metrics, further improved agreement with reader scores, with a Kendall's Tau coefficient of 0.65. The metrics of FOR, LIRS, and the product of the two metrics provided the highest agreement in motion artifact ranking when compared to the readers, and the highest linear correlation to the reader scores. The validated motion artifact metrics may be useful for developing and evaluating methods to reduce motion in Coronary Computed Tomography Angiography (CCTA) images. © 2017 American Association of Physicists in Medicine.
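The rank-agreement statistic used above is available directly in SciPy; the metric values and reader rankings below are invented purely to show the call.

```python
from scipy.stats import kendalltau

# Hypothetical motion-artifact metric values and ground-truth reader rankings
# for six CCTA image series (larger = more artifact).
metric_values = [0.12, 0.35, 0.28, 0.64, 0.51, 0.90]
reader_scores = [1, 3, 2, 5, 4, 6]

tau, p_value = kendalltau(metric_values, reader_scores)
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")
```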
DOT National Transportation Integrated Search
2016-07-12
This document describes the Performance Measurement and Evaluation Support Plan for the New York City Department of Transportation New York City (NYC) Connected Vehicle Pilot Deployment (CVPD) Project. The report documents the performance metrics tha...
DOT National Transportation Integrated Search
2016-03-14
The Performance Measurement and Evaluation Support Plan for the Connected Vehicle Pilot Deployment Program Phase 1, Tampa Hillsborough Expressway Authority, outlines the goals and objectives for the Pilot as well as the proposed performance metrics. ...
The LSST metrics analysis framework (MAF)
NASA Astrophysics Data System (ADS)
Jones, R. L.; Yoachim, Peter; Chandrasekharan, Srinivasan; Connolly, Andrew J.; Cook, Kem H.; Ivezic, Željko; Krughoff, K. S.; Petry, Catherine; Ridgway, Stephen T.
2014-07-01
We describe the Metrics Analysis Framework (MAF), an open-source python framework developed to provide a user-friendly, customizable, easily-extensible set of tools for analyzing data sets. MAF is part of the Large Synoptic Survey Telescope (LSST) Simulations effort. Its initial goal is to provide a tool to evaluate LSST Operations Simulation (OpSim) simulated surveys to help understand the effects of telescope scheduling on survey performance, however MAF can be applied to a much wider range of datasets. The building blocks of the framework are Metrics (algorithms to analyze a given quantity of data), Slicers (subdividing the overall data set into smaller data slices as relevant for each Metric), and Database classes (to access the dataset and read data into memory). We describe how these building blocks work together, and provide an example of using MAF to evaluate different dithering strategies. We also outline how users can write their own custom Metrics and use these within the framework.
Metrics for linear kinematic features in sea ice
NASA Astrophysics Data System (ADS)
Levy, G.; Coon, M.; Sulsky, D.
2006-12-01
The treatment of leads as cracks or discontinuities (see Coon et al. presentation) requires some shift in the procedure of evaluation and comparison of lead-resolving models and their validation against observations. Common metrics used to evaluate ice model skill are by and large an adaptation of a least-squares "metric" adopted from operational numerical weather prediction data assimilation systems and are most appropriate for continuous fields and Eulerian systems where the observations and predictions are commensurate. However, this class of metrics suffers from some flaws in areas of sharp gradients and discontinuities (e.g., leads) and when Lagrangian treatments are more natural. After a brief review of these metrics and their performance in areas of sharp gradients, we present two new metrics specifically designed to measure model accuracy in representing linear features (e.g., leads). The indices developed circumvent the requirement that both the observations and model variables be commensurate (i.e., measured with the same units) by considering the frequencies of the features of interest/importance. We illustrate the metrics by scoring several hypothetical "simulated" discontinuity fields against the lead interpreted from RGPS observations.
Sound quality evaluation of air conditioning sound rating metric
NASA Astrophysics Data System (ADS)
Hodgdon, Kathleen K.; Peters, Jonathan A.; Burkhardt, Russell C.; Atchley, Anthony A.; Blood, Ingrid M.
2003-10-01
A product's success can depend on its acoustic signature as much as on the product's performance. The consumer's perception can strongly influence their satisfaction with and confidence in the product. A metric that can rate the content of the spectrum, and predict its consumer preference, is a valuable tool for manufacturers. The current method of assessing acoustic signatures from residential air conditioning units is defined in the Air Conditioning and Refrigeration Institute (ARI 270) 1995 Standard for Sound Rating of Outdoor Unitary Equipment. The ARI 270 metric, and modified versions of that metric, were implemented in software with the flexibility to modify the features applied. Numerous product signatures were analyzed to generate a set of synthesized spectra that targeted spectral configurations that challenged the metric's abilities. A subjective jury evaluation was conducted to establish the consumer preference for those spectra. Statistical correlations were conducted to assess the degree of relationship between the subjective preferences and the various metric calculations. Recommendations were made for modifications to improve the current metric's ability to predict subjective preference. [Research supported by the Air Conditioning and Refrigeration Institute.]
Weykamp, Cas; John, Garry; Gillery, Philippe; English, Emma; Ji, Linong; Lenters-Westra, Erna; Little, Randie R.; Roglic, Gojka; Sacks, David B.; Takei, Izumi
2016-01-01
Background A major objective of the IFCC Task Force on implementation of HbA1c standardization is to develop a model to define quality targets for HbA1c. Methods Two generic models, the Biological Variation and Sigma-metrics model, are investigated. Variables in the models were selected for HbA1c and data of EQA/PT programs were used to evaluate the suitability of the models to set and evaluate quality targets within and between laboratories. Results In the biological variation model 48% of individual laboratories and none of the 26 instrument groups met the minimum performance criterion. In the Sigma-metrics model, with a total allowable error (TAE) set at 5 mmol/mol (0.46% NGSP) 77% of the individual laboratories and 12 of 26 instrument groups met the 2 sigma criterion. Conclusion The Biological Variation and Sigma-metrics model were demonstrated to be suitable for setting and evaluating quality targets within and between laboratories. The Sigma-metrics model is more flexible as both the TAE and the risk of failure can be adjusted to requirements related to e.g. use for diagnosis/monitoring or requirements of (inter)national authorities. With the aim of reaching international consensus on advice regarding quality targets for HbA1c, the Task Force suggests the Sigma-metrics model as the model of choice with default values of 5 mmol/mol (0.46%) for TAE, and risk levels of 2 and 4 sigma for routine laboratories and laboratories performing clinical trials, respectively. These goals should serve as a starting point for discussion with international stakeholders in the field of diabetes. PMID:25737535
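A sketch of the Sigma-metrics calculation with the default total allowable error suggested above; the bias and imprecision figures are illustrative EQA-style inputs, not data from the paper.

```python
def sigma_metric(tae, bias, sd):
    """Sigma-metrics model: sigma = (TAE - |bias|) / SD, all in the same units."""
    return (tae - abs(bias)) / sd

# Illustrative HbA1c performance in IFCC units (mmol/mol):
tae = 5.0    # total allowable error suggested as the default
bias = 1.2   # systematic deviation from the target value
sd = 1.5     # imprecision (standard deviation)

sigma = sigma_metric(tae, bias, sd)
verdict = "meets" if sigma >= 2 else "fails"
print(f"sigma = {sigma:.1f} -> {verdict} the 2-sigma criterion for routine laboratories")
```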
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms-Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
NASA Astrophysics Data System (ADS)
Madugundu, Rangaswamy; Al-Gaadi, Khalid A.; Tola, ElKamil; Hassaballa, Abdalhaleem A.; Patil, Virupakshagouda C.
2017-12-01
Accurate estimation of evapotranspiration (ET) is essential for hydrological modeling and efficient crop water management in hyper-arid climates. In this study, we applied the METRIC algorithm on Landsat-8 images, acquired from June to October 2013, for the mapping of ET of a 50 ha center-pivot irrigated alfalfa field in the eastern region of Saudi Arabia. The METRIC-estimated energy balance components and ET were evaluated against the data provided by an eddy covariance (EC) flux tower installed in the field. Results indicated that the METRIC algorithm provided accurate ET estimates over the study area, with RMSE values of 0.13 and 4.15 mm d-1. The METRIC algorithm was observed to perform better in full canopy conditions compared to partial canopy conditions. On average, the METRIC algorithm overestimated the hourly ET by 6.6 % in comparison to the EC measurements; however, the daily ET was underestimated by 4.2 %.
Screening for dry eye disease using infrared ocular thermography.
Tan, Li Li; Sanjay, Srinivasan; Morgan, Philip B
2016-12-01
To evaluate the efficacy of infrared (IR) ocular thermography in screening for dry eye disease (DED), IR ocular thermography was performed on 62 dry eye and 63 age- and sex-matched control subjects. Marking of the ocular surface and temperature acquisition were done using a novel 'diamond' demarcation method. Thirty static and 30 dynamic metrics were studied and receiver operating characteristic curves were plotted. The efficacy of the temperature metrics in detecting DED was evaluated singly and in combination in terms of area under the curve (AUC), Youden's index and discrimination power (DP). The absolute temperatures of the extreme nasal conjunctiva 5 s and 10 s after eye opening were the best detectors for DED. With the threshold value for the first metric set at 34.7°C, sensitivity and specificity were 87.1% (95% CI: 76.2-94.3%) and 50.8% (95% CI: 37.9-63.6%), respectively. With the threshold value for the second metric set at 34.5°C, sensitivity and specificity were 77.6% (95% CI: 64.7-87.5%) and 61.9% (95% CI: 48.8-73.9%), respectively. The two metrics had moderate accuracy and limited performance, with AUCs of 72% (95% CI: 63-81%) and 73% (95% CI: 64-82%), Youden indices of about 0.4, and DPs of 1.07 and 1.05, respectively. None of the dynamic metrics was a good detector for DED. Combining metrics did not increase the AUC. This work suggests some utility for the application of IR ocular thermography in the evaluation of dry eye patients. Copyright © 2016 British Contact Lens Association. Published by Elsevier Ltd. All rights reserved.
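The screening statistics reported above (AUC, Youden's index, sensitivity/specificity at a temperature threshold) can be reproduced with scikit-learn; the conjunctival temperatures and group sizes below are synthetic stand-ins.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
# Synthetic nasal-conjunctiva temperatures: dry-eye subjects slightly warmer on average.
temps = np.concatenate([rng.normal(34.9, 0.4, 62),    # dry eye group
                        rng.normal(34.6, 0.4, 63)])   # control group
labels = np.concatenate([np.ones(62), np.zeros(63)])  # 1 = dry eye

auc = roc_auc_score(labels, temps)
fpr, tpr, thresholds = roc_curve(labels, temps)
best = np.argmax(tpr - fpr)                            # maximize Youden's index
print(f"AUC = {auc:.2f}, threshold = {thresholds[best]:.2f} C, "
      f"sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f}")
```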
Methodology to Calculate the ACE and HPQ Metrics Used in the Wave Energy Prize
DOE Office of Scientific and Technical Information (OSTI.GOV)
Driscoll, Frederick R; Weber, Jochem W; Jenne, Dale S
The U.S. Department of Energy's Wave Energy Prize Competition encouraged the development of innovative deep-water wave energy conversion technologies that at least doubled device performance above the 2014 state of the art. Because levelized cost of energy (LCOE) metrics are challenging to apply equitably to new technologies where significant uncertainty exists in design and operation, the prize technical team developed a reduced metric as a proxy for LCOE, which provides an equitable comparison of low technology readiness level wave energy converter (WEC) concepts. The metric is called 'ACE', which is short for the ratio of the average climate capture width to the characteristic capital expenditure. The methodology and application of the ACE metric used to evaluate the performance of the technologies that competed in the Wave Energy Prize are explained in this report.
State of the States 2009. Renewable Energy Development and the Role of Policy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doris, Elizabeth; McLaren, Joyce; Healey, Victoria
2009-10-01
This report tracks the progress of U.S. renewable energy development at the state level, with metrics on development status and reviews of relevant policies. The analysis offers state-by-state policy suggestions and develops performance-based evaluation metrics to accelerate and improve renewable energy development.
Using measures of information content and complexity of time series as hydrologic metrics
USDA-ARS?s Scientific Manuscript database
Information theory has previously been used to develop metrics that allow characterization of temporal patterns in soil moisture dynamics, and evaluation and comparison of the performance of soil water flow models. The objective of this study was to apply information and complexity measures to characte...
NASA Astrophysics Data System (ADS)
Crowe, S. B.; Kairn, T.; Middlebrook, N.; Sutherland, B.; Hill, B.; Kenny, J.; Langton, C. M.; Trapp, J. V.
2015-03-01
This study aimed to provide a detailed evaluation and comparison of a range of modulated beam evaluation metrics, in terms of their correlation with QA testing results and their variation between treatment sites, for a large number of treatments. Ten metrics including the modulation index (MI), fluence map complexity, modulation complexity score (MCS), mean aperture displacement (MAD) and small aperture score (SAS) were evaluated for 546 beams from 122 intensity modulated radiotherapy (IMRT) and volumetric modulated arc therapy (VMAT) treatment plans targeting the anus, rectum, endometrium, brain, head and neck and prostate. The calculated sets of metrics were evaluated in terms of their relationships to each other and their correlation with the results of electronic portal imaging based quality assurance (QA) evaluations of the treatment beams. Evaluation of the MI, MAD and SAS suggested that beams used in treatments of the anus, rectum, head and neck were more complex than the prostate and brain treatment beams. Seven of the ten beam complexity metrics were found to be strongly correlated with the results from QA testing of the IMRT beams (p < 0.00008). For example, values of SAS (with multileaf collimator apertures narrower than 10 mm defined as ‘small’) less than 0.2 also identified QA passing IMRT beams with 100% specificity. However, few of the metrics are correlated with the results from QA testing of the VMAT beams, whether they were evaluated as whole 360° arcs or as 60° sub-arcs. Select evaluation of beam complexity metrics (at least MI, MCS and SAS) is therefore recommended, as an intermediate step in the IMRT QA chain. Such evaluation may also be useful as a means of periodically reviewing VMAT planning or optimiser performance.
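One simplified reading of the small aperture score (SAS) mentioned above is the fraction of open MLC leaf-pair gaps narrower than 10 mm, averaged over control points. The published definition may weight apertures differently, so treat this as an assumption-laden sketch; the gap values are invented.

```python
import numpy as np

def small_aperture_score(leaf_gaps_mm, threshold_mm=10.0):
    """Fraction of open MLC leaf-pair gaps narrower than the threshold.

    leaf_gaps_mm: sequence of per-control-point arrays of leaf-pair gaps (mm);
                  closed pairs (gap == 0) are ignored.
    """
    per_cp = []
    for gaps in leaf_gaps_mm:
        gaps = np.asarray(gaps, dtype=float)
        open_pairs = gaps[gaps > 0]
        if open_pairs.size:
            per_cp.append(np.mean(open_pairs < threshold_mm))
    return float(np.mean(per_cp)) if per_cp else 0.0

# Two hypothetical control points of an IMRT beam.
beam = [np.array([0.0, 4.0, 12.0, 25.0]),
        np.array([6.0, 8.0, 30.0, 0.0])]
print(small_aperture_score(beam))   # higher values indicate a more complex beam
```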
Evaluating Modeled Impact Metrics for Human Health, Agriculture Growth, and Near-Term Climate
NASA Astrophysics Data System (ADS)
Seltzer, K. M.; Shindell, D. T.; Faluvegi, G.; Murray, L. T.
2017-12-01
Simulated metrics that assess impacts on human health, agriculture growth, and near-term climate were evaluated using ground-based and satellite observations. The NASA GISS ModelE2 and GEOS-Chem models were used to simulate the near-present chemistry of the atmosphere. A suite of simulations that varied by model, meteorology, horizontal resolution, emissions inventory, and emissions year was performed, enabling an analysis of metric sensitivities to various model components. All simulations utilized consistent anthropogenic global emissions inventories (ECLIPSE V5a or CEDS), and an evaluation of simulated results was carried out for 2004-2006 and 2009-2011 over the United States and 2014-2015 over China. Results for O3- and PM2.5-based metrics featured minor differences due to the model resolutions considered here (2.0° × 2.5° and 0.5° × 0.666°); model, meteorology, and emissions inventory each played larger roles in variances. Surface metrics related to O3 were consistently high biased, though to varying degrees, demonstrating the need to evaluate particular modeling frameworks before O3 impacts are quantified. Surface metrics related to PM2.5 were diverse, indicating that a multimodel mean with robust results is a valuable tool in predicting PM2.5-related impacts. Oftentimes, the configuration that captured the change of a metric best over time differed from the configuration that captured the magnitude of the same metric best, demonstrating the challenge in skillfully simulating impacts. These results highlight the strengths and weaknesses of these models in simulating impact metrics related to air quality and near-term climate. With such information, the reliability of historical and future simulations can be better understood.
PET and MRI image fusion based on combination of 2-D Hilbert transform and IHS method.
Haddadpour, Mozhdeh; Daneshvar, Sabalan; Seyedarabi, Hadi
2017-08-01
The process of medical image fusion combines two or more medical images, such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET), and maps them to a single fused image. The purpose of our study is to assist physicians in diagnosing and treating diseases in the least amount of time. We used MRI and PET images as inputs and fused them based on a combination of the two-dimensional Hilbert transform (2-D HT) and the Intensity Hue Saturation (IHS) method. The evaluation metrics that we apply are Discrepancy (Dk) for assessing spectral features, Average Gradient (AGk) for evaluating spatial features, and Overall Performance (O.P) to verify the suitability of the proposed method. Simulated and numerical results demonstrate the desired performance of the proposed method. Since the main purpose of medical image fusion is preserving both the spatial and spectral features of the input images, it can be concluded, based on the numerical results of the evaluation metrics (Average Gradient (AGk), Discrepancy (Dk) and Overall Performance (O.P)) and the desired simulated results, that our proposed method can preserve both spatial and spectral features of the input images. Copyright © 2017 Chang Gung University. Published by Elsevier B.V. All rights reserved.
Application of Support Vector Machine to Forex Monitoring
NASA Astrophysics Data System (ADS)
Kamruzzaman, Joarder; Sarker, Ruhul A.
Previous studies have demonstrated superior performance of artificial neural network (ANN) based forex forecasting models over traditional regression models. This paper applies support vector machines to build a forecasting model from historical data using six simple technical indicators and presents a comparison with an ANN-based model trained by the scaled conjugate gradient (SCG) learning algorithm. The models are evaluated and compared on the basis of five commonly used performance metrics that measure closeness of prediction as well as correctness in directional change. Forecasting results for six different currencies against the Australian dollar reveal superior performance of the SVM model using a simple linear kernel over the ANN-SCG model in terms of all the evaluation metrics. The effect of SVM parameter selection on prediction performance is also investigated and analyzed.
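A sketch of the forecasting setup described above: a support vector machine with a linear kernel regressing the next exchange-rate value on a couple of technical indicators. The random-walk series and the two moving-average indicators are synthetic stand-ins, not the paper's data or its six indicators.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
rate = np.cumsum(rng.normal(0, 0.003, 600)) + 0.75     # synthetic exchange-rate series

def moving_average(x, w):
    return np.convolve(x, np.ones(w) / w, mode="valid")

# Two simple technical indicators (stand-ins): 5- and 20-step moving averages.
ma5, ma20 = moving_average(rate, 5)[-580:], moving_average(rate, 20)[-580:]
X = np.column_stack([ma5, ma20])
y = rate[-580:]
X, y = X[:-1], y[1:]                                   # predict one step ahead

split = 500
model = SVR(kernel="linear", C=1.0).fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("MAE:", mean_absolute_error(y[split:], pred))
# Directional accuracy: fraction of correctly predicted up/down moves.
print("Direction:", np.mean(np.sign(np.diff(pred)) == np.sign(np.diff(y[split:]))))
```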
Objective Situation Awareness Measurement Based on Performance Self-Evaluation
NASA Technical Reports Server (NTRS)
DeMaio, Joe
1998-01-01
The research was conducted in support of the NASA Safe All-Weather Flight Operations for Rotorcraft (SAFOR) program. The purpose of the work was to investigate the utility of two measurement tools developed by the British Defense Evaluation Research Agency. These tools were a subjective workload assessment scale, the DRA Workload Scale, and a situation awareness measurement tool. The situation awareness tool uses a comparison of the crew's self-evaluation of performance against actual performance in order to determine what information the crew attended to during the performance. These two measurement tools were evaluated in the context of a test of an innovative approach to alerting the crew by way of a helmet-mounted display. The situation assessment data are reported here. The performance self-evaluation metric of situation awareness was found to be highly effective. It was used to evaluate situation awareness on a tank reconnaissance task, a tactical navigation task, and a stylized task used to evaluate handling qualities. Using the self-evaluation metric, it was possible to evaluate situation awareness without exact knowledge of the relevant information in some cases and to identify information to which the crew attended, or failed to attend, in others.
Engineering performance metrics
NASA Astrophysics Data System (ADS)
Delozier, R.; Snyder, N.
1993-03-01
Implementation of a Total Quality Management (TQM) approach to engineering work required the development of a system of metrics which would serve as a meaningful management tool for evaluating effectiveness in accomplishing project objectives and in achieving improved customer satisfaction. A team effort was chartered with the goal of developing a system of engineering performance metrics which would measure customer satisfaction, quality, cost effectiveness, and timeliness. The approach to developing this system involved the normal systems design phases, including conceptual design, detailed design, implementation, and integration. The lessons learned from this effort will be explored in this paper. These lessons learned may provide a starting point for other large engineering organizations seeking to institute a performance measurement system for accomplishing project objectives and achieving improved customer satisfaction. To facilitate this effort, a team was chartered to assist in the development of the metrics system. This team, consisting of customers and Engineering staff members, was utilized to ensure that the needs and views of the customers were considered in the development of performance measurements. The development of a system of metrics is no different than the development of any type of system. It includes the steps of defining performance measurement requirements, measurement process conceptual design, performance measurement and reporting system detailed design, and system implementation and integration.
Research on quality metrics of wireless adaptive video streaming
NASA Astrophysics Data System (ADS)
Li, Xuefei
2018-04-01
With the development of wireless networks and intelligent terminals, video traffic has increased dramatically. Adaptive video streaming has become one of the most promising video transmission technologies. For this type of service, a good QoS (Quality of Service) of the wireless network does not always guarantee that all customers have a good experience. Thus, new quality metrics have been widely studied recently. Taking this into account, the objective of this paper is to investigate the quality metrics of wireless adaptive video streaming. In this paper, a wireless video streaming simulation platform with a DASH mechanism and a multi-rate video generator is established. Based on this platform, a PSNR model, an SSIM model and a Quality Level model are implemented. The Quality Level model considers QoE (Quality of Experience) factors such as image quality, stalling and switching frequency, while the PSNR model and the SSIM model mainly consider the quality of the video. To evaluate the performance of these QoE models, three performance metrics (SROCC, PLCC and RMSE), which are used to compare subjective and predicted MOS (Mean Opinion Score), are calculated. From these performance metrics, the monotonicity, linearity and accuracy of the quality metrics can be observed.
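The three performance statistics named above compare predicted and subjective MOS and can be computed with SciPy and NumPy; the scores below are invented.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

subjective_mos = np.array([1.8, 2.5, 3.1, 3.6, 4.2, 4.6])   # panel ratings
predicted_mos = np.array([2.0, 2.4, 3.4, 3.5, 4.0, 4.8])    # model output

srocc = spearmanr(subjective_mos, predicted_mos).correlation    # monotonicity
plcc = pearsonr(subjective_mos, predicted_mos)[0]               # linearity
rmse = np.sqrt(np.mean((subjective_mos - predicted_mos) ** 2))  # accuracy
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}  RMSE={rmse:.3f}")
```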
Han, Aaron L-F; Wong, Derek F; Chao, Lidia S; He, Liangye; Lu, Yi
2014-01-01
With the rapid development of machine translation (MT), MT evaluation has become very important for promptly telling us whether an MT system makes any progress. Conventional MT evaluation methods calculate the similarity between hypothesis translations offered by automatic translation systems and reference translations offered by professional translators. There are several weaknesses in existing evaluation metrics. Firstly, the designed factors are not comprehensive, resulting in a language-bias problem, which means the metrics perform well on some language pairs but weakly on other language pairs. Secondly, they tend to use either no linguistic features or too many linguistic features; using no linguistic features draws criticism from linguists, while using too many linguistic features makes the model weak in repeatability. Thirdly, the reference translations employed are very expensive and sometimes not available in practice. In this paper, the authors propose an unsupervised MT evaluation metric using a universal part-of-speech tagset without relying on reference translations. The authors also explore the performance of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation scores with human judgments.
DOT National Transportation Integrated Search
2015-09-01
Over the past few years, the Utah Department of Transportation (UDOT) has developed a system called the Signal Performance Metrics System (SPMS) to evaluate the performance of signalized intersections. This system currently provides data summarie...
Rank Order Entropy: why one metric is not enough
McLellan, Margaret R.; Ryan, M. Dominic; Breneman, Curt M.
2011-01-01
The use of Quantitative Structure-Activity Relationship models to address problems in drug discovery has a mixed history, generally resulting from the mis-application of QSAR models that were either poorly constructed or used outside of their domains of applicability. This situation has motivated the development of a variety of model performance metrics (r2, PRESS r2, F-tests, etc) designed to increase user confidence in the validity of QSAR predictions. In a typical workflow scenario, QSAR models are created and validated on training sets of molecules using metrics such as Leave-One-Out or many-fold cross-validation methods that attempt to assess their internal consistency. However, few current validation methods are designed to directly address the stability of QSAR predictions in response to changes in the information content of the training set. Since the main purpose of QSAR is to quickly and accurately estimate a property of interest for an untested set of molecules, it makes sense to have a means at hand to correctly set user expectations of model performance. In fact, the numerical value of a molecular prediction is often less important to the end user than knowing the rank order of that set of molecules according to their predicted endpoint values. Consequently, a means for characterizing the stability of predicted rank order is an important component of predictive QSAR. Unfortunately, none of the many validation metrics currently available directly measure the stability of rank order prediction, making the development of an additional metric that can quantify model stability a high priority. To address this need, this work examines the stabilities of QSAR rank order models created from representative data sets, descriptor sets, and modeling methods that were then assessed using Kendall Tau as a rank order metric, upon which the Shannon Entropy was evaluated as a means of quantifying rank-order stability. Random removal of data from the training set, also known as Data Truncation Analysis (DTA), was used as a means for systematically reducing the information content of each training set while examining both rank order performance and rank order stability in the face of training set data loss. The premise for DTA ROE model evaluation is that the response of a model to incremental loss of training information will be indicative of the quality and sufficiency of its training set, learning method, and descriptor types to cover a particular domain of applicability. This process is termed a “rank order entropy” evaluation, or ROE. By analogy with information theory, an unstable rank order model displays a high level of implicit entropy, while a QSAR rank order model which remains nearly unchanged during training set reductions would show low entropy. In this work, the ROE metric was applied to 71 data sets of different sizes, and was found to reveal more information about the behavior of the models than traditional metrics alone. Stable, or consistently performing models, did not necessarily predict rank order well. Models that performed well in rank order did not necessarily perform well in traditional metrics. In the end, it was shown that ROE metrics suggested that some QSAR models that are typically used should be discarded. ROE evaluation helps to discern which combinations of data set, descriptor set, and modeling methods lead to usable models in prioritization schemes, and provides confidence in the use of a particular model within a specific domain of applicability. PMID:21875058
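The Data Truncation Analysis idea described above lends itself to a toy sketch: repeatedly drop a fraction of the training data, refit, compute Kendall Tau between each truncated model's predicted ordering of a fixed test set and the full model's ordering, and summarize the spread with a Shannon entropy. The regression model, truncation fractions, and histogram binning below are illustrative assumptions, not the authors' protocol.

```python
# Toy Data Truncation Analysis (DTA) sketch for rank-order stability.
# Model, data, truncation fractions, and histogram bins are illustrative assumptions.
import numpy as np
from scipy.stats import kendalltau
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=200)
X_test = rng.normal(size=(50, 10))

full_pred = Ridge().fit(X, y).predict(X_test)   # predictions of the full-data model

taus = []
for frac in (0.1, 0.2, 0.3, 0.4):               # fractions of training data removed
    for _ in range(25):
        keep = rng.choice(len(X), size=int((1 - frac) * len(X)), replace=False)
        pred = Ridge().fit(X[keep], y[keep]).predict(X_test)
        tau, _ = kendalltau(full_pred, pred)     # agreement of predicted rank orders
        taus.append(tau)

# Shannon entropy of the tau distribution: low entropy ~ stable rank order.
counts, _ = np.histogram(taus, bins=10, range=(-1, 1))
p = counts[counts > 0] / counts.sum()
roe = -(p * np.log2(p)).sum()
print(f"mean tau={np.mean(taus):.3f}, rank-order entropy={roe:.3f} bits")
```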
Quality Measures for Dialysis: Time for a Balanced Scorecard.
Kliger, Alan S
2016-02-05
Recent federal legislation establishes a merit-based incentive payment system for physicians, with a scorecard for each professional. The Centers for Medicare and Medicaid Services evaluate quality of care with clinical performance measures and have used these metrics for public reporting and payment to dialysis facilities. Similar metrics may be used for the future merit-based incentive payment system. In nephrology, most clinical performance measures measure processes and intermediate outcomes of care. These metrics were developed from population studies of best practice and do not identify opportunities for individualizing care on the basis of patient characteristics and individual goals of treatment. The In-Center Hemodialysis (ICH) Consumer Assessment of Healthcare Providers and Systems (CAHPS) survey examines patients' perception of care and has entered the arena to evaluate quality of care. A balanced scorecard of quality performance should include three elements: population-based best clinical practice, patient perceptions, and individually crafted patient goals of care. Copyright © 2016 by the American Society of Nephrology.
Performance evaluation of objective quality metrics for HDR image compression
NASA Astrophysics Data System (ADS)
Valenzise, Giuseppe; De Simone, Francesca; Lauga, Paul; Dufaux, Frederic
2014-09-01
Due to the much larger luminance and contrast characteristics of high dynamic range (HDR) images, well-known objective quality metrics, widely used for the assessment of low dynamic range (LDR) content, cannot be directly applied to HDR images in order to predict their perceptual fidelity. To overcome this limitation, advanced fidelity metrics, such as the HDR-VDP, have been proposed to accurately predict visually significant differences. However, their complex calibration may make them difficult to use in practice. A simpler approach consists in computing arithmetic or structural fidelity metrics, such as PSNR and SSIM, on perceptually encoded luminance values but the performance of quality prediction in this case has not been clearly studied. In this paper, we aim at providing a better comprehension of the limits and the potentialities of this approach, by means of a subjective study. We compare the performance of HDR-VDP to that of PSNR and SSIM computed on perceptually encoded luminance values, when considering compressed HDR images. Our results show that these simpler metrics can be effectively employed to assess image fidelity for applications such as HDR image compression.
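The "simpler approach" the abstract refers to can be sketched as follows: apply a perceptual encoding to HDR luminance and then compute PSNR and SSIM on the encoded values. The log encoding below is only a stand-in for a PU/PQ-style transfer function, and the images are synthetic; both are assumptions for illustration.

```python
# Sketch: PSNR/SSIM on perceptually encoded HDR luminance.
# The log encoding stands in for a PU/PQ-style transfer function; the images are synthetic.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(1)
ref_lum = rng.uniform(0.01, 4000.0, size=(128, 128))          # HDR luminance, cd/m^2
dist_lum = np.clip(ref_lum + rng.normal(0, 20.0, ref_lum.shape), 0.01, None)

def encode(lum, lo=0.01, hi=4000.0):
    """Map luminance to [0, 1] on a log scale (illustrative perceptual encoding)."""
    return (np.log10(lum) - np.log10(lo)) / (np.log10(hi) - np.log10(lo))

ref_enc, dist_enc = encode(ref_lum), encode(dist_lum)
psnr = peak_signal_noise_ratio(ref_enc, dist_enc, data_range=1.0)
ssim = structural_similarity(ref_enc, dist_enc, data_range=1.0)
print(f"PSNR={psnr:.2f} dB  SSIM={ssim:.4f}")
```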
Performance analysis of three-dimensional ridge acquisition from live finger and palm surface scans
NASA Astrophysics Data System (ADS)
Fatehpuria, Abhishika; Lau, Daniel L.; Yalla, Veeraganesh; Hassebrook, Laurence G.
2007-04-01
Fingerprints are one of the most commonly used and relied-upon biometric technologies. But often the captured fingerprint image is far from ideal due to imperfect acquisition techniques that can be slow and cumbersome to use without providing complete fingerprint information. Most of the difficulties arise due to the contact of the fingerprint surface with the sensor platen. To overcome these difficulties we have been developing a noncontact scanning system for acquiring a 3-D scan of a finger with sufficiently high resolution, which is then converted into a 2-D rolled equivalent image. In this paper, we describe certain quantitative measures evaluating scanner performance. Specifically, we use some image software components developed by the National Institute of Standards and Technology to derive our performance metrics. Out of the eleven identified metrics, three were found to be most suitable for evaluating scanner performance. A comparison is also made between 2-D fingerprint images obtained by traditional means and the 2-D images obtained after unrolling the 3-D scans, and the quality of the acquired scans is quantified using the metrics.
Fu, Lawrence D.; Aphinyanaphongs, Yindalon; Wang, Lily; Aliferis, Constantin F.
2011-01-01
Evaluating the biomedical literature and health-related websites for quality are challenging information retrieval tasks. Current commonly used methods include impact factor for journals, PubMed’s clinical query filters and machine learning-based filter models for articles, and PageRank for websites. Previous work has focused on the average performance of these methods without considering the topic, and it is unknown how performance varies for specific topics or focused searches. Clinicians, researchers, and users should be aware when expected performance is not achieved for specific topics. The present work analyzes the behavior of these methods for a variety of topics. Impact factor, clinical query filters, and PageRank vary widely across different topics while a topic-specific impact factor and machine learning-based filter models are more stable. The results demonstrate that a method may perform excellently on average but struggle when used on a number of narrower topics. Topic adjusted metrics and other topic robust methods have an advantage in such situations. Users of traditional topic-sensitive metrics should be aware of their limitations. PMID:21419864
Important LiDAR metrics for discriminating forest tree species in Central Europe
NASA Astrophysics Data System (ADS)
Shi, Yifang; Wang, Tiejun; Skidmore, Andrew K.; Heurich, Marco
2018-03-01
Numerous airborne LiDAR-derived metrics have been proposed for classifying tree species. Yet an in-depth ecological and biological understanding of the significance of these metrics for tree species mapping remains largely unexplored. In this paper, we evaluated the performance of 37 frequently used LiDAR metrics derived under leaf-on and leaf-off conditions, respectively, for discriminating six different tree species in a natural forest in Germany. We firstly assessed the correlation between these metrics. Then we applied a Random Forest algorithm to classify the tree species and evaluated the importance of the LiDAR metrics. Finally, we identified the most important LiDAR metrics and tested their robustness and transferability. Our results indicated that about 60% of LiDAR metrics were highly correlated to each other (|r| > 0.7). There was no statistically significant difference in tree species mapping accuracy between the use of leaf-on and leaf-off LiDAR metrics. However, combining leaf-on and leaf-off LiDAR metrics significantly increased the overall accuracy from 58.2% (leaf-on) and 62.0% (leaf-off) to 66.5% as well as the kappa coefficient from 0.47 (leaf-on) and 0.51 (leaf-off) to 0.58. Radiometric features, especially intensity related metrics, provided more consistent and significant contributions than geometric features for tree species discrimination. Specifically, the mean intensity of first-or-single returns as well as the mean value of echo width were identified as the most robust LiDAR metrics for tree species discrimination. These results indicate that metrics derived from airborne LiDAR data, especially radiometric metrics, can aid in discriminating tree species in a mixed temperate forest, and represent candidate metrics for tree species classification and monitoring in Central Europe.
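The workflow described above (correlation screening at |r| > 0.7, Random Forest classification, accuracy and kappa, variable importance) can be sketched with scikit-learn. The LiDAR metric matrix and species labels below are synthetic placeholders, so the printed scores are not meaningful; only the structure of the analysis is illustrated.

```python
# Sketch of the described workflow: correlation screening, Random Forest, kappa, importances.
# The LiDAR metric matrix and species labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 37))                 # 37 candidate LiDAR metrics
y = rng.integers(0, 6, size=600)               # 6 tree species (random labels here)

# Drop one metric from every pair with |r| > 0.7.
corr = np.corrcoef(X, rowvar=False)
keep = []
for j in range(X.shape[1]):
    if all(abs(corr[j, k]) <= 0.7 for k in keep):
        keep.append(j)
X_sel = X[:, keep]

X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print("kept metrics:", len(keep))
print(f"overall accuracy={accuracy_score(y_te, pred):.3f}  kappa={cohen_kappa_score(y_te, pred):.3f}")
print("top importance indices:", np.argsort(rf.feature_importances_)[::-1][:5])
```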
Energy-Based Metrics for Arthroscopic Skills Assessment.
Poursartip, Behnaz; LeBel, Marie-Eve; McCracken, Laura C; Escoto, Abelardo; Patel, Rajni V; Naish, Michael D; Trejos, Ana Luisa
2017-08-05
Minimally invasive skills assessment methods are essential in developing efficient surgical simulators and implementing consistent skills evaluation. Although numerous methods have been investigated in the literature, there is still a need to further improve the accuracy of surgical skills assessment. Energy expenditure can be an indication of motor skills proficiency. The goals of this study are to develop objective metrics based on energy expenditure, normalize these metrics, and investigate classifying trainees using these metrics. To this end, different forms of energy consisting of mechanical energy and work were considered and their values were divided by the related value of an ideal performance to develop normalized metrics. These metrics were used as inputs for various machine learning algorithms including support vector machines (SVM) and neural networks (NNs) for classification. The accuracy of the combination of the normalized energy-based metrics with these classifiers was evaluated through a leave-one-subject-out cross-validation. The proposed method was validated using 26 subjects at two experience levels (novices and experts) in three arthroscopic tasks. The results showed that there are statistically significant differences between novices and experts for almost all of the normalized energy-based metrics. The accuracy of classification using SVM and NN methods was between 70% and 95% for the various tasks. The results show that the normalized energy-based metrics and their combination with SVM and NN classifiers are capable of providing accurate classification of trainees. The assessment method proposed in this study can enhance surgical training by providing appropriate feedback to trainees about their level of expertise and can be used in the evaluation of proficiency.
Geospace Environment Modeling 2008-2009 Challenge: Ground Magnetic Field Perturbations
NASA Technical Reports Server (NTRS)
Pulkkinen, A.; Kuznetsova, M.; Ridley, A.; Raeder, J.; Vapirev, A.; Weimer, D.; Weigel, R. S.; Wiltberger, M.; Millward, G.; Rastatter, L.;
2011-01-01
Acquiring quantitative metrics-based knowledge about the performance of various space physics modeling approaches is central for the space weather community. Quantification of the performance helps the users of the modeling products to better understand the capabilities of the models and to choose the approach that best suits their specific needs. Further, metrics-based analyses are important for addressing the differences between various modeling approaches and for measuring and guiding the progress in the field. In this paper, the metrics-based results of the ground magnetic field perturbation part of the Geospace Environment Modeling 2008-2009 Challenge are reported. Predictions made by 14 different models, including an ensemble model, are compared to geomagnetic observatory recordings from 12 different northern hemispheric locations. Five different metrics are used to quantify the model performances for four storm events. It is shown that the ranking of the models is strongly dependent on the type of metric used to evaluate the model performance. None of the models rank near or at the top systematically for all used metrics. Consequently, one cannot pick the absolute winner: the choice of the best model depends on the characteristics of the signal one is interested in. Model performances also vary from event to event. This is particularly clear for root-mean-square difference and utility metric-based analyses. Further, analyses indicate that for some of the models, increasing the global magnetohydrodynamic model spatial resolution and including the ring current dynamics improve the models' capability to generate more realistic ground magnetic field fluctuations.
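The central point that model ranking depends on the chosen metric can be illustrated with a small sketch: two synthetic "model" time series are scored against a synthetic ground magnetic observation with RMS difference and with linear correlation, and the two metrics favor different models. The signals, units, and metrics are invented for illustration only.

```python
# Sketch: the same two "models" can rank differently under different skill metrics.
# The observed and modeled time series are synthetic, purely for illustration.
import numpy as np

t = np.linspace(0, 24, 1440)                       # 24 h at 1-min cadence
obs = 50 * np.sin(2 * np.pi * t / 6) + 10 * np.random.default_rng(3).normal(size=t.size)

model_a = 50 * np.sin(2 * np.pi * t / 6)           # right amplitude, no fluctuations
model_b = 0.6 * obs + 15                            # damped but tracks fluctuations

def rmsd(m, o):
    return np.sqrt(np.mean((m - o) ** 2))

for name, m in [("A", model_a), ("B", model_b)]:
    r = np.corrcoef(m, obs)[0, 1]
    print(f"model {name}: RMS diff={rmsd(m, obs):.1f} nT  correlation={r:.3f}")
# Here model A wins on RMS difference while model B wins on correlation.
```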
Devriendt, Floris; Moldovan, Darie; Verbeke, Wouter
2018-03-01
Prescriptive analytics extends predictive analytics by estimating an outcome as a function of control variables, which makes it possible to establish the level of control variables required to realize a desired outcome. Uplift modeling is at the heart of prescriptive analytics and aims at estimating the net difference in an outcome resulting from a specific action or treatment that is applied. In this article, a structured and detailed literature survey on uplift modeling is provided by identifying and contrasting various groups of approaches. In addition, evaluation metrics for assessing the performance of uplift models are reviewed. An experimental evaluation on four real-world data sets provides further insight into their use. Uplift random forests are found to be consistently among the best performing techniques in terms of the Qini and Gini measures, although considerable variability in performance across the various data sets of the experiments is observed. In addition, uplift models are frequently observed to be unstable and display a strong variability in terms of performance across different folds in the cross-validation experimental setup. This potentially threatens their actual use for business applications. Moreover, it is found that the available evaluation metrics do not provide an intuitively understandable indication of the actual use and performance of a model. Specifically, existing evaluation metrics do not facilitate a comparison of uplift models and predictive models and evaluate performance either at an arbitrary cutoff or over the full spectrum of potential cutoffs. In conclusion, we highlight the instability of uplift models and the need for an application-oriented approach to assess uplift models as prime topics for further research.
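For readers unfamiliar with the Qini measure mentioned above, the sketch below computes a Qini-style curve and a simple area-based summary from predicted uplift scores, treatment flags, and binary outcomes. The incremental-gain formulation and the area summary are common conventions assumed here, not necessarily the authors' exact definitions; the data are synthetic.

```python
# Sketch: Qini-style curve and coefficient for an uplift model (incremental-gain form).
# Scores, treatment flags, and outcomes below are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(4)
n = 2000
treatment = rng.integers(0, 2, n)                   # 1 = treated, 0 = control
uplift_score = rng.normal(size=n)                   # model's predicted uplift
outcome = rng.binomial(1, 0.1 + 0.05 * treatment * (uplift_score > 0))

order = np.argsort(-uplift_score)                   # target highest predicted uplift first
t, y = treatment[order], outcome[order]

n_t, n_c = np.cumsum(t), np.cumsum(1 - t)
gain_t, gain_c = np.cumsum(y * t), np.cumsum(y * (1 - t))
ratio = np.divide(n_t, n_c, out=np.zeros(n), where=n_c > 0)
qini_curve = gain_t - gain_c * ratio                # incremental gains at each targeting depth

random_line = qini_curve[-1] * np.arange(1, n + 1) / n
qini_coefficient = (qini_curve - random_line).mean()   # average gain over random targeting
print(f"Qini-style coefficient (unnormalized): {qini_coefficient:.3f}")
```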
Guidelines for evaluating performance of oyster habitat restoration
Baggett, Lesley P.; Powers, Sean P.; Brumbaugh, Robert D.; Coen, Loren D.; DeAngelis, Bryan M.; Greene, Jennifer K.; Hancock, Boze T.; Morlock, Summer M.; Allen, Brian L.; Breitburg, Denise L.; Bushek, David; Grabowski, Jonathan H.; Grizzle, Raymond E.; Grosholz, Edwin D.; LaPeyre, Megan K.; Luckenbach, Mark W.; McGraw, Kay A.; Piehler, Michael F.; Westby, Stephanie R.; zu Ermgassen, Philine S. E.
2015-01-01
Restoration of degraded ecosystems is an important societal goal, yet inadequate monitoring and the absence of clear performance metrics are common criticisms of many habitat restoration projects. Funding limitations can prevent adequate monitoring, but we suggest that the lack of accepted metrics to address the diversity of restoration objectives also presents a serious challenge to the monitoring of restoration projects. A working group with experience in designing and monitoring oyster reef projects was used to develop standardized monitoring metrics, units, and performance criteria that would allow for comparison among restoration sites and projects of various construction types. A set of four universal metrics (reef areal dimensions, reef height, oyster density, and oyster size–frequency distribution) and a set of three universal environmental variables (water temperature, salinity, and dissolved oxygen) are recommended to be monitored for all oyster habitat restoration projects regardless of their goal(s). In addition, restoration goal-based metrics specific to four commonly cited ecosystem service-based restoration goals are recommended, along with an optional set of seven supplemental ancillary metrics that could provide information useful to the interpretation of prerestoration and postrestoration monitoring data. Widespread adoption of a common set of metrics with standardized techniques and units to assess well-defined goals not only allows practitioners to gauge the performance of their own projects but also allows for comparison among projects, which is both essential to the advancement of the field of oyster restoration and can provide new knowledge about the structure and ecological function of oyster reef ecosystems.
Synthesized view comparison method for no-reference 3D image quality assessment
NASA Astrophysics Data System (ADS)
Luo, Fangzhou; Lin, Chaoyi; Gu, Xiaodong; Ma, Xiaojun
2018-04-01
We develop a no-reference image quality assessment metric to evaluate the quality of synthesized view rendered from the Multi-view Video plus Depth (MVD) format. Our metric is named Synthesized View Comparison (SVC), which is designed for real-time quality monitoring at the receiver side in a 3D-TV system. The metric utilizes the virtual views in the middle which are warped from left and right views by Depth-image-based rendering algorithm (DIBR), and compares the difference between the virtual views rendered from different cameras by Structural SIMilarity (SSIM), a popular 2D full-reference image quality assessment metric. The experimental results indicate that our no-reference quality assessment metric for the synthesized images has competitive prediction performance compared with some classic full-reference image quality assessment metrics.
NASA Astrophysics Data System (ADS)
De Luccia, Frank J.; Houchin, Scott; Porter, Brian C.; Graybill, Justin; Haas, Evan; Johnson, Patrick D.; Isaacson, Peter J.; Reth, Alan D.
2016-05-01
The GOES-R Flight Project has developed an Image Navigation and Registration (INR) Performance Assessment Tool Set (IPATS) for measuring Advanced Baseline Imager (ABI) and Geostationary Lightning Mapper (GLM) INR performance metrics in the post-launch period for performance evaluation and long term monitoring. For ABI, these metrics are the 3-sigma errors in navigation (NAV), channel-to-channel registration (CCR), frame-to-frame registration (FFR), swath-to-swath registration (SSR), and within frame registration (WIFR) for the Level 1B image products. For GLM, the single metric of interest is the 3-sigma error in the navigation of background images (GLM NAV) used by the system to navigate lightning strikes. 3-sigma errors are estimates of the 99.73rd percentile of the errors accumulated over a 24 hour data collection period. IPATS utilizes a modular algorithmic design to allow user selection of data processing sequences optimized for generation of each INR metric. This novel modular approach minimizes duplication of common processing elements, thereby maximizing code efficiency and speed. Fast processing is essential given the large number of sub-image registrations required to generate INR metrics for the many images produced over a 24 hour evaluation period. Another aspect of the IPATS design that vastly reduces execution time is the off-line propagation of Landsat based truth images to the fixed grid coordinates system for each of the three GOES-R satellite locations, operational East and West and initial checkout locations. This paper describes the algorithmic design and implementation of IPATS and provides preliminary test results.
NASA Technical Reports Server (NTRS)
DeLuccia, Frank J.; Houchin, Scott; Porter, Brian C.; Graybill, Justin; Haas, Evan; Johnson, Patrick D.; Isaacson, Peter J.; Reth, Alan D.
2016-01-01
The GOES-R Flight Project has developed an Image Navigation and Registration (INR) Performance Assessment Tool Set (IPATS) for measuring Advanced Baseline Imager (ABI) and Geostationary Lightning Mapper (GLM) INR performance metrics in the post-launch period for performance evaluation and long term monitoring. For ABI, these metrics are the 3-sigma errors in navigation (NAV), channel-to-channel registration (CCR), frame-to-frame registration (FFR), swath-to-swath registration (SSR), and within frame registration (WIFR) for the Level 1B image products. For GLM, the single metric of interest is the 3-sigma error in the navigation of background images (GLM NAV) used by the system to navigate lightning strikes. 3-sigma errors are estimates of the 99.73rd percentile of the errors accumulated over a 24 hour data collection period. IPATS utilizes a modular algorithmic design to allow user selection of data processing sequences optimized for generation of each INR metric. This novel modular approach minimizes duplication of common processing elements, thereby maximizing code efficiency and speed. Fast processing is essential given the large number of sub-image registrations required to generate INR metrics for the many images produced over a 24 hour evaluation period. Another aspect of the IPATS design that vastly reduces execution time is the off-line propagation of Landsat based truth images to the fixed grid coordinates system for each of the three GOES-R satellite locations, operational East and West and initial checkout locations. This paper describes the algorithmic design and implementation of IPATS and provides preliminary test results.
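As described in the IPATS abstracts above, each "3-sigma" INR metric is simply the 99.73rd percentile of the errors accumulated over a 24-hour collection. A minimal sketch of that statistic follows; the error samples and units are synthetic assumptions.

```python
# Sketch: a "3-sigma" INR metric as the 99.73rd percentile of 24 h of accumulated errors.
# The error samples are synthetic; the microradian unit is illustrative.
import numpy as np

rng = np.random.default_rng(5)
nav_errors_urad = rng.normal(0.0, 5.0, size=20000)       # e.g., NAV errors over 24 h

three_sigma = np.percentile(np.abs(nav_errors_urad), 99.73)
print(f"NAV 3-sigma error: {three_sigma:.2f} urad")
```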
An Instrumented Glove to Assess Manual Dexterity in Simulation-Based Neurosurgical Education
Lemos, Juan Diego; Hernandez, Alher Mauricio; Soto-Romero, Georges
2017-01-01
The traditional neurosurgical apprenticeship scheme includes the assessment of trainees' manual skills carried out by experienced surgeons. However, the introduction of surgical simulation technology presents a new paradigm where residents can refine surgical techniques on a simulator before putting them into practice in real patients. Unfortunately, in this new scheme, an experienced surgeon will not always be available to evaluate trainees' performance. For this reason, it is necessary to develop automatic mechanisms to estimate metrics for assessing manual dexterity in a quantitative way. Authors have proposed some hardware-software approaches to evaluate manual dexterity on surgical simulators. This paper presents IGlove, a wearable device that uses inertial sensors embedded on an elastic glove to capture hand movements. Metrics to assess manual dexterity are estimated from sensor signals using data processing and information analysis algorithms. It has been designed to be used with a neurosurgical simulator called Daubara NS Trainer, but can be easily adapted to other benchtop- and manikin-based medical simulators. The system was tested with a sample of 14 volunteers who performed a test that was designed to simultaneously evaluate their fine motor skills and the IGlove's functionalities. Metrics obtained by each of the participants are presented as results in this work; it is also shown how these metrics are used to automatically evaluate the level of manual dexterity of each volunteer. PMID:28468268
Subjective Performance Evaluation in the Public Sector: Evidence from School Inspections. CEE DP 135
ERIC Educational Resources Information Center
Hussain, Iftikhar
2012-01-01
Performance measurement in the public sector is largely based on objective metrics, which may be subject to gaming behaviour. This paper investigates a novel subjective performance evaluation system where independent inspectors visit schools at very short notice, publicly disclose their findings and sanction schools rated fail. First, I…
Temporal Delineation and Quantification of Short Term Clustered Mining Seismicity
NASA Astrophysics Data System (ADS)
Woodward, Kyle; Wesseloo, Johan; Potvin, Yves
2017-07-01
The assessment of the temporal characteristics of seismicity is fundamental to understanding and quantifying the seismic hazard associated with mining, the effectiveness of strategies and tactics used to manage seismic hazard, and the relationship between seismicity and changes to the mining environment. This article aims to improve the accuracy and precision with which the temporal dimension of seismic responses can be quantified and delineated. We present a review and discussion on the occurrence of time-dependent mining seismicity with a specific focus on temporal modelling and the modified Omori law (MOL). This forms the basis for the development of a simple weighted metric that allows for the consistent temporal delineation and quantification of a seismic response. The optimisation of this metric allows for the selection of the most appropriate modelling interval given the temporal attributes of time-dependent mining seismicity. We evaluate the performance of the weighted metric for the modelling of a synthetic seismic dataset. This assessment shows that seismic responses can be quantified and delineated by the MOL, with reasonable accuracy and precision, when the modelling is optimised by evaluating the weighted MLE metric. Furthermore, this assessment highlights that decreased weighted MLE metric performance can be expected if there is a lack of contrast between the temporal characteristics of events associated with different processes.
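The modified Omori law referenced above models the event rate as n(t) = K/(t + c)^p. A small sketch of fitting those parameters to event times by maximizing the point-process log-likelihood is given below; the synthetic event catalogue, parameterization, and optimizer settings are illustrative assumptions rather than the authors' weighted MLE procedure.

```python
# Sketch: maximum-likelihood fit of a modified Omori law rate n(t) = K / (t + c)^p.
# The synthetic event times and optimizer choices are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
T = 100.0                                            # observation window (hours)
events = np.sort(rng.uniform(0, T, 300) ** 2 / T)    # synthetic times, clustered toward t = 0

def neg_log_likelihood(params):
    log_k, log_c, p = params
    k, c = np.exp(log_k), np.exp(log_c)
    rate = k / (events + c) ** p
    if p == 1.0:
        integral = k * (np.log(T + c) - np.log(c))
    else:
        integral = k * ((T + c) ** (1 - p) - c ** (1 - p)) / (1 - p)
    # Point-process log-likelihood: sum of log rates minus integrated rate.
    return -(np.sum(np.log(rate)) - integral)

res = minimize(neg_log_likelihood, x0=[np.log(50.0), np.log(1.0), 1.1], method="Nelder-Mead")
k_hat, c_hat, p_hat = np.exp(res.x[0]), np.exp(res.x[1]), res.x[2]
print(f"K={k_hat:.1f}  c={c_hat:.2f}  p={p_hat:.2f}")
```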
A strategy for evaluating pathway analysis methods.
Yu, Chenggang; Woo, Hyung Jun; Yu, Xueping; Oyama, Tatsuya; Wallqvist, Anders; Reifman, Jaques
2017-10-13
Researchers have previously developed a multitude of methods designed to identify biological pathways associated with specific clinical or experimental conditions of interest, with the aim of facilitating biological interpretation of high-throughput data. Before practically applying such pathway analysis (PA) methods, we must first evaluate their performance and reliability, using datasets where the pathways perturbed by the conditions of interest have been well characterized in advance. However, such 'ground truths' (or gold standards) are often unavailable. Furthermore, previous evaluation strategies that have focused on defining 'true answers' are unable to systematically and objectively assess PA methods under a wide range of conditions. In this work, we propose a novel strategy for evaluating PA methods independently of any gold standard, either established or assumed. The strategy involves the use of two mutually complementary metrics, recall and discrimination. Recall measures the consistency between the perturbed pathways identified by applying a particular analysis method to an original large dataset and those identified by the same method applied to a sub-dataset of the original dataset. In contrast, discrimination measures specificity: the degree to which the perturbed pathways identified by a particular method applied to a dataset from one experiment differ from those identified by the same method applied to a dataset from a different experiment. We used these metrics and 24 datasets to evaluate six widely used PA methods. The results highlighted the common challenge in reliably identifying significant pathways from small datasets. Importantly, we confirmed the effectiveness of our proposed dual-metric strategy by showing that previous comparative studies corroborate the performance evaluations of the six methods obtained by our strategy. Unlike any previously proposed strategy for evaluating the performance of PA methods, our dual-metric strategy does not rely on any ground truth, either established or assumed, of the pathways perturbed by a specific clinical or experimental condition. As such, our strategy allows researchers to systematically and objectively evaluate pathway analysis methods by employing any number of datasets for a variety of conditions.
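A toy sketch of the dual-metric idea follows: recall as the overlap between pathways found on a full dataset and on a sub-dataset, and discrimination as the degree of non-overlap between pathways found for two different experiments. The set-overlap formulas and pathway names are illustrative assumptions, not the paper's exact definitions.

```python
# Toy sketch of the dual-metric idea: recall (consistency) and discrimination (specificity).
# The set-overlap formulas and pathway names are illustrative assumptions.
def recall(full_dataset_hits, sub_dataset_hits):
    """Fraction of pathways found on the full dataset that persist on a sub-dataset."""
    if not full_dataset_hits:
        return 0.0
    return len(full_dataset_hits & sub_dataset_hits) / len(full_dataset_hits)

def discrimination(hits_experiment_a, hits_experiment_b):
    """Degree to which pathways from two unrelated experiments differ."""
    union = hits_experiment_a | hits_experiment_b
    if not union:
        return 0.0
    return 1.0 - len(hits_experiment_a & hits_experiment_b) / len(union)

full_a = {"apoptosis", "p53 signaling", "cell cycle", "DNA repair"}
sub_a  = {"apoptosis", "p53 signaling", "cell cycle"}
full_b = {"oxidative phosphorylation", "cell cycle", "TGF-beta signaling"}

print(f"recall(A, sub-A)     = {recall(full_a, sub_a):.2f}")
print(f"discrimination(A, B) = {discrimination(full_a, full_b):.2f}")
```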
Evaluation Metrics for Biostatistical and Epidemiological Collaborations
Rubio, Doris McGartland; del Junco, Deborah J.; Bhore, Rafia; Lindsell, Christopher J.; Oster, Robert A.; Wittkowski, Knut M.; Welty, Leah J.; Li, Yi-Ju; DeMets, Dave
2011-01-01
Increasing demands for evidence-based medicine and for the translation of biomedical research into individual and public health benefit have been accompanied by the proliferation of special units that offer expertise in biostatistics, epidemiology, and research design (BERD) within academic health centers. Objective metrics that can be used to evaluate, track, and improve the performance of these BERD units are critical to their successful establishment and sustainable future. To develop a set of reliable but versatile metrics that can be adapted easily to different environments and evolving needs, we consulted with members of BERD units from the consortium of academic health centers funded by the Clinical and Translational Science Award Program of the National Institutes of Health. Through a systematic process of consensus building and document drafting, we formulated metrics that covered the three identified domains of BERD practices: the development and maintenance of collaborations with clinical and translational science investigators, the application of BERD-related methods to clinical and translational research, and the discovery of novel BERD-related methodologies. In this article, we describe the set of metrics and advocate their use for evaluating BERD practices. The routine application, comparison of findings across diverse BERD units, and ongoing refinement of the metrics will identify trends, facilitate meaningful changes, and ultimately enhance the contribution of BERD activities to biomedical research. PMID:21284015
The five traps of performance measurement.
Likierman, Andrew
2009-10-01
Evaluating a company's performance often entails wading through a thicket of numbers produced by a few simple metrics, writes the author, and senior executives leave measurement to those whose specialty is spreadsheets. To take ownership of performance assessment, those executives should find qualitative, forward-looking measures that will help them avoid five common traps: Measuring against yourself. Find data from outside the company, and reward relative, rather than absolute, performance. Enterprise Rent-A-Car uses a service quality index to measure customers' repeat purchase intentions. Looking backward. Use measures that lead rather than lag the profits in your business. Humana, a health insurer, found that the sickest 10% of its patients account for 80% of its costs; now it offers customers incentives for early screening. Putting your faith in numbers. The soft drinks company Britvic evaluates its executive coaching program not by trying to assign it an ROI number but by tracking participants' careers for a year. Gaming your metrics. The law firm Clifford Chance replaced its single, easy-to-game metric of billable hours with seven criteria on which to base bonuses. Sticking to your numbers too long. Be precise about what you want to assess and explicit about what metrics are assessing it. Such clarity would have helped investors interpret the AAA ratings involved in the financial meltdown. Really good assessment will combine finance managers' relative independence with line managers' expertise.
Virtual reality simulator training for laparoscopic colectomy: what metrics have construct validity?
Shanmugan, Skandan; Leblanc, Fabien; Senagore, Anthony J; Ellis, C Neal; Stein, Sharon L; Khan, Sadaf; Delaney, Conor P; Champagne, Bradley J
2014-02-01
Virtual reality simulation for laparoscopic colectomy has been used for training of surgical residents and has been considered as a model for technical skills assessment of board-eligible colorectal surgeons. However, construct validity (the ability to distinguish between skill levels) must be confirmed before widespread implementation. This study was designed to specifically determine which metrics for laparoscopic sigmoid colectomy have evidence of construct validity. General surgeons who had performed fewer than 30 laparoscopic colon resections and laparoscopic colorectal experts (>200 laparoscopic colon resections) performed laparoscopic sigmoid colectomy on the LAP Mentor model. All participants received a 15-minute instructional warm-up and had never used the simulator before the study. Performance was then compared between each group for 21 metrics (procedural, 14; intraoperative errors, 7) to determine specifically which measurements demonstrate construct validity. Performance was compared with the Mann-Whitney U-test (p < 0.05 was significant). Fifty-three surgeons enrolled in the study: 29 general surgeons and 24 colorectal surgeons. The virtual reality simulators for laparoscopic sigmoid colectomy demonstrated construct validity for 8 of 14 procedural metrics by distinguishing levels of surgical experience (p < 0.05). The most discriminatory procedural metrics (p < 0.01) favoring experts were reduced instrument path length, accuracy of the peritoneal/medial mobilization, and dissection of the inferior mesenteric artery. Intraoperative errors were not discriminatory for most metrics and favored general surgeons for colonic wall injury (general surgeons, 0.7; colorectal surgeons, 3.5; p = 0.045). Individual variability within the general surgeon and colorectal surgeon groups was not accounted for. The virtual reality simulators for laparoscopic sigmoid colectomy demonstrated construct validity for 8 procedure-specific metrics. However, using virtual reality simulator metrics to detect intraoperative errors did not discriminate between groups. If the virtual reality simulator continues to be used for the technical assessment of trainees and board-eligible surgeons, the evaluation of performance should be limited to procedural metrics.
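The construct-validity test used above reduces to comparing a simulator metric between experience groups with a Mann-Whitney U test. A minimal sketch follows; the path-length values are synthetic placeholders, not study data.

```python
# Sketch: testing whether a simulator metric separates experience levels (construct validity).
# Path-length values are synthetic placeholders, not study data.
from scipy.stats import mannwhitneyu

general_surgeons_path_cm = [512, 488, 530, 495, 560, 540, 505]
colorectal_experts_path_cm = [410, 398, 452, 430, 405, 441]

stat, p_value = mannwhitneyu(general_surgeons_path_cm, colorectal_experts_path_cm,
                             alternative="two-sided")
print(f"U={stat:.1f}, p={p_value:.4f}  (p < 0.05 suggests the metric discriminates groups)")
```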
Implementation of statistical process control for proteomic experiments via LC MS/MS.
Bereman, Michael S; Johnson, Richard; Bollinger, James; Boss, Yuval; Shulman, Nick; MacLean, Brendan; Hoofnagle, Andrew N; MacCoss, Michael J
2014-04-01
Statistical process control (SPC) is a robust set of tools that aids in the visualization, detection, and identification of assignable causes of variation in any process that creates products, services, or information. A tool has been developed termed Statistical Process Control in Proteomics (SProCoP) which implements aspects of SPC (e.g., control charts and Pareto analysis) into the Skyline proteomics software. It monitors five quality control metrics in a shotgun or targeted proteomic workflow. None of these metrics require peptide identification. The source code, written in the R statistical language, runs directly from the Skyline interface, which supports the use of raw data files from several of the mass spectrometry vendors. It provides real time evaluation of the chromatographic performance (e.g., retention time reproducibility, peak asymmetry, and resolution), and mass spectrometric performance (targeted peptide ion intensity and mass measurement accuracy for high resolving power instruments) via control charts. Thresholds are experiment- and instrument-specific and are determined empirically from user-defined quality control standards that enable the separation of random noise and systematic error. Finally, Pareto analysis provides a summary of performance metrics and guides the user to metrics with high variance. The utility of these charts to evaluate proteomic experiments is illustrated in two case studies.
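The control-chart idea behind SProCoP can be illustrated with a small sketch: derive limits from user-defined QC runs and flag later runs that drift outside them. The retention-time values and the mean ± 3 SD threshold convention below are illustrative assumptions, not the tool's actual implementation.

```python
# Sketch of a Shewhart-style control chart for one QC metric (e.g., peptide retention time).
# Values and the mean +/- 3 SD threshold convention are illustrative, not SProCoP's code.
import numpy as np

baseline_rt_min = np.array([22.1, 22.0, 22.3, 21.9, 22.2, 22.1, 22.0, 22.2])  # QC standards
new_runs_rt_min = np.array([22.1, 22.4, 22.0, 23.4, 22.2])                     # monitored runs

center = baseline_rt_min.mean()
sigma = baseline_rt_min.std(ddof=1)
upper, lower = center + 3 * sigma, center - 3 * sigma

for i, rt in enumerate(new_runs_rt_min, start=1):
    flag = "OUT OF CONTROL" if not (lower <= rt <= upper) else "ok"
    print(f"run {i}: RT={rt:.1f} min  {flag}")
print(f"control limits: [{lower:.2f}, {upper:.2f}] min")
```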
Climate Data Analytics Workflow Management
NASA Astrophysics Data System (ADS)
Zhang, J.; Lee, S.; Pan, L.; Mattmann, C. A.; Lee, T. J.
2016-12-01
In this project we aim to pave a novel path to create a sustainable building block toward Earth science big data analytics and knowledge sharing. Closely studying how Earth scientists conduct data analytics research in their daily work, we have developed a provenance model to record their activities and a technology to automatically generate workflows for scientists from the provenance. On top of this, we have built a prototype data-centric provenance repository and established a PDSW (People, Data, Service, Workflow) knowledge network to support workflow recommendation. To ensure the scalability and performance of the expected recommendation system, we have leveraged the Apache OODT system technology. The community-approved, metrics-based performance evaluation web service will allow a user to select a metric from the list of several community-approved metrics and to evaluate model performance using the metric as well as the reference dataset. This service will facilitate the use of reference datasets that are generated in support of the model-data intercomparison projects such as Obs4MIPs and Ana4MIPs. The data-centric repository infrastructure will allow us to catch richer provenance to further facilitate knowledge sharing and scientific collaboration in the Earth science community. This project is part of the Apache incubator CMDA project.
NASA Astrophysics Data System (ADS)
Ciaramello, Francis M.; Hemami, Sheila S.
2007-02-01
For members of the Deaf Community in the United States, current communication tools include TTY/TTD services, video relay services, and text-based communication. With the growth of cellular technology, mobile sign language conversations are becoming a possibility. Proper coding techniques must be employed to compress American Sign Language (ASL) video for low-rate transmission while maintaining the quality of the conversation. In order to evaluate these techniques, an appropriate quality metric is needed. This paper demonstrates that traditional video quality metrics, such as PSNR, fail to predict subjective intelligibility scores. By considering the unique structure of ASL video, an appropriate objective metric is developed. Face and hand segmentation is performed using skin-color detection techniques. The distortions in the face and hand regions are optimally weighted and pooled across all frames to create an objective intelligibility score for a distorted sequence. The objective intelligibility metric performs significantly better than PSNR in terms of correlation with subjective responses.
Abriata, Luciano A; Kinch, Lisa N; Tamò, Giorgio E; Monastyrskyy, Bohdan; Kryshtafovych, Andriy; Dal Peraro, Matteo
2018-03-01
For assessment purposes, CASP targets are split into evaluation units. We herein present the official definition of CASP12 evaluation units (EUs) and their classification into difficulty categories. Each target can be evaluated as one EU (the whole target) and/or several EUs (separate structural domains or groups of structural domains). The specific scenario for a target split is determined by the domain organization of available templates, the difference in server performance on separate domains versus combination of the domains, and visual inspection. In the end, 71 targets were split into 96 EUs. Classification of the EUs into difficulty categories was done semi-automatically with the assistance of metrics provided by the Prediction Center. These metrics account for sequence and structural similarities of the EUs to potential structural templates from the Protein Data Bank, and for the baseline performance of automated server predictions. The metrics readily separate the 96 EUs into 38 EUs that should be straightforward for template-based modeling (TBM) and 39 that are expected to be hard for homology modeling and are thus left for free modeling (FM). The remaining 19 borderline evaluation units were dubbed FM/TBM and were inspected case by case. The article also overviews structural and evolutionary features of selected targets relevant to our accompanying article presenting the assessment of FM and FM/TBM predictions, as well as structural features of the hardest evaluation units from the FM category. We finally suggest improvements for the EU definition and classification procedures. © 2017 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loughran, B; Singh, V; Jain, A
Purpose: Although generalized linear system analytic metrics such as GMTF and GDQE can evaluate performance of the whole imaging system including detector, scatter and focal-spot, a simplified task-specific measured metric may help to better compare detector systems. Methods: Low quantum-noise images of a neuro-vascular stent with a modified ANSI head phantom were obtained from the average of many exposures taken with the high-resolution Micro-Angiographic Fluoroscope (MAF) and with a Flat Panel Detector (FPD). The square of the Fourier Transform of each averaged image, equivalent to the measured product of the system GMTF and the object function in spatial-frequency space, was then divided by the normalized noise power spectra (NNPS) for each respective system to obtain a task-specific generalized signal-to-noise ratio. A generalized measured relative object detectability (GM-ROD) was obtained by taking the ratio of the integral of the resulting expressions for each detector system to give an overall metric that enables a realistic systems comparison for the given detection task. Results: The GM-ROD provides comparison of relative performance of detector systems from actual measurements of the object function as imaged by those detector systems. This metric includes noise correlations and spatial frequencies relevant to the specific object. Additionally, the integration bounds for the GM-ROD can be selected to emphasize the higher frequency band of each detector if high-resolution image details are to be evaluated. Examples of this new metric are discussed with a comparison of the MAF to the FPD for neuro-vascular interventional imaging. Conclusion: The GM-ROD is a new direct-measured task-specific metric that can provide clinically relevant comparison of the relative performance of imaging systems. Supported by NIH Grant: 2R01EB002873 and an equipment grant from Toshiba Medical Systems Corporation.
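A rough numerical sketch of the GM-ROD construction as described: for each detector, take the squared Fourier transform of the exposure-averaged object image, divide by that detector's NNPS, integrate over a chosen frequency band, and form the ratio between detectors. The synthetic images, flat NNPS arrays, and band limits below are illustrative assumptions, not measured data.

```python
# Rough sketch of a GM-ROD-style comparison between two detector systems.
# Images, NNPS arrays, and the integration band are synthetic/illustrative assumptions.
import numpy as np

def generalized_snr_integral(avg_image, nnps, f_lo=0.0, f_hi=0.5):
    """Integrate |FT(averaged image)|^2 / NNPS over a radial spatial-frequency band."""
    ft2 = np.abs(np.fft.fftshift(np.fft.fft2(avg_image))) ** 2
    ny, nx = avg_image.shape
    fy = np.fft.fftshift(np.fft.fftfreq(ny))
    fx = np.fft.fftshift(np.fft.fftfreq(nx))
    radius = np.sqrt(fx[None, :] ** 2 + fy[:, None] ** 2)
    band = (radius >= f_lo) & (radius <= f_hi)
    return np.sum(ft2[band] / nnps[band])

rng = np.random.default_rng(7)
stent = np.zeros((256, 256)); stent[120:136, 60:200] = 1.0    # toy object function
avg_det1 = stent + 0.01 * rng.normal(size=stent.shape)        # low-noise averaged images
avg_det2 = stent + 0.02 * rng.normal(size=stent.shape)
nnps_det1 = np.full(stent.shape, 1e-4)                        # flat NNPS placeholders
nnps_det2 = np.full(stent.shape, 4e-4)

gm_rod = (generalized_snr_integral(avg_det1, nnps_det1) /
          generalized_snr_integral(avg_det2, nnps_det2))
print(f"GM-ROD (detector 1 vs detector 2): {gm_rod:.2f}")
```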
Thermodynamic efficiency of nonimaging concentrators
NASA Astrophysics Data System (ADS)
Shatz, Narkis; Bortz, John; Winston, Roland
2009-08-01
The purpose of a nonimaging concentrator is to transfer maximal flux from the phase space of a source to that of a target. A concentrator's performance can be expressed relative to a thermodynamic reference. We discuss consequences of Fermat's principle of geometrical optics. We review étendue dilution and optical loss mechanisms associated with nonimaging concentrators, especially for the photovoltaic (PV) role. We introduce the concept of optical thermodynamic efficiency which is a performance metric combining the first and second laws of thermodynamics. The optical thermodynamic efficiency is a comprehensive metric that takes into account all loss mechanisms associated with transferring flux from the source to the target phase space, which may include losses due to inadequate design, non-ideal materials, fabrication errors, and less than maximal concentration. As such, this metric is a gold standard for evaluating the performance of nonimaging concentrators. Examples are provided to illustrate the use of this new metric. In particular we discuss concentrating PV systems for solar power applications.
Dubin, Ariel K; Smith, Roger; Julian, Danielle; Tanaka, Alyssa; Mattingly, Patricia
To answer the question of whether there is a difference between robotic virtual reality simulator performance assessment and validated human reviewers. Current surgical education relies heavily on simulation. Several assessment tools are available to the trainee, including the actual robotic simulator assessment metrics and the Global Evaluative Assessment of Robotic Skills (GEARS) metrics, both of which have been independently validated. GEARS is a rating scale through which human evaluators can score trainees' performances on 6 domains: depth perception, bimanual dexterity, efficiency, force sensitivity, autonomy, and robotic control. Each domain is scored on a 5-point Likert scale with anchors. We used 2 common robotic simulators, the dV-Trainer (dVT; Mimic Technologies Inc., Seattle, WA) and the da Vinci Skills Simulator (dVSS; Intuitive Surgical, Sunnyvale, CA), to compare the performance metrics of robotic surgical simulators with the GEARS for a basic robotic task on each simulator. A prospective single-blinded randomized study. A surgical education and training center. Surgeons and surgeons in training. Demographic information was collected including sex, age, level of training, specialty, and previous surgical and simulator experience. Subjects performed 2 trials of ring and rail 1 (RR1) on each of the 2 simulators (dVSS and dVT) after undergoing randomization and warm-up exercises. The second RR1 trial simulator performance was recorded, and the deidentified videos were sent to human reviewers using GEARS. Eight different simulator assessment metrics were identified and paired with a similar performance metric in the GEARS tool. The GEARS evaluation scores and simulator assessment scores were paired and a Spearman rho calculated for their level of correlation. Seventy-four subjects were enrolled in this randomized study with 9 subjects excluded for missing or incomplete data. There was a strong correlation between the GEARS score and the simulator metric score for time to complete versus efficiency, time to complete versus total score, economy of motion versus depth perception, and overall score versus total score with rho coefficients greater than or equal to 0.70; these were significant (p < .0001). Those with weak correlation (rho ≥0.30) were bimanual dexterity versus economy of motion, efficiency versus master workspace range, bimanual dexterity versus master workspace range, and robotic control versus instrument collisions. On basic VR tasks, several simulator metrics are well matched with GEARS scores assigned by human reviewers, but others are not. Identifying these matches/mismatches can improve the training and assessment process when using robotic surgical simulators. Copyright © 2017 American Association of Gynecologic Laparoscopists. Published by Elsevier Inc. All rights reserved.
Human-centric predictive model of task difficulty for human-in-the-loop control tasks
Majewicz Fey, Ann
2018-01-01
Quantitatively measuring the difficulty of a manipulation task in human-in-the-loop control systems is ill-defined. Currently, systems are typically evaluated through task-specific performance measures and post-experiment user surveys; however, these methods do not capture the real-time experience of human users. In this study, we propose to analyze and predict the difficulty of a bivariate pointing task, with a haptic device interface, using human-centric measurement data in terms of cognition, physical effort, and motion kinematics. Noninvasive sensors were used to record the multimodal responses of 14 human subjects performing the task. A data-driven approach for predicting task difficulty was implemented based on several task-independent metrics. We compare four possible models for predicting task difficulty to evaluate the roles of the various types of metrics, including: (I) a movement time model, (II) a fusion model using both physiological and kinematic metrics, (III) a model only with kinematic metrics, and (IV) a model only with physiological metrics. The results show significant correlation between task difficulty and the user sensorimotor response. The fusion model, integrating user physiology and motion kinematics, provided the best estimate of task difficulty (R2 = 0.927), followed by a model using only kinematic metrics (R2 = 0.921). Both models were better predictors of task difficulty than the movement time model (R2 = 0.847), derived from Fitts' law, a well-studied difficulty model for human psychomotor control. PMID:29621301
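The comparison of the four candidate models reduces to fitting a regressor on different metric subsets and comparing R². The sketch below illustrates that setup; the features, difficulty labels, and the choice of a linear regressor are synthetic/illustrative assumptions, not the study's data or model.

```python
# Sketch: comparing difficulty-prediction models built from different metric subsets by R^2.
# Features, labels, and the linear regressor are synthetic/illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n = 300
movement_time = rng.normal(size=(n, 1))
kinematic = rng.normal(size=(n, 3))          # e.g., path length, jerk, speed variability
physiological = rng.normal(size=(n, 3))      # e.g., heart rate, EMG level, pupil diameter
difficulty = (0.8 * movement_time[:, 0] + kinematic @ [0.5, 0.3, 0.2]
              + physiological @ [0.4, 0.2, 0.1] + 0.3 * rng.normal(size=n))

models = {
    "movement time only": movement_time,
    "kinematic only": kinematic,
    "physiological only": physiological,
    "fusion (kinematic + physiological)": np.hstack([kinematic, physiological]),
}
for name, X in models.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, difficulty, test_size=0.3, random_state=0)
    r2 = r2_score(y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te))
    print(f"{name:38s} R^2 = {r2:.3f}")
```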
Fransson, Boel A; Chen, Chi-Ya; Noyes, Julie A; Ragle, Claude A
2016-11-01
To determine the construct and concurrent validity of instrument motion metrics for laparoscopic skills assessment in virtual reality and augmented reality simulators. Evaluation study. Veterinarian students (novice, n = 14) and veterinarians (experienced, n = 11) with no or variable laparoscopic experience. Participants' minimally invasive surgery (MIS) experience was determined by hospital records of MIS procedures performed in the Teaching Hospital. Basic laparoscopic skills were assessed by 5 tasks using a physical box trainer. Each participant completed 2 tasks for assessments in each type of simulator (virtual reality: bowel handling and cutting; augmented reality: object positioning and a pericardial window model). Motion metrics such as instrument path length, angle or drift, and economy of motion of each simulator were recorded. None of the motion metrics in a virtual reality simulator showed correlation with experience, or to the basic laparoscopic skills score. All metrics in augmented reality were significantly correlated with experience (time, instrument path, and economy of movement), except for the hand dominance metric. The basic laparoscopic skills score was correlated to all performance metrics in augmented reality. The augmented reality motion metrics differed between American College of Veterinary Surgeons diplomates and residents, whereas basic laparoscopic skills score and virtual reality metrics did not. Our results provide construct validity and concurrent validity for motion analysis metrics for an augmented reality system, whereas a virtual reality system was validated only for the time score. © Copyright 2016 by The American College of Veterinary Surgeons.
Prediction of user preference over shared-control paradigms for a robotic wheelchair.
Erdogan, Ahmetcan; Argall, Brenna D
2017-07-01
The design of intelligent powered wheelchairs has traditionally focused heavily on providing effective and efficient navigation assistance. Significantly less attention has been given to the end-user's preference between different assistance paradigms. It is possible to include these subjective evaluations in the design process, for example by soliciting feedback in post-experiment questionnaires. However, constantly querying the user for feedback during real-world operation is not practical. In this paper, we present a model that correlates objective performance metrics and subjective evaluations of autonomous wheelchair control paradigms. Using off-the-shelf machine learning techniques, we show that it is possible to build a model that can predict the most preferred shared-control method from task execution metrics such as effort, safety, performance and utilization. We further characterize the relative contributions of each of these metrics to the individual choice of most preferred assistance paradigm. Our evaluation includes Spinal Cord Injured (SCI) and uninjured subject groups. The results show that our proposed correlation model enables the continuous tracking of user preference and offers the possibility of autonomy that is customized to each user.
Voice based gender classification using machine learning
NASA Astrophysics Data System (ADS)
Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.
2017-11-01
Gender identification is one of the major problems in speech analysis today: tracing gender from acoustic data such as pitch, median, and frequency. Machine learning gives promising results for classification problems across research domains, and there are several performance metrics with which to evaluate algorithms in a given area. We present a comparative model for evaluating five different machine learning algorithms on eight different metrics for gender classification from acoustic data. The agenda is to identify gender with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM), on the basis of eight different metrics. The main consideration in evaluating any algorithm is its performance; in classification problems the misclassification rate must be low, which means the accuracy rate must be high. Location and gender of a person have become very important in economic markets in the form of AdSense. With this comparative model we assess the different ML algorithms and find the best fit for gender classification of acoustic data.
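The comparative setup described above can be sketched with scikit-learn: evaluate the five named classifiers with cross-validation on acoustic features. The feature matrix and labels below are synthetic, and only accuracy is shown out of the paper's eight metrics; both are simplifying assumptions.

```python
# Sketch: comparing LDA, KNN, CART, RF, and SVM for gender classification by cross-validation.
# The acoustic feature matrix is synthetic; only accuracy is computed here.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
X = rng.normal(size=(500, 8))                # e.g., pitch, median frequency, spectral features
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(int)  # binary label

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name:5s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```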
Generating One Biometric Feature from Another: Faces from Fingerprints
Ozkaya, Necla; Sagiroglu, Seref
2010-01-01
This study presents a new approach based on artificial neural networks for generating one biometric feature (faces) from another (only fingerprints). An automatic and intelligent system was designed and developed to analyze the relationships between fingerprints and faces and to model these relationships. The proposed system is the first study that generates all parts of the face, including eyebrows, eyes, nose, mouth, ears, and face border, from only fingerprints. It is also distinct from similar studies recently presented in the literature, offering several superior features. The parameter settings of the system were achieved with the help of the Taguchi experimental design technique. The performance and accuracy of the system were evaluated with a 10-fold cross-validation technique using qualitative evaluation metrics in addition to the expanded quantitative evaluation metrics. Consequently, the results were presented on the basis of the combination of these objective and subjective metrics, illustrating the qualitative properties of the proposed methods as well as providing a quantitative evaluation of their performance. Experimental results have shown that one biometric feature can be determined from another. These results once more indicate that there is a strong relationship between fingerprints and faces. PMID:22399877
Weber-aware weighted mutual information evaluation for infrared-visible image fusion
NASA Astrophysics Data System (ADS)
Luo, Xiaoyan; Wang, Shining; Yuan, Ding
2016-10-01
A performance metric for infrared and visible image fusion is proposed based on Weber's law. To indicate the stimulus of source images, two Weber components are provided. One is differential excitation to reflect the spectral signal of visible and infrared images, and the other is orientation to capture the scene structure feature. By comparing the corresponding Weber component in infrared and visible images, the source pixels can be marked with different dominant properties in intensity or structure. If the pixels have the same dominant property label, the pixels are grouped to calculate the mutual information (MI) on the corresponding Weber components between dominant source and fused images. Then, the final fusion metric is obtained via weighting the group-wise MI values according to the number of pixels in different groups. Experimental results demonstrate that the proposed metric performs well on popular image fusion cases and outperforms other image fusion metrics.
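The full Weber-aware metric involves differential-excitation and orientation components that are not spelled out in the abstract; the sketch below illustrates only the weighting idea, grouping pixels by which source dominates and weighting per-group mutual information by group size. The quantization, grouping labels, and use of mutual_info_score are simplifying assumptions.

```python
# Hedged sketch of the group-weighted mutual information idea only; the actual
# Weber components and grouping rule follow the paper and are not reproduced here.
import numpy as np
from sklearn.metrics import mutual_info_score

def group_weighted_mi(src_ir, src_vis, fused, groups, bins=64):
    """groups: integer label per pixel (0 = infrared-dominant, 1 = visible-dominant)."""
    def quantize(img):
        return np.digitize(img.ravel(), np.linspace(img.min(), img.max(), bins))
    score, total = 0.0, groups.size
    for label, src in ((0, src_ir), (1, src_vis)):
        mask = (groups.ravel() == label)
        if mask.any():
            mi = mutual_info_score(quantize(src)[mask], quantize(fused)[mask])
            score += (mask.sum() / total) * mi   # weight MI by group size
    return score
```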
NASA Technical Reports Server (NTRS)
Matic, Roy M.; Mosley, Judith I.
1994-01-01
Future space-based, remote sensing systems will have data transmission requirements that exceed available downlinks, necessitating the use of lossy compression techniques for multispectral data. In this paper, we describe several algorithms for lossy compression of multispectral data which combine spectral decorrelation techniques with an adaptive, wavelet-based, image compression algorithm to exploit both spectral and spatial correlation. We compare the performance of several different spectral decorrelation techniques including wavelet transformation in the spectral dimension. The performance of each technique is evaluated at compression ratios ranging from 4:1 to 16:1. Performance measures used are visual examination, conventional distortion measures, and multispectral classification results. We also introduce a family of distortion metrics that are designed to quantify and predict the effect of compression artifacts on multispectral classification of the reconstructed data.
Munro, Sarah A; Lund, Steven P; Pine, P Scott; Binder, Hans; Clevert, Djork-Arné; Conesa, Ana; Dopazo, Joaquin; Fasold, Mario; Hochreiter, Sepp; Hong, Huixiao; Jafari, Nadereh; Kreil, David P; Łabaj, Paweł P; Li, Sheng; Liao, Yang; Lin, Simon M; Meehan, Joseph; Mason, Christopher E; Santoyo-Lopez, Javier; Setterquist, Robert A; Shi, Leming; Shi, Wei; Smyth, Gordon K; Stralis-Pavese, Nancy; Su, Zhenqiang; Tong, Weida; Wang, Charles; Wang, Jian; Xu, Joshua; Ye, Zhan; Yang, Yong; Yu, Ying; Salit, Marc
2014-09-25
There is a critical need for standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments. Here we assess technical performance with a proposed standard 'dashboard' of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared among 12 laboratories with three different measurement processes demonstrates generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias are also comparable among laboratories for the same measurement process. We observe different biases for measurement processes using different mRNA-enrichment protocols.
Fu, Lawrence D; Aphinyanaphongs, Yindalon; Wang, Lily; Aliferis, Constantin F
2011-08-01
Evaluating the quality of the biomedical literature and of health-related websites is a challenging information retrieval task. Current commonly used methods include impact factor for journals, PubMed's clinical query filters and machine learning-based filter models for articles, and PageRank for websites. Previous work has focused on the average performance of these methods without considering the topic, and it is unknown how performance varies for specific topics or focused searches. Clinicians, researchers, and users should be aware when expected performance is not achieved for specific topics. The present work analyzes the behavior of these methods for a variety of topics. Impact factor, clinical query filters, and PageRank vary widely across different topics while a topic-specific impact factor and machine learning-based filter models are more stable. The results demonstrate that a method may perform excellently on average but struggle when used on a number of narrower topics. Topic-adjusted metrics and other topic robust methods have an advantage in such situations. Users of traditional topic-sensitive metrics should be aware of their limitations. Copyright © 2011 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Ribeiro, M. Gabriela T. C.; Machado, Adelio A. S. C.
2013-01-01
Two new semiquantitative green chemistry metrics, the green circle and the green matrix, have been developed for quick assessment of the greenness of a chemical reaction or process, even without performing the experiment, directly from a protocol if enough detail is provided in it. The evaluation is based on the 12 principles of green chemistry. The…
Evaluating structural pattern recognition for handwritten math via primitive label graphs
NASA Astrophysics Data System (ADS)
Zanibbi, Richard; Mouchère, Harold; Viard-Gaudin, Christian
2013-01-01
Currently, structural pattern recognizer evaluations compare graphs of detected structure to target structures (i.e. ground truth) using recognition rates, recall and precision for object segmentation, classification and relationships. In document recognition, these target objects (e.g. symbols) are frequently comprised of multiple primitives (e.g. connected components, or strokes for online handwritten data), but current metrics do not characterize errors at the primitive level, from which object-level structure is obtained. Primitive label graphs are directed graphs defined over primitives and primitive pairs. We define new metrics obtained by Hamming distances over label graphs, which allow classification, segmentation and parsing errors to be characterized separately, or using a single measure. Recall and precision for detected objects may also be computed directly from label graphs. We illustrate the new metrics by comparing a new primitive-level evaluation to the symbol-level evaluation performed for the CROHME 2012 handwritten math recognition competition. A Python-based set of utilities for evaluating, visualizing and translating label graphs is publicly available.
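A minimal sketch of the core idea follows, assuming both graphs are defined over the same set of primitives (e.g. strokes) so that labels can be compared position by position; the exact label alphabets, missing-edge convention, and normalization used in the CROHME evaluation are not reproduced.

```python
# Hedged sketch: Hamming-style distance between a recognized label graph and a
# ground-truth label graph. Node labels capture symbol classification/segmentation,
# edge labels capture relations; disagreements are simply counted.
def label_graph_hamming(nodes_rec, edges_rec, nodes_gt, edges_gt):
    """nodes_*: {primitive_id: symbol_label}; edges_*: {(id1, id2): relation_label}.
    Both graphs are assumed to be defined over the same primitive ids."""
    node_errors = sum(nodes_rec[p] != nodes_gt[p] for p in nodes_gt)
    all_pairs = set(edges_rec) | set(edges_gt)
    # "_" stands in for an absent/no-relation label (an assumption of this sketch).
    edge_errors = sum(edges_rec.get(p, "_") != edges_gt.get(p, "_") for p in all_pairs)
    return node_errors, edge_errors, node_errors + edge_errors
```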
Evaluation metrics for biostatistical and epidemiological collaborations.
Rubio, Doris McGartland; Del Junco, Deborah J; Bhore, Rafia; Lindsell, Christopher J; Oster, Robert A; Wittkowski, Knut M; Welty, Leah J; Li, Yi-Ju; Demets, Dave
2011-10-15
Increasing demands for evidence-based medicine and for the translation of biomedical research into individual and public health benefit have been accompanied by the proliferation of special units that offer expertise in biostatistics, epidemiology, and research design (BERD) within academic health centers. Objective metrics that can be used to evaluate, track, and improve the performance of these BERD units are critical to their successful establishment and sustainable future. To develop a set of reliable but versatile metrics that can be adapted easily to different environments and evolving needs, we consulted with members of BERD units from the consortium of academic health centers funded by the Clinical and Translational Science Award Program of the National Institutes of Health. Through a systematic process of consensus building and document drafting, we formulated metrics that covered the three identified domains of BERD practices: the development and maintenance of collaborations with clinical and translational science investigators, the application of BERD-related methods to clinical and translational research, and the discovery of novel BERD-related methodologies. In this article, we describe the set of metrics and advocate their use for evaluating BERD practices. The routine application, comparison of findings across diverse BERD units, and ongoing refinement of the metrics will identify trends, facilitate meaningful changes, and ultimately enhance the contribution of BERD activities to biomedical research. Copyright © 2011 John Wiley & Sons, Ltd.
Tools for monitoring system suitability in LC MS/MS centric proteomic experiments.
Bereman, Michael S
2015-03-01
With advances in liquid chromatography coupled to tandem mass spectrometry technologies combined with the continued goals of biomarker discovery, clinical applications of established biomarkers, and integrating large multiomic datasets (i.e. "big data"), there remains an urgent need for robust tools to assess instrument performance (i.e. system suitability) in proteomic workflows. To this end, several freely available tools have been introduced that monitor a number of peptide identification (ID) and/or peptide ID free metrics. Peptide ID metrics include numbers of proteins, peptides, or peptide spectral matches identified from a complex mixture. Peptide ID free metrics include retention time reproducibility, full width half maximum, ion injection times, and integrated peptide intensities. The main driving force in the development of these tools is to monitor both intra- and interexperiment performance variability and to identify sources of variation. The purpose of this review is to summarize and evaluate these tools based on versatility, automation, vendor neutrality, metrics monitored, and visualization capabilities. In addition, the implementation of a robust system suitability workflow is discussed in terms of metrics, type of standard, and frequency of evaluation along with the obstacles to overcome prior to incorporating a more proactive approach to overall quality control in liquid chromatography coupled to tandem mass spectrometry based proteomic workflows. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Kasturi, Rangachar; Goldgof, Dmitry; Soundararajan, Padmanabhan; Manohar, Vasant; Garofolo, John; Bowers, Rachel; Boonstra, Matthew; Korzhova, Valentina; Zhang, Jing
2009-02-01
Common benchmark data sets, standardized performance metrics, and baseline algorithms have demonstrated considerable impact on research and development in a variety of application domains. These resources provide both consumers and developers of technology with a common framework to objectively compare the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video: specifically for face, text, and vehicle objects. This framework includes the source video data, ground-truth annotations (along with guidelines for annotation), performance metrics, evaluation protocols, and tools including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each data clip is approximately 2.5 minutes long and has been completely spatially/temporally annotated at the I-frame level. Each task/domain, therefore, has an associated annotated corpus of approximately 450,000 frames. The scope of such annotation is unprecedented and was designed to begin to support the necessary quantities of data for robust machine learning approaches, as well as a statistically significant comparison of the performance of algorithms. The goal of this work was to systematically address the challenges of object detection and tracking through a common evaluation framework that permits a meaningful objective comparison of techniques, provides the research community with sufficient data for the exploration of automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes useful lasting resources of a scale and magnitude that will prove to be extremely useful to the computer vision research community for years to come.
Re-Assessing Green Building Performance: A Post Occupancy Evaluation of 22 GSA Buildings
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fowler, Kimberly M.; Rauch, Emily M.; Henderson, Jordan W.
2010-06-01
2nd report on the performance of GSA's sustainably designed buildings. The purpose of this study was to provide an overview of measured whole building performance as it compares to GSA and industry baselines. The PNNL research team found the data analysis illuminated strengths and weaknesses of individual buildings as well as the portfolio of buildings. This section includes summary data, observations that cross multiple performance metrics, discussion of lessons learned from this research, and opportunities for future research. The summary of annual data for each of the performance metrics is provided in Table 25. The data represent 1 year of measurements and are not associated with any specific design features or strategies. Where available, multiple years of data were examined and there were minimal significant differences between the years. Individually focused post occupancy evaluations (POEs) would allow for more detailed analysis of the buildings. Examining building performance over multiple years could potentially offer a useful diagnostic tool for identifying building operations that are in need of operational changes. Investigating the connection between building performance and design intent would offer potential design guidance and possible insight into building operation strategies. The 'aggregate operating cost' metric used in this study represents the costs that were available for developing a comparative industry baseline for office buildings. The costs include water utilities, energy utilities, general maintenance, grounds maintenance, waste and recycling, and janitorial costs. Three of the buildings that cost more than the baseline in Figure 45 have higher maintenance costs than the baseline, and one has higher energy costs. Given the volume of data collected and analyzed for this study, the inevitable request is for a simple answer with respect to sustainably designed building performance. As previously stated, compiling the individual building values into single metrics is not statistically valid given the small number of buildings, but it has been done to provide a cursory view of this portfolio of sustainably designed buildings. For all metrics except recycling cost per rentable square foot and CBE survey response rate, the averaged building performance was better than the baseline for the GSA buildings in this study.
Toward objective image quality metrics: the AIC Eval Program of the JPEG
NASA Astrophysics Data System (ADS)
Richter, Thomas; Larabi, Chaker
2008-08-01
Objective quality assessment of lossy image compression codecs is an important part of the recent call of the JPEG for Advanced Image Coding. The target of the AIC ad-hoc group is twofold: first, to receive state-of-the-art still image codecs and to propose suitable technology for standardization; and second, to study objective image quality metrics to evaluate the performance of such codecs. Even though the performance of an objective metric is defined by how well it predicts the outcome of a subjective assessment, one can also study the usefulness of a metric in a non-traditional way indirectly, namely by measuring the subjective quality improvement of a codec that has been optimized for a specific objective metric. This approach is demonstrated here on the recently proposed HDPhoto format introduced by Microsoft and an SSIM-tuned version of it by one of the authors. We compare these two implementations with JPEG in two variations and a visually and PSNR-optimal JPEG 2000 implementation. To this end, we use subjective and objective tests based on the multiscale SSIM and a new DCT-based metric.
Performance Evaluation Methods for Assistive Robotic Technology
NASA Astrophysics Data System (ADS)
Tsui, Katherine M.; Feil-Seifer, David J.; Matarić, Maja J.; Yanco, Holly A.
Robots have been developed for several assistive technology domains, including intervention for Autism Spectrum Disorders, eldercare, and post-stroke rehabilitation. Assistive robots have also been used to promote independent living through the use of devices such as intelligent wheelchairs, assistive robotic arms, and external limb prostheses. Work in the broad field of assistive robotic technology can be divided into two major research phases: technology development, in which new devices, software, and interfaces are created; and clinical, in which assistive technology is applied to a given end-user population. Moving from technology development towards clinical applications is a significant challenge. Developing performance metrics for assistive robots poses a related set of challenges. In this paper, we survey several areas of assistive robotic technology in order to derive and demonstrate domain-specific means for evaluating the performance of such systems. We also present two case studies of applied performance measures and a discussion regarding the ubiquity of functional performance measures across the sampled domains. Finally, we present guidelines for incorporating human performance metrics into end-user evaluations of assistive robotic technologies.
Duran, Cassidy; Estrada, Sean; O'Malley, Marcia; Sheahan, Malachi G; Shames, Murray L; Lee, Jason T; Bismuth, Jean
2015-12-01
Fundamental skills testing is now required for certification in general surgery. No model for assessing fundamental endovascular skills exists. Our objective was to develop a model that tests the fundamental endovascular skills and differentiates competent from noncompetent performance. The Fundamentals of Endovascular Surgery model was developed in silicone and virtual-reality versions. Twenty individuals (with a range of experience) performed four tasks on each model in three separate sessions. Tasks on the silicone model were performed under fluoroscopic guidance, and electromagnetic tracking captured motion metrics for catheter tip position. Image processing captured tool tip position and motion on the virtual model. Performance was evaluated using a global rating scale, blinded video assessment of error metrics, and catheter tip movement and position. Motion analysis was based on derivations of speed and position that define proficiency of movement (spectral arc length, duration of submovement, and number of submovements). Performance was significantly different between competent and noncompetent interventionalists for the three performance measures of motion metrics, error metrics, and global rating scale. The mean error metric score was 6.83 for noncompetent individuals and 2.51 for the competent group (P < .0001). Median global rating scores were 2.25 for the noncompetent group and 4.75 for the competent users (P < .0001). The Fundamentals of Endovascular Surgery model successfully differentiates competent and noncompetent performance of fundamental endovascular skills based on a series of objective performance measures. This model could serve as a platform for skills testing for all trainees. Copyright © 2015 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
No-reference image quality assessment for horizontal-path imaging scenarios
NASA Astrophysics Data System (ADS)
Rios, Carlos; Gladysz, Szymon
2013-05-01
There exist several image-enhancement algorithms and tasks associated with imaging through turbulence that depend on defining the quality of an image. Examples include: "lucky imaging", choosing the width of the inverse filter for image reconstruction, or stopping iterative deconvolution. We collected a number of image quality metrics found in the literature. Particularly interesting are the blind, "no-reference" metrics. We discuss ways of evaluating the usefulness of these metrics, even when a fully objective comparison is impossible because of the lack of a reference image. Metrics are tested on simulated and real data. Field data comes from experiments performed by the NATO SET 165 research group over a 7 km distance in Dayton, Ohio.
Memory colours and colour quality evaluation of conventional and solid-state lamps.
Smet, Kevin A G; Ryckaert, Wouter R; Pointer, Michael R; Deconinck, Geert; Hanselaer, Peter
2010-12-06
A colour quality metric based on memory colours is presented. The basic idea is simple: the colour quality of a test source is evaluated as the degree of similarity between the colour appearance of a set of familiar objects and their memory colours. The closer the match, the better the colour quality. This similarity was quantified using a set of similarity distributions obtained by Smet et al. in a previous study. The metric was validated by calculating the Pearson and Spearman correlation coefficients between the metric predictions and the visual appreciation results obtained in a validation experiment conducted by the authors, as well as those obtained in two independent studies. The metric was found to correlate well with the visual appreciation of the lighting quality of the sources used in the three experiments. Its performance was also compared with that of the CIE colour rendering index and the NIST colour quality scale. For all three experiments, the metric was found to be significantly better at predicting the correct visual rank order of the light sources (p < 0.1).
DOT National Transportation Integrated Search
2016-05-01
This study evaluated the accuracy of approach volumes and free flow approach speeds collected by the Wavetronix : SmartSensor Advance sensor for the Signal Performance Metrics system of the Utah Department of Transportation (UDOT), : using the field ...
NASA Astrophysics Data System (ADS)
Trimborn, Barbara; Wolf, Ivo; Abu-Sammour, Denis; Henzler, Thomas; Schad, Lothar R.; Zöllner, Frank G.
2017-03-01
Image registration of preprocedural contrast-enhanced CTs to intraprocedural cone-beam computed tomography (CBCT) can provide additional information for interventional liver oncology procedures such as transcatheter arterial chemoembolisation (TACE). In this paper, a novel similarity metric for gradient-based image registration is proposed. The metric relies on the patch-based computation of histograms of oriented gradients (HOG), which form the basis for a feature descriptor. The metric was implemented in a framework for rigid 3D-3D registration of pre-interventional CT with intra-interventional CBCT data obtained during the workflow of a TACE. To evaluate the performance of the new metric, the capture range was estimated based on the calculation of the mean target registration error and compared to the results obtained with a normalized cross correlation metric. The results show that 3D HOG feature descriptors are suitable as an image-similarity metric and that the novel metric can compete with established methods in terms of registration accuracy.
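The paper's descriptor is patch-based and embedded in a full 3D-3D registration framework; the sketch below only illustrates the basic ingredient of comparing two images through HOG descriptors, using scikit-image. The cell size, block size, and use of correlation as the similarity are illustrative choices, not the published metric.

```python
# Hedged sketch: similarity between a fixed and a (candidate-transformed) moving
# slice via HOG descriptors. Higher values indicate more similar gradient structure;
# such a score could serve as the objective of a rigid registration loop.
import numpy as np
from skimage.feature import hog

def hog_similarity(fixed_slice, moving_slice, pixels_per_cell=(16, 16)):
    f = hog(fixed_slice, pixels_per_cell=pixels_per_cell, cells_per_block=(2, 2))
    m = hog(moving_slice, pixels_per_cell=pixels_per_cell, cells_per_block=(2, 2))
    return float(np.corrcoef(f, m)[0, 1])
```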
Pine, P S; Boedigheimer, M; Rosenzweig, B A; Turpaz, Y; He, Y D; Delenstarr, G; Ganter, B; Jarnagin, K; Jones, W D; Reid, L H; Thompson, K L
2008-11-01
Effective use of microarray technology in clinical and regulatory settings is contingent on the adoption of standard methods for assessing performance. The MicroArray Quality Control project evaluated the repeatability and comparability of microarray data on the major commercial platforms and laid the groundwork for the application of microarray technology to regulatory assessments. However, methods for assessing performance that are commonly applied to diagnostic assays used in laboratory medicine remain to be developed for microarray assays. A reference system for microarray performance evaluation and process improvement was developed that includes reference samples, metrics and reference datasets. The reference material is composed of two mixes of four different rat tissue RNAs that allow defined target ratios to be assayed using a set of tissue-selective analytes that are distributed along the dynamic range of measurement. The diagnostic accuracy of detected changes in expression ratios, measured as the area under the curve from receiver operating characteristic plots, provides a single commutable value for comparing assay specificity and sensitivity. The utility of this system for assessing overall performance was evaluated for relevant applications like multi-laboratory proficiency testing programs and single-laboratory process drift monitoring. The diagnostic accuracy of detection of a 1.5-fold change in signal level was found to be a sensitive metric for comparing overall performance. This test approaches the technical limit for reliable discrimination of differences between two samples using this technology. We describe a reference system that provides a mechanism for internal and external assessment of laboratory proficiency with microarray technology and is translatable to performance assessments on other whole-genome expression arrays used for basic and clinical research.
NASA Astrophysics Data System (ADS)
Moslehi, M.; de Barros, F.
2017-12-01
Complexity of hydrogeological systems arises from the multi-scale heterogeneity and insufficient measurements of their underlying parameters such as hydraulic conductivity and porosity. An inadequate characterization of hydrogeological properties can significantly decrease the trustworthiness of numerical models that predict groundwater flow and solute transport. Therefore, a variety of data assimilation methods have been proposed in order to estimate hydrogeological parameters from spatially scarce data by incorporating the governing physical models. In this work, we propose a novel framework for evaluating the performance of these estimation methods. We focus on the Ensemble Kalman Filter (EnKF) approach, a widely used data assimilation technique that reconciles multiple sources of measurements to sequentially estimate model parameters such as the hydraulic conductivity. Several methods have been used in the literature to quantify the accuracy of the estimations obtained by EnKF, including Rank Histograms, RMSE, and Ensemble Spread. However, these commonly used methods do not regard the spatial information and variability of geological formations. This can cause hydraulic conductivity fields with very different spatial structures to have similar histograms or RMSE. We propose a vision-based approach that can quantify the accuracy of estimations by considering the spatial structure embedded in the estimated fields. Our new approach consists of adapting a new metric, Color Coherence Vectors (CCV), to evaluate the accuracy of estimated fields achieved by EnKF. CCV is a histogram-based technique for comparing images that incorporates spatial information. We represent estimated fields as digital three-channel images and use CCV to compare and quantify the accuracy of estimations. The sensitivity of CCV to spatial information makes it a suitable metric for assessing the performance of spatial data assimilation techniques. Under various factors of the data assimilation setup, such as the number, layout, and type of measurements, we compare the performance of CCV with other metrics such as RMSE. By simulating hydrogeological processes using estimated and true fields, we observe that CCV outperforms other existing evaluation metrics.
The field performance of frontal air bags: a review of the literature.
Kent, Richard; Viano, David C; Crandall, Jeff
2005-03-01
This article presents a broad review of the literature on frontal air bag field performance, starting with the initial government and industry projections of effectiveness and concluding with the most recent assessments of depowered systems. This review includes as many relevant metrics as practicable, interprets the findings, and provides references so the interested reader can further evaluate the limitations, confounders, and utility of each metric. The evaluations presented here range from the very specific (individual case studies) to the general (statistical analyses of large databases). The metrics used to evaluate air bag performance include fatality reduction or increase; serious, moderate, and minor injury reduction or increase; harm reduction or increase; and cost analyses, including insurance costs and the cost of life years saved for various air bag systems and design philosophies. The review begins with the benefits of air bags. Fatality and injury reductions attributable to the air bag are presented. Next, the negative consequences of air bag deployment are described. Injuries to adults and children and the current trends in air bag injury rates are discussed, as are the few documented instances of inadvertent deployments or non-deployment in severe crashes. In the third section, an attempt is made to quantify the influence of the many confounding factors that affect air bag performance. The negative and positive characteristics of air bags are then put into perspective within the context of societal costs and benefits. Finally, some special topics, including risk homeostasis and the performance of face bags, are discussed.
Integrated Resilient Aircraft Control Project Full Scale Flight Validation
NASA Technical Reports Server (NTRS)
Bosworth, John T.
2009-01-01
Objective: Provide validation of adaptive control law concepts through full scale flight evaluation. Technical Approach: a) Engage failure mode - destabilizing or frozen surface. b) Perform formation flight and air-to-air tracking tasks. Evaluate adaptive algorithm: a) Stability metrics. b) Model following metrics. Full scale flight testing provides an ability to validate different adaptive flight control approaches. Full scale flight testing adds credence to NASA's research efforts. A sustained research effort is required to remove the road blocks and provide adaptive control as a viable design solution for increased aircraft resilience.
Stability and Performance Metrics for Adaptive Flight Control
NASA Technical Reports Server (NTRS)
Stepanyan, Vahram; Krishnakumar, Kalmanje; Nguyen, Nhan; VanEykeren, Luarens
2009-01-01
This paper addresses the problem of verifying adaptive control techniques for enabling safe flight in the presence of adverse conditions. Since the adaptive systems are non-linear by design, the existing control verification metrics are not applicable to adaptive controllers. Moreover, these systems are in general highly uncertain. Hence, the system's characteristics cannot be evaluated by relying on the available dynamical models. This necessitates the development of control verification metrics based on the system's input-output information. From this point of view, a set of metrics is introduced that compares the uncertain aircraft's input-output behavior under the action of an adaptive controller to that of a closed-loop linear reference model to be followed by the aircraft. This reference model is constructed for each specific maneuver using the exact aerodynamic and mass properties of the aircraft to meet the stability and performance requirements commonly accepted in flight control. The proposed metrics are unified in the sense that they are model independent and not restricted to any specific adaptive control methods. As an example, we present simulation results for a wing-damaged generic transport aircraft with several existing adaptive controllers.
75 FR 28667 - Market Structure Roundtable
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-21
... views of investors, issuers, exchanges, alternative trading systems, financial services firms, high... roundtable will focus on market structure performance, including the events of May 6, metrics for evaluating market structure performance, high frequency trading, and undisplayed liquidity. The roundtable...
Modeling and Performance Evaluation of Backoff Misbehaving Nodes in CSMA/CA Networks
2012-08-01
Lu, Zhuo; Wang, Wenye
... misbehaving nodes can obtain, we define and study two general classes of backoff misbehavior: continuous misbehavior, which keeps manipulating the backoff ... misbehavior sporadically. Our approach is to introduce a new performance metric, namely order gain, to characterize the performance benefits of misbehaving ...
A computational imaging target specific detectivity metric
NASA Astrophysics Data System (ADS)
Preece, Bradley L.; Nehmetallah, George
2017-05-01
Due to the large quantity of low-cost, high-speed computational processing available today, computational imaging (CI) systems are expected to have a major role for next generation multifunctional cameras. The purpose of this work is to quantify the performance of these CI systems in a standardized manner. Due to the diversity of CI system designs that are available today or proposed in the near future, there are significant challenges in modeling and calculating a standardized detection signal-to-noise ratio (SNR) to measure the performance of these systems. In this paper, we developed a path forward for a standardized detectivity metric for CI systems. The detectivity metric is designed to evaluate the performance of a CI system searching for a specific known target or signal of interest, and is defined as the optimal linear matched filter SNR, similar to the Hotelling SNR, calculated in computational space with special considerations for standardization. Therefore, the detectivity metric is designed to be flexible, in order to handle various types of CI systems and specific targets, while keeping the complexity and assumptions of the systems to a minimum.
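The part that the paper standardizes, namely how the target signal and noise statistics are mapped into a particular CI system's computational space, is not shown here. The sketch below only illustrates the underlying matched-filter quantity as it is usually written; the function name and interface are assumptions.

```python
# Hedged sketch: the optimal linear matched-filter SNR for a known target signal s
# in zero-mean Gaussian noise with covariance K satisfies SNR^2 = s^T K^{-1} s
# (a Hotelling-like quantity). System-specific standardization is omitted.
import numpy as np

def matched_filter_snr(target_signal, noise_cov):
    s = np.asarray(target_signal, dtype=float).ravel()
    K = np.asarray(noise_cov, dtype=float)
    return float(np.sqrt(s @ np.linalg.solve(K, s)))
```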
DOE Office of Scientific and Technical Information (OSTI.GOV)
Owen, D; Anderson, C; Mayo, C
Purpose: To extend the functionality of a commercial treatment planning system (TPS) to support (i) direct use of quantitative image-based metrics within treatment plan optimization and (ii) evaluation of dose-functional volume relationships to assist in functional image adaptive radiotherapy. Methods: A script was written that interfaces with a commercial TPS via an Application Programming Interface (API). The script executes a program that performs dose-functional volume analyses. Written in C#, the script reads the dose grid and correlates it with image data on a voxel-by-voxel basis through API extensions that can access registration transforms. A user interface was designed through WinForms to input parameters and display results. To test the performance of this program, image- and dose-based metrics computed from perfusion SPECT images aligned to the treatment planning CT were generated, validated, and compared. Results: The integration of image analysis information was successfully implemented as a plug-in to a commercial TPS. Perfusion SPECT images were used to validate the calculation and display of image-based metrics as well as dose-intensity metrics and histograms for defined structures on the treatment planning CT. Various biological dose correction models, custom image-based metrics, dose-intensity computations, and dose-intensity histograms were applied to analyze the image-dose profile. Conclusion: It is possible to add image analysis features to commercial TPSs through custom scripting applications. A tool was developed to enable the evaluation of image-intensity-based metrics in the context of functional targeting and avoidance. In addition to providing dose-intensity metrics and histograms that can be easily extracted from a plan database and correlated with outcomes, the system can also be extended to a plug-in optimization system, which can directly use the computed metrics for optimization of post-treatment tumor or normal tissue response models. Supported by NIH - P01 - CA059827.
Iqbal, Sahar; Mustansar, Tazeen
2017-03-01
Sigma is a metric that quantifies the performance of a process as a rate of defects per million opportunities. In clinical laboratories, sigma metric analysis is used to assess the performance of the laboratory process system. The sigma metric is also used as a quality management strategy for a laboratory process to improve quality by addressing errors after identification. The aim of this study is to evaluate the errors in quality control of the analytical phase of the laboratory system by sigma metric. For this purpose, sigma metric analysis was done for analytes using the internal and external quality control as quality indicators. Results of sigma metric analysis were used to identify the gaps and the need for modification in the strategy of the laboratory quality control procedure. The sigma metric was calculated for the quality control program of ten clinical chemistry analytes, including glucose, chloride, cholesterol, triglyceride, HDL, albumin, direct bilirubin, total bilirubin, protein, and creatinine, at two control levels. To calculate the sigma metric, imprecision and bias were calculated from internal and external quality control data, respectively. The minimum acceptable performance was considered as 3 sigma. Westgard sigma rules were applied to customize the quality control procedure. The sigma level was found acceptable (≥3) for glucose (L2), cholesterol, triglyceride, HDL, direct bilirubin, and creatinine at both levels of control. For the rest of the analytes, the sigma metric was found to be <3. The lowest value for sigma was found for chloride (1.1) at L2. The highest value of sigma was found for creatinine (10.1) at L3. HDL was found with the highest sigma values at both control levels (8.8 and 8.0 at L2 and L3, respectively). We conclude that analytes with a sigma value <3 require strict monitoring and modification of the quality control procedure. In this study, the application of sigma rules provided a practical solution for an improved and focused design of the QC procedure.
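The sigma-metric formula itself is standard in laboratory quality management; the example numbers below are illustrative and are not taken from the study's data.

```python
# Sigma metric as commonly defined (Westgard): sigma = (TEa - |bias|) / CV,
# with allowable total error, bias, and coefficient of variation all in percent.
def sigma_metric(tea_percent, bias_percent, cv_percent):
    return (tea_percent - abs(bias_percent)) / cv_percent

# Illustrative example: 10% allowable total error, 2% bias (EQAS), 2.5% CV (IQC).
print(sigma_metric(10.0, 2.0, 2.5))   # -> 3.2, i.e. meets the >= 3 sigma threshold
```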
Gibbons, Theodore R; Mount, Stephen M; Cooper, Endymion D; Delwiche, Charles F
2015-07-10
Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well or better than the other three metrics in all scenarios. The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation: Porthos, which uses only standard libraries and can process a graph with 25M+ edges connecting the 60k+ KOG sequences in half a minute using less than half a gigabyte of memory.
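A brief sketch of the four edge weights follows, written from their usual descriptions rather than from the paper's code. The BSR denominator (the smaller self-alignment bit score) and the BAL denominator (the shorter sequence length as the "anchored" length) are assumed forms, and the cap on NLE for underflowed E-values is an implementation convenience.

```python
# Hedged sketch: candidate BLAST-based edge weights for an MCL input graph.
import math

def edge_weights(bit_score, evalue, self_bit_a, self_bit_b, len_a, len_b):
    bsr = bit_score / min(self_bit_a, self_bit_b)       # bit score ratio (assumed denominator)
    bal = bit_score / min(len_a, len_b)                  # bit score over anchored length (assumed form)
    nle = -math.log10(evalue) if evalue > 0 else 300.0   # negative common log of E-value, capped at underflow
    return {"bit": bit_score, "BSR": bsr, "BAL": bal, "NLE": nle}
```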
Kumar, B Vinodh; Mohan, Thuthi
2018-01-01
Six Sigma is one of the most popular quality management system tools employed for process improvement. The Six Sigma methods are usually applied when the outcome of the process can be measured. This study was done to assess the performance of individual biochemical parameters on a sigma scale by calculating the sigma metrics for individual parameters and to follow the Westgard guidelines for the appropriate Westgard rules and levels of internal quality control (IQC) that need to be processed to improve target analyte performance based on the sigma metrics. This is a retrospective study, and data required for the study were extracted between July 2015 and June 2016 from a Secondary Care Government Hospital, Chennai. The data obtained for the study are IQC coefficient of variation percentage and External Quality Assurance Scheme (EQAS) bias percentage for 16 biochemical parameters. For the level 1 IQC, four analytes (alkaline phosphatase, magnesium, triglyceride, and high-density lipoprotein-cholesterol) showed an ideal performance of ≥6 sigma and five analytes (urea, total bilirubin, albumin, cholesterol, and potassium) showed an average performance of <3 sigma; for the level 2 IQC, the same four analytes as level 1 showed a performance of ≥6 sigma, and four analytes (urea, albumin, cholesterol, and potassium) showed an average performance of <3 sigma. For all analytes below the 6 sigma level, the quality goal index (QGI) was <0.8, indicating that the area requiring improvement was imprecision, except for cholesterol, whose QGI >1.2 indicated inaccuracy. This study shows that sigma metrics is a good quality tool to assess the analytical performance of a clinical chemistry laboratory. Thus, sigma metric analysis provides a benchmark for the laboratory to design a protocol for IQC, address poor assay performance, and assess the efficiency of existing laboratory processes.
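For reference, a minimal sketch of the quality goal index mentioned above, in the form it is commonly computed; the cutoffs quoted in the abstract (below about 0.8 pointing to imprecision, above about 1.2 to inaccuracy) are applied to this ratio. The example values are illustrative only.

```python
# Hedged sketch: quality goal index (QGI), commonly computed as bias / (1.5 * CV),
# both expressed in percent.
def quality_goal_index(bias_percent, cv_percent):
    return bias_percent / (1.5 * cv_percent)

print(quality_goal_index(2.0, 4.0))   # -> ~0.33, suggesting imprecision as the main issue
```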
Rmax: A systematic approach to evaluate instrument sort performance using center stream catch
Riddell, Andrew; Gardner, Rui; Perez-Gonzalez, Alexis; Lopes, Telma; Martinez, Lola
2015-01-01
Sorting performance can be evaluated with regard to Purity, Yield, and/or Recovery of the sorted fraction. Purity is a check on the quality of the sample and the sort decisions made by the instrument. Definitions of Recovery and Yield vary: some authors regard both as measures of how efficiently the instrument sorts the target particles from the original sample, while others distinguish Recovery from Yield, using the former to describe the accuracy of the instrument's sort count. Yield and Recovery are often neglected, mostly due to difficulties in their measurement. Purity of the sort product is often cited alone but is not sufficient to evaluate sorting performance. All three of these performance metrics require re-sampling of the sorted fraction. But, unlike Purity, calculating Yield and/or Recovery calls for the absolute counting of particles in the sorted fraction, which may not be feasible, particularly when dealing with rare populations and precious samples. In addition, the counting process itself involves large errors. Here we describe a new metric for evaluating instrument sort Recovery, defined as the number of particles sorted relative to the number of original particles to be sorted. This calculation requires only measuring the ratios of target and non-target populations in the original pre-sort sample and in the waste stream or center stream catch (CSC), avoiding re-sampling the sorted fraction and absolute counting. We called this new metric Rmax, since it corresponds to the maximum expected Recovery for a particular set of instrument parameters. Rmax is ideal for evaluating and troubleshooting the optimum drop-charge delay of the sorter, or any instrument-related failures that will affect sort performance. It can be used as a daily quality control check but can be particularly useful to assess instrument performance before single-cell sorting experiments. Because we do not perturb the sort fraction, Rmax can be calculated during the sort process, which is especially valuable for checking instrument performance during rare-population sorts. PMID:25747337
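One plausible way to turn the two measured ratios into a recovery bound is sketched below, under the simplifying assumption that essentially all non-target particles end up in the waste/center stream catch; the published Rmax derivation should be consulted for the exact form used by the authors.

```python
# Hedged sketch: recovery bound from target fractions measured in the pre-sort
# sample and in the center stream catch (CSC), assuming non-target counts are
# essentially unchanged between the two.
def rmax(target_frac_presort, target_frac_csc):
    r0 = target_frac_presort / (1.0 - target_frac_presort)   # target:non-target, pre-sort
    rw = target_frac_csc / (1.0 - target_frac_csc)           # target:non-target, CSC
    return 1.0 - rw / r0

print(rmax(0.20, 0.02))   # ~0.92: at most ~92% of targets were deposited in the sort fraction
```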
NASA Astrophysics Data System (ADS)
Senay, G. B.; Budde, M. E.; Allen, R. G.; Verdin, J. P.
2008-12-01
Evapotranspiration (ET) is an important component of the hydrologic budget because it expresses the exchange of mass and energy between the soil-water-vegetation system and the atmosphere. Since direct measurement of ET is difficult, various modeling methods are used to estimate actual ET (ETa). Generally, the choice of method for ET estimation depends on the objective of the study and is further limited by the availability of data and the desired accuracy of the ET estimate. Operational monitoring of crop performance requires processing large data sets and a quick response time. A Simplified Surface Energy Balance (SSEB) model was developed by the U.S. Geological Survey's Famine Early Warning Systems Network to estimate irrigation water use in remote places of the world. In this study, we evaluated the performance of the SSEB model against the METRIC (Mapping Evapotranspiration at high Resolution and with Internalized Calibration) model, which has been evaluated by several researchers using lysimeter data. The METRIC model has been proven to provide reliable ET estimates in different regions of the world. Reference ET fractions of both models (ETrF of METRIC vs. ETf of SSEB) were generated and compared using individual Landsat thermal images collected from 2000 through 2005 in Idaho, New Mexico, and California. In addition, the models were compared using monthly and seasonal total ETa estimates. The SSEB model reproduced both the spatial and temporal variability exhibited by METRIC on land surfaces, explaining up to 80 percent of the spatial variability. However, the ETa estimates over water bodies were systematically higher in the SSEB output, which could be improved by using a correction coefficient to take into account the absorption of solar energy by deeper water layers that contributes little to the ET process. This study demonstrated the usefulness of the SSEB method for large-scale agro-hydrologic applications for operational monitoring and assessment of crop performance and regional water balance dynamics.
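A minimal sketch of the core SSEB relation as it is usually written follows: the ET fraction of each pixel is scaled linearly between a "hot" (dry, negligible ET) and a "cold" (well-watered) reference temperature, and actual ET is that fraction times reference ET. The clipping bound and variable names are illustrative; the operational selection of hot/cold pixels is not shown.

```python
# Hedged sketch of the SSEB ET-fraction computation per pixel.
import numpy as np

def sseb_eta(land_surface_temp, t_hot, t_cold, reference_et):
    etf = (t_hot - land_surface_temp) / (t_hot - t_cold)   # ET fraction
    etf = np.clip(etf, 0.0, 1.05)                          # bound implausible values (assumed)
    return etf * reference_et                              # actual ET, e.g. mm/day
```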
On the new metrics for IMRT QA verification.
Garcia-Romero, Alejandro; Hernandez-Vitoria, Araceli; Millan-Cebrian, Esther; Alba-Escorihuela, Veronica; Serrano-Zabaleta, Sonia; Ortega-Pardina, Pablo
2016-11-01
The aim of this work is to search for new metrics that could give more reliable acceptance/rejection criteria in the IMRT verification process and to offer solutions to the discrepancies found among different conventional metrics. Therefore, besides conventional metrics, new ones are proposed and evaluated with new tools to find correlations among them. These new metrics are based on the processing of the dose-volume histogram information, evaluating the absorbed dose differences, the dose constraint fulfillment, or modified biomathematical treatment outcome models such as tumor control probability (TCP) and normal tissue complication probability (NTCP). An additional purpose is to establish whether the new metrics yield the same acceptance/rejection plan distribution as the conventional ones. Fifty-eight treatment plans covering several treatment sites were analyzed. All of them were verified prior to the treatment using conventional metrics, and retrospectively after the treatment with the new metrics. These new metrics include the definition of three continuous functions, based on dose-volume histograms resulting from measurements evaluated with a reconstructed dose system and also with a Monte Carlo redundant calculation. The 3D gamma function for every volume of interest is also calculated. The information is also processed to obtain ΔTCP or ΔNTCP for the considered volumes of interest. These biomathematical treatment outcome models have been modified to increase their sensitivity to dose changes. A robustness index from a radiobiological point of view is defined to classify plans by their robustness against dose changes. Dose difference metrics can be condensed into a single parameter: the dose difference global function, with an optimal cutoff that can be determined from a receiver operating characteristics (ROC) analysis of the metric. It is not always possible to correlate differences in biomathematical treatment outcome models with dose difference metrics. This is because the dose constraint is often far from the dose that has an actual impact on the radiobiological model, and therefore biomathematical treatment outcome models are insensitive to large dose differences between the verification system and the treatment planning system. As an alternative, the use of modified radiobiological models, which provide a better correlation, is proposed. In any case, it is better to choose robust plans from a radiobiological point of view. The robustness index defined in this work is a good predictor of the plan rejection probability according to metrics derived from modified radiobiological models. The global 3D gamma-based metric calculated for each plan volume shows a good correlation with the dose difference metrics and performs well in the acceptance/rejection process. Some discrepancies have been found in dose reconstruction depending on the algorithm employed. Significant and unavoidable discrepancies were found between the conventional metrics and the new ones. The dose difference global function and the 3D gamma for each plan volume are good classifiers regarding dose difference metrics. ROC analysis is useful to evaluate the predictive power of the new metrics. The correlation between biomathematical treatment outcome models and the dose difference-based metrics is enhanced by using modified TCP and NTCP functions that take into account the dose constraints for each plan. The robustness index is useful to evaluate whether a plan is likely to be rejected.
Conventional verification should be replaced by the new metrics, which are clinically more relevant.
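For orientation, a brute-force sketch of a plain global gamma computation between a reference and an evaluated dose grid is shown below; the paper's per-volume 3D gamma, its DVH-based continuous functions, and its modified TCP/NTCP models are not reproduced. The 3%/3 mm criteria, grid spacing, and lack of a low-dose threshold are illustrative simplifications, and the O(N^2) search is only practical for small grids.

```python
# Hedged sketch: global gamma pass rate between two dose grids of identical shape.
import numpy as np

def gamma_pass_rate(dose_ref, dose_eval, spacing_mm=2.0, dd=0.03, dta_mm=3.0):
    shape = dose_ref.shape
    coords = np.indices(shape).reshape(len(shape), -1).T * spacing_mm  # voxel positions in mm
    d_eval = dose_eval.ravel()
    norm = dd * dose_ref.max()            # global dose-difference normalization
    gammas = np.empty(dose_ref.size)
    for i, (p, d) in enumerate(zip(coords, dose_ref.ravel())):
        dist_term = ((coords - p) ** 2).sum(axis=1) / dta_mm ** 2
        dose_term = (d_eval - d) ** 2 / norm ** 2
        gammas[i] = np.sqrt((dist_term + dose_term).min())
    return float((gammas <= 1.0).mean())  # fraction of reference voxels with gamma <= 1
```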
Blind Source Parameters for Performance Evaluation of Despeckling Filters.
Biradar, Nagashettappa; Dewal, M L; Rohit, ManojKumar; Gowre, Sanjaykumar; Gundge, Yogesh
2016-01-01
The speckle noise is inherent to transthoracic echocardiographic images. A standard noise-free reference echocardiographic image does not exist. The evaluation of filters based on the traditional parameters such as peak signal-to-noise ratio, mean square error, and structural similarity index may not reflect the true filter performance on echocardiographic images. Therefore, the performance of despeckling can be evaluated using blind assessment metrics like the speckle suppression index, speckle suppression and mean preservation index (SMPI), and beta metric. The need for noise-free reference image is overcome using these three parameters. This paper presents a comprehensive analysis and evaluation of eleven types of despeckling filters for echocardiographic images in terms of blind and traditional performance parameters along with clinical validation. The noise is effectively suppressed using the logarithmic neighborhood shrinkage (NeighShrink) embedded with Stein's unbiased risk estimation (SURE). The SMPI is three times more effective compared to the wavelet based generalized likelihood estimation approach. The quantitative evaluation and clinical validation reveal that the filters such as the nonlocal mean, posterior sampling based Bayesian estimation, hybrid median, and probabilistic patch based filters are acceptable whereas median, anisotropic diffusion, fuzzy, and Ripplet nonlinear approximation filters have limited applications for echocardiographic images.
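Of the blind metrics named above, the speckle suppression index has a particularly simple commonly used form, sketched below; SMPI and the beta metric have their own definitions and are not shown. The function interface is an assumption.

```python
# Hedged sketch: speckle suppression index (SSI) as the coefficient of variation of
# the filtered image relative to that of the noisy image (lower = stronger suppression).
import numpy as np

def speckle_suppression_index(noisy, filtered):
    cv_filtered = filtered.std() / filtered.mean()
    cv_noisy = noisy.std() / noisy.mean()
    return float(cv_filtered / cv_noisy)
```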
The performance of disk arrays in shared-memory database machines
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Hong, Wei
1993-01-01
In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small form-factor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
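A minimal sketch of the metric as it is commonly described, namely I/O demand per unit of stored data, is given below; the abstract does not spell out the exact definition, so treat this as an assumption rather than the paper's formula.

```python
# Hedged sketch: data temperature as accesses per second sustained per gigabyte of data.
def data_temperature(accesses_per_second, gigabytes_stored):
    return accesses_per_second / gigabytes_stored

print(data_temperature(2000, 50))   # -> 40 accesses/s per GB for this configuration
```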
Investigating emergency room service quality using lean manufacturing.
Abdelhadi, Abdelhakim
2015-01-01
The purpose of this paper is to investigate a lean manufacturing metric called Takt time as a benchmark measure of a public hospital's service quality. Lean manufacturing is an established managerial philosophy with a proven track record in industry. A lean metric called Takt time is applied as a measure to compare the relative efficiency of two emergency departments (EDs) belonging to the same public hospital. Outcomes guide managers to improve patient services and increase hospital performance. The patient treatment lead time within the hospital's two EDs (one department serves male and the other female patients) is the study's focus. The Takt time metric is used to find the service's relative efficiency. Findings show that Takt time can be used as an effective way to measure service efficiency by analyzing relative efficiency and identifying bottlenecks in different departments providing the same services. The paper presents a new procedure to compare relative efficiency between two EDs. It can be applied to any healthcare facility.
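As an illustration of the Takt time calculation described above, the sketch below compares a hypothetical demand-derived Takt time with observed treatment lead times in two emergency departments; all numbers are invented and not taken from the study.

```python
# Takt time = available working time / demand over that window (hypothetical numbers)
available_minutes_per_shift = 12 * 60        # ED operating window considered
patients_expected_per_shift = 90             # demand during that window

takt_time = available_minutes_per_shift / patients_expected_per_shift  # minutes per patient
print(f"Takt time: {takt_time:.1f} min/patient")

# Compare against the observed average treatment lead time in each ED (hypothetical):
observed_lead_time = {"ED_male": 7.2, "ED_female": 9.5}   # minutes
for ed, cycle in observed_lead_time.items():
    status = "meets demand" if cycle <= takt_time else "bottleneck"
    print(f"{ed}: cycle {cycle:.1f} min vs takt {takt_time:.1f} min -> {status}")
```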
Usmani, Adnan Mahmmood; Meo, Sultan Ayoub
2011-01-01
Publishing a scientific manuscript in a peer-reviewed biomedical journal is an important ingredient of research, along with career-enhancing advantages and a significant amount of personal satisfaction. The road to evaluating science (research, scientific publications) among scientists often seems complicated. A scientist's career is generally summarized by the number of publications and citations, teaching undergraduate, graduate, and post-doctoral students, writing or reviewing grants and papers, preparing for and organizing meetings, participating in collaborations and conferences, advising colleagues, and serving on editorial boards of scientific journals. Scientists have been sizing up their colleagues since science began. Scientometricians have invented a wide variety of algorithms called science metrics to evaluate science. Many of the science metrics are unknown even to the everyday scientist. Unfortunately, there is no all-in-one metric. Each has its own strengths, limitations, and scope. Some are mistakenly applied to evaluate individuals, and each is surrounded by a cloud of variants designed to help them apply across different scientific fields or different career stages [1]. A suitable indicator should be chosen by considering the purpose of the evaluation and how the results will be used. Scientific evaluation assists in computing research performance, comparing with peers, forecasting growth, identifying excellence in research, citation ranking, finding the influence of research, measuring productivity, making policy decisions, securing funds for research, and spotting trends. Key concepts in science metrics are output and impact. Evaluation of science is traditionally expressed in terms of citation counts. Although most science metrics are based on citation counts, the two most commonly used are the impact factor [2] and the h-index [3].
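Of the citation-based metrics mentioned, the h-index is simple enough to state algorithmically: h is the largest number such that h of a researcher's papers each have at least h citations. A minimal Python sketch with hypothetical citation counts:

```python
def h_index(citations):
    """h is the largest number such that h papers have at least h citations each."""
    cited = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cited, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one researcher's papers
print(h_index([25, 17, 12, 8, 5, 4, 2, 1, 0]))  # -> 5
```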
Carlson, DA; Omari, T; Lin, Z; Rommel, N; Starkey, K; Kahrilas, PJ; Tack, J; Pandolfino, JE
2016-01-01
Background High-resolution impedance manometry (HRIM) allows evaluation of esophageal bolus retention, flow, and pressurization. We aimed to perform a collaborative analysis of HRIM metrics to evaluate patients with non-obstructive dysphagia. Methods 14 asymptomatic controls (58% female; ages 20 – 50) and 41 patients (63% female; ages 24 – 82), 18 evaluated for dysphagia, 23 for reflux (‘non-dysphagia patients’), with esophageal motility diagnoses of normal motility or ineffective esophageal motility were evaluated with HRIM and a global dysphagia symptom score (Brief Esophageal Dysphagia Questionnaire). HRIM were analyzed to assess Chicago Classification metrics, automated pressure-flow metrics, the esophageal impedance integral (EII) ratio, and the bolus flow time (BFT). Key Results Significant symptom-metric correlations were detected only with basal EGJ pressure, EII ratio, and BFT. The EII ratio, BFT, and impedance ratio differed between controls and dysphagia patients, while the EII ratio in the upright position was the only measure that differentiated dysphagia from non-dysphagia patients. Conclusions & Inferences The EII ratio and BFT appear to offer an improved diagnostic evaluation in patients with non-obstructive dysphagia without a major esophageal motility disorder. Bolus retention as measured with the EII ratio appears to carry the strongest association with dysphagia, and thus may aid in the characterization of symptomatic patients with otherwise normal manometry. PMID:27647522
Carlson, D A; Omari, T; Lin, Z; Rommel, N; Starkey, K; Kahrilas, P J; Tack, J; Pandolfino, J E
2017-03-01
High-resolution impedance manometry (HRIM) allows evaluation of esophageal bolus retention, flow, and pressurization. We aimed to perform a collaborative analysis of HRIM metrics to evaluate patients with non-obstructive dysphagia. Fourteen asymptomatic controls (58% female; ages 20-50) and 41 patients (63% female; ages 24-82), 18 evaluated for dysphagia and 23 for reflux (non-dysphagia patients), with esophageal motility diagnoses of normal motility or ineffective esophageal motility, were evaluated with HRIM and a global dysphagia symptom score (Brief Esophageal Dysphagia Questionnaire). HRIM was analyzed to assess Chicago Classification metrics, automated pressure-flow metrics, the esophageal impedance integral (EII) ratio, and the bolus flow time (BFT). Significant symptom-metric correlations were detected only with basal EGJ pressure, EII ratio, and BFT. The EII ratio, BFT, and impedance ratio differed between controls and dysphagia patients, while the EII ratio in the upright position was the only measure that differentiated dysphagia from non-dysphagia patients. The EII ratio and BFT appear to offer an improved diagnostic evaluation in patients with non-obstructive dysphagia without a major esophageal motility disorder. Bolus retention as measured with the EII ratio appears to carry the strongest association with dysphagia, and thus may aid in the characterization of symptomatic patients with otherwise normal manometry. © 2016 John Wiley & Sons Ltd.
Towards New Metrics for High-Performance Computing Resilience
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hukerikar, Saurabh; Ashraf, Rizwan A; Engelmann, Christian
Ensuring the reliability of applications is becoming an increasingly important challenge as high-performance computing (HPC) systems experience an ever-growing number of faults, errors and failures. While the HPC community has made substantial progress in developing various resilience solutions, it continues to rely on platform-based metrics to quantify application resiliency improvements. The resilience of an HPC application is concerned with the reliability of the application outcome as well as the fault handling efficiency. To understand the scope of impact, effective coverage and performance efficiency of existing and emerging resilience solutions, there is a need for new metrics. In this paper, we develop new ways to quantify resilience that consider both the reliability and the performance characteristics of the solutions from the perspective of HPC applications. As HPC systems continue to evolve in terms of scale and complexity, it is expected that applications will experience various types of faults, errors and failures, which will require applications to apply multiple resilience solutions across the system stack. The proposed metrics are intended to be useful for understanding the combined impact of these solutions on an application's ability to produce correct results and to evaluate their overall impact on an application's performance in the presence of various modes of faults.
Jennifer, Smith; Purewal, Birinder Praneet; Macpherson, Alison; Pike, Ian
2018-05-01
Despite legal protections for young workers in Canada, youth aged 15-24 are at high risk of traumatic occupational injury. While many injury prevention initiatives targeting young workers exist, the challenge faced by youth advocates and employers is deciding what aspect(s) of prevention will be the most effective focus for their efforts. A review of the academic and grey literature was undertaken to compile the metrics (both the indicators being evaluated and the methods of measurement) commonly used to assess injury prevention programs for young workers. Metrics are standards of measurement through which efficiency, performance, progress, or quality of a plan, process, or product can be assessed. A PICO framework was used to develop search terms. Medline, PubMed, OVID, EMBASE, CCOHS, PsychINFO, CINAHL, NIOSHTIC, Google Scholar and the grey literature were searched for articles in English, published between 1975 and 2015. Two independent reviewers screened the resulting list and categorized the metrics in three domains of injury prevention: Education, Environment and Enforcement. Of 174 acquired articles meeting the inclusion criteria, 21 both described and assessed an intervention. Half were educational in nature (N=11). Commonly assessed metrics included: knowledge, perceptions, self-reported behaviours or intentions, hazardous exposures, injury claims, and injury counts. One study outlined a method for developing metrics to predict injury rates. Metrics specific to the evaluation of young worker injury prevention programs are needed, as current metrics are insufficient to predict reduced injuries following program implementation. One study, which the review brought to light, could be an appropriate model for future research to develop valid leading metrics specific to young workers, and then apply these metrics to injury prevention programs for youth.
Weber, Charles N; Poff, Jason A; Lev-Toaff, Anna S; Levine, Marc S; Zafar, Hanna M
To explore quantitative differences between genders in morphologic colonic metrics and determine metric reproducibility. Quantitative colonic metrics from 20 male and 20 female CTC datasets were evaluated twice by two readers; all exams were performed after incomplete optical colonoscopy. Intra-/inter-reader reliability was measured with the intraclass correlation coefficient (ICC) and concordance correlation coefficient (CCC). Women had overall decreased colonic volume, increased tortuosity and compactness, and lower sigmoid apex height on CTC compared to men (p<0.0001, all). Quantitative measurements of colonic metrics were highly reproducible (ICC=0.9989 and 0.9970; CCC=0.9945). Quantitative morphologic differences between genders can be reproducibly measured. Copyright © 2017 Elsevier Inc. All rights reserved.
Application of Climate Impact Metrics to Rotorcraft Design
NASA Technical Reports Server (NTRS)
Russell, Carl; Johnson, Wayne
2013-01-01
Multiple metrics are applied to the design of large civil rotorcraft, integrating minimum cost and minimum environmental impact. The design mission is passenger transport with similar range and capacity to a regional jet. Separate aircraft designs are generated for minimum empty weight, fuel burn, and environmental impact. A metric specifically developed for the design of aircraft is employed to evaluate emissions. The designs are generated using the NDARC rotorcraft sizing code, and rotor analysis is performed with the CAMRAD II aeromechanics code. Design and mission parameters such as wing loading, disk loading, and cruise altitude are varied to minimize both cost and environmental impact metrics. This paper presents the results of these parametric sweeps as well as the final aircraft designs.
Application of Climate Impact Metrics to Civil Tiltrotor Design
NASA Technical Reports Server (NTRS)
Russell, Carl R.; Johnson, Wayne
2013-01-01
Multiple metrics are applied to the design of a large civil tiltrotor, integrating minimum cost and minimum environmental impact. The design mission is passenger transport with similar range and capacity to a regional jet. Separate aircraft designs are generated for minimum empty weight, fuel burn, and environmental impact. A metric specifically developed for the design of aircraft is employed to evaluate emissions. The designs are generated using the NDARC rotorcraft sizing code, and rotor analysis is performed with the CAMRAD II aeromechanics code. Design and mission parameters such as wing loading, disk loading, and cruise altitude are varied to minimize both cost and environmental impact metrics. This paper presents the results of these parametric sweeps as well as the final aircraft designs.
Assessing deep and shallow learning methods for quantitative prediction of acute chemical toxicity.
Liu, Ruifeng; Madore, Michael; Glover, Kyle P; Feasel, Michael G; Wallqvist, Anders
2018-05-02
Animal-based methods for assessing chemical toxicity are struggling to meet testing demands. In silico approaches, including machine-learning methods, are promising alternatives. Recently, deep neural networks (DNNs) were evaluated and reported to outperform other machine-learning methods for quantitative structure-activity relationship modeling of molecular properties. However, most of the reported performance evaluations relied on global performance metrics, such as the root mean squared error (RMSE) between the predicted and experimental values of all samples, without considering the impact of sample distribution across the activity spectrum. Here, we carried out an in-depth analysis of DNN performance for quantitative prediction of acute chemical toxicity using several datasets. We found that the overall performance of DNN models on datasets of up to 30,000 compounds was similar to that of random forest (RF) models, as measured by the RMSE and correlation coefficients between the predicted and experimental results. However, our detailed analyses demonstrated that global performance metrics are inappropriate for datasets with a highly uneven sample distribution, because they show a strong bias for the most populous compounds along the toxicity spectrum. For highly toxic compounds, DNN and RF models trained on all samples performed much worse than the global performance metrics indicated. Surprisingly, our variable nearest neighbor method, which utilizes only structurally similar compounds to make predictions, performed reasonably well, suggesting that information of close near neighbors in the training sets is a key determinant of acute toxicity predictions.
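The core point about global versus stratified performance metrics can be illustrated with a small Python sketch: a single RMSE over all compounds hides large errors on the sparsely populated, highly toxic end of the spectrum. The values below are invented for illustration and are not from the study's datasets.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Hypothetical toxicity values: most compounds are moderately toxic and predicted
# well; a few highly toxic compounds are predicted poorly.
y_true = np.array([2.1, 2.3, 2.0, 2.4, 2.2, 2.5, 6.0, 6.5])
y_pred = np.array([2.2, 2.2, 2.1, 2.3, 2.3, 2.4, 4.0, 4.2])

print("global RMSE:", rmse(y_true, y_pred))                 # dominated by the populous region
high = y_true > 5                                            # stratify: highly toxic compounds
print("RMSE (highly toxic only):", rmse(y_true[high], y_pred[high]))
print("RMSE (others):", rmse(y_true[~high], y_pred[~high]))
```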
Diffusion kurtosis imaging can efficiently assess the glioma grade and cellular proliferation.
Jiang, Rifeng; Jiang, Jingjing; Zhao, Lingyun; Zhang, Jiaxuan; Zhang, Shun; Yao, Yihao; Yang, Shiqi; Shi, Jingjing; Shen, Nanxi; Su, Changliang; Zhang, Ju; Zhu, Wenzhen
2015-12-08
Conventional diffusion imaging techniques are not sufficiently accurate for evaluating glioma grade and cellular proliferation, which are critical for guiding glioma treatment. Diffusion kurtosis imaging (DKI), an advanced non-Gaussian diffusion imaging technique, has shown potential in grading glioma; however, its applications in this tumor have not been fully elucidated. In this study, DKI and diffusion weighted imaging (DWI) were performed on 74 consecutive patients with histopathologically confirmed glioma. The kurtosis and conventional diffusion metric values of the tumor were semi-automatically obtained. The relationships of these metrics with the glioma grade and Ki-67 expression were evaluated. The diagnostic efficiency of these metrics in grading was further compared. It was demonstrated that compared with the conventional diffusion metrics, the kurtosis metrics were more promising imaging markers in distinguishing high-grade from low-grade gliomas and distinguishing among grade II, III and IV gliomas; the kurtosis metrics also showed great potential in the prediction of Ki-67 expression. To our best knowledge, we are the first to reveal the ability of DKI to assess the cellular proliferation of gliomas, and to employ the semi-automatic method for the accurate measurement of gliomas. These results could have a significant impact on the diagnosis and subsequent therapy of glioma.
Collected notes from the Benchmarks and Metrics Workshop
NASA Technical Reports Server (NTRS)
Drummond, Mark E.; Kaelbling, Leslie P.; Rosenschein, Stanley J.
1991-01-01
In recent years there has been a proliferation of proposals in the artificial intelligence (AI) literature for integrated agent architectures. Each architecture offers an approach to the general problem of constructing an integrated agent. Unfortunately, the ways in which one architecture might be considered better than another are not always clear. There has been a growing realization that many of the positive and negative aspects of an architecture become apparent only when experimental evaluation is performed and that to progress as a discipline, we must develop rigorous experimental methods. In addition to the intrinsic intellectual interest of experimentation, rigorous performance evaluation of systems is also a crucial practical concern to our research sponsors. DARPA, NASA, and AFOSR (among others) are actively searching for better ways of experimentally evaluating alternative approaches to building intelligent agents. One tool for experimental evaluation involves testing systems on benchmark tasks in order to assess their relative performance. As part of a joint DARPA and NASA funded project, NASA-Ames and Teleos Research are carrying out a research effort to establish a set of benchmark tasks and evaluation metrics by which the performance of agent architectures may be determined. As part of this project, we held a workshop on Benchmarks and Metrics at the NASA Ames Research Center on June 25, 1990. The objective of the workshop was to foster early discussion on this important topic. We did not achieve a consensus, nor did we expect to. Collected here is some of the information that was exchanged at the workshop. Given here is an outline of the workshop, a list of the participants, notes taken on the white-board during open discussions, position papers/notes from some participants, and copies of slides used in the presentations.
Methodology, Methods, and Metrics for Testing and Evaluating Augmented Cognition Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greitzer, Frank L.
The augmented cognition research community seeks cognitive neuroscience-based solutions to improve warfighter performance by applying and managing mitigation strategies to reduce workload and improve the throughput and quality of decisions. The focus of augmented cognition mitigation research is to define, demonstrate, and exploit neuroscience and behavioral measures that support inferences about the warfighter’s cognitive state that prescribe the nature and timing of mitigation. A research challenge is to develop valid evaluation methodologies, metrics and measures to assess the impact of augmented cognition mitigations. Two considerations are external validity, which is the extent to which the results apply to operational contexts;more » and internal validity, which reflects the reliability of performance measures and the conclusions based on analysis of results. The scientific rigor of the research methodology employed in conducting empirical investigations largely affects the validity of the findings. External validity requirements also compel us to demonstrate operational significance of mitigations. Thus it is important to demonstrate effectiveness of mitigations under specific conditions. This chapter reviews some cognitive science and methodological considerations in designing augmented cognition research studies and associated human performance metrics and analysis methods to assess the impact of augmented cognition mitigations.« less
Goodman, Corey W.; Major, Heather J.; Walls, William D.; Sheffield, Val C.; Casavant, Thomas L.; Darbro, Benjamin W.
2016-01-01
Chromosomal microarrays (CMAs) are routinely used in both research and clinical laboratories; yet, little attention has been given to the estimation of genome-wide true and false negatives during the assessment of these assays and how such information could be used to calibrate various algorithmic metrics to improve performance. Low-throughput, locus-specific methods such as fluorescence in situ hybridization (FISH), quantitative PCR (qPCR), or multiplex ligation-dependent probe amplification (MLPA) preclude rigorous calibration of various metrics used by copy number variant (CNV) detection algorithms. To aid this task, we have established a comparative methodology, CNV-ROC, which is capable of performing a high throughput, low cost, analysis of CMAs that takes into consideration genome-wide true and false negatives. CNV-ROC uses a higher resolution microarray to confirm calls from a lower resolution microarray and provides for a true measure of genome-wide performance metrics at the resolution offered by microarray testing. CNV-ROC also provides for a very precise comparison of CNV calls between two microarray platforms without the need to establish an arbitrary degree of overlap. Comparison of CNVs across microarrays is done on a per-probe basis and receiver operator characteristic (ROC) analysis is used to calibrate algorithmic metrics, such as log2 ratio threshold, to enhance CNV calling performance. CNV-ROC addresses a critical and consistently overlooked aspect of analytical assessments of genome-wide techniques like CMAs which is the measurement and use of genome-wide true and false negative data for the calculation of performance metrics and comparison of CNV profiles between different microarray experiments. PMID:25595567
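The general idea of calibrating a calling threshold with ROC analysis can be sketched as follows; this is a generic illustration using scikit-learn, not the CNV-ROC tool itself, and the per-probe labels and log2-ratio scores are simulated.

```python
import numpy as np
from sklearn.metrics import roc_curve

# y_true: per-probe "truth" from a higher-resolution array (1 = probe inside a real CNV)
# scores: per-probe |log2 ratio| from the lower-resolution array (simulated values)
rng = np.random.default_rng(1)
y_true = np.concatenate([np.ones(200), np.zeros(1800)])
scores = np.concatenate([rng.normal(0.6, 0.2, 200), rng.normal(0.1, 0.15, 1800)])

fpr, tpr, thresholds = roc_curve(y_true, scores)
best = np.argmax(tpr - fpr)                     # Youden's J as one simple calibration rule
print(f"calibrated |log2 ratio| threshold ~ {thresholds[best]:.2f} "
      f"(TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f})")
```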
Distributed Space Mission Design for Earth Observation Using Model-Based Performance Evaluation
NASA Technical Reports Server (NTRS)
Nag, Sreeja; LeMoigne-Stewart, Jacqueline; Cervantes, Ben; DeWeck, Oliver
2015-01-01
Distributed Space Missions (DSMs) are gaining momentum in their application to earth observation missions owing to their unique ability to increase observation sampling in multiple dimensions. DSM design is a complex problem with many design variables, multiple objectives determining performance and cost and emergent, often unexpected, behaviors. There are very few open-access tools available to explore the tradespace of variables, minimize cost and maximize performance for pre-defined science goals, and therefore select the optimal design. This paper presents a software tool that can generate multiple DSM architectures based on pre-defined design variable ranges and size those architectures in terms of predefined science and cost metrics. The tool will help a user select Pareto optimal DSM designs based on design of experiments techniques. The tool will be applied to some earth observation examples to demonstrate its applicability in making some key decisions between different performance metrics and cost metrics early in the design lifecycle.
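The Pareto-optimal selection step described above can be sketched in a few lines of Python; the architectures, costs, and performance scores below are hypothetical placeholders, not outputs of the tool.

```python
def pareto_front(designs, cost_key="cost", perf_key="performance"):
    """Return designs not dominated by any other (lower cost and higher performance)."""
    front = []
    for d in designs:
        dominated = any(
            o[cost_key] <= d[cost_key] and o[perf_key] >= d[perf_key]
            and (o[cost_key] < d[cost_key] or o[perf_key] > d[perf_key])
            for o in designs
        )
        if not dominated:
            front.append(d)
    return front

# Hypothetical DSM architectures sized by a tool of this kind
archs = [
    {"name": "3-sat train", "cost": 120, "performance": 0.62},
    {"name": "6-sat Walker", "cost": 210, "performance": 0.81},
    {"name": "6-sat cluster", "cost": 230, "performance": 0.74},
    {"name": "9-sat Walker", "cost": 320, "performance": 0.90},
]
print([d["name"] for d in pareto_front(archs)])   # the dominated 6-sat cluster is dropped
```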
Energy dashboard for real-time evaluation of a heat pump assisted solar thermal system
NASA Astrophysics Data System (ADS)
Lotz, David Allen
The emergence of net-zero energy buildings, buildings that generate at least as much energy as they consume, has led to greater use of renewable energy sources such as solar thermal energy. One example is a heat pump assisted solar thermal system, which uses solar thermal collectors with an electrical heat pump backup to supply space heating and domestic hot water. The complexity of such a system can be somewhat problematic for monitoring and maintaining a high level of performance. Therefore, an energy dashboard was developed to provide comprehensive and user-friendly performance metrics for a solar heat pump system. Once developed, the energy dashboard was tested over a two-week period in order to determine the functionality of the dashboard program as well as the performance of the heating system itself. The results showed the importance of a user-friendly display and how each metric could be used to better maintain and evaluate an energy system. In particular, Energy Factor (EF), which is the ratio of output energy (collected energy) to input energy (consumed energy), was a key metric for summarizing the performance of the heating system. Furthermore, the average EF of the solar heat pump system was 2.29, indicating an efficiency significantly higher than traditional electrical heating systems.
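The Energy Factor is a simple ratio, as the following sketch shows; the two-week energy totals are hypothetical and do not reproduce the study's data.

```python
# Hypothetical two-week monitoring totals (kWh)
collected_energy_kwh = 350.0   # heat delivered by collectors and heat pump
consumed_energy_kwh = 160.0    # electricity drawn by pumps, controls, compressor

energy_factor = collected_energy_kwh / consumed_energy_kwh
print(f"EF = {energy_factor:.2f}")   # EF > 1 means more heat out than electricity in
```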
Sensitivity of the lane change test as a measure of in-vehicle system demand.
Young, Kristie L; Lenné, Michael G; Williamson, Amy R
2011-05-01
The Lane Change Test (LCT) is one of the growing number of methods developed to quantify driving performance degradation brought about by the use of in-vehicle devices. Beyond its validity and reliability, for such a test to be of practical use, it must also be sensitive to the varied demands of individual tasks. The current study evaluated the ability of several recent LCT lateral control and event detection parameters to discriminate between visual-manual and cognitive surrogate In-Vehicle Information System tasks with different levels of demand. Twenty-seven participants (mean age 24.4 years) completed a PC version of the LCT while performing visual search and math problem solving tasks. A number of the lateral control metrics were found to be sensitive to task differences, but the event detection metrics were less able to discriminate between tasks. The mean deviation and lane excursion measures were able to distinguish between the visual and cognitive tasks, but were less sensitive to the different levels of task demand. The other LCT metrics examined were less sensitive to task differences. A major factor influencing the sensitivity of at least some of the LCT metrics could be the type of lane change instructions given to participants. The provision of clear and explicit lane change instructions and further refinement of its metrics will be essential for increasing the utility of the LCT as an evaluation tool. Copyright © 2010 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Li, Guang; Greene, Travis C; Nishino, Thomas K; Willis, Charles E
2016-09-08
The purpose of this study was to evaluate several of the standardized image quality metrics proposed by the American Association of Physicists in Medicine (AAPM) Task Group 150. The task group suggested region-of-interest (ROI)-based techniques to measure nonuniformity, minimum signal-to-noise ratio (SNR), number of anomalous pixels, and modulation transfer function (MTF). This study evaluated the effects of ROI size and layout on the image metrics by using four different ROI sets, assessed result uncertainty by repeating measurements, and compared results with two commercially available quality control tools, namely the Carestream DIRECTVIEW Total Quality Tool (TQT) and the GE Healthcare Quality Assurance Process (QAP). Seven Carestream DRX-1C (CsI) detectors on mobile DR systems and four GE FlashPad detectors in radiographic rooms were tested. Images were analyzed using MATLAB software that had been previously validated and reported. Our values for signal and SNR nonuniformity and MTF agree with values published by other investigators. Our results show that ROI size affects nonuniformity and minimum SNR measurements, but not detection of anomalous pixels. Exposure geometry affects all tested image metrics except for the MTF. TG-150 metrics in general agree with the TQT, but agree with the QAP only for local and global signal nonuniformity. The difference in SNR nonuniformity and MTF values between the TG-150 and QAP may be explained by differences in the calculation of noise and acquisition beam quality, respectively. TG-150's SNR nonuniformity metrics are also more sensitive to detector nonuniformity compared to the QAP. Our results suggest that fixed ROI size should be used for consistency because nonuniformity metrics depend on ROI size. Ideally, detector tests should be performed at the exact calibration position. If not feasible, a baseline should be established from the mean of several repeated measurements. Our study indicates that the TG-150 tests can be used as an independent standardized procedure for detector performance assessment. © 2016 The Authors.
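As a rough illustration of ROI-based uniformity analysis and its dependence on ROI size, the Python sketch below tiles a synthetic flat-field image with square ROIs and reports the spread of ROI means; the specific definition is illustrative and not the TG-150 formula.

```python
import numpy as np

def roi_means(image, roi_size):
    """Tile the image with non-overlapping square ROIs and return each ROI's mean."""
    h, w = image.shape
    means = []
    for r in range(0, h - roi_size + 1, roi_size):
        for c in range(0, w - roi_size + 1, roi_size):
            means.append(image[r:r + roi_size, c:c + roi_size].mean())
    return np.array(means)

def global_signal_nonuniformity(image, roi_size=64):
    """Spread of ROI means relative to the overall mean (illustrative definition)."""
    m = roi_means(image, roi_size)
    return (m.max() - m.min()) / m.mean()

rng = np.random.default_rng(0)
flat_field = rng.normal(1000, 20, size=(512, 512))   # synthetic flat-field exposure
for size in (32, 64, 128):                            # the metric changes with ROI size
    print(size, round(global_signal_nonuniformity(flat_field, size), 4))
```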
Greene, Travis C.; Nishino, Thomas K.; Willis, Charles E.
2016-01-01
The purpose of this study was to evaluate several of the standardized image quality metrics proposed by the American Association of Physicists in Medicine (AAPM) Task Group 150. The task group suggested region-of-interest (ROI)-based techniques to measure nonuniformity, minimum signal-to-noise ratio (SNR), number of anomalous pixels, and modulation transfer function (MTF). This study evaluated the effects of ROI size and layout on the image metrics by using four different ROI sets, assessed result uncertainty by repeating measurements, and compared results with two commercially available quality control tools, namely the Carestream DIRECTVIEW Total Quality Tool (TQT) and the GE Healthcare Quality Assurance Process (QAP). Seven Carestream DRX-1C (CsI) detectors on mobile DR systems and four GE FlashPad detectors in radiographic rooms were tested. Images were analyzed using MATLAB software that had been previously validated and reported. Our values for signal and SNR nonuniformity and MTF agree with values published by other investigators. Our results show that ROI size affects nonuniformity and minimum SNR measurements, but not detection of anomalous pixels. Exposure geometry affects all tested image metrics except for the MTF. TG-150 metrics in general agree with the TQT, but agree with the QAP only for local and global signal nonuniformity. The difference in SNR nonuniformity and MTF values between the TG-150 and QAP may be explained by differences in the calculation of noise and acquisition beam quality, respectively. TG-150's SNR nonuniformity metrics are also more sensitive to detector nonuniformity compared to the QAP. Our results suggest that fixed ROI size should be used for consistency because nonuniformity metrics depend on ROI size. Ideally, detector tests should be performed at the exact calibration position. If not feasible, a baseline should be established from the mean of several repeated measurements. Our study indicates that the TG-150 tests can be used as an independent standardized procedure for detector performance assessment. PACS number(s): 87.57.-s, 87.57.C PMID:27685102
Jain, Amit; Kuhls-Gilcrist, Andrew T; Gupta, Sandesh K; Bednarek, Daniel R; Rudin, Stephen
2010-03-01
The MTF, NNPS, and DQE are standard linear system metrics used to characterize intrinsic detector performance. To evaluate total system performance for actual clinical conditions, generalized linear system metrics (GMTF, GNNPS and GDQE) that include the effect of the focal spot distribution, scattered radiation, and geometric unsharpness are more meaningful and appropriate. In this study, a two-dimensional (2D) generalized linear system analysis was carried out for a standard flat panel detector (FPD) (194-micron pixel pitch and 600-micron thick CsI) and a newly-developed, high-resolution, micro-angiographic fluoroscope (MAF) (35-micron pixel pitch and 300-micron thick CsI). Realistic clinical parameters and x-ray spectra were used. The 2D detector MTFs were calculated using the new Noise Response method and slanted edge method and 2D focal spot distribution measurements were done using a pin-hole assembly. The scatter fraction, generated for a uniform head equivalent phantom, was measured and the scatter MTF was simulated with a theoretical model. Different magnifications and scatter fractions were used to estimate the 2D GMTF, GNNPS and GDQE for both detectors. Results show spatial non-isotropy for the 2D generalized metrics which provide a quantitative description of the performance of the complete imaging system for both detectors. This generalized analysis demonstrated that the MAF and FPD have similar capabilities at lower spatial frequencies, but that the MAF has superior performance over the FPD at higher frequencies even when considering focal spot blurring and scatter. This 2D generalized performance analysis is a valuable tool to evaluate total system capabilities and to enable optimized design for specific imaging tasks.
A review of training research and virtual reality simulators for the da Vinci surgical system.
Liu, May; Curet, Myriam
2015-01-01
PHENOMENON: Virtual reality simulators are the subject of several recent studies of skills training for robot-assisted surgery. Yet no consensus exists regarding what a core skill set comprises or how to measure skill performance. Defining a core skill set and relevant metrics would help surgical educators evaluate different simulators. This review draws from published research to propose a core technical skill set for using the da Vinci surgeon console. Publications on three commercial simulators were used to evaluate the simulators' content addressing these skills and associated metrics. An analysis of published research suggests that a core technical skill set for operating the surgeon console includes bimanual wristed manipulation, camera control, master clutching to manage hand position, use of third instrument arm, activating energy sources, appropriate depth perception, and awareness of forces applied by instruments. Validity studies of three commercial virtual reality simulators for robot-assisted surgery suggest that all three have comparable content and metrics. However, none have comprehensive content and metrics for all core skills. INSIGHTS: Virtual reality simulation remains a promising tool to support skill training for robot-assisted surgery, yet existing commercial simulator content is inadequate for performing and assessing a comprehensive basic skill set. The results of this evaluation help identify opportunities and challenges that exist for future developments in virtual reality simulation for robot-assisted surgery. Specifically, the inclusion of educational experts in the development cycle alongside clinical and technological experts is recommended.
Chrol-Cannon, Joseph; Jin, Yaochu
2014-01-01
Reservoir computing provides a simpler paradigm of training recurrent networks by initialising and adapting the recurrent connections separately to a supervised linear readout. This creates a problem, though. As the recurrent weights and topology are now separated from adapting to the task, there is a burden on the reservoir designer to construct an effective network that happens to produce state vectors that can be mapped linearly into the desired outputs. Guidance in forming a reservoir can be through the use of some established metrics which link a number of theoretical properties of the reservoir computing paradigm to quantitative measures that can be used to evaluate the effectiveness of a given design. We provide a comprehensive empirical study of four metrics; class separation, kernel quality, Lyapunov's exponent and spectral radius. These metrics are each compared over a number of repeated runs, for different reservoir computing set-ups that include three types of network topology and three mechanisms of weight adaptation through synaptic plasticity. Each combination of these methods is tested on two time-series classification problems. We find that the two metrics that correlate most strongly with the classification performance are Lyapunov's exponent and kernel quality. It is also evident in the comparisons that these two metrics both measure a similar property of the reservoir dynamics. We also find that class separation and spectral radius are both less reliable and less effective in predicting performance.
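Two of the metrics compared in that study are straightforward to compute, as the following hedged Python sketch shows: the spectral radius of the recurrent weight matrix and a kernel-quality estimate taken here as the rank of the collected reservoir state matrix. The reservoir construction and parameters are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def spectral_radius(w):
    """Largest absolute eigenvalue of the recurrent weight matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(w))))

def kernel_quality(states):
    """Rank of the reservoir state matrix collected over distinct input streams;
    higher rank suggests richer, more separable representations."""
    return int(np.linalg.matrix_rank(states))

rng = np.random.default_rng(0)
n = 100
w = rng.normal(0, 1, (n, n)) * (rng.random((n, n)) < 0.1)   # sparse random reservoir
w *= 0.9 / spectral_radius(w)                                # rescale to radius 0.9
win = rng.normal(0, 0.5, n)                                  # input weights

# Drive the reservoir with a few random input streams and collect the final states
states = []
for _ in range(50):
    x = np.zeros(n)
    for u in rng.normal(0, 1, 20):
        x = np.tanh(w @ x + win * u)
    states.append(x)
print(spectral_radius(w), kernel_quality(np.array(states)))
```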
NASA Astrophysics Data System (ADS)
Zhou, Rurui; Li, Yu; Lu, Di; Liu, Haixing; Zhou, Huicheng
2016-09-01
This paper investigates the use of an epsilon-dominance non-dominated sorted genetic algorithm II (ɛ-NSGAII) as a sampling approach with an aim to improving sampling efficiency for multiple metrics uncertainty analysis using Generalized Likelihood Uncertainty Estimation (GLUE). The effectiveness of ɛ-NSGAII based sampling is demonstrated compared with Latin hypercube sampling (LHS) through analyzing sampling efficiency, multiple metrics performance, parameter uncertainty and flood forecasting uncertainty, with a case study of flood forecasting uncertainty evaluation based on the Xinanjiang model (XAJ) for Qing River reservoir, China. Results obtained demonstrate the following advantages of the ɛ-NSGAII based sampling approach in comparison to LHS: (1) The former performs more effectively and efficiently than LHS; for example, the simulation time required to generate 1000 behavioral parameter sets is roughly nine times shorter. (2) The Pareto tradeoffs between metrics are demonstrated clearly with the solutions from ɛ-NSGAII based sampling, and their Pareto optimal values are better than those of LHS, which means better forecasting accuracy of ɛ-NSGAII parameter sets. (3) The parameter posterior distributions from ɛ-NSGAII based sampling are concentrated in the appropriate ranges rather than uniform, which accords with their physical significance, and parameter uncertainties are reduced significantly. (4) The forecasted floods are close to the observations as evaluated by three measures: the normalized total flow outside the uncertainty intervals (FOUI), average relative band-width (RB) and average deviation amplitude (D). The flood forecasting uncertainty is also reduced considerably with ɛ-NSGAII based sampling. This study provides a new sampling approach to improve multiple metrics uncertainty analysis under the framework of GLUE, and could be used to reveal the underlying mechanisms of parameter sets under multiple conflicting metrics in the uncertainty analysis process.
Person Re-Identification via Distance Metric Learning With Latent Variables.
Sun, Chong; Wang, Dong; Lu, Huchuan
2017-01-01
In this paper, we propose an effective person re-identification method with latent variables, which represents a pedestrian as the mixture of a holistic model and a number of flexible models. Three types of latent variables are introduced to model uncertain factors in the re-identification problem, including vertical misalignments, horizontal misalignments and leg posture variations. The distance between two pedestrians can be determined by minimizing a given distance function with respect to latent variables, and then be used to conduct the re-identification task. In addition, we develop a latent metric learning method for learning the effective metric matrix, which can be solved via an iterative manner: once latent information is specified, the metric matrix can be obtained based on some typical metric learning methods; with the computed metric matrix, the latent variables can be determined by searching the state space exhaustively. Finally, extensive experiments are conducted on seven databases to evaluate the proposed method. The experimental results demonstrate that our method achieves better performance than other competing algorithms.
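The role of the learned metric matrix can be illustrated with a minimal Mahalanobis-style distance in Python; the latent-variable minimization described in the paper is omitted, and the metric matrix, gallery features, and probe features are invented.

```python
import numpy as np

def learned_distance(x, y, m):
    """Squared Mahalanobis-style distance d(x, y) = (x - y)^T M (x - y),
    with M a learned positive semi-definite metric matrix."""
    d = x - y
    return float(d @ m @ d)

# Toy "learned" metric: emphasise the first two feature dimensions
m = np.diag([4.0, 4.0, 0.5, 0.5])
gallery = {"person_A": np.array([0.9, 0.1, 0.3, 0.7]),
           "person_B": np.array([0.2, 0.8, 0.4, 0.6])}
probe = np.array([0.85, 0.15, 0.9, 0.1])

ranked = sorted(gallery, key=lambda k: learned_distance(probe, gallery[k], m))
print(ranked)  # re-identification = nearest gallery identity under the learned metric
```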
NASA Astrophysics Data System (ADS)
Phillips, Jonathan B.; Coppola, Stephen M.; Jin, Elaine W.; Chen, Ying; Clark, James H.; Mauer, Timothy A.
2009-01-01
Texture appearance is an important component of photographic image quality as well as object recognition. Noise cleaning algorithms are used to decrease sensor noise of digital images, but can hinder texture elements in the process. The Camera Phone Image Quality (CPIQ) initiative of the International Imaging Industry Association (I3A) is developing metrics to quantify texture appearance. Objective and subjective experimental results of the texture metric development are presented in this paper. Eight levels of noise cleaning were applied to ten photographic scenes that included texture elements such as faces, landscapes, architecture, and foliage. Four companies (Aptina Imaging, LLC, Hewlett-Packard, Eastman Kodak Company, and Vista Point Technologies) have performed psychophysical evaluations of overall image quality using one of two methods of evaluation. Both methods presented paired comparisons of images on thin film transistor liquid crystal displays (TFT-LCD), but the display pixel pitch and viewing distance differed. CPIQ has also been developing objective texture metrics and targets that were used to analyze the same eight levels of noise cleaning. The correlation of the subjective and objective test results indicates that texture perception can be modeled with an objective metric. The two methods of psychophysical evaluation exhibited high correlation despite the differences in methodology.
Load Disaggregation Technologies: Real World and Laboratory Performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayhorn, Ebony T.; Sullivan, Greg P.; Petersen, Joseph M.
Low cost interval metering and communication technology improvements over the past ten years have enabled the maturity of load disaggregation (or non-intrusive load monitoring) technologies to better estimate and report energy consumption of individual end-use loads. With the appropriate performance characteristics, these technologies have the potential to enable many utility and customer facing applications such as billing transparency, itemized demand and energy consumption, appliance diagnostics, commissioning, energy efficiency savings verification, load shape research, and demand response measurement. However, there has been much skepticism concerning the ability of load disaggregation products to accurately identify and estimate energy consumption of end-uses, which has hindered widespread market adoption. A contributing factor is that common test methods and metrics are not available to evaluate performance without having to perform large scale field demonstrations and pilots, which can be costly when developing such products. Without common and cost-effective methods of evaluation, more developed disaggregation technologies will continue to be slow to market and potential users will remain uncertain about their capabilities. This paper reviews recent field studies and laboratory tests of disaggregation technologies. Several factors are identified that are important to consider in test protocols, so that the results reflect real world performance. Potential metrics are examined to highlight their effectiveness in quantifying disaggregation performance. This analysis is then used to suggest performance metrics that are meaningful and of value to potential users and that will enable researchers/developers to identify beneficial ways to improve their technologies.
Detection and quantification of flow consistency in business process models.
Burattin, Andrea; Bernstein, Vered; Neurauter, Manuel; Soffer, Pnina; Weber, Barbara
2018-01-01
Business process models abstract complex business processes by representing them as graphical models. Their layout, as determined by the modeler, may have an effect when these models are used. However, this effect is currently not fully understood. In order to systematically study this effect, a basic set of measurable key visual features is proposed, depicting the layout properties that are meaningful to the human user. The aim of this research is thus twofold: first, to empirically identify key visual features of business process models which are perceived as meaningful to the user and second, to show how such features can be quantified into computational metrics, which are applicable to business process models. We focus on one particular feature, consistency of flow direction, and show the challenges that arise when transforming it into a precise metric. We propose three different metrics addressing these challenges, each following a different view of flow consistency. We then report the results of an empirical evaluation, which indicates which metric is more effective in predicting the human perception of this feature. Moreover, two other automatic evaluations describing the performance and the computational capabilities of our metrics are reported as well.
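One plausible way to quantify flow-direction consistency, of the several formulations the paper discusses, is the share of edges pointing in the dominant direction of the layout. A minimal Python sketch with a hypothetical five-node model:

```python
import math

# Hypothetical node coordinates (layout) and directed edges of a small process model
positions = {"start": (0, 0), "check": (2, 0), "approve": (4, 0),
             "reject": (2, 2), "end": (6, 0)}
edges = [("start", "check"), ("check", "approve"), ("check", "reject"),
         ("approve", "end"), ("reject", "end")]

def direction(edge):
    """Snap an edge's angle to the nearest of right/up/left/down (0/90/180/270)."""
    (x1, y1), (x2, y2) = positions[edge[0]], positions[edge[1]]
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360
    return round(angle / 90) % 4 * 90

dirs = [direction(e) for e in edges]
dominant = max(set(dirs), key=dirs.count)
consistency = dirs.count(dominant) / len(dirs)
print(f"dominant direction: {dominant} deg, flow consistency: {consistency:.2f}")
```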
ERIC Educational Resources Information Center
Schanie, Charles F.; Kemper, James E.
2008-01-01
Most "performance evaluation" or "performance development" programs in higher education today are little more than metrically-weak, bureaucratic programs aimed simply at establishing a legally-grounded employee record and a rough basis for merit increases. This article outlines the rationale and procedural framework for a talent development and…
Pichler, Peter; Mazanek, Michael; Dusberger, Frederico; Weilnböck, Lisa; Huber, Christian G; Stingl, Christoph; Luider, Theo M; Straube, Werner L; Köcher, Thomas; Mechtler, Karl
2012-11-02
While the performance of liquid chromatography (LC) and mass spectrometry (MS) instrumentation continues to increase, applications such as analyses of complete or near-complete proteomes and quantitative studies require constant and optimal system performance. For this reason, research laboratories and core facilities alike are recommended to implement quality control (QC) measures as part of their routine workflows. Many laboratories perform sporadic quality control checks. However, successive and systematic longitudinal monitoring of system performance would be facilitated by dedicated automatic or semiautomatic software solutions that aid an effortless analysis and display of QC metrics over time. We present the software package SIMPATIQCO (SIMPle AuTomatIc Quality COntrol) designed for evaluation of data from LTQ Orbitrap, Q-Exactive, LTQ FT, and LTQ instruments. A centralized SIMPATIQCO server can process QC data from multiple instruments. The software calculates QC metrics supervising every step of data acquisition from LC and electrospray to MS. For each QC metric the software learns the range indicating adequate system performance from the uploaded data using robust statistics. Results are stored in a database and can be displayed in a comfortable manner from any computer in the laboratory via a web browser. QC data can be monitored for individual LC runs as well as plotted over time. SIMPATIQCO thus assists the longitudinal monitoring of important QC metrics such as peptide elution times, peak widths, intensities, total ion current (TIC) as well as sensitivity, and overall LC-MS system performance; in this way the software also helps identify potential problems. The SIMPATIQCO software package is available free of charge.
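The idea of learning an acceptance range for each QC metric from historical data with robust statistics can be sketched as follows; the median-plus-scaled-MAD band and the example peak-width history are illustrative assumptions, not SIMPATIQCO's actual implementation.

```python
import numpy as np

def robust_qc_range(values, k=3.0):
    """Learn an acceptance band from historical QC values using robust statistics:
    median +/- k * MAD (MAD scaled to approximate a standard deviation)."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - med))
    return med - k * mad, med + k * mad

# Hypothetical history of one QC metric (e.g., median peptide peak width in seconds)
history = [22.1, 21.8, 22.4, 22.0, 23.1, 21.9, 22.3, 35.0]   # one bad run included
lo, hi = robust_qc_range(history)
latest = 27.5
print(f"acceptable range: {lo:.1f}-{hi:.1f}; latest run {'OK' if lo <= latest <= hi else 'flagged'}")
```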
2012-01-01
While the performance of liquid chromatography (LC) and mass spectrometry (MS) instrumentation continues to increase, applications such as analyses of complete or near-complete proteomes and quantitative studies require constant and optimal system performance. For this reason, research laboratories and core facilities alike are recommended to implement quality control (QC) measures as part of their routine workflows. Many laboratories perform sporadic quality control checks. However, successive and systematic longitudinal monitoring of system performance would be facilitated by dedicated automatic or semiautomatic software solutions that aid an effortless analysis and display of QC metrics over time. We present the software package SIMPATIQCO (SIMPle AuTomatIc Quality COntrol) designed for evaluation of data from LTQ Orbitrap, Q-Exactive, LTQ FT, and LTQ instruments. A centralized SIMPATIQCO server can process QC data from multiple instruments. The software calculates QC metrics supervising every step of data acquisition from LC and electrospray to MS. For each QC metric the software learns the range indicating adequate system performance from the uploaded data using robust statistics. Results are stored in a database and can be displayed in a comfortable manner from any computer in the laboratory via a web browser. QC data can be monitored for individual LC runs as well as plotted over time. SIMPATIQCO thus assists the longitudinal monitoring of important QC metrics such as peptide elution times, peak widths, intensities, total ion current (TIC) as well as sensitivity, and overall LC–MS system performance; in this way the software also helps identify potential problems. The SIMPATIQCO software package is available free of charge. PMID:23088386
Nixon, Gavin J; Svenstrup, Helle F; Donald, Carol E; Carder, Caroline; Stephenson, Judith M; Morris-Jones, Stephen; Huggett, Jim F; Foy, Carole A
2014-12-01
Molecular diagnostic measurements are currently underpinned by the polymerase chain reaction (PCR). There are also a number of alternative nucleic acid amplification technologies, which unlike PCR, work at a single temperature. These 'isothermal' methods, reportedly offer potential advantages over PCR such as simplicity, speed and resistance to inhibitors and could also be used for quantitative molecular analysis. However there are currently limited mechanisms to evaluate their quantitative performance, which would assist assay development and study comparisons. This study uses a sexually transmitted infection diagnostic model in combination with an adapted metric termed isothermal doubling time (IDT), akin to PCR efficiency, to compare quantitative PCR and quantitative loop-mediated isothermal amplification (qLAMP) assays, and to quantify the impact of matrix interference. The performance metric described here facilitates the comparison of qLAMP assays that could assist assay development and validation activities.
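A minimal sketch of how a doubling-time style metric can be extracted from an isothermal amplification curve by a log-linear fit over the exponential phase; the phase-selection rule and synthetic curve are assumptions for illustration, not the authors' exact IDT definition.

```python
import numpy as np

def isothermal_doubling_time(times_min, signal):
    """Estimate minutes per doubling from the exponential phase of an isothermal
    amplification curve by fitting log2(signal) against time (illustrative)."""
    t = np.asarray(times_min, dtype=float)
    s = np.asarray(signal, dtype=float)
    phase = (s > 0.1 * s.max()) & (s < 0.9 * s.max())   # crude exponential-phase window
    slope, _ = np.polyfit(t[phase], np.log2(s[phase]), 1)
    return 1.0 / slope

# Synthetic curve that doubles every 2.5 minutes
t = np.arange(0, 30, 1.0)
signal = 1e3 * 2 ** (t / 2.5)
print(f"IDT ~ {isothermal_doubling_time(t, signal):.2f} min")   # ~ 2.5
```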
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hansen, C., E-mail: hansec@uw.edu; Columbia University, New York, New York 10027; Victor, B.
We present application of three scalar metrics derived from the Biorthogonal Decomposition (BD) technique to evaluate the level of agreement between macroscopic plasma dynamics in different data sets. BD decomposes large data sets, as produced by distributed diagnostic arrays, into principal mode structures without assumptions on spatial or temporal structure. These metrics have been applied to validation of the Hall-MHD model using experimental data from the Helicity Injected Torus with Steady Inductive helicity injection experiment. Each metric provides a measure of correlation between mode structures extracted from experimental data and simulations for an array of 192 surface-mounted magnetic probes. Numerical validation studies have been performed using the NIMROD code, where the injectors are modeled as boundary conditions on the flux conserver, and the PSI-TET code, where the entire plasma volume is treated. Initial results from a comprehensive validation study of high performance operation with different injector frequencies are presented, illustrating application of the BD method. Using a simplified (constant, uniform density and temperature) Hall-MHD model, simulation results agree with experimental observation for two of the three defined metrics when the injectors are driven with a frequency of 14.5 kHz.
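Since the biorthogonal decomposition of a probes-by-time data matrix is equivalent to a singular value decomposition, a simple agreement measure between experiment and simulation is the cosine similarity of the dominant spatial modes. The Python sketch below is illustrative only, with simulated signals standing in for the actual probe data; the paper's three metrics may be defined differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n_probes, n_time = 192, 500
t = np.linspace(0, 1, n_time)
mode_true = np.sin(np.linspace(0, 2 * np.pi, n_probes))        # spatial structure of the mode
data_exp = np.outer(mode_true, np.sin(2 * np.pi * 10 * t)) \
           + 0.1 * rng.normal(size=(n_probes, n_time))          # "experimental" signals + noise
data_sim = np.outer(mode_true + 0.05 * rng.normal(size=n_probes),
                    np.sin(2 * np.pi * 10 * t))                  # "simulated" signals

# BD of a (probes x time) matrix is an SVD; columns of U are spatial mode structures.
u_exp, _, _ = np.linalg.svd(data_exp, full_matrices=False)
u_sim, _, _ = np.linalg.svd(data_sim, full_matrices=False)
corr = abs(float(u_exp[:, 0] @ u_sim[:, 0]))                    # unit vectors -> cosine similarity
print(f"dominant-mode agreement: {corr:.3f}")
```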
Proceedings of the 66th National Conference on Weights and Measures, 1981
NASA Astrophysics Data System (ADS)
Wollin, H. F.; Barbrow, L. E.; Heffernan, A. P.
1981-12-01
Major issues discussed included measurement science education, enforcement uniformity, national type approval, inch-pound and metric labeling provisions, new design and performance requirements for weighing and measuring technology, metric conversion of retail gasoline dispensers, weights and measures program evaluation, studies of model State laws and regulations and their adoption by citation or other means by State and local jurisdictions, and reports of States conducting grain moisture meter testing programs.
The Albuquerque Seismological Laboratory Data Quality Analyzer
NASA Astrophysics Data System (ADS)
Ringler, A. T.; Hagerty, M.; Holland, J.; Gee, L. S.; Wilson, D.
2013-12-01
The U.S. Geological Survey's Albuquerque Seismological Laboratory (ASL) has several efforts underway to improve data quality at its stations. The Data Quality Analyzer (DQA) is one such development. The DQA is designed to characterize station data quality in a quantitative and automated manner. Station quality is based on the evaluation of various metrics, such as timing quality, noise levels, sensor coherence, and so on. These metrics are aggregated into a measurable grade for each station. The DQA consists of a website, a metric calculator (Seedscan), and a PostgreSQL database. The website allows the user to make requests for various time periods, review specific networks and stations, adjust weighting of the station's grade, and plot metrics as a function of time. The website dynamically loads all station data from a PostgreSQL database. The database is central to the application; it acts as a hub where metric values and limited station descriptions are stored. Data is stored at the level of one sensor's channel per day. The database is populated by Seedscan. Seedscan reads and processes miniSEED data, to generate metric values. Seedscan, written in Java, compares hashes of metadata and data to detect changes and perform subsequent recalculations. This ensures that the metric values are up to date and accurate. Seedscan can be run in a scheduled task or on demand by way of a config file. It will compute metrics specified in its configuration file. While many metrics are currently in development, some are completed and being actively used. These include: availability, timing quality, gap count, deviation from the New Low Noise Model, deviation from a station's noise baseline, inter-sensor coherence, and data-synthetic fits. In all, 20 metrics are planned, but any number could be added. ASL is actively using the DQA on a daily basis for station diagnostics and evaluation. As Seedscan is scheduled to run every night, data quality analysts are able to then use the website to diagnose changes in noise levels or other anomalous data. This allows for errors to be corrected quickly and efficiently. The code is designed to be flexible for adding metrics and portable for use in other networks. We anticipate further development of the DQA by improving the existing web-interface, adding more metrics, adding an interface to facilitate the verification of historic station metadata and performance, and an interface to allow better monitoring of data quality goals.
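The weighted aggregation of metric values into a station grade can be sketched very simply; the metric names, values, and weights below are hypothetical and only illustrate the adjustable-weighting idea described above.

```python
# Per-metric scores on a common 0-100 scale and user-adjustable weights (hypothetical)
metrics = {"availability": 99.2, "timing_quality": 97.5,
           "noise_vs_model": 88.0, "coherence": 94.0}
weights = {"availability": 2.0, "timing_quality": 1.0,
           "noise_vs_model": 1.5, "coherence": 1.0}

grade = sum(metrics[m] * weights[m] for m in metrics) / sum(weights.values())
print(f"station grade: {grade:.1f}")
```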
Document Level Assessment of Document Retrieval Systems in a Pairwise System Evaluation
ERIC Educational Resources Information Center
Rajagopal, Prabha; Ravana, Sri Devi
2017-01-01
Introduction: The use of averaged topic-level scores can result in the loss of valuable data and can cause misinterpretation of the effectiveness of system performance. This study aims to use the scores of each document to evaluate document retrieval systems in a pairwise system evaluation. Method: The chosen evaluation metrics are document-level…
ACSYNT inner loop flight control design study
NASA Technical Reports Server (NTRS)
Bortins, Richard; Sorensen, John A.
1993-01-01
The NASA Ames Research Center developed the Aircraft Synthesis (ACSYNT) computer program to synthesize conceptual future aircraft designs and to evaluate critical performance metrics early in the design process before significant resources are committed and cost decisions made. ACSYNT uses steady-state performance metrics, such as aircraft range, payload, and fuel consumption, and static performance metrics, such as the control authority required for the takeoff rotation and for landing with an engine out, to evaluate conceptual aircraft designs. It can also optimize designs with respect to selected criteria and constraints. Many modern aircraft have stability provided by the flight control system rather than by the airframe. This may allow the aircraft designer to increase combat agility, or decrease trim drag, for increased range and payload. This strategy requires concurrent design of the airframe and the flight control system, making trade-offs of performance and dynamics during the earliest stages of design. ACSYNT presently lacks means to implement flight control system designs but research is being done to add methods for predicting rotational degrees of freedom and control effector performance. A software module to compute and analyze the dynamics of the aircraft and to compute feedback gains and analyze closed loop dynamics is required. The data gained from these analyses can then be fed back to the aircraft design process so that the effects of the flight control system and the airframe on aircraft performance can be included as design metrics. This report presents results of a feasibility study and the initial design work to add an inner loop flight control system (ILFCS) design capability to the stability and control module in ACSYNT. The overall objective is to provide a capability for concurrent design of the aircraft and its flight control system, and enable concept designers to improve performance by exploiting the interrelationships between aircraft and flight control system design parameters.
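A minimal sketch of the kind of inner-loop computation such a module would perform: linear-quadratic regulator feedback gains for a two-state short-period model, followed by a closed-loop eigenvalue check. The matrices are hypothetical placeholders, not ACSYNT data, and this is not the ACSYNT ILFCS implementation.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[-0.7, 1.0],
              [-5.0, -1.2]])          # states: angle of attack, pitch rate (hypothetical)
B = np.array([[0.0],
              [-8.0]])                # control: elevator deflection
Q = np.diag([10.0, 1.0])              # state weighting
R = np.array([[1.0]])                 # control weighting

P = solve_continuous_are(A, B, Q, R)  # solve the algebraic Riccati equation
K = np.linalg.solve(R, B.T @ P)       # LQR gains: K = R^-1 B^T P
closed_loop = A - B @ K
print("gains:", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(closed_loop))
```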
The data quality analyzer: A quality control program for seismic data
NASA Astrophysics Data System (ADS)
Ringler, A. T.; Hagerty, M. T.; Holland, J.; Gonzales, A.; Gee, L. S.; Edwards, J. D.; Wilson, D.; Baker, A. M.
2015-03-01
The U.S. Geological Survey's Albuquerque Seismological Laboratory (ASL) has several initiatives underway to enhance and track the quality of data produced from ASL seismic stations and to improve communication about data problems to the user community. The Data Quality Analyzer (DQA) is one such development and is designed to characterize seismic station data quality in a quantitative and automated manner. The DQA consists of a metric calculator, a PostgreSQL database, and a Web interface: The metric calculator, SEEDscan, is a Java application that reads and processes miniSEED data and generates metrics based on a configuration file. SEEDscan compares hashes of metadata and data to detect changes in either and performs subsequent recalculations as needed. This ensures that the metric values are up to date and accurate. SEEDscan can be run as a scheduled task or on demand. The PostgreSQL database acts as a central hub where metric values and limited station descriptions are stored at the channel level with one-day granularity. The Web interface dynamically loads station data from the database and allows the user to make requests for time periods of interest, review specific networks and stations, plot metrics as a function of time, and adjust the contribution of various metrics to the overall quality grade of the station. The quantification of data quality is based on the evaluation of various metrics (e.g., timing quality, daily noise levels relative to long-term noise models, and comparisons between broadband data and event synthetics). Users may select which metrics contribute to the assessment and those metrics are aggregated into a "grade" for each station. The DQA is being actively used for station diagnostics and evaluation based on the completed metrics (availability, gap count, timing quality, deviation from a global noise model, deviation from a station noise model, coherence between co-located sensors, and comparison between broadband data and synthetics for earthquakes) on stations in the Global Seismographic Network and Advanced National Seismic System.
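The grade described above aggregates per-metric scores with user-adjustable weights. The sketch below shows one simple weighted-average scheme in Python; the metric names, scores, and weighting rule are illustrative assumptions rather than the DQA's actual formula.

```python
def station_grade(metric_scores, weights):
    """Weighted average of per-metric scores (each assumed scaled to 0-100).

    metric_scores: dict metric_name -> score
    weights:       dict metric_name -> non-negative, user-adjustable weight
    Metrics given zero weight simply drop out of the grade.
    """
    total = sum(weights.get(m, 0.0) for m in metric_scores)
    if total == 0:
        return float("nan")
    return sum(metric_scores[m] * weights.get(m, 0.0) for m in metric_scores) / total

# Hypothetical example: down-weight the synthetics comparison relative to availability
scores = {"availability": 99.2, "timing_quality": 96.5, "synthetic_fit": 78.0}
weights = {"availability": 2.0, "timing_quality": 1.0, "synthetic_fit": 0.5}
print(round(station_grade(scores, weights), 1))
```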
Federal Register 2010, 2011, 2012, 2013, 2014
2012-12-05
... DEPARTMENT OF TRANSPORTATION Pipeline and Hazardous Materials Safety Administration [Docket No... Evaluations AGENCY: Pipeline and Hazardous Materials Safety Administration (PHMSA), DOT. ACTION: Notice... improve performance. For gas transmission pipelines, Sec. Sec. 192.911(i) and 192.945 define the...
Goodman, Corey W; Major, Heather J; Walls, William D; Sheffield, Val C; Casavant, Thomas L; Darbro, Benjamin W
2015-04-01
Chromosomal microarrays (CMAs) are routinely used in both research and clinical laboratories; yet, little attention has been given to the estimation of genome-wide true and false negatives during the assessment of these assays and how such information could be used to calibrate various algorithmic metrics to improve performance. Low-throughput, locus-specific methods such as fluorescence in situ hybridization (FISH), quantitative PCR (qPCR), or multiplex ligation-dependent probe amplification (MLPA) preclude rigorous calibration of various metrics used by copy number variant (CNV) detection algorithms. To aid this task, we have established a comparative methodology, CNV-ROC, which is capable of performing a high throughput, low cost, analysis of CMAs that takes into consideration genome-wide true and false negatives. CNV-ROC uses a higher resolution microarray to confirm calls from a lower resolution microarray and provides for a true measure of genome-wide performance metrics at the resolution offered by microarray testing. CNV-ROC also provides for a very precise comparison of CNV calls between two microarray platforms without the need to establish an arbitrary degree of overlap. Comparison of CNVs across microarrays is done on a per-probe basis and receiver operator characteristic (ROC) analysis is used to calibrate algorithmic metrics, such as log2 ratio threshold, to enhance CNV calling performance. CNV-ROC addresses a critical and consistently overlooked aspect of analytical assessments of genome-wide techniques like CMAs which is the measurement and use of genome-wide true and false negative data for the calculation of performance metrics and comparison of CNV profiles between different microarray experiments. Copyright © 2015 Elsevier Inc. All rights reserved.
Substantial Progress Yet Significant Opportunity for Improvement in Stroke Care in China.
Li, Zixiao; Wang, Chunjuan; Zhao, Xingquan; Liu, Liping; Wang, Chunxue; Li, Hao; Shen, Haipeng; Liang, Li; Bettger, Janet; Yang, Qing; Wang, David; Wang, Anxin; Pan, Yuesong; Jiang, Yong; Yang, Xiaomeng; Zhang, Changqing; Fonarow, Gregg C; Schwamm, Lee H; Hu, Bo; Peterson, Eric D; Xian, Ying; Wang, Yilong; Wang, Yongjun
2016-11-01
Stroke is a leading cause of death in China. Yet the adherence to guideline-recommended ischemic stroke performance metrics in the past decade has been previously shown to be suboptimal. Since then, several nationwide stroke quality management initiatives have been conducted in China. We sought to determine whether adherence had improved since then. Data were obtained from the 2 phases of China National Stroke Registries, which included 131 hospitals (12 173 patients with acute ischemic stroke) in China National Stroke Registries phase 1 from 2007 to 2008 versus 219 hospitals (19 604 patients) in China National Stroke Registries phase 2 from 2012 to 2013. Multiple regression models were developed to evaluate the difference in adherence to performance measure between the 2 study periods. The overall quality of care has improved over time, as reflected by the higher composite score of 0.76 in 2012 to 2013 versus 0.63 in 2007 to 2008. Nine of 13 individual performance metrics improved. However, there were no significant improvements in the rates of intravenous thrombolytic therapy and anticoagulation for atrial fibrillation. After multivariate analysis, there remained a significant 1.17-fold (95% confidence interval, 1.14-1.21) increase in the odds of delivering evidence-based performance metrics in the more recent time periods versus older data. The performance metrics with the most significantly increased odds included stroke education, dysphagia screening, smoking cessation, and antithrombotics at discharge. Adherence to stroke performance metrics has increased over time, but significant opportunities remain for further improvement. Continuous stroke quality improvement program should be developed as a national priority in China. © 2016 American Heart Association, Inc.
The performance of differential VLBI delay during interplanetary cruise
NASA Technical Reports Server (NTRS)
Moultrie, B.; Wolff, P. J.; Taylor, T. H.
1984-01-01
Project Voyager radio metric data are used to evaluate the orbit determination abilities of several data strategies during spacecraft interplanetary cruise. Benchmark performance is established with an operational data strategy of conventional coherent doppler, coherent range, and explicitly differenced range data from two intercontinental baselines to ameliorate the low declination singularity of the doppler data. Employing a Voyager operations trajectory as a reference, the performance of the operational data strategy is compared to the performances of data strategies using differential VLBI delay data (spacecraft delay minus quasar delay) in combinations with the aforementioned conventional data types. The comparison of strategy performances indicates that high accuracy cruise orbit determination can be achieved with a data strategy employing differential VLBI delay data, where the quantity of coherent radio metric data has been greatly reduced.
A high turndown, ultra low emission low swirl burner for natural gas, on-demand water heaters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rapp, Vi H.; Cheng, Robert K.; Therkelsen, Peter L.
Previous research has shown that on-demand water heaters are, on average, approximately 37% more efficient than storage water heaters. However, approximately 98% of water heaters in the U.S. use storage water heaters while the remaining 2% are on-demand. A major market barrier to deployment of on-demand water heaters is their high retail cost, which is due in part to their reliance on multi-stage burner banks that require complex electronic controls. This project aims to research and develop a cost-effective, efficient, ultra-low emission burner for next generation natural gas on-demand water heaters in residential and commercial buildings. To meet these requirements, researchers at the Lawrence Berkeley National Laboratory (LBNL) are adapting and testing the low-swirl burner (LSB) technology for commercially available on-demand water heaters. In this report, a low-swirl burner is researched, developed, and evaluated to meet targeted on-demand water heater performance metrics. Performance metrics for a new LSB design are identified by characterizing performance of current on-demand water heaters using published literature and technical specifications, and through experimental evaluations that measure fuel consumption and emissions output over a range of operating conditions. Next, target metrics and design criteria for the LSB are used to create six 3D printed prototypes for preliminary investigations. Prototype designs that proved the most promising were fabricated out of metal and tested further to evaluate the LSB's full performance potential. After conducting a full performance evaluation on two designs, we found that one LSB design is capable of meeting or exceeding almost all the target performance metrics for on-demand water heaters. Specifically, this LSB demonstrated flame stability when operating from 4.07 kBTU/hr up to 204 kBTU/hr (50:1 turndown), compliance with SCAQMD Rule 1146.2 (14 ng/J or 20 ppm NOX @ 3% O2), and lower CO emissions than state-of-the-art water heaters. Overall, the results from this research show that the LSB could provide a simple, low-cost burner solution for significantly extending the operating range of on-demand water heaters while providing low NOX and CO emissions.
Realized population change for long-term monitoring: California spotted owl case study
Mary M. Conner; John J. Keane; Claire V. Gallagher; Gretchen Jehle; Thomas E. Munton; Paula A. Shaklee; Ross A. Gerrard
2013-01-01
The annual rate of population change (λt) is a good metric for evaluating population performance because it summarizes survival and recruitment rates and can be used for open populations. Another measure of population performance, realized population change (Δt...
Information-theoretic model comparison unifies saliency metrics
Kümmerer, Matthias; Wallis, Thomas S. A.; Bethge, Matthias
2015-01-01
Learning the properties of an image associated with human gaze placement is important both for understanding how biological systems explore the environment and for computer vision applications. There is a large literature on quantitative eye movement models that seeks to predict fixations from images (sometimes termed “saliency” prediction). A major problem known to the field is that existing model comparison metrics give inconsistent results, causing confusion. We argue that the primary reason for these inconsistencies is because different metrics and models use different definitions of what a “saliency map” entails. For example, some metrics expect a model to account for image-independent central fixation bias whereas others will penalize a model that does. Here we bring saliency evaluation into the domain of information by framing fixation prediction models probabilistically and calculating information gain. We jointly optimize the scale, the center bias, and spatial blurring of all models within this framework. Evaluating existing metrics on these rephrased models produces almost perfect agreement in model rankings across the metrics. Model performance is separated from center bias and spatial blurring, avoiding the confounding of these factors in model comparison. We additionally provide a method to show where and how models fail to capture information in the fixations on the pixel level. These methods are readily extended to spatiotemporal models of fixation scanpaths, and we provide a software package to facilitate their use. PMID:26655340
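A minimal sketch of the information-gain idea described above: the average log-likelihood advantage, in bits per fixation, of a probabilistic saliency map over a baseline such as a center-bias model. The toy maps and fixation list below are hypothetical, and the paper's full procedure (jointly fitting scale, center bias, and blur) is not reproduced here.

```python
import numpy as np

def information_gain(model_map, baseline_map, fixations):
    """Mean log-likelihood advantage (bits per fixation) of a probabilistic
    saliency model over a baseline model.

    model_map, baseline_map: 2-D arrays of non-negative values (normalized inside)
    fixations: iterable of (row, col) pixel indices of observed fixations
    """
    model = np.asarray(model_map, float)
    base = np.asarray(baseline_map, float)
    model, base = model / model.sum(), base / base.sum()
    eps = np.finfo(float).tiny                      # guard against log(0)
    gains = [np.log2(model[r, c] + eps) - np.log2(base[r, c] + eps)
             for r, c in fixations]
    return float(np.mean(gains))

# Toy example: a model that concentrates probability where fixations landed
baseline = np.ones((32, 32))                        # uniform baseline
model = np.ones((32, 32)); model[10:20, 10:20] = 8  # hypothetical saliency map
fix = [(12, 15), (18, 11), (14, 14)]
print(information_gain(model, baseline, fix))
```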
Moore, C S; Wood, T J; Avery, G; Balcam, S; Needler, L; Beavis, A W; Saunderson, J R
2014-05-07
The purpose of this study was to examine the use of three physical image quality metrics in the calibration of an automatic exposure control (AEC) device for chest radiography with a computed radiography (CR) imaging system. The metrics assessed were signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR) and mean effective noise equivalent quanta (eNEQm), all measured using a uniform chest phantom. Subsequent calibration curves were derived to ensure each metric was held constant across the tube voltage range. Each curve was assessed for its clinical appropriateness by generating computer simulated chest images with correct detector air kermas for each tube voltage, and grading these against reference images which were reconstructed at detector air kermas correct for the constant detector dose indicator (DDI) curve currently programmed into the AEC device. All simulated chest images contained clinically realistic projected anatomy and anatomical noise and were scored by experienced image evaluators. Constant DDI and CNR curves do not appear to provide optimized performance across the diagnostic energy range. Conversely, constant eNEQm and SNR do appear to provide optimized performance, with the latter being the preferred calibration metric given as it is easier to measure in practice. Medical physicists may use the SNR image quality metric described here when setting up and optimizing AEC devices for chest radiography CR systems with a degree of confidence that resulting clinical image quality will be adequate for the required clinical task. However, this must be done with close cooperation of expert image evaluators, to ensure appropriate levels of detector air kerma.
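For readers unfamiliar with the image quality metrics above, the following sketch computes SNR and CNR from pixel values in uniform phantom regions of interest; the ROI sizes and pixel statistics are invented for illustration, and eNEQ estimation (which requires noise power spectrum and MTF measurements) is not shown.

```python
import numpy as np

def snr(roi):
    """Signal-to-noise ratio of a uniform region of interest (mean / standard deviation)."""
    roi = np.asarray(roi, float)
    return roi.mean() / roi.std(ddof=1)

def cnr(roi_detail, roi_background):
    """Contrast-to-noise ratio between a detail ROI and the surrounding background."""
    detail = np.asarray(roi_detail, float)
    bkg = np.asarray(roi_background, float)
    return abs(detail.mean() - bkg.mean()) / bkg.std(ddof=1)

# Toy pixel values standing in for phantom ROIs at one tube voltage
rng = np.random.default_rng(1)
background = rng.normal(1000, 25, size=(64, 64))
detail = rng.normal(1100, 25, size=(16, 16))
print(round(snr(background), 1), round(cnr(detail, background), 1))
```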
DOT National Transportation Integrated Search
2017-09-01
A number of Connected and/or Automated Vehicle (CAV) applications have recently been designed to improve the performance of our transportation system. Safety, mobility and environmental sustainability are three cornerstone performance metrics when ev...
Conceptual Soundness, Metric Development, Benchmarking, and Targeting for PATH Subprogram Evaluation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mosey, G.; Doris, E.; Coggeshall, C.
The objective of this study is to evaluate the conceptual soundness of the U.S. Department of Housing and Urban Development (HUD) Partnership for Advancing Technology in Housing (PATH) program's revised goals and establish and apply a framework to identify and recommend metrics that are the most useful for measuring PATH's progress. This report provides an evaluative review of PATH's revised goals, outlines a structured method for identifying and selecting metrics, proposes metrics and benchmarks for a sampling of individual PATH programs, and discusses other metrics that potentially could be developed that may add value to the evaluation process. The framework and individual program metrics can be used for ongoing management improvement efforts and to inform broader program-level metrics for government reporting requirements.
Guiding Principles and Checklist for Population-Based Quality Metrics
Brunelli, Steven M.; Maddux, Franklin W.; Parker, Thomas F.; Johnson, Douglas; Nissenson, Allen R.; Collins, Allan; Lacson, Eduardo
2014-01-01
The Centers for Medicare and Medicaid Services oversees the ESRD Quality Incentive Program to ensure that the highest quality of health care is provided by outpatient dialysis facilities that treat patients with ESRD. To that end, Centers for Medicare and Medicaid Services uses clinical performance measures to evaluate quality of care under a pay-for-performance or value-based purchasing model. Now more than ever, the ESRD therapeutic area serves as the vanguard of health care delivery. By translating medical evidence into clinical performance measures, the ESRD Prospective Payment System became the first disease-specific sector using the pay-for-performance model. A major challenge for the creation and implementation of clinical performance measures is the adjustments that are necessary to transition from taking care of individual patients to managing the care of patient populations. The National Quality Forum and others have developed effective and appropriate population-based clinical performance measures quality metrics that can be aggregated at the physician, hospital, dialysis facility, nursing home, or surgery center level. Clinical performance measures considered for endorsement by the National Quality Forum are evaluated using five key criteria: evidence, performance gap, and priority (impact); reliability; validity; feasibility; and usability and use. We have developed a checklist of special considerations for clinical performance measure development according to these National Quality Forum criteria. Although the checklist is focused on ESRD, it could also have broad application to chronic disease states, where health care delivery organizations seek to enhance quality, safety, and efficiency of their services. Clinical performance measures are likely to become the norm for tracking performance for health care insurers. Thus, it is critical that the methodologies used to develop such metrics serve the payer and the provider and most importantly, reflect what represents the best care to improve patient outcomes. PMID:24558050
Evaluation Metrics for Simulations of Tropical South America
NASA Astrophysics Data System (ADS)
Gallup, S.; Baker, I. T.; Denning, A. S.; Cheeseman, M.; Haynes, K. D.; Phillips, M.
2017-12-01
The evergreen broadleaf forest of the Amazon Basin is the largest rainforest on earth, and has teleconnections to global climate and carbon cycle characteristics. This region defies simple characterization, spanning large gradients in total rainfall and seasonal variability. Broadly, the region can be thought of as trending from light-limited in its wettest areas to water-limited near the ecotone, with individual landscapes possibly exhibiting the characteristics of either (or both) limitations during an annual cycle. A basin-scale classification of mean behavior has been elusive, and ecosystem response to seasonal cycles and anomalous drought events has resulted in some disagreement in the literature, to say the least. However, new observational platforms and instruments make characterization of the heterogeneity and variability more feasible. To evaluate simulations of ecophysiological function, we develop metrics that correlate various observational products with meteorological variables such as precipitation and radiation. Observations include eddy covariance fluxes, Solar Induced Fluorescence (SIF, from GOME2 and OCO2), biomass and vegetation indices. We find that the modest correlation between SIF and precipitation decreases with increasing annual precipitation, although the relationship is not consistent between products. Biomass increases with increasing precipitation. Although vegetation indices are generally correlated with biomass and precipitation, they can saturate or experience retrieval issues during cloudy periods. Using these observational products and relationships, we develop a set of model evaluation metrics. These metrics are designed to call attention to models that get "the right answer only if it's for the right reason," and provide an opportunity for more critical evaluation of model physics. These metrics represent a testbed that can be applied to multiple models as a means to evaluate their performance in tropical South America.
Integrating fisheries approaches and household utility models for improved resource management.
Milner-Gulland, E J
2011-01-25
Natural resource management is littered with cases of overexploitation and ineffectual management, leading to loss of both biodiversity and human welfare. Disciplinary boundaries stifle the search for solutions to these issues. Here, I combine the approach of management strategy evaluation, widely applied in fisheries, with household utility models from the conservation and development literature, to produce an integrated framework for evaluating the effectiveness of competing management strategies for harvested resources against a range of performance metrics. I demonstrate the strengths of this approach with a simple model, and use it to examine the effect of manager ignorance of household decisions on resource management effectiveness, and an allocation tradeoff between monitoring resource stocks to reduce observation uncertainty and monitoring users to improve compliance. I show that this integrated framework enables management assessments to consider household utility as a direct metric for system performance, and that although utility and resource stock conservation metrics are well aligned, harvest yield is a poor proxy for both, because it is a product of household allocation decisions between alternate livelihood options, rather than an end in itself. This approach has potential far beyond single-species harvesting in situations where managers are in full control; I show that the integrated approach enables a range of management intervention options to be evaluated within the same framework.
A PERFORMANCE EVALUATION OF THE ETA- CMAQ AIR QUALITY FORECAST SYSTEM FOR THE SUMMER OF 2005
This poster presents an evaluation of the Eta-CMAQ Air Quality Forecast System's experimental domain using O3 observations obtained from EPA's AIRNOW program and a suite of statistical metrics examining both discrete and categorical forecasts.
Metrics for the NASA Airspace Systems Program
NASA Technical Reports Server (NTRS)
Smith, Jeremy C.; Neitzke, Kurt W.
2009-01-01
This document defines an initial set of metrics for use by the NASA Airspace Systems Program (ASP). ASP consists of the NextGen-Airspace Project and the NextGen-Airportal Project. The work in each project is organized along multiple, discipline-level Research Focus Areas (RFAs). Each RFA is developing future concept elements in support of the Next Generation Air Transportation System (NextGen), as defined by the Joint Planning and Development Office (JPDO). In addition, a single, system-level RFA is responsible for integrating concept elements across RFAs in both projects and for assessing system-wide benefits. The primary purpose of this document is to define a common set of metrics for measuring National Airspace System (NAS) performance before and after the introduction of ASP-developed concepts for NextGen as the system handles increasing traffic. The metrics are directly traceable to NextGen goals and objectives as defined by the JPDO and hence will be used to measure the progress of ASP research toward reaching those goals. The scope of this document is focused on defining a common set of metrics for measuring NAS capacity, efficiency, robustness, and safety at the system-level and at the RFA-level. Use of common metrics will focus ASP research toward achieving system-level performance goals and objectives and enable the discipline-level RFAs to evaluate the impact of their concepts at the system level.
NASA Astrophysics Data System (ADS)
Eum, H. I.; Cannon, A. J.
2015-12-01
Climate models are a key tool for investigating the impacts of projected future climate conditions on regional hydrologic systems. However, there is a considerable mismatch in spatial resolution between GCMs and regional applications, in particular for a region characterized by complex terrain such as the Korean Peninsula. Therefore, a downscaling procedure is essential for assessing regional impacts of climate change. Numerous statistical downscaling methods have been used, mainly because of their computational efficiency and simplicity. In this study, four statistical downscaling methods [Bias-Correction/Spatial Disaggregation (BCSD), Bias-Correction/Constructed Analogue (BCCA), Multivariate Adaptive Constructed Analogs (MACA), and Bias-Correction/Climate Imprint (BCCI)] are applied to downscale the latest Climate Forecast System Reanalysis data to stations for precipitation, maximum temperature, and minimum temperature over South Korea. Using a split-sampling scheme, all methods are calibrated with observational station data for the 19 years from 1973 to 1991 and tested on the recent 19 years from 1992 to 2010. To assess the skill of the downscaling methods, we construct a comprehensive suite of performance metrics that measure the ability to reproduce temporal correlation, distributions, spatial correlation, and extreme events. In addition, we employ the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to identify robust statistical downscaling methods based on the performance metrics for each season. The results show that downscaling skill is considerably affected by the skill of CFSR, and all methods lead to large improvements in representing all performance metrics. According to the seasonal performance metrics evaluated, when TOPSIS is applied, MACA is identified as the most reliable and robust method for all variables and seasons. Note that this result is derived from CFSR output, which is regarded as near-perfect climate data in climate studies. Therefore, the ranking in this study may change when various GCMs are downscaled and evaluated. Nevertheless, it may be informative for end-users (i.e., modelers or water resources managers) in understanding and selecting more suitable downscaling methods corresponding to priorities in regional applications.
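TOPSIS, used above to rank the downscaling methods, can be sketched compactly: normalize the decision matrix, weight it, and score each alternative by its closeness to the ideal and anti-ideal solutions. The methods-by-metrics numbers and weights below are hypothetical, not values from the study.

```python
import numpy as np

def topsis(scores, weights, benefit):
    """Rank alternatives (rows) against criteria (columns) with TOPSIS.

    scores:  (n_alternatives, n_criteria) performance-metric matrix
    weights: criterion weights (assumed to sum to 1)
    benefit: boolean per criterion; True if larger values are better
    Returns closeness to the ideal solution (higher = better).
    """
    x = np.asarray(scores, float)
    w = np.asarray(weights, float)
    norm = x / np.linalg.norm(x, axis=0)          # vector-normalize each criterion
    v = norm * w                                  # weighted normalized matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_best = np.linalg.norm(v - ideal, axis=1)
    d_worst = np.linalg.norm(v - anti, axis=1)
    return d_worst / (d_best + d_worst)

# Hypothetical example: 4 downscaling methods x 3 metrics
# (correlation: benefit, RMSE: cost, extreme-event skill: benefit)
scores = [[0.85, 1.9, 0.70], [0.80, 1.6, 0.74], [0.88, 2.1, 0.65], [0.83, 1.7, 0.72]]
print(topsis(scores, weights=[0.4, 0.3, 0.3], benefit=[True, False, True]))
```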
NASA Technical Reports Server (NTRS)
Verma, Savita; Lee, Hanbong; Dulchinos, Victoria L.; Martin, Lynne; Stevens, Lindsay; Jung, Yoon; Chevalley, Eric; Jobe, Kim; Parke, Bonny
2017-01-01
NASA has been working with the FAA and aviation industry partners to develop and demonstrate new concepts and technologies that integrate arrival, departure, and surface traffic management capabilities. In March 2017, NASA conducted a human-in-the-loop (HITL) simulation for integrated surface and airspace operations, modeling Charlotte Douglas International Airport, to evaluate the operational procedures and information requirements for the tactical surface metering tool, and data exchange elements between the airline controlled ramp and ATC Tower. In this paper, we focus on the calibration of the tactical surface metering tool using various metrics measured from the HITL simulation results. Key performance metrics include gate hold times from pushback advisories, taxi-in-out times, runway throughput, and departure queue size. Subjective metrics presented in this paper include workload, situational awareness, and acceptability of the metering tool and its calibration.
NASA Astrophysics Data System (ADS)
Hernandez, Monica
2017-12-01
This paper proposes a method for primal-dual convex optimization in variational large deformation diffeomorphic metric mapping problems formulated with robust regularizers and robust image similarity metrics. The method is based on Chambolle and Pock primal-dual algorithm for solving general convex optimization problems. Diagonal preconditioning is used to ensure the convergence of the algorithm to the global minimum. We consider three robust regularizers liable to provide acceptable results in diffeomorphic registration: Huber, V-Huber and total generalized variation. The Huber norm is used in the image similarity term. The primal-dual equations are derived for the stationary and the non-stationary parameterizations of diffeomorphisms. The resulting algorithms have been implemented for running in the GPU using Cuda. For the most memory consuming methods, we have developed a multi-GPU implementation. The GPU implementations allowed us to perform an exhaustive evaluation study in NIREP and LPBA40 databases. The experiments showed that, for all the considered regularizers, the proposed method converges to diffeomorphic solutions while better preserving discontinuities at the boundaries of the objects compared to baseline diffeomorphic registration methods. In most cases, the evaluation showed a competitive performance for the robust regularizers, close to the performance of the baseline diffeomorphic registration methods.
Tenório, Thyago; Bittencourt, Ig Ibert; Isotani, Seiji; Pedro, Alan; Ospina, Patrícia; Tenório, Daniel
2017-06-01
In this dataset, we present the data collected in two experiments applying the gamified peer assessment model in the online learning environment MeuTutor, to allow comparison of the obtained results with other proposed models. MeuTutor is an intelligent tutoring system that aims to monitor students' learning in a personalized way, ensuring quality education and improving the performance of its members (Tenório et al., 2016) [1]. The first experiment evaluated the effectiveness of the peer assessment model through metrics such as final grade (result), time to correct the activities, and associated costs. The second experiment evaluated the influence of gamification on the peer assessment model, analyzing metrics such as number of accesses (logins), number of performed activities, and number of performed corrections. In this article, we present, in table form for each metric: the raw data of each treatment; the summarized data; the results of the Shapiro-Wilk normality test; and the results of the statistical tests (t-test and/or Wilcoxon). The data presented in this article are related to the article entitled "A gamified peer assessment model for on-line learning environments in a competitive context" (Tenório et al., 2016) [1].
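A minimal sketch of the statistical workflow described above, assuming independent groups: check normality with Shapiro-Wilk and then apply either a t-test or a Wilcoxon rank-sum test. The example grades are invented; the original analysis may have structured or paired the samples differently.

```python
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    """Pick a t-test or a Wilcoxon rank-sum test based on a Shapiro-Wilk check.

    a, b: metric values (e.g., final grades or login counts) for the two
    treatments, assumed here to be independent samples.
    """
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        res, name = stats.ttest_ind(a, b), "t-test"
    else:
        res, name = stats.ranksums(a, b), "Wilcoxon rank-sum"
    return name, res.statistic, res.pvalue

control = [6.1, 7.0, 5.8, 6.4, 6.9, 7.2, 6.0, 6.6]    # hypothetical grades
gamified = [7.4, 8.1, 6.9, 7.8, 8.3, 7.2, 7.9, 7.5]
print(compare_groups(control, gamified))
```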
Crouch, Edmund A; Labarre, David; Golden, Neal J; Kause, Janell R; Dearfield, Kerry L
2009-10-01
The U.S. Department of Agriculture, Food Safety and Inspection Service is exploring quantitative risk assessment methodologies to incorporate the use of the Codex Alimentarius' newly adopted risk management metrics (e.g., food safety objectives and performance objectives). It is suggested that use of these metrics would more closely tie the results of quantitative microbial risk assessments (QMRAs) to public health outcomes. By estimating the food safety objective (the maximum frequency and/or concentration of a hazard in a food at the time of consumption) and the performance objective (the maximum frequency and/or concentration of a hazard in a food at a specified step in the food chain before the time of consumption), risk managers will have a better understanding of the appropriate level of protection (ALOP) from microbial hazards for public health protection. We here demonstrate a general methodology that allows identification of an ALOP and evaluation of corresponding metrics at appropriate points in the food chain. It requires a two-dimensional probabilistic risk assessment, the example used being the Monte Carlo QMRA for Clostridium perfringens in ready-to eat and partially cooked meat and poultry products, with minor modifications to evaluate and abstract required measures. For demonstration purposes, the QMRA model was applied specifically to hot dogs produced and consumed in the United States. Evaluation of the cumulative uncertainty distribution for illness rate allows a specification of an ALOP that, with defined confidence, corresponds to current industry practices.
Kumar, B. Vinodh; Mohan, Thuthi
2018-01-01
OBJECTIVE: Six Sigma is one of the most popular quality management system tools employed for process improvement. The Six Sigma methods are usually applied when the outcome of the process can be measured. This study was done to assess the performance of individual biochemical parameters on a Sigma Scale by calculating the sigma metrics for individual parameters and to follow the Westgard guidelines for appropriate Westgard rules and levels of internal quality control (IQC) that needs to be processed to improve target analyte performance based on the sigma metrics. MATERIALS AND METHODS: This is a retrospective study, and data required for the study were extracted between July 2015 and June 2016 from a Secondary Care Government Hospital, Chennai. The data obtained for the study are IQC - coefficient of variation percentage and External Quality Assurance Scheme (EQAS) - Bias% for 16 biochemical parameters. RESULTS: For the level 1 IQC, four analytes (alkaline phosphatase, magnesium, triglyceride, and high-density lipoprotein-cholesterol) showed an ideal performance of ≥6 sigma level, five analytes (urea, total bilirubin, albumin, cholesterol, and potassium) showed an average performance of <3 sigma level and for level 2 IQCs, same four analytes of level 1 showed a performance of ≥6 sigma level, and four analytes (urea, albumin, cholesterol, and potassium) showed an average performance of <3 sigma level. For all analytes <6 sigma level, the quality goal index (QGI) was <0.8 indicating the area requiring improvement to be imprecision except cholesterol whose QGI >1.2 indicated inaccuracy. CONCLUSION: This study shows that sigma metrics is a good quality tool to assess the analytical performance of a clinical chemistry laboratory. Thus, sigma metric analysis provides a benchmark for the laboratory to design a protocol for IQC, address poor assay performance, and assess the efficiency of existing laboratory processes. PMID:29692587
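The sigma metric and quality goal index referenced above follow the formulas commonly used in such studies: sigma = (TEa - |bias|) / CV and QGI = |bias| / (1.5 x CV), with all terms in percent. A small sketch, with a hypothetical analyte:

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma = (allowable total error - bias) / imprecision, all in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct

def quality_goal_index(bias_pct, cv_pct):
    """QGI < 0.8 points to imprecision, > 1.2 to inaccuracy, otherwise both."""
    return abs(bias_pct) / (1.5 * cv_pct)

# Hypothetical analyte: TEa 10%, EQAS bias 2.5%, IQC CV 1.2%
print(sigma_metric(10, 2.5, 1.2), quality_goal_index(2.5, 1.2))
```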
NASA Technical Reports Server (NTRS)
Lee, P. J.
1985-01-01
For a frequency-hopped noncoherent MFSK communication system without jammer state information (JSI) in a worst case partial band jamming environment, it is well known that the use of a conventional unquantized metric results in very poor performance. In this paper, a 'normalized' unquantized energy metric is suggested for such a system. It is shown that with this metric, one can save 2-3 dB in required signal energy over the system with hard decision metric without JSI for the same desired performance. When this very robust metric is compared to the conventional unquantized energy metric with JSI, the loss in required signal energy is shown to be small. Thus, the use of this normalized metric provides performance comparable to systems for which JSI is known. Cutoff rate and bit error rate with dual-k coding are used for the performance measures.
NASA Astrophysics Data System (ADS)
Wagle, Pradeep; Bhattarai, Nishan; Gowda, Prasanna H.; Kakani, Vijaya G.
2017-06-01
Robust evapotranspiration (ET) models are required to predict water usage in a variety of terrestrial ecosystems under different geographical and agrometeorological conditions. As a result, several remote sensing-based surface energy balance (SEB) models have been developed to estimate ET over large regions. However, comparison of the performance of several SEB models at the same site is limited. In addition, none of the SEB models have been evaluated for their ability to predict ET in rain-fed high biomass sorghum grown for biofuel production. In this paper, we evaluated the performance of five widely used single-source SEB models, namely Surface Energy Balance Algorithm for Land (SEBAL), Mapping ET with Internalized Calibration (METRIC), Surface Energy Balance System (SEBS), Simplified Surface Energy Balance Index (S-SEBI), and operational Simplified Surface Energy Balance (SSEBop), for estimating ET over a high biomass sorghum field during the 2012 and 2013 growing seasons. The predicted ET values were compared against eddy covariance (EC) measured ET (ETEC) for 19 cloud-free Landsat images. In general, S-SEBI, SEBAL, and SEBS performed reasonably well for the study period, while METRIC and SSEBop performed poorly. All SEB models substantially overestimated ET under extremely dry conditions, as they underestimated sensible heat (H) and overestimated latent heat (LE) fluxes under dry conditions during the partitioning of available energy. METRIC, SEBAL, and SEBS overestimated LE regardless of wet or dry periods. Consequently, predicted seasonal cumulative ET by METRIC, SEBAL, and SEBS was higher than seasonal cumulative ETEC in both seasons. In contrast, S-SEBI and SSEBop substantially underestimated ET under very wet conditions, and predicted seasonal cumulative ET by S-SEBI and SSEBop was lower than seasonal cumulative ETEC in the relatively wetter 2013 growing season. Our results indicate the necessity of including a soil moisture or plant water stress component in SEB models to improve their performance, especially in very dry or very wet environments.
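Several of the SEB models above (e.g., SEBAL, METRIC, SEBS) obtain the latent heat flux as the residual of the surface energy balance, LE = Rn - G - H, and convert it to an ET depth. A simplified daily-scale sketch, assuming a constant latent heat of vaporization and using hypothetical flux values:

```python
LAMBDA_V = 2.45e6  # latent heat of vaporization, J kg^-1 (approximate)

def daily_et_mm(rn, g, h):
    """Daily ET (mm/day) from the surface energy balance residual.

    rn, g, h: daily mean net radiation, soil heat flux, and sensible heat flux (W m^-2).
    LE = Rn - G - H; 1 kg of water per m^2 corresponds to a 1 mm depth.
    """
    le = rn - g - h                       # latent heat flux, W m^-2
    return le * 86400.0 / LAMBDA_V

# Hypothetical midsummer day: Rn = 180, G = 20, H = 60 W m^-2  ->  ~3.5 mm/day
print(round(daily_et_mm(180, 20, 60), 2))
```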
Spectrum splitting metrics and effect of filter characteristics on photovoltaic system performance.
Russo, Juan M; Zhang, Deming; Gordon, Michael; Vorndran, Shelby; Wu, Yuechen; Kostuk, Raymond K
2014-03-10
During the past few years there has been a significant interest in spectrum splitting systems to increase the overall efficiency of photovoltaic solar energy systems. However, methods for comparing the performance of spectrum splitting systems and the effects of optical spectral filter design on system performance are not well developed. This paper addresses these two areas. The system conversion efficiency is examined in detail and the role of optical spectral filters with respect to the efficiency is developed. A new metric termed the Improvement over Best Bandgap is defined which expresses the efficiency gain of the spectrum splitting system with respect to a similar system that contains the highest constituent single bandgap photovoltaic cell. This parameter indicates the benefit of using the more complex spectrum splitting system with respect to a single bandgap photovoltaic system. Metrics are also provided to assess the performance of experimental spectral filters in different spectrum splitting configurations. The paper concludes by using the methodology to evaluate spectrum splitting systems with different filter configurations and indicates the overall efficiency improvement that is possible with ideal and experimental designs.
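The Improvement over Best Bandgap metric introduced above compares the spectrum-splitting system with the best single-bandgap cell it contains. A sketch of one plausible formulation, as a relative efficiency gain, is shown below; the paper's exact normalization may differ, and the efficiencies used are invented.

```python
def improvement_over_best_bandgap(eta_split, eta_single_cells):
    """Relative efficiency gain of a spectrum-splitting system over the best
    single-bandgap cell it contains (a sketch; the published definition may
    differ in normalization).
    """
    eta_best = max(eta_single_cells)
    return (eta_split - eta_best) / eta_best

# Hypothetical: splitting system at 34% vs. constituent cells at 28%, 22%, 18%
print(f"{improvement_over_best_bandgap(0.34, [0.28, 0.22, 0.18]):.1%}")
```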
Strategy quantification using body worn inertial sensors in a reactive agility task.
Eke, Chika U; Cain, Stephen M; Stirling, Leia A
2017-11-07
Agility performance is often evaluated using time-based metrics, which provide little information about which factors aid or limit success. The objective of this study was to better understand agility strategy by identifying biomechanical metrics that were sensitive to performance speed, which were calculated with data from an array of body-worn inertial sensors. Five metrics were defined (normalized number of foot contacts, stride length variance, arm swing variance, mean normalized stride frequency, and number of body rotations) that corresponded to agility terms defined by experts working in athletic, clinical, and military environments. Eighteen participants donned 13 sensors to complete a reactive agility task, which involved navigating a set of cones in response to a vocal cue. Participants were grouped into fast, medium, and slow performance based on their completion time. Participants in the fast group had the smallest number of foot contacts (normalizing by height), highest stride length variance (normalizing by height), highest forearm angular velocity variance, and highest stride frequency (normalizing by height). The number of body rotations was not sensitive to speed and may have been determined by hand and foot dominance while completing the agility task. The results of this study have the potential to inform the development of a composite agility score constructed from the list of significant metrics. By quantifying the agility terms previously defined by expert evaluators through an agility score, this study can assist in strategy development for training and rehabilitation across athletic, clinical, and military domains. Copyright © 2017 Elsevier Ltd. All rights reserved.
Quantitative adaptation analytics for assessing dynamic systems of systems: LDRD Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gauthier, John H.; Miner, Nadine E.; Wilson, Michael L.
2015-01-01
Our society is increasingly reliant on systems and interoperating collections of systems, known as systems of systems (SoS). These SoS are often subject to changing missions (e.g., nation-building, arms-control treaties), threats (e.g., asymmetric warfare, terrorism), natural environments (e.g., climate, weather, natural disasters) and budgets. How well can SoS adapt to these types of dynamic conditions? This report details the results of a three-year Laboratory Directed Research and Development (LDRD) project aimed at developing metrics and methodologies for quantifying the adaptability of systems and SoS. Work products include: derivation of a set of adaptability metrics, a method for combining the metrics into a system of systems adaptability index (SoSAI) used to compare adaptability of SoS designs, development of a prototype dynamic SoS (proto-dSoS) simulation environment which provides the ability to investigate the validity of the adaptability metric set, and two test cases that evaluate the usefulness of a subset of the adaptability metrics and SoSAI for distinguishing good from poor adaptability in a SoS. Intellectual property results include three patents pending: A Method For Quantifying Relative System Adaptability, Method for Evaluating System Performance, and A Method for Determining Systems Re-Tasking.
NASA Astrophysics Data System (ADS)
Kwon, Hyeokjun; Kang, Yoojin; Jang, Junwoo
2017-09-01
Color fidelity has been used as one of the indices for evaluating the performance of light sources. Since the Color Rendering Index (CRI) was proposed by the CIE, many color fidelity metrics have been proposed to increase the accuracy of the metric. This paper compares color fidelity metrics in terms of their agreement with human visual assessments. To visually evaluate the color fidelity of light sources, we built a simulator that reproduces color samples under different lighting conditions. Eighteen color samples of the Macbeth color checker under the test light sources, and under the reference illuminant for each of them, are simulated and displayed on a well-characterized monitor. With only the spectra of the test light source and reference illuminant, color samples under any lighting condition can be reproduced. In this paper, the spectra of two LED and two OLED light sources that have similar CRI values are used for the visual assessment. In addition, the results of the visual assessment are compared with two color fidelity metrics: CRI and IES TM-30-15 (Rf), the latter proposed by the Illuminating Engineering Society (IES) in 2015. Experimental results indicate that Rf outperforms CRI in terms of correlation with the visual assessment.
Learning Compositional Shape Models of Multiple Distance Metrics by Information Projection.
Luo, Ping; Lin, Liang; Liu, Xiaobai
2016-07-01
This paper presents a novel compositional contour-based shape model by incorporating multiple distance metrics to account for varying shape distortions or deformations. Our approach contains two key steps: 1) contour feature generation and 2) generative model pursuit. For each category, we first densely sample an ensemble of local prototype contour segments from a few positive shape examples and describe each segment using three different types of distance metrics. These metrics are diverse and complementary with each other to capture various shape deformations. We regard the parameterized contour segment plus an additive residual ϵ as a basic subspace, namely, ϵ -ball, in the sense that it represents local shape variance under the certain distance metric. Using these ϵ -balls as features, we then propose a generative learning algorithm to pursue the compositional shape model, which greedily selects the most representative features under the information projection principle. In experiments, we evaluate our model on several public challenging data sets, and demonstrate that the integration of multiple shape distance metrics is capable of dealing various shape deformations, articulations, and background clutter, hence boosting system performance.
Performance evaluation of a distance learning program.
Dailey, D J; Eno, K R; Brinkley, J F
1994-01-01
This paper presents a performance metric which uses a single number to characterize the response time for a non-deterministic client-server application operating over the Internet. When applied to a Macintosh-based distance learning application called the Digital Anatomist Browser, the metric allowed us to observe that "A typical student doing a typical mix of Browser commands on a typical data set will experience the same delay if they use a slow Macintosh on a local network or a fast Macintosh on the other side of the country accessing the data over the Internet." The methodology presented is applicable to other client-server applications that are rapidly appearing on the Internet.
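One way to read the single-number metric described above is as the mean delay experienced over a typical mix of commands. The sketch below weights each command's mean response time by its share of a typical session; the command names, delays, and mix are hypothetical, not measurements from the Digital Anatomist Browser.

```python
def expected_delay(mean_delay_by_command, mix):
    """Single-number response-time metric: the mean delay a typical user sees,
    weighting each command's mean delay by its share in a typical session.

    mean_delay_by_command: dict command -> mean response time (s)
    mix:                   dict command -> fraction of commands in a session
    (Command names and weights here are hypothetical.)
    """
    return sum(mean_delay_by_command[c] * mix[c] for c in mix)

delays = {"load_image": 2.8, "rotate": 0.4, "label_query": 0.9}
mix = {"load_image": 0.2, "rotate": 0.5, "label_query": 0.3}
print(round(expected_delay(delays, mix), 2))
```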
NASA Technical Reports Server (NTRS)
Strybel, Thomas Z.; Vu, Kim-Phuong L.; Battiste, Vernol; Dao, Arik-Quang; Dwyer, John P.; Landry, Steven; Johnson, Walter; Ho, Nhut
2011-01-01
A research consortium of scientists and engineers from California State University Long Beach (CSULB), San Jose State University Foundation (SJSUF), California State University Northridge (CSUN), Purdue University, and The Boeing Company was assembled to evaluate the impact of changes in roles and responsibilities and new automated technologies, being introduced in the Next Generation Air Transportation System (NextGen), on operator situation awareness (SA) and workload. To meet these goals, consortium members performed systems analyses of NextGen concepts and airspace scenarios, and concurrently evaluated SA, workload, and performance measures to assess their appropriateness for evaluations of NextGen concepts and tools. The following activities and accomplishments were supported by the NRA: a distributed simulation, metric development, systems analysis, part-task simulations, and large-scale simulations. As a result of this NRA, we have gained a greater understanding of situation awareness and its measurement, and have shared our knowledge with the scientific community. This network provides a mechanism for consortium members, colleagues, and students to pursue research on other topics in air traffic management and aviation, thus enabling them to make greater contributions to the field
Shaikh, Faiq; Hendrata, Kenneth; Kolowitz, Brian; Awan, Omer; Shrestha, Rasu; Deible, Christopher
2017-06-01
In the era of value-based healthcare, many aspects of medical care are being measured and assessed to improve quality and reduce costs. Radiology adds enormously to health care costs and is under pressure to adopt a more efficient system that incorporates essential metrics to assess its value and impact on outcomes. Most current systems tie radiologists' incentives and evaluations to RVU-based productivity metrics and peer-review-based quality metrics. In a new potential model, a radiologist's performance will have to increasingly depend on a number of parameters that define "value," beginning with peer review metrics that include referrer satisfaction and feedback from radiologists to the referring physician that evaluates the potency and validity of clinical information provided for a given study. These new dimensions of value measurement will directly impact the cascade of further medical management. We share our continued experience with this project that had two components: RESP (Referrer Evaluation System Pilot) and FRACI (Feedback from Radiologist Addressing Confounding Issues), which were introduced to the clinical radiology workflow in order to capture referrer-based and radiologist-based feedback on radiology reporting. We also share our insight into the principles of design thinking as applied in its planning and execution.
A novel critical infrastructure resilience assessment approach using dynamic Bayesian networks
NASA Astrophysics Data System (ADS)
Cai, Baoping; Xie, Min; Liu, Yonghong; Liu, Yiliu; Ji, Renjie; Feng, Qiang
2017-10-01
The word resilience originates from the Latin word "resiliere", which means to "bounce back". The concept has been used with different definitions in various fields, such as ecology, economics, psychology, and sociology. In the field of critical infrastructure, although several resilience metrics have been proposed, they differ markedly from one another because each is tied to the performance of the particular object of evaluation. Here we bridge the gap by developing a universal critical infrastructure resilience metric from the perspective of reliability engineering. A dynamic Bayesian network-based assessment approach is proposed to calculate the resilience value. A series, parallel, and voting system is used to demonstrate the application of the developed resilience metric and assessment approach.
Comparison of information theoretic divergences for sensor management
NASA Astrophysics Data System (ADS)
Yang, Chun; Kadar, Ivan; Blasch, Erik; Bakich, Michael
2011-06-01
In this paper, we compare the information-theoretic metrics of the Kullback-Leibler (K-L) and Renyi (α) divergence formulations for sensor management. Information-theoretic metrics have been well suited for sensor management as they afford comparisons between distributions resulting from different types of sensors under different actions. The difference in distributions can also be measured as entropy formulations to discern the communication channel capacity (i.e., Shannon limit). In this paper, we formulate a sensor management scenario for target tracking and compare various metrics for performance evaluation as a function of the design parameter (α) so as to determine which measures might be appropriate for sensor management given the dynamics of the scenario and design parameter.
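For reference, the two divergences compared above have the following discrete forms, sketched here for a prior/posterior pair over a discretized target state; the distributions are invented for illustration.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions (nats)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1); approaches KL as alpha -> 1."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

# Prior vs. posterior over a discretized target state (hypothetical numbers)
prior = [0.25, 0.25, 0.25, 0.25]
post = [0.55, 0.25, 0.15, 0.05]
print(kl_divergence(post, prior), renyi_divergence(post, prior, alpha=0.5))
```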
Standardised metrics for global surgical surveillance.
Weiser, Thomas G; Makary, Martin A; Haynes, Alex B; Dziekan, Gerald; Berry, William R; Gawande, Atul A
2009-09-26
Public health surveillance relies on standardised metrics to evaluate disease burden and health system performance. Such metrics have not been developed for surgical services despite increasing volume, substantial cost, and high rates of death and disability associated with surgery. The Safe Surgery Saves Lives initiative of WHO's Patient Safety Programme has developed standardised public health metrics for surgical care that are applicable worldwide. We assembled an international panel of experts to develop and define metrics for measuring the magnitude and effect of surgical care in a population, while taking into account economic feasibility and practicability. This panel recommended six measures for assessing surgical services at a national level: number of operating rooms, number of operations, number of accredited surgeons, number of accredited anaesthesia professionals, day-of-surgery death ratio, and postoperative in-hospital death ratio. We assessed the feasibility of gathering such statistics at eight diverse hospitals in eight countries and incorporated them into the WHO Guidelines for Safe Surgery, in which methods for data collection, analysis, and reporting are outlined.
Choi, M H; Oh, S N; Park, G E; Yeo, D-M; Jung, S E
2018-05-10
To evaluate the interobserver and intermethod correlations of histogram metrics of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) parameters acquired by multiple readers using the single-section and whole-tumor volume methods. Four DCE parameters (Ktrans, Kep, Ve, Vp) were evaluated in 45 patients (31 men and 14 women; mean age, 61±11 years [range, 29-83 years]) with locally advanced rectal cancer using pre-chemoradiotherapy (CRT) MRI. Ten histogram metrics were extracted using two methods of lesion selection performed by three radiologists: the whole-tumor volume method for the whole tumor on axial section-by-section images and the single-section method for the entire area of the tumor on one axial image. The interobserver and intermethod correlations were evaluated using intraclass correlation coefficients (ICCs). The ICCs showed excellent interobserver and intermethod correlations for most of the histogram metrics of the DCE parameters. The ICCs among the three readers were > 0.7 (P<0.001) for all histogram metrics, except for the minimum and maximum. The intermethod correlations for most of the histogram metrics were excellent for each radiologist, regardless of the differences in the radiologists' experience. The interobserver and intermethod correlations for most of the histogram metrics of the DCE parameters are excellent in rectal cancer. Therefore, the single-section method may be a potential alternative to the whole-tumor volume method using pre-CRT MRI, despite the fact that the high agreement between the two methods cannot be extrapolated to post-CRT MRI. Copyright © 2018 Société française de radiologie. Published by Elsevier Masson SAS. All rights reserved.
NASA Astrophysics Data System (ADS)
Samardzic, Nikolina
The effectiveness of in-vehicle speech communication can be a good indicator of the perception of the overall vehicle quality and customer satisfaction. Currently available speech intelligibility metrics do not account in their procedures for essential parameters needed for a complete and accurate evaluation of in-vehicle speech intelligibility. These include the directivity and the distance of the talker with respect to the listener, binaural listening, hearing profile of the listener, vocal effort, and multisensory hearing. In the first part of this research, the effectiveness of in-vehicle application of these metrics is investigated in a series of studies to reveal their shortcomings, including a wide range of scores resulting from each of the metrics for a given measurement configuration and vehicle operating condition. In addition, the nature of a possible correlation between the scores obtained from each metric is unknown. The metrics and the subjective perception of speech intelligibility using, for example, the same speech material have not been compared in the literature. As a result, in the second part of this research, an alternative method for speech intelligibility evaluation is proposed for use in the automotive industry by utilizing a virtual reality driving environment for ultimately setting targets, including the associated statistical variability, for future in-vehicle speech intelligibility evaluation. The Speech Intelligibility Index (SII) was evaluated at the sentence Speech Reception Threshold (sSRT) for various listening situations and hearing profiles using acoustic perception jury testing and a variety of talker and listener configurations and background noise. In addition, the effect of individual sources and transfer paths of sound in an operating vehicle on the vehicle interior sound, and specifically on speech intelligibility, was quantified within the framework of the newly developed speech intelligibility evaluation method. Lastly, as an example of the significance of speech intelligibility evaluation in the context of an applicable listening environment, as indicated in this research, it was found that the jury test participants required on average an approximate 3 dB increase in sound pressure level of speech material while driving and listening compared to when just listening, for an equivalent speech intelligibility performance and the same listening task.
Creating "Intelligent" Climate Model Ensemble Averages Using a Process-Based Framework
NASA Astrophysics Data System (ADS)
Baker, N. C.; Taylor, P. C.
2014-12-01
The CMIP5 archive contains future climate projections from over 50 models provided by dozens of modeling centers from around the world. Individual model projections, however, are subject to biases created by structural model uncertainties. As a result, ensemble averaging of multiple models is often used to add value to model projections: consensus projections have been shown to consistently outperform individual models. Previous reports for the IPCC establish climate change projections based on an equal-weighted average of all model projections. However, certain models reproduce climate processes better than other models. Should models be weighted based on performance? Unequally weighted ensemble averages have previously been constructed using a variety of mean state metrics. What metrics are most relevant for constraining future climate projections? This project develops a framework for systematically testing metrics in models to identify optimal metrics for unequally weighting multi-model ensembles. A unique aspect of this project is the construction and testing of climate process-based model evaluation metrics. A climate process-based metric is defined as a metric based on the relationship between two physically related climate variables, e.g., outgoing longwave radiation and surface temperature. Metrics are constructed using high-quality Earth radiation budget data from NASA's Clouds and the Earth's Radiant Energy System (CERES) instrument and surface temperature data sets. It is found that regional values of tested quantities can vary significantly when comparing weighted and unweighted model ensembles. For example, one tested metric weights the ensemble by how well models reproduce the time-series probability distribution of the cloud forcing component of reflected shortwave radiation. The weighted ensemble for this metric indicates lower simulated precipitation (up to 0.7 mm/day) in tropical regions than the unweighted ensemble: since CMIP5 models have been shown to overproduce precipitation, this result could indicate that the metric is effective in identifying models which simulate more realistic precipitation. Ultimately, the goal of the framework is to identify performance metrics that inform better methods for ensemble averaging of models and create better climate predictions.
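To make the weighting idea concrete, here is a minimal sketch of a skill-weighted ensemble mean, assuming weights inversely proportional to each model's RMSE against observations. The metric choice, the array shapes, and the numbers are illustrative assumptions, not the project's actual framework.

```python
import numpy as np

def skill_weighted_ensemble(projections, rmse):
    """Weighted multi-model mean with weights proportional to 1/RMSE.

    projections: array of shape (n_models, ...) holding each model's field
    rmse: per-model error against observations (lower = better)
    """
    w = 1.0 / np.asarray(rmse, dtype=float)
    w /= w.sum()                                   # normalise weights
    return np.tensordot(w, np.asarray(projections, dtype=float), axes=1)

# Example: three models projecting a 2x2 regional field
proj = np.array([[[1.0, 2.0], [3.0, 4.0]],
                 [[1.5, 2.5], [3.5, 4.5]],
                 [[0.5, 1.5], [2.5, 3.5]]])
print(skill_weighted_ensemble(proj, rmse=[0.8, 1.2, 2.0]))
```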
2017-01-01
Technological developments and greater rigor in the quantitative measurement of biological features in medical images have given rise to an increased interest in using quantitative imaging biomarkers (QIBs) to measure changes in these features. Critical to the performance of a QIB in preclinical or clinical settings are three primary metrology areas of interest: measurement linearity and bias, repeatability, and the ability to consistently reproduce equivalent results when conditions change, as would be expected in any clinical trial. Unfortunately, performance studies to date differ greatly in design, analysis methods, and the metrics used to assess a QIB for clinical use. It is therefore difficult or impossible to integrate results from different studies or to use reported results to design new studies. The Radiological Society of North America (RSNA) and the Quantitative Imaging Biomarker Alliance (QIBA), together with technical, radiological, and statistical experts, developed a set of technical performance analysis methods, metrics, and study designs that provide terminology, metrics, and methods consistent with widely accepted metrological standards. This document provides a consistent framework for the conduct and evaluation of QIB performance studies so that results from multiple studies can be compared, contrasted, or combined. PMID:24919831
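The following sketch illustrates the kind of technical-performance statistics such a framework standardizes, using test-retest replicates to estimate the within-subject SD, the repeatability coefficient, and the within-subject coefficient of variation. The specific estimators shown are common metrological choices assumed here for illustration, not quotations from the QIBA document.

```python
import numpy as np

def repeatability_metrics(test, retest):
    """Within-subject SD, repeatability coefficient (2.77 * wSD), and wCV
    from paired test-retest measurements of a biomarker."""
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    d = test - retest
    wsd = np.sqrt(np.mean(d ** 2) / 2.0)          # within-subject SD for duplicates
    rc = 2.77 * wsd                               # repeatability coefficient
    wcv = wsd / np.mean((test + retest) / 2.0)    # within-subject CV
    return wsd, rc, wcv

# Hypothetical paired measurements of a QIB on five subjects
test = [1.10, 0.85, 1.42, 0.66, 0.98]
retest = [1.05, 0.88, 1.36, 0.70, 1.01]
print(repeatability_metrics(test, retest))
```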
Effect of cerebral spinal fluid suppression for diffusional kurtosis imaging.
Yang, Alicia W; Jensen, Jens H; Hu, Caixia C; Tabesh, Ali; Falangola, Maria F; Helpern, Joseph A
2013-02-01
To evaluate the cerebral spinal fluid (CSF) partial volume effect on diffusional kurtosis imaging (DKI) metrics in white matter and cortical gray matter. Four healthy volunteers participated in this study. Standard DKI and fluid-attenuated inversion recovery (FLAIR) DKI experiments were performed using a twice-refocused-spin-echo diffusion sequence. The conventional diffusion tensor imaging (DTI) metrics of fractional anisotropy (FA) and mean, axial, and radial diffusivity (MD, D∥, D⊥), together with the DKI metrics of mean, axial, and radial kurtosis (MK, K∥, K⊥), were measured and compared. Single image slices located above the lateral ventricles, with similar anatomical features for each subject, were selected to minimize the effect of CSF from the ventricles. In white matter, differences of less than 10% were observed between diffusion metrics measured with standard DKI and FLAIR-DKI sequences, suggesting minimal CSF contamination. For gray matter, conventional DTI metrics differed by 19% to 52%, reflecting significant CSF partial volume effects. Kurtosis metrics, however, changed by 11% or less, indicating greater robustness with respect to CSF contamination. Kurtosis metrics are less sensitive to CSF partial voluming in cortical gray matter than conventional diffusion metrics. The kurtosis metrics may then be more specific indicators of changes in tissue microstructure, provided the effect sizes for the changes are comparable. Copyright © 2012 Wiley Periodicals, Inc.
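For readers unfamiliar with the tensor-derived quantities named above, this is a minimal sketch of how FA, MD, and axial/radial diffusivity follow from the eigenvalues of a fitted diffusion tensor; the kurtosis metrics require the full DKI fit and are not shown. The eigenvalue inputs are illustrative values, not data from the study.

```python
import numpy as np

def dti_scalar_metrics(eigvals):
    """FA, MD, axial (D_par) and radial (D_perp) diffusivity from the three
    eigenvalues of a diffusion tensor."""
    l1, l2, l3 = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    md = (l1 + l2 + l3) / 3.0
    fa = np.sqrt(1.5 * ((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
                 / (l1 ** 2 + l2 ** 2 + l3 ** 2))
    return fa, md, l1, (l2 + l3) / 2.0

# Typical white-matter eigenvalues in mm^2/s (illustrative)
print(dti_scalar_metrics([1.7e-3, 0.4e-3, 0.3e-3]))
```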
Gaze entropy reflects surgical task load.
Di Stasi, Leandro L; Diaz-Piedra, Carolina; Rieiro, Héctor; Sánchez Carrión, José M; Martin Berrido, Mercedes; Olivares, Gonzalo; Catena, Andrés
2016-11-01
Task (over-)load imposed on surgeons is a main contributing factor to surgical errors. Recent research has shown that gaze metrics represent a valid and objective index to assess operator task load in non-surgical scenarios. Thus, gaze metrics have the potential to improve workplace safety by providing accurate measurements of task load variations. However, the direct relationship between gaze metrics and surgical task load has not been investigated yet. We studied the effects of surgical task complexity on the gaze metrics of surgical trainees. We recorded the eye movements of 18 surgical residents, using a mobile eye tracker system, during the performance of three high-fidelity virtual simulations of laparoscopic exercises of increasing complexity level: the Clip Applying exercise, the Cutting Big exercise, and the Translocation of Objects exercise. We also measured performance accuracy and subjective rating of complexity. Gaze entropy and velocity linearly increased with increased task complexity: the visual exploration pattern became less stereotyped (i.e., more random) and faster during the more complex exercises. Residents performed the Clip Applying and Cutting Big exercises better than the Translocation of Objects exercise, and their perceived task complexity differed accordingly. Our data show that gaze metrics are a valid and reliable surgical task load index. These findings have the potential to improve patient safety by providing accurate measurements of surgeon task (over-)load and might provide future indices to assess residents' learning curves, independently of expensive virtual simulators or time-consuming expert evaluation.
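As an illustration of one common definition of gaze entropy (stationary Shannon entropy over areas of interest), here is a short sketch; the study may use a different formulation (for example, transition entropy), so this particular definition is an assumption for clarity only.

```python
import numpy as np

def stationary_gaze_entropy(fixation_counts):
    """Shannon entropy (bits) of the fixation distribution over areas of interest.

    Higher values indicate a less stereotyped, more random visual
    exploration pattern.
    """
    p = np.asarray(fixation_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                      # ignore empty areas of interest
    return float(-np.sum(p * np.log2(p)))

# Hypothetical fixation counts over five areas of interest during an exercise
print(stationary_gaze_entropy([40, 25, 15, 12, 8]))
```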
Ho, C. K.; Pacheco, J. E.
2015-06-05
A new metric, the Levelized Cost of Coating (LCOC), is derived in this paper to evaluate and compare alternative solar selective absorber coatings against a baseline coating (Pyromark 2500). In contrast to previous metrics that focused only on the optical performance of the coating, the LCOC includes costs, durability, and optical performance for more comprehensive comparisons among candidate materials. The LCOC is defined as the annualized marginal cost of the coating to produce a baseline annual thermal energy production. Costs include the cost of materials and labor for initial application and reapplication of the coating, as well as the cost of additional or fewer heliostats to yield the same annual thermal energy production as the baseline coating. Results show that important factors impacting the LCOC include the initial solar absorptance, thermal emittance, reapplication interval, degradation rate, reapplication cost, and downtime during reapplication. The LCOC can also be used to determine the optimal reapplication interval to minimize the levelized cost of energy production. As a result, similar methods can be applied more generally to determine the levelized cost of component for other applications and systems.
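The LCOC definition above can be illustrated with a crude annualization sketch: discount the initial application, periodic reapplications, and the marginal heliostat cost to present value, then annualize with a capital recovery factor. The cost components and numbers here are placeholders, and the paper's full formulation also accounts for optical performance and downtime, which this sketch omits.

```python
def capital_recovery_factor(rate, years):
    """Factor that annualizes a present value over a period at a discount rate."""
    return rate * (1 + rate) ** years / ((1 + rate) ** years - 1)

def lcoc_sketch(initial_cost, reapply_cost, reapply_interval, life,
                heliostat_delta_cost, rate=0.07):
    """Annualized marginal coating cost (very simplified LCOC-style sketch)."""
    pv = initial_cost + heliostat_delta_cost
    t = reapply_interval
    while t < life:
        pv += reapply_cost / (1 + rate) ** t   # discounted reapplications
        t += reapply_interval
    return pv * capital_recovery_factor(rate, life)

# Placeholder inputs: per-unit-area costs, 5-year reapplication, 30-year life
print(lcoc_sketch(10.0, 6.0, 5, 30, 2.0))
```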
Evaluation of Candidate Millimeter Wave Sensors for Synthetic Vision
NASA Technical Reports Server (NTRS)
Alexander, Neal T.; Hudson, Brian H.; Echard, Jim D.
1994-01-01
The goal of the Synthetic Vision Technology Demonstration Program was to demonstrate and document the capabilities of current technologies to achieve safe aircraft landing, take off, and ground operation in very low visibility conditions. Two of the major thrusts of the program were (1) sensor evaluation in measured weather conditions on a tower overlooking an unused airfield and (2) flight testing of sensor and pilot performance via a prototype system. The presentation first briefly addresses the overall technology thrusts and goals of the program and provides a summary of MMW sensor tower-test and flight-test data collection efforts. Data analysis and calibration procedures for both the tower tests and flight tests are presented. The remainder of the presentation addresses the MMW sensor flight-test evaluation results, including the processing approach for determination of various performance metrics (e.g., contrast, sharpness, and variability). The variation of the very important contrast metric in adverse weather conditions is described. Design trade-off considerations for Synthetic Vision MMW sensors are presented.
Texture metric that predicts target detection performance
NASA Astrophysics Data System (ADS)
Culpepper, Joanne B.
2015-12-01
Two texture metrics based on gray level co-occurrence error (GLCE) are used to predict probability of detection and mean search time. The two texture metrics are local clutter metrics and are based on the statistics of GLCE probability distributions. The degree of correlation between various clutter metrics and the target detection performance for the nine military vehicles in complex natural scenes found in the Search_2 dataset is presented. Comparison is also made with four other common clutter metrics found in the literature: root sum of squares, Doyle, statistical variance, and target structure similarity. The experimental results show that the GLCE energy metric is a better predictor of target detection performance when searching for targets in natural scenes than the other clutter metrics studied.
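The GLCE metrics are built on gray-level co-occurrence statistics. As a simplified, related illustration, the sketch below computes the energy (angular second moment) of a horizontal-offset gray-level co-occurrence matrix; it is a standard co-occurrence feature for orientation only, not the paper's GLCE formulation.

```python
import numpy as np

def glcm_energy(img, levels=8):
    """Energy (angular second moment) of a horizontal-offset GLCM.

    Higher energy corresponds to a more uniform local texture.
    """
    img = np.asarray(img, dtype=float)
    # quantise gray levels into `levels` bins
    q = np.minimum((img / (img.max() + 1e-9) * levels).astype(int), levels - 1)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    glcm /= glcm.sum()
    return float(np.sum(glcm ** 2))

# Example on a synthetic 64x64 image
rng = np.random.default_rng(0)
print(glcm_energy(rng.integers(0, 256, size=(64, 64))))
```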
Flight Tasks and Metrics to Evaluate Laser Eye Protection in Flight Simulators
2017-07-07
AFRL-RH-FS-TR-2017-0026. Flight Tasks and Metrics to Evaluate Laser Eye Protection in Flight Simulators. Thomas K. Kuyk, Peter A. Smith, Solangia ... Contract number: FA8650-14-D-6519.
Metrication report to the Congress
NASA Technical Reports Server (NTRS)
1990-01-01
The principal NASA metrication activities for FY 1989 were a revision of NASA metric policy and evaluation of the impact of using the metric system of measurement for the design and construction of the Space Station Freedom. Additional studies provided a basis for focusing follow-on activity. In FY 1990, emphasis will shift to implementation of metric policy and development of a long-range metrication plan. The report which follows addresses Policy Development, Planning and Program Evaluation, and Supporting Activities for the past and coming year.
Impact of Immediate Interpretation of Screening Tomosynthesis Mammography on Performance Metrics.
Winkler, Nicole S; Freer, Phoebe; Anzai, Yoshimi; Hu, Nan; Stein, Matthew
2018-05-07
This study aimed to compare performance metrics for immediate and delayed batch interpretation of screening tomosynthesis mammograms. This HIPAA-compliant study was approved by the institutional review board with a waiver of consent. A retrospective analysis of screening performance metrics for tomosynthesis mammograms interpreted in 2015, when mammograms were read immediately, was compared to historical controls from 2013 to 2014, when mammograms were batch interpreted after the patient had departed. A total of 5518 screening tomosynthesis mammograms (n = 1212 for batch interpretation and n = 4306 for immediate interpretation) were evaluated. The larger sample size for the latter group reflects a group practice shift to performing tomosynthesis for the majority of patients. Age, breast density, comparison examinations, and high-risk status were compared. An asymptotic proportion test and multivariable analysis were used to compare performance metrics. There was no statistically significant difference in recall or cancer detection rates for the batch interpretation group compared to the immediate interpretation group, with a respective recall rate of 6.5% vs 5.3% = +1.2% (95% confidence interval -0.3 to 2.7%; P = .101) and cancer detection rate of 6.6 vs 7.2 per thousand = -0.6 (95% confidence interval -5.9 to 4.6; P = .825). There was no statistically significant difference in positive predictive values (PPVs), including PPV1 (screening recall), PPV2 (biopsy recommendation), or PPV3 (biopsy performed), with batch interpretation (10.1%, 42.1%, and 40.0%, respectively) and immediate interpretation (13.6%, 39.2%, and 39.7%, respectively). After adjusting for age, breast density, high-risk status, and comparison mammogram, there was no difference in the odds of being recalled or of cancer detection between the two groups. There is no statistically significant difference in interpretation performance metrics for screening tomosynthesis mammograms interpreted immediately compared to those interpreted in a delayed fashion. Copyright © 2018. Published by Elsevier Inc.
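As a sketch of the asymptotic proportion test used to compare recall rates, the code below implements a standard two-sample z-test for proportions. The counts are back-calculated from the reported rates and sample sizes, so they are approximate, illustrative values rather than the study's exact data.

```python
import math

def two_proportion_test(x1, n1, x2, n2):
    """Asymptotic z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal distribution
    pval = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p1 - p2, z, pval

# Approximate recall counts: 6.5% of 1212 batch vs 5.3% of 4306 immediate
print(two_proportion_test(79, 1212, 228, 4306))
```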
The Case for a Joint Evaluation
2017-01-01
intelligence, and engineering. Finally, the comparative time expended by the combatant commanders (CCDRs) on fulfilling four different evaluation ... template for the joint-centric construct would align with the four de facto sections noted earlier: an identification section, a performance metric ... intangible or have not been properly researched. For example, under one evaluation system, a Servicemember's separation or retirement into a post
Conklin, Annalijn; Nolte, Ellen; Vrijhoef, Hubertus
2013-01-01
An overview was produced of approaches currently used to evaluate chronic disease management in selected European countries. The study aims to describe the methods and metrics used in Europe as a first step to help advance the methodological basis for their assessment. A common template for the collection of evaluation methods and performance measures was sent to key informants in twelve European countries; responses were summarized in tables based on the template evaluation categories. Extracted data were descriptively analyzed. Approaches to the evaluation of chronic disease management vary widely in objectives, designs, metrics, observation period, and data collection methods. Half of the reported studies used noncontrolled designs. The majority measured clinical process measures, patient behavior and satisfaction, and cost and utilization; several also used a range of structural indicators. Effects are usually observed over 1 or 3 years on patient populations with a single, commonly prevalent, chronic disease. There is wide variation within and between European countries in approaches to evaluating chronic disease management, in their objectives, designs, indicators, target audiences, and actors involved. This study is the first extensive, international overview of the area reported in the literature.
ARM Data-Oriented Metrics and Diagnostics Package for Climate Model Evaluation Value-Added Product
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Chengzhu; Xie, Shaocheng
A Python-based metrics and diagnostics package is currently being developed by the U.S. Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) Infrastructure Team at Lawrence Livermore National Laboratory (LLNL) to facilitate the use of long-term, high-frequency measurements from the ARM Facility in evaluating the regional climate simulation of clouds, radiation, and precipitation. This metrics and diagnostics package computes climatological means of targeted climate model simulations and generates tables and plots for comparing the model simulation with ARM observational data. The Coupled Model Intercomparison Project (CMIP) model data sets are also included in the package to enable model intercomparison, as demonstrated in Zhang et al. (2017). The mean of the CMIP models can serve as a reference for individual models. Basic performance metrics are computed to measure the accuracy of the mean state and variability of climate models. The evaluated physical quantities include cloud fraction, temperature, relative humidity, cloud liquid water path, total column water vapor, precipitation, sensible and latent heat fluxes, and radiative fluxes, with plans to extend to more fields, such as aerosol and microphysics properties. Process-oriented diagnostics focusing on individual cloud- and precipitation-related phenomena are also being developed for the evaluation and development of specific model physical parameterizations. The version 1.0 package is designed based on data collected at ARM's Southern Great Plains (SGP) Research Facility, with the plan to extend to other ARM sites. The metrics and diagnostics package is currently built upon standard Python libraries and additional Python packages developed by DOE (such as CDMS and CDAT). The ARM metrics and diagnostics package is available publicly in the hope that it can serve as an easy entry point for climate modelers to compare their models with ARM data. In this report, we first present the input data, which constitute the core content of the metrics and diagnostics package, in section 2, followed by a user's guide documenting the workflow and structure of the version 1.0 code, including step-by-step instructions for running the package, in section 3.
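A minimal sketch of the kind of basic performance metrics such a package computes for a climatological-mean field (mean bias, centered RMSE, and pattern correlation) is shown below. The actual package's metric definitions and data handling (CDMS/CDAT, masking, regridding) are not reproduced here; the arrays are placeholder values.

```python
import numpy as np

def mean_state_metrics(model, obs):
    """Mean bias, centered RMSE, and pattern correlation of a climatological field."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    bias = np.mean(model - obs)
    m = model - model.mean()
    o = obs - obs.mean()
    crmse = np.sqrt(np.mean((m - o) ** 2))
    corr = np.sum(m * o) / np.sqrt(np.sum(m ** 2) * np.sum(o ** 2))
    return bias, crmse, corr

# Placeholder 2x2 temperature fields (K)
model = np.array([[290.1, 291.0], [288.5, 287.9]])
obs = np.array([[289.5, 290.2], [288.8, 288.1]])
print(mean_state_metrics(model, obs))
```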
Empirical Evaluation of Hunk Metrics as Bug Predictors
NASA Astrophysics Data System (ADS)
Ferzund, Javed; Ahsan, Syed Nadeem; Wotawa, Franz
Reducing the number of bugs is a crucial issue during software development and maintenance. Software process and product metrics are good indicators of software complexity. These metrics have been used to build bug predictor models to help developers maintain the quality of software. In this paper we empirically evaluate the use of hunk metrics as predictors of bugs. We present a technique for bug prediction that works at the smallest units of code change, called hunks. We build bug prediction models using random forests, an efficient machine learning classifier. Hunk metrics are used to train the classifier, and each hunk metric is evaluated for its bug prediction capabilities. Our classifier can classify individual hunks as buggy or bug-free with 86% accuracy, 83% buggy hunk precision, and 77% buggy hunk recall. We find that history-based and change-level hunk metrics are better predictors of bugs than code-level hunk metrics.
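To make the modelling setup concrete, here is a small sketch of training a random-forest classifier on hunk-level features and estimating accuracy, precision, and recall by cross-validation. The feature matrix and labels are synthetic stand-ins, not the paper's hunk metrics or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, precision_score, recall_score

# X: one row per hunk of hypothetical hunk metrics (e.g. lines added/deleted,
# hunk size, file churn); y: 1 = buggy hunk, 0 = bug-free (labels assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
pred = cross_val_predict(clf, X, y, cv=5)
print(accuracy_score(y, pred), precision_score(y, pred), recall_score(y, pred))
print(clf.fit(X, y).feature_importances_)   # per-metric predictive contribution
```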
Pattern Activity Clustering and Evaluation (PACE)
NASA Astrophysics Data System (ADS)
Blasch, Erik; Banas, Christopher; Paul, Michael; Bussjager, Becky; Seetharaman, Guna
2012-06-01
With the vast amount of network information available on the activities of people (e.g., motions, transportation routes, and site visits), there is a need to explore the salient properties of data that detect and discriminate the behavior of individuals. Recent machine learning approaches include methods of data mining, statistical analysis, clustering, and estimation that support activity-based intelligence. We seek to explore contemporary methods in activity analysis using machine learning techniques that discover and characterize behaviors that enable grouping, anomaly detection, and adversarial intent prediction. To evaluate these methods, we describe the mathematics and potential information theory metrics to characterize behavior. A scenario is presented to demonstrate the concept and metrics that could be useful for layered sensing behavior pattern learning and analysis. We leverage work on group tracking, learning, and clustering approaches, and utilize information-theoretic metrics for classification, behavioral and event pattern recognition, and activity and entity analysis. The performance evaluation of activity analysis supports high-level information fusion of user alerts, data queries, and sensor management for data extraction, relations discovery, and situation analysis of existing data.
van Hees, Vincent T.; Gorzelniak, Lukas; Dean León, Emmanuel Carlos; Eder, Martin; Pias, Marcelo; Taherian, Salman; Ekelund, Ulf; Renström, Frida; Franks, Paul W.; Horsch, Alexander; Brage, Søren
2013-01-01
Introduction Human body acceleration is often used as an indicator of daily physical activity in epidemiological research. Raw acceleration signals contain three basic components: movement, gravity, and noise. Separation of these becomes increasingly difficult during rotational movements. We aimed to evaluate five different methods (metrics) of processing acceleration signals on their ability to remove the gravitational component of acceleration during standardised mechanical movements and the implications for human daily physical activity assessment. Methods An industrial robot rotated accelerometers in the vertical plane. Radius, frequency, and angular range of motion were systematically varied. Three metrics (Euclidean norm minus one [ENMO], Euclidean norm of the high-pass filtered signals [HFEN], and HFEN plus Euclidean norm of low-pass filtered signals minus 1 g [HFEN+]) were derived for each experimental condition and compared against the reference acceleration (forward kinematics) of the robot arm. We then compared metrics derived from human acceleration signals from the wrist and hip in 97 adults (22–65 yr), and wrist in 63 women (20–35 yr) in whom daily activity-related energy expenditure (PAEE) was available. Results In the robot experiment, HFEN+ had lowest error during (vertical plane) rotations at an oscillating frequency higher than the filter cut-off frequency while for lower frequencies ENMO performed better. In the human experiments, metrics HFEN and ENMO on hip were most discrepant (within- and between-individual explained variance of 0.90 and 0.46, respectively). ENMO, HFEN and HFEN+ explained 34%, 30% and 36% of the variance in daily PAEE, respectively, compared to 26% for a metric which did not attempt to remove the gravitational component (metric EN). Conclusion In conclusion, none of the metrics as evaluated systematically outperformed all other metrics across a wide range of standardised kinematic conditions. However, choice of metric explains different degrees of variance in daily human physical activity. PMID:23626718
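A minimal sketch of two of the evaluated metrics is shown below: ENMO (Euclidean norm minus one g, negative values truncated to zero) and HFEN (Euclidean norm of per-axis high-pass filtered signals). The filter order and cut-off frequency are assumptions for illustration, and `acc` is taken to be an (n_samples, 3) array in units of g.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def enmo(acc):
    """Euclidean norm minus one (1 g), with negative values truncated to zero."""
    return np.maximum(np.linalg.norm(acc, axis=1) - 1.0, 0.0)

def hfen(acc, fs, cutoff_hz=0.2, order=4):
    """Euclidean norm of per-axis high-pass filtered signals (removes gravity)."""
    b, a = butter(order, cutoff_hz / (fs / 2.0), btype='high')
    return np.linalg.norm(filtfilt(b, a, acc, axis=0), axis=1)

# Example: 10 s of simulated 100 Hz tri-axial data resting along the z-axis
fs = 100
acc = (np.tile([0.0, 0.0, 1.0], (10 * fs, 1))
       + np.random.default_rng(0).normal(0, 0.01, (10 * fs, 3)))
print(enmo(acc).mean(), hfen(acc, fs).mean())
```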
Building an Evaluation Scale using Item Response Theory.
Lalor, John P; Wu, Hao; Yu, Hong
2016-11-01
Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems against the performance of a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
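As a brief illustration of the IRT machinery described above, the sketch below defines the two-parameter logistic item response function and the log-likelihood of one subject's response pattern. The parameter values are made up, and the paper's actual model family and fitting procedure may differ.

```python
import numpy as np

def p_correct_2pl(theta, a, b):
    """2PL item response function: P(correct | ability theta, discrimination a, difficulty b)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(a) * (theta - np.asarray(b))))

def response_log_likelihood(theta, responses, a, b):
    """Log-likelihood of a 0/1 response pattern for one subject across items."""
    p = p_correct_2pl(theta, a, b)
    r = np.asarray(responses)
    return float(np.sum(r * np.log(p) + (1 - r) * np.log(1 - p)))

# Hypothetical: three items of increasing difficulty, one subject of ability 0.5
print(response_log_likelihood(0.5, [1, 1, 0], a=[1.2, 0.8, 1.5], b=[-1.0, 0.0, 1.0]))
```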
Evaluating CMIP5 Simulations of Historical Continental Climate with Koeppen Bioclimatic Metrics
NASA Astrophysics Data System (ADS)
Phillips, T. J.; Bonfils, C.
2013-12-01
The classic Koeppen bioclimatic classification scheme associates generic vegetation types (e.g. grassland, tundra, broadleaf or evergreen forests, etc.) with regional climate zones defined by their annual cycles of continental temperature (T) and precipitation (P), considered together. The locations or areas of Koeppen vegetation types derived from observational data thus can provide concise metrical standards for simultaneously evaluating climate simulations of T and P in naturally defined regions. The CMIP5 models' collective ability to correctly represent two variables that are critically important for living organisms at regional scales is therefore central to this evaluation. For this study, 14 Koeppen vegetation types are derived from annual-cycle climatologies of T and P in some 3 dozen CMIP5 simulations of the 1980-1999 period. Metrics for evaluating the ability of the CMIP5 models to simulate the correct locations and areas of each vegetation type, as well as measures of overall model performance, also are developed. It is found that the CMIP5 models are generally most deficient in simulating: 1) climates of drier Koeppen zones (e.g. desert, savanna, grassland, steppe vegetation types) located in the southwestern U.S. and Mexico, eastern Europe, southern Africa, and central Australia; 2) climates of regions such as central Asia and western South America where topography plays a key role. Details of regional T or P biases in selected simulations that exemplify general model performance problems also will be presented. Acknowledgments: This work was funded by the U.S. Department of Energy Office of Science and was performed at the Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Map of Koeppen vegetation types derived from observed T and P.
Comparative performance evaluation of a new a-Si EPID that exceeds quad high-definition resolution.
McConnell, Kristen A; Alexandrian, Ara; Papanikolaou, Niko; Stathakis, Sotiri
2018-01-01
Electronic portal imaging devices (EPIDs) are an integral part of the radiation oncology workflow for treatment setup verification. Several commercial EPID implementations are currently available, each with varying capabilities. To standardize performance evaluation, Task Group Report 58 (TG-58) and TG-142 outline specific image quality metrics to be measured. A LinaTech Image Viewing System (IVS), with the highest commercially available pixel matrix (2688x2688 pixels), was independently evaluated and compared to an Elekta iViewGT (1024x1024 pixels) and a Varian aSi-1000 (1024x768 pixels) using a PTW EPID QC Phantom. The IVS, iViewGT, and aSi-1000 were each used to acquire 20 images of the PTW QC Phantom. The QC phantom was placed on the couch and aligned at isocenter. The images were exported and analyzed using the epidSoft image quality assurance (QA) software. The reported metrics were signal linearity, isotropy of signal linearity, signal-to-noise ratio (SNR), low-contrast resolution, and high-contrast resolution. These values were compared between the three EPID solutions. Computed metrics demonstrated comparable results between the EPID solutions, with the IVS outperforming the aSi-1000 and iViewGT in the low- and high-contrast resolution analysis. The performance of three commercial EPID solutions has been quantified, evaluated, and compared using results from the PTW QC Phantom. The IVS outperformed the other panels in low- and high-contrast resolution, but to fully realize the benefits of the IVS, the selection of the monitor on which to view the high-resolution images is important to prevent downsampling and loss of visible resolution.
NASA Astrophysics Data System (ADS)
Demirel, Mehmet C.; Mai, Juliane; Mendiguren, Gorka; Koch, Julian; Samaniego, Luis; Stisen, Simon
2018-02-01
Satellite-based earth observations offer great opportunities to improve spatial model predictions by means of spatial-pattern-oriented model evaluations. In this study, observed spatial patterns of actual evapotranspiration (AET) are utilised for spatial model calibration tailored to target the pattern performance of the model. The proposed calibration framework combines temporally aggregated observed spatial patterns with a new spatial performance metric and a flexible spatial parameterisation scheme. The mesoscale hydrologic model (mHM) is used to simulate streamflow and AET and has been selected due to its soil parameter distribution approach based on pedo-transfer functions and its built-in multi-scale parameter regionalisation. In addition, two new spatial parameter distribution options have been incorporated in the model in order to increase the flexibility of the root fraction coefficient and potential evapotranspiration correction parameterisations, based on soil type and vegetation density. These parameterisations are utilised as they are most relevant for the simulated AET patterns from the hydrologic model. Due to the fundamental challenges encountered when evaluating spatial pattern performance using standard metrics, we developed a simple but highly discriminative spatial metric, i.e. one comprising three easily interpretable components measuring co-location, variation, and distribution of the spatial data. The study shows that with flexible spatial model parameterisation used in combination with the appropriate objective functions, the simulated spatial patterns of actual evapotranspiration become substantially more similar to the satellite-based estimates. Overall, 26 parameters are identified for calibration through a sequential screening approach based on a combination of streamflow and spatial pattern metrics. The robustness of the calibrations is tested using an ensemble of nine calibrations based on different seed numbers using the shuffled complex evolution optimiser. The calibration results reveal a limited trade-off between streamflow dynamics and spatial patterns, illustrating the benefit of combining separate observation types and objective functions. At the same time, the simulated spatial patterns of AET improved significantly when an objective function based on observed AET patterns and the novel spatial performance metric was included, compared to traditional streamflow-only calibration. Since the overall water balance is usually a crucial goal in hydrologic modelling, spatial-pattern-oriented optimisation should always be accompanied by traditional discharge measurements. In such a multi-objective framework, the current study promotes the use of a novel bias-insensitive spatial pattern metric, which exploits the key information contained in the observed patterns while allowing the water balance to be informed by discharge observations.
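A sketch of one way to combine the three components named above (co-location, variation, distribution) into a single score is given below, patterned on a SPAEF-type formulation that uses correlation, the ratio of coefficients of variation, and histogram overlap of z-scored patterns. The exact definition, bin count, and normalisation used in the study may differ from this assumed form.

```python
import numpy as np

def spatial_pattern_efficiency(sim, obs, bins=100):
    """Three-component spatial pattern score (SPAEF-style sketch)."""
    sim = np.ravel(sim).astype(float)
    obs = np.ravel(obs).astype(float)
    alpha = np.corrcoef(sim, obs)[0, 1]                                   # co-location
    beta = (np.std(sim) / np.mean(sim)) / (np.std(obs) / np.mean(obs))    # variation
    zs = (sim - sim.mean()) / sim.std()
    zo = (obs - obs.mean()) / obs.std()
    hs, edges = np.histogram(zs, bins=bins)
    ho, _ = np.histogram(zo, bins=edges)
    gamma = np.sum(np.minimum(hs, ho)) / np.sum(ho)                       # distribution overlap
    return 1.0 - np.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

# Synthetic example: a simulated AET field that roughly tracks the observed one
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 1.0, size=(50, 50))
sim = obs + rng.normal(0, 0.3, size=obs.shape)
print(spatial_pattern_efficiency(sim, obs))
```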
Performance Evaluation in Network-Based Parallel Computing
NASA Technical Reports Server (NTRS)
Dezhgosha, Kamyar
1996-01-01
Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require the use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing network of Sun SPARC workstations with PVM (Parallel Virtual Machine), which is a software system for linking clusters of machines. Second, a set of three basic applications was selected. The applications consist of a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes in many cases is the restricting factor to performance. That is, coarse-grain parallelism which requires less frequent communication between processes will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.
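For reference, the speedup notion used as the performance metric can be expressed in a couple of lines; the timings below are placeholders, not measurements from the project.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_workers):
    """Speedup S = T_serial / T_parallel and parallel efficiency E = S / n_workers."""
    s = t_serial / t_parallel
    return s, s / n_workers

# Example: a 120 s serial sort finishing in 40 s on 4 networked workstations
print(speedup_and_efficiency(120.0, 40.0, 4))   # (3.0, 0.75)
```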
Smith, Laurel B; Radomski, Mary Vining; Davidson, Leslie Freeman; Finkelstein, Marsha; Weightman, Margaret M; McCulloch, Karen L; Scherer, Matthew R
2014-01-01
OBJECTIVES. Executive functioning deficits may result from concussion. The Charge of Quarters (CQ) Duty Task is a multitask assessment designed to assess executive functioning in servicemembers after concussion. In this article, we discuss the rationale and process used in the development of the CQ Duty Task and present pilot data from the preliminary evaluation of interrater reliability (IRR). METHOD. Three evaluators observed as 12 healthy participants performed the CQ Duty Task and measured performance using various metrics. Intraclass correlation coefficient (ICC) quantified IRR. RESULTS. The ICC for task completion was .94. ICCs for other assessment metrics were variable. CONCLUSION. Preliminary IRR data for the CQ Duty Task are encouraging, but further investigation is needed to improve IRR in some domains. Lessons learned in the development of the CQ Duty Task could benefit future test development efforts with populations other than the military. Copyright © 2014 by the American Occupational Therapy Association, Inc.
Toward an optimisation technique for dynamically monitored environment
NASA Astrophysics Data System (ADS)
Shurrab, Orabi M.
2016-10-01
The data fusion community has introduced multiple procedures for situational assessment to facilitate timely responses to emerging situations. More directly, the process refinement level of the Joint Directors of Laboratories (JDL) model is a meta-process to assess and improve the data fusion task during real-time operation. In other words, it is an optimisation technique to verify overall data fusion performance and enhance it toward the top-level goals of the decision-making resources. This paper discusses the theoretical concept of prioritisation, where the analyst team is required to keep up to date with a dynamically changing environment spanning different domains such as air, sea, land, space, and cyberspace. Furthermore, it demonstrates an illustrative example of how various tracking activities are ranked simultaneously into a predetermined order. Specifically, it presents a modelling scheme for a case-study-based scenario in which the real-time system reports different classes of prioritised events, followed by a performance metric for evaluating the prioritisation process in the situational awareness (SWA) domain. The proposed performance metric has been designed and evaluated using an analytical approach. The modelling scheme represents the situational awareness system outputs mathematically, in the form of a list of activities. Such methods allowed the evaluation process to conduct a rigorous analysis of the prioritisation process, despite any constraints related to a domain-specific configuration. After three levels of assessment over three separate scenarios, the Prioritisation Capability Score (PCS) provided an appropriate scoring scheme for different ranking instances. Indeed, from the data fusion perspective, the proposed metric assessed real-time system performance adequately, and it is capable of supporting a verification process that directs the operator's attention to any issue concerning the prioritisation capability of the situational awareness domain.
Moacdieh, Nadine; Sarter, Nadine
2015-06-01
The objective was to use eye tracking to trace the underlying changes in attention allocation associated with the performance effects of clutter, stress, and task difficulty in visual search and noticing tasks. Clutter can degrade performance in complex domains, yet more needs to be known about the associated changes in attention allocation, particularly in the presence of stress and for different tasks. Frequently used and relatively simple eye tracking metrics do not effectively capture the various effects of clutter, which is critical for comprehensively analyzing clutter and developing targeted, real-time countermeasures. Electronic medical records (EMRs) were chosen as the application domain for this research. Clutter, stress, and task difficulty were manipulated, and physicians' performance on search and noticing tasks was recorded. Several eye tracking metrics were used to trace attention allocation throughout those tasks, and subjective data were gathered via a debriefing questionnaire. Clutter degraded performance in terms of response time and noticing accuracy. These decrements were largely accentuated by high stress and task difficulty. Eye tracking revealed the underlying attentional mechanisms, and several display-independent metrics were shown to be significant indicators of the effects of clutter. Eye tracking provides a promising means to understand in detail (offline) and prevent (in real time) major performance breakdowns due to clutter. Display designers need to be aware of the risks of clutter in EMRs and other complex displays and can use the identified eye tracking metrics to evaluate and/or adjust their display. © 2015, Human Factors and Ergonomics Society.
Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes.
Yates, Katherine L; Mellin, Camille; Caley, M Julian; Radford, Ben T; Meeuwig, Jessica J
2016-01-01
Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability.
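To illustrate the boosted-regression-tree modelling step, the sketch below fits a gradient-boosted tree regressor to synthetic site-level predictors and reports cross-validated R^2 and relative predictor influence. The predictors, response, and hyperparameters are stand-ins, not the study's data or settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical predictors per site (e.g. depth, rugosity from multibeam,
# habitat class score) and a response such as fish species richness.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 10 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=1.0, size=300)

brt = GradientBoostingRegressor(n_estimators=500, learning_rate=0.01,
                                max_depth=3, subsample=0.75, random_state=1)
print(cross_val_score(brt, X, y, cv=5, scoring='r2').mean())
print(brt.fit(X, y).feature_importances_)   # relative influence of predictors
```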
Real-time performance monitoring and management system
Budhraja, Vikram S [Los Angeles, CA; Dyer, James D [La Mirada, CA; Martinez Morales, Carlos A [Upland, CA
2007-06-19
A real-time performance monitoring system for monitoring an electric power grid. The electric power grid has a plurality of grid portions, each grid portion corresponding to one of a plurality of control areas. The real-time performance monitoring system includes a monitor computer for monitoring at least one of reliability metrics, generation metrics, transmission metrics, suppliers metrics, grid infrastructure security metrics, and markets metrics for the electric power grid. The data for metrics being monitored by the monitor computer are stored in a data base, and a visualization of the metrics is displayed on at least one display computer having a monitor. The at least one display computer in one said control area enables an operator to monitor the grid portion corresponding to a different said control area.
ERIC Educational Resources Information Center
Karelina, Irina Georgievna; Sobolev, Alexander Borisovich; Sorokin, Sviatoslav Olegovich
2016-01-01
The article discusses a comprehensive reporting and monitoring framework used to evaluate the performance of state and private higher education institutions. By analyzing diversified indicators including regulatory compliance, organizational and economic indicators, training and research, and other metrics, the authors spotlight key developments…
Evaluation techniques and metrics for assessment of pan+MSI fusion (pansharpening)
NASA Astrophysics Data System (ADS)
Mercovich, Ryan A.
2015-05-01
Fusion of broadband panchromatic data with narrow band multispectral data - pansharpening - is a common and often studied problem in remote sensing. Many methods exist to produce data fusion results with the best possible spatial and spectral characteristics, and a number have been commercially implemented. This study examines the output products of 4 commercial implementations with regard to their relative strengths and weaknesses for a set of defined image characteristics and analyst use-cases. The image characteristics used are spatial detail, spatial quality, spectral integrity, and composite color quality (hue and saturation), and the analyst use-cases included a variety of object detection and identification tasks. The imagery comes courtesy of the RIT SHARE 2012 collect. Two approaches are used to evaluate the pansharpening methods: analyst evaluation, a qualitative measure, and image quality metrics, quantitative measures. Visual analyst evaluation results are compared with metric results to determine which metrics best measure the defined image characteristics and product use-cases, and to support future rigorous characterization of the metrics' correlation with the analyst results. Because pansharpening represents a trade between adding spatial information from the panchromatic image and retaining spectral information from the MSI channels, the metrics examined are grouped into spatial improvement metrics and spectral preservation metrics. A single metric to quantify the quality of a pansharpening method would necessarily be a combination of weighted spatial and spectral metrics based on the importance of various spatial and spectral characteristics for the primary task of interest. Appropriate metrics and weights for such a combined metric are proposed here, based on the conducted analyst evaluation. Additionally, during this work, a metric was developed specifically focused on assessment of spatial structure improvement relative to a reference image and independent of scene content. Using analysis of Fourier transform images, a measure of high-frequency content is computed in small sub-segments of the image. The average increase in high-frequency content across the image is used as the metric, where averaging across sub-segments combats the scene-dependent nature of typical image sharpness techniques. This metric had an improved range of scores, better representing differences in the test set than other common spatial structure metrics.
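A simplified sketch of the tile-averaged high-frequency-content idea described above is shown below: each sub-segment's 2-D FFT power above a radial frequency cutoff is expressed as a fraction of total power and averaged over tiles, and comparing the pansharpened value against a reference value then scores spatial structure gain. The tile size, cutoff, and normalisation are assumptions, not the paper's exact metric.

```python
import numpy as np

def high_freq_fraction(img, tile=64, cutoff_frac=0.25):
    """Mean fraction of FFT power above a radial-frequency cutoff, tile by tile."""
    img = np.asarray(img, dtype=float)
    f = np.fft.fftshift(np.fft.fftfreq(tile))
    r = np.hypot(*np.meshgrid(f, f, indexing='ij'))
    mask = r > cutoff_frac * r.max()
    scores = []
    for i in range(0, img.shape[0] - tile + 1, tile):
        for j in range(0, img.shape[1] - tile + 1, tile):
            spec = np.abs(np.fft.fftshift(np.fft.fft2(img[i:i + tile, j:j + tile]))) ** 2
            scores.append(spec[mask].sum() / spec.sum())
    return float(np.mean(scores))

# A spatial-improvement score could then be formed as
# high_freq_fraction(pansharpened) / high_freq_fraction(upsampled_reference_band)
rng = np.random.default_rng(0)
print(high_freq_fraction(rng.normal(size=(256, 256))))
```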
Hoyer, Erik H; Padula, William V; Brotman, Daniel J; Reid, Natalie; Leung, Curtis; Lepley, Diane; Deutschendorf, Amy
2018-01-01
Hospital performance on the 30-day hospital-wide readmission (HWR) metric as calculated by the Centers for Medicare and Medicaid Services (CMS) is currently reported as a quality measure. Focusing on patient-level factors may provide an incomplete picture of readmission risk at the hospital level to explain variations in hospital readmission rates. To evaluate and quantify hospital-level characteristics that track with hospital performance on the current HWR metric. Retrospective cohort study. A total of 4785 US hospitals. We linked publicly available data on individual hospitals published by CMS on patient-level adjusted 30-day HWR rates from July 1, 2011, through June 30, 2014, to the 2014 American Hospital Association annual survey. The primary outcome was performance in the worst CMS-calculated HWR quartile. Primary hospital-level exposure variables were defined as: size (total number of beds), safety net status (top quartile of disproportionate share), academic status [member of the Association of American Medical Colleges (AAMC)], National Cancer Institute Comprehensive Cancer Center (NCI-CCC) status, and hospital services offered (e.g., transplant, hospice, emergency department). Multilevel regression was used to evaluate the association between 30-day HWR and the hospital-level factors. Hospital-level characteristics significantly associated with performing in the worst CMS-calculated HWR quartile included: safety net status [adjusted odds ratio (aOR) 1.99, 95% confidence interval (95% CI) 1.61-2.45, p < 0.001], large size (> 400 beds, aOR 1.42, 95% CI 1.07-1.90, p = 0.016), AAMC alone status (aOR 1.95, 95% CI 1.35-2.83, p < 0.001), and AAMC plus NCI-CCC status (aOR 5.16, 95% CI 2.58-10.31, p < 0.001). Hospitals with more critical care beds (aOR 1.26, 95% CI 1.02-1.56, p = 0.033), those with transplant services (aOR 2.80, 95% CI 1.48-5.31, p = 0.001), and those with emergency room services (aOR 3.37, 95% CI 1.12-10.15, p = 0.031) demonstrated significantly worse HWR performance. Hospice service (aOR 0.64, 95% CI 0.50-0.82, p < 0.001) and having a higher proportion of total discharges being surgical cases (aOR 0.62, 95% CI 0.50-0.76, p < 0.001) were associated with better performance. The study approach was not intended to be an alternate readmission metric to compete with the existing CMS metric, which would require a re-examination of patient-level data combined with hospital-level data. A number of hospital-level characteristics (such as academic tertiary care center status) were significantly associated with worse performance on the CMS-calculated HWR metric, which may have important health policy implications. Until the reasons for readmission variability can be addressed, reporting the current HWR metric as an indicator of hospital quality should be reevaluated.
Cho, Woon; Jang, Jinbeum; Koschan, Andreas; Abidi, Mongi A; Paik, Joonki
2016-11-28
A fundamental limitation of hyperspectral imaging is the inter-band misalignment correlated with subject motion during data acquisition. One way of resolving this problem is to assess the alignment quality of hyperspectral image cubes derived from the state-of-the-art alignment methods. In this paper, we present an automatic selection framework for the optimal alignment method to improve the performance of face recognition. Specifically, we develop two qualitative prediction models based on: 1) a principal curvature map for evaluating the similarity index between sequential target bands and a reference band in the hyperspectral image cube as a full-reference metric; and 2) the cumulative probability of target colors in the HSV color space for evaluating the alignment index of a single sRGB image rendered using all of the bands of the hyperspectral image cube as a no-reference metric. We verify the efficacy of the proposed metrics on a new large-scale database, demonstrating a higher prediction accuracy in determining improved alignment compared to two full-reference and five no-reference image quality metrics. We also validate the ability of the proposed framework to improve hyperspectral face recognition.
Metric-driven harm: an exploration of unintended consequences of performance measurement.
Rambur, Betty; Vallett, Carol; Cohen, Judith A; Tarule, Jill Mattuck
2013-11-01
Performance measurement is an increasingly common element of the US health care system. Typically a proxy for high quality outcomes, there has been little systematic investigation of the potential negative unintended consequences of performance metrics, including metric-driven harm. This case study details an incidence of post-surgical metric-driven harm and offers Smith's 1995 work and a patient centered, context sensitive metric model for potential adoption by nurse researchers and clinicians. Implications for further research are discussed. © 2013.
A comparative study on methods of improving SCR for ship detection in SAR image
NASA Astrophysics Data System (ADS)
Lang, Haitao; Shi, Hongji; Tao, Yunhong; Ma, Li
2017-10-01
Knowledge about ship positions plays a critical role in a wide range of maritime applications. To improve the performance of ship detectors in SAR images, an effective strategy is to improve the signal-to-clutter ratio (SCR) before conducting detection. In this paper, we present a comparative study of methods for improving SCR, including power-law scaling (PLS), the max-mean and max-median filters (MMF1 and MMF2), the wavelet transform method (TWT), the traditional SPAN detector, the reflection symmetric metric (RSM), and the scattering mechanism metric (SMM). The SCR improvement achieved on SAR images and the associated ship detection performance with a cell-averaging CFAR (CA-CFAR) detector are evaluated for the different methods on two real SAR data sets.
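As a rough illustration of the detection stage that follows these SCR-improvement methods, here is a minimal two-dimensional cell-averaging CFAR sketch; the window sizes and false-alarm probability are hypothetical defaults, and none of the abstract's specific SCR-improvement preprocessing is reproduced.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ca_cfar(power_image, train=8, guard=2, pfa=1e-4):
    """Cell-averaging CFAR on a SAR intensity image; returns a boolean detection mask."""
    img = np.asarray(power_image, dtype=float)
    win = 2 * (train + guard) + 1            # full sliding-window width
    inner = 2 * guard + 1                    # guard region (plus cell under test) width
    n_train = win**2 - inner**2              # number of training cells per window
    # Local sums obtained from mean filters scaled back to sums
    total = uniform_filter(img, size=win, mode="reflect") * win**2
    guarded = uniform_filter(img, size=inner, mode="reflect") * inner**2
    clutter = (total - guarded) / n_train    # local clutter power estimate
    alpha = n_train * (pfa ** (-1.0 / n_train) - 1.0)  # CA-CFAR threshold multiplier
    return img > alpha * clutter
```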
The Effect of Training Data Set Composition on the Performance of a Neural Image Caption Generator
2017-09-01
objects was compared using the Metric for Evaluation of Translation with Explicit Ordering (METEOR) and Consensus-Based Image Description Evaluation...using automated scoring systems. Many such systems exist, including Bilingual Evaluation Understudy (BLEU), Consensus-Based Image Description Evaluation...shown to be essential to automated scoring, which correlates highly with human precision. CIDEr uses a system of consensus among the captions and
Performance assessment in brain-computer interface-based augmentative and alternative communication
2013-01-01
A large number of incommensurable metrics are currently used to report the performance of brain-computer interfaces (BCI) used for augmentative and alternative communication (AAC). The lack of standard metrics precludes the comparison of different BCI-based AAC systems, hindering rapid growth and development of this technology. This paper presents a review of the metrics that have been used to report performance of BCIs used for AAC from January 2005 to January 2012. We distinguish between Level 1 metrics used to report performance at the output of the BCI Control Module, which translates brain signals into logical control output, and Level 2 metrics at the Selection Enhancement Module, which translates logical control to semantic control. We recommend that: (1) the commensurate metrics Mutual Information or Information Transfer Rate (ITR) be used to report Level 1 BCI performance, as these metrics represent information throughput, which is of interest in BCIs for AAC; (2) the BCI-Utility metric be used to report Level 2 BCI performance, as it is capable of handling all current methods of improving BCI performance; (3) these metrics should be supplemented by information specific to each unique BCI configuration; and (4) studies involving Selection Enhancement Modules should report performance at both Level 1 and Level 2 in the BCI system. Following these recommendations will enable efficient comparison between both BCI Control and Selection Enhancement Modules, accelerating research and development of BCI-based AAC systems. PMID:23680020
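For reference, a minimal sketch of the standard Wolpaw-style information transfer rate computation is given below; this is the common textbook formulation (assuming equiprobable targets and errors spread uniformly over the remaining targets) and may differ from the exact ITR variants used in the reviewed studies.

```python
import math

def information_transfer_rate(n_targets: int, accuracy: float, selections_per_min: float) -> float:
    """Bits per minute under the standard equiprobable-target assumptions."""
    if n_targets < 2 or accuracy <= 0.0:
        return 0.0
    bits = math.log2(n_targets)
    if 0.0 < accuracy < 1.0:
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * selections_per_min

# Example: a 6-target speller at 90% accuracy making 10 selections per minute
print(information_transfer_rate(6, 0.90, 10.0))
```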
Kasthurirathne, Suranga N; Dixon, Brian E; Gichoya, Judy; Xu, Huiping; Xia, Yuni; Mamlin, Burke; Grannis, Shaun J
2017-05-01
Existing approaches to derive decision models from plaintext clinical data frequently depend on medical dictionaries as the sources of potential features. Prior research suggests that decision models developed using non-dictionary based feature sourcing approaches and "off the shelf" tools could predict cancer with performance metrics between 80% and 90%. We sought to compare non-dictionary based models to models built using features derived from medical dictionaries. We evaluated the detection of cancer cases from free text pathology reports using decision models built with combinations of dictionary or non-dictionary based feature sourcing approaches, 4 feature subset sizes, and 5 classification algorithms. Each decision model was evaluated using the following performance metrics: sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristics (ROC) curve. Decision models parameterized using dictionary and non-dictionary feature sourcing approaches produced performance metrics between 70 and 90%. The source of features and feature subset size had no impact on the performance of a decision model. Our study suggests there is little value in leveraging medical dictionaries for extracting features for decision model building. Decision models built using features extracted from the plaintext reports themselves achieve comparable results to those built using medical dictionaries. Overall, this suggests that existing "off the shelf" approaches can be leveraged to perform accurate cancer detection using less complex Named Entity Recognition (NER) based feature extraction, automated feature selection and modeling approaches. Copyright © 2017 Elsevier Inc. All rights reserved.
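A minimal sketch of how the reported performance metrics can be computed for one such binary classifier is shown below, using scikit-learn; the label and score arrays are hypothetical placeholders for the pathology-report annotations and model outputs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def classification_report_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, PPV, accuracy and ROC AUC for a binary classifier."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "auc": roc_auc_score(y_true, y_score),
    }
```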
Monroe, Katherine S
2016-03-11
This research explored the assessment of self-directed learning readiness within the comprehensive evaluation of medical students' knowledge and skills and the extent to which several variables predicted participants' self-directed learning readiness prior to their graduation. Five metrics for evaluating medical students were considered in a multiple regression analysis. Fourth-year medical students at a competitive US medical school received an informed consent and an online survey. Participants voluntarily completed a self-directed learning readiness scale that assessed four subsets of self-directed learning readiness and consented to the release of their academic records. The assortment of metrics considered in this study only vaguely captured students' self-directedness. The strongest predictors were faculty evaluations of students' performance on clerkship rotations. Specific clerkship grades were mildly predictive of three subscales. The Pediatrics clerkship modestly predicted critical self-evaluation (r=-.30, p=.01) and the Psychiatry clerkship mildly predicted learning self-efficacy (r =-.30, p=.01), while the Junior Surgery clerkship nominally correlated with participants' effective organization for learning (r=.21, p=.05). Other metrics examined did not contribute to predicting participants' readiness for self-directed learning. Given individual differences among participants for the variables considered, no combination of students' grades and/or test scores overwhelmingly predicted their aptitude for self-directed learning. Considering the importance of fostering medical students' self-directed learning skills, schools need a reliable and pragmatic approach to measure them. This data analysis, however, offered no clear-cut way of documenting students' self-directed learning readiness based on the evaluation metrics included.
Huang, Erich P; Wang, Xiao-Feng; Choudhury, Kingshuk Roy; McShane, Lisa M; Gönen, Mithat; Ye, Jingjing; Buckler, Andrew J; Kinahan, Paul E; Reeves, Anthony P; Jackson, Edward F; Guimaraes, Alexander R; Zahlmann, Gudrun
2015-02-01
Medical imaging serves many roles in patient care and the drug approval process, including assessing treatment response and guiding treatment decisions. These roles often involve a quantitative imaging biomarker, an objectively measured characteristic of the underlying anatomic structure or biochemical process derived from medical images. Before a quantitative imaging biomarker is accepted for use in such roles, the imaging procedure to acquire it must undergo evaluation of its technical performance, which entails assessment of performance metrics such as repeatability and reproducibility of the quantitative imaging biomarker. Ideally, this evaluation will involve quantitative summaries of results from multiple studies to overcome limitations due to the typically small sample sizes of technical performance studies and/or to include a broader range of clinical settings and patient populations. This paper is a review of meta-analysis procedures for such an evaluation, including identification of suitable studies, statistical methodology to evaluate and summarize the performance metrics, and complete and transparent reporting of the results. This review addresses challenges typical of meta-analyses of technical performance, particularly small study sizes, which often causes violations of assumptions underlying standard meta-analysis techniques. Alternative approaches to address these difficulties are also presented; simulation studies indicate that they outperform standard techniques when some studies are small. The meta-analysis procedures presented are also applied to actual [18F]-fluorodeoxyglucose positron emission tomography (FDG-PET) test-retest repeatability data for illustrative purposes. © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nourzadeh, H; Watkins, W; Siebers, J
Purpose: To determine if auto-contour and manual-contour-based plans differ when evaluated with respect to probabilistic coverage metrics and biological model endpoints for prostate IMRT. Methods: Manual and auto-contours were created for 149 CT image sets acquired from 16 unique prostate patients. A single physician manually contoured all images. Auto-contouring was completed utilizing Pinnacle's Smart Probabilistic Image Contouring Engine (SPICE). For each CT, three different 78 Gy/39 fraction 7-beam IMRT plans were created: PD with drawn ROIs, PAS with auto-contoured ROIs, and PM with auto-contoured OARs and the manually drawn target. For each plan, 1000 virtual treatment simulations, with a different sampled systematic error for each simulation and a different sampled random error for each fraction, were performed using our in-house GPU-accelerated robustness analyzer tool, which reports the statistical probability of achieving dose-volume metrics, NTCP, TCP, and the probability of achieving the optimization criteria for both auto-contoured (AS) and manually drawn (D) ROIs. Metrics are reported for all possible cross-evaluation pairs of ROI types (AS, D) and planning scenarios (PD, PAS, PM). The Bhattacharyya coefficient (BC) is calculated to measure the PDF similarities for the dose-volume metric, NTCP, TCP, and objectives with respect to the manually drawn contour evaluated on the base plan (D-PD). Results: We observe high BC values (BC ≥ 0.94) for all OAR objectives. BC values of the max dose objective on the CTV also signify high resemblance (BC ≥ 0.93) between the distributions. On the other hand, BC values for the CTV's D95 and Dmin objectives are small for AS-PM and AS-PD. NTCP distributions are similar across all evaluation pairs, while TCP distributions of AS-PM and AS-PD show variations of up to 6% compared with the other evaluated pairs. Conclusion: No significant probabilistic differences are observed in the metrics when auto-contoured OARs are used. The prostate auto-contour needs improvement to achieve clinically equivalent plans.
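As an aside on the comparison statistic used here, the sketch below shows one common way to estimate the Bhattacharyya coefficient between two sampled distributions from a shared histogram binning; the bin count is an arbitrary assumption, and the authors' exact PDF estimation procedure is not specified in the abstract.

```python
import numpy as np

def bhattacharyya_coefficient(samples_a, samples_b, bins=50):
    """BC = sum_i sqrt(p_i * q_i); 1.0 for identical histograms, 0.0 for disjoint ones."""
    lo = min(np.min(samples_a), np.min(samples_b))
    hi = max(np.max(samples_a), np.max(samples_b))
    p, edges = np.histogram(samples_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_b, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))
```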
Bayesian model evidence as a model evaluation metric
NASA Astrophysics Data System (ADS)
Guthke, Anneli; Höge, Marvin; Nowak, Wolfgang
2017-04-01
When building environmental systems models, we are typically confronted with the questions of how to choose an appropriate model (i.e., which processes to include or neglect) and how to measure its quality. Various metrics have been proposed that shall guide the modeller towards a most robust and realistic representation of the system under study. Criteria for evaluation often address aspects of accuracy (absence of bias) or of precision (absence of unnecessary variance) and need to be combined in a meaningful way in order to address the inherent bias-variance dilemma. We suggest using Bayesian model evidence (BME) as a model evaluation metric that implicitly performs a tradeoff between bias and variance. BME is typically associated with model weights in the context of Bayesian model averaging (BMA). However, it can also be seen as a model evaluation metric in a single-model context or in model comparison. It combines a measure for goodness of fit with a penalty for unjustifiable complexity. Unjustifiable refers to the fact that the appropriate level of model complexity is limited by the amount of information available for calibration. Derived in a Bayesian context, BME naturally accounts for measurement errors in the calibration data as well as for input and parameter uncertainty. BME is therefore perfectly suitable to assess model quality under uncertainty. We will explain in detail and with schematic illustrations what BME measures, i.e. how complexity is defined in the Bayesian setting and how this complexity is balanced with goodness of fit. We will further discuss how BME compares to other model evaluation metrics that address accuracy and precision such as the predictive logscore or other model selection criteria such as the AIC, BIC or KIC. Although computationally more expensive than other metrics or criteria, BME represents an appealing alternative because it provides a global measure of model quality. Even if not applicable to each and every case, we aim at stimulating discussion about how to judge the quality of hydrological models in the presence of uncertainty in general by dissecting the mechanism behind BME.
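In schematic form, the quantity discussed here is the marginal likelihood of a model; a standard way to write it (using generic symbols, not notation taken from the abstract) together with the resulting BMA weight is:

```latex
% Bayesian model evidence (marginal likelihood) of model M_k given data D,
% and the Bayesian model averaging weight derived from it
p(D \mid M_k) = \int p(D \mid \theta, M_k)\, p(\theta \mid M_k)\, \mathrm{d}\theta,
\qquad
w_k = \frac{p(D \mid M_k)\, p(M_k)}{\sum_j p(D \mid M_j)\, p(M_j)}
```

The integral over the prior is what penalizes unjustifiable complexity: models with many loosely constrained parameters spread their prior predictive mass thinly and receive lower evidence even if their best fit is good.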
Development of quality metrics for ambulatory pediatric cardiology: Infection prevention.
Johnson, Jonathan N; Barrett, Cindy S; Franklin, Wayne H; Graham, Eric M; Halnon, Nancy J; Hattendorf, Brandy A; Krawczeski, Catherine D; McGovern, James J; O'Connor, Matthew J; Schultz, Amy H; Vinocur, Jeffrey M; Chowdhury, Devyani; Anderson, Jeffrey B
2017-12-01
In 2012, the American College of Cardiology's (ACC) Adult Congenital and Pediatric Cardiology Council established a program to develop quality metrics to guide ambulatory practices for pediatric cardiology. The council chose five areas on which to focus their efforts; chest pain, Kawasaki Disease, tetralogy of Fallot, transposition of the great arteries after arterial switch, and infection prevention. Here, we sought to describe the process, evaluation, and results of the Infection Prevention Committee's metric design process. The infection prevention metrics team consisted of 12 members from 11 institutions in North America. The group agreed to work on specific infection prevention topics including antibiotic prophylaxis for endocarditis, rheumatic fever, and asplenia/hyposplenism; influenza vaccination and respiratory syncytial virus prophylaxis (palivizumab); preoperative methods to reduce intraoperative infections; vaccinations after cardiopulmonary bypass; hand hygiene; and testing to identify splenic function in patients with heterotaxy. An extensive literature review was performed. When available, previously published guidelines were used fully in determining metrics. The committee chose eight metrics to submit to the ACC Quality Metric Expert Panel for review. Ultimately, metrics regarding hand hygiene and influenza vaccination recommendation for patients did not pass the RAND analysis. Both endocarditis prophylaxis metrics and the RSV/palivizumab metric passed the RAND analysis but fell out during the open comment period. Three metrics passed all analyses, including those for antibiotic prophylaxis in patients with heterotaxy/asplenia, for influenza vaccination compliance in healthcare personnel, and for adherence to recommended regimens of secondary prevention of rheumatic fever. The lack of convincing data to guide quality improvement initiatives in pediatric cardiology is widespread, particularly in infection prevention. Despite this, three metrics were able to be developed for use in the ACC's quality efforts for ambulatory practice. © 2017 Wiley Periodicals, Inc.
Experimental evaluation of ontology-based HIV/AIDS frequently asked question retrieval system.
Ayalew, Yirsaw; Moeng, Barbara; Mosweunyane, Gontlafetse
2018-05-01
This study presents the results of experimental evaluations of an ontology-based frequently asked question retrieval system in the domain of HIV and AIDS. The main purpose of the system is to provide answers to questions on HIV/AIDS using ontology. To evaluate the effectiveness of the frequently asked question retrieval system, we conducted two experiments. The first experiment focused on the evaluation of the quality of the ontology we developed using the OQuaRE evaluation framework which is based on software quality metrics and metrics designed for ontology quality evaluation. The second experiment focused on evaluating the effectiveness of the ontology in retrieving relevant answers. For this we used an open-source information retrieval platform, Terrier, with retrieval models BM25 and PL2. For the measurement of performance, we used the measures mean average precision, mean reciprocal rank, and precision at 5. The results suggest that frequently asked question retrieval with ontology is more effective than frequently asked question retrieval without ontology in the domain of HIV/AIDS.
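As an illustration of the retrieval measures used, a minimal sketch for mean reciprocal rank and precision at 5 is given below; it assumes binary relevance judgments supplied per query in ranked order and is not tied to the Terrier platform used in the study.

```python
def mean_reciprocal_rank(ranked_relevance):
    """ranked_relevance: list of per-query lists of 0/1 relevance flags in ranked order."""
    total = 0.0
    for rels in ranked_relevance:
        rank = next((i + 1 for i, rel in enumerate(rels) if rel), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_relevance)

def precision_at_k(ranked_relevance, k=5):
    """Fraction of relevant answers among the top-k results, averaged over queries."""
    return sum(sum(rels[:k]) / k for rels in ranked_relevance) / len(ranked_relevance)

# Example with two hypothetical FAQ queries
queries = [[0, 1, 0, 0, 1], [1, 0, 0, 0, 0]]
print(mean_reciprocal_rank(queries), precision_at_k(queries, k=5))
```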
Citizen science: A new perspective to advance spatial pattern evaluation in hydrology.
Koch, Julian; Stisen, Simon
2017-01-01
Citizen science opens new pathways that can complement traditional scientific practice. Intuition and reasoning often make humans more effective than computer algorithms in various realms of problem solving. In particular, a simple visual comparison of spatial patterns is a task where humans are often considered to be more reliable than computer algorithms. In practice, however, science still largely depends on computer-based solutions, which bring benefits such as speed and the possibility to automate processes. Nevertheless, human vision can be harnessed to evaluate the reliability of algorithms that are tailored to quantify similarity in spatial patterns. We established a citizen science project that employs human perception to rate similarity and dissimilarity between simulated spatial patterns of several scenarios of a hydrological catchment model. In total, more than 2500 volunteers provided over 43,000 classifications of 1095 individual subjects. We investigate the capability of a set of advanced statistical performance metrics to mimic the human perception in distinguishing between similarity and dissimilarity. Results suggest that more complex metrics are not necessarily better at emulating human perception, but they clearly provide auxiliary information that is valuable for model diagnostics. The metrics clearly differ in their ability to unambiguously distinguish between similar and dissimilar patterns, which is regarded as a key feature of a reliable metric. The obtained dataset can provide an insightful benchmark to the community for testing novel spatial metrics.
NASA Astrophysics Data System (ADS)
Zamani Losgedaragh, Saeideh; Rahimzadegan, Majid
2018-06-01
Evapotranspiration (ET) estimation is of great importance due to its key role in water resource management. Surface energy modeling tools such as the Surface Energy Balance Algorithm for Land (SEBAL), Mapping Evapotranspiration with Internalized Calibration (METRIC), and the Surface Energy Balance System (SEBS) can estimate the amount of evapotranspiration for every pixel of a satellite image. The main objective of this research is to investigate evaporation from freshwater bodies using SEBAL, METRIC, and SEBS. For this purpose, the Amirkabir dam reservoir and its nearby agricultural lands in a semi-arid climate were selected as the study area and studied from 2011 to 2017. The analysis was performed on 16 Landsat TM5 and OLI satellite images, to which SEBAL, METRIC, and SEBS were applied. The corresponding pan evaporation measurements on the reservoir bank were used as ground truth data. According to the results, SEBAL is not a reliable method for evaluating freshwater evaporation, with a coefficient of determination (R2) of 0.36 and a root mean square error (RMSE) of 5.1 mm. On the other hand, METRIC, with an R2 of 0.57 and an RMSE of 2.02 mm, and SEBS, with RMSE and R2 of 0.93 and 0.62, demonstrated relatively good performance.
Initial Data Analysis Results for ATD-2 ISAS HITL Simulation
NASA Technical Reports Server (NTRS)
Lee, Hanbong
2017-01-01
To evaluate the operational procedures and information requirements for the core functional capabilities of the ATD-2 project, such as the tactical surface metering tool, the APREQ-CFR procedure, and data element exchanges between ramp and tower, human-in-the-loop (HITL) simulations were performed in March 2017. This presentation shows the initial data analysis results from the HITL simulations. Various airport performance metrics were analyzed and compared with respect to the different runway configurations and metering values in the tactical surface scheduler. These metrics include gate holding time, taxi-out time, runway throughput, queue size and wait time in queue, and TMI flight compliance. In addition to the metering value, other factors affecting airport performance in the HITL simulation, including run duration, runway changes, and TMI constraints, are also discussed.
Using community-level metrics to monitor the effects of marine protected areas on biodiversity.
Soykan, Candan U; Lewison, Rebecca L
2015-06-01
Marine protected areas (MPAs) are used to protect species, communities, and their associated habitats, among other goals. Measuring MPA efficacy can be challenging, however, particularly when considering responses at the community level. We gathered 36 abundance and 14 biomass data sets on fish assemblages and used meta-analysis to evaluate the ability of 22 distinct community diversity metrics to detect differences in community structure between MPAs and nearby control sites. We also considered the effects of 6 covariates-MPA size and age, MPA size and age interaction, latitude, total species richness, and level of protection-on each metric. Some common metrics, such as species richness and Shannon diversity, did not differ consistently between MPA and control sites, whereas other metrics, such as total abundance and biomass, were consistently different across studies. Metric responses derived from the biomass data sets were more consistent than those based on the abundance data sets, suggesting that community-level biomass differs more predictably than abundance between MPA and control sites. Covariate analyses indicated that level of protection, latitude, MPA size, and the interaction between MPA size and age affect metric performance. These results highlight a handful of metrics, several of which are little known, that could be used to meet the increasing demand for community-level indicators of MPA effectiveness. © 2015 Society for Conservation Biology.
Koeppen Bioclimatic Metrics for Evaluating CMIP5 Simulations of Historical Climate
NASA Astrophysics Data System (ADS)
Phillips, T. J.; Bonfils, C.
2012-12-01
The classic Koeppen bioclimatic classification scheme associates generic vegetation types (e.g. grassland, tundra, broadleaf or evergreen forests, etc.) with regional climate zones defined by the observed amplitude and phase of the annual cycles of continental temperature (T) and precipitation (P). Koeppen classification thus can provide concise, multivariate metrics for evaluating climate model performance in simulating the regional magnitudes and seasonalities of climate variables that are of critical importance for living organisms. In this study, 14 Koeppen vegetation types are derived from annual-cycle climatologies of T and P in some 3 dozen CMIP5 simulations of 1980-1999 climate, a period when observational data provides a reliable global validation standard. Metrics for evaluating the ability of the CMIP5 models to simulate the correct locations and areas of the vegetation types, as well as measures of overall model performance, also are developed. It is found that the CMIP5 models are most deficient in simulating 1) the climates of the drier zones (e.g. desert, savanna, grassland, steppe vegetation types) that are located in the Southwestern U.S. and Mexico, Eastern Europe, Southern Africa, and Central Australia, as well as 2) the climate of regions such as Central Asia and Western South America where topography plays a central role. (Detailed analysis of regional biases in the annual cycles of T and P of selected simulations exemplifying general model performance problems also are to be presented.) The more encouraging results include evidence for a general improvement in CMIP5 performance relative to that of older CMIP3 models. Within CMIP5 also, the more complex Earth Systems Models (ESMs) with prognostic biogeochemistry perform comparably to the corresponding global models that simulate only the "physical" climate. Acknowledgments This work was funded by the U.S. Department of Energy Office of Science and was performed at the Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
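To give a flavor of how such bioclimatic metrics are derived from annual-cycle climatologies, the sketch below assigns a coarse, greatly simplified subset of the Koeppen major classes from monthly temperature and precipitation; the thresholds follow common textbook rules of thumb, omit the subtype logic and the seasonality-dependent aridity adjustments, and are not the 14-type scheme used in this study.

```python
import numpy as np

def koeppen_major_class(t_monthly_c, p_monthly_mm):
    """Very coarse Koeppen major class (A/B/C/D/E) from 12 monthly T (deg C) and P (mm).
    Simplified for illustration; real Koeppen rules include seasonality corrections."""
    t = np.asarray(t_monthly_c, dtype=float)
    p = np.asarray(p_monthly_mm, dtype=float)
    t_min, t_max, t_mean = t.min(), t.max(), t.mean()
    p_annual = p.sum()
    if t_max < 10.0:                        # polar climates
        return "E"
    # Simplified aridity threshold (ignores the summer/winter precipitation adjustment)
    if p_annual < 20.0 * t_mean + 140.0:    # arid / semi-arid climates
        return "B"
    if t_min >= 18.0:                       # tropical climates
        return "A"
    if t_min > -3.0:                        # temperate climates
        return "C"
    return "D"                              # continental climates
```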
Rivard, Justin D; Vergis, Ashley S; Unger, Bertram J; Hardy, Krista M; Andrew, Chris G; Gillman, Lawrence M; Park, Jason
2014-06-01
Computer-based surgical simulators capture a multitude of metrics based on different aspects of performance, such as speed, accuracy, and movement efficiency. However, without rigorous assessment, it may be unclear whether all, some, or none of these metrics actually reflect technical skill, which can compromise educational efforts on these simulators. We assessed the construct validity of individual performance metrics on the LapVR simulator (Immersion Medical, San Jose, CA, USA) and used these data to create task-specific summary metrics. Medical students with no prior laparoscopic experience (novices, N = 12), junior surgical residents with some laparoscopic experience (intermediates, N = 12), and experienced surgeons (experts, N = 11) all completed three repetitions of four LapVR simulator tasks. The tasks included three basic skills (peg transfer, cutting, clipping) and one procedural skill (adhesiolysis). We selected 36 individual metrics on the four tasks that assessed six different aspects of performance, including speed, motion path length, respect for tissue, accuracy, task-specific errors, and successful task completion. Four of seven individual metrics assessed for peg transfer, six of ten metrics for cutting, four of nine metrics for clipping, and three of ten metrics for adhesiolysis discriminated between experience levels. Time and motion path length were significant on all four tasks. We used the validated individual metrics to create summary equations for each task, which successfully distinguished between the different experience levels. Educators should maintain some skepticism when reviewing the plethora of metrics captured by computer-based simulators, as some but not all are valid. We showed the construct validity of a limited number of individual metrics and developed summary metrics for the LapVR. The summary metrics provide a succinct way of assessing skill with a single metric for each task, but require further validation.
Beyond the Turing Test: Performance Metrics for Evaluating a Computer Simulation of the Human Mind
2002-08-01
Tomasello, 2001. Perceiving intentions and learning words in the second year of life. In M. Bowerman and S. Levinson (Eds.), Language Acquisition and Conceptual Development. Cambridge University Press, New York, NY.
Best Practices Handbook: Traffic Engineering in Range Networks
2016-03-01
units of measurement. Measurement Methodology - A repeatable measurement technique used to derive one or more metrics of interest. Network...Performance measures - Metrics that provide quantitative or qualitative measures of the performance of systems or subsystems of interest. Performance Metric
Hyperbolic Harmonic Mapping for Surface Registration
Shi, Rui; Zeng, Wei; Su, Zhengyu; Jiang, Jian; Damasio, Hanna; Lu, Zhonglin; Wang, Yalin; Yau, Shing-Tung; Gu, Xianfeng
2016-01-01
Automatic computation of surface correspondence via harmonic map is an active research field in computer vision, computer graphics and computational geometry. It may help document and understand physical and biological phenomena and also has broad applications in the biometrics, medical imaging and motion capture industries. Although numerous studies have been devoted to harmonic map research, limited progress has been made to compute a diffeomorphic harmonic map on general topology surfaces with landmark constraints. This work addresses the problem by changing the Riemannian metric on the target surface to a hyperbolic metric so that the harmonic mapping is guaranteed to be a diffeomorphism under landmark constraints. The computational algorithms are based on Ricci flow and nonlinear heat diffusion methods. The approach is general and robust. We employ our algorithm to study the constrained surface registration problem which applies to both computer vision and medical imaging applications. Experimental results demonstrate that, by changing the Riemannian metric, the registrations are always diffeomorphic and achieve relatively high performance when evaluated with some popular surface registration evaluation standards. PMID:27187948
Metrics for covariate balance in cohort studies of causal effects.
Franklin, Jessica M; Rassen, Jeremy A; Ackermann, Diana; Bartels, Dorothee B; Schneeweiss, Sebastian
2014-05-10
Inferring causation from non-randomized studies of exposure requires that exposure groups can be balanced with respect to prognostic factors for the outcome. Although there is broad agreement in the literature that balance should be checked, there is confusion regarding the appropriate metric. We present a simulation study that compares several balance metrics with respect to the strength of their association with bias in estimation of the effect of a binary exposure on a binary, count, or continuous outcome. The simulations utilize matching on the propensity score with successively decreasing calipers to produce datasets with varying covariate balance. We propose the post-matching C-statistic as a balance metric and found that it had consistently strong associations with estimation bias, even when the propensity score model was misspecified, as long as the propensity score was estimated with sufficient study size. This metric, along with the average standardized difference and the general weighted difference, outperformed all other metrics considered in association with bias, including the unstandardized absolute difference, Kolmogorov-Smirnov and Lévy distances, overlapping coefficient, Mahalanobis balance, and L1 metrics. Of the best-performing metrics, the C-statistic and general weighted difference also have the advantage that they automatically evaluate balance on all covariates simultaneously and can easily incorporate balance on interactions among covariates. Therefore, when combined with the usual practice of comparing individual covariate means and standard deviations across exposure groups, these metrics may provide useful summaries of the observed covariate imbalance. Copyright © 2013 John Wiley & Sons, Ltd.
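For illustration, minimal sketches of two of the better-performing balance metrics discussed here, the standardized difference and a post-matching C-statistic, are given below; the specific estimation choices (e.g., the logistic model used for the C-statistic) are assumptions rather than the authors' exact specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def standardized_difference(x_treated, x_control):
    """Absolute standardized mean difference for a single covariate."""
    m1, m0 = np.mean(x_treated), np.mean(x_control)
    v1, v0 = np.var(x_treated, ddof=1), np.var(x_control, ddof=1)
    return abs(m1 - m0) / np.sqrt((v1 + v0) / 2.0)

def post_matching_c_statistic(X_matched, exposure_matched):
    """C-statistic for discriminating exposure groups within the matched sample;
    values close to 0.5 suggest good covariate balance."""
    model = LogisticRegression(max_iter=1000).fit(X_matched, exposure_matched)
    scores = model.predict_proba(X_matched)[:, 1]
    return roc_auc_score(exposure_matched, scores)
```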
Reference-free ground truth metric for metal artifact evaluation in CT images.
Kratz, Bärbel; Ens, Svitlana; Müller, Jan; Buzug, Thorsten M
2011-07-01
In computed tomography (CT), metal objects in the region of interest introduce data inconsistencies during acquisition. Reconstructing these data results in an image with star-shaped artifacts induced by the metal inconsistencies. To enhance image quality, the influence of the metal objects can be reduced by different metal artifact reduction (MAR) strategies. For an adequate evaluation of new MAR approaches, a ground truth reference data set is needed. In technical evaluations, where phantoms can be measured with and without metal inserts, ground truth data can easily be obtained by a second reference data acquisition. Obviously, this is not possible for clinical data. Here, an alternative evaluation method is presented without the need for an additionally acquired reference data set. The proposed metric is based on an inherent ground truth for metal artifacts as well as for comparison of MAR methods, where no reference information in terms of a second acquisition is needed. The method is based on the forward projection of a reconstructed image, which is compared to the actually measured projection data. The new evaluation technique is performed on phantom and on clinical CT data with and without MAR. The metric results are then compared with methods using a reference data set as well as an expert-based classification. It is shown that the new approach is an adequate quantification technique for artifact strength in reconstructed metal or MAR CT images. The presented method works solely on the original projection data itself, which yields some advantages compared to distance measures in the image domain using two data sets. Besides this, no parameters have to be manually chosen. The new metric is a useful evaluation alternative when no reference data are available.
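A minimal sketch of the core idea, reprojecting the reconstruction and comparing it with the measured raw data, is given below for a parallel-beam geometry using scikit-image; a real implementation must use the scanner's actual (typically fan-beam) geometry and the authors' specific comparison measure, so this is illustrative only.

```python
import numpy as np
from skimage.transform import radon

def projection_consistency_error(reconstruction, measured_sinogram, theta_deg):
    """RMS difference between the forward projection of a reconstruction and the
    measured projection data (parallel-beam approximation)."""
    reprojected = radon(reconstruction, theta=theta_deg)
    if reprojected.shape != np.shape(measured_sinogram):
        raise ValueError("Reprojection and sinogram geometries do not match")
    diff = reprojected - np.asarray(measured_sinogram, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```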
Identification of the ideal clutter metric to predict time dependence of human visual search
NASA Astrophysics Data System (ADS)
Cartier, Joan F.; Hsu, David H.
1995-05-01
The Army Night Vision and Electronic Sensors Directorate (NVESD) has recently performed a human perception experiment in which eye tracker measurements were made on trained military observers searching for targets in infrared images. This data offered an important opportunity to evaluate a new technique for search modeling. Following the approach taken by Jeff Nicoll, this model treats search as a random walk in which the observers are in one of two states until they quit: they are either searching, or they are wandering around looking for a point of interest. When wandering they skip rapidly from point to point. When examining they move more slowly, reflecting the fact that target discrimination requires additional thought processes. In this paper we simulate the random walk, using a clutter metric to assign relative attractiveness to points of interest within the image which are competing for the observer's attention. The NVESD data indicates that a number of standard clutter metrics are good estimators of the apportionment of observer's time between wandering and examining. Conversely, the apportionment of observer time spent wandering and examining could be used to reverse engineer the ideal clutter metric which would most perfectly describe the behavior of the group of observers. It may be possible to use this technique to design the optimal clutter metric to predict performance of visual search.
NASA Astrophysics Data System (ADS)
Jobson, Daniel J.; Rahman, Zia-ur; Woodell, Glenn A.; Hines, Glenn D.
2006-05-01
Aerial images from the Follow-On Radar, Enhanced and Synthetic Vision Systems Integration Technology Evaluation (FORESITE) flight tests with the NASA Langley Research Center's research Boeing 757 were acquired during severe haze and haze/mixed clouds visibility conditions. These images were enhanced using the Visual Servo (VS) process that makes use of the Multiscale Retinex. The images were then quantified with visual quality metrics used internally within the VS. One of these metrics, the Visual Contrast Measure, has been computed for hundreds of FORESITE images, and for major classes of imaging-terrestrial (consumer), orbital Earth observations, orbital Mars surface imaging, NOAA aerial photographs, and underwater imaging. The metric quantifies both the degree of visual impairment of the original, un-enhanced images as well as the degree of visibility improvement achieved by the enhancement process. The large aggregate data exhibits trends relating to degree of atmospheric visibility attenuation, and its impact on the limits of enhancement performance for the various image classes. Overall results support the idea that in most cases that do not involve extreme reduction in visibility, large gains in visual contrast are routinely achieved by VS processing. Additionally, for very poor visibility imaging, lesser, but still substantial, gains in visual contrast are also routinely achieved. Further, the data suggest that these visual quality metrics can be used as external standalone metrics for establishing performance parameters.
1993-09-01
and (6) assistance with special problems by the purchasing department (Cavinato, 1987:10). Szilagyi and Wallace state that an effective performance...Naval Research, November 1975. Szilagyi, Andrew D. Jr. and Marc J. Wallace, Jr. Organizational Behavior and Performance, Santa Monica CA, Goodyear...evaluation systems will the manager be able to achieve the bottom line--organizational effectiveness (Szilagyi 1980:457). Benefits of Evaluating Performance
Abercrombie, Robert K; Sheldon, Frederick T; Ferragut, Erik M
2014-06-24
A system evaluates reliability, performance and/or safety by automatically assessing the targeted system's requirements. A cost metric quantifies the impact of failures as a function of failure cost per unit of time. The metrics or measurements may render real-time (or near real-time) outcomes by initiating active response against one or more high ranked threats. The system may support or may be executed in many domains including physical domains, cyber security domains, cyber-physical domains, infrastructure domains, etc. or any other domains that are subject to a threat or a loss.
Bradley, Paul S; Ade, Jack D
2018-01-18
Time-motion analysis is a valuable data-collection technique used to quantify the physical match performance of elite soccer players. For over 40 years, researchers have adopted a 'traditional' approach when evaluating match demands by simply reporting the distance covered or time spent along a motion continuum from walking through to sprinting. This methodology quantifies physical metrics in isolation without integrating other factors, which ultimately leads to a one-dimensional insight into match performance. Thus, this commentary proposes a novel 'integrated' approach that focuses on a sensitive physical metric such as high-intensity running but contextualizes it in relation to key tactical activities for each position and collectively for the team. In the example presented, the 'integrated' model clearly unveils the unique high-intensity profile that exists due to distinct tactical roles, rather than the one-dimensional 'blind' distances produced by 'traditional' models. Intuitively, this innovative concept may aid coaches' understanding of physical performance in relation to the tactical roles and instructions given to the players. Additionally, it will enable practitioners to more effectively translate match metrics into training and testing protocols. This innovative model may well aid advances in other team sports that incorporate similar intermittent movements with tactical purpose. Evidence of the merits and application of this new concept is needed before the scientific community accepts this model, as it may well add complexity to an area that conceivably needs simplicity.
Qin, Haiming; Wang, Cheng; Zhao, Kaiguang; Xi, Xiaohuan
2018-01-01
Accurate estimation of the fraction of absorbed photosynthetically active radiation (fPAR) for maize canopies is important for maize growth monitoring and yield estimation. The goal of this study is to explore the potential of using airborne LiDAR and hyperspectral data to better estimate maize fPAR. This study focuses on estimating maize fPAR from (1) height and coverage metrics derived from airborne LiDAR point cloud data; (2) vegetation indices derived from hyperspectral imagery; and (3) a combination of these metrics. Pearson correlation analyses were conducted to evaluate the relationships among LiDAR metrics, hyperspectral metrics, and field-measured fPAR values. Then, multiple linear regression (MLR) models were developed using these metrics. Results showed that (1) LiDAR height and coverage metrics provided good explanatory power (i.e., R2 = 0.81); (2) hyperspectral vegetation indices provided moderate explanatory power (i.e., R2 = 0.50); and (3) the combination of LiDAR metrics and hyperspectral metrics improved the LiDAR-only model (i.e., R2 = 0.88). These results indicate that the LiDAR model offers a reliable method for estimating maize fPAR at a high spatial resolution and can be used for farmland management. Combining LiDAR and hyperspectral metrics led to better performance in maize fPAR estimation than LiDAR or hyperspectral metrics alone, which means that maize fPAR retrieval can benefit from the complementary nature of LiDAR-detected canopy structure characteristics and hyperspectral-captured vegetation spectral information.
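A minimal sketch of the model-combination step is shown below: a multiple linear regression of field-measured fPAR on stacked LiDAR and hyperspectral metrics. The synthetic arrays stand in for the real per-plot metrics, which are not provided in the abstract.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_plots = 60
lidar_metrics = rng.normal(size=(n_plots, 4))    # placeholders for height/coverage metrics
hyper_indices = rng.normal(size=(n_plots, 3))    # placeholders for vegetation indices
fpar = rng.uniform(0.2, 0.9, size=n_plots)       # placeholder field-measured fPAR

X = np.hstack([lidar_metrics, hyper_indices])    # combined predictor matrix
model = LinearRegression().fit(X, fpar)
print("R^2 on the fitted data:", r2_score(fpar, model.predict(X)))
```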
Sinitsky, Daniel M; Fernando, Bimbi; Berlingieri, Pasquale
2012-09-01
The unique psychomotor skills required in laparoscopy result in reduced patient safety during the early part of the learning curve. Evidence suggests that these may be safely acquired in the virtual reality (VR) environment. Several VR simulators are available, each preloaded with several psychomotor skills tasks that provide users with computer-generated performance metrics. This review aimed to evaluate the usefulness of specific psychomotor skills tasks and metrics, and how trainers might build an effective training curriculum. We performed a comprehensive literature search. The vast majority of VR psychomotor skills tasks show construct validity for one or more metrics. These are commonly for time and motion parameters. Regarding training schedules, distributed practice is preferred over massed practice. However, a degree of supervision may be needed to counter the limitations of VR training. In the future, standardized proficiency scores should facilitate local institutions in establishing VR laparoscopic psychomotor skills curricula. Copyright © 2012 Elsevier Inc. All rights reserved.
Bibliometrics: tracking research impact by selecting the appropriate metrics.
Agarwal, Ashok; Durairajanayagam, Damayanthi; Tatagari, Sindhuja; Esteves, Sandro C; Harlev, Avi; Henkel, Ralf; Roychoudhury, Shubhadeep; Homa, Sheryl; Puchalt, Nicolás Garrido; Ramasamy, Ranjith; Majzoub, Ahmad; Ly, Kim Dao; Tvrda, Eva; Assidi, Mourad; Kesari, Kavindra; Sharma, Reecha; Banihani, Saleem; Ko, Edmund; Abu-Elmagd, Muhammad; Gosalvez, Jaime; Bashiri, Asher
2016-01-01
Traditionally, the success of a researcher is assessed by the number of publications he or she publishes in peer-reviewed, indexed, high-impact journals. This essential yardstick, often referred to as the impact of a specific researcher, is assessed through the use of various metrics. While researchers may be acquainted with such metrics, many do not know how to use them to enhance their careers. In addition to these metrics, a number of other factors should be taken into consideration to objectively evaluate a scientist's profile as a researcher and academician. Moreover, each metric has its own limitations that need to be considered when selecting an appropriate metric for evaluation. This paper provides a broad overview of the wide array of metrics currently in use in academia and research. Popular metrics are discussed and defined, including traditional metrics and article-level metrics, some of which are applied to researchers for a greater understanding of a particular concept, including varicocele, which is the thematic area of this Special Issue of Asian Journal of Andrology. We recommend the combined use of quantitative and qualitative evaluation using judiciously selected metrics for a more objective assessment of scholarly output and research impact.
Guiding principles and checklist for population-based quality metrics.
Krishnan, Mahesh; Brunelli, Steven M; Maddux, Franklin W; Parker, Thomas F; Johnson, Douglas; Nissenson, Allen R; Collins, Allan; Lacson, Eduardo
2014-06-06
The Centers for Medicare and Medicaid Services oversees the ESRD Quality Incentive Program to ensure that the highest quality of health care is provided by outpatient dialysis facilities that treat patients with ESRD. To that end, Centers for Medicare and Medicaid Services uses clinical performance measures to evaluate quality of care under a pay-for-performance or value-based purchasing model. Now more than ever, the ESRD therapeutic area serves as the vanguard of health care delivery. By translating medical evidence into clinical performance measures, the ESRD Prospective Payment System became the first disease-specific sector using the pay-for-performance model. A major challenge for the creation and implementation of clinical performance measures is the adjustments that are necessary to transition from taking care of individual patients to managing the care of patient populations. The National Quality Forum and others have developed effective and appropriate population-based clinical performance measures quality metrics that can be aggregated at the physician, hospital, dialysis facility, nursing home, or surgery center level. Clinical performance measures considered for endorsement by the National Quality Forum are evaluated using five key criteria: evidence, performance gap, and priority (impact); reliability; validity; feasibility; and usability and use. We have developed a checklist of special considerations for clinical performance measure development according to these National Quality Forum criteria. Although the checklist is focused on ESRD, it could also have broad application to chronic disease states, where health care delivery organizations seek to enhance quality, safety, and efficiency of their services. Clinical performance measures are likely to become the norm for tracking performance for health care insurers. Thus, it is critical that the methodologies used to develop such metrics serve the payer and the provider and most importantly, reflect what represents the best care to improve patient outcomes. Copyright © 2014 by the American Society of Nephrology.
Correlation of admissions statistics to graduate student success in medical physics
McSpadden, Erin; Rakowski, Joseph; Nalichowski, Adrian; Yudelev, Mark; Snyder, Michael
2014-01-01
The purpose of this work is to develop metrics for evaluation of medical physics graduate student performance, assess relationships between success and other quantifiable factors, and determine whether graduate student performance can be accurately predicted by admissions statistics. A cohort of 108 medical physics graduate students from a single institution were rated for performance after matriculation based on final scores in specific courses, first year graduate Grade Point Average (GPA), performance on the program exit exam, performance in oral review sessions, and faculty rating. Admissions statistics including matriculating program (MS vs. PhD); undergraduate degree type, GPA, and country; graduate degree; general and subject GRE scores; traditional vs. nontraditional status; and ranking by admissions committee were evaluated for potential correlation with the performance metrics. GRE verbal and quantitative scores were correlated with higher scores in the most difficult courses in the program and with the program exit exam; however, the GRE section most correlated with overall faculty rating was the analytical writing section. Students with undergraduate degrees in engineering had a higher faculty rating than those from other disciplines and faculty rating was strongly correlated with undergraduate country. Undergraduate GPA was not statistically correlated with any success metrics investigated in this study. However, the high degree of selection on GPA and quantitative GRE scores during the admissions process results in relatively narrow ranges for these quantities. As such, these results do not necessarily imply that one should not strongly consider traditional metrics, such as undergraduate GPA and quantitative GRE score, during the admissions process. They suggest that once applicants have been initially filtered by these metrics, additional selection should be performed via the other metrics shown here to be correlated with success. The parameters used to make admissions decisions for our program are accurate in predicting student success, as illustrated by the very strong statistical correlation between admissions rank and course average, first year graduate GPA, and faculty rating (p<0.002). Overall, this study indicates that an undergraduate degree in physics should not be considered a fundamental requirement for entry into our program and that within the relatively narrow range of undergraduate GPA and quantitative GRE scores of those admitted into our program, additional variations in these metrics are not important predictors of success. While the high degree of selection on particular statistics involved in the admissions process, along with the relatively small sample size, makes it difficult to draw concrete conclusions about the meaning of correlations here, these results suggest that success in medical physics is based on more than quantitative capabilities. Specifically, they indicate that analytical and communication skills play a major role in student success in our program, as well as predicted future success by program faculty members. Finally, this study confirms that our current admissions process is effective in identifying candidates who will be successful in our program and are expected to be successful after graduation, and provides additional insight useful in improving our admissions selection process. PACS number: 01.40.‐d PMID:24423842
Sánchez-Margallo, Juan A; Sánchez-Margallo, Francisco M; Oropesa, Ignacio; Enciso, Silvia; Gómez, Enrique J
2017-02-01
The aim of this study is to present the construct and concurrent validity of a motion-tracking method of laparoscopic instruments based on an optical pose tracker and determine its feasibility as an objective assessment tool of psychomotor skills during laparoscopic suturing. A group of novice ([Formula: see text] laparoscopic procedures), intermediate (11-100 laparoscopic procedures) and experienced ([Formula: see text] laparoscopic procedures) surgeons performed three intracorporeal sutures on an ex vivo porcine stomach. Motion analysis metrics were recorded using the proposed tracking method, which employs an optical pose tracker to determine the laparoscopic instruments' position. Construct validation was measured for all 10 metrics across the three groups and between pairs of groups. Concurrent validation was measured against a previously validated suturing checklist. Checklists were completed by two independent surgeons over blinded video recordings of the task. Eighteen novices, 15 intermediates and 11 experienced surgeons took part in this study. Execution time and path length travelled by the laparoscopic dissector presented construct validity. Experienced surgeons required significantly less time ([Formula: see text]), travelled less distance using both laparoscopic instruments ([Formula: see text]) and made more efficient use of the work space ([Formula: see text]) compared with novice and intermediate surgeons. Concurrent validation showed strong correlation between both the execution time and path length and the checklist score ([Formula: see text] and [Formula: see text], [Formula: see text]). The suturing performance was successfully assessed by the motion analysis method. Construct and concurrent validity of the motion-based assessment method has been demonstrated for the execution time and path length metrics. This study demonstrates the efficacy of the presented method for objective evaluation of psychomotor skills in laparoscopic suturing. However, this method does not take into account the quality of the suture. Thus, future works will focus on developing new methods combining motion analysis and qualitative outcome evaluation to provide a complete performance assessment to trainees.
Impact of artifact removal on ChIP quality metrics in ChIP-seq and ChIP-exo data
Carroll, Thomas S.; Liang, Ziwei; Salama, Rafik; Stark, Rory; de Santiago, Ines
2014-01-01
With the advent of ChIP-seq multiplexing technologies and the subsequent increase in ChIP-seq throughput, the development of working standards for the quality assessment of ChIP-seq studies has received significant attention. The ENCODE consortium's large scale analysis of transcription factor binding and epigenetic marks as well as concordant work on ChIP-seq by other laboratories has established a new generation of ChIP-seq quality control measures. The use of these metrics alongside common processing steps has however not been evaluated. In this study, we investigate the effects of blacklisting and removal of duplicated reads on established metrics of ChIP-seq quality and show that the interpretation of these metrics is highly dependent on the ChIP-seq preprocessing steps applied. Further to this we perform the first investigation of the use of these metrics for ChIP-exo data and make recommendations for the adaptation of the NSC statistic to allow for the assessment of ChIP-exo efficiency. PMID:24782889
NASA Astrophysics Data System (ADS)
Lee, H.
2016-12-01
Precipitation is one of the most important climate variables that are taken into account in studying regional climate. Nevertheless, how precipitation will respond to a changing climate and even its mean state in the current climate are not well represented in regional climate models (RCMs). Hence, comprehensive and mathematically rigorous methodologies to evaluate precipitation and related variables in multiple RCMs are required. The main objective of the current study is to evaluate the joint variability of climate variables related to model performance in simulating precipitation and condense multiple evaluation metrics into a single summary score. We use multi-objective optimization, a mathematical process that provides a set of optimal tradeoff solutions based on a range of evaluation metrics, to characterize the joint representation of precipitation, cloudiness and insolation in RCMs participating in the North American Regional Climate Change Assessment Program (NARCCAP) and Coordinated Regional Climate Downscaling Experiment-North America (CORDEX-NA). We also leverage ground observations, NASA satellite data and the Regional Climate Model Evaluation System (RCMES). Overall, the quantitative comparison of joint probability density functions between the three variables indicates that performance of each model differs markedly between sub-regions and also shows strong seasonal dependence. Because of the large variability across the models, it is important to evaluate models systematically and make future projections using only models showing relatively good performance. Our results indicate that the optimized multi-model ensemble always shows better performance than the arithmetic ensemble mean and may guide reliable future projections.
Perceived crosstalk assessment on patterned retarder 3D display
NASA Astrophysics Data System (ADS)
Zou, Bochao; Liu, Yue; Huang, Yi; Wang, Yongtian
2014-03-01
CONTEXT: Nowadays, almost all stereoscopic displays suffer from crosstalk, which is one of the most dominant degradation factors of image quality and visual comfort for 3D display devices. To deal with such problems, it is worthwhile to quantify the amount of perceived crosstalk. OBJECTIVE: Crosstalk measurements are usually based on certain test patterns, but scene content effects are ignored. To evaluate the perceived crosstalk level for various scenes, a subjective test may provide a more accurate evaluation. However, it is a time-consuming approach and is unsuitable for real-time applications. Therefore, an objective metric that can reliably predict the perceived crosstalk is needed. A correct objective assessment of crosstalk for different scene contents would be beneficial to the development of crosstalk minimization and cancellation algorithms which could be used to bring a good quality of experience to viewers. METHOD: A patterned retarder 3D display is used to present 3D images in our experiment. By considering the mechanism of this kind of device, an appropriate simulation of crosstalk is realized by image processing techniques that assign different crosstalk levels to image pairs. It can be seen from the literature that the structure of a scene has a significant impact on the perceived crosstalk, so we first extract the differences in structural information between original and distorted image pairs through the Structural SIMilarity (SSIM) algorithm, which can directly evaluate the structural changes between two complex-structured signals. Then the structural changes of the left view and right view are computed separately and combined into an overall distortion map. Under 3D viewing conditions, because of the added value of depth, the crosstalk of pop-out objects may be more perceptible. To model this effect, the depth map of a stereo pair is generated and the depth information is filtered by the distortion map. Moreover, human attention is one of the important factors for crosstalk assessment, because when viewing 3D content, perceptually salient regions are highly likely to be a major contributor to the quality of experience. To take this into account, perceptually significant regions are extracted, and a spatial pooling technique is used to combine the structural distortion map, depth map and visual salience map to predict the perceived crosstalk more precisely. To verify the performance of the proposed crosstalk assessment metric, subjective experiments are conducted with 24 participants viewing and rating 60 stimuli (5 scenes * 4 crosstalk levels * 3 camera distances). After outlier removal and statistical processing, the correlation with the subjective test is examined using the Pearson and Spearman rank-order correlation coefficients. Furthermore, the proposed method is also compared with two traditional 2D metrics, PSNR and SSIM. The objective score is mapped to the subjective scale using a nonlinear fitting function to directly evaluate the performance of the metric. RESULTS: After the above-mentioned processes, the evaluation results demonstrate that the proposed metric is highly correlated with the subjective scores when compared with existing approaches. Because the Pearson coefficient of the proposed metric is 90.3%, it is promising for objective evaluation of perceived crosstalk. NOVELTY: The main goal of our paper is to introduce an objective metric for stereo crosstalk assessment. The novelty contributions are twofold.
First, an appropriate simulation of crosstalk by considering the characteristics of patterned retarder 3D display is developed. Second, an objective crosstalk metric based on visual attention model is introduced.
Steigerwald, Sarah N.; Park, Jason; Hardy, Krista M.; Gillman, Lawrence; Vergis, Ashley S.
2015-01-01
Background Considerable resources have been invested in both low- and high-fidelity simulators in surgical training. The purpose of this study was to investigate whether the Fundamentals of Laparoscopic Surgery (FLS, low-fidelity box trainer) and LapVR (high-fidelity virtual reality) training systems correlate with operative performance on the Global Operative Assessment of Laparoscopic Skills (GOALS) global rating scale using a porcine cholecystectomy model in a novice surgical group with minimal laparoscopic experience. Methods Fourteen postgraduate year 1 surgical residents with minimal laparoscopic experience performed tasks from the FLS program and the LapVR simulator as well as a live porcine laparoscopic cholecystectomy. Performance was evaluated using standardized FLS metrics, automatic computer evaluations, and a validated global rating scale. Results Overall, FLS score did not show an association with GOALS global rating scale score on the porcine cholecystectomy. None of the five LapVR task scores were significantly associated with GOALS score on the porcine cholecystectomy. Conclusions Neither the low-fidelity box trainer nor the high-fidelity virtual simulator demonstrated significant correlation with GOALS operative scores. These findings offer caution against the use of these modalities for brief assessments of novice surgical trainees, especially for predictive or selection purposes. PMID:26641071
Raunig, David L; McShane, Lisa M; Pennello, Gene; Gatsonis, Constantine; Carson, Paul L; Voyvodic, James T; Wahl, Richard L; Kurland, Brenda F; Schwarz, Adam J; Gönen, Mithat; Zahlmann, Gudrun; Kondratovich, Marina V; O'Donnell, Kevin; Petrick, Nicholas; Cole, Patricia E; Garra, Brian; Sullivan, Daniel C
2015-02-01
Technological developments and greater rigor in the quantitative measurement of biological features in medical images have given rise to an increased interest in using quantitative imaging biomarkers to measure changes in these features. Critical to the performance of a quantitative imaging biomarker in preclinical or clinical settings are three primary metrology areas of interest: measurement linearity and bias, repeatability, and the ability to consistently reproduce equivalent results when conditions change, as would be expected in any clinical trial. Unfortunately, performance studies to date differ greatly in designs, analysis method, and metrics used to assess a quantitative imaging biomarker for clinical use. It is therefore difficult or not possible to integrate results from different studies or to use reported results to design studies. The Radiological Society of North America and the Quantitative Imaging Biomarker Alliance with technical, radiological, and statistical experts developed a set of technical performance analysis methods, metrics, and study designs that provide terminology, metrics, and methods consistent with widely accepted metrological standards. This document provides a consistent framework for the conduct and evaluation of quantitative imaging biomarker performance studies so that results from multiple studies can be compared, contrasted, or combined. © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weiland, Mark A.; Ploskey, Gene R.; Hughes, James S.
The main purpose of the study was to evaluate the performance of Top Spill Weirs installed at two spillbays at John Day Dam and evaluate the effectiveness of these surface flow outlets at attracting juvenile salmon away from the powerhouse and reducing turbine passage. The Juvenile Salmonid Acoustic Telemetry System (JSATS) was used to estimate survival of juvenile salmonids passing the dam and also for calculating performance metrics used to evaluate the efficiency and effectiveness of the dam at passing juvenile salmonids.
ERIC Educational Resources Information Center
Karelina, Irina Georgievna; Sobolev, Alexander Borisovich; Sorokin, Sviatoslav Olegovich
2016-01-01
The article discusses the deployment of a comprehensive reporting and monitoring framework used to evaluate the performance of state and private higher education institutions in Russia. By referring to diversified indicators including organizational, financial and economic, training, research, graduate employment, and other metrics, the authors…
Timesharing performance as an indicator of pilot mental workload
NASA Technical Reports Server (NTRS)
Casper, Patricia A.; Kantowitz, Barry H.; Sorkin, Robert D.
1988-01-01
Attentional deficits (workloads) were evaluated in a timesharing task. The results from this and other experiments were incorporated into an expert system designed to provide workload metric selection advice to non-experts in the field interested in operator workload.
Performance regression manager for large scale systems
Faraj, Daniel A.
2017-10-17
System and computer program product to perform an operation comprising generating, based on a first output generated by a first execution instance of a command, a first output file specifying a value of at least one performance metric, wherein the first output file is formatted according to a predefined format, comparing the value of the at least one performance metric in the first output file to a value of the performance metric in a second output file, the second output file having been generated based on a second output generated by a second execution instance of the command, and outputting for display an indication of a result of the comparison of the value of the at least one performance metric of the first output file to the value of the at least one performance metric of the second output file.
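The comparison step described in this patent abstract can be illustrated with a short sketch. The patent text does not specify a file format, so the example below assumes a simple "metric=value" layout per line and flags changes beyond a tolerance; names such as parse_metrics and TOLERANCE are hypothetical.

```python
# Illustrative sketch only: compare a performance-metric output file from one
# execution instance of a command against the file from a second instance.
TOLERANCE = 0.05  # flag changes larger than 5%

def parse_metrics(path):
    metrics = {}
    with open(path) as fh:
        for line in fh:
            name, _, value = line.partition("=")
            if value.strip():
                metrics[name.strip()] = float(value)
    return metrics

def compare(first_file, second_file):
    first, second = parse_metrics(first_file), parse_metrics(second_file)
    for name in sorted(first.keys() & second.keys()):
        old, new = second[name], first[name]
        change = (new - old) / old if old else float("inf")
        status = "REGRESSION" if abs(change) > TOLERANCE else "ok"
        print(f"{name}: {old} -> {new} ({change:+.1%}) {status}")
```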
Voxel-based statistical analysis of uncertainties associated with deformable image registration
NASA Astrophysics Data System (ADS)
Li, Shunshan; Glide-Hurst, Carri; Lu, Mei; Kim, Jinkoo; Wen, Ning; Adams, Jeffrey N.; Gordon, James; Chetty, Indrin J.; Zhong, Hualiang
2013-09-01
Deformable image registration (DIR) algorithms have inherent uncertainties in their displacement vector fields (DVFs). The purpose of this study is to develop an optimal metric to estimate DIR uncertainties. Six computational phantoms have been developed from the CT images of lung cancer patients using a finite element method (FEM). The FEM-generated DVFs were used as a standard for registrations performed on each of these phantoms. A mechanics-based metric, unbalanced energy (UE), was developed to evaluate these registration DVFs. The potential correlation between UE and DIR errors was explored using multivariate analysis, and the results were validated by a landmark approach and compared with two other error metrics: DVF inverse consistency (IC) and image intensity difference (ID). Landmark-based validation was performed using the POPI model. The results show that the Pearson correlation coefficient between UE and DIR error is r(UE, error) = 0.50. This is higher than r(IC, error) = 0.29 for IC and DIR error and r(ID, error) = 0.37 for ID and DIR error. The Pearson correlation coefficient between UE and the product of the DIR displacements and errors is r(UE, error × DVF) = 0.62 for the six patients and r(UE, error × DVF) = 0.73 for the POPI-model data. It has been demonstrated that UE has a strong correlation with DIR errors, and the UE metric outperforms the IC and ID metrics in estimating DIR uncertainties. The quantified UE metric can be a useful tool for adaptive treatment strategies, including probability-based adaptive treatment planning.
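A sketch of the kind of correlation analysis reported above follows, assuming the unbalanced-energy values and the DIR error magnitudes are already available as paired arrays. This is not the authors' code; the data below are synthetic stand-ins, and scipy.stats.pearsonr returns the coefficient together with a p-value.

```python
# Pearson correlation between an error-surrogate metric (UE) and DIR error.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
dir_error = rng.gamma(shape=2.0, scale=1.0, size=5000)            # stand-in error magnitudes (mm)
unbalanced_energy = 0.5 * dir_error + rng.normal(0, 1.0, 5000)    # stand-in UE metric values

r, p = pearsonr(unbalanced_energy, dir_error)
print(f"r(UE, error) = {r:.2f} (p = {p:.1e})")
```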
Favazza, Christopher P; Fetterly, Kenneth A; Hangiandreou, Nicholas J; Leng, Shuai; Schueler, Beth A
2015-01-01
Evaluation of flat-panel angiography equipment through conventional image quality metrics is limited by the scope of standard spatial-domain image quality metrics, such as contrast-to-noise ratio and spatial resolution, or by restricted access to appropriate data to calculate Fourier-domain measurements, such as modulation transfer function, noise power spectrum, and detective quantum efficiency. Observer models have been shown capable of overcoming these limitations and are able to comprehensively evaluate medical-imaging systems. We present a spatial-domain-based channelized Hotelling observer model to calculate the detectability index (DI) of different-sized disks and compare the performance of different imaging conditions and angiography systems. When appropriate, changes in DIs were compared to expectations based on the classical Rose model of signal detection to assess linearity of the model with quantum signal-to-noise ratio (SNR) theory. For these experiments, the estimated uncertainty of the DIs was less than 3%, allowing for precise comparison of imaging systems or conditions. For most experimental variables, DI changes were linear with expectations based on quantum SNR theory. DIs calculated for the smallest objects demonstrated nonlinearity with quantum SNR theory due to system blur. Two angiography systems with different detector element sizes were shown to perform similarly across the majority of the detection tasks.
Quantitative criteria for assessment of gamma-ray imager performance
NASA Astrophysics Data System (ADS)
Gottesman, Steve; Keller, Kristi; Malik, Hans
2015-08-01
In recent years, gamma-ray imagers such as the GammaCam™ and Polaris have demonstrated good imaging performance in the field. Imager performance is often summarized as "resolution", either angular, or spatial at some distance from the imager; however, the definition of resolution is not always related to the ability to image an object. It is difficult to quantitatively compare imagers without a common definition of image quality. This paper examines three categories of definition: point source, line source, and area source. It discusses the details of those definitions and which ones are more relevant for different situations. Metrics such as Full Width at Half Maximum (FWHM), variations on the Rayleigh criterion, and some analogous to the National Imagery Interpretability Rating Scale (NIIRS) are discussed. The performance against these metrics is evaluated for a high-resolution coded-aperture imager modeled using Monte Carlo N-Particle (MCNP), and for a medium-resolution imager measured in the lab.
Researcher and Author Impact Metrics: Variety, Value, and Context.
Gasparyan, Armen Yuri; Yessirkepov, Marlen; Duisenova, Akmaral; Trukhachev, Vladimir I; Kostyukova, Elena I; Kitas, George D
2018-04-30
Numerous quantitative indicators are currently available for evaluating research productivity. No single metric is suitable for comprehensive evaluation of author-level impact. The choice of particular metrics depends on the purpose and context of the evaluation. The aim of this article is to overview some of the widely employed author impact metrics and highlight perspectives on their optimal use. The h-index is one of the most popular metrics for research evaluation, which is easy to calculate and understandable for non-experts. It is automatically displayed on researcher and author profiles in citation databases such as Scopus and Web of Science. Its main advantage relates to the combined approach to the quantification of publication and citation counts. This index is increasingly cited globally. Being an appropriate indicator of the publication and citation activity of highly productive and successfully promoted authors, the h-index has been criticized primarily for disadvantaging early career researchers and authors with few indexed publications. Numerous variants of the index have been proposed to overcome its limitations. Alternative metrics have also emerged to highlight 'societal impact.' However, each of these traditional and alternative metrics has its own drawbacks, necessitating careful analyses of the context of social attention and the value of publication and citation sets. Perspectives on the optimal use of researcher and author metrics depend on evaluation purposes and are compounded by information sourced from various global, national, and specialist bibliographic databases.
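To make the definition of the h-index concrete, a minimal computation is sketched below: h is the largest number such that h of an author's publications have at least h citations each. The citation counts in the example are invented.

```python
# Minimal h-index computation from a list of per-paper citation counts.
def h_index(citations):
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank          # at least `rank` papers have >= `rank` citations
        else:
            break
    return h

print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3
```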
Thermodynamic efficiency of solar concentrators.
Shatz, Narkis; Bortz, John; Winston, Roland
2010-04-26
The optical thermodynamic efficiency is a comprehensive metric that takes into account all loss mechanisms associated with transferring flux from the source to the target phase space, which may include losses due to inadequate design, non-ideal materials, fabrication errors, and less than maximal concentration. We discuss consequences of Fermat's principle of geometrical optics and review étendue dilution and optical loss mechanisms associated with nonimaging concentrators. We develop an expression for the optical thermodynamic efficiency which combines the first and second laws of thermodynamics. As such, this metric is a gold standard for evaluating the performance of nonimaging concentrators. We provide examples illustrating the use of this new metric for concentrating photovoltaic systems for solar power applications, and in particular show how skewness mismatch limits the attainable optical thermodynamic efficiency.
NASA Astrophysics Data System (ADS)
Mehic, M.; Fazio, P.; Voznak, M.; Partila, P.; Komosny, D.; Tovarek, J.; Chmelikova, Z.
2016-05-01
A mobile ad hoc network is a collection of mobile nodes which communicate without a fixed backbone or centralized infrastructure. Due to the frequent mobility of nodes, routes connecting two distant nodes may change. Therefore, it is not possible to establish a priori fixed paths for message delivery through the network. Because of its importance, routing is the most studied problem in mobile ad hoc networks. In addition, if Quality of Service (QoS) is demanded, one must guarantee the QoS not only over a single hop but over an entire wireless multi-hop path, which may not be a trivial task. In turn, this requires the propagation of QoS information within the network. The key to the support of QoS reporting is QoS routing, which provides path QoS information at each source. To support QoS for real-time traffic, one needs to know not only the minimum delay on the path to the destination but also the bandwidth available on it. Therefore, throughput, end-to-end delay, and routing overhead are the traditional performance metrics used to evaluate the performance of a routing protocol. To obtain additional information about a link, most quality-link metrics are based on calculating link loss probabilities by broadcasting probe packets. In this paper, we address the problem of including multiple routing metrics in existing routing packets that are broadcast through the network. We evaluate the efficiency of this approach with a modified version of the DSDV routing protocol in the ns-3 simulator.
2007-01-01
…parameter dimension between the two models). 93 were tested. Model 1: log(pHits / (1 − pHits)) = α + β1 · MetricScore (6.6). The results for each of the … 505.67 oTERavg .357 .13 .007 … log(pHits / (1 − pHits)), that is, the log-odds of correct task performance, of 2.79 over the intercept-only model. All … log(pHits / (1 − pHits)) = −1.15 − 0.418 · I[MT=2] − 0.527 · I[MT=3] + 1.78 · METEOR + 1.28 · METEOR · I[MT=2] + 1.86 · METEOR · I[MT=3] (6.7). Model 3: log(pHits / (1 − pHits)) …
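The excerpt above fits logistic (log-odds) models that regress binary task success on machine-translation metric scores such as METEOR. The following is a hedged sketch of how such a Model 1 fit could be reproduced; the data are synthetic, and the "true" alpha and beta1 values are assumptions for illustration, not the report's estimates.

```python
# Logistic regression of task success (hits) on a single MT metric score,
# mirroring the form log(p/(1-p)) = alpha + beta1 * MetricScore in eq. (6.6).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
metric_score = rng.uniform(0.0, 1.0, 400)           # e.g., METEOR scores
log_odds = -1.0 + 2.5 * metric_score                # assumed true alpha, beta1
p_hit = 1.0 / (1.0 + np.exp(-log_odds))
hits = rng.binomial(1, p_hit)

X = sm.add_constant(metric_score)                   # columns: [1, MetricScore]
model = sm.Logit(hits, X).fit(disp=False)
print(model.params)                                 # estimated alpha and beta1
```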
Color preservation for tone reproduction and image enhancement
NASA Astrophysics Data System (ADS)
Hsin, Chengho; Lee, Zong Wei; Lee, Zheng Zhan; Shin, Shaw-Jyh
2014-01-01
Applications based on luminance processing often face the problem of recovering the original chrominance in the output color image. A common approach to reconstruct a color image from the luminance output is by preserving the original hue and saturation. However, this approach often produces a highly colorful image which is undesirable. We develop a color preservation method that not only retains the ratios of the input tri-chromatic values but also adjusts the output chroma in an appropriate way. Linearizing the output luminance is the key idea to realize this method. In addition, a lightness difference metric together with a colorfulness difference metric are proposed to evaluate the performance of the color preservation methods. It shows that the proposed method performs consistently better than the existing approaches.
Chowriappa, Ashirwad J; Shi, Yi; Raza, Syed Johar; Ahmed, Kamran; Stegemann, Andrew; Wilding, Gregory; Kaouk, Jihad; Peabody, James O; Menon, Mani; Hassett, James M; Kesavadas, Thenkurussi; Guru, Khurshid A
2013-12-01
A standardized scoring system does not exist in virtual reality-based assessment metrics to describe safe and crucial surgical skills in robot-assisted surgery. This study aims to develop an assessment score along with its construct validation. All subjects performed key tasks on the previously validated Fundamental Skills of Robotic Surgery curriculum, which were recorded, and metrics were stored. After an expert consensus for the purpose of content validation (Delphi), critical safety-determining procedural steps were identified from the Fundamental Skills of Robotic Surgery curriculum, and a hierarchical task decomposition of multiple parameters using a variety of metrics was used to develop the Robotic Skills Assessment Score (RSA-Score). Robotic Skills Assessment mainly focuses on safety in the operative field, critical errors, economy, bimanual dexterity, and time. Subsequently, the RSA-Score was further evaluated for construct validation and feasibility. Spearman correlation tests performed between tasks using the RSA-Scores indicate no cross-correlation. Wilcoxon rank sum tests were performed between the two groups. The proposed RSA-Score was evaluated on non-robotic surgeons (n = 15) and on expert robotic surgeons (n = 12). The expert group demonstrated significantly better performance on all four tasks in comparison to the novice group. Validation of the RSA-Score in this study was carried out on the Robotic Surgical Simulator. The RSA-Score is a valid scoring system that could be incorporated into any virtual reality-based surgical simulator to achieve standardized assessment of fundamental surgical tenets during robot-assisted surgery. Copyright © 2013 Elsevier Inc. All rights reserved.
Healthcare4VideoStorm: Making Smart Decisions Based on Storm Metrics.
Zhang, Weishan; Duan, Pengcheng; Chen, Xiufeng; Lu, Qinghua
2016-04-23
Storm-based stream processing is widely used for real-time large-scale distributed processing. Knowing the run-time status and ensuring performance is critical to providing expected dependability for some applications, e.g., continuous video processing for security surveillance. Existing scheduling strategies are too coarse-grained to achieve good performance and mainly consider network resources, without accounting for computing resources, during scheduling. In this paper, we propose Healthcare4Storm, a framework that derives Storm insights from Storm metrics to gain knowledge of the health status of an application, ultimately arriving at smart scheduling decisions. It takes into account both network and computing resources and conducts scheduling at a fine-grained level using tuples instead of topologies. A comprehensive evaluation shows that the proposed framework has good performance and can improve the dependability of Storm-based applications.
Quality metrics for sensor images
NASA Technical Reports Server (NTRS)
Ahumada, AL
1993-01-01
Methods are needed for evaluating the quality of augmented visual displays (AVID). Computational quality metrics will help summarize, interpolate, and extrapolate the results of human performance tests with displays. The FLM Vision group at NASA Ames has been developing computational models of visual processing and using them to develop computational metrics for similar problems. For example, display modeling systems use metrics for comparing proposed displays, halftoning optimizing methods use metrics to evaluate the difference between the halftone and the original, and image compression methods minimize the predicted visibility of compression artifacts. The visual discrimination models take as input two arbitrary images A and B and compute an estimate of the probability that a human observer will report that A is different from B. If A is an image that one desires to display and B is the actual displayed image, such an estimate can be regarded as an image quality metric reflecting how well B approximates A. There are additional complexities associated with the problem of evaluating the quality of radar and IR enhanced displays for AVID tasks. One important problem is the question of whether intruding obstacles are detectable in such displays. Although the discrimination model can handle detection situations by making B the original image A plus the intrusion, this detection model makes the inappropriate assumption that the observer knows where the intrusion will be. Effects of signal uncertainty need to be added to our models. A pilot needs to make decisions rapidly. The models need to predict not just the probability of a correct decision, but the probability of a correct decision by the time the decision needs to be made. That is, the models need to predict latency as well as accuracy. Luce and Green have generated models for auditory detection latencies. Similar models are needed for visual detection. Most image quality models are designed for static imagery. Watson has been developing a general spatial-temporal vision model to optimize video compression techniques. These models need to be adapted and calibrated for AVID applications.
75 FR 7581 - RTO/ISO Performance Metrics; Notice Requesting Comments on RTO/ISO Performance Metrics
Federal Register 2010, 2011, 2012, 2013, 2014
2010-02-22
... performance communicate about the benefits of RTOs and, where appropriate, (2) changes that need to be made to... of staff from all the jurisdictional ISOs/RTOs to develop a set of performance metrics that the ISOs/RTOs will use to report annually to the Commission. Commission staff and representatives from the ISOs...
Performance regression manager for large scale systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faraj, Daniel A.
Methods comprising generating, based on a first output generated by a first execution instance of a command, a first output file specifying a value of at least one performance metric, wherein the first output file is formatted according to a predefined format, comparing the value of the at least one performance metric in the first output file to a value of the performance metric in a second output file, the second output file having been generated based on a second output generated by a second execution instance of the command, and outputting for display an indication of a result of the comparison of the value of the at least one performance metric of the first output file to the value of the at least one performance metric of the second output file.
Holtzclaw, Dan J
2017-02-01
Previously published research for a single metropolitan market (Austin, Texas) found that periodontists fare poorly on the Yelp website for nearly all measured metrics, including average star ratings, number of reviews, review removal rate, and evaluations by "elite" Yelp users. The purpose of the current study is to confirm or refute these findings by expanding datasets to additional metropolitan markets of various sizes and geographic locations. A total of 6,559 Yelp reviews were examined for general dentists, endodontists, pediatric dentists, oral surgeons, orthodontists, and periodontists in small (Austin, Texas), medium (Seattle, Washington), and large (New York City, New York) metropolitan markets. Numerous review characteristics were evaluated, including: 1) total number of reviews; 2) average star rating; 3) review filtering rate; and 4) number of reviews by Yelp members with elite status. Results were compared in multiple ways to determine whether statistically significant differences existed. In all metropolitan markets, periodontists were outperformed by all other dental specialties for all measured Yelp metrics in this study. Intermetropolitan comparisons of periodontal practices showed no statistically significant differences. Periodontists were outperformed consistently by all other dental specialties in every measured metric on the Yelp website. These results were consistent and repeated in all three metropolitan markets evaluated in this study. Poor performance of periodontists on Yelp may be related to the age profile of patients in the typical periodontal practice. This may result in inadvertently biased filtering of periodontal reviews and subsequently poor performance in multiple other categories.
Metrics for the Evaluation the Utility of Air Quality Forecasting
NASA Astrophysics Data System (ADS)
Sumo, T. M.; Stockwell, W. R.
2013-12-01
Global warming is expected to lead to higher levels of air pollution, and therefore the forecasting of both long-term and daily air quality is an important component for the assessment of the costs of climate change and its impact on human health. The risks associated with poor air quality days (where the Air Pollution Index is greater than 100) include hospital visits and mortality. Accurate air quality forecasting has the potential to allow sensitive groups to take appropriate precautions. This research builds metrics for evaluating the utility of air quality forecasting in terms of its potential impacts. Our analysis of air quality models focuses on the Washington, DC/Baltimore, MD region over the summertime ozone seasons between 2010 and 2012. The metrics relevant to our analysis include: (1) the number of times that a high ozone or particulate matter (PM) episode is correctly forecast, (2) the number of times that a high ozone or PM episode is forecast when it does not occur, and (3) the number of times when the air quality forecast predicts a cleaner air episode when the air was observed to have high ozone or PM. Our evaluation of the performance of air quality forecasts includes forecasts of ozone and particulate matter and data available from the U.S. Environmental Protection Agency (EPA)'s AIRNow. We also examined observational ozone and particulate matter data available from Clean Air Partners. Overall, the forecast models perform well for our region and time interval.
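The three counts listed above form a standard forecast contingency table. A bare-bones sketch follows, using invented daily exceedance flags purely to show how hits, false alarms, and misses would be tallied.

```python
# Categorical forecast verification counts for exceedance days
# (1 = exceedance forecast/observed, 0 = clean).
def contingency_counts(forecast_exceed, observed_exceed):
    hits = false_alarms = misses = 0
    for f, o in zip(forecast_exceed, observed_exceed):
        if f and o:
            hits += 1          # high-pollution day correctly forecast
        elif f and not o:
            false_alarms += 1  # forecast high, observed clean
        elif o and not f:
            misses += 1        # forecast clean, observed high
    return hits, false_alarms, misses

print(contingency_counts([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # -> (2, 1, 1)
```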
Performance regression manager for large scale systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faraj, Daniel A.
System and computer program product to perform an operation comprising generating, based on a first output generated by a first execution instance of a command, a first output file specifying a value of at least one performance metric, wherein the first output file is formatted according to a predefined format, comparing the value of the at least one performance metric in the first output file to a value of the performance metric in a second output file, the second output file having been generated based on a second output generated by a second execution instance of the command, and outputting for display an indication of a result of the comparison of the value of the at least one performance metric of the first output file to the value of the at least one performance metric of the second output file.
Evaluation of bus transit reliability in the District of Columbia.
DOT National Transportation Integrated Search
2013-11-01
Several performance metrics can be used to assess the reliability of a transit system. These include on-time arrivals, travel-time : adherence, run-time adherence, and customer satisfaction, among others. On-time arrival at bus stops is one of the pe...
METRICS OF PERFORMANCE FOR THE SABRE MICROCOSM STUDY (ABSTRACT ONLY)
The SABRE (Source Area BioREmediation) project will evaluate accelerated anaerobic bioremediation of chlorinated solvents in areas of high concentration, such as DNAPL source areas. In preparation for a field scale pilot test, a laboratory microcosm study was conducted to provide...
DOT National Transportation Integrated Search
1999-01-01
This report describes the performance metrics and the baseline measurement system. It also describes efforts to determine the accuracy of the baseline system in determining the time and position of roadway departures. In addition, the precise, synchr...
Automated map sharpening by maximization of detail and connectivity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Terwilliger, Thomas C.; Sobolev, Oleg V.; Afonine, Pavel V.
An algorithm for automatic map sharpening is presented that is based on optimization of the detail and connectivity of the sharpened map. The detail in the map is reflected in the surface area of an iso-contour surface that contains a fixed fraction of the volume of the map, where a map with a high level of detail has a high surface area. The connectivity of the sharpened map is reflected in the number of connected regions defined by the same iso-contour surfaces, where a map with high connectivity has a small number of connected regions. By combining these two measures in a metric termed the 'adjusted surface area', map quality can be evaluated in an automated fashion. This metric was used to choose optimal map-sharpening parameters without reference to a model or other interpretations of the map. Map sharpening by optimization of the adjusted surface area can be carried out for a map as a whole or it can be carried out locally, yielding a locally sharpened map. To evaluate the performance of various approaches, a simple metric based on map–model correlation that can reproduce visual choices of optimally sharpened maps was used. The map–model correlation is calculated using a model with B factors (atomic displacement factors; ADPs) set to zero. Finally, this model-based metric was used to evaluate map sharpening and to evaluate map-sharpening approaches, and it was found that optimization of the adjusted surface area can be an effective tool for map sharpening.
Automated map sharpening by maximization of detail and connectivity
Terwilliger, Thomas C.; Sobolev, Oleg V.; Afonine, Pavel V.; ...
2018-05-18
An algorithm for automatic map sharpening is presented that is based on optimization of the detail and connectivity of the sharpened map. The detail in the map is reflected in the surface area of an iso-contour surface that contains a fixed fraction of the volume of the map, where a map with a high level of detail has a high surface area. The connectivity of the sharpened map is reflected in the number of connected regions defined by the same iso-contour surfaces, where a map with high connectivity has a small number of connected regions. By combining these two measures in a metric termed the 'adjusted surface area', map quality can be evaluated in an automated fashion. This metric was used to choose optimal map-sharpening parameters without reference to a model or other interpretations of the map. Map sharpening by optimization of the adjusted surface area can be carried out for a map as a whole or it can be carried out locally, yielding a locally sharpened map. To evaluate the performance of various approaches, a simple metric based on map–model correlation that can reproduce visual choices of optimally sharpened maps was used. The map–model correlation is calculated using a model with B factors (atomic displacement factors; ADPs) set to zero. Finally, this model-based metric was used to evaluate map sharpening and to evaluate map-sharpening approaches, and it was found that optimization of the adjusted surface area can be an effective tool for map sharpening.
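The two ingredients of the adjusted-surface-area idea, detail and connectivity, can be illustrated on a voxel grid. The sketch below is only an illustration under stated assumptions: the surface area is approximated by counting exposed voxel faces at the iso-contour enclosing a fixed volume fraction, connectivity is the number of labeled regions above that contour, and the authors' actual combination rule and implementation are not reproduced.

```python
# Rough illustration of 'detail' (iso-contour surface area proxy) and
# 'connectivity' (number of connected regions) for a 3D density map.
import numpy as np
from scipy import ndimage

def detail_and_connectivity(density_map, volume_fraction=0.05):
    threshold = np.quantile(density_map, 1.0 - volume_fraction)   # iso-contour level
    mask = density_map >= threshold
    # surface-area proxy: count faces between voxels inside and outside the mask
    surface = 0
    for axis in range(mask.ndim):
        diff = np.diff(mask.astype(int), axis=axis)
        surface += np.count_nonzero(diff)
    _, n_regions = ndimage.label(mask)                             # connectivity
    return surface, n_regions

rng = np.random.default_rng(2)
toy_map = ndimage.gaussian_filter(rng.normal(size=(48, 48, 48)), sigma=2.0)
print(detail_and_connectivity(toy_map))
```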
Petraco, Ricardo; Dehbi, Hakim-Moulay; Howard, James P; Shun-Shin, Matthew J; Sen, Sayan; Nijjer, Sukhjinder S; Mayet, Jamil; Davies, Justin E; Francis, Darrel P
2018-01-01
Diagnostic accuracy is widely accepted by researchers and clinicians as an optimal expression of a test's performance. The aim of this study was to evaluate the effects of disease severity distribution on values of diagnostic accuracy as well as propose a sample-independent methodology to calculate and display the accuracy of diagnostic tests. We evaluated the diagnostic relationship between two hypothetical methods to measure serum cholesterol (Chol_rapid and Chol_gold) by generating samples with statistical software and (1) keeping the numerical relationship between methods unchanged and (2) changing the distribution of cholesterol values. Metrics of categorical agreement were calculated (accuracy, sensitivity and specificity). Finally, a novel methodology to display and calculate accuracy values was presented (the V-plot of accuracies). No single value of diagnostic accuracy can be used to describe the relationship between tests, as accuracy is a metric heavily affected by the underlying sample distribution. Our novel proposed methodology, the V-plot of accuracies, can be used as a sample-independent measure of a test's performance against a reference gold standard.
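The dependence of accuracy on the sample distribution can be shown with a few lines of arithmetic: holding sensitivity and specificity fixed, overall accuracy shifts as the prevalence of positives changes. The sensitivity, specificity, and prevalence values below are illustrative, not taken from the study.

```python
# Accuracy as a prevalence-weighted mix of sensitivity and specificity.
def accuracy(sensitivity, specificity, prevalence):
    return sensitivity * prevalence + specificity * (1.0 - prevalence)

sens, spec = 0.90, 0.70
for prevalence in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"prevalence {prevalence:.1f}: accuracy {accuracy(sens, spec, prevalence):.2f}")
```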
Multiscale Medical Image Fusion in Wavelet Domain
Khare, Ashish
2013-01-01
Wavelet transforms have emerged as a powerful tool in image fusion. However, the study and analysis of medical image fusion is still a challenging area of research. Therefore, in this paper, we propose a multiscale fusion of multimodal medical images in wavelet domain. Fusion of medical images has been performed at multiple scales varying from minimum to maximum level using maximum selection rule which provides more flexibility and choice to select the relevant fused images. The experimental analysis of the proposed method has been performed with several sets of medical images. Fusion results have been evaluated subjectively and objectively with existing state-of-the-art fusion methods which include several pyramid- and wavelet-transform-based fusion methods and principal component analysis (PCA) fusion method. The comparative analysis of the fusion results has been performed with edge strength (Q), mutual information (MI), entropy (E), standard deviation (SD), blind structural similarity index metric (BSSIM), spatial frequency (SF), and average gradient (AG) metrics. The combined subjective and objective evaluations of the proposed fusion method at multiple scales showed the effectiveness and goodness of the proposed approach. PMID:24453868
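A compact sketch of wavelet-domain fusion with a maximum-selection rule is given below using PyWavelets. It is not the paper's exact pipeline (their rule is applied across multiple decomposition levels and followed by subjective and objective evaluation); the wavelet choice, level, and averaging of the approximation band are assumptions for illustration.

```python
# Multiscale image fusion: keep the detail coefficient with larger magnitude,
# average the coarse approximation, then reconstruct.
import numpy as np
import pywt

def fuse_max(a, b):
    return np.where(np.abs(a) >= np.abs(b), a, b)     # max-selection rule

def wavelet_fuse(img1, img2, wavelet="db2", level=3):
    c1 = pywt.wavedec2(img1, wavelet, level=level)
    c2 = pywt.wavedec2(img2, wavelet, level=level)
    fused = [0.5 * (c1[0] + c2[0])]                    # average coarse approximation
    for (h1, v1, d1), (h2, v2, d2) in zip(c1[1:], c2[1:]):
        fused.append((fuse_max(h1, h2), fuse_max(v1, v2), fuse_max(d1, d2)))
    return pywt.waverec2(fused, wavelet)

rng = np.random.default_rng(3)
ct, mri = rng.random((128, 128)), rng.random((128, 128))   # stand-in modalities
print(wavelet_fuse(ct, mri).shape)
```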
Metrics for building performance assurance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koles, G.; Hitchcock, R.; Sherman, M.
This report documents part of the work performed in phase I of a Laboratory Directed Research and Development (LDRD) funded project entitled Building Performance Assurances (BPA). The focus of the BPA effort is to transform the way buildings are built and operated in order to improve building performance by facilitating or providing tools, infrastructure, and information. The efforts described herein focus on the development of metrics with which to evaluate building performance and for which information and optimization tools need to be developed. The classes of building performance metrics reviewed are (1) Building Services, (2) First Costs, (3) Operating Costs, (4) Maintenance Costs, and (5) Energy and Environmental Factors. The first category defines the direct benefits associated with buildings; the next three are different kinds of costs associated with providing those benefits; the last category includes concerns that are broader than direct costs and benefits to the building owner and building occupants. The level of detail of the various issues reflects the current state of knowledge in those scientific areas and the ability to determine that state of knowledge, rather than directly reflecting the importance of these issues; it intentionally does not specifically focus on energy issues. The report describes work in progress and is intended as a resource that can be used to indicate the areas needing more investigation. Other reports on BPA activities are also available.
NEW CATEGORICAL METRICS FOR AIR QUALITY MODEL EVALUATION
Traditional categorical metrics used in model evaluations are "clear-cut" measures in that the model's ability to predict an exceedance is defined by a fixed threshold concentration and the metrics are defined by observation-forecast sets that are paired both in space and time. T...
Coreference Resolution With Reconcile
2010-07-01
…evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental … scores vary wildly across data sets, evaluation metrics, and system configurations. We believe that one root cause of these disparities is the high … resolution and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile …
2014-01-01
Measuring … should approaches to monitoring program performance. Recognizing this, Congress requested that the Department of Defense improve metrics for measuring … Cooperative Biological Engagement Program performance … broader community of program evaluation practitioners, the work advances innovative approaches
Shao, Feng; Lin, Weisi; Gu, Shanbo; Jiang, Gangyi; Srikanthan, Thambipillai
2013-05-01
Perceptual quality assessment is a challenging issue in 3D signal processing research. It is important to study 3D signal directly instead of studying simple extension of the 2D metrics directly to the 3D case as in some previous studies. In this paper, we propose a new perceptual full-reference quality assessment metric of stereoscopic images by considering the binocular visual characteristics. The major technical contribution of this paper is that the binocular perception and combination properties are considered in quality assessment. To be more specific, we first perform left-right consistency checks and compare matching error between the corresponding pixels in binocular disparity calculation, and classify the stereoscopic images into non-corresponding, binocular fusion, and binocular suppression regions. Also, local phase and local amplitude maps are extracted from the original and distorted stereoscopic images as features in quality assessment. Then, each region is evaluated independently by considering its binocular perception property, and all evaluation results are integrated into an overall score. Besides, a binocular just noticeable difference model is used to reflect the visual sensitivity for the binocular fusion and suppression regions. Experimental results show that compared with the relevant existing metrics, the proposed metric can achieve higher consistency with subjective assessment of stereoscopic images.
Citizen science: A new perspective to advance spatial pattern evaluation in hydrology
Stisen, Simon
2017-01-01
Citizen science opens new pathways that can complement traditional scientific practice. Intuition and reasoning often make humans more effective than computer algorithms in various realms of problem solving. In particular, a simple visual comparison of spatial patterns is a task where humans are often considered to be more reliable than computer algorithms. In practice, however, science still largely depends on computer-based solutions, which offer benefits such as speed and the possibility to automate processes. Nevertheless, human vision can be harnessed to evaluate the reliability of algorithms which are tailored to quantify similarity in spatial patterns. We established a citizen science project to employ human perception to rate similarity and dissimilarity between simulated spatial patterns of several scenarios of a hydrological catchment model. In total, more than 2500 volunteers provided over 43000 classifications of 1095 individual subjects. We investigate the capability of a set of advanced statistical performance metrics to mimic the human perception to distinguish between similarity and dissimilarity. Results suggest that more complex metrics are not necessarily better at emulating human perception, but clearly provide auxiliary information that is valuable for model diagnostics. The metrics clearly differ in their ability to unambiguously distinguish between similar and dissimilar patterns, which is regarded as a key feature of a reliable metric. The obtained dataset can provide an insightful benchmark to the community for testing novel spatial metrics. PMID:28558050
Performance evaluation methodology for historical document image binarization.
Ntirogiannis, Konstantinos; Gatos, Basilis; Pratikakis, Ioannis
2013-02-01
Document image binarization is of great importance in the document image analysis and recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behavior, as well as verifying its effectiveness, by providing qualitative and quantitative indication of its performance. This paper addresses a pixel-based binarization evaluation methodology for historical handwritten/machine-printed document images. In the proposed evaluation scheme, the recall and precision evaluation measures are properly modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the proposed evaluation scheme consist of the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and merging. Several experiments conducted in comparison with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.
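A bare-bones version of the pixel-based recall and precision measures named above is sketched here against a ground-truth text mask (1 = text pixel). The paper's weighting scheme and the additional rates (broken/missed text, false alarms, background noise, character enlargement, merging) are not reproduced.

```python
# Pixel-based recall and precision for a binarization result vs. ground truth.
import numpy as np

def recall_precision(binarized, ground_truth):
    b = np.asarray(binarized, bool)
    g = np.asarray(ground_truth, bool)
    true_pos = np.logical_and(b, g).sum()
    recall = true_pos / g.sum() if g.sum() else 0.0        # text pixels recovered
    precision = true_pos / b.sum() if b.sum() else 0.0     # detected pixels that are text
    return recall, precision

gt = np.zeros((4, 4), int); gt[1:3, 1:3] = 1
out = np.zeros((4, 4), int); out[1:3, 1:4] = 1
print(recall_precision(out, gt))   # -> (1.0, about 0.67)
```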
2014-01-01
Background Extracted ion chromatogram (EIC) extraction and chromatographic peak detection are two important processing procedures in liquid chromatography/mass spectrometry (LC/MS)-based metabolomics data analysis. Most commonly, the LC/MS technique employs electrospray ionization as the ionization method. The EICs from LC/MS data are often noisy and contain high background signals. Furthermore, the chromatographic peak quality varies with respect to its location in the chromatogram and most peaks have zigzag shapes. Therefore, there is a critical need to develop effective metrics for quality evaluation of EICs and chromatographic peaks in LC/MS based metabolomics data analysis. Results We investigated a comprehensive set of potential quality evaluation metrics for extracted EICs and detected chromatographic peaks. Specifically, for EIC quality evaluation, we analyzed the mass chromatographic quality index (MCQ index) and propose a novel quality evaluation metric, the EIC-related global zigzag index, which is based on an EIC's first order derivatives. For chromatographic peak quality evaluation, we analyzed and compared six metrics: sharpness, Gaussian similarity, signal-to-noise ratio, peak significance level, triangle peak area similarity ratio and the local peak-related local zigzag index. Conclusions Although the MCQ index is suited for selecting and aligning analyte components, it cannot fairly evaluate EICs with high background signals or those containing only a single peak. Our proposed EIC related global zigzag index is robust enough to evaluate EIC qualities in both scenarios. Of the six peak quality evaluation metrics, the sharpness, peak significance level, and zigzag index outperform the others due to the zigzag nature of LC/MS chromatographic peaks. Furthermore, using several peak quality metrics in combination is more efficient than individual metrics in peak quality evaluation. PMID:25350128
Zhang, Wenchao; Zhao, Patrick X
2014-01-01
Extracted ion chromatogram (EIC) extraction and chromatographic peak detection are two important processing procedures in liquid chromatography/mass spectrometry (LC/MS)-based metabolomics data analysis. Most commonly, the LC/MS technique employs electrospray ionization as the ionization method. The EICs from LC/MS data are often noisy and contain high background signals. Furthermore, the chromatographic peak quality varies with respect to its location in the chromatogram and most peaks have zigzag shapes. Therefore, there is a critical need to develop effective metrics for quality evaluation of EICs and chromatographic peaks in LC/MS based metabolomics data analysis. We investigated a comprehensive set of potential quality evaluation metrics for extracted EICs and detected chromatographic peaks. Specifically, for EIC quality evaluation, we analyzed the mass chromatographic quality index (MCQ index) and propose a novel quality evaluation metric, the EIC-related global zigzag index, which is based on an EIC's first order derivatives. For chromatographic peak quality evaluation, we analyzed and compared six metrics: sharpness, Gaussian similarity, signal-to-noise ratio, peak significance level, triangle peak area similarity ratio and the local peak-related local zigzag index. Although the MCQ index is suited for selecting and aligning analyte components, it cannot fairly evaluate EICs with high background signals or those containing only a single peak. Our proposed EIC related global zigzag index is robust enough to evaluate EIC qualities in both scenarios. Of the six peak quality evaluation metrics, the sharpness, peak significance level, and zigzag index outperform the others due to the zigzag nature of LC/MS chromatographic peaks. Furthermore, using several peak quality metrics in combination is more efficient than individual metrics in peak quality evaluation.
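The abstract states that the global zigzag index is derived from an EIC's first-order derivatives but does not give the formula, so the sketch below computes one plausible derivative-based roughness score (sign flips of the derivative, scaled by mean derivative magnitude and intensity range) purely to illustrate why a zigzag-shaped chromatogram scores worse than a smooth one. It is not the authors' metric.

```python
# Illustrative roughness ("zigzag") score from the first-order derivative of an EIC.
import numpy as np

def zigzag_score(eic_intensity):
    x = np.asarray(eic_intensity, float)
    d = np.diff(x)                                             # first-order derivative
    sign_flips = np.count_nonzero(np.diff(np.sign(d)) != 0)    # direction changes
    span = x.max() - x.min() + 1e-12
    return sign_flips * np.mean(np.abs(d)) / span              # higher = more zigzag

smooth = np.exp(-0.5 * ((np.arange(100) - 50) / 8.0) ** 2)     # idealized single peak
noisy = smooth + np.random.default_rng(4).normal(0, 0.05, 100)
print(zigzag_score(smooth), zigzag_score(noisy))
```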
Product evaluation based in the association between intuition and tasks.
Almeida e Silva, Caio Márcio; Okimoto, Maria Lúcia L R; Albertazzi, Deise; Calixto, Cyntia; Costa, Humberto
2012-01-01
This paper explores the importance of researching intuitiveness in product use. It addresses the influence of intuitiveness for users who have already had visual experience with the product. Finally, the use of a table is suggested that relates the tasks performed while using a product, the features supporting intuitive use, and the performance metric "task success".
I/O Performance Characterization of Lustre and NASA Applications on Pleiades
NASA Technical Reports Server (NTRS)
Saini, Subhash; Rappleye, Jason; Chang, Johnny; Barker, David Peter; Biswas, Rupak; Mehrotra, Piyush
2012-01-01
In this paper we study the performance of the Lustre file system using five scientific and engineering applications representative of the NASA workload on large-scale supercomputing systems such as NASA's Pleiades. In order to facilitate the collection of Lustre performance metrics, we have developed a software tool that exports a wide variety of client- and server-side metrics using SGI's Performance Co-Pilot (PCP), and generates a human-readable report on key metrics at the end of a batch job. These performance metrics are (a) amount of data read and written, (b) number of files opened and closed, and (c) remote procedure call (RPC) size distribution (4 KB to 1024 KB, in powers of 2) for I/O operations. The RPC size distribution measures the efficiency of the Lustre client and can pinpoint problems such as small write sizes, disk fragmentation, etc. These extracted statistics are useful in determining the I/O pattern of the application and can assist in identifying possible improvements for users' applications. Information on the number of file operations enables a scientist to optimize the I/O performance of their applications. The amount of I/O data helps users choose the optimal stripe size and stripe count to enhance I/O performance. In this paper, we demonstrate the usefulness of this tool on Pleiades for five production-quality NASA scientific and engineering applications. We compare the latency of read and write operations under Lustre to that with NFS by tracing system calls and signals. We also investigate the read and write policies and study the effect of page cache size on I/O operations. We examine the performance impact of Lustre stripe size and stripe count along with a performance evaluation of file-per-process and a single shared file accessed by all processes for the NASA workload using the parameterized IOR benchmark.
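The RPC size distribution mentioned above is essentially a power-of-two histogram of I/O request sizes. A tiny sketch follows; the sizes are synthetic, whereas the actual tool gathers them from Lustre client-side statistics exported through PCP.

```python
# Bucket RPC sizes into power-of-two bins from 4 KB to 1024 KB and report shares.
import numpy as np

rpc_sizes_kb = np.array([4, 4, 8, 64, 128, 1024, 1024, 1024, 256, 16])
bins = [2 ** k for k in range(2, 11)]            # 4, 8, ..., 1024 KB
counts = {b: int(np.sum(rpc_sizes_kb == b)) for b in bins}
total = len(rpc_sizes_kb)
for size_kb, n in counts.items():
    print(f"{size_kb:5d} KB: {n:3d} RPCs ({n / total:5.1%})")
```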
Impact of Different Economic Performance Metrics on the Perceived Value of Solar Photovoltaics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Drury, E.; Denholm, P.; Margolis, R.
2011-10-01
Photovoltaic (PV) systems are installed by several types of market participants, ranging from residential customers to large-scale project developers and utilities. Each type of market participant frequently uses a different economic performance metric to characterize PV value because they are looking for different types of returns from a PV investment. This report finds that different economic performance metrics frequently show different price thresholds for when a PV investment becomes profitable or attractive. Several project parameters, such as financing terms, can have a significant impact on some metrics [e.g., internal rate of return (IRR), net present value (NPV), and benefit-to-cost (B/C) ratio] while having a minimal impact on other metrics (e.g., simple payback time). As such, the choice of economic performance metric by different customer types can significantly shape each customer's perception of PV investment value and ultimately their adoption decision.
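Three of the metrics named above can be computed from the same cash-flow stream, which is why they can point to different break-even prices. The sketch below uses a hypothetical installed cost and annual bill savings; the report's own financing and cash-flow assumptions are not reproduced, and the IRR solver is a simple bisection that assumes a single sign change.

```python
# NPV, simple payback, and IRR for a hypothetical PV cash-flow stream
# (year-0 cost followed by 25 years of savings).
def npv(rate, cashflows):
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def simple_payback(cost, annual_saving):
    return cost / annual_saving

def irr(cashflows, lo=-0.99, hi=1.0, tol=1e-6):
    while hi - lo > tol:                 # bisection on NPV(rate) = 0
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

flows = [-12000] + [900] * 25            # hypothetical system cost and yearly savings
print(npv(0.06, flows), simple_payback(12000, 900), irr(flows))
```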
Development of a Multimetric Indicator of Pelagic Zooplankton ...
We used zooplankton data collected for the 2012 National Lakes Assessment (NLA) to develop multimetric indices (MMIs) for five aggregated ecoregions of the conterminous USA (Coastal Plains, Eastern Highlands, Plains, Upper Midwest, and Western Mountains and Xeric ["West"]). We classified candidate metrics into six categories. We evaluated the performance of candidate metrics, and used metrics that had passed these screens to calculate all possible candidate MMIs that included at least one metric from each category. We selected the candidate MMI that had high responsiveness, a reasonable value for repeatability, low mean pairwise correlation among component metrics, and, when possible, a maximum pairwise correlation among component metrics that was <0.7. We were able to develop MMIs that were sufficiently responsive and repeatable to assess ecological condition for the NLA without the need to reduce the effects of natural variation using models. We did not observe effects of lake size, lake origin, or site depth on the MMIs. The MMIs appear to respond more strongly to increased nutrient concentrations than to shoreline habitat conditions. Improving our understanding of how zooplankton assemblages respond to increased human disturbance, and obtaining more complete autecological information for zooplankton taxa, would likely improve MMIs developed for future assessments. Using zooplankton assemblage data from the 2012 National Lakes Assessment (NLA),
A New Metric for Land-Atmosphere Coupling Strength: Applications on Observations and Modeling
NASA Astrophysics Data System (ADS)
Tang, Q.; Xie, S.; Zhang, Y.; Phillips, T. J.; Santanello, J. A., Jr.; Cook, D. R.; Riihimaki, L.; Gaustad, K.
2017-12-01
A new metric is proposed to quantify the land-atmosphere (LA) coupling strength and is elaborated by correlating the surface evaporative fraction with impacting land and atmosphere variables (e.g., soil moisture, vegetation, and radiation). Based upon multiple linear regression, this approach simultaneously considers multiple factors and thus represents complex LA coupling mechanisms better than existing single-variable metrics. The standardized regression coefficients quantify the relative contributions from individual drivers in a consistent manner, avoiding the potential inconsistency in relative influence of conventional metrics. Moreover, the unique, expandable nature of the new method allows us to verify and explore potentially important coupling mechanisms. Our observation-based application of the new metric shows moderate coupling with large spatial variations at the U.S. Southern Great Plains. The relative importance of soil moisture vs. vegetation varies by location. We also show that LA coupling strength is generally underestimated by single-variable methods due to their incompleteness. We also apply this new metric to evaluate the representation of LA coupling in the Accelerated Climate Modeling for Energy (ACME) V1 Contiguous United States (CONUS) regionally refined model (RRM). This work is performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-734201
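As a rough illustration of the regression machinery the abstract describes (not the authors' code), the sketch below fits an ordinary least-squares model of evaporative fraction on standardized drivers and reports the standardized coefficients as relative contributions. The synthetic driver values and sensitivities are assumptions.

```python
# Minimal sketch: standardized multiple linear regression of evaporative
# fraction (EF) on several assumed drivers, as a stand-in for the coupling metric.
import numpy as np

rng = np.random.default_rng(0)
n = 500
soil_moisture = rng.normal(0.25, 0.05, n)
vegetation = rng.normal(0.50, 0.10, n)
radiation = rng.normal(200.0, 30.0, n)
# Synthetic EF with assumed sensitivities plus noise (illustrative only).
ef = (0.3 + 1.2 * soil_moisture + 0.4 * vegetation + 0.0005 * radiation
      + rng.normal(0.0, 0.02, n))

X = np.column_stack([soil_moisture, vegetation, radiation])
y = ef

# Standardize predictors and response, then fit ordinary least squares.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()
beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

for name, b in zip(["soil moisture", "vegetation", "radiation"], beta):
    print(f"standardized coefficient for {name}: {b:+.3f}")
```

Because the coefficients are computed on standardized variables, they can be compared directly across drivers, which is the property the abstract relies on when ranking relative contributions.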
An exploratory survey of methods used to develop measures of performance
NASA Astrophysics Data System (ADS)
Hamner, Kenneth L.; Lafleur, Charles A.
1993-09-01
Nonmanufacturing organizations are being challenged to provide high-quality products and services to their customers, with an emphasis on continuous process improvement. Measures of performance, referred to as metrics, can be used to foster process improvement. The application of performance measurement to nonmanufacturing processes can be very difficult. This research explored methods used to develop metrics in nonmanufacturing organizations. Several methods were formally defined in the literature, and the researchers used a two-step screening process to determine that the OMB Generic Method was the most likely to produce high-quality metrics. The OMB Generic Method was then used to develop metrics. A few other metric development methods were found in use at nonmanufacturing organizations. The researchers interviewed participants in metric development efforts to determine their satisfaction and to have them identify the strengths and weaknesses of, and recommended improvements to, the metric development methods used. Analysis of participants' responses allowed the researchers to identify the key components of a sound metric development method. Those components were incorporated into a proposed metric development method that is based on the OMB Generic Method and should be more likely to produce high-quality metrics that will result in continuous process improvement.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morrissey, Elmer; O'Donnell, James; Keane, Marcus
2004-03-29
Minimizing building life cycle energy consumption is becoming of paramount importance. Performance metrics tracking offers a clear and concise manner of relating design intent in a quantitative form. A methodology is discussed for storage and utilization of these performance metrics through an Industry Foundation Classes (IFC) instantiated Building Information Model (BIM). The paper focuses on storage of three sets of performance data from three distinct sources. An example of a performance metrics programming hierarchy is displayed for a heat pump and a solar array. Utilizing the sets of performance data, two discrete performance effectiveness ratios may be computed, thus offering an accurate method of quantitatively assessing building performance.
NASA Technical Reports Server (NTRS)
McConnell, Joshua B.
2000-01-01
The scientific exploration of Mars will require the collection and return of subterranean samples to Earth for examination. This necessitates the use of some type of device or devices that possesses the ability to effectively penetrate the Martian surface, collect suitable samples and return them to the surface in a manner consistent with imposed scientific constraints. The first opportunity for such a device will occur on the 2003 and 2005 Mars Sample Return missions, being performed by NASA. This paper reviews the work completed on the compilation of a database containing viable penetrating and sampling devices, the performance of a system-level trade study comparing selected devices to a set of prescribed parameters, and the employment of a metric for the evaluation and ranking of the traded penetration and sampling devices, with respect to possible usage on the '03 and '05 sample return missions. The trade study performed is based on a select set of scientific, engineering, programmatic and socio-political criteria. The use of a metric for the various penetration and sampling devices will act to expedite current and future device selection.
Performance Evaluation of the Approaches and Algorithms for Hamburg Airport Operations
NASA Technical Reports Server (NTRS)
Zhu, Zhifan; Jung, Yoon; Lee, Hanbong; Schier, Sebastian; Okuniek, Nikolai; Gerdes, Ingrid
2016-01-01
In this work, fast-time simulations have been conducted by NASA using SARDA tools at Hamburg airport, and real-time simulations by DLR using CADEO and TRACC with the NLR ATM Research Simulator (NARSIM). The outputs are analyzed using a set of common metrics developed collaboratively by DLR and NASA. The proposed metrics are derived from the International Civil Aviation Organization (ICAO)'s Key Performance Areas (KPAs) in capacity, efficiency, predictability and environment, and adapted to simulation studies. The results are examined to explore and compare the merits and shortcomings of the two approaches using the common performance metrics. Particular attention is paid to the concept of closed-loop, trajectory-based taxi as well as the application of the US concept to a European airport. Both teams consider the trajectory-based surface operation concept a critical technology advance, not only in addressing current surface traffic management problems, but also in its potential application to unmanned vehicle maneuvering on the airport surface, such as autonomous towing or TaxiBot [6][7] and even Remotely Piloted Aircraft (RPA). Based on this work, a future integration of TRACC and SOSS is described, aiming to bring the conflict-free, trajectory-based operation concept to US airports.
Wafer hot spot identification through advanced photomask characterization techniques: part 2
NASA Astrophysics Data System (ADS)
Choi, Yohan; Green, Michael; Cho, Young; Ham, Young; Lin, Howard; Lan, Andy; Yang, Richer; Lung, Mike
2017-03-01
Historically, 1D metrics such as Mean to Target (MTT) and CD Uniformity (CDU) have been adequate for mask end users to evaluate and predict the mask impact on the wafer process. However, the wafer lithographer's process margin is shrinking at advanced nodes to a point that classical mask CD metrics are no longer adequate to gauge the mask contribution to wafer process error. For example, wafer CDU error at advanced nodes is impacted by mask factors such as 3-dimensional (3D) effects and mask pattern fidelity on sub-resolution assist features (SRAFs) used in Optical Proximity Correction (OPC) models of ever-increasing complexity. To overcome the limitation of 1D metrics, there are numerous on-going industry efforts to better define wafer-predictive metrics through both standard mask metrology and aerial CD methods. Even with these improvements, the industry continues to struggle to define useful correlative metrics that link the mask to final device performance. In part 1 of this work, we utilized advanced mask pattern characterization techniques to extract potential hot spots on the mask and link them, theoretically, to issues with final wafer performance. In this paper, part 2, we complete the work by verifying these techniques at wafer level. The test vehicle (TV) that was used for hot spot detection on the mask in part 1 will be used to expose wafers. The results will be used to verify the mask-level predictions. Finally, wafer performance with predicted and verified mask/wafer condition will be shown as the result of advanced mask characterization. The goal is to maximize mask end user yield through mask-wafer technology harmonization. This harmonization will provide the necessary feedback to determine optimum design, mask specifications, and mask-making conditions for optimal wafer process margin.
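For readers unfamiliar with the classical 1D mask metrics named above, the following sketch shows one common way Mean to Target and CD Uniformity could be computed from a set of measured critical dimensions; the CD values and the 3-sigma convention are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of the 1D mask metrics MTT and CDU on hypothetical CD measurements.
import numpy as np

measured_cd = np.array([64.8, 65.1, 64.9, 65.3, 64.7, 65.0, 65.2])  # nm, illustrative
target_cd = 65.0  # nm

mean_to_target = measured_cd.mean() - target_cd              # MTT: bias of the mean CD
cd_uniformity_3s = 3.0 * measured_cd.std(ddof=1)              # CDU, often quoted as 3-sigma
cd_uniformity_range = measured_cd.max() - measured_cd.min()   # or as a simple range

print(f"MTT           : {mean_to_target:+.3f} nm")
print(f"CDU (3 sigma) : {cd_uniformity_3s:.3f} nm")
print(f"CDU (range)   : {cd_uniformity_range:.3f} nm")
```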
A Case Study Based Analysis of Performance Metrics for Green Infrastructure
NASA Astrophysics Data System (ADS)
Gordon, B. L.; Ajami, N.; Quesnel, K.
2017-12-01
Aging infrastructure, population growth, and urbanization are demanding new approaches to management of all components of the urban water cycle, including stormwater. Traditionally, urban stormwater infrastructure was designed to capture and convey rainfall-induced runoff out of a city through a network of curbs, gutters, drains, and pipes, also known as grey infrastructure. These systems were planned with a single purpose and designed under the assumption of hydrologic stationarity, a notion that no longer holds true in the face of a changing climate. One solution gaining momentum around the world is green infrastructure (GI). Beyond stormwater quality improvement and quantity reduction (or technical benefits), GI solutions offer many environmental, economic, and social benefits. Yet many practical barriers have prevented the widespread adoption of these systems worldwide. At the center of these challenges is the inability of stakeholders to know how to monitor, measure, and assess the multi-sector performance of GI systems. Traditional grey infrastructure projects require different monitoring strategies than natural systems; there are no overarching policies on how to best design GI monitoring and evaluation systems and measure performance. Previous studies have attempted to quantify the performance of GI, mostly using one evaluation method on a specific case study. We use a case study approach to address these knowledge gaps and develop a conceptual model of how to evaluate the performance of GI through the lens of financing. First, we examined many different case studies of successfully implemented GI around the world. Then we narrowed in on 10 exemplary case studies. For each case study, we determined which performance method the project developer used, such as LCA, TBL, Low Impact Design Assessment (LIDA) and others. Then, we determined which performance metrics were used to determine success and what data were needed to calculate those metrics. Finally, we examined the risk priorities of both public and private actors to see how they varied and how risk was overcome. We synthesized these results to pull out key themes and lessons for the future. If project implementers are able to quantify the benefits and show investors how beneficial these systems can be, more will be implemented in the future.
Benchmarking homogenization algorithms for monthly data
NASA Astrophysics Data System (ADS)
Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M. J.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratiannil, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.; Willett, K.
2013-09-01
The COST (European Cooperation in Science and Technology) Action ES0601: Advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies. The algorithms were validated against a realistic benchmark dataset. Participants provided 25 separate homogenized contributions as part of the blind study as well as 22 additional solutions submitted after the details of the imposed inhomogeneities were revealed. These homogenized datasets were assessed by a number of performance metrics including i) the centered root mean square error relative to the true homogeneous values at various averaging scales, ii) the error in linear trend estimates and iii) traditional contingency skill scores. The metrics were computed both using the individual station series as well as the network average regional series. The performance of the contributions depends significantly on the error metric considered. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data. Moreover, state-of-the-art relative homogenization algorithms developed to work with an inhomogeneous reference are shown to perform best. The study showed that currently automatic algorithms can perform as well as manual ones.
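A minimal sketch of performance metric (i), the centered root-mean-square error, under the usual interpretation that each series' own mean is removed before the error is computed; the example anomaly values are illustrative, not from the benchmark.

```python
# Hedged sketch: centered RMSE of a homogenized series against the true series.
import numpy as np

def centered_rmse(homogenized, truth):
    """RMSE after subtracting each series' own mean (i.e., bias-free error)."""
    h = np.asarray(homogenized, dtype=float)
    t = np.asarray(truth, dtype=float)
    return np.sqrt(np.mean(((h - h.mean()) - (t - t.mean())) ** 2))

# Illustrative monthly temperature anomalies (degrees C).
truth = np.array([0.1, -0.2, 0.3, 0.0, 0.4, -0.1])
homogenized = np.array([0.2, -0.1, 0.2, 0.1, 0.5, -0.2])
print(f"centered RMSE: {centered_rmse(homogenized, truth):.3f} degC")
```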
Contrast model for three-dimensional vehicles in natural lighting and search performance analysis
NASA Astrophysics Data System (ADS)
Witus, Gary; Gerhart, Grant R.; Ellis, R. Darin
2001-09-01
Ground vehicles in natural lighting tend to have significant and systematic variation in luminance through the presented area. This arises, in large part, from the vehicle surfaces having different orientations and shadowing relative to the source of illumination and the position of the observer. These systematic differences create the appearance of a structured 3D object. The 3D appearance is an important factor in search, figure-ground segregation, and object recognition. We present a contrast metric to predict search and detection performance that accounts for the 3D structure. The approach first computes the contrast of the front (or rear), side, and top surfaces. The vehicle contrast metric is the area-weighted sum of the absolute values of the contrasts of the component surfaces. The 3D structure contrast metric, together with target height, account for more than 80% of the variance in probability of detection and 75% of the variance in search time. When false alarm effects are discounted, they account for 89% of the variance in probability of detection and 95% of the variance in search time. The predictive power of the signature metric, when calibrated to half the data and evaluated against the other half, is 90% of the explanatory power.
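A hedged sketch of the contrast metric as described above: per-surface contrast combined as an area-weighted sum of absolute values. The Weber-style contrast definition and the surface areas and luminances are assumptions for illustration, not the authors' calibration.

```python
# Minimal sketch of a 3D-structure contrast metric: area-weighted |contrast|
# over the front/side/top surfaces of a target, under assumed luminance values.

def surface_contrast(surface_luminance, background_luminance):
    """Weber-style contrast of one surface against the local background."""
    return (surface_luminance - background_luminance) / background_luminance

def vehicle_contrast(surfaces, background_luminance):
    """Area-weighted sum of |contrast| over the component surfaces.

    surfaces: list of (presented_area, mean_luminance) tuples.
    """
    total_area = sum(area for area, _ in surfaces)
    return sum(
        (area / total_area) * abs(surface_contrast(lum, background_luminance))
        for area, lum in surfaces
    )

# Hypothetical target: front, side, and top surfaces (m^2, cd/m^2).
surfaces = [(2.0, 40.0), (6.0, 70.0), (3.0, 120.0)]
print(f"3D contrast metric: {vehicle_contrast(surfaces, 80.0):.3f}")
```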
NASA Astrophysics Data System (ADS)
Kierkels, R. G. J.; den Otter, L. A.; Korevaar, E. W.; Langendijk, J. A.; van der Schaaf, A.; Knopf, A. C.; Sijtsema, N. M.
2018-02-01
A prerequisite for adaptive dose-tracking in radiotherapy is the assessment of the deformable image registration (DIR) quality. In this work, various metrics that quantify DIR uncertainties are investigated using realistic deformation fields of 26 head and neck and 12 lung cancer patients. Metrics related to the physiological feasibility of the deformation (the Jacobian determinant, harmonic energy (HE), and octahedral shear strain (OSS)) and to its numerical robustness (the inverse consistency error (ICE), transitivity error (TE), and distance discordance metric (DDM)) were investigated. The deformable registrations were performed using a B-spline transformation model. The DIR error metrics were log-transformed and correlated (Pearson) against the log-transformed ground-truth error on a voxel level. Correlations of r ⩾ 0.5 were found for the DDM and HE. Given a DIR tolerance threshold of 2.0 mm and a negative predictive value of 0.90, the DDM and HE thresholds were 0.49 mm and 0.014, respectively. In conclusion, the log-transformed DDM and HE can be used to identify voxels at risk for large DIR errors with a large negative predictive value. The HE and/or DDM can therefore be used to perform automated quality assurance of each CT-based DIR for head and neck and lung cancer patients.
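One of the feasibility metrics above, the Jacobian determinant of the deformation, can be approximated per voxel from finite differences of the displacement field. The sketch below is our own illustration (not the study's implementation) using an arbitrary random displacement field.

```python
# Hedged sketch: per-voxel Jacobian determinant of a 3D displacement vector field.
import numpy as np

def jacobian_determinant(dvf, spacing=(1.0, 1.0, 1.0)):
    """dvf: displacement field of shape (nx, ny, nz, 3), in the same units as spacing."""
    # Partial derivatives of each displacement component along x, y, z.
    grads = [np.gradient(dvf[..., c], *spacing) for c in range(3)]
    # Deformation gradient F = I + grad(u), assembled voxel-wise.
    F = np.zeros(dvf.shape[:3] + (3, 3))
    for c in range(3):
        for i in range(3):
            F[..., c, i] = grads[c][i] + (1.0 if c == i else 0.0)
    # det(F) <= 0 flags folding; values far from 1 flag extreme compression/expansion.
    return np.linalg.det(F)

rng = np.random.default_rng(1)
dvf = rng.normal(0.0, 0.05, size=(8, 8, 8, 3))  # illustrative small random displacements
jd = jacobian_determinant(dvf)
print("fraction of voxels with non-positive Jacobian:", float((jd <= 0).mean()))
```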
NASA Astrophysics Data System (ADS)
Dostal, P.; Krasula, L.; Klima, M.
2012-06-01
Various image processing techniques in multimedia technology are optimized using the visual attention feature of the human visual system. Spatial non-uniformity means that different locations in an image are of different importance for perception of the image. In other words, the perceived image quality depends mainly on the quality of important locations known as regions of interest. The performance of such techniques is measured by subjective evaluation or objective image quality criteria. Many state-of-the-art objective metrics are based on HVS properties: SSIM and MS-SSIM are based on image structural information, VIF on the information that the human brain can ideally gain from the reference image, and FSIM utilizes low-level features to assign different importance to each location in the image. Still, none of these objective metrics utilizes analysis of regions of interest. We address the question of whether these objective metrics can be used for effective evaluation of images reconstructed by processing techniques based on ROI analysis utilizing high-level features. In this paper the authors show that the state-of-the-art objective metrics do not correlate well with subjective evaluation when demosaicing based on ROI analysis is used for reconstruction. The ROIs were computed from "ground truth" visual attention data. An algorithm combining two known demosaicing techniques on the basis of ROI location is proposed to reconstruct the ROIs in fine quality while the rest of the image is reconstructed with low quality. The color image reconstructed by this ROI approach was compared with selected demosaicing techniques by objective criteria and subjective testing. The qualitative comparison of the objective and subjective results indicates that the state-of-the-art objective metrics are still not suitable for evaluating image processing techniques based on ROI analysis, and new criteria are needed.
Brown, Halley J; Andreason, Hope; Melling, Amy K; Imel, Zac E; Simon, Gregory E
2015-08-01
Retention, or its opposite, dropout, is a common metric of psychotherapy quality, but using it to assess provider performance can be problematic. Differences among providers in numbers of general dropouts, "good" dropouts (patients report positive treatment experiences and outcome), and "bad" dropouts (patients report negative treatment experiences and outcome) were evaluated. Patient records were paired with satisfaction surveys (N=3,054). Binomial mixed-effects models were used to examine differences among providers by dropout type. Thirty-four percent of treatment episodes resulted in dropout. Of these, 14% were bad dropouts and 27% were good dropouts. Providers accounted for approximately 17% of the variance in general dropout and 10% of the variance in both bad dropout and good dropout. The ranking of providers fluctuated by type of dropout. Provider assessments based on patient retention should offer a way to isolate dropout type, given that nonspecific metrics may lead to biased estimates of performance.
Multiuser signal detection using sequential decoding
NASA Astrophysics Data System (ADS)
Xie, Zhenhua; Rushforth, Craig K.; Short, Robert T.
1990-05-01
The application of sequential decoding to the detection of data transmitted over the additive white Gaussian noise channel by K asynchronous transmitters using direct-sequence spread-spectrum multiple access is considered. A modification of Fano's (1963) sequential-decoding metric, allowing the messages from a given user to be safely decoded if its Eb/N0 exceeds -1.6 dB, is presented. Computer simulation is used to evaluate the performance of a sequential decoder that uses this metric in conjunction with the stack algorithm. In many circumstances, the sequential decoder achieves results comparable to those obtained using the much more complicated optimal receiver.
Researcher and Author Impact Metrics: Variety, Value, and Context
2018-01-01
Numerous quantitative indicators are currently available for evaluating research productivity. No single metric is suitable for comprehensive evaluation of author-level impact. The choice of particular metrics depends on the purpose and context of the evaluation. The aim of this article is to overview some of the widely employed author impact metrics and highlight perspectives on their optimal use. The h-index is one of the most popular metrics for research evaluation; it is easy to calculate and understandable for non-experts. It is automatically displayed on researcher and author profiles in citation databases such as Scopus and Web of Science. Its main advantage relates to the combined approach to the quantification of publication and citation counts. This index is increasingly cited globally. While it is an appropriate indicator of the publication and citation activity of highly productive and successfully promoted authors, the h-index has been criticized primarily for disadvantaging early career researchers and authors with few indexed publications. Numerous variants of the index have been proposed to overcome its limitations. Alternative metrics have also emerged to highlight ‘societal impact.’ However, each of these traditional and alternative metrics has its own drawbacks, necessitating careful analyses of the context of social attention and the value of publication and citation sets. Optimal use of researcher and author metrics depends on the evaluation purpose and is compounded by information sourced from various global, national, and specialist bibliographic databases. PMID:29713258
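Since the h-index is defined directly from citation counts, a small sketch may help: it is the largest h such that the author has at least h publications cited at least h times each. The citation counts below are made up for illustration.

```python
# Minimal sketch of the h-index computation described above.

def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Illustrative citation counts for one author's publications.
print(h_index([10, 8, 5, 4, 3, 2]))  # -> 4 (four papers cited at least 4 times each)
```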
De Keersmaecker, Wanda; Lhermitte, Stef; Honnay, Olivier; Farifteh, Jamshid; Somers, Ben; Coppin, Pol
2014-07-01
Increasing frequency of extreme climate events is likely to impose increased stress on ecosystems and to jeopardize the services that ecosystems provide. Therefore, it is of major importance to assess the effects of extreme climate events on the temporal stability (i.e., the resistance, the resilience, and the variance) of ecosystem properties. Most time series of ecosystem properties are, however, affected by varying data characteristics, uncertainties, and noise, which complicate the comparison of ecosystem stability metrics (ESMs) between locations. Therefore, there is a strong need for a more comprehensive understanding regarding the reliability of stability metrics and how they can be used to compare ecosystem stability globally. The objective of this study was to evaluate the performance of temporal ESMs based on time series of the Moderate Resolution Imaging Spectroradiometer derived Normalized Difference Vegetation Index of 15 global land-cover types. We provide a framework (i) to assess the reliability of ESMs as a function of data characteristics, uncertainties and noise and (ii) to integrate reliability estimates in future global ecosystem stability studies against climate disturbances. The performance of our framework was tested through (i) a global ecosystem comparison and (ii) a comparison of ecosystem stability in response to the 2003 drought. The results show the influence of data quality on the accuracy of ecosystem stability metrics. White noise, biased noise, and trends have a stronger effect on the accuracy of stability metrics than the length of the time series, temporal resolution, or amount of missing values. Moreover, we demonstrate the importance of integrating reliability estimates to interpret stability metrics within confidence limits. Based on these confidence limits, other studies dealing with specific ecosystem types or locations can be put into context, and a more reliable assessment of ecosystem stability against environmental disturbances can be obtained. © 2013 John Wiley & Sons Ltd.
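The three stability aspects named above (resistance, resilience, and variance) can be operationalized in several ways; the sketch below is one assumed formulation applied to an NDVI anomaly series, not the paper's implementation. The NDVI values, the disturbance index, and the recovery window are illustrative.

```python
# Hedged sketch of variance, resistance, and resilience from an NDVI time series.
import numpy as np

def stability_metrics(ndvi, disturbance_index, recovery_window=5):
    ndvi = np.asarray(ndvi, dtype=float)
    baseline = ndvi[:disturbance_index].mean()
    anomaly = ndvi - baseline

    variance = float(np.var(anomaly[:disturbance_index], ddof=1))
    resistance = float(abs(anomaly[disturbance_index]))  # smaller departure = more resistant
    post = anomaly[disturbance_index:disturbance_index + recovery_window]
    # Resilience as the fraction of the initial departure recovered per time step.
    resilience = float((abs(post[0]) - abs(post[-1])) / (abs(post[0]) * (len(post) - 1)))
    return variance, resistance, resilience

ndvi = [0.62, 0.60, 0.63, 0.61, 0.40, 0.45, 0.52, 0.58, 0.60]  # assumed drought at index 4
var_, resist, resil = stability_metrics(ndvi, disturbance_index=4)
print(f"variance={var_:.4f}  resistance(departure)={resist:.3f}  resilience={resil:.3f}/step")
```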
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dogan, N; Weiss, E; Sleeman, W
Purpose: Errors in displacement vector fields (DVFs) generated by Deformable Image Registration (DIR) algorithms can give rise to significant uncertainties in contour propagation and dose accumulation in Image-Guided Adaptive Radiotherapy (IGART). The purpose of this work is to assess the accuracy of two DIR algorithms using a variety of quality metrics for prostate IGART. Methods: Pelvic CT images were selected from an anonymized database of nineteen prostate patients who underwent 8–12 serial scans during radiotherapy. Prostate, bladder, and rectum were contoured on 34 image sets for three patients by the same physician. The planning CT was deformably registered to the daily CT using three variants of the Small deformation Inverse Consistent Linear Elastic (SICLE) algorithm: grayscale-driven (G), contour-driven (C, which utilizes segmented structures to drive DIR), and combined (G+C); and also grayscale ITK demons (Gd). The accuracy of the G, C, G+C SICLE and Gd registrations was evaluated using a new metric, Edge Gradient Distance to Agreement (EGDTA), and other commonly used metrics such as the Pearson Correlation Coefficient (PCC), Dice Similarity Index (DSI) and Hausdorff Distance (HD). Results: C and G+C demonstrated much better performance at organ boundaries, revealing the lowest HD and highest DSI in prostate, bladder and rectum. G+C demonstrated the lowest mean EGDTA (1.14 mm), which corresponds to the highest registration quality, compared to the G and C DVFs (1.16 and 2.34 mm). However, demons DIR showed the best overall performance, revealing the lowest EGDTA (0.73 mm) and highest PCC (0.85). Conclusion: As expected, both C and G+C SICLE more accurately reproduce manually contoured target datasets than G SICLE or Gd using the HD and DSI metrics. In general, Gd appears to have difficulty reproducing large daily position and shape changes in the rectum and bladder. However, Gd outperforms SICLE in terms of the EGDTA and PCC metrics, possibly at the expense of the topological quality of the estimated DVFs.
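Two of the commonly used metrics named above, the Dice Similarity Index and the Hausdorff Distance, have straightforward definitions on binary masks. The following sketch (not the study's code) computes both on small illustrative 2D masks standing in for a propagated versus a manual contour.

```python
# Hedged sketch: Dice Similarity Index and symmetric Hausdorff Distance on binary masks.
import numpy as np

def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(a, b):
    """Symmetric Hausdorff distance between the voxel sets of two binary masks."""
    pa = np.argwhere(a)
    pb = np.argwhere(b)
    d_ab = max(np.min(np.linalg.norm(pb - p, axis=1)) for p in pa)
    d_ba = max(np.min(np.linalg.norm(pa - p, axis=1)) for p in pb)
    return max(d_ab, d_ba)

# Illustrative 2D masks standing in for a propagated vs. manual contour.
a = np.zeros((20, 20), dtype=bool); a[5:15, 5:15] = True
b = np.zeros((20, 20), dtype=bool); b[6:16, 6:16] = True
print(f"Dice = {dice(a, b):.3f}, Hausdorff = {hausdorff(a, b):.2f} voxels")
```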
Evaluative Usage-Based Metrics for the Selection of E-Journals.
ERIC Educational Resources Information Center
Hahn, Karla L.; Faulkner, Lila A.
2002-01-01
Explores electronic journal usage statistics and develops three metrics and three benchmarks based on those metrics. Topics include earlier work that assessed the value of print journals and was modified for the electronic format; the evaluation of potential purchases; and implications for standards development, including the need for content…
Accurate evaluation and analysis of functional genomics data and methods
Greene, Casey S.; Troyanskaya, Olga G.
2016-01-01
The development of technology capable of inexpensively performing large-scale measurements of biological systems has generated a wealth of data. Integrative analysis of these data holds the promise of uncovering gene function, regulation, and, in the longer run, understanding complex disease. However, their analysis has proved very challenging, as it is difficult to quickly and effectively assess the relevance and accuracy of these data for individual biological questions. Here, we identify biases that present challenges for the assessment of functional genomics data and methods. We then discuss evaluation methods that, taken together, begin to address these issues. We also argue that the funding of systematic data-driven experiments and of high-quality curation efforts will further improve evaluation metrics so that they more-accurately assess functional genomics data and methods. Such metrics will allow researchers in the field of functional genomics to continue to answer important biological questions in a data-driven manner. PMID:22268703
Fiszman, Marcelo; Demner-Fushman, Dina; Kilicoglu, Halil; Rindflesch, Thomas C.
2009-01-01
As the number of electronic biomedical textual resources increases, it becomes harder for physicians to find useful answers at the point of care. Information retrieval applications provide access to databases; however, little research has been done on using automatic summarization to help navigate the documents returned by these systems. After presenting a semantic abstraction automatic summarization system for MEDLINE citations, we concentrate on evaluating its ability to identify useful drug interventions for fifty-three diseases. The evaluation methodology uses existing sources of evidence-based medicine as surrogates for a physician-annotated reference standard. Mean average precision (MAP) and a clinical usefulness score developed for this study were computed as performance metrics. The automatic summarization system significantly outperformed the baseline in both metrics. The MAP gain was 0.17 (p < 0.01) and the increase in the overall score of clinical usefulness was 0.39 (p < 0.05). PMID:19022398
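A minimal sketch of the MAP metric used above, not the authors' code: average precision is computed per query (here, per disease) over the ranked list of retrieved interventions, then averaged across queries. For simplicity the per-query denominator is the number of relevant items retrieved (full AP divides by all relevant items for the query); the relevance flags are illustrative.

```python
# Hedged sketch: mean average precision (MAP) over ranked retrieval results.

def average_precision(ranked_relevance):
    """ranked_relevance: list of 0/1 flags in the order the system returned items."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / max(hits, 1)

def mean_average_precision(per_query_relevance):
    return sum(average_precision(r) for r in per_query_relevance) / len(per_query_relevance)

# Two illustrative queries: 1 marks a retrieved drug judged useful for the disease.
queries = [[1, 0, 1, 0, 0], [0, 1, 1, 1]]
print(f"MAP = {mean_average_precision(queries):.3f}")
```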
Zupanc, Christine M; Wallis, Guy M; Hill, Andrew; Burgess-Limerick, Robin; Riek, Stephan; Plooy, Annaliese M; Horswill, Mark S; Watson, Marcus O; de Visser, Hans; Conlan, David; Hewett, David G
2017-07-12
The effectiveness of colonoscopy for diagnosing and preventing colon cancer is largely dependent on the ability of endoscopists to fully inspect the colonic mucosa, which they achieve primarily through skilled manipulation of the colonoscope during withdrawal. Performance assessment during live procedures is problematic. However, a virtual withdrawal simulation can help identify and parameterise actions linked to successful inspection, and offer standardised assessments for trainees. Eleven experienced endoscopists and 18 endoscopy novices (medical students) completed a mucosal inspection task during three simulated colonoscopic withdrawals. The two groups were compared on 10 performance metrics to preliminarily assess the validity of these measures to describe inspection quality. Four metrics were related to aspects of polyp detection: percentage of polyp markers found; number of polyp markers found per minute; percentage of the mucosal surface illuminated by the colonoscope (≥0.5 s); and percentage of polyp markers illuminated (≥2.5 s) but not identified. A further six metrics described the movement of the colonoscope: withdrawal time; linear distance travelled by the colonoscope tip; total distance travelled by the colonoscope tip; and distance travelled by the colonoscope tip due to movement of the up/down angulation control, movement of the left/right angulation control, and axial shaft rotation. Statistically significant experienced-novice differences were found for 8 of the 10 performance metrics (p's < .005). Compared with novices, experienced endoscopists inspected more of the mucosa and detected more polyp markers, at a faster rate. Despite completing the withdrawals more quickly than the novices, the experienced endoscopists also moved the colonoscope more in terms of linear distance travelled and overall tip movement, with greater use of both the up/down angulation control and axial shaft rotation. However, the groups did not differ in the number of polyp markers visible on the monitor but not identified, or movement of the left/right angulation control. All metrics that yielded significant group differences had adequate to excellent internal consistency reliability (α = .79 to .90). These systematic differences confirm the potential of the simulated withdrawal task for evaluating inspection skills and strategies. It may be useful for training, and assessment of trainee competence.
Closed-loop, pilot/vehicle analysis of the approach and landing task
NASA Technical Reports Server (NTRS)
Schmidt, D. K.; Anderson, M. R.
1985-01-01
Optimal-control-theoretic modeling and frequency-domain analysis is the methodology proposed to evaluate analytically the handling qualities of higher-order manually controlled dynamic systems. Fundamental to the methodology is evaluating the interplay between pilot workload and closed-loop pilot/vehicle performance and stability robustness. The model-based metric for pilot workload is the required pilot phase compensation. Pilot/vehicle performance and loop stability is then evaluated using frequency-domain techniques. When these techniques were applied to the flight-test data for thirty-two highly-augmented fighter configurations, strong correlation was obtained between the analytical and experimental results.
NASA Technical Reports Server (NTRS)
Campbell, Stefan F.; Kaneshige, John T.; Nguyen, Nhan T.; Krishakumar, Kalmanje S.
2010-01-01
Presented here is the evaluation of multiple adaptive control technologies for a generic transport aircraft simulation. For this study, seven model reference adaptive control (MRAC) based technologies were considered. Each technology was integrated into an identical dynamic-inversion control architecture and tuned using a methodology based on metrics and specific design requirements. Simulation tests were then performed to evaluate each technology's sensitivity to time-delay, flight condition, model uncertainty, and artificially induced cross-coupling. The resulting robustness and performance characteristics were used to identify potential strengths, weaknesses, and integration challenges of the individual adaptive control technologies.
Pasquali, Sara K; Wallace, Amelia S; Gaynor, J William; Jacobs, Marshall L; O'Brien, Sean M; Hill, Kevin D; Gaies, Michael G; Romano, Jennifer C; Shahian, David M; Mayer, John E; Jacobs, Jeffrey P
2016-11-01
Performance assessment in congenital heart surgery is challenging due to the wide heterogeneity of disease. We describe current case mix across centers, evaluate methodology inclusive of all cardiac operations versus the more homogeneous subset of Society of Thoracic Surgeons benchmark operations, and describe implications regarding performance assessment. Centers (n = 119) participating in the Society of Thoracic Surgeons Congenital Heart Surgery Database (2010 through 2014) were included. Index operation type and frequency across centers were described. Center performance (risk-adjusted operative mortality) was evaluated and classified when including the benchmark versus all eligible operations. Overall, 207 types of operations were performed during the study period (112,140 total cases). Few operations were performed across all centers; only 25% were performed at least once by 75% or more of centers. There was 7.9-fold variation across centers in the proportion of total cases comprising high-complexity cases (STAT 5). In contrast, the benchmark operations made up 36% of cases, and all but 2 were performed by at least 90% of centers. When evaluating performance based on benchmark versus all operations, 15% of centers changed performance classification; 85% remained unchanged. Benchmark versus all operation methodology was associated with lower power, with 35% versus 78% of centers meeting sample size thresholds. There is wide variation in congenital heart surgery case mix across centers. Metrics based on benchmark versus all operations are associated with strengths (less heterogeneity) and weaknesses (lower power), and lead to differing performance classification for some centers. These findings have implications for ongoing efforts to optimize performance assessment, including choice of target population and appropriate interpretation of reported metrics. Copyright © 2016 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.
Wide-area, real-time monitoring and visualization system
Budhraja, Vikram S.; Dyer, James D.; Martinez Morales, Carlos A.
2013-03-19
A real-time performance monitoring system for monitoring an electric power grid. The electric power grid has a plurality of grid portions, each grid portion corresponding to one of a plurality of control areas. The real-time performance monitoring system includes a monitor computer for monitoring at least one of reliability metrics, generation metrics, transmission metrics, suppliers metrics, grid infrastructure security metrics, and markets metrics for the electric power grid. The data for metrics being monitored by the monitor computer are stored in a data base, and a visualization of the metrics is displayed on at least one display computer having a monitor. The at least one display computer in one said control area enables an operator to monitor the grid portion corresponding to a different said control area.
Wide-area, real-time monitoring and visualization system
Budhraja, Vikram S [Los Angeles, CA; Dyer, James D [La Mirada, CA; Martinez Morales, Carlos A [Upland, CA
2011-11-15
A real-time performance monitoring system for monitoring an electric power grid. The electric power grid has a plurality of grid portions, each grid portion corresponding to one of a plurality of control areas. The real-time performance monitoring system includes a monitor computer for monitoring at least one of reliability metrics, generation metrics, transmission metrics, suppliers metrics, grid infrastructure security metrics, and markets metrics for the electric power grid. The data for metrics being monitored by the monitor computer are stored in a data base, and a visualization of the metrics is displayed on at least one display computer having a monitor. The at least one display computer in one said control area enables an operator to monitor the grid portion corresponding to a different said control area.
Riato, Luisa; Leira, Manel; Della Bella, Valentina; Oberholster, Paul J
2018-01-15
Acid mine drainage (AMD) from coal mining in the Mpumalanga Highveld region of South Africa has caused severe chemical and biological degradation of aquatic habitats, specifically depressional wetlands, as mines use these wetlands for storage of AMD. Diatom-based multimetric indices (MMIs) to assess wetland condition have mostly been developed to assess agricultural and urban land use impacts. No diatom MMI of wetland condition has been developed to assess AMD impacts related to mining activities. Previous approaches to diatom-based MMI development in wetlands have not accounted for natural variability. Natural variability among depressional wetlands may influence the accuracy of MMIs. Epiphytic diatom MMIs sensitive to AMD were developed for a range of depressional wetland types to account for natural variation in biological metrics. For this, we classified wetland types based on diatom typologies. A range of 4-15 final metrics were selected from a pool of ~140 candidate metrics to develop the MMIs based on their: (1) broad range, (2) high separation power and (3) low correlation among metrics. Final metrics were selected from three categories: similarity to reference sites, functional groups, and taxonomic composition, which represent different aspects of diatom assemblage structure and function. MMI performances were evaluated according to their precision in distinguishing reference sites, responsiveness to discriminate reference and disturbed sites, sensitivity to human disturbances and relevancy to AMD-related stressors. Each MMI showed excellent discriminatory power, whether or not it accounted for natural variation. However, accounting for variation by grouping sites based on diatom typologies improved overall performance of MMIs. Our study highlights the usefulness of diatom-based metrics and provides a model for the biological assessment of depressional wetland condition in South Africa and elsewhere. Copyright © 2017 Elsevier B.V. All rights reserved.
An investigation of fighter aircraft agility
NASA Technical Reports Server (NTRS)
Valasek, John; Downing, David R.
1993-01-01
This report attempts to unify in a single document the results of a series of studies on fighter aircraft agility funded by the NASA Ames Research Center, Dryden Flight Research Facility and conducted at the University of Kansas Flight Research Laboratory during the period January 1989 through December 1993. New metrics proposed by pilots and the research community to assess fighter aircraft agility are collected and analyzed. The report develops a framework for understanding the context into which the various proposed fighter agility metrics fit in terms of application and testing. Since new metrics continue to be proposed, this report does not claim to contain every proposed fighter agility metric. Flight test procedures, test constraints, and related criteria are developed. Instrumentation required to quantify agility via flight test is considered, as is the sensitivity of the candidate metrics to deviations from nominal pilot command inputs, which is studied in detail. Instead of supplying specific, detailed conclusions about the relevance or utility of one candidate metric versus another, the authors have attempted to provide sufficient data and analyses for readers to formulate their own conclusions. Readers are therefore ultimately responsible for judging exactly which metrics are 'best' for their particular needs. Additionally, it is not the intent of the authors to suggest combat tactics or other actual operational uses of the results and data in this report. This has been left up to the user community. Twenty of the candidate agility metrics were selected for evaluation with high fidelity, nonlinear, non real-time flight simulation computer programs of the F-5A Freedom Fighter, F-16A Fighting Falcon, F-18A Hornet, and X-29A. The information and data presented on the 20 candidate metrics which were evaluated will assist interested readers in conducting their own extensive investigations. The report provides a definition and analysis of each metric; details of how to test and measure the metric, including any special data reduction requirements; typical values for the metric obtained using one or more aircraft types; and a sensitivity analysis if applicable. The report is organized as follows. The first chapter in the report presents a historical review of air combat trends which demonstrate the need for agility metrics in assessing the combat performance of fighter aircraft in a modern, all-aspect missile environment. The second chapter presents a framework for classifying each candidate metric according to time scale (transient, functional, instantaneous), further subdivided by axis (pitch, lateral, axial). The report is then broadly divided into two parts, with the transient agility metrics (pitch, lateral, axial) covered in chapters three, four, and five, and the functional agility metrics covered in chapter six. Conclusions, recommendations, and an extensive reference list and bibliography are also included. Five appendices contain a comprehensive list of the definitions of all the candidate metrics; a description of the aircraft models and flight simulation programs used for testing the metrics; several relations and concepts which are fundamental to the study of lateral agility; an in-depth analysis of the axial agility metrics; and a derivation of the relations for the instantaneous agility and their approximations.
Evaluating Core Quality for a Mars Sample Return Mission
NASA Technical Reports Server (NTRS)
Weiss, D. K.; Budney, C.; Shiraishi, L.; Klein, K.
2012-01-01
Sample return missions, including the proposed Mars Sample Return (MSR) mission, propose to collect core samples from scientifically valuable sites on Mars. These core samples would undergo extreme forces during the drilling process, and during the reentry process if the EEV (Earth Entry Vehicle) performed a hard landing on Earth. Because of the foreseen damage to the stratigraphy of the cores, it is important to evaluate each core for rock quality. However, because no core sample return mission has yet been conducted to another planetary body, it remains unclear as to how to assess the cores for rock quality. In this report, we describe the development of a metric designed to quantitatively assess the mechanical quality of any rock cores returned from Mars (or other planetary bodies). We report on the process by which we tested the metric on core samples of Mars analogue materials, and the effectiveness of the core assessment metric (CAM) in assessing rock core quality before and after the cores were subjected to shocking (g forces representative of an EEV landing).
Adapting the ISO 20462 softcopy ruler method for online image quality studies
NASA Astrophysics Data System (ADS)
Burns, Peter D.; Phillips, Jonathan B.; Williams, Don
2013-01-01
In this paper we address the problem of Image Quality Assessment with no-reference metrics, focusing on JPEG-corrupted images. In general, no-reference metrics are not able to measure distortions with the same performance across their possible range and with respect to different image contents. The crosstalk between content and distortion signals influences human perception. We here propose two strategies to improve the correlation between subjective and objective quality data. The first strategy is based on grouping the images according to their spatial complexity. The second one is based on a frequency analysis. Both strategies are tested on two databases available in the literature. The results show an improvement in the correlations between no-reference metrics and psycho-visual data, evaluated in terms of the Pearson Correlation Coefficient.
Exploring s-CIELAB as a scanner metric for print uniformity
NASA Astrophysics Data System (ADS)
Hertel, Dirk W.
2005-01-01
The s-CIELAB color difference metric combines the standard CIELAB metric for perceived color difference with spatial contrast sensitivity filtering. When studying the performance of digital image processing algorithms, maps of spatial color difference between 'before' and 'after' images are a measure of perceived image difference. A general image quality metric can be obtained by modeling the perceived difference from an ideal image. This paper explores the s-CIELAB concept for evaluating the quality of digital prints. Prints present the challenge that the 'ideal print', which should serve as the reference when calculating the delta E* error map, is unknown and must therefore be estimated from the scanned print. A reasonable estimate of what the ideal print 'should have been' is possible at least for images of known content such as flat fields or continuous wedges, where the error map can be calculated against a global or local mean. While such maps showing the perceived error at each pixel are extremely useful when analyzing print defects, it is desirable to statistically reduce them to a more manageable dataset. Examples of digital print uniformity are given, and the effect of specific print defects on the s-CIELAB delta E* metric is discussed.
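As a rough illustration of the reference-free idea for known-content images, the sketch below computes a per-pixel CIELAB delta E*ab map of a simulated flat-field scan against its global mean and reduces it to summary statistics. Note that this omits the spatial contrast-sensitivity filtering that makes the metric s-CIELAB proper; the patch values and the streak defect are assumptions.

```python
# Hedged sketch: delta E map of a flat-field print scan against its global mean color.
import numpy as np

def delta_e_map_vs_mean(lab_image):
    """Per-pixel CIE delta E*ab against the image's global mean Lab color."""
    mean_lab = lab_image.reshape(-1, 3).mean(axis=0)
    return np.sqrt(((lab_image - mean_lab) ** 2).sum(axis=-1))

rng = np.random.default_rng(2)
# Illustrative "scan" of a nominally uniform gray patch with slight noise.
lab = np.stack([50.0 + rng.normal(0, 0.8, (64, 64)),
                rng.normal(0, 0.3, (64, 64)),
                rng.normal(0, 0.3, (64, 64))], axis=-1)
lab[:, 20:24, 0] += 2.0  # a simulated light streak defect in lightness

de = delta_e_map_vs_mean(lab)
print(f"mean dE = {de.mean():.2f}, 95th percentile dE = {np.percentile(de, 95):.2f}")
```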
An Innovative Metric to Evaluate Satellite Precipitation's Spatial Distribution
NASA Astrophysics Data System (ADS)
Liu, H.; Chu, W.; Gao, X.; Sorooshian, S.
2011-12-01
Thanks to their capability to cover mountains, where ground measurement instruments cannot reach, satellites provide a good means of estimating precipitation over mountainous regions. In regions with complex terrain, accurate information on the high-resolution spatial distribution of precipitation is critical for many important issues, such as flood/landslide warning, reservoir operation, water system planning, etc. Therefore, in order to be useful in many practical applications, satellite precipitation products should possess high quality in characterizing spatial distribution. However, most existing validation metrics, which are based on point/grid comparison using simple statistics, cannot effectively measure a satellite's skill in capturing the spatial patterns of precipitation fields. This deficiency results from the fact that point/grid-wise comparison does not take into account the spatial coherence of precipitation fields. Furthermore, another weakness of many metrics is that they can barely provide information on why satellite products perform well or poorly. Motivated by our recent findings of consistent spatial patterns of the precipitation field over the western U.S., we developed a new metric utilizing EOF analysis and Shannon entropy. The metric can be derived in two steps: 1) capture the dominant spatial patterns of precipitation fields from both satellite products and reference data through EOF analysis, and 2) compute the similarities between the corresponding dominant patterns using a mutual information measurement defined with Shannon entropy. Instead of individual points/grids, the new metric treats the entire precipitation field simultaneously, naturally taking advantage of spatial dependence. Since the dominant spatial patterns are shaped by physical processes, the new metric can shed light on why a satellite product can or cannot capture the spatial patterns. For demonstration, an experiment was carried out to evaluate a satellite precipitation product, CMORPH, against the U.S. daily precipitation analysis of the Climate Prediction Center (CPC) at a daily, 0.25° scale over the western U.S.
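A hedged sketch of the two-step construction described above: EOF patterns extracted by an SVD of the space-time anomaly matrix, then compared with a histogram-based Shannon mutual information. The synthetic reference and satellite fields, the number of modes, and the bin count are illustrative assumptions, not the authors' data or code.

```python
# Hedged sketch: EOF extraction via SVD plus histogram-based mutual information.
import numpy as np

def leading_eofs(field, n_modes=2):
    """field: (time, space) matrix; returns the first n_modes spatial EOF patterns."""
    anomalies = field - field.mean(axis=0)
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    return vt[:n_modes]

def mutual_information(x, y, bins=16):
    """Histogram-based Shannon mutual information (bits) between two patterns."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
t = np.arange(120)[:, None]
x = np.arange(400)[None, :]
# Synthetic "reference" precipitation anomalies with two coherent spatial modes.
reference = (np.sin(2 * np.pi * t / 12) * np.sin(2 * np.pi * x / 400)
             + 0.5 * np.cos(2 * np.pi * t / 12) * np.cos(2 * np.pi * x / 200)
             + rng.normal(0.0, 0.1, (120, 400)))
satellite = reference + rng.normal(0.0, 0.3, (120, 400))  # noisier satellite estimate

eof_ref = leading_eofs(reference)
eof_sat = leading_eofs(satellite)
for k in range(2):
    mi = mutual_information(eof_ref[k], eof_sat[k])
    print(f"mode {k + 1}: mutual information = {mi:.3f} bits")
```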
Improving Learner Handovers in Medical Education.
Warm, Eric J; Englander, Robert; Pereira, Anne; Barach, Paul
2017-07-01
Multiple studies have demonstrated that the information included in the Medical Student Performance Evaluation fails to reliably predict medical students' future performance. This faulty transfer of information can lead to harm when poorly prepared students fail out of residency or, worse, are shuttled through the medical education system without an honest accounting of their performance. Such poor learner handovers likely arise from two root causes: (1) the absence of agreed-on outcomes of training and/or accepted assessments of those outcomes, and (2) the lack of standardized ways to communicate the results of those assessments. To improve the current learner handover situation, an authentic, shared mental model of competency is needed; high-quality tools to assess that competency must be developed and tested; and transparent, reliable, and safe ways to communicate this information must be created.To achieve these goals, the authors propose using a learner handover process modeled after a patient handover process. The CLASS model includes a description of the learner's Competency attainment, a summary of the Learner's performance, an Action list and statement of Situational awareness, and Synthesis by the receiving program. This model also includes coaching oriented towards improvement along the continuum of education and care. Just as studies have evaluated patient handover models using metrics that matter most to patients, studies must evaluate this learner handover model using metrics that matter most to providers, patients, and learners.
Grading the Metrics: Performance-Based Funding in the Florida State University System
ERIC Educational Resources Information Center
Cornelius, Luke M.; Cavanaugh, Terence W.
2016-01-01
A policy analysis of Florida's 10-factor Performance-Based Funding system for state universities. The focus of the article is on the system of performance metrics developed by the state Board of Governors and their impact on institutions and their missions. The paper also discusses problems and issues with the metrics, their ongoing evolution, and…
NASA Astrophysics Data System (ADS)
Che-Aron, Z.; Abdalla, A. H.; Abdullah, K.; Hassan, W. H.
2013-12-01
In recent years, Cognitive Radio (CR) technology has attracted significant study and research. A Cognitive Radio Ad Hoc Network (CRAHN) is an emerging self-organized, multi-hop, wireless network which allows unlicensed users to opportunistically access available licensed spectrum bands for data communication in an intelligent and cautious manner. However, in CRAHNs, many failures can easily occur during data transmission, caused by PU (Primary User) activity, topology change, node fault, or link degradation. In this paper, an attempt has been made to evaluate the performance of the Multi-Radio Link-Quality Source Routing (MR-LQSR) protocol in CRAHNs under different path failure rates. In the MR-LQSR protocol, the Weighted Cumulative Expected Transmission Time (WCETT) is used as the routing metric. The simulations are carried out using the NS-2 simulator. The protocol performance is evaluated with respect to performance metrics like average throughput, packet loss, average end-to-end delay and average jitter. From the simulation results, it is observed that the number of path failures depends on the number of PUs and the mobility rate of SUs (Secondary Users). Moreover, the protocol performance is greatly affected when the path failure rate is high, leading to major service outages.
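For context, the WCETT metric used by MR-LQSR is conventionally defined as a weighted combination of the summed expected transmission times (ETTs) over all links of a path and the largest per-channel ETT total. The sketch below assumes that standard form; the link ETTs, channel assignments, and beta value are illustrative.

```python
# Hedged sketch of the WCETT path metric:
# WCETT = (1 - beta) * sum(ETT_i) + beta * max_j (sum of ETT on channel j).

def wcett(hops, beta=0.5):
    """hops: list of (ett_seconds, channel_id) for each link on the path."""
    total_ett = sum(ett for ett, _ in hops)
    per_channel = {}
    for ett, channel in hops:
        per_channel[channel] = per_channel.get(channel, 0.0) + ett
    bottleneck = max(per_channel.values())  # most-loaded channel along the path
    return (1.0 - beta) * total_ett + beta * bottleneck

# Illustrative 3-hop path: two links share channel 1, one link is on channel 6.
path = [(0.004, 1), (0.006, 1), (0.005, 6)]
print(f"WCETT = {wcett(path):.4f} s")
```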
Imaging performance of an isotropic negative dielectric constant slab.
Shivanand; Liu, Huikan; Webb, Kevin J
2008-11-01
The influence of material and thickness on the subwavelength imaging performance of a negative dielectric constant slab is studied. Resonance in the plane-wave transfer function produces a high spatial frequency ripple that could be useful in fabricating periodic structures. A cost function based on the plane-wave transfer function provides a useful metric to evaluate the planar slab lens performance, and using this, the optimal slab dielectric constant can be determined.
Closed-loop, pilot/vehicle analysis of the approach and landing task
NASA Technical Reports Server (NTRS)
Anderson, M. R.; Schmidt, D. K.
1986-01-01
In the case of approach and landing, it is universally accepted that the pilot uses more than one vehicle response, or output, to close his control loops. Therefore, to model this task, a multi-loop analysis technique is required. The analysis problem has been in obtaining reasonable analytic estimates of the describing functions representing the pilot's loop compensation. Once these pilot describing functions are obtained, appropriate performance and workload metrics must then be developed for the landing task. The optimal control approach provides a powerful technique for obtaining the necessary describing functions, once the appropriate task objective is defined in terms of a quadratic objective function. An approach is presented through the use of a simple, reasonable objective function and model-based metrics to evaluate loop performance and pilot workload. The results of an analysis of the LAHOS (Landing and Approach of Higher Order Systems) study performed by R.E. Smith is also presented.
Johnson, S J; Hunt, C M; Woolnough, H M; Crawshaw, M; Kilkenny, C; Gould, D A; England, A; Sinha, A; Villard, P F
2012-05-01
The aim of this article was to identify and prospectively investigate simulated ultrasound-guided targeted liver biopsy performance metrics as differentiators between levels of expertise in interventional radiology. Task analysis produced detailed procedural step documentation allowing identification of critical procedure steps and performance metrics for use in a virtual reality ultrasound-guided targeted liver biopsy procedure. Consultant (n=14; male=11, female=3) and trainee (n=26; male=19, female=7) scores on the performance metrics were compared. Ethical approval was granted by the Liverpool Research Ethics Committee (UK). Independent t-tests and analysis of variance (ANOVA) investigated differences between groups. Independent t-tests revealed significant differences between trainees and consultants on three performance metrics: targeting, p=0.018, t=-2.487 (-2.040 to -0.207); probe usage time, p = 0.040, t=2.132 (11.064 to 427.983); mean needle length in beam, p=0.029, t=-2.272 (-0.028 to -0.002). ANOVA reported significant differences across years of experience (0-1, 1-2, 3+ years) on seven performance metrics: no-go area touched, p=0.012; targeting, p=0.025; length of session, p=0.024; probe usage time, p=0.025; total needle distance moved, p=0.038; number of skin contacts, p<0.001; total time in no-go area, p=0.008. More experienced participants consistently received better performance scores on all 19 performance metrics. It is possible to measure and monitor performance using simulation, with performance metrics providing feedback on skill level and differentiating levels of expertise. However, a transfer of training study is required.
Quantifying Contextual Interference and Its Effect on Skill Transfer in Skilled Youth Tennis Players
Buszard, Tim; Reid, Machar; Krause, Lyndon; Kovalchik, Stephanie; Farrow, Damian
2017-01-01
The contextual interference effect is a well-established motor learning phenomenon. Most of the contextual interference effect literature has addressed simple skills, while less is known about the role of contextual interference in complex sport skill practice, particularly with respect to skilled performers. The purpose of this study was to assess contextual interference when practicing the tennis serve. Study 1 evaluated tennis serve practice of nine skilled youth tennis players using a novel statistical metric developed specifically to measure between-skill and within-skill variability as sources of contextual interference. This metric highlighted that skilled tennis players typically engaged in serve practice that featured low contextual interference. In Study 2, 16 skilled youth tennis players participated in 10 practice sessions that aimed to improve serving “down the T.” Participants were stratified into a low contextual interference practice group (Low CI) and a moderate contextual interference practice group (Moderate CI). Pre- and post-tests were conducted 1 week before and 1 week after the practice period. Testing involved a skill test, which assessed serving performance in a closed setting, and a transfer test, which assessed serving performance in a match-play setting. No significant contextual interference differences were observed with respect to practice performance. However, analysis of pre- and post-test serve performance revealed significant Group × Time interactions. The Moderate CI group showed no change in serving performance (service displacement from the T) from pre- to post-test in the skill test, but did display improvements in the transfer test. Conversely, the Low CI group improved serving performance (service displacement from the T) in the skill test but not the transfer test. Results suggest that the typical contextual interference effect is less clear when practicing a complex motor skill, at least with the tennis serve skill evaluated here. We encourage researchers and applied sport scientists to use our statistical metric to measure contextual interference. PMID:29163306
Metrics and the effective computational scientist: process, quality and communication.
Baldwin, Eric T
2012-09-01
Recent treatments of computational knowledge worker productivity have focused upon the value the discipline brings to drug discovery using positive anecdotes. While this big picture approach provides important validation of the contributions of these knowledge workers, the impact accounts do not provide the granular detail that can help individuals and teams perform better. I suggest balancing the impact-focus with quantitative measures that can inform the development of scientists. Measuring the quality of work, analyzing and improving processes, and the critical evaluation of communication can provide immediate performance feedback. The introduction of quantitative measures can complement the longer term reporting of impacts on drug discovery. These metric data can document effectiveness trends and can provide a stronger foundation for the impact dialogue. Copyright © 2012 Elsevier Ltd. All rights reserved.
Advanced Life Support System Value Metric
NASA Technical Reports Server (NTRS)
Jones, Harry W.; Rasky, Daniel J. (Technical Monitor)
1999-01-01
The NASA Advanced Life Support (ALS) Program is required to provide a performance metric to measure its progress in system development. Extensive discussions within the ALS program have led to the following approach. The Equivalent System Mass (ESM) metric has been traditionally used and provides a good summary of the weight, size, and power cost factors of space life support equipment. But ESM assumes that all the systems being traded off exactly meet a fixed performance requirement, so that the value and benefit (readiness, performance, safety, etc.) of all the different systems designs are considered to be exactly equal. This is too simplistic. Actual system design concepts are selected using many cost and benefit factors and the system specification is defined after many trade-offs. The ALS program needs a multi-parameter metric including both the ESM and a System Value Metric (SVM). The SVM would include safety, maintainability, reliability, performance, use of cross cutting technology, and commercialization potential. Another major factor in system selection is technology readiness level (TRL), a familiar metric in ALS. The overall ALS system metric that is suggested is a benefit/cost ratio, SVM/[ESM + function (TRL)], with appropriate weighting and scaling. The total value is given by SVM. Cost is represented by higher ESM and lower TRL. The paper provides a detailed description and example application of a suggested System Value Metric and an overall ALS system metric.
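A minimal sketch of the suggested benefit/cost ratio, SVM/[ESM + f(TRL)], is given below. The sub-factor weights, the 0-10 scoring scale, and the form of the TRL penalty are illustrative assumptions, not values from the ALS program.

```python
# Hedged sketch of the suggested ALS benefit/cost ratio, SVM / (ESM + f(TRL)).
# Weights, scores, ESM value, and the TRL penalty function are assumptions.

def system_value_metric(scores, weights):
    """Weighted sum of benefit sub-factor scores (safety, reliability, etc.)."""
    return sum(weights[k] * scores[k] for k in weights)

def trl_penalty(trl, scale=100.0):
    """Assumed cost penalty that shrinks as technology readiness level rises."""
    return scale * (9 - trl) / 9.0   # TRL 9 (flight proven) adds no penalty

scores = {"safety": 8, "maintainability": 6, "reliability": 7,
          "performance": 9, "crosscutting": 5, "commercial": 4}
weights = {"safety": 0.3, "maintainability": 0.15, "reliability": 0.2,
           "performance": 0.2, "crosscutting": 0.1, "commercial": 0.05}

svm = system_value_metric(scores, weights)
esm = 1200.0          # assumed equivalent system mass (kg-equivalent)
trl = 5
ratio = svm / (esm + trl_penalty(trl))
print(f"SVM={svm:.2f}, benefit/cost ratio={ratio:.5f}")
```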
NASA Technical Reports Server (NTRS)
Heath, Bruce E.; Khan, M. Javed; Rossi, Marcia; Ali, Syed Firasat
2005-01-01
The rising cost of flight training and the low cost of powerful computers have resulted in increasing use of PC-based flight simulators. This has prompted FAA standards regulating such use and allowing aspects of training on simulators meeting these standards to be substituted for flight time. However, the FAA regulations require an authorized flight instructor as part of the training environment. Thus, while costs associated with flight time have been reduced, the cost associated with the need for a flight instructor still remains. The obvious area of research, therefore, has been to develop intelligent simulators. However, the two main challenges of such attempts have been training strategies and assessment. The research reported in this paper was conducted to evaluate various performance metrics of a straight-in landing approach by 33 novice pilots flying a light single engine aircraft simulation. These metrics were compared to assessments of these flights by two flight instructors to establish a correlation between the two techniques in an attempt to determine a composite performance metric for this flight maneuver.
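The sketch below illustrates one way such a composite metric could be derived: correlate individual objective approach metrics with instructor ratings, then weight the standardized metrics by those correlations. The metric names, data, and weighting scheme are assumptions for illustration, not the study's method.

```python
# Illustrative sketch (assumed data and metric names): correlate objective
# approach metrics with instructor ratings to weight a composite score.
import numpy as np

rng = np.random.default_rng(0)
n_pilots = 33
# Hypothetical per-pilot metrics: RMS glideslope deviation (deg), RMS localizer
# deviation (deg), and touchdown airspeed error (kt); lower is better.
metrics = rng.normal(loc=[0.4, 0.5, 5.0], scale=[0.15, 0.2, 2.0], size=(n_pilots, 3))
# Hypothetical instructor rating on a 1-10 scale (higher is better).
instructor = 10 - 4 * metrics[:, 0] - 3 * metrics[:, 1] - 0.3 * metrics[:, 2] \
             + rng.normal(0, 0.5, n_pilots)

# Correlation of each metric with the instructor assessment.
for name, col in zip(["glideslope", "localizer", "speed_error"], metrics.T):
    r = np.corrcoef(col, instructor)[0, 1]
    print(f"{name}: r = {r:.2f}")

# One possible composite: weight standardized metrics by |r| and negate so that
# a higher composite corresponds to better performance.
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
weights = np.abs([np.corrcoef(c, instructor)[0, 1] for c in metrics.T])
composite = -(z @ (weights / weights.sum()))
print("composite vs instructor r =", np.corrcoef(composite, instructor)[0, 1])
```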
Kireeva, Natalia V; Ovchinnikova, Svetlana I; Kuznetsov, Sergey L; Kazennov, Andrey M; Tsivadze, Aslan Yu
2014-02-01
This study concerns the large margin nearest neighbors classifier and its multi-metric extension as efficient approaches for metric learning, which aims to learn an appropriate distance/similarity function for the case studies considered. In recent years, many studies in data mining and pattern recognition have demonstrated that a learned metric can significantly improve performance in classification, clustering, and retrieval tasks. The paper describes the application of the metric learning approach to in silico assessment of chemical liabilities. Chemical liabilities, such as adverse effects and toxicity, play a significant role in the drug discovery process; their in silico assessment is an important step aimed at reducing costs and animal testing by complementing or replacing in vitro and in vivo experiments. Here, to our knowledge for the first time, distance-based metric learning procedures have been applied to the in silico assessment of chemical liabilities: the impact of metric learning on structure-activity landscapes and on the predictive performance of the developed models has been analyzed, and the learned metric was used in support vector machines. The metric learning results are illustrated using linear and non-linear data visualization techniques to indicate how the change of metric affected nearest-neighbor relations and the descriptor space.
R&D100: Lightweight Distributed Metric Service
Gentile, Ann; Brandt, Jim; Tucker, Tom; Showerman, Mike
2018-06-12
On today's High Performance Computing platforms, the complexity of applications and configurations makes efficient use of resources difficult. The Lightweight Distributed Metric Service (LDMS) is monitoring software developed by Sandia National Laboratories to provide detailed metrics of system performance. LDMS provides collection, transport, and storage of data from extreme-scale systems at fidelities and timescales to provide understanding of application and system performance with no statistically significant impact on application performance.
R&D100: Lightweight Distributed Metric Service
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gentile, Ann; Brandt, Jim; Tucker, Tom
2015-11-19
On today's High Performance Computing platforms, the complexity of applications and configurations makes efficient use of resources difficult. The Lightweight Distributed Metric Service (LDMS) is monitoring software developed by Sandia National Laboratories to provide detailed metrics of system performance. LDMS provides collection, transport, and storage of data from extreme-scale systems at fidelities and timescales to provide understanding of application and system performance with no statistically significant impact on application performance.
Favazza, Christopher P.; Fetterly, Kenneth A.; Hangiandreou, Nicholas J.; Leng, Shuai; Schueler, Beth A.
2015-01-01
Evaluation of flat-panel angiography equipment through conventional image quality metrics is limited by the scope of standard spatial-domain image quality metrics, such as contrast-to-noise ratio and spatial resolution, or by restricted access to appropriate data to calculate Fourier domain measurements, such as modulation transfer function, noise power spectrum, and detective quantum efficiency. Observer models have been shown capable of overcoming these limitations and are able to comprehensively evaluate medical-imaging systems. We present a spatial domain-based channelized Hotelling observer model to calculate the detectability index (DI) of disks of different sizes and compare the performance of different imaging conditions and angiography systems. When appropriate, changes in DIs were compared to expectations based on the classical Rose model of signal detection to assess linearity of the model with quantum signal-to-noise ratio (SNR) theory. For these experiments, the estimated uncertainty of the DIs was less than 3%, allowing for precise comparison of imaging systems or conditions. For most experimental variables, DI changes were linear with expectations based on quantum SNR theory. DIs calculated for the smallest objects demonstrated nonlinearity with quantum SNR theory due to system blur. Two angiography systems with different detector element sizes were shown to perform similarly across the majority of the detection tasks. PMID:26158086
A Safety Index and Method for Flightdeck Evaluation
NASA Technical Reports Server (NTRS)
Latorella, Kara A.
2000-01-01
If our goal is to improve safety through machine, interface, and training design, then we must define a metric of flightdeck safety that is usable in the design process. Current measures associated with our notions of "good" pilot performance and ultimate safety of flightdeck performance fail to provide an adequate index of safe flightdeck performance for design evaluation purposes. The goal of this research effort is to devise a safety index and method that allows us to evaluate flightdeck performance holistically and in a naturalistic experiment. This paper uses Reason's model of accident causation (1990) as a basis for measuring safety, and proposes a relational database system and method for 1) defining a safety index of flightdeck performance, and 2) evaluating the "safety" afforded by flightdeck performance for the purpose of design iteration. Methodological considerations, limitations, and benefits are discussed as well as extensions to this work.
NASA Astrophysics Data System (ADS)
Khobragade, P.; Fan, Jiahua; Rupcich, Franco; Crotty, Dominic J.; Gilat Schmidt, Taly
2016-03-01
This study quantitatively evaluated the performance of the exponential transformation of the free-response operating characteristic curve (EFROC) metric, with the Channelized Hotelling Observer (CHO) as a reference. The CHO has been used for image quality assessment of reconstruction algorithms and imaging systems, and it is often applied to signal-location-known cases. The CHO also requires a large set of images to estimate the covariance matrix. In terms of clinical applications, this assumption and requirement may be unrealistic. The newly developed location-unknown EFROC detectability metric is estimated from the confidence scores reported by a model observer. Unlike the CHO, EFROC does not require a channelization step and is a non-parametric detectability metric. There are few quantitative studies available on application of the EFROC metric, most of which are based on simulation data. This study investigated the EFROC metric using experimental CT data. A phantom with four low-contrast objects (3 mm at 14 HU, 5 mm at 7 HU, 7 mm at 5 HU, and 10 mm at 3 HU) was scanned at dose levels ranging from 25 mAs to 270 mAs and reconstructed using filtered backprojection. The area under the curve values for the CHO (AUC) and EFROC (AFE) were plotted with respect to different dose levels. The number of images required to estimate the non-parametric AFE metric was calculated for varying tasks and found to be less than the number of images required for parametric CHO estimation. The AFE metric was found to be more sensitive to changes in dose than the CHO metric. This increased sensitivity and the assumption of unknown signal location may be useful for investigating and optimizing CT imaging methods. Future work is required to validate the AFE metric against human observers.
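For orientation, the sketch below computes a Hotelling detectability index from channel outputs on synthetic data; the toy channels, sample sizes, and signal contrast are assumptions and do not reproduce the study's phantom or channel set.

```python
# Minimal channelized-observer sketch on synthetic data (toy channels and noise
# levels are assumptions; not the study's phantom or channel set).
import numpy as np

rng = np.random.default_rng(1)
n_img, n_ch = 200, 10

# Assume images have already been reduced to channel outputs: signal-present
# and signal-absent channel vectors with a small mean shift.
signal_shift = np.linspace(0.4, 0.0, n_ch)          # contrast mostly in low channels
v_absent = rng.normal(0.0, 1.0, size=(n_img, n_ch))
v_present = rng.normal(0.0, 1.0, size=(n_img, n_ch)) + signal_shift

def cho_detectability(v_s, v_n):
    """Hotelling detectability index d' from channel outputs."""
    dmean = v_s.mean(axis=0) - v_n.mean(axis=0)
    s = 0.5 * (np.cov(v_s, rowvar=False) + np.cov(v_n, rowvar=False))
    w = np.linalg.solve(s, dmean)                    # Hotelling template
    return float(np.sqrt(dmean @ w))

print("d' =", cho_detectability(v_present, v_absent))
```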
NASA Aviation Safety Program Systems Analysis/Program Assessment Metrics Review
NASA Technical Reports Server (NTRS)
Louis, Garrick E.; Anderson, Katherine; Ahmad, Tisan; Bouabid, Ali; Siriwardana, Maya; Guilbaud, Patrick
2003-01-01
The goal of this project is to evaluate the metrics and processes used by NASA's Aviation Safety Program in assessing technologies that contribute to NASA's aviation safety goals. There were three objectives for reaching this goal. First, NASA's main objectives for aviation safety were documented and their consistency was checked against the main objectives of the Aviation Safety Program. Next, the metrics used for technology investment by the Program Assessment function of AvSP were evaluated. Finally, other metrics that could be used by the Program Assessment Team (PAT) were identified and evaluated. This investigation revealed that the objectives are in fact consistent across organizational levels at NASA and with the FAA. Some of the major issues discussed in this study, which should be further investigated, are the removal of the Cost and Return-on-Investment metrics, the lack of metrics to measure the balance between investment and technology, the interdependencies between some of the metric risk driver categories, and the conflict between 'fatal accident rate' and 'accident rate' in the language of the Aviation Safety goal as stated in different sources.
Defining quality metrics and improving safety and outcome in allergy care.
Lee, Stella; Stachler, Robert J; Ferguson, Berrylin J
2014-04-01
The delivery of allergy immunotherapy in the otolaryngology office is variable and lacks standardization. Quality metrics encompass the measurement of factors associated with good patient-centered care. These factors have yet to be defined in the delivery of allergy immunotherapy. We developed and applied quality metrics to 6 allergy practices affiliated with an academic otolaryngic allergy center. This work was conducted at a tertiary academic center providing care to over 1500 patients. We evaluated methods and variability between 6 sites. Tracking of errors and anaphylaxis was initiated across all sites. A nationwide survey of academic and private allergists was used to collect data on current practice and use of quality metrics. The most common types of errors recorded were patient identification errors (n = 4), followed by vial mixing errors (n = 3), and dosing errors (n = 2). There were 7 episodes of anaphylaxis, of which 2 were secondary to dosing errors, for a rate of 0.01%, or 1 in every 10,000 injection visits per year. Site visits showed that 86% of key safety measures were followed. Analysis of nationwide survey responses revealed that quality metrics are still not well defined by either medical or otolaryngic allergy practices. Academic practices were statistically more likely to use quality metrics (p = 0.021) and perform systems reviews and audits in comparison to private practices (p = 0.005). Quality metrics in allergy delivery can help improve safety and quality care. These metrics need to be further defined by otolaryngic allergists in the changing health care environment. © 2014 ARS-AAOA, LLC.
Coverage Metrics for Requirements-Based Testing: Evaluation of Effectiveness
NASA Technical Reports Server (NTRS)
Staats, Matt; Whalen, Michael W.; Heimdahl, Mats P. E.; Rajan, Ajitha
2010-01-01
In black-box testing, the tester creates a set of tests to exercise a system under test without regard to the internal structure of the system. Generally, no objective metric is used to measure the adequacy of black-box tests. In recent work, we have proposed three requirements coverage metrics, allowing testers to objectively measure the adequacy of a black-box test suite with respect to a set of requirements formalized as Linear Temporal Logic (LTL) properties. In this report, we evaluate the effectiveness of these coverage metrics with respect to fault finding. Specifically, we conduct an empirical study to investigate two questions: (1) do test suites satisfying a requirements coverage metric provide better fault finding than randomly generated test suites of approximately the same size?, and (2) do test suites satisfying a more rigorous requirements coverage metric provide better fault finding than test suites satisfying a less rigorous requirements coverage metric? Our results indicate (1) only one coverage metric proposed -- Unique First Cause (UFC) coverage -- is sufficiently rigorous to ensure test suites satisfying the metric outperform randomly generated test suites of similar size and (2) that test suites satisfying more rigorous coverage metrics provide better fault finding than test suites satisfying less rigorous coverage metrics.
Evaluation of Two Crew Module Boilerplate Tests Using Newly Developed Calibration Metrics
NASA Technical Reports Server (NTRS)
Horta, Lucas G.; Reaves, Mercedes C.
2012-01-01
The paper discusses an application of multi-dimensional calibration metrics to evaluate pressure data from water drop tests of the Max Launch Abort System (MLAS) crew module boilerplate. Specifically, three metrics are discussed: 1) a metric to assess the probability of enveloping the measured data with the model, 2) a multi-dimensional orthogonality metric to assess model adequacy between test and analysis, and 3) a prediction error metric to conduct sensor placement to minimize pressure prediction errors. Data from similar (nearly repeated) capsule drop tests show significant variability in the measured pressure responses. When compared to the expected variability from model predictions, it is demonstrated that the measured variability cannot be explained by the model under the current uncertainty assumptions.
Advanced Life Support System Value Metric
NASA Technical Reports Server (NTRS)
Jones, Harry W.; Arnold, James O. (Technical Monitor)
1999-01-01
The NASA Advanced Life Support (ALS) Program is required to provide a performance metric to measure its progress in system development. Extensive discussions within the ALS program have reached a consensus. The Equivalent System Mass (ESM) metric has been traditionally used and provides a good summary of the weight, size, and power cost factors of space life support equipment. But ESM assumes that all the systems being traded off exactly meet a fixed performance requirement, so that the value and benefit (readiness, performance, safety, etc.) of all the different systems designs are exactly equal. This is too simplistic. Actual system design concepts are selected using many cost and benefit factors and the system specification is then set accordingly. The ALS program needs a multi-parameter metric including both the ESM and a System Value Metric (SVM). The SVM would include safety, maintainability, reliability, performance, use of cross cutting technology, and commercialization potential. Another major factor in system selection is technology readiness level (TRL), a familiar metric in ALS. The overall ALS system metric that is suggested is a benefit/cost ratio, [SVM + TRL]/ESM, with appropriate weighting and scaling. The total value is the sum of SVM and TRL. Cost is represented by ESM. The paper provides a detailed description and example application of the suggested System Value Metric.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ma, T; Kumaraswamy, L
Purpose: Detection of treatment delivery errors is important in radiation therapy. However, accurate quantification of delivery errors is also of great importance. This study aims to evaluate the 3DVH software’s ability to accurately quantify delivery errors. Methods: Three VMAT plans (prostate, H&N and brain) were randomly chosen for this study. First, we evaluated whether delivery errors could be detected by gamma evaluation. Conventional per-beam IMRT QA was performed with the ArcCHECK diode detector for the original plans and for the following modified plans: (1) induced dose difference errors up to ±4.0%, (2) control point (CP) deletion (3 to 10 CPs were deleted), and (3) gantry angle shift error (3-degree uniform shift). 2D and 3D gamma evaluations were performed for all plans through SNC Patient and 3DVH, respectively. Subsequently, we investigated the accuracy of 3DVH analysis for all cases. This part evaluated, using the Eclipse TPS plans as standard, whether 3DVH can accurately model the changes in clinically relevant metrics caused by the delivery errors. Results: 2D evaluation seemed to be more sensitive to delivery errors. The average differences between Eclipse-predicted and 3DVH results for each pair of specific DVH constraints were within 2% for all three types of error-induced treatment plans, illustrating the fact that 3DVH is fairly accurate in quantifying the delivery errors. Another interesting observation was that even though the gamma pass rates for the error plans are high, the DVHs showed significant differences between the original plan and error-induced plans in both the Eclipse and 3DVH analyses. Conclusion: The 3DVH software is shown to accurately quantify the error in delivered dose based on clinically relevant DVH metrics, errors that a conventional gamma-based pre-treatment QA might not necessarily detect.
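As background on the gamma evaluation mentioned above, the sketch below computes a simple 1-D global gamma index; the dose profiles, 3%/3 mm criteria, and grid spacing are assumptions for illustration, not the study's QA configuration.

```python
# Hedged 1-D illustration of the gamma index used in plan-vs-measurement QA
# (dose values, 3%/3 mm criteria, and grid spacing below are assumptions).
import numpy as np

def gamma_1d(ref_dose, eval_dose, positions, dose_tol=0.03, dist_tol=3.0):
    """Global gamma: for each evaluated point, minimize the combined dose and
    distance disagreement against every reference point; pass if gamma <= 1."""
    norm = dose_tol * ref_dose.max()                 # global dose normalization
    gammas = np.empty_like(eval_dose)
    for i, (x_e, d_e) in enumerate(zip(positions, eval_dose)):
        dd = (d_e - ref_dose) / norm                 # dose difference term
        dx = (x_e - positions) / dist_tol            # distance-to-agreement term
        gammas[i] = np.sqrt(dd**2 + dx**2).min()
    return gammas

x = np.arange(0, 50, 1.0)                            # positions in mm
reference = 100 * np.exp(-((x - 25) / 10) ** 2)      # reference dose profile
measured = 1.02 * 100 * np.exp(-((x - 25.5) / 10) ** 2)  # shifted/scaled delivery
g = gamma_1d(reference, measured, x)
print("pass rate = %.1f%%" % (100 * (g <= 1).mean()))
```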
A priori discretization error metrics for distributed hydrologic modeling applications
NASA Astrophysics Data System (ADS)
Liu, Hongli; Tolson, Bryan A.; Craig, James R.; Shafii, Mahyar
2016-12-01
Watershed spatial discretization is an important step in developing a distributed hydrologic model. A key difficulty in the spatial discretization process is maintaining a balance between the aggregation-induced information loss and the increase in computational burden caused by the inclusion of additional computational units. Objective identification of an appropriate discretization scheme still remains a challenge, in part because of the lack of quantitative measures for assessing discretization quality, particularly prior to simulation. This study proposes a priori discretization error metrics to quantify the information loss of any candidate discretization scheme without having to run and calibrate a hydrologic model. These error metrics are applicable to multi-variable and multi-site discretization evaluation and provide directly interpretable information to the hydrologic modeler about discretization quality. The first metric, a subbasin error metric, quantifies the routing information loss from discretization, and the second, a hydrological response unit (HRU) error metric, improves upon existing a priori metrics by quantifying the information loss due to changes in land cover or soil type property aggregation. The metrics are straightforward to understand and easy to recode. Informed by the error metrics, a two-step discretization decision-making approach is proposed with the advantage of reducing extreme errors and meeting the user-specified discretization error targets. The metrics and decision-making approach are applied to the discretization of the Grand River watershed in Ontario, Canada. Results show that information loss increases as discretization gets coarser. Moreover, results help to explain the modeling difficulties associated with smaller upstream subbasins since the worst discretization errors and highest error variability appear in smaller upstream areas instead of larger downstream drainage areas. Hydrologic modeling experiments under candidate discretization schemes validate the strong correlation between the proposed discretization error metrics and hydrologic simulation responses. Discretization decision-making results show that the common and convenient approach of making uniform discretization decisions across the watershed performs worse than the proposed non-uniform discretization approach in terms of preserving spatial heterogeneity under the same computational cost.
[Applicability of traditional landscape metrics in evaluating urban heat island effect].
Chen, Ai-Lian; Sun, Ran-Hao; Chen, Li-Ding
2012-08-01
By using 24 landscape metrics, this paper evaluated the urban heat island effect in parts of the Beijing downtown area. QuickBird (QB) images were used to extract the landscape type information, and the thermal bands from Landsat Enhanced Thematic Mapper Plus (ETM+) images were used to extract the land surface temperature (LST) in four seasons of the same year. The 24 landscape pattern metrics were calculated at landscape and class levels in a fixed 120 m × 120 m window, and the applicability of these traditional landscape metrics for evaluating the urban heat island effect was examined. Among the 24 landscape metrics, only the percentage composition of landscape (PLAND), patch density (PD), largest patch index (LPI), coefficient of Euclidean nearest-neighbor distance variance (ENN_CV), and landscape division index (DIVISION) at landscape level were significantly correlated with the LST in March, May, and November, and the PLAND, LPI, DIVISION, percentage of like adjacencies, and interspersion and juxtaposition index at class level showed significant correlations with the LST in March, May, July, and December, especially in July. Some metrics such as PD, edge density, clumpiness index, patch cohesion index, effective mesh size, splitting index, aggregation index, and normalized landscape shape index showed varying correlations with the LST at different class levels. The traditional landscape metrics may not be appropriate for evaluating the effects of rivers on LST, while some of the metrics can be useful for characterizing urban LST and analyzing the urban heat island effect, although the metrics should first be screened and examined.
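The sketch below illustrates the kind of window-based analysis described: compute PLAND for one land cover class in fixed windows and correlate it with mean LST. The rasters, class definition, and assumed cover-temperature relationship are synthetic placeholders, not the study's data.

```python
# Illustrative sketch (synthetic rasters): compute PLAND of one class in fixed
# windows and correlate it with mean land surface temperature per window.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
size, win = 120, 12                                  # raster size and window size
landcover = (rng.random((size, size)) < 0.35).astype(int)   # 1 = impervious (assumed)
# Assumed relationship: more impervious cover -> warmer surface, plus noise.
lst = 20 + 8 * landcover + rng.normal(0, 1.5, (size, size))

pland, mean_lst = [], []
for i in range(0, size, win):
    for j in range(0, size, win):
        block_lc = landcover[i:i+win, j:j+win]
        block_t = lst[i:i+win, j:j+win]
        pland.append(100 * block_lc.mean())          # % of window in the class
        mean_lst.append(block_t.mean())

r, p = pearsonr(pland, mean_lst)
print(f"PLAND vs LST: r = {r:.2f}, p = {p:.3g}")
```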
A Survey and Analysis of Aircraft Maintenance Metrics: A Balanced Scorecard Approach
2014-03-27
Metrics Set Theory/Framework; Balanced Scorecard overview. Figure 1: Metric Evaluation Criteria (Caplice & Sheffi, 1994, p. 14). The researcher included an examination of established theory and frameworks on how metrics sets are constructed in the literature review. The purpose of this examination was to…
The Hidden Costs of Community Colleges
ERIC Educational Resources Information Center
Schneider, Mark.; Yin, Lu
2011-01-01
Community colleges are an essential component of America's higher education system. Last year, they enrolled well over 6 million students, a number that continues to grow. Community colleges have multiple missions, and their performance ultimately needs to be evaluated on multiple metrics. However, one key mission of community colleges is the…
Assessing Process Mass Intensity and Waste via an "aza"-Baylis-Hillman Reaction
ERIC Educational Resources Information Center
Go´mez-Biagi, Rodolfo F.; Dicks, Andrew P.
2015-01-01
A synthetic procedure is outlined where upper-level undergraduate organic chemistry students perform a two-week, semimicroscale "aza"-Baylis-Hillman reaction to generate an allylic sulfonamide product. Students evaluate several green chemistry reaction metrics of industrial importance (process mass intensity (PMI), E factor, and reaction…
Beaulieu-Jones, Brendin R; Rossy, William H; Sanchez, George; Whalen, James M; Lavery, Kyle P; McHale, Kevin J; Vopat, Bryan G; Van Allen, Joseph J; Akamefula, Ramesses A; Provencher, Matthew T
2017-07-01
At the annual National Football League (NFL) Scouting Combine, the medical staff of each NFL franchise performs a comprehensive medical evaluation of all athletes potentially entering the NFL. Currently, little is known regarding the overall epidemiology of injuries identified at the combine and their impact on NFL performance. To determine the epidemiology of injuries identified at the combine and their impact on initial NFL performance. Cohort study; Level of evidence, 3. All previous musculoskeletal injuries identified at the NFL Combine from 2009 to 2015 were retrospectively reviewed. Medical records and imaging reports were examined. Game statistics for the first 2 seasons of NFL play were obtained for all players from 2009 to 2013. Analysis of injury prevalence and overall impact on the draft status and position-specific performance metrics of each injury was performed and compared with a position-matched control group with no history of injury or surgery. A total of 2203 athletes over 7 years were evaluated, including 1490 (67.6%) drafted athletes and 1040 (47.2%) who ultimately played at least 2 years in the NFL. The most common sites of injury were the ankle (1160, 52.7%), shoulder (1143, 51.9%), knee (1128, 51.2%), spine (785, 35.6%), and hand (739, 33.5%). Odds ratios (ORs) demonstrated that quarterbacks were most at risk of shoulder injury (OR, 2.78; P = .001), while running backs most commonly sustained ankle (OR, 1.39; P = .040) and shoulder injuries (OR, 1.55; P = .020) when compared with all other players. Ultimately, defensive players demonstrated a greater negative impact due to injury than offensive players, with multiple performance metrics significantly affected for each defensive position analyzed, whereas skilled offensive players (eg, quarterbacks, running backs) demonstrated only 1 metric significantly affected at each position. The most common sites of injury identified at the combine were (1) ankle, (2) shoulder, (3) knee, (4) spine, and (5) hand. Overall, performance in the NFL tended to worsen with injury history, with a direct correlation found between injury at a certain anatomic location and position of play. Defensive players tended to perform worse compared with offensive players if injury history was present.
Beaulieu-Jones, Brendin R.; Rossy, William H.; Sanchez, George; Whalen, James M.; Lavery, Kyle P.; McHale, Kevin J.; Vopat, Bryan G.; Van Allen, Joseph J.; Akamefula, Ramesses A.; Provencher, Matthew T.
2017-01-01
Background: At the annual National Football League (NFL) Scouting Combine, the medical staff of each NFL franchise performs a comprehensive medical evaluation of all athletes potentially entering the NFL. Currently, little is known regarding the overall epidemiology of injuries identified at the combine and their impact on NFL performance. Purpose: To determine the epidemiology of injuries identified at the combine and their impact on initial NFL performance. Study Design: Cohort study; Level of evidence, 3. Methods: All previous musculoskeletal injuries identified at the NFL Combine from 2009 to 2015 were retrospectively reviewed. Medical records and imaging reports were examined. Game statistics for the first 2 seasons of NFL play were obtained for all players from 2009 to 2013. Analysis of injury prevalence and overall impact on the draft status and position-specific performance metrics of each injury was performed and compared with a position-matched control group with no history of injury or surgery. Results: A total of 2203 athletes over 7 years were evaluated, including 1490 (67.6%) drafted athletes and 1040 (47.2%) who ultimately played at least 2 years in the NFL. The most common sites of injury were the ankle (1160, 52.7%), shoulder (1143, 51.9%), knee (1128, 51.2%), spine (785, 35.6%), and hand (739, 33.5%). Odds ratios (ORs) demonstrated that quarterbacks were most at risk of shoulder injury (OR, 2.78; P = .001), while running backs most commonly sustained ankle (OR, 1.39; P = .040) and shoulder injuries (OR, 1.55; P = .020) when compared with all other players. Ultimately, defensive players demonstrated a greater negative impact due to injury than offensive players, with multiple performance metrics significantly affected for each defensive position analyzed, whereas skilled offensive players (eg, quarterbacks, running backs) demonstrated only 1 metric significantly affected at each position. Conclusion: The most common sites of injury identified at the combine were (1) ankle, (2) shoulder, (3) knee, (4) spine, and (5) hand. Overall, performance in the NFL tended to worsen with injury history, with a direct correlation found between injury at a certain anatomic location and position of play. Defensive players tended to perform worse compared with offensive players if injury history was present. PMID:28812033
Visual salience metrics for image inpainting
NASA Astrophysics Data System (ADS)
Ardis, Paul A.; Singhal, Amit
2009-01-01
Quantitative metrics for successful image inpainting currently do not exist, with researchers instead relying upon qualitative human comparisons to evaluate their methodologies and techniques. In an attempt to rectify this situation, we propose two new metrics to capture the notions of noticeability and visual intent in order to evaluate inpainting results. The proposed metrics use a quantitative measure of visual salience based upon a computational model of human visual attention. We demonstrate how these two metrics repeatably correlate with qualitative opinion in a human observer study, correctly identify the optimum uses for exemplar-based inpainting (as specified in the original publication), and match qualitative opinion in published examples.
Metrics, The Measure of Your Future: Evaluation Report, 1977.
ERIC Educational Resources Information Center
North Carolina State Dept. of Public Instruction, Raleigh. Div. of Development.
The primary goal of the Metric Education Project was the systematic development of a replicable educational model to facilitate the system-wide conversion to the metric system during the next five to ten years. This document is an evaluation of that project. Three sets of statistical evidence exist to support the fact that the project has been…
Design strategies to improve patient motivation during robot-aided rehabilitation.
Colombo, Roberto; Pisano, Fabrizio; Mazzone, Alessandra; Delconte, Carmen; Micera, Silvestro; Carrozza, M Chiara; Dario, Paolo; Minuco, Giuseppe
2007-02-19
Motivation is an important factor in rehabilitation and frequently used as a determinant of rehabilitation outcome. Several factors can influence patient motivation and so improve exercise adherence. This paper presents the design of two robot devices for use in the rehabilitation of upper limb movements, that can motivate patients during the execution of the assigned motor tasks by enhancing the gaming aspects of rehabilitation. In addition, a regular review of the obtained performance can reinforce in patients' minds the importance of exercising and encourage them to continue, so improving their motivation and consequently adherence to the program. In view of this, we also developed an evaluation metric that could characterize the rate of improvement and quantify the changes in the obtained performance. Two groups (G1, n = 8 and G2, n = 12) of patients with chronic stroke were enrolled in a 3-week rehabilitation program including standard physical therapy (45 min. daily) plus treatment by means of robot devices (40 min., twice daily) respectively for wrist (G1) and elbow-shoulder movements (G2). Both groups were evaluated by means of standard clinical assessment scales and the new robot measured evaluation metric. Patients' motivation was assessed in 9/12 G2 patients by means of the Intrinsic Motivation Inventory (IMI) questionnaire. Both groups reduced their motor deficit and showed a significant improvement in clinical scales and the robot measured parameters. The IMI assessed in G2 patients showed high scores for interest, usefulness and importance subscales and low values for tension and pain subscales. Thanks to the design features of the two robot devices the therapist could easily adapt training to the individual by selecting different difficulty levels of the motor task tailored to each patient's disability. The gaming aspects incorporated in the two rehabilitation robots helped maintain patients' interest high during execution of the assigned tasks by providing feedback on performance. The evaluation metric gave a precise measure of patients' performance and thus provides a tool to help therapists promote patient motivation and hence adherence to the training program.
System performance testing of the DSN radio science system, Mark 3-78
NASA Technical Reports Server (NTRS)
Berman, A. L.; Mehta, J. S.
1978-01-01
System performance tests are required to evaluate system performance following initial system implementation and subsequent modification, and to validate system performance prior to actual operational usage. Non-real-time end-to-end Radio Science system performance tests are described that are based on the comparison of open-loop radio science data to equivalent closed-loop radio metric data, as well as an abbreviated Radio Science real-time system performance test that validates critical Radio Science System elements at the Deep Space Station prior to actual operational usage.
Performance metrics for the assessment of satellite data products: an ocean color case study
Performance assessment of ocean color satellite data has generally relied on statistical metrics chosen for their common usage and the rationale for selecting certain metrics is infrequently explained. Commonly reported statistics based on mean squared errors, such as the coeffic...
Performance Metrics for Soil Moisture Retrievals and Applications Requirements
USDA-ARS?s Scientific Manuscript database
Quadratic performance metrics such as root-mean-square error (RMSE) and time series correlation are often used to assess the accuracy of geophysical retrievals and true fields. These metrics are generally related; nevertheless each has advantages and disadvantages. In this study we explore the relat...
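A short worked example of the two metrics named above, computed on assumed retrieval and truth values (the unbiased RMSE line is an additional, commonly reported variant, not necessarily part of this manuscript):

```python
# Worked example of RMSE and time series correlation on assumed data.
import numpy as np

truth = np.array([0.12, 0.18, 0.25, 0.30, 0.22, 0.15, 0.10])   # true soil moisture
retrieval = np.array([0.15, 0.20, 0.22, 0.33, 0.19, 0.18, 0.12])

rmse = np.sqrt(np.mean((retrieval - truth) ** 2))
bias = np.mean(retrieval - truth)
ubrmse = np.sqrt(rmse**2 - bias**2)                 # unbiased RMSE (assumed extra)
corr = np.corrcoef(retrieval, truth)[0, 1]          # time series correlation
print(f"RMSE={rmse:.3f}, bias={bias:.3f}, ubRMSE={ubrmse:.3f}, R={corr:.2f}")
```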
Challenge toward the prediction of typhoon behaviour and down pour
NASA Astrophysics Data System (ADS)
Takahashi, K.; Onishi, R.; Baba, Y.; Kida, S.; Matsuda, K.; Goto, K.; Fuchigami, H.
2013-08-01
Mechanisms of interaction among phenomena at different scales play important roles in forecasting weather and climate. The Multi-Scale Simulator for the Geoenvironment (MSSG), which deals with multi-scale multi-physics phenomena, is a coupled non-hydrostatic atmosphere-ocean model designed to run efficiently on the Earth Simulator. We present simulation results at the world's highest horizontal resolution of 1.9 km for the entire globe, together with regional heavy-rain simulations at 1 km horizontal resolution and urban-area simulations at 5 m horizontal/vertical resolution. To gain high performance by exploiting the system capabilities, we use performance evaluation metrics introduced in previous studies that incorporate the effects of the data caching mechanism between CPU and memory. With a code optimization guideline based on such metrics, we demonstrate that MSSG can achieve an excellent peak performance ratio of 32.2% on the Earth Simulator, with single-core performance found to be a key to reduced time-to-solution.
Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Dimpsey, Robert Tod
1992-01-01
In practice, the performance evaluation of supercomputers is still substantially driven by singlepoint estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that, multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for the applications. In this type of environment, the most important performance metric is the completion of response time of a given application. However, there are a few evaluation efforts addressing this issue.
Johnson, S J; Hunt, C M; Woolnough, H M; Crawshaw, M; Kilkenny, C; Gould, D A; England, A; Sinha, A; Villard, P F
2012-01-01
Objectives The aim of this article was to identify and prospectively investigate simulated ultrasound-guided targeted liver biopsy performance metrics as differentiators between levels of expertise in interventional radiology. Methods Task analysis produced detailed procedural step documentation allowing identification of critical procedure steps and performance metrics for use in a virtual reality ultrasound-guided targeted liver biopsy procedure. Consultant (n=14; male=11, female=3) and trainee (n=26; male=19, female=7) scores on the performance metrics were compared. Ethical approval was granted by the Liverpool Research Ethics Committee (UK). Independent t-tests and analysis of variance (ANOVA) investigated differences between groups. Results Independent t-tests revealed significant differences between trainees and consultants on three performance metrics: targeting, p=0.018, t=−2.487 (−2.040 to −0.207); probe usage time, p = 0.040, t=2.132 (11.064 to 427.983); mean needle length in beam, p=0.029, t=−2.272 (−0.028 to −0.002). ANOVA reported significant differences across years of experience (0–1, 1–2, 3+ years) on seven performance metrics: no-go area touched, p=0.012; targeting, p=0.025; length of session, p=0.024; probe usage time, p=0.025; total needle distance moved, p=0.038; number of skin contacts, p<0.001; total time in no-go area, p=0.008. More experienced participants consistently received better performance scores on all 19 performance metrics. Conclusion It is possible to measure and monitor performance using simulation, with performance metrics providing feedback on skill level and differentiating levels of expertise. However, a transfer of training study is required. PMID:21304005
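The sketch below shows the form of the statistical comparison described above (independent-samples t-test between consultants and trainees, and a one-way ANOVA across experience bands) using scipy; all metric values are hypothetical placeholders, not the study's data.

```python
# Sketch of the consultant-vs-trainee comparison using scipy on hypothetical
# metric values (e.g., probe usage time in seconds per participant).
import numpy as np
from scipy import stats

consultants = np.array([310., 295., 340., 280., 355., 300., 325., 290.,
                        315., 305., 335., 298., 322., 310.])
trainees = np.array([520., 610., 480., 700., 560., 455., 630., 590., 505.,
                     575., 640., 495., 530., 615., 470., 555., 600., 510.,
                     585., 650., 540., 565., 620., 490., 605., 535.])

# Independent-samples t-test: consultants vs trainees on one metric.
t, p = stats.ttest_ind(consultants, trainees)
print(f"t = {t:.3f}, p = {p:.4f}")

# One-way ANOVA across experience bands (0-1, 1-2, 3+ years); here the trainee
# sample is simply split into three hypothetical groups.
g1, g2, g3 = trainees[:9], trainees[9:18], trainees[18:]
f, p_anova = stats.f_oneway(g1, g2, g3)
print(f"F = {f:.3f}, p = {p_anova:.4f}")
```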
Up Periscope! Designing a New Perceptual Metric for Imaging System Performance
NASA Technical Reports Server (NTRS)
Watson, Andrew B.
2016-01-01
Modern electronic imaging systems include optics, sensors, sampling, noise, processing, compression, transmission and display elements, and are viewed by the human eye. Many of these elements cannot be assessed by traditional imaging system metrics such as the MTF. More complex metrics such as NVTherm do address these elements, but do so largely through parametric adjustment of an MTF-like metric. The parameters are adjusted through subjective testing of human observers identifying specific targets in a set of standard images. We have designed a new metric that is based on a model of human visual pattern classification. In contrast to previous metrics, ours simulates the human observer identifying the standard targets. One application of this metric is to quantify performance of modern electronic periscope systems on submarines.
Improvement of impact noise in a passenger car utilizing sound metric based on wavelet transform
NASA Astrophysics Data System (ADS)
Lee, Sang-Kwon; Kim, Ho-Wuk; Na, Eun-Woo
2010-08-01
A new sound metric for impact sound is developed based on the continuous wavelet transform (CWT), a useful tool for the analysis of non-stationary signals such as impact noise. Together with the new metric, two other conventional sound metrics related to sound modulation and fluctuation are also considered. In all, three sound metrics are employed to develop impact sound quality indexes for several specific impact courses on the road. Impact sounds are evaluated subjectively by 25 jurors. The indexes are verified by examining the correlation between the index output and the results of a subjective evaluation based on a jury test. These indexes are successfully applied to an objective evaluation of improvements in impact sound quality for cases where some parts of the suspension system of the test car are modified.
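As a rough illustration of a CWT-based impact measure (not the authors' metric), the sketch below applies a continuous wavelet transform to a synthetic signal containing a short impact burst and reports the peak time-localized wavelet energy relative to the background; PyWavelets, the Morlet wavelet, and the toy signal are assumptions.

```python
# Hedged sketch of a CWT-based impact metric (PyWavelets, Morlet wavelet, and
# the synthetic impact signal are assumptions; not the authors' exact metric).
import numpy as np
import pywt

fs = 8192.0
t = np.arange(0, 1.0, 1 / fs)
# Toy signal: steady road noise plus a short decaying impact burst at t = 0.5 s.
signal = 0.05 * np.sin(2 * np.pi * 120 * t)
impact = np.exp(-60 * np.abs(t - 0.5)) * np.sin(2 * np.pi * 900 * t)
signal += (t > 0.5) * impact

scales = np.arange(2, 64)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)

# One possible impact descriptor: peak time-localized wavelet energy relative
# to the average energy of the background.
energy = np.abs(coeffs) ** 2
impulsiveness = energy.max() / energy.mean()
print(f"impulsiveness ratio = {impulsiveness:.1f}")
```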
Distance Metric Learning via Iterated Support Vector Machines.
Zuo, Wangmeng; Wang, Faqiang; Zhang, David; Lin, Liang; Huang, Yuchi; Meng, Deyu; Zhang, Lei
2017-07-11
Distance metric learning aims to learn from the given training data a valid distance metric, with which the similarity between data samples can be more effectively evaluated for classification. Metric learning is often formulated as a convex or nonconvex optimization problem, while most existing methods are based on customized optimizers and become inefficient for large scale problems. In this paper, we formulate metric learning as a kernel classification problem with the positive semi-definite constraint, and solve it by iterated training of support vector machines (SVMs). The new formulation is easy to implement and efficient in training with the off-the-shelf SVM solvers. Two novel metric learning models, namely Positive-semidefinite Constrained Metric Learning (PCML) and Nonnegative-coefficient Constrained Metric Learning (NCML), are developed. Both PCML and NCML can guarantee the global optimality of their solutions. Experiments are conducted on general classification, face verification and person re-identification to evaluate our methods. Compared with the state-of-the-art approaches, our methods can achieve comparable classification accuracy and are efficient in training.
Testing the Construct Validity of a Virtual Reality Hip Arthroscopy Simulator.
Khanduja, Vikas; Lawrence, John E; Audenaert, Emmanuel
2017-03-01
To test the construct validity of the hip diagnostics module of a virtual reality hip arthroscopy simulator. Nineteen orthopaedic surgeons performed a simulated arthroscopic examination of a healthy hip joint using a 70° arthroscope in the supine position. Surgeons were categorized as either expert (those who had performed 250 hip arthroscopies or more) or novice (those who had performed fewer than this). Twenty-one specific targets were visualized within the central and peripheral compartments; 9 via the anterior portal, 9 via the anterolateral portal, and 3 via the posterolateral portal. This was immediately followed by a task testing basic probe examination of the joint in which a series of 8 targets were probed via the anterolateral portal. During the tasks, the surgeon's performance was evaluated by the simulator using a set of predefined metrics including task duration, number of soft tissue and bone collisions, and distance travelled by instruments. No repeat attempts at the tasks were permitted. Construct validity was then evaluated by comparing novice and expert group performance metrics over the 2 tasks using the Mann-Whitney test, with a P value of less than .05 considered significant. On the visualization task, the expert group outperformed the novice group on time taken (P = .0003), number of collisions with soft tissue (P = .001), number of collisions with bone (P = .002), and distance travelled by the arthroscope (P = .02). On the probe examination, the 2 groups differed only in the time taken to complete the task (P = .025) with no significant difference in other metrics. Increased experience in hip arthroscopy was reflected by significantly better performance on the virtual reality simulator across 2 tasks, supporting its construct validity. This study validates a virtual reality hip arthroscopy simulator and supports its potential for developing basic arthroscopic skills. Level III. Copyright © 2016 Arthroscopy Association of North America. All rights reserved.
Towards a Framework for Evaluating and Comparing Diagnosis Algorithms
NASA Technical Reports Server (NTRS)
Kurtoglu, Tolga; Narasimhan, Sriram; Poll, Scott; Garcia,David; Kuhn, Lukas; deKleer, Johan; vanGemund, Arjan; Feldman, Alexander
2009-01-01
Diagnostic inference involves the detection of anomalous system behavior and the identification of its cause, possibly down to a failed unit or to a parameter of a failed unit. Traditional approaches to solving this problem include expert/rule-based, model-based, and data-driven methods. Each approach (and various techniques within each approach) use different representations of the knowledge required to perform the diagnosis. The sensor data is expected to be combined with these internal representations to produce the diagnosis result. In spite of the availability of various diagnosis technologies, there have been only minimal efforts to develop a standardized software framework to run, evaluate, and compare different diagnosis technologies on the same system. This paper presents a framework that defines a standardized representation of the system knowledge, the sensor data, and the form of the diagnosis results and provides a run-time architecture that can execute diagnosis algorithms, send sensor data to the algorithms at appropriate time steps from a variety of sources (including the actual physical system), and collect resulting diagnoses. We also define a set of metrics that can be used to evaluate and compare the performance of the algorithms, and provide software to calculate the metrics.
Webb-Robertson, Bobbie-Jo M.; Wiberg, Holli K.; Matzke, Melissa M.; ...
2015-04-09
In this review, we apply selected imputation strategies to label-free liquid chromatography–mass spectrometry (LC–MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC–MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases performing classification without imputation yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. In summary, on the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.
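In the spirit of the evaluation described above, the sketch below compares two common imputation strategies by a downstream classification metric; the synthetic dataset, missing-at-random pattern, and chosen classifier are assumptions, and real LC-MS missingness is typically not random.

```python
# Sketch (assumed data): compare two common imputation strategies by a simple
# classification metric.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
# Knock out 15% of values at random to mimic missing peptide abundances.
mask = rng.random(X.shape) < 0.15
X_missing = X.copy()
X_missing[mask] = np.nan

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("kNN", KNNImputer(n_neighbors=5))]:
    X_imp = imputer.fit_transform(X_missing)
    acc = cross_val_score(LogisticRegression(max_iter=1000), X_imp, y, cv=5).mean()
    print(f"{name} imputation: CV accuracy = {acc:.3f}")
```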
Automated Metrics in a Virtual-Reality Myringotomy Simulator: Development and Construct Validity.
Huang, Caiwen; Cheng, Horace; Bureau, Yves; Ladak, Hanif M; Agrawal, Sumit K
2018-06-15
The objectives of this study were: 1) to develop and implement a set of automated performance metrics into the Western myringotomy simulator, and 2) to establish construct validity. Prospective simulator-based assessment study. The Auditory Biophysics Laboratory at Western University, London, Ontario, Canada. Eleven participants were recruited from the Department of Otolaryngology-Head & Neck Surgery at Western University: four senior otolaryngology consultants and seven junior otolaryngology residents. Educational simulation. Discrimination between expert and novice participants on five primary automated performance metrics: 1) time to completion, 2) surgical errors, 3) incision angle, 4) incision length, and 5) the magnification of the microscope. Automated performance metrics were developed, programmed, and implemented into the simulator. Participants were given a standardized simulator orientation and instructions on myringotomy and tube placement. Each participant then performed 10 procedures and automated metrics were collected. The metrics were analyzed using the Mann-Whitney U test with Bonferroni correction. All metrics discriminated senior otolaryngologists from junior residents with a significance of p < 0.002. Junior residents had 2.8 times more errors compared with the senior otolaryngologists. Senior otolaryngologists took significantly less time to completion compared with junior residents. The senior group also had significantly longer incision lengths, more accurate incision angles, and lower magnification keeping both the umbo and annulus in view. Automated quantitative performance metrics were successfully developed and implemented, and construct validity was established by discriminating between expert and novice participants.
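A minimal sketch of the expert-vs-novice comparison with Bonferroni correction over the five primary metrics is shown below; all metric values are hypothetical and the grouping is illustrative only.

```python
# Sketch of the expert-vs-novice comparison with Bonferroni correction over the
# five primary metrics (all values below are hypothetical).
import numpy as np
from scipy.stats import mannwhitneyu

metrics = {
    "time_to_completion_s": ([38, 42, 35, 40], [75, 81, 69, 88, 72, 95, 79]),
    "surgical_errors":      ([1, 0, 2, 1],     [4, 3, 5, 2, 6, 3, 4]),
    "incision_angle_err":   ([5, 7, 4, 6],     [14, 11, 16, 12, 18, 13, 15]),
    "incision_length_mm":   ([3.1, 3.3, 3.0, 3.2], [2.2, 2.4, 2.0, 2.5, 2.1, 2.3, 2.2]),
    "magnification":        ([6, 6, 8, 6],     [12, 10, 12, 14, 10, 12, 11]),
}
alpha = 0.05 / len(metrics)                 # Bonferroni-corrected threshold
for name, (experts, novices) in metrics.items():
    u, p = mannwhitneyu(experts, novices, alternative="two-sided")
    flag = "significant" if p < alpha else "n.s."
    print(f"{name}: U = {u:.1f}, p = {p:.4f} ({flag})")
```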
Usability testing of a mobile robotic system for in-home telerehabilitation.
Boissy, Patrick; Brière, Simon; Corriveau, Hélène; Grant, Andrew; Lauria, Michel; Michaud, François
2011-01-01
Mobile robots designed to enhance telepresence in support of telehealth services are being considered for numerous applications. TELEROBOT is a teleoperated mobile robotic platform equipped with videoconferencing capabilities and designed to be used in a home environment. In this study, the learnability of the system's teleoperation interface and controls was evaluated with ten rehabilitation professionals during four training sessions in a laboratory environment and in an unknown home environment, while executing a standardized evaluation protocol typically used in home care. Results show that the novice teleoperators' performance on two of the four metrics used (number of commands and total time) improved significantly across training sessions (ANOVAs, p<0.05) and that performance on these metrics in the last training session reflected the teleoperation abilities seen in the unknown home environment during navigation tasks (r=0.77 and 0.60). With only 4 hours of training, rehabilitation professionals were able to learn to teleoperate TELEROBOT successfully. However, teleoperation performance remained significantly less efficient than that of an expert. Under the home task condition (navigating the home environment from one point to the other as fast as possible), this translated to completion times between 350 seconds (best performance) and 850 seconds (worst performance). Improvements in other usability aspects of the system will be needed to meet the requirements of in-home telerehabilitation.
Metric Evaluation Pipeline for 3D Modeling of Urban Scenes
NASA Astrophysics Data System (ADS)
Bosch, M.; Leichtman, A.; Chilcott, D.; Goldberg, H.; Brown, M.
2017-05-01
Publicly available benchmark data and metric evaluation approaches have been instrumental in enabling research to advance state of the art methods for remote sensing applications in urban 3D modeling. Most publicly available benchmark datasets have consisted of high resolution airborne imagery and lidar suitable for 3D modeling on a relatively modest scale. To enable research in larger scale 3D mapping, we have recently released a public benchmark dataset with multi-view commercial satellite imagery and metrics to compare 3D point clouds with lidar ground truth. We now define a more complete metric evaluation pipeline developed as publicly available open source software to assess semantically labeled 3D models of complex urban scenes derived from multi-view commercial satellite imagery. Evaluation metrics in our pipeline include horizontal and vertical accuracy and completeness, volumetric completeness and correctness, perceptual quality, and model simplicity. Sources of ground truth include airborne lidar and overhead imagery, and we demonstrate a semi-automated process for producing accurate ground truth shape files to characterize building footprints. We validate our current metric evaluation pipeline using 3D models produced using open source multi-view stereo methods. Data and software is made publicly available to enable further research and planned benchmarking activities.
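The sketch below illustrates threshold-based completeness and correctness between a reconstructed point cloud and ground truth; the synthetic clouds and the 0.5 m threshold are assumptions and do not reproduce the benchmark's scoring code.

```python
# Sketch of point-cloud completeness/correctness against ground truth using a
# distance threshold (synthetic clouds and the 0.5 m threshold are assumptions).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(4)
ground_truth = rng.uniform(0, 100, size=(5000, 3))           # lidar-like reference
# Reconstruction: a noisy, partial copy of the truth plus some spurious points.
reconstruction = ground_truth[:4000] + rng.normal(0, 0.2, (4000, 3))
reconstruction = np.vstack([reconstruction, rng.uniform(0, 100, (500, 3))])

threshold = 0.5   # metres
d_rec_to_gt, _ = cKDTree(ground_truth).query(reconstruction)
d_gt_to_rec, _ = cKDTree(reconstruction).query(ground_truth)

correctness = (d_rec_to_gt <= threshold).mean()   # fraction of model points near truth
completeness = (d_gt_to_rec <= threshold).mean()  # fraction of truth covered by model
print(f"correctness = {correctness:.2%}, completeness = {completeness:.2%}")
```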
Silosky, Michael S; Marsh, Rebecca M; Scherzinger, Ann L
2016-07-08
When The Joint Commission updated its Requirements for Diagnostic Imaging Services for hospitals and ambulatory care facilities on July 1, 2015, among the new requirements was an annual performance evaluation for acquisition workstation displays. The purpose of this work was to evaluate a large cohort of acquisition displays used in a clinical environment and compare the results with existing performance standards provided by the American College of Radiology (ACR) and the American Association of Physicists in Medicine (AAPM). Measurements of the minimum luminance, maximum luminance, and luminance uniformity were performed on 42 acquisition displays across multiple imaging modalities. The mean values, standard deviations, and ranges were calculated for these metrics. Additionally, visual evaluations of contrast, spatial resolution, and distortion were performed using either the Society of Motion Pictures and Television Engineers test pattern or the TG-18-QC test pattern. Finally, an evaluation of local nonuniformities was performed using either a uniform white display or the TG-18-UN80 test pattern. Displays tested were flat panel, liquid crystal displays that ranged from less than 1 year to 10 years of use and had been built by a wide variety of manufacturers. The mean values for Lmin and Lmax for the displays tested were 0.28 ± 0.13 cd/m2 and 135.07 ± 33.35 cd/m2, respectively. The mean maximum luminance deviations for ultrasound and non-ultrasound displays were 12.61% ± 4.85% and 14.47% ± 5.36%, respectively. Visual evaluation of display performance varied depending on several factors including brightness and contrast settings and the test pattern used for image quality assessment. This work provides a snapshot of the performance of 42 acquisition displays across several imaging modalities in clinical use at a large medical center. Comparison with existing performance standards reveals that changes in display technology and the move from cathode ray tube displays to flat panel displays may have rendered some of the tests inappropriate for modern use. © 2016 The Authors.
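As a rough illustration of the luminance uniformity evaluation mentioned above, the following Python sketch computes a maximum luminance deviation of the form 200*(Lmax - Lmin)/(Lmax + Lmin) over a set of measurement points; the readings and the 30% action level are hypothetical and follow one common (TG-18-style) convention rather than the exact procedure of this work.

```python
# Minimal sketch: luminance metrics for a display QC check.
# The readings, the deviation formula, and the 30% flag level are illustrative
# conventions, not the measurements or criteria reported in the study.

def max_luminance_deviation(luminances_cd_m2):
    """Percent deviation across measurement points: 200*(Lmax - Lmin)/(Lmax + Lmin)."""
    lmax, lmin = max(luminances_cd_m2), min(luminances_cd_m2)
    return 200.0 * (lmax - lmin) / (lmax + lmin)

# Five-point uniformity measurement (center plus four corners), hypothetical values.
readings = [132.0, 128.5, 140.2, 125.9, 130.4]
deviation = max_luminance_deviation(readings)
print(f"Maximum luminance deviation: {deviation:.1f}%  (flag if > 30%)")
print(f"Luminance ratio Lmax/Lmin: {max(readings) / min(readings):.2f}")
```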
An Overview and Empirical Comparison of Distance Metric Learning Methods.
Moutafis, Panagiotis; Leng, Mengjun; Kakadiaris, Ioannis A
2016-02-16
In this paper, we first offer an overview of advances in the field of distance metric learning. Then, we empirically compare selected methods using a common experimental protocol. The number of distance metric learning algorithms proposed keeps growing due to their effectiveness and wide application. However, existing surveys are either outdated or they focus only on a few methods. As a result, there is an increasing need to summarize the obtained knowledge in a concise, yet informative manner. Moreover, existing surveys do not conduct comprehensive experimental comparisons. On the other hand, individual distance metric learning papers compare the performance of the proposed approach with only a few related methods and under different settings. This highlights the need for an experimental evaluation using a common and challenging protocol. To this end, we conduct face verification experiments, as this task poses significant challenges due to varying conditions during data acquisition. In addition, face verification is a natural application for distance metric learning because the encountered challenge is to define a distance function that: 1) accurately expresses the notion of similarity for verification; 2) is robust to noisy data; 3) generalizes well to unseen subjects; and 4) scales well with the dimensionality and number of training samples. In particular, we utilize well-tested features to assess the performance of selected methods following the experimental protocol of the state-of-the-art database labeled faces in the wild. A summary of the results is presented along with a discussion of the insights obtained and lessons learned by employing the corresponding algorithms.
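Many of the surveyed approaches learn a Mahalanobis-form metric d_M(x, y) = sqrt((x - y)^T M (x - y)) with a positive semidefinite matrix M. The sketch below only evaluates such a distance; for brevity M is taken as a regularized inverse sample covariance rather than a matrix fitted from labeled similar/dissimilar pairs, and the feature vectors are random stand-ins for face descriptors.

```python
# Minimal sketch of a Mahalanobis-form distance d_M(x, y) = sqrt((x - y)^T M (x - y)).
# Here M is the regularized inverse covariance of random features; a metric-learning
# algorithm would instead fit M from labeled similar/dissimilar pairs.
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))          # stand-in for face descriptors

cov = np.cov(features, rowvar=False)
M = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))   # positive definite by construction

def mahalanobis_distance(x, y, M):
    d = x - y
    return float(np.sqrt(d @ M @ d))

x, y = features[0], features[1]
print("Learned-form distance:", mahalanobis_distance(x, y, M))
print("Euclidean distance   :", float(np.linalg.norm(x - y)))
```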
Evaluation Metrics for the Paragon XP/S-15
NASA Technical Reports Server (NTRS)
Traversat, Bernard; McNab, David; Nitzberg, Bill; Fineberg, Sam; Blaylock, Bruce T. (Technical Monitor)
1993-01-01
On February 17th 1993, the Numerical Aerodynamic Simulation (NAS) facility located at the NASA Ames Research Center installed a 224 node Intel Paragon XP/S-15 system. After its installation, the Paragon was found to be in a very immature state and was unable to support the NAS users' workload, composed of a wide range of development and production activities. As a first step towards addressing this problem, we implemented a set of metrics to objectively monitor the system as operating system and hardware upgrades were installed. The metrics were designed to measure four aspects of the system that we consider essential to support our workload: availability, utilization, functionality, and performance. This report presents the metrics collected from February 1993 to August 1993. Since its installation, the Paragon availability has improved from a low of 15% uptime to a high of 80%, while its utilization has remained low. Functionality and performance have improved from merely running one of the NAS Parallel Benchmarks to running all of them faster (between 1 and 2 times) than on the iPSC/860. In spite of the progress accomplished, fundamental limitations of the Paragon operating system are restricting the Paragon from supporting the NAS workload. The maximum operating system message passing (NORMA IPC) bandwidth was measured at 11 Mbytes/s, well below the peak hardware bandwidth (175 Mbytes/s), limiting overall virtual memory and Unix services (i.e. Disk and HiPPI I/O) performance. The high NX application message passing latency (184 microseconds), three times that on the iPSC/860, was found to significantly degrade performance of applications relying on small message sizes. The amount of memory available for an application was found to be approximately 10 Mbytes per node, indicating that the OS is taking more space than anticipated (6 Mbytes per node).
Indicators and Metrics for Evaluating the Sustainability of Chemical Processes
A metric-based method, called GREENSCOPE, has been developed for evaluating process sustainability. Using lab-scale information and engineering assumptions, the method evaluates full-scale representations of processes in environmental, efficiency, energy and economic areas. The m...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Götstedt, Julia; Karlsson Hauer, Anna; Bäck, Anna, E-mail: anna.back@vgregion.se
Purpose: Complexity metrics have been suggested as a complement to measurement-based quality assurance for intensity modulated radiation therapy (IMRT) and volumetric modulated arc therapy (VMAT). However, these metrics have not yet been sufficiently validated. This study develops and evaluates new aperture-based complexity metrics in the context of static multileaf collimator (MLC) openings and compares them to previously published metrics. Methods: This study develops the converted aperture metric and the edge area metric. The converted aperture metric is based on small and irregular parts within the MLC opening that are quantified as measured distances between MLC leaves. The edge area metric is based on the relative size of the region around the edges defined by the MLC. Another metric suggested in this study is the circumference/area ratio. Earlier defined aperture-based complexity metrics—the modulation complexity score, the edge metric, the ratio monitor units (MU)/Gy, the aperture area, and the aperture irregularity—are compared to the newly proposed metrics. A set of small and irregular static MLC openings is created to simulate individual IMRT/VMAT control points of various complexities. These are measured with both an amorphous silicon electronic portal imaging device and EBT3 film. The differences between calculated and measured dose distributions are evaluated using a pixel-by-pixel comparison with two global dose difference criteria of 3% and 5%. The extent of the dose differences, expressed in terms of pass rate, is used as a measure of the complexity of the MLC openings and for the evaluation of the metrics compared in this study. The different complexity scores are calculated for each created static MLC opening. The correlations between the calculated complexity scores and the extent of the dose differences (pass rate) are analyzed in scatter plots and using Pearson’s r-values. Results: The complexity scores calculated by the edge area metric, converted aperture metric, circumference/area ratio, edge metric, and MU/Gy ratio show good linear correlation with the complexity of the MLC openings, expressed as the 5% dose difference pass rate, with Pearson’s r-values of −0.94, −0.88, −0.84, −0.89, and −0.82, respectively. The overall trends for the 3% and 5% dose difference evaluations are similar. Conclusions: New complexity metrics are developed. The calculated scores correlate with the complexity of the created static MLC openings. The complexity of the MLC opening is dependent on the penumbra region relative to the area of the opening. The aperture-based complexity metrics that combine either the distances between the MLC leaves or the MLC opening circumference with the aperture area show the best correlation with the complexity of the static MLC openings.
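A rough, hypothetical illustration of a circumference/area style aperture score and of correlating complexity scores with measured pass rates is sketched below; the leaf width, leaf positions, and pass rates are invented, and the approximation of the opening as independent per-leaf rectangles is a simplification of the metrics defined in the study.

```python
# Minimal sketch: a circumference/area style aperture complexity score from MLC
# leaf positions, correlated with measured pass rates. All numbers are invented.
import numpy as np
from scipy.stats import pearsonr

LEAF_WIDTH_MM = 5.0

def circumference_area_ratio(left_mm, right_mm, leaf_width=LEAF_WIDTH_MM):
    """Rough approximation: treat each open leaf pair as its own rectangle."""
    gaps = np.clip(np.asarray(right_mm, float) - np.asarray(left_mm, float), 0.0, None)
    open_gaps = gaps[gaps > 0]
    area = np.sum(open_gaps * leaf_width)
    perimeter = np.sum(2.0 * (leaf_width + open_gaps))
    return perimeter / area if area > 0 else np.inf

# Hypothetical apertures (per-leaf left/right positions, mm) and pass rates (%).
apertures = [
    (np.array([-20.0, -20.0, -20.0]), np.array([20.0, 20.0, 20.0])),  # large, simple
    (np.array([-2.0, -15.0, -1.0]),   np.array([1.0, 15.0, 2.0])),    # small, irregular
    (np.array([-5.0, -6.0, -4.0]),    np.array([5.0, 7.0, 4.0])),
]
pass_rates = np.array([99.0, 82.0, 93.0])

scores = np.array([circumference_area_ratio(l, r) for l, r in apertures])
r, p = pearsonr(scores, pass_rates)
print("Complexity scores:", np.round(scores, 3))
print(f"Pearson r between complexity score and pass rate: {r:.2f}")
```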
Degraded visual environment image/video quality metrics
NASA Astrophysics Data System (ADS)
Baumgartner, Dustin D.; Brown, Jeremy B.; Jacobs, Eddie L.; Schachter, Bruce J.
2014-06-01
A number of image quality metrics (IQMs) and video quality metrics (VQMs) have been proposed in the literature for evaluating techniques and systems for mitigating degraded visual environments. Some require both pristine and corrupted imagery. Others require patterned target boards in the scene. None of these metrics relates well to the task of landing a helicopter in conditions such as a brownout dust cloud. We have developed and used a variety of IQMs and VQMs related to the pilot's ability to detect hazards in the scene and to maintain situational awareness. Some of these metrics can be made agnostic to sensor type. Not only are the metrics suitable for evaluating algorithm and sensor variation, they are also suitable for choosing the most cost effective solution to improve operating conditions in degraded visual environments.
Iterative methods for mixed finite element equations
NASA Technical Reports Server (NTRS)
Nakazawa, S.; Nagtegaal, J. C.; Zienkiewicz, O. C.
1985-01-01
Iterative strategies for the solution of indefinite systems of equations arising from the mixed finite element method are investigated in this paper with application to linear and nonlinear problems in solid and structural mechanics. The augmented Hu-Washizu form is derived, which is then utilized to construct a family of iterative algorithms using the displacement method as the preconditioner. Two types of iterative algorithms are implemented. These are: constant metric iterations, which do not involve updating the preconditioner, and variable metric iterations, in which the inverse of the preconditioning matrix is updated. A series of numerical experiments is conducted to evaluate the numerical performance with application to linear and nonlinear model problems.
Stochastic HKMDHE: A multi-objective contrast enhancement algorithm
NASA Astrophysics Data System (ADS)
Pratiher, Sawon; Mukhopadhyay, Sabyasachi; Maity, Srideep; Pradhan, Asima; Ghosh, Nirmalya; Panigrahi, Prasanta K.
2018-02-01
This contribution proposes a novel extension of the existing `Hyper Kurtosis based Modified Duo-Histogram Equalization' (HKMDHE) algorithm, for multi-objective contrast enhancement of biomedical images. A novel modified objective function has been formulated by joint optimization of the individual histogram equalization objectives. The optimal adequacy of the proposed methodology with respect to image quality metrics such as brightness preserving abilities, peak signal-to-noise ratio (PSNR), Structural Similarity Index (SSIM) and universal image quality metric has been experimentally validated. The performance analysis of the proposed Stochastic HKMDHE with existing histogram equalization methodologies like Global Histogram Equalization (GHE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) has been given for comparative evaluation.
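One of the cited image quality metrics, the peak signal-to-noise ratio, is simple enough to sketch directly; the images below are random placeholders rather than biomedical data, and the 8-bit peak value is an assumption.

```python
# Minimal sketch: peak signal-to-noise ratio (PSNR) between an 8-bit image and a
# processed version of it. The images here are random placeholders.
import numpy as np

def psnr(reference, test, max_value=255.0):
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

rng = np.random.default_rng(1)
original = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
enhanced = np.clip(original.astype(np.int16) + rng.integers(-5, 6, size=original.shape), 0, 255)
print(f"PSNR: {psnr(original, enhanced):.2f} dB")
```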
Development of a turbojet engine gearbox test rig for prognostics and health management
NASA Astrophysics Data System (ADS)
Rezaei, Aida; Dadouche, Azzedine
2012-11-01
Aircraft engine gearboxes represent one of the many critical systems/elements that require special attention for longer and safer operation. Reactive maintenance strategies are unsuitable as they usually imply higher repair costs when compared to condition based maintenance. This paper discusses the main prognostics and health management (PHM) approaches, describes a newly designed gearbox experimental facility and analyses preliminary data for gear prognosis. The test rig is designed to provide full capabilities of performing controlled experiments suitable for developing a reliable diagnostic and prognostic system. The rig is based on the accessory gearbox of the GE J85 turbojet engine, which has been slightly modified and reconfigured to replicate real operating conditions such as speeds and loads. Defect to failure tests (DTFT) have been run to evaluate the performance of the rig as well as to assess prognostic metrics extracted from sensors installed on the gearbox casing (vibration and acoustic). The paper also details the main components of the rig and describes the various challenges encountered. Successful DTFT results were obtained during an idle engine performance test and prognostic metrics associated with the sensor suite were evaluated and discussed.
Dehbi, Hakim-Moulay; Howard, James P; Shun-Shin, Matthew J; Sen, Sayan; Nijjer, Sukhjinder S; Mayet, Jamil; Davies, Justin E; Francis, Darrel P
2018-01-01
Background Diagnostic accuracy is widely accepted by researchers and clinicians as an optimal expression of a test’s performance. The aim of this study was to evaluate the effects of disease severity distribution on values of diagnostic accuracy as well as propose a sample-independent methodology to calculate and display accuracy of diagnostic tests. Methods and findings We evaluated the diagnostic relationship between two hypothetical methods to measure serum cholesterol (Cholrapid and Cholgold) by generating samples with statistical software and (1) keeping the numerical relationship between methods unchanged and (2) changing the distribution of cholesterol values. Metrics of categorical agreement were calculated (accuracy, sensitivity and specificity). Finally, a novel methodology to display and calculate accuracy values was presented (the V-plot of accuracies). Conclusion No single value of diagnostic accuracy can be used to describe the relationship between tests, as accuracy is a metric heavily affected by the underlying sample distribution. Our novel proposed methodology, the V-plot of accuracies, can be used as a sample-independent measure of a test performance against a reference gold standard. PMID:29387424
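The dependence of categorical accuracy on the sample distribution can be reproduced with a small simulation: the numerical relationship between the two hypothetical methods is held fixed while the distribution of cholesterol values changes. The means, noise level, and 200 mg/dL cutoff below are arbitrary choices, not values from the paper.

```python
# Minimal sketch: the same numerical relationship between two tests can yield
# different accuracy values when the underlying sample distribution changes.
# Cholesterol distributions, noise level, and the cutoff are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
CUTOFF = 200.0  # classification threshold applied to both methods (mg/dL)

def categorical_metrics(gold, rapid, cutoff=CUTOFF):
    gold_pos, rapid_pos = gold >= cutoff, rapid >= cutoff
    tp = np.sum(gold_pos & rapid_pos); tn = np.sum(~gold_pos & ~rapid_pos)
    fp = np.sum(~gold_pos & rapid_pos); fn = np.sum(gold_pos & ~rapid_pos)
    return {
        "accuracy": round((tp + tn) / gold.size, 3),
        "sensitivity": round(tp / (tp + fn), 3),
        "specificity": round(tn / (tn + fp), 3),
    }

def simulate(mean, sd, n=10_000):
    gold = rng.normal(mean, sd, n)
    rapid = gold + rng.normal(0.0, 15.0, n)   # identical measurement relationship
    return categorical_metrics(gold, rapid)

print("Sample centered near the cutoff:", simulate(mean=200, sd=20))
print("Sample far below the cutoff   :", simulate(mean=160, sd=20))
```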
ERIC Educational Resources Information Center
Exum, Kenith Gene
Examined is the effectiveness of a method of teaching the metric system using the booklet, Metric Supplement to Mathematics, in combination with a physical science textbook. The participants in the study were randomly selected undergraduates in a non-science oriented program of study. Instruments used included the Metric Supplement to Mathematics…
NASA Technical Reports Server (NTRS)
Basili, V. R.
1981-01-01
Work on metrics is discussed. Factors that affect software quality are reviewed. Metrics is discussed in terms of criteria achievements, reliability, and fault tolerance. Subjective and objective metrics are distinguished. Product/process and cost/quality metrics are characterized and discussed.
A Three-Dimensional Receiver Operator Characteristic Surface Diagnostic Metric
NASA Technical Reports Server (NTRS)
Simon, Donald L.
2011-01-01
Receiver Operator Characteristic (ROC) curves are commonly applied as metrics for quantifying the performance of binary fault detection systems. An ROC curve provides a visual representation of a detection system's True Positive Rate versus False Positive Rate sensitivity as the detection threshold is varied. The area under the curve provides a measure of fault detection performance independent of the applied detection threshold. While the standard ROC curve is well suited for quantifying binary fault detection performance, it is not suitable for quantifying the classification performance of multi-fault classification problems. Furthermore, it does not provide a measure of diagnostic latency. To address these shortcomings, a novel three-dimensional receiver operator characteristic (3D ROC) surface metric has been developed. This is done by generating and applying two separate curves: the standard ROC curve reflecting fault detection performance, and a second curve reflecting fault classification performance. A third dimension, diagnostic latency, is added giving rise to 3D ROC surfaces. Applying numerical integration techniques, the volumes under and between the surfaces are calculated to produce metrics of the diagnostic system's detection and classification performance. This paper will describe the 3D ROC surface metric in detail, and present an example of its application for quantifying the performance of aircraft engine gas path diagnostic methods. Metric limitations and potential enhancements are also discussed.
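The detection component of such an analysis can be sketched in a few lines: sweep a detection threshold over scores, build the ROC curve, and integrate numerically. The fault scores below are synthetic, and only the familiar two-dimensional area under the curve is computed, not the volumes under the 3D surfaces described in the paper.

```python
# Minimal sketch: ROC curve for a score-based fault detector, with the area under
# the curve obtained by trapezoidal numerical integration. Scores are synthetic.
import numpy as np

rng = np.random.default_rng(3)
labels = np.concatenate([np.ones(500), np.zeros(500)])                # 1 = fault present
scores = np.concatenate([rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 500)])

thresholds = np.sort(np.unique(scores))[::-1]                          # strict to lax
tpr = np.array([np.mean(scores[labels == 1] >= t) for t in thresholds])
fpr = np.array([np.mean(scores[labels == 0] >= t) for t in thresholds])

# Trapezoidal integration of TPR over FPR gives the area under the ROC curve.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
print(f"Area under the ROC curve: {auc:.3f}")
```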
Cross-evaluation of metrics to estimate the significance of creative works
Wasserman, Max; Zeng, Xiao Han T.; Amaral, Luís A. Nunes
2015-01-01
In a world overflowing with creative works, it is useful to be able to filter out the unimportant works so that the significant ones can be identified and thereby absorbed. An automated method could provide an objective approach for evaluating the significance of works on a universal scale. However, there have been few attempts at creating such a measure, and there are few “ground truths” for validating the effectiveness of potential metrics for significance. For movies, the US Library of Congress’s National Film Registry (NFR) contains American films that are “culturally, historically, or aesthetically significant” as chosen through a careful evaluation and deliberation process. By analyzing a network of citations between 15,425 United States-produced films procured from the Internet Movie Database (IMDb), we obtain several automated metrics for significance. The best of these metrics is able to indicate a film’s presence in the NFR at least as well or better than metrics based on aggregated expert opinions or large population surveys. Importantly, automated metrics can easily be applied to older films for which no other rating may be available. Our results may have implications for the evaluation of other creative works such as scientific research. PMID:25605881
Shaikh, M S; Moiz, B
2016-04-01
Around two-thirds of important clinical decisions about the management of patients are based on laboratory test results. Clinical laboratories are required to adopt quality control (QC) measures to ensure provision of accurate and precise results. Six sigma is a statistical tool, which provides opportunity to assess performance at the highest level of excellence. The purpose of this study was to assess performance of our hematological parameters on sigma scale in order to identify gaps and hence areas of improvement in patient care. Twelve analytes included in the study were hemoglobin (Hb), hematocrit (Hct), red blood cell count (RBC), mean corpuscular volume (MCV), red cell distribution width (RDW), total leukocyte count (TLC) with percentages of neutrophils (Neutr%) and lymphocytes (Lymph %), platelet count (Plt), mean platelet volume (MPV), prothrombin time (PT), and fibrinogen (Fbg). Internal quality control data and external quality assurance survey results were utilized for the calculation of sigma metrics for each analyte. Acceptable sigma value of ≥3 was obtained for the majority of the analytes included in the analysis. MCV, Plt, and Fbg achieved value of <3 for level 1 (low abnormal) control. PT performed poorly on both level 1 and 2 controls with sigma value of <3. Despite acceptable conventional QC tools, application of sigma metrics can identify analytical deficits and hence prospects for the improvement in clinical laboratories. © 2016 John Wiley & Sons Ltd.
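Sigma metrics of this kind are commonly computed as sigma = (TEa - |bias|) / CV, with the allowable total error TEa taken from quality specifications, bias from external quality assurance, and CV from internal QC. The sketch below uses that common formula with invented TEa, bias, and CV values, which are not the study's data.

```python
# Minimal sketch: sigma metric for laboratory analytes,
#   sigma = (TEa - |bias|) / CV,
# with TEa, bias, and CV all expressed in percent. Values below are hypothetical.

def sigma_metric(tea_pct, bias_pct, cv_pct):
    return (tea_pct - abs(bias_pct)) / cv_pct

analytes = {
    # analyte: (TEa %, bias %, CV %)  -- illustrative quality goals and QC results
    "Hemoglobin":       (7.0, 1.2, 1.5),
    "Platelet count":   (25.0, 4.0, 6.0),
    "Prothrombin time": (15.0, 5.0, 4.5),
}

for name, (tea, bias, cv) in analytes.items():
    s = sigma_metric(tea, bias, cv)
    verdict = "acceptable (>= 3)" if s >= 3 else "needs improvement (< 3)"
    print(f"{name}: sigma = {s:.1f} -> {verdict}")
```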
Performance metric comparison study for non-magnetic bi-stable energy harvesters
NASA Astrophysics Data System (ADS)
Udani, Janav P.; Wrigley, Cailin; Arrieta, Andres F.
2017-04-01
Energy harvesting employing non-linear systems offers considerable advantages over linear systems given the broadband resonant response which is favorable for applications involving diverse input vibrations. In this respect, the rich dynamics of bi-stable systems present a promising means for harvesting vibrational energy from ambient sources. Harvesters deriving their bi-stability from thermally induced stresses as opposed to magnetic forces are receiving significant attention as it reduces the need for ancillary components and allows for biocompatible constructions. However, the design of these bi-stable harvesters still requires further optimization to completely exploit the dynamic behavior of these systems. This study presents a comparison of the harvesting capabilities of non-magnetic, bi-stable composite laminates under variations in the design parameters as evaluated utilizing established power metrics. Energy output characteristics of two bi-stable composite laminate plates with a piezoelectric patch bonded on the top surface are experimentally investigated for variations in the thickness ratio and inertial mass positions for multiple load conditions. A particular design configuration is found to perform better over the entire range of testing conditions which include single and multiple frequency excitation, thus indicating that design optimization over the geometry of the harvester yields robust performance. The experimental analysis further highlights the need for appropriate design guidelines for optimization and holistic performance metrics to account for the range of operational conditions.
Metrics, The Measure of Your Future: Materials Evaluation Forms.
ERIC Educational Resources Information Center
Troy, Joan B.
Three evaluation forms are contained in this publication by the Winston-Salem/Forsyth Metric Education Project to be used in conjunction with their materials. They are: (1) Field-Test Materials Evaluation Form; (2) Student Materials Evaluation Form; and (3) Composite Materials Evaluation Form. The questions in these forms are phrased so they can…
2014-10-08
[Extraction residue from an audit report; only fragments are recoverable.] Bullet fragments note that suspected fraud was not reported, that time and material vouchers were excluded from a paid voucher review, and that a supervisor did not support an item (text truncated). Section headings include "Requirements for Identification of Potential Fraud" and "Audit Deficiencies and Performance Metrics". A further fragment refers to providing the contracting officer and DCAA auditor an incurred cost submission six months after the end of the contractor's fiscal year, and states that incurred cost audits are usually performed (text truncated).
An Examination of Diameter Density Prediction with k-NN and Airborne Lidar
Strunk, Jacob L.; Gould, Peter J.; Packalen, Petteri; ...
2017-11-16
While lidar-based forest inventory methods have been widely demonstrated, the performance of methods to predict tree diameters with airborne lidar (lidar) is not well understood. One cause for this is that the performance metrics typically used in studies for prediction of diameters can be difficult to interpret, and may not support comparative inferences between sampling designs and study areas. To help with this problem we propose two indices and use them to evaluate a variety of lidar and k nearest neighbor (k-NN) strategies for prediction of tree diameter distributions. The indices are based on the coefficient of determination (R²) and root mean square deviation (RMSD). Both of the indices are highly interpretable, and the RMSD-based index facilitates comparisons with alternative (non-lidar) inventory strategies and with projects in other regions. K-NN diameter distribution prediction strategies were examined using auxiliary lidar for 190 training plots distributed across the 800 km² Savannah River Site in South Carolina, USA. Finally, we evaluate the performance of k-NN with respect to distance metrics, number of neighbors, predictor sets, and response sets. K-NN and lidar explained 80% of the variability in diameters, and Mahalanobis distance with k = 3 neighbors performed best according to a number of criteria.
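A compact illustration of the approach, k-NN prediction with a Mahalanobis distance (k = 3) evaluated with RMSD and the coefficient of determination, is sketched below; the lidar predictors and diameters are synthetic stand-ins, and the two indices proposed in the paper are represented here only by their underlying R² and RMSD ingredients.

```python
# Minimal sketch: k-NN prediction of tree diameters from lidar metrics using a
# Mahalanobis distance (k = 3), scored with RMSD and R^2. Data are synthetic.
import numpy as np

rng = np.random.default_rng(4)
n_train, n_test = 190, 50
X_train = rng.normal(size=(n_train, 4))                        # lidar height/density metrics
y_train = 20 + 5 * X_train[:, 0] + rng.normal(0, 2, n_train)   # tree diameter (cm)
X_test = rng.normal(size=(n_test, 4))
y_test = 20 + 5 * X_test[:, 0] + rng.normal(0, 2, n_test)

VI = np.linalg.inv(np.cov(X_train, rowvar=False))              # inverse covariance

def knn_predict(x, k=3):
    # Squared Mahalanobis distance from x to every training plot.
    d2 = np.einsum("ij,jk,ik->i", X_train - x, VI, X_train - x)
    nearest = np.argsort(d2)[:k]
    return y_train[nearest].mean()

pred = np.array([knn_predict(x) for x in X_test])
rmsd = np.sqrt(np.mean((pred - y_test) ** 2))
r2 = 1 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"RMSD = {rmsd:.2f} cm, R^2 = {r2:.2f}")
```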
NASA Technical Reports Server (NTRS)
Tikidjian, Raffi; Mackey, Ryan
2008-01-01
The DSN Array Simulator (wherein 'DSN' signifies NASA's Deep Space Network) is an updated version of software previously denoted the DSN Receive Array Technology Assessment Simulation. This software is used for computational modeling of a proposed DSN facility comprising user-defined arrays of antennas and transmitting and receiving equipment for microwave communication with spacecraft on interplanetary missions. The simulation includes variations in the spacecraft tracked and changes in communication demand for up to several decades of future operation. Such modeling is performed to estimate facility performance, evaluate requirements that govern facility design, and evaluate proposed improvements in hardware and/or software. The updated version of this software affords enhanced capability for characterizing facility performance against user-defined mission sets. The software includes a Monte Carlo simulation component that enables rapid generation of key mission-set metrics (e.g., numbers of links, data rates, and data volumes), and statistical distributions thereof as functions of time. The updated version also offers expanded capability for mixed-asset network modeling--for example, for running scenarios that involve user-definable mixtures of antennas having different diameters (in contradistinction to a fixed number of antennas having the same fixed diameter). The improved version also affords greater simulation fidelity, sufficient for validation by comparison with actual DSN operations and analytically predictable performance metrics.
Situation Awareness and Workload Measures for SAFOR
NASA Technical Reports Server (NTRS)
DeMaio, Joe; Hart, Sandra G.; Allen, Ed (Technical Monitor)
1999-01-01
The present research was conducted in support of the NASA Safe All-Weather Flight Operations for Rotorcraft (SAFOR) program. The purpose of the work was to investigate the utility of two measurement tools developed by the British Defense Evaluation Research Agency. These tools were a subjective workload assessment scale, the DRA Workload Scale (DRAWS), and a situation awareness measurement tool in which the crew's self-evaluation of performance is compared against actual performance. These two measurement tools were evaluated in the context of a test of an innovative approach to alerting the crew by way of a helmet-mounted display. The DRAWS was found to be usable, but it offered no advantages over extant scales, and it had only limited resolution. The performance self-evaluation metric of situation awareness was found to be highly effective.
NASA Astrophysics Data System (ADS)
Ring, Christoph; Pollinger, Felix; Kaspar-Ott, Irena; Hertig, Elke; Jacobeit, Jucundus; Paeth, Heiko
2017-04-01
The COMEPRO project (Comparison of Metrics for Probabilistic Climate Change Projections of Mediterranean Precipitation), funded by the Deutsche Forschungsgemeinschaft (DFG), is dedicated to the development of new evaluation metrics for state-of-the-art climate models. Further, we analyze implications for probabilistic projections of climate change. This study focuses on the results of 4-field matrix metrics. Here, six different approaches are compared. We evaluate 24 models of the Coupled Model Intercomparison Project Phase 3 (CMIP3), 40 of CMIP5 and 18 of the Coordinated Regional Downscaling Experiment (CORDEX). In addition to the annual and seasonal precipitation, the mean temperature is analysed. We consider both the 50-year trend and the climatological mean for the second half of the 20th century. For the probabilistic projections of climate change, the A1B and A2 (CMIP3) and RCP4.5 and RCP8.5 (CMIP5, CORDEX) scenarios are used. The eight main study areas are located in the Mediterranean. However, we apply our metrics to globally distributed regions as well. The metrics show high simulation quality of temperature trend and both precipitation and temperature mean for most climate models and study areas. In addition, we find high potential for model weighting in order to reduce uncertainty. These results are in line with other accepted evaluation metrics and studies. The comparison of the different 4-field approaches reveals high correlations for most metrics. The results of the metric-weighted probability density functions of climate change are heterogeneous. We find for different regions and seasons both increases and decreases of uncertainty. The analysis of global study areas is consistent with the regional study areas of the Mediterranean.
Kramers, Matthew; Armstrong, Ryan; Bakhshmand, Saeed M; Fenster, Aaron; de Ribaupierre, Sandrine; Eagleson, Roy
2014-01-01
Image guidance can provide surgeons with valuable contextual information during a medical intervention. Often, image guidance systems require considerable infrastructure, setup-time, and operator experience to be utilized. Certain procedures performed at bedside are susceptible to navigational errors that can lead to complications. We present an application for mobile devices that can provide image guidance using augmented reality to assist in performing neurosurgical tasks. A methodology is outlined that evaluates this mode of visualization from the standpoint of perceptual localization, depth estimation, and pointing performance, in scenarios derived from a neurosurgical targeting task. By measuring user variability and speed we can report objective metrics of performance for our augmented reality guidance system.
Experimental Evaluation of High Performance Integrated Heat Pump
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, William A; Berry, Robert; Durfee, Neal
2016-01-01
Integrated heat pump (IHP) technology provides significant potential for energy savings and comfort improvement for residential buildings. In this study, we evaluate the performance of a high performance IHP that provides space heating, cooling, and water heating services. Experiments were conducted according to the ASHRAE Standard 206-2013, where 24 test conditions were identified in order to evaluate the IHP performance indices based on the airside performance. Empirical curve fits of the unit's compressor maps are used in conjunction with saturated condensing and evaporating refrigerant conditions to deduce the refrigerant mass flowrate, which, in turn, was used to evaluate the refrigerant-side performance as a check on the airside performance. Heat pump (compressor, fans, and controls) and water pump power were measured separately per requirements of Standard 206. The system was charged per the system manufacturer's specifications. System test results are presented for each operating mode. The overall IHP performance metrics are determined from the test results per the Standard 206 calculation procedures.
Coupled parametric design of flow control and duct shape
NASA Technical Reports Server (NTRS)
Florea, Razvan (Inventor); Bertuccioli, Luca (Inventor)
2009-01-01
A method for designing gas turbine engine components using a coupled parametric analysis of part geometry and flow control is disclosed. Included are the steps of parametrically defining the geometry of the duct wall shape, parametrically defining one or more flow control actuators in the duct wall, measuring a plurality of performance parameters or metrics (e.g., flow characteristics) of the duct and comparing the results of the measurement with desired or target parameters, and selecting the optimal duct geometry and flow control for at least a portion of the duct, the selection process including evaluating the plurality of performance metrics in a pareto analysis. The use of this method in the design of inter-turbine transition ducts, serpentine ducts, inlets, diffusers, and similar components provides a design which reduces pressure losses and flow profile distortions.
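The selection step can be illustrated with a minimal Pareto-front filter over a table of candidate designs; the two metrics (pressure loss and flow distortion) and their values below are hypothetical, and both are treated as quantities to minimize.

```python
# Minimal sketch: select Pareto-optimal candidate designs from a table of
# performance metrics in which every column is to be minimized. Values are invented.
import numpy as np

def pareto_front(costs):
    """Return indices of non-dominated rows (all metrics minimized)."""
    keep = []
    for i in range(costs.shape[0]):
        # Row i is dominated if some other row is <= in all metrics and < in at least one.
        dominated = np.any(
            np.all(costs <= costs[i], axis=1) & np.any(costs < costs[i], axis=1)
        )
        if not dominated:
            keep.append(i)
    return keep

# Columns: pressure loss coefficient, flow distortion index (lower is better for both).
candidates = np.array([
    [0.040, 0.30],
    [0.035, 0.45],
    [0.050, 0.20],
    [0.045, 0.35],   # dominated by the first candidate
])
print("Pareto-optimal candidate indices:", pareto_front(candidates))
```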
Creating "Intelligent" Ensemble Averages Using a Process-Based Framework
NASA Astrophysics Data System (ADS)
Baker, Noel; Taylor, Patrick
2014-05-01
The CMIP5 archive contains future climate projections from over 50 models provided by dozens of modeling centers from around the world. Individual model projections, however, are subject to biases created by structural model uncertainties. As a result, ensemble averaging of multiple models is used to add value to individual model projections and construct a consensus projection. Previous reports for the IPCC establish climate change projections based on an equal-weighted average of all model projections. However, individual models reproduce certain climate processes better than other models. Should models be weighted based on performance? Unequal ensemble averages have previously been constructed using a variety of mean state metrics. What metrics are most relevant for constraining future climate projections? This project develops a framework for systematically testing metrics in models to identify optimal metrics for unequal weighting multi-model ensembles. The intention is to produce improved ("intelligent") unequal-weight ensemble averages. A unique aspect of this project is the construction and testing of climate process-based model evaluation metrics. A climate process-based metric is defined as a metric based on the relationship between two physically related climate variables—e.g., outgoing longwave radiation and surface temperature. Several climate process metrics are constructed using high-quality Earth radiation budget data from NASA's Clouds and Earth's Radiant Energy System (CERES) instrument in combination with surface temperature data sets. It is found that regional values of tested quantities can vary significantly when comparing the equal-weighted ensemble average and an ensemble weighted using the process-based metric. Additionally, this study investigates the dependence of the metric weighting scheme on the climate state using a combination of model simulations including a non-forced preindustrial control experiment, historical simulations, and several radiative forcing Representative Concentration Pathway (RCP) scenarios. Ultimately, the goal of the framework is to advise better methods for ensemble averaging models and create better climate predictions.
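A minimal sketch of the weighting idea, with weights derived from a single skill metric (inverse RMSE against a reference field), is given below; the model fields, observations, and weighting rule are illustrative assumptions rather than the process-based metrics developed in the study.

```python
# Minimal sketch: an unequal-weight multi-model ensemble mean in which each model's
# weight comes from a skill metric (inverse RMSE vs. a reference). Data are synthetic.
import numpy as np

rng = np.random.default_rng(5)
obs = rng.normal(288.0, 1.0, size=(10, 10))                   # reference field (e.g., K)
models = [obs + rng.normal(0, sigma, obs.shape) for sigma in (0.5, 1.0, 3.0)]

rmse = np.array([np.sqrt(np.mean((m - obs) ** 2)) for m in models])
weights = (1.0 / rmse) / np.sum(1.0 / rmse)                    # skill-based weights

equal_mean = np.mean(models, axis=0)
weighted_mean = np.tensordot(weights, np.stack(models), axes=1)

print("Weights:", np.round(weights, 2))
print("RMSE of equal-weight mean   :", round(float(np.sqrt(np.mean((equal_mean - obs) ** 2))), 3))
print("RMSE of skill-weighted mean :", round(float(np.sqrt(np.mean((weighted_mean - obs) ** 2))), 3))
```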
Evaluation of a black-footed ferret resource utilization function model
Eads, D.A.; Millspaugh, J.J.; Biggins, D.E.; Jachowski, D.S.; Livieri, T.M.
2011-01-01
Resource utilization function (RUF) models permit evaluation of potential habitat for endangered species; ideally such models should be evaluated before use in management decision-making. We evaluated the predictive capabilities of a previously developed black-footed ferret (Mustela nigripes) RUF. Using the population-level RUF, generated from ferret observations at an adjacent yet distinct colony, we predicted the distribution of ferrets within a black-tailed prairie dog (Cynomys ludovicianus) colony in the Conata Basin, South Dakota, USA. We evaluated model performance using data collected during post-breeding spotlight surveys (2007-2008) by assessing model agreement via weighted compositional analysis and count-metrics. Compositional analysis of home range use and colony-level availability, and core area use and home range availability, demonstrated ferret selection of the predicted Very high and High occurrence categories in 2007 and 2008. Simple count-metrics corroborated these findings and suggested selection of the Very high category in 2007 and the Very high and High categories in 2008. Collectively, these results suggested that the RUF was useful in predicting occurrence and intensity of space use of ferrets at our study site, the 2 objectives of the RUF. Application of this validated RUF would increase the resolution of habitat evaluations, permitting prediction of the distribution of ferrets within distinct colonies. Additional model evaluation at other sites, on other black-tailed prairie dog colonies of varying resource configuration and size, would increase understanding of influences upon model performance and the general utility of the RUF. © 2011 The Wildlife Society.
Christoforou, Christoforos; Christou-Champi, Spyros; Constantinidou, Fofi; Theodorou, Maria
2015-01-01
Eye-tracking has been extensively used to quantify audience preferences in the context of marketing and advertising research, primarily in methodologies involving static images or stimuli (i.e., advertising, shelf testing, and website usability). However, these methodologies do not generalize to narrative-based video stimuli where a specific storyline is meant to be communicated to the audience. In this paper, a novel metric based on eye-gaze dispersion (both within and across viewings) that quantifies the impact of narrative-based video stimuli on the preferences of large audiences is presented. The metric is validated in predicting the performance of video advertisements aired during the 2014 Super Bowl final. In particular, the metric is shown to explain 70% of the variance in likeability scores of the 2014 Super Bowl ads as measured by the USA TODAY Ad-Meter. In addition, by comparing the proposed metric with Heart Rate Variability (HRV) indices, we have associated the metric with biological processes relating to attention allocation. The underlying idea behind the proposed metric suggests a shift in perspective when it comes to evaluating narrative-based video stimuli. In particular, it suggests that audience preferences on video are modulated by the level of viewers' lack of attention allocation. The proposed metric can be calculated on any narrative-based video stimuli (i.e., movie, narrative content, emotional content, etc.), and thus has the potential to facilitate the use of such stimuli in several contexts: prediction of audience preferences of movies, quantitative assessment of entertainment pieces, prediction of the impact of movie trailers, identification of group and individual differences in the study of attention-deficit disorders, and the study of desensitization to media violence. PMID:26029135
An annual plant growth proxy in the Mojave Desert using MODIS-EVI data
Wallace, C.S.A.; Thomas, K.A.
2008-01-01
In the arid Mojave Desert, the phenological response of vegetation is largely dependent upon the timing and amount of rainfall, and maps of annual plant cover at any one point in time can vary widely. Our study developed relative annual plant growth models as proxies for annual plant cover using metrics that captured phenological variability in Moderate-Resolution Imaging Spectroradiometer (MODIS) Enhanced Vegetation Index (EVI) satellite images. We used landscape phenologies revealed in MODIS data together with ecological knowledge of annual plant seasonality to develop a suite of metrics to describe annual growth on a yearly basis. Each of these metrics was applied to temporally-composited MODIS-EVI images to develop a relative model of annual growth. Each model was evaluated by testing how well it predicted field estimates of annual cover collected during 2003 and 2005 at the Mojave National Preserve. The best performing metric was the spring difference metric, which compared the average of three spring MODIS-EVI composites of a given year to that of 2002, a year of record drought. The spring difference metric showed correlations with annual plant cover of R² = 0.61 for 2005 and R² = 0.47 for 2003. Although the correlation is moderate, we consider it supportive given the characteristics of the field data, which were collected for a different study in a localized area and are not ideal for calibration to MODIS pixels. A proxy for annual growth potential was developed from the spring difference metric of 2005 for use as an environmental data layer in desert tortoise habitat modeling. The application of the spring difference metric to other imagery years presents potential for other applications such as fuels, invasive species, and dust-emission monitoring in the Mojave Desert.
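A minimal sketch of a spring-difference style calculation, the mean of three spring EVI composites for a target year minus the same average for a 2002 drought baseline, is shown below; the EVI arrays and their values are synthetic stand-ins for MODIS composites.

```python
# Minimal sketch: a spring-difference style annual growth proxy computed from
# synthetic EVI composites. Array sizes and EVI levels are invented placeholders.
import numpy as np

rng = np.random.default_rng(6)
shape = (100, 100)                                    # pixels of a tile subset

def spring_composites(mean_evi):
    """Three spring 16-day EVI composites with some spatial noise."""
    return np.stack([mean_evi + rng.normal(0, 0.02, shape) for _ in range(3)])

evi_2002 = spring_composites(np.full(shape, 0.10))    # record-drought baseline year
evi_target = spring_composites(np.full(shape, 0.18))  # wetter year, more annual growth

spring_difference = evi_target.mean(axis=0) - evi_2002.mean(axis=0)
print("Mean spring difference:", round(float(spring_difference.mean()), 3))
print("Fraction of pixels with positive growth signal:",
      round(float((spring_difference > 0).mean()), 3))
```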
Aslan, Kerim; Gunbey, Hediye Pinar; Tomak, Leman; Ozmen, Zafer; Incesu, Lutfi
The aim of this study was to investigate whether the use of combination quantitative metrics (mamillopontine distance [MPD], pontomesencephalic angle, and mesencephalon anterior-posterior/medial-lateral diameter ratios) with qualitative signs (dural enhancement, subdural collections/hematoma, venous engorgement, pituitary gland enlargements, and tonsillar herniations) provides a more accurate diagnosis of intracranial hypotension (IH). The quantitative metrics and qualitative signs of 34 patients and 34 control subjects were assessed by 2 independent observers. Receiver operating characteristic (ROC) curve was used to evaluate the diagnostic performance of quantitative metrics and qualitative signs, and for the diagnosis of IH, optimum cutoff values of quantitative metrics were found with ROC analysis. Combined ROC curve was measured for the quantitative metrics, and qualitative signs combinations in determining diagnostic accuracy and sensitivity, specificity, and positive and negative predictive values were found, and the best model combination was formed. Whereas MPD and pontomesencephalic angle were significantly lower in patients with IH when compared with the control group (P < 0.001), mesencephalon anterior-posterior/medial-lateral diameter ratio was significantly higher (P < 0.001). For qualitative signs, the highest individual distinctive power was dural enhancement with area under the ROC curve (AUC) of 0.838. For quantitative metrics, the highest individual distinctive power was MPD with AUC of 0.947. The best accuracy in the diagnosis of IH was obtained by combination of dural enhancement, venous engorgement, and MPD with an AUC of 1.00. This study showed that the combined use of dural enhancement, venous engorgement, and MPD had diagnostic accuracy of 100 % for the diagnosis of IH. Therefore, a more accurate IH diagnosis can be provided with combination of quantitative metrics with qualitative signs.
Implementing the Data Center Energy Productivity Metric in a High Performance Computing Data Center
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sego, Landon H.; Marquez, Andres; Rawson, Andrew
2013-06-30
As data centers proliferate in size and number, the improvement of their energy efficiency and productivity has become an economic and environmental imperative. Making these improvements requires metrics that are robust, interpretable, and practical. We discuss the properties of a number of the proposed metrics of energy efficiency and productivity. In particular, we focus on the Data Center Energy Productivity (DCeP) metric, which is the ratio of useful work produced by the data center to the energy consumed performing that work. We describe our approach for using DCeP as the principal outcome of a designed experiment using a highly instrumented, high-performance computing data center. We found that DCeP was successful in clearly distinguishing different operational states in the data center, thereby validating its utility as a metric for identifying configurations of hardware and software that would improve energy productivity. We also discuss some of the challenges and benefits associated with implementing the DCeP metric, and we examine the efficacy of the metric in making comparisons within a data center and between data centers.
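The DCeP ratio itself is straightforward to compute once "useful work" has been defined for a facility; the sketch below uses invented task counts, task weights, and energy figures, since the definition of useful work is site-specific and not prescribed here.

```python
# Minimal sketch: Data Center Energy Productivity (DCeP) as the ratio of useful
# work produced to the energy consumed while producing it. All figures are invented.

def dcep(useful_work_units, energy_kwh):
    return useful_work_units / energy_kwh

# Hypothetical assessment window: completed tasks weighted by their assigned value.
tasks_completed = {"simulation_run": 120, "data_analysis_job": 300}
task_weights    = {"simulation_run": 5.0, "data_analysis_job": 1.0}

useful_work = sum(task_weights[t] * n for t, n in tasks_completed.items())
energy_consumed_kwh = 850.0    # total facility energy over the same window

print(f"DCeP = {dcep(useful_work, energy_consumed_kwh):.2f} work units per kWh")
```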
Garcia Castro, Leyla Jael; Berlanga, Rafael; Garcia, Alexander
2015-10-01
Although full-text articles are provided by the publishers in electronic formats, it remains a challenge to find related work beyond the title and abstract context. Identifying related articles based on their abstract is indeed a good starting point; this process is straightforward and does not consume as many resources as full-text based similarity would require. However, further analyses may require in-depth understanding of the full content. Two articles with highly related abstracts can be substantially different regarding the full content. How similarity differs when considering title-and-abstract versus full-text and which semantic similarity metric provides better results when dealing with full-text articles are the main issues addressed in this manuscript. We have benchmarked three similarity metrics - BM25, PMRA, and Cosine, in order to determine which one performs best when using concept-based annotations on full-text documents. We also evaluated variations in similarity values based on title-and-abstract against those relying on full-text. Our test dataset comprises the Genomics track article collection from the 2005 Text Retrieval Conference. Initially, we used an entity recognition software to semantically annotate titles and abstracts as well as full-text with concepts defined in the Unified Medical Language System (UMLS®). For each article, we created a document profile, i.e., a set of identified concepts, term frequency, and inverse document frequency; we then applied various similarity metrics to those document profiles. We considered correlation, precision, recall, and F1 in order to determine which similarity metric performs best with concept-based annotations. For those full-text articles available in PubMed Central Open Access (PMC-OA), we also performed dispersion analyses in order to understand how similarity varies when considering full-text articles. We have found that the PubMed Related Articles similarity metric is the most suitable for full-text articles annotated with UMLS concepts. For similarity values above 0.8, all metrics exhibited an F1 around 0.2 and a recall around 0.1; BM25 showed the highest precision close to 1; in all cases the concept-based metrics performed better than the word-stem-based one. Our experiments show that similarity values vary when considering only title-and-abstract versus full-text similarity. Therefore, analyses based on full-text become useful when a given research requires going beyond title and abstract, particularly regarding connectivity across articles. Visualization available at ljgarcia.github.io/semsim.benchmark/, data available at http://dx.doi.org/10.5281/zenodo.13323. Copyright © 2015 Elsevier Inc. All rights reserved.
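A minimal sketch of concept-based cosine similarity, tf-idf weighted concept profiles compared with the cosine measure, is given below; the UMLS-style concept identifiers, frequencies, and document counts are invented, and the PMRA and BM25 metrics benchmarked in the study are not reproduced here.

```python
# Minimal sketch: cosine similarity between two document "concept profiles",
# i.e., concept identifiers weighted by tf-idf. All identifiers and counts are invented.
import math

def tfidf_profile(concept_counts, doc_freq, n_docs):
    return {c: tf * math.log(n_docs / doc_freq.get(c, 1))
            for c, tf in concept_counts.items()}

def cosine(p, q):
    common = set(p) & set(q)
    num = sum(p[c] * q[c] for c in common)
    den = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return num / den if den else 0.0

doc_freq = {"C0000001": 900, "C0000002": 120, "C0000003": 400}   # documents mentioning each concept
n_docs = 10_000

article_a = tfidf_profile({"C0000002": 8, "C0000003": 3}, doc_freq, n_docs)
article_b = tfidf_profile({"C0000002": 5, "C0000001": 2}, doc_freq, n_docs)
print(f"Concept-based cosine similarity: {cosine(article_a, article_b):.3f}")
```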
Benchmark data sets for structure-based computational target prediction.
Schomburg, Karen T; Rarey, Matthias
2014-08-25
Structure-based computational target prediction methods identify potential targets for a bioactive compound. Methods based on protein-ligand docking so far face many challenges, of which the greatest is probably the ranking of true targets in a large data set of protein structures. Currently, no standard data sets for evaluation exist, rendering comparison and demonstration of improvements of methods cumbersome. Therefore, we propose two data sets and evaluation strategies for a meaningful evaluation of new target prediction methods, i.e., a small data set consisting of three target classes for detailed proof-of-concept and selectivity studies and a large data set consisting of 7992 protein structures and 72 drug-like ligands allowing statistical evaluation with performance metrics on a drug-like chemical space. Both data sets are built from openly available resources, and any information needed to perform the described experiments is reported. We describe the composition of the data sets, the setup of screening experiments, and the evaluation strategy. Performance metrics capable of measuring early recognition of enrichment, such as AUC, BEDROC, and NSLR, are proposed. We apply a sequence-based target prediction method to the large data set to analyze its content of nontrivial evaluation cases. The proposed data sets are used for method evaluation of our new inverse screening method iRAISE. The small data set reveals the method's capability and limitations in selectively distinguishing between rather similar protein structures. The large data set simulates real target identification scenarios. iRAISE achieves excellent or good enrichment in 55% of cases, a median AUC of 0.67, and RMSDs below 2.0 Å for 74% of cases, and it predicted the first true target within the top 2% of the protein data set of about 8000 structures in 59 out of 72 cases.
McHugh, J E; Kenny, R A; Lawlor, B A; Steptoe, A; Kee, F
2017-06-01
Scant evidence is available on the discordance between loneliness and social isolation among older adults. We aimed to investigate this discordance and any health implications that it may have. Using nationally representative datasets from ageing cohorts in Ireland (TILDA) and England (ELSA), we created a metric of discordance between loneliness and social isolation, to which we refer as Social Asymmetry. This metric was the categorised difference between standardised scores on a scale of loneliness and a scale of social isolation, giving categories of: Concordantly Lonely and Isolated, Discordant: Robust to Loneliness, or Discordant: Susceptible to Loneliness. We used regression and multilevel modelling to identify potential relationships between Social Asymmetry and cognitive outcomes. Social Asymmetry predicted cognitive outcomes cross-sectionally and at a two-year follow-up, such that Discordant: Robust to Loneliness individuals were superior performers, but we failed to find evidence for Social Asymmetry as a predictor of cognitive trajectory over time. We present a new metric and preliminary evidence of a relationship with clinical outcomes. Further research validating this metric in different populations, and evaluating its relationship with other outcomes, is warranted. Copyright © 2016 John Wiley & Sons, Ltd.
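A minimal sketch of a social-asymmetry style score, the categorised difference between standardised loneliness and isolation scores, is given below; the scale values, sample, and the ±1 SD category cut-offs are illustrative assumptions rather than the TILDA/ELSA definitions used in the study.

```python
# Minimal sketch: a "social asymmetry" style score as the categorised difference
# between standardised loneliness and social isolation scores. Data and the
# +/- 1 SD cut-offs are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
loneliness = rng.normal(3.0, 1.2, 500)          # stand-in loneliness scale scores
isolation = rng.normal(10.0, 4.0, 500)          # stand-in social isolation index

z_lonely = (loneliness - loneliness.mean()) / loneliness.std()
z_isolated = (isolation - isolation.mean()) / isolation.std()
asymmetry = z_lonely - z_isolated               # positive: lonelier than isolated

def categorise(a):
    if a <= -1.0:
        return "Discordant: Robust to Loneliness"
    if a >= 1.0:
        return "Discordant: Susceptible to Loneliness"
    return "Concordant"

categories = [categorise(a) for a in asymmetry]
for label in sorted(set(categories)):
    print(f"{label}: {categories.count(label)} participants")
```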
Evaluating synoptic systems in the CMIP5 climate models over the Australian region
NASA Astrophysics Data System (ADS)
Gibson, Peter B.; Uotila, Petteri; Perkins-Kirkpatrick, Sarah E.; Alexander, Lisa V.; Pitman, Andrew J.
2016-10-01
Climate models are our principal tool for generating the projections used to inform climate change policy. Our confidence in projections depends, in part, on how realistically they simulate present day climate and associated variability over a range of time scales. Traditionally, climate models are less commonly assessed at time scales relevant to daily weather systems. Here we explore the utility of a self-organizing maps (SOMs) procedure for evaluating the frequency, persistence and transitions of daily synoptic systems in the Australian region simulated by state-of-the-art global climate models. In terms of skill in simulating the climatological frequency of synoptic systems, large spread was observed between models. A positive association between all metrics was found, implying that relative skill in simulating the persistence and transitions of systems is related to skill in simulating the climatological frequency. Considering all models and metrics collectively, model performance was found to be related to model horizontal resolution but unrelated to vertical resolution or representation of the stratosphere. In terms of the SOM procedure, the timespan over which evaluation was performed had some influence on model performance skill measures, as did the number of circulation types examined. These findings have implications for selecting models most useful for future projections over the Australian region, particularly for projections related to synoptic scale processes and phenomena. More broadly, this study has demonstrated the utility of the SOMs procedure in providing a process-based evaluation of climate models.
Impact of region contouring variability on image-based focal therapy evaluation
NASA Astrophysics Data System (ADS)
Gibson, Eli; Donaldson, Ian A.; Shah, Taimur T.; Hu, Yipeng; Ahmed, Hashim U.; Barratt, Dean C.
2016-03-01
Motivation: Focal therapy is an emerging low-morbidity treatment option for low-intermediate risk prostate cancer; however, challenges remain in accurately delivering treatment to specified targets and determining treatment success. Registered multi-parametric magnetic resonance imaging (MPMRI) acquired before and after treatment can support focal therapy evaluation and optimization; however, contouring variability, when defining the prostate, the clinical target volume (CTV) and the ablation region in images, reduces the precision of quantitative image-based focal therapy evaluation metrics. To inform the interpretation and clarify the limitations of such metrics, we investigated inter-observer contouring variability and its impact on four metrics. Methods: Pre-therapy and 2-week-post-therapy standard-of-care MPMRI were acquired from 5 focal cryotherapy patients. Two clinicians independently contoured, on each slice, the prostate (pre- and post-treatment) and the dominant index lesion CTV (pre-treatment) in the T2-weighted MRI, and the ablated region (post-treatment) in the dynamic contrast-enhanced MRI. For each combination of clinician contours, post-treatment images were registered to pre-treatment images using a 3D biomechanical-model-based registration of prostate surfaces, and four metrics were computed: the proportion of the target tissue region that was ablated and the target:ablated region volume ratio for each of two targets (the CTV and an expanded planning target volume). Variance components analysis was used to measure the contribution of each type of contour to the variance in the therapy evaluation metrics. Conclusions: 14-23% of evaluation metric variance was attributable to contouring variability (including 6-12% from ablation region contouring); reducing this variability could improve the precision of focal therapy evaluation metrics.
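The two evaluation metrics named above (proportion of the target region ablated, and the target:ablated volume ratio) can be computed directly from co-registered binary masks on a common voxel grid. The sketch below is illustrative; the function name, voxel size and toy masks are assumptions, not the study's implementation or data.

```python
# Hedged sketch: image-based focal therapy evaluation metrics from binary masks.
import numpy as np

def ablation_metrics(target_mask, ablation_mask, voxel_volume_mm3=1.0):
    """Coverage and volume-ratio metrics for a target region and an ablation region."""
    target = target_mask.astype(bool)
    ablated = ablation_mask.astype(bool)
    covered = np.logical_and(target, ablated).sum()
    return {
        "proportion_target_ablated": covered / target.sum(),      # fraction of target ablated
        "target_to_ablated_volume_ratio": target.sum() / ablated.sum(),
        "target_volume_mm3": target.sum() * voxel_volume_mm3,
        "ablated_volume_mm3": ablated.sum() * voxel_volume_mm3,
    }

# Toy 3D masks: a cubic target partly covered by an offset ablation region.
target = np.zeros((20, 20, 20), dtype=bool)
target[5:10, 5:10, 5:10] = True
ablation = np.zeros((20, 20, 20), dtype=bool)
ablation[6:12, 6:12, 6:12] = True
print(ablation_metrics(target, ablation))
```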
ERIC Educational Resources Information Center
Ford, Jeremy W.; Missall, Kristen N.; Hosp, John L.; Kuhle, Jennifer L.
2016-01-01
Advances in maze selection curriculum-based measurement have led to several published tools with technical information for interpretation (e.g., norms, benchmarks, cut-scores, classification accuracy) that have increased their usefulness for universal screening. A range of scoring practices have emerged for evaluating student performance on maze…
The Changing Functions of Citation: From Knowledge Networking to Academic Cash-Value
ERIC Educational Resources Information Center
Burbules, Nicholas C.
2015-01-01
This essay reviews the changing functions, and effects, of citation systems in scholarly research as they move from a range of uses primarily oriented around knowledge networking and epistemic validation, to their use as a set of metrics oriented around evaluating and rewarding certain kinds of academic performance (e.g. "impact…
Using Green Star Metrics to Optimize the Greenness of Literature Protocols for Syntheses
ERIC Educational Resources Information Center
Duarte, Rita C. C.; Ribeiro, M. Gabriela T. C.; Machado, Adélio A. S. C.
2015-01-01
A procedure to improve the greenness of a synthesis, without performing laboratory work, using alternative protocols available in the literature is presented. The greenness evaluation involves the separate assessment of the different steps described in the available protocols--reaction, isolation, and purification--as well as the global process,…