Sample records for benchmark test set

  1. Validation of tsunami inundation model TUNA-RP using OAR-PMEL-135 benchmark problem set

    NASA Astrophysics Data System (ADS)

    Koh, H. L.; Teh, S. Y.; Tan, W. K.; Kh'ng, X. Y.

    2017-05-01

    A standard set of benchmark problems, known as OAR-PMEL-135, was developed by the US National Tsunami Hazard Mitigation Program for tsunami inundation model validation. Any tsunami inundation model must be tested for accuracy and capability against this standard set of benchmark problems before it can be gainfully used for inundation simulation. The authors have previously developed an in-house tsunami inundation model known as TUNA-RP, which solves the two-dimensional nonlinear shallow water equations coupled with a wet-dry moving-boundary algorithm. This paper presents the validation of TUNA-RP against the solutions provided in the OAR-PMEL-135 benchmark problem set. This validation testing shows that TUNA-RP can indeed perform inundation simulation with accuracy consistent with the benchmark solutions.
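    The class of scheme such an inundation model builds on can be illustrated with a heavily simplified 1-D sketch; the Lax-Friedrichs scheme, grid, and wet/dry tolerance below are illustrative assumptions, not TUNA-RP's actual method.

```python
import numpy as np

# Toy 1-D nonlinear shallow-water step (Lax-Friedrichs, periodic
# boundaries) with a crude wet/dry tolerance. Illustrative only; not
# the authors' scheme or wet-dry moving-boundary algorithm.
g, DRY = 9.81, 1e-6  # gravity; depth below which a cell counts as dry

def flux(h, hu):
    u = np.where(h > DRY, hu / np.maximum(h, DRY), 0.0)
    return hu, hu * u + 0.5 * g * h**2

def step(h, hu, dx, dt):
    f1, f2 = flux(h, hu)
    lf = lambda q, f: (0.5 * (np.roll(q, 1) + np.roll(q, -1))
                       - dt / (2 * dx) * (np.roll(f, -1) - np.roll(f, 1)))
    h, hu = lf(h, f1), lf(hu, f2)
    hu = np.where(h > DRY, hu, 0.0)  # dry cells carry no momentum
    return np.maximum(h, 0.0), hu

x = np.linspace(0.0, 10.0, 200)
h = 1.0 + 0.1 * np.exp(-((x - 5.0) ** 2))  # initial hump of water
hu = np.zeros_like(x)
for _ in range(100):
    h, hu = step(h, hu, dx=x[1] - x[0], dt=0.005)
```

    With periodic boundaries the scheme conserves total water volume, which is one of the basic sanity checks a benchmark validation would exercise.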

  2. Test One to Test Many: A Unified Approach to Quantum Benchmarks

    NASA Astrophysics Data System (ADS)

    Bai, Ge; Chiribella, Giulio

    2018-04-01

    Quantum benchmarks are routinely used to validate the experimental demonstration of quantum information protocols. Many relevant protocols, however, involve an infinite set of input states, of which only a finite subset can be used to test the quality of the implementation. This is a problem, because the benchmark for the finitely many states used in the test can be higher than the original benchmark calculated for infinitely many states. This situation arises in the teleportation and storage of coherent states, for which the benchmark of 50% fidelity is commonly used in experiments, although finite sets of coherent states normally lead to higher benchmarks. Here, we show that the average fidelity over all coherent states can be indirectly probed with a single setup, requiring only two-mode squeezing, a 50-50 beam splitter, and homodyne detection. Our setup enables a rigorous experimental validation of quantum teleportation, storage, amplification, attenuation, and purification of noisy coherent states. More generally, we prove that every quantum benchmark can be tested by preparing a single entangled state and measuring a single observable.

  3. Method and system for benchmarking computers

    DOEpatents

    Gustafson, John L.

    1993-09-14

    A testing system and method for benchmarking computer systems. The system includes a store containing a scalable set of tasks to be performed to produce a solution in ever-increasing degrees of resolution as a larger number of the tasks are performed. A timing and control module allots to each computer a fixed benchmarking interval in which to perform the stored tasks. Means are provided for determining, after completion of the benchmarking interval, the degree of progress through the scalable set of tasks and for producing a benchmarking rating relating to the degree of progress for each computer.
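    The fixed-time idea described above can be sketched minimally: give each machine the same time budget and report how far through a scalable task list it gets. The toy task list below is a stand-in, not the patented system.

```python
import time

# A minimal sketch of fixed-time benchmarking: run scalable tasks for a
# fixed interval and report the degree of progress as the rating.
def fixed_time_benchmark(tasks, interval_s):
    deadline = time.monotonic() + interval_s
    completed = 0
    for task in tasks:
        if time.monotonic() >= deadline:
            break
        task()            # perform the next unit of work
        completed += 1
    return completed      # rating = degree of progress in the interval

# Each "task" refines a trivial computation one step further.
tasks = [lambda i=i: sum(range(i)) for i in range(1, 10000)]
rating = fixed_time_benchmark(tasks, interval_s=0.1)
```

    Unlike fixed-work benchmarks, the rating here scales naturally: faster machines complete more of the ever-finer tasks in the same interval.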

  4. Elementary School Students' Science Talk Ability in Inquiry-Oriented Settings in Taiwan: Test Development, Verification, and Performance Benchmarks

    ERIC Educational Resources Information Center

    Lin, Sheau-Wen; Liu, Yu; Chen, Shin-Feng; Wang, Jing-Ru; Kao, Huey-Lien

    2016-01-01

    The purpose of this study was to develop a computer-based measure of elementary students' science talk and to report students' benchmarks. The development procedure had three steps: defining the framework of the test, collecting and identifying key reference sets of science talk, and developing and verifying the science talk instrument. The…

  5. Experimental Data from the Benchmark SuperCritical Wing Wind Tunnel Test on an Oscillating Turntable

    NASA Technical Reports Server (NTRS)

    Heeg, Jennifer; Piatak, David J.

    2013-01-01

    The Benchmark SuperCritical Wing (BSCW) wind tunnel model served as a semi-blind test case for the 2012 AIAA Aeroelastic Prediction Workshop (AePW). The BSCW was chosen as a test case due to its geometric simplicity and flow physics complexity. The data sets examined include unforced system information and forced pitching oscillations. The aerodynamic challenges presented by this AePW test case include a strong shock that was observed to be unsteady even for the unforced system cases, shock-induced separation, and trailing-edge separation. The current paper quantifies these characteristics at the AePW test condition and at a suggested benchmarking test condition. General characteristics of the model's behavior are examined for the entire available data set.

  6. Accumulo/Hadoop, MongoDB, and Elasticsearch Performance for Semi Structured Intrusion Detection (IDS) Data

    DTIC Science & Technology

    2016-11-01

    [Abstract not fully recovered.] The surviving fragments indicate that the report evaluates these data stores using the Yahoo! Cloud Serving Benchmark (YCSB), a freely available data loading and performance testing framework.

  7. Benchmark dataset for undirected and Mixed Capacitated Arc Routing Problems under Time restrictions with Intermediate Facilities.

    PubMed

    Willemse, Elias J; Joubert, Johan W

    2016-09-01

    In this article we present benchmark datasets for the Mixed Capacitated Arc Routing Problem under Time restrictions with Intermediate Facilities (MCARPTIF). The problem is a generalisation of the Capacitated Arc Routing Problem (CARP), and closely represents waste collection routing. Four different test sets are presented, each consisting of multiple instance files, which can be used to benchmark different solution approaches for the MCARPTIF. An in-depth description of the datasets can be found in "Constructive heuristics for the Mixed Capacity Arc Routing Problem under Time Restrictions with Intermediate Facilities" (Willemse and Joubert, 2016) [2] and "Splitting procedures for the Mixed Capacitated Arc Routing Problem under Time restrictions with Intermediate Facilities" (Willemse and Joubert, in press) [4]. The datasets are publicly available from "Library of benchmark test sets for variants of the Capacitated Arc Routing Problem under Time restrictions with Intermediate Facilities" (Willemse and Joubert, 2016) [3].

  8. Benchmark problems for numerical implementations of phase field models

    DOE PAGES

    Jokisaari, A. M.; Voorhees, P. W.; Guyer, J. E.; ...

    2016-10-01

    Here, we present the first set of benchmark problems for phase field models that are being developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST). While many scientific research areas use a limited set of well-established software, the growing phase field community continues to develop a wide variety of codes and lacks benchmark problems to consistently evaluate the numerical performance of new implementations. Phase field modeling has become significantly more popular as computational power has increased and is now becoming mainstream, driving the need for benchmark problems to validate and verify new implementations. We follow the example set by the micromagnetics community to develop an evolving set of benchmark problems that test the usability, computational resources, numerical capabilities and physical scope of phase field simulation codes. In this paper, we propose two benchmark problems that cover the physics of solute diffusion and growth and coarsening of a second phase via a simple spinodal decomposition model and a more complex Ostwald ripening model. We demonstrate the utility of benchmark problems by comparing the results of simulations performed with two different adaptive time stepping techniques, and we discuss the needs of future benchmark problems. The development of benchmark problems will enable the results of quantitative phase field models to be confidently incorporated into integrated computational materials science and engineering (ICME), an important goal of the Materials Genome Initiative.
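    The spinodal-decomposition physics that the first benchmark problem targets can be sketched with a toy 1-D Cahn-Hilliard evolution; the grid size, parameters, and explicit scheme below are illustrative choices, not the CHiMaD/NIST benchmark specification.

```python
import numpy as np

# Toy 1-D Cahn-Hilliard step: conservative dynamics driven by a
# chemical potential, the minimal model behind spinodal decomposition.
def laplacian(f, dx):
    return (np.roll(f, 1) - 2.0 * f + np.roll(f, -1)) / dx**2

def cahn_hilliard_step(c, dt=1e-5, dx=0.1, kappa=0.01):
    mu = c**3 - c - kappa * laplacian(c, dx)   # chemical potential
    return c + dt * laplacian(mu, dx)          # conservative update

rng = np.random.default_rng(0)
c = 0.01 * rng.standard_normal(128)   # small fluctuations about c = 0
for _ in range(200):
    c = cahn_hilliard_step(c)
```

    Because the update is the Laplacian of a potential, the mean composition is conserved; checking such invariants under different time steppers is exactly what these benchmarks are for.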

  9. Open Rotor - Analysis of Diagnostic Data

    NASA Technical Reports Server (NTRS)

    Envia, Edmane

    2011-01-01

    NASA is researching open rotor propulsion as part of its technology research and development plan for addressing the subsonic transport aircraft noise, emission and fuel burn goals. The low-speed wind tunnel test for investigating the aerodynamic and acoustic performance of a benchmark blade set at the approach and takeoff conditions has recently concluded. A high-speed wind tunnel diagnostic test campaign has begun to investigate the performance of this benchmark open rotor blade set at the cruise condition. Databases from both speed regimes will comprise a comprehensive collection of benchmark open rotor data for use in assessing/validating aerodynamic and noise prediction tools (component & system level) as well as providing insights into the physics of open rotors to help guide the development of quieter open rotors.

  10. SkData: data sets and algorithm evaluation protocols in Python

    NASA Astrophysics Data System (ADS)

    Bergstra, James; Pinto, Nicolas; Cox, David D.

    2015-01-01

    Machine learning benchmark data sets come in all shapes and sizes, whereas classification algorithms assume sanitized input, such as (x, y) pairs with vector-valued input x and integer class label y. Researchers and practitioners know all too well how tedious it can be to get from the URL of a new data set to a NumPy ndarray suitable for e.g. pandas or sklearn. The SkData library handles that work for a growing number of benchmark data sets (small and large) so that one-off in-house scripts for downloading and parsing data sets can be replaced with library code that is reliable, community-tested, and documented. The SkData library also introduces an open-ended formalization of training and testing protocols that facilitates direct comparison with published research. This paper describes the usage and architecture of the SkData library.
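    The kind of protocol a library like SkData formalizes can be sketched as follows; the class and method names here are illustrative, not SkData's actual API.

```python
import numpy as np

# Hypothetical sketch: a dataset object that hides download/parsing and
# exposes sanitized (x, y) arrays under a fixed, reproducible split.
class ToyDataset:
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.x = rng.normal(size=(100, 4))       # vector-valued inputs
        self.y = (self.x[:, 0] > 0).astype(int)  # integer class labels

    def classification_task(self):
        """Return (x_train, y_train, x_test, y_test) under a fixed split."""
        return self.x[:80], self.y[:80], self.x[80:], self.y[80:]

x_tr, y_tr, x_te, y_te = ToyDataset().classification_task()
```

    Fixing the split inside the dataset object, rather than in each user's script, is what makes results directly comparable across papers.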

  11. Benchmark Testing of a New 56Fe Evaluation for Criticality Safety Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leal, Luiz C; Ivanov, E.

    2015-01-01

    The SAMMY code was used to evaluate resonance parameters of the 56Fe cross section in the resolved resonance energy range of 0–2 MeV using transmission data, capture, elastic, inelastic, and double differential elastic cross sections. The resonance analysis was performed with the code SAMMY that fits R-matrix resonance parameters using the generalized least-squares technique (Bayes’ theory). The evaluation yielded a set of resonance parameters that reproduced the experimental data very well, along with a resonance parameter covariance matrix for data uncertainty calculations. Benchmark tests were conducted to assess the evaluation performance in benchmark calculations.
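    The generalized least-squares (Bayes) update behind such a resonance evaluation has a compact textbook form; the sketch below shows that form only, not SAMMY's actual implementation.

```python
import numpy as np

# Prior parameters P with covariance M are updated against data D
# (covariance V) through the sensitivity matrix G.
def gls_update(P, M, G, D, V):
    K = M @ G.T @ np.linalg.inv(G @ M @ G.T + V)   # gain matrix
    P_new = P + K @ (D - G @ P)                    # updated parameters
    M_new = M - K @ G @ M                          # updated covariance
    return P_new, M_new

# One scalar update: prior 0 +/- 1 against a measurement 0.5 +/- 1.
P_new, M_new = gls_update(np.array([0.0]), np.eye(1),
                          np.eye(1), np.array([0.5]), np.eye(1))
```

    The shrinking covariance M_new is what provides the resonance parameter covariance matrix mentioned in the abstract.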

  12. Aeroelasticity Benchmark Assessment: Subsonic Fixed Wing Program

    NASA Technical Reports Server (NTRS)

    Florance, Jennifer P.; Chwalowski, Pawel; Wieseman, Carol D.

    2010-01-01

    The fundamental technical challenge in computational aeroelasticity is the accurate prediction of unsteady aerodynamic phenomena and the effect on the aeroelastic response of a vehicle. Currently, a benchmarking standard for use in validating the accuracy of computational aeroelasticity codes does not exist. Many aeroelastic data sets have been obtained in wind-tunnel and flight testing throughout the world; however, none have been globally presented or accepted as an ideal data set. There are numerous reasons for this. One reason is that often, such aeroelastic data sets focus on the aeroelastic phenomena alone (flutter, for example) and do not contain associated information such as unsteady pressures and time-correlated structural dynamic deflections. Other available data sets focus solely on the unsteady pressures and do not address the aeroelastic phenomena. Other discrepancies can include omission of relevant data, such as flutter frequency, and/or the acquisition of only qualitative deflection data. In addition to these content deficiencies, all of the available data sets present both experimental and computational technical challenges. Experimental issues include facility influences, nonlinearities beyond those being modeled, and data processing. From the computational perspective, technical challenges include modeling geometric complexities, coupling between the flow and the structure, grid issues, and boundary conditions. The Aeroelasticity Benchmark Assessment task seeks to examine the existing potential experimental data sets and ultimately choose the one that is viewed as the most suitable for computational benchmarking. An initial computational evaluation of that configuration will then be performed using the Langley-developed computational fluid dynamics (CFD) software FUN3D as part of its code validation process. In addition to the benchmarking activity, this task also includes an examination of future research directions. Researchers within the Aeroelasticity Branch will examine other experimental efforts within the Subsonic Fixed Wing (SFW) program (such as testing of the NASA Common Research Model (CRM)) and other NASA programs and assess aeroelasticity issues and research topics.

  13. HPC Analytics Support. Requirements for Uncertainty Quantification Benchmarks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paulson, Patrick R.; Purohit, Sumit; Rodriguez, Luke R.

    2015-05-01

    This report outlines techniques for extending benchmark generation products so that they support uncertainty quantification by benchmarked systems. We describe how uncertainty quantification requirements can be presented to candidate analytical tools supporting SPARQL. We also describe benchmark data sets for evaluating uncertainty quantification, along with an approach for using our benchmark generator to produce such data sets.

  14. The adenosine triphosphate test is a rapid and reliable audit tool to assess manual cleaning adequacy of flexible endoscope channels.

    PubMed

    Alfa, Michelle J; Fatima, Iram; Olson, Nancy

    2013-03-01

    The study objective was to verify that the adenosine triphosphate (ATP) benchmark of <200 relative light units (RLUs) was achievable in a busy endoscopy clinic that followed the manufacturer's manual cleaning instructions. All channels from patient-used colonoscopes (20) and duodenoscopes (20) in a tertiary care hospital endoscopy clinic were sampled after manual cleaning and tested for residual ATP. The ATP test benchmark for adequate manual cleaning was set at <200 RLUs. The benchmark for protein was <6.4 μg/cm², and, for bioburden, it was <4-log10 colony-forming units/cm². Our data demonstrated that 96% (115/120) of channels from the 20 colonoscopes and 20 duodenoscopes evaluated met the ATP benchmark of <200 RLUs. The 5 channels that exceeded 200 RLUs were all elevator guide-wire channels. All 120 of the manually cleaned endoscope channels tested had protein and bioburden levels that were compliant with accepted benchmarks for manual cleaning for suction-biopsy, air-water, and auxiliary water channels. Our data confirmed that, by following the endoscope manufacturer's manual cleaning recommendations, 96% of channels in gastrointestinal endoscopes would have <200 RLUs for the ATP test kit evaluated and would meet the accepted clean benchmarks for protein and bioburden. Copyright © 2013 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Mosby, Inc. All rights reserved.
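    The pass/fail audit logic described above reduces to counting channels below the benchmark; the readings below are hypothetical values chosen only to reproduce the reported outcome of 115 of 120 channels passing.

```python
# A channel passes the manual-cleaning audit if its ATP reading is
# below the 200 RLU benchmark.
ATP_BENCHMARK_RLU = 200

def audit(readings_rlu):
    n_pass = sum(1 for r in readings_rlu if r < ATP_BENCHMARK_RLU)
    return n_pass, n_pass / len(readings_rlu)

readings = [50] * 115 + [450] * 5      # hypothetical RLU values
n_pass, fraction = audit(readings)     # 115 channels pass (~96%)
```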

  15. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool

    PubMed Central

    Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi

    2016-01-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed well in benchmarking tests and compared favorably to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method had not been assessed, nor had it been implemented as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs well. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405

  16. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool.

    PubMed

    Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi

    2015-11-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed well in benchmarking tests and compared favorably to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method had not been assessed, nor had it been implemented as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs well. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.

  17. Benchmarks for target tracking

    NASA Astrophysics Data System (ADS)

    Dunham, Darin T.; West, Philip D.

    2011-09-01

    The term benchmark originates from the chiseled horizontal marks that surveyors made, into which an angle-iron could be placed to bracket ("bench") a leveling rod, thus ensuring that the leveling rod could be repositioned in exactly the same place in the future. In computing, a benchmark is the practice of running a computer program, or a set of programs, to assess the relative performance of a system through a number of standard tests and trials. This paper discusses the history of simulation benchmarks that are being used by multiple branches of the military and agencies of the US government. These benchmarks range from missile defense applications to chemical-biological scenarios. Typically, a benchmark is used with Monte Carlo runs in order to tease out how algorithms deal with variability and the range of possible inputs. We also describe problems that can be solved by a benchmark.
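    The Monte Carlo usage described above can be sketched as repeating a benchmark scenario over randomized inputs and summarizing how the score varies; `score_algorithm` below is a hypothetical stand-in for one real benchmark run.

```python
import random
import statistics

# Hypothetical noisy benchmark score in [0, 1] for one randomized run.
def score_algorithm(rng):
    return min(1.0, max(0.0, rng.gauss(0.8, 0.05)))

# Repeat the scenario many times and summarize mean and spread,
# teasing out how the algorithm deals with input variability.
def monte_carlo_benchmark(n_runs=1000, seed=42):
    rng = random.Random(seed)
    scores = [score_algorithm(rng) for _ in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores)

mean_score, spread = monte_carlo_benchmark()
```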

  18. Cloud-Based Evaluation of Anatomical Structure Segmentation and Landmark Detection Algorithms: VISCERAL Anatomy Benchmarks.

    PubMed

    Jimenez-Del-Toro, Oscar; Muller, Henning; Krenn, Markus; Gruenberg, Katharina; Taha, Abdel Aziz; Winterstein, Marianne; Eggel, Ivan; Foncubierta-Rodriguez, Antonio; Goksel, Orcun; Jakab, Andras; Kontokotsios, Georgios; Langs, Georg; Menze, Bjoern H; Salas Fernandez, Tomas; Schaer, Roger; Walleyo, Anna; Weber, Marc-Andre; Dicente Cid, Yashin; Gass, Tobias; Heinrich, Mattias; Jia, Fucang; Kahl, Fredrik; Kechichian, Razmig; Mai, Dominic; Spanier, Assaf B; Vincent, Graham; Wang, Chunliang; Wyeth, Daniel; Hanbury, Allan

    2016-11-01

    Variations in the shape and appearance of anatomical structures in medical images are often relevant radiological signs of disease, and automatic tools can help automate parts of their otherwise manual assessment. A cloud-based evaluation framework is presented in this paper, including results of benchmarking current state-of-the-art medical imaging algorithms for anatomical structure segmentation and landmark detection: the VISCERAL Anatomy benchmarks. The algorithms are implemented in virtual machines in the cloud, where participants can access only the training data; the benchmark administrators then run the algorithms privately to objectively compare their performance on an unseen common test set. Overall, 120 computed tomography and magnetic resonance patient volumes were manually annotated to create a standard Gold Corpus containing a total of 1295 structures and 1760 landmarks. Ten participants contributed automatic algorithms for the organ segmentation task, and three for the landmark localization task. Different algorithms obtained the best scores in the four available imaging modalities and for subsets of anatomical structures. The annotation framework, resulting data set, evaluation setup, results, and performance analysis from the three VISCERAL Anatomy benchmarks are presented in this article. Both the VISCERAL data set and the Silver Corpus, generated by fusing the participant algorithms' outputs on a larger set of non-manually-annotated medical images, are available to the research community.

  19. Sequoia Messaging Rate Benchmark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Friedley, Andrew

    2008-01-22

    The purpose of this benchmark is to measure the maximal message rate of a single compute node. The first num_cores ranks are expected to reside on the 'core' compute node for which message rate is being tested. After that, the next num_nbors ranks are neighbors for the first core rank, the next set of num_nbors ranks are neighbors for the second core rank, and so on. For example, testing an 8-core node (num_cores = 8) with 4 neighbors (num_nbors = 4) requires 8 + 8 * 4 = 40 ranks. The first 8 of those 40 ranks are expected to be on the 'core' node being benchmarked, while the rest of the ranks are on separate nodes.
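    The rank layout described above can be sketched as a small mapping: the first num_cores ranks sit on the core node, then each core rank i gets its own contiguous block of num_nbors neighbor ranks.

```python
# Compute the total rank count and the neighbor block for each core
# rank, following the layout described in the benchmark summary.
def rank_layout(num_cores, num_nbors):
    total = num_cores + num_cores * num_nbors
    neighbors = {i: list(range(num_cores + i * num_nbors,
                               num_cores + (i + 1) * num_nbors))
                 for i in range(num_cores)}
    return total, neighbors

total, nbrs = rank_layout(8, 4)   # the 8-core, 4-neighbor example: 40 ranks
```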

  20. Bias-Free Chemically Diverse Test Sets from Machine Learning.

    PubMed

    Swann, Ellen T; Fernandez, Michael; Coote, Michelle L; Barnard, Amanda S

    2017-08-14

    Current benchmarking methods in quantum chemistry rely on databases that are built using a chemist's intuition. It is not fully understood how diverse or representative these databases truly are. Multivariate statistical techniques like archetypal analysis and K-means clustering have previously been used to summarize large sets of nanoparticles; however, molecules are more diverse and not as easily characterized by descriptors. In this work, we compare three sets of descriptors based on the one-, two-, and three-dimensional structure of a molecule. Using data from the NIST Computational Chemistry Comparison and Benchmark Database and machine learning techniques, we demonstrate the functional relationship between these structural descriptors and the electronic energy of molecules. Archetypes and prototypes found with topological or Coulomb matrix descriptors can be used to identify smaller, statistically significant test sets that better capture the diversity of chemical space. We apply this same method to find a diverse subset of organic molecules to demonstrate how the methods can easily be reapplied to individual research projects. Finally, we use our bias-free test sets to assess the performance of density functional theory and quantum Monte Carlo methods.
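    A minimal K-means sketch shows how clustering can pick a small, diverse test set from a descriptor matrix (rows = molecules, columns = structural descriptors); the data and descriptors below are synthetic, and this is not the authors' exact procedure.

```python
import numpy as np

# Cluster the descriptor space and return one representative molecule
# per cluster (the member closest to its cluster center).
def diverse_subset(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    reps = [int(np.argmin(((X - centers[j]) ** 2).sum(1))) for j in range(k)]
    return sorted(set(reps))

X = np.random.default_rng(1).normal(size=(200, 3))   # synthetic descriptors
picked = diverse_subset(X, k=10)
```

    The representatives cover the spread of the descriptor space rather than its densest region, which is the property a bias-free test set needs.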

  1. ICSBEP Benchmarks For Nuclear Data Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Briggs, J. Blair

    2005-05-24

    The International Criticality Safety Benchmark Evaluation Project (ICSBEP) was initiated in 1992 by the United States Department of Energy. The ICSBEP became an official activity of the Organization for Economic Cooperation and Development (OECD) -- Nuclear Energy Agency (NEA) in 1995. Representatives from the United States, United Kingdom, France, Japan, the Russian Federation, Hungary, Republic of Korea, Slovenia, Serbia and Montenegro (formerly Yugoslavia), Kazakhstan, Spain, Israel, Brazil, Poland, and the Czech Republic are now participating. South Africa, India, China, and Germany are considering participation. The purpose of the ICSBEP is to identify, evaluate, verify, and formally document a comprehensive and internationally peer-reviewed set of criticality safety benchmark data. The work of the ICSBEP is published as an OECD handbook entitled ''International Handbook of Evaluated Criticality Safety Benchmark Experiments.'' The 2004 Edition of the Handbook contains benchmark specifications for 3331 critical or subcritical configurations that are intended for use in validation efforts and for testing basic nuclear data. New to the 2004 Edition of the Handbook is a draft criticality alarm/shielding-type benchmark that should be finalized in 2005 along with two other similar benchmarks. The Handbook is being used extensively for nuclear data testing and is expected to be a valuable resource for code and data validation and improvement efforts for decades to come. Specific benchmarks that are useful for testing structural materials such as iron, chromium, nickel, and manganese; beryllium; lead; thorium; and 238U are highlighted.

  2. Radiation Detection Computational Benchmark Scenarios

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shaver, Mark W.; Casella, Andrew M.; Wittman, Richard S.

    2013-09-24

    Modeling forms an important component of radiation detection development, allowing for testing of new detector designs, evaluation of existing equipment against a wide variety of potential threat sources, and assessment of the operational performance of radiation detection systems. This can, however, result in large and complex scenarios which are time consuming to model. A variety of approaches to radiation transport modeling exist with complementary strengths and weaknesses for different problems. This variety of approaches, and the development of promising new tools (such as ORNL's ADVANTG) which combine benefits of multiple approaches, illustrates the need for a means of evaluating or comparing different techniques for radiation detection problems. This report presents a set of 9 benchmark problems for comparing different types of radiation transport calculations, identifying appropriate tools for classes of problems, and testing and guiding the development of new methods. The benchmarks were drawn primarily from existing or previous calculations, with a preference for scenarios which include experimental data or otherwise have results with a high level of confidence, are non-sensitive, and represent problem sets of interest to NA-22. From a technical perspective, the benchmarks were chosen to span a range of difficulty, to include gamma transport, neutron transport, or both, and to represent different important physical processes and a range of sensitivity to angular or energy fidelity. Following benchmark identification, existing information about geometry, measurements, and previous calculations was assembled. Monte Carlo results (MCNP decks) were reviewed or created and re-run in order to attain accurate computational times and to verify agreement with experimental data, when present. Benchmark information was then conveyed to ORNL in order to guide testing and development of hybrid calculations. The results of those ADVANTG calculations were then sent to PNNL for compilation. This report describes the details of the selected benchmarks and the results from the various transport codes.

  3. Benchmarking Diagnostic Algorithms on an Electrical Power System Testbed

    NASA Technical Reports Server (NTRS)

    Kurtoglu, Tolga; Narasimhan, Sriram; Poll, Scott; Garcia, David; Wright, Stephanie

    2009-01-01

    Diagnostic algorithms (DAs) are key to enabling automated health management. These algorithms are designed to detect and isolate anomalies of either a component or the whole system based on observations received from sensors. In recent years a wide range of algorithms, both model-based and data-driven, have been developed to increase autonomy and improve system reliability and affordability. However, the lack of support to perform systematic benchmarking of these algorithms continues to create barriers for effective development and deployment of diagnostic technologies. In this paper, we present our efforts to benchmark a set of DAs on a common platform using a framework that was developed to evaluate and compare various performance metrics for diagnostic technologies. The diagnosed system is an electrical power system, namely the Advanced Diagnostics and Prognostics Testbed (ADAPT) developed and located at the NASA Ames Research Center. The paper presents the fundamentals of the benchmarking framework, the ADAPT system, description of faults and data sets, the metrics used for evaluation, and an in-depth analysis of benchmarking results obtained from testing ten diagnostic algorithms on the ADAPT electrical power system testbed.
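    The per-scenario scoring at the heart of such a framework can be sketched with standard detection metrics; the names and data below are illustrative, not ADAPT's actual metric definitions.

```python
# Compare ground-truth fault labels against a diagnostic algorithm's
# alarms and compute precision (alarms that were real faults) and
# recall (faults that were detected).
def detection_metrics(truth, predicted):
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# truth[i]: fault present in scenario i; predicted[i]: DA raised an alarm
p, r = detection_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```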

  4. Applicability domains for classification problems: benchmarking of distance to models for AMES mutagenicity set

    EPA Science Inventory

    For QSAR and QSPR modeling of biological and physicochemical properties, estimating the accuracy of predictions is a critical problem. The “distance to model” (DM) can be defined as a metric that defines the similarity between the training set molecules and the test set compound ...

  5. Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.

    PubMed

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in prospective (i.e. real-world) efforts. However, the intrinsic differences between benchmarking sets and real screening chemical libraries can bias the assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementations for three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs. Copyright © 2014 Elsevier Inc. All rights reserved.
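    The ROC AUC used to judge such benchmarking sets can be computed directly from active and decoy scores as the probability that a random active outscores a random decoy (ties counting half); a minimal sketch:

```python
# Rank-statistic form of ROC AUC over active (positive) and decoy
# (negative) scores; toy numbers, not a real screening result.
def auc(pos_scores, neg_scores):
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

perfect = auc([0.9, 0.8, 0.7], [0.4, 0.5, 0.6])   # every active outscores
mixed = auc([0.6, 0.4], [0.5, 0.3])               # one decoy outscores
```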

  6. Benchmarking Methods and Data Sets for Ligand Enrichment Assessment in Virtual Screening

    PubMed Central

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2014-01-01

Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in prospective (i.e. real-world) efforts. However, intrinsic differences between benchmarking sets and real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. “analogue bias”, “artificial enrichment” and “false negative” bias. In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementation for three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. Leave-One-Out Cross-Validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased in terms of property matching, ROC curves and AUCs. PMID:25481478

  7. A time-implicit numerical method and benchmarks for the relativistic Vlasov–Ampere equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carrie, Michael; Shadwick, B. A.

    2016-01-04

Here, we present a time-implicit numerical method to solve the relativistic Vlasov–Ampere system of equations on a two-dimensional phase space grid. The time-splitting algorithm we use allows the generalization of the work presented here to higher dimensions while keeping the linear aspect of the resulting discrete set of equations. The implicit method is benchmarked against linear theory results for relativistic Landau damping, for which analytical expressions using the Maxwell-Jüttner distribution function are derived. We note that, independently of the shape of the distribution function, the relativistic treatment features collective behaviors that do not exist in the nonrelativistic case. The numerical study of the relativistic two-stream instability completes the set of benchmarking tests.

  8. A time-implicit numerical method and benchmarks for the relativistic Vlasov–Ampere equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carrié, Michael, E-mail: mcarrie2@unl.edu; Shadwick, B. A., E-mail: shadwick@mailaps.org

    2016-01-15

We present a time-implicit numerical method to solve the relativistic Vlasov–Ampere system of equations on a two-dimensional phase space grid. The time-splitting algorithm we use allows the generalization of the work presented here to higher dimensions while keeping the linear aspect of the resulting discrete set of equations. The implicit method is benchmarked against linear theory results for relativistic Landau damping, for which analytical expressions using the Maxwell-Jüttner distribution function are derived. We note that, independently of the shape of the distribution function, the relativistic treatment features collective behaviours that do not exist in the nonrelativistic case. The numerical study of the relativistic two-stream instability completes the set of benchmarking tests.

  9. Development and testing of the VITAMIN-B7/BUGLE-B7 coupled neutron-gamma multigroup cross-section libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Risner, J.M.; Wiarda, D.; Miller, T.M.

    2011-07-01

The U.S. Nuclear Regulatory Commission's Regulatory Guide 1.190 states that calculational methods used to estimate reactor pressure vessel (RPV) fluence should use the latest version of the Evaluated Nuclear Data File (ENDF). The VITAMIN-B6 fine-group library and BUGLE-96 broad-group library, which are widely used for RPV fluence calculations, were generated using ENDF/B-VI.3 data, which was the most current data when Regulatory Guide 1.190 was issued. We have developed new fine-group (VITAMIN-B7) and broad-group (BUGLE-B7) libraries based on ENDF/B-VII.0. These new libraries, which were processed using the AMPX code system, maintain the same group structures as the VITAMIN-B6 and BUGLE-96 libraries. Verification and validation of the new libraries were accomplished using diagnostic checks in AMPX, 'unit tests' for each element in VITAMIN-B7, and a diverse set of benchmark experiments including critical evaluations for fast and thermal systems, a set of experimental benchmarks that are used for SCALE regression tests, and three RPV fluence benchmarks. The benchmark evaluation results demonstrate that VITAMIN-B7 and BUGLE-B7 are appropriate for use in RPV fluence calculations and meet the calculational uncertainty criterion in Regulatory Guide 1.190. (authors)

  10. Development and Testing of the VITAMIN-B7/BUGLE-B7 Coupled Neutron-Gamma Multigroup Cross-Section Libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Risner, Joel M; Wiarda, Dorothea; Miller, Thomas Martin

    2011-01-01

The U.S. Nuclear Regulatory Commission's Regulatory Guide 1.190 states that calculational methods used to estimate reactor pressure vessel (RPV) fluence should use the latest version of the Evaluated Nuclear Data File (ENDF). The VITAMIN-B6 fine-group library and BUGLE-96 broad-group library, which are widely used for RPV fluence calculations, were generated using ENDF/B-VI data, which was the most current data when Regulatory Guide 1.190 was issued. We have developed new fine-group (VITAMIN-B7) and broad-group (BUGLE-B7) libraries based on ENDF/B-VII. These new libraries, which were processed using the AMPX code system, maintain the same group structures as the VITAMIN-B6 and BUGLE-96 libraries. Verification and validation of the new libraries were accomplished using diagnostic checks in AMPX, unit tests for each element in VITAMIN-B7, and a diverse set of benchmark experiments including critical evaluations for fast and thermal systems, a set of experimental benchmarks that are used for SCALE regression tests, and three RPV fluence benchmarks. The benchmark evaluation results demonstrate that VITAMIN-B7 and BUGLE-B7 are appropriate for use in LWR shielding applications, and meet the calculational uncertainty criterion in Regulatory Guide 1.190.

  11. Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives.

    PubMed

    Lagarde, Nathalie; Zagury, Jean-François; Montes, Matthieu

    2015-07-27

Virtual screening methods are commonly used nowadays in drug discovery processes. However, to ensure their reliability, they have to be carefully evaluated. The evaluation of these methods is often performed retrospectively, notably by studying the enrichment of benchmarking data sets. For this purpose, numerous benchmarking data sets were developed over the years, and the resulting improvements led to the availability of high-quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.

  12. A new numerical benchmark of a freshwater lens

    NASA Astrophysics Data System (ADS)

    Stoeckl, L.; Walther, M.; Graf, T.

    2016-04-01

    A numerical benchmark for 2-D variable-density flow and solute transport in a freshwater lens is presented. The benchmark is based on results of laboratory experiments conducted by Stoeckl and Houben (2012) using a sand tank on the meter scale. This benchmark describes the formation and degradation of a freshwater lens over time as it can be found under real-world islands. An error analysis gave the appropriate spatial and temporal discretization of 1 mm and 8.64 s, respectively. The calibrated parameter set was obtained using the parameter estimation tool PEST. Comparing density-coupled and density-uncoupled results showed that the freshwater-saltwater interface position is strongly dependent on density differences. A benchmark that adequately represents saltwater intrusion and that includes realistic features of coastal aquifers or freshwater lenses was lacking. This new benchmark was thus developed and is demonstrated to be suitable to test variable-density groundwater models applied to saltwater intrusion investigations.

  13. Note: The performance of new density functionals for a recent blind test of non-covalent interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mardirossian, Narbe; Head-Gordon, Martin

Benchmark datasets of non-covalent interactions are essential for assessing the performance of density functionals and other quantum chemistry approaches. In a recent blind test, Taylor et al. benchmarked 14 methods on a new dataset consisting of 10 dimer potential energy curves calculated using coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) at the complete basis set (CBS) limit (80 data points in total). The dataset is particularly interesting because compressed, near-equilibrium, and stretched regions of the potential energy surface are extensively sampled.

  14. Note: The performance of new density functionals for a recent blind test of non-covalent interactions

    DOE PAGES

    Mardirossian, Narbe; Head-Gordon, Martin

    2016-11-09

Benchmark datasets of non-covalent interactions are essential for assessing the performance of density functionals and other quantum chemistry approaches. In a recent blind test, Taylor et al. benchmarked 14 methods on a new dataset consisting of 10 dimer potential energy curves calculated using coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) at the complete basis set (CBS) limit (80 data points in total). The dataset is particularly interesting because compressed, near-equilibrium, and stretched regions of the potential energy surface are extensively sampled.

  15. Benchmarking In-Flight Icing Detection Products for Future Upgrades

    NASA Technical Reports Server (NTRS)

    Politovich, M. K.; Minnis, P.; Johnson, D. B.; Wolff, C. A.; Chapman, M.; Heck, P. W.; Haggerty, J. A.

    2004-01-01

    This paper summarizes the results of a benchmarking exercise conducted as part of the NASA supported Advanced Satellite Aviation-Weather Products (ASAP) Program. The goal of ASAP is to increase and optimize the use of satellite data sets within the existing FAA Aviation Weather Research Program (AWRP) Product Development Team (PDT) structure and to transfer advanced satellite expertise to the PDTs. Currently, ASAP fosters collaborative efforts between NASA Laboratories, the University of Wisconsin Cooperative Institute for Meteorological Satellite Studies (UW-CIMSS), the University of Alabama in Huntsville (UAH), and the AWRP PDTs. This collaboration involves the testing and evaluation of existing satellite algorithms developed or proposed by AWRP teams, the introduction of new techniques and data sets to the PDTs from the satellite community, and enhanced access to new satellite data sets available through CIMSS and NASA Langley Research Center for evaluation and testing.

  16. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets

    PubMed Central

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-01-01

Purpose: With the emergence of clinical outcomes databases as tools routinely utilized within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that address both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. Results: The approach provides the data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426

  17. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets.

    PubMed

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-11-01

With the emergence of clinical outcomes databases as tools routinely utilized within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that address both issues. A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. The approach provides the data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. The work demonstrates the viability of the design approach and the software tool for analysis of large data sets.
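
    Two ingredients of the pipeline this record describes can be sketched in a few lines: choosing a dose threshold from a ROC criterion (here the Youden index, one common choice) and quantifying group separation with a discrete Kullback-Leibler divergence. The dose/outcome data below are invented for illustration; the authors' C#.Net/R tool is not reproduced.

```python
# Sketch (not the authors' code): ROC-based threshold selection plus a
# discrete KL divergence for confirmation. Data are illustrative.
import math

def youden_threshold(doses, outcomes):
    """Return the candidate threshold maximizing TPR - FPR over observed doses."""
    pos = [d for d, y in zip(doses, outcomes) if y]
    neg = [d for d, y in zip(doses, outcomes) if not y]
    best_t, best_j = None, -1.0
    for t in sorted(set(doses)):
        tpr = sum(d >= t for d in pos) / len(pos)
        fpr = sum(d >= t for d in neg) / len(neg)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t

def kl_divergence(p, q, eps=1e-9):
    """Discrete KL(P||Q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

doses    = [10, 12, 15, 22, 25, 28, 30, 35]
toxicity = [ 0,  0,  0,  0,  1,  1,  1,  1]
print(youden_threshold(doses, toxicity))  # 25: separates the two groups
```

    The published algorithm layers contingency tables and hypothesis tests on top of the threshold; this shows only the ROC and KL steps.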

  18. Benchmarking density functional theory predictions of framework structures and properties in a chemically diverse test set of metal-organic frameworks

    DOE PAGES

    Nazarian, Dalar; Ganesh, P.; Sholl, David S.

    2015-09-30

We compiled a test set of chemically and topologically diverse Metal–Organic Frameworks (MOFs) with high-accuracy, experimentally derived crystallographic structure data. The test set was used to benchmark the performance of Density Functional Theory (DFT) functionals (M06L, PBE, PW91, PBE-D2, PBE-D3, and vdW-DF2) for predicting lattice parameters, unit cell volume, bonded parameters and pore descriptors. On average PBE-D2, PBE-D3, and vdW-DF2 predict more accurate structures, but all functionals predicted pore diameters within 0.5 Å of the experimental diameter for every MOF in the test set. The test set was also used to assess the variance in performance of DFT functionals for elastic properties and atomic partial charges. DFT-predicted elastic properties such as the minimum shear modulus and Young's modulus can differ by an average of 3 and 9 GPa, respectively, for rigid MOFs such as those in the test set. Moreover, the partial charges calculated by vdW-DF2 deviate the most from those of the other functionals, while there is no significant difference between the partial charges calculated by M06L, PBE, PW91, PBE-D2 and PBE-D3 for the MOFs in the test set. We find that while there are differences in the magnitude of the properties predicted by the various functionals, these discrepancies are small compared to the accuracy necessary for most practical applications.

  19. Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking

    PubMed Central

    2012-01-01

    A key metric to assess molecular docking remains ligand enrichment against challenging decoys. Whereas the directory of useful decoys (DUD) has been widely used, clear areas for optimization have emerged. Here we describe an improved benchmarking set that includes more diverse targets such as GPCRs and ion channels, totaling 102 proteins with 22886 clustered ligands drawn from ChEMBL, each with 50 property-matched decoys drawn from ZINC. To ensure chemotype diversity, we cluster each target’s ligands by their Bemis–Murcko atomic frameworks. We add net charge to the matched physicochemical properties and include only the most dissimilar decoys, by topology, from the ligands. An online automated tool (http://decoys.docking.org) generates these improved matched decoys for user-supplied ligands. We test this data set by docking all 102 targets, using the results to improve the balance between ligand desolvation and electrostatics in DOCK 3.6. The complete DUD-E benchmarking set is freely available at http://dude.docking.org. PMID:22716043
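
    The decoy-construction recipe this record describes (match physicochemical properties, exclude topologically similar candidates) can be caricatured in a few lines. This is a toy sketch, not the DUD-E pipeline: the descriptor tuple, similarity values, scale factors, and function names are invented for illustration, and real implementations use full fingerprints and more properties.

```python
# Toy sketch of property-matched decoy selection in the spirit of DUD-E.
# Property tuples: (molecular_weight, logP, net_charge); "topology_sim" is a
# stand-in for a fingerprint Tanimoto similarity to the ligand.
def pick_decoys(ligand_props, candidates, n_decoys, topo_cut=0.5):
    """candidates: list of (props, topology_sim). Drop candidates that are too
    similar topologically, then keep the n_decoys closest in property space."""
    def prop_dist(p, q, scale=(100.0, 1.0, 1.0)):
        # Normalized Euclidean distance over the property tuple.
        return sum(((a - b) / s) ** 2 for a, b, s in zip(p, q, scale)) ** 0.5
    eligible = [(props, sim) for props, sim in candidates if sim < topo_cut]
    eligible.sort(key=lambda c: prop_dist(c[0], ligand_props))
    return [props for props, _ in eligible[:n_decoys]]

ligand = (320.0, 2.1, 0)
pool = [
    ((325.0, 2.0, 0), 0.21),   # good property match, dissimilar topology -> kept
    ((318.0, 2.2, 0), 0.82),   # too similar topologically -> excluded
    ((480.0, 4.5, 1), 0.10),   # dissimilar topology but poor property match
    ((310.0, 1.9, 0), 0.33),
]
print(pick_decoys(ligand, pool, n_decoys=2))
```

    The two filters pull in opposite directions by design: decoys should look like the ligand physicochemically but not chemically, so scoring functions cannot separate them on trivial property differences.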

  20. PFLOTRAN Verification: Development of a Testing Suite to Ensure Software Quality

    NASA Astrophysics Data System (ADS)

    Hammond, G. E.; Frederick, J. M.

    2016-12-01

In scientific computing, code verification ensures the reliability and numerical accuracy of a model simulation by comparing the simulation results to experimental data or known analytical solutions. The model is typically defined by a set of partial differential equations with initial and boundary conditions, and verification determines whether the mathematical model is solved correctly by the software. Code verification is especially important if the software is used to model high-consequence systems which cannot be physically tested in a fully representative environment [Oberkampf and Trucano (2007)]. Justified confidence in a particular computational tool requires clarity in the exercised physics and transparency in its verification process with proper documentation. We present a quality assurance (QA) testing suite developed by Sandia National Laboratories that performs code verification for PFLOTRAN, an open source, massively-parallel subsurface simulator. PFLOTRAN solves systems of generally nonlinear partial differential equations describing multiphase, multicomponent and multiscale reactive flow and transport processes in porous media. PFLOTRAN's QA test suite compares the numerical solutions of benchmark problems in heat and mass transport against known, closed-form, analytical solutions, including documentation of the exercised physical process models implemented in each PFLOTRAN benchmark simulation. The QA test suite development strives to follow the recommendations given by Oberkampf and Trucano (2007), which describe four essential elements in high-quality verification benchmark construction: (1) conceptual description, (2) mathematical description, (3) accuracy assessment, and (4) additional documentation and user information.
Several QA tests within the suite will be presented, including details of the benchmark problems and their closed-form analytical solutions, implementation of benchmark problems in PFLOTRAN simulations, and the criteria used to assess PFLOTRAN's performance in the code verification procedure. References Oberkampf, W. L., and T. G. Trucano (2007), Verification and Validation Benchmarks, SAND2007-0853, 67 pgs., Sandia National Laboratories, Albuquerque, NM.
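
    The flavor of such a verification test can be shown with a self-contained analogue (not a PFLOTRAN benchmark; the parameter values are arbitrary): an explicit finite-difference solution of 1-D diffusion compared against its closed-form erfc solution for a step boundary condition.

```python
# Illustrative code-verification check: FTCS finite differences for 1-D
# diffusion vs. the analytical erfc solution u(x,t) = erfc(x / (2 sqrt(D t)))
# for u(0,t) = 1 on a semi-infinite domain (far boundary is effectively 0 here).
import math

D, L, nx = 1.0e-3, 1.0, 201          # diffusivity, domain length, grid points
dx = L / (nx - 1)
dt = 0.25 * dx * dx / D              # satisfies the FTCS stability limit r <= 1/2
r = D * dt / (dx * dx)
u = [0.0] * nx
u[0] = 1.0                           # step at x = 0

t, t_end = 0.0, 5.0
while t < t_end:
    u = [u[0]] + [u[i] + r * (u[i+1] - 2*u[i] + u[i-1])
                  for i in range(1, nx - 1)] + [u[-1]]
    t += dt

analytic = [math.erfc(i * dx / (2.0 * math.sqrt(D * t))) for i in range(nx)]
max_err = max(abs(a - b) for a, b in zip(u, analytic))
print(f"max abs error = {max_err:.2e}")
```

    A QA suite turns this comparison into a pass/fail criterion (e.g. an error-norm tolerance) and documents the exercised physics, following the Oberkampf and Trucano elements listed above.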

  1. Pollutant Emissions and Energy Efficiency under Controlled Conditions for Household Biomass Cookstoves and Implications for Metrics Useful in Setting International Test Standards

    EPA Science Inventory

    Realistic metrics and methods for testing household biomass cookstoves are required to develop standards needed by international policy makers, donors, and investors. Application of consistent test practices allows emissions and energy efficiency performance to be benchmarked and...

  2. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  3. Analyzing the BBOB results by means of benchmarking concepts.

    PubMed

    Mersmann, O; Preuss, M; Trautmann, H; Bischl, B; Weihs, C

    2015-01-01

We present methods to answer two basic questions that arise when benchmarking optimization algorithms. The first is: which algorithm is the "best" one? The second is: which algorithm should I use for my real-world problem? Both are connected, and neither is easy to answer. We present a theoretical framework for designing and analyzing the raw data of such benchmark experiments. This represents a first step in answering the aforementioned questions. The 2009 and 2010 BBOB benchmark results are analyzed by means of this framework, and we derive insight regarding the answers to the two questions. Furthermore, we discuss how to properly aggregate rankings from algorithm evaluations on individual problems into a consensus, its theoretical background, and which common pitfalls should be avoided. Finally, we address the grouping of test problems into sets with similar optimizer rankings and investigate whether these are reflected by already proposed test problem characteristics, finding that this is not always the case.
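
    The rank-aggregation step this record discusses can be illustrated with the simplest consensus scheme, a Borda count. The algorithm names below are placeholders, and the paper's actual consensus methodology is more elaborate; this only shows the shape of the problem.

```python
# Minimal consensus-ranking sketch (Borda count) over per-problem rankings.
def borda_consensus(rankings):
    """rankings: list of per-problem rankings, each an ordered list, best first.
    Returns algorithms ordered by total Borda score (higher = better)."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, algo in enumerate(ranking):
            # Best position earns n-1 points, worst earns 0.
            scores[algo] = scores.get(algo, 0) + (n - 1 - pos)
    return sorted(scores, key=lambda a: -scores[a])

per_problem = [
    ["CMA-ES", "BFGS", "NelderMead"],
    ["CMA-ES", "NelderMead", "BFGS"],
    ["BFGS", "CMA-ES", "NelderMead"],
]
print(borda_consensus(per_problem))  # ['CMA-ES', 'BFGS', 'NelderMead']
```

    One of the pitfalls the paper warns about is visible even here: Borda-style aggregation can violate independence of irrelevant alternatives, so adding or removing one algorithm can reorder the others.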

  4. Validating Cellular Automata Lava Flow Emplacement Algorithms with Standard Benchmarks

    NASA Astrophysics Data System (ADS)

    Richardson, J. A.; Connor, L.; Charbonnier, S. J.; Connor, C.; Gallant, E.

    2015-12-01

A major existing need in assessing lava flow simulators is a common set of validation benchmark tests. We propose three levels of benchmarks which test model output against increasingly complex standards. First, simulated lava flows should be morphologically identical given changes in parameter space that should be inconsequential, such as slope direction. Second, lava flows simulated in simple parameter spaces can be tested against analytical solutions or empirical relationships seen in Bingham fluids. For instance, a lava flow simulated on a flat surface should produce a circular outline. Third, lava flows simulated over real-world topography can be compared to recent real-world lava flows, such as those at Tolbachik, Russia, and Fogo, Cape Verde. Success or failure of emplacement algorithms in these validation benchmarks can be determined using a Bayesian approach, which directly tests the ability of an emplacement algorithm to correctly forecast lava inundation. Here we focus on two posterior metrics, P(A|B) and P(¬A|¬B), which describe the positive and negative predictive value of flow algorithms. This is an improvement on less direct statistics such as model sensitivity and the Jaccard fitness coefficient. We have performed these validation benchmarks on a new, modular lava flow emplacement simulator that we have developed. This simulator, which we call MOLASSES, follows a Cellular Automata (CA) method. The code is developed in several interchangeable modules, which enables quick modification of the distribution algorithm from cell locations to their neighbors. By assessing several different distribution schemes with the benchmark tests, we have improved the performance of MOLASSES to correctly match early stages of the 2012-2013 Tolbachik flow, Kamchatka, Russia, to 80%. We can also evaluate model performance given uncertain input parameters using a Monte Carlo setup. This illuminates sensitivity to model uncertainty.
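
    The fitness metrics named in this record can be computed directly from paired inundation maps. A toy sketch follows, with 1-D boolean lists standing in for map grids; reading P(A|B) as "observed inundation given simulated inundation" (positive predictive value) is my interpretation of the abstract, and the maps are invented.

```python
# Sketch of grid-based forecast metrics: predictive values P(A|B), P(notA|notB)
# and the Jaccard coefficient from boolean inundation maps (simulated vs observed).
def flow_metrics(simulated, observed):
    tp = sum(1 for s, o in zip(simulated, observed) if s and o)
    fp = sum(1 for s, o in zip(simulated, observed) if s and not o)
    fn = sum(1 for s, o in zip(simulated, observed) if not s and o)
    tn = sum(1 for s, o in zip(simulated, observed) if not s and not o)
    return {
        "P(A|B)": tp / (tp + fp),        # observed wet, given simulated wet
        "P(notA|notB)": tn / (tn + fn),  # observed dry, given simulated dry
        "jaccard": tp / (tp + fp + fn),
    }

sim = [1, 1, 1, 0, 0, 0, 1, 0]   # cells the simulator inundates
obs = [1, 1, 0, 0, 0, 1, 1, 0]   # cells the real flow inundated
m = flow_metrics(sim, obs)
print(m)
```

    Note how the pair of conditional metrics penalizes over-prediction and under-prediction separately, which a single Jaccard score cannot do.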

  5. Benchmarking Data for the Proposed Signature of Used Fuel Casks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rauch, Eric Benton

    2016-09-23

A set of benchmarking measurements to test facets of the proposed extended storage signature was conducted on May 17, 2016. The measurements were designed to test the overall concept of how the proposed signature can be used to identify a used fuel cask based only on the distribution of neutron sources within the cask. To simulate the distribution, 4 Cf-252 sources were chosen and arranged on a 3x3 grid in 3 different patterns, and raw neutron total counts were taken at 6 locations around the grid. This is a very simplified test of the typical geometry studied previously in simulation with simulated used nuclear fuel.

  6. Ab Initio Density Fitting: Accuracy Assessment of Auxiliary Basis Sets from Cholesky Decompositions.

    PubMed

    Boström, Jonas; Aquilante, Francesco; Pedersen, Thomas Bondo; Lindh, Roland

    2009-06-09

    The accuracy of auxiliary basis sets derived by Cholesky decompositions of the electron repulsion integrals is assessed in a series of benchmarks on total ground state energies and dipole moments of a large test set of molecules. The test set includes molecules composed of atoms from the first three rows of the periodic table as well as transition metals. The accuracy of the auxiliary basis sets are tested for the 6-31G**, correlation consistent, and atomic natural orbital basis sets at the Hartree-Fock, density functional theory, and second-order Møller-Plesset levels of theory. By decreasing the decomposition threshold, a hierarchy of auxiliary basis sets is obtained with accuracies ranging from that of standard auxiliary basis sets to that of conventional integral treatments.
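
    The core idea in this record, deriving a hierarchy of approximations by Cholesky-decomposing a positive-definite matrix down to a chosen threshold, can be illustrated on a tiny matrix. This is a generic pivoted Cholesky sketch, not the authors' production code; the matrix is arbitrary and real electron-repulsion-integral matrices are vastly larger.

```python
# Illustrative threshold-based (pivoted) Cholesky decomposition: lowering tau
# yields more vectors and a tighter approximation M ~= sum_k L_k L_k^T.
def pivoted_cholesky(M, tau):
    """Return Cholesky vectors of M (symmetric positive semidefinite),
    stopping when the largest remaining diagonal falls below tau."""
    n = len(M)
    R = [row[:] for row in M]          # working copy of the residual matrix
    diag = [M[i][i] for i in range(n)]
    vectors = []
    while True:
        p = max(range(n), key=lambda i: diag[i])   # pivot: largest diagonal
        if diag[p] < tau:
            break
        col = [R[i][p] / diag[p] ** 0.5 for i in range(n)]
        vectors.append(col)
        for i in range(n):             # subtract the rank-1 update
            for j in range(n):
                R[i][j] -= col[i] * col[j]
            diag[i] = R[i][i]
    return vectors

M = [[4.0, 2.0, 0.5],
     [2.0, 3.0, 1.0],
     [0.5, 1.0, 2.0]]
vecs = pivoted_cholesky(M, tau=1e-10)
approx = [[sum(v[i] * v[j] for v in vecs) for j in range(3)] for i in range(3)]
err = max(abs(M[i][j] - approx[i][j]) for i in range(3) for j in range(3))
print(len(vecs), f"{err:.1e}")
```

    Raising `tau` truncates the expansion early, which is the mechanism behind the "hierarchy of auxiliary basis sets" the abstract describes.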

  7. The Earthquake Source Inversion Validation (SIV) - Project: Summary, Status, Outlook

    NASA Astrophysics Data System (ADS)

    Mai, P. M.

    2017-12-01

Finite-fault earthquake source inversions infer the (time-dependent) displacement on the rupture surface from geophysical data. The resulting earthquake source models document the complexity of the rupture process. However, this kinematic source inversion is ill-posed and returns non-unique solutions, as seen for instance in multiple source models for the same earthquake, obtained by different research teams, that often exhibit remarkable dissimilarities. To address the uncertainties in earthquake-source inversions and to understand strengths and weaknesses of various methods, the Source Inversion Validation (SIV) project developed a set of forward-modeling exercises and inversion benchmarks. Several research teams then use these validation exercises to test their codes and methods, but also to develop and benchmark new approaches. In this presentation I will summarize the SIV strategy, the existing benchmark exercises and corresponding results. Using various waveform-misfit criteria and newly developed statistical comparison tools to quantify source-model (dis)similarities, the SIV platform is able to rank solutions and identify particularly promising source inversion approaches. Existing SIV exercises (with related data and descriptions) and all computational tools remain available via the open online collaboration platform; additional exercises and benchmark tests will be uploaded once they are fully developed. I encourage source modelers to use the SIV benchmarks for developing and testing new methods. The SIV efforts have already led to several promising new techniques for tackling the earthquake-source imaging problem. I expect that future SIV benchmarks will provide further innovations and insights into earthquake source kinematics that will ultimately help to better understand the dynamics of the rupture process.

  8. Overview of TPC Benchmark E: The Next Generation of OLTP Benchmarks

    NASA Astrophysics Data System (ADS)

    Hogan, Trish

    Set to replace the aging TPC-C, the TPC Benchmark E is the next generation OLTP benchmark, which more accurately models client database usage. TPC-E addresses the shortcomings of TPC-C. It has a much more complex workload, requires the use of RAID-protected storage, generates much less I/O, and is much cheaper and easier to set up, run, and audit. After a period of overlap, it is expected that TPC-E will become the de facto OLTP benchmark.

  9. An Unbiased Method To Build Benchmarking Sets for Ligand-Based Virtual Screening and its Application To GPCRs

    PubMed Central

    2015-01-01

Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could introduce biases into the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCR targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the “artificial enrichment” and “analogue bias” of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD. PMID:24749745

  10. An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs.

    PubMed

    Xia, Jie; Jin, Hongwei; Liu, Zhenming; Zhang, Liangren; Wang, Xiang Simon

    2014-05-27

Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could introduce biases into the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCR targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the "artificial enrichment" and "analogue bias" of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD.
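
    The leave-one-out evaluation with mean ROC AUC used in these records can be sketched with a toy one-dimensional "descriptor" standing in for real molecular fingerprints; the function names and data below are illustrative assumptions, not the published protocol's code.

```python
# Hedged sketch of LOO CV for ligand-based screening: hold one active out as
# the query, rank remaining actives against decoys by similarity to the query,
# and average the ROC AUCs over all queries.
def auc(pos_scores, neg_scores):
    """Probability that a positive outranks a negative (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def loo_mean_auc(actives, decoys):
    aucs = []
    for i, query in enumerate(actives):
        rest = actives[:i] + actives[i + 1:]
        score = lambda x: -abs(x - query)   # toy similarity to held-out active
        aucs.append(auc([score(a) for a in rest], [score(d) for d in decoys]))
    return sum(aucs) / len(aucs)

actives = [1.0, 1.1, 0.9, 1.05]             # clustered descriptor values
decoys  = [3.0, 2.5, 4.0, 0.2, 5.0]
print(f"mean LOO AUC = {loo_mean_auc(actives, decoys):.2f}")
```

    This also makes "analogue bias" concrete: when actives cluster tightly in descriptor space, as above, near-perfect AUCs come cheaply, which is exactly the inflation unbiased benchmarking sets try to remove.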

  11. Evaluation and optimization of virtual screening workflows with DEKOIS 2.0--a public library of challenging docking benchmark sets.

    PubMed

    Bauer, Matthias R; Ibrahim, Tamer M; Vogel, Simon M; Boeckler, Frank M

    2013-06-24

    The application of molecular benchmarking sets helps to assess the actual performance of virtual screening (VS) workflows. To improve the efficiency of structure-based VS approaches, the selection and optimization of various parameters can be guided by benchmarking. With the DEKOIS 2.0 library, we aim to further extend and complement the collection of publicly available decoy sets. Based on BindingDB bioactivity data, we provide 81 new and structurally diverse benchmark sets for a wide variety of different target classes. To ensure a meaningful selection of ligands, we address several issues that can be found in bioactivity data. We have improved our previously introduced DEKOIS methodology with enhanced physicochemical matching, now including the consideration of molecular charges, as well as a more sophisticated elimination of latent actives in the decoy set (LADS). We evaluate the docking performance of Glide, GOLD, and AutoDock Vina with our data sets and highlight existing challenges for VS tools. All DEKOIS 2.0 benchmark sets will be made accessible at http://www.dekois.com.
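The enhanced physicochemical matching described in this record can be illustrated with a generic nearest-neighbor match in property space. This is a hedged sketch, not the DEKOIS implementation: the property names, scale factors, and function names below are hypothetical.

```python
def property_distance(a, b, scales):
    """Normalized Euclidean distance over a dict of physicochemical
    properties (e.g. MW, logP, H-bond donors/acceptors, formal charge)."""
    return sum(((a[k] - b[k]) / scales[k]) ** 2 for k in scales) ** 0.5

def pick_decoys(ligand, candidates, scales, n):
    """Return the n candidate molecules closest to the ligand in
    normalized property space (property-matched decoys)."""
    ranked = sorted(candidates,
                    key=lambda c: property_distance(ligand, c, scales))
    return ranked[:n]
```

In a real workflow the property-matched candidates would additionally be filtered for topological dissimilarity to all ligands (the LADS step described above) before acceptance as decoys.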

  12. Docking and scoring with ICM: the benchmarking results and strategies for improvement

    PubMed Central

    Neves, Marco A. C.; Totrov, Maxim; Abagyan, Ruben

    2012-01-01

    Flexible docking and scoring using the Internal Coordinate Mechanics software (ICM) was benchmarked for ligand binding mode prediction against the 85 co-crystal structures in the modified Astex data set. The ICM virtual ligand screening was tested against the 40 DUD target benchmarks and the 11-target WOMBAT sets. The self-docking accuracy was evaluated for the top 1 and top 3 scoring poses at each ligand binding site, with near-native conformations below 2 Å RMSD found in 91% and 95% of the predictions, respectively. The virtual ligand screening using single rigid pocket conformations provided a median area under the ROC curve of 69.4, with 22.0% true positives recovered at a 2% false positive rate. Significant improvements, up to ROC AUC = 82.2 and ROC(2%) = 45.2, were achieved following our best practices for flexible pocket refinement and out-of-pocket binding rescore. The virtual screening can be further improved by considering multiple conformations of the target. PMID:22569591

  13. The Psychology Experiment Building Language (PEBL) and PEBL Test Battery.

    PubMed

    Mueller, Shane T; Piper, Brian J

    2014-01-30

    We briefly describe the Psychology Experiment Building Language (PEBL), an open source software system for designing and running psychological experiments. We describe the PEBL Test Battery, a set of approximately 70 behavioral tests which can be freely used, shared, and modified. Included is a comprehensive set of past research upon which tests in the battery are based. We report the results of benchmark tests that establish the timing precision of PEBL. We consider alternatives to the PEBL system and battery tests. We conclude with a discussion of the ethical factors involved in the open source testing movement.
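Timing-precision benchmarks of the kind reported here typically measure two quantities: the resolution of the high-resolution clock and the jitter of timed delays. The following is a generic Python sketch of both measurements (PEBL itself uses its own scripting language; the function names and parameters below are illustrative, not PEBL code).

```python
import time

def timer_resolution(samples=200):
    """Smallest nonzero difference observed between consecutive reads
    of the high-resolution clock."""
    best = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        while t1 == t0:          # spin until the clock advances
            t1 = time.perf_counter()
        best = min(best, t1 - t0)
    return best

def sleep_jitter(target_s=0.005, repeats=10):
    """Mean absolute deviation between requested and actual sleep time."""
    errs = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        time.sleep(target_s)
        errs.append(abs((time.perf_counter() - t0) - target_s))
    return sum(errs) / len(errs)
```

On modern systems the clock resolution is typically well below a millisecond, while sleep jitter is dominated by the operating-system scheduler.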

  14. The Psychology Experiment Building Language (PEBL) and PEBL Test Battery

    PubMed Central

    Mueller, Shane T.; Piper, Brian J.

    2014-01-01

    Background We briefly describe the Psychology Experiment Building Language (PEBL), an open source software system for designing and running psychological experiments. New Method We describe the PEBL test battery, a set of approximately 70 behavioral tests which can be freely used, shared, and modified. Included is a comprehensive set of past research upon which tests in the battery are based. Results We report the results of benchmark tests that establish the timing precision of PEBL. Comparison with Existing Method We consider alternatives to the PEBL system and battery tests. Conclusions We conclude with a discussion of the ethical factors involved in the open source testing movement. PMID:24269254

  15. Toxicological benchmarks for screening potential contaminants of concern for effects on aquatic biota: 1994 Revision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suter, G.W. II; Mabrey, J.B.

    1994-07-01

    This report presents potential screening benchmarks for protection of aquatic life from contaminants in water. Because there is no guidance for screening benchmarks, a set of alternative benchmarks is presented herein. The alternative benchmarks are based on different conceptual approaches to estimating concentrations causing significant effects. For the upper screening benchmark, there are the acute National Ambient Water Quality Criteria (NAWQC) and the Secondary Acute Values (SAV). The SAV concentrations are values estimated with 80% confidence not to exceed the unknown acute NAWQC for those chemicals with no NAWQC. The alternative chronic benchmarks are the chronic NAWQC, the Secondary Chronic Value (SCV), the lowest chronic values for fish and daphnids from chronic toxicity tests, the estimated EC20 for a sensitive species, and the concentration estimated to cause a 20% reduction in the recruit abundance of largemouth bass. It is recommended that ambient chemical concentrations be compared to all of these benchmarks. If NAWQC are exceeded, the chemicals must be contaminants of concern because the NAWQC are applicable or relevant and appropriate requirements (ARARs). If NAWQC are not exceeded, but other benchmarks are, contaminants should be selected on the basis of the number of benchmarks exceeded and the conservatism of the particular benchmark values, as discussed in the text. To the extent that toxicity data are available, this report presents the alternative benchmarks for chemicals that have been detected on the Oak Ridge Reservation. It also presents the data used to calculate benchmarks and the sources of the data. It compares the benchmarks and discusses their relative conservatism and utility.
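The recommended screening logic, comparing an ambient concentration to every alternative benchmark and treating NAWQC exceedances as decisive because they are ARARs, can be sketched as follows. The benchmark names and values below are placeholders, not values from the report.

```python
def screen(concentration, benchmarks, arars=("acute_nawqc", "chronic_nawqc")):
    """Return (names of benchmarks exceeded, contaminant-of-concern flag).
    A chemical is automatically a contaminant of concern if any ARAR
    benchmark (here the NAWQC) is exceeded; None marks a missing benchmark."""
    exceeded = [name for name, value in benchmarks.items()
                if value is not None and concentration > value]
    is_coc = any(name in exceeded for name in arars)
    return exceeded, is_coc
```

When only non-ARAR benchmarks are exceeded, the report recommends prioritizing by the number of benchmarks exceeded and by how conservative those particular benchmarks are.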

  16. Computational Chemistry Comparison and Benchmark Database

    National Institute of Standards and Technology Data Gateway

    SRD 101 NIST Computational Chemistry Comparison and Benchmark Database (Web, free access)   The NIST Computational Chemistry Comparison and Benchmark Database is a collection of experimental and ab initio thermochemical properties for a selected set of molecules. The goals are to provide a benchmark set of molecules for the evaluation of ab initio computational methods and allow the comparison between different ab initio computational methods for the prediction of thermochemical properties.

  17. Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning

    NASA Astrophysics Data System (ADS)

    Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao

    2017-04-01

    Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.

  18. Predicting Protein–protein Association Rates using Coarse-grained Simulation and Machine Learning

    PubMed Central

    Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao

    2017-01-01

    Protein–protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate. PMID:28418043

  19. Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning.

    PubMed

    Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao

    2017-04-18

    Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.
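The cross-validation test described above can be illustrated with a generic leave-one-out loop. This sketch uses a simple 1-nearest-neighbor classifier as a stand-in for the paper's machine learning algorithm, and the feature encoding (a flexibility measure plus an energetic factor per complex) is hypothetical.

```python
def loo_nearest_neighbor(features, labels):
    """Leave-one-out cross-validation accuracy of a 1-nearest-neighbor
    classifier; features are equal-length numeric tuples, labels are
    e.g. 1 for 'rate overestimated' and 0 otherwise."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    correct = 0
    for i, (f, lab) in enumerate(zip(features, labels)):
        # nearest training example, excluding the held-out point itself
        j = min((k for k in range(len(features)) if k != i),
                key=lambda k: dist2(f, features[k]))
        correct += labels[j] == lab
    return correct / len(labels)
```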

  20. Novel probabilistic neuroclassifier

    NASA Astrophysics Data System (ADS)

    Hong, Jiang; Serpen, Gursel

    2003-09-01

    This paper proposes a novel probabilistic potential-function neural network classifier for classes that are multi-modally distributed, i.e., formed from sets of disjoint pattern clusters. The proposed classifier has a number of desirable properties that distinguish it from other neural network classifiers. A complete description of the algorithm in terms of its architecture and pseudocode is presented, along with a simulation analysis on a set of benchmark problems. The benchmark problems tested include IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer, Cleveland Heart Disease, and Thyroid Gland Disease. Simulation results indicate that the proposed neuro-classifier performs consistently better on a subset of problems for which other neural classifiers perform relatively poorly.
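A classical Parzen-window (probabilistic neural network) classifier illustrates the general idea of potential-function classification for multi-modally distributed classes: each class is scored by a sum of kernels centered on its training patterns, so disjoint clusters of the same class contribute naturally. This is a textbook sketch, not the paper's novel algorithm.

```python
import math

def pnn_classify(x, train, sigma=1.0):
    """Score each class by a class-size-normalized sum of Gaussian kernels
    centered on its training patterns; return the highest-scoring class.
    train is a list of (pattern, label) pairs."""
    scores, counts = {}, {}
    for pattern, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, pattern))
        scores[label] = scores.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    return max(scores, key=lambda c: scores[c] / counts[c])
```

Because the score is a sum over all patterns of a class, a class formed from two widely separated clusters is handled without any special casing, which is exactly the multi-modal situation the abstract targets.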

  1. Structural Benchmark Creep Testing for Microcast MarM-247 Advanced Stirling Convertor E2 Heater Head Test Article SN18

    NASA Technical Reports Server (NTRS)

    Krause, David L.; Brewer, Ethan J.; Pawlik, Ralph

    2013-01-01

    This report provides test methodology details and qualitative results for the first structural benchmark creep test of an Advanced Stirling Convertor (ASC) heater head of ASC-E2 design heritage. The test article was recovered from a flight-like Microcast MarM-247 heater head specimen previously used in helium permeability testing. The test article was utilized for benchmark creep test rig preparation, wall thickness and diametral laser scan hardware metrological developments, and induction heater custom coil experiments. In addition, a benchmark creep test was performed, terminated after one week when through-thickness cracks propagated at thermocouple weld locations. Following this, it was used to develop a unique temperature measurement methodology using contact thermocouples, thereby enabling future benchmark testing to be performed without the use of conventional welded thermocouples, proven problematic for the alloy. This report includes an overview of heater head structural benchmark creep testing, the origin of this particular test article, test configuration developments accomplished using the test article, creep predictions for its benchmark creep test, qualitative structural benchmark creep test results, and a short summary.

  2. Benchmarking of London Dispersion-Accounting Density Functional Theory Methods on Very Large Molecular Complexes.

    PubMed

    Risthaus, Tobias; Grimme, Stefan

    2013-03-12

    A new test set (S12L) containing 12 supramolecular noncovalently bound complexes is presented and used to evaluate seven different methods to account for dispersion in DFT (DFT-D3, DFT-D2, DFT-NL, XDM, dDsC, TS-vdW, M06-L) at different basis set levels against experimental, back-corrected reference energies. This allows conclusions about the performance of each method in an explorative research setting on "real-life" problems. Most DFT methods show satisfactory performance but, due to the size of the complexes, almost always require an explicit correction for the nonadditive Axilrod-Teller-Muto three-body dispersion interaction to get accurate results. The necessity of using a method capable of accounting for dispersion is clearly demonstrated in that the two-body dispersion contributions are on the order of 20-150% of the total interaction energy. MP2 and some variants thereof are shown to be insufficient for this, while a few tested D3-corrected semiempirical MO methods perform reasonably well. Overall, we suggest the use of this benchmark set as a "sanity check" against overfitting to overly small molecular cases.
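The two-body dispersion contribution discussed here has, in pairwise-additive schemes such as DFT-D2, the generic form E_disp = -s6 Σ f(R_ij) C6_ij / R_ij^6 with a short-range damping function f. The sketch below is a toy illustration of that functional form only; the C6 coefficients and damping parameters are placeholders, not any published parameterization.

```python
import math

def pairwise_dispersion(coords, c6, s6=1.0, d=20.0, r0=3.0):
    """Damped two-body dispersion sum E = -s6 * sum_ij f(Rij) * C6ij / Rij^6
    with Fermi damping f(R) = 1 / (1 + exp(-d*(R/r0 - 1))).
    C6ij is taken as the geometric mean of the atomic C6 coefficients."""
    e = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            rij = math.dist(coords[i], coords[j])
            c6ij = math.sqrt(c6[i] * c6[j])
            f = 1.0 / (1.0 + math.exp(-d * (rij / r0 - 1.0)))
            e -= s6 * f * c6ij / rij ** 6
    return e
```

The damping suppresses the correction at short range, where the density functional already describes the interaction, while the attractive R^-6 tail dominates for the large intermonomer separations typical of supramolecular complexes.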

  3. Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database

    PubMed Central

    Butkiewicz, Mariusz; Lowe, Edward W.; Mueller, Ralf; Mendenhall, Jeffrey L.; Teixeira, Pedro L.; Weaver, C. David; Meiler, Jens

    2013-01-01

    With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed, including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 at a TPR cutoff of 25% are observed. PMID:23299552
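One common way to quote enrichment at a fixed TPR cutoff is the ratio TPR/FPR at the score threshold where the cutoff is first reached; whether this matches the paper's exact definition is an assumption on my part. A minimal sketch:

```python
def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """TPR/FPR at the highest score threshold where the true positive
    rate first reaches tpr_cutoff (higher score = predicted active).
    labels: 1 for active, 0 for inactive."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        if tp / pos >= tpr_cutoff:
            fpr = fp / neg
            return (tp / pos) / fpr if fpr > 0 else float("inf")
    return 0.0
```

Under this definition a random ranker gives an enrichment near 1, so values of 15 to 101 indicate that actives are concentrated far up the ranked list.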

  4. Benchmarking the Multidimensional Stellar Implicit Code MUSIC

    NASA Astrophysics Data System (ADS)

    Goffrey, T.; Pratt, J.; Viallet, M.; Baraffe, I.; Popov, M. V.; Walder, R.; Folini, D.; Geroux, C.; Constantino, T.

    2017-04-01

    We present the results of a numerical benchmark study for the MUltidimensional Stellar Implicit Code (MUSIC) based on widely applicable two- and three-dimensional compressible hydrodynamics problems relevant to stellar interiors. MUSIC is an implicit large eddy simulation code that uses implicit time integration, implemented as a Jacobian-free Newton-Krylov method. A physics-based preconditioning technique, which can be adjusted to target varying physics, is used to improve the performance of the solver. The problems used for this benchmark study include the Rayleigh-Taylor and Kelvin-Helmholtz instabilities and the decay of the Taylor-Green vortex. Additionally, we show a test of hydrostatic equilibrium in a stellar environment that is dominated by radiative effects; in this setting the flexibility of the preconditioning technique is demonstrated. This work aims to bridge the gap between the hydrodynamic test problems typically used during development of numerical methods and the complex flows of stellar interiors. A series of multidimensional tests were performed, and each test case was analysed with a simple scalar diagnostic, with the aim of enabling direct code comparisons. As the tests performed do not have analytic solutions, we verify MUSIC by comparing it to established codes, including ATHENA and the PENCIL code. MUSIC is able both to reproduce behaviour from established and widely used codes and to produce results expected from theoretical predictions. This benchmarking study concludes a series of papers describing the development of the MUSIC code and provides confidence in future applications.

  5. High-energy neutron depth-dose distribution experiment.

    PubMed

    Ferenci, M S; Hertel, N E

    2003-01-01

    A unique set of high-energy neutron depth-dose benchmark experiments was performed at the Los Alamos Neutron Science Center/Weapons Neutron Research (LANSCE/WNR) complex. The experiments consisted of filtered neutron beams with energies up to 800 MeV impinging on a 30 x 30 x 30 cm3 liquid, tissue-equivalent phantom. The absorbed dose was measured in the phantom at various depths with tissue-equivalent ion chambers. This experiment is intended to serve as a benchmark for the testing of high-energy radiation transport codes for the international radiation protection community.

  6. Developing a benchmark for emotional analysis of music

    PubMed Central

    Aljanaki, Anna; Yang, Yi-Hsuan; Soleymani, Mohammad

    2017-01-01

    The music emotion recognition (MER) field has rapidly expanded in the last decade, and many new methods and audio features have been developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of these new methods because of the diversity of data representations and the scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, the MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons, with 2 Hz time resolution). Using DEAM, we organized the ‘Emotion in Music’ task at the MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature sets. We also describe the design of the benchmark, the evaluation procedures, and the data cleaning and transformations that we suggest. The results from the benchmark suggest that recurrent neural network based approaches combined with large feature sets work best for dynamic MER. PMID:28282400

  7. Developing a benchmark for emotional analysis of music.

    PubMed

    Aljanaki, Anna; Yang, Yi-Hsuan; Soleymani, Mohammad

    2017-01-01

    The music emotion recognition (MER) field has rapidly expanded in the last decade, and many new methods and audio features have been developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of these new methods because of the diversity of data representations and the scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, the MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons, with 2 Hz time resolution). Using DEAM, we organized the 'Emotion in Music' task at the MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature sets. We also describe the design of the benchmark, the evaluation procedures, and the data cleaning and transformations that we suggest. The results from the benchmark suggest that recurrent neural network based approaches combined with large feature sets work best for dynamic MER.

  8. Advancing Ohio's P-16 Agenda: Exit and Entrance Exam?

    ERIC Educational Resources Information Center

    Rochford, Joseph A.

    2004-01-01

    Tests like the Ohio Graduation Test are part of what has become known as the "standards-based" reform movement in education. Simply put, they allow states to measure whether or not students are learning according to whatever set of standards, benchmarks and indicators are adopted by that state. They also help meet, in part, the reporting…

  9. Assessing Discriminative Performance at External Validation of Clinical Prediction Models

    PubMed Central

    Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.

    2016-01-01

    Introduction External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore, we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results The permutation test indicated that the validation and development set were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753

  10. Assessing Discriminative Performance at External Validation of Clinical Prediction Models.

    PubMed

    Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W

    2016-01-01

    External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore, we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. The permutation test indicated that the validation and development set were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients.
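A permutation test on the c-statistic can be sketched generically: compute the drop in concordance from development to validation, then compare it with the drops obtained after repeatedly permuting subjects between the two sets. This simplified sketch scores fixed predictions rather than refitting the model for each permutation (which the studied test may do); all names and the data format are illustrative.

```python
import random

def c_statistic(pred, outcome):
    """Concordance probability: chance that a random event subject gets a
    higher prediction than a random non-event subject (ties count 0.5)."""
    events = [p for p, y in zip(pred, outcome) if y == 1]
    nonevents = [p for p, y in zip(pred, outcome) if y == 0]
    if not events or not nonevents:
        return 0.5  # undefined; treat as chance level
    conc = sum(1.0 if e > n else 0.5 if e == n else 0.0
               for e in events for n in nonevents)
    return conc / (len(events) * len(nonevents))

def permutation_p(dev, val, n_perm=200, seed=0):
    """Permutation p-value for the change in c-statistic between sets.
    dev/val are lists of (prediction, outcome) pairs."""
    rng = random.Random(seed)
    observed = abs(c_statistic(*zip(*dev)) - c_statistic(*zip(*val)))
    pooled = dev + val
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d, v = pooled[:len(dev)], pooled[len(dev):]
        hits += abs(c_statistic(*zip(*d)) - c_statistic(*zip(*v))) >= observed
    return (hits + 1) / (n_perm + 1)
```

As the abstract warns, a small p-value here conflates genuine miscalibration of coefficients with benign case-mix differences, which is exactly why benchmark values of the c-statistic are needed alongside the test.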

  11. Benchmark and Framework for Encouraging Research on Multi-Threaded Testing Tools

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Stoller, Scott D.; Ur, Shmuel

    2003-01-01

    A problem that has been gaining prominence in testing is that of looking for intermittent bugs. Multi-threaded code is becoming very common, mostly on the server side. As there is no silver bullet solution, research focuses on a variety of partial solutions. In this paper (invited by PADTAD 2003) we outline a proposed project to facilitate research. The project goals are as follows. The first goal is to create a benchmark that can be used to evaluate different solutions. The benchmark, apart from containing programs with documented bugs, will include other artifacts, such as traces, that are useful for evaluating some of the technologies. The second goal is to create a set of tools with open APIs that can be used to check ideas without building a large system. For example, an instrumentor will be available that could be used to test temporal noise-making heuristics. The third goal is to create a focus for the research in this area, around which a community of people who try to solve similar problems with different techniques could congregate.

  12. A determination of the external forces required to move the benchmark active controls testing model in pure plunge and pure pitch

    NASA Technical Reports Server (NTRS)

    Dcruz, Jonathan

    1993-01-01

    In view of the strong need for a well-documented set of experimental data which is suitable for the validation and/or calibration of modern Computational Fluid Dynamics codes, the Benchmark Models Program was initiated by the Structural Dynamics Division of the NASA Langley Research Center. One of the models in the program, the Benchmark Active Controls Testing Model, consists of a rigid wing of rectangular planform with a NACA 0012 profile and three control surfaces (a trailing-edge control surface, a lower-surface spoiler, and an upper-surface spoiler). The model is affixed to a flexible mount system which allows only plunging and/or pitching motion. An approximate analytical determination of the forces required to move this model, with its control surfaces fixed, in pure plunge and pure pitch at a number of test conditions is included. This provides a good indication of the type of actuator system required to generate the aerodynamic data resulting from pure plunging and pure pitching motion, in which much interest was expressed. The analysis makes use of previously obtained numerical results.

  13. Insight into organic reactions from the direct random phase approximation and its corrections

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruzsinszky, Adrienn; Zhang, Igor Ying; Scheffler, Matthias

    2015-10-14

    The performance of the random phase approximation (RPA) and beyond-RPA approximations for the treatment of electron correlation is benchmarked on three different molecular test sets. The test sets are chosen to represent three typical sources of error which can contribute to the failure of most density functional approximations in chemical reactions. The first test set (atomization and n-homodesmotic reactions) offers a gradually increasing balance of error from the chemical environment. The second test set (Diels-Alder reaction cycloaddition = DARC) reflects more the effect of weak dispersion interactions in chemical reactions. Finally, the third test set (self-interaction error 11 = SIE11) represents reactions which are exposed to noticeable self-interaction errors. This work seeks to answer whether any one of the many-body approximations considered here successfully addresses all these challenges.

  14. Protein Models Docking Benchmark 2

    PubMed Central

    Anishchenko, Ivan; Kundrotas, Petras J.; Tuzikov, Alexander V.; Vakser, Ilya A.

    2015-01-01

    Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu. PMID:25712716
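The model-to-native Cα RMSD used to stratify this benchmark (1 to 6 Å) is conventionally computed after optimal superposition via the Kabsch algorithm. Below is a sketch under the assumption that the model and native Cα coordinates are already matched residue-by-residue; the function name is illustrative.

```python
import numpy as np

def ca_rmsd(model, native):
    """Calpha RMSD after optimal superposition (Kabsch algorithm).
    model, native: (N, 3) arrays of matched Calpha coordinates."""
    p = np.asarray(model, float) - np.mean(model, axis=0)   # center both
    q = np.asarray(native, float) - np.mean(native, axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)                       # covariance SVD
    sign = np.sign(np.linalg.det(u @ vt))                   # no reflections
    r = u @ np.diag([1.0, 1.0, sign]) @ vt                  # optimal rotation
    diff = p @ r - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

The determinant sign fix prevents the superposition from using an improper rotation (a mirror image), which would artificially lower the RMSD.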

  15. Benchmarking an unstructured grid sediment model in an energetic estuary

    DOE PAGES

    Lopez, Jesse E.; Baptista, António M.

    2016-12-14

    A sediment model coupled to the hydrodynamic model SELFE is validated against a benchmark combining a set of idealized tests and an application to a field-data rich energetic estuary. After sensitivity studies, model results for the idealized tests largely agree with previously reported results from other models in addition to analytical, semi-analytical, or laboratory results. Results of suspended sediment in an open channel test with fixed bottom are sensitive to turbulence closure and treatment for hydrodynamic bottom boundary. Results for the migration of a trench are very sensitive to critical stress and erosion rate, but largely insensitive to turbulence closure. The model is able to qualitatively represent sediment dynamics associated with estuarine turbidity maxima in an idealized estuary. Applied to the Columbia River estuary, the model qualitatively captures sediment dynamics observed by fixed stations and shipborne profiles. Representation of the vertical structure of suspended sediment degrades when stratification is underpredicted. Across all tests, skill metrics of suspended sediments lag those of hydrodynamics even when qualitatively representing dynamics. The benchmark is fully documented in an openly available repository to encourage unambiguous comparisons against other models.
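The abstract reports suspended-sediment skill metrics lagging those of hydrodynamics without naming a specific score; a commonly used choice for this kind of model-observation comparison is the Willmott index of agreement. The sketch below is a minimal illustration with made-up concentration values, not data from the study:

```python
def willmott_skill(model, obs):
    """Willmott index of agreement: 1 is perfect agreement, 0 is none."""
    obar = sum(obs) / len(obs)
    num = sum((m - o) ** 2 for m, o in zip(model, obs))
    den = sum((abs(m - obar) + abs(o - obar)) ** 2 for m, o in zip(model, obs))
    return 1.0 - num / den

# Hypothetical modeled vs. observed suspended-sediment concentrations (mg/L)
obs = [10, 14, 22, 30, 25, 18]
mod = [12, 15, 20, 27, 26, 20]
skill = willmott_skill(mod, obs)
```

A skill near 1 indicates close agreement; the same function applied to the velocity or salinity fields would make the "hydrodynamics vs. sediment" comparison in the abstract quantitative.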

  16. The mass storage testing laboratory at GSFC

    NASA Technical Reports Server (NTRS)

    Venkataraman, Ravi; Williams, Joel; Michaud, David; Gu, Heng; Kalluri, Atri; Hariharan, P. C.; Kobler, Ben; Behnke, Jeanne; Peavey, Bernard

    1998-01-01

    Industry-wide benchmarks exist for measuring the performance of processors (SPECmarks), and of database systems (Transaction Processing Council). Despite storage having become the dominant item in computing and IT (Information Technology) budgets, no such common benchmark is available in the mass storage field. Vendors and consultants provide services and tools for capacity planning and sizing, but these do not account for the complete set of metrics needed in today's archives. The availability of automated tape libraries, high-capacity RAID systems, and high-bandwidth interconnectivity between processor and peripherals has led to demands for services which traditional file systems cannot provide. File Storage and Management Systems (FSMS), which began to be marketed in the late 1980s, have helped to some extent with large tape libraries, but their use has introduced additional parameters affecting performance. The aim of the Mass Storage Test Laboratory (MSTL) at Goddard Space Flight Center is to develop a test suite that includes not only a comprehensive check list to document a mass storage environment but also benchmark code. Benchmark code is being tested which will provide measurements for both baseline systems, i.e., applications interacting with peripherals through the operating system services, and for combinations involving an FSMS. The benchmarks are written in C, and are easily portable. They are initially being aimed at the UNIX Open Systems world. Measurements are being made using a Sun Ultra 170 Sparc with 256MB memory running Solaris 2.5.1 with the following configuration: 4mm tape stacker on SCSI 2 Fast/Wide; 4GB disk device on SCSI 2 Fast/Wide; and Sony Petaserve on Fast/Wide differential SCSI 2.

  17. Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data.

    PubMed

    Li, Jiuyong; Liu, Lin; Liu, Jixue; Green, Ryan

    2017-12-01

    It is common for a trained classification model to be applied to operating data that deviates from the training data because of noise. This paper tests an ensemble method, Diversified Multiple Tree (DMT), on its capability to classify instances from a new laboratory using a classifier built on instances from another laboratory. DMT is tested on three real-world biomedical data sets from different laboratories, in comparison with four benchmark ensemble methods: AdaBoost, Bagging, Random Forests, and Random Trees. Experiments have also been conducted to study the limitations of DMT and its possible variations. Experimental results show that DMT is significantly more accurate than the other benchmark ensemble classifiers at classifying new instances from a laboratory other than the one whose instances were used to build the classifier. This paper demonstrates that the DMT ensemble classifier is more robust in classifying noisy data than other widely used ensemble methods. DMT requires a data set that supports multiple simple trees.

  18. Test Cases for the Benchmark Active Controls: Spoiler and Control Surface Oscillations and Flutter

    NASA Technical Reports Server (NTRS)

    Bennett, Robert M.; Scott, Robert C.; Wieseman, Carol D.

    2000-01-01

    As a portion of the Benchmark Models Program at NASA Langley, a simple generic model was developed for active controls research and was called BACT for Benchmark Active Controls Technology model. This model was based on the previously-tested Benchmark Models rectangular wing with the NACA 0012 airfoil section that was mounted on the Pitch and Plunge Apparatus (PAPA) for flutter testing. The BACT model had an upper surface spoiler, a lower surface spoiler, and a trailing edge control surface for use in flutter suppression and dynamic response excitation. Previous experience with flutter suppression indicated a need for measured control surface aerodynamics for accurate control law design. Three different types of flutter instability boundaries had also been determined for the NACA 0012/PAPA model, a classical flutter boundary, a transonic stall flutter boundary at angle of attack, and a plunge instability near M = 0.9. Therefore an extensive set of steady and control surface oscillation data was generated spanning the range of the three types of instabilities. This information was subsequently used to design control laws to suppress each flutter instability. There have been three tests of the BACT model. The objective of the first test, TDT Test 485, was to generate a data set of steady and unsteady control surface effectiveness data, and to determine the open loop dynamic characteristics of the control systems including the actuators. Unsteady pressures, loads, and transfer functions were measured. The other two tests, TDT Test 502 and TDT Test 518, were primarily oriented towards active controls research, but some data supplementary to the first test were obtained. Dynamic response of the flexible system to control surface excitation and open loop flutter characteristics were determined during Test 502. Loads were not measured during the last two tests. During these tests, a database of over 3000 data sets was obtained.
A reasonably extensive subset of the data sets from the first two tests has been chosen for Test Cases for computational comparisons concentrating on static conditions and cases with harmonically oscillating control surfaces. Several flutter Test Cases from both tests have also been included. Some aerodynamic comparisons with the BACT data have been made using computational fluid dynamics codes at the Navier-Stokes level (and in the accompanying chapter SC). Some mechanical and active control studies have been presented. In this report several Test Cases are selected to illustrate trends for a variety of different conditions with emphasis on transonic flow effects. Cases for static angles of attack, static trailing-edge and upper-surface spoiler deflections are included for a range of conditions near those for the oscillation cases. Cases for trailing-edge control and upper-surface spoiler oscillations for a range of Mach numbers, angles of attack, and static control deflections are included. Cases for all three types of flutter instability are selected. In addition some cases are included for dynamic response measurements during forced oscillations of the controls on the flexible mount. An overview of the model and tests is given, and the standard formulary for these data is listed. Some sample data and sample results of calculations are presented. Only the static pressures and the first harmonic real and imaginary parts of the pressures are included in the data for the Test Cases, but digitized time histories have been archived. The data for the Test Cases are also available as separate electronic files.

  19. Optimized selection of benchmark test parameters for image watermark algorithms based on Taguchi methods and corresponding influence on design decisions for real-world applications

    NASA Astrophysics Data System (ADS)

    Rodriguez, Tony F.; Cushman, David A.

    2003-06-01

    With the growing commercialization of watermarking techniques in various application scenarios it has become increasingly important to quantify the performance of watermarking products. The quantification of the relative merits of various products is not only essential in enabling further adoption of the technology by society as a whole, but will also drive the industry to develop testing plans/methodologies to ensure quality and minimize cost (to both vendors and customers). While the research community understands the theoretical need for a publicly available benchmarking system to quantify performance, there has been less discussion on the practical application of these systems. By providing a standard set of acceptance criteria, benchmarking systems can dramatically increase the quality of a particular watermarking solution, validating product performance, if they are used efficiently and frequently during the design process. In this paper we describe how to leverage specific design of experiments techniques to increase the quality of a watermarking scheme, to be used with the benchmark tools being developed by the Ad-Hoc Watermark Verification Group. A Taguchi Loss Function is proposed for an application and orthogonal arrays used to isolate optimal levels for a multi-factor experimental situation. Finally, the results are generalized to a population of cover works and validated through an exhaustive test.
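A nominal-the-best Taguchi loss penalizes deviation from a target quadratically, L(y) = k(y - m)^2. As a hedged illustration (the paper does not publish its factor levels), one might score a watermark detector's post-attack bit-error rate like this, with the target, the scaling constant k, and the BER values all purely hypothetical:

```python
def taguchi_loss(y, target, k):
    """Nominal-the-best quadratic Taguchi loss: L(y) = k * (y - target)**2."""
    return k * (y - target) ** 2

# Hypothetical calibration: a post-attack bit-error rate (BER) of 0.05
# is defined to cost exactly 1.0 "quality unit"; the ideal BER is 0.0.
k = 1.0 / 0.05 ** 2
losses = {ber: taguchi_loss(ber, 0.0, k) for ber in (0.0, 0.02, 0.05)}
```

Averaging such losses over an orthogonal array of attack/parameter combinations is what lets the Taguchi method rank factor levels with relatively few experimental runs.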

  20. A Benchmarking Initiative for Reactive Transport Modeling Applied to Subsurface Environmental Applications

    NASA Astrophysics Data System (ADS)

    Steefel, C. I.

    2015-12-01

    Over the last 20 years, we have seen the evolution of multicomponent reactive transport modeling and the expanding range and increasing complexity of subsurface environmental applications it is being used to address. Reactive transport modeling is being asked to provide accurate assessments of engineering performance and risk for important issues with far-reaching consequences. As a result, the complexity and detail of subsurface processes, properties, and conditions that can be simulated have significantly expanded. Closed-form solutions are necessary and useful, but limited to situations that are far simpler than typical applications that combine many physical and chemical processes, in many cases in coupled form. In the absence of closed-form and yet realistic solutions for complex applications, numerical benchmark problems with an accepted set of results will be indispensable to qualifying codes for various environmental applications. The intent of this benchmarking exercise, now underway for more than five years, is to develop and publish a set of well-described benchmark problems that can be used to demonstrate simulator conformance with norms established by the subsurface science and engineering community. The objective is not to verify this or that specific code (the reactive transport codes play a supporting role in this regard), but rather to use the codes to verify that a common solution of the problem can be achieved. Thus, the objective of each of the manuscripts is to present an environmentally-relevant benchmark problem that tests the conceptual model capabilities, numerical implementation, process coupling, and accuracy. The benchmark problems developed to date include 1) microbially-mediated reactions, 2) isotopes, 3) multi-component diffusion, 4) uranium fate and transport, 5) metal mobility in mining affected systems, and 6) waste repositories and related aspects.

  1. Comparative modeling and benchmarking data sets for human histone deacetylases and sirtuin families.

    PubMed

    Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-02-23

    Histone deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases, and other types of diseases. Virtual screening (VS) has become a fairly effective approach to the discovery of novel and highly selective histone deacetylase inhibitors (HDACIs). To facilitate the process, we constructed maximal unbiased benchmarking data sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs cover all four classes including Class III (Sirtuins family) and 14 HDAC isoforms, composed of 631 inhibitors and 24,609 unbiased decoys. Its ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matching with ligands and maximal unbiased in terms of "artificial enrichment" and "analogue bias". We also conducted comparative studies with DUD-E and DEKOIS 2.0 sets against HDAC2 and HDAC8 targets and demonstrate that our MUBD-HDACs are unique in that they can be applied unbiasedly to both LBVS and SBVS approaches. In addition, we defined a novel metric, i.e. NLBScore, to detect the "2D bias" and "LBVS favorable" effect within the benchmarking sets. In summary, MUBD-HDACs are the only comprehensive and maximal-unbiased benchmark data sets for HDACs (including Sirtuins) that are available so far. MUBD-HDACs are freely available at http://www.xswlab.org/.
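The "artificial enrichment" that benchmarking sets like these are screened for is typically quantified with an enrichment factor (EF): the hit rate among the top-scoring fraction of a ranked library divided by the overall hit rate. A minimal sketch, with invented scores and labels rather than MUBD-HDACs data:

```python
def enrichment_factor(scores, labels, fraction=0.1):
    """EF: hit rate in the top-scoring fraction over the overall hit rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_top = max(1, int(len(ranked) * fraction))
    hits_top = sum(lab for _, lab in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / len(labels))

# Invented example: 100 compounds, 5 actives (label 1), 3 ranked in the top 10
scores = list(range(100, 0, -1))
labels = [1, 1, 1] + [0] * 47 + [1] + [0] * 19 + [1] + [0] * 29
ef = enrichment_factor(scores, labels)
```

A benchmarking set is "artificially enriched" when trivial property differences between ligands and decoys (rather than real recognition) drive EF well above 1 for any scoring scheme.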

  2. Microbially Mediated Kinetic Sulfur Isotope Fractionation: Reactive Transport Modeling Benchmark

    NASA Astrophysics Data System (ADS)

    Wanner, C.; Druhan, J. L.; Cheng, Y.; Amos, R. T.; Steefel, C. I.; Ajo Franklin, J. B.

    2014-12-01

    Microbially mediated sulfate reduction is a ubiquitous process in many subsurface systems. Isotopic fractionation is characteristic of this anaerobic process, since sulfate reducing bacteria (SRB) favor the reduction of the lighter sulfate isotopologue (32SO4^2-) over the heavier isotopologue (34SO4^2-). Detection of isotopic shifts has been utilized as a proxy for the onset of sulfate reduction in subsurface systems such as oil reservoirs and aquifers undergoing uranium bioremediation. Reactive transport modeling (RTM) of kinetic sulfur isotope fractionation has been applied to field and laboratory studies. These RTM approaches employ different mathematical formulations in the representation of kinetic sulfur isotope fractionation. In order to test the various formulations, we propose a benchmark problem set for the simulation of kinetic sulfur isotope fractionation during microbially mediated sulfate reduction. The benchmark problem set is comprised of four problem levels and is based on a recent laboratory column experimental study of sulfur isotope fractionation. Pertinent processes impacting sulfur isotopic composition such as microbial sulfate reduction and dispersion are included in the problem set. To date, participating RTM codes are: CRUNCHTOPE, TOUGHREACT, MIN3P and THE GEOCHEMIST'S WORKBENCH. Preliminary results from various codes show reasonable agreement for the problem levels simulating sulfur isotope fractionation in 1D.
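A common closed-form check for codes simulating kinetic sulfur isotope fractionation is the Rayleigh distillation model for the residual sulfate pool. The sketch below uses illustrative values (initial d34S of 0 permil and an enrichment factor of -30 permil, typical of SRB but not taken from this benchmark):

```python
def delta34s_rayleigh(delta0, f, epsilon):
    """d34S (permil) of residual sulfate under Rayleigh distillation.
    delta0: initial d34S (permil); f: fraction of sulfate remaining;
    epsilon: kinetic enrichment factor (permil; negative for SRB)."""
    alpha = 1.0 + epsilon / 1000.0   # fractionation factor k34/k32
    return (delta0 + 1000.0) * f ** (alpha - 1.0) - 1000.0

# Assumed, illustrative values: residual sulfate grows isotopically heavier
# (higher d34S) as the fraction remaining drops.
residual = {f: delta34s_rayleigh(0.0, f, -30.0) for f in (1.0, 0.5, 0.1)}
```

In a 1D benchmark level without transport, each participating code's simulated column effluent can be compared against this analytical curve before dispersion is switched on.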

  3. Benchmarking Procedures for High-Throughput Context Specific Reconstruction Algorithms

    PubMed Central

    Pacheco, Maria P.; Pfau, Thomas; Sauter, Thomas

    2016-01-01

    Recent progress in high-throughput data acquisition has shifted the focus from data generation to processing and understanding of how to integrate collected information. Context specific reconstruction based on generic genome scale models like ReconX or HMR has the potential to become a diagnostic and treatment tool tailored to the analysis of specific individuals. The respective computational algorithms require a high level of predictive power, robustness and sensitivity. Although multiple context specific reconstruction algorithms were published in the last 10 years, only a fraction of them is suitable for model building based on human high-throughput data. Among other reasons, this might be due to problems arising from the limitation to only one metabolic target function or arbitrary thresholding. This review describes and analyses common validation methods used for testing model building algorithms. Two major methods can be distinguished: consistency testing and comparison based testing. The first is concerned with robustness against noise, e.g., missing data due to the impossibility of distinguishing between the signal and the background of non-specific binding of probes in a microarray experiment, and whether distinct sets of input expressed genes corresponding to, e.g., different tissues yield distinct models. The latter covers methods comparing sets of functionalities, comparison with existing networks or additional databases. We test those methods on several available algorithms and deduce properties of these algorithms that can be compared with future developments. The set of tests performed can therefore serve as a benchmarking procedure for future algorithms. PMID:26834640

  4. Maximal Unbiased Benchmarking Data Sets for Human Chemokine Receptors and Comparative Analysis.

    PubMed

    Xia, Jie; Reid, Terry-Elinor; Wu, Song; Zhang, Liangren; Wang, Xiang Simon

    2018-05-29

    Chemokine receptors (CRs) have long been druggable targets for the treatment of inflammatory diseases and HIV-1 infection. As a powerful technique, virtual screening (VS) has been widely applied to identifying small molecule leads for modern drug targets including CRs. For rational selection of a wide variety of VS approaches, ligand enrichment assessment based on a benchmarking data set has become an indispensable practice. However, the lack of versatile benchmarking sets for the whole CRs family that are able to unbiasedly evaluate every single approach including both structure- and ligand-based VS somewhat hinders modern drug discovery efforts. To address this issue, we constructed Maximal Unbiased Benchmarking Data sets for human Chemokine Receptors (MUBD-hCRs) using our recently developed tools of MUBD-DecoyMaker. The MUBD-hCRs encompasses 13 subtypes out of 20 chemokine receptors, composed of 404 ligands and 15,756 decoys so far and is readily expandable in the future. It has been thoroughly validated that MUBD-hCRs ligands are chemically diverse, while its decoys are maximal unbiased in terms of "artificial enrichment" and "analogue bias". In addition, we studied the performance of MUBD-hCRs, in particular the CXCR4 and CCR5 data sets, in ligand enrichment assessments of both structure- and ligand-based VS approaches in comparison with other benchmarking data sets available in the public domain and demonstrated that MUBD-hCRs is very capable of designating the optimal VS approach. MUBD-hCRs is a unique and maximal unbiased benchmarking set that covers major CR subtypes so far.

  5. Machine characterization and benchmark performance prediction

    NASA Technical Reports Server (NTRS)

    Saavedra-Barrera, Rafael H.

    1988-01-01

    From runs of standard benchmarks or benchmark suites, it is not possible to characterize the machine or to predict the run time of other benchmarks which have not been run. A new approach to benchmarking and machine characterization is reported. The creation and use of a machine analyzer is described, which measures the performance of a given machine on FORTRAN source language constructs. The machine analyzer yields a set of parameters which characterize the machine and spotlight its strong and weak points. Also described is a program analyzer, which analyzes FORTRAN programs and determines the frequency of execution of each of the same set of source language operations. It is then shown that by combining a machine characterization and a program characterization, we are able to predict with good accuracy the run time of a given benchmark on a given machine. Characterizations are provided for the Cray-X-MP/48, Cyber 205, IBM 3090/200, Amdahl 5840, Convex C-1, VAX 8600, VAX 11/785, VAX 11/780, SUN 3/50, and IBM RT-PC/125, and for the following benchmark programs or suites: Los Alamos (BMK8A1), Baskett, Linpack, Livermore Loops, Mandelbrot Set, NAS Kernels, Shell Sort, Smith, Whetstone and Sieve of Eratosthenes.
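The prediction scheme described above amounts to a dot product between the machine characterization (time per abstract operation) and the program characterization (execution counts per operation). A toy sketch with invented numbers, not Saavedra-Barrera's measured parameters:

```python
# Hypothetical machine characterization: mean time (microseconds) per
# abstract FORTRAN-level operation on some machine.
machine_params = {"fadd": 0.12, "fmul": 0.15, "fdiv": 0.60, "branch": 0.05}

# Hypothetical program characterization: how many times a given benchmark
# executes each of those same operations.
program_counts = {"fadd": 2_000_000, "fmul": 1_500_000,
                  "fdiv": 100_000, "branch": 800_000}

# Predicted run time = sum over operations of (time per op) * (op count)
predicted_us = sum(machine_params[op] * program_counts[op]
                   for op in program_counts)
```

Because the two characterizations are measured independently, any machine vector can be combined with any program vector, which is what lets the method predict run times for benchmark/machine pairs that were never actually run.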

  6. Structural and Sequence Similarity Makes a Significant Impact on Machine-Learning-Based Scoring Functions for Protein-Ligand Interactions.

    PubMed

    Li, Yang; Yang, Jianyi

    2017-04-24

    The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set of simple descriptors representing the atomic distance counts, the RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBbind 2007 database, which is significantly higher than the performance of any conventional scoring function on the same benchmark. A few studies have been made to discuss the performance of machine-learning-based methods, but the reason for this improvement remains unclear. In this study, by systematically controlling the structural and sequence similarity between the training and test proteins of the PDBbind benchmark, we demonstrate that protein structural and sequence similarity makes a significant impact on machine-learning-based methods. After removal of training proteins that are highly similar to the test proteins identified by structure alignment and sequence alignment, machine-learning-based methods trained on the new training sets do not outperform the conventional scoring functions any more. On the contrary, the performance of conventional functions like X-Score is relatively stable no matter what training data are used to fit the weights of its energy terms.

  7. Once-through integral system (OTIS): Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gloudemans, J R

    1986-09-01

    A scaled experimental facility, designated the once-through integral system (OTIS), was used to acquire post-small break loss-of-coolant accident (SBLOCA) data for benchmarking system codes. OTIS was also used to investigate the application of the Abnormal Transient Operating Guidelines (ATOG) used in the Babcock and Wilcox (B and W) designed nuclear steam supply system (NSSS) during the course of an SBLOCA. OTIS was a single-loop facility with a plant to model power scale factor of 1686. OTIS maintained the key elevations, approximate component volumes, and loop flow resistances, and simulated the major component phenomena of a B and W raised-loop nuclear plant. A test matrix consisting of 15 tests divided into four categories was performed. The largest group contained 10 tests and was defined to parametrically obtain an extensive set of plant-typical experimental data for code benchmarking. Parameters such as leak size, leak location, and high-pressure injection (HPI) shut-off head were individually varied. The remaining categories were specified to study the impact of the ATOGs (2 tests), to note the effect of guard heater operation on observed phenomena (2 tests), and to provide a data set for comparison with previous test experience (1 test). A summary of the test results and a detailed discussion of Test 220100 is presented. Test 220100 was the nominal or reference test for the parametric studies. This test was performed with a scaled 10-cm² leak located in the cold leg suction piping.

  8. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction

    PubMed Central

    Puton, Tomasz; Kozlowski, Lukasz P.; Rother, Kristian M.; Bujnicki, Janusz M.

    2013-01-01

    We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks. PMID:23435231
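Benchmarks like CompaRNA typically score a predicted secondary structure by comparing its set of base pairs against the reference structure, reporting sensitivity and positive predictive value (PPV). A minimal sketch with hypothetical base pairs for a short hairpin:

```python
def pair_accuracy(reference, predicted):
    """Sensitivity (TP/|ref|) and PPV (TP/|pred|) over base-pair sets."""
    ref, pred = set(reference), set(predicted)
    tp = len(ref & pred)   # correctly predicted base pairs
    return tp / len(ref), tp / len(pred)

# Hypothetical (i, j) base pairs; positions are 1-based nucleotide indices
ref_pairs  = [(1, 20), (2, 19), (3, 18), (4, 17)]
pred_pairs = [(1, 20), (2, 19), (4, 17), (5, 16)]
sens, ppv = pair_accuracy(ref_pairs, pred_pairs)
```

Averaging such per-structure scores over each week's newly released PDB structures is what makes the benchmark "continuous" rather than a one-off snapshot.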

  9. A suite of exercises for verifying dynamic earthquake rupture codes

    USGS Publications Warehouse

    Harris, Ruth A.; Barall, Michael; Aagaard, Brad T.; Ma, Shuo; Roten, Daniel; Olsen, Kim B.; Duan, Benchun; Liu, Dunyu; Luo, Bin; Bai, Kangchen; Ampuero, Jean-Paul; Kaneko, Yoshihiro; Gabriel, Alice-Agnes; Duru, Kenneth; Ulrich, Thomas; Wollherr, Stephanie; Shi, Zheqiang; Dunham, Eric; Bydlon, Sam; Zhang, Zhenguo; Chen, Xiaofei; Somala, Surendra N.; Pelties, Christian; Tago, Josue; Cruz-Atienza, Victor Manuel; Kozdon, Jeremy; Daub, Eric; Aslam, Khurram; Kase, Yuko; Withers, Kyle; Dalguer, Luis

    2018-01-01

    We describe a set of benchmark exercises that are designed to test if computer codes that simulate dynamic earthquake rupture are working as intended. These types of computer codes are often used to understand how earthquakes operate, and they produce simulation results that include earthquake size, amounts of fault slip, and the patterns of ground shaking and crustal deformation. The benchmark exercises examine a range of features that scientists incorporate in their dynamic earthquake rupture simulations. These include implementations of simple or complex fault geometry, off‐fault rock response to an earthquake, stress conditions, and a variety of formulations for fault friction. Many of the benchmarks were designed to investigate scientific problems at the forefronts of earthquake physics and strong ground motions research. The exercises are freely available on our website for use by the scientific community.

  10. How Benchmarking and Higher Education Came Together

    ERIC Educational Resources Information Center

    Levy, Gary D.; Ronco, Sharron L.

    2012-01-01

    This chapter introduces the concept of benchmarking and how higher education institutions began to use benchmarking for a variety of purposes. Here, benchmarking is defined as a strategic and structured approach whereby an organization compares aspects of its processes and/or outcomes to those of another organization or set of organizations to…

  11. SP2Bench: A SPARQL Performance Benchmark

    NASA Astrophysics Data System (ADS)

    Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg

    A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is settled in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror vital key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.

  12. Comparative Modeling and Benchmarking Data Sets for Human Histone Deacetylases and Sirtuin Families

    PubMed Central

    Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Histone Deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases and other types of diseases. Virtual screening (VS) has become a fairly effective approach to the discovery of novel and highly selective Histone Deacetylases Inhibitors (HDACIs). To facilitate the process, we constructed the Maximal Unbiased Benchmarking Data Sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs covers all 4 Classes including Class III (Sirtuins family) and 14 HDACs isoforms, composed of 631 inhibitors and 24,609 unbiased decoys. Its ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matching with ligands and maximal unbiased in terms of “artificial enrichment” and “analogue bias”. We also conducted comparative studies with DUD-E and DEKOIS 2.0 sets against HDAC2 and HDAC8 targets, and demonstrate that our MUBD-HDACs is unique in that it can be applied unbiasedly to both LBVS and SBVS approaches. In addition, we defined a novel metric, i.e. NLBScore, to detect the “2D bias” and “LBVS favorable” effect within the benchmarking sets. In summary, MUBD-HDACs is the only comprehensive and maximal-unbiased benchmark data sets for HDACs (including Sirtuins) that is available so far. MUBD-HDACs is freely available at http://www.xswlab.org/. PMID:25633490

  13. Least-Squares Spectral Element Solutions to the CAA Workshop Benchmark Problems

    NASA Technical Reports Server (NTRS)

    Lin, Wen H.; Chan, Daniel C.

    1997-01-01

    This paper presents computed results for some of the CAA benchmark problems via the acoustic solver developed at Rocketdyne CFD Technology Center under the corporate agreement between Boeing North American, Inc. and NASA for the Aerospace Industry Technology Program. The calculations are considered as benchmark testing of the functionality, accuracy, and performance of the solver. Results of these computations demonstrate that the solver is capable of solving the propagation of aeroacoustic signals. Testing on sound generation and on more realistic problems is now being pursued for industrial applications of this solver. Numerical calculations were performed for the second problem of Category 1 of the current workshop problems for an acoustic pulse scattered from a rigid circular cylinder, and for two of the first CAA workshop problems, i.e., the first problem of Category 1 for the propagation of a linear wave and the first problem of Category 4 for an acoustic pulse reflected from a rigid wall in a uniform flow of Mach 0.5. The aim of including the last two problems in this workshop is to test the effectiveness of some boundary conditions set up in the solver. Numerical results of the last two benchmark problems have been compared with their corresponding exact solutions and the comparisons are excellent. This demonstrates the high fidelity of the solver in handling wave propagation problems. This feature makes the method quite attractive for developing a computational acoustic solver for calculating the aero/hydrodynamic noise in a violent flow environment.

  14. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

    1991-01-01

    A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  15. Direct data access protocols benchmarking on DPM

    NASA Astrophysics Data System (ADS)

    Furano, Fabrizio; Devresse, Adrien; Keeble, Oliver; Mancinelli, Valentina

    2015-12-01

    The Disk Pool Manager is an example of a multi-protocol, multi-VO system for data access on the Grid that went through a considerable technical evolution in recent years. Among other features, its architecture offers the opportunity of testing its different data access frontends under exactly the same conditions, including hardware and backend software. This characteristic inspired the idea of collecting monitoring information from various testbeds in order to benchmark the behaviour of the HTTP and Xrootd protocols for the use case of data analysis, batch or interactive. A source of information is the set of continuous tests that are run towards the worldwide endpoints belonging to the DPM Collaboration, which accumulated relevant statistics in its first year of activity. On top of that, the DPM releases are based on multiple levels of automated testing that include performance benchmarks of various kinds, executed regularly every day. At the same time, the recent releases of DPM can report monitoring information about any data access protocol to the same monitoring infrastructure that is used to monitor the Xrootd deployments. Our goal is to evaluate under which circumstances the HTTP-based protocols can be good enough for batch or interactive data access. In this contribution we show and discuss the results that our test systems have collected under circumstances that include ROOT analyses using TTreeCache and stress tests on metadata performance.

  16. The Concepts "Benchmarks and Benchmarking" Used in Education Planning: Teacher Education as Example

    ERIC Educational Resources Information Center

    Steyn, H. J.

    2015-01-01

    Planning in education is a structured activity that includes several phases and steps that take into account several kinds of information (Steyn, Steyn, De Waal & Wolhuter, 2002: 146). One of the sets of information that are usually considered is the (so-called) "benchmarks" and "benchmarking" regarding the focus of a…

  17. HS06 Benchmark for an ARM Server

    NASA Astrophysics Data System (ADS)

    Kluth, Stefan

    2014-06-01

    We benchmarked an ARM cortex-A9 based server system with a four-core CPU running at 1.1 GHz. The system used Ubuntu 12.04 as operating system and the HEPSPEC 2006 (HS06) benchmarking suite was compiled natively with gcc-4.4 on the system. The benchmark was run for various settings of the relevant gcc compiler options. We did not find significant influence from the compiler options on the benchmark result. The final HS06 benchmark result is 10.4.

  18. The impact of database quality on keystroke dynamics authentication

    NASA Astrophysics Data System (ADS)

    Panasiuk, Piotr; Rybnik, Mariusz; Saeed, Khalid; Rogowski, Marcin

    2016-06-01

    This paper concerns keystroke dynamics, also partially in the context of touchscreen devices. The authors concentrate on the impact of database quality and propose their own algorithm for testing database quality issues. The algorithm is applied to their own database as well as to a well-known one. The following specific problems were investigated: classification accuracy, development of user typing proficiency, time precision during sample acquisition, representativeness of the training set, and sample length.

  19. An evaluation of the accuracy and speed of metagenome analysis tools

    PubMed Central

    Lindgreen, Stinus; Adair, Karen L.; Gardner, Paul P.

    2016-01-01

    Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming, and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html PMID:26778510
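Accuracy of a metagenome analysis tool on simulated data is commonly scored by comparing the predicted community composition against the known ground truth. One simple way to do that, sketched below, is an L1 (total-variation-style) distance over relative abundances; the taxa and numbers are invented, and the paper's actual evaluation metrics may differ:

```python
def l1_error(truth, predicted):
    """L1 distance between two relative-abundance profiles.
    Taxa absent from a profile count as abundance 0."""
    taxa = set(truth) | set(predicted)
    return sum(abs(truth.get(t, 0.0) - predicted.get(t, 0.0)) for t in taxa)

# Invented community profiles (fractions summing to 1)
truth = {"E. coli": 0.50, "B. subtilis": 0.30, "S. aureus": 0.20}
pred = {"E. coli": 0.40, "B. subtilis": 0.35, "S. aureus": 0.15, "P. putida": 0.10}
err = l1_error(truth, pred)
```

Here the false-positive taxon "P. putida" contributes its full predicted abundance to the error, which is why spurious predictions distort downstream conclusions about community composition.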

  20. Study on kinematic and compliance test of suspension

    NASA Astrophysics Data System (ADS)

    Jing, Lixin; Wu, Liguang; Li, Xuepeng; Zhang, Yu

    2017-09-01

    Chassis performance development is a major difficulty in vehicle research and development, and is the main factor restricting the independent development of vehicles in China. In recent years, through a large number of studies, chassis engineers have found that the suspension K&C (kinematics and compliance) characteristics, as quasi-static characteristics of the suspension, provide a technical route for suspension performance R&D, and the suspension K&C test has become an important means of vehicle benchmarking, optimization and verification. However, research on suspension K&C testing is limited in China, and the test conditions and setting requirements vary greatly from OEM to OEM. In this paper, the influence of different settings on the characteristics of the suspension is obtained through experiments, and the causes of the differences are analyzed; in order to fully reflect the suspension characteristics, the authors recommend appropriate test cases and settings.

  1. Minimizing the Total Service Time of Discrete Dynamic Berth Allocation Problem by an Iterated Greedy Heuristic

    PubMed Central

    2014-01-01

    Berth allocation is the forefront operation performed when ships arrive at a port and is a critical task in container port optimization. Minimizing the time ships spend at berths constitutes an important objective of berth allocation problems. This study focuses on the discrete dynamic berth allocation problem (discrete DBAP), which aims to minimize total service time, and proposes an iterated greedy (IG) algorithm to solve it. The proposed IG algorithm is tested on three benchmark problem sets. Experimental results show that the proposed IG algorithm can obtain optimal solutions for all test instances of the first and second problem sets and outperforms the best-known solutions for 35 out of 90 test instances of the third problem set. PMID:25295295
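The abstract does not spell out the algorithm, but a generic iterated greedy loop — destroy part of the incumbent solution, greedily rebuild it, accept improvements — can be sketched for a simplified discrete DBAP. The instance model below (handling times identical across berths, total service time as waiting plus handling) and all parameters are simplifications for illustration, not the authors' exact formulation:

```python
import random

def schedule_cost(berths, arrival, handling):
    """Total service time: ships at each berth are served in list order;
    a ship's service time is its completion time minus its arrival time."""
    total = 0.0
    for seq in berths:
        t = 0.0
        for s in seq:
            t = max(t, arrival[s]) + handling[s]
            total += t - arrival[s]
    return total

def greedy_insert(berths, ship, arrival, handling):
    """Insert `ship` at the (berth, position) giving the lowest total cost."""
    best = None
    for b, seq in enumerate(berths):
        for pos in range(len(seq) + 1):
            seq.insert(pos, ship)
            c = schedule_cost(berths, arrival, handling)
            seq.pop(pos)
            if best is None or c < best[0]:
                best = (c, b, pos)
    _, b, pos = best
    berths[b].insert(pos, ship)

def iterated_greedy(n_ships, n_berths, arrival, handling, d=2, iters=200, seed=0):
    rng = random.Random(seed)
    # Initial solution: greedy insertion in order of arrival.
    berths = [[] for _ in range(n_berths)]
    for s in sorted(range(n_ships), key=arrival.__getitem__):
        greedy_insert(berths, s, arrival, handling)
    best = [list(seq) for seq in berths]
    best_cost = schedule_cost(best, arrival, handling)
    for _ in range(iters):
        cand = [list(seq) for seq in best]
        removed = rng.sample(range(n_ships), d)   # destruction phase
        for seq in cand:
            seq[:] = [s for s in seq if s not in removed]
        for s in removed:                          # greedy reconstruction
            greedy_insert(cand, s, arrival, handling)
        c = schedule_cost(cand, arrival, handling)
        if c <= best_cost:                         # accept non-worsening moves
            best, best_cost = cand, c
    return best, best_cost

# Tiny invented instance: 5 ships, 2 berths
arrival = [0.0, 0.0, 1.0, 2.0, 3.0]
handling = [2.0, 3.0, 2.0, 1.0, 2.0]
plan, cost = iterated_greedy(5, 2, arrival, handling)
```

The destruction size `d` controls the diversification/intensification trade-off: small `d` makes each iteration cheap but local, large `d` approaches a restart.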

  2. BioPreDyn-bench: a suite of benchmark problems for dynamic modelling in systems biology.

    PubMed

    Villaverde, Alejandro F; Henriques, David; Smallbone, Kieran; Bongard, Sophia; Schmid, Joachim; Cicin-Sain, Damjan; Crombach, Anton; Saez-Rodriguez, Julio; Mauch, Klaus; Balsa-Canto, Eva; Mendes, Pedro; Jaeger, Johannes; Banga, Julio R

    2015-02-20

    Dynamic modelling is one of the cornerstones of systems biology. Many research efforts are currently being invested in the development and exploitation of large-scale kinetic models. The associated problems of parameter estimation (model calibration) and optimal experimental design are particularly challenging. The community has already developed many methods and software packages which aim to facilitate these tasks. However, there is a lack of suitable benchmark problems which allow a fair and systematic evaluation and comparison of these contributions. Here we present BioPreDyn-bench, a set of challenging parameter estimation problems which aspire to serve as reference test cases in this area. This set comprises six problems including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The level of description includes metabolism, transcription, signal transduction, and development. For each problem we provide (i) a basic description and formulation, (ii) implementations ready-to-run in several formats, (iii) computational results obtained with specific solvers, (iv) a basic analysis and interpretation. This suite of benchmark problems can be readily used to evaluate and compare parameter estimation methods. Further, it can also be used to build test problems for sensitivity and identifiability analysis, model reduction and optimal experimental design methods. The suite, including codes and documentation, can be freely downloaded from the BioPreDyn-bench website, https://sites.google.com/site/biopredynbenchmarks/ .

  3. The InterFrost benchmark of Thermo-Hydraulic codes for cold regions hydrology - first inter-comparison phase results

    NASA Astrophysics Data System (ADS)

    Grenier, Christophe; Rühaak, Wolfram

    2016-04-01

    Climate change impacts in permafrost regions have received considerable attention recently due to the pronounced warming trends experienced in recent decades and projected into the future. Large portions of these permafrost regions are characterized by surface water bodies (lakes, rivers) that interact with the surrounding permafrost, often generating taliks (unfrozen zones) within the permafrost that allow for hydrologic interactions between the surface water bodies and underlying aquifers, and thus influence the hydrologic response of a landscape to climate change. Recent field studies and modeling exercises indicate that a fully coupled 2D or 3D Thermo-Hydraulic (TH) approach is required to understand and model the past and future evolution of such units (Kurylyk et al. 2014). However, there is presently a paucity of 3D numerical studies of permafrost thaw and associated hydrological changes, which can be partly attributed to the difficulty in verifying multi-dimensional results produced by numerical models. A benchmark exercise was initiated at the end of 2014. Participants convened from the USA, Canada, and Europe, representing 13 simulation codes. The benchmark exercise consists of several test cases inspired by existing literature (e.g. McKenzie et al., 2007) as well as new ones (Kurylyk et al. 2014; Grenier et al. in prep.; Rühaak et al. 2015). They range from simpler, purely thermal 1D cases to more complex, coupled 2D TH cases (benchmarks TH1, TH2, and TH3). Some experimental cases conducted in a cold room complement the validation approach. A web site hosted by LSCE (Laboratoire des Sciences du Climat et de l'Environnement) serves as an interaction platform for the participants and hosts the test case databases at the following address: https://wiki.lsce.ipsl.fr/interfrost. The results of the first stage of the benchmark exercise will be presented. We will mainly focus on the inter-comparison of participant results for the coupled cases TH2 & TH3.
Both cases are essentially theoretical but include the full complexity of the coupled non-linear set of equations (heat transfer with conduction, advection, phase change and Darcian flow). The complete set of inter-comparison results shows that the participating codes all produce simulations which are quantitatively similar and correspond to physical intuition. From a quantitative perspective, they agree well over the whole set of performance measures. The differences among the simulation results will be discussed in more depth throughout the test cases especially for the identification of the threshold times for each system as these exhibited the least agreement. However, the results suggest that in spite of the difficulties associated with the resolution of the set of TH equations (coupled and non-linear structure with phase change providing steep slopes), the developed codes provide robust results with a qualitatively reasonable representation of the processes and offer a quantitatively realistic basis. Further perspectives of the exercise will also be presented.

  4. Coreference Resolution With Reconcile

    DTIC Science & Technology

    2010-07-01

    evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental… scores vary wildly across data sets, evaluation metrics, and system configurations. We believe that one root cause of these disparities is the high…

  5. Issues to consider in the derivation of water quality benchmarks for the protection of aquatic life.

    PubMed

    Schneider, Uwe

    2014-01-01

    While water quality benchmarks for the protection of aquatic life have been in use in some jurisdictions for several decades (USA, Canada, several European countries), more and more countries are now setting up their own national water quality benchmark development programs. In doing so, they either adopt an existing method from another jurisdiction, update on an existing approach, or develop their own new derivation method. Each approach has its own advantages and disadvantages, and many issues have to be addressed when setting up a water quality benchmark development program or when deriving a water quality benchmark. Each of these tasks requires a special expertise. They may seem simple, but are complex in their details. The intention of this paper was to provide some guidance for this process of water quality benchmark development on the program level, for the derivation methodology development, and in the actual benchmark derivation step, as well as to point out some issues (notably the inclusion of adapted populations and cryptic species and points to consider in the use of the species sensitivity distribution approach) and future opportunities (an international data repository and international collaboration in water quality benchmark development).

  6. The philosophy of benchmark testing a standards-based picture archiving and communications system.

    PubMed

    Richardson, N E; Thomas, J A; Lyche, D K; Romlein, J; Norton, G S; Dolecek, Q E

    1999-05-01

    The Department of Defense issued its requirements for a Digital Imaging Network-Picture Archiving and Communications System (DIN-PACS) in a Request for Proposals (RFP) to industry in January 1997, with subsequent contracts being awarded in November 1997 to the Agfa Division of Bayer and IBM Global Government Industry. The Government's technical evaluation process consisted of evaluating a written technical proposal as well as conducting a benchmark test of each proposed system at the vendor's test facility. The purpose of benchmark testing was to evaluate the performance of the fully integrated system in a simulated operational environment. The benchmark test procedures and test equipment were developed through a joint effort between the Government, academic institutions, and private consultants. Herein the authors discuss the resources required and the methods used to benchmark test a standards-based PACS.

  7. ‘Wasteaware’ benchmark indicators for integrated sustainable waste management in cities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilson, David C., E-mail: waste@davidcwilson.com; Rodic, Ljiljana; Cowing, Michael J.

    Highlights: • Solid waste management (SWM) is a key utility service, but data is often lacking. • Measuring their SWM performance helps a city establish priorities for action. • The Wasteaware benchmark indicators: measure both technical and governance aspects. • Have been developed over 5 years and tested in more than 50 cities on 6 continents. • Enable consistent comparison between cities and countries and monitoring progress. - Abstract: This paper addresses a major problem in international solid waste management, which is twofold: a lack of data, and a lack of consistent data to allow comparison between cities. The paper presents an indicator set for integrated sustainable waste management (ISWM) in cities both North and South, to allow benchmarking of a city’s performance, comparing cities and monitoring developments over time. It builds on pioneering work for UN-Habitat’s solid waste management in the World’s cities. The comprehensive analytical framework of a city’s solid waste management system is divided into two overlapping ‘triangles’ – one comprising the three physical components, i.e. collection, recycling, and disposal, and the other comprising three governance aspects, i.e. inclusivity; financial sustainability; and sound institutions and proactive policies. The indicator set includes essential quantitative indicators as well as qualitative composite indicators. This updated and revised ‘Wasteaware’ set of ISWM benchmark indicators is the cumulative result of testing various prototypes in more than 50 cities around the world. This experience confirms the utility of indicators in allowing comprehensive performance measurement and comparison of both ‘hard’ physical components and ‘soft’ governance aspects; and in prioritising ‘next steps’ in developing a city’s solid waste management system, by identifying both local strengths that can be built on and weak points to be addressed. The Wasteaware ISWM indicators are applicable to a broad range of cities with very different levels of income and solid waste management practices. Their wide application as a standard methodology will help to fill the historical data gap.

  8. Toxicological benchmarks for screening potential contaminants of concern for effects on aquatic biota: 1996 revision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suter, G.W. II; Tsao, C.L.

    1996-06-01

    This report presents potential screening benchmarks for protection of aquatic life from contaminants in water. Because there is no official guidance on screening benchmarks, a set of alternative benchmarks is presented herein for chemicals that have been detected on the Oak Ridge Reservation. The report also presents the data used to calculate the benchmarks and the sources of the data, and it compares the benchmarks and discusses their relative conservatism and utility. In this revision, benchmark values are updated where appropriate, new benchmark values are added, secondary sources are replaced by primary sources, and more complete documentation of the sources and derivation of all values is provided.

  9. The MCNP6 Analytic Criticality Benchmark Suite

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, Forrest B.

    2016-06-16

    Analytical benchmarks provide an invaluable tool for verifying computer codes used to simulate neutron transport. Several collections of analytical benchmark problems [1-4] are used routinely in the verification of production Monte Carlo codes such as MCNP® [5,6]. Verification of a computer code is a necessary prerequisite to the more complex validation process. The verification process confirms that a code performs its intended functions correctly. The validation process involves determining the absolute accuracy of code results vs. nature. In typical validations, results are computed for a set of benchmark experiments using a particular methodology (code, cross-section data with uncertainties, and modeling) and compared to the measured results from the set of benchmark experiments. The validation process determines bias, bias uncertainty, and possibly additional margins. Verification is generally performed by the code developers, while validation is generally performed by code users for a particular application space. The VERIFICATION_KEFF suite of criticality problems [1,2] was originally a set of 75 criticality problems found in the literature for which exact analytical solutions are available. Even though the spatial and energy detail is necessarily limited in analytical benchmarks, typically to a few regions or energy groups, the exact solutions obtained can be used to verify that the basic algorithms, mathematics, and methods used in complex production codes perform correctly. The present work has focused on revisiting this benchmark suite. A thorough review of the problems resulted in discarding some of them as not suitable for MCNP benchmarking. For the remaining problems, many of them were reformulated to permit execution in either multigroup mode or in the normal continuous-energy mode for MCNP. Execution of the benchmarks in continuous-energy mode provides a significant advance to MCNP verification methods.

  10. The Impact of the Measures of Academic Progress (MAP) Program on Student Reading Achievement. Final Report. NCEE 2013-4000

    ERIC Educational Resources Information Center

    Cordray, David; Pion, Georgine; Brandt, Chris; Molefe, Ayrin; Toby, Megan

    2012-01-01

    During the past decade, the use of standardized benchmark measures to differentiate and individualize instruction for students received renewed attention from educators. Although teachers may use their own assessments (tests, quizzes, homework, problem sets) for monitoring learning, it is challenging for them to equate performance on classroom…

  11. Using HFire for spatial modeling of fire in shrublands

    Treesearch

    Seth H. Peterson; Marco E. Morais; Jean M. Carlson; Philip E. Dennison; Dar A. Roberts; Max A. Moritz; David R. Weise

    2009-01-01

    An efficient raster fire-spread model named HFire is introduced. HFire can simulate single-fire events or long-term fire regimes, using the same fire-spread algorithm. This paper describes the HFire algorithm, benchmarks the model using a standard set of tests developed for FARSITE, and compares historical and predicted fire spread perimeters for three southern...

  12. libvdwxc: a library for exchange-correlation functionals in the vdW-DF family

    NASA Astrophysics Data System (ADS)

    Hjorth Larsen, Ask; Kuisma, Mikael; Löfgren, Joakim; Pouillon, Yann; Erhart, Paul; Hyldgaard, Per

    2017-09-01

    We present libvdwxc, a general library for evaluating the energy and potential for the family of vdW-DF exchange-correlation functionals. libvdwxc is written in C, provides an efficient implementation of the vdW-DF method, and can be interfaced with various general-purpose DFT codes. Currently, the Gpaw and Octopus codes implement interfaces to libvdwxc. The present implementation emphasizes scalability and parallel performance, and thereby enables ab initio calculations of nanometer-scale complexes. The numerical accuracy is benchmarked on the S22 test set, whereas parallel performance is benchmarked on ligand-protected gold nanoparticles (Au144(SC11NH25)60) up to 9696 atoms.

  13. Making Benchmark Testing Work

    ERIC Educational Resources Information Center

    Herman, Joan L.; Baker, Eva L.

    2005-01-01

    Many schools are moving to develop benchmark tests to monitor their students' progress toward state standards throughout the academic year. Benchmark tests can provide the ongoing information that schools need to guide instructional programs and to address student learning problems. The authors discuss six criteria that educators can use to…

  14. A health risk benchmark for the neurologic effects of styrene: comparison with NOAEL/LOAEL approach.

    PubMed

    Rabovsky, J; Fowles, J; Hill, M D; Lewis, D C

    2001-02-01

    Benchmark dose (BMD) analysis was used to estimate an inhalation benchmark concentration for styrene neurotoxicity. Quantal data on neuropsychologic test results from styrene-exposed workers [Mutti et al. (1984). American Journal of Industrial Medicine, 5, 275-286] were used to quantify neurotoxicity, defined as the percent of tested workers who responded abnormally to ≥1, ≥2, or ≥3 out of a battery of eight tests. Exposure was based on previously published results on mean urinary mandelic- and phenylglyoxylic acid levels in the workers, converted to air styrene levels (15, 44, 74, or 115 ppm). Nonstyrene-exposed workers from the same region served as a control group. Maximum-likelihood estimates (MLEs) and BMDs at 5 and 10% response levels of the exposed population were obtained from log-normal analysis of the quantal data. The highest MLE was 9 ppm (BMD = 4 ppm) styrene and represents abnormal responses to ≥3 tests by 10% of the exposed population. The most health-protective MLE was 2 ppm styrene (BMD = 0.3 ppm) and represents abnormal responses to ≥1 test by 5% of the exposed population. A no observed adverse effect level/lowest observed adverse effect level (NOAEL/LOAEL) analysis of the same quantal data showed workers in all styrene exposure groups responded abnormally to ≥1, ≥2, or ≥3 tests, compared to controls, and the LOAEL was 15 ppm. A comparison of the BMD and NOAEL/LOAEL analyses suggests that at air styrene levels below the LOAEL, a segment of the worker population may be adversely affected. The benchmark approach will be useful for styrene noncancer risk assessment purposes by providing a more accurate estimate of potential risk that should, in turn, help to reduce the uncertainty that is a common problem in setting exposure levels.
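The BMD idea described above can be sketched on invented quantal data: fit a log-normal (log-probit) dose-response model by maximum likelihood, then solve for the dose at which the modelled risk equals the benchmark response. The dose levels below match the abstract, but the group sizes and response counts are invented, and the grid-search optimizer and zero-background assumption are simplifications — this is not Mutti et al.'s analysis:

```python
from math import exp, log
from statistics import NormalDist

N = NormalDist()  # standard normal distribution

def log_likelihood(mu, sigma, doses, n, k):
    """Binomial log-likelihood of a log-probit model: P(d) = Phi((ln d - mu)/sigma)."""
    ll = 0.0
    for d, ni, ki in zip(doses, n, k):
        p = min(max(N.cdf((log(d) - mu) / sigma), 1e-9), 1 - 1e-9)
        ll += ki * log(p) + (ni - ki) * log(1 - p)
    return ll

def fit_bmd(doses, n, k, bmr=0.10):
    """Crude grid-search MLE (a real analysis would use a proper optimizer),
    then the dose where modelled risk equals the benchmark response
    (zero background response assumed for this sketch)."""
    best = max((log_likelihood(m / 10, s / 10, doses, n, k), m / 10, s / 10)
               for m in range(0, 80) for s in range(1, 40))
    _, mu, sigma = best
    # Phi((ln BMD - mu)/sigma) = bmr  =>  ln BMD = mu + sigma * Phi^{-1}(bmr)
    return exp(mu + sigma * N.inv_cdf(bmr))

# Dose levels from the abstract; group sizes and response counts invented
doses = [15.0, 44.0, 74.0, 115.0]   # ppm styrene (group means)
n = [100, 100, 100, 100]            # workers tested per group
k = [39, 78, 90, 96]                # workers with abnormal responses
bmd10 = fit_bmd(doses, n, k, bmr=0.10)
```

Because the BMD uses the whole fitted curve rather than a single tested dose, it can fall below the lowest exposure group, which is exactly the contrast with the NOAEL/LOAEL approach the abstract draws.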

  15. Molecular diffusion of stable water isotopes in polar firn as a proxy for past temperatures

    NASA Astrophysics Data System (ADS)

    Holme, Christian; Gkinis, Vasileios; Vinther, Bo M.

    2018-03-01

    Polar precipitation archived in ice caps contains information on past temperature conditions. Such information can be retrieved by measuring the water isotopic signals of δ18O and δD in ice cores. These signals have been attenuated during densification due to molecular diffusion in the firn column, where the magnitude of the diffusion is isotopologue-specific and temperature-dependent. By utilizing the differential diffusion signal, dual isotope measurements of δ18O and δD enable multiple temperature reconstruction techniques. This study assesses how well six different methods can reconstruct past surface temperatures from the diffusion-based temperature proxies. Two of the methods are based on the single diffusion lengths of δ18O and δD, three employ the differential diffusion signal, while the last uses the ratio between the single diffusion lengths. All techniques are tested on synthetic data in order to evaluate their accuracy and precision. We perform a benchmark test on thirteen high-resolution Holocene data sets from Greenland and Antarctica, which represent a broad range of mean annual surface temperatures and accumulation rates. Based on the benchmark test, we comment on the accuracy and precision of the methods. Both the benchmark test and the synthetic data test demonstrate that the most precise reconstructions are obtained when using the single isotope diffusion lengths, with precisions of approximately 1.0 °C. In the benchmark test, the single isotope diffusion lengths are also found to reconstruct consistent temperatures with a root-mean-square deviation of 0.7 °C. The techniques employing the differential diffusion signals are more uncertain, where the most precise method has a precision of 1.9 °C. The diffusion length ratio method is the least precise, with a precision of 13.7 °C. The absolute temperature estimates from this method are also shown to be highly sensitive to the choice of fractionation factor parameterization.
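Diffusion attenuates the isotope signal so that its power spectrum is damped as P(k) = P0·exp(−k²σ²); the diffusion length σ can therefore be estimated from the slope of ln P versus k². The sketch below recovers σ from a noise-free synthetic spectrum; real ice-core records require noise and sampling corrections that are not shown, and the numbers are invented:

```python
import math

def estimate_diffusion_length(k, power):
    """Least-squares slope of ln P versus k^2.
    For P(k) = P0 * exp(-(k*sigma)^2) the slope equals -sigma^2."""
    x = [ki * ki for ki in k]
    y = [math.log(p) for p in power]
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    slope = (sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
             / sum((xi - xm) ** 2 for xi in x))
    return math.sqrt(-slope)

# Synthetic, noise-free spectrum attenuated with sigma = 0.08 m
sigma_true = 0.08
k = [0.5 + 0.5 * i for i in range(40)]                    # wavenumbers (rad/m)
power = [2.0 * math.exp(-(ki * sigma_true) ** 2) for ki in k]
sigma_est = estimate_diffusion_length(k, power)
```

Estimating σ separately for δ18O and δD in this way is the starting point for both the single-diffusion-length and the differential (σD² − σ18²) reconstruction techniques discussed in the abstract.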

  16. Cross-industry benchmarking: is it applicable to the operating room?

    PubMed

    Marco, A P; Hart, S

    2001-01-01

    The use of benchmarking has been growing in nonmedical industries. This concept is being increasingly applied to medicine as the industry strives to improve quality and improve financial performance. Benchmarks can be either internal (set by the institution) or external (use other's performance as a goal). In some industries, benchmarking has crossed industry lines to identify breakthroughs in thinking. In this article, we examine whether the airline industry can be used as a source of external process benchmarking for the operating room.

  17. A hybrid interface tracking - level set technique for multiphase flow with soluble surfactant

    NASA Astrophysics Data System (ADS)

    Shin, Seungwon; Chergui, Jalel; Juric, Damir; Kahouadji, Lyes; Matar, Omar K.; Craster, Richard V.

    2018-04-01

    A formulation for soluble surfactant transport in multiphase flows recently presented by Muradoglu and Tryggvason (JCP 274 (2014) 737-757) [17] is adapted to the context of the Level Contour Reconstruction Method, LCRM, (Shin et al. IJNMF 60 (2009) 753-778, [8]) which is a hybrid method that combines the advantages of the Front-tracking and Level Set methods. Particularly close attention is paid to the formulation and numerical implementation of the surface gradients of surfactant concentration and surface tension. Various benchmark tests are performed to demonstrate the accuracy of different elements of the algorithm. To verify surfactant mass conservation, values for surfactant diffusion along the interface are compared with the exact solution for the problem of uniform expansion of a sphere. The numerical implementation of the discontinuous boundary condition for the source term in the bulk concentration is compared with the approximate solution. Surface tension forces are tested for Marangoni drop translation. Our numerical results for drop deformation in simple shear are compared with experiments and results from previous simulations. All benchmarking tests compare well with existing data thus providing confidence that the adapted LCRM formulation for surfactant advection and diffusion is accurate and effective in three-dimensional multiphase flows with a structured mesh. We also demonstrate that this approach applies easily to massively parallel simulations.
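The sphere-expansion test mentioned above has a simple closed form for an insoluble, uniformly distributed surfactant: with no bulk exchange, the total surfactant mass Γ·4πR² is conserved, so Γ(t) = Γ0(R0/R(t))². A minimal sketch of that exact solution (values invented), of the kind a solver's surface-diffusion step can be checked against:

```python
import math

def gamma_exact(gamma0, r0, r):
    """Surfactant concentration on a uniformly expanding sphere with no
    bulk exchange: mass Gamma * 4*pi*R^2 is conserved, hence
    Gamma = Gamma0 * (R0 / R)^2."""
    return gamma0 * (r0 / r) ** 2

gamma0, r0 = 1.0, 1.0
masses = [gamma_exact(gamma0, r0, r) * 4.0 * math.pi * r * r
          for r in (1.0, 1.5, 2.0, 3.0)]
```

A numerical scheme that drifts from this invariant is losing or creating surfactant mass, which is the conservation property the benchmark test probes.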

  18. Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

    PubMed Central

    2013-01-01

    Background: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, a novel protein descriptor set (termed ProtFP (4 variants)), and in addition we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data, and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants. Results: The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-scales (3) combined with an average Z-scale value for each target, while ProtFP (PCA8), ST-scales, and ProtFP (Feature) rank last. Conclusions: While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar. Combining sets that describe complementary information leads to small but consistent improvements in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, on both the ligand and the protein side. PMID:24059743

  19. A Causal-Comparative Study of the Affects of Benchmark Assessments on Middle Grades Science Achievement Scores

    ERIC Educational Resources Information Center

    Galloway, Melissa Ritchie

    2016-01-01

    The purpose of this causal comparative study was to test the theory of assessment that relates benchmark assessments to the Georgia middle grades science Criterion Referenced Competency Test (CRCT) percentages, controlling for schools who do not administer benchmark assessments versus schools who do administer benchmark assessments for all middle…

  20. EVA Health and Human Performance Benchmarking Study

    NASA Technical Reports Server (NTRS)

    Abercromby, A. F.; Norcross, J.; Jarvis, S. L.

    2016-01-01

    Multiple HRP Risks and Gaps require detailed characterization of human health and performance during exploration extravehicular activity (EVA) tasks; however, a rigorous and comprehensive methodology for characterizing and comparing the health and human performance implications of current and future EVA spacesuit designs does not exist. This study will identify and implement functional tasks and metrics, both objective and subjective, that are relevant to health and human performance, such as metabolic expenditure, suit fit, discomfort, suited postural stability, cognitive performance, and potentially biochemical responses for humans working inside different EVA suits doing functional tasks under the appropriate simulated reduced gravity environments. This study will provide health and human performance benchmark data for humans working in current EVA suits (EMU, Mark III, and Z2) as well as shirtsleeves using a standard set of tasks and metrics with quantified reliability. Results and methodologies developed during this test will provide benchmark data against which future EVA suits, and different suit configurations (eg, varied pressure, mass, CG) may be reliably compared in subsequent tests. Results will also inform fitness for duty standards as well as design requirements and operations concepts for future EVA suits and other exploration systems.

  1. Applying Quantum Monte Carlo to the Electronic Structure Problem

    NASA Astrophysics Data System (ADS)

    Powell, Andrew D.; Dawes, Richard

    2016-06-01

    Two distinct types of Quantum Monte Carlo (QMC) calculations are applied to electronic structure problems such as calculating potential energy curves and producing benchmark values for reaction barriers. First, Variational and Diffusion Monte Carlo (VMC and DMC) methods using a trial wavefunction subject to the fixed node approximation were tested using the CASINO code.[1] Next, Full Configuration Interaction Quantum Monte Carlo (FCIQMC), along with its initiator extension (i-FCIQMC) were tested using the NECI code.[2] FCIQMC seeks the FCI energy for a specific basis set. At a reduced cost, the efficient i-FCIQMC method can be applied to systems in which the standard FCIQMC approach proves to be too costly. Since all of these methods are statistical approaches, uncertainties (error-bars) are introduced for each calculated energy. This study tests the performance of the methods relative to traditional quantum chemistry for some benchmark systems. References: [1] R. J. Needs et al., J. Phys.: Condensed Matter 22, 023201 (2010). [2] G. H. Booth et al., J. Chem. Phys. 131, 054106 (2009).
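For illustration, the core of a variational Monte Carlo calculation of the kind described above — Metropolis sampling of |Ψ_T|² and averaging the local energy, with a statistical error bar — can be sketched on a 1-D harmonic oscillator with trial wavefunction Ψ_T(x) = exp(-αx²). This is a textbook toy, not the CASINO or NECI implementation.

```python
import math, random

def local_energy(x, alpha):
    # E_L = alpha + x^2 (1/2 - 2 alpha^2) for psi_T = exp(-alpha x^2)
    # under H = -1/2 d^2/dx^2 + 1/2 x^2 (1-D harmonic oscillator, a.u.)
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)

def vmc(alpha, steps=20000, step_size=1.0, seed=1):
    """Metropolis sampling of |psi_T|^2, averaging the local energy."""
    rng = random.Random(seed)
    x, energies = 0.0, []
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        # accept with probability min(1, |psi(x_new)|^2 / |psi(x)|^2)
        if rng.random() < math.exp(min(0.0, -2.0 * alpha * (x_new**2 - x**2))):
            x = x_new
        energies.append(local_energy(x, alpha))
    mean = sum(energies) / len(energies)
    var = sum((e - mean) ** 2 for e in energies) / len(energies)
    return mean, math.sqrt(var / len(energies))   # estimate and error bar

print(vmc(0.5))   # exact trial wavefunction: E = 0.5 with zero variance
print(vmc(0.4))   # suboptimal alpha: E above 0.5, finite error bar
```

The zero-variance property at the exact wavefunction (α = 0.5) is why optimizing the trial wavefunction shrinks the quoted error bars.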

  2. Benchmark notch test for life prediction

    NASA Technical Reports Server (NTRS)

    Domas, P. A.; Sharpe, W. N.; Ward, M.; Yau, J. F.

    1982-01-01

    The laser Interferometric Strain Displacement Gage (ISDG) was used to measure local strains in notched Inconel 718 test bars subjected to six different load histories at 649 C (1200 F) and including effects of tensile and compressive hold periods. The measurements were compared to simplified Neuber notch analysis predictions of notch root stress and strain. The actual strains incurred at the root of a discontinuity in cyclically loaded test samples subjected to inelastic deformation at high temperature where creep deformations readily occur were determined. The steady state cyclic, stress-strain response at the root of the discontinuity was analyzed. Flat, double notched uniaxially loaded fatigue specimens manufactured from the nickel base, superalloy Inconel 718 were used. The ISDG was used to obtain cycle by cycle recordings of notch root strain during continuous and hold time cycling at 649 C. Comparisons to Neuber and finite element model analyses were made. The results obtained provide a benchmark data set in high technology design where notch fatigue life is the predominant component service life limitation.
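The simplified Neuber notch analysis mentioned above equates the notch-root stress-strain product to its elastically computed value, σ·ε = (K_t·S)²/E; combined with a Ramberg-Osgood stress-strain curve this gives a scalar equation solvable by bisection. The material constants below are illustrative placeholders, not actual Inconel 718 properties.

```python
# Neuber's rule: sigma * eps(sigma) = (Kt * S)**2 / E at the notch root,
# with a Ramberg-Osgood curve eps(sigma) = sigma/E + (sigma/K)**(1/n).
# Constants are illustrative only (MPa units), not Inconel 718 data.
E, K, n, Kt = 200e3, 1500.0, 0.1, 2.5

def ramberg_osgood(sigma):
    return sigma / E + (sigma / K) ** (1.0 / n)

def neuber_notch_stress(S, lo=1e-6, hi=5e3, tol=1e-8):
    """Solve sigma * eps(sigma) = (Kt*S)^2 / E for sigma by bisection."""
    target = (Kt * S) ** 2 / E
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * ramberg_osgood(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

S = 300.0                      # nominal net-section stress, MPa
sigma = neuber_notch_stress(S)
eps = ramberg_osgood(sigma)
# With plasticity the notch stress falls below the elastic estimate Kt*S,
# while the notch strain exceeds the elastic estimate Kt*S/E.
print(sigma, Kt * S)
print(eps, Kt * S / E)
```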

  3. A Modified Mean Gray Wolf Optimization Approach for Benchmark and Biomedical Problems.

    PubMed

    Singh, Narinder; Singh, S B

    2017-01-01

    A modified variant of the gray wolf optimization algorithm, namely the mean gray wolf optimization algorithm, has been developed by modifying the position update (encircling behavior) equations of the gray wolf optimization algorithm. The proposed variant has been tested on 23 standard, well-known benchmark test functions (unimodal, multimodal, and fixed-dimension multimodal), and its performance has been compared with particle swarm optimization and gray wolf optimization. The proposed algorithm has also been applied to the classification of 5 data sets to check the feasibility of the modified variant. The results obtained are compared with many other meta-heuristic approaches, i.e., gray wolf optimization, particle swarm optimization, population-based incremental learning, ant colony optimization, etc. The results show that the modified variant is able to find the best solutions in terms of a high level of accuracy in classification and improved local optima avoidance.
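The position update the abstract refers to can be sketched as follows: a minimal grey wolf optimizer with the standard encircling equations (Mirjalili et al.), run on a unimodal sphere benchmark. The paper's "mean" modification of these equations is not reproduced here; this is only the baseline update it alters.

```python
import random

def gwo(f, dim=2, n_wolves=12, iters=300, lb=-10.0, ub=10.0, seed=7):
    """Minimal grey wolf optimizer (standard position update).
    The 'mean' variant of the paper modifies these encircling
    equations; this sketch shows only the standard form."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=f)[:]
    for t in range(iters):
        wolves.sort(key=f)
        leaders = [wolves[i][:] for i in range(3)]   # alpha, beta, delta
        a = 2.0 * (1.0 - t / iters)                  # decreases from 2 to 0
        for w in wolves:
            for d in range(dim):
                x_new = 0.0
                for ldr in leaders:
                    A = a * (2.0 * rng.random() - 1.0)
                    C = 2.0 * rng.random()
                    # encircling behaviour: step relative to each leader
                    x_new += ldr[d] - A * abs(C * ldr[d] - w[d])
                w[d] = min(ub, max(lb, x_new / 3.0))
        cand = min(wolves, key=f)
        if f(cand) < f(best):
            best = cand[:]
    return best

sphere = lambda x: sum(v * v for v in x)   # unimodal benchmark function
best = gwo(sphere)
print(best, sphere(best))                  # should land near the origin
```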

  4. Benchmarking Year Five Students' Reading Abilities

    ERIC Educational Resources Information Center

    Lim, Chang Kuan; Eng, Lin Siew; Mohamed, Abdul Rashid

    2014-01-01

    Reading and understanding a written text is one of the most important skills in English learning. This study attempts to benchmark Year Five students' reading abilities in fifteen rural schools in a district in Malaysia. The objectives of this study are to develop a set of standardised written reading comprehension and a set of indicators to inform…

  5. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448

  6. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
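The Fisher's exact test baseline that LEGO is compared against reduces to a one-sided hypergeometric tail probability, which can be computed with the standard library alone. This sketch shows only that baseline ORA computation with made-up toy numbers; LEGO's network-based gene weighting is not reproduced here.

```python
from math import comb

def ora_pvalue(study_hits, study_size, set_size, universe_size):
    """One-sided Fisher's exact test (hypergeometric tail) for ORA:
    P(X >= study_hits) when drawing study_size genes from a universe
    that contains set_size genes belonging to the pathway."""
    total = comb(universe_size, study_size)
    p = 0.0
    for k in range(study_hits, min(study_size, set_size) + 1):
        p += comb(set_size, k) * comb(universe_size - set_size,
                                      study_size - k) / total
    return p

# Toy example: 20000-gene universe, a 100-gene pathway, and a 200-gene
# study list containing 8 pathway members (expected overlap is ~1 gene).
p = ora_pvalue(8, 200, 100, 20000)
print(p)
```

Because the test sees only membership counts, every gene contributes equally — exactly the limitation (ignoring gene roles and gene-gene interactions) that the abstract says LEGO addresses with network-based weights.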

  7. SU-D-BRD-03: A Gateway for GPU Computing in Cancer Radiotherapy Research

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jia, X; Folkerts, M; Shi, F

    Purpose: Graphics Processing Units (GPUs) have become increasingly important in radiotherapy. However, it is still difficult for general clinical researchers to access GPU codes developed by other researchers, and for developers to objectively benchmark their codes. Moreover, it is common to see effort repeatedly spent on developing low-quality GPU codes. The goal of this project is to establish an infrastructure for testing GPU codes, cross-comparing them, and facilitating code distribution in the radiotherapy community. Methods: We developed a system called Gateway for GPU Computing in Cancer Radiotherapy Research (GCR2). A number of GPU codes developed by our group and other developers can be accessed via a web interface. To use the services, researchers first upload their test data or use the standard data provided by our system. Then they can select the GPU device on which the code will be executed. Our system offers all mainstream GPU hardware for code benchmarking purposes. After the code run is complete, the system automatically summarizes and displays the computing results. We also released an SDK to allow developers to build their own algorithm implementations and submit their binary codes to the system. The submitted code is then systematically benchmarked using a variety of GPU hardware and representative data provided by our system. The developers can also compare their codes with others and generate benchmarking reports. Results: The developed system is fully functioning. Through a user-friendly web interface, researchers are able to test various GPU codes. Developers also benefit from this platform by comprehensively benchmarking their codes on various GPU platforms and representative clinical data sets. Conclusion: We have developed an open platform allowing clinical researchers and developers to access GPUs and GPU codes. This development will facilitate the utilization of GPUs in the radiation therapy field.

  8. nu-Anomica: A Fast Support Vector Based Novelty Detection Technique

    NASA Technical Reports Server (NTRS)

    Das, Santanu; Bhaduri, Kanishka; Oza, Nikunj C.; Srivastava, Ashok N.

    2009-01-01

    In this paper we propose nu-Anomica, a novel anomaly detection technique that can be trained on huge data sets with much reduced running time compared to the benchmark one-class Support Vector Machines algorithm. In nu-Anomica, the idea is to train the machine such that it can provide a close approximation to the exact decision plane using fewer training points and without losing much of the generalization performance of the classical approach. We have tested the proposed algorithm on a variety of continuous data sets under different conditions. We show that under all test conditions the developed procedure closely preserves the accuracy of standard one-class Support Vector Machines while reducing both the training time and the test time by 5 to 20 times.

  9. Hospital benchmarking: are U.S. eye hospitals ready?

    PubMed

    de Korne, Dirk F; van Wijngaarden, Jeroen D H; Sol, Kees J C A; Betz, Robert; Thomas, Richard C; Schein, Oliver D; Klazinga, Niek S

    2012-01-01

    Benchmarking is increasingly considered a useful management instrument to improve quality in health care, but little is known about its applicability in hospital settings. The aims of this study were to assess the applicability of a benchmarking project in U.S. eye hospitals and compare the results with an international initiative. We evaluated multiple cases by applying an evaluation frame abstracted from the literature to five U.S. eye hospitals that used a set of 10 indicators for efficiency benchmarking. Qualitative analysis entailed 46 semistructured face-to-face interviews with stakeholders, document analyses, and questionnaires. The case studies only partially met the conditions of the evaluation frame. Although learning and quality improvement were stated as overall purposes, the benchmarking initiative was at first focused on efficiency only. No ophthalmic outcomes were included, and clinicians were skeptical about their reporting relevance and disclosure. However, in contrast with earlier findings in international eye hospitals, all U.S. hospitals worked with internal indicators that were integrated in their performance management systems and supported benchmarking. Benchmarking can support performance management in individual hospitals. Having a certain number of comparable institutes providing similar services in a noncompetitive milieu seems to lay fertile ground for benchmarking. International benchmarking is useful only when these conditions are not met nationally. Although the literature focuses on static conditions for effective benchmarking, our case studies show that it is a highly iterative learning process. The journey of benchmarking seems to be more important than the destination. Improving patient value (health outcomes per unit of cost) requires, however, an integrative perspective in which clinicians and administrators closely cooperate on both quality and efficiency issues. If these worlds do not share such a relationship, the added "public" value of benchmarking in health care is questionable.

  10. Toxicological Benchmarks for Screening of Potential Contaminants of Concern for Effects on Aquatic Biota on the Oak Ridge Reservation, Oak Ridge, Tennessee

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suter, G.W., II

    1993-01-01

    One of the initial stages in ecological risk assessment of hazardous waste sites is the screening of contaminants to determine which, if any, of them are worthy of further consideration; this process is termed contaminant screening. Screening is performed by comparing concentrations in ambient media to benchmark concentrations that are either indicative of a high likelihood of significant effects (upper screening benchmarks) or of a very low likelihood of significant effects (lower screening benchmarks). Exceedance of an upper screening benchmark indicates that the chemical in question is clearly of concern and remedial actions are likely to be needed. Exceedance of a lower screening benchmark indicates that a contaminant is of concern unless other information indicates that the data are unreliable or the comparison is inappropriate. Chemicals with concentrations below the lower benchmark are not of concern if the ambient data are judged to be adequate. This report presents potential screening benchmarks for protection of aquatic life from contaminants in water. Because there is no guidance for screening benchmarks, a set of alternative benchmarks is presented herein. The alternative benchmarks are based on different conceptual approaches to estimating concentrations causing significant effects. For the upper screening benchmark, there are the acute National Ambient Water Quality Criteria (NAWQC) and the Secondary Acute Values (SAV). The SAV concentrations are values estimated with 80% confidence not to exceed the unknown acute NAWQC for those chemicals with no NAWQC. The alternative chronic benchmarks are the chronic NAWQC, the Secondary Chronic Value (SCV), the lowest chronic values for fish and daphnids, the lowest EC20 for fish and daphnids from chronic toxicity tests, the estimated EC20 for a sensitive species, and the concentration estimated to cause a 20% reduction in the recruit abundance of largemouth bass.
It is recommended that ambient chemical concentrations be compared to all of these benchmarks. If NAWQC are exceeded, the chemicals must be contaminants of concern because the NAWQC are applicable or relevant and appropriate requirements (ARARs). If NAWQC are not exceeded, but other benchmarks are, contaminants should be selected on the basis of the number of benchmarks exceeded and the conservatism of the particular benchmark values, as discussed in the text. To the extent that toxicity data are available, this report presents the alternative benchmarks for chemicals that have been detected on the Oak Ridge Reservation. It also presents the data used to calculate the benchmarks and the sources of the data. It compares the benchmarks and discusses their relative conservatism and utility. This report supersedes a prior aquatic benchmarks report (Suter and Mabrey 1994). It adds two new types of benchmarks. It also updates the benchmark values where appropriate, adds some new benchmark values, replaces secondary sources with primary sources, and provides more complete documentation of the sources and derivation of all values.
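The screening logic described in this record reduces to a simple two-threshold decision rule. The sketch below uses hypothetical concentrations and benchmark values (not actual NAWQC or Oak Ridge data) purely to illustrate that rule.

```python
def screen(conc, lower, upper):
    """Classify a contaminant by comparing its ambient concentration
    to lower/upper screening benchmarks (all values hypothetical)."""
    if conc >= upper:
        return "clear concern: remedial action likely needed"
    if conc >= lower:
        return "of concern unless the data are unreliable"
    return "not of concern if the ambient data are adequate"

# Hypothetical chemicals: (name, ambient ug/L, lower benchmark, upper benchmark)
samples = [("chemical A", 12.0, 1.0, 10.0),
           ("chemical B", 5.0, 1.0, 10.0),
           ("chemical C", 0.2, 1.0, 10.0)]
for name, conc, lo, hi in samples:
    print(name, "->", screen(conc, lo, hi))
```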

  11. Benchmarking expert system tools

    NASA Technical Reports Server (NTRS)

    Riley, Gary

    1988-01-01

    As part of its evaluation of new technologies, the Artificial Intelligence Section of the Mission Planning and Analysis Div. at NASA-Johnson has made timing tests of several expert system building tools. Among the production systems tested were the Automated Reasoning Tool, several versions of OPS5, and CLIPS (C Language Integrated Production System), an expert system builder developed by the AI section. Also included in the test was a Zetalisp version of the benchmark, along with four versions of the benchmark written in Knowledge Engineering Environment, an object-oriented, frame-based expert system tool. The benchmarks used for testing are studied.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lopez, Jesse E.; Baptista, António M.

    A sediment model coupled to the hydrodynamic model SELFE is validated against a benchmark combining a set of idealized tests and an application to a field-data rich energetic estuary. After sensitivity studies, model results for the idealized tests largely agree with previously reported results from other models in addition to analytical, semi-analytical, or laboratory results. Results of suspended sediment in an open channel test with fixed bottom are sensitive to turbulence closure and treatment for hydrodynamic bottom boundary. Results for the migration of a trench are very sensitive to critical stress and erosion rate, but largely insensitive to turbulence closure. The model is able to qualitatively represent sediment dynamics associated with estuarine turbidity maxima in an idealized estuary. Applied to the Columbia River estuary, the model qualitatively captures sediment dynamics observed by fixed stations and shipborne profiles. Representation of the vertical structure of suspended sediment degrades when stratification is underpredicted. Across all tests, skill metrics of suspended sediments lag those of hydrodynamics even when qualitatively representing dynamics. The benchmark is fully documented in an openly available repository to encourage unambiguous comparisons against other models.

  13. Benchmark Shock Tube Experiments for Radiative Heating Relevant to Earth Re-Entry

    NASA Technical Reports Server (NTRS)

    Brandis, A. M.; Cruden, B. A.

    2017-01-01

    Detailed spectrally and spatially resolved radiance has been measured in the Electric Arc Shock Tube (EAST) facility for conditions relevant to high speed entry into a variety of atmospheres, including Earth, Venus, Titan, Mars and the Outer Planets. The tests that measured radiation relevant for Earth re-entry are the focus of this work and are taken from campaigns 47, 50, 52 and 57. These tests covered conditions from 8 km/s to 15.5 km/s at initial pressures ranging from 0.05 Torr to 1 Torr, of which shots at 0.1 and 0.2 Torr are analyzed in this paper. These conditions cover a range of points of interest for potential flight missions, including return from Low Earth Orbit, the Moon and Mars. The large volume of testing available from EAST is useful for statistical analysis of radiation data, but is problematic for identifying representative experiments for performing detailed analysis. Therefore, the intent of this paper is to select a subset of benchmark test data that can be considered for further detailed study. These benchmark shots are intended to provide more accessible data sets for future code validation studies and facility-to-facility comparisons. The shots that have been selected as benchmark data are the ones in closest agreement with a line of best fit through all of the EAST results, whilst also showing the best experimental characteristics, such as test time and convergence to equilibrium. The EAST data are presented in different formats for analysis. These data include the spectral radiance at equilibrium, the spatial dependence of radiance over defined wavelength ranges and the mean non-equilibrium spectral radiance (the so-called 'spectral non-equilibrium metric'). All the information needed to simulate each experimental trace, including free-stream conditions, shock time of arrival (i.e., x-t) relation, and the spectral and spatial resolution functions, is provided.

  14. Vibrational multiconfiguration self-consistent field theory: implementation and test calculations.

    PubMed

    Heislbetz, Sandra; Rauhut, Guntram

    2010-03-28

    A state-specific vibrational multiconfiguration self-consistent field (VMCSCF) approach based on a multimode expansion of the potential energy surface is presented for the accurate calculation of anharmonic vibrational spectra. As a special case of this general approach vibrational complete active space self-consistent field calculations will be discussed. The latter method shows better convergence than the general VMCSCF approach and must be considered the preferred choice within the multiconfigurational framework. Benchmark calculations are provided for a small set of test molecules.

  15. A Simplified Approach for the Rapid Generation of Transient Heat-Shield Environments

    NASA Technical Reports Server (NTRS)

    Wurster, Kathryn E.; Zoby, E. Vincent; Mills, Janelle C.; Kamhawi, Hilmi

    2007-01-01

    A simplified approach has been developed whereby transient entry heating environments are reliably predicted based upon a limited set of benchmark radiative and convective solutions. Heating, pressure and shear-stress levels, non-dimensionalized by an appropriate parameter at each benchmark condition are applied throughout the entry profile. This approach was shown to be valid based on the observation that the fully catalytic, laminar distributions examined were relatively insensitive to altitude as well as velocity throughout the regime of significant heating. In order to establish a best prediction by which to judge the results that can be obtained using a very limited benchmark set, predictions based on a series of benchmark cases along a trajectory are used. Solutions which rely only on the limited benchmark set, ideally in the neighborhood of peak heating, are compared against the resultant transient heating rates and total heat loads from the best prediction. Predictions based on using two or fewer benchmark cases at or near the trajectory peak heating condition, yielded results to within 5-10 percent of the best predictions. Thus, the method provides transient heating environments over the heat-shield face with sufficient resolution and accuracy for thermal protection system design and also offers a significant capability to perform rapid trade studies such as the effect of different trajectories, atmospheres, or trim angle of attack, on convective and radiative heating rates and loads, pressure, and shear-stress levels.
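The ratioing idea underlying this approach — scale a small set of benchmark solutions along the rest of the trajectory using a nondimensionalizing parameter — can be illustrated with a Sutton-Graves-type stagnation-point correlation, q ∝ √ρ·V³. The benchmark point and trajectory samples below are hypothetical, and this is only illustrative of the scaling concept, not the paper's actual procedure.

```python
import math

def scale_convective(q_bench, rho_bench, v_bench, rho, v):
    """Scale a benchmark convective heating rate to another trajectory
    point with a Sutton-Graves-type correlation q ~ sqrt(rho) * V**3.
    Illustrates the ratioing idea only; not the paper's exact method."""
    return q_bench * math.sqrt(rho / rho_bench) * (v / v_bench) ** 3

# Hypothetical benchmark CFD point: 50 W/cm^2 at rho = 3e-4 kg/m^3, V = 6 km/s
traj = [(1e-4, 7000.0), (3e-4, 6000.0), (6e-4, 4000.0)]   # (rho, V) samples
rates = [scale_convective(50.0, 3e-4, 6000.0, r, v) for r, v in traj]
print([round(q, 1) for q in rates])
```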

  16. S66: A Well-balanced Database of Benchmark Interaction Energies Relevant to Biomolecular Structures

    PubMed Central

    2011-01-01

    With numerous new quantum chemistry methods being developed in recent years and the promise of even more new methods to be developed in the near future, it is clearly critical that highly accurate, well-balanced, reference data for many different atomic and molecular properties be available for the parametrization and validation of these methods. One area of research that is of particular importance in many areas of chemistry, biology, and material science is the study of noncovalent interactions. Because these interactions are often strongly influenced by correlation effects, it is necessary to use computationally expensive high-order wave function methods to describe them accurately. Here, we present a large new database of interaction energies calculated using an accurate CCSD(T)/CBS scheme. Data are presented for 66 molecular complexes, at their reference equilibrium geometries and at 8 points systematically exploring their dissociation curves; in total, the database contains 594 points: 66 at equilibrium geometries, and 528 in dissociation curves. The data set is designed to cover the most common types of noncovalent interactions in biomolecules, while keeping a balanced representation of dispersion and electrostatic contributions. The data set is therefore well suited for testing and development of methods applicable to bioorganic systems. In addition to the benchmark CCSD(T) results, we also provide decompositions of the interaction energies by means of DFT-SAPT calculations. The data set was used to test several correlated QM methods, including those parametrized specifically for noncovalent interactions. Among these, the SCS-MI-CCSD method outperforms all other tested methods, with a root-mean-square error of 0.08 kcal/mol for the S66 data set. PMID:21836824

  17. Requirements for benchmarking personal image retrieval systems

    NASA Astrophysics Data System (ADS)

    Bouguet, Jean-Yves; Dulong, Carole; Kozintsev, Igor; Wu, Yi

    2006-01-01

    It is now common to have accumulated tens of thousands of personal pictures. Efficient access to that many pictures can only be done with a robust image retrieval system. This application is of high interest to Intel processor architects. It is highly compute intensive, and could motivate end users to upgrade their personal computers to the next generations of processors. A key question is how to assess the robustness of a personal image retrieval system. Personal image databases are very different from the digital libraries that have been used by many Content-Based Image Retrieval Systems [1]. For example, a personal image database has a lot of pictures of people, but a small set of different people, typically family, relatives, and friends. Pictures are taken in a limited set of places like home, work, school, and vacation destinations. The most frequent queries are searches for people and for places. These attributes, and many others, affect how a personal image retrieval system should be benchmarked, and benchmarks need to be different from existing ones based on, for example, art images or medical images. The attributes of the data set do not change the list of components needed for benchmarking such systems as specified in [2]: data sets, query tasks, ground truth, evaluation measures, and benchmarking events. This paper proposes a way to build these components to be representative of personal image databases and of the corresponding usage models.

  18. A benchmarking method to measure dietary absorption efficiency of chemicals by fish.

    PubMed

    Xiao, Ruiyang; Adolfsson-Erici, Margaretha; Åkerman, Gun; McLachlan, Michael S; MacLeod, Matthew

    2013-12-01

    Understanding the dietary absorption efficiency of chemicals in the gastrointestinal tract of fish is important from both a scientific and a regulatory point of view. However, reported fish absorption efficiencies for well-studied chemicals are highly variable. In the present study, the authors developed and exploited an internal chemical benchmarking method that has the potential to reduce uncertainty and variability and, thus, to improve the precision of measurements of fish absorption efficiency. The authors applied the benchmarking method to measure the gross absorption efficiency for 15 chemicals with a wide range of physicochemical properties and structures. They selected 2,2',5,6'-tetrachlorobiphenyl (PCB53) and decabromodiphenyl ethane as absorbable and nonabsorbable benchmarks, respectively. Quantities of chemicals determined in fish were benchmarked to the fraction of PCB53 recovered in fish, and quantities of chemicals determined in feces were benchmarked to the fraction of decabromodiphenyl ethane recovered in feces. The performance of the benchmarking procedure was evaluated based on the recovery of the test chemicals and precision of absorption efficiency from repeated tests. Benchmarking did not improve the precision of the measurements; after benchmarking, however, the median recovery for 15 chemicals was 106%, and variability of recoveries was reduced compared with before benchmarking, suggesting that benchmarking could account for incomplete extraction of chemical in fish and incomplete collection of feces from different tests. © 2013 SETAC.
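Numerically, the internal-benchmarking step described above amounts to dividing each measured quantity by the recovered fraction of the corresponding benchmark chemical. The sketch below uses hypothetical amounts and recoveries, and the mass-balance formula for gross absorption efficiency is an assumption for illustration, not the paper's exact computation.

```python
def benchmark_correct(amount_fish, amount_feces,
                      pcb53_recovery_fish, dbdpe_recovery_feces):
    """Benchmark-correct measured amounts of a test chemical:
    fish quantities are scaled by the recovered fraction of the
    absorbable benchmark (PCB53); feces quantities by that of the
    nonabsorbable benchmark (decabromodiphenyl ethane, DBDPE).
    The efficiency formula is an illustrative mass-balance assumption."""
    fish = amount_fish / pcb53_recovery_fish
    feces = amount_feces / dbdpe_recovery_feces
    absorption_efficiency = fish / (fish + feces)
    return fish, feces, absorption_efficiency

# Hypothetical test chemical: 30 ng found in fish, 60 ng in feces,
# with 80% PCB53 recovery and 75% DBDPE recovery.
fish, feces, eff = benchmark_correct(30.0, 60.0, 0.80, 0.75)
print(round(fish, 2), round(feces, 2), round(eff, 3))
```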

  19. Winning Strategy: Set Benchmarks of Early Success to Build Momentum for the Long Term

    ERIC Educational Resources Information Center

    Spiro, Jody

    2012-01-01

    Change is a highly personal experience. Everyone participating in the effort has different reactions to change, different concerns, and different motivations for being involved. The smart change leader sets benchmarks along the way so there are guideposts and pause points instead of an endless change process. "Early wins"--a term used to describe…

  20. Summary of comparison and analysis of results from exercises 1 and 2 of the OECD PBMR coupled neutronics/thermal hydraulics transient benchmark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mkhabela, P.; Han, J.; Tyobeka, B.

    2006-07-01

    The Nuclear Energy Agency (NEA) of the Organization for Economic Cooperation and Development (OECD) has accepted, through the Nuclear Science Committee (NSC), the inclusion of the Pebble-Bed Modular Reactor 400 MW design (PBMR-400) coupled neutronics/thermal hydraulics transient benchmark problem as part of their official activities. The scope of the benchmark is to establish a well-defined problem, based on a common given library of cross sections, to compare methods and tools in core simulation and thermal hydraulics analysis, with a specific focus on transient events, through a set of multi-dimensional computational test problems. The benchmark includes three steady state exercises and six transient exercises. This paper describes the first two steady state exercises, their objectives, and the international participation in terms of organization, country, and computer code utilized. This description is followed by a comparison and analysis of the participants' results submitted for these two exercises. The comparison of results from different codes allows for an assessment of the sensitivity of a result to the method employed and can thus help to focus the development efforts on the most critical areas. The first two exercises also allow for the removal of user-related modeling errors and prepare the core neutronics and thermal-hydraulics models of the different codes for the rest of the exercises in the benchmark. (authors)

  1. Translating an AI application from Lisp to Ada: A case study

    NASA Technical Reports Server (NTRS)

    Davis, Gloria J.

    1991-01-01

    A set of benchmarks was developed to test the performance of a newly designed computer executing both Lisp and Ada. Among these was AutoClassII, a large Artificial Intelligence (AI) application written in Common Lisp. The extraction of a representative subset of this complex application was aided by a Lisp Code Analyzer (LCA). The LCA enabled rapid analysis of the code, putting it in a concise and functionally readable form. An equivalent benchmark was created in Ada through manual translation of the Lisp version. A comparison of the execution results of both programs across a variety of compiler-machine combinations indicates that line-by-line translation, coupled with analysis of the initial code, can produce relatively efficient and reusable target code.

  2. Benchmarking for Bayesian Reinforcement Learning

    PubMed Central

    Ernst, Damien; Couëtoux, Adrien

    2016-01-01

    In the Bayesian Reinforcement Learning (BRL) setting, agents aim to maximise the rewards collected while interacting with their environment, exploiting prior knowledge available beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. This paper addresses that problem and provides a new BRL comparison methodology along with a corresponding open-source library. The methodology defines a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from given probability distributions. To enable the comparison of non-anytime algorithms, the methodology also includes a detailed analysis of each algorithm's computation time requirements. The library is released with all source code and documentation: it includes three test problems, each with two different prior distributions, and seven state-of-the-art RL algorithms. Finally, the library is illustrated by comparing all the available algorithms, and the results are discussed. PMID:27304891

  3. Benchmarking for Bayesian Reinforcement Learning.

    PubMed

    Castronovo, Michael; Ernst, Damien; Couëtoux, Adrien; Fonteneau, Raphael

    2016-01-01

    In the Bayesian Reinforcement Learning (BRL) setting, agents aim to maximise the rewards collected while interacting with their environment, exploiting prior knowledge available beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. This paper addresses that problem and provides a new BRL comparison methodology along with a corresponding open-source library. The methodology defines a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from given probability distributions. To enable the comparison of non-anytime algorithms, the methodology also includes a detailed analysis of each algorithm's computation time requirements. The library is released with all source code and documentation: it includes three test problems, each with two different prior distributions, and seven state-of-the-art RL algorithms. Finally, the library is illustrated by comparing all the available algorithms, and the results are discussed.
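    The comparison criterion described in the abstract, average performance over many MDPs drawn from a prior, can be sketched as follows. This is a toy illustration with a hypothetical Dirichlet prior over transition rows; it is not the paper's actual test distributions or library code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mdp(n_states=5, n_actions=2):
    """Draw a random MDP from a (hypothetical) Dirichlet prior."""
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a row of transition probs
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # bounded rewards
    return P, R

def run_episode(P, R, policy, horizon=50):
    s, total = 0, 0.0
    for _ in range(horizon):
        a = policy(s)
        total += R[s, a]
        s = rng.choice(P.shape[0], p=P[s, a])
    return total

def score(policy, n_mdps=100):
    """Comparison criterion: mean return over many MDPs drawn from the prior."""
    return float(np.mean([run_episode(*sample_mdp(), policy) for _ in range(n_mdps)]))

random_policy = lambda s: int(rng.integers(2))
print(round(score(random_policy), 2))
```

    Ranking several agents by this score, while also recording each agent's computation time, mirrors the two axes of comparison the methodology proposes.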

  4. A technology mapping based on graph of excitations and outputs for finite state machines

    NASA Astrophysics Data System (ADS)

    Kania, Dariusz; Kulisz, Józef

    2017-11-01

    A new, efficient technology mapping method for FSMs, dedicated to PAL-based PLDs, is proposed. The essence of the method is a search for the minimal set of PAL-based logic blocks that covers a set of multiple-output implicants describing the transition and output functions of an FSM. The method is based on a new graph concept: the Graph of Excitations and Outputs. The proposed algorithm was tested on FSM benchmarks, and the results were compared with those of classical FSM technology mapping.
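    Covering a set of implicants with logic blocks is an instance of set cover. A greedy heuristic, shown here purely for illustration (it is not the paper's graph-based algorithm, and the implicant/block names are made up), conveys the flavour of the covering step:

```python
# Universe of multiple-output implicants and the implicants each candidate
# PAL block could realize (all names and contents are hypothetical).
implicants = {"i1", "i2", "i3", "i4", "i5"}
blocks = {"b1": {"i1", "i2"}, "b2": {"i2", "i3", "i4"}, "b3": {"i4", "i5"}, "b4": {"i5"}}

def greedy_cover(universe, sets):
    """Repeatedly pick the block covering the most still-uncovered implicants."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(sets, key=lambda s: len(sets[s] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

print(greedy_cover(implicants, blocks))
```

    Exact minimal covering, as targeted by the paper, is NP-hard in general; the graph-based formulation is the paper's way of making the search tractable for FSM structures.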

  5. OPTIMIZATION OF MUD HAMMER DRILLING PERFORMANCE - A PROGRAM TO BENCHMARK THE VIABILITY OF ADVANCED MUD HAMMER DRILLING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arnis Judzis

    2003-01-01

    This document details the progress to date on the ''OPTIMIZATION OF MUD HAMMER DRILLING PERFORMANCE -- A PROGRAM TO BENCHMARK THE VIABILITY OF ADVANCED MUD HAMMER DRILLING'' contract for the quarter starting October 2002 through December 2002. Even though we are awaiting the optimization portion of the testing program, accomplishments included the following: (1) Smith International participated in the DOE Mud Hammer program through full scale benchmarking testing during the week of 4 November 2003. (2) TerraTek acknowledges Smith International, BP America, PDVSA, and ConocoPhillips for cost-sharing the Smith benchmarking tests, allowing extension of the contract to add to the benchmarking testing program. (3) Following the benchmark testing of the Smith International hammer, representatives from DOE/NETL, TerraTek, Smith International and PDVSA met at TerraTek in Salt Lake City to review observations, performance and views on the optimization step for 2003. (4) The December 2002 issue of Journal of Petroleum Technology (Society of Petroleum Engineers) highlighted the DOE fluid hammer testing program and reviewed last year's paper on the benchmark performance of the SDS Digger and Novatek hammers. (5) TerraTek's Sid Green presented a technical review for DOE/NETL personnel in Morgantown on ''Impact Rock Breakage'' and its importance for improving fluid hammer performance. Much discussion has taken place on the issues surrounding mud hammer performance at depth conditions.

  6. Notes on numerical reliability of several statistical analysis programs

    USGS Publications Warehouse

    Landwehr, J.M.; Tasker, Gary D.

    1999-01-01

    This report presents a benchmark analysis of several statistical analysis programs currently in use in the USGS. The benchmark consists of a comparison between the values provided by a statistical analysis program for variables in the reference data set ANASTY and their known or calculated theoretical values. The ANASTY data set is an amendment of the Wilkinson NASTY data set that has been used in the statistical literature to assess the reliability (computational correctness) of calculated analytical results.
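    The kind of numerical-reliability failure such benchmarks probe can be illustrated with a NASTY-style data set: values with a large common offset expose the instability of the textbook one-pass variance formula. A minimal sketch (the values are illustrative, not the ANASTY data itself):

```python
import numpy as np

# Five values with a huge common offset; the true sample variance is 2.5.
x = np.array([1e9 + v for v in [1.0, 2.0, 3.0, 4.0, 5.0]])

def naive_var(a):
    """One-pass textbook formula (sum of squares minus squared sum): unstable."""
    n = len(a)
    return (np.sum(a * a) - np.sum(a) ** 2 / n) / (n - 1)

def two_pass_var(a):
    """Two-pass formula: subtract the mean first, then sum squared deviations."""
    m = a.mean()
    return np.sum((a - m) ** 2) / (len(a) - 1)

print(naive_var(x), two_pass_var(x))
```

    In double precision the one-pass result is dominated by cancellation error, while the two-pass result recovers 2.5; statistics packages that use the unstable formula fail exactly this kind of benchmark comparison against known theoretical values.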

  7. Benchmarking can add up for healthcare accounting.

    PubMed

    Czarnecki, M T

    1994-09-01

    In 1993, a healthcare accounting and finance benchmarking survey of hospital and nonhospital organizations gathered statistics about key common performance areas. A low response rate did not allow for statistically significant findings, but the survey identified performance measures that can be used in healthcare financial management settings. This article explains the benchmarking process and examines some of the 1993 study's findings.

  8. Utilizing Benchmarking to Study the Effectiveness of Parent-Child Interaction Therapy Implemented in a Community Setting

    ERIC Educational Resources Information Center

    Self-Brown, Shannon; Valente, Jessica R.; Wild, Robert C.; Whitaker, Daniel J.; Galanter, Rachel; Dorsey, Shannon; Stanley, Jenelle

    2012-01-01

    Benchmarking is a program evaluation approach that can be used to study whether the outcomes of parents/children who participate in an evidence-based program in the community approximate the outcomes found in randomized trials. This paper presents a case illustration using benchmarking methodology to examine a community implementation of…

  9. Optimal type 2 diabetes mellitus management: the randomised controlled OPTIMISE benchmarking study: baseline results from six European countries.

    PubMed

    Hermans, Michel P; Brotons, Carlos; Elisaf, Moses; Michel, Georges; Muls, Erik; Nobels, Frank

    2013-12-01

    Micro- and macrovascular complications of type 2 diabetes have an adverse impact on survival, quality of life and healthcare costs. The OPTIMISE (OPtimal Type 2 dIabetes Management Including benchmarking and Standard trEatment) trial comparing physicians' individual performances with a peer group evaluates the hypothesis that benchmarking, using assessments of change in three critical quality indicators of vascular risk: glycated haemoglobin (HbA1c), low-density lipoprotein-cholesterol (LDL-C) and systolic blood pressure (SBP), may improve quality of care in type 2 diabetes in the primary care setting. This was a randomised, controlled study of 3980 patients with type 2 diabetes. Six European countries participated in the OPTIMISE study (NCT00681850). Quality of care was assessed by the percentage of patients achieving pre-set targets for the three critical quality indicators over 12 months. Physicians were randomly assigned to receive either benchmarked or non-benchmarked feedback. All physicians received feedback on six of their patients' modifiable outcome indicators (HbA1c, fasting glycaemia, total cholesterol, high-density lipoprotein-cholesterol (HDL-C), LDL-C and triglycerides). Physicians in the benchmarking group additionally received information on levels of control achieved for the three critical quality indicators compared with colleagues. At baseline, the percentage of evaluable patients (N = 3980) achieving pre-set targets was 51.2% (HbA1c; n = 2028/3964); 34.9% (LDL-C; n = 1350/3865); 27.3% (systolic blood pressure; n = 911/3337). OPTIMISE confirms that target achievement in the primary care setting is suboptimal for all three critical quality indicators. This represents an unmet but modifiable need to revisit the mechanisms and management of improving care in type 2 diabetes. OPTIMISE will help to assess whether benchmarking is a useful clinical tool for improving outcomes in type 2 diabetes.
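    The baseline achievement rates quoted in the abstract follow directly from the reported patient counts; a quick arithmetic check (counts taken from the abstract):

```python
# Patients achieving the pre-set target / evaluable patients, per indicator.
counts = {"HbA1c": (2028, 3964), "LDL-C": (1350, 3865), "SBP": (911, 3337)}

# Percentage achieving target, rounded to one decimal as in the abstract.
rates = {k: round(100 * n / d, 1) for k, (n, d) in counts.items()}
print(rates)  # → {'HbA1c': 51.2, 'LDL-C': 34.9, 'SBP': 27.3}
```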

  10. Effective secondary fracture prevention: implementation of a global benchmarking of clinical quality using the IOF Capture the Fracture® Best Practice Framework tool.

    PubMed

    Javaid, M K; Kyer, C; Mitchell, P J; Chana, J; Moss, C; Edwards, M H; McLellan, A R; Stenmark, J; Pierroz, D D; Schneider, M C; Kanis, J A; Akesson, K; Cooper, C

    2015-11-01

    Fracture Liaison Services are the best model to prevent secondary fractures. The International Osteoporosis Foundation developed a Best Practice Framework to provide a quality benchmark. After a year of implementation, we confirmed that a single framework with set criteria is able to benchmark services across healthcare systems worldwide. Despite evidence for the clinical effectiveness of secondary fracture prevention, translation in the real-world setting remains disappointing. Where implemented, a wide variety of service models are used to deliver effective secondary fracture prevention. To support use of effective models of care across the globe, the International Osteoporosis Foundation's Capture the Fracture® programme developed a Best Practice Framework (BPF) tool of criteria and standards to provide a quality benchmark. We now report findings after the first 12 months of implementation. A questionnaire for the BPF was created and made available to institutions on the Capture the Fracture website. Responses from institutions were used to assign gold, silver, bronze or black (insufficient) level of achievements mapped across five domains. Through an interactive process with the institution, a final score was determined and published on the Capture the Fracture website Fracture Liaison Service (FLS) map. Sixty hospitals across six continents submitted their questionnaires. The hospitals served populations from 20,000 to 15 million and were a mix of private and publicly funded. Each FLS managed 146 to 6200 fragility fracture patients per year with a total of 55,160 patients across all sites. Overall, 27 hospitals scored gold, 23 silver and 10 bronze. The pathway for the hip fracture patients had the highest proportion of gold grading while vertebral fracture the lowest. In the first 12 months, we have successfully tested the BPF tool in a range of health settings across the globe. 
Initial findings confirm a significant heterogeneity in service provision and highlight the importance of a global approach to ensure high quality secondary fracture prevention services.

  11. Benchmarking Memory Performance with the Data Cube Operator

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Shabanov, Leonid V.

    2004-01-01

    Data movement across a computer memory hierarchy and across computational grids is known to be a limiting factor for applications processing large data sets. We use the Data Cube Operator on an Arithmetic Data Set, called ADC, to benchmark the capability of computers and of computational grids to handle large distributed data sets. We present a prototype implementation of a parallel algorithm for computation of the operator. The algorithm follows a known approach for computing views from the smallest parent. The ADC stresses all levels of grid memory and storage by producing some of the 2^d views of an Arithmetic Data Set of d-tuples described by a small number of integers. We control the data intensity of the ADC by selecting the tuple parameters, the sizes of the views, and the number of realized views. Benchmarking results on the memory performance of a number of computer architectures and of a small computational grid are presented.
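    The 2^d views of the cube correspond to the 2^d subsets of the d dimensions. A toy sketch of computing the full cube (with made-up tuples, and without the smallest-parent optimisation the paper's algorithm uses):

```python
from itertools import combinations

# Toy analogue of the ADC: tuples of d attributes plus a trailing measure.
rows = [(1, 0, 2, 10.0), (1, 1, 2, 5.0), (0, 0, 1, 7.0), (1, 0, 2, 3.0)]
d = 3

def view(group_dims):
    """One of the 2^d cube views: sum the measure grouped by the chosen dims."""
    agg = {}
    for r in rows:
        key = tuple(r[i] for i in group_dims)
        agg[key] = agg.get(key, 0.0) + r[-1]
    return agg

# All 2^d views, one per subset of dimensions; view(()) is the grand total.
cube = {dims: view(dims) for k in range(d + 1) for dims in combinations(range(d), k)}
print(len(cube))  # → 8 views for d = 3
```

    The smallest-parent strategy computes each view not from the raw rows but from the already-aggregated view with the fewest tuples that still contains its grouping dimensions, which is what makes the operator memory-hierarchy intensive.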

  12. Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias

    2006-01-01

    The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmark (IMB) results to study the performance of 11 MPI communication functions on these systems.

  13. Orthogonal Electric Field Measurements near the Green Fluorescent Protein Fluorophore through Stark Effect Spectroscopy and pKa Shifts Provide a Unique Benchmark for Electrostatics Models.

    PubMed

    Slocum, Joshua D; First, Jeremy T; Webb, Lauren J

    2017-07-20

    Measurement of the magnitude, direction, and functional importance of electric fields in biomolecules has been a long-standing experimental challenge. pKa shifts of titratable residues have been the most widely implemented measurements of the local electrostatic environment around the labile proton, and experimental data sets of pKa shifts in a variety of systems have been used to test and refine computational prediction capabilities of protein electrostatic fields. A more direct and increasingly popular technique to measure electric fields in proteins is Stark effect spectroscopy, where the change in absorption energy of a chromophore relative to a reference state is related to the change in electric field felt by the chromophore. While there are merits to both of these methods and they are both reporters of the local electrostatic environment, they are fundamentally different measurements, and to our knowledge there has been no direct comparison of these two approaches in a single protein. We have recently demonstrated that green fluorescent protein (GFP) is an ideal model system for measuring changes in electric fields in a protein interior caused by amino acid mutations using both electronic and vibrational Stark effect chromophores. Here we report the changes in pKa of the GFP fluorophore in response to the same mutations and show that they are in excellent agreement with Stark effect measurements. This agreement in the results of orthogonal experiments reinforces our confidence in the experimental results of both Stark effect and pKa measurements and provides an excellent target data set to benchmark diverse protein electrostatics calculations. We used this experimental data set to test the pKa prediction ability of the adaptive Poisson-Boltzmann solver (APBS) and found that a simple continuum dielectric model of the GFP interior is insufficient to accurately capture the measured pKa and Stark effect shifts. We discuss some of the limitations of this continuum-based model in this system and offer this experimentally self-consistent data set as a target benchmark for electrostatics models, which could allow for a more rigorous test of pKa prediction techniques due to the unique environment of the water-filled GFP barrel compared to traditional globular proteins.

  14. Maximizing Use of Extension Beef Cattle Benchmarks Data Derived from Cow Herd Appraisal Performance Software

    ERIC Educational Resources Information Center

    Ramsay, Jennifer M.; Hanna, Lauren L. Hulsman; Ringwall, Kris A.

    2016-01-01

    One goal of Extension is to provide practical information that makes a difference to producers. Cow Herd Appraisal Performance Software (CHAPS) has provided beef producers with production benchmarks for 30 years, creating a large historical data set. Many such large data sets contain useful information but are underutilized. Our goal was to create…

  15. Comparison of Fully-Compressible Equation Sets for Atmospheric Dynamics

    NASA Technical Reports Server (NTRS)

    Ahmad, Nashat N.

    2016-01-01

    Traditionally, the equation for the conservation of energy used in atmospheric models is based on potential temperature and is used in place of total energy conservation. This paper compares the application of the two equation sets for both the Euler and the Navier-Stokes solutions using several benchmark test cases. A high-resolution wave-propagation method, which accurately takes into account the source term due to gravity, is used for computing the non-hydrostatic atmospheric flows. It is demonstrated that there is little to no difference between the results obtained using the two different equation sets for Euler as well as Navier-Stokes solutions.
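    For reference, the two energy formulations being contrasted can be written in a sketch of their usual textbook forms (the paper's exact discretised equations may differ):

```latex
% Theta-form: conservation of potential temperature (adiabatic, inviscid)
\frac{\partial (\rho\theta)}{\partial t} + \nabla\cdot(\rho\theta\mathbf{u}) = 0,
\qquad \theta = T\left(\frac{p_0}{p}\right)^{R/c_p}

% E-form: conservation of total energy, with gravity as a source term
% when E contains internal plus kinetic energy only
\frac{\partial (\rho E)}{\partial t} + \nabla\cdot\bigl[(\rho E + p)\,\mathbf{u}\bigr] = -\rho g w,
\qquad E = e + \tfrac{1}{2}\lvert\mathbf{u}\rvert^{2}
```

    Both are closed by the same continuity and momentum equations; the paper's finding is that, with the gravity source term treated accurately, the two closures give essentially the same benchmark solutions.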

  16. Benchmarking a geostatistical procedure for the homogenisation of annual precipitation series

    NASA Astrophysics Data System (ADS)

    Caineta, Júlio; Ribeiro, Sara; Henriques, Roberto; Soares, Amílcar; Costa, Ana Cristina

    2014-05-01

    The European project COST Action ES0601, Advances in homogenisation methods of climate series: an integrated approach (HOME), has brought to attention the importance of establishing reliable homogenisation methods for climate data. In order to achieve that, a benchmark data set, containing monthly and daily temperature and precipitation data, was created to be used as a comparison basis for the effectiveness of those methods. Several contributions were submitted and evaluated by a number of performance metrics, validating the results against realistic inhomogeneous data. HOME also led to the development of new homogenisation software packages, which included feedback and lessons learned during the project. Preliminary studies have suggested a geostatistical stochastic approach, which uses Direct Sequential Simulation (DSS), as a promising methodology for the homogenisation of precipitation data series. Based on the spatial and temporal correlation between the neighbouring stations, DSS calculates local probability density functions at a candidate station to detect inhomogeneities. The purpose of the current study is to test and compare this geostatistical approach with the methods previously presented in the HOME project, using surrogate precipitation series from the HOME benchmark data set. The benchmark data set contains monthly precipitation surrogate series, from which annual precipitation data series were derived. These annual precipitation series were subject to exploratory analysis and to a thorough variography study. The geostatistical approach was then applied to the data set, based on different scenarios for the spatial continuity. Implementing this procedure also promoted the development of a computer program that aims to assist in the homogenisation of climate data, while minimising user interaction. 
Finally, in order to compare the effectiveness of this methodology with the homogenisation methods submitted during the HOME project, the obtained results were evaluated using the same performance metrics. This comparison opens new perspectives for the development of an innovative procedure based on the geostatistical stochastic approach. Acknowledgements: The authors gratefully acknowledge the financial support of "Fundação para a Ciência e Tecnologia" (FCT), Portugal, through the research project PTDC/GEO-MET/4026/2012 ("GSIMCLI - Geostatistical simulation with local distributions for the homogenization and interpolation of climate data").

  17. The NAS kernel benchmark program

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.; Barton, J. T.

    1985-01-01

    A collection of benchmark test kernels that measure supercomputer performance has been developed for the use of the NAS (Numerical Aerodynamic Simulation) program at the NASA Ames Research Center. This benchmark program is described in detail and the specific ground rules are given for running the program as a performance test.

  18. Benchmarking for Excellence and the Nursing Process

    NASA Technical Reports Server (NTRS)

    Sleboda, Claire

    1999-01-01

    Nursing is a service profession. The services provided are essential to life and welfare. Therefore, setting the benchmark for high quality care is fundamental. Exploring the definition of a benchmark value will help to determine a best practice approach. A benchmark is the descriptive statement of a desired level of performance against which quality can be judged. It must be sufficiently well understood by managers and personnel in order that it may serve as a standard against which to measure value.

  19. Evaluation of various LandFlux evapotranspiration algorithms using the LandFlux-EVAL synthesis benchmark products and observational data

    NASA Astrophysics Data System (ADS)

    Michel, Dominik; Hirschi, Martin; Jimenez, Carlos; McCabe, Mathew; Miralles, Diego; Wood, Eric; Seneviratne, Sonia

    2014-05-01

    Research on climate variations and the development of predictive capabilities largely rely on globally available reference data series of the different components of the energy and water cycles. Several efforts have aimed at producing large-scale and long-term reference data sets of these components, e.g. based on in situ observations and remote sensing, in order to allow for diagnostic analyses of the drivers of temporal variations in the climate system. Evapotranspiration (ET) is an essential component of the energy and water cycle, which cannot be monitored directly on a global scale by remote sensing techniques. In recent years, several global multi-year ET data sets have been derived from remote sensing-based estimates, observation-driven land surface model simulations or atmospheric reanalyses. The LandFlux-EVAL initiative presented an ensemble-evaluation of these data sets over the time periods 1989-1995 and 1989-2005 (Mueller et al. 2013). Currently, a multi-decadal global reference heat flux data set for ET at the land surface is being developed within the LandFlux initiative of the Global Energy and Water Cycle Experiment (GEWEX). This LandFlux v0 ET data set comprises four ET algorithms forced with a common radiation and surface meteorology. In order to estimate the agreement of this LandFlux v0 ET data with existing data sets, it is compared to the recently available LandFlux-EVAL synthesis benchmark product. Additional evaluation of the LandFlux v0 ET data set is based on a comparison to in situ observations of a weighing lysimeter from the hydrological research site Rietholzbach in Switzerland. These analyses serve as a test bed for similar evaluation procedures that are envisaged for ESA's WACMOS-ET initiative (http://wacmoset.estellus.eu). Reference: Mueller, B., Hirschi, M., Jimenez, C., Ciais, P., Dirmeyer, P. A., Dolman, A. J., Fisher, J. B., Jung, M., Ludwig, F., Maignan, F., Miralles, D. G., McCabe, M. 
F., Reichstein, M., Sheffield, J., Wang, K., Wood, E. F., Zhang, Y., and Seneviratne, S. I. (2013). Benchmark products for land evapotranspiration: LandFlux-EVAL multi-data set synthesis. Hydrology and Earth System Sciences, 17(10): 3707-3720.

  20. Alternative industrial carbon emissions benchmark based on input-output analysis

    NASA Astrophysics Data System (ADS)

    Han, Mengyao; Ji, Xi

    2016-12-01

    Some problems exist in current carbon emissions benchmark-setting systems. The primary consideration for industrial carbon emissions standards relates mainly to direct carbon emissions (power-related emissions), and only a portion of indirect emissions is considered in current carbon emissions accounting processes. This practice is insufficient and may cause double counting to some extent due to mixed emission sources. To better integrate and quantify direct and indirect carbon emissions, an embodied industrial carbon emissions benchmark-setting method is proposed to guide the establishment of carbon emissions benchmarks based on input-output analysis. This method links direct carbon emissions with inter-industrial economic exchanges and systematically quantifies the carbon emissions embodied in total product delivery chains. The purpose of this study is to design a practical new set of embodied intensity-based benchmarks for both direct and indirect carbon emissions. Beijing, at the first level of carbon emissions trading pilot schemes in China, plays a significant role in the establishment of these schemes and is chosen as an example in this study. The newly proposed method relates emissions directly to each responsibility in a practical way through the measurement of complex production and supply chains, reducing carbon emissions at their original sources. The method is expected to be developed under uncertain internal and external contexts, and is further expected to be generalized to guide the establishment of industrial benchmarks for carbon emissions trading schemes in China and other countries.
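    The embodied-emissions benchmark described here rests on the standard environmentally-extended input-output identity, in which total (direct plus indirect) emission intensities are obtained from the Leontief inverse. A minimal sketch with made-up three-sector numbers (not Beijing data):

```python
import numpy as np

# Technical coefficient matrix A: inter-industry inputs per unit of output
# (illustrative values only).
A = np.array([[0.1, 0.2, 0.0],
              [0.3, 0.1, 0.2],
              [0.0, 0.1, 0.1]])

# Direct emission intensity per unit output for each sector (e.g. tCO2/unit).
f = np.array([2.0, 0.5, 1.0])

# Embodied intensity: direct emissions plus all upstream indirect emissions,
# epsilon = f (I - A)^(-1), the standard input-output identity.
embodied = f @ np.linalg.inv(np.eye(3) - A)
print(embodied.round(3))
```

    Because (I - A)^(-1) = I + A + A^2 + ... sums contributions over all supply-chain tiers, each embodied intensity is at least as large as the corresponding direct intensity, which is precisely the gap between power-related benchmarks and the embodied benchmarks the paper proposes.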

  1. 'Wasteaware' benchmark indicators for integrated sustainable waste management in cities.

    PubMed

    Wilson, David C; Rodic, Ljiljana; Cowing, Michael J; Velis, Costas A; Whiteman, Andrew D; Scheinberg, Anne; Vilches, Recaredo; Masterson, Darragh; Stretz, Joachim; Oelz, Barbara

    2015-01-01

    This paper addresses a major problem in international solid waste management, which is twofold: a lack of data, and a lack of consistent data to allow comparison between cities. The paper presents an indicator set for integrated sustainable waste management (ISWM) in cities both North and South, to allow benchmarking of a city's performance, comparing cities and monitoring developments over time. It builds on pioneering work for UN-Habitat's solid waste management in the World's cities. The comprehensive analytical framework of a city's solid waste management system is divided into two overlapping 'triangles' - one comprising the three physical components, i.e. collection, recycling, and disposal, and the other comprising three governance aspects, i.e. inclusivity; financial sustainability; and sound institutions and proactive policies. The indicator set includes essential quantitative indicators as well as qualitative composite indicators. This updated and revised 'Wasteaware' set of ISWM benchmark indicators is the cumulative result of testing various prototypes in more than 50 cities around the world. This experience confirms the utility of indicators in allowing comprehensive performance measurement and comparison of both 'hard' physical components and 'soft' governance aspects; and in prioritising 'next steps' in developing a city's solid waste management system, by identifying both local strengths that can be built on and weak points to be addressed. The Wasteaware ISWM indicators are applicable to a broad range of cities with very different levels of income and solid waste management practices. Their wide application as a standard methodology will help to fill the historical data gap. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Benchmark of Ab Initio Bethe-Salpeter Equation Approach with Numeric Atom-Centered Orbitals

    NASA Astrophysics Data System (ADS)

    Liu, Chi; Kloppenburg, Jan; Kanai, Yosuke; Blum, Volker

    The Bethe-Salpeter equation (BSE) approach based on the GW approximation has been shown to be successful for predicting the optical spectra of solids and, recently, also of small molecules. We here present an all-electron implementation of the BSE using numeric atom-centered orbital (NAO) basis sets. In this work, we present a benchmark of the BSE as implemented in FHI-aims for the low-lying excitation energies of a set of small organic molecules, the well-known Thiel's set. The difference between our implementation (using an analytic continuation of the GW self-energy on the real axis) and the results generated by a fully frequency-dependent GW treatment on the real axis is on the order of 0.07 eV for the benchmark molecular set. We study the convergence behavior to the complete basis set limit for excitation spectra, using a group of valence correlation consistent NAO basis sets (NAO-VCC-nZ), as well as standard NAO basis sets for ground state DFT with extended augmentation functions (NAO+aug). The BSE results and convergence behavior are compared to linear-response time-dependent DFT, where excellent numerical convergence is shown for NAO+aug basis sets.

  3. Ensuring Academic Depth and Rigour in Teacher Education through Benchmarking, with Special Attention to Context

    ERIC Educational Resources Information Center

    Steyn, H. J.; van der Walt, J. L.; Wolhuter, C. C.

    2016-01-01

    Benchmarking is one way of ensuring academic depth and rigour in teacher education. After making a case for setting benchmarks in teacher education based on the widely recognised intra-education system contextual factors, the importance of also taking into account the external (e.g. the national-social) context in which teacher education occurs is…

  4. Experimental unsteady pressures at flutter on the Supercritical Wing Benchmark Model

    NASA Technical Reports Server (NTRS)

    Dansberry, Bryan E.; Durham, Michael H.; Bennett, Robert M.; Rivera, Jose A.; Silva, Walter A.; Wieseman, Carol D.; Turnock, David L.

    1993-01-01

    This paper describes selected results from the flutter testing of the Supercritical Wing (SW) model. This model is a rigid semispan wing having a rectangular planform and a supercritical airfoil shape. The model was flutter tested in the Langley Transonic Dynamics Tunnel (TDT) as part of the Benchmark Models Program, a multi-year wind tunnel activity currently being conducted by the Structural Dynamics Division of NASA Langley Research Center. The primary objective of this program is to assist in the development and evaluation of aeroelastic computational fluid dynamics codes. The SW is the second of a series of three similar models which are designed to be flutter tested in the TDT on a flexible mount known as the Pitch and Plunge Apparatus. Data sets acquired with these models, including simultaneous unsteady surface pressures and model response data, are meant to be used for correlation with analytical codes. Presented in this report are experimental flutter boundaries and corresponding steady and unsteady pressure distribution data acquired over two model chords located at the 60 and 95 percent span stations.

  5. MoMaS reactive transport benchmark using PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Park, H.

    2017-12-01

    The MoMaS benchmark was developed to enhance numerical simulation capability for reactive transport modeling in porous media. The benchmark, published in late September 2009, is not taken from a real chemical system, but offers realistic and numerically challenging tests. PFLOTRAN is a state-of-the-art massively parallel subsurface flow and reactive transport code that is being used in multiple nuclear waste repository projects at Sandia National Laboratories, including the Waste Isolation Pilot Plant and Used Fuel Disposition. The MoMaS benchmark has three independent tests of easy, medium, and hard chemical complexity. This paper demonstrates how PFLOTRAN is applied to this benchmark exercise and shows results for the easy test case, which includes mixing of aqueous components and surface complexation. The surface complexations consist of monodentate and bidentate reactions, which introduce difficulty in defining the selectivity coefficient if the reaction applies to a bulk reference volume: for bidentate reactions in heterogeneous porous media, the selectivity coefficient becomes porosity dependent. The benchmark is solved by PFLOTRAN with minimal modification to address this issue, and unit conversions were made to suit PFLOTRAN.

  6. Cbench

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ogden, Jeffry B.

    2005-09-26

    Cbench is intended to be a relatively straightforward collection of tests, benchmarks, applications, utilities, and a framework, with the goal of facilitating scalable testing and benchmarking of a Linux cluster.

  7. Benchmarking Multilayer-HySEA model for landslide generated tsunami. NTHMP validation process.

    NASA Astrophysics Data System (ADS)

    Macias, J.; Escalante, C.; Castro, M. J.

    2017-12-01

    Landslide tsunami hazard may be dominant along significant parts of the coastline around the world, in particular in the USA, as compared to hazards from other tsunamigenic sources. This fact motivated the NTHMP to benchmark models for landslide-generated tsunamis, following the same methodology already used for standard tsunami models when the source is seismic. To perform this validation process, a set of seven candidate benchmarks was proposed. These benchmarks are based on a subset of available laboratory data sets for solid slide and deformable slide experiments, and include both submarine and subaerial slides. A benchmark based on a historic field event (Valdez, AK, 1964) closes the list of proposed benchmarks. The Multilayer-HySEA model, which includes non-hydrostatic effects, has been used to perform all the benchmark problems dealing with laboratory experiments proposed at the workshop organized at Texas A&M University - Galveston on January 9-11, 2017 by the NTHMP. The aim of this presentation is to show some of the latest numerical results obtained with the Multilayer-HySEA (non-hydrostatic) model in the framework of this validation effort. Acknowledgements: This research has been partially supported by the Spanish Government research project SIMURISK (MTM2015-70490-C02-01-R) and the University of Malaga, Campus de Excelencia Internacional Andalucía Tech. The GPU computations were performed at the Unit of Numerical Methods (University of Malaga).

  8. Baseline ecological risk assessment of the Calcasieu Estuary, Louisiana: 2. An evaluation of the predictive ability of effects-based sediment quality guidelines

    USGS Publications Warehouse

    MacDonald, Donald D.; Ingersoll, Christopher G.; Smorong, Dawn E.; Sinclair, Jesse A.; Lindskoog, Rebekka; Wang, Ning; Severn, Corrine; Gouguet, Ron; Meyer, John; Field, Jay

    2011-01-01

    Three sets of effects-based sediment-quality guidelines (SQGs) were evaluated to support the selection of sediment-quality benchmarks for assessing risks to benthic invertebrates in the Calcasieu Estuary, Louisiana. These SQGs included probable effect concentrations (PECs), effects range median values (ERMs), and logistic regression model (LRM)-based T50 values. The results of this investigation indicate that all three sets of SQGs tend to underestimate sediment toxicity in the Calcasieu Estuary (i.e., relative to the national data sets), as evaluated using the results of 10-day toxicity tests with the amphipods Hyalella azteca or Ampelisca abdita, and 28-day whole-sediment toxicity tests with H. azteca. These results emphasize the importance of deriving site-specific toxicity thresholds for assessing risks to benthic invertebrates.

  9. Benchmarking in pathology: development of an activity-based costing model.

    PubMed

    Burnett, Leslie; Wilson, Roger; Pfeffer, Sally; Lowry, John

    2012-12-01

    Benchmarking in Pathology (BiP) allows pathology laboratories to determine the unit cost of all laboratory tests and procedures, and also provides organisational productivity indices allowing comparisons of performance with other BiP participants. We describe 14 years of progressive enhancement to a BiP program, including the implementation of 'avoidable costs' as the accounting basis for allocation of costs rather than previous approaches using 'total costs'. A hierarchical tree-structured activity-based costing model distributes 'avoidable costs' attributable to the pathology activities component of a pathology laboratory operation. The hierarchical tree model permits costs to be allocated across multiple laboratory sites and organisational structures. This has enabled benchmarking on a number of levels, including test profiles and non-testing related workload activities. The development of methods for dealing with variable cost inputs, allocation of indirect costs using imputation techniques, panels of tests, and blood-bank record keeping, have been successfully integrated into the costing model. A variety of laboratory management reports are produced, including the 'cost per test' of each pathology 'test' output. Benchmarking comparisons may be undertaken at any and all of the 'cost per test' and 'cost per Benchmarking Complexity Unit' level, 'discipline/department' (sub-specialty) level, or overall laboratory/site and organisational levels. We have completed development of a national BiP program. An activity-based costing methodology based on avoidable costs overcomes many problems of previous benchmarking studies based on total costs. The use of benchmarking complexity adjustment permits correction for varying test-mix and diagnostic complexity between laboratories. Use of iterative communication strategies with program participants can overcome many obstacles and lead to innovations.

  10. Comparison of different classification algorithms for underwater target discrimination.

    PubMed

    Li, Donghui; Azimi-Sadjadi, Mahmood R; Robinson, Marc

    2004-01-01

    Classification of underwater targets from acoustic backscattered signals is considered here. Several different classification algorithms are tested and benchmarked, not only for their performance but also to gain insight into the properties of the feature space. Results on a wideband 80-kHz acoustic backscattered data set collected for six different objects are presented in terms of the receiver operating characteristic (ROC) and robustness of the classifiers with respect to reverberation.

  11. Curriculum Policy Seen Through High-Stakes Examinations: Mathematics and Biology in a Selection of School-Leaving Examinations from the Middle East and North Africa

    ERIC Educational Resources Information Center

    Valverde, Gilbert A.

    2005-01-01

    A study of curriculum goals set forth in school-leaving examinations in mathematics and biology from Egypt, Iran, Jordan, Lebanon, Morocco, and Tunisia benchmarked against the French baccalaureate examinations. This investigation uncovers and contrasts the goals of secondary education as they are put forward in the tests that are used in the…

  12. Hierarchical Kohonenen net for anomaly detection in network security.

    PubMed

    Sarasamma, Suseela T; Zhu, Qiuming A; Huff, Julie

    2005-04-01

    A novel multilevel hierarchical Kohonen net (K-Map) for an intrusion detection system is presented. Each level of the hierarchical map is modeled as a simple winner-take-all K-Map. One significant advantage of this multilevel hierarchical K-Map is its computational efficiency. Unlike other statistical anomaly detection methods, such as the nearest neighbor approach, K-means clustering, or probabilistic analysis, which employ distance computation in the feature space to identify the outliers, our approach does not involve costly point-to-point computation in organizing the data into clusters. Another advantage is the reduced network size. We use the classification capability of the K-Map on selected dimensions of the data set in detecting anomalies. Randomly selected subsets that contain both attacks and normal records from the KDD Cup 1999 benchmark data are used to train the hierarchical net. We use a confidence measure to label the clusters. Then we use the test set from the same KDD Cup 1999 benchmark to test the hierarchical net. We show that a hierarchical K-Map in which each layer operates on a small subset of the feature space is superior to a single-layer K-Map operating on the whole feature space in detecting a variety of attacks, in terms of detection rate as well as false positive rate.
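
    The winner-take-all step at the heart of each K-Map level simply assigns an input to the map node with the nearest weight vector. A minimal sketch (the node weights and input vector below are invented for illustration, not taken from the paper):

```python
import numpy as np

def winner_take_all(weights, x):
    """Return the index of the best-matching unit (BMU): the map node
    whose weight vector is closest to input x in Euclidean distance."""
    distances = np.linalg.norm(weights - x, axis=1)
    return int(np.argmin(distances))

# Toy 3-node layer operating on a 2-dimensional feature subset.
weights = np.array([[0.0, 0.0],
                    [1.0, 1.0],
                    [5.0, 5.0]])
x = np.array([0.9, 1.2])
print(winner_take_all(weights, x))  # node 1 wins
```

    In the hierarchical scheme described above, each level would run this assignment on its own small subset of feature dimensions, avoiding point-to-point distance computation over the full feature space.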

  13. Performance of Multi-chaotic PSO on a shifted benchmark functions set

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pluhacek, Michal; Senkerik, Roman; Zelinka, Ivan

    2015-03-10

    In this paper the performance of Multi-chaotic PSO algorithm is investigated using two shifted benchmark functions. The purpose of shifted benchmark functions is to simulate the time-variant real-world problems. The results of chaotic PSO are compared with canonical version of the algorithm. It is concluded that using the multi-chaotic approach can lead to better results in optimization of shifted functions.
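
    A shifted benchmark function translates the optimum away from the origin, so that optimizers biased toward the centre of the search space lose their advantage. A minimal sketch (the sphere function and shift vector are invented for this illustration, not taken from the paper):

```python
import numpy as np

def sphere(x):
    """Classic sphere benchmark: f(x) = sum(x_i^2), minimum 0 at the origin."""
    return float(np.sum(np.asarray(x) ** 2))

def shifted(f, shift):
    """Shift a benchmark function so its optimum moves from the origin
    to `shift`, simulating a displaced (time-variant) real-world optimum."""
    return lambda x: f(np.asarray(x) - shift)

shift = np.array([2.0, -1.0, 0.5])
g = shifted(sphere, shift)
print(g(shift))            # 0.0 -- the optimum now sits at the shift vector
print(g([0.0, 0.0, 0.0]))  # 5.25 -- the origin is no longer optimal
```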

  14. Competency based training in robotic surgery: benchmark scores for virtual reality robotic simulation.

    PubMed

    Raison, Nicholas; Ahmed, Kamran; Fossati, Nicola; Buffi, Nicolò; Mottrie, Alexandre; Dasgupta, Prokar; Van Der Poel, Henk

    2017-05-01

    To develop benchmark scores of competency for use within a competency based virtual reality (VR) robotic training curriculum. This longitudinal, observational study analysed results from nine European Association of Urology hands-on-training courses in VR simulation. In all, 223 participants ranging from novice to expert robotic surgeons completed 1565 exercises. Competency was set at 75% of the mean expert score. Benchmark scores for all general performance metrics generated by the simulator were calculated. Assessment exercises were selected by expert consensus and through learning-curve analysis. Three basic skill and two advanced skill exercises were identified. Benchmark scores based on expert performance offered viable targets for novice and intermediate trainees in robotic surgery. Novice participants met the competency standards for most basic skill exercises; however, advanced exercises were significantly more challenging. Intermediate participants performed better across the seven metrics but still did not achieve the benchmark standard in the more difficult exercises. Benchmark scores derived from expert performances offer relevant and challenging scores for trainees to achieve during VR simulation training. Objective feedback allows both participants and trainers to monitor educational progress and ensures that training remains effective. Furthermore, the well-defined goals set through benchmarking offer clear targets for trainees and enable training to move to a more efficient competency based curriculum.
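
    Deriving a benchmark as 75% of the mean expert score is a one-line computation per metric. A minimal sketch (the metric names and expert scores are invented; only the 75% rule comes from the study above):

```python
def benchmark_scores(expert_scores_by_metric, fraction=0.75):
    """Derive a competency benchmark per metric as a fraction of the
    mean expert score (75% in the study described above)."""
    return {metric: fraction * sum(scores) / len(scores)
            for metric, scores in expert_scores_by_metric.items()}

# Invented expert results for two hypothetical simulator metrics.
experts = {"economy_of_motion": [80.0, 90.0, 100.0],
           "time_to_complete": [60.0, 70.0, 80.0]}
print(benchmark_scores(experts))
# {'economy_of_motion': 67.5, 'time_to_complete': 52.5}
```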

  15. A Consumer's Guide to Benchmark Dose Models: Results of U.S. EPA Testing of 14 Dichotomous, 8 Continuous, and 6 Developmental Models (Presentation)

    EPA Science Inventory

    Benchmark dose risk assessment software (BMDS) was designed by EPA to generate dose-response curves and facilitate the analysis, interpretation and synthesis of toxicological data. Partial results of QA/QC testing of the EPA benchmark dose software (BMDS) are presented. BMDS pr...

  16. Benchmarking to Identify Practice Variation in Test Ordering: A Potential Tool for Utilization Management.

    PubMed

    Signorelli, Heather; Straseski, Joely A; Genzen, Jonathan R; Walker, Brandon S; Jackson, Brian R; Schmidt, Robert L

    2015-01-01

    Appropriate test utilization is usually evaluated by adherence to published guidelines. In many cases, medical guidelines are not available. Benchmarking has been proposed as a method to identify practice variations that may represent inappropriate testing. This study investigated the use of benchmarking to identify sites with inappropriate utilization of testing for a particular analyte. We used a Web-based survey to compare 2 measures of vitamin D utilization: overall testing intensity (ratio of total vitamin D orders to blood-count orders) and relative testing intensity (ratio of 1,25(OH)2D to 25(OH)D test orders). A total of 81 facilities contributed data. The average overall testing intensity index was 0.165, or approximately 1 vitamin D test for every 6 blood-count tests. The average relative testing intensity index was 0.055, or one 1,25(OH)2D test for every 18 of the 25(OH)D tests. Both indexes varied considerably. Benchmarking can be used as a screening tool to identify outliers that may be associated with inappropriate test utilization.

  17. [Benchmarking of university trauma centers in Germany. Research and teaching].

    PubMed

    Gebhard, F; Raschke, M; Ruchholtz, S; Meffert, R; Marzi, I; Pohlemann, T; Südkamp, N; Josten, C; Zwipp, H

    2011-07-01

    Benchmarking is a very popular business process that is now used in research as well. The aim of the present study is to elucidate key figures of German university trauma departments regarding research and teaching. The data set is based upon the monthly reports given by the administration of each university. The study shows that only well-known parameters such as fund-raising and impact factors can be used to benchmark university-based trauma centers. The German federal system does not allow a nationwide benchmarking.

  18. ωB97M-V: A combinatorially optimized, range-separated hybrid, meta-GGA density functional with VV10 nonlocal correlation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mardirossian, Narbe; Head-Gordon, Martin

    2016-06-07

    A combinatorially optimized, range-separated hybrid, meta-GGA density functional with VV10 nonlocal correlation is presented in this paper. The final 12-parameter functional form is selected from approximately 10 × 10⁹ candidate fits that are trained on a training set of 870 data points and tested on a primary test set of 2964 data points. The resulting density functional, ωB97M-V, is further tested for transferability on a secondary test set of 1152 data points. For comparison, ωB97M-V is benchmarked against 11 leading density functionals including M06-2X, ωB97X-D, M08-HX, M11, ωM05-D, ωB97X-V, and MN15. Encouragingly, the overall performance of ωB97M-V on nearly 5000 data points clearly surpasses that of all of the tested density functionals. Finally, in order to facilitate the use of ωB97M-V, its basis set dependence and integration grid sensitivity are thoroughly assessed, and recommendations that take into account both efficiency and accuracy are provided.

  19. Benchmarking image fusion system design parameters

    NASA Astrophysics Data System (ADS)

    Howell, Christopher L.

    2013-06-01

    A clear and absolute method for discriminating between image fusion algorithm performances is presented. This method can effectively be used to assist in the design and modeling of image fusion systems. Specifically, it is postulated that quantifying human task performance using image fusion should be benchmarked to whether the fusion algorithm, at a minimum, retained the performance benefit achievable by each independent spectral band being fused. The established benchmark would then clearly represent the threshold that a fusion system should surpass to be considered beneficial to a particular task. A genetic algorithm is employed to characterize the fused system parameters using a Matlab® implementation of NVThermIP as the objective function. By setting the problem up as a mixed-integer constraint optimization problem, one can effectively look backwards through the image acquisition process: optimizing fused system parameters by minimizing the difference between modeled task difficulty measure and the benchmark task difficulty measure. The results of an identification perception experiment are presented, where human observers were asked to identify a standard set of military targets, and used to demonstrate the effectiveness of the benchmarking process.
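
    The benchmarking criterion described above, that a fusion algorithm should at minimum retain the performance benefit of the best individual spectral band, can be sketched as a simple threshold check (the band names and performance values below are invented for illustration):

```python
def fusion_benchmark(band_performance):
    """Benchmark threshold: a fusion algorithm should at minimum retain
    the task performance achievable with the best individual band."""
    return max(band_performance.values())

def beneficial(fused_perf, band_performance):
    """True if the fused imagery meets or exceeds the benchmark."""
    return fused_perf >= fusion_benchmark(band_performance)

# Invented probabilities of identification for two spectral bands.
bands = {"MWIR": 0.62, "LWIR": 0.71}
print(beneficial(0.78, bands), beneficial(0.65, bands))  # True False
```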

  20. The Earthquake‐Source Inversion Validation (SIV) Project

    USGS Publications Warehouse

    Mai, P. Martin; Schorlemmer, Danijel; Page, Morgan T.; Ampuero, Jean-Paul; Asano, Kimiyuki; Causse, Mathieu; Custodio, Susana; Fan, Wenyuan; Festa, Gaetano; Galis, Martin; Gallovic, Frantisek; Imperatori, Walter; Käser, Martin; Malytskyy, Dmytro; Okuwaki, Ryo; Pollitz, Fred; Passone, Luca; Razafindrakoto, Hoby N. T.; Sekiguchi, Haruko; Song, Seok Goo; Somala, Surendra N.; Thingbaijam, Kiran K. S.; Twardzik, Cedric; van Driel, Martin; Vyas, Jagdish C.; Wang, Rongjiang; Yagi, Yuji; Zielke, Olaf

    2016-01-01

    Finite‐fault earthquake source inversions infer the (time‐dependent) displacement on the rupture surface from geophysical data. The resulting earthquake source models document the complexity of the rupture process. However, multiple source models for the same earthquake, obtained by different research teams, often exhibit remarkable dissimilarities. To address the uncertainties in earthquake‐source inversion methods and to understand strengths and weaknesses of the various approaches used, the Source Inversion Validation (SIV) project conducts a set of forward‐modeling exercises and inversion benchmarks. In this article, we describe the SIV strategy, the initial benchmarks, and current SIV results. Furthermore, we apply statistical tools for quantitative waveform comparison and for investigating source‐model (dis)similarities that enable us to rank the solutions, and to identify particularly promising source inversion approaches. All SIV exercises (with related data and descriptions) and statistical comparison tools are available via an online collaboration platform, and we encourage source modelers to use the SIV benchmarks for developing and testing new methods. We envision that the SIV efforts will lead to new developments for tackling the earthquake‐source imaging problem.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cohen, J; Dossa, D; Gokhale, M

    Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063, Storage Intensive Supercomputing, during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool, iotrace, developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared software-only performance to GPU-accelerated performance. In addition to the work reported here, a text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the 40 GB parallel NAND Flash disk array, the Fusion-io. The Fusion system specs are as follows: SuperMicro X7DBE Xeon dual-socket Blackford server motherboard; 2 Intel Xeon dual-core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512); 80 GB hard drive (Seagate SATA II Barracuda). The Fusion board is presently capable of 4X in a PCIe slot. The image resampling benchmark was run on a dual Xeon workstation with an NVIDIA graphics card (see Chapter 5 for the full specification). An XtremeData Opteron+FPGA was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that spent more than 50% of its time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling and language classification benchmarks showed order-of-magnitude speedups over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefit in boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid-state nonvolatile memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.

  2. Zn Coordination Chemistry:  Development of Benchmark Suites for Geometries, Dipole Moments, and Bond Dissociation Energies and Their Use To Test and Validate Density Functionals and Molecular Orbital Theory.

    PubMed

    Amin, Elizabeth A; Truhlar, Donald G

    2008-01-01

    We present nonrelativistic and relativistic benchmark databases (obtained by coupled cluster calculations) of 10 Zn-ligand bond distances, 8 dipole moments, and 12 bond dissociation energies in Zn coordination compounds with O, S, NH3, H2O, OH, SCH3, and H ligands. These are used to test the predictions of 39 density functionals, Hartree-Fock theory, and seven more approximate molecular orbital theories. In the nonrelativistic case, the M05-2X, B97-2, and mPW1PW functionals emerge as the most accurate ones for this test data, with unitless balanced mean unsigned errors (BMUEs) of 0.33, 0.38, and 0.43, respectively. The best local functionals (i.e., functionals with no Hartree-Fock exchange) are M06-L and τ-HCTH with BMUEs of 0.54 and 0.60, respectively. The popular B3LYP functional has a BMUE of 0.51, only slightly better than the value of 0.54 for the best local functional, which is less expensive. Hartree-Fock theory itself has a BMUE of 1.22. The M05-2X functional has a mean unsigned error of 0.008 Å for bond lengths, 0.19 D for dipole moments, and 4.30 kcal/mol for bond energies. The X3LYP functional has a smaller mean unsigned error (0.007 Å) for bond lengths but has mean unsigned errors of 0.43 D for dipole moments and 5.6 kcal/mol for bond energies. The M06-2X functional has a smaller mean unsigned error (3.3 kcal/mol) for bond energies but has mean unsigned errors of 0.017 Å for bond lengths and 0.37 D for dipole moments. The best of the semiempirical molecular orbital theories are PM3 and PM6, with BMUEs of 1.96 and 2.02, respectively. The ten most accurate functionals from the nonrelativistic benchmark analysis are then tested in relativistic calculations against new benchmarks obtained with coupled-cluster calculations and a relativistic effective core potential, resulting in M05-2X (BMUE = 0.895), PW6B95 (BMUE = 0.90), and B97-2 (BMUE = 0.93) as the top three functionals. We find significant relativistic effects (∼0.01 Å in bond lengths, ∼0.2 D in dipole moments, and ∼4 kcal/mol in Zn-ligand bond energies) that cannot be neglected for accurate modeling, but the same density functionals that do well in all-electron nonrelativistic calculations do well with relativistic effective core potentials. Although most tests are carried out with augmented polarized triple-ζ basis sets, we also carried out some tests with an augmented polarized double-ζ basis set, and we found, on average, that with the smaller basis set DFT has no loss in accuracy for dipole moments and only ∼10% less accurate bond lengths.

  3. The challenge of benchmarking health systems: is ICT innovation capacity more systemic than organizational dependent?

    PubMed

    Lapão, Luís Velez

    2015-01-01

    The article by Catan et al. presents a benchmarking exercise comparing Israel and Portugal on the implementation of Information and Communication Technologies (ICT) in the healthcare sector. Special attention was given to e-Health and m-Health. The authors collected information via a set of interviews with key stakeholders. They compared two different cultures and societies, which have reached slightly different implementation outcomes. Although the comparison is very enlightening, it is also challenging. Benchmarking exercises present a set of challenges, such as the choice of methodologies and the assessment of the impact on organizational strategy. Precise benchmarking methodology is a valid tool for eliciting information about alternatives for improving health systems. However, many beneficial interventions, which benchmark as effective, fail to translate into meaningful healthcare outcomes across contexts. There is a relationship between results and the innovation and competitive environments. Differences in healthcare governance and financing models are well known, but little is known about their impact on ICT implementation. The article by Catan et al. provides interesting clues about this issue. Public systems (such as those of Portugal, the UK, Sweden, Spain, etc.) present specific advantages and disadvantages concerning ICT development and implementation. Meanwhile, private systems based fundamentally on insurance packages (such as those of Israel, Germany, the Netherlands, or the USA) present a different set of advantages and disadvantages, especially a more open context for innovation. Challenging issues from both the Portuguese and Israeli cases will be addressed. Clearly, more research is needed on both benchmarking methodologies and ICT implementation strategies.

  4. Bin packing problem solution through a deterministic weighted finite automaton

    NASA Astrophysics Data System (ADS)

    Zavala-Díaz, J. C.; Pérez-Ortega, J.; Martínez-Rebollar, A.; Almanza-Ortega, N. N.; Hidalgo-Reyes, M.

    2016-06-01

    In this article, the solution of the one-dimensional Bin Packing problem through a weighted finite automaton is presented. The construction of the automaton and its application to three different instances, one synthetic data set and two benchmarks, are presented: N1C1W1_A.BPP, belonging to data set Set_1, and BPP13.BPP, belonging to hard28. The optimal solution of the synthetic data set is obtained. In the first benchmark, the solution obtained uses one container more than the ideal number of containers, and in the second benchmark, two containers more than the ideal (approximately 2.5%). The runtime in all three cases was less than one second.
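
    The abstract measures solution quality against the ideal number of containers, i.e. the ceiling of total item size over container capacity. The sketch below uses the standard first-fit decreasing heuristic, not the paper's weighted finite automaton, purely to illustrate that comparison (item sizes and capacity are invented):

```python
import math

def ffd(sizes, capacity):
    """First-fit decreasing: sort items largest first, place each into
    the first open bin with room, opening a new bin when none fits."""
    bins = []
    for s in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + s <= capacity:
                b.append(s)
                break
        else:
            bins.append([s])
    return bins

sizes = [7, 5, 4, 4, 3, 2, 2, 1]
capacity = 10
bins = ffd(sizes, capacity)
ideal = math.ceil(sum(sizes) / capacity)  # lower bound on bins needed
print(len(bins), ideal)  # 3 3 -- heuristic matches the ideal here
```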

  5. GW100: Benchmarking G0W0 for Molecular Systems.

    PubMed

    van Setten, Michiel J; Caruso, Fabio; Sharifzadeh, Sahar; Ren, Xinguo; Scheffler, Matthias; Liu, Fang; Lischner, Johannes; Lin, Lin; Deslippe, Jack R; Louie, Steven G; Yang, Chao; Weigend, Florian; Neaton, Jeffrey B; Evers, Ferdinand; Rinke, Patrick

    2015-12-08

    We present the GW100 set. GW100 is a benchmark set of the ionization potentials and electron affinities of 100 molecules computed with the GW method using three independent GW codes and different GW methodologies. The quasi-particle energies of the highest-occupied molecular orbitals (HOMO) and lowest-unoccupied molecular orbitals (LUMO) are calculated for the GW100 set at the G0W0@PBE level using the software packages TURBOMOLE, FHI-aims, and BerkeleyGW. The use of these three codes allows for a quantitative comparison of the type of basis set (plane wave or local orbital) and handling of unoccupied states, the treatment of core and valence electrons (all electron or pseudopotentials), the treatment of the frequency dependence of the self-energy (full frequency or more approximate plasmon-pole models), and the algorithm for solving the quasi-particle equation. Primary results include reference values for future benchmarks, best practices for convergence within a particular approach, and average error bars for the most common approximations.

  6. Measuring Distribution Performance? Benchmarking Warrants Your Attention

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ericson, Sean J; Alvarez, Paul

    Identifying, designing, and measuring performance metrics is critical to securing customer value, but can be a difficult task. This article examines the use of benchmarks based on publicly available performance data to set challenging, yet fair, metrics and targets.

  7. Toxicological benchmarks for potential contaminants of concern for effects on soil and litter invertebrates and heterotrophic process

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Will, M.E.; Suter, G.W. II

    1995-09-01

    An important step in ecological risk assessments is screening the chemicals occurring on a site for contaminants of potential concern. Screening may be accomplished by comparing reported ambient concentrations to a set of toxicological benchmarks. Multiple endpoints have been established for assessing risks posed by soil-borne contaminants to organisms directly impacted by them. This report presents benchmarks for soil invertebrates and microbial processes and addresses only chemicals found at United States Department of Energy (DOE) sites. No benchmarks for pesticides are presented. After discussing methods, this report presents the results of the literature review and benchmark derivation for toxicity to earthworms (Sect. 3), heterotrophic microbes and their processes (Sect. 4), and other invertebrates (Sect. 5). The final sections compare the benchmarks to other criteria and background and draw conclusions concerning the utility of the benchmarks.
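
    The screening step, comparing reported ambient concentrations to benchmark values, amounts to a threshold filter. A minimal sketch (the chemicals, concentrations, and benchmark values below are invented for illustration, not taken from the report):

```python
def screen(ambient, benchmarks):
    """Flag chemicals whose reported ambient soil concentration meets or
    exceeds its toxicological benchmark (contaminants of potential concern)."""
    return [chem for chem, conc in ambient.items()
            if chem in benchmarks and conc >= benchmarks[chem]]

# Invented benchmark values and site concentrations (mg/kg).
benchmarks = {"Cd": 20.0, "Pb": 500.0, "Zn": 200.0}
ambient = {"Cd": 35.0, "Pb": 120.0, "Zn": 450.0}
print(screen(ambient, benchmarks))  # ['Cd', 'Zn']
```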

  8. Using benchmarks for radiation testing of microprocessors and FPGAs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quinn, Heather; Robinson, William H.; Rech, Paolo

    Performance benchmarks have been used over the years to compare different systems. These benchmarks can be useful for researchers trying to determine how changes to the technology, architecture, or compiler affect the system's performance. No such standard exists for systems deployed into high radiation environments, making it difficult to assess whether changes in the fabrication process, circuitry, architecture, or software affect reliability or radiation sensitivity. In this paper, we propose a benchmark suite for high-reliability systems that is designed for field-programmable gate arrays and microprocessors. Finally, we describe the development process and report neutron test data for the hardware and software benchmarks.

  9. Using benchmarks for radiation testing of microprocessors and FPGAs

    DOE PAGES

    Quinn, Heather; Robinson, William H.; Rech, Paolo; ...

    2015-12-17

    Performance benchmarks have been used over the years to compare different systems. These benchmarks can be useful for researchers trying to determine how changes to the technology, architecture, or compiler affect the system's performance. No such standard exists for systems deployed into high radiation environments, making it difficult to assess whether changes in the fabrication process, circuitry, architecture, or software affect reliability or radiation sensitivity. In this paper, we propose a benchmark suite for high-reliability systems that is designed for field-programmable gate arrays and microprocessors. Finally, we describe the development process and report neutron test data for the hardware and software benchmarks.

  10. An analytical benchmark and a Mathematica program for MD codes: Testing LAMMPS on the 2nd generation Brenner potential

    NASA Astrophysics Data System (ADS)

    Favata, Antonino; Micheletti, Andrea; Ryu, Seunghwa; Pugno, Nicola M.

    2016-10-01

    An analytical benchmark and a simple consistent Mathematica program are proposed for graphene and carbon nanotubes, which may serve to test any molecular dynamics code implemented with REBO potentials. By exploiting the benchmark, we checked the results produced by LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) when adopting the second-generation Brenner potential. We show that this code, in its current implementation, produces results which are offset from those of the benchmark by a significant amount, and we provide evidence of the reason.

  11. A Standard-Setting Study to Establish College Success Criteria to Inform the SAT® College and Career Readiness Benchmark. Research Report 2012-3

    ERIC Educational Resources Information Center

    Kobrin, Jennifer L.; Patterson, Brian F.; Wiley, Andrew; Mattern, Krista D.

    2012-01-01

    In 2011, the College Board released its SAT college and career readiness benchmark, which represents the level of academic preparedness associated with a high likelihood of college success and completion. The goal of this study, which was conducted in 2008, was to establish college success criteria to inform the development of the benchmark. The…

  12. Multi-Level Bitmap Indexes for Flash Memory Storage

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Madduri, Kamesh; Canon, Shane

    2010-07-23

    Due to their low access latency, high read speed, and power-efficient operation, flash memory storage devices are rapidly emerging as an attractive alternative to traditional magnetic storage devices. However, tests show that the most efficient indexing methods are not able to take advantage of the flash memory storage devices. In this paper, we present a set of multi-level bitmap indexes that can effectively take advantage of flash storage devices. These indexing methods use coarsely binned indexes to answer queries approximately, and then use finely binned indexes to refine the answers. Our new methods read significantly lower volumes of data atmore » the expense of an increased disk access count, thus taking full advantage of the improved read speed and low access latency of flash devices. To demonstrate the advantage of these new indexes, we measure their performance on a number of storage systems using a standard data warehousing benchmark called the Set Query Benchmark. We observe that multi-level strategies on flash drives are up to 3 times faster than traditional indexing strategies on magnetic disk drives.« less
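    A minimal sketch of the coarse-then-fine idea described above: coarse bins answer a range query approximately, and only the boundary bins are refined with finely binned indexes. The bin widths and data structures (plain Python sets standing in for compressed bitmaps) are assumptions for illustration, not the paper's implementation:

```python
# Two-level binned "bitmap" index: coarse bins resolve the interior of a
# range query wholesale; fine bins refine only the boundary bins.
def build_index(values, coarse_width=10, fine_width=1):
    coarse, fine = {}, {}
    for row, v in enumerate(values):
        coarse.setdefault(v // coarse_width, set()).add(row)
        fine.setdefault(v // fine_width, set()).add(row)
    return coarse, fine

def range_query(index, lo, hi, coarse_width=10):
    """Return row ids with lo <= value < hi (integer values assumed)."""
    coarse, fine = index
    result = set()
    for b, rows in coarse.items():
        b_lo, b_hi = b * coarse_width, (b + 1) * coarse_width
        if lo <= b_lo and b_hi <= hi:      # bin fully inside: take it whole
            result |= rows
        elif b_hi > lo and b_lo < hi:      # boundary bin: refine fine bins
            for f in range(max(lo, b_lo), min(hi, b_hi)):
                result |= (fine.get(f, set()) & rows)
    return result

values = [3, 12, 17, 25, 25, 31, 44]
idx = build_index(values)
print(sorted(range_query(idx, 10, 30)))  # [1, 2, 3, 4]
```

    The flash-friendly property is that the interior of the range touches only coarse bitmaps (few, large reads), while the extra reads for boundary refinement are cheap when access latency is low.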

  13. Implementation and verification of global optimization benchmark problems

    NASA Astrophysics Data System (ADS)

    Posypkin, Mikhail; Usov, Alexander

    2017-12-01

    The paper considers the implementation and verification of a test suite containing 150 benchmarks for global deterministic box-constrained optimization. A C++ library for describing standard mathematical expressions was developed for this purpose. From a single description, the library automates generating the value of a function and its gradient at a given point, as well as interval estimates of the function and its gradient on a given box. Based on this functionality, we have developed a collection of tests for automatic verification of the proposed benchmarks. The verification has shown that literature sources contain mistakes in the benchmark descriptions. The library and the test suite are available for download and can be used freely.
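    The key idea in the record above is that one expression description yields both point evaluations and interval enclosures on a box, and the enclosures can then be used to verify claimed benchmark optima. A toy Python analogue of interval evaluation (purely illustrative; not the authors' C++ library, and the expression is made up):

```python
# Toy interval arithmetic: the same expression f is evaluated at points
# (floats) and on a box (Interval), loosely analogous to the dual
# point/interval evaluation described in the record. Illustrative only.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def _coerce(self, o):
        return o if isinstance(o, Interval) else Interval(o, o)

    def __add__(self, o):
        o = self._coerce(o)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        o = self._coerce(o)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):
        o = self._coerce(o)
        ps = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(ps), max(ps))

def f(x):
    # one description serves both point and interval evaluation
    return x * x - x * 6

box = f(Interval(1.0, 4.0))
# the enclosure property: every point value on [1, 4] lies inside the bounds
assert all(box.lo <= f(v) <= box.hi for v in [1.0, 2.0, 3.0, 4.0])
print(box.lo, box.hi)
```

    A verifier built on this idea can reject a benchmark whose stated global minimum falls below the interval lower bound on the feasible box.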

  14. Testing and Benchmarking a 2014 GM Silverado 6L80 Six Speed Automatic Transmission

    EPA Science Inventory

    This report describes the method and test results of EPA’s partial transmission benchmarking process, which involves installing both the engine and transmission in an engine dynamometer test cell, with the engine wire harness tethered to its vehicle parked outside the test cell.

  15. Benchmark Design and Installation: A synthesis of Existing Information.

    DTIC Science & Technology

    1987-07-01

    casings (15 ft deep) drilled to rock and filled with concrete. Disks: (1) set on vertically stable structures (e.g., dam monoliths); (2) set in rock... Structural movement survey: (1) rock outcrops (first choice), chiseled square on high point; (2) massive concrete structure (second choice), cut square on... bolt marker (type 2). [Table C1. Recommended benchmarks: type of condition or terrain, type of marker; bedrock, rock outcrops]

  16. Arithmetic Data Cube as a Data Intensive Benchmark

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Shabano, Leonid

    2003-01-01

    Data movement across computational grids and across the memory hierarchy of individual grid machines is known to be a limiting factor for applications involving large data sets. In this paper we introduce the Data Cube Operator on an Arithmetic Data Set, which we call the Arithmetic Data Cube (ADC). We propose to use the ADC to benchmark grid capabilities to handle large distributed data sets. The ADC stresses all levels of grid memory by producing the 2^d views of an Arithmetic Data Set of d-tuples described by a small number of parameters. We control the data intensity of the ADC by controlling the sizes of the views through the choice of the tuple parameters.
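    As background for the record above: a data cube over d attributes materializes one group-by view per attribute subset, hence 2^d views in total. A minimal sketch with COUNT as the aggregate (attribute names and data are made up; this is not the ADC generator itself):

```python
from itertools import combinations

# Materialize all 2^d group-by views of a tiny d-tuple data set, using
# COUNT as the aggregate. Data and attribute names are illustrative.
def data_cube(rows, attrs):
    views = {}
    for r in range(len(attrs) + 1):
        for subset in combinations(range(len(attrs)), r):
            view = {}
            for row in rows:
                key = tuple(row[i] for i in subset)
                view[key] = view.get(key, 0) + 1
            views[tuple(attrs[i] for i in subset)] = view
    return views

rows = [(1, 'a'), (1, 'b'), (2, 'a')]
cube = data_cube(rows, ['x', 'y'])
print(len(cube))      # 2^2 = 4 views
print(cube[('x',)])   # counts grouped by attribute x
```

    Scaling the number of attributes d, and the key distributions behind them, is exactly what lets a generator like the ADC dial data intensity up or down.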

  17. Policy brief on the current status of certification of electronic Health Records in the US and Europe.

    PubMed

    De Moor, Georges; O'Brien, John; Fridsma, Doug; Bean, Carol; Devlies, Jos; Cusack, Caitlin M; Bloomrosen, Meryl; Lorenzi, Nancy; Coorevits, Pascal

    2011-01-01

    If Electronic Health Record systems are to provide an effective contribution to healthcare, a set of benchmarks needs to be set to ensure quality control and interoperability of systems. This paper outlines the prevailing status of EHR certification in the US and the EU, compares and contrasts established schemes, and identifies opportunities for convergence of activity in the domain designed to advance certification endeavours generally. Several EU Member States have in the past proceeded with EHR systems quality labeling and/or certification, but these differ in scope, in the legal framework under which they operate, in policies (legislation and financial incentives), in organization, and perhaps most importantly in the quality criteria used for benchmarking. Harmonization, therefore, became a must. Now, through EuroRec (with approaches ranging from self-assessment to third party certification depending on the level of confidence needed) and its Seals, the possibility to achieve this for EHR systems has started in the whole of Europe. The US HITECH Act also attempts to create incentives for all hospitals and eligible providers to adopt and use electronic information. A centerpiece of the Act is to put in place strong financial incentives to adopt and meaningfully use EHRs. The HHS/EHR Certification Programme makes use of ISO/IEC 170XX standards for accreditation, testing and certification. The approved test method addresses the functional and the interoperability requirements defined in the Final Rule criteria and standards. To date, six Authorized Testing and Certification Bodies (ATCBs) are testing and certifying products in the US.

  18. Benchmark fragment-based 1H, 13C, 15N and 17O chemical shift predictions in molecular crystals†

    PubMed Central

    Hartman, Joshua D.; Kudla, Ryan A.; Day, Graeme M.; Mueller, Leonard J.; Beran, Gregory J. O.

    2016-01-01

    The performance of fragment-based ab initio 1H, 13C, 15N and 17O chemical shift predictions is assessed against experimental NMR chemical shift data in four benchmark sets of molecular crystals. Employing a variety of commonly used density functionals (PBE0, B3LYP, TPSSh, OPBE, PBE, TPSS), we explore the relative performance of cluster, two-body fragment, and combined cluster/fragment models. The hybrid density functionals (PBE0, B3LYP and TPSSh) generally out-perform their generalized gradient approximation (GGA)-based counterparts. 1H, 13C, 15N, and 17O isotropic chemical shifts can be predicted with root-mean-square errors of 0.3, 1.5, 4.2, and 9.8 ppm, respectively, using a computationally inexpensive electrostatically embedded two-body PBE0 fragment model. Oxygen chemical shieldings prove particularly sensitive to local many-body effects, and using a combined cluster/fragment model instead of the simple two-body fragment model decreases the root-mean-square errors to 7.6 ppm. These fragment-based model errors compare favorably with GIPAW PBE ones of 0.4, 2.2, 5.4, and 7.2 ppm for the same 1H, 13C, 15N, and 17O test sets. Using these benchmark calculations, a set of recommended linear regression parameters for mapping between calculated chemical shieldings and observed chemical shifts are provided and their robustness assessed using statistical cross-validation. We demonstrate the utility of these approaches and the reported scaling parameters on applications to 9-tert-butyl anthracene, several histidine co-crystals, benzoic acid and the C-nitrosoarene SnCl2(CH3)2(NODMA)2. PMID:27431490

  19. Benchmark fragment-based (1)H, (13)C, (15)N and (17)O chemical shift predictions in molecular crystals.

    PubMed

    Hartman, Joshua D; Kudla, Ryan A; Day, Graeme M; Mueller, Leonard J; Beran, Gregory J O

    2016-08-21

    The performance of fragment-based ab initio (1)H, (13)C, (15)N and (17)O chemical shift predictions is assessed against experimental NMR chemical shift data in four benchmark sets of molecular crystals. Employing a variety of commonly used density functionals (PBE0, B3LYP, TPSSh, OPBE, PBE, TPSS), we explore the relative performance of cluster, two-body fragment, and combined cluster/fragment models. The hybrid density functionals (PBE0, B3LYP and TPSSh) generally out-perform their generalized gradient approximation (GGA)-based counterparts. (1)H, (13)C, (15)N, and (17)O isotropic chemical shifts can be predicted with root-mean-square errors of 0.3, 1.5, 4.2, and 9.8 ppm, respectively, using a computationally inexpensive electrostatically embedded two-body PBE0 fragment model. Oxygen chemical shieldings prove particularly sensitive to local many-body effects, and using a combined cluster/fragment model instead of the simple two-body fragment model decreases the root-mean-square errors to 7.6 ppm. These fragment-based model errors compare favorably with GIPAW PBE ones of 0.4, 2.2, 5.4, and 7.2 ppm for the same (1)H, (13)C, (15)N, and (17)O test sets. Using these benchmark calculations, a set of recommended linear regression parameters for mapping between calculated chemical shieldings and observed chemical shifts are provided and their robustness assessed using statistical cross-validation. We demonstrate the utility of these approaches and the reported scaling parameters on applications to 9-tert-butyl anthracene, several histidine co-crystals, benzoic acid and the C-nitrosoarene SnCl2(CH3)2(NODMA)2.
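    The linear-regression mapping these two records describe takes the form δ ≈ a·σ + b, fitting computed shieldings σ to observed shifts δ. A minimal least-squares sketch (the numbers are synthetic, not the papers' recommended parameters; for shifts the slope is ideally close to -1, since shift falls as shielding rises):

```python
# Least-squares fit delta = a*sigma + b, mapping computed chemical
# shieldings (sigma) to observed chemical shifts (delta).
# The data points below are synthetic and exactly linear.
def fit_line(sigma, delta):
    n = len(sigma)
    sx, sy = sum(sigma), sum(delta)
    sxx = sum(x * x for x in sigma)
    sxy = sum(x * y for x, y in zip(sigma, delta))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

sigma = [25.0, 27.5, 28.0, 29.1]            # synthetic shieldings (ppm)
delta = [-s + 30.0 for s in sigma]          # exact data on delta = -sigma + 30
a, b = fit_line(sigma, delta)
print(round(a, 6), round(b, 6))  # -1.0 30.0
```

    In practice the fitted (a, b) for each nucleus, and the RMSE of the residuals, are what get reported and cross-validated.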

  20. How to Advance TPC Benchmarks with Dependability Aspects

    NASA Astrophysics Data System (ADS)

    Almeida, Raquel; Poess, Meikel; Nambiar, Raghunath; Patil, Indira; Vieira, Marco

    Transactional systems are the core of the information systems of most organizations. Although there is general acknowledgement that failures in these systems often entail significant impact both on the proceeds and reputation of companies, the benchmarks developed and managed by the Transaction Processing Performance Council (TPC) still maintain their focus on reporting bare performance. Each TPC benchmark has to pass a list of dependability-related tests (to verify ACID properties), but not all benchmarks require measuring their performance. While TPC-E measures the recovery time of some system failures, TPC-H and TPC-C only require functional correctness of such recovery. Consequently, systems used in TPC benchmarks are tuned mostly for performance. In this paper we argue that nowadays systems should be tuned for a more comprehensive suite of dependability tests, and that a dependability metric should be part of TPC benchmark publications. The paper discusses WHY and HOW this can be achieved. Two approaches are introduced and discussed: augmenting each TPC benchmark in a customized way, by extending each specification individually; and pursuing a more unified approach, defining a generic specification that could be adjoined to any TPC benchmark.

  1. Test Scheduling for Core-Based SOCs Using Genetic Algorithm Based Heuristic Approach

    NASA Astrophysics Data System (ADS)

    Giri, Chandan; Sarkar, Soumojit; Chattopadhyay, Santanu

    This paper presents a Genetic Algorithm (GA) based solution to co-optimize test scheduling and wrapper design for core-based SOCs. Core testing solutions are generated as a set of wrapper configurations, represented as rectangles with width equal to the number of TAM (Test Access Mechanism) channels and height equal to the corresponding testing time. A locally optimal best-fit heuristic based bin-packing algorithm has been used to determine the placement of rectangles minimizing the overall test time, whereas the GA has been utilized to generate the sequence of rectangles to be considered for placement. Experimental results on ITC'02 benchmark SOCs show that the proposed method provides better solutions compared to recent works reported in the literature.
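    A simplified sketch of the placement half of the scheme above: each test "rectangle" (width = TAM channels, height = test time) is placed, in the order a GA would propose, on the contiguous channel window that lets it start earliest. This is an illustrative reconstruction of a best-fit heuristic, not the paper's exact algorithm:

```python
# Best-fit placement of test rectangles on TAM channels, in a given order.
# A rectangle is (width_in_channels, test_time). Illustrative only.
def schedule(rects, total_channels):
    free_at = [0.0] * total_channels          # time each channel becomes free
    for width, height in rects:
        # choose the contiguous channel window that can start earliest
        best_start, best_pos = None, 0
        for pos in range(total_channels - width + 1):
            start = max(free_at[pos:pos + width])
            if best_start is None or start < best_start:
                best_start, best_pos = start, pos
        for c in range(best_pos, best_pos + width):
            free_at[c] = best_start + height
    return max(free_at)                       # overall test time (makespan)

rects = [(2, 5.0), (1, 3.0), (3, 2.0)]
print(schedule(rects, 3))  # 7.0
```

    The GA layer would then search over permutations of `rects`, using this makespan as the fitness to minimize.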

  2. Investigating dye performance and crosstalk in fluorescence enabled bioimaging using a model system

    PubMed Central

    Arppe, Riikka; Carro-Temboury, Miguel R.; Hempel, Casper; Vosch, Tom

    2017-01-01

    Detailed imaging of biological structures, often smaller than the diffraction limit, is possible in fluorescence microscopy due to the molecular size and photophysical properties of fluorescent probes. Advances in hardware and the availability of high-end bioimaging from multiple providers make comparing images between studies and between research groups very difficult. Therefore, we suggest a model system to benchmark instrumentation, methods and staining procedures. The system we introduce is based on doped zeolites in stained polyvinyl alcohol (PVA) films: a highly accessible model system which has the properties needed to act as a benchmark in bioimaging experiments. Rather than comparing molecular probes and imaging methods in complicated biological systems, we demonstrate that the model system can emulate this complexity and can be used to probe the effect of concentration, brightness, and cross-talk of fluorophores on the detected fluorescence signal. The described model system comprises lanthanide(III) ion doped Linde Type A zeolites dispersed in a PVA film stained with fluorophores. We tested: F18, MitoTracker Red and ATTO647N. This model system allowed comparing the performance of the fluorophores under experimental conditions. Importantly, we here report considerable cross-talk of the dyes when exchanging excitation and emission settings. Additionally, bleaching was quantified. The proposed model makes it possible to test and benchmark staining procedures before these dyes are applied to more complex biological systems. PMID:29176775

  3. International benchmarking of specialty hospitals. A series of case studies on comprehensive cancer centres.

    PubMed

    van Lent, Wineke A M; de Beer, Relinde D; van Harten, Wim H

    2010-08-31

    Benchmarking is one of the methods used in business that is applied to hospitals to improve the management of their operations. International comparison between hospitals can explain performance differences. As there is a trend towards specialization of hospitals, this study examines the benchmarking process and the success factors of benchmarking in international specialized cancer centres. Three independent international benchmarking studies on operations management in cancer centres were conducted. The first study included three comprehensive cancer centres (CCC), three chemotherapy day units (CDU) were involved in the second study, and four radiotherapy departments were included in the final study. For each multiple-case study, a research protocol was used to structure the benchmarking process. After reviewing the multiple case studies, the resulting description was used to study the research objectives. We adapted and evaluated existing benchmarking processes through formalizing stakeholder involvement and verifying the comparability of the partners. We also devised a framework to structure the indicators to produce a coherent indicator set and better improvement suggestions. Evaluating the feasibility of benchmarking as a tool to improve hospital processes led to mixed results. Case study 1 resulted in general recommendations for the organizations involved. In case study 2, the combination of benchmarking and lean management led in one CDU to a 24% increase in bed utilization and a 12% increase in productivity. Three radiotherapy departments in case study 3 were considering implementing the recommendations. Additionally, success factors were found, such as a well-defined and small project scope, partner selection based on clear criteria, stakeholder involvement, simple and well-structured indicators, analysis of both the process and its results, and adaptation of the identified better working methods to one's own setting.
The improved benchmarking process and the success factors can produce relevant input to improve the operations management of specialty hospitals.

  4. International benchmarking of specialty hospitals. A series of case studies on comprehensive cancer centres

    PubMed Central

    2010-01-01

    Background Benchmarking is one of the methods used in business that is applied to hospitals to improve the management of their operations. International comparison between hospitals can explain performance differences. As there is a trend towards specialization of hospitals, this study examines the benchmarking process and the success factors of benchmarking in international specialized cancer centres. Methods Three independent international benchmarking studies on operations management in cancer centres were conducted. The first study included three comprehensive cancer centres (CCC), three chemotherapy day units (CDU) were involved in the second study, and four radiotherapy departments were included in the final study. For each multiple-case study, a research protocol was used to structure the benchmarking process. After reviewing the multiple case studies, the resulting description was used to study the research objectives. Results We adapted and evaluated existing benchmarking processes through formalizing stakeholder involvement and verifying the comparability of the partners. We also devised a framework to structure the indicators to produce a coherent indicator set and better improvement suggestions. Evaluating the feasibility of benchmarking as a tool to improve hospital processes led to mixed results. Case study 1 resulted in general recommendations for the organizations involved. In case study 2, the combination of benchmarking and lean management led in one CDU to a 24% increase in bed utilization and a 12% increase in productivity. Three radiotherapy departments in case study 3 were considering implementing the recommendations. Additionally, success factors were found, such as a well-defined and small project scope, partner selection based on clear criteria, stakeholder involvement, simple and well-structured indicators, analysis of both the process and its results, and adaptation of the identified better working methods to one's own setting.
Conclusions The improved benchmarking process and the success factors can produce relevant input to improve the operations management of specialty hospitals. PMID:20807408

  5. Engine dynamic analysis with general nonlinear finite element codes. II - Bearing element implementation, overall numerical characteristics and benchmarking

    NASA Technical Reports Server (NTRS)

    Padovan, J.; Adams, M.; Lam, P.; Fertis, D.; Zeid, I.

    1982-01-01

    Second-year efforts within a three-year study to develop and extend finite element (FE) methodology to efficiently handle the transient/steady state response of rotor-bearing-stator structure associated with gas turbine engines are outlined. The two main areas aim at (1) implanting the squeeze-film damper element into a general-purpose FE code for testing and evaluation; and (2) determining the numerical characteristics of the FE-generated rotor-bearing-stator simulation scheme. The governing FE field equations are set out and the solution methodology is presented. The choice of ADINA as the general-purpose FE code is explained, and the numerical operational characteristics of the direct-integration approach to FE-generated rotor-bearing-stator simulations are determined, including benchmarking, comparison of explicit vs. implicit methodologies of direct integration, and demonstration problems.

  6. Performance of Landslide-HySEA tsunami model for NTHMP benchmarking validation process

    NASA Astrophysics Data System (ADS)

    Macias, Jorge

    2017-04-01

    In its FY2009 Strategic Plan, the NTHMP required that all numerical tsunami inundation models be verified as accurate and consistent through a model benchmarking process. This was completed in 2011, but only for seismic tsunami sources and in a limited manner for idealized solid underwater landslides. Recent work by various NTHMP states, however, has shown that landslide tsunami hazard may be dominant along significant parts of the US coastline, as compared to hazards from other tsunamigenic sources. To perform the above-mentioned validation process, a set of candidate benchmarks was proposed. These benchmarks are based on a subset of available laboratory data sets for solid slide and deformable slide experiments, and include both submarine and subaerial slides. A benchmark based on a historic field event (Valdez, AK, 1964) closes the list of proposed benchmarks. The Landslide-HySEA model participated in the workshop organized at Texas A&M University - Galveston, on January 9-11, 2017. The aim of this presentation is to show some of the numerical results obtained with Landslide-HySEA in the framework of this benchmarking validation/verification effort. Acknowledgements. This research has been partially supported by the Junta de Andalucía research project TESELA (P11-RNM7069), the Spanish Government research project SIMURISK (MTM2015-70490-C02-01-R) and Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech. The GPU computations were performed at the Unit of Numerical Methods (University of Malaga).

  7. Toxicological Benchmarks for Screening Potential Contaminants of Concern for Effects on Soil and Litter Invertebrates and Heterotrophic Process

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Will, M.E.

    1994-01-01

    This report presents a standard method for deriving benchmarks for the purpose of "contaminant screening," performed by comparing measured ambient concentrations of chemicals. The work was performed under Work Breakdown Structure 1.4.12.2.3.04.07.02 (Activity Data Sheet 8304). In addition, this report presents sets of data concerning the effects of chemicals in soil on invertebrates and soil microbial processes, benchmarks for chemicals potentially associated with United States Department of Energy sites, and literature describing the experiments from which data were drawn for benchmark derivation.

  8. How to benchmark methods for structure-based virtual screening of large compound libraries.

    PubMed

    Christofferson, Andrew J; Huang, Niu

    2012-01-01

    Structure-based virtual screening is a useful computational technique for ligand discovery. To systematically evaluate different docking approaches, it is important to have a consistent benchmarking protocol that is both relevant and unbiased. Here, we describe the designing of a benchmarking data set for docking screen assessment, a standard docking screening process, and the analysis and presentation of the enrichment of annotated ligands among a background decoy database.
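    Enrichment of annotated ligands among decoys, as described above, is commonly summarized with an enrichment factor at a fixed fraction of the ranked database. A minimal sketch of that standard metric (the scores and active/decoy labels are made up for illustration; lower docking score is taken as better):

```python
# Enrichment factor at fraction x: the proportion of actives recovered in
# the top x of the ranked list, divided by x. Data below are illustrative.
def enrichment_factor(scores, is_active, fraction):
    ranked = sorted(zip(scores, is_active))      # lower score = better dock
    n_top = max(1, int(len(ranked) * fraction))
    found = sum(active for _, active in ranked[:n_top])
    total = sum(is_active)
    return (found / total) / fraction

scores    = [-9.1, -8.7, -7.9, -7.5, -6.2, -5.8, -5.1, -4.9, -4.0, -3.5]
is_active = [1,    1,    0,    1,    0,    0,    0,    0,    0,    0]
print(enrichment_factor(scores, is_active, 0.2))
```

    An EF of 1 corresponds to random ranking; values well above 1 at small fractions indicate useful early enrichment.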

  9. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
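    The supervised cross-validation described above holds out whole known subtypes, so the classifier is evaluated on subtypes it never saw during training (analogous in spirit to scikit-learn's GroupKFold). A minimal sketch of such a split, with made-up sample and subtype names:

```python
# Subtype-wise (supervised) split: entire subtypes go to the test fold,
# giving a harder, more realistic generalization estimate than random
# cross-validation. Sample and subtype labels are illustrative.
def subtype_split(samples, subtypes, test_subtypes):
    train = [s for s, g in zip(samples, subtypes) if g not in test_subtypes]
    test  = [s for s, g in zip(samples, subtypes) if g in test_subtypes]
    return train, test

samples  = ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
subtypes = ['A',  'A',  'B',  'B',  'C',  'C']   # known subfamilies
train, test = subtype_split(samples, subtypes, {'C'})
print(train, test)  # ['p1', 'p2', 'p3', 'p4'] ['p5', 'p6']
```

    Rotating the held-out subtype set over the concept hierarchy yields the family of benchmark tasks the record describes.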

  10. Assessment of composite motif discovery methods.

    PubMed

    Klepper, Kjetil; Sandve, Geir K; Abul, Osman; Johansen, Jostein; Drablos, Finn

    2008-02-26

    Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery - discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. 
The variation in performance on individual datasets also shows that the new benchmark datasets represent a suitable variety of challenges for most methods for module discovery.

  11. A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge

    PubMed Central

    Gururaj, Anupama E.; Chen, Xiaoling; Pournejati, Saeid; Alter, George; Hersh, William R.; Demner-Fushman, Dina; Ohno-Machado, Lucila

    2017-01-01

    Abstract The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval. Database URL: https://biocaddie.org/benchmark-data PMID:29220453

  12. PMLB: a large benchmark suite for machine learning evaluation and comparison.

    PubMed

    Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H

    2017-01-01

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task, depending on the target problem and the goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.

  13. Benchmarking of relative permeability

    NASA Astrophysics Data System (ADS)

    DiCarlo, D. A.

    2017-12-01

    Relative permeability is the key constitutive relation for multiphase flow through porous media. There are hundreds of published relative permeability curves for various media, some classic (Oak 90 and 91), some contradictory. This can lead to a confusing situation if one is trying to benchmark simulation results to "experimental data". Coming from the experimental side, I have found that modelers place too much trust in relative permeability data sets. In this talk, I will discuss reasons for discrepancies within and between data sets, and give guidance on which portions of the data sets are most reliable for matching with models.

  14. Implementation of Chaotic Gaussian Particle Swarm Optimization for Optimize Learning-to-Rank Software Defect Prediction Model Construction

    NASA Astrophysics Data System (ADS)

    Buchari, M. A.; Mardiyanto, S.; Hendradjaya, B.

    2018-03-01

    Finding software defects as early as possible is the purpose of research on software defect prediction. Software defect prediction is required not only to state the existence of defects, but also to give a prioritized list of which modules require more intensive testing, so that test resources can be allocated efficiently. Learning to rank is one of the approaches that can provide defect module ranking data for the purposes of software testing. In this study, we propose a meta-heuristic chaotic Gaussian particle swarm optimization to improve the accuracy of the learning-to-rank software defect prediction approach. We have used 11 public benchmark data sets as experimental data. Our overall results demonstrate that the prediction models constructed using chaotic Gaussian particle swarm optimization achieve better accuracy on 5 data sets, tie on 5 data sets, and perform worse on 1 data set. Thus, we conclude that the application of chaotic Gaussian particle swarm optimization in the learning-to-rank approach can improve the accuracy of defect module ranking on data sets that have high-dimensional features.
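
    The core idea, replacing PSO's uniform random coefficients with draws from a chaotic map and adding Gaussian perturbation, can be sketched as follows. This is a minimal illustration on a toy objective, not the authors' exact formulation; the map choice (logistic), coefficients, and jitter scale are assumptions:

    ```python
    import random

    def chaotic_pso(f, dim=2, n_particles=10, iters=200, seed=1):
        """Minimize f with a PSO variant whose stochastic coefficients come
        from a logistic chaotic map (x <- 4x(1-x)) and whose velocity update
        carries a small Gaussian perturbation. Hypothetical sketch."""
        rng = random.Random(seed)
        chaos = rng.uniform(0.1, 0.9)  # logistic-map state in (0, 1)

        def cnext():
            nonlocal chaos
            chaos = 4.0 * chaos * (1.0 - chaos)
            return chaos

        pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
        vel = [[0.0] * dim for _ in range(n_particles)]
        pbest = [p[:] for p in pos]
        pbest_val = [f(p) for p in pos]
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        gbest, gbest_val = pbest[g][:], pbest_val[g]
        w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (assumed)
        for _ in range(iters):
            for i in range(n_particles):
                for d in range(dim):
                    r1, r2 = cnext(), cnext()  # chaotic instead of uniform draws
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d])
                                 + 0.01 * rng.gauss(0, 1))  # Gaussian jitter
                    pos[i][d] += vel[i][d]
                val = f(pos[i])
                if val < pbest_val[i]:
                    pbest[i], pbest_val[i] = pos[i][:], val
                    if val < gbest_val:
                        gbest, gbest_val = pos[i][:], val
        return gbest, gbest_val

    sphere = lambda x: sum(v * v for v in x)
    best, best_val = chaotic_pso(sphere)
    ```

    In the paper's setting, the objective would be a learning-to-rank loss over defect data rather than this toy sphere function.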

  15. Comparing the performance of two CBIRS indexing schemes

    NASA Astrophysics Data System (ADS)

    Mueller, Wolfgang; Robbert, Guenter; Henrich, Andreas

    2003-01-01

    Content based image retrieval (CBIR) as it is known today has to deal with a number of challenges. Quickly summarized, the main challenges are, first, to bridge the semantic gap between high-level concepts and low-level features using feedback and, second, to provide performance under adverse conditions. High-dimensional spaces, as well as a demanding machine learning task, make the right way of indexing an important issue. When indexing multimedia data, most groups opt for extraction of high-dimensional feature vectors from the data, followed by dimensionality reduction like PCA (Principal Components Analysis) or LSI (Latent Semantic Indexing). The resulting vectors are indexed using spatial indexing structures such as kd-trees or R-trees, for example. Other projects, such as MARS and Viper, propose the adaptation of text indexing techniques, notably the inverted file. Here, the Viper system is the most direct adaptation of text retrieval techniques to quantized vectors. However, while the Viper query engine provides decent performance together with impressive user-feedback behavior, as well as the possibility for easy integration of long-term learning algorithms and support for potentially infinite feature vectors, there has been no comparison of vector-based methods and inverted-file-based methods under similar conditions. In this publication, we compare a CBIR query engine that uses inverted files (Bothrops, a rewrite of the Viper query engine based on a relational database) and a CBIR query engine based on LSD (Local Split Decision) trees for spatial indexing, using the same feature sets. The Benchathlon initiative works on providing a set of images and ground truth for simulating image queries by example and corresponding user feedback. 
When performing the Benchathlon benchmark on a CBIR system (the System Under Test, SUT), a benchmarking harness connects over the Internet to the SUT, performing a number of queries using an agreed-upon protocol, the Multimedia Retrieval Markup Language (MRML). Using this benchmark, one can measure the quality of retrieval as well as the overall (speed) performance of the benchmarked system. Our benchmarks will draw on the Benchathlon's work for documenting the retrieval performance of both inverted-file-based and LSD-tree-based techniques. In addition to these results, however, we will present statistics that can be obtained only inside the system under test. These statistics include the number of complex mathematical operations, as well as the amount of data that has to be read from disk during the execution of a query.

  16. Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

    PubMed

    Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

    2013-03-15

    The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. 
Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.

  17. 7 CFR 245.12 - State agencies and direct certification requirements.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE CHILD NUTRITION PROGRAMS DETERMINING ELIGIBILITY FOR FREE AND... performance benchmarks set forth in paragraph (b) of this section for directly certifying children who are.... State agencies must meet performance benchmarks for directly certifying for free school meals children...

  18. Intercode comparison of gyrokinetic global electromagnetic modes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Görler, T., E-mail: tobias.goerler@ipp.mpg.de; Tronko, N.; Hornsby, W. A.

    Aiming to fill a corresponding lack of sophisticated test cases for global electromagnetic gyrokinetic codes, a new hierarchical benchmark is proposed. Starting from established test sets with adiabatic electrons, fully gyrokinetic electrons and electrostatic fluctuations are taken into account before finally studying global electromagnetic micro-instabilities. Results from up to five codes representing different numerical approaches, including particle-in-cell, Eulerian, and semi-Lagrangian methods, are shown. By means of spectrally resolved growth rates and frequencies and mode structure comparisons, agreement can be confirmed on ion-gyro-radius scales, thus providing confidence in the correct implementation of the underlying equations.

  19. MUSCLE: multiple sequence alignment with high accuracy and high throughput.

    PubMed

    Edgar, Robert C

    2004-01-01

    We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
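
    The "fast distance estimation using kmer counting" idea can be illustrated with a few lines: count shared k-mers between two sequences and turn the overlap into a distance. This is a rough sketch of the principle, not MUSCLE's exact formula or normalization:

    ```python
    from collections import Counter

    def kmer_distance(a, b, k=3):
        """Fractional k-mer distance between two sequences: 1 minus the
        fraction of shared k-mers, relative to the shorter sequence.
        Illustrative only; MUSCLE uses its own k-mer distance definition."""
        ka, kb = (Counter(s[i:i + k] for i in range(len(s) - k + 1))
                  for s in (a, b))
        shared = sum(min(ka[w], kb[w]) for w in ka)  # multiset intersection
        return 1.0 - shared / min(sum(ka.values()), sum(kb.values()))
    ```

    Because no alignment is computed, such distances are cheap enough to estimate for all sequence pairs before building the guide tree for progressive alignment.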

  20. Benchmarking Ada tasking on tightly coupled multiprocessor architectures

    NASA Technical Reports Server (NTRS)

    Collard, Philippe; Goforth, Andre; Marquardt, Matthew

    1989-01-01

    The development of benchmarks and performance measures for parallel Ada tasking is reported with emphasis on the macroscopic behavior of the benchmark across a set of load parameters. The application chosen for the study was the NASREM model for telerobot control, relevant to many NASA missions. The results of the study demonstrate the potential of parallel Ada in accomplishing the task of developing a control system for a system such as the Flight Telerobotic Servicer using the NASREM framework.

  1. A formative evaluation of CU-SeeMe

    NASA Astrophysics Data System (ADS)

    Bibeau, Michael

    1995-02-01

    CU-SeeMe is a video conferencing software package that was designed and programmed at Cornell University. The program works with the TCP/IP network protocol and allows two or more parties to conduct a real-time video conference with full audio support. In this paper we evaluate CU-SeeMe through the process of formative evaluation. We first perform a critical review of the software using a subset of the Smith and Mosier guidelines for human-computer interaction. Next, we empirically review the software interface through a series of benchmark tests that are derived directly from a set of scenarios. The scenarios attempt to model real-world situations that might be encountered by an individual in the target user class. Designing benchmark tasks becomes a natural and straightforward process when they are derived from the scenario set. Empirical measures are taken for each task, including completion times and error counts. These measures are accompanied by critical incident analysis [2, 7, 13], which serves to identify problems with the interface and the cognitive roots of those problems. The critical incidents reported by participants are accompanied by explanations of what caused each problem and why, which helps in the process of formulating solutions for observed usability problems. All the testing results are combined in the Appendix in an illustrated partial redesign of the CU-SeeMe interface.

  2. Error Rates in Users of Automatic Face Recognition Software

    PubMed Central

    White, David; Dunn, James D.; Schmid, Alexandra C.; Kemp, Richard I.

    2015-01-01

    In recent years, wide deployment of automatic face recognition systems has been accompanied by substantial gains in algorithm performance. However, benchmarking tests designed to evaluate these systems do not account for the errors of human operators, who are often an integral part of face recognition solutions in forensic and security settings. This causes a mismatch between evaluation tests and operational accuracy. We address this by measuring user performance in a face recognition system used to screen passport applications for identity fraud. Experiment 1 measured target detection accuracy in algorithm-generated "candidate lists" selected from a large database of passport images. Accuracy was notably poorer than in previous studies of unfamiliar face matching: participants made over 50% errors for adult target faces, and over 60% when matching images of children. Experiment 2 then compared performance of student participants to trained passport officers, who use the system in their daily work, and found equivalent performance in these groups. Encouragingly, a group of highly trained and experienced "facial examiners" outperformed these groups by 20 percentage points. We conclude that human performance curtails the accuracy of face recognition systems, potentially reducing benchmark estimates by 50% in operational settings. Mere practice does not attenuate these limits, but the superior performance of trained examiners suggests that recruitment and selection of human operators, in combination with effective training and mentorship, can improve the operational accuracy of face recognition systems. PMID:26465631

  3. Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Turney, Raymond D.

    2001-01-01

    This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN 3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.

  4. Can data-driven benchmarks be used to set the goals of healthy people 2010?

    PubMed Central

    Allison, J; Kiefe, C I; Weissman, N W

    1999-01-01

    OBJECTIVES: Expert panels determined the public health goals of Healthy People 2000 subjectively. The present study examined whether data-driven benchmarks provide a better alternative. METHODS: We developed the "pared-mean" method to define from data the best achievable health care practices. We calculated the pared-mean benchmark for screening mammography from the 1994 National Health Interview Survey, using the metropolitan statistical area as the "provider" unit. Beginning with the best-performing provider and adding providers in descending sequence, we established the minimum provider subset that included at least 10% of all women surveyed on this question. The pared-mean benchmark is then the proportion of women in this subset who received mammography. RESULTS: The pared-mean benchmark for screening mammography was 71%, compared with the Healthy People 2000 goal of 60%. CONCLUSIONS: For Healthy People 2010, benchmarks derived from data reflecting the best available care provide viable alternatives to consensus-derived targets. We are currently pursuing additional refinements to the data-driven pared-mean benchmark approach. PMID:9987466
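
    The pared-mean procedure described above is concrete enough to sketch: rank provider units by performance, take the best performers until they cover at least 10% of all respondents, and report the pooled rate of that subset. The data below are hypothetical, not from the National Health Interview Survey:

    ```python
    def pared_mean_benchmark(providers, frac=0.10):
        """providers: list of (n_screened, n_surveyed) per provider unit.
        Rank providers by screening rate, accumulate the best performers until
        they include at least `frac` of all respondents, and return the pooled
        screening rate of that subset (the pared-mean benchmark)."""
        total = sum(n for _, n in providers)
        ranked = sorted(providers, key=lambda p: p[0] / p[1], reverse=True)
        screened = covered = 0
        for s, n in ranked:
            screened += s
            covered += n
            if covered >= frac * total:
                break
        return screened / covered

    # Hypothetical (women screened, women surveyed) per metro-area "provider"
    data = [(90, 100), (40, 100), (70, 100), (55, 100), (80, 100)]
    print(round(pared_mean_benchmark(data), 2))  # → 0.9
    ```

    With these numbers, the single best metro area already covers 10% of respondents, so the benchmark equals its rate; with finer-grained units, several top performers would typically be pooled.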

  5. Surflex-Dock: Docking benchmarks and real-world application

    NASA Astrophysics Data System (ADS)

    Spitzer, Russell; Jain, Ajay N.

    2012-06-01

    Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different than previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Performance on virtual screening performance was shown to benefit by employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.

  6. Toxicological benchmarks for screening potential contaminants of concern for effects on terrestrial plants: 1994 revision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Will, M.E.; Suter, G.W. II

    1994-09-01

    One of the initial stages in ecological risk assessment for hazardous waste sites is screening contaminants to determine which of them are worthy of further consideration as contaminants of potential concern. This process is termed contaminant screening. It is performed by comparing measured ambient concentrations of chemicals to benchmark concentrations. Currently, no standard benchmark concentrations exist for assessing contaminants in soil with respect to their toxicity to plants. This report presents a standard method for deriving benchmarks for this purpose (phytotoxicity benchmarks), a set of data concerning effects of chemicals in soil or soil solution on plants, and a set of phytotoxicity benchmarks for 38 chemicals potentially associated with United States Department of Energy (DOE) sites. In addition, background information on the phytotoxicity and occurrence of the chemicals in soils is presented, and literature describing the experiments from which data were drawn for benchmark derivation is reviewed. Chemicals that are found in soil at concentrations exceeding both the phytotoxicity benchmark and the background concentration for the soil type should be considered contaminants of potential concern.

  7. Toxicological Benchmarks for Screening Potential Contaminants of Concern for Effects on Terrestrial Plants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suter, G.W. II

    1993-01-01

    One of the initial stages in ecological risk assessment for hazardous waste sites is screening contaminants to determine which of them are worthy of further consideration as contaminants of potential concern. This process is termed contaminant screening. It is performed by comparing measured ambient concentrations of chemicals to benchmark concentrations. Currently, no standard benchmark concentrations exist for assessing contaminants in soil with respect to their toxicity to plants. This report presents a standard method for deriving benchmarks for this purpose (phytotoxicity benchmarks), a set of data concerning effects of chemicals in soil or soil solution on plants, and a set of phytotoxicity benchmarks for 38 chemicals potentially associated with United States Department of Energy (DOE) sites. In addition, background information on the phytotoxicity and occurrence of the chemicals in soils is presented, and literature describing the experiments from which data were drawn for benchmark derivation is reviewed. Chemicals that are found in soil at concentrations exceeding both the phytotoxicity benchmark and the background concentration for the soil type should be considered contaminants of potential concern.

  8. Groundwater-quality data in the North San Francisco Bay Shallow Aquifer study unit, 2012: results from the California GAMA Program

    USGS Publications Warehouse

    Bennett, George L.; Fram, Miranda S.

    2014-01-01

    Results for constituents with non-regulatory benchmarks set for aesthetic concerns from the grid wells showed that iron concentrations greater than the CDPH secondary maximum contaminant level (SMCL-CA) of 300 μg/L were detected in 13 grid wells. Chloride was detected at a concentration greater than the SMCL-CA recommended benchmark of 250 mg/L in two grid wells. Sulfate concentrations greater than the SMCL-CA recommended benchmark of 250 mg/L were measured in two grid wells, and the concentration in one of these wells was also greater than the SMCL-CA upper benchmark of 500 mg/L. TDS concentrations greater than the SMCL-CA recommended benchmark of 500 mg/L were measured in 15 grid wells, and concentrations in 4 of these wells were also greater than the SMCL-CA upper benchmark of 1,000 mg/L.
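
    The comparisons in this record follow a simple pattern: each measured concentration is checked against a recommended and an upper benchmark. A minimal sketch of that screening logic, using the SMCL-CA thresholds quoted above (function and variable names are hypothetical; iron is omitted because its benchmark is in µg/L rather than mg/L):

    ```python
    # SMCL-CA thresholds quoted in the record: (recommended, upper), in mg/L
    BENCHMARKS = {
        "chloride": (250, 500),
        "sulfate": (250, 500),
        "tds": (500, 1000),
    }

    def classify(constituent, value_mg_per_l):
        """Classify a measured concentration against the recommended and
        upper secondary maximum contaminant levels for that constituent."""
        rec, upper = BENCHMARKS[constituent]
        if value_mg_per_l > upper:
            return "above upper"
        if value_mg_per_l > rec:
            return "above recommended"
        return "below benchmark"
    ```

    Applying `classify` across all grid wells and tallying the results reproduces the kind of exceedance counts reported in the record.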

  9. Use of the 22C3 anti-PD-L1 antibody to determine PD-L1 expression in multiple automated immunohistochemistry platforms.

    PubMed

    Ilie, Marius; Khambata-Ford, Shirin; Copie-Bergman, Christiane; Huang, Lingkang; Juco, Jonathan; Hofman, Veronique; Hofman, Paul

    2017-01-01

    For non-small cell lung cancer (NSCLC), treatment with pembrolizumab is limited to patients with tumours expressing PD-L1 assessed by immunohistochemistry (IHC) using the PD-L1 IHC 22C3 pharmDx (Dako, Inc.) companion diagnostic test, on the Dako Autostainer Link 48 (ASL48) platform. Optimised protocols are urgently needed for use of the 22C3 antibody concentrate to test PD-L1 expression on more widely available IHC autostainers. We evaluated PD-L1 expression using the 22C3 antibody concentrate in the three main commercially available autostainers Dako ASL48, BenchMark ULTRA (Ventana Medical Systems, Inc.), and Bond-III (Leica Biosystems) and compared the staining results with the PD-L1 IHC 22C3 pharmDx kit on the Dako ASL48 platform. Several technical conditions for laboratory-developed tests (LDTs) were evaluated in tonsil specimens and a training set of three NSCLC samples. Optimised protocols were then validated in 120 NSCLC specimens. Optimised protocols were obtained on both the VENTANA BenchMark ULTRA and Dako ASL48 platforms. Significant expression of PD-L1 was obtained on tissue controls with the Leica Bond-III autostainer when high concentrations of the 22C3 antibody were used. It therefore was not tested on the 120 NSCLC specimens. An almost 100% concordance rate for dichotomized tumour proportion score (TPS) results was observed between TPS ratings using the 22C3 antibody concentrate on the Dako ASL48 and VENTANA BenchMark ULTRA platforms relative to the PD-L1 IHC 22C3 pharmDx kit on the Dako ASL48 platform. Interpathologist agreement was high on both LDTs and the PD-L1 IHC 22C3 pharmDx kit on the Dako ASL48 platform. 
Availability of standardized protocols for determining PD-L1 expression using the 22C3 antibody concentrate on the widely available Dako ASL48 and VENTANA BenchMark ULTRA IHC platforms will expand the number of laboratories able to determine eligibility of patients with NSCLC for treatment with pembrolizumab in a reliable and concordant manner.

  10. Model Uncertainty and Bayesian Model Averaged Benchmark Dose Estimation for Continuous Data

    EPA Science Inventory

    The benchmark dose (BMD) approach has gained acceptance as a valuable risk assessment tool, but risk assessors still face significant challenges associated with selecting an appropriate BMD/BMDL estimate from the results of a set of acceptable dose-response models. Current approa...

  11. Translational benchmark risk analysis

    PubMed Central

    Piegorsch, Walter W.

    2010-01-01

    Translational development – in the sense of translating a mature methodology from one area of application to another, evolving area – is discussed for the use of benchmark doses in quantitative risk assessment. Illustrations are presented with traditional applications of the benchmark paradigm in biology and toxicology, and also with risk endpoints that differ from traditional toxicological archetypes. It is seen that the benchmark approach can apply to a diverse spectrum of risk management settings. This suggests a promising future for this important risk-analytic tool. Extensions of the method to a wider variety of applications represent a significant opportunity for enhancing environmental, biomedical, industrial, and socio-economic risk assessments. PMID:20953283

  12. Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods

    PubMed Central

    Mu, John C.; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B.; Wong, Wing H.; Lam, Hugo Y. K.

    2015-01-01

    A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools. PMID:26412485

  13. Toward Scalable Benchmarks for Mass Storage Systems

    NASA Technical Reports Server (NTRS)

    Miller, Ethan L.

    1996-01-01

    This paper presents guidelines for the design of a mass storage system benchmark suite, along with preliminary suggestions for programs to be included. The benchmarks will measure both peak and sustained performance of the system as well as predicting both short- and long-term behavior. These benchmarks should be both portable and scalable so they may be used on storage systems from tens of gigabytes to petabytes or more. By developing a standard set of benchmarks that reflect real user workload, we hope to encourage system designers and users to publish performance figures that can be compared with those of other systems. This will allow users to choose the system that best meets their needs and give designers a tool with which they can measure the performance effects of improvements to their systems.

  14. District Heating Systems Performance Analyses. Heat Energy Tariff

    NASA Astrophysics Data System (ADS)

    Ziemele, Jelena; Vigants, Girts; Vitolins, Valdis; Blumberga, Dagnija; Veidenbergs, Ivars

    2014-12-01

    The paper addresses an important element of the European energy sector: the evaluation of district heating (DH) system operations from the standpoint of increasing energy efficiency and increasing the use of renewable energy resources. This has been done by developing a new methodology for the evaluation of the heat tariff. The paper presents an algorithm of this methodology, which includes not only a database and systems of calculation equations, but also an integrated multi-criteria analysis module using MADM/MCDM (Multi-Attribute Decision Making / Multi-Criteria Decision Making) based on TOPSIS (Technique for Order Preference by Similarity to Ideal Solution). The results of the multi-criteria analysis are used to set the tariff benchmarks. The evaluation methodology has been tested on Latvian heat tariffs, and the obtained results show that only half of the heating companies reach a benchmark value of 0.5 for the closeness-to-the-ideal-solution efficiency indicator. This means that the proposed evaluation methodology would not only allow companies to determine how they perform with regard to the proposed benchmark, but also to identify their need to restructure so that they may reach the level of a low-carbon business.
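
    The closeness-to-the-ideal-solution indicator mentioned above is the standard TOPSIS score. A minimal, generic implementation is sketched below (this illustrates the textbook TOPSIS procedure, not the paper's specific criteria, weights, or data):

    ```python
    import math

    def topsis(matrix, weights, benefit):
        """TOPSIS closeness-to-ideal scores in [0, 1].
        matrix: rows = alternatives (e.g. heating companies), cols = criteria;
        weights: criterion weights; benefit[j]: True if larger is better."""
        n = len(matrix[0])
        # Vector-normalize each criterion column, then apply weights
        norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
        v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
        cols = list(zip(*v))
        ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
        anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
        scores = []
        for row in v:
            d_pos = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, ideal)))
            d_neg = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, anti)))
            scores.append(d_neg / (d_pos + d_neg))
        return scores

    # Hypothetical two companies, two benefit criteria; the second dominates
    scores = topsis([[1.0, 1.0], [2.0, 2.0]], [0.5, 0.5], [True, True])
    ```

    In the paper's setting, a company's score being below the 0.5 benchmark flags it as far from the ideal-solution profile.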

  15. Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.

    PubMed

    Liu, Zhihai; Su, Minyi; Han, Li; Liu, Jie; Yang, Qifan; Li, Yan; Wang, Renxiao

    2017-02-21

    In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions have been developed. Regardless of their technical differences, scoring functions all need data sets combining protein-ligand complex structures and binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in terms of size and quality. On the other hand, standard metrics for evaluating scoring functions used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect the genuine quality of scoring functions. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions. In this Account, we describe our long-lasting efforts to overcome these obstacles, which involve two related projects. In the first project, we have created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in PDB. Data sets provided by PDBbind have been applied to many computational and statistical studies on protein-ligand interactions and various related subjects. In particular, it has become a major data resource for scoring function development. In the second project, we have established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so scoring functions can be tested in a relatively pure context to reflect their quality.
In our latest work on this track, i.e., CASF-2013, the performance of a scoring function was quantified in four aspects, including "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set containing 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions was tested as a demonstration. Importantly, CASF is designed to be an open-access benchmark, with which scoring functions developed by different researchers can be compared on the same grounds. Indeed, it has become a popular choice for scoring function validation in recent years. Despite the considerable progress that has been made so far, the performance of today's scoring functions still does not meet people's expectations in many aspects. There is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some obstacles underlying scoring function development so that the researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark in the future to keep them as useful community resources.
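    As a rough illustration of two of these metrics: "scoring power" is the correlation between predicted scores and experimental binding data, and "ranking power" measures rank agreement (CASF-2013 evaluates it per target; the whole-set Spearman correlation below is a simplified stand-in). All values here are invented, not CASF data:

```python
import numpy as np

def scoring_power(scores, affinities):
    """Pearson correlation between predicted scores and measured
    binding affinities -- the CASF-style "scoring power" metric."""
    return np.corrcoef(scores, affinities)[0, 1]

def ranking_power(scores, affinities):
    """Spearman rank correlation over the whole set, a simplified
    stand-in for CASF's per-target "ranking power" test."""
    rank = lambda x: np.argsort(np.argsort(x))
    return np.corrcoef(rank(scores), rank(affinities))[0, 1]

# Hypothetical predicted scores vs. experimental pKd values for five
# complexes (illustrative only).
scores = np.array([6.1, 4.0, 7.9, 5.2, 9.0])
pkd = np.array([6.5, 3.8, 7.2, 5.5, 8.8])
```

    Decoupling "scoring" from "sampling" means the metrics above are computed on fixed, pre-generated structures, so a weak docking engine cannot mask (or inflate) the quality of the scoring function itself.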

  16. Performance Evaluation of State of the Art Systems for Physical Activity Classification of Older Subjects Using Inertial Sensors in a Real Life Scenario: A Benchmark Study

    PubMed Central

    Awais, Muhammad; Palmerini, Luca; Bourke, Alan K.; Ihlen, Espen A. F.; Helbostad, Jorunn L.; Chiari, Lorenzo

    2016-01-01

    The popularity of using wearable inertial sensors for physical activity classification has dramatically increased in the last decade due to their versatility, low form factor, and low power requirements. Consequently, various systems have been developed to automatically classify daily life activities. However, the scope and implementation of such systems are limited to laboratory-based investigations. Furthermore, these systems are not directly comparable, due to the large diversity in their design (e.g., number of sensors, placement of sensors, data collection environments, data processing techniques, feature sets, classifiers, cross-validation methods). Hence, the aim of this study is to propose a fair and unbiased benchmark for the field-based validation of three existing systems, highlighting the gap between laboratory and real-life conditions. For this purpose, three representative state-of-the-art systems are chosen and implemented to classify the physical activities of twenty older subjects (76.4 ± 5.6 years). The performance in classifying four basic activities of daily life (sitting, standing, walking, and lying) is analyzed in controlled and free living conditions. To observe the performance of laboratory-based systems in field-based conditions, we trained the activity classification systems using data recorded in a laboratory environment and tested them in real-life conditions in the field. The findings show that the performance of all systems trained with data in the laboratory setting highly deteriorates when tested in real-life conditions, thus highlighting the need to train and test the classification systems in the real-life setting. Moreover, we tested the sensitivity of the chosen systems to window size (from 1 s to 10 s), suggesting that overall accuracy decreases with increasing window size.
Finally, to evaluate the impact of the number of sensors on the performance, chosen systems are modified considering only the sensing unit worn at the lower back. The results, similarly to the multi-sensor setup, indicate substantial degradation of the performance when laboratory-trained systems are tested in the real-life setting. This degradation is higher than in the multi-sensor setup. Still, the performance provided by the single-sensor approach, when trained and tested with real data, can be acceptable (with an accuracy above 80%). PMID:27973434
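    The window-size sensitivity test above presupposes the usual pipeline of splitting the inertial signal into fixed-length windows and extracting features from each. A minimal sketch, with a synthetic 100 Hz trace standing in for real lower-back accelerometer data:

```python
import numpy as np

def windows(signal, fs, win_s, overlap=0.5):
    """Split a 1-D accelerometer signal into fixed-length windows.

    fs: sampling rate (Hz); win_s: window length in seconds.
    """
    size = int(fs * win_s)
    step = int(size * (1 - overlap))
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, step)]

def features(w):
    """Typical time-domain features used by such classifiers:
    mean, standard deviation, and mean absolute difference."""
    return [w.mean(), w.std(), np.abs(np.diff(w)).mean()]

# Hypothetical 100 Hz trace: 4 s of quiet "standing"-like signal
# followed by 4 s of noisier "walking"-like signal.
rng = np.random.default_rng(0)
sig = np.concatenate([1.0 + 0.01 * rng.standard_normal(400),
                      1.0 + 0.5 * rng.standard_normal(400)])
feats = [features(w) for w in windows(sig, fs=100, win_s=2.0)]
```

    With a real classifier, `feats` would feed the training step; here the standard deviation alone already separates the quiet segment from the active one.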

  17. Setting Evidence-Based Language Goals

    ERIC Educational Resources Information Center

    Goertler, Senta; Kraemer, Angelika; Schenker, Theresa

    2016-01-01

    The purpose of this project was to identify target language benchmarks for the German program at Michigan State University (MSU) based on national and international guidelines and previous research, to assess language skills across course levels and class sections in the entire German program, and to adjust the language benchmarks as needed based…

  18. Benchmarking Attrition: What Can We Learn From Other Industries?

    ERIC Educational Resources Information Center

    Delta Cost Project at American Institutes for Research, 2012

    2012-01-01

    This brief summarizes Internet-based research into other industries that may offer useful analogies for thinking about student attrition in higher education, in particular for setting realistic benchmarks for reductions in attrition. Reducing attrition to zero or close to zero is not a realistic possibility in higher education. Students are…

  19. QUASAR--scoring and ranking of sequence-structure alignments.

    PubMed

    Birzele, Fabian; Gewehr, Jan E; Zimmer, Ralf

    2005-12-15

    Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids in finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.

  20. Evaluation of a novel electronic eigenvalue (EEVA) molecular descriptor for QSAR/QSPR studies: validation using a benchmark steroid data set.

    PubMed

    Tuppurainen, Kari; Viisas, Marja; Laatikainen, Reino; Peräkylä, Mikael

    2002-01-01

    A novel electronic eigenvalue (EEVA) descriptor of molecular structure for use in the derivation of predictive QSAR/QSPR models is described. Like other spectroscopic QSAR/QSPR descriptors, EEVA is also invariant with respect to the alignment of the structures concerned. Its performance was tested with respect to the CBG (corticosteroid binding globulin) affinity of 31 benchmark steroids. It appeared that the electronic structure of the steroids, i.e., the "spectra" derived from molecular orbital energies, is directly related to the CBG binding affinities. The predictive ability of EEVA is compared to other QSAR approaches, and its performance is discussed in the context of the Hammett equation. The good performance of EEVA is an indication of the essential quantum mechanical nature of QSAR. The EEVA method is a supplement to conventional 3D QSAR methods, which employ fields or surface properties derived from Coulombic and van der Waals interactions.

  1. Basin-scale estimates of oceanic primary production by remote sensing - The North Atlantic

    NASA Technical Reports Server (NTRS)

    Platt, Trevor; Caverhill, Carla; Sathyendranath, Shubha

    1991-01-01

    The monthly averaged CZCS data for 1979 are used to estimate annual primary production at ocean basin scales in the North Atlantic. The principal supplementary data used were 873 vertical profiles of chlorophyll and 248 sets of parameters derived from photosynthesis-light experiments. Four different procedures were tested for calculation of primary production. The spectral model with nonuniform biomass was considered as the benchmark for comparison against the other three models. The less complete models gave results that differed by as much as 50 percent from the benchmark. Vertically uniform models tended to underestimate primary production by about 20 percent compared to the nonuniform models. At horizontal scale, the differences between spectral and nonspectral models were negligible. The linear correlation between biomass and estimated production was poor outside the tropics, suggesting caution against the indiscriminate use of biomass as a proxy variable for primary production.

  2. The impact of a scheduling change on ninth grade high school performance on biology benchmark exams and the California Standards Test

    NASA Astrophysics Data System (ADS)

    Leonardi, Marcelo

    The primary purpose of this study was to examine the impact of a scheduling change from a trimester 4x4 block schedule to a modified hybrid schedule on student achievement in ninth grade biology courses. This study examined the impact of the scheduling change on student achievement through teacher created benchmark assessments in Genetics, DNA, and Evolution and on the California Standardized Test in Biology. The secondary purpose of this study examined the ninth grade biology teacher perceptions of ninth grade biology student achievement. Using a mixed methods research approach, data were collected both quantitatively and qualitatively as aligned to research questions. Quantitative methods included gathering data from departmental benchmark exams and the California Standardized Test in Biology and conducting multiple analysis of covariance and analysis of covariance to determine significant differences. Qualitative methods included journal entry questions and focus group interviews. The results revealed a statistically significant increase in scores on both the DNA and Evolution benchmark exams. DNA and Evolution benchmark exams showed significant improvements from a change in scheduling format. The scheduling change was responsible for 1.5% of the increase in DNA benchmark scores and 2% of the increase in Evolution benchmark scores. The results revealed a statistically significant decrease in scores on the Genetics Benchmark exam as a result of the scheduling change. The scheduling change was responsible for 1% of the decrease in Genetics benchmark scores. The results also revealed a statistically significant increase in scores on the CST Biology exam. The scheduling change was responsible for .7% of the increase in CST Biology scores. Results of the focus group discussions indicated that all teachers preferred the modified hybrid schedule over the trimester schedule and that it improved student achievement.

  3. Verification and benchmark testing of the NUFT computer code

    NASA Astrophysics Data System (ADS)

    Lee, K. H.; Nitao, J. J.; Kulshrestha, A.

    1993-10-01

    This interim report presents results of work completed in the ongoing verification and benchmark testing of the NUFT (Nonisothermal Unsaturated-saturated Flow and Transport) computer code. NUFT is a suite of multiphase, multicomponent models for numerical solution of thermal and isothermal flow and transport in porous media, with application to subsurface contaminant transport problems. The code simulates the coupled transport of heat, fluids, and chemical components, including volatile organic compounds. Grid systems may be Cartesian or cylindrical, with one-, two-, or fully three-dimensional configurations possible. In this initial phase of testing, the NUFT code was used to solve seven one-dimensional unsaturated flow and heat transfer problems. Three verification and four benchmarking problems were solved. In the verification testing, excellent agreement was observed between NUFT results and the analytical or quasianalytical solutions. In the benchmark testing, results of code intercomparison were very satisfactory. From these testing results, it is concluded that the NUFT code is ready for application to field and laboratory problems similar to those addressed here. Multidimensional problems, including those dealing with chemical transport, will be addressed in a subsequent report.
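    Verification in the sense used here means checking code output against an analytical solution. The toy problem below is our own construction, not one of the seven NUFT test problems: a minimal explicit 1-D diffusion solver compared against the semi-infinite-domain solution u(x, t) = erfc(x / (2*sqrt(D*t))):

```python
import math

def solve_diffusion(nx=200, nt=2000, L=1.0, D=1e-3, dt=5e-4):
    """Explicit finite-difference solver for du/dt = D d2u/dx2 with
    u(0) = 1 held fixed and u(L) = 0, starting from u = 0."""
    dx = L / nx
    r = D * dt / dx**2  # stability requires r <= 0.5 (here r = 0.02)
    u = [0.0] * (nx + 1)
    u[0] = 1.0
    for _ in range(nt):
        new = u[:]
        for i in range(1, nx):
            new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
        u = new
    return u, dx, nt * dt

u, dx, t = solve_diffusion()
# Analytical solution at the 10th grid point (x = 0.05, t = 1 s).
x = 10 * dx
exact = math.erfc(x / (2.0 * math.sqrt(1e-3 * t)))
```

    Agreement to within a few percent at interior points is the kind of check behind the "excellent agreement" reported for the verification problems.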

  4. Assessing validity of observational intervention studies - the Benchmarking Controlled Trials.

    PubMed

    Malmivaara, Antti

    2016-09-01

    Benchmarking Controlled Trial (BCT) is a concept which covers all observational studies aiming to assess the impact of interventions or health care system features on patients and populations. To create and pilot test a checklist for appraising methodological validity of a BCT. The checklist was created by extracting the most essential elements from the comprehensive set of criteria in the previous paper on BCTs. Also checklists and scientific papers on observational studies and respective systematic reviews were utilized. Ten BCTs published in the Lancet and in the New England Journal of Medicine were used to assess the feasibility of the created checklist. The appraised studies seem to have several methodological limitations, some of which could be avoided in the planning, conducting and reporting phases of the studies. The checklist can be used for planning, conducting, reporting, reviewing, and critical reading of observational intervention studies. However, the piloted checklist should be validated in further studies. Key messages Benchmarking Controlled Trial (BCT) is a concept which covers all observational studies aiming to assess the impact of interventions or health care system features on patients and populations. This paper presents a checklist for appraising methodological validity of BCTs and pilot-tests the checklist with ten BCTs published in leading medical journals. The appraised studies seem to have several methodological limitations, some of which could be avoided in the planning, conducting and reporting phases of the studies. The checklist can be used for planning, conducting, reporting, reviewing, and critical reading of observational intervention studies.

  5. A benchmark for vehicle detection on wide area motion imagery

    NASA Astrophysics Data System (ADS)

    Catrambone, Joseph; Amzovski, Ismail; Liang, Pengpeng; Blasch, Erik; Sheaff, Carolyn; Wang, Zhonghai; Chen, Genshe; Ling, Haibin

    2015-05-01

    Wide area motion imagery (WAMI) has been attracting an increased amount of research attention due to its large spatial and temporal coverage. An important application includes moving target analysis, where vehicle detection is often one of the first steps before advanced activity analysis. While there exist many vehicle detection algorithms, a thorough evaluation of them on WAMI data still remains a challenge mainly due to the lack of an appropriate benchmark data set. In this paper, we address this need by presenting a new benchmark dataset for vehicle detection in wide area motion imagery. The WAMI benchmark is based on the recently available Wright-Patterson Air Force Base (WPAFB09) dataset and the Temple Resolved Uncertainty Target History (TRUTH) associated target annotation. Trajectory annotations were provided in the original release of the WPAFB09 dataset, but detailed vehicle annotations were not available with the dataset. In addition, annotations of static vehicles, e.g., in parking lots, are also not identified in the original release. Addressing these issues, we re-annotated the whole dataset with detailed information for each vehicle, including not only a target's location, but also its pose and size. The annotated WAMI data set should be useful to the community as a common benchmark to compare WAMI detection, tracking, and identification methods.

  6. Benchmarking and the laboratory

    PubMed Central

    Galloway, M; Nadin, L

    2001-01-01

    This article describes how benchmarking can be used to assess laboratory performance. Two benchmarking schemes are reviewed, the Clinical Benchmarking Company's Pathology Report and the College of American Pathologists' Q-Probes scheme. The Clinical Benchmarking Company's Pathology Report is undertaken by staff based in the clinical management unit, Keele University with appropriate input from the professional organisations within pathology. Five annual reports have now been completed. Each report is a detailed analysis of 10 areas of laboratory performance. In this review, particular attention is focused on the areas of quality, productivity, variation in clinical practice, skill mix, and working hours. The Q-Probes scheme is part of the College of American Pathologists programme in studies of quality assurance. The Q-Probes scheme and its applicability to pathology in the UK is illustrated by reviewing two recent Q-Probe studies: routine outpatient test turnaround time and outpatient test order accuracy. The Q-Probes scheme is somewhat limited by the small number of UK laboratories that have participated. In conclusion, as a result of the government's policy in the UK, benchmarking is here to stay. Benchmarking schemes described in this article are one way in which pathologists can demonstrate that they are providing a cost effective and high quality service. Key Words: benchmarking • pathology PMID:11477112

  7. A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database.

    PubMed

    Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin

    2015-12-01

    Face recognition with still face images has been widely studied, while the research on video-based face recognition is relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively, taking video or still image as query or target. To the best of our knowledge, few datasets and evaluation protocols have been benchmarked for all three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, for benchmarking the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method, and experimentally show that it can be used as a promising baseline method for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more efforts, and our COX Face DB is a good benchmark database for evaluation.

  8. Availability of Neutronics Benchmarks in the ICSBEP and IRPhEP Handbooks for Computational Tools Testing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bess, John D.; Briggs, J. Blair; Ivanova, Tatiana

    2017-02-01

    In the past several decades, numerous experiments have been performed worldwide to support reactor operations, measurements, design, and nuclear safety. Those experiments represent an extensive international investment in infrastructure, expertise, and cost, representing significantly valuable resources of data supporting past, current, and future research activities. Those valuable assets represent the basis for recording, development, and validation of our nuclear methods and integral nuclear data [1]. The loss of these experimental data, which has occurred all too often in recent years, is tragic. The high cost to repeat many of these measurements can be prohibitive, if not impossible, to surmount. Two international projects were developed, and are under the direction of the Organisation for Economic Co-operation and Development Nuclear Energy Agency (OECD NEA), to address the challenges of not just data preservation, but evaluation of the data to determine its merit for modern and future use. The International Criticality Safety Benchmark Evaluation Project (ICSBEP) was established to identify and verify comprehensive critical benchmark data sets; evaluate the data, including quantification of biases and uncertainties; compile the data and calculations in a standardized format; and formally document the effort into a single source of verified benchmark data [2]. Similarly, the International Reactor Physics Experiment Evaluation Project (IRPhEP) was established to preserve integral reactor physics experimental data, including separate or special effects data for nuclear energy and technology applications [3]. Annually, contributors from around the world continue to collaborate in the evaluation and review of select benchmark experiments for preservation and dissemination.
The extensively peer-reviewed integral benchmark data can then be utilized by nuclear design and safety analysts to validate the analytical tools, methods, and data needed for next-generation reactor design, safety analysis requirements, and all other front- and back-end activities of the nuclear fuel cycle where quality neutronics calculations are paramount.

  9. Paediatric International Nursing Study: using person-centred key performance indicators to benchmark children's services.

    PubMed

    McCance, Tanya; Wilson, Val; Kornman, Kelly

    2016-07-01

    The aim of the Paediatric International Nursing Study was to explore the utility of key performance indicators in developing person-centred practice across a range of services provided to sick children. The objective addressed in this paper was evaluating the use of these indicators to benchmark services internationally. This study builds on primary research, which produced indicators that were considered novel both in terms of their positive orientation and use in generating data that privileges the patient voice. This study extends this research through wider testing on an international platform within paediatrics. The overall methodological approach was a realistic evaluation of the implementation of the key performance indicators, combining an integrated development and evaluation methodology. The study involved children's wards/hospitals in Australia (six sites across three states) and Europe (seven sites across four countries). Qualitative and quantitative methods were used during the implementation process; however, this paper reports only the quantitative data, which were collected through surveys, observations and documentary review. The findings demonstrate the quality of care being delivered to children and their families across different international sites. The benchmarking does, however, highlight some differences between paediatric and general hospitals, and between the different key performance indicators across all the sites. The findings support the use of the key performance indicators as a novel method to benchmark services internationally. Whilst the data collected across 20 paediatric sites suggest services are more similar than different, benchmarking illuminates variations that encourage a critical dialogue about what works and why. The transferability of the key performance indicators and measurement framework across different settings has significant implications for practice.
The findings offer an approach to benchmarking and celebrating the successes within practice, while learning from partners across the globe in further developing person-centred cultures. © 2016 John Wiley & Sons Ltd.

  10. Searching for Elements of Evidence-based Practices in Children’s Usual Care and Examining their Impact

    PubMed Central

    Garland, Ann F.; Accurso, Erin C.; Haine-Schlagel, Rachel; Brookman-Frazee, Lauren; Roesch, Scott; Zhang, Jin Jin

    2014-01-01

    Objective Most of the knowledge generated to bridge the research-practice gap has been derived from experimental studies implementing specific treatment models. Alternatively, this study uses observational methods to generate knowledge about community-based treatment processes and outcomes. Aims are to (1) describe outcome trajectories for children with disruptive behavior problems (DBPs), and (2) test how observed delivery of a benchmark set of practice elements common in evidence-based (EB) treatments may be associated with outcome change, while accounting for potential confounding variables. Method Participants included 190 children ages 4–13 with DBPs and their caregivers, plus 85 psychotherapists, recruited from six clinics. All treatment sessions were video-taped and a random sample of four sessions in the first four months of treatment was reliably coded for intensity on 27 practice elements (benchmark set and others). Three outcomes (child symptom severity, parent discipline, and family functioning) were assessed by parent report at intake, four, and eight months. Data were collected on several potential covariates including child, parent, therapist, and service use characteristics. Multi-level modeling was used to assess relationships between observed practice and outcome slopes, while accounting for covariates. Results Children and families demonstrated improvements in all three outcomes, but few significant associations between treatment processes and outcome change were identified. Families receiving greater intensity on the benchmark practice elements did demonstrate greater improvement in the parental discipline outcome. Conclusion Observed changes in outcomes for families in community care were generally not strongly associated with the type or amount of treatment received. PMID:24555882

  11. Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operations.
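    The idea of characterizing a benchmark by its memory access pattern rather than a specific implementation can be illustrated with a sketch that performs the same reduction under a unit-stride and a randomized visitation order; the result is identical, but the memory traffic (and thus cache behavior) is not:

```python
import time
import numpy as np

def sum_with_order(a, idx):
    """Sum array elements in the given visitation order. The result is
    order-independent, but the memory access pattern is not."""
    t0 = time.perf_counter()
    s = float(a[idx].sum())
    return s, time.perf_counter() - t0

n = 1_000_000
a = np.ones(n)
seq = np.arange(n)                              # unit-stride, cache-friendly
rnd = np.random.default_rng(1).permutation(n)   # random, cache-hostile

s1, t_seq = sum_with_order(a, seq)
s2, t_rnd = sum_with_order(a, rnd)
```

    On a cache-based machine the random-order timing is typically several times the sequential one; the size of that gap is one crude measure of how memory-bound a workload is.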

  12. [The OPTIMISE study (Optimal Type 2 Diabetes Management Including Benchmarking and Standard Treatment). Results for Luxembourg].

    PubMed

    Michel, G

    2012-01-01

    The OPTIMISE study (NCT00681850) has been run in six European countries, including Luxembourg, to prospectively assess the effect of benchmarking on the quality of primary care in patients with type 2 diabetes, using major modifiable vascular risk factors as critical quality indicators. Primary care centers treating type 2 diabetic patients were randomized to give standard care (control group) or standard care with feedback benchmarked against other centers in each country (benchmarking group). Primary endpoint was the percentage of patients in the benchmarking group achieving pre-set targets of the critical quality indicators: glycated hemoglobin (HbA1c), systolic blood pressure (SBP) and low-density lipoprotein (LDL) cholesterol after 12 months follow-up. In Luxembourg, in the benchmarking group, more patients achieved target for SBP (40.2% vs. 20%) and for LDL-cholesterol (50.4% vs. 44.2%). 12.9% of patients in the benchmarking group met all three targets, compared with 8.3% of patients in the control group. In this randomized, controlled study, benchmarking was shown to be an effective tool for improving critical quality indicator targets, which are the principal modifiable vascular risk factors in type 2 diabetes.

  13. Do Medicare Advantage Plans Minimize Costs? Investigating the Relationship Between Benchmarks, Costs, and Rebates.

    PubMed

    Zuckerman, Stephen; Skopec, Laura; Guterman, Stuart

    2017-12-01

    Medicare Advantage (MA), the program that allows people to receive their Medicare benefits through private health plans, uses a benchmark-and-bidding system to induce plans to provide benefits at lower costs. However, prior research suggests medical costs, profits, and other plan costs are not as low under this system as they might otherwise be. To examine how well the current system encourages MA plans to bid their lowest cost, we analyze the relationship between costs and bonuses (rebates) and the benchmarks Medicare uses in determining plan payments. Regression analysis using 2015 data for HMO and local PPO plans. Costs and rebates are higher for MA plans in areas with higher benchmarks, and plan costs vary less than benchmarks do. A one-dollar increase in benchmarks is associated with 32-cent-higher plan costs and a 52-cent-higher rebate, even when controlling for market and plan factors that can affect costs. This suggests the current benchmark-and-bidding system allows plans to bid higher than local input prices and other market conditions would seem to warrant. To incentivize MA plans to maximize efficiency and minimize costs, Medicare could change the way benchmarks are set or used.
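    The reported association can be reproduced in miniature with an OLS sketch on synthetic plan-level data generated with a 32-cent slope; none of the numbers below come from the study's 2015 data:

```python
import numpy as np

# Hypothetical plan-level data: benchmarks in dollars, with plan costs
# generated to mirror the reported 32-cent-per-dollar association.
rng = np.random.default_rng(42)
benchmark = rng.uniform(800, 1100, size=200)
cost = 500 + 0.32 * benchmark + rng.normal(0, 5, size=200)

# OLS: regress cost on benchmark with an intercept column.
X = np.column_stack([np.ones_like(benchmark), benchmark])
(intercept, slope), *_ = np.linalg.lstsq(X, cost, rcond=None)
```

    A slope well below one dollar, as here, is exactly the pattern the authors interpret as plan costs varying less than benchmarks do.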

  14. An impatient evolutionary algorithm with probabilistic tabu search for unified solution of some NP-hard problems in graph and set theory via clique finding.

    PubMed

    Guturu, Parthasarathy; Dantu, Ram

    2008-06-01

    Many graph- and set-theoretic problems, because of their tremendous application potential and theoretical appeal, have been well investigated by researchers in complexity theory and were found to be NP-hard. Since the combinatorial complexity of these problems does not permit exhaustive searches for optimal solutions, only near-optimal solutions can be explored using either various problem-specific heuristic strategies or metaheuristic global-optimization methods, such as simulated annealing, genetic algorithms, etc. In this paper, we propose a unified evolutionary algorithm (EA) for the problems of maximum clique finding, maximum independent set, minimum vertex cover, subgraph and double subgraph isomorphism, set packing, set partitioning, and set cover. In the proposed approach, we first map these problems onto the maximum clique-finding problem (MCP), which is later solved using an evolutionary strategy. The proposed impatient EA with probabilistic tabu search (IEA-PTS) for the MCP integrates the best features of earlier successful approaches with a number of new heuristics that we developed to yield a performance that advances the state of the art in EAs for the exploration of the maximum cliques in a graph. Results of experimentation with the 37 DIMACS benchmark graphs and comparative analyses with six state-of-the-art algorithms, including two from the smaller EA community and four from the larger metaheuristics community, indicate that the IEA-PTS outperforms the EAs with respect to a Pareto-lexicographic ranking criterion and offers competitive performance on some graph instances when individually compared to the other heuristic algorithms. It has also successfully set a new benchmark on one graph instance. On another benchmark suite called Benchmarks with Hidden Optimal Solutions, IEA-PTS ranks second, after a very recent algorithm called COVER, among its peers that have experimented with this suite.
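The unified approach rests on mapping each problem onto maximum clique finding. The best-known of these reductions, maximum independent set via the complement graph, can be sketched as:

```python
from itertools import combinations

def complement(n, edges):
    """Edges of the complement of an undirected graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]

def is_clique(vertices, edges):
    """True if every pair of the given vertices is connected."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(vertices, 2))

# Path graph 0-1-2-3: {0, 2} is an independent set in G,
# hence a clique in the complement of G.
g_edges = [(0, 1), (1, 2), (2, 3)]
cg_edges = complement(4, g_edges)
print(is_clique([0, 2], cg_edges))  # True
```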

  15. Diversity Recruiting: Overview of Practices and Benchmarks. CERI Research Brief 4-2013

    ERIC Educational Resources Information Center

    Gardner, Phil

    2013-01-01

    Little information exists on the basic elements of diversity recruiting on college campuses. A set of questions was developed for the Collegiate Employment Research Institute's (CERI's) annual college hiring survey that attempted to capture the current practices and benchmarks being employed by organizations in their diversity recruiting programs.…

  16. Mathematics Content Standards Benchmarks and Performance Standards

    ERIC Educational Resources Information Center

    New Mexico Public Education Department, 2008

    2008-01-01

    New Mexico Mathematics Content Standards, Benchmarks, and Performance Standards identify what students should know and be able to do across all grade levels, forming a spiraling framework in the sense that many skills, once introduced, develop over time. While the Performance Standards are set forth at grade-specific levels, they do not exist as…

  17. A Critical Thinking Benchmark for a Department of Agricultural Education and Studies

    ERIC Educational Resources Information Center

    Perry, Dustin K.; Retallick, Michael S.; Paulsen, Thomas H.

    2014-01-01

    Due to an ever changing world where technology seemingly provides endless answers, today's higher education students must master a new skill set reflecting an emphasis on critical thinking, problem solving, and communications. The purpose of this study was to establish a departmental benchmark for critical thinking abilities of students majoring…

  18. Optimization of a solid-state electron spin qubit using Gate Set Tomography

    DOE PAGES

    Dehollain, Juan P.; Muhonen, Juha T.; Blume-Kohout, Robin J.; ...

    2016-10-13

    Here, state-of-the-art qubit systems are reaching the gate fidelities required for scalable quantum computation architectures. Further improvement in the fidelity of quantum gates demands characterization and benchmarking protocols that are efficient, reliable, and extremely accurate. Ideally, a benchmarking protocol should also provide information on how to rectify residual errors. Gate Set Tomography (GST) is one such protocol, designed to give detailed characterization of as-built qubits. We implemented GST on a high-fidelity electron-spin qubit confined by a single 31P atom in 28Si. The results reveal systematic errors that a randomized benchmarking analysis could measure but not identify, whereas GST indicated the need for improved calibration of the length of the control pulses. After introducing this modification, we measured a new benchmark average gate fidelity of 99.942(8)%, an improvement on the previous value of 99.90(2)%. Furthermore, GST revealed high levels of non-Markovian noise in the system, which will need to be understood and addressed when the qubit is used within a fault-tolerant quantum computation scheme.

  19. Toxicological benchmarks for screening potential contaminants of concern for effects on soil and litter invertebrates and heterotrophic process

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Will, M.E.; Suter, G.W. II

    1994-09-01

    One of the initial stages in ecological risk assessments for hazardous waste sites is the screening of contaminants to determine which of them are worthy of further consideration as "contaminants of potential concern." This process is termed "contaminant screening." It is performed by comparing measured ambient concentrations of chemicals to benchmark concentrations. Currently, no standard benchmark concentrations exist for assessing contaminants in soil with respect to their toxicity to soil- and litter-dwelling invertebrates, including earthworms, other micro- and macroinvertebrates, or heterotrophic bacteria and fungi. This report presents a standard method for deriving benchmarks for this purpose, sets of data concerning effects of chemicals in soil on invertebrates and soil microbial processes, and benchmarks for chemicals potentially associated with United States Department of Energy sites. In addition, it reviews the literature describing the experiments from which data were drawn for benchmark derivation. Chemicals that are found in soil at concentrations exceeding both the benchmarks and the background concentration for the soil type should be considered contaminants of potential concern.
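The screening rule in the final sentence can be sketched directly (all concentrations are illustrative, not values from the report):

```python
def screen(measured, benchmark, background):
    """Chemicals whose measured level exceeds BOTH benchmark and background."""
    return [chem for chem, conc in measured.items()
            if conc > benchmark[chem] and conc > background[chem]]

measured   = {"Zn": 300.0, "Cu": 40.0, "Pb": 120.0}  # mg/kg, illustrative
benchmark  = {"Zn": 200.0, "Cu": 60.0, "Pb": 100.0}
background = {"Zn": 100.0, "Cu": 30.0, "Pb": 150.0}
print(screen(measured, benchmark, background))  # ['Zn']
```

Pb exceeds its benchmark but not the background level, so only Zn survives the screen.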

  20. Benchmarking study of the MCNP code against cold critical experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sitaraman, S.

    1991-01-01

    The purpose of this study was to benchmark the widely used Monte Carlo code MCNP against a set of cold critical experiments, with a view to using the code as a means of independently verifying the performance of faster but less accurate Monte Carlo and deterministic codes. The experiments simulated consisted of both fast and thermal criticals, as well as fuel in a variety of chemical forms. A standard set of benchmark cold critical experiments was modeled. These included the two fast experiments, GODIVA and JEZEBEL, the TRX metallic uranium thermal experiments, the Babcock and Wilcox oxide and mixed oxide experiments, and the Oak Ridge National Laboratory (ORNL) and Pacific Northwest Laboratory (PNL) nitrate solution experiments. The principal case studied was a small critical experiment that was performed with boiling water reactor bundles.

  1. Benchmarking Tool Kit.

    ERIC Educational Resources Information Center

    Canadian Health Libraries Association.

    Nine Canadian health libraries participated in a pilot test of the Benchmarking Tool Kit between January and April, 1998. Although the Tool Kit was designed specifically for health libraries, the content and approach are useful to other types of libraries as well. Used to its full potential, benchmarking can provide a common measuring stick to…

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Grace L.; Department of Health Services Research, The University of Texas MD Anderson Cancer Center, Houston, Texas; Jiang, Jing

    Purpose: High-quality treatment for intact cervical cancer requires external radiation therapy, brachytherapy, and chemotherapy, carefully sequenced and completed without delays. We sought to determine how frequently current treatment meets quality benchmarks and whether new technologies have influenced patterns of care. Methods and Materials: By searching diagnosis and procedure claims in MarketScan, an employment-based health care claims database, we identified 1508 patients with nonmetastatic, intact cervical cancer treated from 1999 to 2011, who were <65 years of age and received >10 fractions of radiation. Treatments received were identified using procedure codes and compared with 3 quality benchmarks: receipt of brachytherapy, receipt of chemotherapy, and radiation treatment duration not exceeding 63 days. The Cochran-Armitage test was used to evaluate temporal trends. Results: Seventy-eight percent of patients (n=1182) received brachytherapy, with brachytherapy receipt stable over time (Cochran-Armitage P-trend=.15). Among patients who received brachytherapy, 66% had high-dose rate and 34% had low-dose rate treatment, although use of high-dose rate brachytherapy steadily increased to 75% by 2011 (P-trend<.001). Eighteen percent of patients (n=278) received intensity modulated radiation therapy (IMRT), and IMRT receipt increased to 37% by 2011 (P-trend<.001). Only 2.5% of patients (n=38) received IMRT in the setting of brachytherapy omission. Overall, 79% of patients (n=1185) received chemotherapy, and chemotherapy receipt increased to 84% by 2011 (P-trend<.001). Median radiation treatment duration was 56 days (interquartile range, 47-65 days); however, duration exceeded 63 days in 36% of patients (n=543). Although 98% of patients received at least 1 benchmark treatment, only 44% received treatment that met all 3 benchmarks. With more stringent indicators (brachytherapy, ≥4 chemotherapy cycles, and duration not exceeding 56 days), only 25% of patients received treatment that met all benchmarks. Conclusion: In this cohort, most cervical cancer patients received treatment that did not comply with all 3 benchmarks for quality treatment. In contrast to increasing receipt of newer radiation technologies, there was little improvement in receipt of essential treatment benchmarks.
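A benchmark-compliance check of this kind reduces to a per-patient predicate over the three indicators. A minimal sketch with hypothetical records:

```python
def meets_benchmarks(p):
    """All three: brachytherapy, chemotherapy, duration within 63 days."""
    return p["brachytherapy"] and p["chemotherapy"] and p["duration_days"] <= 63

cohort = [  # hypothetical claims-derived records
    {"brachytherapy": True,  "chemotherapy": True,  "duration_days": 56},
    {"brachytherapy": True,  "chemotherapy": True,  "duration_days": 70},
    {"brachytherapy": False, "chemotherapy": True,  "duration_days": 50},
]
print(sum(meets_benchmarks(p) for p in cohort))  # 1
```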

  3. Valence and charge-transfer optical properties for some SinCm (m, n ≤ 12) clusters: Comparing TD-DFT, complete-basis-limit EOMCC, and benchmarks from spectroscopy

    NASA Astrophysics Data System (ADS)

    Lutz, Jesse J.; Duan, Xiaofeng F.; Ranasinghe, Duminda S.; Jin, Yifan; Margraf, Johannes T.; Perera, Ajith; Burggraf, Larry W.; Bartlett, Rodney J.

    2018-05-01

    Accurate optical characterization of the closo-Si12C12 molecule is important to guide experimental efforts toward the synthesis of nano-wires, cyclic nano-arrays, and related array structures, which are anticipated to be robust and efficient exciton materials for opto-electronic devices. Working toward calibrated methods for the description of closo-Si12C12 oligomers, various electronic structure approaches are evaluated for their ability to reproduce measured optical transitions of the SiC2, Si2Cn (n = 1-3), and Si3Cn (n = 1, 2) clusters reported earlier by Steglich and Maier [Astrophys. J. 801, 119 (2015)]. Complete-basis-limit equation-of-motion coupled-cluster (EOMCC) results are presented and a comparison is made between perturbative and renormalized non-iterative triples corrections. The effect of adding a renormalized correction for quadruples is also tested. Benchmark test sets derived from both measurement and high-level EOMCC calculations are then used to evaluate the performance of a variety of density functionals within the time-dependent density functional theory (TD-DFT) framework. The best-performing functionals are subsequently applied to predict valence TD-DFT excitation energies for the lowest-energy isomers of SinC and Sin-1C7-n (n = 4-6). TD-DFT approaches are then applied to the SinCn (n = 4-12) clusters and unique spectroscopic signatures of closo-Si12C12 are discussed. Finally, various long-range corrected density functionals, including those from the CAM-QTP family, are applied to a charge-transfer excitation in a cyclic (Si4C4)4 oligomer. Approaches for gauging the extent of charge-transfer character are also tested and EOMCC results are used to benchmark functionals and make recommendations.

  4. Surgeon-Specific Reports in General Surgery: Establishing Benchmarks for Peer Comparison Within a Single Hospital.

    PubMed

    Hatfield, Mark D; Ashton, Carol M; Bass, Barbara L; Shirkey, Beverly A

    2016-02-01

    Methods to assess a surgeon's individual performance based on clinically meaningful outcomes have not been fully developed, due to small numbers of adverse outcomes and wide variation in case volumes. The Achievable Benchmark of Care (ABC) method addresses these issues by identifying benchmark-setting surgeons with high levels of performance and greater case volumes. This method was used to help surgeons compare their surgical practice to that of their peers by using merged National Surgical Quality Improvement Program (NSQIP) and Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP) data to generate surgeon-specific reports. A retrospective cohort study at a single institution's department of surgery was conducted involving 107 surgeons (8,660 cases) over 5.5 years. Stratification of more than 32,000 CPT codes into 16 CPT clusters served as the risk adjustment. Thirty-day outcomes of interest included surgical site infection (SSI), acute kidney injury (AKI), and mortality. Performance characteristics of the ABC method were explored by examining how many surgeons were identified as benchmark-setters in view of volume and outcome rates within CPT clusters. For the data captured, most surgeons performed cases spanning a median of 5 CPT clusters (range 1 to 15 clusters), with a median of 26 cases (range 1 to 776 cases) and a median of 2.8 years (range 0 to 5.5 years). The highest volume surgeon for that CPT cluster set the benchmark for 6 of 16 CPT clusters for SSIs, 8 of 16 CPT clusters for AKIs, and 9 of 16 CPT clusters for mortality. The ABC method appears to be a sound and useful approach to identifying benchmark-setting surgeons within a single institution. Such surgeons may be able to help their peers improve their performance. Copyright © 2016 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
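A much-simplified sketch of the ABC idea: pool the best-performing surgeons until they cover a chosen share of cases, and take the pooled rate as the benchmark. The published method also adjusts for small denominators, which is omitted here; all data are hypothetical:

```python
def abc_benchmark(surgeons, coverage=0.10):
    """Pooled adverse-event rate of the best surgeons covering >= `coverage` of cases."""
    total = sum(n for _, n in surgeons.values())
    ranked = sorted(surgeons.values(), key=lambda t: t[0] / t[1])  # by event rate
    events = cases = 0
    for ev, n in ranked:
        events += ev
        cases += n
        if cases >= coverage * total:
            break
    return events / cases

surgeons = {"A": (0, 40), "B": (2, 60), "C": (9, 100)}  # (events, cases), hypothetical
print(abc_benchmark(surgeons))  # 0.0
```

Here surgeon A alone covers 10% of the 200 cases with zero events, so the benchmark is an event rate of zero.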

  5. Propulsion Diagnostic Method Evaluation Strategy (ProDiMES) User's Guide

    NASA Technical Reports Server (NTRS)

    Simon, Donald L.

    2010-01-01

    This report is a User's Guide for the Propulsion Diagnostic Method Evaluation Strategy (ProDiMES). ProDiMES is a standard benchmarking problem and a set of evaluation metrics to enable the comparison of candidate aircraft engine gas path diagnostic methods. This MATLAB (The MathWorks, Inc.)-based software tool enables users to independently develop and evaluate diagnostic methods. Additionally, a set of blind test case data is also distributed as part of the software. This will enable the side-by-side comparison of diagnostic approaches developed by multiple users. The User's Guide describes the various components of ProDiMES and provides instructions for the installation and operation of the tool.

  6. Using string invariants for prediction searching for optimal parameters

    NASA Astrophysics Data System (ADS)

    Bundzel, Marek; Kasanický, Tomáš; Pinčák, Richard

    2016-02-01

    We have developed a novel prediction method based on string invariants. The method does not require learning, but a small set of parameters must be set to achieve optimal performance. We have implemented an evolutionary algorithm for the parametric optimization. We have tested the performance of the method on artificial and real-world data and compared the performance to statistical methods and to a number of artificial intelligence methods. We have used data and the results of a prediction competition as a benchmark. The results show that the method performs well in single-step prediction, but its performance for multiple-step prediction needs to be improved. The method works well for a wide range of parameters.

  7. Simulation-based comprehensive benchmarking of RNA-seq aligners

    PubMed Central

    Baruzzo, Giacomo; Hayer, Katharina E; Kim, Eun Ji; Di Camillo, Barbara; FitzGerald, Garret A; Grant, Gregory R

    2018-01-01

    Alignment is the first step in most RNA-seq analysis pipelines, and the accuracy of downstream analyses depends heavily on it. Unlike most steps in the pipeline, alignment is particularly amenable to benchmarking with simulated data. We performed a comprehensive benchmarking of 14 common splice-aware aligners for base, read, and exon junction-level accuracy and compared default with optimized parameters. We found that performance varied by genome complexity, and accuracy and popularity were poorly correlated. The most widely cited tool underperforms for most metrics, particularly when using default settings. PMID:27941783

  8. Benchmarking an Unstructured-Grid Model for Tsunami Current Modeling

    NASA Astrophysics Data System (ADS)

    Zhang, Yinglong J.; Priest, George; Allan, Jonathan; Stimely, Laura

    2016-12-01

    We present model results derived from a tsunami current benchmarking workshop held by the NTHMP (National Tsunami Hazard Mitigation Program) in February 2015. Modeling was undertaken using our own 3D unstructured-grid model that has been previously certified by the NTHMP for tsunami inundation. Results for two benchmark tests are described here: (1) vortex structure in the wake of a submerged shoal and (2) impact of tsunami waves on Hilo Harbor in the 2011 Tohoku event. The modeled current velocities are compared with available lab and field data. We demonstrate that the model is able to accurately capture the velocity field in the two benchmark tests; in particular, the 3D model gives a much more accurate wake structure than the 2D model for the first test, with the root-mean-square error and mean bias no more than 2 cm s⁻¹ and 8 mm s⁻¹, respectively, for the modeled velocity.
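The two error statistics quoted for the modeled velocities are standard and easy to reproduce. A minimal sketch with illustrative numbers (not data from the workshop):

```python
import math

def rmse(model, obs):
    """Root-mean-square error between model and observations."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def mean_bias(model, obs):
    """Mean signed difference (model minus observations)."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)

obs   = [0.10, 0.20, 0.30]  # observed current speeds, m/s (illustrative)
model = [0.12, 0.19, 0.31]
print(round(rmse(model, obs), 4), round(mean_bias(model, obs), 4))
```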

  9. Intrusion detection using rough set classification.

    PubMed

    Zhang, Lian-hua; Zhang, Guan-hua; Zhang, Jie; Bai, Ying-cai

    2004-09-01

    Recently, machine learning-based intrusion detection approaches have been the subject of extensive research because they can detect both misuse and anomalies. In this paper, rough set classification (RSC), a modern learning algorithm, is used to rank the features extracted for detecting intrusions and to generate intrusion detection models. Feature ranking is a very critical step when building the model. RSC performs feature ranking before generating rules, converting the feature ranking to a minimal hitting set problem that is addressed using a genetic algorithm (GA). In classical approaches using Support Vector Machines (SVMs), this is done by executing many iterations, each of which removes one useless feature. Compared with those methods, our method can avoid many iterations. In addition, a hybrid genetic algorithm is proposed to increase the convergence speed and decrease the training time of RSC. The models generated by RSC take the form of "IF-THEN" rules, which have the advantage of being readily interpretable. Tests and comparison of RSC with SVM on DARPA benchmark data showed that for Probe and DoS attacks, both RSC and SVM yielded highly accurate results (greater than 99% accuracy on the testing set).
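The minimal-hitting-set formulation mentioned above can be illustrated with a simple greedy heuristic in place of the paper's genetic algorithm (the discernibility sets are hypothetical):

```python
def greedy_hitting_set(sets):
    """Greedy heuristic: repeatedly pick the feature hitting the most un-hit sets."""
    sets = [set(s) for s in sets]
    chosen = set()
    while any(not (s & chosen) for s in sets):
        remaining = [s for s in sets if not (s & chosen)]
        candidates = {f for s in remaining for f in s}
        best = max(candidates, key=lambda f: sum(f in s for s in remaining))
        chosen.add(best)
    return chosen

# Each set lists features that discern one pair of records (hypothetical).
discernibility = [{"f1", "f2"}, {"f2", "f3"}, {"f2", "f4"}]
print(sorted(greedy_hitting_set(discernibility)))  # ['f2']
```

Greedy gives no optimality guarantee, which is why the paper resorts to a (hybrid) genetic algorithm for the same problem.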

  10. 40 CFR 141.172 - Disinfection profiling and benchmarking.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... more representative annual data set than the data set determined under paragraph (a)(1) or (2) of this... may require that a system use a more representative annual data set than the data set determined under... data set than the data set determined under paragraph (a)(2)(i) of this section, the system must submit...

  11. 40 CFR 141.172 - Disinfection profiling and benchmarking.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... more representative annual data set than the data set determined under paragraph (a)(1) or (2) of this... may require that a system use a more representative annual data set than the data set determined under... data set than the data set determined under paragraph (a)(2)(i) of this section, the system must submit...

  12. 40 CFR 141.172 - Disinfection profiling and benchmarking.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... more representative annual data set than the data set determined under paragraph (a)(1) or (2) of this... may require that a system use a more representative annual data set than the data set determined under... data set than the data set determined under paragraph (a)(2)(i) of this section, the system must submit...

  13. Pairwise measures of causal direction in the epidemiology of sleep problems and depression.

    PubMed

    Rosenström, Tom; Jokela, Markus; Puttonen, Sampsa; Hintsanen, Mirka; Pulkki-Råback, Laura; Viikari, Jorma S; Raitakari, Olli T; Keltikangas-Järvinen, Liisa

    2012-01-01

    Depressive mood is often preceded by sleep problems, suggesting that they increase the risk of depression. Sleep problems can also reflect a prodromal symptom of depression, so temporal precedence alone is insufficient to confirm causality. The authors applied recently introduced statistical causal-discovery algorithms that can estimate causality from cross-sectional samples in order to infer the direction of causality between the two sets of symptoms from a novel perspective. Two general-population samples were used; one from the Young Finns study (690 men and 997 women, average age 37.7 years, range 30-45), and another from the Wisconsin Longitudinal study (3101 men and 3539 women, average age 53.1 years, range 52-55). These included three depression questionnaires (two in the Young Finns data) and two sleep problem questionnaires. Three different causality estimates were constructed for each data set, tested on benchmark data with a (practically) known causality, and tested for assumption violations using simulated data. The causality algorithms performed well on the benchmark data and simulations, and a prediction was drawn for future empirical studies to confirm: for minor depression/dysphoria, sleep problems cause significantly more dysphoria than dysphoria causes sleep problems. The situation may change as depression becomes more severe, or when more severe levels of symptoms are evaluated; also, artefacts due to severe depression being less well represented in the population data than minor depression may interfere with the estimation for depression scales that emphasize severe symptoms. The findings are consistent with other emerging epidemiological and biological evidence.

  14. Benchmark levels for the consumptive water footprint of crop production for different environmental conditions: a case study for winter wheat in China

    NASA Astrophysics Data System (ADS)

    Zhuo, La; Mekonnen, Mesfin M.; Hoekstra, Arjen Y.

    2016-11-01

    Meeting growing food demands while simultaneously shrinking the water footprint (WF) of agricultural production is one of the greatest societal challenges. Benchmarks for the WF of crop production can serve as a reference and be helpful in setting WF reduction targets. The consumptive WF of crops, the consumption of rainwater stored in the soil (green WF), and the consumption of irrigation water (blue WF) over the crop growing period varies spatially and temporally depending on environmental factors like climate and soil. The study explores which environmental factors should be distinguished when determining benchmark levels for the consumptive WF of crops. Hereto we determine benchmark levels for the consumptive WF of winter wheat production in China for all separate years in the period 1961-2008, for rain-fed vs. irrigated croplands, for wet vs. dry years, for warm vs. cold years, for four different soil classes, and for two different climate zones. We simulate consumptive WFs of winter wheat production with the crop water productivity model AquaCrop at a 5 by 5 arcmin resolution, accounting for water stress only. The results show that (i) benchmark levels determined for individual years for the country as a whole remain within a range of ±20 % around long-term mean levels over 1961-2008, (ii) the WF benchmarks for irrigated winter wheat are 8-10 % larger than those for rain-fed winter wheat, (iii) WF benchmarks for wet years are 1-3 % smaller than for dry years, (iv) WF benchmarks for warm years are 7-8 % smaller than for cold years, (v) WF benchmarks differ by about 10-12 % across different soil texture classes, and (vi) WF benchmarks for the humid zone are 26-31 % smaller than for the arid zone, which has relatively higher reference evapotranspiration in general and lower yields in rain-fed fields. We conclude that when determining benchmark levels for the consumptive WF of a crop, it is useful to primarily distinguish between different climate zones. 
If actual consumptive WFs of winter wheat throughout China were reduced to the benchmark levels set by the best 25 % of Chinese winter wheat production (1224 m³ t⁻¹ for arid areas and 841 m³ t⁻¹ for humid areas), the water saving in an average year would be 53 % of the current water consumption at winter wheat fields in China. The majority of the yield increase and associated improvement in water productivity can be achieved in southern China.
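A production-weighted "best 25 %" benchmark of this kind can be sketched as follows: sort grid cells by water footprint and take the WF value at which the best-performing cells accumulate the target share of total production (grid-cell values are illustrative, not from the study):

```python
def wf_benchmark(cells, share=0.25):
    """WF value reached once the lowest-WF cells cover `share` of production."""
    cells = sorted(cells)                       # (wf_m3_per_t, production_t)
    total = sum(prod for _, prod in cells)
    accumulated = 0.0
    for wf, prod in cells:
        accumulated += prod
        if accumulated >= share * total:
            return wf
    return cells[-1][0]

cells = [(700, 10.0), (900, 20.0), (1100, 40.0), (1400, 30.0)]  # illustrative
print(wf_benchmark(cells))  # 900
```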

  15. Benchmarking in pathology: development of a benchmarking complexity unit and associated key performance indicators.

    PubMed

    Neil, Amanda; Pfeffer, Sally; Burnett, Leslie

    2013-01-01

    This paper details the development of a new type of pathology laboratory productivity unit, the benchmarking complexity unit (BCU). The BCU provides a comparative index of laboratory efficiency, regardless of test mix. It also enables estimation of a measure of how much complex pathology a laboratory performs, and the identification of peer organisations for the purposes of comparison and benchmarking. The BCU is based on the theory that wage rates reflect productivity at the margin. A weighting factor for the ratio of medical to technical staff time was dynamically calculated based on actual participant site data. Given this weighting, a complexity value for each test, at each site, was calculated. The median complexity value (number of BCUs) for that test across all participating sites was taken as its complexity value for the Benchmarking in Pathology Program. The BCU allowed implementation of an unbiased comparison unit and test listing that was found to be a robust indicator of the relative complexity for each test. Employing the BCU data, a number of Key Performance Indicators (KPIs) were developed, including three that address comparative organisational complexity, analytical depth and performance efficiency, respectively. Peer groups were also established using the BCU combined with simple organisational and environmental metrics. The BCU has enabled productivity statistics to be compared between organisations. The BCU corrects for differences in test mix and workload complexity of different organisations and also allows for objective stratification into peer groups.
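The BCU construction can be caricatured as a weighted staff-time value per test at each site, with the cross-site median taken as the test's benchmark complexity. A minimal sketch with an assumed weight and illustrative times (the paper derives the weight dynamically from participant wage data):

```python
import statistics

def site_complexity(medical_minutes, technical_minutes, medical_weight):
    """Complexity of one test at one site: weighted medical plus technical time."""
    return medical_weight * medical_minutes + technical_minutes

# (medical, technical) staff minutes for the same test at three sites -- illustrative.
site_times = [(2.0, 10.0), (3.0, 8.0), (2.5, 12.0)]
W = 4.0  # assumed wage-ratio-derived weight, not a value from the paper
values = [site_complexity(m, t, W) for m, t in site_times]
print(statistics.median(values))  # 20.0
```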

  16. Analysis of a benchmark suite to evaluate mixed numeric and symbolic processing

    NASA Technical Reports Server (NTRS)

    Ragharan, Bharathi; Galant, David

    1992-01-01

    The suite of programs that formed the benchmark for a proposed advanced computer is described and analyzed. The features of the processor and its operating system that are tested by the benchmark are discussed. The computer codes and the supporting data for the analysis are given as appendices.

  17. OPTIMIZATION OF MUD HAMMER DRILLING PERFORMANCE - A PROGRAM TO BENCHMARK THE VIABILITY OF ADVANCED MUD HAMMER DRILLING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arnis Judzis

    2002-10-01

    This document details the progress to date on the OPTIMIZATION OF MUD HAMMER DRILLING PERFORMANCE -- A PROGRAM TO BENCHMARK THE VIABILITY OF ADVANCED MUD HAMMER DRILLING contract for the quarter from July 2002 through September 2002. Even though we are awaiting the optimization portion of the testing program, accomplishments include the following: (1) Smith International agreed to participate in the DOE Mud Hammer program. (2) Smith International chromed collars for upcoming benchmark tests at TerraTek, now scheduled for 4Q 2002. (3) ConocoPhillips had a field trial of the Smith fluid hammer offshore Vietnam. The hammer functioned properly, though the well encountered hole conditions and reaming problems. ConocoPhillips plans another field trial as a result. (4) DOE/NETL extended the contract for the fluid hammer program to allow Novatek to "optimize" their much-delayed tool in 2003 and to allow Smith International to add "benchmarking" tests in light of SDS Digger Tools' current financial inability to participate. (5) ConocoPhillips joined the Industry Advisors for the mud hammer program. (6) TerraTek acknowledges Smith International, BP America, PDVSA, and ConocoPhillips for cost-sharing the Smith benchmarking tests, allowing extension of the contract to complete the optimizations.

  18. State Education Agency Communications Process: Benchmark and Best Practices Project. Benchmark and Best Practices Project. Issue No. 01

    ERIC Educational Resources Information Center

    Zavadsky, Heather

    2014-01-01

    The role of state education agencies (SEAs) has shifted significantly from low-profile, compliance activities like managing federal grants to engaging in more complex and politically charged tasks like setting curriculum standards, developing accountability systems, and creating new teacher evaluation systems. The move from compliance-monitoring…

  19. 75 FR 51058 - The Effects of Mountaintop Mines and Valley Fills on Aquatic Ecosystems of the Central...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-18

    ... the public additional time to evaluate the data used to derive a benchmark for conductivity. The... FR 18499). By following the link below, reviewers may download the initial data and EPA's derivative data sets that were used to calculate the conductivity benchmark. These reports were developed by the...

  20. Public Interest Energy Research (PIER) Program Development of a Computer-based Benchmarking and Analytical Tool. Benchmarking and Energy & Water Savings Tool in Dairy Plants (BEST-Dairy)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Tengfang; Flapper, Joris; Ke, Jing

    The overall goal of the project is to develop a computer-based benchmarking and energy and water savings tool (BEST-Dairy) for use in the California dairy industry - including four dairy processes - cheese, fluid milk, butter, and milk powder. BEST-Dairy tool developed in this project provides three options for the user to benchmark each of the dairy product included in the tool, with each option differentiated based on specific detail level of process or plant, i.e., 1) plant level; 2) process-group level, and 3) process-step level. For each detail level, the tool accounts for differences in production and other variablesmore » affecting energy use in dairy processes. The dairy products include cheese, fluid milk, butter, milk powder, etc. The BEST-Dairy tool can be applied to a wide range of dairy facilities to provide energy and water savings estimates, which are based upon the comparisons with the best available reference cases that were established through reviewing information from international and national samples. We have performed and completed alpha- and beta-testing (field testing) of the BEST-Dairy tool, through which feedback from voluntary users in the U.S. dairy industry was gathered to validate and improve the tool's functionality. BEST-Dairy v1.2 was formally published in May 2011, and has been made available for free downloads from the internet (i.e., http://best-dairy.lbl.gov). A user's manual has been developed and published as the companion documentation for use with the BEST-Dairy tool. In addition, we also carried out technology transfer activities by engaging the dairy industry in the process of tool development and testing, including field testing, technical presentations, and technical assistance throughout the project. To date, users from more than ten countries in addition to those in the U.S. have downloaded the BEST-Dairy from the LBNL website. 
It is expected that the use of the BEST-Dairy tool will advance understanding of energy and water usage in individual dairy plants, augment benchmarking activities in the marketplace, and facilitate implementation of efficiency measures and strategies to reduce energy and water usage in the dairy industry. Industrial adoption of this emerging tool and technology in the market is expected to benefit dairy plants, which are important customers of California utilities. Further demonstration of this benchmarking tool is recommended to facilitate its commercialization and the expansion of its functions. Wider use of the BEST-Dairy tool and its continuous expansion in functionality will help to reduce the actual consumption of energy and water in the dairy industry sector. The outcomes comply very well with the goals set by AB 1250 for the PIER program.

  1. Quality Assurance Testing of Version 1.3 of U.S. EPA Benchmark Dose Software (Presentation)

    EPA Science Inventory

    EPA benchmark dose software (BMDS) is used to evaluate chemical dose-response data in support of Agency risk assessments, and must therefore be dependable. Quality assurance testing methods developed for BMDS were designed to assess model dependability with respect to curve-fitt...

  2. Kohn-Sham Band Structure Benchmark Including Spin-Orbit Coupling for 2D and 3D Solids

    NASA Astrophysics Data System (ADS)

    Huhn, William; Blum, Volker

    2015-03-01

    Accurate electronic band structures serve as a primary indicator of the suitability of a material for a given application, e.g., as electronic or catalytic materials. Computed band structures, however, are subject to a host of approximations, some of which are more obvious (e.g., the treatment of exchange-correlation or of the self-energy) and others less obvious (e.g., the treatment of core, semicore, or valence electrons, handling of relativistic effects, or the accuracy of the underlying basis set used). We here provide a set of accurate Kohn-Sham band structure benchmarks, using the numeric atom-centered all-electron electronic structure code FHI-aims combined with the "traditional" PBE functional and the hybrid HSE functional, to calculate core, valence, and low-lying conduction bands of a set of 2D and 3D materials. Benchmarks are provided with and without effects of spin-orbit coupling, using quasi-degenerate perturbation theory to predict spin-orbit splittings. This work is funded by Fritz-Haber-Institut der Max-Planck-Gesellschaft.

  3. Signifying quantum benchmarks for qubit teleportation and secure quantum communication using Einstein-Podolsky-Rosen steering inequalities

    NASA Astrophysics Data System (ADS)

    Reid, M. D.

    2013-12-01

    The demonstration of quantum teleportation of a photonic qubit from Alice to Bob usually relies on data conditioned on detection at Bob's location. I show that Bohm's Einstein-Podolsky-Rosen (EPR) paradox can be used to verify that the quantum benchmark for qubit teleportation has been reached, without postselection. This is possible for scenarios insensitive to losses at the generation station, and with efficiencies of ηB>1/3 for the teleportation process. The benchmark is obtained if it is shown that Bob can “steer” Alice's record of the qubit as stored by Charlie. EPR steering inequalities involving m measurement settings can also be used to confirm quantum teleportation, for efficiencies ηB>1/m, if one assumes trusted detectors for Charlie and Alice. Using proofs of monogamy, I show that two-setting EPR steering inequalities can signify secure teleportation of the qubit state.

  4. A benchmark testing ground for integrating homology modeling and protein docking.

    PubMed

    Bohnuud, Tanggis; Luo, Lingqi; Wodak, Shoshana J; Bonvin, Alexandre M J J; Weng, Zhiping; Vajda, Sandor; Schueler-Furman, Ora; Kozakov, Dima

    2017-01-01

    Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, at both the backbone and side chain levels, need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, which hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. 
Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target, are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases. Proteins 2016; 85:10-16. © 2016 Wiley Periodicals, Inc.

  5. Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.

    PubMed

    Rohrer, Sebastian G; Baumann, Knut

    2009-02-01

    Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
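
    The nearest-neighbor functions underlying this kind of spatial-statistics analysis can be illustrated with a toy example: empirical cumulative distributions of active-to-active and decoy-to-active nearest-neighbor distances in descriptor space. This is a simplified Euclidean sketch with invented points, not the MUV implementation:

```python
def nn_dist(points, others):
    """Euclidean distance from each point to its nearest neighbor in `others`
    (an identical object is skipped, so a set can be compared against itself)."""
    out = []
    for p in points:
        out.append(min(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
                       for q in others if q is not p))
    return out

def ecdf(values, t):
    """Empirical cumulative distribution function evaluated at threshold t."""
    return sum(v <= t for v in values) / len(values)

# Toy 2-D "descriptor space": clumped actives make the active-to-active curve
# G(t) rise much faster than the decoy-to-active curve F(t), the signature of
# analogue bias that the MUV design workflow removes.
actives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
decoys = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (7.0, 7.0)]
g = ecdf(nn_dist(actives, actives), 1.0)
f = ecdf(nn_dist(decoys, actives), 1.0)
```

    An unbiased set would show G(t) and F(t) rising at comparable rates; in this clumped toy example they diverge completely.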

  6. A new improved artificial bee colony algorithm for ship hull form optimization

    NASA Astrophysics Data System (ADS)

    Huang, Fuxin; Wang, Lijue; Yang, Chi

    2016-04-01

    The artificial bee colony (ABC) algorithm is a relatively new swarm intelligence-based optimization algorithm. Its simplicity of implementation, relatively few parameter settings, and promising optimization capability make it widely used in different fields. However, it suffers from slow convergence due to its solution search equation. Here, a new solution search equation based on a combination of an elite solution pool and a block perturbation scheme is proposed to improve the performance of the algorithm. In addition, two different solution search equations are used by employed bees and onlooker bees to balance the exploration and exploitation of the algorithm. The developed algorithm is validated on a set of well-known numerical benchmark functions. It is then applied to optimize two ship hull forms for minimum resistance. The test results show that the proposed improved ABC algorithm outperforms the original ABC algorithm on most of the tested problems.
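
    For context, the canonical ABC greedy selection and an elite-guided search equation of the general kind the paper proposes can be sketched as follows. The paper's exact elite-pool and block-perturbation equations differ; this toy run minimizes the sphere benchmark function, and all names are illustrative:

```python
import random

def sphere(x):
    """A standard numerical benchmark function: f(x) = sum(x_i^2), minimum 0."""
    return sum(v * v for v in x)

def elite_move(elite, neighbor, x, j):
    """Elite-guided search step (illustrative): v_ij = e_ij + phi*(e_ij - x_kj),
    searching around a solution drawn from an elite pool rather than the
    current one, in the spirit of gbest-guided ABC variants."""
    v = list(x)
    phi = random.uniform(-1.0, 1.0)
    v[j] = elite[j] + phi * (elite[j] - neighbor[j])
    return v

random.seed(1)
dim, colony, elite_size, cycles = 5, 20, 3, 200
pop = [[random.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(colony)]
start_best = min(sphere(x) for x in pop)
for _ in range(cycles):
    elites = sorted(pop, key=sphere)[:elite_size]  # elite solution pool
    for i in range(colony):
        k = random.choice([n for n in range(colony) if n != i])
        j = random.randrange(dim)
        v = elite_move(random.choice(elites), pop[k], pop[i], j)
        if sphere(v) <= sphere(pop[i]):  # greedy selection, as in canonical ABC
            pop[i] = v
best = min(sphere(x) for x in pop)
```

    Biasing the search toward elite solutions is what speeds up convergence relative to the canonical equation v_ij = x_ij + phi*(x_ij - x_kj), at some cost in exploration.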

  7. A community detection algorithm using network topologies and rule-based hierarchical arc-merging strategies

    PubMed Central

    2017-01-01

    The authors use four criteria to examine a novel community detection algorithm: (a) effectiveness, in terms of producing high values of normalized mutual information (NMI) and modularity, using well-known social networks for testing; (b) robustness, meaning the ability to mitigate resolution limit problems, examined using NMI values and synthetic networks; (c) correctness, meaning the ability to identify useful community structure results in terms of NMI values and Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks; and (d) scalability, or the ability to produce comparable modularity values with fast execution times when working with large-scale real-world networks. In addition to describing a simple hierarchical arc-merging (HAM) algorithm that uses network topology information, we introduce rule-based arc-merging strategies for identifying community structures. Five well-studied social network datasets and eight sets of LFR benchmark networks were employed to validate the correctness of a ground-truth community, eight large-scale real-world complex networks were used to measure its efficiency, and two synthetic networks were used to determine its susceptibility to two resolution limit problems. Our experimental results indicate that the proposed HAM algorithm exhibited satisfactory performance efficiency, and that HAM-identified and ground-truth communities were comparable in terms of social and LFR benchmark networks, while mitigating resolution limit problems. PMID:29121100
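
    Normalized mutual information, the main agreement score used above, compares a detected partition against a ground-truth partition. A self-contained sketch using the common square-root normalization (other normalizations of the mutual information exist):

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """Normalized mutual information between two community assignments:
    NMI = I(A;B) / sqrt(H(A) * H(B)), where labels_x[i] is node i's community."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * log(c * n / (ca[a] * cb[b])) for (a, b), c in joint.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:          # a trivial one-community partition
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)

# The same two-way split, merely relabeled, scores a perfect 1.0:
truth = [0, 0, 0, 1, 1, 1]
detected = [1, 1, 1, 0, 0, 0]
score = nmi(truth, detected)
```

    Because NMI is invariant under relabeling, it is well suited to comparing detected communities with ground truth, where community IDs carry no meaning.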

  8. Benchmarking variable-density flow in saturated and unsaturated porous media

    NASA Astrophysics Data System (ADS)

    Guevara Morel, Carlos Roberto; Cremer, Clemens; Graf, Thomas

    2015-04-01

    In natural environments, fluid density and viscosity can be affected by spatial and temporal variations of solute concentration and/or temperature. These variations can occur, for example, due to salt water intrusion in coastal aquifers, leachate infiltration from waste disposal sites, and upconing of saline water from deep aquifers. As a consequence, potentially unstable situations may exist in which a dense fluid overlies a less dense fluid. This situation can produce instabilities that manifest as dense plume fingers that move vertically downwards, counterbalanced by vertical upwards flow of the less dense fluid. The resulting free convection increases solute transport rates over large distances and times relative to constant-density flow. Therefore, the understanding of free convection is relevant for the protection of freshwater aquifer systems. The results from a laboratory experiment of saturated and unsaturated variable-density flow and solute transport (Simmons et al., Transp. Porous Media, 2002) are used as the physical basis to define a mathematical benchmark. The HydroGeoSphere code coupled with PEST is used to estimate the optimal parameter set capable of reproducing the physical model. A grid convergence analysis (in space and time) is also undertaken in order to obtain adequate spatial and temporal discretizations. The new mathematical benchmark is useful for model comparison and testing of variable-density variably saturated flow in porous media.

  9. Benchmarking infrastructure for mutation text mining

    PubMed Central

    2014-01-01

    Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
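
    The performance metrics that the SPARQL queries compute ultimately reduce to set comparisons between system annotations and gold-standard annotations. A pure-Python sketch of that reduction (the annotation tuples below are invented for illustration; the infrastructure itself computes these metrics over RDF):

```python
def prf(gold, predicted):
    """Precision, recall, and F1 of predicted annotations against a gold
    standard, treating each annotation as a (document, mutation) pair."""
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Invented gold and system annotations in HGVS-like mutation notation:
gold = {("doc1", "p.V600E"), ("doc1", "p.T790M"), ("doc2", "p.L858R")}
pred = {("doc1", "p.V600E"), ("doc2", "p.L858R"), ("doc2", "p.G12D")}
p, r, f1 = prf(gold, pred)
```

    Expressing the same counting as SPARQL over RDF annotations is what lets the infrastructure score a system without any custom analysis code.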

  10. Benchmarking infrastructure for mutation text mining.

    PubMed

    Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

    2014-02-25

    Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.

  11. A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design.

    PubMed

    Ó Conchúir, Shane; Barlow, Kyle A; Pache, Roland A; Ollikainen, Noah; Kundert, Kale; O'Meara, Matthew J; Smith, Colin A; Kortemme, Tanja

    2015-01-01

    The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark. We aim to provide such a resource for protocols in macromolecular modeling and design. We present a freely accessible web resource (https://kortemmelab.ucsf.edu/benchmarks) to guide the development of protocols for protein modeling and design. The site provides benchmark datasets and metrics to compare the performance of a variety of modeling protocols using different computational sampling methods and energy functions, providing a "best practice" set of parameters for each method. Each benchmark has an associated downloadable benchmark capture archive containing the input files, analysis scripts, and tutorials for running the benchmark. The captures may be run with any suitable modeling method; we supply command lines for running the benchmarks using the Rosetta software suite. We have compiled initial benchmarks for the resource spanning three key areas: prediction of energetic effects of mutations, protein design, and protein structure prediction, each with associated state-of-the-art modeling protocols. With the help of the wider macromolecular modeling community, we hope to expand the variety of benchmarks included on the website and continue to evaluate new iterations of current methods as they become available.

  12. Issues in Benchmark Metric Selection

    NASA Astrophysics Data System (ADS)

    Crolotte, Alain

    It is true that a metric can influence a benchmark but will esoteric metrics create more problems than they will solve? We answer this question affirmatively by examining the case of the TPC-D metric which used the much debated geometric mean for the single-stream test. We will show how a simple choice influenced the benchmark and its conduct and, to some extent, DBMS development. After examining other alternatives our conclusion is that the “real” measure for a decision-support benchmark is the arithmetic mean.
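
    The contrast between the two means is easy to see on hypothetical single-stream timings: the geometric mean values a 2x speedup of any query equally, while the arithmetic mean rewards attacking the slowest query. A minimal sketch (all timings invented for illustration):

```python
import math

def arithmetic_mean(times):
    return sum(times) / len(times)

def geometric_mean(times):
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Hypothetical single-stream query times (seconds): one slow, one medium,
# one fast query. Each tuned system halves exactly one query.
base = [100.0, 10.0, 1.0]
tune_slow = [50.0, 10.0, 1.0]   # halve the dominant slow query
tune_fast = [100.0, 10.0, 0.5]  # halve the already-fast query

# The arithmetic mean rewards speeding up the dominant slow query...
slow_wins = arithmetic_mean(tune_slow) < arithmetic_mean(tune_fast)
# ...while the geometric mean treats any 2x speedup identically, the property
# that made the TPC-D single-stream metric contentious.
geo_equal = math.isclose(geometric_mean(tune_slow), geometric_mean(tune_fast))
```

    Under the geometric mean, a vendor could improve the reported metric by optimizing trivial queries, which is the distortion the article argues against.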

  13. Production and Testing of the VITAMIN-B7 Fine-Group and BUGLE-B7 Broad-Group Coupled Neutron/Gamma Cross-Section Libraries Derived from ENDF/B-VII.0 Nuclear Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Risner, J. M.; Wiarda, D.; Dunn, M. E.

    2011-09-30

    New coupled neutron-gamma cross-section libraries have been developed for use in light water reactor (LWR) shielding applications, including pressure vessel dosimetry calculations. The libraries, which were generated using Evaluated Nuclear Data File/B Version VII Release 0 (ENDF/B-VII.0), use the same fine-group and broad-group energy structures as the VITAMIN-B6 and BUGLE-96 libraries. The processing methodology used to generate both libraries is based on the methods used to develop VITAMIN-B6 and BUGLE-96 and is consistent with ANSI/ANS 6.1.2. The ENDF data were first processed into the fine-group pseudo-problem-independent VITAMIN-B7 library and then collapsed into the broad-group BUGLE-B7 library. The VITAMIN-B7 library contains data for 391 nuclides, a significant increase compared to the VITAMIN-B6 library, which contained data for 120 nuclides. The BUGLE-B7 library contains data for the same nuclides as BUGLE-96 and maintains the same numeric IDs for those nuclides. The broad-group data include nuclides that are infinitely dilute and group collapsed using a concrete weighting spectrum, as well as nuclides that are self-shielded and group collapsed using weighting spectra representative of important regions of LWRs. The verification and validation of the new libraries include a set of critical benchmark experiments, a set of regression tests that are used to evaluate multigroup cross-section libraries in the SCALE code system, and three pressure vessel dosimetry benchmarks. Results of these tests confirm that the new libraries are appropriate for use in LWR shielding analyses and meet the requirements of Regulatory Guide 1.190.
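
    The broad-group collapse mentioned above is, at its core, flux-weighted averaging over the fine groups assigned to each broad group. A minimal sketch of that operation with invented numbers (real library processing weights with, e.g., LWR region spectra and handles self-shielding separately):

```python
def collapse(sigma_fine, flux_fine, groups):
    """Flux-weighted group collapse:
    sigma_G = sum_g(sigma_g * phi_g) / sum_g(phi_g) over the fine groups g
    in each broad group G, preserving reaction rates for the chosen spectrum."""
    broad = []
    for lo, hi in groups:  # fine-group index ranges [lo, hi)
        num = sum(sigma_fine[g] * flux_fine[g] for g in range(lo, hi))
        den = sum(flux_fine[g] for g in range(lo, hi))
        broad.append(num / den)
    return broad

# Toy 4-fine-group cross section collapsed to 2 broad groups with a
# made-up weighting spectrum:
sigma = [10.0, 8.0, 2.0, 1.0]   # barns per fine group
phi = [1.0, 3.0, 2.0, 2.0]      # weighting flux per fine group
broad = collapse(sigma, phi, [(0, 2), (2, 4)])  # -> [8.5, 1.5]
```

    The choice of weighting spectrum is exactly why BUGLE-style libraries ship multiple collapsed sets for different LWR regions.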

  14. Monte Carlo Perturbation Theory Estimates of Sensitivities to System Dimensions

    DOE PAGES

    Burke, Timothy P.; Kiedrowski, Brian C.

    2017-12-11

    Here, Monte Carlo methods are developed using adjoint-based perturbation theory and the differential operator method to compute the sensitivities of the k-eigenvalue, linear functions of the flux (reaction rates), and bilinear functions of the forward and adjoint flux (kinetics parameters) to system dimensions for uniform expansions or contractions. The calculation of sensitivities to system dimensions requires computing scattering and fission sources at material interfaces using collisions occurring at the interface, which is a set of events with infinitesimal probability. Kernel density estimators are used to estimate the source at interfaces using collisions occurring near the interface. The methods for computing sensitivities of linear and bilinear ratios are derived using the differential operator method and adjoint-based perturbation theory and are shown to be equivalent to methods previously developed using a collision history-based approach. The methods for determining sensitivities to system dimensions are tested on a series of fast, intermediate, and thermal critical benchmarks as well as a pressurized water reactor benchmark problem with iterated fission probability used for adjoint weighting. The estimators are shown to agree within 5% and 3σ of reference solutions obtained using direct perturbations with central differences for the majority of test problems.
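
    The reference solutions mentioned at the end come from directly perturbing a dimension and differencing the responses. A minimal sketch of such a central-difference relative sensitivity, using an analytic function as a stand-in for a k-eigenvalue solve (all names illustrative):

```python
def central_diff_sensitivity(f, d, delta=1e-4):
    """Relative sensitivity S = (d/k) * dk/dd estimated by central differences.
    This is the direct-perturbation style of reference calculation against
    which the adjoint/differential-operator estimators are verified."""
    k0 = f(d)
    dk = (f(d + delta) - f(d - delta)) / (2.0 * delta)
    return d / k0 * dk

# Analytic stand-in: for k(d) = c*d^2 the exact relative sensitivity is 2,
# independent of c and d.
s = central_diff_sensitivity(lambda d: 3.0 * d * d, 2.0)
```

    In the Monte Carlo setting each evaluation of f is a full transport calculation with statistical noise, which is what makes single-run perturbation estimators attractive.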

  15. CyClus: a fast, comprehensive cylindrical interface approximation clustering/reranking method for rigid-body protein-protein docking decoys.

    PubMed

    Omori, Satoshi; Kitao, Akio

    2013-06-01

    We propose a fast clustering and reranking method, CyClus, for protein-protein docking decoys. This method enables comprehensive clustering of whole decoy sets generated by rigid-body docking, using cylindrical approximation of the protein-protein interface and hierarchical clustering procedures. We demonstrate the clustering and reranking of 54,000 decoy structures generated by ZDOCK for each complex within a few minutes. After parameter tuning for the test set in ZDOCK benchmark 2.0 with the ZDOCK and ZRANK scoring functions, blind tests for the incremental data in ZDOCK benchmarks 3.0 and 4.0 were conducted. CyClus successfully generated smaller subsets of decoys containing near-native decoys. For example, the number of decoys required to create subsets containing near-native decoys with 80% probability was reduced to between 22% and 50% of the number required in the original ZDOCK. Although specific ZDOCK and ZRANK results were demonstrated, the CyClus algorithm was designed to be more general and can be applied to a wide range of decoys and scoring functions by adjusting just two parameters, p and T. CyClus results were also compared to those from ClusPro. Copyright © 2013 Wiley Periodicals, Inc.

  16. Benchmark Lisp And Ada Programs

    NASA Technical Reports Server (NTRS)

    Davis, Gloria; Galant, David; Lim, Raymond; Stutz, John; Gibson, J.; Raghavan, B.; Cheesema, P.; Taylor, W.

    1992-01-01

    Suite of nonparallel benchmark programs, ELAPSE, designed for three tests: comparing efficiency of computer processing via Lisp vs. Ada; comparing efficiencies of several computers processing via Lisp; or comparing several computers processing via Ada. Tests efficiency with which computer executes routines in each language. Available for computer equipped with validated Ada compiler and/or Common Lisp system.

  17. Optimization of Deep Drilling Performance - Development and Benchmark Testing of Advanced Diamond Product Drill Bits & HP/HT Fluids to Significantly Improve Rates of Penetration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alan Black; Arnis Judzis

    2005-09-30

    This document details the progress to date on the OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS AND HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION contract for the year starting October 2004 through September 2005. The industry cost-shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through a team development of aggressive diamond product drill bit-fluid system technologies. Overall the objectives are as follows: Phase 1--Benchmark "best in class" diamond and other product drilling bits and fluids and develop concepts for a next level of deep drilling performance; Phase 2--Develop advanced smart bit-fluid prototypes and test at large scale; and Phase 3--Field trial smart bit-fluid concepts, modify as necessary, and commercialize products. As of the report date, TerraTek has concluded all Phase 1 testing and is planning Phase 2 development.

  18. Assessing validity of observational intervention studies – the Benchmarking Controlled Trials

    PubMed Central

    Malmivaara, Antti

    2016-01-01

    Abstract Background: Benchmarking Controlled Trial (BCT) is a concept which covers all observational studies aiming to assess the impact of interventions or health care system features on patients and populations. Aims: To create and pilot test a checklist for appraising the methodological validity of a BCT. Methods: The checklist was created by extracting the most essential elements from the comprehensive set of criteria in the previous paper on BCTs. Checklists and scientific papers on observational studies and respective systematic reviews were also utilized. Ten BCTs published in the Lancet and in the New England Journal of Medicine were used to assess the feasibility of the created checklist. Results: The appraised studies seem to have several methodological limitations, some of which could be avoided in the planning, conducting, and reporting phases of the studies. Conclusions: The checklist can be used for planning, conducting, reporting, reviewing, and critical reading of observational intervention studies. However, the piloted checklist should be validated in further studies. Key messages: Benchmarking Controlled Trial (BCT) is a concept which covers all observational studies aiming to assess the impact of interventions or health care system features on patients and populations. This paper presents a checklist for appraising the methodological validity of BCTs and pilot-tests the checklist with ten BCTs published in leading medical journals. The appraised studies seem to have several methodological limitations, some of which could be avoided in the planning, conducting, and reporting phases of the studies. The checklist can be used for planning, conducting, reporting, reviewing, and critical reading of observational intervention studies. PMID:27238631

  19. Construct Validity of Fresh Frozen Human Cadaver as a Training Model in Minimal Access Surgery

    PubMed Central

    Macafee, David; Pranesh, Nagarajan; Horgan, Alan F.

    2012-01-01

    Background: The construct validity of fresh human cadaver as a training tool has not been established previously. The aims of this study were to investigate the construct validity of fresh frozen human cadaver as a method of training in minimal access surgery and determine if novices can be rapidly trained using this model to a safe level of performance. Methods: Junior surgical trainees, novices (<3 laparoscopic procedure performed) in laparoscopic surgery, performed 10 repetitions of a set of structured laparoscopic tasks on fresh frozen cadavers. Expert laparoscopists (>100 laparoscopic procedures) performed 3 repetitions of identical tasks. Performances were scored using a validated, objective Global Operative Assessment of Laparoscopic Skills scale. Scores for 3 consecutive repetitions were compared between experts and novices to determine construct validity. Furthermore, to determine if the novices reached a safe level, a trimmed mean of the experts score was used to define a benchmark. Mann-Whitney U test was used for construct validity analysis and 1-sample t test to compare performances of the novice group with the benchmark safe score. Results: Ten novices and 2 experts were recruited. Four out of 5 tasks (nondominant to dominant hand transfer; simulated appendicectomy; intracorporeal and extracorporeal knot tying) showed construct validity. Novices’ scores became comparable to benchmark scores between the eighth and tenth repetition. Conclusion: Minimal access surgical training using fresh frozen human cadavers appears to have construct validity. The laparoscopic skills of novices can be accelerated through to a safe level within 8 to 10 repetitions. PMID:23318058

  20. Benchmarking contactless acquisition sensor reproducibility for latent fingerprint trace evidence

    NASA Astrophysics Data System (ADS)

    Hildebrandt, Mario; Dittmann, Jana

    2015-03-01

    Optical, nano-meter range, contactless, non-destructive sensor devices are promising acquisition techniques in crime scene trace forensics, e.g. for digitizing latent fingerprint traces. Before new approaches are introduced in crime investigations, innovations need to be positively tested and quality ensured. In this paper we investigate sensor reproducibility by studying different scans from four sensors: two chromatic white light sensors (CWL600/CWL1mm), one confocal laser scanning microscope, and one NIR/VIS/UV reflection spectrometer. Firstly, we perform intra-sensor reproducibility testing for CWL600 with a privacy-conform test set of artificial-sweat printed, computer-generated fingerprints. We use 24 different fingerprint patterns as original samples (printing samples/templates) for printing with artificial sweat (physical trace samples) and their acquisition with contactless sensory, resulting in 96 sensor images, called scan or acquired samples. The second test set for inter-sensor reproducibility assessment consists of the first three patterns from the first test set, acquired in two consecutive scans using each device. We suggest using a simple feature space set in the spatial and frequency domains known from signal processing and test its suitability for six different classifiers classifying scan data into small differences (reproducible) and large differences (non-reproducible). Furthermore, we suggest comparing the classification results with biometric verification scores (calculated with NBIS, with a threshold of 40) as a biometric reproducibility score. The Bagging classifier is in nearly all cases the most reliable classifier in our experiments, and the results are also confirmed by the biometric matching rates.

  1. PSO algorithm enhanced with Lozi Chaotic Map - Tuning experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pluhacek, Michal; Senkerik, Roman; Zelinka, Ivan

    2015-03-10

    This paper investigates the effect of tuning the control parameters of the Lozi chaotic map employed as a chaotic pseudo-random number generator for the particle swarm optimization (PSO) algorithm. Three different benchmark functions are selected from the IEEE CEC 2013 competition benchmark set. The Lozi map is extensively tuned and the performance of PSO is evaluated.
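
    The Lozi map itself is a two-dimensional piecewise-linear chaotic map; used as a chaotic pseudo-random number generator, its orbit is typically normalized into [0, 1] and substituted for the uniform random draws in the PSO velocity update. A minimal sketch (a = 1.7, b = 0.5 is the classic chaotic setting; the paper's tuning explores the parameter space):

```python
def lozi_sequence(n, a=1.7, b=0.5, x=0.1, y=0.1):
    """Iterate the Lozi map: x' = 1 - a*|x| + y, y' = b*x."""
    xs = []
    for _ in range(n):
        x, y = 1.0 - a * abs(x) + y, b * x
        xs.append(x)
    return xs

def chaos_rand(n, **params):
    """Min-max normalize the chaotic orbit into [0, 1] so it can replace the
    uniform random draws in the PSO velocity update (a common chaotic-PRNG
    embedding; the paper's exact scheme may differ)."""
    xs = lozi_sequence(n, **params)
    lo, hi = min(xs), max(xs)
    return [(v - lo) / (hi - lo) for v in xs]

r = chaos_rand(1000)  # deterministic, reproducible "random" stream
```

    Unlike a seeded conventional PRNG, the statistical character of this stream shifts with a and b, which is why tuning the map parameters changes PSO performance.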

  2. A high-fidelity airbus benchmark for system fault detection and isolation and flight control law clearance

    NASA Astrophysics Data System (ADS)

    Goupil, Ph.; Puyou, G.

    2013-12-01

    This paper presents a high-fidelity generic twin engine civil aircraft model developed by Airbus for advanced flight control system research. The main features of this benchmark are described to make the reader aware of the model complexity and representativeness. It is a complete representation including the nonlinear rigid-body aircraft model with a full set of control surfaces, actuator models, sensor models, flight control laws (FCL), and pilot inputs. Two applications of this benchmark in the framework of European projects are presented: FCL clearance using optimization and advanced fault detection and diagnosis (FDD).

  3. Evaluating performance of biomedical image retrieval systems – an overview of the medical image retrieval task at ImageCLEF 2004–2013

    PubMed Central

    Kalpathy-Cramer, Jayashree; de Herrera, Alba García Seco; Demner-Fushman, Dina; Antani, Sameer; Bedrick, Steven; Müller, Henning

    2014-01-01

    Medical image retrieval and classification have been extremely active research topics over the past 15 years. With the ImageCLEF benchmark in medical image retrieval and classification, a standard test bed was created that allows researchers to compare their approaches and ideas on increasingly large and varied data sets including generated ground truth. This article describes the lessons learned in ten evaluation campaigns. A detailed analysis of the data also highlights the value of the resources created. PMID:24746250

  4. Towards routine determination of focal mechanisms obtained from first motion P-wave arrivals

    NASA Astrophysics Data System (ADS)

    Lentas, K.

    2018-03-01

    The Bulletin of the International Seismological Centre (ISC) contains information on earthquake mechanisms collected from many different sources including national and global agencies, resulting in satisfactory coverage over a wide magnitude range (M ˜2-9). Nevertheless, there is still a vast number of earthquakes with no reported source mechanisms, especially for magnitudes up to 5. This study investigates the possibility of calculating earthquake focal mechanisms in a routine and systematic way based on P-wave first-motion polarities. Any available parametric data in the ISC database are used, as well as auto-picked polarities from waveform data up to teleseismic epicentral distances (90°) for stations that are not reported to the ISC. The determination of the earthquake mechanisms is carried out with a modified version of the HASH algorithm that is compatible with a wide range of epicentral distances and takes into account the ellipsoids defined by the ISC location errors, and the Earth's structure uncertainties. Initially, benchmark tests for a set of ISC-reviewed earthquakes (mb > 4.5) are carried out and the HASH mechanism classification scheme is used to define the mechanism quality. Focal mechanisms of quality A, B and C with an azimuthal gap up to 90° compare well to the benchmark mechanisms. Nevertheless, the majority of the obtained mechanisms fall into class D as a result of limited polarity data from stations at local/regional epicentral distances. Specifically, the computation of the minimum rotation angle between the obtained mechanisms and the benchmarks reveals that 41 per cent of the examined earthquakes show rotation angles up to 35°. Finally, the current technique is applied to a small set of earthquakes from the reviewed ISC bulletin, and source mechanisms are successfully obtained for 62 earthquakes with no previously reported mechanisms.

  5. jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints

    PubMed Central

    2011-01-01

    Background The decomposition of a chemical graph is a convenient approach to encode information of the corresponding organic compound. While several commercial toolkits exist to encode molecules as so-called fingerprints, only a few open source implementations are available. The aim of this work is to introduce a library for exactly defined molecular decompositions, with a strong focus on the application of these features in machine learning and data mining. It provides several options such as search depth, distance cut-offs, atom- and pharmacophore typing. Furthermore, it provides the functionality to combine, to compare, or to export the fingerprints into several formats. Results We provide a Java 1.6 library for the decomposition of chemical graphs based on the open source Chemistry Development Kit toolkit. We reimplemented popular fingerprinting algorithms such as depth-first search fingerprints, extended connectivity fingerprints, autocorrelation fingerprints (e.g. CATS2D), radial fingerprints (e.g. Molprint2D), geometrical Molprint, atom pairs, and pharmacophore fingerprints. We also implemented custom fingerprints such as the all-shortest path fingerprint that only includes the subset of shortest paths from the full set of paths of the depth-first search fingerprint. As an application of jCompoundMapper, we provide a command-line executable binary. We measured the conversion speed and number of features for each encoding and described the composition of the features in detail. The quality of the encodings was tested using the default parametrizations in combination with a support vector machine on the Sutherland QSAR data sets. Additionally, we benchmarked the fingerprint encodings on the large-scale Ames toxicity benchmark using a large-scale linear support vector machine. The results were promising and could often compete with literature results. 
    On the large Ames benchmark, for example, we obtained an AUC ROC performance of 0.87 with a reimplementation of the extended connectivity fingerprint. This result is comparable to the performance achieved by a non-linear support vector machine using state-of-the-art descriptors. On the Sutherland QSAR data set, the best fingerprint encodings showed a comparable or better performance on 5 of the 8 benchmarks when compared against the results of the best descriptors published in the paper of Sutherland et al. Conclusions jCompoundMapper is a library for chemical graph fingerprints with several tweaking possibilities and exporting options for open source data mining toolkits. The quality of the data mining results, the conversion speed, the LGPL software license, the command-line interface, and the exporters should be useful for many applications in cheminformatics like benchmarks against literature methods, comparison of data mining algorithms, similarity searching, and similarity-based data mining. PMID:21219648

  6. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.

    PubMed

    Teodoro, Douglas; Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio

    2018-01-01

    The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.

  7. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers

    PubMed Central

    Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio

    2018-01-01

    The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms. PMID:29293556

  8. The limitations of simple gene set enrichment analysis assuming gene independence.

    PubMed

    Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

    2016-02-01

    Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. 2009 as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, due to the significant variance inflation they produce in the enrichment scores, and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. © The Author(s) 2012.
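
    The variance-inflation argument can be made concrete with a standard result: for a set of n genes whose statistics share a common pairwise correlation ρ, the variance of the mean statistic is inflated by the factor 1 + (n − 1)ρ relative to the independent case. The function names below are our own sketch, not code from the paper.

```python
def mean_stat_variance(n_genes, sigma2, rho):
    """Variance of the mean of n equally correlated gene-level statistics:
    Var = (sigma^2 / n) * (1 + (n - 1) * rho)."""
    return sigma2 / n_genes * (1.0 + (n_genes - 1) * rho)

def inflation_factor(n_genes, rho):
    """Ratio of correlated-case to independent-case variance of the set score."""
    return (mean_stat_variance(n_genes, 1.0, rho)
            / mean_stat_variance(n_genes, 1.0, 0.0))

# A modest within-set correlation of 0.3 across a 50-gene set inflates the
# variance of the mean score almost 16-fold (1 + 49 * 0.3 = 15.7), which is
# why a null distribution that assumes independence is badly miscalibrated.
factor = inflation_factor(50, 0.3)
```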

  9. Simplified Numerical Analysis of ECT Probe - Eddy Current Benchmark Problem 3

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sikora, R.; Chady, T.; Gratkowski, S.

    2005-04-09

    In this paper a third eddy current benchmark problem is considered. The objective of the benchmark is to determine the optimal operating frequency and size of the pancake coil designed for testing tubes made of Inconel. This is achieved by maximizing the change in impedance of the coil due to a flaw. Approximation functions of the probe (coil) characteristic were developed and used in order to reduce the number of required calculations, resulting in a significant speed-up of the optimization process. An optimal testing frequency and probe size were obtained as the final result of the calculation.

  10. Simulated annealing with restart strategy for the blood pickup routing problem

    NASA Astrophysics Data System (ADS)

    Yu, V. F.; Iswari, T.; Normasari, N. M. E.; Asih, A. M. S.; Ting, H.

    2018-04-01

    This study develops a simulated annealing heuristic with restart strategy (SA_RS) for solving the blood pickup routing problem (BPRP). BPRP minimizes the total length of the routes for blood bag collection between a blood bank and a set of donation sites, each associated with a time window constraint that must be observed. The proposed SA_RS is implemented in C++ and tested on benchmark instances of the vehicle routing problem with time windows to verify its performance. The algorithm is then tested on some newly generated BPRP instances and the results are compared with those obtained by CPLEX. Experimental results show that the proposed SA_RS heuristic effectively solves BPRP.
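
    The restart idea can be sketched generically: run standard simulated annealing and, when no improvement occurs for a fixed number of iterations, reheat and restart from the best solution found so far. The following is a generic illustration on a 1-D function, not the SA_RS implementation from the paper; all parameter names and values are our own assumptions.

```python
import math
import random

def sa_with_restart(f, x0, t0=1.0, cooling=0.95, stall_limit=50,
                    n_iter=2000, step=0.5, seed=0):
    """Minimise f by simulated annealing; restart from the incumbent
    best (and reheat) whenever the search stalls."""
    rng = random.Random(seed)
    x, t = x0, t0
    best_x, best_f, stall = x0, f(x0), 0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)
        delta = f(cand) - f(x)
        # Metropolis acceptance: always take improvements, sometimes worse moves.
        if delta < 0 or rng.random() < math.exp(-delta / max(t, 1e-12)):
            x = cand
        if f(x) < best_f:
            best_x, best_f, stall = x, f(x), 0
        else:
            stall += 1
        if stall >= stall_limit:          # restart strategy:
            x, t, stall = best_x, t0, 0   # reheat from the incumbent best
        else:
            t *= cooling
    return best_x, best_f

# Example: minimise (x - 2)^2 starting far from the optimum.
xb, fb = sa_with_restart(lambda x: (x - 2.0) ** 2, x0=10.0)
```

    For BPRP the state would be a set of routes and the move operator a route perturbation respecting time windows, but the restart/reheat skeleton is unchanged.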

  11. Mathematical model and metaheuristics for simultaneous balancing and sequencing of a robotic mixed-model assembly line

    NASA Astrophysics Data System (ADS)

    Li, Zixiang; Janardhanan, Mukund Nilakantan; Tang, Qiuhua; Nielsen, Peter

    2018-05-01

    This article presents the first method to simultaneously balance and sequence robotic mixed-model assembly lines (RMALB/S), which involves three sub-problems: task assignment, model sequencing and robot allocation. A new mixed-integer programming model is developed to minimize makespan and, using CPLEX solver, small-size problems are solved for optimality. Two metaheuristics, the restarted simulated annealing algorithm and co-evolutionary algorithm, are developed and improved to address this NP-hard problem. The restarted simulated annealing method replaces the current temperature with a new temperature to restart the search process. The co-evolutionary method uses a restart mechanism to generate a new population by modifying several vectors simultaneously. The proposed algorithms are tested on a set of benchmark problems and compared with five other high-performing metaheuristics. The proposed algorithms outperform their original editions and the benchmarked methods. The proposed algorithms are able to solve the balancing and sequencing problem of a robotic mixed-model assembly line effectively and efficiently.

  12. Image segmentation with a novel regularized composite shape prior based on surrogate study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Tingting, E-mail: tingtingzhao@mednet.ucla.edu; Ruan, Dan, E-mail: druan@mednet.ucla.edu

    Purpose: Incorporating training into image segmentation is a good approach to achieve additional robustness. This work aims to develop an effective strategy to utilize shape prior knowledge, so that the segmentation label evolution can be driven toward the desired global optimum. Methods: In the variational image segmentation framework, a regularization for the composite shape prior is designed to incorporate the geometric relevance of individual training data to the target, which is inferred by an image-based surrogate relevance metric. Specifically, this regularization is imposed on the linear weights of composite shapes and serves as a hyperprior. The overall problem is formulated in a unified optimization setting and a variational block-descent algorithm is derived. Results: The performance of the proposed scheme is assessed in both corpus callosum segmentation from an MR image set and clavicle segmentation based on CT images. The resulting shape composition provides a proper preference for the geometrically relevant training data. A paired Wilcoxon signed rank test demonstrates statistically significant improvement of image segmentation accuracy when compared to the multiatlas label fusion method and three other benchmark active contour schemes. Conclusions: This work has developed a novel composite shape prior regularization, which achieves superior segmentation performance compared to typical benchmark schemes.

  13. Contra-Rotating Open Rotor Tone Noise Prediction

    NASA Technical Reports Server (NTRS)

    Envia, Edmane

    2014-01-01

    Reliable prediction of contra-rotating open rotor (CROR) noise is an essential element of any strategy for the development of low-noise open rotor propulsion systems that can meet both the community noise regulations and the cabin noise limits. Since CROR noise spectra typically exhibits a preponderance of tones, significant efforts have been directed towards predicting their tone spectra. To that end, there has been an ongoing effort at NASA to assess various in-house open rotor tone noise prediction tools using a benchmark CROR blade set for which significant aerodynamic and acoustic data had been acquired in wind tunnel tests. In the work presented here, the focus is on the near-field noise of the benchmark open rotor blade set at the cruise condition. Using an analytical CROR tone noise model with input from high-fidelity aerodynamic simulations, detailed tone noise spectral predictions have been generated and compared with the experimental data. Comparisons indicate that the theoretical predictions are in good agreement with the data, especially for the dominant CROR tones and their overall sound pressure level. The results also indicate that, whereas individual rotor tones are well predicted by the linear sources (i.e., thickness and loading), for the interaction tones it is essential that the quadrupole sources be included in the analysis.

  14. Contra-Rotating Open Rotor Tone Noise Prediction

    NASA Technical Reports Server (NTRS)

    Envia, Edmane

    2014-01-01

    Reliable prediction of contra-rotating open rotor (CROR) noise is an essential element of any strategy for the development of low-noise open rotor propulsion systems that can meet both the community noise regulations and cabin noise limits. Since CROR noise spectra exhibit a preponderance of tones, significant efforts have been directed towards predicting their tone content. To that end, there has been an ongoing effort at NASA to assess various in-house open rotor tone noise prediction tools using a benchmark CROR blade set for which significant aerodynamic and acoustic data have been acquired in wind tunnel tests. In the work presented here, the focus is on the nearfield noise of the benchmark open rotor blade set at the cruise condition. Using an analytical CROR tone noise model with input from high-fidelity aerodynamic simulations, tone noise spectra have been predicted and compared with the experimental data. Comparisons indicate that the theoretical predictions are in good agreement with the data, especially for the dominant tones and for the overall sound pressure level of tones. The results also indicate that, whereas the individual rotor tones are well predicted by the combination of the thickness and loading sources, for the interaction tones it is essential that the quadrupole source is also included in the analysis.

  15. Level-set simulations of soluble surfactant driven flows

    NASA Astrophysics Data System (ADS)

    Cleret de Langavant, Charles; Guittet, Arthur; Theillard, Maxime; Temprano-Coleto, Fernando; Gibou, Frédéric

    2017-11-01

    We present an approach to simulate the diffusion, advection and adsorption-desorption of a material quantity defined on an interface in two and three spatial dimensions. We use a level-set approach to capture the interface motion and a Quad/Octree data structure to efficiently solve the equations describing the underlying physics. Coupling with a Navier-Stokes solver enables the study of the effect of soluble surfactants that locally modify the parameters of surface tension on different types of flows. The method is tested on several benchmarks and applied to three typical examples of flows in the presence of surfactant: a bubble in a shear flow, the well-known phenomenon of tears of wine, and the Landau-Levich coating problem.
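
    The core level-set idea, representing an interface as the zero contour of a function advected with the flow, can be sketched in one dimension with a first-order upwind scheme. This is a toy illustration of the capturing technique only; it is unrelated to the Quad/Octree Navier-Stokes solver of the paper, and the grid and time-step values are arbitrary assumptions.

```python
def advect_level_set(phi, velocity, dx, dt, steps):
    """First-order upwind advection of a 1-D level-set function phi;
    the interface is wherever phi changes sign."""
    phi = list(phi)
    for _ in range(steps):
        new = phi[:]
        for i in range(1, len(phi) - 1):
            if velocity > 0:
                dphi = (phi[i] - phi[i - 1]) / dx    # upwind: look left
            else:
                dphi = (phi[i + 1] - phi[i]) / dx    # upwind: look right
            new[i] = phi[i] - dt * velocity * dphi
        phi = new
    return phi

# Interface at x = 0.5 on a unit grid, advected to the right at speed 1
# for a total time of 0.2, so the zero crossing should end up near x = 0.7.
n, dx = 101, 0.01
phi0 = [i * dx - 0.5 for i in range(n)]   # signed distance to x = 0.5
phi = advect_level_set(phi0, 1.0, dx, dt=0.005, steps=40)
```

    Coupling to surfactant transport adds an advection-diffusion equation for the surface concentration and a surface-tension closure, but the interface capture itself reduces to updates of this form.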

  16. Water adsorption on a copper formate paddlewheel model of CuBTC: A comparative MP2 and DFT study

    NASA Astrophysics Data System (ADS)

    Toda, Jordi; Fischer, Michael; Jorge, Miguel; Gomes, José R. B.

    2013-11-01

    Simultaneous adsorption of two water molecules on open metal sites of the HKUST-1 metal-organic framework (MOF), modeled with a Cu2(HCOO)4 cluster, was studied by means of density functional theory (DFT) and second-order Møller-Plesset (MP2) approaches together with correlation consistent basis sets. Experimental geometries and MP2 energetic data extrapolated to the complete basis set limit were used as benchmarks for testing the accuracy of several different exchange-correlation functionals in the correct description of the water-MOF interaction. M06-L and some LC-DFT methods arise as the most appropriate in terms of the quality of geometrical data, energetic data and computational resources needed.

  17. Ranking metrics in gene set enrichment analysis: do they matter?

    PubMed

    Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

    2017-05-12

    There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which can affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics, including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using the k-means clustering algorithm, a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established, i.e. the absolute value of the Moderated Welch Test statistic, Minimum Significant Difference, the absolute value of the Signal-To-Noise ratio and the Baumgartner-Weiss-Schindler test statistic. In the case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In the case of sensitivity, the absolute value of the Moderated Welch Test statistic and the absolute value of the Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample sizes. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has a critical impact on the results of pathway enrichment analysis. The absolute value of the Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using the Baumgartner-Weiss-Schindler test statistic gives better outcomes. It also finds more enriched pathways than the other tested metrics, which may lead to new biological discoveries.
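
    One of the ranking metrics named above is simple to state: the absolute Signal-To-Noise ratio scores each gene by the separation of its group means relative to the within-group spread. The minimal implementation below is our own (it uses population standard deviations and hypothetical expression values); it is not the MrGSEA code.

```python
from statistics import mean, pstdev

def abs_snr(group_a, group_b):
    """Absolute Signal-To-Noise ratio: |mean_a - mean_b| / (sd_a + sd_b)."""
    num = abs(mean(group_a) - mean(group_b))
    den = pstdev(group_a) + pstdev(group_b)
    return num / den if den else float("inf")

# Rank genes by the metric, highest first (hypothetical expression values
# for two conditions with two samples each):
genes = {
    "G1": ([2.0, 4.0], [0.0, 2.0]),   # |3 - 1| / (1 + 1) = 1.0
    "G2": ([1.0, 1.2], [1.1, 1.3]),   # weak separation
}
ranking = sorted(genes, key=lambda g: abs_snr(*genes[g]), reverse=True)
```

    The ranked list produced this way is what the enrichment statistic is then computed over, which is why swapping the metric can change which pathways reach significance.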

  18. MOTIVATION: Goals and Goal Setting

    ERIC Educational Resources Information Center

    Stratton, Richard K.

    2005-01-01

    Goal setting has great impact on a team's performance. Goals enable a team to synchronize their efforts to achieve success. In this article, the author talks about goals and goal setting. This articles complements Domain 5--Teaching and Communication (p.14) and discusses one of the benchmarks listed therein: "Teach the goal setting process and…

  19. Generation of openEHR Test Datasets for Benchmarking.

    PubMed

    El Helou, Samar; Karvonen, Tuukka; Yamamoto, Goshiro; Kume, Naoto; Kobayashi, Shinji; Kondo, Eiji; Hiragi, Shusuke; Okamoto, Kazuya; Tamura, Hiroshi; Kuroda, Tomohiro

    2017-01-01

    openEHR is a widely used EHR specification. Given its technology-independent nature, different approaches for implementing openEHR data repositories exist. Public openEHR datasets are needed to conduct benchmark analyses over different implementations. To address their current unavailability, we propose a method for generating openEHR test datasets that can be publicly shared and used.

  20. Is Higher Better? Determinants and Comparisons of Performance on the Major Field Test in Business

    ERIC Educational Resources Information Center

    Bielinska-Kwapisz, Agnieszka; Brown, F. William; Semenik, Richard

    2012-01-01

    Student performance on the Major Field Achievement Test in Business is an important benchmark for college of business programs. The authors' results indicate that such benchmarking can only be meaningful if certain student characteristics are taken into account. The differences in achievement between cohorts are explored in detail by separating…

  1. But What Do You Do with the Data?

    ERIC Educational Resources Information Center

    Matthews, Jan; Trimble, Susan; Gay, Anne

    2007-01-01

    Using data to redesign instruction is a means of increasing student achievement. Educators in Camden County (Georgia) Schools have used data from benchmark testing since 1999. They hired a commercial vendor to design a benchmark test that is administered four times a year and use the data to generate subject-area reports that can be further…

  2. ViSAPy: a Python tool for biophysics-based generation of virtual spiking activity for evaluation of spike-sorting algorithms.

    PubMed

    Hagen, Espen; Ness, Torbjørn V; Khosrowshahi, Amir; Sørensen, Christina; Fyhn, Marianne; Hafting, Torkel; Franke, Felix; Einevoll, Gaute T

    2015-04-30

    New, silicon-based multielectrodes comprising hundreds or more electrode contacts offer the possibility to record spike trains from thousands of neurons simultaneously. This potential cannot be realized unless accurate, reliable automated methods for spike sorting are developed, in turn requiring benchmarking data sets with known ground-truth spike times. We here present a general simulation tool for computing benchmarking data for evaluation of spike-sorting algorithms entitled ViSAPy (Virtual Spiking Activity in Python). The tool is based on a well-established biophysical forward-modeling scheme and is implemented as a Python package built on top of the neuronal simulator NEURON and the Python tool LFPy. ViSAPy allows for arbitrary combinations of multicompartmental neuron models and geometries of recording multielectrodes. Three example benchmarking data sets are generated, i.e., tetrode and polytrode data mimicking in vivo cortical recordings and microelectrode array (MEA) recordings of in vitro activity in salamander retinas. The synthesized example benchmarking data mimics salient features of typical experimental recordings, for example, spike waveforms depending on interspike interval. ViSAPy goes beyond existing methods as it includes biologically realistic model noise, synaptic activation by recurrent spiking networks, finite-sized electrode contacts, and allows for inhomogeneous electrical conductivities. ViSAPy is optimized to allow for generation of long time series of benchmarking data, spanning minutes of biological time, by parallel execution on multi-core computers. ViSAPy is an open-ended tool as it can be generalized to produce benchmarking data for arbitrary recording-electrode geometries and with various levels of complexity. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  3. RASSP Benchmark 4 Technical Description.

    DTIC Science & Technology

    1998-01-09

    be carried out. Based on results of the study, an implementation of all, or part, of the system described in this benchmark technical description...validate interface and timing constraints. The ISA level of modeling defines the limit of detail expected in the VHDL virtual prototype. It does not...develop a set of candidate architectures and perform an architecture trade-off study. Candidate processor implementations must then be examined for

  4. Building Bridges Between Geoscience and Data Science through Benchmark Data Sets

    NASA Astrophysics Data System (ADS)

    Thompson, D. R.; Ebert-Uphoff, I.; Demir, I.; Gel, Y.; Hill, M. C.; Karpatne, A.; Güereque, M.; Kumar, V.; Cabral, E.; Smyth, P.

    2017-12-01

    The changing nature of observational field data demands richer and more meaningful collaboration between data scientists and geoscientists. Thus, among other efforts, the Working Group on Case Studies of the NSF-funded RCN on Intelligent Systems Research To Support Geosciences (IS-GEO) is developing a framework to strengthen such collaborations through the creation of benchmark datasets. Benchmark datasets provide an interface between disciplines without requiring extensive background knowledge. The goals are to create (1) a means for two-way communication between geoscience and data science researchers; (2) new collaborations, which may lead to new approaches for data analysis in the geosciences; and (3) a public, permanent repository of complex data sets, representative of geoscience problems, useful to coordinate efforts in research and education. The group identified 10 key elements and characteristics for ideal benchmarks:

    - High impact: a problem with high potential impact.
    - Active research area: a group of geoscientists should be eager to continue working on the topic.
    - Challenge: the problem should be challenging for data scientists.
    - Data science generality and versatility: it should stimulate development of new general and versatile data science methods.
    - Rich information content: ideally the data set provides stimulus for analysis at many different levels.
    - Hierarchical problem statement: a hierarchy of suggested analysis tasks, from relatively straightforward to open-ended tasks.
    - Means for evaluating success: data scientists and geoscientists need means to evaluate whether the algorithms are successful and achieve the intended purpose.
    - Quick start guide: an introduction for data scientists on how to easily read the data to enable rapid initial data exploration.
    - Geoscience context: a summary for data scientists of the specific data collection process, instruments used, any pre-processing and the science questions to be answered.
    - Citability: a suitable identifier to facilitate tracking the use of the benchmark later on, e.g. allowing search engines to find all research papers using it.

    A first sample benchmark developed in collaboration with the Jet Propulsion Laboratory (JPL) deals with the automatic analysis of imaging spectrometer data to detect significant methane sources in the atmosphere.

  5. A new numerical benchmark for variably saturated variable-density flow and transport in porous media

    NASA Astrophysics Data System (ADS)

    Guevara, Carlos; Graf, Thomas

    2016-04-01

    In subsurface hydrological systems, spatial and temporal variations in solute concentration and/or temperature may affect fluid density and viscosity. These variations can lead to potentially unstable situations, in which a dense fluid overlies a less dense fluid. Such situations can produce instabilities that appear as dense plume fingers migrating downwards, counteracted by vertical upward flow of freshwater (Simmons et al., Transp. Porous Medium, 2002). As a result of unstable variable-density flow, solute transport rates are increased over large distances and times as compared to constant-density flow. The numerical simulation of variable-density flow in saturated and unsaturated media requires corresponding benchmark problems against which a computer model is validated (Diersch and Kolditz, Adv. Water Resour, 2002). Recorded data from a laboratory-scale experiment of variable-density flow and solute transport in saturated and unsaturated porous media (Simmons et al., Transp. Porous Medium, 2002) are used to define a new numerical benchmark. The HydroGeoSphere code (Therrien et al., 2004) coupled with PEST (www.pesthomepage.org) is used to obtain an optimized parameter set capable of adequately representing the data set of Simmons et al. (2002). Fingering in the numerical model is triggered using random hydraulic conductivity fields. Due to the inherent randomness, a large number of simulations were conducted in this study. The optimized benchmark model adequately predicts the plume behavior and the fate of solutes. This benchmark is useful for model verification of variable-density flow problems in saturated and/or unsaturated media.

  6. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. The tests also explored the effects of scaling the simulation with the number of processors, in addition to distributing a constant-size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with Ethernet and ATM networking, plus a shared-memory SGI Power Challenge L workstation. The results indicate that network capabilities significantly influence parallel efficiency: the shared-memory machine is fastest and ATM networking provides acceptable performance, whereas the limitations of Ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
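    The standard metrics behind such a study are speedup S = T1/Tp and parallel efficiency E = S/p for p processors. A minimal sketch, with purely hypothetical timings (not the reported ALLSPD-3D measurements):

    ```python
    # Speedup and parallel efficiency for a fixed-size problem.
    # The timing values below are invented for illustration only.

    def speedup(t_serial: float, t_parallel: float) -> float:
        """Speedup S = T1 / Tp."""
        return t_serial / t_parallel

    def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
        """Parallel efficiency E = S / p; 1.0 means perfect scaling."""
        return speedup(t_serial, t_parallel) / p

    t1 = 1000.0                                  # hypothetical 1-CPU wall time (s)
    timings = {2: 520.0, 4: 280.0, 8: 160.0}     # hypothetical p -> Tp
    for p, tp in timings.items():
        print(f"p={p}: S={speedup(t1, tp):.2f}, E={efficiency(t1, tp, p):.2f}")
    ```

    Slow interconnects (such as the Ethernet case above) show up directly as efficiency falling well below 1.0 as p grows.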

  7. Standardised Benchmarking in the Quest for Orthologs

    PubMed Central

    Altenhoff, Adrian M.; Boeckmann, Brigitte; Capella-Gutierrez, Salvador; Dalquen, Daniel A.; DeLuca, Todd; Forslund, Kristoffer; Huerta-Cepas, Jaime; Linard, Benjamin; Pereira, Cécile; Pryszcz, Leszek P.; Schreiber, Fabian; Sousa da Silva, Alan; Szklarczyk, Damian; Train, Clément-Marie; Bork, Peer; Lecompte, Odile; von Mering, Christian; Xenarios, Ioannis; Sjölander, Kimmen; Juhl Jensen, Lars; Martin, Maria J.; Muffato, Matthieu; Gabaldón, Toni; Lewis, Suzanna E.; Thomas, Paul D.; Sonnhammer, Erik; Dessimoz, Christophe

    2016-01-01

    The identification of evolutionarily related genes across different species—orthologs in particular—forms the backbone of many comparative, evolutionary, and functional genomic analyses. Achieving high accuracy in orthology inference is thus essential. Yet the true evolutionary history of genes, required to ascertain orthology, is generally unknown. Furthermore, orthologs are used for very different applications across different phyla, with different requirements in terms of the precision-recall trade-off. As a result, assessing the performance of orthology inference methods remains difficult for both users and method developers. Here, we present a community effort to establish standards in orthology benchmarking and facilitate orthology benchmarking through an automated web-based service (http://orthology.benchmarkservice.org). Using this new service, we characterise the performance of 15 well-established orthology inference methods and resources on a battery of 20 different benchmarks. Standardised benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimal requirement for new tools and resources, and guides the development of more accurate orthology inference methods. PMID:27043882

  8. Benchmarking CRISPR on-target sgRNA design.

    PubMed

    Yan, Jifang; Chuai, Guohui; Zhou, Chi; Zhu, Chenyu; Yang, Jing; Zhang, Chao; Gu, Feng; Xu, Han; Wei, Jia; Liu, Qi

    2017-02-15

    CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-based gene editing has been widely implemented in various cell types and organisms. A major challenge in the effective application of the CRISPR system is the need to design highly efficient single-guide RNA (sgRNA) with minimal off-target cleavage. Several tools are available for sgRNA design, but few have been systematically compared; benchmarking the performance of the available tools and delineating their applicable scenarios are therefore important. Moreover, whether the reported sgRNA design rules are reproducible across different sgRNA libraries, cell types and organisms remains unclear. In our study, a systematic and unbiased benchmark of sgRNA prediction efficacy was performed on nine representative on-target design tools, based on six benchmark data sets covering five different cell types. The benchmark study presented here provides novel quantitative insights into the available CRISPR tools. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. Learning moment-based fast local binary descriptor

    NASA Astrophysics Data System (ADS)

    Bellarbi, Abdelkader; Zenati, Nadia; Otmane, Samir; Belghit, Hayet

    2017-03-01

    Recently, binary descriptors have attracted significant attention due to their speed and low memory consumption; however, using intensity differences to compute the binary descriptor vector is not sufficiently discriminative. We propose an approach to binary description called POLAR_MOBIL, in which we perform binary tests on geometrical and statistical information, computed from moments of the patch, instead of the classical intensity-based binary tests. In addition, we introduce a learning technique used to select an optimized set of binary tests with low correlation and high variance. This approach offers high distinctiveness under affine transformations and appearance changes. An extensive evaluation on well-known benchmark datasets reveals the robustness and effectiveness of the proposed descriptor, as well as its good performance in terms of low computational complexity when compared with state-of-the-art real-time local descriptors.
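    The core idea of a moment-based binary test can be sketched as follows: compute a raw image moment over two sub-regions of the patch and emit one descriptor bit from their comparison. The moment order, region layout, and toy patch below are assumptions made for illustration; this is not the published POLAR_MOBIL descriptor.

    ```python
    # Sketch of a moment-based binary test: one descriptor bit from
    # comparing a raw moment of two patch sub-regions. Regions are
    # (y0, y1, x0, x1) tuples; all choices here are illustrative.

    def raw_moment(patch, p, q):
        """Raw image moment m_pq = sum over pixels of x**p * y**q * I(x, y)."""
        return sum(
            (x ** p) * (y ** q) * val
            for y, row in enumerate(patch)
            for x, val in enumerate(row)
        )

    def moment_binary_test(patch, region_a, region_b, p=1, q=1):
        """One descriptor bit: 1 if moment(region_a) > moment(region_b)."""
        sub = lambda r: [row[r[2]:r[3]] for row in patch[r[0]:r[1]]]
        return int(raw_moment(sub(region_a), p, q)
                   > raw_moment(sub(region_b), p, q))

    patch = [[(x * y) % 7 for x in range(8)] for y in range(8)]  # toy patch
    bit = moment_binary_test(patch, (0, 4, 0, 4), (4, 8, 4, 8))
    ```

    A full descriptor would concatenate many such bits, with the region pairs chosen by the learning step the abstract describes (low correlation, high variance).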

  10. A Firefly-Inspired Method for Protein Structure Prediction in Lattice Models

    PubMed Central

    Maher, Brian; Albrecht, Andreas A.; Loomes, Martin; Yang, Xin-She; Steinhöfel, Kathleen

    2014-01-01

    We introduce a Firefly-inspired algorithmic approach for protein structure prediction over two different lattice models in three-dimensional space. In particular, we consider three-dimensional cubic and three-dimensional face-centred-cubic (FCC) lattices. The underlying energy models are the Hydrophobic-Polar (H-P) model, the Miyazawa–Jernigan (M-J) model and a related matrix model. The implementation of our approach is tested on ten H-P benchmark problems of length 48 and ten M-J benchmark problems of lengths ranging from 48 to 61. The key complexity parameter we investigate is the total number of objective function evaluations required to achieve the optimum energy values for the H-P model, or competitive results in comparison to published values for the M-J model. For H-P instances and cubic lattices, where data for comparison are available, we obtain an average speed-up over eight instances of 2.1, leaving out two extreme values (otherwise, 8.8). For six M-J instances, data for comparison are available for cubic lattices and runs with a population size of 100, where, a priori, the minimum free energy is a termination criterion. The average speed-up over four instances is 1.2 (leaving out two extreme values; otherwise, 1.1), achieved with a population size of only eight. The present study is a test case with initial results for ad hoc parameter settings, with the aim of justifying future research on larger instances within lattice model settings, eventually leading to the ultimate goal of implementations for off-lattice models. PMID:24970205

  11. A firefly-inspired method for protein structure prediction in lattice models.

    PubMed

    Maher, Brian; Albrecht, Andreas A; Loomes, Martin; Yang, Xin-She; Steinhöfel, Kathleen

    2014-01-07

    We introduce a Firefly-inspired algorithmic approach for protein structure prediction over two different lattice models in three-dimensional space. In particular, we consider three-dimensional cubic and three-dimensional face-centred-cubic (FCC) lattices. The underlying energy models are the Hydrophobic-Polar (H-P) model, the Miyazawa-Jernigan (M-J) model and a related matrix model. The implementation of our approach is tested on ten H-P benchmark problems of length 48 and ten M-J benchmark problems of lengths ranging from 48 to 61. The key complexity parameter we investigate is the total number of objective function evaluations required to achieve the optimum energy values for the H-P model, or competitive results in comparison to published values for the M-J model. For H-P instances and cubic lattices, where data for comparison are available, we obtain an average speed-up over eight instances of 2.1, leaving out two extreme values (otherwise, 8.8). For six M-J instances, data for comparison are available for cubic lattices and runs with a population size of 100, where, a priori, the minimum free energy is a termination criterion. The average speed-up over four instances is 1.2 (leaving out two extreme values; otherwise, 1.1), achieved with a population size of only eight. The present study is a test case with initial results for ad hoc parameter settings, with the aim of justifying future research on larger instances within lattice model settings, eventually leading to the ultimate goal of implementations for off-lattice models.

  12. Pairwise Measures of Causal Direction in the Epidemiology of Sleep Problems and Depression

    PubMed Central

    Rosenström, Tom; Jokela, Markus; Puttonen, Sampsa; Hintsanen, Mirka; Pulkki-Råback, Laura; Viikari, Jorma S.; Raitakari, Olli T.; Keltikangas-Järvinen, Liisa

    2012-01-01

    Depressive mood is often preceded by sleep problems, suggesting that they increase the risk of depression. Sleep problems can also reflect a prodromal symptom of depression; thus, temporal precedence alone is insufficient to confirm causality. The authors applied recently introduced statistical causal-discovery algorithms that can estimate causality from cross-sectional samples in order to infer the direction of causality between the two sets of symptoms from a novel perspective. Two population-based samples were used; one from the Young Finns study (690 men and 997 women, average age 37.7 years, range 30–45), and another from the Wisconsin Longitudinal study (3101 men and 3539 women, average age 53.1 years, range 52–55). These included three depression questionnaires (two in the Young Finns data) and two sleep problem questionnaires. Three different causality estimates were constructed for each data set, tested in benchmark data with a (practically) known causality, and tested for assumption violations using simulated data. The causality algorithms performed well in the benchmark data and simulations, and a prediction was drawn for future empirical studies to confirm: for minor depression/dysphoria, sleep problems cause significantly more dysphoria than dysphoria causes sleep problems. The situation may change as depression becomes more severe, or as more severe levels of symptoms are evaluated; artefacts due to severe depression being less well represented in the population data than minor depression may also interfere with the estimation for depression scales that emphasize severe symptoms. The findings are consistent with other emerging epidemiological and biological evidence. PMID:23226400

  13. ForceGen 3D structure and conformer generation: from small lead-like molecules to macrocyclic drugs

    NASA Astrophysics Data System (ADS)

    Cleves, Ann E.; Jain, Ajay N.

    2017-05-01

    We introduce the ForceGen method for 3D structure generation and conformer elaboration of drug-like small molecules. ForceGen is novel, avoiding use of distance geometry, molecular templates, or simulation-oriented stochastic sampling. The method is primarily driven by the molecular force field, implemented using an extension of MMFF94s and a partial charge estimator based on electronegativity-equalization. The force field is coupled to algorithms for direct sampling of realistic physical movements made by small molecules. Results are presented on a standard benchmark from the Cambridge Crystallographic Database of 480 drug-like small molecules, including full structure generation from SMILES strings. Reproduction of protein-bound crystallographic ligand poses is demonstrated on four carefully curated data sets: the ConfGen Set (667 ligands), the PINC cross-docking benchmark (1062 ligands), a large set of macrocyclic ligands (182 total with typical ring sizes of 12-23 atoms), and a commonly used benchmark for evaluating macrocycle conformer generation (30 ligands total). Results compare favorably to alternative methods, and performance on macrocyclic compounds approaches that observed on non-macrocycles while yielding a roughly 100-fold speed improvement over alternative MD-based methods with comparable performance.

  14. featsel: A framework for benchmarking of feature selection algorithms and cost functions

    NASA Astrophysics Data System (ADS)

    Reis, Marcelo S.; Estrela, Gustavo; Ferreira, Carlos Eduardo; Barrera, Junior

    In this paper, we introduce featsel, a framework for benchmarking feature selection algorithms and cost functions. This framework allows the user to treat the search space as a Boolean lattice and has its core coded in C++ for computational efficiency. Moreover, featsel includes Perl scripts to add new algorithms and/or cost functions, generate random instances, plot graphs and organize results into tables. The framework also ships with dozens of algorithms and cost functions for benchmarking experiments. We provide illustrative examples in which featsel outperforms the popular Weka workbench in feature selection procedures on data sets from the UCI Machine Learning Repository.

  15. Taming parallel I/O complexity with auto-tuning

    DOE PAGES

    Behzad, Babak; Luu, Huong Vu Thanh; Huchette, Joseph; ...

    2013-11-17

    We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search a large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The parameter settings are applied transparently by the auto-tuning system via dynamically intercepted HDF5 calls. To validate our auto-tuning system, we applied it to three I/O benchmarks (VPIC, VORPAL, and GCRM) that replicate the I/O activity of their respective applications. We tested the system with different weak-scaling configurations (128, 2048, and 4096 CPU cores) that generate 30 GB to 1 TB of data, and executed these configurations on diverse HPC platforms (Cray XE6, IBM BG/P, and Dell Cluster). In all cases, the auto-tuning framework identified tunable parameters that substantially improved write performance over default system settings. In conclusion, we consistently demonstrate I/O write speedups between 2x and 100x for test configurations.
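    A genetic-algorithm search over a discrete parameter space, of the kind the abstract describes, can be sketched as below. The parameter names, their value lists, and the synthetic fitness function are invented stand-ins for the real HDF5/Lustre/MPI-IO tuning space; the actual system evaluates fitness by running the instrumented application.

    ```python
    # Minimal genetic algorithm over a discrete I/O-parameter space.
    # SPACE and fitness() are illustrative assumptions, not the real
    # auto-tuner's parameters or measured write bandwidths.
    import random

    SPACE = {
        "stripe_count": [4, 8, 16, 32],
        "stripe_size_mb": [1, 4, 16, 64],
        "cb_nodes": [2, 4, 8],
    }

    def fitness(cfg):
        # Synthetic stand-in for measured bandwidth: counts how many
        # parameters match one known-good configuration.
        target = {"stripe_count": 16, "stripe_size_mb": 16, "cb_nodes": 4}
        return sum(cfg[k] == v for k, v in target.items())

    def random_cfg():
        return {k: random.choice(v) for k, v in SPACE.items()}

    def crossover(a, b):
        return {k: random.choice([a[k], b[k]]) for k in SPACE}

    def mutate(cfg, rate=0.2):
        return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
                for k, v in cfg.items()}

    def evolve(generations=30, pop_size=12, seed=0):
        random.seed(seed)
        pop = [random_cfg() for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]       # truncation selection, elitist
            children = [mutate(crossover(random.choice(parents),
                                         random.choice(parents)))
                        for _ in range(pop_size - len(parents))]
            pop = parents + children
        return max(pop, key=fitness)

    best = evolve()
    ```

    Keeping the parent half unmutated makes the search elitist, so the best configuration found never degrades between generations.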

  16. Decreasing unnecessary utilization in acute bronchiolitis care: results from the value in inpatient pediatrics network.

    PubMed

    Ralston, Shawn; Garber, Matthew; Narang, Steve; Shen, Mark; Pate, Brian; Pope, John; Lossius, Michele; Croland, Trina; Bennett, Jeff; Jewell, Jennifer; Krugman, Scott; Robbins, Elizabeth; Nazif, Joanne; Liewehr, Sheila; Miller, Ansley; Marks, Michelle; Pappas, Rita; Pardue, Jeanann; Quinonez, Ricardo; Fine, Bryan R; Ryan, Michael

    2013-01-01

    Acute viral bronchiolitis is the most common diagnosis resulting in hospital admission in pediatrics. Utilization of non-evidence-based therapies and testing remains common despite a large volume of evidence to guide quality improvement efforts. Our objective was to reduce utilization of unnecessary therapies in the inpatient care of bronchiolitis across a diverse network of clinical sites. We formed a voluntary quality improvement collaborative of pediatric hospitalists for the purpose of benchmarking the use of bronchodilators, steroids, chest radiography, chest physiotherapy, and viral testing in bronchiolitis using hospital administrative data. We shared resources within the network, including protocols, scores, order sets, and key bibliographies, and established group norms for decreasing utilization. Aggregate data on 11,568 hospitalizations for bronchiolitis from 17 centers were analyzed for this report. The network was organized in 2008. By 2010, we saw a 46% reduction in the overall volume of bronchodilators used, an absolute decrease of 3.4 doses per patient (95% confidence interval [CI] 1.4-5.8). Overall exposure to any dose of bronchodilator decreased by 12 percentage points as well (95% CI 5%-25%). There was also a statistically significant decline in chest physiotherapy usage, but not for steroids, chest radiography, or viral testing. Benchmarking within a voluntary pediatric hospitalist collaborative facilitated decreased utilization of bronchodilators and chest physiotherapy in bronchiolitis. Copyright © 2012 Society of Hospital Medicine.

  17. Nonparametric estimation of benchmark doses in environmental risk assessment

    PubMed Central

    Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen

    2013-01-01

    An important statistical objective in environmental risk analysis is the estimation of minimum exposure levels, called benchmark doses (BMDs), that induce a pre-specified benchmark response in a dose-response experiment. In such settings, representations of the risk are traditionally based on a parametric dose-response model. It is a well-known concern, however, that if the chosen parametric form is misspecified, inaccurate and possibly unsafe low-dose inferences can result. We apply a nonparametric approach for calculating benchmark doses, based on an isotonic regression method for dose-response estimation with quantal-response data (Bhattacharya and Kong, 2007). We determine the large-sample properties of the estimator, develop bootstrap-based confidence limits on the BMDs, and explore the confidence limits' small-sample properties via a short simulation study. An example from cancer risk assessment illustrates the calculations. PMID:23914133
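    The isotonic-regression idea can be sketched as follows: fit a monotone nondecreasing dose-response curve to the observed quantal proportions with the pool-adjacent-violators algorithm, then read off the smallest dose whose fitted extra risk reaches the benchmark response (BMR). The doses, counts, and BMR below are invented for illustration; this is a sketch of the general technique, not the authors' estimator or its bootstrap confidence limits.

    ```python
    # Nonparametric BMD sketch: pool-adjacent-violators (PAV) fit to
    # quantal dose-response data, then linear interpolation to the BMR.
    # All data values are hypothetical.

    def pav(y, w):
        """Weighted PAV: least-squares nondecreasing fit to y."""
        blocks = []  # each block: [mean, weight, n_points]
        for yi, wi in zip(y, w):
            blocks.append([yi, wi, 1])
            while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
                m2, w2, n2 = blocks.pop()
                m1, w1, n1 = blocks.pop()
                w_new = w1 + w2
                blocks.append([(m1 * w1 + m2 * w2) / w_new, w_new, n1 + n2])
        out = []
        for m, _, n in blocks:
            out.extend([m] * n)
        return out

    def bmd_from_fit(doses, p_hat, bmr=0.10):
        """Smallest dose where extra risk (p - p0)/(1 - p0) reaches bmr,
        by linear interpolation between adjacent dose levels."""
        p0 = p_hat[0]
        risk = [(p - p0) / (1 - p0) for p in p_hat]
        for i in range(1, len(doses)):
            if risk[i] >= bmr:
                if risk[i] == risk[i - 1]:
                    return doses[i]
                frac = (bmr - risk[i - 1]) / (risk[i] - risk[i - 1])
                return doses[i - 1] + frac * (doses[i] - doses[i - 1])
        return None  # BMR not reached within the tested dose range

    doses = [0.0, 10.0, 50.0, 150.0, 400.0]   # hypothetical dose levels
    affected = [2, 4, 10, 22, 38]             # hypothetical responders
    n = [50, 50, 50, 50, 50]                  # animals per dose group
    raw = [a / m for a, m in zip(affected, n)]
    p_hat = pav(raw, n)                       # monotone dose-response fit
    bmd = bmd_from_fit(doses, p_hat, bmr=0.10)
    ```

    Because the fit is only constrained to be monotone, no parametric shape is imposed, which is exactly the misspecification risk the abstract is addressing.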

  18. Benchmarking short sequence mapping tools

    PubMed Central

    2013-01-01

    Background The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the currently proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked when comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all these aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. Results We applied our benchmarking tests to 9 well-known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. Conclusion The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify their needs in order to choose the tool that provides the best results. PMID:23758764

  19. Using Benchmarking To Strengthen the Assessment of Persistence.

    PubMed

    McLachlan, Michael S; Zou, Hongyan; Gouin, Todd

    2017-01-03

    Chemical persistence is a key property for assessing chemical risk and chemical hazard. Current methods for evaluating persistence are based on laboratory tests. The relationship between laboratory-based estimates and persistence in the environment is often unclear, in which case the current methods for evaluating persistence can be questioned. Chemical benchmarking opens new possibilities to measure persistence in the field. In this paper we explore how the benchmarking approach can be applied in both the laboratory and the field to deepen our understanding of chemical persistence in the environment and create a firmer scientific basis for laboratory-to-field extrapolation of persistence test results.

  20. Determining the sample size required to establish whether a medical device is non-inferior to an external benchmark.

    PubMed

    Sayers, Adrian; Crowther, Michael J; Judge, Andrew; Whitehouse, Michael R; Blom, Ashley W

    2017-08-28

    The use of benchmarks to assess the performance of implants such as those used in arthroplasty surgery is a widespread practice. It provides surgeons, patients and regulatory authorities with the reassurance that the implants used are safe and effective. However, it is not currently clear how, or with how many implants, an implant should be statistically compared with a benchmark to assess whether it is superior, equivalent, non-inferior or inferior to the performance benchmark of interest. We aim to describe the methods and sample size required to conduct a one-sample non-inferiority study of a medical device for the purposes of benchmarking. We performed a simulation study of a national register of medical devices: we simulated data, with and without a non-informative competing risk, to represent an arthroplasty population, and describe three methods of analysis (z-test, 1-Kaplan-Meier and competing risks) commonly used in surgical research. We evaluate the performance of each method using power, bias, root-mean-square error, coverage and CI width. 1-Kaplan-Meier provides an unbiased estimate of implant net failure, which can be used to assess whether a surgical device is non-inferior to an external benchmark. Small non-inferiority margins require significantly more individuals to be at risk compared with current benchmarking standards. A non-inferiority testing paradigm provides a useful framework for determining whether an implant meets the required performance defined by an external benchmark. Current contemporary benchmarking standards have limited power to detect non-inferiority, and substantially larger sample sizes, in excess of 3200 procedures, are required to achieve a power greater than 60%. When benchmarking implant performance, net failure estimated using 1-Kaplan-Meier is preferable to crude failure estimated by competing risk models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
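    The 1-Kaplan-Meier net-failure estimate the abstract favours can be sketched as below. The follow-up times, censoring pattern, and ten-year horizon are invented for illustration; a real benchmarking analysis would use registry data and add confidence intervals.

    ```python
    # Sketch of the 1 - Kaplan-Meier net-failure estimate for implant
    # survival. Event data below are hypothetical, not registry values.

    def kaplan_meier_failure(times, events, horizon):
        """Return 1 - S(horizon), where S is the Kaplan-Meier survivor
        function; events: 1 = failure (revision), 0 = censored."""
        data = sorted(zip(times, events))
        at_risk = len(data)
        surv = 1.0
        i = 0
        while i < len(data) and data[i][0] <= horizon:
            t = data[i][0]
            d = 0  # failures at time t
            c = 0  # censorings at time t
            while i < len(data) and data[i][0] == t:
                d += data[i][1]
                c += 1 - data[i][1]
                i += 1
            if d:
                surv *= 1.0 - d / at_risk
            at_risk -= d + c
        return 1.0 - surv

    times = [1, 2, 2, 3, 4, 5, 6, 7, 8, 10, 10, 10]  # years of follow-up
    events = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]    # 1=revision, 0=censored
    failure_10y = kaplan_meier_failure(times, events, horizon=10)
    ```

    Non-inferiority against an external benchmark is then a one-sample comparison of this net-failure estimate (plus its confidence interval) with the benchmark failure rate at the same horizon.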

  1. A Comparison of Web-Based Standard Setting and Monitored Standard Setting.

    ERIC Educational Resources Information Center

    Harvey, Anne L.; Way, Walter D.

    Standard setting, when carefully done, can be an expensive and time-consuming process. The modified Angoff method and the benchmark method, as utilized in this study, employ representative panels of judges to provide recommended passing scores to standard setting decision-makers. It has been considered preferable to have the judges meet in a…

  2. Efficient G0W0 using localized basis sets: a benchmark for molecules

    NASA Astrophysics Data System (ADS)

    Koval, Petr; Per Ljungberg, Mathias; Sanchez-Portal, Daniel

    Electronic structure calculations within Hedin's GW approximation are becoming increasingly accessible to the community. In particular, as has been shown earlier and as we confirm by calculations using our MBPT_LCAO package, the computational cost of the so-called G0W0 approach can be made comparable to the cost of a regular Hartree-Fock calculation. In this work, we study the performance of our new implementation of G0W0 in reproducing the ionization potentials of all 117 closed-shell molecules belonging to the G2/97 test set, using a pseudo-potential starting point provided by the popular density-functional package SIESTA. Moreover, the ionization potentials and electron affinities of a set of 24 acceptor molecules are compared to experiment and to reference all-electron calculations. PK: Guipuzcoa Fellow; PK,ML,DSP: Deutsche Forschungsgemeinschaft (SFB1083); PK,DSP: MINECO MAT2013-46593-C6-2-P.

  3. Simulation of Benchmark Cases with the Terminal Area Simulation System (TASS)

    NASA Technical Reports Server (NTRS)

    Ahmad, Nashat N.; Proctor, Fred H.

    2011-01-01

    The hydrodynamic core of the Terminal Area Simulation System (TASS) is evaluated against different benchmark cases. In the absence of closed form solutions for the equations governing atmospheric flows, the models are usually evaluated against idealized test cases. Over the years, various authors have suggested a suite of these idealized cases which have become standards for testing and evaluating the dynamics and thermodynamics of atmospheric flow models. In this paper, simulations of three such cases are described. In addition, the TASS model is evaluated against a test case that uses an exact solution of the Navier-Stokes equations. The TASS results are compared against previously reported simulations of these benchmark cases in the literature. It is demonstrated that the TASS model is highly accurate, stable and robust.

  4. Benchmark Intelligent Agent Systems for Distributed Battle Tracking

    DTIC Science & Technology

    2008-06-20

    services in the military and other domains, each entity in the benchmark system exposes a standard set of Web services. Jess (Java Expert System Shell) is a rule engine for the Java platform and is an interpreter for the Jess rule language. It is used here to implement policies that maintain... battle tracking system (DBTS), maintaining distributed situation awareness. The Java Agent DEvelopment (JADE) framework is a software framework

  5. Managing for Results in America's Great City Schools 2014: Results from Fiscal Year 2012-13. A Report of the Performance Measurement and Benchmarking Project

    ERIC Educational Resources Information Center

    Council of the Great City Schools, 2014

    2014-01-01

    In 2002 the "Council of the Great City Schools" and its members set out to develop performance measures that could be used to improve business operations in urban public school districts. The Council launched the "Performance Measurement and Benchmarking Project" to achieve these objectives. The purposes of the project was to:…

  6. Curriculum-Based Measurement of Oral Reading: An Evaluation of Growth Rates and Seasonal Effects among Students Served in General and Special Education

    ERIC Educational Resources Information Center

    Christ, Theodore J.; Silberglitt, Benjamin; Yeo, Seungsoo; Cormier, Damien

    2010-01-01

    Curriculum-based measurement of oral reading (CBM-R) is often used to benchmark growth in the fall, winter, and spring. CBM-R is also used to set goals and monitor student progress between benchmarking occasions. The results of previous research establish an expectation that weekly growth on CBM-R tasks is consistently linear throughout the…

  7. PPI4DOCK: large scale assessment of the use of homology models in free docking over more than 1000 realistic targets.

    PubMed

    Yu, Jinchao; Guerois, Raphaël

    2016-12-15

    Protein-protein docking methods are of great importance for understanding interactomes at the structural level. It has become increasingly appealing to use not only experimental structures but also homology models of unbound subunits as input for docking simulations. So far we are missing a large scale assessment of the success of rigid-body free docking methods on homology models. We explored how we could benefit from comparative modelling of unbound subunits to expand docking benchmark datasets. Starting from a collection of 3157 non-redundant, high X-ray resolution heterodimers, we developed the PPI4DOCK benchmark containing 1417 docking targets based on unbound homology models. Rigid-body docking by Zdock showed that for 1208 cases (85.2%), at least one correct decoy was generated, emphasizing the efficiency of rigid-body docking in generating correct assemblies. Overall, the PPI4DOCK benchmark contains a large set of realistic cases and provides new ground for assessing docking and scoring methodologies. Benchmark sets can be downloaded from http://biodev.cea.fr/interevol/ppi4dock/. Contact: guerois@cea.fr. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Evaluation of the Pool Critical Assembly Benchmark with Explicitly-Modeled Geometry using MCNP6

    DOE PAGES

    Kulesza, Joel A.; Martz, Roger Lee

    2017-03-01

    Despite being one of the most widely used benchmarks for qualifying light water reactor (LWR) radiation transport methods and data, no benchmark calculation of the Oak Ridge National Laboratory (ORNL) Pool Critical Assembly (PCA) pressure vessel wall benchmark facility (PVWBF) using MCNP6 with explicitly modeled core geometry exists. As such, this paper provides results for such an analysis. First, a criticality calculation is used to construct the fixed source term. Next, ADVANTG-generated variance reduction parameters are used within the final MCNP6 fixed source calculations. These calculations provide unadjusted dosimetry results using three sets of dosimetry reaction cross sections of varying ages (those packaged with MCNP6, from the IRDF-2002 multi-group library, and from the ACE-formatted IRDFF v1.05 library). These results are then compared to two different sets of measured reaction rates. The comparison agrees in an overall sense within 2% and on a specific reaction- and dosimetry-location basis within 5%. Except for the neptunium dosimetry, the individual foil raw calculation-to-experiment comparisons usually agree within 10% but are typically greater than unity. Finally, in the course of developing these calculations, geometry that has previously not been completely specified is provided herein for the convenience of future analysts.

  9. Benchmarking hypercube hardware and software

    NASA Technical Reports Server (NTRS)

    Grunwald, Dirk C.; Reed, Daniel A.

    1986-01-01

    It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns.

  10. Patient Safety Culture Survey in Pediatric Complex Care Settings: A Factor Analysis.

    PubMed

    Hessels, Amanda J; Murray, Meghan; Cohen, Bevin; Larson, Elaine L

    2017-04-19

    Children with complex medical needs are increasing in number and demanding the services of pediatric long-term care facilities (pLTC), which require a focus on patient safety culture (PSC). However, no tool to measure PSC has been tested in this unique hybrid acute care-residential setting. The objective of this study was to evaluate the psychometric properties of the Nursing Home Survey on Patient Safety Culture tool slightly modified for use in the pLTC setting. Factor analyses were performed on data collected from 239 staff at 3 pLTC in 2012. Items were screened by principal axis factoring, and the original structure was tested using confirmatory factor analysis. Exploratory factor analysis was conducted to identify the best model fit for the pLTC data, and factor reliability was assessed by Cronbach alpha. The extracted, rotated factor solution suggested items in 4 (staffing, nonpunitive response to mistakes, communication openness, and organizational learning) of the original 12 dimensions may not be a good fit for this population. Nevertheless, in the pLTC setting, both the original and the modified factor solutions demonstrated similar reliabilities to the published consistencies of the survey when tested in adult nursing homes and the items factored nearly identically as theorized. This study demonstrates that the Nursing Home Survey on Patient Safety Culture with minimal modification may be an appropriate instrument to measure PSC in pLTC settings. Additional psychometric testing is recommended to further validate the use of this instrument in this setting, including examining the relationship to safety outcomes. Increased use will yield data for benchmarking purposes across these specialized settings to inform frontline workers and organizational leaders of areas of strength and opportunity for improvement.

  11. Benchmarking the cost efficiency of community care in Australian child and adolescent mental health services: implications for future benchmarking.

    PubMed

    Furber, Gareth; Brann, Peter; Skene, Clive; Allison, Stephen

    2011-06-01

    The purpose of this study was to benchmark the cost efficiency of community care across six child and adolescent mental health services (CAMHS) drawn from different Australian states. Organizational, contact and outcome data from the National Mental Health Benchmarking Project (NMHBP) data-sets were used to calculate cost per "treatment hour" and cost per episode for the six participating organizations. We also explored the relationship between intake severity as measured by the Health of the Nations Outcome Scales for Children and Adolescents (HoNOSCA) and cost per episode. The average cost per treatment hour was $223, with cost differences across the six services ranging from a mean of $156 to $273 per treatment hour. The average cost per episode was $3349 (median $1577) and there were significant differences in the CAMHS organizational medians ranging from $388 to $7076 per episode. HoNOSCA scores explained at best 6% of the cost variance per episode. These large cost differences indicate that community CAMHS have the potential to make substantial gains in cost efficiency through collaborative benchmarking. Benchmarking forums need considerable financial and business expertise for detailed comparison of business models for service provision.

  12. A large-scale benchmark of gene prioritization methods.

    PubMed

    Guala, Dimitri; Sonnhammer, Erik L L

    2017-04-21

    In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for the identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in the construction of robust benchmarks that are objective with respect to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion (NetRank and two implementations of Random Walk with Restart), and MaxLink, which utilizes the network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
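One of the diffusion methods compared in this record, Random Walk with Restart, has a compact iterative form. The sketch below runs it on a toy three-node graph; the adjacency matrix and seed set are illustrative stand-ins, not the FunCoup network:

```python
import numpy as np

def random_walk_restart(W, seeds, restart=0.5, tol=1e-10):
    """Iterate p <- (1 - r) * P @ p + r * p0 until convergence.

    W is a symmetric adjacency matrix; P is its column-normalized form,
    and p0 concentrates probability on the seed (e.g. disease) genes.
    """
    n = W.shape[0]
    col_sums = W.sum(axis=0).astype(float)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    P = W / col_sums
    p0 = np.zeros(n)
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    while True:
        p_next = (1 - restart) * P @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next

# Toy triangle graph with node 0 as the only seed.
W = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
p = random_walk_restart(W, seeds=[0])
```

Candidate genes are then ranked by their steady-state probability; nodes close to the seeds in the network score highest, which is the behavior a retrospective benchmark of this kind evaluates.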

  13. Medical school benchmarking - from tools to programmes.

    PubMed

    Wilkinson, Tim J; Hudson, Judith N; Mccoll, Geoffrey J; Hu, Wendy C Y; Jolly, Brian C; Schuwirth, Lambert W T

    2015-02-01

    Benchmarking among medical schools is essential, but may result in unwanted effects. To apply a conceptual framework to selected benchmarking activities of medical schools. We present an analogy between the effects of assessment on student learning and the effects of benchmarking on medical school educational activities. A framework by which benchmarking can be evaluated was developed and applied to key current benchmarking activities in Australia and New Zealand. The analogy generated a conceptual framework that tested five questions to be considered in relation to benchmarking: what is the purpose? what are the attributes of value? what are the best tools to assess the attributes of value? what happens to the results? and, what is the likely "institutional impact" of the results? If the activities were compared against a blueprint of desirable medical graduate outcomes, notable omissions would emerge. Medical schools should benchmark their performance on a range of educational activities to ensure quality improvement and to assure stakeholders that standards are being met. Although benchmarking potentially has positive benefits, it could also result in perverse incentives with unforeseen and detrimental effects on learning if it is undertaken using only a few selected assessment tools.

  14. Reliable B Cell Epitope Predictions: Impacts of Method Development and Improved Benchmarking

    PubMed Central

    Kringelum, Jens Vindahl; Lundegaard, Claus; Lund, Ole; Nielsen, Morten

    2012-01-01

    The interaction between antibodies and antigens is one of the most important immune system mechanisms for clearing infectious organisms from the host. Antibodies bind to antigens at sites referred to as B-cell epitopes. Identification of the exact location of B-cell epitopes is essential in several biomedical applications, such as rational vaccine design, development of disease diagnostics and immunotherapeutics. However, experimental mapping of epitopes is resource intensive, making in silico methods an appealing complementary approach. To date, the reported performance of methods for in silico mapping of B-cell epitopes has been moderate. Several issues regarding the evaluation data sets may, however, have led to the performance values being underestimated: rarely have all potential epitopes been mapped on an antigen, and antibodies are generally raised against the antigen in a given biological context, not against the antigen monomer. Improper handling of these aspects leads to many artificial false-positive predictions and hence to incorrectly low performance values. To demonstrate the impact of proper benchmark definitions, we here present an updated version of the DiscoTope method incorporating a novel spatial neighborhood definition and half-sphere exposure as surface measure. Compared to other state-of-the-art prediction methods, DiscoTope-2.0 displayed improved performance both in cross-validation and in independent evaluations. Using DiscoTope-2.0, we assessed the impact on performance when using proper benchmark definitions. For 13 proteins in the training data set where sufficient biological information was available to make a proper benchmark redefinition, the average AUC performance was improved from 0.791 to 0.824. Similarly, the average AUC performance on an independent evaluation data set improved from 0.712 to 0.727.
Our results thus demonstrate that given proper benchmark definitions, B-cell epitope prediction methods achieve highly significant predictive performances suggesting these tools to be a powerful asset in rational epitope discovery. The updated version of DiscoTope is available at www.cbs.dtu.dk/services/DiscoTope-2.0. PMID:23300419
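The AUC values reported in this record reduce to a rank statistic: the probability that a randomly chosen epitope residue scores above a randomly chosen non-epitope residue. A minimal sketch of that metric, using hypothetical scores and labels rather than DiscoTope output, is:

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation: fraction of positive-negative
    pairs ranked correctly, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical residue scores: label 1 = annotated epitope, 0 = non-epitope.
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # perfect ranking -> 1.0
```

This formulation makes clear why relabeling residues under a redefined benchmark changes the measured AUC even though the predictor's scores are unchanged.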

  15. Enrichment assessment of multiple virtual screening strategies for Toll-like receptor 8 agonists based on a maximal unbiased benchmarking data set.

    PubMed

    Pei, Fen; Jin, Hongwei; Zhou, Xin; Xia, Jie; Sun, Lidan; Liu, Zhenming; Zhang, Liangren

    2015-11-01

    Toll-like receptor 8 agonists, which activate adaptive immune responses by inducing robust production of T-helper 1-polarizing cytokines, are promising candidates for vaccine adjuvants. As the binding site of toll-like receptor 8 is large and highly flexible, virtual screening by any individual method has inevitable limitations; thus, a comprehensive comparison of different methods may provide insights into seeking an effective strategy for the discovery of novel toll-like receptor 8 agonists. In this study, the performance of knowledge-based pharmacophore, shape-based 3D screening, and combined strategies was assessed against a maximum unbiased benchmarking data set containing 13 actives and 1302 decoys specialized for toll-like receptor 8 agonists. Prior structure-activity relationship knowledge was involved in knowledge-based pharmacophore generation, and a set of antagonists was innovatively used to verify the selectivity of the selected knowledge-based pharmacophore. The benchmarking data set was generated from our recently developed MUBD-DecoyMaker protocol. The enrichment assessment demonstrated considerable performance for our selected three-layer virtual screening strategy: knowledge-based pharmacophore (Phar1) screening, shape-based 3D similarity search (Q4_combo), and then GOLD docking screening. This virtual screening strategy could be further employed to perform large-scale database screening and to discover novel toll-like receptor 8 agonists. © 2015 John Wiley & Sons A/S.

  16. Open-source platform to benchmark fingerprints for ligand-based virtual screening

    PubMed Central

    2013-01-01

    Similarity-search methods using molecular fingerprints are an important tool for ligand-based virtual screening. A huge variety of fingerprints exist and their performance, usually assessed in retrospective benchmarking studies using data sets with known actives and known or assumed inactives, depends largely on the validation data sets used and the similarity measure used. Comparing new methods to existing ones in any systematic way is rather difficult due to the lack of standard data sets and evaluation procedures. Here, we present a standard platform for the benchmarking of 2D fingerprints. The open-source platform contains all source code, structural data for the actives and inactives used (drawn from three publicly available collections of data sets), and lists of randomly selected query molecules to be used for statistically valid comparisons of methods. This allows the exact reproduction and comparison of results for future studies. The results for 12 standard fingerprints together with two simple baseline fingerprints assessed by seven evaluation methods are shown together with the correlations between methods. High correlations were found between the 12 fingerprints and a careful statistical analysis showed that only the two baseline fingerprints were different from the others in a statistically significant way. High correlations were also found between six of the seven evaluation methods, indicating that despite their seeming differences, many of these methods are similar to each other. PMID:23721588
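The similarity searches benchmarked in this record typically rank database molecules by the Tanimoto similarity of their fingerprints. A minimal sketch, representing each fingerprint as a set of on-bit indices rather than a real cheminformatics fingerprint object, is:

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention for two empty fingerprints
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

# Hypothetical on-bit sets for a query and a database molecule.
print(tanimoto({1, 5, 9, 12}, {1, 5, 20, 33}))  # 2 shared of 6 distinct bits -> ~0.333
```

In a retrospective benchmark of the kind described, actives and decoys are ranked by this score against each query molecule, and the resulting ranking is evaluated by AUC or enrichment statistics.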

  17. Verification of cardiac mechanics software: benchmark problems and solutions for testing active and passive material behaviour.

    PubMed

    Land, Sander; Gurev, Viatcheslav; Arens, Sander; Augustin, Christoph M; Baron, Lukas; Blake, Robert; Bradley, Chris; Castro, Sebastian; Crozier, Andrew; Favino, Marco; Fastl, Thomas E; Fritz, Thomas; Gao, Hao; Gizzi, Alessio; Griffith, Boyce E; Hurtado, Daniel E; Krause, Rolf; Luo, Xiaoyu; Nash, Martyn P; Pezzuto, Simone; Plank, Gernot; Rossi, Simone; Ruprecht, Daniel; Seemann, Gunnar; Smith, Nicolas P; Sundnes, Joakim; Rice, J Jeremy; Trayanova, Natalia; Wang, Dafang; Jenny Wang, Zhinuo; Niederer, Steven A

    2015-12-08

    Models of cardiac mechanics are increasingly used to investigate cardiac physiology. These models are characterized by a high level of complexity, including the particular anisotropic material properties of biological tissue and the actively contracting material. A large number of independent simulation codes have been developed, but a consistent way of verifying the accuracy and replicability of simulations is lacking. To aid in the verification of current and future cardiac mechanics solvers, this study provides three benchmark problems for cardiac mechanics. These benchmark problems test the ability to accurately simulate pressure-type forces that depend on the deformed object's geometry, anisotropic and spatially varying material properties similar to those seen in the left ventricle, and active contractile forces. The benchmark was solved by 11 different groups to generate consensus solutions, with typical differences in higher-resolution solutions at approximately 0.5%, and consistent results between linear, quadratic and cubic finite elements as well as different approaches to simulating incompressible materials. Online tools and solutions are made available to allow these tests to be effectively used in verification of future cardiac mechanics software.

  18. Evaluation of control strategies using an oxidation ditch benchmark.

    PubMed

    Abusam, A; Keesman, K J; Spanjers, H; van Straten, G; Meinema, K

    2002-01-01

    This paper presents validation and implementation results of a benchmark developed for a specific full-scale oxidation ditch wastewater treatment plant. A benchmark is a standard simulation procedure that can be used as a tool in evaluating various control strategies proposed for wastewater treatment plants. It is based on model and performance criteria development. Testing of this benchmark, by comparing benchmark predictions to real measurements of the electrical energy consumption and amounts of disposed sludge for a specific oxidation ditch WWTP, has shown that it can reasonably be used for evaluating the performance of this WWTP. Subsequently, the validated benchmark was used in evaluating some basic and advanced control strategies. Some of the interesting results obtained are the following: (i) the influent flow splitting ratio, between the first and the fourth aerated compartments of the ditch, has no significant effect on the TN concentrations in the effluent, and (ii) for evaluation of long-term control strategies, future benchmarks need to be able to assess settlers' performance.

  19. Benchmark Evaluation of Start-Up and Zero-Power Measurements at the High-Temperature Engineering Test Reactor

    DOE PAGES

    Bess, John D.; Fujimoto, Nozomu

    2014-10-09

    Benchmark models were developed to evaluate six cold-critical and two warm-critical, zero-power measurements of the HTTR. Additional measurements of a fully-loaded subcritical configuration, core excess reactivity, shutdown margins, six isothermal temperature coefficients, and axial reaction-rate distributions were also evaluated as acceptable benchmark experiments. Insufficient information is publicly available to develop finely-detailed models of the HTTR, as much of the design information is still proprietary. However, the uncertainties in the benchmark models are judged to be of sufficient magnitude to encompass any biases and bias uncertainties incurred through the simplification process used to develop the benchmark models. Dominant uncertainties in the experimental keff for all core configurations come from uncertainties in the impurity content of the various graphite blocks that comprise the HTTR. Monte Carlo calculations of keff are between approximately 0.9% and 2.7% greater than the benchmark values. Reevaluation of the HTTR models as additional information becomes available could improve the quality of this benchmark and possibly reduce the computational biases. High-quality characterization of graphite impurities would significantly improve the quality of the HTTR benchmark assessment. Simulations of the other reactor physics measurements are in good agreement with the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.

  20. NAS Grid Benchmarks. 1.0

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob; Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    We provide a paper-and-pencil specification of a benchmark suite for computational grids. It is based on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks (NPB) and is called the NAS Grid Benchmarks (NGB). NGB problems are presented as data flow graphs encapsulating an instance of a slightly modified NPB task in each graph node, which communicates with other nodes by sending/receiving initialization data. Like NPB, NGB specifies several different classes (problem sizes). In this report we describe classes S, W, and A, and provide verification values for each. The implementor has the freedom to choose any language, grid environment, security model, fault tolerance/error correction mechanism, etc., as long as the resulting implementation passes the verification test and reports the turnaround time of the benchmark.

  1. On the accuracy of density functional theory and wave function methods for calculating vertical ionization energies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McKechnie, Scott; Booth, George H.; Cohen, Aron J.

    The best practice in computational methods for determining vertical ionization energies (VIEs) is assessed, via reference to experimentally determined VIEs that are corroborated by highly accurate coupled-cluster calculations. These reference values are used to benchmark the performance of density-functional theory (DFT) and wave function methods: Hartree-Fock theory (HF), second-order Møller-Plesset perturbation theory (MP2) and Electron Propagator Theory (EPT). The core test set consists of 147 small molecules. An extended set of six larger molecules, from benzene to hexacene, is also considered to investigate the dependence of the results on molecule size. The closest agreement with experiment is found for ionization energies obtained from total-energy difference calculations. In particular, DFT calculations using exchange-correlation functionals with either a large amount of exact exchange or long-range correction perform best. The results from these functionals are also the least sensitive to an increase in molecule size. In general, ionization energies calculated directly from the orbital energies of the neutral species are less accurate and more sensitive to an increase in molecule size. For the single-calculation approach, the EPT calculations are in closest agreement for both sets of molecules. For the orbital energies from DFT functionals, only those with long-range correction give quantitative agreement, with dramatic failures for all other functionals considered. The results offer a practical hierarchy of approximations for the calculation of vertical ionization energies. In addition, the experimental and computational reference values can be used as a standardized set of benchmarks, against which other approximate methods can be compared.
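The two routes contrasted in this record differ only in what is subtracted: a total-energy difference between the cation and the neutral at the neutral geometry (the Delta-SCF route) versus the negative of the neutral's HOMO orbital energy (the Koopmans-style route). The sketch below uses illustrative, hypothetical energies in hartree, not values from the study:

```python
def vie_delta_scf(e_neutral: float, e_cation: float) -> float:
    """Vertical IE from two total-energy calculations at the neutral geometry."""
    return e_cation - e_neutral

def vie_koopmans(homo_energy: float) -> float:
    """Vertical IE estimated directly from the neutral's HOMO orbital energy."""
    return -homo_energy

# Hypothetical numbers for a small molecule (hartree); real values come from
# electronic-structure calculations.
print(vie_delta_scf(-76.40, -75.95))  # approximately 0.45 Eh
print(vie_koopmans(-0.47))            # 0.47 Eh
```

The Delta-SCF route needs two calculations but lets orbital relaxation in the cation be captured, which is consistent with the record's finding that total-energy differences agree more closely with experiment than raw orbital energies.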

  2. A review of current practices to increase Chlamydia screening in the community--a consumer-centred social marketing perspective.

    PubMed

    Phillipson, Lyn; Gordon, Ross; Telenta, Joanne; Magee, Chris; Janssen, Marty

    2016-02-01

    Chlamydia trachomatis is one of the most frequently reported sexually transmitted infections (STI) in Australia, the UK and Europe. Yet, rates of screening for STIs remain low, especially in younger adults. The aims were to assess the effectiveness of Chlamydia screening interventions targeting young adults in community-based settings, to describe the strategies utilized, and to assess them according to social marketing benchmark criteria. A systematic review of relevant literature between 2002 and 2012 in Medline, Web of Knowledge, PubMed, Scopus and the Cumulative Index to Nursing and Allied Health was undertaken. Of the 18 interventions identified, the quality of evidence was low. Proportional screening rates varied, ranging from 30.9 to 62.5% in educational settings (n = 4), 4.8 to 63% in media settings (n = 6) and 5.7 to 44.5% in other settings (n = 7). Assessment against the benchmark criteria found that interventions incorporating social marketing principles were more likely to achieve positive results, yet few did this comprehensively. Most demonstrated customer orientation and addressed barriers to presenting to a clinic for screening. Only one addressed barriers to presenting for treatment after a positive result. Promotional messages typically focused on providing facts and accessing a testing kit. Risk assessment tools appeared to promote screening among higher-risk groups. Few evaluated treatment rates following positive results; therefore, the impact of screening on treatment rates remains unknown. Future interventions should consider utilizing a comprehensive social marketing approach, using formative research to increase insight and the segmentation and tailoring of screening interventions. Easy community access to both screening and treatment should be prioritized. © 2015 John Wiley & Sons Ltd.

  3. A set-covering based heuristic algorithm for the periodic vehicle routing problem.

    PubMed

    Cacchiani, V; Hemmelmayr, V C; Tricoire, F

    2014-01-30

    We present a hybrid optimization algorithm for mixed-integer linear programming, embedding both heuristic and exact components. In order to validate it we use the periodic vehicle routing problem (PVRP) as a case study. This problem consists of determining a set of minimum cost routes for each day of a given planning horizon, with the constraints that each customer must be visited a required number of times (chosen among a set of valid day combinations), must receive every time the required quantity of product, and that the number of routes per day (each respecting the capacity of the vehicle) does not exceed the total number of available vehicles. This is a generalization of the well-known vehicle routing problem (VRP). Our algorithm is based on the linear programming (LP) relaxation of a set-covering-like integer linear programming formulation of the problem, with additional constraints. The LP-relaxation is solved by column generation, where columns are generated heuristically by an iterated local search algorithm. The whole solution method takes advantage of the LP-solution and applies techniques of fixing and releasing of the columns as a local search, making use of a tabu list to avoid cycling. We show the results of the proposed algorithm on benchmark instances from the literature and compare them to the state-of-the-art algorithms, showing the effectiveness of our approach in producing good quality solutions. In addition, we report the results on realistic instances of the PVRP introduced in Pacheco et al. (2011)  [24] and on benchmark instances of the periodic traveling salesman problem (PTSP), showing the efficacy of the proposed algorithm on these as well. Finally, we report the new best known solutions found for all the tested problems.

  4. A set-covering based heuristic algorithm for the periodic vehicle routing problem

    PubMed Central

    Cacchiani, V.; Hemmelmayr, V.C.; Tricoire, F.

    2014-01-01

    We present a hybrid optimization algorithm for mixed-integer linear programming, embedding both heuristic and exact components. In order to validate it we use the periodic vehicle routing problem (PVRP) as a case study. This problem consists of determining a set of minimum cost routes for each day of a given planning horizon, with the constraints that each customer must be visited a required number of times (chosen among a set of valid day combinations), must receive every time the required quantity of product, and that the number of routes per day (each respecting the capacity of the vehicle) does not exceed the total number of available vehicles. This is a generalization of the well-known vehicle routing problem (VRP). Our algorithm is based on the linear programming (LP) relaxation of a set-covering-like integer linear programming formulation of the problem, with additional constraints. The LP-relaxation is solved by column generation, where columns are generated heuristically by an iterated local search algorithm. The whole solution method takes advantage of the LP-solution and applies techniques of fixing and releasing of the columns as a local search, making use of a tabu list to avoid cycling. We show the results of the proposed algorithm on benchmark instances from the literature and compare them to the state-of-the-art algorithms, showing the effectiveness of our approach in producing good quality solutions. In addition, we report the results on realistic instances of the PVRP introduced in Pacheco et al. (2011)  [24] and on benchmark instances of the periodic traveling salesman problem (PTSP), showing the efficacy of the proposed algorithm on these as well. Finally, we report the new best known solutions found for all the tested problems. PMID:24748696

  5. Benchmarking network for clinical and humanistic outcomes in diabetes (BENCH-D) study: protocol, tools, and population.

    PubMed

    Nicolucci, Antonio; Rossi, Maria C; Pellegrini, Fabio; Lucisano, Giuseppe; Pintaudi, Basilio; Gentile, Sandro; Marra, Giampiero; Skovlund, Soren E; Vespasiani, Giacomo

    2014-01-01

    In the context of the DAWN-2 initiatives, the BENCH-D Study aims to test a model of regional benchmarking to improve not only the quality of diabetes care, but also patient-centred outcomes. As part of the AMD-Annals quality improvement program, 32 diabetes clinics in 4 Italian regions extracted clinical data from electronic databases for measuring process and outcome quality indicators. A random sample of patients with type 2 diabetes filled in a questionnaire including validated instruments to assess patient-centred indicators: SF-12 Health Survey, WHO-5 Well-Being Index, Diabetes Empowerment Scale, Problem Areas in Diabetes, Health Care Climate Questionnaire, Patients Assessment of Chronic Illness Care, Barriers to Medications, Patient Support, Diabetes Self-care Activities, and Global Satisfaction for Diabetes Treatment. Data were discussed with participants in regional meetings. The main problems, obstacles and solutions were identified through a standardized process, and a regional mandate was produced to drive the priority actions. Overall, clinical indicators for 78,854 patients have been measured; additionally, 2,390 patients filled in the questionnaire. The regional mandates were officially launched in March 2012. Clinical and patient-centred indicators will be evaluated again after 18 months. A final assessment of clinical indicators will take place after 30 months. In the context of the BENCH-D study, a set of instruments has been validated to measure patient well-being and satisfaction with care. In the four regional meetings, different priorities were identified, reflecting the different organizational resources of the different areas. In all the regions, a major challenge was represented by the need for skills and instruments to address the psychosocial issues of people with diabetes. The BENCH-D study allows field testing of benchmarking activities focused on clinical and patient-centred indicators.

  6. Collected notes from the Benchmarks and Metrics Workshop

    NASA Technical Reports Server (NTRS)

    Drummond, Mark E.; Kaelbling, Leslie P.; Rosenschein, Stanley J.

    1991-01-01

    In recent years there has been a proliferation of proposals in the artificial intelligence (AI) literature for integrated agent architectures. Each architecture offers an approach to the general problem of constructing an integrated agent. Unfortunately, the ways in which one architecture might be considered better than another are not always clear. There has been a growing realization that many of the positive and negative aspects of an architecture become apparent only when experimental evaluation is performed and that to progress as a discipline, we must develop rigorous experimental methods. In addition to the intrinsic intellectual interest of experimentation, rigorous performance evaluation of systems is also a crucial practical concern to our research sponsors. DARPA, NASA, and AFOSR (among others) are actively searching for better ways of experimentally evaluating alternative approaches to building intelligent agents. One tool for experimental evaluation involves testing systems on benchmark tasks in order to assess their relative performance. As part of a joint DARPA and NASA funded project, NASA-Ames and Teleos Research are carrying out a research effort to establish a set of benchmark tasks and evaluation metrics by which the performance of agent architectures may be determined. As part of this project, we held a workshop on Benchmarks and Metrics at the NASA Ames Research Center on June 25, 1990. The objective of the workshop was to foster early discussion on this important topic. We did not achieve a consensus, nor did we expect to. Collected here is some of the information that was exchanged at the workshop. Given here is an outline of the workshop, a list of the participants, notes taken on the white-board during open discussions, position papers/notes from some participants, and copies of slides used in the presentations.

  7. Benchmarking and testing the "Sea Level Equation"

    NASA Astrophysics Data System (ADS)

    Spada, G.; Barletta, V. R.; Klemann, V.; van der Wal, W.; James, T. S.; Simon, K.; Riva, R. E. M.; Martinec, Z.; Gasperini, P.; Lund, B.; Wolf, D.; Vermeersen, L. L. A.; King, M. A.

    2012-04-01

    The study of the process of Glacial Isostatic Adjustment (GIA) and of the consequent sea level variations is gaining an increasingly important role within the geophysical community. Understanding the response of the Earth to the waxing and waning ice sheets is crucial in various contexts, ranging from the interpretation of modern satellite geodetic measurements to the projections of future sea level trends in response to climate change. All the processes accompanying GIA can be described solving the so-called Sea Level Equation (SLE), an integral equation that accounts for the interactions between the ice sheets, the solid Earth, and the oceans. Modern approaches to the SLE are based on various techniques that range from purely analytical formulations to fully numerical methods. Despite various teams independently investigating GIA, we do not have a suitably large set of agreed numerical results through which the methods may be validated. Following the example of the mantle convection community and our recent successful Benchmark for Post Glacial Rebound codes (Spada et al., 2011, doi: 10.1111/j.1365-246X.2011.04952.x), here we present the results of a benchmark study of independently developed codes designed to solve the SLE. This study has taken place within a collaboration facilitated through the European Cooperation in Science and Technology (COST) Action ES0701. The tests involve predictions of past and current sea level variations, and 3D deformations of the Earth surface. In spite of the significant differences in the numerical methods employed, the test computations performed so far show a satisfactory agreement between the results provided by the participants. The differences found, which can often be attributed to the different numerical algorithms employed within the community, help to constrain the intrinsic errors in model predictions.
These are of fundamental importance for a correct interpretation of the geodetic variations observed today, and particularly for the evaluation of climate-driven sea level variations.
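For readers unfamiliar with the SLE, a commonly quoted schematic form (paraphrased from the post-glacial-rebound literature; the notation below is an assumption on our part, not taken from this abstract) is:

```latex
% Schematic Sea Level Equation in the Farrell-and-Clark style:
% sea level change S at colatitude \theta, longitude \lambda, time t
S(\theta, \lambda, t) = \frac{\rho_i}{\gamma}\, G_s \otimes_i I
                      + \frac{\rho_w}{\gamma}\, G_s \otimes_o S
                      + S^E(t)
```

where rho_i and rho_w are ice and water densities, gamma is the reference surface gravity, G_s is the sea-level Green's function, the operators denote space-time convolutions over the ice sheets and the oceans respectively, I is the ice thickness history, and S^E is a spatially uniform term enforcing mass conservation. Because S appears on both sides, the equation is implicit and is solved iteratively, which is one reason independently developed solvers can disagree and benefit from the benchmarking described above.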

  8. Valence and charge-transfer optical properties for some SinCm (m, n ≤ 12) clusters: Comparing TD-DFT, complete-basis-limit EOMCC, and benchmarks from spectroscopy.

    PubMed

    Lutz, Jesse J; Duan, Xiaofeng F; Ranasinghe, Duminda S; Jin, Yifan; Margraf, Johannes T; Perera, Ajith; Burggraf, Larry W; Bartlett, Rodney J

    2018-05-07

    Accurate optical characterization of the closo-Si12C12 molecule is important to guide experimental efforts toward the synthesis of nano-wires, cyclic nano-arrays, and related array structures, which are anticipated to be robust and efficient exciton materials for opto-electronic devices. Working toward calibrated methods for the description of closo-Si12C12 oligomers, various electronic structure approaches are evaluated for their ability to reproduce measured optical transitions of the SiC2, Si2Cn (n = 1-3), and Si3Cn (n = 1, 2) clusters reported earlier by Steglich and Maier [Astrophys. J. 801, 119 (2015)]. Complete-basis-limit equation-of-motion coupled-cluster (EOMCC) results are presented and a comparison is made between perturbative and renormalized non-iterative triples corrections. The effect of adding a renormalized correction for quadruples is also tested. Benchmark test sets derived from both measurement and high-level EOMCC calculations are then used to evaluate the performance of a variety of density functionals within the time-dependent density functional theory (TD-DFT) framework. The best-performing functionals are subsequently applied to predict valence TD-DFT excitation energies for the lowest-energy isomers of SinC and Sin-1C7-n (n = 4-6). TD-DFT approaches are then applied to the SinCn (n = 4-12) clusters and unique spectroscopic signatures of closo-Si12C12 are discussed. Finally, various long-range corrected density functionals, including those from the CAM-QTP family, are applied to a charge-transfer excitation in a cyclic (Si4C4)4 oligomer. Approaches for gauging the extent of charge-transfer character are also tested and EOMCC results are used to benchmark functionals and make recommendations.

  9. The Eighth Industrial Fluids Properties Simulation Challenge

    PubMed Central

    Schultz, Nathan E.; Ahmad, Riaz; Brennan, John K.; Frankel, Kevin A.; Moore, Jonathan D.; Moore, Joshua D.; Mountain, Raymond D.; Ross, Richard B.; Thommes, Matthias; Shen, Vincent K.; Siderius, Daniel W.; Smith, Kenneth D.

    2016-01-01

    The goal of the eighth industrial fluid properties simulation challenge was to test the ability of molecular simulation methods to predict the adsorption of organic adsorbates in activated carbon materials. In particular, the eighth challenge focused on the adsorption of perfluorohexane in the activated carbon BAM-109. Entrants were challenged to predict the adsorption in the carbon at 273 K and relative pressures of 0.1, 0.3, and 0.6. The predictions were judged by comparison to a benchmark set of experimentally determined values. Overall good agreement and consistency were found between the predictions of most entrants. PMID:27840542

  10. Evaluation of the selection methods used in the exIWO algorithm based on the optimization of multidimensional functions

    NASA Astrophysics Data System (ADS)

    Kostrzewa, Daniel; Josiński, Henryk

    2016-06-01

    The expanded Invasive Weed Optimization algorithm (exIWO) is an optimization metaheuristic modelled on the original IWO version, which was inspired by the dynamic growth of a weed colony. The authors of the present paper modified the exIWO algorithm by introducing a set of both deterministic and non-deterministic strategies for selecting individuals. The goal of the project was to evaluate the modified exIWO by testing its usefulness for the optimization of multidimensional numerical functions. The optimized functions, Griewank, Rastrigin, and Rosenbrock, are frequently used as benchmarks because of their characteristics.
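The three optimized functions have simple closed forms; as a minimal sketch (standard textbook definitions, not the authors' code), each has a known global minimum of 0:

```python
import math

def griewank(x):
    # Global minimum 0 at x = (0, ..., 0); many shallow local minima.
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return s - p + 1.0

def rastrigin(x):
    # Global minimum 0 at x = (0, ..., 0); highly multimodal.
    return 10.0 * len(x) + sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi)
                               for xi in x)

def rosenbrock(x):
    # Global minimum 0 at x = (1, ..., 1); narrow curved valley.
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))
```

The three landscapes stress different abilities: Griewank and Rastrigin test escape from local minima, while Rosenbrock tests progress along a poorly conditioned valley.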

  11. Heuristic methods for the single machine scheduling problem with different ready times and a common due date

    NASA Astrophysics Data System (ADS)

    Birgin, Ernesto G.; Ronconi, Débora P.

    2012-10-01

    The single machine scheduling problem with a common due date and non-identical ready times for the jobs is examined in this work. Performance is measured by the minimization of the weighted sum of earliness and tardiness penalties of the jobs. Since this problem is NP-hard, the application of constructive heuristics that exploit specific characteristics of the problem to improve their performance is investigated. The proposed approaches are examined through a computational comparative study on a set of 280 benchmark test problems with up to 1000 jobs.
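The objective being minimized can be evaluated in a few lines for any given job sequence; a hedged Python sketch (the tuple layout and names are illustrative, not the authors' notation):

```python
def weighted_earliness_tardiness(jobs, due_date):
    """Evaluate a job sequence on a single machine with a common due date.

    jobs: ordered list of (processing_time, ready_time, alpha, beta) tuples,
    where alpha and beta weight earliness and tardiness respectively
    (illustrative names, not the paper's notation).
    """
    t = 0.0
    total = 0.0
    for p, r, alpha, beta in jobs:
        t = max(t, r) + p  # a job cannot start before its ready time
        total += alpha * max(0.0, due_date - t) + beta * max(0.0, t - due_date)
    return total
```

Constructive heuristics for this problem typically build the sequence job by job and use an evaluation like this one to compare candidate insertions.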

  12. Acoustic Shielding for a Model Scale Counter-rotation Open Rotor

    NASA Technical Reports Server (NTRS)

    Stephens, David B.; Edmane, Envia

    2012-01-01

    The noise shielding benefit of installing an open rotor above a simplified wing or tail is explored experimentally. The test results provide both a benchmark data set for validating shielding prediction tools and an opportunity for a system level evaluation of the noise reduction potential of propulsion noise shielding by an airframe component. A short barrier near the open rotor was found to provide up to 8.5 dB of attenuation at some directivity angles, with tonal sound particularly well shielded. Predictions from two simple shielding theories were found to overestimate the shielding benefit.

  13. OWL2 benchmarking for the evaluation of knowledge based systems.

    PubMed

    Khan, Sher Afgun; Qadir, Muhammad Abdul; Abbas, Muhammad Azeem; Afzal, Muhammad Tanvir

    2017-01-01

    OWL2 semantics are becoming increasingly popular for real-domain applications like gene engineering and health MIS. The present work identifies the research gap that negligible attention has been paid to the performance evaluation of Knowledge Base Systems (KBS) using OWL2 semantics. To fill this gap, an OWL2 benchmark for the evaluation of KBS is proposed. The proposed benchmark addresses the foundational blocks of an ontology benchmark, i.e., data schema, workload, and performance metrics. The benchmark is tested on memory-based, file-based, relational-database, and graph-based KBS for performance and scalability measures. The results show that the proposed benchmark is able to evaluate the behaviour of different state-of-the-art KBS on OWL2 semantics. On the basis of the results, end users (i.e., domain experts) would be able to select a KBS appropriate for their domain.

  14. Comparing a Coevolutionary Genetic Algorithm for Multiobjective Optimization

    NASA Technical Reports Server (NTRS)

    Lohn, Jason D.; Kraus, William F.; Haith, Gary L.; Clancy, Daniel (Technical Monitor)

    2002-01-01

    We present results from a study comparing a recently developed coevolutionary genetic algorithm (CGA) against a set of evolutionary algorithms using a suite of multiobjective optimization benchmarks. The CGA embodies competitive coevolution and employs a simple, straightforward target population representation and a fitness calculation based on developmental learning theory. Because of these properties, setting up the additional population is trivial, making implementation no more difficult than using a standard GA. Empirical results using a suite of two-objective test functions indicate that this CGA performs well at finding solutions on convex, nonconvex, discrete, and deceptive Pareto-optimal fronts, while giving respectable results on a nonuniform front. On a multimodal Pareto front, the CGA finds a solution that dominates solutions produced by eight other algorithms, yet it has poor coverage across the Pareto front.

  15. Benchmark Data Set for Wheat Growth Models: Field Experiments and AgMIP Multi-Model Simulations.

    NASA Technical Reports Server (NTRS)

    Asseng, S.; Ewert, F.; Martre, P.; Rosenzweig, C.; Jones, J. W.; Hatfield, J. L.; Ruane, A. C.; Boote, K. J.; Thorburn, P.J.; Rotter, R. P.

    2015-01-01

    The data set includes a current representative management treatment from detailed, quality-tested sentinel field experiments with wheat from four contrasting environments: Australia, The Netherlands, India, and Argentina. Measurements include local daily climate data (solar radiation, maximum and minimum temperature, precipitation, surface wind, dew point temperature, relative humidity, and vapor pressure), soil characteristics, frequent growth measurements, nitrogen in crop and soil, crop and soil water, and yield components. Simulations include results from 27 wheat models and a sensitivity analysis with 26 models over 30 years (1981-2010) for each location, for elevated atmospheric CO2 and temperature changes, a heat stress sensitivity analysis at anthesis, and a sensitivity analysis with soil and crop management variations and a Global Climate Model end-century scenario.

  16. Predicting Presynaptic and Postsynaptic Neurotoxins by Developing Feature Selection Technique

    PubMed Central

    Yang, Yunchun; Zhang, Chunmei; Chen, Rong; Huang, Po

    2017-01-01

    Presynaptic and postsynaptic neurotoxins are proteins that act at the presynaptic and postsynaptic membranes, respectively. Correctly predicting presynaptic and postsynaptic neurotoxins will provide important clues for drug-target discovery and drug design. In this study, we developed a theoretical method to discriminate presynaptic neurotoxins from postsynaptic neurotoxins. A strict and objective benchmark dataset was constructed to train and test the proposed model. The dipeptide composition was used to formulate neurotoxin samples. Analysis of variance (ANOVA) was used to identify the optimal feature set producing the maximum accuracy. In the jackknife cross-validation test, an overall accuracy of 94.9% was achieved. We believe that the proposed model will provide important information for the study of neurotoxins. PMID:28303250
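The two ingredients named above, the 400-dimensional dipeptide composition and per-feature ANOVA scoring, can be sketched generically (this is a standard formulation, not the authors' implementation):

```python
from collections import Counter
from itertools import product

AA = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AA, repeat=2)]

def dipeptide_composition(seq):
    # 400-dimensional normalized dipeptide frequency vector for a protein.
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return [counts[dp] / total for dp in DIPEPTIDES]

def anova_f(values_a, values_b):
    # One-way ANOVA F statistic for a single feature across two classes;
    # features are ranked by F and the top-ranked subset is retained.
    na, nb = len(values_a), len(values_b)
    ma, mb = sum(values_a) / na, sum(values_b) / nb
    grand = (sum(values_a) + sum(values_b)) / (na + nb)
    between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2  # df = 1
    within = (sum((v - ma) ** 2 for v in values_a)
              + sum((v - mb) ** 2 for v in values_b)) / (na + nb - 2)
    return between / within if within else float("inf")
```

Ranking features by this F statistic and growing the feature set until accuracy peaks is the usual incremental-feature-selection pattern the abstract alludes to.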

  17. Discriminant analysis for fast multiclass data classification through regularized kernel function approximation.

    PubMed

    Ghorai, Santanu; Mukherjee, Anirban; Dutta, Pranab K

    2010-06-01

    In this brief we propose multiclass data classification by a computationally inexpensive discriminant analysis through vector-valued regularized kernel function approximation (VVRKFA). VVRKFA, an extension of fast regularized kernel function approximation (FRKFA), provides the vector-valued response in a single step. VVRKFA finds a linear operator and a bias vector by using a reduced kernel that maps a pattern from feature space into a low-dimensional label space. The classification of patterns is carried out in this low-dimensional label subspace: a test pattern is classified according to its proximity to the class centroids. The effectiveness of the proposed method is experimentally verified and compared with the multiclass support vector machine (SVM) on several benchmark data sets as well as on gene microarray data for multi-category cancer classification. The results indicate a significant improvement in both training and testing time compared to multiclass SVM, with comparable testing accuracy, principally on large data sets. Experiments in this brief also compare the performance of VVRKFA with stratified random sampling and sub-sampling.
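The final classification step, assigning each mapped test pattern to the nearest class centroid in the label space, is cheap; a minimal sketch (plain Euclidean distance for brevity, whereas the full method measures proximity in the mapped label space; names are illustrative):

```python
import math

def nearest_centroid_predict(projected_test, centroids):
    """Assign each projected test pattern to the class whose centroid is
    nearest in the low-dimensional label space.

    projected_test: iterable of coordinate tuples (already mapped patterns).
    centroids: dict mapping class label -> centroid coordinate tuple.
    """
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    return [min(centroids, key=lambda c: dist(x, centroids[c]))
            for x in projected_test]
```

Because the label space has as many dimensions as there are classes (minus one), this step scales with the number of classes rather than the training set size, which is where the reported testing-time advantage over multiclass SVM comes from.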

  18. Using scientific evidence to improve hospital library services: Southern Chapter/Medical Library Association journal usage study.

    PubMed

    Dee, C R; Rankin, J A; Burns, C A

    1998-07-01

    Journal usage studies, which are useful for budget management and for evaluating collection performance relative to library use, have generally described a single library or subject discipline. The Southern Chapter/Medical Library Association (SC/MLA) study has examined journal usage at the aggregate data level with the long-term goal of developing hospital library benchmarks for journal use. Thirty-six SC/MLA hospital libraries, categorized for the study by size as small, medium, or large, reported current journal title use centrally for a one-year period following standardized data collection procedures. Institutional and aggregate data were analyzed for the average annual frequency of use, average costs per use and non-use, and average percent of non-used titles. Permutation F-type tests were used to measure difference among the three hospital groups. Averages were reported for each data set analysis. Statistical tests indicated no significant differences between the hospital groups, suggesting that benchmarks can be derived applying to all types of hospital libraries. The unanticipated lack of commonality among heavily used titles pointed to a need for uniquely tailored collections. Although the small sample size precluded definitive results, the study's findings constituted a baseline of data that can be compared against future studies.
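A permutation F-type test of the kind used to compare the three hospital groups can be sketched generically (this is a textbook permutation scheme, not the study's exact procedure; constant degree-of-freedom factors are omitted since they cancel under permutation):

```python
import random

def permutation_f_test(groups, n_perm=2000, seed=42):
    """Shuffle group labels many times; the p-value is the fraction of
    permuted F-type statistics at least as large as the observed one."""
    def f_stat(gs):
        all_vals = [v for g in gs for v in g]
        grand = sum(all_vals) / len(all_vals)
        between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in gs)
        within = sum((v - sum(g) / len(g)) ** 2 for g in gs for v in g)
        return between / within if within else float("inf")

    observed = f_stat(groups)
    sizes = [len(g) for g in groups]
    pool = [v for g in groups for v in g]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        it = iter(pool)
        permuted = [[next(it) for _ in range(n)] for n in sizes]
        if f_stat(permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A large p-value, as reported in the study, means the observed between-group spread is unremarkable relative to random relabelings, i.e., no significant difference between small, medium, and large hospital libraries.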

  19. Using scientific evidence to improve hospital library services: Southern Chapter/Medical Library Association journal usage study.

    PubMed Central

    Dee, C R; Rankin, J A; Burns, C A

    1998-01-01

    BACKGROUND: Journal usage studies, which are useful for budget management and for evaluating collection performance relative to library use, have generally described a single library or subject discipline. The Southern Chapter/Medical Library Association (SC/MLA) study has examined journal usage at the aggregate data level with the long-term goal of developing hospital library benchmarks for journal use. METHODS: Thirty-six SC/MLA hospital libraries, categorized for the study by size as small, medium, or large, reported current journal title use centrally for a one-year period following standardized data collection procedures. Institutional and aggregate data were analyzed for the average annual frequency of use, average costs per use and non-use, and average percent of non-used titles. Permutation F-type tests were used to measure difference among the three hospital groups. RESULTS: Averages were reported for each data set analysis. Statistical tests indicated no significant differences between the hospital groups, suggesting that benchmarks can be derived applying to all types of hospital libraries. The unanticipated lack of commonality among heavily used titles pointed to a need for uniquely tailored collections. CONCLUSION: Although the small sample size precluded definitive results, the study's findings constituted a baseline of data that can be compared against future studies. PMID:9681164

  20. Optimization and experimental validation of a thermal cycle that maximizes entropy coefficient fisher identifiability for lithium iron phosphate cells

    NASA Astrophysics Data System (ADS)

    Mendoza, Sergio; Rothenberger, Michael; Hake, Alison; Fathy, Hosam

    2016-03-01

    This article presents a framework for optimizing the thermal cycle used to estimate a battery cell's entropy coefficient at 20% state of charge (SOC). Our goal is to maximize Fisher identifiability: a measure of the accuracy with which a parameter can be estimated. Existing protocols in the literature for estimating entropy coefficients demand excessive laboratory time. Identifiability optimization makes it possible to achieve comparable accuracy levels in a fraction of the time. This article demonstrates this result for a set of lithium iron phosphate (LFP) cells. We conduct a 24-h experiment to obtain benchmark measurements of their entropy coefficients. We then optimize a thermal cycle to maximize parameter identifiability for these cells. This optimization proceeds with respect to the coefficients of a Fourier discretization of the thermal cycle. Finally, we compare the parameters estimated using (i) the benchmark test, (ii) the optimized protocol, and (iii) a 15-h test from the literature (by Forgez et al.). The results are encouraging for two reasons. First, they confirm the simulation-based prediction that the optimized experiment can produce accurate parameter estimates in 2 h, compared to 15-24 h. Second, the optimized experiment also estimates a thermal time constant representing the effects of thermal capacitance and convection heat transfer.
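For a scalar parameter measured under additive Gaussian noise, the Fisher information being maximized reduces to a sum of squared output sensitivities; a hedged sketch (generic formulation with illustrative names, not the authors' battery model):

```python
def fisher_information(model, theta, times, sigma, h=1e-6):
    """Scalar Fisher information F = (1/sigma^2) * sum_t (dy(t; theta)/dtheta)^2
    for a measurement model y(t; theta) with i.i.d. Gaussian noise of std sigma.
    Sensitivities are computed by central finite differences with step h."""
    total = 0.0
    for t in times:
        dy = (model(t, theta + h) - model(t, theta - h)) / (2.0 * h)
        total += dy * dy
    return total / sigma ** 2
```

A larger Fisher information tightens the Cramér-Rao lower bound on the estimate's variance, which is why an input cycle optimized for identifiability can reach the same accuracy in far less laboratory time.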

  1. Assessment of the Accuracy of the Bethe-Salpeter (BSE/GW) Oscillator Strengths.

    PubMed

    Jacquemin, Denis; Duchemin, Ivan; Blondel, Aymeric; Blase, Xavier

    2016-08-09

    Aiming to assess the accuracy of the oscillator strengths determined at the BSE/GW level, we performed benchmark calculations using three complementary sets of molecules. In the first, we considered ∼80 states in Thiel's set of compounds and compared the BSE/GW oscillator strengths to recently determined ADC(3/2) and CC3 reference values. The second set includes the oscillator strengths of the low-lying states of 80 medium to large dyes for which we have determined CC2/aug-cc-pVTZ values. The third set contains 30 anthraquinones for which experimental oscillator strengths are available. We find that BSE/GW accurately reproduces the trends for all series with excellent correlation coefficients to the benchmark data and generally very small errors. Indeed, for Thiel's sets, the BSE/GW values are more accurate (using CC3 references) than both CC2 and ADC(3/2) values on both absolute and relative scales. For all three sets, BSE/GW errors also tend to be nicely spread with almost equal numbers of positive and negative deviations as compared to reference values.
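The two summary statistics invoked above, the correlation to the benchmark data and the balance of positive and negative deviations, are straightforward to compute; a minimal sketch (generic formulas, not the authors' analysis scripts):

```python
def pearson_r(pred, ref):
    # Pearson correlation between predicted and reference oscillator strengths.
    n = len(pred)
    mp, mr = sum(pred) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    vp = sum((p - mp) ** 2 for p in pred)
    vr = sum((r - mr) ** 2 for r in ref)
    return cov / (vp * vr) ** 0.5

def mean_signed_error(pred, ref):
    # Near zero when positive and negative deviations are evenly spread.
    return sum(p - r for p, r in zip(pred, ref)) / len(pred)
```

A correlation near 1 with a mean signed error near 0 is exactly the pattern the abstract reports for BSE/GW against the CC3, CC2, and experimental references.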

  2. Variation in assessment and standard setting practices across UK undergraduate medicine and the need for a benchmark.

    PubMed

    MacDougall, Margaret

    2015-10-31

    The principal aim of this study is to provide an account of variation in UK undergraduate medical assessment styles and corresponding standard setting approaches with a view to highlighting the importance of a UK national licensing exam in recognizing a common standard. Using a secure online survey system, response data were collected during the period 13 - 30 January 2014 from selected specialists in medical education assessment, who served as representatives for their respective medical schools. Assessment styles and corresponding choices of standard setting methods vary markedly across UK medical schools. While there is considerable consensus on the application of compensatory approaches, individual schools display their own nuances through use of hybrid assessment and standard setting styles, uptake of less popular standard setting techniques and divided views on norm referencing. The extent of variation in assessment and standard setting practices across UK medical schools validates the concern that there is a lack of evidence that UK medical students achieve a common standard on graduation. A national licensing exam is therefore a viable option for benchmarking the performance of all UK undergraduate medical students.

  3. Selecting Students for Pre-Algebra: Examination of the Relative Utility of the Anchorage Pre-Algebra Screening Tests and the State of Alaska Standards Based Benchmark 2 Mathematics Study. An Examination of Consequential Validity and Recommendation.

    ERIC Educational Resources Information Center

    Fenton, Ray

    This study examined the relative efficacy of the Anchorage (Alaska) Pre-Algebra Test and the State of Alaska Benchmark 2 Math examination as tools used in the process of recommending grade 6 students for grade 7 Pre-Algebra placement. The consequential validity of the tests is explored in the context of class placements and grades earned. The…

  4. Fast Neutron Spectrum Potassium Worth for Space Power Reactor Design Validation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bess, John D.; Marshall, Margaret A.; Briggs, J. Blair

    2015-03-01

    A variety of critical experiments were constructed of enriched uranium metal (oralloy) during the 1960s and 1970s at the Oak Ridge Critical Experiments Facility (ORCEF) in support of criticality safety operations at the Y-12 Plant. The purposes of these experiments included the evaluation of storage, casting, and handling limits for the Y-12 Plant and providing data for verification of calculation methods and cross-sections for nuclear criticality safety applications. These included solid cylinders of various diameters, annuli of various inner and outer diameters, two and three interacting cylinders of various diameters, and graphite- and polyethylene-reflected cylinders and annuli. Of the hundreds of delayed critical experiments, one consisted of uranium metal annuli surrounding a potassium-filled stainless steel can. The outer diameter of the annuli was approximately 13 inches (33.02 cm) with an inner diameter of 7 inches (17.78 cm). The diameter of the stainless steel can was 7 inches (17.78 cm). The critical height of the configurations was approximately 5.6 inches (14.224 cm). The uranium annulus consisted of multiple stacked rings, each with a radial thickness of 1 inch (2.54 cm) and varying heights. A companion measurement was performed using empty stainless steel cans; the primary purpose of these experiments was to test the fast neutron cross sections of potassium, which was a candidate coolant in some early space power reactor designs. The experimental measurements were performed on July 11, 1963, by J. T. Mihalczo and M. S. Wyatt (Ref. 1), with additional information in the corresponding logbook. Unreflected and unmoderated experiments with the same set of highly enriched uranium metal parts were performed at the Oak Ridge Critical Experiments Facility in the 1960s and are evaluated in the International Handbook of Evaluated Criticality Safety Benchmark Experiments (ICSBEP Handbook) with the identifier HEU-MET-FAST-051.
    Thin graphite-reflected (2 inches or less) experiments, also using the same set of highly enriched uranium metal parts, are evaluated in HEU-MET-FAST-071. Polyethylene-reflected configurations are evaluated in HEU-MET-FAST-076. A stack of highly enriched metal discs with a thick beryllium top reflector is evaluated in HEU-MET-FAST-069, and two additional highly enriched uranium annuli with beryllium cores are evaluated in HEU-MET-FAST-059. Both detailed and simplified model specifications are provided in this evaluation. Both of these fast neutron spectrum assemblies were determined to be acceptable benchmark experiments. The calculated eigenvalues for both the detailed and the simple benchmark models are within ~0.26% of the benchmark values for Configuration 1 (calculations performed using MCNP6 with ENDF/B-VII.1 neutron cross section data), but under-calculate the benchmark values by ~7σ because the uncertainty in the benchmark is very small: ~0.0004 (1σ); for Configuration 2, the under-calculation is ~0.31% and ~8σ. Comparison of detailed and simple model calculations for the potassium worth measurement and potassium mass coefficient yields results approximately 70-80% lower (~6σ to 10σ) than the benchmark values for the various nuclear data libraries utilized. Both the potassium worth and mass coefficient are also deemed acceptable benchmark experiment measurements.

  5. Benchmarking protein-protein interface predictions: why you should care about protein size.

    PubMed

    Martin, Juliette

    2014-07-01

    A number of predictive methods have been developed to predict protein-protein binding sites. Each new method is traditionally benchmarked using sets of protein structures of various sizes, and global statistics are used to assess the quality of the prediction. Little attention has been paid to the potential bias due to protein size on these statistics. Indeed, small proteins involve proportionally more residues at interfaces than large ones. If a predictive method is biased toward small proteins, this can lead to an over-estimation of its performance. Here, we investigate the bias due to the size effect when benchmarking protein-protein interface prediction on the widely used docking benchmark 4.0. First, we simulate random scores that favor small proteins over large ones. Instead of the 0.5 AUC (Area Under the Curve) value expected by chance, these biased scores result in an AUC equal to 0.6 using hypergeometric distributions, and up to 0.65 using constant scores. We then use real prediction results to illustrate how to detect the size bias by shuffling, and subsequently correct it using a simple conversion of the scores into normalized ranks. In addition, we investigate the scores produced by eight published methods and show that they are all affected by the size effect, which can change their relative ranking. The size effect also has an impact on linear combination scores by modifying the relative contributions of each method. In the future, systematic corrections should be applied when benchmarking predictive methods using data sets with mixed protein sizes. © 2014 Wiley Periodicals, Inc.
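The rank-normalization correction described above is simple to apply; a minimal sketch (generic implementation of the idea, not the authors' code; ties are broken by input order for brevity):

```python
def normalized_ranks(scores):
    """Convert raw per-residue scores for one protein into ranks normalized
    to (0, 1], making score distributions comparable across proteins of
    different sizes and removing the size bias from pooled statistics."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank / n
    return ranks
```

After this conversion, a score of 1.0 means "top-ranked residue in its own protein" regardless of whether the protein has 50 residues or 500, so pooling scores across the benchmark no longer rewards methods biased toward small proteins.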

  6. Using a visual plate waste study to monitor menu performance.

    PubMed

    Connors, Priscilla L; Rozell, Sarah B

    2004-01-01

    Two visual plate waste studies were conducted in 1-week phases over a 1-year period in an acute care hospital. A total of 383 trays were evaluated in the first phase and 467 in the second. Food items were ranked for consumption from a low (1) to high (6) score, with a score of 4.0 set as the benchmark denoting a minimum level of acceptable consumption. In the first phase two entrees, four starches, all of the vegetables, sliced white bread, and skim milk scored below the benchmark. As a result six menu items were replaced and one was modified. In the second phase all entrees scored at or above 4.0, as did seven vegetables, and a dinner roll that replaced sliced white bread. Skim milk continued to score below the benchmark. A visual plate waste study assists in benchmarking performance, planning menu changes, and assessing effectiveness.
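The benchmarking step amounts to averaging the 1-6 visual scores per menu item and flagging items below the 4.0 threshold; a minimal sketch (assuming scores are collected per item, with illustrative names):

```python
def below_benchmark_items(tray_scores, benchmark=4.0):
    """Average the 1-6 visual consumption scores for each menu item and
    return those falling below the acceptable-consumption benchmark.

    tray_scores: dict mapping item name -> list of per-tray scores."""
    flagged = {}
    for item, scores in tray_scores.items():
        mean = sum(scores) / len(scores)
        if mean < benchmark:
            flagged[item] = round(mean, 2)
    return flagged
```

Items returned by such a check are the candidates for replacement or modification in the next menu cycle, mirroring the workflow between the two study phases.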

  7. Results of the GABLS3 diurnal-cycle benchmark for wind energy applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodrigo, J. Sanz; Allaerts, D.; Avila, M.

    We present results of the GABLS3 model intercomparison benchmark revisited for wind energy applications. The case consists of a diurnal cycle, measured at the 200-m tall Cabauw tower in the Netherlands, including a nocturnal low-level jet. The benchmark includes a sensitivity analysis of WRF simulations using two input meteorological databases and five planetary boundary-layer schemes. A reference set of mesoscale tendencies is used to drive microscale simulations using RANS k-ϵ and LES turbulence models. The validation is based on rotor-based quantities of interest. Cycle-integrated mean absolute errors are used to quantify model performance. The results of the benchmark are used to discuss input uncertainties from mesoscale modelling, different meso-micro coupling strategies (online vs offline) and consistency between RANS and LES codes when dealing with boundary-layer mean flow quantities. Altogether, all the microscale simulations produce a consistent coupling with mesoscale forcings.

  8. Results of the GABLS3 diurnal-cycle benchmark for wind energy applications

    DOE PAGES

    Rodrigo, J. Sanz; Allaerts, D.; Avila, M.; ...

    2017-06-13

    We present results of the GABLS3 model intercomparison benchmark revisited for wind energy applications. The case consists of a diurnal cycle, measured at the 200-m tall Cabauw tower in the Netherlands, including a nocturnal low-level jet. The benchmark includes a sensitivity analysis of WRF simulations using two input meteorological databases and five planetary boundary-layer schemes. A reference set of mesoscale tendencies is used to drive microscale simulations using RANS k-ϵ and LES turbulence models. The validation is based on rotor-based quantities of interest. Cycle-integrated mean absolute errors are used to quantify model performance. The results of the benchmark are used to discuss input uncertainties from mesoscale modelling, different meso-micro coupling strategies (online vs offline) and consistency between RANS and LES codes when dealing with boundary-layer mean flow quantities. Altogether, all the microscale simulations produce a consistent coupling with mesoscale forcings.

  9. MIPS bacterial genomes functional annotation benchmark dataset.

    PubMed

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  10. A benchmark study of the sea-level equation in GIA modelling

    NASA Astrophysics Data System (ADS)

    Martinec, Zdenek; Klemann, Volker; van der Wal, Wouter; Riva, Riccardo; Spada, Giorgio; Simon, Karen; Blank, Bas; Sun, Yu; Melini, Daniele; James, Tom; Bradley, Sarah

    2017-04-01

    The sea-level load in glacial isostatic adjustment (GIA) is described by the so-called sea-level equation (SLE), which represents the mass redistribution between ice sheets and oceans on a deforming earth. Various levels of complexity of the SLE have been proposed in the past, ranging from a simple mean global sea level (the so-called eustatic sea level) to the load with a deforming ocean bottom, migrating coastlines, and a changing shape of the geoid. Several approaches to solve the SLE have been derived, from purely analytical formulations to fully numerical methods. Despite various teams independently investigating GIA, there has been no systematic intercomparison amongst the solvers through which the methods may be validated. The goal of this paper is to present a series of benchmark experiments designed for testing and comparing numerical implementations of the SLE. Our approach starts with simple load cases, even though the benchmark will not result in GIA predictions for a realistic loading scenario. In the longer term we aim for a benchmark with a realistic loading scenario, and also for benchmark solutions with rotational feedback. The current benchmark uses an earth model for which Love numbers have been computed and benchmarked in Spada et al. (2011). In spite of the significant differences in the numerical methods employed, the test computations performed so far show a satisfactory agreement between the results provided by the participants. The differences found can often be attributed to the different approximations inherent to the various algorithms. Reference: G. Spada, V. R. Barletta, V. Klemann, R. E. M. Riva, Z. Martinec, P. Gasperini, B. Lund, D. Wolf, L. L. A. Vermeersen, and M. A. King, 2011. A benchmark study for glacial isostatic adjustment codes. Geophys. J. Int. 185: 106-132 doi:10.1111/j.1365-

  11. Experimental flutter boundaries with unsteady pressure distributions for the NACA 0012 Benchmark Model

    NASA Technical Reports Server (NTRS)

    Rivera, Jose A., Jr.; Dansberry, Bryan E.; Farmer, Moses G.; Eckstrom, Clinton V.; Seidel, David A.; Bennett, Robert M.

    1991-01-01

    The Structural Dynamics Division at NASA Langley has started a wind tunnel activity referred to as the Benchmark Models Program. The objective is to acquire test data that will be useful for developing and evaluating aeroelastic Computational Fluid Dynamics codes currently in use or under development. The progress achieved in testing the first model in the Benchmark Models Program is described. Experimental flutter boundaries are presented for a rigid semispan model (NACA 0012 airfoil section) mounted on a flexible mount system. Steady and unsteady pressure measurements taken at the flutter condition are also presented. The pressure data were acquired over the entire model chord at the 60% span station.

  12. A Multi-Verse Optimizer with Levy Flights for Numerical Optimization and Its Application in Test Scheduling for Network-on-Chip.

    PubMed

    Hu, Cong; Li, Zhi; Zhou, Tian; Zhu, Aijun; Xu, Chuanpei

    2016-01-01

    We propose a new meta-heuristic algorithm named Levy flights multi-verse optimizer (LFMVO), which incorporates Levy flights into the multi-verse optimizer (MVO) algorithm to solve numerical and engineering optimization problems. The original MVO easily falls into stagnation when wormholes stochastically re-span a number of universes (solutions) around the best universe achieved over the course of iterations. Since Levy flights are superior at exploring unknown, large-scale search spaces, they are integrated into the previous best universe to force MVO out of stagnation. We test this method on three sets of 23 well-known benchmark test functions and an NP-complete problem of test scheduling for Network-on-Chip (NoC). Experimental results show that the proposed LFMVO is more competitive than its peers in both the quality of the resulting solutions and convergence speed.
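Lévy-flight steps are commonly generated with Mantegna's algorithm; a hedged sketch of how such steps can perturb candidates around the best universe (a generic illustration of the technique, not the LFMVO authors' code):

```python
import math
import random

def levy_step(beta=1.5):
    """One Lévy-flight step via Mantegna's algorithm: step = u / |v|^(1/beta),
    with u ~ N(0, sigma_u^2) and v ~ N(0, 1). The heavy-tailed distribution
    yields mostly small moves plus occasional long jumps, which help a
    stagnated population escape the neighborhood of the current best."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta
                  * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0.0, sigma_u)
    v = random.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def perturb_universe(best, scale=0.01, beta=1.5):
    # Move a candidate solution around the best universe with Lévy steps.
    return [x + scale * levy_step(beta) for x in best]
```

Replacing the small Gaussian-style perturbations around the best universe with steps like these is the essential modification the abstract describes.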

  14. Toward Automated Benchmarking of Atomistic Force Fields: Neat Liquid Densities and Static Dielectric Constants from the ThermoML Data Archive.

    PubMed

    Beauchamp, Kyle A; Behr, Julie M; Rustenburg, Ariën S; Bayly, Christopher I; Kroenlein, Kenneth; Chodera, John D

    2015-10-08

    Atomistic molecular simulations are a powerful way to make quantitative predictions, but the accuracy of these predictions depends entirely on the quality of the force field employed. Although experimental measurements of fundamental physical properties offer a straightforward approach for evaluating force field quality, the bulk of this information has been tied up in formats that are not machine-readable. Compiling benchmark data sets of physical properties from non-machine-readable sources requires substantial human effort and is prone to the accumulation of human errors, hindering the development of reproducible benchmarks of force-field accuracy. Here, we examine the feasibility of benchmarking atomistic force fields against the NIST ThermoML data archive of physicochemical measurements, which aggregates thousands of experimental measurements in a portable, machine-readable, self-annotating IUPAC-standard format. As a proof of concept, we present a detailed benchmark of the generalized Amber small-molecule force field (GAFF) using the AM1-BCC charge model against experimental measurements (specifically, bulk liquid densities and static dielectric constants at ambient pressure) automatically extracted from the archive and discuss the extent of data available for use in larger scale (or continuously performed) benchmarks. The results of even this limited initial benchmark highlight a general problem with fixed-charge force fields in the representation of low-dielectric environments, such as those seen in binding cavities or biological membranes.
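    A benchmark of this kind ultimately reduces to error statistics between predicted and measured properties. A minimal sketch, with placeholder density values rather than actual ThermoML data:

```python
import math

def benchmark_errors(predicted, experimental):
    """Summary statistics for a force-field benchmark:
    RMS error and mean signed error (bias)."""
    diffs = [p - e for p, e in zip(predicted, experimental)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    bias = sum(diffs) / len(diffs)
    return rms, bias

# Hypothetical neat-liquid densities in g/cm^3 (not ThermoML values)
rms, bias = benchmark_errors([0.79, 1.05, 0.87], [0.78, 1.00, 0.90])
```

    Separating the bias from the RMS error distinguishes a systematic offset of the force field from its scatter across compounds.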

  15. Benchmarking: a method for continuous quality improvement in health.

    PubMed

    Ettorchi-Tardy, Amina; Levif, Marie; Michel, Philippe

    2012-05-01

    Benchmarking, a management approach for implementing best practices at best cost, is a recent concept in the healthcare system. The objectives of this paper are to better understand the concept and its evolution in the healthcare sector, to propose an operational definition, and to describe some French and international experiences of benchmarking in the healthcare sector. To this end, we reviewed the literature on this approach's emergence in the industrial sector, its evolution, its fields of application and examples of how it has been used in the healthcare sector. Benchmarking is often thought to consist simply of comparing indicators and is not perceived in its entirety, that is, as a tool based on voluntary and active collaboration among several organizations to create a spirit of competition and to apply best practices. The key feature of benchmarking is its integration within a comprehensive and participatory policy of continuous quality improvement (CQI). Conditions for successful benchmarking focus essentially on careful preparation of the process, monitoring of the relevant indicators, staff involvement and inter-organizational visits. Compared to methods previously implemented in France (CQI and collaborative projects), benchmarking has specific features that set it apart as a healthcare innovation. This is especially true for healthcare or medical-social organizations, as the principle of inter-organizational visiting is not part of their culture. Thus, this approach will need to be assessed for feasibility and acceptability before it is more widely promoted.

  16. Benchmarking of venous thromboembolism prophylaxis practice with ENT.UK guidelines.

    PubMed

    Al-Qahtani, Ali S

    2017-05-01

    The aim of this study was to benchmark our guidelines for the prevention of venous thromboembolism (VTE) in the ENT surgical population against ENT.UK guidelines, and to encourage healthcare providers to utilize benchmarking as an effective method of improving performance. The study design is a prospective descriptive analysis. The setting of this study is a tertiary referral centre (Assir Central Hospital, Abha, Saudi Arabia). In this study, we benchmark our practice guidelines for the prevention of VTE in the ENT surgical population against ENT.UK guidelines to identify and mitigate any gaps. The ENT.UK 2010 guidelines were downloaded from the ENT.UK website. Our guidelines were compared against them to determine whether our performance meets or falls short of the ENT.UK guidelines, with immediate corrective action to be taken if a quality chasm were found between the two. ENT.UK guidelines are evidence-based and up to date, and may serve as a role model for adoption and benchmarking. Our guidelines were accordingly amended to contain all factors required to provide a quality service to ENT surgical patients. Although often not given appropriate attention, benchmarking is a useful tool for improving the quality of health care. It allows learning from others' practices and experiences, and works towards closing any quality gaps. In addition, benchmarking clinical outcomes is critical for quality improvement and for informing decisions concerning service provision. It is recommended for inclusion in the list of quality improvement methods of healthcare services.

  17. Groundwater-quality data in the Santa Cruz, San Gabriel, and Peninsular Ranges Hard Rock Aquifers study unit, 2011-2012: results from the California GAMA program

    USGS Publications Warehouse

    Davis, Tracy A.; Shelton, Jennifer L.

    2014-01-01

    Results for constituents with nonregulatory benchmarks set for aesthetic concerns showed that iron concentrations greater than the CDPH secondary maximum contaminant level (SMCL-CA) of 300 μg/L were detected in samples from 19 grid wells. Manganese concentrations greater than the SMCL-CA of 50 μg/L were detected in 27 grid wells. Chloride was detected at a concentration greater than the SMCL-CA upper benchmark of 500 mg/L in one grid well. TDS concentrations in three grid wells were greater than the SMCL-CA upper benchmark of 1,000 mg/L.
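    The exceedance counts reported above amount to comparing each measured concentration against its benchmark threshold. A sketch using the SMCL-CA values quoted in the text; the sample concentrations are hypothetical:

```python
# Nonregulatory aesthetic benchmarks cited in the text (SMCL-CA)
BENCHMARKS = {
    "iron_ug_per_L": 300,
    "manganese_ug_per_L": 50,
    "chloride_mg_per_L": 500,   # upper benchmark
    "tds_mg_per_L": 1000,       # upper benchmark
}

def exceedances(sample):
    """Return the constituents whose concentration exceeds its benchmark."""
    return [name for name, value in sample.items()
            if name in BENCHMARKS and value > BENCHMARKS[name]]

# Hypothetical grid-well sample
flagged = exceedances({"iron_ug_per_L": 450, "manganese_ug_per_L": 12,
                       "chloride_mg_per_L": 610})
```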

  18. Data Race Benchmark Collection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua

    2017-03-21

    This project is a benchmark suite of OpenMP parallel codes that have been checked for data races. The programs are marked to show which do and do not have races. This allows them to be leveraged while testing and developing race detection tools.

  19. Sparse Solutions for Single Class SVMs: A Bi-Criterion Approach

    NASA Technical Reports Server (NTRS)

    Das, Santanu; Oza, Nikunj C.

    2011-01-01

    In this paper we propose an innovative learning algorithm - a variation of the one-class nu Support Vector Machines (SVMs) learning algorithm - to produce sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class SVM algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class nu SVMs while reducing both training time and test time by several factors.

  20. Comparative study on gene set and pathway topology-based enrichment methods.

    PubMed

    Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim

    2015-10-22

    Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list, disregarding any knowledge of gene or protein interactions. In contrast, the new group of so-called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis, both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which show considerable gene overlap with each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods; however, their sensitivity was lower. We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways; however, they were not conclusively better in the other scenarios. This suggests that a simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps.
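    The "simple gene list" enrichment approach the authors compare against is commonly implemented as a one-sided hypergeometric test on the overlap between a pathway and the differentially expressed genes. A minimal sketch with made-up counts:

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).
    N: genes in the background, K: genes in the pathway,
    n: differentially expressed genes, k: observed overlap."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Hypothetical counts: 10 of 500 DE genes fall in a 100-gene pathway
p = hypergeom_enrichment_p(N=20000, K=100, n=500, k=10)
```

    A small p indicates the overlap is larger than expected by chance; note this treats the pathway as a flat gene list, exactly the simplification the topology-based methods try to avoid.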

  1. Benchmarking facilities providing care: An international overview of initiatives

    PubMed Central

    Thonon, Frédérique; Watson, Jonathan; Saghatchian, Mahasti

    2015-01-01

    We performed a literature review of existing benchmarking projects of health facilities to explore (1) the rationales for those projects, (2) the motivation for health facilities to participate, (3) the indicators used and (4) the success and threat factors linked to those projects. We studied both peer-reviewed and grey literature. We examined 23 benchmarking projects of different medical specialities. The majority of projects used a mix of structure, process and outcome indicators. For some projects, participants had a direct or indirect financial incentive to participate (such as reimbursement by Medicaid/Medicare or litigation costs related to quality of care). A positive impact was reported for most projects, mainly in terms of improvement of practice and adoption of guidelines and, to a lesser extent, improvement in communication. Only 1 project reported positive impact in terms of clinical outcomes. Success factors and threats are linked to both the benchmarking process (such as organisation of meetings, link with existing projects) and indicators used (such as adjustment for diagnostic-related groups). The results of this review will help coordinators of a benchmarking project to set it up successfully. PMID:26770800

  2. GEN-IV Benchmarking of Triso Fuel Performance Models under accident conditions modeling input data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Collin, Blaise Paul

    This document presents the benchmark plan for the calculation of particle fuel performance on safety testing experiments that are representative of operational accident transients. The benchmark is dedicated to the modeling of fission product release under accident conditions by fuel performance codes from around the world, and the subsequent comparison to post-irradiation examination (PIE) data from the modeled heating tests. The accident condition benchmark is divided into three parts: • The modeling of a simplified benchmark problem to assess potential numerical calculation issues at low fission product release. • The modeling of the AGR-1 and HFR-EU1bis safety testing experiments. • The comparison of the AGR-1 and HFR-EU1bis modeling results with PIE data. The simplified benchmark case, hereafter named NCC (Numerical Calculation Case), is derived from “Case 5” of the International Atomic Energy Agency (IAEA) Coordinated Research Program (CRP) on coated particle fuel technology [IAEA 2012]. It is included so participants can evaluate their codes at low fission product release. “Case 5” of the IAEA CRP-6 showed large code-to-code discrepancies in the release of fission products, which were attributed to “effects of the numerical calculation method rather than the physical model” [IAEA 2012]. The NCC is therefore intended to check whether these numerical effects persist. The first two steps involve the benchmark participants in a modeling effort following the guidelines and recommendations provided by this document. The third step involves the collection of the modeling results by Idaho National Laboratory (INL) and their comparison with the available PIE data. The objective of this document is to provide all input data necessary to model the benchmark cases, and to give some methodology guidelines and recommendations in order to make all results suitable for comparison with each other. The participants should read this document thoroughly to make sure all the data needed for their calculations is provided in the document. Missing data will be added to a revision of the document if necessary. 09/2016: Tables 6 and 8 updated; AGR-2 input data added.

  3. Generation IV benchmarking of TRISO fuel performance models under accident conditions: Modeling input data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Collin, Blaise P.

    2014-09-01

    This document presents the benchmark plan for the calculation of particle fuel performance on safety testing experiments that are representative of operational accident transients. The benchmark is dedicated to the modeling of fission product release under accident conditions by fuel performance codes from around the world, and the subsequent comparison to post-irradiation examination (PIE) data from the modeled heating tests. The accident condition benchmark is divided into three parts: the modeling of a simplified benchmark problem to assess potential numerical calculation issues at low fission product release; the modeling of the AGR-1 and HFR-EU1bis safety testing experiments; and the comparison of the AGR-1 and HFR-EU1bis modeling results with PIE data. The simplified benchmark case, hereafter named NCC (Numerical Calculation Case), is derived from "Case 5" of the International Atomic Energy Agency (IAEA) Coordinated Research Program (CRP) on coated particle fuel technology [IAEA 2012]. It is included so participants can evaluate their codes at low fission product release. "Case 5" of the IAEA CRP-6 showed large code-to-code discrepancies in the release of fission products, which were attributed to "effects of the numerical calculation method rather than the physical model" [IAEA 2012]. The NCC is therefore intended to check whether these numerical effects persist. The first two steps involve the benchmark participants in a modeling effort following the guidelines and recommendations provided by this document. The third step involves the collection of the modeling results by Idaho National Laboratory (INL) and their comparison with the available PIE data. The objective of this document is to provide all input data necessary to model the benchmark cases, and to give some methodology guidelines and recommendations in order to make all results suitable for comparison with each other. The participants should read this document thoroughly to make sure all the data needed for their calculations is provided in the document. Missing data will be added to a revision of the document if necessary.
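    Fission-product release from fuel under heating is classically approximated by the Booth equivalent-sphere diffusion model; the sketch below uses that textbook series solution as an illustration only - it is not the benchmark's specified model, and the reduced-time values are invented:

```python
import math

def booth_release_fraction(d_prime_t, terms=200):
    """Fractional fission-product release from an equivalent sphere
    (Booth diffusion model). d_prime_t = D*t/a^2 is dimensionless
    reduced time; the series is truncated after `terms` terms."""
    s = sum(math.exp(-n * n * math.pi ** 2 * d_prime_t) / (n * n)
            for n in range(1, terms + 1))
    return 1.0 - (6.0 / math.pi ** 2) * s

# Release grows monotonically with reduced heating time (illustrative values)
fractions = [booth_release_fraction(x) for x in (1e-4, 1e-3, 1e-2, 1e-1)]
```

    Code-to-code comparisons such as this benchmark are sensitive to exactly how each code truncates and evaluates series like this one at low release fractions, which is what the NCC step probes.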

  4. TRUST. I. A 3D externally illuminated slab benchmark for dust radiative transfer

    NASA Astrophysics Data System (ADS)

    Gordon, K. D.; Baes, M.; Bianchi, S.; Camps, P.; Juvela, M.; Kuiper, R.; Lunttila, T.; Misselt, K. A.; Natale, G.; Robitaille, T.; Steinacker, J.

    2017-07-01

    Context. The radiative transport of photons through arbitrary three-dimensional (3D) structures of dust is a challenging problem due to the anisotropic scattering of dust grains and strong coupling between different spatial regions. The radiative transfer problem in 3D is solved using Monte Carlo or Ray Tracing techniques as no full analytic solution exists for the true 3D structures. Aims: We provide the first 3D dust radiative transfer benchmark composed of a slab of dust with uniform density externally illuminated by a star. This simple 3D benchmark is explicitly formulated to provide tests of the different components of the radiative transfer problem including dust absorption, scattering, and emission. Methods: The details of the external star, the slab itself, and the dust properties are provided. This benchmark includes models with a range of dust optical depths fully probing cases that are optically thin at all wavelengths to optically thick at most wavelengths. The dust properties adopted are characteristic of the diffuse Milky Way interstellar medium. This benchmark includes solutions for the full dust emission including single photon (stochastic) heating as well as two simplifying approximations: One where all grains are considered in equilibrium with the radiation field and one where the emission is from a single effective grain with size-distribution-averaged properties. A total of six Monte Carlo codes and one Ray Tracing code provide solutions to this benchmark. Results: The solution to this benchmark is given as global spectral energy distributions (SEDs) and images at select diagnostic wavelengths from the ultraviolet through the infrared. Comparison of the results revealed that the global SEDs are consistent on average to a few percent for all but the scattered stellar flux at very high optical depths. The image results are consistent within 10%, again except for the stellar scattered flux at very high optical depths. 
The lack of agreement among different codes on the scattered flux at high optical depths is quantified for the first time. Convergence tests using one of the Monte Carlo codes illustrate the sensitivity of the solutions to various model parameters. Conclusions: We provide the first 3D dust radiative transfer benchmark and validate its accuracy through comparisons between multiple independent codes and detailed convergence tests.
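    In its simplest absorption-only limit, the Monte Carlo side of such a benchmark amounts to sampling exponential path lengths through the slab; direct transmission should then approach e^(-tau). A toy sketch (real benchmark codes also treat anisotropic scattering and dust emission, which is where the code-to-code disagreement arises):

```python
import math
import random

def transmitted_fraction(tau, n_photons=100_000, seed=0):
    """Monte Carlo estimate of direct transmission through a purely
    absorbing slab of optical depth tau (toy model only)."""
    rng = random.Random(seed)
    # A photon escapes if its sampled optical path exceeds tau
    escaped = sum(1 for _ in range(n_photons)
                  if -math.log(1.0 - rng.random()) > tau)
    return escaped / n_photons

estimate = transmitted_fraction(tau=1.0)
```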

  5. 'Sportmotorische Bestandesaufnahme': criterion- vs. norm-based reference values of fitness tests for Swiss first grade children.

    PubMed

    Tomatis, Laura; Krebs, Andreas; Siegenthaler, Jessica; Murer, Kurt; de Bruin, Eling D

    2015-01-01

    Health is closely linked to physical activity and fitness. It is therefore important to monitor fitness in children. Although many reports on physical tests have been published, data comparison between studies is an issue. This study reports Swiss first grade norm values of fitness tests and compares these with criterion reference data. A total of 10,565 boys (7.18 ± 0.42 years) and 10,204 girls (7.14 ± 0.41 years) were tested for standing long jump, plate tapping, 20-m shuttle run, lateral jump and 20-m sprint. Average values for six-, seven- and eight-year-olds were analysed and reference curves for age were constructed. Z-values were generated for comparisons with criterion references reported in the literature. Results were better for all disciplines in seven-year-old first grade children compared to six-year-old children (p < 0.01). Eight-year-old children did not perform better compared to seven-year-old children in the sprint run (p = 0.11), standing long jump (p > 0.99) and shuttle run (p = 0.43), whereas they were better in all other disciplines compared to their younger peers. The average performance of boys was better than that of girls, except for tapping at the age of 8 (p = 0.06). Differences in performance due to testing protocol and setting must be considered when test values from a first grade setting are compared to criterion-based benchmarks. In a classroom setting, younger children tended to have better results and older children tended to have worse outcomes when compared to their age group criterion reference values. Norm reference data are valid, allowing comparison with other data generated by similar test protocols applied in a classroom setting.
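    The z-value comparison against criterion references works as follows; the reference mean and standard deviation here are invented for illustration, not taken from the study:

```python
def z_value(score, ref_mean, ref_sd):
    """Standardize a child's test result against a criterion reference."""
    return (score - ref_mean) / ref_sd

# Hypothetical: a 7-year-old's standing long jump of 120 cm against an
# assumed criterion reference of mean 110 cm, SD 15 cm
z = z_value(120.0, 110.0, 15.0)
```

    Comparing distributions of such z-values across settings is what reveals the protocol effects the authors describe.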

  6. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

    PubMed

    Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

    2015-01-01

    RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension to two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights on the two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of the two metagenes. In the gene ranking stage, all the genes are ranked in a descending sequence according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
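    The ranking stage described above - ordering genes by the difference of their two learned metagene weights - can be sketched directly. The weight matrix below is invented:

```python
def rank_genes(gene_ids, weights):
    """Rank genes by descending difference of their two metagene weights.
    weights: one (w_up, w_down) pair per gene, as would come from the
    first learned factor of a two-dimensional factorization."""
    diffs = [(up - down, gid) for gid, (up, down) in zip(gene_ids, weights)]
    return [gid for _, gid in sorted(diffs, reverse=True)]

# Hypothetical weights for three genes
ranking = rank_genes(["g1", "g2", "g3"],
                     [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)])
```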

  7. Benchmarking the GW Approximation and Bethe–Salpeter Equation for Groups IB and IIB Atoms and Monoxides

    DOE PAGES

    Hung, Linda; Bruneval, Fabien; Baishya, Kopinjol; ...

    2017-04-07

    Energies from the GW approximation and the Bethe–Salpeter equation (BSE) are benchmarked against the excitation energies of transition-metal (Cu, Zn, Ag, and Cd) single atoms and monoxide anions. We demonstrate that best estimates of GW quasiparticle energies at the complete basis set limit should be obtained via extrapolation or closure relations, while numerically converged GW-BSE eigenvalues can be obtained on a finite basis set. Calculations using real-space wave functions and pseudopotentials are shown to give best-estimate GW energies that agree (up to the extrapolation error) with calculations using all-electron Gaussian basis sets. We benchmark the effects of a vertex approximation (ΓLDA) and the mean-field starting point in GW and the BSE, performing computations using a real-space, transition-space basis and scalar-relativistic pseudopotentials. Here, while no variant of GW improves on perturbative G0W0 at predicting ionization energies, G0W0ΓLDA-BSE computations give excellent agreement with experimental absorption spectra as long as off-diagonal self-energy terms are included. We also present G0W0 quasiparticle energies for the CuO–, ZnO–, AgO–, and CdO– anions, in comparison to available anion photoelectron spectra.

  9. Benchmark duration of work hours for development of fatigue symptoms in Japanese workers with adjustment for job-related stress.

    PubMed

    Suwazono, Yasushi; Dochi, Mirei; Kobayashi, Etsuko; Oishi, Mitsuhiro; Okubo, Yasushi; Tanaka, Kumihiko; Sakata, Kouichi

    2008-12-01

    The objective of this study was to calculate benchmark durations and lower 95% confidence limits for benchmark durations of working hours associated with subjective fatigue symptoms by applying the benchmark dose approach while adjusting for job-related stress using multiple logistic regression analyses. A self-administered questionnaire was completed by 3,069 male and 412 female daytime workers (age 18-67 years) in a Japanese steel company. The eight dependent variables in the Cumulative Fatigue Symptoms Index were decreased vitality, general fatigue, physical disorders, irritability, decreased willingness to work, anxiety, depressive feelings, and chronic tiredness. Independent variables were daily working hours, four subscales (job demand, job control, interpersonal relationship, and job suitability) of the Brief Job Stress Questionnaire, and other potential covariates. Using significant parameters for working hours and those for other covariates, the benchmark durations of working hours were calculated for the corresponding Index property. Benchmark response was set at 5% or 10%. Assuming a condition of worst job stress, the benchmark duration/lower 95% confidence limit for benchmark duration of working hours per day with a benchmark response of 5% or 10% were 10.0/9.4 or 11.7/10.7 (irritability) and 9.2/8.9 or 10.4/9.8 (chronic tiredness) in men and 8.9/8.4 or 9.8/8.9 (chronic tiredness) in women. The threshold amounts of working hours for fatigue symptoms under the worst job-related stress were very close to the standard daily working hours in Japan. The results strongly suggest that special attention should be paid to employees whose working hours exceed threshold amounts based on individual levels of job-related stress.
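    For a fitted logistic model, the benchmark duration at a given benchmark response inverts in closed form under the extra-risk definition. The coefficients below are illustrative, not the study's fitted values:

```python
import math

def benchmark_duration(b0, b1, bmr):
    """Exposure level where extra risk over background equals bmr,
    for a logistic model P(x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    p0 = 1.0 / (1.0 + math.exp(-b0))        # background response at x = 0
    target = p0 + bmr * (1.0 - p0)          # extra-risk definition
    return (math.log(target / (1.0 - target)) - b0) / b1

# Illustrative coefficients (not fitted to the study's data)
bmd_5 = benchmark_duration(b0=-4.0, b1=0.3, bmr=0.05)
bmd_10 = benchmark_duration(b0=-4.0, b1=0.3, bmr=0.10)
```

    A lower confidence limit on this duration (the BMDL reported in the study) would additionally require the uncertainty of the fitted coefficients.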

  10. INL Results for Phases I and III of the OECD/NEA MHTGR-350 Benchmark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gerhard Strydom; Javier Ortensi; Sonat Sen

    2013-09-01

    The Idaho National Laboratory (INL) Very High Temperature Reactor (VHTR) Technology Development Office (TDO) Methods Core Simulation group led the construction of the Organization for Economic Cooperation and Development (OECD) Modular High Temperature Gas-Cooled Reactor (MHTGR) 350 MW benchmark for comparing and evaluating prismatic VHTR analysis codes. The benchmark is sponsored by the OECD's Nuclear Energy Agency (NEA), and the project will yield a set of reference steady-state, transient, and lattice depletion problems that can be used by the Department of Energy (DOE), the Nuclear Regulatory Commission (NRC), and vendors to assess their code suites. The Methods group is responsible for defining the benchmark specifications, leading the data collection and comparison activities, and chairing the annual technical workshops. This report summarizes the latest INL results for Phase I (steady state) and Phase III (lattice depletion) of the benchmark. The INSTANT, Pronghorn, and RattleSnake codes were used for the standalone core neutronics modeling of Exercise 1, and the results obtained from these codes are compared in Section 4. Exercise 2 of Phase I requires the standalone steady-state thermal fluids modeling of the MHTGR-350 design, and the results for the systems code RELAP5-3D are discussed in Section 5. The coupled neutronics and thermal fluids steady-state solution for Exercise 3 is reported in Section 6, utilizing the newly developed Parallel and Highly Innovative Simulation for INL Code System (PHISICS)/RELAP5-3D code suite. Finally, the lattice depletion models and results obtained for Phase III are compared in Section 7. The MHTGR-350 benchmark proved to be a challenging set of problems to model accurately, and even with the simplifications introduced in the benchmark specification this activity is an important step in the code-to-code verification of modern prismatic VHTR codes. A final OECD/NEA comparison report will compare the Phase I and III results of all international participants in 2014, while the remaining Phase II transient case results will be reported in 2015.

  11. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance

    PubMed Central

    Rand, Hugh; Shumway, Martin; Trees, Eija K.; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E.; Defibaugh-Chavez, Stephanie; Carleton, Heather A.; Klimke, William A.; Katz, Lee S.

    2017-01-01

    Background: As next-generation sequencing technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationships in public health. Most new programs skip the traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes and then summarizing the results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. Methods: We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and “known” phylogenetic trees in publicly accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Results: Our “outbreak” benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the “known tree” can be accurately called the “true tree”. The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets.
    Discussion: These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools; we welcome additional benchmark datasets in our recommended format and, if relevant, will add them to our GitHub site. Together, these datasets, the dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines. PMID:29372115

  12. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance.

    PubMed

    Timme, Ruth E; Rand, Hugh; Shumway, Martin; Trees, Eija K; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E; Defibaugh-Chavez, Stephanie; Carleton, Heather A; Klimke, William A; Katz, Lee S

    2017-01-01

    As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the "known tree" can be accurately called the "true tree". The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets.
    These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools; we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines.
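
    The automated downloading script described above is driven by the standard descriptive spreadsheet. As an illustration only (the real repository's table columns and script differ; see the GitHub link for the actual format), a minimal sketch of parsing such a table and checking a downloaded file against its recorded checksum:

```python
import csv
import hashlib
import io

# Hypothetical two-column miniature of a descriptive spreadsheet; the real
# benchmark tables use different, richer columns.
SPREADSHEET = "biosample_acc\tsha256\nSAMN001\tabc123\nSAMN002\tdef456\n"

def parse_dataset_table(text):
    """Return {accession: expected_checksum} from a tab-separated table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["biosample_acc"]: row["sha256"] for row in reader}

def verify_download(raw_bytes, expected_sha256):
    """True if the downloaded bytes match the checksum recorded in the table."""
    return hashlib.sha256(raw_bytes).hexdigest() == expected_sha256

table = parse_dataset_table(SPREADSHEET)
```

    Downloading itself (e.g., from the sequence archives) is omitted; the point is that a machine-readable table plus checksums makes the benchmark sets reproducible to fetch.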

  13. How do I know if my forecasts are better? Using benchmarks in hydrological ensemble prediction

    NASA Astrophysics Data System (ADS)

    Pappenberger, F.; Ramos, M. H.; Cloke, H. L.; Wetterhall, F.; Alfieri, L.; Bogner, K.; Mueller, A.; Salamon, P.

    2015-03-01

    The skill of a forecast can be assessed by comparing the relative proximity of both the forecast and a benchmark to the observations. Example benchmarks include climatology or a naïve forecast. Hydrological ensemble prediction systems (HEPS) are currently transforming the hydrological forecasting environment, but in this new field there is little information to guide researchers and operational forecasters on how benchmarks can best be used to evaluate their probabilistic forecasts. In this study, it is shown that the calculated forecast skill can vary depending on the benchmark selected, and that the selection of a benchmark for determining forecasting system skill is sensitive to a number of hydrological and system factors. A benchmark intercomparison experiment is then undertaken using the continuous ranked probability score (CRPS), a reference forecasting system and a suite of 23 different methods to derive benchmarks. The benchmarks are assessed within the operational set-up of the European Flood Awareness System (EFAS) to determine those that are 'toughest to beat' and so give the most robust discrimination of forecast skill, particularly for the spatial average fields that EFAS relies upon. Evaluating against an observed discharge proxy, the benchmark that has most utility for EFAS and avoids the most naïve skill across different hydrological situations is found to be meteorological persistency. This benchmark uses the latest meteorological observations of precipitation and temperature to drive the hydrological model. Hydrological long-term average benchmarks, which are currently used in EFAS, are very easily beaten by the forecasting system, and their use produces much naïve skill. When decomposed into seasons, the advanced meteorological benchmarks, which make use of meteorological observations from the past 20 years at the same calendar date, have the most skill discrimination.
    They are also good at discriminating skill in low flows and for all catchment sizes. Simpler meteorological benchmarks are particularly useful for high flows. Recommendations for EFAS are to move to routine use of meteorological persistency, an advanced meteorological benchmark, and a simple meteorological benchmark in order to provide a robust evaluation of forecast skill. This work provides the first comprehensive evidence on how benchmarks can be used in the evaluation of skill in probabilistic hydrological forecasts and which benchmarks are most useful for skill discrimination and avoidance of naïve skill in a large-scale HEPS. It is recommended that all HEPS use the evidence and methodology provided here to evaluate which benchmarks to employ, so that forecasters can trust their skill evaluation and be confident that their forecasts are indeed better.
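
    The core of such a skill evaluation is comparing the forecast's score against the benchmark's score. A minimal sketch using the empirical ensemble CRPS (the EFAS evaluation itself is far more elaborate; the estimator below is the standard pair-based form):

```python
def crps(ensemble, obs):
    """Empirical CRPS of an ensemble forecast for one scalar observation:
    mean |x_i - obs| minus half the mean absolute member-pair difference."""
    m = len(ensemble)
    spread_to_obs = sum(abs(x - obs) for x in ensemble) / m
    spread_internal = sum(abs(a - b) for a in ensemble for b in ensemble) / (2 * m * m)
    return spread_to_obs - spread_internal

def skill_score(crps_forecast, crps_benchmark):
    """1 is a perfect forecast, 0 matches the benchmark, negative is worse."""
    return 1.0 - crps_forecast / crps_benchmark
```

    A 'tough' benchmark (one with a low CRPS of its own) pulls the skill score down, which is exactly why the choice of benchmark changes the apparent skill of the same forecasting system.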

  14. Estimation of ΔR/R values by benchmark study of the Mössbauer isomer shifts for Ru and Os complexes using relativistic DFT calculations

    NASA Astrophysics Data System (ADS)

    Kaneko, Masashi; Yasuhara, Hiroki; Miyashita, Sunao; Nakashima, Satoru

    2017-11-01

    The present study applies all-electron relativistic DFT calculations with the Douglas-Kroll-Hess (DKH) Hamiltonian to ten compounds each of Ru and Os. We perform a benchmark investigation of three density functionals (BP86, B3LYP and B2PLYP) using the segmented all-electron relativistically contracted (SARC) basis set against the experimental Mössbauer isomer shifts for the 99Ru and 189Os nuclides. Geometry optimizations at the BP86 level of theory locate each structure in a local minimum. We calculate the contact density from the wavefunction obtained by a single-point calculation. All functionals show a good linear correlation with the experimental isomer shifts for both 99Ru and 189Os; the B3LYP functional in particular gives a stronger correlation than BP86 and B2PLYP. The comparison of contact densities between SARC and the well-tempered basis set (WTBS) indicates that numerical convergence of the contact density is not achieved, but that the reproducibility is less sensitive to the choice of basis set. We also use the benchmark results to estimate ΔR/R, an important nuclear constant, for the 99Ru and 189Os nuclides. The sign of the calculated ΔR/R values is consistent with the predicted data for 99Ru and 189Os. At the B3LYP level with the SARC basis set we obtain ΔR/R values of 2.35×10⁻⁴ for 99Ru and -0.20×10⁻⁴ for 189Os (36.2 keV).
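
    In isomer-shift calibrations of this kind, the experimental shift is regressed linearly on the calculated contact density, and ΔR/R follows from the fitted slope via a known nuclear calibration constant. A generic least-squares sketch with invented numbers (not the authors' data or constants):

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ≈ slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Toy calibration: experimental isomer shifts (mm/s) versus calculated
# contact densities (offset removed); all values are invented.
rho = [0.0, 1.0, 2.0, 3.0]
delta = [0.05, 0.25, 0.45, 0.65]
slope, intercept = fit_line(rho, delta)
# ΔR/R would then follow as slope divided by the nuclear calibration constant.
```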

  15. The surface elevation table: marker horizon method for measuring wetland accretion and elevation dynamics

    USGS Publications Warehouse

    Callaway, John C.; Cahoon, Donald R.; Lynch, James C.

    2014-01-01

    Tidal wetlands are highly sensitive to processes that affect their elevation relative to sea level. The surface elevation table–marker horizon (SET–MH) method has been used to successfully measure these processes, including sediment accretion, changes in relative elevation, and shallow soil processes (subsidence and expansion due to root production). The SET–MH method is capable of measuring changes at very high resolution (±millimeters) and has been used worldwide both in natural wetlands and under experimental conditions. Marker horizons are typically deployed using feldspar over 50- by 50-cm plots, with replicate plots at each sampling location. Plots are sampled using a liquid N2 cryocorer that freezes a small sample, allowing the handling and measurement of soft and easily compressed soils with minimal compaction. The SET instrument is a portable device that is attached to a permanent benchmark to make high-precision measurements of wetland surface elevation. The SET instrument has evolved substantially in recent decades, and the current rod SET (RSET) is widely used. For the RSET, a 15-mm-diameter stainless steel rod is pounded into the ground until substantial resistance is achieved to establish a benchmark. The SET instrument is attached to the benchmark and leveled such that it reoccupies the same reference plane in space, and pins lowered from the instrument repeatedly measure the same point on the soil surface. Changes in the height of the lowered pins reflect changes in the soil surface. Permanent or temporary platforms provide access to SET and MH locations without disturbing the wetland surface.

  16. New approaches in the indirect quantification of thermal rock properties in sedimentary basins: the well-log perspective

    NASA Astrophysics Data System (ADS)

    Fuchs, Sven; Balling, Niels; Förster, Andrea

    2016-04-01

    Numerical temperature models generated for geodynamic studies as well as for geothermal energy solutions depend heavily on rock thermal properties. Best practice for the determination of those parameters is the measurement of rock samples in the laboratory. Given the necessity to enlarge databases of subsurface rock parameters beyond drill-core measurements, an approach for the indirect determination of these parameters is developed, for rocks as well as for geological formations. We present new and universally applicable prediction equations for thermal conductivity, thermal diffusivity and specific heat capacity in sedimentary rocks derived from data provided by standard geophysical well logs. The approach is based on a data set of synthetic sedimentary rocks (clastic rocks, carbonates and evaporites) composed of mineral assemblages with variable contents of 15 major rock-forming minerals and porosities varying between 0 and 30%. Petrophysical properties are assigned to both the rock-forming minerals and the pore-filling fluids. Using multivariate statistics, relationships were then explored between each thermal property and well-logged petrophysical parameters (density, sonic interval transit time, hydrogen index, volume fraction of shale and photoelectric absorption index) on a regression subset of the data (70%) (Fuchs et al., 2015). Prediction quality was quantified on the remaining test subset (30%). The combination of three to five well-log parameters results in prediction errors of <15% for thermal conductivity and thermal diffusivity, and of <10% for specific heat capacity. Comparison of predicted and benchmark laboratory thermal conductivity from deep boreholes of the Norwegian-Danish Basin, the North German Basin, and the Molasse Basin yields uncertainties 3 to 5% larger than for the test data set.
    With regard to temperature models, calculated thermal-conductivity borehole profiles approximate measured temperature logs with an error of <3°C along a 4 km deep profile. A benchmark comparison for thermal diffusivity and specific heat capacity is pending. Reference: Fuchs, S., Balling, N., Förster, A. (2015): Calculation of thermal conductivity, thermal diffusivity and specific heat capacity of sedimentary rocks using petrophysical well logs, Geophysical Journal International 203, 1977-2000, doi:10.1093/gji/ggv403.

  17. Model fitting for small skin permeability data sets: hyperparameter optimisation in Gaussian Process Regression.

    PubMed

    Ashrafi, Parivash; Sun, Yi; Davey, Neil; Adams, Roderick G; Wilkinson, Simon C; Moss, Gary Patrick

    2018-03-01

    The aim of this study was to investigate how to improve predictions from Gaussian Process models by optimising the model hyperparameters. Optimisation methods, including Grid Search, Conjugate Gradient, Random Search, Evolutionary Algorithm and Hyper-prior, were evaluated and applied to previously published data. Data sets were also altered in a structured manner to reduce their size while retaining the range, or 'chemical space', of the key descriptors, in order to assess the effect of the data range on model quality. The Hyper-prior Smoothbox kernel results in the best models for the majority of data sets, and these exhibited significantly better performance than benchmark quantitative structure-permeability relationship (QSPR) models. When the data sets were systematically reduced in size, models built with the different optimisation methods generally retained their statistical quality, whereas benchmark QSPR models performed poorly. The design of the data set, and possibly also the approach to validation of the model, is critical in the development of improved models. The size of the data set, if carefully controlled, was not generally a significant factor for these models, and models of excellent statistical quality could be produced from substantially smaller data sets. © 2018 Royal Pharmaceutical Society.

  18. Contributions to Integral Nuclear Data in ICSBEP and IRPhEP since ND 2013

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bess, John D.; Briggs, J. Blair; Gulliford, Jim

    2016-09-01

    The status of the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPhEP) was last discussed directly with the international nuclear data community at ND2013. Since ND2013, the integral benchmark data available for nuclear data testing have continued to increase. The status of the international benchmark efforts and the latest contributions to integral nuclear data for testing are discussed. Select benchmark configurations that have been added to the ICSBEP and IRPhEP Handbooks since ND2013 are highlighted. The 2015 edition of the ICSBEP Handbook now contains 567 evaluations with benchmark specifications for 4,874 critical, near-critical, or subcritical configurations, 31 criticality alarm placement/shielding configurations with multiple dose points apiece, and 207 configurations that have been categorized as fundamental physics measurements relevant to criticality safety applications. The 2015 edition of the IRPhEP Handbook contains data from 143 different experimental series that were performed at 50 different nuclear facilities. Currently 139 of the 143 evaluations are published as approved benchmarks, with the remaining four evaluations published in draft format only. Measurements found in the IRPhEP Handbook include criticality, buckling and extrapolation length, spectral characteristics, reactivity effects, reactivity coefficients, kinetics, reaction-rate distributions, power distributions, isotopic compositions, and/or other miscellaneous types of measurements for various types of reactor systems. Annual technical review meetings for both projects were held in April 2016; additional approved benchmark evaluations will be included in the 2016 editions of these handbooks.

  19. Benchmarking Evaluation Results for Prototype Extravehicular Activity Gloves

    NASA Technical Reports Server (NTRS)

    Aitchison, Lindsay; McFarland, Shane

    2012-01-01

    The Space Suit Assembly (SSA) Development Team at NASA Johnson Space Center has invested heavily in the advancement of rear-entry planetary exploration suit design but largely deferred development of extravehicular activity (EVA) glove designs, accepting the risk of using the current flight gloves, Phase VI, for unique mission scenarios outside the Space Shuttle and International Space Station (ISS) Program realm of experience. However, as design reference missions mature, the risks of using heritage hardware have highlighted the need for developing robust new glove technologies. To address the technology gap, the NASA Game-Changing Technology group provided start-up funding for the High Performance EVA Glove (HPEG) Project in the spring of 2012. The overarching goal of the HPEG Project is to develop a robust glove design that increases human performance during EVA and creates a pathway for future implementation of emergent technologies, with specific aims of increasing pressurized mobility to 60% of barehanded capability, increasing durability by 100%, and decreasing the potential of gloves to cause injury during use. The HPEG Project focused initial efforts on identifying potential new technologies and benchmarking the performance of current state-of-the-art gloves to identify trends in design and fit, and to establish standards and metrics against which emerging technologies can be assessed at both the component and assembly levels. The first of the benchmarking tests evaluated the quantitative mobility performance and subjective fit of four prototype gloves developed by Flagsuit LLC, Final Frontier Designs, ILC Dover, and David Clark Company as compared to the Phase VI. All of the companies were asked to design and fabricate gloves to the same set of NASA-provided hand measurements (which corresponded to a single size of Phase VI glove) and to focus their efforts on improving mobility in the metacarpophalangeal and carpometacarpal joints.
    Four test subjects whose hand anthropometry matched the design measurements completed range-of-motion, grip/pinch strength, dexterity, and fit evaluations for each glove design in both the unpressurized and pressurized conditions. This paper provides a comparison of the test results along with a detailed description of the hardware and test methodologies used.

  20. Setting Achievement Targets for School Children.

    ERIC Educational Resources Information Center

    Thanassoulis, Emmanuel

    1999-01-01

    Develops an approach for setting performance targets for schoolchildren, using data-envelopment analysis to identify benchmark pupils who achieve the best observed performance (allowing for contextual factors). These pupils' achievement forms the basis of the targets estimated. The procedure also identifies appropriate role models for weaker students…

  1. Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control

    PubMed Central

    Kirwan, Jennifer A; Weber, Ralf J M; Broadhurst, David I; Viant, Mark R

    2014-01-01

    Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to maintain data quality. This dataset represents a systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts. It comprises twenty biological samples (cow vs. sheep) that were analysed repeatedly, in 8 batches across 7 days, together with a concurrent set of quality control (QC) samples. Data are presented from each step of the workflow and are available in MetaboLights. The strength of the dataset is that intra- and inter-batch variation can be corrected using QC spectra and the quality of this correction assessed independently using the repeatedly-measured biological samples. Originally designed to test the efficacy of a batch-correction algorithm, it will enable others to evaluate novel data processing algorithms. Furthermore, this dataset serves as a benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment. PMID:25977770
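
    A toy version of QC-based batch correction illustrates how the concurrent QC samples are used. This is the simplest median-scaling variant, with an invented data layout; the dataset's actual correction algorithm is more sophisticated:

```python
from statistics import median

def qc_correct(batches):
    """batches: {batch_id: {"qc": [intensities], "samples": [[intensities], ...]}}
    Scale each batch so its QC median matches the global QC median, one
    common QC-based correction (the benchmark's own algorithm differs)."""
    global_med = median(v for b in batches.values() for v in b["qc"])
    corrected = {}
    for bid, b in batches.items():
        factor = global_med / median(b["qc"])
        corrected[bid] = [[v * factor for v in s] for s in b["samples"]]
    return corrected
```

    Because the biological samples were measured repeatedly across batches, the residual spread of those repeats after correction gives an independent check on how well any such algorithm worked.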

  2. Higher representations on the lattice: Numerical simulations, SU(2) with adjoint fermions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Del Debbio, Luigi; Patella, Agostino; Pica, Claudio

    2010-05-01

    We discuss the lattice formulation of gauge theories with fermions in arbitrary representations of the color group and present in detail the implementation of the hybrid Monte Carlo (HMC)/rational HMC algorithm for simulating dynamical fermions. We discuss the validation of the implementation through an extensive set of tests and the stability of simulations by monitoring the distribution of the lowest eigenvalue of the Wilson-Dirac operator. Working with two flavors of Wilson fermions in the adjoint representation, benchmark results for realistic lattice simulations are presented. Runs are performed on different lattice sizes ranging from 4³×8 to 24³×64 sites. For the two smallest lattices we also report the measured values of benchmark mesonic observables. These results can be used as a baseline for rapid cross-checks of simulations in higher representations. The results presented here are the first steps toward more extensive investigations with controlled systematic errors, aiming at a detailed understanding of the phase structure of these theories, and of their viability as candidates for strong dynamics beyond the standard model.

  3. Functionalized Single-Walled Carbon Nanotube-Based Fuel Cell Benchmarked Against US DOE 2017 Technical Targets

    PubMed Central

    Jha, Neetu; Ramesh, Palanisamy; Bekyarova, Elena; Tian, Xiaojuan; Wang, Feihu; Itkis, Mikhail E.; Haddon, Robert C.

    2013-01-01

    Chemically modified single-walled carbon nanotubes (SWNTs) with varying degrees of functionalization were utilized for the fabrication of SWNT thin film catalyst support layers (CSLs) in polymer electrolyte membrane fuel cells (PEMFCs), which were suitable for benchmarking against the US DOE 2017 targets. Use of the optimum level of SWNT-COOH functionality allowed the construction of a prototype SWNT-based PEMFC with a total Pt loading of 0.06 mg(Pt)/cm², well below the value of 0.125 mg(Pt)/cm² set as the US DOE 2017 technical target for total Pt group metals (PGM) loading. This prototype PEMFC also approaches the technical target for the total Pt content per kW of power (<0.125 g(PGM)/kW) at a cell potential of 0.65 V: a value of 0.15 g(Pt)/kW was achieved at 80°C/22 psig testing conditions, which was further reduced to 0.12 g(Pt)/kW at 35 psig back pressure. PMID:23877112

  4. Functionalized single-walled carbon nanotube-based fuel cell benchmarked against US DOE 2017 technical targets.

    PubMed

    Jha, Neetu; Ramesh, Palanisamy; Bekyarova, Elena; Tian, Xiaojuan; Wang, Feihu; Itkis, Mikhail E; Haddon, Robert C

    2013-01-01

    Chemically modified single-walled carbon nanotubes (SWNTs) with varying degrees of functionalization were utilized for the fabrication of SWNT thin film catalyst support layers (CSLs) in polymer electrolyte membrane fuel cells (PEMFCs), which were suitable for benchmarking against the US DOE 2017 targets. Use of the optimum level of SWNT-COOH functionality allowed the construction of a prototype SWNT-based PEMFC with a total Pt loading of 0.06 mg(Pt)/cm², well below the value of 0.125 mg(Pt)/cm² set as the US DOE 2017 technical target for total Pt group metals (PGM) loading. This prototype PEMFC also approaches the technical target for the total Pt content per kW of power (<0.125 g(PGM)/kW) at a cell potential of 0.65 V: a value of 0.15 g(Pt)/kW was achieved at 80°C/22 psig testing conditions, which was further reduced to 0.12 g(Pt)/kW at 35 psig back pressure.

  5. DeltaSA tool for source apportionment benchmarking, description and sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Pernigotti, D.; Belis, C. A.

    2018-05-01

    DeltaSA is an R package and a Java online tool developed at the EC Joint Research Centre to assist and benchmark source apportionment applications. Its key functionalities support two critical tasks in this kind of study: the assignment of a factor to a source in factor analytical models (source identification) and the model performance evaluation. The source identification is based on the similarity between a given factor and source chemical profiles from public databases. The model performance evaluation is based on statistical indicators used to compare model output with reference values generated in intercomparison exercises. The reference values are calculated as the ensemble average of the results reported by participants that have passed a set of testing criteria based on chemical profile and time series similarity. In this study, a sensitivity analysis of the model performance criteria is accomplished using the results of a synthetic dataset where "a priori" references are available. The consensus-modulated standard deviation p_unc is the best choice for the model performance evaluation when a conservative approach is adopted.
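
    The source-identification step, matching a factor profile to the most similar library profile, can be sketched generically (DeltaSA's actual similarity metrics, weighting, and source databases differ; profile names and numbers below are invented):

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length chemical profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def identify_source(factor, source_profiles):
    """Assign the factor to the library source with the most similar profile."""
    return max(source_profiles, key=lambda name: pearson(factor, source_profiles[name]))
```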

  6. Comparison of discrete ordinate and Monte Carlo simulations of polarized radiative transfer in two coupled slabs with different refractive indices.

    PubMed

    Cohen, D; Stamnes, S; Tanikawa, T; Sommersten, E R; Stamnes, J J; Lotsberg, J K; Stamnes, K

    2013-04-22

    A comparison is presented of two different methods for polarized radiative transfer in coupled media consisting of two adjacent slabs with different refractive indices, each slab being a stratified medium with no change in optical properties except in the direction of stratification. One of the methods is based on solving the integro-differential radiative transfer equation for the two coupled slabs using the discrete ordinate approximation. The other method is based on probabilistic and statistical concepts and simulates the propagation of polarized light using the Monte Carlo approach. The emphasis is on non-Rayleigh scattering for particles in the Mie regime. Comparisons with benchmark results available for a slab with constant refractive index show that both methods reproduce these benchmark results when the refractive index is set to be the same in the two slabs. Computed results for test cases with coupling (different refractive indices in the two slabs) show that the two methods produce essentially identical results for identical input in terms of absorption and scattering coefficients and scattering phase matrices.

  7. Highly Efficient and Scalable Compound Decomposition of Two-Electron Integral Tensor and Its Application in Coupled Cluster Calculations.

    PubMed

    Peng, Bo; Kowalski, Karol

    2017-09-12

    The representation and storage of two-electron integral tensors are vital in large-scale applications of accurate electronic structure methods. Low-rank representation and efficient storage strategies for integral tensors can significantly reduce the numerical overhead and consequently the time-to-solution of these methods. In this work, by combining pivoted incomplete Cholesky decomposition (CD) with a follow-up truncated singular value decomposition (SVD), we develop a decomposition strategy to approximately represent the two-electron integral tensor in terms of low-rank vectors. A systematic benchmark test on a series of 1-D, 2-D, and 3-D carbon-hydrogen systems demonstrates high efficiency and scalability of the compound two-step decomposition of the two-electron integral tensor in our implementation. For the size of the atomic basis set, N_b, ranging from ∼100 up to ∼2,000, the observed numerical scaling of our implementation shows [Formula: see text] versus [Formula: see text] cost of performing a single CD on the two-electron integral tensor in most of the other implementations. More importantly, this decomposition strategy can significantly reduce the storage requirement of the atomic orbital (AO) two-electron integral tensor from [Formula: see text] to [Formula: see text] with moderate decomposition thresholds. The accuracy tests have been performed using ground- and excited-state formulations of the coupled cluster formalism employing single and double excitations (CCSD) on several benchmark systems, including the C60 molecule described by nearly 1,400 basis functions. The results show that the decomposition thresholds can generally be set to 10⁻⁴ to 10⁻³ to give an acceptable compromise between efficiency and accuracy.
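
    The compound "factorize, then compress the factor" idea can be illustrated on a generic symmetric positive semidefinite matrix standing in for the matricized integral tensor. This sketch uses plain (complete, unpivoted) Cholesky plus a dense SVD, so it only gestures at the paper's pivoted incomplete CD; sizes and thresholds are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the matricized two-electron integral tensor: a symmetric
# positive (semi)definite matrix V of modest effective rank.
A = rng.standard_normal((40, 8))
V = A @ A.T + 1e-10 * np.eye(40)   # tiny jitter keeps Cholesky well-defined

# Step 1: Cholesky factorization V = L L^T.
L = np.linalg.cholesky(V)

# Step 2: truncated SVD of the Cholesky factor compresses it further.
U, s, _ = np.linalg.svd(L, full_matrices=False)
k = int(np.sum(s > 1e-4 * s[0]))   # effective rank from the singular values
B = U[:, :k] * s[:k]               # low-rank factor, V ≈ B B^T

err = np.linalg.norm(V - B @ B.T) / np.linalg.norm(V)
```

    The storage win is the point: the full matrix holds 40×40 entries, while the compressed factor B holds only 40×8.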

  8. Closed-Loop Neuromorphic Benchmarks

    PubMed Central

    Stewart, Terrence C.; DeWolf, Travis; Kleinhans, Ashley; Eliasmith, Chris

    2015-01-01

    Evaluating the effectiveness and performance of neuromorphic hardware is difficult. It is even more difficult when the task of interest is a closed-loop task; that is, a task where the output from the neuromorphic hardware affects some environment, which then in turn affects the hardware's future input. However, closed-loop situations are one of the primary potential uses of neuromorphic hardware. To address this, we present a methodology for generating closed-loop benchmarks that makes use of a hybrid of real physical embodiment and a type of “minimal” simulation. Minimal simulation has been shown to lead to robust real-world performance, while still maintaining the practical advantages of simulation, such as making it easy for the same benchmark to be used by many researchers. This method is flexible enough to allow researchers to explicitly modify the benchmarks to identify specific task domains where particular hardware excels. To demonstrate the method, we present a set of novel benchmarks that focus on motor control for an arbitrary system with unknown external forces. Using these benchmarks, we show that an error-driven learning rule can consistently improve motor control performance across a randomly generated family of closed-loop simulations, even when there are up to 15 interacting joints to be controlled. PMID:26696820
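
    The kind of result reported, error-driven learning improving motor control under unknown external forces, can be reproduced on a one-joint toy plant. This generic sketch is not the paper's benchmark, simulator, or learning rule:

```python
def run_trial(unknown_force, lr, steps=5000, dt=0.01):
    """Unit point mass driven toward target=1.0 by PD control plus a learned
    bias w. With lr=0 the unknown force leaves a steady-state error; with
    lr>0 the error-driven update w += lr*err*dt learns to cancel it."""
    pos, vel, w = 0.0, 0.0, 0.0
    target = 1.0
    for _ in range(steps):
        err = target - pos
        u = 5.0 * err - 2.0 * vel + w   # PD terms + learned compensation
        w += lr * err * dt              # error-driven learning rule
        acc = u + unknown_force         # the environment pushes back
        vel += acc * dt
        pos += vel * dt
    return abs(target - pos)            # final tracking error
```

    In a closed-loop benchmark the simulated "unknown force" is randomized across trials, so a controller can only score well by actually adapting, not by memorizing one environment.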

  9. Benchmark Comparison of Dual- and Quad-Core Processor Linux Clusters with Two Global Climate Modeling Workloads

    NASA Technical Reports Server (NTRS)

    McGalliard, James

    2008-01-01

    This viewgraph presentation details the science and systems environments that the NASA High End Computing program serves. Included is a discussion of the workload involved in the processing for global climate modeling. The Goddard Earth Observing System Model, Version 5 (GEOS-5) is a system of models integrated using the Earth System Modeling Framework (ESMF). The GEOS-5 system was used for the benchmark tests, and the results of the tests are shown and discussed. Tests were also run for the cubed-sphere system; results for these tests are also shown.

  10. Benchmarking the Collocation Stand-Alone Library and Toolkit (CSALT)

    NASA Technical Reports Server (NTRS)

    Hughes, Steven; Knittel, Jeremy; Shoan, Wendy; Kim, Youngkwang; Conway, Claire; Conway, Darrel J.

    2017-01-01

    This paper describes the processes and results of Verification and Validation (V&V) efforts for the Collocation Stand Alone Library and Toolkit (CSALT). We describe the test program and environments, the tools used for independent test data, and comparison results. The V&V effort employs classical problems with known analytic solutions, solutions from other available software tools, and comparisons to benchmarking data available in the public literature. Presenting all test results is beyond the scope of a single paper. Here we present high-level test results for a broad range of problems, and detailed comparisons for selected problems.

  11. Benchmarking the Collocation Stand-Alone Library and Toolkit (CSALT)

    NASA Technical Reports Server (NTRS)

    Hughes, Steven; Knittel, Jeremy; Shoan, Wendy (Compiler); Kim, Youngkwang; Conway, Claire (Compiler); Conway, Darrel

    2017-01-01

    This paper describes the processes and results of Verification and Validation (V&V) efforts for the Collocation Stand Alone Library and Toolkit (CSALT). We describe the test program and environments, the tools used for independent test data, and comparison results. The V&V effort employs classical problems with known analytic solutions, solutions from other available software tools, and comparisons to benchmarking data available in the public literature. Presenting all test results is beyond the scope of a single paper. Here we present high-level test results for a broad range of problems, and detailed comparisons for selected problems.

  12. Airfoil Vibration Dampers program

    NASA Technical Reports Server (NTRS)

    Cook, Robert M.

    1991-01-01

    The Airfoil Vibration Damper program has consisted of an analysis phase and a testing phase. During the analysis phase, a state-of-the-art computer code was developed, which can be used to guide designers in the placement and sizing of friction dampers. The use of this computer code was demonstrated by performing representative analyses on turbine blades from the High Pressure Oxidizer Turbopump (HPOTP) and High Pressure Fuel Turbopump (HPFTP) of the Space Shuttle Main Engine (SSME). The testing phase of the program consisted of performing friction damping tests on two different cantilever beams. Data from these tests provided an empirical check on the accuracy of the computer code developed in the analysis phase. Results of the analysis and testing showed that the computer code can accurately predict the performance of friction dampers. In addition, a valuable set of friction damping data was generated, which can be used to aid in the design of friction dampers, as well as provide benchmark test cases for future code developers.

  13. Test Cases for Modeling and Validation of Structures with Piezoelectric Actuators

    NASA Technical Reports Server (NTRS)

    Reaves, Mercedes C.; Horta, Lucas G.

    2001-01-01

    A set of benchmark test articles was developed to validate techniques for modeling structures containing piezoelectric actuators using commercially available finite element analysis packages. The paper presents the development, modeling, and testing of two structures: an aluminum plate with surface-mounted patch actuators and a composite box beam with surface-mounted actuators. Three approaches for modeling structures containing piezoelectric actuators using the commercially available packages MSC/NASTRAN and ANSYS are presented. The approaches, applications, and limitations are discussed. Data for both test articles are compared in terms of frequency response functions from deflection and strain data to input voltage to the actuator. Frequency response function results using the three different analysis approaches provided comparable test/analysis results. It is shown that global versus local behavior of the analytical model and test article must be considered when comparing different approaches. Also, improper bonding of actuators greatly reduces the electrical-to-mechanical effectiveness of the actuators, producing anti-resonance errors.

  14. Energy Behavior Change and Army Net Zero Energy; Gaps in the Army’s Approach to Changing Energy Behavior

    DTIC Science & Technology

    2014-06-13

    ...age, and design, installations set benchmarks for utility use and cost. This benchmark includes a buffer above and below the baseline. If residents...sustainability officers from each government agency (US President 2009, 6). The order requires that each federal agency designate a senior...conducting direct comparisons of pre- and post-intervention data (Judd et al. 2013, 15). Soldiers were the primary occupants of the three buildings with...

  15. Benchmark CCSD(T) and DFT study of binding energies in Be7-12: in search of a reliable DFT functional for beryllium clusters

    NASA Astrophysics Data System (ADS)

    Labanc, Daniel; Šulka, Martin; Pitoňák, Michal; Černušák, Ivan; Urban, Miroslav; Neogrády, Pavel

    2018-05-01

    We present a computational study of the stability of small homonuclear beryllium clusters Be7-12 in singlet electronic states. Our predictions are based on highly correlated CCSD(T) coupled cluster calculations. Basis set convergence towards the complete basis set limit, as well as the role of 1s core electron correlation, is carefully examined. Our CCSD(T) data for binding energies of Be7-12 clusters serve as a benchmark for performance assessment of several density functional theory (DFT) methods frequently used in beryllium cluster chemistry. We observe that, from Be10 clusters on, the deviation from the CCSD(T) benchmarks is stable with respect to size, fluctuating within a 0.02 eV error bar for most examined functionals. This opens up the possibility of scaling DFT binding energies for large Be clusters using CCSD(T) benchmark values for smaller clusters. We also looked for analogies between the performance of DFT functionals for Be clusters and for the valence-isoelectronic Mg clusters investigated recently in Truhlar's group. We conclude that it is difficult to find DFT functionals that perform reasonably well for both beryllium and magnesium clusters. Of the 12 functionals examined, only the M06-2X functional gives reasonably accurate and balanced binding energies for both Be and Mg clusters.
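    As a reading aid, the quantity being benchmarked above, the binding energy per atom of a Be_n cluster, is obtained from total energies as E_b(n) = (n·E(Be) − E(Be_n))/n. The sketch below uses invented total energies, not the paper's CCSD(T) values.

```python
# Binding energy per atom of a cluster Be_n from total energies (hartree):
# E_b(n) = (n * E(Be atom) - E(Be_n)) / n ; positive means bound.
HARTREE_TO_EV = 27.211386

def binding_energy_per_atom(n, e_cluster, e_atom):
    return (n * e_atom - e_cluster) * HARTREE_TO_EV / n

# Illustrative (not CCSD(T)) numbers: a 7-atom cluster lying 0.25 Eh below
# seven free atoms corresponds to roughly 0.97 eV/atom.
e_b = binding_energy_per_atom(7, 7 * (-14.65) - 0.25, -14.65)
print(round(e_b, 3))
```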

  16. A benchmark for statistical microarray data analysis that preserves actual biological and technical variance.

    PubMed

    De Hertogh, Benoît; De Meulder, Bertrand; Berger, Fabrice; Pierre, Michael; Bareke, Eric; Gaigneaux, Anthoula; Depiereux, Eric

    2010-01-11

    Recent reanalysis of spike-in datasets underscored the need for new and more accurate benchmark datasets for statistical microarray analysis. We present here a fresh method using biologically-relevant data to evaluate the performance of statistical methods. Our novel method ranks the probesets from a dataset composed of publicly-available biological microarray data and extracts subset matrices with precise information/noise ratios. Our method can be used to determine the capability of different methods to better estimate variance for a given number of replicates. The mean-variance and mean-fold change relationships of the matrices revealed a closer approximation of biological reality. Performance analysis refined the results from benchmarks published previously. We show that the Shrinkage t test (close to Limma) was the best of the methods tested, except when two replicates were examined, where the Regularized t test and the Window t test performed slightly better. The R scripts used for the analysis are available at http://urbm-cluster.urbm.fundp.ac.be/~bdemeulder/.
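    The regularized-t idea mentioned above can be sketched in a few lines: a two-sample t statistic with a small constant added to the standard error to stabilize it when only a handful of replicates are available. This is a simplified stand-in; the actual Regularized t and Shrinkage t estimators in the paper choose the offset/shrinkage target data-adaptively. The data and s0 value here are invented.

```python
import numpy as np

def regularized_t(x, y, s0=0.5):
    """Two-sample t statistic with a variance-stabilizing offset s0 added
    to the standard error -- a simple stand-in for regularized t tests."""
    nx, ny = len(x), len(y)
    # Pooled variance from both conditions
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1.0 / nx + 1.0 / ny))
    return (np.mean(x) - np.mean(y)) / (se + s0)

# Three replicates per condition, mimicking the few-replicate regime the
# benchmark probes (values invented).
t = regularized_t(np.array([1.0, 1.1, 0.9]), np.array([2.0, 2.1, 1.9]))
print(round(t, 3))
```

    The offset keeps tiny sample variances from inflating the statistic, which is exactly the failure mode of the plain t test with two or three replicates.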

  17. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS).

    PubMed

    Menze, Bjoern H; Jakab, Andras; Bauer, Stefan; Kalpathy-Cramer, Jayashree; Farahani, Keyvan; Kirby, Justin; Burren, Yuliya; Porz, Nicole; Slotboom, Johannes; Wiest, Roland; Lanczi, Levente; Gerstner, Elizabeth; Weber, Marc-André; Arbel, Tal; Avants, Brian B; Ayache, Nicholas; Buendia, Patricia; Collins, D Louis; Cordier, Nicolas; Corso, Jason J; Criminisi, Antonio; Das, Tilak; Delingette, Hervé; Demiralp, Çağatay; Durst, Christopher R; Dojat, Michel; Doyle, Senan; Festa, Joana; Forbes, Florence; Geremia, Ezequiel; Glocker, Ben; Golland, Polina; Guo, Xiaotao; Hamamci, Andac; Iftekharuddin, Khan M; Jena, Raj; John, Nigel M; Konukoglu, Ender; Lashkari, Danial; Mariz, José Antonió; Meier, Raphael; Pereira, Sérgio; Precup, Doina; Price, Stephen J; Raviv, Tammy Riklin; Reza, Syed M S; Ryan, Michael; Sarikaya, Duygu; Schwartz, Lawrence; Shin, Hoo-Chang; Shotton, Jamie; Silva, Carlos A; Sousa, Nuno; Subbanna, Nagesh K; Szekely, Gabor; Taylor, Thomas J; Thomas, Owen M; Tustison, Nicholas J; Unal, Gozde; Vasseur, Flor; Wintermark, Max; Ye, Dong Hye; Zhao, Liang; Zhao, Binsheng; Zikic, Darko; Prastawa, Marcel; Reyes, Mauricio; Van Leemput, Koen

    2015-10-01

    In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients-manually annotated by up to four raters-and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%-85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.

  18. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)

    PubMed Central

    Jakab, Andras; Bauer, Stefan; Kalpathy-Cramer, Jayashree; Farahani, Keyvan; Kirby, Justin; Burren, Yuliya; Porz, Nicole; Slotboom, Johannes; Wiest, Roland; Lanczi, Levente; Gerstner, Elizabeth; Weber, Marc-André; Arbel, Tal; Avants, Brian B.; Ayache, Nicholas; Buendia, Patricia; Collins, D. Louis; Cordier, Nicolas; Corso, Jason J.; Criminisi, Antonio; Das, Tilak; Delingette, Hervé; Demiralp, Çağatay; Durst, Christopher R.; Dojat, Michel; Doyle, Senan; Festa, Joana; Forbes, Florence; Geremia, Ezequiel; Glocker, Ben; Golland, Polina; Guo, Xiaotao; Hamamci, Andac; Iftekharuddin, Khan M.; Jena, Raj; John, Nigel M.; Konukoglu, Ender; Lashkari, Danial; Mariz, José António; Meier, Raphael; Pereira, Sérgio; Precup, Doina; Price, Stephen J.; Raviv, Tammy Riklin; Reza, Syed M. S.; Ryan, Michael; Sarikaya, Duygu; Schwartz, Lawrence; Shin, Hoo-Chang; Shotton, Jamie; Silva, Carlos A.; Sousa, Nuno; Subbanna, Nagesh K.; Szekely, Gabor; Taylor, Thomas J.; Thomas, Owen M.; Tustison, Nicholas J.; Unal, Gozde; Vasseur, Flor; Wintermark, Max; Ye, Dong Hye; Zhao, Liang; Zhao, Binsheng; Zikic, Darko; Prastawa, Marcel; Reyes, Mauricio; Van Leemput, Koen

    2016-01-01

    In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients—manually annotated by up to four raters—and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%–85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource. PMID:25494501

  19. Benchmarking MARS (accident management software) with the Browns Ferry fire

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dawson, S.M.; Liu, L.Y.; Raines, J.C.

    1992-01-01

    The MAAP Accident Response System (MARS) is user-friendly computer software developed to provide management and engineering staff with the most needed insights, during actual or simulated accidents, into the current and future conditions of the plant, based on current plant data and its trends. To demonstrate the reliability of the MARS code in simulating a plant transient, MARS is being benchmarked with the available reactor pressure vessel (RPV) pressure and level data from the Browns Ferry fire. The MARS software uses the Modular Accident Analysis Program (MAAP) code as its basis to calculate plant response under accident conditions. MARS uses a limited set of plant data to initialize and track the accident progression. To perform this benchmark, a simulated set of plant data was constructed based on actual report data containing the information necessary to initialize MARS and keep track of plant system status throughout the accident progression. The initial Browns Ferry fire data were produced by performing a MAAP run to simulate the accident. The remaining accident simulation used actual plant data.

  20. On the predictability of land surface fluxes from meteorological variables

    NASA Astrophysics Data System (ADS)

    Haughton, Ned; Abramowitz, Gab; Pitman, Andy J.

    2018-01-01

    Previous research has shown that land surface models (LSMs) perform poorly compared with relatively simple empirical models over a wide range of metrics and environments. Atmospheric driving data appear to provide information about land surface fluxes that LSMs are not fully utilising. Here, we further quantify the information available in the meteorological forcing data that LSMs use to predict land surface fluxes, by interrogating FLUXNET data and extending the benchmarking methodology used in previous experiments. We show that substantial performance improvement is possible for empirical models using meteorological data alone, with no explicit vegetation or soil properties, thus setting lower bounds on a priori expectations of LSM performance. The process also identifies the key meteorological variables that provide predictive power. We provide an ensemble of empirical benchmarks that are simple to reproduce and span a range of behaviours and predictive performance, acting as a baseline benchmark set for future studies. We reanalyse previously published LSM simulations and show that there is more diversity between LSMs than previously indicated, although it remains unclear why LSMs broadly perform so much worse than simple empirical models.
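    An "empirical benchmark" in the sense above can be as simple as a linear regression from meteorological drivers to a flux. The sketch below fits ordinary least squares on synthetic data (stand-ins for FLUXNET forcing and flux variables, not the paper's data or its actual benchmark ensemble) and reports the variance explained.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
# Synthetic "meteorological drivers": stand-ins for, e.g., shortwave
# radiation, air temperature, and humidity deficit.
X = rng.normal(size=(n, 3))
true_coef = np.array([0.7, 0.2, -0.4])
flux = X @ true_coef + 0.1 * rng.normal(size=n)   # synthetic flux

# Fit the empirical benchmark: ordinary least squares with an intercept.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, flux, rcond=None)
pred = A @ coef
r2 = 1 - np.sum((flux - pred) ** 2) / np.sum((flux - flux.mean()) ** 2)
print(round(r2, 2))
```

    Any LSM evaluated on the same data should at least beat this baseline; when it does not, the met forcing evidently contains predictive information the model fails to exploit.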

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marck, Steven C. van der, E-mail: vandermarck@nrg.eu

    Recent releases of three major world nuclear reaction data libraries, ENDF/B-VII.1, JENDL-4.0, and JEFF-3.1.1, have been tested extensively using benchmark calculations. The calculations were performed with the latest release of the continuous-energy Monte Carlo neutronics code MCNP, i.e., MCNP6. Three types of benchmarks were used: criticality safety benchmarks, (fusion) shielding benchmarks, and reference systems for which the effective delayed neutron fraction is reported. For criticality safety, more than 2000 benchmarks from the International Handbook of Criticality Safety Benchmark Experiments were used. Benchmarks from all categories were used, ranging from low-enriched uranium, compound fuel, thermal spectrum ones (LEU-COMP-THERM) to mixed uranium-plutonium, metallic fuel, fast spectrum ones (MIX-MET-FAST). For fusion shielding, many benchmarks were based on IAEA specifications for the Oktavian experiments (for Al, Co, Cr, Cu, LiF, Mn, Mo, Si, Ti, W, Zr), the Fusion Neutronics Source in Japan (for Be, C, N, O, Fe, Pb), and Pulsed Sphere experiments at Lawrence Livermore National Laboratory (for ⁶Li, ⁷Li, Be, C, N, O, Mg, Al, Ti, Fe, Pb, D2O, H2O, concrete, polyethylene and teflon). The new functionality in MCNP6 to calculate the effective delayed neutron fraction was tested by comparison with more than thirty measurements in widely varying systems. Among these were measurements in the Tank Critical Assembly (TCA, Japan) and IPEN/MB-01 (Brazil), both with a thermal spectrum, two cores in Masurca (France) and three cores in the Fast Critical Assembly (FCA, Japan), all with fast spectra. The performance of the three libraries, in combination with MCNP6, is shown to be good. The results for the LEU-COMP-THERM category are on average very close to the benchmark values. Also for most other categories the results are satisfactory. Deviations from the benchmark values do occur in certain benchmark series, or in isolated cases within benchmark series. Such instances can often be related to nuclear data for specific non-fissile elements, such as C, Fe, or Gd. Indications are that the intermediate and mixed spectrum cases are less well described. The results for the shielding benchmarks are generally good, with very similar results for the three libraries in the majority of cases. Nevertheless there are, in certain cases, strong deviations between calculated and benchmark values, such as for Co and Mg. Also, the results show discrepancies at certain energies or angles for e.g. C, N, O, Mo, and W. The functionality of MCNP6 to calculate the effective delayed neutron fraction yields very good results for all three libraries.
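    Results such as "very close to the benchmark value" are conventionally summarized as C/E ratios (calculated over experimental k-eff) and deviations in pcm (1 pcm = 1e-5 in k-eff). The sketch below shows the arithmetic; the case names and values are illustrative, not actual MCNP6 output.

```python
# Summarize criticality benchmark results as C/E ratios and deviations in
# pcm (1 pcm = 1e-5 in k-eff). Values below are invented for illustration.
cases = {
    # name: (calculated k-eff, benchmark k-eff)
    "LEU-COMP-THERM-008-01": (0.99920, 1.00000),
    "MIX-MET-FAST-001-01":   (1.00150, 1.00000),
}
for name, (calc, bench) in cases.items():
    ce = calc / bench
    pcm = (calc - bench) * 1e5
    print(f"{name}: C/E = {ce:.5f}, deviation = {pcm:+.0f} pcm")
```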

  2. A Field-Based Aquatic Life Benchmark for Conductivity in ...

    EPA Pesticide Factsheets

    This report adapts the standard U.S. EPA methodology for deriving ambient water quality criteria. Rather than use toxicity test results, the adaptation uses field data to determine the loss of 5% of genera from streams. The method is applied to derive effect benchmarks for dissolved salts as measured by conductivity in Central Appalachian streams using data from West Virginia and Kentucky. This report provides scientific evidence for a conductivity benchmark in a specific region rather than for the entire United States.
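    The "loss of 5% of genera" criterion can be sketched numerically: given an extirpation concentration for each genus (the conductivity above which that genus is effectively absent, often called an XC95), the benchmark is the 5th percentile of that distribution (an HC05). The XC95 values below are invented, not the West Virginia/Kentucky data.

```python
import numpy as np

# Invented extirpation concentrations (XC95, microsiemens/cm), one per genus.
xc95 = np.array([180, 220, 260, 300, 340, 420, 500, 640, 800, 1000,
                 1200, 1500, 1800, 2100, 2500, 2900, 3300, 3800, 4300, 5000])

# Field-based benchmark: the conductivity expected to extirpate 5% of
# genera, i.e. the 5th percentile of the XC95 distribution.
benchmark = np.percentile(xc95, 5)
print(benchmark)
```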

  3. XWeB: The XML Warehouse Benchmark

    NASA Astrophysics Data System (ADS)

    Mahboubi, Hadj; Darmont, Jérôme

    With the emergence of XML as a standard for representing business data, new decision support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure the feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision support benchmark TPC-H. It is mainly composed of a test data warehouse that is based on a unified reference model for XML warehouses and that features XML-specific structures, together with its associated XQuery decision support workload. XWeB's usage is illustrated by experiments on several XML database management systems.

  4. An optimized proportional-derivative controller for the human upper extremity with gravity.

    PubMed

    Jagodnik, Kathleen M; Blana, Dimitra; van den Bogert, Antonie J; Kirsch, Robert F

    2015-10-15

    When Functional Electrical Stimulation (FES) is used to restore movement in subjects with spinal cord injury (SCI), muscle stimulation patterns should be selected to generate accurate and efficient movements. Ideally, the controller for such a neuroprosthesis will have the simplest architecture possible, to facilitate translation into a clinical setting. In this study, we used the simulated annealing algorithm to optimize two proportional-derivative (PD) feedback controller gain sets for a 3-dimensional arm model that includes musculoskeletal dynamics and has 5 degrees of freedom and 22 muscles, performing goal-oriented reaching movements. Controller gains were optimized by minimizing a weighted sum of position errors, orientation errors, and muscle activations. After optimization, gain performance was evaluated on the basis of accuracy and efficiency of reaching movements, along with three other benchmark gain sets not optimized for our system, on a large set of dynamic reaching movements for which the controllers had not been optimized, to test ability to generalize. Robustness in the presence of weakened muscles was also tested. The two optimized gain sets were found to have very similar performance to each other on all metrics, and to exhibit significantly better accuracy, compared with the three standard gain sets. All gain sets investigated used physiologically acceptable amounts of muscular activation. It was concluded that optimization can yield significant improvements in controller performance while still maintaining muscular efficiency, and that optimization should be considered as a strategy for future neuroprosthesis controller design.
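    The optimization loop described above can be miniaturized: simulated annealing over PD gains for a 1-D point-mass "arm" reaching to a target, with a cost that mixes squared position error and control effort. This is a toy stand-in for the paper's 5-DOF, 22-muscle model; the plant, cost weights, and annealing schedule are all invented.

```python
import math
import random

def reach_cost(kp, kd, w_effort=1e-4, steps=300, dt=0.01):
    """Simulate a 1-D point mass reaching to target 1.0 under PD control;
    cost is integrated squared position error plus weighted control effort."""
    pos = vel = 0.0
    cost = 0.0
    for _ in range(steps):
        err = 1.0 - pos
        u = kp * err - kd * vel
        cost += err * err * dt + w_effort * u * u * dt
        vel += u * dt
        pos += vel * dt
    return cost

def anneal(iters=400, t0=1.0, seed=0):
    """Simulated annealing over the (kp, kd) gain pair."""
    rng = random.Random(seed)
    gains = [10.0, 1.0]                      # deliberately poor initial gains
    best = cur = reach_cost(*gains)
    best_gains = list(gains)
    for i in range(iters):
        temp = t0 * (1 - i / iters) + 1e-3   # linear cooling schedule
        cand = [max(0.1, g + rng.gauss(0, 2.0)) for g in gains]
        c = reach_cost(*cand)
        # Accept improvements always, worse moves with Boltzmann probability
        if c < cur or rng.random() < math.exp((cur - c) / temp):
            gains, cur = cand, c
            if c < best:
                best, best_gains = c, list(cand)
    return best_gains, best

(best_kp, best_kd), best_cost = anneal()
print(best_cost, reach_cost(10.0, 1.0))
```

    The annealed gains typically damp the underdamped initial response, trading a little control effort for much less tracking error, which mirrors the weighted-cost trade-off in the study.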

  5. INTEGRAL BENCHMARK DATA FOR NUCLEAR DATA TESTING THROUGH THE ICSBEP AND THE NEWLY ORGANIZED IRPHEP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    J. Blair Briggs; Lori Scott; Yolanda Rugama

    The status of the International Criticality Safety Benchmark Evaluation Project (ICSBEP) was last reported in a nuclear data conference at the International Conference on Nuclear Data for Science and Technology, ND-2004, in Santa Fe, New Mexico. Since that time the number and type of integral benchmarks have increased significantly. Included in the ICSBEP Handbook are criticality-alarm / shielding and fundamental physics benchmarks in addition to the traditional critical / subcritical benchmark data. Since ND-2004, a reactor physics counterpart to the ICSBEP, the International Reactor Physics Experiment Evaluation Project (IRPhEP), was initiated. The IRPhEP is patterned after the ICSBEP, but focuses on other integral measurements, such as buckling, spectral characteristics, reactivity effects, reactivity coefficients, kinetics measurements, reaction-rate and power distributions, nuclide compositions, and other miscellaneous-type measurements, in addition to the critical configuration. The status of these two projects is discussed and selected benchmarks are highlighted in this paper.

  6. Toward real-time performance benchmarks for Ada

    NASA Technical Reports Server (NTRS)

    Clapp, Russell M.; Duchesneau, Louis; Volz, Richard A.; Mudge, Trevor N.; Schultze, Timothy

    1986-01-01

    The issue of real-time performance measurement for the Ada programming language through the use of benchmarks is addressed. First, the Ada notion of time is examined and a set of basic measurement techniques is developed. Then a set of Ada language features believed to be important for real-time performance is presented and specific measurement methods are discussed. In addition, other important time-related features which are not explicitly part of the language but are part of the run-time system are also identified and measurement techniques developed. The measurement techniques are applied to the language and run-time system features and the results are presented.
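    A basic measurement technique of the kind the paper develops is the dual-loop benchmark: time a loop containing the feature under test, time an empty loop, and subtract the loop overhead. The sketch below shows the idea in Python rather than Ada (the operation being timed is an arbitrary placeholder).

```python
import time

def measure(op, n=100_000):
    """Dual-loop benchmark: time n iterations with and without the
    operation under test, and subtract the empty-loop overhead."""
    t0 = time.perf_counter()
    for _ in range(n):
        pass
    overhead = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(n):
        op()
    total = time.perf_counter() - t0
    return (total - overhead) / n      # seconds per operation

per_call = measure(lambda: sum(range(50)))   # placeholder workload
print(per_call)
```

    The subtraction matters most when the feature's cost is comparable to the loop overhead itself, which is exactly the regime of the fine-grained language features the paper measures.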

  7. NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference.

    PubMed

    Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E

    2015-09-29

    In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow these methods to be evaluated accurately and reproducibly. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework, which uses various datasets, highlights the specialization of some methods toward particular network types and data. As a result, it is possible to identify the techniques that perform well overall.
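    Evaluating an inference method in this setting usually means scoring a ranked list of predicted edges against a gold-standard network. A minimal such metric, precision at k, is sketched below with invented gene names and scores (this is an illustration, not the NetBenchmark package's API).

```python
def precision_at_k(scores, gold_edges, k):
    """Fraction of the top-k highest-scoring predicted edges that appear
    in the gold-standard network -- a basic network-inference metric."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(e in gold_edges for e in ranked) / k

# Hypothetical edge confidences from an inference method, and a tiny
# gold-standard network.
scores = {("g1", "g2"): 0.9, ("g1", "g3"): 0.8,
          ("g2", "g3"): 0.4, ("g3", "g4"): 0.2}
gold = {("g1", "g2"), ("g2", "g3")}
print(precision_at_k(scores, gold, 2))
```

    Sweeping k over the whole ranking yields the precision-recall curve whose area is a common summary score for comparing inference algorithms.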

  8. A benchmark for fault tolerant flight control evaluation

    NASA Astrophysics Data System (ADS)

    Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.

    2013-12-01

    A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004-2008) for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. The benchmark includes a suitable set of assessment criteria and failure cases, based on reconstructed accident scenarios, to assess the potential of new adaptive control strategies to improve aircraft survivability. The application of reconstruction and modeling techniques, based on accident flight data, has resulted in high-fidelity nonlinear aircraft and fault models to evaluate new Fault Tolerant Flight Control (FTFC) concepts and their real-time performance to accommodate in-flight failures.

  9. A benchmark for subduction zone modeling

    NASA Astrophysics Data System (ADS)

    van Keken, P.; King, S.; Peacock, S.

    2003-04-01

    Our understanding of subduction zones hinges critically on the ability to discern their thermal structure and dynamics. Computational modeling has become an essential complement to observational and experimental studies. Accurate modeling of subduction zones is challenging due to their unique geometry, complicated rheological description, and the influence of fluid and melt formation. The complicated physics causes problems for the accurate numerical solution of the governing equations. As a consequence, it is essential for the subduction zone community to be able to evaluate the abilities and limitations of various modeling approaches. The participants of a workshop on the modeling of subduction zones, held at the University of Michigan at Ann Arbor, MI, USA in 2002, formulated a number of case studies to be developed into a benchmark similar to previous mantle convection benchmarks (Blankenbach et al., 1989; Busse et al., 1991; Van Keken et al., 1997). Our initial benchmark focuses on the dynamics of the mantle wedge and investigates three different rheologies: constant viscosity, diffusion creep, and dislocation creep. In addition, we investigate the ability of codes to accurately model dynamic pressure and advection-dominated flows. Proceedings of the workshop and the formulation of the benchmark are available at www.geo.lsa.umich.edu/~keken/subduction02.html. We strongly encourage interested research groups to participate in this benchmark. At Nice 2003 we will provide an update and a first set of benchmark results. Interested researchers are encouraged to contact one of the authors for further details.

  10. Benchmarking: A Method for Continuous Quality Improvement in Health

    PubMed Central

    Ettorchi-Tardy, Amina; Levif, Marie; Michel, Philippe

    2012-01-01

    Benchmarking, a management approach for implementing best practices at best cost, is a recent concept in the healthcare system. The objectives of this paper are to better understand the concept and its evolution in the healthcare sector, to propose an operational definition, and to describe some French and international experiences of benchmarking in the healthcare sector. To this end, we reviewed the literature on this approach's emergence in the industrial sector, its evolution, its fields of application and examples of how it has been used in the healthcare sector. Benchmarking is often thought to consist simply of comparing indicators and is not perceived in its entirety, that is, as a tool based on voluntary and active collaboration among several organizations to create a spirit of competition and to apply best practices. The key feature of benchmarking is its integration within a comprehensive and participatory policy of continuous quality improvement (CQI). Conditions for successful benchmarking focus essentially on careful preparation of the process, monitoring of the relevant indicators, staff involvement and inter-organizational visits. Compared to methods previously implemented in France (CQI and collaborative projects), benchmarking has specific features that set it apart as a healthcare innovation. This is especially true for healthcare or medical–social organizations, as the principle of inter-organizational visiting is not part of their culture. Thus, this approach will need to be assessed for feasibility and acceptability before it is more widely promoted. PMID:23634166

  11. A geometrical correction for the inter- and intra-molecular basis set superposition error in Hartree-Fock and density functional theory calculations for large systems

    NASA Astrophysics Data System (ADS)

    Kruse, Holger; Grimme, Stefan

    2012-04-01

    A semi-empirical counterpoise-type correction for basis set superposition error (BSSE) in molecular systems is presented. An atom pair-wise potential corrects for the inter- and intra-molecular BSSE in supermolecular Hartree-Fock (HF) or density functional theory (DFT) calculations. This scheme, denoted geometrical counterpoise (gCP), depends only on the molecular geometry, i.e., no input from the electronic wave function is required, and it is hence applicable to molecules with tens of thousands of atoms. The four necessary parameters have been determined by a fit to standard Boys and Bernardi counterpoise corrections for Hobza's S66×8 set of non-covalently bound complexes (528 data points). The method targets small basis sets (e.g., minimal, split-valence, 6-31G*), but reliable results are also obtained for larger triple-ζ sets. The intermolecular BSSE is calculated by gCP within a typical error of 10%-30%, which proves sufficient in many practical applications. The approach is suggested as a quantitative correction in production work and can also be routinely applied to estimate the magnitude of the BSSE beforehand. Its applicability to biomolecules as the primary target is tested for the crambin protein, where gCP removes intramolecular BSSE effectively and yields conformational energies comparable to def2-TZVP basis results. Good mutual agreement is also found with Jensen's ACP(4) scheme, estimating the intramolecular BSSE in the phenylalanine-glycine-phenylalanine tripeptide, for which a relaxed rotational energy profile is also presented. A variety of minimal and double-ζ basis sets combined with gCP and the dispersion corrections DFT-D3 and DFT-NL are successfully benchmarked on the S22 and S66 sets of non-covalent interactions. Outstanding performance with a mean absolute deviation (MAD) of 0.51 kcal/mol (0.38 kcal/mol after D3-refit) is obtained at the gCP-corrected HF-D3/(minimal basis) level for the S66 benchmark.
The gCP-corrected B3LYP-D3/6-31G* model chemistry yields MAD = 0.68 kcal/mol, a large improvement over plain B3LYP/6-31G* (MAD = 2.3 kcal/mol). Application of gCP-corrected B97-D3 and HF-D3 to a set of large protein-ligand complexes proves the robustness of the method. Analytical gCP gradients make optimizations of large systems feasible with small basis sets, as demonstrated for the inter-ring distances of 9-helicene and most of the complexes in Hobza's S22 test set. The method is implemented in a freely available FORTRAN program obtainable from the authors' website.
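    The "geometry-only" character of such a correction can be sketched schematically: each atom contributes an energy penalty that decays with distance to its neighbors, so only coordinates are needed. The decay form, parameters, and per-atom terms below are invented for illustration; this is not the published gCP parameterization.

```python
import math

def pairwise_correction(coords, e_miss, alpha=1.2):
    """Schematic pair-wise, geometry-only energy correction: each atom's
    'missing basis' term e_miss[i] is damped by an exponential decay with
    the distance to every other atom. All parameters are invented."""
    e = 0.0
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if i == j:
                continue
            r = math.dist(a, b)
            e += e_miss[i] * math.exp(-alpha * r)
    return e

# Two-atom toy system, 1.5 length units apart (arbitrary units throughout).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
corr = pairwise_correction(coords, e_miss=[0.01, 0.01])
print(corr)
```

    Because only coordinates enter, the cost is a double loop over atoms, which is why a scheme of this shape scales to very large systems where a full counterpoise calculation would be prohibitive.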

  12. Benchmark tests on the digital equipment corporation Alpha AXP 21164-based AlphaServer 8400, including a comparison of optimized vector and superscalar processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wasserman, H.J.

    1996-02-01

    The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP): a maximum of two floating-point operations (FLOPS) per CP vs. one for the 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report we compare single-processor performance of the 8400 system with that of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes is that they span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage is that detailed, quantitative analysis of performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report. Whereas the older version was written for a vector processor, the newer version is more optimized for microprocessor architectures. Therefore, we have, for the first time, an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architectures. All results in this report are from single processors. A subsequent article will explore shared-memory multiprocessing performance of the 8400 system.

  13. Less is more: Sampling chemical space with active learning

    NASA Astrophysics Data System (ADS)

    Smith, Justin S.; Nebgen, Ben; Lubbers, Nicholas; Isayev, Olexandr; Roitberg, Adrian E.

    2018-06-01

    The development of accurate and transferable machine learning (ML) potentials for predicting molecular energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chemical space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach, we develop the COmprehensive Machine-learning Potential (COMP6) benchmark (publicly available on GitHub) which contains a diverse set of organic molecules. Active learning-based ANI potentials outperform the original random sampled ANI-1 potential with only 10% of the data, while the final active learning-based model vastly outperforms ANI-1 on the COMP6 benchmark after training to only 25% of the data. Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single molecules or materials, while remaining applicable to the general class of organic molecules composed of the elements CHNO.
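    The Query by Committee step described above can be sketched in a few lines; the disagreement measure, threshold, and committee interface below are illustrative assumptions, not the ANI implementation:

```python
import statistics

def qbc_select(candidates, committee, threshold):
    """Select candidate inputs where an ensemble ('committee') of models
    disagrees most, i.e. where the current potential is unreliable and
    new reference data should be generated.

    committee : list of callables mapping an input to a scalar prediction
                (e.g. an energy); here stand-ins for trained ML potentials.
    """
    selected = []
    for x in candidates:
        preds = [model(x) for model in committee]
        # Sample standard deviation across the ensemble as the disagreement.
        if statistics.stdev(preds) > threshold:
            selected.append(x)
    return selected
```

    Only the high-disagreement inputs are sent on for expensive reference calculations, which is how active learning shrinks the training set relative to random sampling.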

  14. Statistics based sampling for controller and estimator design

    NASA Astrophysics Data System (ADS)

    Tenne, Dirk

    The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold, addressing the aforementioned topics of nonlinear estimation, target tracking, and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher-order accuracy. The so-called unscented transformation has been extended to capture higher-order moments. Furthermore, higher-order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. Furthermore, the unscented transformation has been utilized to develop a three-dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three-dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant-velocity filter by utilizing covariance intersection. The combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight-line maneuvers. The third part of this dissertation addresses the design of controllers that incorporate knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points calculated by the unscented transformation. This set of points is used to design robust controllers that minimize a statistical performance measure of the plant over the domain of uncertainty, consisting of a combination of the mean and variance. The proposed technique is illustrated on three benchmark problems. 
The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistical robust controller design is devoted to a concurrent feed-forward/feedback controller structure for a high-speed low tension tape drive.
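    The unscented transformation mentioned above propagates a small set of deterministically chosen sigma points through a nonlinearity instead of linearizing it. A minimal one-dimensional sketch of the standard 2n+1-point form (the dissertation's higher-order extensions are not reproduced here):

```python
import math

def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a 1-D Gaussian (mean, var) through a nonlinearity f
    using the standard 2n+1 sigma points (here n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    w0 = kappa / (n + kappa)
    wi = 1.0 / (2.0 * (n + kappa))
    weights = [w0, wi, wi]
    ys = [f(p) for p in points]
    # Weighted sample statistics of the transformed points.
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var
```

    For a linear map the transform is exact, and unlike a first-order Taylor expansion it also captures the mean shift of a quadratic nonlinearity.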

  15. Spectral properties from Matsubara Green's function approach: Application to molecules

    NASA Astrophysics Data System (ADS)

    Schüler, M.; Pavlyukh, Y.

    2018-03-01

    We present results for many-body perturbation theory for the one-body Green's function at finite temperatures using the Matsubara formalism. Our method relies on the accurate representation of the single-particle states in standard Gaussian basis sets, allowing us to efficiently compute, among other observables, quasiparticle energies and Dyson orbitals of atoms and molecules. In particular, we challenge the second-order treatment of the Coulomb interaction by benchmarking its accuracy for a well-established test set of small molecules, which also includes systems where the usual Hartree-Fock treatment encounters difficulties. We discuss different schemes for extracting quasiparticle properties and assess their range of applicability. With an accurate solution and compact representation, our method is an ideal starting point for studying electron dynamics in time-resolved experiments via propagation of the Kadanoff-Baym equations.

  16. A Novel Consensus-Based Particle Swarm Optimization-Assisted Trust-Tech Methodology for Large-Scale Global Optimization.

    PubMed

    Zhang, Yong-Feng; Chiang, Hsiao-Dong

    2017-09-01

    A novel three-stage methodology, termed the "consensus-based particle swarm optimization (PSO)-assisted Trust-Tech methodology," to find global optimal solutions for nonlinear optimization problems is presented. It is composed of Trust-Tech methods, consensus-based PSO, and local optimization methods that are integrated to compute a set of high-quality local optimal solutions that can contain the global optimal solution. The proposed methodology compares very favorably with several recently developed PSO algorithms based on a set of small-dimension benchmark optimization problems and 20 large-dimension test functions from the CEC 2010 competition. The analytical basis for the proposed methodology is also provided. Experimental results demonstrate that the proposed methodology can rapidly obtain high-quality optimal solutions that can contain the global optimal solution. The scalability of the proposed methodology is promising.

  17. Interference correction by extracting the information of interference dominant regions: Application to near-infrared spectra

    NASA Astrophysics Data System (ADS)

    Bi, Yiming; Tang, Liang; Shan, Peng; Xie, Qiong; Hu, Yong; Peng, Silong; Tan, Jie; Li, Changwen

    2014-08-01

    Interference such as baseline drift and light scattering can degrade model predictability in multivariate analysis of near-infrared (NIR) spectra. Usually, interference can be represented by an additive and a multiplicative factor. To eliminate these interferences, correction parameters need to be estimated from the spectra. However, the spectra often mix physical light-scattering effects with chemical light-absorbance effects, making parameter estimation difficult. Herein, a novel algorithm was proposed to automatically find a spectral region in which the chemical absorbance of interest and the noise are both low, that is, an interference-dominant region (IDR). Based on the definition of the IDR, a two-step method was proposed to find the optimal IDR and the corresponding correction parameters estimated from it. Finally, the correction was applied over the full spectral range, using the previously obtained parameters, for the calibration set and test set, respectively. The method can be applied to multi-target systems, with one IDR suitable for all targeted analytes. Tested on two benchmark near-infrared data sets, the proposed method provided considerable improvement over full-spectrum estimation methods and performance comparable to other state-of-the-art methods.
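    The additive/multiplicative correction described above can be sketched as a least-squares fit over an IDR; the reference-spectrum interface and the fit below are illustrative, MSC-style assumptions, not the authors' exact two-step algorithm:

```python
def correct_spectrum(x, ref, idr):
    """Estimate an additive offset a and multiplicative scale b from the
    interference-dominant region (idr: list of channel indices) by a
    least-squares fit x ≈ b*ref + a, then correct the full spectrum.

    x, ref : lists of intensities (measured and reference spectra).
    """
    xs = [x[i] for i in idr]
    rs = [ref[i] for i in idr]
    n = len(idr)
    mx, mr = sum(xs) / n, sum(rs) / n
    # Ordinary least-squares slope and intercept over the IDR only.
    b = sum((r - mr) * (v - mx) for r, v in zip(rs, xs)) / \
        sum((r - mr) ** 2 for r in rs)
    a = mx - b * mr
    # Apply the correction to every channel, not just the IDR.
    return [(v - a) / b for v in x]
```

    Estimating a and b only where chemical absorbance is low keeps the analyte signal from leaking into the interference model.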

  18. Land Ice Verification and Validation Kit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-07-15

    To address a pressing need to better understand the behavior and complex interaction of ice sheets within the global Earth system, significant development of continental-scale, dynamical ice-sheet models is underway. The associated verification and validation process of these models is being coordinated through a new, robust, Python-based extensible software package, the Land Ice Verification and Validation toolkit (LIVV). This release provides robust and automated verification and a performance evaluation on LCF platforms. The performance V&V involves a comprehensive comparison of model performance relative to expected behavior on a given computing platform. LIVV operates on a set of benchmark and test data, and provides comparisons for a suite of community-prioritized tests, including configuration and parameter variations, bit-for-bit evaluation, and plots of tests where differences occur.

  19. Online Object Tracking, Learning and Parsing with And-Or Graphs.

    PubMed

    Wu, Tianfu; Lu, Yang; Zhu, Song-Chun

    2017-12-01

    This paper presents a method, called AOGTracker, for simultaneous tracking, learning, and parsing (TLP) of unknown objects in video sequences with a hierarchical and compositional And-Or graph (AOG) representation. The TLP method is formulated in the Bayesian framework, with spatial and temporal dynamic programming (DP) algorithms inferring object bounding boxes on the fly. During online learning, the AOG is discriminatively learned using latent SVM [1] to account for appearance variations (e.g., lighting and partial occlusion) and structural variations (e.g., different poses and viewpoints) of a tracked object, as well as distractors (e.g., similar objects) in the background. Three key issues in online inference and learning are addressed: (i) maintaining purity of positive and negative examples collected online, (ii) controlling model complexity in latent structure learning, and (iii) identifying critical moments to re-learn the structure of the AOG based on its intrackability. The intrackability measures the uncertainty of an AOG based on its score maps in a frame. In experiments, AOGTracker is tested on two popular tracking benchmarks with the same parameter setting: the TB-100/50/CVPR2013 benchmarks [3] and the VOT benchmarks [4] (VOT2013, 2014, 2015, and TIR2015, thermal imagery tracking). On the former, AOGTracker outperforms state-of-the-art tracking algorithms, including two trackers based on deep convolutional networks [5], [6]. On the latter, AOGTracker outperforms all other trackers in VOT2013 and is comparable to the state-of-the-art methods in VOT2014, 2015, and TIR2015.

  20. Variation in assessment and standard setting practices across UK undergraduate medicine and the need for a benchmark

    PubMed Central

    2015-01-01

    Objectives The principal aim of this study is to provide an account of variation in UK undergraduate medical assessment styles and corresponding standard setting approaches with a view to highlighting the importance of a UK national licensing exam in recognizing a common standard. Methods Using a secure online survey system, response data were collected during the period 13 - 30 January 2014 from selected specialists in medical education assessment, who served as representatives for their respective medical schools. Results Assessment styles and corresponding choices of standard setting methods vary markedly across UK medical schools. While there is considerable consensus on the application of compensatory approaches, individual schools display their own nuances through use of hybrid assessment and standard setting styles, uptake of less popular standard setting techniques and divided views on norm referencing. Conclusions The extent of variation in assessment and standard setting practices across UK medical schools validates the concern that there is a lack of evidence that UK medical students achieve a common standard on graduation. A national licensing exam is therefore a viable option for benchmarking the performance of all UK undergraduate medical students. PMID:26520472

  1. Comparative Benchmark Dose Modeling as a Tool to Make the First Estimate of Safe Human Exposure Levels to Lunar Dust

    NASA Technical Reports Server (NTRS)

    James, John T.; Lam, Chiu-wing; Scully, Robert R.

    2013-01-01

    Brief exposures of Apollo astronauts to lunar dust occasionally elicited upper respiratory irritation; however, no limits were ever set for prolonged exposure to lunar dust. Habitats for exploration, whether mobile or fixed, must be designed to limit human exposure to lunar dust to safe levels. We have used a new technique, which we call Comparative Benchmark Dose Modeling, to estimate safe exposure limits for lunar dust collected during the Apollo 14 mission.

  2. RESULTS OF QA/QC TESTING OF EPA BENCHMARK DOSE SOFTWARE VERSION 1.2

    EPA Science Inventory

    EPA is developing benchmark dose software (BMDS) to support cancer and non-cancer dose-response assessments. Following the recent public review of BMDS version 1.1b, EPA developed a Hill model for evaluating continuous data, and improved the user interface and Multistage, Polyno...

  3. TRAC-PF1 code verification with data from the OTIS test facility. [Once-Through Integral System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Childerson, M.T.; Fujita, R.K.

    1985-01-01

    A computer code (TRAC-PF1/MOD1) developed for predicting transient thermal and hydraulic integral nuclear steam supply system (NSSS) response was benchmarked. Post-small-break loss-of-coolant accident (LOCA) data from a scaled experimental facility, designated the Once-Through Integral System (OTIS), were obtained for the Babcock and Wilcox NSSS and compared to TRAC predictions. The OTIS tests provided a challenging small-break LOCA data set for TRAC verification. The major phases of a small-break LOCA observed in the OTIS tests included pressurizer draining and loop saturation, intermittent reactor coolant system circulation, boiler-condenser mode, and the initial stages of refill. The TRAC code was successful in predicting OTIS loop conditions (system pressures and temperatures) after modification of the steam generator model. In particular, the code predicted both pool- and auxiliary-feedwater-initiated boiler-condenser mode heat transfer.

  4. A Reusable, Compliant, Small Volume Blood Reservoir for In Vitro Hemolysis Testing.

    PubMed

    Olia, Salim E; Herbertson, Luke H; Malinauskas, Richard A; Kameneva, Marina V

    2017-02-01

    Bench-top in vitro hemolysis testing is a fundamental tool during the design and regulatory safety evaluation of blood-contacting medical devices. While multiple published experimental protocols exist, descriptions of the test loop reservoir remain ambiguous. A critical fixture within the circuit, there is no readily available blood reservoir that ensures thorough mixing and complete air evacuation: two major factors which can affect results. As part of the Food and Drug Administration (FDA) Critical Path Initiative, we developed a three-piece reservoir consisting of a 3D-printed base, a plastic clamp set, and a medical-grade blood bag. This simple, reusable, and cost-effective design was used successfully in the hemolysis assessment of FDA benchmark nozzles and prototype rotary blood pumps, and may be useful as an integral component to any in vitro blood circulation loop. © 2016 International Center for Artificial Organs and Transplantation and Wiley Periodicals, Inc.

  5. Fast neutron irradiation tests of flash memories used in space environment at the ISIS spallation neutron source

    NASA Astrophysics Data System (ADS)

    Andreani, C.; Senesi, R.; Paccagnella, A.; Bagatin, M.; Gerardin, S.; Cazzaniga, C.; Frost, C. D.; Picozza, P.; Gorini, G.; Mancini, R.; Sarno, M.

    2018-02-01

    This paper presents a neutron-accelerated study of soft errors in advanced electronic devices used in space missions, i.e., Flash memories, performed at the ChipIr and VESUVIO beam lines at the ISIS spallation neutron source. The two neutron beam lines are set up to mimic space-environment spectra and allow neutron irradiation tests on Flash memories in the neutron energy range above 10 MeV and up to 800 MeV. The ISIS neutron energy spectrum is similar to those occurring in atmospheric as well as space and planetary environments, with intensity enhancements varying in the ranges 10^8-10^9 and 10^6-10^7, respectively. Such conditions are suitable for the characterization of the atmospheric, space, and planetary neutron radiation environments, and are directly applicable to accelerated tests of electronic components, as demonstrated here in benchmark measurements performed on Flash memories.

  6. Interactive visual optimization and analysis for RFID benchmarking.

    PubMed

    Wu, Yingcai; Chung, Ka-Kei; Qu, Huamin; Yuan, Xiaoru; Cheung, S C

    2009-01-01

    Radio frequency identification (RFID) is a powerful automatic remote identification technique that has wide applications. To facilitate RFID deployment, an RFID benchmarking instrument called aGate has been invented to identify the strengths and weaknesses of different RFID technologies in various environments. However, the data acquired by aGate are usually complex time varying multidimensional 3D volumetric data, which are extremely challenging for engineers to analyze. In this paper, we introduce a set of visualization techniques, namely, parallel coordinate plots, orientation plots, a visual history mechanism, and a 3D spatial viewer, to help RFID engineers analyze benchmark data visually and intuitively. With the techniques, we further introduce two workflow procedures (a visual optimization procedure for finding the optimum reader antenna configuration and a visual analysis procedure for comparing the performance and identifying the flaws of RFID devices) for the RFID benchmarking, with focus on the performance analysis of the aGate system. The usefulness and usability of the system are demonstrated in the user evaluation.

  7. Encoding color information for visual tracking: Algorithms and benchmark.

    PubMed

    Liang, Pengpeng; Blasch, Erik; Ling, Haibin

    2015-12-01

    While color information is known to provide rich discriminative clues for visual inference, most modern visual trackers limit themselves to the grayscale realm. Despite recent efforts to integrate color in tracking, there is a lack of comprehensive understanding of the role color information can play. In this paper, we attack this problem by conducting a systematic study from both the algorithm and benchmark perspectives. On the algorithm side, we comprehensively encode 10 chromatic models into 16 carefully selected state-of-the-art visual trackers. On the benchmark side, we compile a large set of 128 color sequences with ground truth and challenge factor annotations (e.g., occlusion). A thorough evaluation is conducted by running all the color-encoded trackers, together with two recently proposed color trackers. A further validation is conducted on an RGBD tracking benchmark. The results clearly show the benefit of encoding color information for tracking. We also perform detailed analysis on several issues, including the behavior of various combinations between color model and visual tracker, the degree of difficulty of each sequence for tracking, and how different challenge factors affect the tracking performance. We expect the study to provide the guidance, motivation, and benchmark for future work on encoding color in visual tracking.

  8. Integral Full Core Multi-Physics PWR Benchmark with Measured Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Forget, Benoit; Smith, Kord; Kumar, Shikhar

    In recent years, the importance of modeling and simulation has been highlighted extensively in the DOE research portfolio, with concrete examples in nuclear engineering in the CASL and NEAMS programs. These research efforts and similar efforts worldwide aim at the development of high-fidelity multi-physics analysis tools for the simulation of current and next-generation nuclear power reactors. Like all analysis tools, verification and validation are essential to guarantee proper functioning of the software and methods employed. The current approach relies mainly on the validation of single-physics phenomena (e.g., critical experiments, flow loops, etc.), and there is a lack of relevant multi-physics benchmark measurements that are necessary to validate the high-fidelity methods being developed today. This work introduces a new multi-cycle full-core Pressurized Water Reactor (PWR) depletion benchmark based on two operational cycles of a commercial nuclear power plant that provides a detailed description of fuel assemblies, burnable absorbers, in-core fission detectors, and core loading and re-loading patterns. This benchmark enables analysts to develop extremely detailed reactor core models that can be used for testing and validation of coupled neutron transport, thermal-hydraulics, and fuel isotopic depletion. The benchmark also provides measured reactor data for Hot Zero Power (HZP) physics tests, boron letdown curves, and three-dimensional in-core flux maps from 58 instrumented assemblies. The benchmark description is now available online and has been used by many groups. However, much work remains to be done on the quantification of uncertainties and modeling sensitivities. This work aims to address these deficiencies and make this benchmark a true non-proprietary international benchmark for the validation of high-fidelity tools. 
This report details the BEAVRS uncertainty quantification for the first two cycles of operation and serves as the final report of the project.

  9. Preliminary Results for the OECD/NEA Time Dependent Benchmark using Rattlesnake, Rattlesnake-IQS and TDKENO

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DeHart, Mark D.; Mausolff, Zander; Weems, Zach

    2016-08-01

    One goal of the MAMMOTH M&S project is to validate the analysis capabilities within MAMMOTH. Historical data have shown limited value for validation of full three-dimensional (3D) multi-physics methods. Initial analysis considered the TREAT startup minimum critical core and one of the startup transient tests. At present, validation is focusing on measurements taken during the M8CAL test calibration series. These exercises will be valuable in a preliminary assessment of the ability of MAMMOTH to perform coupled multi-physics calculations; calculations performed to date are being used to validate the neutron transport solver Rattlesnake and the fuels performance code BISON. Other validation projects outside of TREAT are available for single-physics benchmarking. Because the transient solution capability of Rattlesnake is one of the key attributes that makes it unique for TREAT transient simulations, validating the transient solution of Rattlesnake against other time-dependent kinetics benchmarks has considerable value. The Nuclear Energy Agency (NEA) of the Organisation for Economic Co-operation and Development (OECD) has recently developed a computational benchmark for transient simulations. This benchmark considers both two-dimensional (2D) and 3D configurations, for a total of 26 different transients. All are negative reactivity insertions, typically returning to the critical state after some time.

  10. Analysis of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed

    NASA Technical Reports Server (NTRS)

    Fineberg, Samuel A.; Pedretti, Kevin T.; Kutler, Paul (Technical Monitor)

    1997-01-01

    We evaluate the performance of a Fast Ethernet network configured with a single large switch, a single hub, and a 4x4 2D torus topology in a testbed cluster of "commodity" Pentium Pro PCs. We also evaluated a mixed network composed of Ethernet hubs and switches. An MPI collective communication benchmark and the NAS Parallel Benchmarks version 2.2 (NPB2) show that the torus network performs best for all sizes that we were able to test (up to 16 nodes). For larger networks, the Ethernet switch outperforms the hub, though its performance is far less than peak. The hub/switch combination tests indicate that the NAS Parallel Benchmarks are relatively insensitive to hub densities of less than 7 nodes per hub.

  11. A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling.

    PubMed

    Hart, Emma; Sim, Kevin

    2016-01-01

    We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.
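    A dispatching rule of the kind NELLI-GP composes can be illustrated with a minimal single-machine scheduler; the rule interface and total-flow-time objective below are simplifying assumptions for illustration, not the paper's full JSSP formulation:

```python
def schedule(jobs, rule):
    """Sequence jobs on one machine by a dispatching rule and return
    (order, total flow time).

    jobs : dict mapping job id -> processing time
    rule : key function assigning each job a priority (lower runs first),
           e.g. shortest processing time (SPT).
    """
    order = sorted(jobs, key=rule)
    t, total_flow = 0, 0
    for j in order:
        t += jobs[j]          # completion time of job j
        total_flow += t       # accumulate flow time
    return order, total_flow
```

    An ensemble method assigns each problem instance to the member rule (or rule sequence) that schedules it best, which is the divide-and-conquer idea the abstract describes.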

  12. Development and application of freshwater sediment-toxicity benchmarks for currently used pesticides

    USGS Publications Warehouse

    Nowell, Lisa H.; Norman, Julia E.; Ingersoll, Christopher G.; Moran, Patrick W.

    2016-01-01

    Sediment-toxicity benchmarks are needed to interpret the biological significance of currently used pesticides detected in whole sediments. Two types of freshwater sediment benchmarks for pesticides were developed using spiked-sediment bioassay (SSB) data from the literature. These benchmarks can be used to interpret sediment-toxicity data or to assess the potential toxicity of pesticides in whole sediment. The Likely Effect Benchmark (LEB) defines a pesticide concentration in whole sediment above which there is a high probability of adverse effects on benthic invertebrates, and the Threshold Effect Benchmark (TEB) defines a concentration below which adverse effects are unlikely. For compounds without available SSBs, benchmarks were estimated using equilibrium partitioning (EqP). When a sediment sample contains a pesticide mixture, benchmark quotients can be summed for all detected pesticides to produce an indicator of potential toxicity for that mixture. Benchmarks were developed for 48 pesticide compounds using SSB data and 81 compounds using the EqP approach. In an example application, data for pesticides measured in sediment from 197 streams across the United States were evaluated using these benchmarks, and compared to measured toxicity from whole-sediment toxicity tests conducted with the amphipod Hyalella azteca (28-d exposures) and the midge Chironomus dilutus (10-d exposures). Amphipod survival, weight, and biomass were significantly and inversely related to summed benchmark quotients, whereas midge survival, weight, and biomass showed no relationship to benchmarks. Samples with LEB exceedances were rare (n = 3), but all were toxic to amphipods (i.e., significantly different from control). Significant toxicity to amphipods was observed for 72% of samples exceeding one or more TEBs, compared to 18% of samples below all TEBs. 
Factors affecting toxicity below TEBs may include the presence of contaminants other than pesticides, physical/chemical characteristics of sediment, and uncertainty in TEB values. Additional evaluations of benchmarks in relation to sediment chemistry and toxicity are ongoing.
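    The summed benchmark-quotient indicator described above is a simple computation; the compound names and benchmark values below are purely illustrative:

```python
def sum_benchmark_quotients(concentrations, benchmarks):
    """Sum of concentration/benchmark quotients over detected pesticides.

    concentrations : dict of pesticide -> measured sediment concentration
    benchmarks     : dict of pesticide -> benchmark (e.g. TEB) in the same units
    Compounds without a benchmark are skipped; a sum > 1 flags potential
    toxicity of the mixture.
    """
    return sum(concentrations[p] / benchmarks[p]
               for p in concentrations if p in benchmarks)
```

    The same function works with either LEBs or TEBs as the denominator, giving the "likely effect" or "threshold effect" version of the indicator.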

  13. Towards a sharp-interface volume-of-fluid methodology for modeling evaporation

    NASA Astrophysics Data System (ADS)

    Pathak, Ashish; Raessi, Mehdi

    2017-11-01

    In modeling evaporation, the diffuse-interface (one-domain) formulation yields inaccurate results. Recent efforts approaching the problem via a sharp-interface (two-domain) formulation have shown significant improvements. The reasons behind their better performance are discussed in the present work. All available sharp-interface methods, however, exclusively employ the level-set method. In the present work, we develop a sharp-interface evaporation model in a volume-of-fluid (VOF) framework in order to leverage its mass-conserving property as well as its ability to handle large topological changes. We start with a critical review of the assumptions underlying the mathematical equations governing evaporation. For example, it is shown that the assumption of incompressibility can only be applied in special circumstances. The well-known D2 law used for benchmarking applies exclusively to steady-state test problems; transient effects persist over a significant portion of the lifetime of a micron-size droplet. Therefore, a 1D spherical, fully transient model is developed to provide a benchmark transient solution. Finally, a 3D Cartesian Navier-Stokes evaporation solver is developed. Some preliminary validation test cases are presented for static and moving drop evaporation. This material is based upon work supported by the Department of Energy, Office of Energy Efficiency and Renewable Energy, and the Department of Defense, Tank and Automotive Research, Development, and Engineering Center, under Award Number DEEE0007292.
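    The D2 law referenced above states that during quasi-steady evaporation the squared droplet diameter shrinks linearly in time. A minimal sketch (units and the evaporation constant K are illustrative):

```python
def d2_law_diameter(d0, k, t):
    """Droplet diameter under the classical d-squared law:
    D(t)^2 = D0^2 - K*t, clipped at zero after burnout.

    d0 : initial diameter, k : evaporation rate constant K, t : time.
    """
    d_squared = d0 ** 2 - k * t
    return max(d_squared, 0.0) ** 0.5

def burnout_time(d0, k):
    """Time at which the droplet is fully evaporated, t = D0^2 / K."""
    return d0 ** 2 / k
```

    Because this relation assumes a quasi-steady gas phase, it is a useful steady-state benchmark but, as the abstract notes, cannot validate the transient portion of a droplet's lifetime.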

  14. PdnCO (n = 1,2): accurate Ab initio bond energies, geometries, and dipole moments and the applicability of density functional theory for fuel cell modeling.

    PubMed

    Schultz, Nathan E; Gherman, Benjamin F; Cramer, Christopher J; Truhlar, Donald G

    2006-11-30

    Electrode poisoning by CO is a major concern in fuel cells. As interest in applying computational methods to electrochemistry is increasing, it is important to understand the levels of theory required for reliable treatments of metal-CO interactions. In this paper we justify the use of relativistic effective core potentials for the treatment of PdCO and hence, by inference, for metal-CO interactions where the predominant bonding mechanism is charge transfer. We also sort out key issues involving basis sets, and we recommend that bond energies of 17.2, 43.3, and 69.4 kcal/mol be used as the benchmark bond energies for dissociation of Pd2 into Pd atoms, PdCO into Pd and CO, and Pd2CO into Pd2 and CO, respectively. We calculated the dipole moments of PdCO and Pd2CO, and we recommend benchmark values of 2.49 and 2.81 D, respectively. Furthermore, we tested 27 density functionals for this system and found that only hybrid density functionals can qualitatively and quantitatively predict the nature of the sigma-donation/pi-back-donation mechanism that is associated with the Pd-CO and Pd2-CO bonds. The most accurate density functionals for the systems tested in this paper are O3LYP, OLYP, PW6B95, and PBEh.

  15. Technical Report: Benchmarking for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McLoughlin, K.

    2016-01-22

    The software application “MetaQuant” was developed by our group at Lawrence Livermore National Laboratory (LLNL). It is designed to profile microbial populations in a sample using data from whole-genome shotgun (WGS) metagenomic DNA sequencing. Several other metagenomic profiling applications have been described in the literature. We ran a series of benchmark tests to compare the performance of MetaQuant against that of a few existing profiling tools, using real and simulated sequence datasets. This report describes our benchmarking procedure and results.

  16. PeakRanger: A cloud-enabled peak caller for ChIP-seq data

    PubMed Central

    2011-01-01

    Background Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq), is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcription factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. Results In this paper, we introduce PeakRanger, a peak caller software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real-world usages of PeakRanger, including peak-calling in the modENCODE project. Conclusions Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single-processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded at the official site of the modENCODE project: http://www.modencode.org/software/ranger/ PMID:21554709

  17. Simulation Studies for Inspection of the Benchmark Test with PATRASH

    NASA Astrophysics Data System (ADS)

    Shimosaki, Y.; Igarashi, S.; Machida, S.; Shirakata, M.; Takayama, K.; Noda, F.; Shigaki, K.

    2002-12-01

    In order to delineate the halo-formation mechanisms in a typical FODO lattice, a 2-D simulation code PATRASH (PArticle TRAcking in a Synchrotron for Halo analysis) has been developed. The electric field originating from the space charge is calculated by the Hybrid Tree code method. Benchmark tests utilizing the three simulation codes ACCSIM, PATRASH, and SIMPSONS were carried out, and the results were found to be in fair agreement with each other. The details of the PATRASH simulation are discussed with some examples.

  18. Systematic study on the TD-DFT calculated electronic circular dichroism spectra of chiral aromatic nitro compounds: A comparison of B3LYP and CAM-B3LYP.

    PubMed

    Komjáti, Balázs; Urai, Ákos; Hosztafi, Sándor; Kökösi, József; Kováts, Benjámin; Nagy, József; Horváth, Péter

    2016-02-15

    B3LYP is one of the most widely used functionals for the prediction of electronic circular dichroism spectra; however, if the studied molecule contains an aromatic nitro group, computations may fail to produce reliable results. A test set of molecules of known stereochemistry was synthesized to study this phenomenon in detail. Spectra were computed with the B3LYP and CAM-B3LYP functionals and the 6-311++G(2d,2p) basis set. It was found that the range-separated CAM-B3LYP gives better predictions than B3LYP for all test molecules. Fragment population analysis revealed that the nitro groups form highly localized molecular orbitals, but the exact composition depends on the functional. CAM-B3LYP allows sufficient spatial overlap between the nitro group and distant parts of the molecule, which is necessary for the accurate description of excited states, especially charge-transfer states. This phenomenon and the synthesized test molecules can be used to benchmark theoretical methods as well as to help the development of new functionals intended for spectroscopic studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis.

    PubMed

    Gorzalczany, Marian B; Rudzinski, Filip

    2017-06-07

    This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain--during learning--to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network--working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters)--to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van Der Marck, S. C.

    Three nuclear data libraries have been tested extensively using criticality safety benchmark calculations. The three libraries are the new release of the US library ENDF/B-VII.1 (2011), the new release of the Japanese library JENDL-4.0 (2011), and the OECD/NEA library JEFF-3.1 (2006). All calculations were performed with the continuous-energy Monte Carlo code MCNP (version 4C3, as well as version 6-beta1). Around 2000 benchmark cases from the International Handbook of Criticality Safety Benchmark Experiments (ICSBEP) were used. The results were analyzed per ICSBEP category, and per element. Overall, the three libraries show similar performance on most criticality safety benchmarks. The largest differences are probably caused by elements such as Be, C, Fe, Zr, and W. (authors)

  1. Real-effectiveness medicine--pursuing the best effectiveness in the ordinary care of patients.

    PubMed

    Malmivaara, Antti

    2013-03-01

    Clinical know-how and skills, as well as up-to-date scientific evidence, are cornerstones of providing effective treatment for patients. However, in order to improve the effectiveness of treatment in ordinary practice, appropriate documentation of care at health care units and benchmarking based on this documentation are also needed. This article presents the new concept of real-effectiveness medicine (REM), which pursues the best effectiveness of patient care in the real-world setting. In order to reach this goal, four layers of information are utilized: 1) good medical know-how and skills combined with the patient view, 2) up-to-date scientific evidence, 3) continuous documentation of performance in ordinary settings, and 4) benchmarking between providers. The new framework is suggested for clinicians, organizations, policy-makers, and researchers.

  2. A Better Benchmark Assessment: Multiple-Choice versus Project-Based

    ERIC Educational Resources Information Center

    Peariso, Jamon F.

    2006-01-01

    The purpose of this literature review and Ex Post Facto descriptive study was to determine which type of benchmark assessment, multiple-choice or project-based, provides the best indication of general success on the history portion of the CST (California Standards Tests). The result of the study indicates that although the project-based benchmark…

  3. Benchmark testing of DIII-D neutral beam modeling with water flow calorimetry

    DOE PAGES

    Rauch, J. M.; Crowley, B. J.; Scoville, J. T.; ...

    2016-06-02

    Power loading on beamline components in the DIII-D neutral beam system is measured in this paper using water flow calorimetry. The results are used to benchmark beam transport models. Finally, anomalously high heat loads in the magnet region are investigated and a speculative hypothesis as to their origin is presented.

  4. A Field-Based Aquatic Life Benchmark for Conductivity in Central Appalachian Streams (2010) (External Review Draft)

    EPA Science Inventory

    This report adapts the standard U.S. EPA methodology for deriving ambient water quality criteria. Rather than use toxicity test results, the adaptation uses field data to determine the loss of 5% of genera from streams. The method is applied to derive effect benchmarks for disso...

  5. Academic Achievement and Extracurricular School Activities of At-Risk High School Students

    ERIC Educational Resources Information Center

    Marchetti, Ryan; Wilson, Randal H.; Dunham, Mardis

    2016-01-01

    This study compared the employment, extracurricular participation, and family structure status of students from low socioeconomic families that achieved state-approved benchmarks on ACT reading and mathematics tests to those that did not achieve the benchmarks. Free and reduced lunch eligibility was used to determine SES. Participants included 211…

  6. Benchmarking for maximum value.

    PubMed

    Baldwin, Ed

    2009-03-01

    Speaking at the most recent Healthcare Estates conference, Ed Baldwin of international built asset consultancy EC Harris LLP examined the role of benchmarking and market-testing--two of the key methods used to evaluate the quality and cost-effectiveness of the hard and soft FM services provided under PFI healthcare schemes--in ensuring they offer maximum value for money.

  7. Benchmarking health IT among OECD countries: better data for better policy

    PubMed Central

    Adler-Milstein, Julia; Ronchi, Elettra; Cohen, Genna R; Winn, Laura A Pannella; Jha, Ashish K

    2014-01-01

    Objective To develop benchmark measures of health information and communication technology (ICT) use to facilitate cross-country comparisons and learning. Materials and methods The effort is led by the Organisation for Economic Co-operation and Development (OECD). Approaches to definition and measurement within four ICT domains were compared across seven OECD countries in order to identify functionalities in each domain. These informed a set of functionality-based benchmark measures, which were refined in collaboration with representatives from more than 20 OECD and non-OECD countries. We report on progress to date and remaining work to enable countries to begin to collect benchmark data. Results The four benchmarking domains include provider-centric electronic record, patient-centric electronic record, health information exchange, and tele-health. There was broad agreement on functionalities in the provider-centric electronic record domain (eg, entry of core patient data, decision support), and less agreement in the other three domains in which country representatives worked to select benchmark functionalities. Discussion Many countries are working to implement ICTs to improve healthcare system performance. Although many countries are looking to others as potential models, the lack of consistent terminology and approach has made cross-national comparisons and learning difficult. Conclusions As countries develop and implement strategies to increase the use of ICTs to promote health goals, there is a historic opportunity to enable cross-country learning. To facilitate this learning and reduce the chances that individual countries flounder, a common understanding of health ICT adoption and use is needed. The OECD-led benchmarking process is a crucial step towards achieving this. PMID:23721983

  8. Benchmarking routine psychological services: a discussion of challenges and methods.

    PubMed

    Delgadillo, Jaime; McMillan, Dean; Leach, Chris; Lucock, Mike; Gilbody, Simon; Wood, Nick

    2014-01-01

    Policy developments in recent years have led to important changes in the level of access to evidence-based psychological treatments. Several methods have been used to investigate the effectiveness of these treatments in routine care, with different approaches to outcome definition and data analysis. To present a review of challenges and methods for the evaluation of evidence-based treatments delivered in routine mental healthcare. This is followed by a case example of a benchmarking method applied in primary care. High, average and poor performance benchmarks were calculated through a meta-analysis of published data from services working under the Improving Access to Psychological Therapies (IAPT) Programme in England. Pre-post treatment effect sizes (ES) and confidence intervals were estimated to illustrate a benchmarking method enabling services to evaluate routine clinical outcomes. High, average and poor performance ES for routine IAPT services were estimated to be 0.91, 0.73 and 0.46 for depression (using PHQ-9) and 1.02, 0.78 and 0.52 for anxiety (using GAD-7). Data from one specific IAPT service exemplify how to evaluate and contextualize routine clinical performance against these benchmarks. The main contribution of this report is to summarize key recommendations for the selection of an adequate set of psychometric measures, the operational definition of outcomes, and the statistical evaluation of clinical performance. A benchmarking method is also presented, which may enable a robust evaluation of clinical performance against national benchmarks. Some limitations concerned significant heterogeneity among data sources, and wide variations in ES and data completeness.
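
    The pre-post effect sizes used for benchmarking above are conventionally computed as the mean baseline score minus the mean post-treatment score, divided by the baseline standard deviation; a service's effect size can then be placed against the published benchmarks. A minimal sketch against the reported depression (PHQ-9) benchmarks, using made-up scores rather than data from the study:

```python
from statistics import mean, stdev

def pre_post_effect_size(pre, post):
    """Uncontrolled pre-post effect size: (mean_pre - mean_post) / sd_pre."""
    return (mean(pre) - mean(post)) / stdev(pre)

def classify(es, high=0.91, average=0.73, poor=0.46):
    """Place a depression (PHQ-9) effect size against the IAPT benchmarks
    reported above; thresholds for anxiety (GAD-7) would differ."""
    if es >= high:
        return "at or above high benchmark"
    if es >= average:
        return "around average benchmark"
    if es >= poor:
        return "between poor and average benchmarks"
    return "below poor benchmark"

# Hypothetical PHQ-9 scores at intake and after treatment
pre = [18, 15, 20, 17, 14, 19]
post = [16, 13, 18, 15, 12, 17]
es = pre_post_effect_size(pre, post)
print(round(es, 2), classify(es))  # 0.86 around average benchmark
```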

  9. Benchmarking health IT among OECD countries: better data for better policy.

    PubMed

    Adler-Milstein, Julia; Ronchi, Elettra; Cohen, Genna R; Winn, Laura A Pannella; Jha, Ashish K

    2014-01-01

    To develop benchmark measures of health information and communication technology (ICT) use to facilitate cross-country comparisons and learning. The effort is led by the Organisation for Economic Co-operation and Development (OECD). Approaches to definition and measurement within four ICT domains were compared across seven OECD countries in order to identify functionalities in each domain. These informed a set of functionality-based benchmark measures, which were refined in collaboration with representatives from more than 20 OECD and non-OECD countries. We report on progress to date and remaining work to enable countries to begin to collect benchmark data. The four benchmarking domains include provider-centric electronic record, patient-centric electronic record, health information exchange, and tele-health. There was broad agreement on functionalities in the provider-centric electronic record domain (eg, entry of core patient data, decision support), and less agreement in the other three domains in which country representatives worked to select benchmark functionalities. Many countries are working to implement ICTs to improve healthcare system performance. Although many countries are looking to others as potential models, the lack of consistent terminology and approach has made cross-national comparisons and learning difficult. As countries develop and implement strategies to increase the use of ICTs to promote health goals, there is a historic opportunity to enable cross-country learning. To facilitate this learning and reduce the chances that individual countries flounder, a common understanding of health ICT adoption and use is needed. The OECD-led benchmarking process is a crucial step towards achieving this.

  10. A building extraction approach for Airborne Laser Scanner data utilizing the Object Based Image Analysis paradigm

    NASA Astrophysics Data System (ADS)

    Tomljenovic, Ivan; Tiede, Dirk; Blaschke, Thomas

    2016-10-01

    In the past two decades Object-Based Image Analysis (OBIA) established itself as an efficient approach for the classification and extraction of information from remote sensing imagery and, increasingly, from non-image based sources such as Airborne Laser Scanner (ALS) point clouds. ALS data is represented in the form of a point cloud with recorded multiple returns and intensities. In our work, we combined OBIA with ALS point cloud data in order to identify and extract buildings as 2D polygons representing roof outlines in a top-down mapping approach. We performed rasterization of the ALS data into a height raster for the purpose of the generation of a Digital Surface Model (DSM) and a derived Digital Elevation Model (DEM). Further objects were generated in conjunction with point statistics from the linked point cloud. With the use of class modelling methods, we generated the final target class of objects representing buildings. The approach was developed for a test area in Biberach an der Riß (Germany). In order to point out the possibilities of adaptation-free transferability to another data set, the algorithm has been applied "as is" to the ISPRS Benchmarking data set of Toronto (Canada). The obtained results show high accuracies for the initial study area (thematic accuracies of around 98%, geometric accuracy of above 80%). The very high performance within the ISPRS Benchmark without any modification of the algorithm and without any adaptation of parameters is particularly noteworthy.
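
    The DSM/DEM step above implies a normalized surface model: subtracting bare-earth elevation (DEM) from surface elevation (DSM) gives per-cell above-ground height, and tall cells become candidate building (or vegetation) objects for the subsequent class modelling. A toy sketch of that masking step (the arrays and height threshold are illustrative, not the study's parameters):

```python
# Normalized DSM: per-cell above-ground height = DSM - DEM.
# Cells above a height threshold are candidate elevated objects;
# OBIA class modelling would then separate buildings from vegetation
# using shape and point-cloud statistics.

def building_candidate_mask(dsm, dem, min_height=2.5):
    """Return a boolean raster: True where above-ground height > min_height."""
    return [
        [(s - g) > min_height for s, g in zip(srow, grow)]
        for srow, grow in zip(dsm, dem)
    ]

dsm = [[101.0, 108.2], [100.4, 107.9]]   # surface heights (m), toy values
dem = [[100.0, 100.1], [100.2, 100.3]]   # bare-earth heights (m), toy values
print(building_candidate_mask(dsm, dem))
# [[False, True], [False, True]]
```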

  11. Comprehensive benchmarking reveals H2BK20 acetylation as a distinctive signature of cell-state-specific enhancers and promoters.

    PubMed

    Kumar, Vibhor; Rayan, Nirmala Arul; Muratani, Masafumi; Lim, Stefan; Elanggovan, Bavani; Xin, Lixia; Lu, Tess; Makhija, Harshyaa; Poschmann, Jeremie; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam

    2016-05-01

    Although over 35 different histone acetylation marks have been described, the overwhelming majority of regulatory genomics studies focus exclusively on H3K27ac and H3K9ac. In order to identify novel epigenomic traits of regulatory elements, we constructed a benchmark set of validated enhancers by performing 140 enhancer assays in human T cells. We tested 40 chromatin signatures on this unbiased enhancer set and identified H2BK20ac, a little-studied histone modification, as the most predictive mark of active enhancers. Notably, we detected a novel class of functionally distinct enhancers enriched in H2BK20ac but lacking H3K27ac, which was present in all examined cell lines and also in embryonic forebrain tissue. H2BK20ac was also unique in highlighting cell-type-specific promoters. In contrast, other acetylation marks were present in all active promoters, regardless of cell-type specificity. In stimulated microglial cells, H2BK20ac was more correlated with cell-state-specific expression changes than H3K27ac, with TGF-beta signaling decoupling the two acetylation marks at a subset of regulatory elements. In summary, our study reveals a previously unknown connection between histone acetylation and cell-type-specific gene regulation and indicates that H2BK20ac profiling can be used to uncover new dimensions of gene regulation. © 2016 Kumar et al.; Published by Cold Spring Harbor Laboratory Press.

  12. What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images.

    PubMed

    Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W

    2015-06-01

    Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
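
    The two-stage retrieval described above--text-based candidate generation followed by image-informed reranking--can be sketched as a simple linear fusion of a text score and a visual score. The fusion weight and scores below are illustrative assumptions; the paper's actual reranker is not specified here:

```python
def rerank(candidates, alpha=0.7):
    """Re-order text-retrieved candidates by fusing text and visual scores.

    candidates: list of (doc_id, text_score, visual_score) tuples, where
    visual_score is some measure of query agreement computed from the
    images on the page. alpha weights the text score (illustrative value,
    not a learned parameter from the paper).
    """
    fused = [(doc, alpha * t + (1 - alpha) * v) for doc, t, v in candidates]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)

# Toy candidate set: "b" wins after fusion despite a lower text score
docs = [("a", 0.9, 0.1), ("b", 0.8, 0.9), ("c", 0.5, 0.95)]
print([doc for doc, score in rerank(docs)])  # ['b', 'a', 'c']
```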

  13. The implementation of interconception care in two community health settings: lessons learned.

    PubMed

    Handler, Arden; Rankin, Kristin M; Peacock, Nadine; Townsell, Stephanie; McGlynn, Andrea; Issel, L Michele

    2013-01-01

    This study reports on an evaluation of the implementation of a pilot interconceptional care program (ICCP) in Chicago and the experiences of the participants in their first postpartum year. A longitudinal, multi-method approach was used to gather data to measure success in achieving project benchmarks and to gain insights into women's experiences after an adverse pregnancy outcome. The ICCP interventions were provided in two different health care settings. Low-income African-American women with a prior adverse pregnancy outcome were recruited to participate. Data on services delivered are available for 220 women; linked interview data are also available for 99 of these women. The ICCP focused on the integration of social services, family planning, and medical care provided through a team approach. An interview questionnaire asked detailed information about interconceptional health status, attitudes, and behaviors. A services database documented all services delivered to each participant. Key informant interviews were conducted with the ICCP project staff. Simple frequencies were generated. Chi-square and t-tests were used to compare participants and benchmarks at the two different sites. The planned delivery of interventions based on women's unique interconceptional health needs was often replaced by efforts to address women's socioeconomic needs. Although medical care remained important, participants viewed themselves as healthy and did not view medical care as a priority. Women's perceptions of contraceptive effectiveness were not always in sync with clinical knowledge. Interconceptional care is a complex process of matching interventions and services to meet women's unique needs, including their socioeconomic needs.

  14. Comprehensive benchmarking reveals H2BK20 acetylation as a distinctive signature of cell-state-specific enhancers and promoters

    PubMed Central

    Kumar, Vibhor; Rayan, Nirmala Arul; Muratani, Masafumi; Lim, Stefan; Elanggovan, Bavani; Xin, Lixia; Lu, Tess; Makhija, Harshyaa; Poschmann, Jeremie; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam

    2016-01-01

    Although over 35 different histone acetylation marks have been described, the overwhelming majority of regulatory genomics studies focus exclusively on H3K27ac and H3K9ac. In order to identify novel epigenomic traits of regulatory elements, we constructed a benchmark set of validated enhancers by performing 140 enhancer assays in human T cells. We tested 40 chromatin signatures on this unbiased enhancer set and identified H2BK20ac, a little-studied histone modification, as the most predictive mark of active enhancers. Notably, we detected a novel class of functionally distinct enhancers enriched in H2BK20ac but lacking H3K27ac, which was present in all examined cell lines and also in embryonic forebrain tissue. H2BK20ac was also unique in highlighting cell-type-specific promoters. In contrast, other acetylation marks were present in all active promoters, regardless of cell-type specificity. In stimulated microglial cells, H2BK20ac was more correlated with cell-state-specific expression changes than H3K27ac, with TGF-beta signaling decoupling the two acetylation marks at a subset of regulatory elements. In summary, our study reveals a previously unknown connection between histone acetylation and cell-type-specific gene regulation and indicates that H2BK20ac profiling can be used to uncover new dimensions of gene regulation. PMID:26957309

  15. OPTIMIZATION OF MUD HAMMER DRILLING PERFORMANCE - A PROGRAM TO BENCHMARK THE VIABILITY OF ADVANCED MUD HAMMER DRILLING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alan Black; Arnis Judzis

    2003-01-01

    Progress during current reporting year 2002 by quarter--Progress during Q1 2002: (1) In accordance with Task 7.0 (D. No. 2 Technical Publications), TerraTek, NETL, and the industry contributors successfully presented a paper detailing Phase 1 testing results at the February 2002 IADC/SPE Drilling Conference, a prestigious venue for presenting DOE and private sector drilling technology advances. The full reference is as follows: IADC/SPE 74540 "World's First Benchmarking of Drilling Mud Hammer Performance at Depth Conditions" authored by Gordon A. Tibbitts, TerraTek; Roy C. Long, US Department of Energy; Brian E. Miller, BP America, Inc.; Arnis Judzis, TerraTek; and Alan D. Black, TerraTek. Gordon Tibbitts, TerraTek, presented the well-attended paper in February 2002. The full text of the mud hammer paper was included in the last quarterly report. (2) The Phase 2 project planning meeting (Task 6) was held at ExxonMobil's Houston Greenspoint offices on February 22, 2002. In attendance were representatives from TerraTek, DOE, BP, ExxonMobil, PDVSA, Novatek, and SDS Digger Tools. (3) PDVSA has joined the advisory board to this DOE mud hammer project. PDVSA's commitment of cash and in-kind contributions was reported during the last quarter. (4) Strong industry support remains for the DOE project. Both Andergauge and Smith Tools have expressed an interest in participating in the "optimization" phase of the program. The potential for increased testing with additional industry cash support was discussed at the planning meeting in February 2002. Progress during Q2 2002: (1) Presentation material was provided to the DOE/NETL project manager (Dr. John Rogers) for the DOE exhibit at the 2002 Offshore Technology Conference. (2) Two meetings at Smith International and one at Andergauge in Houston were held to investigate their interest in joining the mud hammer performance study.
(3) SDS Digger Tools (Task 3 benchmarking participant) apparently has not negotiated a commercial deal with Halliburton on the supply of fluid hammers to the oil and gas business. (4) TerraTek is awaiting progress by Novatek (a DOE contractor) on the redesign and development of their next hammer tool. Their delay will require an extension to TerraTek's contracted program. (5) Smith International has sufficient interest in the program to start engineering and chroming of collars for testing at TerraTek. (6) Shell's Brian Tarr has agreed to join the Industry Advisory Group for the DOE project. The addition of Brian Tarr is welcomed, as he has numerous years of experience with the Novatek tool and was involved in the early tests in Europe while with Mobil Oil. (7) Conoco's field trial of the Smith fluid hammer for an application in Vietnam was organized and has contributed to the increased interest in their tool. Progress during Q3 2002: (1) Smith International agreed to participate in the DOE mud hammer program. (2) Smith International chromed collars for upcoming benchmark tests at TerraTek, now scheduled for 4Q 2002. (3) ConocoPhillips had a field trial of the Smith fluid hammer offshore Vietnam. The hammer functioned properly, though the well encountered difficult hole conditions and reaming problems. ConocoPhillips plans another field trial as a result. (4) DOE/NETL extended the contract for the fluid hammer program to allow Novatek to "optimize" their much-delayed tool into 2003 and to allow Smith International to add "benchmarking" tests in light of SDS Digger Tools' current financial inability to participate. (5) ConocoPhillips joined the industry advisors for the mud hammer program. Progress during Q4 2002: (1) Smith International participated in the DOE mud hammer program through full-scale benchmarking testing during the week of 4 November 2002.
(2) TerraTek acknowledges Smith International, BP America, PDVSA, and ConocoPhillips for cost-sharing the Smith benchmarking tests, allowing extension of the contract to add to the benchmarking testing program. (3) Following the benchmark testing of the Smith International hammer, representatives from DOE/NETL, TerraTek, Smith International, and PDVSA met at TerraTek in Salt Lake City to review observations, performance, and views on the optimization step for 2003. (4) The December 2002 issue of the Journal of Petroleum Technology (Society of Petroleum Engineers) highlighted the DOE fluid hammer testing program and reviewed last year's paper on the benchmark performance of the SDS Digger and Novatek hammers. (5) TerraTek's Sid Green presented a technical review for DOE/NETL personnel in Morgantown on "Impact Rock Breakage" and its importance in improving fluid hammer performance. Much discussion has taken place on the issues surrounding mud hammer performance at depth conditions.

  16. OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS & HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alan Black; Arnis Judzis

    2004-10-01

The industry cost-shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through team development of aggressive diamond product drill bit--fluid system technologies. Overall, the objectives are as follows: Phase 1--Benchmark ''best in class'' diamond and other product drilling bits and fluids and develop concepts for a next level of deep drilling performance; Phase 2--Develop advanced smart bit-fluid prototypes and test at large scale; and Phase 3--Field trial smart bit-fluid concepts, modify as necessary, and commercialize products. As of the report date, TerraTek has concluded all major preparations for the high pressure drilling campaign. Baker Hughes encountered difficulties in providing additional pumping capacity before TerraTek's scheduled relocation to another facility, thus the program was delayed further to accommodate the full testing program.

  17. Data processing has major impact on the outcome of quantitative label-free LC-MS analysis.

    PubMed

    Chawade, Aakash; Sandin, Marianne; Teleman, Johan; Malmström, Johan; Levander, Fredrik

    2015-02-06

    High-throughput multiplexed protein quantification using mass spectrometry is steadily increasing in popularity, with the two major techniques being data-dependent acquisition (DDA) and targeted acquisition using selected reaction monitoring (SRM). However, both techniques involve extensive data processing, which can be performed by a multitude of different software solutions. Analysis of quantitative LC-MS/MS data is mainly performed in three major steps: processing of raw data, normalization, and statistical analysis. To evaluate the impact of data processing steps, we developed two new benchmark data sets, one each for DDA and SRM, with samples consisting of a long-range dilution series of synthetic peptides spiked in a total cell protein digest. The generated data were processed by eight different software workflows and three postprocessing steps. The results show that the choice of the raw data processing software and the postprocessing steps play an important role in the final outcome. Also, the linear dynamic range of the DDA data could be extended by an order of magnitude through feature alignment and a charge state merging algorithm proposed here. Furthermore, the benchmark data sets are made publicly available for further benchmarking and software developments.
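    The evaluation such a spiked-in dilution series enables can be sketched in a few lines: regress measured feature intensity against known spike-in amount on a log-log scale and report the widest window that remains linear. The function below is a hypothetical illustration only; the name, the R² threshold, and the window search are assumptions of this sketch, not part of the paper's benchmark or of any of the tested software.

    ```python
    import math

    def linear_dynamic_range(dilutions, intensities, r2_threshold=0.95):
        """Find the widest contiguous log-log-linear span of a dilution series.

        dilutions   -- known spike-in amounts, ascending
        intensities -- measured feature intensities at each dilution
        Returns the (low, high) dilutions of the widest window whose
        log-log least-squares fit has R^2 >= r2_threshold, or None.
        """
        xs = [math.log10(d) for d in dilutions]
        ys = [math.log10(v) for v in intensities]
        n = len(xs)
        best = None
        for i in range(n):
            for j in range(i + 2, n + 1):  # windows of at least 2 points
                x, y = xs[i:j], ys[i:j]
                m = len(x)
                mx, my = sum(x) / m, sum(y) / m
                sxx = sum((a - mx) ** 2 for a in x)
                syy = sum((b - my) ** 2 for b in y)
                sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
                r2 = sxy * sxy / (sxx * syy) if sxx and syy else 0.0
                if r2 >= r2_threshold and (best is None or j - i > best[1] - best[0]):
                    best = (i, j)
        if best is None:
            return None
        return dilutions[best[0]], dilutions[best[1] - 1]
    ```

    On a perfectly proportional series the whole range qualifies; when the top of the series saturates, the reported window stops where linearity ends.
    
    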

  18. Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra".

    PubMed

    Griss, Johannes; Perez-Riverol, Yasset; The, Matthew; Käll, Lukas; Vizcaíno, Juan Antonio

    2018-05-04

In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here we report some shortcomings detected in the original analyses. First, for most analyses the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the average proteomics data sets produced nowadays. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra, which are of lower quality, were already removed from the data set and could not influence the clustering results. Third, the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using only a precursor tolerance of 5 Da for high-resolution Orbitrap data was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
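    The effect of the precursor tolerance discussed above can be illustrated with a toy greedy clustering of spectra by precursor m/z. This is a hypothetical sketch, not how MS-Cluster or spectra-cluster actually work: it only shows that a 5 Da window merges precursors that a high-resolution instrument can clearly distinguish, while a tight window keeps them apart.

    ```python
    def cluster_by_precursor(mzs, tolerance_da):
        """Greedy single-pass clustering of spectra by precursor m/z.

        mzs          -- precursor m/z values, one per spectrum
        tolerance_da -- maximum distance (Da) from a cluster's seed m/z
        Returns a list of clusters, each a list of spectrum indices,
        ordered by ascending precursor m/z.
        """
        clusters = []  # each entry: (seed_mz, [member indices])
        for i, mz in sorted(enumerate(mzs), key=lambda p: p[1]):
            if clusters and mz - clusters[-1][0] <= tolerance_da:
                clusters[-1][1].append(i)   # within tolerance of the seed
            else:
                clusters.append((mz, [i]))  # start a new cluster
        return [members for _, members in clusters]
    ```

    With a 5 Da tolerance, precursors at 500.60 and 503.50 m/z collapse into one cluster; with a 0.02 Da tolerance appropriate for Orbitrap data, they stay separate.
    
    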

  19. Optimization of Deep Drilling Performance--Development and Benchmark Testing of Advanced Diamond Product Drill Bits & HP/HT Fluids to Significantly Improve Rates of Penetration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alan Black; Arnis Judzis

    2003-10-01

This document details the progress to date on the OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS AND HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION contract for the year starting October 2002 through September 2003. The industry cost-shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through team development of aggressive diamond product drill bit--fluid system technologies. Overall, the objectives are as follows: Phase 1--Benchmark ''best in class'' diamond and other product drilling bits and fluids and develop concepts for a next level of deep drilling performance; Phase 2--Develop advanced smart bit--fluid prototypes and test at large scale; and Phase 3--Field trial smart bit--fluid concepts, modify as necessary, and commercialize products. Accomplishments to date include the following: 4Q 2002--Project started; Industry Team was assembled; kick-off meeting was held at DOE Morgantown; 1Q 2003--Engineering meeting was held at Hughes Christensen, The Woodlands, Texas, to prepare preliminary plans for development and testing and to review equipment needs; operators started sending information regarding their needs for deep drilling challenges and priorities for the large-scale testing experimental matrix; Aramco joined the Industry Team as DEA 148 objectives paralleled the DOE project; 2Q 2003--Engineering and planning for high pressure drilling at TerraTek commenced; 3Q 2003--Continuation of engineering and design work for high pressure drilling at TerraTek; Baker Hughes INTEQ Drilling Fluids and Hughes Christensen commenced planning for Phase 1 testing--recommendations for bits and fluids.

  20. Multi-strategy coevolving aging particle optimization.

    PubMed

    Iacca, Giovanni; Caraffini, Fabio; Neri, Ferrante

    2014-02-01

We propose Multi-Strategy Coevolving Aging Particles (MS-CAP), a novel population-based algorithm for black-box optimization. In a memetic fashion, MS-CAP combines two components with complementary algorithmic logics. In the first stage, each particle is perturbed independently along each dimension with a progressively shrinking (decaying) radius and attracted towards the current best solution with an increasing force. In the second stage, the particles are mutated and recombined according to a multi-strategy approach, in the fashion of the ensemble of mutation strategies in Differential Evolution. The proposed algorithm is tested, at different dimensionalities, on two complete black-box optimization benchmarks proposed at the Congress on Evolutionary Computation in 2010 and 2013. To demonstrate the applicability of the approach, we also use MS-CAP to train a feedforward neural network modeling the kinematics of an 8-link robot manipulator. The numerical results show that MS-CAP, for the settings considered in this study, tends to outperform state-of-the-art optimization algorithms on a large set of problems, thus resulting in a robust and versatile optimizer.
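    The two-stage structure described in the abstract can be sketched as follows. This is a simplified reading of the abstract, not the published MS-CAP algorithm: the function name, the 0.1 perturbation-radius factor, the linear decay/pull schedules, and the choice of only two DE-style strategies (rand/1 and best/1 with F = 0.5) are all assumptions of this sketch.

    ```python
    import random

    def ms_cap_sketch(f, dim, bounds, pop=20, iters=200, seed=0):
        """Illustrative two-stage optimizer minimizing f over a box.

        Stage 1 perturbs each particle per dimension with a shrinking
        radius and a growing pull toward the current best; stage 2
        applies one of two DE-style mutation strategies chosen at random.
        Replacement is greedy: a candidate survives only if it improves.
        """
        rng = random.Random(seed)
        lo, hi = bounds
        X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
        fit = [f(x) for x in X]
        best = min(range(pop), key=fit.__getitem__)
        for t in range(iters):
            decay = 1.0 - t / iters  # perturbation radius shrinks over time
            pull = t / iters         # attraction to the best grows over time
            for i in range(pop):
                # Stage 1: aging-particle move (perturb, then attract)
                trial = []
                for d in range(dim):
                    v = X[i][d] + decay * 0.1 * (hi - lo) * rng.uniform(-1, 1)
                    v += pull * rng.random() * (X[best][d] - v)
                    trial.append(min(max(v, lo), hi))
                ft = f(trial)
                if ft < fit[i]:
                    X[i], fit[i] = trial, ft
                # Stage 2: multi-strategy DE-style mutation (best/1 or rand/1)
                a, b, c = rng.sample(range(pop), 3)
                base = X[best] if rng.random() < 0.5 else X[a]
                mut = [min(max(base[d] + 0.5 * (X[b][d] - X[c][d]), lo), hi)
                       for d in range(dim)]
                fm = f(mut)
                if fm < fit[i]:
                    X[i], fit[i] = mut, fm
                if fit[i] < fit[best]:
                    best = i
        return X[best], fit[best]
    ```

    On a 5-dimensional sphere function over [-5, 5], for example, the sketch steadily improves the best particle through the greedy replacement in both stages.
    
    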
