NASA Technical Reports Server (NTRS)
Generazio, Edward R. (Inventor)
2012-01-01
A method of validating a probability of detection (POD) testing system using directed design of experiments (DOE) includes recording an input data set of observed hit and miss or analog data for sample components as a function of size of a flaw in the components. The method also includes processing the input data set to generate an output data set having an optimal class width, assigning a case number to the output data set, and generating validation instructions based on the assigned case number. An apparatus includes a host machine for receiving the input data set from the testing system and an algorithm for executing DOE to validate the test system. The algorithm applies DOE to the input data set to determine a data set having an optimal class width, assigns a case number to that data set, and generates validation instructions based on the case number.
Assessing Discriminative Performance at External Validation of Clinical Prediction Models
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.
2016-01-01
Introduction External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore, we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results The permutation test indicated that the validation and development sets were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753
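The c-statistic discussed above is the area under the ROC curve for the model's linear predictor. As a hedged illustration (not the authors' code; the data-generating settings are invented), the sketch below fits a logistic model on a development set, scores a validation set with a narrower case-mix, and compares the two c-statistics.

```python
# Minimal sketch: apparent c-statistic at development vs. c-statistic at
# external validation when the validation case-mix is less heterogeneous.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def simulate(n, beta, sd):
    X = rng.normal(0.0, sd, size=(n, 2))     # sd controls case-mix heterogeneity
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X, rng.binomial(1, p)

X_dev, y_dev = simulate(2000, np.array([0.8, 0.6]), sd=1.0)
X_val, y_val = simulate(2000, np.array([0.8, 0.6]), sd=0.6)   # narrower case-mix

model = LogisticRegression().fit(X_dev, y_dev)
c_dev = roc_auc_score(y_dev, model.decision_function(X_dev))
c_val = roc_auc_score(y_val, model.decision_function(X_val))
print(f"c-statistic at development: {c_dev:.3f}, at external validation: {c_val:.3f}")
```

With identical predictor effects but a less heterogeneous case-mix, the validation c-statistic drops even though the coefficients are still correct, which is exactly the interpretation problem the abstract highlights.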
Parameterization of Model Validating Sets for Uncertainty Bound Optimizations. Revised
NASA Technical Reports Server (NTRS)
Lim, K. B.; Giesy, D. P.
2000-01-01
Given measurement data, a nominal model and a linear fractional transformation uncertainty structure with an allowance on unknown but bounded exogenous disturbances, easily computable tests for the existence of a model validating uncertainty set are given. Under mild conditions, these tests are necessary and sufficient for the case of complex, nonrepeated, block-diagonal structure. For the more general case which includes repeated and/or real scalar uncertainties, the tests are only necessary but become sufficient if a collinearity condition is also satisfied. With the satisfaction of these tests, it is shown that a parameterization of all model validating sets of plant models is possible. The new parameterization is used as a basis for a systematic way to construct or perform uncertainty tradeoff with model validating uncertainty sets which have specific linear fractional transformation structure for use in robust control design and analysis. An illustrative example which includes a comparison of candidate model validating sets is given.
How to test validity in orthodontic research: a mixed dentition analysis example.
Donatelli, Richard E; Lee, Shin-Jae
2015-02-01
The data used to test the validity of a prediction method should be different from the data used to generate the prediction model. In this study, we explored whether an independent data set is mandatory for testing the validity of a new prediction method and how validity can be tested without independent new data. Several validation methods were compared in an example using the data from a mixed dentition analysis with a regression model. The validation errors of real mixed dentition analysis data and simulation data were analyzed for increasingly large data sets. The validation results of both the real and the simulation studies demonstrated that the leave-1-out cross-validation method had the smallest errors. The largest errors occurred in the traditional simple validation method. The differences between the validation methods diminished as the sample size increased. The leave-1-out cross-validation method seems to be an optimal validation method for improving the prediction accuracy in a data set with limited sample sizes. Copyright © 2015 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
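As a generic illustration of the comparison described above, the following hedged sketch contrasts leave-one-out cross-validation with a single random hold-out ("simple validation") on a small synthetic regression data set; it is not the mixed dentition data or the authors' model.

```python
# Hedged sketch: leave-one-out CV error vs. a single random hold-out split.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, train_test_split, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))                    # small sample, as in many clinical sets
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=40)

# Leave-one-out cross-validation error (mean absolute error).
loo_mae = -cross_val_score(LinearRegression(), X, y,
                           cv=LeaveOneOut(), scoring="neg_mean_absolute_error").mean()

# Traditional simple validation: one random train/test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
simple_mae = np.abs(LinearRegression().fit(X_tr, y_tr).predict(X_te) - y_te).mean()

print(f"LOO-CV MAE: {loo_mae:.3f}, simple hold-out MAE: {simple_mae:.3f}")
```

With small samples, the single hold-out estimate varies strongly with the particular split drawn, whereas leave-one-out uses every observation for both fitting and testing, which is consistent with the abstract's finding.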
Correcting Evaluation Bias of Relational Classifiers with Network Cross Validation
2010-01-01
...classification algorithms: simple random resampling (RRS), equal-instance random resampling (ERS), and network cross-validation (NCV). The first two... NCV is a procedure that eliminates overlap between test sets altogether. The procedure samples k disjoint test sets that will be used for evaluation... (drawing propLabeled * S nodes from the training pool, setting inferenceSet = network − trainSet, and accumulating F = F ∪ <trainSet, testSet, inferenceSet>; output: F). NCV addresses...
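The fragment above hints at how NCV constructs its folds. The following Python reconstruction is a hedged interpretation of that pseudocode (the meaning of S and the exact sampling rules are assumptions); it is intended only to convey the structure of disjoint test sets with separate training and inference sets, not to reproduce the published algorithm.

```python
# Hedged reconstruction of the NCV fold construction hinted at above.
# Assumptions: S is a per-fold sample-size parameter, and the k test sets
# are disjoint partitions of the network's nodes.
import random

def ncv_folds(nodes, k, prop_labeled, S):
    nodes = list(nodes)
    random.shuffle(nodes)
    fold_size = len(nodes) // k
    folds = []
    for i in range(k):
        test_set = set(nodes[i * fold_size:(i + 1) * fold_size])   # disjoint test sets
        train_pool = [n for n in nodes if n not in test_set]
        train_set = set(random.sample(train_pool, int(prop_labeled * S)))
        inference_set = set(nodes) - train_set                      # network - trainSet
        folds.append((train_set, test_set, inference_set))
    return folds

folds = ncv_folds(range(100), k=5, prop_labeled=0.5, S=40)
print(len(folds), "folds; first test-set size:", len(folds[0][1]))
```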
Does rational selection of training and test sets improve the outcome of QSAR modeling?
Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander
2012-10-22
Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
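To make the contrast between random and rational division concrete, here is a hedged sketch of a Kennard-Stone-style maximin selection next to a random split; it is a generic illustration on synthetic descriptors, not the exact implementations compared in the study.

```python
# Hedged sketch: rational (Kennard-Stone-like maximin) vs. random train/test division.
import numpy as np
from scipy.spatial.distance import cdist

def kennard_stone(X, n_train):
    """Greedy maximin selection: start from the two most distant points, then
    repeatedly add the point farthest from the already-selected training set."""
    d = cdist(X, X)
    selected = list(np.unravel_index(np.argmax(d), d.shape))
    while len(selected) < n_train:
        remaining = [i for i in range(len(X)) if i not in selected]
        nearest = d[np.ix_(remaining, selected)].min(axis=1)   # distance to nearest selected point
        selected.append(remaining[int(np.argmax(nearest))])
    return np.array(selected)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                   # placeholder descriptor matrix
n_train = 80

rational_train = kennard_stone(X, n_train)
random_train = rng.choice(len(X), size=n_train, replace=False)
print("rational training indices:", np.sort(rational_train)[:10], "...")
print("random training indices:  ", np.sort(random_train)[:10], "...")
```

The rational selection deliberately spreads the training compounds over descriptor space, whereas the random split makes no such guarantee.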
Validation of tsunami inundation model TUNA-RP using OAR-PMEL-135 benchmark problem set
NASA Astrophysics Data System (ADS)
Koh, H. L.; Teh, S. Y.; Tan, W. K.; Kh'ng, X. Y.
2017-05-01
A standard set of benchmark problems, known as OAR-PMEL-135, is developed by the US National Tsunami Hazard Mitigation Program for tsunami inundation model validation. Any tsunami inundation model must be tested for its accuracy and capability using this standard set of benchmark problems before it can be gainfully used for inundation simulation. The authors have previously developed an in-house tsunami inundation model known as TUNA-RP. This inundation model solves the two-dimensional nonlinear shallow water equations coupled with a wet-dry moving boundary algorithm. This paper presents the validation of TUNA-RP against the solutions provided in the OAR-PMEL-135 benchmark problem set. This benchmark validation testing shows that TUNA-RP can indeed perform inundation simulation with accuracy consistent with that in the tested benchmark problem set.
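For context, two-dimensional nonlinear shallow water models of this kind typically solve equations of the following textbook form (shown here in non-conservative form with Manning bottom friction; the exact formulation and discretization used in TUNA-RP may differ):

```latex
\begin{aligned}
\frac{\partial \eta}{\partial t}
  + \frac{\partial}{\partial x}\bigl[(h+\eta)\,u\bigr]
  + \frac{\partial}{\partial y}\bigl[(h+\eta)\,v\bigr] &= 0,\\
\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y}
  + g\frac{\partial \eta}{\partial x}
  + \frac{g n^{2} u\sqrt{u^{2}+v^{2}}}{(h+\eta)^{4/3}} &= 0,\\
\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y}
  + g\frac{\partial \eta}{\partial y}
  + \frac{g n^{2} v\sqrt{u^{2}+v^{2}}}{(h+\eta)^{4/3}} &= 0,
\end{aligned}
```

where η is the free-surface elevation, h the still-water depth, (u, v) the depth-averaged velocities, g the gravitational acceleration, and n Manning's roughness coefficient.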
Lung Reference Set A Application: LaszloTakacs - Biosystems (2010) — EDRN Public Portal
We would like to access the NCI lung cancer Combined Pre-Validation Reference Set A in order to further validate a lung cancer diagnostic test candidate. Our test is based on a panel of antibodies which have been tested on 4 different cohorts (see below, paragraph “Preliminary Data and Methods”). This Reference Set A, whose clinical setting is “Diagnosis of lung cancer”, will be used to validate the panel of monoclonal antibodies which extensive data analysis has shown to provide the best discrimination between controls and lung cancer patient plasma samples; sensitivity and specificity values from ROC analyses exceed 85%.
Validation of the German version of the Ford Insomnia Response to Stress Test.
Dieck, Arne; Helbig, Susanne; Drake, Christopher L; Backhaus, Jutta
2018-06-01
The purpose of this study was to assess the psychometric properties of a German version of the Ford Insomnia Response to Stress Test with groups with and without sleep problems. Three studies were analysed. Data set 1 was based on an initial screening for a sleep training program (n = 393), data set 2 was based on a study to test the test-retest reliability of the Ford Insomnia Response to Stress Test (n = 284) and data set 3 was based on a study to examine the influence of competitive sport on sleep (n = 37). Data sets 1 and 2 were used to test internal consistency, factor structure, convergent validity, discriminant validity and test-retest reliability of the Ford Insomnia Response to Stress Test. Content validity was tested using data set 3. Cronbach's alpha of the Ford Insomnia Response to Stress Test was good (α = 0.80) and test-retest reliability was satisfactory (r = 0.72). Overall, the one-factor model showed the best fit. Furthermore, significant positive correlations between the Ford Insomnia Response to Stress Test and impaired sleep quality, depression and stress reactivity were in line with the expectations regarding the convergent validity. Subjects with sleep problems had significantly higher scores in the Ford Insomnia Response to Stress Test than subjects without sleep problems (P < 0.01). Competitive athletes with higher scores in the Ford Insomnia Response to Stress Test had significantly lower sleep quality (P = 0.01), demonstrating that vulnerability for stress-induced sleep disturbances accompanies poorer sleep quality in stressful episodes. The findings show that the German version of the Ford Insomnia Response to Stress Test is a reliable and valid questionnaire to assess the vulnerability to stress-induced sleep disturbances. © 2017 European Sleep Research Society.
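Cronbach's alpha, reported above as 0.80, is computed from the item variances and the variance of the total score. The brief sketch below uses simulated item responses and is only meant to show the calculation, not the FIRST data.

```python
# Hedged sketch: Cronbach's alpha for a k-item questionnaire (rows = respondents).
import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the total score
    return (k / (k - 1.0)) * (1.0 - item_var / total_var)

rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 1))                  # common factor -> correlated items
items = latent + rng.normal(scale=1.0, size=(200, 9))
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")
```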
Evaluating the Content Validity of Multistage-Adaptive Tests
ERIC Educational Resources Information Center
Crotts, Katrina; Sireci, Stephen G.; Zenisky, April
2012-01-01
Validity evidence based on test content is important for educational tests to demonstrate the degree to which they fulfill their purposes. Most content validity studies involve subject matter experts (SMEs) who rate items that comprise a test form. In computerized-adaptive testing, examinees take different sets of items and test "forms"…
Endarti, Dwi; Riewpaiboon, Arthorn; Thavorncharoensap, Montarat; Praditsitthikorn, Naiyana; Hutubessy, Raymond; Kristina, Susi Ari
2018-05-01
To gain insight into the most suitable foreign value set among the Malaysian, Singaporean, Thai, and UK value sets for calculating the EuroQol five-dimensional questionnaire index score (utility) among patients with cervical cancer in Indonesia. Data from 87 patients with cervical cancer recruited from a referral hospital in Yogyakarta province, Indonesia, from an earlier study of health-related quality of life were used in this study. The differences among the utility scores derived from the four value sets were determined using the Friedman test. Performance of the psychometric properties of the four value sets versus the visual analogue scale (VAS) was assessed. Intraclass correlation coefficients and Bland-Altman plots were used to test the agreement among the utility scores. Spearman ρ correlation coefficients were used to assess convergent validity between utility scores and patients' sociodemographic and clinical characteristics. With respect to known-group validity, the Kruskal-Wallis test was used to examine the differences in utility according to the stages of cancer. There was a significant difference among utility scores derived from the four value sets, among which the Malaysian value set yielded higher utility than the other three value sets. Utility obtained from the Malaysian value set also showed better agreement with the VAS than the other value sets did (based on intraclass correlation coefficients and Bland-Altman plots). As for validity, the four value sets showed equivalent psychometric properties in the convergent and known-group validity tests. In the absence of an Indonesian value set, the Malaysian value set was preferable to the other value sets. Further studies on the development of an Indonesian value set need to be conducted. Copyright © 2018. Published by Elsevier Inc.
Giguère, Charles-Édouard; Potvin, Stéphane
2017-01-01
Substance use disorders (SUDs) are significant risk factors for psychiatric relapses and hospitalizations in psychiatric populations. Unfortunately, no instrument has been validated for the screening of SUDs in psychiatric emergency settings. The Drug Abuse Screening Test (DAST) is widely used in the addiction field, but it has not been validated in that particular context. The objective of the current study is to examine the psychometric properties of the DAST administered to psychiatric populations evaluated in an emergency setting. The DAST was administered to 912 psychiatric patients in an emergency setting, of which 119 had a SUD (excluding those misusing alcohol only). The internal consistency, the construct validity, the test-retest reliability and the predictive validity (using SUD diagnoses) of the DAST were examined. The convergent validity was also examined, using a validated impulsivity scale. Regarding the internal consistency of the DAST, the Cronbach's alpha was 0.88. The confirmatory factor analysis showed that the DAST has one underlying factor. The test-retest reliability analysis produced a correlation coefficient of 0.86. ROC curve analyses produced an area under the curve of 0.799. Interestingly, a sex effect was observed. Finally, the convergent validity analysis showed that the DAST total score is specifically correlated with the sensation seeking dimension of impulsivity. The results of this validation study show that the DAST preserves its excellent psychometric properties in psychiatric populations evaluated in an emergency setting. These results should encourage the use of the DAST in this unstable clinical situation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Validation of the SimSET simulation package for modeling the Siemens Biograph mCT PET scanner
NASA Astrophysics Data System (ADS)
Poon, Jonathan K.; Dahlbom, Magnus L.; Casey, Michael E.; Qi, Jinyi; Cherry, Simon R.; Badawi, Ramsey D.
2015-02-01
Monte Carlo simulation provides a valuable tool in performance assessment and optimization of system design parameters for PET scanners. SimSET is a popular Monte Carlo simulation toolkit that features fast simulation time, as well as variance reduction tools to further enhance computational efficiency. However, SimSET has lacked the ability to simulate block detectors until its most recent release. Our goal is to validate new features of SimSET by developing a simulation model of the Siemens Biograph mCT PET scanner and comparing the results to a simulation model developed in the GATE simulation suite and to experimental results. We used the NEMA NU-2 2007 scatter fraction, count rates, and spatial resolution protocols to validate the SimSET simulation model and its new features. The SimSET model overestimated the experimental results of the count rate tests by 11-23% and the spatial resolution test by 13-28%, which is comparable to previous validation studies of other PET scanners in the literature. The difference between the SimSET and GATE simulation was approximately 4-8% for the count rate test and approximately 3-11% for the spatial resolution test. In terms of computational time, SimSET performed simulations approximately 11 times faster than GATE simulations. The new block detector model in SimSET offers a fast and reasonably accurate simulation toolkit for PET imaging applications.
Rational selection of training and test sets for the development of validated QSAR models
NASA Astrophysics Data System (ADS)
Golbraikh, Alexander; Shen, Min; Xiao, Zhiyan; Xiao, Yun-De; Lee, Kuo-Hsiung; Tropsha, Alexander
2003-02-01
Quantitative Structure-Activity Relationship (QSAR) models are used increasingly to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using k nearest neighbors ( kNN) variable selection QSAR method for the analysis of several datasets, we have demonstrated recently that the widely accepted leave-one-out (LOO) cross-validated R2 (q2) is an inadequate characteristic to assess the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q2! J. Mol. Graphics Mod. 20, 269-276, (2002)]. Herein, we provide additional evidence that there exists no correlation between the values of q 2 for the training set and accuracy of prediction ( R 2) for the test set and argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to the division of experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for the evaluation of predictive power of QSAR models.
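The q² and external R² statistics contrasted above can be illustrated with a short sketch; the kNN variable-selection procedure itself is not reproduced here, only the comparison between leave-one-out cross-validated q² on a training set and predictive R² for an external test set, using placeholder data.

```python
# Hedged sketch: LOO cross-validated q2 on the training set vs. predictive R2
# on an external test set, for a generic regression model.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 6))                       # placeholder descriptors
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=120)
X_train, y_train = X[:90], y[:90]
X_test, y_test = X[90:], y[90:]

model = KNeighborsRegressor(n_neighbors=5)

# q2 = 1 - PRESS / SS, with PRESS from leave-one-out predictions on the training set.
y_loo = cross_val_predict(model, X_train, y_train, cv=LeaveOneOut())
q2 = 1.0 - np.sum((y_train - y_loo) ** 2) / np.sum((y_train - y_train.mean()) ** 2)

# External R2: squared correlation between observed and predicted test-set activities.
y_pred = model.fit(X_train, y_train).predict(X_test)
r2_ext = np.corrcoef(y_test, y_pred)[0, 1] ** 2
print(f"q2 (training, LOO): {q2:.2f}, R2 (external test set): {r2_ext:.2f}")
```

A high q² on the training set gives no guarantee of a high external R², which is the central point of the abstract.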
Mehta, Urvakhsh M; Thirthalli, Jagadisha; Naveen Kumar, C; Mahadevaiah, Mahesh; Rao, Kiran; Subbakrishna, Doddaballapura K; Gangadhar, Bangalore N; Keshavan, Matcheri S
2011-09-01
Social cognition is a cognitive domain that is under substantial cultural influence. There are no culturally appropriate standardized tools in India to comprehensively test social cognition. This study describes validation of tools for three social cognition constructs: theory of mind, social perception and attributional bias. Theory of mind tests included adaptations of (a) two first order tasks [Sally-Anne and Smarties task], (b) two second order tasks [Ice cream van and Missing cookies story], (c) two metaphor-irony tasks and (d) the faux pas recognition test. The Internal, Personal, and Situational Attributions Questionnaire (IPSAQ) and the Social Cue Recognition Test were adapted to assess attributional bias and social perception, respectively. These tests were first modified to suit the Indian cultural context without changing the constructs to be tested. A panel of experts then rated the tests on Likert scales as to (1) whether the modified tasks tested the same construct as in the original and (2) whether they were culturally appropriate. The modified tests were then administered to groups of actively symptomatic and remitted schizophrenia patients as well as healthy comparison subjects. All tests of the Social Cognition Rating Tools in Indian Setting had good content validity and known-groups validity. In addition, the Social Cue Recognition Test in the Indian setting had good internal consistency and concurrent validity. Copyright © 2011 Elsevier B.V. All rights reserved.
A Historical Overview on the Concept of Validity in Language Testing
ERIC Educational Resources Information Center
Hamavandy, Mehraban; Kiany, Gholam Reza
2014-01-01
This article provides an overview on language test validation theories, especially the Messickian view on construct validity and the way it's been translated into practice. First, a brief historical synopsis will be set forth, followed by recent views on test validity as advanced by Messick and Kane. The review goes on to lay out the similarities…
Quasi-QSAR for mutagenic potential of multi-walled carbon-nanotubes.
Toropov, Andrey A; Toropova, Alla P
2015-04-01
Available on the Internet, the CORAL software (http://www.insilico.eu/coral) has been used to build up quasi-quantitative structure-activity relationships (quasi-QSAR) for prediction of the mutagenic potential of multi-walled carbon-nanotubes (MWCNTs). In contrast with previous CORAL models, which were based on representation of the molecular structure by the simplified molecular input-line entry system (SMILES), these quasi-QSARs are based on the representation of experimental conditions (not the molecular structure), such as concentration, presence or absence of S9 mix, and use or omission of preincubation, encoded by so-called quasi-SMILES. The statistical characteristics of these models (quasi-QSARs) for three random splits into the visible training set and test set and an invisible validation set are the following: (i) split 1: n=13, r(2)=0.8037, q(2)=0.7260, s=0.033, F=45 (training set); n=5, r(2)=0.9102, s=0.071 (test set); n=6, r(2)=0.7627, s=0.044 (validation set); (ii) split 2: n=13, r(2)=0.6446, q(2)=0.4733, s=0.045, F=20 (training set); n=5, r(2)=0.6785, s=0.054 (test set); n=6, r(2)=0.9593, s=0.032 (validation set); and (iii) split 3: n=14, r(2)=0.8087, q(2)=0.6975, s=0.026, F=51 (training set); n=5, r(2)=0.9453, s=0.074 (test set); n=5, r(2)=0.8951, s=0.052 (validation set). Copyright © 2014 Elsevier Ltd. All rights reserved.
The Utrecht questionnaire (U-CEP) measuring knowledge on clinical epidemiology proved to be valid.
Kortekaas, Marlous F; Bartelink, Marie-Louise E L; de Groot, Esther; Korving, Helen; de Wit, Niek J; Grobbee, Diederick E; Hoes, Arno W
2017-02-01
Knowledge on clinical epidemiology is crucial to practice evidence-based medicine. We describe the development and validation of the Utrecht questionnaire on knowledge on Clinical epidemiology for Evidence-based Practice (U-CEP); an assessment tool to be used in the training of clinicians. The U-CEP was developed in two formats: two sets of 25 questions and a combined set of 50. The validation was performed among postgraduate general practice (GP) trainees, hospital trainees, GP supervisors, and experts. Internal consistency, internal reliability (item-total correlation), item discrimination index, item difficulty, content validity, construct validity, responsiveness, test-retest reliability, and feasibility were assessed. The questionnaire was externally validated. Internal consistency was good with a Cronbach alpha of 0.8. The median item-total correlation and mean item discrimination index were satisfactory. Both sets were perceived as relevant to clinical practice. Construct validity was good. Both sets were responsive but failed on test-retest reliability. One set took 24 minutes and the other 33 minutes to complete, on average. External GP trainees had comparable results. The U-CEP is a valid questionnaire to assess knowledge on clinical epidemiology, which is a prerequisite for practicing evidence-based medicine in daily clinical practice. Copyright © 2016 Elsevier Inc. All rights reserved.
40 CFR 1065.550 - Gas analyzer range validation, drift validation, and drift correction.
Code of Federal Regulations, 2011 CFR
2011-07-01
... given test interval (i.e., do not set them to zero). A third calculation of composite brake-specific...) values from each test interval and sets any negative mass (or mass rate) values to zero before... less than the standard by at least two times the absolute difference between the uncorrected and...
Lauer, Michael S; Pothier, Claire E; Magid, David J; Smith, S Scott; Kattan, Michael W
2007-12-18
The exercise treadmill test is recommended for risk stratification among patients with intermediate to high pretest probability of coronary artery disease. Posttest risk stratification is based on the Duke treadmill score, which includes only functional capacity and measures of ischemia. To develop and externally validate a post-treadmill test, multivariable mortality prediction rule for adults with suspected coronary artery disease and normal electrocardiograms. Prospective cohort study conducted from September 1990 to May 2004. Exercise treadmill laboratories in a major medical center (derivation set) and a separate HMO (validation set). 33,268 patients in the derivation set and 5821 in the validation set. All patients had normal electrocardiograms and were referred for evaluation of suspected coronary artery disease. The derivation set patients were followed for a median of 6.2 years. A nomogram-illustrated model was derived on the basis of variables easily obtained in the stress laboratory, including age; sex; history of smoking, hypertension, diabetes, or typical angina; and exercise findings of functional capacity, ST-segment changes, symptoms, heart rate recovery, and frequent ventricular ectopy in recovery. The derivation data set included 1619 deaths. Although both the Duke treadmill score and our nomogram-illustrated model were significantly associated with death (P < 0.001), the nomogram was better at discrimination (concordance index for right-censored data, 0.83 vs. 0.73) and calibration. We reclassified many patients with intermediate- to high-risk Duke treadmill scores as low risk on the basis of the nomogram. The model also predicted 3-year mortality rates well in the validation set: Based on an optimal cut-point for a negative predictive value of 0.97, derivation and validation rates were, respectively, 1.7% and 2.5% below the cut-point and 25% and 29% above the cut-point. Blood test-based measures or left ventricular ejection fraction were not included. The nomogram can be applied only to patients with a normal electrocardiogram. Clinical utility remains to be tested. A simple nomogram based on easily obtained pretest and exercise test variables predicted all-cause mortality in adults with suspected coronary artery disease and normal electrocardiograms.
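The concordance index for right-censored data cited above (0.83 vs. 0.73) is the fraction of usable patient pairs in which the patient who died earlier was assigned the higher predicted risk. A small, unoptimized numpy sketch with invented data is shown below for illustration only.

```python
# Hedged sketch: Harrell's concordance index for right-censored survival data.
import numpy as np

def concordance_index(time, event, risk):
    """time: follow-up time; event: 1 = death observed, 0 = censored;
    risk: model-predicted risk score (higher = worse prognosis)."""
    concordant, permissible = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # a pair is usable if subject i had an observed event before time[j]
            if event[i] == 1 and time[i] < time[j]:
                permissible += 1.0
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / permissible

time = np.array([5.0, 6.0, 2.0, 4.0, 3.0])
event = np.array([1, 0, 1, 1, 0])
risk = np.array([0.7, 0.2, 0.9, 0.6, 0.4])
print(f"c-index: {concordance_index(time, event, risk):.2f}")
```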
A new test set for validating predictions of protein-ligand interaction.
Nissink, J Willem M; Murray, Chris; Hartshorn, Mike; Verdonk, Marcel L; Cole, Jason C; Taylor, Robin
2002-12-01
We present a large test set of protein-ligand complexes for the purpose of validating algorithms that rely on the prediction of protein-ligand interactions. The set consists of 305 complexes with protonation states assigned by manual inspection. The following checks have been carried out to identify unsuitable entries in this set: (1) assessing the involvement of crystallographically related protein units in ligand binding; (2) identification of bad clashes between protein side chains and ligand; and (3) assessment of structural errors, and/or inconsistency of ligand placement with crystal structure electron density. In addition, the set has been pruned to assure diversity in terms of protein-ligand structures, and subsets are supplied for different protein-structure resolution ranges. A classification of the set by protein type is available. As an illustration, validation results are shown for GOLD and SuperStar. GOLD is a program that performs flexible protein-ligand docking, and SuperStar is used for the prediction of favorable interaction sites in proteins. The new CCDC/Astex test set is freely available to the scientific community (http://www.ccdc.cam.ac.uk). Copyright 2002 Wiley-Liss, Inc.
Olar, Adriana; Wani, Khalida; Mansouri, Alireza; Zadeh, Gelareh; Wilson, Charmaine; DeMonte, Franco; Fuller, Gregory; Jones, David; Pfister, Stefan; von Deimling, Andreas; Sulman, Erik; Aldape, Kenneth
2014-01-01
BACKGROUND: Methylation profiling of solid tumors has revealed biologic subtypes, often with clinical implications. Methylation profiles of meningioma and their clinical implications are not well understood. METHODS: Ninety-two meningioma samples (n = 44 test set and n = 48 validation set) were profiled using the Illumina HumanMethylation450 BeadChip. Unsupervised clustering and analyses for recurrence-free survival (RFS) were performed. RESULTS: Unsupervised clustering of the test set using approximately 900 highly variable markers identified two clearly defined methylation subgroups. One of the groups (n = 19) showed global hypermethylation of a set of markers, analogous to CpG island methylator phenotype (CIMP). These findings were reproducible in the validation set, with 18/48 samples showing the CIMP-positive phenotype. Importantly, of 347 highly variable markers common to both the test and validation set analyses, 107 defined CIMP in the test set and 94 defined CIMP in the validation set, with an overlap of 83 markers between the two datasets. This number is much greater than expected by chance, indicating reproducibility of the hypermethylated markers that define CIMP in meningioma. With respect to clinical correlation, the 37 CIMP-positive cases displayed significantly shorter RFS compared to the 55 non-CIMP cases (hazard ratio 2.9, p = 0.013). In an effort to develop a preliminary outcome predictor, a 155-marker subset correlated with RFS was identified in the test dataset. When interrogated in the validation dataset, this 155-marker subset showed a statistical trend (p < 0.1) towards distinguishing survival groups. CONCLUSIONS: This study defines the existence of a CIMP phenotype in meningioma, which involves a substantial proportion (37/92, 40%) of samples and has clinical implications. Ongoing work will expand this cohort and examine identification of additional biologic differences (mutational and DNA copy number analysis) to further characterize the aberrant methylation subtype in meningioma. CIMP-positivity with aberrant methylation in recurrent/malignant meningioma suggests a potential therapeutic target for clinically aggressive cases.
34 CFR 462.11 - What must an application contain?
Code of Federal Regulations, 2010 CFR
2010-07-01
... the methodology and procedures used to measure the reliability of the test. (h) Construct validity... previous test, and results from validity, reliability, and equating or standard-setting studies undertaken... NRS educational functioning levels (content validity). Documentation of the extent to which the items...
Piette, Elizabeth R; Moore, Jason H
2018-01-01
Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning analyses of GWAS can be hampered by biological and statistical factors, particularly so for the investigation of non-additive genetic interactions. Application of traditional cross validation to a GWAS data set may result in poor consistency between the training and testing data set splits due to an imbalance of the interaction genotypes relative to the data as a whole. We propose a new cross validation method, proportional instance cross validation (PICV), that preserves the original distribution of an independent variable when splitting the data set into training and testing partitions. We apply PICV to simulated GWAS data with epistatic interactions of varying minor allele frequencies and prevalences and compare performance to that of a traditional cross validation procedure in which individuals are randomly allocated to training and testing partitions. Sensitivity and positive predictive value are significantly improved across all tested scenarios for PICV compared to traditional cross validation. We also apply PICV to GWAS data from a study of primary open-angle glaucoma to investigate a previously-reported interaction, which fails to significantly replicate; PICV however improves the consistency of testing and training results. Application of traditional machine learning procedures to biomedical data may require modifications to better suit intrinsic characteristics of the data, such as the potential for highly imbalanced genotype distributions in the case of epistasis detection. The reproducibility of genetic interaction findings can be improved by considering this variable imbalance in cross validation implementation, such as with PICV. This approach may be extended to problems in other domains in which imbalanced variable distributions are a concern.
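The core idea of PICV, keeping the distribution of a key variable (here, the two-locus genotype combination) the same in the training and testing partitions, is closely related to stratified cross-validation. The hedged sketch below uses scikit-learn's StratifiedKFold stratified on a genotype variable as a stand-in for the authors' implementation; all data are simulated placeholders.

```python
# Hedged sketch: stratify cross-validation folds on a (possibly rare) genotype
# combination so that its proportion is preserved in every train/test split.
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(5)
n = 1000
snp1 = rng.integers(0, 3, size=n)                   # genotypes coded 0/1/2
snp2 = rng.integers(0, 3, size=n)
genotype_combo = snp1 * 3 + snp2                    # 9 two-locus genotype classes
phenotype = rng.binomial(1, 0.3, size=n)            # outcome (unused by the split)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(np.c_[snp1, snp2], genotype_combo)):
    rare = (genotype_combo == 8)                    # e.g. the rarest combination
    print(f"fold {fold}: rare-genotype fraction "
          f"train={rare[train_idx].mean():.3f}, test={rare[test_idx].mean():.3f}")
```

Note that the folds are stratified on the independent variable (the genotype combination) rather than on the outcome, which mirrors the imbalance PICV is designed to control.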
ERIC Educational Resources Information Center
Wong, Kung-Teck; Osman, Rosma bt; Goh, Pauline Swee Choo; Rahmat, Mohd Khairezan
2013-01-01
This study sets out to validate and test the Technology Acceptance Model (TAM) in the context of Malaysian student teachers' integration of their technology in teaching and learning. To establish factorial validity, data collected from 302 respondents were tested against the TAM using confirmatory factor analysis (CFA), and structural equation…
Petersen, John R; Stevenson, Heather L; Kasturi, Krishna S; Naniwadekar, Ashutosh; Parkes, Julie; Cross, Richard; Rosenberg, William M; Xiao, Shu-Yuan; Snyder, Ned
2014-04-01
The assessment of liver fibrosis in chronic hepatitis C patients is important for prognosis and making decisions regarding antiviral treatment. Although liver biopsy is considered the reference standard for assessing hepatic fibrosis in patients with chronic hepatitis C, it is invasive and associated with sampling and interobserver variability. Serum fibrosis markers have been utilized as surrogates for a liver biopsy. We completed a prospective study of 191 patients in which blood draws and liver biopsies were performed on the same visit. Using liver biopsies, the sensitivity, specificity, and negative and positive predictive values for both the aspartate aminotransferase/platelet ratio index (APRI) and enhanced liver fibrosis (ELF) tests were determined. The patients were divided into training and validation patient sets to develop and validate a clinically useful algorithm for differentiating mild and significant fibrosis. The area under the ROC curve for the APRI and ELF tests for the training set was 0.865 and 0.880, respectively. The clinical sensitivity in separating mild (F0-F1) from significant fibrosis (F2-F4) was 80% and 86.0% with a clinical specificity of 86.7% and 77.8%, respectively. For the validation sets the area under the ROC curve for the APRI and ELF tests was 0.855 and 0.780, respectively. The clinical sensitivity of the APRI and ELF tests in separating mild (F0-F1) from significant (F2-F4) fibrosis for the validation set was 90.0% and 70.0% with a clinical specificity of 73.3% and 86.7%, respectively. There were no differences between the APRI and ELF tests in distinguishing mild from significant fibrosis for either the training or validation sets (P=0.61 and 0.20, respectively). Using APRI as the primary test, followed by ELF for patients in the intermediate zone, would have decreased the number of liver biopsies needed by 40% for the validation set. Overall, use of our algorithm would have decreased the number of patients who needed a liver biopsy from 95 to 24, a 74.7% reduction. This study has shown that the APRI and ELF tests are equally accurate in distinguishing mild from significant liver fibrosis, and combining them into a validated algorithm improves their performance in distinguishing mild from significant fibrosis.
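The sequential use of APRI followed by ELF described above can be expressed as a simple triage rule. In the sketch below the numeric cut-offs are placeholders, not the thresholds validated in the study, so treat it only as an illustration of the algorithm's structure.

```python
# Hedged sketch of a sequential APRI -> ELF triage rule. The numeric cut-offs
# are placeholders, NOT the thresholds validated in the study.
APRI_LOW, APRI_HIGH = 0.5, 1.5        # assumed indeterminate zone for APRI
ELF_CUTOFF = 9.8                      # assumed single ELF cut-off

def fibrosis_triage(apri, elf=None):
    """Return 'mild', 'significant', or 'biopsy' (indeterminate after both tests)."""
    if apri < APRI_LOW:
        return "mild"                 # F0-F1 likely, no biopsy needed
    if apri > APRI_HIGH:
        return "significant"          # F2-F4 likely, no biopsy needed
    # APRI indeterminate: fall back to the ELF test
    if elf is None:
        return "biopsy"
    return "significant" if elf >= ELF_CUTOFF else "mild"

print(fibrosis_triage(0.3))           # -> mild
print(fibrosis_triage(1.0, elf=10.4)) # -> significant
```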
DOE Office of Scientific and Technical Information (OSTI.GOV)
Im, Piljae; Bhandari, Mahabir S.; New, Joshua Ryan
This document describes the Oak Ridge National Laboratory (ORNL) multiyear experimental plan for validation and uncertainty characterization of whole-building energy simulation for a multi-zone research facility using a traditional rooftop unit (RTU) as a baseline heating, ventilating, and air conditioning (HVAC) system. The project’s overarching objective is to increase the accuracy of energy simulation tools by enabling empirical validation of key inputs and algorithms. Doing so is required to inform the design of increasingly integrated building systems and to enable accountability for performance gaps between design and operation of a building. The project will produce documented data sets that can be used to validate key functionality in different energy simulation tools and to identify errors and inadequate assumptions in simulation engines so that developers can correct them. ASHRAE Standard 140, Method of Test for the Evaluation of Building Energy Analysis Computer Programs (ASHRAE 2004), currently consists primarily of tests to compare different simulation programs with one another. This project will generate sets of measured data to enable empirical validation, incorporate these test data sets in an extended version of Standard 140, and apply these tests to the Department of Energy’s (DOE) EnergyPlus software (EnergyPlus 2016) to initiate the correction of any significant deficiencies. The fitness-for-purpose of the key algorithms in EnergyPlus will be established and demonstrated, and vendors of other simulation programs will be able to demonstrate the validity of their products. The data set will be equally applicable to validation of other simulation engines as well.
Validation of sterilizing grade filtration.
Jornitz, M W; Meltzer, T H
2003-01-01
Validation considerations for sterilizing grade filters, namely 0.2 micron, changed when the FDA voiced concerns about the validity of bacterial challenge tests performed in the past. Such validation exercises are nowadays considered to be filter qualification. Filter validation requires more thorough analysis, especially bacterial challenge testing with the actual drug product under process conditions. To do so, viability testing is a necessity to determine the bacterial challenge test methodology. In addition to these two compulsory tests, other evaluations such as extractables, adsorption and chemical compatibility tests should be considered. PDA Technical Report #26, Sterilizing Filtration of Liquids, describes all parameters and aspects required for the comprehensive validation of filters. The report is a most helpful tool for validation of liquid filters used in the biopharmaceutical industry. It sets the cornerstones of validation requirements and other filtration considerations.
ERIC Educational Resources Information Center
George-Ezzelle, Carol E.; Skaggs, Gary
2004-01-01
Current testing standards call for test developers to provide evidence that testing procedures and test scores, and the inferences made based on the test scores, show evidence of validity and are comparable across subpopulations (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on…
The Air Force Officer Qualifying Test: Validity, Fairness, and Bias
2010-01-01
scores. The Standards for Educational and Psychological Testing (AERA, APA, and NCME, 1999) provides a set of guidelines published and endorsed by the... Determining the validity and bias of selection tests falls upon professionals in the discipline of industrial/organizational psychology and closely related fields (e.g., educational psychology and...
Dahlke, Jeffrey A; Kostal, Jack W; Sackett, Paul R; Kuncel, Nathan R
2018-05-03
We explore potential explanations for validity degradation using a unique predictive validation data set containing up to four consecutive years of high school students' cognitive test scores and four complete years of those students' college grades. This data set permits analyses that disentangle the effects of predictor-score age and timing of criterion measurements on validity degradation. We investigate the extent to which validity degradation is explained by criterion dynamism versus the limited shelf-life of ability scores. We also explore whether validity degradation is attributable to fluctuations in criterion variability over time and/or GPA contamination from individual differences in course-taking patterns. Analyses of multiyear predictor data suggest that changes to the determinants of performance over time have much stronger effects on validity degradation than does the shelf-life of cognitive test scores. The age of predictor scores had only a modest relationship with criterion-related validity when the criterion measurement occasion was held constant. Practical implications and recommendations for future research are discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Ultrasonic inspection of a glued laminated timber fabricated with defects
Robert Emerson; David Pollock; David McLean; Kenneth Fridley; Robert Ross; Roy Pellerin
2001-01-01
The Federal Highway Administration (FHWA) set up a validation test to compare the effectiveness of various nondestructive inspection techniques for detecting artificial defects in glulam members. The validation test consisted of a glulam beam fabricated with artificial defects known to FHWA personnel but not originally known to the scientists performing the validation...
Serim-Yildiz, Begüm; Erdur-Baker, Ozgür
2013-01-01
The authors examined the cultural validity of the Fear Survey Schedule for Children (FSSC-AM) developed by J. J. Burnham (2005) with Turkish children. The relationships between demographic variables and the level of fear were also tested. Three independent data sets were used. The first data set comprised 676 participants (321 women and 355 men) and was used for examining the factor structure and internal reliability of the FSSC. The second data set comprised 639 participants (321 women and 318 men) and was used for testing internal reliability and to confirm the factor structure of the FSSC. The third data set comprised 355 participants (173 women and 182 men) and was used for analyses of test-retest reliability, inter-item reliability, and convergent validity for the scores of the FSSC. The sum of the first and second samples (1,315 participants; 642 women and 673 men) was used for testing the relationships between demographic variables and the level of fear. Results indicated that the FSSC is a valid and reliable instrument to examine Turkish children's and adolescents' fears between the ages of 8 and 18 years. Younger participants, females, and children of low-income parents reported higher levels of fear. The findings are discussed in light of the existing literature.
Routine development of objectively derived search strategies.
Hausner, Elke; Waffenschmidt, Siw; Kaiser, Thomas; Simon, Michael
2012-02-29
Over the past few years, information retrieval has become more and more professionalized, and information specialists are considered full members of a research team conducting systematic reviews. Research groups preparing systematic reviews and clinical practice guidelines have been the driving force in the development of search strategies, but open questions remain regarding the transparency of the development process and the available resources. An empirically guided approach to the development of a search strategy provides a way to increase transparency and efficiency. Our aim in this paper is to describe the empirically guided development process for search strategies as applied by the German Institute for Quality and Efficiency in Health Care (Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen, or "IQWiG"). This strategy consists of the following steps: generation of a test set, as well as the development, validation and standardized documentation of the search strategy. We illustrate our approach by means of an example, that is, a search for literature on brachytherapy in patients with prostate cancer. For this purpose, a test set was generated, including a total of 38 references from 3 systematic reviews. The development set for the generation of the strategy included 25 references. After application of textual analytic procedures, a strategy was developed that included all references in the development set. To test the search strategy on an independent set of references, the remaining 13 references in the test set (the validation set) were used. The validation set was also completely identified. Our conclusion is that an objectively derived approach similar to that used in search filter development is a feasible way to develop and validate reliable search strategies. Besides creating high-quality strategies, the widespread application of this approach will result in a substantial increase in the transparency of the development process of search strategies.
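A hedged sketch of the validation step described above, checking that a candidate search strategy retrieves every reference in the development set and then in the held-out validation set, might look like the following; the record identifiers and retrieved sets are purely illustrative.

```python
# Hedged sketch: recall of a Boolean search strategy against the development
# and validation reference sets (all identifiers are illustrative placeholders).
def recall(strategy_hits, reference_set):
    """Fraction of known relevant references retrieved by the strategy."""
    return len(strategy_hits & reference_set) / len(reference_set)

# PMIDs retrieved by the candidate strategy (e.g. from a database export).
strategy_hits = {"101", "102", "103", "105", "108", "110", "111"}

development_set = {"101", "102", "103", "105", "108"}   # used to build the strategy
validation_set = {"110", "111", "112"}                  # held back for testing

print(f"development recall: {recall(strategy_hits, development_set):.2f}")  # expect 1.00
print(f"validation recall:  {recall(strategy_hits, validation_set):.2f}")
```

In the abstract's example, both the development set and the validation set were completely identified, i.e. recall of 1.00 in both.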
2014-01-01
Background Patient-reported outcome validation needs to achieve validity and reliability standards. Among reliability analysis parameters, test-retest reliability is an important psychometric property. Retested patients must be in a clinically stable condition. This is particularly problematic in palliative care (PC) settings because advanced cancer patients are prone to a faster rate of clinical deterioration. The aim of this study was to evaluate the methods by which multi-symptom and health-related quality of life (HRQoL) patient-reported outcomes (PROs) have been validated in oncological PC settings with regard to test-retest reliability. Methods A systematic search of PubMed (1966 to June 2013), EMBASE (1980 to June 2013), PsychInfo (1806 to June 2013), CINAHL (1980 to June 2013), and SCIELO (1998 to June 2013), and specific PRO databases was performed. Studies were included if they described a set of validation studies for an instrument developed to measure multi-symptom or multidimensional HRQoL in advanced cancer patients under PC. The COSMIN checklist was used to rate the methodological quality of the study designs. Results We identified 89 validation studies from 746 potentially relevant articles. From those 89 articles, 31 measured test-retest reliability and were included in this review. Upon critical analysis of the overall quality of the criteria used to determine the test-retest reliability, 6 (19.4%), 17 (54.8%), and 8 (25.8%) of these articles were rated as good, fair, or poor, respectively, and no article was classified as excellent. Multi-symptom instruments were retested over a shorter interval than the HRQoL instruments (median values 24 hours and 168 hours, respectively; p = 0.001). Validation studies that included objective confirmation of clinical stability in their design yielded better results for the test-retest analysis with regard to both pain and global HRQoL scores (p < 0.05). The quality of the statistical analysis and its description were of great concern. Conclusion Test-retest reliability has been infrequently and poorly evaluated. The confirmation of clinical stability was an important factor in our analysis, and we suggest that special attention be focused on clinical stability when designing a PRO validation study that includes advanced cancer patients under PC. PMID:24447633
Calès, P; Boursier, J; Lebigot, J; de Ledinghen, V; Aubé, C; Hubert, I; Oberti, F
2017-04-01
In chronic hepatitis C, the European Association for the Study of the Liver and the Asociacion Latinoamericana para el Estudio del Higado recommend performing transient elastography plus a blood test to diagnose significant fibrosis; test concordance confirms the diagnosis. To validate this rule and improve it by combining a blood test, FibroMeter (virus second generation, Echosens, Paris, France) and transient elastography (constitutive tests) into a single combined test, as suggested by the American Association for the Study of Liver Diseases and the Infectious Diseases Society of America. A total of 1199 patients were included in an exploratory set (HCV, n = 679) or in two validation sets (HCV ± HIV, HBV, n = 520). Accuracy was mainly evaluated by correct diagnosis rate for severe fibrosis (pathological Metavir F ≥ 3, primary outcome) by classical test scores or a fibrosis classification, reflecting Metavir staging, as a function of test concordance. Score accuracy: there were no significant differences between the blood test (75.7%), elastography (79.1%) and the combined test (79.4%) (P = 0.066); the score accuracy of each test was significantly (P < 0.001) decreased in discordant vs. concordant tests. Classification accuracy: combined test accuracy (91.7%) was significantly (P < 0.001) increased vs. the blood test (84.1%) and elastography (88.2%); accuracy of each constitutive test was significantly (P < 0.001) decreased in discordant vs. concordant tests but not with combined test: 89.0 vs. 92.7% (P = 0.118). Multivariate analysis for accuracy showed an interaction between concordance and fibrosis level: in the 1% of patients with full classification discordance and severe fibrosis, non-invasive tests were unreliable. The advantage of combined test classification was confirmed in the validation sets. The concordance recommendation is validated. A combined test, expressed in classification instead of score, improves this rule and validates the recommendation of a combined test, avoiding 99% of biopsies, and offering precise staging. © 2017 John Wiley & Sons Ltd.
Poljak, Mario; Oštrbenk, Anja
2013-01-01
Human papillomavirus (HPV) testing has become an essential part of current clinical practice in the management of cervical cancer and precancerous lesions. We reviewed the most important validation studies of a next-generation real-time polymerase chain reaction-based assay, the RealTime High Risk HPV test (RealTime)(Abbott Molecular, Des Plaines, IL, USA), for triage in referral population settings and for use in primary cervical cancer screening in women 30 years and older published in peer-reviewed journals from 2009 to 2013. RealTime is designed to detect 14 high-risk HPV genotypes with concurrent distinction of HPV-16 and HPV-18 from 12 other HPV genotypes. The test was launched on the European market in January 2009 and is currently used in many laboratories worldwide for routine detection of HPV. We concisely reviewed validation studies of a next-generation real-time polymerase chain reaction (PCR)-based assay: the Abbott RealTime High Risk HPV test. Eight validation studies of RealTime in referral settings showed its consistently high absolute clinical sensitivity for both CIN2+ (range 88.3-100%) and CIN3+ (range 93.0-100%), as well as comparative clinical sensitivity relative to the currently most widely used HPV test: the Qiagen/Digene Hybrid Capture 2 HPV DNA Test (HC2). Due to the significantly different composition of the referral populations, RealTime absolute clinical specificity for CIN2+ and CIN3+ varied greatly across studies, but was comparable relative to HC2. Four validation studies of RealTime performance in cervical cancer screening settings showed its consistently high absolute clinical sensitivity for both CIN2+ and CIN3+, as well as comparative clinical sensitivity and specificity relative to HC2 and GP5+/6+ PCR. RealTime has been extensively evaluated in the last 4 years. RealTime can be considered clinically validated for triage in referral population settings and for use in primary cervical cancer screening in women 30 years and older.
The NeoTech Aqua Solutions, Inc. D438™ UV Water Treatment System was tested to validate the UV dose delivered by the system using biodosimetry and a set line approach. The set line for 40 mJ/cm2 measured Reduction Equivalent Dose (RED) was based on validation testing at three (3)...
Slavov, Svetoslav H; Stoyanova-Slavova, Iva; Mattes, William; Beger, Richard D; Brüschweiler, Beat J
2018-07-01
A grid-based, alignment-independent 3D-SDAR (three-dimensional spectral data-activity relationship) approach based on simulated 13C and 15N NMR chemical shifts augmented with through-space interatomic distances was used to model the mutagenicity of 554 primary and 419 secondary aromatic amines. A robust modeling strategy supported by extensive validation including randomized training/hold-out test set pairs, validation sets, "blind" external test sets as well as experimental validation was applied to avoid over-parameterization and build Organization for Economic Cooperation and Development (OECD 2004) compliant models. Based on an experimental validation set of 23 chemicals tested in a two-strain Salmonella typhimurium Ames assay, 3D-SDAR was able to achieve performance comparable to 5-strain (Ames) predictions by Lhasa Limited's Derek and Sarah Nexus for the same set. Furthermore, mapping of the most frequently occurring bins on the primary and secondary aromatic amine structures allowed the identification of molecular features that were associated either positively or negatively with mutagenicity. Prominent structural features found to enhance the mutagenic potential included: nitrobenzene moieties, conjugated π-systems, nitrothiophene groups, and aromatic hydroxylamine moieties. 3D-SDAR was also able to capture "true" negative contributions that are particularly difficult to detect through alternative methods. These include sulphonamide, acetamide, and other functional groups, which not only lack contributions to the overall mutagenic potential, but are known to actively lower it, if present in the chemical structures of what otherwise would be potential mutagens.
Aldekhayel, Salah A; Alselaim, Nahar A; Magzoub, Mohi Eldin; Al-Qattan, Mohammad M; Al-Namlah, Abdullah M; Tamim, Hani; Al-Khayal, Abdullah; Al-Habdan, Sultan I; Zamakhshary, Mohammed F
2012-10-24
The Script Concordance Test (SCT) is a new assessment tool that reliably assesses clinical reasoning skills. Previous descriptions of developing SCT question banks were merely subjective. This study addresses two gaps in the literature: 1) conducting the first phase of a multistep validation process of SCT in Plastic Surgery, and 2) providing an objective methodology to construct a question bank based on SCT. After developing a test blueprint, 52 test items were written. Five validation questions were developed and a validation survey was established online. Seven reviewers were asked to answer this survey. They were recruited from two countries, Saudi Arabia and Canada, to improve the test's external validity. Their ratings were transformed into percentages. Analysis was performed to compare reviewers' ratings by looking at correlations, ranges, means, medians, and overall scores. Scores of reviewers' ratings were between 76% and 95% (mean 86% ± 5%). We found poor correlations between reviewers (Pearson's: +0.38 to -0.22). Ratings of individual validation questions ranged between 0 and 4 (on a 1-5 scale). Means and medians of these ranges were computed for each test item (mean: 0.8 to 2.4; median: 1 to 3). A subset of 27 test items was generated based on a set of inclusion and exclusion criteria. This study proposes an objective methodology for validating an SCT question bank. The validation survey is analyzed from all angles, i.e., reviewers, validation questions, and test items. Finally, a subset of test items is generated based on a set of criteria.
Halpern, Yoni; Jernite, Yacine; Shapiro, Nathan I.; Nathanson, Larry A.
2017-01-01
Objective To demonstrate the incremental benefit of using free text data in addition to vital sign and demographic data to identify patients with suspected infection in the emergency department. Methods This was a retrospective, observational cohort study performed at a tertiary academic teaching hospital. All consecutive ED patient visits between 12/17/08 and 2/17/13 were included. No patients were excluded. The primary outcome measure was infection diagnosed in the emergency department defined as a patient having an infection related ED ICD-9-CM discharge diagnosis. Patients were randomly allocated to train (64%), validate (20%), and test (16%) data sets. After preprocessing the free text using bigram and negation detection, we built four models to predict infection, incrementally adding vital signs, chief complaint, and free text nursing assessment. We used two different methods to represent free text: a bag of words model and a topic model. We then used a support vector machine to build the prediction model. We calculated the area under the receiver operating characteristic curve to compare the discriminatory power of each model. Results A total of 230,936 patient visits were included in the study. Approximately 14% of patients had the primary outcome of diagnosed infection. The area under the ROC curve (AUC) for the vitals model, which used only vital signs and demographic data, was 0.67 for the training data set, 0.67 for the validation data set, and 0.67 (95% CI 0.65–0.69) for the test data set. The AUC for the chief complaint model which also included demographic and vital sign data was 0.84 for the training data set, 0.83 for the validation data set, and 0.83 (95% CI 0.81–0.84) for the test data set. The best performing methods made use of all of the free text. In particular, the AUC for the bag-of-words model was 0.89 for training data set, 0.86 for the validation data set, and 0.86 (95% CI 0.85–0.87) for the test data set. The AUC for the topic model was 0.86 for the training data set, 0.86 for the validation data set, and 0.85 (95% CI 0.84–0.86) for the test data set. Conclusion Compared to previous work that only used structured data such as vital signs and demographic information, utilizing free text drastically improves the discriminatory ability (increase in AUC from 0.67 to 0.86) of identifying infection. PMID:28384212
Horng, Steven; Sontag, David A; Halpern, Yoni; Jernite, Yacine; Shapiro, Nathan I; Nathanson, Larry A
2017-01-01
To demonstrate the incremental benefit of using free text data in addition to vital sign and demographic data to identify patients with suspected infection in the emergency department. This was a retrospective, observational cohort study performed at a tertiary academic teaching hospital. All consecutive ED patient visits between 12/17/08 and 2/17/13 were included. No patients were excluded. The primary outcome measure was infection diagnosed in the emergency department defined as a patient having an infection related ED ICD-9-CM discharge diagnosis. Patients were randomly allocated to train (64%), validate (20%), and test (16%) data sets. After preprocessing the free text using bigram and negation detection, we built four models to predict infection, incrementally adding vital signs, chief complaint, and free text nursing assessment. We used two different methods to represent free text: a bag of words model and a topic model. We then used a support vector machine to build the prediction model. We calculated the area under the receiver operating characteristic curve to compare the discriminatory power of each model. A total of 230,936 patient visits were included in the study. Approximately 14% of patients had the primary outcome of diagnosed infection. The area under the ROC curve (AUC) for the vitals model, which used only vital signs and demographic data, was 0.67 for the training data set, 0.67 for the validation data set, and 0.67 (95% CI 0.65-0.69) for the test data set. The AUC for the chief complaint model which also included demographic and vital sign data was 0.84 for the training data set, 0.83 for the validation data set, and 0.83 (95% CI 0.81-0.84) for the test data set. The best performing methods made use of all of the free text. In particular, the AUC for the bag-of-words model was 0.89 for training data set, 0.86 for the validation data set, and 0.86 (95% CI 0.85-0.87) for the test data set. The AUC for the topic model was 0.86 for the training data set, 0.86 for the validation data set, and 0.85 (95% CI 0.84-0.86) for the test data set. Compared to previous work that only used structured data such as vital signs and demographic information, utilizing free text drastically improves the discriminatory ability (increase in AUC from 0.67 to 0.86) of identifying infection.
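As a rough illustration of the pipeline described in the two abstracts above, the sketch below combines structured vitals with a bag-of-words representation of free-text notes, trains a linear SVM, and scores discrimination by AUC on held-out splits. This is not the authors' code: scikit-learn is assumed, and all variable names (vitals, notes, y) and settings are hypothetical.

```python
# Hedged sketch: vitals is a 2-D numpy array, notes is a numpy array of strings,
# y holds 0/1 infection labels. The 64/20/16 split mirrors the abstract.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def fit_and_evaluate(vitals, notes, y, seed=0):
    idx = np.arange(len(y))
    train_idx, rest_idx = train_test_split(idx, test_size=0.36, random_state=seed)
    val_idx, test_idx = train_test_split(rest_idx, test_size=16 / 36, random_state=seed)

    # Bag of words with unigrams and bigrams over the free text
    vec = CountVectorizer(ngram_range=(1, 2), min_df=5)
    X_text_train = vec.fit_transform(notes[train_idx])

    X_train = hstack([csr_matrix(vitals[train_idx]), X_text_train])
    clf = LinearSVC(C=0.1).fit(X_train, y[train_idx])

    def auc(split_idx):
        X = hstack([csr_matrix(vitals[split_idx]), vec.transform(notes[split_idx])])
        return roc_auc_score(y[split_idx], clf.decision_function(X))

    return auc(val_idx), auc(test_idx)   # validation AUC, test AUC
```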
Comparison of Random Forest and Support Vector Machine classifiers using UAV remote sensing imagery
NASA Astrophysics Data System (ADS)
Piragnolo, Marco; Masiero, Andrea; Pirotti, Francesco
2017-04-01
In recent years, surveying with unmanned aerial vehicles (UAVs) has attracted considerable attention due to decreasing costs, higher precision, and flexibility of use. UAVs have been applied to geomorphological investigations, forestry, precision agriculture, cultural heritage assessment, and archaeological purposes. They can also be used for land use and land cover (LULC) classification. In the literature, there are two main types of approaches for classification of remote sensing imagery: pixel-based and object-based. The pixel-based approach mostly uses training areas to define classes and their respective spectral signatures, whereas object-based classification considers pixels, scale, spatial information, and texture to create homogeneous objects. Machine learning methods have been applied successfully to classification, and their use is increasing due to the availability of faster computing capabilities; these methods learn and train a model from previous computations. Two machine learning methods that have given good results in previous investigations are Random Forest (RF) and Support Vector Machine (SVM). The goal of this work is to compare the RF and SVM methods for classifying LULC using images collected with a fixed-wing UAV. The classification processing chain uses packages in R, an open source scripting language for data analysis, which provides all necessary algorithms. The imagery was acquired and processed in November 2015 with cameras measuring reflectivity in the red, green, blue, and near-infrared bands over a test area on the Agripolis campus in Italy. Images were processed and ortho-rectified with Agisoft PhotoScan. The ortho-rectified image is the full data set, and the test sets are derived by partial sub-setting of the full data set; different tests were carried out using from 2% to 20% of the total. Ten training sets and ten validation sets are obtained from each test set. The control dataset consists of an independent visual classification done by an expert over the whole area. The classes are (i) broadleaf, (ii) building, (iii) grass, (iv) headland access path, (v) road, (vi) sowed land, and (vii) vegetable. RF and SVM are applied to the test set, and their performance is evaluated using three accuracy metrics: Kappa index, classification accuracy, and classification error. All three are calculated in three different ways: with K-fold cross-validation, using the validation test set, and using the full test set. The analysis indicates that SVM achieves better scores under K-fold cross-validation and on the validation test set, whereas RF achieves a better result on the full test set. SVM also seems to perform better with smaller training sets, whereas RF performs better as training sets get larger.
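The comparison described above can be sketched as follows. The study used R packages; purely for illustration, this minimal Python/scikit-learn version trains RF and SVM on labelled pixels and reports the same three metrics (kappa, accuracy, error) from K-fold cross-validation and from a held-out validation set. The feature arrays, class labels, and hyperparameters are assumptions, not the study's settings.

```python
# Illustrative RF vs. SVM comparison with K-fold CV and a held-out validation set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

def compare_classifiers(X_train, y_train, X_val, y_val, n_folds=10):
    models = {
        "RF": RandomForestClassifier(n_estimators=500, random_state=0),
        "SVM": SVC(kernel="rbf", C=10, gamma="scale"),
    }
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    results = {}
    for name, model in models.items():
        # K-fold cross-validated predictions on the training pixels
        y_cv = cross_val_predict(model, X_train, y_train, cv=cv)
        model.fit(X_train, y_train)
        y_hat = model.predict(X_val)          # independent validation pixels
        acc = accuracy_score(y_val, y_hat)
        results[name] = {
            "kappa_cv": cohen_kappa_score(y_train, y_cv),
            "kappa_val": cohen_kappa_score(y_val, y_hat),
            "accuracy_val": acc,
            "error_val": 1.0 - acc,
        }
    return results
```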
An early-biomarker algorithm predicts lethal graft-versus-host disease and survival
Hartwell, Matthew J.; Özbek, Umut; Holler, Ernst; Major-Monfried, Hannah; Reddy, Pavan; Aziz, Mina; Hogan, William J.; Ayuk, Francis; Efebera, Yvonne A.; Hexner, Elizabeth O.; Bunworasate, Udomsak; Qayed, Muna; Ordemann, Rainer; Wölfl, Matthias; Mielke, Stephan; Chen, Yi-Bin; Devine, Steven; Jagasia, Madan; Kitko, Carrie L.; Litzow, Mark R.; Kröger, Nicolaus; Locatelli, Franco; Morales, George; Nakamura, Ryotaro; Reshef, Ran; Rösler, Wolf; Weber, Daniela; Yanik, Gregory A.; Levine, John E.; Ferrara, James L.M.
2017-01-01
BACKGROUND. No laboratory test can predict the risk of nonrelapse mortality (NRM) or severe graft-versus-host disease (GVHD) after hematopoietic cellular transplantation (HCT) prior to the onset of GVHD symptoms. METHODS. Patient blood samples on day 7 after HCT were obtained from a multicenter set of 1,287 patients, and 620 samples were assigned to a training set. We measured the concentrations of 4 GVHD biomarkers (ST2, REG3α, TNFR1, and IL-2Rα) and used them to model 6-month NRM using rigorous cross-validation strategies to identify the best algorithm that defined 2 distinct risk groups. We then applied the final algorithm in an independent test set (n = 309) and validation set (n = 358). RESULTS. A 2-biomarker model using ST2 and REG3α concentrations identified patients with a cumulative incidence of 6-month NRM of 28% in the high-risk group and 7% in the low-risk group (P < 0.001). The algorithm performed equally well in the test set (33% vs. 7%, P < 0.001) and the multicenter validation set (26% vs. 10%, P < 0.001). Sixteen percent, 17%, and 20% of patients were at high risk in the training, test, and validation sets, respectively. GVHD-related mortality was greater in high-risk patients (18% vs. 4%, P < 0.001), as was severe gastrointestinal GVHD (17% vs. 8%, P < 0.001). The same algorithm can be successfully adapted to define 3 distinct risk groups at GVHD onset. CONCLUSION. A biomarker algorithm based on a blood sample taken 7 days after HCT can consistently identify a group of patients at high risk for lethal GVHD and NRM. FUNDING. The National Cancer Institute, American Cancer Society, and the Doris Duke Charitable Foundation. PMID:28194439
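A hedged sketch of the general strategy follows (not the published algorithm or its coefficients, which the abstract does not give): fit a two-biomarker model for 6-month NRM on the training set, choose a cutoff from cross-validated risk so that roughly the reported share of patients falls in the high-risk group, and apply the frozen model and cutoff to independent sets. The library (scikit-learn), variable names, and the logistic form are assumptions.

```python
# Illustrative two-biomarker risk stratification; st2, reg3a are concentration
# arrays and nrm_6mo is a 0/1 indicator of 6-month nonrelapse mortality.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def build_risk_algorithm(st2, reg3a, nrm_6mo, threshold_quantile=0.84):
    X = np.log(np.column_stack([st2, reg3a]))      # log-transformed concentrations
    model = LogisticRegression().fit(X, nrm_6mo)
    # Cutoff chosen so roughly the top ~16% of cross-validated risk is "high risk"
    cv_prob = cross_val_predict(LogisticRegression(), X, nrm_6mo,
                                cv=10, method="predict_proba")[:, 1]
    cutoff = np.quantile(cv_prob, threshold_quantile)
    return model, cutoff

def assign_risk(model, cutoff, st2, reg3a):
    X = np.log(np.column_stack([st2, reg3a]))
    return np.where(model.predict_proba(X)[:, 1] >= cutoff, "high", "low")
```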
American Alcohol Photo Stimuli (AAPS): A standardized set of alcohol and matched non-alcohol images.
Stauffer, Christopher S; Dobberteen, Lily; Woolley, Joshua D
2017-11-01
Photographic stimuli are commonly used to assess cue reactivity in the research and treatment of alcohol use disorder. The stimuli used are often non-standardized, not properly validated, and poorly controlled. There are no previously published, validated, American-relevant sets of alcohol images created in a standardized fashion. We aimed to: 1) make available a standardized, matched set of photographic alcohol and non-alcohol beverage stimuli, 2) establish face validity, the extent to which the stimuli are subjectively viewed as what they are purported to be, and 3) establish construct validity, the degree to which a test measures what it claims to be measuring. We produced a standardized set of 36 images consisting of American alcohol and non-alcohol beverages matched for basic color, form, and complexity. A total of 178 participants (95 male, 82 female, 1 genderqueer) rated each image for appetitiveness. An arrow-probe task, in which matched pairs were categorized after being presented for 200 ms, assessed face validity. Criteria for construct validity were met if variation in AUDIT scores was associated with variation in performance on tasks during alcohol image presentation. Overall, images were categorized with >90% accuracy. Participants' AUDIT scores correlated significantly with alcohol "want" and "like" ratings [r(176) = 0.27, p < 0.001; r(176) = 0.36, p < 0.001] and arrow-probe latency [r(176) = -0.22, p = 0.004], but not with non-alcohol outcomes. Furthermore, appetitive ratings and arrow-probe latency for alcohol, but not non-alcohol, differed significantly for heavy versus light drinkers. Our image set provides valid and reliable alcohol stimuli for both explicit and implicit tests of cue reactivity. The use of standardized, validated, reliable image sets may improve consistency across research and treatment paradigms.
Larsen, Lisbeth Runge; Jørgensen, Martin Grønbech; Junge, Tina; Juul-Kristensen, Birgit; Wedderkopp, Niels
2014-06-10
Because body proportions in childhood are different to those in adulthood, children have a relatively higher centre of mass location. This biomechanical difference and the fact that children's movements have not yet fully matured result in different sway performances in children and adults. When assessing static balance, it is essential to use objective, sensitive tools, and these types of measurement have previously been performed in laboratory settings. However, the emergence of technologies like the Nintendo Wii Board (NWB) might allow balance assessment in field settings. As the NWB has only been validated and tested for reproducibility in adults, the purpose of this study was to examine reproducibility and validity of the NWB in a field setting, in a population of children. Fifty-four 10-14 year-olds from the CHAMPS-Study DK performed four different balance tests: bilateral stance with eyes open (1), unilateral stance on dominant (2) and non-dominant leg (3) with eyes open, and bilateral stance with eyes closed (4). Three rounds of the four tests were completed with the NWB and with a force platform (AMTI). To assess reproducibility, an intra-day test-retest design was applied with a two-hour break between sessions. Bland-Altman plots supplemented by Minimum Detectable Change (MDC) and concordance correlation coefficient (CCC) demonstrated satisfactory reproducibility for the NWB and the AMTI (MDC: 26.3-28.2%, CCC: 0.76-0.86) using Centre Of Pressure path Length as measurement parameter. Bland-Altman plots demonstrated satisfactory concurrent validity between the NWB and the AMTI, supplemented by satisfactory CCC in all tests (CCC: 0.74-0.87). The ranges of the limits of agreement in the validity study were comparable to the limits of agreement of the reproducibility study. Both NWB and AMTI have satisfactory reproducibility for testing static balance in a population of children. Concurrent validity of NWB compared with AMTI was satisfactory. Furthermore, the results from the concurrent validity study were comparable to the reproducibility results of the NWB and the AMTI. Thus, NWB has the potential to replace the AMTI in field settings in studies including children. Future studies are needed to examine intra-subject variability and to test the predictive validity of NWB.
2014-01-01
Background Because body proportions in childhood are different to those in adulthood, children have a relatively higher centre of mass location. This biomechanical difference and the fact that children’s movements have not yet fully matured result in different sway performances in children and adults. When assessing static balance, it is essential to use objective, sensitive tools, and these types of measurement have previously been performed in laboratory settings. However, the emergence of technologies like the Nintendo Wii Board (NWB) might allow balance assessment in field settings. As the NWB has only been validated and tested for reproducibility in adults, the purpose of this study was to examine reproducibility and validity of the NWB in a field setting, in a population of children. Methods Fifty-four 10–14 year-olds from the CHAMPS-Study DK performed four different balance tests: bilateral stance with eyes open (1), unilateral stance on dominant (2) and non-dominant leg (3) with eyes open, and bilateral stance with eyes closed (4). Three rounds of the four tests were completed with the NWB and with a force platform (AMTI). To assess reproducibility, an intra-day test-retest design was applied with a two-hour break between sessions. Results Bland-Altman plots supplemented by Minimum Detectable Change (MDC) and concordance correlation coefficient (CCC) demonstrated satisfactory reproducibility for the NWB and the AMTI (MDC: 26.3-28.2%, CCC: 0.76-0.86) using Centre Of Pressure path Length as measurement parameter. Bland-Altman plots demonstrated satisfactory concurrent validity between the NWB and the AMTI, supplemented by satisfactory CCC in all tests (CCC: 0.74-0.87). The ranges of the limits of agreement in the validity study were comparable to the limits of agreement of the reproducibility study. Conclusion Both NWB and AMTI have satisfactory reproducibility for testing static balance in a population of children. Concurrent validity of NWB compared with AMTI was satisfactory. Furthermore, the results from the concurrent validity study were comparable to the reproducibility results of the NWB and the AMTI. Thus, NWB has the potential to replace the AMTI in field settings in studies including children. Future studies are needed to examine intra-subject variability and to test the predictive validity of NWB. PMID:24913461
Phase 1 Validation Testing and Simulation for the WEC-Sim Open Source Code
NASA Astrophysics Data System (ADS)
Ruehl, K.; Michelen, C.; Gunawan, B.; Bosma, B.; Simmons, A.; Lomonaco, P.
2015-12-01
WEC-Sim is an open source code to model the performance of wave energy converters in operational waves, developed by Sandia and NREL and funded by the US DOE. The code is a time-domain modeling tool developed in MATLAB/SIMULINK using the multibody dynamics solver SimMechanics, and solves the WEC's governing equations of motion using the Cummins time-domain impulse response formulation in 6 degrees of freedom. The WEC-Sim code has undergone verification through code-to-code comparisons; however, validation of the code has been limited to publicly available experimental data sets. While these data sets provide preliminary code validation, the experimental tests were not explicitly designed for code validation, and as a result are limited in their ability to validate the full functionality of the WEC-Sim code. Therefore, dedicated physical model tests for WEC-Sim validation have been performed. This presentation provides an overview of the WEC-Sim validation experimental wave tank tests performed at Oregon State University's Directional Wave Basin at Hinsdale Wave Research Laboratory. Phase 1 of experimental testing was focused on device characterization and completed in Fall 2015. Phase 2 is focused on WEC performance and scheduled for Winter 2015/2016. These experimental tests were designed explicitly to validate the performance of the WEC-Sim code and its new feature additions. Upon completion, the WEC-Sim validation data set will be made publicly available to the wave energy community. For the physical model test, a controllable model of a floating wave energy converter has been designed and constructed. The instrumentation includes state-of-the-art devices to measure pressure fields and motions in 6 DOF, as well as multi-axial load cells, torque transducers, position transducers, and encoders. The model also incorporates a fully programmable Power-Take-Off system which can be used to generate or absorb wave energy. Numerical simulations of the experiments using WEC-Sim will be presented. These simulations highlight the code features included in the latest release of WEC-Sim (v1.2), including: wave directionality, nonlinear hydrostatics and hydrodynamics, user-defined wave elevation time-series, state space radiation, and WEC-Sim compatibility with BEMIO (open source AQWA/WAMIT/NEMOH coefficient parser).
Castro-Vale, Ivone; Severo, Milton; Carvalho, Davide; Mota-Cardoso, Rui
2015-01-01
Emotion recognition is very important for social interaction. Several mental disorders influence facial emotion recognition. War veterans and their offspring are subject to an increased risk of developing psychopathology. Emotion recognition is an important aspect that needs to be addressed in this population. To our knowledge, no test exists that is validated for use with war veterans and their offspring. The current study aimed to validate the JACFEE photo set to study facial emotion recognition in war veterans and their offspring. The JACFEE photo set was presented to 135 participants, comprised of 62 male war veterans and 73 war veterans' offspring. The participants identified the facial emotion presented from amongst the possible seven emotions that were tested for: anger, contempt, disgust, fear, happiness, sadness, and surprise. A loglinear model was used to evaluate whether the agreement between the intended and the chosen emotions was higher than the expected. Overall agreement between chosen and intended emotions was 76.3% (Cohen kappa = 0.72). The agreement ranged from 63% (sadness expressions) to 91% (happiness expressions). The reliability by emotion ranged from 0.617 to 0.843 and the overall JACFEE photo set Cronbach alpha was 0.911. The offspring showed higher agreement when compared with the veterans (RR: 41.52 vs 12.12, p < 0.001), which confirms the construct validity of the test. The JACFEE set of photos showed good validity and reliability indices, which makes it an adequate instrument for researching emotion recognition ability in the study sample of war veterans and their respective offspring.
Castro-Vale, Ivone; Severo, Milton; Carvalho, Davide; Mota-Cardoso, Rui
2015-01-01
Emotion recognition is very important for social interaction. Several mental disorders influence facial emotion recognition. War veterans and their offspring are subject to an increased risk of developing psychopathology. Emotion recognition is an important aspect that needs to be addressed in this population. To our knowledge, no test exists that is validated for use with war veterans and their offspring. The current study aimed to validate the JACFEE photo set to study facial emotion recognition in war veterans and their offspring. The JACFEE photo set was presented to 135 participants, comprised of 62 male war veterans and 73 war veterans’ offspring. The participants identified the facial emotion presented from amongst the possible seven emotions that were tested for: anger, contempt, disgust, fear, happiness, sadness, and surprise. A loglinear model was used to evaluate whether the agreement between the intended and the chosen emotions was higher than the expected. Overall agreement between chosen and intended emotions was 76.3% (Cohen kappa = 0.72). The agreement ranged from 63% (sadness expressions) to 91% (happiness expressions). The reliability by emotion ranged from 0.617 to 0.843 and the overall JACFEE photo set Cronbach alpha was 0.911. The offspring showed higher agreement when compared with the veterans (RR: 41.52 vs 12.12, p < 0.001), which confirms the construct validity of the test. The JACFEE set of photos showed good validity and reliability indices, which makes it an adequate instrument for researching emotion recognition ability in the study sample of war veterans and their respective offspring. PMID:26147938
Urine specimen validity test for drug abuse testing in workplace and court settings.
Lin, Shin-Yu; Lee, Hei-Hwa; Lee, Jong-Feng; Chen, Bai-Hsiun
2018-01-01
In recent decades, urine drug testing in the workplace has become common in many countries in the world. There have been several studies concerning the use of the urine specimen validity test (SVT) for drug abuse testing administered in the workplace. However, very little data exists concerning the urine SVT on drug abuse tests from court specimens, including dilute, substituted, adulterated, and invalid tests. We investigated 21,696 submitted urine drug test samples for SVT from workplace and court settings in southern Taiwan over 5 years. All immunoassay screen-positive urine specimen drug tests were confirmed by gas chromatography/mass spectrometry. We found that the mean 5-year prevalence of tampering (dilute, substituted, or invalid tests) in urine specimens from the workplace and court settings were 1.09% and 3.81%, respectively. The mean 5-year percentage of dilute, substituted, and invalid urine specimens from the workplace were 89.2%, 6.8%, and 4.1%, respectively. The mean 5-year percentage of dilute, substituted, and invalid urine specimens from the court were 94.8%, 1.4%, and 3.8%, respectively. No adulterated cases were found among the workplace or court samples. The most common drug identified from the workplace specimens was amphetamine, followed by opiates. The most common drug identified from the court specimens was ketamine, followed by amphetamine. We suggest that all urine specimens taken for drug testing from both the workplace and court settings need to be tested for validity. Copyright © 2017. Published by Elsevier B.V.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Y; Yu, J; Yeung, V
Purpose: Artificial neural networks (ANN) can be used to discover complex relations within datasets to help with medical decision making. This study aimed to develop an ANN method to predict two-year overall survival of patients with peri-ampullary cancer (PAC) following resection. Methods: Data were collected from 334 patients with PAC following resection treated in our institutional pancreatic tumor registry between 2006 and 2012. The dataset contains 14 variables including age, gender, T-stage, tumor differentiation, positive-lymph-node ratio, positive resection margins, chemotherapy, radiation therapy, and tumor histology. After censoring for two-year survival analysis, 309 patients were left, of which 44 patients (∼15%) were randomly selected to form the testing set. The remaining 265 cases were randomly divided into a training set (211 cases, ∼80% of 265) and a validation set (54 cases, ∼20% of 265) 20 times to build 20 ANN models. Each ANN has one hidden layer with 5 units. The 20 ANN models were ranked according to their concordance index (c-index) of prediction on the validation sets. To further improve prediction, the top 10% of ANN models were selected, and their outputs averaged for prediction on the testing set. Results: By random division, the 44 cases in the testing set and the remaining 265 cases have approximately equal two-year survival rates, 36.4% and 35.5% respectively. The 20 ANN models, which were trained and validated on the 265 cases, yielded mean c-indexes of 0.59 and 0.63 on the validation sets and the testing set, respectively. The c-index was 0.72 when the two best ANN models (top 10%) were used in prediction on the testing set. The c-index of Cox regression analysis was 0.63. Conclusion: ANN improved survival prediction for patients with PAC. More patient data and further analysis of additional factors may be needed for a more robust model, which will help guide physicians in providing optimal post-operative care. This project was supported by PA CURE Grant.
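The ensemble step can be illustrated as follows, assuming scikit-learn and hypothetical data arrays rather than the registry data: train 20 single-hidden-layer networks on random train/validation splits, rank them by validation c-index (for a binary two-year outcome the c-index equals the AUC), and average the top 10% of models on the held-out testing set.

```python
# Illustrative ensemble of small neural networks selected by validation c-index.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def ensemble_survival_prediction(X_dev, y_dev, X_test, y_test, n_models=20, top_frac=0.1):
    scored_models = []
    for seed in range(n_models):
        # ~80/20 split of the development cases, repeated with different seeds
        X_tr, X_val, y_tr, y_val = train_test_split(
            X_dev, y_dev, test_size=0.2, random_state=seed, stratify=y_dev)
        net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=seed)
        net.fit(X_tr, y_tr)
        c_index_val = roc_auc_score(y_val, net.predict_proba(X_val)[:, 1])
        scored_models.append((c_index_val, net))

    # Keep the top 10% of models (2 of 20) and average their predicted probabilities
    scored_models.sort(key=lambda t: t[0], reverse=True)
    top = [net for _, net in scored_models[: max(1, int(n_models * top_frac))]]
    test_prob = np.mean([net.predict_proba(X_test)[:, 1] for net in top], axis=0)
    return roc_auc_score(y_test, test_prob)   # c-index on the testing set
```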
Size and Strength: Do We Need Both to Measure Vocabulary Knowledge?
ERIC Educational Resources Information Center
Laufer, B.; Elder, C.; Hill, K.; Congdon, P.
2004-01-01
This article describes the development and validation of a test of vocabulary size and strength. The first part of the article sets out the theoretical rationale for the test, and describes how the size and strength constructs have been conceptualized and operationalized. The second part of the article focusses on the process of test validation,…
Formal verification and testing: An integrated approach to validating Ada programs
NASA Technical Reports Server (NTRS)
Cohen, Norman H.
1986-01-01
An integrated set of tools called a validation environment is proposed to support the validation of Ada programs by a combination of methods. A Modular Ada Validation Environment (MAVEN) is described which proposes a context in which formal verification can fit into the industrial development of Ada software.
Shum, Bennett O V; Henner, Ilya; Belluoccio, Daniele; Hinchcliffe, Marcus J
2017-07-01
The sensitivity and specificity of next-generation sequencing laboratory developed tests (LDTs) are typically determined by an analyte-specific approach. Analyte-specific validations use disease-specific controls to assess an LDT's ability to detect known pathogenic variants. Alternatively, a methods-based approach can be used for LDT technical validations. Methods-focused validations do not use disease-specific controls but use benchmark reference DNA that contains known variants (benign, variants of unknown significance, and pathogenic) to assess variant calling accuracy of a next-generation sequencing workflow. Recently, four whole-genome reference materials (RMs) from the National Institute of Standards and Technology (NIST) were released to standardize methods-based validations of next-generation sequencing panels across laboratories. We provide a practical method for using NIST RMs to validate multigene panels. We analyzed the utility of RMs in validating a novel newborn screening test that targets 70 genes, called NEO1. Despite the NIST RM variant truth set originating from multiple sequencing platforms, replicates, and library types, we discovered a 5.2% false-negative variant detection rate in the RM truth set genes that were assessed in our validation. We developed a strategy using complementary non-RM controls to demonstrate 99.6% sensitivity of the NEO1 test in detecting variants. Our findings have implications for laboratories or proficiency testing organizations using whole-genome NIST RMs for testing. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Bittante, G; Ferragina, A; Cipolat-Gotet, C; Cecchinato, A
2014-10-01
Cheese yield is an important technological trait in the dairy industry. The aim of this study was to infer the genetic parameters of some cheese yield-related traits predicted using Fourier-transform infrared (FTIR) spectral analysis and compare the results with those obtained using an individual model cheese-producing procedure. A total of 1,264 model cheeses were produced using 1,500-mL milk samples collected from individual Brown Swiss cows, and individual measurements were taken for 10 traits: 3 cheese yield traits (fresh curd, curd total solids, and curd water as a percent of the weight of the processed milk), 4 milk nutrient recovery traits (fat, protein, total solids, and energy of the curd as a percent of the same nutrient in the processed milk), and 3 daily cheese production traits per cow (fresh curd, total solids, and water weight of the curd). Each unprocessed milk sample was analyzed using a MilkoScan FT6000 (Foss, Hillerød, Denmark) over the spectral range, from 5,000 to 900 wavenumber × cm(-1). The FTIR spectrum-based prediction models for the previously mentioned traits were developed using modified partial least-square regression. Cross-validation of the whole data set yielded coefficients of determination between the predicted and measured values in cross-validation of 0.65 to 0.95 for all traits, except for the recovery of fat (0.41). A 3-fold external validation was also used, in which the available data were partitioned into 2 subsets: a training set (one-third of the herds) and a testing set (two-thirds). The training set was used to develop calibration equations, whereas the testing subsets were used for external validation of the calibration equations and to estimate the heritabilities and genetic correlations of the measured and FTIR-predicted phenotypes. The coefficients of determination between the predicted and measured values in cross-validation results obtained from the training sets were very similar to those obtained from the whole data set, but the coefficient of determination of validation values for the external validation sets were much lower for all traits (0.30 to 0.73), and particularly for fat recovery (0.05 to 0.18), for the training sets compared with the full data set. For each testing subset, the (co)variance components for the measured and FTIR-predicted phenotypes were estimated using bivariate Bayesian analyses and linear models. The intraherd heritabilities for the predicted traits obtained from our internal cross-validation using the whole data set ranged from 0.085 for daily yield of curd solids to 0.576 for protein recovery, and were similar to those obtained from the measured traits (0.079 to 0.586, respectively). The heritabilities estimated from the testing data set used for external validation were more variable but similar (on average) to the corresponding values obtained from the whole data set. Moreover, the genetic correlations between the predicted and measured traits were high in general (0.791 to 0.996), and they were always higher than the corresponding phenotypic correlations (0.383 to 0.995), especially for the external validation subset. In conclusion, we herein report that application of the cross-validation technique to the whole data set tended to overestimate the predictive ability of FTIR spectra, give more precise phenotypic predictions than the calibrations obtained using smaller data sets, and yield genetic correlations similar to those obtained from the measured traits. 
Collectively, our findings indicate that FTIR predictions have the potential to be used as indicator traits for the rapid and inexpensive selection of dairy populations for improvement of cheese yield, milk nutrient recovery in curd, and daily cheese production per cow. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
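The contrast between internal cross-validation and external (herd-based) validation can be sketched as below. The study used a modified partial least-squares procedure; this illustration uses generic PLS regression from scikit-learn, and the array names, component count, and split fractions are assumptions.

```python
# Illustrative comparison of internal cross-validation R^2 with external,
# herd-stratified validation R^2 for an FTIR spectral calibration.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GroupShuffleSplit, cross_val_predict

def internal_vs_external_r2(spectra, trait, herd_ids, n_components=15):
    pls = PLSRegression(n_components=n_components)

    # Internal cross-validation on the whole data set (tends to be optimistic)
    y_cv = cross_val_predict(pls, spectra, trait, cv=10)
    r2_internal = r2_score(trait, y_cv)

    # External validation: calibrate on one third of the herds, test on the rest
    splitter = GroupShuffleSplit(n_splits=1, train_size=1 / 3, random_state=0)
    train_idx, test_idx = next(splitter.split(spectra, trait, groups=herd_ids))
    pls.fit(spectra[train_idx], trait[train_idx])
    r2_external = r2_score(trait[test_idx], pls.predict(spectra[test_idx]).ravel())
    return r2_internal, r2_external
```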
NASA Astrophysics Data System (ADS)
Samala, Ravi K.; Chan, Heang-Ping; Hadjiiski, Lubomir; Helvie, Mark A.; Richter, Caleb; Cha, Kenny
2018-02-01
We propose a cross-domain, multi-task transfer learning framework to transfer knowledge learned from non-medical images by a deep convolutional neural network (DCNN) to medical image recognition task while improving the generalization by multi-task learning of auxiliary tasks. A first stage cross-domain transfer learning was initiated from ImageNet trained DCNN to mammography trained DCNN. 19,632 regions-of-interest (ROI) from 2,454 mass lesions were collected from two imaging modalities: digitized-screen film mammography (SFM) and full-field digital mammography (DM), and split into training and test sets. In the multi-task transfer learning, the DCNN learned the mass classification task simultaneously from the training set of SFM and DM. The best transfer network for mammography was selected from three transfer networks with different number of convolutional layers frozen. The performance of single-task and multitask transfer learning on an independent SFM test set in terms of the area under the receiver operating characteristic curve (AUC) was 0.78+/-0.02 and 0.82+/-0.02, respectively. In the second stage cross-domain transfer learning, a set of 12,680 ROIs from 317 mass lesions on DBT were split into validation and independent test sets. We first studied the data requirements for the first stage mammography trained DCNN by varying the mammography training data from 1% to 100% and evaluated its learning on the DBT validation set in inference mode. We found that the entire available mammography set provided the best generalization. The DBT validation set was then used to train only the last four fully connected layers, resulting in an AUC of 0.90+/-0.04 on the independent DBT test set.
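A conceptual sketch of the frozen-layer transfer step is shown below, using PyTorch with a toy stand-in network; the actual DCNN architecture, layer counts, and data are not reproduced here, and all names are placeholders. The idea is simply to disable gradients for the early (source-domain) layers and hand only the remaining parameters to the optimizer for fine-tuning on the target domain.

```python
# Toy transfer-learning sketch: freeze early layers, fine-tune the rest.
import torch
import torch.nn as nn

def freeze_early_layers(model: nn.Sequential, n_frozen: int) -> nn.Sequential:
    """Disable gradients for the first n_frozen child modules."""
    for i, child in enumerate(model.children()):
        if i < n_frozen:
            for p in child.parameters():
                p.requires_grad = False
    return model

# Stand-in for a pretrained DCNN; assumes 64x64 single-channel inputs.
dcnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 16 * 16, 64), nn.ReLU(), nn.Linear(64, 2),
)
dcnn = freeze_early_layers(dcnn, n_frozen=6)   # keep both conv blocks fixed
# Only parameters that still require gradients are given to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in dcnn.parameters() if p.requires_grad), lr=1e-4)
```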
40 CFR 1065.550 - Gas analyzer range validation, drift validation, and drift correction.
Code of Federal Regulations, 2010 CFR
2010-07-01
... interval (i.e., do not set them to zero). A third calculation of composite brake-specific emission values... from each test interval and sets any negative mass (or mass rate) values to zero before calculating the... value is less than the standard by at least two times the absolute difference between the uncorrected...
Sotardi, Valerie A
2018-05-01
Educational measures of anxiety focus heavily on students' experiences with tests yet overlook other assessment contexts. In this research, two brief multiscale questionnaires were developed and validated to measure trait evaluation anxiety (MTEA-12) and state evaluation anxiety (MSEA-12) for use in various assessment contexts in non-clinical, educational settings. The research included a cross-sectional analysis of self-report data from authentic assessment settings in which evaluation anxiety was measured. The instruments were tested using a validation sample of 241 first-year university students in New Zealand. Scale development included component structures for the state and trait scales based on existing theoretical frameworks. Analyses using confirmatory factor analysis and descriptive statistics indicate that the scales are reliable and structurally valid. Multivariate general linear modeling using subscales from the MTEA-12, the MSEA-12, and student grades suggests adequate criterion-related validity. Initial evidence of predictive validity was found, in which one relevant MTEA-12 factor explained between 21% and 54% of the variance in three MSEA-12 factors. The results document the MTEA-12 and MSEA-12 as reliable measures of trait and state dimensions of evaluation anxiety for test and writing contexts. Initial estimates suggest the scales have promising validity, and recommendations for further validation are outlined.
Developing a Data Set and Processing Methodology for Fluid/Structure Interaction Code Validation
2007-06-01
9-Probe Wake Survey Rake Configurations... structural stability and fatigue in test article components and, in general, in facility support structures and rotating machinery blading. Both T&E... blade analysis and simulations. To ensure the accuracy of the U of CO technology, validation using flight-test data and test data from a wind tunnel...
The Chinese version of the Outcome Expectations for Exercise scale: validation study.
Lee, Ling-Ling; Chiu, Yu-Yun; Ho, Chin-Chih; Wu, Shu-Chen; Watson, Roger
2011-06-01
Estimates of the reliability and validity of the English nine-item Outcome Expectations for Exercise (OEE) scale have been tested and found to be valid for use in various settings, particularly among older people, with good internal consistency and validity. Data on the use of the OEE scale among older Chinese people living in the community and how cultural differences might affect the administration of the OEE scale are limited. To test the validity and reliability of the Chinese version of the Outcome Expectations for Exercise scale among older people. A cross-sectional validation study was designed to test the Chinese version of the OEE scale (OEE-C). Reliability was examined by testing both the internal consistency for the overall scale and the squared multiple correlation coefficient for the single item measure. The validity of the scale was tested on the basis of both a traditional psychometric test and a confirmatory factor analysis using structural equation modelling. The Mokken Scaling Procedure (MSP) was used to investigate if there were any hierarchical, cumulative sets of items in the measure. The OEE-C scale was tested in a group of older people in Taiwan (n=108, mean age=77.1). There was acceptable internal consistency (alpha=.85) and model fit in the scale. Evidence of the validity of the measure was demonstrated by the tests for criterion-related validity and construct validity. There was a statistically significant correlation between exercise outcome expectations and exercise self-efficacy (r=.34, p<.01). An analysis of the Mokken Scaling Procedure found that nine items of the scale were all retained in the analysis and the resulting scale was reliable and statistically significant (p=.0008). The results obtained in the present study provided acceptable levels of reliability and validity evidence for the Chinese Outcome Expectations for Exercise scale when used with older people in Taiwan. Future testing of the OEE-C scale needs to be carried out to see whether these results are generalisable to older Chinese people living in urban areas. Copyright © 2010 Elsevier Ltd. All rights reserved.
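For reference, the internal-consistency statistic reported above (Cronbach's alpha) can be computed directly from an items-by-respondents score matrix, as in this small sketch (illustrative only; not the OEE-C data).

```python
# Cronbach's alpha from raw item scores: alpha = k/(k-1) * (1 - sum(item var)/var(total)).
import numpy as np

def cronbach_alpha(item_scores):
    """item_scores: 2-D array, rows = respondents, columns = scale items."""
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances / total_variance)
```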
Identification and Validation of ESP Teacher Competencies: A Research Design
ERIC Educational Resources Information Center
Venkatraman, G.; Prema, P.
2013-01-01
The paper presents the research design used for identifying and validating a set of competencies required of ESP (English for Specific Purposes) teachers. The identification of the competencies and the three-stage validation process are also discussed. The observation of classes of ESP teachers for field-testing the validated competencies and…
Validation of the Information/Communications Technology Literacy Test
2016-10-01
nested set. Table 11 presents the results of incremental validity analyses for job knowledge/performance criteria by MOS. Figure 7 presents much...Systems Operator-Analyst (25B) and Nodal Network Systems Operator-Maintainer (25N) MOS. This report documents technical procedures and results of the...research effort. Results suggest that the ICTL test has potential as a valid and highly efficient predictor of valued outcomes in Signal school MOS. Not
Embedded performance validity testing in neuropsychological assessment: Potential clinical tools.
Rickards, Tyler A; Cranston, Christopher C; Touradji, Pegah; Bechtold, Kathleen T
2018-01-01
The article aims to suggest clinically-useful tools in neuropsychological assessment for efficient use of embedded measures of performance validity. To accomplish this, we integrated available validity-related and statistical research from the literature, consensus statements, and survey-based data from practicing neuropsychologists. We provide recommendations for use of 1) Cutoffs for embedded performance validity tests including Reliable Digit Span, California Verbal Learning Test (Second Edition) Forced Choice Recognition, Rey-Osterrieth Complex Figure Test Combination Score, Wisconsin Card Sorting Test Failure to Maintain Set, and the Finger Tapping Test; 2) Selecting number of performance validity measures to administer in an assessment; and 3) Hypothetical clinical decision-making models for use of performance validity testing in a neuropsychological assessment collectively considering behavior, patient reporting, and data indicating invalid or noncredible performance. Performance validity testing helps inform the clinician about an individual's general approach to tasks: response to failure, task engagement and persistence, compliance with task demands. Data-driven clinical suggestions provide a resource to clinicians and to instigate conversation within the field to make more uniform, testable decisions to further the discussion, and guide future research in this area.
Zhang, Jinshui; Yuan, Zhoumiqi; Shuai, Guanyuan; Pan, Yaozhong; Zhu, Xiufang
2017-04-26
This paper developed an approach, the window-based validation set for support vector data description (WVS-SVDD), to determine optimal parameters for a support vector data description (SVDD) model to map specific land cover by integrating training and window-based validation sets. Compared to the conventional approach, in which the validation set included target and outlier pixels selected visually and randomly, the validation set derived from WVS-SVDD constructed a tightened hypersphere because of the compact constraint imposed by outlier pixels located adjacent to the target class in the spectral feature space. The overall accuracies achieved for wheat and bare land were as high as 89.25% and 83.65%, respectively. However, the target class was underestimated because the validation set covers only a small fraction of the heterogeneous spectra of the target class. Different window sizes were then tested to acquire more wheat pixels for the validation set. The results showed that classification accuracy increased with increasing window size, and the overall accuracies were higher than 88% at all window size scales. Moreover, WVS-SVDD showed much less sensitivity to untrained classes than the multi-class support vector machine (SVM) method. Therefore, the developed method showed its merits in using the optimal parameters, tradeoff coefficient (C) and kernel width (s), for mapping homogeneous specific land cover.
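A hedged sketch of the parameter-selection idea follows, using scikit-learn's OneClassSVM as a stand-in for SVDD (with an RBF kernel the two formulations are closely related): candidate values of the trade-off and kernel-width parameters are scored against a validation set of target and neighbouring outlier pixels. All array names and grids are illustrative.

```python
# One-class classifier tuned with a validation set of target (+1) and outlier (-1) pixels.
import numpy as np
from itertools import product
from sklearn.metrics import accuracy_score
from sklearn.svm import OneClassSVM

def tune_one_class_svm(X_target_train, X_val, y_val,
                       nu_grid=(0.01, 0.05, 0.1), gamma_grid=(0.1, 1.0, 10.0)):
    """y_val: 1 for target-class pixels, -1 for neighbouring outlier pixels."""
    best = None
    for nu, gamma in product(nu_grid, gamma_grid):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_target_train)
        acc = accuracy_score(y_val, model.predict(X_val))   # predictions are +1/-1
        if best is None or acc > best[0]:
            best = (acc, nu, gamma, model)
    return best   # (validation accuracy, nu, gamma, fitted model)
```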
A Tissue Systems Pathology Assay for High-Risk Barrett's Esophagus.
Critchley-Thorne, Rebecca J; Duits, Lucas C; Prichard, Jeffrey W; Davison, Jon M; Jobe, Blair A; Campbell, Bruce B; Zhang, Yi; Repa, Kathleen A; Reese, Lia M; Li, Jinhong; Diehl, David L; Jhala, Nirag C; Ginsberg, Gregory; DeMarshall, Maureen; Foxwell, Tyler; Zaidi, Ali H; Lansing Taylor, D; Rustgi, Anil K; Bergman, Jacques J G H M; Falk, Gary W
2016-06-01
Better methods are needed to predict risk of progression for Barrett's esophagus. We aimed to determine whether a tissue systems pathology approach could predict progression in patients with nondysplastic Barrett's esophagus, indefinite for dysplasia, or low-grade dysplasia. We performed a nested case-control study to develop and validate a test that predicts progression of Barrett's esophagus to high-grade dysplasia (HGD) or esophageal adenocarcinoma (EAC), based upon quantification of epithelial and stromal variables in baseline biopsies. Data were collected from Barrett's esophagus patients at four institutions. Patients who progressed to HGD or EAC in ≥1 year (n = 79) were matched with patients who did not progress (n = 287). Biopsies were assigned randomly to training or validation sets. Immunofluorescence analyses were performed for 14 biomarkers and quantitative biomarker and morphometric features were analyzed. Prognostic features were selected in the training set and combined into classifiers. The top-performing classifier was assessed in the validation set. A 3-tier, 15-feature classifier was selected in the training set and tested in the validation set. The classifier stratified patients into low-, intermediate-, and high-risk classes [HR, 9.42; 95% confidence interval, 4.6-19.24 (high-risk vs. low-risk); P < 0.0001]. It also provided independent prognostic information that outperformed predictions based on pathology analysis, segment length, age, sex, or p53 overexpression. We developed a tissue systems pathology test that better predicts risk of progression in Barrett's esophagus than clinicopathologic variables. The test has the potential to improve upon histologic analysis as an objective method to risk stratify Barrett's esophagus patients. Cancer Epidemiol Biomarkers Prev; 25(6); 958-68. ©2016 AACR. ©2016 American Association for Cancer Research.
A Metric-Based Validation Process to Assess the Realism of Synthetic Power Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Birchfield, Adam; Schweitzer, Eran; Athari, Mir
Public power system test cases that are of high quality benefit the power systems research community with expanded resources for testing, demonstrating, and cross-validating new innovations. Building synthetic grid models for this purpose is a relatively new problem, for which a challenge is to show that created cases are sufficiently realistic. This paper puts forth a validation process based on a set of metrics observed from actual power system cases. These metrics follow the structure, proportions, and parameters of key power system elements, which can be used in assessing and validating the quality of synthetic power grids. Though wide diversity exists in the characteristics of power systems, the paper focuses on an initial set of common quantitative metrics to capture the distribution of typical values from real power systems. The process is applied to two new public test cases, which are shown to meet the criteria specified in the metrics of this paper.
Development and validation of a Response Bias Scale (RBS) for the MMPI-2.
Gervais, Roger O; Ben-Porath, Yossef S; Wygant, Dustin B; Green, Paul
2007-06-01
This study describes the development of a Minnesota Multiphasic Personality Inventory (MMPI-2) scale designed to detect negative response bias in forensic neuropsychological or disability assessment settings. The Response Bias Scale (RBS) consists of 28 MMPI-2 items that discriminated between persons who passed or failed the Word Memory Test (WMT), Computerized Assessment of Response Bias (CARB), and/or Test of Memory Malingering (TOMM) in a sample of 1,212 nonhead-injury disability claimants. Incremental validity of the RBS was evaluated by comparing its ability to detect poor performance on four separate symptom validity tests with that of the F and F(P) scales and the Fake Bad Scale (FBS). The RBS consistently outperformed F, F(P), and FBS. Study results suggest that the RBS may be a useful addition to existing MMPI-2 validity scales and indices in detecting symptom complaints predominantly associated with cognitive response bias and overreporting in forensic neuropsychological and disability assessment settings.
A Metric-Based Validation Process to Assess the Realism of Synthetic Power Grids
Birchfield, Adam; Schweitzer, Eran; Athari, Mir; ...
2017-08-19
Public power system test cases that are of high quality benefit the power systems research community with expanded resources for testing, demonstrating, and cross-validating new innovations. Building synthetic grid models for this purpose is a relatively new problem, for which a challenge is to show that created cases are sufficiently realistic. This paper puts forth a validation process based on a set of metrics observed from actual power system cases. These metrics follow the structure, proportions, and parameters of key power system elements, which can be used in assessing and validating the quality of synthetic power grids. Though wide diversity exists in the characteristics of power systems, the paper focuses on an initial set of common quantitative metrics to capture the distribution of typical values from real power systems. The process is applied to two new public test cases, which are shown to meet the criteria specified in the metrics of this paper.
ERIC Educational Resources Information Center
Lane, Kathleen Lynne; Parks, Robin J.; Kalberg, Jemma Robertson; Carter, Erik W.
2007-01-01
This article presents findings of two studies, one conducted with middle school students (n = 500) in a rural setting and a second conducted with middle school students (n = 528) in an urban setting, of the reliability and validity of the "Student Risk Screening Scale" (SRSS; Drummond, 1994). Results revealed high internal consistency, test-retest…
Reliability and Validity of a Q-Sort Measure of Attachment Security in Hispanic Infants.
ERIC Educational Resources Information Center
Busch-Rossnagel, Nancy A.; And Others
1994-01-01
A set of Q-sort items to assess individual differences in infant-mother attachment was adapted for a Hispanic population of low-SES background. Completion of the Q-sort by observers and inner-city Hispanic mothers and testing of 43 infants with the Ainsworth Strange Situation established the Q-set's validity and indicated moderate reliability for…
Validity of Alternative Cut-Off Scores for the Back-Saver Sit and Reach Test
ERIC Educational Resources Information Center
Looney, Marilyn A.; Gilbert, Jennie
2012-01-01
The purpose of the study was to determine if currently used FITNESSGRAM[R] cut-off scores for the Back Saver Sit and Reach Test had the best criterion-referenced validity evidence for 6-12 year old children. Secondary analyses of an existing data set focused on the passive straight leg raise and Back Saver Sit and Reach Test flexibility scores of…
Automatic sleep stage classification using two facial electrodes.
Virkkala, Jussi; Velin, Riitta; Himanen, Sari-Leena; Värri, Alpo; Müller, Kiti; Hasan, Joel
2008-01-01
Standard sleep stage classification is based on visual analysis of central EEG, EOG and EMG signals. Automatic analysis with a reduced number of sensors has been studied as an easy alternative to the standard. In this study, a single-channel electro-oculography (EOG) algorithm was developed for separation of wakefulness, SREM, light sleep (S1, S2) and slow wave sleep (S3, S4). The algorithm was developed and tested with 296 subjects. Additional validation was performed on 16 subjects using a low weight single-channel Alive Monitor. In the validation study, subjects attached the disposable EOG electrodes themselves at home. In separating the four stages total agreement (and Cohen's Kappa) in the training data set was 74% (0.59), in the testing data set 73% (0.59) and in the validation data set 74% (0.59). Self-applicable electro-oculography with only two facial electrodes was found to provide reasonable sleep stage information.
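The agreement figures quoted above (total agreement and Cohen's kappa) can be reproduced from paired stage labels with a few lines, as in this illustrative sketch (hypothetical labels, not the study data).

```python
# Total agreement is the fraction of epochs where the automatic EOG-based stage
# matches the visually scored stage; Cohen's kappa corrects for chance agreement.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def agreement_summary(visual_stages, automatic_stages):
    visual = np.asarray(visual_stages)
    automatic = np.asarray(automatic_stages)
    total_agreement = np.mean(visual == automatic)
    kappa = cohen_kappa_score(visual, automatic)
    return total_agreement, kappa

# Example with four stages: W = wakefulness, R = SREM, L = light sleep, S = slow wave sleep
print(agreement_summary(["W", "R", "L", "L", "S"], ["W", "R", "L", "S", "S"]))
```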
Federal COBOL Compiler Testing Service Compiler Validation Request Information.
1977-05-09
background of the Federal COBOL Compiler Testing Service which was set up by a memorandum of agreement between the National Bureau of Standards and the...Federal Standard, and the requirement of COBOL compiler validation in the procurement process. It also contains a list of all software products...produced by the software Development Division in support of the FCCTS as well as the Validation Summary Reports produced as a result of discharging the
Wang, Haili; Tso, Victor; Wong, Clarence; Sadowski, Dan; Fedorak, Richard N
2014-03-20
Adenomatous polyps are precursors of colorectal cancer; their detection and removal are the goal of colon cancer screening programs. However, fecal-based methods identify patients with adenomatous polyps with low levels of sensitivity. The aim of this study was to develop a highly accurate, prototypic, proof-of-concept, spot urine-based diagnostic test using metabolomic technology to distinguish persons with adenomatous polyps from those without polyps. Prospective urine and stool samples were collected from 876 participants undergoing colonoscopy examination in a colon cancer screening program, from April 2008 to October 2009 at the University of Alberta. The colonoscopy reference standard identified 633 participants with no colonic polyps and 243 with colonic adenomatous polyps. One-dimensional nuclear magnetic resonance spectra of urine metabolites were analyzed to define a diagnostic metabolomic profile for colonic adenomas. A urine metabolomic diagnostic test for colonic adenomatous polyps was established using 67% of the samples (un-blinded training set) and validated using the other 33% of the samples (blinded testing set). The urine metabolomic diagnostic test's specificity and sensitivity were compared with those of fecal-based tests. Using a two-component, orthogonal, partial least-squares model of the metabolomic profile, the un-blinded training set identified patients with colonic adenomatous polyps with 88.9% sensitivity and 50.2% specificity. Validation using the blinded testing set confirmed sensitivity and specificity values of 82.7% and 51.2%, respectively. Sensitivities of fecal-based tests to identify colonic adenomas ranged from 2.5 to 11.9%. We describe a proof-of-concept spot urine-based metabolomic diagnostic test that identifies patients with colonic adenomatous polyps with a greater level of sensitivity (83%) than fecal-based tests.
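As a minimal sketch of the blinded-test-set evaluation, the code below stands in a generic two-component PLS model for the study's orthogonal PLS model and shows how sensitivity and specificity are computed from the resulting predictions; all data arrays and the 0.5 cutoff are assumptions.

```python
# Generic PLS-based classifier plus sensitivity/specificity on a blinded test set.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def sensitivity_specificity(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

def evaluate_blinded(spectra_train, y_train, spectra_test, y_test, cutoff=0.5):
    # y arrays are 0/1 indicators of adenomatous polyps
    model = PLSRegression(n_components=2).fit(spectra_train, y_train)
    y_hat = (model.predict(spectra_test).ravel() >= cutoff).astype(int)
    return sensitivity_specificity(y_test, y_hat)
```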
NIKE: a new clinical tool for establishing levels of indications for cataract surgery.
Lundström, Mats; Albrecht, Susanne; Håkansson, Ingemar; Lorefors, Ragnhild; Ohlsson, Sven; Polland, Werner; Schmid, Andrea; Svensson, Göran; Wendel, Eva
2006-08-01
The purpose of this study was to construct a new clinical tool for establishing levels of indications for cataract surgery, and to validate this tool. Teams from nine eye clinics reached an agreement about the need to develop a clinical tool for setting levels of indications for cataract surgery and about the items that should be included in the tool. The tool was to be called 'NIKE' (Nationell Indikationsmodell för Kataraktextraktion). The Canadian Cataract Priority Criteria Tool served as a model for the NIKE tool, which was modified for Swedish conditions. Items included in the tool were visual acuity of both eyes, patients' perceived difficulties in day-to-day life, cataract symptoms, the ability to live independently, and medical/ophthalmic reasons for surgery. The tool was validated and tested in 343 cataract surgery patients. Validity, stability and reliability were tested and the outcome of surgery was studied in relation to the indication setting. Four indication groups (IGs) were suggested. The group with the greatest indications for surgery was named group 1 and that with the lowest, group 4. Validity was proved to be good. Surgery had the greatest impact on the group with the highest indications for surgery. Test-retest reliability test and interexaminer tests of indication settings showed statistically significant intraclass correlations (intraclass correlation coefficients [ICCs] 0.526 and 0.923, respectively). A new clinical tool for indication setting in cataract surgery is presented. This tool, the NIKE, takes into account both visual acuity and the patient's perceived problems in day-to-day life because of cataract. The tool seems to be stable and reliable and neutral towards different examiners.
Ballangrud, Randi; Husebø, Sissel Eikeland; Hall-Lord, Marie Louise
2017-12-02
Teamwork is an integrated part of today's specialized and complex healthcare and essential to patient safety, and is considered a core competency to improve twenty-first century healthcare. Teamwork measurements and evaluations show promising results to promote good team performance, and are recommended for identifying areas for improvement. The validated TeamSTEPPS® Teamwork Perception Questionnaire (T-TPQ) was found suitable for cross-cultural validation and testing in a Norwegian context. T-TPQ is a self-report survey that examines five dimensions of perception of teamwork within healthcare settings. The aim of the study was to translate and cross-validate the T-TPQ into Norwegian, and test the questionnaire for psychometric properties among healthcare personnel. The T-TPQ was translated and adapted to a Norwegian context according to a model of a back-translation process. A total of 247 healthcare personnel representing different professions and hospital settings responded to the questionnaire. A confirmatory factor analysis was carried out to test the factor structure. Cronbach's alpha was used to establish internal consistency, and an Intraclass Correlation Coefficient was used to assess the test-retest reliability. A confirmatory factor analysis showed an acceptably fitting model (χ2(df = 546) = 969.46, p < 0.001; Root Mean Square Error of Approximation (RMSEA) = 0.056; Tucker-Lewis Index (TLI) = 0.88; Comparative Fit Index (CFI) = 0.89), which indicates that each set of items intended to accompany each teamwork dimension clearly represents that specific construct. Cronbach's alpha demonstrated acceptable values on the five subscales (0.786-0.844), and test-retest reliability showed Intraclass Correlation Coefficient scores from 0.672 to 0.852. The Norwegian version of the T-TPQ was considered acceptable regarding validity and reliability for measuring Norwegian individual healthcare personnel's perception of group-level teamwork within their unit. However, it needs to be further tested, preferably in a larger sample and in different clinical settings.
Halabi, Susan; Lin, Chen-Yen; Kelly, W. Kevin; Fizazi, Karim S.; Moul, Judd W.; Kaplan, Ellen B.; Morris, Michael J.; Small, Eric J.
2014-01-01
Purpose Prognostic models for overall survival (OS) for patients with metastatic castration-resistant prostate cancer (mCRPC) are dated and do not reflect significant advances in treatment options available for these patients. This work developed and validated an updated prognostic model to predict OS in patients receiving first-line chemotherapy. Methods Data from a phase III trial of 1,050 patients with mCRPC were used (Cancer and Leukemia Group B CALGB-90401 [Alliance]). The data were randomly split into training and testing sets. A separate phase III trial served as an independent validation set. Adaptive least absolute shrinkage and selection operator selected eight factors prognostic for OS. A predictive score was computed from the regression coefficients and used to classify patients into low- and high-risk groups. The model was assessed for its predictive accuracy using the time-dependent area under the curve (tAUC). Results The model included Eastern Cooperative Oncology Group performance status, disease site, lactate dehydrogenase, opioid analgesic use, albumin, hemoglobin, prostate-specific antigen, and alkaline phosphatase. Median OS values in the high- and low-risk groups, respectively, in the testing set were 17 and 30 months (hazard ratio [HR], 2.2; P < .001); in the validation set they were 14 and 26 months (HR, 2.9; P < .001). The tAUCs were 0.73 (95% CI, 0.70 to 0.73) and 0.76 (95% CI, 0.72 to 0.76) in the testing and validation sets, respectively. Conclusion An updated prognostic model for OS in patients with mCRPC receiving first-line chemotherapy was developed and validated on an external set. This model can be used to predict OS, as well as to better select patients to participate in trials on the basis of their prognosis. PMID:24449231
Whole-Genome DNA Methylation Status Associated with Clinical PTSD Measures of OIF/OEF Veterans
Emerging knowledge suggests that post-traumatic stress disorder (PTSD) pathophysiology is linked to patients' epigenetic changes, but...promoter-bound CpGIs to identify networks related to PTSD. The identified networks were further validated by an independent test set comprising 31 PTSD/29...set. To improve the statistical power and mitigate the assay bias and batch effects, a union set combining both training and test set was assayed
Gourmelon, Anne; Delrue, Nathalie
Ten years have elapsed since the OECD published the Guidance document on the validation and international regulatory acceptance of test methods for hazard assessment. Much experience has been gained since then in validation centres, in countries and at the OECD on a variety of test methods that were subjected to validation studies. This chapter reviews validation principles and highlights common features that appear to be important for further regulatory acceptance across studies. Existing OECD-agreed validation principles will most likely remain generally relevant and applicable to address challenges associated with the validation of future test methods. Some adaptations may be needed to take into account the level of technique introduced in test systems, but demonstration of relevance and reliability will continue to play a central role as a prerequisite for regulatory acceptance. Demonstration of relevance will become more challenging for test methods that form part of a set of predictive tools and methods, and that do not stand alone. The OECD is keen to ensure that while these concepts evolve, countries can continue to rely on valid methods and harmonised approaches for efficient testing and assessment of chemicals.
Use of the Ames Check Standard Model for the Validation of Wall Interference Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Amaya, M.; Flach, R.
2018-01-01
The new check standard model of the NASA Ames 11-ft Transonic Wind Tunnel was chosen for a future validation of the facility's wall interference correction system. The chosen validation approach takes advantage of the fact that test conditions experienced by a large model in the slotted part of the tunnel's test section will change significantly if a subset of the slots is temporarily sealed. Therefore, the model's aerodynamic coefficients have to be recorded, corrected, and compared for two different test section configurations in order to perform the validation. Test section configurations with highly accurate Mach number and dynamic pressure calibrations were selected for the validation. First, the model is tested with all test section slots in open configuration while keeping the model's center of rotation on the tunnel centerline. In the next step, slots on the test section floor are sealed and the model is moved to a new center of rotation that is 33 inches below the tunnel centerline. Then, the original angle of attack sweeps are repeated. Afterwards, wall interference corrections are applied to both test data sets and response surface models of the resulting aerodynamic coefficients in interference-free flow are generated. Finally, the response surface models are used to predict the aerodynamic coefficients for a family of angles of attack while keeping dynamic pressure, Mach number, and Reynolds number constant. The validation is considered successful if the corrected aerodynamic coefficients obtained from the related response surface model pair show good agreement. Residual differences between the corrected coefficient sets will be analyzed as well because they are an indicator of the overall accuracy of the facility's wall interference correction process.
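A minimal sketch of the response-surface comparison step described above, assuming a low-order polynomial in angle of attack is an adequate surrogate for the response surface models; the lift-coefficient values below are synthetic placeholders, not tunnel data.

```python
# Hedged sketch: fit a quadratic response surface of a corrected coefficient vs.
# angle of attack for each test-section configuration, then compare predictions
# on a common sweep. All numbers are made up for illustration.
import numpy as np

alpha_open   = np.array([-2.0, 0.0, 2.0, 4.0, 6.0, 8.0])        # deg, slots open
cl_open      = np.array([-0.05, 0.15, 0.36, 0.55, 0.73, 0.88])  # corrected CL (placeholder)
alpha_sealed = np.array([-2.0, 0.0, 2.0, 4.0, 6.0, 8.0])        # deg, floor slots sealed
cl_sealed    = np.array([-0.04, 0.16, 0.37, 0.56, 0.74, 0.90])  # corrected CL (placeholder)

rs_open   = np.polynomial.Polynomial.fit(alpha_open, cl_open, deg=2)
rs_sealed = np.polynomial.Polynomial.fit(alpha_sealed, cl_sealed, deg=2)

alphas = np.linspace(-2.0, 8.0, 11)                # common family of angles of attack
max_diff = np.max(np.abs(rs_open(alphas) - rs_sealed(alphas)))
print(f"largest CL disagreement over the sweep: {max_diff:.3f}")
```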
USDA-ARS?s Scientific Manuscript database
The U.S. National Beef Cattle Evaluation Consortium (NBCEC) has been involved in the validation of commercial DNA tests for quantitative beef quality traits since their first appearance on the U.S. market in the early 2000s. The NBCEC Advisory Council initially requested that the NBCEC set up a syst...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choi, Yong Joon; Yoo, Jun Soo; Smith, Curtis Lee
2015-09-01
This INL plan comprehensively describes the Requirements Traceability Matrix (RTM) for the main physics and numerical methods of RELAP-7. The plan also describes the testing-based software verification and validation (SV&V) process—a set of specially designed software models used to test RELAP-7.
ERIC Educational Resources Information Center
Koziol, Natalie A.; Bovaird, James A.
2018-01-01
Evaluations of measurement invariance provide essential construct validity evidence--a prerequisite for seeking meaning in psychological and educational research and ensuring fair testing procedures in high-stakes settings. However, the quality of such evidence is partly dependent on the validity of the resulting statistical conclusions. Type I or…
2D-QSAR and 3D-QSAR Analyses for EGFR Inhibitors
Zhao, Manman; Zheng, Linfeng; Qiu, Chun
2017-01-01
Epidermal growth factor receptor (EGFR) is an important target for cancer therapy. In this study, EGFR inhibitors were investigated to build a two-dimensional quantitative structure-activity relationship (2D-QSAR) model and a three-dimensional quantitative structure-activity relationship (3D-QSAR) model. In the 2D-QSAR model, the support vector machine (SVM) classifier combined with the feature selection method was applied to predict whether a compound was an EGFR inhibitor. As a result, the prediction accuracy of the 2D-QSAR model was 98.99% by using tenfold cross-validation test and 97.67% by using independent set test. Then, in the 3D-QSAR model, the model with q2 = 0.565 (cross-validated correlation coefficient) and r2 = 0.888 (non-cross-validated correlation coefficient) was built to predict the activity of EGFR inhibitors. The mean absolute error (MAE) of the training set and test set was 0.308 log units and 0.526 log units, respectively. In addition, molecular docking was also employed to investigate the interaction between EGFR inhibitors and EGFR. PMID:28630865
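For readers unfamiliar with the validation protocol named above, here is a hedged sketch of an SVM classifier evaluated with tenfold cross-validation plus an independent hold-out set; descriptors and labels are random placeholders rather than real EGFR data.

```python
# Illustrative only: SVM classifier with tenfold cross-validation and an
# independent test set, mirroring the 2D-QSAR workflow. Data are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))          # molecular descriptors (placeholder)
y = rng.integers(0, 2, size=500)        # 1 = EGFR inhibitor, 0 = inactive (placeholder)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = SVC(kernel="rbf", C=1.0)
cv_acc = cross_val_score(clf, X_train, y_train, cv=10, scoring="accuracy")
test_acc = clf.fit(X_train, y_train).score(X_test, y_test)

print(f"tenfold CV accuracy = {cv_acc.mean():.3f}, independent test accuracy = {test_acc:.3f}")
```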
ERIC Educational Resources Information Center
Serim-Yildiz, Begum; Erdur-Baker, Ozgur
2013-01-01
The authors examined the cultural validity of Fear Survey Schedule for Children (FSSC-AM) developed by J. J. Burnham (2005) with Turkish children. The relationships between demographic variables and the level of fear were also tested. Three independent data sets were used. The first data set comprised 676 participants (321 women and 355 men) and…
High-Throughput Classification of Radiographs Using Deep Convolutional Neural Networks.
Rajkomar, Alvin; Lingam, Sneha; Taylor, Andrew G; Blum, Michael; Mongan, John
2017-02-01
The study aimed to determine if computer vision techniques rooted in deep learning can use a small set of radiographs to perform clinically relevant image classification with high fidelity. One thousand eight hundred eighty-five chest radiographs on 909 patients obtained between January 2013 and July 2015 at our institution were retrieved and anonymized. The source images were manually annotated as frontal or lateral and randomly divided into training, validation, and test sets. Training and validation sets were augmented to over 150,000 images using standard image manipulations. We then pre-trained a series of deep convolutional networks based on the open-source GoogLeNet with various transformations of the open-source ImageNet (non-radiology) images. These trained networks were then fine-tuned using the original and augmented radiology images. The model with highest validation accuracy was applied to our institutional test set and a publicly available set. Accuracy was assessed by using the Youden Index to set a binary cutoff for frontal or lateral classification. This retrospective study was IRB approved prior to initiation. A network pre-trained on 1.2 million greyscale ImageNet images and fine-tuned on augmented radiographs was chosen. The binary classification method correctly classified 100 % (95 % CI 99.73-100 %) of both our test set and the publicly available images. Classification was rapid, at 38 images per second. A deep convolutional neural network created using non-radiological images, and an augmented set of radiographs is effective in highly accurate classification of chest radiograph view type and is a feasible, rapid method for high-throughput annotation.
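The binary cutoff step can be made concrete with a small sketch: the Youden index (J = sensitivity + specificity - 1) is maximized over the ROC thresholds of the network's output scores. The scores below are simulated, not the study's predictions.

```python
# Sketch: pick a binary cutoff by maximizing Youden's J over ROC thresholds.
# Labels and scores are simulated placeholders.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=200)                      # 1 = frontal, 0 = lateral
scores = labels * 0.6 + rng.normal(0.2, 0.2, size=200)     # model "frontal" score

fpr, tpr, thresholds = roc_curve(labels, scores)
j = tpr - fpr                                              # Youden's J at each threshold
best_cutoff = thresholds[np.argmax(j)]

predicted_frontal = scores >= best_cutoff
print(f"chosen cutoff = {best_cutoff:.3f}, frontal calls = {predicted_frontal.sum()}")
```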
Rate My Attitude: Research Agendas and RateMyProfessor Scores
ERIC Educational Resources Information Center
Carlozzi, Michael
2018-01-01
The literature on student evaluations of teaching (SETs) generally presents two opposing camps: those who believe in the validity and usefulness of SETs, and those who do not. Some researchers have suggested that 'SET deniers' resist SETs because of their own poor SET results. To test this hypothesis, I analysed essays by 230 SET researchers (170…
Validity and validation of expert (Q)SAR systems.
Hulzebos, E; Sijm, D; Traas, T; Posthumus, R; Maslankiewicz, L
2005-08-01
At a recent workshop in Setubal (Portugal) principles were drafted to assess the suitability of (quantitative) structure-activity relationships ((Q)SARs) for assessing the hazards and risks of chemicals. In the present study we applied some of the Setubal principles to test the validity of three (Q)SAR expert systems and validate the results. These principles include a mechanistic basis, the availability of a training set and validation. ECOSAR, BIOWIN and DEREK for Windows have a mechanistic or empirical basis. ECOSAR has a training set for each QSAR. For half of the structural fragments the number of chemicals in the training set is >4. Based on structural fragments and log Kow, ECOSAR uses linear regression to predict ecotoxicity. Validating ECOSAR for three 'valid' classes results in predictivity of > or = 64%. BIOWIN uses (non-)linear regressions to predict the probability of biodegradability based on fragments and molecular weight. It has a large training set and predicts non-ready biodegradability well. DEREK for Windows predictions are supported by a mechanistic rationale and literature references. The structural alerts in this program have been developed with a training set of positive and negative toxicity data. However, to support the prediction only a limited number of chemicals in the training set is presented to the user. DEREK for Windows predicts effects by 'if-then' reasoning. The program predicts best for mutagenicity and carcinogenicity. Each structural fragment in ECOSAR and DEREK for Windows needs to be evaluated and validated separately.
A score to estimate the likelihood of detecting advanced colorectal neoplasia at colonoscopy
Kaminski, Michal F; Polkowski, Marcin; Kraszewska, Ewa; Rupinski, Maciej; Butruk, Eugeniusz; Regula, Jaroslaw
2014-01-01
Objective This study aimed to develop and validate a model to estimate the likelihood of detecting advanced colorectal neoplasia in Caucasian patients. Design We performed a cross-sectional analysis of database records for 40-year-old to 66-year-old patients who entered a national primary colonoscopy-based screening programme for colorectal cancer in 73 centres in Poland in the year 2007. We used multivariate logistic regression to investigate the associations between clinical variables and the presence of advanced neoplasia in a randomly selected test set, and confirmed the associations in a validation set. We used model coefficients to develop a risk score for detection of advanced colorectal neoplasia. Results Advanced colorectal neoplasia was detected in 2544 of the 35 918 included participants (7.1%). In the test set, a logistic-regression model showed that independent risk factors for advanced colorectal neoplasia were: age, sex, family history of colorectal cancer, cigarette smoking (p<0.001 for these four factors), and Body Mass Index (p=0.033). In the validation set, the model was well calibrated (ratio of expected to observed risk of advanced neoplasia: 1.00 (95% CI 0.95 to 1.06)) and had moderate discriminatory power (c-statistic 0.62). We developed a score that estimated the likelihood of detecting advanced neoplasia in the validation set, from 1.32% for patients scoring 0, to 19.12% for patients scoring 7–8. Conclusions Developed and internally validated score consisting of simple clinical factors successfully estimates the likelihood of detecting advanced colorectal neoplasia in asymptomatic Caucasian patients. Once externally validated, it may be useful for counselling or designing primary prevention studies. PMID:24385598
A score to estimate the likelihood of detecting advanced colorectal neoplasia at colonoscopy.
Kaminski, Michal F; Polkowski, Marcin; Kraszewska, Ewa; Rupinski, Maciej; Butruk, Eugeniusz; Regula, Jaroslaw
2014-07-01
This study aimed to develop and validate a model to estimate the likelihood of detecting advanced colorectal neoplasia in Caucasian patients. We performed a cross-sectional analysis of database records for 40-year-old to 66-year-old patients who entered a national primary colonoscopy-based screening programme for colorectal cancer in 73 centres in Poland in the year 2007. We used multivariate logistic regression to investigate the associations between clinical variables and the presence of advanced neoplasia in a randomly selected test set, and confirmed the associations in a validation set. We used model coefficients to develop a risk score for detection of advanced colorectal neoplasia. Advanced colorectal neoplasia was detected in 2544 of the 35,918 included participants (7.1%). In the test set, a logistic-regression model showed that independent risk factors for advanced colorectal neoplasia were: age, sex, family history of colorectal cancer, cigarette smoking (p<0.001 for these four factors), and Body Mass Index (p=0.033). In the validation set, the model was well calibrated (ratio of expected to observed risk of advanced neoplasia: 1.00 (95% CI 0.95 to 1.06)) and had moderate discriminatory power (c-statistic 0.62). We developed a score that estimated the likelihood of detecting advanced neoplasia in the validation set, from 1.32% for patients scoring 0, to 19.12% for patients scoring 7-8. Developed and internally validated score consisting of simple clinical factors successfully estimates the likelihood of detecting advanced colorectal neoplasia in asymptomatic Caucasian patients. Once externally validated, it may be useful for counselling or designing primary prevention studies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
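A hedged sketch of the general workflow described in the two entries above: fit a logistic model on a derivation half, form the linear predictor that a point score would be rounded from, then check calibration as the ratio of expected to observed risk and discrimination as the c-statistic in the validation half. Variables and coefficients are invented for illustration, not the published score.

```python
# Hedged sketch on synthetic data: logistic risk model, calibration (E/O) and
# discrimination (c-statistic) in a validation half.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n = 5000
X = np.column_stack([
    rng.integers(40, 67, n),        # age (years)
    rng.integers(0, 2, n),          # male sex
    rng.integers(0, 2, n),          # family history of colorectal cancer
    rng.integers(0, 2, n),          # current smoker
    rng.normal(27, 4, n),           # body mass index
])
true_logit = -6 + 0.05 * X[:, 0] + 0.7 * X[:, 1] + 0.5 * X[:, 2] + 0.6 * X[:, 3] + 0.02 * X[:, 4]
y = rng.random(n) < 1 / (1 + np.exp(-true_logit))

half = n // 2
model = LogisticRegression(max_iter=1000).fit(X[:half], y[:half])

# Linear predictor in the validation half; a point score would round these coefficients.
lp = X[half:] @ model.coef_[0] + model.intercept_[0]
p_hat = 1 / (1 + np.exp(-lp))

calibration_ratio = p_hat.mean() / y[half:].mean()   # expected / observed risk
c_statistic = roc_auc_score(y[half:], p_hat)
print(f"E/O = {calibration_ratio:.2f}, c-statistic = {c_statistic:.2f}")
```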
In silico prediction of ROCK II inhibitors by different classification approaches.
Cai, Chuipu; Wu, Qihui; Luo, Yunxia; Ma, Huili; Shen, Jiangang; Zhang, Yongbin; Yang, Lei; Chen, Yunbo; Wen, Zehuai; Wang, Qi
2017-11-01
ROCK II is an important pharmacological target linked to central nervous system disorders such as Alzheimer's disease. The purpose of this research is to generate ROCK II inhibitor prediction models by machine learning approaches. Firstly, four sets of descriptors were calculated with MOE 2010 and PaDEL-Descriptor, and optimized by F-score and linear forward selection methods. In addition, four classification algorithms were used to initially build 16 classifiers with k-nearest neighbors, naïve Bayes, random forest, and support vector machine. Furthermore, three sets of structural fingerprint descriptors were introduced to enhance the predictive capacity of classifiers, which were assessed with fivefold cross-validation, test set validation and external test set validation. The best two models, MFK + MACCS and MLR + SubFP, both have MCC values of 0.925 for the external test set. After that, a privileged substructure analysis was performed to reveal common chemical features of ROCK II inhibitors. Finally, binding modes were analyzed to identify relationships between molecular descriptors and activity, while main interactions were revealed by comparing the docking interaction of the most potent and the weakest ROCK II inhibitors. To the best of our knowledge, this is the first report on ROCK II inhibitors utilizing machine learning approaches that provides a new method for discovering novel ROCK II inhibitors.
ERIC Educational Resources Information Center
Walsh, Jennifer R.; Hebert, Angel; Byrd-Bredbenner, Carol; Carey, Gale; Colby, Sarah; Brown-Esters, Onikia N.; Greene, Geoffrey; Hoerr, Sharon; Horacek, Tanya; Kattelmann, Kendra; Kidd, Tandalayo; Koenings, Mallory; Phillips, Beatrice; Shelnutt, Karla P.; White, Adrienne A.
2012-01-01
Objective: To develop and test the validity of the Behavior, Environment, and Changeability Survey (BECS) for identifying the importance and changeability of nutrition, exercise, and stress management behavior and related aspects of the environment. Design: A cross-sectional, online survey of the BECS and selected validated instruments. Setting:…
Factor Structure and Validation of a Set of Readiness Measures.
ERIC Educational Resources Information Center
Kaufman, Maurice; Lynch, Mervin
A study was undertaken to identify the factor structure of a battery of readiness measures and to demonstrate the concurrent and predictive validity of one instrument in that battery--the Pre-Reading Screening Procedures (PSP). Concurrent validity was determined by examining the correlation of the PSP with the Metropolitan Readiness Test (MRT),…
Development of knowledge tests for multi-disciplinary emergency training: a review and an example.
Sørensen, J L; Thellesen, L; Strandbygaard, J; Svendsen, K D; Christensen, K B; Johansen, M; Langhoff-Roos, P; Ekelund, K; Ottesen, B; Van Der Vleuten, C
2015-01-01
The literature is sparse on written test development in a post-graduate multi-disciplinary setting. Developing and evaluating knowledge tests for use in multi-disciplinary post-graduate training is challenging. The objective of this study was to describe the process of developing and evaluating a multiple-choice question (MCQ) test for use in a multi-disciplinary training program in obstetric-anesthesia emergencies. A multi-disciplinary working committee with 12 members representing six professional healthcare groups and another 28 participants were involved. Recurrent revisions of the MCQ items were undertaken, followed by a statistical analysis. The MCQ items were developed stepwise, including decisions on aims and content, followed by testing for face and content validity, construct validity, item-total correlation, and reliability. To obtain acceptable content validity, 40 out of originally 50 items were included in the final MCQ test. The MCQ test was able to distinguish between levels of competence, and good construct validity was indicated by a significant difference in the mean score between consultants and first-year trainees, as well as between first-year trainees and medical and midwifery students. Evaluation of the item-total correlation analysis in the 40-item set revealed that 11 items needed re-evaluation, four of which addressed content issues in local clinical guidelines. A Cronbach's alpha of 0.83 for reliability was found, which is acceptable. Content and construct validity and reliability were acceptable. The presented template for the development of this MCQ test could be useful to others when developing knowledge tests and may enhance the overall quality of test development. © 2014 The Acta Anaesthesiologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
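Cronbach's alpha and corrected item-total correlations, the two item statistics used above, can be computed directly from a scored response matrix; a minimal sketch follows (random 0/1 answers, so alpha will be near zero rather than the reported 0.83).

```python
# Illustrative computation of Cronbach's alpha and corrected item-total
# correlations for a 0/1-scored MCQ matrix (rows = examinees, columns = items).
import numpy as np

rng = np.random.default_rng(3)
scores = rng.integers(0, 2, size=(120, 40)).astype(float)   # placeholder answers

k = scores.shape[1]
item_var = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_var.sum() / total_var)    # Cronbach's alpha

rest_total = scores.sum(axis=1, keepdims=True) - scores     # total score excluding each item
item_total_r = np.array(
    [np.corrcoef(scores[:, j], rest_total[:, j])[0, 1] for j in range(k)]
)
print(f"alpha = {alpha:.2f}, lowest corrected item-total r = {item_total_r.min():.2f}")
```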
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies
2010-01-01
Background All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. Results The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. Conclusions This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general. PMID:20144194
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies.
David, Maria Pamela C; Concepcion, Gisela P; Padlan, Eduardo A
2010-02-08
All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.
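A minimal sketch of the leave-one-out evaluation used in the two entries above, assuming binary sequence-derived features and a Bernoulli naive Bayes classifier; the feature matrix and labels are synthetic stand-ins.

```python
# Sketch: leave-one-out cross-validation of a naive Bayes classifier on
# synthetic binary features (e.g. residue/position indicators).
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(5)
X = rng.integers(0, 2, size=(143, 30))    # placeholder sequence features
y = rng.integers(0, 2, size=143)          # 1 = amyloidogenic, 0 = not (placeholder)

loo_acc = cross_val_score(BernoulliNB(), X, y, cv=LeaveOneOut(), scoring="accuracy")
print(f"LOO accuracy = {loo_acc.mean():.2%}")
```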
Mahmood, Eitezaz; Matyal, Robina; Mueller, Ariel; Mahmood, Feroze; Tung, Avery; Montealegre-Gallegos, Mario; Schermerhorn, Marc; Shahul, Sajid
2018-03-01
In some institutions, the current blood ordering practice does not discriminate minimally invasive endovascular aneurysm repair (EVAR) from open procedures, with consequent increasing costs and likelihood of blood product wastage for EVARs. This limitation in practice can possibly be addressed with the development of a reliable prediction model for transfusion risk in EVAR patients. We used the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) database to create a model for prediction of intraoperative blood transfusion occurrence in patients undergoing EVAR. Afterward, we tested our predictive model on the Vascular Study Group of New England (VSGNE) database. We used the ACS NSQIP database for patients who underwent EVAR from 2011 to 2013 (N = 4709) as our derivation set for identifying a risk index for predicting intraoperative blood transfusion. We then developed a clinical risk score and validated this model using patients who underwent EVAR from 2003 to 2014 in the VSGNE database (N = 4478). The transfusion rates were 8.4% and 6.1% for the ACS NSQIP (derivation set) and VSGNE (validation) databases, respectively. Hemoglobin concentration, American Society of Anesthesiologists class, age, and aneurysm diameter predicted blood transfusion in the derivation set. When it was applied on the validation set, our risk index demonstrated good discrimination in both the derivation and validation set (C statistic = 0.73 and 0.70, respectively) and calibration using the Hosmer-Lemeshow test (P = .27 and 0.31) for both data sets. We developed and validated a risk index for predicting the likelihood of intraoperative blood transfusion in EVAR patients. Implementation of this index may facilitate the blood management strategies specific for EVAR. Copyright © 2017 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
European Portuguese adaptation and validation of dilemmas used to assess moral decision-making.
Fernandes, Carina; Gonçalves, Ana Ribeiro; Pasion, Rita; Ferreira-Santos, Fernando; Paiva, Tiago Oliveira; Melo E Castro, Joana; Barbosa, Fernando; Martins, Isabel Pavão; Marques-Teixeira, João
2018-03-01
Objective To adapt and validate a widely used set of moral dilemmas to European Portuguese, which can be applied to assess decision-making. Moreover, the classical formulation of the dilemmas was compared with a more focused moral probe. Finally, a shorter version of the moral scenarios was tested. Methods The Portuguese version of the set of moral dilemmas was tested in 53 individuals from several regions of Portugal. In a second study, an alternative way of questioning on moral dilemmas was tested in 41 participants. Finally, the shorter version of the moral dilemmas was tested in 137 individuals. Results Results evidenced no significant differences between English and Portuguese versions. Also, asking whether actions are "morally acceptable" elicited less utilitarian responses than the original question, although without reaching statistical significance. Finally, all tested versions of moral dilemmas exhibited the same pattern of responses, suggesting that the fundamental elements to the moral decision-making were preserved. Conclusions We found evidence of cross-cultural validity for moral dilemmas. However, the moral focus might affect utilitarian/deontological judgments.
Focks, Andreas; Belgers, Dick; Boerwinkel, Marie-Claire; Buijse, Laura; Roessink, Ivo; Van den Brink, Paul J
2018-05-01
Exposure patterns in ecotoxicological experiments often do not match the exposure profiles for which a risk assessment needs to be performed. This limitation can be overcome by using toxicokinetic-toxicodynamic (TKTD) models for the prediction of effects under time-variable exposure. For the use of TKTD models in the environmental risk assessment of chemicals, it is required to calibrate and validate the model for specific compound-species combinations. In this study, the survival of macroinvertebrates after exposure to the neonicotinoid insecticide was modelled using TKTD models from the General Unified Threshold models of Survival (GUTS) framework. The models were calibrated on existing survival data from acute or chronic tests under static exposure regime. Validation experiments were performed for two sets of species-compound combinations: one set focussed on multiple species sensitivity to a single compound: imidacloprid, and the other set on the effects of multiple compounds for a single species, i.e., the three neonicotinoid compounds imidacloprid, thiacloprid and thiamethoxam, on the survival of the mayfly Cloeon dipterum. The calibrated models were used to predict survival over time, including uncertainty ranges, for the different time-variable exposure profiles used in the validation experiments. From the comparison between observed and predicted survival, it appeared that the accuracy of the model predictions was acceptable for four of five tested species in the multiple species data set. For compounds such as neonicotinoids, which are known to have the potential to show increased toxicity under prolonged exposure, the calibration and validation of TKTD models for survival needs to be performed ideally by considering calibration data from both acute and chronic tests.
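For orientation, a reduced GUTS stochastic-death (GUTS-RED-SD) model can be integrated in a few lines: scaled damage follows first-order kinetics toward the external concentration, and the hazard rate grows linearly once damage exceeds a threshold. The parameter values and the pulsed exposure profile below are illustrative, not calibrated values from the study.

```python
# Minimal GUTS-RED-SD sketch with illustrative parameters and a pulsed,
# time-variable exposure profile.
import numpy as np

kd, zw, bw, hb = 0.5, 2.0, 0.1, 0.01   # dominant rate constant, threshold, killing rate, background hazard

def exposure(t):                        # pulsed exposure profile (e.g. ug/L), illustrative
    return 10.0 if (t % 7.0) < 1.0 else 0.0

dt, t_end = 0.01, 21.0
times = np.arange(0.0, t_end, dt)
damage, cumulative_hazard = 0.0, 0.0
survival = []

for t in times:                          # simple Euler integration
    damage += kd * (exposure(t) - damage) * dt           # scaled damage kinetics
    hazard = bw * max(damage - zw, 0.0) + hb             # hazard above threshold
    cumulative_hazard += hazard * dt
    survival.append(np.exp(-cumulative_hazard))

print(f"predicted survival after {t_end:.0f} d: {survival[-1]:.2f}")
```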
Pittman, Joyce; Beeson, Terrie; Terry, Colin; Dillon, Jill; Hampton, Charity; Kerley, Denise; Mosier, Judith; Gumiela, Ellen; Tucker, Jessica
2016-01-01
Despite prevention strategies, hospital-acquired pressure ulcers (HAPUs) continue to occur in the acute care setting. The purpose of this study was to develop an operational definition of and an instrument for identifying avoidable/unavoidable HAPUs in the acute care setting. The Indiana University Health Pressure Ulcer Prevention Inventory (PUPI) was developed and psychometric testing was performed. A retrospective pilot study of 31 adult hospitalized patients with an HAPU was conducted using the PUPI. Overall content validity index of 0.99 and individual item content validity index scores (0.9-1.0) demonstrated excellent content validity. Acceptable PUPI criterion validity was demonstrated with no statistically significant differences between wound specialists' and other panel experts' scoring. Construct validity findings were acceptable with no statistically significant differences among avoidable or unavoidable HAPU patients and their Braden Scale total scores. Interrater reliability was acceptable with perfect agreement on the total PUPI score between raters (κ = 1.0; P = .025). Raters were in total agreement 93% (242/260) of the time on all 12 individual PUPI items. No risk factors were found to be significantly associated with unavoidable HAPUs. An operational definition of and an instrument for identifying avoidable/unavoidable HAPUs in the acute care setting were developed and tested. The instrument provides an objective and structured method for identifying avoidable/unavoidable HAPUs. The PUPI provides an additional method that could be used in root-cause analyses and when reporting adverse pressure ulcer events.
Nakagami, Katsuyuki; Yamauchi, Toyoaki; Noguchi, Hiroyuki; Maeda, Tohru; Nakagami, Tomoko
2014-06-01
This study aimed to develop a reliable and valid measure of functional health literacy in a Japanese clinical setting. Test development consisted of three phases: generation of an item pool, consultation with experts to assess content validity, and comparison with external criteria (the Japanese Health Knowledge Test) to assess criterion validity. A trial version of the test was administered to 535 Japanese outpatients. Internal consistency reliability, calculated by Cronbach's alpha, was 0.81, and concurrent validity was moderate. Receiver Operating Characteristics and Item Response Theory were used to classify patients as having adequate, marginal, or inadequate functional health literacy. Both inadequate and marginal functional health literacy were associated with older age, lower income, lower educational attainment, and poor health knowledge. The time required to complete the test was 10-15 min. This test should enable health workers to better identify patients with inadequate health literacy. © 2013 Wiley Publishing Asia Pty Ltd.
de Morton, Natalie A; Lane, Kylie
2010-11-01
To investigate the clinimetric properties of the de Morton Mobility Index (DEMMI) in a Geriatric Evaluation and Management (GEM) population. A longitudinal validation study (n = 100) and inter-rater reliability study (n = 29) in a GEM population. Consecutive patients admitted to a GEM rehabilitation ward were eligible for inclusion. At hospital admission and discharge, a physical therapist assessed patients with physical performance instruments that included the 6-metre walk test, step test, Clinical Test of Sensory Organization and Balance, Timed Up and Go test, 6-minute walk test and the DEMMI. Consecutively eligible patients were included in an inter-rater reliability study between physical therapists. DEMMI admission scores were normally distributed (mean 30.2, standard deviation 16.7) and other activity limitation instruments had either a floor or a ceiling effect. Evidence of convergent, discriminant and known groups validity for the DEMMI were obtained. The minimal detectable change with 90% confidence was 10.5 (95% confidence interval 6.1-17.9) points and the minimally clinically important difference was 8.4 points on the 100-point interval DEMMI scale. The DEMMI provides clinicians with an accurate and valid method of measuring mobility for geriatric patients in the subacute hospital setting.
Validation of a short-term memory test for the recognition of people and faces.
Leyk, D; Sievert, A; Heiss, A; Gorges, W; Ridder, D; Alexander, T; Wunderlich, M; Ruther, T
2008-08-01
Memorising and processing faces is a short-term memory dependent task of utmost importance in the security domain, in which constant and high performance is a must. Especially in access or passport control-related tasks, the timely identification of performance decrements is essential, margins of error are narrow and inadequate performance may have grave consequences. However, conventional short-term memory tests frequently use abstract settings with little relevance to working situations. They may thus be unable to capture task-specific decrements. The aim of the study was to devise and validate a new test, better reflecting job specifics and employing appropriate stimuli. After 1.5 s (short) or 4.5 s (long) presentation, a set of seven portraits of faces had to be memorised for comparison with two control stimuli. Stimulus appearance followed 2 s (first item) and 8 s (second item) after set presentation. Twenty eight subjects (12 male, 16 female) were tested at seven different times of day, 3 h apart. Recognition rates were above 60% even for the least favourable condition. Recognition was significantly better in the 'long' condition (+10%) and for the first item (+18%). Recognition time showed significant differences (10%) between items. Minor effects of learning were found for response latencies only. Based on occupationally relevant metrics, the test displayed internal and external validity, consistency and suitability for further use in test/retest scenarios. In public security, especially where access to restricted areas is monitored, margins of error are narrow and operator performance must remain high and level. Appropriate schedules for personnel, based on valid test results, are required. However, task-specific data and performance tests, permitting the description of task specific decrements, are not available. Commonly used tests may be unsuitable due to undue abstraction and insufficient reference to real-world conditions. Thus, tests are required that account for task-specific conditions and neurophysiological characteristics.
Zhang, Hui; Ren, Ji-Xia; Kang, Yan-Li; Bo, Peng; Liang, Jun-Yu; Ding, Lan; Kong, Wei-Bao; Zhang, Ji
2017-08-01
Toxicological testing associated with developmental toxicity endpoints are very expensive, time consuming and labor intensive. Thus, developing alternative approaches for developmental toxicity testing is an important and urgent task in the drug development filed. In this investigation, the naïve Bayes classifier was applied to develop a novel prediction model for developmental toxicity. The established prediction model was evaluated by the internal 5-fold cross validation and external test set. The overall prediction results for the internal 5-fold cross validation of the training set and external test set were 96.6% and 82.8%, respectively. In addition, four simple descriptors and some representative substructures of developmental toxicants were identified. Thus, we hope the established in silico prediction model could be used as alternative method for toxicological assessment. And these obtained molecular information could afford a deeper understanding on the developmental toxicants, and provide guidance for medicinal chemists working in drug discovery and lead optimization. Copyright © 2017 Elsevier Inc. All rights reserved.
van Dongen, Koen W; Ahlberg, Gunnar; Bonavina, Luigi; Carter, Fiona J; Grantcharov, Teodor P; Hyltander, Anders; Schijven, Marlies P; Stefani, Alessandro; van der Zee, David C; Broeders, Ivo A M J
2011-01-01
Virtual reality (VR) simulators have been demonstrated to improve basic psychomotor skills in endoscopic surgery. The exercise configuration settings used for validation in studies published so far are default settings or are based on the personal choice of the tutors. The purpose of this study was to establish consensus on exercise configurations and on a validated training program for a virtual reality simulator, based on the experience of international experts to set criterion levels to construct a proficiency-based training program. A consensus meeting was held with eight European teams, all extensively experienced in using the VR simulator. Construct validity of the training program was tested by 20 experts and 60 novices. The data were analyzed by using the t test for equality of means. Consensus was achieved on training designs, exercise configuration, and examination. Almost all exercises (7/8) showed construct validity. In total, 50 of 94 parameters (53%) showed significant difference. A European, multicenter, validated, training program was constructed according to the general consensus of a large international team with extended experience in virtual reality simulation. Therefore, a proficiency-based training program can be offered to training centers that use this simulator for training in basic psychomotor skills in endoscopic surgery.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pražnikar, Jure; University of Primorska,; Turk, Dušan, E-mail: dusan.turk@ijs.si
2014-12-01
The maximum-likelihood free-kick target, which calculates model error estimates from the work set and a randomly displaced model, proved superior in the accuracy and consistency of refinement of crystal structures compared with the maximum-likelihood cross-validation target, which calculates error estimates from the test set and the unperturbed model. The refinement of a molecular model is a computational procedure by which the atomic model is fitted to the diffraction data. The commonly used target in the refinement of macromolecular structures is the maximum-likelihood (ML) function, which relies on the assessment of model errors. The current ML functions rely on cross-validation. They utilize phase-error estimates that are calculated from a small fraction of diffraction data, called the test set, that are not used to fit the model. An approach has been developed that uses the work set to calculate the phase-error estimates in the ML refinement from simulating the model errors via the random displacement of atomic coordinates. It is called ML free-kick refinement as it uses the ML formulation of the target function and is based on the idea of freeing the model from the model bias imposed by the chemical energy restraints used in refinement. This approach for the calculation of error estimates is superior to the cross-validation approach: it reduces the phase error and increases the accuracy of molecular models, is more robust, provides clearer maps and may use a smaller portion of data for the test set for the calculation of R-free or may leave it out completely.
Kapiriri, Lydia
2017-06-19
While there have been efforts to develop frameworks to guide healthcare priority setting, there has been limited focus on evaluation frameworks. Moreover, while the few frameworks identify quality indicators for successful priority setting, they do not provide the users with strategies to verify these indicators. Kapiriri and Martin (Health Care Anal 18:129-147, 2010) developed a framework for evaluating priority setting in low and middle income countries. This framework provides both parameters for successful priority setting and proposes means of their verification. Before its use in real-life contexts, this paper presents results from a validation process of the framework. The framework validation involved 53 policy makers and priority setting researchers at the global, national and sub-national levels (in Uganda). They were requested to indicate the relative importance of the proposed parameters as well as the feasibility of obtaining the related information. We also pilot tested the proposed means of verification. Almost all the respondents evaluated all the parameters, including the contextual factors, as 'very important'. However, some respondents at the global level did not rate 'presence of incentives to comply', 'reduced disagreements', 'increased public understanding', 'improved institutional accountability' and 'meeting the ministry of health objectives' as very important, which could be a reflection of their level of decision making. All the proposed means of verification were assessed as feasible with the exception of meeting observations, which would require an insider. These results were consistent with those obtained from the pilot testing. These findings are relevant to policy makers and researchers involved in priority setting in low and middle income countries. To the best of our knowledge, this is one of the few initiatives that has involved potential users of a framework (at the global level and in a low-income country) in its validation. The favorable validation of all the parameters at the national and sub-national levels implies that the framework has potential usefulness at those levels, as is. The parameters that were disputed at the global level necessitate further discussion when using the framework at that level. The next step is to use the validated framework in evaluating actual priority setting at the different levels.
Examining the Effectiveness of Test Accommodation Using DIF and a Mixture IRT Model
ERIC Educational Resources Information Center
Cho, Hyun-Jeong; Lee, Jaehoon; Kingston, Neal
2012-01-01
This study examined the validity of test accommodation in third-eighth graders using differential item functioning (DIF) and mixture IRT models. Two data sets were used for these analyses. With the first data set (N = 51,591) we examined whether item type (i.e., story, explanation, straightforward) or item features were associated with item…
Bauer, Lyndsey; McCaffrey, Robert J
2006-01-01
In forensic neuropsychological settings, maintaining test security has become critically important, especially in regard to symptom validity tests (SVTs). Coaching, which can entail providing patients or litigants with information about the cognitive sequelae of head injury, or teaching them test-taking strategies to avoid detection of symptom dissimulation has been examined experimentally in many research studies. Emerging evidence supports that coaching strategies affect psychological and neuropsychological test performance to differing degrees depending on the coaching paradigm and the tests administered. The present study sought to examine Internet coverage of SVTs because it is potentially another source of coaching, or information that is readily available. Google searches were performed on the Test of Memory Malingering, the Victoria Symptom Validity Test, and the Word Memory Test. Results indicated that there is a variable amount of information available about each test that could threaten test security and validity should inappropriately interested parties find it. Steps that could be taken to improve this situation and limitations to this exploration are discussed.
Measurement of latent cognitive abilities involved in concept identification learning.
Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Nock, Matthew K; Naifeh, James A; Heeringa, Steven; Ursano, Robert J; Stein, Murray B
2015-01-01
We used cognitive and psychometric modeling techniques to evaluate the construct validity and measurement precision of latent cognitive abilities measured by a test of concept identification learning: the Penn Conditional Exclusion Test (PCET). Item response theory parameters were embedded within classic associative- and hypothesis-based Markov learning models and were fitted to 35,553 Army soldiers' PCET data from the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Data were consistent with a hypothesis-testing model with multiple latent abilities-abstraction and set shifting. Latent abstraction ability was positively correlated with number of concepts learned, and latent set-shifting ability was negatively correlated with number of perseverative errors, supporting the construct validity of the two parameters. Abstraction was most precisely assessed for participants with abilities ranging from 1.5 standard deviations below the mean to the mean itself. Measurement of set shifting was acceptably precise only for participants making a high number of perseverative errors. The PCET precisely measures latent abstraction ability in the Army STARRS sample, especially within the range of mildly impaired to average ability. This precision pattern is ideal for a test developed to measure cognitive impairment as opposed to cognitive strength. The PCET also measures latent set-shifting ability, but reliable assessment is limited to the impaired range of ability, reflecting that perseverative errors are rare among cognitively healthy adults. Integrating cognitive and psychometric models can provide information about construct validity and measurement precision within a single analytical framework.
Nayana, M Ravi Shashi; Sekhar, Y Nataraja; Nandyala, Haritha; Muttineni, Ravikumar; Bairy, Santosh Kumar; Singh, Kriti; Mahmood, S K
2008-10-01
In the present study, a series of 179 quinoline and quinazoline heterocyclic analogues exhibiting inhibitory activity against gastric (H+/K+)-ATPase were investigated using the comparative molecular field analysis (CoMFA) and comparative molecular similarity indices (CoMSIA) methods. Both the models exhibited good correlation between the calculated 3D-QSAR fields and the observed biological activity for the respective training set compounds. The most optimal CoMFA and CoMSIA models yielded significant leave-one-out cross-validation coefficients, q2, of 0.777 and 0.744 and conventional correlation coefficients, r2, of 0.927 and 0.914, respectively. The predictive ability of the generated models was tested on a set of 52 compounds having a broad range of activity. CoMFA and CoMSIA yielded predicted activities for test set compounds with r2pred of 0.893 and 0.917, respectively. These validation tests not only revealed the robustness of the models but also demonstrated that for our models r2pred based on the mean activity of test set compounds can accurately estimate external predictivity. The factors affecting activity were analyzed carefully according to standard coefficient contour maps of steric, electrostatic, hydrophobic, acceptor and donor fields derived from the CoMFA and CoMSIA. These contour plots identified several key features which explain the wide range of activities. The results obtained from the models offer important structural insight into designing novel peptic-ulcer inhibitors prior to their synthesis.
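The external predictivity statistic discussed above can be written as r2pred = 1 - PRESS/SS_ref, where PRESS is the sum of squared prediction errors on the test set and SS_ref is the sum of squared deviations of the observed test-set activities from a reference mean (the study argues for the test-set mean rather than the conventional training-set mean). A small sketch with made-up activities:

```python
# Sketch of external predictive r^2 with a configurable reference mean.
# Observed and predicted pIC50 values are invented for illustration.
import numpy as np

y_obs  = np.array([6.1, 7.3, 5.8, 8.0, 6.9])   # observed test-set activities (placeholder)
y_pred = np.array([6.4, 7.0, 5.5, 7.7, 7.1])   # model predictions (placeholder)

def r2_pred(y_obs, y_pred, y_ref):
    press = np.sum((y_obs - y_pred) ** 2)       # prediction error sum of squares
    ss_ref = np.sum((y_obs - y_ref) ** 2)       # deviation from the reference mean
    return 1.0 - press / ss_ref

print(f"r2_pred (test-set mean) = {r2_pred(y_obs, y_pred, y_obs.mean()):.3f}")
```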
Novel naïve Bayes classification models for predicting the chemical Ames mutagenicity.
Zhang, Hui; Kang, Yan-Li; Zhu, Yuan-Yuan; Zhao, Kai-Xia; Liang, Jun-Yu; Ding, Lan; Zhang, Teng-Guo; Zhang, Ji
2017-06-01
Prediction of drug candidates for mutagenicity is a regulatory requirement since mutagenic compounds could pose a toxic risk to humans. The aim of this investigation was to develop a novel prediction model of mutagenicity by using a naïve Bayes classifier. The established model was validated by the internal 5-fold cross validation and external test sets. For comparison, a recursive partitioning classifier prediction model was also established and various other reported prediction models of mutagenicity were collected. Among these methods, the prediction performance of the naïve Bayes classifier established here was good and stable, yielding average overall prediction accuracies of 89.1±0.4% for the internal 5-fold cross validation of the training set and 77.3±1.5% for external test set I. The concordance of the external test set II with 446 marketed drugs was 90.9±0.3%. In addition, four simple molecular descriptors (e.g., Apol, No. of H donors, Num-Rings and Wiener) related to mutagenicity and five representative substructures of mutagens (e.g., aromatic nitro, hydroxyl amine, nitroso, aromatic amine and N-methyl-N-methylenemethanaminum) produced by ECFP_14 fingerprints were identified. We hope the established naïve Bayes prediction model can be applied to risk assessment processes, and that the information obtained on mutagenic chemicals can guide the design of chemical libraries for hit and lead optimization. Copyright © 2017 Elsevier B.V. All rights reserved.
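A rough open-source analogue of the fingerprint-plus-naïve-Bayes workflow above, assuming RDKit Morgan (ECFP-like) bit vectors in place of the ECFP_14 fingerprints named in the abstract; the SMILES strings and labels are placeholders, not the study's data set.

```python
# Rough analogue (not the study's pipeline): Morgan/ECFP-style bit vectors from
# RDKit plus a Bernoulli naive Bayes classifier, evaluated by cross-validation.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

smiles = ["c1ccccc1N", "CCO", "c1ccc2ccccc2c1", "CC(=O)Nc1ccc(O)cc1"]   # made-up molecules
labels = np.array([1, 0, 1, 0])                                         # 1 = mutagenic (made up)

def fingerprint(smi, n_bits=1024):
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)     # radius 2 ~ ECFP4
    return [int(b) for b in fp.ToBitString()]

X = np.array([fingerprint(s) for s in smiles])
acc = cross_val_score(BernoulliNB(), X, labels, cv=2, scoring="accuracy")
print(f"cross-validated accuracy = {acc.mean():.2f}")
```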
Esteban, Santiago; Rodríguez Tablado, Manuel; Peper, Francisco; Mahumud, Yamila S; Ricci, Ricardo I; Kopitowski, Karin; Terrasa, Sergio
2017-01-01
Precision medicine requires extremely large samples. Electronic health records (EHR) are thought to be a cost-effective source of data for that purpose. Phenotyping algorithms help reduce classification errors, making EHR a more reliable source of information for research. Four algorithm development strategies for classifying patients according to their diabetes status (diabetics; non-diabetics; inconclusive) were tested (one codes-only algorithm; one boolean algorithm, four statistical learning algorithms and six stacked generalization meta-learners). The best performing algorithms within each strategy were tested on the validation set. The stacked generalization algorithm yielded the highest Kappa coefficient value in the validation set (0.95 95% CI 0.91, 0.98). The implementation of these algorithms allows for the exploitation of data from thousands of patients accurately, greatly reducing the costs of constructing retrospective cohorts for research.
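A hedged sketch of a stacked-generalization phenotyping classifier of the kind described above: out-of-fold predictions from base learners feed a logistic-regression meta-learner, and agreement with reference labels is summarized with Cohen's kappa. The EHR-style features and three-class labels are synthetic stand-ins.

```python
# Sketch: stacked generalization for a three-class phenotype
# (diabetic / non-diabetic / inconclusive) on synthetic EHR-like features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(11)
X = rng.integers(0, 2, size=(2000, 40))        # e.g. code, medication, and lab flags
y = rng.integers(0, 3, size=2000)              # 0/1/2 = phenotype classes (placeholder)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=11)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=11)),
                ("nb", BernoulliNB())],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                                      # out-of-fold predictions for the meta-learner
)
stack.fit(X_tr, y_tr)
kappa = cohen_kappa_score(y_val, stack.predict(X_val))
print(f"validation kappa = {kappa:.2f}")
```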
Tóth, Gergely; Bodai, Zsolt; Héberger, Károly
2013-10-01
The coefficient of determination (R(2)) and its leave-one-out cross-validated analogue (denoted by Q(2) or Rcv(2)) are the most frequently published values used to characterize the predictive performance of models. In this article we use R(2) and Q(2) in a reversed aspect to detect uncommon points, i.e. influential points, in any data set. The term (1 - Q(2))/(1 - R(2)) corresponds to the ratio of the predictive residual sum of squares to the residual sum of squares. This ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the (1 - Q(2))/(1 - R(2)) term to quickly pre-estimate the presence of influential points in the training sets of models. The test is founded upon the routinely calculated Q(2) and R(2) values and warns model builders to verify the training set, to perform influence analysis or even to change to robust modeling.
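The identity (1 - Q(2))/(1 - R(2)) = PRESS/RSS used above can be verified numerically. The minimal Python sketch below, with arbitrary stand-in data, computes R(2), the leave-one-out Q(2), and the ratio; the authors' approximate F test would then be applied to this ratio (its exact degrees of freedom follow the article and are not reproduced here).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Illustrative data; in practice these are the model's training responses/predictors.
rng = np.random.default_rng(2)
X, y = rng.normal(size=(40, 3)), rng.normal(size=40)

fit = LinearRegression().fit(X, y)
rss = np.sum((y - fit.predict(X)) ** 2)            # residual sum of squares
y_loo = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
press = np.sum((y - y_loo) ** 2)                   # predictive residual sum of squares

tss = np.sum((y - y.mean()) ** 2)
R2, Q2 = 1 - rss / tss, 1 - press / tss
ratio = (1 - Q2) / (1 - R2)                        # equals PRESS / RSS
print(R2, Q2, ratio)
```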
DESCQA: An Automated Validation Framework for Synthetic Sky Catalogs
NASA Astrophysics Data System (ADS)
Mao, Yao-Yuan; Kovacs, Eve; Heitmann, Katrin; Uram, Thomas D.; Benson, Andrew J.; Campbell, Duncan; Cora, Sofía A.; DeRose, Joseph; Di Matteo, Tiziana; Habib, Salman; Hearin, Andrew P.; Bryce Kalmbach, J.; Krughoff, K. Simon; Lanusse, François; Lukić, Zarija; Mandelbaum, Rachel; Newman, Jeffrey A.; Padilla, Nelson; Paillas, Enrique; Pope, Adrian; Ricker, Paul M.; Ruiz, Andrés N.; Tenneti, Ananth; Vega-Martínez, Cristian A.; Wechsler, Risa H.; Zhou, Rongpu; Zu, Ying; The LSST Dark Energy Science Collaboration
2018-02-01
The use of high-quality simulated sky catalogs is essential for the success of cosmological surveys. The catalogs have diverse applications, such as investigating signatures of fundamental physics in cosmological observables, understanding the effect of systematic uncertainties on measured signals and testing mitigation strategies for reducing these uncertainties, aiding analysis pipeline development and testing, and survey strategy optimization. The list of applications is growing with improvements in the quality of the catalogs and the details that they can provide. Given the importance of simulated catalogs, it is critical to provide rigorous validation protocols that enable both catalog providers and users to assess the quality of the catalogs in a straightforward and comprehensive way. For this purpose, we have developed the DESCQA framework for the Large Synoptic Survey Telescope Dark Energy Science Collaboration as well as for the broader community. The goal of DESCQA is to enable the inspection, validation, and comparison of an inhomogeneous set of synthetic catalogs via the provision of a common interface within an automated framework. In this paper, we present the design concept and first implementation of DESCQA. In order to establish and demonstrate its full functionality we use a set of interim catalogs and validation tests. We highlight several important aspects, both technical and scientific, that require thoughtful consideration when designing a validation framework, including validation metrics and how these metrics impose requirements on the synthetic sky catalogs.
NASA Technical Reports Server (NTRS)
Sutliff, Daniel L.
2014-01-01
The NASA Glenn Research Center's Advanced Noise Control Fan (ANCF) was developed in the early 1990s to provide a convenient test bed to measure and understand fan-generated acoustics, duct propagation, and radiation to the farfield. A series of tests were performed primarily for the use of code validation and tool validation. Rotating Rake mode measurements were acquired for parametric sets of: (i) mode blockage, (ii) liner insertion loss, (iii) short ducts, and (iv) mode reflection.
ERIC Educational Resources Information Center
St. Clair, Travis; Hallberg, Kelly; Cook, Thomas D.
2016-01-01
We explore the conditions under which short, comparative interrupted time-series (CITS) designs represent valid alternatives to randomized experiments in educational evaluations. To do so, we conduct three within-study comparisons, each of which uses a unique data set to test the validity of the CITS design by comparing its causal estimates to…
ERIC Educational Resources Information Center
Boonstra, Anne M.; Reneman, Michiel F.; Stewart, Roy E.; Balk, Gerlof A.
2012-01-01
The aim of this study was to determine the reliability and discriminant validity of the Dutch version of the life satisfaction questionnaire (Lisat-9 DV) to assess patients with an acquired brain injury. The reliability study used a test-retest design, and the validity study used a cross-sectional design. The setting was the general rehabilitation…
Development of a Valid Volleyball Skills Test Battery.
ERIC Educational Resources Information Center
Bartlett, Jackie; And Others
1991-01-01
Describes the development of the North Carolina State University Volleyball Skills Test Battery which offers accurate measurement of three volleyball skills (serve, forearm pass, and set). When physical educators tested 313 students, the battery objectively measured their abilities, providing a gamelike means of teaching, testing, grouping, and…
Akerman, Eva; Fridlund, Bengt; Ersson, Anders; Granberg-Axéll, Anetth
2009-04-01
Current studies reveal a lack of consensus on the evaluation of physical and psychosocial problems after an ICU stay and their changes over time. The aim was to develop and evaluate the validity and reliability of a questionnaire for assessing physical and psychosocial problems over time in patients following ICU recovery. Thirty-nine patients completed the questionnaire, and 17 were retested. The questionnaire was constructed in three sets: physical problems, psychosocial problems and follow-up care. Face and content validity were tested by nurses, researchers and patients. The questionnaire showed good construct validity, with strong factor loadings (explained variance >70%, factor loadings >0.5) in all three sets. There was good concurrent validity compared with the SF-12 (r(s)>0.5). Internal consistency was shown to be reliable (Cronbach's alpha 0.70-0.85). Stability reliability on retesting was good for the physical and psychosocial sets (r(s)>0.5). The 3-set 4P questionnaire is a first step in developing an instrument for assessing former ICU patients' problems over time. The sample size was small and thus further studies are needed to confirm these findings.
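Internal consistency of the kind reported here (Cronbach's alpha 0.70-0.85) is a simple function of the item and total score variances. The following minimal Python sketch computes it for a hypothetical respondents-by-items score matrix; the data are simulated stand-ins, not the 3-set 4P responses.

```python
import numpy as np

def cronbach_alpha(items):
    """items: respondents x items matrix of scores for one questionnaire set."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Illustrative responses for a hypothetical "physical problems" set (39 patients, 6 items).
rng = np.random.default_rng(4)
latent = rng.normal(size=(39, 1))
scores = latent + rng.normal(scale=0.7, size=(39, 6))
print(cronbach_alpha(scores))
```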
Psychometric evaluation of 3-set 4P questionnaire.
Akerman, Eva; Fridlund, Bengt; Samuelson, Karin; Baigi, Amir; Ersson, Anders
2013-02-01
This is a further development of a specific questionnaire, the 3-set 4P, to be used for measuring former ICU patients' physical and psychosocial problems after intensive care and the need for follow-up. The aim was to psychometrically test and evaluate the 3-set 4P questionnaire in a larger population. The questionnaire consists of three sets: "physical", "psychosocial" and "follow-up". The questionnaires were sent by mail to all patients with more than a 24-hour length of stay on four ICUs in Sweden. Construct validity was measured with exploratory factor analysis with Varimax rotation. This resulted in three factors for the "physical" set, five factors for the "psychosocial" set and four factors for the "follow-up" set, with strong factor loadings and a total explained variance of 62-77.5%. Thirteen questions from the SF-36 were used for concurrent validity, showing Spearman's r(s) of 0.3-0.6 for eight questions and less than 0.2 for five. Test-retest was used for stability reliability. In the follow-up set the correlation was strong to moderate, and in the physical and psychosocial sets the correlations were moderate to fair. This may have been because physical and psychosocial status changed rapidly during the test period. All three sets had good homogeneity. In conclusion, the 3-set 4P showed overall acceptable results, but it has to be further modified in different cultures before being considered a fully operational instrument for use in clinical practice. Copyright © 2012 Elsevier Ltd. All rights reserved.
Prediction of breast cancer risk with volatile biomarkers in breath.
Phillips, Michael; Cataneo, Renee N; Cruz-Ramos, Jose Alfonso; Huston, Jan; Ornelas, Omar; Pappas, Nadine; Pathak, Sonali
2018-03-23
Human breath contains volatile organic compounds (VOCs) that are biomarkers of breast cancer. We investigated the positive and negative predictive values (PPV and NPV) of breath VOC biomarkers as indicators of breast cancer risk. We employed ultra-clean breath collection balloons to collect breath samples from 54 women with biopsy-proven breast cancer and 124 cancer-free controls. Breath VOCs were analyzed with gas chromatography (GC) combined with either mass spectrometry (GC MS) or surface acoustic wave detection (GC SAW). Chromatograms were randomly assigned to a training set or a validation set. Monte Carlo analysis identified significant breath VOC biomarkers of breast cancer in the training set, and these biomarkers were incorporated into a multivariate algorithm to predict disease in the validation set. In the unsplit dataset, the predictive algorithms generated discriminant function (DF) values that varied with sensitivity, specificity, PPV and NPV. Using GC MS, test accuracy = 90% (area under curve of receiver operating characteristic in unsplit dataset) and cross-validated accuracy = 77%. Using GC SAW, test accuracy = 86% and cross-validated accuracy = 74%. With both assays, a low DF value was associated with a low risk of breast cancer (NPV > 99.9%). A high DF value was associated with a high risk of breast cancer and PPV rising to 100%. Analysis of breath VOC samples collected with ultra-clean balloons detected biomarkers that accurately predicted risk of breast cancer.
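The PPV and NPV figures described above follow directly from a threshold on the discriminant function values. The short Python sketch below, with simulated stand-in DF values and labels (54 cancers and 124 controls, matching the sample sizes quoted in the abstract), shows how the two quantities are computed at a chosen threshold.

```python
import numpy as np

def ppv_npv(df_values, labels, threshold):
    """PPV and NPV when samples with a discriminant-function value at or above
    `threshold` are called positive. `labels` are 1 = cancer, 0 = control."""
    pred = (df_values >= threshold).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    tn = np.sum((pred == 0) & (labels == 0))
    fn = np.sum((pred == 0) & (labels == 1))
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv

# Illustrative values only; the study's DF values come from its VOC algorithm.
rng = np.random.default_rng(3)
labels = np.r_[np.ones(54), np.zeros(124)].astype(int)
df_values = rng.normal(loc=labels, scale=1.0)  # higher DF for cancer cases
print(ppv_npv(df_values, labels, threshold=1.0))
```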
ERIC Educational Resources Information Center
Cory, Charles H.
This report presents data concerning the validity of a set of experimental computerized and paper-and-pencil tests for measures of on-job performance on global and job elements. It reports on the usefulness of 30 experimental and operational variables for predicting marks on 42 job elements and on a global criterion for Electrician's Mate,…
ERIC Educational Resources Information Center
Hobayan, Kalani; Patterson, Debra; Sherman, Clay; Wiersma, Lenny
2014-01-01
In a society in which obesity levels have tripled in the past 30 years, the importance of increased fitness levels within the academic setting has become even more critical. The purpose of this study was to investigate the validity of alternative Fitnessgram upper body tests of muscular strength and endurance among seventh and eighth grade males…
Development and initial validation of the appropriate antibiotic use self-efficacy scale.
Hill, Erin M; Watkins, Kaitlin
2018-06-04
Although various medication self-efficacy scales exist, none assesses self-efficacy for appropriate antibiotic use. The Appropriate Antibiotic Use Self-Efficacy Scale (AAUSES) was developed and pilot tested, and its psychometric properties were examined. Following pilot testing of the scale, a 28-item questionnaire was examined using a sample (n = 289) recruited through the Amazon Mechanical Turk platform. Participants also completed other scales and items, which were used in assessing discriminant, convergent, and criterion-related validity. Test-retest reliability was also examined. After examining the scale and removing items that did not assess appropriate antibiotic use, an exploratory factor analysis was conducted on 13 items from the original scale. Three factors were retained that explained 65.51% of the variance. The scale and its subscales had adequate internal consistency. The scale had excellent test-retest reliability and demonstrated convergent, discriminant, and criterion-related validity. The AAUSES is a valid and reliable scale that assesses three domains of appropriate antibiotic use self-efficacy. The AAUSES may have utility in clinical and research settings for understanding individuals' beliefs about appropriate antibiotic use and related behavioral correlates. Future research is needed to examine the scale's utility in these settings. Copyright © 2018 Elsevier B.V. All rights reserved.
2014-01-01
Background A balance test provides important information such as the standard to judge an individual’s functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Methods Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). Results The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. Conclusion The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment. PMID:24912769
Park, Dae-Sung; Lee, GyuChang
2014-06-10
A balance test provides important information such as the standard to judge an individual's functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment.
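Measurement error statistics such as the SEMs quoted above are commonly derived from the reliability coefficient as SEM = SD x sqrt(1 - ICC). The minimal Python sketch below applies this relation to illustrative COP path-length values; both the values and the ICC plugged in are stand-ins, and the study's own SEMs come from its measured data.

```python
import numpy as np

def standard_error_of_measurement(scores, icc):
    """SEM = SD * sqrt(1 - ICC), using the sample SD of the observed scores."""
    return np.std(scores, ddof=1) * np.sqrt(1 - icc)

# Illustrative COP path-length measurements (cm) and a reported ICC.
path_lengths = np.array([52.1, 48.3, 60.7, 45.9, 55.0, 49.2])
print(standard_error_of_measurement(path_lengths, icc=0.89))
```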
Ribeiro, João Carlos; Simões, João; Silva, Filipe; Silva, Eduardo D.; Hummel, Cornelia; Hummel, Thomas; Paiva, António
2016-01-01
The cross-cultural adaptation and validation of the Sniffin' Sticks test for the Portuguese population is described. Over 270 people participated in four experiments. In Experiment 1, 67 participants rated the familiarity of presented odors, and seven descriptors of the original test were adapted to a Portuguese context. In Experiment 2, the Portuguese version of the Sniffin' Sticks test was administered to 203 healthy participants. Older age, male gender and active smoking status were confirmed as confounding factors. The third experiment showed the validity of the Portuguese version of the Sniffin' Sticks test in discriminating healthy controls from patients with olfactory dysfunction. In Experiment 4, the test-retest reliability of both the composite score (r71 = 0.86) and the identification test (r71 = 0.62) was established (p<0.001). Normative data for the Portuguese version of the Sniffin' Sticks test are provided, showing good validity and reliability and effectively distinguishing patients from healthy controls with high sensitivity and specificity. The Portuguese version of the Sniffin' Sticks identification test is a clinically suitable screening tool in routine outpatient Portuguese settings. PMID:26863023
NASA Astrophysics Data System (ADS)
Zhu, Xiaoliang; Du, Li; Liu, Bendong; Zhe, Jiang
2016-06-01
We present a method based on an electrochemical sensor array and a back-propagation artificial neural network for detection and quantification of four properties of lubrication oil, namely water (0, 500 ppm, 1000 ppm), total acid number (TAN) (13.1, 13.7, 14.4, 15.6 mg KOH g-1), soot (0, 1%, 2%, 3%) and sulfur content (1.3%, 1.37%, 1.44%, 1.51%). The sensor array, consisting of four micromachined electrochemical sensors, detects the four properties with overlapping sensitivities. A total of 36 oil samples containing mixtures of water, soot, and sulfuric acid at different concentrations were prepared for testing. The sensor array's responses were then divided into three sets: a training set (80% of the data), a validation set (10%) and a testing set (10%). Several back-propagation artificial neural network architectures were trained with the training and validation sets; one architecture with four input neurons, 50 and 5 neurons in the first and second hidden layers, and four neurons in the output layer was selected. The selected neural network was then tested using the four sets of testing data (10%). Test results demonstrated that the developed artificial neural network is able to quantitatively determine the four lubrication properties (water, TAN, soot, and sulfur content) with maximum prediction errors of 18.8%, 6.0%, 6.7%, and 5.4%, respectively, indicating a good match between the target and predicted values. With the developed network, the sensor array could potentially be used for online lubricant oil condition monitoring.
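A network with the architecture described (four inputs, hidden layers of 50 and 5 neurons, four outputs) can be sketched in Python with scikit-learn's MLPRegressor, which trains by back-propagation. The responses and property values below are random stand-ins for the 36 samples, and the 80/10/10 split mirrors the one in the abstract; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Illustrative stand-in for the 36 sensor-array responses (4 sensors) and the
# 4 oil properties (water, TAN, soot, sulfur); real values come from the tests.
rng = np.random.default_rng(5)
X = rng.normal(size=(36, 4))
Y = X @ rng.normal(size=(4, 4)) + 0.05 * rng.normal(size=(36, 4))

# 80/10/10 split into training, validation, and testing sets.
X_tr, X_tmp, Y_tr, Y_tmp = train_test_split(X, Y, test_size=0.2, random_state=0)
X_val, X_te, Y_val, Y_te = train_test_split(X_tmp, Y_tmp, test_size=0.5, random_state=0)

# Back-propagation network with two hidden layers of 50 and 5 neurons.
net = MLPRegressor(hidden_layer_sizes=(50, 5), max_iter=5000, random_state=0)
net.fit(X_tr, Y_tr)
print("validation R2:", net.score(X_val, Y_val), "test R2:", net.score(X_te, Y_te))
```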
Brett, Benjamin L; Solomon, Gary S
2017-04-01
Research findings to date on the stability of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) Composite scores have been inconsistent, requiring further investigation. The use of test validity criteria across these studies has also been inconsistent. Using multiple measures of stability, we examined test-retest reliability of repeated ImPACT baseline assessments in high school athletes across various validity criteria reported in previous studies. A total of 1146 high school athletes completed baseline cognitive testing using the online ImPACT test battery at two time points approximately two years apart. No participant sustained a concussion between assessments. Five forms of validity criteria used in previous test-retest studies were applied to the data, and differences in reliability were compared. Intraclass correlation coefficients (ICCs) for composite scores ranged from .47 (95% confidence interval, CI [.38, .54]) to .83 (95% CI [.81, .85]) and showed little change across the two-year interval for all five sets of validity criteria. Regression based methods (RBMs) examining test-retest stability demonstrated a lack of significant change in composite scores across the two-year interval for all forms of validity criteria, with no cases falling outside the expected range of 90% confidence intervals. The application of more stringent validity criteria does not alter test-retest reliability, nor does it account for some of the variation observed across previously performed studies. As such, the ImPACT manual validity criteria should be used in the determination of test validity and in the individualized approach to concussion management. Potential future efforts to improve test-retest reliability are discussed.
Vadlin, Sofia; Åslund, Cecilia; Nilsson, Kent W
2015-08-01
This study describes the development of a screening tool for gaming addiction in adolescents - the Gaming Addiction Identification Test (GAIT). Its development was based on the research literature on gaming and addiction. An expert panel comprising professional raters (n = 7), experiential adolescent raters (n = 10), and parent raters (n = 10) estimated the content validity of each item (I-CVI) as well as of the whole scale (S-CVI/Ave), and participated in a cognitive interview about the GAIT scale. The mean scores for both I-CVI and S-CVI/Ave ranged between 0.97 and 0.99 compared with the lowest recommended I-CVI value of 0.78 and the S-CVI/Ave value of 0.90. There were no sex differences and no differences between expert groups regarding ratings in content validity. No differences in the overall evaluation of the scale emerged in the cognitive interviews. Our conclusions were that GAIT showed good content validity in capturing gaming addiction. The GAIT needs further investigation into its psychometric properties of construct validity (convergent and divergent validity) and criterion-related validity, as well as its reliability in both clinical settings and in community settings with adolescents. © 2015 Scandinavian Psychological Associations and John Wiley & Sons Ltd.
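Content validity indices of the sort reported (I-CVI per item and S-CVI/Ave for the scale) are simple proportions of raters judging an item relevant. The Python sketch below computes them for a hypothetical raters-by-items rating matrix, assuming a 4-point relevance scale with ratings of 3 or 4 counted as relevant; the ratings are simulated, not the GAIT panel's.

```python
import numpy as np

def content_validity(ratings, relevant_threshold=3):
    """ratings: raters x items matrix on an assumed 4-point relevance scale.
    I-CVI = share of raters rating an item 3 or 4; S-CVI/Ave = mean of I-CVIs."""
    ratings = np.asarray(ratings)
    i_cvi = (ratings >= relevant_threshold).mean(axis=0)
    return i_cvi, i_cvi.mean()

# Illustrative ratings from 27 hypothetical raters (as many as the panel) on 15 items.
rng = np.random.default_rng(6)
ratings = rng.integers(2, 5, size=(27, 15))   # ratings of 2, 3, or 4
i_cvi, s_cvi_ave = content_validity(ratings)
print(i_cvi.round(2), round(s_cvi_ave, 2))
```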
Validation of US3D for Capsule Aerodynamics using 05-CA Wind Tunnel Test Data
NASA Technical Reports Server (NTRS)
Schwing, Alan
2012-01-01
Several comparisons of computational fluid dynamics to wind tunnel test data are shown for the purpose of code validation. The wind tunnel test, 05-CA, uses a 7.66% model of NASA's Multi-Purpose Crew Vehicle in the 11-foot test section of the Ames Unitary Plan Wind tunnel. A variety of freestream conditions over four Mach numbers and three angles of attack are considered. Test data comparisons include time-averaged integrated forces and moments, time-averaged static pressure ports on the surface, and Strouhal Number. The applicability of the US3D code to subsonic and transonic flow over a bluff body is assessed on a comprehensive data set. With close comparison, this work validates US3D for highly separated flows similar to those examined here.
16 CFR 1500.41 - Method of testing primary irritant substances.
Code of Federal Regulations, 2014 CFR
2014-01-01
... corrosivity properties of substances, including testing that does not require animals, are presented in the CPSC's animal testing policy set forth in 16 CFR 1500.232. A weight-of-evidence analysis or a validated... conducted, a sequential testing strategy is recommended to reduce the number of test animals. The method of...
Korjus, Kristjan; Hebart, Martin N; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier's generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach, which we term "cross-validation and cross-testing", that improves this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do.
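The baseline procedure that the abstract contrasts against (cross-validation to pick parameters on the training portion, then a single held-out test set for the generalization estimate) looks roughly like the Python sketch below; the data, classifier, and parameter grid are illustrative assumptions, and the authors' proposed cross-validation and cross-testing scheme itself is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Standard split: cross-validation chooses parameters, a held-out test set
# estimates generalization (the baseline the proposed method tries to improve).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)                 # training + validation via CV
print("best C:", search.best_params_["C"])
print("held-out test accuracy:", search.score(X_test, y_test))
```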
Predictors of validity and reliability of a physical activity record in adolescents
2013-01-01
Background Poor to moderate validity of self-reported physical activity instruments is commonly observed in young people in low- and middle-income countries. However, the reasons for such low validity have not been examined in detail. We tested the validity of a self-administered daily physical activity record in adolescents and assessed if personal characteristics or the convenience level of reporting physical activity modified the validity estimates. Methods The study comprised a total of 302 adolescents from an urban and rural area in Ecuador. Validity was evaluated by comparing the record with accelerometer recordings for seven consecutive days. Test-retest reliability was examined by comparing registrations from two records administered three weeks apart. Time spent on sedentary (SED), low (LPA), moderate (MPA) and vigorous (VPA) intensity physical activity was estimated. Bland Altman plots were used to evaluate measurement agreement. We assessed if age, sex, urban or rural setting, anthropometry and convenience of completing the record explained differences in validity estimates using a linear mixed model. Results Although the record provided higher estimates for SED and VPA and lower estimates for LPA and MPA compared to the accelerometer, it showed an overall fair measurement agreement for validity. There was modest reliability for assessing physical activity in each intensity level. Validity was associated with adolescents’ personal characteristics: sex (SED: P = 0.007; LPA: P = 0.001; VPA: P = 0.009) and setting (LPA: P = 0.000; MPA: P = 0.047). Reliability was associated with the convenience of completing the physical activity record for LPA (low convenience: P = 0.014; high convenience: P = 0.045). Conclusions The physical activity record provided acceptable estimates for reliability and validity on a group level. Sex and setting were associated with validity estimates, whereas convenience to fill out the record was associated with better reliability estimates for LPA. This tendency of improved reliability estimates for adolescents reporting higher convenience merits further consideration. PMID:24289296
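Measurement agreement of the kind evaluated with Bland-Altman plots in this study reduces to the mean difference between the two instruments and its 95% limits of agreement. The Python sketch below computes these for a few illustrative paired values; the numbers are invented stand-ins, not the record/accelerometer data.

```python
import numpy as np

def bland_altman(method_a, method_b):
    """Mean bias and 95% limits of agreement between two measurement methods
    (e.g., activity record vs. accelerometer minutes of MPA)."""
    diff = np.asarray(method_a, dtype=float) - np.asarray(method_b, dtype=float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width

# Illustrative paired minutes/day; real values come from the two instruments.
record = np.array([35, 50, 42, 60, 28, 45, 55])
accelerometer = np.array([30, 48, 40, 52, 25, 47, 50])
print(bland_altman(record, accelerometer))
```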
Evaluation of a Serum Lung Cancer Biomarker Panel.
Mazzone, Peter J; Wang, Xiao-Feng; Han, Xiaozhen; Choi, Humberto; Seeley, Meredith; Scherer, Richard; Doseeva, Victoria
2018-01-01
A panel of 3 serum proteins and 1 autoantibody has been developed to assist with the detection of lung cancer. We aimed to validate the accuracy of the biomarker panel in an independent test set and explore the impact of adding a fourth serum protein to the panel, as well as the impact of combining molecular and clinical variables. The training set of serum samples was purchased from commercially available biorepositories. The testing set was from a biorepository at the Cleveland Clinic. All lung cancer and control subjects were >50 years old and had smoked a minimum of 20 pack-years. A panel of biomarkers including CEA (carcinoembryonic antigen), CYFRA21-1 (cytokeratin-19 fragment 21-1), CA125 (carbohydrate antigen 125), HGF (hepatocyte growth factor), and NY-ESO-1 (New York esophageal cancer-1 antibody) was measured using immunoassay techniques. The multiple of the median method, multivariate logistic regression, and random forest modeling were used to analyze the results. The training set consisted of 604 patient samples (268 with lung cancer and 336 controls) and the testing set of 400 patient samples (155 with lung cancer and 245 controls). With a threshold established from the training set, the sensitivity and specificity of both the 4- and 5-biomarker panels on the testing set were 49% and 96%, respectively. Models built on the testing set using only clinical variables had an area under the receiver operating characteristic curve of 0.68; using the biomarker panel, 0.81; and combining clinical and biomarker variables, 0.86. This study validates the accuracy of a panel of proteins and an autoantibody in a population relevant to lung cancer detection and suggests a benefit to combining clinical features with the biomarker results.
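The comparison of clinical-only, biomarker-only, and combined models by area under the ROC curve can be sketched in Python as below; the simulated covariates and panel values are stand-ins for the study's variables, and the AUCs produced are apparent (in-sample) values for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Illustrative stand-ins: clinical covariates (e.g., age, pack-years) and
# biomarker panel values; the study's actual variables are not reproduced here.
rng = np.random.default_rng(7)
n = 400
y = rng.integers(0, 2, size=n)
clinical = rng.normal(size=(n, 2)) + y[:, None] * 0.3
biomarkers = rng.normal(size=(n, 5)) + y[:, None] * 0.5
combined = np.hstack([clinical, biomarkers])

for name, X in [("clinical", clinical), ("biomarkers", biomarkers), ("combined", combined)]:
    p = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    print(name, "AUC:", round(roc_auc_score(y, p), 2))
```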
Evaluation of a Serum Lung Cancer Biomarker Panel
Mazzone, Peter J; Wang, Xiao-Feng; Han, Xiaozhen; Choi, Humberto; Seeley, Meredith; Scherer, Richard; Doseeva, Victoria
2018-01-01
Background: A panel of 3 serum proteins and 1 autoantibody has been developed to assist with the detection of lung cancer. We aimed to validate the accuracy of the biomarker panel in an independent test set and explore the impact of adding a fourth serum protein to the panel, as well as the impact of combining molecular and clinical variables. Methods: The training set of serum samples was purchased from commercially available biorepositories. The testing set was from a biorepository at the Cleveland Clinic. All lung cancer and control subjects were >50 years old and had smoked a minimum of 20 pack-years. A panel of biomarkers including CEA (carcinoembryonic antigen), CYFRA21-1 (cytokeratin-19 fragment 21-1), CA125 (carbohydrate antigen 125), HGF (hepatocyte growth factor), and NY-ESO-1 (New York esophageal cancer-1 antibody) was measured using immunoassay techniques. The multiple of the median method, multivariate logistic regression, and random forest modeling were used to analyze the results. Results: The training set consisted of 604 patient samples (268 with lung cancer and 336 controls) and the testing set of 400 patient samples (155 with lung cancer and 245 controls). With a threshold established from the training set, the sensitivity and specificity of both the 4- and 5-biomarker panels on the testing set were 49% and 96%, respectively. Models built on the testing set using only clinical variables had an area under the receiver operating characteristic curve of 0.68; using the biomarker panel, 0.81; and combining clinical and biomarker variables, 0.86. Conclusions: This study validates the accuracy of a panel of proteins and an autoantibody in a population relevant to lung cancer detection and suggests a benefit to combining clinical features with the biomarker results. PMID:29371783
Bauer, Russell M.; Iverson, Grant L.; Cernich, Alison N.; Binder, Laurence M.; Ruff, Ronald M.; Naugle, Richard I.
2012-01-01
This joint position paper of the American Academy of Clinical Neuropsychology and the National Academy of Neuropsychology sets forth our position on appropriate standards and conventions for computerized neuropsychological assessment devices (CNADs). In this paper, we first define CNADs and distinguish them from examiner-administered neuropsychological instruments. We then set forth position statements on eight key issues relevant to the development and use of CNADs in the healthcare setting. These statements address (a) device marketing and performance claims made by developers of CNADs; (b) issues involved in appropriate end-users for administration and interpretation of CNADs; (c) technical (hardware/software/firmware) issues; (d) privacy, data security, identity verification, and testing environment; (e) psychometric development issues, especially reliability, and validity; (f) cultural, experiential, and disability factors affecting examinee interaction with CNADs; (g) use of computerized testing and reporting services; and (h) the need for checks on response validity and effort in the CNAD environment. This paper is intended to provide guidance for test developers and users of CNADs that will promote accurate and appropriate use of computerized tests in a way that maximizes clinical utility and minimizes risks of misuse. The positions taken in this paper are put forth with an eye toward balancing the need to make validated CNADs accessible to otherwise underserved patients with the need to ensure that such tests are developed and utilized competently, appropriately, and with due concern for patient welfare and quality of care. PMID:22382386
Bauer, Russell M.; Iverson, Grant L.; Cernich, Alison N.; Binder, Laurence M.; Ruff, Ronald M.; Naugle, Richard I.
2013-01-01
This joint position paper of the American Academy of Clinical Neuropsychology and the National Academy of Neuropsychology sets forth our position on appropriate standards and conventions for computerized neuropsychological assessment devices (CNADs). In this paper, we first define CNADs and distinguish them from examiner-administered neuropsychological instruments. We then set forth position statements on eight key issues relevant to the development and use of CNADs in the healthcare setting. These statements address (a) device marketing and performance claims made by developers of CNADs; (b) issues involved in appropriate end-users for administration and interpretation of CNADs; (c) technical (hardware/software/firmware) issues; (d) privacy, data security, identity verification, and testing environment; (e) psychometric development issues, especially reliability and validity; (f) cultural, experiential, and disability factors affecting examinee interaction with CNADs; (g) use of computerized testing and reporting services; and (h) the need for checks on response validity and effort in the CNAD environment. This paper is intended to provide guidance for test developers and users of CNADs that will promote accurate and appropriate use of computerized tests in a way that maximizes clinical utility and minimizes risks of misuse. The positions taken in this paper are put forth with an eye toward balancing the need to make validated CNADs accessible to otherwise underserved patients with the need to ensure that such tests are developed and utilized competently, appropriately, and with due concern for patient welfare and quality of care. PMID:22394228
Testing and Validation of Computational Methods for Mass Spectrometry.
Gatto, Laurent; Hansen, Kasper D; Hoopmann, Michael R; Hermjakob, Henning; Kohlbacher, Oliver; Beyer, Andreas
2016-03-04
High-throughput methods based on mass spectrometry (proteomics, metabolomics, lipidomics, etc.) produce a wealth of data that cannot be analyzed without computational methods. The impact of the choice of method on the overall result of a biological study is often underappreciated, but different methods can result in very different biological findings. It is thus essential to evaluate and compare the correctness and relative performance of computational methods. The volume of the data as well as the complexity of the algorithms render unbiased comparisons challenging. This paper discusses some problems and challenges in testing and validation of computational methods. We discuss the different types of data (simulated and experimental validation data) as well as different metrics to compare methods. We also introduce a new public repository for mass spectrometric reference data sets ( http://compms.org/RefData ) that contains a collection of publicly available data sets for performance evaluation for a wide range of different methods.
Sociopathic Knowledge Bases: Correct Knowledge Can Be Harmful Even Given Unlimited Computation
1989-08-01
positive, as false positives generated by a medical program can often be caught by a physician upon further testing. False negatives, however, may be...improvement over the knowledge base tested is obtained. Although our work is pretty much theoretical research oriented, one example of experiments is...knowledge base, improves the performance by about 10%. ...of tests. First, we divide the cases into a training set and a validation set with 70% vs. 30% each
Test set up description and performances for HAWAII-2RG detector characterization at ESTEC
NASA Astrophysics Data System (ADS)
Crouzet, P.-E.; ter Haar, J.; de Wit, F.; Beaufort, T.; Butler, B.; Smit, H.; van der Luijt, C.; Martin, D.
2012-07-01
In the framework of the European Space Agency's Cosmic Vision program, the Euclid mission has the objective of mapping the geometry of the Dark Universe. Galaxies and clusters of galaxies will be observed at visible and near-infrared wavelengths by an imaging and a spectroscopic channel. For the Near Infrared Spectrometer instrument (NISP), state-of-the-art HAWAII-2RG detectors will be used, associated with the SIDECAR ASIC readout electronics which will perform the image frame acquisitions. To characterize and validate the performance of these detectors, a test bench has been designed, tested and validated. This publication describes the pre-tests performed to build the setup dedicated to dark current measurements and to tests requiring reasonably uniform light levels (such as conversion gain measurements). Successful cryogenic and vacuum tests on commercial LEDs and photodiodes are shown. An optimized stainless-steel feed-through with a V-groove to pot the flex cable connecting the SIDECAR ASIC to the room-temperature board (JADE2) has been designed and tested. The test setup for quantum efficiency measurements, consisting of a lamp, a monochromator, an integrating sphere and a set of cold filters, is currently under construction and will ensure uniform illumination across the detector with variations lower than 2%. A dedicated spot projector for intra-pixel measurements has been designed and built to reach a spot diameter of 5 μm at 920 nm with 2 nm of bandwidth [1].
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blanchat, Thomas K.; Jernigan, Dann A.
A set of experiments and test data are outlined in this report that provide radiation intensity data for the validation of models of the radiative transfer equation. The experiments were performed with lightly sooting liquid hydrocarbon fuels that yielded fully turbulent fires (2 m diameter). In addition, supplemental measurements of air flow and temperature, fuel temperature and burn rate, flame surface emissive power, wall heat, and flame height and width provide a complete set of boundary condition data needed for validation of models used in fire simulations.
Final technical report provides test methods used and verification results to be published on ETV web sites. The ETS UV System Model UVL-200-4 was tested to validate the UV dose delivered by the system using biodosimetry and a set line approach. The set line for 40 mJ/cm2 Red...
Shmulewitz, D.; Wall, M.M.; Aharonovich, E.; Spivak, B.; Weizman, A.; Frisch, A.; Grant, B. F.; Hasin, D.
2013-01-01
Background The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) proposes aligning nicotine use disorder (NUD) criteria with those for other substances, by including the current DSM fourth edition (DSM-IV) nicotine dependence (ND) criteria, three abuse criteria (neglect roles, hazardous use, interpersonal problems) and craving. Although NUD criteria indicate one latent trait, evidence is lacking on: (1) validity of each criterion; (2) validity of the criteria as a set; (3) comparative validity between DSM-5 NUD and DSM-IV ND criterion sets; and (4) NUD prevalence. Method Nicotine criteria (DSM-IV ND, abuse and craving) and external validators (e.g. smoking soon after awakening, number of cigarettes per day) were assessed with a structured interview in 734 lifetime smokers from an Israeli household sample. Regression analysis evaluated the association between validators and each criterion. Receiver operating characteristic analysis assessed the association of the validators with the DSM-5 NUD set (number of criteria endorsed) and tested whether DSM-5 or DSM-IV provided the most discriminating criterion set. Changes in prevalence were examined. Results Each DSM-5 NUD criterion was significantly associated with the validators, with strength of associations similar across the criteria. As a set, DSM-5 criteria were significantly associated with the validators, were significantly more discriminating than DSM-IV ND criteria, and led to increased prevalence of binary NUD (two or more criteria) over ND. Conclusions All findings address previous concerns about the DSM-IV nicotine diagnosis and its criteria and support the proposed changes for DSM-5 NUD, which should result in improved diagnosis of nicotine disorders. PMID:23312475
Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling?
Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external dataset, the best way to validate the predictive ability of a model is to perform its s...
Teachers' Engagement at Work: An International Validation Study
ERIC Educational Resources Information Center
Klassen, Robert M.; Aldhafri, Said; Mansfield, Caroline F.; Purwanto, Edy; Siu, Angela F. Y.; Wong, Marina W.; Woods-McConney, Amanda
2012-01-01
This study explored the validity of the Utrecht Work Engagement Scale in a sample of 853 practicing teachers from Australia, Canada, China (Hong Kong), Indonesia, and Oman. The authors used multigroup confirmatory factor analysis to test the factor structure and measurement invariance across settings, after which they examined the relationships…
McCoy, Dana Charles; Sudfeld, Christopher R; Bellinger, David C; Muhihi, Alfa; Ashery, Geofrey; Weary, Taylor E; Fawzi, Wafaie; Fink, Günther
2017-02-09
Low-cost, cross-culturally comparable measures of the motor, cognitive, and socioemotional skills of children under 3 years remain scarce. In the present paper, we aim to develop a new caregiver-reported early childhood development (ECD) scale designed to be implemented as part of household surveys in low-resourced settings. We evaluate the acceptability, test-retest reliability, internal consistency, and discriminant validity of the new ECD items, subscales, and full scale in a sample of 2481 18- to 36-month-old children from peri-urban and rural Tanzania. We also compare total and subscale scores with performance on the Bayley Scales of Infant Development (BSID-III) in a subsample of 1036 children. Qualitative interviews from 10 mothers and 10 field workers are used to inform quantitative data. Adequate levels of acceptability and internal consistency were found for the new scale and its motor, cognitive, and socioemotional subscales. Correlations between the new scale and the BSID-III were high (r > .50) for the motor and cognitive subscales, but low (r < .20) for the socioemotional subscale. The new scale discriminated between children's skills based on age, stunting status, caregiver-reported disability, and adult stimulation. Test-retest reliability scores were variable among a subset of items tested. Results of this study provide empirical support from a low-income country setting for the acceptability, reliability, and validity of a new caregiver-reported ECD scale. Additional research is needed to test these and other caregiver reported items in children in the full 0 to 3 year range across multiple cultural and linguistic settings.
Prediction of biodegradability from chemical structure: Modeling or ready biodegradation test data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loonen, H.; Lindgren, F.; Hansen, B.
1999-08-01
Biodegradation data were collected and evaluated for 894 substances with widely varying chemical structures. All data were determined according to the Japanese Ministry of International Trade and Industry (MITI) I test protocol. The MITI I test is a screening test for ready biodegradability and has been described by Organization for Economic Cooperation and Development (OECD) test guideline 301 C and European Union (EU) test guideline C4F. The chemicals were characterized by a set of 127 predefined structural fragments. This data set was used to develop a model for the prediction of the biodegradability of chemicals under standardized OECD and EU ready biodegradation test conditions. Partial least squares (PLS) discriminant analysis was used for the model development. The model was evaluated by means of internal cross-validation and repeated external validation. The importance of various structural fragments and fragment interactions was investigated. The most important fragments include the presence of a long alkyl chain and of hydroxy, ester, and acid groups (enhancing biodegradation), and the presence of one or more aromatic rings and halogen substituents (retarding biodegradation). More than 85% of the model predictions were correct when using the complete data set. The not-readily-biodegradable predictions were slightly better than the readily biodegradable predictions (86 vs 84%). The average percentage of correct predictions from four external validation studies was 83%. Model optimization by including fragment interactions improved the model's predictive capability to 89%. It can be concluded that the PLS model provides predictions of high reliability for a diverse range of chemical structures. The predictions conform to the concept of readily biodegradable (or not readily biodegradable) as defined by OECD and EU test guidelines.
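PLS discriminant analysis of the kind used here can be sketched by regressing a 0/1 class code on the fragment matrix and thresholding the predicted score. The Python example below uses scikit-learn's PLSRegression with randomly generated stand-in fragments and labels of the same dimensions quoted in the abstract; it illustrates the technique only, not the published model.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

# Illustrative stand-in: binary structural-fragment matrix (127 fragments) and
# a readily/not-readily biodegradable label; not the MITI I data themselves.
rng = np.random.default_rng(8)
X = rng.integers(0, 2, size=(894, 127)).astype(float)
y = rng.integers(0, 2, size=894)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# PLS discriminant analysis: regress a 0/1 class code on the fragments and
# classify at a 0.5 threshold on the predicted score.
pls = PLSRegression(n_components=5).fit(X_tr, y_tr)
pred = (pls.predict(X_te).ravel() >= 0.5).astype(int)
print("external fraction correct:", (pred == y_te).mean())
```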
Pizzo, Fabiola; Lombardo, Anna; Manganaro, Alberto; Benfenati, Emilio
2016-01-01
The prompt identification of chemical molecules with potential effects on liver may help in drug discovery and in raising the levels of protection for human health. Besides in vitro approaches, computational methods in toxicology are drawing attention. We built a structure-activity relationship (SAR) model for evaluating hepatotoxicity. After compiling a data set of 950 compounds using data from the literature, we randomly split it into training (80%) and test sets (20%). We also compiled an external validation set (101 compounds) for evaluating the performance of the model. To extract structural alerts (SAs) related to hepatotoxicity and non-hepatotoxicity we used SARpy, a statistical application that automatically identifies and extracts chemical fragments related to a specific activity. We also applied the chemical grouping approach for manually identifying other SAs. We calculated accuracy, specificity, sensitivity and Matthews correlation coefficient (MCC) on the training, test and external validation sets. Considering the complexity of the endpoint, the model performed well. In the training, test and external validation sets the accuracy was respectively 81, 63, and 68%, specificity 89, 33, and 33%, sensitivity 93, 88, and 80% and MCC 0.63, 0.27, and 0.13. Since it is preferable to overestimate hepatotoxicity rather than not to recognize unsafe compounds, the model's architecture followed a conservative approach. As it was built using human data, it might be applied without any need for extrapolation from other species. This model will be freely available in the VEGA platform. PMID:27920722
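The Matthews correlation coefficient reported for the training, test and external validation sets is computed from the confusion-matrix counts. A minimal Python sketch of the formula follows; the counts plugged in are illustrative, not those behind the paper's figures.

```python
import numpy as np

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    num = tp * tn - fp * fn
    den = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return num / den if den else 0.0

# Illustrative counts for a hepatotoxicity screen (not the paper's exact matrix).
print(matthews_cc(tp=70, tn=15, fp=10, fn=6))
```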
Chang, Ken; Bai, Harrison X; Zhou, Hao; Su, Chang; Bi, Wenya Linda; Agbodza, Ena; Kavouridis, Vasileios K; Senders, Joeky T; Boaro, Alessandro; Beers, Andrew; Zhang, Biqi; Capellini, Alexandra; Liao, Weihua; Shen, Qin; Li, Xuejun; Xiao, Bo; Cryan, Jane; Ramkissoon, Shakti; Ramkissoon, Lori; Ligon, Keith; Wen, Patrick Y; Bindra, Ranjit S; Woo, John; Arnaout, Omar; Gerstner, Elizabeth R; Zhang, Paul J; Rosen, Bruce R; Yang, Li; Huang, Raymond Y; Kalpathy-Cramer, Jayashree
2018-03-01
Purpose: Isocitrate dehydrogenase (IDH) mutations in glioma patients confer longer survival and may guide treatment decision making. We aimed to predict the IDH status of gliomas from MR imaging by applying a residual convolutional neural network to preoperative radiographic data. Experimental Design: Preoperative imaging was acquired for 201 patients from the Hospital of the University of Pennsylvania (HUP), 157 patients from Brigham and Women's Hospital (BWH), and 138 patients from The Cancer Imaging Archive (TCIA) and divided into training, validation, and testing sets. We trained a residual convolutional neural network for each MR sequence (FLAIR, T2, T1 precontrast, and T1 postcontrast) and built a predictive model from the outputs. To increase the size of the training set and prevent overfitting, we augmented the training set images by introducing random rotations, translations, flips, shearing, and zooming. Results: With our neural network model, we achieved IDH prediction accuracies of 82.8% (AUC = 0.90), 83.0% (AUC = 0.93), and 85.7% (AUC = 0.94) within training, validation, and testing sets, respectively. When age at diagnosis was incorporated into the model, the training, validation, and testing accuracies increased to 87.3% (AUC = 0.93), 87.6% (AUC = 0.95), and 89.1% (AUC = 0.95), respectively. Conclusions: We developed a deep learning technique to noninvasively predict IDH genotype in grade II-IV glioma from conventional MR imaging using a multi-institutional data set. Clin Cancer Res; 24(5); 1073-81. ©2017 AACR. ©2017 American Association for Cancer Research.
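Training-set augmentation of the kind described (random rotations, translations, flips, shearing, and zooming) can be expressed, for example, with Keras's ImageDataGenerator as sketched below; this requires TensorFlow, and the parameter values are illustrative assumptions rather than the settings used in the study.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation along the lines described: random rotations, shifts, flips,
# shearing, and zooming applied to the training images only (illustrative values).
augmenter = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
    shear_range=0.1,
    zoom_range=0.1,
)

# Typical use: flow batches of MR slices (as arrays) with their IDH labels.
# train_iter = augmenter.flow(train_images, train_labels, batch_size=32)
```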
Ropodi, Athina I; Panagou, Efstathios Z; Nychas, George-John E
2018-01-01
In recent years, fraud detection has become a major priority for food authorities, as fraudulent practices can have various economic and safety consequences. This work explores ways of identifying frozen-then-thawed minced beef labeled as fresh in a rapid, large-scale and cost-effective way. For this reason, freshly ground beef was purchased from seven separate shops at different times, divided into fifteen portions and placed in Petri dishes. Multi-spectral images and FTIR spectra of the first five were immediately acquired, while the remaining portions were frozen (-20°C) and stored for 7 and 32 days (5 samples for each time interval). Samples were thawed and subsequently subjected to similar data acquisition. In total, 105 multispectral images and FTIR spectra were collected and further analyzed using partial least-squares discriminant analysis and support vector machines. Two meat batches (30 samples) were reserved for independent validation and the remaining five batches were divided into training and test sets (75 samples). Results showed 100% overall correct classification for the test and external validation MSI data, while FTIR data yielded 93.3% and 96.7% overall correct classification for the FTIR test set and external validation set, respectively. Copyright © 2017 Elsevier Ltd. All rights reserved.
Oncology biomarkers: discovery, validation, and clinical use.
Heckman-Stoddard, Brandy M
2012-05-01
To discuss the discovery, validation, and clinical use of multiple types of biomarkers. Medical literature and published guidelines. Formal validation of biomarkers should include both retrospective analyses of well-characterized samples as well as a prospective clinical trial in which the biomarker is tested for its ability to predict the presence of disease or the efficacy of a cancer therapy. Biomarker development is complicated, with very few biomarker discoveries leading to clinically useful tests. Nurses should understand how a biomarker was developed, including the sensitivity and specificity before applying new biomarkers in the clinical setting. Copyright © 2012. Published by Elsevier Inc.
Long, Clive G; Banyard, Ellen; Fulton, Barbara; Hollin, Clive R
2014-09-01
Arson and fire-setting are highly prevalent among patients in secure psychiatric settings but there is an absence of valid and reliable assessment instruments and no evidence of a significant approach to intervention. To develop a semi-structured interview assessment specifically for fire-setting to augment structured assessments of risk and need. The extant literature was used to frame interview questions relating to the antecedents, behaviour and consequences necessary to formulate a functional analysis. Questions also covered readiness to change, fire-setting self-efficacy, the probability of future fire-setting, barriers to change, and understanding of fire-setting behaviour. The assessment concludes with indications for assessment and a treatment action plan. The inventory was piloted with a sample of women in secure care and was assessed for comprehensibility, reliability and validity. Staff rated the St Andrews Fire and Risk Instrument (SAFARI) as acceptable to patients and easy to administer. SAFARI was found to be comprehensible by over 95% of the general population, to have good acceptance, high internal reliability, substantial test-retest reliability and validity. SAFARI helps to provide a clear explanation of fire-setting in terms of the complex interplay of antecedents and consequences and facilitates the design of an individually tailored treatment programme in sympathy with a cognitive-behavioural approach. Further studies are needed to verify the reliability and validity of SAFARI with male populations and across settings.
Sebastião, Emerson; Sandroff, Brian M; Learmonth, Yvonne C; Motl, Robert W
2016-07-01
To examine the validity of the timed Up and Go (TUG) test as a measure of functional mobility in persons with multiple sclerosis (MS) by using a comprehensive framework based on construct validity (ie, convergent and divergent validity). Cross-sectional study. Hospital setting. Community-residing persons with MS (N=47). Not applicable. Main outcome measures included the TUG test, timed 25-foot walk test, 6-minute walk test, Multiple Sclerosis Walking Scale-12, Late-Life Function and Disability Instrument, posturography evaluation, Activities-specific Balance Confidence scale, Symbol Digits Modalities Test, Expanded Disability Status Scale, and the number of steps taken per day. The TUG test was strongly associated with other valid outcome measures of ambulatory mobility (Spearman rank correlation, rs=.71-.90) and disability status (rs=.80), moderately to strongly associated with balance confidence (rs=.66), and weakly associated with postural control (ie, balance) (rs=.31). The TUG test was moderately associated with cognitive processing speed (rs=.59), but not associated with other nonambulatory measures (ie, Late-Life Function and Disability Instrument-upper extremity function). Our findings support the validity of the TUG test as a measure of functional mobility. This warrants its inclusion in patients' assessment alongside other valid measures of functional mobility in both clinical and research practice in persons with MS. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Richardson, Michelle; Katsakou, Christina; Torres-González, Francisco; Onchev, George; Kallert, Thomas; Priebe, Stefan
2011-06-30
Patients' views of inpatient care need to be assessed for research and routine evaluation. For this a valid instrument is required. The Client Assessment of Treatment Scale (CAT) has been used in large scale international studies, but its psychometric properties have not been well established. The structural validity of the CAT was tested among involuntary inpatients with psychosis. Data from locations in three separate European countries (England, Spain and Bulgaria) were collected. The factorial validity was initially tested using single sample confirmatory factor analyses in each country. Subsequent multi-sample analyses were used to test for invariance of the factor loadings, and factor variances across the countries. Results provide good initial support for the factorial validity and invariance of the CAT scores. Future research is needed to cross-validate these findings and to generalise them to other countries, treatment settings, and patient populations. Copyright © 2011 Elsevier Ltd. All rights reserved.
Mohammed, Mohammed A.; Rudge, Gavin; Watson, Duncan; Wood, Gordon; Smith, Gary B.; Prytherch, David R.; Girling, Alan; Stevens, Andrew
2013-01-01
Background We explored the use of routine blood tests and national early warning scores (NEWS) reported within ±24 hours of admission to predict in-hospital mortality in emergency admissions, using empirical decision Tree models because they are intuitive and may ultimately be used to support clinical decision making. Methodology A retrospective analysis of adult emergency admissions to a large acute hospital during April 2009 to March 2010 in the West Midlands, England, with a full set of index blood tests results (albumin, creatinine, haemoglobin, potassium, sodium, urea, white cell count and an index NEWS undertaken within ±24 hours of admission). We developed a Tree model by randomly splitting the admissions into a training (50%) and validation dataset (50%) and assessed its accuracy using the concordance (c-) statistic. Emergency admissions (about 30%) did not have a full set of index blood tests and/or NEWS and so were not included in our analysis. Results There were 23248 emergency admissions with a full set of blood tests and NEWS with an in-hospital mortality of 5.69%. The Tree model identified age, NEWS, albumin, sodium, white cell count and urea as significant (p<0.001) predictors of death, which described 17 homogeneous subgroups of admissions with mortality ranging from 0.2% to 60%. The c-statistic for the training model was 0.864 (95%CI 0.852 to 0.87) and when applied to the testing data set this was 0.853 (95%CI 0.840 to 0.866). Conclusions An easy to interpret validated risk adjustment Tree model using blood test and NEWS taken within ±24 hours of admission provides good discrimination and offers a novel approach to risk adjustment which may potentially support clinical decision making. Given the nature of the clinical data, the results are likely to be generalisable but further research is required to investigate this promising approach. PMID:23734195
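The workflow described above (a random 50/50 split, an empirical decision tree over routine blood tests plus NEWS, and the c-statistic as the accuracy measure) can be sketched as follows. This is a minimal illustration, not the authors' model: the synthetic data, column names, and tree settings (min_samples_leaf, max_depth) are assumptions.

```python
# Sketch of the study design: 50/50 split, decision tree on blood tests + NEWS,
# c-statistic (ROC AUC) on training and validation halves. All data are synthetic.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
admissions = pd.DataFrame({
    "age": rng.integers(18, 100, n),
    "news": rng.integers(0, 15, n),
    "albumin": rng.normal(38, 6, n),
    "sodium": rng.normal(138, 4, n),
    "wcc": rng.normal(9, 3, n),
    "urea": rng.normal(7, 3, n),
})
risk = (0.03 * (admissions["age"] - 60) + 0.3 * admissions["news"]
        - 0.1 * (admissions["albumin"] - 38))
admissions["died_in_hospital"] = (risk + rng.normal(0, 2, n)) > 3

X = admissions.drop(columns="died_in_hospital")
y = admissions["died_in_hospital"]

# Random 50% training / 50% validation split, as in the study design.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Leaf-size/depth settings are guesses; the published model had 17 subgroups.
tree = DecisionTreeClassifier(min_samples_leaf=100, max_depth=5, random_state=0)
tree.fit(X_train, y_train)

# c-statistic = area under the ROC curve of the predicted death probabilities.
for name, Xs, ys in [("training", X_train, y_train), ("validation", X_valid, y_valid)]:
    print(name, "c-statistic:", round(roc_auc_score(ys, tree.predict_proba(Xs)[:, 1]), 3))
```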
Work zone performance measures pilot test.
DOT National Transportation Integrated Search
2011-04-01
Currently, a well-defined and validated set of metrics for monitoring work zone performance does not exist. This pilot test was conducted to assist state DOTs in identifying what work zone performance measures can and should be targeted, what...
DESCQA: An Automated Validation Framework for Synthetic Sky Catalogs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mao, Yao-Yuan; Kovacs, Eve; Heitmann, Katrin
The use of high-quality simulated sky catalogs is essential for the success of cosmological surveys. The catalogs have diverse applications, such as investigating signatures of fundamental physics in cosmological observables, understanding the effect of systematic uncertainties on measured signals and testing mitigation strategies for reducing these uncertainties, aiding analysis pipeline development and testing, and survey strategy optimization. The list of applications is growing with improvements in the quality of the catalogs and the details that they can provide. Given the importance of simulated catalogs, it is critical to provide rigorous validation protocols that enable both catalog providers and users to assess the quality of the catalogs in a straightforward and comprehensive way. For this purpose, we have developed the DESCQA framework for the Large Synoptic Survey Telescope Dark Energy Science Collaboration as well as for the broader community. The goal of DESCQA is to enable the inspection, validation, and comparison of an inhomogeneous set of synthetic catalogs via the provision of a common interface within an automated framework. Here in this paper, we present the design concept and first implementation of DESCQA. In order to establish and demonstrate its full functionality we use a set of interim catalogs and validation tests. We highlight several important aspects, both technical and scientific, that require thoughtful consideration when designing a validation framework, including validation metrics and how these metrics impose requirements on the synthetic sky catalogs.
DESCQA: An Automated Validation Framework for Synthetic Sky Catalogs
Mao, Yao-Yuan; Kovacs, Eve; Heitmann, Katrin; ...
2018-02-08
The use of high-quality simulated sky catalogs is essential for the success of cosmological surveys. The catalogs have diverse applications, such as investigating signatures of fundamental physics in cosmological observables, understanding the effect of systematic uncertainties on measured signals and testing mitigation strategies for reducing these uncertainties, aiding analysis pipeline development and testing, and survey strategy optimization. The list of applications is growing with improvements in the quality of the catalogs and the details that they can provide. Given the importance of simulated catalogs, it is critical to provide rigorous validation protocols that enable both catalog providers and users to assess the quality of the catalogs in a straightforward and comprehensive way. For this purpose, we have developed the DESCQA framework for the Large Synoptic Survey Telescope Dark Energy Science Collaboration as well as for the broader community. The goal of DESCQA is to enable the inspection, validation, and comparison of an inhomogeneous set of synthetic catalogs via the provision of a common interface within an automated framework. Here in this paper, we present the design concept and first implementation of DESCQA. In order to establish and demonstrate its full functionality we use a set of interim catalogs and validation tests. We highlight several important aspects, both technical and scientific, that require thoughtful consideration when designing a validation framework, including validation metrics and how these metrics impose requirements on the synthetic sky catalogs.
The key-features approach to assess clinical decisions: validity evidence to date.
Bordage, G; Page, G
2018-05-17
The key-features (KFs) approach to assessment was initially proposed during the First Cambridge Conference on Medical Education in 1984 as a more efficient and effective means of assessing clinical decision-making skills. Over three decades later, we conducted a comprehensive, systematic review of the validity evidence gathered since then. The evidence was compiled according to the Standards for Educational and Psychological Testing's five sources of validity evidence, namely, Content, Response process, Internal structure, Relations to other variables, and Consequences, to which we added two other types related to Cost-feasibility and Acceptability. Of the 457 publications that referred to the KFs approach between 1984 and October 2017, 164 are cited here; the remaining 293 were either redundant or the authors simply mentioned the KFs concept in relation to their work. While one set of articles reported meeting the validity standards, another set examined KFs test development choices and score interpretation. The accumulated validity evidence for the KFs approach since its inception supports the decision-making construct measured and its use to assess clinical decision-making skills at all levels of training and practice and with various types of exam formats. Recognizing that gathering validity evidence is an ongoing process, areas with limited evidence, such as item factor analyses or consequences of testing, are identified as well as new topics needing further clarification, such as the use of the KFs approach for formative assessment and its place within a program of assessment.
Effect of Content Knowledge on Angoff-Style Standard Setting Judgments
ERIC Educational Resources Information Center
Margolis, Melissa J.; Mee, Janet; Clauser, Brian E.; Winward, Marcia; Clauser, Jerome C.
2016-01-01
Evidence to support the credibility of standard setting procedures is a critical part of the validity argument for decisions made based on tests that are used for classification. One area in which there has been limited empirical study is the impact of standard setting judge selection on the resulting cut score. One important issue related to…
NASA Technical Reports Server (NTRS)
Crutcher, H. L.; Falls, L. W.
1976-01-01
Sets of experimentally determined or routinely observed data provide information about the past, present and, hopefully, future sets of similarly produced data. An infinite set of statistical models exists which may be used to describe the data sets. The normal distribution is one model. If it serves at all, it serves well. If a data set, or a transformation of the set, representative of a larger population can be described by the normal distribution, then valid statistical inferences can be drawn. There are several tests which may be applied to a data set to determine whether the univariate normal model adequately describes the set. The chi-square test based on Pearson's work in the late nineteenth and early twentieth centuries is often used. Like all tests, it has some weaknesses which are discussed in elementary texts. Extension of the chi-square test to the multivariate normal model is provided. Tables and graphs permit easier application of the test in the higher dimensions. Several examples, using recorded data, illustrate the procedures. Tests of maximum absolute differences, mean sum of squares of residuals, runs and changes of sign are included in these tests. Dimensions one through five with selected sample sizes 11 to 101 are used to illustrate the statistical tests developed.
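A minimal sketch of the univariate chi-square goodness-of-fit test for normality discussed above: the data are binned into equal-probability classes under a fitted normal model and observed counts are compared with expected counts. The sample data, number of classes, and degrees-of-freedom handling below are illustrative assumptions.

```python
# Minimal sketch of the chi-square test for univariate normality: equal-probability
# bins under a fitted normal model, observed versus expected counts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=101)   # synthetic data set, n = 101

k = 10                                          # number of equal-probability classes (assumed)
mu, sigma = x.mean(), x.std(ddof=1)             # fit the normal model to the data
edges = stats.norm.ppf(np.linspace(0, 1, k + 1), loc=mu, scale=sigma)
edges[0], edges[-1] = -np.inf, np.inf

observed, _ = np.histogram(x, bins=edges)
expected = np.full(k, len(x) / k)               # equal expected counts by construction

# Two parameters (mean, sd) were estimated from the data, hence ddof = 2.
chi2, p = stats.chisquare(observed, expected, ddof=2)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")  # large p: no evidence against normality
```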
Feenstra, Heleen E M; Murre, Jaap M J; Vermeulen, Ivar E; Kieffer, Jacobien M; Schagen, Sanne B
2018-04-01
To facilitate large-scale assessment of a variety of cognitive abilities in clinical studies, we developed a self-administered online neuropsychological test battery: the Amsterdam Cognition Scan (ACS). In a group of adult cancer patients, the current studies evaluate the test-retest reliability of the ACS, the influence of test setting (home or hospital), and the relationship between our online and a traditional test battery (concurrent validity). Test-retest reliability was studied in 96 cancer patients (57 female; M age = 51.8 years) who completed the ACS twice. Intraclass correlation coefficients (ICCs) were used to assess consistency over time. The test setting was counterbalanced between home and hospital; its influence on test performance was assessed by repeated measures analyses of variance. Concurrent validity was studied in 201 cancer patients (112 female; M age = 53.5 years) who completed both the online and an equivalent traditional neuropsychological test battery. Spearman or Pearson correlations were used to assess consistency between online and traditional tests. ICCs of the online tests ranged from .29 to .76, with an ICC of .78 for the ACS total score. These correlations are generally comparable with the test-retest correlations of the traditional tests as reported in the literature. Correlating online and traditional test scores, we observed medium to large concurrent validity (r/ρ = .42 to .70; total score r = .78), except for a visuospatial memory test (ρ = .36). Correlations were affected, as expected, by design differences between online tests and their offline counterparts. Although development and optimization of the ACS is an ongoing process and reliability can be improved for several tests, our results indicate that it is a highly usable tool for obtaining online measures of various cognitive abilities. The ACS is expected to facilitate efficient gathering of data on cognitive functioning in the near future.
Content validity and reliability of test of gross motor development in Chilean children
Cano-Cappellacci, Marcelo; Leyton, Fernanda Aleitte; Carreño, Joshua Durán
2016-01-01
OBJECTIVE To validate a Spanish version of the Test of Gross Motor Development (TGMD-2) for the Chilean population. METHODS Descriptive, transversal, non-experimental validity and reliability study. Four translators, three experts and 92 Chilean children, aged five to 10 years, students at a primary school in Santiago, Chile, participated. The committee of experts carried out translation, back-translation and revision processes to determine the translinguistic equivalence and content validity of the test, using the content validity index, in 2013. In addition, a pilot implementation was carried out to determine test reliability in Spanish, using the intraclass correlation coefficient and the Bland-Altman method. We evaluated whether the results presented significant differences when the bat was replaced with a racket, using the t-test. RESULTS We obtained a content validity index higher than 0.80 for language clarity and relevance of the TGMD-2 for children. There were significant differences in the object control subtest when comparing the results with bat and racket. The intraclass correlation coefficient for inter-rater, intra-rater and test-retest reliability was greater than 0.80 in all cases. CONCLUSIONS The TGMD-2 has appropriate content validity to be applied in the Chilean population. The reliability of this test is within the appropriate parameters and its use could be recommended in this population after the establishment of normative data, setting a further precedent for validation in other Latin American countries. PMID:26815160
Test Information. Using the Essay as an Assessment Technique. Set 77. Number One. Item 13.
ERIC Educational Resources Information Center
Cowie, Colin
Certain testing procedures will overcome some of the problems associated with the use of essay tests. Essay tests may not validly indicate achievement because the questions included in the test may not fairly represent instructional content. Reliability may be a problem because of variations in examinee response in different situations, in test…
Validation of the Hospital Ethical Climate Survey for older people care.
Suhonen, Riitta; Stolt, Minna; Katajisto, Jouko; Charalambous, Andreas; Olson, Linda L
2015-08-01
The exploration of the ethical climate in care settings for older people is highlighted in the literature, and it has been associated with various aspects of clinical practice and nurses' jobs. However, ethical climate is seldom studied in the older people care context. Valid, reliable, feasible measures are needed for the measurement of ethical climate. This study aimed to test the reliability, validity, and sensitivity of the Hospital Ethical Climate Survey in healthcare settings for older people. A non-experimental cross-sectional study design was employed, and a survey using questionnaires, including the Hospital Ethical Climate Survey, was used for data collection. Data were analyzed using descriptive statistics, inferential statistics, and multivariable methods. Survey data were collected from a sample of nurses working in care settings for older people in Finland (N = 1513, n = 874, response rate = 58%) in 2011. This study was conducted according to good scientific inquiry guidelines, and ethical approval was obtained from the university ethics committee. The mean score for the Hospital Ethical Climate Survey total was 3.85 (standard deviation = 0.56). Cronbach's alpha was 0.92. Principal component analysis provided evidence for factorial validity. LISREL provided evidence for construct validity based on goodness-of-fit statistics. Pearson's correlations of 0.68-0.90 were found between the sub-scales and the Hospital Ethical Climate Survey. The Hospital Ethical Climate Survey was able to discriminate across care settings and proved to be a valid and reliable tool for measuring ethical climate in care settings for older people, sensitive enough to reveal variations across clinical settings. The Finnish version of the Hospital Ethical Climate Survey, previously used mainly in hospital settings, proved to be a valid instrument for use in care settings for older people. Further studies are needed to analyze the factor structure and some items of the Hospital Ethical Climate Survey. © The Author(s) 2014.
Mohan, Alladi; Reddy, S. Aparna; Sachan, Alok; Sarma, K.V.S.; Kumar, D. Prabath; Panchagnula, Mahesh V.; Rao, P.V.L.N. Srinivasa; Kumar, B. Siddhartha; Krishnaprasanthi, P.
2016-01-01
Background & Objectives: Glycosylated haemoglobin (HbA1c) has been in use for more than a decade, as a diagnostic test for type 2 diabetes. Validity of HbA1c needs to be established in the ethnic population in which it is intended to be used. The objective of this study was to derive and validate a HbA1c cut-off value for the diagnosis of type 2 diabetes in the ethnic population of Rayalaseema area of south India. Methods: In this cross-sectional study, consecutive patients suspected to have type 2 diabetes underwent fasting plasma glucose (FPG) and 2 h post-load plasma glucose (2 h-PG) measurements after a 75 g glucose load and HbA1c estimation. They were classified as having diabetes as per the American Diabetes Association criteria [(FPG ≥7 mmol/l (≥126 mg/dl) and/or 2 h-PG ≥11.1 mmol/l (≥200 mg/dl)]. In the training data set (n = 342), optimum cut-off value of HbA1c for defining type 2 diabetes was derived by receiver-operator characteristic (ROC) curve method using oral glucose tolerance test results as gold standard. This cut-off was validated in a validation data set (n = 341). Results: On applying HbA1c cut-off value of >6.3 per cent (45 mmol/mol) to the training data set, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for diagnosing type 2 diabetes were calculated to be 90.6, 85.2, 80.8 and 93.0 per cent, respectively. When the same cut-off value was applied to the validation data set, sensitivity, specificity, PPV and NPV were 88.8, 81.9, 74.0 and 92.7 per cent, respectively, although the latter were consistently smaller than the proportions for the training data set, the differences being not significant. Interpretation & conclusions: HbA1c >6.3 per cent (45 mmol/mol) appears to be the optimal cut-off value for the diagnosis of type 2 diabetes applicable to the ethnic population of Rayalaseema area of Andhra Pradesh state in south India. PMID:27934801
Garcia Hejl, Carine; Ramirez, Jose Manuel; Vest, Philippe; Chianea, Denis; Renard, Christophe
2014-09-01
Laboratories working towards accreditation by the International Standards Organization (ISO) 15189 standard are required to demonstrate the validity of their analytical methods. The differing guidelines set by various accreditation organizations make it difficult to provide objective evidence that an in-house method is fit for the intended purpose. Besides, the required performance characteristics tests and acceptance criteria are not always detailed. The laboratory must choose the most suitable validation protocol and set the acceptance criteria. Therefore, we propose a validation protocol to evaluate the performance of an in-house method. As an example, we validated the process for the detection and quantification of lead in whole blood by electrothermal atomic absorption spectrometry. The fundamental parameters tested were selectivity, calibration model, precision, accuracy (and uncertainty of measurement), contamination, stability of the sample, reference interval, and analytical interference. We have developed a protocol that has been applied successfully to quantify lead in whole blood by electrothermal atomic absorption spectrometry (ETAAS). In particular, our method is selective, linear, accurate, and precise, making it suitable for use in routine diagnostics.
Chen, Hongda; Werner, Simone; Butt, Julia; Zörnig, Inka; Knebel, Phillip; Michel, Angelika; Eichmüller, Stefan B; Jäger, Dirk; Waterboer, Tim; Pawlita, Michael; Brenner, Hermann
2016-03-29
Novel blood-based screening tests are strongly desirable for early detection of colorectal cancer (CRC). We aimed to identify and evaluate autoantibodies against tumor-associated antigens as biomarkers for early detection of CRC. 380 clinically identified CRC patients and samples of participants with selected findings from a cohort of screening colonoscopy participants in 2005-2013 (N=6826) were included in this analysis. Sixty-four serum autoantibody markers were measured by multiplex bead-based serological assays. A two-step approach with selection of biomarkers in a training set, and validation of findings in a validation set, the latter exclusively including participants from the screening setting, was applied. Anti-MAGEA4 exhibited the highest sensitivity for detecting early stage CRC and advanced adenoma. Multi-marker combinations substantially increased sensitivity at the price of a moderate loss of specificity. Anti-TP53, anti-IMPDH2, anti-MDM2 and anti-MAGEA4 were consistently included in the best-performing 4-, 5-, and 6-marker combinations. This four-marker panel yielded a sensitivity of 26% (95% CI, 13-45%) for early stage CRC at a specificity of 90% (95% CI, 83-94%) in the validation set. Notably, it also detected 20% (95% CI, 13-29%) of advanced adenomas. Taken together, the identified biomarkers could contribute to the development of a useful multi-marker blood-based test for CRC early detection.
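A minimal sketch of the two-step evaluation described above: fix the decision threshold of a combined marker score at roughly 90% specificity in the training set, then apply it unchanged to the validation set and read off the sensitivity. The combined score and all data below are synthetic placeholders, not the study's marker panel.

```python
# Sketch of setting a panel threshold at ~90% specificity on training data and
# evaluating sensitivity on an independent validation set. Data are synthetic.
import numpy as np

rng = np.random.default_rng(7)

def simulate(n_cases, n_controls):
    # "score" stands in for any combined marker score (e.g. from a logistic model).
    score = np.r_[rng.normal(1.0, 1.0, n_cases), rng.normal(0.0, 1.0, n_controls)]
    label = np.r_[np.ones(n_cases), np.zeros(n_controls)]
    return score, label

train_score, train_label = simulate(150, 300)
valid_score, valid_label = simulate(100, 250)

target_specificity = 0.90
controls = np.sort(train_score[train_label == 0])
# Threshold = the control-score quantile that leaves 10% of controls above it.
threshold = np.quantile(controls, target_specificity)

valid_pos = valid_score > threshold
sensitivity = valid_pos[valid_label == 1].mean()
specificity = (~valid_pos)[valid_label == 0].mean()
print(f"validation sensitivity {sensitivity:.2f} at specificity {specificity:.2f}")
```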
Recent NASA Wake-Vortex Flight Tests, Flow-Physics Database and Wake-Development Analysis
NASA Technical Reports Server (NTRS)
Vicroy, Dan D.; Vijgen, Paul M.; Reimer, Heidi M.; Gallegos, Joey L.; Spalart, Philippe R.
1998-01-01
A series of flight tests over the ocean of a four engine turboprop airplane in the cruise configuration have provided a data set for improved understanding of wake vortex physics and atmospheric interaction. An integrated database has been compiled for wake characterization and validation of wake-vortex computational models. This paper describes the wake-vortex flight tests, the data processing, the database development and access, and results obtained from preliminary wake-characterization analysis using the data sets.
Eye-tracking-based assessment of cognitive function in low-resource settings.
Forssman, Linda; Ashorn, Per; Ashorn, Ulla; Maleta, Kenneth; Matchado, Andrew; Kortekangas, Emma; Leppänen, Jukka M
2017-04-01
Early development of neurocognitive functions in infants can be compromised by poverty, malnutrition and lack of adequate stimulation. Optimal management of neurodevelopmental problems in infants requires assessment tools that can be used early in life, and are objective and applicable across economic, cultural and educational settings. The present study examined the feasibility of infrared eye tracking as a novel and highly automated technique for assessing visual-orienting and sequence-learning abilities as well as attention to facial expressions in young (9-month-old) infants. Techniques piloted in a high-resource laboratory setting in Finland (N=39) were subsequently field-tested in a community health centre in rural Malawi (N=40). Parents' perception of the acceptability of the method (Finland 95%, Malawi 92%) and the percentage of infants completing the whole eye-tracking test (Finland 95%, Malawi 90%) were high, and the percentage of valid test trials (Finland 69-85%, Malawi 68-73%) was satisfactory at both sites. Test completion rates were slightly higher for eye tracking (90%) than for traditional observational tests (87%) in Malawi. The predicted response pattern indicative of specific cognitive function was replicated in Malawi, but Malawian infants exhibited lower response rates and slower processing speed across tasks. High test completion rates and the replication of the predicted test patterns in a novel environment in Malawi support the feasibility of eye tracking as a technique for assessing infant development in low-resource settings. Further research is needed on the test-retest stability and predictive validity of the eye-tracking scores in low-income settings. Published by the BMJ Publishing Group Limited.
Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
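For contrast, the baseline procedure the abstract refers to (cross-validation on the training portion for parameter selection, followed by a single held-out test set for estimating generalization) can be sketched as below. This is not the authors' "cross-validation and cross-testing" scheme, only the standard approach it improves upon; the dataset and parameter grid are illustrative.

```python
# Minimal sketch of the standard workflow: cross-validation for parameter
# selection on training data, then a completely held-out test set for the
# generalization estimate.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Held-out test set: costs statistical power but keeps the estimate unbiased.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Cross-validation over the training data selects the regularization parameter C.
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

print("best C from cross-validation:", search.best_params_["C"])
print("held-out test accuracy:", search.score(X_test, y_test))
```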
X-56A MUTT: Aeroservoelastic Modeling
NASA Technical Reports Server (NTRS)
Ouellette, Jeffrey A.
2015-01-01
For the NASA X-56A program, Armstrong Flight Research Center has been developing a set of linear state-space models that integrate the flight dynamics and structural dynamics. These high-order models are needed for control design, control evaluation, and test input design. The current focus has been on developing stiff-wing models to validate the current modeling approach. The extension of the modeling approach to the flexible wings requires only a change in the structural model. Individual subsystem models (actuators, inertial properties, etc.) have been validated by component-level ground tests. Closed-loop simulation of maneuvers designed to validate the flight dynamics of these models correlates very well with flight test data. The open-loop structural dynamics are also shown to correlate well with the flight test data.
Validation of individual and aggregate global flood hazard models for two major floods in Africa.
NASA Astrophysics Data System (ADS)
Trigg, M.; Bernhofen, M.; Whyman, C.
2017-12-01
A recent intercomparison of global flood hazard models undertaken by the Global Flood Partnership shows that there is an urgent requirement to undertake more validation of the models against flood observations. As part of the intercomparison, the aggregated model dataset resulting from the project was provided as open access data. We compare the individual and aggregated flood extent outputs from the six global models and test these against two major floods on the African continent within the last decade, namely severe flooding on the Niger River in Nigeria in 2012, and on the Zambezi River in Mozambique in 2007. We test whether aggregating different numbers and combinations of models increases model fit to the observations compared with the individual model outputs. We present results that illustrate some of the challenges of comparing imperfect models with imperfect observations, and of defining the probability of a real event in order to test standard model output probabilities. Finally, we propose a collective set of open access validation flood events, with associated observational data and descriptions, that provide a standard set of tests across different climates and hydraulic conditions.
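One plausible way to score modelled flood extents against an observed extent, and to aggregate several models, is sketched below: each grid cell is treated as flooded or not, individual models are scored with hit rate, false-alarm ratio and critical success index, and an aggregate mask flags cells predicted by at least half of the models. The grids and the majority-vote rule are illustrative assumptions, not the study's actual data or aggregation method.

```python
# Sketch of binary flood-extent validation metrics and a simple model aggregation.
# All grids are random placeholders, not the Niger or Zambezi events.
import numpy as np

rng = np.random.default_rng(3)
observed = rng.random((100, 100)) < 0.2                     # observed flood mask
models = [rng.random((100, 100)) < 0.25 for _ in range(6)]  # six model masks

def fit_scores(pred, obs):
    hits = np.sum(pred & obs)
    false_alarms = np.sum(pred & ~obs)
    misses = np.sum(~pred & obs)
    return dict(hit_rate=hits / (hits + misses),
                false_alarm_ratio=false_alarms / (hits + false_alarms),
                csi=hits / (hits + misses + false_alarms))

for i, m in enumerate(models, start=1):
    print(f"model {i}:", fit_scores(m, observed))

# Aggregate: flag a cell when at least half of the models predict flooding there.
agreement = np.sum(models, axis=0)
aggregated = agreement >= 3
print("aggregate (>=3 of 6 models):", fit_scores(aggregated, observed))
```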
Bouzas-Mosquera, Alberto; Peteiro, Jesús; Broullón, Francisco J; Álvarez-García, Nemesio; Maneiro-Melón, Nicolás; Pardo-Martinez, Patricia; Sagastagoitia-Fornie, Marta; Martínez, Dolores; Yáñez, Juan C; Vázquez-Rodríguez, José Manuel
2016-08-01
Although cardiac stress testing may help establish the safety of early discharge in patients with suspected acute coronary syndromes and negative troponins, more cost-effective strategies are necessary. We aimed to develop a clinical prediction rule to safely obviate the need for cardiac stress testing in this setting. A decision rule was derived in a prospective cohort of 3001 patients with acute chest pain and negative troponins, and validated in a set of 1473 subjects. The primary end point was a composite of positive cardiac stress testing (in the absence of a subsequent negative coronary angiogram), positive coronary angiography, or any major coronary events within 3 months. A score chart was built based on 7 variables: male sex (+2), age (+1 per decade from the fifth decade), diabetes mellitus (+2), hypercholesterolemia (+1), prior coronary revascularization (+2), type of chest pain (typical angina, +5; non-specific chest pain, -3), and non-diagnostic repolarization abnormalities (+2). In the validation set, the model showed good discrimination (c statistic = 0.84; 95% confidence interval, 0.82-0.87) and calibration (Hosmer-Lemeshow goodness-of-fit test, P= .34). If stress tests were avoided in patients in the validation sample with a sum score of 0 or lower, the number of referrals would be reduced by 23.4%, yielding a negative predictive value of 98.8% (95% confidence interval, 97.0%-99.7%). This novel prediction rule based on a combination of readily available clinical characteristics may be a valuable tool to decide whether stress testing can be reliably avoided in patients with acute chest pain and negative troponins. Copyright © 2016 Elsevier Inc. All rights reserved.
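The score chart can be implemented directly from the point values listed above; the sketch below assumes one reading of the age rule (+1 for ages 40-49, +2 for 50-59, and so on) and assigns 0 points to atypical chest pain, neither of which is spelled out in the abstract. Field names are illustrative.

```python
# Sketch of the published score chart: male sex +2, +1 per decade from the fifth
# decade of age (assumed reading), diabetes +2, hypercholesterolemia +1, prior
# coronary revascularization +2, typical angina +5 / non-specific chest pain -3
# (atypical assumed 0), non-diagnostic repolarization abnormalities +2.
def chest_pain_score(age, male, diabetes, hypercholesterolemia,
                     prior_revascularization, chest_pain_type,
                     nondiagnostic_repolarization):
    score = 0
    score += 2 if male else 0
    score += max(0, age // 10 - 3)          # 40-49 -> +1, 50-59 -> +2, ...
    score += 2 if diabetes else 0
    score += 1 if hypercholesterolemia else 0
    score += 2 if prior_revascularization else 0
    score += {"typical": 5, "nonspecific": -3, "atypical": 0}[chest_pain_type]
    score += 2 if nondiagnostic_repolarization else 0
    return score

patient = dict(age=48, male=True, diabetes=False, hypercholesterolemia=True,
               prior_revascularization=False, chest_pain_type="nonspecific",
               nondiagnostic_repolarization=False)
total = chest_pain_score(**patient)
# A sum score of 0 or lower was the proposed rule for avoiding stress testing.
print(total, "-> avoid stress test" if total <= 0 else "-> refer for stress test")
```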
Computerized Numerical Control Test Item Bank.
ERIC Educational Resources Information Center
Reneau, Fred; And Others
This guide contains 285 test items for use in teaching a course in computerized numerical control. All test items were reviewed, revised, and validated by incumbent workers and subject matter instructors. Items are provided for assessing student achievement in such aspects of programming and planning, setting up, and operating machines with…
Walker, Gemma M; Carter, Tim; Aubeeluck, Aimee; Witchell, Miranda; Coad, Jane
2018-01-01
Introduction Currently, no standardised, evidence-based assessment tool for assessing immediate self-harm and suicide in acute paediatric inpatient settings exists. Aim The aim of this study is to develop and test the psychometric properties of an assessment tool that identifies immediate risk of self-harm and suicide in children and young people (10–19 years) in acute paediatric hospital settings. Methods and analysis Development phase: This phase involved a scoping review of the literature to identify and extract items from previously published suicide and self-harm risk assessment scales. Using a modified electronic Delphi approach, these items will then be rated according to their relevance for assessment of immediate suicide or self-harm risk by expert professionals. Inclusion of items will be determined by 65%–70% consensus between raters. Subsequently, a panel of expert members will convene to determine the face validity, appropriate phrasing, item order and response format for the finalised items. Psychometric testing phase: The finalised items will be tested for validity and reliability through a multicentre, psychometric evaluation. Psychometric testing will be undertaken to determine the following: internal consistency, inter-rater reliability, convergent, divergent validity and concurrent validity. Ethics and dissemination Ethical approval was provided by the National Health Service East Midlands—Derby Research Ethics Committee (17/EM/0347) and full governance clearance received by the Health Research Authority and local participating sites. Findings from this study will be disseminated to professionals and the public via peer-reviewed journal publications, popular social media and conference presentations. PMID:29654046
NASA Astrophysics Data System (ADS)
Caillat, A.; Costille, A.; Pascal, S.; Rossin, C.; Vives, S.; Foulon, B.; Sanchez, P.
2017-09-01
Dark matter and dark energy mysteries will be explored by the Euclid ESA M-class space mission which will be launched in 2020. Millions of galaxies will be surveyed through visible imagery and NIR imagery and spectroscopy in order to map in three dimensions the Universe at different evolution stages over the past 10 billion years. The massive NIR spectroscopic survey will be done efficiently by the NISP instrument thanks to the use of grisms (for "Grating pRISMs") developed under the responsibility of the LAM. In this paper, we present the verification philosophy applied to test and validate each grism before the delivery to the project. The test sequence covers a large set of verifications: optical tests to validate efficiency and WFE of the component, mechanical tests to validate the robustness to vibration, thermal tests to validate its behavior in cryogenic environment and a complete metrology of the assembled component. We show the test results obtained on the first grism Engineering and Qualification Model (EQM) which will be delivered to the NISP project in fall 2016.
A novel method to estimate the affinity of HLA-A∗0201 restricted CTL epitope
NASA Astrophysics Data System (ADS)
Xu, Yun-sheng; Lin, Yong; Zhu, Bo; Lin, Zhi-hua
2009-02-01
A set of 70 peptides with affinity for the class I MHC HLA-A∗0201 molecule was subjected to quantitative structure-affinity relationship studies based on the SCORE function with good results (r² = 0.6982, RMS = 0.280). 'Leave-one-out' cross-validation (LOO-CV) and an outer test set of 18 samples were then used to validate the QSAR model. The results of the LOO-CV were q² = 0.6188, RMS = 0.315, and the results for the outer test set were r² = 0.5633, RMS = 0.2292. All of this shows that the QSAR model has good predictability. Statistical analysis showed that hydrophobic and hydrogen bond interactions played a significant role in peptide-MHC molecule binding. The study also provided useful information for structure modification of CTL epitopes, and laid a theoretical basis for the molecular design of therapeutic vaccines.
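The leave-one-out cross-validation step used to check predictability can be sketched as follows: the model is refitted with each peptide left out in turn, the held-out affinity is predicted, and the results are summarized as q² and RMS error. The descriptors and affinities below are synthetic placeholders, not the SCORE-function descriptors of the paper.

```python
# Sketch of LOO-CV for a linear structure-affinity model: q-squared and RMS error
# from held-out predictions. Descriptors and affinities are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(5)
X = rng.normal(size=(70, 4))                      # 70 peptides, 4 descriptors (assumed)
y = X @ np.array([0.8, -0.5, 0.3, 0.1]) + rng.normal(scale=0.3, size=70)

model = LinearRegression()
y_loo = cross_val_predict(model, X, y, cv=LeaveOneOut())  # one prediction per left-out peptide

press = np.sum((y - y_loo) ** 2)                  # predictive residual sum of squares
q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
rms = np.sqrt(press / len(y))
print(f"q2 = {q2:.3f}, RMS = {rms:.3f}")
```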
Establishment of a VISAR Measurement System for Material Model Validation in DSTO
2013-02-01
advancements published in the works by L.M. Barker, R.E. Hollenbach and W.F. Hemsing [1-3], and results in the user-friendly interface and configuration of the ... VISAR system [4] used in the current work. VISAR tests are among the mandatory instrumentation techniques when validating material models and ... The present work reports on preliminary tests using the recently commissioned DSTO VISAR system, providing an assessment of the experimental set-up
An Ethical Issue Scale for Community Pharmacy Setting (EISP): Development and Validation.
Crnjanski, Tatjana; Krajnovic, Dusanka; Tadic, Ivana; Stojkov, Svetlana; Savic, Mirko
2016-04-01
Many problems that arise when providing pharmacy services may contain ethical components, and the aims of this study were to develop and validate a scale that could assess the difficulty of ethical issues, as well as the frequency of their occurrence in the everyday practice of community pharmacists. Development and validation of the scale was conducted in three phases: (1) generating items for the initial survey instrument after qualitative analysis; (2) defining the design and format of the instrument; (3) validation of the instrument. The constructed Ethical Issue Scale for community pharmacy setting has two parts containing the same 16 items for assessing the difficulty and frequency of ethical issues. The results of the 171 completely filled out scales were analyzed (response rate 74.89%). The Cronbach's α value for the part of the instrument that examines the difficulty of ethical situations was 0.83, and for the part that examines their frequency it was 0.84. Test-retest reliability for both parts of the instrument was satisfactory, with all intraclass correlation coefficient (ICC) values above 0.6 (for the part that examines difficulty, ICC = 0.809; for the part that examines frequency, ICC = 0.929). The 16-item scale, as a self-assessment tool, demonstrated a high degree of content, criterion, and construct validity and test-retest reliability. The results support its use as a research tool to assess the difficulty and frequency of ethical issues in the community pharmacy setting. The validated scale needs to be further employed on a larger sample of pharmacists.
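The internal-consistency figure reported above (Cronbach's α) can be reproduced from a respondents-by-items matrix with the standard formula, as sketched below; the simulated 171 x 16 response matrix is a placeholder for the actual scale data.

```python
# Sketch of Cronbach's alpha from a respondents-by-items rating matrix.
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

rng = np.random.default_rng(11)
latent = rng.normal(size=(171, 1))   # shared trait per respondent drives all 16 items
responses = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(171, 16))), 1, 5)
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```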
Wygant, Dustin B; Ben-Porath, Yossef S; Arbisi, Paul A; Berry, David T R; Freeman, David B; Heilbronner, Robert L
2009-11-01
The current study examined the effectiveness of the MMPI-2 Restructured Form (MMPI-2-RF; Ben-Porath and Tellegen, 2008) over-reporting indicators in civil forensic settings. The MMPI-2-RF includes three revised MMPI-2 over-reporting validity scales and a new scale to detect over-reported somatic complaints. Participants dissimulated medical and neuropsychological complaints in two simulation samples, and a known-groups sample used symptom validity tests as a response bias criterion. Results indicated large effect sizes for the MMPI-2-RF validity scales, including a Cohen's d of .90 for Fs in a head injury simulation sample, 2.31 for FBS-r, 2.01 for F-r, and 1.97 for Fs in a medical simulation sample, and 1.45 for FBS-r and 1.30 for F-r in identifying poor effort on SVTs. Classification results indicated good sensitivity and specificity for the scales across the samples. This study indicates that the MMPI-2-RF over-reporting validity scales are effective at detecting symptom over-reporting in civil forensic settings.
Vlieg-Boerstra, Berber J; Bijleveld, Charles M A; van der Heide, Sicco; Beusekamp, Berta J; Wolt-Plompen, Saskia A A; Kukler, Jeanet; Brinkman, Joep; Duiverman, Eric J; Dubois, Anthony E J
2004-02-01
The use of double-blind, placebo-controlled food challenges (DBPCFCs) is considered the gold standard for the diagnosis of food allergy. Despite this, materials and methods used in DBPCFCs have not been standardized. The purpose of this study was to develop and validate recipes for use in DBPCFCs in children by using allergenic foods, preferably in their usual edible form. Recipes containing milk, soy, cooked egg, raw whole egg, peanut, hazelnut, and wheat were developed. For each food, placebo and active test food recipes were developed that met the requirements of acceptable taste, allowance of a challenge dose high enough to elicit reactions in an acceptable volume, optimal matrix ingredients, and good matching of sensory properties of placebo and active test food recipes. Validation was conducted on the basis of sensory tests for difference by using the triangle test and the paired comparison test. Recipes were first tested by volunteers from the hospital staff and subsequently by a professional panel of food tasters in a food laboratory designed for sensory testing. Recipes were considered to be validated if no statistically significant differences were found. Twenty-seven recipes were developed and found to be valid by the volunteer panel. Of these 27 recipes, 17 could be validated by the professional panel. Sensory testing with appropriate statistical analysis allows for objective validation of challenge materials. We recommend the use of professional tasters in the setting of a food laboratory for best results.
Spectral signature verification using statistical analysis and text mining
NASA Astrophysics Data System (ADS)
DeCoster, Mallory E.; Firpi, Alexe H.; Jacobs, Samantha K.; Cone, Shelli R.; Tzeng, Nigel H.; Rodriguez, Benjamin M.
2016-05-01
In the spectral science community, numerous spectral signatures are stored in databases representative of many sample materials collected from a variety of spectrometers and spectroscopists. Due to the variety and variability of the spectra that comprise many spectral databases, it is necessary to establish a metric for validating the quality of spectral signatures. This has been an area of great discussion and debate in the spectral science community. This paper discusses a method that independently validates two different aspects of a spectral signature to arrive at a final qualitative assessment: the textual meta-data and the numerical spectral data. Results associated with the spectral data stored in the Signature Database (SigDB) [1] are presented. The numerical data comprising a sample material's spectrum is validated based on statistical properties derived from an ideal population set. The quality of the test spectrum is ranked based on a spectral angle mapper (SAM) comparison to the mean spectrum derived from the population set. Additionally, the contextual data of a test spectrum is qualitatively analyzed using lexical analysis text mining. This technique analyzes the syntax of the meta-data to identify local learning patterns and trends within the spectral data that are indicative of the test spectrum's quality. Text mining applications have successfully been implemented for security [2] (text encryption/decryption), biomedical [3], and marketing [4] applications. The text mining lexical analysis algorithm is trained on the meta-data patterns of a subset of high- and low-quality spectra, in order to have a model to apply to the entire SigDB data set. The statistical and textual methods combine to assess the quality of a test spectrum existing in a database without the need of an expert user. This method has been compared to other validation methods accepted by the spectral science community, and has provided promising results when a baseline spectral signature is present for comparison. The spectral validation method proposed is described from both a practical application and an analytical perspective.
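The numerical-validation step described above can be sketched as a spectral angle mapper (SAM) comparison between a test spectrum and the mean spectrum of a reference population; a small angle indicates a close match. The spectra below are synthetic placeholders, not SigDB data.

```python
# Sketch of spectral angle mapper (SAM) quality ranking against a population mean.
import numpy as np

def spectral_angle(a, b):
    """Spectral angle (radians) between two spectra of equal length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

rng = np.random.default_rng(2)
wavelengths = np.linspace(400, 2500, 500)               # nm, assumed sampling
reference = np.exp(-((wavelengths - 1200) / 400) ** 2)  # idealised population shape
population = reference + rng.normal(scale=0.01, size=(30, 500))
mean_spectrum = population.mean(axis=0)

test_spectrum = reference + rng.normal(scale=0.05, size=500)
angle = spectral_angle(test_spectrum, mean_spectrum)
print(f"SAM angle = {angle:.4f} rad")                   # rank/threshold this for quality
```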
A Supervised Learning Process to Validate Online Disease Reports for Use in Predictive Models.
Patching, Helena M M; Hudson, Laurence M; Cooke, Warrick; Garcia, Andres J; Hay, Simon I; Roberts, Mark; Moyes, Catherine L
2015-12-01
Pathogen distribution models that predict spatial variation in disease occurrence require data from a large number of geographic locations to generate disease risk maps. Traditionally, this process has used data from public health reporting systems; however, using online reports of new infections could speed up the process dramatically. Data from both public health systems and online sources must be validated before they can be used, but no mechanisms exist to validate data from online media reports. We have developed a supervised learning process to validate geolocated disease outbreak data in a timely manner. The process uses three input features: the data source and two metrics derived from the location of each disease occurrence. The location of each occurrence provides two pieces of information: the probability of disease occurrence at that location, based on environmental and socioeconomic factors, and the distance within or outside the current known disease extent. The process also uses validation scores, generated by disease experts who review a subset of the data, to build a training data set. The aim of the supervised learning process is to generate validation scores that can be used as weights going into the pathogen distribution model. After analyzing the three input features and testing the performance of alternative processes, we selected a cascade of ensembles comprising logistic regressors. Parameter values for the training data subset size, number of predictors, and number of layers in the cascade were tested before the process was deployed. The final configuration was tested using data for two contrasting diseases (dengue and cholera), and 66%-79% of data points were assigned a validation score. The remaining data points are scored by the experts, and the results inform the training data set for the next set of predictors, as well as going to the pathogen distribution model. The new supervised learning process has been implemented within our live site and is being used to validate the data that our system uses to produce updated predictive disease maps on a weekly basis.
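The supervised scoring idea can be sketched with a single logistic regressor over the three named input features (data source, environmental suitability at the report location, and distance inside or outside the known extent), trained on an expert-scored subset and applied to the remaining reports. The paper's actual system is a cascade of ensembles of such regressors; the single model and all feature values below are simplifying assumptions.

```python
# Sketch of logistic-regression scoring of disease reports using the three
# input features named above. All values are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 500
source = rng.integers(0, 2, n)                 # 0 = online media, 1 = public health system
suitability = rng.random(n)                    # modelled probability of occurrence there
distance_km = rng.normal(0, 200, n)            # negative = inside the known extent
X = np.column_stack([source, suitability, distance_km])

# Expert validation scores (True = valid report) for a reviewed training subset.
train_idx = rng.choice(n, size=150, replace=False)
y_train = (0.5 * source[train_idx] + suitability[train_idx]
           - 0.002 * distance_km[train_idx] + rng.normal(0, 0.3, 150)) > 0.6

clf = LogisticRegression().fit(X[train_idx], y_train)
validation_scores = clf.predict_proba(X)[:, 1]  # weights fed to the distribution model
print("example scores:", np.round(validation_scores[:5], 2))
```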
MABAL: a Novel Deep-Learning Architecture for Machine-Assisted Bone Age Labeling.
Mutasa, Simukayi; Chang, Peter D; Ruzal-Shapiro, Carrie; Ayyala, Rama
2018-02-05
Bone age assessment (BAA) is a commonly performed diagnostic study in pediatric radiology to assess skeletal maturity. The most commonly utilized method for assessment of BAA is the Greulich and Pyle method (Pediatr Radiol 46.9:1269-1274, 2016; Arch Dis Child 81.2:172-173, 1999) atlas. The evaluation of BAA can be a tedious and time-consuming process for the radiologist. As such, several computer-assisted detection/diagnosis (CAD) methods have been proposed for automation of BAA. Classical CAD tools have traditionally relied on hard-coded algorithmic features for BAA which suffer from a variety of drawbacks. Recently, the advent and proliferation of convolutional neural networks (CNNs) has shown promise in a variety of medical imaging applications. There have been at least two published applications of using deep learning for evaluation of bone age (Med Image Anal 36:41-51, 2017; JDI 1-5, 2017). However, current implementations are limited by a combination of both architecture design and relatively small datasets. The purpose of this study is to demonstrate the benefits of a customized neural network algorithm carefully calibrated to the evaluation of bone age utilizing a relatively large institutional dataset. In doing so, this study will aim to show that advanced architectures can be successfully trained from scratch in the medical imaging domain and can generate results that outperform any existing proposed algorithm. The training data consisted of 10,289 images of different skeletal age examinations, 8909 from the hospital Picture Archiving and Communication System at our institution and 1383 from the public Digital Hand Atlas Database. The data was separated into four cohorts, one each for male and female children above the age of 8, and one each for male and female children below the age of 10. The testing set consisted of 20 radiographs of each 1-year-age cohort from 0 to 1 years to 14-15+ years, half male and half female. The testing set included left-hand radiographs done for bone age assessment, trauma evaluation without significant findings, and skeletal surveys. A 14 hidden layer-customized neural network was designed for this study. The network included several state of the art techniques including residual-style connections, inception layers, and spatial transformer layers. Data augmentation was applied to the network inputs to prevent overfitting. A linear regression output was utilized. Mean square error was used as the network loss function and mean absolute error (MAE) was utilized as the primary performance metric. MAE accuracies on the validation and test sets for young females were 0.654 and 0.561 respectively. For older females, validation and test accuracies were 0.662 and 0.497 respectively. For young males, validation and test accuracies were 0.649 and 0.585 respectively. Finally, for older males, validation and test set accuracies were 0.581 and 0.501 respectively. The female cohorts were trained for 900 epochs each and the male cohorts were trained for 600 epochs. An eightfold cross-validation set was employed for hyperparameter tuning. Test error was obtained after training on a full data set with the selected hyperparameters. Using our proposed customized neural network architecture on our large available data, we achieved an aggregate validation and test set mean absolute errors of 0.637 and 0.536 respectively. To date, this is the best published performance on utilizing deep learning for bone age assessment. 
Our results support our initial hypothesis that customized, purpose-built neural networks provide improved performance over networks derived from pre-trained imaging data sets. We build on that initial work by showing that the addition of state-of-the-art techniques such as residual connections and inception architecture further improves prediction accuracy. This is important because the current assumption for use of residual and/or inception architectures is that a large pre-trained network is required for successful implementation given the relatively small datasets in medical imaging. Instead we show that a small, customized architecture incorporating advanced CNN strategies can indeed be trained from scratch, yielding significant improvements in algorithm accuracy. It should be noted that for all four cohorts, testing error outperformed validation error. One reason for this is that our ground truth for our test set was obtained by averaging two pediatric radiologist reads compared to our training data for which only a single read was used. This suggests that despite relatively noisy training data, the algorithm could successfully model the variation between observers and generate estimates that are close to the expected ground truth.
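A minimal sketch (not the MABAL network itself) of the ingredients described above is given below: a residual-style convolutional block, a single linear regression output for skeletal age, mean squared error as the training loss, and mean absolute error as the reported metric. Input size, channel counts and data are placeholders.

```python
# Sketch of a small regression CNN with a residual block, MSE training loss and
# MAE metric. This illustrates the ingredients named in the abstract, not MABAL.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(channels), nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)            # residual (skip) connection

class BoneAgeRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 16, 7, stride=2, padding=3),
                                  nn.BatchNorm2d(16), nn.ReLU())
        self.blocks = nn.Sequential(ResidualBlock(16), ResidualBlock(16))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(16, 1))   # linear regression output (years)

    def forward(self, x):
        return self.head(self.blocks(self.stem(x))).squeeze(1)

model = BoneAgeRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse, mae = nn.MSELoss(), nn.L1Loss()

images = torch.randn(8, 1, 128, 128)          # placeholder radiograph batch
ages = torch.rand(8) * 15                     # placeholder skeletal ages in years

loss = mse(model(images), ages)               # MSE drives the training step
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("batch MAE:", mae(model(images), ages).item())
```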
Development of the Online Assessment of Athletic Training Education (OAATE) Instrument
ERIC Educational Resources Information Center
Carr, W. David; Frey, Bruce B.; Swann, Elizabeth
2009-01-01
Objective: To establish the validity and reliability of an online assessment instrument's items developed to track educational outcomes over time. Design and Setting: A descriptive study of the validation arguments and reliability testing of the assessment items. The instrument is available to graduating students enrolled in entry-level Athletic…
Instrument Development and Validation of the Infant and Toddler Assessment for Quality Improvement
ERIC Educational Resources Information Center
Perlman, Michal; Brunsek, Ashley; Hepditch, Anne; Gray, Karen; Falenchuck, Olesya
2017-01-01
Research Findings: There is a growing need for accurate and efficient measures of classroom quality in early childhood education and care (ECEC) settings. Observational measures are costly, as their administration generally takes 3-5 hr per classroom. This article outlines the process of development and preliminary concurrent validity testing of…
The development and validation of the Closed-set Mandarin Sentence (CMS) test.
Tao, Duo-Duo; Fu, Qian-Jie; Galvin, John J; Yu, Ya-Feng
2017-09-01
Matrix-styled sentence tests offer a closed-set paradigm that may be useful when evaluating speech intelligibility. Ideally, sentence test materials should reflect the distribution of phonemes within the target language. We developed and validated the Closed-set Mandarin Sentence (CMS) test to assess Mandarin speech intelligibility in noise. CMS test materials were selected to be familiar words and to represent the natural distribution of vowels, consonants, and lexical tones found in Mandarin Chinese. Ten key words in each of five categories (Name, Verb, Number, Color, and Fruit) were produced by a native Mandarin talker, resulting in a total of 50 words that could be combined to produce 100,000 unique sentences. Normative data were collected in 10 normal-hearing, adult Mandarin-speaking Chinese listeners using a closed-set test paradigm. Two test runs were conducted for each subject, and 20 sentences per run were randomly generated while ensuring that each word was presented only twice in each run. First, the levels of the words in each category were adjusted to produce equal intelligibility in noise. Test-retest reliability for word-in-sentence recognition was excellent according to Cronbach's alpha (0.952). After the category level adjustments, speech reception thresholds (SRTs) for sentences in noise, defined as the signal-to-noise ratio (SNR) that produced 50% correct whole-sentence recognition, were adaptively measured by adjusting the SNR according to the correctness of the response. The mean SRT was -7.9 (SE=0.41) and -8.1 (SE=0.34) dB for runs 1 and 2, respectively. The mean standard deviation across runs was 0.93 dB, and paired t-tests showed no significant difference between runs 1 and 2 (p=0.74), despite random sentences being generated for each run and each subject. The results suggest that the CMS provides a large stimulus set with which to repeatedly and reliably measure Mandarin-speaking listeners' speech understanding in noise using a closed-set paradigm.
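The adaptive SRT measurement is only loosely specified above ("adjusting the SNR according to the correctness of response"); one common realisation is a 1-up/1-down staircase that converges on the 50%-correct point, sketched below with a simulated listener. The step size, stopping rule and psychometric slope are assumptions, not the CMS procedure.

```python
# Sketch of a 1-up/1-down adaptive SNR track converging on the 50%-correct SRT,
# using a simulated listener with a logistic psychometric function.
import numpy as np

rng = np.random.default_rng(6)
true_srt_db = -8.0                      # SNR giving 50% whole-sentence recognition
slope = 1.0                             # psychometric slope (per dB), assumed

def listener_correct(snr_db):
    p = 1.0 / (1.0 + np.exp(-slope * (snr_db - true_srt_db)))
    return rng.random() < p

snr = 0.0                               # starting SNR in dB (assumed)
step = 2.0                              # dB step size (assumed)
reversal_snrs, last_direction = [], None
while len(reversal_snrs) < 8:
    direction = -1 if listener_correct(snr) else +1   # harder after correct, easier after error
    if last_direction is not None and direction != last_direction:
        reversal_snrs.append(snr)                     # record reversal points
    snr += direction * step
    last_direction = direction

print(f"estimated SRT = {np.mean(reversal_snrs):.1f} dB SNR")
```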
A framework for the design and development of physical employment tests and standards.
Payne, W; Harvey, J
2010-07-01
Because operational tasks in the uniformed services (military, police, fire and emergency services) are physically demanding and incur the risk of injury, employment policy in these services is usually competency based and predicated on objective physical employment standards (PESs) based on physical employment tests (PETs). In this paper, a comprehensive framework for the design of PETs and PESs is presented. Three broad approaches to physical employment testing are described and compared: generic predictive testing; task-related predictive testing; task simulation testing. Techniques for the selection of a set of tests with good coverage of job requirements, including job task analysis, physical demands analysis and correlation analysis, are discussed. Regarding individual PETs, theoretical considerations including measurability, discriminating power, reliability and validity, and practical considerations, including development of protocols, resource requirements, administrative issues and safety, are considered. With regard to the setting of PESs, criterion referencing and norm referencing are discussed. STATEMENT OF RELEVANCE: This paper presents an integrated and coherent framework for the development of PESs and hence provides a much needed theoretically based but practically oriented guide for organisations seeking to establish valid and defensible PESs.
NASA Technical Reports Server (NTRS)
Wolf, S. W. D.
1984-01-01
Self streamlining two dimensional flexible walled test sections eliminate the uncertainties found in data from conventional test sections particularly at transonic speeds. The test section sidewalls are rigid, while the floor and ceiling are flexible and are positioned to streamline shapes by a system of jacks, without reference to the model. The walls are therefore self streamlining. Data are taken from the model when the walls are good streamlines such that the inevitable residual wall induced interference is acceptably small and correctable. Successful two dimensional validation testing at low speeds has led to the development of a new transonic flexible walled test section. Tunnel setting times are minimized by the development of a rapid wall setting strategy coupled with on line computer control of wall shapes using motorized jacks. Two dimensional validation testing using symmetric and cambered aerofoils in the Mach number range up to about 0.85 where the walls are just supercritical, shows good agreement with reference data using small height-chord ratios between 1.5 and unity.
Woitha, Kathrin; Van Beek, Karen; Ahmed, Nisar; Jaspers, Birgit; Mollard, Jean M; Ahmedzai, Sam H; Hasselaar, Jeroen; Menten, Johan; Vissers, Kris; Engels, Yvonne
2014-02-01
Validated quality indicators can help health-care professionals to evaluate their medical practices in a comparative manner to deliver optimal clinical care. No international set of quality indicators to measure the organizational aspects of palliative care settings exists. To develop and validate a set of structure and process indicators for palliative care settings in Europe, a two-round modified RAND Delphi process was conducted to rate the clarity and usefulness of a previously developed set of 110 quality indicators. In total, 20 multi-professional palliative care teams from centers of excellence in seven European countries participated. In total, 56 quality indicators were rated as useful. These valid quality indicators concerned the following domains: the definition of a palliative care service (2 quality indicators), accessibility to palliative care (16 quality indicators), specific infrastructure to deliver palliative care (8 quality indicators), symptom assessment tools (1 quality indicator), specific personnel in palliative care services (9 quality indicators), documentation methodology of clinical data (14 quality indicators), evaluation of quality and safety procedures (1 quality indicator), reporting of clinical activities (1 quality indicator), and education in palliative care (4 quality indicators). The modified RAND Delphi process resulted in 56 international face-validated quality indicators to measure and compare organizational aspects of palliative care. These quality indicators, aimed at assessing and improving the organization of palliative care, will be pilot tested in palliative care settings all over Europe and used in the EU FP7 funded IMPACT project.
Evaluation of a panel of 28 biomarkers for the non-invasive diagnosis of endometriosis.
Vodolazkaia, A; El-Aalamat, Y; Popovic, D; Mihalyi, A; Bossuyt, X; Kyama, C M; Fassbender, A; Bokor, A; Schols, D; Huskens, D; Meuleman, C; Peeraer, K; Tomassetti, C; Gevaert, O; Waelkens, E; Kasran, A; De Moor, B; D'Hooghe, T M
2012-09-01
At present, the only way to conclusively diagnose endometriosis is laparoscopic inspection, preferably with histological confirmation. This contributes to the delay in the diagnosis of endometriosis, which is 6-11 years. So far, non-invasive diagnostic approaches such as ultrasound (US), MRI or blood tests do not have sufficient diagnostic power. Our aim was to develop and validate a non-invasive diagnostic test with a high sensitivity (80% or more) for symptomatic endometriosis patients without US evidence of endometriosis, since this is the group most in need of a non-invasive test. A total of 28 inflammatory and non-inflammatory plasma biomarkers were measured in 353 EDTA plasma samples collected at surgery from 121 controls without endometriosis at laparoscopy and from 232 women with endometriosis (minimal-mild n = 148; moderate-severe n = 84), including 175 women without preoperative US evidence of endometriosis. Surgery was done during the menstrual (n = 83), follicular (n = 135) and luteal (n = 135) phases of the menstrual cycle. For analysis, the data were randomly divided into an independent training (n = 235) and a test (n = 118) data set. Statistical analysis was done using univariate and multivariate (logistic regression and least squares support vector machines (LS-SVM)) approaches in the training and test data sets separately to validate our findings. In the training set, two models of four biomarkers (Model 1: annexin V, VEGF, CA-125 and glycodelin; Model 2: annexin V, VEGF, CA-125 and sICAM-1), analysed in plasma obtained during the menstrual phase, could predict US-negative endometriosis with a high sensitivity (81-90%) and an acceptable specificity (68-81%). The same two models predicted US-negative endometriosis in the independent validation test set with a high sensitivity (82%) and an acceptable specificity (63-75%). In plasma samples obtained during menstruation, multivariate analysis of four biomarkers (annexin V, VEGF, CA-125 and sICAM-1/or glycodelin) enabled the diagnosis of endometriosis undetectable by US with a sensitivity of 81-90% and a specificity of 63-81% in independent training and test data sets. The next step is to apply these models for preoperative prediction of endometriosis in an independent set of patients with infertility and/or pain without US evidence of endometriosis, scheduled for laparoscopy.
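A minimal sketch of the validation workflow described above: fit a multivariate logistic regression model on a training split and report sensitivity and specificity on a held-out test split. Only the 353/235/118 sample sizes are taken from the abstract; the feature values, outcome labels and biomarker names in the comments are simulated placeholders, and this is not the LS-SVM model used in the study.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 353
# Four hypothetical biomarker columns (standing in for e.g. annexin V, VEGF,
# CA-125 and glycodelin); values and outcome labels are simulated.
X = rng.normal(size=(n, 4))
w = np.array([1.0, 0.8, 1.2, 0.5])
y = (X @ w + rng.normal(0, 1.0, n) > 0).astype(int)    # 1 = endometriosis

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=118, random_state=0)               # 235 training / 118 test

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print(f"sensitivity = {tp / (tp + fn):.2f}, specificity = {tn / (tn + fp):.2f}")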
Mertens, H W; Milburn, N J; Collins, W E
2000-12-01
Two practical color vision tests were developed and validated for use in screening Air Traffic Control Specialist (ATCS) applicants for work at en route center or terminal facilities. The development of the tests involved careful reproduction/simulation of color-coded materials from the most demanding, safety-critical color task performed in each type of facility. The tests were evaluated using 106 subjects with normal color vision and 85 with color vision deficiency. The en route center test, named the Flight Progress Strips Test (FPST), required the identification of critical red/black coding in computer printing and handwriting on flight progress strips. The terminal option test, named the Aviation Lights Test (ALT), simulated red/green/white aircraft lights that must be identified in night ATC tower operations. Color-coding is a non-redundant source of safety-critical information in both tasks. The FPST was validated by direct comparison of responses to strip reproductions with responses to the original flight progress strips and a set of strips selected independently. Validity was high; Kappa = 0.91 with original strips as the validation criterion and 0.86 with different strips. The light point stimuli of the ALT were validated physically with a spectroradiometer. The reliabilities of the FPST and ALT were estimated with Cronbach's alpha as 0.93 and 0.98, respectively. The high job-relevance, validity, and reliability of these tests increase the effectiveness and fairness of ATCS color vision testing.
2014-03-01
Behçet's disease (BD) is a chronic, relapsing, inflammatory vascular disease with no pathognomonic test. Low sensitivity of the currently applied International Study Group (ISG) clinical diagnostic criteria led to their reassessment. An International Team for the Revision of the International Criteria for BD (from 27 countries) submitted data from 2556 clinically diagnosed BD patients and 1163 controls with BD-mimicking diseases or presenting at least one major BD sign. These were randomly divided into training and validation sets. Logistic regression, 'leave-one-country-out' cross-validation and clinical judgement were employed to develop new International Criteria for BD (ICBD) with the training data. Existing and new criteria were tested for their performance in the validation set. For the ICBD, ocular lesions, oral aphthosis and genital aphthosis are each assigned 2 points, while skin lesions, central nervous system involvement and vascular manifestations are assigned 1 point each. The pathergy test, when used, was assigned 1 point. A patient scoring ≥4 points is classified as having BD. In the training set, 93.9% sensitivity and 92.1% specificity were assessed, compared with 81.2% sensitivity and 95.9% specificity for the ISG criteria. In the validation set, the ICBD demonstrated an unbiased estimate of sensitivity of 94.8% (95% CI: 93.4-95.9%), considerably higher than that of the ISG criteria (85.0%). Specificity (90.5%, 95% CI: 87.9-92.8%) was lower than that of the ISG criteria (96.0%), yet still reasonably high. For countries in which at least 90% of cases and controls had a pathergy test, adding 1 point for the pathergy test increased the estimate of sensitivity from 95.5% to 98.5%, while barely reducing specificity from 92.1% to 91.6%. The new proposed criteria, derived from multinational data, exhibit much improved sensitivity over the ISG criteria while maintaining reasonable specificity. It is proposed that the ICBD criteria be adopted both as a guide for diagnosis and classification of BD. © 2013 The Authors Journal of the European Academy of Dermatology and Venereology © 2013 European Academy of Dermatology and Venereology.
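The point scheme above translates directly into a small scoring function. The sketch below assumes boolean inputs for each clinical sign; the function names are ours, not part of the published criteria.

# ICBD point scheme as stated in the abstract: ocular lesions, oral aphthosis
# and genital aphthosis score 2 points each; skin lesions, CNS involvement and
# vascular manifestations score 1 point each; a positive pathergy test (when
# used) adds 1 point; a total score of 4 or more classifies the patient as BD.
def icbd_score(ocular, oral, genital, skin, cns, vascular, pathergy=False):
    score = 2 * (ocular + oral + genital) + (skin + cns + vascular)
    return score + (1 if pathergy else 0)

def classified_as_bd(**signs):
    return icbd_score(**signs) >= 4

# Example: oral plus genital aphthosis alone already reach the 4-point threshold.
print(classified_as_bd(ocular=False, oral=True, genital=True,
                       skin=False, cns=False, vascular=False))   # True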
Blum, David; Rosa, Daniel; deWolf-Linder, Susanne; Hayoz, Stefanie; Ribi, Karin; Koeberle, Dieter; Strasser, Florian
2014-12-01
Oncologists perform a range of pharmacological and nonpharmacological interventions to manage the symptoms of outpatients with advanced cancer. The aim of this study was to develop and test a symptom management performance checklist (SyMPeC) to review medical charts. First, the content of the checklist was determined by consensus of an interprofessional team. The SyMPeC was tested using the data set of the SAKK 96/06 E-MOSAIC (Electronical Monitoring of Symptoms and Syndromes Associated with Cancer) trial, which included six consecutive visits from 247 patients. In a test data set (half of the data) of medical charts, two people extracted and quantified the definitions of the parameters (content validity). To assess the inter-rater reliability, three independent researchers used the SyMPeC on a random sample (10% of the test data set), and Fleiss's kappa was calculated. To test external validity, the interventions retrieved by the SyMPeC chart review were compared with nurse-led assessment of patient-perceived oncologists' palliative interventions. Five categories of symptoms were included: pain, fatigue, anorexia/nausea, dyspnea, and depression/anxiety. Interventions were categorized as symptom specific or symptom unspecific. In the test data set of 123 patients, 402 unspecific and 299 symptom-specific pharmacological interventions were detected. Nonpharmacological interventions (n = 242) were mostly symptom unspecific. Fleiss's kappa for symptom and intervention detections was K = 0.7 and K = 0.86, respectively. In 1003 of 1167 visits (86%), there was a match between SyMPeC and nurse-led assessment. Seventy-nine percent (195 of 247) of patients had no or one mismatch. Chart review by SyMPeC seems reliable to detect symptom management interventions by oncologists in outpatient clinics. Nonpharmacological interventions were less symptom specific. A template for documentation is needed for standardization. Copyright © 2014 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
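As an illustration of the agreement statistic used for the three independent raters above, the sketch below computes Fleiss' kappa with statsmodels on simulated chart ratings; the number of excerpts, the agreement rate and the two-category coding are assumptions for the example only.

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(1)
truth = rng.integers(0, 2, size=25)      # 25 chart excerpts, 2 categories
# Three raters who each agree with the chart content about 85% of the time.
ratings = np.column_stack([np.where(rng.random(25) < 0.85, truth, 1 - truth)
                           for _ in range(3)])

table, _ = aggregate_raters(ratings)     # subjects x categories count table
print(f"Fleiss' kappa = {fleiss_kappa(table):.2f}")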
Dragomir-Daescu, Dan; Buijs, Jorn Op Den; McEligot, Sean; Dai, Yifei; Entwistle, Rachel C.; Salas, Christina; Melton, L. Joseph; Bennet, Kevin E.; Khosla, Sundeep; Amin, Shreyasee
2013-01-01
Clinical implementation of quantitative computed tomography-based finite element analysis (QCT/FEA) of proximal femur stiffness and strength to assess the likelihood of proximal femur (hip) fractures requires a unified modeling procedure, consistency in predicting bone mechanical properties, and validation with realistic test data that represent typical hip fractures, specifically, a sideways fall on the hip. We, therefore, used two sets (n = 9, each) of cadaveric femora with bone densities varying from normal to osteoporotic to build, refine, and validate a new class of QCT/FEA models for hip fracture under loading conditions that simulate a sideways fall on the hip. Convergence requirements of finite element models of the first set of femora led to the creation of a new meshing strategy and a robust process to model proximal femur geometry and material properties from QCT images. We used a second set of femora to cross-validate the model parameters derived from the first set. Refined models were validated experimentally by fracturing femora using specially designed fixtures, load cells, and high speed video capture. CT image reconstructions of fractured femora were created to classify the fractures. The predicted stiffness (cross-validation R2 = 0.87), fracture load (cross-validation R2 = 0.85), and fracture patterns (83% agreement) correlated well with experimental data. PMID:21052839
ERIC Educational Resources Information Center
Meeker, Mary; Meeker, Robert
In this analysis of intelligence testing of minority group children, the implications of inadequate testing practices are discussed. Several aspects of test design are examined: deficiencies in intelligence testing, cultural bias, construct validity, and diagnostic utility. A sample set of results derived from a Stanford-Binet test administered to…
Assessment of protein set coherence using functional annotations
Chagoyen, Monica; Carazo, Jose M; Pascual-Montano, Alberto
2008-01-01
Background Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set. Results In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. Conclusion We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at PMID:18937846
Taylor, K; Parashar, D; Bouverat, G; Poulos, A; Gullien, R; Stewart, E; Aarre, R; Crystal, P; Wallis, M
2017-11-01
Optimum mammography positioning technique is necessary to maximise cancer detection. Current criteria for mammography appraisal lack reliability and validity, with a need to develop a more objective system. We aimed to establish current international practice in assessing the image quality (IQ) of screening mammograms, then develop and validate a reproducible assessment tool. A questionnaire sent to centres in countries undertaking population screening identified practice, participants for an expert panel (EP) of radiologists/radiographers and a testing panel (TP) of radiographers. The EP developed category criteria and descriptors using a modified Delphi process to agree definitions. The EP scored 12 screening mammograms to test agreement, then a main set of 178 cases. Weighted scores were derived for each descriptor, enabling calculation of numerical parameters for each new category. The TP then scored the main set. Statistical analysis included ANOVA, t-tests and Kendall's coefficient. 11 centres in 8 countries responded, forming an EP of 7 members and a TP of 44 members. The EP showed moderate agreement when scoring the mini test set (W = 0.50, p < 0.001) and the main set (W = 0.55, p < 0.001), 'posterior nipple line' being the most difficult descriptor. The weighted total scores differentiated the 4 new categories Perfect, Good, Adequate and Inadequate (p < 0.001). We have developed an assessment tool by Delphi consensus and weighted consensus criteria. We have successfully tabulated a range of numerical scores for each new category, providing the first validated and reproducible mammography IQ scoring system. Copyright © 2017 The College of Radiographers. Published by Elsevier Ltd. All rights reserved.
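Kendall's coefficient of concordance (W), reported above for panel agreement, can be computed from a raters-by-cases score matrix. The sketch below uses the tie-free formula on simulated scores, so the panel size and score range are illustrative only.

import numpy as np
from scipy.stats import rankdata

def kendalls_w(scores):
    """scores: raters x cases array of quality scores; ties are ignored."""
    ranks = np.apply_along_axis(rankdata, 1, scores)   # rank within each rater
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

rng = np.random.default_rng(2)
panel_scores = rng.integers(1, 11, size=(7, 12))       # 7 experts, 12 mammograms
print(f"W = {kendalls_w(panel_scores):.2f}")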
Paddick, Stella-Maria; Gray, William K; McGuire, Jackie; Richardson, Jenny; Dotchin, Catherine; Walker, Richard W
2017-06-01
The majority of older adults with dementia live in low- and middle-income countries (LMICs). Illiteracy and low educational background are common in older LMIC populations, particularly in rural areas, and cognitive screening tools developed for this setting must reflect this. This study aimed to review published validation studies of cognitive screening tools for dementia in low-literacy settings in order to determine the most appropriate tools for use. A systematic search of major databases was conducted according to PRISMA guidelines. Validation studies of brief cognitive screening tests including illiterate participants or those with elementary education were eligible. Studies were quality assessed using the QUADAS-2 tool. Good or fair quality studies were included in a bivariate random-effects meta-analysis and a hierarchical summary receiver operating characteristic (HSROC) curve constructed. Forty-five eligible studies were quality assessed. A significant proportion utilized a case-control design, resulting in spectrum bias. The area under the ROC (AUROC) curve was 0.937 for community/low prevalence studies, 0.881 for clinic based/higher prevalence studies, and 0.869 for illiterate populations. For the Mini-Mental State Examination (MMSE) (and adaptations), the AUROC curve was 0.853. Numerous tools for assessment of cognitive impairment in low-literacy settings have been developed, and tools developed for use in high-income countries have also been validated in low-literacy settings. Most tools have been inadequately validated, with only MMSE, cognitive abilities screening instrument (CASI), Eurotest, and Fototest having more than one published good or fair quality study in an illiterate or low-literate setting. At present no screening test can be recommended.
ERIC Educational Resources Information Center
Harsch, Claudia; Ushloda, Ema; Ladroue, Christophe
2017-01-01
The project examined the predictive validity of the "TOEFL iBT"® test with a focus on the relationship between TOEFL iBT scores and students' subsequent academic success in postgraduate studies in one leading university in the United Kingdom, paying specific attention to the role of linguistic preparedness as perceived by students and…
Good, Andrew C; Hermsmeier, Mark A
2007-01-01
Research into the advancement of computer-aided molecular design (CAMD) has a tendency to focus on the discipline of algorithm development. Such efforts are often wrought to the detriment of the data set selection and analysis used in said algorithm validation. Here we highlight the potential problems this can cause in the context of druglikeness classification. More rigorous efforts are applied to the selection of decoy (nondruglike) molecules from the ACD. Comparisons are made between model performance using the standard technique of random test set creation with test sets derived from explicit ontological separation by drug class. The dangers of viewing druglike space as sufficiently coherent to permit simple classification are highlighted. In addition the issues inherent in applying unfiltered data and random test set selection to (Q)SAR models utilizing large and supposedly heterogeneous databases are discussed.
Results from an Independent View on The Validation of Safety-Critical Space Systems
NASA Astrophysics Data System (ADS)
Silva, N.; Lopes, R.; Esper, A.; Barbosa, R.
2013-08-01
Independent verification and validation (IV&V) has been a key process for decades and is addressed in several international standards. One of the activities described in the “ESA ISVV Guide” is independent test verification (stated as Integration/Unit Test Procedures and Test Data Verification). This activity is commonly overlooked, since customers do not really see the added value of thoroughly checking the validation team's work (it could be seen as testing the tester's work). This article presents the consolidated results of a large set of independent test verification activities, including the main difficulties, the results obtained and the advantages/disadvantages of these activities for industry. This study will support customers in opting in or out of this task in future IV&V contracts, since we provide concrete results from real case studies in the space embedded systems domain.
Alhaiti, Ali Hassan; Alotaibi, Alanod Raffa; Jones, Linda Katherine; DaCosta, Cliff
2016-01-01
Objective. To translate the revised Michigan Diabetes Knowledge Test into the Arabic language and examine its psychometric properties. Setting. Of the 139 participants recruited through King Fahad Medical City in Riyadh, Saudi Arabia, 34 agreed to the second-round sample for retesting purposes. Methods. The translation process followed the World Health Organization's guidelines for the translation and adaptation of instruments. All translations were examined for their validity and reliability. Results. The translation process revealed excellent results throughout all stages. The Arabic version received 0.75 for internal consistency via Cronbach's alpha test and excellent outcomes in terms of test-retest reliability, with a mean intraclass correlation coefficient of 0.90. It also received positive content validity index scores. The item-level content validity index for all instrument scales fell between 0.83 and 1, with a mean scale-level index of 0.96. Conclusion. The Arabic version is proven to be a reliable and valid measure of patients' knowledge that is ready to be used in clinical practice. PMID:27995149
Testing the Zimbardo Time Perspective Inventory in the Chinese context.
Wang, Ya; Chen, Xing-Jie; Cui, Ji-Fang; Liu, Lu-Lu
2015-09-01
In this study, the authors evaluated the Chinese version of the Zimbardo Time Perspective Inventory (ZTPI). The ZTPI was tested among a sample of 303 university students. A subsample of 51 participants was then asked to complete the ZTPI again along with another set of questionnaires. The five-factor model of a 20-item short version of the ZTPI showed good model fit, internal consistency, and test-retest reliability. The 20-item Chinese version of the ZTPI also provided good validity, showing correlations with other variables in expected directions. Past-Positive was positively correlated with reappraisal and negatively correlated with suppression emotion regulation strategies, and Present-Hedonistic was positively correlated with reappraisal emotion regulation strategies. These findings indicate that the ZTPI is a reliable and valid instrument for measuring time perspective in the Chinese setting. © 2015 The Institute of Psychology, Chinese Academy of Sciences and Wiley Publishing Asia Pty Ltd.
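The two reliability statistics reported for the 20-item short form, internal consistency and test-retest reliability, can be sketched as follows. The Likert responses are simulated, and correlating total scores across the two administrations is only one reasonable way the test-retest reliability may have been quantified.

import numpy as np

def cronbach_alpha(items):
    """items: respondents x items matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(3)
trait = rng.normal(size=(51, 1))                        # 51 retest participants
time1 = np.clip(np.round(3 + trait + rng.normal(0, 0.8, size=(51, 20))), 1, 5)
time2 = np.clip(time1 + rng.integers(-1, 2, size=time1.shape), 1, 5)

print(f"Cronbach's alpha = {cronbach_alpha(time1):.2f}")
print(f"test-retest r = {np.corrcoef(time1.sum(1), time2.sum(1))[0, 1]:.2f}")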
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Boone, Alan R.
2003-01-01
Data from the test of a large semispan model were used to perform a direct validation of a wall interference correction system for a transonic slotted-wall wind tunnel. First, different sets of uncorrected aerodynamic coefficients were generated by physically changing the boundary condition of the test section walls. Then, wall interference corrections were computed and applied to all data points. Finally, an interpolation of the corrected aerodynamic coefficients was performed. This interpolation ensured that the corrected Mach number of a given run would be constant. Overall, the agreement between corresponding interpolated lift, drag, and pitching moment coefficient sets was very good. Buoyancy corrections were also investigated. These studies showed that the accuracy goal of one drag count may only be achieved if reliable estimates of the wall-interference-induced buoyancy correction are available during a test.
Esplen, Mary Jane; Cappelli, Mario; Wong, Jiahui; Bottorff, Joan L; Hunter, Jon; Carroll, June; Dorval, Michel; Wilson, Brenda; Allanson, Judith; Semotiuk, Kara; Aronson, Melyssa; Bordeleau, Louise; Charlemagne, Nicole; Meschino, Wendy
2013-01-01
Objectives To develop a brief, reliable and valid instrument to screen psychosocial risk among those who are undergoing genetic testing for Adult-Onset Hereditary Disease (AOHD). Design A prospective two-phase cohort study. Setting 5 genetic testing centres for AOHD, such as cancer, Huntington's disease or haemochromatosis, in ambulatory clinics of tertiary hospitals across Canada. Participants 141 individuals undergoing genetic testing were approached and consented to the instrument development phase of the study (Phase I). The Genetic Psychosocial Risk Instrument (GPRI) developed in Phase I was tested in Phase II for item refinement and validation. A separate cohort of 722 individuals consented to the study, 712 completed the baseline package and 463 completed all follow-up assessments. Most participants were female, at the mid-life stage. Individuals in advanced stages of the illness or with cognitive impairment or a language barrier were excluded. Interventions Phase I: GPRI items were generated from (1) a review of the literature, (2) input from genetic counsellors and (3) phase I participants. Phase II: further item refinement and validation were conducted with a second cohort of participants who completed the GPRI at baseline and were followed for psychological distress 1-month postgenetic testing results. Primary and secondary outcome measures GPRI, Hamilton Depression Rating Scale (HAM-D), Hamilton Anxiety Rating Scale (HAM-A), Brief Symptom Inventory (BSI) and Impact of Event Scale (IES). Results The final 20-item GPRI had a high reliability—Cronbach's α at 0.81. The construct validity was supported by high correlations between GPRI and BSI and IES. The predictive value was demonstrated by a receiver operating characteristic curve of 0.78 plotting GPRI against follow-up assessments using HAM-D and HAM-A. Conclusions With a cut-off score of 50, GPRI identified 84% of participants who displayed distress postgenetic testing results, supporting its potential usefulness in a clinical setting. PMID:23485718
A Method to Examine Content Domain Structures
ERIC Educational Resources Information Center
D'Agostino, Jerome; Karpinski, Aryn; Welsh, Megan
2011-01-01
After a test is developed, most content validation analyses shift from ascertaining domain definition to studying domain representation and relevance because the domain is assumed to be set once a test exists. We present an approach that allows for the examination of alternative domain structures based on extant test items. In our example based on…
Prognostication in Pulmonary Arterial Hypertension with Submaximal Exercise Testing.
Khatri, Vinod; Neal, Jennifer E; Burger, Charles D; Lee, Augustine S
2015-02-06
The submaximal exercise test (SET), which gives a measure of both exercise tolerance and disease severity, should be a more robust functional and prognostic marker than the six-minute walk test (6MWT). This study aimed to determine the prognostic value of the SET as predicted by the validated REVEAL (Registry to Evaluate Early and Long-Term Pulmonary Artery Hypertension Disease Management) registry risk score (RRRS). Sixty-five consecutive patients with idiopathic and associated pulmonary arterial hypertension (PAH) underwent right-heart catheterization, echocardiogram, 6MWT and a three-minute SET (Shape-HF™). Analyses explored the association between SET variables and prognosis predicted by the RRRS. Although multiple SET variables correlated with the RRRS on univariate analyses, only VE/VCO2 (r = 0.57, p < 0.0001) remained an independent predictor in multivariate analysis (β = 0.05, p = 0.0371). Additionally, VE/VCO2 was the most discriminatory (area under the receiver operating characteristic curve, 0.84) in identifying the highest-risk category (RRRS ≥ 10), with an optimal cut-off of 40.6, resulting in high sensitivity (92%) and negative predictive value (97%), but lower specificity (67%). SETs, particularly VE/VCO2, appear to have prognostic value when compared with the RRRS. If validated in prospective trials, the SET should prove superior to the 6MWT or the RRRS, with significant implications for both future clinical trials and clinical practice.
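A minimal sketch of how a VE/VCO2 cut-off can be chosen against a binary high-risk label (RRRS >= 10): build the ROC curve, take the Youden-optimal threshold, and report sensitivity, specificity and negative predictive value at that threshold. Only the cohort size of 65 comes from the abstract; the values are simulated placeholders.

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(4)
high_risk = rng.integers(0, 2, size=65)                  # 1 if RRRS >= 10
ve_vco2 = rng.normal(35, 5, size=65) + 8 * high_risk     # higher when high-risk

fpr, tpr, thresholds = roc_curve(high_risk, ve_vco2)
best = np.argmax(tpr - fpr)                              # Youden's J statistic
cutoff = thresholds[best]

pred = ve_vco2 >= cutoff
tp = np.sum(pred & (high_risk == 1)); fn = np.sum(~pred & (high_risk == 1))
tn = np.sum(~pred & (high_risk == 0)); fp = np.sum(pred & (high_risk == 0))
print(f"AUC = {roc_auc_score(high_risk, ve_vco2):.2f}, cut-off = {cutoff:.1f}")
print(f"sensitivity = {tp/(tp+fn):.2f}, specificity = {tn/(tn+fp):.2f}, "
      f"NPV = {tn/(tn+fn):.2f}")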
Russo, Frank A.
2018-01-01
The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976. PMID:29768426
van Rossum, Huub H; Kemperman, Hans
2017-02-01
To date, no practical tools are available to obtain optimal settings for the moving average (MA) as a continuous analytical quality control instrument. Also, there is no knowledge of the true bias detection properties of applied MA. We describe the use of bias detection curves for MA optimization and MA validation charts for the validation of MA. MA optimization was performed on a data set of previously obtained consecutive assay results. Bias introduction and MA bias detection were simulated for multiple MA procedures (combinations of truncation limits, calculation algorithms and control limits) and performed for various biases. Bias detection curves were generated by plotting the median number of test results needed for bias detection against the simulated introduced bias. In MA validation charts, the minimum, median, and maximum numbers of assay results required for MA bias detection are shown for various biases. Their use was demonstrated for sodium, potassium, and albumin. Bias detection curves allowed optimization of MA settings by graphical comparison of the bias detection properties of multiple MA procedures. The optimal MA was selected based on the bias detection characteristics obtained. MA validation charts were generated for the selected optimal MA and provided insight into the range of results required for MA bias detection. Bias detection curves and MA validation charts are useful tools for the optimization and validation of MA procedures.
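The bias detection curve idea can be sketched by simulation: introduce a fixed bias into generated results and record how many results a simple moving average needs before it leaves a control limit; repeating this for a range of biases and MA settings traces the curve. Every numeric setting below (analyte mean and SD, truncation limits, control limits, window length) is an arbitrary illustration, not a recommended configuration or the procedure used by the authors.

import numpy as np

def results_until_detection(bias, n_sim=200, window=50, mean=140.0, sd=3.0,
                            trunc=(120.0, 160.0), control=(139.0, 141.0),
                            max_results=10000, seed=5):
    """Median number of results needed before the MA crosses a control limit."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(n_sim):
        accepted, n_results = [], 0
        while n_results < max_results:
            n_results += 1
            x = rng.normal(mean + bias, sd)
            if trunc[0] <= x <= trunc[1]:                 # truncation limits
                accepted.append(x)
            if len(accepted) >= window:
                ma = np.mean(accepted[-window:])
                if not (control[0] <= ma <= control[1]):  # bias detected
                    break
        counts.append(n_results)
    return float(np.median(counts))

# One point on a bias detection curve: a +2 mmol/L bias on a sodium-like assay.
print(results_until_detection(bias=2.0))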
WEC-SIM Phase 1 Validation Testing -- Numerical Modeling of Experiments: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruehl, Kelley; Michelen, Carlos; Bosma, Bret
2016-08-01
The Wave Energy Converter Simulator (WEC-Sim) is an open-source code jointly developed by Sandia National Laboratories and the National Renewable Energy Laboratory. It is used to model wave energy converters subjected to operational and extreme waves. In order for the WEC-Sim code to be beneficial to the wave energy community, code verification and physical model validation is necessary. This paper describes numerical modeling of the wave tank testing for the 1:33-scale experimental testing of the floating oscillating surge wave energy converter. The comparison between WEC-Sim and the Phase 1 experimental data set serves as code validation. This paper is a follow-up to the WEC-Sim paper on experimental testing, and describes the WEC-Sim numerical simulations for the floating oscillating surge wave energy converter.
Thermal-Hydraulic Results for the Boiling Water Reactor Dry Cask Simulator.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Durbin, Samuel; Lindgren, Eric R.
The thermal performance of commercial nuclear spent fuel dry storage casks is evaluated through detailed numerical analysis. These modeling efforts are completed by the vendor to demonstrate performance and regulatory compliance. The calculations are then independently verified by the Nuclear Regulatory Commission (NRC). Carefully measured data sets generated from testing of full sized casks or smaller cask analogs are widely recognized as vital for validating these models. Recent advances in dry storage cask designs have significantly increased the maximum thermal load allowed in a cask in part by increasing the efficiency of internal conduction pathways and by increasing the internal convection through greater canister helium pressure. These same canistered cask systems rely on ventilation between the canister and the overpack to convect heat away from the canister to the environment for both aboveground and belowground configurations. While several testing programs have been previously conducted, these earlier validation attempts did not capture the effects of elevated helium pressures or accurately portray the external convection of aboveground and belowground canistered dry cask systems. The purpose of this investigation was to produce validation-quality data that can be used to test the validity of the modeling presently used to determine cladding temperatures in modern vertical dry casks. These cladding temperatures are critical to evaluate cladding integrity throughout the storage cycle. To produce these data sets under well-controlled boundary conditions, the dry cask simulator (DCS) was built to study the thermal-hydraulic response of fuel under a variety of heat loads, internal vessel pressures, and external configurations. An existing electrically heated but otherwise prototypic BWR Incoloy-clad test assembly was deployed inside of a representative storage basket and cylindrical pressure vessel that represents a vertical canister system. The symmetric single assembly geometry with well-controlled boundary conditions simplified interpretation of results. Two different arrangements of ducting were used to mimic conditions for aboveground and belowground storage configurations for vertical, dry cask systems with canisters. Transverse and axial temperature profiles were measured throughout the test assembly. The induced air mass flow rate was measured for both the aboveground and belowground configurations. In addition, the impact of cross-wind conditions on the belowground configuration was quantified. Over 40 unique data sets were collected and analyzed for these efforts. Fourteen data sets for the aboveground configuration were recorded for powers and internal pressures ranging from 0.5 to 5.0 kW and 0.3 to 800 kPa absolute, respectively. Similarly, fourteen data sets were logged for the belowground configuration starting at ambient conditions and concluding with thermal-hydraulic steady state. Over thirteen tests were conducted using a custom-built wind machine. The results documented in this report highlight a small, but representative, subset of the available data from this test series. This addition to the dry cask experimental database signifies a substantial addition of first-of-a-kind, high-fidelity transient and steady-state thermal-hydraulic data sets suitable for CFD model validation.
Extended version of the "Sniffin' Sticks" identification test: test-retest reliability and validity.
Sorokowska, A; Albrecht, E; Haehner, A; Hummel, T
2015-03-30
The extended, 32-item version of the Sniffin' Sticks identification test was developed in order to create a precise tool enabling repeated, longitudinal testing of individual olfactory subfunctions. Odors of the previous test version had to be changed for technical reasons, and the odor identification test needed re-investigation in terms of reliability, validity, and normative values. In our study we investigated the olfactory abilities of a group of 100 patients with olfactory dysfunction and 100 controls. We reconfirmed the high test-retest reliability of the extended version of the Sniffin' Sticks identification test and high correlations between the new and the original part of this tool. In addition, we confirmed the validity of the test, as it discriminated clearly between controls and patients with olfactory loss. The additional set of 16 odor identification sticks can either be included in the current olfactory test, thus creating a more detailed diagnostic tool, or it can be used separately, enabling olfactory function to be followed over time. Additionally, the normative values presented in our paper may provide useful guidelines for the interpretation of extended identification test results. The revised version of the Sniffin' Sticks 32-item odor identification test is a reliable and valid tool for the assessment of olfactory function. Copyright © 2015 Elsevier B.V. All rights reserved.
An Application-Based Discussion of Construct Validity and Internal Consistency Reliability.
ERIC Educational Resources Information Center
Taylor, Dianne L.; Campbell, Kathleen T.
Several techniques for conducting studies of measurement integrity are explained and illustrated using a heuristic data set from a study of teachers' participation in decision making (D. L. Taylor, 1991). The sample consisted of 637 teachers. It is emphasized that validity and reliability are characteristics of data, and do not inure to tests as…
Khashan, Raed; Zheng, Weifan; Tropsha, Alexander
2014-03-01
We present a novel approach to generating fragment-based molecular descriptors. The molecules are represented by labeled undirected chemical graphs. Fast Frequent Subgraph Mining (FFSM) is used to find chemical fragments (subgraphs) that occur in at least a subset of all molecules in a dataset. The collection of frequent subgraphs (FSG) forms a set of dataset-specific descriptors whose values for each molecule are defined by the number of times each frequent fragment occurs in this molecule. We have employed the FSG descriptors to develop variable-selection k nearest neighbor (kNN) QSAR models of several datasets with binary target properties, including Maximum Recommended Therapeutic Dose (MRTD), Salmonella Mutagenicity (Ames Genotoxicity), and P-Glycoprotein (PGP) data. Each dataset was divided into training, test, and validation sets to establish the statistical figures of merit reflecting the model's validated predictive power. The classification accuracies of models for both training and test sets for all datasets exceeded 75%, and the accuracy for the external validation sets exceeded 72%. The model accuracies were comparable to or better than those reported earlier in the literature for the same datasets. Furthermore, the use of fragment-based descriptors affords mechanistic interpretation of validated QSAR models in terms of essential chemical fragments responsible for the compounds' target property. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
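A minimal sketch of the modeling step on top of such descriptors: each molecule becomes a vector of frequent-fragment counts and a kNN classifier predicts the binary endpoint on a held-out split. The counts and labels below are simulated, no subgraph mining is performed, and the fixed k differs from the variable-selection kNN procedure used in the paper.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(6)
X = rng.poisson(2.0, size=(400, 120))    # 400 molecules x 120 frequent fragments
y = (X[:, 0] + X[:, 1] > 4).astype(int)  # toy binary endpoint tied to two fragments

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(f"test accuracy = {accuracy_score(y_test, knn.predict(X_test)):.2f}")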
Pharmacophore Based Virtual Screening Approach to Identify Selective PDE4B Inhibitors
Gaurav, Anand; Gautam, Vertika
2017-01-01
Phosphodiesterase 4 (PDE4) has been established as a promising target in asthma and chronic obstructive pulmonary disease. PDE4B subtype-selective inhibitors are known to reduce the dose-limiting adverse effects associated with non-selective PDE4 inhibitors. This makes the development of PDE4B subtype-selective inhibitors a desirable research goal. To achieve this goal, a ligand-based pharmacophore modeling approach was employed. Separate pharmacophore hypotheses for PDE4B and PDE4D inhibitors were generated using the HypoGen algorithm and 106 PDE4 inhibitors from the literature having thiopyrano[3,2-d]pyrimidine, 2-arylpyrimidine, and triazine skeletons. Suitable training and test sets were created from these molecules as per the guidelines available for the HypoGen program. The training set was used for hypothesis development, while the test set was used for validation purposes. Fisher validation was also used to test the significance of the developed hypotheses. The validated pharmacophore hypotheses for PDE4B and PDE4D inhibitors were used in sequential virtual screening of the ZINC database of drug-like molecules to identify selective PDE4B inhibitors. The hits were screened for their estimated activity and fit value. The top hit was subjected to docking into the active sites of PDE4B and PDE4D to confirm its selectivity for PDE4B. The hits are proposed to be evaluated further using in vitro assays. PMID:29201082
K(3)EDTA Vacuum Tubes Validation for Routine Hematological Testing.
Lima-Oliveira, Gabriel; Lippi, Giuseppe; Salvagno, Gian Luca; Montagnana, Martina; Poli, Giovanni; Solero, Giovanni Pietro; Picheth, Geraldo; Guidi, Gian Cesare
2012-01-01
Background and Objective. Some in vitro diagnostic devices (e.g., blood collection vacuum tubes and syringes for blood analyses) are not validated before the quality laboratory managers decide to start using them or to change the brand. Frequently, the laboratory or hospital managers select the vacuum tubes for blood collection based on cost considerations or on the relevance of a brand. The aim of this study was to validate two dry K(3)EDTA vacuum tubes of different brands for routine hematological testing. Methods. Blood specimens from 100 volunteers were collected in two different K(3)EDTA vacuum tubes by a single, expert phlebotomist. The routine hematological testing was done on the Advia 2120i hematology system. The significance of the differences between samples was assessed by paired Student's t-test after checking for normality. The level of statistical significance was set at P < 0.05. Results and Conclusions. The different brands' tubes evaluated can represent a clinically relevant source of variation only for mean platelet volume (MPV) and platelet distribution width (PDW). Basically, our validation will permit laboratory or hospital managers to select the validated brand of vacuum tubes according to their technical or economic considerations for routine hematological tests.
K3EDTA Vacuum Tubes Validation for Routine Hematological Testing
Lima-Oliveira, Gabriel; Lippi, Giuseppe; Salvagno, Gian Luca; Montagnana, Martina; Poli, Giovanni; Solero, Giovanni Pietro; Picheth, Geraldo; Guidi, Gian Cesare
2012-01-01
Background and Objective. Some in vitro diagnostic devices (e.g., blood collection vacuum tubes and syringes for blood analyses) are not validated before the quality laboratory managers decide to start using them or to change the brand. Frequently, the laboratory or hospital managers select the vacuum tubes for blood collection based on cost considerations or on the relevance of a brand. The aim of this study was to validate two dry K3EDTA vacuum tubes of different brands for routine hematological testing. Methods. Blood specimens from 100 volunteers were collected in two different K3EDTA vacuum tubes by a single, expert phlebotomist. The routine hematological testing was done on the Advia 2120i hematology system. The significance of the differences between samples was assessed by paired Student's t-test after checking for normality. The level of statistical significance was set at P < 0.05. Results and Conclusions. The different brands' tubes evaluated can represent a clinically relevant source of variation only for mean platelet volume (MPV) and platelet distribution width (PDW). Basically, our validation will permit laboratory or hospital managers to select the validated brand of vacuum tubes according to their technical or economic considerations for routine hematological tests. PMID:22888448
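The statistical comparison described in both records reduces to a paired test on the same analyte measured in the two tube brands for each volunteer. The sketch below uses simulated MPV values, and the Shapiro-Wilk step is one reasonable reading of "after checking for normality".

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mpv_tube_a = rng.normal(10.5, 0.8, size=100)            # mean platelet volume, fL
mpv_tube_b = mpv_tube_a + rng.normal(0.15, 0.2, size=100)

_, p_norm = stats.shapiro(mpv_tube_b - mpv_tube_a)      # normality of differences
t, p = stats.ttest_rel(mpv_tube_a, mpv_tube_b)
print(f"normality p = {p_norm:.3f}; paired t = {t:.2f}, p = {p:.4f}, "
      f"significant at 0.05 = {p < 0.05}")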
Standard Specimen Reference Set: Pancreatic — EDRN Public Portal
The primary objective of the EDRN Pancreatic Cancer Working Group Proposal is to create a reference set consisting of well-characterized serum/plasma specimens to use as a resource for the development of biomarkers for the early detection of pancreatic adenocarcinoma. The testing of biomarkers on the same sample set permits direct comparison among them; thereby, allowing the development of a biomarker panel that can be evaluated in a future validation study. Additionally, the establishment of an infrastructure with core data elements and standardized operating procedures for specimen collection, processing and storage, will provide the necessary preparatory platform for larger validation studies when the appropriate marker/panel for pancreatic adenocarcinoma has been identified.
Does sensitivity measured from screening test-sets predict clinical performance?
NASA Astrophysics Data System (ADS)
Soh, BaoLin P.; Lee, Warwick B.; Mello-Thoms, Claudia R.; Tapia, Kriscia A.; Ryan, John; Hung, Wai Tak; Thompson, Graham J.; Heard, Rob; Brennan, Patrick C.
2014-03-01
Aim: To examine the relationship between sensitivity measured from the BREAST test-set and clinical performance. Background: Although the UK and Australian national breast screening programs have regarded the PERFORMS and BREAST test-set strategies as possible methods of estimating readers' clinical efficacy, the relationship between test-set and real-life performance results has never been satisfactorily understood. Methods: Forty-one radiologists from BreastScreen New South Wales participated in this study. Each reader interpreted a BREAST test-set which comprised sixty de-identified mammographic examinations sourced from the BreastScreen Digital Imaging Library. Spearman's rank correlation coefficient was used to compare the sensitivity measured from the BREAST test-set with screen readers' clinical audit data. Results: Results showed statistically significant, moderate positive correlations between test-set sensitivity and each of the following metrics: rate of invasive cancer per 10 000 reads (r=0.495; p < 0.01); rate of small invasive cancer per 10 000 reads (r=0.546; p < 0.001); detection rate of all invasive cancers and DCIS per 10 000 reads (r=0.444; p < 0.01). Conclusion: Comparison between sensitivity measured from the BREAST test-set and real-life detection rates demonstrated statistically significant, moderate positive correlations, which validates that such test-set strategies can reflect readers' clinical performance and be used as a quality assurance tool. The strength of correlation demonstrated in this study was higher than previously found by others.
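The headline analysis above is a Spearman rank correlation between per-reader test-set sensitivity and a clinical audit metric; the sketch below reproduces that calculation on simulated values for 41 readers.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(8)
test_set_sensitivity = rng.uniform(0.60, 0.95, size=41)          # 41 readers
detection_rate = 40 + 30 * test_set_sensitivity + rng.normal(0, 5, size=41)

r, p = spearmanr(test_set_sensitivity, detection_rate)
print(f"Spearman r = {r:.2f}, p = {p:.4f}")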
Roalf, David R; Moore, Tyler M; Wolk, David A; Arnold, Steven E; Mechanic-Hamilton, Dawn; Rick, Jacqueline; Kabadi, Sushila; Ruparel, Kosha; Chen-Plotkin, Alice S; Chahine, Lama M; Dahodwala, Nabila A; Duda, John E; Weintraub, Daniel A; Moberg, Paul J
2016-01-01
Introduction Screening for cognitive deficits is essential in neurodegenerative disease. Screening tests, such as the Montreal Cognitive Assessment (MoCA), are easily administered, correlate with neuropsychological performance and demonstrate diagnostic utility. Yet, administration time is too long for many clinical settings. Methods Item response theory and computerised adaptive testing simulation were employed to establish an abbreviated MoCA in 1850 well-characterised community-dwelling individuals with and without neurodegenerative disease. Results 8 MoCA items with high item discrimination and appropriate difficulty were identified for use in a short form (s-MoCA). The s-MoCA was highly correlated with the original MoCA, showed robust diagnostic classification, and cross-validation procedures substantiated these items. Discussion Early detection of cognitive impairment is an important clinical and public health concern, but administration of screening measures is limited by time constraints in demanding clinical settings. Here, we provide an s-MoCA that is valid across neurological disorders and can be administered in approximately 5 min. PMID:27071646
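One way to picture the item-selection step is under a two-parameter logistic (2PL) IRT model: rank items by their Fisher information at an ability level of interest and keep the most informative ones. The parameters below are random illustrations, not the fitted MoCA item parameters, and the actual study combined IRT with computerised adaptive testing simulation rather than a single fixed ability point.

import numpy as np

rng = np.random.default_rng(9)
a = rng.uniform(0.5, 2.5, size=30)         # item discrimination
b = rng.uniform(-2.0, 2.0, size=30)        # item difficulty

def item_information(theta, a, b):
    """Fisher information of 2PL items at ability level theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

theta_target = -1.0                         # ability region of clinical interest
info = item_information(theta_target, a, b)
short_form = np.argsort(info)[::-1][:8]     # keep the 8 most informative items
print(sorted(short_form.tolist()))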
Fang, Jiansong; Yang, Ranyao; Gao, Li; Zhou, Dan; Yang, Shengqian; Liu, Ai-Lin; Du, Guan-hua
2013-11-25
Butyrylcholinesterase (BuChE, EC 3.1.1.8) is an important pharmacological target for Alzheimer's disease (AD) treatment. However, the currently available BuChE inhibitor screening assays are expensive, labor-intensive, and compound-dependent. It is necessary to develop robust in silico methods to predict the activities of BuChE inhibitors for the lead identification. In this investigation, support vector machine (SVM) models and naive Bayesian models were built to discriminate BuChE inhibitors (BuChEIs) from the noninhibitors. Each molecule was initially represented in 1870 structural descriptors (1235 from ADRIANA.Code, 334 from MOE, and 301 from Discovery studio). Correlation analysis and stepwise variable selection method were applied to figure out activity-related descriptors for prediction models. Additionally, structural fingerprint descriptors were added to improve the predictive ability of models, which were measured by cross-validation, a test set validation with 1001 compounds and an external test set validation with 317 diverse chemicals. The best two models gave Matthews correlation coefficient of 0.9551 and 0.9550 for the test set and 0.9132 and 0.9221 for the external test set. To demonstrate the practical applicability of the models in virtual screening, we screened an in-house data set with 3601 compounds, and 30 compounds were selected for further bioactivity assay. The assay results showed that 10 out of 30 compounds exerted significant BuChE inhibitory activities with IC50 values ranging from 0.32 to 22.22 μM, at which three new scaffolds as BuChE inhibitors were identified for the first time. To our best knowledge, this is the first report on BuChE inhibitors using machine learning approaches. The models generated from SVM and naive Bayesian approaches successfully predicted BuChE inhibitors. The study proved the feasibility of a new method for predicting bioactivities of ligands and discovering novel lead compounds.
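A minimal sketch of the two classifier families compared above, evaluated with the Matthews correlation coefficient on a held-out split; the descriptors and labels are simulated stand-ins for the curated BuChE data, and no descriptor selection or structural fingerprints are included.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(10)
X = rng.normal(size=(1000, 50))             # 50 selected structural descriptors
y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
for name, clf in [("SVM", SVC()), ("naive Bayes", GaussianNB())]:
    clf.fit(X_train, y_train)
    mcc = matthews_corrcoef(y_test, clf.predict(X_test))
    print(f"{name}: MCC = {mcc:.2f}")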
Translating and validating a Training Needs Assessment tool into Greek
Markaki, Adelais; Antonakis, Nikos; Hicks, Carolyn M; Lionis, Christos
2007-01-01
Background The translation and cultural adaptation of widely accepted, psychometrically tested tools is regarded as an essential component of effective human resource management in the primary care arena. The Training Needs Assessment (TNA) is a widely used, valid instrument, designed to measure the professional development needs of health care professionals, especially in primary health care. This study aims to describe the translation, adaptation and validation of the TNA questionnaire into the Greek language and to discuss possibilities of its use in primary care settings. Methods A modified version of the English self-administered questionnaire consisting of 30 items was used. Internationally recommended methodology, mandating forward translation, backward translation, reconciliation and pretesting steps, was followed. Tool validation included assessing item internal consistency, using Cronbach's alpha coefficient. Reproducibility (test-retest reliability) was measured by the kappa correlation coefficient. Criterion validity was calculated for selected parts of the questionnaire by correlating respondents' research experience with relevant research item scores. An exploratory factor analysis highlighted how the items group together, using a Varimax (orthogonal) rotation and subsequent Cronbach's alpha assessment. Results The psychometric properties of the Greek version of the TNA questionnaire for nursing staff employed in primary care were good. Internal consistency of the instrument was very good, Cronbach's alpha was found to be 0.985 (p < 0.001) and the kappa coefficient for reproducibility was found to be 0.928 (p < 0.0001). Significant positive correlations were found between respondents' current performance levels on each of the research items and the amount of research involvement, indicating good criterion validity in the areas tested. Factor analysis revealed seven factors with eigenvalues of > 1.0, KMO (Kaiser-Meyer-Olkin) measure of sampling adequacy = 0.680 and Bartlett's test of sphericity, p < 0.001. Conclusion The translated and adapted Greek version is comparable with the original English instrument in terms of validity and reliability and it is suitable to assess the professional development needs of nursing staff in Greek primary care settings. PMID:17474989
Ward, Dianne S; Mazzucca, Stephanie; McWilliams, Christina; Hales, Derek
2015-09-26
Early care and education (ECE) centers are important settings influencing young children's diet and physical activity (PA) behaviors. To better understand their impact on diet and PA behaviors as well as to evaluate public health programs aimed at ECE settings, we developed and tested the Environment and Policy Assessment and Observation - Self-Report (EPAO-SR), a self-administered version of the previously validated, researcher-administered EPAO. Development of the EPAO-SR instrument included modification of items from the EPAO, community advisory group and expert review, and cognitive interviews with center directors and classroom teachers. Reliability and validity data were collected across 4 days in 3-5 year old classrooms in 50 ECE centers in North Carolina. Center teachers and directors completed relevant portions of the EPAO-SR on multiple days according to a standardized protocol, and trained data collectors completed the EPAO for 4 days in the centers. Reliability and validity statistics calculated included percent agreement, kappa, correlation coefficients, coefficients of variation, deviations, mean differences, and intraclass correlation coefficients (ICC), depending on the response option of the item. Data demonstrated a range of reliability and validity evidence for the EPAO-SR instrument. Reporting from directors and classroom teachers was consistent and similar to the observational data. Items that produced strongest reliability and validity estimates included beverages served, outside time, and physical activity equipment, while items such as whole grains served and amount of teacher-led PA had lower reliability (observation and self-report) and validity estimates. To overcome lower reliability and validity estimates, some items need administration on multiple days. This study demonstrated appropriate reliability and validity evidence for use of the EPAO-SR in the field. The self-administered EPAO-SR is an advancement of the measurement of ECE settings and can be used by researchers and practitioners to assess the nutrition and physical activity environments of ECE settings.
Chen, Hongda; Werner, Simone; Butt, Julia; Zörnig, Inka; Knebel, Phillip; Michel, Angelika; Eichmüller, Stefan B.; Jäger, Dirk; Waterboer, Tim; Pawlita, Michael; Brenner, Hermann
2016-01-01
Novel blood-based screening tests are strongly desirable for early detection of colorectal cancer (CRC). We aimed to identify and evaluate autoantibodies against tumor-associated antigens as biomarkers for early detection of CRC. 380 clinically identified CRC patients and samples of participants with selected findings from a cohort of screening colonoscopy participants in 2005–2013 (N=6826) were included in this analysis. Sixty-four serum autoantibody markers were measured by multiplex bead-based serological assays. A two-step approach with selection of biomarkers in a training set, and validation of findings in a validation set, the latter exclusively including participants from the screening setting, was applied. Anti-MAGEA4 exhibited the highest sensitivity for detecting early stage CRC and advanced adenoma. Multi-marker combinations substantially increased sensitivity at the price of a moderate loss of specificity. Anti-TP53, anti-IMPDH2, anti-MDM2 and anti-MAGEA4 were consistently included in the best-performing 4-, 5-, and 6-marker combinations. This four-marker panel yielded a sensitivity of 26% (95% CI, 13–45%) for early stage CRC at a specificity of 90% (95% CI, 83–94%) in the validation set. Notably, it also detected 20% (95% CI, 13–29%) of advanced adenomas. Taken together, the identified biomarkers could contribute to the development of a useful multi-marker blood-based test for CRC early detection. PMID:26909861
ERIC Educational Resources Information Center
van der Meulen, Ineke; van de Sandt-Koenderman, W. Mieke E.; Duivenvoorden, Hugo J.; Ribbers, Gerard M.
2010-01-01
Background: This study explores the psychometric qualities of the Scenario Test, a new test to assess daily-life communication in severe aphasia. The test is innovative in that it: (1) examines the effectiveness of verbal and non-verbal communication; and (2) assesses patients' communication in an interactive setting, with a supportive…
Johansson, Fredrik R.; Skillgate, Eva; Lapauw, Mattis L.; Clijmans, Dorien; Deneulin, Valentijn P.; Palmans, Tanneke; Engineer, Human Kinetic; Cools, Ann M.
2015-01-01
Context Shoulder strength assessment plays an important role in the clinical examination of the shoulder region. Eccentric strength measurements are of special importance in guiding the clinician in injury prevention or return-to-play decisions after injury. Objective To examine the absolute and relative reliability and validity of a standardized eccentric strength-measurement protocol for the glenohumeral external rotators. Design Descriptive laboratory study. Setting Testing environment at the Department of Rehabilitation Sciences and Physiotherapy of Ghent University, Belgium. Patients or Other Participants Twenty-five healthy participants (9 men and 16 women) without any history of shoulder pain were tested by 2 independent assessors using a handheld dynamometer (HHD) and underwent an isokinetic testing procedure. Intervention(s) The clinical protocol used an HHD, a DynaPort accelerometer to measure acceleration and angular velocity during testing (30°/s over 90° of range of motion), and a Biodex dynamometer to measure isokinetic activity. Main Outcome Measure(s) Three eccentric strength measurements: (1) tester 1 with the HHD, (2) tester 2 with the HHD, and (3) Biodex isokinetic strength measurement. Results The intratester reliability was excellent (0.879 and 0.858), whereas the intertester reliability was good, with an intraclass correlation coefficient between testers of 0.714. Pearson product moment correlation coefficients of 0.78 and 0.70 were noted between the HHD and the isokinetic data, showing good validity of this new procedure. Conclusions Standardized eccentric rotator cuff strength can be tested and measured in the clinical setting with good-to-excellent reliability and validity using an HHD. PMID:25974381
Poghosyan, Lusine; Nannini, Angela; Finkelstein, Stacey R; Mason, Emanuel; Shaffer, Jonathan A
2013-01-01
Policy makers and healthcare organizations are calling for expansion of the nurse practitioner (NP) workforce in primary care settings to assure timely access and high-quality care for the American public. However, many barriers, including those at the organizational level, exist that may undermine NP workforce expansion and their optimal utilization in primary care. This study developed a new NP-specific survey instrument, the Nurse Practitioner Primary Care Organizational Climate Questionnaire (NP-PCOCQ), to measure organizational climate in primary care settings and conducted its psychometric testing. Using an instrument development design, the organizational climate domain pertinent to primary care NPs was identified. Items were generated from the evidence and qualitative data. Face and content validity were established through two expert meetings. A content validity index was computed. The 86-item pool was reduced to 55 items, which were pilot tested with 81 NPs using mailed surveys and then field-tested with 278 NPs in New York State. SPSS 18 and Mplus software were used for item analysis, reliability testing, and maximum likelihood exploratory factor analysis. The Nurse Practitioner Primary Care Organizational Climate Questionnaire had face and content validity. The content validity index was .90. Twenty-nine items loaded on four subscale factors: professional visibility, NP-administration relations, NP-physician relations, and independent practice and support. The subscales had high internal consistency reliability. Cronbach's alphas ranged from .87 to .95. Having a strong instrument is important to promote future research. Also, administrators can use it to assess organizational climate in their clinics and propose interventions to improve it, thus promoting NP practice and the expansion of the NP workforce.
Optimization of multilayer neural network parameters for speaker recognition
NASA Astrophysics Data System (ADS)
Tovarek, Jaromir; Partila, Pavol; Rozhon, Jan; Voznak, Miroslav; Skapa, Jan; Uhrin, Dominik; Chmelikova, Zdenka
2016-05-01
This article discusses the impact of multilayer neural network parameters on speaker identification. The main task of speaker identification is to find a specific person in a known set of speakers. It means that the voice of an unknown speaker (wanted person) belongs to a group of reference speakers from the voice database. One of the requirements was to develop a text-independent system, which means classifying the wanted person regardless of content and language. A multilayer neural network has been used for speaker identification in this research. An artificial neural network (ANN) needs parameters to be set, such as the activation function of the neurons, the steepness of the activation functions, the learning rate, the maximum number of iterations, and the number of neurons in the hidden and output layers. ANN accuracy and validation time are directly influenced by the parameter settings. Different tasks require different settings. Identification accuracy and ANN validation time were evaluated with the same input data but different parameter settings. The goal was to find the parameters for the neural network with the highest precision and shortest validation time. The input data of the neural networks are Mel-frequency cepstral coefficients (MFCCs). These parameters describe the properties of the vocal tract. Audio samples were recorded for all speakers in a laboratory environment. The training, testing, and validation data sets were split 70%, 15%, and 15%, respectively. The result of the research described in this article is a parameter setting for the multilayer neural network for four speakers.
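As a rough illustration of the parameter search described above, the sketch below (not the authors' code) trains a small multilayer perceptron on placeholder MFCC feature vectors with a 70/15/15 train/test/validation split and keeps the setting with the highest validation accuracy; the feature and label arrays, layer sizes, and learning rates are all assumed for illustration.

```python
# Minimal sketch, assuming precomputed MFCC vectors and speaker labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
mfcc_features = rng.normal(size=(400, 13))        # placeholder MFCC vectors
speaker_labels = rng.integers(0, 4, size=400)     # four speakers

X_train, X_rest, y_train, y_rest = train_test_split(
    mfcc_features, speaker_labels, train_size=0.70, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=0)   # 15% / 15%

best = None
for hidden in [(16,), (32,), (32, 16)]:
    for activation in ["logistic", "tanh", "relu"]:
        for lr in [0.001, 0.01]:
            clf = MLPClassifier(hidden_layer_sizes=hidden, activation=activation,
                                learning_rate_init=lr, max_iter=500, random_state=0)
            clf.fit(X_train, y_train)
            acc = clf.score(X_val, y_val)              # validation accuracy
            if best is None or acc > best[0]:
                best = (acc, hidden, activation, lr)

acc_val, hidden, activation, lr = best
final = MLPClassifier(hidden_layer_sizes=hidden, activation=activation,
                      learning_rate_init=lr, max_iter=500, random_state=0)
final.fit(X_train, y_train)
print("best validation accuracy %.2f, test accuracy %.2f"
      % (acc_val, final.score(X_test, y_test)))
```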
Documenting historical data and accessing it on the World Wide Web
Malchus B. Baker; Daniel P. Huebner; Peter F. Ffolliott
2000-01-01
New computer technologies facilitate the storage, retrieval, and summarization of watershed-based data sets on the World Wide Web. These data sets are used by researchers when testing and validating predictive models, managers when planning and implementing watershed management practices, educators when learning about hydrologic processes, and decisionmakers when...
Identifying and Evaluating External Validity Evidence for Passing Scores
ERIC Educational Resources Information Center
Davis-Becker, Susan L.; Buckendahl, Chad W.
2013-01-01
A critical component of the standard setting process is collecting evidence to evaluate the recommended cut scores and their use for making decisions and classifying students based on test performance. Kane (1994, 2001) proposed a framework by which practitioners can identify and evaluate evidence of the results of the standard setting from (1)…
Measuring Mindfulness in Summer Camp Staff
ERIC Educational Resources Information Center
Gillard, Ann; Roark, Mark F.; Nyaga, Lewis Ramsey Kanyiba; Bialeschki, M. Deborah
2011-01-01
Examining mindfulness in a non-clinical and non-therapeutic setting such as a summer camp is an area of growing interest. Our study tested three mindfulness scales with staff in a summer camp setting, and we conducted preliminary reliability and validity analyses for any modifications needed in the scales. Results indicated two major findings: (a)…
NASA Astrophysics Data System (ADS)
Shevade, Abhijit V.; Ryan, Margaret A.; Homer, Margie L.; Zhou, Hanying; Manfreda, Allison M.; Lara, Liana M.; Yen, Shiao-Pin S.; Jewell, April D.; Manatt, Kenneth S.; Kisor, Adam K.
We have developed a Quantitative Structure-Activity Relationships (QSAR) based approach to correlate the response of chemical sensors in an array with molecular descriptors. A novel molecular descriptor set has been developed; this set combines descriptors of sensing film-analyte interactions, representing sensor response, with a basic analyte descriptor set commonly used in QSAR studies. The descriptors are obtained using a combination of molecular modeling tools and empirical and semi-empirical Quantitative Structure-Property Relationships (QSPR) methods. The sensors under investigation are polymer-carbon sensing films which have been exposed to analyte vapors at parts-per-million (ppm) concentrations; response is measured as change in film resistance. Statistically validated QSAR models have been developed using Genetic Function Approximations (GFA) for a sensor array for a given training data set. The applicability of the sensor response models has been tested by using them to predict the sensor activities for test analytes not considered in the training set for the model development. The validated QSAR sensor response models show good predictive ability. The QSAR approach is a promising computational tool for sensing materials evaluation and selection. It can also be used to predict the response of an existing sensing film to new target analytes.
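The sketch below illustrates the general QSAR workflow described above under simplifying assumptions: a plain cross-validated linear regression stands in for the Genetic Function Approximation models, and the descriptor matrix and measured film-resistance responses are hypothetical placeholders.

```python
# Illustrative sketch only; not the study's GFA models.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
descriptors = rng.normal(size=(20, 6))             # 20 training analytes, 6 descriptors
sensor_response = descriptors @ rng.normal(size=6) + rng.normal(scale=0.1, size=20)

model = LinearRegression()
# Leave-one-out predictions give an internal estimate of predictive ability.
loo_pred = cross_val_predict(model, descriptors, sensor_response, cv=LeaveOneOut())
ss_res = np.sum((sensor_response - loo_pred) ** 2)
ss_tot = np.sum((sensor_response - sensor_response.mean()) ** 2)
print("cross-validated q2 = %.2f" % (1 - ss_res / ss_tot))

# External prediction for analytes not in the training set.
model.fit(descriptors, sensor_response)
new_analytes = rng.normal(size=(3, 6))
print("predicted responses:", model.predict(new_analytes))
```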
Measuring the Process and Quality of Informed Consent for Clinical Research: Development and Testing
Cohn, Elizabeth Gross; Jia, Haomiao; Smith, Winifred Chapman; Erwin, Katherine; Larson, Elaine L.
2013-01-01
Purpose/Objectives To develop and assess the reliability and validity of an observational instrument, the Process and Quality of Informed Consent (P-QIC). Design A pilot study of the psychometrics of a tool designed to measure the quality and process of the informed consent encounter in clinical research. The study used professionally filmed, simulated consent encounters designed to vary in process and quality. Setting A major urban teaching hospital in the northeastern region of the United States. Sample 63 students enrolled in health-related programs participated in psychometric testing, 16 students participated in test-retest reliability, and 5 investigator-participant dyads were observed for the actual consent encounters. Methods For reliability and validity testing, students watched videotaped simulations of four consent encounters, intentionally varied in process and content, and rated them with the proposed instrument. Test-retest reliability was established by raters watching the videotaped simulations twice. Inter-rater reliability was demonstrated by two simultaneous but independent raters observing an actual consent encounter. Main Research Variables The essential elements of information and communication for informed consent. Findings The initial testing of the P-QIC demonstrated reliable and valid psychometric properties in both the simulated standardized consent encounters and actual consent encounters in the hospital setting. Conclusions The P-QIC is an easy-to-use observational tool that provides a quick assessment of the areas of strength and areas that need improvement in a consent encounter. It can be used in the initial training of new investigators or consent administrators and in ongoing programs of improvement for informed consent. Implications for Nursing The development of a validated observational instrument will allow investigators to assess the consent process more accurately and evaluate strategies designed to improve it. PMID:21708532
CFD Modeling Needs and What Makes a Good Supersonic Combustion Validation Experiment
NASA Technical Reports Server (NTRS)
Gaffney, Richard L., Jr.; Cutler, Andrew D.
2005-01-01
If a CFD code/model developer is asked what experimental data he wants to validate his code or numerical model, his answer will be: "Everything, everywhere, at all times." Since this is not possible, practical, or even reasonable, the developer must understand what can be measured within the limits imposed by the test article, the test location, the test environment and the available diagnostic equipment. At the same time, it is important for the experimentalist/diagnostician to understand what the CFD developer needs (as opposed to wants) in order to conduct a useful CFD validation experiment. If these needs are not known, it is possible to neglect easily measured quantities at locations needed by the developer, rendering the data set useless for validation purposes. It is also important for the experimentalist/diagnostician to understand what the developer is trying to validate so that the experiment can be designed to isolate (as much as possible) the effects of a particular physical phenomenon that is associated with the model to be validated. The probability of a successful validation experiment can be greatly increased if the two groups work together, each understanding the needs and limitations of the other.
ERIC Educational Resources Information Center
Feldt, Leonard S.
2004-01-01
In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
Hagger, Martin S.; Gucciardi, Daniel F.; Chatzisarantis, Nikos L. D.
2017-01-01
Tests of social cognitive theories provide informative data on the factors that relate to health behavior, and the processes and mechanisms involved. In the present article, we contend that tests of social cognitive theories should adhere to the principles of nomological validity, defined as the degree to which predictions in a formal theoretical network are confirmed. We highlight the importance of nomological validity tests to ensure theory predictions can be disconfirmed through observation. We argue that researchers should be explicit on the conditions that lead to theory disconfirmation, and identify any auxiliary assumptions on which theory effects may be conditional. We contend that few researchers formally test the nomological validity of theories, or outline conditions that lead to model rejection and the auxiliary assumptions that may explain findings that run counter to hypotheses, raising potential for ‘falsification evasion.’ We present a brief analysis of studies (k = 122) testing four key social cognitive theories in health behavior to illustrate deficiencies in reporting theory tests and evaluations of nomological validity. Our analysis revealed that few articles report explicit statements suggesting that their findings support or reject the hypotheses of the theories tested, even when findings point to rejection. We illustrate the importance of explicit a priori specification of fundamental theory hypotheses and associated auxiliary assumptions, and identification of the conditions which would lead to rejection of theory predictions. We also demonstrate the value of confirmatory analytic techniques, meta-analytic structural equation modeling, and Bayesian analyses in providing robust converging evidence for nomological validity. We provide a set of guidelines for researchers on how to adopt and apply the nomological validity approach to testing health behavior models. PMID:29163307
Test One to Test Many: A Unified Approach to Quantum Benchmarks
NASA Astrophysics Data System (ADS)
Bai, Ge; Chiribella, Giulio
2018-04-01
Quantum benchmarks are routinely used to validate the experimental demonstration of quantum information protocols. Many relevant protocols, however, involve an infinite set of input states, of which only a finite subset can be used to test the quality of the implementation. This is a problem, because the benchmark for the finitely many states used in the test can be higher than the original benchmark calculated for infinitely many states. This situation arises in the teleportation and storage of coherent states, for which the benchmark of 50% fidelity is commonly used in experiments, although finite sets of coherent states normally lead to higher benchmarks. Here, we show that the average fidelity over all coherent states can be indirectly probed with a single setup, requiring only two-mode squeezing, a 50-50 beam splitter, and homodyne detection. Our setup enables a rigorous experimental validation of quantum teleportation, storage, amplification, attenuation, and purification of noisy coherent states. More generally, we prove that every quantum benchmark can be tested by preparing a single entangled state and measuring a single observable.
New public QSAR model for carcinogenicity
2010-01-01
Background One of the main goals of the new chemical regulation REACH (Registration, Evaluation and Authorization of Chemicals) is to fill the gaps in data concerning properties of chemicals affecting human health. (Q)SAR models are accepted as a suitable source of information. The EU-funded CAESAR project aimed to develop models for the prediction of 5 endpoints for regulatory purposes. Carcinogenicity is one of the endpoints under consideration. Results Models for the prediction of carcinogenic potency according to the specific requirements of chemical regulation were developed. A dataset of 805 non-congeneric chemicals extracted from the Carcinogenic Potency Database (CPDBAS) was used. The Counter Propagation Artificial Neural Network (CP ANN) algorithm was implemented. In the article two alternative models for predicting carcinogenicity are described. The first model employed eight MDL descriptors (model A) and the second one twelve Dragon descriptors (model B). CAESAR's models have been assessed according to the OECD principles for the validation of QSAR. For model validity we used a wide series of statistical checks. Models A and B yielded accuracies on the training set (644 compounds) equal to 91% and 89%, respectively; the accuracies on the test set (161 compounds) were 73% and 69%, while the specificities were 69% and 61%, respectively. Sensitivity in both cases was equal to 75%. The accuracy of the leave-20%-out cross-validation on the training set of models A and B was equal to 66% and 62%, respectively. To verify whether the models perform correctly on new compounds, an external validation was carried out. The external test set was composed of 738 compounds. We obtained an accuracy of external validation equal to 61.4% and 60.0%, sensitivity of 64.0% and 61.8%, and specificity of 58.9% and 58.4%, respectively, for models A and B. Conclusion Carcinogenicity is a particularly important endpoint and it is expected that QSAR models will not replace human experts' opinions and conventional methods. However, we believe that a combination of several methods will provide useful support to the overall evaluation of carcinogenicity. In the present paper, models for the classification of carcinogenic compounds using MDL and Dragon descriptors were developed. The models could be used to set priorities among chemicals for further testing. The models at the CAESAR site were implemented in Java and are publicly accessible. PMID:20678182
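The reported figures of merit can be reproduced from a confusion matrix; the sketch below (with hypothetical predicted and observed class labels) shows the accuracy, sensitivity, and specificity computations that apply equally to the training, test, and external sets.

```python
# Sketch of the reported performance metrics from predicted vs. observed
# carcinogenicity classes. `y_true` and `y_pred` are hypothetical 0/1 arrays
# (1 = carcinogen).
import numpy as np
from sklearn.metrics import confusion_matrix

def classification_summary(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=161)                          # e.g. a 161-compound test set
y_pred = np.where(rng.random(161) < 0.3, 1 - y_true, y_true)   # ~70% of labels predicted correctly
print(classification_summary(y_true, y_pred))
```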
NASA Astrophysics Data System (ADS)
Well, Reinhard; Böttcher, Jürgen; Butterbach-Bahl, Klaus; Dannenmann, Michael; Deppe, Marianna; Dittert, Klaus; Dörsch, Peter; Horn, Marcus; Ippisch, Olaf; Mikutta, Robert; Müller, Carsten; Müller, Christoph; Senbayram, Mehmet; Vogel, Hans-Jörg; Wrage-Mönnig, Nicole
2016-04-01
Robust denitrification data suitable for validating soil N2 fluxes in denitrification models are scarce due to methodological limitations and the extreme spatio-temporal heterogeneity of denitrification in soils. Numerical models have become essential tools to predict denitrification at different scales. Model performance could either be tested for total gaseous flux (NO + N2O + N2), individual denitrification products (e.g. N2O and/or NO) or for the effect of denitrification factors (e.g. C-availability, respiration, diffusivity, anaerobic volume, etc.). While there are numerous examples for validating N2O fluxes, there are neither robust field data of N2 fluxes nor sufficiently resolved measurements of control factors used as state variables in the models. To the best of our knowledge there has been only one published validation of modelled soil N2 flux to date, using a laboratory data set to validate an ecosystem model. Hence there is a need for validation data at both the mesocosm and the field scale, including validation of individual denitrification controls. Here we present the concept for collecting model validation data which is part of the DFG research unit "Denitrification in Agricultural Soils: Integrated Control and Modelling at Various Scales (DASIM)" starting this year. We will use novel approaches including analysis of stable isotopes, microbial communities, pore structure and organic matter fractions to provide denitrification data sets comprising as much detail on activity and regulation as possible as a basis to validate existing and calibrate new denitrification models that are applied and/or developed by DASIM subprojects. The basic idea is to simulate "field-like" conditions as far as possible in an automated mesocosm system without plants in order to mimic processes in the soil parts not significantly influenced by the rhizosphere (rhizosphere soils are studied by other DASIM projects). Hence, to allow model testing in a wide range of conditions, denitrification control factors will be varied in the initial settings (pore volume, plant residues, mineral N, pH) but also over time, where moisture, temperature, and mineral N will be manipulated according to typical time patterns in the field. This will be realized by including precipitation events, fertilization (via irrigation), drainage (via water potential) and temperature in the course of incubations. Moreover, oxygen concentration will be varied to simulate anaerobic events. These data will be used to calibrate the new DASIM models to be developed as well as existing denitrification models. One goal of DASIM is to create a public database as a joint basis for model testing by denitrification modellers. Therefore we invite contributions of suitable data sets from the scientific community. Requirements will be briefly outlined.
Design, Implementation and Validation of a Europe-Wide Pedagogical Framework for E-Learning
ERIC Educational Resources Information Center
Granic, Andrina; Mifsud, Charles; Cukusic, Maja
2009-01-01
Within the context of a Europe-wide project UNITE, a number of European partners set out to design, implement and validate a pedagogical framework (PF) for e- and m-Learning in secondary schools. The process of formulating and testing the PF was an evolutionary one that reflected the experiences and skills of the various European partners and…
Press, Michael F; Sauter, Guido; Buyse, Marc; Bernstein, Leslie; Guzman, Roberta; Santiago, Angela; Villalobos, Ivonne E; Eiermann, Wolfgang; Pienkowski, Tadeusz; Martin, Miguel; Robert, Nicholas; Crown, John; Bee, Valerie; Taupin, Henry; Flom, Kerry J; Tabah-Fisch, Isabelle; Pauletti, Giovanni; Lindsay, Mary-Ann; Riva, Alessandro; Slamon, Dennis J
2011-03-01
Approximately 35% of HER2-amplified breast cancers have coamplification of the topoisomerase II-alpha (TOP2A) gene encoding an enzyme that is a major target of anthracyclines. This study was designed to evaluate whether TOP2A gene alterations may predict incremental responsiveness to anthracyclines in some breast cancers. A total of 4,943 breast cancers were analyzed for alterations in TOP2A and HER2. Primary tumor tissues from patients with metastatic breast cancer treated in a trial of chemotherapy plus/minus trastuzumab were studied for amplification/deletion of TOP2A and HER2 as a test set followed by evaluation of malignancies from two separate, large trials for changes in these same genes as a validation set. Association between these alterations and clinical outcomes was determined. Test set cases containing HER2 amplification treated with doxorubicin and cyclophosphamide (AC) plus trastuzumab, demonstrated longer progression-free survival compared to those treated with AC alone (P = .0002). However, patients treated with AC alone whose tumors contain HER2/TOP2A coamplification experienced a similar improvement in survival (P = .004). Conversely, for patients treated with paclitaxel, HER2/TOP2A coamplification was not associated with improved outcomes. These observations were confirmed in a larger validation set, where HER2/TOP2A coamplification was again associated with longer survival when only anthracycline-containing chemotherapy was used for treatment compared with outcome in HER2-positive cancers lacking TOP2A coamplification. In a study involving nearly 5,000 breast malignancies, both test set and validation set demonstrate that TOP2A coamplification, not HER2 amplification, is the clinically useful predictive marker of an incremental response to anthracycline-based chemotherapy. Absence of HER2/TOP2A coamplification may indicate a more restricted efficacy advantage for breast cancers than previously thought.
Press, Michael F.; Sauter, Guido; Buyse, Marc; Bernstein, Leslie; Guzman, Roberta; Santiago, Angela; Villalobos, Ivonne E.; Eiermann, Wolfgang; Pienkowski, Tadeusz; Martin, Miguel; Robert, Nicholas; Crown, John; Bee, Valerie; Taupin, Henry; Flom, Kerry J.; Tabah-Fisch, Isabelle; Pauletti, Giovanni; Lindsay, Mary-Ann; Riva, Alessandro; Slamon, Dennis J.
2011-01-01
Purpose Approximately 35% of HER2-amplified breast cancers have coamplification of the topoisomerase II-alpha (TOP2A) gene encoding an enzyme that is a major target of anthracyclines. This study was designed to evaluate whether TOP2A gene alterations may predict incremental responsiveness to anthracyclines in some breast cancers. Methods A total of 4,943 breast cancers were analyzed for alterations in TOP2A and HER2. Primary tumor tissues from patients with metastatic breast cancer treated in a trial of chemotherapy plus/minus trastuzumab were studied for amplification/deletion of TOP2A and HER2 as a test set followed by evaluation of malignancies from two separate, large trials for changes in these same genes as a validation set. Association between these alterations and clinical outcomes was determined. Results Test set cases containing HER2 amplification treated with doxorubicin and cyclophosphamide (AC) plus trastuzumab, demonstrated longer progression-free survival compared to those treated with AC alone (P = .0002). However, patients treated with AC alone whose tumors contain HER2/TOP2A coamplification experienced a similar improvement in survival (P = .004). Conversely, for patients treated with paclitaxel, HER2/TOP2A coamplification was not associated with improved outcomes. These observations were confirmed in a larger validation set, where HER2/TOP2A coamplification was again associated with longer survival when only anthracycline-containing chemotherapy was used for treatment compared with outcome in HER2-positive cancers lacking TOP2A coamplification. Conclusion In a study involving nearly 5,000 breast malignancies, both test set and validation set demonstrate that TOP2A coamplification, not HER2 amplification, is the clinically useful predictive marker of an incremental response to anthracycline-based chemotherapy. Absence of HER2/TOP2A coamplification may indicate a more restricted efficacy advantage for breast cancers than previously thought. PMID:21189395
Sakthong, Phantipa; Munpan, Wipaporn
2017-10-01
Little is known about the head-to-head comparison of psychometric properties between the SF-6D and the EQ-5D-5L or the different value sets of the EQ-5D-5L. Therefore, this study set out to compare the psychometric properties, including agreement, convergent validity, and known-group validity, between the SF-6D and the EQ-5D-5L using the real value sets from Thailand and the UK in patients with chronic diseases. 356 adults taking a medication for at least 3 months were identified from a university hospital in Bangkok, Thailand, between July 2014 and March 2015. Agreement was assessed by intraclass correlation coefficients (ICCs) and Bland-Altman plots. Convergent validity was evaluated using Spearman's rank correlation coefficients between the SF-6D and EQ-5D-5L and the EQ-VAS and SF-12v2. For known-group validity, the Mann-Whitney U test and Kruskal-Wallis test were used to examine the associations between the SF-6D and EQ-5D-5L and patient characteristics. Agreement between the SF-6D and the EQ-5D-5L using the Thai and UK value sets was fair, with ICCs of 0.45 and 0.49, respectively. Bland-Altman plots showed that the majority of the SF-6D index scores were lower than the EQ-5D-5L index scores. Both EQ-5D-5L value sets were more related to the EQ-VAS and physical health, while the SF-6D was more associated with mental health. Both EQ-5D-5L value sets were more sensitive than the SF-6D in discriminating between patients in most of the known groups, except for adverse drug reactions. The SF-6D and both EQ-5D-5L value sets appeared to be valid but sensitive to different outcomes in Thai patients with chronic diseases.
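A minimal sketch of the agreement analysis, assuming hypothetical per-patient SF-6D and EQ-5D-5L index scores: Bland-Altman bias and limits of agreement plus a Spearman correlation, computed with NumPy/SciPy.

```python
# Hedged sketch; the index scores below are simulated placeholders.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
eq5d5l = rng.uniform(0.2, 1.0, size=356)
sf6d = eq5d5l - 0.1 + rng.normal(scale=0.08, size=356)   # SF-6D tends to score lower

diff = sf6d - eq5d5l
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)
print("Bland-Altman bias %.3f, limits of agreement %.3f to %.3f"
      % (bias, bias - loa, bias + loa))

rho, p = spearmanr(sf6d, eq5d5l)
print("Spearman rho %.2f (p=%.3g)" % (rho, p))
```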
Kaluzhny, Yulia; Kandárová, Helena; Handa, Yuki; DeLuca, Jane; Truong, Thoa; Hunter, Amy; Kearney, Paul; d'Argembeau-Thornton, Laurence; Klausner, Mitchell
2015-05-01
The 7th Amendment to the EU Cosmetics Directive and the EU REACH Regulation have reinforced the need for in vitro ocular test methods. Validated in vitro ocular toxicity tests that can predict the human response to chemicals, cosmetics and other consumer products are required for the safety assessment of materials that intentionally, or inadvertently, come into contact with the eye. The EpiOcular Eye Irritation Test (EIT), which uses the normal human cell-based EpiOcular™ tissue model, was developed to address this need. The EpiOcular-EIT is able to discriminate, with high sensitivity and accuracy, between ocular irritant/corrosive materials and those that require no labelling. Although the original EpiOcular-EIT protocol was successfully pre-validated in an international, multicentre study sponsored by COLIPA (the predecessor to Cosmetics Europe), data from two larger studies (the EURL ECVAM-COLIPA validation study and an independent in-house validation at BASF SE) resulted in a sensitivity for the protocol for solids that was below the acceptance criteria set by the Validation Management Group (VMG) for eye irritation, and indicated the need for improvement of the assay's sensitivity for solids. By increasing the exposure time for solid materials from 90 minutes to 6 hours, the optimised EpiOcular-EIT protocol achieved 100% sensitivity, 68.4% specificity and 84.6% accuracy, thereby meeting all the acceptance criteria set by the VMG. In addition, to satisfy the needs of Japan and the Pacific region, the EpiOcular-EIT method was evaluated for its performance after extended shipment and storage of the tissues (4-5 days), and it was confirmed that the assay performs with similar levels of sensitivity, specificity and reproducibility in these circumstances. 2015 FRAME.
Burns, Ted M.; Conaway, Mark; Sanders, Donald B.
2010-01-01
Objective: To study the concurrent and construct validity and test-retest reliability in the practice setting of an outcome measure for myasthenia gravis (MG). Methods: Eleven centers participated in the validation study of the Myasthenia Gravis Composite (MGC) scale. Patients with MG were evaluated at 2 consecutive visits. Concurrent and construct validities of the MGC were assessed by evaluating MGC scores in the context of other MG-specific outcome measures. We used numerous potential indicators of clinical improvement to assess the sensitivity and specificity of the MGC for detecting clinical improvement. Test-retest reliability was performed on patients at the University of Virginia. Results: A total of 175 patients with MG were enrolled at 11 sites from July 1, 2008, to January 31, 2009. A total of 151 patients were seen in follow-up. Total MGC scores showed excellent concurrent validity with other MG-specific scales. Analyses of sensitivities and specificities of the MGC revealed that a 3-point improvement in total MGC score was optimal for signifying clinical improvement. A 3-point improvement in the MGC also appears to represent a meaningful improvement to most patients, as indicated by improved 15-item myasthenia gravis quality of life scale (MG-QOL15) scores. The psychometric properties were no better for an individualized subscore made up of the 2 functional domains that the patient identified as most important to treat. The test-retest reliability coefficient of the MGC was 98%, with a lower 95% confidence interval of 97%, indicating excellent test-retest reliability. Conclusions: The Myasthenia Gravis Composite is a reliable and valid instrument for measuring clinical status of patients with myasthenia gravis in the practice setting and in clinical trials. PMID:20439845
Lunny, Carole; McKenzie, Joanne E; McDonald, Steve
2016-06-01
Locating overviews of systematic reviews is difficult because of an absence of appropriate indexing terms and inconsistent terminology used to describe overviews. Our objective was to develop a validated search strategy to retrieve overviews in MEDLINE. We derived a test set of overviews from the references of two method articles on overviews. Two population sets were used to identify discriminating terms, that is, terms that appear frequently in the test set but infrequently in two population sets of references found in MEDLINE. We used text mining to conduct a frequency analysis of terms appearing in the titles and abstracts. Candidate terms were combined and tested in MEDLINE in various permutations, and the performance of strategies measured using sensitivity and precision. Two search strategies were developed: a sensitivity-maximizing strategy, achieving 93% sensitivity (95% confidence interval [CI]: 87, 96) and 7% precision (95% CI: 6, 8), and a sensitivity-and-precision-maximizing strategy, achieving 66% sensitivity (95% CI: 58, 74) and 21% precision (95% CI: 17, 25). The developed search strategies enable users to more efficiently identify overviews of reviews compared to current strategies. Consistent language in describing overviews would aid in their identification, as would a specific MEDLINE Publication Type. Copyright © 2015 Elsevier Inc. All rights reserved.
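The sensitivity and precision of a candidate strategy can be computed directly from the overlap between the retrieved records and the reference ("test") set; the sketch below uses hypothetical identifier sets for illustration.

```python
# Sketch of scoring a candidate search strategy against a reference set of
# known overviews. The identifier sets are hypothetical placeholders.
def strategy_performance(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    sensitivity = len(hits) / len(relevant)                      # recall against the test set
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return sensitivity, precision

relevant_ids = {f"ref{i}" for i in range(100)}                               # known overviews
retrieved_ids = {f"ref{i}" for i in range(93)} | {f"noise{i}" for i in range(1200)}

sens, prec = strategy_performance(retrieved_ids, relevant_ids)
print(f"sensitivity {sens:.0%}, precision {prec:.1%}")
```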
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rainina, Evguenia I.; McCune, D. E.; Luna, Maria L.
2012-05-31
The goal of this study was to validate the previously observed high biological kill performance of PAEROSOL, a semi-dry, micro-aerosol decontamination technology, against common HAIs in a non-human-subject trial within a hospital setting at Madigan Army Medical Center (MAMC) on Joint Base Lewis-McChord in Tacoma, Washington. In addition to validating the disinfecting efficacy of PAEROSOL, the objectives of the trial included a demonstration of PAEROSOL environmental safety (i.e., impact on hospital interior materials and electronic equipment exposed during testing) and PAEROSOL parameter optimization for future deployment.
Validation of the Economic and Health Outcomes Model of Type 2 Diabetes Mellitus (ECHO-T2DM).
Willis, Michael; Johansen, Pierre; Nilsson, Andreas; Asseburg, Christian
2017-03-01
The Economic and Health Outcomes Model of Type 2 Diabetes Mellitus (ECHO-T2DM) was developed to address study questions pertaining to the cost-effectiveness of treatment alternatives in the care of patients with type 2 diabetes mellitus (T2DM). Naturally, the usefulness of a model is determined by the accuracy of its predictions. A previous version of ECHO-T2DM was validated against actual trial outcomes and the model predictions were generally accurate. However, there have been recent upgrades to the model, which modify model predictions and necessitate an update of the validation exercises. The objectives of this study were to extend the methods available for evaluating model validity, to conduct a formal model validation of ECHO-T2DM (version 2.3.0) in accordance with the principles espoused by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) and the Society for Medical Decision Making (SMDM), and secondarily to evaluate the relative accuracy of four sets of macrovascular risk equations included in ECHO-T2DM. We followed the ISPOR/SMDM guidelines on model validation, evaluating face validity, verification, cross-validation, and external validation. Model verification involved 297 'stress tests', in which specific model inputs were modified systematically to ascertain correct model implementation. Cross-validation consisted of a comparison between ECHO-T2DM predictions and those of the seminal National Institutes of Health model. In external validation, study characteristics were entered into ECHO-T2DM to replicate the clinical results of 12 studies (including 17 patient populations), and model predictions were compared to observed values using established statistical techniques as well as measures of average prediction error, separately for the four sets of macrovascular risk equations supported in ECHO-T2DM. Sub-group analyses were conducted for dependent vs. independent outcomes and for microvascular vs. macrovascular vs. mortality endpoints. All stress tests were passed. ECHO-T2DM replicated the National Institutes of Health cost-effectiveness application with numerically similar results. In external validation of ECHO-T2DM, model predictions agreed well with observed clinical outcomes. For all sets of macrovascular risk equations, the results were close to the intercept and slope coefficients corresponding to a perfect match, resulting in high R² and failure to reject concordance using an F test. The results were similar for sub-groups of dependent and independent validation, with some degree of under-prediction of macrovascular events. ECHO-T2DM continues to match health outcomes in clinical trials in T2DM, with prediction accuracy similar to other leading models of T2DM.
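A hedged sketch of the concordance check mentioned above: observed outcomes are regressed on model-predicted outcomes and the joint hypothesis of intercept 0 and slope 1 is tested with an F test (statsmodels). The predicted/observed values are simulated, not ECHO-T2DM output.

```python
# Minimal sketch with simulated predicted and observed event rates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
predicted = rng.uniform(0.02, 0.30, size=17)           # e.g. 17 validation populations
observed = predicted + rng.normal(scale=0.02, size=17)

X = sm.add_constant(predicted)
fit = sm.OLS(observed, X).fit()
print(fit.params)                                       # [intercept, slope]
print("R-squared %.3f" % fit.rsquared)

# Joint hypothesis: perfect calibration line (intercept 0, slope 1).
f_res = fit.f_test("const = 0, x1 = 1")
print(f_res)
```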
[Reliability and validity of the Braden Scale for predicting pressure sore risk].
Boes, C
2000-12-01
For more accurate and objective pressure sore risk assessment, various risk assessment tools have been developed, mainly in the USA and Great Britain. The Braden Scale for Predicting Pressure Sore Risk is one such example. By means of a literature analysis of German and English texts referring to the Braden Scale, the scientific control criteria of reliability and validity are traced and consequences for the application of the scale in Germany are demonstrated. Analysis of 4 reliability studies shows an exclusive focus on interrater reliability. Further, even though the 19 validity studies examined cover many different settings, their examination is limited to the criteria of sensitivity and specificity (accuracy). Sensitivity and specificity levels range from 35% to 100%. The recommended cut-off points range from 10 to 19 points. The studies prove not to be comparable with each other. Furthermore, distortions can be found in these studies which affect the accuracy of the scale. The results of the analysis presented here show insufficient proof of reliability and validity in the American studies. In Germany, the Braden Scale has not yet been tested against scientific criteria. Such testing is needed before the scale is used in different German settings. In the course of such testing, the construction and study procedures of the American studies can serve as a basis, as can the problems identified in the analysis presented here.
Achieving external validity in home advantage research: generalizing crowd noise effects
Myers, Tony D.
2014-01-01
Different factors have been postulated to explain the home advantage phenomenon in sport. One plausible explanation investigated has been the influence of a partisan home crowd on sports officials' decisions. Different types of studies have tested the crowd influence hypothesis, including purposefully designed experiments. However, while experimental studies investigating crowd influences have high levels of internal validity, they suffer from a lack of external validity, with decision-making in a laboratory setting bearing little resemblance to decision-making in live sports settings. This focused review initially considers threats to external validity in applied and theoretical experimental research. It then discusses how such threats can be addressed using representative design, focusing on a recently published study that arguably provides the first experimental evidence of the impact of live crowd noise on officials in sport. The findings of this controlled experiment, conducted in a real tournament setting, offer a level of confirmation of the findings of laboratory studies in the area. Finally, directions for future research and the future conduct of crowd noise studies are discussed. PMID:24917839
How Do We Go about Investigating Test Fairness?
ERIC Educational Resources Information Center
Xi, Xiaoming
2010-01-01
Previous test fairness frameworks have greatly expanded the scope of fairness, but do not provide a means to fully integrate fairness investigations and set priorities. This article proposes an approach to guide practitioners on fairness research and practices. This approach treats fairness as an aspect of validity and conceptualizes it as…
Jayaraman, Jayakumar; Wong, Hai Ming; King, Nigel M; Roberts, Graham J
2016-10-01
Many countries have recently experienced a rapid increase in the demand for forensic age estimates of unaccompanied minors. Hong Kong is a major tourist and business center where there has been an increase in the number of people intercepted with false travel documents. An accurate estimation of age is only possible when a dataset for age estimation derived from the corresponding ethnic population is used. Thus, the aim of this study was to develop and validate a Reference Data Set (RDS) for dental age estimation for southern Chinese. A total of 2306 subjects were selected from the patient archives of a large dental hospital and the chronological age for each subject was recorded. This age was assigned to each specific stage of dental development for each tooth to create an RDS. To validate this RDS, a further 484 subjects were randomly chosen from the patient archives and their dental age was assessed based on the scores from the RDS. Dental age was estimated using a meta-analysis command corresponding to a random-effects statistical model. Chronological age (CA) and dental age (DA) were compared using the paired t-test. The overall difference between the chronological and dental age (CA-DA) was 0.05 years (2.6 weeks) for males and 0.03 years (1.6 weeks) for females. The paired t-test indicated that there was no statistically significant difference between the chronological and dental age (p > 0.05). The validated southern Chinese reference dataset based on dental maturation accurately estimated the chronological age. Copyright © 2016 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
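A minimal sketch of the validation comparison, assuming hypothetical chronological and estimated dental ages: the mean CA-DA difference and a paired t-test with SciPy.

```python
# Sketch of the chronological-age vs. dental-age comparison (paired t-test).
# `chronological_age` and `dental_age` are hypothetical arrays in years.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(5)
chronological_age = rng.uniform(5, 20, size=484)
dental_age = chronological_age + rng.normal(loc=0.05, scale=0.8, size=484)

mean_diff = np.mean(chronological_age - dental_age)
t, p = ttest_rel(chronological_age, dental_age)
print("mean CA-DA difference %.2f years, t=%.2f, p=%.3f" % (mean_diff, t, p))
```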
Hravnak, Marilyn; Chen, Lujie; Dubrawski, Artur; Bose, Eliezer; Clermont, Gilles; Pinsky, Michael R
2016-12-01
Huge hospital information system databases can be mined for knowledge discovery and decision support, but artifact in stored non-invasive vital sign (VS) high-frequency data streams limits its use. We used machine-learning (ML) algorithms trained on expert-labeled VS data streams to automatically classify VS alerts as real or artifact, thereby "cleaning" such data for future modeling. 634 admissions to a step-down unit had recorded continuous noninvasive VS monitoring data [heart rate (HR), respiratory rate (RR), peripheral arterial oxygen saturation (SpO2) at 1/20 Hz, and noninvasive oscillometric blood pressure (BP)]. Periods when data crossed stability thresholds defined VS event epochs. Data were divided into Block 1 as the ML training/cross-validation set and Block 2 as the test set. Expert clinicians annotated Block 1 events as perceived real or artifact. After feature extraction, ML algorithms were trained to create and validate models automatically classifying events as real or artifact. The models were then tested on Block 2. Block 1 yielded 812 VS events, with 214 (26 %) judged by experts as artifact (RR 43 %, SpO2 40 %, BP 15 %, HR 2 %). ML algorithms applied to the Block 1 training/cross-validation set (tenfold cross-validation) gave area under the curve (AUC) scores of 0.97 RR, 0.91 BP and 0.76 SpO2. Performance when applied to Block 2 test data was AUC 0.94 RR, 0.84 BP and 0.72 SpO2. ML-defined algorithms applied to archived multi-signal continuous VS monitoring data allowed accurate automated classification of VS alerts as real or artifact, and could support data mining for future model building.
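A rough sketch of the classification step, with simulated event features and labels and a random forest standing in for whichever ML algorithms the study used: 10-fold cross-validated AUC on a training block, then AUC on a held-out test block.

```python
# Illustrative sketch only; features and labels are simulated placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
X_block1 = rng.normal(size=(812, 12))            # event features (e.g. signal statistics)
y_block1 = (X_block1[:, 0] + rng.normal(size=812) > 1.0).astype(int)  # 1 = artifact
X_block2 = rng.normal(size=(300, 12))
y_block2 = (X_block2[:, 0] + rng.normal(size=300) > 1.0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv_auc = cross_val_score(clf, X_block1, y_block1, cv=10, scoring="roc_auc")
print("10-fold CV AUC %.2f +/- %.2f" % (cv_auc.mean(), cv_auc.std()))

clf.fit(X_block1, y_block1)
test_auc = roc_auc_score(y_block2, clf.predict_proba(X_block2)[:, 1])
print("held-out block AUC %.2f" % test_auc)
```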
Hravnak, Marilyn; Chen, Lujie; Dubrawski, Artur; Bose, Eliezer; Clermont, Gilles; Pinsky, Michael R.
2015-01-01
PURPOSE Huge hospital information system databases can be mined for knowledge discovery and decision support, but artifact in stored non-invasive vital sign (VS) high-frequency data streams limits its use. We used machine-learning (ML) algorithms trained on expert-labeled VS data streams to automatically classify VS alerts as real or artifact, thereby “cleaning” such data for future modeling. METHODS 634 admissions to a step-down unit had recorded continuous noninvasive VS monitoring data (heart rate [HR], respiratory rate [RR], peripheral arterial oxygen saturation [SpO2] at 1/20 Hz, and noninvasive oscillometric blood pressure [BP]). Periods when data crossed stability thresholds defined VS event epochs. Data were divided into Block 1 as the ML training/cross-validation set and Block 2 as the test set. Expert clinicians annotated Block 1 events as perceived real or artifact. After feature extraction, ML algorithms were trained to create and validate models automatically classifying events as real or artifact. The models were then tested on Block 2. RESULTS Block 1 yielded 812 VS events, with 214 (26%) judged by experts as artifact (RR 43%, SpO2 40%, BP 15%, HR 2%). ML algorithms applied to the Block 1 training/cross-validation set (10-fold cross-validation) gave area under the curve (AUC) scores of 0.97 RR, 0.91 BP and 0.76 SpO2. Performance when applied to Block 2 test data was AUC 0.94 RR, 0.84 BP and 0.72 SpO2. CONCLUSIONS ML-defined algorithms applied to archived multi-signal continuous VS monitoring data allowed accurate automated classification of VS alerts as real or artifact, and could support data mining for future model building. PMID:26438655
Correcting for Optimistic Prediction in Small Data Sets
Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.
2014-01-01
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219
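The bootstrapping approach compared in this study can be sketched as follows (simulated data and an illustrative logistic model, not the authors' datasets): the apparent C statistic is reduced by the average optimism estimated across bootstrap resamples.

```python
# Minimal sketch of bootstrap optimism correction of the C statistic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5))), size=n)

def fitted_auc(model, X_fit, y_fit, X_eval, y_eval):
    model.fit(X_fit, y_fit)
    return roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])

model = LogisticRegression(max_iter=1000)
apparent = fitted_auc(model, X, y, X, y)          # apparent (optimistic) C statistic

n_boot, optimism = 200, []
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)              # bootstrap resample
    if len(np.unique(y[idx])) < 2:
        continue                                  # need both classes to fit and score
    boot_auc = fitted_auc(model, X[idx], y[idx], X[idx], y[idx])
    orig_auc = fitted_auc(model, X[idx], y[idx], X, y)
    optimism.append(boot_auc - orig_auc)

print("apparent C %.3f, optimism-corrected C %.3f"
      % (apparent, apparent - np.mean(optimism)))
```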
Space Suit Joint Torque Testing
NASA Technical Reports Server (NTRS)
Valish, Dana J.
2011-01-01
In 2009 and early 2010, a test was performed to quantify the torque required to manipulate joints in several existing operational and prototype space suits in an effort to develop joint torque requirements appropriate for a new Constellation Program space suit system. The same test method was levied on the Constellation space suit contractors to verify that their suit design meets the requirements. However, because the original test was set up and conducted by a single test operator there was some question as to whether this method was repeatable enough to be considered a standard verification method for Constellation or other future space suits. In order to validate the method itself, a representative subset of the previous test was repeated, using the same information that would be available to space suit contractors, but set up and conducted by someone not familiar with the previous test. The resultant data was compared using graphical and statistical analysis and a variance in torque values for some of the tested joints was apparent. Potential variables that could have affected the data were identified and re-testing was conducted in an attempt to eliminate these variables. The results of the retest will be used to determine if further testing and modification is necessary before the method can be validated.
Control and Non-Payload Communications (CNPC) Prototype Radio Validation Flight Test Report
NASA Technical Reports Server (NTRS)
Shalkhauser, Kurt A.; Ishac, Joseph A.; Iannicca, Dennis C.; Bretmersky, Steven C.; Smith, Albert E.
2017-01-01
This report provides an overview and results from the unmanned aircraft (UA) Control and Non-Payload Communications (CNPC) Generation 5 prototype radio validation flight test campaign. The radios used in the test campaign were developed under cooperative agreement NNC11AA01A between the NASA Glenn Research Center and Rockwell Collins, Inc., of Cedar Rapids, Iowa. Measurement results are presented for flight tests over hilly terrain, open water, and urban landscape, utilizing radio sets installed into a NASA aircraft and ground stations. Signal strength and frame loss measurement data are analyzed relative to time and aircraft position, specifically addressing the impact of line-of-sight terrain obstructions on CNPC data flow. Both the radio and flight test system are described.
Negahban, Hossein; Mohtasebi, Elham; Goharpey, Shahin
2015-01-01
The aim of this methodological study was to cross-culturally translate the Shoulder Activity Scale (SAS) into Persian and determine its clinimetric properties, including reliability, validity, and responsiveness, in patients with shoulder disorders. The Persian version of the SAS was obtained after standard forward-backward translation. Three questionnaires were completed by the respondents: the SAS, the Shoulder Pain and Disability Index (SPADI), and the Short-Form 36 Health Survey (SF-36). The patients completed the SAS 1 week after the first visit to evaluate the test-retest reliability. Construct validity was evaluated by examining the associations between the scores on the SAS and the scores obtained from the SPADI, SF-36, and age of the patients. To assess responsiveness, data were collected at the first visit and then again after 4 weeks of physiotherapy intervention. Test-retest reliability and internal consistency were assessed using the Intraclass Correlation Coefficient (ICC) and Cronbach's alpha, respectively. To evaluate construct validity, Spearman's rank correlation was used. The ability of the SAS to detect changes was evaluated by the receiver-operating characteristics method. No problems or language difficulties were reported during the translation process. Test-retest reliability of the SAS was excellent, with an ICC of 0.98. Also, a marginal Cronbach's alpha of 0.64 was obtained. The correlation between the SAS and the SPADI was low, proving divergent validity, whereas the correlations between the SAS and the SF-36/age were moderate, proving convergent validity. A marginally acceptable responsiveness was achieved for the Persian SAS. The study provides some evidence to support the test-retest reliability, internal consistency, construct validity, and responsiveness of the Persian version of the SAS in patients with shoulder disorders. Therefore, it seems that this instrument is a useful measure of shoulder activity level in research settings and clinical practice. The Shoulder Activity Scale (SAS) is a reliable, valid, and responsive measure of shoulder activity level in Persian-speaking patients with different shoulder disorders. The results on the clinimetric properties of the Persian SAS are comparable with its original, English version. The Persian version of the SAS can be used in "clinical" and "research" settings with patients with shoulder disorders.
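Two of the reported clinimetric quantities can be sketched with simulated item-level data: Cronbach's alpha over the SAS items and a Spearman correlation with a comparator score (here a placeholder SPADI total).

```python
# Sketch only; item-level scores are simulated placeholders.
import numpy as np
from scipy.stats import spearmanr

def cronbach_alpha(items):
    """items: array of shape (n_respondents, n_items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(8)
latent = rng.normal(size=120)
sas_items = latent[:, None] + rng.normal(scale=0.8, size=(120, 8))   # 8 correlated items
spadi_total = -latent + rng.normal(scale=1.5, size=120)              # comparator scale

print("Cronbach's alpha %.2f" % cronbach_alpha(sas_items))
rho, p = spearmanr(sas_items.sum(axis=1), spadi_total)
print("Spearman rho with SPADI %.2f (p=%.3g)" % (rho, p))
```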
Hernan, Andrea L; Giles, Sally J; O'Hara, Jane K; Fuller, Jeffrey; Johnson, Julie K; Dunbar, James A
2016-04-01
Patients are a valuable source of information about ways to prevent harm in primary care and are in a unique position to provide feedback about the factors that contribute to safety incidents. Unlike in the hospital setting, there are currently no tools that allow the systematic capture of this information from patients. The aim of this study was to develop a quantitative primary care patient measure of safety (PC PMOS). A two-stage approach was undertaken to develop questionnaire domains and items. Stage 1 involved a modified Delphi process. An expert panel reached consensus on domains and items based on three sources of information (validated hospital PMOS, previous research conducted by our study team and literature on threats to patient safety). Stage 2 involved testing the face validity of the questionnaire developed during stage 1 with patients and primary care staff using the 'think aloud' method. Following this process, the questionnaire was revised accordingly. The PC PMOS was received positively by both patients and staff during face validity testing. Barriers to completion included the length, relevance and clarity of questions. The final PC PMOS consisted of 50 items across 15 domains. The contributory factors to safety incidents centred on communication, access to care, patient-related factors, organisation and care planning, task performance and information flow. This is the first tool specifically designed for primary care settings, which allows patients to provide feedback about factors contributing to potential safety incidents. The PC PMOS provides a way for primary care organisations to learn about safety from the patient perspective and make service improvements with the aim of reducing harm in this setting. Future research will explore the reliability and construct validity of the PC PMOS. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Validation of the Arabic Version of the Internet Gaming Disorder-20 Test.
Hawi, Nazir S; Samaha, Maya
2017-04-01
In recent years, researchers have been trying to shed light on gaming addiction and its association with different psychiatric disorders and psychological determinants. The latest edition of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) included Internet Gaming Disorder (IGD) in its Section 3 as a condition for further empirical study and proposed nine criteria for the diagnosis of IGD. The 20-item Internet Gaming Disorder (IGD-20) Test was developed as a valid and reliable tool to assess gaming addiction based on the nine criteria set by the DSM-5. The aim of this study is to validate an Arabic version of the IGD-20 Test. The Arabic version of the IGD-20 will not only help in identifying Arabic-speaking pathological gamers but also stimulate cross-cultural studies that could contribute to an area in need of more research for insight and treatment. After a process of translation and back-translation, and with the participation of a sizable sample of Arabic-speaking adolescents, the present study conducted a psychometric validation of the IGD-20 Test. Our confirmatory factor analysis showed the validity of the Arabic version of the IGD-20 Test. The one-factor model of the Arabic IGD-20 Test had very good psychometric properties, and it fitted the sample data extremely well. In addition, correlation analysis between the IGD-20 Test and the daily gameplay duration on weekdays and weekends revealed significant positive relationships that warranted a criterion-related validation. Thus, the Arabic version of the IGD-20 Test is a valid and reliable measure of IGD among Arabic-speaking populations.
van der Meulen, Ineke; van de Sandt-Koenderman, W Mieke E; Duivenvoorden, Hugo J; Ribbers, Gerard M
2010-01-01
This study explores the psychometric qualities of the Scenario Test, a new test to assess daily-life communication in severe aphasia. The test is innovative in that it: (1) examines the effectiveness of verbal and non-verbal communication; and (2) assesses patients' communication in an interactive setting, with a supportive communication partner. To determine the reliability, validity, and sensitivity to change of the Scenario Test and discuss its clinical value. The Scenario Test was administered to 122 persons with aphasia after stroke and to 25 non-aphasic controls. Analyses were performed for the entire group of persons with aphasia, as well as for a subgroup of persons unable to communicate verbally (n = 43). Reliability (internal consistency, test-retest reliability, inter-judge, and intra-judge reliability) and validity (internal validity, convergent validity, known-groups validity) and sensitivity to change were examined using standard psychometric methods. The Scenario Test showed high levels of reliability. Internal consistency (Cronbach's alpha = 0.96; item-rest correlations = 0.58-0.82) and test-retest reliability (ICC = 0.98) were high. Agreement between judges in total scores was good, as indicated by the high inter- and intra-judge reliability (ICC = 0.86-1.00). Agreement in scores on the individual items was also good (square-weighted kappa values 0.61-0.92). The test demonstrated good levels of validity. A principal component analysis for categorical data identified two dimensions, interpreted as general communication and communicative creativity. Correlations with three other instruments measuring communication in aphasia, that is, Spontaneous Speech interview from the Aachen Aphasia Test (AAT), Amsterdam-Nijmegen Everyday Language Test (ANELT), and Communicative Effectiveness Index (CETI), were moderate to strong (0.50-0.85) suggesting good convergent validity. Group differences were observed between persons with aphasia and non-aphasic controls, as well as between persons with aphasia unable to use speech to convey information and those able to communicate verbally; this indicates good known-groups validity. The test was sensitive to changes in performance, measured over a period of 6 months. The data support the reliability and validity of the Scenario Test as an instrument for examining daily-life communication in aphasia. The test focuses on multimodal communication; its psychometric qualities enable future studies on the effect of Alternative and Augmentative Communication (AAC) training in aphasia.
NASA Technical Reports Server (NTRS)
Stehura, Aaron; Rozek, Matthew
2013-01-01
The complexity of the Mars Science Laboratory (MSL) mission presented the Entry, Descent, and Landing systems engineering team with many challenges in its Verification and Validation (V&V) campaign. This paper describes some of the logistical hurdles related to managing a complex set of requirements, test venues, test objectives, and analysis products in the implementation of a specific portion of the overall V&V program to test the interaction of flight software with the MSL avionics suite. Application-specific solutions to these problems are presented herein, which can be generalized to other space missions and to similar formidable systems engineering problems.
Jacqmin, Dustin J; Bredfeldt, Jeremy S; Frigo, Sean P; Smilowitz, Jennifer B
2017-01-01
The AAPM Medical Physics Practice Guideline (MPPG) 5.a provides concise guidance on the commissioning and QA of beam modeling and dose calculation in radiotherapy treatment planning systems. This work discusses the implementation of the validation testing recommended in MPPG 5.a at two institutions. The two institutions worked collaboratively to create a common set of treatment fields and analysis tools to deliver and analyze the validation tests. This included the development of a novel, open-source software tool to compare scanning water tank measurements to 3D DICOM-RT Dose distributions. Dose calculation algorithms in both Pinnacle and Eclipse were tested with MPPG 5.a to validate the modeling of Varian TrueBeam linear accelerators. The validation process resulted in more than 200 water tank scans and more than 50 point measurements per institution, each of which was compared to a dose calculation from the institution's treatment planning system (TPS). Overall, the validation testing recommended in MPPG 5.a took approximately 79 person-hours for a machine with four photon and five electron energies for a single TPS. Of the 79 person-hours, 26 person-hours required time on the machine, and the remainder involved preparation and analysis. The basic photon, electron, and heterogeneity correction tests were evaluated with the tolerances in MPPG 5.a, and the tolerances were met for all tests. The MPPG 5.a evaluation criteria were used to assess the small field and IMRT/VMAT validation tests. Both institutions found the use of MPPG 5.a to be a valuable resource during the commissioning process. The validation testing in MPPG 5.a showed the strengths and limitations of the TPS models. In addition, the data collected during the validation testing is useful for routine QA of the TPS, validation of software upgrades, and commissioning of new algorithms. © 2016 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
Sisic, Nedim; Jelicic, Mario; Pehar, Miran; Spasic, Miodrag; Sekulic, Damir
2016-01-01
In basketball, anthropometric status is an important factor when identifying and selecting talents, while agility is one of the most vital motor performances. The aim of this investigation was to evaluate the influence of anthropometric variables and power capacities on different preplanned agility performances. The participants were 92 high-level, junior-age basketball players (16-17 years of age; 187.6±8.72 cm in body height, 78.40±12.26 kg in body mass), randomly divided into a validation and a cross-validation subsample. The predictor set consisted of 16 anthropometric variables and three tests of power capacities (Sargent jump, broad jump, and medicine-ball throw). The criteria were three tests of agility: a T-Shape-Test, a Zig-Zag-Test, and a test of running with a 180-degree turn (T180). Forward stepwise multiple regressions were calculated for the validation subsample and then cross-validated. Cross-validation included correlations between observed and predicted scores, dependent-samples t-tests between predicted and observed scores, and Bland-Altman plots. Analysis of variance identified centres as being advanced in most of the anthropometric indices and in the medicine-ball throw (all at P<0.05), with no significant between-position differences for the other studied motor performances. Multiple regression models originally calculated for the validation subsample were then cross-validated and confirmed for the Zig-Zag-Test (R of 0.71 and 0.72 for the validation and cross-validation subsamples, respectively). Anthropometrics were not strongly related to agility performance, but leg length was found to be negatively associated with performance in basketball-specific agility. Power capacities were confirmed to be an important factor in agility. The results highlighted the importance of sport-specific tests when studying preplanned agility performance in basketball. The improvement in power capacities will probably result in an improvement in agility in basketball athletes, while anthropometric indices should be used in order to identify those athletes who can achieve superior agility performance.
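The cross-validation checks described here (correlating observed with predicted scores, a dependent-samples t-test, and Bland-Altman statistics) can be illustrated as follows; the predictor and criterion arrays are simulated stand-ins, not the study's data.

```python
# Sketch of the cross-validation checks: fit on the validation subsample,
# predict the cross-validation subsample, then compare predicted vs observed.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
coef = np.array([0.5, -0.3, 0.2, 0.1])          # hypothetical true weights
X_val, X_cross = rng.normal(size=(46, 4)), rng.normal(size=(46, 4))
y_val = X_val @ coef + rng.normal(scale=0.5, size=46)
y_cross = X_cross @ coef + rng.normal(scale=0.5, size=46)

model = LinearRegression().fit(X_val, y_val)
y_pred = model.predict(X_cross)

r, p = stats.pearsonr(y_cross, y_pred)          # observed vs predicted correlation
t, p_t = stats.ttest_rel(y_cross, y_pred)       # dependent-samples t-test
diff = y_cross - y_pred
bias, loa = diff.mean(), 1.96 * diff.std(ddof=1)  # Bland-Altman bias and limits
print(f"r={r:.2f} (p={p:.3f}), t={t:.2f} (p={p_t:.3f}), "
      f"bias={bias:.2f}, limits of agreement=+/-{loa:.2f}")
```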
Infant polysomnography: reliability and validity of infant arousal assessment.
Crowell, David H; Kulp, Thomas D; Kapuniai, Linda E; Hunt, Carl E; Brooks, Lee J; Weese-Mayer, Debra E; Silvestri, Jean; Ward, Sally Davidson; Corwin, Michael; Tinsley, Larry; Peucker, Mark
2002-10-01
Infant arousal scoring based on the Atlas Task Force definition of transient EEG arousal was evaluated to determine (1) whether transient arousals can be identified and assessed reliably in infants and (2) whether arousal and no-arousal epochs scored previously by trained raters can be validated reliably by independent sleep experts. Phase I for inter- and intrarater reliability scoring was based on two datasets of sleep epochs selected randomly from nocturnal polysomnograms of healthy full-term and preterm infants, idiopathic apparent life-threatening event cases, and siblings of Sudden Infant Death Syndrome infants, aged 35 to 64 weeks postconceptional age. After training, test set 1 reliability was assessed and discrepancies identified. After retraining, test set 2 was scored by the same raters to determine interrater reliability. Later, three raters from the trained group rescored test set 2 to assess inter- and intrarater reliabilities. Interrater and intrarater reliability kappas, with 95% confidence intervals, ranged from substantial to almost perfect levels of agreement. Interrater reliabilities for spontaneous arousals were initially moderate and then substantial. During the validation phase, 315 previously scored epochs were presented to four sleep experts to rate as containing arousal or no-arousal events. Interrater expert agreements were diverse and considered noninterpretable. Concordance in sleep experts' agreements, based on identification of the previously sampled arousal and no-arousal epochs, was used as a secondary evaluative technique. Results showed agreement by two or more experts on 86% of the Collaborative Home Infant Monitoring Evaluation Study arousal-scored events. Conversely, only 1% of the Collaborative Home Infant Monitoring Evaluation Study-scored no-arousal epochs were rated as an arousal. In summary, this study presents an empirically tested model with procedures and criteria for attaining improved reliability in transient EEG arousal assessments in infants using the modified Atlas Task Force standards. With training based on specific criteria, substantial inter- and intrarater agreement in identifying infant arousals was demonstrated. Corroborative validation results were too disparate for meaningful interpretation. Alternate evaluation based on concordance agreements supports reliance on infant EEG criteria for assessment. Results mandate additional confirmatory validation studies with specific training on infant EEG arousal assessment criteria.
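Agreement statistics like those reported here (kappa for epoch-level arousal decisions, and square-weighted kappa where scores are ordinal) can be computed with scikit-learn; the rater vectors below are made-up examples.

```python
# Sketch: unweighted and quadratic ("square") weighted kappa for two raters.
from sklearn.metrics import cohen_kappa_score

# Hypothetical epoch-level decisions by two raters (1 = arousal, 0 = no arousal).
rater_a = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
rater_b = [1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
print(cohen_kappa_score(rater_a, rater_b))                        # unweighted kappa

# Ordinal item scores would instead use quadratic (square) weights.
scores_a = [3, 2, 2, 1, 3, 0, 2, 1]
scores_b = [3, 2, 1, 1, 3, 1, 2, 2]
print(cohen_kappa_score(scores_a, scores_b, weights="quadratic"))
```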
Ghisi, Gabriela Lima de Melo; Sandison, Nicole; Oh, Paul
2016-03-01
To develop, pilot-test, and psychometrically validate a shorter version of the coronary artery disease education questionnaire (CADE-Q), called CADE-Q SV. Based on previous versions of the CADE-Q, cardiac rehabilitation (CR) experts developed 20 items divided into 5 knowledge domains to comprise the first version of the CADE-Q SV. To establish content validity, the items were reviewed by an expert panel (N=12). Refined items were pilot-tested in 20 patients to confirm their clarity. A final version was generated and psychometrically tested in 132 CR patients. Test-retest reliability was assessed via the intraclass correlation coefficient (ICC), internal consistency using Cronbach's alpha, and criterion validity with regard to patients' education and duration in CR. All ICC coefficients met the minimum recommended standard. All domains were considered internally consistent (α>0.7). Criterion validity was supported by significant differences in mean scores by educational level (p<0.01) and duration in CR (p<0.05). Knowledge about exercise and nutrition was higher than knowledge about medical condition. The CADE-Q SV was demonstrated to have good reliability and validity. This is a short, quick and appropriate tool for application in clinical and research settings, assessing patients' knowledge during CR and as part of education programming. Copyright © 2015. Published by Elsevier Ireland Ltd.
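Test-retest ICCs and Cronbach's alpha of the kind reported here can be computed with the pingouin package; the column names, sample sizes, and simulated scores below are assumptions for illustration only.

```python
# Sketch: test-retest ICC and Cronbach's alpha with pingouin (assumed installed).
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(2)
true_score = rng.normal(10, 3, size=30)
wide = pd.DataFrame({
    "patient": np.arange(30),
    "test": true_score + rng.normal(0, 1, 30),
    "retest": true_score + rng.normal(0, 1, 30),
})
long = wide.melt(id_vars="patient", var_name="occasion", value_name="score")

# Two-way ICCs; the ICC2/ICC3 rows are the usual test-retest choices.
icc = pg.intraclass_corr(data=long, targets="patient",
                         raters="occasion", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

# Cronbach's alpha for a hypothetical 20-item knowledge score matrix.
items = pd.DataFrame(rng.integers(0, 2, size=(132, 20)))
print(pg.cronbach_alpha(data=items))
```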
Validity and Reliability of Baseline Testing in a Standardized Environment.
Higgins, Kathryn L; Caze, Todd; Maerlender, Arthur
2017-08-11
The Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) is a computerized neuropsychological test battery commonly used to determine cognitive recovery from concussion based on comparing post-injury scores to baseline scores. This model is based on the premise that ImPACT baseline test scores are a valid and reliable measure of optimal cognitive function at baseline. Growing evidence suggests that this premise may not be accurate, and that a large contributor to invalid and unreliable baseline test scores may be the protocol and environment in which baseline tests are administered. This study examined the effects of a standardized environment and administration protocol on the reliability and performance validity of athletes' baseline test scores on ImPACT by comparing scores obtained in two different group-testing settings. Baseline data from 361 cohort-matched Division 1 collegiate athletes were assessed using a variety of indicators of potential performance invalidity; internal reliability was also examined. Thirty-one to thirty-nine percent of the baseline cases had at least one indicator of low performance validity, but there were no significant differences in validity indicators based on the environment in which the testing was conducted. Internal consistency reliability scores were in the acceptable to good range, with no significant differences between administration conditions. These results suggest that athletes may be reliably performing at levels lower than their best effort would produce. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Reconceptualisation of Approaches to Teaching Evaluation in Higher Education
ERIC Educational Resources Information Center
Tran, Nga D.
2015-01-01
The ubiquitous use of Student Evaluation of Teaching (SET) in higher education is inherently controversial. Issues mostly revolve around whether the instrument is reliable and valid for the purpose for which it was intended. Controversies exist, in part, due to the lack of a theoretical framework upon which SETs can be based and tested for their…
Validation of the Spanish version of the Index of Spouse Abuse.
Plazaola-Castaño, Juncal; Ruiz-Pérez, Isabel; Escribà-Agüir, Vicenta; Jiménez-Martín, Juan Manuel; Hernández-Torres, Elisa
2009-04-01
Partner violence against women is a major public health problem. Although there are currently a number of validated screening and diagnostic tools that can be used to evaluate this type of violence, such tools are not available in Spain. The aim of this study is to analyze the validity and reliability of the Spanish version of the Index of Spouse Abuse (ISA). A cross-sectional study was carried out in 2005 in two health centers in Granada, Spain, in 390 women between 18 and 70 years old. Analyses of the factorial structure, internal consistency, test-retest reliability, and construct validity were conducted. Cutoff points for each subscale were also defined. For the construct validity analysis, the SF-36 perceived general health dimension, the Rosenberg Self-Esteem Scale and the Goldberg 12-item General Health Questionnaire were included. The psychometric analysis shows that the instrument has good internal consistency, reproducibility, and construct validity. The scale is useful for the analysis of partner violence against women in both a research setting and a healthcare setting.
Mas, Sergi; Gassó, Patricia; Morer, Astrid; Calvo, Anna; Bargalló, Nuria; Lafuente, Amalia; Lázaro, Luisa
2016-01-01
We propose an integrative approach that combines structural magnetic resonance imaging data (MRI), diffusion tensor imaging data (DTI), neuropsychological data, and genetic data to predict early-onset obsessive compulsive disorder (OCD) severity. From a cohort of 87 patients, 56 with complete information were used in the present analysis. First, we performed a multivariate genetic association analysis of OCD severity with 266 genetic polymorphisms. This association analysis was used to select and prioritize the SNPs that would be included in the model. Second, we split the sample into a training set (N = 38) and a validation set (N = 18). Third, entropy-based measures of information gain were used for feature selection with the training subset. Fourth, the selected features were fed into two supervised methods of class prediction based on machine learning, using the leave-one-out procedure with the training set. Finally, the resulting model was validated with the validation set. Nine variables were used for the creation of the OCD severity predictor, including six genetic polymorphisms and three variables from the neuropsychological data. The developed model classified child and adolescent patients with OCD by disease severity with an accuracy of 0.90 in the testing set and 0.70 in the validation sample. Above its clinical applicability, the combination of particular neuropsychological, neuroimaging, and genetic characteristics could enhance our understanding of the neurobiological basis of the disorder. PMID:27093171
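The pipeline described (feature selection on a training subset, leave-one-out class prediction, then a check on a held-out validation split) can be sketched with scikit-learn, whose mutual-information estimator serves as an analogue of the entropy-based information-gain ranking; the feature matrix below is synthetic, not the study's imaging or genetic data.

```python
# Sketch: mutual-information feature ranking + leave-one-out SVM classification,
# then evaluation on a held-out validation split (synthetic data).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(56, 20))                 # 56 patients, 20 candidate features
y = (X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=56) > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=18, random_state=0, stratify=y)

# Rank features on the training subset only and keep the top 9.
mi = mutual_info_classif(X_train, y_train, random_state=0)
top = np.argsort(mi)[::-1][:9]

clf = SVC(kernel="linear")
loo_acc = cross_val_score(clf, X_train[:, top], y_train,
                          cv=LeaveOneOut()).mean()     # training-set LOO accuracy
clf.fit(X_train[:, top], y_train)
val_acc = clf.score(X_val[:, top], y_val)              # hold-out accuracy
print(f"LOO accuracy: {loo_acc:.2f}, validation accuracy: {val_acc:.2f}")
```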
Manning, Joseph C; Walker, Gemma M; Carter, Tim; Aubeeluck, Aimee; Witchell, Miranda; Coad, Jane
2018-04-12
Currently, no standardised, evidence-based assessment tool for assessing immediate self-harm and suicide in acute paediatric inpatient settings exists. The aim of this study is to develop and test the psychometric properties of an assessment tool that identifies immediate risk of self-harm and suicide in children and young people (10-19 years) in acute paediatric hospital settings. Development phase: This phase involved a scoping review of the literature to identify and extract items from previously published suicide and self-harm risk assessment scales. Using a modified electronic Delphi approach, these items will then be rated according to their relevance for assessment of immediate suicide or self-harm risk by expert professionals. Inclusion of items will be determined by 65%-70% consensus between raters. Subsequently, a panel of expert members will convene to determine the face validity, appropriate phrasing, item order and response format for the finalised items.Psychometric testing phase: The finalised items will be tested for validity and reliability through a multicentre, psychometric evaluation. Psychometric testing will be undertaken to determine the following: internal consistency, inter-rater reliability, convergent, divergent validity and concurrent validity. Ethical approval was provided by the National Health Service East Midlands-Derby Research Ethics Committee (17/EM/0347) and full governance clearance received by the Health Research Authority and local participating sites. Findings from this study will be disseminated to professionals and the public via peer-reviewed journal publications, popular social media and conference presentations. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Huang, Yun-Hsin; Wu, Chih-Hsun; Chen, Hsiu-Jung; Cheng, Yih-Ru; Hung, Fu-Chien; Leung, Kai-Kuan; Lue, Bee-Horng; Chen, Ching-Yu; Chiu, Tai-Yuan; Wu, Yin-Chang
2018-01-16
Severe negative emotional reactions to chronic illness are maladaptive for patients and need to be addressed in the primary care setting. The psychometric properties of a quick screening tool, the Negative Emotions due to Chronic Illness Screening Test (NECIS), for general emotional problems among patients with chronic illness being treated in a primary care setting were investigated. Three studies including 375 patients with chronic illness were used to assess and analyze the internal consistency, test-retest reliability, criterion-related validity, a cut-off point for distinguishing maladaptive emotions, and the clinical application validity of the NECIS. Self-report questionnaires were used. Internal consistency (Cronbach's α) ranged from 0.78 to 0.82, and the test-retest reliability was 0.71 (P < 0.001). Criterion-related validity was 0.51 (P < 0.001). Based on the 'severe maladaptation' and 'moderate maladaptation' groups defined by using the 'Worsening due to Chronic Illness' index as the analysis reference, receiver-operating characteristic curve analysis revealed areas under the curve of 0.81 and 0.82 (ps < 0.001), and a cut-off point of 19/20 was the most satisfactory for distinguishing those with overly negative emotions, with sensitivities and specificities of 83.3% and 69.0%, and 68.5% and 83.0%, respectively. The clinical application validity analysis revealed that the low-NECIS group showed significantly better adaptation to chronic illness on the scales of subjective health, general satisfaction with life, self-efficacy of self-care for disease, illness perception, and stressors in everyday life. The NECIS has satisfactory psychometric properties for use in the primary care setting. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
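ROC analysis and cut-point selection of this kind can be reproduced in outline with scikit-learn; the scores and outcome labels below are simulated, and the cut-off here is chosen by the Youden index rather than the authors' exact procedure.

```python
# Sketch: ROC curve, AUC, and a cut-off chosen by the Youden index (J = sens + spec - 1).
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(4)
maladaptive = rng.integers(0, 2, size=375)                    # reference grouping
necis = rng.normal(17, 4, size=375) + 4 * maladaptive         # simulated NECIS scores

auc = roc_auc_score(maladaptive, necis)
fpr, tpr, thresholds = roc_curve(maladaptive, necis)
j = tpr - fpr                                                 # Youden index per threshold
best = np.argmax(j)
print(f"AUC={auc:.2f}, cut-off={thresholds[best]:.1f}, "
      f"sensitivity={tpr[best]:.2f}, specificity={1 - fpr[best]:.2f}")
```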
Patient-perceived hospital service quality: an empirical assessment.
Pai, Yogesh P; Chary, Satyanarayana T; Pai, Rashmi Yogesh
2018-02-12
Purpose The purpose of this paper is to appraise Pai and Chary's (2016) conceptual framework for measuring patient-perceived hospital service quality (HSQ). Design/methodology/approach A structured questionnaire was used to obtain data from teaching, public and corporate hospital patients. Several tests were conducted to assess the instrument's reliability and validity. Pai and Chary's (2016) nine dimensions for measuring HSQ were examined in this paper. Findings The tests confirm that Pai and Chary's (2016) conceptual framework is reliable and valid. The study also establishes that the nine dimensions measure HSQ. Practical implications The framework empowers managers to assess service quality in any hospital settings, corporate, public and teaching, using an approach that is superior to the existing HSQ scales. Originality/value This paper helps researchers and practitioners to assess HSQ from patient perspectives in any hospital setting.
NASA Astrophysics Data System (ADS)
He, Song-Bing; Ben Hu; Kuang, Zheng-Kun; Wang, Dong; Kong, De-Xin
2016-11-01
Adenosine receptors (ARs) are potential therapeutic targets for Parkinson’s disease, diabetes, pain, stroke and cancers. Prediction of subtype selectivity is therefore important from both therapeutic and mechanistic perspectives. In this paper, we introduced a shape similarity profile as a molecular descriptor, namely the three-dimensional biologically relevant spectrum (BRS-3D), for AR selectivity prediction. Pairwise regression and discrimination models were built with support vector machine methods. The average determination coefficient (r2) of the regression models was 0.664 (for test sets). The 2B-3 (A2B vs A3) model performed best, with q2 = 0.769 for training sets (10-fold cross-validation), and r2 = 0.766 and RMSE = 0.828 for test sets. The models’ robustness and stability were validated with 100 rounds of resampling and 500 rounds of Y-randomization. We compared the performance of BRS-3D with 3D descriptors calculated by MOE; BRS-3D performed as well as, or better than, the MOE 3D descriptors. The performances of the discrimination models were also encouraging, with an average accuracy (ACC) of 0.912 and an MCC of 0.792 (test set). The 2A-3 (A2A vs A3) selectivity discrimination model (ACC = 0.882 and MCC = 0.715 for the test set) outperformed an earlier reported one (ACC = 0.784). These results demonstrated that, through multiple conformation encoding, BRS-3D can be used as an effective molecular descriptor for AR subtype selectivity prediction.
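The model-building scheme reported here (support vector regression with 10-fold cross-validation and Y-randomization) can be sketched as follows; the descriptor matrix is purely synthetic and only stands in for BRS-3D profiles, and the number of randomization repeats is reduced for speed.

```python
# Sketch: SVR with 10-fold cross-validated q2 and a Y-randomization check.
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 50))                # stand-in for BRS-3D descriptors
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.8, size=200)   # selectivity values

cv = KFold(n_splits=10, shuffle=True, random_state=0)
model = SVR(kernel="rbf", C=10.0)
q2 = r2_score(y, cross_val_predict(model, X, y, cv=cv))      # cross-validated q2
print(f"q2 = {q2:.3f}")

# Y-randomization: refitting on shuffled activities should destroy the signal.
y_rand_scores = []
for _ in range(100):                          # 500 repeats in the paper; 100 here
    y_perm = rng.permutation(y)
    y_rand_scores.append(r2_score(y_perm, cross_val_predict(model, X, y_perm, cv=cv)))
print(f"mean Y-randomized q2 = {np.mean(y_rand_scores):.3f}")
```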
ERIC Educational Resources Information Center
Prado, Elizabeth L.; Hartini, Sri; Rahmawati, Atik; Ismayani, Elfa; Hidayati, Astri; Hikmah, Nurul; Muadz, Husni; Apriatni, Mandri S.; Ullman, Michael T.; Shankar, Anuraj H.; Alcock, Katherine J.
2010-01-01
Background: Evaluating the impact of nutrition interventions on developmental outcomes in developing countries can be challenging since most assessment tests have been produced in and for developed country settings. Such tests may not be valid measures of children's abilities when used in a new context. Aims: We present several principles for the…
Development of Nonword and Irregular Word Lists for Australian Grade 3 Students Using Rasch Analysis
ERIC Educational Resources Information Center
Callinan, Sarah; Cunningham, Everarda; Theiler, Stephen
2014-01-01
Many tests used in educational settings to identify learning difficulties endeavour to pick up only the lowest performers. Yet these tests are generally developed within a Classical Test Theory (CTT) paradigm that assumes that data do not have significant skew. Rasch analysis is more tolerant of skew and was used to validate two newly developed…
ERIC Educational Resources Information Center
Perkins, Kyle; Duncan, Ann
An assessment analysis was performed to determine whether sets of items designed to measure three different subskills of reading comprehension of the Iowa Tests of Basic Skills (ITBSs) did, in fact, distinguish among these subskills. The three major skills objectives were: (1) facts; (2) generalizations; and (3) inferences. Data from…
ERIC Educational Resources Information Center
Aryadoust, Vahid; Baghaei, Purya
2016-01-01
This study aims to examine the relationship between reading comprehension and lexical and grammatical knowledge among English as a foreign language students by using an Artificial Neural Network (ANN). A total of 825 test takers were administered both a second-language reading test and a set of psychometrically validated grammar and vocabulary tests.…
Lynn, Scott K.; Watkins, Casey M.; Wong, Megan A.; Balfany, Katherine; Feeney, Daniel F.
2018-01-01
The Athos® wearable system integrates surface electromyography (sEMG) electrodes into the construction of compression athletic apparel. The Athos system reduces the complexity and increases the portability of collecting EMG data and provides processed data to the end user. The objective of the study was to determine the reliability and validity of Athos as compared with a research grade sEMG system. Twelve healthy subjects performed 7 trials on separate days (1 baseline trial and 6 repeated trials). In each trial subjects wore the wearable sEMG system and had a research grade sEMG system's electrodes placed just distally on the same muscle, as close as possible to the wearable system's electrodes. The muscles tested were the vastus lateralis (VL), vastus medialis (VM), and biceps femoris (BF). All testing was done on an isokinetic dynamometer. Baseline testing involved performing isometric 1 repetition maximum tests for the knee extensors and flexors and three repetitions of concentric-concentric knee flexion and extension at MVC for each testing speed: 60, 180, and 300 deg/sec. Repeated trials 2-7 each comprised 9 sets, where each set included three repetitions of concentric-concentric knee flexion-extension. Each repeated trial (2-7) comprised one set at each speed and percent MVC (50%, 75%, 100%) combination. The wearable system and research grade sEMG data were processed using the same methods and aligned in time. The amplitude metrics calculated from the sEMG for each repetition were the peak amplitude, sum of the linear envelope, and 95th percentile. Validity results comprise two main findings. First, there is not a significant effect of system (Athos or research grade system) on the repetition amplitude metrics (95th percentile, peak, or sum). Second, the relationship between torque and sEMG is not significantly different between Athos and the research grade system. For reliability testing, the variation across trials, averaged across speeds, was 0.8%, 7.3%, and 0.2% higher for Athos for the BF, VL, and VM, respectively. Also, using the standard deviation of the MVC-normalized repetition amplitude, the research grade system showed 10.7% variability while Athos showed 12%. The wearable technology (Athos) provides sEMG measures that are consistent with controlled, research grade technologies and data collection procedures. Key points: surface EMG embedded into athletic garments (Athos) had similar validity and reliability when compared with a research grade system; there was no difference in the torque-EMG relationship between the two systems; there was no statistically significant difference in reliability across 6 trials between the two systems; and the validity and reliability of Athos demonstrate the potential for sEMG to be applied in dynamic rehabilitation and sports settings. PMID:29769821
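The per-repetition amplitude metrics compared here (peak, 95th percentile, and sum of the linear envelope) can be computed from a raw sEMG segment roughly as below. The rectify-and-low-pass envelope and the sampling rate are assumed conventions, not the paper's exact processing chain, and the signal is synthetic.

```python
# Sketch: peak, 95th percentile, and summed linear-envelope amplitude for one repetition.
# The 10 Hz low-pass envelope is an assumed convention, not the paper's exact pipeline.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                                   # sampling rate in Hz (assumed)
rng = np.random.default_rng(6)
emg = rng.normal(scale=0.2, size=2000) * np.hanning(2000)   # synthetic 2 s burst

rectified = np.abs(emg)
b, a = butter(4, 10.0 / (fs / 2), btype="low")               # 4th-order 10 Hz low-pass
envelope = filtfilt(b, a, rectified)

metrics = {
    "peak": envelope.max(),
    "p95": np.percentile(envelope, 95),
    "sum_envelope": envelope.sum(),
}
print({k: round(float(v), 3) for k, v in metrics.items()})
```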
Development of a brief instrument to measure smartphone addiction among nursing students.
Cho, Sumi; Lee, Eunjoo
2015-05-01
Interruptions and distractions due to smartphone use in healthcare settings pose potential risks to patient safety. Therefore, it is important to assess smartphone use at work, to encourage nursing students to review their relevant behaviors, and to recognize these potential risks. This study's aim was to develop a scale to measure smartphone addiction and test its validity and reliability. We investigated nursing students' experiences of distractions caused by smartphones in the clinical setting and their opinions about smartphone use policies. Smartphone addiction and the need for a scale to measure it were identified through a literature review and in-depth interviews with nursing students. The scale showed reliability and validity in exploratory and confirmatory factor analyses. In testing the discriminant and convergent validity of the 18 selected items with four factors, the smartphone addiction model explained approximately 91% of the variance in the data (goodness-of-fit index = 0.909). Pearson correlation coefficients among addiction level, distractions in the clinical setting, and attitude toward policies on smartphone use were calculated. Addiction level and attitude toward policies of smartphone use were negatively correlated. This study suggests that healthcare organizations in Korea should create practical guidelines and policies for the appropriate use of smartphones in clinical practice.
Tulloch, Joanie; Vaillancourt, Régis; Irwin, Danica; Pascuet, Elena
2012-01-01
OBJECTIVES: To test, modify and validate a set of illustrations depicting different levels of asthma control and common asthma triggers in pediatric patients (and/or their parents) with chronic asthma who presented to the emergency department at the Children’s Hospital of Eastern Ontario, Ottawa, Ontario. METHODS: Semistructured interviews using guessability and translucency questionnaires tested the comprehensibility of 15 illustrations depicting different levels of asthma control and common asthma triggers in children 10 to 17 years of age, and parents of children one to nine years of age who presented to the emergency department. Illustrations with an overall guessability score <80% and/or translucency median score <6, were reviewed by the study team and modified by the study’s graphic designer. Modifications were made based on key concepts identified by study participants. RESULTS: A total of 80 patients were interviewed. Seven of the original 15 illustrations (47%) required modifications to obtain the prespecified guessability and translucency goals. CONCLUSION: The authors successfully developed, modified and validated a set of 15 illustrations representing different levels of asthma control and common asthma triggers. PRACTICE IMPLICATIONS: These illustrations will be incorporated into a child-friendly asthma action plan that enables the child to be involved in his or her asthma self-management care. PMID:22332128
Tulloch, Joanie; Irwin, Danica; Pascuet, Elena; Vaillancourt, Régis
2012-01-01
To test, modify and validate a set of illustrations depicting different levels of asthma control and common asthma triggers in pediatric patients (and/or their parents) with chronic asthma who presented to the emergency department at the Children's Hospital of Eastern Ontario, Ottawa, Ontario. Semistructured interviews using guessability and translucency questionnaires tested the comprehensibility of 15 illustrations depicting different levels of asthma control and common asthma triggers in children 10 to 17 years of age, and parents of children one to nine years of age who presented to the emergency department. Illustrations with an overall guessability score <80% and/or translucency median score <6 were reviewed by the study team and modified by the study's graphic designer. Modifications were made based on key concepts identified by study participants. A total of 80 patients were interviewed. Seven of the original 15 illustrations (47%) required modifications to obtain the prespecified guessability and translucency goals. The authors successfully developed, modified and validated a set of 15 illustrations representing different levels of asthma control and common asthma triggers. These illustrations will be incorporated into a child-friendly asthma action plan that enables the child to be involved in his or her asthma self-management care.
Chen, Hongda; Zucknick, Manuela; Werner, Simone; Knebel, Phillip; Brenner, Hermann
2015-07-15
Novel noninvasive blood-based screening tests are strongly desirable for early detection of colorectal cancer. We aimed to conduct a head-to-head comparison of the diagnostic performance of 92 plasma-based tumor-associated protein biomarkers for early detection of colorectal cancer in a true screening setting. Among all available 35 carriers of colorectal cancer and a representative sample of 54 men and women free of colorectal neoplasms recruited in a cohort of screening colonoscopy participants in 2005-2012 (N = 5,516), the plasma levels of 92 protein biomarkers were measured. ROC analyses were conducted to evaluate the diagnostic performance. A multimarker algorithm was developed through the Lasso logistic regression model and validated in an independent validation set. The .632+ bootstrap method was used to adjust for the potential overestimation of diagnostic performance. Seventeen protein markers were identified to show statistically significant differences in plasma levels between colorectal cancer cases and controls. The adjusted area under the ROC curves (AUC) of these 17 individual markers ranged from 0.55 to 0.70. An eight-marker classifier was constructed that increased the adjusted AUC to 0.77 [95% confidence interval (CI), 0.59-0.91]. When validating this algorithm in an independent validation set, the AUC was 0.76 (95% CI, 0.65-0.85), and sensitivities at cutoff levels yielding 80% and 90% specificities were 65% (95% CI, 41-80%) and 44% (95% CI, 24-72%), respectively. The identified profile of protein biomarkers could contribute to the development of a powerful multimarker blood-based test for early detection of colorectal cancer. ©2015 American Association for Cancer Research.
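The multimarker step described (a Lasso-penalized logistic regression over the candidate protein markers, then AUC evaluation in an independent validation set) can be sketched with scikit-learn; the marker values below are simulated, and the .632+ bootstrap correction used in the paper is not reproduced.

```python
# Sketch: L1-penalized (Lasso) logistic regression marker selection and
# AUC evaluation on an independent validation split (synthetic marker data).
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(89, 92))                       # 35 cases + 54 controls, 92 markers
y = np.r_[np.ones(35), np.zeros(54)].astype(int)
X[y == 1, :8] += 0.8                                # cases shifted on 8 "true" markers

X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3,
                                              random_state=0, stratify=y)
scaler = StandardScaler().fit(X_dev)

clf = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=5,
                           scoring="roc_auc", random_state=0)
clf.fit(scaler.transform(X_dev), y_dev)

selected = np.flatnonzero(clf.coef_[0])             # markers kept by the Lasso penalty
auc = roc_auc_score(y_val, clf.predict_proba(scaler.transform(X_val))[:, 1])
print(f"{selected.size} markers selected, validation AUC = {auc:.2f}")
```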
Data Mining of Historical Human Data to Assess the Risk of Injury due to Dynamic Loads
NASA Technical Reports Server (NTRS)
Wells, Jesica; Somers, Jeffrey T.; Newby, N.; Gernhardt, Michael
2014-01-01
The NASA Occupant Protection Group is charged with ensuring crewmembers are protected during all dynamic phases of spaceflight. Previous work with outside experts has led to the development of a definition of acceptable risk (DAR) for space capsule vehicles. The DAR defines allowable probability rates for various categories of injuries. An important question is how to validate these probabilities for a given vehicle. One approach is to impact-test human volunteers under projected nominal landing loads. The main drawback is the large number of subject tests required to attain a reasonable level of confidence that the injury probability rates would meet those outlined in the DAR. An alternative is to mine existing databases containing human responses to impact. Testing an anthropomorphic test device (ATD) at the same human-exposure levels could yield a range of ATD responses that would meet the DAR. As one aspect of future vehicle validation, the ATD could be tested in the vehicle's seat and suit configuration at nominal landing loads and compared with the ATD responses supported by the human data set. This approach could reduce the number of human-volunteer tests NASA would need to conduct to validate that a vehicle meets occupant protection standards. METHODS: The U.S. Air Force has recorded hundreds of human responses to frontal, lateral, and spinal impacts at many acceleration levels and pulse durations. All of these data are stored on the Collaborative Biomechanics Data Network (CBDN), which is maintained by Wright-Patterson Air Force Base (WPAFB). The test device for human occupant restraint (THOR) ATD was impact tested on WPAFB's horizontal impulse accelerator (HIA), matching human-volunteer exposures on the HIA in 5 frontal and 3 spinal loading conditions. No human injuries occurred as a result of these impact conditions. Peak THOR response variables for neck axial tension and compression and thoracic-spine axial compression were collected. Maximal chest deflection was determined from motion capture video of the impact test. HIC-15 and BRIC were calculated from head acceleration responses. Given the number of human subjects for each test condition, a confidence interval on injury probability will be obtained. RESULTS: Results will be discussed in terms of injury-risk probability estimates based on the human data set evaluated. Also, gaps in the data set will be identified. These gaps could be one of two types. One is areas where additional THOR testing would increase the comparable human data set, thereby improving confidence in the injury probability rate. The other is where additional human testing would assist in obtaining information on other acceleration levels or directions. DISCUSSION: The historical human data supported the validity of the THOR ATD for supplemental testing. The historical human data are limited in scope, however. Further data are needed to characterize the effects of sex, age, anthropometry, and deconditioning due to spaceflight on risk of injury.
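The confidence interval on injury probability mentioned in the methods can be obtained from the number of exposed subjects and the number of observed injuries using an exact (Clopper-Pearson) binomial interval; the counts below are illustrative only, not the study's data.

```python
# Sketch: exact (Clopper-Pearson) confidence interval on injury probability
# from an illustrative count of injuries among exposed human subjects.
from statsmodels.stats.proportion import proportion_confint

n_subjects = 40        # hypothetical subjects exposed at one impact condition
n_injuries = 0         # no injuries observed at this condition

low, high = proportion_confint(n_injuries, n_subjects, alpha=0.05, method="beta")
print(f"95% CI on injury probability: [{low:.3f}, {high:.3f}]")
# With zero injuries in 40 exposures, the upper bound is roughly 0.088,
# i.e. the data alone cannot rule out injury rates up to about 9%.
```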
Kong, X; Ma, L; Wu, L; Chen, H; Ma, L; Sun, Y; Wu, W; Ji, Z; Zhang, Z; Yang, C; Ye, S; Chen, S; Dai, S; Xue, Y; Qin, G; Zou, Y; Yu, Q; Jiang, L
2015-01-01
Takayasu arteritis (TA) is a chronic granulomatous large-vessel vasculitis. When diagnosing TA, the criteria designed by the American College of Rheumatology (ACR) are commonly used, but they are only classification criteria. There is an urgent need for a new set of diagnostic criteria. One hundred and thirty-one TA patients and 132 control patients with other types of vascular disease were enrolled, and both groups were divided into a "training set" and a "validation set". All general information as well as clinical, laboratory and imaging data were collected. After comparing all the medical records of the two groups in the training set, logistic regression and clinical judgment were used to form the new criteria for TA. The new criteria were tested in the validation set. The new TA diagnostic criteria, with a maximum total score of 26, include age (<40 years), female sex, chest pain/chest distress, amaurosis, vascular bruits, a decreased/absent pulse, involvement of the aortic arch or its major branches, and involvement of the abdominal aorta or its branches. Patients with a score ≥ 8 were diagnosed as having TA. The sensitivity and specificity of our new criteria were 91.92% and 93.94%, respectively, higher than those of the ACR criteria (75.76%, 85.86%) and the Ishikawa criteria (56.57%, 94.95%). The areas under the ROC curves of the new criteria and ACR criteria were 0.981 and 0.868, respectively (p<0.001). Sensitivity and specificity tested in the validation set were 90.63% and 96.97%, respectively. The new diagnostic criteria exhibited high sensitivity and specificity and were demonstrated to be feasible for the diagnosis of TA.
Stability and Change in Interests: A Longitudinal Study of Adolescents from Grades 8 through 12
ERIC Educational Resources Information Center
Tracey, Terence J. G.; Robbins, Steven B.; Hofsess, Christy D.
2005-01-01
The pattern of RIASEC interests and academic skills were assessed longitudinally from a large-scale national database at three time points: eight grade, 10th grade, and 12th grade. Validation and cross-validation samples of 1000 males and 1000 females in each set were used to test the pattern of these scores over time relative to mean changes,…
The CO₂ GAP Project--CO₂ GAP as a prognostic tool in emergency departments.
Shetty, Amith L; Lai, Kevin H; Byth, Karen
2010-12-01
To determine whether the CO₂ GAP [(a-ET) PCO₂] value differs consistently in patients presenting to the ED with shortness of breath who require ventilatory support, to determine a cut-off value of the CO₂ GAP that is consistently associated with the measured outcome, and to compare its performance against other derived variables. This prospective observational study was conducted in the ED on a convenience sample of 412 of 759 patients who underwent concurrent arterial blood gas and ETCO₂ (end-tidal CO₂) measurement. They were randomized to a test sample of 312 patients and a validation set of 100 patients. The primary outcome of interest was the need for ventilatory support, and secondary outcomes were admission to a high dependency unit or death during the ED stay. The randomly selected training set was used to select cut-points for the possible predictors, that is, CO₂ GAP, CO₂ gradient, physiologic dead space, and A-a gradient. The sensitivity, specificity, and predictive values of these predictors were validated in the test set of 100 patients. Analysis of the receiver operating characteristic curves revealed that the CO₂ GAP performed significantly better than the arterial-alveolar gradient in patients requiring ventilator support (area under the curve 0.950 vs 0.726). A CO₂ GAP ≥10 was associated with assisted ventilation outcomes when applied to the validation test set (100% sensitivity, 70% specificity). The CO₂ GAP [(a-ET) PCO₂] differs significantly in patients requiring assisted ventilation who present with shortness of breath to EDs, and further research addressing the prognostic value of the CO₂ GAP in this specific respect is required. © 2010 The Authors. EMA © 2010 Australasian College for Emergency Medicine and Australasian Society for Emergency Medicine.
Sivan, Sree Kanth; Manga, Vijjulatha
2012-02-01
Multiple receptor conformation docking (MRCD) and clustering of dock poses allow seamless incorporation of the receptor-bound conformations of the molecules across a wide range of ligands with varied structural scaffolds. The accuracy of the approach was tested on a set of 120 cyclic urea molecules having HIV-1 protease inhibitory activity, using 12 high-resolution X-ray crystal structures and one NMR-resolved conformation of HIV-1 protease extracted from the Protein Data Bank. A cross-validation was performed on 25 non-cyclic urea HIV-1 protease inhibitors having varied structures. The comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) models were generated using 60 molecules in the training set by applying the leave-one-out cross-validation method; r²(loo) values of 0.598 and 0.674 and non-cross-validated regression coefficient r² values of 0.983 and 0.985 were obtained for CoMFA and CoMSIA, respectively. The predictive ability of these models was determined using a test set of 60 cyclic urea molecules, which gave predictive correlations (r²(pred)) of 0.684 and 0.64 for CoMFA and CoMSIA, respectively, indicating good internal predictive ability. Based on this information, 25 non-cyclic urea molecules were taken as a test set to check the external predictive ability of these models. This gave a remarkable outcome, with r²(pred) of 0.61 and 0.53 for CoMFA and CoMSIA, respectively. The results invariably show that this method is useful for performing 3D QSAR analysis on molecules having different structural motifs.
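The leave-one-out q²/r²(loo) and external r²(pred) statistics quoted here can be illustrated with a partial least squares model, the regression engine conventionally used on CoMFA/CoMSIA fields; the "field" matrix below is synthetic and the split sizes only mirror the counts in the abstract.

```python
# Sketch: leave-one-out r2(loo) and external-test r2(pred) for a PLS model on
# synthetic "field" descriptors (stand-ins for CoMFA/CoMSIA columns).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict, train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(120, 300))                     # 120 molecules, 300 field points
y = X[:, :10].sum(axis=1) + rng.normal(scale=1.0, size=120)   # activity values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=60, random_state=0)

pls = PLSRegression(n_components=5)
y_loo = cross_val_predict(pls, X_train, y_train, cv=LeaveOneOut()).ravel()
r2_loo = r2_score(y_train, y_loo)                   # leave-one-out r2(loo)

pls.fit(X_train, y_train)
r2_pred = r2_score(y_test, pls.predict(X_test).ravel())   # external predictive r2
print(f"r2(loo) = {r2_loo:.2f}, r2(pred) = {r2_pred:.2f}")
```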
Guise, Brian J; Thompson, Matthew D; Greve, Kevin W; Bianchini, Kevin J; West, Laura
2014-03-01
The current study assessed performance validity on the Stroop Color and Word Test (Stroop) in mild traumatic brain injury (TBI) using criterion-groups validation. The sample consisted of 77 patients with a reported history of mild TBI. Data from 42 moderate-severe TBI and 75 non-head-injured patients with other clinical diagnoses were also examined. TBI patients were categorized on the basis of Slick, Sherman, and Iverson (1999) criteria for malingered neurocognitive dysfunction (MND). Classification accuracy is reported for three indicators (Word, Color, and Color-Word residual raw scores) from the Stroop across a range of injury severities. With false-positive rates set at approximately 5%, sensitivity was as high as 29%. The clinical implications of these findings are discussed. © 2012 The British Psychological Society.
Validity of the Miller forensic assessment of symptoms test in psychiatric inpatients.
Veazey, Connie H; Wagner, Alisha L; Hays, J Ray; Miller, Holly A
2005-06-01
This study investigated the validity of the Miller Forensic Assessment of Symptoms Test (M-FAST), a brief measure of malingering, in an inpatient psychiatric sample of 70. Among those patients who also completed the Personality Assessment Inventory (N=44), Total M-FAST score was related in the expected directions to the Personality Assessment Inventory validity scales and indexes, providing evidence for concurrent validity of the M-FAST. With the PAI malingering index used as a criterion, we examined the diagnostic efficiency of the M-FAST and found a cut score of 8 represented the best balance of sensitivity, specificity, positive predictive power, and negative predictive power. Based on this cut-score of 8, 16% of the population was classified as malingering. The M-FAST appears to be an excellent rapid screen for symptom exaggeration in this population and setting.
A validation procedure for a LADAR system radiometric simulation model
NASA Astrophysics Data System (ADS)
Leishman, Brad; Budge, Scott; Pack, Robert
2007-04-01
The USU LadarSIM software package is a ladar system engineering tool that has recently been enhanced to include modeling of the radiometry of ladar beam footprints. This paper will discuss our validation of the radiometric model and present a practical approach to future validation work. In order to validate complicated and interrelated factors affecting radiometry, a systematic approach had to be developed. Data for known parameters were first gathered, and unknown parameters of the system were then determined from simulation test scenarios. This was done in a way that isolated as many unknown variables as possible and then built on previously obtained results. First, the appropriate voltage threshold levels of the discrimination electronics were set by analyzing the number of false alarms seen in actual data sets. With this threshold set, the system noise was then adjusted to achieve the appropriate number of dropouts. Once a suitable noise level was found, the range errors of the simulated and actual data sets were compared and studied. Predicted errors in range measurements were analyzed using two methods: first by examining the range error of a surface with known reflectivity, and second by examining the range errors for specific detectors with known responsivities. This provided insight into the discrimination method and receiver electronics used in the actual system.
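The first step described, relating the discrimination threshold to an observed false-alarm count, can be illustrated under a simple Gaussian-noise assumption: the threshold sits at the noise quantile corresponding to the acceptable per-sample false-alarm probability. The numbers and the Gaussian model are assumptions for illustration, not LadarSIM's actual noise model or procedure.

```python
# Sketch: choose a voltage threshold so that Gaussian receiver noise exceeds it
# with a target per-sample false-alarm probability (illustrative numbers only).
from scipy.stats import norm

noise_mean = 0.0                      # volts, assumed noise statistics
noise_sigma = 0.05                    # volts
range_gates = 2000                    # noise samples examined per shot
false_alarms_per_shot = 0.01          # acceptable false alarms per shot

p_fa = false_alarms_per_shot / range_gates          # per-sample false-alarm probability
threshold = noise_mean + noise_sigma * norm.ppf(1.0 - p_fa)
print(f"per-sample Pfa = {p_fa:.1e}, threshold = {threshold:.3f} V")
```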
NASA Technical Reports Server (NTRS)
Baumeister, Joseph F.
1994-01-01
A non-flowing, electrically heated test rig was developed to verify computer codes that calculate radiant energy propagation from nozzle geometries that represent aircraft propulsion nozzle systems. Since there are a variety of analysis tools used to evaluate thermal radiation propagation from partially enclosed nozzle surfaces, an experimental benchmark test case was developed for code comparison. This paper briefly describes the nozzle test rig and the developed analytical nozzle geometry used to compare the experimental and predicted thermal radiation results. A major objective of this effort was to make available the experimental results and the analytical model in a format to facilitate conversion to existing computer code formats. For code validation purposes this nozzle geometry represents one validation case for one set of analysis conditions. Since each computer code has advantages and disadvantages based on scope, requirements, and desired accuracy, the usefulness of this single nozzle baseline validation case can be limited for some code comparisons.
Fatigue after stroke: the development and evaluation of a case definition.
Lynch, Joanna; Mead, Gillian; Greig, Carolyn; Young, Archie; Lewis, Susan; Sharpe, Michael
2007-11-01
While fatigue after stroke is a common problem, it has no generally accepted definition. Our aim was to develop a case definition for post-stroke fatigue and to test its psychometric properties. A case definition with face validity and an associated structured interview was constructed. After initial piloting, the feasibility, reliability (test-retest and inter-rater) and concurrent validity (in relation to four fatigue severity scales) were determined in 55 patients with stroke. All participating patients provided satisfactory answers to all the case definition probe questions, demonstrating its feasibility. For test-retest reliability, kappa was 0.78 (95% CI, 0.57-0.94, P<.01), and for inter-rater reliability kappa was 0.80 (95% CI, 0.62-0.99, P<.01). Patients fulfilling the case definition also had substantially higher fatigue scores on four fatigue severity scales (P<.001), indicating concurrent validity. The proposed case definition is feasible to administer and reliable in practice, and there is evidence of concurrent validity. It requires further evaluation in different settings.
Criteria for a State-of-the-Art Vision Test System
1985-05-01
tests are enumerated for possible inclusion in a battery of candidate vision tests to be statistically examined for validity as predictors of aircrew...derived subset thereof) of vision tests may be given to a series of individuals, and statistical tests may be used to determine which visual functions...no target. Statistical analysis of the responses would set a threshold level, which would define the smallest size - (most distant target) or least
Constrained structural dynamic model verification using free vehicle suspension testing methods
NASA Technical Reports Server (NTRS)
Blair, Mark A.; Vadlamudi, Nagarjuna
1988-01-01
Verification of the validity of a spacecraft's structural dynamic math model used in computing ascent (or in the case of the STS, ascent and landing) loads is mandatory. This verification process requires that tests be carried out on both the payload and the math model such that the ensuing correlation may validate the flight loads calculations. To properly achieve this goal, the tests should be performed with the payload in the launch constraint (i.e., held fixed at only the payload-booster interface DOFs). The practical achievement of this set of boundary conditions is quite difficult, especially with larger payloads, such as the 12-ton Hubble Space Telescope. The development of equations in the paper will show that by exciting the payload at its booster interface while it is suspended in the 'free-free' state, a set of transfer functions can be produced that will have minima that are directly related to the fundamental modes of the payload when it is constrained in its launch configuration.
Wang, Yi-Wen; Tsai, Yun-Fang; Lee, Shwu-Hua; Chen, Ying-Jen; Chen, Hsiu-Fang
2016-07-01
To develop and psychometrically test the Protective Reasons against Suicide Inventory among older Chinese-speaking outpatients. Tools currently exist to test reasons for living among individuals of all ages in western countries, but few are available to assess older adults' protective reasons against suicide in Asia. A cross-sectional survey to investigate protective reasons against suicide among older Chinese-speaking outpatients. The Protective Reasons against Suicide Inventory was developed based on individual interviews with 83 older outpatients in Taiwan, the literature and the authors' clinical experiences. The resulting Inventory was examined in 2013 for content validity, face validity, construct validity, criterion-related validity, internal consistency reliability and test-retest reliability. The Inventory had excellent content validity and face validity. Factor analysis yielded a seven-factor solution, accounting for 87·7% of the variance. Scores on the global Inventory and its subscales tended to be higher in outpatients diagnosed without suicidal ideation than in outpatients diagnosed with suicidal ideation, indicating good criterion validity. Inventory reliability and the intraclass correlation coefficient were satisfactory. The Protective Reasons against Suicide Inventory can be completed in 5 minutes and is perceived as easy to complete. Moreover, the Inventory yielded highly acceptable parameters for validity and reliability. The Protective Reasons against Suicide Inventory can be used to assess older Chinese-speaking outpatients for factors that protect them from attempting suicide. © 2016 John Wiley & Sons Ltd.
NASA Technical Reports Server (NTRS)
Adler, R. F.; Gu, G.; Curtis, S.; Huffman, G. J.; Bolvin, D. T.; Nelkin, E. J.
2005-01-01
The Global Precipitation Climatology Project (GPCP) 25-year precipitation data set is used to evaluate variability and extremes on global and regional scales. Year-to-year variability of precipitation is evaluated in relation to the overall lack of a significant global trend and to climate events such as ENSO and volcanic eruptions. The validity of conclusions and limitations of the data set are checked by comparison with independent data sets (e.g., TRMM). The GPCP data set necessarily has a heterogeneous time series of input data sources, so part of the assessment described above is to test the initial results for potential influence by major data boundaries in the record. Regional trends, or inter-decadal changes, are also analyzed to determine validity and correlation with other long-term data sets related to the hydrological cycle (e.g., clouds and ocean surface fluxes). Statistics of extremes (both wet and dry) are analyzed at the monthly time scale for the 25 years. A preliminary result of an increasing frequency of extreme monthly values will be a focus of the validity assessment. Daily values for an eight-year period are also examined for variation in extremes and compared with the longer monthly-based study.
A new IRT-based standard setting method: application to eCat-listening.
García, Pablo Eduardo; Abad, Francisco José; Olea, Julio; Aguado, David
2013-01-01
Criterion-referenced interpretations of tests are highly necessary, and they usually involve the difficult task of establishing cut scores. In contrast with other Item Response Theory (IRT)-based standard setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores. eCat-Listening, a computerized adaptive test for the evaluation of English Listening, was administered to 1,576 participants, and the proposed standard setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR). The results showed a classification closely related to relevant external measures of the English language domain, according to the CEFR. It is concluded that the proposed method is a practical and valid standard setting alternative for IRT-based test interpretations.
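The item characteristic curves on which such transformations operate follow the standard IRT logistic form; a three-parameter version is sketched below with illustrative parameter values that are not taken from the study.

```python
# Sketch: a standard 3-parameter logistic item characteristic curve,
# P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b))), with illustrative parameters.
import numpy as np

def icc_3pl(theta, a=1.2, b=0.0, c=0.2):
    """Probability of a correct response at ability level theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
for t, p in zip(theta, icc_3pl(theta)):
    print(f"theta={t:+.1f}  P(correct)={p:.2f}")
```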
Development and evaluation of the Korean Health Literacy Instrument.
Kang, Soo Jin; Lee, Tae Wha; Paasche-Orlow, Michael K; Kim, Gwang Suk; Won, Hee Kwan
2014-01-01
The purpose of this study is to develop and validate the Korean Health Literacy Instrument, which measures the capacity to understand and use health-related information and make informed health decisions in Korean adults. In Phase 1, 33 initial items were generated to measure functional, interactive, and critical health literacy with prose, document, and numeracy tasks. These items included content from health promotion, disease management, and health navigation contexts. Content validity assessment was conducted by an expert panel, and 11 items were excluded. In Phase 2, the 22 remaining items were administered to a convenience sample of 292 adults from community and clinical settings. Exploratory factor and item difficulty and discrimination analyses were conducted and four items with low discrimination were deleted. In Phase 3, the remaining 18 items were administered to a convenience sample of 315 adults 40-64 years of age from community and clinical settings. A confirmatory factor analysis was performed to test the construct validity of the instrument. The Korean Health Literacy Instrument has a range of 0 to 18. The mean score in our validation study was 11.98. The instrument exhibited an internal consistency reliability coefficient of 0.82, and a test-retest reliability of 0.89. The instrument is suitable for screening individuals who have limited health literacy skills. Future studies are needed to further define the psychometric properties and predictive validity of the Korean Health Literacy Instrument.
Reliability and validity of the adolescent health profile-types.
Riley, A W; Forrest, C B; Starfield, B; Green, B; Kang, M; Ensminger, M
1998-08-01
The purpose of this study was to demonstrate the preliminary reliability and validity of a set of 13 profiles of adolescent health that describe distinct patterns of health and health service requirements on four domains of health. Reliability and validity were tested in four ethnically diverse population samples of urban and rural youths aged 11 to 17 years in public schools (N = 4,066). The reliability of the classification procedure and construct validity were examined in terms of the predicted and actual distributions of age, gender, race, socioeconomic status, and family type. School achievement, medical conditions, and the proportion of youths with a psychiatric disorder also were examined as tests of construct validity. The classification method was shown to produce consistent results across the four populations in terms of proportions of youths assigned with specific sociodemographic characteristics. Variations in health described by specific profiles showed expected relations to sociodemographic characteristics, family structure, school achievement, medical disorders, and psychiatric disorders. This taxonomy of health profile-types appears to effectively describe a set of patterns that characterize adolescent health. The profile-types provide a unique and practical method for identifying subgroups having distinct needs for health services, with potential utility for health policy and planning. Such integrative reporting methods are critical for more effective utilization of health status instruments in health resource planning and policy development.
NASA Astrophysics Data System (ADS)
Kennedy, Joseph H.; Bennett, Andrew R.; Evans, Katherine J.; Price, Stephen; Hoffman, Matthew; Lipscomb, William H.; Fyke, Jeremy; Vargo, Lauren; Boghozian, Adrianna; Norman, Matthew; Worley, Patrick H.
2017-06-01
To address the pressing need to better understand the behavior and complex interaction of ice sheets within the global Earth system, significant development of continental-scale, dynamical ice sheet models is underway. Concurrent to the development of the Community Ice Sheet Model (CISM), the corresponding verification and validation (V&V) process is being coordinated through a new, robust, Python-based extensible software package, the Land Ice Verification and Validation toolkit (LIVVkit). Incorporated into the typical ice sheet model development cycle, it provides robust and automated numerical verification, software verification, performance validation, and physical validation analyses on a variety of platforms, from personal laptops to the largest supercomputers. LIVVkit operates on sets of regression test and reference data sets, and provides comparisons for a suite of community prioritized tests, including configuration and parameter variations, bit-for-bit evaluation, and plots of model variables to indicate where differences occur. LIVVkit also provides an easily extensible framework to incorporate and analyze results of new intercomparison projects, new observation data, and new computing platforms. LIVVkit is designed for quick adaptation to additional ice sheet models via abstraction of model specific code, functions, and configurations into an ice sheet model description bundle outside the main LIVVkit structure. Ultimately, through shareable and accessible analysis output, LIVVkit is intended to help developers build confidence in their models and enhance the credibility of ice sheet models overall.
Gonzales, J L; Loza, A; Chacon, E
2006-03-15
Several T. vivax-specific primers have been developed for PCR diagnosis. Most of these primers were validated under different DNA extraction methods and study designs, leading to heterogeneity of results. The objective of the present study was to validate PCR as a diagnostic test for T. vivax trypanosomosis by determining the test sensitivity of different published specific primers with different sample preparations. Four different DNA extraction methods were used to test the sensitivity of PCR with four different primer sets. DNA was extracted directly from whole blood samples, blood dried on filter papers or blood dried on FTA cards. The results showed that the sensitivity of PCR with each primer set was highly dependent on the sample preparation and DNA extraction method. The highest sensitivities for all the primers tested were obtained using DNA extracted from whole blood samples, while the lowest sensitivities were obtained when DNA was extracted from filter paper preparations. To conclude, the results are discussed and a protocol for diagnosis and surveillance of T. vivax trypanosomosis is recommended.
Thorne, M C; Degnan, P; Ewen, J; Parkin, G
2000-12-01
The physically based river catchment modelling system SHETRAN incorporates components representing water flow, sediment transport and radionuclide transport both in solution and bound to sediments. The system has been applied to simulate hypothetical future catchments in the context of post-closure radiological safety assessments of a potential site for a deep geological disposal facility for intermediate and certain low-level radioactive wastes at Sellafield, west Cumbria. In order to have confidence in the application of SHETRAN for this purpose, various blind validation studies have been undertaken. In earlier studies, the validation was undertaken against uncertainty bounds in model output predictions set by the modelling team on the basis of how well they expected the model to perform. However, validation can also be carried out with bounds set on the basis of how well the model is required to perform in order to constitute a useful assessment tool. Herein, such an assessment-based validation exercise is reported. This exercise related to a field plot experiment conducted at Calder Hollow, west Cumbria, in which the migration of strontium and lanthanum in subsurface Quaternary deposits was studied on a length scale of a few metres. Blind predictions of tracer migration were compared with experimental results using bounds set by a small group of assessment experts independent of the modelling team. Overall, the SHETRAN system performed well, failing only two out of seven of the imposed tests. Furthermore, of the five tests that were not failed, three were positively passed even when a pessimistic view was taken as to how measurement errors should be taken into account. It is concluded that the SHETRAN system, which is still being developed further, is a powerful tool for application in post-closure radiological safety assessments.
Spoorenberg, Sophie L W; Reijneveld, Sijmen A; Middel, Berrie; Uittenbroek, Ronald J; Kremer, Hubertus P H; Wynia, Klaske
2015-01-01
The aim of the present study was to develop a valid Geriatric ICF Core Set reflecting relevant health-related problems of community-living older adults without dementia. A Delphi study was performed in order to reach consensus (≥70% agreement) on second-level categories from the International Classification of Functioning, Disability and Health (ICF). The Delphi panel comprised 41 older adults, medical and non-medical experts. Content validity of the set was tested in a cross-sectional study including 267 older adults identified as frail or having complex care needs. Consensus was reached for 30 ICF categories in the Delphi study (fourteen Body functions, ten Activities and Participation and six Environmental Factors categories). Content validity of the set was high: the prevalence of all the problems was >10%, except for d530 Toileting. The most frequently reported problems were b710 Mobility of joint functions (70%), b152 Emotional functions (65%) and b455 Exercise tolerance functions (62%). No categories had missing values. The final Geriatric ICF Core Set is a comprehensive and valid set of 29 ICF categories, reflecting the most relevant health-related problems among community-living older adults without dementia. This Core Set may contribute to optimal care provision and support of the older population. Implications for Rehabilitation The Geriatric ICF Core Set may provide a practical tool for gaining an understanding of the relevant health-related problems of community-living older adults without dementia. The Geriatric ICF Core Set may be used in primary care practice as an assessment tool in order to tailor care and support to the needs of older adults. The Geriatric ICF Core Set may be suitable for use in multidisciplinary teams in integrated care settings, since it is based on a broad range of problems in functioning. Professionals should pay special attention to health problems related to mobility and emotional functioning since these are the most prevalent problems in community-living older adults.
Burkey, Matthew D.; Ghimire, Lajina; Adhikari, Ramesh P.; Kohrt, Brandon A.; Jordans, Mark J. D.; Haroz, Emily; Wissow, Lawrence
2017-01-01
Systematic processes are needed to develop valid measurement instruments for disruptive behavior disorders (DBDs) in cross-cultural settings. We employed a four-step process in Nepal to identify and select items for a culturally valid assessment instrument: 1) We extracted items from validated scales and local free-list interviews. 2) Parents, teachers, and peers (n=30) rated the perceived relevance and importance of behavior problems. 3) Highly rated items were piloted with children (n=60) in Nepal. 4) We evaluated internal consistency of the final scale. We identified 49 symptoms from 11 scales, and 39 behavior problems from free-list interviews (n=72). After dropping items for low ratings of relevance and severity and for poor item-test correlation, low frequency, and/or poor acceptability in pilot testing, 16 items remained for the Disruptive Behavior International Scale—Nepali version (DBIS-N). The final scale had good internal consistency (α=0.86). A 4-step systematic approach to scale development including local participation yielded an internally consistent scale that included culturally relevant behavior problems. PMID:28093575
ERIC Educational Resources Information Center
Muchinsky, Paul M.
1975-01-01
A work sample test can provide a high degree of content validity, and offers a practical method of screening job applicants in accordance with guidelines on employee selection procedures set forth by the Equal Employment Opportunity Commission. (MW)
Laibhen-Parkes, Natasha; Kimble, Laura P; Melnyk, Bernadette Mazurek; Sudia, Tanya; Codone, Susan
2018-06-01
Instruments used to assess evidence-based practice (EBP) competence in nurses have been subjective, unreliable, or invalid. The Fresno test was identified as the only instrument to measure all the steps of EBP with supportive reliability and validity data. However, the items and psychometric properties of the original Fresno test are only relevant to measure EBP with medical residents. Therefore, the purpose of this paper is to describe the development of the adapted Fresno test for pediatric nurses, and provide preliminary validity and reliability data for its use with Bachelor of Science in Nursing-prepared pediatric bedside nurses. General adaptations were made to the original instrument's case studies, item content, wording, and format to meet the needs of a pediatric nursing sample. The scoring rubric was also modified to complement changes made to the instrument. Content and face validity, and intrarater reliability of the adapted Fresno test were assessed during a mixed-methods pilot study conducted from October to December 2013 with 29 Bachelor of Science in Nursing-prepared pediatric nurses. Validity data provided evidence for good content and face validity. Intrarater reliability estimates were high. The adapted Fresno test presented here appears to be a valid and reliable assessment of EBP competence in Bachelor of Science in Nursing-prepared pediatric nurses. However, further testing of this instrument is warranted using a larger sample of pediatric nurses in diverse settings. This instrument can be a starting point for evaluating the impact of EBP competence on patient outcomes. © 2018 Sigma Theta Tau International.
Glavatskikh, Marta; Madzhidov, Timur; Solov'ev, Vitaly; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre
2016-12-01
In this work, we report QSPR modeling of the free energy ΔG of 1 : 1 hydrogen bond complexes of different H-bond acceptors and donors. The modeling was performed on a large and structurally diverse set of 3373 complexes featuring a single hydrogen bond, for which ΔG was measured at 298 K in CCl4. The models were prepared using Support Vector Machine and Multiple Linear Regression, with ISIDA fragment descriptors. The marked atoms strategy was applied at fragmentation stage, in order to capture the location of H-bond donor and acceptor centers. Different strategies of model validation have been suggested, including the targeted omission of individual H-bond acceptors and donors from the training set, in order to check whether the predictive ability of the model is not limited to the interpolation of H-bond strength between two already encountered partners. Successfully cross-validating individual models were combined into a consensus model, and challenged to predict external test sets of 629 and 12 complexes, in which donor and acceptor formed single and cooperative H-bonds, respectively. In all cases, SVM models outperform MLR. The SVM consensus model performs well both in 3-fold cross-validation (RMSE=1.50 kJ/mol), and on the external test sets containing complexes with single (RMSE=3.20 kJ/mol) and cooperative H-bonds (RMSE=1.63 kJ/mol). © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
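The consensus-modelling workflow described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: it assumes placeholder descriptors in place of the ISIDA fragment counts and uses scikit-learn's SVR and linear regression with 3-fold cross-validation, averaging the two models' cross-validated predictions as a simple consensus.

```python
# Hedged sketch: placeholder descriptor matrix X and target y stand in for the
# ISIDA fragment counts and the measured Delta-G values (kJ/mol).
import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(3373, 50))                               # stand-in descriptors
y = X[:, :5].sum(axis=1) + rng.normal(scale=1.0, size=3373)   # stand-in Delta-G

cv = KFold(n_splits=3, shuffle=True, random_state=0)
models = [SVR(kernel="rbf", C=10.0), LinearRegression()]
preds = [cross_val_predict(m, X, y, cv=cv) for m in models]

for name, p in zip(["SVM", "MLR"], preds):
    print(name, "CV RMSE:", round(float(np.sqrt(mean_squared_error(y, p))), 2))

# A simple consensus: average the cross-validated predictions of the individual models.
consensus = np.mean(preds, axis=0)
print("consensus CV RMSE:", round(float(np.sqrt(mean_squared_error(y, consensus))), 2))
```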
Sleeper, Mark D; Kenyon, Lisa K; Elliott, James M; Cheng, M Samuel
2016-12-01
Despite the availability of various field-tests for many competitive sports, a reliable and valid test specifically developed for use in men's gymnastics has not yet been developed. The Men's Gymnastics Functional Measurement Tool (MGFMT) was designed to assess sport-specific physical abilities in male competitive gymnasts. The purpose of this study was to develop the MGFMT by establishing a scoring system for individual test items and to initiate the process of establishing test-retest reliability and construct validity. A total of 83 competitive male gymnasts ages 7-18 underwent testing using the MGFMT. Thirty of these subjects underwent re-testing one week later in order to assess test-retest reliability. Construct validity was assessed using a simple regression analysis between total MGFMT scores and the gymnasts' USA-Gymnastics competitive level to calculate the coefficient of determination (r²). Test-retest reliability was analyzed using Model 1 Intraclass correlation coefficients (ICC). Statistical significance was set at the p<0.05 level. The relationship between total MGFMT scores and subjects' current USA-Gymnastics competitive level was found to be good (r² = 0.63). Reliability testing of the MGFMT composite test score showed excellent test-retest reliability over a one-week period (ICC = 0.97). Test-retest reliability of the individual component tests ranged from good to excellent (ICC = 0.75-0.97). The results of this study provide initial support for the construct validity and test-retest reliability of the MGFMT. Level 3.
Validation of 2D flood models with insurance claims
NASA Astrophysics Data System (ADS)
Zischg, Andreas Paul; Mosimann, Markus; Bernet, Daniel Benjamin; Röthlisberger, Veronika
2018-02-01
Flood impact modelling requires reliable models for the simulation of flood processes. In recent years, flood inundation models have been remarkably improved and widely used for flood hazard simulation, flood exposure and loss analyses. In this study, we validate a 2D inundation model for the purpose of flood exposure analysis at the river reach scale. We validate the BASEMENT simulation model with insurance claims using conventional validation metrics. The flood model is established on the basis of available topographic data at a high spatial resolution for four test cases. The validation metrics were calculated with two different datasets: a dataset of event documentations reporting flooded areas and a dataset of insurance claims. In three out of four test cases, the model fit relating to insurance claims is slightly lower than the model fit computed on the basis of the observed inundation areas. This comparison between two independent validation data sets suggests that validation metrics using insurance claims can be compared to conventional validation data, such as the flooded area. However, a validation on the basis of insurance claims might be more conservative in cases where model errors are more pronounced in areas with a high density of values at risk.
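A sketch of the kind of conventional validation metric used to compare simulated and observed flood extents is given below. It is illustrative only: the abstract does not name the exact metrics, so the snippet assumes a common binary inundation-fit index computed on a shared raster grid rather than the study's BASEMENT output or claims data.

```python
# Hedged sketch: a common binary inundation-fit metric (sometimes called F or CSI),
# computed from a simulated and an observed wet/dry map on the same grid.
import numpy as np

def fit_index(simulated: np.ndarray, observed: np.ndarray) -> float:
    """F = hits / (hits + misses + false alarms) for boolean wet/dry grids."""
    sim = simulated.astype(bool)
    obs = observed.astype(bool)
    hits = np.sum(sim & obs)
    misses = np.sum(~sim & obs)
    false_alarms = np.sum(sim & ~obs)
    return hits / (hits + misses + false_alarms)

# toy 1D example standing in for a raster of cells
sim = np.array([1, 1, 1, 0, 0, 1])
obs = np.array([1, 1, 0, 0, 1, 1])
print(round(fit_index(sim, obs), 2))  # 0.6
```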
D'Amato, Christopher P; Denney, Robert L
2008-09-01
The purpose of this study was to use a known-group research design to evaluate the diagnostic utility of the Rarely Missed Index (RMI) of the Wechsler Memory Scale-Third Edition [Wechsler, D. (1997). Wechsler Adult Intelligence Test-3rd Edition. San Antonio, TX: The Psychological Corporation] in assessing response bias in an adult male incarcerated setting. Archival data from a sample of 60 adult male inmates who presented for neuropsychological testing were reviewed. Evaluees were assigned to one of two groups: probable malingerers (PM; n=30) and valid test responders (n=30). Using the recommended cut-off score of 136 or less, the sensitivity of the RMI was extremely low at 33%, and its specificity was 83%. The positive predictive power of the RMI at the published base rate of 22.8% was 38%, with a negative predictive power of 81%. The positive predictive power of the RMI at a published base rate of 70.5% was 82%, and the negative predictive power at that base rate was 34%. Results of the receiver operating characteristic (ROC) analysis indicated that the RMI score with a cut-off of 136 or less performed only slightly better than chance in delineating probable malingerers from valid responders in this setting. Overall, the findings suggest that the RMI may not be a reliable index for detecting response bias in this setting and perhaps in similar settings.
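The predictive-power figures above follow from Bayes' rule applied to the reported sensitivity, specificity, and base rates. The short sketch below reproduces that arithmetic; because the inputs are rounded, the outputs only approximate the published values.

```python
# Hedged sketch: PPV and NPV from the reported sensitivity (0.33), specificity (0.83),
# and the two published base rates; values are approximations of those in the abstract.
def ppv_npv(sensitivity: float, specificity: float, base_rate: float) -> tuple[float, float]:
    ppv = (sensitivity * base_rate) / (
        sensitivity * base_rate + (1 - specificity) * (1 - base_rate))
    npv = (specificity * (1 - base_rate)) / (
        specificity * (1 - base_rate) + (1 - sensitivity) * base_rate)
    return ppv, npv

for rate in (0.228, 0.705):
    ppv, npv = ppv_npv(0.33, 0.83, rate)
    print(f"base rate {rate:.3f}: PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")
# base rate 0.228: PPV ~ 0.36, NPV ~ 0.81
# base rate 0.705: PPV ~ 0.82, NPV ~ 0.34
```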
Sie, Daoud; Snijders, Peter J F; Meijer, Gerrit A; Doeleman, Marije W; van Moorsel, Marinda I H; van Essen, Hendrik F; Eijk, Paul P; Grünberg, Katrien; van Grieken, Nicole C T; Thunnissen, Erik; Verheul, Henk M; Smit, Egbert F; Ylstra, Bauke; Heideman, Daniëlle A M
2014-10-01
Next generation DNA sequencing (NGS) holds promise for diagnostic applications, yet implementation in routine molecular pathology practice requires performance evaluation on DNA derived from routine formalin-fixed paraffin-embedded (FFPE) tissue specimens. The current study presents a comprehensive analysis of TruSeq Amplicon Cancer Panel-based NGS using a MiSeq Personal sequencer (TSACP-MiSeq-NGS) for somatic mutation profiling. TSACP-MiSeq-NGS (testing 212 hotspot mutation amplicons of 48 genes) and a data analysis pipeline were evaluated in a retrospective learning/test set approach (n = 58/n = 45 FFPE-tumor DNA samples) against 'gold standard' high-resolution-melting (HRM)-sequencing for the genes KRAS, EGFR, BRAF and PIK3CA. Next, the performance of the validated test algorithm was assessed in an independent, prospective cohort of FFPE-tumor DNA samples (n = 75). In the learning set, a number of minimum parameter settings was defined to decide whether a FFPE-DNA sample is qualified for TSACP-MiSeq-NGS and for calling mutations. The resulting test algorithm revealed 82% (37/45) compliance to the quality criteria and 95% (35/37) concordant assay findings for KRAS, EGFR, BRAF and PIK3CA with HRM-sequencing (kappa = 0.92; 95% CI = 0.81-1.03) in the test set. Subsequent application of the validated test algorithm to the prospective cohort yielded a success rate of 84% (63/75), and a high concordance with HRM-sequencing (95% (60/63); kappa = 0.92; 95% CI = 0.84-1.01). TSACP-MiSeq-NGS detected 77 mutations in 29 additional genes. TSACP-MiSeq-NGS is suitable for diagnostic gene mutation profiling in oncopathology.
González-Mancebo, E; Alonso Díaz de Durana, M D; García Estringana, Y; Meléndez Baltanás, A; Rodriguez-Alvarez, M; de la Hoz Caballer, B; Del Prado, N; Fernández-Rivas, M
The double-blind, placebo-controlled food challenge (DBPCFC) is considered the definitive diagnostic test for food allergy. Nevertheless, validated recipes for masking the foods are scarce, have not been standardized, and differ between centers. Sensory evaluation techniques such as the triangle test are necessary to validate the recipes used for DBPCFC. We developed 3 recipes for use in DBPCFC with milk, egg white, and hazelnut and used the triangle test to validate them in a 2-phase study in which 197 volunteers participated. In each phase, participants tried 3 samples (2 active-1 placebo or 2 placebo-1 active) and had to identify the odd one. In phase 1, the 3 samples were given simultaneously, whereas in phase 2, the 3 samples of foods that failed validation in phase 1 were given sequentially. A visual analog scale (VAS) ranging from 1 to 10 was used to evaluate how much participants liked the recipes. In phase 1, the egg white recipe was validated (n=89 volunteers, 38.9% found the odd sample, P=.16). Milk and hazelnut recipes were validated in phase 2 (for both foods, n=30 participants, 36.7% found the odd sample, P=.36). Median VAS scores for the 3 recipes ranged from 6.6 to 9.7. We used sensory testing to validate milk, egg white, and hazelnut recipes for use in DBPCFC. The validated recipes are easy to prepare in a clinical setting, provide the equivalent of 1 serving dose, and were liked by most participants.
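The triangle test compares the number of correct identifications of the odd sample against the 1/3 chance level with a binomial test. The sketch below illustrates that calculation with SciPy; the count is approximated from the percentage reported above, and the study's exact test convention (one- or two-sided) is not specified here, so the p-value is not expected to match the published figure exactly.

```python
# Hedged sketch: binomial triangle-test comparison against the 1/3 chance level.
from scipy.stats import binomtest

# phase 1, egg white recipe: roughly 38.9% of 89 volunteers identified the odd sample
n = 89
correct = round(0.389 * n)   # approximate count reconstructed from the reported percentage
result = binomtest(correct, n, p=1/3, alternative="greater")
print(correct, "of", n, "correct; one-sided p =", round(result.pvalue, 3))
```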
Bleau Lavigne, Maude; Reeves, Isabelle; Sasseville, Marie-Josée; Loignon, Christine
The primary purpose of this study was to develop 2 survey tools to explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in primary health care settings. One survey was intended for patients receiving care for a diabetic foot ulcer in primary health care settings and the other was intended for the health professionals providing treatment. The second purpose of this study was to evaluate the psychometric properties of the 2 surveys. Development and validation of survey instruments. Two surveys were developed using a published guide. Following review of pertinent literature and identification of variables to be measured, a bank of items was developed and pretested to determine clarity of the items and responses. Psychometric testing comprised measurement of the content validity index (CVI) and intraclass correlation coefficient (ICC). Only items obtaining satisfactory CVI and ICC scores were included in the final version of the surveys. The final version of the patient survey contained 41 items and the final version of the survey for health care professionals contained 21 items. The patient-intended survey's items demonstrated high content validity scores and satisfactory test-retest reliability scores: the overall CVI score was 0.98, and 40 of the 49 items eligible for testing obtained satisfactory ICC scores. One item's test-retest reliability could not be tested, but it was retained based on its high CVI. The health professional-intended survey had an overall CVI score of 0.91, but its items had lower ICC scores: 63% (31 of the 49 items) did not achieve a satisfactory ICC score for inclusion in the final instrument. This project led to development of 2 instruments designed to identify and explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in the primary health care setting. Future research and testing are required to translate these French surveys into English and additional languages, in order to reach a broader population.
Analysis of near infrared spectra for age-grading of wild populations of Anopheles gambiae.
Krajacich, Benjamin J; Meyers, Jacob I; Alout, Haoues; Dabiré, Roch K; Dowell, Floyd E; Foy, Brian D
2017-11-07
Understanding the age-structure of mosquito populations, especially malaria vectors such as Anopheles gambiae, is important for assessing the risk of infectious mosquitoes, and how vector control interventions may impact this risk. The use of near-infrared spectroscopy (NIRS) for age-grading has been demonstrated previously on laboratory and semi-field mosquitoes, but to date has not been utilized on wild-caught mosquitoes whose age is externally validated via parity status or parasite infection stage. In this study, we developed regression and classification models using NIRS on datasets of wild An. gambiae (s.l.) reared from larvae collected from the field in Burkina Faso, and two laboratory strains. We compared the accuracy of these models for predicting the ages of wild-caught mosquitoes that had been scored for their parity status as well as for positivity for Plasmodium sporozoites. Regression models utilizing variable selection increased predictive accuracy over the more common full-spectrum partial least squares (PLS) approach for cross-validation of the datasets, validation, and independent test sets. Models produced from datasets that included the greatest range of mosquito samples (i.e. different sampling locations and times) had the highest predictive accuracy on independent testing sets, though overall accuracy on these samples was low. For classification, we found that intramodel accuracy ranged between 73.5-97.0% for grouping of mosquitoes into "early" and "late" age classes, with the highest prediction accuracy found in laboratory colonized mosquitoes. However, this accuracy was decreased on test sets, with the highest classification of an independent set of wild-caught larvae reared to set ages being 69.6%. Variation in NIRS data, likely from dietary, genetic, and other factors limits the accuracy of this technique with wild-caught mosquitoes. Alternative algorithms may help improve prediction accuracy, but care should be taken to either maximize variety in models or minimize confounders.
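As a rough illustration of the full-spectrum PLS approach mentioned above, the snippet below fits a cross-validated PLS regression of age on spectra. It is a sketch under assumed placeholder data, not the study's pipeline or its variable-selection models.

```python
# Hedged sketch: full-spectrum partial least squares (PLS) regression of mosquito age
# on NIR spectra with cross-validation; the spectra and ages are simulated stand-ins.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(1)
spectra = rng.normal(size=(200, 350))      # stand-in for NIR absorbance spectra
age_days = rng.uniform(1, 15, size=200)    # stand-in for known mosquito ages

pls = PLSRegression(n_components=10)
pred = cross_val_predict(pls, spectra, age_days,
                         cv=KFold(5, shuffle=True, random_state=1))
rmse = float(np.sqrt(np.mean((pred.ravel() - age_days) ** 2)))
print("cross-validated RMSE (days):", round(rmse, 2))

# For "early"/"late" classification, the predicted age can simply be thresholded
# (e.g. at 7 days) and accuracy computed against the known class labels.
```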
Innes, Carrie R H; Jones, Richard D; Anderson, Tim J; Hollobon, Susan G; Dalrymple-Alford, John C
2009-05-01
Currently, there is no international standard for the assessment of fitness to drive for cognitively or physically impaired persons. A computerized battery of driving-related sensory-motor and cognitive tests (SMCTests) has been developed, comprising tests of visuoperception, visuomotor ability, complex attention, visual search, decision making, impulse control, planning, and divided attention. Construct validity analysis was conducted in 60 normal, healthy subjects and showed that, overall, the novel cognitive tests assessed cognitive functions similar to a set of standard neuropsychological tests. The novel tests were found to have greater perceived face validity for predicting on-road driving ability than was found in the equivalent standard tests. Test-retest stability and reliability of SMCTests measures, as well as correlations between SMCTests and on-road driving, were determined in a subset of 12 subjects. The majority of test measures were stable and reliable across two sessions, and significant correlations were found between on-road driving scores and measures from ballistic movement, footbrake reaction, hand-control reaction, and complex attention. The substantial face validity, construct validity, stability, and reliability of SMCTests, together with the battery's level of correlation with on-road driving in normal subjects, strengthen our confidence in the ability of SMCTests to detect and identify sensory-motor and cognitive deficits related to unsafe driving and increased risk of accidents.
Trouli, Marianna N; Vernon, Howard T; Kakavelakis, Kyriakos N; Antonopoulou, Maria D; Paganas, Aristofanis N; Lionis, Christos D
2008-07-22
Neck pain is a highly prevalent condition resulting in major disability. Standard scales for measuring disability in patients with neck pain have a pivotal role in research and clinical settings. The Neck Disability Index (NDI) is a valid and reliable tool, designed to measure disability in activities of daily living due to neck pain. The purpose of our study was the translation and validation of the NDI in a Greek primary care population with neck complaints. The original version of the questionnaire was used. Based on international standards, the translation strategy comprised forward translations, reconciliation, backward translation and pre-testing steps. The validation procedure concerned the exploration of internal consistency (Cronbach alpha), test-retest reliability (Intraclass Correlation Coefficient, Bland and Altman method), construct validity (exploratory factor analysis) and responsiveness (Spearman correlation coefficient, Standard Error of Measurement and Minimal Detectable Change) of the questionnaire. Data quality was also assessed through completeness of data and floor/ceiling effects. The translation procedure resulted in the Greek modified version of the NDI. The latter was culturally adapted through the pre-testing phase. The validation procedure revealed a large amount of missing data due to low applicability; these were handled with two methods. Floor or ceiling effects were not observed. Cronbach alpha was calculated as 0.85, which was interpreted as good internal consistency. Intraclass correlation coefficient was found to be 0.93 (95% CI 0.84-0.97), which was considered as very good test-retest reliability. Factor analysis yielded one factor with Eigenvalue 4.48 explaining 44.77% of variance. The Spearman correlation coefficient (0.3; P = 0.02) revealed some relation between the change score in the NDI and Global Rating of Change (GROC). The SEM and MDC were calculated as 0.64 and 1.78 respectively. The Greek version of the NDI measures disability in patients with neck pain in a reliable, valid and responsive manner. It is considered a useful tool for research and clinical settings in Greek Primary Health Care.
Trouli, Marianna N; Vernon, Howard T; Kakavelakis, Kyriakos N; Antonopoulou, Maria D; Paganas, Aristofanis N; Lionis, Christos D
2008-01-01
Background Neck pain is a highly prevalent condition resulting in major disability. Standard scales for measuring disability in patients with neck pain have a pivotal role in research and clinical settings. The Neck Disability Index (NDI) is a valid and reliable tool, designed to measure disability in activities of daily living due to neck pain. The purpose of our study was the translation and validation of the NDI in a Greek primary care population with neck complaints. Methods The original version of the questionnaire was used. Based on international standards, the translation strategy comprised forward translations, reconciliation, backward translation and pre-testing steps. The validation procedure concerned the exploration of internal consistency (Cronbach alpha), test-retest reliability (Intraclass Correlation Coefficient, Bland and Altman method), construct validity (exploratory factor analysis) and responsiveness (Spearman correlation coefficient, Standard Error of Measurement and Minimal Detectable Change) of the questionnaire. Data quality was also assessed through completeness of data and floor/ceiling effects. Results The translation procedure resulted in the Greek modified version of the NDI. The latter was culturally adapted through the pre-testing phase. The validation procedure revealed a large amount of missing data due to low applicability; these were handled with two methods. Floor or ceiling effects were not observed. Cronbach alpha was calculated as 0.85, which was interpreted as good internal consistency. Intraclass correlation coefficient was found to be 0.93 (95% CI 0.84–0.97), which was considered as very good test-retest reliability. Factor analysis yielded one factor with Eigenvalue 4.48 explaining 44.77% of variance. The Spearman correlation coefficient (0.3; P = 0.02) revealed some relation between the change score in the NDI and Global Rating of Change (GROC). The SEM and MDC were calculated as 0.64 and 1.78 respectively. Conclusion The Greek version of the NDI measures disability in patients with neck pain in a reliable, valid and responsive manner. It is considered a useful tool for research and clinical settings in Greek Primary Health Care. PMID:18647393
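For reference, the SEM and MDC values reported above are linked by the standard formulas SEM = SD·√(1 − ICC) and MDC95 = 1.96·√2·SEM. The sketch below applies them; the baseline standard deviation is an assumed value chosen so the output lands near the reported figures.

```python
# Hedged sketch: standard error of measurement (SEM) and minimal detectable change (MDC95)
# from a test-retest ICC. The baseline SD is assumed, not taken from the study.
import math

icc = 0.93
sd_baseline = 2.43                       # assumed baseline standard deviation of NDI scores
sem = sd_baseline * math.sqrt(1 - icc)   # SEM = SD * sqrt(1 - ICC)
mdc95 = 1.96 * math.sqrt(2) * sem        # MDC95 = 1.96 * sqrt(2) * SEM
print(round(sem, 2), round(mdc95, 2))    # ~0.64 and ~1.78, close to the reported values
```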
ERIC Educational Resources Information Center
Yarbrough, Nükhet D.
2016-01-01
As part of a project to translate and administer the Torrance Tests of Creative Thinking (TTCT) to Turkish elementary and secondary students, 35 professionals were trained in a full-day workshop to learn to score the verbal TTCT. All trainees scored the same 4 sets of TTCT verbal criterion tests for fluency, flexibility, and originality by filling…
Test method research on weakening interface strength of steel - concrete under cyclic loading
NASA Astrophysics Data System (ADS)
Liu, Ming-wei; Zhang, Fang-hua; Su, Guang-quan
2018-02-01
The mechanical properties of the steel-concrete interface under cyclic loading are key factors governing horizontal load transfer, bearing-capacity calculation, and cumulative horizontal deformation. The cyclic shear test is an effective method for studying the strength reduction of the steel-concrete interface. A test system composed of a large repeated direct-shear apparatus, a hydraulic servo system, a data acquisition system, and test-control software was independently designed, and a test method covering specimen preparation, instrument preparation, and the loading procedure is put forward. By presenting a set of test results, the validity of the test method is verified. The test system and the test method based on it provide a reference for experimental study of the mechanical properties of the steel-concrete interface.
Yurteri-Kaplan, Ladin A; Andriani, Leslie; Kumar, Anagha; Saunders, Pamela A; Mete, Mihriye M; Sokol, Andrew I
To develop a valid and reliable survey to measure surgical team members' perceptions regarding their institution's requirements for successful minimally invasive surgery (MIS). Questionnaire development and validation study (Canadian Task Force classification II-2). Three hospital types: rural, urban/academic, and community/academic. Minimally invasive staff (team members). Development and validation of a minimally invasive surgery survey (MISS). Using the Safety Attitudes questionnaire as a guide, we developed questions assessing study participants' attitudes regarding the requirements for successful MIS. The questions were closed-ended and responses based on a 5-point Likert scale. The large pool of questions was then given to 4 focus groups made up of 3 to 6 individuals. Each focus group consisted of individuals from a specific profession (e.g., surgeons, anesthesiologists, nurses, and surgical technicians). Questions were revised based on focus group recommendations, resulting in a final 52-question set. The question set was then distributed to MIS team members. Individuals were included if they had participated in >10 MIS cases and worked in the MIS setting in the past 3 months. Participants in the trial population were asked to repeat the questionnaire 4 weeks later to evaluate internal consistency. Participants' demographics, including age, gender, specialty, profession, and years of experience, were captured in the questionnaire. Factor analysis with varimax rotation was performed to determine domains (questions evaluating similar themes). For internal consistency and reliability, domains were tested using interitem correlations and Cronbach's α. Cronbach's α > .6 was considered internally consistent. Kendall's correlation coefficient τ closer to 1 and with p < .05 was considered significant for the test-retest reliability. Two hundred fifty participants answered the initial question set. Of those, 53 were eliminated because they did not meet inclusion criteria or failed to answer all questions, leaving 197 participants. Most participants were women (68% vs 32%), and 42% were between the ages of 30 and 39 years. Factor analysis identified 6 domains: collaboration, error reporting, job proficiency/efficiency, problem-solving, job satisfaction, and situational awareness. Interitem correlations testing for redundancy for each domain ranged from .2 to .7, suggesting similar themed questions while avoiding redundancy. Cronbach's α, testing internal consistency, was .87. Sixty-two participants from the original cohort repeated the question set at 4 weeks. Forty-three were analyzed for test-retest reliability after excluding those who did not meet inclusion criteria. The final questions showed high test-retest reliability (τ = .3-.7, p < .05). The final questionnaire was made up of 29 questions from the original 52-question set. The MISS is a reliable and valid tool that can be used to measure how surgical team members conceptualize the requirements for successful MIS. The MISS revealed that participants identified 6 important domains of a successful work environment: collaboration, error reporting, job proficiency/efficiency, problem-solving, job satisfaction, and situational awareness. The questionnaire can be used to understand and align various surgical team members' goals and expectations and may help improve quality of care in the MIS setting. Copyright © 2017 American Association of Gynecologic Laparoscopists. Published by Elsevier Inc. All rights reserved.
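Internal consistency of the kind reported for the MISS domains is typically computed as Cronbach's alpha. The snippet below is a minimal, self-contained implementation on simulated Likert-style responses; the item data and domain structure are assumptions, not the study's.

```python
# Hedged sketch: Cronbach's alpha for a respondents-by-items score matrix.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
trait = rng.normal(size=(197, 1))                          # shared latent component
responses = trait + rng.normal(scale=1.0, size=(197, 6))   # six items in one domain
print(round(cronbach_alpha(responses), 2))
```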
Comparison of disease prevalence in two populations in the presence of misclassification.
Tang, Man-Lai; Qiu, Shi-Fang; Poon, Wai-Yin
2012-11-01
Comparing disease prevalence in two groups is an important topic in medical research, and prevalence rates are obtained by classifying subjects according to whether they have the disease. Either high-cost infallible gold-standard classifiers or low-cost fallible classifiers can be used to classify subjects. However, statistical analysis based on data sets with misclassifications leads to biased results. As a compromise between the two classification approaches, partially validated sets are often used in which all individuals are classified by fallible classifiers, and some of the individuals are validated by the accurate gold-standard classifiers. In this article, we develop several reliable test procedures and approximate sample size formulas for disease prevalence studies based on the difference between two disease prevalence rates with two independent partially validated series. Empirical studies show that (i) the Score test produces close-to-nominal level and is preferred in practice; and (ii) the sample size formula based on the Score test is also fairly accurate in terms of the empirical power and type I error rate, and is hence recommended. A real example from an aplastic anemia study is used to illustrate the proposed methodologies. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Dixon, David A.; Hughes, H. Grady
2017-09-01
This paper presents a validation test comparing angular distributions from an electron multiple-scattering experiment with those generated using the MCNP6 Monte Carlo code system. In this experiment, 13- and 20-MeV electron pencil beams are deflected by thin foils with atomic numbers from 4 to 79. To determine the angular distribution, the fluence is measured down range of the scattering foil at various radii orthogonal to the beam line. The characteristic angle (the angle at which the maximum of the distribution is reduced by a factor of 1/e) is then determined from the angular distribution and compared with experiment. Multiple-scattering foils tested herein include beryllium, carbon, aluminum, copper, and gold. For the default electron-photon transport settings, the calculated characteristic angle was statistically distinguishable from measurement, and the calculated distributions were generally broader than the measured distributions. The average relative difference ranged from 5.8% to 12.2% over all of the foils, source energies, and physics settings tested. This validation illuminated a deficiency in the computation of the underlying angular distributions that is well understood. As a result, code enhancements were made to stabilize the angular distributions in the presence of very small substeps. However, the enhancement only marginally improved results, indicating that additional algorithmic details should be studied.
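The characteristic angle defined above can be extracted from a fluence profile by locating where the distribution falls to 1/e of its maximum. The sketch below does this for an illustrative Gaussian-like profile; it is not the experimental data or the MCNP6 output.

```python
# Hedged sketch: characteristic angle (1/e point) from an illustrative angular fluence profile.
import numpy as np

angles_deg = np.linspace(0.0, 30.0, 301)
theta0 = 8.0                                      # illustrative width parameter
fluence = np.exp(-(angles_deg / theta0) ** 2)     # illustrative, monotonically decreasing profile

target = fluence.max() / np.e
# interpolate angle at which fluence crosses 1/e of its maximum
characteristic_angle = np.interp(target, fluence[::-1], angles_deg[::-1])
print(round(float(characteristic_angle), 2))      # ~8.0 degrees for this profile
```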
ERIC Educational Resources Information Center
Dumbrower, Jule; And Others
1981-01-01
This study attempts to obtain evidence of the construct validity of pupil ability tests hypothesized to represent orientation to right, left, or integrated hemispheric function, and of teacher observation subscales intended to reveal behaviors in school settings that were hypothesized to portray preference for right or left brain function. (Author)
ERIC Educational Resources Information Center
Caliskan, Gokhan
2015-01-01
The current study aims to test the reliability and validity of the Leader-Member Exchange (LMX 7) scale with regard to coach--player relationships in sports settings. A total of 330 professional soccer players from the Turkish Super League as well as from the First and Second Leagues participated in this study. Factor analyses were performed to…
ERIC Educational Resources Information Center
Matter, M. Kevin
The Cherry Creek School district (Englewood, Colorado) is a growing district of 37,000 students in the Denver area. The 1988 Colorado State School Finance Act required district-set proficiencies (standards), and forced agreement on a set of values for student knowledge and skills. State-adopted standards added additional requirements for the…
Beland, H
1994-12-01
Clinical material is presented for discussion with the aim of exemplifying the author's conceptions of validation in a number of sessions and in psychoanalytic research and of making them verifiable, susceptible to consensus and/or falsifiable. Since Freud's postscript to the Dora case, the first clinical validation in the history of psychoanalysis, validation has been group-related and society-related, that is to say, it combines the evidence of subjectivity with the consensus of the research community (the scientific community). Validation verifies the conformity of the unconscious transference meaning with the analyst's understanding. The deciding criterion is the patient's reaction to the interpretation. In terms of the theory of science, validation in the clinical process corresponds to experimental testing of truth in the sphere of inanimate nature. Four settings of validation can be distinguished: the analyst's self-supervision during the process of understanding, which goes from incomprehension to comprehension (container-contained, PS-->D, selected fact); the patient's reaction to the interpretation (insight) and the analyst's assessment of the reaction; supervision and second thoughts; and discussion in groups and publications leading to consensus. It is a peculiarity of psychoanalytic research that in the event of positive validation the three criteria of truth (evidence, consensus and utility) coincide.
NASA Astrophysics Data System (ADS)
Trandafir, Laura; Alexandru, Mioara; Constantin, Mihai; Ioniţă, Anca; Zorilă, Florina; Moise, Valentin
2012-09-01
EN ISO 11137 establishes requirements for setting or substantiating the dose needed to achieve the desired sterility assurance level. The validation studies can be designed specifically for different types of products, and each product needs distinct protocols for bioburden determination and sterility testing. The Microbiological Laboratory of the Irradiation Processing Center (IRASM) deals with different types of products, mainly using the VDmax25 method. In terms of microbiological evaluation, the most challenging product was cotton gauze. A special situation for establishing the sterilization validation method arises when cotton is packed in large quantities. The VDmax25 method cannot be applied to items with an average bioburden of more than 1000 CFU/pack, irrespective of the weight of the package. This is a limitation of the method and implies increased costs for the manufacturer when choosing other methods. For the microbiological tests, culture conditions must be selected for both bioburden determination and sterility testing; details of the selection criteria are given.
Hansen, Tor Ivar; Haferstrom, Elise Christina D; Brunner, Jan F; Lehn, Hanne; Håberg, Asta Kristine
2015-01-01
Computerized neuropsychological tests are effective in assessing different cognitive domains, but are often limited by the need of proprietary hardware and technical staff. Web-based tests can be more accessible and flexible. We aimed to investigate validity, effects of computer familiarity, education, and age, and the feasibility of a new web-based self-administered neuropsychological test battery (Memoro) in older adults and seniors. A total of 62 (37 female) participants (mean age 60.7 years) completed the Memoro web-based neuropsychological test battery and a traditional battery composed of similar tests intended to measure the same cognitive constructs. Participants were assessed on computer familiarity and how they experienced the two batteries. To properly test the factor structure of Memoro, an additional factor analysis in 218 individuals from the HUNT population was performed. Comparing Memoro to traditional tests, we observed good concurrent validity (r = .49-.63). The performance on the traditional and Memoro test battery was consistent, but differences in raw scores were observed with higher scores on verbal memory and lower in spatial memory in Memoro. Factor analysis indicated two factors: verbal and spatial memory. There were no correlations between test performance and computer familiarity after adjustment for age or age and education. Subjects reported that they preferred web-based testing as it allowed them to set their own pace, and they did not feel scrutinized by an administrator. Memoro showed good concurrent validity compared to neuropsychological tests measuring similar cognitive constructs. Based on the current results, Memoro appears to be a tool that can be used to assess cognitive function in older and senior adults. Further work is necessary to ascertain its validity and reliability.
Hansen, Tor Ivar; Haferstrom, Elise Christina D.; Brunner, Jan F.; Lehn, Hanne; Håberg, Asta Kristine
2015-01-01
Introduction: Computerized neuropsychological tests are effective in assessing different cognitive domains, but are often limited by the need of proprietary hardware and technical staff. Web-based tests can be more accessible and flexible. We aimed to investigate validity, effects of computer familiarity, education, and age, and the feasibility of a new web-based self-administered neuropsychological test battery (Memoro) in older adults and seniors. Method: A total of 62 (37 female) participants (mean age 60.7 years) completed the Memoro web-based neuropsychological test battery and a traditional battery composed of similar tests intended to measure the same cognitive constructs. Participants were assessed on computer familiarity and how they experienced the two batteries. To properly test the factor structure of Memoro, an additional factor analysis in 218 individuals from the HUNT population was performed. Results: Comparing Memoro to traditional tests, we observed good concurrent validity (r = .49–.63). The performance on the traditional and Memoro test battery was consistent, but differences in raw scores were observed with higher scores on verbal memory and lower in spatial memory in Memoro. Factor analysis indicated two factors: verbal and spatial memory. There were no correlations between test performance and computer familiarity after adjustment for age or age and education. Subjects reported that they preferred web-based testing as it allowed them to set their own pace, and they did not feel scrutinized by an administrator. Conclusions: Memoro showed good concurrent validity compared to neuropsychological tests measuring similar cognitive constructs. Based on the current results, Memoro appears to be a tool that can be used to assess cognitive function in older and senior adults. Further work is necessary to ascertain its validity and reliability. PMID:26009791
Software for Automated Testing of Mission-Control Displays
NASA Technical Reports Server (NTRS)
OHagan, Brian
2004-01-01
MCC Display Cert Tool is a set of software tools for automated testing of computer-terminal displays in spacecraft mission-control centers, including those of the space shuttle and the International Space Station. This software makes it possible to perform tests that are more thorough, take less time, and are less likely to lead to erroneous results, relative to tests performed manually. This software enables comparison of two sets of displays to report command and telemetry differences, generates test scripts for verifying telemetry and commands, and generates a documentary record containing display information, including version and corrective-maintenance data. At the time of reporting the information for this article, work was continuing to add a capability for validation of display parameters against a reconfiguration file.
Poirier, Anne-Lise; Kwiatkowski, Fabrice; Commer, Jean-Marie; D'Aillières, Bénédicte; Berger, Virginie; Mercier, Mariette; Bonnetain, Franck
2012-04-20
The end of life for cancer patients is the ultimate stage of the disease, and care in this setting is important as it can improve the wellbeing not only of patients, but also of the patients' family and close friends. As it is a matter of profoundly personal concerns, patients' perception of this phase of the disease is difficult to assess and has thus been insufficiently studied. Nonetheless, caregivers are required to provide specific care to help patients and to treat them in order to improve their wellbeing during this period. While tools to assess health-related quality of life (QoL) in cancer patients at the end of life exist in English, to our knowledge, no validated tools are available in French. This randomized multicenter cohort study will be carried out to cross-culturally adapt and validate a French version of the English QUAL-E and the Missoula Vitas Quality Of Life Index (MVQOLI) questionnaires for advanced cancer patients in a palliative setting. A randomized clinical trial component in addition to a cohort study is implemented in order to test psychometric hypotheses: an order effect and improvement of sensitivity to change. The validation procedure will ensure that the psychometric properties are maintained. The main criterion to assess the reliability of the questionnaires will be reproducibility (test-retest method) using intraclass correlation coefficients. It will be necessary to include 372 patients. The sensitivity to change, discriminant capability and convergent validity will also be investigated. If the cross-culturally adapted MVQOLI and QUAL-E questionnaires for advanced cancer patients in a palliative setting have satisfactory psychometric properties, they will allow us to assess the specific dimensions of QoL at the end of life. Current Controlled Trials NCT01545921.
Cheong, A T; Tong, S F; Sazlina, S G
2015-01-01
The Hill-Bone compliance to high blood pressure therapy scale (HBTS) is one of the useful scales in primary care settings. It has been tested in America, Africa and Turkey with variable validity and reliability. The aim of this paper was to determine the validity and reliability of the Malay version of the HBTS (HBTS-M) for the Malaysian population. The HBTS comprises three subscales assessing compliance to medication, appointment and salt intake. The content validity of the HBTS for the local population was agreed through consensus of an expert panel. The 14 items used in the HBTS were adapted to reflect local situations. It was translated into Malay and then back-translated into English. The translated version was piloted in 30 participants. This was followed by structural and predictive validity, and internal consistency testing in 262 patients with hypertension, who had been on antihypertensive agent(s) for at least 1 year in two primary healthcare clinics in Kuala Lumpur, Malaysia. Exploratory factor analyses and the correlation between the HBTS-M total score and blood pressure were performed. Cronbach's alpha was calculated accordingly. Factor analysis revealed a three-component structure represented by two components on medication adherence and one on salt intake adherence. The Kaiser-Meyer-Olkin statistic was 0.764. The variance explained by each factor was 23.6%, 10.4% and 9.8%, respectively. However, the internal consistency of each component was suboptimal, with Cronbach's alpha of 0.64, 0.55 and 0.29, respectively. Although there were two components representing medication adherence, the theoretical concepts underlying each component could not be differentiated. In addition, there was no correlation between the HBTS-M total score and blood pressure. The HBTS-M did not conform to the structural and predictive validity of the original scale. Its reliability for assessing medication and salt intake adherence is most probably suboptimal in the Malaysian primary care setting.
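An exploratory factor analysis with varimax rotation of the kind used for the HBTS-M can be sketched as follows. The item responses are simulated stand-ins, and the snippet uses scikit-learn's FactorAnalysis (the varimax rotation option requires a recent scikit-learn release) rather than whatever software the study's authors used.

```python
# Hedged sketch: exploratory factor analysis with varimax rotation on 14 Likert-style items;
# the patient-level responses are simulated, not the study's data.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
responses = rng.integers(1, 5, size=(262, 14)).astype(float)   # stand-in item scores

fa = FactorAnalysis(n_components=3, rotation="varimax")
fa.fit(responses)
loadings = fa.components_.T        # items x factors loading matrix
print(loadings.shape)              # (14, 3)
```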
Dykes, Patricia C; Hurley, Ann C; Brown, Suzanne; Carr, Robyn; Cashen, Margaret; Collins, Rita; Cook, Robyn; Currie, Leanne; Docherty, Charles; Ensio, Anneli; Foster, Joanne; Hardiker, Nicholas R; Honey, Michelle L L; Killalea, Rosaleen; Murphy, Judy; Saranto, Kaija; Sensmeier, Joyce; Weaver, Charlotte
2009-01-01
In 2005, the Healthcare Information Management Systems Society (HIMSS) Nursing Informatics Community developed a survey to measure the impact of health information technology (HIT), the I-HIT Scale, on the role of nurses and interdisciplinary communication in hospital settings. In 2007, nursing informatics colleagues from Australia, England, Finland, Ireland, New Zealand, Scotland and the United States formed a research collaborative to validate the I-HIT across countries. All teams have completed construct and face validation in their countries. Five out of six teams have initiated reliability testing by practicing nurses. This paper reports the international collaborative's validation of the I-HIT Scale completed to date.
Measuring leprosy-related stigma - a pilot study to validate a toolkit of instruments.
Rensen, Carin; Bandyopadhyay, Sudhakar; Gopal, Pala K; Van Brakel, Wim H
2011-01-01
Stigma negatively affects the quality of life of leprosy-affected people. Instruments are needed to assess levels of stigma and to monitor and evaluate stigma reduction interventions. We conducted a validation study of such instruments in Tamil Nadu and West Bengal, India. Four instruments were tested in a 'Community Based Rehabilitation' (CBR) setting: the Participation Scale, the Internalised Scale of Mental Illness (ISMI) adapted for leprosy-affected persons, the Explanatory Model Interview Catalogue (EMIC) for leprosy-affected and non-affected persons, and the General Self-Efficacy (GSE) Scale. We evaluated the following components of validity: construct validity, internal consistency, test-retest reproducibility, and the ability to distinguish between groups. Construct validity was tested by correlating instrument scores and by triangulating quantitative and qualitative findings. Reliability was evaluated by comparing levels of stigma among people affected by leprosy and community controls, and among affected people living in CBR project areas and those in non-CBR areas. For the Participation, ISMI and EMIC scores, significant differences were observed between those affected by leprosy and those not affected (p = 0.0001), and between affected persons in the CBR and control groups (p < 0.05). The internal consistency of the instruments, measured with Cronbach's α, ranged from 0.83 to 0.96 and was very good for all instruments. Test-retest reproducibility coefficients were 0.80 for the Participation score, 0.70 for the EMIC score, 0.62 for the ISMI score and 0.50 for the GSE score. The construct validity of all instruments was confirmed. The Participation and EMIC Scales met all validity criteria, but test-retest reproducibility of the ISMI and GSE Scales needs further evaluation with a shorter test-retest interval, longer training, and additional adaptations for the latter.
Prowse, Rachel J L; Naylor, Patti-Jean; Olstad, Dana Lee; Carson, Valerie; Mâsse, Louise C; Storey, Kate; Kirk, Sara F L; Raine, Kim D
2018-05-31
Current methods for evaluating food marketing to children often study a single marketing channel or approach. As the World Health Organization urges the removal of unhealthy food marketing in children's settings, methods that comprehensively explore the exposure and power of food marketing within a setting from multiple marketing channels and approaches are needed. The purpose of this study was to test the inter-rater reliability and the validity of a novel settings-based food marketing audit tool. The Food and beverage Marketing Assessment Tool for Settings (FoodMATS) was developed and its psychometric properties evaluated in five public recreation and sport facilities (sites) and subsequently used in 51 sites across Canada for a cross-sectional analysis of food marketing. Raters recorded the count of food marketing occasions, presence of child-targeted and sports-related marketing techniques, and the physical size of marketing occasions. Marketing occasions were classified by healthfulness. Inter-rater reliability was tested using Cohen's kappa (κ) and intra-class correlations (ICC). FoodMATS scores for each site were calculated using an algorithm that represented the theoretical impact of the marketing environment on food preferences, purchases, and consumption. Higher FoodMATS scores represented sites with higher exposure to, and more powerful (unhealthy, child-targeted, sports-related, large) food marketing. Validity of the scoring algorithm was tested through (1) Pearson's correlations between FoodMATS scores and facility sponsorship dollars, and (2) sequential multiple regression for predicting "Least Healthy" food sales from FoodMATS scores. Inter-rater reliability was very good to excellent (κ = 0.88-1.00, p < 0.001; ICC = 0.97, p < 0.001). There was a strong positive correlation between FoodMATS scores and food sponsorship dollars, after controlling for facility size (r = 0.86, p < 0.001). The FoodMATS score explained 14% of the variability in "Least Healthy" concession sales (p = 0.012) and 24% of the variability in total concession and vending "Least Healthy" food sales (p = 0.003). FoodMATS has high inter-rater reliability and good validity. As the first validated tool to evaluate the exposure and power of food marketing in recreation facilities, the FoodMATS provides a novel means to comprehensively track changes in food marketing environments that can assist in developing and monitoring the impact of policies and interventions.
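Inter-rater reliability statistics of the kind reported for the FoodMATS can be computed as below. The sketch assumes illustrative rater data: Cohen's kappa for a categorical marketing-technique code and a one-way random-effects ICC for per-site counts; it is not the study's data or analysis code.

```python
# Hedged sketch: inter-rater agreement with Cohen's kappa (categorical codes) and a
# one-way random-effects ICC (counts); the rater data are illustrative only.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def icc_oneway(ratings: np.ndarray) -> float:
    """ICC(1,1), one-way random effects; ratings is a targets x raters matrix."""
    n, k = ratings.shape
    grand = ratings.mean()
    ms_between = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_within = ((ratings - ratings.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]     # e.g. child-targeted technique present?
rater_b = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print("kappa:", round(cohen_kappa_score(rater_a, rater_b), 2))

counts = np.array([[12, 11], [5, 6], [30, 29], [8, 9], [17, 18]])  # occasions per site
print("ICC:", round(icc_oneway(counts), 2))
```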
Wahl, Simone; Boulesteix, Anne-Laure; Zierer, Astrid; Thorand, Barbara; van de Wiel, Mark A
2016-10-26
Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation. In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI, internal validation followed by MI on the training and test parts separately, MI-Val, MI on the full data set followed by internal validation, and MI(-y)-Val, MI on the full data set omitting the outcome followed by internal validation. Different validation strategies, including bootstrap and cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adapt a strategy for confidence interval construction to incomplete data. Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias with increasing true effect size, number of covariates and decreasing sample size. In Val-MI, accuracy of the estimate is more strongly improved by increasing the number of bootstrap draws rather than the number of imputations. With a simple integrated approach, valid confidence intervals for performance estimates can be obtained. When prognostic models are developed on incomplete data, Val-MI represents a valid strategy to obtain estimates of predictive performance measures.
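A minimal sketch of the Val-MI ordering described above follows: draw the bootstrap resample first, then impute the training and out-of-bag test parts separately, fit the model, and pool AUC estimates over imputations and resamples. It assumes scikit-learn's IterativeImputer as the imputation engine, uses simulated data, and omits the 0.632+ optimism correction for brevity, so it is an illustration of the workflow rather than the authors' implementation.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def val_mi_auc(X, y, n_boot=20, n_imputations=5):
    """Val-MI: draw the validation split first, then run MI on the
    training and test parts separately (0.632+ correction omitted)."""
    n = len(y)
    aucs = []
    for _ in range(n_boot):
        boot = rng.integers(0, n, n)                 # bootstrap training indices
        oob = np.setdiff1d(np.arange(n), boot)       # out-of-bag test part
        if len(np.unique(y[boot])) < 2 or len(np.unique(y[oob])) < 2:
            continue
        imp_aucs = []
        for m in range(n_imputations):
            X_train = IterativeImputer(random_state=m,
                                       sample_posterior=True).fit_transform(X[boot])
            X_test = IterativeImputer(random_state=m,
                                      sample_posterior=True).fit_transform(X[oob])
            model = LogisticRegression(max_iter=1000).fit(X_train, y[boot])
            imp_aucs.append(roc_auc_score(y[oob],
                                          model.predict_proba(X_test)[:, 1]))
        aucs.append(np.mean(imp_aucs))               # pool over imputations
    return float(np.mean(aucs))                      # average over resamples

# Hypothetical incomplete data: 200 subjects, 5 blood markers, ~20% missing.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan
print(f"Val-MI bootstrap AUC estimate: {val_mi_auc(X, y):.2f}")
```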
Kim, Shin-Jeong; Kim, Sunghee; Kang, Kyung-Ah; Oh, Jina; Lee, Myung-Nam
2016-02-01
The lack of reliable and valid tools to evaluate learning outcomes during simulations has limited the adoption and progress of simulation-based nursing education. This study had two aims: (a) to develop a simulation evaluation tool (SET(c-dehydration)) to assess students' clinical judgment in caring for children with dehydration based on the Lasater Clinical Judgment Rubric (LCJR) and (b) to examine its reliability and validity. Undergraduate nursing students from two nursing schools in South Korea participated in this study from March 3 through June 10, 2014. The SET(c-dehydration) was developed, and 120 nursing students' clinical judgment was evaluated. Descriptive statistics, Cronbach's alpha, Cohen's kappa coefficient, and confirmatory factor analysis (CFA) were used to analyze the data. A 41-item version of the SET(c-dehydration) with three subscales was developed. Cohen's kappa (measuring inter-observer reliability) of the sessions ranged from .73 to .95, and Cronbach's alpha was .87. The mean total rating of the SET(c-dehydration) by the instructors was 1.92 (±.25), and the mean scores for the four LCJR dimensions of clinical judgment were as follows: noticing (1.74±.27), interpreting (1.85±.43), responding (2.17±.32), and reflecting (1.79±.35). CFA, which was performed to test construct validity, showed that the four-dimension structure of the SET(c-dehydration) was an appropriate framework. The SET(c-dehydration) provides a means to evaluate clinical judgment in simulation education. Its reliability and validity should be examined further. Copyright © 2015 Elsevier Ltd. All rights reserved.
Cascade Back-Propagation Learning in Neural Networks
NASA Technical Reports Server (NTRS)
Duong, Tuan A.
2003-01-01
The cascade back-propagation (CBP) algorithm is the basis of a conceptual design for accelerating learning in artificial neural networks. The neural networks would be implemented as analog very-large-scale integrated (VLSI) circuits, and circuits to implement the CBP algorithm would be fabricated on the same VLSI circuit chips with the neural networks. Heretofore, artificial neural networks have learned slowly because it has been necessary to train them via software, for lack of a good on-chip learning technique. The CBP algorithm is an on-chip technique that provides for continuous learning in real time. Artificial neural networks are trained by example: A network is presented with training inputs for which the correct outputs are known, and the algorithm strives to adjust the weights of synaptic connections in the network to make the actual outputs approach the correct outputs. The input data are generally divided into three parts. Two of the parts, called the "training" and "cross-validation" sets, respectively, must be such that the corresponding input/output pairs are known. During training, the cross-validation set enables verification of the status of the input-to-output transformation learned by the network to avoid over-learning. The third part of the data, termed the "test" set, consists of the inputs that are required to be transformed into outputs; this set may or may not include the training set and/or the cross-validation set. Proposed neural-network circuitry for on-chip learning would be divided into two distinct networks: one for training and one for validation. Both networks would share the same synaptic weights.
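The three-way data split described above (training, cross-validation to guard against over-learning, test) is independent of the analog-VLSI implementation and can be illustrated in software. The sketch below is a hypothetical example using a small scikit-learn network trained incrementally while the cross-validation set is monitored for early stopping; it is not the CBP algorithm itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))
y = (X[:, :3].sum(axis=1) > 0).astype(int)          # hypothetical target

# Three-way split: training, cross-validation (to detect over-learning), test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(16,), learning_rate_init=0.01, random_state=0)
best_val, best_epoch, patience = -np.inf, 0, 10
for epoch in range(200):
    net.partial_fit(X_train, y_train, classes=[0, 1])   # one pass of weight updates
    val_acc = net.score(X_val, y_val)                   # monitor cross-validation set
    if val_acc > best_val:
        best_val, best_epoch = val_acc, epoch
    elif epoch - best_epoch > patience:                 # stop before over-learning
        break

print(f"Stopped after epoch {epoch}, validation accuracy {best_val:.2f}")
print(f"Held-out test accuracy: {net.score(X_test, y_test):.2f}")
```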
Progress toward the determination of correct classification rates in fire debris analysis.
Waddell, Erin E; Song, Emma T; Rinke, Caitlin N; Williams, Mary R; Sigman, Michael E
2013-07-01
Principal components analysis (PCA), linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA) were used to develop a multistep classification procedure for determining the presence of ignitable liquid residue in fire debris and assigning any ignitable liquid residue present into the classes defined under the American Society for Testing and Materials (ASTM) E 1618-10 standard method. A multistep classification procedure was tested by cross-validation based on model data sets comprised of the time-averaged mass spectra (also referred to as total ion spectra) of commercial ignitable liquids and pyrolysis products from common building materials and household furnishings (referred to simply as substrates). Fire debris samples from laboratory-scale and field test burns were also used to test the model. The optimal model's true-positive rate was 81.3% for cross-validation samples and 70.9% for fire debris samples. The false-positive rate was 9.9% for cross-validation samples and 8.9% for fire debris samples. © 2013 American Academy of Forensic Sciences.
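A generic software analogue of the multistep workflow (dimension reduction followed by discriminant analysis, evaluated by cross-validation) might look like the sketch below. The spectra, injected class signal, and model settings are hypothetical and are not the ASTM E 1618-10 classes or the study's data; the sketch only illustrates how cross-validated true- and false-positive rates are obtained from a PCA-LDA pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)

# Hypothetical total-ion-spectrum features: 300 samples x 200 m/z bins,
# labeled 1 = ignitable liquid residue present, 0 = substrate only.
X = rng.normal(size=(300, 200))
y = rng.integers(0, 2, 300)
X[y == 1, :20] += 0.8                     # inject a weak class signal

model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
pred = cross_val_predict(model, X, y, cv=5)

tp = np.sum((pred == 1) & (y == 1))
fp = np.sum((pred == 1) & (y == 0))
tpr = tp / np.sum(y == 1)                 # true-positive rate
fpr = fp / np.sum(y == 0)                 # false-positive rate
print(f"Cross-validated TPR: {tpr:.1%}, FPR: {fpr:.1%}")
```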
Fluorescence In Situ Hybridization Probe Validation for Clinical Use.
Gu, Jun; Smith, Janice L; Dowling, Patricia K
2017-01-01
In this chapter, we provide a systematic overview of the published guidelines and validation procedures for fluorescence in situ hybridization (FISH) probes for clinical diagnostic use. FISH probes-which are classified as molecular probes or analyte-specific reagents (ASRs)-have been extensively used in vitro for both clinical diagnosis and research. Most commercially available FISH probes in the United States are strictly regulated by the U.S. Food and Drug Administration (FDA), the Centers for Disease Control and Prevention (CDC), the Centers for Medicare & Medicaid Services (CMS), the Clinical Laboratory Improvement Amendments (CLIA), and the College of American Pathologists (CAP). Although home-brewed FISH probes-defined as probes made in-house or acquired from a source that does not supply them to other laboratories-are not regulated by these agencies, they too must undergo the same individual validation process prior to clinical use as their commercial counterparts. Validation of a FISH probe involves initial validation and ongoing verification of the test system. Initial validation includes assessment of a probe's technical specifications, establishment of its standard operational procedure (SOP), determination of its clinical sensitivity and specificity, development of its cutoff, baseline, and normal reference ranges, gathering of analytics, confirmation of its applicability to a specific research or clinical setting, testing of samples with or without the abnormalities that the probe is meant to detect, staff training, and report building. Ongoing verification of the test system involves testing additional normal and abnormal samples using the same method employed during the initial validation of the probe.
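For the cutoff and sensitivity/specificity steps mentioned above, one commonly used approach (an assumption here, not taken from this chapter) derives the normal cutoff from the upper one-sided confidence limit of the false-positive fraction observed in known-normal samples, via the inverse beta distribution. The sketch below uses hypothetical counts to illustrate that calculation.

```python
from scipy.stats import beta

# Hypothetical normal-sample data from an initial validation run:
# 200 interphase nuclei scored per known-normal specimen, with a
# highest observed false-positive count of 6 nuclei in any specimen.
cells_scored = 200
max_false_positives = 6

# Upper one-sided 95% limit of the false-positive fraction (inverse beta).
cutoff_fraction = beta.ppf(0.95, max_false_positives + 1,
                           cells_scored - max_false_positives)
print(f"Normal cutoff: {cutoff_fraction:.1%} abnormal nuclei "
      f"({cutoff_fraction * cells_scored:.0f} of {cells_scored} cells)")

# Sensitivity/specificity from a confirmation run on samples with and
# without the abnormality (hypothetical counts).
tp, fn, tn, fp = 28, 2, 39, 1
print(f"Sensitivity: {tp / (tp + fn):.1%}, Specificity: {tn / (tn + fp):.1%}")
```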
Experimental Validation of the Half-Length Force Concept Inventory
ERIC Educational Resources Information Center
Han, Jing; Koenig, Kathleen; Cui, Lili; Fritchman, Joseph; Li, Dan; Sun, Wanyi; Fu, Zhao; Bao, Lei
2016-01-01
In a recent study, the 30-question Force Concept Inventory (FCI) was theoretically split into two 14-question "half-length" tests (HFCIs) covering the same set of concepts and producing mean scores that can be equated to those of the original FCI. The HFCIs require less administration time and reduce test-retest issues when different…
Wealth, Income, and Price Effects in Local School Finance.
ERIC Educational Resources Information Center
Grubb, W. Norton
In this paper, the author attempts to clarify several implicit hypotheses about local school finance reform, set up tests whereby hypothesis validity can be affirmed or rejected, and outline the policy implications of the results. Two mathematical models of school district behavior are examined, and their implications are tested on a sample of 150…
Validating TOEFL[R] iBT Speaking and Setting Score Requirements for ITA Screening
ERIC Educational Resources Information Center
Xi, Xiaoming
2007-01-01
Although the primary use of the speaking section of the Test of English as a Foreign Language Internet-based test (TOEFL[R] iBT Speaking) is to inform admissions decisions at English medium universities, it may also be useful as an initial screening measure for international teaching assistants (ITAs). This study provides criterion-related…
ERIC Educational Resources Information Center
Anglim, Jeromy; Bozic, Stefan; Little, Jonathon; Lievens, Filip
2018-01-01
The current study examined the degree to which applicants applying for medical internships distort their responses to personality tests and assessed whether this response distortion led to reduced predictive validity. The applicant sample (n = 530) completed the NEO Personality Inventory whilst applying for one of 60 positions as first-year…
ERIC Educational Resources Information Center
Daschmann, Elena C.; Goetz, Thomas; Stupnisky, Robert H.
2011-01-01
Background: Boredom has been found to be an important emotion for students' learning processes and achievement outcomes; however, the precursors of this emotion remain largely unexplored. Aim: In the current study, scales assessing the precursors to boredom in academic achievement settings were developed and tested. Sample: Participants were 1,380…
Examining the Effects of Testwiseness in Conceptual Physics Evaluations
ERIC Educational Resources Information Center
DeVore, Seth; Stewart, John; Stewart, Gay
2016-01-01
Testwiseness is defined as the set of cognitive strategies used by a student that is intended to improve his or her score on a test regardless of the test's subject matter. Questions with elements that may be affected by testwiseness are common in physics assessments, even in those which have been extensively validated and widely used as…
The Subjective Visual Vertical: Validation of a Simple Test
ERIC Educational Resources Information Center
Tesio, Luigi; Longo, Stefano; Rota, Viviana
2011-01-01
The study sought to provide norms for a simple test of visual perception of verticality (subjective visual vertical). The study was designed as a cohort study with a balanced design. The setting was the Rehabilitation Department of a University Hospital. Twenty-two healthy adults, aged 23-58 years, 11 men (three left handed) and 11 women (three left…
ERIC Educational Resources Information Center
Marcotte, Karine; McSween, Marie-Pier; Pouliot, Monica; Martineau, Sarah; Pauze, Anne-Marie; Wiseman-Hakes, Catherine; MacDonald, Sheila
2017-01-01
Purpose: The Functional Assessment of Verbal Reasoning and Executive Strategies (FAVRES; MacDonald, 2005) test was designed for use by speech-language pathologists to assess verbal reasoning, complex comprehension, discourse, and executive skills during performance on a set of challenging and ecologically valid functional tasks. A recent French…
Shaik, Munvar Miya; Hassan, Norul Badriah; Tan, Huay Lin; Bhaskar, Shalini; Gan, Siew Hua
2014-01-01
The study was designed to determine the validity and reliability of the Bahasa Melayu version (MIDAS-M) of the Migraine Disability Assessment (MIDAS) questionnaire. Patients having migraine for more than six months attending the Neurology Clinic, Hospital Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia, were recruited. Standard forward and back translation procedures were used to translate and adapt the MIDAS questionnaire to produce the Bahasa Melayu version. The translated Malay version was tested for face and content validity. Validity and reliability testing were further conducted with 100 migraine patients (1st administration) followed by a retesting session 21 days later (2nd administration). A total of 100 patients between 15 and 60 years of age were recruited. The majority of the patients were single (66%) and students (46%). Cronbach's alpha values were 0.84 (1st administration) and 0.80 (2nd administration). The test-retest reliability for the total MIDAS score was 0.73, indicating that the MIDAS-M questionnaire is stable; for the five disability questions, the test-retest values ranged from 0.77 to 0.87. The MIDAS-M questionnaire is comparable with the original English version in terms of validity and reliability and may be used for the assessment of migraine in clinical settings.
Shaik, Munvar Miya; Hassan, Norul Badriah; Bhaskar, Shalini; Gan, Siew Hua
2014-01-01
Background. The study was designed to determine the validity and reliability of the Bahasa Melayu version (MIDAS-M) of the Migraine Disability Assessment (MIDAS) questionnaire. Methods. Patients having migraine for more than six months attending the Neurology Clinic, Hospital Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia, were recruited. Standard forward and back translation procedures were used to translate and adapt the MIDAS questionnaire to produce the Bahasa Melayu version. The translated Malay version was tested for face and content validity. Validity and reliability testing were further conducted with 100 migraine patients (1st administration) followed by a retesting session 21 days later (2nd administration). Results. A total of 100 patients between 15 and 60 years of age were recruited. The majority of the patients were single (66%) and students (46%). Cronbach's alpha values were 0.84 (1st administration) and 0.80 (2nd administration). The test-retest reliability for the total MIDAS score was 0.73, indicating that the MIDAS-M questionnaire is stable; for the five disability questions, the test-retest values ranged from 0.77 to 0.87. Conclusion. The MIDAS-M questionnaire is comparable with the original English version in terms of validity and reliability and may be used for the assessment of migraine in clinical settings. PMID:25121099
Kong, Xiangxing; Li, Jun; Cai, Yibo; Tian, Yu; Chi, Shengqiang; Tong, Danyang; Hu, Yeting; Yang, Qi; Li, Jingsong; Poston, Graeme; Yuan, Ying; Ding, Kefeng
2018-01-08
The aim was to revise the American Joint Committee on Cancer TNM staging system for colorectal cancer (CRC) based on a nomogram analysis of the Surveillance, Epidemiology, and End Results (SEER) database, and to support the rationale for increasing the weight of the T stage in our previously proposed T-plus staging system. A total of 115,377 non-metastatic CRC patients from SEER were randomly divided into training and testing sets at a 1:1 ratio. The Nomo-staging system was established via three nomograms based on logistic regression analyses of 1-year, 2-year and 3-year disease-specific survival (DSS) in the training set. The predictive value of the Nomo-staging system in the testing set was evaluated by the concordance index (c-index), likelihood ratio (L.R.) and Akaike information criterion (AIC) for 1-year, 2-year and 3-year overall survival (OS) and DSS. Kaplan-Meier survival curves were used to evaluate discrimination and gradient monotonicity, and an external validation was performed on a database from the Second Affiliated Hospital of Zhejiang University (SAHZU). Patients with T1-2N1 and T1N2a disease were classified into stage II, while T4N0 patients were classified into stage III in the Nomo-staging system. Kaplan-Meier survival curves of OS and DSS in the testing set showed that the Nomo-staging system performed better in discrimination and gradient monotonicity, and the external validation in the SAHZU database also showed distinctly better discrimination. The Nomo-staging system showed higher values for L.R. and c-index, and a lower value for AIC, when predicting OS and DSS in the testing set. The Nomo-staging system showed better performance in prognosis prediction, and the weight of lymph node status in prognosis prediction should be cautiously reconsidered.
Development and validation of an all-cause mortality risk score in type 2 diabetes.
Yang, Xilin; So, Wing Yee; Tong, Peter C Y; Ma, Ronald C W; Kong, Alice P S; Lam, Christopher W K; Ho, Chung Shun; Cockram, Clive S; Ko, Gary T C; Chow, Chun-Chung; Wong, Vivian C W; Chan, Juliana C N
2008-03-10
Diabetes reduces life expectancy by 10 to 12 years, but whether death can be predicted in type 2 diabetes mellitus remains uncertain. A prospective cohort of 7583 type 2 diabetic patients enrolled since 1995 was censored on July 30, 2005, or after 6 years of follow-up, whichever came first. A restricted cubic spline model was used to check data linearity and to develop linear-transforming formulas. Data were randomly assigned to a training data set and to a test data set. A Cox model was used to develop risk scores in the training data set. Calibration and discrimination were assessed in the test data set. A total of 619 patients died during a median follow-up period of 5.51 years, resulting in a mortality rate of 18.69 per 1000 person-years. Age, sex, peripheral arterial disease, cancer history, insulin use, blood hemoglobin levels, linear-transformed body mass index, random spot urinary albumin-creatinine ratio, and estimated glomerular filtration rate at enrollment were predictors of all-cause death. A risk score for all-cause mortality was developed using these predictors. The predicted and observed death rates in the test data set were similar (P > .70). The area under the receiver operating characteristic curve was 0.85 for 5 years of follow-up. Using the risk score in ranking cause-specific deaths, the area under the receiver operating characteristic curve was 0.95 for genitourinary death, 0.85 for circulatory death, 0.85 for respiratory death, and 0.71 for neoplasm death. Death in type 2 diabetes mellitus can be predicted using a risk score consisting of commonly measured clinical and biochemical variables. Further validation is needed before clinical use.
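A compact sketch of the development-and-validation workflow described above (Cox model fitted on a training set, discrimination checked on a test set) is shown below using the lifelines package, which is assumed to be available; the cohort, predictors, and simulated survival times are hypothetical and are not the study's data.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(7)
n = 2000

# Hypothetical cohort: age, hemoglobin, eGFR as predictors of all-cause death.
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "hemoglobin": rng.normal(13, 1.5, n),
    "egfr": rng.normal(75, 20, n),
})
risk = 0.05 * df["age"] - 0.2 * df["hemoglobin"] - 0.02 * df["egfr"]
scale = np.exp(-(risk - risk.mean())).to_numpy()     # higher risk -> shorter time
event_time = rng.exponential(scale, n)
censor_time = rng.uniform(0.5, 6.0, n)               # administrative censoring
df["time"] = np.minimum(event_time, censor_time)
df["death"] = (event_time <= censor_time).astype(int)

train = df.sample(frac=0.5, random_state=0)
test = df.drop(train.index)

cph = CoxPHFitter()
cph.fit(train, duration_col="time", event_col="death")   # develop score on training set

# Discrimination in the test set: concordance between predicted risk and outcomes.
test_scores = cph.predict_partial_hazard(test[["age", "hemoglobin", "egfr"]])
c_index = concordance_index(test["time"], -test_scores, test["death"])
print(f"Test-set concordance index: {c_index:.2f}")
```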
Developing a Suitable Model for Water Uptake for Biodegradable Polymers Using Small Training Sets.
Valenzuela, Loreto M; Knight, Doyle D; Kohn, Joachim
2016-01-01
Prediction of the dynamic properties of water uptake across polymer libraries can accelerate polymer selection for a specific application. We first built semiempirical models using artificial neural networks with all water uptake data points as individual inputs. These models give very good correlations (R² > 0.78 for the test set) but very low accuracy on cross-validation sets (less than 19% of experimental points within experimental error). Instead, using consolidated parameters such as equilibrium water uptake, a good model is obtained (R² = 0.78 for the test set), with accurate predictions for 50% of tested polymers. The semiempirical model was applied to the 56-polymer library of L-tyrosine-derived polyarylates, identifying groups of polymers that are likely to satisfy design criteria for water uptake. This research demonstrates that a surrogate modeling effort can reduce the number of polymers that must be synthesized and characterized to identify an appropriate polymer that meets certain performance criteria.
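To illustrate the cross-validation step that exposed the weakness of the per-point models, the sketch below leave-one-out cross-validates a small neural-network surrogate on a hypothetical 56-polymer descriptor set; the descriptors, targets, and network size are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(9)

# Hypothetical library: 56 polymers, 4 structural descriptors,
# target = equilibrium water uptake (%).
X = rng.normal(size=(56, 4))
y = 10 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 1, 56)

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                                   random_state=0))
# Leave-one-out cross-validation mimics prediction for unseen polymers.
pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(f"Cross-validated R^2: {1 - ss_res / ss_tot:.2f}")
```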
Generating quality word sense disambiguation test sets based on MeSH indexing.
Fan, Jung-Wei; Friedman, Carol
2009-11-14
Word sense disambiguation (WSD) determines the correct meaning of a word that has more than one meaning, and is a critical step in biomedical natural language processing, as interpretation of information in text can be correct only if the meanings of its component terms are correctly identified first. Quality evaluation sets are important to WSD because they can be used as representative samples for developing automatic programs and as referees for comparing different WSD programs. To help create quality test sets for WSD, we developed a MeSH-based automatic sense-tagging method that preferentially annotates terms that are topical to the text. Preliminary results were promising and revealed important issues to be addressed in biomedical WSD research. We also suggest that, by cross-validating with 2 or 3 annotators, the method should be able to efficiently generate quality WSD test sets. Online supplement is available at: http://www.dbmi.columbia.edu/~juf7002/AMIA09.
Reliability and validity of the Dutch pediatric Voice Handicap Index.
Veder, Laura; Pullens, Bas; Timmerman, Marieke; Hoeve, Hans; Joosten, Koen; Hakkesteegt, Marieke
2017-05-01
The pediatric voice handicap index (pVHI) has been developed to provide better insight into parents' perception of their child's voice-related quality of life. The purpose of the present study was to validate the Dutch pVHI by evaluating its internal consistency and reliability. Furthermore, we determined the optimal cut-off point for a normal pVHI score. All items of the English pVHI were translated into Dutch. Parents of children in our dysphonic and control groups were asked to fill out the questionnaire. For the test-retest analysis we used a different study group who filled out the pVHI twice as part of a large follow-up study. Internal consistency was analyzed through Cronbach's α coefficient. The test-retest reliability was assessed by determining Pearson's correlation coefficient. The Mann-Whitney test was used to compare the questionnaire scores of the control group with those of the dysphonic group. By calculating receiver operating characteristic (ROC) curves, sensitivity, and specificity, we were able to set a cut-off point. We obtained data from 122 asymptomatic children and from 79 dysphonic children. The questionnaire scores differed significantly between the two groups. The internal consistency showed an overall Cronbach α coefficient of 0.96, and the total pVHI questionnaire showed excellent test-retest reliability, with a Pearson's correlation coefficient of 0.90. A cut-off point for the total pVHI questionnaire was set at 7 points with a specificity of 85% and sensitivity of 100%. A cut-off point for the VAS score was set at 13 with a specificity of 93% and sensitivity of 97%. The Dutch pVHI is a valid and reliable tool for the assessment of children with voice problems. With a cut-off point of 7 points for the total pVHI questionnaire and of 13 for the VAS score, the pVHI may be used as a screening tool to assess dysphonic complaints and as a complementary tool to identify children with dysphonia. Copyright © 2017 Elsevier B.V. All rights reserved.
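The ROC-based cut-off selection described above can be sketched in a few lines. The example below uses hypothetical control and dysphonic score distributions and selects the threshold maximizing Youden's index; the actual study's choice of operating point may differ.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(3)

# Hypothetical total pVHI scores: controls score low, dysphonic children higher.
controls = rng.poisson(3, 122)
dysphonic = rng.poisson(20, 79)
scores = np.concatenate([controls, dysphonic])
labels = np.concatenate([np.zeros(122, int), np.ones(79, int)])

fpr, tpr, thresholds = roc_curve(labels, scores)
# Pick the cutoff maximizing Youden's index (sensitivity + specificity - 1).
j = tpr - fpr
best = np.argmax(j)
print(f"AUC: {roc_auc_score(labels, scores):.2f}")
print(f"Cutoff: {thresholds[best]:.0f} points "
      f"(sensitivity {tpr[best]:.0%}, specificity {1 - fpr[best]:.0%})")
```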
Model-Based Verification and Validation of Spacecraft Avionics
NASA Technical Reports Server (NTRS)
Khan, M. Omair; Sievers, Michael; Standley, Shaun
2012-01-01
Verification and Validation (V&V) at JPL is traditionally performed on flight or flight-like hardware running flight software. For some time, the complexity of avionics has increased exponentially while the time allocated for system integration and associated V&V testing has remained fixed. There is an increasing need to perform comprehensive system-level V&V using modeling and simulation, and to use scarce hardware testing time to validate models, as has long been the norm for thermal and structural V&V. Our approach extends model-based V&V to electronics and software through functional and structural models implemented in SysML. We develop component models of electronics and software that are validated by comparison with test results from actual equipment. The models are then simulated, enabling a more complete set of test cases than is possible on flight hardware. SysML simulations provide access and control of internal nodes that may not be available in physical systems. This is particularly helpful in testing fault protection behaviors when injecting faults is either not possible or potentially damaging to the hardware. We can also model both hardware and software behaviors in SysML, which allows us to simulate hardware and software interactions. With an integrated model and simulation capability we can evaluate the hardware and software interactions and identify problems sooner. The primary missing piece is validating SysML model correctness against hardware; this experiment demonstrated that such an approach is possible.
Space Suit Joint Torque Measurement Method Validation
NASA Technical Reports Server (NTRS)
Valish, Dana; Eversley, Karina
2012-01-01
In 2009 and early 2010, a test method was developed and performed to quantify the torque required to manipulate joints in several existing operational and prototype space suits. This was done in an effort to develop joint torque requirements appropriate for a new Constellation Program space suit system. The same test method was levied on the Constellation space suit contractors to verify that their suit design met the requirements. However, because the original test was set up and conducted by a single test operator there was some question as to whether this method was repeatable enough to be considered a standard verification method for Constellation or other future development programs. In order to validate the method itself, a representative subset of the previous test was repeated, using the same information that would be available to space suit contractors, but set up and conducted by someone not familiar with the previous test. The resultant data was compared using graphical and statistical analysis; the results indicated a significant variance in values reported for a subset of the re-tested joints. Potential variables that could have affected the data were identified and a third round of testing was conducted in an attempt to eliminate and/or quantify the effects of these variables. The results of the third test effort will be used to determine whether or not the proposed joint torque methodology can be applied to future space suit development contracts.
The Outcome and Assessment Information Set (OASIS): A Review of Validity and Reliability
O’CONNOR, MELISSA; DAVITT, JOAN K.
2015-01-01
The Outcome and Assessment Information Set (OASIS) is the patient-specific, standardized assessment used in Medicare home health care to plan care, determine reimbursement, and measure quality. Since its inception in 1999, there has been debate over the reliability and validity of the OASIS as a research tool and outcome measure. A systematic literature review of English-language articles identified 12 studies published in the last 10 years examining the validity and reliability of the OASIS. Empirical findings indicate the validity and reliability of the OASIS range from low to moderate but vary depending on the item studied. Limitations in the existing research include: nonrepresentative samples; inconsistencies in methods used, items tested, measurement, and statistical procedures; and the changes to the OASIS itself over time. The inconsistencies suggest that these results are tentative at best; additional research is needed to confirm the value of the OASIS for measuring patient outcomes, research, and quality improvement. PMID:23216513
Cronin, Katherine A; Jacobson, Sarah L; Bonnie, Kristin E; Hopper, Lydia M
2017-01-01
Studying animal cognition in a social setting is associated with practical and statistical challenges. However, conducting cognitive research without disturbing species-typical social groups can increase ecological validity, minimize distress, and improve animal welfare. Here, we review the existing literature on cognitive research run with primates in a social setting in order to determine how widespread such testing is and highlight approaches that may guide future research planning. Using Google Scholar to search the terms "primate" "cognition" "experiment" and "social group," we conducted a systematic literature search covering 16 years (2000-2015 inclusive). We then conducted two supplemental searches within each journal that contained a publication meeting our criteria in the original search, using the terms "primate" and "playback" in one search and the terms "primate" "cognition" and "social group" in the second. The results were used to assess how frequently nonhuman primate cognition has been studied in a social setting (>3 individuals), to gain perspective on the species and topics that have been studied, and to extract successful approaches for social testing. Our search revealed 248 unique publications in 43 journals encompassing 71 species. The absolute number of publications has increased over years, suggesting viable strategies for studying cognition in social settings. While a wide range of species were studied they were not equally represented, with 19% of the publications reporting data for chimpanzees. Field sites were the most common environment for experiments run in social groups of primates, accounting for more than half of the results. Approaches to mitigating the practical and statistical challenges were identified. This analysis has revealed that the study of primate cognition in a social setting is increasing and taking place across a range of environments. This literature review calls attention to examples that may provide valuable models for researchers wishing to overcome potential practical and statistical challenges to studying cognition in a social setting, ultimately increasing validity and improving the welfare of the primates we study.
Tomizawa, Ryoko; Yamano, Mayumi; Osako, Mitue; Hirabayashi, Naotugu; Oshima, Nobuo; Sigeta, Masahiro; Reeves, Scott
2017-12-01
Few scales currently exist to assess the quality of interprofessional teamwork through team members' perceptions of working together in mental health settings. The purpose of this study was to revise and validate an interprofessional scale to assess the quality of teamwork in inpatient psychiatric units and to use it multi-nationally. A literature review was undertaken to identify evaluative teamwork tools and develop an additional 12 items to ensure a broad global focus. Focus group discussions considered adaptation to different care systems using subjective judgements from 11 participants in a pre-test of items. Data quality, construct validity, reproducibility, and internal consistency were investigated in the survey using an international comparative design. Exploratory factor analysis yielded five factors with 21 items: 'patient/community centred care', 'collaborative communication', 'interprofessional conflict', 'role clarification', and 'environment'. High overall internal consistency, reproducibility, adequate face validity, and reasonable construct validity were shown in the USA and Japan. The revised Collaborative Practice Assessment Tool (CPAT) is a valid measure to assess the quality of interprofessional teamwork in psychiatry and identifies the best strategies to improve team performance. Furthermore, the revised scale will generate more rigorous evidence for collaborative practice in psychiatry internationally.
A new casemix adjustment index for hospital mortality among patients with congestive heart failure.
Polanczyk, C A; Rohde, L E; Philbin, E A; Di Salvo, T G
1998-10-01
Comparative analysis of hospital outcomes requires reliable adjustment for casemix. Although congestive heart failure is one of the most common indications for hospitalization, congestive heart failure casemix adjustment has not been widely studied. The purposes of this study were (1) to describe and validate a new congestive heart failure-specific casemix adjustment index to predict in-hospital mortality and (2) to compare its performance to the Charlson comorbidity index. Data from all 4,608 admissions to the Massachusetts General Hospital from January 1990 to July 1996 with a principal ICD-9-CM discharge diagnosis of congestive heart failure were evaluated. Massachusetts General Hospital patients were randomly divided into a derivation set and a validation set. By logistic regression, odds ratios for in-hospital death were computed and weights were assigned to construct a new predictive index in the derivation set. The performance of the index was tested in an internal Massachusetts General Hospital validation set and in a non-Massachusetts General Hospital external validation set incorporating data from all 1995 New York state hospital discharges with a primary discharge diagnosis of congestive heart failure. Overall in-hospital mortality was 6.4%. Based on the new index, patients were assigned to six categories with incrementally increasing hospital mortality rates ranging from 0.5% to 31%. By logistic regression, "c" statistics of the congestive heart failure-specific index (0.83 and 0.78, derivation and validation set) were significantly superior to the Charlson index (0.66). Similar incrementally increasing hospital mortality rates were observed in the New York database with the congestive heart failure-specific index ("c" statistics 0.75). In an administrative database, this congestive heart failure-specific index may be a more adequate casemix adjustment tool to predict hospital mortality in patients hospitalized for congestive heart failure.
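The index-construction procedure described above (logistic regression in a derivation set, integer weights derived from the coefficients, c-statistic checked in a validation set) can be illustrated as follows. The comorbidity indicators, weighting convention, and data are hypothetical assumptions, not the published index.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
n = 4000

# Hypothetical binary comorbidity indicators and in-hospital death outcome.
X = rng.integers(0, 2, size=(n, 5))                 # e.g., renal failure, shock, ...
logit = -3.5 + X @ np.array([1.2, 0.9, 0.6, 0.4, 0.2])
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

model = LogisticRegression().fit(X_dev, y_dev)       # derivation-set regression
odds_ratios = np.exp(model.coef_[0])
# Assign integer weights proportional to the log-odds (one common convention).
weights = np.round(model.coef_[0] / np.abs(model.coef_[0]).min()).astype(int)
index_val = X_val @ weights                          # casemix index per admission

print("Odds ratios:", np.round(odds_ratios, 2))
print("Integer weights:", weights)
print(f"Validation-set c-statistic of the index: {roc_auc_score(y_val, index_val):.2f}")
```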
Hall, Jennifer; Barrett, Geraldine; Mbwana, Nicholas; Copas, Andrew; Malata, Address; Stephenson, Judith
2013-11-05
The London Measure of Unplanned Pregnancy (LMUP) is a new and psychometrically valid measure of pregnancy intention that was developed in the United Kingdom. An improved understanding of pregnancy intention in low-income countries, where unintended pregnancies are common and maternal and neonatal deaths are high, is necessary to inform policies to address the unmet need for family planning. To this end, this research aimed to validate the LMUP for use in the Chichewa language in Malawi. Three Chichewa speakers translated the LMUP and one translation was agreed which was back-translated and pre-tested on five pregnant women using cognitive interviews. The measure was field tested with pregnant women who were recruited at antenatal clinics and data were analysed using classical test theory and hypothesis testing. 125 women aged 15-43 (median 23), with parities of 1-8 (median 2) completed the Chichewa LMUP. There were no missing data. The full range of LMUP scores was captured. In terms of reliability, the scale was internally consistent (Cronbach's alpha = 0.78) and test-retest data from 70 women showed good stability (weighted Kappa 0.80). In terms of validity, hypothesis testing confirmed that unmarried women (p = 0.003), women who had four or more children alive (p = 0.0051) and women who were younger than 20 or older than 29 (p = 0.0115) were all more likely to have unintended pregnancies. Principal component analysis showed that five of the six items loaded onto one factor, with a further item borderline. A sensitivity analysis to assess the effect of the removal of the weakest item of the scale showed slightly improved performance, but as the LMUP was not significantly adversely affected by its inclusion we recommend retaining the six-item score. The Chichewa LMUP is a valid and reliable measure of pregnancy intention in Malawi and can now be used in research and/or surveillance. This is the first validation of this tool in a low-income country, helping to demonstrate that the concept of pregnancy planning is applicable in such a setting. Use of the Chichewa LMUP can enhance our understanding of pregnancy intention in Malawi, giving insight into the family planning services that are required to better meet women's needs and save lives.
DISCOVER-AQ: a unique acoustic propagation verification and validation data set
DOT National Transportation Integrated Search
2015-08-09
In 2013, the National Aeronautics and Space Administration conducted a month-long flight test for the Deriving Information on Surface conditions from Column and Vertically Resolved Observations Relevant to Air Quality research effort in Houston...
Timmings, Caitlyn; Khan, Sobia; Moore, Julia E; Marquez, Christine; Pyka, Kasha; Straus, Sharon E
2016-02-24
To address challenges related to selecting a valid, reliable, and appropriate readiness assessment measure in practice, we developed an online decision support tool to aid frontline implementers in healthcare settings in this process. The focus of this paper is to describe a multi-step, end-user driven approach to developing this tool for use during the planning stages of implementation. A multi-phase, end-user driven approach was used to develop and test the usability of a readiness decision support tool. First, readiness assessment measures that are valid, reliable, and appropriate for healthcare settings were identified from a systematic review. Second, a mapping exercise was performed to categorize individual items of included measures according to key readiness constructs from an existing framework. Third, a modified Delphi process was used to collect stakeholder ratings of the included measures on domains of feasibility, relevance, and likelihood to recommend. Fourth, two versions of a decision support tool prototype were developed and evaluated for usability. Nine valid and reliable readiness assessment measures were included in the decision support tool. The mapping exercise revealed that of the nine measures, most measures (78%) focused on assessing readiness for change at the organizational versus the individual level, and that four measures (44%) represented all constructs of organizational readiness. During the modified Delphi process, stakeholders rated most measures as feasible and relevant for use in practice, and reported that they would be likely to recommend use of most measures. Using data from the mapping exercise and stakeholder panel, an algorithm was developed to link users to a measure based on characteristics of their organizational setting and their readiness for change assessment priorities. Usability testing yielded recommendations that were used to refine the Ready, Set, Change! decision support tool. The Ready, Set, Change! decision support tool is an implementation support designed to facilitate the routine incorporation of a readiness assessment as an early step in implementation. Use of this tool in practice may offer time and resource savings during implementation.
Beutel, Manfred E; Brähler, Elmar; Wiltink, Jörg; Michal, Matthias; Klein, Eva M; Jünger, Claus; Wild, Philipp S; Münzel, Thomas; Blettner, Maria; Lackner, Karl; Nickels, Stefan; Tibubos, Ana N
2017-01-01
The aim of the study was the development and validation of the psychometric properties of a six-item bi-factorial instrument for the assessment of social support (emotional and tangible support) with a population-based sample. A cross-sectional data set of N = 15,010 participants enrolled in the Gutenberg Health Study (GHS) in 2007-2012 was divided into two sub-samples. The GHS is a population-based, prospective, observational single-center cohort study in the Rhein-Main-Region in western Mid-Germany. The first sub-sample was used for scale development by performing an exploratory factor analysis. In order to test construct validity, confirmatory factor analyses were run to compare the extracted bi-factorial model with the one-factor solution. Reliability of the scales was indicated by calculating internal consistency. External validity was tested by investigating demographic characteristics, health behavior, and distress using analysis of variance, Spearman and Pearson correlation analysis, and logistic regression analysis. Based on an exploratory factor analysis, a set of six items was extracted representing two independent factors. The two-factor structure of the Brief Social Support Scale (BS6) was confirmed by the results of the confirmatory factor analyses. Fit indices of the bi-factorial model were good and better than those of the one-factor solution. External validity was demonstrated for the BS6. The BS6 is a reliable and valid short scale that, due to its brevity, can be applied in social surveys to assess the emotional and practical dimensions of social support.
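The split-sample factor-analytic workflow described above can be sketched with scikit-learn's FactorAnalysis (varimax rotation assumes scikit-learn 0.24 or later); the six simulated items and two latent factors below are hypothetical, and the hold-out check shown is only a crude stand-in for a proper confirmatory factor analysis.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 1000

# Hypothetical six-item questionnaire driven by two latent factors
# (emotional support: items 0-2, tangible support: items 3-5).
emotional = rng.normal(size=n)
tangible = rng.normal(size=n)
items = np.column_stack(
    [emotional + rng.normal(0, 0.6, n) for _ in range(3)] +
    [tangible + rng.normal(0, 0.6, n) for _ in range(3)]
)

# Split-sample approach: explore structure on one half, check it on the other.
items_efa, items_check = train_test_split(items, test_size=0.5, random_state=0)

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(items_efa)
print("Rotated loadings (items x factors):")
print(np.round(fa.components_.T, 2))

# Crude hold-out check: do the same items load on the same factors?
fa_check = FactorAnalysis(n_components=2, rotation="varimax").fit(items_check)
print("Hold-out loadings:")
print(np.round(fa_check.components_.T, 2))
```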
Validation of the 3-day rule for stool bacterial tests in Japan.
Kobayashi, Masanori; Sako, Akahito; Ogami, Toshiko; Nishimura, So; Asayama, Naoki; Yada, Tomoyuki; Nagata, Naoyoshi; Sakurai, Toshiyuki; Yokoi, Chizu; Kobayakawa, Masao; Yanase, Mikio; Masaki, Naohiko; Takeshita, Nozomi; Uemura, Naomi
2014-01-01
Stool cultures are expensive and time consuming, and the positive rate of enteric pathogens in cases of nosocomial diarrhea is low. The 3-day rule, whereby clinicians order a Clostridium difficile (CD) toxin test rather than a stool culture for inpatients developing diarrhea >3 days after admission, has been well studied in Western countries. The present study sought to validate the 3-day rule in an acute care hospital setting in Japan. Stool bacterial and CD toxin test results for adult patients hospitalized in an acute care hospital in 2008 were retrospectively analyzed. Specimens collected after an initial positive test were excluded. The positive rate and cost-effectiveness of the tests were compared among three patient groups. The adult patients were divided into three groups for comparison: outpatients, patients hospitalized for ≤3 days and patients hospitalized for ≥4 days. Over the 12-month period, 1,597 stool cultures were obtained from 992 patients, and 880 CD toxin tests were performed in 529 patients. In the outpatient, inpatient ≤3 days and inpatient ≥4 days groups, the rate of positive stool cultures was 14.2%, 3.6% and 1.3% and that of positive CD toxin tests was 1.9%, 7.1% and 8.5%, respectively. The medical costs required to obtain one positive result were 9,181, 36,075 and 103,600 JPY and 43,200, 11,333 and 9,410 JPY, respectively. The 3-day rule was validated for the first time in a setting other than a Western country. Our results revealed that the "3-day rule" is also useful and cost-effective in Japan.
Validation of a Low-Cost Paper-Based Screening Test for Sickle Cell Anemia
Piety, Nathaniel Z.; Yang, Xiaoxi; Kanter, Julie; Vignes, Seth M.; George, Alex; Shevkoplyas, Sergey S.
2016-01-01
Background The high childhood mortality and life-long complications associated with sickle cell anemia (SCA) in developing countries could be significantly reduced with effective prophylaxis and education if SCA is diagnosed early in life. However, conventional laboratory methods used for diagnosing SCA remain prohibitively expensive and impractical in this setting. This study describes the clinical validation of a low-cost paper-based test for SCA that can accurately identify sickle trait carriers (HbAS) and individuals with SCA (HbSS) among adults and children over 1 year of age. Methods and Findings In a population of healthy volunteers and SCA patients in the United States (n = 55) the test identified individuals whose blood contained any HbS (HbAS and HbSS) with 100% sensitivity and 100% specificity for both visual evaluation and automated analysis, and detected SCA (HbSS) with 93% sensitivity and 94% specificity for visual evaluation and 100% sensitivity and 97% specificity for automated analysis. In a population of post-partum women (with a previously unknown SCA status) at a primary obstetric hospital in Cabinda, Angola (n = 226) the test identified sickle cell trait carriers with 94% sensitivity and 97% specificity using visual evaluation (none of the women had SCA). Notably, our test permits instrument- and electricity-free visual diagnostics, requires minimal training to be performed, can be completed within 30 minutes, and costs about $0.07 in test-specific consumable materials. Conclusions Our results validate the paper-based SCA test as a useful low-cost tool for screening adults and children for sickle trait and disease and demonstrate its practicality in resource-limited clinical settings. PMID:26735691
Jacob, Robin; Somers, Marie-Andree; Zhu, Pei; Bloom, Howard
2016-06-01
In this article, we examine whether a well-executed comparative interrupted time series (CITS) design can produce valid inferences about the effectiveness of a school-level intervention. This article also explores the trade-off between bias reduction and precision loss across different methods of selecting comparison groups for the CITS design and assesses whether choosing matched comparison schools based only on preintervention test scores is sufficient to produce internally valid impact estimates. We conduct a validation study of the CITS design based on the federal Reading First program as implemented in one state using results from a regression discontinuity design as a causal benchmark. Our results contribute to the growing base of evidence regarding the validity of nonexperimental designs. We demonstrate that the CITS design can, in our example, produce internally valid estimates of program impacts when multiple years of preintervention outcome data (test scores in the present case) are available and when a set of reasonable criteria are used to select comparison organizations (schools in the present case). © The Author(s) 2016.
Clinical evaluation of a new noninvasive ankle arthrometer.
Nauck, Tanja; Lohrer, Heinz; Gollhofer, Albert
2010-06-01
A nonradiographic arthrometer was developed to objectively quantify anterior talar drawer instability in stable and unstable ankles. Diagnostic validity of this device was previously demonstrated in a cadaver study. The aim of the present study was to validate the ankle arthrometer in an in vivo setting. Twenty-three subjects participated in the study. An orthopedic surgeon first performed a manual anterior talar drawer test to classify the subjects' ankles as stable or unstable. The subjects were then evaluated using the ankle arthrometer, and filled out a validated self-reported questionnaire (German version of the Foot and Ankle Ability Measure [FAAM-G]). Ankle stiffness was calculated from the low linear region (40-60 N) of the load deformation curves obtained from the ankle arthrometer. Reliability testing of these stiffness values was done based on load deformation curves, with 150 and 200 N maximum anterior drawer loads applied in the ankle arthrometer. Using the manual anterior drawer test, 16 ankles were classified as stable and 7 were classified as unstable. Arthrometer stiffness analysis differentiated stable from unstable ankles (P = 0.00 and P = 0.01, respectively). Test-retest analysis demonstrated good reliability (intraclass correlation coefficient = 0.80). A significant correlation was found between both FAAM-G subscales and the arthrometer stiffness values (r = 0.43 and 0.54; P = 0.04 and 0.01). Discussion Subjects with and without mechanical ankle instability could be differentiated by ankle arthrometer stiffness analysis and the FAAM-G questionnaire results. This nonradiographic device may be relevant for screening athletes at risk for ankle injuries, for clinical follow-up studies, and implementing preventive strategies. Validity and reliability of the new ankle arthrometer is demonstrated in a small cohort in an in vivo setting.
Validation of the Headache Impact Test (HIT-6) in patients with chronic migraine.
Rendas-Baum, Regina; Yang, Min; Varon, Sepideh F; Bloudek, Lisa M; DeGryse, Ronald E; Kosinski, Mark
2014-08-01
The Headache Impact Test (HIT)-6 was developed and has been validated in patients with various types of headache. The objective of this study was to report the psychometric properties of the HIT-6 among patients with chronic migraine. Data came from two international, multicenter, randomized, double-blind, placebo-controlled clinical trials of chronic migraine patients (N = 1,384) undergoing prophylaxis therapy. Confirmatory factor analysis and differential item functioning (DIF) analysis were used to test the latent structure and cross-cultural comparability of the HIT-6. Reliability, construct validity, and responsiveness were assessed. Two sets of criterion groups were used: (1) 28-day headache frequency: <10, 10-14, and ≥15 days; (2) sample quartiles of the total cumulative hours of headache: <140, 140 to <280, 280 to <420, and ≥420 hours. Two sets of responsiveness categories were defined as reduction of <30%, 30% to <50%, or ≥50% in (1) number of headache days and (2) cumulative hours of headache. Measurement invariance tests supported the stability of the HIT-6 latent structure across studies. DIF analysis supported cross-cultural comparability. Good reliability was observed across studies (Cronbach's α: 0.75-0.92; intraclass correlation coefficient: 0.76-0.80). HIT-6 scores correlated strongly (-0.86 to -0.59) with scores of the Migraine-Specific Quality-of-Life Questionnaire. Analysis of variance indicated that HIT-6 scores discriminated across both types of criterion groups (P<0.001), across studies and time points. HIT-6 change scores were significantly higher in magnitude in groups experiencing greater improvement (P<0.001). All measurement properties were consistently verified across the two studies, supporting the validity of the HIT-6 among chronic migraine patients. NCT00156910 and NCT00168428 on www.ClinicalTrials.gov.
(NTF) National Transonic Facility Test 213-SFW Flow Control II,
2012-11-19
(NTF) National Transonic Facility Test 213-SFW Flow Control II, Fast-MAC Model: The Fundamental Aerodynamics Subsonic Transonic-Modular Active Control (Fast-MAC) model was tested for the second time in the NTF. The objectives were to document the effects of Reynolds numbers on circulation control aerodynamics and to develop an open data set for CFD code validation. Image taken in building 1236, National Transonic Facility
Golder, Su; Wright, Kath; Loke, Yoon Kong
2018-06-01
Search filter development for adverse effects has tended to focus on retrieving studies of drug interventions. However, a different approach is required for surgical interventions. To develop and validate search filters for medline and Embase for the adverse effects of surgical interventions. Systematic reviews of surgical interventions where the primary focus was to evaluate adverse effect(s) were sought. The included studies within these reviews were divided randomly into a development set, evaluation set and validation set. Using word frequency analysis we constructed a sensitivity maximising search strategy and this was tested in the evaluation and validation set. Three hundred and fifty eight papers were included from 19 surgical intervention reviews. Three hundred and fifty two papers were available on medline and 348 were available on Embase. Generic adverse effects search strategies in medline and Embase could achieve approximately 90% relative recall. Recall could be further improved with the addition of specific adverse effects terms to the search strategies. We have derived and validated a novel search filter that has reasonable performance for identifying adverse effects of surgical interventions in medline and Embase. However, we appreciate the limitations of our methods, and recommend further research on larger sample sizes and prospective systematic reviews. © 2018 The Authors Health Information and Libraries Journal published by John Wiley & Sons Ltd on behalf of Health Libraries Group.
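Relative recall, the headline metric above, is a simple set calculation: the proportion of known relevant records (here, the included papers of the reference reviews) that the candidate filter retrieves. The identifiers in the sketch below are hypothetical.

```python
# Hypothetical record IDs: the gold standard is the set of included papers
# from the surgical systematic reviews; `retrieved` is what the candidate
# adverse-effects filter returns from the database.
gold_standard = {f"PMID{i}" for i in range(1, 349)}   # 348 known relevant papers
retrieved = {f"PMID{i}" for i in range(1, 315)}       # filter output

relative_recall = len(retrieved & gold_standard) / len(gold_standard)
print(f"Relative recall: {relative_recall:.1%}")      # ~90% in this toy example
```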
45 CFR 162.1011 - Valid code sets.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 45 Public Welfare 1 2012-10-01 2012-10-01 false Valid code sets. 162.1011 Section 162.1011 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES ADMINISTRATIVE DATA STANDARDS AND RELATED REQUIREMENTS ADMINISTRATIVE REQUIREMENTS Code Sets § 162.1011 Valid code sets. Each code set is valid within the dates...
45 CFR 162.1011 - Valid code sets.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 45 Public Welfare 1 2013-10-01 2013-10-01 false Valid code sets. 162.1011 Section 162.1011 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES ADMINISTRATIVE DATA STANDARDS AND RELATED REQUIREMENTS ADMINISTRATIVE REQUIREMENTS Code Sets § 162.1011 Valid code sets. Each code set is valid within the dates...
45 CFR 162.1011 - Valid code sets.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 45 Public Welfare 1 2010-10-01 2010-10-01 false Valid code sets. 162.1011 Section 162.1011 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES ADMINISTRATIVE DATA STANDARDS AND RELATED REQUIREMENTS ADMINISTRATIVE REQUIREMENTS Code Sets § 162.1011 Valid code sets. Each code set is valid within the dates...
45 CFR 162.1011 - Valid code sets.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 45 Public Welfare 1 2014-10-01 2014-10-01 false Valid code sets. 162.1011 Section 162.1011 Public Welfare Department of Health and Human Services ADMINISTRATIVE DATA STANDARDS AND RELATED REQUIREMENTS ADMINISTRATIVE REQUIREMENTS Code Sets § 162.1011 Valid code sets. Each code set is valid within the dates...
45 CFR 162.1011 - Valid code sets.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 45 Public Welfare 1 2011-10-01 2011-10-01 false Valid code sets. 162.1011 Section 162.1011 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES ADMINISTRATIVE DATA STANDARDS AND RELATED REQUIREMENTS ADMINISTRATIVE REQUIREMENTS Code Sets § 162.1011 Valid code sets. Each code set is valid within the dates...
TIE: an ability test of emotional intelligence.
Śmieja, Magdalena; Orzechowski, Jarosław; Stolarski, Maciej S
2014-01-01
The Test of Emotional Intelligence (TIE) is a new ability scale based on a theoretical model that defines emotional intelligence as a set of skills responsible for the processing of emotion-relevant information. Participants are provided with descriptions of emotional problems, and asked to indicate which emotion is most probable in a given situation, or to suggest the most appropriate action. Scoring is based on the judgments of experts: professional psychotherapists, trainers, and HR specialists. The validation study showed that the TIE is a reliable and valid test, suitable for both scientific research and individual assessment. Its internal consistency measures were as high as .88. In line with theoretical model of emotional intelligence, the results of the TIE shared about 10% of common variance with a general intelligence test, and were independent of major personality dimensions.
Land Ice Verification and Validation Kit
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-07-15
To address a pressing need to better understand the behavior and complex interaction of ice sheets within the global Earth system, significant development of continental-scale, dynamical ice-sheet models is underway. The associated verification and validation process of these models is being coordinated through a new, robust, python-based extensible software package, the Land Ice Verification and Validation toolkit (LIVV). This release provides robust and automated verification and a performance evaluation on LCF platforms. The performance V&V involves a comprehensive comparison of model performance relative to expected behavior on a given computing platform. LIVV operates on a set of benchmark and test data, and provides comparisons for a suite of community prioritized tests, including configuration and parameter variations, bit-4-bit evaluation, and plots of tests where differences occur.
Forecasting Advancement Rates to Petty Officer Third Class for U.S. Navy Hospital Corpsmen
2014-06-01
variable. c. Designation of Data Subsets for Cross-Validation In order to maintain the integrity of the analysis and test the fitted models’ predictive...two models, an H-L goodness-of-fit test is conducted on the 1,524 individual Sailors within the designated test data set; the results of which are...the total number of sea months, the proportion of vacancies to test takers, Armed Forces Qualification Test score, and performance mark average
IRIS thermal balance test within ESTEC LSS
NASA Technical Reports Server (NTRS)
Messidoro, Piero; Ballesio, Marino; Vessaz, J. P.
1988-01-01
The Italian Research Interim Stage (IRIS) thermal balance test was successfully performed in the ESTEC Large Space Simulator (LSS) to qualify the thermal design and to validate the thermal mathematical model. Characteristics of the test were the complexity of the set-up required to simulate the Shuttle cargo bay and to allow actuation of the IRIS mechanisms, and the operation of the new LSS facility for the first time. Details of the test are presented, and test results for IRIS and the LSS facility are described.
Vreugdenhil, Jettie; Spek, Bea
2018-03-01
Clinical reasoning in patient care is a skill that cannot be observed directly. So far, no reliable, valid instrument exists for the assessment of nursing students' clinical reasoning skills in hospital practice. Lasater's clinical judgment rubric (LCJR), based on Tanner's model "Thinking like a nurse", has been tested mainly in academic simulation settings. The aim is to develop a Dutch version of the LCJR (D-LCJR) and to test its psychometric properties when used in a hospital traineeship context. A mixed-model approach was used to develop and to validate the instrument. Ten dedicated educational units in a university hospital. A well-mixed group of 52 nursing students, nurse coaches and nurse educators. A Delphi panel developed the D-LCJR. Students' clinical reasoning skills were assessed "live" by nurse coaches, nurse educators and students who rated themselves. The psychometric properties tested during the assessment process were reliability, reproducibility, content validity and construct validity, the latter by testing two hypotheses: 1) a positive correlation between assessed and self-reported sum scores (convergent validity) and 2) a linear relation between experience and sum score (clinical validity). The obtained D-LCJR was found to be internally consistent, with a Cronbach's alpha of 0.93. The rubric is also reproducible, with intraclass correlations between 0.69 and 0.78. Experts judged it to be content valid. Both hypotheses were confirmed by statistically significant results, supporting construct validity. The translated and modified LCJR is a promising tool for the evaluation of nursing students' development in clinical reasoning in hospital traineeships, by students, nurse coaches and nurse educators. More evidence on construct validity is necessary, in particular for students at the end of their hospital traineeship. Based on our research, the D-LCJR applied in hospital traineeships is a usable and reliable tool. Copyright © 2017 Elsevier Ltd. All rights reserved.
Validation of the AVM Blast Computational Modeling and Simulation Tool Set
2015-08-04
by-construction" methodology is powerful and would not be possible without high -level design languages to support validation and verification. [1,4...to enable the making of informed design decisions. Enable rapid exploration of the design trade-space for high -fidelity requirements tradeoffs...live-fire tests, the jump height of the target structure is recorded by using either high speed cameras or a string pot. A simple projectile motion
ERIC Educational Resources Information Center
Williams, Harriet G.; Pfeiffer, Karin A.; Dowda, Marsha; Jeter, Chevy; Jones, Shaverra; Pate, Russell R.
2009-01-01
The purpose of this study was to develop a valid and reliable tool for use in assessing motor skills in preschool children in field-based settings. The development of the Children's Activity and Movement in Preschool Study Motor Skills Protocol included evidence of its reliability and validity for use in field-based environments as part of large…
Establishing Qualitative Software Metrics in Department of the Navy Programs
2015-10-29
dedicated to providing the highest quality software to its users. In doing so, there is a need for a formalized set of Software Quality Metrics. The goal...of this paper is to establish the validity of those necessary quality metrics. In our approach we collected the data of over a dozen programs...provide the necessary variable data for our formulas and tested the formulas for validity. Keywords: metrics; software; quality I. PURPOSE Space
Kenyon, Lisa K.; Elliott, James M; Cheng, M. Samuel
2016-01-01
Purpose/Background Despite the availability of various field-tests for many competitive sports, a reliable and valid test specifically developed for use in men's gymnastics has not yet been developed. The Men's Gymnastics Functional Measurement Tool (MGFMT) was designed to assess sport-specific physical abilities in male competitive gymnasts. The purpose of this study was to develop the MGFMT by establishing a scoring system for individual test items and to initiate the process of establishing test-retest reliability and construct validity. Methods A total of 83 competitive male gymnasts ages 7-18 underwent testing using the MGFMT. Thirty of these subjects underwent re-testing one week later in order to assess test-retest reliability. Construct validity was assessed using a simple regression analysis between total MGFMT scores and the gymnasts’ USA-Gymnastics competitive level to calculate the coefficient of determination (r2). Test-retest reliability was analyzed using Model 1 Intraclass correlation coefficients (ICC). Statistical significance was set at the p<0.05 level. Results The relationship between total MGFMT scores and subjects’ current USA-Gymnastics competitive level was found to be good (r2 = 0.63). Reliability testing of the MGFMT composite test score showed excellent test-retest reliability over a one-week period (ICC = 0.97). Test-retest reliability of the individual component tests ranged from good to excellent (ICC = 0.75-0.97). Conclusions The results of this study provide initial support for the construct validity and test-retest reliability of the MGFMT. Level of Evidence Level 3 PMID:27999723
Neural Network Prediction of New Aircraft Design Coefficients
NASA Technical Reports Server (NTRS)
Norgaard, Magnus; Jorgensen, Charles C.; Ross, James C.
1997-01-01
This paper discusses a neural network tool for more effective aircraft design evaluations during wind tunnel tests. Using a hybrid neural network optimization method, we have produced fast and reliable predictions of aerodynamic coefficients, found optimal flap settings and flap schedules. For validation, the tool was tested on a 55% scale model of the USAF/NASA Subsonic High Alpha Research Concept aircraft (SHARC). Four different networks were trained to predict the coefficients of lift, drag, and pitching moment and the lift-to-drag ratio (C_L, C_D, C_M, and L/D) from angle of attack and flap settings. The latter network was then used to determine an overall optimal flap setting and to find optimal flap schedules.
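The abstract above describes mapping angle of attack and flap settings to aerodynamic responses with a feedforward network and then searching the fitted surface for an optimal flap setting. The sketch below illustrates that general idea only; it is not the authors' hybrid optimization method, and the data arrays, network size, and grid search are illustrative assumptions.

```python
# Minimal sketch (not the authors' hybrid method): fit a small feedforward
# network that maps [angle of attack, flap setting] to a response such as L/D,
# then search the fitted surface for the flap setting that maximizes it.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical wind-tunnel samples: [angle_of_attack_deg, flap_deg]
X = rng.uniform([-4.0, 0.0], [20.0, 40.0], size=(500, 2))
# Stand-in response; a real study would use measured L/D values.
y = np.sin(np.radians(X[:, 0])) * 6.0 - 0.002 * (X[:, 1] - 15.0) ** 2

model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
model.fit(X, y)

# Coarse grid search over flap settings at a fixed angle of attack.
flaps = np.linspace(0.0, 40.0, 81)
grid = np.column_stack([np.full_like(flaps, 8.0), flaps])
best_flap = flaps[np.argmax(model.predict(grid))]
print(f"Predicted optimal flap setting at 8 deg AoA: {best_flap:.1f} deg")
```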
Davies, Kylie; Bulsara, Max K; Ramelet, Anne-Sylvie; Monterosso, Leanne
2018-05-01
To establish criterion-related construct validity and test-retest reliability for the Endotracheal Suction Assessment Tool© (ESAT©). Endotracheal tube suction performed in children can significantly affect clinical stability. Previously identified clinical indicators for endotracheal tube suction were used as criteria when designing the ESAT©. Content validity was reported previously. The final stages of psychometric testing are presented. Observational testing was used to measure construct validity and determine whether the ESAT© could guide "inexperienced" paediatric intensive care nurses' decision-making regarding endotracheal tube suction. Test-retest reliability of the ESAT© was performed at two time points. The researchers and paediatric intensive care nurse "experts" developed 10 hypothetical clinical scenarios with predetermined endotracheal tube suction outcomes. "Experienced" (n = 12) and "inexperienced" (n = 14) paediatric intensive care nurses were presented with the scenarios and the ESAT© guiding decision-making about whether to perform endotracheal tube suction for each scenario. Outcomes were compared with those predetermined by the "experts" (n = 9). Test-retest reliability of the ESAT© was measured at two consecutive time points (4 weeks apart) with "experienced" and "inexperienced" paediatric intensive care nurses using the same scenarios and tool to guide decision-making. No differences were observed between endotracheal tube suction decisions made by "experts" (n = 9), "inexperienced" (n = 14) and "experienced" (n = 12) nurses confirming the tool's construct validity. No differences were observed between groups for endotracheal tube suction decisions at T1 and T2. Criterion-related construct validity and test-retest reliability of the ESAT© were demonstrated. Further testing is recommended to confirm reliability in the clinical setting with the "inexperienced" nurse to guide decision-making related to endotracheal tube suction. The ESAT© is the first validated tool to systematically guide endotracheal nursing practice for the "inexperienced" nurse. © 2018 John Wiley & Sons Ltd.
Can We Study Autonomous Driving Comfort in Moving-Base Driving Simulators? A Validation Study.
Bellem, Hanna; Klüver, Malte; Schrauf, Michael; Schöner, Hans-Peter; Hecht, Heiko; Krems, Josef F
2017-05-01
To lay the basis for studying autonomous driving comfort using driving simulators, we assessed the behavioral validity of two moving-base simulator configurations by contrasting them with a test-track setting. With increasing levels of automation, driving comfort becomes increasingly important. Simulators provide a safe environment to study perceived comfort in autonomous driving. To date, however, no studies have been conducted on comfort in autonomous driving to determine the extent to which results from simulator studies can be transferred to on-road driving conditions. Participants (N = 72) experienced six differently parameterized lane-change and deceleration maneuvers and subsequently rated the comfort of each scenario. One group of participants experienced the maneuvers in a test-track setting, whereas two other groups experienced them in one of two moving-base simulator configurations. We could demonstrate relative and absolute validity for one of the two simulator configurations. Subsequent analyses revealed that the validity of the simulator highly depends on the parameterization of the motion system. Moving-base simulation can be a useful research tool to study driving comfort in autonomous vehicles. However, our results point to a preference for subunity scaling factors for both lateral and longitudinal motion cues, which might be explained by an underestimation of speed in virtual environments. In line with previous studies, we recommend lateral- and longitudinal-motion scaling factors of approximately 50% to 60% in order to obtain valid results for both active and passive driving tasks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kennedy, Joseph H.; Bennett, Andrew R.; Evans, Katherine J.
To address the pressing need to better understand the behavior and complex interaction of ice sheets within the global Earth system, significant development of continental-scale, dynamical ice sheet models is underway. Concurrent to the development of the Community Ice Sheet Model (CISM), the corresponding verification and validation (V&V) process is being coordinated through a new, robust, Python-based extensible software package, the Land Ice Verification and Validation toolkit (LIVVkit). Incorporated into the typical ice sheet model development cycle, it provides robust and automated numerical verification, software verification, performance validation, and physical validation analyses on a variety of platforms, from personal laptops to the largest supercomputers. LIVVkit operates on sets of regression test and reference data sets, and provides comparisons for a suite of community prioritized tests, including configuration and parameter variations, bit-for-bit evaluation, and plots of model variables to indicate where differences occur. LIVVkit also provides an easily extensible framework to incorporate and analyze results of new intercomparison projects, new observation data, and new computing platforms. LIVVkit is designed for quick adaptation to additional ice sheet models via abstraction of model specific code, functions, and configurations into an ice sheet model description bundle outside the main LIVVkit structure. Furthermore, through shareable and accessible analysis output, LIVVkit is intended to help developers build confidence in their models and enhance the credibility of ice sheet models overall.
Kennedy, Joseph H.; Bennett, Andrew R.; Evans, Katherine J.; ...
2017-03-23
To address the pressing need to better understand the behavior and complex interaction of ice sheets within the global Earth system, significant development of continental-scale, dynamical ice sheet models is underway. Concurrent to the development of the Community Ice Sheet Model (CISM), the corresponding verification and validation (V&V) process is being coordinated through a new, robust, Python-based extensible software package, the Land Ice Verification and Validation toolkit (LIVVkit). Incorporated into the typical ice sheet model development cycle, it provides robust and automated numerical verification, software verification, performance validation, and physical validation analyses on a variety of platforms, from personal laptops to the largest supercomputers. LIVVkit operates on sets of regression test and reference data sets, and provides comparisons for a suite of community prioritized tests, including configuration and parameter variations, bit-for-bit evaluation, and plots of model variables to indicate where differences occur. LIVVkit also provides an easily extensible framework to incorporate and analyze results of new intercomparison projects, new observation data, and new computing platforms. LIVVkit is designed for quick adaptation to additional ice sheet models via abstraction of model specific code, functions, and configurations into an ice sheet model description bundle outside the main LIVVkit structure. Furthermore, through shareable and accessible analysis output, LIVVkit is intended to help developers build confidence in their models and enhance the credibility of ice sheet models overall.
Validation of a clinical assessment of spectral-ripple resolution for cochlear implant users.
Drennan, Ward R; Anderson, Elizabeth S; Won, Jong Ho; Rubinstein, Jay T
2014-01-01
Nonspeech psychophysical tests of spectral resolution, such as the spectral-ripple discrimination task, have been shown to correlate with speech-recognition performance in cochlear implant (CI) users. However, these tests are best suited for use in the research laboratory setting and are impractical for clinical use. A test of spectral resolution that is quicker and could more easily be implemented in the clinical setting has been developed. The objectives of this study were (1) To determine whether this new clinical ripple test would yield individual results equivalent to the longer, adaptive version of the ripple-discrimination test; (2) To evaluate test-retest reliability for the clinical ripple measure; and (3) To examine the relationship between clinical ripple performance and monosyllabic word recognition in quiet for a group of CI listeners. Twenty-eight CI recipients participated in the study. Each subject was tested on both the adaptive and the clinical versions of spectral ripple discrimination, as well as consonant-nucleus-consonant word recognition in quiet. The adaptive version of spectral ripple used a two-up, one-down procedure for determining spectral ripple discrimination threshold. The clinical ripple test used a method of constant stimuli, with trials for each of 12 fixed ripple densities occurring six times in random order. Results from the clinical ripple test (proportion correct) were then compared with ripple-discrimination thresholds (in ripples per octave) from the adaptive test. The clinical ripple test showed strong concurrent validity, evidenced by a good correlation between clinical ripple and adaptive ripple results (r = 0.79), as well as a correlation with word recognition (r = 0.7). Excellent test-retest reliability was also demonstrated with a high test-retest correlation (r = 0.9). The clinical ripple test is a reliable nonlinguistic measure of spectral resolution, optimized for use with CI users in a clinical setting. The test might be useful as a diagnostic tool or as a possible surrogate outcome measure for evaluating treatment effects in hearing.
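The adaptive version of the ripple test described above uses a two-up, one-down procedure to find the discrimination threshold. As a rough illustration of how such a staircase converges, the sketch below simulates a listener and tracks reversals in ripple density; the simulated listener, step size, and stopping rule are assumptions made for illustration, not the published procedure.

```python
# Minimal sketch of a two-up/one-down adaptive staircase: two consecutive
# correct answers raise the ripple density (harder), one error lowers it
# (easier), and the threshold is taken as the mean density at the final
# reversals. The listener model and parameters are illustrative assumptions.
import random

def simulated_listener(density, true_threshold=3.0):
    """Answer correctly more often when the ripple density is below threshold."""
    p_correct = 0.99 if density < true_threshold else 0.55
    return random.random() < p_correct

def two_up_one_down(start=0.5, step=1.414, max_reversals=8):
    density, correct_streak, direction = start, 0, +1
    reversals = []
    while len(reversals) < max_reversals:
        if simulated_listener(density):
            correct_streak += 1
            if correct_streak == 2:          # two correct -> harder
                correct_streak = 0
                if direction < 0:
                    reversals.append(density)
                direction = +1
                density *= step
        else:                                # one error -> easier
            correct_streak = 0
            if direction > 0:
                reversals.append(density)
            direction = -1
            density /= step
    return sum(reversals[-6:]) / len(reversals[-6:])

random.seed(1)
print(f"Estimated threshold: {two_up_one_down():.2f} ripples/octave")
```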
Quantum efficiency test set up performances for NIR detector characterization at ESTEC
NASA Astrophysics Data System (ADS)
Crouzet, P.-E.; Duvet, L.; De Wit, F.; Beaufort, T.; Blommaert, S.; Butler, B.; Van Duinkerken, G.; ter Haar, J.; Heijnen, J.; van der Luijt, K.; Smit, H.; Viale, T.
2014-07-01
The Payload Technology Validation Section (Future Mission Preparation Office) at ESTEC is in charge of specific mission-oriented validation activities, for science and robotic exploration missions, aiming at reducing development risks in the implementation phase. These activities take place during the early mission phases or during the implementation itself. In this framework, a test set-up to characterize the quantum efficiency of near-infrared detectors has been developed. The first detector to be tested will be a HAWAII-2RG detector with a 2.5 μm cut-off; it will be used as a commissioning device in preparation for the tests of prototype European detectors developed under ESA funding. The capability to compare detectors from different manufacturers on the same set-up will be a unique asset for the Future Mission Preparation Office. This publication presents the performance of the quantum efficiency test bench in preparation for measurements on the HAWAII-2RG detector. A SOFRADIR Saturn detector has been used as a preliminary test vehicle for the bench. A test set-up with a lamp, chopper, monochromator, pinhole and off-axis mirrors creates a spot of 1 mm diameter between 700 nm and 2.5 μm. The shape of the beam has been measured to match the rms voltage read by the Merlin lock-in amplifier and the amplitude of the incoming signal. The reference detectors have been inter-calibrated with an uncertainty of up to 3%. For the measurement with the HAWAII-2RG detector, the existing cryostat [1] has been modified to accommodate cold black baffling, a cold filter wheel and a sapphire window. A statistical uncertainty of ±2.6% on the quantum efficiency measurement of the detector under test is expected.
Purcell, Maureen K.; Getchell, Rodman G.; McClure, Carol A.; Weber, S.E.; Garver, Kyle A.
2011-01-01
Real-time, or quantitative, polymerase chain reaction (qPCR) is quickly supplanting other molecular methods for detecting the nucleic acids of human and other animal pathogens owing to the speed and robustness of the technology. As the aquatic animal health community moves toward implementing national diagnostic testing schemes, it will need to evaluate how qPCR technology should be employed. This review outlines the basic principles of qPCR technology, considerations for assay development, standards and controls, assay performance, diagnostic validation, implementation in the diagnostic laboratory, and quality assurance and control measures. These factors are fundamental for ensuring the validity of qPCR assay results obtained in the diagnostic laboratory setting.
Diagnostic accuracy of a two-item Drug Abuse Screening Test (DAST-2).
Tiet, Quyen Q; Leyva, Yani E; Moos, Rudolf H; Smith, Brandy
2017-11-01
Drug use is prevalent and costly to society, but individuals with drug use disorders (DUDs) are under-diagnosed and under-treated, particularly in primary care (PC) settings. Drug screening instruments have been developed to identify patients with DUDs and facilitate treatment. The Drug Abuse Screening Test (DAST) is one of the most well-known drug screening instruments. However, similar to many such instruments, it is too long for routine use in busy PC settings. This study developed and validated a briefer and more practical DAST for busy PC settings. We recruited 1300 PC patients in two Department of Veterans Affairs (VA) clinics. Participants responded to a structured diagnostic interview. We randomly selected half of the sample to develop and the other half to validate the new instrument. We employed signal detection techniques to select the best DAST items to identify DUDs (based on the MINI) and negative consequences of drug use (measured by the Inventory of Drug Use Consequences). Performance indicators were calculated. The two-item DAST (DAST-2) was 97% sensitive and 91% specific for DUDs in the development sample and 95% sensitive and 89% specific in the validation sample. It was highly sensitive and specific for DUDs and negative consequences of drug use in subgroups of patients defined by gender, age, race/ethnicity, marital status, educational level, and posttraumatic stress disorder status. The DAST-2 is an appropriate drug screening instrument for routine use in PC settings in the VA and may be applicable in a broader range of PC clinics. Published by Elsevier Ltd.
Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data
Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark
2010-01-01
Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training set voltammetric matrix during model development; however, this approach suffers from several disadvantages, including the use of arbitrary percentages and the requirement of extreme precision of training sets. Here we propose that Malinowski's F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol, including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski's F-test for rank estimation of in vivo training sets allowed noise to be removed more accurately. Malinowski's F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the main noise carrier of in vivo training sets, while dopamine prediction was more sensitive to noise. PMID:20527815
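The abstract above contrasts the traditional percent-cumulative-variance criterion with Malinowski's F-test for choosing how many principal components to retain. The sketch below implements only the cumulative-variance criterion on a synthetic voltammetric training matrix; the matrix shape, the 99.5% cutoff, and the two-component structure are illustrative assumptions, and Malinowski's F-test (which instead tests each reduced eigenvalue against the pooled noise eigenvalues) is only indicated in a comment.

```python
# Minimal sketch of the percent-cumulative-variance rank criterion. A faithful
# comparison would add Malinowski's F-test on the reduced eigenvalues; here it
# is only noted, since its exact form is not reproduced in this sketch.
import numpy as np

def rank_by_cumulative_variance(training_matrix, cutoff=0.995):
    """Return the number of principal components needed to reach `cutoff`."""
    centered = training_matrix - training_matrix.mean(axis=0)
    # Squared singular values are proportional to the PCA eigenvalues.
    eigvals = np.linalg.svd(centered, compute_uv=False) ** 2
    explained = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(explained, cutoff) + 1)

rng = np.random.default_rng(0)
# Hypothetical training set: 30 background-subtracted cyclic voltammograms
# built from two underlying components (dopamine- and pH-like) plus noise.
components = rng.normal(size=(2, 1000))
concentrations = rng.uniform(0.0, 1.0, size=(30, 2))
data = concentrations @ components + 0.01 * rng.normal(size=(30, 1000))
print("Retained factors:", rank_by_cumulative_variance(data))
```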
DOE Office of Scientific and Technical Information (OSTI.GOV)
Valencia, Antoni; Prous, Josep; Mora, Oscar
As indicated in ICH M7 draft guidance, in silico predictive tools including statistically-based QSARs and expert analysis may be used as a computational assessment for bacterial mutagenicity for the qualification of impurities in pharmaceuticals. To address this need, we developed and validated a QSAR model to predict Salmonella t. mutagenicity (Ames assay outcome) of pharmaceutical impurities using Prous Institute's Symmetry℠, a new in silico solution for drug discovery and toxicity screening, and the Mold2 molecular descriptor package (FDA/NCTR). Data was sourced from public benchmark databases with known Ames assay mutagenicity outcomes for 7300 chemicals (57% mutagens). Of these data, 90% was used to train the model and the remaining 10% was set aside as a holdout set for validation. The model's applicability to drug impurities was tested using a FDA/CDER database of 951 structures, of which 94% were found within the model's applicability domain. The predictive performance of the model is acceptable for supporting regulatory decision-making with 84 ± 1% sensitivity, 81 ± 1% specificity, 83 ± 1% concordance and 79 ± 1% negative predictivity based on internal cross-validation, while the holdout dataset yielded 83% sensitivity, 77% specificity, 80% concordance and 78% negative predictivity. Given the importance of having confidence in negative predictions, an additional external validation of the model was also carried out, using marketed drugs known to be Ames-negative, and obtained 98% coverage and 81% specificity. Additionally, Ames mutagenicity data from FDA/CFSAN was used to create another data set of 1535 chemicals for external validation of the model, yielding 98% coverage, 73% sensitivity, 86% specificity, 81% concordance and 84% negative predictivity. - Highlights: • A new in silico QSAR model to predict Ames mutagenicity is described. • The model is extensively validated with chemicals from the FDA and the public domain. • Validation tests show desirable high sensitivity and high negative predictivity. • The model predicted 14 reportedly difficult to predict drug impurities with accuracy. • The model is suitable to support risk evaluation of potentially mutagenic compounds.
Garvin, Jennifer H; DuVall, Scott L; South, Brett R; Bray, Bruce E; Bolton, Daniel; Heavirland, Julia; Pickard, Steve; Heidenreich, Paul; Shen, Shuying; Weir, Charlene; Samore, Matthew; Goldstein, Mary K
2012-01-01
Left ventricular ejection fraction (EF) is a key component of heart failure quality measures used within the Department of Veteran Affairs (VA). Our goals were to build a natural language processing system to extract the EF from free-text echocardiogram reports to automate measurement reporting and to validate the accuracy of the system using a comparison reference standard developed through human review. This project was a Translational Use Case Project within the VA Consortium for Healthcare Informatics. We created a set of regular expressions and rules to capture the EF using a random sample of 765 echocardiograms from seven VA medical centers. The documents were randomly assigned to two sets: a set of 275 used for training and a second set of 490 used for testing and validation. To establish the reference standard, two independent reviewers annotated all documents in both sets; a third reviewer adjudicated disagreements. System test results for document-level classification of EF of <40% had a sensitivity (recall) of 98.41%, a specificity of 100%, a positive predictive value (precision) of 100%, and an F measure of 99.2%. System test results at the concept level had a sensitivity of 88.9% (95% CI 87.7% to 90.0%), a positive predictive value of 95% (95% CI 94.2% to 95.9%), and an F measure of 91.9% (95% CI 91.2% to 92.7%). An EF value of <40% can be accurately identified in VA echocardiogram reports. An automated information extraction system can be used to accurately extract EF for quality measurement.
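The system described above infers ejection fraction values from free-text echocardiogram reports using regular expressions and rules. The sketch below shows a pattern of that general kind; the expressions, the example sentence, and the document-level <40% rule are illustrative assumptions, not the validated VA rule set.

```python
# Minimal sketch: pull an EF percentage out of free-text echocardiogram
# sentences and flag documents with EF < 40%. Patterns are illustrative.
import re

EF_PATTERN = re.compile(
    r"(?:ejection\s+fraction|LVEF|EF)\s*(?:is|of|=|:)?\s*"
    r"(\d{1,2}(?:\.\d)?)\s*(?:-|to)?\s*(\d{1,2}(?:\.\d)?)?\s*%",
    re.IGNORECASE,
)

def extract_ef(report_text):
    """Return all EF percentages mentioned in a report (low end of any range)."""
    return [float(m.group(1)) for m in EF_PATTERN.finditer(report_text)]

def ef_below_40(report_text):
    values = extract_ef(report_text)
    return bool(values) and min(values) < 40.0

report = "Mild LV dilation. Estimated ejection fraction is 30-35%."
print(extract_ef(report), ef_below_40(report))   # [30.0] True
```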
Health Auctions: a Valuation Experiment (HAVE) study protocol.
Kularatna, Sanjeewa; Petrie, Dennis; Scuffham, Paul A; Byrnes, Joshua
2016-04-07
Quality-adjusted life years are derived using health state utility weights which adjust for the relative value of living in each health state compared with living in perfect health. Various techniques are used to estimate health state utility weights including time-trade-off and standard gamble. These methods have exhibited limitations in terms of complexity, validity and reliability. A new composite approach using experimental auctions to value health states is introduced in this protocol. A pilot study will test the feasibility and validity of using experimental auctions to value health states in monetary terms. A convenient sample (n=150) from a population of university staff and students will be invited to participate in 30 auction sets with a group of 5 people in each set. The 9 health states auctioned in each auction set will come from the commonly used EQ-5D-3L instrument. At most participants purchase 2 health states, and the participant who acquires the 2 'best' health states on average will keep the amount of money they do not spend in acquiring those health states. The value (highest bid and average bid) of each of the 24 health states will be compared across auctions to test for reliability across auction groups and across auctioneers. A test retest will be conducted for 10% of the sample to assess reliability of responses for health states auctions. Feasibility of conducting experimental auctions to value health states will also be examined. The validity of estimated health states values will be compared with published utility estimates from other methods. This pilot study will explore the feasibility, reliability and validity in using experimental auction for valuing health states. Ethical clearance was obtained from Griffith University ethics committee. The results will be disseminated in peer-reviewed journals and major international conferences. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Jeong, Yunwha; Law, Mary; Stratford, Paul; DeMatteo, Carol; Kim, Hwan
2016-11-01
To develop the Korean version of the Participation and Environment Measure for Children and Youth (KPEM-CY) and examine its psychometric properties. The PEM-CY was cross-culturally translated into Korean using a specific guideline: pre-review of participation items, forward/backward translation, expert committee review, pre-test of the KPEM-CY and final review. To establish internal consistency, test-retest reliability and construct validity of the KPEM-CY, 80 parents of children with disabilities aged 5-13 years were recruited in South Korea. Across the home, school and community settings, 76% of participation items and 29% of environment items were revised to improve their fit with Korean culture. Internal consistency was moderate to excellent (0.67-0.92) for different summary scores. Test-retest reliability was excellent (>0.75) in the summary scores of participation frequency and extent of involvement across the three settings and moderate to excellent (0.53-0.95) in all summary scores at home. Child's age, type of school and annual income were the factors that significantly influenced specific dimensions of participation and environment across all settings. Results indicated that the KPEM-CY is equivalent to the original PEM-CY and has initial evidence of reliability and validity for use with Korean children with disabilities. Implications for rehabilitation Because 'participation' is a key outcome of the rehabilitation, measuring comprehensive participation of children with disabilities is necessary. The PEM-CY is a parent-report survey measure to assess comprehensive participation of children and youth and environment, which affect their participation, at home, school and in the community. A cross-cultural adaptation process is mandatory to adapt the measurement tool to a new culture or country. The Korean PEM-CY has both reliability and validity and can therefore generate useful clinical data for Korean children with disabilities.
The Reliability and Validity of Measures of Gait Variability in Community-Dwelling Older Adults
Brach, Jennifer S.; Perera, Subashan; Studenski, Stephanie; Newman, Anne B.
2009-01-01
Objective To examine the test-retest reliability and concurrent validity of variability of gait characteristics. Design Cross-sectional study. Setting Research laboratory. Participants Older adults (N=558) from the Cardiovascular Health Study. Interventions Not applicable. Main Outcome Measures Gait characteristics were measured using a 4-m computerized walkway. SDs determined from the recorded steps were used as the measures of variability. Intraclass correlation coefficients (ICC) were calculated to examine the test-retest reliability of a single 4-m walk and of two 4-m walks. To establish concurrent validity, the measures of gait variability were compared across levels of health, functional status, and physical activity using independent t tests and analyses of variance. Results Gait variability measures from the two 4-m walks demonstrated greater test-retest reliability than those from the single 4-m walk (ICC=.40–.63 and ICC=.22–.48, respectively). Greater step length and stance time variability were associated with poorer health, functional status and physical activity (P<.05). Conclusions Gait variability calculated from a limited number of steps has fair to good test-retest reliability and concurrent validity. Reliability of gait variability calculated from a greater number of steps should be assessed to determine if the consistency can be improved. PMID:19061741
Reducing animal experimentation in foot-and-mouth disease vaccine potency tests.
Reeve, Richard; Cox, Sarah; Smitsaart, Eliana; Beascoechea, Claudia Perez; Haas, Bernd; Maradei, Eduardo; Haydon, Daniel T; Barnett, Paul
2011-07-26
The World Organisation for Animal Health (OIE) Terrestrial Manual and the European Pharmacopoeia (EP) still prescribe live challenge experiments for foot-and-mouth disease virus (FMDV) immunogenicity and vaccine potency tests. However, the EP allows for other validated tests for the latter, and specifically in vitro tests if a "satisfactory pass level" has been determined; serological replacements are also currently in use in South America. Much research has therefore focused on validating both ex vivo and in vitro tests to replace live challenge. However, insufficient attention has been given to the sensitivity and specificity of the "gold standard" in vivo test being replaced, despite this information being critical to determining what should be required of its replacement. This paper aims to redress this imbalance by examining the current live challenge tests and their associated statistics and determining the confidence that we can have in them, thereby setting a standard for candidate replacements. It determines that the statistics associated with the current EP PD50 test are inappropriate given our domain knowledge, but that the OIE test statistics are satisfactory. However, it has also identified a new set of live animal challenge test regimes that provide similar sensitivity and specificity to all of the currently used OIE tests using fewer animals (16 including controls), and can also provide further savings in live animal experiments in exchange for small reductions in sensitivity and specificity. Copyright © 2011 Elsevier Ltd. All rights reserved.
DOT National Transportation Integrated Search
2014-10-01
This research program develops and validates structural design guidelines and details for concrete bridge decks with corrosion-resistant reinforcing (CRR) bars. A two-phase experimental program was conducted where a control test set consistent wi...
Long term validation of an accelerated polishing test procedure for HMA pavements.
DOT National Transportation Integrated Search
2013-04-01
The Ohio Department of Transportation (ODOT) has set strategic goals to improve driving safety by maintaining smooth pavement surfaces with high skid resistance. ODOT has taken the initiative to monitor pavement friction on Ohio roadways and reme...
Lienemann, Kai; Plötz, Thomas; Pestel, Sabine
2008-01-01
The aim of safety pharmacology is early detection of compound-induced side-effects. NMR-based urine analysis followed by multivariate data analysis (metabonomics) efficiently identifies differences between toxic and non-toxic compounds, but in most cases multiple administrations of the test compound are necessary. We tested the feasibility of detecting proximal tubule kidney toxicity and phospholipidosis with metabonomics techniques after single compound administration as an early safety pharmacology approach. Rats were treated orally, intravenously, inhalatively or intraperitoneally with different test compounds. Urine was collected at 0-8 h and 8-24 h after compound administration, and ¹H NMR patterns were recorded from the samples. Variation of post-processing and feature extraction methods led to different views on the data. Support Vector Machines were trained on these different data sets and then aggregated as experts in an ensemble. Finally, validity was monitored with a cross-validation study using a training, validation, and test data set. Proximal tubule kidney toxicity could be predicted with reasonable total classification accuracy (85%), specificity (88%) and sensitivity (78%). In comparison to alternative histological studies, results were obtained more quickly, compound need was reduced, and, very importantly, fewer animals were needed. In contrast, the induction of phospholipidosis by the test compounds could not be predicted using NMR-based urine analysis or the previously published biomarker PAG. NMR-based urine analysis was shown to effectively predict proximal tubule kidney toxicity after single compound administration in rats. Thus, this experimental design allows early detection of toxicity risks with relatively low amounts of compound in a reasonably short period of time.
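The study above trains several Support Vector Machines and aggregates them as experts in an ensemble, with validity checked by cross-validation. A minimal sketch of that workflow is shown below, assuming synthetic binned spectra and labels; in the study each expert saw a differently post-processed view of the data, whereas here, for brevity, the experts differ only in kernel settings.

```python
# Minimal sketch: an ensemble of SVM "experts" combined by soft voting and
# evaluated with cross-validation. Data shapes and labels are illustrative.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

rng = np.random.default_rng(0)
n_samples, n_bins = 120, 200
spectra = rng.normal(size=(n_samples, n_bins))   # binned 1H NMR urine spectra
labels = rng.integers(0, 2, size=n_samples)      # 1 = proximal tubule toxicant
spectra[labels == 1, :20] += 1.0                 # toy class difference

experts = [
    ("rbf_svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True))),
    ("linear_svm", make_pipeline(StandardScaler(), SVC(kernel="linear", C=0.5, probability=True))),
]
ensemble = VotingClassifier(estimators=experts, voting="soft")
scores = cross_val_score(ensemble, spectra, labels, cv=5)
print(f"Cross-validated accuracy: {scores.mean():.2f}")
```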
Erdodi, Laszlo A; Sagar, Sanya; Seke, Kristian; Zuccato, Brandon G; Schwartz, Eben S; Roth, Robert M
2018-06-01
This study was designed to develop performance validity indicators embedded within the Delis-Kaplan Executive Function System (D-KEFS) version of the Stroop task. Archival data from a mixed clinical sample of 132 patients (50% male; M Age = 43.4; M Education = 14.1) clinically referred for neuropsychological assessment were analyzed. Criterion measures included the Warrington Recognition Memory Test-Words and 2 composites based on several independent validity indicators. An age-corrected scaled score ≤6 on any of the 4 trials reliably differentiated psychometrically defined credible and noncredible response sets with high specificity (.87-.94) and variable sensitivity (.34-.71). An inverted Stroop effect was less sensitive (.14-.29), but comparably specific (.85-.90), to invalid performance. Aggregating the newly developed D-KEFS Stroop validity indicators further improved classification accuracy. Failing the validity cutoffs was unrelated to self-reported depression or anxiety. However, it was associated with elevated somatic symptom report. In addition to processing speed and executive function, the D-KEFS version of the Stroop task can function as a measure of performance validity. A multivariate approach to performance validity assessment is generally superior to univariate models. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Wright, Adam; Pang, Justine; Feblowitz, Joshua C; Maloney, Francine L; Wilcox, Allison R; Ramelson, Harley Z; Schneider, Louise I; Bates, David W
2011-01-01
Accurate knowledge of a patient's medical problems is critical for clinical decision making, quality measurement, research, billing and clinical decision support. Common structured sources of problem information include the patient problem list and billing data; however, these sources are often inaccurate or incomplete. To develop and validate methods of automatically inferring patient problems from clinical and billing data, and to provide a knowledge base for inferring problems. We identified 17 target conditions and designed and validated a set of rules for identifying patient problems based on medications, laboratory results, billing codes, and vital signs. A panel of physicians provided input on a preliminary set of rules. Based on this input, we tested candidate rules on a sample of 100,000 patient records to assess their performance compared to gold standard manual chart review. The physician panel selected a final rule for each condition, which was validated on an independent sample of 100,000 records to assess its accuracy. Seventeen rules were developed for inferring patient problems. Analysis using a validation set of 100,000 randomly selected patients showed high sensitivity (range: 62.8-100.0%) and positive predictive value (range: 79.8-99.6%) for most rules. Overall, the inference rules performed better than using either the problem list or billing data alone. We developed and validated a set of rules for inferring patient problems. These rules have a variety of applications, including clinical decision support, care improvement, augmentation of the problem list, and identification of patients for research cohorts.
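The abstract above describes rules that infer patient problems from medications, laboratory results, billing codes, and vital signs, scored against manual chart review. A minimal sketch of one such rule and of the sensitivity/positive-predictive-value calculation is given below; the thresholds, code lists, and field names are hypothetical, not the validated rules from the study.

```python
# Minimal sketch: infer a candidate problem-list entry from structured data,
# then score the rule against a chart-review gold standard. All thresholds
# and code lists here are hypothetical.
def infers_diabetes(patient):
    """Toy rule: any one of medication, lab, or billing evidence."""
    on_med = any(m in patient.get("medications", []) for m in ("metformin", "insulin"))
    high_a1c = patient.get("hba1c", 0.0) >= 6.5
    billed = any(code.startswith("250") for code in patient.get("icd9_codes", []))
    return on_med or high_a1c or billed

def sensitivity_ppv(patients, gold_standard, rule):
    tp = sum(1 for p, truth in zip(patients, gold_standard) if rule(p) and truth)
    fp = sum(1 for p, truth in zip(patients, gold_standard) if rule(p) and not truth)
    fn = sum(1 for p, truth in zip(patients, gold_standard) if not rule(p) and truth)
    return tp / (tp + fn), tp / (tp + fp)

patients = [
    {"medications": ["metformin"], "hba1c": 7.1, "icd9_codes": ["250.00"]},
    {"medications": [], "hba1c": 5.4, "icd9_codes": []},
    {"medications": [], "hba1c": 6.8, "icd9_codes": []},
]
gold = [True, False, True]
sens, ppv = sensitivity_ppv(patients, gold, infers_diabetes)
print(f"Sensitivity {sens:.2f}, PPV {ppv:.2f}")
```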
saltPAD: A New Analytical Tool for Monitoring Salt Iodization in Low Resource Settings
Myers, Nicholas M.; Strydom, Emmerentia Elza; Sweet, James; Sweet, Christopher; Spohrer, Rebecca; Dhansay, Muhammad Ali; Lieberman, Marya
2016-01-01
We created a paper test card that measures a common iodizing agent, iodate, in salt. To test the analytical metrics, usability, and robustness of the paper test card when it is used in low resource settings, the South African Medical Research Council and GroundWork performed independent validation studies of the device. The accuracy and precision metrics from both studies were comparable. In the SAMRC study, more than 90% of the test results (n=1704) were correctly classified as corresponding to adequately or inadequately iodized salt. The cards are suitable for market and household surveys to determine whether salt is adequately iodized. Further development of the cards will improve their utility for monitoring salt iodization during production. PMID:29942380
ERIC Educational Resources Information Center
Wang, Shudong; McCall, Marty; Jiao, Hong; Harris, Gregg
2012-01-01
The purposes of this study are twofold. First, to investigate the construct or factorial structure of a set of Reading and Mathematics computerized adaptive tests (CAT), "Measures of Academic Progress" (MAP), given in different states at different grades and academic terms. The second purpose is to investigate the invariance of test…
ERIC Educational Resources Information Center
Riegle, Sandra E.; Frye, Ann W.; Glenn, Jason; Smith, Kirk L.
2012-01-01
Teachers tasked with developing moral character in future physicians face an array of pedagogic challenges, among them identifying tools to measure progress in instilling the requisite skill set. One validated instrument for assessing moral judgment is the Defining Issues Test (DIT-2). Based on the work of Lawrence Kohlberg, the test's main…
40 CFR 86.094-28 - Compliance with emission standards.
Code of Federal Regulations, 2010 CFR
2010-07-01
... the outlier procedure and averaging (as allowed under § 86.094-26(a)(6)(i)) to the same data set, the...) through (3) of this section. (1) All valid exhaust emission data from the tests required under § 86.094-26... § 86.094-29 for all tests conducted on all durability data vehicles of the combination selected under...
40 CFR 86.094-28 - Compliance with emission standards.
Code of Federal Regulations, 2012 CFR
2012-07-01
... the outlier procedure and averaging (as allowed under § 86.094-26(a)(6)(i)) to the same data set, the...) through (3) of this section. (1) All valid exhaust emission data from the tests required under § 86.094-26... § 86.094-29 for all tests conducted on all durability data vehicles of the combination selected under...
40 CFR 86.094-28 - Compliance with emission standards.
Code of Federal Regulations, 2013 CFR
2013-07-01
... the outlier procedure and averaging (as allowed under § 86.094-26(a)(6)(i)) to the same data set, the...) through (3) of this section. (1) All valid exhaust emission data from the tests required under § 86.094-26... § 86.094-29 for all tests conducted on all durability data vehicles of the combination selected under...
40 CFR 86.094-28 - Compliance with emission standards.
Code of Federal Regulations, 2011 CFR
2011-07-01
... the outlier procedure and averaging (as allowed under § 86.094-26(a)(6)(i)) to the same data set, the...) through (3) of this section. (1) All valid exhaust emission data from the tests required under § 86.094-26... § 86.094-29 for all tests conducted on all durability data vehicles of the combination selected under...
Payne, Philip R O; Kwok, Alan; Dhaval, Rakesh; Borlawsky, Tara B
2009-03-01
The conduct of large-scale translational studies presents significant challenges related to the storage, management and analysis of integrative data sets. Ideally, the application of methodologies such as conceptual knowledge discovery in databases (CKDD) provides a means for moving beyond intuitive hypothesis discovery and testing in such data sets, and towards the high-throughput generation and evaluation of knowledge-anchored relationships between complex bio-molecular and phenotypic variables. However, the induction of such high-throughput hypotheses is non-trivial, and requires correspondingly high-throughput validation methodologies. In this manuscript, we describe an evaluation of the efficacy of a natural language processing-based approach to validating such hypotheses. As part of this evaluation, we will examine a phenomenon that we have labeled as "Conceptual Dissonance" in which conceptual knowledge derived from two or more sources of comparable scope and granularity cannot be readily integrated or compared using conventional methods and automated tools.
QSAR modeling of GPCR ligands: methodologies and examples of applications.
Tropsha, A; Wang, S X
2006-01-01
GPCR ligands represent not only one of the major classes of current drugs but the major continuing source of novel potent pharmaceutical agents. Because 3D structures of GPCRs as determined by experimental techniques are still unavailable, ligand-based drug discovery methods remain the major computational molecular modeling approaches to the analysis of growing data sets of tested GPCR ligands. This paper presents an overview of modern Quantitative Structure Activity Relationship (QSAR) modeling. We discuss the critical issue of model validation and the strategy for applying the successfully validated QSAR models to virtual screening of available chemical databases. We present several examples of applications of validated QSAR modeling approaches to GPCR ligands. We conclude with the comments on exciting developments in the QSAR modeling of GPCR ligands that focus on the study of emerging data sets of compounds with dual or even multiple activities against two or more of GPCRs.
Validation of Community Models: Identifying Events in Space Weather Model Timelines
NASA Technical Reports Server (NTRS)
MacNeice, Peter
2009-01-01
I develop and document a set of procedures which test the quality of predictions of solar wind speed and polarity of the interplanetary magnetic field (IMF) made by coupled models of the ambient solar corona and heliosphere. The Wang-Sheeley-Arge (WSA) model is used to illustrate the application of these validation procedures. I present an algorithm which detects transitions of the solar wind from slow to high speed. I also present an algorithm which processes the measured polarity of the outward directed component of the IMF. This removes high-frequency variations to expose the longer-scale changes that reflect IMF sector changes. I apply these algorithms to WSA model predictions made using a small set of photospheric synoptic magnetograms obtained by the Global Oscillation Network Group as input to the model. The results of this preliminary validation of the WSA model (version 1.6) are summarized.
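One of the validation steps above is an algorithm that detects transitions of the solar wind from slow to high speed. The sketch below shows a simple detector of that general kind on a synthetic speed timeline; the 400/500 km/s thresholds and the 24-hour look-back window are illustrative assumptions, not the documented procedure.

```python
# Minimal sketch: flag times where the wind speed rises from below a "slow"
# threshold to above a "fast" threshold within a look-back window.
import numpy as np

def detect_speed_transitions(time_hours, speed_kms, slow=400.0, fast=500.0, window=24.0):
    """Return times at which the wind first exceeds `fast` after being below `slow`."""
    events = []
    for t, v in zip(time_hours, speed_kms):
        if v < fast:
            continue
        # Was the wind slow at any point in the preceding window?
        recent = speed_kms[(time_hours >= t - window) & (time_hours < t)]
        if recent.size and recent.min() < slow:
            if not events or t - events[-1] > window:   # de-duplicate nearby hits
                events.append(float(t))
    return events

hours = np.arange(0.0, 240.0, 1.0)
speed = 380.0 + 20.0 * np.sin(hours / 30.0)
speed[100:160] += 250.0                                 # a synthetic high-speed stream
print(detect_speed_transitions(hours, speed))           # expect one event near hour 100
```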
Prediction of pelvic organ prolapse using an artificial neural network.
Robinson, Christopher J; Swift, Steven; Johnson, Donna D; Almeida, Jonas S
2008-08-01
The objective of this investigation was to test the ability of a feedforward artificial neural network (ANN) to differentiate patients who have pelvic organ prolapse (POP) from those who retain good pelvic organ support. Following institutional review board approval, patients with POP (n = 87) and controls with good pelvic organ support (n = 368) were identified from the urogynecology research database. Historical and clinical information was extracted from the database. Data analysis included the training of a feedforward ANN, variable selection, and external validation of the model with an independent data set. Twenty variables were used. The median-performing ANN model used a median of 3 variables (first quartile 3, third quartile 5) and achieved an area under the receiver operating characteristic curve of 0.90 on the external, independent validation set. Ninety percent sensitivity and 83% specificity were obtained in the external validation by ANN classification. Feedforward ANN modeling is applicable to the identification and prediction of POP.
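The workflow above trains a feedforward ANN on a development sample and reports discrimination on an independent validation set. A minimal sketch of that train-then-externally-validate pattern is shown below, using synthetic predictors and a 0.5 decision threshold as illustrative assumptions rather than the study's clinical variables.

```python
# Minimal sketch: fit a feedforward network on a development cohort and report
# AUC, sensitivity, and specificity on a separate validation cohort.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)

def make_cohort(n):
    X = rng.normal(size=(n, 5))                     # e.g. age, parity, BMI, ...
    logits = 1.5 * X[:, 0] + 0.8 * X[:, 1] - 0.5
    y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)
    return X, y

X_dev, y_dev = make_cohort(400)                     # development cohort
X_val, y_val = make_cohort(200)                     # external validation cohort

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X_dev, y_dev)

prob = clf.predict_proba(X_val)[:, 1]
tn, fp, fn, tp = confusion_matrix(y_val, (prob >= 0.5).astype(int)).ravel()
print(f"AUC {roc_auc_score(y_val, prob):.2f}, "
      f"sensitivity {tp / (tp + fn):.2f}, specificity {tn / (tn + fp):.2f}")
```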
Mayorga-Vega, Daniel; Merino-Marban, Rafael; Viciana, Jesús
2014-01-01
The main purpose of the present meta-analysis was to examine the scientific literature on the criterion-related validity of sit-and-reach tests for estimating hamstring and lumbar extensibility. For this purpose relevant studies were searched from seven electronic databases dated up through December 2012. Primary outcomes of criterion-related validity were Pearson's zero-order correlation coefficients (r) between sit-and-reach tests and hamstring and/or lumbar extensibility criterion measures. Then, from the included studies, the Hunter-Schmidt psychometric meta-analysis approach was conducted to estimate the population criterion-related validity of sit-and-reach tests. Firstly, the corrected correlation mean (rp), unaffected by statistical artefacts (i.e., sampling error and measurement error), was calculated separately for each sit-and-reach test. Subsequently, the three potential moderator variables (sex of participants, age of participants, and level of hamstring extensibility) were examined by a partially hierarchical analysis. Of the 34 studies included in the present meta-analysis, 99 correlation values across eight sit-and-reach tests and 51 across seven sit-and-reach tests were retrieved for hamstring and lumbar extensibility, respectively. The overall results showed that all sit-and-reach tests had a moderate mean criterion-related validity for estimating hamstring extensibility (rp = 0.46-0.67), but a low mean validity for estimating lumbar extensibility (rp = 0.16-0.35). Generally, females, adults and participants with high levels of hamstring extensibility tended to have greater mean values of criterion-related validity for estimating hamstring extensibility. When the use of angular tests is limited, such as in a school setting or in large-scale studies, scientists and practitioners could use the sit-and-reach tests as a useful alternative for hamstring extensibility estimation, but not for estimating lumbar extensibility. Key Points Overall, sit-and-reach tests have a moderate mean criterion-related validity for estimating hamstring extensibility, but they have a low mean validity for estimating lumbar extensibility. Among all the sit-and-reach test protocols, the Classic sit-and-reach test seems to be the best option to estimate hamstring extensibility. End scores (e.g., the Classic sit-and-reach test) are a better indicator of hamstring extensibility than the modifications that incorporate fingers-to-box distance (e.g., the Modified sit-and-reach test). When angular tests such as straight leg raise or knee extension tests cannot be used, sit-and-reach tests seem to be a useful field test alternative to estimate hamstring extensibility, but not to estimate lumbar extensibility. PMID:24570599
Mirage: a visible signature evaluation tool
NASA Astrophysics Data System (ADS)
Culpepper, Joanne B.; Meehan, Alaster J.; Shao, Q. T.; Richards, Noel
2017-10-01
This paper presents the Mirage visible signature evaluation tool, designed to provide a visible signature evaluation capability that will appropriately reflect the effect of scene content on the detectability of targets, providing a capability to assess visible signatures in the context of the environment. Mirage is based on a parametric evaluation of input images, assessing the value of a range of image metrics and combining them using the boosted decision tree machine learning method to produce target detectability estimates. It has been developed using experimental data from photosimulation experiments, where human observers search for vehicle targets in a variety of digital images. The images used for tool development are synthetic (computer generated) images, showing vehicles in many different scenes and exhibiting a wide variation in scene content. A preliminary validation has been performed using k-fold cross validation, where 90% of the image data set was used for training and 10% of the image data set was used for testing. The results of the k-fold validation from 200 independent tests show a prediction accuracy between Mirage predictions of detection probability and observed probability of detection of r(262) = 0.63, p < 0.0001 (Pearson correlation) and an MAE = 0.21 (mean absolute error).
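The preliminary validation above uses k-fold cross-validation of a boosted-decision-tree model and reports a Pearson correlation and mean absolute error between predicted and observed detection probabilities. The sketch below reproduces that evaluation loop on synthetic image metrics; the features, fold count, and regressor settings are illustrative assumptions.

```python
# Minimal sketch: k-fold cross-validation of a boosted-tree regressor mapping
# image metrics to observed detection probability, scored with Pearson r and MAE.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_images, n_metrics = 300, 12
X = rng.normal(size=(n_images, n_metrics))               # per-image scene/target metrics
p_detect = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))  # observed detection probability

predictions = np.empty(n_images)
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[train_idx], p_detect[train_idx])
    predictions[test_idx] = model.predict(X[test_idx])

r, p_value = pearsonr(predictions, p_detect)
mae = np.mean(np.abs(predictions - p_detect))
print(f"r = {r:.2f} (p = {p_value:.1e}), MAE = {mae:.2f}")
```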
The composite dynamic method as evidence for age-specific waterfowl mortality
Burnham, Kenneth P.; Anderson, David R.
1979-01-01
For the past 25 years estimation of mortality rates for waterfowl has been based almost entirely on the composite dynamic life table. We examined the specific assumptions for this method and derived a valid goodness of fit test. We performed this test on 45 data sets representing a cross section of banding samples for various waterfowl species, geographic areas, banding periods, and age/sex classes. We found that: (1) the composite dynamic method was rejected (P < 0.001) in 37 of the 45 data sets (in fact, 29 were rejected at P < 0.00001) and (2) recovery and harvest rates are year-specific (a critical violation of the necessary assumptions). We conclude that the restrictive assumptions required for the composite dynamic method to produce valid estimates of mortality rates are not met in waterfowl data. Also we demonstrate that even when the required assumptions are met, the method produces very biased estimates of age-specific mortality rates. We believe the composite dynamic method should not be used in the analysis of waterfowl banding data. Furthermore, the composite dynamic method does not provide valid evidence for age-specific mortality rates in waterfowl.
Predicting Mouse Liver Microsomal Stability with “Pruned” Machine Learning Models and Public Data
Perryman, Alexander L.; Stratton, Thomas P.; Ekins, Sean; Freundlich, Joel S.
2015-01-01
Purpose Mouse efficacy studies are a critical hurdle to advance translational research of potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability. Methods Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism). Results “Pruning” out the moderately unstable/moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 hour. Conclusions Our results suggest the pruning strategy may be of general benefit to improve test set enrichment and provide machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This study represents the most exhaustive study to date of using machine learning approaches with MLM data from public sources. PMID:26415647
Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data.
Perryman, Alexander L; Stratton, Thomas P; Ekins, Sean; Freundlich, Joel S
2016-02-01
Mouse efficacy studies are a critical hurdle to advance translational research of potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability. Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism). "Pruning" out the moderately unstable / moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 h. Our results suggest the pruning strategy may be of general benefit to improve test set enrichment and provide machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This study represents the most exhaustive study to date of using machine learning approaches with MLM data from public sources.
van den Boogaard, Jossy; Lyimo, Ramsey A; Boeree, Martin J; Kibiki, Gibson S; Aarnoutse, Rob E
2011-09-01
To assess adherence to community-based directly observed treatment (DOT) among Tanzanian tuberculosis patients using the Medication Event Monitoring System (MEMS) and to validate alternative adherence measures for resource-limited settings using MEMS as a gold standard. This was a longitudinal pilot study of 50 patients recruited consecutively from one rural hospital, one urban hospital and two urban health centres. Treatment adherence was monitored with MEMS and the validity of the following adherence measures was assessed: isoniazid urine test, urine colour test, Morisky scale, Brief Medication Questionnaire, adapted AIDS Clinical Trials Group (ACTG) adherence questionnaire, pill counts and medication refill visits. The mean adherence rate in the study population was 96.3% (standard deviation, SD: 7.7). Adherence was less than 100% in 70% of the patients, less than 95% in 21% of them, and less than 80% in 2%. The ACTG adherence questionnaire and urine colour test had the highest sensitivities but lowest specificities. The Morisky scale and refill visits had the highest specificities but lowest sensitivities. Pill counts and refill visits combined, used in routine practice, yielded moderate sensitivity and specificity, but sensitivity improved when the ACTG adherence questionnaire was added. Patients on community-based DOT showed good adherence in this study. The combination of pill counts, refill visits and the ACTG adherence questionnaire could be used to monitor adherence in settings where MEMS is not affordable. The findings with regard to adherence and to the validity of simple adherence measures should be confirmed in larger populations with wider variability in adherence rates.
Lundin, Andreas; Hallgren, Mats; Balliu, Natalja; Forsell, Yvonne
2015-01-01
The alcohol use disorders identification test (AUDIT) and AUDIT-Consumption (AUDIT-C) are commonly used in population surveys, but there are few validation studies in the general population. Validity should be estimated in samples close to the targeted population and setting. This study aims to validate AUDIT and AUDIT-C in a general population sample (PART) in Stockholm, Sweden. We used a general population subsample aged 20 to 64 that answered a postal questionnaire including AUDIT and later participated in a psychiatric interview (n = 1,093). Interviews using the Schedules for Clinical Assessment in Neuropsychiatry were used as the criterion standard. Diagnoses were set according to the fourth version of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV). Agreement between the diagnostic test and criterion standard was measured with the area under the receiver operating characteristic curve (AUC). A total of 1,086 (450 men and 636 women) of the interview participants completed AUDIT. There were 96 individuals with DSM-IV alcohol dependence, 36 with DSM-IV alcohol abuse, and 153 risk drinkers. AUCs were 0.90 for DSM-IV alcohol use disorder (AUDIT-C 0.85), 0.94 for DSM-IV dependence (AUDIT-C 0.89), 0.80 for risk drinking (AUDIT-C 0.80), and 0.87 for any criterion (AUDIT-C 0.84). In this general population sample, AUDIT and AUDIT-C performed outstandingly or excellently in identifying dependence, risk drinking, alcohol use disorder, and any disorder. Copyright © 2015 by the Research Society on Alcoholism.
Validity and Reliability of Farsi Version of Youth Sport Environment Questionnaire
Eshghi, Mohammad Ali; Kordi, Ramin; Memari, Amir Hossein; Ghaziasgar, Ahmad; Mansournia, Mohammad-Ali; Zamani Sani, Seyed Hojjat
2015-01-01
The Youth Sport Environment Questionnaire (YSEQ) was developed from the Group Environment Questionnaire, a well-known measure of team cohesion. The aim of this study was to adapt and examine the reliability and validity of the Farsi version of the YSEQ. This version was completed by 455 athletes aged 13–17 years. Results of confirmatory factor analysis indicated that a two-factor solution showed a good fit to the data. The results also revealed that the Farsi YSEQ showed high internal consistency, test-retest reliability, and good concurrent validity. This study indicated that the Farsi version of the YSEQ is a valid and reliable measure to assess team cohesion in a sport setting. PMID:26464900
Zimmermann, Karin; Cignacco, Eva; Eskola, Katri; Engberg, Sandra; Ramelet, Anne-Sylvie; Von der Weid, Nicolas; Bergstraesser, Eva
2015-12-01
To develop and test the Parental PELICAN Questionnaire, an instrument to retrospectively assess parental experiences and needs during their child's end-of-life care. To offer appropriate care for dying children, healthcare professionals need to understand the illness experience from the family perspective. A questionnaire specific to the end-of-life experiences and needs of parents losing a child is needed to evaluate the perceived quality of paediatric end-of-life care. This is an instrument development study applying mixed methods based on recommendations for questionnaire design and validation. The Parental PELICAN Questionnaire was developed in four phases between August 2012 and March 2014: phase 1: item generation; phase 2: validity testing; phase 3: translation; phase 4: pilot testing. Psychometric properties were assessed after applying the Parental PELICAN Questionnaire to a sample of 224 bereaved parents in April 2014. Validity testing covered the evidence based on tests of content, internal structure and relations to other variables. The Parental PELICAN Questionnaire consists of approximately 90 items in four slightly different versions accounting for particularities of the four diagnostic groups. The questionnaire's items were structured according to six quality domains described in the literature. Evidence of initial validity and reliability could be demonstrated with the involvement of healthcare professionals and bereaved parents. The Parental PELICAN Questionnaire holds promise as a measure to assess parental experiences and needs and is applicable to a broad range of paediatric specialties and settings. Future validation is needed to evaluate its suitability in different cultures. © 2015 John Wiley & Sons Ltd.
Computation and application of tissue-specific gene set weights.
Frost, H Robert
2018-04-06
Gene set testing, or pathway analysis, has become a critical tool for the analysis of high-dimensional genomic data. Although the function and activity of many genes and higher-level processes are tissue-specific, gene set testing is typically performed in a tissue-agnostic fashion, which impacts statistical power and the interpretation and replication of results. To address this challenge, we have developed a bioinformatics approach to compute tissue-specific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA). We used this approach to create a public repository of tissue-specific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database (MSigDB). To demonstrate the validity and utility of these weights, we explored three different applications: the functional characterization of human tissues, multi-tissue analysis for systemic diseases and tissue-specific gene set testing. All data used in the reported analyses are publicly available. An R implementation of the method and tissue-specific weights for MSigDB gene set collections can be downloaded at http://www.dartmouth.edu/∼hrfrost/TissueSpecificGeneSets. rob.frost@dartmouth.edu.
Bianco, Antonino; Filingeri, Davide; Paoli, Antonio; Palma, Antonio
2015-04-01
The aim of this study was to evaluate a new method to perform the one repetition maximum (1RM) bench press test, by combining previously validated predictive and practical procedures. Eight young male and 7 female participants, with no previous experience of resistance training, performed a first set of repetitions to fatigue (RTF) with a workload corresponding to ⅓ of their body mass (BM) for a maximum of 25 repetitions. Following a 5-min recovery period, a second set of RTF was performed with a workload corresponding to ½ of participants' BM. The number of repetitions performed in this set was then used to predict the workload to be used for the 1RM bench press test using Mayhew's equation. Oxygen consumption, heart rate and blood lactate were monitored before, during and after each 1RM attempt. A significant effect of gender was found on the maximum number of repetitions achieved during the RTF set performed with ½ of participants' BM (males: 25.0 ± 6.3; females: 11.0 ± 10.6; t = 6.2; p < 0.001). The 1RM attempt performed with the workload predicted by Mayhew's equation resulted in females performing 1.2 ± 0.7 repetitions, while males performed 4.8 ± 1.9 repetitions. All participants reached their 1RM performance within 3 attempts, thus resulting in a maximum of 5 sets required to successfully perform the 1RM bench press test. We conclude that, by combining previously validated predictive equations with practical procedures (i.e. using a fraction of participants' BM to determine the workload for an RTF set), the new method we tested appeared safe, accurate (particularly in females) and time-effective in the practical evaluation of 1RM performance in inexperienced individuals. Copyright © 2014 Elsevier Ltd. All rights reserved.
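For illustration, the sketch below applies one published form of Mayhew's repetitions-to-fatigue equation (%1RM = 52.2 + 41.9·e^(-0.055·reps)) to predict the workload for the 1RM attempt from a submaximal set; the coefficients and the example numbers are assumptions for this sketch and may differ from the exact equation used in the study.

```python
# Hedged sketch of the workload-prediction step: estimate 1RM from a set of
# repetitions to fatigue using an assumed form of Mayhew's equation.
import math

def predict_1rm_mayhew(load_kg: float, reps_to_fatigue: int) -> float:
    """Estimate 1RM from a submaximal load lifted to fatigue."""
    pct_1rm = 52.2 + 41.9 * math.exp(-0.055 * reps_to_fatigue)  # assumed coefficients
    return load_kg * 100.0 / pct_1rm

# e.g. a 70 kg participant lifting half body mass (35 kg) for 12 repetitions
body_mass = 70.0
print(predict_1rm_mayhew(body_mass / 2, 12))  # predicted workload for the 1RM attempt
```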
PSL Icing Facility Upgrade Overview
NASA Technical Reports Server (NTRS)
Griffin, Thomas A.; Dicki, Dennis J.; Lizanich, Paul J.
2014-01-01
The NASA Glenn Research Center Propulsion Systems Lab (PSL) was recently upgraded to perform engine inlet ice crystal testing in an altitude environment. The upgrade installed 10 spray bars in the inlet plenum for ice crystal generation using 222 spray nozzles. As an altitude test chamber, the PSL is capable of simulating icing events at altitude in a ground-test facility. The system was designed to operate at altitudes from 4,000 to 40,000 ft, at Mach numbers up to 0.8, and at inlet total temperatures from -60 to +15 degrees F. This paper and presentation will be part of a series of presentations on PSL Icing and will cover the development of the icing capability through design, developmental testing, installation, initial calibration, and validation engine testing. Information will be presented on the design criteria and process, spray bar developmental testing at Cox and Co., system capabilities, and the initial calibration and engine validation test. The PSL icing system was designed to provide NASA and the icing community with a facility that can be used for research studies of engine icing by duplicating in-flight events in a controlled ground-test facility. With the system and the altitude chamber we can produce flight conditions and cloud environments to simulate those encountered in flight. The icing system can be controlled to set various cloud uniformities, droplet median volumetric diameter (MVD), and ice water content (IWC) through a wide variety of conditions. The PSL chamber can set altitudes, Mach numbers, and temperatures of interest to the icing community and also has the instrumentation capability to measure engine performance during icing testing. Last year PSL completed the calibration and initial engine validation of the facility utilizing a Honeywell ALF502-R5 engine and duplicated in-flight rollback conditions experienced during flight testing. This paper will summarize the modifications and buildup of the facility to accomplish these tests.
ERIC Educational Resources Information Center
Levy, Susan S.; Readdy, R. Tucker
2009-01-01
The purpose of this study was to examine the test-retest reliability of the last 7-day long form International Physical Activity Questionnaire (Craig et al., 2003) and to examine the construct validity for the measure in a research setting. Participants were 151 male (n = 52) and female (n = 99) university students (M age = 24.15 years, SD = 5.01)…
A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.
Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh
2018-04-26
Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
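As a rough sketch of the contrast drawn above (not the authors' actual pipeline), the following compares random CV with a clustering-based CV in which whole clusters of similar samples are held out together; the toy regression data and the choice of KMeans with GroupKFold are assumptions made for illustration.

```python
# Minimal sketch: random CV vs. clustering-based CV on toy expression data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))                               # regulator expression (toy)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=200)   # target gene expression (toy)

model = Ridge(alpha=1.0)

# Random CV: similar samples can land in both training and test folds.
rcv = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Clustering-based CV: cluster samples first, then hold out whole clusters,
# so each test fold is more "distinct" from its training data.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
ccv = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5), groups=clusters)

print("random CV R^2:", rcv.mean(), "clustered CV R^2:", ccv.mean())
```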
AlzhCPI: A knowledge base for predicting chemical-protein interactions towards Alzheimer's disease.
Fang, Jiansong; Wang, Ling; Li, Yecheng; Lian, Wenwen; Pang, Xiaocong; Wang, Hong; Yuan, Dongsheng; Wang, Qi; Liu, Ai-Lin; Du, Guan-Hua
2017-01-01
Alzheimer's disease (AD) is a complicated progressive neurodegenerative disorder. To confront AD, scientists are searching for multi-target-directed ligands (MTDLs) to delay disease progression. The in silico prediction of chemical-protein interactions (CPI) can accelerate target identification and drug discovery. Previously, we developed 100 binary classifiers to predict the CPI for 25 key targets against AD using the multi-target quantitative structure-activity relationship (mt-QSAR) method. In this investigation, we aimed to apply the mt-QSAR method to enlarge the model library to predict CPI towards AD. Another 104 binary classifiers were further constructed to predict the CPI for 26 preclinical AD targets based on the naive Bayesian (NB) and recursive partitioning (RP) algorithms. Internal 5-fold cross-validation and external test set validation were applied to evaluate the performance of the training sets and test set, respectively. The area under the receiver operating characteristic (ROC) curve for the test sets ranged from 0.629 to 1.0, with an average of 0.903. In addition, we developed a web server named AlzhCPI to integrate the comprehensive information of the 204 binary classifiers, which has potential applications in network pharmacology and drug repositioning. AlzhCPI is available online at http://rcidm.org/AlzhCPI/index.html. To illustrate the applicability of AlzhCPI, the developed system was employed for a systems pharmacology-based investigation of shichangpu against AD to enhance the understanding of the mechanisms of action of shichangpu from a holistic perspective.
Antecedents and Consequences of Supplier Performance Evaluation Efficacy
2016-06-30
forming groups of high and low values. These tests are contingent on the reliable and valid measure of high and low rating inflation and high and...year)? Future research could deploy a SPM system as a test case on a limited set of transactions. Using a quasi-experimental design, comparisons...single source, common method bias must be of concern. Harmon's one-factor test showed that when latent-indicator items were forced onto a single
DOE Office of Scientific and Technical Information (OSTI.GOV)
SAMS TL; GUILLOT S
Scoping laboratory scale tests were performed at the Chemical Engineering Department of the Georgia Institute of Technology (Georgia Tech), and the Hanford 222-S Laboratory, involving double-shell tank (DST) and single-shell tank (SST) Hanford waste simulants. These tests established the viability of the Lithium Hydrotalcite precipitation process as a solution to remove aluminum and recycle sodium hydroxide from the Hanford tank waste, and set the basis of a validation test campaign to demonstrate a Technology Readiness Level of 3.
Ada Programming Support Environment (APSE) Evaluation and Validation (E&V) Team
1991-12-31
standards. The purpose of the team was to assist the project in several ways. Raymond Szymanski of Wright Research and Development Center (WRDC, now...debuggers, program library systems, and compiler diagnostics. The test suite does not include explicit tests for the existence of language features. The...support software is a set of tools and procedures which assist in preparing and executing the test suite, in extracting data from the results of
A rotorcraft flight database for validation of vision-based ranging algorithms
NASA Technical Reports Server (NTRS)
Smith, Phillip N.
1992-01-01
A helicopter flight test experiment was conducted at the NASA Ames Research Center to obtain a database consisting of video imagery and accurate measurements of camera motion, camera calibration parameters, and true range information. The database was developed to allow verification of monocular passive range estimation algorithms for use in the autonomous navigation of rotorcraft during low altitude flight. The helicopter flight experiment is briefly described. Four data sets representative of the different helicopter maneuvers and the visual scenery encountered during the flight test are presented. These data sets will be made available to researchers in the computer vision community.
Raykov, Tenko; Marcoulides, George A; Dimitrov, Dimiter M; Li, Tatyana
2018-02-01
This article extends the procedure outlined in the article by Raykov, Marcoulides, and Tong for testing congruence of latent constructs to the setting of binary items and clustering effects. In this widely used setting in contemporary educational and psychological research, the method can be used to examine if two or more homogeneous multicomponent instruments with distinct components measure the same construct. The approach is useful in scale construction and development research as well as in construct validation investigations. The discussed method is illustrated with data from a scholastic aptitude assessment study.
Iversen, J V; Bartels, E M; Jørgensen, J E; Nielsen, T G; Ginnerup, C; Lind, M C; Langberg, H
2016-12-01
The VISA-A questionnaire has proven to be a valid and reliable tool for assessing the severity of Achilles tendinopathy (AT). The aim was to translate and cross-culturally adapt the VISA-A questionnaire for a Danish-speaking AT population, and subsequently perform validity and reliability tests. Translation and subsequent cross-cultural adaptation were performed as translation, synthesis, reverse translation, expert review, and pretesting. The final Danish version (VISA-A-DK) was tested for reliability on healthy controls (n = 75) and patients (n = 36). Tests for internal consistency, validity, and structure were performed on 71 patients. VISA-A-DK showed good reliability for patients (r = 0.80, ICC = 0.79) and healthy individuals (r = 0.98, ICC = 0.97). Internal consistency was 0.73 (Cronbach's alpha). The mean VISA-A-DK score in AT patients was 51 (47-55). This was significantly lower than that of healthy controls, who scored 93 (90-95). Criterion validity was considered good when comparing the scores of the Danish version with the original version in both healthy individuals and patients. VISA-A-DK is a valid and reliable instrument and has proven comparable to the original version in the assessment of AT patients. VISA-A-DK is a useful tool in the assessment of AT, both in research and in a clinical setting. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Impact of syncope on quality of life: validation of a measure in patients undergoing tilt testing.
Nave-Leal, Elisabete; Oliveira, Mário; Pais-Ribeiro, José; Santos, Sofia; Oliveira, Eunice; Alves, Teresa; Cruz Ferreira, Rui
2015-03-01
Recurrent syncope has a significant impact on quality of life. The development of measurement scales that assess this impact and are easy to use in clinical settings is crucial. The objective of the present study is a preliminary validation of the Impact of Syncope on Quality of Life questionnaire for the Portuguese population. The instrument underwent a process of translation, validation, analysis of cultural appropriateness and cognitive debriefing. A convenience sample of 39 patients with a history of recurrent syncope (>1 year) who underwent tilt testing, aged 52.1 ± 16.4 years (range 21-83), 43.5% male, most in active employment (n=18) or retired (n=13), was studied. The resulting Portuguese version is similar to the original, with 12 items in a single aggregate score, and underwent statistical validation, with assessment of reliability, validity and stability over time. With regard to reliability, the internal consistency of the scale is 0.9. Assessment of convergent and discriminant validity showed statistically significant results (p<0.01). Regarding stability over time, a test-retest of this instrument at six months after tilt testing with 22 patients of the sample who had not undergone any clinical intervention found no statistically significant changes in quality of life. The results indicate that this instrument is of value for assessing quality of life in patients with recurrent syncope in Portugal. Copyright © 2014 Sociedade Portuguesa de Cardiologia. Published by Elsevier España. All rights reserved.
Andrews, Donald A; Guzzo, Lina; Raynor, Peter; Rowe, Robert C; Rettinger, L Jill; Brews, Albert; Wormith, J Stephen
2012-02-01
The Level of Service/Case Management Inventory (LS/CMI) and the Youth version (YLS/CMI) generate an assessment of risk/need across eight domains that are considered to be relevant for girls and boys and for women and men. Aggregated across five data sets, the predictive validity of each of the eight domains was gender-neutral. The composite total score (LS/CMI total risk/need) was strongly associated with the recidivism of males (mean r = .39, mean AUC = .746) and very strongly associated with the recidivism of females (mean r = .53, mean AUC = .827). The enhanced validity of LS total risk/need with females was traced to the exceptional validity of Substance Abuse with females. The intra-data set conclusions survived the introduction of two very large samples composed of female offenders exclusively. Finally, the mean incremental contributions of gender and the gender-by-risk level interactions in the prediction of criminal recidivism were minimal compared to the relatively strong validity of the LS/CMI risk level. Although the variance explained by gender was minimal and although high-risk cases were high-risk cases regardless of gender, the recidivism rates of lower risk females were lower than the recidivism rates of lower risk males, suggesting possible implications for test interpretation and policy.
Smith, P; Endris, R; Kronvall, G; Thomas, V; Verner-Jeffreys, D; Wilhelm, C; Dalsgaard, I
2016-02-01
Epidemiological cut-off values were developed for application to antibiotic susceptibility data for Flavobacterium psychrophilum generated by standard CLSI test protocols. The MIC values for ten antibiotic agents against Flavobacterium psychrophilum were determined in two laboratories. For five antibiotics, the data sets were of sufficient quality and quantity to allow the setting of valid epidemiological cut-off values. For these agents, the cut-off values, calculated by the application of the statistically based normalized resistance interpretation method, were ≤16 mg/L for erythromycin, ≤2 mg/L for florfenicol, ≤0.025 mg/L for oxolinic acid (OXO), ≤0.125 mg/L for oxytetracycline and ≤20 (1/19) mg/L for trimethoprim/sulphamethoxazole. For ampicillin and amoxicillin, the majority of putative wild-type observations were 'off scale', and therefore, statistically valid cut-off values could not be calculated. For ormetoprim/sulphadimethoxine, the data were excessively diverse and a valid cut-off could not be determined. For flumequine, the putative wild-type data were extremely skewed, and for enrofloxacin, there was inadequate separation in the MIC values for putative wild-type and non-wild-type strains. It is argued that the adoption of OXO as a class representative for the quinolone group would be a valid method of determining susceptibilities to these agents. © 2014 John Wiley & Sons Ltd.
Ver Elst, K; Vermeiren, S; Schouwers, S; Callebaut, V; Thomson, W; Weekx, S
2013-12-01
CLSI recommends a minimal citrate tube fill volume of 90%. A validation protocol with clinical and analytical components was set up to determine the tube fill threshold for the international normalized ratio of prothrombin time (PT-INR), activated partial thromboplastin time (aPTT) and fibrinogen. Citrated coagulation samples from 16 healthy donors and eight patients receiving vitamin K antagonists (VKA) were evaluated. Eighty-nine tubes were filled to varying volumes of >50%. Coagulation tests were performed on the ACL TOP 500 CTS®. A receiver operating characteristic (ROC) plot, with total error (TE) and critical difference (CD) as possible acceptance criteria, was used to determine the fill threshold. The ROC approach was most accurate with CD for PT-INR and TE for aPTT, resulting in thresholds of 63% for PT and 80% for aPTT. By adapted ROC, based on setting the threshold at the point of 100% sensitivity with maximum specificity, CD was best for PT and TE for aPTT, resulting in thresholds of 73% for PT and 90% for aPTT. For fibrinogen, the method was only valid with the TE criterion, at a 63% fill volume. In our study, we validated minimal citrate tube fill volumes of 73%, 90% and 63% for PT-INR, aPTT and fibrinogen, respectively. © 2013 John Wiley & Sons Ltd.
INOUE, Akiomi; KAWAKAMI, Norito; SHIMOMITSU, Teruichi; TSUTSUMI, Akizumi; HARATANI, Takashi; YOSHIKAWA, Toru; SHIMAZU, Akihito; ODAGIRI, Yuko
2014-01-01
This study aimed to investigate the reliability and construct validity of a new version of the Brief Job Stress Questionnaire (New BJSQ), which measures an extended set of psychosocial factors at work by adding new scales/items to the current version of the BJSQ. Additional scales/items were extensively collected from theoretical job stress models and similar questionnaires in several countries. Scales/items were field-tested and refined through a pilot internet survey. Finally, an 84-item questionnaire (141 items in total when combined with the current BJSQ) was developed. A nationally representative survey was administered to employees in Japan (n=1,633) to examine the reliability and construct validity. Most scales showed acceptable levels of internal consistency and test-retest reliability. Principal component analyses showed that the first factor explained 50% or greater proportion of the variance in most scales. A scale factor analysis and a correlation analysis showed that these scales fit the theoretical expectations. These findings provided a piece of evidence that the New BJSQ scales are reliable and valid. Although more detailed content and construct validity should be examined in future study, the New BJSQ is a useful instrument to evaluate psychosocial work environment and positive mental health outcomes in the current workplace. PMID:24492763
Validity of an Athletic Skills Track among 6- to 12-year-old children.
Hoeboer, Joris; De Vries, Sanne; Krijger-Hombergen, Michiel; Wormhoudt, René; Drent, Annelies; Krabben, Kay; Savelsbergh, Geert
2016-11-01
The purpose of this study was to examine the feasibility and validity of an Athletic Skills Track (AST) to assess fundamental movement skills among 6- to 12-year-old children in a physical education setting. Four hundred sixty-three Dutch children (211 girls, 252 boys) completed three tests: the Körperkoordinationstest für Kinder (KTK) and two Athletic Skills Tracks (AST-1, AST-2). The validity of AST-1 and AST-2 was examined by correlating the time (s) needed to complete the tracks with the KTK Motor Quotient (MQ). Overall, there was a low correlation between AST-1 and the KTK MQ (r = -0.474, P < 0.01) and a moderate correlation between AST-2 and the KTK MQ (r = -0.502, P < 0.01). When split up by age group, the associations were much higher and ranged between r = -0.469 and r = -0.767, with the exception of the low correlation coefficient of the AST-2 in 7-year-olds. The results indicate that fundamental movement skills of 6- to 12-year-old children can be assessed with a quick, convenient and low-cost motor competence test in a physical education setting, i.e., an Athletic Skills Track. Future studies should further assess the reliability, discriminative ability and validity of age-specific versions of the AST.
Lee, Jin; Huang, Yueng-hsiang; Robertson, Michelle M; Murphy, Lauren A; Garabet, Angela; Chang, Wen-Ruey
2014-02-01
The goal of this study was to examine the external validity of a 12-item generic safety climate scale for lone workers in order to evaluate the appropriateness of generalized use of the scale in the measurement of safety climate across various lone work settings. External validity evidence was established by investigating the measurement equivalence (ME) across different industries and companies. Confirmatory factor analysis (CFA)-based and item response theory (IRT)-based perspectives were adopted to examine the ME of the generic safety climate scale for lone workers across 11 companies from the trucking, electrical utility, and cable television industries. Fairly strong evidence of ME was observed for both organization- and group-level generic safety climate sub-scales. Although significant invariance was observed in the item intercepts across the different lone work settings, absolute model fit indices remained satisfactory in the most robust step of CFA-based ME testing. IRT-based ME testing identified only one differentially functioning item from the organization-level generic safety climate sub-scale, but its impact was minimal and strong ME was supported. The generic safety climate scale for lone workers reported good external validity and supported the presence of a common feature of safety climate among lone workers. The scale can be used as an effective safety evaluation tool in various lone work situations. Copyright © 2013 Elsevier Ltd. All rights reserved.
Inoue, Akiomi; Kawakami, Norito; Shimomitsu, Teruichi; Tsutsumi, Akizumi; Haratani, Takashi; Yoshikawa, Toru; Shimazu, Akihito; Odagiri, Yuko
2014-01-01
This study aimed to investigate the reliability and construct validity of a new version of the Brief Job Stress Questionnaire (New BJSQ), which measures an extended set of psychosocial factors at work by adding new scales/items to the current version of the BJSQ. Additional scales/items were extensively collected from theoretical job stress models and similar questionnaires in several countries. Scales/items were field-tested and refined through a pilot internet survey. Finally, an 84-item questionnaire (141 items in total when combined with the current BJSQ) was developed. A nationally representative survey was administered to employees in Japan (n=1,633) to examine the reliability and construct validity. Most scales showed acceptable levels of internal consistency and test-retest reliability. Principal component analyses showed that the first factor explained 50% or greater proportion of the variance in most scales. A scale factor analysis and a correlation analysis showed that these scales fit the theoretical expectations. These findings provided a piece of evidence that the New BJSQ scales are reliable and valid. Although more detailed content and construct validity should be examined in future study, the New BJSQ is a useful instrument to evaluate psychosocial work environment and positive mental health outcomes in the current workplace.
Results for the Aboveground Configuration of the Boiling Water Reactor Dry Cask Simulator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Durbin, Samuel G.; Lindgren, Eric R.
The thermal performance of commercial nuclear spent fuel dry storage casks is evaluated through detailed numerical analysis. These modeling efforts are completed by the vendor to demonstrate performance and regulatory compliance. The calculations are then independently verified by the Nuclear Regulatory Commission (NRC). Carefully measured data sets generated from testing of full-sized casks or smaller cask analogs are widely recognized as vital for validating these models. Recent advances in dry storage cask designs have significantly increased the maximum thermal load allowed in a cask, in part by increasing the efficiency of internal conduction pathways, and also by increasing the internal convection through greater canister helium pressure. These same canistered cask systems rely on ventilation between the canister and the overpack to convect heat away from the canister to the environment for both above- and below-ground configurations. While several testing programs have been previously conducted, these earlier validation attempts did not capture the effects of elevated helium pressures or accurately portray the external convection of above-ground and below-ground canistered dry cask systems. The purpose of the current investigation was to produce data sets that can be used to test the validity of the assumptions associated with the calculations used to determine steady-state cladding temperatures in modern dry casks that utilize elevated helium pressure in the sealed canister in an above-ground configuration.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cho, Daniel D; Wernicke, A Gabriella; Nori, Dattatreyudu
Purpose/Objective(s): The aim of this study is to build an estimator of toxicity using an artificial neural network (ANN) for head and neck cancer patients. Materials/Methods: An ANN can combine variables into a predictive model during training and consider all possible correlations of variables. We constructed an ANN based on the data from 73 patients with advanced H and N cancer treated with external beam radiotherapy and/or chemotherapy at our institution. For the toxicity estimator we defined input data including age, sex, site, stage, pathology, status of chemo, technique of external beam radiation therapy (EBRT), length of treatment, dose of EBRT, status of post operation, length of follow-up, and the status of local recurrences and distant metastasis. These data were digitized based on their significance and fed to the ANN as input nodes. We used 20 hidden nodes (for the 13 input nodes) to take care of the correlations of input nodes. For training the ANN, we divided the data into three subsets: a training set, a validation set and a test set. Finally, we built the estimator for the toxicity from the ANN output. Results: We used 13 input variables, including the status of local recurrences and distant metastasis, and 20 hidden nodes for correlations. We used 59 patients for the training set, 7 patients for the validation set and 7 patients for the test set, and fed the inputs to the Matlab neural network fitting tool. We trained the network to within 15% error on the outcome. In the end we obtained a toxicity estimator with 74% accuracy. Conclusion: We proved in principle that an ANN can be a very useful tool for predicting the RT outcomes for high-risk H and N patients. Currently we are improving the results using cross validation.
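A minimal sketch of this kind of train/validation/test workflow is shown below, assuming a small feed-forward network in Python rather than the authors' MATLAB tool; the 13 inputs, 20 hidden nodes, 59/7/7 split and the synthetic outcome data are placeholders mirroring the numbers above.

```python
# Hedged sketch: split 73 cases into training/validation/test sets and train a
# small network on 13 digitized inputs (toy data, not the study's patients).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(73, 13))                          # 13 digitized clinical inputs
y = (X @ rng.normal(size=13) > 0).astype(float)        # surrogate toxicity outcome

# 59 / 7 / 7 split into training, validation and test subsets
idx = rng.permutation(73)
train, val, test = idx[:59], idx[59:66], idx[66:]

ann = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
ann.fit(X[train], y[train])

print("validation MSE:", np.mean((ann.predict(X[val]) - y[val]) ** 2))
print("test MSE:", np.mean((ann.predict(X[test]) - y[test]) ** 2))
```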
Kanai, Masashi; Okamoto, Kazuya; Yamamoto, Yosuke; Yoshioka, Akira; Hiramoto, Shuji; Nozaki, Akira; Nishikawa, Yoshitaka; Yamaguchi, Daisuke; Tomono, Teruko; Nakatsui, Masahiko; Baba, Mika; Morita, Tatsuya; Matsumoto, Shigemi; Kuroda, Tomohiro; Okuno, Yasushi; Muto, Manabu
2017-01-01
Background We aimed to develop an adaptable prognosis prediction model that could be applied at any time point during the treatment course for patients with cancer receiving chemotherapy, by applying time-series real-world big data. Methods Between April 2004 and September 2014, 4,997 patients with cancer who had received systemic chemotherapy were registered in a prospective cohort database at the Kyoto University Hospital. Of these, 2,693 patients with a death record were eligible for inclusion and divided into training (n = 1,341) and test (n = 1,352) cohorts. In total, 3,471,521 laboratory data at 115,738 time points, representing 40 laboratory items [e.g., white blood cell counts and albumin (Alb) levels] that were monitored for 1 year before the death event, were applied for constructing prognosis prediction models. All possible prediction models comprising three different items from the 40 laboratory items (40C3 = 9,880) were generated in the training cohort, and model selection was performed in the test cohort. The fitness of the selected models was externally validated in the validation cohort from three independent settings. Results A prognosis prediction model utilizing Alb, lactate dehydrogenase, and neutrophils was selected based on a strong ability to predict death events within 1–6 months, and a set of six prediction models corresponding to 1, 2, 3, 4, 5, and 6 months was developed. The area under the curve (AUC) ranged from 0.852 for the 1 month model to 0.713 for the 6 month model. External validation supported the performance of these models. Conclusion By applying time-series real-world big data, we successfully developed a set of six adaptable prognosis prediction models for patients with cancer receiving chemotherapy. PMID:28837592
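The exhaustive search over three-item candidate models could be sketched as below; the toy data, the use of logistic regression, and scoring by AUC are illustrative assumptions, not the authors' exact modeling procedure.

```python
# Illustrative enumeration of all three-item candidate models (40C3 = 9,880),
# each scored by AUC; laboratory items and outcomes here are toy stand-ins.
# Note: fitting ~9,880 small models takes a short while.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_items, n_patients = 40, 200
X = rng.normal(size=(n_patients, n_items))      # 40 laboratory items per patient
y = (X[:, [0, 5, 9]].sum(axis=1) + rng.normal(size=n_patients) > 0).astype(int)

results = {}
for triple in combinations(range(n_items), 3):  # 9,880 candidate item triples
    cols = list(triple)
    clf = LogisticRegression(max_iter=200).fit(X[:, cols], y)
    results[triple] = roc_auc_score(y, clf.predict_proba(X[:, cols])[:, 1])

best = max(results, key=results.get)
print("best triple:", best, "AUC:", round(results[best], 3))
```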
Lian, Zhen-Qiang; Wang, Qi; Zhang, An-Qin; Zhang, Jiang-Yu; Han, Xiao-Rong; Yu, Hai-Yun; Xie, Si-Mei
2015-04-01
Mammary ductoscopy (MD) is commonly used to detect intraductal lesions associated with nipple discharge. This study investigated the relationships between ductoscopic image-based indicators and breast cancer risk, and developed a nomogram for evaluating breast cancer risk in intraductal neoplasms with nipple discharge. A total of 879 consecutive inpatients (916 breasts) with nipple discharge who underwent selective duct excision for intraductal neoplasms detected by MD from June 2008 to April 2014 were analyzed retrospectively. A nomogram was developed using a multivariate logistic regression model based on data from a training set (687 cases) and validated in an independent validation set (229 cases). A Youden-derived cut-off value was assigned to the nomogram for the diagnosis of breast cancer. Color of discharge, location, appearance, and surface of the neoplasm, and morphology of the ductal wall were independent predictors of breast cancer in multivariate logistic regression analysis. A nomogram based on these predictors performed well. The P value of the Hosmer-Lemeshow test for the prediction model was 0.36. Area under the curve values of 0.812 (95% confidence interval (CI) 0.763-0.860) and 0.738 (95% CI 0.635-0.841) were obtained in the training and validation sets, respectively. The accuracies of the nomogram for breast cancer diagnosis were 71.2% in the training set and 75.5% in the validation set. We developed a nomogram for evaluating breast cancer risk in intraductal neoplasms with nipple discharge based on MD image findings. This model may aid individual risk assessment and guide treatment in clinical practice.
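A Youden-derived cut-off simply picks the threshold that maximizes sensitivity + specificity - 1 along the ROC curve; the short sketch below shows the idea on simulated risk scores, not the study's nomogram or data.

```python
# Sketch of assigning a Youden-derived cut-off to a risk score (toy data).
import numpy as np
from sklearn.metrics import roc_curve

def youden_cutoff(y_true, risk_score):
    """Return the threshold maximizing Youden's J = sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, risk_score)
    return thresholds[np.argmax(tpr - fpr)]

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=500)                                   # 1 = breast cancer
score = np.clip(0.3 * y + rng.normal(0.3, 0.15, 500), 0, 1)        # toy nomogram output
print("Youden cut-off:", round(float(youden_cutoff(y, score)), 3))
```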
Ng, Hui Wen; Doughty, Stephen W; Luo, Heng; Ye, Hao; Ge, Weigong; Tong, Weida; Hong, Huixiao
2015-12-21
Some chemicals in the environment possess the potential to interact with the endocrine system in the human body. Multiple receptors are involved in the endocrine system; estrogen receptor α (ERα) plays very important roles in endocrine activity and is the most studied receptor. Understanding and predicting estrogenic activity of chemicals facilitates the evaluation of their endocrine activity. Hence, we have developed a decision forest classification model to predict chemical binding to ERα using a large training data set of 3308 chemicals obtained from the U.S. Food and Drug Administration's Estrogenic Activity Database. We tested the model using cross validations and external data sets of 1641 chemicals obtained from the U.S. Environmental Protection Agency's ToxCast project. The model showed good performance in both internal (92% accuracy) and external validations (∼ 70-89% relative balanced accuracies), where the latter involved the validations of the model across different ER pathway-related assays in ToxCast. The important features that contribute to the prediction ability of the model were identified through informative descriptor analysis and were related to current knowledge of ER binding. Prediction confidence analysis revealed that the model had both high prediction confidence and accuracy for most predicted chemicals. The results demonstrated that the model constructed based on the large training data set is more accurate and robust for predicting ER binding of chemicals than the published models that have been developed using much smaller data sets. The model could be useful for the evaluation of ERα-mediated endocrine activity potential of environmental chemicals.
Loeb, Danielle F; Crane, Lori A; Leister, Erin; Bayliss, Elizabeth A; Ludman, Evette; Binswanger, Ingrid A; Kline, Danielle M; Smith, Meredith; deGruy, Frank V; Nease, Donald E; Dickinson, L Miriam
Develop and validate self-efficacy scales for primary care provider (PCP) mental illness management and team-based care participation. We developed three self-efficacy scales: team-based care (TBC), mental illness management (MIM), and chronic medical illness (CMI). We developed the scales using Bandura's Social Cognitive Theory as a guide. The survey instrument included items from previously validated scales on team-based care and mental illness management. We administered a mail survey to 900 randomly selected Colorado physicians. We conducted exploratory principal factor analysis with oblique rotation. We constructed self-efficacy scales and calculated standardized Cronbach's alpha coefficients to test internal consistency. We calculated correlation coefficients between the MIM and TBC scales and previously validated measures related to each scale to evaluate convergent validity. We tested correlations between the TBC scale and the measures expected to correlate with the MIM scale, and vice versa, to evaluate discriminant validity. PCPs (n=402, response rate=49%) from diverse practice settings completed surveys. Items grouped into factors as expected. Cronbach's alphas were 0.94, 0.88, and 0.83 for the TBC, MIM, and CMI scales respectively. In convergent validity testing, the TBC scale was correlated as predicted with scales assessing communication strategies, attitudes toward teams, and other teamwork indicators (r=0.25 to 0.40, all statistically significant). Likewise, the MIM scale was significantly correlated with several items about knowledge and experience managing mental illness (r=0.24 to 0.41, all statistically significant). As expected in discriminant validity testing, the TBC scale had only very weak correlations with the mental illness knowledge and experience managing mental illness items (r=0.03 to 0.12). Likewise, the MIM scale was only weakly correlated with measures of team-based care (r=0.09 to 0.17). This validation study of the MIM and TBC self-efficacy scales showed high internal validity and good construct validity. Copyright © 2016 Elsevier Inc. All rights reserved.
A Method of Q-Matrix Validation for the Linear Logistic Test Model
Baghaei, Purya; Hohensinn, Christine
2017-01-01
The linear logistic test model (LLTM) is a well-recognized psychometric model for examining the components of difficulty in cognitive tests and validating construct theories. The plausibility of the construct model, summarized in a matrix of weights known as the Q-matrix or weight matrix, is tested by (1) comparing the fit of the LLTM with the fit of the Rasch model (RM) using the likelihood ratio (LR) test and (2) examining the correlation between the Rasch model item parameters and the LLTM-reconstructed item parameters. The problem with the LR test is that it is almost always significant and, consequently, the LLTM is rejected. The drawback of examining the correlation coefficient is that there is no cut-off value or lower bound for the magnitude of the correlation coefficient. In this article we suggest a simulation method to set a minimum benchmark for the correlation between item parameters from the Rasch model and those reconstructed by the LLTM. If the cognitive model is valid, then the correlation coefficient between the RM-based item parameters and the LLTM-reconstructed item parameters derived from the theoretical weight matrix should be greater than those derived from the simulated matrices. PMID:28611721
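A hedged sketch of this benchmarking idea is given below: compute the RM–LLTM correlation for the theoretical Q-matrix and compare it against the distribution of correlations obtained from random Q-matrices of the same dimensions. The least-squares reconstruction and simulated item difficulties are simplifying assumptions, not the authors' estimation procedure.

```python
# Simulation benchmark sketch: is the theoretical Q-matrix better than chance?
import numpy as np

def lltm_reconstruction_r(beta, Q):
    """Correlation between Rasch difficulties and LLTM-reconstructed difficulties."""
    Q1 = np.column_stack([Q, np.ones(len(beta))])            # weights plus intercept
    eta, *_ = np.linalg.lstsq(Q1, beta, rcond=None)          # least-squares basic parameters
    return np.corrcoef(beta, Q1 @ eta)[0, 1]

rng = np.random.default_rng(7)
n_items, n_ops = 30, 5
Q_theory = rng.integers(0, 2, size=(n_items, n_ops))         # hypothesized weight matrix
beta = Q_theory @ rng.normal(size=n_ops) + rng.normal(scale=0.3, size=n_items)

r_theory = lltm_reconstruction_r(beta, Q_theory)
r_random = [lltm_reconstruction_r(beta, rng.integers(0, 2, size=(n_items, n_ops)))
            for _ in range(1000)]
print("theory r:", round(r_theory, 3),
      "| 95th percentile of random r:", round(float(np.percentile(r_random, 95)), 3))
```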
Votano, Joseph R; Parham, Marc; Hall, L Mark; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander
2006-11-30
Four modeling techniques, using topological descriptors to represent molecular structure, were employed to produce models of human serum protein binding (% bound) on a data set of 1008 experimental values, carefully screened from publicly available sources. To our knowledge, this is the largest data set on human serum protein binding reported for QSAR modeling. The data were partitioned into a training set of 808 compounds and an external validation test set of 200 compounds. Partitioning was accomplished by clustering the compounds in a structure descriptor space so that random sampling of 20% of the whole data set produced an external test set that is a good representative of the training set with respect to both structure and protein binding values. The four modeling techniques include multiple linear regression (MLR), artificial neural networks (ANN), k-nearest neighbors (kNN), and support vector machines (SVM). With the exception of the MLR model, the ANN, kNN, and SVM QSARs were ensemble models. Training set correlation coefficients and mean absolute error ranged from r2=0.90 and MAE=7.6 for ANN to r2=0.61 and MAE=16.2 for MLR. Prediction results from the validation set yielded correlation coefficients and mean absolute errors which ranged from r2=0.70 and MAE=14.1 for ANN to a low of r2=0.59 and MAE=18.3 for the SVM model. Structure descriptors that contribute significantly to the models are discussed and compared with those found in other published models. For the ANN model, structure descriptor trends with respect to their effects on predicted protein binding can assist the chemist in structure modification during the drug design process.
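The cluster-then-sample partitioning step might look roughly like the sketch below, which draws about 20% of the compounds from every descriptor-space cluster so the external test set spans the same chemical space as the training set; the descriptor matrix, the cluster count, and the use of KMeans are illustrative assumptions.

```python
# Illustrative cluster-based 80/20 split for a representative external test set.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(11)
X = rng.normal(size=(1008, 30))                    # toy topological descriptors per compound

labels = KMeans(n_clusters=25, n_init=10, random_state=0).fit_predict(X)
test_idx = []
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    take = max(1, int(round(0.2 * len(members))))  # ~20% sampled from every cluster
    test_idx.extend(rng.choice(members, size=take, replace=False))

test_idx = np.array(sorted(test_idx))
train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
print(len(train_idx), "training compounds,", len(test_idx), "external test compounds")
```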
49 CFR 535.9 - Enforcement approach.
Code of Federal Regulations, 2011 CFR
2011-10-01
... Transportation Other Regulations Relating to Transportation (Continued) NATIONAL HIGHWAY TRAFFIC SAFETY... testing throughout a given model year in order to validate data received from manufacturers and will... model year. The average set balance is based upon the engines or vehicles performance above or below the...
NASA Technical Reports Server (NTRS)
LaBel, Kenneth A.; Barth, Janet L.; Brewer, Dana A.
2003-01-01
This viewgraph presentation provides information on flight validation experiments for technologies to determine solar effects. The experiments are intended to demonstrate tolerance to a solar variant environment. The technologies tested are microelectronics, photonics, materials, and sensors.
Groeber, F; Schober, L; Schmid, F F; Traube, A; Kolbus-Hernandez, S; Daton, K; Hoffmann, S; Petersohn, D; Schäfer-Korting, M; Walles, H; Mewes, K R
2016-10-01
To replace the Draize skin irritation assay (OECD guideline 404), several test methods based on reconstructed human epidermis (RHE) have been developed and were adopted in the OECD test guideline 439. However, all validated test methods in the guideline are linked to RHE provided by only three companies. Thus, the availability of these test models is dependent on the commercial interest of the producer. To overcome this limitation and thus to increase the accessibility of in vitro skin irritation testing, an open source reconstructed epidermis (OS-REp) was introduced. To demonstrate the capacity of the OS-REp in regulatory risk assessment, a catch-up validation study was performed. The participating laboratories used in-house generated OS-REp to assess the set of 20 reference substances according to the performance standards amending the OECD test guideline 439. Testing was performed under blinded conditions. The within-laboratory reproducibility of 87% and the inter-laboratory reproducibility of 85% prove a high reliability of irritancy testing using the OS-REp protocol. In addition, the prediction capacity, with an accuracy of 80%, was comparable to previously published RHE-based test protocols. Taken together, the results indicate that the OS-REp test method can be used as a standalone alternative skin irritation test replacing the OECD test guideline 404. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Shi, Min; Movius, James; Dator, Romel; Aro, Patrick; Zhao, Yanchun; Pan, Catherine; Lin, Xiangmin; Bammler, Theo K.; Stewart, Tessandra; Zabetian, Cyrus P.; Peskind, Elaine R.; Hu, Shu-Ching; Quinn, Joseph F.; Galasko, Douglas R.; Zhang, Jing
2015-01-01
Finding robust biomarkers for Parkinson disease (PD) is currently hampered by inherent technical limitations associated with imaging or antibody-based protein assays. To circumvent the challenges, we adapted a staged pipeline, starting from our previous proteomic profiling followed by high-throughput targeted mass spectrometry (MS), to identify peptides in human cerebrospinal fluid (CSF) for PD diagnosis and disease severity correlation. In this multicenter study consisting of training and validation sets, a total of 178 subjects were randomly selected from a retrospective cohort, matching age and sex between PD patients, healthy controls, and neurological controls with Alzheimer disease (AD). From ∼14,000 unique peptides displaying differences between PD and healthy control in proteomic investigations, 126 peptides were selected based on relevance and observability in CSF using bioinformatic analysis and MS screening, and then quantified by highly accurate and sensitive selected reaction monitoring (SRM) in the CSF of 30 PD patients versus 30 healthy controls (training set), followed by diagnostic (receiver operating characteristics) and disease severity correlation analyses. The most promising candidates were further tested in an independent cohort of 40 PD patients, 38 AD patients, and 40 healthy controls (validation set). A panel of five peptides (derived from SPP1, LRP1, CSF1R, EPHA4, and TIMP1) was identified to provide an area under curve (AUC) of 0.873 (sensitivity = 76.7%, specificity = 80.0%) for PD versus healthy controls in the training set. The performance was essentially confirmed in the validation set (AUC = 0.853, sensitivity = 82.5%, specificity = 82.5%). Additionally, this panel could also differentiate the PD and AD groups (AUC = 0.990, sensitivity = 95.0%, specificity = 97.4%). Furthermore, a combination of two peptides belonging to proteins TIMP1 and APLP1 significantly correlated with disease severity as determined by the Unified Parkinson's Disease Rating Scale motor scores in both the training (r = 0.381, p = 0.038) and the validation (r = 0.339, p = 0.032) sets. The novel panel of CSF peptides, if validated in independent cohorts, could be used to assist in clinical diagnosis of PD and has the potential to help monitor or predict disease progression. PMID:25556233
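As a rough illustration of fitting a multi-peptide panel on a training cohort and checking its discrimination in an independent validation cohort, the sketch below uses simulated SRM intensities; only the peptide protein names are taken from the abstract, and the logistic combination is an assumption, not the authors' panel model.

```python
# Toy example: train a five-peptide panel on one cohort, validate AUC on another.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

peptides = ["SPP1", "LRP1", "CSF1R", "EPHA4", "TIMP1"]
rng = np.random.default_rng(5)

def simulate(n_pd, n_ctrl):
    """Simulated SRM intensities: PD cases shifted on every peptide."""
    X = np.vstack([rng.normal(0.5, 1.0, (n_pd, 5)), rng.normal(0.0, 1.0, (n_ctrl, 5))])
    y = np.r_[np.ones(n_pd), np.zeros(n_ctrl)]
    return X, y

X_train, y_train = simulate(30, 30)     # training set: 30 PD vs 30 controls
X_valid, y_valid = simulate(40, 40)     # independent validation set: 40 PD vs 40 controls

panel = LogisticRegression().fit(X_train, y_train)
auc = roc_auc_score(y_valid, panel.predict_proba(X_valid)[:, 1])
print("validation AUC:", round(auc, 3))
```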
Lam, Lucia L.; Ghadessi, Mercedeh; Erho, Nicholas; Vergara, Ismael A.; Alshalalfa, Mohammed; Buerki, Christine; Haddad, Zaid; Sierocinski, Thomas; Triche, Timothy J.; Skinner, Eila C.; Davicioni, Elai; Daneshmand, Siamak; Black, Peter C.
2014-01-01
Background Nearly half of muscle-invasive bladder cancer patients succumb to their disease following cystectomy. Selecting candidates for adjuvant therapy is currently based on clinical parameters with limited predictive power. This study aimed to develop and validate genomic-based signatures that can better identify patients at risk for recurrence than clinical models alone. Methods Transcriptome-wide expression profiles were generated using 1.4 million feature-arrays on archival tumors from 225 patients who underwent radical cystectomy and had muscle-invasive and/or node-positive bladder cancer. Genomic (GC) and clinical (CC) classifiers for predicting recurrence were developed on a discovery set (n = 133). Performances of GC, CC, an independent clinical nomogram (IBCNC), and genomic-clinicopathologic classifiers (G-CC, G-IBCNC) were assessed in the discovery and independent validation (n = 66) sets. GC was further validated on four external datasets (n = 341). Discrimination and prognostic abilities of classifiers were compared using area under receiver-operating characteristic curves (AUCs). All statistical tests were two-sided. Results A 15-feature GC was developed on the discovery set with area under curve (AUC) of 0.77 in the validation set. This was higher than individual clinical variables, IBCNC (AUC = 0.73), and comparable to CC (AUC = 0.78). Performance was improved upon combining GC with clinical nomograms (G-IBCNC, AUC = 0.82; G-CC, AUC = 0.86). G-CC high-risk patients had elevated recurrence probabilities (P < .001), with GC being the best predictor by multivariable analysis (P = .005). Genomic-clinicopathologic classifiers outperformed clinical nomograms by decision curve and reclassification analyses. GC performed the best in validation compared with seven prior signatures. GC markers remained prognostic across four independent datasets. Conclusions The validated genomic-based classifiers outperform clinical models for predicting postcystectomy bladder cancer recurrence. This may be used to better identify patients who need more aggressive management. PMID:25344601
Validity of an Interactive Functional Reach Test.
Galen, Sujay S; Pardo, Vicky; Wyatt, Douglas; Diamond, Andrew; Brodith, Victor; Pavlov, Alex
2015-08-01
Videogaming platforms such as the Microsoft (Redmond, WA) Kinect® are increasingly being used in rehabilitation to improve balance performance and mobility. These gaming platforms do not have built-in clinical measures that offer clinically meaningful data. We have now developed software that enables the Kinect sensor to assess a patient's balance using an interactive functional reach test (I-FRT). The aim of the study was to test the concurrent validity of the I-FRT and to establish the feasibility of implementing the I-FRT in a clinical setting. The concurrent validity of the I-FRT was tested among 20 healthy adults (mean age, 25.8±3.4 years; 14 women). The Functional Reach Test (FRT) was measured simultaneously by both the Kinect sensor using the I-FRT software and the Optotrak Certus® 3D motion-capture system (Northern Digital Inc., Waterloo, ON, Canada). The feasibility of implementing the I-FRT in a clinical setting was assessed by performing the I-FRT in 10 participants with mild balance impairments recruited from the outpatient physical therapy clinic (mean age, 55.8±13.5 years; four women) and obtaining their feedback using a NASA Task Load Index (NASA-TLX) questionnaire. There was moderate to good agreement between the FRT measures made by the two measurement systems. The greatest agreement between the two measurement systems was found with the Kinect sensor placed at a distance of 2.5 m [intraclass correlation coefficient (2,k)=0.786; P<0.001] from the participant. Participants with mild balance impairments whose balance was assessed using the I-FRT software scored their experience favorably by assigning lower scores to the Frustration, Mental Demand, and Temporal Demand subscales of the NASA-TLX questionnaire. FRT measures made using the Kinect sensor I-FRT software provide a valid clinical measure that can be used with gaming platforms.
Reliability and validity of the new Tanaka B Intelligence Scale scores: a group intelligence test.
Uno, Yota; Mizukami, Hitomi; Ando, Masahiko; Yukihiro, Ryoji; Iwasaki, Yoko; Ozaki, Norio
2014-01-01
The present study evaluated the reliability and concurrent validity of the new Tanaka B Intelligence Scale, which is an intelligence test that can be administered to groups within a short period of time. The new Tanaka B Intelligence Scale and Wechsler Intelligence Scale for Children-Third Edition were administered to 81 subjects (mean age ± SD 15.2 ± 0.7 years) residing in a juvenile detention home; reliability was assessed using Cronbach's alpha coefficient, and concurrent validity was assessed using the one-way analysis of variance intraclass correlation coefficient. Moreover, receiver operating characteristic analysis for screening for individuals who have a deficit in intellectual function (an FIQ<70) was performed. In addition, stratum-specific likelihood ratios for detection of intellectual disability were calculated. The Cronbach's alpha for the new Tanaka B Intelligence Scale IQ (BIQ) was 0.86, and the intraclass correlation coefficient with FIQ was 0.83. Receiver operating characteristic analysis demonstrated an area under the curve of 0.89 (95% CI: 0.85-0.96). In addition, the stratum-specific likelihood ratio for the BIQ≤65 stratum was 13.8 (95% CI: 3.9-48.9), and the stratum-specific likelihood ratio for the BIQ≥76 stratum was 0.1 (95% CI: 0.03-0.4). Thus, intellectual disability could be ruled out or determined. The present results demonstrated that the new Tanaka B Intelligence Scale score had high reliability and concurrent validity with the Wechsler Intelligence Scale for Children-Third Edition score. Moreover, the post-test probability for the BIQ could be calculated when screening for individuals who have a deficit in intellectual function. The new Tanaka B Intelligence Test is convenient and can be administered within a variety of settings. This enables evaluation of intellectual development even in settings where performing intelligence tests has previously been difficult.
Reliability and Validity of the New Tanaka B Intelligence Scale Scores: A Group Intelligence Test
Uno, Yota; Mizukami, Hitomi; Ando, Masahiko; Yukihiro, Ryoji; Iwasaki, Yoko; Ozaki, Norio
2014-01-01
Objective The present study evaluated the reliability and concurrent validity of the new Tanaka B Intelligence Scale, which is an intelligence test that can be administered to groups within a short period of time. Methods The new Tanaka B Intelligence Scale and Wechsler Intelligence Scale for Children-Third Edition were administered to 81 subjects (mean age ± SD 15.2±0.7 years) residing in a juvenile detention home; reliability was assessed using Cronbach’s alpha coefficient, and concurrent validity was assessed using the one-way analysis of variance intraclass correlation coefficient. Moreover, receiver operating characteristic analysis for screening for individuals who have a deficit in intellectual function (an FIQ<70) was performed. In addition, stratum-specific likelihood ratios for detection of intellectual disability were calculated. Results The Cronbach’s alpha for the new Tanaka B Intelligence Scale IQ (BIQ) was 0.86, and the intraclass correlation coefficient with FIQ was 0.83. Receiver operating characteristic analysis demonstrated an area under the curve of 0.89 (95% CI: 0.85–0.96). In addition, the stratum-specific likelihood ratio for the BIQ≤65 stratum was 13.8 (95% CI: 3.9–48.9), and the stratum-specific likelihood ratio for the BIQ≥76 stratum was 0.1 (95% CI: 0.03–0.4). Thus, intellectual disability could be ruled out or determined. Conclusion The present results demonstrated that the new Tanaka B Intelligence Scale score had high reliability and concurrent validity with the Wechsler Intelligence Scale for Children-Third Edition score. Moreover, the post-test probability for the BIQ could be calculated when screening for individuals who have a deficit in intellectual function. The new Tanaka B Intelligence Test is convenient and can be administered within a variety of settings. This enables evaluation of intellectual development even in settings where performing intelligence tests has previously been difficult. PMID:24940880
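The stratum-specific likelihood ratios and post-test probabilities described above follow directly from the counts of examinees with and without an intellectual deficit in each BIQ stratum. A minimal sketch, using made-up counts rather than the study's data:

# Stratum-specific likelihood ratio (SSLR) and post-test probability sketch.
# The counts and prevalence below are illustrative assumptions only.
def sslr(cases_in_stratum, total_cases, noncases_in_stratum, total_noncases):
    """Likelihood ratio for one test-score stratum."""
    return (cases_in_stratum / total_cases) / (noncases_in_stratum / total_noncases)

def post_test_probability(pre_test_prob, likelihood_ratio):
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical stratum BIQ <= 65: 10 of 12 examinees with FIQ < 70, 5 of 69 without
lr_low = sslr(10, 12, 5, 69)
print(f"SSLR (BIQ<=65) = {lr_low:.1f}")
print(f"post-test probability at 15% prevalence = {post_test_probability(0.15, lr_low):.2f}")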
Validation of Safety-Critical Systems for Aircraft Loss-of-Control Prevention and Recovery
NASA Technical Reports Server (NTRS)
Belcastro, Christine M.
2012-01-01
Validation of technologies developed for loss of control (LOC) prevention and recovery poses significant challenges. Aircraft LOC can result from a wide spectrum of hazards, often occurring in combination, which cannot be fully replicated during evaluation. Technologies developed for LOC prevention and recovery must therefore be effective under a wide variety of hazardous and uncertain conditions, and the validation framework must provide some measure of assurance that the new vehicle safety technologies do no harm (i.e., that they themselves do not introduce new safety risks). This paper summarizes a proposed validation framework for safety-critical systems, provides an overview of validation methods and tools developed by NASA to date within the Vehicle Systems Safety Project, and develops a preliminary set of test scenarios for the validation of technologies for LOC prevention and recovery.
Preliminary Report on Oak Ridge National Laboratory Testing of Drake/ACSS/MA2/E3X
DOE Office of Scientific and Technical Information (OSTI.GOV)
Irminger, Philip; King, Daniel J.; Herron, Andrew N.
2016-01-01
A key to industry acceptance of a new technology is extensive validation in field trials. The Powerline Conductor Accelerated Test facility (PCAT) at Oak Ridge National Laboratory (ORNL) is specifically designed to evaluate the performance and reliability of a new conductor technology under real world conditions. The facility is set up to capture large amounts of data during testing. General Cable used the ORNL PCAT facility to validate the performance of TransPowr with E3X Technology, a standard overhead conductor with an inorganic high-emissivity, low-absorptivity surface coating. Extensive testing has demonstrated a significant improvement in conductor performance across a wide range of operating temperatures, indicating that E3X Technology can provide a reduction in temperature, a reduction in sag, and an increase in ampacity when applied to the surface of any overhead conductor. This report provides initial results of that testing.
Computerized neurocognitive testing in the management of sport-related concussion: an update.
Resch, Jacob E; McCrea, Michael A; Cullum, C Munro
2013-12-01
Since the late nineties, computerized neurocognitive testing has become a central component of sport-related concussion (SRC) management at all levels of sport. In 2005, a review of the available evidence on the psychometric properties of four computerized neuropsychological test batteries concluded that the tests did not possess the necessary criteria to warrant clinical application. Since the publication of that review, several more computerized neurocognitive tests have entered the marketplace. The purpose of this review is to summarize the body of published studies on the psychometric properties and clinical utility of computerized neurocognitive tests available for use in the assessment of SRC. A review of the literature from 2005 to 2013 was conducted to gather evidence of test-retest reliability and clinical validity of these instruments. Reviewed articles included both prospective and retrospective studies of primarily sport-based adult and pediatric samples. Summaries are provided regarding the available evidence of reliability and validity for the most commonly used computerized neurocognitive tests in sports settings.
Validity and reliability of the Omron HJ-303 tri-axial accelerometer-based pedometer.
Steeves, Jeremy A; Tyo, Brian M; Connolly, Christopher P; Gregory, Douglas A; Stark, Nyle A; Bassett, David R
2011-09-01
This study compared the validity of a new Omron HJ-303 piezoelectric pedometer and 2 other pedometers (Sportline Traq and Yamax SW200). To examine the effect of speed, 60 subjects walked on a treadmill at 2, 3, and 4 mph. Twenty subjects also ran at 6, 7, and 8 mph. To test lifestyle activities, 60 subjects performed front-back-side-side stepping, elliptical machine exercise, and stair climbing/descending. Twenty others performed ballroom dancing. To test device reliability, 60 participants completed five 100-step trials while wearing 5 different sets of the devices. Actual steps were determined using a hand tally counter. Significant differences existed among pedometers (P < .05). For walking, the Omron pedometers were the most valid. The Sportline overestimated and the Yamax underestimated steps (P < .05). Worn on the waist or in the backpack, the Omron device and Sportline were valid for running. The Omron was valid for 3 activities (elliptical machine, ascending and descending stairs). The Sportline overestimated all of these activities, and the Yamax was only valid for descending stairs. The Omron and Yamax were both valid and reliable in the 100-step trials. The Omron HJ-303, worn on the waist, appeared to be the most valid of the 3 pedometers.
Zgaljardic, Dennis J; Yancy, Sybil; Temple, Richard O; Watford, Monica F; Miller, Rebekah
2011-11-01
The assessment of ecological validity of neuropsychological measures is an area of growing interest, particularly in the postacute brain injury rehabilitation (PABIR) setting, as there is an increasing demand for clinicians to address functional and real-world outcomes. In the current study, we assessed the predictive value of the Screening module and the Daily Living tests of the Neuropsychological Assessment Battery (NAB) using clinician ratings from the Mayo-Portland Adaptability Inventory-4 (MPAI-4) in patients with moderate to severe traumatic brain injury. Forty-seven individuals were each administered the NAB Screening module (NAB-SM) and the NAB Daily Living (NAB-DL) tests following admission to a residential PABIR program. MPAI-4 ratings were also obtained at admission. Linear regression analysis was used to examine the association between these functional and neuropsychological assessment measures. We replicated prior work (Temple et al., 2009) and expanded evidence for the ecological validity of the NAB-SM. Furthermore, our findings support the ecological validity of the NAB-DL Bill Payment, Judgment, and Map Reading tests with regard to functional skills and real-world activities. The current study supports prior work from our lab assessing the predictive value of the NAB-SM, as well as provides evidence for the ecological validity of select NAB-DL tests in patients with moderate to severe traumatic brain injury admitted to a residential PABIR program.
Prospective Validation of Optimal Drain Management "The 3 × 3 Rule" after Liver Resection.
Mitsuka, Yusuke; Yamazaki, Shintaro; Yoshida, Nao; Masamichi, Moriguchi; Higaki, Tokio; Takayama, Tadatoshi
2016-09-01
We previously established an optimal postoperative drain management rule after liver resection (i.e., drain removal on postoperative day 3 if the drain fluid bilirubin concentration is <3 mg/dl) from the results of 514 drains of 316 consecutive patients. This test set predicts that 274 of 316 patients (87.0 %) will be safely managed without adverse events when drain management is performed without deviation from the rule. This study aimed to validate the feasibility of our rule in a more recent time period. The data from 493 drains of 274 consecutive patients were prospectively collected. Drain fluid volumes, bilirubin levels, and bacteriological cultures were measured on postoperative days (POD) 1, 3, 5, and 7. The drains were removed according to the management rule. The achievement rate of the rule, postoperative adverse events, hospital stay, medical costs, and predictive value for reoperation according to the rule were validated. The rule was achieved in 255 of 274 (93.1 %) patients. The drain removal time was significantly shorter [3 days (1-30) vs. 7 (2-105), p < 0.01], drain fluid infection was less frequent [4 patients (1.5 %) vs. 58 (18.4 %), p < 0.01], postoperative hospital stay was shorter [11 days (6-73) vs. 16 (9-59), p = 0.04], and medical costs were decreased [1453 USD (968-6859) vs. 1847 (4667-9498), p < 0.01] in the validation set compared with the test set. Five patients who required reoperation were predicted by the drain-based information and treated within 2 days after operation. Our 3 × 3 rule is clinically feasible and allows for the early removal of the drain tube with minimum infection risk after liver resection.
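The decision logic of the rule itself is simple enough to express directly. The sketch below encodes only the threshold stated in the abstract (drain fluid bilirubin below 3 mg/dl on postoperative day 3); the surrounding clinical workflow is deliberately omitted.

# Minimal sketch of the "3 x 3" drain-management decision described above.
# The threshold comes from the abstract; everything else is simplified.
def remove_drain_on_pod3(drain_bilirubin_mg_dl: float, threshold: float = 3.0) -> bool:
    """Return True if the drain can be removed on postoperative day 3."""
    return drain_bilirubin_mg_dl < threshold

for value in (1.2, 2.9, 4.5):
    action = "remove drain" if remove_drain_on_pod3(value) else "keep drain and reassess"
    print(f"POD 3 bilirubin {value} mg/dl -> {action}")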
TIE: An Ability Test of Emotional Intelligence
Śmieja, Magdalena; Orzechowski, Jarosław; Stolarski, Maciej S.
2014-01-01
The Test of Emotional Intelligence (TIE) is a new ability scale based on a theoretical model that defines emotional intelligence as a set of skills responsible for the processing of emotion-relevant information. Participants are provided with descriptions of emotional problems, and asked to indicate which emotion is most probable in a given situation, or to suggest the most appropriate action. Scoring is based on the judgments of experts: professional psychotherapists, trainers, and HR specialists. The validation study showed that the TIE is a reliable and valid test, suitable for both scientific research and individual assessment. Its internal consistency measures were as high as .88. In line with the theoretical model of emotional intelligence, the results of the TIE shared about 10% of common variance with a general intelligence test, and were independent of major personality dimensions. PMID:25072656
Folmsbee, Martha
2015-01-01
Approximately 97% of filter validation tests result in the demonstration of absolute retention of the test bacteria, and thus sterile filter validation failure is rare. However, while Brevundimonas diminuta (B. diminuta) penetration of sterilizing-grade filters is rarely detected, the observation that some fluids (such as vaccines and liposomal fluids) may lead to an increased incidence of bacterial penetration of sterilizing-grade filters by B. diminuta has been reported. The goal of the following analysis was to identify important drivers of filter validation failure in these rare cases. The identification of these drivers will hopefully serve the purpose of assisting in the design of commercial sterile filtration processes with a low risk of filter validation failure for vaccine, liposomal, and related fluids. Filter validation data for low-surface-tension fluids was collected and evaluated with regard to the effect of bacterial load (CFU/cm(2)), bacterial load rate (CFU/min/cm(2)), volume throughput (mL/cm(2)), and maximum filter flux (mL/min/cm(2)) on bacterial penetration. The data set (∼1162 individual filtrations) included all instances of process-specific filter validation failures performed at Pall Corporation, including those using other filter media, but did not include all successful retentive filter validation bacterial challenges. It was neither practical nor necessary to include all filter validation successes worldwide (Pall Corporation) to achieve the goals of this analysis. The percentage of failed filtration events for the selected total master data set was 27% (310/1162). Because it is heavily weighted with penetration events, this percentage is considerably higher than the actual rate of failed filter validations, but, as such, facilitated a close examination of the conditions that lead to filter validation failure. In agreement with our previous reports, two of the significant drivers of bacterial penetration identified were the total bacterial load and the bacterial load rate. In addition to these parameters, another three possible drivers of failure were also identified: volume throughput, maximum filter flux, and pressure. Of the data for which volume throughput information was available, 24% (249/1038) of the filtrations resulted in penetration. However, for the volume throughput range of 680-2260 mL/cm(2), only 9 out of 205 bacterial challenges (∼4%) resulted in penetration. Of the data for which flux information was available, 22% (212/946) resulted in bacterial penetration. However, in the maximum filter flux range from 7 to 18 mL/min/cm(2), only one out of 121 filtrations (0.6%) resulted in penetration. A slight increase in filter failure was observed in filter bacterial challenges with a differential pressure greater than 30 psid. When designing a commercial process for the sterile filtration of a low-surface-tension fluid (or any other potentially high-risk fluid), targeting the volume throughput range of 680-2260 mL/cm(2) or flux range of 7-18 mL/min/cm(2), and maintaining the differential pressure below 30 psid, could significantly decrease the risk of validation filter failure. However, it is important to keep in mind that these are general trends described in this study and some test fluids may not conform to the general trends described here. 
Ultimately, it is important to evaluate both filterability and bacterial retention of the test fluid under proposed process conditions prior to finalizing the manufacturing process to ensure successful process-specific filter validation of low-surface-tension fluids. An overwhelming majority of process-specific filter validation (qualification) tests result in the demonstration of absolute retention of test bacteria by sterilizing-grade membrane filters. As such, process-specific filter validation failure is rare. However, while bacterial penetration of sterilizing-grade filters during process-specific filter validation is rarely detected, some fluids (such as vaccines and liposomal fluids) have been associated with an increased incidence of bacterial penetration. The goal of the following analysis was to identify important drivers of process-specific filter validation failure. The identification of these drivers will possibly serve to assist in the design of commercial sterile filtration processes with a low risk of filter validation failure. Filter validation data for low-surface-tension fluids was collected and evaluated with regard to bacterial concentration and rates, as well as filtered fluid volume and rate (Pall Corporation). The master data set (∼1160 individual filtrations) included all recorded instances of process-specific filter validation failures but did not include all successful filter validation bacterial challenge tests. This allowed for a close examination of the conditions that lead to process-specific filter validation failure. As previously reported, two significant drivers of bacterial penetration were identified: the total bacterial load (the total number of bacteria per filter) and the bacterial load rate (the rate at which bacteria were applied to the filter). In addition to these parameters, another three possible drivers of failure were also identified: volumetric throughput, filter flux, and pressure. When designing a commercial process for the sterile filtration of a low-surface-tension fluid (or any other penetrative-risk fluid), targeting the identified bacterial challenge loads, volume throughput, and corresponding flux rates could decrease, and possibly eliminate, the risk of validation filter failure. However, it is important to keep in mind that these are general trends described in this study and some test fluids may not conform to the general trends described here. Ultimately, it is important to evaluate both filterability and bacterial retention of the test fluid under proposed process conditions prior to finalizing the manufacturing process to ensure successful filter validation of low-surface-tension fluids. © PDA, Inc. 2015.
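The kind of windowed comparison described above (penetration rates inside versus outside an operating range) can be illustrated as follows. This is not Pall's analysis; the filtration records and penetration rates are simulated, and only the quoted throughput window of 680-2260 mL/cm(2) is taken from the text.

# Illustrative sketch: compare penetration rates inside vs. outside an
# operating window for volume throughput. All records are simulated.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
throughput = rng.uniform(100, 4000, size=n)                   # mL/cm^2 per filtration
in_window = (throughput >= 680) & (throughput <= 2260)
penetrated = rng.random(n) < np.where(in_window, 0.04, 0.30)  # assumed penetration rates

print(f"penetration rate inside window:  {penetrated[in_window].mean():.1%}")
print(f"penetration rate outside window: {penetrated[~in_window].mean():.1%}")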
Measuring theory of mind in children. Psychometric properties of the ToM Storybooks.
Blijd-Hoogewys, E M A; van Geert, P L C; Serra, M; Minderaa, R B
2008-11-01
Although research on Theory-of-Mind (ToM) is often based on single task measurements, more comprehensive instruments result in a better understanding of ToM development. The ToM Storybooks is a new instrument measuring basic ToM-functioning and associated aspects. There are 34 tasks, tapping various emotions, beliefs, desires and mental-physical distinctions. Four studies on the validity and reliability of the test are presented, in typically developing children (n = 324, 3-12 years) and children with PDD-NOS (n = 30). The ToM Storybooks have good psychometric qualities. A component analysis reveals five components corresponding with the underlying theoretical constructs. The internal consistency, test-retest reliability, inter-rater reliability, construct validity and convergent validity are good. The ToM Storybooks can be used in research as well as in clinical settings.
Redley, Bernice; Waugh, Rachael
2018-04-01
Nurse bedside handover quality is influenced by complex interactions related to the content, processes used and the work environment. Audit tools are seldom tested in 'real' settings. To examine the reliability, validity and usability of a quality improvement tool for audit of nurse bedside handover. Naturalistic, descriptive, mixed-methods. Six inpatient wards at a single large not-for-profit private health service in Victoria, Australia. Five nurse experts and 104 nurses involved in 199 change-of-shift bedside handovers. A focus group with experts and a pilot test were used to examine content and face validity, and usability of the handover audit tool. The tool was examined for inter-rater reliability and usability using observation audits of handovers across six wards. Data were collected in 2013-2014. Two independent observers for 72 audits demonstrated acceptable inter-observer agreement for 27 (77%) items. Reliability was weak for items examining the handover environment. Seventeen items were not observed, reflecting gaps in practices. Across 199 observation audits, gaps in nurse bedside handover practice most often related to process and environment, rather than content items. Usability was impacted by high observer burden, familiarity and non-specific illustrative behaviours. The reliability and validity of most items to audit handover content was acceptable. Gaps in practices for process and environment items were identified. Context specific exemplars and reducing the items used at each handover audit can enhance usability. Further research is needed to develop context specific exemplars and undertake additional reliability testing using a wide range of handover settings. Copyright © 2017 Elsevier Inc. All rights reserved.
Pezzei, Cornelia K; Schönbichler, Stefan A; Hussain, Shah; Kirchler, Christian G; Huck-Pezzei, Verena A; Popp, Michael; Krolitzek, Justine; Bonn, Günther K; Huck, Christian W
2018-04-01
In this study, novel near-infrared and attenuated total reflectance mid-infrared spectroscopic methods coupled with multivariate data analysis were established enabling the determination of thymol, rosmarinic acid, and the antioxidant capacity of Thymi herba. A new high-performance liquid chromatography method and UV-Vis spectroscopy were applied as reference methods. Partial least squares regressions were carried out as cross and test set validations. To reduce systematic errors, different data pretreatments, such as multiplicative scatter correction, 1st derivative, or 2nd derivative, were applied to the spectra. The performances of the two infrared spectroscopic techniques were evaluated and compared. In general, attenuated total reflectance mid-infrared spectroscopy demonstrated a slightly better predictive power (thymol: coefficient of determination = 0.93, factors = 3, ratio of performance to deviation = 3.94; rosmarinic acid: coefficient of determination = 0.91, factors = 3, ratio of performance to deviation = 3.35; antioxidant capacity: coefficient of determination = 0.87, factors = 2, ratio of performance to deviation = 2.80; test set validation) than near-infrared spectroscopy (thymol: coefficient of determination = 0.90, factors = 6, ratio of performance to deviation = 3.10; rosmarinic acid: coefficient of determination = 0.92, factors = 6, ratio of performance to deviation = 3.61; antioxidant capacity: coefficient of determination = 0.91, factors = 6, ratio of performance to deviation = 3.42; test set validation). The capability of infrared vibrational spectroscopy as a quick and simple analytical tool to replace conventional time- and chemical-consuming analyses for the quality control of T. herba could be demonstrated. Georg Thieme Verlag KG Stuttgart · New York.
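As an illustration of this chemometric workflow, the sketch below applies a second-derivative pretreatment and a partial least squares regression to simulated spectra, then reports the coefficient of determination and the ratio of performance to deviation on a held-out test set. All spectra, reference values, and model settings are assumptions for the example, not the study's data.

# Sketch: 2nd-derivative pretreatment + PLS regression with test-set validation.
# Spectra, thymol values and the number of factors are simulated assumptions.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 80
wavenumbers = np.linspace(0.0, 1.0, 300)
peak = np.exp(-((wavenumbers - 0.4) ** 2) / 0.002)                    # analyte absorption band
thymol = rng.uniform(0.5, 2.5, size=n_samples)                        # fake reference values (%)
baseline = rng.uniform(0.0, 0.5, size=(n_samples, 1)) * wavenumbers   # varying sloped baselines
spectra = np.outer(thymol, peak) + baseline + rng.normal(scale=0.01, size=(n_samples, 300))

# Pretreatment: Savitzky-Golay second derivative to suppress the baseline
spectra_d2 = savgol_filter(spectra, window_length=11, polyorder=2, deriv=2, axis=1)

X_cal, X_test, y_cal, y_test = train_test_split(spectra_d2, thymol, test_size=0.3, random_state=0)
pls = PLSRegression(n_components=3).fit(X_cal, y_cal)
y_pred = pls.predict(X_test).ravel()

rmsep = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"R^2 = {r2_score(y_test, y_pred):.2f}, RPD = {y_test.std(ddof=1) / rmsep:.2f}")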
Wang, Li-Ju; Naudé, Nicole; Demissie, Misganaw; Crivaro, Anne; Kamoun, Malek; Wang, Ping; Li, Lei
2018-07-01
Most mobile health (mHealth) diagnostic devices for laboratory tests only analyze one sample at a time, which is not suitable for large volume serology testing, especially in low-resource settings with shortage of health professionals. In this study, we developed an ultra-low-cost clinically-accurate mobile phone microplate reader (mReader), and clinically validated this optical device for 12 infectious disease tests. The mReader optically reads 96 samples on a microplate at one time. 771 de-identified patient samples were tested for 12 serology assays for bacterial/viral infections. The mReader and the clinical instrument blindly read and analyzed all tests in parallel. The analytical accuracy and the diagnostic performance of the mReader were evaluated across the clinical reportable categories by comparison with clinical laboratorial testing results. The mReader exhibited 97.59-99.90% analytical accuracy and <5% coefficient of variation (CV). The positive percent agreement (PPA) in all 12 tests achieved 100%, negative percent agreement (NPA) was higher than 83% except for one test (42.86%), and overall percent agreement (OPA) ranged 89.33-100%. We envision the mReader can benefit underserved areas/populations and low-resource settings in rural clinics/hospitals at a low cost (~$50 USD) with clinical-level analytical quality. It has the potential to improve health access, speed up healthcare delivery, and reduce health disparities and education disparities by providing access to a low-cost spectrophotometer. Copyright © 2018 Elsevier B.V. All rights reserved.
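The agreement figures quoted above (PPA, NPA, OPA) are computed from a two-by-two table of device results against the clinical instrument. A minimal sketch with illustrative counts:

# Positive, negative and overall percent agreement from a 2x2 table.
# The counts below are illustrative only, not the study's results.
def percent_agreement(both_pos, ref_pos_dev_neg, ref_neg_dev_pos, both_neg):
    total = both_pos + ref_pos_dev_neg + ref_neg_dev_pos + both_neg
    ppa = both_pos / (both_pos + ref_pos_dev_neg) * 100      # agreement on reference positives
    npa = both_neg / (both_neg + ref_neg_dev_pos) * 100      # agreement on reference negatives
    opa = (both_pos + both_neg) / total * 100                # overall agreement
    return ppa, npa, opa

ppa, npa, opa = percent_agreement(both_pos=40, ref_pos_dev_neg=0, ref_neg_dev_pos=3, both_neg=25)
print(f"PPA = {ppa:.1f}%, NPA = {npa:.1f}%, OPA = {opa:.1f}%")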
The Pareidolia Test: A Simple Neuropsychological Test Measuring Visual Hallucination-Like Illusions.
Mamiya, Yasuyuki; Nishio, Yoshiyuki; Watanabe, Hiroyuki; Yokoi, Kayoko; Uchiyama, Makoto; Baba, Toru; Iizuka, Osamu; Kanno, Shigenori; Kamimura, Naoto; Kazui, Hiroaki; Hashimoto, Mamoru; Ikeda, Manabu; Takeshita, Chieko; Shimomura, Tatsuo; Mori, Etsuro
2016-01-01
Visual hallucinations are a core clinical feature of dementia with Lewy bodies (DLB), and this symptom is important in the differential diagnosis and prediction of treatment response. The pareidolia test is a tool that evokes visual hallucination-like illusions, and these illusions may be a surrogate marker of visual hallucinations in DLB. We created a simplified version of the pareidolia test and examined its validity and reliability to establish the clinical utility of this test. The pareidolia test was administered to 52 patients with DLB, 52 patients with Alzheimer's disease (AD) and 20 healthy controls (HCs). We assessed the test-retest/inter-rater reliability using the intra-class correlation coefficient (ICC) and the concurrent validity using the Neuropsychiatric Inventory (NPI) hallucinations score as a reference. A receiver operating characteristic (ROC) analysis was used to evaluate the sensitivity and specificity of the pareidolia test to differentiate DLB from AD and HCs. The pareidolia test required approximately 15 minutes to administer, exhibited good test-retest/inter-rater reliability (ICC of 0.82), and moderately correlated with the NPI hallucinations score (rs = 0.42). Using an optimal cut-off score set according to the ROC analysis, the pareidolia test differentiated DLB from AD with a sensitivity of 81% and a specificity of 92%. Our study suggests that the simplified version of the pareidolia test is a valid and reliable surrogate marker of visual hallucinations in DLB.
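One common way to derive an optimal cut-off from an ROC analysis is the Youden index; the abstract does not state which criterion was used, so the sketch below is a generic illustration with simulated scores rather than the study's data.

# Sketch: ROC curve and Youden-index cut-off for a diagnostic score.
# Diagnoses and pareidolia counts are simulated placeholders.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
is_dlb = np.r_[np.ones(52, dtype=int), np.zeros(72, dtype=int)]     # 52 DLB vs. 52 AD + 20 HC
pareidolia_count = np.r_[rng.poisson(6, 52), rng.poisson(1, 72)]    # illusory responses per subject

fpr, tpr, thresholds = roc_curve(is_dlb, pareidolia_count)
best = np.argmax(tpr - fpr)                                         # Youden index J = sens + spec - 1
print(f"cut-off = {thresholds[best]}, sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f}")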
Thermo-mechanical evaluation of carbon-carbon primary structure for SSTO vehicles
NASA Astrophysics Data System (ADS)
Croop, Harold C.; Lowndes, Holland B.; Hahn, Steven E.; Barthel, Chris A.
1998-01-01
An advanced development program to demonstrate carbon-carbon composite structure for use as primary load carrying structure has entered the experimental validation phase. The component being evaluated is a wing torque box section for a single-stage-to-orbit (SSTO) vehicle. The validation or demonstration component features an advanced carbon-carbon design incorporating 3D woven graphite preforms, integral spars, oxidation inhibited matrix, chemical vapor deposited (CVD) oxidation protection coating, and ceramic matrix composite fasteners. The validation component represents the culmination of a four phase design and fabrication development effort. Extensive developmental testing was performed to verify material properties and integrity of basic design features before committing to fabrication of the full scale box. The wing box component is now being set up for testing in the Air Force Research Laboratory Structural Test Facility at Wright-Patterson Air Force Base, Ohio. One of the important developmental tests performed in support of the design and planned testing of the full scale box was the fabrication and test of a skin/spar trial subcomponent. The trial subcomponent incorporated critical features of the full scale wing box design. This paper discusses the results of the trial subcomponent test which served as a pathfinder for the upcoming full scale box test.
Assessment of NDE reliability data
NASA Technical Reports Server (NTRS)
Yee, B. G. W.; Couchman, J. C.; Chang, F. H.; Packman, D. F.
1975-01-01
Twenty sets of relevant nondestructive test (NDT) reliability data were identified, collected, compiled, and categorized. A criterion for the selection of data for statistical analysis considerations was formulated, and a model to grade the quality and validity of the data sets was developed. Data input formats, which record the pertinent parameters of the defect/specimen and inspection procedures, were formulated for each NDE method. A comprehensive computer program was written and debugged to calculate the probability of flaw detection at several confidence limits by the binomial distribution. This program also selects the desired data sets for pooling and tests the statistical pooling criteria before calculating the composite detection reliability. An example of the calculated reliability of crack detection in bolt holes by an automatic eddy current method is presented.
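A hit/miss POD estimate with a binomial lower confidence bound, in the spirit of the analysis described above, can be sketched as follows. The hit counts are illustrative, and the Clopper-Pearson form shown here is one common choice rather than necessarily the one implemented in the original program.

# Binomial probability-of-detection sketch with a one-sided lower confidence
# bound (Clopper-Pearson form). Counts are illustrative, not the NDT data.
from scipy.stats import beta

def pod_lower_bound(hits: int, trials: int, confidence: float = 0.95) -> float:
    """One-sided lower confidence bound on the detection probability."""
    if hits == 0:
        return 0.0
    return beta.ppf(1.0 - confidence, hits, trials - hits + 1)

hits, trials = 28, 29   # e.g. 28 detections out of 29 cracks in one size class
print(f"point estimate = {hits / trials:.3f}")
for conf in (0.90, 0.95):
    print(f"{conf:.0%} lower bound = {pod_lower_bound(hits, trials, conf):.3f}")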
Papadakaki, Maria; Prokopiadou, Dimitra; Petridou, Eleni; Kogevinas, Manolis; Lionis, Christos
2012-06-01
The current article aims to translate the PREMIS (Physician Readiness to Manage Intimate Partner Violence) survey into the Greek language and test its validity and reliability in a sample of primary care physicians. The validation study was conducted in 2010 and involved all the general practitioners serving two adjacent prefectures of Greece (n = 80). Maximum-likelihood factor analysis (MLF) was used to extract key survey factors. The instrument was further assessed for the following psychometric properties: (a) scale reliability, (b) item-specific reliability, (c) test-retest reliability, (d) scale construct validity, and (e) internal predictive validity. The MLF analysis of 23 opinion items revealed a seven-factor solution (preparation, constraint, workplace issues, screening, self-efficacy, alcohol/drugs, victim understanding), which was statistically sound (p = .293). Most of the newly derived scales displayed satisfactory internal consistency (α ≥ .60), high item-specific reliability, strong construct and internal predictive validity (F = 2.82; p = .004), and high repeatability when retested with 20 individuals (intraclass correlation coefficient [ICC] > .70). The tool was found appropriate to facilitate the identification of competence deficits and the evaluation of training initiatives.
Beattie, James R.; Cummins, Niamh M.; Caraher, Clare; O’Driscoll, Olive M.; Bansal, Aruna T.; Eastell, Richard; Ralston, Stuart H.; Stone, Michael D.; Pearson, Gill; Towler, Mark R.
2016-01-01
Raman spectroscopy was applied to nail clippings from 633 postmenopausal British and Irish women, from six clinical sites, of whom 42% had experienced a fragility fracture. The objective was to build a prediction algorithm for fracture using data from four sites (known as the calibration set) and test its performance using data from the other two sites (known as the validation set). Results from the validation set showed that a novel algorithm, combining spectroscopy data with clinical data, provided area under the curve (AUC) of 74% compared to an AUC of 60% from a reduced QFracture score (a clinically accepted risk calculator) and 61% from the dual-energy X-ray absorptiometry T-score, which is in current use for the diagnosis of osteoporosis. Raman spectroscopy should be investigated further as a noninvasive tool for the early detection of enhanced risk of fragility fracture. PMID:27429561
Determination of butter adulteration with margarine using Raman spectroscopy.
Uysal, Reyhan Selin; Boyaci, Ismail Hakki; Genis, Hüseyin Efe; Tamer, Ugur
2013-12-15
In this study, adulteration of butter with margarine was analysed using Raman spectroscopy combined with chemometric methods (principal component analysis (PCA), principal component regression (PCR), partial least squares (PLS)) and artificial neural networks (ANNs). Different butter and margarine samples were mixed at various concentrations ranging from 0% to 100% w/w. PCA analysis was applied for the classification of butters, margarines and mixtures. PCR, PLS and ANN were used for the detection of adulteration ratios of butter. Models were created using a calibration data set and developed models were evaluated using a validation data set. The coefficient of determination (R(2)) values between actual and predicted values obtained for PCR, PLS and ANN for the validation data set were 0.968, 0.987 and 0.978, respectively. In conclusion, a combination of Raman spectroscopy with chemometrics and ANN methods can be applied for testing butter adulteration. Copyright © 2013 Elsevier Ltd. All rights reserved.
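Principal component regression, one of the chemometric methods listed above, can be sketched as a PCA step followed by ordinary least squares. The spectra and margarine fractions below are simulated, so the numbers do not correspond to the study's results.

# Sketch: principal component regression (PCR) for an adulteration ratio.
# Mixture spectra and fractions are simulated placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
fraction = rng.uniform(0.0, 1.0, size=60)                   # margarine fraction, 0-100% w/w
pure_butter = np.sin(np.linspace(0, 6, 200))                # fake pure-component spectra
pure_margarine = np.cos(np.linspace(0, 6, 200))
spectra = np.outer(1 - fraction, pure_butter) + np.outer(fraction, pure_margarine)
spectra += rng.normal(scale=0.02, size=spectra.shape)       # measurement noise

cal, val = slice(0, 40), slice(40, 60)                      # calibration / validation split
pcr = make_pipeline(PCA(n_components=4), LinearRegression()).fit(spectra[cal], fraction[cal])
print(f"validation R^2 = {r2_score(fraction[val], pcr.predict(spectra[val])):.3f}")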
Inter‐station intensity standardization for whole‐body MR data
Staring, Marius; Reijnierse, Monique; Lelieveldt, Boudewijn P. F.; van der Geest, Rob J.
2016-01-01
Purpose To develop and validate a method for performing inter‐station intensity standardization in multispectral whole‐body MR data. Methods Different approaches for mapping the intensity of each acquired image stack into the reference intensity space were developed and validated. The registration strategies included: “direct” registration to the reference station (Strategy 1), “progressive” registration to the neighboring stations without (Strategy 2), and with (Strategy 3) using information from the overlap regions of the neighboring stations. For Strategy 3, two regularized modifications were proposed and validated. All methods were tested on two multispectral whole‐body MR data sets: a multiple myeloma patients data set (48 subjects) and a whole‐body MR angiography data set (33 subjects). Results For both data sets, all strategies showed significant improvement of intensity homogeneity with respect to vast majority of the validation measures (P < 0.005). Strategy 1 exhibited the best performance, closely followed by Strategy 2. Strategy 3 and its modifications were performing worse, in majority of the cases significantly (P < 0.05). Conclusions We propose several strategies for performing inter‐station intensity standardization in multispectral whole‐body MR data. All the strategies were successfully applied to two types of whole‐body MR data, and the “direct” registration strategy was concluded to perform the best. Magn Reson Med 77:422–433, 2017. © 2016 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine PMID:26834001
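As a heavily simplified illustration of using overlap information for intensity standardization (not the registration-based strategies of the paper), one can estimate a linear gain and offset from voxels in the overlap slab shared with an already-standardized neighboring station and apply it to the whole station. All arrays below are simulated.

# Simplified sketch: fit gain/offset from overlap voxels, apply to the station.
# This is an illustrative stand-in, not the authors' method; data are simulated.
import numpy as np

def fit_linear_intensity_map(moving_overlap, reference_overlap):
    """Least-squares gain and offset so that gain*moving + offset ~ reference."""
    a = np.column_stack([moving_overlap.ravel(), np.ones(moving_overlap.size)])
    gain, offset = np.linalg.lstsq(a, reference_overlap.ravel(), rcond=None)[0]
    return gain, offset

rng = np.random.default_rng(0)
reference_overlap = rng.uniform(50, 400, size=(8, 64, 64))            # neighbor station, overlap slab
moving_overlap = 0.7 * reference_overlap - 12 + rng.normal(0, 3, reference_overlap.shape)
moving_station = 0.7 * rng.uniform(50, 400, size=(40, 64, 64)) - 12   # full station to standardize

gain, offset = fit_linear_intensity_map(moving_overlap, reference_overlap)
standardized_station = gain * moving_station + offset
print(f"gain = {gain:.3f}, offset = {offset:.2f}")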
NDARC - NASA Design and Analysis of Rotorcraft Validation and Demonstration
NASA Technical Reports Server (NTRS)
Johnson, Wayne
2010-01-01
Validation and demonstration results from the development of the conceptual design tool NDARC (NASA Design and Analysis of Rotorcraft) are presented. The principal tasks of NDARC are to design a rotorcraft to satisfy specified design conditions and missions, and then analyze the performance of the aircraft for a set of off-design missions and point operating conditions. The aircraft chosen as NDARC development test cases are the UH-60A single main-rotor and tail-rotor helicopter, the CH-47D tandem helicopter, the XH-59A coaxial lift-offset helicopter, and the XV-15 tiltrotor. These aircraft were selected because flight performance data, a weight statement, detailed geometry information, and a correlated comprehensive analysis model are available for each. Validation consists of developing the NDARC models for these aircraft by using geometry and weight information, airframe wind tunnel test data, engine decks, rotor performance tests, and comprehensive analysis results; and then comparing the NDARC results for aircraft and component performance with flight test data. Based on the calibrated models, the capability of the code to size rotorcraft is explored.
Three controversies over item disclosure in medical licensure examinations.
Park, Yoon Soo; Yang, Eunbae B
2015-01-01
In response to views on the public's right to know, there is growing attention to item disclosure - release of items, answer keys, and performance data to the public - in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including South Korea, with prior discussions among North American countries dating over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations - 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure - by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration.
MAVRIC Flutter Model Transonic Limit Cycle Oscillation Test
NASA Technical Reports Server (NTRS)
Edwards, John W.; Schuster, David M.; Spain, Charles V.; Keller, Donald F.; Moses, Robert W.
2001-01-01
The Models for Aeroelastic Validation Research Involving Computation semi-span wind-tunnel model (MAVRIC-I), a business jet wing-fuselage flutter model, was tested in NASA Langley's Transonic Dynamics Tunnel with the goal of obtaining experimental data suitable for Computational Aeroelasticity code validation at transonic separation onset conditions. This research model is notable for its inexpensive construction and instrumentation installation procedures. Unsteady pressures and wing responses were obtained for three wingtip configurations of clean, tipstore, and winglet. Traditional flutter boundaries were measured over the range of M = 0.6 to 0.9 and maps of Limit Cycle Oscillation (LCO) behavior were made in the range of M = 0.85 to 0.95. Effects of dynamic pressure and angle-of-attack were measured. Testing in both R134a heavy gas and air provided unique data on Reynolds number, transition effects, and the effect of speed of sound on LCO behavior. The data set provides excellent code validation test cases for the important class of flow conditions involving shock-induced transonic flow separation onset at low wing angles, including LCO behavior.
Parpa, Efi; Galanopoulou, Natasa; Tsilika, Eleni; Galanos, Antonis; Mystakidou, Kyriaki
2017-08-01
To investigate the psychometric properties of the Greek 13-item measure of patients' satisfaction (FAMCARE-P13) in a palliative care setting. A hundred patients completed the FAMCARE-P13. Exploratory factor analysis and confirmatory factor analysis (CFA) were conducted. A two-factor solution was revealed by the CFA. The questionnaire was administered to an initial validation sample and then for test-retest in a sample of 40 patients 3 days later. The Rosenberg Self-Esteem Scale measuring global self-esteem was also used as a gold standard for construct validity. Subscale and known-groups validity were also tested for the FAMCARE-P13. A reduced 13-item version of our measure (FAMCARE-P13) possessed a two-factor structure with high reliability. Patient satisfaction was correlated with physical distress, communication and relationship with health-care providers, and caregiver satisfaction. We recommend the use of the Greek FAMCARE-P13 to assess care satisfaction of patients with advanced-stage cancer.
Geostatistics as a validation tool for setting ozone standards for durum wheat.
De Marco, Alessandra; Screpanti, Augusto; Paoletti, Elena
2010-02-01
Which is the best standard for protecting plants from ozone? To answer this question, we must validate the standards by testing biological responses vs. ambient data in the field. A validation is missing for European and USA standards, because the networks for ozone, meteorology and plant responses are spatially independent. We proposed geostatistics as a validation tool, and used durum wheat in central Italy as a test. The standards summarized ozone impact on yield better than hourly averages. Although USA criteria explained ozone-induced yield losses better than European criteria, the USA legal level (75 ppb) protected only 39% of sites. European exposure-based standards protected ≥90%. Reducing the USA level to the Canadian 65 ppb or using W126 protected 91% and 97%, respectively. For a no-threshold accumulated stomatal flux, 22 mmol m(-2) was suggested to protect 97% of sites. In a multiple regression, precipitation explained 22% and ozone explained <0.9% of yield variability. Copyright (c) 2009 Elsevier Ltd. All rights reserved.
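The exposure indices referred to above can be computed from hourly ozone concentrations. The sketch below uses the commonly cited definitions of AOT40 (accumulated exposure over a 40 ppb threshold) and the sigmoidally weighted W126, with the accumulation windows (daylight hours, growing season) simplified away; the hourly values are simulated.

# Sketch: AOT40 and W126 ozone exposure indices from hourly concentrations.
# Hourly ozone values are simulated; accumulation windows are simplified.
import numpy as np

def aot40_ppb_h(hourly_ppb):
    """AOT40: sum of hourly exceedances above 40 ppb (ppb*h)."""
    hourly_ppb = np.asarray(hourly_ppb, dtype=float)
    return np.clip(hourly_ppb - 40.0, 0.0, None).sum()

def w126_ppm_h(hourly_ppb):
    """W126: sigmoidally weighted cumulative exposure (ppm*h)."""
    c_ppm = np.asarray(hourly_ppb, dtype=float) / 1000.0
    weights = 1.0 / (1.0 + 4403.0 * np.exp(-126.0 * c_ppm))
    return (weights * c_ppm).sum()

rng = np.random.default_rng(0)
ozone_ppb = rng.gamma(shape=8, scale=6, size=90 * 12)   # 90 days x 12 daylight hours
print(f"AOT40 = {aot40_ppb_h(ozone_ppb):.0f} ppb*h, W126 = {w126_ppm_h(ozone_ppb):.1f} ppm*h")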
Walters, Stephen John; Stern, Cindy; Robertson-Malt, Suzanne
2016-04-01
There is a growing call by consumers and governments for healthcare to adopt systems and approaches to care to improve patient safety. Collaboration within healthcare settings is an important factor for improving systems of care. By using validated measurement instruments a standardized approach to assessing collaboration is possible, otherwise it is only an assumption that collaboration is occurring in any healthcare setting. The objective of this review was to evaluate and compare measurement properties of instruments that measure collaboration within healthcare settings, specifically those which have been psychometrically tested and validated. Participants could be healthcare professionals, the patient or any non-professional who contributes to a patient's care, for example, family members, chaplains or orderlies. The term participant type means the designation of any one participant; for example 'nurse', 'social worker' or 'administrator'. More than two participant types was mandatory. The focus of this review was the validity of tools used to measure collaboration within healthcare settings. The types of studies considered for inclusion were validation studies, but quantitative study designs such as randomized controlled trials, controlled trials and case studies were also eligible for inclusion. Studies that focused on Interprofessional Education, were published as an abstract only, contained patient self-reporting only or were not about care delivery were excluded. The outcome of interest was validation and interpretability of the instrument being assessed and included content validity, construct validity and reliability. Interpretability is characterized by statistics such as mean and standard deviation which can be translated to a qualitative meaning. The search strategy aimed to find both published and unpublished studies. A three-step search strategy was utilized in this review. The databases searched included PubMed, CINAHL, Embase, Cochrane Central Register of Controlled Trials, Emerald Fulltext, MD Consult Australia, PsycARTICLES, Psychology and Behavioural Sciences Collection, PsycINFO, Informit Health Databases, Scopus, UpToDate and Web of Science. The search for unpublished studies included EThOS (Electronic Thesis Online Service), Index to Theses and ProQuest- Dissertations and Theses. The assessment of methodological quality of the included studies was undertaken using the COSMIN checklist which is a validated tool that assesses the process of design and validation of healthcare measurement instruments. An Excel spreadsheet version of COSMIN was developed for data collection which included a worksheet for extracting participant characteristics and interpretability data. Statistical pooling of data was not possible for this review. Therefore, the findings are presented in a narrative form including tables and figures to aid in data presentation. To make a synthesis of the assessments of methodological quality of the different studies, each instrument was rated by accounting for the number of studies performed with an instrument, the appraisal of methodological quality and the consistency of results between studies. Twenty-one studies of 12 instruments were included in the review. The studies were diverse in their theoretical underpinnings, target population/setting and measurement objectives. 
Measurement objectives included: investigating beliefs, behaviors, attitudes, perceptions and relationships associated with collaboration; measuring collaboration between different levels of care or within a multi-rater/target group; assessing collaboration across teams; or assessing internal participation of both teams and patients.Studies produced validity or interpretability data but none of the studies assessed all validity and reliability properties. However, most of the included studies produced a factor structure or referred to prior factor analysis. A narrative synthesis of the individual study factor structures was generated consisting of nine headings: organizational settings, support structures, purpose and goals; communication; reflection on process; cooperation; coordination; role interdependence and partnership; relationships; newly created professional activities; and professional flexibility. Among the many instruments that measure collaboration within healthcare settings, the quality of each instrument varies; instruments are designed for specific populations and purposes, and are validated in various settings. Selecting an instrument requires careful consideration of the qualities of each. Therefore, referring to systematic reviews of measurement properties of instruments may be helpful to clinicians or researchers in instrument selection. Systematic reviews of measurement properties of instruments are valuable in aiding in instrument selection. This systematic review may be useful in instrument selection for the measurement of collaboration within healthcare settings with a complex mix of participant types. Evaluating collaboration provides important information on the strengths and limitations of different healthcare settings and the opportunities for continuous improvement via any remedial actions initiated. Development of a tool that can be used to measure collaboration within teams of healthcare professionals and non-professionals is important for practice. The use of different statistical modelling techniques, such as Item Response Theory modelling and the translation of models into Computer Adaptive Tests, may prove useful. Measurement equivalence is an important consideration for future instrument development and validation. Further development of the COSMIN tool should include appraisal for measurement equivalence. Researchers developing and validating measurement tools should consider multi-method research designs.
Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won
2014-01-01
Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumor in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that the DDB2, RRM2B and GADD45A, genes related to the repair and damage prevention of DNA, were consistently up-regulated for genotoxic carcinogens. Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed understanding of the perturbation of significant pathways.
Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won
2014-01-01
Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumor in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that the DDB2, RRM2B and GADD45A, genes related to the repair and damage prevention of DNA, were consistently up-regulated for genotoxic carcinogens. Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed understanding of the perturbation of significant pathways. PMID:24497971
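Prediction analysis for microarrays (PAM) is a nearest shrunken centroid classifier; a rough stand-in can be built with scikit-learn's NearestCentroid and a shrink threshold, as sketched below. The expression matrix, labels, and threshold are simulated assumptions, not the study's data or the original PAM implementation.

# Sketch: nearest shrunken centroid classification (PAM-like) of genotoxic vs.
# non-genotoxic profiles. All expression data and labels are simulated.
import numpy as np
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_train, n_test, n_genes = 12, 22, 500
labels_train = np.array([1] * 6 + [0] * 6)          # 1 = genotoxic, 0 = non-genotoxic
labels_test = rng.integers(0, 2, size=n_test)
signal = np.r_[np.ones(20), np.zeros(n_genes - 20)]  # 20 differentially expressed genes

expr_train = rng.normal(size=(n_train, n_genes)) + labels_train[:, None] * signal
expr_test = rng.normal(size=(n_test, n_genes)) + labels_test[:, None] * signal

clf = make_pipeline(StandardScaler(), NearestCentroid(shrink_threshold=0.3))
clf.fit(expr_train, labels_train)
accuracy = (clf.predict(expr_test) == labels_test).mean()
print(f"test-set accuracy = {accuracy:.2f}")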