On the Validity of Useless Tests
ERIC Educational Resources Information Center
Sireci, Stephen G.
2016-01-01
A misconception exists that validity may refer only to the "interpretation" of test scores and not to the "uses" of those scores. The development and evolution of validity theory illustrate test score interpretation was a primary focus in the earliest days of modern testing, and that validating interpretations derived from test…
Validating Test Score Meaning and Defending Test Score Use: Different Aims, Different Methods
ERIC Educational Resources Information Center
Cizek, Gregory J.
2016-01-01
Advances in validity theory and alacrity in validation practice have suffered because the term "validity" has been used to refer to two incompatible concerns: (1) the degree of support for specified interpretations of test scores (i.e. intended score meaning) and (2) the degree of support for specified applications (i.e. intended test…
How Is Testing Supposed to Improve Schooling?
ERIC Educational Resources Information Center
Haertel, Edward
2013-01-01
Validation research for educational achievement tests is often limited to an examination of intended test score interpretations. This article calls for an expansion of validation research in three dimensions. First, validation must attend to actual test use and its consequences, not just score meaning. Second, validation must attend to unintended…
Moore, Tyler M.; Reise, Steven P.; Roalf, David R.; Satterthwaite, Theodore D.; Davatzikos, Christos; Bilker, Warren B.; Port, Allison M.; Jackson, Chad T.; Ruparel, Kosha; Savitt, Adam P.; Baron, Robert B.; Gur, Raquel E.; Gur, Ruben C.
2016-01-01
Traditional “paper-and-pencil” testing is imprecise in measuring speed and hence limited in assessing performance efficiency, but computerized testing permits precision in measuring itemwise response time. We present a method of scoring performance efficiency (combining information from accuracy and speed) at the item level. Using a community sample of 9,498 youths age 8-21, we calculated item-level efficiency scores on four neurocognitive tests, and compared the concurrent, convergent, discriminant, and predictive validity of these scores to simple averaging of standardized speed and accuracy-summed scores. Concurrent validity was measured by the scores' abilities to distinguish men from women and their correlations with age; convergent and discriminant validity were measured by correlations with other scores inside and outside of their neurocognitive domains; predictive validity was measured by correlations with brain volume in regions associated with the specific neurocognitive abilities. Results provide support for the ability of itemwise efficiency scoring to detect signals as strong as those detected by standard efficiency scoring methods. We find no evidence of superior validity of the itemwise scores over traditional scores, but point out several advantages of the former. The itemwise efficiency scoring method shows promise as an alternative to standard efficiency scoring methods, with overall moderate support from tests of four different types of validity. This method allows the use of existing item analysis methods and provides the convenient ability to adjust the overall emphasis of accuracy versus speed in the efficiency score, thus adjusting the scoring to the real-world demands the test is aiming to fulfill. PMID:26866796
ERIC Educational Resources Information Center
Lowe, Patricia A.; Papanastasiou, Elena C.; DeRuyck, Kimberly A.; Reynolds, Cecil R.
2005-01-01
In this study, the authors investigated the temporal stability and construct validity of the Adult Manifest Anxiety Scale-College Version (AMAS-C; C. R. Reynolds, B. O. Richmond, & P. A. Lowe, 2003b) scores. Results indicated that the AMAS-C scores had adequate to excellent test score stability, and evidence supported the construct validity of the…
ERIC Educational Resources Information Center
Wu, Amery D.; Stone, Jake E.
2016-01-01
This article explores an approach for test score validation that examines test takers' strategies for taking a reading comprehension test. The authors formulated three working hypotheses about score validity pertaining to three types of test-taking strategy (comprehending meaning, test management, and test-wiseness). These hypotheses were…
ERIC Educational Resources Information Center
George-Ezzelle, Carol E.; Skaggs, Gary
2004-01-01
Current testing standards call for test developers to provide evidence that testing procedures and test scores, and the inferences made based on the test scores, show evidence of validity and are comparable across subpopulations (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on…
ERIC Educational Resources Information Center
Zimmermann, Judith; von Davier, Alina A.; Buhmann, Joachim M.; Heinimann, Hans R.
2018-01-01
Graduate admission has become a critical process in tertiary education, whereby selecting valid admissions instruments is key. This study assessed the validity of Graduate Record Examination (GRE) General Test scores for admission to Master's programmes at a technical university in Europe. We investigated the indicative value of GRE scores for the…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rades, Dirk, E-mail: Rades.Dirk@gmx.net; Dziggel, Liesa; Haatanen, Tiina
2011-07-15
Purpose: To create and validate scoring systems for intracerebral control (IC) and overall survival (OS) of patients irradiated for brain metastases. Methods and Materials: In this study, 1,797 patients were randomly assigned to the test (n = 1,198) or the validation group (n = 599). Two scoring systems were developed, one for IC and another for OS. The scores included prognostic factors found significant on multivariate analyses. Age, performance status, extracerebral metastases, interval tumor diagnosis to RT, and number of brain metastases were associated with OS. Tumor type, performance status, interval, and number of brain metastases were associated with IC. The score for each factor was determined by dividing the 6-month IC or OS rate (given in percent) by 10. The total score represented the sum of the scores for each factor. The score groups of the test group were compared with the corresponding score groups of the validation group. Results: In the test group, 6-month IC rates were 17% for 14-18 points, 49% for 19-23 points, and 77% for 24-27 points (p < 0.0001). IC rates in the validation group were 19%, 52%, and 77%, respectively (p < 0.0001). In the test group, 6-month OS rates were 9% for 15-19 points, 41% for 20-25 points, and 78% for 26-30 points (p < 0.0001). OS rates in the validation group were 7%, 39%, and 79%, respectively (p < 0.0001). Conclusions: Patients irradiated for brain metastases can be given scores to estimate OS and IC. IC and OS rates of the validation group were similar to the test group demonstrating the validity and reproducibility of both scores.
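The scoring rule in the abstract above (each factor's 6-month rate in percent divided by 10, then summed over factors) can be sketched in a few lines. This is only an illustration: the function names, the example rates, and the round-half-up step are assumptions, since the abstract does not say how fractional points were handled.

```python
import math


def factor_score(six_month_rate_percent: float) -> int:
    """Points for one prognostic factor: 6-month rate (in percent) divided by 10.

    Rounding half up to a whole point is an assumption; the abstract only
    states the division by 10.
    """
    return math.floor(six_month_rate_percent / 10 + 0.5)


def total_score(rates_percent: list[float]) -> int:
    """Total score: the sum of the per-factor points."""
    return sum(factor_score(rate) for rate in rates_percent)


# Hypothetical 6-month rates for three factors of one patient (not study data).
example_rates = [17.0, 49.0, 77.0]
print(total_score(example_rates))
```

A patient's total would then be compared against the point ranges reported for each prognostic group.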
Test Takers and the Validity of Score Interpretations
ERIC Educational Resources Information Center
Kopriva, Rebecca J.; Thurlow, Martha L.; Perie, Marianne; Lazarus, Sheryl S.; Clark, Amy
2016-01-01
This article argues that test takers are as integral to determining validity of test scores as defining target content and conditioning inferences on test use. A principled sustained attention to how students interact with assessment opportunities is essential, as is a principled sustained evaluation of evidence confirming the validity or calling…
ERIC Educational Resources Information Center
Sinharay, Sandip; Feng, Ying; Saldivia, Luis; Powers, Donald E.; Ginuta, Anthony; Simpson, Annabelle; Weng, Vincent
2008-01-01
The validity of TOEIC Bridge™ scores as a measure of English language skill was examined from the standpoint of a unified concept of test validity. In this study, more than 6,000 test takers in 3 Latin American countries (Chile, Colombia, and Ecuador) took 1 form of the TOEIC Bridge test, and their scores were compared to additional information…
Validation of the Narrowing Beam Walking Test in Lower Limb Prosthesis Users.
Sawers, Andrew; Hafner, Brian
2018-04-11
To evaluate the content, construct, and discriminant validity of the Narrowing Beam Walking Test (NBWT), a performance-based balance test for lower limb prosthesis users. Cross-sectional study. Research laboratory and prosthetics clinic. Unilateral transtibial and transfemoral prosthesis users (N=40). Not applicable. Content validity was examined by quantifying the percentage of participants receiving maximum or minimum scores (ie, ceiling and floor effects). Convergent construct validity was examined using correlations between participants' NBWT scores and scores or times on existing clinical balance tests regularly administered to lower limb prosthesis users. Known-groups construct validity was examined by comparing NBWT scores between groups of participants with different fall histories, amputation levels, amputation etiologies, and functional levels. Discriminant validity was evaluated by analyzing the area under each test's receiver operating characteristic (ROC) curve. No minimum or maximum scores were recorded on the NBWT. NBWT scores demonstrated strong correlations (ρ=.70‒.85) with scores/times on performance-based balance tests (timed Up and Go test, Four Square Step Test, and Berg Balance Scale) and a moderate correlation (ρ=.49) with the self-report Activities-specific Balance Confidence scale. NBWT performance was significantly lower among participants with a history of falls (P=.003), transfemoral amputation (P=.011), and a lower mobility level (P<.001). The NBWT also had the largest area under the ROC curve (.81) and was the only test to exhibit an area that was statistically significantly >.50 (ie, chance). The results provide strong evidence of content, construct, and discriminant validity for the NBWT as a performance-based test of balance ability. The evidence supports its use to assess balance impairments and fall risk in unilateral transtibial and transfemoral prosthesis users. Copyright © 2018 American Congress of Rehabilitation Medicine. 
Published by Elsevier Inc. All rights reserved.
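As a rough illustration of the ROC analysis mentioned in the abstract above, the area under the ROC curve can be computed as the proportion of (case, non-case) pairs in which the case has the higher score, with ties counted as one half. The function name and the toy scores are hypothetical, and for a test where lower scores indicate higher risk the scores would need to be negated first.

```python
def roc_auc(case_scores: list[float], control_scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (case, control) pairs in which the case scores
    higher; tied pairs count as 0.5. A value of 0.5 corresponds to chance.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy data only: perfectly separated groups, then fully overlapping groups.
print(roc_auc([3, 4], [1, 2]))
print(roc_auc([1, 2], [1, 2]))
```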
Dahlke, Jeffrey A; Kostal, Jack W; Sackett, Paul R; Kuncel, Nathan R
2018-05-03
We explore potential explanations for validity degradation using a unique predictive validation data set containing up to four consecutive years of high school students' cognitive test scores and four complete years of those students' college grades. This data set permits analyses that disentangle the effects of predictor-score age and timing of criterion measurements on validity degradation. We investigate the extent to which validity degradation is explained by criterion dynamism versus the limited shelf-life of ability scores. We also explore whether validity degradation is attributable to fluctuations in criterion variability over time and/or GPA contamination from individual differences in course-taking patterns. Analyses of multiyear predictor data suggest that changes to the determinants of performance over time have much stronger effects on validity degradation than does the shelf-life of cognitive test scores. The age of predictor scores had only a modest relationship with criterion-related validity when the criterion measurement occasion was held constant. Practical implications and recommendations for future research are discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Does Test Preparation Work? Implications for Score Validity
ERIC Educational Resources Information Center
Xie, Qin
2013-01-01
This article reports an empirical study that examined the pattern of test preparation for College English Test Band 4 (CET4) and the differential effects of test preparation practices on its scores, thereby drawing implications for CET4 score validity. Data collection involved 1,003 test takers of CET4. A pretest was administered at the beginning…
ERIC Educational Resources Information Center
Sahin, Füsun
2017-01-01
Examining the testing processes, as well as the scores, is needed for a complete understanding of validity and fairness of computer-based assessments. Examinees' rapid-guessing and insufficient familiarity with computers have been found to be major issues that weaken the validity arguments of scores. This study has three goals: (a) improving…
The Validity of IQ Scores Derived from Readiness Screening Tests
ERIC Educational Resources Information Center
Telegdy, Gabriel A.
1976-01-01
The Screening Test of Academic Readiness (STAR) and the Peabody Picture Vocabulary Test (PPVT) were administered to 52 kindergarten children to reveal the convergent validity of IQ scores derived from the STAR. The findings raise doubts about the validity of the deviation IQs derived from the STAR. (Author)
Validity Semantics in Educational and Psychological Assessment
ERIC Educational Resources Information Center
Hathcoat, John D.
2013-01-01
The semantics, or meaning, of validity is a fluid concept in educational and psychological testing. Contemporary controversies surrounding this concept appear to stem from the proper location of validity. Under one view, validity is a property of score-based inferences and entailed uses of test scores. This view is challenged by the…
Validity and Reliability of Baseline Testing in a Standardized Environment.
Higgins, Kathryn L; Caze, Todd; Maerlender, Arthur
2017-08-11
The Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) is a computerized neuropsychological test battery commonly used to determine cognitive recovery from concussion based on comparing post-injury scores to baseline scores. This model is based on the premise that ImPACT baseline test scores are a valid and reliable measure of optimal cognitive function at baseline. Growing evidence suggests that this premise may not be accurate and a large contributor to invalid and unreliable baseline test scores may be the protocol and environment in which baseline tests are administered. This study examined the effects of a standardized environment and administration protocol on the reliability and performance validity of athletes' baseline test scores on ImPACT by comparing scores obtained in two different group-testing settings. Three hundred sixty-one Division 1 cohort-matched collegiate athletes' baseline data were assessed using a variety of indicators of potential performance invalidity; internal reliability was also examined. Thirty-one to thirty-nine percent of the baseline cases had at least one indicator of low performance validity, but there were no significant differences in validity indicators based on environment in which the testing was conducted. Internal consistency reliability scores were in the acceptable to good range, with no significant differences between administration conditions. These results suggest that athletes may be reliably performing at levels lower than their best effort would produce. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Hawkins, Melanie; Elsworth, Gerald R; Osborne, Richard H
2018-07-01
Data from subjective patient-reported outcome measures (PROMs) are now being used in the health sector to make or support decisions about individuals, groups and populations. Contemporary validity theorists define validity not as a statistical property of the test but as the extent to which empirical evidence supports the interpretation of test scores for an intended use. However, validity testing theory and methodology are rarely evident in the PROM validation literature. Application of this theory and methodology would provide structure for comprehensive validation planning to support improved PROM development and sound arguments for the validity of PROM score interpretation and use in each new context. This paper proposes the application of contemporary validity theory and methodology to PROM validity testing. The validity testing principles will be applied to a hypothetical case study with a focus on the interpretation and use of scores from a translated PROM that measures health literacy (the Health Literacy Questionnaire or HLQ). Although robust psychometric properties of a PROM are a pre-condition to its use, a PROM's validity lies in the sound argument that a network of empirical evidence supports the intended interpretation and use of PROM scores for decision making in a particular context. The health sector is yet to apply contemporary theory and methodology to PROM development and validation. The theoretical and methodological processes in this paper are offered as an advancement of the theory and practice of PROM validity testing in the health sector.
Lonsdale, Chris; Hodge, Ken; Rose, Elaine A
2008-06-01
The purpose of the four studies described in this article was to develop and test a new measure of competitive sport participants' intrinsic motivation, extrinsic motivation, and amotivation (self-determination theory; Deci & Ryan, 1985). The items for the new measure, named the Behavioral Regulation in Sport Questionnaire (BRSQ), were constructed using interviews, expert review, and pilot testing. Analyses supported the internal consistency, test-retest reliability, and factorial validity of the BRSQ scores. Nomological validity evidence was also supportive, as BRSQ subscale scores were correlated in the expected pattern with scores derived from measures of motivational consequences. When directly compared with scores derived from the Sport Motivation Scale (SMS; Pelletier, Fortier, Vallerand, Tuson, & Blais, 1995) and a revised version of that questionnaire (SMS-6; Mallett, Kawabata, Newcombe, Otero-Forero, & Jackson, 2007), BRSQ scores demonstrated equal or superior reliability and factorial validity as well as better nomological validity.
Commentary on "Validating the Interpretations and Uses of Test Scores"
ERIC Educational Resources Information Center
Brennan, Robert L.
2013-01-01
Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…
ERIC Educational Resources Information Center
Sawyer, Richard
2013-01-01
Correlational evidence suggests that high school GPA is better than admission test scores in predicting first-year college GPA, although test scores have incremental predictive validity. The usefulness of a selection variable in making admission decisions depends in part on its predictive validity, but also on institutions' selectivity and…
Translation and validation of the Dutch new Knee Society Scoring System ©.
Van Der Straeten, Catherine; Witvrouw, Erik; Willems, Tine; Bellemans, Johan; Victor, Jan
2013-11-01
A new version of The Knee Society Knee Scoring System(©) (KSS) has recently been developed. Before this scale can be used in non-English-speaking populations, it has to be translated and validated for a particular population. We evaluated the construct and content validity, the test-retest reliability, and the internal consistency of the Dutch version of the New Knee Society KSS. A Dutch translation was performed using a forward-backward translation protocol. We tested the construct validity of the Dutch New KSS by comparing it with the Dutch versions of the WOMAC, Knee Injury and Osteoarthritis Outcome Score (KOOS), and SF-12 scores in 137 patients undergoing total knee arthroplasty (TKA). Content validity was assessed by comparing pre- and postoperative scores and by checking floor and ceiling effects. To evaluate test-retest reliability and consistency, 47 patients completed the questionnaire a second time with a mean of 8 days interval (range, 2-20 days) between tests. Construct validity was demonstrated because the Dutch New KSS correlated well with the Dutch WOMAC (r = -0.751; p < 0.001), Dutch KOOS (r = -0.723; p < 0.001), and Dutch SF-12 (r = 0.569; p < 0.001). There was a significant difference between pre- and postoperative scores (p < 0.001) in line with the other scores. Test-retest reliability proved excellent with an intraclass correlation coefficient between 0.73 and 0.92 depending on the domain tested. Consistency as indicated by Cronbach's alpha ranging from 0.84 to 0.96 was good to excellent. As demonstrated by the validation procedure, the Dutch New KSS is an excellent instrument to evaluate TKA outcome in Dutch-speaking patients.
Bernabeu-Mora, Roberto; Medina-Mirapeix, Françesc; Llamazares-Herrán, Eduardo; García-Guillamón, Gloria; Giménez-Giménez, Luz María; Sánchez-Nieto, Juan Miguel
2015-01-01
Background Limited mobility is a risk factor for developing chronic obstructive pulmonary disease (COPD)-related disabilities. Little is known about the validity of the Short Physical Performance Battery (SPPB) for identifying mobility limitations in patients with COPD. Objective To determine the clinical validity of the SPPB summary score and its three components (standing balance, 4-meter gait speed, and five-repetition sit-to-stand) for identifying mobility limitations in patients with COPD. Methods This cross-sectional study included 137 patients with COPD, recruited from a hospital in Spain. Muscle strength tests and SPPB were measured; then, patients were surveyed for self-reported mobility limitations. The validity of SPPB scores was analyzed by developing receiver operating characteristic curves to analyze the sensitivity and specificity for identifying patients with mobility limitations; by examining group differences in SPPB scores across categories of mobility activities; and by correlating SPPB scores to strength tests. Results Only the SPPB summary score and the five-repetition sit-to-stand components showed good discriminative capabilities; both showed areas under the receiver operating characteristic curves greater than 0.7. Patients with limitations had significantly lower SPPB scores than patients without limitations in nine different mobility activities. SPPB scores were moderately correlated with the quadriceps test (r>0.40), and less correlated with the handgrip test (r<0.30), which reinforced convergent and divergent validities. A SPPB summary score cutoff of 10 provided the best accuracy for identifying mobility limitations. Conclusion This study provided evidence for the validity of the SPPB summary score and the five-repetition sit-to-stand test for assessing mobility in patients with COPD. These tests also showed potential as a screening test for identifying patients with COPD that have mobility limitations. PMID:26664110
Wells, Erica L; Kofler, Michael J; Soto, Elia F; Schaefer, Hillary S; Sarver, Dustin E
2018-01-01
Pediatric ADHD is associated with impairments in working memory, but these deficits often go undetected when using clinic-based tests such as digit span backward. The current study pilot-tested minor administration/scoring modifications to improve digit span backward's construct and predictive validities in a well-characterized sample of children with ADHD. WISC-IV digit span was modified to administer all trials (i.e., ignore discontinue rule) and count digits rather than trials correct. Traditional and modified scores were compared to a battery of criterion working memory (construct validity) and academic achievement tests (predictive validity) for 34 children with ADHD ages 8-13 (M=10.41; 11 girls). Traditional digit span backward scores failed to predict working memory or KTEA-2 achievement (all ns). Alternate administration/scoring of digit span backward significantly improved its associations with working memory reordering (r=.58), working memory dual-processing (r=.53), working memory updating (r=.28), and KTEA-2 achievement (r=.49). Consistent with prior work, these findings urge caution when interpreting digit span performance. Minor test modifications may address test validity concerns, and should be considered in future test revisions. Digit span backward becomes a valid measure of working memory at exactly the point that testing is traditionally discontinued. Copyright © 2017 Elsevier Ltd. All rights reserved.
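The difference between the two scoring approaches in the abstract above can be made concrete with a small sketch. It rests on several assumptions not stated in the abstract: each item has two trials, the traditional discontinue rule stops after both trials of an item are failed, and "digits correct" means digits reproduced in the correct serial position. All names and the response data are hypothetical.

```python
from typing import List, Sequence, Tuple

# One trial: (expected backward sequence, the child's actual response).
Trial = Tuple[Sequence[int], Sequence[int]]
# One item: two trials of the same span length.
Item = Tuple[Trial, Trial]


def trials_correct(items: List[Item]) -> int:
    """Traditional scoring: one point per perfect trial, discontinuing after
    an item on which both trials are failed (assumed discontinue rule)."""
    score = 0
    for item in items:
        points = sum(1 for expected, actual in item if list(expected) == list(actual))
        score += points
        if points == 0:
            break
    return score


def digits_correct(items: List[Item]) -> int:
    """Modified scoring: administer every trial and count digits recalled in
    the correct serial position (position matching is an assumption)."""
    return sum(
        sum(1 for e, a in zip(expected, actual) if e == a)
        for item in items
        for expected, actual in item
    )


# Hypothetical protocol: both 3-digit trials failed, but partial recall later.
protocol: List[Item] = [
    (([2, 1], [2, 1]), ([4, 3], [4, 3])),
    (([5, 3, 9], [5, 9, 3]), ([7, 2, 8], [7, 8, 2])),
    (([1, 6, 4, 2], [1, 6, 4, 2]), ([9, 3, 5, 8], [9, 3, 8, 5])),
]
print(trials_correct(protocol), digits_correct(protocol))
```

In this toy protocol the traditional score stops accumulating at the failed three-digit item, while the modified score still credits the partially and fully correct trials administered afterwards.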
Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J
2014-05-01
Recent guidelines advocate sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. Criterion validity testing relative to a gold standard and 7 day test-retest reliability. University biomechanics laboratory. Thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores were determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
Narin, Selnur; Unver, Bayram; Bakırhan, Serkan; Bozan, Ozgür; Karatosun, Vasfi
2014-01-01
The purpose of this study was to adapt the English version of the Hospital for Special Surgery (HSS) knee score for use in a Turkish population and to evaluate its validity, reliability and cultural adaptation. Standard forward-back translation of the HSS knee score was performed and the Turkish version was applied in 73 patients. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Mini-Mental State Examination and sit-to-stand test were also performed and analyzed. Internal consistency reliability was tested using Cronbach's alpha. The intraclass correlation coefficient (ICC) was used to calculate the test-retest reliability at one-week intervals. Validity was assessed by calculating the Pearson correlation between the HSS, WOMAC and sit-to-stand test scores. The ICC ranged from 0.98 to 0.99 with high internal consistency (Cronbach's alpha: 0.87). The WOMAC score correlated with total HSS score (r: -0.80, p<0.001) and sit-to-stand score (r: 0.12, p: 0.312). The Turkish version of the HSS knee score is reliable and valid in evaluating the total knee arthroplasty in Turkish patients.
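For readers unfamiliar with the internal consistency statistic reported in the abstract above, Cronbach's alpha can be computed from a respondents-by-items score table as k/(k-1) × (1 − sum of item variances / variance of total scores). The sketch below uses only the Python standard library; the function name and the toy data are illustrative and are not taken from the study.

```python
from statistics import variance
from typing import List


def cronbach_alpha(rows: List[List[float]]) -> float:
    """Cronbach's alpha for a table with one row per respondent and one
    column per item, using sample variances."""
    k = len(rows[0])  # number of items
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three items that rise together across four respondents.
toy_scores = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [4, 5, 6],
]
print(cronbach_alpha(toy_scores))
```

Because the toy items move in lockstep, alpha comes out at its upper bound; real questionnaire data would give a lower value.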
Rater Cognition: Implications for Validity
ERIC Educational Resources Information Center
Bejar, Isaac I.
2012-01-01
The scoring process is critical in the validation of tests that rely on constructed responses. Documenting that readers carry out the scoring in ways consistent with the construct and measurement goals is an important aspect of score validity. In this article, rater cognition is approached as a source of support for a validity argument for scores…
Shifting the Focus of Validity for Test Use
ERIC Educational Resources Information Center
Moss, Pamela A.
2016-01-01
The conventional focus of validity in educational measurement has been on intended interpretations and uses of test scores. Empirical studies of test use by teachers, administrators and policy-makers show that actual interpretations and uses of test scores in context are invariably shaped by local users' questions, which frequently require…
A contemporary approach to validity arguments: a practical guide to Kane's framework.
Cook, David A; Brydges, Ryan; Ginsburg, Shiphra; Hatala, Rose
2015-06-01
Assessment is central to medical education and the validation of assessments is vital to their use. Earlier validity frameworks suffer from a multiplicity of types of validity or failure to prioritise among sources of validity evidence. Kane's framework addresses both concerns by emphasising key inferences as the assessment progresses from a single observation to a final decision. Evidence evaluating these inferences is planned and presented as a validity argument. We aim to offer a practical introduction to the key concepts of Kane's framework that educators will find accessible and applicable to a wide range of assessment tools and activities. All assessments are ultimately intended to facilitate a defensible decision about the person being assessed. Validation is the process of collecting and interpreting evidence to support that decision. Rigorous validation involves articulating the claims and assumptions associated with the proposed decision (the interpretation/use argument), empirically testing these assumptions, and organising evidence into a coherent validity argument. Kane identifies four inferences in the validity argument: Scoring (translating an observation into one or more scores); Generalisation (using the score[s] as a reflection of performance in a test setting); Extrapolation (using the score[s] as a reflection of real-world performance), and Implications (applying the score[s] to inform a decision or action). Evidence should be collected to support each of these inferences and should focus on the most questionable assumptions in the chain of inference. Key assumptions (and needed evidence) vary depending on the assessment's intended use or associated decision. Kane's framework applies to quantitative and qualitative assessments, and to individual tests and programmes of assessment. Validation focuses on evaluating the key claims, assumptions and inferences that link assessment scores with their intended interpretations and uses. The Implications and associated decisions are the most important inferences in the validity argument. © 2015 John Wiley & Sons Ltd.
Rades, Dirk; Dziggel, Liesa; Nagy, Viorica; Segedin, Barbara; Lohynska, Radka; Veninga, Theo; Khoa, Mai T; Trang, Ngo T; Schild, Steven E
2013-07-01
Survival scores for patients with brain metastasis exist. However, the treatment regimens used to create these scores were heterogeneous. This study aimed to develop and validate a survival score in homogeneously treated patients. Eight hundred and eighty-two patients receiving 10 × 3Gy of WBRT alone were randomly assigned to a test group (N=441) or a validation group (N=441). In the multivariate analysis of the test group, age, performance status, extracranial metastasis, and systemic treatment prior to WBRT were independent predictors of survival. The score for each factor was determined by dividing the 6-month survival rate (in %) by 10. Scores were summed and total scores ranged from 6 to 19 points. Patients were divided into four prognostic groups. The 6-month survival rates were 4% for 6-9 points, 29% for 10-14 points, 62% for 15-17 points, and 93% for 18-19 points (p<0.001) in the test group. The survival rates were 3%, 28%, 54% and 96%, respectively (p<0.001) in the validation group. Since the 6-month survival rates in the validation group were very similar to those in the test group, this new score (WBRT-30) appears valid and reproducible. It can help in making treatment choices and stratifying patients in future trials. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
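The scoring rule in this abstract (points per factor equal the 6-month survival rate in percent divided by 10, summed over the four factors and mapped to a prognostic group) can be sketched roughly as below. The survival rates in the example are hypothetical placeholders rather than values from the study, rounding to whole points is an assumption, and the top group is taken here as 18 points and above.

```python
def factor_points(six_month_survival_pct: float) -> int:
    """Points for one factor level: 6-month survival rate (%) / 10.

    Rounding half up to a whole point is an assumption of this sketch.
    """
    return int(six_month_survival_pct / 10 + 0.5)


def total_score(survival_rates_pct: list[float]) -> int:
    """Sum the points over the prognostic factors (four in the abstract)."""
    return sum(factor_points(rate) for rate in survival_rates_pct)


def prognostic_group(total: int) -> str:
    """Map a total score to one of four groups (boundaries assumed as noted)."""
    if total <= 9:
        return "6-9"
    if total <= 14:
        return "10-14"
    if total <= 17:
        return "15-17"
    return "18+"


# Hypothetical 6-month survival rates (%) for one patient's factor levels.
example_rates = [12, 35, 40, 28]
example_total = total_score(example_rates)
example_group = prognostic_group(example_total)
```

With these placeholder rates the factor points are 1, 4, 4 and 3, so the patient would fall in the 10-14 point group.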
Validity of Alternative Cut-Off Scores for the Back-Saver Sit and Reach Test
ERIC Educational Resources Information Center
Looney, Marilyn A.; Gilbert, Jennie
2012-01-01
The purpose of the study was to determine if currently used FITNESSGRAM[R] cut-off scores for the Back Saver Sit and Reach Test had the best criterion-referenced validity evidence for 6-12 year old children. Secondary analyses of an existing data set focused on the passive straight leg raise and Back Saver Sit and Reach Test flexibility scores of…
The Michigan Alcoholism Screening Test (MAST): A Statistical Validation Analysis
ERIC Educational Resources Information Center
Laux, John M.; Newman, Isadore; Brown, Russ
2004-01-01
This study extends the Michigan Alcoholism Screening Test (MAST; M. L. Selzer, 1971) literature base by examining 4 issues related to the validity of the MAST scores. Specifically, the authors examine the validity of the MAST scores in light of the presence of impression management, participant demographic variables, and item endorsement…
Interactional Competence: Challenges for Validity.
ERIC Educational Resources Information Center
Young, Richard F.
One of the ways in which language testing interfaces with applied linguistics is in the definition and validation of the constructs that underlie language tests. When language testers and score users interpret scores on a test, they do so by implicit and explicit reference to the construct on which the test is based. Equally, when applied to new…
Xiao, Yuan-mei; Wang, Zhi-ming; Wang, Mian-zhen; Lan, Ya-jia
2005-06-01
To test the reliability and validity of two mental workload assessment scales, the subjective workload assessment technique (SWAT) and the NASA task load index (NASA-TLX). One thousand two hundred and sixty-eight mental workers were sampled from various occupations, such as scientific research, education, administration and medicine, by randomized cluster sampling. Re-test reliability, split-half reliability, Cronbach's alpha coefficient and correlation coefficients between item scores and total score were used to assess reliability. Validity testing covered structural validity. The re-test reliability coefficients of the two scales and their items ranged from 0.516 to 0.753 (P < 0.01), indicating that the two scales had good re-test reliability. The split-half reliability of SWAT was 0.645, its Cronbach's alpha coefficient was more than 0.80, and all correlation coefficients between its item scores and total score were more than 0.70. For NASA-TLX, both the split-half reliability and Cronbach's alpha coefficient were more than 0.80, and the correlation coefficients between its item scores and total score were all more than 0.60 (P < 0.01) except for the performance item. Both scales had good internal consistency. The Pearson correlation coefficient between the two scales was 0.492 (P < 0.01), implying that the results of the two scales had good consistency. Factor analysis showed that the two scales had good structural validity. Both SWAT and NASA-TLX have good reliability and validity and may be used as valid tools to assess mental workload in China after being revised properly.
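For readers unfamiliar with the internal-consistency statistic reported here, the following is a minimal sketch of Cronbach's alpha using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The response matrix is a made-up toy example, not data from the study.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores[r][i] is respondent r's score on item i.
    """
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three respondents, three items that move together perfectly.
toy_scores = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
]
toy_alpha = cronbach_alpha(toy_scores)
```

Because the toy items are perfectly consistent, alpha comes out at 1; real scale data would give lower values, such as those above 0.80 reported in the abstract.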
Measuring leprosy-related stigma - a pilot study to validate a toolkit of instruments.
Rensen, Carin; Bandyopadhyay, Sudhakar; Gopal, Pala K; Van Brakel, Wim H
2011-01-01
Stigma negatively affects the quality of life of leprosy-affected people. Instruments are needed to assess levels of stigma and to monitor and evaluate stigma reduction interventions. We conducted a validation study of such instruments in Tamil Nadu and West Bengal, India. Four instruments were tested in a 'Community Based Rehabilitation' (CBR) setting, the Participation Scale, Internalised Scale of Mental Illness (ISMI) adapted for leprosy-affected persons, Explanatory Model Interview Catalogue (EMIC) for leprosy-affected and non-affected persons and the General Self-Efficacy (GSE) Scale. We evaluated the following components of validity, construct validity, internal consistency, test-retest reproducibility and reliability to distinguish between groups. Construct validity was tested by correlating instrument scores and by triangulating quantitative and qualitative findings. Reliability was evaluated by comparing levels of stigma among people affected by leprosy and community controls, and among affected people living in CBR project areas and those in non-CBR areas. For the Participation, ISMI and EMIC scores significant differences were observed between those affected by leprosy and those not affected (p = 0.0001), and between affected persons in the CBR and Control group (p < 0.05). The internal consistency of the instruments measured with Cronbach's α ranged from 0.83 to 0.96 and was very good for all instruments. Test-retest reproducibility coefficients were 0.80 for the Participation score, 0.70 for the EMIC score, 0.62 for the ISMI score and 0.50 for the GSE score. The construct validity of all instruments was confirmed. The Participation and EMIC Scales met all validity criteria, but test-retest reproducibility of the ISMI and GSE Scales needs further evaluation with a shorter test-retest interval and longer training and additional adaptations for the latter.
Derivation and Applicability of Asymptotic Results for Multiple Subtests Person-Fit Statistics
Albers, Casper J.; Meijer, Rob R.; Tendeiro, Jorge N.
2016-01-01
In high-stakes testing, it is important to check the validity of individual test scores. Although a test may, in general, result in valid test scores for most test takers, for some test takers, test scores may not provide a good description of a test taker’s proficiency level. Person-fit statistics have been proposed to check the validity of individual test scores. In this study, the theoretical asymptotic sampling distribution of two person-fit statistics that can be used for tests that consist of multiple subtests is first discussed. Second, a simulation study was conducted to investigate the applicability of this asymptotic theory for tests of finite length, in which the correlation between subtests and the number of items in the subtests were varied. The authors showed that these distributions provide reasonable approximations, even for tests consisting of subtests of only 10 items each. These results have practical value because researchers do not have to rely on extensive simulation studies to simulate sampling distributions. PMID:29881053
Shenker, Bennett S
2014-02-01
To validate a scoring system that evaluates the ability of Internet search engines to correctly predict diagnoses when symptoms are used as search terms. We developed a five point scoring system to evaluate the diagnostic accuracy of Internet search engines. We identified twenty diagnoses common to a primary care setting to validate the scoring system. One investigator entered the symptoms for each diagnosis into three Internet search engines (Google, Bing, and Ask) and saved the first five webpages from each search. Other investigators reviewed the webpages and assigned a diagnostic accuracy score. They rescored a random sample of webpages two weeks later. To validate the five point scoring system, we calculated convergent validity and test-retest reliability using Kendall's W and Spearman's rho, respectively. We used the Kruskal-Wallis test to look for differences in accuracy scores for the three Internet search engines. A total of 600 webpages were reviewed. Kendall's W for the raters was 0.71 (p<0.0001). Spearman's rho for test-retest reliability was 0.72 (p<0.0001). There was no difference in scores based on Internet search engine. We found a significant difference in scores based on the webpage's order on the Internet search engine webpage (p=0.007). Pairwise comparisons revealed higher scores in the first webpages vs. the fourth (corr p=0.009) and fifth (corr p=0.017). However, this significance was lost when creating composite scores. The five point scoring system to assess diagnostic accuracy of Internet search engines is a valid and reliable instrument. The scoring system may be used in future Internet research. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
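As a rough illustration of the agreement statistic named in this abstract, the sketch below computes Kendall's W (coefficient of concordance) from a raters-by-items matrix of ranks. It uses the basic formula without a tie correction, which is an assumption: scores on a five-point scale would normally contain ties and need the corrected form. The rank data are invented for illustration.

```python
def kendalls_w(ranks: list[list[int]]) -> float:
    """Kendall's W for m raters ranking n items (ranks 1..n, no ties).

    ranks[r][i] is the rank that rater r assigns to item i.
    """
    m = len(ranks)       # number of raters
    n = len(ranks[0])    # number of items
    rank_sums = [sum(rater[i] for rater in ranks) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Invented example: three raters in complete agreement on four items.
agree = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
w_agree = kendalls_w(agree)

# Invented example: two raters with exactly opposite rankings.
oppose = [[1, 2, 3], [3, 2, 1]]
w_oppose = kendalls_w(oppose)
```

W runs from 0 (no agreement) to 1 (complete agreement), so the reported value of 0.71 sits toward the high end of that range.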
Validity of the Mayer-Salovey-Caruso Emotional Intelligence Test: Youth Version-Research Edition
ERIC Educational Resources Information Center
Peters, Christine; Kranzler, John H.; Rossen, Eric
2009-01-01
This study examines the criterion-related validity evidence of scores on the Mayer-Salovey-Caruso Emotional Intelligence Test: Youth Version-Research Edition (MSCEIT-YV). The authors also investigate the relationship between scores on the MSCEIT-YV and chronological age. Results provide initial support for the construct validity of the MSCEIT-YV but also…
ERIC Educational Resources Information Center
Wu, Amery D.; Stone, Jake E.; Liu, Yan
2016-01-01
This article proposes and demonstrates a methodology for test score validation through abductive reasoning. It describes how abductive reasoning can be utilized in support of the claims made about test score validity. This methodology is demonstrated with a real data example of the Canadian English Language Proficiency Index Program…
Prince, Lisa K; Campbell, Ruth C; Gao, Sam W; Kendrick, Jessica; Lebrun, Christopher J; Little, Dustin J; Mahoney, David L; Maursetter, Laura A; Nee, Robert; Saddler, Mark; Watson, Maura A; Yuan, Christina M
2018-04-01
Few quantitative nephrology-specific simulations assess fellow competency. We describe the development and initial validation of a formative objective structured clinical examination (OSCE) assessing fellow competence in ordering acute dialysis. The three test scenarios were acute continuous renal replacement therapy, chronic dialysis initiation in moderate uremia and acute dialysis in end-stage renal disease-associated hyperkalemia. The test committee included five academic nephrologists and four clinically practicing nephrologists outside of academia. There were 49 test items (58 points). A passing score was 46/58 points. No item had median relevance less than 'important'. The content validity index was 0.91. Ninety-five percent of positive-point items were easy-medium difficulty. Preliminary validation was by 10 board-certified volunteers, not test committee members, a median of 3.5 years from graduation. The mean score was 49 [95% confidence interval (CI) 46-51], κ = 0.68 (95% CI 0.59-0.77), Cronbach's α = 0.84. We subsequently administered the test to 25 fellows. The mean score was 44 (95% CI 43-45); 36% passed the test. Fellows scored significantly less than validators (P < 0.001). Of evidence-based questions, 72% were answered correctly by validators and 54% by fellows (P = 0.018). Fellows and validators scored least well on the acute hyperkalemia question. In self-assessing proficiency, 71% of fellows surveyed agreed or strongly agreed that the OSCE was useful. The OSCE may be used to formatively assess fellow proficiency in three common areas of acute dialysis practice. Further validation studies are in progress.
Scarponi, Letizia; de Felicio, Claudia Maria; Sforza, Chiarella; Pimenta Ferreira, Claudia Lucia; Ginocchio, Daniela; Pizzorni, Nicole; Barozzi, Stefania; Mozzanica, Francesco; Schindler, Antonio
2018-05-30
To evaluate the reliability, validity, and responsiveness of the Italian OMES (I-OMES). The study consisted of 3 phases: (1) internal consistency and reliability, (2) validity, and (3) responsiveness analysis. The recruited population included 27 patients with orofacial myofunctional disorders (OMD) and 174 healthy volunteers. Forty-seven subjects, 18 healthy and all recruited patients with OMD were assessed for inter-rater and test-retest reliability analysis. I-OMES and Nordic Orofacial Test - Screening (NOT-S) scores of the patients were correlated for concurrent validity analysis. I-OMES scores from 27 patients with OMD and 27 age- and gender-matched healthy subjects were compared to investigate construct validity. I-OMES scores before and after successful swallowing rehabilitation in patients were compared for responsiveness analysis. Adequate internal consistency (Cronbach α = 0.71) and strong inter-rater and test-retest reliability (intraclass correlation coefficient = 0.97 and 0.98, respectively) were found. I-OMES and NOT-S scores were significantly and inversely correlated (r = -0.38). A statistically significant difference (p < 0.001) was found between the pathological group and the control group for the total I-OMES score. The mean I-OMES score improved from 90 (78-102) to 99 (89-103) after myofunctional rehabilitation (p < 0.001). The I-OMES is a reliable and valid tool to evaluate OMD. © 2018 S. Karger AG, Basel.
Oren, Carmel; Kennet-Cohen, Tamar; Turvall, Elliot; Allalouf, Avi
2014-01-01
The Psychometric Entrance Test (PET), used for admission to higher education in Israel together with the Matriculation (Bagrut), had in the past one general (total) score in which the weights for its domains (Verbal, Quantitative and English) were 2:2:1, respectively. In 2011, two additional total scores were introduced, with different weights for the Verbal and the Quantitative domains. This study compares the predictive validity of the three general scores of PET, and demonstrates validity in terms of utility. The sample comprised 100,863 freshmen students from all Israeli universities in the classes of 2005-2009. Regression weights and correlations of the predictors with FYGPA were computed. Simulations based on these results supplied the utility estimates. On average, PET is slightly more predictive than the Bagrut; using them both yields a better tool than either of them alone. Assigning differential weights to the components in the respective schools further improves the validity. The introduction of the new general scores of PET is validated by gathering and analyzing evidence based on relations of test scores to other variables. The utility of using the test can be demonstrated in ways different from correlations.
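The 2:2:1 weighting mentioned in this abstract can be illustrated with a plain weighted mean of the three domain scores. This is only a sketch: the function name and the domain scores are hypothetical, and the actual PET scaling procedure is not described in the abstract and may differ.

```python
def weighted_total(verbal: float, quantitative: float, english: float,
                   weights: tuple[float, float, float] = (2, 2, 1)) -> float:
    """Weighted mean of three domain scores (default weights 2:2:1).

    A plain weighted mean is an assumption of this sketch, not the
    documented PET scoring procedure.
    """
    w_v, w_q, w_e = weights
    weighted_sum = w_v * verbal + w_q * quantitative + w_e * english
    return weighted_sum / (w_v + w_q + w_e)


# Hypothetical domain scores for one applicant.
general_score = weighted_total(verbal=120, quantitative=130, english=110)
```

Passing a different `weights` tuple shows how shifting emphasis between the Verbal and Quantitative domains would change the composite for the same applicant.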
Guo, Jing; Lau, Ajax Hong Yin; Chau, Jack; Ng, Bobby Kin Wah; Lee, Kwong Man; Qiu, Yong; Cheng, Jack Chun Yiu; Lam, Tsz Ping
2016-10-01
"Simplified Chinese" version of Spinal Appearance Questionnaire (SC-SAQ) for patients with adolescent idiopathic scoliosis (AIS) was available but did not fit for communities using "Traditional Chinese" as their primary language. We developed a traditional Chinese version of SAQ (TC-SAQ) and evaluated its reliability and validity. TC-SAQ was administered to 112 AIS patients, of which 101 bilingual (English and Chinese) patients completed E-SAQ and the traditional Chinese version of Scoliosis Research Society-22 questionnaire (TC-SRS-22). Internal consistency and test-retest reliability were evaluated. Concurrent validity was evaluated by comparing TC-SAQ score with E-SAQ score, and convergent validity by comparing TC-SAQ score with TC-SRS-22 self-image domain score, and discriminant validity by analyzing the relationship between TC-SAQ score and patients' characteristics. Internal consistency of individual TC-SAQ domain was high (Cronbach's α = 0.785 to 0.940), except for general (Cronbach's α = 0.665) and shoulders (Cronbach's α = 0.421) domain. Test-retest reliability of TC-SAQ was good (ICCs of each domain from 0.798 to 0.865). Concurrent validity demonstrated an excellent correlation between TC-SAQ and E-SAQ scores (r = 0.820 to 0.954, P < 0.0001 for all domains). Correlation between TC-SAQ domains and TC-SRS-22 self-image domain was weak to moderate. TC-SAQ total score and individual domain scores (except waist and chest domains) were positively correlated to major curve magnitude. TC-SAQ had good internal consistency and test-retest reliability. Concurrent validity evaluated against the original English version was excellent. TC-SAQ was both reliable and valid for clinical use for AIS patients using traditional Chinese as their primary language.
Apivatgaroon, Adinun; Angthong, Chayanin; Sanguanjit, Prakasit; Chernchujit, Bancha
2016-10-01
To develop a Thai version of the Kujala score and evaluate its validity and reliability. The Thai version of the Kujala score was developed using the forward-backward translation protocol. Forty-nine PFPS patients answered the Thai versions of questionnaires including the Kujala score, Short Form-36 (SF-36) and International Knee Documentation Committee (IKDC) Subjective Knee Form. The validity between the scores was tested. The reliability was assessed using test-retest reliability and internal consistency. The Thai version of the Kujala score showed a good correlation with the Thai IKDC Subjective Knee Form (Pearson's correlation coefficient; r = 0.74: p < 0.01) and moderate correlation with the Thai SF-36 subscales of physical component summary, total score and role physical (r = 0.586, 0.571 and 0.524, respectively: p < 0.01). The test-retest reliability was excellent with an intra-class correlation coefficient of 0.908 (p < 0.001; 95% CI [0.842-0.947]). The internal consistency was strong with Cronbach's alpha of 0.952 (p < 0.001). No floor and ceiling effects were observed. The Thai version of the Kujala score has shown good validity and reliability. This score can be effectively used for evaluating Thai patients with patellofemoral pain syndrome. Implications for Rehabilitation: The Kujala score is a self-administered questionnaire for patients with patellofemoral pain syndrome (PFPS). The validity and reliability of the Thai version of Kujala are comparable with those of other versions (Turkish, Chinese and Persian). The Thai version of Kujala has been shown to have validity and reliability in Thai PFPS patients and can be used for clinical evaluation and also in research work.
[Validity criteria of a short test to assess speech and language competence in 4-year-olds].
Euler, H A; Holler-Zittlau, I; Minnen, S; Sick, U; Dux, W; Zaretsky, Y; Neumann, K
2010-11-01
A psychometrically constructed short test as a prerequisite for screening was developed on the basis of a revision of the Marburger Speech Screening to assess speech/language competence among children in Hessen (Germany). A total of 257 children (age 4.0 to 4.5 years) performed the test battery for speech/language competence; 214 children repeated the test 1 year later. Test scores correlated highly with scores of two competing language screenings (SSV, HASE) and with a combined score from four diagnostic tests of individual speech/language competences (Reynell III, patholinguistic diagnostics in impaired language development, PLAKSS, AWST-R). Validity was demonstrated by three comparisons: (1) Children with German family language had higher scores than children with another language. (2) The 3-month-older children achieved higher scores than younger children. (3) The difference between the children with German family language and those with another language was higher for the 3-month-older than for the younger children. The short test assesses the speech/language competence of 4-year-olds quickly, validly, and comprehensively.
External validation of the HIT Expert Probability (HEP) score.
Joseph, Lee; Gomes, Marcelo P V; Al Solaiman, Firas; St John, Julie; Ozaki, Asuka; Raju, Manjunath; Dhariwal, Manoj; Kim, Esther S H
2015-03-01
The diagnosis of heparin-induced thrombocytopenia (HIT) can be challenging. The HIT Expert Probability (HEP) Score has recently been proposed to aid in the diagnosis of HIT. We sought to externally and prospectively validate the HEP score. We prospectively assessed pre-test probability of HIT for 51 consecutive patients referred to our Consultative Service for evaluation of possible HIT between August 1, 2012 and February 1, 2013. Two Vascular Medicine fellows independently applied the 4T and HEP scores for each patient. Two independent HIT expert adjudicators rendered a diagnosis of HIT likely or unlikely. The median (interquartile range) of 4T and HEP scores were 4.5 (3.0, 6.0) and 5 (3.0, 8.5), respectively. There were no significant differences between area under receiver-operating characteristic curves of 4T and HEP scores against the gold standard, confirmed HIT [defined as positive serotonin release assay and positive anti-PF4/heparin ELISA] (0.74 vs 0.73, p = 0.97). HEP score ≥ 2 was 100% sensitive and 16% specific for determining the presence of confirmed HIT, while a 4T score > 3 was 93% sensitive and 35% specific. In conclusion, the HEP and 4T scores are excellent screening pre-test probability models for HIT; however, in this prospective validation study, their test characteristics for the diagnosis of HIT based on confirmatory laboratory testing and expert opinion are similar. Given the complexity of the HEP scoring model compared to that of the 4T score, further validation of the HEP score is warranted prior to widespread clinical acceptance.
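Sensitivity and specificity figures of the kind reported here come from comparing a score cutoff against the confirmed diagnosis. The sketch below shows that calculation; the scores and outcomes are invented for illustration and are not patient data from the study.

```python
def sensitivity_specificity(scores: list[float], confirmed: list[bool],
                            cutoff: float) -> tuple[float, float]:
    """Treat score >= cutoff as a positive screen and compare with truth.

    Returns (sensitivity, specificity).
    """
    tp = fn = tn = fp = 0
    for score, has_condition in zip(scores, confirmed):
        positive = score >= cutoff
        if has_condition and positive:
            tp += 1  # true positive
        elif has_condition:
            fn += 1  # false negative
        elif positive:
            fp += 1  # false positive
        else:
            tn += 1  # true negative
    return tp / (tp + fn), tn / (tn + fp)


# Invented scores and confirmed-diagnosis flags for six patients.
demo_scores = [5, 1, 3, 0, 8, 2]
demo_confirmed = [True, False, False, False, True, False]
sens, spec = sensitivity_specificity(demo_scores, demo_confirmed, cutoff=2)
```

A low cutoff catches every confirmed case in this toy data (sensitivity 1.0) at the cost of false positives (specificity 0.5), the same trade-off visible in the HEP ≥ 2 result above.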
Development and Validation of Scores from an Instrument Measuring Student Test-Taking Motivation
ERIC Educational Resources Information Center
Eklof, Hanna
2006-01-01
Using the expectancy-value model of achievement motivation as a basis, this study's purpose is to develop, apply, and validate scores from a self-report instrument measuring student test-taking motivation. Sampled evidence of construct validity for the present sample indicates that a number of the items in the instrument could be used as an…
Chowriappa, Ashirwad J; Shi, Yi; Raza, Syed Johar; Ahmed, Kamran; Stegemann, Andrew; Wilding, Gregory; Kaouk, Jihad; Peabody, James O; Menon, Mani; Hassett, James M; Kesavadas, Thenkurussi; Guru, Khurshid A
2013-12-01
A standardized scoring system does not exist in virtual reality-based assessment metrics to describe safe and crucial surgical skills in robot-assisted surgery. This study aims to develop an assessment score along with its construct validation. All subjects performed key tasks on the previously validated Fundamental Skills of Robotic Surgery curriculum, which were recorded, and metrics were stored. After an expert consensus for the purpose of content validation (Delphi), critical safety determining procedural steps were identified from the Fundamental Skills of Robotic Surgery curriculum and a hierarchical task decomposition of multiple parameters using a variety of metrics was used to develop the Robotic Skills Assessment Score (RSA-Score). Robotic Skills Assessment mainly focuses on safety in operative field, critical error, economy, bimanual dexterity, and time. Subsequently, the RSA-Score was further evaluated for construct validation and feasibility. Spearman correlation tests performed between tasks using the RSA-Scores indicate no cross correlation. Wilcoxon rank sum tests were performed between the two groups. The proposed RSA-Score was evaluated on non-robotic surgeons (n = 15) and on expert-robotic surgeons (n = 12). The expert group demonstrated significantly better performance on all four tasks in comparison to the novice group. Validation of the RSA-Score in this study was carried out on the Robotic Surgical Simulator. The RSA-Score is a valid scoring system that could be incorporated in any virtual reality-based surgical simulator to achieve standardized assessment of fundamental surgical tenets during robot-assisted surgery. Copyright © 2013 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Knapp, Deirdre J.; Pliske, Rebecca M.
A study was conducted to validate the Army's Computerized Adaptive Screening Test (CAST), using data from 2,240 applicants from 60 army recruiting stations across the nation. CAST is a computer-assisted adaptive test used to predict performance on the Armed Forces Qualification Test (AFQT). AFQT scores are computed by adding four subtest scores of…
Translation and adaptation of the fatigue severity scale for use in Portugal.
Laranjeira, Carlos António
2012-08-01
The Fatigue Severity Scale (FSS) is a widely used instrument to measure the impact of fatigue on specific types of functioning. This study aims to translate and test the reliability and validity of the Portuguese version of the FSS. The questionnaire was administered to a worker sample of 424 nurses. Reliability analysis showed satisfactory results (Cronbach's alpha coefficient = .87). The test-retest reliability was .85. The principal component analysis showed that the FSS was a measure with a one-factor structure. The construct validity of the total FSS score was assessed by correlation with Maslach Burnout Inventory (MBI) score, Depression Anxiety Stress Scale (DASS) score, and Visual Analogue Scale (VAS) score. Each of the corresponding correlation coefficients among the total FSS score and MBI score, DASS score, and perceived fatigue score (VAS) were .55 (p < .01), .62 (p < .01), and .68 (p < .01), respectively, which shows sufficient construct validity. To measure the discriminant validity of FSS, we examined the differences in scores between groups in terms of the number of hours of sleep and overtime. The less nurses slept and the longer they worked, the higher their total FSS score became. This preliminary validation study of the Portuguese version of FSS proved that it is an acceptable, reliable, and valid measure of fatigue in the working population. Copyright © 2012 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Young, John W.; Klieger, David; Bochenek, Jennifer; Li, Chen; Cline, Fred
2014-01-01
Scores from the "GRE"® revised General Test provide important information regarding the verbal and quantitative reasoning abilities and analytical writing skills of applicants to graduate programs. The validity and utility of these scores depend upon the degree to which the scores predict success in graduate and business school in…
Technical analysis of the Slosson Written Expression Test.
Erford, Bradley T; Hofler, Donald B
2004-06-01
The Slosson Written Expression Test was designed to assess students ages 8-17 years at risk for difficulties in written expression. Scores from three independent samples were used to evaluate the test's reliability and validity for measuring students' written expression. Test-retest reliability of the SWET subscales ranged from .80 to .94 (n = 151), and .95 for the Written Expression Total Standard Scores. The median alternate-form reliability for students' Written Expression Total Standard Scores was .81 across the three forms. Scores on the Slosson test yielded concurrent validity coefficients (n = 143) of .60 with scores from the Woodcock-Johnson: Tests of Achievement-Third Edition Broad Written Language Domain and .49 with scores on the Test of Written Language-Third Edition Spontaneous Writing Quotient. Exploratory factor analytic procedures suggested the Slosson test is comprised of two dimensions, Writing Mechanics and Writing Maturity (47.1% and 20.1% variance accounted for, respectively). In general, the Slosson Written Expression Test presents with sufficient technical characteristics to be considered a useful written expression screening test.
Automated smartphone audiometry: Validation of a word recognition test app.
Dewyer, Nicholas A; Jiradejvong, Patpong; Henderson Sabes, Jennifer; Limb, Charles J
2018-03-01
Develop and validate an automated smartphone word recognition test. Cross-sectional case-control diagnostic test comparison. An automated word recognition test was developed as an app for a smartphone with earphones. English-speaking adults with recent audiograms and various levels of hearing loss were recruited from an audiology clinic and were administered the smartphone word recognition test. Word recognition scores determined by the smartphone app and the gold standard speech audiometry test performed by an audiologist were compared. Test scores for 37 ears were analyzed. Word recognition scores determined by the smartphone app and audiologist testing were in agreement, with 86% of the data points within a clinically acceptable margin of error and a linear correlation value between test scores of 0.89. The WordRec automated smartphone app accurately determines word recognition scores. Level of evidence: 3b. Laryngoscope, 128:707-712, 2018. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
Prince, Lisa K; Campbell, Ruth C; Gao, Sam W; Kendrick, Jessica; Lebrun, Christopher J; Little, Dustin J; Mahoney, David L; Maursetter, Laura A; Nee, Robert; Saddler, Mark; Watson, Maura A
2018-01-01
Background: Few quantitative nephrology-specific simulations assess fellow competency. We describe the development and initial validation of a formative objective structured clinical examination (OSCE) assessing fellow competence in ordering acute dialysis. Methods: The three test scenarios were acute continuous renal replacement therapy, chronic dialysis initiation in moderate uremia and acute dialysis in end-stage renal disease-associated hyperkalemia. The test committee included five academic nephrologists and four clinically practicing nephrologists outside of academia. There were 49 test items (58 points). A passing score was 46/58 points. No item had median relevance less than ‘important’. The content validity index was 0.91. Ninety-five percent of positive-point items were easy–medium difficulty. Preliminary validation was by 10 board-certified volunteers, not test committee members, a median of 3.5 years from graduation. The mean score was 49 [95% confidence interval (CI) 46–51], κ = 0.68 (95% CI 0.59–0.77), Cronbach’s α = 0.84. Results: We subsequently administered the test to 25 fellows. The mean score was 44 (95% CI 43–45); 36% passed the test. Fellows scored significantly less than validators (P < 0.001). Of evidence-based questions, 72% were answered correctly by validators and 54% by fellows (P = 0.018). Fellows and validators scored least well on the acute hyperkalemia question. In self-assessing proficiency, 71% of fellows surveyed agreed or strongly agreed that the OSCE was useful. Conclusions: The OSCE may be used to formatively assess fellow proficiency in three common areas of acute dialysis practice. Further validation studies are in progress. PMID:29644053
NASA Astrophysics Data System (ADS)
Zimmermann, Judith; von Davier, Alina A.; Buhmann, Joachim M.; Heinimann, Hans R.
2018-01-01
Graduate admission has become a critical process in tertiary education, whereby selecting valid admissions instruments is key. This study assessed the validity of Graduate Record Examination (GRE) General Test scores for admission to Master's programmes at a technical university in Europe. We investigated the indicative value of GRE scores for the Master's programme grade point average (GGPA) with and without the addition of the undergraduate GPA (UGPA) and the TOEFL score, and of GRE scores for study completion and Master's thesis performance. GRE scores explained 20% of the variation in the GGPA, while an additional 7% was explained by the TOEFL score and 3% by the UGPA. Contrary to common belief, the GRE quantitative reasoning score showed only little explanatory power. GRE scores were also weakly related to study progress but not to thesis performance. Nevertheless, GRE and TOEFL scores were found to be sensible admissions instruments. Rigorous methodology was used to obtain highly reliable results.
Denehy, Linda; de Morton, Natalie A; Skinner, Elizabeth H; Edbrooke, Lara; Haines, Kimberley; Warrillow, Stephen; Berney, Sue
2013-12-01
Several tests have recently been developed to measure changes in patient strength and functional outcomes in the intensive care unit (ICU). The original Physical Function ICU Test (PFIT) demonstrates reliability and sensitivity. The aims of this study were to further develop the original PFIT, to derive an interval score (the PFIT-s), and to test the clinimetric properties of the PFIT-s. A nested cohort study was conducted. One hundred forty-four and 116 participants performed the PFIT at ICU admission and discharge, respectively. Original test components were modified using principal component analysis. Rasch analysis examined the unidimensionality of the PFIT, and an interval score was derived. Correlations tested validity, and multiple regression analyses investigated predictive ability. Responsiveness was assessed using the effect size index (ESI), and the minimal clinically important difference (MCID) was calculated. The shoulder lift component was removed. Unidimensionality of combined admission and discharge PFIT-s scores was confirmed. The PFIT-s displayed moderate convergent validity with the Timed "Up & Go" Test (r=-.60), the Six-Minute Walk Test (r=.41), and the Medical Research Council (MRC) sum score (rho=.49). The ESI of the PFIT-s was 0.82, and the MCID was 1.5 points (interval scale range=0-10). A higher admission PFIT-s score was predictive of: an MRC score of ≥48, increased likelihood of discharge home, reduced likelihood of discharge to inpatient rehabilitation, and reduced acute care hospital length of stay. Scoring of sit-to-stand assistance required is subjective, and cadence cutpoints used may not be generalizable. The PFIT-s is a safe and inexpensive test of physical function with high clinical utility. It is valid, responsive to change, and predictive of key outcomes. It is recommended that the PFIT-s be adopted to test physical function in the ICU.
Wolf, Timothy J; Dahl, Abigail; Auen, Colleen; Doherty, Meghan
2017-07-01
The objective of this study was to evaluate the inter-rater reliability, test-retest reliability, concurrent validity, and discriminant validity of the Complex Task Performance Assessment (CTPA): an ecologically valid performance-based assessment of executive function. Community control participants (n = 20) and individuals with mild stroke (n = 14) participated in this study. All participants completed the CTPA and a battery of cognitive assessments at initial testing. The control participants completed the CTPA at two different times one week apart. The intra-class correlation coefficient (ICC) for inter-rater reliability for the total score on the CTPA was .991. The ICCs for all of the sub-scores of the CTPA were also high (.889-.977). The CTPA total score was significantly correlated with Condition 4 of the DKEFS Color-Word Interference Test (ρ = -.425), and the Wechsler Test of Adult Reading (ρ = -.493). Finally, there were significant differences between control subjects and individuals with mild stroke on the total score of the CTPA (p = .007) and all sub-scores except interpretation failures and total items incorrect. These results are also consistent with other current executive function performance-based assessments and indicate that the CTPA is a reliable and valid performance-based measure of executive function.
El-Housseiny, Azza A; Alsadat, Farah A; Alamoudi, Najlaa M; El Derwi, Douaa A; Farsi, Najat M; Attar, Moaz H; Andijani, Basil M
2016-04-14
Early recognition of dental fear is essential for the effective delivery of dental care. This study aimed to test the reliability and validity of the Arabic version of the Children's Fear Survey Schedule-Dental Subscale (CFSS-DS). A school-based sample of 1546 children was randomly recruited. The Arabic version of the CFSS-DS was completed by children during class time. The scale was tested for internal consistency and test-retest reliability. To test criterion validity, children's behavior was assessed using the Frankl scale during dental examination, and results were compared with children's CFSS-DS scores. To test the scale's construct validity, scores on "fear of going to the dentist soon" were correlated with CFSS-DS scores. Factor analysis was also used. The Arabic version of the CFSS-DS showed high reliability regarding both test-retest reliability (intraclass correlation = 0.83, p < 0.001) and internal consistency (Cronbach's α = 0.88). It showed good criterion validity: children with negative behavior had significantly higher fear scores (t = 13.67, p < 0.001). It also showed moderate construct validity (Spearman's rho correlation, r = 0.53, p < 0.001). Factor analysis identified the following factors: "fear of invasive dental procedures," "fear of less invasive dental procedures" and "fear of strangers." The Arabic version of the CFSS-DS is a reliable and valid measure of dental fear in Arabic-speaking children. Pediatric dentists and researchers may use this validated version of the CFSS-DS to measure dental fear in Arabic-speaking children.
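Several records in this listing report internal consistency as Cronbach's α (0.88 in the record above). As a reading aid, the following is a minimal sketch of how that coefficient is usually computed from an item-by-respondent score table. The function name and the toy scores are hypothetical and are not taken from any of the studies listed; sample variances are assumed throughout.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores.

    items[i][p] is the score of respondent p on item i (hypothetical layout).
    Uses sample variances for both the items and the total score.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Toy data (hypothetical): three items answered by four respondents.
toy_items = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [1, 3, 2, 4],
]
print(round(cronbach_alpha(toy_items), 3))
```

When every item ranks respondents identically, the item variances account for exactly 1/k of the total-score variance and α reaches 1; less consistent items pull the value down.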
Yapali, Gökmen; Günel, Mintaze Kerem; Karahan, Sevilay
2012-05-15
The study design was cross-cultural adaptation and investigation of reliability and validity of the Copenhagen Neck Functional Disability Scale (CNFDS). The aim of this study was to translate the CNFDS into the Turkish language and assess its reliability and validity among patients with neck pain in the Turkish population. The CNFDS is a reliable and valid evaluation instrument for disability, but there is no published Turkish version of the CNFDS. One hundred one subjects who had chronic neck pain were included in this study. The CNFDS, Neck Pain and Disability Scale, and visual analogue scale were administered to all subjects. To investigate test-retest reliability, the CNFDS was applied twice at a 1-week interval; the intraclass correlation coefficient was 0.86 (95% confidence interval = 0.679-0.935). There was no difference between test-retest scores (P < 0.001). To investigate concurrent validity, the total score of the CNFDS was correlated with the mean visual analogue scale score; the correlation was r = 0.73 (P < 0.001). Concurrent validity of the CNFDS was very good. To investigate construct validity, the total score of the CNFDS was correlated with the Neck Pain and Disability Scale; the correlation was r = 0.78 (P < 0.001). Construct validity of the CNFDS was also very good. Our results suggest that the Turkish version of the CNFDS is a reliable and valid instrument for Turkish people.
Jensen, Christian Gaden; Niclasen, Janni; Vangkilde, Signe Allerup; Petersen, Anders; Hasselbalch, Steen Gregers
2016-05-01
The Mindful Attention Awareness Scale (MAAS) measures perceived degree of inattentiveness in different contexts and is often used as a reversed indicator of mindfulness. MAAS is hypothesized to reflect a psychological trait or disposition when used outside attentional training contexts, but the long-term test-retest reliability of MAAS scores is virtually untested. It is unknown whether MAAS predicts psychological health after controlling for standardized socioeconomic status classifications. First, MAAS translated to Danish was validated psychometrically within a randomly invited healthy adult community sample (N = 490). Factor analysis confirmed that MAAS scores quantified a unifactorial construct of excellent composite reliability and consistent convergent validity. Structural equation modeling revealed that MAAS scores contributed independently to predicting psychological distress and mental health, after controlling for age, gender, income, socioeconomic occupational class, stressful life events, and social desirability (β = 0.32-.42, ps < .001). Second, MAAS scores showed satisfactory short-term test-retest reliability in 100 retested healthy university students. Finally, MAAS sample mean scores as well as individuals' scores demonstrated satisfactory test-retest reliability across a 6 months interval in the adult community (retested N = 407), intraclass correlations ≥ .74. MAAS scores displayed significantly stronger long-term test-retest reliability than scores measuring psychological distress (z = 2.78, p = .005). Test-retest reliability estimates did not differ within demographic and socioeconomic strata. Scores on the Danish MAAS were psychometrically validated in healthy adults. MAAS's inattentiveness scores reflected a unidimensional construct, long-term reliable disposition, and a factor of independent significance for predicting psychological health. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Fundamentals of endoscopic surgery: creation and validation of the hands-on test.
Vassiliou, Melina C; Dunkin, Brian J; Fried, Gerald M; Mellinger, John D; Trus, Thadeus; Kaneva, Pepa; Lyons, Calvin; Korndorffer, James R; Ujiki, Michael; Velanovich, Vic; Kochman, Michael L; Tsuda, Shawn; Martinez, Jose; Scott, Daniel J; Korus, Gary; Park, Adrian; Marks, Jeffrey M
2014-03-01
The Fundamentals of Endoscopic Surgery™ (FES) program consists of online materials and didactic and skills-based tests. All components were designed to measure the skills and knowledge required to perform safe flexible endoscopy. The purpose of this multicenter study was to evaluate the reliability and validity of the hands-on component of the FES examination, and to establish the pass score. Expert endoscopists identified the critical skill set required for flexible endoscopy. These skills were then modeled in a virtual reality simulator (GI Mentor™ II, Simbionix™ Ltd., Airport City, Israel) to create five tasks and metrics. Scores were designed to measure both speed and precision. Validity evidence was assessed by correlating performance with self-reported endoscopic experience (surgeons and gastroenterologists [GIs]). Internal consistency of each test task was assessed using Cronbach's alpha. Test-retest reliability was determined by having the same participant perform the test a second time and comparing their scores. Passing scores were determined by a contrasting groups methodology and use of receiver operating characteristic curves. A total of 160 participants (17% GIs) performed the simulator test. Scores on the five tasks showed good internal consistency reliability and all had significant correlations with endoscopic experience. Total FES scores correlated 0.73 with participants' level of endoscopic experience, providing evidence of their validity, and their internal consistency reliability (Cronbach's alpha) was 0.82. Test-retest reliability was assessed in 11 participants, and the intraclass correlation was 0.85. The passing score was determined and is estimated to have a sensitivity (true positive rate) of 0.81 and a 1-specificity (false positive rate) of 0.21. The FES hands-on skills test examines the basic procedural components required to perform safe flexible endoscopy. It meets rigorous standards of reliability and validity required for high-stakes examinations, and, together with the knowledge component, may help contribute to the definition and determination of competence in endoscopy.
de Vreede, Paul L; Samson, Monique M; van Meeteren, Nico L; Duursma, Sijmen A; Verhaar, Harald J
2006-08-01
The Assessment of Daily Activity Performance (ADAP) test was developed, and modeled after the Continuous-scale Physical Functional Performance (CS-PFP) test, to provide a quantitative assessment of older adults' physical functional performance. The aim of this study was to determine the intra-examiner reliability and construct validity of the ADAP in a community-living older population, and to identify the importance of tester experience. Forty-three community-dwelling, older women (mean age 75 yr +/-4.3) were randomized to the test-retest reliability study (n=19) or validation study (n=24). The intra-examiner reliability of an experienced (tester 1) and an inexperienced tester (tester 2) was assessed by comparing test and retest scores of 19 participants. Construct validity was assessed by comparing the ADAP scores of 24 participants with self-perceived function by the SF-36 Health Survey, muscle function tests, and the Timed Up and Go test (TUG). Tester 1 had good consistency and reliability scores (mean difference between test and retest scores (DIF), -1.05+/-1.99; 95% confidence interval (CI), -2.58 to 0.48; Cronbach's alpha (alpha) range, 0.83 to 0.98; intraclass correlation (ICC) range, 0.75 to 0.96; Limits of Agreement (LoA), -2.58 to 4.95). Tester 2 had lower reliability scores (DIF, -2.45+/-4.36; 95% CI, -5.56 to 0.67; alpha range, 0.53 to 0.94; ICC range, 0.36 to 0.90; LoA, -6.09 to 10.99), with a systematic difference between test and retest scores for the ADAP domain lower-body strength (-3.81; 95% CI, -6.09 to -1.54). ADAP correlated with the SF-36 Physical Functioning scale (r=0.67), the TUG test (r=-0.91) and isometric knee extensor strength (r=0.80). The ADAP test is a reliable and valid instrument. Our results suggest that testers should practise using the test, to improve reliability, before applying it to clinical settings.
ERIC Educational Resources Information Center
Feldt, Leonard S.
2004-01-01
In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
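The abstract above does not state which estimators the article describes. As a hedged illustration of the general idea only, the sketch below uses the classical weighted-composite (Mosier-type) formula, which assumes uncorrelated errors across parts: the composite's error variance is the weighted sum of each part's error variance, and reliability is one minus its share of the composite variance. All names and numbers are hypothetical.

```python
def weighted_composite_reliability(weights, covariance, part_reliabilities):
    """Reliability of a weighted composite of test parts (classical formula).

    weights[j]            : weight applied to part j
    covariance[j][k]      : observed covariance between parts j and k
    part_reliabilities[j] : reliability coefficient of part j
    Assumes measurement errors are uncorrelated across parts.
    """
    k = len(weights)
    # Variance of the weighted composite: sum over all pairs of w_j * w_k * cov_jk.
    composite_variance = sum(
        weights[j] * weights[m] * covariance[j][m]
        for j in range(k)
        for m in range(k)
    )
    # Error variance contributed by each part: w_j^2 * var_j * (1 - reliability_j).
    error_variance = sum(
        weights[j] ** 2 * covariance[j][j] * (1 - part_reliabilities[j])
        for j in range(k)
    )
    return 1 - error_variance / composite_variance


# Hypothetical two-part test: unit variances, covariance 0.5, reliabilities 0.8.
cov = [[1.0, 0.5], [0.5, 1.0]]
rel = [0.8, 0.8]
print(weighted_composite_reliability([1, 1], cov, rel))  # equal weights
print(weighted_composite_reliability([2, 1], cov, rel))  # first part weighted double
```

In this toy case, doubling the weight of one part slightly lowers the composite reliability, which shows why the reliability coefficient has to be re-estimated whenever differential weights are introduced.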
McGaghie, William C; Cohen, Elaine R; Wayne, Diane B
2011-01-01
United States Medical Licensing Examination (USMLE) scores are frequently used by residency program directors when evaluating applicants. The objectives of this report are to study the chain of reasoning and evidence that underlies the use of USMLE Step 1 and 2 scores for postgraduate medical resident selection decisions and to evaluate the validity argument about the utility of USMLE scores for this purpose. This is a research synthesis using the critical review approach. The study first describes the chain of reasoning that underlies a validity argument about using test scores for a specific purpose. It continues by summarizing correlations of USMLE Step 1 and 2 scores and reliable measures of clinical skill acquisition drawn from nine studies involving 393 medical learners from 2005 to 2010. The integrity of the validity argument about using USMLE Step 1 and 2 scores for postgraduate residency selection decisions is tested. The research synthesis shows that USMLE Step 1 and 2 scores are not correlated with reliable measures of medical students', residents', and fellows' clinical skill acquisition. The validity argument about using USMLE Step 1 and 2 scores for postgraduate residency selection decisions is neither structured, coherent, nor evidence based. The USMLE score validity argument breaks down on grounds of extrapolation and decision/interpretation because the scores are not associated with measures of clinical skill acquisition among advanced medical students, residents, and subspecialty fellows. Continued use of USMLE Step 1 and 2 scores for postgraduate medical residency selection decisions is discouraged.
AlHeresh, Rawan; LaValley, Michael P; Coster, Wendy; Keysor, Julie J
2017-06-01
To evaluate construct validity and scoring methods of the World Health Organization Health and Work Performance Questionnaire (HPQ) for people with arthritis. Construct validity was examined through hypothesis testing using the recommended guidelines of the consensus-based standards for the selection of health measurement instruments (COSMIN). The HPQ using the absolute scoring method showed moderate construct validity as four of the seven hypotheses were met. The HPQ using the relative scoring method had weak construct validity as only one of the seven hypotheses was met. The absolute scoring method for the HPQ is superior in construct validity to the relative scoring method in assessing work performance among people with arthritis and related rheumatic conditions; however, more research is needed to further explore other psychometric properties of the HPQ.
Convergent and diagnostic validity of STAVUX, a word and pseudoword spelling test for adults.
Östberg, Per; Backlund, Charlotte; Lindström, Emma
2016-10-01
Few comprehensive spelling tests are available in Swedish, and none have been validated in adults with reading and writing disorders. The recently developed STAVUX test includes word and pseudoword spelling subtests with high internal consistency and adult norms stratified by education. This study evaluated the convergent and diagnostic validity of STAVUX in adults with dyslexia. Forty-six adults, 23 with dyslexia and 23 controls, took STAVUX together with a standard word-decoding test and a self-rated measure of spelling skills. STAVUX subtest scores showed moderate to strong correlations with word-decoding scores and predicted self-rated spelling skills. Word and pseudoword subtest scores both predicted dyslexia status. Receiver-operating characteristic (ROC) analysis showed excellent diagnostic discriminability. Sensitivity was 91% and specificity 96%. In conclusion, the results of this study support the convergent and diagnostic validity of STAVUX.
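The sensitivity and specificity figures above can be related to group sizes with simple arithmetic. The abstract gives only the percentages and the group sizes (23 and 23); the counts used below (21 of 23 adults with dyslexia flagged, 22 of 23 controls cleared) are an assumption chosen because they reproduce the rounded values, not figures reported by the authors.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Assumed counts, consistent with 23 cases and 23 controls (not from the source).
sens, spec = sensitivity_specificity(true_pos=21, false_neg=2, true_neg=22, false_pos=1)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

With groups this small, each misclassified participant shifts either figure by roughly four percentage points, which is worth keeping in mind when reading the reported values.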
Baum, C M; Wolf, T J; Wong, A W K; Chen, C H; Walker, K; Young, A C; Carlozzi, N E; Tulsky, D S; Heaton, R K; Heinemann, A W
2017-07-01
This study examined the relationships between the Executive Function Performance Test (EFPT), the NIH Toolbox Cognitive Function tests, and neuropsychological executive function measures in 182 persons with traumatic brain injury (TBI) and 46 controls to evaluate construct, discriminant, and predictive validity. Construct validity: There were moderate correlations between the EFPT and the NIH Toolbox Crystallized (r = -.479), Fluid Tests (r = -.420), and Total Composite Scores (r = -.496). Discriminant validity: Significant differences were found in the EFPT total and sequence scores across control, complicated mild/moderate, and severe TBI groups. We found differences in the organisation score between control and severe, and between mild and severe TBI groups. Both TBI groups had significantly lower scores in safety and judgement than controls. Compared to the controls, the severe TBI group demonstrated significantly lower performance on all instrumental activities of daily living (IADL) tasks. Compared to the mild TBI group, the controls performed better on the medication task, the severe TBI group performed worse in the cooking and telephone tasks. Predictive validity: The EFPT predicted the self-perception of independence measured by the TBI-QOL (beta = -0.49, p < .001) for the severe TBI group. Overall, these data support the validity of the EFPT for use in individuals with TBI.
Brett, Benjamin L; Solomon, Gary S
2017-04-01
Research findings to date on the stability of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) Composite scores have been inconsistent, requiring further investigation. The use of test validity criteria across these studies also has been inconsistent. Using multiple measures of stability, we examined test-retest reliability of repeated ImPACT baseline assessments in high school athletes across various validity criteria reported in previous studies. A total of 1146 high school athletes completed baseline cognitive testing using the online ImPACT test battery at two time periods of approximately two-year intervals. No participant sustained a concussion between assessments. Five forms of validity criteria used in previous test-retest studies were applied to the data, and differences in reliability were compared. Intraclass correlation coefficients (ICCs) ranged in composite scores from .47 (95% confidence interval, CI [.38, .54]) to .83 (95% CI [.81, .85]) and showed little change across a two-year interval for all five sets of validity criteria. Regression based methods (RBMs) examining the test-retest stability demonstrated a lack of significant change in composite scores across the two-year interval for all forms of validity criteria, with no cases falling outside the expected range of 90% confidence intervals. The application of more stringent validity criteria does not alter test-retest reliability, nor does it account for some of the variation observed across previously performed studies. As such, use of the ImPACT manual validity criteria should be utilized in the determination of test validity and in the individualized approach to concussion management. Potential future efforts to improve test-retest reliability are discussed.
Validity and reliability of the Self-Reported Physical Fitness (SRFit) survey.
Keith, NiCole R; Clark, Daniel O; Stump, Timothy E; Miller, Douglas K; Callahan, Christopher M
2014-05-01
An accurate physical fitness survey could be useful in research and clinical care. To estimate the validity and reliability of a Self-Reported Fitness (SRFit) survey; an instrument that estimates muscular fitness, flexibility, cardiovascular endurance, BMI, and body composition (BC) in adults ≥ 40 years of age. 201 participants completed the SF-36 Physical Function Subscale, International Physical Activity Questionnaire (IPAQ), Older Adults' Desire for Physical Competence Scale (Rejeski), the SRFit survey, and the Rikli and Jones Senior Fitness Test. BC, height and weight were measured. SRFit survey items described BC, BMI, and Senior Fitness Test movements. Correlations between the Senior Fitness Test and the SRFit survey assessed concurrent validity. Cronbach's Alpha measured internal consistency within each SRFit domain. SRFit domain scores were compared with SF-36, IPAQ, and Rejeski survey scores to assess construct validity. Intraclass correlations evaluated test-retest reliability. Correlations between SRFit and the Senior Fitness Test domains ranged from 0.35 to 0.79. Cronbach's Alpha scores were .75 to .85. Correlations between SRFit and other survey scores were -0.23 to 0.72 and in the expected direction. Intraclass correlation coefficients were 0.79 to 0.93. All P-values were 0.001. Initial evaluation supports the SRFit survey's validity and reliability.
Development and Validation of a Mobile Device-based External Ventricular Drain Simulator.
Morone, Peter J; Bekelis, Kimon; Root, Brandon K; Singer, Robert J
2017-10-01
Multiple external ventricular drain (EVD) simulators have been created, yet their cost, bulky size, and nonreusable components limit their accessibility to residency programs. To create and validate an animated EVD simulator that is accessible on a mobile device. We developed a mobile-based EVD simulator that is compatible with iOS (Apple Inc., Cupertino, California) and Android-based devices (Google, Mountain View, California) and can be downloaded from the Apple App and Google Play Store. Our simulator consists of a learn mode, which teaches users the procedure, and a test mode, which assesses users' procedural knowledge. Twenty-eight participants, who were divided into expert and novice categories, completed the simulator in test mode and answered a postmodule survey. This was graded using a 5-point Likert scale, with 5 representing the highest score. Using the survey results, we assessed the module's face and content validity, whereas construct validity was evaluated by comparing the expert and novice test scores. Participants rated individual survey questions pertaining to face and content validity a median score of 4 out of 5. When comparing test scores, generated by the participants completing the test mode, the experts scored higher than the novices (mean, 71.5; 95% confidence interval, 69.2 to 73.8 vs mean, 48; 95% confidence interval, 44.2 to 51.6; P < .001). We created a mobile-based EVD simulator that is inexpensive, reusable, and accessible. Our results demonstrate that this simulator is face, content, and construct valid. Copyright © 2017 by the Congress of Neurological Surgeons
ERIC Educational Resources Information Center
Goldhaber, Dan; Gratz, Trevor; Theobald, Roddy
2016-01-01
We investigate the predictive validity of teacher credential test scores for student performance in secondary STEM classrooms in Washington state. After replicating earlier findings that teacher basic skills licensure test scores are a modest and statistically significant predictor of student math test score gains in elementary grades, we focus on…
ERIC Educational Resources Information Center
Lee, Tayla T. C.; Graham, John R.; Sellbom, Martin; Gervais, Roger O.
2012-01-01
Using a sample of individuals undergoing medico-legal evaluations (690 men, 519 women), the present study extended past research on potential gender biases for scores of the Symptom Validity (FBS) scale of the Minnesota Multiphasic Personality Inventory-2 by examining score- and item-level differences between men and women and determining the…
Validation and cross cultural adaptation of the Italian version of the Harris Hip Score.
Dettoni, Federico; Pellegrino, Pietro; La Russa, Massimo R; Bonasia, Davide E; Blonna, Davide; Bruzzone, Matteo; Castoldi, Filippo; Rossi, Roberto
2015-01-01
The Harris Hip Score (HHS) is one of the most widely used health related quality of life (HRQOL) measures for the assessment of hip pathology; despite this, neither a validation study nor an official Italian version has yet been provided. The aim of this study was to create a valid and reliable Italian version of the HHS. The score was translated into Italian and adapted; then 103 patients with different hip pathologies were evaluated using this HHS version and also with the WOMAC and the SF-12 questionnaires. Content, construct and criterion validities were tested, as were interobserver reliability, test-retest reliability and internal consistency. Cross-cultural adaptation was easy, and only minor adaptation was required in the translation process. Construct and criterion validity of the HHS Italian Version were confirmed by satisfactory values of Spearman's Rho for correlation between specific domains of HHS and WOMAC and SF-12 scores. Interobserver and test-retest reliabilities obtained values of 0.996 and 0.975 respectively; Cronbach's alpha for internal consistency was 0.816. Statistical and clinical analysis showed that HHS is highly valid and reliable in this new Italian version.
ERIC Educational Resources Information Center
Erford, Bradley T.; Alsamadi, Silvana C.
2012-01-01
Score reliability and validity of parent responses concerning their 10- to 17-year-old students were analyzed using the Screening Test for Emotional Problems-Parent Report (STEP-P), which assesses a variety of emotional problems classified under the Individuals with Disabilities Education Improvement Act. Score reliability, convergent, and…
Cross-cultural validity of a dietary questionnaire for studies of dental caries risk in Japanese.
Shinga-Ishihara, Chikako; Nakai, Yukie; Milgrom, Peter; Murakami, Kaori; Matsumoto-Nakano, Michiyo
2014-01-02
Diet is a major modifiable contributing factor in the etiology of dental caries. The purpose of this paper is to examine the reliability and cross-cultural validity of the Japanese version of the Food Frequency Questionnaire to assess dietary intake in relation to dental caries risk in Japanese. The 38-item Food Frequency Questionnaire, in which Japanese food items were added to increase content validity, was translated into Japanese, and administered to two samples. The first sample comprised 355 pregnant women with mean age of 29.2 ± 4.2 years for the internal consistency and criterion validity analyses. Factor analysis (principal components with Varimax rotation) was used to determine dimensionality. The dietary cariogenicity score was calculated from the Food Frequency Questionnaire and used for the analyses. Salivary mutans streptococci level was used as a semi-quantitative assessment of dental caries risk and measured by Dentocult SM. Dentocult SM scores were compared with the dietary cariogenicity score computed from the Food Frequency Questionnaire to examine criterion validity, and assessed by Spearman's correlation coefficient (rs) and Kruskal-Wallis test. Test-retest reliability of the Food Frequency Questionnaire was assessed with a second sample of 25 adults with mean age of 34.0 ± 3.0 years by using the intraclass correlation coefficient analysis. The Japanese language version of the Food Frequency Questionnaire showed high test-retest reliability (ICC = 0.70) and good criterion validity assessed by relationship with salivary mutans streptococci levels (rs = 0.22; p < 0.001). Factor analysis revealed four subscales that make up the questionnaire (solid sugars, solid and starchy sugars, liquid and semisolid sugars, sticky and slowly dissolving sugars). Internal consistency was low to acceptable (Cronbach's alpha = 0.67 for the total scale, 0.46-0.61 for each subscale). Mean dietary cariogenicity scores were 50.8 ± 19.5 in the first sample, and 47.4 ± 14.1 and 40.6 ± 11.3 for the first and second administrations in the second sample. The distribution of Dentocult SM score was 6.8% (score = 0), 34.4% (score = 1), 39.4% (score = 2), and 19.4% (score = 3). Participants with higher scores were more likely to have higher dietary cariogenicity scores (p < 0.001; Kruskal-Wallis test). These results provide preliminary evidence for the reliability and validity of the Japanese language Food Frequency Questionnaire.
Vyas, Shaleen; Nagarajappa, Sandesh; Dasar, Pralhad L.; Mishra, Prashant
2018-01-01
AIM: To translate OHIP-14 into Hindi and test its psychometric properties among school teacher community. METHODS: The OHIP-14 was translated to OHIP-14-H using WHO recommended translation protocol. During pre-testing, an expert panel assessed content validity of the questionnaire. Face validity was assessed on a sample of 10 individuals. The OHIP-14-H was administered on a random sample of 170 primary school teachers. Internal consistency and test-retest reliability were assessed using Cronbach's alpha and Intra-class correlation coefficient (ICC) respectively, with 2 weeks interval. Predictive validity was tested by comparing OHIP-14-H scores with clinical parameters. The concurrent validity was assessed using self-reported oral health and discriminant validity was ascertained through negative association with sociodemographic variables. RESULTS: The mean OHIP-14-H score was 9.57 (S.D = 4.58). ICC and Cronbach's alpha for OHIP-14-H was 0.96 and 0.92 respectively. Concurrent validity using binomial regression model indicated that good (OR = 0.56, 95% CI = 0.55 – 4.47) and moderate (OR = 0.25, 95% CI = 0.17 – 1.87) OHIP-14-H scores were negative but significant risk indicators of poor self reported oral health (P < 0.009). Significant predictive validity was observed between OHIP-14-H scores and clinical parameters (P < 0.000). CONCLUSION: Translated and culturally adapted OHIP-14-H indicates good reliability and validity among primary school teachers. PMID:29417064
de los Santos, Gonzalo; Reyes, Pablo; del Castillo, Raúl; Fragola, Claudio; Royuela, Ana
2015-11-01
Our objective was to perform translation, cross-cultural adaptation and validation of the Sino-Nasal Outcome Test 22 (SNOT-22) into Spanish. SNOT-22 was translated, back translated, and a pretest trial was performed. The study included 119 individuals divided into 60 cases, who met diagnostic criteria for chronic rhinosinusitis according to the European Position Paper on Rhinosinusitis 2012; and 59 controls, who reported no sino-nasal disease. Internal consistency was evaluated with Cronbach's alpha test, reproducibility with Kappa coefficient, reliability with intraclass correlation coefficient (ICC), validity with Mann-Whitney U test and responsiveness with Wilcoxon test. In cases, Cronbach's alpha was 0.91 both before and after treatment; for controls, it was 0.90 at their first test assessment and 0.88 at 3 weeks. Kappa coefficient was calculated for each item, with an average score of 0.69. ICC was also calculated for each item, with a score of 0.87 in the overall score and an average among all items of 0.71. Median score for cases was 47, and 2 for controls, and the difference was highly significant (Mann-Whitney U test, p < 0.001). Clinical changes were observed among treated patients, with a median score of 47 and 13.5 before and after treatment, respectively (Wilcoxon test, p < 0.001). The effect size was 0.14 in treated patients whose status at 3 weeks was unchanged, 1.03 in those who were better and 1.89 for the much better group. All controls were unchanged, with an effect size of 0.05. The Spanish version of the SNOT-22 has the internal consistency, reliability, reproducibility, validity and responsiveness necessary to be a valid instrument to be used in clinical practice.
Validity of the Microcomputer Evaluation Screening and Assessment Aptitude Scores.
ERIC Educational Resources Information Center
Janikowski, Timothy P.; And Others
1991-01-01
Examined validity of Microcomputer Evaluation Screening and Assessment (MESA) aptitude scores relative to General Aptitude Test Battery (GATB) using multitrait-multimethod correlational analyses. Findings from 54 rehabilitation clients and 29 displaced workers revealed no evidence to support the construct validity of the MESA. (Author/NB)
Examining the Predictive Validity of NIH Peer Review Scores
Lindner, Mark D.; Nakamura, Richard K.
2015-01-01
The predictive validity of peer review at the National Institutes of Health (NIH) has not yet been demonstrated empirically. It might be assumed that the most efficient and expedient test of the predictive validity of NIH peer review would be an examination of the correlation between percentile scores from peer review and bibliometric indices of the publications produced from funded projects. The present study used a large dataset to examine the rationale for such a study, to determine if it would satisfy the requirements for a test of predictive validity. The results show significant restriction of range in the applications selected for funding. Furthermore, those few applications that are funded with slightly worse peer review scores are not selected at random or representative of other applications in the same range. The funding institutes also negotiate with applicants to address issues identified during peer review. Therefore, the peer review scores assigned to the submitted applications, especially for those few funded applications with slightly worse peer review scores, do not reflect the changed and improved projects that are eventually funded. In addition, citation metrics by themselves are not valid or appropriate measures of scientific impact. The use of bibliometric indices on their own to measure scientific impact would likely increase the inefficiencies and problems with replicability already largely attributed to the current over-emphasis on bibliometric indices. Therefore, retrospective analyses of the correlation between percentile scores from peer review and bibliometric indices of the publications resulting from funded grant applications are not valid tests of the predictive validity of peer review at the NIH. PMID:26039440
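The restriction-of-range problem described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: the true correlation, the funded fraction, and the convention that a higher score is better are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, true_r=0.5, funded_fraction=0.1, seed=0):
    """Compare the score/outcome correlation in all applications
    with the correlation among only the top-scoring ("funded") ones.

    All parameter values are hypothetical; higher score = better here.
    """
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        s = rng.gauss(0, 1)
        # Outcome correlates with the score at roughly true_r.
        o = true_r * s + (1 - true_r ** 2) ** 0.5 * rng.gauss(0, 1)
        scores.append(s)
        outcomes.append(o)
    # Keep only applications at or above the funding cutoff.
    cutoff = sorted(scores, reverse=True)[int(n * funded_fraction) - 1]
    funded = [(s, o) for s, o in zip(scores, outcomes) if s >= cutoff]
    r_all = pearson_r(scores, outcomes)
    r_funded = pearson_r([s for s, _ in funded], [o for _, o in funded])
    return r_all, r_funded


r_all, r_funded = simulate()
print(f"all applications: r = {r_all:.2f}; funded only: r = {r_funded:.2f}")
```

Under these assumptions the correlation within the funded subset comes out noticeably weaker than in the full pool, even though the underlying relationship is unchanged, which is one reason a retrospective correlation among funded projects alone may understate predictive validity.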
Erel, Suat; Şimşek, İbrahim Engin; Özkan, Hüseyin
2015-01-01
The aim of this study was to analyze the validity and reliability of the Turkish version (ICOAP-TR) of the intermittent and constant osteoarthritis pain (ICOAP) questionnaire in patients with knee osteoarthritis (OA). Thirty-eight volunteer patients diagnosed with knee OA answered the questionnaire twice with an interval of 2-4 days. The reliability of the measurement was assessed using Cronbach's alpha coefficient and intraclass correlation (ICC) for test-retest reliability. Criterion validity was tested against the Western Ontario and McMaster Universities Arthritis Index (WOMAC) pain score and visual analog scale (VAS) designed to assess the perceived discomfort rated by the patient. Test-retest reliability was found to be ICC=0.942 for total score, 0.902 for constant pain subscale, and 0.945 for intermittent pain subscale. Internal consistency was tested using Cronbach's alpha and was found to be 0.970 for total score, 0.948 for constant pain subscale, and 0.972 for intermittent pain subscale. For criterion validity, the correlation between the total score of ICOAP-TR and WOMAC pain subscale was r=0.779 (p<0.05), and correlation between total score of ICOAP-TR and VAS was r=0.570 (p<0.05). The ICOAP-TR is a reliable and valid instrument to be used with patients with knee OA.
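Several of the abstracts in this list report internal consistency as Cronbach's alpha. As a minimal sketch of how that coefficient is commonly computed from item-level responses (the data below are hypothetical and not taken from any study listed here):

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    item_vars = [statistics.variance([row[j] for row in rows]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 3 items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
    [3, 3, 2],
]
print(round(cronbach_alpha(responses), 3))
```

The sample variance (n - 1 denominator) is used for both the items and the total; because the same denominator appears in numerator and denominator of the ratio, the choice does not change the result.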
ERIC Educational Resources Information Center
Haertel, Edward H.
2013-01-01
Policymakers and school administrators have embraced value-added models of teacher effectiveness as tools for educational improvement. Teacher value-added estimates may be viewed as complicated scores of a certain kind. This suggests using a test validation model to examine their reliability and validity. Validation begins with an interpretive…
Reliability and validity of a questionnaire for self-assessment of complete dentures.
Komagamine, Yuriko; Kanazawa, Manabu; Kaiba, Yoshinori; Sato, Yusuke; Minakuchi, Shunsuke
2014-05-02
Demand for complete denture treatment is expected to rise over several decades. However, to date, no questionnaire on complete dentures, as evaluated by edentulous patients, has been shown to be reliable and valid. This study sought to assess the reliability and validity of Patient's Denture Assessment (PDA), which provides a multidimensional evaluation of dentures among edentulous patients. Patients who had new complete dentures fabricated at the University Hospital of Dentistry, Tokyo Medical and Dental University from 2009 to 2010 were enrolled. The reliability of the PDA was determined by examining internal consistency and test-retest reliability. Internal consistency for all of the question items and the six subscales was measured using Cronbach's α and average inter-item correlation coefficients among 93 participants. For 33 of these participants, test-retest reliability was determined at a 2-month interval using the intraclass correlation coefficients (ICCs) and 95% confidence interval for the summary scores and the six subscale scores. The PDA was validated in 93 participants by examining the difference in the summary score and the six subscale scores of the PDA before and after replacement with new dentures by the paired t-test. Ability to detect change was also tested in 93 patients using effect size. The Cronbach's α for the PDA ranged from 0.56 to 0.93. The average inter-item correlation coefficients ranged from 0.28 to 0.83. ICCs for the PDA ranged from 0.37 to 0.83. The paired t-test showed a significant difference between the summary score and the six subscale scores before and after replacement with new dentures (p < 0.05) and the effect size was 0.97. The PDA demonstrated good reliability by assessing internal consistency and test-retest reliability. In addition, the PDA demonstrated good validity by assessing discriminant validity. Thus, the PDA could help dentists obtain a detailed understanding of the patients' perceptions in using their dentures.
Evidence of Construct Validity in Published Achievement Tests.
ERIC Educational Resources Information Center
Nolet, Victor; Tindal, Gerald
Valid interpretation of test scores is the shared responsibility of the test designer and the test user. Test publishers must provide evidence of the validity of the decisions their tests are intended to support, while test users are responsible for analyzing this evidence and subsequently using the test in the manner indicated by the publisher.…
Sleeper, Mark D; Kenyon, Lisa K; Elliott, James M; Cheng, M Samuel
2016-12-01
Despite the availability of various field-tests for many competitive sports, a reliable and valid test specifically developed for use in men's gymnastics has not yet been developed. The Men's Gymnastics Functional Measurement Tool (MGFMT) was designed to assess sport-specific physical abilities in male competitive gymnasts. The purpose of this study was to develop the MGFMT by establishing a scoring system for individual test items and to initiate the process of establishing test-retest reliability and construct validity. A total of 83 competitive male gymnasts ages 7-18 underwent testing using the MGFMT. Thirty of these subjects underwent re-testing one week later in order to assess test-retest reliability. Construct validity was assessed using a simple regression analysis between total MGFMT scores and the gymnasts' USA-Gymnastics competitive level to calculate the coefficient of determination (r²). Test-retest reliability was analyzed using Model 1 Intraclass correlation coefficients (ICC). Statistical significance was set at the p<0.05 level. The relationship between total MGFMT scores and subjects' current USA-Gymnastics competitive level was found to be good (r² = 0.63). Reliability testing of the MGFMT composite test score showed excellent test-retest reliability over a one-week period (ICC = 0.97). Test-retest reliability of the individual component tests ranged from good to excellent (ICC = 0.75-0.97). The results of this study provide initial support for the construct validity and test-retest reliability of the MGFMT. Level 3.
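For readers unfamiliar with the test-retest statistic, the following is a minimal sketch of a one-way random-effects intraclass correlation, often written ICC(1,1). Reading the abstract's "Model 1" as this form is an assumption, and the scores below are hypothetical, not MGFMT data.

```python
def icc_one_way(rows):
    """One-way random-effects ICC(1,1).

    rows: one list per subject, each holding k repeated measurements
    (for test-retest data, k = 2).
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n = len(rows)
    k = len(rows[0])
    subject_means = [sum(row) / k for row in rows]
    grand_mean = sum(subject_means) / n
    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    msw = sum(
        (x - m) ** 2 for row, m in zip(rows, subject_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical test and retest scores for four subjects.
test_retest = [[10, 12], [20, 19], [30, 31], [40, 38]]
print(round(icc_one_way(test_retest), 3))
```

Other ICC forms (two-way models, average-measure versions) use different mean squares, so a value computed this way is comparable only with studies that used the same model.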
Cross-cultural adaptation and validation of the Turkish version of Oxford hip score.
Tuğay, Baki Umut; Tuğay, Nazan; Güney, Hande; Hazar, Zeynep; Yüksel, İnci; Atilla, Bülent
2015-06-01
The purpose of this study was to translate the Oxford hip score (OHS) into Turkish and to evaluate the psychometric properties by testing the internal consistency, reproducibility, construct validity, and responsiveness in patients with hip osteoarthritis (OA). Oxford hip score was translated and culturally adapted according to the guidelines in the literature. Seventy patients (mean age 61.45 ± 9.29 years) with hip osteoarthritis participated in the study. Patients completed the Turkish Oxford hip score (OHS-TR), the Short-Form 36 (SF-36), and Western Ontario and McMaster Universities Index (WOMAC). Internal consistency was tested using Cronbach's α coefficient. Patients completed OHS-TR questionnaire twice in 7 days for determining the reproducibility. Correlation between the total results of both tests was determined by the Pearson correlation coefficient and intraclass correlation coefficient (ICC). Validity was assessed by calculating the Pearson correlation coefficient between the OHS-TR and WOMAC and SF-36 scores. Floor and ceiling effects were analyzed. The internal consistency was high (Cronbach's α 0.93). The construct validity showed a significant correlation between the OHS-TR and WOMAC and related SF-36 domains (p < 0.001). The ICC's ranged between 0.80 and 0.99. There was no floor or ceiling effect in total OHS-TR score. The OHS-TR questionnaire is valid, reliable, and responsive for the Turkish-speaking patients with hip OA.
Comparative Predictive Validity of the New MCAT Using Different Admissions Criteria.
ERIC Educational Resources Information Center
Golmon, Melton E.; Berry, Charles A.
1981-01-01
New Medical College Admission Test (MCAT) scores and undergraduate academic achievement were examined for their validity in predicting the performance of two select student populations at Northwestern University Medical School. The data support the hypothesis that New MCAT scores possess substantial predictive validity. (Author/MLW)
ERIC Educational Resources Information Center
Schneider, W. Joel; Roman, Zachary
2018-01-01
We used data simulations to test whether composites consisting of cohesive subtest scores are more accurate than composites consisting of divergent subtest scores. We demonstrate that when multivariate normality holds, divergent and cohesive scores are equally accurate. Furthermore, excluding divergent scores results in biased estimates of…
Niemeijer, Anuschka S; van Waelvelde, Hilde; Smits-Engelsman, Bouwien C M
2015-02-01
The Movement Assessment Battery for Children has been revised as the Movement ABC-2 (Henderson, Sugden, & Barnett, 2007). In Europe, the 15th percentile score on this test is recommended for one of the DSM-IV diagnostic criteria for Developmental Coordination Disorder (DCD). A representative sample of Dutch and Flemish children was tested to cross-validate the UK standard scores, including the 15th percentile score. First, the mean, SD and percentile scores of Dutch children were compared to those of UK normative samples. Item standard scores of Dutch-speaking children deviated from the UK reference values, suggesting necessary adjustments. Except for very young children, the Dutch-speaking samples performed better. Second, based on the mean and SD and clinically relevant cut-off scores (5th and 15th percentile), norms were adjusted for the Dutch population. For diagnostic use, researchers and clinicians should use the reference norms that are valid for the group of children they are testing. The results indicate that there possibly is an effect of testing procedure in other countries that validated the UK norms and/or cultural influence on the age norms of the Movement ABC-2. It is suggested to formulate criterion-based norms for age groups in addition to statistical norms. Copyright © 2014 Elsevier B.V. All rights reserved.
Eye-Tracking as a Tool in Process-Oriented Reading Test Validation
ERIC Educational Resources Information Center
Solheim, Oddny Judith; Uppstad, Per Henning
2011-01-01
The present paper addresses the continuous need for methodological reflection on how to validate inferences made on the basis of test scores. Validation is a process that requires many lines of evidence. In this article we discuss the potential of eye tracking methodology in process-oriented reading test validation. Methodological considerations…
López-Miñarro, Pedro Ángel; Vaquero-Cristóbal, Raquel; Muyor, José María; Espejo-Antúnez, Luis
2015-07-01
Lumbo-sacral posture and the sit-and-reach score have been proposed as measures of hamstring extensibility. However, their validity is influenced by sample characteristics. To determine the validity of the lumbo-horizontal angle and the score in the sit-and-reach test as measures of hamstring extensibility in older women. One hundred and twenty older women performed the straight leg raise test with both legs, and the sit-and-reach test (SR), in a random order. For the sit-and-reach test, the score and the lumbo-sacral posture in bending (lumbo-horizontal angle, L-Hfx) were measured. The mean values of straight leg raise in the left and right legs were 81.70 ± 13.83° and 82.10 ± 14.36°, respectively. The mean straight leg raise value of both legs was 81.90 ± 12.70°. The mean values of SR score and L-Hfx were -1.54 ± 8.09 cm and 91.08° ± 9.32°, respectively. The correlation values between the mean straight leg raise test with respect to lumbo-sacral posture and SR score were moderate (L-Hfx: r = -0.72, p < 0.01; SR: r = 0.70, p < 0.01). Both variables independently explained about 50% of the variance (L-Hfx: R² = 0.52, p < 0.001; SR: R² = 0.49, p < 0.001). The validity of lumbo-sacral posture in bending as a measure of hamstring muscle extensibility in older women is moderate, with values similar to those of the SR score. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.
Adaptation and validation of Common Object Token (COT) test into the Sinhalese language.
Jeyaraman, Janani; Kumarasinghe, Chameera; Mohamed Rafi, Shabnam Fathima; Mendis, Thirimadura Lakna Amalie; Abdul Rasheed, Fathima Shameema
2016-04-01
This manuscript presents a translation and adaptation of the Common Object Token (COT) test, which assesses speech perception, into the Sinhalese language and an attempt to validate it for use on children with normal hearing (NH) and children with a cochlear implant (CI). Ninety-five children (70 with NH, 25 with a CI) participated in the study. The COT test was translated, back-translated, and evaluated by a team of experts until the Sinhalese translation was deemed acceptable. Data of Sinhalese children with NH and values of children with a CI were analysed. Internal reliability and consistency of the COT total score were determined. Lastly, a quick version of the COT test was created. The total mean scores and subtest mean scores improved with age for children with NH. For children with a CI, a strong relationship between the COT total score and device experience, i.e. hearing age, was found. A Quick Sinhalese COT test version, suitable for children with a CI, could be created from Subtests 2, 3, and 4. The Sinhalese COT test is valid for assessing the age-related development of speech perception and identification skills of children with NH. Results suggest that the COT is valid for use in children with a CI. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Boonstra, Anne M; Schiphorst Preuper, Henrica R; Reneman, Michiel F; Posthumus, Jitze B; Stewart, Roy E
2008-06-01
To determine the reliability and concurrent validity of a visual analogue scale (VAS) for disability as a single-item instrument measuring disability in chronic pain patients was the objective of the study. For the reliability study a test-retest design and for the validity study a cross-sectional design was used. A general rehabilitation centre and a university rehabilitation centre was the setting for the study. The study population consisted of patients over 18 years of age, suffering from chronic musculoskeletal pain; 52 patients in the reliability study, 344 patients in the validity study. Main outcome measures were as follows. Reliability study: Spearman's correlation coefficients (rho values) of the test and retest data of the VAS for disability; validity study: rho values of the VAS disability scores with the scores on four domains of the Short-Form Health Survey (SF-36) and VAS pain scores, and with Roland-Morris Disability Questionnaire scores in chronic low back pain patients. Results were as follows: in the reliability study rho values varied from 0.60 to 0.77; and in the validity study rho values of VAS disability scores with SF-36 domain scores varied from 0.16 to 0.51, with Roland-Morris Disability Questionnaire scores from 0.38 to 0.43 and with VAS pain scores from 0.76 to 0.84. The conclusion of the study was that the reliability of the VAS for disability is moderate to good. Because of a weak correlation with other disability instruments and a strong correlation with the VAS for pain, however, its validity is questionable.
Cross-cultural adaptation and validation of the Korean version of the neck disability index.
Song, Kyung-Jin; Choi, Byung-Wan; Choi, Byung-Ryeul; Seo, Gyeu-Beom
2010-09-15
Validation of a translated, culturally adapted questionnaire. The purpose of this study is to translate and culturally adapt the Neck Disability Index (NDI) and to validate the use of the derived version in Korean patients. Although several valid measures exist for measurement of neck pain and functional impairment, these measures have not yet been validated in a Korean version. The NDI was linguistically translated into Korean, and the prefinal version was assessed and modified by a pilot study. The reliability and validity of the derived Korean version were examined in 78 patients with degenerative cervical spine disease. Test-retest reliability, internal consistency, and construct validity were investigated by comparing Visual Analogue Scale (VAS) and Short Form Health Survey (SF-36) scores. Factor analysis of the Korean NDI extracted 2 factors with eigenvalues >1. The intraclass-correlation coefficient of test-retest reliability was 0.93. Reliability, estimated by internal consistency, had a Cronbach alpha value of 0.82. The correlation between NDI and VAS scores was r = 0.49, and the correlation between NDI and SF-36 scores was r = -0.44. The physical health component score of SF-36 was highly correlated with NDI, and the correlation between VAS scores and the mental health component scores of SF-36 was high. The derived Korean version of the NDI was found to be a reliable and valid instrument for measuring disability in Korean patients with cervical problems. The authors recommend its use in future Korean clinical studies.
Testing the Predictive Validity of the Hendrich II Fall Risk Model.
Jung, Hyesil; Park, Hyeoun-Ae
2018-03-01
Cumulative data on patient fall risk have been compiled in electronic medical records systems, and it is possible to test the validity of fall-risk assessment tools using these data between the times of admission and occurrence of a fall. The Hendrich II Fall Risk Model scores assessed at three time points during hospital stays were extracted and used for testing the predictive validity: (a) upon admission, (b) at the time of the maximum fall-risk score between admission and falling or discharge, and (c) immediately before falling or discharge. Predictive validity was examined using seven predictive indicators. In addition, logistic regression analysis was used to identify factors that significantly affect the occurrence of a fall. Among the different time points, the maximum fall-risk score assessed between admission and falling or discharge showed the best predictive performance. Confusion or disorientation and having a poor ability to rise from a sitting position were significant risk factors for a fall.
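The abstract does not name its seven predictive indicators. As an illustration only, the sketch below computes four indicators commonly used for this kind of analysis (sensitivity, specificity, positive and negative predictive value) from risk scores and observed falls. The cutoff, the data, and the function name are hypothetical.

```python
def predictive_indicators(scores, fell, cutoff):
    """Classify patients as high risk when score >= cutoff and compare
    that classification with whether a fall actually occurred."""
    tp = fp = fn = tn = 0
    for score, did_fall in zip(scores, fell):
        flagged = score >= cutoff
        if flagged and did_fall:
            tp += 1
        elif flagged:
            fp += 1
        elif did_fall:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined when a group is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical risk scores and outcomes for six patients.
scores = [2, 6, 7, 3, 5, 1]
fell = [False, True, True, False, False, True]
print(predictive_indicators(scores, fell, cutoff=5))
```

Running the same function on the admission score, the maximum score, and the last score before falling or discharge would give a like-for-like comparison of the three time points, assuming one row per patient.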
Tsuji, Naoko; Kakee, Naoko; Ishida, Yasushi; Asami, Keiko; Tabuchi, Ken; Nakadate, Hisaya; Iwai, Tsuyako; Maeda, Miho; Okamura, Jun; Kazama, Takuro; Terao, Yoko; Ohyama, Wataru; Yuza, Yuki; Kaneko, Takashi; Manabe, Atsushi; Kobayashi, Kyoko; Kamibeppu, Kiyoko; Matsushima, Eisuke
2011-04-10
The PedsQL 3.0 Cancer Module is a widely used instrument to measure pediatric cancer specific health-related quality of life (HRQOL) for children aged 2 to 18 years. We developed the Japanese version of the PedsQL Cancer Module and investigated its reliability and validity among Japanese children and their parents. Participants were 212 children with cancer and 253 of their parents. Reliability was determined by internal consistency using Cronbach's coefficient alpha and test-retest reliability using intra-class correlation coefficient (ICC). Validity was assessed through factor validity, convergent and discriminant validity, concurrent validity, and clinical validity. Factor validity was examined by exploratory factor analysis. Convergent and discriminant validity were examined by multitrait scaling analysis. Concurrent validity was assessed using Spearman's correlation coefficients between the Cancer Module and Generic Core Scales, and the comparison of the scores of child self-reports with those of other self-rating depression scales for children. Clinical validity was assessed by comparing the on- and off- treatment scores using Kruskal-Wallis and Mann-Whitney U tests. Cronbach's coefficient alpha was over 0.70 for the total scale and over 0.60 for each subscale by age except for the 'pain and hurt' subscale for children aged 5 to 7 years. For test-retest reliability, the ICC exceeded 0.70 for the total scale for each age. Exploratory factor analysis demonstrated sufficient factorial validity. Multitrait scaling analysis showed high success rates. Strong correlations were found between the reports by children and their parents, and the scores of the Cancer Module and the Generic Core Scales except for 'treatment anxiety' subscales for child reports. The Depression Self-Rating Scale for Children (DSRS-C) scores were significantly correlated with emotional domains and the total score of the cancer module. 
Children who had been off treatment over 12 months demonstrated significantly higher scores than those on treatment. The results demonstrate the reliability and validity of the Japanese version of the PedsQL Cancer Module among Japanese children.
ERIC Educational Resources Information Center
De Leng, W. E.; Stegers-Jager, K. M.; Husbands, A.; Dowell, J. S.; Born, M. Ph.; Themmen, A. P.
2017-01-01
Situational Judgment Tests (SJTs) are increasingly used for medical school selection. Scoring an SJT is more complicated than scoring a knowledge test, because there are no objectively correct answers. The scoring method of an SJT may influence the construct and concurrent validity and the adverse impact with respect to non-traditional students.…
Aligning Scales of Certification Tests. Research Report. ETS RR-10-07
ERIC Educational Resources Information Center
Dorans, Neil J.; Liang, Longjuan; Puhan, Gautam
2010-01-01
Scores are the most visible and widely used products of a testing program. The choice of score scale has implications for test specifications, equating, and test reliability and validity, as well as for test interpretation. At the same time, the score scale should be viewed as infrastructure likely to require repair at some point. In this report…
Cubiella, Joaquín; Digby, Jayne; Rodríguez-Alonso, Lorena; Vega, Pablo; Salve, María; Díaz-Ondina, Marta; Strachan, Judith A; Mowat, Craig; McDonald, Paula J; Carey, Francis A; Godber, Ian M; Younes, Hakim Ben; Rodriguez-Moranta, Francisco; Quintero, Enrique; Álvarez-Sánchez, Victoria; Fernández-Bañares, Fernando; Boadas, Jaume; Campo, Rafel; Bujanda, Luis; Garayoa, Ana; Ferrandez, Ángel; Piñol, Virginia; Rodríguez-Alcalde, Daniel; Guardiola, Jordi; Steele, Robert J C; Fraser, Callum G
2017-05-15
Prediction models for colorectal cancer (CRC) detection in symptomatic patients, based on easily obtainable variables such as fecal haemoglobin concentration (f-Hb), age and sex, may simplify CRC diagnosis. We developed, and then externally validated, a multivariable prediction model, the FAST Score, with data from five diagnostic test accuracy studies that evaluated quantitative fecal immunochemical tests in symptomatic patients referred for colonoscopy. The diagnostic accuracy of the Score in derivation and validation cohorts was compared statistically with the area under the curve (AUC) and the Chi-square test. 1,572 and 3,976 patients were examined in these cohorts, respectively. For CRC, the odds ratio (OR) of the variables included in the Score were: age (years): 1.03 (95% confidence intervals (CI): 1.02-1.05), male sex: 1.6 (95% CI: 1.1-2.3) and f-Hb (0-<20 µg Hb/g feces): 2.0 (95% CI: 0.7-5.5), (20-<200 µg Hb/g): 16.8 (95% CI: 6.6-42.0), ≥200 µg Hb/g: 65.7 (95% CI: 26.3-164.1). The AUC for CRC detection was 0.88 (95% CI: 0.85-0.90) in the derivation and 0.91 (95% CI: 0.90-0.93; p = 0.005) in the validation cohort. At the two Score thresholds with 90% (4.50) and 99% (2.12) sensitivity for CRC, the Score had equivalent sensitivity, although the specificity was higher in the validation cohort (p < 0.001). Accordingly, the validation cohort was divided into three groups: high (21.4% of the cohort, positive predictive value-PPV: 21.7%), intermediate (59.8%, PPV: 0.9%) and low (18.8%, PPV: 0.0%) risk for CRC. The FAST Score is an easy to calculate prediction tool, highly accurate for CRC detection in symptomatic patients. © 2017 UICC.
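The AUC reported here can be read as the probability that a randomly chosen patient with CRC receives a higher score than a randomly chosen patient without it. A minimal sketch of that pairwise definition follows; the scores are hypothetical and this is not the FAST Score itself.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve by pairwise comparison.

    Each (case, control) pair counts 1 if the case scores higher,
    0.5 if the two scores tie, and 0 otherwise.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores for patients with and without the condition.
print(round(auc([3, 4, 5], [1, 2, 3]), 3))
```

The double loop is quadratic in cohort size, which is fine for illustration; rank-based formulas give the same value more efficiently on cohorts of several thousand patients.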
Tennant, Alan; Küçükdeveci, Ayse A; Kutlay, Sehim; Elhan, Atilla H
2006-03-23
The Middlesex Elderly Assessment of Mental State (MEAMS) was developed as a screening test to detect cognitive impairment in the elderly. It includes 12 subtests, each having a 'pass score'. A series of tasks were undertaken to adapt the measure for use in the adult population in Turkey and to determine the validity of existing cut points for passing subtests, given the wide range of educational level in the Turkish population. This study focuses on identifying and validating the scoring system of the MEAMS for Turkish adult population. After the translation procedure, 350 normal subjects and 158 acquired brain injury patients were assessed by the Turkish version of MEAMS. Initially, appropriate pass scores for the normal population were determined through ANOVA post-hoc tests according to age, gender and education. Rasch analysis was then used to test the internal construct validity of the scale and the validity of the cut points for pass scores on the pooled data by using Differential Item Functioning (DIF) analysis within the framework of the Rasch model. Data with the initially modified pass scores were analyzed. DIF was found for certain subtests by age and education, but not for gender. Following this, pass scores were further adjusted and data re-fitted to the model. All subtests were found to fit the Rasch model (mean item fit 0.184, SD 0.319; person fit -0.224, SD 0.557) and DIF was then found to be absent. Thus the final pass scores for all subtests were determined. The MEAMS offers a valid assessment of cognitive state for the adult Turkish population, and the revised cut points accommodate for age and education. Further studies are required to ascertain the validity in different diagnostic groups.
Tuğay, Baki Umut; Tuğay, Nazan; Güney, Hande; Kınıklı, Gizem İrem; Yüksel, İnci; Atilla, Bülent
2016-01-01
The Oxford Knee Score (OKS) is a valid, short, self-administered, and site-specific outcome measure specifically developed for patients with knee arthroplasty. This study aimed to cross-culturally adapt and validate the OKS to be used in Turkish-speaking patients with osteoarthritis of the knee. The OKS was translated and culturally adapted according to the guidelines in the literature. Ninety-one patients (mean age: 55.89±7.85 years) with knee osteoarthritis participated in the study. Patients completed the Turkish version of the Oxford Knee Score (OKS-TR), Short-Form 36 Health Survey (SF-36), and Western Ontario and McMaster Universities Index (WOMAC) questionnaires. Internal consistency was tested using Cronbach's α coefficient. Patients completed the OKS-TR questionnaire twice in 7 days to determine the reproducibility. Correlation between the total results of both tests was determined by Spearman's correlation coefficient and intraclass correlation coefficients (ICC). Validity was assessed by calculating Spearman's correlation coefficient between the OKS, WOMAC, and SF-36 scores. Floor and ceiling effects were analyzed. Internal consistency was high (Cronbach's α: 0.90). The reproducibility tested by 2 different methods showed no significant difference (p>0.05). The construct validity analyses showed a significant correlation between the OKS and the other scores (p<0.05). There was no floor or ceiling effect in total OKS score. The OKS-TR is a reliable and valid measure for the self-assessment of pain and function in Turkish-speaking patients with osteoarthritis of the knee.
Turkish Version of Kolcaba's Immobilization Comfort Questionnaire: A Validity and Reliability Study.
Tosun, Betül; Aslan, Özlem; Tunay, Servet; Akyüz, Aygül; Özkan, Hüseyin; Bek, Doğan; Açıksöz, Semra
2015-12-01
The purpose of this study was to determine the validity and reliability of the Turkish version of the Immobilization Comfort Questionnaire (ICQ). The sample used in this methodological study consisted of 121 patients undergoing lower extremity arthroscopy in a training and research hospital. The validity study of the questionnaire assessed language validity, structural validity and criterion validity. Structural validity was evaluated via exploratory factor analysis. Criterion validity was evaluated by assessing the correlation between the visual analog scale (VAS) scores (i.e., the comfort and pain VAS scores) and the ICQ scores using Spearman's correlation test. The Kaiser-Meyer-Olkin coefficient and Bartlett's test of sphericity were used to determine the suitability of the data for factor analysis. Internal consistency was evaluated to determine reliability. The data were analyzed with SPSS version 15.00 for Windows. Descriptive statistics were presented as frequencies, percentages, means and standard deviations. A p value ≤ .05 was considered statistically significant. A moderate positive correlation was found between the ICQ scores and the VAS comfort scores; a moderate negative correlation was found between the ICQ and the VAS pain measures in the criterion validity analysis. Cronbach α values of .75 and .82 were found for the first and second measurements, respectively. The findings of this study reveal that the ICQ is a valid and reliable tool for assessing the comfort of patients in Turkey who are immobilized because of lower extremity orthopedic problems. Copyright © 2015. Published by Elsevier B.V.
ERIC Educational Resources Information Center
Longenbecker, Sueann; Wood, Peter H.
1984-01-01
Scores from the National Board Dental Hygiene Examination (NBDHE) served as the criterion variable in a comparison of the predictive validity of the Dental Hygiene Aptitude Tests (DHAT) and the ACT Assessment tests. The DHAT-Science and Verbal tests combined to produce the highest multiple correlation with NBDHE scores. (Author/DWH)
Abou-Taleb, Doaa A E; Ibrahim, Ahmed K; Youssef, Eman M K; Moubasher, Alaa E A
2017-02-01
The new modified Melasma Area and Severity Index (mMASI) score, the recently used outcome measure for melasma, has not been tested to determine its sensitivity to change in melasma. To determine the reliability, validity, and sensitivity to change over time of the mMASI score in assessment of the severity of melasma. Pearson correlation, Cronbach alpha, and intraclass correlation coefficient were calculated to assess the reliability of the mMASI score. Validity of the mMASI scale was assessed using Spearman correlation between mMASI total score (before and after treatment), clinical data, and patients' responses. The mMASI score showed excellent reliability and good validity for assessment of the severity of melasma. The authors also determined that the mMASI score demonstrated sensitivity to change over time. An excellent degree of agreement between the mMASI and MASI scores was revealed. The mMASI score is reliable, valid, and responsive to change in the assessment of severity of melasma. Moreover, the mMASI score was found to be easier to learn and perform and simpler in calculation compared with the MASI score. Overall, the mMASI score can effectively replace the MASI score.
ERIC Educational Resources Information Center
Politzer, Robert L.; And Others
1983-01-01
The development, administration, and scoring of a communicative test and its validation with tests of linguistic and sociolinguistic competence in English and Spanish are reported. Correlation with measures of home language use and school achievement are also presented, and issues of test validation for bilingual programs are discussed. (MSE)
Validity and reliability of the Diagnostic Adaptive Behaviour Scale.
Tassé, M J; Schalock, R L; Balboni, G; Spreat, S; Navas, P
2016-01-01
The Diagnostic Adaptive Behaviour Scale (DABS) is a new standardised adaptive behaviour measure that provides information for evaluating limitations in adaptive behaviour for the purpose of determining a diagnosis of intellectual disability. This article presents validity evidence and reliability data for the DABS. Validity evidence was based on comparing DABS scores with scores obtained on the Vineland Adaptive Behaviour Scale, second edition. The stability of the test scores was measured using a test and retest, and inter-rater reliability was assessed by computing the inter-respondent concordance. The DABS convergent validity coefficients ranged from 0.70 to 0.84, while the test-retest reliability coefficients ranged from 0.78 to 0.95, and the inter-rater concordance as measured by intraclass correlation coefficients ranged from 0.61 to 0.87. All obtained validity and reliability indicators were strong and comparable with the validity and reliability coefficients of the most commonly used adaptive behaviour instruments. These results and the advantages of the DABS for clinician and researcher use are discussed. © 2015 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Gadbury-Amyot, Cynthia C; McCracken, Michael S; Woldt, Janet L; Brennan, Robert L
2014-05-01
The purpose of this study was to empirically investigate the validity and reliability of portfolio assessment in two U.S. dental schools using a unified framework for validity. In the process of validation, it is not the test that is validated but rather the claims (interpretations and uses) about test scores that are validated. Kane's argument-based validation framework provided the structure for reporting results where validity claims are followed by evidence to support the argument. This multivariate generalizability theory study found that the greatest source of variance was attributable to faculty raters, suggesting that portfolio assessment would benefit from two raters' evaluating each portfolio independently. The results are generally supportive of holistic scoring, but analytical scoring deserves further research. Correlational analyses between student portfolios and traditional measures of student competence and readiness for licensure resulted in significant correlations between portfolios and National Board Dental Examination Part I (r=0.323, p<0.01) and Part II scores (r=0.268, p<0.05) and small and non-significant correlations with grade point average and scores on the Western Regional Examining Board (WREB) exam. It is incumbent upon the users of portfolio assessment to determine if the claims and evidence arguments set forth in this study support the proposed claims for and decisions about portfolio assessment in their respective institutions.
Validity Evidence for ACT Compass® Placement Tests. ACT Research Report Series 2014 (2)
ERIC Educational Resources Information Center
Westrick, Paul A.; Allen, Jeff
2014-01-01
We examined the validity of using Compass® test scores and high school grade point average (GPA) for placing students in first-year college courses and for identifying students at risk of not succeeding. Consistent with other research, the combination of high school GPA and Compass scores performed better than either measure used alone. Results…
Glaister, Mark; Stone, Michael H; Stewart, Andrew M; Hughes, Michael; Moir, Gavin L
2004-08-01
The purpose of the present study was to assess the reliability and validity of fatigue measures, as derived from 4 separate formulae, during tests of repeat sprint ability. On separate days over a 3-week period, 2 groups of 7 recreationally active men completed 6 trials of 1 of 2 maximal (20 x 5 seconds) intermittent cycling tests with contrasting recovery periods (10 or 30 seconds). All trials were conducted on a friction-braked cycle ergometer, and fatigue scores were derived from measures of mean power output for each sprint. Apart from formula 1, which calculated fatigue from the percentage difference in mean power output between the first and last sprint, all remaining formulae produced fatigue scores that showed a reasonably good level of test-retest reliability in both intermittent test protocols (intraclass correlation range: 0.78-0.86; 95% likely range of true values: 0.54-0.97). Although between-protocol differences in the magnitude of the fatigue scores suggested good construct validity, within-protocol differences highlighted limitations with each formula. Overall, the results support the use of the percentage decrement score as the most valid and reliable measure of fatigue during brief maximal intermittent work.
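The two fatigue measures named in the abstract above can be sketched as follows. The first function follows the abstract's description of formula 1 (percentage difference in mean power between the first and last sprint). The second uses a commonly cited definition of the percentage decrement score (total output compared with an ideal in which every sprint equals the best one); the abstract does not spell that formula out, so treat it as an assumption. The power values are hypothetical.

```python
def first_last_difference(powers):
    """Percentage drop in mean power from the first to the last sprint."""
    first, last = powers[0], powers[-1]
    return 100.0 * (first - last) / first


def percentage_decrement(powers):
    """Percentage decrement score (assumed definition).

    Compares the summed output with an 'ideal' total in which every
    sprint matches the best single sprint.
    """
    ideal_total = max(powers) * len(powers)
    return 100.0 * (1.0 - sum(powers) / ideal_total)


# Hypothetical mean power output (watts) for five sprints.
powers = [800.0, 780.0, 760.0, 750.0, 700.0]
first_last = first_last_difference(powers)
decrement = percentage_decrement(powers)
```

Because the first measure uses only two sprints, a single unusual first or last effort changes it sharply, whereas the decrement score draws on every sprint in the set.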
Kenyon, Lisa K.; Elliott, James M; Cheng, M. Samuel
2016-01-01
Purpose/Background: Despite the availability of various field-tests for many competitive sports, a reliable and valid test specifically developed for use in men's gymnastics has not yet been developed. The Men's Gymnastics Functional Measurement Tool (MGFMT) was designed to assess sport-specific physical abilities in male competitive gymnasts. The purpose of this study was to develop the MGFMT by establishing a scoring system for individual test items and to initiate the process of establishing test-retest reliability and construct validity. Methods: A total of 83 competitive male gymnasts ages 7-18 underwent testing using the MGFMT. Thirty of these subjects underwent re-testing one week later in order to assess test-retest reliability. Construct validity was assessed using a simple regression analysis between total MGFMT scores and the gymnasts’ USA-Gymnastics competitive level to calculate the coefficient of determination (r²). Test-retest reliability was analyzed using Model 1 Intraclass correlation coefficients (ICC). Statistical significance was set at the p<0.05 level. Results: The relationship between total MGFMT scores and subjects’ current USA-Gymnastics competitive level was found to be good (r² = 0.63). Reliability testing of the MGFMT composite test score showed excellent test-retest reliability over a one-week period (ICC = 0.97). Test-retest reliability of the individual component tests ranged from good to excellent (ICC = 0.75-0.97). Conclusions: The results of this study provide initial support for the construct validity and test-retest reliability of the MGFMT. Level of Evidence: Level 3. PMID:27999723
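As an illustration of the test-retest statistic reported above, here is a minimal sketch of a one-way random-effects, single-measure intraclass correlation. It assumes that "Model 1" refers to this one-way form, often written ICC(1,1); the abstract does not confirm the exact variant, and the scores are hypothetical.

```python
def icc_one_way(scores):
    """One-way random-effects, single-measure ICC.

    scores: list of rows, one row per subject, one column per session.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions per subject
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]

    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test and retest composite scores for four gymnasts.
retest_scores = [[10, 11], [20, 19], [30, 31], [40, 40]]
icc = icc_one_way(retest_scores)
```

The coefficient approaches 1 when differences between subjects dominate the session-to-session differences within each subject.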
Calès, P; Boursier, J; Lebigot, J; de Ledinghen, V; Aubé, C; Hubert, I; Oberti, F
2017-04-01
In chronic hepatitis C, the European Association for the Study of the Liver and the Asociacion Latinoamericana para el Estudio del Higado recommend performing transient elastography plus a blood test to diagnose significant fibrosis; test concordance confirms the diagnosis. To validate this rule and improve it by combining a blood test, FibroMeter (virus second generation, Echosens, Paris, France) and transient elastography (constitutive tests) into a single combined test, as suggested by the American Association for the Study of Liver Diseases and the Infectious Diseases Society of America. A total of 1199 patients were included in an exploratory set (HCV, n = 679) or in two validation sets (HCV ± HIV, HBV, n = 520). Accuracy was mainly evaluated by correct diagnosis rate for severe fibrosis (pathological Metavir F ≥ 3, primary outcome) by classical test scores or a fibrosis classification, reflecting Metavir staging, as a function of test concordance. Score accuracy: there were no significant differences between the blood test (75.7%), elastography (79.1%) and the combined test (79.4%) (P = 0.066); the score accuracy of each test was significantly (P < 0.001) decreased in discordant vs. concordant tests. Classification accuracy: combined test accuracy (91.7%) was significantly (P < 0.001) increased vs. the blood test (84.1%) and elastography (88.2%); accuracy of each constitutive test was significantly (P < 0.001) decreased in discordant vs. concordant tests but not with combined test: 89.0 vs. 92.7% (P = 0.118). Multivariate analysis for accuracy showed an interaction between concordance and fibrosis level: in the 1% of patients with full classification discordance and severe fibrosis, non-invasive tests were unreliable. The advantage of combined test classification was confirmed in the validation sets. The concordance recommendation is validated. A combined test, expressed in classification instead of score, improves this rule and validates the recommendation of a combined test, avoiding 99% of biopsies, and offering precise staging. © 2017 John Wiley & Sons Ltd.
Can patients interpret health information? An assessment of the medical data interpretation test.
Schwartz, Lisa M; Woloshin, Steven; Welch, H Gilbert
2005-01-01
To establish the reliability/validity of an 18-item test of patients' medical data interpretation skills. Survey with retest after 2 weeks. Subjects. 178 people recruited from advertisements in local newspapers, an outpatient clinic, and a hospital open house. The percentage of correct answers to individual items ranged from 20% to 87%, and medical data interpretation test scores (on a 0-100 scale) were normally distributed (median 61.1, mean 61.0, range 6-94). Reliability was good (test-retest correlation=0.67, Cronbach's alpha=0.71). Construct validity was supported in several ways. Higher scores were found among people with highest versus lowest numeracy (71 v. 36, P<0.001), highest quantitative literacy (65 v. 28, P<0.001), and highest education (69 v. 42, P=0.004). Scores for 15 physician experts also completing the survey were significantly higher than participants with other postgraduate degrees (mean score 89 v. 69, P<0.001). The medical data interpretation test is a reliable and valid measure of the ability to interpret medical statistics.
Transcultural validation of the Oxford Shoulder Score for the French-speaking population.
Tuton, D; Barbe, C; Salmon, J-H; Dramé, M; Nérot, C; Ohl, X
2016-09-01
Patient-reported outcome measures (PROMs) have been gaining in popularity over the last decade. The Oxford Shoulder Score (OSS) is a well-established self-administered questionnaire for shoulder evaluation adapted for the English-speaking population. The aim of the present study was to develop a translation and a transcultural adaptation of the OSS and to assess its validity in native French-speaking patients with shoulder pain. The translation process was carried out following a translation/back-translation methodology by two translators. All patients completed the French OSS, the Subjective Shoulder Value (SSV), and the Constant score. Internal consistency was tested using Cronbach's α coefficient. Validity was assessed by calculating the Pearson correlation coefficient between the OSS and the Constant score and the SSV. One hundred forty-four patients suffering from degenerative or inflammatory diseases of the shoulder were included in this study. The average time required to complete the French OSS was 2 min 45 s. Seventy patients were asked to complete the questionnaire twice (test/retest reliability). Internal consistency was high with Cronbach's α coefficient=0.93. The intraclass correlation coefficient was 0.91 (95% CI: 0.88-0.94) for test/retest reliability. The French OSS score was significantly correlated with the Constant-Murley score (r=0.73 and P<0.0001) and with the SSV (r=0.68 and P<0.0001). The present study shows that the French version of the OSS is reliable, valid, and reproducible. The sensitivity to change now needs to be evaluated. This score was adapted to the French-speaking population for the self-assessment of patients with degenerative or inflammatory disorders of the shoulder. Level 1, Test of previously developed criteria, diagnostic test study. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
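Several abstracts in this listing report internal consistency as Cronbach's α. A minimal sketch of the standard formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores), is given below. The response matrix is hypothetical and is not taken from any of the studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix.

    responses: list of rows, one row per respondent, one column per item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])                                   # number of items
    item_variances = [variance(col) for col in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to a three-item questionnaire.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [4, 5, 4],
]
alpha = cronbach_alpha(responses)
```

Alpha rises when items vary together, so that the variance of the total score is large relative to the sum of the individual item variances.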
Mbada, Chidozie Emmanuel; Adeogun, Gafar Atanda; Ogunlana, Michael Opeoluwa; Adedoyin, Rufus Adesoji; Akinsulore, Adesanmi; Awotidebe, Taofeek Oluwole; Idowu, Opeyemi Ayodiipo; Olaoye, Olumide Ayoola
2015-09-14
The Short-Form Health Survey (SF-36) is a valid quality of life tool often employed to determine the impact of medical intervention and the outcome of health care services. However, the SF-36 is culturally sensitive, which necessitates its adaptation and translation into different languages. This study was conducted to cross-culturally adapt the SF-36 into the Yoruba language and determine its reliability and validity. Based on the International Quality of Life Assessment project guidelines, a sequence of translation, test of item-scale correlation, and validation was implemented for the translation of the Yoruba version of the SF-36. Following pilot testing, the English and the Yoruba versions of the SF-36 were administered to a random sample of 1087 apparently healthy individuals to test validity, and 249 respondents completed the Yoruba SF-36 again after two weeks to test reliability. Data were analyzed using Pearson's product moment correlation analysis, independent t-test, one-way analysis of variance, multitrait scaling analysis and Intra-Class Correlation (ICC) at p < 0.05. The concurrent validity scores for scales and domains ranged between 0.749 and 0.902, with the highest and lowest scores in the General Health (0.902) and Bodily Pain (0.749) scales. Scale-level descriptive results showed that all scale and domain scores had negative skewness ranging from -2.08 to -0.98. The mean scores for each scale ranged between 83.2 and 88.8. The domain scores for Physical Health Component and Mental Health Component were 85.6 ± 13.7 and 85.9 ± 15.4 respectively. The convergent validity was satisfactory, ranging from 0.421 to 0.907. Discriminant validity was also satisfactory except for item '1'. The ICC for the test-retest reliability of the Yoruba SF-36 ranged between 0.636 and 0.843 for scales; and 0.783 and 0.851 for domains. The data quality, concurrent and discriminant validity, reliability and internal consistency of the Yoruba version of the SF-36 are adequate and it is recommended for measuring health-related quality of life among the Yoruba population.
Walters, Steven O; Weaver, Kenneth A
2003-06-01
The Kaufman Brief Intelligence Test detects learning problems of young students and is a screen for whether a more comprehensive test of intelligence is needed. A study to assess whether this test was valid as an adult intelligence test was conducted with 20 undergraduate psychology majors. The correlations between the Kaufman Brief Intelligence Test's Composite, Vocabulary, and Matrices test scores and their corresponding Wechsler Adult Intelligence Scale-Third Edition test scores, the Full Scale (r=.88), Verbal (r=.77), and Performance scores (r=.87), indicated very strong relationships. In addition, no significant differences were obtained between the Composite, Vocabulary, and Matrices means of the Kaufman Brief Intelligence Test and the Full Scale, Verbal, and Performance means of the WAIS-III. The Kaufman Brief Intelligence Test appears to be a valid test of intelligence for adults.
Translation and Validation of the Dysphagia Handicap Index in Hebrew-Speaking Patients.
Shapira-Galitz, Yael; Drendel, Michael; Yousovich-Ulriech, Ruth; Shtreiffler-Moskovich, Liat; Wolf, Michael; Lahav, Yonatan
2018-06-07
The Dysphagia Handicap Index (DHI) is a 25-item questionnaire assessing the physical, functional, and emotional aspects of dysphagia patients' quality of life (QoL). The study goal was to translate and validate the Hebrew-DHI. 148 patients undergoing fiberoptic endoscopic examination of swallowing (FEES) in two specialized dysphagia clinics between February and August 2017 filled the Hebrew-DHI and self-reported their dysphagia severity on a scale of 1-7. 21 patients refilled the DHI during a 2-week period following their first visit. FEES were scored for residue (1 point per consistency), penetration and aspiration (1 point for penetration, 2 points for aspiration, per consistency). 51 healthy volunteers also filled the DHI. Internal consistency and test-retest reproducibility were used for reliability testing. Validity was established by comparing DHI scores of dysphagia patients and healthy controls. Concurrent validity was established by correlating the DHI score with the FEES score. Internal consistency of the Hebrew-DHI was high (Cronbach's alpha = 0.96), as was the test-retest reproducibility (Spearman's correlation coefficient = 0.82, p < 0.001). The Hebrew-DHI's total score, and its three subscales (physical/functional/emotional) were significantly higher in dysphagia patients compared to those in healthy controls (median 38 pts, IQR 18-56 for dysphagia patients compared to 0, IQR 0-2 for healthy controls, p < 0.0001). A strong correlation was observed between the DHI score and the self-reported dysphagia severity measure (Spearman's correlation coefficient = 0.88, p < 0.0001). A moderate correlation was found between the DHI score and the FEES score (Pearson's correlation coefficient = 0.245, p = 0.003). The Hebrew-DHI is a reliable and valid questionnaire assessing dysphagia patients' QoL.
Clarifying the Consensus Definition of Validity
ERIC Educational Resources Information Center
Newton, Paul E.
2012-01-01
The 1999 "Standards for Educational and Psychological Testing" defines validity as the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. Although quite explicit, there are ways in which this definition lacks precision, consistency, and clarity. The history of validity has taught us…
Evaluating Test Validity: Reprise and Progress
ERIC Educational Resources Information Center
Shepard, Lorrie A.
2016-01-01
The AERA, APA, NCME Standards define validity as "the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests". A century of disagreement about validity does not mean that there has not been substantial progress. This consensus definition brings together interpretations and use so that it…
Cross-cultural adaptation and validation of Persian Achilles tendon Total Rupture Score.
Ansari, Noureddin Nakhostin; Naghdi, Soofia; Hasanvand, Sahar; Fakhari, Zahra; Kordi, Ramin; Nilsson-Helander, Katarina
2016-04-01
To cross-culturally adapt the Achilles tendon Total Rupture Score (ATRS) into Persian and to preliminarily evaluate the reliability and validity of a Persian ATRS. A cross-sectional and prospective cohort study was conducted to translate and cross-culturally adapt the ATRS into Persian (ATRS-Persian) following steps described in guidelines. Thirty patients with total Achilles tendon rupture and 30 healthy subjects participated in this study. Psychometric properties of floor/ceiling effects (responsiveness), internal consistency reliability, test-retest reliability, standard error of measurement (SEM), smallest detectable change (SDC), construct validity, and discriminant validity were tested. Factor analysis was performed to determine the ATRS-Persian structure. There were no floor or ceiling effects that indicate the content and responsiveness of ATRS-Persian. Internal consistency was high (Cronbach's α 0.95). Item-total correlations exceeded the acceptable standard of 0.3 for all items (0.58-0.95). The test-retest reliability was excellent [(ICC)agreement 0.98]. SEM and SDC were 3.57 and 9.9, respectively. Construct validity was supported by a significant correlation between the ATRS-Persian total score and the Persian Foot and Ankle Outcome Score (PFAOS) total score and PFAOS subscales (r = 0.55-0.83). The ATRS-Persian significantly discriminated between patients and healthy subjects. Exploratory factor analysis revealed 1 component. The ATRS was cross-culturally adapted to Persian and demonstrated to be a reliable and valid instrument to measure functional outcomes in Persian patients with Achilles tendon rupture. II.
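The SEM and SDC values reported above are consistent with the widely used relation SDC = 1.96 × √2 × SEM. The abstract does not state which formula the authors applied, so the sketch below is an assumption that happens to reproduce the reported figure; the function name is illustrative.

```python
import math


def smallest_detectable_change(sem, z=1.96):
    """SDC at the individual level from the standard error of measurement.

    The sqrt(2) factor reflects measurement error in both of the two
    scores being compared; z = 1.96 gives a 95% confidence level.
    """
    return z * math.sqrt(2.0) * sem


# SEM as reported in the abstract (3.57 points).
sdc = smallest_detectable_change(3.57)
sdc_rounded = round(sdc, 1)
```

With SEM = 3.57 the result rounds to 9.9, matching the reported SDC, so a change in an individual's score smaller than about 10 points could not be distinguished from measurement error under this relation.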
Development of the outcome expectancy scale for self-care among periodontal disease patients.
Kakudate, Naoki; Morita, Manabu; Fukuhara, Shunichi; Sugai, Makoto; Nagayama, Masato; Isogai, Emiko; Kawanami, Masamitsu; Chiba, Itsuo
2011-12-01
The theory of self-efficacy states that specific efficacy expectations affect behaviour. Two types of efficacy expectations are described within the theory. Self-efficacy expectations are the beliefs in the capacity to perform a specific behaviour. Outcome expectations are the beliefs that carrying out a specific behaviour will lead to a desired outcome. To develop and examine the reliability and validity of an outcome expectancy scale for self-care (OESS) among periodontal disease patients. A 34-item scale was tested on 101 patients at a dental clinic. Accuracy was improved by item analysis, and internal consistency and test-retest stability were investigated. Concurrent validity was tested by examining associations of the OESS score with the self-efficacy scale for self-care (SESS) score and plaque index score. Construct validity was examined by comparing OESS scores between periodontal patients at initial visit (group 1) and those continuing maintenance care (group 2). Item analysis identified 13 items for the OESS. Factor analysis extracted three factors: social-, oral- and self-evaluative outcome expectancy. Cronbach's alpha coefficient for the OESS was 0.90. A significant association was observed between test and retest scores, and between the OESS and SESS and plaque index scores. Further, group 2 had a significantly higher mean OESS score than group 1. We developed a 13-item OESS with high reliability and validity which may be used to assess outcome expectancy for self-care. A patient's psychological condition with regard to behaviour and affective status can be accurately evaluated using the OESS with SESS. © 2011 Blackwell Publishing Ltd.
Item validity vs. item discrimination index: a redundancy?
NASA Astrophysics Data System (ADS)
Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.
2018-03-01
In the literature on evaluation and test analysis, it is common to find item validity and the item discrimination index (D) calculated with a different formula for each. Meanwhile, other sources state that the item discrimination index can be obtained by calculating the correlation between the testee’s score on a particular item and the testee’s score on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts may overlap, since both reflect the quality of the test in measuring the examinees’ ability. In this paper, examples of data-processing results for item validity and the item discrimination index are compared. The paper discusses whether one of the two can stand for both, or whether it is better to present both calculations in simple test analyses, especially in undergraduate theses that include test analyses.
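The two quantities contrasted in this abstract can be sketched side by side. The version below assumes the conventional definitions: D as the difference in proportion correct between the top and bottom 27% of examinees by total score, and item validity as the Pearson correlation between item score and total score. The paper may use other variants, and the data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def discrimination_index(item, total, fraction=0.27):
    """D = proportion correct in the upper group minus the lower group."""
    order = sorted(range(len(total)), key=lambda i: total[i])
    n = max(1, round(len(total) * fraction))
    lower, upper = order[:n], order[-n:]
    p_upper = sum(item[i] for i in upper) / n
    p_lower = sum(item[i] for i in lower) / n
    return p_upper - p_lower


# Hypothetical data: 0/1 scores on one item and total test scores.
total = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

d_index = discrimination_index(item, total)
# Uncorrected item-total correlation: the item itself is part of the total.
item_validity = pearson(item, total)
```

Both statistics are high for this item, which illustrates the overlap the authors discuss: D uses only the extreme groups, while the correlation uses every examinee.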
ERIC Educational Resources Information Center
Lowe, Patricia A.; Peyton, Vicki; Reynolds, Cecil R.
2007-01-01
A sample of 79 individuals participated in the present study to evaluate the test score stability (8-week test-retest interval) and construct validity of the scores of the Adult Manifest Anxiety Scale-College Version, a new measure used to assess anxiety in college students, for application to graduate-level students. Results of the study…
The reliability and validity of the Turkish version of Fullerton Advanced Balance (FAB-T) scale.
Iyigun, Gozde; Kirmizigil, Berkiye; Angin, Ender; Oksuz, Sevim; Can, Filiz; Eker, Levent; Rose, Debra J
2018-06-04
The aim of this study was to evaluate the reliability and validity of the Turkish version of the FAB (FAB-T) scale in older Turkish adults. The reliability and validity of the scale were tested on 200 community-dwelling older adults. The FAB-T scale was scored by different physiotherapists on different days to evaluate inter-rater and intra-rater reliability. The Berg Balance Scale (BBS) was used for the evaluation of convergent validity, and the content validity of the FAB-T scale was investigated. The FAB-T scale showed very high inter- and intra-rater reliability. For inter-rater agreement, the ICC values for the individual test items and total score were 0.92 (95% CI; 0.90-0.94) and 0.96 (95% CI; 0.95-0.97) respectively. For intra-rater agreement, the ICC values for the individual test items and total score were 0.93 (95% CI; 0.91-0.95) and 0.96 (95% CI; 0.95-0.97) respectively. There was good agreement between the FAB-T and BBS scales. A high correlation was found between the BBS and FAB-T scales [rho = 0.70 (95% CI; 0.62-0.76)], indicating good convergent validity. Considering the content validity of the FAB-T scale, no floor (floor score: 0%) or ceiling (ceiling score: 6.5%) effect was detected. The FAB-T scale was successfully translated from the original English version (FAB) and demonstrated strong psychometric features. It was found that the FAB-T scale has very high inter-rater and intra-rater reliability. Considering the convergent validity, the scale has high correlation with the BBS. The FAB-T has no floor and ceiling effect. Copyright © 2018 Elsevier B.V. All rights reserved.
Corner, E J; Wood, H; Englebretsen, C; Thomas, A; Grant, R L; Nikoletou, D; Soni, N
2013-03-01
To develop a scoring system to measure physical morbidity in critical care - the Chelsea Critical Care Physical Assessment Tool (CPAx). The development process was iterative involving content validity indices (CVI), a focus group and an observational study of 33 patients to test construct validity against the Medical Research Council score for muscle strength, peak cough flow, Australian Therapy Outcome Measures score, Glasgow Coma Scale score, Bloomsbury sedation score, Sequential Organ Failure Assessment score, Short Form 36 (SF-36) score, days of mechanical ventilation and inter-rater reliability. Trauma and general critical care patients from two London teaching hospitals. Users of the CPAx felt that it possessed content validity, giving a final CVI of 1.00 (P<0.05). Construct validation data showed moderate to strong significant correlations between the CPAx score and all secondary measures, apart from the mental component of the SF-36 which demonstrated weak correlation with the CPAx score (r=0.024, P=0.720). Reliability testing showed internal consistency of α=0.798 and inter-rater reliability of κ=0.988 (95% confidence interval 0.791 to 1.000) between five raters. This pilot work supports proof of concept of the CPAx as a measure of physical morbidity in the critical care population, and is a cogent argument for further investigation of the scoring system. Copyright © 2012 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
Glassmire, David M; Toofanian Ross, Parnian; Kinney, Dominique I; Nitch, Stephen R
2016-06-01
Two studies were conducted to identify and cross-validate cutoff scores on the Wechsler Adult Intelligence Scale-Fourth Edition Digit Span-based embedded performance validity (PV) measures for individuals with schizophrenia spectrum disorders. In Study 1, normative scores were identified on Digit Span-embedded PV measures among a sample of patients (n = 84) with schizophrenia spectrum diagnoses who had no known incentive to perform poorly and who put forth valid effort on external PV tests. Previously identified cutoff scores resulted in unacceptable false positive rates and lower cutoff scores were adopted to maintain specificity levels ≥90%. In Study 2, the revised cutoff scores were cross-validated within a sample of schizophrenia spectrum patients (n = 96) committed as incompetent to stand trial. Performance on Digit Span PV measures was significantly related to Full Scale IQ in both studies, indicating the need to consider the intellectual functioning of examinees with psychotic spectrum disorders when interpreting scores on Digit Span PV measures. © The Author(s) 2015.
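The cutoff-adjustment step described above (lowering cutoffs until specificity is at least 90% among examinees known to be performing validly) can be sketched as a simple search. The direction of scoring (scores at or below the cutoff are flagged as invalid) and all numbers are assumptions for illustration only; they are not the study's data or its published cutoffs.

```python
def highest_cutoff_with_specificity(valid_scores, min_specificity=0.90):
    """Return the highest cutoff that keeps specificity >= min_specificity.

    valid_scores: scores from examinees known to be performing validly.
    A score at or below the cutoff is flagged as invalid (assumed
    convention), so every flagged score here is a false positive.
    Returns None if no observed score qualifies as a cutoff.
    """
    n = len(valid_scores)
    best = None
    for cutoff in sorted(set(valid_scores)):
        false_positives = sum(score <= cutoff for score in valid_scores)
        specificity = (n - false_positives) / n
        if specificity < min_specificity:
            break  # specificity only falls as the cutoff rises
        best = cutoff
    return best


# Hypothetical embedded-measure scores from ten valid performers.
valid_scores = [4, 5, 6, 6, 7, 7, 7, 8, 8, 9]
cutoff = highest_cutoff_with_specificity(valid_scores)
```

Cross-validation, as in Study 2, would then apply the chosen cutoff to an independent sample rather than to the sample used to select it.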
Construct Validity of Fresh Frozen Human Cadaver as a Training Model in Minimal Access Surgery
Macafee, David; Pranesh, Nagarajan; Horgan, Alan F.
2012-01-01
Background: The construct validity of fresh human cadaver as a training tool has not been established previously. The aims of this study were to investigate the construct validity of fresh frozen human cadaver as a method of training in minimal access surgery and determine if novices can be rapidly trained using this model to a safe level of performance. Methods: Junior surgical trainees, novices (<3 laparoscopic procedures performed) in laparoscopic surgery, performed 10 repetitions of a set of structured laparoscopic tasks on fresh frozen cadavers. Expert laparoscopists (>100 laparoscopic procedures) performed 3 repetitions of identical tasks. Performances were scored using a validated, objective Global Operative Assessment of Laparoscopic Skills scale. Scores for 3 consecutive repetitions were compared between experts and novices to determine construct validity. Furthermore, to determine if the novices reached a safe level, a trimmed mean of the experts' scores was used to define a benchmark. Mann-Whitney U test was used for construct validity analysis and 1-sample t test to compare performances of the novice group with the benchmark safe score. Results: Ten novices and 2 experts were recruited. Four out of 5 tasks (nondominant to dominant hand transfer; simulated appendicectomy; intracorporeal and extracorporeal knot tying) showed construct validity. Novices’ scores became comparable to benchmark scores between the eighth and tenth repetition. Conclusion: Minimal access surgical training using fresh frozen human cadavers appears to have construct validity. The laparoscopic skills of novices can be accelerated through to a safe level within 8 to 10 repetitions. PMID:23318058
Kaveney, Sarah C; Baumstarck, Karine; Minaya-Flores, Patricia; Shannon, Tarrah; Symes, Philip; Loundou, Anderson; Auquier, Pascal
2016-05-28
The CareGiver Oncology Quality of Life (CarGOQoL) questionnaire, a 29-item, multidimensional, self-administered questionnaire, was validated using a large French sample. We reported the linguistic validation process and the metric validity of the English version of CarGOQoL in the United States. The translation process consisted of 3 consecutive steps: forward-backward translation, acceptability testing, and cognitive interviews. The psychometric testing was applied to caregivers of consecutive patients with representative cancers who were recruited from the Regional Cancer Center in northwestern Pennsylvania. All individuals completed the CarGOQoL at baseline, day 30, and day 90. Internal consistency, reliability, external validity, reproducibility, and sensitivity to change were tested. The translated version was validated on a total of 87 American cancer caregivers. The dimensions of the CarGOQoL generally demonstrated a high internal consistency (Cronbach's alpha > 0.70 for all but four domain scores). External validity testing revealed that the CarGOQoL index score correlated significantly with all SF-36 dimension scores except the physical composite score (Pearson's correlation: 0.28-0.70). Reproducibility was satisfactory at day 30 (intraclass correlation coefficient: 0.46-0.94) and day 90 (0.43-0.92). Four specific dimensions of CarGOQoL showed responsiveness: Psychological well-being, Relationships with health care system, Social support, and Finances. The American version of the CarGOQoL constitutes a useful instrument to measure QoL in caregivers of cancer patients in the United States.
Translation and validation of the Canadian diabetes risk assessment questionnaire in China.
Guo, Jia; Shi, Zhengkun; Chen, Jyu-Lin; Dixon, Jane K; Wiley, James; Parry, Monica
2018-01-01
To adapt the Canadian Diabetes Risk Assessment Questionnaire for the Chinese population and to evaluate its psychometric properties. A cross-sectional study was conducted with a convenience sample of 194 individuals aged 35-74 years from October 2014 to April 2015. The Canadian Diabetes Risk Assessment Questionnaire was adapted and translated for the Chinese population. Test-retest reliability was conducted to measure stability. Criterion and convergent validity of the adapted questionnaire were assessed using 2-hr 75 g oral glucose tolerance tests and the Finnish Diabetes Risk Scores, respectively. Sensitivity and specificity were evaluated to establish its predictive validity. The test-retest reliability was 0.988. Adequate validity of the adapted questionnaire was demonstrated by positive correlations found between the scores and 2-hr 75 g oral glucose tolerance tests (r = .343, p < .001) and with the Finnish Diabetes Risk Scores (r = .738, p < .001). The area under receiver operating characteristic curve was 0.705 (95% CI .632, .778), demonstrating moderate diagnostic value at a cutoff score of 30. The sensitivity was 73%, with a positive predictive value of 57% and negative predictive value of 78%. Our results provided evidence supporting the translation consistency, content validity, convergent validity, criterion validity, sensitivity, and specificity of the translated Canadian Diabetes Risk Assessment Questionnaire with minor modifications. This paper provides clinical, practical, and methodological information on how to adapt a diabetes risk calculator between cultures for public health nurses. © 2017 Wiley Periodicals, Inc.
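The sensitivity, specificity, and predictive values reported in abstracts like the one above all derive from a 2x2 table of screening result against reference diagnosis. The following is a minimal sketch of that arithmetic; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute basic screening metrics from 2x2 confusion-matrix counts.

    tp: screen-positive and reference-positive
    fp: screen-positive but reference-negative
    fn: screen-negative but reference-positive
    tn: screen-negative and reference-negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases cleared
        "ppv": tp / (tp + fp),          # probability a positive screen is a case
        "npv": tn / (tn + fn),          # probability a negative screen is a non-case
    }


# Hypothetical counts for illustration only (not study data).
metrics = diagnostic_metrics(tp=30, fp=20, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Predictive values depend on how common the condition is in the sample, so the same questionnaire can show different PPV and NPV in populations with different prevalence even when sensitivity and specificity are unchanged.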
André, Helô-Isa; Carnide, Filomena; Moço, Andreia; Valamatos, Maria-João; Ramalho, Fátima; Santos-Rocha, Rita; Veloso, António
2018-06-05
The assessment of the plantar-flexors muscle strength in older adults (OA) is of the utmost importance since they are strongly associated with the performance of fundamental tasks of daily life. The objective was to strengthen the validity of the Calf-Raise-Senior (CRS) test by assessing the biomechanical movement pattern of calf muscles in OA with different levels of functional fitness (FF) and physical activity (PA). Twenty-six OA were assessed with CRS, a FF battery, accelerometry, strength tests, kinematics and electromyography (EMG). OA with the best and worst CRS scores were compared. The association between the scores and EMG pattern of ankle muscles was determined. OA with the best CRS scores presented higher levels of FF, PA, strength, power, speed and range of movement, and a more efficient movement pattern during the test. Subjects who scored more at the CRS test demonstrated the possibility to use a stretch-shortening cycle type of action in the PF muscles to increase power during the movements. OA with different levels of FF can be stratified by the muscular activation pattern of the calf muscles and the scores in CRS test. This study reinforced the validity of CRS for evaluating ankle strength and power in OA. Copyright © 2018 Elsevier Ltd. All rights reserved.
Measuring Nutrition Literacy in Spanish-Speaking Latinos: An Exploratory Validation Study.
Gibbs, Heather D; Camargo, Juliana M T B; Owens, Sarah; Gajewski, Byron; Cupertino, Ana Paula
2017-11-21
Nutrition is important for preventing and treating chronic diseases highly prevalent among Latinos, yet no tool exists for measuring nutrition literacy among Spanish speakers. This study aimed to adapt the validated Nutrition Literacy Assessment Instrument for Spanish-speaking Latinos. This study was developed in two phases: adaptation and validity testing. Adaptation included translation, expert item content review, and interviews with Spanish speakers. For validity testing, 51 participants completed the Short Assessment of Health Literacy-Spanish (SAHL-S), the Nutrition Literacy Assessment Instrument in Spanish (NLit-S), and socio-demographic questionnaire. Validity and reliability statistics were analyzed. Content validity was confirmed with a Scale Content Validity Index of 0.96. Validity testing demonstrated NLit-S scores were strongly correlated with SAHL-S scores (r = 0.52, p < 0.001). Entire reliability was substantial at 0.994 (CI 0.992-0.996) and internal consistency was excellent (Cronbach's α = 0.92). The NLit-S demonstrates validity and reliability for measuring nutrition literacy among Spanish-speakers.
The Pittsburgh Sleep Quality Index: validation of the Urdu translation.
Hashmi, Ali Madeeh; Khawaja, Imran Shuja; Butt, Zeeshan; Umair, Muhammad; Naqvi, Suhaib Haider; Jawad-Ul-Haq
2014-02-01
To translate and validate the Pittsburgh Sleep Quality Index (PSQI), a standardized self-administered questionnaire for the assessment of subjective sleep quality into the Urdu language. Validation study. Mayo Hospital, Lahore, from March to April 2012. The PSQI was translated into Urdu following standard guidelines. The final Urdu version (PSQI-U) was administered to 200 healthy volunteers comprising medical students, nursing staff and doctors. Inter-item correlation was assessed by calculating Cronbach alpha. Correlation of component scores with global score was assessed by calculating Spearman correlation coefficient. Correlation between global PSQI-U scores at baseline with global scores for each PSQI-U and PSQI-E at 4-week interval was evaluated by calculating Spearman correlation coefficient. Moreover, scores on individual items of the scale at baseline were compared with respective scores after 4-week by t-test. One hundred and eighty five (185) participants completed the PSQI-U at baseline. The Cronbach alpha for PSQI-U was 0.56. Scores on individual components of the PSQI-U and composite scores were all highly correlated with each other (all p-values < 0.01). Composite scores for PSQI-U at baseline and PSQI-E at 4-week interval were also highly correlated with each other (Spearman correlation coefficient 0.74, p-value < 0.01) indicating good linguistic interchangeability. Composite scores for PSQI-U at baseline and at 4-week interval were positively correlated with each other (Spearman correlation coefficient 0.70, p < 0.01) indicating good test-retest reliability. The PSQI-U is a valid and reliable instrument for the assessment of sleep quality. It shows good linguistic interchangeability and test-retest reliability in comparison to the original English version when applied to individuals who speak the Urdu language. The PSQI-U can be a tool either for clinical management or research.
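Several of the translation studies listed here report Cronbach's alpha as their internal-consistency statistic. As a rough illustration of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), here is a minimal sketch using only the Python standard library. The function name and the response data are hypothetical and do not come from any of the studies.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: list of lists, one inner list of k item scores per respondent.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*rows))                       # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])   # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 4 respondents x 3 items (illustration only).
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises with both the average inter-item correlation and the number of items, which is one reason short scales or scales with heterogeneous components can show modest values.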
Klein, A A; Collier, T; Yeates, J; Miles, L F; Fletcher, S N; Evans, C; Richards, T
2017-09-01
A simple and accurate scoring system to predict risk of transfusion for patients undergoing cardiac surgery is lacking. We identified independent risk factors associated with transfusion by performing univariate analysis, followed by logistic regression. We then simplified the score to an integer-based system and tested it using the area under the receiver operator characteristic (AUC) statistic with a Hosmer-Lemeshow goodness-of-fit test. Finally, the scoring system was applied to the external validation dataset and the same statistical methods applied to test the accuracy of the ACTA-PORT score. Several factors were independently associated with risk of transfusion, including age, sex, body surface area, logistic EuroSCORE, preoperative haemoglobin and creatinine, and type of surgery. In our primary dataset, the score accurately predicted risk of perioperative transfusion in cardiac surgery patients with an AUC of 0.76. The external validation confirmed accuracy of the scoring method with an AUC of 0.84 and good agreement across all scores, with a minor tendency to under-estimate transfusion risk in very high-risk patients. The ACTA-PORT score is a reliable, validated tool for predicting risk of transfusion for patients undergoing cardiac surgery. This and other scores can be used in research studies for risk adjustment when assessing outcomes, and might also be incorporated into a Patient Blood Management programme. © The Author 2017. Published by Oxford University Press on behalf of the British Journal of Anaesthesia. All rights reserved. For Permissions, please email: journals.permissions@oup.com
End-stage dementia spark of life: reliability and validity of the "GATOS" questionnaire.
Tsoucalas, Gregory; Bourelia, Stamati; Kalogirou, Vaso; Giatsiou, Styliani; Mavrogiannaki, Eirini; Gatos, Georgios; Galanos, Antonis; Repana, Olga; Iliadou, Eleni; Antoniou, Antonis; Sgantzos, Markos; Gatos, Konstantinos
2015-01-01
Floor effects are present in most dementia assessment tools as dementia progresses, and the in-depth assessment of patients considered more or less in a vegetative state is questionable. To develop a questionnaire (the "Gatos Clinical Test-GCT") for the assessment of end-stage demented patients. Five hundred patients with dementia of various causes and an MMSE score between 0 and 2 were enrolled in the study. The GCT consists of 14 closed type questions rated on a Likert scale. The total score is used to evaluate the patient's dementia. Various aspects of validity and reliability (including face, content and structural validity as well as test-retest reliability) were examined. Three subscales, "Autonomy/Alertness", "Gnosias" and "Somatokinetic function", were defined, with a Cronbach's alpha equal to 0.851, 0.756 and 0.598 respectively. The GCT subscales and total score were statistically significantly higher in patients with MMSE score 1 or 2 compared with those with MMSE score 0 (p<0.0005). Patients with a GCT total score less than 12.5 had a 75% probability of having a zero MMSE score. The "GATOS" questionnaire is a valid and reliable test for patients with severe dementia, aiming at identification of those patients who could sustain some quality of life. It is a relatively short and easy to administer tool. As dementia prevalence is expected to rise further worldwide we believe that GCT could offer valuable services to health professionals, caregivers and patients.
Sivaprasad, Sobha; Tschosik, Elizabeth; Kapre, Audrey; Varma, Rohit; Bressler, Neil M; Kimel, Miriam; Dolan, Chantal; Silverman, David
2018-06-01
Geographic atrophy (GA) is an advanced form of age-related macular degeneration characterized by progressive, irreversible visual function loss. This analysis evaluates the psychometric properties of the 25-Item National Eye Institute Visual Function Questionnaire (NEI VFQ-25) composite, near activity, and distance activity scores in patients with GA. Reliability and validity study. Reliability and validity were tested with NEI VFQ-25 data collected from 100 subjects with GA from United States' sites of the phase 2 Mahalo study of lampalizumab (ClinicalTrials.gov identifier: NCT01229215). Strong internal consistency and reproducibility were demonstrated for the NEI VFQ-25 composite (Cronbach's α, 0.95; intraclass correlation coefficient [ICC], 0.86), near activity (Cronbach's α, 0.84; ICC, 0.80), and distance activity (Cronbach's α, 0.84; ICC, 0.84) scores. Convergent validity with the binocular measures, Minnesota Low-Vision Reading Test (MNRead) reading speed and Functional Reading Independence (FRI) index score, was demonstrated for baseline NEI VFQ-25 composite (Pearson correlation [r] = 0.61 and 0.69, respectively), near activities (r = 0.69 and 0.73), and distance activities (r = 0.57 and 0.64) scores. Known-group validity testing for baseline mean NEI VFQ-25 scores (composite, near activities, and distance activities) showed differences between patients with mean maximum MNRead reading speed ≥ 80 vs < 80 words per minute, and between mean FRI index score ≥ 2.5 vs < 2.5 (all P < .0001). Psychometric evidence supports the NEI VFQ-25 as a reliable and valid cross-sectional measure of the impact of GA on patient visual function and vision-related quality of life. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Keeping Your Audience in Mind: Applying Audience Analysis to the Design of Interactive Score Reports
ERIC Educational Resources Information Center
Zapata-Rivera, Juan Diego; Katz, Irvin R.
2014-01-01
Score reports have one or more intended audiences: the people who use the reports to make decisions about test takers, including teachers, administrators, parents and test takers. Attention to audience when designing a score report supports assessment validity by increasing the likelihood that score users will interpret and use assessment results…
34 CFR 668.144 - Application for test approval.
Code of Federal Regulations, 2010 CFR
2010-07-01
... the comparability of scores on the current test to scores on the previous test, and data from validity... explanation of the methodology and procedures for measuring the reliability of the test; (ii) Evidence that different forms of the test, including, if applicable, short forms, are comparable in reliability; (iii...
Ebrahimi-Madiseh, Azadeh; Eikelboom, Robert H; Jayakody, Dona Mp; Atlas, Marcus D
2016-01-01
To evaluate the clinical utility of the City University of New York sentence test in a cohort of post-lingually deafened cochlear implant recipients over time. 117 post-lingually deafened, Australian English-speaking CI recipients aged between 23 and 98 years (M = 66 years; SD = 15.09) were recruited. CUNY sentence test scores in quiet were collated and analysed at two cut-offs, 95% and 100%, as ceiling scores. CUNY sentence scores ranged from 4% to 100% (M = 86.75; SD = 20.65), with 38.8% of participants scoring 95% and 16.5% of participants reaching the 100% scores. The percentage of participants reaching the 95% and 100% ceiling scores increased over time (6 and 12 months post-implantation). The distribution of all post-operative CUNY test scores skewed to the right with 82% of test scores reaching above 90%. This study demonstrates that the CUNY test cannot be used as a valid tool to measure the speech perception skills of post-lingually deafened CI recipients over time. This may be overcome by using adaptive test protocols or linguistically, cognitively or contextually demanding test materials. The high percentage of CI recipients achieving ceiling scores for the CUNY sentence test in quiet at 3 months post-implantation questions the validity of using CUNY in a CI assessment test battery and limits its application for use in longitudinal studies evaluating CI outcomes. Further studies are required to examine different methods to overcome this problem.
Inter-Rater Reliability and Validity of the Australian Football League’s Kicking and Handball Tests
Cripps, Ashley J.; Hopper, Luke S.; Joyce, Christopher
2015-01-01
Talent identification tests used at the Australian Football League’s National Draft Combine assess the capacities of athletes to compete at a professional level. Tests created for the National Draft Combine are also commonly used for talent identification and athlete development in development pathways. The skills tests created by the Australian Football League required players to either handball (striking the ball with the hand) or kick to a series of 6 randomly generated targets. Assessors subjectively rate each skill execution giving a 0-5 score for each disposal. This study aimed to investigate the inter-rater reliability and validity of the skills tests at an adolescent sub-elite level. Male Australian footballers were recruited from sub-elite adolescent teams (n = 121, age = 15.7 ± 0.3 years, height = 1.77 ± 0.07 m, mass = 69.17 ± 8.08 kg). The coaches (n = 7) of each team were also recruited. Inter-rater reliability was assessed using Inter-class correlations (ICC) and Limits of Agreement statistics. Both the kicking (ICC = 0.96, p < .01) and handball tests (ICC = 0.89, p < .01) demonstrated strong reliability and acceptable levels of absolute agreement. Content validity was determined by examining the test scores' sensitivity to laterality and distance. Concurrent validity was assessed by comparing coaches’ perceptions of skill to actual test outcomes. Multivariate analysis of variance (MANOVA) examined the main effect of laterality, with scores on the dominant hand (p = .04) and foot (p < .01) significantly higher compared to the non-dominant side. Follow-up univariate analysis reported significant differences at every distance in the kicking test. A poor correlation was found between coaches’ perceptions of skill and testing outcomes. The results of this study show that both skill tests have acceptable inter-rater reliability. Partial content validity was confirmed for the kicking test; however, further research is required to confirm validity of the handball test. Key points: The skill tests created by the AFL demonstrated acceptable levels of relative and absolute inter-rater reliability. Both the AFL’s skills tests are able to differentiate between athletes' dominant and non-dominant limbs. However, only the kicking test could consistently differentiate between score outcomes over a range of Australian Football specific disposal distances. Both tests demonstrated poor concurrent validity, with no correlation found between coaches’ perceptions of technical skills and actual skill outcomes measured. PMID:26336356
Reliability and validity of a Swedish language version of the Resilience Scale.
Nygren, Björn; Randström, Kerstin Björkman; Lejonklou, Anna K; Lundman, Beril
2004-01-01
The purpose of this study was to test the reliability and validity of the Swedish language version of the Resilience Scale (RS). Participants were 142 adults between 19-85 years of age. Internal consistency reliability, stability over time, and construct validity were evaluated using Cronbach's alpha, principal components analysis with varimax rotation and correlations with scores on the Sense of Coherence Scale (SOC) and the Rosenberg Self-Esteem Scale (RSE). The mean score on the RS was 142 (SD = 15). The possible scores on the RS range from 25 to 175, and scores higher than 146 are considered high. The test-retest correlation was .78. Correlations with the SOC and the RSE were .41 (p < 0.01) and .37 (p < 0.01), respectively. Personal Assurance and Acceptance of Self and Life emerged as components from the principal components analysis. These findings provide evidence for the reliability and validity of the Swedish language version of the RS.
Cross-Validation of easyCBM Reading Cut Scores in Washington: 2009-2010. Technical Report #1109
ERIC Educational Resources Information Center
Irvin, P. Shawn; Park, Bitnara Jasmine; Anderson, Daniel; Alonzo, Julie; Tindal, Gerald
2011-01-01
This technical report presents results from a cross-validation study designed to identify optimal cut scores when using easyCBM[R] reading tests in Washington state. The cross-validation study analyzes data from the 2009-2010 academic year for easyCBM[R] reading measures. A sample of approximately 900 students per grade, randomly split into two…
Assessment scale of risk for surgical positioning injuries 1
Lopes, Camila Mendonça de Moraes; Haas, Vanderlei José; Dantas, Rosana Aparecida Spadoti; de Oliveira, Cheila Gonçalves; Galvão, Cristina Maria
2016-01-01
ABSTRACT Objective: to build and validate a scale to assess the risk of surgical positioning injuries in adult patients. Method: methodological research, conducted in two phases: construction and face and content validation of the scale and field research, involving 115 patients. Results: the Risk Assessment Scale for the Development of Injuries due to Surgical Positioning contains seven items, each of which presents five subitems. The scale score ranges between seven and 35 points in which, the higher the score, the higher the patient's risk. The Content Validity Index of the scale corresponded to 0.88. The application of Student's t-test for equality of means revealed the concurrent criterion validity between the scores on the Braden scale and the constructed scale. To assess the predictive criterion validity, the association was tested between the presence of pain deriving from surgical positioning and the development of pressure ulcer, using the score on the Risk Assessment Scale for the Development of Injuries due to Surgical Positioning (p<0.001). The interrater reliability was verified using the intraclass correlation coefficient, equal to 0.99 (p<0.001). Conclusion: the scale is a valid and reliable tool, but further research is needed to assess its use in clinical practice. PMID:27579925
Validity of the Miller forensic assessment of symptoms test in psychiatric inpatients.
Veazey, Connie H; Wagner, Alisha L; Hays, J Ray; Miller, Holly A
2005-06-01
This study investigated the validity of the Miller Forensic Assessment of Symptoms Test (M-FAST), a brief measure of malingering, in an inpatient psychiatric sample of 70. Among those patients who also completed the Personality Assessment Inventory (N=44), Total M-FAST score was related in the expected directions to the Personality Assessment Inventory validity scales and indexes, providing evidence for concurrent validity of the M-FAST. With the PAI malingering index used as a criterion, we examined the diagnostic efficiency of the M-FAST and found a cut score of 8 represented the best balance of sensitivity, specificity, positive predictive power, and negative predictive power. Based on this cut-score of 8, 16% of the population was classified as malingering. The M-FAST appears to be an excellent rapid screen for symptom exaggeration in this population and setting.
Can Percentiles Replace Raw Scores in the Statistical Analysis of Test Data?
ERIC Educational Resources Information Center
Zimmerman, Donald W.; Zumbo, Bruno D.
2005-01-01
Educational and psychological testing textbooks typically warn of the inappropriateness of performing arithmetic operations and statistical analysis on percentiles instead of raw scores. This seems inconsistent with the well-established finding that transforming scores to ranks and using nonparametric methods often improves the validity and power…
Urdu translation of the Hamilton Rating Scale for Depression: Results of a validation study
Hashmi, Ali M.; Naz, Shahana; Asif, Aftab; Khawaja, Imran S.
2016-01-01
Objective: To develop a standardized validated version of the Hamilton Rating Scale for Depression (HAM-D) in Urdu. Methods: After translation of the HAM-D into the Urdu language following standard guidelines, the final Urdu version (HAM-D-U) was administered to 160 depressed outpatients. Inter-item correlation was assessed by calculating Cronbach alpha. Correlation between HAM-D-U scores at baseline and after a 2-week interval was evaluated for test-retest reliability. Moreover, scores of two clinicians on HAM-D-U were compared for inter-rater reliability. For establishing concurrent validity, scores of HAM-D-U and BDI-U were compared by using Spearman correlation coefficient. The study was conducted at Mayo Hospital, Lahore, from May to December 2014. Results: The Cronbach alpha for HAM-D-U was 0.71. Composite scores for HAM-D-U at baseline and after a 2-week interval were also highly correlated with each other (Spearman correlation coefficient 0.83, p-value < 0.01), indicating good test-retest reliability. Composite scores for HAM-D-U and BDI-U were positively correlated with each other (Spearman correlation coefficient 0.85, p < 0.01), indicating good concurrent validity. Scores of two clinicians for HAM-D-U were also positively correlated (Spearman correlation coefficient 0.82, p-value < 0.01), indicating good inter-rater reliability. Conclusion: The HAM-D-U is a valid and reliable instrument for the assessment of depression. It shows good inter-rater and test-retest reliability. The HAM-D-U can be a tool either for clinical management or research. PMID:28083049
ERIC Educational Resources Information Center
Ling, Guangming; Powers, Donald E.; Adler, Rachel M.
2014-01-01
One fundamental way to determine the validity of standardized English-language test scores is to investigate the extent to which they reflect anticipated learning effects in different English-language programs. In this study, we investigated the extent to which the "TOEFL iBT"® practice test reflects the learning effects of students at…
ERIC Educational Resources Information Center
Anderson, Daniel; Alonzo, Julie; Tindal, Gerald
2011-01-01
In this technical report, we document the results of a cross-validation study designed to identify optimal cut-scores for the use of the easyCBM[R] mathematics test in the state of Washington. A large sample, randomly split into two groups of roughly equal size, was used for this study. Students' performance classification on the Washington state…
A Cross-Validation of easyCBM[R] Mathematics Cut Scores in Oregon: 2009-2010. Technical Report #1104
ERIC Educational Resources Information Center
Anderson, Daniel; Alonzo, Julie; Tindal, Gerald
2011-01-01
In this technical report, we document the results of a cross-validation study designed to identify optimal cut-scores for the use of the easyCBM[R] mathematics test in Oregon. A large sample, randomly split into two groups of roughly equal size, was used for this study. Students' performance classification on the Oregon state test was used as the…
Endarti, Dwi; Riewpaiboon, Arthorn; Thavorncharoensap, Montarat; Praditsitthikorn, Naiyana; Hutubessy, Raymond; Kristina, Susi Ari
2018-05-01
To gain insight into the most suitable foreign value set among Malaysian, Singaporean, Thai, and UK value sets for calculating the EuroQol five-dimensional questionnaire index score (utility) among patients with cervical cancer in Indonesia. Data from 87 patients with cervical cancer recruited from a referral hospital in Yogyakarta province, Indonesia, from an earlier study of health-related quality of life were used in this study. The differences among the utility scores derived from the four value sets were determined using the Friedman test. Performance of the psychometric properties of the four value sets versus visual analogue scale (VAS) was assessed. Intraclass correlation coefficients and Bland-Altman plots were used to test the agreement among the utility scores. Spearman ρ correlation coefficients were used to assess convergent validity between utility scores and patients' sociodemographic and clinical characteristics. With respect to known-group validity, the Kruskal-Wallis test was used to examine the differences in utility according to the stages of cancer. There was significant difference among utility scores derived from the four value sets, among which the Malaysian value set yielded higher utility than the other three value sets. Utility obtained from the Malaysian value set had more agreements with VAS than the other value sets versus VAS (intraclass correlation coefficients and Bland-Altman plot tests results). As for the validity, the four value sets showed equivalent psychometric properties as those that resulted from convergent and known-group validity tests. In the absence of an Indonesian value set, the Malaysian value set was more preferable to be used compared with the other value sets. Further studies on the development of an Indonesian value set need to be conducted. Copyright © 2018. Published by Elsevier Inc.
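The agreement analysis mentioned above relies on Bland-Altman statistics, which summarize the paired differences between two measures as a mean bias with 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The sketch below illustrates that calculation under the usual assumption of roughly normally distributed differences; the function name and the paired values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences a[i] - b[i]; the limits of
    agreement are bias -/+ 1.96 * sample SD of those differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired scores on a 0-1 scale (illustration only).
utility_scores = [0.8, 0.6, 0.9, 0.7]
vas_scores = [0.7, 0.6, 0.7, 0.6]
bias, lower, upper = bland_altman(utility_scores, vas_scores)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

Narrower limits of agreement indicate that the two measures can be used more interchangeably, which is a different question from whether they are merely correlated.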
Azari, Nadia; Soleimani, Farin; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud
2017-01-01
The Bayley Scales of Infant and Toddler Development is a well-known diagnostic developmental assessment tool for children aged 1-42 months. Our aim was to investigate the validity and reliability of this scale in Persian-speaking children. The method was descriptive-analytic. Translation, back-translation and cultural adaptation were done. Content and face validity of the translated scale were determined by experts' opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers of Tehran during 2013-2014 for developmental assessment in cognitive, communicative (receptive and expressive) and motor (fine and gross) domains. Reliability of the scale was calculated through three methods: internal consistency using Cronbach's alpha coefficient, test-retest, and interrater methods. Construct validity was calculated using factor analysis and comparison of mean scores. Cultural and linguistic changes were made in items of all domains, especially on the communication subscale. Content and face validity of the test were approved by experts' opinions. Cronbach's alpha coefficient was above 0.74 in all domains. Pearson correlation coefficients in the various domains were ≥ 0.982 in the test-retest method and ≥ 0.993 in the inter-rater method. Construct validity of the test was approved by factor analysis. Moreover, the mean scores for the different age groups were compared, and statistically significant differences were observed between them, which confirms the validity of the test. The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian-speaking children.
Hoenigl, Martin; Weibel, Nadir; Mehta, Sanjay R; Anderson, Christy M; Jenks, Jeffrey; Green, Nella; Gianella, Sara; Smith, Davey M; Little, Susan J
2015-08-01
Although men who have sex with men (MSM) represent a dominant risk group for human immunodeficiency virus (HIV), the risk of HIV infection within this population is not uniform. The objective of this study was to develop and validate a score to estimate incident HIV infection risk. Adult MSM who were tested for acute and early HIV (AEH) between 2008 and 2014 were retrospectively randomized 2:1 to a derivation and validation dataset, respectively. Using the derivation dataset, each predictor associated with an AEH outcome in the multivariate prediction model was assigned a point value that corresponded to its odds ratio. The score was validated on the validation dataset using C-statistics. Data collected at a single HIV testing encounter from 8326 unique MSM were analyzed, including 200 with AEH (2.4%). Four risk behavior variables were significantly associated with an AEH diagnosis (ie, incident infection) in multivariable analysis and were used to derive the San Diego Early Test (SDET) score: condomless receptive anal intercourse (CRAI) with an HIV-positive MSM (3 points), the combination of CRAI plus ≥5 male partners (3 points), ≥10 male partners (2 points), and diagnosis of bacterial sexually transmitted infection (2 points)-all as reported for the prior 12 months. The C-statistic for this risk score was >0.7 in both data sets. The SDET risk score may help to prioritize resources and target interventions, such as preexposure prophylaxis, to MSM at greatest risk of acquiring HIV infection. The SDET risk score is deployed as a freely available tool at http://sdet.ucsd.edu. © The Author 2015. Published by Oxford University Press on behalf of the Infectious Diseases Society of America. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
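The point assignments reported for the SDET score can be illustrated with a short sketch. This is only a reading of the abstract, not the authors' implementation: the field names are hypothetical, and treating the four items as simply additive (for a maximum of 10 points) is an assumption, since the abstract lists point values but not the full scoring rules.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    """Self-reported behaviour over the prior 12 months (hypothetical field names)."""
    crai_with_hiv_positive_partner: bool  # condomless receptive anal intercourse with an HIV-positive MSM
    crai: bool                            # any condomless receptive anal intercourse
    male_partners: int                    # number of male partners
    bacterial_sti: bool                   # diagnosis of a bacterial sexually transmitted infection


def sdet_score(e: Encounter) -> int:
    """Sum the point values listed in the abstract (additivity is assumed)."""
    score = 0
    if e.crai_with_hiv_positive_partner:
        score += 3
    if e.crai and e.male_partners >= 5:   # combination item: CRAI plus >=5 male partners
        score += 3
    if e.male_partners >= 10:
        score += 2
    if e.bacterial_sti:
        score += 2
    return score


if __name__ == "__main__":
    print(sdet_score(Encounter(True, True, 12, True)))
```

Under these assumptions, an encounter meeting all four criteria scores 10 and one meeting none scores 0; any cut-off used to prioritize interventions would have to come from the published tool itself.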
Soble, Jason R; Bain, Kathleen M; Bailey, K Chase; Kirton, Joshua W; Marceaux, Janice C; Critchfield, Edan A; McCoy, Karin J M; O'Rourke, Justin J F
2018-01-08
Embedded performance validity tests (PVTs) allow for continuous assessment of invalid performance throughout neuropsychological test batteries. This study evaluated the utility of the Wechsler Memory Scale-Fourth Edition (WMS-IV) Logical Memory (LM) Recognition score as an embedded PVT using the Advanced Clinical Solutions (ACS) for WAIS-IV/WMS-IV Effort System. The mixed clinical sample comprised 97 participants, 71 of whom were classified as valid and 26 as invalid based on three well-validated, freestanding criterion PVTs. Overall, the LM embedded PVT demonstrated poor concordance with the criterion PVTs and unacceptable psychometric properties using ACS validity base rates (42% sensitivity/79% specificity). Moreover, 15-39% of participants obtained an invalid ACS base rate despite having a normatively intact age-corrected LM Recognition total score. Receiver operating characteristic curve analysis revealed that a Recognition total score cutoff of < 61% correct improved specificity (92%) while sensitivity remained weak (31%). Thus, results indicated the LM Recognition embedded PVT is not appropriate for use from an evidence-based perspective, and that clinicians may be faced with reconciling how a normatively intact cognitive performance on the Recognition subtest could simultaneously reflect invalid performance validity.
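Sensitivity and specificity figures such as those quoted above come from a two-by-two table of the embedded PVT against the criterion PVTs. The sketch below shows the arithmetic only. The group sizes (26 invalid, 71 valid) are from the abstract, but the individual cell counts are hypothetical, chosen solely so that the result lands near the reported 42% and 79%.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from the cells of a 2x2 classification table.

    tp/fn: criterion-invalid cases flagged / missed by the embedded test
    tn/fp: criterion-valid cases passed / wrongly flagged by the embedded test
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical cells: 11 + 15 = 26 invalid, 56 + 15 = 71 valid.
    sens, spec = sensitivity_specificity(tp=11, fn=15, tn=56, fp=15)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff stringency trades one quantity for the other, which is the pattern the abstract describes when specificity rises to 92% while sensitivity falls to 31%.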
Murphy, Jennifer; Ahmed, Fizaa; Lomen-Hoerth, Catherine
2015-03-01
The University of California San Francisco (UCSF) Screening Battery provides clinicians with a uniquely tailored tool to measure ALS patients' cognitive and behavioral changes, adjusting for dysarthria and hand weakness. The battery consists of the ALS-CBS (1), Written Fluency Test (2), and a new revision of the Frontal Behavior Inventory (FBI-ALS) (3). The validity of each component was tested by comparing results with a gold standard neuropsychological exam (GNE). Consensus criteria-based GNE diagnoses (4) were assigned (n = 24) and concurrent validity was tested for each screening exam component. Results showed that each of the four cognitive and behavioral screening test components were significantly associated with diagnoses confirmed by GNE. GNE diagnoses were significantly associated with FBI-ALS negative score, written S-words score, and ALS-CBS cognitive score. The total FBI-ALS score and C-words tests were less predictive of GNE-diagnosed impairment. In conclusion, the UCSF Cognitive Screening Battery demonstrates good external validity compared with GNE in this modest sample, encouraging its use in larger investigations. These data suggest that this battery may provide an effective screen to identify ALS patients who will then benefit from a full examination to confirm their diagnosis.
Halm, Margo A
2018-05-14
Proficiency in evidence-based practice (EBP) is essential for relevant research findings to be integrated into clinical care when congruent with patient preferences. Few valid and reliable tools are available to evaluate the effectiveness of educational programs in advancing EBP attitudes, knowledge, skills, or behaviors, and ongoing competency. The Fresno test is one objective method to evaluate EBP knowledge and skills; however, the original and modified versions were validated with family physicians, physical therapists, and speech and language therapists. To adapt the Modified Fresno-Acute Care Nursing test and develop a psychometrically sound tool for use in academic and practice settings. In Phase 1, modified Fresno (Tilson, 2010) items were adapted for acute care nursing. In Phase 2, content validity was established with an expert panel. Content validity indices (I-CVI) ranged from .75 to 1.0. Scale CVI was .95. A cross-sectional convenience sample of acute care nurses (n = 90) in novice, master, and expert cohorts completed the Modified Fresno-Acute Care Nursing test administered electronically via SurveyMonkey. Total scores were significantly different between training levels (p < .0001). Novice nurses scored significantly lower than master or expert nurses, but differences were not found between the latter cohorts. Total score reliability was acceptable: (interrater [ICC (2, 1)]) = .88. Cronbach's alpha was 0.70. Psychometric properties of most modified items were satisfactory; however, six require further revision and testing to meet acceptable standards. The Modified Fresno-Acute Care Nursing test is a 14-item test for objectively assessing EBP knowledge and skills of acute care nurses. While preliminary psychometric properties for this new EBP knowledge measure for acute care nursing are promising, further validation of some of the items and scoring rubric is needed. © 2018 Sigma Theta Tau International.
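Item-level and scale-level content validity indices of the kind reported above are commonly computed from expert relevance ratings. The sketch below assumes the usual convention (a 4-point relevance scale on which ratings of 3 or 4 count as "relevant", and a scale CVI taken as the average of the item CVIs); the abstract does not state which convention the study used, and the ratings shown are invented toy data.

```python
def item_cvi(ratings: list[int], relevant_threshold: int = 3) -> float:
    """I-CVI: proportion of experts rating the item at or above the threshold."""
    return sum(r >= relevant_threshold for r in ratings) / len(ratings)


def scale_cvi_average(ratings_by_item: list[list[int]]) -> float:
    """S-CVI/Ave: mean of the item-level CVIs (averaging convention assumed)."""
    i_cvis = [item_cvi(ratings) for ratings in ratings_by_item]
    return sum(i_cvis) / len(i_cvis)


if __name__ == "__main__":
    # Toy data: two items, each rated by four hypothetical experts.
    panel = [[4, 4, 3, 4], [4, 3, 2, 4]]
    print([item_cvi(r) for r in panel], scale_cvi_average(panel))
```

With four raters, a single dissenting expert yields an I-CVI of .75, which matches the lower end of the range quoted in the abstract.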
Portuguese-language version of the COPD Assessment Test: validation for use in Brazil*
da Silva, Guilherme Pinheiro Ferreira; Morano, Maria Tereza Aguiar Pessoa; Viana, Cyntia Maria Sampaio; Magalhães, Clarissa Bentes de Araujo; Pereira, Eanes Delgado Barros
2013-01-01
OBJECTIVE: To validate a Portuguese-language version of the COPD assessment test (CAT) for use in Brazil and to assess the reproducibility of this version. METHODS: This was a multicenter study involving patients with stable COPD at two teaching hospitals in the city of Fortaleza, Brazil. Two independent observers (twice in one day) administered the Portuguese-language version of the CAT to 50 patients with COPD. One of those observers again administered the scale to the same patients one week later. At baseline, the patients were submitted to pulmonary function testing and the six-minute walk test (6MWT), as well as completing the previously validated Portuguese-language versions of the Saint George's Respiratory Questionnaire (SGRQ), modified Medical Research Council (MMRC) dyspnea scale, and hospital anxiety and depression scale (HADS). RESULTS: Inter-rater and intra-rater reliability was excellent (intraclass correlation coefficient [ICC] = 0.96; 95% CI: 0.93-0.97; p < 0.001; and ICC = 0.98; 95% CI: 0.96-0.98; p < 0.001, respectively). Bland-Altman plots showed good test-retest reliability. The CAT total score correlated significantly with spirometry results, 6MWT distance, SGRQ scores, MMRC dyspnea scale scores, and HADS-depression scores. CONCLUSIONS: The Portuguese-language version of the CAT is a valid, reproducible, and reliable instrument for evaluating patients with COPD in Brazil. PMID:24068260
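The quantities behind a Bland-Altman plot are the mean difference between two administrations (the bias) and the 95% limits of agreement around it. The following is a minimal sketch under the conventional definition (bias ± 1.96 standard deviations of the paired differences); the scores are invented toy values, not data from the study.

```python
from statistics import mean, stdev


def bland_altman_limits(first: list[float], second: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Toy test and retest totals for four hypothetical patients.
    test = [10, 12, 14, 16]
    retest = [9, 13, 13, 17]
    print(bland_altman_limits(test, retest))
```

Good test-retest agreement corresponds to a bias near zero and limits of agreement that are narrow relative to the scale's range.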
Ray, Midge N; Houston, Thomas K; Yu, Feliciano B; Menachemi, Nir; Maisiak, Richard S; Allison, Jeroan J; Berner, Eta S
2006-01-01
The authors developed and evaluated a rating scale, the Attitudes toward Handheld Decision Support Software Scale (H-DSS), to assess physician attitudes about handheld decision support systems. The authors conducted a prospective assessment of psychometric characteristics of the H-DSS including reliability, validity, and responsiveness. Participants were 82 Internal Medicine residents. A higher score on each of the 14 five-point Likert scale items reflected a more positive attitude about handheld DSS. The H-DSS score is the mean across the fourteen items. Attitudes toward the use of the handheld DSS were assessed prior to and six months after receiving the handheld device. Cronbach's Alpha was used to assess internal consistency reliability. Pearson correlations were used to estimate and detect significant associations between scale scores and other measures (validity). Paired sample t-tests were used to test for changes in the mean attitude scale score (responsiveness) and for differences between groups. Internal consistency reliability for the scale was alpha = 0.73. In testing validity, moderate correlations were noted between the attitude scale scores and self-reported Personal Digital Assistant (PDA) usage in the hospital (correlation coefficient = 0.55) and clinic (0.48), p < 0.05 for both. The scale was responsive, in that it detected the expected increase in scores between the two administrations (3.99 (s.d. = 0.35) vs. 4.08, (s.d. = 0.34), p < 0.005). The authors' evaluation showed that the H-DSS scale was reliable, valid, and responsive. The scale can be used to guide future handheld DSS development and implementation.
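Two of the computations described above, the scale score as the mean of the Likert items and Cronbach's alpha for internal consistency, can be sketched as follows. The alpha formula is the standard one, k/(k-1) × (1 − Σ item variances / variance of total scores); the function names and the toy responses are illustrative assumptions, not the authors' code or data.

```python
from statistics import mean, variance


def scale_score(item_responses: list[int]) -> float:
    """Scale score for one respondent: the mean across that respondent's items."""
    return mean(item_responses)


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(responses[0])
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


if __name__ == "__main__":
    # Toy data: four hypothetical respondents answering three items.
    toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
    print(cronbach_alpha(toy), scale_score(toy[0]))
```

In the toy data every item moves in lockstep across respondents, so alpha comes out at its upper bound of 1; real questionnaire data, such as the 0.73 reported here, will fall below that.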
Merolla, Giovanni; Corona, Katia; Zanoli, Gustavo; Cerciello, Simone; Giannotti, Stefano; Porcellini, Giuseppe
2017-12-01
The Kerlan-Jobe Orthopaedic Clinic (KJOC) Shoulder and Elbow score is a reliable and sensitive tool to measure the performance of overhead athletes. The purpose of this study was to carry out a cross-cultural adaptation and validation of the KJOC questionnaire in Italian and to assess its reliability, validity, and responsiveness. Ninety professional athletes with a painful shoulder were included in this study and were assigned to the "injury group" (n = 32) or the "overuse group" (n = 58); 65 were managed conservatively and 25 were treated by arthroscopic surgery. To assess the reliability of the KJOC score, patients were asked to fill in the questionnaire at baseline and after 2 weeks. To test the construct validity, KJOC scores were compared to those obtained with the Italian version of the Disabilities of the Arm, Shoulder, and Hand (DASH) scale, and with the DASH sports/performing arts module. To test KJOC score responsiveness, the follow-up KJOC scores of the participants treated conservatively were compared to those of the patients treated by arthroscopic surgery. Statistical analysis demonstrated that the KJOC questionnaire is reliable in terms of the single items and the overall score (ICC 0.95-0.99); that it has high construct validity (r_s = -0.697; p < 0.01); and that it is responsive to clinical differences in shoulder function (p < 0.0001). The Italian version of the KJOC Shoulder and Elbow score performed in a similar way to the English version and demonstrated good validity, reliability, and responsiveness after conservative and surgical treatment. Level of evidence: II.
Validation of the Australian Propensity for Angry Driving Scale (Aus-PADS).
Leal, Nerida L; Pachana, Nancy A
2009-09-01
The present study used a university sample to assess the test-retest reliability and validity of the Australian Propensity for Angry Driving Scale (Aus-PADS). The scale has stability over time, and convergent validity was established, as Aus-PADS scores correlated significantly with established anger and impulsivity measures. Discriminant validity was also established, as Aus-PADS scores did not correlate with Venturesomeness scores. The Aus-PADS has demonstrated criterion validity, as scores were correlated with behavioural measures, such as yelling at other drivers, gesturing at other drivers, and feeling angry but not doing anything. Aus-PADS scores reliably predicted the frequency of these behaviours over and above other study variables. No significant relationship between aggressive driving and crash involvement was observed. It was concluded that the Aus-PADS is a reliable and valid tool appropriate for use in Australian research, and that the potential relationship between aggressive driving and crash involvement warrants further investigation with a more representative (and diverse) driver sample.
ERIC Educational Resources Information Center
Allen, Daniel N.; Thaler, Nicholas S.; Barchard, Kimberly A.; Vertinski, Mary; Mayfield, Joan
2012-01-01
The Comprehensive Trail Making Test (CTMT) is a relatively new version of the Trail Making Test that has a number of appealing features, including a large normative sample that allows raw scores to be converted to standard "T" scores adjusted for age. Preliminary validity information suggests that CTMT scores are sensitive to brain…
An Investigation of Indicators of Success in Graduates of a Progressive, Urban, Public High School
ERIC Educational Resources Information Center
Kunkel, Christine D.
2016-01-01
Using standardized test scores to measure success in schools is a controversial topic in education today. Many feel that test scores are not a valid indicator of success, or are being overused to the detriment of the curriculum. But if not test scores, then what is the alternative? This study examines potential alternatives, or more authentic…
ERIC Educational Resources Information Center
Haertel, Edward
2013-01-01
In validating uses of testing, it is helpful to distinguish those that rely directly on the information provided by scores or score distributions ("direct" uses and consequences) versus those that instead capitalize on the motivational effects of testing, or use testing and test reporting to shape public opinion ("indirect" uses and consequences).…
Development and validation of parenting measures for body image and eating patterns in childhood.
Damiano, Stephanie R; Hart, Laura M; Paxton, Susan J
2015-01-01
Evidence-based parenting interventions are important in assisting parents to help their children develop healthy body image and eating patterns. To adequately assess the impact of parenting interventions, valid parent measures are required. The aim of this study was to develop and assess the validity and reliability of two new parent measures, the Parenting Intentions for Body image and Eating patterns in Childhood (Parenting Intentions BEC) and the Knowledge Test for Body image and Eating patterns in Childhood (Knowledge Test BEC). Participants were 27 professionals working in research or clinical treatment of body dissatisfaction or eating disorders, and 75 parents of children aged 2-6 years, who completed the measures via an online questionnaire. Seven scenarios were developed for the Parenting Intentions BEC to describe common experiences about the body and food that parents might need to respond to in front of their child. Parents ranked four behavioural intentions, derived from the current literature on parenting risk factors for body dissatisfaction and unhealthy eating patterns in children. Two subscales were created, one representing positive behavioural intentions, the other negative behavioural intentions. After piloting a larger pool of items, 13 statements were used to construct the Knowledge Test BEC. These were designed to be factual statements about the influence of parent language, media, family meals, healthy eating, and self-esteem on child eating and body image. The validity of both measures was tested by comparing parent and professional scores, and reliability was assessed by comparing parent scores over two testing occasions. Compared with parents, professionals reported significantly higher scores on the Positive Intentions subscale and significantly lower on the Negative Intentions subscale of the Parenting Intentions BEC; confirming the discriminant validity of six out of the seven scenarios. 
Test-retest reliability was also confirmed as parent scores on the two Parenting Intentions subscales did not differ over time. Eleven out of the 13 Knowledge Test items demonstrated sufficient discriminant validity and test-retest reliability. Overall, results indicated that the six-scenario Parenting Intentions BEC and the 11-item Knowledge Test BEC are valid and reliable measures for parents of young children.
ERIC Educational Resources Information Center
Munger, Kristen A.; Murray, Maria S.
2017-01-01
The purpose of this study was to examine the validity evidence of first-grade spelling scores from a standardized test of nonsense word spellings and their potential value within universal literacy screening. Spelling scores from the Test of Phonological Awareness: Second Edition PLUS for 47 first-grade children were scored using a standardized…
Liaw, Sok Ying; Rashasegaran, Ahtherai; Wong, Lai Fun; Deneen, Christopher Charles; Cooper, Simon; Levett-Jones, Tracy; Goh, Hongli Sam; Ignacio, Jeanette
2018-03-01
The development of clinical reasoning skills in recognising and responding to clinical deterioration is essential in pre-registration nursing education. Simulation has been increasingly used by educators to develop this skill. To develop and evaluate the psychometric properties of a Clinical Reasoning Evaluation Simulation Tool (CREST) for measuring clinical reasoning skills in recognising and responding to clinical deterioration in a simulated environment. A scale development with psychometric testing and mixed methods study. Nursing students and academic staff were recruited at a university. A three-phase prospective study was conducted. Phase 1 involved the development and content validation of the CREST; Phase 2 included the psychometric testing of the tool with 15 second-year and 15 third-year nursing students who undertook the simulation-based assessment; Phase 3 involved the usability testing of the tool with nine academic staff through a survey questionnaire and focus group discussion. A 10-item CREST was developed based on a model of clinical reasoning. A content validity of 0.93 was obtained from the validation of 15 international experts. The construct validity was supported as the third-year students demonstrated significantly higher (p<0.001) clinical reasoning scores than the second-year students. The concurrent validity was also supported with significant positive correlations between global rating scores and almost all subscale scores, and the total scores. The predictive validity was supported with an existing tool. The internal consistency was high with a Cronbach's alpha of 0.92. A high inter-rater reliability was demonstrated with an intraclass correlation coefficient of 0.88. The usability of the tool was rated positively by the nurse educators but the need to ease the scoring process was highlighted. 
A valid and reliable tool was developed to measure the effectiveness of simulation in developing clinical reasoning skills for recognising and responding to clinical deterioration. Copyright © 2017. Published by Elsevier Ltd.
The Motivated Strategies for Learning Questionnaire: score validity among medicine residents.
Cook, David A; Thompson, Warren G; Thomas, Kris G
2011-12-01
The Motivated Strategies for Learning Questionnaire (MSLQ) purports to measure motivation using the expectancy-value model. Although it is widely used in other fields, this instrument has received little study in health professions education. The purpose of this study was to evaluate the validity of MSLQ scores. We conducted a validity study evaluating the relationships of MSLQ scores to other variables and their internal structure (reliability and factor analysis). Participants included 210 internal medicine and family medicine residents participating in a web-based course on ambulatory medicine at an academic medical centre. Measurements included pre-course MSLQ scores, pre- and post-module motivation surveys, post-module knowledge test and post-module Instructional Materials Motivation Survey (IMMS) scores. Internal consistency was universally high for all MSLQ items together (Cronbach's α = 0.93) and for each domain (α ≥ 0.67). Total MSLQ scores showed statistically significant positive associations with post-test knowledge scores. For example, a 1-point rise in total MSLQ score was associated with a 4.4% increase in post-test scores (β = 4.4; p < 0.0001). Total MSLQ scores showed moderately strong, statistically significant associations with several other measures of effort, motivation and satisfaction. Scores on MSLQ domains demonstrated associations that generally aligned with our hypotheses. Self-efficacy and control of learning belief scores demonstrated the strongest domain-specific relationships with knowledge scores (β = 2.9 for both). Confirmatory factor analysis showed a borderline model fit. Follow-up exploratory factor analysis revealed the scores of five factors (self-efficacy, intrinsic interest, test anxiety, extrinsic goals, attribution) demonstrated psychometric and predictive properties similar to those of the original scales. Scores on the MSLQ are reliable and predict meaningful outcomes. 
However, the factor structure suggests a simplified model might better fit the empiric data. Future research might consider how assessing and responding to motivation could enhance learning. © Blackwell Publishing Ltd 2011.
Relationship of Elementary and Secondary School Achievement Test Scores to Later Academic Success.
ERIC Educational Resources Information Center
Loyd, Brenda H.; And Others
1980-01-01
This study investigated the relationship between achievement test scores on the Iowa Tests of Basic Skills (ITBS) and Iowa Tests of Educational Development (ITED), and high school and college grade point average. Support for the predictive validity of the ITBS and ITED achievement test batteries is provided. (Author/GK)
The Autonomic Symptom Profile: a new instrument to assess autonomic symptoms
NASA Technical Reports Server (NTRS)
Suarez, G. A.; Opfer-Gehrking, T. L.; Offord, K. P.; Atkinson, E. J.; O'Brien, P. C.; Low, P. A.
1999-01-01
OBJECTIVE: To develop a new specific instrument called the Autonomic Symptom Profile to measure autonomic symptoms and test its validity. BACKGROUND: Measuring symptoms is important in the evaluation of quality of life outcomes. There is no validated, self-completed questionnaire on the symptoms of patients with autonomic disorders. METHODS: The questionnaire is 169 items concerning different aspects of autonomic symptoms. The Composite Autonomic Symptom Scale (COMPASS) with item-weighting was established; higher scores indicate more or worse symptoms. Autonomic function tests were performed to generate the Composite Autonomic Scoring Scale (CASS) and to quantify autonomic deficits. We compared the results of the COMPASS with the CASS derived from the Autonomic Reflex Screen to evaluate validity. RESULTS: The instrument was tested in 41 healthy controls (mean age 46.6 years), 33 patients with nonautonomic peripheral neuropathies (mean age 59.5 years), and 39 patients with autonomic failure (mean age 61.1 years). COMPASS scores correlated well with the CASS, demonstrating an acceptable level of content and criterion validity. The mean (+/-SD) overall COMPASS score was 9.8 (+/-9) in controls, 25.9 (+/-17.9) in the patients with nonautonomic peripheral neuropathies, and 52.3 (+/-24.2) in the autonomic failure group. Scores of symptoms of orthostatic intolerance and secretomotor dysfunction best predicted the CASS on multiple stepwise regression analysis. CONCLUSIONS: We describe a questionnaire that measures autonomic symptoms and present evidence for its validity. The instrument shows promise in assessing autonomic symptoms in clinical trials and epidemiologic studies.
Construct validity of the individual work performance questionnaire.
Koopmans, Linda; Bernaards, Claire M; Hildebrandt, Vincent H; de Vet, Henrica C W; van der Beek, Allard J
2014-03-01
To examine the construct validity of the Individual Work Performance Questionnaire (IWPQ). A total of 1424 Dutch workers from three occupational sectors (blue, pink, and white collar) participated in the study. First, IWPQ scores were correlated with related constructs (convergent validity). Second, differences between known groups were tested (discriminative validity). First, IWPQ scores correlated weakly to moderately with absolute and relative presenteeism, and work engagement. Second, significant differences in IWPQ scores were observed for workers differing in job satisfaction, and workers differing in health. Overall, the results indicate acceptable construct validity of the IWPQ. Researchers are provided with a reliable and valid instrument to measure individual work performance comprehensively and generically, among workers from different occupational sectors, with and without health problems.
Junghaenel, Doerte U.; Schneider, Stefan; Stone, Arthur A.; Christodoulou, Christopher; Broderick, Joan E.
2014-01-01
Objective This study examined the ecological validity and clinical utility of NIH Patient Reported-Outcomes Measurement Information System (PROMIS®) instruments for anger, depression, and fatigue in women with premenstrual symptoms. Methods One-hundred women completed daily diaries and weekly PROMIS assessments over 4 weeks. Weekly assessments were administered through Computerized Adaptive Testing (CAT). Weekly CATs and corresponding daily scores were compared to evaluate ecological validity. To test clinical utility, we examined if CATs could detect changes in symptom levels, if these changes mirrored those obtained from daily scores, and if CATs could identify clinically meaningful premenstrual symptom change. Results PROMIS CAT scores were higher in the pre-menstrual than the baseline (ps < .0001) and post-menstrual (ps < .0001) weeks. The correlations between CATs and aggregated daily scores ranged from .73 to .88 supporting ecological validity. Mean CAT scores showed systematic changes in accordance with the menstrual cycle and the magnitudes of the changes were similar to those obtained from the daily scores. Finally, Receiver Operating Characteristic (ROC) analyses demonstrated the ability of the CATs to discriminate between women with and without clinically meaningful premenstrual symptom change. Conclusions PROMIS CAT instruments for anger, depression, and fatigue demonstrated validity and utility in premenstrual symptom assessment. The results provide encouraging initial evidence of the utility of PROMIS instruments for the measurement of affective premenstrual symptoms. PMID:24630180
Donini, Lorenzo Maria; Rosano, Aldo; Di Lazzaro, Luca; Poggiogalle, Eleonora; Lubrano, Carla; Migliaccio, Silvia; Carbonelli, Mariagrazia; Pinto, Alessandro; Lenzi, Andrea
2017-05-15
Obesity is associated with an increased risk of metabolic comorbidity as well as increased mortality. Notably, obesity is also associated with impairment of psychological status and quality of life. Only three questionnaires are available in the Italian language evaluating the health-related quality of life in subjects with obesity. The aim of the present study was to test the validity and reliability of the Italian version of the Laval Questionnaire. The original French version was translated into Italian and back-translated by a French native speaker. 273 subjects with obesity (Body Mass Index ≥ 30 kg/m²) were enrolled; the Italian version of the Laval Questionnaire and the O.R.Well-97 questionnaire were administered in order to assess health-related quality of life. The Laval questionnaire consists of 44 items distributed in 6 domains (symptoms, activity/mobility, personal hygiene/clothing, emotions, social interaction, sexual life). Disability and overall psychopathology levels were assessed through the TSD-OC test (SIO test for obesity correlated disabilities) and the SCL-90 (Symptom Checklist-90) questionnaire, respectively. To verify the validity of the Italian version, analyses of internal consistency, test-retest reliability, and construct validity were performed. The observed proportion of agreement was 50.2% with Cohen's K = 0.336 (CI 95%: 0.267-0.404), indicating a fair agreement between the two tests. Test-retest correlation was statistically significant (ρ = 0.82; p < 0.01); internal consistency (standardized Cronbach's alpha) was considered acceptable (α > 0.70). The analysis of construct validity showed a statistically significant association in terms of both total score (ρ = -0.66) and scores at each single domain (p < 0.01).
A high correlation (p < 0.01) was observed between Laval questionnaire total and single domain scores and other related measures (Body Mass Index, TSD-OC scores, SCL-90 global severity index), revealing a high construct validity of the test. The Italian version of the Laval Questionnaire is a valid and reliable measure to assess the health-related quality of life in subjects with obesity.
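Cohen's kappa, as reported for the Laval Questionnaire above, corrects the observed proportion of agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below uses invented toy classifications. The chance-agreement value of 0.25 in the last line is not reported in the abstract; it is simply the value obtained by back-solving the formula from the published p_o = 0.502 and κ = 0.336, and is shown only to illustrate the arithmetic.

```python
from collections import Counter


def kappa_from_agreement(p_observed: float, p_expected: float) -> float:
    """Cohen's kappa from observed and chance-expected agreement proportions."""
    return (p_observed - p_expected) / (1 - p_expected)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two equally long lists of categorical classifications."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of the two marginal proportions, summed over categories.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return kappa_from_agreement(p_observed, p_expected)


if __name__ == "__main__":
    # Toy classifications of four hypothetical subjects by two instruments.
    first = ["low", "low", "high", "high"]
    second = ["low", "high", "high", "high"]
    print(cohens_kappa(first, second))
    print(kappa_from_agreement(0.502, 0.25))  # p_e back-solved, not reported
```

The example makes clear why a raw agreement of about 50% can still correspond to only "fair" agreement once chance is taken into account.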
The Role of Testing in Affirmative Action.
ERIC Educational Resources Information Center
Manning, Winton H.
Graphs and charts pertaining to testing in affirmative action are presented. Data concern the following: the predictive validity of College Board admissions tests using freshman grade point average as the criterion; validity coefficients of undergraduate grade point average (UGPA) alone, Law School Admission Test (LSAT) scores, and undergraduate…
Shoemaker, Sarah J.; Wolf, Michael S.; Brach, Cindy
2016-01-01
Objective To develop a reliable and valid instrument to assess the understandability and actionability of print and audiovisual materials. Methods We compiled items from existing instruments/guides that the expert panel assessed for face/content validity. We completed four rounds of reliability testing, and produced evidence of construct validity with consumers and readability assessments. Results The experts deemed the PEMAT items face/content valid. Four rounds of reliability testing and refinement were conducted using raters untrained on the PEMAT. Agreement improved across rounds. The final PEMAT showed moderate agreement per Kappa (Average K = 0.57) and strong agreement per Gwet’s AC1 (Average = 0.74). Internal consistency was strong (α = 0.71; Average Item-Total Correlation = 0.62). For construct validation with consumers (n = 47), we found significant differences between actionable and poorly-actionable materials in comprehension scores (76% vs. 63%, p < 0.05) and ratings (8.9 vs. 7.7, p < 0.05). For understandability, there was a significant difference for only one of two topics on consumer numeric scores. For actionability, there were significant positive correlations between PEMAT scores and consumer-testing results, but no relationship for understandability. There were, however, strong, negative correlations between grade-level and both consumer-testing results and PEMAT scores. Conclusions The PEMAT demonstrated strong internal consistency, reliability, and evidence of construct validity. Practice implications The PEMAT can help professionals judge the quality of materials (available at: http://www.ahrq.gov/pemat). PMID:24973195
Alhajj, Mohammed Nasser; Amran, Abdullah Ghalib; Halboub, Esam; Al-Basmi, Abdulghani Ali; Al-Ghabri, Fawaz Abdullah
2017-07-01
This study aimed to develop the Arabic version of the Orofacial Esthetic Scale (OES-Ar) and to investigate its psychometric properties in an Arabic-speaking population with and without esthetic impairments. Translation and cross-cultural adaptation were done according to standard guidelines. Internal consistency was assessed on 230 participants. For test-retest reliability, 50 subjects with natural teeth were recalled within a period of 2 weeks. Validity of the OES-Ar was tested by construct, convergent, and discriminant validity tests. Responsiveness to esthetic changes was assessed in 60 patients. The results showed excellent internal consistency, with a Cronbach's alpha of 0.92 and an average inter-item correlation of 0.60. The ICC values ranged from 0.87 to 0.96, which indicated excellent agreement. Construct validity analysis confirmed a one-factor (one-dimensional) structure for the OES-Ar. For convergent validity, a significant correlation was found between the OES summary score and the overall impression of orofacial esthetics, as well as between the OES summary score and the summary score of the three esthetics-related questions of the OHIP-49Ar. The discriminant validity test revealed significant differences between the study groups (P<0.001). Responsiveness to treatment was confirmed by significant differences between pre- and post-treatment OES total summary scores (P<0.001). The OES-Ar has excellent psychometric properties, making it a valuable instrument for assessing orofacial esthetics in Arabic-speaking patients. Copyright © 2016 Japan Prosthodontic Society. Published by Elsevier Ltd. All rights reserved.
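Several records in this listing report internal consistency as Cronbach's alpha. As a rough sketch of how that coefficient is computed from item-level responses, the standard formula is alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the response matrix below are hypothetical and are not taken from any of the studies.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))                      # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses: 4 respondents, 3 items that rise together perfectly.
responses = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [4, 5, 6],
]
print(cronbach_alpha(responses))
```

Because the three toy items are perfectly correlated, alpha reaches its upper bound here; real scales fall below that, and values such as the 0.92 reported above indicate items that covary strongly.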
A two-factor theory for concussion assessment using ImPACT: memory and speed.
Schatz, Philip; Maerlender, Arthur
2013-12-01
We present the initial validation of a two-factor structure of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) using ImPACT composite scores and document the reliability and validity of this factor structure. Factor analyses were conducted for baseline (N = 21,537) and post-concussion (N = 560) data, yielding "Memory" (Verbal and Visual) and "Speed" (Visual Motor Speed and Reaction Time) Factors; inclusion of Total Symptom Scores resulted in a third discrete factor. Speed and Memory z-scores were calculated, and test-retest reliability (using intra-class correlation coefficients) at 1 month (0.88/0.81), 1 year (0.85/0.75), and 2 years (0.76/0.74) were higher than published data using Composite scores. Speed and Memory scores yielded 89% sensitivity and 70% specificity, which was higher than composites (80%/62%) and comparable with subscales (91%/69%). This emergent two-factor structure has improved test-retest reliability with no loss of sensitivity/specificity and may improve understanding and interpretability of ImPACT test results.
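The sensitivity and specificity figures quoted above are simple ratios over a two-by-two classification table. The sketch below shows the arithmetic; the counts are invented purely to produce round percentages and are not the study's data, and the function names are assumptions.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly affected cases that the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truly unaffected cases that the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 100 concussed and 100 non-concussed examinees.
print(sensitivity(true_pos=89, false_neg=11))  # proportion of concussed cases detected
print(specificity(true_neg=70, false_pos=30))  # proportion of controls correctly cleared
```

Comparing two scoring schemes on both ratios at once, as the abstract does, guards against a scheme that gains sensitivity only by flagging more healthy examinees.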
Could situational judgement tests be used for selection into dental foundation training?
Patterson, F; Ashworth, V; Mehra, S; Falcon, H
2012-07-13
To pilot and evaluate a machine-markable situational judgement test (SJT) designed to select candidates into UK dental foundation training. Single centre pilot study. UK postgraduate deanery in 2010. Seventy-four candidates attending interview for dental foundation training in Oxford and Wessex Deaneries volunteered to complete the situational judgement test. The situational judgement test was developed to assess relevant professional attributes for dentistry (for example, empathy and integrity) in a machine-markable format. Test content was developed by subject matter experts working with experienced psychometricians. Evaluation of psychometric properties of the pilot situational judgement test (for example, reliability, validity and fairness). Scores in the dental foundation training selection process (short-listing and interviews) were used to examine criterion-related validity. Candidates completed an evaluation questionnaire to examine candidate reactions and face validity of the new test. Forty-six candidates were female and 28 male; mean age was 23.5-years-old (range 22-32). Situational judgement test scores were normally distributed and the test showed good internal reliability when corrected for test length (α = 0.74). Situational judgement test scores positively correlated with the management, leadership and professionalism interview (N = 50; r = 0.43, p <0.01) but not with the clinical skills interview, providing initial evidence of criterion-related validity as the situational judgement test is designed to test non-cognitive professional attributes beyond clinical knowledge. Most candidates perceived the situational judgement test as relevant to dentistry, appropriate for their training level, and fair. This initial pilot study suggests that a situational judgement test is an appropriate and innovative method to measure professional attributes (eg empathy and integrity) for selection into foundation training. 
Further research will explore the long-term predictive validity of the situational judgement test once candidates have entered training.
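The abstract above reports reliability "corrected for test length" without naming the correction. One common choice for this kind of adjustment is the Spearman-Brown prophecy formula, sketched here as an assumption rather than as the authors' documented method; the function name and inputs are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability if the test were length_factor times as long.

    reliability:   reliability of the current test (e.g. Cronbach's alpha)
    length_factor: new length divided by current length (2.0 = doubled)
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical: a short pilot with reliability 0.50, projected to double length.
print(spearman_brown(0.50, 2.0))
```

The formula assumes the added items are parallel to the existing ones, so projections for a pilot test should be read as approximate.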
Cross-Validation of easyCBM Reading Cut Scores in Oregon: 2009-2010. Technical Report #1108
ERIC Educational Resources Information Center
Park, Bitnara Jasmine; Irvin, P. Shawn; Anderson, Daniel; Alonzo, Julie; Tindal, Gerald
2011-01-01
This technical report presents results from a cross-validation study designed to identify optimal cut scores when using easyCBM[R] reading tests in Oregon. The cross-validation study analyzes data from the 2009-2010 academic year for easyCBM[R] reading measures. A sample of approximately 2,000 students per grade, randomly split into two groups of…
O'Grady, Anthony; Allen, David; Happerfield, Lisa; Johnson, Nicola; Provenzano, Elena; Pinder, Sarah E; Tee, Lilian; Gu, Mai; Kay, Elaine W
2010-12-01
Immunohistochemistry (IHC) is used as the frontline assay to determine HER2 status in invasive breast cancer patients. The aim of the study was to compare the performance of the Leica Oracle HER2 Bond IHC System (Oracle) with the current most readily accepted Dako HercepTest (HercepTest), using both commercially validated and modified ASCO/CAP and UK HER2 IHC scoring guidelines. A total of 445 breast cancer samples from 3 international clinical HER2 referral centers were stained with the 2 test systems and scored in a blinded fashion by experienced pathologists. The overall agreement between the 2 tests in a 3×3 (negative, equivocal and positive) analysis shows a concordance of 86.7% and 86.3%, respectively when analyzed using commercially validated and modified ASCO/CAP and UK HER2 IHC scoring guidelines. There is a good concordance between the Oracle and the HercepTest. The advantages of a complete fully automated test such as the Oracle include standardization of key analytical factors and improved turn around time. The implementation of the modified ASCO/CAP and UK HER2 IHC scoring guidelines has minimal effect on either assay interpretation, showing that Oracle can be used as a methodology for accurately determining HER2 IHC status in formalin fixed, paraffin-embedded breast cancer tissue.
ERIC Educational Resources Information Center
Goodwin, Amanda P.; Huggins, A. Corinne; Carlo, Maria; Malabonga, Valerie; Kenyon, Dorry; Louguit, Mohammed; August, Diane
2012-01-01
This study describes the development and validation of the Extract the Base test (ETB), which assesses derivational morphological awareness. Scores on this test were validated for 580 monolingual students and 373 Spanish-speaking English language learners (ELLs) in third through fifth grade. As part of the validation of the internal structure,…
Sanchez-Garcia, Manuel; Extremera, Natalio; Fernandez-Berrocal, Pablo
2016-11-01
This research examined evidence regarding the reliability and validity of scores on the Spanish version of the Mayer-Salovey-Caruso Emotional Intelligence Test, Version 2.0 (MSCEIT; Mayer, Salovey, & Caruso, 2002). In Study 1, we found a close convergence of the Spanish consensus scores and the general and expert consensus scores determined with Mayer, Salovey, Caruso, and Sitarenios (2003) data. The MSCEIT also demonstrated adequate evidence of reliability of test scores as estimated by internal consistency and test-retest correlation after 12 weeks. Confirmatory factor analysis supported a 3-level higher factor model with 8 manifest variables (task scores), 4 first-level factors (corresponding to the 4-branch model of Mayer & Salovey [1997], with 2 tasks for each branch), 2 second-level factors (experiential and strategic areas, with 2 branches for each area), and 1 third-level factor (overall emotional intelligence [EI]), and multigroup analyses supported MSCEIT cross-gender invariance. Study 2 found evidence for the discriminant validity of scores on the MSCEIT subscales, which were differentially related to personality and self-reported EI. Study 3 provided evidence of the incremental validity of scores on the MSCEIT, which added significant variance to the prospective prediction of psychological well-being after controlling for personality traits. The psychometric properties of the Spanish MSCEIT are similar to those of the original English version, supporting its use for assessing emotional abilities in the Spanish population. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Singh, Varun Pratap; Singh, Rajkumar
2014-03-01
The aim of this study was to develop a reliable and valid Nepali version of the Psychosocial Impact of Dental Aesthetic Questionnaire (PIDAQ). Cross-sectional descriptive validation study. B.P. Koirala Institute of Health Sciences, Dharan, Nepal. A rigorous translation process including conceptual and semantic evaluation, translation, back translation and pre-testing was carried out. Two hundred and fifty-two undergraduates, including equal numbers of males and females with ages ranging from 18 to 29 years (mean age: 22·33±2·114 years), participated in this study. Reliability was assessed by Cronbach's alpha coefficient, and the coefficient of correlation was used to assess correlation between items and test-retest reliability. Construct validity was tested by factor analysis. Convergent construct validity was tested by comparing PIDAQ scores with the aesthetic component of the index of orthodontic treatment needs (IOTN-AC) and the perception of occlusion scale (POS), respectively. Discriminant construct validity was assessed by differences in scores between those who demanded treatment and those who did not. The response rate was 100%. One hundred and twenty-three individuals had a demand for orthodontic treatment. The Nepali PIDAQ had excellent reliability, with a Cronbach's alpha of 0·945, corrected item correlations between 0·525 and 0·790 and overall test-retest reliability of 0·978. Construct validity was good, with formation of a new sub-domain, 'Dental self-consciousness'. The scale correlated well with the IOTN-AC and POS, fulfilling convergent construct validity. Discriminant construct validity was demonstrated by significant differences in scores between subjects with and without a demand for treatment. To conclude, the Nepali version of the PIDAQ has good psychometric properties and can be used effectively in this population group for further research.
Rater Expertise in a Second Language Speaking Assessment: The Influence of Training and Experience
ERIC Educational Resources Information Center
Davis, Lawrence Edward
2012-01-01
Speaking performance tests typically employ raters to produce scores; accordingly, variability in raters' scoring decisions has important consequences for test reliability and validity. One such source of variability is the rater's level of expertise in scoring. Therefore, it is important to understand how raters' performance is influenced by…
Holcomb, W R; Adams, N A; Ponder, H M; Anderson, W P
1984-03-01
This study used multivariate regression to test the validity of the MMPI with accused murderers (N = 96) who were undergoing pre-trial evaluations. Four significant behavioral and cognitive predictors of elevated MMPI scores were identified: low intelligence, a history of drug abuse, suspiciousness observed on the ward, and the fact that the accused was a stranger to the victim. These results support the validity of the MMPI with this population and also suggest that high F scale scores on the MMPI are more a measure of psychopathology than of invalidity due to test-taking response bias.
Role of test motivation in intelligence testing.
Duckworth, Angela Lee; Quinn, Patrick D; Lynam, Donald R; Loeber, Rolf; Stouthamer-Loeber, Magda
2011-05-10
Intelligence tests are widely assumed to measure maximal intellectual performance, and predictive associations between intelligence quotient (IQ) scores and later-life outcomes are typically interpreted as unbiased estimates of the effect of intellectual ability on academic, professional, and social life outcomes. The current investigation critically examines these assumptions and finds evidence against both. First, we examined whether motivation is less than maximal on intelligence tests administered in the context of low-stakes research situations. Specifically, we completed a meta-analysis of random-assignment experiments testing the effects of material incentives on intelligence-test performance on a collective 2,008 participants. Incentives increased IQ scores by an average of 0.64 SD, with larger effects for individuals with lower baseline IQ scores. Second, we tested whether individual differences in motivation during IQ testing can spuriously inflate the predictive validity of intelligence for life outcomes. Trained observers rated test motivation among 251 adolescent boys completing intelligence tests using a 15-min "thin-slice" video sample. IQ score predicted life outcomes, including academic performance in adolescence and criminal convictions, employment, and years of education in early adulthood. After adjusting for the influence of test motivation, however, the predictive validity of intelligence for life outcomes was significantly diminished, particularly for nonacademic outcomes. Collectively, our findings suggest that, under low-stakes research conditions, some individuals try harder than others, and, in this context, test motivation can act as a third-variable confound that inflates estimates of the predictive validity of intelligence for life outcomes.
Vyas, Shaleen; Nagarajappa, Sandesh; Dasar, Pralhad L; Mishra, Prashant
2016-10-01
Linguistically adapted oral health literacy tools help assess oral health literacy in local populations with clarity and understandability. The original Oral Health Literacy Adult Questionnaire was published in English (2013) and consists of 17 items in 4 domains. The present study aimed to culturally adapt and validate the Oral Health Literacy Adult Questionnaire in Hindi. Our objective was therefore to translate the questionnaire into Hindi and test its psychometric properties, such as reliability and validity, among primary school teachers. The Oral Health Literacy Adult Questionnaire was translated into the Oral Health Literacy Adult Questionnaire - Hindi Version (OHL-AQ-H) using the World Health Organization recommended translation/back-translation protocol. During pre-testing, an expert panel assessed the content validity of the questionnaire. Face validity was assessed on a small sample of 10 individuals. A cross-sectional study was conducted (June-July 2015) and the OHL-AQ-H was administered to a convenience sample of 170 primary school teachers. Internal consistency and test-retest reliability were assessed using Cronbach's alpha and the intra-class correlation coefficient (ICC), respectively, with a 2-week interval to ascertain adherence to the questionnaire response. Predictive validity was tested by comparing OHL-AQ-H scores with clinical indicators such as oral hygiene scores and dental caries scores. Concurrent and discriminant validity were assessed through self-reported oral health and through negative association with sociodemographic variables. The data were analyzed by descriptive tests using chi-square and bivariate logistic regression in SPSS software, version 20, and p<0.05 was considered the significance level. The mean OHL-AQ-H score was 13.58±2.82. ICC and Cronbach's alpha for the Oral Health Literacy Adult Questionnaire - Hindi Version were 0.94 and 0.70, respectively. 
Comparisons of varying levels of oral health literacy with self-reported oral health established significant concurrent validity (p=0.01). Significant predictive validity was observed between OHL-AQ-H scores and clinical parameters like oral hygiene status (p=0.005) and dentition status (p=0.001). The translated and culturally adapted Oral Health Literacy Adult Questionnaire - Hindi Version indicated good reliability and validity among primary school teachers to assess oral health literacy among Hindi speaking population. Hence, improving OHL levels and implementing education oriented policies can improve the quality of life.
Angers, Magalie; Svotelis, Amy; Balg, Frederic; Allard, Jean-Pascal
2016-04-01
The Ankle Osteoarthritis Scale (AOS) is a self-administered score specific for ankle osteoarthritis (OA) with excellent reliability and strong construct and criterion validity. Many recent randomized multicentre trials have used the AOS, and the involvement of the French-speaking population is limited by the absence of a French version. Our goal was to develop a French version and validate the psychometric properties to assure equivalence to the original English version. Translation was performed according to American Association of Orthopaedic Surgeons (AAOS) 2000 guidelines for cross-cultural adaptation. Similar to the validation process of the English AOS, we evaluated the psychometric properties of the French version (AOS-Fr): criterion validity (AOS-Fr v. Western Ontario and McMaster Universities Arthritis Index [WOMAC] and SF-36 scores), construct validity (AOS-Fr correlation to single heel-lift test), and reliability (AOS-Fr test-retest). Sixty healthy individuals tested a prefinal version of the AOS-Fr for comprehension, leading to modifications and a final version that was approved by C. Saltzman, author of the AOS. We then recruited patients with ankle OA for evaluation of the AOS-Fr psychometric properties. Twenty-eight patients with ankle OA participated in the evaluation. The AOS-Fr showed strong criterion validity (AOS:WOMAC r = 0.709 and AOS:SF-36 r = -0.654) and construct validity (r = 0.664) and proved to be reliable (test-retest intraclass correlation coefficient = 0.922). The AOS-Fr is a reliable and valid score equivalent to the English version in terms of psychometric properties, thus is available for use in multicentre trials.
Ciampa, Philip J; Skinner, Shannon L; Patricio, Sérgio R; Rothman, Russell L; Vermund, Sten H; Audet, Carolyn M
2012-01-01
The relationship between HIV knowledge and HIV-related behaviors in settings like Mozambique has been limited by a lack of rigorously validated measures. A convenience sample of women seeking prenatal care at two clinics were administered an adapted, orally-administered, 27 item HIV-knowledge scale, the HK-27. Validation analyses were stratified by survey language (Portuguese and Echuabo). Kuder-Richardson (KR-20) coefficients estimated internal reliability. Construct validity was assessed with bivariate associations between HK-27 scores (% correct) and selected participant characteristics. The association between knowledge, self-reported HIV testing, and HIV infection were evaluated with multivariable logistic regression. Participants (N = 348) had a median age of 24; 188 spoke Portuguese, and 160 spoke Echuabo. Mean HK-27 scores were higher for Portuguese-speaking participants than Echuabo-speaking participants (68% correct vs. 42%, p<0.001). Internal reliability was strong (KR-20>0.8) for scales in both languages. Higher HK-27 scores were significantly (p≤0.05) correlated with more education, more media items in the home, a history of HIV testing, and participant work outside of the home for women of both languages. HK-27 scores were independently associated with completion of HIV testing in multivariable analysis (per 1% correct: aOR:1.02, 95%CI:0.01-0.03, p = 0.01), but not with HIV infection. HK-27 is a reliable and valid measure of HIV knowledge among Portuguese and Echuabo-speaking Mozambican women. The HK-27 demonstrated significant knowledge deficits among women in the study, and higher scores were associated with higher HIV testing probability. Future studies should evaluate the role of the HK-27 in longitudinal studies and in other populations.
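The abstract above estimates internal reliability with Kuder-Richardson 20, the special case of coefficient alpha for items scored right or wrong. A minimal sketch follows, using the population variance of total scores as in the usual textbook statement of KR-20; the function name and the answer matrix are hypothetical and unrelated to the HK-27 data.

```python
from statistics import pvariance


def kr20(rows):
    """KR-20 for a list of respondents, each a list of k items scored 0 or 1."""
    n, k = len(rows), len(rows[0])
    p = [sum(col) / n for col in zip(*rows)]       # proportion correct per item
    pq_sum = sum(pi * (1 - pi) for pi in p)        # sum of item variances p*q
    total_var = pvariance([sum(r) for r in rows])  # variance of total scores
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Hypothetical answers: 4 respondents, 3 items of increasing difficulty.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(answers))
```

A value above 0.8, as reported for both language versions, indicates that respondents who answer one item correctly tend to answer the others correctly as well.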
Buiza, Cristina; Navarro, Ana; Díaz-Orueta, Unai; González, Mari Feli; Alaba, Javier; Arriola, Enrique; Hernández, Carmen; Zulaica, Amaia; Yanguas, José Javier
2011-01-01
The cognitive assessment of patients with advanced dementia requires screening instruments that provide information about the cognitive state and resources these individuals still retain. The present work reports a Spanish validation study of the Severe Mini Mental State Examination (SMMSE). Forty-seven patients with advanced dementia (Mini-Cognitive Examination [MEC]<11) were evaluated with Reisberg's Global Deterioration Scale (GDS), MEC, SMMSE and Severe Cognitive Impairment Profile scales. All test items were discriminative. The test showed high internal consistency (α=0.88), test-retest reliability (0.64 to 1.00, P<.01) and inter-observer reliability (0.69-1.00, p<0.01), both for total scores and for each item separately. Construct validity was tested through correlations between the instrument and MEC scores (r=0.59, P<0.01). Further information on construct validity was obtained by dividing the sample into groups that scored above or below 5 points on the MEC and recalculating their correlations with the SMMSE. The correlation between SMMSE and MEC scores was significant in the MEC 0-5 group (r=0.55, P<.05), but not in the MEC>5 group. Additionally, differences between the three GDS groups (5, 6 and 7) were found in SMMSE scores, but not in MEC scores (H=11.1, P<.05). The SMMSE is an instrument for the assessment of advanced cognitive impairment that prevents the floor effect by extending the lower measurement range relative to that of the MEC. Our results indicate that this rapid, easy-to-administer screening tool can be considered valid and reliable. Copyright © 2010 SEGG. Published by Elsevier Espana. All rights reserved.
Rodrigues, Letícia C.; Marques, Aline P.; Barros, Paula B.; Michaelsen, Stella M.
2014-01-01
BACKGROUND: The Balance Evaluation Systems Test (BESTest) was recently created to allow the development of treatments according to the specific balance system affected in each patient. The Brazilian version of the BESTest has not been specifically tested after stroke. OBJECTIVE: To evaluate the intra- and inter-rater reliability and concurrent and convergent validity of the total score of the BESTest and BESTest sections for adults with hemiparesis after stroke. METHOD: The study included 16 subjects (61.1±7.5 years) with chronic hemiparesis (54.5±43.5 months after stroke). The BESTest was administered by two raters in the same week and one of the raters repeated the test after a one-week interval. Intraclass correlation coefficient (ICC) was calculated to assess intra- and interrater reliability. Concurrent validity with the Berg Balance Scale (BBS) and convergent validity with the Activities-specific Balance Confidence scale (ABC-Brazil) were assessed using Pearson's correlation coefficient. RESULTS: Both the BESTest total score (ICC=0.98) and the BESTest sections (ICC between 0.85 and 0.96) have excellent intrarater reliability. Interrater reliability for the total score was excellent (ICC=0.93) and, for the sections, it ranged between 0.71 and 0.94. The correlation coefficient between the BESTest and the BBS and ABC-Brazil were 0.78 and 0.59, respectively. CONCLUSIONS: The Brazilian version of the BESTest demonstrated adequate reliability when measured by sections and could identify what balance system was affected in patients after stroke. Concurrent validity was excellent with the BBS total score and good to excellent with the sections. The total scores but not the sections present adequate convergent validity with the ABC-Brazil. However, other psychometric properties should be further investigated. PMID:25003281
Dutch validation of the low anterior resection syndrome score.
Hupkens, B J P; Breukink, S O; Olde Reuver Of Briel, C; Tanis, P J; de Noo, M E; van Duijvendijk, P; van Westreenen, H L; Dekker, J W T; Chen, T Y T; Juul, T
2018-04-21
The aim of this study was to validate the Dutch translation of the low anterior resection syndrome (LARS) score in a population of Dutch rectal cancer patients. Patients who underwent surgery for rectal cancer received the LARS score questionnaire, a single quality of life (QoL) category question and the European Organization for Research and Treatment of Cancer (EORTC) QLQ-C30 questionnaire. A subgroup of patients received the LARS score twice to assess the test-retest reliability. A total of 165 patients were included in the analysis, identified in six Dutch centres. The response rate was 62.0%. The percentage of patients who reported 'major LARS' was 59.4%. There was a high proportion of patients with a perfect or moderate fit between the QoL category question and the LARS score, showing a good convergent validity. The LARS score was able to discriminate between patients with or without neoadjuvant radiotherapy (P = 0.003), between total and partial mesorectal excision (P = 0.008) and between age groups (P = 0.039). There was a statistically significant association between a higher LARS score and an impaired function on the global QoL subscale and the physical, role, emotional and social functioning subscales of the EORTC QLQ-C30 questionnaire. The test-retest reliability of the LARS score was good, with an interclass correlation coefficient of 0.79. The good psychometric properties of the Dutch version of the LARS score are comparable overall to the earlier validations in other countries. Therefore, the Dutch translation can be considered to be a valid tool for assessing LARS in Dutch rectal cancer patients. Colorectal Disease © 2018 The Association of Coloproctology of Great Britain and Ireland.
Hodge, Megan M; Gotzke, Carrie L
2014-01-01
This study evaluated construct-related validity of the Test of Children's Speech (TOCS). Intelligibility scores obtained using open-set word identification tasks (orthographic transcription) for the TOCS word and sentence tests and rate scores for the TOCS sentence test (words per minute or WPM and intelligible words per minute or IWPM) were compared for a group of 15 adults (18-30 years of age) with normal speech production and three groups of children: 48 3-6 year-olds with typical speech development and neurological histories (TDS), 48 3-6 year-olds with a speech sound disorder of unknown origin and no identified neurological impairment (SSD-UNK), and 22 3-10 year-olds with dysarthria and cerebral palsy (DYS). As expected, mean intelligibility scores and rates increased with age in the TDS group. However, word test intelligibility, WPM and IWPM scores for the 6 year-olds in the TDS group were significantly lower than those for the adults. The DYS group had significantly lower word and sentence test intelligibility and WPM and IWPM scores than the TDS and SSD-UNK groups. Compared to the TDS group, the SSD-UNK group also had significantly lower intelligibility scores for the word and sentence tests, and significantly lower IWPM, but not WPM scores on the sentence test. The results support the construct-related validity of TOCS as a tool for obtaining intelligibility and rate scores that are sensitive to group differences in 3-6 year-old children, with and without speech sound disorders, and to 3+ year-old children with speech disorders, with and without dysarthria. Readers will describe the word and sentence intelligibility and speaking rate performance of children with typically developing speech at age levels of 3, 4, 5 and 6 years, as measured by the Test of Children's Speech, and how these compare with adult speakers and two groups of children with speech disorders. 
They will also recognize what measures on this test differentiate children with speech sound disorders of unknown origin from children with cerebral palsy and dysarthria. Copyright © 2014 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Huesman, Ronald L., Jr.; Frisbie, David A.
This study investigated the effect of extended-time limits in terms of performance levels and score comparability for reading comprehension scores on the Iowa Tests of Basic Skills (ITBS). The first part of the study compared the average reading comprehension scores on the ITBS of 61 sixth-graders with learning disabilities and 397 non learning…
Embedded measures of performance validity using verbal fluency tests in a clinical sample.
Sugarman, Michael A; Axelrod, Bradley N
2015-01-01
The objective of this study was to determine to what extent verbal fluency measures can be used as performance validity indicators during neuropsychological evaluation. Participants were clinically referred for neuropsychological evaluation in an urban-based Veterans Affairs hospital. Participants were placed into 2 groups based on their objectively evaluated effort on performance validity tests (PVTs). Individuals who exhibited credible performance (n = 431) failed 0 PVTs, and those with poor effort (n = 192) failed 2 or more PVTs. All participants completed the Controlled Oral Word Association Test (COWAT) and Animals verbal fluency measures. We evaluated how well verbal fluency scores could discriminate between the 2 groups. Raw scores and T scores for Animals discriminated between the credible performance and poor-effort groups with 90% specificity and greater than 40% sensitivity. COWAT scores had lower sensitivity for detecting poor effort. A combination of FAS and Animals scores into logistic regression models yielded acceptable group classification, with 90% specificity and greater than 44% sensitivity. Verbal fluency measures can yield adequate detection of poor effort during neuropsychological evaluation. We provide suggested cut points and logistic regression models for predicting the probability of poor effort in our clinical setting and offer suggested cutoff scores to optimize sensitivity and specificity.
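As an illustration of how a cut point trades sensitivity against specificity, the following minimal sketch classifies a score as "poor effort" when it falls at or below a cutoff. The function name, the direction of the cutoff, and the scores are assumptions made for the example; they are not the cut points or data reported in the study.

```python
def sensitivity_specificity(poor_effort_scores, credible_scores, cutoff):
    """Return (sensitivity, specificity) for the rule 'score <= cutoff means poor effort'.

    Assumption for this sketch: lower fluency scores indicate poorer effort.
    """
    true_pos = sum(1 for s in poor_effort_scores if s <= cutoff)
    true_neg = sum(1 for s in credible_scores if s > cutoff)
    sensitivity = true_pos / len(poor_effort_scores)
    specificity = true_neg / len(credible_scores)
    return sensitivity, specificity


# Hypothetical scores, not data from the study.
poor = [8, 10, 12, 15, 18]
credible = [11, 14, 16, 17, 19, 20, 21, 22, 23, 25]
sens, spec = sensitivity_specificity(poor, credible, cutoff=12)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff in this sketch flags more poor-effort cases (higher sensitivity) at the cost of misclassifying more credible performers (lower specificity).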
Gilet, Hélène; Arnould, Benoit; Fofana, Fatoumata; Clerson, Pierre; Colombel, Jean-Frédéric; D'Hondt, Olivier; Faure, Patrick; Hagège, Hervé; Nachury, Maria; Nahon, Stéphane; Tucat, Gilbert; Vandromme, Luc; Cazala-Telinge, Ines; Thibout, Emmanuel
2014-01-01
Severe Crohn's disease management includes anti-tumor necrosis factor (anti-TNF) drugs that differ from early-stage treatments regarding efficacy, safety, and convenience. This study aimed to finalize and psychometrically validate the Satisfaction for PAtients in Crohn's diseasE Questionnaire (SPACE-Q(©)), developed to measure satisfaction with anti-TNF treatment in patients with severe Crohn's disease. A total of 279 patients with severe Crohn's disease receiving anti-TNF therapy completed the SPACE-Q 62-item pilot version at inclusion and 12 and 13 weeks after first anti-TNF injection. The final SPACE-Q scoring was defined using multitrait and regression analyses and clinical relevance considerations. Psychometric validation included clinical validity against Harvey-Bradshaw score, concurrent validity against Treatment Satisfaction Questionnaire for Medication (TSQM), internal consistency reliability, test-retest reliability, and responsiveness against the patient global impression of change (PGIC). Quality of completion was good (55%-67% of patients completed all items). Four items were removed from the questionnaire. Eleven scores were defined within the final 58-item SPACE-Q: disease control; symptoms, anal symptoms, and quality of life transition scales; tolerability; convenience; expectation confirmation toward efficacy, side effects, and convenience; satisfaction with treatment; and motivation. Scores met standards for concurrent validity (correlation between SPACE-Q satisfaction with treatment and TSQM satisfaction scores =0.59), internal consistency reliability (Cronbach's α=0.67-0.93), test-retest reliability (intraclass correlations =0.62-0.91), and responsiveness (improvement in treatment experience assessed by the SPACE-Q for patients reporting improvement on the PGIC). Significantly different mean scores were observed between groups of patients with different Harvey-Bradshaw disease severity scores. 
The SPACE-Q is a valid, reliable, and responsive instrument to measure satisfaction with anti-TNF treatment in patients with severe Crohn's disease and for use in future studies.
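Internal consistency figures such as the Cronbach's α range above follow from the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). A minimal sketch, assuming complete responses and using hypothetical data (the function name and values are illustrative, not taken from the SPACE-Q study):

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists, one per item, each holding the scores of the
    same respondents in the same order (no missing values assumed).
    """
    k = len(items)
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical responses of four respondents to three items.
example_items = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]
alpha = cronbach_alpha(example_items)
print(round(alpha, 3))
```

Sample variances (n − 1 denominator) are used for both the items and the totals; using population variances throughout gives the same α because the factor cancels.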
Yang, J C; Noble, J
1990-01-01
This study investigated the validity of three American College Testing-Proficiency Examination Program (ACT-PEP) tests (Maternal and Child Nursing, Psychiatric/Mental Health Nursing, Adult Nursing) for predicting the academic performance of registered nurses (RNs) enrolled in bachelor's degree BSN programs nationwide. This study also examined RN students' performance on the ACT-PEP tests by their demographic characteristics: student's age, sex, race, student status (full- or part-time), and employment status (full- or part-time). The total sample for the three tests comprised 2,600 students from eight institutions nationwide. The median correlation coefficients between the three ACT-PEP tests and the semester grade point averages ranged from .36 to .56. Median correlation coefficients increased over time, supporting the stability of ACT-PEP test scores for predicting academic performance over time. The relative importance of selected independent variables for predicting academic performance was also examined; the most important variable for predicting academic performance was typically the ACT-PEP test score. Across the institutions, student demographic characteristics did not contribute significantly to explaining academic performance, over and above ACT-PEP scores.
Hsiao, Pei-Chi; Yu, Wan-Hui; Lee, Shih-Chieh; Chen, Mei-Hsiang; Hsieh, Ching-Lin
2018-06-14
The responsiveness and predictive validity of the Tablet-based Symbol Digit Modalities Test (T-SDMT) are unknown, which limits the utility of the T-SDMT in both clinical and research settings. The purpose of this study was to examine the responsiveness and predictive validity of the T-SDMT in inpatients with stroke. A follow-up, repeated-assessments design. One rehabilitation unit at a local medical center. A total of 50 inpatients receiving rehabilitation completed T-SDMT assessments at admission to and discharge from a rehabilitation ward. The median follow-up period was 14 days. The Barthel index (BI) was assessed at discharge and was used as the criterion of the predictive validity. The mean changes in the T-SDMT scores between admission and discharge were statistically significant (paired t-test = 3.46, p = 0.001). The T-SDMT scores showed a nearly moderate standardized response mean (0.49). A moderate association (Pearson's r = 0.47) was found between the scores of the T-SDMT at admission and those of the BI at discharge, indicating good predictive validity of the T-SDMT. Our results support the responsiveness and predictive validity of the T-SDMT in patients with stroke receiving rehabilitation in hospitals. This study provides empirical evidence supporting the use of the T-SDMT as an outcome measure for assessing processing speed in inpatients with stroke. The scores of the T-SDMT could be used to predict basic activities of daily living function in inpatients with stroke.
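The standardized response mean (SRM) is the mean change divided by the standard deviation of the change scores, and for a paired design it equals the paired t statistic divided by √n. The sketch below, with hypothetical function names and hypothetical raw scores, shows both routes; applying the second to the reported t = 3.46 and n = 50 reproduces the reported SRM of 0.49 after rounding.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_response_mean(before, after):
    """SRM = mean(change) / SD(change), computed from paired raw scores."""
    changes = [a - b for a, b in zip(after, before)]
    return mean(changes) / stdev(changes)


def srm_from_paired_t(t_stat, n):
    """SRM recovered from a paired t statistic, since t = SRM * sqrt(n)."""
    return t_stat / sqrt(n)


# Figures quoted in the abstract above: paired t = 3.46, n = 50.
srm_reported = round(srm_from_paired_t(3.46, 50), 2)
print(srm_reported)

# Hypothetical admission and discharge scores, for illustration only.
srm_example = standardized_response_mean([10, 12, 14, 16], [12, 13, 17, 18])
```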
Why Lessons Learned from the Past Require Haertel's Expanded Scope for Test Validation
ERIC Educational Resources Information Center
Shepard, Lorrie A.
2013-01-01
In his article, Haertel (this issue) asks a fundamental question about how use of a test is expected to cause improvements in the educational system and in learning. He also considers how test validity should be investigated and argues for a more expansive view of validity that does not stop with scoring or generalization (the more technical and…
Reddy, Linda A; Fabiano, Gregory A; Dudek, Christopher M; Hsu, Louis
2013-12-01
The present study examined the validity of a teacher observation measure, the Classroom Strategies Scale--Observer Form (CSS), as a predictor of student performance on statewide tests of mathematics and English language arts. The CSS is a teacher practice observational measure that assesses evidence-based instructional and behavioral management practices in elementary school. A series of two-level hierarchical generalized linear models were fitted to data of a sample of 662 third- through fifth-grade students to assess whether CSS Part 2 Instructional Strategy and Behavioral Management Strategy scale discrepancy scores (i.e., ∑ |recommended frequency − frequency ratings|) predicted statewide mathematics and English language arts proficiency scores when percentage of minority students in schools was controlled. Results indicated that the Instructional Strategy scale discrepancy scores significantly predicted mathematics and English language arts proficiency scores: Relatively larger discrepancies on observer ratings of what teachers did versus what should have been done were associated with lower proficiency scores. Results offer initial evidence of the predictive validity of the CSS Part 2 Instructional Strategy discrepancy scores on student academic outcomes. PsycINFO Database Record (c) 2013 APA, all rights reserved.
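The discrepancy score described above is a sum of absolute differences between recommended and observed frequencies. A minimal sketch, assuming one recommended and one observed rating per strategy item; the function name and the ratings are hypothetical, not CSS data:

```python
def discrepancy_score(recommended, observed):
    """Sum over items of |recommended frequency - observed frequency rating|."""
    if len(recommended) != len(observed):
        raise ValueError("recommended and observed must have the same length")
    return sum(abs(r - o) for r, o in zip(recommended, observed))


# Hypothetical ratings for three strategy items.
score = discrepancy_score([5, 4, 6], [3, 4, 7])
print(score)  # 2 + 0 + 1 = 3
```

Because absolute values are summed, doing a strategy more often or less often than recommended both raise the score; a score of 0 means every rating matched its recommendation.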
AZARI, Nadia; SOLEIMANI, Farin; VAMEGHI, Roshanak; SAJEDI, Firoozeh; SHAHSHAHANI, Soheila; KARIMI, Hossein; KRASKIAN, Adis; SHAHROKHI, Amin; TEYMOURI, Robab; GHARIB, Masoud
2017-01-01
Objective Bayley Scales of infant & toddler development is a well-known diagnostic developmental assessment tool for children aged 1–42 months. Our aim was investigating the validity & reliability of this scale in Persian speaking children. Materials & Methods The method was descriptive-analytic. Translation- back translation and cultural adaptation was done. Content & face validity of translated scale was determined by experts’ opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers of Tehran, during years of 2013-2014 for developmental assessment in cognitive, communicative (receptive & expressive) and motor (fine & gross) domains. Reliability of scale was calculated through three methods; internal consistency using Cronbach’s alpha coefficient, test-retest and interrater methods. Construct validity was calculated using factor analysis and comparison of the mean scores methods. Results Cultural and linguistic changes were made in items of all domains especially on communication subscale. Content and face validity of the test were approved by experts’ opinions. Cronbach’s alpha coefficient was above 0.74 in all domains. Pearson correlation coefficient in various domains, were ≥ 0.982 in test retest method, and ≥0.993 in inter-rater method. Construct validity of the test was approved by factor analysis. Moreover, the mean scores for the different age groups were compared and statistically significant differences were observed between mean scores of different age groups, that confirms validity of the test. Conclusion The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian language children. PMID:28277556
Is the Simple Shoulder Test a valid outcome instrument for shoulder arthroplasty?
Hsu, Jason E; Russ, Stacy M; Somerson, Jeremy S; Tang, Anna; Warme, Winston J; Matsen, Frederick A
2017-10-01
The Simple Shoulder Test (SST) is a brief, inexpensive, and widely used patient-reported outcome tool, but it has not been rigorously evaluated for patients having shoulder arthroplasty. The goal of this study was to rigorously evaluate the validity of the SST for outcome assessment in shoulder arthroplasty using a systematic review of the literature and an analysis of its properties in a series of 408 surgical cases. SST scores, 36-Item Short Form Health Survey scores, and satisfaction scores were collected preoperatively and 2 years postoperatively. Responsiveness was assessed by comparing preoperative and 2-year postoperative scores. Criterion validity was determined by correlating the SST with the 36-Item Short Form Health Survey. Construct validity was tested through 5 clinical hypotheses regarding satisfaction, comorbidities, insurance status, previous failed surgery, and narcotic use. Scores after arthroplasty improved from 3.9 ± 2.8 to 10.2 ± 2.3 (P < .001). The change in SST correlated strongly with patient satisfaction (P < .001). The SST had large Cohen's d effect sizes and standardized response means. Construct validity was supported by significant differences between satisfied and unsatisfied patients, those with more severe and less severe comorbidities, those with workers' compensation or Medicaid and other types of insurance, those with and without previous failed shoulder surgery, and those taking and those not taking narcotic pain medication before surgery (P < .005). These data combined with a systematic review of the literature demonstrate that the SST is a valid and responsive patient-reported outcome measure for assessing the outcomes of shoulder arthroplasty. Copyright © 2017 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
Stroke-Associated Pneumonia Risk Score: Validity in a French Stroke Unit.
Cugy, Emmanuelle; Sibon, Igor
2017-01-01
Stroke-associated pneumonia is a leading cause of in-hospital death and poor post-stroke outcome. Screening patients at high risk is one of the main challenges in acute stroke units. Several screening tests have been developed, but their feasibility and validity still remain unclear. The aim of our study was to evaluate the validity of four risk scores (Pneumonia score, A2DS2, ISAN score, and AIS-APS) in a population of ischemic stroke patients admitted in a French stroke unit. Consecutive ischemic stroke patients admitted to a stroke unit were retrospectively analyzed. Data allowing retrospective calculation of the different pneumonia risk scores were recorded. Sensitivity and specificity of each score were assessed for in-hospital stroke-associated pneumonia and mortality. The qualitative and quantitative accuracy and utility of each diagnostic screening test were assessed by measuring the Youden Index and the Clinical Utility Index. Complete data were available for only 1960 patients. Pneumonia was observed in 8.6% of patients. Sensitivity and specificity were, respectively, .583 and .907 for Pneumonia score, .744 and .796 for A2DS2, and .696 and .812 for ISAN score. Data were insufficient to test AIS-APS. Stroke-associated pneumonia risk scores had an excellent negative Clinical Utility Index (.77-.87) to screen for in-hospital risk of pneumonia after acute ischemic stroke. All scores might be useful and applied to screen stroke-associated pneumonia in stroke patients treated in French comprehensive stroke units. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.
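The Youden Index is defined as J = sensitivity + specificity − 1. The abstract does not report the J values themselves, so the sketch below only derives them from the rounded sensitivity and specificity figures quoted above; the function and variable names are illustrative, and the results may differ slightly from values computed on the unrounded study data.

```python
def youden_index(sensitivity, specificity):
    """Youden's J: 0 means no better than chance, 1 means perfect separation."""
    return sensitivity + specificity - 1


# (sensitivity, specificity) pairs as quoted in the abstract.
reported = {
    "Pneumonia score": (0.583, 0.907),
    "A2DS2": (0.744, 0.796),
    "ISAN": (0.696, 0.812),
}

youden = {name: round(youden_index(se, sp), 3) for name, (se, sp) in reported.items()}
for name, j in youden.items():
    print(f"{name}: J = {j:.3f}")
```

On these rounded inputs the derived indices are 0.490, 0.540, and 0.508 respectively.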
Reliability and Validity of the Greek Migraine Disability Assessment (MIDAS) Questionnaire.
Oikonomidi, Theodora; Vikelis, Michail; Artemiadis, Artemios; Chrousos, George P; Darviri, Christina
2018-03-01
The Migraine Disability Assessment (MIDAS) Questionnaire is a reliable and valid instrument for migraine-related disability. Such a tool is needed to quantify migraine-related disability in the Greek population. This validation study aims to assess the test-retest reliability, internal consistency, item discriminant and convergent validity of the Greek translation of the MIDAS. Adults diagnosed with migraine completed the MIDAS Questionnaire on two occasions 3 weeks apart to assess reliability, and completed the RAND-36 to assess validity. Participants (n = 152) had a median MIDAS score of 24 and mostly severe disability (58% were grade IV). The test-retest reliability analysis (N = 59) revealed excellent reliability for the total score. Internal consistency was α = 0.71 for initial and α = 0.82 for retest completion. For item discriminant validity, the correlations between each question and the total score were significant, with high correlations for questions 2-5 (range 0.67 ≤ r ≤ 0.79; p < 0.01). For convergent validity, there was significant negative correlation between the total score and all RAND-36 subscales except for 'emotional wellbeing'. The negative correlation indicates that patients with a lower degree of disability according to their MIDAS score tended to have better wellbeing. Psychometric properties are comparable with those of other published validation studies of the MIDAS and the original. Findings on question 1 show that missing work/school days may be closely related with increased affect issues. The Greek version of the MIDAS Questionnaire has good reliability and validity. This study allowed for cross-cultural comparability of research findings.
Vieira, Gisele de Lacerda Chaves; Pagano, Adriana Silvino; Reis, Ilka Afonso; Rodrigues, Júlia Santos Nunes; Torres, Heloísa de Carvalho
2018-01-01
ABSTRACT Objective: to perform the translation, adaptation and validation of the Diabetes Attitudes Scale - third version instrument into Brazilian Portuguese. Methods: methodological study carried out in six stages: initial translation, synthesis of the initial translation, back-translation, evaluation of the translated version by the Committee of Judges (27 Linguists and 29 health professionals), pre-test and validation. The pre-test and validation (test-retest) steps included 22 and 120 health professionals, respectively. The Content Validity Index, the analyses of internal consistency and reproducibility were performed using the R statistical program. Results: in the content validation, the instrument presented good acceptance among the Judges with a mean Content Validity Index of 0.94. The scale presented acceptable internal consistency (Cronbach’s alpha = 0.60), while the correlation of the total score at the test and retest moments was considered high (Polychoric Correlation Coefficient = 0.86). The Intra-class Correlation Coefficient, for the total score, presented a value of 0.65. Conclusion: the Brazilian version of the instrument (Escala de Atitudes dos Profissionais em relação ao Diabetes Mellitus) was considered valid and reliable for application by health professionals in Brazil. PMID:29319739
ERIC Educational Resources Information Center
Han, Chao
2016-01-01
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
How Should Colleges Treat Multiple Admissions Test Scores? ACT Working Paper 2017-4
ERIC Educational Resources Information Center
Mattern, Krista; Radunzel, Justine; Bertling, Maria; Ho, Andrew
2017-01-01
The percentage of students retaking college admissions tests is rising (Harmston & Crouse, 2016). Researchers and college admissions offices currently use a variety of methods for summarizing these multiple scores. Testing companies, interested in validity evidence like correlations with college first-year grade-point averages (FYGPA), often…
The Air Force Officer Qualifying Test: Validity, Fairness, and Bias
2010-01-01
scores. The Standards for Educational and Psychological Testing (AERA, APA, and NCME, 1999) provides a set of guidelines published and endorsed by the...determining the validity and bias of selection tests falls upon professionals in the discipline of industrial/organizational psychology 20 See Roper v. Dep’t...i). 30 The Air Force Officer Qualifying Test : Validity, Fairness, and Bias and closely related fields (e.g., educational psychology and
Effort, symptom validity testing, performance validity testing and traumatic brain injury.
Bigler, Erin D
2014-01-01
To understand the neurocognitive effects of brain injury, valid neuropsychological test findings are paramount. This review examines the research on what has been referred to as symptom validity testing (SVT). Above a designated cut-score signifies a 'passing' SVT performance which is likely the best indicator of valid neuropsychological test findings. Likewise, substantially below cut-point performance that nears chance or is at chance signifies invalid test performance. Significantly below chance is the sine qua non neuropsychological indicator for malingering. However, the interpretative problems with SVT performance below the cut-point yet far above chance are substantial, as pointed out in this review. This intermediate, border-zone performance on SVT measures is where substantial interpretative challenges exist. Case studies are used to highlight the many areas where additional research is needed. Historical perspectives are reviewed along with the neurobiology of effort. Reasons why performance validity testing (PVT) may be better than the SVT term are reviewed. Advances in neuroimaging techniques may be key in better understanding the meaning of border zone SVT failure. The review demonstrates the problems with rigidity in interpretation with established cut-scores. A better understanding of how certain types of neurological, neuropsychiatric and/or even test conditions may affect SVT performance is needed.
Timed activity performance in persons with upper limb amputation: A preliminary study.
Resnik, Linda; Borgia, Mathew; Acluche, Frantzy
55 subjects with upper limb amputation were administered the T-MAP twice within one week. To develop a timed measure of activity performance for persons with upper limb amputation (T-MAP); examine the measure's internal consistency, test-retest reliability and validity; and compare scores by prosthesis use. Measures of activity performance for persons with upper limb amputation are needed. The time required to perform daily activities is a meaningful metric that has implications for participation in life roles. Internal consistency and test-retest reliability were evaluated. Construct validity was examined by comparing scores by amputation level. Exploratory analyses compared sub-group scores, and examined correlations with other measures. Scale alpha was 0.77, ICC was 0.93. Timed scores differed by amputation level. Subjects using a prosthesis took longer to perform all tasks. T-MAP was not correlated with other measures of dexterity or activity, but was correlated with pain for non-prosthesis users. The timed scale had adequate internal consistency and excellent test-retest reliability. Analyses support reliability and construct validity of the T-MAP. 2c "outcomes" research. Published by Elsevier Inc.
Goos, Matthias; Schubach, Fabian; Seifert, Gabriel; Boeker, Martin
2016-08-17
Health professionals often manage medical problems in critical situations under time pressure and on the basis of vague information. In recent years, dual process theory has provided a framework of cognitive processes to assist students in developing clinical reasoning skills critical especially in surgery due to the high workload and the elevated stress levels. However, clinical reasoning skills can be observed only indirectly and the corresponding constructs are difficult to measure in order to assess student performance. The script concordance test has been established in this field. A number of studies suggest that the test delivers a valid assessment of clinical reasoning. However, different scoring methods have been suggested. They reflect different interpretations of the underlying construct. In this work we want to shed light on the theoretical framework of script theory and give an idea of script concordance testing. We constructed a script concordance test in the clinical context of "acute abdomen" and compared previously proposed scores with regard to their validity. A test comprising 52 items in 18 clinical scenarios was developed, revised along the guidelines and administered to 56 4th- and 5th-year medical students at the end of a blended-learning seminar. We scored the answers using five different scoring methods (distance (2×), aggregate (2×), single best answer) and compared the scoring keys, the resulting final scores and Cronbach's α after normalization of the raw scores. All scores except the single best answers calculation achieved acceptable reliability scores (>= 0.75), as measured by Cronbach's α. Students were clearly distinguishable from the experts, whose results were set to a mean of 80 and SD of 5 by the normalization process. 
With the two aggregate scoring methods, the students' mean values were between 62.5 (AGGPEN) and 63.9 (AGG), equivalent to about three expert SD below the experts' mean value (Cronbach's α: 0.76 (AGGPEN) and 0.75 (AGG)). With the two distance scoring methods the students' mean was between 62.8 (DMODE) and 66.8 (DMEAN), equivalent to about two expert SD below the experts' mean value (Cronbach's α: 0.77 (DMODE) and 0.79 (DMEAN)). In this study the single best answer (SBA) scoring key yielded the worst psychometric results (Cronbach's α: 0.68). Assuming the psychometric properties of the script concordance test scores are valid, then clinical reasoning skills can be measured reliably with different scoring keys in the SCT presented here. Psychometrically, the distance methods seem to be superior, wherein inherent statistical properties of the scales might play a significant role. For methodological reasons, the aggregate methods can also be used. Despite the limitations and complexity of the underlying scoring process and the calculation of reliability, we advocate for SCT because it allows a new perspective on the measurement and teaching of cognitive skills.
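The normalization mentioned above, in which expert results are set to a mean of 80 and an SD of 5, can be sketched as a linear rescaling against the expert panel's raw scores. The abstract does not give the exact procedure, so the function names, the use of the sample SD, and the raw scores below are assumptions for illustration only.

```python
from statistics import mean, stdev


def normalize_to_experts(raw_scores, expert_raw_scores, target_mean=80.0, target_sd=5.0):
    """Rescale raw scores so the expert panel has the target mean and SD.

    Assumption: the sample SD (n - 1 denominator) of the experts is used.
    """
    m = mean(expert_raw_scores)
    s = stdev(expert_raw_scores)
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]


def expert_sds_below(student_mean, expert_mean=80.0, expert_sd=5.0):
    """Distance of a normalized student mean below the expert mean, in expert SDs."""
    return (expert_mean - student_mean) / expert_sd


# Hypothetical raw scores: experts average 12 with SD 2.
normalized = normalize_to_experts([12, 8], expert_raw_scores=[10, 12, 14])
print(normalized)               # a raw 12 maps to 80.0, a raw 8 maps to 70.0
print(expert_sds_below(62.5))   # 3.5 expert SDs below the expert mean
```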
Measuring cervical cancer risk: development and validation of the CARE Risky Sexual Behavior Index.
Reiter, Paul L; Katz, Mira L; Ferketich, Amy K; Ruffin, Mack T; Paskett, Electra D
2009-12-01
To develop and validate a risky sexual behavior index specific to cervical cancer research. Sexual behavior data on 428 women from the Community Awareness Resources and Education (CARE) study were utilized. A weighting scheme for eight risky sexual behaviors was generated and validated in creating the CARE Risky Sexual Behavior Index. Cutpoints were then identified to classify women as having a low, medium, or high level of risky sexual behavior. Index scores ranged from 0 to 35, with women considered to have a low level of risky sexual behavior if their score was less than six (31.3% of sample), a medium level if their score was 6–10 (30.6%), or a high level if their score was 11 or greater (38.1%). A strong association was observed between the created categories and having a previous abnormal Pap smear test (p < 0.001). The CARE Risky Sexual Behavior Index provides a tool for measuring risky sexual behavior level for cervical cancer research. Future studies are needed to validate this index in varied populations and test its use in the clinical setting.
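The cutpoints described above translate directly into a three-way classification. A minimal sketch, assuming integer index scores in the reported 0 to 35 range; the function name is hypothetical, and the weighting scheme that produces the score is not given in the abstract and is not reproduced here.

```python
def care_risk_level(score):
    """Classify a CARE Risky Sexual Behavior Index score using the reported cutpoints.

    low: score < 6, medium: 6 to 10, high: 11 or greater.
    """
    if not 0 <= score <= 35:
        raise ValueError("index scores are reported to range from 0 to 35")
    if score < 6:
        return "low"
    if score <= 10:
        return "medium"
    return "high"


for s in (5, 6, 10, 11):
    print(s, care_risk_level(s))
```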
Testing Reading Comprehension of Theoretical Discourse with Cloze.
ERIC Educational Resources Information Center
Greene, Benjamin B., Jr.
2001-01-01
Presents evidence from a large sample of reading test scores for the validity of cloze-based assessments of reading comprehension for the discourse typically encountered in introductory college economics textbooks. Notes that results provide strong evidence that appropriately designed cloze tests permit valid assessments of reading comprehension…
Reliability and validity analysis of the open-source Chinese Foot and Ankle Outcome Score (FAOS).
Ling, Samuel K K; Chan, Vincent; Ho, Karen; Ling, Fona; Lui, T H
2017-12-21
Develop the first reliable and validated open-source outcome scoring system in the Chinese language for foot and ankle problems. Translation of the English FAOS into Chinese following regular protocols. First, two forward-translations were created separately, these were then combined into a preliminary version by an expert committee, and was subsequently back-translated into English. The process was repeated until the original and back translations were congruent. This version was then field tested on actual patients who provided feedback for modification. The final Chinese FAOS version was then tested for reliability and validity. Reliability analysis was performed on 20 subjects while validity analysis was performed on 50 subjects. Tools used to validate the Chinese FAOS were the SF36 and Pain Numeric Rating Scale (NRS). Internal consistency between the FAOS subgroups was measured using Cronbach's alpha. Spearman's correlation was calculated between each subgroup in the FAOS, SF36 and NRS. The Chinese FAOS passed both reliability and validity testing; meaning it is reliable, internally consistent and correlates positively with the SF36 and the NRS. The Chinese FAOS is a free, open-source scoring system that can be used to provide a relatively standardised outcome measure for foot and ankle studies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Validity and reliability of Nintendo Wii Fit balance scores.
Wikstrom, Erik A
2012-01-01
Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Descriptive laboratory study. Sports medicine research laboratory. Forty-five recreationally active participants (age = 27.0 ± 9.8 years, height = 170.9 ± 9.2 cm, mass = 72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Participants completed a single-limb-stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r < 0.50). Intrasession reliability for Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC] = 0.80) to poor (ICC = 0.39), with 8 activities having poor intrasession reliability. Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with scores ranging from fair (ICC = 0.74) to poor (ICC = 0.29). 
Wii Fit balance activity scores had poor concurrent validity relative to COP outcomes and SEBT reach distances. In addition, the included Wii Fit balance activity scores generally had poor intrasession and intersession reliability.
Carter, Amanda G; Creedy, Debra K; Sidebotham, Mary
2017-11-01
develop and test a tool designed for use by academics to evaluate pre-registration midwifery students' critical thinking skills in reflective writing. a descriptive cohort design was used. a random sample (n = 100) of archived student reflective writings based on a clinical event or experience during 2014 and 2015. a staged model for tool development was used to develop a fifteen item scale involving item generation; mapping of draft items to critical thinking concepts and expert review to test content validity; inter-rater reliability testing; pilot testing of the tool on 100 reflective writings; and psychometric testing. Item scores were analysed for mean, range and standard deviation. Internal reliability, content and construct validity were assessed. expert review of the tool revealed a high content validity index score of 0.98. Using two independent raters to establish inter-rater reliability, good absolute agreement of 72% was achieved with a Kappa coefficient K = 0.43 (p<0.0001). Construct validity via exploratory factor analysis revealed three factors: analyses context, reasoned inquiry, and self-evaluation. The mean total score for the tool was 50.48 (SD = 12.86). Total and subscale scores correlated significantly. The scale achieved good internal reliability with a Cronbach's alpha coefficient of .93. this study established the reliability and validity of the CACTiM (reflection) for use by academics to evaluate midwifery students' critical thinking in reflective writing. Validation with large diverse samples is warranted. reflective practice is a key learning and teaching strategy in undergraduate Bachelor of Midwifery programmes and essential for safe, competent practice. There is the potential to enhance critical thinking development by assessing reflective writing with the CACTiM (reflection) tool to provide formative and summative feedback to students and inform teaching strategies. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
The Epidemiology of Modern Test Score Use: Anticipating Aggregation, Adjustment, and Equating
ERIC Educational Resources Information Center
Ho, Andrew
2013-01-01
In his thoughtful focus article, Haertel (this issue) pushes testing experts to broaden the scope of their validation efforts and to invite scholars from other disciplines to join them. He credits existing validation frameworks for helping the measurement community to identify incomplete or nonexistent validity arguments. However, he notes his…
Urdu version of the neck disability index: a reliability and validity study.
Farooq, Muhammad Nazim; Mohseni-Bandpei, Mohammad A; Gilani, Syed Amir; Hafeez, Ambreen
2017-04-08
Despite the wide use of the neck disability index (NDI) for assessing disability in patients with neck pain, the NDI has not yet been translated and validated in Urdu. The first purpose of the present study was to translate and cross-culturally adapt the NDI into the Urdu language (NDI-U). The second purpose was to investigate the reliability, validity and responsiveness of the NDI-U in Urdu-speaking patients experiencing chronic mechanical neck pain (CMNP). Translation and cross-cultural adaptation of the original version of the NDI were carried out using previously described procedures. Seventy-six patients with CMNP and thirty healthy participants were recruited for the study. NDI-U and visual analogue scales for pain intensity (VAS pain) and disability (VAS disability) were administered to all the participants at baseline and to the patients 3 weeks after receiving physiotherapy intervention. The global rating of change scale (GROC) was also administered at this time. Test-retest reliability and internal consistency were assessed in forty-six randomly selected patients two days after they completed the NDI-U. The NDI-U was evaluated for factor analysis, content validity, construct validity (discriminative and convergent validity) and responsiveness. An intra-class correlation coefficient (ICC 2,1) revealed excellent test-retest reliability for all items (ICC 2,1 = 0.86-0.98) and total scores (ICC 2,1 = 0.99) of the NDI-U. The NDI-U was found internally consistent with a Cronbach's alpha of 0.90 and a fair to good correlation between single items and the NDI-U total scores (r = 0.34 to 0.89). Factor analysis of the NDI-U produced two factors explaining 66.71% of the variance. Content validity was good, as no floor or ceiling effects were detected for the NDI-U total score. To determine discriminative validity, an independent t-test revealed a significant difference in the NDI-U total scores between the patients and healthy controls (P < 0.001). 
For convergent validity, Pearson's correlation coefficient showed a strong correlation between NDI-U and VAS disability (r = 0.83, P < 0.001) and a moderate correlation between NDI-U and VAS pain (r = 0.62, P < 0.001). To measure responsiveness, an independent t-test showed a significant difference in the NDI-U change scores between the stable and the improved groups (P < 0.001). Furthermore, moderate correlations were found between the NDI-U change scores and the GROC (r = 0.50, P < 0.001), VAS disability change scores (r = 0.58, P < 0.001) and VAS pain change scores (r = 0.55, P < 0.001). The results showed that the NDI-U is a reliable, valid and responsive questionnaire to measure disability in Urdu-speaking patients with CMNP.
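Several abstracts in this listing, including the NDI-U study above, report internal consistency as Cronbach's alpha. As an illustration only, the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), can be sketched as below. The function name and the small response matrix are hypothetical; they are not data from any study listed here.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Variance of each item across respondents.
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of each respondent's total score.
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 5 respondents answering 3 items.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]

print(cronbach_alpha(scores))
```

Population variance is used for both the items and the totals; using sample variance for both would give the same alpha, because the correction factor cancels in the ratio.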
Test-Retest Reliability and Predictive Validity of the Implicit Association Test in Children
ERIC Educational Resources Information Center
Rae, James R.; Olson, Kristina R.
2018-01-01
The Implicit Association Test (IAT) is increasingly used in developmental research despite minimal evidence of whether children's IAT scores are reliable across time or predictive of behavior. When test-retest reliability and predictive validity have been assessed, the results have been mixed, and because these studies have differed on many…
The Validity and Clinical Uses of the Pepper Visual Skills for Reading Test.
ERIC Educational Resources Information Center
Watson, G.; And Others
1990-01-01
The Pepper Visual Skills for Reading Test was assessed as a measure of reading ability with meaningful text in 38 adults with macular degeneration; scores were compared with assessment made using the Gray Oral Reading Test, a previously standardized assessment. The test's validity was confirmed. (Author/JDD)
A Longitudinal Study of the Predictive Validity of a Kindergarten Screening Battery.
ERIC Educational Resources Information Center
Kilgallon, Mary K.; Mueller, Richard J.
Test validity was studied in nine subtests of a kindergarten screening battery used to predict reading comprehension for children up to five years after entering kindergarten. The independent variables were kindergarteners' scores on the: (1) Otis-Lennon Mental Ability Test; (2) Bender Visual Motor Gestalt Test; (3) Detroit Tests of Learning…
Cross-Validation of the Computerized Adaptive Screening Test (CAST).
ERIC Educational Resources Information Center
Pliske, Rebecca M.; And Others
The Computerized Adaptive Screening Test (CAST) was developed to provide an estimate at recruiting stations of prospects' Armed Forces Qualification Test (AFQT) scores. The CAST was designed to replace the paper-and-pencil Enlistment Screening Test (EST). The initial validation study of CAST indicated that CAST predicts AFQT at least as accurately…
Ang, Rebecca P; Chong, Wan Har; Huan, Vivien S; Yeo, Lay See
2007-01-01
This article reports the development and initial validation of scores obtained from the Adolescent Concerns Measure (ACM), a scale which assesses concerns of Asian adolescent students. In Study 1, findings from exploratory factor analysis using 619 adolescents suggested a 24-item scale with four correlated factors--Family Concerns (9 items), Peer Concerns (5 items), Personal Concerns (6 items), and School Concerns (4 items). Initial estimates of convergent validity for ACM scores were also reported. The four-factor structure of ACM scores derived from Study 1 was confirmed via confirmatory factor analysis in Study 2 using a two-fold cross-validation procedure with a separate sample of 811 adolescents. Support was found for both the multidimensional and hierarchical models of adolescent concerns using the ACM. Internal consistency and test-retest reliability estimates were adequate for research purposes. ACM scores show promise as a reliable and potentially valid measure of Asian adolescents' concerns.
Gerard, James M; Scalzo, Anthony J; Borgman, Matthew A; Watson, Christopher M; Byrnes, Chelsie E; Chang, Todd P; Auerbach, Marc; Kessler, David O; Feldman, Brian L; Payne, Brian S; Nibras, Sohail; Chokshi, Riti K; Lopreiato, Joseph O
2018-06-01
We developed a first-person serious game, PediatricSim, to teach and assess performances on seven critical pediatric scenarios (anaphylaxis, bronchiolitis, diabetic ketoacidosis, respiratory failure, seizure, septic shock, and supraventricular tachycardia). In the game, players are placed in the role of a code leader and direct patient management by selecting from various assessment and treatment options. The objective of this study was to obtain supportive validity evidence for the PediatricSim game scores. Game content was developed by 11 subject matter experts and followed the American Heart Association's 2011 Pediatric Advanced Life Support Provider Manual and other authoritative references. Sixty subjects with three different levels of experience were enrolled to play the game. Before game play, subjects completed a 40-item written pretest of knowledge. Game scores were compared between subject groups using scoring rubrics developed for the scenarios. Validity evidence was established and interpreted according to Messick's framework. Content validity was supported by a game development process that involved expert experience, focused literature review, and pilot testing. Subjects rated the game favorably for engagement, realism, and educational value. Interrater agreement on game scoring was excellent (intraclass correlation coefficient = 0.91, 95% confidence interval = 0.89-0.9). Game scores were higher for attendings followed by residents then medical students (Pc < 0.01) with large effect sizes (1.6-4.4) for each comparison. There was a very strong, positive correlation between game and written test scores (r = 0.84, P < 0.01). These findings contribute validity evidence for PediatricSim game scores to assess knowledge of pediatric emergency medicine resuscitation.
Junghaenel, Doerte U; Schneider, Stefan; Stone, Arthur A; Christodoulou, Christopher; Broderick, Joan E
2014-04-01
This study examined the ecological validity and clinical utility of NIH Patient-Reported Outcomes Measurement Information System (PROMIS®) instruments for anger, depression, and fatigue in women with premenstrual symptoms. One hundred women completed daily diaries and weekly PROMIS assessments over 4 weeks. Weekly assessments were administered through Computerized Adaptive Testing (CAT). Weekly CATs and corresponding daily scores were compared to evaluate ecological validity. To test clinical utility, we examined if CATs could detect changes in symptom levels, if these changes mirrored those obtained from daily scores, and if CATs could identify clinically meaningful premenstrual symptom change. PROMIS CAT scores were higher in the pre-menstrual than the baseline (ps<.0001) and post-menstrual (ps<.0001) weeks. The correlations between CATs and aggregated daily scores ranged from .73 to .88, supporting ecological validity. Mean CAT scores showed systematic changes in accordance with the menstrual cycle and the magnitudes of the changes were similar to those obtained from the daily scores. Finally, Receiver Operating Characteristic (ROC) analyses demonstrated the ability of the CATs to discriminate between women with and without clinically meaningful premenstrual symptom change. PROMIS CAT instruments for anger, depression, and fatigue demonstrated validity and utility in premenstrual symptom assessment. The results provide encouraging initial evidence of the utility of PROMIS instruments for the measurement of affective premenstrual symptoms. Copyright © 2014 Elsevier Inc. All rights reserved.
Rostami, Reza; Sadeghi, Vahid; Zarei, Jamileh; Haddadi, Parvaneh; Mohazzab-Torabi, Saman; Salamati, Payman
2013-04-01
The aim of this study was to compare the Persian version of the Wechsler Intelligence Scale for Children - Fourth Edition (WISC-IV) and Cognitive Assessment System (CAS) tests, to determine the correlation between their scales and to evaluate the probable concurrent validity of these tests in patients with learning disorders. One hundred sixty-two children with learning disorders who presented at Atieh Comprehensive Psychiatry Center were selected in a consecutive non-randomized order. All of the patients were assessed using the WISC-IV and CAS score questionnaires. The Pearson correlation coefficient was used to analyze the correlation between the data and to assess the concurrent validity of the two tests. Linear regression was used for statistical modeling. The type I error rate was set at a maximum of 5%. There was a strong correlation between the total score of the WISC-IV test and the total score of the CAS test in the patients (r=0.75, P<0.001). The correlations among the other scales were mostly high and all of them were statistically significant (P<0.001). A linear regression model was obtained (α = 0.51, β = 0.81 and P<0.001). There is an acceptable correlation between the WISC-IV scales and the CAS test in children with learning disorders. Concurrent validity is established between the two tests and their scales.
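The abstract above summarises concurrent validity with a Pearson correlation and a simple linear regression (intercept α, slope β). A minimal sketch of both computations follows, assuming paired total scores from two tests. The function names and the score lists are hypothetical and are not the study's data.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return (sum of cross-products, sum of squares of x, sum of squares of y, mean x, mean y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    sxy, sxx, syy, _, _ = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    sxy, sxx, _, mean_x, mean_y = _centered_sums(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical paired scores on two tests for five examinees.
test_one = [1, 2, 3, 4, 5]
test_two = [2, 4, 5, 4, 5]

print(pearson_r(test_one, test_two))
print(fit_line(test_one, test_two))
```

A significance test for r or for the slope would need a t distribution, which is outside this sketch.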
Rostami, Reza; Sadeghi, Vahid; Zarei, Jamileh; Haddadi, Parvaneh; Mohazzab-Torabi, Saman; Salamati, Payman
2013-01-01
Objective The aim of this study was to compare the Persian version of the Wechsler Intelligence Scale for Children - Fourth Edition (WISC-IV) and Cognitive Assessment System (CAS) tests, to determine the correlation between their scales and to evaluate the probable concurrent validity of these tests in patients with learning disorders. Methods One hundred sixty-two children with learning disorders who presented at Atieh Comprehensive Psychiatry Center were selected in a consecutive non-randomized order. All of the patients were assessed using the WISC-IV and CAS score questionnaires. The Pearson correlation coefficient was used to analyze the correlation between the data and to assess the concurrent validity of the two tests. Linear regression was used for statistical modeling. The type I error rate was set at a maximum of 5%. Findings There was a strong correlation between the total score of the WISC-IV test and the total score of the CAS test in the patients (r=0.75, P<0.001). The correlations among the other scales were mostly high and all of them were statistically significant (P<0.001). A linear regression model was obtained (α = 0.51, β = 0.81 and P<0.001). Conclusion There is an acceptable correlation between the WISC-IV scales and the CAS test in children with learning disorders. Concurrent validity is established between the two tests and their scales. PMID:23724180
NASA Astrophysics Data System (ADS)
Meilinda; Rustaman, N. Y.; Firman, H.; Tjasyono, B.
2018-05-01
The Climate Change System Thinking Instrument (CCSTI) was developed to measure system thinking ability in the concept of climate change. The CCSTI was developed in four development phases, including instrument draft development, validation and evaluation including a readable material test, expert validation, and a field test. The result of the field test was analyzed by looking at the readability score in Cronbach’s alpha test. The draft instrument was tested on a random sample of 80 college students majoring in Biology Education, Physics Education, and Chemistry Education. The Content Validation Index score was 0.86, which means that the CCSTI is categorized as very appropriate with respect to the question indicators, and Cronbach’s alpha was about 0.605, which is categorized as undesirable to minimally acceptable. Of the 45 system thinking questions, 37 are valid and are spread across four indicators of system thinking: system thinking phase I (pre-requirement), system thinking phase II (basic), system thinking phase III (intermediate), and system thinking phase IV (coherent expert).
Validation of the Arabic version of the score for allergic rhinitis tool.
Alharethy, Sami; Wedami, Mawaheb Al; Syouri, Falah; Alqabbani, Almaha A; Baqays, Abdulsalam; Mesallam, Tamer; Aldrees, Turki
2017-01-01
Allergic rhinitis (AR) is a common inflammation of the nasal mucosa in response to allergen exposure. We translated and validated the Score for Allergic Rhinitis (SFAR) into an Arabic version so that the disease can be studied in an Arabic population. SFAR is a non-invasive self-administered tool that evaluates eight items related to AR. This study aimed to translate and culturally adapt the SFAR questionnaire into Arabic, and assess the validity, consistency, and reliability of the translated version in an Arabic-speaking population of patients with suspected AR. This was a cross-sectional study at a tertiary care hospital in Riyadh. The Arabic version of the SFAR was administered to patients with suspected AR and control participants. The AR and control groups were compared, and the test-retest reliability and internal consistency of the instrument were determined. The AR (n=173) and control (n=75) groups had significantly different Arabic SFAR scores (P < .0001). The instrument provided satisfactory internal consistency (Cronbach's alpha value of 0.7). The test-retest reliability was excellent for the total Arabic SFAR score (r = 0.836, P < .0001). These findings demonstrate that the Arabic version of the SFAR is a valid tool that can be used to screen Arabic speakers with suspected AR. A limitation is the absence of objective allergy testing.
Brown, Zachary M; Gibbs, Jenna C; Adachi, Jonathan D; Ashe, Maureen C; Hill, Keith D; Kendler, David L; Khan, Aliya; Papaioannou, Alexandra; Prasad, Sadhana; Wark, John D; Giangregorio, Lora M
2017-11-28
We sought to evaluate the Balance Outcome Measure for Elder Rehabilitation (BOOMER) in community-dwelling women 65 years and older with vertebral fracture and to describe score distributions and potential ceiling and floor effects. This was a secondary data analysis of baseline data from the Build Better Bones with Exercise randomized controlled trial using the BOOMER. A total of 141 women with osteoporosis and radiographically confirmed vertebral fracture were included. Concurrent validity and internal consistency were assessed in comparison to the Short Physical Performance Battery (SPPB). Normality and ceiling/floor effects of total BOOMER scores and component test items were also assessed. Exploratory analyses of assistive aid use and falls history were performed. Tests for concurrent validity demonstrated moderate correlation between total BOOMER and SPPB scores. The BOOMER component tests showed modest internal consistency. A substantial ceiling effect and nonnormal score distributions were present for total BOOMER scores in the overall sample and among those not using assistive aids, although scores were normally distributed for those using assistive aids. The static standing with eyes closed test demonstrated the greatest ceiling effects of the component tests, with 92% of participants achieving a maximal score. While the BOOMER compares well with the SPPB in community-dwelling women with vertebral fractures, researchers or clinicians considering using the BOOMER in similar or higher-functioning populations should be aware of the potential for ceiling effects.
Salamonsen, Matthew; McGrath, David; Steiler, Geoff; Ware, Robert; Colt, Henri; Fielding, David
2013-09-01
To reduce complications and increase success, thoracic ultrasound is recommended to guide all chest drainage procedures. Despite this, no tools currently exist to assess proceduralist training or competence. This study aims to validate an instrument to assess physician skill at performing thoracic ultrasound, including effusion markup, and examine its validity. We developed an 11-domain, 100-point assessment sheet in line with British Thoracic Society guidelines: the Ultrasound-Guided Thoracentesis Skills and Tasks Assessment Test (UGSTAT). The test was used to assess 22 participants (eight novices, seven intermediates, seven advanced) on two occasions while performing thoracic ultrasound on a pleural effusion phantom. Each test was scored by two blinded expert examiners. Validity was examined by assessing the ability of the test to stratify participants according to expected skill level (analysis of variance) and demonstrating test-retest and intertester reproducibility by comparison of repeated scores (mean difference [95% CI] and paired t test) and the intraclass correlation coefficient. Mean scores for the novice, intermediate, and advanced groups were 49.3, 73.0, and 91.5 respectively, which were all significantly different (P < .0001). There were no significant differences between repeated scores. Procedural training on mannequins prior to unsupervised performance on patients is rapidly becoming the standard in medical education. This study has validated the UGSTAT, which can now be used to determine the adequacy of thoracic ultrasound training prior to clinical practice. It is likely that its role could be extended to live patients, providing a way to document ongoing procedural competence.
Interpretation and Utilization of Scores on the Air Force Officer Qualifying Test.
ERIC Educational Resources Information Center
Miller, Robert E.
The report summarizes a large body of data relevant to the proper interpretation and use of aptitude scores on the Air Force Officer Qualifying Test (AFOQT). Included are descriptions of the AFOQT testing program and the test itself. Technical data include an extensive sampling of validation studies covering predictors of success in pilot…
Towards Virtual FLS: Development of a Peg Transfer Simulator
Arikatla, Venkata S; Ahn, Woojin; Sankaranarayanan, Ganesh; De, Suvranu
2014-01-01
Background Peg transfer is one of five tasks in the Fundamentals of Laparoscopic Surgery (FLS) program. We report the development and validation of a Virtual Basic Laparoscopic Skill Trainer-Peg Transfer (VBLaST-PT©) simulator for automatic real-time scoring and objective quantification of performance. Methods We have introduced new techniques in order to allow bi-manual manipulation of pegs and automatic scoring/evaluation while maintaining high quality of simulation. We performed a preliminary face and construct validation study with 22 subjects divided into two groups: experts (PGY 4–5, fellow and practicing surgeons) and novices (PGY 1–3). Results Face validation shows high scores for all the aspects of the simulation. A two-tailed Mann-Whitney U-test showed a significant difference between the two groups on completion time (p=0.003), FLS score (p=0.002) and the VBLaST-PT© score (p=0.006). Conclusions VBLaST-PT© is a high quality virtual simulator that showed both face and construct validity. PMID:24030904
Shedler, J; Beck, A; Bensen, S
2000-07-01
Many case-finding instruments are available to help primary care physicians (PCPs) diagnose depression, but they are not widely used. Physicians often consider these instruments too time consuming or feel they do not provide sufficient diagnostic information. Our study examined the validity and utility of the Quick PsychoDiagnostics (QPD) Panel, an automated mental health test designed to meet the special needs of PCPs. The test screens for 9 common psychiatric disorders and requires no physician time to administer or score. We evaluated criterion validity relative to the Structured Clinical Interview for DSM-IV (SCID), and evaluated convergent validity by correlating QPD Panel scores with established mental health measures. Sensitivity to change was examined by readministering the test to patients pretreatment and posttreatment. Utility was evaluated through physician and patient satisfaction surveys. For major depression, sensitivity and specificity were 81% and 96%, respectively. For other disorders, sensitivities ranged from 69% to 98%, and specificities ranged from 90% to 97%. The depression severity score correlated highly with the Beck, Hamilton, Zung, and CES-D depression scales, and the anxiety score correlated highly with the Spielberger State-Trait Anxiety Inventory and the anxiety subscale of the Symptom Checklist 90 (Ps <.001). The test was sensitive to change. All PCPs agreed or strongly agreed that the QPD Panel "is convenient and easy to use," "can be used immediately by any physician," and "helps provide better patient care." Patients also rated the test favorably. The QPD Panel is a valid mental health assessment tool that can diagnose a range of common psychiatric disorders and is practical for routine use in primary care.
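The QPD Panel abstract above reports criterion validity as sensitivity and specificity against a reference interview. As a hedged illustration of how those two figures are derived from paired screening and reference results, the sketch below counts the four cells of a 2×2 table. The function name and the example lists are hypothetical, not the study's data.

```python
def sensitivity_specificity(screen_positive, reference_positive):
    """Return (sensitivity, specificity) for paired boolean-like results.

    screen_positive: results from the test being validated.
    reference_positive: results from the criterion standard.
    """
    pairs = list(zip(screen_positive, reference_positive))
    true_pos = sum(1 for s, r in pairs if s and r)
    false_neg = sum(1 for s, r in pairs if not s and r)
    true_neg = sum(1 for s, r in pairs if not s and not r)
    false_pos = sum(1 for s, r in pairs if s and not r)
    sensitivity = true_pos / (true_pos + false_neg)  # share of true cases detected
    specificity = true_neg / (true_neg + false_pos)  # share of non-cases cleared
    return sensitivity, specificity


# Hypothetical results for ten patients (1 = positive, 0 = negative).
screen = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

print(sensitivity_specificity(screen, reference))
```

The sketch assumes at least one reference-positive and one reference-negative case; otherwise one of the denominators would be zero.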
Bleau Lavigne, Maude; Reeves, Isabelle; Sasseville, Marie-Josée; Loignon, Christine
The primary purpose of this study was to develop 2 survey tools to explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in primary health care settings. One survey was intended for the patients receiving care for a diabetic foot ulcer in primary health care settings and the other was intended for the health professionals providing treatment. The second purpose of this study was to evaluate the psychometric properties of the 2 surveys. The design was development and validation of survey instruments. Two surveys were developed using a published guide. Following review of pertinent literature and identification of variables to be measured, a bank of items was developed and pretested to determine clarity of the items and responses. Psychometric testing comprised measurement of content validity index (CVI) and intraclass correlation coefficient (ICC). Only items obtaining satisfactory CVI and ICC scores were included in the final version of the surveys. The final version of the patient survey contained 41 items and the final version of the survey for health care professionals contained 21 items. The patient-intended survey's items demonstrated high content validity scores and satisfactory test-retest reliability scores. The overall CVI score was 0.98. Forty of the 49 items eligible for testing obtained satisfactory ICC scores. One item's test-retest reliability could not be tested but it was retained based on its high CVI. The health professional-intended survey had an overall CVI score of 0.91, but its items had lower ICC scores: 63% (31 of the 49 items) did not achieve a satisfactory ICC score for inclusion in the final instrument. This project led to development of 2 instruments designed to identify and explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in the primary health care setting. 
Future research and testing is required to translate these French surveys into English and additional languages, in order to reach a broader population.
Absolute and Relative Measures of Instructional Sensitivity
ERIC Educational Resources Information Center
Naumann, Alexander; Hartig, Johannes; Hochweber, Jan
2017-01-01
Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of…
Vuillerot, Carole; Meilleur, Katherine G.; Jain, Minal; Waite, Melissa; Wu, Tianxia; Linton, Melody; Datsgir, Jahannaz; Donkervoort, Sandra; Leach, Meganne E.; Rutkowski, Anne; Rippert, Pascal; Payan, Christine; Iwaz, Jean; Hamroun, Dalil; Bérard, Carole; Poirot, Isabelle; Bönnemann, Carsten G.
2016-01-01
Objective To develop and validate an English version of the Neuromuscular (NM)-Score, a classification for patients with NM diseases in each of the 3 motor function domains: D1, standing and transfers; D2, axial and proximal motor function; and D3, distal motor function. Design Validation survey. Setting Patients seen at a medical research center between June and September 2013. Participants Consecutive patients (N = 42) aged 5 to 19 years with a confirmed or suspected diagnosis of congenital muscular dystrophy. Interventions Not applicable. Main Outcome Measures An English version of the NM-Score was developed by a 9-person expert panel that assessed its content validity and semantic equivalence. Its concurrent validity was tested against criterion standards (Brooke Scale, Motor Function Measure [MFM], activity limitations for patients with upper and/or lower limb impairments [ACTIVLIM], Jebsen Test, and myometry measurements). Informant agreement between patient/caregiver (P/C)-reported and medical doctor (MD)-reported NM scores was measured by weighted kappa. Results Significant correlation coefficients were found between NM scores and criterion standards. The highest correlations were found between NM-score D1 and MFM score D1 (ρ = −.944, P<.0001), ACTIVLIM (ρ = −.895, P<.0001), and hip abduction strength by myometry (ρ = −.811, P<.0001). Informant agreement between P/C-reported and MD-reported NM scores was high for D1 (κ = .801; 95% confidence interval [CI], .701–.914) but moderate for D2 (κ = .592; 95% CI, .412–.773) and D3 (κ = .485; 95% CI, .290–.680). Correlation coefficients between the NM scores and the criterion standards did not significantly differ between P/C-reported and MD-reported NM scores. Conclusions Patients and physicians completed the English NM-Score easily and accurately. 
The English version is a reliable and valid instrument that can be used in clinical practice and research to describe the functional abilities of patients with NM diseases. PMID:24862765
NASA Astrophysics Data System (ADS)
Monika, Icha; Yeni, Laili Fitri; Ariyati, Eka
2016-02-01
This research aimed to reveal the validity of the flipbook as a medium of learning for the sub-material of environmental pollution in the tenth grade based on the results of the activity test of kencur (Kaempferia galanga) extract to control the growth of the Fusarium oxysporum fungus. The research consisted of two stages. First, testing the validity of the medium of flipbook through validation by seven assessors and analyzed based on the total average score of all aspects. Second, testing the activity of the kencur extract against the growth of Fusarium oxysporum by using the experimental method with 10 treatments and 3 repetitions which were analyzed using one-way analysis of variance (ANOVA) test. The making of the flipbook medium was done through the stages of analysis for the potential and problems, data collection, design, validation, and revision. The validation analysis on the flipbook received an average score of 3.7 and was valid to a certain extent, so it could be used in the teaching and learning process especially in the sub-material of environmental pollution in the tenth grade of the senior high school.
Automated Essay Scoring versus Human Scoring: A Comparative Study
ERIC Educational Resources Information Center
Wang, Jinhao; Brown, Michelle Stallone
2007-01-01
The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric [TM] and human raters. Data collection included administering the Texas version of the WriterPlacer "Plus" test and obtaining scores assigned by IntelliMetric [TM] and by…
Development and Validation of a Bilingual Stroke Preparedness Assessment Instrument.
Skolarus, Lesli E; Mazor, Kathleen M; Sánchez, Brisa N; Dome, Mackenzie; Biller, José; Morgenstern, Lewis B
2017-04-01
Stroke preparedness interventions are limited by the lack of psychometrically sound intermediate end points. We sought to develop and assess the reliability and validity of the video-Stroke Action Test (video-STAT) an English and a Spanish video-based test to assess people's ability to recognize and react to stroke signs. Video-STAT development and testing was divided into 4 phases: (1) video development and community-generated response options, (2) pilot testing in community health centers, (3) administration in a national sample, bilingual sample, and neurologist sample, and (4) administration before and after a stroke preparedness intervention. The final version of the video-STAT included 8 videos: 4 acute stroke/emergency, 2 prior stroke/nonemergency, 1 nonstroke/emergency, and 1 nonstroke/nonemergency. Acute stroke recognition and action response were queried after each vignette. Video-STAT scoring was based on the acute stroke vignettes only (score range 0-12 best). The national sample consisted of 598 participants, 438 who took the video-STAT in English and 160 who took the video-STAT in Spanish. There was adequate internal consistency (Cronbach α=0.72). The average video-STAT score was 5.6 (SD=3.6), whereas the average neurologist score was 11.4 (SD=1.3). There was no difference in video-STAT scores between the 116 bilingual video-STAT participants who took the video-STAT in English or Spanish. Compared with baseline scores, the video-STAT scores increased after a stroke preparedness intervention (6.2 versus 8.9, P <0.01) among a sample of 101 black adults and youth. The video-STAT yields reliable scores that seem to be valid measures of stroke preparedness. © 2017 American Heart Association, Inc.
Development and Validation of a Bilingual Stroke Preparedness Assessment Instrument
Skolarus, Lesli E.; Mazor, Kathleen M.; Sánchez, Brisa N.; Dome, Mackenzie; Biller, José; Morgenstern, Lewis B.
2017-01-01
Background and Purpose Stroke preparedness interventions are limited by the lack of psychometrically sound intermediate endpoints. We sought to develop and assess the reliability and validity of the video-Stroke Action Test, video-STAT, an English and Spanish video-based test to assess people’s ability to recognize and react to stroke signs. Methods Video-STAT development and testing was divided into four phases: 1) video development and community-generated response options; 2) pilot testing in community health centers; 3) administration in a national sample, bilingual sample and neurologist sample; and 4) administration before and after a stroke preparedness intervention. Results The final version of the video-STAT included 8 videos: 4 acute stroke/emergency, 2 prior stroke/non-emergency, 1 non-stroke/emergency, 1 non-stroke/non-emergency. Acute stroke recognition and action response were queried after each vignette. Video-STAT scoring was based on the acute stroke vignettes only (score range 0–12 best). The national sample consisted of 598 participants, 438 who took the video-STAT in English and 160 who took the video-STAT in Spanish. There was adequate internal consistency (Cronbach’s alpha=0.72). The average video-STAT score was 5.6 (sd=3.6) while the average neurologist score was 11.4 (sd=1.3). There was no difference in video-STAT scores between the 116 bilingual video-STAT participants who took the video-STAT in English or Spanish. Compared to baseline scores, the video-STAT scores increased following a stroke preparedness intervention (6.2 vs. 8.9, p<0.01) among a sample of 101 African American adults and youth. Conclusion The video-STAT yields reliable scores that appear to be valid measures of stroke preparedness. PMID:28250199
ERIC Educational Resources Information Center
Harsch, Claudia; Ushioda, Ema; Ladroue, Christophe
2017-01-01
The project examined the predictive validity of the "TOEFL iBT"® test with a focus on the relationship between TOEFL iBT scores and students' subsequent academic success in postgraduate studies in one leading university in the United Kingdom, paying specific attention to the role of linguistic preparedness as perceived by students and…
Rantz, Marilyn J; Aud, Myra A; Zwygart-Stauffacher, Mary; Mehr, David R; Petroski, Gregory F; Owen, Steven V; Madsen, Richard W; Flesner, Marcia; Conn, Vicki; Maas, Meridean
2008-01-01
Field test results are reported for the Observable Indicators of Nursing Home Care Quality Instrument-Assisted Living Version, an instrument designed to measure the quality of care in assisted living facilities after a brief 30-minute walk-through. The OIQ-AL was tested in 207 assisted-living facilities in two states using classical test theory, generalizability theory, and exploratory factor analysis. The 34-item scale has a coherent six-factor structure that conceptually describes the multidimensional concept of care quality in assisted living. The six factors can be logically clustered into process (Homelike and Caring, 21 items) and structure (Access and Choice; Lighting; Plants and Pets; Outdoor Spaces) subscales and for a total quality score. Classical test theory results indicate most subscales and the total quality score from the OIQ-AL have acceptable interrater, test-retest, and strong internal consistency reliabilities. Generalizability theory analyses reveal that dependability of scores from the instrument is strong, particularly by including a second observer who conducts a site visit and independently completes an instrument, or by a single observer conducting two site visits and completing instruments during each visit. Scoring guidelines based on the total sample of observations (N = 358) help guide those who want to use the measure to interpret both subscale and total scores. Content validity was supported by two expert panels of people experienced in the assisted-living field, and a content validity index calculated for the first version of the scale is high (3.43 on a four-point scale). The OIQ-AL gives reliable and valid scores for researchers, and may be useful for consumers, providers, and others interested in measuring quality of care in assisted-living facilities.
Infant polysomnography: reliability and validity of infant arousal assessment.
Crowell, David H; Kulp, Thomas D; Kapuniai, Linda E; Hunt, Carl E; Brooks, Lee J; Weese-Mayer, Debra E; Silvestri, Jean; Ward, Sally Davidson; Corwin, Michael; Tinsley, Larry; Peucker, Mark
2002-10-01
Infant arousal scoring based on the Atlas Task Force definition of transient EEG arousal was evaluated to determine (1) whether transient arousals can be identified and assessed reliably in infants and (2) whether arousal and no-arousal epochs scored previously by trained raters can be validated reliably by independent sleep experts. Phase I for inter- and intrarater reliability scoring was based on two datasets of sleep epochs selected randomly from nocturnal polysomnograms of healthy full-term, preterm, idiopathic apparent life-threatening event cases, and siblings of Sudden Infant Death Syndrome infants of 35 to 64 weeks postconceptional age. After training, test set 1 reliability was assessed and discrepancies identified. After retraining, test set 2 was scored by the same raters to determine interrater reliability. Later, three raters from the trained group rescored test set 2 to assess inter- and intrarater reliabilities. Interrater and intrarater reliability kappas, with 95% confidence intervals, ranged from substantial to almost perfect levels of agreement. Interrater reliabilities for spontaneous arousals were initially moderate and then substantial. During the validation phase, 315 previously scored epochs were presented to four sleep experts to rate as containing arousal or no-arousal events. Interrater expert agreements were diverse and considered as noninterpretable. Concordance in sleep experts' agreements, based on identification of the previously sampled arousal and no-arousal epochs, was used as a secondary evaluative technique. Results showed agreement by two or more experts on 86% of the Collaborative Home Infant Monitoring Evaluation Study arousal scored events. Conversely, only 1% of the Collaborative Home Infant Monitoring Evaluation Study-scored no-arousal epochs were rated as an arousal.
In summary, this study presents an empirically tested model with procedures and criteria for attaining improved reliability in transient EEG arousal assessments in infants using the modified Atlas Task Force standards. With training based on specific criteria, substantial inter- and intrarater agreement in identifying infant arousals was demonstrated. Corroborative validation results were too disparate for meaningful interpretation. Alternate evaluation based on concordance agreements supports reliance on infant EEG criteria for assessment. Results mandate additional confirmatory validation studies with specific training on infant EEG arousal assessment criteria.
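The kappa-based agreement described in this abstract can be sketched as follows. This is a minimal illustration of Cohen's kappa for two raters, not the study's procedure: the function names and epoch labels are hypothetical, and the verbal bands ("moderate", "substantial", "almost perfect") are assumed to follow the commonly used Landis and Koch cutoffs, which the abstract does not cite explicitly.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Agreement expected by chance, from each rater's marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / n**2
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Verbal label for kappa (assumed Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical epoch ratings from two raters (illustration only).
a = ["arousal", "arousal", "none", "none"]
b = ["arousal", "none", "none", "none"]
print(cohen_kappa(a, b), agreement_band(cohen_kappa(a, b)))
```

With more than two raters, as in the expert validation phase, a multi-rater statistic would be needed; this sketch covers only the pairwise case.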
The reliability and validity of the Tokyo Autistic Behaviour Scale.
Kurita, H; Miyake, Y
1990-03-01
The Tokyo Autistic Behavior Scale (TABS) consisting of 39 items provisionally grouped in four areas--interpersonal-social relationship, language-communication, habit-mannerism and others--is an instrument used by a child's caretaker to rate the child's autistic behaviors on a 3-point scale. Test-retest reliability was satisfactory (i.e., an r for a total score was .94). Among six DSM-III diagnostic groups, infantile autism showed a significantly higher total TABS score than the other five groups, and a taxonomic validity coefficient was .54. An r between total scores of the TABS and the Childhood Autism Rating Scale--Tokyo Version was .59. The area scores showed a lower validity than the total score. The TABS appears to be a useful instrument to assess autistic behavior.
Validation of scores of use of inhalation devices: valoration of errors
Zambelli-Simões, Letícia; Martins, Maria Cleusa; Possari, Juliana Carneiro da Cunha; Carvalho, Greice Borges; Coelho, Ana Carla Carvalho; Cipriano, Sonia Lucena; de Carvalho-Pinto, Regina Maria; Cukier, Alberto; Stelmach, Rafael
2015-01-01
Abstract Objective: To validate two scores quantifying the ability of patients to use metered dose inhalers (MDIs) or dry powder inhalers (DPIs); to identify the most common errors made during their use; and to identify the patients in need of an educational program for the use of these devices. Methods: This study was conducted in three phases: validation of the reliability of the inhaler technique scores; validation of the contents of the two scores using a convenience sample; and testing for criterion validation and discriminant validation of these instruments in patients who met the inclusion criteria. Results: The convenience sample comprised 16 patients. Interobserver disagreement was found in 19% and 25% of the DPI and MDI scores, respectively. After expert analysis on the subject, the scores were modified and were applied in 72 patients. The most relevant difficulty encountered during the use of both types of devices was the maintenance of total lung capacity after a deep inhalation. The degree of correlation of the scores by observer was 0.97 (p < 0.0001). There was good interobserver agreement in the classification of patients as able/not able to use a DPI (50%/50% and 52%/58%; p < 0.01) and an MDI (49%/51% and 54%/46%; p < 0.05). Conclusions: The validated scores allow the identification and correction of inhaler technique errors during consultations and, as a result, improvement in the management of inhalation devices. PMID:26398751
The Effects of Item by Item Feedback Given during an Ability Test.
ERIC Educational Resources Information Center
Whetton, C.; Childs, R.
1981-01-01
Answer-until-correct (AUC) is a procedure for providing feedback during a multiple-choice test, giving an increased range of scores. The performance of secondary students on a verbal ability test using AUC procedures was compared with a group using conventional instructions. AUC scores considerably enhanced reliability but not validity.…
Job-Derived Selection: Follow Up Report. Technical Report No. 4.
ERIC Educational Resources Information Center
McCormick, Ernest J.; And Others
A study dealt with the use of the Position Analysis Questionnaire (PAQ) within a job component validity framework as the basis for estimating aptitude requirements of jobs represented by scores on commercially available tests as contrasted with scores on General Aptitude Test Battery (GATB) tests. Procedures generally consisted of the use of job…
Reliability and Validity of the TIMPSI for Infants With Spinal Muscular Atrophy Type I
Krosschell, Kristin J.; Maczulski, Jo Anne; Scott, Charles; King, Wendy; Hartman, Jill T.; Case, Laura E.; Viazzo-Trussell, Donata; Wood, Janine; Roman, Carolyn A.; Hecker, Eva; Meffert, Marianne; Léveillé, Maude; Kienitz, Krista; Swoboda, Kathryn J.
2014-01-01
Purpose: This study examined the reliability and validity of the Test of Infant Motor Performance Screening Items (TIMPSI) in infants with type I spinal muscular atrophy (SMA). Methods: After training, 12 evaluators scored 4 videos of infants with type I SMA to assess interrater reliability. Intrarater and test-retest reliability was further assessed for 9 evaluators during a SMA type I clinical trial, with 9 evaluators testing a total of 38 infants twice. Relatedness of the TIMPSI score to ability to reach and ventilatory support was also examined. Results: Excellent interrater video score reliability was noted (intraclass correlation coefficient, 0.97–0.98). Intrarater reliability was excellent (intraclass correlation coefficient, 0.91–0.98) and test-retest reliability ranged from r = 0.82 to r = 0.95. The TIMPSI score was related to the ability to reach (P ≤ .05). Conclusion: The TIMPSI can reliably be used to assess motor function in infants with type I SMA. In addition, the TIMPSI scores are related to the ability to reach, an important functional skill in children with type I SMA. PMID:23542189
Iwata, Shintaro; Uehara, Kosuke; Ogura, Koichi; Akiyama, Toru; Shinoda, Yusuke; Yonemoto, Tsukasa; Kawai, Akira
2016-09-01
The Musculoskeletal Tumor Society (MSTS) scoring system is a widely used functional evaluation tool for patients treated for musculoskeletal tumors. Although the MSTS scoring system has been validated in English and Brazilian Portuguese, a Japanese version of the MSTS scoring system has not yet been validated. We sought to determine whether a Japanese-language translation of the MSTS scoring system for the lower extremity had (1) sufficient reliability and internal consistency, (2) adequate construct validity, and (3) reasonable criterion validity compared with the Toronto Extremity Salvage Score (TESS) and SF-36 using psychometric analysis. The Japanese version of the MSTS scoring system was developed using accepted guidelines, which included translation of the English version of the MSTS into Japanese by five native Japanese bilingual musculoskeletal oncology surgeons and integrated into one document. One hundred patients with a diagnosis of intermediate or malignant bone or soft tissue tumors located in the lower extremity and who had undergone tumor resection with or without reconstruction or amputation participated in this study. Reliability was evaluated by test-retest analysis, and internal consistency was established by Cronbach's alpha coefficient. Construct validity was evaluated using the principal factor analysis and Akaike information criterion network. Criterion validity was evaluated by comparing the MSTS scoring system with the TESS and SF-36. Test-retest analysis showed a high intraclass correlation coefficient (0.92; 95% CI, 0.88-0.95), indicating high reliability of the Japanese version of the MSTS scoring system, although a considerable ceiling effect was observed, with 23 patients (23%) given the maximum score. Cronbach's alpha coefficient was 0.87 (95% CI, 0.82-0.90), suggesting a high level of internal consistency. 
Factor analysis revealed that all items had high loading values and communalities; we identified a central role for the items "walking" and "gait" according to the Akaike information criterion network. The total MSTS score was correlated with that of the TESS (r = 0.81; 95% CI, 0.73-0.87; p < 0.001) and the physical component summary and physical functioning of the SF-36. The Japanese-language translation of the MSTS scoring system for the lower extremity has sufficient reliability and reasonable validity. Nevertheless, the observation of a ceiling effect suggests poor ability of this system to discriminate among patients who have a high level of function.
Feasibility of remote administration of the Fundamentals of Laparoscopic Surgery (FLS) skills test.
Okrainec, Allan; Vassiliou, Melina; Kapoor, Andrew; Pitzul, Kristen; Henao, Oscar; Kaneva, Pepa; Jackson, Timothy; Ritter, E Matt
2013-11-01
Fundamentals of Laparoscopic Surgery (FLS) certification testing currently is offered at accredited test centers or at select surgical conferences. Maintaining these test centers requires considerable investment in human and financial resources. Additionally, it can be challenging for individuals outside North America to become FLS certified. The objective of this pilot study was to assess the feasibility of remotely administering and scoring the FLS examination using live videoconferencing compared with standard onsite testing. This parallel mixed-methods study used both FLS scoring data and participant feedback to determine the barriers to feasibility of remote proctoring for the FLS examination. Participants were tested at two accredited FLS testing centers. An official FLS proctor administered and scored the FLS exam remotely while another onsite proctor provided a live score of participants' performance. Participant feedback was collected during testing. Interrater reliabilities of onsite and remote FLS scoring data were compared using intraclass correlation coefficients (ICCs). Participant feedback was analyzed using modified grounded theory to identify themes for barriers to feasibility. The scores of the remote and onsite proctors showed excellent interrater reliability in the total FLS (ICC 0.995, CI [0.985-0.998]). Several barriers led to critical errors in remote scoring, but most were accompanied by a solution incorporated into the study protocol. The most common barrier was the chain of custody for exam accessories. The results of this pilot study suggest that remote administration of the FLS has the potential to decrease costs without altering test-taker scores or exam validity. Further research is required to validate protocols for remote and onsite proctors and to direct execution of these protocols in a controlled environment identical to current FLS test administration.
TOEFL iBT Speaking Test Scores as Indicators of Oral Communicative Language Proficiency
ERIC Educational Resources Information Center
Bridgeman, Brent; Powers, Donald; Stone, Elizabeth; Mollaun, Pamela
2012-01-01
Scores assigned by trained raters and by an automated scoring system (SpeechRater[TM]) on the speaking section of the TOEFL iBT[TM] were validated against a communicative competence criterion. Specifically, a sample of 555 undergraduate students listened to speech samples from 184 examinees who took the Test of English as a Foreign Language…
ERIC Educational Resources Information Center
Bingxiun, Liu; And Others
1990-01-01
To estimate the predictive validity of the Chinese National Medical Examination, scores of a sample (n=1,717) of participating examinees were compared with program directors' ratings on nine aspects of clinical competence. Test scores were consistent with competence measures and overall, correlated significantly with ratings, while varying for…
Kruizinga, Ingrid; Jansen, Wilma; de Haan, Carolien L.; Raat, Hein
2012-01-01
Background: The KIPPPI (Brief Instrument Psychological and Pedagogical Problem Inventory) is a Dutch questionnaire that measures psychosocial and pedagogical problems in 2-year-olds and consists of a KIPPPI Total score, Wellbeing scale, Competence scale, and Autonomy scale. This study examined the reliability, validity, screening accuracy and clinical application of the KIPPPI. Methods: Parents of 5959 2-year-old children in the Rotterdam area, the Netherlands, were invited to participate in the study. Parents of 3164 children (53.1% of all invited parents) completed the questionnaire. The internal consistency was evaluated and, in subsamples, the test-retest reliability and concurrent validity with regard to the Child Behavioral Checklist (CBCL). Discriminative validity was evaluated by comparing scores of parents who worried about their child’s upbringing and parents who did not. Screening accuracy of the KIPPPI was evaluated against the CBCL by calculating the Receiver Operating Characteristic (ROC) curves. The clinical application was evaluated by the relation between KIPPPI scores and the clinical decision made by the child health professionals. Results: Psychometric properties of the KIPPPI Total score, Wellbeing scale, Competence scale and Autonomy scale were respectively: Cronbach’s alphas: 0.88, 0.86, 0.83, 0.58. Test-retest correlations: 0.80, 0.76, 0.73, 0.60. Concurrent validity was as hypothesised. The KIPPPI was able to discriminate between parents who worried about their child and parents who did not. Screening accuracy was high (>0.90) for the KIPPPI Total score and for the Wellbeing scale. The KIPPPI scale scores and clinical decision of the child health professional were related (p<0.05), indicating a good clinical application. Conclusion: The results in this large-scale study of a diverse general population sample support the reliability, validity and clinical application of the KIPPPI Total score, Wellbeing scale and Competence scale.
Also, the screening accuracy of the KIPPPI Total score and Wellbeing scale was supported. The Autonomy scale needs further study. PMID:23185388
Development of a Culturally Valid Counselor Burnout Inventory for Korean Counselors
ERIC Educational Resources Information Center
Yu, Kumlan; Lee, Sang Min; Nesbit, Elisabeth A.
2008-01-01
This article describes the development of the culturally valid Counselor Burnout Inventory. A multistage approach including item translation; item refinement; and evaluation of factorial validity, reliability, and score validity was used to test constructs and validation. Implications for practice and future research are discussed. (Contains 3…
ERIC Educational Resources Information Center
Cannon, Joanna E.; Hubley, Anita M.
2014-01-01
Content validation is a crucial, but often neglected, component of good test development. In the present study, content validity evidence was collected to determine the degree to which elements (e.g., grammatical structures, items, picture responses, administration, and scoring instructions) of the Comprehension of Written Grammar (CWG) test are…
Amin, Amit P; Nathan, Sandeep; Vassallo, Patricia; Calvin, James E
2009-05-20
To emphasize the importance of troponin in the context of a new score for risk stratifying acute coronary syndromes (ACS) patients. Although troponins have powerful prognostic value, current ACS scores do not fully capitalize on this prognostic ability. Here, we weigh troponin status in a multiplicative manner to develop the TRACS score from previously published Rush score risk factors (RRF). 2,866 ACS patients (46.7% troponin positive) from 9 centers comprising the TRACS registry were randomly split into derivation (n=1,422) and validation (n=1,444) cohorts. In the derivation sample, RRF sum was multiplied by 3 if troponins were positive to yield the TRACS score, which was grouped into five categories of 0-2, 3-5, 6-8, 9-11, 12-15 (multiples of 3). Predictive performance of this score to predict hospital death was ascertained in the validation sample. The TRACS score had ROC AUC of 0.71 in the validation cohort. Logistic regression, Kaplan-Meier analysis, likelihood-ratio and Bayesian Information Criterion (BIC) test indicated that weighing troponin status with 3 in the TRACS score improved the prediction of mortality. Hosmer-Lemeshow test indicated sound model fit. We demonstrate that weighing troponin as a multiple of 3 yields robust prognostication of hospital mortality in ACS patients, when used in the context of the TRACS score.
Amin, Amit P; Nathan, Sandeep; Vassallo, Patricia; Calvin, James E
2009-01-01
Structured Abstract Objective: To emphasize the importance of troponin in the context of a new score for risk stratifying acute coronary syndromes (ACS) patients. Although troponins have powerful prognostic value, current ACS scores do not fully capitalize on this prognostic ability. Here, we weigh troponin status in a multiplicative manner to develop the TRACS score from previously published Rush score risk factors (RRF). Methods: 2,866 ACS patients (46.7% troponin positive) from 9 centers comprising the TRACS registry were randomly split into derivation (n=1,422) and validation (n=1,444) cohorts. In the derivation sample, RRF sum was multiplied by 3 if troponins were positive to yield the TRACS score, which was grouped into five categories of 0-2, 3-5, 6-8, 9-11, 12-15 (multiples of 3). Predictive performance of this score to predict hospital death was ascertained in the validation sample. Results: The TRACS score had ROC AUC of 0.71 in the validation cohort. Logistic regression, Kaplan-Meier analysis, likelihood-ratio and Bayesian Information Criterion (BIC) test indicated that weighing troponin status with 3 in the TRACS score improved the prediction of mortality. Hosmer-Lemeshow test indicated sound model fit. Conclusions: We demonstrate that weighing troponin as a multiple of 3 yields robust prognostication of hospital mortality in ACS patients, when used in the context of the TRACS score. PMID:19557150
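The scoring rule stated in this abstract (RRF sum multiplied by 3 when troponin is positive, then grouped into five bands) can be written out as a short sketch. The function names are hypothetical, and the sketch assumes the RRF sum is an integer from 0 to 5, which is inferred from the stated maximum of 15 rather than given in the abstract. It is an illustration of the arithmetic only, not a clinical tool.

```python
# Score bands as listed in the abstract.
TRACS_BANDS = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 15)]


def tracs_score(rrf_sum, troponin_positive):
    """Multiply the Rush risk factor sum by 3 when troponin is positive."""
    if not 0 <= rrf_sum <= 5:  # assumed range, inferred from the 0-15 bands
        raise ValueError("rrf_sum is assumed to lie between 0 and 5")
    return rrf_sum * 3 if troponin_positive else rrf_sum


def tracs_category(score):
    """Return the band label (for example '3-5') that contains the score."""
    for low, high in TRACS_BANDS:
        if low <= score <= high:
            return f"{low}-{high}"
    raise ValueError("score outside the 0-15 range")


print(tracs_score(4, True), tracs_category(tracs_score(4, True)))
print(tracs_score(4, False), tracs_category(tracs_score(4, False)))
```

The same risk factor count lands in a much higher band when troponin is positive, which is the multiplicative weighting the abstract describes.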
ERIC Educational Resources Information Center
Kramer, Gene A.; Johnston, JoElle
1997-01-01
A study examined the relationship between Optometry Admission Test scores and pre-optometry or undergraduate grade point average (GPA) with first and second year performance in optometry schools. The test's predictive validity was limited but significant, and comparable to those reported for other admission tests. In addition, the scores…
[Development and validation of the Visual Analogue Scale (VAS) Spine Score].
Knop, C; Oeser, M; Bastian, L; Lange, U; Zdichavsky, M; Blauth, M
2001-06-01
The aim of the study was the development and validation of a new subjective rating scale for assessment of outcome in patients with thoracolumbar fractures and fracture dislocations. The VAS spine score consists of 19 score items, using 100-mm visual analogue scales. The items are answered by the patients independently of rater assessment. To measure the analogue scales and calculate the score, a computer-aided system was evolved consisting of self-developed software and digitizer board. The overall score is the mean of all items answered with values between 0 and 100. The individual score loss is calculated as the difference between the preinjury score and at follow-up with values between 0 and 100. The VAS spine score was tested for reliability with a group of 136 healthy volunteers. We performed a test-retest study with an interval of 24 h. For statistical analysis of the validity, we prospectively followed a group of 53 patients with the new outcome score. We chose patients with injuries of the thoracolumbar spine, all having been operatively treated by combined posterior-anterior stabilization and fusion between 1994 and 1996. In the reference group, the average test score was 91.95 (58-100) and 92.10 (58-100) at retest. The mean individual difference between test and retest scored 1.037 (0-8). A high reliability was proved by a strong correlation with a coefficient of 0.976 (p < 0.001). A high internal consistency of the VAS spine score was shown by a Cronbach-alpha of 0.9117. The mean score for the preinjury status of the patients was comparable to the reference group, amounting to 89.60 (21-100). The mean score at the time of implant removal was significantly (p < 0.001) decreased to 58.25 (13-97). Until the time of follow-up a significant (p < 0.001) increase was noted, and the group scored 66.08 (15-100) at follow-up. This was a significant (p < 0.001) difference compared with the preinjury status. The individual score loss averaged 24.1 (0-80). 
In the patient group we also noted a Cronbach-alpha > 0.95, indicating a high internal consistency. With the VAS spine score the authors have inaugurated a new tool for outcome measurement in the treatment of patients with thoracolumbar injuries. The study has proved the score to be both reliable and valid. The application of the score is helpful in analyzing the subjective outcome, and the results can be correlated with objective measures. The score is a useful tool for comparative clinical studies, addressing the outcome after different methods of treatment.
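The score arithmetic described in this abstract (overall score as the mean of the answered items, score loss as preinjury score minus follow-up score) can be sketched as below. The function names and item values are hypothetical, and representing unanswered items as `None` is an assumption; the abstract does not say how the original software handled a follow-up score above the preinjury score, so the sketch leaves the difference unclamped.

```python
def vas_spine_score(item_values):
    """Mean of all answered items, each measured on a 0-100 mm analogue scale.

    Unanswered items are represented as None (an assumption of this sketch).
    """
    answered = [v for v in item_values if v is not None]
    if not answered:
        raise ValueError("at least one item must be answered")
    return sum(answered) / len(answered)


def score_loss(preinjury_score, follow_up_score):
    """Individual score loss: preinjury score minus follow-up score."""
    return preinjury_score - follow_up_score


# Hypothetical patient with four of the 19 items shown (illustration only).
pre = vas_spine_score([90, 80, None, 100])
post = vas_spine_score([70, 60, 65, 69])
print(pre, post, score_loss(pre, post))
```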
Iversen, J V; Bartels, E M; Jørgensen, J E; Nielsen, T G; Ginnerup, C; Lind, M C; Langberg, H
2016-12-01
The VISA-A questionnaire has proven to be a valid and reliable tool for assessing severity of Achilles tendinopathy (AT). The aim was to translate and cross-culturally adapt the VISA-A questionnaire for a Danish-speaking AT population, and subsequently perform validity and reliability tests. Translation and subsequent cross-cultural adaptation were performed as translation, synthesis, reverse translation, expert review, and pretesting. The final Danish version (VISA-A-DK) was tested for reliability on healthy controls (n = 75) and patients (n = 36). Tests for internal consistency, validity, and structure were performed on 71 patients. VISA-A-DK showed good reliability for patients (r = 0.80, ICC = 0.79) and healthy individuals (r = 0.98, ICC = 0.97). Internal consistency was 0.73 (Cronbach's alpha). The mean VISA-A-DK score in AT patients was 51 (47-55). This was significantly lower than that of healthy controls, with a score of 93 (90-95). Criterion validity was considered good when comparing the scores of the Danish version with the original version in both healthy individuals and patients. VISA-A-DK is a valid and reliable instrument and has been shown to be compatible with the original version in the assessment of AT patients. VISA-A-DK is a useful tool in the assessment of AT, both in research and in a clinical setting. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
2011-01-01
Background: Since stress is hypothesized to play a role in the etiology of obesity during adolescence, research on associations between adolescent stress and obesity-related parameters and behaviours is essential. Due to lack of a well-established recent stress checklist for use in European adolescents, the study investigated the reliability and validity of the Adolescent Stress Questionnaire (ASQ) for assessing perceived stress in European adolescents. Methods: The ASQ was translated into the languages of the participating cities (Ghent, Stockholm, Vienna, Zaragoza, Pecs and Athens) and was implemented within the HELENA cross-sectional study. A total of 1140 European adolescents provided a valid ASQ, comprising 10 component scales, used for internal reliability (Cronbach α) and construct validity (confirmatory factor analysis or CFA). Contributions of socio-demographic (gender, age, pubertal stage, socio-economic status) characteristics to the ASQ score variances were investigated. Two hundred adolescents also provided valid saliva samples for cortisol analysis to compare with the ASQ scores (criterion validity). Test-retest reliability was investigated using two ASQ assessments from 37 adolescents. Results: Cronbach α-values of the ASQ scales (0.57 to 0.88) demonstrated a moderate internal reliability of the ASQ, and intraclass correlation coefficients (0.45 to 0.84) established an insufficient test-retest reliability of the ASQ. The adolescents' gender (girls had higher stress scores than boys) and pubertal stage (those in a post-pubertal development had higher stress scores than others) significantly contributed to the variance in ASQ scores, while their age and socio-economic status did not. CFA results showed that the original scale construct fitted moderately with the data in our European adolescent population.
Only in boys, four out of 10 ASQ scale scores were a significant positive predictor for baseline wake-up salivary cortisol, suggesting a rather poor criterion validity of the ASQ, especially in girls. Conclusions: In our European adolescent sample, the ASQ had an acceptable internal reliability and construct validity and the adolescents' gender and pubertal stage systematically contributed to the ASQ variance, but its test-retest reliability and criterion validity were rather poor. Overall, the utility of the ASQ for assessing perceived stress in adolescents across Europe is uncertain and some aspects require further examination. PMID:21943341
Translation and Validation of the Knee Society Score - KSS for Brazilian Portuguese
Silva, Adriana Lucia Pastore e; Demange, Marco Kawamura; Gobbi, Riccardo Gomes; da Silva, Tânia Fernanda Cardoso; Pécora, José Ricardo; Croci, Alberto Tesconi
2012-01-01
Objective: To translate, culturally adapt and validate the "Knee Society Score" (KSS) for the Portuguese language and determine its measurement properties, reproducibility and validity. Methods: We analyzed 70 patients of both sexes, aged between 55 and 85 years, in a cross-sectional clinical trial, with diagnosis of primary osteoarthritis, undergoing total knee arthroplasty surgery. We assessed the patients with the English version of the KSS questionnaire and after 30 minutes with the Portuguese version of the KSS questionnaire, done by a different evaluator. All the patients were assessed preoperatively, and again at three and six months postoperatively. Results: There was no statistical difference, using Cronbach's alpha index and the Bland-Altman graphical analysis, for the knee score during the preoperative period (p = 1), and at three months (p = 0.991) and six months postoperatively (p = 0.985). There was no statistical difference for knee function score for all three periods (p = 1.0). Conclusion: The Brazilian version of the Knee Society Score is easy to apply, as well as providing a valid and reliable instrument for measuring the knee score and function of Brazilian patients undergoing TKA. Level of Evidence: Level I - Diagnostic Studies - Investigating a Diagnostic Test - Testing of previously developed diagnostic criteria on consecutive patients (with universally applied 'gold' reference standard). PMID:24453576
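The Bland-Altman analysis mentioned in this abstract compares paired measurements by looking at their differences. The sketch below computes the usual summary numbers (mean difference and 95% limits of agreement); it is a generic illustration, not the authors' analysis, and the function name and paired scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean difference; the limits of agreement are
    bias +/- 1.96 * sample standard deviation of the differences.
    """
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


# Hypothetical knee scores from two questionnaire versions (illustration only).
english_version = [80, 85, 90, 70]
portuguese_version = [78, 86, 91, 69]
print(bland_altman(english_version, portuguese_version))
```

A bias close to zero with narrow limits of agreement suggests the two versions can be used interchangeably; the graphical form plots each difference against the pair's mean.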
Simulation-based assessment in anesthesiology: requirements for practical implementation.
Boulet, John R; Murray, David J
2010-04-01
Simulations have taken a central role in the education and assessment of medical students, residents, and practicing physicians. The introduction of simulation-based assessments in anesthesiology, especially those used to establish various competencies, has demanded fairly rigorous studies concerning the psychometric properties of the scores. Most important, major efforts have been directed at identifying, and addressing, potential threats to the validity of simulation-based assessment scores. As a result, organizations that wish to incorporate simulation-based assessments into their evaluation practices can access information regarding effective test development practices, the selection of appropriate metrics, the minimization of measurement errors, and test score validation processes. The purpose of this article is to provide a broad overview of the use of simulation for measuring physician skills and competencies. For simulations used in anesthesiology, studies that describe advances in scenario development, the development of scoring rubrics, and the validation of assessment results are synthesized. Based on the summary of relevant research, psychometric requirements for practical implementation of simulation-based assessments in anesthesiology are forwarded. As technology expands, and simulation-based education and evaluation takes on a larger role in patient safety initiatives, the groundbreaking work conducted to date can serve as a model for those individuals and organizations that are responsible for developing, scoring, or validating simulation-based education and assessment programs in anesthesiology.
Objectivity, Reliability, and Validity of the Bent-Knee Push-Up for College-Age Women
ERIC Educational Resources Information Center
Wood, Heather M.; Baumgartner, Ted A.
2004-01-01
The revised push-up test has been found to have good validity but it produces many zero scores for women. Maybe there should be an alternative to the revised push-up test for college-age women. The purpose of this study was to determine the objectivity, reliability, and validity for the bent-knee push-up test (executed on hands and knees) for…
A score to estimate the likelihood of detecting advanced colorectal neoplasia at colonoscopy
Kaminski, Michal F; Polkowski, Marcin; Kraszewska, Ewa; Rupinski, Maciej; Butruk, Eugeniusz; Regula, Jaroslaw
2014-01-01
Objective This study aimed to develop and validate a model to estimate the likelihood of detecting advanced colorectal neoplasia in Caucasian patients. Design We performed a cross-sectional analysis of database records for 40-year-old to 66-year-old patients who entered a national primary colonoscopy-based screening programme for colorectal cancer in 73 centres in Poland in the year 2007. We used multivariate logistic regression to investigate the associations between clinical variables and the presence of advanced neoplasia in a randomly selected test set, and confirmed the associations in a validation set. We used model coefficients to develop a risk score for detection of advanced colorectal neoplasia. Results Advanced colorectal neoplasia was detected in 2544 of the 35 918 included participants (7.1%). In the test set, a logistic-regression model showed that independent risk factors for advanced colorectal neoplasia were: age, sex, family history of colorectal cancer, cigarette smoking (p<0.001 for these four factors), and Body Mass Index (p=0.033). In the validation set, the model was well calibrated (ratio of expected to observed risk of advanced neoplasia: 1.00 (95% CI 0.95 to 1.06)) and had moderate discriminatory power (c-statistic 0.62). We developed a score that estimated the likelihood of detecting advanced neoplasia in the validation set, from 1.32% for patients scoring 0, to 19.12% for patients scoring 7–8. Conclusions Developed and internally validated score consisting of simple clinical factors successfully estimates the likelihood of detecting advanced colorectal neoplasia in asymptomatic Caucasian patients. Once externally validated, it may be useful for counselling or designing primary prevention studies. PMID:24385598
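The validation statistics named in this abstract can be illustrated with a small sketch. The Python below is not the authors' code. The function names and the toy scores are hypothetical, chosen only to show how a c-statistic (the probability that a randomly chosen patient with advanced neoplasia has a higher score than one without) and an expected-to-observed ratio are commonly computed. The only figures taken from the abstract are 2544 and 35 918.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance: share of case/control pairs where the case scores higher.

    Tied pairs count as half-concordant. `outcomes` holds 1 (case) or 0 (control).
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordance = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case, control in product(cases, controls)
    )
    return concordance / (len(cases) * len(controls))


def expected_observed_ratio(predicted_risks, outcomes):
    """Calibration-in-the-large: expected events divided by observed events."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(predicted_risks) / observed


# Hypothetical toy data (NOT from the study): six patients, three with the outcome.
toy_scores = [0, 1, 2, 3, 4, 5]
toy_outcomes = [0, 0, 1, 0, 1, 1]
toy_c = c_statistic(toy_scores, toy_outcomes)

# Figures from the abstract: 2544 detections among 35 918 participants.
detection_rate_percent = round(100 * 2544 / 35918, 1)
```

A c-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which is why the reported 0.62 is described as moderate. An expected-to-observed ratio near 1.00 indicates that predicted and observed risks agree on average.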
A score to estimate the likelihood of detecting advanced colorectal neoplasia at colonoscopy.
Kaminski, Michal F; Polkowski, Marcin; Kraszewska, Ewa; Rupinski, Maciej; Butruk, Eugeniusz; Regula, Jaroslaw
2014-07-01
This study aimed to develop and validate a model to estimate the likelihood of detecting advanced colorectal neoplasia in Caucasian patients. We performed a cross-sectional analysis of database records for 40-year-old to 66-year-old patients who entered a national primary colonoscopy-based screening programme for colorectal cancer in 73 centres in Poland in the year 2007. We used multivariate logistic regression to investigate the associations between clinical variables and the presence of advanced neoplasia in a randomly selected test set, and confirmed the associations in a validation set. We used model coefficients to develop a risk score for detection of advanced colorectal neoplasia. Advanced colorectal neoplasia was detected in 2544 of the 35,918 included participants (7.1%). In the test set, a logistic-regression model showed that independent risk factors for advanced colorectal neoplasia were: age, sex, family history of colorectal cancer, cigarette smoking (p<0.001 for these four factors), and Body Mass Index (p=0.033). In the validation set, the model was well calibrated (ratio of expected to observed risk of advanced neoplasia: 1.00 (95% CI 0.95 to 1.06)) and had moderate discriminatory power (c-statistic 0.62). We developed a score that estimated the likelihood of detecting advanced neoplasia in the validation set, from 1.32% for patients scoring 0, to 19.12% for patients scoring 7-8. Developed and internally validated score consisting of simple clinical factors successfully estimates the likelihood of detecting advanced colorectal neoplasia in asymptomatic Caucasian patients. Once externally validated, it may be useful for counselling or designing primary prevention studies.
Pittman, Joyce; Beeson, Terrie; Terry, Colin; Dillon, Jill; Hampton, Charity; Kerley, Denise; Mosier, Judith; Gumiela, Ellen; Tucker, Jessica
2016-01-01
Despite prevention strategies, hospital-acquired pressure ulcers (HAPUs) continue to occur in the acute care setting. The purpose of this study was to develop an operational definition of and an instrument for identifying avoidable/unavoidable HAPUs in the acute care setting. The Indiana University Health Pressure Ulcer Prevention Inventory (PUPI) was developed and psychometric testing was performed. A retrospective pilot study of 31 adult hospitalized patients with an HAPU was conducted using the PUPI. Overall content validity index of 0.99 and individual item content validity index scores (0.9-1.0) demonstrated excellent content validity. Acceptable PUPI criterion validity was demonstrated with no statistically significant differences between wound specialists' and other panel experts' scoring. Construct validity findings were acceptable with no statistically significant differences among avoidable or unavoidable HAPU patients and their Braden Scale total scores. Interrater reliability was acceptable with perfect agreement on the total PUPI score between raters (κ = 1.0; P = .025). Raters were in total agreement 93% (242/260) of the time on all 12 individual PUPI items. No risk factors were found to be significantly associated with unavoidable HAPUs. An operational definition of and an instrument for identifying avoidable/unavoidable HAPUs in the acute care setting were developed and tested. The instrument provides an objective and structured method for identifying avoidable/unavoidable HAPUs. The PUPI provides an additional method that could be used in root-cause analyses and when reporting adverse pressure ulcer events.
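As a rough illustration of the agreement statistics mentioned in this abstract, the sketch below computes a simple percent agreement and an unweighted Cohen's kappa for two raters. It is a generic textbook formulation, not the authors' analysis. The function names and the toy ratings are hypothetical, and only the counts 242 and 260 come from the abstract.

```python
from collections import Counter


def percent_agreement(agreements, total):
    """Share of rating decisions on which the raters agreed, in percent."""
    return 100 * agreements / total


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must judge the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Item-level agreement reported in the abstract: 242 of 260 ratings.
item_agreement = round(percent_agreement(242, 260))

# Hypothetical toy ratings (NOT study data): 1 = avoidable, 0 = unavoidable.
toy_kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Kappa corrects raw agreement for the agreement expected by chance, so two raters who always give identical ratings across more than one category obtain a kappa of 1.0.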
Lasebikan, Victor Olufolahan
2012-01-01
Objective. To validate the Yoruba version of Family Burden Interview Schedule (Y-FBIS) for assessing the burden on caregivers of persons with schizophrenia. Methods. Three hundred and sixty-eight dyads of persons with schizophrenia and their caregivers were recruited from a psychiatric outpatient clinic. The Y-FBIS and the Yoruba version of the GHQ-12 (Y-GHQ-12) were applied to the caregivers. Patients' level of social functioning was assessed using the Global Assessment of Functioning scale. Results. All (368) caregivers were used for tests of internal consistency, 180 for interrater reliability, and another 180 for test-retest reliability. Internal consistency of the Y-FBIS was demonstrated by a significant Cronbach α of between 0.62 and 0.82 for each item. Concurrent validity of the Y-FBIS was illustrated by its significant positive correlation with Y-GHQ-12 (r = 0.633, P < 0.01). Split-half reliability was 0.849. Intraclass correlation coefficient for the total score of Y-FBIS was 0.849 at 95% confidence interval. Test-retest reliability of individual scales ranged from 0.780 to 0.874 and was 0.830 for total objective scale score. Convergent validity was shown by the significant positive correlation (r = 0.83) between the objective burden score and subjective burden score of Y-FBIS. ROC curve area was 0.981. Conclusion. The Y-FBIS is a valid, reliable, and sensitive instrument for assessing the burden on caregivers of persons with schizophrenia in Nigeria. PMID:23738196
Development and validation of a food-based diet quality index for New Zealand adolescents
2013-01-01
Background As there is no population-specific, simple food-based diet index suitable for examination of diet quality in New Zealand (NZ) adolescents, there is a need to develop such a tool. Therefore, this study aimed to develop an adolescent-specific diet quality index based on dietary information sourced from a Food Questionnaire (FQ) and examine its validity relative to a four-day estimated food record (4DFR) obtained from a group of adolescents aged 14 to 18 years. Methods A diet quality index for NZ adolescents (NZDQI-A) was developed based on ‘Adequacy’ and ‘Variety’ of five food groups reflecting the New Zealand Food and Nutrition Guidelines for Healthy Adolescents. The NZDQI-A was scored from zero to 100, with a higher score reflecting a better diet quality. Forty-one adolescents (16 males, 25 females, aged 14–18 years) each completed the FQ and a 4DFR. The test-retest reliability of the FQ-derived NZDQI-A scores over a two-week period and the relative validity of the scores compared to the 4DFR were estimated using Pearson’s correlations. Construct validity was examined by comparing NZDQI-A scores against nutrient intakes obtained from the 4DFR. Results The NZDQI-A derived from the FQ showed good reliability (r = 0.65) and reasonable agreement with 4DFR in ranking participants by scores (r = 0.39). More than half of the participants were classified into the same thirds of scores while 10% were misclassified into the opposite thirds by the two methods. Higher NZDQI-A scores were also associated with lower total fat and saturated fat intakes and higher iron intakes. Conclusions Higher NZDQI-A scores were associated with more desirable fat and iron intakes. The scores derived from either FQ or 4DFR were comparable and reproducible when repeated within two weeks. The NZDQI-A is relatively valid and reliable in ranking diet quality in adolescents at a group level even in a small sample size. 
Further studies are required to test the predictive validity of this food-based diet index in larger samples. PMID:23759064
Alsalaheen, Bara; Haines, Jamie; Yorke, Amy; Broglio, Steven P
2015-12-01
Objective To examine the reliability and the convergent and discriminant validity of the limits of stability (LOS) test to assess dynamic postural stability in adolescents using a portable forceplate system. Design Cross-sectional reliability observational study. Setting School setting. Participants Adolescents (N=36) completed all measures during the first session. To examine the reliability of the LOS test, a subset of 15 participants repeated the LOS test after 1 week. Interventions Not applicable. Main Outcome Measures Outcome measurements included the LOS test, Balance Error Scoring System, Instrumented Balance Error Scoring System, and Modified Clinical Test for Sensory Interaction on Balance. Results A significant relation was observed among LOS composite scores (r=.36-.87, P<.05). However, no relation was observed between LOS and static balance outcome measurements. The reliability of the LOS composite scores ranged from moderate to good (intraclass correlation coefficient model 2,1=.73-.96). Conclusions The results suggest that the LOS composite scores provide unique information about dynamic postural stability, and the LOS test completed at 100% of the theoretical limit appeared to be a reliable test of dynamic postural stability in adolescents. Clinicians should use dynamic balance measurement as part of their balance assessment and should not use static balance testing (eg, Balance Error Scoring System) to make inferences about dynamic balance, especially when balance assessment is used to determine rehabilitation outcomes, or when making return to play decisions after injury.
Tarescavage, Anthony M; Wygant, Dustin B; Gervais, Roger O; Ben-Porath, Yossef S
2013-01-01
The current study examined the over-reporting Validity Scales of the MMPI-2 Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011) in relation to the Slick, Sherman, and Iverson (1999) criteria for the diagnosis of Malingered Neurocognitive Dysfunction in a sample of 916 consecutive non-head injury disability claimants. The classification of Malingered Neurocognitive Dysfunction was based on scores from several cognitive symptom validity tests and response bias indicators built into traditional neuropsychological tests. Higher scores on MMPI-2-RF Validity Scales, particularly the Response Bias Scale (Gervais, Ben-Porath, Wygant, & Green, 2007), were associated with probable and definite Malingered Neurocognitive Dysfunction. The MMPI-2-RF's Validity Scales classification accuracy of Malingered Neurocognitive Dysfunction improved when multiple scales were interpreted. Additionally, higher scores on MMPI-2-RF substantive scales measuring distress, internalizing dysfunction, thought dysfunction, and social avoidance were associated with probable and definite Malingered Neurocognitive Dysfunction. Implications for clinical practice and future directions are noted.
Zhao, Xiaohui; Oppler, Scott; Dunleavy, Dana; Kroopnick, Marc
2010-10-01
This study investigated the validity of four approaches (the average, most recent, highest-within-administration, and highest-across-administration approaches) of using repeaters' Medical College Admission Test (MCAT) scores to predict Step 1 scores. Using the differential prediction method, this study investigated the magnitude of differences in the expected Step 1 total scores between MCAT nonrepeaters and three repeater groups (two-time, three-time, and four-time test takers) for the four scoring approaches. For the average score approach, matriculants with the same MCAT average are expected to achieve similar Step 1 total scores regardless of whether the individual attempted the MCAT exam one or multiple times. For the other three approaches, repeaters are expected to achieve lower Step 1 scores than nonrepeaters; for a given MCAT score, as the number of attempts increases, the expected Step 1 score decreases. The effect was strongest for the highest-across-administration approach, followed by the highest-within-administration approach, and then the most recent approach. Using the average score is the best approach for considering repeaters' MCAT scores in medical school admission decisions.
Cross-cultural validity of a dietary questionnaire for studies of dental caries risk in Japanese
2014-01-01
Background Diet is a major modifiable contributing factor in the etiology of dental caries. The purpose of this paper is to examine the reliability and cross-cultural validity of the Japanese version of the Food Frequency Questionnaire to assess dietary intake in relation to dental caries risk in Japanese. Methods The 38-item Food Frequency Questionnaire, in which Japanese food items were added to increase content validity, was translated into Japanese, and administered to two samples. The first sample comprised 355 pregnant women with mean age of 29.2 ± 4.2 years for the internal consistency and criterion validity analyses. Factor analysis (principal components with Varimax rotation) was used to determine dimensionality. The dietary cariogenicity score was calculated from the Food Frequency Questionnaire and used for the analyses. Salivary mutans streptococci level was used as a semi-quantitative assessment of dental caries risk and measured by Dentocult SM. Dentocult SM scores were compared with the dietary cariogenicity score computed from the Food Frequency Questionnaire to examine criterion validity, and assessed by Spearman’s correlation coefficient (rs) and Kruskal-Wallis test. Test-retest reliability of the Food Frequency Questionnaire was assessed with a second sample of 25 adults with mean age of 34.0 ± 3.0 years by using the intraclass correlation coefficient analysis. Results The Japanese language version of the Food Frequency Questionnaire showed high test-retest reliability (ICC = 0.70) and good criterion validity assessed by relationship with salivary mutans streptococci levels (rs = 0.22; p < 0.001). Factor analysis revealed four subscales that make up the questionnaire (solid sugars, solid and starchy sugars, liquid and semisolid sugars, sticky and slowly dissolving sugars). Internal consistency was low to acceptable (Cronbach’s alpha = 0.67 for the total scale, 0.46-0.61 for each subscale). 
Mean dietary cariogenicity scores were 50.8 ± 19.5 in the first sample, 47.4 ± 14.1, and 40.6 ± 11.3 for the first and second administrations in the second sample. The distribution of Dentocult SM score was 6.8% (score = 0), 34.4% (score = 1), 39.4% (score = 2), and 19.4% (score = 3). Participants with higher scores were more likely to have higher dietary cariogenicity scores (p < 0.001; Kruskal-Wallis test). Conclusions These results provide the preliminary evidence for the reliability and validity of the Japanese language Food Frequency Questionnaire. PMID:24383547
The Pareidolia Test: A Simple Neuropsychological Test Measuring Visual Hallucination-Like Illusions.
Mamiya, Yasuyuki; Nishio, Yoshiyuki; Watanabe, Hiroyuki; Yokoi, Kayoko; Uchiyama, Makoto; Baba, Toru; Iizuka, Osamu; Kanno, Shigenori; Kamimura, Naoto; Kazui, Hiroaki; Hashimoto, Mamoru; Ikeda, Manabu; Takeshita, Chieko; Shimomura, Tatsuo; Mori, Etsuro
2016-01-01
Visual hallucinations are a core clinical feature of dementia with Lewy bodies (DLB), and this symptom is important in the differential diagnosis and prediction of treatment response. The pareidolia test is a tool that evokes visual hallucination-like illusions, and these illusions may be a surrogate marker of visual hallucinations in DLB. We created a simplified version of the pareidolia test and examined its validity and reliability to establish the clinical utility of this test. The pareidolia test was administered to 52 patients with DLB, 52 patients with Alzheimer's disease (AD) and 20 healthy controls (HCs). We assessed the test-retest/inter-rater reliability using the intra-class correlation coefficient (ICC) and the concurrent validity using the Neuropsychiatric Inventory (NPI) hallucinations score as a reference. A receiver operating characteristic (ROC) analysis was used to evaluate the sensitivity and specificity of the pareidolia test to differentiate DLB from AD and HCs. The pareidolia test required approximately 15 minutes to administer, exhibited good test-retest/inter-rater reliability (ICC of 0.82), and moderately correlated with the NPI hallucinations score (rs = 0.42). Using an optimal cut-off score set according to the ROC analysis, the pareidolia test differentiated DLB from AD with a sensitivity of 81% and a specificity of 92%. Our study suggests that the simplified version of the pareidolia test is a valid and reliable surrogate marker of visual hallucinations in DLB.
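To make the sensitivity and specificity figures in this abstract concrete, the following sketch shows how the two quantities are usually derived once a cut-off has been chosen. It is a generic illustration, not the authors' procedure. The function name, the toy scores, the cut-off of 2, and the assumption that a higher score indicates the target condition are all hypothetical.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Classify score >= cutoff as test-positive and compare with the true status.

    Assumes (hypothetically) that higher scores point toward the condition.
    Returns (sensitivity, specificity) as fractions between 0 and 1.
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for score, positive in zip(scores, has_condition):
        test_positive = score >= cutoff
        if positive and test_positive:
            true_pos += 1
        elif positive:
            false_neg += 1
        elif test_positive:
            false_pos += 1
        else:
            true_neg += 1
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one subject with and one without the condition")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical toy data (NOT study data): five patients with the condition, four without.
toy_scores = [1, 3, 5, 6, 7, 0, 1, 4, 1]
toy_status = [True, True, True, True, True, False, False, False, False]
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_status, cutoff=2)
```

An ROC analysis repeats this calculation over every candidate cut-off. The "optimal" cut-off is then the one that best balances the two quantities under whatever criterion the analyst chose, which the abstract does not specify.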
Ebrahimzadeh, Mohammad H; Birjandinejad, Ali; Razi, Shiva; Mardani-Kivi, Mohsen; Reza Kachooei, Amir
2015-09-01
The Oxford Shoulder Score (OSS) is a specific 12-item patient-reported tool for evaluation of patients with inflammatory and degenerative disorders of the shoulder. Since its introduction, it has been translated and culturally adapted in some Western and Eastern countries. The aim of this study was to translate the OSS into Persian and to test its validity and reliability in a Persian-speaking population in Iran. One hundred patients with a degenerative or inflammatory shoulder problem participated in the survey in 2012. All patients completed the Persian version of the OSS, the Persian DASH and the SF-36 for testing validity. Thirty-seven randomly selected patients filled out the Persian OSS again three days after the initial visit to assess the reliability of the questionnaire. Cronbach's alpha coefficient was 0.93. The intraclass correlation coefficient was 0.93. In terms of validity, there was a significant correlation between the Persian OSS and DASH and SF-36 scores (P < 0.001). The Persian version of the OSS proved to be a valid, reliable, and reproducible tool as demonstrated by high Cronbach's alpha and Pearson's correlation coefficients. The Persian version of the OSS can be administered to Persian-speaking patients with shoulder conditions and is understandable to them.
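Internal consistency of the kind reported here is usually summarised with Cronbach's alpha. The sketch below uses the standard textbook formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores). It is not the authors' code, and the function name and the toy responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of responses (one row per respondent).

    Each row lists that respondent's answers to the k questionnaire items.
    """
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    n_items = len(responses[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data (NOT study data): three respondents, three perfectly consistent items.
toy_alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

When every item ranks respondents identically, as in the toy data, alpha reaches its maximum of 1.0. Values such as the reported 0.93 indicate that the items vary together closely.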
Validation of the VISA-A questionnaire for Turkish language: the VISA-A-Tr study.
Dogramaci, Yunus; Kalaci, Aydiner; Kücükkübas, Nigar; Inandi, Taceddin; Esen, Erdinc; Yanat, A Nedim
2011-04-01
To evaluate the validity and reliability of the Turkish version of the Victorian Institute of Sports Assessment-Achilles (VISA-A) questionnaire for patients with Achilles tendinopathy. Fifty-five patients with a diagnosis of Achilles tendinopathy and 55 healthy subjects were included in the study. VISA-A questionnaires were translated and culturally adapted into Turkish. The final Turkish version (VISA-A-Tr) was tested for reliability on healthy individuals and patients. Tests for internal consistency, validity and structure were performed on 55 patients. The VISA-A-Tr showed good test-retest reliability (Pearson's r=0.99, p<0.001). The patients with Achilles tendinopathy had a significantly lower score (p<0.001) than the healthy individuals. The VISA-A-Tr score correlated significantly with the Stanish tendon grading system (Spearman's r=-0.86; p<0.001). The VISA-A-Tr is a valid and reliable tool for evaluating the severity of Achilles tendinopathy.
Does the Defining Issues Test measure ethical judgment ability or political position?
Bailey, Charles D
2011-01-01
This article addresses the construct validity of the Defining Issues Test of ethical judgment (DIT/DIT-2). Alleging a political bias in the test, Emler and colleagues (1983, 1998, 1999, 2007) show that conservatives score higher when asked to fake as liberals, implying that they understand the reasoning associated with "higher" moral development but avoid items they see as liberally biased. DIT proponents challenge the internal validity of faking studies, advocating an explained-variance validation. This study takes a new approach: Adult participants complete the DIT-2, then evaluate the raw responses of others to discern political orientation and ethical development. Results show that individuals scoring higher on the DIT-2 rank others' ethical judgment in a way consistent with DIT-2-based rankings. Accuracy at assessing political orientation, however, is low. Results support the DIT-2's validity as a measure of ethical development, not an expression of political position.
Validation of the Spanish Addiction Severity Index Multimedia Version (S-ASI-MV).
Butler, Stephen F; Redondo, José Pedro; Fernandez, Kathrine C; Villapiano, Albert
2009-01-01
This study aimed to develop and test the reliability and validity of a Spanish adaptation of the ASI-MV, a computer administered version of the Addiction Severity Index, called the S-ASI-MV. Participants were 185 native Spanish-speaking adult clients from substance abuse treatment facilities serving Spanish-speaking clients in Florida, New Mexico, California, and Puerto Rico. Participants were administered the S-ASI-MV as well as Spanish versions of the general health subscale of the SF-36, the work and family unit subscales of the Social Adjustment Scale Self-Report, the Michigan Alcohol Screening Test, the alcohol and drug subscales of the Personality Assessment Inventory, and the Hopkins Symptom Checklist-90. Three-to-five-day test-retest reliability was examined along with criterion validity, convergent/discriminant validity, and factorial validity. Measurement invariance between the English and Spanish versions of the ASI-MV was also examined. The S-ASI-MV demonstrated good test-retest reliability (ICCs for composite scores between .59 and .93), criterion validity (rs for composite scores between .66 and .87), and convergent/discriminant validity. Factorial validity and measurement invariance were demonstrated. These results compared favorably with those reported for the original interviewer version of the ASI and the English version of the ASI-MV.
Stone, Lisanne L; Janssens, Jan M A M; Vermulst, Ad A; Van Der Maten, Marloes; Engels, Rutger C M E; Otten, Roy
2015-01-01
The Strengths and Difficulties Questionnaire is one of the most employed screening instruments. Although there is a large research body investigating its psychometric properties, reliability and validity are not yet fully tested using modern techniques. Therefore, we investigate reliability, construct validity, measurement invariance, and predictive validity of the parent and teacher version in children aged 4-7. Besides, we intend to replicate previous studies by investigating test-retest reliability and criterion validity. In a Dutch community sample 2,238 teachers and 1,513 parents filled out questionnaires regarding problem behaviors and parenting, while 1,831 children reported on sociometric measures at T1. These children were followed-up during three consecutive years. Reliability was examined using Cronbach's alpha and McDonald's omega, construct validity was examined by Confirmatory Factor Analysis, and predictive validity was examined by calculating developmental profiles and linking these to measures of inadequate parenting, parenting stress and social preference. Further, mean scores and percentiles were examined in order to establish norms. Omega was consistently higher than alpha regarding reliability. The original five-factor structure was replicated, and measurement invariance was established on a configural level. Further, higher SDQ scores were associated with future indices of higher inadequate parenting, higher parenting stress and lower social preference. Finally, previous results on test-retest reliability and criterion validity were replicated. This study is the first to show SDQ scores are predictively valid, attesting to the feasibility of the SDQ as a screening instrument. Future research into predictive validity of the SDQ is warranted.
Development of the Daily Activities of Infants Scale: a measure supporting early motor development.
Bartlett, Doreen J; Fanning, Jamie Kneale; Miller, Linda; Conti-Becker, Angela; Doralp, Samantha
2008-08-01
We describe the development and preliminary psychometric testing of the Daily Activities of Infants Scale (DAIS), a parent-completed measure of opportunities parents provide infants for development of postural control and movement. First we obtained 1300 photographs of typical activities from 17 families with infants aged 4 to 11 months. Through consensus we established nine dimensions of activities, graded across three levels of opportunity for development. Pilot testing supported content validity of the DAIS. Subsequently, 50 parents of infants born preterm aged 4 to 11 months participated in psychometric testing. There were 25 male and 25 female infant participants with a mean gestational age of 29.4 weeks (SD 3.6) and a mean birthweight of 1266 grams (SD 635). We found that completion of the DAIS over 1 day was representative of data collected over 3 sequential days. Older infants obtained significantly higher DAIS scores than younger infants, providing preliminary evidence for discriminant validity. The DAIS scores demonstrated a part-correlation of 0.20 (p<0.01) with scores on the Alberta Infant Motor Scale obtained concurrently, providing some evidence for convergent validity. The intraclass correlation coefficients reflecting interrater reliability and test-retest reliability of the total DAIS score were 0.76 (95% confidence interval [CI] 0.60-0.86) and 0.77 (95% CI 0.60-0.87) respectively. The DAIS has sufficient reliability and validity for use in clinical practice and research.
The validation of the visual analogue scale for patient satisfaction after total hip arthroplasty.
Brokelman, Roy B G; Haverkamp, Daniel; van Loon, Corné; Hol, Annemiek; van Kampen, Albert; Veth, Rene
2012-06-01
INTRODUCTION: Patient satisfaction is becoming more important in our modern health care system. The assessment of satisfaction is difficult because it is a multifactorial item for which no gold standard exists. One of the potential methods of measuring satisfaction is by using the well-known visual analogue scale (VAS). In this study, we validated VAS for satisfaction. PATIENT AND METHODS: In this prospective study, we studied 147 patients (153 hips). The construct validity was measured using the Spearman correlation test that compares the satisfaction VAS with the Harris hip score, pain VAS at rest and during activity, Oxford hip score, Short Form 36 and Western Ontario McMaster Universities Osteoarthritis Index. The reliability was tested using the intra-class coefficient. RESULTS: The Pearson correlation test showed correlations in the range of 0.40-0.80. The satisfaction VAS had a high correlation with the pain VAS and Oxford hip score, which could mean that pain is one of the most important factors in patient satisfaction. The intra-class coefficient was 0.95. CONCLUSIONS: There is a moderate to marked degree of correlation between the satisfaction VAS and the currently available subjective and objective scoring systems. The intra-class coefficient of 0.95 indicates an excellent test-retest reliability. The VAS satisfaction is a simple instrument to quantify the satisfaction of a patient after total hip arthroplasty. In this study, we showed that the satisfaction VAS has a good validity and reliability.
Validity of the Dictionary of Occupational Titles for Assessing Upper Extremity Work Demands
Opsteegh, Lonneke; Soer, Remko; Reinders-Messelink, Heleen A.; Reneman, Michiel F.; van der Sluis, Corry K.
2010-01-01
Objectives The Dictionary of Occupational Titles (DOT) is used in vocational rehabilitation to guide decisions about the ability of a person with activity limitations to perform activities at work. The DOT has categorized physical work demands in five categories. The validity of this categorization is unknown. The aim of this study was to investigate whether the DOT could be used validly to guide decisions for patients with injuries to the upper extremities. Four hypotheses were tested. Methods A database including 701 healthy workers was used. All subjects filled out the Dutch Musculoskeletal Questionnaire, from which an Upper Extremity Work Demands score (UEWD) was derived. First, the relation between the DOT-categories and UEWD-score was analysed using Spearman correlations. Second, variance of the UEWD-score in occupational groups was tested by visually inspecting boxplots and assessing kurtosis of the distribution. Third, it was investigated whether occupations classified in one DOT-category could significantly differ on UEWD-scores. Fourth, it was investigated whether occupations in different DOT-categories could have similar UEWD-scores using Mann Whitney U-tests (MWU). Results The relation between the DOT-categories and the UEWD-score was weak (rsp = 0.40; p<.01). Overlap between categories was found. Kurtosis exceeded ±1.0 in 3 occupational groups, indicating large variance. UEWD-scores were significantly different within one DOT-category (MWU = 1.500; p<.001). UEWD scores between DOT-categories were not significantly different (MWU = 203.000; p = .49). Conclusion None of the four hypotheses could be rejected. The DOT appears to be invalid for assessing upper extremity work demands. PMID:21151934
The woodcock reading mastery test: impact of normative changes.
Pae, Hye Kyeong; Wise, Justin C; Cirino, Paul T; Sevcik, Rose A; Lovett, Maureen W; Wolf, Maryanne; Morris, Robin D
2005-09-01
This study examined the magnitude of differences in standard scores, convergent validity, and concurrent validity when an individual's performance was gauged using the revised and the normative update (Woodcock, 1998) editions of the Woodcock Reading Mastery Test in which the actual test items remained identical but norms have been updated. From three metropolitan areas, 899 first to third grade students referred by their teachers for a reading intervention program participated. Results showed the inverse Flynn effect, indicating systematic inflation averaging 5 to 9 standard score points, regardless of gender, IQ, city site, or ethnicity, when calculated using the updated norms. Inflation was greater at lower raw score levels. Implications for using the updated norms for identifying children with reading disabilities and changing norms during an ongoing study are discussed.
Validation of the Pediatric Cardiac Quality of Life Inventory
Marino, Bradley S.; Tomlinson, Ryan S.; Wernovsky, Gil; Drotar, Dennis; Newburger, Jane W.; Mahony, Lynn; Mussatto, Kathleen; Tong, Elizabeth; Cohen, Mitchell; Andersen, Charlotte; Shera, David; Khoury, Philip R.; Wray, Jo; Gaynor, J. William; Helfaer, Mark A.; Kazak, Anne E.; Shea, Judy A.
2012-01-01
OBJECTIVE The purpose of this multicenter study was to confirm the validity and reliability of the Pediatric Cardiac Quality of Life Inventory (PCQLI). METHODS Seven centers recruited pediatric patients (8–18 years of age) with heart disease (HD) and their parents to complete the PCQLI and generic health-related quality of life (Pediatric Quality of Life Inventory [PedsQL]) and non–quality of life (Self-Perception Profile for Children [SPPC]/Self-Perception Profile for Adolescents [SPPA] and Youth Self-Report [YSR]/Child Behavior Checklist [CBCL]) tools. PCQLI construct validity was assessed through correlations of PCQLI scores between patients and parents and with severity of congenital HD, medical care utilization, and PedsQL, SPPC/SPPA, and YSR/CBCL scores. PCQLI test-retest reliability was evaluated. RESULTS The study enrolled 1605 patient-parent pairs. Construct validity was substantiated by the association of lower PCQLI scores with Fontan palliation and increased numbers of cardiac operations, hospital admissions, and physician visits (P < .001); moderate to good correlations between patient and parent PCQLI scores (r = 0.41–0.61; P <.001); and fair to good correlations between PCQLI total scores and PedsQL total (r = 0.70–0.76), SPPC/SPPA global self-worth (r = 0.43–0.46), YSR/CBCL total competency (r = 0.28–0.37), and syndrome and Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition-oriented scale (r = −0.58 to −0.30; P < .001) scores. Test-retest reliability correlations were excellent (r = 0.78–0.90; P < .001). CONCLUSIONS PCQLI scores are valid and reliable for children and adolescents with congenital and acquired HD and may be useful for future research and clinical management. Pediatrics 2010;126:498–508 PMID:20805147
Caetano, Ana Celia; Dias, Sara; Santa-Cruz, André; Rolanda, Carla
2018-01-01
Recently, the Obstructed Defecation Syndrome score (ODS score) was developed and validated by Renzi to assess clinical staging and to allow evaluation and comparison of the efficacy of treatment of this disorder. Our goal is to validate the Portuguese version of the Renzi ODS score, according to the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist. Following guidelines for cross-cultural validity, the Renzi ODS score was translated into the Portuguese language. Then, a group of patients and healthy controls were invited to fill in the Renzi ODS score at baseline, after 2 weeks, and after 3 months. We assessed internal consistency, reliability and measurement error, content and construct validity, responsiveness and interpretability. A total of 113 individuals (77 patients; 36 healthy controls) completed the questionnaire. Seventy and 30 patients repeated the Renzi ODS score after 2 weeks and 3 months, respectively. Factor analysis confirmed the unidimensionality of the scale. A Cronbach's α coefficient of 0.77 supported item homogeneity. A weighted quadratic kappa of 0.89 established test-retest reliability. The smallest detectable change at the individual level was 2.66 and at the group level was 0.30. The Renzi ODS score correlated negatively with the total (-0.32) and physical (-0.43) SF-36 scores. The patient and control groups differed significantly (11 points). The change in Renzi ODS score between baseline and 3 months correlated negatively with the clinical evolution (-0.86). ROC analysis showed a minimal important change of 2.00 with an AUC of 0.97. Neither floor nor ceiling effects were observed. This work validated the Portuguese version of the Renzi ODS score. We can now use this reliable, responsive, and interpretable (at the group level) tool to evaluate Portuguese ODS patients.
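The individual-level and group-level smallest detectable change (SDC) figures quoted above can be related with a short calculation. The sketch below assumes the conventional formulas SDC_individual = 1.96 × √2 × SEM and SDC_group = SDC_individual / √n; the abstract does not state which formulas or which n the authors used, so the function names and the choice of n = 77 (the number of patients) are illustrative assumptions.

```python
import math

def sdc_individual(sem, z=1.96):
    """Smallest detectable change for one person, from the standard error of measurement."""
    return z * math.sqrt(2) * sem

def sem_from_sdc(sdc_ind, z=1.96):
    """Invert the formula above to recover the implied SEM."""
    return sdc_ind / (z * math.sqrt(2))

def sdc_group(sdc_ind, n):
    """Smallest detectable change for a group mean of n people."""
    return sdc_ind / math.sqrt(n)

reported_sdc_ind = 2.66   # individual-level SDC reported in the abstract
n_patients = 77           # assumption: n is the number of patients

print(round(sem_from_sdc(reported_sdc_ind), 2))           # implied SEM
print(round(sdc_group(reported_sdc_ind, n_patients), 2))  # group-level SDC
```

Under these assumptions the group-level value rounds to 0.30, which agrees with the figure reported in the abstract; a different n would give a different result.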
Kamamoto, Cristhine de Souza Leão; Hassun, Karime Marques; Bagatin, Ediléia; Tomimori, Jane
2014-01-01
BACKGROUND Many studies about the psychosocial impact of acne have been reported in international medical literature describing quality of life as a relevant clinical outcome. It is well known that the patient's perception about the disease may be different from the physician's evaluation. Therefore, it is important to use validated instruments that turn the patient's subjective opinion into objective information. OBJECTIVES To translate into the Brazilian-Portuguese language and to culturally adapt a quality of life questionnaire, the Acne-Specific Quality of Life Questionnaire (Acne-QoL), as well as to evaluate its reliability and validity. METHODS Measurement properties were assessed: 1) validity: comparison between severity and Acne-QoL domain scores, correlations between acne duration and Acne-QoL domain scores, and correlation between Acne-QoL domain scores and SF-36 components; 2) internal consistency: Cronbach's α coefficient; 3) test-retest reproducibility: intraclass correlation coefficient and Wilcoxon test. RESULTS Eighty subjects with a mean age of 20.5 ± 4.8 years presenting mild (33.8%), moderate (36.2%) and severe (30%) facial acne were enrolled. Acne-QoL domain scores were similar among the different acne severity groups except for the role-social domain. Subjects with shorter acne duration presented significantly higher scores. Acne-QoL domains showed significant correlations, both between themselves and with SF-36 role-social and mental health components. Internal consistency (0.925-0.952) and test-retest reproducibility (0.768-0.836) were considered acceptable. CONCLUSIONS The Brazilian-Portuguese version of the Acne-QoL is a reliable, valid, and satisfactory outcome measure to be used in facial acne studies. PMID:24626652
Sainz de Baranda, Pilar; Rodríguez-Iniesta, María; Ayala, Francisco; Santonja, Fernando; Cejudo, Antonio
2014-07-01
To examine the criterion-related validity of the horizontal hip joint angle (H-HJA) test and vertical hip joint angle (V-HJA) test for estimating hamstring flexibility measured through the passive straight-leg raise (PSLR) test using contemporary statistical measures. Validity study. Controlled laboratory environment. One hundred thirty-eight professional trampoline gymnasts (61 women and 77 men). Hamstring flexibility. Each participant performed 2 trials of H-HJA, V-HJA, and PSLR tests in a randomized order. The criterion-related validity of H-HJA and V-HJA tests was measured through the estimation equation, typical error of the estimate (TEEST), validity correlation (β), and their respective confidence limits. The findings from this study suggest that although H-HJA and V-HJA tests showed moderate to high validity scores for estimating hamstring flexibility (standardized TEEST = 0.63; β = 0.80), the TEEST statistic reported for both tests was not narrow enough for clinical purposes (H-HJA = 10.3 degrees; V-HJA = 9.5 degrees). Subsequently, the predicted likely thresholds for the true values that were generated were too wide (H-HJA = predicted value ± 13.2 degrees; V-HJA = predicted value ± 12.2 degrees). The results suggest that although the HJA test showed moderate to high validity scores for estimating hamstring flexibility, the prediction intervals between the HJA and PSLR tests are not strong enough to suggest that clinicians and sport medicine practitioners should use the HJA and PSLR tests interchangeably as gold standard measurement tools to evaluate and detect short hamstring muscle flexibility.
Coster, Wendy J; Haley, Stephen M; Ni, Pengsheng; Dumas, Helene M; Fragala-Pinkham, Maria A
2008-04-01
To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). Not applicable. Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity, and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time.
Effort Analysis: Individual Score Validation of Achievement Test Data
ERIC Educational Resources Information Center
Wise, Steven L.
2015-01-01
Whenever the purpose of measurement is to inform an inference about a student's achievement level, it is important that we be able to trust that the student's test score accurately reflects what that student knows and can do. Such trust requires the assumption that a student's test event is not unduly influenced by construct-irrelevant factors…
Using School Lotteries to Evaluate the Value-Added Model
ERIC Educational Resources Information Center
Deutsch, Jonah
2013-01-01
There has been an active debate in the literature over the validity of value-added models. In this study, the author tests the central assumption of value-added models that school assignment is random relative to expected test scores conditional on prior test scores, demographic variables, and other controls. He uses a Chicago charter school's…
Kharroubi, Akram; Saba, Elias; Ghannam, Ibrahim; Darwish, Hisham
2017-12-01
Simple self-assessment tools are needed to identify women at high risk of developing osteoporosis. In this study, tools like the IOF One Minute Test, Fracture Risk Assessment Tool (FRAX), and Simple Calculated Osteoporosis Risk Estimation (SCORE) were found to be valid for Palestinian women. The threshold for predicting women at risk for each tool was estimated. The purpose of this study is to evaluate the validity of the updated IOF (International Osteoporosis Foundation) One Minute Osteoporosis Risk Assessment Test, FRAX, and SCORE, as well as age alone, to detect the risk of developing osteoporosis in postmenopausal Palestinian women. Three hundred eighty-two women 45 years and older were recruited, including 131 women with osteoporosis and 251 controls following bone mineral density (BMD) measurement; 287 completed questionnaires of the different risk assessment tools. Receiver operating characteristic (ROC) curves were evaluated for each tool using BMD as the gold standard for osteoporosis. The area under the ROC curve (AUC) was the highest for FRAX calculated with BMD for predicting hip fractures (0.897) followed by FRAX for major fractures (0.826) with cut-off values >1.5 and >7.8%, respectively. The IOF One Minute Test AUC (0.629) was the lowest compared to other tested tools but with sufficient accuracy for predicting the risk of developing osteoporosis with a cut-off value >4 total yes questions out of 18. The SCORE test and age alone were also good predictors of risk for developing osteoporosis. According to the ROC curve for age, women ≥64 years had a higher risk of developing osteoporosis. A higher percentage of women with low BMD (T-score ≤-1.5) or osteoporosis (T-score ≤-2.5) was found among women who were not exposed to the sun, who had menopause before the age of 45 years, or had lower body mass index (BMI) compared to controls. Women who often fall had lower BMI and approximately 27% of the recruited postmenopausal Palestinian women had accidents that caused fractures. Simple self-assessment tools like FRAX without BMD, SCORE, and the IOF One Minute Test were valid for predicting Palestinian postmenopausal women at high risk of developing osteoporosis.
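The ROC analysis described above rests on two quantities: the area under the curve and a cut-off on the risk score. The following is a minimal sketch of both, using only the standard library. The scores are hypothetical and are not study data, and the abstract does not say how its cut-offs were selected; Youden's J (sensitivity + specificity − 1) is used here only as one common choice.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random case outscores a random control
    (ties count one half); equivalent to the Mann-Whitney U statistic / (n1 * n2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def youden_cutoff(scores_pos, scores_neg):
    """Return (cutoff, J) maximising Youden's J, classifying score >= cutoff as at risk."""
    best = None
    for c in sorted(set(scores_pos + scores_neg)):
        sensitivity = sum(s >= c for s in scores_pos) / len(scores_pos)
        specificity = sum(s < c for s in scores_neg) / len(scores_neg)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best

# Hypothetical risk-tool scores (illustrative only)
cases = [5, 6, 7, 8]      # women with osteoporosis by BMD
controls = [1, 2, 3, 6]   # women without osteoporosis by BMD

print(auc(cases, controls))            # 0.90625
print(youden_cutoff(cases, controls))  # (5, 0.75)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported values (0.629 to 0.897) should be read.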
Predictive validity of pre-admission assessments on medical student performance.
Dabaliz, Al-Awwab; Kaadan, Samy; Dabbagh, M Marwan; Barakat, Abdulaziz; Shareef, Mohammad Abrar; Al-Tannir, Mohamad; Obeidat, Akef; Mohamed, Ayman
2017-11-24
To examine the predictive validity of pre-admission variables on students' performance in a medical school in Saudi Arabia. In this retrospective study, we collected admission and college performance data for 737 students in preclinical and clinical years. Data included high school scores and other standardized test scores, such as those of the National Achievement Test and the General Aptitude Test. Additionally, we included the scores of the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS) exams. Those datasets were then compared with college performance indicators, namely the cumulative Grade Point Average (cGPA) and progress test, using multivariate linear regression analysis. In preclinical years, both the National Achievement Test (p=0.04, B=0.08) and TOEFL (p=0.017, B=0.01) scores were positive predictors of cGPA, whereas the General Aptitude Test (p=0.048, B=-0.05) negatively predicted cGPA. Moreover, none of the pre-admission variables were predictive of progress test performance in the same group. On the other hand, none of the pre-admission variables were predictive of cGPA in clinical years. Overall, cGPA strongly predicted students' progress test performance (p<0.001 and B=19.02). Only the National Achievement Test and TOEFL significantly predicted performance in preclinical years. However, these variables do not predict progress test performance, meaning that they do not predict the functional knowledge reflected in the progress test. We report various strengths and deficiencies in the current medical college admission criteria, and call for employing more sensitive and valid ones that predict student performance and functional knowledge, especially in the clinical years.
Predictive validity of pre-admission assessments on medical student performance
Dabaliz, Al-Awwab; Kaadan, Samy; Dabbagh, M. Marwan; Barakat, Abdulaziz; Shareef, Mohammad Abrar; Al-Tannir, Mohamad; Obeidat, Akef
2017-01-01
Objectives To examine the predictive validity of pre-admission variables on students’ performance in a medical school in Saudi Arabia. Methods In this retrospective study, we collected admission and college performance data for 737 students in preclinical and clinical years. Data included high school scores and other standardized test scores, such as those of the National Achievement Test and the General Aptitude Test. Additionally, we included the scores of the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS) exams. Those datasets were then compared with college performance indicators, namely the cumulative Grade Point Average (cGPA) and progress test, using multivariate linear regression analysis. Results In preclinical years, both the National Achievement Test (p=0.04, B=0.08) and TOEFL (p=0.017, B=0.01) scores were positive predictors of cGPA, whereas the General Aptitude Test (p=0.048, B=-0.05) negatively predicted cGPA. Moreover, none of the pre-admission variables were predictive of progress test performance in the same group. On the other hand, none of the pre-admission variables were predictive of cGPA in clinical years. Overall, cGPA strongly predicted students’ progress test performance (p<0.001 and B=19.02). Conclusions Only the National Achievement Test and TOEFL significantly predicted performance in preclinical years. However, these variables do not predict progress test performance, meaning that they do not predict the functional knowledge reflected in the progress test. We report various strengths and deficiencies in the current medical college admission criteria, and call for employing more sensitive and valid ones that predict student performance and functional knowledge, especially in the clinical years. PMID:29176032
Mayo, Ann M
2015-01-01
It is important for CNSs and other APNs to consider the reliability and validity of instruments chosen for clinical practice, evidence-based practice projects, or research studies. Psychometric testing uses specific research methods to evaluate the amount of error associated with any particular instrument. Reliability estimates explain more about how well the instrument is designed, whereas validity estimates explain more about scores that are produced by the instrument. An instrument may be architecturally sound overall (reliable), but the same instrument may not be valid. For example, if a specific group does not understand certain well-constructed items, then the instrument does not produce valid scores when used with that group. Many instrument developers may conduct reliability testing only once, yet continue validity testing in different populations over many years. All CNSs should be advocating for the use of reliable instruments that produce valid results. Clinical nurse specialists may find themselves in situations where reliability and validity estimates for some instruments that are being utilized are unknown. In such cases, CNSs should engage key stakeholders to sponsor nursing researchers to pursue this most important work.
Ahmad, Badariah; Ramadas, Amutha; Kia Fatt, Quek; Md Zain, Anuar Zaini
2014-04-08
Diabetes education and self-care remain the cornerstone of diabetes management. There are many structured diabetes modules available in the United Kingdom, Europe and United States of America. By contrast, few structured and validated diabetes modules are available in Malaysia. This pilot study aims to develop and validate diabetes education material suitable and tailored for a multicultural society like Malaysia. The theoretical framework of this module was founded on the Health Belief Model (HBM). The participants were assessed using 6-item pre- and post-test questionnaires that measured some of the known HBM constructs, namely cues to action, perceived severity and perceived benefit. Data were analysed using PASW Statistics 18.0. The pre- and post-test questionnaires were administered to 88 participants (31 males). In general, there was a significant increase in the total score in post-test (97.34 ± 6.13%) compared to pre-test (92.80 ± 12.83%) (p < 0.05) and a significant increase in excellent score (>85%) at post-test (84.1%) compared to pre-test (70.5%) (p < 0.05). There was an improvement in post-test score in 4 of 6 items tested. The remaining 2 items, which measured the perceived severity and cues to action, had poorer post-test scores. The preliminary results from this pilot study suggest contextualised content material embedded within MY DEMO may be suitable for integration with the existing diabetes education programmes. This was the first known validated diabetes education programme available in the Malay language.
Ross, Thomas P
2014-12-01
The reliability and validity of standard and qualitative scores for the Ruff Figural Fluency Test (RFFT; Ruff, 1988) were examined in 102 healthy undergraduates. Participants (M age = 21.79; SD = 3.7; 80% Caucasian) were administered the RFFT and measures assessing executive functions (EF) and other cognitive domains. Inter-scorer reliability was excellent (0.9 range) for most RFFT indices. Test-retest coefficients (M interval = 7 weeks) ranged from 0.64 for the error ratio score to 0.87 for unique designs. RFFT indices correlated with Block Design performance and nonverbal measures of working memory, but were unrelated to measures of verbal fluency, verbal learning, or working memory for verbal material. RFFT novel design output correlated with most measures of EF, supporting the convergent validity of this measure. In contrast, correlations between measures of EF and qualitative scores were absent or weak. RFFT score interpretation is discussed in light of relevant models of EF and directions for future research are presented. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Music therapy career aptitude test.
Lim, Hayoung A
2011-01-01
The purpose of the Music Therapy Career Aptitude Test (MTCAT) was to measure the affective domain of music therapy students including their self-awareness as it relates to the music therapy career, value in human development, interest in general therapy, and aptitude for being a professional music therapist. The MTCAT was administered to 113 music therapy students who were freshmen or sophomores in an undergraduate music therapy program or in the first year of a music therapy master's equivalency program. The results of analysis indicated that the MTCAT is normally distributed and that all 20 questions are significantly correlated with the total test score of the MTCAT. The reliability of the MTCAT was considerably high (Cronbach's Coefficient Alpha=0.8). The criterion-related validity was examined by comparing the MTCAT scores of music therapy students with the scores of 43 professional music therapists. The correlation between the scores of students and professionals was found to be statistically significant. The results suggest that normal distribution, internal consistency, homogeneity of construct, item discrimination, correlation analysis, content validity, and criterion-related validity in the MTCAT may be helpful in predicting music therapy career aptitude and may aid in the career decision making process of college music therapy students.
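Several abstracts in this listing report Cronbach's coefficient alpha as the index of internal consistency. As a minimal sketch of the standard formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores), the snippet below computes it with the standard library. The response matrix is hypothetical and unrelated to the MTCAT data, and the function name is illustrative.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent (k >= 2, at least 2 respondents)."""
    k = len(rows[0])
    items = list(zip(*rows))                              # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])     # variance of total scores
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: 5 respondents x 4 Likert-type items (illustrative only)
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises with the number of items and with the average inter-item correlation, so a high value alone does not show that a scale is unidimensional.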
Wu, Y Z; Wang, W J; Feng, N P; Chen, B; Li, G C; Liu, J W; Liu, H L; Yang, Y Y
2016-07-06
To evaluate the validity, reliability, and acceptability of the brief version of the self-management knowledge, attitude, and behavior (KAB) assessment scale for diabetes patients. Diabetes patients who were managed at the Xinkaipu Community Health Service Center of Tianxin in Changsha, Hunan Province were selected for survey by cluster sampling. A total of 350 diabetes patients were surveyed using the brief scale to collect data on knowledge, attitudes, and behaviors of self-management. Content validity was evaluated by Pearson correlation coefficient between the brief scale and subscales of knowledge, attitude, and behavior. Structure validity was evaluated by factor analysis, and discrimination validity was evaluated by an independent sample t-test between the high-score and low-score groups. Reliability was tested by internal consistency reliability and split-half reliability. The evaluation indexes of internal consistency reliability were Cronbach's α coefficients, θ coefficient, and Ω coefficient. Acceptability was evaluated by valid response rate and completion time of the brief scale. A total of 346(98.9%) valid questionnaires were returned, with average survey time of (11.43±3.4) minutes. Average score of the brief scale was 78.85 ± 11.22; scores of the knowledge, attitude, and behavior subscales were 16.45 ± 4.42, 21.33 ± 2.03, and 41.07 ± 8.34, respectively. Pearson correlation coefficients between the brief scale and the knowledge, attitude, and behavior subscales were 0.92, 0.42, and 0.60, respectively; P-values were all less than 0.01, indicating that the face validity and content validity of the brief scale were achieved to a good level. The common factor cumulative variance contribution rate of the brief scale and three subscales was from 53.66% to 61.75%, which achieved more than 50% of the approved standard. 
There were 11 common factors; 41 of the total 42 items had factor loadings above 0.40 in their relevant common factor, indicating that the brief scale and three subscales had good construct validity. Patients were divided into a high-score group and a low-score group, then scores of the brief scale and three subscales were compared between the groups using a t-test. The results were all significant, indicating that the brief scale and three subscales had good discriminant validity. Mean scores of the brief scale and three subscales of the high-score group were 91.55±6.81, 19.51±2.17, 22.74±1.88, and 49.30±6.20, respectively; these were higher than those of the low-score group (65.89±5.79, 12.29±4.76, 20.22±1.88, and 33.39±6.17, respectively), with t-values of 27.76, 13.31, 9.20, and 17.56 (all P-values were less than 0.001). The Cronbach's α coefficient, θ coefficient, Ω coefficient, and split-half reliability of the brief scale were 0.83, 0.87, 0.96, and 0.84, respectively. These values for the three subscales were all above 0.70, except for the θ coefficient of the attitude subscale (0.64), indicating that the brief scale and three subscales had acceptable internal consistency reliability. The brief version of the diabetes self-management knowledge, attitude, and behavior assessment scale showed good acceptability, validity, and reliability, and can be used to evaluate self-management KAB among patients with diabetes.
Workplace nutrition knowledge questionnaire: psychometric validation and application.
Guadagnin, Simone C; Nakano, Eduardo Y; Dutra, Eliane S; de Carvalho, Kênia M B; Ito, Marina K
2016-11-01
Workplace dietary intervention studies in low- and middle-income countries using psychometrically sound measures are scarce. This study aimed to validate a nutrition knowledge questionnaire (NQ) and its utility in evaluating the changes in knowledge among participants of a Nutrition Education Program (NEP) conducted at the workplace. A NQ was tested for construct validity, internal consistency and discriminant validity. It was applied in a NEP conducted at six workplaces, in order to evaluate the effect of an interactive or a lecture-based education programme on nutrition knowledge. Four knowledge domains comprising twenty-three items were extracted in the final version of the NQ. Internal consistency of each domain was significant, with Kuder-Richardson formula values>0·60. These four domains presented a good fit in the confirmatory factor analysis. In the discriminant validity test, both the Expert and Lay groups scored>0·52, but the Expert group scores were significantly higher than those of the Lay group in all domains. When the NQ was applied in the NEP, the overall questionnaire scores increased significantly because of the NEP intervention, in both groups (P<0·001). However, the increase in NQ scores was significantly higher in the interactive group than in the lecture group, in the overall score (P=0·008) and in the healthy eating domain (P=0·009). The validated NQ is a short and useful tool to assess gain in nutrition knowledge among participants of NEP at the workplace. According to the NQ, an interactive nutrition education had a higher impact on nutrition knowledge than a lecture programme.
Talip, Whadi-ah; Steyn, Nelia P; Visser, Marianne; Charlton, Karen E; Temple, Norman
2003-09-01
We wanted to develop and validate a test that assesses the knowledge and practices of health professionals (HPs) with regard to the role of nutrition, physical activity, and smoking cessation (lifestyle modification) in chronic diseases of lifestyle. A descriptive cross-sectional validation study was carried out. The validation design consisted of two phases, namely 1) test planning and development and 2) test evaluation. The study sample consisted of five groups of HPs: dietitians, dietetic interns, general practitioners, medical students, and nurses. The overall response rate was 58%, resulting in a sample size of 186 participants. A test was designed to evaluate the knowledge and practices of HPs. The test was first evaluated by an expert group to ensure content, construct, and face validity. Thereafter, the questionnaire was tested on five groups of HPs to test for criterion validity. Internal consistency was evaluated by Cronbach's alpha. An expert panel ensured content, construct, and face validity of the test. Groups with the most training and exposure to nutrition (dietitians and dietetic interns) had the highest group mean score, ranging from 61% to 88%, whereas those with limited nutrition training (general practitioners, medical students, and nurses) had significantly lower scores, ranging from 26% to 80%. This result demonstrated criterion validity. Internal consistency of the overall test demonstrated a Cronbach's alpha of 0.99. Most HPs identified the mass media as their main source of information on lifestyle modification. These HPs also identified lack of time, lack of patient compliance, and lack of knowledge as barriers that prevent them from providing counseling on lifestyle modification. The results of this study showed that this test instrument identifies groups of health professionals with adequate training (knowledge) in lifestyle modification and those who require further training (knowledge).
Çelik, Derya
2016-01-01
The Constant-Murley score (CMS) is widely used to evaluate disabilities associated with shoulder injuries, but it has been criticized for relying on imprecise terminology and a lack of standardized methodology. A modified guideline, therefore, was published in 2008 with several recommendations. This new version has not yet been translated or culturally adapted for Turkish-speaking populations. The purpose of this study was to translate and cross-culturally adapt the modified CMS and its test protocol, as well as define and measure its reliability and validity. The modified CMS was translated into Turkish, consistent with published methodological guidelines. The measurement properties of the Turkish version of the modified CMS were tested in 30 patients (12 males, 18 females; mean age: 59.5±13.5 years) with a variety of shoulder pathologies. Intraclass correlation coefficients (ICC) were used to estimate test-retest reliability. Construct validity was analyzed with the Turkish version of the American Shoulder and Elbow Surgeons (ASES) Standardized Shoulder Assessment Form and Short-Form Health Survey (SF-12). No difficulties were found in the translation process. The Turkish version of the modified CMS showed excellent test-retest reliability (ICC=0.86). The correlation coefficients between the Turkish version of the modified CMS and the ASES, SF-12-physical component score, and SF-12 mental component scores were found to be 0.48, 0.35, and 0.05, respectively. No floor or ceiling effects were found. The translation and cultural adaptation of the modified CMS and its standardized test protocol into Turkish were successful. The Turkish version of the modified CMS has sufficient reliability and validity to measure a variety of shoulder disorders for Turkish-speaking individuals.
Schoenmakers, Birgitte; Wens, Johan
2014-03-04
To investigate if the psychometric qualities of an OSCE consisting of more complex simulated patient encounters remain valid and reliable in the assessment of postgraduate trainees in general practice. In this intervention study without control group, the traditional OSCE was formally replaced by the new, complex version. The study population was composed by all postgraduate trainees (second and third phase) in general practice during the ongoing academic year. Data were handled and collected as part of the formal assessment program. Univariate analyses, the variance of scores and multivariate analyses were performed to assess the test qualities. A total of 340 students participated. Average final scores were slightly higher for third-phase students (t-test, p =0.05). Overall test scores were equally distributed on station level, circuit level and phase level. A multiple regression analysis revealed that test scores were dependent on the stations and circuits, but not on the master phase. In a changing learning environment, assessment and evaluation strategies require reorientation. The reliability and validity of the OSCE remain subject to discussion. In particular, when it comes to content and design, the traditional OSCE might underestimate the performance level of postgraduate trainees in general practice. A reshaping of this OSCE to a more sophisticated design with more complex patient encounters appears to restore the validity of the test results.
ERIC Educational Resources Information Center
Blagov, Pavel S.; Bi, Wu; Shedler, Jonathan; Westen, Drew
2012-01-01
The Shedler-Westen Assessment Procedure (SWAP) is a personality assessment instrument designed for use by expert clinical assessors. Critics have raised questions about its psychometrics, most notably its validity across observers and situations, the impact of its fixed score distribution on research findings, and its test-retest reliability. We…
ERIC Educational Resources Information Center
Watkins, David; Astilla, Estela
1980-01-01
Evidence is presented partially supporting the reliability and construct validity of the Coopersmith Self-Esteem Inventory with Filipino adolescent girls. A test-retest coefficient of 0.61 was found over a nine-month period. Self-esteem scores were significantly associated with IQ scores and teacher ratings of pupils' self-esteem. (Author/BW)
McCaffrey, Ruth; Bishop, Mary; Adonis-Rizzo, Marie; Williamson, Ellen; McPherson, Melanie; Cruikshank, Alice; Carrier, Vicki Jo; Sands, Simone; Pigano, Diane; Girard, Patricia; Lauzon, Cathy
2007-01-01
Hospital-acquired deep vein thrombosis (DVT) and pulmonary embolisms (PE) are preventable problems that can increase mortality. Early assessment and recognition of risk as well as initiating appropriate prevention measures can prevent DVT or PE. The purpose of this research project was to develop a DVT risk assessment tool and test the tool for validity and reliability. Three phases were undertaken in developing and testing the JFK Medical Center DVT risk assessment tool. First, risk and predisposing factors for DVT were identified from the literature, expert nursing knowledge, and medical staff input. Second, item development and weighting were undertaken. Third, parametric testing for content validity measured the differences in mean assessment tool scores between a group of patients who developed DVT in the hospital and a demographically similar group who did not develop DVT. Interrater reliability was measured by having three different nurses score each patient and comparing the differences in scores among the three. The DVT group had significantly higher scores on the JFK DVT assessment scale than did those who did not experience DVT. Interrater reliability showed a strong correlation among the scores of the three nurses (.98). Providing a valid and reliable tool for measuring the risk for DVT or PE in hospitalized patients will enable nurses to intervene early in patients at risk. Basing DVT risk assessment on the evidence provided in this study will assist nurses in becoming more confident in recognizing the necessity for interventions in hospitalized patients and decreasing risk. Nurses can now evaluate patients at risk for DVT or PE using the JFK Medical Center's risk assessment tool.
Agreeing on Validity Arguments
ERIC Educational Resources Information Center
Sireci, Stephen G.
2013-01-01
Kane (this issue) presents a comprehensive review of validity theory and reminds us that the focus of validation is on test score interpretations and use. In reacting to his article, I support the argument-based approach to validity and all of the major points regarding validation made by Dr. Kane. In addition, I call for a simpler, three-step…
Taylor, K; Parashar, D; Bouverat, G; Poulos, A; Gullien, R; Stewart, E; Aarre, R; Crystal, P; Wallis, M
2017-11-01
Optimum mammography positioning technique is necessary to maximise cancer detection. Current criteria for mammography appraisal lack reliability and validity, with a need to develop a more objective system. We aimed to establish current international practice in assessing image quality (IQ) of screening mammograms, then develop and validate a reproducible assessment tool. A questionnaire sent to centres in countries undertaking population screening identified practice, participants for an expert panel (EP) of radiologists/radiographers and a testing panel (TP) of radiographers. The EP developed category criteria and descriptors using a modified Delphi process to agree definitions. The EP scored 12 screening mammograms to test agreement, then a main set of 178 cases. Weighted scores were derived for each descriptor, enabling calculation of numerical parameters for each new category. The TP then scored the main set. Statistical analysis included ANOVA, t-tests and Kendall's coefficient. 11 centres in 8 countries responded, forming an EP of 7 members and a TP of 44 members. The EP showed moderate agreement when scoring the mini test set (W = 0.50, p < 0.001) and the main set (W = 0.55, p < 0.001), 'posterior nipple line' being the most difficult descriptor. The weighted total scores differentiated the 4 new categories Perfect, Good, Adequate and Inadequate (p < 0.001). We have developed an assessment tool by Delphi consensus and weighted consensus criteria. We have successfully tabulated a range of numerical scores for each new category, providing the first validated and reproducible mammography IQ scoring system. Copyright © 2017 The College of Radiographers. Published by Elsevier Ltd. All rights reserved.
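As an illustration of the agreement statistic reported in the abstract above, the following is a minimal sketch of Kendall's coefficient of concordance (W). It assumes each rater supplies a complete ranking of the items with no ties, which is a simplification; the abstract does not describe how ties were handled. The function name and the example rankings are hypothetical and are not taken from the study.

```python
def kendalls_w(rank_matrix):
    """Kendall's W for m raters (rows) ranking n items (columns).

    Assumes every row is a permutation of 1..n (no tied ranks).
    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared
    deviations of the per-item rank totals from their mean.
    """
    m = len(rank_matrix)         # number of raters
    n = len(rank_matrix[0])      # number of items ranked
    totals = [sum(col) for col in zip(*rank_matrix)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four images by three raters (illustration only).
example_ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
w = kendalls_w(example_ranks)
```

W runs from 0 (no agreement among raters) to 1 (identical rankings), so values near 0.5, as reported above, sit in the middle of that range.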
Validation of an Instrument to Measure High School Students' Attitudes toward Fitness Testing
ERIC Educational Resources Information Center
Mercier, Kevin; Silverman, Stephen
2014-01-01
Purpose: The purpose of this investigation was to develop an instrument that has scores that are valid and reliable for measuring students' attitudes toward fitness testing. Method: The method involved the following steps: (a) an elicitation study, (b) item development, (c) a pilot study, and (d) a validation study. The pilot study included 427…
Aldekhayel, Salah A; Alselaim, Nahar A; Magzoub, Mohi Eldin; Al-Qattan, Mohammad M; Al-Namlah, Abdullah M; Tamim, Hani; Al-Khayal, Abdullah; Al-Habdan, Sultan I; Zamakhshary, Mohammed F
2012-10-24
Script Concordance Test (SCT) is a new assessment tool that reliably assesses clinical reasoning skills. Previous descriptions of developing SCT-question banks were merely subjective. This study addresses two gaps in the literature: 1) conducting the first phase of a multistep validation process of SCT in Plastic Surgery, and 2) providing an objective methodology to construct a question bank based on SCT. After developing a test blueprint, 52 test items were written. Five validation questions were developed and a validation survey was established online. Seven reviewers were asked to answer this survey. They were recruited from two countries, Saudi Arabia and Canada, to improve the test's external validity. Their ratings were transformed into percentages. Analysis was performed to compare reviewers' ratings by looking at correlations, ranges, means, medians, and overall scores. Scores of reviewers' ratings were between 76% and 95% (mean 86% ± 5). We found poor correlations between reviewers (Pearson's: +0.38 to -0.22). Ratings of individual validation questions ranged between 0 and 4 (on a scale 1-5). Means and medians of these ranges were computed for each test item (mean: 0.8 to 2.4; median: 1 to 3). A subset of test items comprising 27 items was generated based on a set of inclusion and exclusion criteria. This study proposes an objective methodology for validation of SCT-question bank. Analysis of validation survey is done from all angles, i.e., reviewers, validation questions, and test items. Finally, a subset of test items is generated based on a set of criteria.
Bax, Simon; Bredy, Charlene; Kempny, Aleksander; Dimopoulos, Konstantinos; Devaraj, Anand; Walsh, Simon; Jacob, Joseph; Nair, Arjun; Kokosi, Maria; Keir, Gregory; Kouranos, Vasileios; George, Peter M.; McCabe, Colm; Wilde, Michael; Wells, Athol; Li, Wei; Wort, Stephen John; Price, Laura C.
2018-01-01
European Respiratory Society (ERS) guidelines recommend the assessment of patients with interstitial lung disease (ILD) and severe pulmonary hypertension (PH), as defined by a mean pulmonary artery pressure (mPAP) ≥35 mmHg at right heart catheterisation (RHC). We developed and validated a stepwise echocardiographic score to detect severe PH using the tricuspid regurgitant velocity and right atrial pressure (right ventricular systolic pressure (RVSP)) and additional echocardiographic signs. Consecutive ILD patients with suspected PH underwent RHC between 2005 and 2015. Receiver operating curve analysis tested the ability of components of the score to predict mPAP ≥35 mmHg, and a score devised using a stepwise approach. The score was tested in a contemporaneous validation cohort. The score used “additional PH signs” where RVSP was unavailable, using a bootstrapping technique. Within the derivation cohort (n=210), a score ≥7 predicted severe PH with 89% sensitivity, 71% specificity, positive predictive value 68% and negative predictive value 90%, with similar performance in the validation cohort (n=61) (area under the curve (AUC) 84.8% versus 83.1%, p=0.8). Although RVSP could be estimated in 92% of studies, reducing this to 60% maintained a fair accuracy (AUC 74.4%). This simple stepwise echocardiographic PH score can predict severe PH in patients with ILD. PMID:29750141
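The four accuracy figures quoted in the abstract above (sensitivity, specificity, positive and negative predictive value) all derive from a 2x2 table of score result against the reference standard. The sketch below shows that arithmetic; the function name and the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures.

    tp: score positive, condition present
    fp: score positive, condition absent
    fn: score negative, condition present
    tn: score negative, condition absent
    """
    return {
        "sensitivity": tp / (tp + fn),  # detected among those with the condition
        "specificity": tn / (tn + fp),  # cleared among those without it
        "ppv": tp / (tp + fp),          # condition present among score positives
        "npv": tn / (tn + fn),          # condition absent among score negatives
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=20, fn=10, tn=30)
```

Sensitivity and specificity are properties of the score at a given threshold, whereas the predictive values also depend on how common the condition is in the cohort.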
McCaul, Courtney; Boone, Kyle B; Ermshar, Annette; Cottingham, Maria; Victor, Tara L; Ziegler, Elizabeth; Zeller, Michelle A; Wright, Matthew
2018-01-18
To cross-validate the Dot Counting Test in a large neuropsychological sample. Dot Counting Test scores were compared in credible (n = 142) and non-credible (n = 335) neuropsychology referrals. Non-credible patients scored significantly higher than credible patients on all Dot Counting Test scores. While the original E-score cut-off of ≥17 achieved excellent specificity (96.5%), it was associated with mediocre sensitivity (52.8%). However, the cut-off could be substantially lowered to ≥13.80, while still maintaining adequate specificity (≥90%), and raising sensitivity to 70.0%. Examination of non-credible subgroups revealed that Dot Counting Test sensitivity in feigned mild traumatic brain injury (mTBI) was 55.8%, whereas sensitivity was 90.6% in patients with non-credible cognitive dysfunction in the context of claimed psychosis, and 81.0% in patients with non-credible cognitive performance in depression or severe TBI. Thus, the Dot Counting Test may have a particular role in detection of non-credible cognitive symptoms in claimed psychiatric disorders. As an alternative to use of the E-score, failure on ≥1 cut-offs applied to individual Dot Counting Test scores (≥6.0″ for mean grouped dot counting time, ≥10.0″ for mean ungrouped dot counting time, and ≥4 errors) occurred in 11.3% of the credible sample, while nearly two-thirds (63.6%) of the non-credible sample failed one or more of these cut-offs. An E-score cut-off of 13.80, or failure on ≥1 individual score cut-offs, resulted in few false positive identifications in credible patients and achieved high sensitivity (64.0-70.0%), and therefore appears appropriate for use in identifying neurocognitive performance invalidity.
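The cut-off selection described in the abstract above (lowering the threshold as far as possible while keeping specificity at or above 90%) can be sketched as a simple search over candidate thresholds. This is a minimal illustration, assuming that a score at or above the cut-off is flagged as non-credible; the function name and the score lists are hypothetical and are not the study's data.

```python
def lowest_cutoff(credible, non_credible, min_specificity=0.90):
    """Return (cutoff, specificity, sensitivity) for the lowest cut-off
    whose specificity meets the target, or None if no cut-off does.

    A score >= cutoff is flagged. Specificity is the share of credible
    scores below the cut-off; sensitivity is the share of non-credible
    scores at or above it.
    """
    for cutoff in sorted(set(credible) | set(non_credible)):
        specificity = sum(s < cutoff for s in credible) / len(credible)
        if specificity >= min_specificity:
            sensitivity = sum(s >= cutoff for s in non_credible) / len(non_credible)
            return cutoff, specificity, sensitivity
    return None


# Hypothetical scores for illustration only.
credible_scores = [5, 7, 8, 9, 10, 11, 12, 12, 13, 18]
non_credible_scores = [9, 12, 14, 15, 16, 17, 19, 20, 22, 25]
result = lowest_cutoff(credible_scores, non_credible_scores)
```

Because specificity can only rise as the cut-off increases, the first candidate that meets the target is also the one with the highest sensitivity among acceptable cut-offs.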
Proposal and validation of a clinical trunk control test in individuals with spinal cord injury.
Quinzaños, J; Villa, A R; Flores, A A; Pérez, R
2014-06-01
One of the problems that arise in spinal cord injury (SCI) is alteration in trunk control. Despite the need for standardized scales, these do not exist for evaluating trunk control in SCI. To propose and validate a trunk control test in individuals with SCI. National Institute of Rehabilitation, Mexico. The test was developed and later evaluated for reliability and criterion, content, and construct validity. We carried out 531 tests on 177 patients and found high inter- and intra-rater reliability. In terms of criterion validity, analysis of variance demonstrated a statistically significant difference in the test score of patients with adequate or inadequate trunk control according to the assessment of a group of experts. A receiver operating characteristic curve was plotted for optimizing the instrument's cutoff point, which was determined at 13 points, with a sensitivity of 98% and a specificity of 92.2%. With regard to construct validity, the correlation between the proposed test and the spinal cord independence measure (SCIM) was 0.873 (P=0.001) and that with the evolution time was 0.437 (P=0.001). For testing the hypothesis with qualitative variables, the Kruskal-Wallis test was performed, which resulted in a statistically significant difference between the scores in the proposed scale of each group defined by these variables. It was proven experimentally that the proposed trunk control test is valid and reliable. Furthermore, the test can be used for all patients with SCI regardless of the type and level of injury.
The Arthroscopic Surgical Skill Evaluation Tool (ASSET).
Koehler, Ryan J; Amsdell, Simon; Arendt, Elizabeth A; Bisson, Leslie J; Braman, Jonathan P; Bramen, Jonathan P; Butler, Aaron; Cosgarea, Andrew J; Harner, Christopher D; Garrett, William E; Olson, Tyson; Warme, Winston J; Nicandri, Gregg T
2013-06-01
Surgeries employing arthroscopic techniques are among the most commonly performed in orthopaedic clinical practice; however, valid and reliable methods of assessing the arthroscopic skill of orthopaedic surgeons are lacking. The Arthroscopic Surgery Skill Evaluation Tool (ASSET) will demonstrate content validity, concurrent criterion-oriented validity, and reliability when used to assess the technical ability of surgeons performing diagnostic knee arthroscopic surgery on cadaveric specimens. Cross-sectional study; Level of evidence, 3. Content validity was determined by a group of 7 experts using the Delphi method. Intra-articular performance of a right and left diagnostic knee arthroscopic procedure was recorded for 28 residents and 2 sports medicine fellowship-trained attending surgeons. Surgeon performance was assessed by 2 blinded raters using the ASSET. Concurrent criterion-oriented validity, interrater reliability, and test-retest reliability were evaluated. Content validity: The content development group identified 8 arthroscopic skill domains to evaluate using the ASSET. Concurrent criterion-oriented validity: Significant differences in the total ASSET score (P < .05) between novice, intermediate, and advanced experience groups were identified. Interrater reliability: The ASSET scores assigned by each rater were strongly correlated (r = 0.91, P < .01), and the intraclass correlation coefficient between raters for the total ASSET score was 0.90. Test-retest reliability: There was a significant correlation between ASSET scores for both procedures attempted by each surgeon (r = 0.79, P < .01). The ASSET appears to be a useful, valid, and reliable method for assessing surgeon performance of diagnostic knee arthroscopic surgery in cadaveric specimens. Studies are ongoing to determine its generalizability to other procedures as well as to the live operating room and other simulated environments.
Park, Juhyun; Kang, Minyong; Jeong, Chang Wook; Oh, Sohee; Lee, Jeong Woo; Lee, Seung Bae; Son, Hwancheol; Jeong, Hyeon; Cho, Sung Yong
2015-08-01
The modified Seoul National University Renal Stone Complexity scoring system (S-ReSC-R) for retrograde intrarenal surgery (RIRS) was developed as a tool to predict stone-free rate (SFR) after RIRS. We externally validated the S-ReSC-R. We retrospectively reviewed 159 patients who underwent RIRS. The S-ReSC-R was assigned from 1 to 12 according to the location and number of sites involved. The stone-free status was defined as no evidence of a stone or with clinically insignificant residual fragment stones less than 2 mm. Interobserver and test-retest reliabilities were evaluated. Statistical performance of the prediction model was assessed by its predictive accuracy, predictive probability, and clinical usefulness. Overall SFR was 73.0%. The SFRs were 86.7%, 70.2%, and 48.6% in low-score (1-2), intermediate-score (3-4), and high-score (5-12) groups, respectively (p<0.001). External validation of S-ReSC-R revealed an area under the curve (AUC) of 0.731 (95% CI 0.650-0.813). The AUC of the three-tiered S-ReSC-R was 0.701 (95% CI 0.609-0.794). The calibration plot showed that the predicted probability of SFR had a concordance comparable to that of observed frequency. The Hosmer-Lemeshow goodness of fit test revealed a p-value of 0.01 for the S-ReSC-R and 0.90 for the three-tiered S-ReSC-R. Interobserver and test-retest reliabilities revealed an almost perfect level of agreement. The present study proved the predictive value of S-ReSC-R to predict SFR following RIRS in an independent cohort. Interobserver and test-retest reliabilities confirmed that S-ReSC-R was reliable and valid.
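The area under the curve (AUC) reported in the abstract above can be read as the probability that a randomly chosen patient with the outcome has a higher score than a randomly chosen patient without it, with ties counted as one half. The sketch below computes the AUC from that pairwise definition. It is an illustration only: the function name and scores are hypothetical, and the abstract does not state which software or estimator the authors used.

```python
def auc_pairwise(scores_with_outcome, scores_without_outcome):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U
    statistic divided by the number of pairs).

    Each (with, without) pair contributes 1 if the 'with outcome' score
    is higher, 0.5 if the scores tie, and 0 otherwise.
    """
    wins = 0.0
    for a in scores_with_outcome:
        for b in scores_without_outcome:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_with_outcome) * len(scores_without_outcome))


# Hypothetical complexity scores for illustration only: the first group
# has the outcome the score is meant to predict, the second does not.
example_auc = auc_pairwise([3, 4, 5], [1, 2, 3])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.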
Concurrent validity of the Swedish version of the life-space assessment questionnaire.
Fristedt, Sofi; Kammerlind, Ann-Sofi; Bravell, Marie Ernsth; Fransson, Eleonor I
2016-11-08
The Life-Space Assessment (LSA), developed in the USA, is an instrument focusing on mobility with respect to reaching different areas defined as life-spaces, extending from the room where the person sleeps to mobility outside one's hometown. A newly translated Swedish version of the LSA (LSA-S) has been tested for test-retest reliability, but the validity remains to be tested. The purpose of the present study was to examine the concurrent validity of the LSA-S, by comparing and correlating the LSA scores to other measures of mobility. The LSA was included in a population-based study of health, functioning and mobility among older persons in Sweden, and the present analysis comprised 312 community-dwelling participants. To test the concurrent validity, the LSA scores were compared to a number of other mobility-related variables, including the Short Physical Performance Battery (SPPB) as well as "stair climbing", "transfers", "transportation", "food shopping", "travel for pleasure" and "community activities". The LSA total mean scores for different levels of the other mobility-related variables, and measures of correlation were calculated. Higher LSA total mean scores were observed with higher levels of all the other mobility related variables. Most of the correlations between the LSA and the other mobility variables were large (r = 0.5-1.0) and significant at the 0.01 level. The LSA total score, as well as independent life-space and assistive life-space correlated with transportation (0.63, 0.66, 0.64) and food shopping (0.55, 0.58, 0.55). Assistive life-space also correlated with SPPB (0.47). With respect to maximal life-space, the correlations with the mobility-related variables were generally lower (below 0.5), probably since this aspect of life-space mobility is highly influenced by social support and is not so dependent on the individual's own physical function. 
LSA was shown to be a valid measure of mobility when using the LSA total, independent LS or assistive LSA.
Assessing Wildlife Habitat Value of New England Salt Marshes: II. Model Testing and Validation
We test a previously described model to assess the wildlife habitat value of New England salt marshes by comparing modeled habitat values and scores with bird abundance and species richness at sixteen salt marshes in Narragansett Bay, Rhode Island USA. Assessment scores ranged f...
A New Clinical Pain Knowledge Test for Nurses: Development and Psychometric Evaluation.
Bernhofer, Esther I; St Marie, Barbara; Bena, James F
2017-08-01
All nurses care for patients with pain, and pain management knowledge and attitude surveys for nurses have been around since 1987. However, no validated knowledge test exists to measure postlicensure clinicians' knowledge of the core competencies of pain management in current complex patient populations. To develop and test the psychometric properties of an instrument designed to measure pain management knowledge of postlicensure nurses. Psychometric instrument validation. Four large Midwestern U.S. hospitals. Registered nurses employed full time and part time August 2015 to April 2016, aged M = 43.25 years; time as RN, M = 16.13 years. Prospective survey design using e-mail to invite nurses to take an electronic multiple choice pain knowledge test. Content validity of initial 36-item test "very good" (95.1% agreement). Completed tests that met analysis criteria, N = 747. Mean initial test score, 69.4% correct (range 27.8-97.2). After revision/removal of 13 unacceptable questions, mean test score was 50.4% correct (range 8.7-82.6). Initial test item percent difficulty range was 15.2%-98.1%; discrimination values range, 0.03-0.50; final test item percent difficulty range, 17.6%-91.1%, discrimination values range, -0.04 to 1.04. Split-half reliability final test was 0.66. A high decision consistency reliability was identified, with test cut-score of 75%. The final 23-item Clinical Pain Knowledge Test has acceptable discrimination, difficulty, decision consistency, reliability, and validity in the general clinical inpatient nurse population. This instrument will be useful in assessing pain management knowledge of clinical nurses to determine gaps in education, evaluate knowledge after pain management education, and measure research outcomes. Copyright © 2017 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
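Two of the item statistics named in the abstract above, item difficulty and split-half reliability, can be sketched as follows. This is a minimal illustration under stated assumptions: difficulty is taken as the proportion of examinees answering an item correctly, and the split is an odd/even item split with the Spearman-Brown correction. The abstract does not say which split or correction the authors used, and the function names and response matrix are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_difficulty(responses):
    """Proportion correct per item; rows are examinees, columns are items (0/1)."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]


def split_half_reliability(responses):
    """Correlate odd-position and even-position half scores, then apply
    the Spearman-Brown correction: r_full = 2r / (1 + r)."""
    first_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r = pearson(first_half, second_half)
    return 2 * r / (1 + r)


# Hypothetical responses: four examinees, four items (illustration only).
example_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulty = item_difficulty(example_responses)
reliability = split_half_reliability(example_responses)
```

With this convention a higher difficulty value means an easier item, which matches the percent-correct ranges quoted in the abstract.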
Automated Essay Scoring versus Human Scoring: A Correlational Study
ERIC Educational Resources Information Center
Wang, Jinhao; Brown, Michelle Stallone
2008-01-01
The purpose of the current study was to analyze the relationship between automated essay scoring (AES) and human scoring in order to determine the validity and usefulness of AES for large-scale placement tests. Specifically, a correlational research design was used to examine the correlations between AES performance and human raters' performance.…
Furtado, Ricardo; Jones, Anamaria; Furtado, Rita NV; Jennings, Fábio; Natour, Jamil
2009-01-01
OBJECTIVE: To develop a Brazilian version of the gesture behavior test (GBT) for patients with chronic low back pain. METHODS: Translation of GBT into Portuguese was performed by a rheumatologist fluent in the language of origin (French) and skilled in the validation of questionnaires. This translated version was back-translated into French by a native-speaking teacher of the language. The two translators then created a final consensual version in Portuguese. Cultural adaptation was carried out by two rheumatologists, one educated patient and the native-speaking French teacher. Thirty patients with chronic low back pain and fifteen healthcare professionals involved in the education of patients with low back pain through back schools (gold-standard) were evaluated. Reproducibility was initially tested by two observers (inter-observer); the procedures were also videotaped for later evaluation by one of the observers (intra-observer). For construct validation, we compared patients’ scores against the scores of the healthcare professionals. RESULTS: Modifications were made to the GBT for cultural reasons. The Spearman’s correlation coefficient and the intra-class coefficient, which was employed to measure reproducibility, ranged between 0.87 and 0.99 and 0.94 to 0.99, respectively (p < 0.01). With regard to validation, the Mann-Whitney test revealed a significant difference (p < 0.01) between the averages for healthcare professionals (26.60; SD 2.79) and patients (16.30; SD 6.39). There was a positive correlation between the GBT score and the score on the Roland Morris Disability Questionnaire (r= 0.47). CONCLUSIONS: The Brazilian version of the GBT proved to be a reproducible and valid instrument. In addition, according to the questionnaire results, more disabled patients exhibited more protective gesture behavior related to low-back. PMID:19219312
Pelizza, Lorenzo; Paterlini, Federica; Azzali, Silvia; Garlassi, Sara; Scazza, Ilaria; Pupo, Simona; Simmons, Magenta; Nelson, Barnaby; Raballo, Andrea
2018-04-26
The Comprehensive Assessment of At-Risk Mental States (CAARMS) was specifically developed to assess and detect young people at ultra-high risk (UHR) of developing psychosis. The current study was undertaken to test the reliability and validity of the authorized Italian version of the CAARMS (CAARMS-ITA) in a help-seeking population. Psychometric properties of the CAARMS-ITA were established using a sample of 223 Italian adolescents and young adults aged between 13 and 35 years, who were divided into 3 groups according to the CAARMS criteria: UHR-negative individuals (UHR [-]; n = 64), UHR-positive (UHR [+]; n = 55) and individuals with a first-episode psychosis (FEP; n = 104). The CAARMS-ITA's reliability was tested measuring interrater reliability and internal consistency. Construct validity was tested comparing the Positive and Negative Syndrome Scale (PANSS) and CAARMS-ITA subscale scores across groups (ie, UHR [-], UHR [+] and FEP). For concurrent validity, we studied correlations between symptoms of the CAARMS-ITA and their equivalents in the PANSS. Finally, the predictive validity was examined by following up with UHR [+] individuals. The 12-month transition rate to psychosis was calculated. The CAARMS-ITA showed good interrater reliability. The PANSS "Positive Symptoms" subscale scores in UHR [+] individuals were intermediate between FEP and UHR [-] groups. The positive and negative symptoms scores of the CAARMS-ITA significantly correlated with the corresponding scores of the PANSS. After 12 months, 4 of 41 (9.8%) UHR [+] individuals had transitioned to psychosis. The CAARMS-ITA is a reliable and valid instrument for assessing and detecting at-risk mental states in Italian clinical settings. It also appears to be helpful in the prediction of psychosis transition. © 2018 John Wiley & Sons Australia, Ltd.
Burns, Ted M.; Conaway, Mark; Sanders, Donald B.
2010-01-01
Objective: To study the concurrent and construct validity and test-retest reliability in the practice setting of an outcome measure for myasthenia gravis (MG). Methods: Eleven centers participated in the validation study of the Myasthenia Gravis Composite (MGC) scale. Patients with MG were evaluated at 2 consecutive visits. Concurrent and construct validities of the MGC were assessed by evaluating MGC scores in the context of other MG-specific outcome measures. We used numerous potential indicators of clinical improvement to assess the sensitivity and specificity of the MGC for detecting clinical improvement. Test-retest reliability was performed on patients at the University of Virginia. Results: A total of 175 patients with MG were enrolled at 11 sites from July 1, 2008, to January 31, 2009. A total of 151 patients were seen in follow-up. Total MGC scores showed excellent concurrent validity with other MG-specific scales. Analyses of sensitivities and specificities of the MGC revealed that a 3-point improvement in total MGC score was optimal for signifying clinical improvement. A 3-point improvement in the MGC also appears to represent a meaningful improvement to most patients, as indicated by improved 15-item myasthenia gravis quality of life scale (MG-QOL15) scores. The psychometric properties were no better for an individualized subscore made up of the 2 functional domains that the patient identified as most important to treat. The test-retest reliability coefficient of the MGC was 98%, with a lower 95% confidence interval of 97%, indicating excellent test-retest reliability. Conclusions: The Myasthenia Gravis Composite is a reliable and valid instrument for measuring clinical status of patients with myasthenia gravis in the practice setting and in clinical trials. PMID:20439845
McCarthy, Julie M; Van Iddekinge, Chad H; Lievens, Filip; Kung, Mei-Chuan; Sinar, Evan F; Campion, Michael A
2013-09-01
Considerable evidence suggests that how candidates react to selection procedures can affect their test performance and their attitudes toward the hiring organization (e.g., recommending the firm to others). However, very few studies of candidate reactions have examined one of the outcomes organizations care most about: job performance. We attempt to address this gap by developing and testing a conceptual framework that delineates whether and how candidate reactions might influence job performance. We accomplish this objective using data from 4 studies (total N = 6,480), 6 selection procedures (personality tests, job knowledge tests, cognitive ability tests, work samples, situational judgment tests, and a selection inventory), 5 key candidate reactions (anxiety, motivation, belief in tests, self-efficacy, and procedural justice), 2 contexts (industry and education), 3 continents (North America, South America, and Europe), 2 study designs (predictive and concurrent), and 4 occupational areas (medical, sales, customer service, and technological). Consistent with previous research, candidate reactions were related to test scores, and test scores were related to job performance. Further, there was some evidence that reactions affected performance indirectly through their influence on test scores. Finally, in no cases did candidate reactions affect the prediction of job performance by increasing or decreasing the criterion-related validity of test scores. Implications of these findings and avenues for future research are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved
Forbey, Johnathan D; Lee, Tayla T C; Ben-Porath, Yossef S; Arbisi, Paul A; Gartland, Diane
2013-08-01
The current study explored associations between two potentially invalidating self-report styles detected by the Validity scales of the Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF), over-reporting and under-reporting, and scores on the MMPI-2-RF substantive, as well as eight collateral self-report measures administered either at the same time or within 1 to 10 days of MMPI-2-RF administration. Analyses were conducted with data provided by college students, male prisoners, and male psychiatric outpatients from a Veterans Administration facility. Results indicated that if either an over- or under-reporting response style was suggested by the MMPI-2-RF Validity scales, scores on the majority of the MMPI-2-RF substantive scales, as well as a number of collateral measures, were significantly affected in all three groups in the expected directions. Test takers who were identified as potentially engaging in an over- or under-reporting response style by the MMPI-2-RF Validity scales appeared to approach extra-test measures similarly regardless of when these measures were administered in relation to the MMPI-2-RF. Limitations and suggestions for future study are discussed.
Further validation of the Internet-based Dementia Risk Assessment.
Brandt, Jason; Blehar, Justin; Anderson, Allan; Gross, Alden L
2014-01-01
Most approaches to the detection of presymptomatic or prodromal Alzheimer's disease require the costly collection and analysis of biological samples or neuroimaging measurements. The Dementia Risk Assessment (DRA) was developed to facilitate this detection by collecting self-report and proxy-report of dementia risk variables and episodic memory performance on a free Internet website. We now report two validation studies. In Study 1, 130 community-residing older adults seeking memory screening at senior health fairs were tested using the Mini-Cog, and were then observed while taking the DRA. They were compared to a demographically-matched subsample from our anonymous Internet sample. Participants seeking memory screening had more dementia risk factors and obtained lower scores on the DRA's recognition memory test (RMT) than their Internet controls. In addition, those who failed the Mini-Cog obtained much lower scores on the RMT than those who passed the Mini-Cog. In Study 2, 160 older adults seeking evaluation of cognitive difficulties took the DRA prior to diagnostic evaluations at outpatient dementia clinics. Patients who ultimately received the diagnosis of a dementia syndrome scored significantly lower on the RMT than those diagnosed with other conditions or deemed normal. Lower education, family history of dementia, presence of hypercholesterolemia and diabetes, and memory test score distinguished the dementia and no-dementia groups with around 82% accuracy. In addition, score on the RMT correlated highly with scores on other instruments widely used to detect cognitive decline. These findings support the concurrent validity of the DRA for detecting prevalent cognitive impairment. Prospective studies of cognitively normal persons who subsequently develop dementia will be necessary to establish its predictive validity.
Cannon, Joanna E; Hubley, Anita M; Millhoff, Courtney; Mazlouman, Shahla
2016-01-01
The aim of the current study was to gather validation evidence for the Comprehension of Written Grammar (CWG; Easterbrooks, 2010) receptive test of 26 grammatical structures of English print for use with children who are deaf and hard of hearing (DHH). Reliability and validity data were collected for 98 participants (49 DHH and 49 hearing) in Grades 2-6. The objectives were to: (a) examine 4-week test-retest reliability data; and (b) provide evidence of known-groups validity by examining expected differences between the groups on the CWG vocabulary pretest and main test, as well as selected structures. Results indicated excellent test-retest reliability estimates for CWG test scores. DHH participants performed statistically significantly lower on the CWG vocabulary pretest and main test than the hearing participants. Significantly lower performance by DHH participants on most expected grammatical structures (e.g., basic sentence patterns, auxiliary "be" singular/plural forms, tense, comparatives, and complementation) also provided known groups evidence. Overall, the findings of this study showed strong evidence of the reliability of scores and known group-based validity of inferences made from the CWG. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Reliability and validity of the Parenting Scale of Inconsistency.
Yoshizumi, Takahiro; Murase, Satomi; Murakami, Takashi; Takai, Jiro
2006-08-01
The purposes of the present study were to develop a Parenting Scale of Inconsistency and to evaluate its initial reliability and validity. The 12 items assess the inconsistency among parents' moods, behaviors, and attitudes toward children. In the primary study, 517 participants completed three measures: the new Parenting Scale of Inconsistency, the Parental Bonding Instrument, and the Depression Scale of the General Health Questionnaire. The Parenting Scale of Inconsistency had good test-retest reliability of .85 and internal consistency of .88 (Cronbach coefficient alpha). Construct validity was good as Inconsistency scores were significantly correlated with the Care and Overprotection scores of the Parental Bonding Instrument and with the Depression scores. Moreover, Inconsistency scores' relation with a dimension of parenting style distinct from Care and Overprotection suggested that the Parenting Scale of Inconsistency had factorial validity. This scale seems a potential measure for examining the relationships between inconsistent parenting and the mental health of children.
The Vocal Cord Dysfunction Questionnaire: Validity and Reliability of the Persian Version.
Ghaemi, Hamide; Khoddami, Seyyedeh Maryam; Soleymani, Zahra; Zandieh, Fariborz; Jalaie, Shohreh; Ahanchian, Hamid; Khadivi, Ehsan
2017-12-25
The aim of this study was to develop, validate, and assess the reliability of the Persian version of the Vocal Cord Dysfunction Questionnaire (VCDQP). The study design was a cross-sectional or cultural survey. Forty-four patients with vocal fold dysfunction (VFD) and 40 healthy volunteers were recruited for the study. To assess content validity, the prefinal questions were given to 15 experts to comment on their essentiality. Ten patients with VFD rated the importance of the VCDQP to assess face validity. Eighteen of the patients with VFD completed the VCDQ 1 week later for test-retest reliability. To detect absolute reliability, the standard error of measurement and smallest detectable change were calculated. Concurrent validity was assessed by having 34 patients with VFD complete the Persian Chronic Obstructive Pulmonary Disease (COPD) Assessment Test (CAT). Discriminant validity was measured from 34 participants. The VCDQ was further validated by administering the questionnaire to 40 healthy volunteers. Validation of the VCDQ as a treatment outcome tool was conducted in 18 patients with VFD using pre- and posttreatment scores. The internal consistency was confirmed (Cronbach α = 0.78). The test-retest reliability was excellent (intraclass correlation coefficient = 0.97). The standard error of measurement and smallest detectable change values were acceptable (0.39 and 1.08, respectively). There was a significant correlation between the VCDQP and the CAT total scores (P < 0.05). The discriminative validity analysis showed a significant difference. The VCDQ scores in patients with VFD before and after treatment were significantly different (P < 0.001). The VCDQ was cross-culturally adapted to Persian and demonstrated to be a valid and reliable self-administered questionnaire in the Persian-speaking population. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
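The abstract above reports a standard error of measurement (SEM) of 0.39 and a smallest detectable change (SDC) of 1.08 without stating how they were computed. As a hedged illustration, the conventional definitions are sketched below; they are an assumption here, not a formula quoted from the study, and SD denotes the standard deviation of the scores. The two reported values are at least consistent with the second relation.

```latex
\begin{align*}
\mathrm{SEM} &= \mathrm{SD}\,\sqrt{1 - \mathrm{ICC}} \\
\mathrm{SDC}_{95} &= 1.96 \times \sqrt{2} \times \mathrm{SEM}
  = 1.96 \times 1.414 \times 0.39 \approx 1.08
\end{align*}
```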
Lange, Rael T; Brickell, Tracey A; Lippa, Sara M; French, Louis M
2015-01-01
The purpose of this study was to examine the clinical utility of three recently developed validity scales (Validity-10, NIM5, and LOW6) designed to screen for symptom exaggeration using the Neurobehavioral Symptom Inventory (NSI). Participants were 272 U.S. military service members who sustained a mild, moderate, severe, or penetrating traumatic brain injury (TBI) and who were evaluated by the neuropsychology service at Walter Reed Army Medical Center within 199 weeks post injury. Participants were divided into two groups based on the Negative Impression Management scale of the Personality Assessment Inventory: (a) those who failed symptom validity testing (SVT-fail; n = 27) and (b) those who passed symptom validity testing (SVT-pass; n = 245). Participants in the SVT-fail group had significantly higher scores (p < .001) on the Validity-10, NIM5, LOW6, NSI total, and Personality Assessment Inventory (PAI) clinical scales (range: d = 0.76 to 2.34). Similarly high sensitivity, specificity, positive predictive power (PPP), and negative predictive power (NPP) values were found when using all three validity scales to differentiate SVT-fail versus SVT-pass groups. However, the Validity-10 scale consistently had the highest overall values. The optimal cutoff score for the Validity-10 scale to identify possible symptom exaggeration was ≥19 (sensitivity = .59, specificity = .89, PPP = .74, NPP = .80). For the majority of people, these findings provide support for the use of the Validity-10 scale as a screening tool for possible symptom exaggeration. When scores on the Validity-10 exceed the cutoff score, it is recommended that (a) researchers and clinicians do not interpret responses on the NSI, and (b) clinicians follow up with a more detailed evaluation, using well-validated symptom validity measures (e.g., Minnesota Multiphasic Personality Inventory-2 Restructured Form, MMPI-2-RF, validity scales), to seek confirmatory evidence to support a hypothesis of symptom exaggeration.
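The four indices reported above (sensitivity, specificity, PPP, NPP) all follow from a 2×2 table formed by applying a cutoff to a screening score. The sketch below is only an illustration of those definitions: the function names and the toy scores are hypothetical, and no data from the study are reproduced.

```python
def counts_at_cutoff(scores, is_positive, cutoff):
    """Tally a 2x2 table, flagging every score >= cutoff as a screen-positive."""
    tp = fp = fn = tn = 0
    for score, positive in zip(scores, is_positive):
        flagged = score >= cutoff
        if flagged and positive:
            tp += 1
        elif flagged and not positive:
            fp += 1
        elif not flagged and positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def screening_metrics(tp, fp, fn, tn):
    """Standard definitions of the four indices from the 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # criterion-positive cases that are flagged
        "specificity": tn / (tn + fp),  # criterion-negative cases that are not flagged
        "ppp": tp / (tp + fp),          # flagged cases that are criterion-positive
        "npp": tn / (tn + fn),          # unflagged cases that are criterion-negative
    }


# Hypothetical toy data (NOT from the study): six scores and criterion status.
toy_scores = [25, 20, 19, 10, 18, 5]
toy_failed = [True, True, False, True, False, False]
toy_counts = counts_at_cutoff(toy_scores, toy_failed, cutoff=19)
toy_metrics = screening_metrics(*toy_counts)
```

PPP and NPP depend on the base rate of the criterion condition in the sample, so values computed in one setting do not transfer automatically to a population with a different prevalence.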
ERIC Educational Resources Information Center
Weigle, Sara Cushing
2011-01-01
Automated scoring has the potential to dramatically reduce the time and costs associated with the assessment of complex skills such as writing, but its use must be validated against a variety of criteria for it to be accepted by test users and stakeholders. This study addresses two validity-related issues regarding the use of e-rater® with the…
Zhang, Ying-Li; Liang, Wei; Chen, Zuo-Ming; Zhang, Hong-Mei; Zhang, Jian-Hong; Weng, Xiao-Qin; Yang, Shi-Chang; Zhang, Lei; Shen, Li-Juan; Zhang, Ya-Lin
2013-12-01
This study examined the validity and reliability of the Patient Health Questionnaire-9 (PHQ-9) and Patient Health Questionnaire-2 (PHQ-2). The optimal cutoff score when screening for depression among Chinese college students was also determined. A total of 959 participants completed the PHQ-9 and the Beck Depression Inventory (BDI) questionnaire. The Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders was used to diagnose depression. Statistical tests were performed to determine the reliability, validity, and receiver operating characteristic curve of the data. The concurrent validity was tested by examining associations between PHQ-9 and BDI. The sensitivity and specificity, as well as the positive and negative predictive values, were calculated for different cutoff scores of PHQ-9 and PHQ-2. The internal consistency values of PHQ-9 and PHQ-2 were 0.854 and 0.727, respectively. The test-retest reliability values of PHQ-9 and PHQ-2 were 0.873 and 0.829, respectively. The scores of PHQ-9 (r = 0.790) and PHQ-2 (r = 0.651) were significantly associated with that of BDI. PHQ-9 had an optimal cutoff score of 11, which indicated a sensitivity of 0.89 and a specificity of 0.97, with an area under the curve of 0.977 (95% confidence interval: 0.966-0.988). The PHQ-2 demonstrated satisfactory sensitivity (0.81) and specificity (0.96) at the cutoff score of 3, and its area under the curve was 0.939. The PHQ-9 and the PHQ-2 are valid and reliable tools to screen depression in Chinese college students. For screening purposes, cutoff scores of 11 and 3 are recommended for PHQ-9 and PHQ-2, respectively. Copyright © 2013 Wiley Publishing Asia Pty Ltd.
Allen Gomes, Ana; Ruivo Marques, Daniel; Meia-Via, Ana Maria; Meia-Via, Mariana; Tavares, José; Fernandes da Silva, Carlos; Pinto de Azevedo, Maria Helena
2015-04-01
Based on successive samples totaling more than 5000 higher education students, we scrutinized the reliability, structure, initial validity and normative scores of a brief self-report seven-item scale to screen for the continuum of nighttime insomnia complaints/perceived sleep quality, used by our team for more than a decade, henceforth labeled the Basic Scale on Insomnia complaints and Quality of Sleep (BaSIQS). In study/sample 1 (n = 1654), the items were developed based on part of a larger survey on higher education sleep-wake patterns. The test-retest study was conducted in an independent small group (n = 33) with a 2-8 week gap. In study/sample 2 (n = 360), focused mainly on validity, the BaSIQS was completed together with the Pittsburgh Sleep Quality Index (PSQI). In study 3, a large recent sample of students from universities all over the country (n = 2995) answered the BaSIQS items, based on which normative scores were determined, and an additional question on perceived sleep problems in order to further analyze the scale's validity. Regarding reliability, Cronbach alpha coefficients were systematically higher than 0.7, and the test-retest correlation coefficient was greater than 0.8. Structure analyses revealed consistently satisfactory two-factor and single-factor solutions. Concerning validity analyses, BaSIQS scores were significantly correlated with PSQI component scores and overall score (r = 0.652 corresponding to a large association); mean scores were significantly higher in those students classifying themselves as having sleep problems (p < 0.0001, d = 0.99 corresponding to a large effect size). In conclusion, the BaSIQS is very easy to administer, and appears to be a reliable and valid scale in higher education students. It might be a convenient short tool in research and applied settings to rapidly assess sleep quality or screen for insomnia complaints, and it may be easily used in other populations with minor adaptations.
Harris, Joshua D; Erickson, Brandon J; Cvetanovich, Gregory L; Abrams, Geoffrey D; McCormick, Frank M; Gupta, Anil K; Verma, Nikhil N; Bach, Bernard R; Cole, Brian J
2014-02-01
Condition-specific questionnaires are important components in evaluation of outcomes of surgical interventions. No condition-specific study methodological quality questionnaire exists for evaluation of outcomes of articular cartilage surgery in the knee. To develop a reliable and valid knee articular cartilage-specific study methodological quality questionnaire. Cross-sectional study. A stepwise, a priori-designed framework was created for development of a novel questionnaire. Relevant items to the topic were identified and extracted from a recent systematic review of 194 investigations of knee articular cartilage surgery. In addition, relevant items from existing generic study methodological quality questionnaires were identified. Items for a preliminary questionnaire were generated. Redundant and irrelevant items were eliminated, and acceptable items modified. The instrument was pretested and items weighed. The instrument, the MARK score (Methodological quality of ARticular cartilage studies of the Knee), was tested for validity (criterion validity) and reliability (inter- and intraobserver). A 19-item, 3-domain MARK score was developed. The 100-point scale score demonstrated face validity (focus group of 8 orthopaedic surgeons) and criterion validity (strong correlation to Cochrane Quality Assessment score and Modified Coleman Methodology Score). Interobserver reliability for the overall score was good (intraclass correlation coefficient [ICC], 0.842), and for all individual items of the MARK score, acceptable to perfect (ICC, 0.70-1.000). Intraobserver reliability ICC assessed over a 3-week interval was strong for 2 reviewers (≥0.90). The MARK score is a valid and reliable knee articular cartilage condition-specific study methodological quality instrument. This condition-specific questionnaire may be used to evaluate the quality of studies reporting outcomes of articular cartilage surgery in the knee.
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda.
Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert
2008-12-02
The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson Correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda.
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda
Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert
2008-01-01
Background The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. Methods A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson Correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. Results The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. Conclusion This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda. PMID:19055716
Validity and reliability of Abbreviated Mental Test Score (AMTS) among older Iranian.
Foroughan, Mahshid; Wahlund, Lars-Olof; Jafari, Zahra; Rahgozar, Mehdi; Farahani, Ida G; Rashedi, Vahid
2017-11-01
Cognitive impairment is common among older people and is associated with increased morbidity and mortality. The main aim of this study was to evaluate the validity of the Persian version of the Abbreviated Mental Test Score (AMTS) as a screening tool for dementia. Data were obtained from a cross-sectional study. One hundred and one older adults who were members of the Iranian Alzheimer Association and 101 of their siblings were entered into this study by convenience sampling. The Diagnostic and Statistical Manual of Mental Disorders, 4th edition, criteria for diagnosing dementia and the Mini-Mental State Examination were used as the study tools. The gathered data were analyzed by the Mann-Whitney U-test, the Kruskal-Wallis test, Spearman's rank correlation coefficient, and the receiver-operating characteristic. The AMTS could successfully differentiate the dementia group from the non-dementia group. Scores were significantly correlated with Diagnostic and Statistical Manual of Mental Disorders diagnosis for dementia and Mini-Mental State Examination scores (P < 0.001). Educational level (P < 0.001) and male sex (P = 0.015) were positively associated with AMTS, whereas age (P < 0.001) was negatively associated with AMTS. Total Cronbach's α coefficient was 0.90. The scores 6 and 7 showed the optimum balance between sensitivity (99% and 94%, respectively) and specificity (85% and 86%, respectively). The Persian version of the AMTS is a valid cognitive assessment tool for older Iranian adults and can be used for dementia screening in Iran. © 2017 Japanese Psychogeriatric Society.
Hermassi, Souhail; Chelly, Mohamed-Souhaiel; Wollny, Rainer; Hoffmeyer, Birgit; Fieseler, Georg; Schulze, Stephan; Irlenbusch, Lars; Delank, Karl-Stefan; Shephard, Roy J; Bartels, Thomas; Schwesig, René
2018-06-01
This study assessed the validity of the handball-specific complex test (HBCT) and two non-specific field tests in professional elite handball athletes, using the match performance score (MPS) as the gold standard of performance. Thirteen elite male handball players (age: 27.4±4.8 years; premier German league) performed the HBCT, the Yo-Yo Intermittent Recovery (YYIR) test and a repeated shuttle sprint ability (RSA) test at the beginning of pre-season training. The RSA results were evaluated in terms of best time, total time, and fatigue decrement. Heart rates (HR) were assessed at selected times throughout all tests; the recovery HR was measured immediately post-test and 10 minutes later. The match performance score was based on various handball specific parameters (e.g., field goals, assists, steals, blocks, and technical mistakes) as seen during all matches of the immediately subsequent season (2015/2016). The parameters of run 1, run 2, and HR recovery at minutes 6 and 10 of the RSA test all showed a variance of more than 10% (range: 11-15%). However, the variance of scores for the YYIR test was much smaller (range: 1-7%). The resting HR (r2=0.18), HR recovery at minute 10 (r2=0.10), lactate concentration at rest (r2=0.17), recovery of heart rate from 0 to 10 minutes (r2=0.15), and velocity of second throw at first trial (r2=0.37) were the most valid HBCT parameters. Much effort is necessary to assess MPS and to develop valid tests. Speed and the rate of functional recovery seem the best predictors of competitive performance for elite handball players.
Establishing Reliability and Validity of the Criterion Referenced Exam of GeoloGy Standards EGGS
NASA Astrophysics Data System (ADS)
Guffey, S. K.; Slater, S. J.; Slater, T. F.; Schleigh, S.; Burrows, A. C.
2016-12-01
Discipline-based geoscience education researchers have considerable need for a criterion-referenced, easy-to-administer and easy-to-score conceptual diagnostic survey for undergraduates taking introductory science survey courses, so that faculty can better monitor the learning impacts of various interactive teaching approaches. To support ongoing education research across the geosciences, we are continuing to work rigorously and systematically to firmly establish the reliability and validity of the recently released Exam of GeoloGy Standards, EGGS. In educational testing, reliability refers to the consistency or stability of test scores, whereas validity refers to the accuracy of the inferences or interpretations one makes from test scores. There are several types of reliability measures being applied to the iterative refinement of the EGGS survey, including test-retest, alternate form, split-half, internal consistency, and interrater reliability measures. EGGS rates strongly on most measures of reliability. For one, Cronbach's alpha provides a quantitative index indicating the extent to which students are answering items consistently throughout the test, and it measures inter-item correlations. Traditional item analysis methods, including item difficulty and item discrimination, further quantify the degree to which a particular item is reliably assessing students. Validity, on the other hand, is perhaps best described by the word accuracy. For example, content validity is the extent to which a measurement reflects the specific intended domain of the content, stemming from judgments of people who are either experts in the testing of that particular content area or are content experts. Perhaps more importantly, face validity is a judgement of how well an instrument reflects the science "at face value" and refers to the extent to which a test appears to measure the targeted scientific domain as viewed by laypersons, examinees, test users, the public, and other invested stakeholders.
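Two of the quantities named in this abstract, Cronbach's alpha and item difficulty, have simple closed forms. The sketch below shows the textbook calculations on a hypothetical matrix of dichotomously scored responses; the function names and the toy data are illustrative assumptions and have nothing to do with actual EGGS results.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per student).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])              # number of items
    items = list(zip(*responses))      # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def item_difficulty(responses):
    """Proportion of students answering each item correctly (0/1 scoring)."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]


# Hypothetical toy data: 4 students x 3 items, 1 = correct, 0 = incorrect.
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
toy_alpha = cronbach_alpha(toy_responses)
toy_difficulty = item_difficulty(toy_responses)
```

Sample variances (n - 1 denominator) are used for both the items and the totals; the ratio inside alpha is unchanged as long as the same denominator is used for both.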
A Statistical Analysis of Data Used in Critical Decision Making by Secondary School Personnel.
ERIC Educational Resources Information Center
Dunn, Charleta J.; Kowitz, Gerald T.
Guidance decisions depend on the validity of standardized tests and teacher judgment records as measures of student achievement. To test this validity, a sample of 400 high school juniors, randomly selected from two large Gulf Coast area schools, were administered the Iowa Tests of Educational Development. The nine subtest scores and each…
Validation of the Seating and Mobility Script Concordance Test
ERIC Educational Resources Information Center
Cohen, Laura J.; Fitzgerald, Shirley G.; Lane, Suzanne; Boninger, Michael L.; Minkel, Jean; McCue, Michael
2009-01-01
The purpose of this study was to develop the scoring system for the Seating and Mobility Script Concordance Test (SMSCT), obtain and appraise internal and external structure evidence, and assess the validity of the SMSCT. The SMSCT purpose is to provide a method for testing knowledge of seating and mobility prescription. A sample of 106 therapists…
Confirmatory Factor Analysis of the TerraNova Comprehensive Tests of Basic Skills/5
ERIC Educational Resources Information Center
Stevens, Joseph J.; Zvoch, Keith
2007-01-01
Confirmatory factor analysis was used to explore the internal validity of scores on the TerraNova Comprehensive Tests of Basic Skills/5 using samples from a southwestern school district and standardization samples reported by the publisher. One of the strengths claimed for battery-type achievement tests is provision of reliable and valid samples…
Risk score to predict gastrointestinal bleeding after acute ischemic stroke.
Ji, Ruijun; Shen, Haipeng; Pan, Yuesong; Wang, Penglian; Liu, Gaifen; Wang, Yilong; Li, Hao; Singhal, Aneesh B; Wang, Yongjun
2014-07-25
Gastrointestinal bleeding (GIB) is a common and often serious complication after stroke. Although several risk factors for post-stroke GIB have been identified, no reliable or validated scoring system is currently available to predict GIB after acute stroke in routine clinical practice or clinical trials. In the present study, we aimed to develop and validate a risk model (acute ischemic stroke associated gastrointestinal bleeding score, the AIS-GIB score) to predict in-hospital GIB after acute ischemic stroke. The AIS-GIB score was developed from data in the China National Stroke Registry (CNSR). Eligible patients in the CNSR were randomly divided into derivation (60%) and internal validation (40%) cohorts. External validation was performed using data from the prospective Chinese Intracranial Atherosclerosis Study (CICAS). Independent predictors of in-hospital GIB were obtained using multivariable logistic regression in the derivation cohort, and β-coefficients were used to generate a point scoring system for the AIS-GIB. The area under the receiver operating characteristic curve (AUROC) and the Hosmer-Lemeshow goodness-of-fit test were used to assess model discrimination and calibration, respectively. A total of 8,820, 5,882, and 2,938 patients were enrolled in the derivation, internal validation, and external validation cohorts, respectively. The overall rate of in-hospital GIB after AIS was 2.6%, 2.3%, and 1.5% in the derivation, internal validation, and external validation cohorts, respectively. An 18-point AIS-GIB score was developed from the set of independent predictors of GIB including age, gender, history of hypertension, hepatic cirrhosis, peptic ulcer or previous GIB, pre-stroke dependence, admission National Institutes of Health stroke scale score, Glasgow Coma Scale score and stroke subtype (Oxfordshire). The AIS-GIB score showed good discrimination in the derivation (0.79; 95% CI, 0.764-0.825), internal (0.78; 95% CI, 0.74-0.82) and external (0.76; 95% CI, 0.71-0.82) validation cohorts. The AIS-GIB score was well calibrated in the derivation (P = 0.42), internal (P = 0.45) and external (P = 0.86) validation cohorts. The AIS-GIB score is a valid clinical grading scale to predict in-hospital GIB after AIS. Further studies on the effect of the AIS-GIB score on reducing GIB and improving outcome after AIS are warranted.
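Discrimination in this abstract is summarized by the AUROC. One way to read that number: it is the probability that a randomly chosen patient with the outcome has a higher risk score than a randomly chosen patient without it, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are hypothetical and no registry data are reproduced. The pairwise loop is quadratic in sample size, so it suits small illustrations rather than cohorts of thousands.

```python
def auroc(scores, outcomes):
    """AUROC as the fraction of (event, non-event) pairs ranked correctly.

    Ties between an event score and a non-event score count as half a win,
    which matches the Mann-Whitney U formulation of the area under the curve.
    """
    events = [s for s, y in zip(scores, outcomes) if y]
    non_events = [s for s, y in zip(scores, outcomes) if not y]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


# Hypothetical toy data: point scores and whether the outcome occurred.
toy_points = [3, 5, 5, 8]
toy_outcome = [0, 1, 0, 1]
toy_auc = auroc(toy_points, toy_outcome)
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.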
Böttcher, B; Fessler, S; Friedl, F; Toth, B; Walter, M H; Wildt, L; Riedl, D
2018-04-01
Patients with polycystic ovary syndrome (PCOS) report a decreased health-related quality of life (HRQOL) and higher levels of psychological distress. Validated questionnaires are necessary to assess the impact of PCOS on patients' lives. The aim of the present study was to evaluate the German "Polycystic Ovary Syndrome Questionnaire" (PCOSQ-G). The psychometric properties of the PCOSQ-G were investigated in PCOS patients with item-total correlation, internal consistency and test-retest reliability. Correlations with the Short-Form-36 Health Survey (SF-36) and the Hospital Anxiety and Depression Scale (HADS-D) were calculated to evaluate the validity of the PCOSQ-G. Discriminatory validity was investigated through a receiver operating characteristic curve and independent sample t tests compared with healthy controls. Good psychometric properties were found for most items. Acceptable to high internal consistency was found for the total score (α = 0.94-0.95) and all subscales (α = 0.70-0.97). High test-retest reliability was found for the total score (0.86) and all subscales (0.81-0.90). The validity analyses showed that the PCOSQ-G total score was positively correlated with both SF-36 summary scales and was negatively correlated with both HADS subscales. Patients reported significantly lower values for the PCOSQ-G total score (p < 0.001) and all subscales, and the PCOSQ-G discriminated well between patients and healthy controls (AUC = 0.81, p < 0.001). PCOSQ-G is a reliable and valid tool to assess the HRQOL in patients with PCOS and can be used in future clinical research. Patients with PCOS exhibited an impaired HRQOL, which indicates the need for psychosomatic counseling.
Giesinger, Johannes M; Kieffer, Jacobien M; Fayers, Peter M; Groenvold, Mogens; Petersen, Morten Aa; Scott, Neil W; Sprangers, Mirjam A G; Velikova, Galina; Aaronson, Neil K
2016-01-01
To further evaluate the higher order measurement structure of the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Core 30 (QLQ-C30), with the aim of generating a summary score. Using pretreatment QLQ-C30 data (N = 3,282), we conducted confirmatory factor analyses to test seven previously evaluated higher order models. We compared the summary score(s) derived from the best performing higher order model with the original QLQ-C30 scale scores, using tumor stage, performance status, and change over time (N = 244) as grouping variables. Although all models showed acceptable fit, we continued in the interest of parsimony with known-groups validity and responsiveness analyses using a summary score derived from the single higher order factor model. The validity and responsiveness of this QLQ-C30 summary score was equal to, and in many cases superior to the original, underlying QLQ-C30 scale scores. Our results provide empirical support for a measurement model for the QLQ-C30 yielding a single summary score. The availability of this summary score can avoid problems with potential type I errors that arise because of multiple testing when making comparisons based on the 15 outcomes generated by this questionnaire and may reduce sample size requirements for health-related quality of life studies using the QLQ-C30 questionnaire when an overall summary score is a relevant primary outcome. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Verheijde, Joseph L; White, Fred; Tompkins, James; Dahl, Peder; Hentz, Joseph G; Lebec, Michael T; Cornwall, Mark
2013-12-01
To investigate reliability, validity, and sensitivity to change of the Lower Extremity Functional Scale (LEFS) in individuals affected by stroke. The secondary objective was to test the validity and sensitivity of a single-item linear analog scale (LAS) of function. Prospective cohort reliability and validation study. A single rehabilitation department in an academic medical center. Forty-three individuals receiving neurorehabilitation for lower extremity dysfunction after stroke were studied. Their ages ranged from 32 to 95 years, with a mean of 70 years; 77% were men. Test-retest reliability was assessed by calculating the classical intraclass correlation coefficient, and the Bland-Altman limits of agreement. Validity was assessed by calculating the Pearson correlation coefficient between the instruments. Sensitivity to change was assessed by comparing baseline scores with end of treatment scores. Measurements were taken at baseline, after 1-3 days, and at 4 and 8 weeks. The LEFS, Short-Form-36 Physical Function Scale, Berg Balance Scale, Six-Minute Walk Test, Five-Meter Walk Test, Timed Up-and-Go test, and the LAS of function were used. The test-retest reliability of the LEFS was found to be excellent (ICC = 0.96). Correlated with the 6 other measures of function studied, the validity of the LEFS was found to be moderate to high (r = 0.40-0.71). Regarding the sensitivity to change, the mean LEFS scores from baseline to study end increased 1.2 SD and for LAS 1.1 SD. LEFS exhibits good reliability, validity, and sensitivity to change in patients with lower extremity impairments secondary to stroke. Therefore, the LEFS can be a clinically efficient outcome measure in the rehabilitation of patients with subacute stroke. The LAS is shown to be a time-saving and reasonable option to track changes in a patient's functional status. Copyright © 2013 American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
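The Bland-Altman limits of agreement mentioned above are conventionally the mean test-retest difference plus or minus 1.96 standard deviations of the differences. A minimal sketch follows; the scores are invented for illustration and are not LEFS data.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the differences
    return bias, bias - spread, bias + spread


# Hypothetical test and retest scores for four patients.
test_scores = [10, 12, 14, 16]
retest_scores = [11, 11, 15, 17]
bias, lower, upper = limits_of_agreement(test_scores, retest_scores)
print(bias, round(lower, 2), round(upper, 2))
```

The bias shows any systematic shift between sessions, while the width of the interval shows how far an individual retest score can be expected to stray from the first measurement.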
Gärtner, Fania R; de Miranda, Esteriek; Rijnders, Marlies E; Freeman, Liv M; Middeldorp, Johanna M; Bloemenkamp, Kitty W M; Stiggelbout, Anne M; van den Akker-van Marle, M Elske
2015-10-01
To validate the Labor and Delivery Index (LADY-X), a new delivery-specific utility measure. In a test-retest design, women were surveyed online, 6 to 8 weeks postpartum and again 1 to 2 weeks later. For reliability testing, we assessed the standard error of measurement (S.E.M.) and the intraclass correlation coefficient (ICC). For construct validity, we tested hypotheses on the association with comparison instruments (Mackey Childbirth Satisfaction Rating Scale and Wijma Delivery Experience Questionnaire), both on domain and total score levels. We assessed known-group differences using eight obstetrical indicators: method and place of birth, induction, transfer, control over pain medication, complications concerning mother and child, and experienced control. The questionnaire was completed by 308 women, of whom 257 (83%) completed the retest. The distribution of LADY-X scores was skewed. The reliability was good, as the ICC exceeded 0.80 and the S.E.M. was 0.76. Requirements for good construct validity were fulfilled: all hypotheses for convergent and divergent validity were confirmed, and six of eight hypotheses for known-group differences were confirmed. All known-group differences were statistically significant (P-values: <0.001-0.023), but for two tests the difference scores did not exceed the S.E.M. The LADY-X demonstrates good reliability and construct validity. Despite its skewed distribution, the LADY-X can discriminate between groups. With the preference weights available, the LADY-X might fulfill the need for a utility measure for cost-effectiveness studies for perinatal care interventions. Copyright © 2015 Elsevier Inc. All rights reserved.
The Recognition Memory Test: Examination of ethnic differences and norm validity.
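The abstract above reports both an ICC and a standard error of measurement. One common way to link the two is S.E.M. = SD × √(1 − ICC); the abstract does not say which formula the authors used, so the sketch below is an assumption, and the input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """S.E.M. from the score SD and a reliability coefficient (assumed formula)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


# Hypothetical inputs: score SD of 2.0 and reliability of 0.75.
sem = standard_error_of_measurement(2.0, 0.75)
print(sem)
```

A between-group difference smaller than this value is hard to distinguish from measurement noise, which is the criterion the abstract applies to its known-group hypotheses.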
O'Bryant, Sid E; Hilsabeck, Robin C; McCaffrey, Robert J; Drew Gouvier, Wm
2003-03-01
The possibility of racial bias in neuropsychological test materials has received increasing attention in recent years. The purpose of the present study was to investigate whether an own-race recognition bias would provide an advantage for Caucasian participants over African American participants on the Faces subtest of the Recognition Memory Test (RMT). Thirty Caucasian and 30 African American undergraduates completed the RMT, Shipley Institute of Living Scale (SILS), and Symbol Digit Modalities Test (SDMT). No significant group difference was found on RMT Faces. However, mean RMT Faces scores for both groups were below the 10th percentile in spite of average scores on the SDMT and SILS. A second study was conducted to further examine the validity of the RMT norms for this age range (i.e., 18-24) and to provide 2-week test-retest reliabilities. The mean RMT Faces subtest score was 39.78 (10th percentile), and 28% of the sample scored at or below the fifth percentile. Test-retest reliabilities were .63 and .64 for RMT Words and Faces, respectively. Results of these studies suggest that re-examination of the current norms for RMT Faces is warranted for adults aged.
Dwyer, Tim; Takahashi, Susan Glover; Hynes, Melissa Kennedy; Herold, Jodi; Wasserstein, David; Nousiainen, Markku; Ferguson, Peter; Wadey, Veronica; Murnaghan, M. Lucas; Leroux, Tim; Semple, John; Hodges, Brian; Ogilvie-Harris, Darrell
2014-01-01
Background: Assessing residents’ understanding and application of the 6 intrinsic CanMEDS roles (communicator, professional, manager, collaborator, health advocate, scholar) is challenging for postgraduate medical educators. We hypothesized that an objective structured clinical examination (OSCE) designed to assess multiple intrinsic CanMEDS roles would be sufficiently reliable and valid. Methods: The OSCE comprised 6 10-minute stations, each testing 2 intrinsic roles using case-based scenarios (with or without the use of standardized patients). Residents were evaluated using 5-point scales and an overall performance rating at each station. Concurrent validity was sought by correlation with in-training evaluation reports (ITERs) from the last 12 months and an ordinal ranking created by program directors (PDs). Results: Twenty-five residents from postgraduate years (PGY) 0, 3 and 5 participated. The interstation reliability for total test scores (percent) was 0.87, while reliability for each of the communicator, collaborator, manager and professional roles was greater than 0.8. Total test scores, individual station scores and individual CanMEDS role scores all showed a significant effect by PGY level. Analysis of the PD rankings of intrinsic roles demonstrated a high correlation with the OSCE role scores. A correlation was seen between ITER and OSCE for the communicator role, while the ITER medical expert and total scores highly correlated with the communicator, manager and professional OSCE scores. Conclusion: An OSCE designed to assess the intrinsic CanMEDS roles was sufficiently valid and reliable for regular use in an orthopedic residency program. PMID:25078926
Reliability and validity of the Chinese pediatric voice handicap index.
Liu, Kena; Liu, Shaofeng; Zhou, Zhou; Ren, Qinyi; Zhong, Jie; Luo, Renzhong; Qin, Huabiao; Zhang, Siyi; Ge, Pingjiang
2018-02-01
To evaluate the reliability and validity of the Chinese version of the pediatric voice handicap index (pVHI). The original English version of the pVHI was translated into Chinese. Parents of 52 children with dysphonia and 43 children with no history or symptoms of voice problems were asked to fill in the Chinese pVHI questionnaire twice, with an interval of 2 weeks. The GRB (Grade, Roughness, Breathiness) scale was used for perceptual assessment of each child's voice by two otolaryngologists and one speech pathologist. Internal consistency was assessed using Cronbach's alpha coefficient. Pearson's correlation coefficient was used to evaluate test-retest reliability. Kendall's coefficient of concordance W was used to assess the consistency of the GRB scores of the 3 voice specialists. The nonparametric Mann-Whitney test was used to assess the differences between the dysphonia group and controls. The correlation between pVHI and GRB scores was assessed using Pearson's correlation coefficient. The internal consistency of the total score and the three subscale scores of the Chinese pVHI was 0.788-0.944. The test-retest reliability was 0.631-0.887 (P < .001). The pVHI scores of the control group were significantly lower than those of the pathological group (P = .000). The GRB scores of the 3 voice specialists had excellent consistency (W = 0.694-0.807, P = .000). The pVHI scores correlated positively with the GRB assessment (P < .01). The Chinese version of the pVHI had good reliability and validity. It can be an applicable and useful supplementary tool for evaluating parents' perception of their children's dysphonia. Copyright © 2017. Published by Elsevier B.V.
Hackethal, A; Immenroth, M; Bürger, T
2006-04-01
The Minimally Invasive Surgical Trainer-Virtual Reality (MIST-VR) simulator is validated for laparoscopy training, but benchmarks and target scores for assessing single tasks are needed. Control data for the MIST-VR traversal task scenario were collected from 61 novices who performed the task 10 times over 3 days (1 h daily). Data were collected on the time taken, error score, economy of movement, and total score. Test differences were analyzed through percentage scores and t-tests for paired samples. Improvement was greatest over tests 1 to 5 (improvement: test(1.2), 38.07%; p = 0.000; test(4.5), 10.66%; p = 0.010); between tests 5 and 10, improvement slowed and scores stabilized. Variation in participants' performance fell steadily over the 10 tests. Trainees should perform at least 10 tests of the traversal task: five to get used to the equipment and task (automation phase; target total score, 95.16) and five to stabilize and consolidate performance (test 10 target total score, 74.11).
Lim, Renly; Liong, Men Long; Khan, Nurzalina Abdul Karim; Yuen, Kah Hay
2017-02-17
There is currently no published information on the validity and reliability of the Golombok Rust Inventory of Sexual Satisfaction in the Asian population, specifically in patients with stress urinary incontinence, which limits its use in this region. Our study aimed to evaluate the psychometric properties of this questionnaire in the Malaysian population. Ten couples were recruited for the pilot testing. The agreement between the English and the Chinese or Malay versions was tested using intraclass correlation coefficients, with results of more than 0.80 for all subscales and overall scores indicating good agreement. Sixty-six couples were included in the subsequent phase. The following data are presented in the order of English, Chinese, and Malay. Cronbach's alphas for the male total score were 0.82, 0.88, and 0.95. For the female total score, Cronbach's alphas were 0.76, 0.78, and 0.88. Intraclass correlation coefficients for the male total score were 0.93, 0.94, and 0.99, while intraclass correlation coefficients for the female total score were 0.89, 0.86, and 0.88. In conclusion, the English, Chinese, and Malay versions each proved to be valid and reliable in our Malaysian population.
Making the Term "Validity" Useful
ERIC Educational Resources Information Center
Koretz, Daniel
2016-01-01
Daniel Koretz is the Henry Lee Shattuck Professor of Education at the Harvard Graduate School of Education. His research focuses on educational assessment and policy, particularly the effects of high-stakes testing on educational practice and the validity of score gains. He is the author of "Measuring Up: What Educational Testing Really Tells…
Turc, Guillaume; Aguettaz, Pierre; Ponchelle-Dequatre, Nelly; Hénon, Hilde; Naggara, Olivier; Leclerc, Xavier; Cordonnier, Charlotte; Leys, Didier; Mas, Jean-Louis; Oppenheim, Catherine
2014-01-01
The aim of our study was to validate in an independent cohort the MRI-DRAGON score, an adaptation of the (CT-) DRAGON score to predict 3-month outcome in acute ischemic stroke patients undergoing MRI before intravenous thrombolysis (IV-tPA). We reviewed consecutive (2009-2013) anterior circulation stroke patients treated within 4.5 hours by IV-tPA in the Lille stroke unit (France), where MRI is the first-line pretherapeutic work-up. We assessed the discrimination and calibration of the MRI-DRAGON score to predict poor 3-month outcome, defined as modified Rankin Score >2, using c-statistic and the Hosmer-Lemeshow test, respectively. We included 230 patients (mean ± SD age 70.4 ± 16.0 years, median [IQR] baseline NIHSS 8 [5-14]; poor outcome in 78 (34%) patients). The c-statistic was 0.81 (95%CI 0.75-0.87), and the Hosmer-Lemeshow test was not significant (p = 0.54). The MRI-DRAGON score showed good prognostic performance in the external validation cohort. It could therefore be used to inform the patient's relatives about long-term prognosis and help to identify poor responders to IV-tPA alone, who may be candidates for additional therapeutic strategies, if they are otherwise eligible for such procedures based on the institutional criteria.
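Discrimination in the abstract above is summarised by the c-statistic, which for a binary outcome is the probability that a randomly chosen patient with the outcome has a higher score than one without it, with ties counted as one half. A brute-force sketch follows; the scores and outcomes are invented for illustration and are not MRI-DRAGON data.

```python
def c_statistic(scores, outcomes):
    """Concordance probability over all (event, non-event) pairs."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # higher score for the patient with the event
            elif e == n:
                concordant += 0.5   # tied scores count as half
    return concordant / (len(events) * len(non_events))


# Hypothetical risk scores and poor-outcome indicators (1 = poor outcome).
example_scores = [1, 3, 2, 4]
example_outcomes = [0, 0, 1, 1]
print(c_statistic(example_scores, example_outcomes))
```

This pairwise definition is equivalent to the area under the ROC curve; it runs in quadratic time, which is fine for cohorts of a few hundred patients.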
ERIC Educational Resources Information Center
Krach, S. Kathleen; Loe, Scott A.; Jones, W. Paul; Farrally, Autumn
2009-01-01
Validity studies with the Reynolds Intellectual Ability scales (RIAS) indicated that RIAS composite intelligence index (CIX) and verbal intelligence index (VIX) scores have moderate-to-high correlation with comparable scores on other instruments. The authors of the RIAS described the VIX scale as a measure of crystallized ability and the nonverbal…
ERIC Educational Resources Information Center
Reeve, Charlie L.; Bonaccio, Silvia
2011-01-01
Claims of changes in the validity coefficients associated with general mental ability (GMA) tests due to the passage of time (i.e., temporal validity degradation) have been the focus of an on-going debate in applied psychology. To evaluate whether and, if so, under what conditions this degradation may occur, we integrate evidence from multiple…
Creation and validation of web-based food allergy audiovisual educational materials for caregivers.
Rosen, Jamie; Albin, Stephanie; Sicherer, Scott H
2014-01-01
Studies reveal deficits in caregivers' ability to prevent and treat food-allergic reactions with epinephrine and a consumer preference for validated educational materials in audiovisual formats. This study was designed to create brief, validated educational videos on food allergen avoidance and emergency management of anaphylaxis for caregivers of children with food allergy. The study used a stepwise iterative process including creation of a needs assessment survey consisting of 25 queries administered to caregivers and food allergy experts to identify curriculum content. Preliminary videos were drafted, reviewed, and revised based on knowledge and satisfaction surveys given to another cohort of caregivers and health care professionals. The final materials were tested for validation of their educational impact and user satisfaction using pre- and postknowledge tests and satisfaction surveys administered to a convenience sample of 50 caretakers who had not participated in the development stages. The needs assessment identified topics of importance including treatment of allergic reactions and food allergen avoidance. Caregivers in the final validation included mothers (76%), fathers (22%), and other caregivers (2%). Race/ethnicity were white (66%), black (12%), Asian (12%), Hispanic (8%), and other (2%). Knowledge tests (maximum score = 18) increased from a mean score of 12.4 preprogram to 16.7 postprogram (p < 0.0001). On a 7-point Likert scale, all satisfaction categories remained above a favorable mean score of 6, indicating participants were overall very satisfied, learned a lot, and found the materials to be informative, straightforward, helpful, and interesting. This web-based audiovisual curriculum on food allergy improved knowledge scores and was well received.
Arunakul, Marut; Arunakul, Preeyaphan; Suesiritumrong, Chakhrist; Angthong, Chayanin; Chernchujit, Bancha
2015-06-01
Self-administered questionnaires have become an important part of clinical outcome assessment for foot and ankle-related problems. The Foot and Ankle Ability Measure (FAAM) subjective form is a widely used region-specific questionnaire that showed sufficient validity and reliability in previous studies. The aim was to translate the original English version of the FAAM into Thai and to evaluate the validity and reliability of the Thai FAAM in patients with foot and ankle-related problems. The FAAM subjective form was translated into Thai using a forward-backward translation protocol, after which reliability and validity were tested. Responses from 60 consecutive patients to two questionnaires, the Thai FAAM subjective form and the Short Form (SF)-36, were used. Validity was tested by correlating the scores from the two questionnaires. Reliability was assessed by measuring test-retest reliability and internal consistency. The Thai FAAM scores for the activities of daily living (ADL) and Sport subscales showed sufficient correlations with the physical functioning (PF) and physical composite score (PCS) domains of the SF-36 (statistically significant at the p < 0.001 level, with values ≥ 0.5). Test-retest reliability was high, with intra-class correlation coefficients of 0.8 and 0.77, respectively. Internal consistency was strong (Cronbach alpha = 0.94 and 0.88, respectively). The Thai version of the FAAM subjective form retained the characteristics of the original version and proved to be a reliable evaluation instrument for patients with foot and ankle-related problems.
Applications of computerized adaptive testing (CAT) to the assessment of headache impact.
Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew
2003-12-01
To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on: respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92) without 'ceiling' or 'floor' effects and with a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 77.6%). 
CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were accurate 91.3% for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79 and alternate forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
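The adaptive step described above, selecting the most informative item for each person, can be sketched as follows. This is a simplified illustration using a two-parameter logistic (2PL) model for yes/no items, which is an assumption made here for brevity; the abstract does not specify the IRT model, and the item bank, parameter values, and function names are all hypothetical rather than taken from the DYNHA software.

```python
import math


def prob_endorse(theta, a, b):
    """2PL probability of endorsing an item (a = discrimination, b = location)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def pick_next_item(theta, item_bank, administered):
    """Choose the unused item that is most informative at the current estimate."""
    candidates = [name for name in item_bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *item_bank[name]))


# Hypothetical bank: name -> (discrimination, location).
item_bank = {
    "item_a": (1.2, -1.0),
    "item_b": (1.2, 0.0),
    "item_c": (1.2, 1.5),
}
next_item = pick_next_item(0.1, item_bank, administered=set())
print(next_item)
```

In a full adaptive test this selection would alternate with re-estimating theta after each response and would stop once the standard error met the pre-set precision standard; both of those steps are omitted from this sketch.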
Yun, Young Ho; Kim, Soo-Hyun; Lee, Kyoung-Min; Park, Sang Min; Lee, Chang Geol; Choi, Youn Seon; Lee, Won Sup; Kim, Si-Young; Heo, Dae Seog
2006-09-01
Our goal was to validate an instrument with which terminally ill patients could evaluate the quality of care they receive at the end of life (EOL). Questionnaire development followed a four-phase process: item generation and reduction, construction, pilot testing, and field-testing. Using relevance and priority criteria and pilot testing, we developed a 16-item questionnaire. Factor analyses of data from 235 patients resulted in the Quality Care Questionnaire-End of Life (QCQ-EOL) covering dignity-conserving care, care by health care professionals, individualised care, and family relationships. All subscales and total scores showed high internal consistency (Cronbach alpha range, 0.73-0.89). The total score and selected subscale scores clearly differentiated patients on the basis of clinical situation, sense of dignity, and general rating of care quality. Correlations of scores between patients and caregivers were substantial. The QCQ-EOL can be adopted to assess the quality of care received by terminally ill patients.
Yunhua, Tang; Weiqiang, Ju; Maogen, Chen; Sai, Yang; Zhiheng, Zhang; Dongping, Wang; Zhiyong, Guo; Xiaoshun, He
2018-06-01
Early allograft dysfunction (EAD) and early postoperative complications are two important clinical endpoints when evaluating clinical outcomes of liver transplantation (LT). We developed and validated two ICGR15-MELD models in 87 liver transplant recipients for predicting EAD and early postoperative complications after LT by incorporating the quantitative liver function test (ICGR15) into the MELD score. Eighty-seven consecutive patients who underwent LT were included and divided into a training cohort (n = 61) and an internal validation cohort (n = 26). For predicting EAD after LT, the area under the curve (AUC) for the ICGR15-MELD score was 0.876, with a sensitivity of 92.0% and a specificity of 75.0%, which is better than the MELD score or ICGR15 alone. Recipients with an ICGR15-MELD score ≥0.243 had a higher incidence of EAD than those with an ICGR15-MELD score <0.243 (P < 0.001). For predicting early postoperative complications, the AUC of the ICGR15-MELD score was 0.832, with a sensitivity of 90.9% and a specificity of 71.0%. Recipients with an ICGR15-MELD score ≥0.098 had a higher incidence of early postoperative complications than those with an ICGR15-MELD score <0.098 (P < 0.001). Finally, application of the two ICGR15-MELD models in the validation cohort still gave good accuracy (AUC, 0.835 and 0.826, respectively) in predicting EAD and early postoperative complications after LT. The combination of the quantitative liver function test (ICGR15) and the preoperative MELD score is a reliable and effective predictor of EAD and early postoperative complications after LT, and is better than the MELD score or ICGR15 alone.
Butts, Ryan J; Savage, Andrew J; Atz, Andrew M; Heal, Elisabeth M; Burnette, Ali L; Kavarana, Minoo M; Bradley, Scott M; Chowdhury, Shahryar M
2015-09-01
This study aimed to develop a reliable and feasible score to assess the risk of rejection in pediatric heart transplantation recipients during the first post-transplant year. The first post-transplant year is the most likely time for rejection to occur in pediatric heart transplantation. Rejection during this period is associated with worse outcomes. The United Network for Organ Sharing database was queried for pediatric patients (age <18 years) who underwent isolated orthotopic heart transplantation from January 1, 2000 to December 31, 2012. Transplantations were divided into a derivation cohort (n = 2,686) and a validation (n = 509) cohort. The validation cohort was randomly selected from 20% of transplantations from 2005 to 2012. Covariates found to be associated with rejection (p < 0.2) were included in the initial multivariable logistic regression model. The final model was derived by including only variables independently associated with rejection. A risk score was then developed using relative magnitudes of the covariates' odds ratio. The score was then tested in the validation cohort. A 9-point risk score using 3 variables (age, cardiac diagnosis, and panel reactive antibody) was developed. Mean score in the derivation and validation cohorts were 4.5 ± 2.6 and 4.8 ± 2.7, respectively. A higher score was associated with an increased rate of rejection (score = 0, 10.6% in the validation cohort vs. score = 9, 40%; p < 0.01). In weighted regression analysis, the model-predicted risk of rejection correlated closely with the actual rates of rejection in the validation cohort (R(2) = 0.86; p < 0.01). The rejection score is accurate in determining the risk of early rejection in pediatric heart transplantation recipients. The score has the potential to be used in clinical practice to aid in determining the immunosuppressant regimen and the frequency of rejection surveillance in the first post-transplant year. 
Copyright © 2015 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
Measurement of COPD Severity Using a Survey-Based Score
Omachi, Theodore A.; Katz, Patricia P.; Yelin, Edward H.; Iribarren, Carlos; Blanc, Paul D.
2010-01-01
Background: A comprehensive survey-based COPD severity score has usefulness for epidemiologic and health outcomes research. We previously developed and validated the survey-based COPD Severity Score without using lung function or other physiologic measurements. In this study, we aimed to further validate the severity score in a different COPD cohort and using a combination of patient-reported and objective physiologic measurements. Methods: Using data from the Function, Living, Outcomes, and Work cohort study of COPD, we evaluated the concurrent and predictive validity of the COPD Severity Score among 1,202 subjects. The survey instrument is a 35-point score based on symptoms, medication and oxygen use, and prior hospitalization or intubation for COPD. Subjects were systemically assessed using structured telephone survey, spirometry, and 6-min walk testing. Results: We found evidence to support concurrent validity of the score. Higher COPD Severity Score values were associated with poorer FEV1 (r = −0.38), FEV1% predicted (r = −0.40), Body mass, Obstruction, Dyspnea, Exercise Index (r = 0.57), and distance walked in 6 min (r = −0.43) (P < .0001 in all cases). Greater COPD severity was also related to poorer generic physical health status (r = −0.49) and disease-specific health-related quality of life (r = 0.57) (P < .0001). The score also demonstrated predictive validity. It was also associated with a greater prospective risk of acute exacerbation of COPD defined as ED visits (hazard ratio [HR], 1.31; 95% CI, 1.24-1.39), hospitalizations (HR, 1.59; 95% CI, 1.44-1.75), and either measure of hospital-based care for COPD (HR, 1.34; 95% CI, 1.26-1.41) (P < .0001 in all cases). Conclusion: The COPD Severity Score is a valid survey-based measure of disease-specific severity, both in terms of concurrent and predictive validity. The score is a psychometrically sound instrument for use in epidemiologic and outcomes research in COPD. PMID:20040611
The Pareidolia Test: A Simple Neuropsychological Test Measuring Visual Hallucination-Like Illusions
Mamiya, Yasuyuki; Nishio, Yoshiyuki; Watanabe, Hiroyuki; Yokoi, Kayoko; Uchiyama, Makoto; Baba, Toru; Iizuka, Osamu; Kanno, Shigenori; Kamimura, Naoto; Kazui, Hiroaki; Hashimoto, Mamoru; Ikeda, Manabu; Takeshita, Chieko; Shimomura, Tatsuo; Mori, Etsuro
2016-01-01
Background: Visual hallucinations are a core clinical feature of dementia with Lewy bodies (DLB), and this symptom is important in the differential diagnosis and prediction of treatment response. The pareidolia test is a tool that evokes visual hallucination-like illusions, and these illusions may be a surrogate marker of visual hallucinations in DLB. We created a simplified version of the pareidolia test and examined its validity and reliability to establish the clinical utility of this test. Methods: The pareidolia test was administered to 52 patients with DLB, 52 patients with Alzheimer’s disease (AD) and 20 healthy controls (HCs). We assessed the test-retest/inter-rater reliability using the intra-class correlation coefficient (ICC) and the concurrent validity using the Neuropsychiatric Inventory (NPI) hallucinations score as a reference. A receiver operating characteristic (ROC) analysis was used to evaluate the sensitivity and specificity of the pareidolia test to differentiate DLB from AD and HCs. Results: The pareidolia test required approximately 15 minutes to administer, exhibited good test-retest/inter-rater reliability (ICC of 0.82), and moderately correlated with the NPI hallucinations score (rs = 0.42). Using an optimal cut-off score set according to the ROC analysis, the pareidolia test differentiated DLB from AD with a sensitivity of 81% and a specificity of 92%. Conclusions: Our study suggests that the simplified version of the pareidolia test is a valid and reliable surrogate marker of visual hallucinations in DLB. PMID:27171377
Clerici, Francesca; Ghiretti, Roberta; Di Pucchio, Alessandra; Pomati, Simone; Cucumo, Valentina; Marcone, Alessandra; Vanacore, Nicola; Mariani, Claudio; Cappa, Stefano Francesco
2017-06-01
The Free and Cued Selective Reminding Test (FCSRT) is the memory test recommended by the International Working Group on Alzheimer's disease (AD) for the detection of amnestic syndrome of the medial temporal type in prodromal AD. Assessing the construct validity and internal consistency of the Italian version of the FCSRT is thus crucial. The FCSRT was administered to 338 community-dwelling participants with memory complaints (57% females, age 74.5 ± 7.7 years), including 34 with AD, 203 with Mild Cognitive Impairment, and 101 with Subjective Memory Impairment. Internal Consistency was estimated using Cronbach's alpha coefficient. To assess convergent validity, five FCSRT scores (Immediate Free Recall, Immediate Total Recall, Delayed Free Recall, Delayed Total Recall, and Index of Sensitivity of Cueing) were correlated with three well-validated memory tests: Story Recall, Rey Auditory Verbal Learning test, and Rey Complex Figure (RCF) recall (partial correlation analysis). To assess divergent validity, a principal component analysis (an exploratory factor analysis) was performed including, in addition to the above-mentioned memory tasks, the following tests: Word Fluencies, RCF copy, Clock Drawing Test, Trail Making Test, Frontal Assessment Battery, Raven Coloured Progressive Matrices, and Stroop Colour-Word Test. Cronbach's alpha coefficients for immediate recalls (IFR and ITR) and delayed recalls (DFR and DTR) were, respectively, .84 and .81. All FCSRT scores were highly correlated with those of the three well-validated memory tests. The factor analysis showed that the FCSRT does not load on the factors saturated by non-memory tests. These findings indicate that the FCSRT has a good internal consistency and has an excellent construct validity as an episodic memory measure. © 2015 The British Psychological Society.
The Effects of Primacy on Rater Cognition: An Eye-Tracking Study
ERIC Educational Resources Information Center
Ballard, Laura
2017-01-01
Rater scoring has an impact on writing test reliability and validity. Thus, there has been a continued call for researchers to investigate issues related to rating (Crusan, 2015). Investigating the scoring process and understanding how raters arrive at particular scores are critical "because the score is ultimately what will be used in making…
A Validity-Based Approach to Quality Control and Assurance of Automated Scoring
ERIC Educational Resources Information Center
Bejar, Isaac I.
2011-01-01
Automated scoring of constructed responses is already operational in several testing programmes. However, as the methodology matures and the demand for the utilisation of constructed responses increases, the volume of automated scoring is likely to increase at a fast pace. Quality assurance and control of the scoring process will likely be more…
Screening for cognitive impairment in older individuals. Validation study of a computer-based test.
Green, R C; Green, J; Harrison, J M; Kutner, M H
1994-08-01
This study examined the validity of a computer-based cognitive test that was recently designed to screen the elderly for cognitive impairment. Criterion-related validity was examined by comparing test scores of impaired patients and normal control subjects. Construct-related validity was computed through correlations between computer-based subtests and related conventional neuropsychological subtests. University center for memory disorders. Fifty-two patients with mild cognitive impairment by strict clinical criteria and 50 unimpaired, age- and education-matched control subjects. Control subjects were rigorously screened by neurological, neuropsychological, imaging, and electrophysiological criteria to identify and exclude individuals with occult abnormalities. Using a cut-off total score of 126, this computer-based instrument had a sensitivity of 0.83 and a specificity of 0.96. Using a prevalence estimate of 10%, predictive values, positive and negative, were 0.70 and 0.96, respectively. Computer-based subtests correlated significantly with conventional neuropsychological tests measuring similar cognitive domains. Thirteen (17.8%) of 73 volunteers with normal medical histories were excluded from the control group, with unsuspected abnormalities on standard neuropsychological tests, electroencephalograms, or magnetic resonance imaging scans. Computer-based testing is a valid screening methodology for the detection of mild cognitive impairment in the elderly, although this particular test has important limitations. Broader applications of computer-based testing will require extensive population-based validation. Future studies should recognize that normal control subjects without a history of disease who are typically used in validation studies may have a high incidence of unsuspected abnormalities on neurodiagnostic studies.
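The step from sensitivity, specificity and an assumed prevalence to predictive values follows Bayes' rule and can be sketched as below. The function name is hypothetical, and the inputs are the rounded figures quoted in the abstract, so the outputs are only approximate; the published values may have been computed from unrounded data.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence.

    Hypothetical helper illustrating Bayes' rule; not the authors' code.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)  # P(impaired | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(unimpaired | negative test)
    return ppv, npv


# Rounded inputs from the abstract: sensitivity 0.83, specificity 0.96,
# prevalence estimate 10%.
ppv, npv = predictive_values(0.83, 0.96, 0.10)
```

With these rounded inputs the positive predictive value comes out near the reported 0.70. The negative predictive value depends strongly on the prevalence estimate, so a result computed from rounded inputs need not match the reported figure exactly.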
ERIC Educational Resources Information Center
Bing, Mark N.; Stewart, Susan M.; Davison, H. Kristl
2009-01-01
Handheld calculators have been used on the job for more than 30 years, yet the degree to which these devices can affect performance on employment tests of mathematical ability has not been thoroughly examined. This study used a within-subjects research design (N = 167) to investigate the effects of calculator use on test score reliability, test…
Jeong, Jae Yoon; Jun, Dae Won; Bai, Daiseg; Kim, Ji Yean; Sohn, Joo Hyun; Ahn, Sang Bong; Kim, Sang Gyune; Kim, Tae Yeob; Kim, Hyoung Su; Jeong, Soung Won; Cho, Yong Kyun; Song, Do Seon; Kim, Hee Yeon; Jung, Young Kul; Yoon, Eileen L
2017-09-01
The aim of this study was to validate a new paper and pencil test battery to diagnose minimal hepatic encephalopathy (MHE) in Korea. The new paper and pencil test battery was composed of number connection test-A (NCT-A), number connection test-B (NCT-B), digit span test (DST), and symbol digit modality test (SDMT). The norm of the new test was based on 315 healthy individuals between the ages of 20 and 70 years old. Another 63 participants, comprising healthy subjects (n = 31) and cirrhosis patients (n = 32), were included as a validation cohort. All participants completed the new paper and pencil test, a critical flicker frequency (CFF) test and a computerized cognitive function test (visual continuous performance test [CPT]). The scores on the NCT-A and NCT-B increased but those of DST and SDMT decreased according to age. Twelve of the cirrhotic patients (37.5%) were diagnosed with MHE based on the new paper and pencil test battery. The total score of the paper and pencil test battery showed good positive correlation with the CFF (r = 0.551, P < 0.001) and computerized cognitive function test. Also, this score was lower in patients with MHE compared to those without MHE (P < 0.001). Scores on the CFF (32.0 vs. 28.7 Hz, P = 0.028) and the computer-based cognitive test decreased significantly in patients with MHE compared to those without MHE. Test-retest reliability was comparable. In conclusion, the new paper and pencil test battery including NCT-A, NCT-B, DST, and SDMT showed good correlation with neuropsychological tests. This new paper and pencil test battery could help to discriminate patients with impaired cognitive function in cirrhosis (registered at Clinical Research Information Service [CRIS], https://cris.nih.go.kr/cris, KCT0000955). © 2017 The Korean Academy of Medical Sciences.
ERIC Educational Resources Information Center
Stanley, Leanne M.; Edwards, Michael C.
2016-01-01
The purpose of this article is to highlight the distinction between the reliability of test scores and the fit of psychometric measurement models, reminding readers why it is important to consider both when evaluating whether test scores are valid for a proposed interpretation and/or use. It is often the case that an investigator judges both the…
The Effect of Stakes on Accountability Test Scores and Pass Rates
ERIC Educational Resources Information Center
Steedle, Jeffrey T.; Grochowalski, Joseph
2017-01-01
Students may not fully demonstrate their knowledge and skills on accountability tests if there are no stakes attached to individual performance. In that case, assessment results may not accurately reflect student achievement, so the validity of score interpretations and uses suffers. For this study, matched samples of students taking state…
Louwers, Annoek; Beelen, Anita; Holmefur, Marie; Krumlinde-Sundholm, Lena
2016-12-01
To develop and evaluate a test activity from which bimanual performance in adolescents with unilateral cerebral palsy (CP) can be observed and scored with the Assisting Hand Assessment (AHA), and to evaluate the construct validity of the AHA test items for the extended age range 18 months to 18 years. A new test activity was developed and evaluated for its ability to elicit bimanual actions in adolescents with (n=20) and without (n=10) unilateral CP. The AHA scores of 126 adolescents (mean age 14y 3mo, SD 2y 6mo; 71 males, 55 females) and 157 children with unilateral CP (mean age 6y 1mo, SD 2y 10mo; 102 males, 55 females) were analysed using the Rasch measurement model. The test activity elicited bimanual actions in 100% of typically developing adolescents and in 96.8% and 57.9% of adolescents with unilateral CP (moderately and severely limited hand function respectively). The scale demonstrated good construct validity; thus the same scoring criteria can be used for the age range studied. The new Assisting Hand Assessment for adolescents (Ad-AHA) activity is valid for use with 13- to 18-year-olds to elicit bimanual performance in adolescents with unilateral CP. The same AHA scoring criteria can be used both for children and for adolescents within the age range 18 months to 18 years. © 2016 The Authors. Developmental Medicine & Child Neurology published by John Wiley & Sons Ltd on behalf of Mac Keith Press.
Psychometric properties of the Albanian version of the Orofacial Esthetic Scale: OES-ALB.
Bimbashi, Venera; Čelebić, Asja; Staka, Gloria; Hoxha, Flurije; Peršić, Sanja; Petričević, Nikola
2015-08-26
The aim was to adapt the Orofacial Esthetic Scale (OES) and to test psychometric properties of the Albanian language version in the cultural environment of the Republic of Kosovo. The OES questionnaire was translated from the original English version according to the accepted techniques. The reliability (internal consistency), and validity (construct, convergent and discriminative) were tested in 169 subjects, test-retest in 61 dental students (DS), and responsiveness in 51 prosthodontic patients with treatment needs (PPTN). The corrected item correlation coefficients of OES-ALB ranged from 0.686 to 0.909. The inter-item correlation coefficient ranged between 0.572 and 0.919. The Cronbach's alpha was 0.961 and IIC 0.758. Test-retest was confirmed by good ICCs and by no significant differences of the OES scores through the period of 14 days without any orofacial changes (p > 0.05). Construct validity was proved by the presence of one-factor composition that assumed 79.079% of the variance. Convergent validity showed significant correlation between one general question about satisfaction with orofacial esthetics and the OES summary score, as well as between the sum of the 3 OHIP-ALB49 questions related to orofacial aesthetics and the OES summary score. Discriminative validity was confirmed with statistically significant differences between DS, prosthodontic patients without treatment need and PPTN (p < 0.01). Responsiveness was confirmed by a significant increase of OES scores after PPTN patients received new fixed partial or removable dentures (P < 0.001). The results proved excellent psychometric properties of the OES-ALB questionnaire in the Republic of Kosovo.
Issar, Tushar; Arnold, Ria; Kwai, Natalie C G; Pussell, Bruce A; Endre, Zoltan H; Poynten, Ann M; Kiernan, Matthew C; Krishnan, Arun V
2018-05-01
To demonstrate construct validity of the Total Neuropathy Score (TNS) in assessing peripheral neuropathy in subjects with chronic kidney disease (CKD). 113 subjects with CKD and 40 matched controls were assessed for peripheral neuropathy using the TNS. An exploratory factor analysis was conducted and internal consistency of the scale was evaluated using Cronbach's alpha. Construct validity of the TNS was tested by comparing scores between case and control groups. Factor analysis revealed valid item correlations and internal consistency of the TNS was good with a Cronbach's alpha of 0.897. Subjects with CKD scored significantly higher on the TNS (CKD: median, 6, interquartile range, 1-13; controls: median, 0, interquartile range, 0-1; p < 0.001). Subgroup analysis revealed construct validity was maintained for subjects with stages 3-5 CKD with and without diabetes. The TNS is a valid measure of peripheral neuropathy in patients with CKD. The TNS is the first neuropathy scale to be formally validated in patients with CKD. Copyright © 2018 International Federation of Clinical Neurophysiology. Published by Elsevier B.V. All rights reserved.
The assessment of fatigue: Psychometric qualities and norms for the Checklist individual strength.
Worm-Smeitink, M; Gielissen, M; Bloot, L; van Laarhoven, H W M; van Engelen, B G M; van Riel, P; Bleijenberg, G; Nikolaus, S; Knoop, H
2017-07-01
The Checklist Individual Strength (CIS) measures four dimensions of fatigue: Fatigue severity, concentration problems, reduced motivation and activity. On the fatigue severity subscale, a cut-off score of 35 is used. This study 1) investigated the psychometric qualities of the CIS; 2) validated the cut-off score for severe fatigue and 3) provided norms. Representatives of the Dutch general population (n=2288) completed the CIS. The factor structure was investigated using an exploratory factor analysis. Internal consistency and test-retest reliability were determined. Concurrent validity was assessed in two additional samples by correlating the CIS with other fatigue scales (Chalder Fatigue Questionnaire, MOS Short form-36 Vitality subscale, EORTC QLQ-C30 fatigue subscale). To validate the fatigue severity cut-off score, a Receiver Operating Characteristics analysis was performed with patients referred to a chronic fatigue treatment centre (n=5243) and a healthy group (n=1906). Norm scores for CIS subscales were calculated for the general population, patients with chronic fatigue syndrome (CFS; n=1407) and eight groups with other medical conditions (n=1411). The original four-factor structure of the CIS was replicated. Internal consistency (α=0.84-0.95) and test-retest reliability (r=0.74-0.86) of the subscales were high. Correlations with other fatigue scales were moderate to high. The 35 points cut-off score for severe fatigue is appropriate, but, given the 17% false positive rate, should be adjusted to 40 for research in CFS. The CIS is a valid and reliable tool for the assessment of fatigue, with a validated cut-off score for severe fatigue that can be used in clinical practice. Copyright © 2017. Published by Elsevier Inc.
Coster, Wendy J.; Haley, Stephen M.; Ni, Pengsheng; Dumas, Helene M.; Fragala-Pinkham, Maria A.
2009-01-01
Objective: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the Self-Care and Social Function scales of the Pediatric Evaluation of Disability Inventory (PEDI) compared to the full-length version of these scales. Design: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. Settings: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children’s homes. Participants: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). Interventions: Not applicable. Main Outcome Measures: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length Self-Care and Social Function scales; time (in seconds) to complete assessments and respondent ratings of burden. Results: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (all r’s between .94 and .99). Using computer simulation of retrospective data, discriminant validity and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared to over 16 minutes to complete the full-length scales. Conclusions: Self-care and Social Function score estimates from CAT administration are highly comparable to those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time. PMID:18373991
Yamato, Tie Parma; Maher, Chris; Koes, Bart; Moseley, Anne
2017-06-01
The Physiotherapy Evidence Database (PEDro) scale has been widely used to investigate methodological quality in physiotherapy randomized controlled trials; however, its validity has not been tested for pharmaceutical trials. The aim of this study was to investigate the validity and interrater reliability of the PEDro scale for pharmaceutical trials. The reliability was also examined for the Cochrane Back and Neck (CBN) Group risk of bias tool. This is a secondary analysis of data from a previous study. We considered randomized placebo controlled trials evaluating any pain medication for chronic spinal pain or osteoarthritis. Convergent validity was evaluated by correlating the PEDro score with the summary score of the CBN risk of bias tool. The construct validity was tested using a linear regression analysis to determine the degree to which the total PEDro score is associated with treatment effect sizes, journal impact factor, and the summary score for the CBN risk of bias tool. The interrater reliability was estimated using the Prevalence and Bias Adjusted Kappa coefficient and 95% confidence interval (CI) for the PEDro scale and CBN risk of bias tool. Fifty-three trials were included, with 91 treatment effect sizes included in the analyses. The correlation between PEDro scale and CBN risk of bias tool was 0.83 (95% CI 0.76-0.88) after adjusting for reliability, indicating strong convergence. The PEDro score was inversely associated with effect sizes, significantly associated with the summary score for the CBN risk of bias tool, and not associated with the journal impact factor. The interrater reliability for each item of the PEDro scale and CBN risk of bias tool was at least substantial for most items (>0.60). The intraclass correlation coefficient for the PEDro score was 0.80 (95% CI 0.68-0.88), and for the CBN risk of bias tool it was 0.81 (95% CI 0.69-0.88).
There was evidence for the convergent and construct validity for the PEDro scale when used to evaluate methodological quality of pharmacological trials. Both risk of bias tools have acceptably high interrater reliability. Copyright © 2017 Elsevier Inc. All rights reserved.
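The item-level agreement statistic named in this abstract, the Prevalence and Bias Adjusted Kappa (PABAK), depends only on the observed proportion of agreement. A minimal sketch follows; the function name and the toy ratings are hypothetical and are not data from the study.

```python
def pabak(rater_a, rater_b, n_categories=2):
    """Prevalence and Bias Adjusted Kappa for two raters.

    PABAK = (k * Po - 1) / (k - 1), where Po is the observed proportion
    of agreement and k the number of rating categories. For yes/no
    items (k = 2) this reduces to 2 * Po - 1.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    observed_agreement = agreements / len(rater_a)
    return (n_categories * observed_agreement - 1) / (n_categories - 1)


# Toy example: two raters score one yes/no item on ten trials and
# disagree on a single trial (Po = 0.9).
rater_1 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
item_pabak = pabak(rater_1, rater_2)
```

In this toy case the result is about 0.8, which would fall above the 0.60 threshold the abstract labels as substantial.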
de Vroege, Lars; Emons, Wilco H M; Sijtsma, Klaas; van der Feltz-Cornelis, Christina M
2018-01-01
The Bermond-Vorst Alexithymia Questionnaire (BVAQ) has been validated in student samples and small clinical samples, but not in the general population; thus, representative general-population norms are lacking. We examined the factor structure of the BVAQ in Longitudinal Internet Studies for the Social Sciences panel data from the Dutch general population (N = 974). Factor analyses revealed a first-order five-factor model and a second-order two-factor model. However, in the second-order model, the factor interpreted as analyzing ability loaded on both the affective factor and the cognitive factor. Further analyses showed that the first-order test scores are more reliable than the second-order test scores. External and construct validity were addressed by comparing BVAQ scores with a clinical sample of patients suffering from somatic symptom and related disorder (SSRD) (N = 235). BVAQ scores differed significantly between the general population and patients suffering from SSRD, suggesting acceptable construct validity. Age was positively associated with alexithymia. Males showed higher levels of alexithymia. The BVAQ is a reliable alternative measure of alexithymia.
Validity and reliability of the Turkish Migraine Disability Assessment (MIDAS) questionnaire.
Ertaş, Mustafa; Siva, Aksel; Dalkara, Turgay; Uzuner, Nevzat; Dora, Babür; Inan, Levent; Idiman, Fethi; Sarica, Yakup; Selçuki, Deniz; Sirin, Hadiye; Oğuzhanoğlu, Atilla; Irkeç, Ceyla; Ozmenoğlu, Mehmet; Ozbenli, Taner; Oztürk, Musa; Saip, Sabahattin; Neyal, Münife; Zarifoğlu, Mehmet
2004-09-01
The aim of this study is to assess the comprehensibility, internal consistency, patient-physician reliability, test-retest reliability, and validity of the Turkish version of the Migraine Disability Assessment (MIDAS) questionnaire in patients with headache. The MIDAS questionnaire was developed by Stewart et al. and has been shown to be reliable and valid for determining the degree of disability caused by migraine. This study was designed as a national multicenter study to demonstrate the reliability and validity of the Turkish version of the MIDAS questionnaire. Patients applying to 17 Neurology Clinics in Turkey were evaluated at the baseline (visit 1), week 4 (visit 2), and week 12 (visit 3) visits in terms of disease severity and comprehensibility, internal consistency, test-retest reliability, and validity of MIDAS. Since the severity of the disease was found to change significantly at visit 2 compared to visit 1, test-retest reliability was assessed using the MIDAS scores of a subgroup of patients whose disease severity remained unchanged (up to +/-3 days difference in the number of days with headache between visits 1 and 2). A total of 306 patients (86.2% female, mean age: 35.0 +/- 9.8 years) were enrolled into the study. A total of 65.7%, 77.5%, 82.0% of patients reported that "they had fully understood the MIDAS questionnaire" in visits 1, 2, and 3, respectively. A strong positive correlation was found between the physician-applied and patient-applied total MIDAS scores in all three visits (Spearman correlation coefficients were R = 0.87, 0.83, and 0.90, respectively, P < .001). Internal consistency of MIDAS was assessed using Cronbach's alpha and was found to be at acceptable (>0.7) or excellent (>0.8) levels in patient-applied and physician-applied MIDAS scores, respectively. Total MIDAS score showed good test-retest reliability (R = 0.68).
Both the number of days with headache and the total MIDAS scores were positively correlated at all visits with correlation coefficients between 0.47 and 0.63. There was also a moderate degree of correlation (R = 0.54) between the total MIDAS score at week 12 and the number of days with headache at visit 2 + visit 3, which quantify headache-related disability over a 3-month period similar to the MIDAS questionnaire. These findings demonstrated that the Turkish translation is equivalent to the English version of MIDAS in terms of internal consistency, test-retest reliability, and validity. Physicians can reliably use the Turkish translation of the MIDAS questionnaire in defining the severity of illness and its treatment strategy when applied as a self-administered report by migraine patients themselves.
A Direct Comparison of Real-World and Virtual Navigation Performance in Chronic Stroke Patients.
Claessen, Michiel H G; Visser-Meily, Johanna M A; de Rooij, Nicolien K; Postma, Albert; van der Ham, Ineke J M
2016-04-01
An increasing number of studies have presented evidence that various patient groups with acquired brain injury suffer from navigation problems in daily life. This skill is, however, scarcely addressed in current clinical neuropsychological practice and suitable diagnostic instruments are lacking. Real-world navigation tests are limited by geographical location and associated with practical constraints. It was, therefore, investigated whether virtual navigation might serve as a useful alternative. To investigate the convergent validity of virtual navigation testing, performance on the Virtual Tubingen test was compared to that on an analogous real-world navigation test in 68 chronic stroke patients. The same eight subtasks, addressing route and survey knowledge aspects, were assessed in both tests. In addition, navigation performance of stroke patients was compared to that of 44 healthy controls. A correlation analysis showed moderate overlap (r = .535) between composite scores of overall real-world and virtual navigation performance in stroke patients. Route knowledge composite scores correlated somewhat more strongly (r = .523) than survey knowledge composite scores (r = .442). When comparing group performances, patients obtained lower scores than controls on seven subtasks. Whereas the real-world test was found to be easier than its virtual counterpart, no significant interaction effects were found between group and environment. Given moderate overlap of the total scores between the two navigation tests, we conclude that virtual testing of navigation ability is a valid alternative to navigation tests that rely on real-world route exposure.
Muñoz, Gerard; Buxó, Maria; de Gracia, Javier; Olveira, Casilda; Martinez-Garcia, Miguel Angel; Giron, Rosa; Polverino, Eva; Alvarez, Antonio; Birring, Surinder S; Vendrell, Montserrat
2016-05-01
The Leicester Cough Questionnaire (LCQ) has been validated in non-cystic fibrosis bronchiectasis (NCFBC). The present study aimed to create and validate a Spanish version of the LCQ (LCQ-Sp) in NCFBC. The LCQ-Sp was developed following a standardized protocol. For reliability, we assessed internal consistency and the change in score over a 15-day period in stable state. For responsiveness, we assessed the change in scores between visit 1 and the first exacerbation. For validity, we evaluated convergent validity through correlation with the Saint George's Respiratory Questionnaire (SGRQ) and discriminant validity. Two hundred fifty-nine patients (118 mild bronchiectasis, 90 moderate bronchiectasis and 47 severe bronchiectasis) were included. Internal consistency was high for the total scoring and good for the different domains (Cronbach's α: 0.86-0.91). The test-retest reliability shows an intraclass correlation coefficient of 0.87 for the total score. The mean LCQ-Sp score at visit 1 decreased at the beginning of an exacerbation (15.13 ± 4.06 vs. 12.24 ± 4.64; p < 0.001). The correlation between LCQ-Sp and SGRQ scores was -0.66 (p < 0.01). The differences in the LCQ-Sp total score between the different groups of severity were significant (p < 0.001). The LCQ-Sp discriminates disease severity, is responsive to change when faced with exacerbations and is reliable for use in bronchiectasis. © The Author(s) 2016.
Sutcliffe, Robert P; Hollyman, Marianne; Hodson, James; Bonney, Glenn; Vohra, Ravi S; Griffiths, Ewen A
2016-11-01
Laparoscopic cholecystectomy is commonly performed, and several factors increase the risk of open conversion, prolonging operating time and hospital stay. Preoperative stratification would improve consent, scheduling and identify appropriate training cases. The aim of this study was to develop a validated risk score for conversion for use in clinical practice. Preoperative patient and disease-related variables were identified from a prospective cholecystectomy database (CholeS) of 8820 patients, divided into main and validation sets. Preoperative predictors of conversion were identified by multivariable binary logistic regression. A risk score was developed and validated using a forward stepwise approach. Some 297 procedures (3.4%) were converted. The risk score was derived from six significant predictors: age (p = 0.005), sex (p < 0.001), indication for surgery (p < 0.001), ASA (p < 0.001), thick-walled gallbladder (p = 0.040) and CBD diameter (p = 0.004). Testing the score on the validation set yielded an AUROC = 0.766 (p < 0.001), and a score >6 identified patients at high risk of conversion (7.1% vs. 1.2%). This validated risk score allows preoperative identification of patients at six-fold increased risk of conversion to open cholecystectomy. Copyright © 2016 International Hepato-Pancreato-Biliary Association Inc. Published by Elsevier Ltd. All rights reserved.
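The "six-fold" figure in this abstract can be checked as a simple relative risk from the two conversion rates it reports. This is a back-of-the-envelope sketch only; the function name is hypothetical and the inputs are the rounded percentages from the abstract.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of two event probabilities (hypothetical helper)."""
    if risk_unexposed <= 0:
        raise ValueError("reference risk must be positive")
    return risk_exposed / risk_unexposed


# Conversion rates reported for a score >6 versus a score of 6 or less.
high_score_rate = 0.071
low_score_rate = 0.012
fold_increase = relative_risk(high_score_rate, low_score_rate)
```

With the rounded rates the ratio is a little under 6, consistent with the abstract's description of a roughly six-fold increase.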
Baena-Díez, José Miguel; Subirana, Isaac; Ramos, Rafael; Gómez de la Cámara, Agustín; Elosua, Roberto; Vila, Joan; Marín-Ibáñez, Alejandro; Guembe, María Jesús; Rigo, Fernando; Tormo-Díaz, María José; Moreno-Iribas, Conchi; Cabré, Joan Josep; Segura, Antonio; Lapetra, José; Quesada, Miquel; Medrano, María José; González-Diego, Paulino; Frontera, Guillem; Gavrila, Diana; Ardanaz, Eva; Basora, Josep; García, José María; García-Lareo, Manel; Gutiérrez-Fuentes, José Antonio; Mayoral, Eduardo; Sala, Joan; Dégano, Irene R; Francès, Albert; Castell, Conxa; Grau, María; Marrugat, Jaume
2018-04-01
To assess the validity of the original low-risk SCORE function without and with high-density lipoprotein cholesterol and SCORE calibrated to the Spanish population. Pooled analysis with individual data from 12 Spanish population-based cohort studies. We included 30 919 individuals aged 40 to 64 years with no history of cardiovascular disease at baseline, who were followed up for 10 years for the causes of death included in the SCORE project. The validity of the risk functions was analyzed with the area under the ROC curve (discrimination) and the Hosmer-Lemeshow test (calibration), respectively. Follow-up comprised 286 105 person-years. Ten-year cardiovascular mortality was 0.6%. The ratio of estimated to observed cases was 9.1, 6.5, and 9.1 in men and 3.3, 1.3, and 1.9 in women with the original low-risk SCORE function without and with high-density lipoprotein cholesterol and calibrated SCORE, respectively; differences were statistically significant with the Hosmer-Lemeshow test between predicted and observed mortality with SCORE (P < .001 in both sexes and with all functions). The area under the ROC curve with the original SCORE was 0.68 in men and 0.69 in women. All versions of the SCORE functions available in Spain significantly overestimate the cardiovascular mortality observed in the Spanish population. Despite the acceptable discrimination capacity, prediction of the number of fatal cardiovascular events (calibration) was significantly inaccurate. Copyright © 2017 Sociedad Española de Cardiología. Published by Elsevier España, S.L.U. All rights reserved.
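The distinction this abstract draws between discrimination and calibration can be sketched with two small functions: a rank-based area under the ROC curve and an estimated-to-observed ratio. Both function names and all numbers below are hypothetical toy values, not SCORE data, and the ratio is a simpler summary than the Hosmer-Lemeshow test used in the study.

```python
def auc(risks_with_event, risks_without_event):
    """Area under the ROC curve as the share of (event, non-event)
    pairs in which the event case has the higher predicted risk;
    ties count as half."""
    wins = 0.0
    for p in risks_with_event:
        for q in risks_without_event:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(risks_with_event) * len(risks_without_event))


def estimated_observed_ratio(predicted_risks, outcomes):
    """Sum of predicted risks divided by the number of observed events
    (outcomes are 0 or 1). Values above 1 mean overestimation."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(predicted_risks) / observed


# Toy cohort: the model ranks people well but predicts too many events.
predicted = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
died = [1, 0, 0, 0, 0, 0]
events = [p for p, d in zip(predicted, died) if d == 1]
non_events = [p for p, d in zip(predicted, died) if d == 0]

toy_auc = auc(events, non_events)
toy_ratio = estimated_observed_ratio(predicted, died)
```

In this toy cohort discrimination is perfect while the ratio is close to 3.9, which mirrors the abstract's point that a risk function can rank individuals acceptably and still overestimate the number of deaths.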
Further validation and definition of the psychometric properties of the Asthma Impact Survey.
Schatz, Michael; Zeiger, Robert S; Yang, Su-Jau; Chen, Wansu; Kosinski, Mark
2011-07-01
The Asthma Impact Survey (AIS-6) is a brief disease-specific quality-of-life instrument with limited published validation data. To obtain additional validation data and psychometric properties of the AIS-6. In November, 2007, patients with persistent asthma were mailed a survey that included the AIS-6, the mini-Asthma Quality of Life Questionnaire (mAQLQ), and the Asthma Control Test (ACT). Follow-up surveys were sent in April, July, and October 2008. Year 2008 exacerbations and short-acting β-agonist (SABA) dispensings were captured from administrative data. A total of 2680 patients had complete baseline survey data. Criterion validity was demonstrated by the strong correlations of the AIS-6 with the mAQLQ (r = -0.84 to -0.86); construct validity by significant relationships (P < .0001) of the AIS-6 with mAQLQ domain scores, ACT score, and history of exacerbations; and predictive validity by significant relationships (P < .0001) between AIS-6 scores at the end of 2007 and year 2008 exacerbations and high SABA dispensings. Responsiveness was demonstrated by significant (P < .0001) correlations (r = -0.39 to -0.58) between changes in AIS-6 scores and changes in mAQLQ and ACT scores over time. A preliminary minimally important difference (MID) in AIS-6 was estimated to be 4 by using the mAQLQ MID as an anchor. Excellent internal consistency (α = 0.94) and test-retest reliability (intraclass correlation coefficient = 0.86-0.91) were also demonstrated. The AIS-6 demonstrated good psychometric properties in a large independent sample and could be used to assess asthma-specific quality of life in clinical practice and clinical research. Copyright © 2011 American Academy of Allergy, Asthma & Immunology. Published by Mosby, Inc. All rights reserved.
Cross-Cultural Adaptation and Validation of the Italian Version of SWAL-QOL.
Ginocchio, Daniela; Alfonsi, Enrico; Mozzanica, Francesco; Accornero, Anna Rosa; Bergonzoni, Antonella; Chiarello, Giulia; De Luca, Nicoletta; Farneti, Daniele; Marilia, Simonelli; Calcagno, Paola; Turroni, Valentina; Schindler, Antonio
2016-10-01
The aim of the study was to evaluate the reliability and validity of the Italian SWAL-QOL (I-SWAL-QOL). The study consisted of five phases: item generation, reliability analysis, normative data generation, validity analysis, and responsiveness analysis. The item generation phase followed the five-step, cross-cultural, adaptation process of translation and back-translation. A group of 92 dysphagic patients was enrolled for the internal consistency analysis. Seventy-eight patients completed the I-SWAL-QOL twice, 2 weeks apart, for test-retest reliability analysis. A group of 200 asymptomatic subjects completed the I-SWAL-QOL for normative data generation. I-SWAL-QOL scores obtained by both the group of dysphagic subjects and asymptomatic ones were compared for validity analysis. I-SWAL-QOL scores were correlated with SF-36 scores in 67 patients with dysphagia for concurrent validity analysis. Finally, I-SWAL-QOL scores obtained in a group of 30 dysphagic patients before and after successful rehabilitation treatment were compared for responsiveness analysis. All the enrolled patients managed to complete the I-SWAL-QOL without needing any assistance, within 20 min. Internal consistency was acceptable for all I-SWAL-QOL subscales (α > 0.70). Test-retest reliability was also satisfactory for all subscales (ICC > 0.7). A significant difference between the dysphagic group and the control group was found in all I-SWAL-QOL subscales (p < 0.05). Mild to moderate correlations between I-SWAL-QOL and SF-36 subscales were observed. I-SWAL-QOL scores obtained in the pre-treatment condition were significantly lower than those obtained after swallowing rehabilitation. I-SWAL-QOL is reliable, valid, responsive to changes in QOL, and recommended for clinical practice and outcome research.
2014-01-01
Background: Diabetes education and self-care remain the cornerstone of diabetes management. There are many structured diabetes modules available in the United Kingdom, Europe and the United States of America. In contrast, few structured and validated diabetes modules are available in Malaysia. This pilot study aims to develop and validate diabetes education material suitable and tailored for a multicultural society like Malaysia. Methods: The theoretical framework of this module was founded on the Health Belief Model (HBM). The participants were assessed using 6-item pre- and post-test questionnaires that measured some of the known HBM constructs, namely cues to action, perceived severity and perceived benefit. Data were analysed using PASW Statistics 18.0. Results: The pre- and post-test questionnaires were administered to 88 participants (31 males). In general, there was a significant increase in the total score in post-test (97.34 ± 6.13%) compared to pre-test (92.80 ± 12.83%) (p < 0.05) and a significant increase in excellent score (>85%) at post-test (84.1%) compared to pre-test (70.5%) (p < 0.05). There was an improvement in post-test score in 4 of 6 items tested. The remaining 2 items, which measured perceived severity and cues to action, had poorer post-test scores. Conclusions: The preliminary results from this pilot study suggest contextualised content material embedded within MY DEMO may be suitable for integration with the existing diabetes education programmes. This was the first known validated diabetes education programme available in the Malay language. PMID:24708715
Score Trends, SAT Validity and Subgroup Differences
ERIC Educational Resources Information Center
Camara, Wayne
2008-01-01
Presented at the Summer Institute on College Admissions at Harvard in June 2008. The presentation explores whether the SAT validity has changed with the test changes and if those changes affect specific subgroups.
Racial/Ethnic Differences in the Predictive Validity of MCAT Scores.
ERIC Educational Resources Information Center
Jones, Robert F.; Mitchell, Karen
Medical College Admission Test (MCAT) score differences were examined for Black and White examinees who entered American medical schools in 1978 and 1979. The incidence of academic difficulty resulting in delayed graduation, withdrawal, or dismissal was also examined. The MCAT provides six scores: biology, chemistry, physics, science problems,…
Methodological Approaches to Online Scoring of Essays.
ERIC Educational Resources Information Center
Chung, Gregory K. W. K.; O'Neil, Harold F., Jr.
This report examines the feasibility of scoring essays using computer-based techniques. Essays have been incorporated into many of the standardized testing programs. Issues of validity and reliability must be addressed to deploy automated approaches to scoring fully. Two approaches that have been used to classify documents, surface- and word-based…
Rodríguez-Martínez, Carlos E; Nino, Gustavo; Castro-Rodriguez, Jose A
2014-01-01
There is a critical need for validation studies of questionnaires designed to assess the level of control of asthma in children younger than 5 years old. To validate the Spanish version of the Test for Respiratory and Asthma Control in Kids (TRACK) questionnaire in children younger than age 5 years with symptoms consistent with asthma. In a prospective cohort validation study, parents and/or caregivers of children younger than age 5 years and with symptoms consistent with asthma, during a baseline and a follow-up visit 2 to 6 weeks later, completed the information required to assess the content validity, criterion validity, construct validity, test-retest reliability, sensitivity to change, internal consistency reliability, and usability of the TRACK questionnaire. Median (interquartile range) of the TRACK scores were significantly different between patients with well-controlled asthma, patients with not well-controlled asthma, and patients with very poorly controlled asthma (90.0 [75.0-95.0], 75.0 [55.0-85.0], and 35.0 [25.0-55.0], respectively, P < .001). TRACK scores were significantly different between patients classified as currently symptomatic and symptomatic in the recent past (42.5 [25.0-55.0] vs 85.0 [75.0-90.0]; P < .001). The intraclass correlation coefficient of the measurements was 0.755 (95% CI, 0.503-1.00). All patients whose clinical status changed showed an increase of 10 or more points in TRACK score between baseline and follow-up visits. The Cronbach α was 0.77 for the questionnaire as a whole. The Spanish version of the TRACK questionnaire has excellent sensitivity to change and usability; adequate criterion validity, construct validity, and test-retest reliability; and an acceptable internal consistency, when used in children younger than age 5 years with symptoms consistent with asthma. Copyright © 2014 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
Validation of the Mayo Hip Score: construct validity, reliability and responsiveness to change.
Singh, Jasvinder A; Schleck, Cathy; Harmsen, W Scott; Lewallen, David G
2016-01-19
Previous studies have provided the initial evidence for construct validity and test-retest reliability of the Mayo Hip Score. Instruments used for Total Hip Arthroplasty (THA) outcomes assessment should be valid, reliable and responsive to change. Our main objective was to examine the responsiveness to change, association with subsequent revision and the construct validity of the Mayo hip score. Discriminant ability was assessed by calculating effect size (ES), standardized response mean (SRM) and Guyatt's responsiveness index (GRI). Minimal clinically important difference (MCII) and moderate improvement thresholds were calculated. We assessed construct validity by examining association of scores with preoperative patient characteristics and correlation with Harris hip score, and assessed association of scores with the risk of subsequent revision. Five thousand three hundred seven provided baseline data; of those with baseline data, 2,278 and 2,089 (39%) provided 2- and 5-year data, respectively. Large ES, SRM and GRI ranging 2.66-2.78, 2.42-2.61 and 1.67-1.88 were noted for Mayo hip scores with THA, respectively. The MCII and moderate improvement thresholds were 22.4-22.7 and 39.4-40.5 respectively. Hazard ratios of revision surgery were higher with lower final score or less improvement in Mayo hip score at 2-years and borderline significant/non-significant at 5-years, respectively: (1) score ≤55 with hazard ratios of 2.24 (95% CI, 1.45, 3.46; p = 0.0003) and 1.70 (95% CI, 1.00, 2.92; p = 0.05) of implant revision subsequently, compared to 72-80 points; (2) no improvement or worsening score with hazard ratios 3.94 (95% CI, 1.50, 10.30; p = 0.005) and 2.72 (95% CI, 0.85,8.70; p = 0.09), compared to improvement >50-points. Mayo hip score had significant positive correlation with younger age, male gender, lower BMI, lower ASA class and lower Deyo-Charlson index (p ≤ 0.003 for each) and with Harris hip scores (p < 0.001). 
Mayo Hip Score is valid, sensitive to change and associated with future risk of revision surgery in patients with primary THA.
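The responsiveness statistics named in the abstract above, effect size (ES) and standardized response mean (SRM), have conventional definitions that the abstract itself does not spell out. The sketch below assumes those conventional forms: ES as mean change divided by the baseline standard deviation, and SRM as mean change divided by the standard deviation of the change scores. The function names and the scores are hypothetical and are not taken from the study. Guyatt's responsiveness index is omitted because it needs change data from clinically stable patients.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical pre- and post-operative scores for three patients.
pre = [40, 50, 60]
post = [70, 85, 100]
print(effect_size(pre, post), standardized_response_mean(pre, post))
```

Both statistics share a numerator and differ only in the variability used to scale it, which is why they can diverge when change scores are much more (or less) variable than baseline scores.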
Kim, Chung-Il; Han, Dong-Wook; Park, Il-Hyeok
2014-04-01
The Test of Gross Motor Development-II (TGMD-II) is a frequently used assessment tool for measuring motor ability. The purpose of this study is to investigate the reliability and validity of TGMD-II's weighting scores (by comparing pre-weighted TGMD-II scores with post ones) as well as examine applicability of the TGMD-II on Korean preschool children. A total of 121 Korean children (three kindergartens) participated in this study. There were 65 preschoolers who were 5-years-old (37 boys and 28 girls) and 56 preschoolers who were 6-years-old (34 boys and 22 girls). For internal consistency, reliability, and construct validity, only one researcher evaluated all of the children using the TGMD-II in the following areas: running; galloping; sliding; hopping; leaping; horizontal jumping; overhand throwing; underhand rolling; striking a stationary ball; stationary dribbling; kicking; and catching. For concurrent validity, the evaluator measured physical fitness (strength, flexibility, power, agility, endurance, and balance). The key findings were as follows: first, the reliability coefficient and the validity coefficient between pre-weighted and post-weighted TGMD-II scores were quite similar. Second, the research showed adequate reliability and validity of the TGMD-II for Korean preschool children. The TGMD-II is a proper instrument to test Korean children's motor development. Yet, applying relative weighting on the TGMD-II should be a point of consideration. Copyright © 2014 Elsevier Ltd. All rights reserved.
Validity of Montreal Cognitive Assessment in non-English speaking patients with Parkinson's disease.
Krishnan, Syam; Justus, Sunitha; Meluveettil, Radhamani; Menon, Ramshekhar N; Sarma, Sankara P; Kishore, Asha
2015-01-01
The Montreal Cognitive Assessment is a brief and easy screening tool for accurately testing cognitive dysfunction in Parkinson's disease. We tested its validity for use in non-English (Malayalam) speaking patients with Parkinson's disease. We developed a Malayalam (a south-Indian language) version of the Montreal Cognitive Assessment and applied it to 70 patients with Parkinson's disease and 60 age- and education-matched healthy controls. Metric properties were assessed, and the scores were compared with the performance in validated Malayalam versions of the Mini Mental Status Examination and Addenbrooke's Cognitive Examination. The Montreal Cognitive Assessment-Malayalam showed good internal consistency and test-retest reliability and its scores correlated with Mini Mental Status Examination (patients: R = 0.70; P < 0.001; healthy controls: R = 0.26; P = 0.04) and Addenbrooke's Cognitive Examination (patients: R = 0.8; P < 0.001; healthy controls: R = 0.52; P < 0.001) scores. This study establishes the reliability of cross-cultural adaptation of the Montreal Cognitive Assessment for assessing cognition in Malayalam-speaking Parkinson's disease patients for early screening and potential future interventions for cognitive dysfunction.
ERIC Educational Resources Information Center
Zeidner, Moshe
1987-01-01
This study examined the cross-cultural validity of the sex bias contention with respect to standardized aptitude testing, used for academic prediction purposes in Israel. Analyses were based on the grade point average and scores of 1778 Jewish and 1017 Arab students who were administered standardized college entrance test batteries. (Author/LMO)
A Rasch-Based Validation of the Hooper Visual Organization Test in Chinese-Speaking Children
ERIC Educational Resources Information Center
Wuang, Yee-Pay; Wang, Li-Chen; Su, Chwen-Yng
2010-01-01
The aim of this study was to examine the validation of the Hooper Visual Organization Test (HVOT) for use in children by testing for item fit, unidimensionality, item hierarchy, reliability, and screening capacity. A modified scoring system was devised for the HVOT so that children received some credit for being able to describe the function of…
Lubans, David R; Smith, Jordan J; Harries, Simon K; Barnett, Lisa M; Faigenbaum, Avery D
2014-05-01
The aim of this study was to describe the development and assess test-retest reliability and construct validity of the Resistance Training Skills Battery (RTSB) for adolescents. The RTSB provides an assessment of resistance training skill competency and includes 6 exercises (i.e., body weight squat, push-up, lunge, suspended row, standing overhead press, and front support with chest touches). Scoring for each skill is based on the number of performance criteria successfully demonstrated. An overall resistance training skill quotient (RTSQ) is created by adding participants' scores for the 6 skills. Participants (44 boys and 19 girls, mean age = 14.5 ± 1.2 years) completed the RTSB on 2 occasions separated by 7 days. Participants also completed the following fitness tests, which were used to create a muscular fitness score (MFS): handgrip strength, timed push-up, and standing long jump tests. Intraclass correlation (ICC), paired samples t-tests, and typical error were used to assess test-retest reliability. To assess construct validity, gender and RTSQ were entered into a regression model predicting MFS. The rank order repeatability of the RTSQ was high (ICC = 0.88). The model explained 39% of the variance in MFS (p ≤ 0.001) and RTSQ (r = 0.40, p ≤ 0.001) was a significant predictor. This study has demonstrated the construct validity and test-retest reliability of the RTSB in a sample of adolescents. The RTSB can reliably rank participants in regards to their resistance training competency and has the necessary sensitivity to detect small changes in resistance training skill proficiency.
The Arthroscopic Surgical Skill Evaluation Tool (ASSET)
Koehler, Ryan J.; Amsdell, Simon; Arendt, Elizabeth A; Bisson, Leslie J; Braman, Jonathan P; Butler, Aaron; Cosgarea, Andrew J; Harner, Christopher D; Garrett, William E; Olson, Tyson; Warme, Winston J.; Nicandri, Gregg T.
2014-01-01
Background: Surgeries employing arthroscopic techniques are among the most commonly performed in orthopaedic clinical practice; however, valid and reliable methods of assessing the arthroscopic skill of orthopaedic surgeons are lacking. Hypothesis: The Arthroscopic Surgery Skill Evaluation Tool (ASSET) will demonstrate content validity, concurrent criterion-oriented validity, and reliability when used to assess the technical ability of surgeons performing diagnostic knee arthroscopy on cadaveric specimens. Study Design: Cross-sectional study; Level of evidence, 3. Methods: Content validity was determined by a group of seven experts using a Delphi process. Intra-articular performance of a right and left diagnostic knee arthroscopy was recorded for twenty-eight residents and two sports medicine fellowship trained attending surgeons. Subject performance was assessed by two blinded raters using the ASSET. Concurrent criterion-oriented validity, inter-rater reliability, and test-retest reliability were evaluated. Results: Content validity: The content development group identified 8 arthroscopic skill domains to evaluate using the ASSET. Concurrent criterion-oriented validity: Significant differences in total ASSET score (p < 0.05) between novice, intermediate, and advanced experience groups were identified. Inter-rater reliability: The ASSET scores assigned by each rater were strongly correlated (r = 0.91, p < 0.01) and the intra-class correlation coefficient between raters for the total ASSET score was 0.90. Test-retest reliability: There was a significant correlation between ASSET scores for both procedures attempted by each individual (r = 0.79, p < 0.01). Conclusion: The ASSET appears to be a useful, valid, and reliable method for assessing surgeon performance of diagnostic knee arthroscopy in cadaveric specimens. Studies are ongoing to determine its generalizability to other procedures as well as to the live OR and other simulated environments. PMID:23548808
Marshall, Paul; Schroeder, Ryan; O'Brien, Jeffrey; Fischer, Rebecca; Ries, Adam; Blesi, Brita; Barker, Jessica
2010-10-01
This study examines the effectiveness of symptom validity measures to detect suspect effort in cognitive testing and invalid completion of ADHD behavior rating scales in 268 adults referred for ADHD assessment. Patients were diagnosed with ADHD based on cognitive testing, behavior rating scales, and clinical interview. Suspect effort was diagnosed by at least two of the following: failure on embedded and free-standing SVT measures, a score > 2 SD below the ADD population average on tests, failure on an ADHD behavior rating scale validity scale, or a major discrepancy between reported and observed ADHD behaviors. A total of 22% of patients engaged in symptom exaggeration. The Word Memory test immediate recall and consistency score (both 64%), TOVA omission errors (63%) and reaction time variability (54%), CAT-A infrequency scale (58%), and b Test (47%) had good sensitivity as well as at least 90% specificity. Clearly, such measures should be used to help avoid making false positive diagnoses of ADHD.
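The sensitivity and specificity percentages quoted above follow from a two-by-two table of test result against the reference classification (here, suspect effort or not). A minimal sketch of the arithmetic is given below. The counts are hypothetical and chosen only to land on round figures; they are not the study's cell counts, which the abstract does not report.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of truly positive cases that the measure flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of truly negative cases that the measure clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 50 cases with suspect effort, 200 without.
print(sensitivity(true_pos=32, false_neg=18))   # flagged 32 of 50
print(specificity(true_neg=180, false_pos=20))  # cleared 180 of 200
```

Holding specificity at 90% or higher, as the abstract describes, keeps false accusations of exaggeration rare, at the cost of missing some genuine cases.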
The analytical validation of the Oncotype DX Recurrence Score assay
Baehner, Frederick L
2016-01-01
In vitro diagnostic multivariate index assays are highly complex molecular assays that can provide clinically actionable information regarding the underlying tumour biology and facilitate personalised treatment. These assays are only useful in clinical practice if all of the following are established: analytical validation (i.e., how accurately/reliably the assay measures the molecular characteristics), clinical validation (i.e., how consistently/accurately the test detects/predicts the outcomes of interest), and clinical utility (i.e., how likely the test is to significantly improve patient outcomes). In considering the use of these assays, clinicians often focus primarily on the clinical validity/utility; however, the analytical validity of an assay (e.g., its accuracy, reproducibility, and standardisation) should also be evaluated and carefully considered. This review focuses on the rigorous analytical validation and performance of the Oncotype DX® Breast Cancer Assay, which is performed at the Central Clinical Reference Laboratory of Genomic Health, Inc. The assay process includes tumour tissue enrichment (if needed), RNA extraction, gene expression quantitation (using a gene panel consisting of 16 cancer genes plus 5 reference genes and quantitative real-time RT-PCR), and an automated computer algorithm to produce a Recurrence Score® result (scale: 0–100). This review presents evidence showing that the Recurrence Score result reported for each patient falls within a tight clinically relevant confidence interval. Specifically, the review discusses how the development of the assay was designed to optimise assay performance, presents data supporting its analytical validity, and describes the quality control and assurance programmes that ensure optimal test performance over time. PMID:27729940
Asaadi, Sina; Ashrafi, Farzad; Omidbeigi, Mahmoud; Nasiri, Zahra; Pakdaman, Hossein; Amini-Harandi, Ali
2016-01-05
Cognitive impairment in patients with Parkinson's disease (PD) mainly involves executive function (EF). The frontal assessment battery (FAB) is an efficient tool for the assessment of EFs. The aims of this study were to determine the validity and reliability of the Persian version of the FAB, to assess its correlation with formal measures of EFs, and to provide normative data for the Persian version of the FAB in patients with PD. The study recruited 149 healthy participants and 49 patients with idiopathic PD. In PD patients, FAB results were compared to their performance on EF tests. Reliability analysis involved test-retest reliability and internal consistency, whereas validity analysis involved a convergent validity approach. FAB scores were compared in normal controls and in PD patients matched for age, education, and Mini-Mental State Examination (MMSE) score. In PD patients, FAB scores were significantly decreased compared to normal controls, and correlated with the Stroop test and Wisconsin Card Sorting Test (WCST). In healthy subjects, FAB scores varied according to age, education, and MMSE. In the FAB subtest analysis, the performances of PD patients were worse than those of the healthy participants on similarities, fluency tasks, and Luria's motor series. The Persian version of the FAB could be used as a reliable scale for the assessment of frontal lobe functions in Iranian patients with PD. Furthermore, normative data provided for the Persian version of this test improve the accuracy and confidence in the clinical application of the FAB.
ERIC Educational Resources Information Center
Slepkov, Aaron D.; Vreugdenhil, Andrew J.; Shiell, Ralph C.
2016-01-01
There are numerous benefits to answer-until-correct (AUC) approaches to multiple-choice testing, not the least of which is the straightforward allotment of partial credit. However, the benefits of granting partial credit can be tempered by the inevitable increase in test scores and by fears that such increases are further contaminated by a large…
ERIC Educational Resources Information Center
Yarbrough, Nükhet D.
2016-01-01
As part of a project to translate and administer the Torrance Tests of Creative Thinking (TTCT) to Turkish elementary and secondary students, 35 professionals were trained in a full-day workshop to learn to score the verbal TTCT. All trainees scored the same 4 sets of TTCT verbal criterion tests for fluency, flexibility, and originality by filling…
A comparison of WISC-IV and SB-5 intelligence scores in adolescents with autism spectrum disorder.
Baum, Katherine T; Shear, Paula K; Howe, Steven R; Bishop, Somer L
2015-08-01
In autism spectrum disorders, results of cognitive testing inform clinical care, theories of neurodevelopment, and research design. The Wechsler Intelligence Scale for Children and the Stanford-Binet are commonly used in autism spectrum disorder evaluations and scores from these tests have been shown to be highly correlated in typically developing populations. However, they have not been compared in individuals with autism spectrum disorder, whose core symptoms can make testing challenging, potentially compromising test reliability. We used a within-subjects research design to evaluate the convergent validity between the Wechsler Intelligence Scale for Children, 4th ed., and Stanford-Binet, 5th ed., in 40 youth (ages 10-16 years) with autism spectrum disorder. Corresponding intelligence scores were highly correlated (r = 0.78 to 0.88), but full-scale intelligence quotient (IQ) scores (t(38) = -2.27, p = 0.03, d = -0.16) and verbal IQ scores (t(36) = 2.23, p = 0.03; d = 0.19) differed between the two tests. Most participants obtained higher full-scale IQ scores on the Stanford-Binet, 5th ed., compared to the Wechsler Intelligence Scale for Children, 4th ed., with 14% scoring more than one standard deviation higher. In contrast, verbal indices were higher on the Wechsler Intelligence Scale for Children, 4th ed. Verbal-nonverbal discrepancy classifications were only consistent for 60% of the sample. Comparisons of IQ test scores in autism spectrum disorder and other special groups are important, as it cannot necessarily be assumed that convergent validity findings in typically developing children and adolescents hold true across all pediatric populations. © The Author(s) 2014.
Deichmann Nielsen, Lea; Bech, Per; Hounsgaard, Lise; Alkier Gildberg, Frederik
2017-08-01
Unstructured risk assessment, as well as confounders (underlying reasons for the patient's risk behaviour and alliance), risk behaviour, and parameters of alliance, have been identified as factors that prolong the duration of mechanical restraint among forensic mental health inpatients. To clinically validate a new, structured short-term risk assessment instrument called the Mechanical Restraint-Confounders, Risk, Alliance Score (MR-CRAS), with the intended purpose of supporting the clinicians' observation and assessment of the patient's readiness to be released from mechanical restraint. The content and layout of MR-CRAS and its user manual were evaluated using face validation by forensic mental health clinicians, content validation by an expert panel, and pilot testing within two, closed forensic mental health inpatient units. The three sub-scales (Confounders, Risk, and a parameter of Alliance) showed excellent content validity. The clinical validations also showed that MR-CRAS was perceived and experienced as a comprehensible, relevant, comprehensive, and useable risk assessment instrument. MR-CRAS contains 18 clinically valid items, and the instrument can be used to support the clinical decision-making regarding the possibility of releasing the patient from mechanical restraint. The present three studies have clinically validated a short MR-CRAS scale that is currently being psychometrically tested in a larger study.
Rand, Stacey; Malley, Juliette; Towers, Ann-Marie; Netten, Ann; Forder, Julien
2017-08-18
The Adult Social Care Outcomes Toolkit (ASCOT-SCT4) is a multi-attribute utility index designed for the evaluation of long-term social care services. The measure comprises eight attributes that capture aspects of social care-related quality of life. The instrument has previously been validated with a sample of older adults who used home care services in England. This paper aims to demonstrate the instrument's test-retest reliability and provide evidence for its validity in a diverse sample of adults who use publicly-funded, community-based social care in England. A survey of 770 social care service users was conducted in England. A subsample of 100 service users participated in a follow-up interview between 7 and 21 days after baseline. Spearman rank correlation coefficients between the ASCOT-SCT4 index score and the EQ-5D-3L, the ICECAP-A or ICECAP-O and overall quality of life were used to assess convergent validity. Data on variables hypothesised to be related to the ASCOT-SCT4 index score, as well as rating of individual attributes, were also collected. Hypothesised relationships were tested using one-way ANOVA or Fisher's exact test. Test-retest reliability was assessed using the intra-class correlation coefficient for the ASCOT-SCT4 index score at baseline and follow-up. There were moderate to strong correlations between the ASCOT-SCT4 index and EQ-5D-3L, the ICECAP-A or ICECAP-O, and overall quality of life (all correlations ≥ 0.3). The construct validity was further supported by statistically significant hypothesised relationships between the ASCOT-SCT4 index and individual characteristics in univariate and multivariate analysis. There was also further evidence for the construct validity for the revised Food and drink and Dignity items. The test-retest reliability was considered to be good (ICC = 0.783; 95% CI: 0.678-0.857).
The ASCOT-SCT4 index has good test-retest reliability for adults with physical or sensory disabilities who use social care services. The index score and the attributes appear to be valid for adults receiving social care for support reasons connected to underlying mental health problems, and physical or sensory disabilities. Further reliability testing with a wider sample of social care users is warranted, as is further exploration of the relationship between the ASCOT-SCT4, ICECAP-A/O and EQ-5D-3L indices.
Cho, Jae Hoon; Jeong, Yong Soo; Lee, Yeo Jin; Hong, Seok-Chan; Yoon, Joo-Heon; Kim, Jin Kook
2009-06-01
The Korean Version of the Sniffin' Stick (KVSS) is the first olfactory test for Koreans. Although we adopted the Sniffin' Stick, we modified it to make it more suitable for Koreans. KVSS I is a screening test, and KVSS II a more comprehensive test. The aims of this study were to apply the KVSS test and assess its clinical validity and reliability in comparison to CC-SIT. One hundred and seventy-four healthy volunteers and 206 patients with subjective decreased olfaction participated. Each participant was tested with both the CC-SIT and KVSS tests and then the correlation between these two tests was analyzed. The correlation between CC-SIT and KVSS I was 0.720 (p < 0.01) and 0.714 between the CC-SIT and KVSS II total scores (p < 0.01). When the degree of olfaction based on the KVSS I was used, the mean CC-SIT score was 8.6 ± 1.8 for normosmia, 7.3 ± 2.2 for hyposmia, and 4.2 ± 2.3 for anosmia. When the KVSS II total was applied, the mean CC-SIT score was 8.4 ± 1.8 for normosmia, 7.3 ± 2.0 for hyposmia, and 3.7 ± 2.0 for anosmia. The means of the three groups differed significantly in both cases (p < 0.01). Thus, the KVSS test demonstrates validity and reliability for Koreans in comparison with CC-SIT.
Fransen, Job; D'Hondt, Eva; Bourgois, Jan; Vaeyens, Roel; Philippaerts, Renaat M; Lenoir, Matthieu
2014-06-01
This study investigated convergent and discriminant validity between two motor competence assessment instruments in 2485 Flemish children: the Bruininks-Oseretsky Test of Motor Proficiency 2 Short Form (BOT-2 Short Form) and the KörperKoördinationsTest für Kinder (KTK). A Pearson correlation assessed the relationship between BOT-2 Short Form total, gross and fine motor composite scores and KTK Motor Quotient in three age cohorts (6-7, 8-9, 10-11 years). Crosstabs were used to measure agreement in classification in children scoring below percentile 5 and 15 and above percentile 85 and 95. Moderately strong positive (r=0.44-0.64) associations between BOT-2 total and gross motor composite scores and KTK Motor Quotient and weak positive correlations between BOT-2 Short Form fine motor composite and KTK Motor Quotient scores (r=0.25-0.37) were found. Levels of agreement were fair to moderate. Therefore, some proof of convergent and discriminant validity between BOT-2 Short Form and KTK was established in this study, underlining the notion that the evaluation of motor competence should not be based upon a single assessment instrument. Copyright © 2014 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Hildebrand, Myrene; Hoover, H. D.
This study compared the reliability and validity of two different measures of reading ability, the Degrees of Reading Power (DRP) and the Iowa Tests of Basic Skills (ITBS) Reading test and the ITBS Vocabulary test. The data consisted of scores of 377 grade 5 and grade 6 students on these tests, along with their assigned reading levels in the…
Construct Validity: Advances in Theory and Methodology
Strauss, Milton E.; Smith, Gregory T.
2008-01-01
Measures of psychological constructs are validated by testing whether they relate to measures of other constructs as specified by theory. Each test of relations between measures reflects on the validity of both the measures and the theory driving the test. Construct validation concerns the simultaneous process of measure and theory validation. In this chapter, we review the recent history of validation efforts in clinical psychological science that has led to this perspective, and we review five recent advances in validation theory and methodology of importance for clinical researchers. These are: the emergence of nonjustificationist philosophy of science; an increasing appreciation for theory and the need for informative tests of construct validity; valid construct representation in experimental psychopathology; the need to avoid representing multidimensional constructs with a single score; and the emergence of effective new statistical tools for the evaluation of convergent and discriminant validity. PMID:19086835
Handling missing values in the MDS-UPDRS.
Goetz, Christopher G; Luo, Sheng; Wang, Lu; Tilley, Barbara C; LaPelle, Nancy R; Stebbins, Glenn T
2015-10-01
This study was undertaken to define the number of missing values permissible to render valid total scores for each Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) part. To handle missing values, imputation strategies serve as guidelines to reject an incomplete rating or create a surrogate score. We tested a rigorous, scale-specific, data-based approach to handling missing values for the MDS-UPDRS. From two large MDS-UPDRS datasets, we sequentially deleted item scores, either consistently (same items) or randomly (different items) across all subjects. Lin's Concordance Correlation Coefficient (CCC) compared scores calculated without missing values with prorated scores based on sequentially increasing missing values. The maximal number of missing values retaining a CCC greater than 0.95 determined the threshold for rendering a valid prorated score. A second confirmatory sample was selected from the MDS-UPDRS international translation program. To provide valid part scores applicable across all Hoehn and Yahr (H&Y) stages when the same items are consistently missing, one missing item from Part I, one from Part II, three from Part III, but none from Part IV can be allowed. To provide valid part scores applicable across all H&Y stages when random item entries are missing, one missing item from Part I, two from Part II, seven from Part III, but none from Part IV can be allowed. All cutoff values were confirmed in the validation sample. These analyses are useful for constructing valid surrogate part scores for MDS-UPDRS when missing items fall within the identified threshold and give scientific justification for rejecting partially completed ratings that fall below the threshold. © 2015 International Parkinson and Movement Disorder Society.
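The comparison described in this abstract can be illustrated with a minimal sketch. It assumes that a prorated score is the mean of the answered items scaled up to the full item count and that missing items are marked with None; the abstract does not state the exact proration rule, and the function names below are hypothetical.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mx, my = fmean(x), fmean(y)
    # Population (1/n) moments, following Lin's definition.
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    denom = vx + vy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both samples are one identical constant")
    return 2 * cov / denom


def prorated_total(item_scores):
    """Assumed proration rule: mean of answered items times the item count.

    Missing items are represented by None (an assumption of this sketch).
    """
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("no answered items")
    return sum(answered) / len(answered) * len(item_scores)
```

Under the abstract's criterion, a given number of missing items would be acceptable when the CCC between complete and prorated totals stays above 0.95.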
The medial tibial stress syndrome score: a new patient-reported outcome measure.
Winters, Marinus; Moen, Maarten H; Zimmermann, Wessel O; Lindeboom, Robert; Weir, Adam; Backx, Frank Jg; Bakker, Eric Wp
2016-10-01
At present, there is no validated patient-reported outcome measure (PROM) for patients with medial tibial stress syndrome (MTSS). Our aim was to select and validate previously generated items and create a valid, reliable and responsive PROM for patients with MTSS: the MTSS score. A prospective cohort study was performed in multiple sports medicine, physiotherapy and military facilities in the Netherlands. Participants with MTSS filled out the previously generated items for the MTSS score on 3 occasions. From previously generated items, we selected the best items. We assessed the MTSS score for its validity, reliability and responsiveness. The MTSS score was filled out by 133 participants with MTSS. Factor analysis showed the MTSS score to exhibit a single-factor structure with acceptable internal consistency (α=0.58) and good test-retest reliability (intraclass correlation coefficient=0.81). The MTSS score ranges from 0 to 10 points. The smallest detectable change in our sample was 0.69 at the group level and 4.80 at the individual level. Construct validity analysis showed significant moderate-to-large correlations (r=0.34-0.52, p<0.01). Responsiveness of the MTSS score was confirmed by a significant relation with the global perceived effect scale (β=-0.288, R(2)=0.21, p<0.001). The MTSS score is a valid, reliable and responsive PROM to measure the severity of MTSS. It is designed to evaluate treatment outcomes in clinical studies. Published by the BMJ Publishing Group Limited.
The Validity and reliability of the Comprehensive Home Environment Survey (CHES).
Pinard, Courtney A; Yaroch, Amy L; Hart, Michael H; Serrano, Elena L; McFerren, Mary M; Estabrooks, Paul A
2014-01-01
Few comprehensive measures exist to assess contributors to childhood obesity within the home, specifically among low-income populations. The current study describes the modification and psychometric testing of the Comprehensive Home Environment Survey (CHES), an inclusive measure of the home food, physical activity, and media environment related to childhood obesity. The items were tested for content relevance by an expert panel and piloted in the priority population. The CHES was administered to low-income parents of children 5 to 17 years (N = 150), including a subsample of parents a second time and additional caregivers to establish test-retest and interrater reliabilities. Children older than 9 years (n = 95), as well as parents (N = 150) completed concurrent assessments of diet and physical activity behaviors (predictive validity). Analyses and item trimming resulted in 18 subscales and a total score, which displayed adequate internal consistency (α = .74-.92) and high test-retest reliability (r ≥ .73, ps < .01) and interrater reliability (r ≥ .42, ps < .01). The CHES score and a validated screener for the home environment were correlated (r = .37, p < .01; concurrent validity). CHES subscales were significantly correlated with behavioral measures (r = -.20-.55, p < .05; predictive validity). The CHES shows promise as a valid/reliable assessment of the home environment related to childhood obesity, including healthy diet and physical activity.
Jacova, Claudia; McGrenere, Joanna; Lee, Hyunsoo S; Wang, William W; Le Huray, Sarah; Corenblith, Emily F; Brehmer, Matthew; Tang, Charlotte; Hayden, Sherri; Beattie, B Lynn; Hsiung, Ging-Yuek R
2015-01-01
Cognitive Testing on Computer (C-TOC) is a novel computer-based test battery developed to improve both usability and validity in the computerized assessment of cognitive function in older adults. C-TOC's usability was evaluated concurrently with its iterative development to version 4 in subjects with and without cognitive impairment, and health professional advisors representing different ethnocultural groups. C-TOC version 4 was then validated against neuropsychological tests (NPTs), and by comparing performance scores of subjects with normal cognition, Cognitive Impairment Not Dementia (CIND) and Alzheimer disease. C-TOC's language tests were validated in subjects with aphasic disorders. The most important usability issue that emerged from consultations with 27 older adults and with 8 cultural advisors was the test-takers' understanding of the task, particularly executive function tasks. User interface features did not pose significant problems. C-TOC version 4 tests correlated with comparator NPT (r=0.4 to 0.7). C-TOC test scores were normal (n=16)>CIND (n=16)>Alzheimer disease (n=6). All normal/CIND NPT performance differences were detected on C-TOC. Low computer knowledge adversely affected test performance, particularly in CIND. C-TOC detected impairments in aphasic disorders (n=11). In general, C-TOC had good validity in detecting cognitive impairment. Ensuring test-takers' understanding of the tasks, and considering their computer knowledge appear important steps towards C-TOC's implementation.
The Validity of College Grade Prediction Equations Over Time.
ERIC Educational Resources Information Center
Sawyer, Richard L.; Maxey, James
A sample of 260 colleges was surveyed during the years 1972-1976 to determine the validity of predicting college freshmen grades from standardized test scores and high school grades using the American College Testing (ACT) Assessment Program, an evaluative and placement service for students and educators involved in the transition from high school…
Examinee Noneffort and the Validity of Program Assessment Results
ERIC Educational Resources Information Center
Wise, Steven L.; DeMars, Christine E.
2010-01-01
Educational program assessment studies often use data from low-stakes tests to provide evidence of program quality. The validity of scores from such tests, however, is potentially threatened by examinee noneffort. This study investigated the extent to which one type of noneffort--rapid-guessing behavior--distorted the results from three types of…
Criterion-Related Validity: Assessing the Value of Subscores
ERIC Educational Resources Information Center
Davison, Mark L.; Davenport, Ernest C., Jr.; Chang, Yu-Feng; Vue, Kory; Su, Shiyang
2015-01-01
Criterion-related profile analysis (CPA) can be used to assess whether subscores of a test or test battery account for more criterion variance than does a single total score. Application of CPA to subscore evaluation is described, compared to alternative procedures, and illustrated using SAT data. Considerations other than validity and reliability…
Discriminant Validity of the WISC-IV Culture-Language Interpretive Matrix
ERIC Educational Resources Information Center
Styck, Kara M.; Watkins, Marley W.
2014-01-01
The Culture-Language Interpretive Matrix (C-LIM) was developed to help practitioners determine the validity of test scores obtained from students who are culturally and linguistically different from the normative group of a test. The present study used an idiographic approach to investigate the diagnostic utility of the C-LIM for the Wechsler…
Carrillo-Larco, Rodrigo M; Miranda, J Jaime; Gilman, Robert H; Medina-Lezama, Josefina; Chirinos-Pacheco, Julio A; Muñoz-Retamozo, Paola V; Smeeth, Liam; Checkley, William; Bernabe-Ortiz, Antonio
2017-11-29
Chronic Kidney Disease (CKD) represents a great burden for the patient and the health system, particularly if diagnosed at late stages. Consequently, tools to identify patients at high risk of having CKD are needed, particularly in limited-resources settings where laboratory facilities are scarce. This study aimed to develop a risk score for prevalent undiagnosed CKD using data from four settings in Peru: a complete risk score including all associated risk factors and another excluding laboratory-based variables. Cross-sectional study. We used two population-based studies: one for developing and internal validation (CRONICAS), and another (PREVENCION) for external validation. Risk factors included clinical- and laboratory-based variables, among others: sex, age, hypertension and obesity; and lipid profile, anemia and glucose metabolism. The outcome was undiagnosed CKD: eGFR < 60 ml/min/1.73 m². We tested the performance of the risk scores using the area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, positive/negative predictive values and positive/negative likelihood ratios. Participants in both studies averaged 57.7 years old, and over 50% were females. Age, hypertension and anemia were strongly associated with undiagnosed CKD. In the external validation, at a cut-off point of 2, the complete and laboratory-free risk scores performed similarly well with a ROC area of 76.2% and 76.0%, respectively (P = 0.784). The best assessment parameter of these risk scores was their negative predictive value: 99.1% and 99.0% for the complete and laboratory-free, respectively. The developed risk scores showed a moderate performance as a screening test. People with a score of ≥ 2 points should undergo further testing to rule out CKD. Using the laboratory-free risk score is a practical approach in developing countries where laboratories are not readily available and undiagnosed CKD has significant morbidity and mortality.
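As an illustration of the screening metrics named in this abstract, the following sketch computes sensitivity, specificity and predictive values for a score dichotomised at a cut-off. The function name and the toy inputs are hypothetical; only the rule "score ≥ 2 counts as screen-positive" comes from the abstract.

```python
def screening_metrics(scores, has_ckd, cutoff=2):
    """Confusion-matrix metrics for a risk score dichotomised at `cutoff`.

    A participant is screen-positive when score >= cutoff.
    """
    tp = fp = tn = fn = 0
    for score, diseased in zip(scores, has_ckd):
        positive = score >= cutoff
        if positive and diseased:
            tp += 1
        elif positive and not diseased:
            fp += 1
        elif not positive and diseased:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Return NaN rather than failing when a margin of the table is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

A high negative predictive value, as reported here, means that few people scoring below the cut-off actually have the condition; that value depends on prevalence as well as on the score itself.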
The use of neuropsychological tests to assess intelligence.
Gansler, David A; Varvaris, Mark; Schretlen, David J
We sought to derive a 'neuropsychological intelligence quotient' (NIQ) to replace IQ testing in some routine assessments. We administered neuropsychological testing and a seven-subtest short form of the Wechsler Adult Intelligence Scale to a community sample of 394 adults aged 18-96 years. We regressed Wechsler Full Scale IQs (W-FSIQ) on 23 neuropsychological scores and derived an NIQ from 9 measures that explained significant variance in W-FSIQ. We then compared subgroups of 284 healthy and 108 unhealthy participants in NIQ and W-FSIQ to assess criterion validity, correlated NIQ and W-FSIQ scores with education level and independence for activities of daily living to assess convergent validity, and compared validity coefficients for the NIQ with those of 'hold' and 'no-hold' indices. By design, NIQ and W-FSIQ scores correlated highly (r = .84), and both were higher in healthy participants. The difference was larger for NIQ, which accounted for more variability in activities of daily living. The NIQ and 'no-hold' index were better predicted by health status and less predicted by educational status than the 'hold' index. We constructed an NIQ that correlates highly with Wechsler FSIQ. Tests required to obtain NIQ are commonly used and can be administered in about 45 min. Validity properties of NIQ and W-FSIQ are similar. The NIQ bore greater resemblance to a 'no-hold' than 'hold' index. One can obtain a reasonably accurate estimate of current Full Scale IQ without formal intelligence testing from a brief neuropsychological battery.
Validation of the Female Sexual Function Index (FSFI) for web-based administration.
Crisp, Catrina C; Fellner, Angela N; Pauls, Rachel N
2015-02-01
Web-based questionnaires are becoming increasingly valuable for clinical research. The Female Sexual Function Index (FSFI) is the gold standard for evaluating female sexual function; yet, it has not been validated in this format. We sought to validate the Female Sexual Function Index (FSFI) for web-based administration. Subjects enrolled in a web-based research survey of sexual function from the general population were invited to participate in this validation study. The first 151 respondents were included. Validation participants completed the web-based version of the FSFI followed by a mailed paper-based version. Demographic data were collected for all subjects. Scores were compared using the paired t test and the intraclass correlation coefficient. One hundred fifty-one subjects completed both web- and paper-based versions of the FSFI. Those subjects participating in the validation study did not differ in demographics or FSFI scores from the remaining subjects in the general population study. Total web-based and paper-based FSFI scores were not significantly different (mean 20.31 and 20.29 respectively, p = 0.931). The six domains or subscales of the FSFI were similar when comparing web and paper scores. Finally, intraclass correlation analysis revealed a high degree of correlation between total and subscale scores, r = 0.848-0.943, p < 0.001. Web-based administration of the FSFI is a valid alternative to the paper-based version.
Bryant, Elizabeth; Murtagh, Shemane; Finucane, Laura; McCrum, Carol; Mercer, Christopher; Smith, Toby; Canby, Guy; Rowe, David A; Moore, Ann P
2018-05-11
In response to the need for a freely available, stand-alone, validated outcome measure for use within musculoskeletal (MSK) physiotherapy practice, sensitive enough to measure clinical effectiveness, we developed an MSK patient-reported outcome measure. This study examined the validity and reliability of the newly developed Brighton musculoskeletal Patient-Reported Outcome Measure (BmPROM) within physiotherapy outpatient settings. Two hundred twenty-four patients attending physiotherapy outpatient departments in South East England with an MSK condition participated in this study. The BmPROM was assessed for user friendliness (rated feedback, N = 224), reliability (internal consistency and test-retest reliability, n = 42), validity (internal and external construct validity, N = 224), and responsiveness (internal, n = 25). Exploratory factor analysis indicated that a two-factor model provides a good fit to the data. Factors were representative of "Functionality" and "Wellbeing". Correlations observed between the BmPROM and SF-36 domains provided evidence of convergent validity. Reliability results indicated that both subscales were internally consistent with alphas above the acceptable limits for both "Functionality" (α = .85, 95% CI [.81, .88]) and "Wellbeing" (α = .80, 95% CI [.75, .84]). Test-retest analyses (n = 42) demonstrated a high degree of reliability between "Functionality" (ICC = .84; 95% CI [.72, .91]) and "Wellbeing" scores (ICC = .84; 95% CI [.72, .91]). Further examination of test-retest reliability through the Bland-Altman analysis demonstrated that the difference between "Functionality" and "Wellbeing" test scores did not vary as a function of absolute test score. Large treatment effect sizes were found for both subscales (Functionality d = 1.10; Wellbeing d = 1.03). The BmPROM is a reliable and valid outcome measure for use in evaluating physiotherapy treatment of MSK conditions. Copyright © 2018 John Wiley & Sons, Ltd.
Feenstra, Heleen E M; Murre, Jaap M J; Vermeulen, Ivar E; Kieffer, Jacobien M; Schagen, Sanne B
2018-04-01
To facilitate large-scale assessment of a variety of cognitive abilities in clinical studies, we developed a self-administered online neuropsychological test battery: the Amsterdam Cognition Scan (ACS). The current studies evaluate, in a group of adult cancer patients, the test-retest reliability of the ACS, the influence of test setting (home or hospital), and the relationship between our online and a traditional test battery (concurrent validity). Test-retest reliability was studied in 96 cancer patients (57 female; M age = 51.8 years) who completed the ACS twice. Intraclass correlation coefficients (ICCs) were used to assess consistency over time. The test setting was counterbalanced between home and hospital; influence on test performance was assessed by repeated measures analyses of variance. Concurrent validity was studied in 201 cancer patients (112 female; M age = 53.5 years) who completed both the online and an equivalent traditional neuropsychological test battery. Spearman or Pearson correlations were used to assess consistency between online and traditional tests. ICCs of the online tests ranged from .29 to .76, with an ICC of .78 for the ACS total score. These correlations are generally comparable with the test-retest correlations of the traditional tests as reported in the literature. Correlating online and traditional test scores, we observed medium to large concurrent validity (r/ρ = .42 to .70; total score r = .78), except for a visuospatial memory test (ρ = .36). Correlations were affected, as expected, by design differences between online tests and their offline counterparts. Although development and optimization of the ACS is an ongoing process, and reliability can be optimized for several tests, our results indicate that it is a highly usable tool to obtain (online) measures of various cognitive abilities. The ACS is expected to facilitate efficient gathering of data on cognitive functioning in the near future.
A Study on the Impact of Fatigue on Human Raters When Scoring Speaking Responses
ERIC Educational Resources Information Center
Ling, Guangming; Mollaun, Pamela; Xi, Xiaoming
2014-01-01
The scoring of constructed responses may introduce construct-irrelevant factors to a test score and affect its validity and fairness. Fatigue is one of the factors that could negatively affect human performance in general, yet little is known about its effects on a human rater's scoring quality on constructed responses. In this study, we compared…
A Psychometric Review of Measures Assessing Discrimination Against Sexual Minorities.
Morrison, Todd G; Bishop, C J; Morrison, Melanie A; Parker-Taneo, Kandice
2016-08-01
Discrimination against sexual minorities is widespread and has deleterious consequences on victims' psychological and physical wellbeing. However, a review of the psychometric properties of instruments measuring lesbian, gay, and bisexual (LGB) discrimination has not been conducted. The results of this review, which involved evaluating 162 articles, reveal that most have suboptimal psychometric properties. Specifically, myriad scales possess questionable content validity as (1) items are not created in collaboration with sexual minorities; (2) measures possess a small number of items and, thus, may not sufficiently represent the domain of interest; and (3) scales are "adapted" from measures designed to examine race- and gender-based discrimination. Additional limitations include (1) summed scores are computed, often in the absence of scale score reliability metrics; (2) summed scores operate from the questionable assumption that diverse forms of discrimination are necessarily interrelated; (3) the dimensionality of instruments presumed to consist of subscales is seldom tested; (4) tests of criterion-related validity are routinely omitted; and (5) formal tests of measures' construct validity are seldom provided, necessitating that one infer validity based on the results obtained. The absence of "gold standard" measures, the attendant difficulty in formulating a coherent picture of this body of research, and suggestions for psychometric improvements are noted.
Rosenthal, Rachel; Hamel, Christian; Oertli, Daniel; Demartines, Nicolas; Gantert, Walter A
2010-08-01
The aim of the present study was to investigate whether trainees' performance on a virtual reality angled laparoscope navigation task correlates with scores obtained on a validated conventional test of spatial ability. 56 participants of a surgery workshop performed an angled laparoscope navigation task on the Xitact LS 500 virtual reality Simulator. Performance parameters were correlated with the score of a validated paper-and-pencil test of spatial ability. Performance at the conventional spatial ability test significantly correlated with performance at the virtual reality task for overall task score (p < 0.001), task completion time (p < 0.001) and economy of movement (p = 0.035), not for endoscope travel speed (p = 0.947). In conclusion, trainees' performance in a standardized virtual reality camera navigation task correlates with their innate spatial ability. This VR session holds potential to serve as an assessment tool for trainees.
Development of a Self-Report Measure of Reward Sensitivity: A Test in Current and Former Smokers.
Hughes, John R; Callas, Peter W; Priest, Jeff S; Etter, Jean-Francois; Budney, Alan J; Sigmon, Stacey C
2017-06-01
Tobacco use or abstinence may increase or decrease reward sensitivity. Most existing measures of reward sensitivity were developed decades ago, and few have undergone extensive psychometric testing. We developed a 58-item survey of the anticipated enjoyment from, wanting for, and frequency of common rewards (the Rewarding Events Inventory-REI). The current analysis focuses on ratings of anticipated enjoyment. The first validation study recruited current and former smokers from Internet sites. The second study recruited smokers who wished to quit and monetarily reinforced them to stay abstinent in a laboratory study and a comparison group of former smokers. In both studies, participants completed the inventory on two occasions, 3-7 days apart. They also completed four anhedonia scales and a behavioral test of reduced reward sensitivity. Half of the enjoyment ratings loaded on four factors: socializing, active hobbies, passive hobbies, and sex/drug use. Cronbach's alpha coefficients were all ≥0.73 for overall mean and factor scores. Test-retest correlations were all ≥0.83. Correlations of the overall and factor scores with frequency of rewards and anhedonia scales were 0.19-0.53, except for the sex/drugs factor. The scores did not correlate with behavioral tests of reward and did not differ between current and former smokers. Lower overall mean enjoyment score predicted a shorter time to relapse. Internal reliability and test-retest reliability of the enjoyment outcomes of the REI are excellent, and construct and predictive validity are modest but promising. The REI is comprehensive and up-to-date, yet is short enough to use on repeated occasions. Replication tests, especially predictive validity tests, are needed. Both use of and abstinence from nicotine appear to increase or decrease how rewarding nondrug rewards are; however, self-report scales to test this have limitations. 
Our inventory of enjoyment from 58 rewards appears to be reliable and valid as well as comprehensive and up-to-date, yet is short enough to use on repeated occasions. Replication tests, especially of the predictive validity of our scale, are needed. © The Author 2017. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
Kitada, Masako; Musashi, Manabu; Kano, Masato
2011-08-01
To examine the reliability and validity of the Kano Test for Social Nicotine Dependence (KTSND), a scale assessing the psychosocial acceptability of smoking, and to develop a new version if the validity or reliability of the KTSND was not acceptable. We carried out a self-administered cross-sectional survey of undergraduate university students. The participants completed the KTSND and three supplementary questions on attitudes toward tobacco control policies and smoking status. Among daily smokers, we examined the relationship between the KTSND and the Fagerström Test for Nicotine Dependence (FTND). In each study, we examined test-retest reliability and construct validity, discriminant and convergent validity, and factor validity. Although the KTSND had high internal consistency (Cronbach's alpha 0.82) and high test-retest reliability (r=0.72), the results of factor analysis were unacceptable: we expected three factors to be extracted, but only two factors, "Overestimate of smoking usefulness" and "Allege smoking as a taste and/or culture", were extracted. Using the Kano's Test for Assessing Acceptability of Smoking (KTAAS), the new version of the KTSND in which one question was replaced with another, a third factor, "Neglect of harm of tobacco smoking", was extracted in addition to the two above. The KTAAS also had both high internal consistency (Cronbach's alpha 0.82) and test-retest reliability (r=0.66). Overall, the KTSND and the KTAAS scores differed according to smoking status, and the nonsmokers' scores were the lowest. The KTSND is a popular questionnaire in Japan; however, its validity as assessed by factor analysis was not acceptable, whereas the KTAAS had sufficient reliability and validity and might assess the cognition and attitude affirming or accepting tobacco smoking among university students.
Can a Two-Question Test Be Reliable and Valid for Predicting Academic Outcomes?
ERIC Educational Resources Information Center
Bridgeman, Brent
2016-01-01
Scores on essay-based assessments that are part of standardized admissions tests are typically given relatively little weight in admissions decisions compared to the weight given to scores from multiple-choice assessments. Evidence is presented to suggest that more weight should be given to these assessments. The reliability of the writing scores…
The "Don't Know" Option in Progress Testing
ERIC Educational Resources Information Center
Ravesloot, C. J.; Van der Schaaf, M. F.; Muijtjens, A. M. M.; Haaring, C.; Kruitwagen, C. L. J. J.; Beek, F. J. A.; Bakker, J.; Van Schaik, J.P.J.; Ten Cate, Th. J.
2015-01-01
Formula scoring (FS) is the use of a don't know option (DKO) with subtraction of points for wrong answers. Its effect on the construct validity and reliability of progress test scores is a subject of discussion. Choosing a DKO may not only be affected by knowledge level, but also by risk taking tendency, and may thus introduce construct-irrelevant…
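Formula scoring as defined in this abstract can be sketched as follows. The abstract does not give the size of the penalty; the sketch assumes the classical correction-for-guessing value of 1/(k − 1) for an item with k options, and the function name and the "?" marker for the don't-know option are hypothetical.

```python
def formula_score(responses, key, n_options, dont_know="?"):
    """Score a test with a don't-know option and a penalty for wrong answers.

    Assumed rule: +1 for a correct answer, 0 for don't know,
    -1/(n_options - 1) for a wrong answer.
    """
    if n_options < 2:
        raise ValueError("items need at least two options")
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    penalty = 1 / (n_options - 1)
    score = 0.0
    for given, correct in zip(responses, key):
        if given == dont_know:
            continue  # skipped items neither gain nor lose points
        score += 1.0 if given == correct else -penalty
    return score
```

With this penalty, blind guessing has an expected gain of zero, so whether a test-taker guesses or picks the don't-know option when unsure reflects risk attitude, which is the concern the abstract raises.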
Stræde, Mia; Brabrand, Mikkel
2014-01-01
Clinical scores can help predict early mortality after admission to a medical admission unit. A developed scoring system needs to be externally validated to minimise the risk that its discriminatory power and calibration are overestimated. We performed the present study with the objective of validating the Simple Clinical Score (SCS) and the HOTEL score, two existing risk stratification systems that predict mortality for medical patients based solely on clinical information, though not only on vital signs. Pre-planned prospective observational cohort study. Danish 460-bed regional teaching hospital. We included 3046 consecutive patients from 2 October 2008 until 19 February 2009. 26 (0.9%) died within one calendar day and 196 (6.4%) died within 30 days. We calculated SCS for 1080 patients. We found an AUROC of 0.960 (95% confidence interval [CI], 0.932 to 0.988) for 24-hour mortality and 0.826 (95% CI, 0.774-0.879) for 30-day mortality, and goodness-of-fit test, χ(2) = 2.68 (10 degrees of freedom), P = 0.998 and χ(2) = 4.00, P = 0.947, respectively. We included 1470 patients when calculating the HOTEL score. Discriminatory power (AUROC) was 0.931 (95% CI, 0.901-0.962) for 24-hour mortality and goodness-of-fit test, χ(2) = 5.56 (10 degrees of freedom), P = 0.234. We found that both the SCS and HOTEL scores showed an excellent to outstanding ability to identify patients at high risk of dying, with good or acceptable precision.
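The discriminatory power reported in this abstract (AUROC) can be read as the probability that a randomly chosen patient who died had a higher score than a randomly chosen survivor. A minimal sketch of that pairwise definition follows; the function name and toy data are hypothetical, and the quadratic loop is only suitable for small samples.

```python
def auroc(scores, died):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Ties between a death and a survivor count as half a concordant pair.
    """
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context to the 0.83 to 0.96 range reported above.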
Lei, Pingguang; Lei, Guanghe; Tian, Jianjun; Zhou, Zengfen; Zhao, Miao; Wan, Chonghua
2014-10-01
This paper aims to develop the irritable bowel syndrome (IBS) scale of the system of Quality of Life Instruments for Chronic Diseases (QLICD-IBS) by the modular approach and to validate it using both classical test theory and generalizability theory. The QLICD-IBS was developed based on programmed decision procedures with multiple nominal group and focus group discussions, in-depth interviews, and quantitative statistical procedures. One hundred twelve inpatients with IBS provided the data, with QOL measured three times before and after treatments. The psychometric properties of the scale were evaluated with respect to validity, reliability, and responsiveness employing correlation analysis, factor analyses, multi-trait scaling analysis, t tests and also G studies and D studies of generalizability theory analysis. Multi-trait scaling analysis, correlation, and factor analyses confirmed good construct validity and criterion-related validity when using SF-36 as a criterion. Test-retest reliability coefficients (Pearson r and intra-class correlation (ICC)) for the overall score and all domains were higher than 0.80; the internal consistency α for all domains at two measurements was higher than 0.70 except for the social domain (0.55 and 0.67, respectively). The overall score and scores for all domains/facets had statistically significant changes after treatments with moderate or higher effect size standardized response mean (SRM) ranging from 0.72 to 1.02 at domain levels. G coefficients and index of dependability (Ф coefficients) confirmed the reliability of the scale further with more exact variance components. The QLICD-IBS has good validity, reliability, responsiveness, and some highlights and can be used as the quality of life instrument for patients with IBS.
Critical thinking skills in midwifery practice: Development of a self-assessment tool for students.
Carter, Amanda G; Creedy, Debra K; Sidebotham, Mary
2017-07-01
Develop and test a tool designed for use by pre-registration midwifery students to self-appraise their critical thinking in practice. A descriptive cohort design was used. All students (n=164) enrolled in a three-year Bachelor of Midwifery program in Queensland, Australia. The staged model for tool development involved item generation, mapping draft items to critical thinking concepts and expert review to test content validity, pilot testing of the tool to a convenience sample of students, and psychometric testing. Students (n=126, 76.8% response rate) provided demographic details, completed the new tool, and five questions from the Motivated Strategies for Learning Questionnaire (MSLQ) via an online platform or paper version. A high content validity index score of 0.97 was achieved through expert review. Construct validity via factor analysis revealed four factors: seeks information, reflects on practice, facilitates shared decision making, and evaluates practice. The mean total score for the tool was 124.98 (SD=12.58). Total and subscale scores correlated significantly. The scale achieved good internal reliability with a Cronbach's alpha coefficient of 0.92. Concurrent validity with the MSLQ subscale was 0.35 (p<0.001). This study established the reliability and validity of the CACTiM - student version for use by pre-registration midwifery students to self-assess critical thinking in practice. Critical thinking skills are vital for safe and effective midwifery practice. Students' assessment of their critical thinking development throughout their pre-registration programme makes these skills explicit, and could guide teaching innovation to address identified deficits. The availability of a reliable and valid tool assists research into the development of critical thinking in education and practice. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
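Several abstracts in this listing report internal consistency as Cronbach's alpha. As a minimal sketch of the standard formula, assuming complete data arranged with one row per respondent and one column per item (the function name and layout are hypothetical):

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).

    `rows` holds one list of item scores per respondent, with no missing values.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    # Sample variances (n - 1 denominator) for each item and for the total score.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha rises when items covary strongly relative to their individual variances, and also when more items are added, so values are best compared across scales of similar length.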
Badia, X; Mascaró, J M; Lozano, R
1999-10-01
The aim of this study was to assess the feasibility, validity, reliability and sensitivity to change of a Spanish version of the Dermatology Life Quality Index (DLQI) in patients with mild to moderate eczema and psoriasis who were treated with topical corticosteroids. The final study sample comprised 237 patients (48% eczema). Discriminant validity was tested by comparing patients' scores with those of a random sample of the general population (n = 100), and convergent validity by analysing correlations between DLQI scores, measures of clinical severity, and domain scores on the Nottingham Health Profile (NHP). Internal consistency and test-retest reliability were tested in clinically stable patients (n = 94), and responsiveness in a clinically unstable group (n = 143) initiating treatment with topical corticosteroids. Patient scores were significantly higher than general population scores (4.3 vs. 0.27, P < 0.001). Correlations with NHP domains ranged from 0.12 to 0.32, and there was significant correlation with clinical measures (r = 0.26, P < 0.001). Reliability was good (Cronbach's alpha = 0.83; intraclass correlation coefficient = 0.88), and the instrument proved responsive to change (effect size for the total group of de novo patients = 0.70), though the great majority of changes occurred in items 1 and 2. The NHP Emotional Reactions and Mobility domains were more responsive than some DLQI domains. In clinical trials of treatments for mild to moderate eczema and psoriasis, it is likely that only items 1 and 2 of the DLQI will be needed, and it is probably advisable to include generic instruments alongside the DLQI.
Prognostic score to predict mortality during TB treatment in TB/HIV co-infected patients.
Nguyen, Duc T; Jenkins, Helen E; Graviss, Edward A
2018-01-01
Estimating mortality risk during TB treatment in HIV co-infected patients is challenging for health professionals, especially in a low TB prevalence population, due to the lack of a standardized prognostic system. The current study aimed to develop and validate a simple mortality prognostic scoring system for TB/HIV co-infected patients. Using data from the CDC's Tuberculosis Genotyping Information Management System of TB patients in Texas reported from 01/2010 through 12/2016, age ≥15 years, HIV(+), and outcome being "completed" or "died", we developed and internally validated a mortality prognostic score using multiple logistic regression. Model discrimination was determined by the area under the receiver operating characteristic (ROC) curve (AUC). The model's good calibration was determined by a non-significant Hosmer-Lemeshow's goodness of fit test. Among the 450 patients included in the analysis, 57 (12.7%) died during TB treatment. The final prognostic score used six characteristics (age, residence in long-term care facility, meningeal TB, chest x-ray, culture positive, and culture not converted/unknown), which are routinely collected by TB programs. Prognostic scores were categorized into three groups that predicted mortality: low-risk (<20 points), medium-risk (20-25 points) and high-risk (>25 points). The model had good discrimination and calibration (AUC = 0.82; 0.80 in bootstrap validation), and a non-significant Hosmer-Lemeshow test p = 0.71. Our simple validated mortality prognostic scoring system can be a practical tool for health professionals in identifying TB/HIV co-infected patients with high mortality risk.
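The risk grouping and the discrimination measure described in the abstract above can be sketched in a few lines. The cut-points (<20, 20-25, >25) come from the abstract; the function names and the example scores are hypothetical, and the point values assigned to each of the six characteristics are not given in the abstract, so they are not reproduced here. The AUC is computed in its pairwise form, as the proportion of (died, survived) pairs in which the patient who died has the higher score.

```python
def risk_group(score):
    """Map a prognostic score to the three groups named in the abstract."""
    if score < 20:
        return "low"
    if score <= 25:
        return "medium"
    return "high"


def auc(scores_died, scores_survived):
    """Pairwise AUC: chance that a patient who died outscores one who survived.

    Ties count as half a win.
    """
    wins = 0.0
    for d in scores_died:
        for s in scores_survived:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(scores_died) * len(scores_survived))


# Hypothetical scores, for illustration only (not study data)
groups = [risk_group(s) for s in (12, 20, 25, 31)]
example_auc = auc([27, 30], [10, 18])
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a rank-based computation would be preferable for large samples.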
Validity of the MCAT in Predicting Performance in the First Two Years of Medical School.
ERIC Educational Resources Information Center
Jones, Robert F.; Thomae-Forgues, Maria
1984-01-01
The first systematic summary of predictive validity research on the new Medical College Admission Test (MCAT) is presented. The results show that MCAT scores have significant predictive validity with respect to first- and second-year medical school course grades. Further directions for MCAT validity research are described. (Author/MLW)
Internet cognitive testing of large samples needed in genetic research.
Haworth, Claire M A; Harlaar, Nicole; Kovas, Yulia; Davis, Oliver S P; Oliver, Bonamy R; Hayiou-Thomas, Marianna E; Frances, Jane; Busfield, Patricia; McMillan, Andrew; Dale, Philip S; Plomin, Robert
2007-08-01
Quantitative and molecular genetic research requires large samples to provide adequate statistical power, but it is expensive to test large samples in person, especially when the participants are widely distributed geographically. Increasing access to inexpensive and fast Internet connections makes it possible to test large samples efficiently and economically online. Reliability and validity of Internet testing for cognitive ability have not been previously reported; these issues are especially pertinent for testing children. We developed Internet versions of reading, language, mathematics and general cognitive ability tests and investigated their reliability and validity for 10- and 12-year-old children. We tested online more than 2500 pairs of 10-year-old twins and compared their scores to similar internet-based measures administered online to a subsample of the children when they were 12 years old (> 759 pairs). Within 3 months of the online testing at 12 years, we administered standard paper and pencil versions of the reading and mathematics tests in person to 30 children (15 pairs of twins). Scores on Internet-based measures at 10 and 12 years correlated .63 on average across the two years, suggesting substantial stability and high reliability. Correlations of about .80 between Internet measures and in-person testing suggest excellent validity. In addition, the comparison of the internet-based measures to ratings from teachers based on criteria from the UK National Curriculum suggests good concurrent validity for these tests. We conclude that Internet testing can be reliable and valid for collecting cognitive test data on large samples even for children as young as 10 years.
Husbands, Adrian; Mathieson, Alistair; Dowell, Jonathan; Cleland, Jennifer; MacKenzie, Rhoda
2014-04-23
The UK Clinical Aptitude Test (UKCAT) was designed to address issues identified with traditional methods of selection. This study aims to examine the predictive validity of the UKCAT and compare this to traditional selection methods in the senior years of medical school. This was a follow-up study of two cohorts of students from two medical schools who had previously taken part in a study examining the predictive validity of the UKCAT in first year. The sample consisted of 4th and 5th Year students who commenced their studies at the University of Aberdeen or University of Dundee medical schools in 2007. Data collected were: demographics (gender and age group), UKCAT scores; Universities and Colleges Admissions Service (UCAS) form scores; admission interview scores; Year 4 and 5 degree examination scores. Pearson's correlations were used to examine the relationships between admissions variables, examination scores, gender and age group, and to select variables for multiple linear regression analysis to predict examination scores. Ninety-nine and 89 students at Aberdeen medical school from Years 4 and 5 respectively, and 51 Year 4 students in Dundee, were included in the analysis. Neither UCAS form nor interview scores were statistically significant predictors of examination performance. Conversely, the UKCAT yielded statistically significant validity coefficients between .24 and .36 in four of five assessments investigated. Multiple regression analysis showed the UKCAT made a statistically significant unique contribution to variance in examination performance in the senior years. Results suggest the UKCAT appears to predict performance better in the later years of medical school compared to earlier years and provides modest supportive evidence for the UKCAT's role in student selection within these institutions. Further research is needed to assess the predictive validity of the UKCAT against professional and behavioural outcomes as the cohort commences working life.
PMID:24762134
A New Interactive Screening Test for Autism Spectrum Disorders in Toddlers.
Choueiri, Roula; Wagner, Sheldon
2015-08-01
To develop a clinically valid interactive level 2 screening assessment for autism spectrum disorders (ASD) in toddlers that is brief, easily administered, and scored by clinicians. We describe the development, training, standardization, and validation of the Rapid Interactive Screening Test for Autism in Toddlers (RITA-T) with ASD-specific diagnostic instruments. The RITA-T can be administered and scored in 10 minutes. We studied the validity of the RITA-T to distinguish toddlers with ASD from toddlers with developmental delay (DD)/non-ASD in an early childhood clinic. We also evaluated the test's performance in toddlers with no developmental concerns. We identified a cutoff score based on sensitivity, specificity, and positive predictive value of the RITA-T that best differentiates between ASD and DD/non-ASD. A total of 61 toddlers were enrolled. RITA-T scores were correlated with ASD-specific diagnostic tools (r = 0.79; P < .01) and ASD clinical diagnoses (r = 0.77; P < .01). Mean scores were significantly different in subjects with ASD, those with DD/non-ASD, and those with no developmental concerns (20.8 vs 13 vs 10.6, respectively; P < .0001). At a cutoff score of >14, the RITA-T had a sensitivity of 1.00, specificity of 0.84, and positive predictive value of 0.88 for identifying ASD risk in a high-risk group. The RITA-T is a promising new level 2 interactive screening tool for improving the early identification of ASD in toddlers in general pediatric and early intervention settings and allowing access to treatment. Copyright © 2015 Elsevier Inc. All rights reserved.
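The three screening statistics named in the abstract above follow directly from a two-by-two table of screen result against diagnosis. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not the study's data, since the abstract does not report the underlying table.

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and positive predictive value from 2x2 counts.

    tp: screen positive, condition present   fp: screen positive, condition absent
    fn: screen negative, condition present   tn: screen negative, condition absent
    """
    sensitivity = tp / (tp + fn)   # share of true cases the screen flags
    specificity = tn / (tn + fp)   # share of non-cases the screen clears
    ppv = tp / (tp + fp)           # share of positive screens that are true cases
    return sensitivity, specificity, ppv


# Hypothetical counts, for illustration only
sens, spec, ppv = screening_metrics(tp=8, fp=1, fn=2, tn=9)
```

Positive predictive value depends on how common the condition is in the screened group, which is why the abstract qualifies its figure as applying to a high-risk group.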
A COMPARISON OF THE EMPIRICAL VALIDITY OF SIX TESTS OF ABILITY WITH EDUCABLE MENTAL RETARDATES.
ERIC Educational Resources Information Center
MUELLER, MAX W.
AN INVESTIGATION OF THE VALIDITY OF INTELLIGENCE AND OTHER TESTS USED IN THE DIAGNOSIS OF RETARDED CHILDREN WAS PERFORMED. EXPERIMENTAL SAMPLES CONSISTED OF 101 CHILDREN SELECTED FROM SPECIAL CLASSES FOR EDUCABLE MENTALLY RETARDED (EMR) WHOSE AGES RANGED FROM 6.9 TO 10 YEARS AND WHOSE IQ SCORES RANGED FROM 50 TO 80. THE TESTS EVALUATED WERE (1)…
ERIC Educational Resources Information Center
DUENK, LESTER G.
THE PRIMARY OBJECTIVE OF THIS STUDY WAS TO ESTABLISH THE CONCURRENT VALIDITY OF THE MINNESOTA TESTS OF CREATIVE THINKING, ABBREVIATED FORM VII, (MTCT VII) BY DETERMINING THE RELATIONSHIP BETWEEN ITS SCORES AND CREATIVE ABILITY AS MEASURED BY ACCUMULATED TEACHER RATINGS OF INDUSTRIAL ARTS PROJECTS AND INVESTIGATOR-DEVELOPED TESTS OF CREATIVITY. THE…
Validating the Watson Glaser Critical Thinking Appraisal
ERIC Educational Resources Information Center
Hassan, Karma El; Madhum, Ghida
2007-01-01
This study validated the Watson Glaser Critical Thinking Appraisal (WGCTA) on a sample of 273 private university students in Lebanon. For that purpose, evidence for construct validation was investigated through identifying the test's factor structure and subscale total correlations, in addition to differences in scores by gender, different levels,…
Evaluation of Criterion Validity for Scales with Congeneric Measures
ERIC Educational Resources Information Center
Raykov, Tenko
2007-01-01
A method for estimating criterion validity of scales with homogeneous components is outlined. It accomplishes point and interval estimation of interrelationship indices between composite scores and criterion variables and is useful for testing hypotheses about criterion validity of measurement instruments. The method can also be used with missing…
[Validation of a Spanish version of the Childhood Asthma Control Test (Sc-ACT) for use in Spain].
Pérez-Yarza, E G; Castro-Rodriguez, J A; Villa Asensi, J R; Garde Garde, J; Hidalgo Bermejo, F J
2015-08-01
The Childhood Asthma Control Test (c-ACT) is a validated tool for determining pediatric asthma control. However, it is not validated in the Spanish language in Spain. We evaluated the psychometric properties of the Spanish version of the Childhood Asthma Control Test (Sc-ACT) for assessing asthma control in children ages 4 to 11. This national, multicentre, prospective study was conducted in Spain with asthmatic children and their caregivers. Patients were assessed at 3 visits (Baseline, 2 Weeks, and 4 Months). Clinical variables included: symptoms, exacerbations, FEV1, asthma classification, PAQLQ and PACQLQ questionnaire scores, and asthma control as perceived by physicians, patients and caregivers. The Sc-ACT feasibility, validity, reliability, and sensitivity to change were assessed. A total of 394 children were included; mean (SD) time to complete the Sc-ACT was 5.3 (4.4) minutes. Sc-ACT score was correlated with asthma control as perceived by physician (-0.52), patient (-0.53), and caregiver (-0.51) and with the PAQLQ (0.56) and PACQLQ (0.55) scores. Sc-ACT was found to be significantly related to intensity and frequency of asthma symptoms. The Cronbach's alpha coefficient was 0.81 and the intraclass correlation coefficient was ≥0.85 for all of the items. The global effect size of Sc-ACT was 0.55. A cutoff score of 21 or higher indicated good asthma control, and the MCID was 4 points. The Spanish version of the c-ACT was found to be a reliable and valid questionnaire for evaluating asthma control in Spanish-speaking children ages 4 to 11 in Spain. Copyright © 2014 Asociación Española de Pediatría. Published by Elsevier España, S.L.U. All rights reserved.
Olfactory identification and Stroop interference converge in schizophrenia.
Purdon, S E
1998-01-01
OBJECTIVE: To test the discriminant validity of a model predicting a dissociation between measures of right and left frontal lobe function in people with schizophrenia. PARTICIPANTS: Twenty-one clinically stable outpatients with schizophrenia. INTERVENTIONS: Patients were administered the University of Pennsylvania Smell Identification Test (UPSIT), the Stroop Color-Word Test (Stroop), and the Positive and Negative Syndrome Scale (PANSS). OUTCOME MEASURES: Scores on these tests and relation among scores. RESULTS: There was a convergence of UPSIT and Stroop interference scores consistent with a common cerebral basis for limitations in olfactory identification and inhibition of distraction. There was also a divergence of UPSIT and Stroop reading scores suggesting that the olfactory identification limitation is distinct from a general limitation of attention or a dysfunction of the left dorsolateral prefrontal cortex. Most notable was the 81% classification convergence between the UPSIT and Stroop incongruous colour naming scores compared with the near-random 57% classification convergence of the UPSIT and Stroop reading scores. CONCLUSIONS: These data are consistent with a right orbitofrontal dysfunction in a subgroup of patients with schizophrenia, although the involvement of mesial temporal structures in both tasks must be ruled out with further study. A multifactorial model depicting contributions from diverse cerebral structures is required to describe the pathophysiology of schizophrenia. Valid behavioural methods for classifying suspected subgroups of patients with particular cerebral dysfunction would be of value in the construction of this model. PMID:9595890
Creation and Validation of the Self-esteem/Self-image Female Sexuality (SESIFS) Questionnaire
Lordello, Maria CO; Ambrogini, Carolina C; Fanganiello, Ana L; Embiruçu, Teresa R; Zaneti, Marina M; Veloso, Laise; Piccirillo, Livia B; Crude, Bianca L; Haidar, Mauro; Silva, Ivaldo
2014-01-01
INTRODUCTION Self-esteem and self-image are psychological aspects that affect sexual function. AIMS To validate a new measurement tool that correlates the concepts of self-esteem, self-image, and sexuality. METHODS A 20-question test (the self-esteem/self-image female sexuality [SESIFS] questionnaire) was created and tested on 208 women. Participants answered: Rosenberg’s self-esteem scale, the female sexual quotient (FSQ), and the SESIFS questionnaire. Pearson’s correlation coefficient was used to test concurrent validity of the SESIFS against Rosenberg’s self-esteem scale and the FSQ. Reliability was tested using the Cronbach’s alpha coefficient. RESULT The new questionnaire had a good overall reliability (Cronbach’s alpha r = 0.862, p < 0.001), but the sexual domain scored lower than expected (r = 0.65). The validity was good: overall score r = 0.38, p < 0.001, self-esteem domain r = 0.32, p < 0.001, self-image domain r = 0.31, p < 0.001, sexual domain r = 0.29, p < 0.001. CONCLUSIONS The SESIFS questionnaire has limitations in measuring the correlation among self-esteem, self-image, and sexuality domains. A new, revised version is being tested and will be presented in an upcoming publication. PMID:25574149
Gasquoine, Philip G; Weimer, Amy A; Amador, Arnoldo
2017-04-01
To measure specificity as failure rates for non-clinical, bilingual, Mexican Americans on three popular performance validity measures: (a) the language format Reliable Digit Span; (b) visual-perceptual format Test of Memory Malingering; and (c) visual-perceptual format Dot Counting, using optimal/suboptimal effort cut scores developed for monolingual, English-speakers. Participants were 61 consecutive referrals, aged between 18 and 65 years, with <16 years of education who were subjectively bilingual (confirmed via formal assessment) and chose the language of assessment, Spanish or English, for the performance validity tests. Failure rates were 38% for Reliable Digit Span, 3% for the Test of Memory Malingering, and 7% for Dot Counting. For Reliable Digit Span, the failure rates for Spanish (46%) and English (31%) languages of administration did not differ significantly. Optimal/suboptimal effort cut scores derived for monolingual English-speakers can be used with Spanish/English bilinguals when using the visual-perceptual format Test of Memory Malingering and Dot Counting. The high failure rate for Reliable Digit Span suggests it should not be used as a performance validity measure with Spanish/English bilinguals, irrespective of the language of test administration, Spanish or English.
Visual judgements of steadiness in one-legged stance: reliability and validity.
Haupstein, T; Goldie, P
2000-01-01
There is a paucity of information about the validity and reliability of clinicians' visual judgements of steadiness in one-legged stance. Such judgements are used frequently in clinical practice to support decisions about treatment in the fields of neurology, sports medicine, paediatrics and orthopaedics. The aim of the present study was to address the validity and reliability of visual judgements of steadiness in one-legged stance in a group of physiotherapists. A videotape of 20 five-second performances was shown to 14 physiotherapists with median clinical experience of 6.75 years. Validity of visual judgement was established by correlating scores obtained from an 11-point rating scale with criterion scores obtained from a force platform. In addition, partial correlations were used to control for the potential influence of body weight on the relationship between the visual judgements and criterion scores. Inter-observer reliability was quantified between the physiotherapists; intra-observer reliability was quantified between two tests four weeks apart. Mean criterion-related validity was high, regardless of whether body weight was controlled for statistically (Pearson's r = 0.84, 0.83, respectively). The standard error of estimating the criterion score was 3.3 newtons. Inter-observer reliability was high (ICC (2,1) = 0.81 at Test 1 and 0.82 at Test 2). Intra-observer reliability was high (on average ICC (2,1) = 0.88; Pearson's r = 0.90). The standard error of measurement for the 11-point scale was one unit. The finding of higher accuracy of making visual judgements than previously reported may be due to several aspects of design: use of a criterion score derived from the variability of the force signal which is more discriminating than variability of centre of pressure; use of a discriminating visual rating scale; specificity and clear definition of the phenomenon to be rated.
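The abstract above reports a criterion correlation both with and without statistical control for body weight. A minimal sketch of that kind of analysis is given below, assuming the standard first-order partial correlation formula; the abstract does not state which software or exact procedure was used, and the function names and sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y with the linear influence of z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: ratings that rise exactly in step with a criterion score
r = pearson([1, 2, 3], [2, 4, 6])
# If the control variable is uncorrelated with both, the partial equals the raw r
r_partial = partial_corr(0.5, 0.0, 0.0)
```

When the controlled variable is only weakly related to the ratings and the criterion, the partial and raw correlations stay close, which matches the near-identical values (0.84 and 0.83) reported in the abstract.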
Translation and validation of a Spanish version of the xerostomia inventory.
Serrano, Carlos; Fariña, María P; Pérez, Cristhian; Fernández, Marcos; Forman, Katherine; Carrasco, Mauricio
2016-12-01
The aim of this study was to validate a Spanish cross-cultural adaptation of the xerostomia inventory (XI). The original English version of XI was translated into Spanish, cross-culturally adapted and field tested. The Spanish version of XI (XI-Sp) was tested with a sample of 41 patients with xerostomia. The reliability of the XI-Sp was determined through internal consistency and test-retest methods. The construct validity of XI-Sp was determined by means of correlation between XI-Sp scores and salivary flow measurements. Overall XI-Sp scores were 40.8 (SD = 10) for the first application and 40.2 (SD = 9.5) for the second. Cronbach's alpha value for the XI-Sp was 0.89 and 0.87, respectively, while interitem correlation averages were r = 0.44 and r = 0.39 for each application. Interitem correlation and corrected total was r_c ≥ 0.30. The test-retest intraclass correlation coefficient value for the XI-Sp score was 0.59 and 0.91. Convergent validity for construct validity correlation with salivary flow showed a medium effect size (r² = 0.10) for the first application but did not make a statistically significant prediction for the second (r² = 0.7). This study provides evidence concerning the reliability of the XI-Sp, showing that it may be a useful tool for Spanish-speaking xerostomia patients for both clinical and epidemiologic research. © 2015 John Wiley & Sons A/S and The Gerodontology Association. Published by John Wiley & Sons Ltd.
DOT National Transportation Integrated Search
1975-01-01
Scores on the American Optical Company (AOC) test (1965 edition), Dvorine test, Farnsworth Lantern test, Color Threshold Tester, Farnsworth-Munsell 100-Hue test, Farnsworth Panel D-15 test, and Schmidt-Haensch Anomaloscope were obtained from 137 men ...
Soltanparast, Sanaz; Jafari, Zahra; Sameni, Seyed Jalal; Salehi, Masoud
2014-01-01
The purpose of the present study was to evaluate the psychometric properties (validity and reliability) of the Persian version of the Sustained Auditory Attention Capacity Test in children with attention deficit hyperactivity disorder. The Persian version of the Sustained Auditory Attention Capacity Test was constructed to assess sustained auditory attention using the method provided by Feniman and colleagues (2007). In this test, comments were provided to assess the child's attentional deficit by determining inattention and impulsiveness error, the total scores of the sustained auditory attention capacity test and attention span reduction index. To determine validity and reliability, both the Rey Auditory Verbal Learning test and the Persian version of the Sustained Auditory Attention Capacity Test (SAACT) were administered to 46 normal children and 41 children with Attention Deficit Hyperactivity Disorder (ADHD), all right-handed, aged between 7 and 11, and of both genders. In determining convergent validity, a negative significant correlation was found between the three parts of the Rey Auditory Verbal Learning test (first, fifth, and immediate recall) and all indicators of the SAACT except attention span reduction. By comparing the test scores between the normal and ADHD groups, discriminant validity analysis showed significant differences in all indicators of the test except for attention span reduction (p < 0.001). The Persian version of the Sustained Auditory Attention Capacity Test has good validity and reliability, comparable to other reliable tests, and it can be used to identify children with attention deficits and those suspected of having Attention Deficit Hyperactivity Disorder.
The development of a test of biodiversity knowledge of high school students
NASA Astrophysics Data System (ADS)
Ajayi, Olabisi Modupe
2002-09-01
The primary purpose of this study was to develop a valid and reliable test of the knowledge of biodiversity of high school students. The test differentiated students' knowledge on three levels of biodiversity: species, ecosystem and genetics. A secondary purpose was to examine how biodiversity scores were affected by gender, grade point average, and families' socioeconomic status. The initial phase of the instrument development involved the construction of 60 dichotomous items (true/false). To establish content validity, a panel of biodiversity experts reviewed the items for appropriateness and clarity. The items were checked for readability using Flesch-Kincaid Readability Index and the readability was at the fifth grade level. The instrument was subjected to factor analysis. As a result, the final instrument was compiled and named the Ajayi Biodiversity Instrument (ABI). The reliability of ABI was .87. The mean score on the 25-item test was 79%. No significant difference at >0.05 was found in the score of students on each of the three subtests for genetics, species, and ecosystem. No significant difference was found in the score of students relative to their family's socioeconomic status. There was a significant correlation between grade point average and participation in extracurricular activities that related to biodiversity concepts and scores on ABI. Gender differences emerged at the ecosystem level, females scoring higher than males. Differences among ethnic groups also emerged. Anglo-Americans scored significantly higher on the test of knowledge of biodiversity for high school students than the rest of the ethnic groups combined.
Validity of Peer Evaluation for Team-Based Learning in a Dental School in Japan.
Nishigawa, Keisuke; Hayama, Rika; Omoto, Katsuhiro; Okura, Kazuo; Tajima, Toyoko; Suzuki, Yoshitaka; Hosoki, Maki; Ueda, Mayu; Inoue, Miho; Rodis, Omar Marianito Maningo; Matsuka, Yoshizo
2017-12-01
The aim of this study was to determine the validity of peer evaluation for team-based learning (TBL) classes in dental education in comparison with the term-end examination records and TBL class scores. Examination and TBL class records of 256 third- and fourth-year dental students in six fixed prosthodontics courses from 2013 to 2015 in one dental school in Japan were investigated. Results of the term-end examination during those courses, individual readiness assurance test (IRAT), group readiness assurance test (GRAT), group assignment projects (GAP), and peer evaluation of group members in TBL classes were collected. Significant positive correlations were found between all combinations of peer evaluation, IRAT, and term-end examination. Individual scores also showed a positive correlation with group score (total of GRAT and GAP). From the investigation of the correlations in the six courses, significant positive correlations between peer evaluation and individual score were found in four of the six courses. In this study, peer evaluation seemed to be a valid index for learning performance in TBL classes. To verify the effectiveness of peer evaluation, all students have to realize the significance of scoring the team member's performance. Clear criteria and detailed instruction for appropriate evaluation are also required.
Aggio, Daniel; Fairclough, Stuart; Knowles, Zoe; Graves, Lee
2016-01-01
Adaptation of physical activity self-report questionnaires is sometimes required to reflect the activity behaviours of diverse populations. The processes used to modify self-report questionnaires though are typically underreported. This two-phased study used a formative approach to investigate the validity and reliability of the Physical Activity Questionnaire for Adolescents (PAQ-A) in English youth. Phase one examined test content and response process validity and subsequently informed a modified version of the PAQ-A. Phase two assessed the validity and reliability of the modified PAQ-A. In phase one, focus groups (n = 5) were conducted with adolescents (n = 20) to investigate test content and response processes of the original PAQ-A. Based on evidence gathered in phase one, a modified version of the questionnaire was administered to participants (n = 169, 14.5 ± 1.7 years) in phase two. Internal consistency and test-retest reliability were assessed using Cronbach's alpha and intra-class correlations, respectively. Spearman correlations were used to assess associations between modified PAQ-A scores and accelerometer-derived physical activity, self-reported fitness and physical activity self-efficacy. Phase one revealed that the original PAQ-A was unrepresentative for English youth and that item comprehension varied. Contextual and population/cultural-specific modifications were made to the PAQ-A for use in the subsequent phase. In phase two, modified PAQ-A scores had acceptable internal consistency (α = 0.72) and test-retest reliability (ICC = 0.78). Modified PAQ-A scores were significantly associated with objectively assessed moderate-to-vigorous physical activity (r = 0.39), total physical activity (r = 0.42), self-reported fitness (r = 0.35), and physical activity self-efficacy (r = 0.32) (p ≤ 0.01). The modified PAQ-A had acceptable internal consistency and test-retest reliability. 
Modified PAQ-A scores displayed weak-to-moderate correlations with objectively measured physical activity, self-reported fitness, and self-efficacy providing evidence of satisfactory criterion and construct validity, respectively. Further testing with more diverse English samples is recommended to provide a more complete assessment of the tool.
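The internal-consistency figure reported above (Cronbach's alpha) can be illustrated with a minimal sketch. The function name and the response data below are hypothetical and are not taken from the study; the sketch only assumes the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total score).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; uses sample variances (n - 1).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the summed (total) score across respondents.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up responses: 3 respondents x 2 items.
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)
```

A value near 1 indicates that the items vary together; whether a given value is "acceptable" depends on the thresholds a study adopts.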
Reliability and concurrent validity of the Dutch hip and knee replacement expectations surveys
2010-01-01
Background: Preoperative expectations of outcome of total hip and knee arthroplasty are important determinants of patients' satisfaction and functional outcome. Aims of the study were (1) to translate the Hospital for Special Surgery Hip Replacement Expectations Survey and Knee Replacement Expectations Survey into Dutch and (2) to study test-retest reliability and concurrent validity. Methods: Patients scheduled for total hip (N = 112) or knee replacement (N = 101) were sent the Dutch Expectations Surveys twice with a 2-week interval to determine test-retest reliability. To determine concurrent validity, the Expectation WOMAC was sent. Results: The results for the Dutch Hip Replacement Expectations Survey revealed good test-retest reliability (ICC 0.87), no bias and good internal consistency (alpha 0.86) (N = 72). The correlation between the Hip Expectations Score and the Expectation WOMAC score was 0.59 (N = 86). The results for the Dutch Knee Replacement Expectations Survey revealed good test-retest reliability (ICC 0.79), no bias and good internal consistency (alpha 0.91) (N = 46). The correlation with the Expectation WOMAC score was 0.52 (N = 57). Conclusions: Both Dutch Expectations Surveys are reliable instruments to determine patients' expectations before total hip or knee arthroplasty. As for concurrent validity, the correlation between both surveys and the Expectation WOMAC was moderate, confirming that the same construct was determined. However, patients scored systematically lower on the Expectation WOMAC compared to the Dutch Expectation Surveys. Research on patients' expectations before total hip and knee replacement has only been performed in a limited number of countries. With the Dutch Expectations Surveys it is now possible to determine patients' expectations in another culture and healthcare setting. PMID:20958990
Reliability and concurrent validity of the Dutch hip and knee replacement expectations surveys.
van den Akker-Scheek, Inge; van Raay, Jos J A M; Reininga, Inge H F; Bulstra, Sjoerd K; Zijlstra, Wiebren; Stevens, Martin
2010-10-19
Preoperative expectations of outcome of total hip and knee arthroplasty are important determinants of patients' satisfaction and functional outcome. Aims of the study were (1) to translate the Hospital for Special Surgery Hip Replacement Expectations Survey and Knee Replacement Expectations Survey into Dutch and (2) to study test-retest reliability and concurrent validity. Patients scheduled for total hip (N = 112) or knee replacement (N = 101) were sent the Dutch Expectations Surveys twice with a 2-week interval to determine test-retest reliability. To determine concurrent validity, the Expectation WOMAC was sent. The results for the Dutch Hip Replacement Expectations Survey revealed good test-retest reliability (ICC 0.87), no bias and good internal consistency (alpha 0.86) (N = 72). The correlation between the Hip Expectations Score and the Expectation WOMAC score was 0.59 (N = 86). The results for the Dutch Knee Replacement Expectations Survey revealed good test-retest reliability (ICC 0.79), no bias and good internal consistency (alpha 0.91) (N = 46). The correlation with the Expectation WOMAC score was 0.52 (N = 57). Both Dutch Expectations Surveys are reliable instruments to determine patients' expectations before total hip or knee arthroplasty. As for concurrent validity, the correlation between both surveys and the Expectation WOMAC was moderate, confirming that the same construct was determined. However, patients scored systematically lower on the Expectation WOMAC compared to the Dutch Expectation Surveys. Research on patients' expectations before total hip and knee replacement has only been performed in a limited number of countries. With the Dutch Expectations Surveys it is now possible to determine patients' expectations in another culture and healthcare setting.
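Test-retest reliability in the record above is summarised with an intraclass correlation coefficient (ICC). The abstract does not say which ICC form was used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name and data are hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1) for n subjects (rows) measured on k occasions (columns).

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Sums of squares for subjects, occasions, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up scores: every subject scores exactly one point higher at retest.
shifted = [[1, 2], [2, 3], [3, 4]]
icc_shifted = icc_2_1(shifted)
```

In this form a systematic shift between administrations lowers the coefficient even when subjects keep their rank order, which is one reason studies may report bias separately from the ICC.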
Validation of the Headache Impact Test (HIT-6) in patients with chronic migraine.
Rendas-Baum, Regina; Yang, Min; Varon, Sepideh F; Bloudek, Lisa M; DeGryse, Ronald E; Kosinski, Mark
2014-08-01
The Headache Impact Test (HIT)-6 was developed and has been validated in patients with various types of headache. The objective of this study was to report the psychometric properties of the HIT-6 among patients with chronic migraine. Data came from two international, multicenter, randomized, double-blind, placebo-controlled clinical trials of chronic migraine patients (N = 1,384) undergoing prophylaxis therapy. Confirmatory factor analysis and differential item functioning (DIF) analysis were used to test the latent structure and cross-cultural comparability of the HIT-6. Reliability, construct validity, and responsiveness were assessed. Two sets of criterion groups were used: (1) 28-day headache frequency: <10, 10-14, and ≥15 days; (2) sample quartiles of the total cumulative hours of headache: <140, 140 to <280, 280 to <420, and ≥420 hours. Two sets of responsiveness categories were defined as reduction of <30%, 30% to <50%, or ≥50% in (1) number of headache days and (2) cumulative hours of headache. Measurement invariance tests supported the stability of the HIT-6 latent structure across studies. DIF analysis supported cross-cultural comparability. Good reliability was observed across studies (Cronbach's α: 0.75-0.92; intraclass correlation coefficient: 0.76-0.80). HIT-6 scores correlated strongly (-0.86 to -0.59) with scores of the Migraine-Specific Quality-of-Life Questionnaire. Analysis of variance indicated that HIT-6 scores discriminated across both types of criterion groups (P<0.001), across studies and time points. HIT-6 change scores were significantly higher in magnitude in groups experiencing greater improvement (P<0.001). All measurement properties were consistently verified across the two studies, supporting the validity of the HIT-6 among chronic migraine patients. NCT00156910 and NCT00168428 on www.ClinicalTrials.gov.
Wong, Carlos K H; Lang, Brian H H; Yu, Hill M S; Lam, Cindy L K
2017-08-01
The aim of this study was to examine the acceptability, validity, and reliability of the EuroQoL Five-Dimension Five-Level (EQ-5D-5L) and Short-Form Six-Dimension (SF-6D) health utility measures in patients with symptomatic benign thyroid nodules. Data from a randomized controlled trial (ClinicalTrials.gov identifier: NCT02398721) of 294 patients with symptomatic benign thyroid nodules were utilized for this psychometric evaluation of health-related quality of life (HR-QOL) measurement. Three HR-QOL questionnaires-the generic 12-item Short Form Health Survey (SF-12v2), EQ-5D-5L, and SF-6D-were interviewer-administered at baseline and 2 weeks afterwards. Responses to SF-6D were transformed to SF-6D utility scores using a Hong Kong population scoring algorithm derived by standard gamble, whereas responses to EQ-5D-5L were mapped onto EQ-5D-3L response via interim mapping algorithms and then converted to EQ-5D-5L utility scores using a Chinese-specific value set. Construct validity was determined by evaluating Spearman correlation between SF-12v2 scores and utility scores. Two-week test-retest reliability was assessed using intra-class correlation coefficient. No significant (>15%) floor and ceiling effects were observed for SF-6D utility scores. The SF-6D utility scores had a moderate Spearman rank correlation with the SF-12v2 domain score providing evidence for adequate construct validity. The SF-6D utility scores showed good test-retest reliability (0.794; range 0.696-0.860). Better reliability was observed in SF-6D utility scores than in EQ-5D-5L utility scores. While the EQ-5D-5L instrument was less reproducible, the SF-6D instrument appeared to be an applicable, valid, and reliable measure in assessing the HR-QOL of Chinese patients with symptomatic benign thyroid nodules. The impact of utility score selection on the effectiveness and cost effectiveness of clinical interventions targeted to these patients needs further exploration. 
NCT02398721, ClinicalTrials.gov.
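The floor and ceiling check mentioned above (a proportion above 15% of respondents at the lowest or highest possible score) can be sketched as follows. The function name, the score bounds, and the utility values are hypothetical; only the 15% threshold comes from the abstract.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Share of respondents at the scale minimum and maximum, with flags."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        # An effect is flagged when the share exceeds the chosen threshold.
        "floor_flag": floor > threshold,
        "ceiling_flag": ceiling > threshold,
    }


# Made-up utility scores on a hypothetical scale bounded by 0.3 and 1.0.
utilities = [1.0, 1.0, 0.8, 0.7, 0.9, 0.6, 0.85, 0.95, 0.75, 0.65]
result = floor_ceiling_effects(utilities, min_score=0.3, max_score=1.0)
```

Here two of ten respondents sit at the maximum, so the ceiling share is 20% and would be flagged under a 15% rule.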
Rebar, Amanda L.; Ram, Nilam; Conroy, David E.
2014-01-01
Objective: The Single-Category Implicit Association Test (SC-IAT) has been used as a method for assessing automatic evaluations of physical activity, but measurement artifact or consciously-held attitudes could be confounding the outcome scores of these measures. The objective of these two studies was to address these measurement concerns by testing the validity of a novel SC-IAT scoring technique. Design: Study 1 was a cross-sectional study, and study 2 was a prospective study. Method: In study 1, undergraduate students (N = 104) completed SC-IATs for physical activity, flowers, and sedentary behavior. In study 2, undergraduate students (N = 91) completed a SC-IAT for physical activity, self-reported affective and instrumental attitudes toward physical activity, physical activity intentions, and wore an accelerometer for two weeks. The EZ-diffusion model was used to decompose the SC-IAT into three process component scores including the information processing efficiency score. Results: In study 1, a series of structural equation model comparisons revealed that the information processing score did not share variability across distinct SC-IATs, suggesting it does not represent systematic measurement artifact. In study 2, the information processing efficiency score was shown to be unrelated to self-reported affective and instrumental attitudes toward physical activity, and positively related to physical activity behavior, above and beyond the traditional D-score of the SC-IAT. Conclusions: The information processing efficiency score is a valid measure of automatic evaluations of physical activity. PMID:25484621
Measuring Decision-Making During Thyroidectomy: Validity Evidence for a Web-Based Assessment Tool.
Madani, Amin; Gornitsky, Jordan; Watanabe, Yusuke; Benay, Cassandre; Altieri, Maria S; Pucher, Philip H; Tabah, Roger; Mitmaker, Elliot J
2018-02-01
Errors in judgment during thyroidectomy can lead to recurrent laryngeal nerve injury and other complications. Despite the strong link between patient outcomes and intraoperative decision-making, methods to evaluate these complex skills are lacking. The purpose of this study was to develop objective metrics to evaluate advanced cognitive skills during thyroidectomy and to obtain validity evidence for them. An interactive online learning platform was developed ( www.thinklikeasurgeon.com ). Trainees and surgeons from four institutions completed a 33-item assessment, developed based on a cognitive task analysis and expert Delphi consensus. Sixteen items required subjects to make annotations on still frames of thyroidectomy videos, and accuracy scores were calculated based on an algorithm derived from experts' responses ("visual concordance test," VCT). Seven items were short answer (SA), requiring users to type their answers, and scores were automatically calculated based on their similarity to a pre-populated repertoire of correct responses. Test-retest reliability, internal consistency, and correlation of scores with self-reported experience and training level (novice, intermediate, expert) were calculated. Twenty-eight subjects (10 endocrine surgeons and otolaryngologists, 18 trainees) participated. There was high test-retest reliability (intraclass correlation coefficient = 0.96; n = 10) and internal consistency (Cronbach's α = 0.93). The assessment demonstrated significant differences between novices, intermediates, and experts in total score (p < 0.01), VCT score (p < 0.01) and SA score (p < 0.01). There was high correlation between total case number and total score (ρ = 0.95, p < 0.01), between total case number and VCT score (ρ = 0.93, p < 0.01), and between total case number and SA score (ρ = 0.83, p < 0.01). 
This study describes the development of novel metrics and provides validity evidence for an interactive Web-based platform to objectively assess decision-making during thyroidectomy.
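The record above reports Spearman rank correlations (ρ) between case numbers and assessment scores. A minimal sketch of that statistic follows, computed as the Pearson correlation of average ranks so that ties are handled. All function names and data are hypothetical and do not reproduce the study's values.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up case counts and total scores with a strictly increasing relation.
cases = [5, 20, 50, 120, 300]
scores = [40, 55, 60, 80, 95]
rho = spearman_rho(cases, scores)
```

Because only ranks enter the calculation, any strictly increasing relation yields ρ = 1, regardless of how skewed the case counts are.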
Ortuño-Sierra, Javier; Aritio-Solana, Rebeca; Inchausti, Félix; Chocarro de Luis, Edurne; Lucas Molina, Beatriz; Pérez de Albéniz, Alicia; Fonseca-Pedrero, Eduardo
2017-01-01
The main purpose of the present study was to assess depressive symptomatology and to gather new validity evidence for the Reynolds Depression Scale-Short form (RADS-SF) in a representative sample of youths. The sample consisted of 2914 adolescents with a mean age of 15.85 years (SD = 1.68). We calculated the descriptive statistics and internal consistency of the RADS-SF scores. Confirmatory factor analyses (CFAs) at the item level and successive multigroup CFAs to test measurement invariance were also conducted. Latent mean differences across gender and educational level groups were estimated, and finally, we studied sources of validity evidence with other external variables. The internal consistency of the RADS-SF Total score, estimated by ordinal alpha, was .89. Results from the CFAs showed that the one-dimensional model displayed appropriate goodness-of-fit indices, with a CFI value over .95 and an RMSEA value under .08. In addition, the results support strong measurement invariance of the RADS-SF scores across gender and age. When latent means were compared, statistically significant differences were found by gender and age. Females scored 0.347 higher than males on the Depression latent variable, whereas older adolescents scored 0.111 higher than the younger group. In addition, the RADS-SF score was associated with the RADS scores. The results suggest that the RADS-SF could be used as an efficient screening test to assess self-reported depressive symptoms in adolescents from the general population.
Lee, Ji Hyun; Cho, Kyoung Im; Spertus, John; Kim, Seong Man
2012-08-01
The Peripheral Artery Questionnaire (PAQ), as developed in US English, is a validated scale to evaluate the health status of patients with peripheral artery disease (PAD). The aim of this study was to translate the PAQ into Korean and to evaluate its reliability and validity. A multi-step process of forward-translation, reconciliation, consultation with the developer, back-translation and proofreading was conducted. The test-retest reliability was evaluated at a 2-week interval using the intra-class correlation coefficient (ICC). The validity was assessed by identifying associations between Korean PAQ (KPAQ) scores and Korean Health Assessment Questionnaire (KHAQ) scores. A total of 100 PAD patients were enrolled: 63 without and 37 with severe claudication. The reliability of the KPAQ was adequate, with an ICC of 0.71. There were strong correlations between KPAQ's subscales. Cronbach's alpha for the summary score was 0.94, indicating good internal consistency and congruence with the original US version. The validity was supported by a significant correlation between the total KHAQ score and KPAQ physical function, stability, symptom, social limitation and quality of life scores (r = -0.24 to -0.90; p < 0.001) as well as between the KHAQ walking subscale and the KPAQ physical function score (r = -0.55, p < 0.001). Our results indicate that the KPAQ is a reliable, valid instrument to evaluate the health status of Korean patients with PAD.
Validity and Reliability of Nintendo Wii Fit Balance Scores
Wikstrom, Erik A.
2012-01-01
Context: Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. Objective: To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Design: Descriptive laboratory study. Setting: Sports medicine research laboratory. Patients or Other Participants: Forty-five recreationally active participants (age = 27.0 ± 9.8 years, height = 170.9 ± 9.2 cm, mass = 72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Intervention(s): Participants completed a single-limb–stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Main Outcome Measure(s): Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. Results: All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r < 0.50). Intrasession reliability for Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC] = 0.80) to poor (ICC = 0.39), with 8 activities having poor intrasession reliability. 
Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with scores ranging from fair (ICC = 0.74) to poor (ICC = 0.29). Conclusions: Wii Fit balance activity scores had poor concurrent validity relative to COP outcomes and SEBT reach distances. In addition, the included Wii Fit balance activity scores generally had poor intrasession and intersession reliability. PMID:22892412
Reliability and Concurrent Validity of Dynamic Rotator Stability Test-A Cross Sectional study.
Binoy Mathew, K V; Eapen, Charu; Kumar, P Senthil
2012-01-01
The aims of this study were to determine the intra-rater and inter-rater reliability of the Dynamic Rotator Stability Test (DRST) and its concurrent validity against the University of Pennsylvania Shoulder Score (PENN) scale. Forty subjects of either gender, aged 18-70 years, with painful shoulder conditions of musculoskeletal origin were selected through convenience sampling. Tester 1 and tester 2 administered the DRST and the PENN scale randomly. In a subgroup of 20 subjects, the DRST was administered by both testers to determine inter-rater reliability. A 180° standard universal goniometer was used to take measurements. For intra-rater reliability, all test variables showed highly significant correlations (p=.94-1). For inter-rater reliability with tester 2, test variables such as position, ROM, force, direction of abnormal translation, pain during the test, and compensatory movement during the test showed significant correlations (p=.71-1). Only some DRST variables showed significant correlations with the PENN scale (P=.320-.450). The DRST has good intra-rater and moderate inter-rater reliability. Its concurrent validity was found to be poor when compared with the PENN Shoulder Score.
Validating a tool to measure auxiliary nurse midwife and nurse motivation in rural Nepal.
Morrison, Joanna; Batura, Neha; Thapa, Rita; Basnyat, Regina; Skordis-Worrall, Jolene
2015-05-12
A global shortage of health workers in rural areas increases the salience of motivating and supporting existing health workers. Understandings of motivation may vary in different settings, and it is important to use measurement methods that are contextually appropriate. We identified a measurement tool, previously used in Kenya, and explored its validity and reliability to measure the motivation of auxiliary nurse midwives (ANM) and staff nurses (SN) in rural Nepal. Qualitative and quantitative methods were used to assess the content validity, the construct validity, the internal consistency and the reliability of the tool. We translated the tool into Nepali and it was administered to 137 ANMs and SNs in three districts. We collected qualitative data from 78 nursing personnel and district- and central-level stakeholders using interviews and focus group discussions. We calculated motivation scores for ANMs and SNs using the quantitative data and conducted statistical tests for validity and reliability. Motivation scores were compared with qualitative data. Descriptive exploratory analysis compared mean motivation scores by ANM and SN sociodemographic characteristics. The concept of self-efficacy was added to the tool before data collection. Motivation was revealed through conscientiousness. Teamwork and the exertion of extra effort were not adequately captured by the tool, but important in illustrating motivation. The statement on punctuality was problematic in quantitative analysis, and attendance was more expressive of motivation. The calculated motivation scores usually reflected ANM and SN interview data, with some variation in other stakeholder responses. The tool scored within acceptable limits in validity and reliability testing and was able to distinguish motivation of nursing personnel with different sociodemographic characteristics. 
We found that with minor modifications, the tool provided valid and internally consistent measures of motivation among ANMs and SNs in this context. We recommend the use of this tool in similar contexts, with the addition of statements about self-efficacy, teamwork and exertion of extra effort. Absenteeism should replace the punctuality statement, and statements should be worded both positively and negatively to mitigate positive response bias. Collection of qualitative data on motivation creates a more nuanced understanding of quantitative scores.
Unnanuntana, Aasis; Ruangsomboon, Pakpoom; Keesukpunt, Worawut
2018-06-01
The 2-minute walk test (2mwt) is a performance-based test that evaluates functional recovery after total knee arthroplasty (TKA). This study evaluated its validity compared with the modified Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Oxford Knee Score (OKS), modified Knee Score, Numerical Pain Rating Scale, and Timed Up and Go test, and its responsiveness in assessing functional recovery in TKA patients. This prospective cohort study included 162 patients undergoing primary TKA between 2013 and 2015. We used patient-reported outcome measures (modified WOMAC, OKS, modified Knee Score, Numerical Pain Rating Scale) and performance-based tests (2mwt and Timed Up and Go test) at baseline and 3, 6, and 12 months postoperatively. The construct validity of 2mwt was determined between the 2mwt distances walked and other outcome measurements. To assess responsiveness, effect size and standardized response mean were analyzed. Minimal clinically important difference of 2mwt at 12 months after TKA was also calculated. All outcome measurements improved significantly from baseline to 3, 6, and 12 months postoperatively. Bivariate analysis revealed mild to moderate associations between the 2mwt and modified WOMAC function subscales, and moderate to strong associations with OKS. Mild to moderate correlations were found for pain and stiffness between 2mwt and other outcome measurements. The effect size and standardized response mean at 12 months were large, with a minimal clinically important difference of 12.7 m. 2mwt is a validated performance-based test with responsiveness properties. Being simple and easy to perform, it can be used routinely in clinical practice to evaluate functional recovery after TKA. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
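The two responsiveness indices named above can be sketched under their common definitions: effect size as mean change divided by the baseline standard deviation, and standardized response mean (SRM) as mean change divided by the standard deviation of the change scores. The abstract does not spell out its formulas, so these definitions are an assumption, and the walking distances below are made up.

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Effect size and SRM for paired baseline and follow-up scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    mean_change = mean(changes)
    return {
        # Effect size: change scaled by variability at baseline.
        "effect_size": mean_change / stdev(baseline),
        # SRM: change scaled by variability of the change itself.
        "srm": mean_change / stdev(changes),
    }


# Hypothetical 2-minute walk distances in metres for four patients.
baseline_m = [60, 70, 80, 90]
month12_m = [85, 80, 110, 115]
indices = responsiveness(baseline_m, month12_m)
```

The two indices differ only in the denominator, so a sample whose patients all improve by a similar amount will show a larger SRM than effect size.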
Monroe, Todd; Carter, Michael
2012-09-01
Cognitive scales are used frequently in geriatric research and practice. These instruments are constructed with underlying assumptions that are a part of their validation process. A common measurement scale used in older adults is the Folstein Mini Mental State Exam (MMSE). The MMSE was designed to screen for cognitive impairment and is used often in geriatric research. This paper has three aims. Aim one was to explore four potential threats to validity in the use of the MMSE: (1) administering the exam without meeting the underlying assumptions, (2) not reporting that the underlying assumptions were assessed prior to test administration, (3) use of variable and inconsistent cut-off scores for the determination of presence of cognitive impairment, and (4) failure to adjust the scores based on the demographic characteristics of the tested subject. Aim two was to conduct a literature search to determine if the assumptions of (1) education level assessment, (2) sensory assessment, and (3) language fluency were being met and clearly reported in published research using the MMSE. Aim three was to provide recommendations to minimize threats to validity in research studies that use cognitive scales, such as the MMSE. We found inconsistencies in published work in reporting whether or not subjects meet the assumptions that underlie a reliable and valid MMSE score. These inconsistencies can pose threats to the reliability of exam results. Fourteen of the 50 studies reviewed reported inclusion of all three of these assumptions. Inconsistencies in reporting the inclusion of the underlying assumptions for a reliable score could mean that subjects were not appropriate to be tested by use of the MMSE or that an appropriate test administration of the MMSE was not clearly reported. Thus, the research literature could have threats to both validity and reliability based on misuse of or improper reported use of the MMSE. 
Six recommendations are provided to minimize these threats in future research.
Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D
2014-05-01
The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
Sisic, Nedim; Jelicic, Mario; Pehar, Miran; Spasic, Miodrag; Sekulic, Damir
2016-01-01
In basketball, anthropometric status is an important factor when identifying and selecting talents, while agility is one of the most vital motor performances. The aim of this investigation was to evaluate the influence of anthropometric variables and power capacities on different preplanned agility performances. The participants were 92 high-level, junior-age basketball players (16-17 years of age; 187.6±8.72 cm in body height, 78.40±12.26 kg in body mass), randomly divided into a validation and a cross-validation subsample. The predictor set consisted of 16 anthropometric variables and three tests of power capacities (Sargent jump, broad jump, and medicine-ball throw). The criteria were three tests of agility: a T-Shape-Test, a Zig-Zag-Test, and a test of running with a 180-degree turn (T180). Forward stepwise multiple regressions were calculated for the validation subsample and then cross-validated. Cross-validation included correlations between observed and predicted scores, dependent-samples t-tests between predicted and observed scores, and Bland-Altman plots. Analysis of variance identified centres as being advanced in most of the anthropometric indices and in the medicine-ball throw (all at P<0.05), with no significant between-position differences for the other studied motor performances. Multiple regression models originally calculated for the validation subsample were then cross-validated, and confirmed for the Zig-Zag-Test (R of 0.71 and 0.72 for the validation and cross-validation subsample, respectively). Anthropometrics were not strongly related to agility performance, but leg length was found to be negatively associated with performance in basketball-specific agility. Power capacities were confirmed to be an important factor in agility. The results highlighted the importance of sport-specific tests when studying pre-planned agility performance in basketball. 
The improvement in power capacities will probably result in an improvement in agility in basketball athletes, while anthropometric indices should be used in order to identify those athletes who can achieve superior agility performance.
Simons, Janine A; Fietzek, Urban M; Waldmann, Annika; Warnecke, Tobias; Schuster, Tibor; Ceballos-Baumann, Andrés O
2014-09-01
Dysphagia in patients with Parkinson's disease (PD) significantly reduces quality of life and predicted lifetime. Current screening procedures are insufficiently evaluated. We aimed to develop and validate a patient-reported outcome questionnaire for early diagnosis of dysphagia in patients with PD. The two-phased project comprised the questionnaire, diagnostic scales construction (N = 105), and a validation study (N = 82). Data for the project were gathered from PD patients at a German Movement Disorder Center. For validation purposes, a clinical evaluation focusing on swallowing tests, tests of sensory reflexes, and fiberoptic endoscopic evaluation of swallowing (FEES) was performed that yielded a criteria sum score against which the results of the questionnaire were compared. Specificity and sensitivity were evaluated for the detection of noticeable dysphagia and for the risk of aspiration. The Munich Dysphagia Test - Parkinson's disease (MDT-PD) consists of 26 items that show high internal consistency (α = 0.91). For the validation study, 82 patients, aged 70.9 ± 8.7 (mean ± SD), with a median Hoehn & Yahr stage of 3, were assessed. 73% of patients had dysphagia with noticeable oropharyngeal symptoms (44%) or with penetration/aspiration (29%). The criteria sum score correlated positively with the screening result (r = 0.70, p < 0.001). The MDT-PD sum score classified not noticeable dysphagia vs. risk of aspiration (noticeable dysphagia) with a sensitivity of 90% (82%) and a specificity of 86% (71%), and yielded similar results in cross-validation, respectively. MDT-PD is a valid screening tool for early diagnosis of swallowing problems and aspiration risk, as well as initial graduation of dysphagia severity in PD patients. Copyright © 2014 Elsevier Ltd. All rights reserved.
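Sensitivity and specificity, as reported for the screening questionnaire above, compare a screening decision with a reference standard. The sketch below uses a hypothetical function name and made-up labels; it does not reproduce the study's cut-offs or data.

```python
def sensitivity_specificity(screen_positive, condition_present):
    """Return (sensitivity, specificity) for paired boolean sequences."""
    pairs = list(zip(screen_positive, condition_present))
    tp = sum(1 for s, c in pairs if s and c)          # true positives
    fn = sum(1 for s, c in pairs if not s and c)      # false negatives
    tn = sum(1 for s, c in pairs if not s and not c)  # true negatives
    fp = sum(1 for s, c in pairs if s and not c)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: 4 patients with the condition, 6 without.
reference = [True, True, True, True, False, False, False, False, False, False]
screened = [True, True, True, False, False, False, False, False, True, False]
sens, spec = sensitivity_specificity(screened, reference)
```

Sensitivity is the share of affected patients the screen catches, and specificity is the share of unaffected patients it correctly clears; moving a questionnaire cut-off typically trades one against the other.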
Validation of the Perceived Stigmatization Questionnaire for Brazilian adult burn patients.
Freitas, Noélle de Oliveira; Forero, Carlos García; Caltran, Marina Paes; Alonso, Jordi; Dantas, Rosana A Spadoti; Piccolo, Monica Sarto; Farina, Jayme Adriano; Lawrence, John W; Rossi, Lidia A
2018-01-01
Currently, there is no questionnaire to assess perceived stigmatization among people with visible differences in Brazil. The Perceived Stigmatization Questionnaire (PSQ), developed in the United States, is a valid instrument to assess the perception of stigmatizing behaviours among burn survivors. The objective of this cross-sectional and multicentre study was to assess the factor structure, reliability and validity of the Brazilian Portuguese version of the PSQ in burn patients. A Brazilian version of the 21-item PSQ was answered by 240 adult burn patients, undergoing rehabilitation in two burns units in Brazil. We tested its construct validity by correlating PSQ scores with depression (Beck Depression Index-BDI) and self-esteem (Rosenberg Self-Esteem Scale-RSE), as well as with two domains of the Revised Burn Specific Health Scale-BSHS-R: affect and body image, and interpersonal relationships. We used Confirmatory Item Factor Analysis (CIFA) to test whether the data fit a measurement model involving a three-factor structure (absence of friendly behaviour; confusing/staring behaviour; and hostile behaviour). We conducted Exploratory Factor Analyses (EFA) of the subscale in a 50% random sample of individuals (training split), treating items as ordinal categorical using unweighted least squares estimation. To assess discriminant validity of the Brazilian version of the PSQ we correlated PSQ scores with known groups (sex, total body surface area burned, and visibility of the scars) and assessed its reliability by means of Cronbach's alpha and using test-retest. Goodness-of-fit indices for confirmatory factor analysis were satisfactory for the PSQ, but not for the hostile behaviour subscale, which was modified to improve fit by eliminating 3 items. Cronbach's alphas for the PSQ refined version (PSQ-R) ranged from 0.65 to 0.88, with test-retest reliability 0.87 for the total score. 
The PSQ-R scores correlated strongly with depression (0.63; p < 0.001), self-esteem (-0.57; p < 0.001), body image (-0.63; p < 0.001), and interpersonal relationships (-0.55; p < 0.001). PSQ-R total scores were significantly lower for patients with visible scars (effect size = 0.51, p = 0.029). The PSQ-R showed reliability and validity comparable to the original version. However, the cross-cultural structure of the subscale "hostile behaviour" and sensitivity to change of the PSQ should be further evaluated.
NASA Astrophysics Data System (ADS)
Cataloglu, Erdat
The purpose of this study was to construct a valid and reliable multiple-choice achievement test to assess students' understanding of core concepts of introductory quantum mechanics. Development of the Quantum Mechanics Visualization Instrument (QMVI) occurred across four successive semesters in 1999-2001. During this time 213 undergraduate and graduate students attending the Pennsylvania State University (PSU) at University Park and Arizona State University (ASU) participated in this development and validation study. Participating students were enrolled in four distinct groups of courses: Modern Physics, Undergraduate Quantum Mechanics, Graduate Quantum Mechanics, and Chemistry Quantum Mechanics. Expert panels of professors of physics experienced in teaching quantum mechanics courses and graduate students in physics and science education established the core content and assisted in the validating of successive versions of the 24-question QMVI. Instrument development was guided by procedures outlined in the Standards for Educational and Psychological Testing (AERA-APA-NCME, 1999). Data gathered in this study provided information used in the development of successive versions of the QMVI. Data gathered in the final phase of administration of the QMVI also provided evidence that the intended score interpretation of the QMVI achievement test is valid and reliable. A moderate positive correlation coefficient of 0.49 was observed between the students' QMVI scores and their confidence levels. Analyses of variance indicated that students' scores in Graduate Quantum Mechanics and Undergraduate Quantum Mechanics courses were significantly higher than the mean scores of students in Modern Physics and Chemistry Quantum Mechanics courses (p < 0.05). That finding is consistent with the additional understanding and experience that should be anticipated in graduate students and junior-senior level students over sophomore physics majors and majors in another field. 
The moderate positive correlation coefficient of 0.42 observed between students' QMVI scores and their final course grades was also consistent with expectations in a valid instrument. In addition, the Cronbach-alpha reliability coefficient of the QMVI was found to be 0.82. Limited findings were drawn on students' understanding of introductory quantum mechanics concepts. Data suggested that the construct of quantum mechanics understanding is most likely multidimensional and the Main Topic defined as "Quantum Mechanics Postulates" may be an especially important factor for students in acquiring a successful understanding of quantum mechanics.
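The Cronbach-alpha coefficient reported above (0.82) summarizes internal consistency. As a minimal sketch of how such a coefficient is conventionally computed, assuming a complete respondents-by-items score matrix, the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) can be written as follows. The function name and the toy responses are hypothetical and are not QMVI data.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy responses: 4 respondents x 3 items (not study data).
toy_responses = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 4]]
alpha = cronbach_alpha(toy_responses)
```

Because only the ratio of variances enters the formula, using population or sample variance gives the same alpha as long as the same choice is applied to items and totals.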
Validation of the Perceived Stigmatization Questionnaire for Brazilian adult burn patients
Forero, Carlos García; Caltran, Marina Paes; Alonso, Jordi; Dantas, Rosana A. Spadoti; Piccolo, Monica Sarto; Farina, Jayme Adriano; Lawrence, John W.; Rossi, Lidia A.
2018-01-01
Currently, there is no questionnaire to assess perceived stigmatization among people with visible differences in Brazil. The Perceived Stigmatization Questionnaire (PSQ), developed in the United States, is a valid instrument to assess the perception of stigmatizing behaviours among burn survivors. The objective of this cross-sectional and multicentre study was to assess the factor structure, reliability and validity of the Brazilian Portuguese version of the PSQ in burn patients. A Brazilian version of the 21-item PSQ was answered by 240 adult burn patients, undergoing rehabilitation in two burns units in Brazil. We tested its construct validity by correlating PSQ scores with depression (Beck Depression Index-BDI) and self-esteem (Rosenberg Self-Esteem Scale-RSE), as well as with two domains of the Revised Burn Specific Health Scale—BSHS-R: affect and body image, and interpersonal relationships. We used Confirmatory Item Factor Analysis (CIFA) to test whether the data fit a measurement model involving a three-factor structure (absence of friendly behaviour; confusing/staring behaviour; and hostile behaviour). We conducted Exploratory Factor Analyses (EFA) of the subscale in a 50% random sample of individuals (training split), treating items as ordinal categorical using unweighted least squares estimation. To assess discriminant validity of the Brazilian version of the PSQ we correlated PSQ scores with known groups (sex, total body surface area burned, and visibility of the scars) and assessed its reliability by means of Cronbach's alpha and using test-retest. Goodness-of-fit indices for confirmatory factor analysis were satisfactory for the PSQ, but not for the hostile behaviour subscale, which was modified to improve fit by eliminating 3 items. Cronbach’s alphas for the PSQ refined version (PSQ-R) ranged from 0.65 to 0.88, with test-retest reliability 0.87 for the total score. 
The PSQ-R scores correlated strongly with depression (0.63; p < 0.001), self-esteem (-0.57; p < 0.001), body image (-0.63; p < 0.001), and interpersonal relationships (-0.55; p < 0.001). PSQ-R total scores were significantly lower for patients with visible scars (effect size = 0.51, p = 0.029). The PSQ-R showed reliability and validity comparable to the original version. However, the cross-cultural structure of the subscale “hostile behaviour” and sensitivity to change of the PSQ should be further evaluated. PMID:29381711
Evren, Cuneyt; Ogel, Kultegin; Evren, Bilge; Bozkurt, Muge
2014-01-01
The aim of this study was to evaluate psychometric properties of the Drug Use Disorders Identification Test (DUDIT) and the Drug Abuse Screening Test (DAST-10) in prisoners with (n = 124) or without (n = 78) drug use disorder. Participants were evaluated with the DUDIT, the DAST-10, and the Addiction Profile Index-Short (API-S). The DUDIT and the DAST-10 were found to be psychometrically sound drug abuse screening measures with high convergent validity when compared with each other (r = 0.86), and API-S (r = 0.88 and r = 0.84, respectively), and to have a Cronbach's α of 0.93 and 0.87, respectively. In addition, a single component accounted for 58.28% of total variance for DUDIT, whereas this was 47.10% for DAST-10. The DUDIT had sensitivity and specificity scores of 0.95 and 0.79, respectively, when using the optimal cut-off score of 10, whereas these scores were 0.88 and 0.74 for the DAST-10 when using the optimal cut-off score of 4. Additionally, both the DUDIT and the DAST-10 showed good discriminant validity as they differentiated prisoners with drug use disorder from those without. Findings support the Turkish versions of both the DUDIT and the DAST-10 as reliable and valid drug abuse screening instruments that measure unidimensional constructs.
Why Education Practitioners and Stakeholders Should Care about Person Fit in Educational Assessments
ERIC Educational Resources Information Center
Walker, A. Adrienne
2017-01-01
In this article, A. Adrienne Walker introduces the concept of person fit to education stakeholders as a source of evidence to inform the trustworthiness of a test score for interpretation and use (validity). Person fit analyses are used in educational measurement research to explore the degree to which a person's test score can be interpreted as a…
Willeumier, Julie J; van der Wal, C W P G; van der Wal, Robert J P; Dijkstra, P D S; Vliet Vlieland, Thea P M; van de Sande, Michiel A J
2017-01-01
The aim of this study was to translate and culturally adapt the Toronto Extremity Salvage Score (TESS) to Dutch and to validate the translated version. The TESS lower and upper extremity versions (LE and UE) were translated to Dutch according to international guidelines. The translated version was validated in 98 patients with surgically treated bone or soft tissue tumors of the LE or UE. To assess test-retest reliability, participants were asked to fill in a second questionnaire after one week. Construct validity was determined by computing Spearman rank correlations with the Short Form- (SF-) 36. The internal consistency (0.957 and 0.938 for LE and UE, resp.) and test-retest reliability (intraclass correlation coefficients 0.963 and 0.969 for LE and UE, resp.) were good for both questionnaires. The Dutch LE and UE TESS versions correlated most strongly with the SF-36 physical function dimension (r = 0.737 for LE, 0.726 for UE) and the physical component summary score (r = 0.811 and 0.797 for LE and UE). The Dutch TESS questionnaire for lower and upper extremities is a consistent, reliable, and valid instrument to measure patient-reported physical function in surgically treated patients with a soft tissue or bone tumor.
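The construct validity analysis above rests on Spearman rank correlations. A minimal sketch of that statistic, assuming tied values receive their average rank and the coefficient is the Pearson correlation of the ranks, might look like this. The helper names are hypothetical, and the abstract does not say which software or tie handling the authors used.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks are used, any monotonically increasing relationship (linear or not) yields a coefficient of 1.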
Construct Validity and Reliability of the SARA Gait and Posture Sub-scale in Early Onset Ataxia
Lawerman, Tjitske F.; Brandsma, Rick; Verbeek, Renate J.; van der Hoeven, Johannes H.; Lunsing, Roelineke J.; Kremer, Hubertus P. H.; Sival, Deborah A.
2017-01-01
Aim: In children, gait and posture assessment provides a crucial marker for the early characterization, surveillance and treatment evaluation of early onset ataxia (EOA). For reliable data entry of studies targeting at gait and posture improvement, uniform quantitative biomarkers are necessary. Until now, the pediatric test construct of gait and posture scores of the Scale for Assessment and Rating of Ataxia sub-scale (SARA) is still unclear. In the present study, we aimed to validate the construct validity and reliability of the pediatric (SARAGAIT/POSTURE) sub-scale. Methods: We included 28 EOA patients [15.5 (6–34) years; median (range)]. For inter-observer reliability, we determined the ICC on EOA SARAGAIT/POSTURE sub-scores by three independent pediatric neurologists. For convergent validity, we associated SARAGAIT/POSTURE sub-scores with: (1) Ataxic gait Severity Measurement by Klockgether (ASMK; dynamic balance), (2) Pediatric Balance Scale (PBS; static balance), (3) Gross Motor Function Classification Scale -extended and revised version (GMFCS-E&R), (4) SARA-kinetic scores (SARAKINETIC; kinetic function of the upper and lower limbs), (5) Archimedes Spiral (AS; kinetic function of the upper limbs), and (6) total SARA scores (SARATOTAL; i.e., summed SARAGAIT/POSTURE, SARAKINETIC, and SARASPEECH sub-scores). For discriminant validity, we investigated whether EOA co-morbidity factors (myopathy and myoclonus) could influence SARAGAIT/POSTURE sub-scores. Results: The inter-observer agreement (ICC) on EOA SARAGAIT/POSTURE sub-scores was high (0.97). SARAGAIT/POSTURE was strongly correlated with the other ataxia and functional scales [ASMK (rs = -0.819; p < 0.001); PBS (rs = -0.943; p < 0.001); GMFCS-E&R (rs = -0.862; p < 0.001); SARAKINETIC (rs = 0.726; p < 0.001); AS (rs = 0.609; p = 0.002); and SARATOTAL (rs = 0.935; p < 0.001)]. 
Comorbid myopathy influenced SARAGAIT/POSTURE scores by concurrent muscle weakness, whereas comorbid myoclonus predominantly influenced SARAKINETIC scores. Conclusion: In young EOA patients, separate SARAGAIT/POSTURE parameters reveal a good inter-observer agreement and convergent validity, implicating the reliability of the scale. In perspective of incomplete discriminant validity, it is advisable to interpret SARAGAIT/POSTURE scores for comorbid muscle weakness. PMID:29326569
Gervais, Roger O; Ben-Porath, Yossef S; Wygant, Dustin B; Green, Paul
2008-12-01
The MMPI-2 Response Bias Scale (RBS) is designed to detect response bias in forensic neuropsychological and disability assessment settings. Validation studies have demonstrated that the scale is sensitive to cognitive response bias as determined by failure on the Word Memory Test (WMT) and other symptom validity tests. Exaggerated memory complaints are a common feature of cognitive response bias. The present study was undertaken to determine the extent to which the RBS is sensitive to memory complaints and how it compares in this regard to other MMPI-2 validity scales and indices. This archival study used MMPI-2 and Memory Complaints Inventory (MCI) data from 1550 consecutive non-head-injury disability-related referrals to the first author's private practice. ANOVA results indicated significant increases in memory complaints across increasing RBS score ranges with large effect sizes. Regression analyses indicated that the RBS was a better predictor of the mean memory complaints score than the F, F(B), and F(P) validity scales and the FBS. There was no correlation between the RBS and the CVLT, an objective measure of verbal memory. These findings suggest that elevated scores on the RBS are associated with over-reporting of memory problems, which provides further external validation of the RBS as a sensitive measure of cognitive response bias. Interpretive guidelines for the RBS are provided.
A Supervised Learning Process to Validate Online Disease Reports for Use in Predictive Models.
Patching, Helena M M; Hudson, Laurence M; Cooke, Warrick; Garcia, Andres J; Hay, Simon I; Roberts, Mark; Moyes, Catherine L
2015-12-01
Pathogen distribution models that predict spatial variation in disease occurrence require data from a large number of geographic locations to generate disease risk maps. Traditionally, this process has used data from public health reporting systems; however, using online reports of new infections could speed up the process dramatically. Data from both public health systems and online sources must be validated before they can be used, but no mechanisms exist to validate data from online media reports. We have developed a supervised learning process to validate geolocated disease outbreak data in a timely manner. The process uses three input features, the data source and two metrics derived from the location of each disease occurrence. The location of disease occurrence provides information on the probability of disease occurrence at that location based on environmental and socioeconomic factors and the distance within or outside the current known disease extent. The process also uses validation scores, generated by disease experts who review a subset of the data, to build a training data set. The aim of the supervised learning process is to generate validation scores that can be used as weights going into the pathogen distribution model. After analyzing the three input features and testing the performance of alternative processes, we selected a cascade of ensembles comprising logistic regressors. Parameter values for the training data subset size, number of predictors, and number of layers in the cascade were tested before the process was deployed. The final configuration was tested using data for two contrasting diseases (dengue and cholera), and 66%-79% of data points were assigned a validation score. The remaining data points are scored by the experts, and the results inform the training data set for the next set of predictors, as well as going to the pathogen distribution model. 
The new supervised learning process has been implemented within our live site and is being used to validate the data that our system uses to produce updated predictive disease maps on a weekly basis.
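The cascade described above can be illustrated with a small sketch. The abstract does not say how a layer decides whether to score a data point or pass it on, so the rule used here (an ensemble assigns a score only when its logistic regressors agree within a spread threshold) is purely an assumption, as are all function names, weights and the threshold. The three features stand in for the data source and the two location-derived metrics.

```python
import math


def logistic(weights, bias, features):
    """Probability from a single logistic regressor."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_score(ensemble, features, max_spread):
    """Mean probability if the members agree closely, otherwise None.

    ASSUMPTION: agreement within max_spread is a hypothetical confidence rule.
    """
    probs = [logistic(w, b, features) for w, b in ensemble]
    if max(probs) - min(probs) <= max_spread:
        return sum(probs) / len(probs)
    return None


def cascade_validate(layers, features, max_spread=0.2):
    """Try each ensemble in turn; None means the point is left for expert review."""
    for ensemble in layers:
        score = ensemble_score(ensemble, features, max_spread)
        if score is not None:
            return score
    return None


# Hypothetical two-layer cascade; each regressor is (weights, bias).
layers = [
    [([0.0, 5.0, 0.0], 0.0), ([0.0, -5.0, 0.0], 0.0)],  # members disagree
    [([0.0, 1.0, 0.0], 0.0), ([0.0, 1.1, 0.0], 0.0)],   # members agree
]
# Hypothetical features: [data source flag, occurrence probability, distance metric].
score = cascade_validate(layers, [1.0, 0.8, 0.1])
```

In this sketch, points that no layer scores correspond to the remaining data points that the abstract says are scored by experts.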
Thamboo, Andrew; Velasquez, Nathalia; Habib, Al-Rahim R; Zarabanda, David; Paknezhad, Hassan; Nayak, Jayakar V
2017-08-01
The validated Empty Nose Syndrome 6-Item Questionnaire (ENS6Q) identifies empty nose syndrome (ENS) patients. The unvalidated cotton test assesses improvement in ENS-related symptoms. By first validating the cotton test using the ENS6Q, we define the minimal clinically important difference (MCID) score for the ENS6Q. Individual case-control study. Fifteen patients diagnosed with ENS and 18 controls with non-ENS sinonasal conditions underwent office cotton placement. Both groups completed ENS6Q testing in three conditions (precotton, cotton in situ, and postcotton) to measure the reproducibility of ENS6Q scoring. Participants also completed a five-item transition scale ranging from "much better" to "much worse" to rate subjective changes in nasal breathing with and without cotton placement. Mean changes for each transition point, and the ENS6Q MCID, were then calculated. In the precotton condition, significant differences (P < .001) in all ENS6Q questions between ENS and controls were noted. With cotton in situ, nearly all prior ENS6Q differences normalized between ENS and control patients. For ENS patients, the changes in the mean differences between the precotton and cotton in situ conditions compared to postcotton versus cotton in situ conditions were insignificant among individuals. Including all 33 participants, the mean changes in the ENS6Q for the transition points "a little better" and "about the same" were 4.25 (standard deviation [SD] = 5.79) and -2.00 (SD = 3.70), respectively, giving an MCID of 6.25. Cotton testing is a validated office test to assess for ENS patients. Cotton testing also helped to determine the MCID of the ENS6Q, which is a 7-point change from the baseline ENS6Q score. 3b. Laryngoscope, 127:1746-1752, 2017. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
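The MCID arithmetic in this abstract can be retraced in a few lines. The sketch below uses only the two mean changes reported above; the step from 6.25 to a 7-point change is read here as rounding up to the next whole score point, which is an assumption, since the abstract does not state the rounding rule.

```python
import math

# Mean ENS6Q changes reported in the abstract for two transition points.
mean_change_little_better = 4.25
mean_change_about_same = -2.00

# MCID taken as the difference between the two mean changes.
mcid = mean_change_little_better - mean_change_about_same

# ASSUMPTION: scores move in whole points, so the smallest whole-point
# change that reaches the MCID is the ceiling of 6.25.
mcid_whole_points = math.ceil(mcid)
```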
Validity of a novel computerized screening test system for mild cognitive impairment.
Park, Jin-Hyuck; Jung, Minye; Kim, Jongbae; Park, Hae Yean; Kim, Jung-Ran; Park, Ji-Hyuk
2018-06-20
Background: The mobile screening test system for screening mild cognitive impairment (mSTS-MCI) was developed for clinical use. However, the clinical usefulness of mSTS-MCI to distinguish elderly people with MCI from those who are cognitively healthy has yet to be validated. Moreover, the comparability between this system and traditional screening tests for MCI has not been evaluated. The purpose of this study was to examine the validity and reliability of the mSTS-MCI and confirm the cut-off scores to detect MCI. The data were collected from 107 healthy elderly people and 74 elderly people with MCI. Concurrent validity was examined using the Korean version of Montreal Cognitive Assessment (MoCA-K) as a gold standard test, and test-retest reliability was investigated using 30 of the study participants at four-week intervals. The sensitivity, specificity, positive predictive value, and negative predictive value (NPV) were confirmed through Receiver Operating Characteristic (ROC) analysis, and the cut-off scores for elderly people with MCI were identified. Concurrent validity showed statistically significant correlations between the mSTS-MCI and MoCA-K, and test-retest reliability indicated high correlation. In terms of screening predictability, the mSTS-MCI had a higher NPV than the MoCA-K. The mSTS-MCI was identified as a system with a high degree of validity and reliability. In addition, the mSTS-MCI showed high screening predictability, indicating it can be used in the clinical field as a screening test system for mild cognitive impairment.
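The four screening metrics named in this abstract all follow from a 2x2 classification table at a chosen cut-off. The sketch below shows the standard definitions. The group sizes (74 with MCI, 107 healthy) come from the abstract, but the split into true and false results is hypothetical and chosen only for illustration; the abstract reports no such counts.

```python
def screening_metrics(tp, fn, tn, fp):
    """Standard screening metrics from a 2x2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),  # impaired people correctly flagged
        "specificity": tn / (tn + fp),  # healthy people correctly cleared
        "ppv": tp / (tp + fp),          # flagged people who are impaired
        "npv": tn / (tn + fn),          # cleared people who are healthy
    }


# HYPOTHETICAL split: tp + fn = 74 (MCI group), tn + fp = 107 (healthy group).
metrics = screening_metrics(tp=66, fn=8, tn=91, fp=16)
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of impaired people in the sample, so values from a study sample do not transfer directly to settings with a different prevalence.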
[German validation of the Acute Cystitis Symptom Score].
Alidjanov, J F; Pilatz, A; Abdufattaev, U A; Wiltink, J; Weidner, W; Naber, K G; Wagenlehner, F
2015-09-01
The Uzbek version of the Acute Cystitis Symptom Score (ACSS) was developed as a simple self-reporting questionnaire to improve diagnosis and therapy of women with acute cystitis (AC). The purpose of this work was to validate the ACSS in the German language. The ACSS consists of 18 questions in four subscales: (1) typical symptoms, (2) differential diagnosis, (3) quality of life, and (4) additional circumstances. Translation of the ACSS into German was performed according to international guidelines. For the validation process 36 German-speaking women (age: 18-90 years), with and without symptoms of AC, were included in the study. Classification of participants into two groups (patients or controls) was based on the presence or absence of typical symptoms and significant bacteriuria (≥ 10^3 CFU/ml). Statistical evaluations of reliability, validity, and predictive ability were performed. ROC curve analysis was performed to assess sensitivity and specificity of ACSS and its subscales. The Mann-Whitney U test and t-test were used to compare the scores of the groups. Of the 36 German-speaking women (age: 40 ± 19 years), 19 were diagnosed with AC (patient group), while 17 women served as controls. Cronbach's α for the German ACSS total scale was 0.87. A threshold score of ≥ 6 points in category 1 (typical symptoms) significantly predicted AC (sensitivity 94.7%, specificity 82.4%). There were no significant differences in ACSS scores in patients and controls compared to the original Uzbek version of the ACSS. The German version of the ACSS showed a high reliability and validity. Therefore, the German version of the ACSS can be reliably used in clinical practice and research for diagnosis and therapeutic monitoring of patients suffering from AC.
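With groups as small as those above (19 patients, 17 controls), each reported percentage corresponds to very few possible counts. The following sketch searches for the whole-number counts that reproduce a reported percentage to one decimal place. The function name is hypothetical, and the counts it finds are inferred from the percentages rather than reported in the abstract.

```python
def counts_matching(percent, group_size, tolerance=0.05):
    """Counts k in 0..group_size whose k/group_size matches percent to one decimal."""
    return [
        k
        for k in range(group_size + 1)
        if abs(100 * k / group_size - percent) < tolerance
    ]


# Group sizes and percentages are taken from the abstract.
sensitivity_counts = counts_matching(94.7, 19)  # patients scoring >= 6
specificity_counts = counts_matching(82.4, 17)  # controls scoring < 6
```

Under these assumptions the reported sensitivity is consistent with 18 of 19 patients at or above the threshold, and the reported specificity with 14 of 17 controls below it.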
Validation of measures from the smartphone sway balance application: a pilot study.
Patterson, Jeremy A; Amick, Ryan Z; Thummar, Tarunkumar; Rogers, Michael E
2014-04-01
A number of different balance assessment techniques are currently available and widely used. These include both subjective and objective assessments. The ability to provide quantitative measures of balance and posture is the benefit of objective tools; however, these instruments are not generally utilized outside of research laboratory settings due to cost, complexity of operation, size, duration of assessment, and general practicality. The purpose of this pilot study was to assess the value and validity of using software developed to access the iPod and iPhone accelerometer output and translate that to the measurement of human balance. Thirty healthy college-aged individuals (13 male, 17 female; age = 26.1 ± 8.5 years) volunteered. Participants performed a static Athlete's Single Leg Test protocol for 10 sec, on a Biodex Balance System SD while concurrently utilizing a mobile device with balance software. Anterior/posterior stability was recorded using both devices, described as the displacement in degrees from level, and was termed the "balance score." There were no significant differences between the two reported balance scores (p = 0.818). Mean balance score on the balance platform was 1.41 ± 0.90, as compared to 1.38 ± 0.72 using the mobile device. There is a need for a valid, convenient, and cost-effective tool to objectively measure balance. Results of this study are promising, as balance scores derived from the smartphone accelerometers were consistent with balance scores obtained from a previously validated balance system. However, further investigation is necessary as this version of the mobile software only assessed balance in the anterior/posterior direction. Additionally, further testing is necessary on healthy populations as well as those with impairment of the motor control system. Level 2b (Observational study of validity).
Huang, Min H; Miller, Kara; Smith, Kristin; Fredrickson, Kayle; Shilling, Tracy
2016-01-01
Cancer is primarily a disease of older adults. About 77% of all cancers are diagnosed in persons aged 55 years and older. Cancer and its treatment can cause diverse sequelae impacting body systems underlying balance control. No study has examined the psychometric properties of balance assessment tools in older cancer survivors, presenting a significant challenge in the selection of outcome measures for clinicians treating this fast-growing population. This study aimed to determine the reliability, validity, and minimal detectable change (MDC) of the Balance Evaluation System Test (BESTest), Mini-Balance Evaluation Systems Test (Mini-BESTest), and Brief-Balance Evaluation Systems Test (Brief-BESTest) in community-dwelling older cancer survivors. This study was a cross-sectional design. Twenty breast and 8 prostate cancer survivors participated [age (SD) = 68.4 (8.13) years]. The BESTest and Activity-specific Balance Confidence (ABC) Scale were administered during the first session. Scores of Mini-BESTest and Brief-BESTest were extracted on the basis of the scores of BESTest. The BESTest was repeated within 1 to 2 weeks by the same rater to determine the test-retest reliability. For the analysis of the inter-rater reliability, 21 participants were randomly selected to be evaluated by 2 raters. A primary rater administered the test. The 2 raters independently and concurrently scored the performance of the participants. Each rater recorded the ratings separately on the scoring sheet. No discussion among the raters was allowed throughout the testing. Intraclass correlation coefficients (ICCs), standard error of measurement, minimal detectable change (MDC), and Bland-Altman plots were calculated. Concurrent validity of these balance tests with the ABC Scale was examined using the Spearman correlation. 
The BESTest, Mini-BESTest, and Brief-BESTest had high test-retest (ICC = 0.90-0.94) and interrater reliability (ICC = 0.86-0.96), small standard error of measurement (0.86-2.47 points), and MDC (2.39-6.86 points). The Bland-Altman plot revealed no systematic errors. The scores of BESTest, Mini-BESTest, and Brief-BESTest were correlated significantly with those of ABC Scale (P < .01), supporting their concurrent validity. The BESTest, Mini-BESTest, and Brief-BESTest showed high interrater and test-retest reliability, and excellent concurrent validity with the ABC Scale for community-dwelling cancer survivors aged 55 years and older who had completed cancer treatments for at least 3 months. Future studies are necessary to determine the predictive values for determining fall risks using balance assessment tools in older cancer survivors. Clinicians can utilize the BESTest and its short versions to evaluate balance problems in community-dwelling older cancer survivors and apply the established MDC to assess the intervention outcomes.
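The abstract reports both the standard error of measurement (SEM) and the minimal detectable change (MDC) but not the formula linking them. A minimal sketch, assuming the conventional 95% confidence form MDC95 = 1.96 x sqrt(2) x SEM, reproduces values close to the reported range; the small gaps are what one would expect if the published SEM values were rounded. The function name is hypothetical.

```python
import math


def mdc95(sem):
    """Minimal detectable change at 95% confidence from the SEM.

    ASSUMPTION: conventional formula 1.96 * sqrt(2) * SEM; the abstract
    does not state which formula or confidence level was used.
    """
    return 1.96 * math.sqrt(2) * sem


# SEM endpoints reported in the abstract (0.86 and 2.47 points).
mdc_low = mdc95(0.86)
mdc_high = mdc95(2.47)
```

Under this assumption the endpoints come out at roughly 2.38 and 6.85 points, against the reported 2.39 and 6.86.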
1997-02-01
application with a strong resemblance to a video game, concern has been raised that prior video game experience might have a moderating effect on scores. Much...such as spatial ability. The effects of computer or video game experience on work sample scores have not been systematically investigated. The purpose...of this study was to evaluate the incremental validity of prior video game experience over that of general aptitude as a predictor of work sample test
An empirical look at the Defense Mechanism Test (DMT): reliability and construct validity.
Ekehammar, Bo; Zuber, Irena; Konstenius, Marja-Liisa
2005-07-01
Although the Defense Mechanism Test (DMT) has been in use for almost half a century, there are still quite contradictory views about whether it is a reliable instrument, and if so, what it really measures. Thus, based on data from 39 female students, we first examined DMT inter-coder reliability by analyzing the agreement among trained judges in their coding of the same DMT protocols. Second, we constructed a "parallel" photographic picture that retained all structural characteristics of the original and analyzed DMT parallel-test reliability. Third, we examined the construct validity of the DMT by (a) employing three self-report defense-mechanism inventories and analyzing the intercorrelations between DMT defense scores and corresponding defenses in these instruments, (b) studying the relationships between DMT responses and scores on trait and state anxiety, and (c) relating DMT-defense scores to measures of self-esteem. The main results showed that the DMT can be coded with high reliability by trained coders, that the parallel-test reliability is unsatisfactory compared to traditional psychometric standards, that there is a certain generalizability in the number of perceptual distortions that people display from one picture to another, and that the construct validation provided meager empirical evidence for the conclusion that the DMT measures what it purports to measure, that is, psychological defense mechanisms.
Kelly, Maureen E; Regan, Daniel; Dunne, Fidelma; Henn, Patrick; Newell, John; O'Flynn, Siun
2013-05-10
Internationally, tests of general mental ability are used in the selection of medical students. Examples include the Medical College Admission Test, Undergraduate Medicine and Health Sciences Admission Test and the UK Clinical Aptitude Test. The most widely used measure of their efficacy is predictive validity. A new tool, the Health Professions Admission Test-Ireland (HPAT-Ireland), was introduced in 2009. Traditionally, selection to Irish undergraduate medical schools relied on academic achievement. Since 2009, Irish and EU applicants are selected on a combination of their secondary school academic record (measured predominately by the Leaving Certificate Examination) and HPAT-Ireland score. This is the first study to report on the predictive validity of the HPAT-Ireland for early undergraduate assessments of communication and clinical skills. Students enrolled at two Irish medical schools in 2009 were followed up for two years. Data collected were gender, HPAT-Ireland total and subsection scores; Leaving Certificate Examination plus HPAT-Ireland combined score, Year 1 Objective Structured Clinical Examination (OSCE) scores (Total score, communication and clinical subtest scores), Year 1 Multiple Choice Questions and Year 2 OSCE and subset scores. We report descriptive statistics, Pearson correlation coefficients and Multiple linear regression models. Data were available for 312 students. In Year 1 none of the selection criteria were significantly related to student OSCE performance. The Leaving Certificate Examination and Leaving Certificate plus HPAT-Ireland combined scores correlated with MCQ marks. In Year 2 a series of significant correlations emerged between the HPAT-Ireland and subsections thereof with OSCE Communication Z-scores; OSCE Clinical Z-scores; and Total OSCE Z-scores. However, on multiple regression only the relationship between Total OSCE Score and the Total HPAT-Ireland score remained significant; albeit the predictive power was modest. 
We found that none of our selection criteria strongly predict clinical and communication skills. The HPAT- Ireland appears to measures ability in domains different to those assessed by the Leaving Certificate Examination. While some significant associations did emerge in Year 2 between HPAT Ireland and total OSCE scores further evaluation is required to establish if this pattern continues during the senior years of the medical course.
2013-01-01
Background Internationally, tests of general mental ability are used in the selection of medical students. Examples include the Medical College Admission Test, Undergraduate Medicine and Health Sciences Admission Test and the UK Clinical Aptitude Test. The most widely used measure of their efficacy is predictive validity. A new tool, the Health Professions Admission Test-Ireland (HPAT-Ireland), was introduced in 2009. Traditionally, selection to Irish undergraduate medical schools relied on academic achievement. Since 2009, Irish and EU applicants are selected on a combination of their secondary school academic record (measured predominantly by the Leaving Certificate Examination) and HPAT-Ireland score. This is the first study to report on the predictive validity of the HPAT-Ireland for early undergraduate assessments of communication and clinical skills. Method Students enrolled at two Irish medical schools in 2009 were followed up for two years. Data collected were gender, HPAT-Ireland total and subsection scores; Leaving Certificate Examination plus HPAT-Ireland combined score, Year 1 Objective Structured Clinical Examination (OSCE) scores (Total score, communication and clinical subtest scores), Year 1 Multiple Choice Questions and Year 2 OSCE and subset scores. We report descriptive statistics, Pearson correlation coefficients and multiple linear regression models. Results Data were available for 312 students. In Year 1 none of the selection criteria were significantly related to student OSCE performance. The Leaving Certificate Examination and Leaving Certificate plus HPAT-Ireland combined scores correlated with MCQ marks. In Year 2 a series of significant correlations emerged between the HPAT-Ireland and subsections thereof with OSCE Communication Z-scores; OSCE Clinical Z-scores; and Total OSCE Z-scores.
However, on multiple regression only the relationship between Total OSCE Score and the Total HPAT-Ireland score remained significant, although the predictive power was modest. Conclusion We found that none of our selection criteria strongly predict clinical and communication skills. The HPAT-Ireland appears to measure ability in domains different to those assessed by the Leaving Certificate Examination. While some significant associations did emerge in Year 2 between HPAT-Ireland and total OSCE scores, further evaluation is required to establish if this pattern continues during the senior years of the medical course. PMID:23663266
Predicting Performance in Higher Education Using Proximal Predictors.
Niessen, A Susan M; Meijer, Rob R; Tendeiro, Jorge N
2016-01-01
We studied the validity of two methods for predicting academic performance and student-program fit that were proximal to important study criteria. Applicants to an undergraduate psychology program participated in a selection procedure containing a trial-studying test based on a work sample approach, and specific skills tests in English and math. Test scores were used to predict academic achievement and progress after the first year, achievement in specific course types, enrollment, and dropout after the first year. All tests showed positive significant correlations with the criteria. The trial-studying test was consistently the best predictor in the admission procedure. We found no significant differences between the predictive validity of the trial-studying test and prior educational performance, and substantial shared explained variance between the two predictors. Only applicants with lower trial-studying scores were significantly less likely to enroll in the program. In conclusion, the trial-studying test yielded predictive validities similar to those of prior educational performance and possibly enabled self-selection. In admissions aimed at student-program fit, or in admissions in which past educational performance is difficult to use, a trial-studying test is a good instrument to predict academic performance.
Costa-Tutusaus, Lluís; Guerra-Balic, Myriam
2016-01-28
Lifestyle is intimately related to health. A questionnaire that specifically scores the healthiness of lifestyle of Catalan adolescents is needed. The objective of this study was to develop and validate a scoring questionnaire called VISA-TEEN, quick and user-friendly to complete, to assess the healthy lifestyle of young Catalans. A lifestyle questionnaire was developed based on the analysis of contributions from two focus groups, one with adolescents and the other with people who work with them (teachers and doctors). A panel of experts validated the content of items that were ultimately selected for the VISA-TEEN questionnaire. Three hundred ninety-six adolescents (215 boys and 181 girls, age = 13-19 years) completed the VISA-TEEN. Internal consistency was assessed using Cronbach's alpha (α) reliability coefficient. Test-retest reliability, using an intraclass correlation coefficient (ICC), was calculated based on scores attained two weeks apart. Construct validity was assessed by the extraction of components with an exploratory factor analysis. The relationship between VISA-TEEN scores and health-related quality of life (HRQoL), measured with the KIDSCREEN-10 Index, was assessed by calculating Pearson's r correlation coefficient. The association of VISA-TEEN scores with self-rated health (SRH) was also examined by an analysis of variance (ANOVA) between the different categories of this variable. We also calculated the index of fit for factor scales (IFFS) for each component, as well as the discriminatory power of the instrument using Ferguson's δ (delta) coefficient. The VISA-TEEN questionnaire showed acceptable reliability (α = 0.66, αest = 0.77) and very good test-retest agreement (ICC = 0.860). It could be broken down into the following five components, all with an acceptable or very good IFFS (0.7-0.96): diet, substance abuse, physical activity, Rational Use of Technological Leisure (RUTL), and hygiene.
Scores on the VISA-TEEN showed significant correlation with the KIDSCREEN index (r = 0.21, p < 0.001) and were associated with SRH (p < 0.001). The discriminatory power was found to be δ = 0.97. The psychometric tests presented here show that the VISA-TEEN questionnaire, developed to study the lifestyle of Catalan adolescents, is a valid instrument for this population, whether to understand the role of lifestyle in the health of teenagers or to test the efficacy of health campaigns intended to improve teenagers' lifestyle.
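As an illustration of the internal-consistency statistic reported in the abstract above, the following is a minimal sketch of Cronbach's alpha using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). It is not the authors' code; the function name `cronbach_alpha` and the toy responses are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator). Hypothetical helper for
    illustration only; not taken from the study described above.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer the same k items")

    # Variance of each item (column) across respondents.
    item_var_sum = sum(variance(col) for col in zip(*rows))
    # Variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - item_var_sum / total_var)


# Made-up responses: three respondents, two items.
toy = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

With these toy data each item has variance 1 and the totals (3, 3, 6) have variance 3, so alpha works out to 2 × (1 − 2/3) ≈ 0.667, close to the α = 0.66 the abstract reports.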
Validity of Walk Score® as a measure of neighborhood walkability in Japan.
Koohsari, Mohammad Javad; Sugiyama, Takemi; Hanibuchi, Tomoya; Shibata, Ai; Ishii, Kaori; Liao, Yung; Oka, Koichiro
2018-03-01
Objective measures of environmental attributes have been used to understand how neighborhood environments relate to physical activity. However, this method relies on detailed spatial data, which are often not easily available. Walk Score® is a free, publicly available web-based tool that shows how walkable a given location is based on objectively-derived proximity to several types of local destinations and street connectivity. To date, several studies have tested the concurrent validity of Walk Score as a measure of neighborhood walkability in the USA and Canada. However, it is unknown whether Walk Score is a valid measure in other regions. The current study examined how Walk Score is correlated with objectively-derived attributes of neighborhood walkability, for residential addresses in Japan. Walk Scores were obtained for 1072 residential addresses in urban and rural areas in Japan. Five environmental attributes (residential density, intersection density, number of local destinations, sidewalk availability, and access to public transportation) were calculated using geographic information systems for each address. Pearson's correlation coefficients between Walk Score and these environmental attributes were calculated (conducted in May 2017). Significant positive correlations were observed between Walk Score and environmental attributes relevant to walking. Walk Score was most closely associated with intersection density (r = 0.82) and with the number of local destinations (r = 0.77). Walk Score appears to be a valid measure of neighborhood walkability in Japan. Walk Score will allow urban designers and public health practitioners to identify walkability of local areas without relying on detailed geographic data.
Estimated Student Score Gain on the ACT COMP Exam: Valid Tool for Institutional Assessment?
ERIC Educational Resources Information Center
Banta, Trudy W.; And Others
1987-01-01
An institution can test seniors with the ACT College Outcome Measures Project (COMP) exam, then subtract from the senior score an estimated freshman score. Studies at the University of Tennessee, Knoxville, indicate that this method is not reliable enough to support judgments about the quality of general education programs. (Author/MLW)
ERIC Educational Resources Information Center
Clariana, Roy B.; Wallace, Patricia
2007-01-01
This proof-of-concept investigation describes a computer-based approach for deriving the knowledge structure of individuals and of groups from their written essays, and considers the convergent criterion-related validity of the computer-based scores relative to human rater essay scores and multiple-choice test scores. After completing a…
Awadh, Ammar Ihsan; Hassali, Mohamed Azmi; Al-lela, Omer Qutaiba; Bux, Siti Halimah; Elkalmi, Ramadan M; Hadi, Hazrina
2014-10-27
Parents are the main decision makers for their children's vaccinations. This makes parents' immunization knowledge and practices predictors of immunization uptake and timeliness. The aim of this pilot study was to develop a reliable and valid instrument in the Malaysian language to measure immunization knowledge and practice (KP) of Malaysian parents. A cross-sectional prospective pilot survey was conducted among 88 Malaysian parents who attended public health facilities that provide vaccinations. Translated immunization KP questionnaires (Bahasa Melayu version) were used. Descriptive statistics were applied, face and content validity were assessed, and internal consistency, test-retest reliability, and construct validity were determined. The mean ± standard deviation (SD) of the knowledge scores was 7.36 ± 2.29 and for practice scores was 7.13 ± 2.20. Good internal consistency was found for knowledge and practice items (Cronbach's alpha = 0.757 and 0.743 respectively); the test-retest reliability value was 0.740 (p = 0.014). A panel of three specialist pharmacists who are experts in this field judged the face and content validity of the final questionnaire. Parents whose children's immunizations were up to date had significantly better knowledge and practice scores than parents whose children's were not (p < 0.001 and p = 0.001 respectively), suggesting good construct validity. Knowledge and practice scores differed significantly by parents' age (p = 0.006 and p = 0.029 respectively) and place of living (p = 0.037 and p = 0.043). The parents' knowledge level was positively associated with their practice toward immunization (Spearman's rank correlation coefficient 0.310, p = 0.003). The pilot study concluded that the Bahasa Melayu version of the immunization KP questionnaire has good reliability and validity for measuring the knowledge and practices of Malaysian parents and therefore this version can be used in future research.
ERIC Educational Resources Information Center
Bornstein, Robert F.
2011-01-01
Although definitions of validity have evolved considerably since L. J. Cronbach and P. E. Meehl's classic (1955) review, contemporary validity research continues to emphasize correlational analyses assessing predictor-criterion relationships, with most outcome criteria being self-reports. The present article describes an alternative way of…
Truth and Evidence in Validity Theory
ERIC Educational Resources Information Center
Borsboom, Denny; Markus, Keith A.
2013-01-01
According to Kane (this issue), "the validity of a proposed interpretation or use depends on how well the evidence supports" the claims being made. Because truth and evidence are distinct, this means that the validity of a test score interpretation could be high even though the interpretation is false. As an illustration, we discuss the case of…
Jacob, Robin; Somers, Marie-Andree; Zhu, Pei; Bloom, Howard
2016-06-01
In this article, we examine whether a well-executed comparative interrupted time series (CITS) design can produce valid inferences about the effectiveness of a school-level intervention. This article also explores the trade-off between bias reduction and precision loss across different methods of selecting comparison groups for the CITS design and assesses whether choosing matched comparison schools based only on preintervention test scores is sufficient to produce internally valid impact estimates. We conduct a validation study of the CITS design based on the federal Reading First program as implemented in one state using results from a regression discontinuity design as a causal benchmark. Our results contribute to the growing base of evidence regarding the validity of nonexperimental designs. We demonstrate that the CITS design can, in our example, produce internally valid estimates of program impacts when multiple years of preintervention outcome data (test scores in the present case) are available and when a set of reasonable criteria are used to select comparison organizations (schools in the present case). © The Author(s) 2016.