score reliability coefficients: Topics by Science.gov

Sample records for score reliability coefficients

Coefficient Alpha and Reliability of Scale Scores

ERIC Educational Resources Information Center

Almehrizi, Rashid S.

2013-01-01

The majority of large-scale assessments develop various score scales that are either linear or nonlinear transformations of raw scores for better interpretations and uses of assessment results. The current formula for coefficient alpha (α; the commonly used reliability coefficient) only provides internal consistency reliability estimates of raw…
The reliability of multidimensional neuropsychological measures: from alpha to omega.

PubMed

Watkins, Marley W

To demonstrate that coefficient omega, a model-based estimate, is a more appropriate index of reliability than coefficient alpha for the multidimensional scales that are commonly employed by neuropsychologists. As an illustration, a structural model of an overarching general factor and four first-order factors for the WAIS-IV based on the standardization sample of 2200 participants was identified and omega coefficients were subsequently computed for WAIS-IV composite scores. Alpha coefficients were ≥ .90 and omega coefficients ranged from .75 to .88 for WAIS-IV factor index scores, indicating that the blend of general and group factor variance in each index score created a reliable multidimensional composite. However, the amalgam of variance from general and group factors did not allow the precision of Full Scale IQ (FSIQ) and factor index scores to be disentangled. In contrast, omega hierarchical coefficients were low for all four factor index scores (.10-.41), indicating that most of the reliable variance of each factor index score was due to the general intelligence factor. The omega hierarchical coefficient for the FSIQ score, by comparison, was .84. Meaningful interpretation of WAIS-IV factor index scores as unambiguous indicators of group factors is imprecise, thereby fostering unreliable identification of neurocognitive strengths and weaknesses, whereas the WAIS-IV FSIQ score can be interpreted as a reliable measure of general intelligence. It was concluded that neuropsychologists should base their clinical decisions on reliable scores as indexed by coefficient omega.
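The distinction between omega and omega hierarchical in the abstract above can be illustrated with a minimal sketch. It assumes a bifactor model with standardized loadings and a unit-weighted total score; the function name, the input layout, and the loadings in the usage line are hypothetical and are not taken from the WAIS-IV analysis.

```python
from collections import defaultdict


def omega_from_bifactor(items):
    """Return (omega_total, omega_hierarchical) for a unit-weighted total score.

    `items` is a list of (general_loading, group_id, group_loading) tuples,
    one per item, with standardized loadings (an assumption of this sketch).
    """
    general_sum = sum(g for g, _, _ in items)
    group_sums = defaultdict(float)
    for _, group_id, s in items:
        group_sums[group_id] += s
    # Unique (error) variance of each standardized item.
    unique_var = sum(1.0 - g * g - s * s for g, _, s in items)
    general_var = general_sum ** 2
    group_var = sum(v ** 2 for v in group_sums.values())
    total_var = general_var + group_var + unique_var
    omega_total = (general_var + group_var) / total_var
    omega_hier = general_var / total_var
    return omega_total, omega_hier


# Hypothetical loadings: four items, two group factors.
example = [(0.7, "A", 0.3), (0.7, "A", 0.3), (0.7, "B", 0.3), (0.7, "B", 0.3)]
omega_t, omega_h = omega_from_bifactor(example)
print(round(omega_t, 3), round(omega_h, 3))
```

Omega total counts all common variance (general plus group) as reliable, whereas omega hierarchical counts only the general-factor share, which is why the two can diverge sharply for multidimensional composites.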
[The appraisal of reliability and validity of subjective workload assessment technique and NASA-task load index].

PubMed

Xiao, Yuan-mei; Wang, Zhi-ming; Wang, Mian-zhen; Lan, Ya-jia

2005-06-01

To test the reliability and validity of two mental workload assessment scales, i.e., the Subjective Workload Assessment Technique (SWAT) and the NASA Task Load Index (NASA-TLX). A total of 1,268 mental workers were sampled from various occupations, such as scientific research, education, administration, and medicine, by randomized cluster sampling. Test-retest reliability, split-half reliability, Cronbach's alpha coefficient, and correlation coefficients between item scores and total score were used to assess reliability. Validity testing covered construct validity. The test-retest reliability coefficients of the two scales and their items ranged from 0.516 to 0.753 (P < 0.01), indicating that the two scales had good test-retest reliability. The split-half reliability of SWAT was 0.645, its Cronbach's alpha coefficient was more than 0.80, and all correlation coefficients between its item scores and total score were more than 0.70. For NASA-TLX, both the split-half reliability and Cronbach's alpha coefficient were more than 0.80, and the correlation coefficients between its item scores and total score were all more than 0.60 (P < 0.01) except for the performance item. Both scales had good internal consistency. The Pearson correlation coefficient between the two scales was 0.492 (P < 0.01), implying that the results of the two scales had good consistency. Factor analysis showed that the two scales had good construct validity. Both SWAT and NASA-TLX have good reliability and validity and may be used as valid tools to assess mental workload in China after being revised properly.
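As a rough illustration of the split-half reliability reported above, the following sketch splits items into odd and even halves and applies the Spearman-Brown correction. The odd/even split and the function names are assumptions made for illustration; the abstract does not say how the halves were formed.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(scores):
    """Spearman-Brown corrected split-half reliability.

    `scores` holds one row per respondent and one column per item.
    Items are split by position (1st, 3rd, ... versus 2nd, 4th, ...).
    """
    half_a = [sum(row[0::2]) for row in scores]
    half_b = [sum(row[1::2]) for row in scores]
    r = pearson(half_a, half_b)
    # Step the half-test correlation up to full test length.
    return 2 * r / (1 + r)


# Hypothetical responses: four respondents, two items.
print(split_half_reliability([[1, 2], [2, 1], [3, 4], [4, 3]]))
```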
Understanding a Widely Misunderstood Statistic: Cronbach's "Alpha"

ERIC Educational Resources Information Center

Ritter, Nicola L.

2010-01-01

It is important to explore score reliability in virtually all studies, because tests are not reliable. The present paper explains the most frequently used reliability estimate, coefficient alpha, so that the coefficient's conceptual underpinnings will be understood. Researchers need to understand score reliability because of the possible impact…
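For readers who want to see the statistic itself, a minimal sketch of coefficient alpha follows, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name and the data are hypothetical and do not come from the paper.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a respondents-by-items score matrix.

    `scores` holds one row per respondent and one column per item;
    sample variances (n - 1 denominator) are used throughout.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: four respondents, two items.
print(cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]]))
```

Because alpha is computed from the scores in hand, it describes those scores in that sample rather than being a fixed property of the instrument.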
ScoreRel CI: An Excel Program for Computing Confidence Intervals for Commonly Used Score Reliability Coefficients

ERIC Educational Resources Information Center

Barnette, J. Jackson

2005-01-01

An Excel program developed to assist researchers in the determination and presentation of confidence intervals around commonly used score reliability coefficients is described. The software includes programs to determine confidence intervals for Cronbach's alpha, Pearson r-based coefficients such as those used in test-retest and alternate forms…
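One common way to put a confidence interval around a Pearson r-based coefficient (for example, a test-retest correlation) is the Fisher z transformation. The sketch below shows that textbook approach; it is an assumption that it resembles what the Excel program does, and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r_confidence_interval(r, n, level=0.95):
    """Fisher z confidence interval for a Pearson correlation.

    r: observed correlation, n: sample size (must exceed 3),
    level: confidence level, e.g. 0.95.
    """
    z = math.atanh(r)                       # Fisher z transform
    se = 1.0 / math.sqrt(n - 3)             # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical test-retest correlation of 0.80 from 50 examinees.
low, high = pearson_r_confidence_interval(0.80, 50)
print(round(low, 3), round(high, 3))
```

The interval is asymmetric around r because the transformation is nonlinear; it narrows as n grows.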
Testing the Difference between Reliability Coefficients Alpha and Omega

ERIC Educational Resources Information Center

Deng, Lifang; Chan, Wai

2017-01-01

Reliable measurements are key to social science research. Multiple measures of reliability of the total score have been developed, including coefficient alpha, coefficient omega, the greatest lower bound reliability, and others. Among these, the coefficient alpha has been most widely used, and it is reported in nearly every study involving the…
Attenuation of the Squared Canonical Correlation Coefficient under Varying Estimates of Score Reliability

ERIC Educational Resources Information Center

Wilson, Celia M.

2010-01-01

Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability.…
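The basic idea of attenuation can be shown in the simpler bivariate case with Spearman's classical formula, in which the observed correlation equals the true correlation multiplied by the square root of the product of the two score reliabilities. The sketch below is that bivariate analogue only; it is not the canonical-correlation procedure used in the study, and the function names are hypothetical.

```python
import math


def attenuated_correlation(true_r, rel_x, rel_y):
    """Expected observed correlation given the true correlation and the
    reliabilities of the X and Y scores (classical test theory)."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_correlation(observed_r, rel_x, rel_y):
    """Spearman's correction for attenuation (the inverse of the above)."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Hypothetical values: true r = .50, both reliabilities = .80.
observed = attenuated_correlation(0.50, 0.80, 0.80)
print(round(observed, 3), round(observed ** 2, 3))
```

In this bivariate case the squared correlation shrinks by the product of the two reliabilities, which conveys why lower score reliability depresses squared effect sizes.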
Reliable scar scoring system to assess photographs of burn patients.

PubMed

Mecott, Gabriel A; Finnerty, Celeste C; Herndon, David N; Al-Mousawi, Ahmed M; Branski, Ludwik K; Hegde, Sachin; Kraft, Robert; Williams, Felicia N; Maldonado, Susana A; Rivero, Haidy G; Rodriguez-Escobar, Noe; Jeschke, Marc G

2015-12-01

Several scar-scoring scales exist to clinically monitor burn scar development and maturation. Although scoring scars through direct clinical examination is ideal, scars must sometimes be scored from photographs. No scar scale currently exists for the latter purpose. We modified a previously described scar scale (Yeong et al., J Burn Care Rehabil 1997) and tested the reliability of this new scale in assessing burn scars from photographs. The new scale consisted of three parameters as follows: scar height, surface appearance, and color mismatch. Each parameter was assigned a score of 1 (best) to 4 (worst), generating a total score of 3-12. Five physicians with burns training scored 120 representative photographs using the original and modified scales. Reliability was analyzed using coefficient of agreement, Cronbach alpha, intraclass correlation coefficient, variance, and coefficient of variance. Analysis of variance was performed using the Kruskal-Wallis test. Color mismatch and scar height scores were validated by analyzing actual height and color differences. The intraclass correlation coefficient, the coefficient of agreement, and Cronbach alpha were higher for the modified scale than those of the original scale. The original scale produced more variance than that in the modified scale. Subanalysis demonstrated that, for all categories, the modified scale had greater correlation and reliability than the original scale. The correlation between color mismatch scores and actual color differences was 0.84 and between scar height scores and actual height was 0.81. The modified scar scale is a simple, reliable, and useful scale for evaluating photographs of burn patients. Copyright © 2015 Elsevier Inc. All rights reserved.
Reliability Generalization: Exploring Variation of Reliability Coefficients of MMPI Clinical Scales Scores.

ERIC Educational Resources Information Center

Vacha-Haase, Tammi; Kogan, Lori R.; Tani, Crystal R.; Woodall, Renee A.

2001-01-01

Used reliability generalization to explore the variance of scores on 10 Minnesota Multiphasic Personality Inventory (MMPI) clinical scales drawing on 1,972 articles in the literature on the MMPI. Results highlight the premise that scores, not tests, are reliable or unreliable, and they show that study characteristics do influence scores on the…
Validity and reliability of the Diagnostic Adaptive Behaviour Scale.

PubMed

Tassé, M J; Schalock, R L; Balboni, G; Spreat, S; Navas, P

2016-01-01

The Diagnostic Adaptive Behaviour Scale (DABS) is a new standardised adaptive behaviour measure that provides information for evaluating limitations in adaptive behaviour for the purpose of determining a diagnosis of intellectual disability. This article presents validity evidence and reliability data for the DABS. Validity evidence was based on comparing DABS scores with scores obtained on the Vineland Adaptive Behaviour Scale, second edition. The stability of the test scores was measured using a test and retest, and inter-rater reliability was assessed by computing the inter-respondent concordance. The DABS convergent validity coefficients ranged from 0.70 to 0.84, while the test-retest reliability coefficients ranged from 0.78 to 0.95, and the inter-rater concordance as measured by intraclass correlation coefficients ranged from 0.61 to 0.87. All obtained validity and reliability indicators were strong and comparable with the validity and reliability coefficients of the most commonly used adaptive behaviour instruments. These results and the advantages of the DABS for clinician and researcher use are discussed. © 2015 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Use of Internal Consistency Coefficients for Estimating Reliability of Experimental Tasks Scores

PubMed Central

Green, Samuel B.; Yang, Yanyun; Alt, Mary; Brinkley, Shara; Gray, Shelley; Hogan, Tiffany; Cowan, Nelson

2017-01-01

Reliabilities of scores for experimental tasks are likely to differ from one study to another to the extent that the task stimuli change, the number of trials varies, the type of individuals taking the task changes, the administration conditions are altered, or the focal task variable differs. Given reliabilities vary as a function of the design of these tasks and the characteristics of the individuals taking them, making inferences about the reliability of scores in an ongoing study based on reliability estimates from prior studies is precarious. Thus, it would be advantageous to estimate reliability based on data from the ongoing study. We argue that internal consistency estimates of reliability are underutilized for experimental task data and in many applications could provide this information using a single administration of a task. We discuss different methods for computing internal consistency estimates with a generalized coefficient alpha and the conditions under which these estimates are accurate. We illustrate use of these coefficients using data for three different tasks. PMID:26546100
Reliability of Total Test Scores When Considered as Ordinal Measurements

ERIC Educational Resources Information Center

Biswas, Ajoy Kumar

2006-01-01

This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
Urdu translation of the Hamilton Rating Scale for Depression: Results of a validation study

PubMed Central

Hashmi, Ali M.; Naz, Shahana; Asif, Aftab; Khawaja, Imran S.

2016-01-01

Objective: To develop a standardized validated version of the Hamilton Rating Scale for Depression (HAM-D) in Urdu. Methods: After translation of the HAM-D into the Urdu language following standard guidelines, the final Urdu version (HAM-D-U) was administered to 160 depressed outpatients. Inter-item correlation was assessed by calculating Cronbach alpha. Correlation between HAM-D-U scores at baseline and after a 2-week interval was evaluated for test-retest reliability. Moreover, scores of two clinicians on HAM-D-U were compared for inter-rater reliability. For establishing concurrent validity, scores of HAM-D-U and BDI-U were compared by using Spearman correlation coefficient. The study was conducted at Mayo Hospital, Lahore, from May to December 2014. Results: The Cronbach alpha for HAM-D-U was 0.71. Composite scores for HAM-D-U at baseline and after a 2-week interval were also highly correlated with each other (Spearman correlation coefficient 0.83, p-value < 0.01), indicating good test-retest reliability. Composite scores for HAM-D-U and BDI-U were positively correlated with each other (Spearman correlation coefficient 0.85, p < 0.01), indicating good concurrent validity. Scores of two clinicians for HAM-D-U were also positively correlated (Spearman correlation coefficient 0.82, p-value < 0.01), indicating good inter-rater reliability. Conclusion: The HAM-D-U is a valid and reliable instrument for the assessment of depression. It shows good inter-rater and test-retest reliability. The HAM-D-U can be a tool either for clinical management or research. PMID:28083049
Urdu translation of the Hamilton Rating Scale for Depression: Results of a validation study.

PubMed

Hashmi, Ali M; Naz, Shahana; Asif, Aftab; Khawaja, Imran S

2016-01-01

To develop a standardized validated version of the Hamilton Rating Scale for Depression (HAM-D) in Urdu. After translation of the HAM-D into the Urdu language following standard guidelines, the final Urdu version (HAM-D-U) was administered to 160 depressed outpatients. Inter-item correlation was assessed by calculating Cronbach alpha. Correlation between HAM-D-U scores at baseline and after a 2-week interval was evaluated for test-retest reliability. Moreover, scores of two clinicians on HAM-D-U were compared for inter-rater reliability. For establishing concurrent validity, scores of HAM-D-U and BDI-U were compared by using Spearman correlation coefficient. The study was conducted at Mayo Hospital, Lahore, from May to December 2014. The Cronbach alpha for HAM-D-U was 0.71. Composite scores for HAM-D-U at baseline and after a 2-week interval were also highly correlated with each other (Spearman correlation coefficient 0.83, p-value < 0.01), indicating good test-retest reliability. Composite scores for HAM-D-U and BDI-U were positively correlated with each other (Spearman correlation coefficient 0.85, p < 0.01), indicating good concurrent validity. Scores of two clinicians for HAM-D-U were also positively correlated (Spearman correlation coefficient 0.82, p-value < 0.01), indicating good inter-rater reliability. The HAM-D-U is a valid and reliable instrument for the assessment of depression. It shows good inter-rater and test-retest reliability. The HAM-D-U can be a tool either for clinical management or research.
Reliability and Validity of the TIMPSI for Infants With Spinal Muscular Atrophy Type I

PubMed Central

Krosschell, Kristin J.; Maczulski, Jo Anne; Scott, Charles; King, Wendy; Hartman, Jill T.; Case, Laura E.; Viazzo-Trussell, Donata; Wood, Janine; Roman, Carolyn A.; Hecker, Eva; Meffert, Marianne; Léveillé, Maude; Kienitz, Krista; Swoboda, Kathryn J.

2014-01-01

Purpose: This study examined the reliability and validity of the Test of Infant Motor Performance Screening Items (TIMPSI) in infants with type I spinal muscular atrophy (SMA). Methods: After training, 12 evaluators scored 4 videos of infants with type I SMA to assess interrater reliability. Intrarater and test-retest reliability was further assessed for 9 evaluators during a SMA type I clinical trial, with 9 evaluators testing a total of 38 infants twice. Relatedness of the TIMPSI score to ability to reach and ventilatory support was also examined. Results: Excellent interrater video score reliability was noted (intraclass correlation coefficient, 0.97–0.98). Intrarater reliability was excellent (intraclass correlation coefficient, 0.91–0.98) and test-retest reliability ranged from r = 0.82 to r = 0.95. The TIMPSI score was related to the ability to reach (P ≤ .05). Conclusion: The TIMPSI can reliably be used to assess motor function in infants with type I SMA. In addition, the TIMPSI scores are related to the ability to reach, an important functional skill in children with type I SMA. PMID:23542189
[Validity, reliability, and acceptability of the brief version of the self-management knowledge, attitude, and behavior assessment scale for diabetes patients].

PubMed

Wu, Y Z; Wang, W J; Feng, N P; Chen, B; Li, G C; Liu, J W; Liu, H L; Yang, Y Y

2016-07-06

To evaluate the validity, reliability, and acceptability of the brief version of the self-management knowledge, attitude, and behavior (KAB) assessment scale for diabetes patients. Diabetes patients who were managed at the Xinkaipu Community Health Service Center of Tianxin in Changsha, Hunan Province were selected for survey by cluster sampling. A total of 350 diabetes patients were surveyed using the brief scale to collect data on knowledge, attitudes, and behaviors of self-management. Content validity was evaluated by Pearson correlation coefficient between the brief scale and subscales of knowledge, attitude, and behavior. Construct validity was evaluated by factor analysis, and discriminant validity was evaluated by an independent sample t-test between the high-score and low-score groups. Reliability was tested by internal consistency reliability and split-half reliability. The evaluation indexes of internal consistency reliability were Cronbach's α coefficient, θ coefficient, and Ω coefficient. Acceptability was evaluated by valid response rate and completion time of the brief scale. A total of 346 (98.9%) valid questionnaires were returned, with an average survey time of 11.43 ± 3.4 minutes. Average score of the brief scale was 78.85 ± 11.22; scores of the knowledge, attitude, and behavior subscales were 16.45 ± 4.42, 21.33 ± 2.03, and 41.07 ± 8.34, respectively. Pearson correlation coefficients between the brief scale and the knowledge, attitude, and behavior subscales were 0.92, 0.42, and 0.60, respectively; P-values were all less than 0.01, indicating that the face validity and content validity of the brief scale were achieved to a good level. The common factor cumulative variance contribution rate of the brief scale and three subscales ranged from 53.66% to 61.75%, exceeding the accepted standard of 50%. There were 11 common factors; 41 of the total 42 items had factor loadings above 0.40 in their relevant common factor, indicating that the brief scale and three subscales had good construct validity. Patients were divided into a high-score group and a low-score group, then scores of the brief scale and three subscales were compared between the groups using a t-test. The results were all significant, indicating that the brief scale and three subscales had good discriminant validity. Mean scores of the brief scale and three subscales of the high-score group were 91.55 ± 6.81, 19.51 ± 2.17, 22.74 ± 1.88, and 49.30 ± 6.20, respectively; these were higher than those of the low-score group (65.89 ± 5.79, 12.29 ± 4.76, 20.22 ± 1.88, and 33.39 ± 6.17, respectively), with t-values of 27.76, 13.31, 9.20, and 17.56 (all P-values less than 0.001). The Cronbach's α coefficient, θ coefficient, Ω coefficient, and split-half reliability of the brief scale were 0.83, 0.87, 0.96, and 0.84, respectively. These values for the three subscales were all above 0.70, except for the θ coefficient of the attitude subscale (0.64), indicating that the brief scale and three subscales had acceptable internal consistency reliability. The brief version of the diabetes self-management knowledge, attitude, and behavior assessment scale showed good acceptability, validity, and reliability, and can be used to evaluate self-management KAB among patients with diabetes.
Reliability and validity of the Chinese pediatric voice handicap index.

PubMed

Liu, Kena; Liu, Shaofeng; Zhou, Zhou; Ren, Qinyi; Zhong, Jie; Luo, Renzhong; Qin, Huabiao; Zhang, Siyi; Ge, Pingjiang

2018-02-01

To evaluate the reliability and validity of the Chinese version of the pediatric voice handicap index (pVHI). The original English version of the pVHI was translated into Chinese. Parents of 52 children with dysphonia and 43 children with no history or symptoms of voice problems were asked to complete the Chinese pVHI questionnaire twice, with an interval of 2 weeks. The GRB (Grade, Roughness, Breathiness) scale was used for perceptual assessment of each child's voice by two otolaryngologists and one speech pathologist. The internal consistency was assessed using Cronbach's alpha coefficient. Pearson's correlation coefficient was used to evaluate the test-retest reliability. Kendall's coefficient of concordance W was used to assess the consistency of the GRB scores of the 3 voice specialists. The nonparametric Mann-Whitney test was used to assess the differences between the dysphonia group and controls. The correlation between pVHI and GRB scores was assessed using Pearson's correlation coefficient. The internal consistency coefficients for the total score and three subscale scores of the Chinese pVHI were 0.788-0.944. The test-retest reliability was 0.631-0.887 (P < .001). The pVHI scores of the control group were significantly lower than those of the pathological group (P < .001). The GRB scores of the 3 voice specialists had excellent consistency (W = 0.694-0.807, P < .001). The pVHI scores positively correlated with GRB assessment (P < .01). The Chinese version of the pVHI had good reliability and validity. It can be an applicable and useful supplementary tool for evaluating parents' perception of their children's dysphonia. Copyright © 2017. Published by Elsevier B.V.
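Kendall's coefficient of concordance W, used above to gauge agreement among the three voice specialists, can be sketched as follows. This minimal version ranks each rater's scores (tied scores share an average rank) and applies the basic formula W = 12S / (m²(n³ − n)) without a tie correction, which is a simplification; the function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i..j (1-based)
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for `ratings`: one row per rater, one column per subject.

    No tie correction is applied (a simplification of this sketch).
    """
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[j] for row in ranked) for j in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ratings: three raters scoring four children.
print(kendalls_w([[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]))
```

W ranges from 0 (no agreement in rank order) to 1 (identical rankings by all raters).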
Comparison of Reliability Measures under Factor Analysis and Item Response Theory

ERIC Educational Resources Information Center

Cheng, Ying; Yuan, Ke-Hai; Liu, Cheng

2012-01-01

Reliability of test scores is one of the most pervasive psychometric concepts in measurement. Reliability coefficients based on a unifactor model for continuous indicators include maximal reliability rho and an unweighted sum score-based omega, among many others. With increasing popularity of item response theory, a parallel reliability measure pi…
Estimating Between-Person and Within-Person Subscore Reliability with Profile Analysis.

PubMed

Bulut, Okan; Davison, Mark L; Rodriguez, Michael C

2017-01-01

Subscores are of increasing interest in educational and psychological testing due to their diagnostic function for evaluating examinees' strengths and weaknesses within particular domains of knowledge. Previous studies about the utility of subscores have mostly focused on the overall reliability of individual subscores and ignored the fact that subscores should be distinct and have added value over the total score. This study introduces a profile reliability approach that partitions the overall subscore reliability into within-person and between-person subscore reliability. The estimation of between-person reliability and within-person reliability coefficients is demonstrated using subscores from number-correct scoring, unidimensional and multidimensional item response theory scoring, and augmented scoring approaches via a simulation study and a real data study. The effects of various testing conditions, such as subtest length, correlations among subscores, and the number of subtests, are examined. Results indicate that there is a substantial trade-off between within-person and between-person reliability of subscores. Profile reliability coefficients can be useful in determining the extent to which subscores provide distinct and reliable information under various testing conditions.
The inter and intra rater reliability of the Netball Movement Screening Tool.

PubMed

Reid, Duncan A; Vanweerd, Rebecca J; Larmer, Peter J; Kingstone, Rachel

2015-05-01

To establish the inter- and intra-rater reliability of the Netball Movement Screening Tool (NMST) for screening adolescent female netball players. Inter- and intra-rater reliability study. Forty secondary school netball players were recruited to take part in the study. Twenty subjects were screened simultaneously and independently by two raters to ascertain inter-rater agreement. Twenty subjects were scored by rater one on two occasions, separated by a week, to ascertain intra-rater agreement. Inter- and intra-rater agreement was assessed using the two-way mixed intraclass correlation coefficient (ICC) and weighted kappa statistics. No significant demographic differences were found between the inter- and intra-rater groups of subjects. ICCs demonstrated excellent inter-rater (two-way mixed ICC 0.84, standard error of measurement 0.25) and intra-rater (two-way mixed ICC 0.96, standard error of measurement 0.13) reliability for the overall NMST score, and substantial to excellent inter-rater (two-way mixed ICC 1.0-0.65) and intra-rater (two-way mixed ICC 0.96-0.79) reliability for the component scores of the NMST. The kappa statistic showed substantial to poor inter-rater (k=0.75-0.32) and intra-rater (k=0.77-0.27) agreement for individual tests of the NMST. The NMST may be a reliable screening tool for adolescent netball players; however, the individual test scores have low reliability. The screening tool can be administered reliably by raters with similar levels of training in the tool but variable clinical experience. Ongoing research needs to be undertaken to ascertain whether the NMST is a valid tool for identifying increased injury risk in netball players.
Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
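The abstract above reports a standard error of measurement (SEM) alongside each ICC. A widely used formula is SEM = SD × √(1 − reliability); the sketch below assumes that formula, since the abstract does not state how its SEM values were computed. The function names and numbers are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed, score_sd, reliability, z=1.96):
    """Approximate 95% band around an observed score (observed +/- z * SEM)."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical values: score SD of 10 and a reliability of .84.
print(round(standard_error_of_measurement(10.0, 0.84), 2))
```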

Comparing Fit and Reliability Estimates of a Psychological Instrument Using Second-Order CFA, Bifactor, and Essentially Tau-Equivalent (Coefficient Alpha) Models via AMOS 22

ERIC Educational Resources Information Center

Black, Ryan A.; Yang, Yanyun; Beitra, Danette; McCaffrey, Stacey

2015-01-01

Estimation of composite reliability within a hierarchical modeling framework has recently become of particular interest given the growing recognition that the underlying assumptions of coefficient alpha are often untenable. Unfortunately, coefficient alpha remains the prominent estimate of reliability when estimating total scores from a scale with…
Interrater Reliability Estimators Commonly Used in Scoring Language Assessments: A Monte Carlo Investigation of Estimator Accuracy

ERIC Educational Resources Information Center

Morgan, Grant B.; Zhu, Min; Johnson, Robert L.; Hodge, Kari J.

2014-01-01

Common estimators of interrater reliability include Pearson product-moment correlation coefficients, Spearman rank-order correlations, and the generalizability coefficient. The purpose of this study was to examine the accuracy of estimators of interrater reliability when varying the true reliability, number of scale categories, and number of…
Interobserver Reliability of the Total Body Score System for Quantifying Human Decomposition.

PubMed

Dabbs, Gretchen R; Connor, Melissa; Bytheway, Joan A

2016-03-01

Several authors have tested the accuracy of the Total Body Score (TBS) method for quantifying decomposition, but none have examined the reliability of the method as a scoring system by testing interobserver error rates. Sixteen participants used the TBS system to score 59 observation packets including photographs and written descriptions of 13 human cadavers in different stages of decomposition (postmortem interval: 2-186 days). Data analysis used a two-way random model intraclass correlation in SPSS (v. 17.0). The TBS method showed "almost perfect" agreement between observers, with average absolute correlation coefficients of 0.990 and average consistency correlation coefficients of 0.991. While the TBS method may have sources of error, scoring reliability is not one of them. Individual component scores were examined, and the influences of education and experience levels were investigated. Overall, the trunk component scores were the least concordant. Suggestions are made to improve the reliability of the TBS method. © 2016 American Academy of Forensic Sciences.
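The difference between the "absolute" and "consistency" coefficients mentioned above can be sketched from a two-way ANOVA decomposition. The code computes the single-measure forms, absolute agreement (often labeled ICC(2,1)) and consistency (often labeled ICC(3,1)); whether the study reported single or average measures is not stated in the abstract, and the function name and data are hypothetical.

```python
def icc_two_way(ratings):
    """Single-measure ICCs from a subjects-by-raters matrix.

    Returns (absolute_agreement, consistency).
    `ratings` holds one row per subject and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-subject mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))

    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    absolute = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return absolute, consistency


# Hypothetical scores: rater 2 is always one point higher than rater 1.
print(icc_two_way([[1, 2], [2, 3], [3, 4]]))
```

A constant offset between raters leaves the consistency coefficient untouched but lowers the absolute-agreement coefficient, so the two values being nearly identical suggests little systematic rater bias.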
Assessing the Quality of Mobile Exercise Apps Based on the American College of Sports Medicine Guidelines: A Reliable and Valid Scoring Instrument.

PubMed

Guo, Yi; Bian, Jiang; Leavitt, Trevor; Vincent, Heather K; Vander Zalm, Lindsey; Teurlings, Tyler L; Smith, Megan D; Modave, François

2017-03-07

Regular physical activity can not only help with weight management, but also lower cardiovascular risks, cancer rates, and chronic disease burden. Yet, only approximately 20% of Americans currently meet the physical activity guidelines recommended by the US Department of Health and Human Services. With the rapid development of mobile technologies, mobile apps have the potential to improve participation rates in exercise programs, particularly if they are evidence-based and are of sufficient content quality. The goal of this study was to develop and test an instrument, which was designed to score the content quality of exercise program apps with respect to the exercise guidelines set forth by the American College of Sports Medicine (ACSM). We conducted two focus groups (N=14) to elicit input for developing a preliminary 27-item scoring instrument based on the ACSM exercise prescription guidelines. Three reviewers who were not sports medicine experts independently scored 28 exercise program apps using the instrument. Inter- and intra-rater reliability was assessed among the 3 reviewers. An expert reviewer, a Fellow of the ACSM, also scored the 28 apps to create criterion scores. Criterion validity was assessed by comparing nonexpert reviewers' scores to the criterion scores. Overall, inter- and intra-rater reliability was high with most coefficients being greater than .7. Inter-rater reliability coefficients ranged from .59 to .99, and intra-rater reliability coefficients ranged from .47 to 1.00. All reliability coefficients were statistically significant. Criterion validity was found to be excellent, with the weighted kappa statistics ranging from .67 to .99, indicating a substantial agreement between the scores of expert and nonexpert reviewers. Finally, all apps scored poorly against the ACSM exercise prescription guidelines. None of the apps received a score greater than 35, out of a possible maximal score of 70. We have developed and presented valid and reliable scoring instruments for exercise program apps. Our instrument may be useful for consumers and health care providers who are looking for apps that provide safe, progressive general exercise programs for health and fitness. ©Yi Guo, Jiang Bian, Trevor Leavitt, Heather K Vincent, Lindsey Vander Zalm, Tyler L Teurlings, Megan D Smith, François Modave. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 07.03.2017.
Reliability of Summed Item Scores Using Structural Equation Modeling: An Alternative to Coefficient Alpha

ERIC Educational Resources Information Center

Green, Samuel B.; Yang, Yanyun

2009-01-01

A method is presented for estimating reliability using structural equation modeling (SEM) that allows for nonlinearity between factors and item scores. Assuming the focus is on consistency of summed item scores, this method for estimating reliability is preferred to those based on linear SEM models and to the most commonly reported estimate of…
Psychometric Inferences from a Meta-Analysis of Reliability and Internal Consistency Coefficients

ERIC Educational Resources Information Center

Botella, Juan; Suero, Manuel; Gambara, Hilda

2010-01-01

A meta-analysis of the reliability of the scores from a specific test, also called reliability generalization, allows the quantitative synthesis of its properties from a set of studies. It is usually assumed that part of the variation in the reliability coefficients is due to some unknown and implicit mechanism that restricts and biases the…
Estimating the Reliability of a Test Battery Composite or a Test Score Based on Weighted Item Scoring

ERIC Educational Resources Information Center

Feldt, Leonard S.

2004-01-01

In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
A Flexible Latent Class Approach to Estimating Test-Score Reliability

ERIC Educational Resources Information Center

van der Palm, Daniël W.; van der Ark, L. Andries; Sijtsma, Klaas

2014-01-01

The latent class reliability coefficient (LCRC) is improved by using the divisive latent class model instead of the unrestricted latent class model. This results in the divisive latent class reliability coefficient (DLCRC), which unlike LCRC avoids making subjective decisions about the best solution and thus avoids judgment error. A computational…
Large Sample Confidence Intervals for Item Response Theory Reliability Coefficients

ERIC Educational Resources Information Center

Andersson, Björn; Xin, Tao

2018-01-01

In applications of item response theory (IRT), an estimate of the reliability of the ability estimates or sum scores is often reported. However, analytical expressions for the standard errors of the estimators of the reliability coefficients are not available in the literature and therefore the variability associated with the estimated reliability…
Skin colour assessment of replanted fingers in digital images and its reliability for the incorporation of images in nursing progress notes.

PubMed

Terashima, Taiko; Yoshimura, Sadako

2018-03-01

To determine whether nurses can accurately assess the skin colour of replanted fingers displayed as digital images on a computer screen. Colour measurement and clinical diagnostic methods for medical digital images have been studied, but reproducing skin colour on a computer screen remains difficult. The inter-rater reliability of skin colour assessment scores was evaluated. In May 2014, 21 nurses who worked on a trauma ward in Japan participated in testing. Six digital images with different skin colours were used. Colours were scored from both digital images and direct patient observation. The score from a digital image was defined as the test score, and its difference from the direct assessment score as the difference score. Intraclass correlation coefficients were calculated. Nurses' opinions were classified and summarised. The intraclass correlation coefficients for the test scores were fair. Although the intraclass correlation coefficients for the difference scores were poor, they improved to good when three images that might have contributed to poor reliability were excluded. Most nurses stated that it is difficult to assess skin colour in digital images; they did not think it could be a substitute for direct visual assessment. However, most nurses were in favour of including images in nursing progress notes. Although the inter-rater reliability was fairly high, the reliability of colour reproduction in digital images as indicated by the difference scores was poor. Nevertheless, nurses expect the incorporation of digital images in nursing progress notes to be useful. This gap between the reliability of digital colour reproduction and nurses' expectations towards it must be addressed. High inter-rater reliability for digital images in nursing progress notes was not observed. Assessments of future improvements in colour reproduction technologies are required. Further digitisation and visualisation of nursing records might pose challenges.
© 2017 John Wiley & Sons Ltd.
Intra- and inter-observer reliability of ten major histological scoring systems used for the evaluation of in vivo cartilage repair.

PubMed

Bonasia, Davide Edoardo; Marmotti, Antongiulio; Massa, Alessandro Domenico Felice; Ferro, Andrea; Blonna, Davide; Castoldi, Filippo; Rossi, Roberto

2015-09-01

In the last two decades, many surgical techniques have been described for articular cartilage repair. Reliable histological scoring systems are fundamental tools to evaluate new procedures. Several histological scoring systems have been described, and these can be divided into elementary and comprehensive scores, according to the number of sub-items. The aim of this study was to test the inter- and intra-observer reliability of ten main scores used for the histological evaluation of in vivo cartilage repair. The authors tested the starting hypothesis that elementary scores would show superior intra- and inter-observer reliability compared with comprehensive scores. Fifty histological sections obtained from the trochlea of New Zealand rabbits and stained with Safranin-O fast green were used. The histological sections were analysed by 4 observers: 2 experienced in cartilage histology and 2 inexperienced. Histological evaluations were performed at time 1 and time 2, separated by a 30-day interval. The following scores were used: Mankin, O'Driscoll, Pineda, Wakitani, Fortier, Sellers, ICRS, ICRSII, Oswestry (OsScore) and modified O'Driscoll. Intra- and inter-observer reliability were evaluated for each score. In addition, the floor-ceiling effect and the Bland-Altman Coefficient of Repeatability were evaluated for each sub-item of every score. Intra-observer reliability was high for all observers in every score, even though the reliability was significantly lower for non-expert observers compared with expert counterparts. In terms of Coefficient of Repeatability, some scores performed better (O'Driscoll, Modified O'Driscoll and ICRSII) than others (Fortier, Sellers). Inter-observer reliability was high for all observers in every score, but significantly lower for non-expert compared with expert observers. In expert hands, all the scores showed high intra- and inter-observer reliability, independently of the complexity.
Although every score has advantages and disadvantages, ICRSII, O'Driscoll and Modified O'Driscoll scores should be preferred for the evaluation of in vivo cartilage repair in animal models.
The Reliability and Validity of Scores from the Children's Version of the Perception of Success Questionnaire.

ERIC Educational Resources Information Center

Liukkonen, Jarmo; Leskinen, Esko

1999-01-01

Analyzed the reliability and validity of scores of 557 14-year-old Finnish male soccer players on the children's version of the Perception of Success Questionnaire (G. Roberts and others, 1998). Internal consistency coefficients for the two subscales' scores were high, and scores on both scales had strong construct validity. (LSD)
Basic Concepts in Classical Test Theory: Tests Aren't Reliable, the Nature of Alpha, and Reliability Generalization as a Meta-analytic Method.

ERIC Educational Resources Information Center

Helms, LuAnn Sherbeck

This paper discusses the fact that reliability is about scores and not tests and how reliability limits effect sizes. The paper also explores the classical reliability coefficients of stability, equivalence, and internal consistency. Stability is concerned with how stable test scores will be over time, while equivalence addresses the relationship…
Test-Retest Reliability and Minimal Detectable Change of Randomized Dichotic Digits in Learning-Disabled Children: Implications for Dichotic Listening Training.

PubMed

Mahdavi, Mohammad Ebrahim; Pourbakht, Akram; Parand, Akram; Jalaie, Shohreh

2018-03-01

Evaluation of dichotic listening to digits is a common part of many studies for diagnosis and managing auditory processing disorders in children. Previous researchers have verified test-retest relative reliability of dichotic digits results in normal children and adults. However, detecting intervention-related changes in the ear scores after dichotic listening training requires information regarding trial-to-trial typical variation of individual ear scores that is estimated using indices of absolute reliability. Previous studies have not addressed absolute reliability of dichotic listening results. To compare the results of the Persian randomized dichotic digits test (PRDDT) and its relative and absolute indices of reliability between typically achieving (TA) and learning-disabled (LD) children. A repeated measures observational study. Fifteen LD children were recruited from a previously performed study with age range of 7-12 yr. The control group consisted of 15 TA schoolchildren with age range of 8-11 yr. The Persian randomized dichotic digits test was administered to the children under free recall condition in two test sessions 7-12 days apart. We compared the average of the ear scores and ear advantage between TA and LD children. Relative indices of reliability included Pearson's correlation and intraclass correlation, ICC(2,1), coefficients, and absolute reliability was evaluated by calculation of standard error of measurement (SEM) and minimal detectable change (MDC) using the raw ear scores. The Pearson correlation coefficient indicated that in both groups of children the ear scores of test and retest sessions were strongly and positively (greater than +0.8) correlated. The ear scores showed excellent ICC coefficient of consistency (0.78-0.82) and fair to excellent ICC coefficient of absolute agreement (0.62-0.74) in TA children and excellent ICC coefficients of consistency and absolute agreement in LD children (0.76-0.87).
SEM and SEM% of the ear scores in TA children were 1.46 and 1.44% for the right ear and 4.68 and 5.47% for the left ear. SEM and SEM% of the ear scores in LD children ranged from 4.55 and 5.88% for the right ear to 7.56 and 12.81% for the left ear. MDC and MDC% of the ear scores in TA children varied from 4.03 and 3.99% for the right ear to 12.93 and 15.13% for the left ear. MDC and MDC% of the ear scores in LD children varied from 12.57 and 16.25% for the right ear to 20.89 and 35.39% for the left ear. The LD children showed test-retest relative reliability as high as that of TA children in the ear scores measured by the PRDDT. However, within-subject variations of the ear scores calculated by indices of absolute reliability were considerably higher in LD children than in TA children. The results of the current study could have implications for detecting real training-related changes in the ear scores. American Academy of Audiology
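As an illustration of how absolute reliability indices of this kind are commonly derived, the sketch below uses the widely cited textbook formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The abstract does not state which exact formulas the authors used, so this is an assumption; the function names `sem_from_icc` and `mdc95` are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a score SD and a reliability (ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% level: 1.96 * sqrt(2) * SEM.

    sqrt(2) accounts for measurement error in both the test and the retest.
    """
    return 1.96 * math.sqrt(2.0) * sem


# Reported SEM for the right ear in TA children (from the abstract).
print(round(mdc95(1.46), 2))  # 4.05
```

Under these assumed formulas, the reported SEM of 1.46 yields an MDC of about 4.05, close to the reported 4.03; the small gap may reflect rounding or a slightly different multiplier in the original analysis.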
[Evaluation on the validity and reliability of the Diabetes Self-management Knowledge, Attitude, and Behavior Assessment Scale (DSKAB)].

PubMed

Liu, Xiaoli; Dai, Long; Chen, Bo; Feng, Nongping; Wu, Qianhui; Lin, Yonghai; Zhang, Lan; Tan, Dong; Zhang, Jinhua; Tu, Huijuan; Li, Changfeng; Wang, Wenjuan

2016-01-01

To evaluate the validity and reliability of the Diabetes Self-management Knowledge, Attitude, and Behavior Assessment Scale (DSKAB). We selected 460 patients with diabetes in the community and administered the scale, which had been developed through two rounds of the Delphi method and a pilot study. Investigators surveyed the patients face to face. By drawing lots, we randomly selected 25 of the community patients with diabetes for a repeat survey after one week. The validity analyses included face validity, content validity, construct validity and discriminant validity. The reliability analyses included Cronbach's α coefficient, θ coefficient, Ω coefficient, split-half reliability and test-retest reliability. A total of 460 questionnaires were distributed, 442 were returned, and 432 were valid. The score of the scale was 254.59 ± 28.90, and the scores of the knowledge, attitude, and behavior sub-scales were 82.44 ± 11.24, 63.53 ± 5.77 and 108.61 ± 17.55, respectively. The scale had excellent face validity and content validity. The correlation coefficients among the three sub-scales and the scale ranged from 0.71 to 0.91 (P<0.001). The cumulative variance contribution rate of the common factors for the scale and the three sub-scales ranged from 57.28% to 67.19%, exceeding the accepted standard of 50%; there were 25 common factors, and 91 of the 98 items had factor loadings ≥0.40 on their relevant common factor, indicating good construct validity. The scores of the high-score and low-score groups on the three sub-scales were: knowledge, 91.12 ± 3.62 and 69.96 ± 11.20; attitude, 68.75 ± 4.51 and 58.79 ± 4.87; behavior, 129.38 ± 8.53 and 89.65 ± 11.34. The mean scores of the three sub-scales differed significantly between the high-score and low-score groups (t values of -19.45, -16.24 and -30.29, respectively; P<0.001), indicating good discriminant validity.
The Cronbach's α coefficient of the scale and the three sub-scales ranged from 0.79 to 0.93, the θ coefficient from 0.86 to 0.95, the Ω coefficient from 0.90 to 0.98, and the split-half reliability from 0.89 to 0.95. The test-retest reliability of the scale was 0.51, and that of the three sub-scales ranged from 0.46 to 0.52 (P<0.05). The validity and reliability of the Diabetes Self-management Knowledge, Attitude, and Behavior Assessment Scale are excellent, making it a suitable instrument for evaluating self-management in patients with diabetes.
The validity and reliability of the Thai version of the Kujala score for patients with patellofemoral pain syndrome.

PubMed

Apivatgaroon, Adinun; Angthong, Chayanin; Sanguanjit, Prakasit; Chernchujit, Bancha

2016-10-01

To develop a Thai version of the Kujala score and show the evaluation of the validity and reliability of the score. The Thai version of the Kujala score was developed using the forward-backward translation protocol. The 49 PFPS patients answered the Thai version of questionnaires including the Kujala score, Short Form-36 (SF-36) and International Knee Documentation Committee (IKDC) Subjective Knee Form. The validity between the scores has been tested. The reliability was assessed using test-retest reliability and internal consistency. The Thai version of the Kujala score showed a good correlation with Thai IKDC Subjective Knee Form (Pearson's correlation coefficient; r = 0.74: p < 0.01) and moderate correlation with the Thai SF-36 subscales of physical component summary, total score and role physical (r = 0.586, 0.571 and 0.524, respectively: p < 0.01). The test-retest reliability was excellent with an intra-class correlation coefficient of 0.908 (p < 0.001; 95% CI [0.842-0.947]). The internal consistency was strong with Cronbach's alpha of 0.952 (p < 0.001). No floor and ceiling effects were observed. The Thai version of the Kujala score has shown good validity and reliability. This score can be effectively used for evaluating Thai patients with patellofemoral pain syndrome. Implications for Rehabilitation The Kujala score is a self-administered questionnaire for patients with patellofemoral pain syndrome (PFPS). The validity and reliability of the Thai version of Kujala are compatible with other versions (Turkish, Chinese and Persian version). The Thai version of Kujala has been shown to have validity and reliability in Thai PFPS patients and can be used for clinical evaluation and also in the research work.
[Validity, reliability, and acceptability of the scale of knowledge, attitude, and behavior of lifestyle intervention in a diabetes high-risk population].

PubMed

Wang, W J; Dong, J; Ren, Z P; Chen, B; He, W; Li, W D; Hao, Z W

2016-07-06

To evaluate the validity, reliability, and acceptability of the scale of knowledge, attitude, and behavior of lifestyle intervention in a diabetes high-risk population (HILKAB), and provide scientific evidence for its usage. By convenience sampling, we selected 406 individuals at high risk for diabetes for survey using the HILKAB. Pearson correlation coefficients, factor analysis, and independent-samples t-tests for high- and low-score groups were used to evaluate the content validity, construct validity, and discriminant validity of the scale. Reliability of the scale was evaluated by internal consistency, which included Cronbach's α coefficient, θ coefficient, Ω coefficient, and split-half reliability. Scale acceptability was evaluated by acceptance rate and completion time of the survey. In this study, 366 questionnaires (90.1%) were valid and the completion time was (8.62±2.79) minutes. Scores for knowledge, attitude, and behavior were 10.60±3.73, 26.56±3.58, 17.09±9.74, respectively. The scale had good face validity and content validity. The correlation coefficient of items and the dimension to which they belong was between 0.25 and 0.97, and the correlation coefficient of three dimensions and the entire scale was between 0.64 and 0.91, all with P<0.001. Factor analysis of the scale extracted eight common factors. The cumulative variance contribution rate was 65.23%, thereby reaching the 50% approved standard. Of 30 items there were 29 items with factor loadings ≥0.40, indicating the scale had good construct validity. For the high-score group, scores for knowledge, attitude, and behavior dimensions were 13.89±2.55, 29.56±2.46, 28.05±2.93, respectively, which were higher than those for the low-score group (7.67±2.78, 23.89±3.35, 6.25±3.13); t-values were 55.14, 119.40, 95.29, respectively, with P<0.001. The scale consisted of three dimensions: knowledge, attitude, and behavior.
The Cronbach's α coefficient was between 0.84 and 0.92, the θ coefficient was between 0.85 and 0.96, the Ω coefficient was between 0.90 and 0.94, and the split-half reliability was between 0.77 and 0.95, meeting the 0.70 standard. The validity, reliability, and acceptability of the HILKAB scale were satisfactory for use in a population at high risk of diabetes.
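For readers unfamiliar with split-half reliability, one common way to obtain it is to correlate the sums of two halves of the items and step the result up to full test length with the Spearman-Brown formula, 2r / (1 + r). The record above does not say how the halves were formed or whether this correction was applied, so the odd/even split and the function names below are assumptions made purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to full-test reliability."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(rows):
    """rows: one list of item scores per respondent (odd/even item split)."""
    odd = [sum(r[0::2]) for r in rows]   # items 1, 3, 5, ...
    even = [sum(r[1::2]) for r in rows]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


print(round(spearman_brown(0.5), 3))  # 0.667
```

A half-test correlation of 0.5 corresponds to a full-length reliability of about 0.667, which shows why uncorrected half correlations understate the reliability of the whole scale.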
Reliability of the ecSatter Inventory as a tool to measure eating competence.

PubMed

Stotts, Jodi L; Lohse, Barbara

2007-01-01

To examine the reliability of the ecSatter Inventory (ecSI), a measure of eating competence. Self-report questionnaires were administered in person or by mail. Retesting occurred 2 to 6 weeks after completion of the first questionnaire. Both administrations of the questionnaire were completed by 259 participants who were mostly food secure, white females with some college education; mean age was 26.9 +/- 10.4 years. Test-retest reliability and internal consistency. Spearman's rank correlation coefficients to estimate test-retest reliability and Cronbach alpha coefficients to estimate internal consistency. Spearman's rank correlation coefficient for ecSI total score was 0.68; subscale coefficients were 0.70 for eating attitudes, 0.70 for contextual skills, 0.65 for food acceptance, and 0.52 for internal regulation. Cronbach alpha coefficient for ecSI total score was 0.77. Subscale alpha coefficients were 0.80 for eating attitudes, 0.69 for contextual skills, 0.68 for food acceptance, and 0.66 for internal regulation. This study provides psychometric evidence about the reliability of ecSI as a measure of eating competence in this sample. Although some ecSI items may require revision, results suggest that the instrument may be used to evaluate nutrition education designed to improve eating competence.
Is Coefficient Alpha Robust to Non-Normal Data?

PubMed Central

Sheng, Yanyan; Sheng, Zhaohui

2011-01-01

Coefficient alpha has been a widely used measure by which internal consistency reliability is assessed. In addition to essential tau-equivalence and uncorrelated errors, normality has been noted as another important assumption for alpha. Earlier work on evaluating this assumption considered either exclusively non-normal error score distributions, or limited conditions. In view of this and the availability of advanced methods for generating univariate non-normal data, Monte Carlo simulations were conducted to show that non-normal distributions for true or error scores do create problems for using alpha to estimate the internal consistency reliability. The sample coefficient alpha is affected by leptokurtic true score distributions, or skewed and/or kurtotic error score distributions. Increased sample sizes, not test lengths, help improve the accuracy, bias, or precision of using it with non-normal data. PMID:22363306
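For reference, the sample coefficient alpha discussed in this record is conventionally computed as α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below is a minimal standard-library implementation of that textbook formula, not the simulation code used by the authors; the function name `cronbach_alpha` and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Sample coefficient alpha.

    rows: one list of item scores per respondent; all rows have k >= 2 items.
    Uses sample (n - 1) variances for both items and total scores.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer the same k items")
    item_var_sum = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)


# Toy data: 3 respondents, 2 items (values are illustrative only).
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

Because alpha depends only on item and total-score variances, distributional problems such as skew or heavy tails enter through how well those sample variances estimate their population counterparts, which is the issue the simulations in this record examine.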
Increasing reliability of APACHE II scores in a medical-surgical intensive care unit: a quality improvement study.

PubMed

Donahoe, Laura; McDonald, Ellen; Kho, Michelle E; Maclennan, Margaret; Stratford, Paul W; Cook, Deborah J

2009-01-01

Given their clinical, research, and administrative purposes, scores on the Acute Physiology and Chronic Health Evaluation (APACHE) II should be reliable, whether calculated by health care personnel or a clinical information system. To determine reliability of APACHE II scores calculated by a clinical information system and by health care personnel before and after a multifaceted quality improvement intervention. APACHE II scores of 37 consecutive patients admitted to a closed, 15-bed, university-affiliated intensive care unit were collected by a research coordinator, a database clerk, and a clinical information system. After a quality improvement intervention focused on health care personnel and the clinical information system, the same methods were used to collect data on 32 consecutive patients. The research coordinator and the clerk did not know each other's scores or the information system's score. The data analyst did not know the source of the scores until analysis was complete. APACHE II scores obtained by the clerk and the research coordinator were highly reliable (intraclass correlation coefficient, 0.88 before vs 0.80 after intervention; P = .25). No significant changes were detected after the intervention; however, compared with scores of the research coordinator, the overall reliability of APACHE II scores calculated by the clinical information system improved (intraclass correlation coefficient, 0.24 before intervention vs 0.91 after intervention, P < .001). After completion of a quality improvement intervention, health care personnel and a computerized clinical information system calculated sufficiently reliable APACHE II scores for clinical, research, and administrative purposes.

A Reliability Generalization Meta-Analysis of Coefficient Alpha for the Maslach Burnout Inventory

ERIC Educational Resources Information Center

Wheeler, Denna L.; Vassar, Matt; Worley, Jody A.; Barnes, Laura L. B.

2011-01-01

The purpose of this study was to synthesize internal consistency reliability for the subscale scores on the Maslach Burnout Inventory (MBI). The authors addressed three research questions: (a) What is the mean subscale score reliability for the MBI across studies? (b) What factors are associated with observed variance in MBI subscale score…
The Pittsburgh Sleep Quality Index: validation of the Urdu translation.

PubMed

Hashmi, Ali Madeeh; Khawaja, Imran Shuja; Butt, Zeeshan; Umair, Muhammad; Naqvi, Suhaib Haider; Jawad-Ul-Haq

2014-02-01

To translate and validate the Pittsburgh Sleep Quality Index (PSQI), a standardized self-administered questionnaire for the assessment of subjective sleep quality into the Urdu language. Validation study. Mayo Hospital, Lahore, from March to April 2012. The PSQI was translated into Urdu following standard guidelines. The final Urdu version (PSQI-U) was administered to 200 healthy volunteers comprising medical students, nursing staff and doctors. Inter-item correlation was assessed by calculating Cronbach alpha. Correlation of component scores with global score was assessed by calculating Spearman correlation coefficient. Correlation between global PSQI-U scores at baseline with global scores for each PSQI-U and PSQI-E at a 4-week interval was evaluated by calculating Spearman correlation coefficient. Moreover, scores on individual items of the scale at baseline were compared with respective scores after 4 weeks by t-test. One hundred and eighty-five (185) participants completed the PSQI-U at baseline. The Cronbach alpha for PSQI-U was 0.56. Scores on individual components of the PSQI-U and composite scores were all highly correlated with each other (all p-values < 0.01). Composite scores for PSQI-U at baseline and PSQI-E at a 4-week interval were also highly correlated with each other (Spearman correlation coefficient 0.74, p-value < 0.01) indicating good linguistic interchangeability. Composite scores for PSQI-U at baseline and at a 4-week interval were positively correlated with each other (Spearman correlation coefficient 0.70, p < 0.01) indicating good test-retest reliability. The PSQI-U is a valid and reliable instrument for the assessment of sleep quality. It shows good linguistic interchangeability and test-retest reliability in comparison to the original English version when applied to individuals who speak the Urdu language. The PSQI-U can be a tool either for clinical management or research.
Validity and Reliability of the Golombok Rust Inventory of Sexual Satisfaction in Couples with Incontinent Partners.

PubMed

Lim, Renly; Liong, Men Long; Khan, Nurzalina Abdul Karim; Yuen, Kah Hay

2017-02-17

There is currently no published information on the validity and reliability of the Golombok Rust Inventory of Sexual Satisfaction in the Asian population, specifically in patients with stress urinary incontinence, which limits its use in this region. Our study aimed to evaluate the psychometric properties of this questionnaire in the Malaysian population. Ten couples were recruited for the pilot testing. The agreement between the English and Chinese or Malay versions were tested using the intraclass correlation coefficients, with results of more than 0.80 for all subscales and overall scores indicating good agreement. Sixty-six couples were included in the subsequent phase. The following data are presented in the order of English, Chinese, and Malay. Cronbach's alphas for the male total score were 0.82, 0.88, and 0.95. For the female total score, Cronbach's alphas were 0.76, 0.78, and 0.88. Intraclass correlation coefficients for the male total score were 0.93, 0.94, and 0.99, while intraclass correlation coefficients for the female total score were 0.89, 0.86, and 0.88. In conclusion, the English, Chinese, and Malay versions each proved to be valid and reliable in our Malaysian population.
Reliability and validity of a questionnaire for self-assessment of complete dentures.

PubMed

Komagamine, Yuriko; Kanazawa, Manabu; Kaiba, Yoshinori; Sato, Yusuke; Minakuchi, Shunsuke

2014-05-02

Demand for complete denture treatment is expected to rise over several decades. However, to date, no questionnaire on complete dentures, as evaluated by edentulous patients, has been shown to be reliable and valid. This study sought to assess the reliability and validity of Patient's Denture Assessment (PDA), which provides a multidimensional evaluation of dentures among edentulous patients. Patients who had new complete dentures fabricated at the University Hospital of Dentistry, Tokyo Medical and Dental University from 2009 to 2010 were enrolled. The reliability of the PDA was determined by examining internal consistency and test-retest reliability. Internal consistency for all of the question items and the six subscales was measured using Cronbach's α and average inter-item correlation coefficients among 93 participants. For 33 of these participants, test-retest reliability was determined at a 2-month interval using the intraclass correlation coefficients (ICCs) and 95% confidence interval for the summary scores and the six subscale scores. The PDA was validated in 93 participants by examining the difference in the summary score and the six subscale scores of the PDA before and after replacement with new dentures by the paired t-test. Ability to detect change was also tested in 93 patients using effect size. The Cronbach's α for the PDA ranged from 0.56 to 0.93. The average inter-item correlation coefficients ranged from 0.28 to 0.83. ICCs for the PDA ranged from 0.37 to 0.83. The paired t-test showed a significant difference between the summary score and the six subscale scores before and after replacement with new dentures (p < 0.05) and the effect size was 0.97. The PDA demonstrated good reliability by assessing internal consistency and test-retest reliability. In addition, the PDA demonstrated good validity by assessing discriminant validity.
Thus, the PDA could help dentists obtain a detailed understanding of the patients' perceptions in using their dentures.
Test-retest reliability of the safe driving behavior measure for community-dwelling elderly drivers.

PubMed

Song, Chiang-Soon; Lee, Joo-Hyun; Han, Sang-Woo

2016-06-01

[Purpose] The Safe Driving Behavior Measure (SDBM) is a self-report measurement tool that assesses the safe-driving behaviors of the elderly. The purpose of this study was to evaluate the test-retest reliability of the SDBM among community-dwelling elderly drivers. [Subjects and Methods] A total of sixty-one community-dwelling elderly were enrolled to investigate the reliability of the SDBM. The SDBM was assessed in two sessions that were conducted three days apart in a quiet and well-organized assessment room. The test-retest reliability of the overall scores and three domain scores of the SDBM was statistically evaluated using intraclass correlation coefficients [ICC(2,1)]. Pearson correlation coefficients were used to quantify bivariate associations among the three domains of the SDBM. [Results] The SDBM demonstrated excellent test-retest reliability for community-dwelling elderly drivers. The Cronbach alpha coefficients of the three domains of person-vehicle (0.979), person-environment (0.944), and person-vehicle-environment (0.971) of the SDBM indicate high internal consistency. [Conclusion] The results of this study suggest that the SDBM is a reliable measure for evaluating the safe driving of automobiles by community-dwelling elderly, and is adequate for detecting changes in scores in clinical settings.
[Validating the Spanish version of the Nursing Activities Score].

PubMed

Sánchez-Sánchez, M M; Arias-Rivera, S; Fraile-Gamo, M P; Thuissard-Vasallo, I J; Frutos-Vivar, F

2015-01-01

Validating workload scores ensures that they are appropriate for the purpose for which they were developed. To validate the Nursing Activities Score (NAS) Spanish version. Observational and prospective study. 1,045 patients who were admitted to a medical-surgical unit and a serious burns unit in 2006 were included. The nurse in charge assessed patient workloads by Nine Equivalent of Nursing Manpower use Score and NAS. To assess the internal consistency of the measurements of NAS, item-test correlations, Cronbach's α and Cronbach's α corrected by omitting each of the items were calculated. The intraobserver and interobserver reliability were assessed with the intraclass correlation coefficient by viewing recordings and Kappa (interobserver reliability) was estimated. For the analysis of internal validity, a factorial principal components analysis was performed. Convergent validity was assessed using the Spearman correlation coefficient values obtained from the Nine Equivalent of Nursing Manpower use Score and Spanish-NAS scales. For internal consistency, 164 questionnaires were analysed and a Cronbach's α of 0.373 was calculated. The intraclass correlation coefficient for intraobserver reliability estimate was 0.837 (95% CI: 0.466-0.950) and 0.662 (95% CI: 0.033-0.882) for interobserver reliability. The estimated kappa was 0.371. For internal validity, exploratory factor analysis showed that the first item explained 58.9% of the variance of the questionnaire. For convergent validity 1006 questionnaires were included and a Spearman correlation coefficient of 0.746 was observed. The psychometric properties of Spanish-NAS are acceptable. Copyright © 2014 Elsevier España, S.L.U. y SEEIUC. All rights reserved.
High inter-rater reliability, agreement, and convergent validity of Constant score in patients with clavicle fractures.

PubMed

Ban, Ilija; Troelsen, Anders; Kristensen, Morten Tange

2016-10-01

The Constant score (CS) has been the primary endpoint in most studies on clavicle fractures. However, the CS was not developed to assess patients with clavicle fractures. Our aim was to examine inter-rater reliability and agreement of the CS in patients with clavicle fractures. The secondary aim was to estimate the correlation between the CS and the Disabilities of the Arm, Shoulder and Hand score and the internal consistency of the 2 scores. On the basis of sample sizing, 36 patients (31 male and 5 female patients; mean age, 41.3 years) with clavicle fractures underwent standardized CS assessment at a mean of 6.8 weeks (SD, 1.0 weeks) after injury. Reliability and agreement of the CS were determined by 2 raters. The intraclass correlation coefficient (ICC2,1), standard error of measurement, minimal detectable change, Cronbach α coefficient, and Pearson correlation coefficient were estimated. Inter-rater reliability of the total CS was excellent (intraclass correlation coefficient, 0.94; 95% confidence interval, 0.88-0.97), with no systematic difference between the 2 raters (P = .75). The standard error of measurement (measurement error at the group level) was 4.9, whereas the minimal detectable change (smallest change needed to indicate a real change for an individual) was 13.6 CS points. The internal consistency of the 10 CS items was good, with a Cronbach α of .85, and we found a strong correlation (r = -0.92) between the CS and Disabilities of the Arm, Shoulder and Hand score. The CS was found to be reliable for assessing patients with clavicle fractures, especially at the group level. With high inter-rater reliability and agreement, in addition to good internal consistency, the standardized CS used in this study can be used for comparison of results from different settings. Copyright © 2016 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
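The relationship between the two agreement statistics reported above can be sketched as follows. The abstract does not state which formula was used, so this assumes the conventional 95% minimal detectable change, MDC95 = 1.96 × √2 × SEM, which happens to reproduce the reported figure from the reported SEM. The helper `sem_from_icc` uses the common SEM = SD × √(1 − ICC) form and is shown with hypothetical inputs only, because the between-subject SD is not given in the abstract.

```python
import math


def mdc_from_sem(sem, z=1.96):
    """Minimal detectable change for an individual: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


# SEM of 4.9 CS points is taken from the abstract; z = 1.96 is an assumption.
mdc95 = mdc_from_sem(4.9)
print(round(mdc95, 1))  # 13.6

# Hypothetical inputs, only to show the SEM formula itself
print(sem_from_icc(sd=10.0, icc=0.75))
```

Because the MDC is a fixed multiple of the SEM, any improvement in reliability that lowers the SEM shrinks the change an individual must show before it can be called real.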
[Reliability and validity of the modified Perceived Health Competence Scale (PHCS) Japanese version].

PubMed

Togari, Taisuke; Yamazaki, Yoshihiko; Koide, Syotaro; Miyata, Ayako

2006-01-01

In community and workplace health plans, the Perceived Health Competence Scale (PHCS) is employed as an index of health competency. The purpose of this research was to examine the reliability and validity of a modified Japanese PHCS. Interviews were sought with 3,000 randomly selected Japanese individuals using a two-step stratified method. Valid PHCS responses were obtained from 1,910 individuals, yielding a 63.7% response rate. Reliability was assessed using Cronbach's alpha coefficient (henceforth, alpha) to evaluate internal consistency, and by employing item-total correlation and alpha coefficient analyses to assess the effect of removal of variables from the model. To examine content validity, we assessed the correlation between the PHCS score and four respondent attribute characteristics, that is, sex, age, the presence of chronic disease, and the existence of chronic disease at age 18. The correlation between PHCS score and commonly employed healthy lifestyle indices was examined to assess construct validity. General linear model statistical analysis was employed. The modified Japanese PHCS demonstrated a satisfactory alpha coefficient of 0.869. Moreover, reliability was confirmed by item-total correlation and alpha coefficient analyses after removal of variables from the model. Differences in PHCS scores were seen between individuals 60 years and older, and younger individuals. Those with current chronic disease, or who had had a chronic disease at age 18, tended to have lower PHCS scores. After controlling for the presence of current or age 18 chronic disease, age, and sex, significant correlations were seen between PHCS scores and tobacco use, dietary habits, and exercise, but not alcohol use or frequency of medical consultation. This study supports the reliability and validity, and hence supports the use, of the modified Japanese PHCS. 
Future longitudinal research is needed to evaluate the predictive power of modified Japanese PHCS scores, to examine factors influencing the development of perceived health competence, and to assess the effects of interventions on perceived health competence.
Translation and adaptation of the fatigue severity scale for use in Portugal.

PubMed

Laranjeira, Carlos António

2012-08-01

The Fatigue Severity Scale (FSS) is a widely used instrument to measure the impact of fatigue on specific types of functioning. This study aims to translate and test the reliability and validity of the Portuguese version of the FSS. The questionnaire was administered to a worker sample of 424 nurses. Reliability analysis showed satisfactory results (Cronbach's alpha coefficient = .87). The test-retest reliability was .85. The principal component analysis showed that the FSS was a measure with a one-factor structure. The construct validity of the total FSS score was assessed by correlation with Maslach Burnout Inventory (MBI) score, Depression Anxiety Stress Scale (DASS) score, and Visual Analogue Scale (VAS) score. Each of the corresponding correlation coefficients among the total FSS score and MBI score, DASS score, and perceived fatigue score (VAS) were .55 (p < .01), .62 (p < .01), and .68 (p < .01), respectively, which shows sufficient construct validity. To measure the discriminant validity of FSS, we examined the differences in scores between groups in terms of the number of hours of sleep and overtime. The less nurses slept and the longer they worked, the higher their total FSS score became. This preliminary validation study of the Portuguese version of FSS proved that it is an acceptable, reliable, and valid measure of fatigue in the working population. Copyright © 2012 Elsevier Inc. All rights reserved.
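Several of the records in this listing report Cronbach's alpha as the internal consistency estimate. A minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores), is given below. The response matrix is invented for illustration and is not FSS data; the function name `cronbach_alpha` is an assumption of this sketch.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items matrix (list of rows)."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)


# Hypothetical data: 5 respondents answering 3 items
toy_responses = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
]
print(round(cronbach_alpha(toy_responses), 3))  # 0.933
```

Alpha rises when items covary strongly relative to their individual variances, which is also why it can be inflated simply by adding more items to a scale.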
Monte Carlo Approach for Reliability Estimations in Generalizability Studies.

ERIC Educational Resources Information Center

Dimitrov, Dimiter M.

A Monte Carlo approach is proposed, using the Statistical Analysis System (SAS) programming language, for estimating reliability coefficients in generalizability theory studies. Test scores are generated by a probabilistic model that considers the probability for a person with a given ability score to answer an item with a given difficulty…
Neurology objective structured clinical examination reliability using generalizability theory

PubMed Central

Park, Yoon Soo; Lukas, Rimas V.; Brorson, James R.

2015-01-01

Objectives: This study examines factors affecting reliability, or consistency of assessment scores, from an objective structured clinical examination (OSCE) in neurology through generalizability theory (G theory). Methods: Data include assessments from a multistation OSCE taken by 194 medical students at the completion of a neurology clerkship. Facets evaluated in this study include cases, domains, and items. Domains refer to areas of skill (or constructs) that the OSCE measures. G theory is used to estimate variance components associated with each facet, derive reliability, and project the number of cases required to obtain a reliable (consistent, precise) score. Results: Reliability using G theory is moderate (Φ coefficient = 0.61, G coefficient = 0.64). Performance is similar across cases but differs by the particular domain, such that the majority of variance is attributed to the domain. Projections in reliability estimates reveal that students need to participate in 3 OSCE cases in order to increase reliability beyond the 0.70 threshold. Conclusions: This novel use of G theory in evaluating an OSCE in neurology provides meaningful measurement characteristics of the assessment. Differing from prior work in other medical specialties, the cases students were randomly assigned did not influence their OSCE score; rather, scores varied in expected fashion by domain assessed. PMID:26432851
Neurology objective structured clinical examination reliability using generalizability theory.

PubMed

Blood, Angela D; Park, Yoon Soo; Lukas, Rimas V; Brorson, James R

2015-11-03

This study examines factors affecting reliability, or consistency of assessment scores, from an objective structured clinical examination (OSCE) in neurology through generalizability theory (G theory). Data include assessments from a multistation OSCE taken by 194 medical students at the completion of a neurology clerkship. Facets evaluated in this study include cases, domains, and items. Domains refer to areas of skill (or constructs) that the OSCE measures. G theory is used to estimate variance components associated with each facet, derive reliability, and project the number of cases required to obtain a reliable (consistent, precise) score. Reliability using G theory is moderate (Φ coefficient = 0.61, G coefficient = 0.64). Performance is similar across cases but differs by the particular domain, such that the majority of variance is attributed to the domain. Projections in reliability estimates reveal that students need to participate in 3 OSCE cases in order to increase reliability beyond the 0.70 threshold. This novel use of G theory in evaluating an OSCE in neurology provides meaningful measurement characteristics of the assessment. Differing from prior work in other medical specialties, the cases students were randomly assigned did not influence their OSCE score; rather, scores varied in expected fashion by domain assessed. © 2015 American Academy of Neurology.
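The projection step described in the two OSCE records above (how many cases are needed to pass a reliability threshold) can be sketched for the simplest persons × cases design. This is a deliberate simplification: the study also modeled domains and items, and its variance components are not given in the abstract, so all numbers below are hypothetical and the function names are assumptions of this sketch.

```python
def g_coefficient(var_p, var_pc, n_cases):
    """Relative (norm-referenced) G coefficient for a persons x cases design."""
    return var_p / (var_p + var_pc / n_cases)


def phi_coefficient(var_p, var_c, var_pc, n_cases):
    """Absolute (criterion-referenced) Phi coefficient; case variance counts as error."""
    return var_p / (var_p + (var_c + var_pc) / n_cases)


def cases_needed(coefficient, threshold=0.70, max_cases=50):
    """Smallest number of cases whose projected coefficient reaches the threshold."""
    for n in range(1, max_cases + 1):
        if coefficient(n) >= threshold:
            return n
    return None  # threshold not reachable within max_cases


# Hypothetical variance components: person, case, person-by-case residual
VAR_P, VAR_C, VAR_PC = 0.5, 0.1, 1.0

n_for_g = cases_needed(lambda n: g_coefficient(VAR_P, VAR_PC, n))
n_for_phi = cases_needed(lambda n: phi_coefficient(VAR_P, VAR_C, VAR_PC, n))
print(n_for_g, n_for_phi)
```

Phi is never larger than G for the same design because it adds the case main effect to the error term, so absolute decisions generally need at least as many cases as relative ones.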
Reliability of the Balance Evaluation Systems Test (BESTest) and BESTest sections for adults with hemiparesis

PubMed Central

Rodrigues, Letícia C.; Marques, Aline P.; Barros, Paula B.; Michaelsen, Stella M.

2014-01-01

BACKGROUND: The Balance Evaluation Systems Test (BESTest) was recently created to allow the development of treatments according to the specific balance system affected in each patient. The Brazilian version of the BESTest has not been specifically tested after stroke. OBJECTIVE: To evaluate the intra- and inter-rater reliability and concurrent and convergent validity of the total score of the BESTest and BESTest sections for adults with hemiparesis after stroke. METHOD: The study included 16 subjects (61.1±7.5 years) with chronic hemiparesis (54.5±43.5 months after stroke). The BESTest was administered by two raters in the same week and one of the raters repeated the test after a one-week interval. Intraclass correlation coefficient (ICC) was calculated to assess intra- and interrater reliability. Concurrent validity with the Berg Balance Scale (BBS) and convergent validity with the Activities-specific Balance Confidence scale (ABC-Brazil) were assessed using Pearson's correlation coefficient. RESULTS: Both the BESTest total score (ICC=0.98) and the BESTest sections (ICC between 0.85 and 0.96) have excellent intrarater reliability. Interrater reliability for the total score was excellent (ICC=0.93) and, for the sections, it ranged between 0.71 and 0.94. The correlation coefficient between the BESTest and the BBS and ABC-Brazil were 0.78 and 0.59, respectively. CONCLUSIONS: The Brazilian version of the BESTest demonstrated adequate reliability when measured by sections and could identify what balance system was affected in patients after stroke. Concurrent validity was excellent with the BBS total score and good to excellent with the sections. The total scores but not the sections present adequate convergent validity with the ABC-Brazil. However, other psychometric properties should be further investigated. PMID:25003281
[Analysis of reliability and validity of the Chinese Version of Dizziness Handicap Inventory (DHI)].

PubMed

Zhang, Yi; Liu, Bo; Wang, Yongjun; Zhou, Yun; Wang, Rui; Gong, Jing; Peng, Xiaoxia

2015-09-01

To investigate the reliability and validity of the Chinese Version of Dizziness Handicap Inventory (DHI). Cross-cultural adaptation of health-related quality of life measures was used for translating the DHI into a Chinese version. The DHI contains 7 physical, 9 emotional, and 9 functional questions. The patients scored the DHI directly. Then the scores of the total scale and each subscale were calculated and evaluated. Three hundred and sixty-six dizzy patients, 116 males and 250 females, aged from 14 to 79 years, were included in the research and completed the questionnaire. SPSS 13.0 was used for statistical analysis. Reliability: Cronbach α values for the total scale and subscales of the DHI were 0.751-0.912. The reliability coefficients were 0.877-0.921 (P < 0.001). The correlation coefficients between the total scale and the three subscales were 0.815-0.934 (P < 0.001). The correlation coefficients of the scores within each subscale were higher than those with the other subscales (r = 0.446-0.781). Common factor analysis yielded 5 factors. The cumulative variance ratio was 54.5%. The component of each item was over 0.4. The Chinese version of the DHI has good reliability and validity and can be used to evaluate dizzy patients.
The development and validation of a questionnaire for rotator cuff disorders: The Functional Shoulder Score

PubMed Central

Ibrahim, Edward F; Petrou, Charalambos; Galanos, Antonis

2015-01-01

Background The purpose of the present study was to validate the Functional Shoulder Score (FSS), a new patient-reported outcome score specifically designed to evaluate patients with rotator cuff disorders. Methods One hundred and nineteen patients were assessed using two shoulder scoring systems [the FSS and the Constant–Murley Score (CMS)] at 3 weeks pre- and 6 months post-arthroscopic rotator cuff surgery. The reliability, validity, responsiveness and interpretability of the FSS were evaluated. Results Reliability analysis (test–retest) showed an intraclass correlation coefficient value of 0.96 [95% confidence interval (CI) = 0.92 to 0.98]. Internal consistency analysis revealed a Cronbach's alpha coefficient of 0.93. The Pearson correlation coefficient FSS-CMS was 0.782 pre-operatively and 0.737 postoperatively (p < 0.0005). There was a statistically significant increase in FSS scores postoperatively, an effect size of 3.06 and standardized response mean of 2.80. The value for minimal detectable change was ±8.38 scale points (based on a 90% CI) and the minimal clinically important difference for improvement was 24.7 ± 5.4 points. Conclusions The FSS is a patient-reported outcome measure that can easily be incorporated into clinical practice, providing a quick, reliable, valid and practical measure for rotator cuff problems. The questionnaire is highly sensitive to clinical change. PMID:27582986
Transcultural validation of the Oxford Shoulder Score for the French-speaking population.

PubMed

Tuton, D; Barbe, C; Salmon, J-H; Dramé, M; Nérot, C; Ohl, X

2016-09-01

Patient-reported outcome measures (PROMs) have been gaining in popularity over the last decade. The Oxford Shoulder Score (OSS) is a well-established self-administered questionnaire for shoulder evaluation adapted for the English-speaking population. The aim of the present study was to develop a translation and a transcultural adaptation of the OSS and to assess its validity in native French-speaker patients with shoulder pain. The translation process was carried out following a translation/back-translation methodology by two translators. All patients completed the French OSS, the Subjective Shoulder Value (SSV), and the Constant score. Internal consistency was tested using Cronbach's α coefficient. Validity was assessed by calculating the Pearson correlation coefficient between the OSS and the Constant score and the SSV. One hundred forty-four patients suffering from degenerative or inflammatory diseases of the shoulder were included in this study. The average time required to complete the French OSS was 2min and 45s. Seventy patients were asked to complete the questionnaire twice (test/retest reliability). Internal consistency was high with Cronbach's α coefficient=0.93. The intraclass correlation coefficient was 0.91 (95% CI: 0.88-0.94) for test/retest reliability. The French OSS score was significantly correlated with the Constant-Murley score (r=0.73 and P<0.0001) and with the SSV (r=0.68 and P<0.0001). The present study shows that the French version of the OSS is reliable, valid, and reproducible. The sensitivity to change now needs to be evaluated. This score was adapted to the French-speaking population for the self-assessment of patients with degenerative or inflammatory disorders of the shoulder. Level 1, Test of previously developed criteria, diagnostic test study. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
An interrater reliability study of the Braden scale in two nursing homes.

PubMed

Kottner, Jan; Dassen, Theo

2008-10-01

Adequate risk assessment is essential in pressure ulcer prevention. Assessment scales were designed to support practitioners in identifying persons at pressure ulcer risk. The Braden scale is one of the most extensively studied risk assessment instruments, although the majority of studies focused on validity rather than reliability. The first aim was to measure the interrater reliability of the Braden scale and its individual items. The second aim was to study different statistical approaches regarding interrater reliability estimation. An interrater reliability study was conducted in two German nursing homes. Residents (n = 152) from 8 units were assessed twice. The raters were trained nurses with a work experience ranging from 0.5 to 30 years. Data were analysed using an overall percentage of agreement, weighted and unweighted kappa and the intraclass correlation coefficient. Differences between nurses rating the overall Braden score ranged from 0 up to 9 points. Interrater reliability expressed by the intraclass correlation coefficient ranged from 0.73 (95% CI 0.26 - 0.91) to 0.95 (95% CI 0.87 - 0.98). Calculated intraclass correlation coefficients for individual items ranged from 0.06 (95% CI -0.31 to 0.48) to 0.97 (95% CI 0.93-0.99) with the lowest values being measured for the items "sensory perception" and "nutrition". There was no association between work experience and the level of interrater reliability. With two exceptions, simple kappa-values were always lower than weighted kappa-values and intraclass correlation coefficients. Although the calculated interrater reliability coefficients for the total Braden score were high in some cases, several clinically relevant differences occurred between the nurses. Due to interrater reliability being very low for the items "sensory perception" and "nutrition", it is doubtful if their assessment contributes to any valid results. 
Weighted kappa and intraclass correlation coefficients are the most appropriate estimates of interrater reliability.
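The difference between simple and weighted kappa noted in the Braden scale record can be illustrated with a short sketch of Cohen's kappa for two raters on an ordinal scale. The ratings are hypothetical and are not Braden scale data; the function name `cohen_kappa` and its `weights` argument are assumptions of this sketch, and the function does not guard against the degenerate case in which expected disagreement is zero.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa; weights may be None (simple), 'linear' or 'quadratic'."""
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    def disagreement(i, j):
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)               # scaled ordinal distance
        return d if weights == "linear" else d * d

    # Observed mean disagreement over all rated subjects
    observed = sum(
        disagreement(index[a], index[b]) for a, b in zip(rater_a, rater_b)
    ) / n
    # Disagreement expected by chance from the two raters' marginal counts
    count_a = Counter(index[a] for a in rater_a)
    count_b = Counter(index[b] for b in rater_b)
    expected = sum(
        disagreement(i, j) * count_a[i] * count_b[j]
        for i in range(k) for j in range(k)
    ) / (n * n)
    return 1.0 - observed / expected


# Hypothetical item scores (1-4) from two nurses for 8 residents
nurse_1 = [1, 2, 3, 4, 2, 3, 1, 4]
nurse_2 = [1, 2, 3, 3, 2, 4, 1, 4]
simple = cohen_kappa(nurse_1, nurse_2, [1, 2, 3, 4])
quadratic = cohen_kappa(nurse_1, nurse_2, [1, 2, 3, 4], weights="quadratic")
print(round(simple, 3), round(quadratic, 3))
```

Here both disagreements are only one category apart, so the quadratic-weighted value is well above the simple kappa, which treats a near miss the same as a gross disagreement.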
Inter-Rater Reliability and Intra-Rater Reliability of Assessing the 2-Minute Push-Up Test.

PubMed

Fielitz, Lynn; Coelho, Jeffrey; Horne, Thomas; Brechue, William

2016-02-01

The purpose of this study was to assess inter-rater reliability and intra-rater reliability of the 2-minute, 90° push-up test as utilized in the Army Physical Fitness Test. Analysis of rater assessment reliability included both total score agreement and agreement across individual push-up repetitions. This study utilized 8 Raters who assessed 15 different videotaped push-up performances over 4 iterations separated by a minimum of 1 week. The 15 push-up participants were videotaped during the semiannual Army Physical Fitness Test. Each Rater randomly viewed the 15 push-up performances and verbally responded with a "yes" or "no" to each push-up repetition. The data generated were analyzed using the Pearson product-moment correlation as well as the kappa, modified kappa and the intra-class correlation coefficient (3,1). An attribute agreement analysis was conducted to determine the percent of inter-rater and intra-rater agreement across individual push-ups. The results indicated that Raters varied a great deal in assessing push-ups. Over the 4 trials of 15 participants, the overall scores of the Raters varied between 3.0 and 35.7 push-ups. Post hoc comparisons found that there was a significant increase in the grand mean of push-ups from trials 1-3 to trial 4 (p < 0.05). Also, there was a significant difference among raters over the 4 trials (p < 0.05). Pearson correlation coefficients for inter-rater and intra-rater reliability identified inter-rater reliability coefficients were between 0.10 and 0.97. Intra-rater coefficients were between 0.48 and 0.99. Intra-rater agreement for individual push-up repetitions ranged from 41.8% to 84.8%. The results indicated that the raters failed to assess the same push-up repetition with the same score (below 70% agreement) as well as failed to agree when viewed between raters (29%). 
Interestingly, as previously mentioned, scores on trial 4 increased significantly, which might have been caused by rater drift or by the Raters not maintaining the push-up standard over the trials. It does appear that the final push-up scores received by each participant were a close approximation of actual performance (within 65%) but when assessing physical performance for retention in the Army, a more reliable test might be considered. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.
Validity and reliability of Nintendo Wii Fit balance scores.

PubMed

Wikstrom, Erik A

2012-01-01

Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Descriptive laboratory study. Sports medicine research laboratory. Forty-five recreationally active participants (age = 27.0 ± 9.8 years, height = 170.9 ± 9.2 cm, mass = 72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Participants completed a single-limb-stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r < 0.50). Intrasession reliability for Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC] = 0.80) to poor (ICC = 0.39), with 8 activities having poor intrasession reliability. Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with scores ranging from fair (ICC = 0.74) to poor (ICC = 0.29). 
Wii Fit balance activity scores had poor concurrent validity relative to COP outcomes and SEBT reach distances. In addition, the included Wii Fit balance activity scores generally had poor intrasession and intersession reliability.
Validity and cross-cultural adaptation of the Persian version of the Oxford Elbow Score.

PubMed

Ebrahimzadeh, Mohammad H; Kachooei, Amir Reza; Vahedi, Ehsan; Moradi, Ali; Mashayekhi, Zeinab; Hallaj-Moghaddam, Mohammad; Azami, Mehran; Birjandinejad, Ali

2014-01-01

The Oxford Elbow Score (OES) is a patient-reported questionnaire used to assess outcomes after elbow surgery. The aim of this study was to validate and adapt the OES into the Persian language. After forward-backward translation of the OES into Persian, a total of 92 patients who had undergone elbow surgery completed the Persian OES along with the Persian DASH and SF-36. To assess test-retest reliability, 31 randomly selected patients (34%) completed the Persian OES again after three days while abstaining from all forms of therapeutic regimens. Reliability of the Persian OES was assessed by measuring the intraclass correlation coefficient (ICC) for test-retest reliability and Cronbach's alpha for internal consistency. Spearman's correlation coefficient was used to test the construct validity. Cronbach's alpha coefficient was 0.92, showing excellent reliability. Cronbach's alpha for the function, pain, and social-psychological subscales was 0.95, 0.86, and 0.85, respectively. The ICC was 0.85 for the overall questionnaire and 0.90, 0.76, and 0.75 for the function, pain, and social-psychological subscales, respectively. Construct validity was confirmed as the Spearman correlation between OES and DASH was 0.80. The Persian OES is a valid and reliable patient-reported outcome measure to assess postsurgical elbow status in the Persian-speaking population.

The Balanced Inventory of Desirable Responding (BIDR): A Reliability Generalization Study

ERIC Educational Resources Information Center

Li, Andrew; Bagger, Jessica

2007-01-01

The Balanced Inventory of Desirable Responding (BIDR) is one of the most widely used social desirability scales. The authors conducted a reliability generalization study to examine the typical reliability coefficients of BIDR scores and explored factors that explained the variability of reliability estimates across studies. The results indicated…
Reliability Generalization of Scores on the Spielberger State-Trait Anxiety Inventory.

ERIC Educational Resources Information Center

Barnes, Laura L. B.; Harp, Diane; Jung, Woo Sik

2002-01-01

Conducted a reliability generalization study for the State-Trait Anxiety Inventory (C. Spielberger, 1983) by reviewing and classifying 816 research articles. Average reliability coefficients were acceptable for both internal consistency and test-retest reliability, but variation was present among the estimates. Other differences are discussed.…
Reliable change indices and standardized regression-based change score norms for evaluating neuropsychological change in children with epilepsy.

PubMed

Busch, Robyn M; Lineweaver, Tara T; Ferguson, Lisa; Haut, Jennifer S

2015-06-01

Reliable change indices (RCIs) and standardized regression-based (SRB) change score norms permit evaluation of meaningful changes in test scores following treatment interventions, like epilepsy surgery, while accounting for test-retest reliability, practice effects, score fluctuations due to error, and relevant clinical and demographic factors. Although these methods are frequently used to assess cognitive change after epilepsy surgery in adults, they have not been widely applied to examine cognitive change in children with epilepsy. The goal of the current study was to develop RCIs and SRB change score norms for use in children with epilepsy. Sixty-three children with epilepsy (age range: 6-16; M=10.19, SD=2.58) underwent comprehensive neuropsychological evaluations at two time points an average of 12 months apart. Practice effect-adjusted RCIs and SRB change score norms were calculated for all cognitive measures in the battery. Practice effects were quite variable across the neuropsychological measures, with the greatest differences observed among older children, particularly on the Children's Memory Scale and Wisconsin Card Sorting Test. There was also notable variability in test-retest reliabilities across measures in the battery, with coefficients ranging from 0.14 to 0.92. Reliable change indices and SRB change score norms for use in assessing meaningful cognitive change in children following epilepsy surgery are provided for measures with reliability coefficients above 0.50. This is the first study to provide RCIs and SRB change score norms for a comprehensive neuropsychological battery based on a large sample of children with epilepsy. Tables to aid in evaluating cognitive changes in children who have undergone epilepsy surgery are provided for clinical use. An Excel sheet to perform all relevant calculations is also available to interested clinicians or researchers. Copyright © 2015 Elsevier Inc. All rights reserved.
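To make the idea of a practice effect-adjusted RCI concrete, the sketch below uses one common formulation: the observed change minus the mean practice effect, divided by the standard error of the difference, with SEM = SD × √(1 − r) and SEdiff = √2 × SEM. The abstract does not state which variant the authors used, so this formula choice is an assumption, and all input values are hypothetical rather than taken from the study.

```python
import math


def practice_adjusted_rci(score_1, score_2, baseline_sd, retest_r, practice_effect):
    """Practice-adjusted reliable change index (one common formulation)."""
    sem = baseline_sd * math.sqrt(1.0 - retest_r)   # standard error of measurement
    se_diff = math.sqrt(2.0) * sem                  # standard error of a difference
    return ((score_2 - score_1) - practice_effect) / se_diff


# Hypothetical child: 100 at baseline, 115 at retest; the reference sample
# has SD = 10, test-retest r = 0.75 and a mean practice gain of 3 points.
rci = practice_adjusted_rci(100, 115, baseline_sd=10, retest_r=0.75, practice_effect=3)
print(round(rci, 2))           # 1.7
print(abs(rci) > 1.645)        # exceeds a 90% confidence band
print(abs(rci) > 1.96)         # does not exceed a 95% band
```

The denominator grows quickly as test-retest reliability falls, which is consistent with the authors providing norms only for measures with reliability coefficients above 0.50.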
The Score Reliability of Draw-a-Person Intellectual Ability Test (DAP: IQ) for Rural Malawi Students

ERIC Educational Resources Information Center

Khasu, Denis S.; Williams, Thomas O., Jr.

2016-01-01

In this brief article, the reliability of scores for the Draw-A-Person Intellectual Ability Test for Children, Adolescents, and Adults (DAP: IQ; Reynolds & Hickman, 2004) was examined through several analyses with a sample of 147 children from rural Malawi, Africa using a Chichewa translation of instructions. Cronbach alpha coefficients for…
Reliability and validity of the Dutch pediatric Voice Handicap Index.

PubMed

Veder, Laura; Pullens, Bas; Timmerman, Marieke; Hoeve, Hans; Joosten, Koen; Hakkesteegt, Marieke

2017-05-01

The pediatric voice handicap index (pVHI) has been developed to provide a better insight into the parents' perception of their child's voice related quality of life. The purpose of the present study was to validate the Dutch pVHI by evaluating its internal consistency and reliability. Furthermore, we determined the optimal cut-off point for a normal pVHI score. All items of the English pVHI were translated into Dutch. Parents of children in our dysphonic and control group were asked to fill out the questionnaire. For the test re-test analysis we used a different study group who filled out the pVHI twice as part of a large follow up study. Internal consistency was analyzed through Cronbach's α coefficient. The test-retest reliability was assessed by determining Pearson's correlation coefficient. Mann-Whitney test was used to compare the scores of the questionnaire of the control group with the dysphonic group. By calculating receiver operating characteristic (ROC) curves, sensitivity and specificity we were able to set a cut-off point. We obtained data from 122 asymptomatic children and from 79 dysphonic children. The scores of the questionnaire significantly differed between both groups. The internal consistency showed an overall Cronbach α coefficient of 0.96 and an excellent test-retest reliability of the total pVHI questionnaire with a Pearson's correlation coefficient of 0.90. A cut-off point for the total pVHI questionnaire was set at 7 points with a specificity of 85% and sensitivity of 100%. A cut-off point for the VAS score was set at 13 with a specificity of 93% and sensitivity of 97%. The Dutch pVHI is a valid and reliable tool for the assessment of children with voice problems. By setting a cut-off point for the score of the total pVHI questionnaire of 7 points and the VAS score of 13, the pVHI might be used as a screening tool to assess dysphonic complaints and the pVHI might be a useful and complementary tool to identify children with dysphonia. 
Copyright © 2017 Elsevier B.V. All rights reserved.
Validation of the Japanese version of the Pediatric Quality of Life Inventory (PedsQL) Cancer Module.

PubMed

Tsuji, Naoko; Kakee, Naoko; Ishida, Yasushi; Asami, Keiko; Tabuchi, Ken; Nakadate, Hisaya; Iwai, Tsuyako; Maeda, Miho; Okamura, Jun; Kazama, Takuro; Terao, Yoko; Ohyama, Wataru; Yuza, Yuki; Kaneko, Takashi; Manabe, Atsushi; Kobayashi, Kyoko; Kamibeppu, Kiyoko; Matsushima, Eisuke

2011-04-10

The PedsQL 3.0 Cancer Module is a widely used instrument to measure pediatric cancer specific health-related quality of life (HRQOL) for children aged 2 to 18 years. We developed the Japanese version of the PedsQL Cancer Module and investigated its reliability and validity among Japanese children and their parents. Participants were 212 children with cancer and 253 of their parents. Reliability was determined by internal consistency using Cronbach's coefficient alpha and test-retest reliability using intra-class correlation coefficient (ICC). Validity was assessed through factor validity, convergent and discriminant validity, concurrent validity, and clinical validity. Factor validity was examined by exploratory factor analysis. Convergent and discriminant validity were examined by multitrait scaling analysis. Concurrent validity was assessed using Spearman's correlation coefficients between the Cancer Module and Generic Core Scales, and the comparison of the scores of child self-reports with those of other self-rating depression scales for children. Clinical validity was assessed by comparing the on- and off-treatment scores using Kruskal-Wallis and Mann-Whitney U tests. Cronbach's coefficient alpha was over 0.70 for the total scale and over 0.60 for each subscale by age except for the 'pain and hurt' subscale for children aged 5 to 7 years. For test-retest reliability, the ICC exceeded 0.70 for the total scale for each age. Exploratory factor analysis demonstrated sufficient factorial validity. Multitrait scaling analysis showed high success rates. Strong correlations were found between the reports by children and their parents, and the scores of the Cancer Module and the Generic Core Scales except for 'treatment anxiety' subscales for child reports. The Depression Self-Rating Scale for Children (DSRS-C) scores were significantly correlated with emotional domains and the total score of the cancer module. Children who had been off treatment over 12 months demonstrated significantly higher scores than those on treatment. The results demonstrate the reliability and validity of the Japanese version of the PedsQL Cancer Module among Japanese children.
Application of the diligence inventory in dental education.

PubMed

Jasinevicius, T R; Bernard, H; Schuttenberg, E M

1998-04-01

The fifty-five-item Diligence Inventory for Higher Education (DI-HE) was applied to a new subject group--190 dental students. After item and factor analysis, a fifty-item (four subscale) inventory best reflected this group. The DI-HE's split half reliability was 0.81 (p < 0.001), the reliability coefficient for the pre- and post-test was 0.68 (p < 0.01), and the correlation coefficient alpha was 0.90. The DI-HE scores were high, with no statistical differences among the four classes. Overall, significant relationships were found between grade point averages (GPAs) and DI-HE total and subscale scores, with r values as high as 0.44. While female students' DI-HE scores were significantly higher (p = 0.023) than male students' scores, no correlations between DI-HE scores and GPAs for females were found. The results suggest that DI-HE may be useful for assessment purposes in professional education.
Psychometric Properties of Scores from the Web-based LibQUAL+ Study of Perceptions of Library Service Quality.

ERIC Educational Resources Information Center

Cook, Colleen; Thompson, Bruce

2001-01-01

Investigated the psychometric integrity of scores from the LibQUAL+ evaluation of perceived library service quality conducted by ARL (Association of Research Libraries). Examines score structure, score reliability, score correlation and concurrent validity coefficients, scale means, and scale standardized norms, and considers the potential of the…
Validation of Turkish version of brief negative symptom scale.

PubMed

Polat Nazlı, Irmak; Ergül, Ceylan; Aydemir, Ömer; Chandhoke, Swati; Üçok, Alp; Gönül, Ali Saffet

2016-11-01

Negative symptoms in schizophrenia have been assessed by many instruments. However, a current consensus on these symptoms has been built and new tools, such as the Brief Negative Symptom Scale (BNSS), have been developed. This study aimed to evaluate the reliability and validity of the Turkish version of the BNSS. The scale was translated into Turkish and back-translated into English. After the approval of the translation, 75 schizophrenia patients were interviewed with the BNSS, Positive and Negative Syndrome Scale (PANSS), Calgary Depression Scale for Schizophrenia (CDSS) and Extrapyramidal Symptom Rating Scale (ESRS). Reliability and validity analyses were then performed. In the reliability analysis, the Cronbach's alpha coefficient was 0.96 and item-total score correlation coefficients were between 0.655 and 0.884. The intraclass correlation coefficient was 0.665. The inter-rater reliability was 0.982 (p < 0.0001). In the validity analysis, the total score of BNSS-TR was correlated with PANSS Total Score, Positive Symptoms Subscale, Negative Symptoms Subscale, and General Psychopathology Subscale. CDSS and ESRS were not correlated with BNSS-TR. The factor structure of the scale consisted of the same items as in the original version. Our study confirms that the Turkish version of the BNSS is an applicable tool for the evaluation of negative symptoms in schizophrenia.
MEASURING SPORT-SPECIFIC PHYSICAL ABILITIES IN MALE GYMNASTS: THE MEN'S GYMNASTICS FUNCTIONAL MEASUREMENT TOOL.

PubMed

Sleeper, Mark D; Kenyon, Lisa K; Elliott, James M; Cheng, M Samuel

2016-12-01

Despite the availability of various field-tests for many competitive sports, a reliable and valid test specifically developed for use in men's gymnastics has not yet been developed. The Men's Gymnastics Functional Measurement Tool (MGFMT) was designed to assess sport-specific physical abilities in male competitive gymnasts. The purpose of this study was to develop the MGFMT by establishing a scoring system for individual test items and to initiate the process of establishing test-retest reliability and construct validity. A total of 83 competitive male gymnasts ages 7-18 underwent testing using the MGFMT. Thirty of these subjects underwent re-testing one week later in order to assess test-retest reliability. Construct validity was assessed using a simple regression analysis between total MGFMT scores and the gymnasts' USA-Gymnastics competitive level to calculate the coefficient of determination (r²). Test-retest reliability was analyzed using Model 1 Intraclass correlation coefficients (ICC). Statistical significance was set at the p<0.05 level. The relationship between total MGFMT scores and subjects' current USA-Gymnastics competitive level was found to be good (r² = 0.63). Reliability testing of the MGFMT composite test score showed excellent test-retest reliability over a one-week period (ICC = 0.97). Test-retest reliability of the individual component tests ranged from good to excellent (ICC = 0.75-0.97). The results of this study provide initial support for the construct validity and test-retest reliability of the MGFMT. Level of evidence: Level 3.
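The coefficient of determination mentioned in the abstract above can be illustrated with a short sketch. This is only a generic simple-regression calculation, not the authors' analysis; the function name `r_squared` and the sample numbers are hypothetical and chosen for illustration.

```python
from statistics import mean


def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares around the mean of y.
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: competitive level (x) and total test score (y).
levels = [1, 2, 3]
scores = [1, 2, 2]
print(round(r_squared(levels, scores), 2))
```

With one predictor, this value equals the square of the Pearson correlation between the two variables.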
Oxford Shoulder Score: A Cross-Cultural Adaptation and Validation Study of the Persian Version in Iran.

PubMed

Ebrahimzadeh, Mohammad H; Birjandinejad, Ali; Razi, Shiva; Mardani-Kivi, Mohsen; Reza Kachooei, Amir

2015-09-01

The Oxford Shoulder Score is a specific 12-item patient-reported tool for evaluation of patients with inflammatory and degenerative disorders of the shoulder. Since its introduction, it has been translated and culturally adapted in some Western and Eastern countries. The aim of this study was to translate the Oxford Shoulder Score (OSS) into Persian and to test its validity and reliability in the Persian-speaking population in Iran. One hundred patients with degenerative or inflammatory shoulder problems participated in the survey in 2012. All patients completed the Persian version of the OSS, the Persian DASH and the SF-36 for testing validity. Thirty-seven randomly selected patients filled out the Persian OSS again three days after the initial visit to assess the reliability of the questionnaire. Cronbach's alpha coefficient was 0.93. The intraclass correlation coefficient was 0.93. In terms of validity, there was a significant correlation between the Persian OSS and DASH and SF-36 scores (P < 0.001). The Persian version of the OSS proved to be a valid, reliable, and reproducible tool as demonstrated by high Cronbach's alpha and Pearson's correlation coefficients. The Persian version of the OSS can be administered to Persian-speaking patients with shoulder conditions and is understandable to them.
A comparison between patient recall and concurrent measurement of preoperative quality of life outcome in total hip arthroplasty.

PubMed

Howell, Jonathan; Xu, Min; Duncan, Clive P; Masri, Bassam A; Garbuz, Donald S

2008-09-01

The objective is to evaluate the reliability of patients' recall of preoperative pain and function during the immediate postoperation period after total hip arthroplasty. A prospective cohort of 104 patients completed a survey about their quality of life before operation, and recalled preoperative status at 3 days, 6 weeks, and 12 weeks after operation. Quality of life was measured by the Western Ontario and McMaster University Osteoarthritis Index, the Oxford-12 hip score, and the 12-item Short-Form score. The intraclass correlation coefficient and Spearman correlation coefficient were used to compare preoperative quality of life scores to the scores recalled. The reliability of recall remained high up to 3 months postoperation. Patients are able to accurately recall their preoperative function for up to 3 months after total hip arthroplasty.
Reliability generalization study of the Yale-Brown Obsessive-Compulsive Scale for children and adolescents.

PubMed

López-Pina, José Antonio; Sánchez-Meca, Julio; López-López, José Antonio; Marín-Martínez, Fulgencio; Núñez-Núñez, Rosa Ma; Rosa-Alcázar, Ana I; Gómez-Conesa, Antonia; Ferrer-Requena, Josefa

2015-01-01

The Yale-Brown Obsessive-Compulsive Scale for children and adolescents (CY-BOCS) is a frequently applied test to assess obsessive-compulsive symptoms. We conducted a reliability generalization meta-analysis on the CY-BOCS to estimate the average reliability, search for reliability moderators, and propose a predictive model that researchers and clinicians can use to estimate the expected reliability of the CY-BOCS scores. A total of 47 studies reporting a reliability coefficient with the data at hand were included in the meta-analysis. The results showed good reliability and a large variability associated with the standard deviation of total scores and sample size.
A new scale for the assessment of performance and capacity of hand function in children with hemiplegic cerebral palsy: reliability and validity studies.

PubMed

Rosa-Rizzotto, M; Visonà Dalla Pozza, L; Corlatti, A; Luparia, A; Marchi, A; Molteni, F; Facchin, P; Pagliano, E; Fedrizzi, E

2014-10-01

In hemiplegic children, the recognition of the activity limitation pattern and the possibility of grading its severity are relevant for clinicians while planning interventions, monitoring results, and predicting outcomes. The aim of the study was to examine the reliability and validity of the Besta scale, an instrument used to measure, in hemiplegic children from 18 months to 12 years of age, both grasp on request (capacity) and spontaneous use of the upper limb (performance) in bimanual play activities and in ADL. A psychometric analysis of the reliability and validity of the Besta scale was performed in an outpatient study sample. Reliability study: a sample of 39 patients was enrolled. The administration of the Besta scale was video-recorded in a standardized manner. All videos were scored by 20 independent raters on subsequent viewing. Three raters randomly selected from the 20-rater group rescored the same video two years later for intra-rater reliability. Intra- and inter-rater reliability were calculated using the intraclass correlation coefficient (ICC) and Kendall's coefficient (K), respectively. Internal consistency reliability was assessed using Cronbach's alpha coefficient. Validity study: a sample of 105 children was assessed 5 times (at t0 and 2, 3, 6 and 12 months later) by 20 independent raters. Each patient underwent QUEST and Besta scale administration and assessment at the same time. Criterion validity was calculated using Pearson's correlation coefficient. Reliability study: the inter-rater reliability calculated with Kendall's coefficient was moderate (K = 0.47). The intra-rater (or test-retest) reliability for 3 raters was excellent (ICC = 0.927). Cronbach's alpha for internal consistency was 0.972. Validity study: the Besta scale showed good criterion validity compared with QUEST, increasing with age and severity of impairment. Pearson's correlation coefficient r was 0.81 (P<0.0001). Limitations: in infants, the Besta scale has difficulty distinguishing between mild and moderately impaired hand function. The Besta scale scoring system is a valid and reliable tool, usable in a clinical setting to monitor the evolution of unimanual and bimanual manipulation and to distinguish the hand's capacity from performance.
Validation of a New Metric for Assessing the Integration of Health Protection and Health Promotion in a Sample of Small- and Medium-Sized Employer Groups.

PubMed

Williams, Jessica A R; Nelson, Candace C; Cabán-Martinez, Alberto J; Katz, Jeffrey N; Wagner, Gregory R; Pronk, Nicolaas P; Sorensen, Glorian; McLellan, Deborah L

2015-09-01

To conduct validation analyses for a new measure of the integration of worksite health protection and health promotion approaches developed in earlier research. A survey of small- to medium-sized employers located in the United States was conducted between October 2013 and March 2014 (n = 111). Cronbach α coefficient was used to assess reliability, and Pearson correlation coefficients were used to assess convergent validity. The integration score was positively associated with the measures of occupational safety and health and health promotion activities/policies-supporting its convergent validity (Pearson correlation coefficients of 0.32 to 0.47). Cronbach α coefficient was 0.94, indicating excellent reliability. The integration score seems to be a promising tool for assessing integration of health promotion and health protection. Further work is needed to test its dimensionality and validate its use in other samples.
The Reliability and Validity of the Coopersmith Self-Esteem Inventory for a Sample of Filipino High School Girls.

ERIC Educational Resources Information Center

Watkins, David; Astilla, Estela

1980-01-01

Evidence is presented partially supporting the reliability and construct validity of the Coopersmith Self-Esteem Inventory with Filipino adolescent girls. A test-retest coefficient of 0.61 was found over a nine-month period. Self-esteem scores were significantly associated with IQ scores and teacher ratings of pupils' self-esteem. (Author/BW)
Reliability, Validity, and Sensitivity to Change Overtime of the Modified Melasma Area and Severity Index Score.

PubMed

Abou-Taleb, Doaa A E; Ibrahim, Ahmed K; Youssef, Eman M K; Moubasher, Alaa E A

2017-02-01

The new modified Melasma Area and Severity Index (mMASI) score, the recently used outcome measure for melasma, has not been tested to determine its sensitivity to change in melasma. To determine the reliability, validity, and sensitivity to change over time of the mMASI score in assessment of the severity of melasma. Pearson correlation, Cronbach alpha, and intraclass correlation coefficient were calculated to assess the reliability of the mMASI score. Validity of the mMASI scale was assessed using Spearman correlation between mMASI total score (before and after treatment), clinical data, and patients' responses. The mMASI score showed excellent reliability and good validity for assessment of the severity of melasma. The authors also determined that the mMASI score demonstrated sensitivity to change over time. An excellent degree of agreement between the mMASI and MASI scores was revealed. The mMASI score is reliable, valid, and responsive to change in the assessment of severity of melasma. Moreover, the mMASI score was found to be easier to learn and perform and simpler in calculation compared with the MASI score. Overall, the mMASI score can effectively replace the MASI score.
Can we have an overall osteoarthritis severity score for the patellofemoral joint using magnetic resonance imaging? Reliability and validity.

PubMed

Kobayashi, Sarah; Peduto, Anthony; Simic, Milena; Fransen, Marlene; Refshauge, Kathryn; Mah, Jean; Pappas, Evangelos

2018-04-01

This work aimed to assess inter-rater reliability and agreement of a magnetic resonance imaging (MRI)-based Kellgren and Lawrence (K&L) grading for patellofemoral joint osteoarthritis (OA) and to validate it against the MRI Osteoarthritis Knee Score (MOAKS). MRI scans from people aged 45 to 75 years with chronic knee pain participating in a randomised clinical trial evaluating dietary supplements were utilised. Fifty participants were randomly selected and scored using the MRI-based K&L grading using axial and sagittal MRI scans. Raters conducted inter-rater reliability, blinded to clinical information, radiology reports and other rater results. Intra- and inter-rater reliability and agreement were evaluated using the intra-class correlation coefficient (ICC) and Cohen's weighted kappa. There was a 2-week interval between the first and second readings for intra-rater reliability. Validity was assessed using the MOAKS and evaluated using Spearman's correlation coefficient. Intra-rater reliability of the K&L system was excellent: ICC 0.91 (95% CI 0.82-0.95); weighted kappa (κ = 0.69). Inter-rater reliability was high (ICC 0.88; 95% CI 0.79-0.93), while agreement between raters was moderate (κ = 0.49-0.57). Validity analysis demonstrated a strong correlation between the total MOAKS features score and the K&L grading system (ρ = 0.62-0.67) but weak correlations when compared with individual MOAKS features (ρ = 0.19-0.61). The high reliability and good agreement show consistency in grading the severity of patellofemoral OA with the MRI-based K&L score. Our validity results suggest that the scale may be useful, particularly in the clinical environment. Future research should validate this method against clinical findings.
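As an illustration of the agreement statistic named in the abstract above, the following is a minimal sketch of Cohen's weighted kappa for two raters assigning ordinal grades. The abstract does not state which weighting scheme was used; linear disagreement weights are an assumption here, and the function name and example grades are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's weighted kappa with linear disagreement weights (assumed).

    Grades must be integers in the range 0 .. n_categories - 1.
    """
    n = len(rater_a)
    k = n_categories
    # Observed proportion of ratings in each (grade_a, grade_b) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]  # expected under independence
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical grades (0-4) from two raters for six scans.
first = [0, 1, 2, 3, 4, 2]
second = [0, 1, 2, 3, 4, 3]
print(round(weighted_kappa(first, second, 5), 2))
```

Unlike unweighted kappa, this version penalises a one-grade disagreement less than a four-grade disagreement, which suits ordinal scales.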
The Validation of a Case-Based, Cumulative Assessment and Progressions Examination

PubMed Central

Coker, Adeola O.; Copeland, Jeffrey T.; Gottlieb, Helmut B.; Horlen, Cheryl; Smith, Helen E.; Urteaga, Elizabeth M.; Ramsinghani, Sushma; Zertuche, Alejandra; Maize, David

2016-01-01

Objective. To assess content and criterion validity, as well as reliability of an internally developed, case-based, cumulative, high-stakes third-year Annual Student Assessment and Progression Examination (P3 ASAP Exam). Methods. Content validity was assessed through the writing-reviewing process. Criterion validity was assessed by comparing student scores on the P3 ASAP Exam with the nationally validated Pharmacy Curriculum Outcomes Assessment (PCOA). Reliability was assessed with psychometric analysis comparing student performance over four years. Results. The P3 ASAP Exam showed content validity through representation of didactic courses and professional outcomes. Similar scores on the P3 ASAP Exam and PCOA with Pearson correlation coefficient established criterion validity. Consistent student performance using Kuder-Richardson coefficient (KR-20) since 2012 reflected reliability of the examination. Conclusion. Pharmacy schools can implement internally developed, high-stakes, cumulative progression examinations that are valid and reliable using a robust writing-reviewing process and psychometric analyses. PMID:26941435
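For readers unfamiliar with the Kuder-Richardson coefficient named in the abstract above, here is a minimal sketch of KR-20 for dichotomously scored (0/1) items. It is a generic textbook calculation using population variances, not the authors' procedure; the function name and the tiny response matrix are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores."""
    n_items = len(responses[0])
    n_people = len(responses)
    # Sum of p*q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1.0 - p)
    # Population variance of total scores, matching the p*q item variances.
    total_var = pvariance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1.0 - sum_pq / total_var)


# Hypothetical exam: four students, three items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(answers))
```

KR-20 is the special case of Cronbach's alpha for items scored 0 or 1.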
Reliability of a visual scoring system with fluorescent tracers to assess dermal pesticide exposure.

PubMed

Aragon, Aurora; Blanco, Luis; Lopez, Lylliam; Liden, Carola; Nise, Gun; Wesseling, Catharina

2004-10-01

We modified Fenske's semi-quantitative 'visual scoring system' of fluorescent tracer deposited on the skin of pesticide applicators and evaluated its reproducibility in the Nicaraguan setting. The body surface of 33 farmers, divided into 31 segments, was videotaped in the field after spraying with a pesticide solution containing a fluorescent tracer. A portable UV lamp was used for illumination in a foldaway dark room. The videos of five farmers were randomly selected. The scoring was based on a matrix with extension of fluorescent patterns (scale 0-5) on the ordinate and intensity (scale 0-5) on the abscissa, with the product of these two ranks as the final score for each body segment (0-25). Five medical students rated and evaluated the quality of 155 video images having undergone 4 h of training. Cronbach alpha coefficients and two-way random effects intraclass correlation coefficients (ICC) with absolute agreement were computed to assess inter-rater reliability. Consistency was high (Cronbach alpha = 0.96), but the scores differed substantially between raters. The overall ICC was satisfactory [0.75; 95% confidence interval (CI) = 0.62-0.83], but it was lower for intensity (0.54; 95% CI = 0.40-0.66) and higher for extension (0.80; 95% CI = 0.71-0.86). ICCs were lowest for images with low scores and evaluated as low quality, and highest for images with high scores and high quality. Inter-rater reliability coefficients indicate repeatability of the scoring system. However, field conditions for recording fluorescence should be improved to achieve higher quality images, and training should emphasize a better mechanism for the reading of body areas with low contamination.
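The contrast reported in the abstract above, high consistency but scores that differ between raters, can be reproduced with a small sketch. It computes Cronbach's alpha across raters and a two-way random effects, absolute agreement, single-rater ICC, often written ICC(2,1), for the same table. The function names and ratings are hypothetical, and the single-rater form is an assumption, since the abstract does not say whether single or average measures were reported.

```python
from statistics import mean, variance


def cronbach_alpha(ratings):
    """ratings: one row per rated image, one column per rater."""
    k = len(ratings[0])
    rater_var = sum(variance(col) for col in zip(*ratings))
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1.0 - rater_var / total_var)


def icc_absolute_agreement(ratings):
    """Two-way random effects, absolute agreement, single rater: ICC(2,1)."""
    n, k = len(ratings), len(ratings[0])
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # images
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)  # raters
    sse = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: raters rank the images identically, but the second
# and third raters score every image 2 and 4 points higher than the first.
scores = [[1, 3, 5], [2, 4, 6], [3, 5, 7], [4, 6, 8], [5, 7, 9]]
print(round(cronbach_alpha(scores), 3))
print(round(icc_absolute_agreement(scores), 3))
```

Alpha ignores constant offsets between raters, whereas the absolute agreement ICC counts them as disagreement, so the two coefficients can diverge sharply on the same data.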

Reliability and validity of 12-item Short-Form health survey (SF-12) for the health status of Chinese community elderly population in Xujiahui district of Shanghai.

PubMed

Shou, Juan; Ren, Limin; Wang, Haitang; Yan, Fei; Cao, Xiaoyun; Wang, Hui; Wang, Zhiliang; Zhu, Shanzhu; Liu, Yao

2016-04-01

The 12-item Short-Form Health Survey (SF-12) is the abridged practical version of SF-36. This cross-sectional study aimed to assess the reliability and validity of SF-12 for the health status of the Chinese community elderly population. The Chinese community elderly people in Xujiahui district of Shanghai were investigated. The internal consistency reliability was assessed using Cronbach's alpha and split-half reliability coefficients. Construct validity was analyzed using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Spearman's correlation coefficient (ρ) was used for the evaluation of criterion, convergent, and discriminant validity with Spearman's ρ ≥ 0.4 as satisfactory. Comparisons of the SF-12 summary scores among populations that differed in demographics were performed for discriminant validity. A total of 1343 individuals aged ≥60 and <85 years old (response rate: 91.3%) were analyzed. The Cronbach's α value (0.910) and the split-half reliability coefficient (0.812) reflected satisfactory internal consistency reliability of SF-12. EFA extracted a two-factor model (physical and mental health). About 60.7% of the total variance was explained by the two factors. CFA showed that the two-factor solution provided a good fit to the data. Good convergent validity and discriminant validity of SF-12 were proved by the correlation analyses (Spearman's ρ > 0.4) and the comparisons of the SF-12 summary scores among populations (P < 0.05). SF-12 summary scores were significantly correlated with the SF-36 summary scores (Spearman's ρ > 0.4, P < 0.05). In conclusion, SF-12 had satisfactory reliability and validity in measuring health status of the Chinese community elderly population in Xujiahui district of Shanghai.
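The split-half coefficient mentioned in the abstract above can be sketched as follows. The abstract does not say how the items were split or which correction was applied; an odd/even item split with the Spearman-Brown correction is assumed here, and all names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(responses):
    """responses: one row per respondent, one column per item."""
    first_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(first_half, second_half))


# Hypothetical responses of four people to a four-item scale.
answers = [[1, 1, 2, 2], [2, 2, 3, 3], [3, 3, 4, 4], [4, 5, 5, 5]]
print(round(split_half_reliability(answers), 3))
```

The correction is needed because the raw correlation describes two half-length tests, which are less reliable than the full-length scale.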
MEASURING SPORT-SPECIFIC PHYSICAL ABILITIES IN MALE GYMNASTS: THE MEN'S GYMNASTICS FUNCTIONAL MEASUREMENT TOOL

PubMed Central

Kenyon, Lisa K.; Elliott, James M; Cheng, M. Samuel

2016-01-01

Purpose/Background: Despite the availability of various field-tests for many competitive sports, a reliable and valid test specifically developed for use in men's gymnastics has not yet been developed. The Men's Gymnastics Functional Measurement Tool (MGFMT) was designed to assess sport-specific physical abilities in male competitive gymnasts. The purpose of this study was to develop the MGFMT by establishing a scoring system for individual test items and to initiate the process of establishing test-retest reliability and construct validity. Methods: A total of 83 competitive male gymnasts ages 7-18 underwent testing using the MGFMT. Thirty of these subjects underwent re-testing one week later in order to assess test-retest reliability. Construct validity was assessed using a simple regression analysis between total MGFMT scores and the gymnasts' USA-Gymnastics competitive level to calculate the coefficient of determination (r²). Test-retest reliability was analyzed using Model 1 Intraclass correlation coefficients (ICC). Statistical significance was set at the p<0.05 level. Results: The relationship between total MGFMT scores and subjects' current USA-Gymnastics competitive level was found to be good (r² = 0.63). Reliability testing of the MGFMT composite test score showed excellent test-retest reliability over a one-week period (ICC = 0.97). Test-retest reliability of the individual component tests ranged from good to excellent (ICC = 0.75-0.97). Conclusions: The results of this study provide initial support for the construct validity and test-retest reliability of the MGFMT. Level of Evidence: Level 3. PMID:27999723
The Healthy Eating Index-2010 is a valid and reliable measure of diet quality according to the 2010 Dietary Guidelines for Americans.

PubMed

Guenther, Patricia M; Kirkpatrick, Sharon I; Reedy, Jill; Krebs-Smith, Susan M; Buckman, Dennis W; Dodd, Kevin W; Casavale, Kellie O; Carroll, Raymond J

2014-03-01

The Healthy Eating Index (HEI), a measure of diet quality, was updated to reflect the 2010 Dietary Guidelines for Americans and the accompanying USDA Food Patterns. To assess the validity and reliability of the HEI-2010, exemplary menus were scored and 2 24-h dietary recalls from individuals aged ≥2 y from the 2003-2004 NHANES were used to estimate multivariate usual intake distributions and assess whether the HEI-2010 1) has a distribution wide enough to detect meaningful differences in diet quality among individuals, 2) distinguishes between groups with known differences in diet quality by using t tests, 3) measures diet quality independently of energy intake by using Pearson correlation coefficients, 4) has >1 underlying dimension by using principal components analysis (PCA), and 5) is internally consistent by calculating Cronbach's coefficient α. HEI-2010 scores were at or near the maximum levels for the exemplary menus. The distribution of scores among the population was wide (5th percentile = 31.7; 95th percentile = 70.4). As predicted, men's diet quality (mean HEI-2010 total score = 49.8) was poorer than women's (52.7), younger adults' diet quality (45.4) was poorer than older adults' (56.1), and smokers' diet quality (45.7) was poorer than nonsmokers' (53.3) (P < 0.01). Low correlations with energy were observed for HEI-2010 total and component scores (|r| ≤ 0.21). Cronbach's coefficient α was 0.68, supporting the reliability of the HEI-2010 total score as an indicator of overall diet quality. Nonetheless, PCA indicated multiple underlying dimensions, highlighting the fact that the component scores are equally as important as the total. A comparable reevaluation of the HEI-2005 yielded similar results. This study supports the validity and the reliability of both versions of the HEI.
Reliability and validity of television food advertising questionnaire in Malaysia.

PubMed

Zalma, Abdul Razak; Safiah, Md Yusof; Ajau, Danis; Khairil Anuar, Md Isa

2015-09-01

Interventions to counter the influence of television food advertising amongst children are important. Thus, a reliable and valid instrument to assess its effect is needed. The objective of this study was to determine the reliability and validity of such a questionnaire. The questionnaire was administered twice to 32 primary schoolchildren aged 10-11 years in Selangor, Malaysia. The interval between the first and second administration was 2 weeks. The test-retest method was used to examine the reliability of the questionnaire. Intra-rater reliability was determined by kappa coefficient and internal consistency by Cronbach's alpha coefficient. Construct validity was evaluated using factor analysis. The test-retest correlation showed moderate-to-high reliability for all scores (r = 0.40, p = 0.02 to r = 0.95, p = 0.00), with one exception, consumption of fast foods (r = 0.24, p = 0.20). Kappa coefficient showed acceptable-to-strong intra-rater reliability (K = 0.40-0.92), except for two items under knowledge on television food advertising (K = 0.26 and K = 0.21) and one item under preference for healthier foods (K = 0.33). Cronbach's alpha coefficient indicated acceptable internal consistency for all scores (0.45-0.60). After deleting two items under Consumption of Commonly Advertised Food, the items showed moderate-to-high loading (0.52, 0.84, 0.42 and 0.42) with the Scree plot showing that there was only one factor. The Kaiser-Meyer-Olkin was 0.60, showing that the sample was adequate for factor analysis. The questionnaire on television food advertising is reliable and valid to assess the effect of media literacy education on television food advertising on schoolchildren. © The Author (2013). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Validation of the Korean-version of the nonmotor symptoms scale for Parkinson's disease.

PubMed

Koh, Seong-Beom; Kim, Jae Woo; Ma, Hyeo-Il; Ahn, Tae-Beom; Cho, Jin Whan; Lee, Phil Hyu; Chung, Sun Ju; Kim, Joong-Seok; Kwon, Do Young; Baik, Jong Sam

2012-12-01

Non-motor symptoms are common in Parkinson's disease (PD), and are the primary cause of disability in many PD patients. Our aim in this study was to translate the original non-motor symptoms scale for PD (NMSS), which was written in English, into Korean (K-NMSS), and to evaluate its reliability and validity for use with Korean-speaking patients with PD. In total, 102 patients with PD from 9 movement disorders sections of university teaching hospitals in Korea were enrolled in this study. They were assessed using the K-NMSS, the Unified Parkinson's Disease Rating Scale (UPDRS), the Korean version of the Mini-Mental Status Examination (K-MMSE), the Korean version of the Montgomery-Asberg Depression Rating Scale (K-MADS), the Epworth Sleepiness Scale (ESS), and Parkinson's Disease Questionnaire 39 (PDQ39). Test-retest reliability was assessed over a time interval of 10-14 days in all but one patient. The K-NMSS was administered to 102 patients with PD. The internal consistency and reliability of this tool was 0.742 (mean Cronbach's α-coefficient). The test-retest correlation reliability was 0.941 (Guttman split-half coefficient). There was a moderate correlation between the total K-NMSS score and the scores for UPDRS part I [Spearman's rank correlation coefficient, (rS)=0.521, p<0.001] and UPDRS part II (rS=0.464, p=0.001), but there was only a weak correlation between the total K-NMSS score and the UPDRS part III score (rS=0.288, p=0.003). The total K-NMSS score was significantly correlated with the K-MADS (rS=0.594, p<0.001), K-MMSE (rS=-0.291, p=0.003), and ESS (rS=0.348, p<0.001). The total K-NMSS score was also significantly and positively correlated with the PDQ39 score (rS=0.814, p<0.001). The K-NMSS exhibited good reliability and validity for the assessment of non-motor symptoms in Korean PD patients.
Validation of the Korean-Version of the Nonmotor Symptoms Scale for Parkinson's Disease

PubMed Central

Koh, Seong-Beom; Kim, Jae Woo; Ma, Hyeo-Il; Ahn, Tae-Beom; Cho, Jin Whan; Lee, Phil Hyu; Chung, Sun Ju; Kim, Joong-Seok; Kwon, Do Young

2012-01-01

Background and Purpose: Non-motor symptoms are common in Parkinson's disease (PD), and are the primary cause of disability in many PD patients. Our aim in this study was to translate the original non-motor symptoms scale for PD (NMSS), which was written in English, into Korean (K-NMSS), and to evaluate its reliability and validity for use with Korean-speaking patients with PD. Methods: In total, 102 patients with PD from 9 movement disorders sections of university teaching hospitals in Korea were enrolled in this study. They were assessed using the K-NMSS, the Unified Parkinson's Disease Rating Scale (UPDRS), the Korean version of the Mini-Mental Status Examination (K-MMSE), the Korean version of the Montgomery-Asberg Depression Rating Scale (K-MADS), the Epworth Sleepiness Scale (ESS), and Parkinson's Disease Questionnaire 39 (PDQ39). Test-retest reliability was assessed over a time interval of 10-14 days in all but one patient. Results: The K-NMSS was administered to 102 patients with PD. The internal consistency and reliability of this tool was 0.742 (mean Cronbach's α-coefficient). The test-retest correlation reliability was 0.941 (Guttman split-half coefficient). There was a moderate correlation between the total K-NMSS score and the scores for UPDRS part I [Spearman's rank correlation coefficient, (rS)=0.521, p<0.001] and UPDRS part II (rS=0.464, p=0.001), but there was only a weak correlation between the total K-NMSS score and the UPDRS part III score (rS=0.288, p=0.003). The total K-NMSS score was significantly correlated with the K-MADS (rS=0.594, p<0.001), K-MMSE (rS=-0.291, p=0.003), and ESS (rS=0.348, p<0.001). The total K-NMSS score was also significantly and positively correlated with the PDQ39 score (rS=0.814, p<0.001). Conclusions: The K-NMSS exhibited good reliability and validity for the assessment of non-motor symptoms in Korean PD patients. PMID:23323136
Validity and reliability of a pilot scale for assessment of multiple system atrophy symptoms.

PubMed

Matsushima, Masaaki; Yabe, Ichiro; Takahashi, Ikuko; Hirotani, Makoto; Kano, Takahiro; Horiuchi, Kazuhiro; Houzen, Hideki; Sasaki, Hidenao

2017-01-01

Multiple system atrophy (MSA) is a rare progressive neurodegenerative disorder for which a brief yet sensitive scale is required for use in clinical trials and general screening. We previously compared several scales for the assessment of MSA symptoms and devised an eight-item pilot scale with a large standardized response mean [handwriting, finger taps, transfers, standing with feet together, turning trunk, turning 360°, gait, body sway]. The aim of the present study is to investigate the validity and reliability of a simple pilot scale for assessment of multiple system atrophy symptoms. Thirty-two patients with MSA (15 male/17 female; 20 cerebellar subtype [MSA-C]/12 parkinsonian subtype [MSA-P]) were prospectively registered between January 1, 2014 and February 28, 2015. Patients were evaluated by two independent raters using the Unified MSA Rating Scale (UMSARS), Scale for Assessment and Rating of Ataxia (SARA), and the pilot scale. Correlations between UMSARS, SARA, pilot scale scores, intraclass correlation coefficients (ICCs), and Cronbach's alpha coefficients were calculated. Pilot scale scores significantly correlated with scores for UMSARS Parts I, II, and IV as well as with SARA scores. Intra-rater and inter-rater ICCs and Cronbach's alpha coefficients remained high (> 0.94) for all measures. The results of the present study indicate the validity and reliability of the eight-item pilot scale, particularly for the assessment of symptoms in patients with early state multiple system atrophy.
The Korean version of the Carpal Tunnel Questionnaire. Cross cultural adaptation, reliability, validity and responsiveness.

PubMed

Kim, J K; Lim, H M

2015-02-01

The purpose of this study was to translate and culturally adapt the Carpal Tunnel Questionnaire to produce an equivalent Korean version. A total of 53 patients completed the Korean version of the Carpal Tunnel Questionnaire pre-operatively and 3 months after open carpal tunnel release. All 53 also completed the Korean version of the Disabilities of Arm, Shoulder, and Hand questionnaire pre-operatively and 3 months post-operatively. Reliability was measured by determining the test-retest reliability and internal consistency. Test-retest reliability was assessed using intraclass correlation coefficients and paired t-tests, and internal consistency using Cronbach's alpha coefficients. Pearson correlation analysis was carried out on the Korean version of the Carpal Tunnel Questionnaire scores and the Korean version of the Disabilities of Arm, Shoulder, and Hand scores to assess construct validity. Responsiveness was evaluated using effect sizes and standardized response means. The reliability of the Korean version of the Carpal Tunnel Questionnaire was good. The scores in the Korean version of the Disabilities of Arm, Shoulder, and Hand strongly correlated with the scores in the Korean version of the Carpal Tunnel Questionnaire. Standardized response mean and effect size were both large for the Korean version of the Carpal Tunnel Questionnaire. The study shows that the Korean version of the Carpal Tunnel Questionnaire is a reliable, valid and responsive instrument for measuring outcomes in carpal tunnel syndrome. © The Author(s) 2014.
Adaptation and validation of the Spanish version of the Actinic Keratosis Quality of Life questionnaire.

PubMed

Longo Imedio, Isabel; Serra-Guillén, Carlos

2016-01-01

While there are questionnaires for evaluating the effects of skin cancer on patient quality of life, there are no specific questionnaires available in Spanish for evaluating quality of life in patients with actinic keratosis. The aim of this study was to translate and culturally adapt the Actinic Keratosis Quality of Life (AKQoL) questionnaire into Spanish. The original questionnaire was translated into Spanish following the guidelines for the cross-cultural adaptation of self-report measures. Several measures of general reliability and validity were calculated, including Cronbach α for internal consistency and the Spearman rank-order correlation coefficient and a Bland-Altman plot for test-retest reliability. To test concurrent validity, we used the Pearson correlation coefficient to measure the correlation between AKQoL and Skindex-29 scores. The final version of the questionnaire was administered to 621 patients with actinic keratosis, who scored a mean (SD) of 5.25 (4.73) points (total possible score, 0-25). The Cronbach α reliability coefficient was 0.84. The correlation between the mean (SD) score on the Skindex-29 (1.87 [4.07]) and on the AKQoL (1.97 [2.98]) was 0.344 (P=.002, Spearman's rho), with a proportion of shared variance of 11.8%. The translation, cross-cultural adaptation, and validation of the original AKQoL produced a reliable, easily understandable questionnaire for evaluating the impact of actinic keratosis on the quality of life of patients in our setting. Copyright © 2016 AEDV. Published by Elsevier España, S.L.U. All rights reserved.
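The proportion of shared variance quoted in the abstract above is consistent with squaring the reported correlation coefficient. The following is a minimal sketch of that arithmetic; the function name is hypothetical and is not taken from the study.

```python
def shared_variance_percent(r):
    """Return the proportion of shared variance (r squared) as a percentage."""
    return r ** 2 * 100


# Correlation reported in the abstract between Skindex-29 and AKQoL scores.
reported_r = 0.344
shared = round(shared_variance_percent(reported_r), 1)
print(shared)  # 11.8
```

Squaring a correlation of this size shows why a statistically significant coefficient can still correspond to a modest overlap between two instruments.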
Validation of a Detailed Scoring Checklist for Use During Advanced Cardiac Life Support Certification

PubMed Central

McEvoy, Matthew D.; Smalley, Jeremy C.; Nietert, Paul J.; Field, Larry C.; Furse, Cory M.; Blenko, John W.; Cobb, Benjamin G.; Walters, Jenna L.; Pendarvis, Allen; Dalal, Nishita S.; Schaefer, John J.

2012-01-01

Introduction Defining valid, reliable, defensible, and generalizable standards for the evaluation of learner performance is a key issue in assessing both baseline competence and mastery in medical education. However, prior to setting these standards of performance, the reliability of the scores yielding from a grading tool must be assessed. Accordingly, the purpose of this study was to assess the reliability of scores generated from a set of grading checklists used by non-expert raters during simulations of American Heart Association (AHA) MegaCodes. Methods The reliability of scores generated from a detailed set of checklists, when used by four non-expert raters, was tested by grading team leader performance in eight MegaCode scenarios. Videos of the scenarios were reviewed and rated by trained faculty facilitators and by a group of non-expert raters. The videos were reviewed “continuously” and “with pauses.” Two content experts served as the reference standard for grading, and four non-expert raters were used to test the reliability of the checklists. Results Our results demonstrate that non-expert raters are able to produce reliable grades when using the checklists under consideration, demonstrating excellent intra-rater reliability and agreement with a reference standard. The results also demonstrate that non-expert raters can be trained in the proper use of the checklist in a short amount of time, with no discernible learning curve thereafter. Finally, our results show that a single trained rater can achieve reliable scores of team leader performance during AHA MegaCodes when using our checklist in continuous mode, as measures of agreement in total scoring were very strong (Lin’s Concordance Correlation Coefficient = 0.96; Intraclass Correlation Coefficient = 0.97). 
Discussion We have shown that our checklists can yield reliable scores, are appropriate for use by non-expert raters, and are able to be employed during continuous assessment of team leader performance during the review of a simulated MegaCode. This checklist may be more appropriate for use by Advanced Cardiac Life Support (ACLS) instructors during MegaCode assessments than current tools provided by the AHA. PMID:22863996
Standardizing an approach to the evaluation of implementation science proposals.

PubMed

Crable, Erika L; Biancarelli, Dea; Walkey, Allan J; Allen, Caitlin G; Proctor, Enola K; Drainoni, Mari-Lynn

2018-05-29

The fields of implementation and improvement sciences have experienced rapid growth in recent years. However, research that seeks to inform health care change may have difficulty translating core components of implementation and improvement sciences within the traditional paradigms used to evaluate efficacy and effectiveness research. A review of implementation and improvement sciences grant proposals within an academic medical center using a traditional National Institutes of Health framework highlighted the need for tools that could assist investigators and reviewers in describing and evaluating proposed implementation and improvement sciences research. We operationalized existing recommendations for writing implementation science proposals as the ImplemeNtation and Improvement Science Proposals Evaluation CriTeria (INSPECT) scoring system. The resulting system was applied to pilot grants submitted to a call for implementation and improvement science proposals at an academic medical center. We evaluated the reliability of the INSPECT system using Krippendorff's alpha coefficients and explored the utility of the INSPECT system to characterize common deficiencies in implementation research proposals. We scored 30 research proposals using the INSPECT system. Proposals received a median cumulative score of 7 out of a possible score of 30. Across individual elements of INSPECT, proposals scored highest for criteria rating evidence of a care or quality gap. Proposals generally performed poorly on all other criteria. Most proposals received scores of 0 for criteria identifying an evidence-based practice or treatment (50%), conceptual model and theoretical justification (70%), setting's readiness to adopt new services/treatment/programs (54%), implementation strategy/process (67%), and measurement and analysis (70%). 
Inter-coder reliability testing showed excellent reliability (Krippendorff's alpha coefficient 0.88) for the application of the scoring system overall and demonstrated reliability scores ranging from 0.77 to 0.99 for individual elements. The INSPECT scoring system presents a new scoring criteria with a high degree of inter-rater reliability and utility for evaluating the quality of implementation and improvement sciences grant proposals.
[Study of functional rating scale for amyotrophic lateral sclerosis: revised ALSFRS(ALSFRS-R) Japanese version].

PubMed

Ohashi, Y; Tashiro, K; Itoyama, Y; Nakano, I; Sobue, G; Nakamura, S; Sumino, S; Yanagisawa, N

2001-04-01

Amyotrophic lateral sclerosis (ALS) is a progressive, degenerative, fatal disease of the motor neuron. No efficacious therapy is available to slow the progressive loss of function, but several new approaches, including neurotrophic factors, antioxidants and glutamate antagonists, are currently being evaluated as potential therapies. Mortality and/or time to tracheostomy, muscle strength and pulmonary function are used as primary endpoints in clinical trials for treatment of ALS. The effect of new therapies on the quality of patients' lives is also important, so we sought to develop a rating scale to measure it. The revised ALS Functional Rating Scale (ALSFRS-R), which adds items to the ALSFRS to enhance the ability to assess respiratory symptoms, is an assessment determining the degree of impairment in ALS patients' abilities to function independently in activities of daily living. It consists of 12 items to evaluate bulbar function, motor function and respiratory function, and each item is scored from 0 (unable) to 4 (normal). We translated the English scale into Japanese with minor modifications to account for intercultural differences, and we examined the reliability of the translated scale. As a measure of reliability, the intraclass correlation coefficient (ICC) was evaluated for the total score and the Kappa coefficient proposed by Cohen and Kraemer was calculated for each item. Moreover, we examined sensitivity to clinical change over time and carried out a factor analysis to analyze the factorial structure. The subjects were 27 ALS patients, and each was scored twice for reliability or three times for sensitivity by 2 to 5 neurologists and, if possible, nurses. The ICC for the total score was 0.97 (95% C.I.; 0.94-0.98). The extended Kappa coefficients were 0.48 to 1.00 for inter-rater reliability and the averaged Kappa coefficients were 0.63 to 1.00 for intra-rater reliability.
Concerning the factorial structure, the contribution of the first factor (the first principal component) was 53.5% in the principal factor solution. The factor loadings of items were 0.52-0.91 except for "salivation", and this factor, almost equal to the simple sum of all items, was interpreted as the general degree of deterioration. The promax rotation revealed the originally supposed factor structure with 3 factors (groups of items): neuromuscular function, respiratory function and bulbar function. The rating scale correlated with the Global clinical impression of change (GCIC) scored by neurologists and declined with time, indicating its sensitivity to change. On the basis of these results, the ALSFRS-R (Japanese version) is considered to be sufficiently reliable for clinical use.
Reliability of a novel, semi-quantitative scale for classification of structural brain magnetic resonance imaging in children with cerebral palsy.

PubMed

Fiori, Simona; Cioni, Giovanni; Klingels, Katrjin; Ortibus, Els; Van Gestel, Leen; Rose, Stephen; Boyd, Roslyn N; Feys, Hilde; Guzzetta, Andrea

2014-09-01

To describe the development of a novel rating scale for classification of brain structural magnetic resonance imaging (MRI) in children with cerebral palsy (CP) and to assess its interrater and intrarater reliability. The scale consists of three sections. Section 1 contains descriptive information about the patient and MRI. Section 2 contains the graphical template of brain hemispheres onto which the lesion is transposed. Section 3 contains the scoring system for the quantitative analysis of the lesion characteristics, grouped into different global scores and subscores that assess separately side, regions, and depth. A larger interrater and intrarater reliability study was performed in 34 children with CP (22 males, 12 females; mean age at scan of 9 y 5 mo [SD 3 y 3 mo], range 4 y-16 y 11 mo; Gross Motor Function Classification System level I, [n=22], II [n=10], and level III [n=2]). Very high interrater and intrarater reliability of the total score was found with indices above 0.87. Reliability coefficients of the lobar and hemispheric subscores ranged between 0.53 and 0.95. Global scores for hemispheres, basal ganglia, brain stem, and corpus callosum showed reliability coefficients above 0.65. This study presents the first visual, semi-quantitative scale for classification of brain structural MRI in children with CP. The high degree of reliability of the scale supports its potential application for investigating the relationship between brain structure and function and examining treatment response according to brain lesion severity in children with CP. © 2014 Mac Keith Press.
Reliability and Validity of a Japanese-language and Culturally Adapted Version of the Musculoskeletal Tumor Society Scoring System for the Lower Extremity.

PubMed

Iwata, Shintaro; Uehara, Kosuke; Ogura, Koichi; Akiyama, Toru; Shinoda, Yusuke; Yonemoto, Tsukasa; Kawai, Akira

2016-09-01

The Musculoskeletal Tumor Society (MSTS) scoring system is a widely used functional evaluation tool for patients treated for musculoskeletal tumors. Although the MSTS scoring system has been validated in English and Brazilian Portuguese, a Japanese version of the MSTS scoring system has not yet been validated. We sought to determine whether a Japanese-language translation of the MSTS scoring system for the lower extremity had (1) sufficient reliability and internal consistency, (2) adequate construct validity, and (3) reasonable criterion validity compared with the Toronto Extremity Salvage Score (TESS) and SF-36 using psychometric analysis. The Japanese version of the MSTS scoring system was developed using accepted guidelines, which included translation of the English version of the MSTS into Japanese by five native Japanese bilingual musculoskeletal oncology surgeons and integrated into one document. One hundred patients with a diagnosis of intermediate or malignant bone or soft tissue tumors located in the lower extremity and who had undergone tumor resection with or without reconstruction or amputation participated in this study. Reliability was evaluated by test-retest analysis, and internal consistency was established by Cronbach's alpha coefficient. Construct validity was evaluated using the principal factor analysis and Akaike information criterion network. Criterion validity was evaluated by comparing the MSTS scoring system with the TESS and SF-36. Test-retest analysis showed a high intraclass correlation coefficient (0.92; 95% CI, 0.88-0.95), indicating high reliability of the Japanese version of the MSTS scoring system, although a considerable ceiling effect was observed, with 23 patients (23%) given the maximum score. Cronbach's alpha coefficient was 0.87 (95% CI, 0.82-0.90), suggesting a high level of internal consistency. 
Factor analysis revealed that all items had high loading values and communalities; we identified a central role for the items "walking" and "gait" according to the Akaike information criterion network. The total MSTS score was correlated with that of the TESS (r = 0.81; 95% CI, 0.73-0.87; p < 0.001) and the physical component summary and physical functioning of the SF-36. The Japanese-language translation of the MSTS scoring system for the lower extremity has sufficient reliability and reasonable validity. Nevertheless, the observation of a ceiling effect suggests poor ability of this system to discriminate from among patients who have a high level of function.
Scoring ultrasound synovitis in rheumatoid arthritis: a EULAR-OMERACT ultrasound taskforce-Part 2: reliability and application to multiple joints of a standardised consensus-based scoring system

PubMed Central

Terslev, Lene; Naredo, Esperanza; Aegerter, Philippe; Wakefield, Richard J; Backhaus, Marina; Balint, Peter; Bruyn, George A W; Iagnocco, Annamaria; Jousse-Joulin, Sandrine; Schmidt, Wolfgang A; Szkudlarek, Marcin; Conaghan, Philip G; Filippucci, Emilio

2017-01-01

Objectives To test the reliability of new ultrasound (US) definitions and quantification of synovial hypertrophy (SH) and power Doppler (PD) signal, separately and in combination, in a range of joints in patients with rheumatoid arthritis (RA) using the European League Against Rheumatisms–Outcomes Measures in Rheumatology (EULAR-OMERACT) combined score for PD and SH. Methods A stepwise approach was used: (1) scoring static images of metacarpophalangeal (MCP) joints in a web-based exercise and subsequently when scanning patients; (2) scoring static images of wrist, proximal interphalangeal joints, knee and metatarsophalangeal joints in a web-based exercise and subsequently when scanning patients using different acquisitions (standardised vs usual practice). For reliability, kappa coefficients (κ) were used. Results Scoring MCP joints in static images showed substantial intraobserver variability but good to excellent interobserver reliability. In patients, intraobserver reliability was the same for the two acquisition methods. Interobserver reliability for SH (κ=0.87) and PD (κ=0.79) and the EULAR-OMERACT combined score (κ=0.86) were better when using a ‘standardised’ scan. For the other joints, the intraobserver reliability was excellent in static images for all scores (κ=0.8–0.97) and the interobserver reliability marginally lower. When using standardised scanning in patients, the intraobserver was good (κ=0.64 for SH and the EULAR-OMERACT combined score, 0.66 for PD) and the interobserver reliability was also good especially for PD (κ range=0.41–0.92). Conclusion The EULAR-OMERACT score demonstrated moderate-good reliability in MCP joints using a standardised scan and is equally applicable in non-MCP joints. This scoring system should underpin improved reliability and consequently the responsiveness of US in RA clinical trials. PMID:28948984
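As an illustration of the kind of kappa coefficient reported above, the following is a minimal sketch of unweighted Cohen's kappa for two raters assigning categorical grades. The abstract does not state which kappa variant was used (weighted variants are common for ordinal grades), so the unweighted form, the function name and the example grades are all assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical grades (0-3) given by two observers to eight joints.
observer_1 = [0, 1, 2, 3, 0, 1, 2, 3]
observer_2 = [0, 1, 2, 3, 0, 1, 2, 2]
kappa = cohen_kappa(observer_1, observer_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually lower than the simple percentage of matching scores.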
A Generalizability Analysis of Score Consistency for the Balanced Inventory of Desirable Responding

ERIC Educational Resources Information Center

Vispoel, Walter P.; Tao, Shuqin

2013-01-01

Our goal in this investigation was to evaluate the reliability of scores from the Balanced Inventory of Desirable Responding (BIDR) more comprehensively than in prior research using a generalizability-theory framework based on both dichotomous and polytomous scoring of items. Generalizability coefficients accounting for specific-factor, transient,…
A reliability generalization meta-analysis of coefficient alpha and test-retest coefficient for the aging males' symptoms (AMS) scale.

PubMed

Lee, Chin-Pang; Chiu, Yu-Wen; Chu, Chun-Lin; Chen, Yu; Jiang, Kun-Hao; Chen, Jiun-Liang; Chen, Ching-Yen

2016-12-01

The aging males' symptoms (AMS) scale is an instrument used to determine the health-related quality of life in adult and elderly men. The purpose of this study was to synthesize internal consistency (Cronbach's alpha) and test-retest reliability for the AMS scale and its three subscales. Of the 123 studies reviewed, 12 provided alpha coefficients which were then used in the meta-analyses of internal consistency. Seven of the 12 included studies provided test-retest coefficients, and these were used in the meta-analyses of test-retest reliability. The AMS scale had excellent internal consistency [α = 0.89 (95% CI 0.88-0.90)]; the mean alpha estimates across the AMS subscales ranged from 0.79 to 0.82. The AMS scale also had good test-retest reliability [r = 0.85 (95% CI 0.82-0.88)]; the test-retest reliability coefficients of the AMS subscales ranged from 0.76 to 0.83. There was significant heterogeneity among the included studies. The AMS scale and the three subscales had fairly good internal consistency and test-retest reliability. Future psychometric studies of the AMS scale should report important characteristics of the participants, details of item scores, and test-retest reliability.
Interrater reliability of early intervention providers scoring the alberta infant motor scale.

PubMed

Blanchard, Y; Neilan, E; Busanich, J; Garavuso, L; Klimas, D

2004-01-01

This study was designed to examine the interrater reliability of early intervention providers scoring of the Alberta Infant Motor Scale (AIMS) and to examine whether training on the AIMS would improve their interrater reliability. Eight early intervention providers were randomly assigned to two groups. Participants in Group 1 scored the AIMS on seven videotapes of infants prior to receiving training and after training on another set of seven videotapes of infants. Participants in Group 2 scored the AIMS on all 14 videotapes of the infants after receiving training. Overall interrater reliability before and after training was high with intraclass correlation coefficients ranging from 0.98 to 0.99. Detailed examination of the results showed that training improved the reliability of the supine subscale in a subgroup of infants between the ages of five and seven months. Training also had an effect on the classification of infants as normal or abnormal in their motor development based on their percentile rankings. The AIMS manual provides sufficient information to attain high interrater reliability without training, but revisions regarding scoring are strongly recommended.
Investigating the technical adequacy of curriculum-based measurement in written expression for students who are deaf or hard of hearing.

PubMed

Cheng, Shu-Fen; Rose, Susan

2009-01-01

This study investigated the technical adequacy of curriculum-based measures of written expression (CBM-W) in terms of writing prompts and scoring methods for deaf and hard-of-hearing students. Twenty-two students at the secondary school-level completed 3-min essays within two weeks, which were scored for nine existing and alternative curriculum-based measurement (CBM) scoring methods. The technical features of the nine scoring methods were examined for interrater reliability, alternate-form reliability, and criterion-related validity. The existing CBM scoring method--number of correct minus incorrect word sequences--yielded the highest reliability and validity coefficients. The findings from this study support the use of the CBM-W as a reliable and valid tool for assessing general writing proficiency with secondary students who are deaf or hard of hearing. The CBM alternative scoring methods that may serve as additional indicators of written expression include correct subject-verb agreements, correct clauses, and correct morphemes.
Cross-cultural adaptation, reliability and validity of the Turkish version of the Hospital for Special Surgery (HSS) Knee Score.

PubMed

Narin, Selnur; Unver, Bayram; Bakırhan, Serkan; Bozan, Ozgür; Karatosun, Vasfi

2014-01-01

The purpose of this study was to adapt the English version of the Hospital for Special Surgery (HSS) knee score for use in a Turkish population and to evaluate its validity, reliability and cultural adaptation. Standard forward-back translation of the HSS knee score was performed and the Turkish version was applied in 73 patients. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Mini-Mental State Examination and sit-to-stand test were also performed and analyzed. Internal consistency reliability was tested using Cronbach's alpha. The intraclass correlation coefficient (ICC) was used to calculate the test-retest reliability at one-week intervals. Validity was assessed by calculating the Pearson correlation between the HSS, WOMAC and sit-to-stand test scores. The ICC ranged from 0.98 to 0.99 with high internal consistency (Cronbach's alpha: 0.87). The WOMAC score correlated with total HSS score (r: -0.80, p<0.001) and sit-to-stand score (r: 0.12, p: 0.312). The Turkish version of the HSS knee score is reliable and valid in evaluating the total knee arthroplasty in Turkish patients.
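Several of the records above report internal consistency as Cronbach's alpha. The following is a minimal sketch of the standard raw-score formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the example responses are hypothetical and are not data from any of the studies listed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses from five respondents to a three-item scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
```

This form applies to raw summed scores; it says nothing about test-retest or inter-rater reliability, which the studies above assess separately with intraclass correlation coefficients.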

Sample size planning for composite reliability coefficients: accuracy in parameter estimation via narrow confidence intervals.

PubMed

Terry, Leann; Kelley, Ken

2012-11-01

Composite measures play an important role in psychology and related disciplines. Composite measures almost always have error. Correspondingly, it is important to understand the reliability of the scores from any particular composite measure. However, the point estimates of the reliability of composite measures are fallible and thus all such point estimates should be accompanied by a confidence interval. When confidence intervals are wide, there is much uncertainty in the population value of the reliability coefficient. Given the importance of reporting confidence intervals for estimates of reliability, coupled with the undesirability of wide confidence intervals, we develop methods that allow researchers to plan sample size in order to obtain narrow confidence intervals for population reliability coefficients. We first discuss composite reliability coefficients and then provide a discussion on confidence interval formation for the corresponding population value. Using the accuracy in parameter estimation approach, we develop two methods to obtain accurate estimates of reliability by planning sample size. The first method provides a way to plan sample size so that the expected confidence interval width for the population reliability coefficient is sufficiently narrow. The second method ensures that the confidence interval width will be sufficiently narrow with some desired degree of assurance (e.g., 99% assurance that the 95% confidence interval for the population reliability coefficient will be less than W units wide). The effectiveness of our methods was verified with Monte Carlo simulation studies. We demonstrate how to easily implement the methods with easy-to-use and freely available software. ©2011 The British Psychological Society.
Test-retest reliability of the Capute scales for neurodevelopmental screening of a high risk sample: Impact of test-retest interval and degree of neonatal risk.

PubMed

McCurdy, M; Bellows, A; Deng, D; Leppert, M; Mahone, E; Pritchard, A

2015-01-01

Reliable and valid screening and assessment tools are necessary to identify children at risk for neurodevelopmental disabilities who may require additional services. This study evaluated the test-retest reliability of the Capute Scales in a high-risk sample, hypothesizing adequate reliability across 6- and 12-month intervals. Capute Scales scores (N = 66) were collected via retrospective chart review from a NICU follow-up clinic within a large urban medical center spanning three age-ranges: 12-18, 19-24, and 25-36 months. On average, participants were classified as very low birth weight and premature. Reliability of the Capute Scales was evaluated with intraclass correlation coefficients across length of test-retest interval, age at testing, and degree of neonatal complications. The Capute Scales demonstrated high reliability, regardless of length of test-retest interval (ranging from 6 to 14 months) or age of participant, for all index scores, including overall Developmental Quotient (DQ), language-based skill index (CLAMS) and nonverbal reasoning index (CAT). Linear regressions revealed that greater neonatal risk was related to poorer test-retest reliability; however, reliability coefficients remained strong. The Capute Scales afford clinicians a reliable and valid means of screening and assessing for neurodevelopmental delay within high-risk infant populations.
The reliability of three psoriasis assessment tools: Psoriasis area and severity index, body surface area and physician global assessment.

PubMed

Bożek, Agnieszka; Reich, Adam

2017-08-01

A wide variety of psoriasis assessment tools have been proposed to evaluate the severity of psoriasis in clinical trials and daily practice. The most frequently used clinical instrument is the psoriasis area and severity index (PASI); however, none of the currently published severity scores used for psoriasis meets all the validation criteria required for an ideal score. The aim of this study was to compare and assess the reliability of 3 commonly used assessment instruments for psoriasis severity: the psoriasis area and severity index (PASI), body surface area (BSA) and physician global assessment (PGA). On the scoring day, 10 trained dermatologists evaluated 9 adult patients with plaque-type psoriasis using the PASI, BSA and PGA. All the subjects were assessed twice by each physician. Correlations between the assessments were analyzed using the Pearson correlation coefficient. Intra-class correlation coefficient (ICC) was calculated to analyze intra-rater reliability, and the coefficient of variation (CV) was used to assess inter-rater variability. Significant correlations were observed among the 3 scales in both assessments. In all 3 scales the ICCs were > 0.75, indicating high intra-rater reliability. The highest ICC was for the BSA (0.96) and the lowest one for the PGA (0.87). The CV for the PGA and PASI were 29.3 and 36.9, respectively, indicating moderate inter-rater variability. The CV for the BSA was 57.1, indicating high inter-rater variability. Comparing the PASI, PGA and BSA, it was shown that the PGA had the highest inter-rater reliability, whereas the BSA had the highest intra-rater reliability. The PASI showed intermediate values in terms of inter- and intra-rater reliability. None of the 3 assessment instruments showed a significant advantage over the other. A reliable assessment of psoriasis severity requires the use of several independent evaluations simultaneously.
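The coefficient of variation used above for inter-rater variability is the standard deviation expressed as a percentage of the mean. The following is a minimal sketch, assuming the sample standard deviation is used; the abstract does not say how the CV was aggregated across patients, and the function name and example scores are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


# Hypothetical scores given to one patient by four raters.
rater_scores = [10, 12, 8, 10]
cv = coefficient_of_variation(rater_scores)
```

A larger CV means the raters' scores for the same patient are more spread out relative to their average, which is why a higher CV is read as lower inter-rater reliability.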
Reliability testing of a portfolio assessment tool for postgraduate family medicine training in South Africa

PubMed Central

Mash, Bob; Derese, Anselme

2013-01-01

Abstract Background Competency-based education and the validity and reliability of workplace-based assessment of postgraduate trainees have received increasing attention worldwide. Family medicine was recognised as a speciality in South Africa six years ago and a satisfactory portfolio of learning is a prerequisite to sit the national exit exam. A massive scaling up of the number of family physicians is needed in order to meet the health needs of the country. Aim The aim of this study was to develop a reliable, robust and feasible portfolio assessment tool (PAT) for South Africa. Methods Six raters each rated nine portfolios from the Stellenbosch University programme, using the PAT, to test for inter-rater reliability. This rating was repeated three months later to determine test–retest reliability. Following initial analysis and feedback the PAT was modified and the inter-rater reliability again assessed on nine new portfolios. An acceptable intra-class correlation was considered to be > 0.80. Results The total score was found to be reliable, with a coefficient of 0.92. For test–retest reliability, the difference in mean total score was 1.7%, which was not statistically significant. Amongst the subsections, only assessment of the educational meetings and the logbook showed reliability coefficients > 0.80. Conclusion This was the first attempt to develop a reliable, robust and feasible national portfolio assessment tool to assess postgraduate family medicine training in the South African context. The tool was reliable for the total score, but the low reliability of several sections in the PAT helped us to develop 12 recommendations regarding the use of the portfolio, the design of the PAT and the training of raters.
An instrument to characterize the environment for residents' evidence-based medicine learning and practice.

PubMed

Mi, Misa; Moseley, James L; Green, Michael L

2012-02-01

Many residency programs offer training in evidence-based medicine (EBM). However, these curricula often fail to achieve optimal learning outcomes, perhaps because they neglect various contextual factors in the learning environment. We developed and validated an instrument to characterize the environment for EBM learning and practice in residency programs. An EBM Environment Scale was developed following scale development principles. A survey was administered to residents across six programs in primary care specialties at four medical centers. Internal consistency reliability was analyzed with Cronbach's coefficient alpha. Validity was assessed by comparing predetermined subscales with the survey's internal structure as assessed via factor analysis. Scores were also compared for subgroups based on residency program affiliation and residency characteristics. Out of 262 eligible residents, 124 completed the survey (response rate 47%). The overall mean score was 3.89 (standard deviation=0.56). The initial reliability analysis of the 48-item scale had a high reliability coefficient (Cronbach α=.94). Factor analysis and further item analysis resulted in a shorter 36-item scale with a satisfactory reliability coefficient (Cronbach α=.86). Scores were higher for residents with prior EBM training in medical school (4.14 versus 3.62) and in residency (4.25 versus 3.69). If further testing confirms its properties, the EBM Environment Scale may be used to understand the influence of the learning environment on the effectiveness of EBM training. Additionally, it may detect changes in the EBM learning environment in response to programmatic or institutional interventions.
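Cronbach's coefficient alpha, used above for internal consistency, follows the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). A minimal Python sketch is given below; the function name and the small response matrix are hypothetical and unrelated to the EBM Environment Scale data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n = len(rows)
    k = len(rows[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical answers of four respondents to a two-item scale.
responses = [
    [2, 3],
    [4, 4],
    [3, 5],
    [5, 4],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Alpha tends to rise with the number of items, which is consistent with the lower value reported for the shortened 36-item scale.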
The validation of the visual analogue scale for patient satisfaction after total hip arthroplasty.

PubMed

Brokelman, Roy B G; Haverkamp, Daniel; van Loon, Corné; Hol, Annemiek; van Kampen, Albert; Veth, Rene

2012-06-01

INTRODUCTION: Patient satisfaction becomes more important in our modern health care system. The assessment of satisfaction is difficult because it is a multifactorial item for which no gold standard exists. One of the potential methods of measuring satisfaction is by using the well-known visual analogue scale (VAS). In this study, we validated VAS for satisfaction. PATIENT AND METHODS: In this prospective study, we studied 147 patients (153 hips). The construct validity was measured using the Spearman correlation test that compares the satisfaction VAS with the Harris hip score, pain VAS at rest and during activity, Oxford hip score, Short Form 36 and Western Ontario McMaster Universities Osteoarthritis Index. The reliability was tested using the intra-class coefficient. RESULTS: The Pearson correlation test showed correlations in the range of 0.40-0.80. The satisfaction VAS had a high correlation with the pain VAS and Oxford hip score, which could mean that pain is one of the most important factors in patient satisfaction. The intra-class coefficient was 0.95. CONCLUSIONS: There is a moderate to marked degree of correlation between the satisfaction VAS and the currently available subjective and objective scoring systems. The intra-class coefficient of 0.95 indicates an excellent test-retest reliability. The VAS satisfaction is a simple instrument to quantify the satisfaction of a patient after total hip arthroplasty. In this study, we showed that the satisfaction VAS has a good validity and reliability.
Validity and reliability of the Physical Activity Scale for the Elderly (PASE) in Japanese elderly people.

PubMed

Hagiwara, Akiko; Ito, Naomi; Sawai, Kazuhiko; Kazuma, Keiko

2008-09-01

In Japan, there are no valid and reliable physical activity questionnaires for elderly people. In this study, we translated the Physical Activity Scale for the Elderly (PASE) into Japanese and assessed its validity and reliability. Three hundred and twenty-five healthy and elderly subjects over 65 years were enrolled. Concurrent validity was evaluated by Spearman's rank correlation coefficient between PASE scores and an accelerometer (walking steps and energy expenditure), a physical activity questionnaire for adults in general (the Japan Arteriosclerosis Longitudinal Study Physical Activity Questionnaire, JALSPAQ), grip strength, mid-thigh muscle area per bodyweight, static balance and bodyfat percentage. Reliability was evaluated by the test-retest method over a period of 3-4 weeks. The mean PASE score in this study was 114.9. The PASE score was significantly correlated with walking steps (rho = 0.17, P = 0.014), energy expenditure (rho = 0.16, P = 0.024), activity measured with the JALSPAQ (rho = 0.48, P < 0.001), mid-thigh muscle area per bodyweight (rho = 0.15, P = 0.006) and static balance (rho = 0.19, P = 0.001). The proportion of consistency in the response between the first and second surveys was adequately high. The intraclass correlation coefficient for the PASE score was 0.65. The Japanese version of PASE was shown to have acceptable validity and reliability. The PASE is useful to measure the physical activity of elderly people in Japan.
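Spearman's rank correlation coefficient, used above for concurrent validity, is the Pearson correlation computed on ranks. The following Python sketch shows one way to compute it, assigning average ranks to ties. All function names and the example values are hypothetical and do not come from the PASE study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of the values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical questionnaire scores and daily step counts for five people.
questionnaire = [80, 95, 110, 130, 150]
steps = [4000, 3500, 6000, 5500, 9000]
rho = spearman_rho(questionnaire, steps)
print(f"Spearman rho: {rho:.2f}")
```

Because only ranks matter, rho is unaffected by monotone rescaling of either variable, which suits skewed measures such as step counts.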
Reliable and valid assessment of point-of-care ultrasonography.

PubMed

Todsen, Tobias; Tolsgaard, Martin Grønnebæk; Olsen, Beth Härstedt; Henriksen, Birthe Merete; Hillingsø, Jens Georg; Konge, Lars; Jensen, Morten Lind; Ringsted, Charlotte

2015-02-01

To explore the reliability and validity of the Objective Structured Assessment of Ultrasound Skills (OSAUS) scale for point-of-care ultrasonography (POC US) performance. POC US is increasingly used by clinicians and is an essential part of the management of acute surgical conditions. However, the quality of performance is highly operator-dependent. Therefore, reliable and valid assessment of trainees' ultrasonography competence is needed to ensure patient safety. Twenty-four physicians, representing novices, intermediates, and experts in POC US, scanned 4 different surgical patient cases in a controlled set-up. All ultrasound examinations were video-recorded and assessed by 2 blinded radiologists using OSAUS. Reliability was examined using generalizability theory. Construct validity was examined by comparing performance scores between the groups and by correlating physicians' OSAUS scores with diagnostic accuracy. The generalizability coefficient was high (0.81) and a D-study demonstrated that 1 assessor and 5 cases would result in similar reliability. The construct validity of the OSAUS scale was supported by a significant difference in the mean scores between the novice group (17.0; SD 8.4) and the intermediate group (30.0; SD 10.1), P = 0.007, as well as between the intermediate group and the expert group (72.9; SD 4.4), P = 0.04, and by a high correlation between OSAUS scores and diagnostic accuracy (Spearman ρ correlation coefficient = 0.76; P < 0.001). This study demonstrates high reliability as well as evidence of construct validity of the OSAUS scale for assessment of POC US competence. Hence, the OSAUS scale may be suitable for both in-training as well as end-of-training assessment.
Quality-of-life survey for patients diagnosed with nonmuscle-invasive bladder cancer.

PubMed

Abáigar-Pedraza, I; Megías-Garrigós, J; Sánchez-Payá, J

2016-05-01

To determine the reliability and validity of a quality-of-life survey for patients with nonmuscle-invasive bladder cancer. A total of 180 patients were included in the study. We developed a survey with 21 questions grouped into 5 areas. The patients filled in this survey and the Functional Assessment of Cancer Therapy - Bladder Cancer (FACT-BL) survey. To assess reliability, we calculated Cronbach's alpha coefficient and the kappa index. To determine criterion validity, we studied the association between the scores obtained from our survey and those from the FACT-BL survey using the Pearson correlation coefficient. To determine the construct validity (factorial and discriminatory), we performed a factor analysis, comparing it with Student's t-test for the scores obtained according to the tumour characteristics of reduced quality of life (e.g., malignancies located at the trigone of the bladder). Cronbach's alpha reliability coefficient was .83, and the kappa index varied between .7 and 1. For the association study between the new survey and the FACT-BL survey, we measured an r=.82 for the overall score and between r=.68 (disease) and r=.97 (sex life) in the various measures. In the factor analysis, we measured a Kaiser-Meyer-Olkin index of .77 and performed the Bartlett test (P<.001). The comparison between the scores, in the presence or absence of certain tumour characteristics, has shown a reduced quality of life when those characteristics are present, which was statistically significant (P<.05) in the majority of cases. Our survey to measure the quality of life of patients with nonmuscle-invasive bladder cancer is reliable and valid. Copyright © 2015 AEU. Publicado por Elsevier España, S.L.U. All rights reserved.
Data mining-based coefficient of influence factors optimization of test paper reliability

NASA Astrophysics Data System (ADS)

Xu, Peiyao; Jiang, Huiping; Wei, Jieyao

2018-05-01

Testing is a significant part of the teaching process. It demonstrates the final outcome of school teaching through teachers' teaching level and students' scores. The analysis of a test paper is a complex operation characterized by non-linear relations among the length of the paper, time duration and degree of difficulty. It is therefore difficult, with general methods, to optimize the coefficients of the influence factors under different conditions so as to obtain test papers with clearly higher reliability [1]. With data mining techniques such as Support Vector Regression (SVR) and Genetic Algorithm (GA), we can model the test paper analysis and optimize the coefficients of the influence factors for higher reliability. The test results show that the combination of SVR and GA yields an effective improvement in reliability. The optimized coefficients of the influence factors are practicable in actual application, and the whole optimization procedure can offer a model basis for test paper analysis.
Interrater Reliability of the Supports Intensity Scale (SIS)

ERIC Educational Resources Information Center

Thompson, James R.; Tasse, Marc J.; McLaughlin, Colleen A.

2008-01-01

The interrater reliability of the Supports Intensity Scale (SIS) was investigated under the condition that interviewers had to have been trained and/or experienced in its administration and scoring. Both corrected and noncorrected Pearson's product-moment coefficients were generated to assess interinterviewer, interrespondent, and mixed interrater…
Validity and Reliability of Nintendo Wii Fit Balance Scores

PubMed Central

Wikstrom, Erik A.

2012-01-01

Context: Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. Objective: To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Design: Descriptive laboratory study. Setting: Sports medicine research laboratory. Patients or Other Participants: Forty-five recreationally active participants (age = 27.0 ± 9.8 years, height = 170.9 ± 9.2 cm, mass = 72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Intervention(s): Participants completed a single-limb–stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Main Outcome Measure(s): Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. Results: All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r < 0.50). Intrasession reliability for Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC] = 0.80) to poor (ICC = 0.39), with 8 activities having poor intrasession reliability. 
Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with scores ranging from fair (ICC = 0.74) to poor (ICC = 0.29). Conclusions: Wii Fit balance activity scores had poor concurrent validity relative to COP outcomes and SEBT reach distances. In addition, the included Wii Fit balance activity scores generally had poor intrasession and intersession reliability. PMID:22892412
Interrater and Test-Retest Reliability and Minimal Detectable Change of the Balance Evaluation Systems Test (BESTest) and Subsystems With Community-Dwelling Older Adults.

PubMed

Wang-Hsu, Elizabeth; Smith, Susan S

2017-01-10

Falls are a common cause of injuries and hospital admissions in older adults. Balance limitation is a potentially modifiable factor contributing to falls. The Balance Evaluation Systems Test (BESTest), a clinical balance measure, categorizes balance into 6 underlying subsystems. Each of the subsystems is scored individually and summed to obtain a total score. The reliability of the BESTest and its individual subsystems has been reported in patients with various neurological disorders and cancer survivors. However, the reliability and minimal detectable change (MDC) of the BESTest with community-dwelling older adults have not been reported. The purposes of our study were to (1) determine the interrater and test-retest reliability of the BESTest total and subsystem scores; and (2) estimate the MDC of the BESTest and its individual subsystem scores with community-dwelling older adults. We used a prospective cohort methodological design. Community-dwelling older adults (N = 70; aged 70-94 years; mean = 85.0 [5.5] years) were recruited from a senior independent living community. Trained testers (N = 3) administered the BESTest. All participants were tested with the BESTest by the same tester initially and then retested 7 to 14 days later. With 32 of the participants, a second tester concurrently scored the retest for interrater reliability. Testers were blinded to each other's scores. Intraclass correlation coefficients [ICC(2,1)] were used to determine the interrater and test-retest reliability. Test-retest reliability was also analyzed using method error and the associated coefficients of variation (CVME). MDC was calculated using standard error of measurement. Interrater reliability (N = 32) of the BESTest total score was ICC(2, 1) = 0.97 (95% confidence interval [CI], 0.94-0.99). The ICCs for the individual subsystem scores ranged from 0.85 to 0.94. Test-retest reliability (N = 70) of the BESTest total score was ICC(2,1) = 0.93 (95% CI, 0.89-0.96). 
ICCs for the individual subsystem scores ranged from 0.72 to 0.89. The CVME (N = 70) of the BESTest total score was 4.1%. The CVME for the subsystem scores ranged from 5.0% to 10.7%. MDC (N = 70) for the BESTest total score at the 95% CI was 7.6%, or 8.2 points. MDC at the 95% CI for subsystem scores ranged from 11.7% to 19.0% (2.1-3.4 points). Results demonstrated generally good to excellent interrater and test-retest reliability in both the BESTest total and subsystem scores with community-dwelling older adults. The BESTest total and individual subsystem scores demonstrate good to excellent interrater and test-retest reliability with community-dwelling older adults. A change of 7.6% (8.2 points) or more in the BESTest total and a percentage change ranged from 11.7% to 19.0% (2.1-3.4 points) in the subsystem scores are suggested for clinicians to be 95% confident of true change when evaluating change in this population.
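The abstract names ICC(2,1) and states that the MDC was derived from the standard error of measurement, but it does not print the formulas. The Python sketch below therefore makes two assumptions: ICC(2,1) is taken in its usual two-way random-effects, absolute-agreement, single-rater form, and MDC95 follows the conventional expression 1.96 × √2 × SEM with SEM = SD × √(1 − ICC). The function names, the rating matrix and the SD value are hypothetical, not study data.

```python
import math


def icc_2_1(scores):
    """ICC(2,1) from an n-subjects x k-raters table (two-way random, absolute agreement)."""
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def mdc95(sd, icc):
    """Minimal detectable change at 95% confidence, assuming SEM = SD * sqrt(1 - ICC)."""
    sem = sd * math.sqrt(1.0 - icc)
    return 1.96 * math.sqrt(2.0) * sem


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)

# Hypothetical between-subject SD of 10 points with the reported ICC of 0.93.
mdc = mdc95(sd=10.0, icc=0.93)
print(f"ICC(2,1) = {icc:.3f}, MDC95 = {mdc:.2f} points")
```

Under these assumptions a higher ICC shrinks the SEM and therefore the MDC, so more reliable scores need a smaller change before it counts as real.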
Assessment of the 4Ts pretest clinical scoring system as a predictor of heparin-induced thrombocytopenia.

PubMed

Strutt, Jaclyn K; Mackey, Jennifer E; Johnson, Stephen M; Sylvia, Lynne M

2011-02-01

To evaluate the utility of the 4Ts clinical scoring system as a pretest probability method for the detection of heparin-induced thrombocytopenia (HIT). Prospective observational study. Medical and surgical inpatients at a tertiary care medical center. Eighty consecutive patients with suspicion of HIT who had a polyspecific enzyme-linked immunosorbent assay (ELISA) performed between December 1, 2008, and April 1, 2009, for detection of platelet factor 4 (PF4)-heparin antibodies. The predictive value of the 4Ts scoring system as determined by using a standard laboratory marker of HIT--the ELISA--and the interrater reliability of the scoring system were assessed. Sixty-seven (84%) of the 80 patients had low clinical probability of HIT based on the calculated 4Ts score. The ELISA result was negative for PF4-heparin antibodies in 74 patients (93%). Based on the results of the ELISA, the negative predictive value of the 4Ts score was 91%. Each 4Ts score was calculated by two independent investigators and adjudicated by a third investigator when necessary. The interrater reliability of the scoring system was fair (Cohen κ coefficient 0.362, 95% confidence interval [CI] 0.222-0.502; weighted κ coefficient 0.554, 95% CI 0.441-0.667). Determination of the timing of HIT was associated with the largest number of discrepancies (16) between evaluators, followed by other causes of thrombocytopenia (15), degree of decline in platelet count (14), and the presence of thrombosis or other sequelae (2). A low 4Ts score supports a low probability of HIT based on the results of the polyspecific ELISA. Overall, the interrater reliability of the scoring system was fair. Components of the 4Ts scoring system need to be further clarified or modified in order to improve interrater reliability and thereby increase the clinical utility of this pretest probability model.
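The unweighted Cohen κ reported above compares observed agreement between two raters with the agreement expected by chance from each rater's category frequencies: κ = (Po − Pe) / (1 − Pe). The Python sketch below computes only this unweighted form; the weighted κ also reported in the abstract would additionally need a weighting scheme, which is not specified there. The function name and the category labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of the raters' marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical probability categories assigned to six patients by two raters.
rater_1 = ["low", "low", "int", "high", "low", "int"]
rater_2 = ["low", "int", "int", "high", "low", "low"]
kappa = cohen_kappa(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Raw agreement here is 4 of 6 cases, yet κ is noticeably lower because part of that agreement would be expected by chance alone.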
Establishing Reliable Cognitive Change in Children with Epilepsy: The Procedures and Results for a Sample with Epilepsy

ERIC Educational Resources Information Center

van Iterson, Loretta; Augustijn, Paul B.; de Jong, Peter F.; van der Leij, Aryan

2013-01-01

The goal of this study was to investigate reliable cognitive change in epilepsy by developing computational procedures to determine reliable change index scores (RCIs) for the Dutch Wechsler Intelligence Scales for Children. First, RCIs were calculated based on stability coefficients from a reference sample. Then, these RCIs were applied to a…
Development, scoring, and reliability of the Microscale Audit of Pedestrian Streetscapes (MAPS)

PubMed Central

2013-01-01

Background Streetscape (microscale) features of the built environment can influence people’s perceptions of their neighborhoods’ suitability for physical activity. Many microscale audit tools have been developed, but few have published systematic scoring methods. We present the development, scoring, and reliability of the Microscale Audit of Pedestrian Streetscapes (MAPS) tool and its theoretically-based subscales. Methods MAPS was based on prior instruments and was developed to assess details of streetscapes considered relevant for physical activity. MAPS sections (route, segments, crossings, and cul-de-sacs) were scored by two independent raters for reliability analyses. There were 290 route pairs, 516 segment pairs, 319 crossing pairs, and 53 cul-de-sac pairs in the reliability sample. Individual inter-rater item reliability analyses were computed using Kappa, intra-class correlation coefficient (ICC), and percent agreement. A conceptual framework for subscale creation was developed using theory, expert consensus, and policy relevance. Items were grouped into subscales, and subscales were analyzed for inter-rater reliability at tiered levels of aggregation. Results There were 160 items included in the subscales (out of 201 items total). Of those included in the subscales, 80 items (50.0%) had good/excellent reliability, 41 items (25.6%) had moderate reliability, and 18 items (11.3%) had low reliability, with limited variability in the remaining 21 items (13.1%). Seventeen of the 20 route section subscales, valence (positive/negative) scores, and overall scores (85.0%) demonstrated good/excellent reliability and 3 demonstrated moderate reliability. Of the 16 segment subscales, valence scores, and overall scores, 12 (75.0%) demonstrated good/excellent reliability, three demonstrated moderate reliability, and one demonstrated poor reliability. 
Of the 8 crossing subscales, valence scores, and overall scores, 6 (75.0%) demonstrated good/excellent reliability, and 2 demonstrated moderate reliability. The cul-de-sac subscale demonstrated good/excellent reliability. Conclusions MAPS items and subscales predominantly demonstrated moderate to excellent reliability. The subscales and scoring system represent a theoretically based framework for using these complex microscale data and may be applicable to other similar instruments. PMID:23621947
The PEDro scale had acceptably high convergent validity, construct validity, and interrater reliability in evaluating methodological quality of pharmaceutical trials.

PubMed

Yamato, Tie Parma; Maher, Chris; Koes, Bart; Moseley, Anne

2017-06-01

The Physiotherapy Evidence Database (PEDro) scale has been widely used to investigate methodological quality in physiotherapy randomized controlled trials; however, its validity has not been tested for pharmaceutical trials. The aim of this study was to investigate the validity and interrater reliability of the PEDro scale for pharmaceutical trials. The reliability was also examined for the Cochrane Back and Neck (CBN) Group risk of bias tool. This is a secondary analysis of data from a previous study. We considered randomized placebo controlled trials evaluating any pain medication for chronic spinal pain or osteoarthritis. Convergent validity was evaluated by correlating the PEDro score with the summary score of the CBN risk of bias tool. The construct validity was tested using a linear regression analysis to determine the degree to which the total PEDro score is associated with treatment effect sizes, journal impact factor, and the summary score for the CBN risk of bias tool. The interrater reliability was estimated using the Prevalence and Bias Adjusted Kappa coefficient and 95% confidence interval (CI) for the PEDro scale and CBN risk of bias tool. Fifty-three trials were included, with 91 treatment effect sizes included in the analyses. The correlation between PEDro scale and CBN risk of bias tool was 0.83 (95% CI 0.76-0.88) after adjusting for reliability, indicating strong convergence. The PEDro score was inversely associated with effect sizes, significantly associated with the summary score for the CBN risk of bias tool, and not associated with the journal impact factor. The interrater reliability for each item of the PEDro scale and CBN risk of bias tool was at least substantial for most items (>0.60). The intraclass correlation coefficient for the PEDro score was 0.80 (95% CI 0.68-0.88), and for the CBN, risk of bias tool was 0.81 (95% CI 0.69-0.88). 
There was evidence for the convergent and construct validity for the PEDro scale when used to evaluate methodological quality of pharmacological trials. Both risk of bias tools have acceptably high interrater reliability. Copyright © 2017 Elsevier Inc. All rights reserved.
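The Prevalence and Bias Adjusted Kappa (PABAK) mentioned above depends only on the observed proportion of agreement Po: for k categories it is (k × Po − 1) / (k − 1), which reduces to 2 × Po − 1 for yes/no items. The Python sketch below assumes two raters scoring dichotomous items; the function name and the example ratings are hypothetical and are not taken from the trial data.

```python
def pabak(rater_a, rater_b, n_categories=2):
    """Prevalence- and bias-adjusted kappa from two raters' labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    # Observed proportion of cases on which the raters agree.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    return (n_categories * po - 1.0) / (n_categories - 1.0)


# Hypothetical yes/no (1/0) ratings of one quality item across ten trials.
rater_1 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
value = pabak(rater_1, rater_2)
print(f"PABAK: {value:.2f}")
```

Unlike Cohen's κ, this statistic does not fall when one category is much more common than the other, which is why it is often preferred for items that almost every trial satisfies.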
Adaptation and Assessment of Reliability and Validity of the Greek Version of the Ohkuma Questionnaire for Dysphagia Screening

PubMed Central

Papadopoulou, Soultana L.; Exarchakos, Georgios; Christodoulou, Dimitrios; Theodorou, Stavroula; Beris, Alexandre; Ploumis, Avraam

2016-01-01

Introduction The Ohkuma questionnaire is a validated screening tool originally used to detect dysphagia among patients hospitalized in Japanese nursing facilities. Objective The purpose of this study is to evaluate the reliability and validity of the adapted Greek version of the Ohkuma questionnaire. Methods Following the steps for cross-cultural adaptation, we delivered the validated Ohkuma questionnaire to 70 patients (53 men, 17 women) who were either suffering from dysphagia or not. All of them completed the questionnaire a second time within a month. For all of them, we performed a bedside and VFSS study of dysphagia and asked participants to undergo a second VFSS screening, with the exception of nine individuals. Statistical analysis included measurement of internal consistency with Cronbach's α coefficient, reliability with Cohen's Kappa, Pearson's correlation coefficient and construct validity with categorical components, and One-Way Anova test. Results According to Cronbach's α coefficient (0.976) for total score, there was high internal consistency for the Ohkuma Dysphagia questionnaire. Test-retest reliability (Cohen's Kappa) ranged from 0.586 to 1.00, exhibiting acceptable stability. We also estimated the Pearson's correlation coefficient for the test-retest total score, which reached high levels (0.952; p = 0.000). The One-Way Anova test in the two measurement times showed statistically significant correlation in both measurements (p = 0.02 and p = 0.016). Conclusion The adapted Greek version of the questionnaire is valid and reliable and can be used for the screening of dysphagia in the Greek-speaking patients. PMID:28050209
Adaptation and Assessment of Reliability and Validity of the Greek Version of the Ohkuma Questionnaire for Dysphagia Screening.

PubMed

Papadopoulou, Soultana L; Exarchakos, Georgios; Christodoulou, Dimitrios; Theodorou, Stavroula; Beris, Alexandre; Ploumis, Avraam

2017-01-01

Introduction The Ohkuma questionnaire is a validated screening tool originally used to detect dysphagia among patients hospitalized in Japanese nursing facilities. Objective The purpose of this study is to evaluate the reliability and validity of the adapted Greek version of the Ohkuma questionnaire. Methods Following the steps for cross-cultural adaptation, we delivered the validated Ohkuma questionnaire to 70 patients (53 men, 17 women) who were either suffering from dysphagia or not. All of them completed the questionnaire a second time within a month. For all of them, we performed a bedside and VFSS study of dysphagia and asked participants to undergo a second VFSS screening, with the exception of nine individuals. Statistical analysis included measurement of internal consistency with Cronbach's α coefficient, reliability with Cohen's Kappa, Pearson's correlation coefficient and construct validity with categorical components, and One-Way Anova test. Results According to Cronbach's α coefficient (0.976) for total score, there was high internal consistency for the Ohkuma Dysphagia questionnaire. Test-retest reliability (Cohen's Kappa) ranged from 0.586 to 1.00, exhibiting acceptable stability. We also estimated the Pearson's correlation coefficient for the test-retest total score, which reached high levels (0.952; p = 0.000). The One-Way Anova test in the two measurement times showed statistically significant correlation in both measurements (p = 0.02 and p = 0.016). Conclusion The adapted Greek version of the questionnaire is valid and reliable and can be used for the screening of dysphagia in the Greek-speaking patients.
Reliability and Validity of 3 Methods of Assessing Orthopedic Resident Skill in Shoulder Surgery.

PubMed

Bernard, Johnathan A; Dattilo, Jonathan R; Srikumaran, Uma; Zikria, Bashir A; Jain, Amit; LaPorte, Dawn M

Traditional measures for evaluating resident surgical technical skills (e.g., case logs) assess operative volume but not level of surgical proficiency. Our goal was to compare the reliability and validity of 3 tools for measuring surgical skill among orthopedic residents when performing 3 open surgical approaches to the shoulder. A total of 23 residents at different stages of their surgical training were tested for technical skill pertaining to 3 shoulder surgical approaches using the following measures: Objective Structured Assessment of Technical Skills (OSATS) checklists, the Global Rating Scale (GRS), and a final pass/fail assessment determined by 3 upper extremity surgeons. Adverse events were recorded. The Cronbach α coefficient was used to assess reliability of the OSATS checklists and GRS scores. Interrater reliability was calculated with intraclass correlation coefficients. Correlations among OSATS checklist scores, GRS scores, and pass/fail assessment were calculated with Spearman ρ. Validity of OSATS checklists was determined using analysis of variance with postgraduate year (PGY) as a between-subjects factor. Significance was set at p < 0.05 for all tests. Criterion validity was shown between the OSATS checklists and GRS for the 3 open shoulder approaches. Checklist scores showed superior interrater reliability compared with GRS and subjective pass/fail measurements. GRS scores were positively correlated across training years. The incidence of adverse events was significantly higher among PGY-1 and PGY-2 residents compared with more experienced residents. OSATS checklists are a valid and reliable assessment of technical skills across 3 surgical shoulder approaches. However, checklist scores do not measure quality of technique. Documenting adverse events is necessary to assess quality of technique and ultimate pass/fail status. Multiple methods of assessing surgical skill should be considered when evaluating orthopedic resident surgical performance. 
Copyright © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

Faculty and resident evaluations of medical students on a surgery clerkship correlate poorly with standardized exam scores.

PubMed

Goldstein, Seth D; Lindeman, Brenessa; Colbert-Getz, Jorie; Arbella, Trisha; Dudas, Robert; Lidor, Anne; Sacks, Bethany

2014-02-01

The clinical knowledge of medical students on a surgery clerkship is routinely assessed via subjective evaluations from faculty members and residents. Interpretation of these ratings should ideally be valid and reliable. However, prior literature has questioned the correlation between subjective and objective components when assessing students' clinical knowledge. Retrospective cross-sectional data were collected from medical student records at The Johns Hopkins University School of Medicine from July 2009 through June 2011. Surgical faculty members and residents rated students' clinical knowledge on a 5-point, Likert-type scale. Interrater reliability was assessed using intraclass correlation coefficients for students with ≥4 attending surgeon evaluations (n = 216) and ≥4 resident evaluations (n = 207). Convergent validity was assessed by correlating average evaluation ratings with scores on the National Board of Medical Examiners (NBME) clinical subject examination for surgery. Average resident and attending surgeon ratings were also compared by NBME quartile using analysis of variance. There were high degrees of reliability for resident ratings (intraclass correlation coefficient, .81) and attending surgeon ratings (intraclass correlation coefficient, .76). Resident and attending surgeon ratings shared a moderate degree of variance (19%). However, average resident ratings and average attending surgeon ratings shared a small degree of variance with NBME surgery examination scores (ρ² ≤ .09). When ratings were compared among NBME quartile groups, the only significant difference was for residents' ratings of students in the lowest 25th percentile of scores compared with the top 25th percentile of scores (P = .007). Although high interrater reliability suggests that attending surgeons and residents rate students with consistency, the lack of convergent validity suggests that these ratings may not be reflective of actual clinical knowledge. 
Both faculty members and residents may benefit from training in knowledge assessment, which will likely increase opportunities to recognize deficiencies and make student evaluation a more valuable tool. Copyright © 2014 Elsevier Inc. All rights reserved.
Reliability, Validity, and Responsiveness of InFLUenza Patient-Reported Outcome (FLU-PRO©) Scores in Influenza-Positive Patients.

PubMed

Powers, John H; Bacci, Elizabeth D; Guerrero, M Lourdes; Leidy, Nancy Kline; Stringer, Sonja; Kim, Katherine; Memoli, Matthew J; Han, Alison; Fairchok, Mary P; Chen, Wei-Ju; Arnold, John C; Danaher, Patrick J; Lalani, Tahaniyat; Ridoré, Michelande; Burgess, Timothy H; Millar, Eugene V; Hernández, Andrés; Rodríguez-Zulueta, Patricia; Smolskis, Mary C; Ortega-Gallegos, Hilda; Pett, Sarah; Fischer, William; Gillor, Daniel; Macias, Laura Moreno; DuVal, Anna; Rothman, Richard; Dugas, Andrea; Ruiz-Palacios, Guillermo M

2018-02-01

To assess the reliability, validity, and responsiveness of InFLUenza Patient-Reported Outcome (FLU-PRO©) scores for quantifying the presence and severity of influenza symptoms. An observational prospective cohort study of adults (≥18 years) with influenza-like illness in the United States, the United Kingdom, Mexico, and South America was conducted. Participants completed the 37-item draft FLU-PRO daily for up to 14 days. Item-level and factor analyses were used to remove items and determine factor structure. Reliability of the final tool was estimated using Cronbach α and intraclass correlation coefficients (2-day reliability). Convergent and known-groups validity and responsiveness were assessed using global assessments of influenza severity and return to usual health. Of the 536 patients enrolled, 221 influenza-positive subjects comprised the analytical sample. The mean age of the patients was 40.7 years, 60.2% were women, and 59.7% were white. The final 32-item measure has six factors/domains (nose, throat, eyes, chest/respiratory, gastrointestinal, and body/systemic), with a higher order factor representing symptom severity overall (comparative fit index = 0.92; root mean square error of approximation = 0.06). Cronbach α was high (total = 0.92; domain range = 0.71-0.87); test-retest reliability (intraclass correlation coefficient, day 1-day 2) was 0.83 for total scores and 0.57 to 0.79 for domains. Day 1 FLU-PRO domain and total scores were moderately to highly correlated (≥0.30) with Patient Global Rating of Flu Severity (except nose and throat). Consistent with known-groups validity, scores differentiated severity groups on the basis of global rating (total: F = 57.2, P < 0.001; domains: F = 8.9-67.5, P < 0.001). Subjects reporting return to usual health showed significantly greater (P < 0.05) FLU-PRO score improvement by day 7 than did those who did not, suggesting score responsiveness. 
Results suggest that FLU-PRO scores are reliable, valid, and responsive to change in influenza-positive adults. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Validation of the Chinese version of the FOUR score in the assessment of neurosurgical patients with different level of consciousness.

PubMed

Peng, Juan; Deng, Yingying; Chen, Fangyao; Zhang, Xiaomei; Wang, Xiaoyan; Zhou, Ying; Zhou, Hongzhen; Qiu, Binghui

2015-12-10

The Glasgow Coma Scale (GCS) is currently the most widely used scoring system for comatose patients. A decade ago, the Full Outline of Unresponsiveness (FOUR) score was devised to better capture four functional aspects of consciousness (eye, motor responses, brainstem reflexes, and respiration). This study aimed to validate the Chinese version of the FOUR score in patients with different levels of consciousness. The study had two phases: (1) translation of the FOUR score, and (2) assessment of its reliability and validity. The Chinese version of the FOUR score was developed according to a standardized protocol. One hundred-twenty consecutive patients with acute brain damage, admitted to Nanfang Hospital (Southern Medical University, Guangdong, China) from November 2014 to February 2015, were enrolled. The inter-rater agreement for the FOUR score and GCS was evaluated using intraclass correlation coefficient (ICC). Receiver operating characteristic (ROC) curves were established to determine the scales' abilities to predict outcome. The rater agreement was excellent both for FOUR (ICC = 0.970; p < 0.001) and GCS (ICC = 0.958; p < 0.001). The FOUR score yielded an excellent test-retest reliability (ICC = 0.930; p < 0.001). Spearman's correlation coefficients between GCS and the FOUR score were high: r = 0.932, first rating; r = 0.887, second rating (all p < 0.001). Areas under the curve (AUC) for mortality were 0.834 (95 % CI, 0.740-0.928) and 0.815 (95 % CI, 0.723-0.908) for the FOUR score and GCS, respectively. The Chinese version of the FOUR score is a reliable scale for evaluating the level of consciousness in patients with acute brain injury.
Reliability and validity of the upper-body dressing scale in Japanese patients with vascular dementia with hemiparesis.

PubMed

Endo, Arisa; Suzuki, Makoto; Akagi, Atsumi; Chiba, Naoyuki; Ishizaka, Ikuyo; Matsunaga, Atsuhiko; Fukuda, Michinari

2015-03-01

The purpose of this study was to examine the reliability and validity of the Upper-body Dressing Scale (UBDS) for buttoned shirt dressing, which evaluates the learning process of new component actions of upper-body dressing in patients diagnosed with dementia and hemiparesis. This was a preliminary correlational study of concurrent validity and reliability in which 10 vascular dementia patients with hemiparesis were enrolled and assessed repeatedly by six occupational therapists by means of the UBDS and the dressing item of the Functional Independence Measure (FIM). Intraclass correlation coefficient was 0.97 for intra-rater reliability and 0.99 for inter-rater reliability. The level of correlation between UBDS score and FIM dressing item scores was -0.93. UBDS scores for paralytic hand passed into the sleeve and sleeve pulled up beyond the shoulder joint were worse than the scores for the other components of the task. The UBDS has good reliability and validity for vascular dementia patients with hemiparesis. Further research is needed to investigate the relation between UBDS score and the effect of intervention and to clarify sensitivity or responsiveness of the scale to clinical change. Copyright © 2014 John Wiley & Sons, Ltd.
Reliable Change Indices and Standardized Regression-Based Change Score Norms for Evaluating Neuropsychological Change in Children with Epilepsy

PubMed Central

Busch, Robyn M.; Lineweaver, Tara T.; Ferguson, Lisa; Haut, Jennifer S.

2015-01-01

Reliable change index scores (RCIs) and standardized regression-based change score norms (SRBs) permit evaluation of meaningful changes in test scores following treatment interventions, like epilepsy surgery, while accounting for test-retest reliability, practice effects, score fluctuations due to error, and relevant clinical and demographic factors. Although these methods are frequently used to assess cognitive change after epilepsy surgery in adults, they have not been widely applied to examine cognitive change in children with epilepsy. The goal of the current study was to develop RCIs and SRBs for use in children with epilepsy. Sixty-three children with epilepsy (age range 6–16; M=10.19, SD=2.58) underwent comprehensive neuropsychological evaluations at two time points an average of 12 months apart. Practice adjusted RCIs and SRBs were calculated for all cognitive measures in the battery. Practice effects were quite variable across the neuropsychological measures, with the greatest differences observed among older children, particularly on the Children’s Memory Scale and Wisconsin Card Sorting Test. There was also notable variability in test-retest reliabilities across measures in the battery, with coefficients ranging from 0.14 to 0.92. RCIs and SRBs for use in assessing meaningful cognitive change in children following epilepsy surgery are provided for measures with reliability coefficients above 0.50. This is the first study to provide RCIs and SRBs for a comprehensive neuropsychological battery based on a large sample of children with epilepsy. Tables to aid in evaluating cognitive changes in children who have undergone epilepsy surgery are provided for clinical use. An excel sheet to perform all relevant calculations is also available to interested clinicians or researchers. PMID:26043163
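The abstract above does not spell out its formulas, so the following is only a minimal sketch of one commonly used practice-adjusted RCI, assuming the standard error of the difference is derived from the baseline standard deviation and the test-retest reliability. The function name `practice_adjusted_rci` and all of its parameters are hypothetical and are not taken from the study or its spreadsheet.

```python
from math import sqrt


def practice_adjusted_rci(x1, x2, sd_baseline, r_xx, mean_practice=0.0):
    """Practice-adjusted reliable change index (illustrative sketch).

    x1, x2        -- an individual's baseline and retest scores
    sd_baseline   -- standard deviation of baseline scores in the reference sample
    r_xx          -- test-retest reliability coefficient (0 <= r_xx < 1)
    mean_practice -- mean retest-minus-baseline change in the reference sample
    """
    if not 0.0 <= r_xx < 1.0:
        raise ValueError("r_xx must be in [0, 1)")
    # Standard error of measurement from the baseline SD and reliability
    sem = sd_baseline * sqrt(1.0 - r_xx)
    # Standard error of the difference between two scores (assumed equal SEMs)
    se_diff = sqrt(2.0) * sem
    # Observed change minus the expected practice effect, in SE units
    return ((x2 - x1) - mean_practice) / se_diff


if __name__ == "__main__":
    # Hypothetical values, not data from the study
    print(round(practice_adjusted_rci(100, 110, 15.0, 0.5, mean_practice=2.5), 3))
```

Under this formulation the standard error grows as reliability falls, so a measure with low test-retest reliability needs a much larger raw change before the index exceeds a chosen cutoff.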
Long-term stability of the Wechsler Intelligence Scale for Children--Fourth Edition.

PubMed

Watkins, Marley W; Smith, Lourdes G

2013-06-01

Long-term stability of the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV; Wechsler, 2003) was investigated with a sample of 344 students from 2 school districts twice evaluated for special education eligibility at an average interval of 2.84 years. Test-retest reliability coefficients for the Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), Processing Speed Index (PSI), and the Full Scale IQ (FSIQ) were .72, .76, .66, .65, and .82, respectively. As predicted, the test-retest reliability coefficients for the subtests (Mdn = .56) were generally lower than the index scores (Mdn = .69) and the FSIQ (.82). On average, subtest scores did not differ by more than 1 point, and index scores did not differ by more than 2 points across the test-retest interval. However, 25% of the students earned FSIQ scores that differed by 10 or more points, and 29%, 39%, 37%, and 44% of the students earned VCI, PRI, WMI, and PSI scores, respectively, that varied by 10 or more points. Given this variability, it cannot be assumed that WISC-IV scores will be consistent across long test-retest intervals for individual students. PsycINFO Database Record (c) 2013 APA, all rights reserved.
[The reliability of a questionnaire regarding Colombian children's physical activity].

PubMed

Herazo-Beltrán, Aliz Y; Domínguez-Anaya, Regina

2012-10-01

To report the test-retest reliability and internal consistency of the Physical Activity Questionnaire for school children (PAQ-C). This was a descriptive study of 100 school children aged 9 to 11 years attending a school in Cartagena, Colombia. The sample was randomly selected. The PAQ-C was administered twice, one week apart, after the informed consent forms had been signed by the children's parents and school officials. Cronbach's alpha coefficient was used to assess internal consistency, and an intraclass correlation coefficient was used to assess test-retest reliability. SPSS (version 17.0) was used for statistical analysis. Internal consistency was 0.73 at the first measurement and 0.78 at the second; the intraclass correlation coefficient was 0.60. There were differences between boys and girls at both measurements. The PAQ-C had acceptable internal consistency and test-retest reliability, making it useful for measuring children's self-reported physical activity and a valuable tool for population studies in Colombia.
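Several records in this listing report Cronbach's alpha as the internal consistency estimate. As a rough illustration only, the sketch below computes alpha from a respondents-by-items score table using the usual variance-ratio formula; the function name `cronbach_alpha` and the toy data are assumptions for illustration and do not come from any of the studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed (total) score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical 4 respondents x 3 items
    toy = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
    print(round(cronbach_alpha(toy), 3))
```

Alpha computed this way describes the scores of the sample at hand, which is why the same questionnaire can yield different values at two administrations.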
Choosing the best index for the average score intraclass correlation coefficient.

PubMed

Shieh, Gwowen

2016-09-01

The intraclass correlation coefficient (ICC)(2) index from a one-way random effects model is widely used to describe the reliability of mean ratings in behavioral, educational, and psychological research. Despite its apparent utility, the essential property of ICC(2) as a point estimator of the average score intraclass correlation coefficient is seldom mentioned. This article considers several potential measures and compares their performance with ICC(2). Analytical derivations and numerical examinations are presented to assess the bias and mean square error of the alternative estimators. The results suggest that more advantageous indices can be recommended over ICC(2) for their theoretical implication and computational ease.
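For orientation, the sketch below computes the single-score and average-score intraclass correlations from one-way random effects ANOVA mean squares. It assumes that ICC(2) in the abstract denotes the conventional estimator (MSB − MSW)/MSB; the alternative indices the article recommends are not reproduced here, and the function name `one_way_icc` is hypothetical.

```python
def one_way_icc(ratings):
    """Return (single-score ICC, average-score ICC) from a one-way random model.

    ratings -- list of n targets, each a list of k ratings (balanced design)
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a balanced table with n >= 2 and k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    # Between-target mean square
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    # Within-target mean square
    msw = sum((x - m) ** 2 for row, m in zip(ratings, means) for x in row) / (n * (k - 1))
    icc_single = (msb - msw) / (msb + (k - 1) * msw)
    icc_average = (msb - msw) / msb  # assumed to correspond to ICC(2) above
    return icc_single, icc_average


if __name__ == "__main__":
    # Hypothetical ratings: 3 targets, 2 ratings each
    print(one_way_icc([[1, 2], [3, 4], [5, 6]]))
```

The average-score value is the Spearman-Brown step-up of the single-score value, which is why it is the larger of the two whenever the single-score ICC is positive.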
Observer reliability of the Gross Motor Performance Measure and the Quality of Upper Extremity Skills Test, based on video recordings.

PubMed

Sorsdahl, Anne Brit; Moe-Nilssen, Rolf; Strand, Liv Inger

2008-02-01

The aim of this study was to examine observer reliability of the Gross Motor Performance Measure (GMPM) and the Quality of Upper Extremity Skills Test (QUEST) based on video clips. The tests were administered to 26 children with cerebral palsy (CP; 14 males, 12 females; range 2-13y, mean 7y 6mo), 24 with spastic CP, and two with dyskinesia. Respectively, five, six, five, four, and six children were classified in Gross Motor Function Classification System Levels I to V; and four, nine, five, five, and three children were classified in Manual Ability Classification System levels I to V. The children's performances were recorded and edited. Two experienced paediatric physical therapists assessed the children from watching the video clips. Intraobserver and interobserver reliability values of the total scores were mostly high, intraclass correlation coefficient (ICC)(1,1) varying from 0.69 to 0.97 with only one coefficient below 0.89. The ICCs of subscores varied from 0.36 to 0.95, finding 'Alignment' and 'Weight shift' in GMPM and 'Protective extension' in QUEST highly reliable. The subscores 'Dissociated movements' in GMPM and QUEST, and 'Grasp' in QUEST were the least reliable, and recommendations are made to increase reliability of these subscores. Video scoring was time consuming, but was found to offer many advantages; the possibility to review performance, to use special trained observers for scoring and less demanding assessment for the children.
Psychometrics Matter in Health Behavior: A Long-term Reliability Generalization Study.

PubMed

Pickett, Andrew C; Valdez, Danny; Barry, Adam E

2017-09-01

Despite numerous calls for increased understanding and reporting of reliability estimates, social science research, including the field of health behavior, has been slow to respond and adopt such practices. Therefore, we offer a brief overview of reliability and common reporting errors; we then perform analyses to examine and demonstrate the variability of reliability estimates by sample and over time. Using meta-analytic reliability generalization, we examined the variability of coefficient alpha scores for a well-designed, consistent, nationwide health study, covering a span of nearly 40 years. For each year and sample, reliability varied. Furthermore, reliability was predicted by a sample characteristic that differed among age groups within each administration. We demonstrated that reliability is influenced by the methods and individuals from which a given sample is drawn. Our work echoes previous calls that psychometric properties, particularly reliability of scores, are important and must be considered and reported before drawing statistical conclusions.
Reliability of sonographic assessment of tendinopathy in tennis elbow.

PubMed

Poltawski, Leon; Ali, Syed; Jayaram, Vijay; Watson, Tim

2012-01-01

To assess the reliability and compute the minimum detectable change using sonographic scales to quantify the extent of pathology and hyperaemia in the common extensor tendon in people with tennis elbow. The lateral elbows of 19 people with tennis elbow were assessed sonographically twice, 1-2 weeks apart. Greyscale and power Doppler images were recorded for subsequent rating of abnormalities. Tendon thickening, hypoechogenicity, fibrillar disruption and calcification were each rated on four-point scales, and scores were summed to provide an overall rating of structural abnormality; hyperaemia was scored on a five point scale. Inter-rater reliability was established using the intraclass correlation coefficient (ICC) to compare scores assigned independently to the same set of images by a radiologist and a physiotherapist with training in musculoskeletal imaging. Test-retest reliability was assessed by comparing scores assigned by the physiotherapist to images recorded at the two sessions. The minimum detectable change (MDC) was calculated from the test-retest reliability data. ICC values for inter-rater reliability ranged from 0.35 (95% CI: 0.05, 0.60) for fibrillar disruption to 0.77 (0.55, 0.88) for overall greyscale score, and 0.89 (0.79, 0.95) for hyperaemia. Test-retest reliability ranged from 0.70 (0.48, 0.84) for tendon thickening to 0.82 (0.66, 0.90) for overall greyscale score and 0.86 (0.73, 0.93) for calcification. The MDC for the greyscale total score was 2.0/12 and for the hyperaemia score was 1.1/5. The sonographic scoring system used in this study may be used reliably to quantify tendon abnormalities and change over time. A relatively inexperienced imager can conduct the assessment and use the rating scales reliably.
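The abstract states only that the MDC was calculated from the test-retest data. One widely used route, shown here purely as an assumed illustration, derives the standard error of measurement from the score standard deviation and the ICC, then scales it to a 95% threshold. The function name `minimum_detectable_change` and the input values are hypothetical.

```python
from math import sqrt


def minimum_detectable_change(sd, icc, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%).

    sd  -- standard deviation of the scores (e.g. pooled across sessions)
    icc -- test-retest intraclass correlation coefficient (0 <= icc <= 1)
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be in [0, 1]")
    # Standard error of measurement
    sem = sd * sqrt(1.0 - icc)
    # sqrt(2) accounts for measurement error in both test and retest scores
    return z * sqrt(2.0) * sem


if __name__ == "__main__":
    # Hypothetical inputs, not values from the study
    print(round(minimum_detectable_change(sd=2.0, icc=0.75), 2))
```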
Estimating the Effect of Changes in Criterion Score Reliability on the Power of the "F" Test of Equality of Means

ERIC Educational Resources Information Center

Feldt, Leonard S.

2011-01-01

This article presents a simple, computer-assisted method of determining the extent to which increases in reliability increase the power of the "F" test of equality of means. The method uses a derived formula that relates the changes in the reliability coefficient to changes in the noncentrality of the relevant "F" distribution. A readily available…
Evaluation of a clinical dehydration scale in children requiring intravenous rehydration.

PubMed

Kinlin, Laura M; Freedman, Stephen B

2012-05-01

To evaluate the reliability and validity of a previously derived clinical dehydration scale (CDS) in a cohort of children with gastroenteritis and evidence of dehydration. Participants were 226 children older than 3 months who presented to a tertiary care emergency department and required intravenous rehydration. Reliability was assessed at treatment initiation, by comparing the scores assigned independently by a trained research nurse and a physician. Validity was assessed by using parameters reflective of disease severity: weight gain, baseline laboratory results, willingness of the physician to discharge the patient, hospitalization, and length of stay. Interobserver reliability was moderate, with a weighted κ of 0.52 (95% confidence interval [CI] 0.41, 0.63). There was no correlation between CDS score and percent weight gain, a proxy measure of fluid deficit (Spearman correlation coefficient = -0.03; 95% CI -0.18, 0.12). There were, however, modest and statistically significant correlations between CDS score and several other parameters, including serum bicarbonate (Pearson correlation coefficient = -0.35; 95% CI -0.46, -0.22) and length of stay (Pearson correlation coefficient = 0.24; 95% CI 0.11, 0.36). The scale's discriminative ability was assessed for the outcome of hospitalization, yielding an area under the receiver operating characteristic curve of 0.65 (95% CI 0.57, 0.73). In children administered intravenous rehydration, the CDS was characterized by moderate interobserver reliability and weak associations with objective measures of disease severity. These data do not support its use as a tool to dictate the need for intravenous rehydration or to predict clinical course.
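Interobserver reliability above is summarized with a weighted κ. The abstract does not say which weighting scheme was used, so the sketch below supports both linear and quadratic disagreement weights as an assumption. The function name `weighted_kappa` and the example ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same subjects.

    categories -- ordered list of all possible scores (lowest to highest)
    weights    -- "linear" or "quadratic" disagreement weights
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of subjects")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint proportions
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[index[a]][index[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal, 1 at the extremes
            observed += w * obs[i][j]
            expected += w * row[i] * col[j]
    if expected == 0.0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical scores on a 0-2 scale from a nurse and a physician
    nurse = [0, 1, 2, 1, 0, 2]
    physician = [0, 1, 1, 1, 0, 2]
    print(round(weighted_kappa(nurse, physician, [0, 1, 2]), 3))
```

Unlike unweighted κ, the weighted form gives partial credit when two raters differ by only one category, which suits ordinal scales such as a dehydration score.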
Validation of a Novel Scoring System for Changes in Skeletal Manifestations of Hypophosphatasia in Newborns, Infants, and Children: The Radiographic Global Impression of Change Scale.

PubMed

Whyte, Michael P; Fujita, Kenji P; Moseley, Scott; Thompson, David D; McAlister, William H

2018-05-01

Hypophosphatasia (HPP) is the heritable metabolic disease characterized by impaired skeletal mineralization due to low activity of the tissue-nonspecific isoenzyme of alkaline phosphatase. Although HPP during growth often manifests with distinctive radiographic skeletal features, no validated method was available to quantify them, including changes over time. We created the Radiographic Global Impression of Change (RGI-C) scale to assess changes in the skeletal burden of pediatric HPP. Site-specific pairs of radiographs of newborns, infants, and children with HPP from three clinical studies of asfotase alfa, an enzyme replacement therapy for HPP, were obtained at baseline and during treatment. Each pair was scored by three pediatric radiologists ("raters"), with nine raters across the three studies. Intrarater and interrater agreement was determined by weighted Kappa coefficients. Interrater reliability was assessed using intraclass correlation coefficients (ICCs) and by two-way random effects analysis of variance (ANOVA) and a mixed-model repeated measures ANOVA. Pearson correlation coefficients evaluated relationships of the RGI-C to the Rickets Severity Scale (RSS), Pediatric Outcomes Data Collection Instrument Global Function Parent Normative Score, Childhood Health Assessment Questionnaire Disability Index, 6-Minute Walk Test percent predicted, and Z-score for height in patients aged 6 to 12 years at baseline. Eighty-nine percent (8/9) of raters showed substantial or almost perfect intrarater agreement of sequential RGI-C scores (weighted Kappa coefficients, 0.72 to 0.93) and moderate or substantial interrater agreement (weighted Kappa coefficients, 0.53 to 0.71) in patients aged 0 to 12 years at baseline. Moderate-to-good interrater reliability was observed (ICC, 0.57 to 0.65). 
RGI-C scores were significantly (p ≤ 0.0065) correlated with the RSS and with measures of global function, disability, endurance, and growth in the patients aged 6 to 12 years at baseline. Thus, the RGI-C is valid and reliable for detecting clinically important changes in skeletal manifestations of severe HPP in newborns, infants, and children, including during asfotase alfa treatment. © 2018 The Authors. Journal of Bone and Mineral Research Published by Wiley Periodicals Inc.
Results of the validation study of the Psodisk instrument, and determination of the cut-off scores for varying degrees of impairment.

PubMed

Sampogna, F; Linder, D; Romano, G V; Gualberti, G; Merolla, R; di Luzio Paparatti, U

2015-04-01

The Psodisk is a 10-item visual instrument, aimed at measuring the burden of psoriasis on patients. To validate the Psodisk in a large sample of patients with psoriasis, and to define categories for the interpretation of the scores. Data were collected in 21 dermatological centres. The Psodisk was administered at baseline (t0), after 2 or 3 days (t1) and about 3 months (t2) after baseline, and data were used to assess validity and reliability of the instrument. The cut-off scores were determined using the perception of the severity of the disease by the patient as anchor point. The evaluable population consisted of 320 patients at baseline, with a mean Psodisk score of 36.9. The concurrent validity of the instrument was confirmed by the high correlation with Skindex-29 and DLQI. Factor analyses selected a single factor, which alone explained almost 60% of the variance. Cronbach's coefficient alpha was 0.927, suggesting a good reliability. Test-retest reliability was verified by a Pearson's correlation coefficient between the Psodisk scores at baseline and t1 of 0.924. Five categories of disease burden were defined: 1. minimal (<9); 2. mild (9-15); 3. moderate (16-30); 4. marked (31-50); 5. severe (>50). The Psodisk showed good psychometric properties. The definition of the cut-off scores will be useful to evaluate the burden of psoriasis on patients. © 2014 European Academy of Dermatology and Venereology.
[Reliability and validity of the Chinese version on Comprehensive Scores for Financial Toxicity based on the patient-reported outcome measures].

PubMed

Yu, H H; Bi, X; Liu, Y Y

2017-08-10

Objective: To evaluate the reliability and validity of the Chinese version of the Comprehensive Score for Financial Toxicity (COST), based on patient-reported outcome measures. Methods: A total of 118 cancer patients were interviewed face-to-face by well-trained investigators. Cronbach's α and the Pearson correlation coefficient were used to evaluate reliability. The content validity index (CVI) and exploratory factor analysis (EFA) were used to evaluate content validity and construct validity, respectively. Results: Cronbach's α was 0.889 for the whole questionnaire, and test-retest coefficients ranged from 0.77 to 0.98. The scale-level content validity index (S-CVI) was 0.82, with item-level content validity indices (I-CVI) between 0.83 and 1.00. Two components were extracted in the exploratory factor analysis, with a cumulative variance contribution rate of 68.04% and loadings > 0.60 on every item. Conclusion: The Chinese version of the COST scale showed high reliability and good validity and can therefore be applied to assess the financial situation of cancer patients.
Reliability and Validity of the Italian Version of the Protocol of Orofacial Myofunctional Evaluation with Scores (I-OMES).

PubMed

Scarponi, Letizia; de Felicio, Claudia Maria; Sforza, Chiarella; Pimenta Ferreira, Claudia Lucia; Ginocchio, Daniela; Pizzorni, Nicole; Barozzi, Stefania; Mozzanica, Francesco; Schindler, Antonio

2018-05-30

To evaluate the reliability, validity, and responsiveness of the Italian OMES (I-OMES). The study consisted of 3 phases: (1) internal consistency and reliability, (2) validity, and (3) responsiveness analysis. The recruited population included 27 patients with orofacial myofunctional disorders (OMD) and 174 healthy volunteers. Forty-seven subjects, 18 healthy and all recruited patients with OMD, were assessed for inter-rater and test-retest reliability analysis. I-OMES and Nordic Orofacial Test - Screening (NOT-S) scores of the patients were correlated for concurrent validity analysis. I-OMES scores from 27 patients with OMD and 27 age- and gender-matched healthy subjects were compared to investigate construct validity. I-OMES scores before and after successful swallowing rehabilitation in patients were compared for responsiveness analysis. Adequate internal consistency (Cronbach α = 0.71) and strong inter-rater and test-retest reliability (intraclass correlation coefficient = 0.97 and 0.98, respectively) were found. I-OMES and NOT-S scores were significantly and inversely correlated (r = -0.38). A statistically significant difference (p < 0.001) was found between the pathological group and the control group for the total I-OMES score. The mean I-OMES score improved from 90 (78-102) to 99 (89-103) after myofunctional rehabilitation (p < 0.001). The I-OMES is a reliable and valid tool to evaluate OMD. © 2018 S. Karger AG, Basel.
Technical analysis of the Slosson Written Expression Test.

PubMed

Erford, Bradley T; Hofler, Donald B

2004-06-01

The Slosson Written Expression Test was designed to assess students ages 8-17 years at risk for difficulties in written expression. Scores from three independent samples were used to evaluate the test's reliability and validity for measuring students' written expression. Test-retest reliability of the SWET subscales ranged from .80 to .94 (n = 151), and .95 for the Written Expression Total Standard Scores. The median alternate-form reliability for students' Written Expression Total Standard Scores was .81 across the three forms. Scores on the Slosson test yielded concurrent validity coefficients (n = 143) of .60 with scores from the Woodcock-Johnson: Tests of Achievement-Third Edition Broad Written Language Domain and .49 with scores on the Test of Written Language-Third Edition Spontaneous Writing Quotient. Exploratory factor analytic procedures suggested the Slosson test is comprised of two dimensions, Writing Mechanics and Writing Maturity (47.1% and 20.1% variance accounted for, respectively). In general, the Slosson Written Expression Test presents with sufficient technical characteristics to be considered a useful written expression screening test.
The Youth Throwing Score: Validating Injury Assessment in Young Baseball Players.

PubMed

Ahmad, Christopher S; Padaki, Ajay S; Noticewala, Manish S; Makhni, Eric C; Popkin, Charles A

2017-02-01

Epidemic levels of shoulder and elbow injuries have been reported recently in youth and adolescent baseball players. Despite the concerning frequency of these injuries, no instrument has been validated to assess upper extremity injury in this patient population. Purpose/Hypothesis: The purpose of this study was to validate an upper extremity assessment tool specifically designed for young baseball players. We hypothesized that this tool would be both reliable and valid. Cohort study (diagnosis); Level of evidence, 2. The Youth Throwing Score (YTS) was constructed by an interdisciplinary team of providers and coaches as a tool to assess upper extremity injury in youth and adolescent baseball players (age range, 10-18 years). The psychometric properties of the test were then determined. A total of 223 players completed the final survey. The players' mean age was 14.3 ± 2.7 years. Pilot analysis showed that none of the 14 questions received a mean athlete importance rating less than 3 of 5, and the final survey read at a Flesch-Kincaid level of 4.1, which is appropriate for patients aged 9 years and older. The players self-assigned their injury status, resulting in a mean instrument score of 59.7 ± 8.4 for the 148 players "playing without pain," 42.0 ± 11.5 for the 60 players "playing with pain," and 40.4 ± 10.5 for the 15 players "not playing due to pain." Players playing without pain scored significantly higher than those playing with pain and those not playing due to pain (P < .001). Psychometric analysis showed a test-retest intraclass correlation coefficient of 0.90 and a Cronbach alpha intra-item reliability coefficient of 0.93, indicating excellent reliability and internal consistency. Pearson correlation coefficients of 0.65, 0.62, and 0.31 were calculated between the YTS and the Pediatric Outcomes Data Collection Instrument sports/physical functioning module, the Kerlan-Jobe Orthopaedic Clinic Shoulder and Elbow score, and the Quick Disabilities of the Arm, Shoulder, and Hand (QuickDASH) score, respectively. Injured players scored a mean of 9.4 points higher after treatment (P < .001), and players who improved in their self-assigned pain categorization scored 16.5 points higher (P < .001). The YTS is the first valid and reliable instrument for assessing young baseball players' upper extremity health.
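As a reading aid for the internal-consistency figure reported above, the following is a minimal sketch of how Cronbach's alpha can be computed from a respondents-by-items score matrix. The function name `cronbach_alpha` and the tiny data set are illustrative assumptions, not the authors' code or data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three respondents answering two items.
example_scores = [[2, 3], [4, 4], [5, 6]]
alpha = cronbach_alpha(example_scores)
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as an index of internal consistency rather than of stability over time.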
Assessing oral health-related quality of life in general dental practice in Scotland: validation of the OHIP-14.

PubMed

Fernandes, Marcelo José; Ruta, Danny Adolph; Ogden, Graham Richard; Pitts, Nigel Berry; Ogston, Simon Alexander

2006-02-01

To validate the Oral Health Impact Profile (OHIP)-14 in a sample of patients attending general dental practice. Patients with pathology-free impacted wisdom teeth were recruited from six general dental practices in Tayside, Scotland, and followed for a year to assess the development of problems related to impaction. The OHIP-14 was completed at baseline and at 1-year follow-up, and analysed using three different scoring methods: a summary score, a weighted and standardized score and the total number of problems reported. Instrument reliability was measured by assessing internal consistency and test-retest reliability. Construct validity was assessed using a number of variables. Linear regression was then used to model the relationship between OHIP-14 and all significantly correlated variables. Responsiveness was measured using the standardized response mean (SRM). Adjusted R² values and SRMs were calculated for each of the three scoring methods. Estimates for the differences between adjusted R² values and the differences between SRMs were obtained with 95% confidence intervals. A total of 278 and 169 patients completed the questionnaire at baseline and follow-up, respectively. Reliability: Cronbach's alpha coefficients ranged from 0.30 to 0.75. Alpha coefficients for all 14 items were 0.88 and 0.87 for baseline and follow-up, respectively. Test-retest coefficients ranged from 0.72 to 0.78. Validity: OHIP-14 scores were significantly correlated with number of teeth, education, main activity, the use of mouthwash, frequency of seeing a dentist, the reason for the last dental appointment, smoking, alcohol intake, pain and symptoms. Adjusted R² values ranged from 0.123 to 0.202 and there were no statistically significant differences between those for the three different scoring methods. Responsiveness: The SRMs ranged from 0.37 to 0.56 and there was a statistically significant difference between the summary scores method and the total number of problems method for symptomatic patients. The OHIP-14 is a valid and reliable measure of oral health-related quality of life in general dental practice and is responsive to third molar clinical change. The summary score method demonstrated performance as good as, or better than, the other methods studied.
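The responsiveness statistic used above, the standardized response mean (SRM), is conventionally the mean change score divided by the standard deviation of the change scores. A minimal sketch follows; the function name and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change), with change = follow-up minus baseline."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical paired scores for four patients.
srm = standardized_response_mean([10, 12, 14, 16], [12, 13, 17, 18])
```

Because the denominator is the variability of the change itself, an instrument can show a large SRM even when the absolute change is small, provided patients change consistently.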

Evaluation of a Simpler Tool to Assess Nontechnical Skills During Simulated Critical Events.

PubMed

Watkins, Scott C; Roberts, David A; Boulet, John R; McEvoy, Matthew D; Weinger, Matthew B

2017-04-01

Management of critical events requires teams to employ nontechnical skills (NTS), such as teamwork, communication, decision making, and vigilance. We sought to estimate the reliability and provide evidence for the validity of the ratings gathered using a new tool for assessing the NTS of anesthesia providers, the behaviorally anchored rating scale (BARS), and compare its scores with those of an established NTS tool, the Anaesthetists' Nontechnical Skills (ANTS) scale. Six previously trained raters (4 novices and 2 experts) reviewed and scored 18 recorded simulated pediatric crisis management scenarios using a modified ANTS and a BARS tool. Pearson correlation coefficients were calculated separately for the novice and expert raters, by scenario, and overall. The intrarater reliability of the ANTS total score was 0.73 (expert, 0.57; novice, 0.84); for the BARS tool, it was 0.80 (expert, 0.79; novice, 0.81). The average interrater reliability of BARS scores (0.58) was better than ANTS scores (0.37), and the interrater reliabilities of scores from novices (0.69 BARS and 0.52 ANTS) were better than those obtained from experts (0.47 BARS and 0.21 ANTS) for both scoring instruments. The Pearson correlation between the ANTS and BARS total scores was 0.74. Overall, reliability estimates were better for the BARS scores than the ANTS scores. For both measures, the intrarater and interrater reliability was better for novices compared with domain experts, suggesting that properly trained novices can reliably assess the NTS of anesthesia providers managing a simulated critical event. There was substantial correlation between the 2 scoring instruments, suggesting that the tools measured similar constructs. The BARS tool can be an alternative to the ANTS scale for the formative assessment of NTS of anesthesia providers.
The Female Sexual Function Index (FSFI): linguistic validation of the Italian version.

PubMed

Filocamo, Maria Teresa; Serati, Maurizio; Li Marzi, Vincenzo; Costantini, Elisabetta; Milanesi, Martina; Pietropaolo, Amelia; Polledro, Patrizio; Gentile, Barbara; Maruccia, Serena; Fornia, Samanta; Lauri, Irene; Alei, Rosanna; Arcangeli, Paola; Sighinolfi, Maria Chiara; Manassero, Francesca; Andretta, Elena; Palazzetti, Anna; Bertelli, Elena; Del Popolo, Giulio; Villari, Donata

2014-02-01

Although several new measurements for female sexual dysfunction (FSD) have recently been developed, the Female Sexual Function Index (FSFI) remains the gold standard for screening and one of the most widely used questionnaires. The Italian translation of the FSFI has been used in several studies conducted in Italy, but a linguistic validation of the Italian version does not exist. The aim of this study was to perform a linguistic validation of the Italian version of the FSFI. This was a multicenter cross-sectional study conducted in 14 urological and gynecological clinics, uniformly distributed over Italian territory. We performed all steps necessary to determine the reliability and the test-retest reliability of the Italian version of the FSFI. The study population was a convenience sample of 409 Italian women. The reliability of the questionnaire was calculated using Cronbach's alpha, which was considered weak, moderate, or high if its value was less than 0.6, between 0.6 and 0.8, or equal to or greater than 0.8, respectively. The test-retest reliability was assessed for all women in the sample by calculating Pearson's concordance correlation coefficient for each domain and for the total score, both at baseline and after 15 days (r ranges from -1.00 to +1.00, where +1.00 indicates the strongest positive association). Cronbach's alpha coefficients for total and domain scores were sufficiently high, ranging from 0.92 to 0.97 for the total sample. The test-retest procedure revealed that the concordance correlation coefficient was very high both for FSFI-I total score (Pearson's P = 0.93) and for each domain (Pearson's P always >0.92). For the first time in the literature, our study has produced a validated and reliable Italian version of the FSFI questionnaire. Consequently, the Italian FSFI can be used as a reliable tool for preliminary screening for female sexual dysfunction for Italian women. © 2013 International Society for Sexual Medicine.
Bronchiolitis Score of Sant Joan de Déu: BROSJOD Score, validation and usefulness.

PubMed

Balaguer, Mònica; Alejandre, Carme; Vila, David; Esteban, Elisabeth; Carrasco, Josep L; Cambra, Francisco José; Jordan, Iolanda

2017-04-01

To validate the bronchiolitis score of Sant Joan de Déu (BROSJOD) and to examine the previously defined scoring cutoff. Prospective, observational study. BROSJOD scoring was done by two independent physicians (at admission, 24 and 48 hr). Internal consistency of the score was assessed using Cronbach's α. To determine inter-rater reliability, the concordance correlation coefficient (CCC), estimated as an intraclass correlation coefficient, and the limits of agreement, estimated as the 90% total deviation index (TDI), were calculated. An expert opinion was used to classify patients according to clinical severity. A validity analysis was conducted comparing the 3-level classification score to that expert opinion. Volume under the surface (VUS), predictive values, and probability of correct classification (PCC) were measured to assess discriminant validity. A total of 112 patients were recruited, 62 of them (55.4%) males. Median age: 52.5 days (IQR: 32.75-115.25). The admission Cronbach's α was 0.77 (CI95%: 0.71; 0.82) and at 24 hr it was 0.65 (CI95%: 0.48; 0.7). The inter-rater reliability analysis was: CCC at admission 0.96 (95%CI 0.94-0.97), at 24 hr 0.77 (95%CI 0.65-0.86), and at 48 hr 0.94 (95%CI 0.94-0.97); TDI 90%: 1.6, 2.9, and 1.57, respectively. The discriminant validity at admission: VUS of 0.8 (95%CI 0.70-0.90), at 24 hr 0.92 (95%CI 0.85-0.99), and at 48 hr 0.93 (95%CI 0.87-0.99). The predictive values and PCC values were within 38-100% depending on the level of clinical severity. There is a high inter-rater reliability, showing the BROSJOD score to be reliable and valid, even when different observers apply it. Pediatr Pulmonol. 2017;52:533-539. © 2016 Wiley Periodicals, Inc.
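To illustrate what a concordance correlation coefficient measures, here is a minimal sketch of Lin's moment-based formula for two raters. Note the assumption: the abstract says the authors estimated the CCC through an intraclass correlation model, so this simpler two-rater formula is a stand-in for illustration, and the function name and ratings are hypothetical.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired ratings x and y.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2),
    using population (divide-by-n) moments.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical scores from two raters; rater B is systematically one point higher.
rater_a = [1, 2, 3]
rater_b = [2, 3, 4]
ccc = concordance_correlation(rater_a, rater_b)
```

Unlike a Pearson correlation, which would be 1.0 for these hypothetical ratings, the CCC is penalized by the constant one-point offset between raters, so it reflects agreement rather than mere association.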
Development of the Seasonal Migrant Agricultural Worker Stress Scale in Sanliurfa, Southeast Turkey.

PubMed

Simsek, Zeynep; Ersin, Fatma; Kirmizitoprak, Evin

2016-01-01

Stress is one of the main causes of health problems, especially mental disorders. These health problems cause a significant amount of ability loss and increase costs. It is estimated that by 2020, mental disorders will constitute 15% of the total disease burden, and depression will rank second only after ischemic heart disease. Environmental experiences are paramount in increasing the liability to mental disorders in those who constantly face sustained high levels of stress. The objective of this study was to develop a stress scale for seasonal migrant agricultural workers aged 18 years and older. The sample consisted of 270 randomly selected seasonal migrant agricultural workers. The average age of the participants was 33.1 ± 14, and 50.7% were male. The Cronbach alpha coefficient and test-retest methods were used for reliability analyses. Factor analysis was performed to assess the structural validity of the scale, and the Kaiser-Meyer-Olkin coefficient and Bartlett test were used to determine the suitability of the data for factor analysis. In the reliability analyses, the Cronbach alpha coefficient of internal consistency was calculated as .96, and the test-retest reliability coefficient was .81. In the exploratory factor analysis for validity of the scale, four factors were obtained, and the factors represented workplace physical conditions (25.7% of the total variance), workplace psychosocial and economic factors (19.3% of the total variance), workplace health problems (15.2% of the total variance), and school problems (10.1% of the total variance). The four factors explained 70.3% of the total variance. As a result of the expert opinions and analyses, a stress scale with 48 items was developed. The highest score to be obtained from the scale was 144, and the lowest score was 0. An increase in the score indicates an increase in stress level. The findings show that the scale is a valid and reliable assessment instrument that can be used in epidemiological research and planning interventions.
Reliability and validity of the Parenting Scale of Inconsistency.

PubMed

Yoshizumi, Takahiro; Murase, Satomi; Murakami, Takashi; Takai, Jiro

2006-08-01

The purposes of the present study were to develop a Parenting Scale of Inconsistency and to evaluate its initial reliability and validity. The 12 items assess the inconsistency among parents' moods, behaviors, and attitudes toward children. In the primary study, 517 participants completed three measures: the new Parenting Scale of Inconsistency, the Parental Bonding Instrument, and the Depression Scale of the General Health Questionnaire. The Parenting Scale of Inconsistency had good test-retest reliability of .85 and internal consistency of .88 (Cronbach coefficient alpha). Construct validity was good as Inconsistency scores were significantly correlated with the Care and Overprotection scores of the Parental Bonding Instrument and with the Depression scores. Moreover, Inconsistency scores' relation with a dimension of parenting style distinct from Care and Overprotection suggested that the Parenting Scale of Inconsistency had factorial validity. This scale seems a potential measure for examining the relationships between inconsistent parenting and the mental health of children.
Reliability and validity of the visual analogue scale for disability in patients with chronic musculoskeletal pain.

PubMed

Boonstra, Anne M; Schiphorst Preuper, Henrica R; Reneman, Michiel F; Posthumus, Jitze B; Stewart, Roy E

2008-06-01

The objective of the study was to determine the reliability and concurrent validity of a visual analogue scale (VAS) for disability as a single-item instrument measuring disability in chronic pain patients. A test-retest design was used for the reliability study and a cross-sectional design for the validity study. The setting was a general rehabilitation centre and a university rehabilitation centre. The study population consisted of patients over 18 years of age, suffering from chronic musculoskeletal pain; 52 patients in the reliability study, 344 patients in the validity study. The main outcome measures were as follows. Reliability study: Spearman's correlation coefficients (rho values) of the test and retest data of the VAS for disability; validity study: rho values of the VAS disability scores with the scores on four domains of the Short-Form Health Survey (SF-36) and VAS pain scores, and with Roland-Morris Disability Questionnaire scores in chronic low back pain patients. In the reliability study, rho values varied from 0.60 to 0.77; in the validity study, rho values of VAS disability scores with SF-36 domain scores varied from 0.16 to 0.51, with Roland-Morris Disability Questionnaire scores from 0.38 to 0.43 and with VAS pain scores from 0.76 to 0.84. The study concluded that the reliability of the VAS for disability is moderate to good. Because of a weak correlation with other disability instruments and a strong correlation with the VAS for pain, however, its validity is questionable.
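The rho values reported above are Spearman rank correlations. As a minimal sketch of the calculation, the code below ranks each variable (averaging tied ranks) and then takes the Pearson correlation of the ranks. All function names and the test-retest values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        tied_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = tied_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical test and retest VAS scores for five patients.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Because only the ordering of scores matters, rho is a natural choice for a single-item VAS whose distribution may be skewed.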
Short forms of the Child Perceptions Questionnaire for 11–14-year-old children (CPQ11–14): Development and initial evaluation

PubMed Central

Jokovic, Aleksandra; Locker, David; Guyatt, Gordan

2006-01-01

Background: The Child Perceptions Questionnaire for children aged 11 to 14 years (CPQ11–14) is a 37-item measure of oral-health-related quality of life (OHRQoL) encompassing four domains: oral symptoms, functional limitations, emotional and social well-being. To facilitate its use in clinical settings and population-based health surveys, it was shortened to 16 and 8 items. Item impact and stepwise regression methods were used to produce each version. This paper describes the developmental process, compares the discriminative properties of the resulting four short-forms and evaluates their precision relative to the original CPQ11–14. Methods: The item impact method used data from the CPQ11–14 item reduction study to select the questions with the highest impact scores in each domain. The regression method, where the dependent variable was the overall CPQ11–14 score and the independent variables its individual questions, was applied to the data collected in the validity study for the CPQ11–14. The measurement properties (i.e. criterion validity, construct validity, internal consistency reliability and test-retest reliability) of all 4 short-forms were evaluated using the data from the validity and reliability studies for the CPQ11–14. Results: All short forms detected substantial variability in children's OHRQoL. The mean scores on the two 16-item questionnaires were almost identical, while on the two 8-item questionnaires they differed by only one score point. The mean scores standardized to 0–100 were higher on the short forms than the original CPQ11–14 (p < 0.001). There were strong significant correlations between all short-form scores and CPQ11–14 scores (0.87–0.98; p < 0.001). Hypotheses concerning construct validity were confirmed: the short-forms' scores were highest in the oro-facial, lower in the orthodontic and lowest in the paediatric dentistry group; all short-form questionnaires were positively correlated with the ratings of oral health and overall well-being, with the correlation coefficient being higher for the latter. The relative validity coefficients were 0.85 to 1.18. Cronbach's alpha and intraclass correlation coefficients ranged from 0.71 to 0.83 and from 0.71 to 0.77, respectively. Conclusion: All short forms demonstrated excellent criterion validity and good construct validity. The reliability coefficients exceeded standards for group-level comparisons. However, these are preliminary findings based on convenience sampling, and further testing in replicated studies involving clinical and general samples of children in various settings is necessary to establish measurement sensitivity and discriminative properties of these questionnaires. PMID:16423298
Validity and reliability of the infant breastfeeding assessment tool, the mother baby assessment tool, and the LATCH scoring system.

PubMed

Altuntas, Nilgun; Turkyilmaz, Canan; Yildiz, Havva; Kulali, Ferit; Hirfanoglu, Ibrahim; Onal, Esra; Ergenekon, Ebru; Koç, Esin; Atalay, Yıldız

2014-05-01

We aimed to evaluate the validity and reliability of the Infant Breastfeeding Assessment Tool (IBFAT), the Mother Baby Assessment (MBA) Tool, and the LATCH scoring system. Mothers who delivered healthy, full-term infants in the Obstetrics & Gynecology Service of Gazi University, Ankara, Turkey, between December 2013 and January 2014 and their infants were included in the study. Forty-six randomly selected breastfeeding sessions were monitored and scored simultaneously by three researchers (Raters 1, 2, and 3) using LATCH, IBFAT, and the MBA Tool. Researchers put the score sheets in an envelope in order to hide them from each other. The compatibility of the scores given by the three researchers was assessed by statistical methods. We found positive and significant correlation coefficients ranging from 0.81 to 0.88 for the total MBA score, from 0.90 to 0.95 for the total IBFAT score, and from 0.85 to 0.91 for the total LATCH score. Correlation coefficients between the three tools ranged from 0.71 to 0.88, with the minimum value being noted for the correlation between LATCH and IBFAT scores and the maximum value being noted for the correlation between LATCH and MBA scores. We found positive and significant correlations between researchers' scores for 46 observations using the three assessment tools. This study showed that the above-mentioned tools were compatible for the assessment of the efficiency of breastfeeding.
The minimal clinically important difference of the control of allergic rhinitis and asthma test (CARAT): cross-cultural validation and relation with pollen counts

PubMed Central

van der Leeuw, Sander; van der Molen, Thys; Dekhuijzen, PN Richard; Fonseca, Joao A; van Gemert, Frederik A; Gerth van Wijk, Roy; Kocks, Janwillem WH; Oosterom, Helma; Riemersma, Roland A; Tsiligianni, Ioanna G; de Weger, Letty A; Oude Elberink, Joanne NG; Flokstra-de Blok, Bertine MJ

2015-01-01

Background: The Control of Allergic Rhinitis and Asthma Test (CARAT) monitors control of asthma and allergic rhinitis. Aims: To determine the CARAT’s minimal clinically important difference (MCID) and to evaluate the psychometric properties of the Dutch CARAT. Methods: CARAT was applied in three measurements at 1-month intervals. Patients diagnosed with asthma and/or rhinitis were approached. MCID was evaluated using Global Rating of Change (GRC) and standard error of measurement (s.e.m.). Cronbach’s alpha was used to evaluate internal consistency. Spearman’s correlation coefficients were calculated between CARAT, the Asthma Control Questionnaire (ACQ5) and the Visual Analog Scale (VAS) on airway symptoms to determine construct and longitudinal validity. Test–retest reliability was evaluated with intra-class correlation coefficient (ICC). Changes in pollen counts were compared with delta CARAT and ACQ5 scores. Results: A total of 92 patients were included. The MCID of the CARAT was 3.50 based on GRC scores; the s.e.m. was 2.83. Cronbach’s alpha was 0.82. Correlation coefficients between CARAT and ACQ5 and VAS questions ranged from 0.64 to 0.76 (P<0.01). Longitudinally, correlation coefficients between delta CARAT scores and delta ACQ5 and VAS scores ranged from 0.41 to 0.67 (P<0.01). Test–retest reliability showed an ICC of 0.81 (P<0.01) and 0.80 (P<0.01). Correlations with pollen counts were higher for CARAT than for ACQ5. Conclusions: This is the first investigation of the MCID of the CARAT. The CARAT uses a whole-point scale, which suggests that the MCID is 4 points. The CARAT is a valid and reliable tool that is also applicable in the Dutch population. PMID:25569880
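The standard error of measurement (s.e.m.) mentioned above is often derived from a score's standard deviation and a reliability coefficient as SD × √(1 − reliability). The abstract does not state which formula the authors used, so the sketch below is an assumption, and the function name and input values are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """SEM = SD * sqrt(1 - reliability), one common classical-test-theory formula.

    score_sd: standard deviation of observed scores.
    reliability: a coefficient in [0, 1], such as an ICC or Cronbach's alpha.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if score_sd < 0:
        raise ValueError("score_sd must be non-negative")
    return score_sd * math.sqrt(1.0 - reliability)


# Hypothetical inputs: an SD of 10 points and a reliability of 0.75 give an SEM of 5 points.
sem = standard_error_of_measurement(10.0, 0.75)
```

A change score smaller than the SEM is hard to distinguish from measurement noise, which is why distribution-based estimates of an MCID are commonly compared against it.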
The reliability and validity of the Tokyo Autistic Behaviour Scale.

PubMed

Kurita, H; Miyake, Y

1990-03-01

The Tokyo Autistic Behavior Scale (TABS) consisting of 39 items provisionally grouped in four areas--interpersonal-social relationship, language-communication, habit-mannerism and others--is an instrument used by a child's caretaker to rate the child's autistic behaviors on a 3-point scale. Test-retest reliability was satisfactory (i.e., an r for a total score was .94). Among six DSM-III diagnostic groups, infantile autism showed a significantly higher total TABS score than the other five groups, and a taxonomic validity coefficient was .54. An r between total scores of the TABS and the Childhood Autism Rating Scale--Tokyo Version was .59. The area scores showed a lower validity than the total score. The TABS appears to be a useful instrument to assess autistic behavior.
A Psychometric Study of the Bayley Scales of Infant and Toddler Development in Persian Language Children.

PubMed

Azari, Nadia; Soleimani, Farin; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud

2017-01-01

The Bayley Scales of Infant and Toddler Development is a well-known diagnostic developmental assessment tool for children aged 1-42 months. Our aim was to investigate the validity and reliability of this scale in Persian-speaking children. The method was descriptive-analytic. Translation, back-translation, and cultural adaptation were done. Content and face validity of the translated scale were determined by experts' opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers of Tehran during 2013-2014 for developmental assessment in cognitive, communicative (receptive and expressive), and motor (fine and gross) domains. Reliability of the scale was calculated through three methods: internal consistency using Cronbach's alpha coefficient, test-retest, and interrater methods. Construct validity was assessed using factor analysis and comparison of mean scores. Cultural and linguistic changes were made in items of all domains, especially in the communication subscale. Content and face validity of the test were approved by experts' opinions. Cronbach's alpha coefficient was above 0.74 in all domains. Pearson correlation coefficients in the various domains were ≥ 0.982 in the test-retest method and ≥ 0.993 in the inter-rater method. Construct validity of the test was supported by factor analysis. Moreover, the mean scores for the different age groups were compared, and statistically significant differences were observed between them, which confirms the validity of the test. The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian language children.
Reliability and validity of the test of gross motor development-II in Korean preschool children: applying AHP.

PubMed

Kim, Chung-Il; Han, Dong-Wook; Park, Il-Hyeok

2014-04-01

The Test of Gross Motor Development-II (TGMD-II) is a frequently used assessment tool for measuring motor ability. The purpose of this study is to investigate the reliability and validity of TGMD-II's weighting scores (by comparing pre-weighted TGMD-II scores with post-weighted ones) as well as to examine the applicability of the TGMD-II to Korean preschool children. A total of 121 Korean children (three kindergartens) participated in this study. There were 65 preschoolers who were 5 years old (37 boys and 28 girls) and 56 preschoolers who were 6 years old (34 boys and 22 girls). For internal consistency, reliability, and construct validity, only one researcher evaluated all of the children using the TGMD-II in the following areas: running; galloping; sliding; hopping; leaping; horizontal jumping; overhand throwing; underhand rolling; striking a stationary ball; stationary dribbling; kicking; and catching. For concurrent validity, the evaluator measured physical fitness (strength, flexibility, power, agility, endurance, and balance). The key findings were as follows: first, the reliability coefficient and the validity coefficient between pre-weighted and post-weighted TGMD-II scores were quite similar. Second, the research showed adequate reliability and validity of the TGMD-II for Korean preschool children. The TGMD-II is a proper instrument to test Korean children's motor development. Yet, applying relative weighting on the TGMD-II should be a point of consideration. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Turkish Version of the Critical-Care Pain Observation Tool: Reliability and Validity Assessment.

PubMed

Aktaş, Yeşim Yaman; Karabulut, Neziha

2017-08-01

The study aim was to evaluate the validity and reliability of the Critical-Care Pain Observation Tool in critically ill patients. A repeated measures design was used for the study. A convenience sample of 66 patients who had undergone open-heart surgery in the cardiovascular surgery intensive care unit in Ordu, Turkey, was recruited for the study. The patients were evaluated by using the Critical-Care Pain Observation Tool at rest, during a nociceptive procedure (suctioning), and 20 minutes after the procedure while they were conscious and intubated after surgery. The Turkish version of the Critical-Care Pain Observation Tool has shown statistically acceptable levels of validity and reliability. Inter-rater reliability was supported by moderate-to-high-weighted κ coefficients (weighted κ coefficient = 0.55 to 1.00). For concurrent validity, significant associations were found between the scores on the Critical-Care Pain Observation Tool and the Behavioral Pain Scale scores. Discriminant validity was also supported by higher scores during suctioning (a nociceptive procedure) versus non-nociceptive procedures. The internal consistency of the Critical-Care Pain Observation Tool was 0.72 during a nociceptive procedure and 0.71 during a non-nociceptive procedure. The validity and reliability of the Turkish version of the Critical-Care Pain Observation Tool was determined to be acceptable for pain assessment in critical care, especially for patients who cannot communicate verbally. Copyright © 2016 American Society of PeriAnesthesia Nurses. Published by Elsevier Inc. All rights reserved.
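The inter-rater statistic reported above is a weighted κ, which gives partial credit when two raters choose neighbouring categories on an ordinal scale. A minimal sketch follows. The abstract does not say whether linear or quadratic weights were used, so the `power` parameter is an assumption, and the function name and ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, power=1):
    """Weighted Cohen's kappa for two raters on ordered categories.

    power=1 gives linear disagreement weights, power=2 quadratic weights.
    kappa_w = 1 - (observed weighted disagreement / expected weighted disagreement)
    """
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need paired ratings and at least two categories")
    index = {category: i for i, category in enumerate(categories)}

    # Joint proportion table: observed[i][j] is the share of subjects rated i by A and j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += weight * observed[i][j]
            # Chance agreement assumes the two raters are independent.
            expected_disagreement += weight * row_totals[i] * col_totals[j]
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical ratings on a 0-2 ordinal item from two raters.
kappa = weighted_kappa([0, 1, 2, 0], [0, 1, 2, 0], categories=[0, 1, 2])
```

With only two categories the weighting has no effect and the statistic reduces to the ordinary Cohen's κ.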
Development and validation of the irritable bowel syndrome scale under the system of quality of life instruments for chronic diseases QLICD-IBS: combinations of classical test theory and generalizability theory.

PubMed

Lei, Pingguang; Lei, Guanghe; Tian, Jianjun; Zhou, Zengfen; Zhao, Miao; Wan, Chonghua

2014-10-01

This paper aims to develop the irritable bowel syndrome (IBS) scale of the system of Quality of Life Instruments for Chronic Diseases (QLICD-IBS) by the modular approach and to validate it by both classical test theory and generalizability theory. The QLICD-IBS was developed based on programmed decision procedures with multiple nominal and focus group discussions, in-depth interviews, and quantitative statistical procedures. Data from 112 inpatients with IBS, whose QOL was measured three times before and after treatment, were used. The psychometric properties of the scale were evaluated with respect to validity, reliability, and responsiveness employing correlation analysis, factor analyses, multi-trait scaling analysis, t tests and also G studies and D studies of generalizability theory analysis. Multi-trait scaling analysis, correlation, and factor analyses confirmed good construct validity and criterion-related validity when using SF-36 as a criterion. Test-retest reliability coefficients (Pearson r and intra-class correlation (ICC)) for the overall score and all domains were higher than 0.80; the internal consistency α values for all domains at the two measurements were higher than 0.70 except for the social domain (0.55 and 0.67, respectively). The overall score and scores for all domains/facets showed statistically significant changes after treatment, with moderate or higher effect sizes (standardized response mean, SRM) ranging from 0.72 to 1.02 at the domain level. G coefficients and index of dependability (Ф coefficients) further confirmed the reliability of the scale with more exact variance components. The QLICD-IBS has good validity, reliability, responsiveness, and some highlights and can be used as the quality of life instrument for patients with IBS.
Translation, adaptation and validation of a Portuguese version of the Moorehead-Ardelt Quality of Life Questionnaire II.

PubMed

Maciel, João; Infante, Paulo; Ribeiro, Susana; Ferreira, André; Silva, Artur C; Caravana, Jorge; Carvalho, Manuel G

2014-11-01

The prevalence of obesity has increased worldwide. An assessment of the impact of obesity on health-related quality of life (HRQoL) requires specific instruments. The Moorehead-Ardelt Quality of Life Questionnaire II (MA-II) is a widely used instrument to assess HRQoL in morbidly obese patients. The objective of this study was to translate and validate a Portuguese version of the MA-II. The study included forward and backward translations of the original MA-II. The reliability of the Portuguese MA-II was estimated using the internal consistency and test-retest methods. For validation purposes, the Spearman's rank correlation coefficient was used to evaluate the correlation between the Portuguese MA-II and the Portuguese versions of two other questionnaires, the 36-item Short Form Health Survey (SF-36) and the Impact of Weight on Quality of Life-Lite (IWQOL-Lite). One hundred and fifty morbidly obese patients were randomly assigned to test the reliability and validity of the Portuguese MA-II. Good internal consistency was demonstrated by a Cronbach's alpha coefficient of 0.80, and a very good agreement in terms of test-retest reliability was recorded, with an overall intraclass correlation coefficient (ICC) of 0.88. The total sums of MA-II scores and each item of MA-II were significantly correlated with all domains of SF-36 and IWQOL-Lite. A statistically significant negative correlation was found between the MA-II total score and BMI. Moreover, age, gender and surgical status were independent predictors of MA-II total score. A reliable and valid Portuguese version of the MA-II was produced, thus enabling the routine use of MA-II in the morbidly obese Portuguese population.
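Test-retest agreement above is summarized with an intraclass correlation coefficient (ICC). Several ICC forms exist and the abstract does not say which was used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name and data are hypothetical.

```python
def icc_two_way_random(ratings):
    """ICC(2,1) from a subjects-by-occasions table of scores.

    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n),
    where MSR, MSC and MSE are the row (subject), column (occasion)
    and residual mean squares of a two-way ANOVA without replication.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two occasions")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test and retest totals for three patients (retest is one point higher).
icc = icc_two_way_random([[1, 2], [3, 4], [5, 6]])
```

In this form a systematic shift between occasions lowers the coefficient, so it measures absolute agreement rather than only the consistency of the ranking.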
Appraising the quality of medical education research methods: the Medical Education Research Study Quality Instrument and the Newcastle-Ottawa Scale-Education.

PubMed

Cook, David A; Reed, Darcy A

2015-08-01

The Medical Education Research Study Quality Instrument (MERSQI) and the Newcastle-Ottawa Scale-Education (NOS-E) were developed to appraise methodological quality in medical education research. The study objective was to evaluate the interrater reliability, normative scores, and between-instrument correlation for these two instruments. In 2014, the authors searched PubMed and Google for articles using the MERSQI or NOS-E. They obtained or extracted data for interrater reliability (using the intraclass correlation coefficient, ICC) and normative scores. They calculated between-scale correlation using Spearman rho. Each instrument contains items concerning sampling, controlling for confounders, and integrity of outcomes. Interrater reliability for overall scores ranged from 0.68 to 0.95. Interrater reliability was "substantial" or better (ICC > 0.60) for nearly all domain-specific items on both instruments. Most instances of low interrater reliability were associated with restriction of range, and raw agreement was usually good. Across 26 studies evaluating published research, the median overall MERSQI score was 11.3 (range 8.9-15.1, of possible 18). Across six studies, the median overall NOS-E score was 3.22 (range 2.08-3.82, of possible 6). Overall MERSQI and NOS-E scores correlated reasonably well (rho 0.49-0.72). The MERSQI and NOS-E are useful, reliable, complementary tools for appraising methodological quality of medical education research. Interpretation and use of their scores should focus on item-specific codes rather than overall scores. Normative scores should be used for relative rather than absolute judgments because different research questions require different study designs.
Scoring haemophilic arthropathy on X-rays: improving inter- and intra-observer reliability and agreement using a consensus atlas.

PubMed

Foppen, Wouter; van der Schaaf, Irene C; Beek, Frederik J A; Verkooijen, Helena M; Fischer, Kathelijn

2016-06-01

The radiological Pettersson score (PS) is widely applied for classification of arthropathy to evaluate costly haemophilia treatment. This study aims to assess and improve inter- and intra-observer reliability and agreement of the PS. Two series of X-rays (bilateral elbows, knees, and ankles) of 10 haemophilia patients (120 joints) with haemophilic arthropathy were scored by three observers according to the PS (maximum score 13/joint). Subsequently, (dis-)agreement in scoring was discussed until consensus. Example images were collected in an atlas. Thereafter, a second series of 120 joints was scored using the atlas. One observer rescored the second series after three months. Reliability was assessed by intraclass correlation coefficients (ICC), agreement by limits of agreement (LoA). Median Pettersson score at joint level (PSjoint) of affected joints was 6 (interquartile range 3-9). Using the consensus atlas, inter-observer reliability of the PSjoint improved significantly from 0.94 (95 % confidence interval (CI) 0.91-0.96) to 0.97 (CI 0.96-0.98). LoA improved from ±1.7 to ±1.1 for the PSjoint. Therefore, true differences in arthropathy were differences in the PSjoint of >2 points. Intra-observer reliability of the PSjoint was 0.98 (CI 0.97-0.98), intra-observer LoA were ±0.9 points. Reliability and agreement of the PS improved by using a consensus atlas. • Reliability of the Pettersson score significantly improved using the consensus atlas. • The presented consensus atlas improved the agreement among observers. • The consensus atlas could be recommended to obtain a reproducible Pettersson score.
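The record above summarizes agreement with limits of agreement (LoA) but does not state the formula used. A minimal sketch follows, assuming the common Bland-Altman definition (mean difference ± 1.96 standard deviations of the paired differences). The function name `limits_of_agreement` and the example scores are hypothetical, not data from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(scores_a, scores_b):
    """Bland-Altman 95% limits of agreement for paired scores (sketch).

    scores_a, scores_b: scores given to the same joints by two observers
    (or by one observer on two occasions), in the same order.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)                  # systematic difference
    half_width = 1.96 * stdev(diffs)    # assumes roughly normal differences
    return bias - half_width, bias + half_width


# Hypothetical scores for four joints from two observers.
lower, upper = limits_of_agreement([5, 6, 7, 8], [4, 6, 8, 8])
print(lower, upper)
```

When the bias is near zero, the limits are symmetric and can be reported in the "±" form used in the abstract.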
A two-factor theory for concussion assessment using ImPACT: memory and speed.

PubMed

Schatz, Philip; Maerlender, Arthur

2013-12-01

We present the initial validation of a two-factor structure of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) using ImPACT composite scores and document the reliability and validity of this factor structure. Factor analyses were conducted for baseline (N = 21,537) and post-concussion (N = 560) data, yielding "Memory" (Verbal and Visual) and "Speed" (Visual Motor Speed and Reaction Time) Factors; inclusion of Total Symptom Scores resulted in a third discrete factor. Speed and Memory z-scores were calculated, and test-retest reliability (using intra-class correlation coefficients) at 1 month (0.88/0.81), 1 year (0.85/0.75), and 2 years (0.76/0.74) were higher than published data using Composite scores. Speed and Memory scores yielded 89% sensitivity and 70% specificity, which was higher than composites (80%/62%) and comparable with subscales (91%/69%). This emergent two-factor structure has improved test-retest reliability with no loss of sensitivity/specificity and may improve understanding and interpretability of ImPACT test results.
Translation, Validation and Reliability of the Kidney Diseases Quality of Life-Short Form (KDQOL-SF Form) Tool in Urdu.

PubMed

Anees, Muhammad; Ibrahim, Muhammad; Imtiaz, Marium; Batool, Shazia; Elahi, Irfan; Malik, Muzammil Riaz

2016-08-01

To translate, validate and assess the reliability of kidney disease quality of life - short form (KDQOL-SF-36) in Urdu, the national language of Pakistan. A multicentric descriptive cross-sectional study. Department of Nephrology, Mayo Hospital, Lahore, from February to July 2015. Patients of end-stage renal disease (ESRD) on maintenance hemodialysis (MHD) for more than three months were included in the study. Patients of ESRD not on dialysis, and those with acute renal failure were excluded. The English version of KDQOL-SF-36 was translated in Urdu and then translated back in English; further validation was done by a senior professor of Punjab University, Lahore. One hundred and thirty patients were included in the study. Fifty patients were from Mayo Hospital, 35 from Shalamar Hospital and 50 from Shaikh Zayed Hospital, Lahore. The internal consistency reliability coefficient for overall scale was 0.84. Twelve sub-scales (symptoms, effect of kidney disease, burden of kidney disease, cognitive function, quality of social interaction, sexual function, social support, physical functioning, role physical, pain, emotional well-being and role emotional) had more than 0.70 internal consistency reliability coefficient. Overall mean score of the domains, i.e., kidney disease component score (KDCS), physical component score (PCS), and mental component score (MCS), was 60.62 ± 17.61, 43.12 ± 19.54, and 49.27 ± 14.52, respectively. A significant positive relationship was observed between KDCS and MCS domains, KDCS and PCS domains, and PCS and MCS domains. The Urdu version of KDQOL-SF-36 is a reliable and valid version to measure QOL in kidney disease patients on dialysis in Pakistan.
Turkish version of the modified Constant-Murley score and standardized test protocol: reliability and validity.

PubMed

Çelik, Derya

2016-01-01

The Constant-Murley score (CMS) is widely used to evaluate disabilities associated with shoulder injuries, but it has been criticized for relying on imprecise terminology and a lack of standardized methodology. A modified guideline, therefore, was published in 2008 with several recommendations. This new version has not yet been translated or culturally adapted for Turkish-speaking populations. The purpose of this study was to translate and cross-culturally adapt the modified CMS and its test protocol, as well as define and measure its reliability and validity. The modified CMS was translated into Turkish, consistent with published methodological guidelines. The measurement properties of the Turkish version of the modified CMS were tested in 30 patients (12 males, 18 females; mean age: 59.5±13.5 years) with a variety of shoulder pathologies. Intraclass correlation coefficients (ICC) were used to estimate test-retest reliability. Construct validity was analyzed with the Turkish version of the American Shoulder and Elbow Surgeons (ASES) Standardized Shoulder Assessment Form and Short-Form Health Survey (SF-12). No difficulties were found in the translation process. The Turkish version of the modified CMS showed excellent test-retest reliability (ICC=0.86). The correlation coefficients between the Turkish version of the modified CMS and the ASES, SF-12-physical component score, and SF-12 mental component scores were found to be 0.48, 0.35, and 0.05, respectively. No floor or ceiling effects were found. The translation and cultural adaptation of the modified CMS and its standardized test protocol into Turkish were successful. The Turkish version of the modified CMS has sufficient reliability and validity to measure a variety of shoulder disorders for Turkish-speaking individuals.

Examining the Reliability of Interval Level Data Using Root Mean Square Differences and Concordance Correlation Coefficients

ERIC Educational Resources Information Center

Barchard, Kimberly A.

2012-01-01

This article introduces new statistics for evaluating score consistency. Psychologists usually use correlations to measure the degree of linear relationship between 2 sets of scores, ignoring differences in means and standard deviations. In medicine, biology, chemistry, and physics, a more stringent criterion is often used: the extent to which…
Consistency of near-death experience accounts over two decades: are reports embellished over time?

PubMed

Greyson, Bruce

2007-06-01

"Near-death experiences," commonly reported after clinical death and resuscitation, may require intervention and, if reliable, may elucidate altered brain functioning under extreme stress. It has been speculated that accounts of near-death experiences are exaggerated over the years. The objective of this study was to test the reliability over two decades of accounts of near-death experiences. Seventy-two patients with near-death experience who had completed the NDE scale in the 1980s (63% of the original cohort still alive) completed the scale a second time, without reference to the original scale administration. The primary outcome was differences in NDE scale scores on the two administrations. The secondary outcome was the statistical association between differences in scores and years elapsed between the two administrations. Mean scores did not change significantly on the total NDE scale, its 4 factors, or its 16 items. Correlation coefficients between scores on the two administrations were significant at P<0.001 for the total NDE scale, for its 4 factors, and for its 16 items. Correlation coefficients between score changes and time elapsed between the two administrations were not significant for the total NDE scale, for its 4 factors, or for its 16 items. Contrary to expectation, accounts of near-death experiences, and particularly reports of their positive affect, were not embellished over a period of almost two decades. These data support the reliability of near-death experience accounts.
Computer assisted Objective structured clinical examination versus Objective structured clinical examination in assessment of Dermatology undergraduate students.

PubMed

Chaudhary, Richa; Grover, Chander; Bhattacharya, S N; Sharma, Arun

2017-01-01

The assessment of dermatology undergraduates has been done through computer assisted objective structured clinical examination at our institution for the last 4 years. We attempted to compare objective structured clinical examination (OSCE) and computer assisted objective structured clinical examination (CA-OSCE) as assessment tools. To assess the relative effectiveness of CA-OSCE and OSCE as assessment tools for undergraduate dermatology trainees. Students underwent CA-OSCE as well as OSCE-based evaluation of equal weightage as an end of posting assessment. The attendance as well as the marks in both the examination formats were meticulously recorded and statistically analyzed using SPSS version 20.0. Intercooled Stata V9.0 was used to assess the reliability and internal consistency of the examinations conducted. Feedback from both students and examiners was also recorded. The mean attendance for the study group was 77% ± 12.0%. The average score on CA-OSCE and OSCE was 47.4% ± 19.8% and 53.5% ± 18%, respectively. These scores showed a mutually positive correlation, with Spearman's coefficient being 0.593. Spearman's rank correlation coefficient between attendance scores and assessment score was 0.485 for OSCE and 0.451 for CA-OSCE. The Cronbach's alpha coefficient for all the tests ranged from 0.76 to 0.87 indicating high reliability. The comparison was based on a single batch of 139 students. Such an evaluation on more students in larger number of batches over successive years could help throw more light on the subject. Computer assisted objective structured clinical examination was found to be a valid, reliable and effective format for dermatology assessment, being rated as the preferred format by examiners.
Interrater and Intrarater Reliability of the Tuck Jump Assessment by Health Professionals of Varied Educational Backgrounds

PubMed Central

Dudley, Lisa A.; Smith, Craig A.; Olson, Brandon K.; Chimera, Nicole J.

2013-01-01

Objective. The Tuck Jump Assessment (TJA), a clinical plyometric assessment, identifies 10 jumping and landing technique flaws. The study objective was to investigate TJA interrater and intrarater reliability with raters of different educational and clinical backgrounds. Methods. 40 participants were video recorded performing the TJA using published protocol and instructions. Five raters of varied educational and clinical backgrounds scored the TJA. Each score of the 10 technique flaws was summed for the total TJA score. Approximately one month later, 3 raters scored the videos again. Intraclass correlation coefficients determined interrater (5 and 3 raters for first and second session, resp.) and intrarater (3 raters) reliability. Results. Interrater reliability with 5 raters was poor (ICC = 0.47; 95% confidence intervals (CI) 0.33–0.62). Interrater reliability between 3 raters who completed 2 scoring sessions improved from 0.52 (95% CI 0.35–0.68) for session one to 0.69 (95% CI 0.55–0.81) for session two. Intrarater reliability was poor to moderate, ranging from 0.44 (95% CI 0.22–0.68) to 0.72 (95% CI 0.55–0.84). Conclusion. Published protocol and training of raters were insufficient to allow consistent TJA scoring. There may be a learned effect with the TJA since interrater reliability improved with repetition. TJA instructions and training should be modified and enhanced before clinical implementation. PMID:26464881
Development and validation of the Myasthenia Gravis Impairment Index.

PubMed

Barnett, Carolina; Bril, Vera; Kapral, Moira; Kulkarni, Abhaya; Davis, Aileen M

2016-08-30

We aimed to develop a measure of myasthenia gravis impairment using a previously developed framework and to evaluate reliability and validity, specifically face, content, and construct validity. The first draft of the Myasthenia Gravis Impairment Index (MGII) included examination items from available measures enriched with newly developed, patient-reported items, modified after patient input. International neuromuscular specialists evaluated face and content validity via an e-mail survey. Test-retest reliability was assessed in stable patients at a 3-week interval and interrater reliability was evaluated in the same day. Construct validity was assessed through correlations between the MGII and other measures and by comparing scores in different patient groups. The first draft was assessed by 18 patients, and 72 specialists answered the survey. The second draft had 7 examination and 22 patient-reported items. Field testing included 200 patients, with 54 patients completing the reliability studies. Test-retest reliability of the total score was good (intraclass correlation coefficient 0.92; 95% confidence interval 0.79-0.94), as was interrater reliability of the examination component (intraclass correlation coefficient 0.81; 95% confidence interval 0.79-0.94). The MGII correlated well with comparison measures, with higher correlations with the MG-activities of daily living (r = 0.91) and MG-specific quality of life 15-item scale (r = 0.78). When assessing different patient groups, the scores followed expected patterns. The MGII was developed using a patient-centered framework of myasthenia-related impairments and incorporating patient input throughout the development process. It is reliable in an outpatient setting and has demonstrated construct validity. Responsiveness studies are under way. © 2016 American Academy of Neurology.
Development and validation of the Myasthenia Gravis Impairment Index

PubMed Central

Bril, Vera; Kapral, Moira; Kulkarni, Abhaya; Davis, Aileen M.

2016-01-01

Objective: We aimed to develop a measure of myasthenia gravis impairment using a previously developed framework and to evaluate reliability and validity, specifically face, content, and construct validity. Methods: The first draft of the Myasthenia Gravis Impairment Index (MGII) included examination items from available measures enriched with newly developed, patient-reported items, modified after patient input. International neuromuscular specialists evaluated face and content validity via an e-mail survey. Test–retest reliability was assessed in stable patients at a 3-week interval and interrater reliability was evaluated in the same day. Construct validity was assessed through correlations between the MGII and other measures and by comparing scores in different patient groups. Results: The first draft was assessed by 18 patients, and 72 specialists answered the survey. The second draft had 7 examination and 22 patient-reported items. Field testing included 200 patients, with 54 patients completing the reliability studies. Test–retest reliability of the total score was good (intraclass correlation coefficient 0.92; 95% confidence interval 0.79–0.94), as was interrater reliability of the examination component (intraclass correlation coefficient 0.81; 95% confidence interval 0.79–0.94). The MGII correlated well with comparison measures, with higher correlations with the MG–activities of daily living (r = 0.91) and MG-specific quality of life 15-item scale (r = 0.78). When assessing different patient groups, the scores followed expected patterns. Conclusions: The MGII was developed using a patient-centered framework of myasthenia-related impairments and incorporating patient input throughout the development process. It is reliable in an outpatient setting and has demonstrated construct validity. Responsiveness studies are under way. PMID:27402891
Reliability of a retail food store survey and development of an accompanying retail scoring system to communicate survey findings and identify vendors for healthful food and marketing initiatives.

PubMed

Ghirardelli, Alyssa; Quinn, Valerie; Sugerman, Sharon

2011-01-01

To develop a retail grocery instrument with weighted scoring to be used as an indicator of the food environment. Twenty six retail food stores in low-income areas in California. Observational. Inter-rater reliability for grocery store survey instrument. Description of store scoring methodology weighted to emphasize availability of healthful food. Type A intra-class correlation coefficients (ICC) with absolute agreement definition or a κ test for measures using ranges as categories. Measures of availability and price of fruits and vegetables performed well in reliability testing (κ = 0.681-0.800). Items for vegetable quality were better than for fruit (ICC 0.708 vs 0.528). Kappa scores indicated low to moderate agreement (0.372-0.674) on external store marketing measures and higher scores for internal store marketing. "Next to" the checkout counter was more reliable than "within 6 feet." Health departments using the store scoring system reported it as the most useful communication of neighborhood findings. There was good reliability of the measures among the research pairs. The local store scores can show the need to bring in resources and to provide access to fruits and vegetables and other healthful food. Copyright © 2011 Society for Nutrition Education. Published by Elsevier Inc. All rights reserved.
Basic Scale on Insomnia complaints and Quality of Sleep (BaSIQS): reliability, initial validity and normative scores in higher education students.

PubMed

Allen Gomes, Ana; Ruivo Marques, Daniel; Meia-Via, Ana Maria; Meia-Via, Mariana; Tavares, José; Fernandes da Silva, Carlos; Pinto de Azevedo, Maria Helena

2015-04-01

Based on successive samples totaling more than 5000 higher education students, we scrutinized the reliability, structure, initial validity and normative scores of a brief self-report seven-item scale to screen for the continuum of nighttime insomnia complaints/perceived sleep quality, used by our team for more than a decade, henceforth labeled the Basic Scale on Insomnia complaints and Quality of Sleep (BaSIQS). In study/sample 1 (n = 1654), the items were developed based on part of a larger survey on higher education sleep-wake patterns. The test-retest study was conducted in an independent small group (n = 33) with a 2-8 week gap. In study/sample 2 (n = 360), focused mainly on validity, the BaSIQS was completed together with the Pittsburgh Sleep Quality Index (PSQI). In study 3, a large recent sample of students from universities all over the country (n = 2995) answered the BaSIQS items, based on which normative scores were determined, and an additional question on perceived sleep problems in order to further analyze the scale's validity. Regarding reliability, Cronbach alpha coefficients were systematically higher than 0.7, and the test-retest correlation coefficient was greater than 0.8. Structure analyses revealed consistently satisfactory two-factor and single-factor solutions. Concerning validity analyses, BaSIQS scores were significantly correlated with PSQI component scores and overall score (r = 0.652 corresponding to a large association); mean scores were significantly higher in those students classifying themselves as having sleep problems (p < 0.0001, d = 0.99 corresponding to a large effect size). In conclusion, the BaSIQS is very easy to administer, and appears to be a reliable and valid scale in higher education students. It might be a convenient short tool in research and applied settings to rapidly assess sleep quality or screen for insomnia complaints, and it may be easily used in other populations with minor adaptations.
Judging in Rhythmic Gymnastics at Different Levels of Performance.

PubMed

Leandro, Catarina; Ávila-Carvalho, Lurdes; Sierra-Palmeiro, Elena; Bobo-Arce, Marta

2017-12-01

This study aimed to analyse the quality of difficulty judging in rhythmic gymnastics, at different levels of performance. The sample consisted of 1152 difficulty scores concerning 288 individual routines, performed in the World Championships in 2013. The data were analysed using the mean absolute judge deviation from the final difficulty score, a Cronbach's alpha coefficient and intra-class correlations, for consistency and reliability assessment. For validity assessment, mean deviations of judges' difficulty scores, the Kendall's coefficient of concordance W and ANOVA eta-squared values were calculated. Overall, the results in terms of consistency (Cronbach's alpha mostly above 0.90) and reliability (intra-class correlations for single and average measures above 0.70 and 0.90, respectively) were satisfactory, in the first and third parts of the ranking on all apparatus. The medium level gymnasts, those in the second part of the ranking, had inferior reliability indices and highest score dispersion. In this part, the minimum of corrected item-total correlation of individual judges was 0.55, with most values well below, and the matrix for between-judge correlations identified remarkable inferior correlations. These findings suggest that the quality of difficulty judging in rhythmic gymnastics may be compromised at certain levels of performance. In future, special attention should be paid to the judging analysis of the medium level gymnasts, as well as the Code of Points applicability at this level.
Judging in Rhythmic Gymnastics at Different Levels of Performance

PubMed Central

Ávila-Carvalho, Lurdes; Sierra-Palmeiro, Elena; Bobo-Arce, Marta

2017-01-01

Abstract This study aimed to analyse the quality of difficulty judging in rhythmic gymnastics, at different levels of performance. The sample consisted of 1152 difficulty scores concerning 288 individual routines, performed in the World Championships in 2013. The data were analysed using the mean absolute judge deviation from the final difficulty score, a Cronbach’s alpha coefficient and intra-class correlations, for consistency and reliability assessment. For validity assessment, mean deviations of judges’ difficulty scores, the Kendall’s coefficient of concordance W and ANOVA eta-squared values were calculated. Overall, the results in terms of consistency (Cronbach’s alpha mostly above 0.90) and reliability (intra-class correlations for single and average measures above 0.70 and 0.90, respectively) were satisfactory, in the first and third parts of the ranking on all apparatus. The medium level gymnasts, those in the second part of the ranking, had inferior reliability indices and highest score dispersion. In this part, the minimum of corrected item-total correlation of individual judges was 0.55, with most values well below, and the matrix for between-judge correlations identified remarkable inferior correlations. These findings suggest that the quality of difficulty judging in rhythmic gymnastics may be compromised at certain levels of performance. In future, special attention should be paid to the judging analysis of the medium level gymnasts, as well as the Code of Points applicability at this level. PMID:29339996
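The rhythmic gymnastics record above reports Kendall's coefficient of concordance W among its judging statistics without giving the computation. A minimal sketch of the standard formula follows, W = 12 S / (m^2 (n^3 - n)), where m is the number of judges, n the number of routines, and S the sum of squared deviations of the rank sums from their mean. It assumes no tied scores within a judge, which real judging data may violate; the function name `kendalls_w` and the example scores are hypothetical.

```python
def kendalls_w(scores):
    """Kendall's coefficient of concordance W without tie correction (sketch).

    scores: one list per judge, each holding that judge's scores for the
    same n routines in the same order. Assumes no ties within a judge.
    """
    m = len(scores)        # number of judges
    n = len(scores[0])     # number of routines
    rank_sums = [0] * n
    for judge in scores:
        # Rank this judge's scores from 1 (lowest) to n (highest).
        order = sorted(range(n), key=lambda i: judge[i])
        for rank, idx in enumerate(order, start=1):
            rank_sums[idx] += rank
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical scores: three judges who order three routines identically.
print(kendalls_w([[7.1, 8.0, 9.2], [6.9, 7.5, 8.8], [7.0, 8.1, 9.0]]))
```

W ranges from 0 (no agreement in rankings) to 1 (identical rankings across judges).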
Development and evaluation of the nurse quality of communication with patient questionnaire.

PubMed

Vuković, Mira; Gvozdenović, Branislav S; Stamatović-Gajić, Branka; Ilić, Miodrag; Gajić, Tomislav

2010-01-01

Nurse/patient relationship as a complex interrelation or as an interaction of the factor patient and factor nurse has been a subject of a number of studies during the past ten years. Nurse/patient communication is a special entity, usually observed within a framework of the wider nurse/patient relationship. In that regard, we wanted to develop a standardized questionnaire that could reliably measure the quality of communication between nurse and patient, and be used by nurses. The main goal of this study was to develop and evaluate construct validity of the Nurse Quality of Communication with Patient Questionnaire (NQCPQ), as well as to evaluate its reliability. The goal was also to establish a measure of inter-rater reliability, using two repeated measurements of results by items and scores of the NQCPQ, on the same observed units by two assessors. The starting NQCPQ, which consists of 25 items, was filled in by two groups of nurses. Each nurse was questioned during morning and afternoon shifts, in order to evaluate their communication with hospitalized patients, using marks from 1 to 6. To evaluate construct validity, we used principal component analysis, while reliability was assessed using the intraclass correlation coefficient and Cronbach's alpha coefficient. To evaluate inter-rater reliability, we used the Pearson correlation coefficient. Using a group of 118 patients, we explained 86% of the variance in the investigated phenomenon (nurse/patient communication) using one component, by which we separated 6 items of the questionnaire. Inter-item correlation (alpha) in this component was 0.96. The Pearson correlation coefficient was highly significant, with a value of 0.7 by item, and the correlation coefficient for scores at repeated measurements was 0.84. The NQCPQ is a 6-item instrument with high construct validity. It can be used to measure quality of nurse/patient communication in a simple, fast and reliable way. It could contribute to more adequate research and defining of this problem, and as such could be used in studies of interaction of psychometric, clinical, biochemical, socio-cultural, demographic and other parameters as well.
Validity and reliability of the Turkish Migraine Disability Assessment (MIDAS) questionnaire.

PubMed

Ertaş, Mustafa; Siva, Aksel; Dalkara, Turgay; Uzuner, Nevzat; Dora, Babür; Inan, Levent; Idiman, Fethi; Sarica, Yakup; Selçuki, Deniz; Sirin, Hadiye; Oğuzhanoğlu, Atilla; Irkeç, Ceyla; Ozmenoğlu, Mehmet; Ozbenli, Taner; Oztürk, Musa; Saip, Sabahattin; Neyal, Münife; Zarifoğlu, Mehmet

2004-09-01

The aim of this study is to assess the comprehensibility, internal consistency, patient-physician reliability, test-retest reliability, and validity of the Turkish version of the Migraine Disability Assessment (MIDAS) questionnaire in patients with headache. The MIDAS questionnaire has been developed by Stewart et al and shown to be reliable and valid to determine the degree of disability caused by migraine. This study was designed as a national multicenter study to demonstrate the reliability and validity of the Turkish version of the MIDAS questionnaire. Patients applying to 17 Neurology Clinics in Turkey were evaluated at the baseline (visit 1), week 4 (visit 2), and week 12 (visit 3) visits in terms of disease severity and comprehensibility, internal consistency, test-retest reliability, and validity of MIDAS. Since the severity of the disease was found to change significantly at visit 2 compared to visit 1, test-retest reliability was assessed using the MIDAS scores of a subgroup of patients whose disease severity remained unchanged (up to ±3 days difference in the number of days with headache between visits 1 and 2). A total of 306 patients (86.2% female, mean age: 35.0 ± 9.8 years) were enrolled into the study. A total of 65.7%, 77.5%, and 82.0% of patients reported that "they had fully understood the MIDAS questionnaire" in visits 1, 2, and 3, respectively. A highly positive correlation was found between physician-applied and patient-applied total MIDAS scores in all three visits (Spearman correlation coefficients were R = 0.87, 0.83, and 0.90, respectively, P < .001). Internal consistency of MIDAS was assessed using Cronbach's alpha and was found at acceptable (>0.7) or excellent (>0.8) levels in both patient and physician applied MIDAS scores, respectively. Total MIDAS score showed good test-retest reliability (R = 0.68). Both the number of days with headache and the total MIDAS scores were positively correlated at all visits with correlation coefficients between 0.47 and 0.63. There was also a moderate degree of correlation (R = 0.54) between the total MIDAS score at week 12 and the number of days with headache at visit 2 + visit 3, which quantify headache-related disability over a 3-month period similar to the MIDAS questionnaire. These findings demonstrated that the Turkish translation is equivalent to the English version of MIDAS in terms of internal consistency, test-retest reliability, and validity. Physicians can reliably use the Turkish translation of the MIDAS questionnaire in defining the severity of illness and its treatment strategy when applied as a self-administered report by migraine patients themselves.
The interrater and intrarater reliability of the functional movement screen: A systematic review with meta-analysis.

PubMed

Cuchna, Jennifer W; Hoch, Matthew C; Hoch, Johanna M

2016-05-01

To synthesize the literature and perform a meta-analysis for both the interrater and intrarater reliability of the FMS™. Academic Search Complete, CINAHL, Medline and SportsDiscus databases were systematically searched from inception to March 2015. Studies were included if the primary purpose was to determine the interrater or intrarater reliability of the FMS™, assessed and scored all 7-items using the standard scoring criteria, provided a composite score and employed intraclass correlation coefficients (ICCs). Studies were excluded if reliability was not the primary aim, participants were injured at data collection, or a modified FMS™ or scoring system was utilized. Seven papers were included; 6 assessing interrater and 6 assessing intrarater reliability. There was moderate evidence in good interrater reliability with a summary ICC of 0.843 (95% CI = 0.640, 0.936; Q7 = 84.915, p < 0.0001). There was moderate evidence in good intrarater reliability with a summary ICC of 0.869 (95% CI = 0.785, 0.921; Q12 = 60.763, p < 0.0001). There was moderate evidence for both forms of reliability. The sensitivity assessments revealed this interpretation is stable and not influenced by any one study. Overall, the FMS™ is a reliable tool for clinical practice. Copyright © 2015 Elsevier Ltd. All rights reserved.
Ultrasound definition of tendon damage in patients with rheumatoid arthritis. Results of a OMERACT consensus-based ultrasound score focussing on the diagnostic reliability.

PubMed

Bruyn, George A W; Hanova, Petra; Iagnocco, Annamaria; d'Agostino, Maria-Antonietta; Möller, Ingrid; Terslev, Lene; Backhaus, Marina; Balint, Peter V; Filippucci, Emilio; Baudoin, Paul; van Vugt, Richard; Pineda, Carlos; Wakefield, Richard; Garrido, Jesus; Pecha, Ondrej; Naredo, Esperanza

2014-11-01

To develop the first ultrasound scoring system of tendon damage in rheumatoid arthritis (RA) and assess its intraobserver and interobserver reliability. We conducted a Delphi study on ultrasound-defined tendon damage and ultrasound scoring system of tendon damage in RA among 35 international rheumatologists with experience in musculoskeletal ultrasound. Twelve patients with RA were included and assessed twice by 12 rheumatologists-sonographers. Ultrasound examination for tendon damage in B mode of five wrist extensor compartments (extensor carpi radialis brevis and longus; extensor pollicis longus; extensor digitorum communis; extensor digiti minimi; extensor carpi ulnaris) and one ankle tendon (tibialis posterior) was performed blindly, independently and bilaterally in each patient. Intraobserver and interobserver reliability were calculated by κ coefficients. A three-grade semiquantitative scoring system was agreed for scoring tendon damage in B mode. The mean intraobserver reliability for tendon damage scoring was excellent (κ value 0.91). The mean interobserver reliability assessment showed good κ values (κ value 0.75). The most reliable were the extensor digiti minimi, the extensor carpi ulnaris, and the tibialis posterior tendons. An ultrasound reference image atlas of tenosynovitis and tendon damage was also developed. Ultrasound is a reproducible tool for evaluating tendon damage in RA. This study strongly supports a new reliable ultrasound scoring system for tendon damage. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Reliability of the penetration aspiration scale with flexible endoscopic evaluation of swallowing.

PubMed

Butler, Susan G; Markley, Lisa; Sanders, Brian; Stuart, Andrew

2015-06-01

The Penetration Aspiration Scale (PAS), although designed for videofluoroscopy, has been utilized with flexible endoscopic evaluation of swallowing (FEES) in both research and clinical practice. The purpose of this investigation was to determine inter- and intrarater reliability of the PAS with FEES as a function of clinician FEES experience and retest interval. Three groups of 3 clinicians (N=9) with varying FEES experience (beginning, intermediate, and advanced) assigned PAS scores to 35 swallows. Initial ratings were repeated following short-term (ie, 1 day) and long-term (ie, 1 week) retest intervals. Intraclass correlation coefficients were calculated to assess interrater reliability on the first rating for each group. The coefficients were .91, .82, and .89 for the beginning, intermediate, and advanced clinicians, respectively. Overall interrater reliability across all 9 clinicians, irrespective of experience, was .85. Intraclass correlation coefficients were also calculated to assess intrarater reliability. The intrarater reliability for short- and long-term ratings was .90, .94, and .96 and .96, .97, and .94 for the beginning, intermediate, and advanced clinicians, respectively. Overall intrarater reliability across all 9 clinicians and all 3 ratings was .94. Excellent inter- and intrarater reliability was evidenced with the application of the PAS for FEES regardless of clinician experience and retest interval. © The Author(s) 2015.
Development and validation of a VISA tendinopathy questionnaire for greater trochanteric pain syndrome, the VISA-G.

PubMed

Fearon, A M; Ganderton, C; Scarvell, J M; Smith, P N; Neeman, T; Nash, C; Cook, J L

2015-12-01

Greater trochanteric pain syndrome (GTPS) is common, resulting in significant pain and disability. There is no condition-specific outcome score to evaluate the severity of disability associated with GTPS. To develop a reliable and valid outcome measurement capable of evaluating the severity of disability associated with GTPS. A phenomenological framework using in-depth semi-structured interviews of patients and medical experts, and focus groups of physiotherapists, was used in the item generation. Item and format clarification was undertaken via piloting. Multivariate analysis provided the basis for item reduction. The resultant VISA-G was tested for reliability with the intraclass correlation coefficient (ICC), internal consistency (Cronbach's alpha), and construct validity (correlation coefficient) on 52 naïve participants with GTPS and 31 asymptomatic participants. The resultant outcome measurement tool is consistent in style with existing tendinopathy outcome measurement tools, namely the suite of VISA scores. The VISA-G was found to have a test-retest reliability of ICC2,1 (95% CI) of 0.827 (0.638-0.923). Internal consistency was high with a Cronbach's alpha of 0.809. Construct validity was demonstrated: the VISA-G measures different constructs than tools previously used in assessing GTPS, the Harris Hip Score and the Oswestry Disability Index (Spearman rho: 0.020 and 0.0205, respectively). The VISA-G did not demonstrate any floor or ceiling effect in symptomatic participants. The VISA-G is a reliable and valid score for measuring the severity of disability associated with GTPS. Copyright © 2015 Elsevier Ltd. All rights reserved.
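For readers unfamiliar with the "ICC2,1" notation used in the record above, the following is a minimal sketch of the two-way random-effects, absolute-agreement, single-measurement intraclass correlation in the Shrout and Fleiss scheme. It is not the authors' analysis code; the function name `icc_2_1` and the small ratings table (the classic six-subject, four-judge textbook example) are illustrative assumptions.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` is a list of rows, one row per subject, one column per
    rater (or per test occasion in a test-retest design).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters / occasions
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                  # between-subjects mean square
    ms_cols = ss_cols / (k - 1)                  # between-raters mean square
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative data only (not from the VISA-G study): 6 subjects x 4 judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
```

Because ICC(2,1) penalizes systematic differences between raters or occasions, it is usually lower than a consistency-type ICC computed on the same table.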
Translation, reliability, and clinical utility of the Melbourne Assessment 2.

PubMed

Gerber, Corinna N; Plebani, Anael; Labruyère, Rob

2017-10-12

The aims were to (i) provide a German translation of the Melbourne Assessment 2 (MA2), a quantitative test to measure unilateral upper limb function in children with neurological disabilities and (ii) to evaluate its reliability and aspects of clinical utility. After its translation into German and approval of the back translation by the original authors, the MA2 was performed and videotaped twice with 30 children with neuromotor disorders. For each participant, two raters scored the video of the first test for inter-rater reliability. To determine test-retest reliability, one rater additionally scored the video of the second test while the other rater repeated the scoring of the first video to evaluate intra-rater reliability. Time needed for rater training, test administration, and scoring was recorded. The four subscale scores showed excellent intra-, inter-rater, and test-retest reliability with intraclass correlation coefficients of 0.90-1.00 (95% confidence intervals 0.78-1.00). Score items revealed substantial to almost perfect intra-rater reliability (weighted kappa kw = 0.66-1.00) for the more affected side. Score item inter-rater and test-retest reliability of the same extremity were, with one exception, moderate to almost perfect (kw = 0.42-0.97; kw = 0.40-0.89). Furthermore, the MA2 was feasible and acceptable for patients and clinicians. The MA2 showed excellent subscale and moderate to almost perfect score item reliability. Implications for Rehabilitation: There is a lack of high-quality studies about psychometric properties of upper limb measurement tools in the neuropediatric population. The Melbourne Assessment 2 is a promising tool for reliable measurement of unilateral upper limb movement quality in the neuropediatric population. The Melbourne Assessment 2 is acceptable and practicable to therapists and patients for routine use in clinical care.
Reliability of ultrasound thickness measurement of the abdominal muscles during clinical isometric endurance tests.

PubMed

ShahAli, Shabnam; Arab, Amir Massoud; Talebian, Saeed; Ebrahimi, Esmaeil; Bahmani, Andia; Karimi, Noureddin; Nabavi, Hoda

2015-07-01

The study was designed to evaluate the intra-examiner reliability of ultrasound (US) thickness measurement of abdominal muscle activity in supine lying and during two isometric endurance tests in subjects with and without low back pain (LBP). A total of 19 women (9 with LBP, 10 without LBP) participated in the study. Within-day reliability of the US thickness measurements in supine lying and during the two isometric endurance tests was assessed in all subjects. The intra-class correlation coefficient (ICC) was used to assess the relative reliability of thickness measurement. The standard error of measurement (SEM), minimal detectable change (MDC) and the coefficient of variation (CV) were used to evaluate the absolute reliability. Results indicated high ICC scores (0.73-0.99) and also small SEM and MDC scores for within-day reliability assessment. The Bland-Altman plots of agreement in US measurement of the abdominal muscles during the two isometric endurance tests demonstrated that 95% of the observations fall between the limits of agreement for test and retest measurements. Together the results indicate high intra-tester reliability for the US measurement of the thickness of abdominal muscles in all the positions tested. According to the study's findings, US imaging can be used as a reliable method for assessment of abdominal muscle activity in supine lying and the two isometric endurance tests employed, in participants with and without LBP. Copyright © 2014 Elsevier Ltd. All rights reserved.
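The record above reports SEM and MDC as indices of absolute reliability without stating formulas. A common convention, assumed here and not confirmed by the abstract, is SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM. The sketch below uses hypothetical function names and made-up input values purely to show the arithmetic.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC); assumes 0 <= ICC <= 1."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1] for this formula")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 -> 95%).

    The sqrt(2) factor reflects that a change score involves
    measurement error from two occasions.
    """
    return z * math.sqrt(2.0) * sem


# Hypothetical inputs: between-subject SD of 2.0 mm and an ICC of 0.75.
sem = standard_error_of_measurement(sd=2.0, icc=0.75)
mdc95 = minimal_detectable_change(sem)
print(f"SEM = {sem:.2f} mm, MDC95 = {mdc95:.2f} mm")
```

Under these assumptions, a change smaller than the MDC95 cannot be distinguished from measurement error at the 95% level.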
The Chinese version of Instrument of Professional Attitude for Student Nurses (IPASN): Assessment of reliability and validity.

PubMed

Xiao, Yu-Ying; Li, Ting; Xiao, Lin; Wang, Su-Wei; Wang, Si-Qi; Wang, Han-Xiao; Wang, Bei-Bei; Gao, Yu-Lin

2017-02-01

Professional attitude is of great importance for nursing professionals in modern society. To develop an effective educational program for student nurses in China, an appropriate instrument is required for the assessment of their professional attitude. To assess the validity and reliability of the Chinese version of the Instrument of Professional Attitude for Student Nurses (IPASN). The original version of the IPASN was translated using the Brislin model (translation, back translation, cultural adaptation and pilot study) with authorization from the developer. A total of 681 nursing students were chosen by stratified convenience sampling to assess construct validity using exploratory factor analysis (EFA). In addition, item analysis, Cronbach's alpha coefficients, and test-retest reliability were computed in this sample to test the psychometric properties. A separate sample of 204 nursing undergraduate trainees was selected by cluster convenience sampling to confirm the structure using confirmatory factor analysis (CFA). Corrected item-total correlations were between 0.33 and 0.69 and alpha-if-item-deleted values were between 0.906 and 0.913, indicating that no item should be deleted. Cronbach's alpha was 0.91 for the total scale and ranged from 0.67 to 0.89 for the subscales. Test-retest reliability estimated from the intraclass correlation coefficient (ICC) was 0.74 (P<0.05). Differences in item scores between the high-score group (the first 27%) and low-score group (the last 27%) were significant (P<0.001), indicating that the item discrimination ability was good. Seven subscales (contribution to increase of scientific information load, autonomy, community service, continuous education, to promote professional development, cooperation and theory guiding practice) were identified in EFA and confirmed in CFA, and explained 65.5% of the total variance. These results indicated that the Chinese version of the IPASN was valid and reliable for the evaluation of nursing students' professional attitude. Copyright © 2016 Elsevier Ltd. All rights reserved.
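Several records in this listing report Cronbach's alpha and "alpha if item deleted". As a rough illustration of how those quantities are usually computed, here is a minimal sketch using only the Python standard library. The function names and the tiny response matrix are hypothetical and are not taken from any of the studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def alpha_if_item_deleted(responses):
    """Alpha recomputed with each item dropped in turn (needs >= 3 items)."""
    k = len(responses[0])
    return [
        cronbach_alpha([row[:j] + row[j + 1:] for row in responses])
        for j in range(k)
    ]


# Hypothetical responses: 5 respondents, 3 Likert-type items.
data = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
print(round(cronbach_alpha(data), 3))
print([round(a, 3) for a in alpha_if_item_deleted(data)])
```

If dropping an item raises alpha noticeably above the full-scale value, that item is a candidate for removal; the record above reports no such item.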
Reliability of the Adult Myopathy Assessment Tool in Individuals with Myositis

PubMed Central

Harris-Love, Michael O.; Joe, Galen; Davenport, Todd E.; Koziol, Deloris; Rose, Kristen Abbett; Shrader, Joseph A.; Vasconcelos, Olavo M.; McElroy, Beverly; Dalakas, Marinos C.

2015-01-01

Objective: The Adult Myopathy Assessment Tool (AMAT) is a 13-item performance-based battery developed to assess functional status and muscle endurance. The purpose of this study was to determine the intrarater and interrater reliability of the AMAT in adults with myositis. Methods: Nineteen raters (13 physical therapists and 6 physicians) scored videotaped recordings of patients with myositis performing the AMAT for a total of 114 tests and 1,482 item observations per session. Raters rescored the AMAT test and item observations during a follow-up session (19 ± 6 days between scoring sessions). All raters completed a single, self-directed, electronic training module prior to the initial scoring session. Results: Intrarater and interrater reliability correlation coefficients were .94 or greater for the AMAT Functional Subscale, Endurance Subscale, and Total score (all p < 0.02 for H0: ρ ≤ 0.75). All AMAT items had satisfactory intrarater agreement (Kappa statistics with Fleiss-Cohen weights, Kw = .57-1.00). Interrater agreement was acceptable for each AMAT item (K = .56-.89) except the sit up (K = .16). The standard error of measurement and 95% confidence interval range for the AMAT Total scores did not exceed 2 points across all observations (AMAT Total score range = 0-45). Conclusions: The AMAT is a reliable, domain-specific assessment of functional status and muscle endurance for adult subjects with myositis. Results of this study suggest that physicians and physical therapists may reliably score the AMAT following a single training session. The AMAT Functional Subscale, Endurance Subscale, and Total score exhibit interrater and intrarater reliability suitable for clinical and research use. PMID:25201624

Assessment of the reliability and consistency of the "malnutrition inflammation score" (MIS) in Mexican adults with chronic kidney disease for diagnosis of protein-energy wasting syndrome (PEW).

PubMed

González-Ortiz, Ailema Janeth; Arce-Santander, Celene Viridiana; Vega-Vega, Olynka; Correa-Rotter, Ricardo; Espinosa-Cuevas, María de Los Angeles

2014-10-04

The protein-energy wasting syndrome (PEW) is a condition of malnutrition, inflammation, anorexia and wasting of body reserves resulting from inflammatory and non-inflammatory conditions in patients with chronic kidney disease (CKD). One way of assessing PEW, extensively described in the literature, is using the Malnutrition Inflammation Score (MIS). To assess the reliability and consistency of MIS for diagnosis of PEW in Mexican adults with CKD on hemodialysis (HD). Study of diagnostic tests. A sample of 45 adults with CKD on HD was analyzed during the period June-July 2014. The instrument was applied on 2 occasions; the test-retest reliability was calculated using the Intraclass Correlation Coefficient (ICC); the internal consistency of the questionnaire was analyzed using Cronbach's α coefficient. A weighted Kappa test was used to estimate the validity of the instrument; the result was subsequently compared with the Bilbrey nutritional index (BNI). The reliability of the questionnaires, evaluated in the patient sample, was ICC = 0.829. The agreement between MIS observations was considered adequate, k = 0.585 (p < 0.001); when comparing it with BNI, a value of k = 0.114 was obtained (p < 0.001). In order to estimate the tendency, a correlation test was performed. The r² correlation coefficient was 0.488 (P < 0.001). MIS has adequate reliability and validity for diagnosing PEW in the population with chronic kidney disease on HD. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.
[Development and validation of the Chinese version of modified body image scale in Chinese population].

PubMed

Gao, X X; Zhu, L; Yu, S J; Xu, T

2018-02-25

Objective: To develop the Chinese version of the modified body image scale (MBIS) questionnaire, and to validate it in a Chinese population. Methods: The original English MBIS questionnaire was translated into Chinese, following the WHO cross-cultural adaptation of health-related quality of life measures. The reliability and validity of the Chinese version of the MBIS questionnaire were evaluated in a Chinese population of patients with MRKH syndrome. Results: A total of 50 patients with MRKH syndrome completed the MBIS and short-form 12-item health survey (SF-12) questionnaires. The Cronbach's alpha of the MBIS was 0.741, and intraclass correlation coefficients were 0.472-0.815 (P < 0.01). MBIS scores were positively correlated with SF-12 scores (Spearman correlation coefficient was -0.409, P < 0.01). Factor analysis showed that the MBIS had one common factor. Conclusion: The Chinese version of the MBIS has high reliability and validity in the Chinese population and is therefore suitable for clinical and research use.
The reliability and validity of the Complex Task Performance Assessment: A performance-based assessment of executive function.

PubMed

Wolf, Timothy J; Dahl, Abigail; Auen, Colleen; Doherty, Meghan

2017-07-01

The objective of this study was to evaluate the inter-rater reliability, test-retest reliability, concurrent validity, and discriminant validity of the Complex Task Performance Assessment (CTPA): an ecologically valid performance-based assessment of executive function. Community control participants (n = 20) and individuals with mild stroke (n = 14) participated in this study. All participants completed the CTPA and a battery of cognitive assessments at initial testing. The control participants completed the CTPA at two different times one week apart. The intra-class correlation coefficient (ICC) for inter-rater reliability for the total score on the CTPA was .991. The ICCs for all of the sub-scores of the CTPA were also high (.889-.977). The CTPA total score was significantly correlated with Condition 4 of the DKEFS Color-Word Interference Test (ρ = -.425) and the Wechsler Test of Adult Reading (ρ = -.493). Finally, there were significant differences between control subjects and individuals with mild stroke on the total score of the CTPA (p = .007) and all sub-scores except interpretation failures and total items incorrect. These results are also consistent with other current executive function performance-based assessments and indicate that the CTPA is a reliable and valid performance-based measure of executive function.
Effect of clinically discriminating, evidence-based checklist items on the reliability of scores from an Internal Medicine residency OSCE.

PubMed

Daniels, Vijay J; Bordage, Georges; Gierl, Mark J; Yudkowsky, Rachel

2014-10-01

Objective structured clinical examinations (OSCEs) are used worldwide for summative examinations but often lack acceptable reliability. Research has shown that reliability of scores increases if OSCE checklists for medical students include only clinically relevant items. Also, checklists are often missing evidence-based items that high-achieving learners are more likely to use. The purpose of this study was to determine if limiting checklist items to clinically discriminating items and/or adding missing evidence-based items improved score reliability in an Internal Medicine residency OSCE. Six internists reviewed the traditional checklists of four OSCE stations classifying items as clinically discriminating or non-discriminating. Two independent reviewers augmented checklists with missing evidence-based items. We used generalizability theory to calculate overall reliability of faculty observer checklist scores from 45 first and second-year residents and predict how many 10-item stations would be required to reach a Phi coefficient of 0.8. Removing clinically non-discriminating items from the traditional checklist did not affect the number of stations (15) required to reach a Phi of 0.8 with 10 items. Focusing the checklist on only evidence-based clinically discriminating items increased test score reliability, needing 11 stations instead of 15 to reach 0.8; adding missing evidence-based clinically discriminating items to the traditional checklist modestly improved reliability (needing 14 instead of 15 stations). Checklists composed of evidence-based clinically discriminating items improved the reliability of checklist scores and reduced the number of stations needed for acceptable reliability. Educators should give preference to evidence-based items over non-evidence-based items when developing OSCE checklists.
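The projection described in the record above (how many stations are needed to reach a Phi coefficient of 0.8) is a generalizability-theory decision study. The sketch below illustrates the idea for the simplest persons-by-stations design, assuming Phi(n) = var_p / (var_p + (var_s + var_res) / n). The study's actual design (items within stations) is more complex, and the variance components used here are invented for illustration, not taken from the paper.

```python
def phi_coefficient(var_person, var_station, var_residual, n_stations):
    """Phi (absolute-decision) coefficient for a persons x stations design.

    Absolute error variance is (station variance + residual variance)
    divided by the number of stations averaged over.
    """
    abs_error = (var_station + var_residual) / n_stations
    return var_person / (var_person + abs_error)


def stations_needed(var_person, var_station, var_residual,
                    target=0.8, max_stations=1000):
    """Smallest number of stations whose projected Phi reaches `target`."""
    for n in range(1, max_stations + 1):
        if phi_coefficient(var_person, var_station, var_residual, n) >= target:
            return n
    raise ValueError("target Phi not reached within max_stations")


# Hypothetical variance components (not estimates from the OSCE study).
components = dict(var_person=1.0, var_station=0.2, var_residual=2.6)
n = stations_needed(**components, target=0.8)
print(n, round(phi_coefficient(**components, n_stations=n), 3))
```

In this framing, a checklist change that shrinks the residual component relative to person variance lowers the number of stations required, which is the kind of comparison the abstract reports (11 versus 15 stations).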
An alternative to the balance error scoring system: using a low-cost balance board to improve the validity/reliability of sports-related concussion balance testing.

PubMed

Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J

2014-05-01

Recent guidelines advocate sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. Criterion validity testing relative to a gold standard and 7 day test-retest reliability. University biomechanics laboratory. Thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores were determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
Japanese Adaptation of the Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): Comparative Study among Different Types of Aphasia.

PubMed

Kamiya, Akane; Kamiya, Kentaro; Tatsumi, Hiroshi; Suzuki, Makihiko; Horiguchi, Satoshi

2015-11-01

We have developed a Japanese version of the Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39), designated as SAQOL-39-J, and used psychometric methods to examine its acceptability and reliability. The acceptability and reliability of SAQOL-39-J, which was developed from the English version using a standard translation and back-translation method, were examined in 54 aphasia patients using standard psychometric methods. The acceptability and reliability of SAQOL-39-J were then compared among patients with different types of aphasia. SAQOL-39-J showed good acceptability, internal consistency (Cronbach's α score = .90), and test-retest reliability (intraclass correlation coefficient = .97). Broca's aphasia patients showed the lowest total scores and communication scores on SAQOL-39-J. The Japanese version of SAQOL-39, SAQOL-39-J, provides acceptable and reliable data in Japanese stroke patients with aphasia. Among different types of aphasia, Broca's aphasia patients had the lowest total and communication SAQOL-39-J scores. Further studies are needed to assess the effectiveness of health care interventions on health-related quality of life in this population. Copyright © 2015 National Stroke Association. Published by Elsevier Inc. All rights reserved.
Testing reliability and validity of oral impacts on daily performances for Chinese-speaking elderly Singaporeans.

PubMed

Nair, Rahul; Tsakos, Georgios; Yee Ting Fai, Robert

2016-12-01

To cross-culturally adapt the oral impacts on daily performance (OIDP) and assess its reliability and validity on Chinese-speaking community dwelling elderly Singaporeans. There are no previous reports of valid oral health-related quality of life instruments for elderly Singaporeans or perceived conditions associated with impacts reported in OIDP among the Singaporean elders. The OIDP was translated from English to Chinese and then back translated. The OIDP questionnaire along with questions related to overall quality of life and self-rated dental health was administered to 202 Chinese-speaking elderly Singaporeans by trained interviewers, and it was repeated after 1 month. Test-retest reliability was assessed using intraclass correlation coefficient; internal consistency was established using Cronbach's alpha, and construct validity using correlation coefficients with self-reported oral health-related and global quality of life measures. In addition, Kruskal-Wallis tests assessed differences in the OIDP score between different subjective health and global quality of life groups. The median age of participants was 75 years. About 19% reported oral impacts and difficulty eating was the most prevalent oral impact. Internal consistency was good with a Cronbach's alpha of 0.75, and the intraclass correlation coefficient was 0.75 (0.67-0.81). OIDP was significantly correlated with all measures of self-reported oral health and global ratings of quality of life, with correlation coefficients ranging between 0.15 and 0.52. Groups with worse perceptions about their health and quality of life had significantly higher OIDP scores. The OIDP showed successful reliability and validity for its use among Chinese-speaking older Singaporeans. © 2015 John Wiley & Sons A/S and The Gerodontology Association. Published by John Wiley & Sons Ltd.
Reliability of the modified Paediatric Evaluation of Disability Inventory, Dutch version (PEDI-NL) for children with cerebral palsy and cerebral visual impairment.

PubMed

Salavati, M; Waninge, A; Rameckers, E A A; de Blécourt, A C E; Krijnen, W P; Steenbergen, B; van der Schans, C P

2015-02-01

The aims of this study were to adapt the Paediatric Evaluation of Disability Inventory, Dutch version (PEDI-NL) for children with cerebral visual impairment (CVI) and cerebral palsy (CP) and determine test-retest and inter-respondent reliability. The Delphi method was used to gain consensus among twenty-one health experts familiar with CVI. Test-retest and inter-respondent reliability were assessed for parents and caregivers of 75 children (aged 50-144 months) with CP and CVI. The percentage of identical item scores was computed, as well as the intraclass correlation coefficients (ICCs) and Cronbach's alphas of scale scores over the domains self-care, mobility, and social function. All experts agreed on the adaptation of the PEDI-NL for children with CVI. At the item level, the mean percentage of identical scores for test-retest reliability was 73-79 for the Functional Skills scale and 73-81 for the Caregiver Assistance scale; for inter-respondent reliability it was 21-76 for the Functional Skills scale and 40-43 for the Caregiver Assistance scale. For all scales over all domains, ICCs exceeded 0.87. For the domains self-care, mobility, and social function, the Functional Skills scale and the Caregiver Assistance scale have Cronbach's alphas above 0.88. The adapted PEDI-NL for children with CP and CVI is reliable and comparable to the original PEDI-NL. Copyright © 2014 Elsevier Ltd. All rights reserved.
Analysis of the reliability and validity of the Turkish version of the intermittent and constant osteoarthritis pain questionnaire.

PubMed

Erel, Suat; Şimşek, İbrahim Engin; Özkan, Hüseyin

2015-01-01

The aim of this study was to analyze the validity and reliability of the Turkish version (ICOAP-TR) of the intermittent and constant osteoarthritis pain (ICOAP) questionnaire in patients with knee osteoarthritis (OA). Thirty-eight volunteer patients diagnosed with knee OA answered the questionnaire twice with an interval of 2-4 days. The reliability of the measurement was assessed using Cronbach's alpha coefficient and intraclass correlation (ICC) for test-retest reliability. Criterion validity was tested against the Western Ontario and McMaster Universities Arthritis Index (WOMAC) pain score and visual analog scale (VAS) designed to assess the perceived discomfort rated by the patient. Test-retest reliability was found to be ICC=0.942 for total score, 0.902 for constant pain subscale, and 0.945 for intermittent pain subscale. Internal consistency was tested using Cronbach's alpha and was found to be 0.970 for total score, 0.948 for constant pain subscale, and 0.972 for intermittent pain subscale. For criterion validity, the correlation between the total score of ICOAP-TR and WOMAC pain subscale was r=0.779 (p<0.05), and correlation between total score of ICOAP-TR and VAS was r=0.570 (p<0.05). The ICOAP-TR is a reliable and valid instrument to be used with patients with knee OA.
Development of a Valid and Reliable Knee Articular Cartilage Condition-Specific Study Methodological Quality Score.

PubMed

Harris, Joshua D; Erickson, Brandon J; Cvetanovich, Gregory L; Abrams, Geoffrey D; McCormick, Frank M; Gupta, Anil K; Verma, Nikhil N; Bach, Bernard R; Cole, Brian J

2014-02-01

Condition-specific questionnaires are important components in evaluation of outcomes of surgical interventions. No condition-specific study methodological quality questionnaire exists for evaluation of outcomes of articular cartilage surgery in the knee. To develop a reliable and valid knee articular cartilage-specific study methodological quality questionnaire. Cross-sectional study. A stepwise, a priori-designed framework was created for development of a novel questionnaire. Relevant items to the topic were identified and extracted from a recent systematic review of 194 investigations of knee articular cartilage surgery. In addition, relevant items from existing generic study methodological quality questionnaires were identified. Items for a preliminary questionnaire were generated. Redundant and irrelevant items were eliminated, and acceptable items modified. The instrument was pretested and items weighed. The instrument, the MARK score (Methodological quality of ARticular cartilage studies of the Knee), was tested for validity (criterion validity) and reliability (inter- and intraobserver). A 19-item, 3-domain MARK score was developed. The 100-point scale score demonstrated face validity (focus group of 8 orthopaedic surgeons) and criterion validity (strong correlation to Cochrane Quality Assessment score and Modified Coleman Methodology Score). Interobserver reliability for the overall score was good (intraclass correlation coefficient [ICC], 0.842), and for all individual items of the MARK score, acceptable to perfect (ICC, 0.70-1.000). Intraobserver reliability ICC assessed over a 3-week interval was strong for 2 reviewers (≥0.90). The MARK score is a valid and reliable knee articular cartilage condition-specific study methodological quality instrument. This condition-specific questionnaire may be used to evaluate the quality of studies reporting outcomes of articular cartilage surgery in the knee.
Reliability and Construct Validity of Limits of Stability Test in Adolescents Using a Portable Forceplate System.

PubMed

Alsalaheen, Bara; Haines, Jamie; Yorke, Amy; Broglio, Steven P

2015-12-01

To examine the reliability, convergent, and discriminant validity of the limits of stability (LOS) test to assess dynamic postural stability in adolescents using a portable forceplate system. Cross-sectional reliability observational study. School setting. Adolescents (N=36) completed all measures during the first session. To examine the reliability of the LOS test, a subset of 15 participants repeated the LOS test after 1 week. Not applicable. Outcome measurements included the LOS test, Balance Error Scoring System, Instrumented Balance Error Scoring System, and Modified Clinical Test for Sensory Interaction on Balance. A significant relation was observed among LOS composite scores (r=.36-.87, P<.05). However, no relation was observed between LOS and static balance outcome measurements. The reliability of the LOS composite scores ranged from moderate to good (intraclass correlation coefficient model 2,1=.73-.96). The results suggest that the LOS composite scores provide unique information about dynamic postural stability, and the LOS test completed at 100% of the theoretical limit appeared to be a reliable test of dynamic postural stability in adolescents. Clinicians should use dynamic balance measurement as part of their balance assessment and should not use static balance testing (eg, Balance Error Scoring System) to make inferences about dynamic balance, especially when balance assessment is used to determine rehabilitation outcomes, or when making return to play decisions after injury. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Measuring stakeholder participation in evaluation: an empirical validation of the Participatory Evaluation Measurement Instrument (PEMI).

PubMed

Daigneault, Pierre-Marc; Jacob, Steve; Tremblay, Joël

2012-08-01

Stakeholder participation is an important trend in the field of program evaluation. Although a few measurement instruments have been proposed, they either have not been empirically validated or do not cover the full content of the concept. This study consists of a first empirical validation of a measurement instrument that fully covers the content of participation, namely the Participatory Evaluation Measurement Instrument (PEMI). It specifically examines (1) the intercoder reliability of scores derived by two research assistants on published evaluation cases; (2) the convergence between the scores of coders and those of key respondents (i.e., authors); and (3) the convergence between the authors' scores on the PEMI and the Evaluation Involvement Scale (EIS). A purposive sample of 40 cases drawn from the evaluation literature was used to assess reliability. One author per case in this sample was then invited to participate in a survey; 25 fully usable questionnaires were received. Stakeholder participation was measured on nominal and ordinal scales. Cohen's κ, the intraclass correlation coefficient, and Spearman's ρ were used to assess reliability and convergence. Reliability results ranged from fair to excellent. Convergence between coders' and authors' scores ranged from poor to good. Scores derived from the PEMI and the EIS were moderately associated. Evidence from this study is strong in the case of intercoder reliability and ranges from weak to strong in the case of convergent validation. Globally, this suggests that the PEMI can produce scores that are both reliable and valid.
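For the nominal-scale codes mentioned above, Cohen's κ corrects the observed agreement between two coders for the agreement expected by chance. The following is a minimal sketch of the unweighted statistic; the function name `cohens_kappa` and the example labels are hypothetical and are not taken from the PEMI study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa for two coders rating the same cases."""
    n = len(coder_a)
    # Proportion of cases on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal category frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four cases.
a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(cohens_kappa(a, b))
```

The statistic is undefined when chance agreement equals 1 (both coders use a single identical category), so a production version would need to guard against division by zero.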
Preliminary evidence for good psychometric properties of the Norwegian version of the Brief Problems Monitor (BPM).

PubMed

Richter, Jörg

2015-04-01

Methods suitable for frequent assessment of intervention progress and outcome are needed. The aim was to provide preliminary information about the psychometric properties of the Norwegian version of the Brief Problems Monitor (BPM). Three sets of statistics were calculated in large, representative data sets of Norwegian children and adolescents: Cronbach's alpha scores and intra-class correlation coefficients as indicators of internal consistency (reliability); Pearson correlation coefficients between corresponding subscales of the long and short ASEBA form versions; and multiple regression coefficients to explore the predictive power of the reduced item set for the corresponding scale scores of the long version. Cronbach's alpha scores of the Norwegian version of the BPM subscales varied between 0.67 (attention BPM-youth) and 0.88 (attention BPM-teacher) and between 0.90 (BPM-youth) and 0.96 (BPM-teacher) for its total problem score. Corresponding subscales from the long versions and the BPM as well as the total problems scores were closely correlated with coefficients of high effect size (all r > 0.80). The variance of the items of the BPM explained about three-quarters or more of the variance in the corresponding subscales of the long version. The Norwegian BPM has good psychometric properties in terms of (1) acceptable to good internal consistency and (2) regression coefficients of high effect size from the BPM items to the problem-scale scores of the long versions as validity indicators. Its use in clinical practice and research can be recommended.
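Cronbach's alpha, the internal consistency index reported above, compares the sum of the item variances with the variance of the total score. The sketch below shows the textbook formula with sample variances; the function name `cronbach_alpha` and the response matrix are assumptions for illustration and do not reproduce the BPM data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items score matrix."""
    k = len(responses[0])                          # number of items
    items = list(zip(*responses))                  # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 4 respondents x 2 items.
scores = [
    [1, 2],
    [2, 1],
    [3, 3],
    [4, 5],
]
print(round(cronbach_alpha(scores), 3))
```

Alpha rises with the number of items as well as with their intercorrelation, which is one reason short subscales tend to show lower values than total scores.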
Reliability of Neurobehavioral Assessments from Birth to Term Equivalent Age in Preterm and Term Born Infants.

PubMed

Eeles, Abbey L; Olsen, Joy E; Walsh, Jennifer M; McInnes, Emma K; Molesworth, Charlotte M L; Cheong, Jeanie L Y; Doyle, Lex W; Spittle, Alicia J

2017-02-01

Neurobehavioral assessments provide insight into the functional integrity of the developing brain and help guide early intervention for preterm (<37 weeks' gestation) infants. In the context of shorter hospital stays, clinicians often need to assess preterm infants prior to term equivalent age. Few neurobehavioral assessments used in the preterm period have established interrater reliability. To evaluate the interrater reliability of the Hammersmith Neonatal Neurological Examination (HNNE) and the NICU Network Neurobehavioral Scale (NNNS), when used both preterm and at term (>36 weeks). Thirty-five preterm infants and 11 term controls were recruited. Five assessors double-scored the HNNE and NNNS administered either preterm or at term. A one-way random effects, absolute, single-measures intraclass correlation coefficient (ICC) was calculated to determine interrater reliability. Interrater reliability for the HNNE was excellent (ICC > 0.74) for optimality scores, and good (ICC 0.60-0.74) to excellent for subtotal scores, except for 'Tone Patterns' (ICC 0.54). On the NNNS, interrater reliability was predominantly excellent for all items. Interrater agreement was generally excellent at both time points. Overall, the HNNE and NNNS neurobehavioral assessments demonstrated mostly excellent interrater reliability when used prior to term and at term.
Preliminary testing of the reliability and feasibility of SAGE: a system to measure and score engagement with and use of research in health policies and programs.

PubMed

Makkar, Steve R; Williamson, Anna; D'Este, Catherine; Redman, Sally

2017-12-19

Few measures of research use in health policymaking are available, and the reliability of such measures has yet to be evaluated. A new measure called the Staff Assessment of Engagement with Evidence (SAGE) incorporates an interview that explores policymakers' research use within discrete policy documents and a scoring tool that quantifies the extent of policymakers' research use based on the interview transcript and analysis of the policy document itself. We aimed to conduct a preliminary investigation of the usability, sensitivity, and reliability of the scoring tool in measuring research use by policymakers. Nine experts in health policy research and two independent coders were recruited. Each expert used the scoring tool to rate a random selection of 20 interview transcripts, and each independent coder rated 60 transcripts. The distribution of scores among experts was examined, and then interrater reliability was tested within and between the experts and independent coders. Average- and single-measure reliability coefficients were computed for each SAGE subscale. Experts' scores ranged from the limited to extensive scoring bracket for all subscales. Experts as a group also exhibited at least a fair level of interrater agreement across all subscales. Single-measure reliability was at least fair except for three subscales: Relevance Appraisal, Conceptual Use, and Instrumental Use. Average- and single-measure reliability among independent coders was good to excellent for all subscales. Finally, reliability between experts and independent coders was fair to excellent for all subscales. Among experts, the scoring tool was comprehensible, usable, and sensitive enough to discriminate between documents with varying degrees of research use. Secondly, the scoring tool yielded scores with good reliability among the independent coders. There was greater variability among experts, although as a group, the tool was fairly reliable. The alignment between experts' and independent coders' ratings indicates that the independent coders were scoring in a manner comparable to health policy research experts. If the present findings are replicated in a larger sample, end users (e.g. policy agency staff) could potentially be trained to use SAGE to reliably score research use within their agencies, which would provide a cost-effective and time-efficient approach to utilising this measure in practice.
Music therapy career aptitude and generalized self-efficacy in music therapy students.

PubMed

Lim, Hayoung A; Befi, Cathy M

2014-01-01

While the Music Therapy Career Aptitude Test (MTCAT) provides a measure of student aptitude, measures of perceived self-efficacy may provide additional information about a student's suitability for a music therapy career. As a first step in determining whether future studies examining combined scores from the MTCAT and the Generalized Self-Efficacy (GSE) scale would be useful to help predict academic success in music therapy, we explored the internal reliability of these two measures in a sample of undergraduate students, and the relationship (concurrent validity) of the measures to one another. Eighty undergraduate music therapy students (14 male; 66 female) completed the MTCAT and GSE. To determine internal reliability we conducted tests of normality and calculated Cronbach's Coefficient Alpha for each measure. Pearson correlation coefficients were calculated to ascertain the strength of the relationship between the MTCAT and GSE. MTCAT scores were normally distributed and had high internal consistency (Cronbach's α = 0.706). GSE scores were not normally distributed, but had high internal consistency (Cronbach's α = 0.748). The correlation coefficient analysis revealed that MTCAT and GSE scores were moderately correlated (r = 0.426, p < 0.0001). MTCAT scores can be used to partially determine perceived self-efficacy in undergraduate music therapy students; however, a more complete picture of student suitability for music therapy may be determined by administering the GSE alongside the MTCAT. Future studies are needed to determine whether combined MTCAT and GSE scores can be used to predict student success in an undergraduate music therapy program. © the American Music Therapy Association 2014. All rights reserved.
Validity and reliability of the Fels physical activity questionnaire for children.

PubMed

Treuth, Margarita S; Hou, Ningqi; Young, Deborah R; Maynard, L Michele

2005-03-01

The aim was to evaluate the reliability and validity of the Fels physical activity questionnaire (PAQ) for children 7-19 yr of age. A cross-sectional study was conducted among 130 girls and 99 boys in elementary (N=70), middle (N=81), and high (N=78) schools in rural Maryland. Weight and height were measured on the initial school visit. All the children then wore an Actiwatch accelerometer for 6 d. The Fels PAQ for children was given on two separate occasions to evaluate reliability and was compared with accelerometry data to evaluate validity. Reliability of the Fels PAQ across girls, boys, and the elementary, middle, and high school age groups ranged from r=0.48 to r=0.76. For the elementary school children, the correlation coefficient examining validity between the Fels PAQ total score and Actiwatch (counts per minute) was 0.34 (P=0.004). The correlation coefficients were lower in middle school (r=0.11, P=0.31) and high school (r=0.21, P=0.006) adolescents. The sport index of the Fels PAQ for children had the highest validity in the high school participants (r=0.34, P=0.002). The Fels PAQ for children is moderately reliable for all age groups of children. Validity of the Fels PAQ for children is acceptable for elementary and high school students when the total activity score or the sport index is used. The sport index was similar to the total score for elementary students but was a better measure of physical activity among high school students.
A Psychometric Study of the Bayley Scales of Infant and Toddler Development in Persian Language Children

PubMed Central

AZARI, Nadia; SOLEIMANI, Farin; VAMEGHI, Roshanak; SAJEDI, Firoozeh; SHAHSHAHANI, Soheila; KARIMI, Hossein; KRASKIAN, Adis; SHAHROKHI, Amin; TEYMOURI, Robab; GHARIB, Masoud

2017-01-01

Objective: The Bayley Scales of Infant and Toddler Development is a well-known diagnostic developmental assessment tool for children aged 1–42 months. Our aim was to investigate the validity and reliability of this scale in Persian-speaking children. Materials & Methods: The method was descriptive-analytic. Translation, back-translation, and cultural adaptation were carried out. Content and face validity of the translated scale were determined by experts' opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers of Tehran during 2013-2014 for developmental assessment in cognitive, communicative (receptive and expressive), and motor (fine and gross) domains. Reliability of the scale was calculated through three methods: internal consistency using Cronbach's alpha coefficient, test-retest, and interrater methods. Construct validity was calculated using factor analysis and comparison of mean scores. Results: Cultural and linguistic changes were made in items of all domains, especially in the communication subscale. Content and face validity of the test were approved by experts' opinions. Cronbach's alpha coefficient was above 0.74 in all domains. Pearson correlation coefficients in the various domains were ≥ 0.982 for the test-retest method and ≥ 0.993 for the inter-rater method. Construct validity of the test was approved by factor analysis. Moreover, the mean scores for the different age groups were compared, and statistically significant differences were observed between them, which confirms the validity of the test. Conclusion: The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian language children. PMID:28277556
Validity and reliability of the Turkish version of the European Health Literacy Survey Questionnaire.

PubMed

Abacigil, Filiz; Harlak, Hacer; Okyay, Pinar; Kiraz, Didem Evci; Gursoy Turan, Selen; Saruhan, Gulnur; Karakaya, Kagan; Tuzun, Hakan; Baran Deniz, Emine; Tontus, Omer; Beser, Erdal

2018-04-10

Health literacy is a public health priority which refers to an individual's knowledge, motivation and competence to access, understand, appraise and apply health information to prevent disease and promote health in daily life. This study aimed to adapt the European Health Literacy Survey Questionnaire (HLS-EU-Q47) into Turkish and to investigate its psychometric properties. The questionnaire was translated into Turkish by using both group translation and expert opinion methods. The forward translation-back translation method was used for language validity and the final Turkish version (HLS-TR) was formed. HLS-EU-Q47 and the Health Awareness Scale (HAS) were administered to 505 respondents. The scale reliability was examined using Cronbach's alpha coefficient and the construct validity was assessed by a principal axis factoring procedure. The convergent validity was obtained by Pearson correlation coefficients between HLS-TR and HAS scores, and discriminant validity was examined by comparing the scores of participants who were stratified according to age, educational status, gender, general health status and social status. Cronbach's alpha coefficient for the whole scale was 0.95. Principal axis factoring extracted nine factors whose eigenvalues were >1 and which explained 50.01% of total variance. The factor matrix showed that all items loaded most heavily on factor 1, indicating that health literacy was measured by one factor. A positive and significant correlation was found between HLS-TR and HAS. Significant relations were found between HLS-TR scores and selected determinants of health. This study revealed that the HLS-TR was a valid and reliable measuring instrument with appropriate psychometric characteristics.
Characterization of Severe Arterial Phase Respiratory Motion Artifact on Gadoxetate Disodium-Enhanced MRI - Assessment of Interrater Agreement and Reliability.

PubMed

Ringe, Kristina Imeen; Luetkens, Julian A; Fimmers, Rolf; Hammerstingl, Renate Maria; Layer, Günter; Maurer, Martin H; Nähle, Claas Philip; Michalik, Sabine; Reimer, Peter; Schraml, Christina; Schreyer, Andreas G; Stumpp, Patrick; Vogl, Thomas J; Wacker, Frank K; Willinek, Winfried; Kukuk, Guido Mattias

2018-04-01

To assess the interrater agreement and reliability of experienced abdominal radiologists in the characterization and grading of arterial phase gadoxetate disodium-related respiratory motion artifact on liver MRI. This prospective multicenter study was initiated by the working group for abdominal imaging within the German Roentgen Society (DRG), and approved by the local IRB of each participating center. 11 board-certified radiologists independently reviewed 40 gadoxetate disodium-enhanced liver MRI datasets. Motion artifacts in the arterial phase were assessed on a 5-point scale. Interrater agreement and reliability were calculated using the intraclass correlation coefficient (ICC) and Kendall coefficient of concordance (W), with p < 0.05 deemed significant. The ICC for interrater agreement and reliability were 0.983 (CI 0.973 - 0.990) and 0.985 (CI 0.978 - 0.991), respectively (both p < 0.0001), indicating excellent agreement and reliability. Kendall's W for interrater agreement was 0.865. A severe motion artifact, defined as a mean motion score ≥ 4 in the arterial phase, was observed in 12 patients. In these specific cases, a motion score ≥ 4 was assigned by all readers in 75 % (n = 9/12 cases). Differentiation and grading of arterial phase respiratory motion artifact is possible with a high level of inter-/intrarater agreement and interrater reliability, which is crucial for assessing the incidence of this phenomenon in larger multicenter studies. Key points: Inter- and intrarater agreement for motion artifact scoring is excellent among experienced readers. Interrater reliability for motion artifact scoring is excellent among experienced readers. Characterization of severe motion artifacts proved feasible in this multicenter study. Citation: Ringe KI, Luetkens JA, Fimmers R et al. Characterization of Severe Arterial Phase Respiratory Motion Artifact on Gadoxetate Disodium-Enhanced MRI - Assessment of Interrater Agreement and Reliability. Fortschr Röntgenstr 2017; 190: 341-347. © Georg Thieme Verlag KG Stuttgart · New York.
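Kendall's coefficient of concordance (W), used above alongside the ICC, measures how similarly several readers rank the same cases. The sketch below is one way to compute it, with average ranks and the usual tie correction, which matters for a 5-point scale where ties are frequent. The function names and the example scores are assumptions for illustration; the study's own software and data are not known from the abstract.

```python
def average_ranks(scores):
    """Return average ranks for one rater plus that rater's tie term sum(t^3 - t)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of cases that share the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def kendalls_w(ratings_by_rater):
    """Kendall's W with tie correction; one list of case scores per rater."""
    m = len(ratings_by_rater)          # raters
    n = len(ratings_by_rater[0])       # cases
    rank_sums = [0.0] * n
    ties = 0
    for scores in ratings_by_rater:
        ranks, tie_term = average_ranks(scores)
        ties += tie_term
        for i, r in enumerate(ranks):
            rank_sums[i] += r
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Hypothetical motion scores: 3 readers x 4 cases in full agreement.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
```

W is undefined if every reader gives all cases the same score, because the denominator becomes zero; a fuller implementation would handle that case explicitly.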

Validity and reliability of the Greek version of the xerostomia questionnaire in head and neck cancer patients.

PubMed

Memtsa, Pinelopi Theopisti; Tolia, Maria; Tzitzikas, Ioannis; Bizakis, Ioannis; Pistevou-Gombaki, Kyriaki; Charalambidou, Martha; Iliopoulou, Chrysoula; Kyrgias, George

2017-03-01

Xerostomia after radiation therapy for head and neck (H&N) cancer has serious effects on patients' quality of life. The purpose of this study was to validate the Greek version of the self-reported eight-item xerostomia questionnaire (XQ) in patients treated with radiotherapy for H&N cancer. The XQ was translated into Greek and administered to 100 XQ patients. An exploratory factor analysis was performed. Reliability measures were calculated. Several types of validity were evaluated. The observer-rated scoring system was also used. The mean XQ value was 41.92 (SD 22.71). Factor analysis revealed the unidimensional nature of the questionnaire. High reliability measures (ICC, Cronbach's α, Pearson coefficients) were obtained. Patients differed statistically significantly in terms of XQ score, depending on the RTOG/EORTC classification. The Greek version of XQ is valid and reliable. Its score is well related to observer's findings and it can be used to evaluate the impact of radiation therapy on the subjective feeling of xerostomia.
Simplified Radiographic Damage Index for Affected Joints in Chronic Gouty Arthritis

PubMed Central

2016-01-01

The aim of this study was to develop and validate a new radiographic damage scoring method (DAmagE index of GoUt; DAEGU) in chronic gout using plain radiography. Two independent observers scored foot x-rays from 15 patients with chronic gout according to the DAEGU method and the modified Sharp/van der Heijde (SvdH) method. The 10 metatarsophalangeal (MTP) and 2 interphalangeal (IP) joints of the first toes of both feet were scored to assess the degrees of erosion and joint space narrowing (JSN). The intraobserver and interobserver reliabilities were analyzed by calculating the intraclass correlation coefficient (ICC) and minimal detectable change (MDC). The correlation between the DAEGU and SvdH methods was analyzed by calculating Spearman's rho correlation coefficients and Kappa coefficients. The DAEGU method was found to be highly reproducible (0.945–0.987 for the intraobserver and 0.993–0.996 for the interobserver ICC values). The erosion, JSN, and total scores exhibited strong positive correlations between the DAEGU and SvdH methods and also within each method (r = 0.860–0.969, P < 0.001 for all parameters). The DAEGU and SvdH methods were in very good agreement as determined by Kappa coefficient analysis [0.732 (0.387–1.000) for erosion and 1.000 (1.000–1.000) for JSN]. In conclusion, this study revealed that the DAEGU method is a reliable and feasible tool for the assessment of radiographic damage in chronic gout. The DAEGU method may provide an easier assessment of structural damage in chronic gout in real clinical practice. PMID:26955246
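The minimal detectable change (MDC) mentioned above is commonly derived from the ICC through the standard error of measurement: SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The abstract does not say which formula the authors used or report the score standard deviation, so the sketch below is an assumption about the conventional approach, and the input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and a reliability ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimal detectable change at the 95% confidence level."""
    # sqrt(2) accounts for measurement error in both of the two scores compared.
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical inputs: score SD of 10 points and an ICC of 0.75.
print(round(mdc95(sd=10.0, icc=0.75), 2))
```

With a fixed SD, a higher ICC shrinks the MDC, which is why highly reproducible scores can detect smaller real changes.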
The cross-cultural adaptation, reliability, and validity of the Copenhagen Neck Functional Disability Scale in patients with chronic neck pain: Turkish version study.

PubMed

Yapali, Gökmen; Günel, Mintaze Kerem; Karahan, Sevilay

2012-05-15

The study design was cross-cultural adaptation and investigation of reliability and validity of the Copenhagen Neck Functional Disability Scale (CNFDS). The aim of this study was to translate the CNFDS into Turkish and assess its reliability and validity among patients with neck pain in the Turkish population. The CNFDS is a reliable and valid evaluation instrument for disability, but there is no published Turkish version of the CNFDS. One hundred one subjects who had chronic neck pain were included in this study. The CNFDS, Neck Pain and Disability Scale, and visual analogue scale were administered to all subjects. To investigate test-retest reliability, the CNFDS was applied twice at a 1-week interval; the intraclass correlation coefficient for test-retest reliability was 0.86 (95% confidence interval = 0.679-0.935). There was no difference between test-retest scores (P < 0.001). To investigate concurrent validity, the total score of the CNFDS was correlated with the mean visual analogue scale score, giving r = 0.73 (P < 0.001). Concurrent validity of the CNFDS was very good. To investigate construct validity, the total score of the CNFDS was correlated with the Neck Pain and Disability Scale, giving r = 0.78 (P < 0.001). Construct validity of the CNFDS was also very good. Our results suggest that the Turkish version of the CNFDS is a reliable and valid instrument for Turkish people.
Evaluating Written Patient Information for Eczema in German: Comparing the Reliability of Two Instruments, DISCERN and EQIP

PubMed Central

McCool, Megan E.; Wahl, Josepha; Schlecht, Inga; Apfelbacher, Christian

2015-01-01

Patients actively seek information about how to cope with their health problems, but the quality of the information available varies. A number of instruments have been developed to assess the quality of patient information, primarily though in English. Little is known about the reliability of these instruments when applied to patient information in German. The objective of our study was to investigate and compare the reliability of two validated instruments, DISCERN and EQIP, in order to determine which of these instruments is better suited for a further study pertaining to the quality of information available to German patients with eczema. Two independent raters evaluated a random sample of 20 informational brochures in German. All the brochures addressed eczema as a disorder and/or therapy options and care. Intra-rater and inter-rater reliability were assessed by calculating intra-class correlation coefficients, agreement was tested with weighted kappas, and the correlation of the raters’ scores for each instrument was measured with Pearson’s correlation coefficient. DISCERN demonstrated substantial intra- and inter-rater reliability. It also showed slightly better agreement than EQIP. There was a strong correlation of the raters’ scores for both instruments. The findings of this study support the reliability of both DISCERN and EQIP. However, based on the results of the inter-rater reliability, agreement and correlation analyses, we consider DISCERN to be the more precise tool for our project on patient information concerning the treatment and care of eczema. PMID:26440612
Rating scale for psychogenic nonepileptic seizures: scale development and clinimetric testing.

PubMed

Cianci, Vittoria; Ferlazzo, Edoardo; Condino, Francesca; Mauvais, Hélène Somma; Farnarier, Guy; Labate, Angelo; Latella, Maria Adele; Gasparini, Sara; Branca, Damiano; Pucci, Franco; Vazzana, Francesco; Gambardella, Antonio; Aguglia, Umberto

2011-06-01

Our aim was to develop a clinimetric scale evaluating motor phenomena, associated features, and severity of psychogenic nonepileptic seizures (PNES). Sixty video/EEG-recorded PNES induced by suggestion maneuvers were evaluated. We examined the relationship between results from this scale and results from the Clinical Global Impression (CGI) scale to validate this technique. Interrater reliabilities of the PNES scale for three raters were analyzed using the AC1 statistic, Kendall's coefficient of concordance (KCC), and intraclass correlation coefficients (ICCs). The relationship between the CGI and PNES scales was evaluated with Spearman correlations. The AC1 statistic demonstrated good interrater reliability for each phenomenon analyzed (tremor/oscillation, tonic; clonic/jerking, hypermotor/agitation, atonic/akinetic, automatisms, associated features). KCC and the ICC showed moderate interrater agreement for phenomenology, associated phenomena, and total PNES scores. Spearman's correlation of mean CGI score with mean total PNES score was 0.69 (P<0.001). The scale described here accurately evaluates the phenomenology of PNES and could be used to assess and compare subgroups of patients with PNES. Copyright © 2011 Elsevier Inc. All rights reserved.
The Persian developmental sentence scoring as a clinical measure of morphosyntax in children

PubMed Central

Jalilevand, Nahid; Kamali, Mohammad; Modarresi, Yahya; Kazemi, Yalda

2016-01-01

Background: Developmental Sentence Scoring (DSS) was developed as a numerical measurement and a clinical method based on morphosyntactic acquisition in the English language. The aim of this study was to develop a new numerical tool similar to DSS to assess morphosyntactic abilities in Persian-speaking children. Methods: In this cross-sectional and comparative study, the language samples of 115 typically developing Persian-speaking children aged 30-65 months were audio recorded during free play and picture description sessions. The Persian Developmental Sentence Score (PDSS) and the Mean Length of Utterance (MLU) were calculated. Pearson correlation and one-way analysis of variance (ANOVA) were used for data analysis. Results: The correlation between PDSS and MLU in morphemes (convergent validity) was significant, with a correlation coefficient of 0.97 (p < 0.001). Cronbach's alpha (α = 0.79) in the grammatical categories and the split-half coefficient (0.86) indicated acceptable internal consistency reliability. Conclusion: The PDSS could be used as a reliable numerical measurement to estimate syntactic development in Persian-speaking children. PMID:28210600
How reliable are Functional Movement Screening scores? A systematic review of rater reliability.

PubMed

Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John

2016-05-01

Several physical assessment protocols to identify intrinsic risk factors for injury aetiology related to movement quality have been described. The Functional Movement Screen (FMS) is a standardised, field-expedient test battery intended to assess movement quality and has been used clinically in preparticipation screening and in sports injury research. To critically appraise and summarise research investigating the reliability of scores obtained using the FMS battery. Systematic literature review. Systematic search of Google Scholar, Scopus (including ScienceDirect and PubMed), EBSCO (including Academic Search Complete, AMED, CINAHL, Health Source: Nursing/Academic Edition), MEDLINE and SPORTDiscus. Studies meeting eligibility criteria were assessed by 2 reviewers for risk of bias using the Quality Appraisal of Reliability Studies checklist. Overall quality of evidence was determined using van Tulder's levels of evidence approach. 12 studies were appraised. Overall, there was a 'moderate' level of evidence in favour of 'acceptable' (intraclass correlation coefficient ≥0.6) inter-rater and intra-rater reliability for composite scores derived from live scoring. For inter-rater reliability of composite scores derived from video recordings there was 'conflicting' evidence, and 'limited' evidence for intra-rater reliability. For inter-rater reliability based on live scoring of individual subtests there was 'moderate' evidence of 'acceptable' reliability (κ≥0.4) for 4 subtests (Deep Squat, Shoulder Mobility, Active Straight-leg Raise, Trunk Stability Push-up) and 'conflicting' evidence for the remaining 3 (Hurdle Step, In-line Lunge, Rotary Stability). This review found 'moderate' evidence that raters can achieve acceptable levels of inter-rater and intra-rater reliability of composite FMS scores when using live ratings. 
Overall, there were few high-quality studies, and the quality of several studies was impacted by poor study reporting, particularly in relation to rater blinding. Published by the BMJ Publishing Group Limited.
German version, inter- and intrarater reliability and internal consistency of the "Agitated Behavior Scale" (ABS-G) in patients with moderate to severe traumatic brain injury.

PubMed

Hellweg, Stephanie; Schuster-Amft, Corina

2016-07-19

Agitation is frequently observed during early recovery after traumatic brain injury (TBI). Agitated behaviour often interferes with a goal-orientated rehabilitation and can be a substantial hindrance to therapy. Despite the relatively high occurrence of agitation in the TBI population, there is no objective assessment available in German (G). An existing scale with excellent psychometric properties is the "Agitated Behavior Scale (ABS)" developed by Corrigan in 1989. The aim of the study was to translate the Agitated Behavior Scale (ABS) into German (ABS-G) and investigate the inter- and intrarater reliability and internal consistency in patients with moderate to severe TBI. A formal nine-step translation and cross-cultural adaptation procedure (TCCA) was applied. Subsequently a prospective observational patient study was conducted. To examine the interrater reliability and internal consistency, two therapists rated 20 patients independently after a therapy session. This procedure was repeated twice on a weekly basis. The intrarater reliability was assessed through video recordings from three patients. Nine raters scored the demonstrated behaviour on the videotape with the ABS-G independently twice within one month. The inter- and intrarater reliability were evaluated with the Spearman rank correlation coefficient and the quadratic weighted kappa. The internal consistency was tested with Cronbach's alpha. Behaviour of 20 patients (18 males; mean age 41 ± 20.7; mean Functional Independence Measure (FIM) cognitive score on admission 7.1 ± 4.04; mean ABS-G score at first observation 17.3 ± 2.83) was assessed threefold. Interrater reliability yielded a correlation coefficient for ABS-G total score of all 60 paired observations of rs = 0.845 and a weighted Kappa of 0.738. Intrarater reliability for ABS-G total score ranged between rs = 0.719 and 0.953 and showed a weighted Kappa between 0.871 and 0.953. Cronbach's alpha indicated moderate internal consistency with 0.661. 
This study demonstrates that the ABS-G is a reliable instrument for evaluating agitation in patients with moderate to severe TBI. Hereby it would be possible to monitor agitation objectively and optimise the management of agitated patients according to international recommendations.
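The quadratic weighted kappa named in this abstract can be illustrated with a minimal Python sketch. It assumes two raters assigning ordinal categories to the same observations; the function name `quadratic_weighted_kappa`, its parameters, and the sample ratings are illustrative assumptions, not the study's data or software.

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with quadratic disagreement weights for two raters.

    `categories` lists the ordinal categories from lowest to highest.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = ((i - j) / (k - 1)) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Hypothetical ratings on a 4-level ordinal scale.
    a = [1, 2, 2, 3, 4, 4, 1, 3]
    b = [1, 2, 3, 3, 4, 3, 1, 2]
    print(round(quadratic_weighted_kappa(a, b, [1, 2, 3, 4]), 3))
```

Because the weights grow with the squared distance between categories, near-misses are penalised far less than large disagreements, which is why this variant is often chosen for ordinal scales.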
A Note on Some Characteristics and Correlates of the Meier Art Test of Aesthetic Perception.

ERIC Educational Resources Information Center

Stallings, William M.; Anderson, Frances E.

The reliability and the predictive and concurrent validity of the MATAP were investigated with the implicit goal of improving the prediction of course grades in the College of Fine and Applied Arts. It was found that reliability and validity coefficients were low, and it was suggested that the scoring system was a source of error variance. (MS)
Reliability of scoring arousals in normal children and children with obstructive sleep apnea syndrome.

PubMed

Wong, Tat Kong; Galster, Patricia; Lau, Tai Shing; Lutz, Janita M; Marcus, Carole L

2004-09-15

Scoring of arousals in children is based on an extension of adult criteria, as defined by the American Sleep Disorders Association (ASDA). Under these criteria, a minimum duration of 3 seconds is required. A few recent studies utilized modified criteria for the study of children, with durations as short as 1 second. However, the validity and reliability of scoring these shorter arousals have never been verified. Based on studies in adults, we hypothesized that interscorer agreement for scoring arousals shorter than 3 seconds was poor. Retrospective review of polysomnograms by 2 experienced sleep practitioners who independently scored arousals according to the ASDA 3-second criteria and modified duration criteria of 1 and 2 seconds. Academic hospital. 20 polysomnographic studies from children aged 3 to 8 years with mild to severe obstructive sleep apnea syndrome, and 16 polysomnographic studies from normal children. None. The intraclass correlation coefficient for scoring ASDA arousals was 0.90 (95% confidence interval: 0.81-0.95), indicating excellent interscorer agreement. The intraclass correlation coefficients for scoring modified 1-second and 2-second arousals were 0.35 (95% confidence interval: 0.02-0.61) and 0.42 (95% confidence interval: 0.12-0.65) respectively, indicating poor to fair interscorer agreement. Furthermore, modified 1-second and 2-second arousals accounted for less than 15% of all arousals scored. We conclude that there is much poorer interscorer agreement for scoring arousals shorter than 3 seconds, when compared to the standard ASDA criteria. We propose that scoring of arousals in children should follow the standard ASDA criteria.
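As a rough illustration of how an interscorer intraclass correlation coefficient is obtained, the sketch below computes one common form, ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single rater). The abstract does not say which ICC form the authors used, so this choice, the function name `icc_2_1`, and the sample counts are all assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject, each holding one score
    per rater (same raters, same order, no missing values).
    """
    n = len(ratings)          # subjects
    k = len(ratings[0])       # raters
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical arousal counts per study from two scorers.
    counts = [[12, 14], [30, 28], [21, 22], [8, 11], [17, 17]]
    print(round(icc_2_1(counts), 3))
```

Under this form, a constant offset between scorers lowers the coefficient even when their rankings agree perfectly, because absolute agreement rather than consistency is being measured.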
Cross-cultural adaptation and validation of the Persian version of the Intermittent and Constant Osteoarthritis Pain Measure for the knee.

PubMed

Panah, Sara Hojat; Baharlouie, Hamze; Rezaeian, Zahra Sadat; Hawker, Gilian

2016-01-01

The present study aimed to translate and evaluate the reliability and validity of the Persian version of the 11-item Intermittent and Constant Osteoarthritis Pain (ICOAP) measure in Iranian subjects with Knee Osteoarthritis (KOA). The ICOAP questionnaire was translated according to the Manufacturers Alliance for Productivity and Innovation (MAPI) protocol. The procedure consisted of forward and backward translation, as well as the assessment of the psychometric properties of the Persian version of the questionnaire. A sample of 230 subjects with KOA was asked to complete the Persian versions of ICOAP and Knee injury and Osteoarthritis Outcome Score (KOOS). The ICOAP was readministered to forty subjects five days after the first visit. Test-retest reliability was assessed using Intraclass Correlation Coefficient (ICC), and internal consistency was assessed by Cronbach's alpha and item-total correlation. The correlation between ICOAP and KOOS was determined using Spearman's correlation coefficient. Subjects found the Persian-version of the ICOAP to be clear, simple, and unambiguous, confirming its face validity. Spearman correlations between ICOAP total and subscale scores with KOOS scores were between 0.5 and 0.7, confirming construct validity. Cronbach's alpha, used to assess internal consistency, was 0.89, 0.93, and 0.92 for constant pain, intermittent pain, and total pain scores, respectively. The ICC was 0.90 for constant pain and 0.91 for the intermittent pain and total pain score. The Persian version of the ICOAP is a reliable and valid outcome measure that can be used in Iranian subjects with KOA.
Web-Enabled Mechanistic Case Diagramming: A Novel Tool for Assessing Students' Ability to Integrate Foundational and Clinical Sciences.

PubMed

Ferguson, Kristi J; Kreiter, Clarence D; Haugen, Thomas H; Dee, Fred R

2018-02-20

As medical schools move from discipline-based courses to more integrated approaches, identifying assessment tools that parallel this change is an important goal. The authors describe the use of test item statistics to assess the reliability and validity of web-enabled mechanistic case diagrams (MCDs) as a potential tool to assess students' ability to integrate basic science and clinical information. Students review a narrative clinical case and construct an MCD using items provided by the case author. Students identify the relationships among underlying risk factors, etiology, pathogenesis and pathophysiology, and the patients' signs and symptoms. They receive one point for each correctly-identified link. In 2014-15 and 2015-16, case diagrams were implemented in consecutive classes of 150 medical students. The alpha reliability coefficient for the overall score, constructed using each student's mean proportion correct across all cases, was 0.82. Discrimination indices for each of the case scores with the overall score ranged from 0.23 to 0.51. In a G study using those students with complete data (n = 251) on all 16 cases, 10% of the variance was true score variance, and systematic case variance was large. Using 16 cases generated a G coefficient (relative score reliability) equal to .72 and a Phi equal to .65. The next phase of the project will involve deploying MCDs in higher-stakes settings to determine whether similar results can be achieved. Further analyses will determine whether these assessments correlate with other measures of higher-order thinking skills.
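The G coefficient and Phi reported in this abstract come from a generalizability (G) study. A minimal sketch of the usual decision-study formulas for a persons-by-cases design follows. The variance components used are hypothetical values chosen only so that person variance is 10% of the total; they are not the study's estimates, and the function name `g_and_phi` is an assumption.

```python
def g_and_phi(var_person, var_case, var_residual, n_cases):
    """Relative (G) and absolute (Phi) coefficients for a p x c design.

    var_person   : true-score (person) variance component
    var_case     : systematic case (difficulty) variance component
    var_residual : person-by-case interaction confounded with error
    n_cases      : number of cases averaged over in the decision study
    """
    if n_cases < 1:
        raise ValueError("n_cases must be at least 1")
    relative_error = var_residual / n_cases
    absolute_error = (var_case + var_residual) / n_cases
    g = var_person / (var_person + relative_error)
    phi = var_person / (var_person + absolute_error)
    return g, phi


if __name__ == "__main__":
    # Hypothetical components summing to 1.0 (person variance = 10%).
    for n in (8, 16, 32):
        g, phi = g_and_phi(0.10, 0.25, 0.65, n)
        print(n, round(g, 2), round(phi, 2))
```

Phi is never larger than G because case difficulty counts as error only when scores are interpreted in absolute terms, which matches the ordering of the two coefficients in the abstract.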
Improving the residency admissions process by integrating a professionalism assessment: a validity and feasibility study.

PubMed

Bajwa, Nadia M; Yudkowsky, Rachel; Belli, Dominique; Vu, Nu Viet; Park, Yoon Soo

2017-03-01

The purpose of this study was to provide validity and feasibility evidence in measuring professionalism using the Professionalism Mini-Evaluation Exercise (P-MEX) scores as part of a residency admissions process. In 2012 and 2013, three standardized-patient-based P-MEX encounters were administered to applicants invited for an interview at the University of Geneva Pediatrics Residency Program. Validity evidence was gathered for P-MEX content (item analysis); response process (qualitative feedback); internal structure (inter-rater reliability with intraclass correlation and Generalizability); relations to other variables (correlations); and consequences (logistic regression to predict admission). To improve reliability, Kane's formula was used to create an applicant composite score using P-MEX, structured letter of recommendation (SLR), and structured interview (SI) scores. Applicant rank lists using composite scores versus faculty global ratings were compared using the Wilcoxon signed-rank test. Seventy applicants were assessed. Moderate associations were found between pairwise correlations of P-MEX scores and SLR (r = 0.25, P = .036), SI (r = 0.34, P = .004), and global ratings (r = 0.48, P < .001). Generalizability of the P-MEX using three cases was moderate (G-coefficient = 0.45). P-MEX scores had the greatest correlation with acceptance (r = 0.56, P < .001), were the strongest predictor of acceptance (OR 4.37, P < .001), and increased pseudo R-squared by 0.20 points. Including P-MEX scores increased composite score reliability from 0.51 to 0.74. Rank lists of applicants using composite score versus global rating differed significantly (z = 5.41, P < .001). Validity evidence supports the use of P-MEX scores to improve the reliability of the residency admissions process by improving applicant composite score reliability.
Clinical use of the ABO-Scoring Index: reliability and subtraction frequency.

PubMed

Lieber, William S; Carlson, Sean K; Baumrind, Sheldon; Poulton, Donald R

2003-10-01

This study tested the reliability and subtraction frequency of the study model-scoring system of the American Board of Orthodontists (ABO). We used a sample of 36 posttreatment study models that were selected randomly from six different orthodontic offices. Intrajudge and interjudge reliability was calculated using nonparametric statistics (Spearman rank coefficient, Wilcoxon, Kruskal-Wallis, and Mann-Whitney tests). We found differences ranging from 3 to 6 subtraction points (total score) for intrajudge scoring between two sessions. For overall total ABO score, the average correlation was .77. Intrajudge correlation was greatest for occlusal relationships and least for interproximal contacts. Interjudge correlation for ABO score averaged r = .85. Correlation was greatest for buccolingual inclination and least for overjet. The data show that some judges, on average, were much more lenient than others and that this resulted in a range of total scores between 19.7 and 27.5. Most of the deductions were found in the buccal segments and most were related to the second molars. We present these findings in the context of clinicians preparing for the ABO phase III examination and for orthodontists in their ongoing evaluation of clinical results.
A 2-year study of Gram stain competency assessment in 40 clinical laboratories.

PubMed

Goodyear, Nancy; Kim, Sara; Reeves, Mary; Astion, Michael L

2006-01-01

We used a computer-based competency assessment tool for Gram stain interpretation to assess the performance of 278 laboratory staff from 40 laboratories on 40 multiple-choice questions. We report test reliability, mean scores, median, item difficulty, discrimination, and analysis of the highest- and lowest-scoring questions. The questions were reliable (KR-20 coefficient, 0.80). Overall mean score was 88% (range, 63%-98%). When categorized by cell type, the means were host cells, 93%; other cells (eg, yeast), 92%; gram-positive, 90%; and gram-negative, 88%. When categorized by type of interpretation, the means were other (eg, underdecolorization), 92%; identify by structure (eg, bacterial morphologic features), 91%; and identify by name (eg, genus and species), 87%. Of the 6 highest-scoring questions (mean scores, > or = 99%) 5 were identify by structure and 1 was identify by name. Of the 6 lowest-scoring questions (mean scores, < 75%) 5 were gram-negative and 1 was host cells. By type of interpretation, 2 were identify by structure and 4 were identify by name. Computer-based Gram stain competency assessment examinations are reliable. Our analysis helps laboratories identify areas for continuing education in Gram stain interpretation and will direct future revisions of the tests.
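For readers unfamiliar with the KR-20 coefficient cited in this abstract, the following minimal sketch computes it for dichotomously scored items (1 = correct, 0 = incorrect). The function name `kr20` and the tiny response matrix are illustrative assumptions, and the population variance of total scores is used, which is one common convention.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1 item scores.

    `responses` is a list of rows, one per examinee, each with one
    0/1 entry per item.
    """
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")

    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    # Sum of item variances p * (1 - p), where p is the proportion correct.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1.0 - p)

    return (k / (k - 1)) * (1.0 - sum_pq / total_variance)


if __name__ == "__main__":
    # Hypothetical answers of five examinees to four questions.
    data = [
        [1, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(round(kr20(data), 3))
```

KR-20 is the special case of Cronbach's alpha for dichotomous items, so it is read on the same scale as the alpha values reported elsewhere in this listing.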
Validity and Reliability of the Brazilian Version of the Rapid Estimate of Adult Literacy in Dentistry--BREALD-30.

PubMed

Junkes, Monica C; Fraiz, Fabian C; Sardenberg, Fernanda; Lee, Jessica Y; Paiva, Saul M; Ferreira, Fernanda M

2015-01-01

The aim of the present study was to translate, perform the cross-cultural adaptation of the Rapid Estimate of Adult Literacy in Dentistry to Brazilian-Portuguese language and test the reliability and validity of this version. After translation and cross-cultural adaptation, interviews were conducted with 258 parents/caregivers of children in treatment at the pediatric dentistry clinics and health units in Curitiba, Brazil. To test the instrument's validity, the scores of Brazilian Rapid Estimate of Adult Literacy in Dentistry (BREALD-30) were compared based on occupation, monthly household income, educational attainment, general literacy, use of dental services and three dental outcomes. The BREALD-30 demonstrated good internal reliability. Cronbach's alpha ranged from 0.88 to 0.89 when words were deleted individually. The analysis of test-retest reliability revealed excellent reproducibility (intraclass correlation coefficient = 0.983 and Kappa coefficient ranging from moderate to nearly perfect). In the bivariate analysis, BREALD-30 scores were significantly correlated with the level of general literacy (rs = 0.593) and income (rs = 0.327) and significantly associated with occupation, educational attainment, use of dental services, self-rated oral health and the respondent's perception regarding his/her child's oral health. However, only the association between the BREALD-30 score and the respondent's perception regarding his/her child's oral health remained significant in the multivariate analysis. The BREALD-30 demonstrated satisfactory psychometric properties and is therefore applicable to adults in Brazil.
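The phrase "Cronbach's alpha ... when words were deleted individually" refers to an alpha-if-item-deleted analysis. A minimal sketch of that computation is shown here; the function names `cronbach_alpha` and `alpha_if_item_deleted` and the sample scores are assumptions for illustration, not the BREALD-30 data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Alpha recomputed k times, each time leaving one item out."""
    k = len(rows[0])
    return [
        cronbach_alpha([[x for i, x in enumerate(row) if i != j] for row in rows])
        for j in range(k)
    ]


if __name__ == "__main__":
    # Hypothetical 0/1 scores of six respondents on four words.
    scores = [
        [1, 1, 1, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [1, 0, 1, 0],
    ]
    print(round(cronbach_alpha(scores), 3))
    print([round(a, 3) for a in alpha_if_item_deleted(scores)])
```

A narrow range of item-deleted values, such as the 0.88 to 0.89 reported above, suggests that no single item dominates or undermines the scale's internal consistency.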
Validity and Reliability of the Brazilian Version of the Rapid Estimate of Adult Literacy in Dentistry – BREALD-30

PubMed Central

Junkes, Monica C.; Fraiz, Fabian C.; Sardenberg, Fernanda; Lee, Jessica Y.; Paiva, Saul M.; Ferreira, Fernanda M.

2015-01-01

Objective The aim of the present study was to translate, perform the cross-cultural adaptation of the Rapid Estimate of Adult Literacy in Dentistry to Brazilian-Portuguese language and test the reliability and validity of this version. Methods After translation and cross-cultural adaptation, interviews were conducted with 258 parents/caregivers of children in treatment at the pediatric dentistry clinics and health units in Curitiba, Brazil. To test the instrument's validity, the scores of Brazilian Rapid Estimate of Adult Literacy in Dentistry (BREALD-30) were compared based on occupation, monthly household income, educational attainment, general literacy, use of dental services and three dental outcomes. Results The BREALD-30 demonstrated good internal reliability. Cronbach’s alpha ranged from 0.88 to 0.89 when words were deleted individually. The analysis of test-retest reliability revealed excellent reproducibility (intraclass correlation coefficient = 0.983 and Kappa coefficient ranging from moderate to nearly perfect). In the bivariate analysis, BREALD-30 scores were significantly correlated with the level of general literacy (rs = 0.593) and income (rs = 0.327) and significantly associated with occupation, educational attainment, use of dental services, self-rated oral health and the respondent’s perception regarding his/her child's oral health. However, only the association between the BREALD-30 score and the respondent’s perception regarding his/her child's oral health remained significant in the multivariate analysis. Conclusion The BREALD-30 demonstrated satisfactory psychometric properties and is therefore applicable to adults in Brazil. PMID:26158724
Basic psychometric properties of the transfer assessment instrument (version 3.0).

PubMed

Tsai, Chung-Ying; Rice, Laura A; Hoelmer, Claire; Boninger, Michael L; Koontz, Alicia M

2013-12-01

To refine the Transfer Assessment Instrument (TAI 2.0), develop a training program for the TAI, and analyze the basic psychometric properties of the TAI 3.0, including reliability, standard error of measurement (SEM), minimal detectable change (MDC), and construct validity. Repeated measures. A winter sports clinic for disabled veterans. Wheelchair users (N=41) who perform sitting-pivot or standing-pivot transfers. Not applicable. TAI version 3.0, intraclass correlation coefficients, SEMs, and MDCs for reliable measurement of raters' responses. Spearman correlation coefficient, 1-way analysis of variance, and independent t tests to evaluate construct validity. TAI 3.0 had acceptable to high levels of reliability (range, .74-.88). The SEMs for part 1, part 2, and final scores ranged from .45 to .75. The MDC was 1.5 points on the 10-point scale for the final score. There were weak correlations (ρ range, -.13 to .25; P>.11) between TAI final scores and subjects' characteristics (eg, sex, body mass index, age, type of disability, length of wheelchair use, grip and elbow strength, sitting balance). With comprehensive training, the refined TAI 3.0 yields high reliability among raters of different clinical backgrounds and experience. TAI 3.0 was unbiased toward certain physical characteristics that may influence transfer. TAI fills a void in the field by providing a quantitative measurement of transfers and a tool that can be used to detect problems and guide transfer training.
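The standard error of measurement and minimal detectable change are commonly derived from a reliability coefficient as SEM = SD × √(1 − ICC) and MDC = z × √2 × SEM. The sketch below assumes those textbook formulas and a 95% confidence level (z = 1.96); the abstract does not state which confidence level or formula the authors used, and the function names and input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM; z = 1.96 corresponds to 95% confidence."""
    return z * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # Hypothetical score SD and ICC, for illustration only.
    sem = standard_error_of_measurement(sd=1.5, reliability=0.85)
    print(round(sem, 2), round(minimal_detectable_change(sem), 2))
```

The √2 factor reflects that a change score combines the measurement error of two administrations, so the MDC is always larger than the SEM at any usual confidence level.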

Development and psychometric validation of a scoring questionnaire to assess healthy lifestyles among adolescents in Catalonia.

PubMed

Costa-Tutusaus, Lluís; Guerra-Balic, Myriam

2016-01-28

Lifestyle is intimately related to health. A questionnaire that specifically scores the healthiness of the lifestyle of Catalan adolescents is needed. The objective of this study was to develop and validate a quick, user-friendly scoring questionnaire called VISA-TEEN to assess the healthy lifestyle of young Catalans. A lifestyle questionnaire was developed based on the analysis of contributions from two focus groups, one with adolescents and the other with people who work with them (teachers and doctors). A panel of experts validated the content of items that were ultimately selected for the VISA-TEEN questionnaire. Three hundred ninety-six adolescents (215 boys and 181 girls, age = 13-19 years) completed the VISA-TEEN. Internal consistency was assessed using Cronbach's alpha (α) reliability coefficient. Test-retest reliability, using an intraclass correlation coefficient (ICC), was calculated based on scores attained two weeks apart. Construct validity was assessed by the extraction of components with an exploratory factor analysis. The relationship between VISA-TEEN scores and health-related quality of life (HRQoL), measured with the KIDSCREEN-10 Index, was assessed by calculating Pearson's r correlation coefficient. The association of VISA-TEEN scores with self-rated health (SRH) was also examined by executing an analysis of variance (ANOVA) between the different categories of this variable. We also calculated the index of fit for factor scales (IFFS) for each component, as well as the discriminatory power of the instrument using Ferguson's δ (delta) coefficient. The VISA-TEEN questionnaire showed acceptable reliability (α = 0.66, αest = 0.77) and a very good test-retest agreement (ICC = 0.860). It could be broken down into the following five components, all with an acceptable or very good IFFS (0.7-0.96): diet, substance abuse, physical activity, Rational Use of Technological Leisure (RUTL), and hygiene. 
Scores on the VISA-TEEN showed significant correlation with the KIDSCREEN index (r = 0.21, p < 0.001) and were associated with SRH (p < 0.001). The discriminatory power was found to be δ = 0.97. The VISA-TEEN questionnaire, developed to study the lifestyle of Catalan adolescents, is a valid instrument for this population, as the present psychometric tests show; it can be used to understand the role of lifestyle in teenagers' health or to test the efficacy of health campaigns intended to improve teenagers' lifestyle.
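Ferguson's δ, used in this abstract as a measure of discriminatory power, compares the observed spread of scores with the spread of a uniform distribution over all possible scores. A minimal sketch under that standard definition follows; the function name `fergusons_delta`, its parameters, and the sample scores are illustrative assumptions.

```python
from collections import Counter


def fergusons_delta(scores, n_possible_scores):
    """Ferguson's delta: 0 when everyone has the same score, 1 when
    scores are spread uniformly over all possible score values.

    n_possible_scores is the number of distinct values the scale can
    take (for k dichotomous items, k + 1).
    """
    n = len(scores)
    m = n_possible_scores
    if n == 0 or m < 2:
        raise ValueError("need scores and at least two possible score values")
    sum_squared_freq = sum(f * f for f in Counter(scores).values())
    return m * (n * n - sum_squared_freq) / ((m - 1) * n * n)


if __name__ == "__main__":
    # Hypothetical totals on a scale with possible scores 0..10 (11 values).
    totals = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 2]
    print(round(fergusons_delta(totals, 11), 3))
```

Values close to 1, such as the 0.97 reported above, indicate that respondents are spread across nearly the whole score range rather than clustered at a few values.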
Cross-cultural adaptation and validation of the Turkish version of Oxford hip score.

PubMed

Tuğay, Baki Umut; Tuğay, Nazan; Güney, Hande; Hazar, Zeynep; Yüksel, İnci; Atilla, Bülent

2015-06-01

The purpose of this study was to translate the Oxford hip score (OHS) into Turkish and to evaluate the psychometric properties by testing the internal consistency, reproducibility, construct validity, and responsiveness in patients with hip osteoarthritis (OA). Oxford hip score was translated and culturally adapted according to the guidelines in the literature. Seventy patients (mean age 61.45 ± 9.29 years) with hip osteoarthritis participated in the study. Patients completed the Turkish Oxford hip score (OHS-TR), the Short-Form 36 (SF-36), and Western Ontario and McMaster Universities Index (WOMAC). Internal consistency was tested using Cronbach's α coefficient. Patients completed OHS-TR questionnaire twice in 7 days for determining the reproducibility. Correlation between the total results of both tests was determined by the Pearson correlation coefficient and intraclass correlation coefficient (ICC). Validity was assessed by calculating the Pearson correlation coefficient between the OHS-TR and WOMAC and SF-36 scores. Floor and ceiling effects were analyzed. The internal consistency was high (Cronbach's α 0.93). The construct validity showed a significant correlation between the OHS-TR and WOMAC and related SF-36 domains (p < 0.001). The ICC's ranged between 0.80 and 0.99. There was no floor or ceiling effect in total OHS-TR score. The OHS-TR questionnaire is valid, reliable, and responsive for the Turkish-speaking patients with hip OA.
Construction of the Mandarin version of the International Prostate Symptom Score inventory in assessing lower urinary tract symptoms in a Malaysian population.

PubMed

Quek, Kia Fatt; Chua, Chong Beng; Razack, Azad Hassan; Low, Wah Yun; Loh, Chit Sin

2005-01-01

The purpose of the present study was to validate the Mandarin version of the International Prostate Symptom Score (Mand-IPSS) in a Malaysian population. The validity and reliability were studied in patients with lower urinary tract symptoms (LUTS; benign prostatic hyperplasia [BPH] group) and without LUTS (control group). Test-retest methodology was used to assess the reliability while Cronbach alpha was used to assess the internal consistency. Sensitivity to change was used to express the effect size index in the preintervention versus post-intervention score in patients with LUTS who underwent transurethral resection of the prostate. For the control group and BPH group, the internal consistency was excellent and a high degree of internal consistency was observed for all seven items (Cronbach alpha = 0.86-0.98 and 0.90-0.98, respectively). Test-retest correlation coefficients for all items were highly significant. Intraclass correlation coefficient (ICC) was high for the control (ICC = 0.93-0.99) and BPH group (ICC = 0.91-0.99). The instrument showed a high degree of sensitivity and specificity to the effects of treatment. A high degree of significance between baseline and post-treatment scores was observed across all seven items in the BPH group but not in the control group. The Mand-IPSS is a suitable, reliable, valid and sensitive instrument to measure clinical change in the Malaysian population.
Cross-cultural validity of a dietary questionnaire for studies of dental caries risk in Japanese.

PubMed

Shinga-Ishihara, Chikako; Nakai, Yukie; Milgrom, Peter; Murakami, Kaori; Matsumoto-Nakano, Michiyo

2014-01-02

Diet is a major modifiable contributing factor in the etiology of dental caries. The purpose of this paper is to examine the reliability and cross-cultural validity of the Japanese version of the Food Frequency Questionnaire to assess dietary intake in relation to dental caries risk in Japanese. The 38-item Food Frequency Questionnaire, in which Japanese food items were added to increase content validity, was translated into Japanese, and administered to two samples. The first sample comprised 355 pregnant women with mean age of 29.2 ± 4.2 years for the internal consistency and criterion validity analyses. Factor analysis (principal components with Varimax rotation) was used to determine dimensionality. The dietary cariogenicity score was calculated from the Food Frequency Questionnaire and used for the analyses. Salivary mutans streptococci level was used as a semi-quantitative assessment of dental caries risk and measured by Dentocult SM. Dentocult SM scores were compared with the dietary cariogenicity score computed from the Food Frequency Questionnaire to examine criterion validity, and assessed by Spearman's correlation coefficient (rs) and Kruskal-Wallis test. Test-retest reliability of the Food Frequency Questionnaire was assessed with a second sample of 25 adults with mean age of 34.0 ± 3.0 years by using the intraclass correlation coefficient analysis. The Japanese language version of the Food Frequency Questionnaire showed high test-retest reliability (ICC = 0.70) and good criterion validity assessed by relationship with salivary mutans streptococci levels (rs = 0.22; p < 0.001). Factor analysis revealed four subscales that construct the questionnaire (solid sugars, solid and starchy sugars, liquid and semisolid sugars, sticky and slowly dissolving sugars). Internal consistency was low to acceptable (Cronbach's alpha = 0.67 for the total scale, 0.46-0.61 for each subscale). 
Mean dietary cariogenicity scores were 50.8 ± 19.5 in the first sample, and 47.4 ± 14.1 and 40.6 ± 11.3 for the first and second administrations in the second sample, respectively. The distribution of Dentocult SM score was 6.8% (score = 0), 34.4% (score = 1), 39.4% (score = 2), and 19.4% (score = 3). Participants with higher scores were more likely to have higher dietary cariogenicity scores (p < 0.001; Kruskal-Wallis test). These results provide the preliminary evidence for the reliability and validity of the Japanese language Food Frequency Questionnaire.
Translation, cross-cultural adaptation and validation of the Bulgarian version of the Dizziness Handicap Inventory.

PubMed

Georgieva-Zhostova, Spaska; Kolev, Ognyan I; Stambolieva, Katerina

2014-09-01

The aim of the present study was the translation, cross-cultural adaptation and validation of the Dizziness Handicap Inventory in Bulgarian language (DHI-BG). Ninety-seven vestibular patients (19 men and 78 women, mean age 45.08 ± 13.85 years) took part in the investigation. All participants were asked to fill in the DHI-BG. Internal consistency was estimated using Cronbach's alpha and item-total correlation, reproducibility by calculating Bland-Altman's limits of agreement and intraclass correlation coefficients (ICCs). Associations were estimated by Spearman's correlation coefficients. The Cronbach's alpha for the total score, functional, physical and emotional subscales of DHI-BG were 0.88, 0.75, 0.72 and 0.81. The floor and ceiling effects of the DHI-BG total scale were evaluated with respect to the limits of agreement which were ±9.4-14.53 points. Intraclass correlation coefficients (ICCs) for all scale and subscales were higher than the recommended value of 0.75 and determined good test-retest reliability. The range of items correlation for DHI-BG was from 0.27 (item 12) to 0.72 (item 3). No significant differences were observed in the Cronbach's alpha coefficients between the DHI-BG and the original version, the German and Italian versions of the questionnaire. The most significant difference was observed in comparison with the German version of DHI. Construct validity presented a moderate correlation between Romberg coefficients and DHI-BG scores and strong correlation between all scores of DHI and the self-perceived disability. The results suggest that DHI-BG scores show a good discriminative validity between groups with different levels of self-assessed disability. The Bulgarian version of the DHI is a reliable and valid tool in assessing the impact of dizziness on the quality of life in Bulgarian vestibular patients.
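Bland-Altman limits of agreement, mentioned in this abstract as a reproducibility measure, are conventionally the mean test-retest difference plus or minus 1.96 standard deviations of the differences. The sketch below assumes that conventional definition; the function name `limits_of_agreement` and the paired scores are hypothetical, not DHI-BG data.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second, z=1.96):
    """Return (lower limit, bias, upper limit) for paired measurements.

    bias is the mean of (second - first); the limits are bias +/- z * SD
    of the differences, using the sample standard deviation.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [b - a for a, b in zip(first, second)]
    bias = mean(differences)
    spread = z * stdev(differences)
    return bias - spread, bias, bias + spread


if __name__ == "__main__":
    # Hypothetical total scores at test and retest.
    test = [34, 52, 18, 60, 44, 26]
    retest = [38, 50, 22, 56, 48, 24]
    lower, bias, upper = limits_of_agreement(test, retest)
    print(round(lower, 2), round(bias, 2), round(upper, 2))
```

Unlike an ICC, the limits are expressed in the scale's own points, so they show directly how far two administrations to the same patient may be expected to differ.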
Study samples are too small to produce sufficiently precise reliability coefficients.

PubMed

Charter, Richard A

2003-04-01

In a survey of journal articles, test manuals, and test critique books, the author found that a mean sample size (N) of 260 participants had been used for reliability studies on 742 tests. The distribution was skewed because the median sample size for the total sample was only 90. The median sample sizes for the internal consistency, retest, and interjudge reliabilities were 182, 64, and 36, respectively. The author presented sample size statistics for the various internal consistency methods and types of tests. In general, the author found that the sample sizes that were used in the internal consistency studies were too small to produce sufficiently precise reliability coefficients, which in turn could cause imprecise estimates of examinee true-score confidence intervals. The results also suggest that larger sample sizes have been used in the last decade compared with those that were used in earlier decades.
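To make the link between sample size and the precision of a reliability coefficient concrete, the sketch below computes an approximate 95% confidence interval for a correlation-type coefficient (for example a retest correlation) using the Fisher z transformation. This is a generic textbook approximation, not necessarily the method the author used, and it does not apply directly to alpha-type coefficients; the function name `fisher_ci` and the example coefficient of 0.80 are assumptions.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson-type correlation r
    estimated from n participants, via the Fisher z transformation."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)
    standard_error = 1.0 / math.sqrt(n - 3)
    return (
        math.tanh(z - z_crit * standard_error),
        math.tanh(z + z_crit * standard_error),
    )


if __name__ == "__main__":
    # Median sample sizes quoted in the abstract; r = 0.80 is hypothetical.
    for n in (36, 64, 182):
        low, high = fisher_ci(0.80, n)
        print(n, round(low, 2), round(high, 2), round(high - low, 2))
```

The interval width shrinks roughly with the square root of the sample size, which is why the smaller median samples reported for interjudge and retest studies give noticeably less precise coefficients.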
[Application of the Children's Impact of Event Scale (Chinese Version) on a rapid assessment of posttraumatic stress disorder among children from the Wenchuan earthquake area].

PubMed

Zhao, Gao-feng; Zhang, Qiang; Pang, Yan; Ren, Zheng-jia; Peng, Dan; Jiang, Guo-guo; Liu, Shan-ming; Chen, Ying; Geng, Ting; Zhang, Shu-sen; Yang, Yan-chun; Deng, Hong

2009-11-01

To explore the reliability and validity of the Children's Impact of Event Scale (Chinese version, CRIES-13) and to determine the value and the optimal cutoff point of the score of CRIES-13 in screening posttraumatic stress disorder (PTSD), so as to provide evidence for PTSD prevention and identify children at risk in Wenchuan earthquake areas. A total of 253 children experienced the Wenchuan earthquake were tested through Stratified random cluster sampling. The authors examined CRIES-13's internal consistency, discriminative validity and predictive value of the cut-off. PTSD was assessed with the DSM-IV criteria. Area under the curve while sensitivity, specificity and Youden index were computed based on the receiver operating characteristic curve analysis. Optimal cutoff point was determined by the maximum of Youden index. 20.9% of the subjects were found to have met the DSM-IV criteria for PTSD 7 months after the Wenchuan earthquake accident. The Cronbach's coefficient of CRIES-13 was 0.903 and the mean inter-item correlation coefficients ranged from 0.283 to 0.689, the correlation coefficient of the three factors with the total scale scores ranged from 0.836 to 0.868 while the correlation coefficient among the three factors ranged from 0.568 to 0.718, PTSD cases indicated much higher scores than non-PTSD cases, the Youden index reached maximum value when the total score approached 18 in CRIES-13 with sensitivity and specificity as 81.1% and 76.5% respectively. Consistency check showed that there were no significant differences between the results of CRIES-13 score >/= 32 and clinical diagnosis (Kappa = 0.529) from the screening program. CRIES-13 appeared to be a reliable and valid measure for assessing the posttraumatic stress symptoms among children after the earthquake accident in the Wenchuan area. The CRIES-13 seemed to be a useful self-rating diagnostic instrument for survivors with PTSD symptoms as a clinical concern by using a 18 cut-off in total score. 
Measuring teamwork and conflict among emergency medical technician personnel.

PubMed

Patterson, P Daniel; Weaver, Matthew D; Weaver, Sallie J; Rosen, Michael A; Todorova, Gergana; Weingart, Laurie R; Krackhardt, David; Lave, Judith R; Arnold, Robert M; Yealy, Donald M; Salas, Eduardo

2012-01-01

We sought to develop a reliable and valid tool for measuring teamwork among emergency medical technician (EMT) partnerships. We adapted existing scales and developed new items to measure components of teamwork. After recruiting a convenience sample of 39 agencies, we tested a 122-item draft survey tool (EMT-TEAMWORK). We performed a series of exploratory factor analyses (EFAs) and confirmatory factor analysis (CFA) to test reliability and construct validity, describing variation in domain and global scores using descriptive statistics. We received 687 completed surveys. The EFAs identified a nine-factor solution. We labeled these factors 1) Team Orientation, 2) Team Structure & Leadership, 3) Partner Communication, Team Support, & Monitoring, 4) Partner Trust and Shared Mental Models, 5) Partner Adaptability & Back-Up Behavior, 6) Process Conflict, 7) Strong Task Conflict, 8) Mild Task Conflict, and 9) Interpersonal Conflict. We tested a short-form (30-item SF) and long-form (45-item LF) version. The CFAs determined that both the SF and the LF possess positive psychometric properties of reliability and construct validity. The EMT-TEAMWORK-SF has positive internal consistency properties, with a mean Cronbach's alpha coefficient ≥0.70 across all nine factors (mean = 0.84; minimum = 0.78, maximum = 0.94). The mean Cronbach's alpha coefficient for the EMT-TEAMWORK-LF was 0.87 (minimum = 0.79, maximum = 0.94). There was wide variation in weighted scores across all nine factors and the global score for the SF and LF. Mean scores were lowest for the Team Orientation factor (48.1, standard deviation [SD] 21.5, SF; 49.3, SD 19.8, LF) and highest (more positive) for the Interpersonal Conflict factor (87.7, SD 18.1, for both SF and LF). We developed a reliable and valid survey to evaluate teamwork between EMT partners.
[Validity and reliability of Pediatric Quality of Life Inventory Version 4.0 Generic Core Scales in Chinese children and adolescents].

PubMed

Chen, Yu-Ming; He, Li-Ping; Mai, Jin-Cheng; Hao, Yuan-Tao; Xiong, Li-Hua; Chen, Wei-Qing; Wu, Jiang-Nan

2008-06-01

To evaluate the reliability and validity of the parent proxy-report scales of the Pediatric Quality of Life Inventory Version 4.0 (PedsQL 4.0) Generic Core Scales, Chinese version. A total of 3493 school students aged 6-18 years were recruited using a multistage cluster sampling method. Health-related quality of life was assessed using the above-mentioned PedsQL 4.0 scales. Internal consistency was assessed using Cronbach's alpha coefficient, while validity was tested through correlation analysis, t-test and exploratory factor analysis. The internal consistency reliability for the Total Scale Score (Cronbach's alpha = 0.90), Physical Health Summary Score (alpha = 0.81), and Psychosocial Health Summary Score (alpha = 0.89) was excellent. Six major factors were extracted by factor analysis, which basically matched the designed structure of the original version and accounted for nearly 66% of the variance. The Total Scale Score significantly decreased by 3.5 to 13.3 (P < 0.05) in children and adolescents who had diseases including cold, skin hypersensitiveness, food allergy, courbature or arthralgia, or breathlessness with a frequency of 6 times or more per year, or who had asthma, as compared to those with a lower frequency (≤ 5 times/y) of the diseases or without asthma. We found moderate to high correlations between items and the subscales. Correlation coefficients ranged from 0.45 to 0.84 (P < 0.01). The reliability and validity of the parent proxy-report scales of the PedsQL 4.0 Generic Core Scales, Chinese version, were as good as those of the original version. Our findings suggested that the scales could be applied to evaluate the health-related quality of life of children in Chinese regions similar to Guangzhou.
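Cronbach's alpha, the internal consistency coefficient reported in this and many of the surrounding records, is alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) for k items. A minimal sketch follows; the function name and the toy item scores are hypothetical and do not come from any of the studies listed here.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from item columns.

    `items` holds one inner sequence per item, aligned by respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("all items need the same number (>= 2) of respondents")
    # Total score per respondent across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)


# Hypothetical toy data: two items answered by five respondents.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 1, 4, 3, 5]
alpha = cronbach_alpha([item_a, item_b])
```

Population variances are used for both the items and the totals; using sample variances throughout gives the same ratio and therefore the same alpha.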
Cross-cultural adaptation, reliability, and validity of the Turkish version of PedsQL 3.0 Arthritis Module: a quality-of-life measure for patients with juvenile idiopathic arthritis in Turkey.

PubMed

Tarakci, E; Baydogan, S N; Kasapcopur, O; Dirican, A

2013-04-01

The aim of this study was to describe the cultural adaptation, validity, and reliability of a Turkish version of the pediatric quality-of-life inventory (PedsQL) 3.0 Arthritis Module in a population with juvenile idiopathic arthritis (JIA). A total of 169 patients with JIA and their parents were enrolled in the study. The Turkish version of the childhood health assessment questionnaire (CHAQ) was used to evaluate the validity of related domains in the PedsQL 3.0 Arthritis Module. Both the PedsQL 3.0 Arthritis Module and CHAQ were filled out by children over 8 years of age and by the parents of children 2-7 years of age. Internal reliability was poor to excellent (Cronbach's alpha coefficients 0.56-0.84 for self-reporting and 0.63-0.82 for parent reporting), and interobserver reliability varied from good to excellent (intraclass correlation coefficient (ICC) 0.79-0.91 for self-reporting and 0.80-0.88 for parent reporting) for the total scores of the PedsQL 3.0 Arthritis Module. Parent-child concordance for all scores was moderate to excellent (ICC 0.42-0.92). The PedsQL 3.0 Arthritis Module and CHAQ were highly positively correlated, with coefficients from 0.21 to 0.76, indicating concurrent validity. We demonstrated the reliability and validity of quality-of-life measurement using the Turkish version of the PedsQL 3.0 Arthritis Module in our sociocultural context. The PedsQL 3.0 Arthritis Module can be utilized as a tool for the evaluation of quality of life in patients with JIA aged 2-18 years.
Validation of the Physical Activity Questionnaire for Older Children (PAQ-C) among Chinese Children.

PubMed

Wang, Jing Jing; Baranowski, Tom; Lau, Wc Patrick; Chen, Tzu An; Pitkethly, Amanda Jane

2016-03-01

This study initially validates the Chinese version of the Physical Activity Questionnaire for Older Children (PAQ-C), which has been identified as a potentially valid instrument to assess moderate-to-vigorous physical activity (MVPA) in children among diverse racial groups. The psychometric properties of the PAQ-C with 742 Hong Kong Chinese children were assessed with the scale's internal consistency, reliability, test-retest reliability, confirmatory factor analysis (CFA) in the overall sample, and multistep invariance tests across gender groups as well as convergent validity with body mass index (BMI), and an accelerometry-based MVPA. The Cronbach alpha coefficient (α=0.79), composite reliability value (ρ=0.81), and the intraclass correlation coefficient (α=0.82) indicate the satisfactory reliability of the PAQ-C score. The CFA indicated that the data fit a single-factor model, suggesting that the PAQ-C measures only one construct, MVPA over the previous 7 days. The multiple-group CFAs suggested that the factor loadings and variances and covariances of the PAQ-C measurement model were invariant across gender groups. The PAQ-C score was related to accelerometry-based MVPA (r=0.33) and inversely related to BMI (r=-0.18). This study demonstrates the reliability and validity of the PAQ-C in Chinese children. Copyright © 2016 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
The Outpatient Experience Questionnaire of comprehensive public hospital in China: development, validity and reliability.

PubMed

Hu, Yinhuan; Zhang, Zixia; Xie, Jinzhu; Wang, Guanping

2017-02-01

The objective of this study is to describe the development of the Outpatient Experience Questionnaire (OPEQ) and to assess the validity and reliability of the scale. Literature review, patient interviews, Delphi method and Cross-sectional validation survey. Six comprehensive public hospitals in China. The survey was carried out on a sample of 600 outpatients. Acceptability of the questionnaire was assessed according to the overall response rate, item non-response rate and the average completion time. Correlation coefficients and confirmatory factor analysis were used to test construct validity. Delphi method was used to assess the content validity of the questionnaire. Cronbach's coefficient alpha and split-half reliability coefficient were used to estimate the internal reliability of the questionnaire. The overall response rate was 97.2% and the item non-response rate ranged from 0% to 0.3%. The mean completion time was 6 min. The Spearman correlations of item-total score ranged from 0.466 to 0.765. The results of confirmatory factor analysis showed that all items had factor loadings above 0.40 and the dimension intercorrelation ranged from 0.449 to 0.773, the goodness of fit of the questionnaire was reasonable. The overall authority grade of expert consultation was 0.80 and Kendall's coefficient of concordance W was 0.186. The Cronbach's coefficients alpha of six dimensions ranged from 0.708 to 0.895, the split-half reliability coefficient (Spearman-Brown coefficient) was 0.969. The OPEQ is a promising instrument covering the most important aspects which influence outpatient experiences of comprehensive public hospital in China. It has good evidence for acceptability, validity and reliability. © The Author 2016. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
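The split-half reliability with Spearman-Brown correction mentioned in this abstract steps up the correlation r between two half-tests to the full-length estimate 2r / (1 + r). A minimal sketch follows. The abstract does not say how the items were split, so the odd/even split used here is an assumption, and the function names and toy responses are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(half_correlation: float) -> float:
    """Step up a half-test correlation to full-length reliability."""
    return 2.0 * half_correlation / (1.0 + half_correlation)


def split_half_reliability(rows: Sequence[Sequence[float]]) -> float:
    """`rows` holds one sequence of item scores per respondent.

    Items are split into odd and even positions (an assumed split).
    """
    odd_half = [sum(row[0::2]) for row in rows]
    even_half = [sum(row[1::2]) for row in rows]
    return spearman_brown(pearson_r(odd_half, even_half))


# Hypothetical toy data: four respondents, four items each.
toy_rows = [[1, 1, 2, 2], [2, 3, 3, 2], [4, 4, 5, 5], [5, 4, 5, 5]]
reliability = split_half_reliability(toy_rows)
```

Different ways of splitting the items generally give different coefficients, which is one reason split-half estimates are usually reported alongside Cronbach's alpha.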
Reliability and validity of migraine disability assessment questionnaire-Thai version (Thai-MIDAS).

PubMed

Seethong, Piman; Nimmannit, Akarin; Chaisewikul, Rungsan; Prayoonwiwat, Naraporn; Chotinaiwattarakul, Wattanachai

2013-02-01

To assess the validity and test-retest reliability of a Thai translation of the Migraine Disability Assessment (MIDAS) Questionnaire in Thai patients with migraine. Migraineurs from the Headache Clinic in Siriraj Hospital were recruited and asked to complete a 13-week diary and to answer the Thai-MIDAS at once. Some participants were asked to complete a second Thai-MIDAS within the next 2 weeks for test-retest reliability. Ninety-three patients completed the 13-week diaries. The age range was 18-58 years, with a mean of 37.69 ± 9.60 years. All 5 items and the total score of the Thai-MIDAS were moderately correlated with data from the 13-week diary (Spearman's correlation coefficient = 0.32-0.62). The test-retest reliability of the total score of the Thai-MIDAS in 30 patients demonstrated a highly reliable degree of intraclass correlation (ICC = 0.76, 95% CI 0.49-0.88). The present study reveals that the Thai-MIDAS has satisfactory validity and reliability in comparison with the original English MIDAS version.
Reliability and validity of the Tigrigna version of the Pelvic Floor Distress Inventory-Short Form 20 (PFDI-20) and Pelvic Floor Impact Questionnaire-7 (PFIQ-7).

PubMed

Goba, Gelila K; Legesse, Awol Yeman; Zelelow, Yibrah Berhe; Gebreselassie, Mussie Alemayehu; Rogers, Rebecca G; Kenton, Kimberly S; Mueller, Margaret G

2018-03-13

This study adapted the Pelvic Floor Distress Inventory-Short Form 20 (PFDI-20) and the Pelvic Floor Impact Questionnaire-7 (PFIQ-7) into the Tigrigna language of northern Ethiopia and evaluated their reliability and validity through patient interviews. Expert translation, cognitive interviewing, and patient interviews using translated questionnaires were conducted. A subset of women was reinterviewed 1 week later. Intraclass correlation coefficients (ICC), Bland-Altman analysis, and Cronbach's alpha values were assessed. Total and subscale scores were compared between women with and without pelvic floor disorders (PFDs) using the Mann-Whitney U test. Spearman's correlation coefficients were used to compare severity of pelvic organ prolapse (POP) stage according to the POP Quantification (POP-Q) system with PFDI-20 and PFIQ-7 total and subscale scores. Ten women participated in cognitive interviewing, and 118 women (age 49 ± 10 years, mean ± standard deviation [SD]) with and without PFDs were interviewed using the translated questionnaires, both of which presented adequate face validity and test-retest reliability [intraclass correlation coefficient (ICC) 0.765-0.969, p < 0.001]. Construct validity was significant between clinical symptoms and full forms (p < 0.001) and their subscales (p < 0.001), except for the Pelvic Organ Prolapse Impact Questionnaire (POPIQ). Differences between first and second scores on total PFDI-20 and PFIQ-7 and subscales largely fell within 0 ± 1.96 SD. Cronbach's alpha values were 0.891-0.930 for PFDI-20 and 0.909-0.956 for PFIQ-7 (p < 0.001). Analysis of known groups showed differences in PFDI-20 and PFIQ-7 scores between women with and without PFDs (p < 0.001 for full forms and subscales, except for anal incontinence (AI) and the Urinary Impact Questionnaire (UIQ)/POPIQ). The translated Tigrigna versions of the PFDI-20 and PFIQ-7 questionnaires are reliable, valid, and feasible tools to evaluate symptoms and quality of life (QoL) of Tigrigna-speaking Ethiopian women with PFDs.
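The Bland-Altman analysis named in this abstract checks whether the differences between first and second administrations fall within the mean difference ± 1.96 SD of the differences (the limits of agreement). A minimal sketch follows; the function name and the toy test and retest scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman_limits(first: Sequence[float],
                        second: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements.

    The limits are bias +/- 1.96 times the sample SD of the differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical toy data: scores at the first interview and one week later.
toy_first = [10, 12, 14, 16]
toy_second = [9, 13, 13, 17]
bias, lower, upper = bland_altman_limits(toy_first, toy_second)

# Share of pairs whose difference lies inside the limits of agreement.
inside = sum(lower <= a - b <= upper for a, b in zip(toy_first, toy_second))
share_inside = inside / len(toy_first)
```

A bias near zero with most differences inside the limits is the pattern usually read as acceptable test-retest agreement.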
Reliability of scores between stroke patients and significant others on the Reintegration to Normal Living (RNL) Index.

PubMed

Tooth, Leigh R; McKenna, Kryss T; Smith, Melinda; O'Rourke, Peter K

2003-05-06

This study measured reliability between stroke patients' and significant others' scores on items on the Reintegration to Normal Living (RNL) Index and whether there were any scoring biases. The 11-item RNL Index was administered to 57 pairs of patients and significant others six months after stroke rehabilitation. The index was scored using a 10-point visual analogue scale. Patient and significant other demographic information and data on patients' clinical, functional and cognitive status were collected. Reliability was measured using the intra-class correlation coefficient (ICC) and percent agreement. Overall, poor reliability was found for the RNL Index total score (ICC = .36, 95% CI .07 to .59) and the daily functioning subscale (ICC = .24, 95% CI -.003 to .46), and moderate reliability was found for the perception of self subscale (ICC = .55, 95% CI .28 to .73). There was a moderate bias for patients to rate themselves as achieving better reintegration than was indicated by significant others, although no demographic or clinical factors were associated with this bias. Exact match agreement was best for the subjective items and worse for items reflecting mobility around the community and participation in a work activity. Caution is needed when interpreting patient information reported by significant others on the RNL Index. The use of a shorter scale to rate the RNL Index requires investigation.
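An intraclass correlation coefficient of the kind reported in this abstract can be computed from one-way analysis of variance mean squares as ICC = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between-subject and within-subject mean squares and k is the number of ratings per subject. The abstract does not state which ICC form was used, so the one-way random-effects form below is an assumption; the function name and toy ratings are hypothetical.

```python
from typing import Sequence


def icc_oneway(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC for single ratings, often written ICC(1,1).

    `ratings` holds one row per subject and one column per rating
    (for example, a patient score and a significant-other score).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means)
                    for x in row) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ratings have no variance")
    return (ms_between - ms_within) / denominator


# Hypothetical toy data: four subjects, two ratings each.
toy_ratings = [[1, 2], [3, 3], [5, 6], [7, 7]]
icc = icc_oneway(toy_ratings)
```

Two-way forms that model a systematic rater effect exist as well and can give different values on the same data, which matters when one group of raters scores consistently higher than the other.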
Cross-cultural adaptation and validation of the Korean version of the neck disability index.

PubMed

Song, Kyung-Jin; Choi, Byung-Wan; Choi, Byung-Ryeul; Seo, Gyeu-Beom

2010-09-15

Validation of a translated, culturally adapted questionnaire. The purpose of this study is to translate and culturally adapt the Neck Disability Index (NDI) and to validate the use of the derived version in Korean patients. Although several valid measures exist for measurement of neck pain and functional impairment, these measures have not yet been validated in a Korean version. The NDI was linguistically translated into Korean, and a prefinal version was assessed and modified by a pilot study. The reliability and validity of the derived Korean version were examined in 78 patients with degenerative cervical spine disease. Test-retest reliability, internal consistency, and construct validity were investigated by comparing Visual Analogue Scale (VAS) and Short Form Health Survey (SF-36) scores. Factor analysis of the Korean NDI extracted 2 factors with eigenvalues >1. The intraclass correlation coefficient of test-retest reliability was 0.93. Reliability, estimated by internal consistency, had a Cronbach alpha value of 0.82. The correlation between NDI and VAS scores was r = 0.49, and the correlation between NDI and SF-36 scores was r = -0.44. The physical health component score of SF-36 was highly correlated with NDI, and the correlation between VAS scores and the mental health component scores of SF-36 was high. The derived Korean version of the NDI was found to be a reliable and valid instrument for measuring disability in Korean patients with cervical problems. The authors recommend its use in future Korean clinical studies.
Development and reliability of a preliminary Foot Osteoarthritis Magnetic Resonance Imaging Score

PubMed Central

Halstead, Jill; Martín-Hervás, Carmen; Hensor, Elizabeth MA; McGonagle, Dennis; Keenan, Anne-Maree

2017-01-01

Objective: Foot osteoarthritis (OA) is a very common but under-investigated musculoskeletal condition, and there is little consensus as to common MRI features. The aim of this study was to develop a preliminary foot OA MRI score (FOAMRIS) and evaluate its reliability. Methods: This preliminary semi-quantitative score included the hindfoot, midfoot and metatarsophalangeal joints. Joints were scored for joint space narrowing (JSN, 0-3), osteophytes (0-3), joint effusion-synovitis and bone cysts (present/absent). Erosions and bone marrow lesions (BMLs) were scored (0-3) and BMLs were evaluated adjacent to entheses and at sub-tendon sites (present/absent). Additionally, tenosynovitis was scored (0-3) and midfoot ligament pathology was scored (present/absent). Reliability was evaluated in 15 people with foot pain and MRI-detected OA using 3.0T MRI multi-sequence protocols and assessed using intraclass correlation coefficients (ICC) as an overall score and per anatomical site (see supplementary data). Results: Intra-reader agreement (ICC) was generally good to excellent across the foot in joint features (JSN 0.94, osteophytes 0.94, effusion-synovitis 0.62 and cysts 0.93), bone features (BML 0.89, erosion 0.78, BML-entheses 0.79, BML sub-tendon 0.75) and soft-tissue features (tenosynovitis 0.90, ligaments 0.87). Inter-reader agreement was lower for joint features (JSN 0.60, osteophytes 0.41, effusion-synovitis 0.03 and cysts 0.65), bone features (BML 0.80, erosion 0.00, BML-entheses 0.49, BML sub-tendon -0.24) and soft-tissue features (tenosynovitis 0.48, ligaments 0.50). Conclusion: This preliminary FOAMRIS demonstrated good intra-reader reliability and fair inter-reader reliability when assessing the total feature scores. Further development is required in cohorts with a range of pathologies and to assess the psychometric measurement properties. PMID:28572462
The modified gait abnormality rating scale in patients with a conversion disorder: a reliability and responsiveness study.

PubMed

Vandenberg, Justin M; George, Deanna R; O'Leary, Andrea J; Olson, Lindsay C; Strassburg, Kaitlyn R; Hollman, John H

2015-01-01

Individuals with conversion disorder have neurologic symptoms that are not identified by an underlying organic cause. Often the symptoms manifest as gait disturbances. The modified gait abnormality rating scale (GARS-M) may be useful for quantifying gait abnormalities in these individuals. The purpose of this study was to examine the reliability, responsiveness and concurrent validity of GARS-M scores in individuals with conversion disorder. Data from 27 individuals who completed a rehabilitation program were included in this study. Pre- and post-intervention videos were obtained and walking speed was measured. Five examiners independently evaluated gait performance according to the GARS-M criteria. Inter- and intrarater reliability of GARS-M scores were estimated with intraclass correlation coefficients (ICCs). Responsiveness was estimated with the minimum detectable change (MDC). Pre- to post-treatment changes in GARS-M scores were analyzed with a dependent t-test. The correlation between GARS-M scores and walking speed was analyzed to assess concurrent validity. GARS-M scores were quantified with good-to-excellent inter- (ICC = 0.878) and intrarater reliability (ICC = 0.989). The MDC was 2 points. Mean GARS-M scores decreased from 7 ± 5 at baseline to 1 ± 2 at discharge (t26 = 7.411, p < 0.001) and 85% of patients improved beyond the MDC. Furthermore, GARS-M scores and walking speed measurements were moderately correlated (r = -0.582, p = 0.004), indicating that the GARS-M has acceptable concurrent validity. Our findings provide evidence that the GARS-M scores are reliable, valid and responsive for quantifying gait abnormalities in patients with conversion disorder. GARS-M scores provide objective measures upon which treatment effects can be assessed. Copyright © 2014 Elsevier B.V. All rights reserved.
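The minimum detectable change (MDC) used in this abstract as a responsiveness index is commonly derived from the standard error of measurement: SEM = SD × sqrt(1 - reliability) and MDC95 = 1.96 × sqrt(2) × SEM. The abstract does not give the formula the authors applied, so this conventional form is an assumption; the function names and input values below are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), with reliability such as an ICC."""
    if sd < 0 or not 0.0 <= reliability <= 1.0:
        raise ValueError("sd must be >= 0 and reliability within [0, 1]")
    return sd * sqrt(1.0 - reliability)


def minimum_detectable_change(sd: float, reliability: float, z: float = 1.96) -> float:
    """MDC at the confidence level implied by z (1.96 for 95%).

    The sqrt(2) factor reflects measurement error in both of the
    two scores being compared.
    """
    return z * sqrt(2.0) * standard_error_of_measurement(sd, reliability)


# Hypothetical inputs (not taken from the study): SD of 4 points, ICC of 0.75.
mdc95 = minimum_detectable_change(sd=4.0, reliability=0.75)
```

A change score smaller than the MDC cannot be distinguished from measurement error at the chosen confidence level, which is why the abstract reports the share of patients who improved beyond it.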
Reliability of the modified Gross Motor Function Measure-88 (GMFM-88) for children with both Spastic Cerebral Palsy and Cerebral Visual Impairment: A preliminary study.

PubMed

Salavati, M; Krijnen, W P; Rameckers, E A A; Looijestijn, P L; Maathuis, C G B; van der Schans, C P; Steenbergen, B

2015-01-01

The aims of this study were to adapt the Gross Motor Function Measure-88 (GMFM-88) for children with Cerebral Palsy (CP) and Cerebral Visual Impairment (CVI) and to determine the test-retest and interobserver reliability of the adapted version. Sixteen paediatric physical therapists familiar with CVI participated in the adaptation process. The Delphi method was used to gain consensus among a panel of experts. Seventy-seven children with CP and CVI (44 boys and 33 girls, aged between 50 and 144 months) participated in this study. To assess test-retest and interobserver reliability, the GMFM-88 was administered twice within three weeks (Mean=9 days, SD=6 days) by trained paediatric physical therapists, one of whom was familiar with the child and one who wasn't. Percentages of identical scores, Cronbach's alphas and intraclass correlation coefficients (ICC) were computed for each dimension level. All experts agreed on the proposed adaptations of the GMFM-88 for children with CP and CVI. Test-retest reliability ICCs for dimension scores were between 0.94 and 1.00, mean percentages of identical scores between 29 and 71, and interobserver reliability ICCs of the adapted GMFM-88 were 0.99-1.00 for dimension scores. Mean percentages of identical scores varied between 53 and 91. Test-retest and interobserver reliability of the GMFM-88-CVI for children with CP and CVI was excellent. Internal consistency of dimension scores lay between 0.97 and 1.00. The psychometric properties of the adapted GMFM-88 for children with CP and CVI are reliable and comparable to the original GMFM-88. Copyright © 2015 Elsevier Ltd. All rights reserved.
Cross-cultural adaptation and validation of the sino-nasal outcome test (SNOT-22) for Spanish-speaking patients.

PubMed

de los Santos, Gonzalo; Reyes, Pablo; del Castillo, Raúl; Fragola, Claudio; Royuela, Ana

2015-11-01

Our objective was to perform translation, cross-cultural adaptation and validation of the sino-nasal outcome test 22 (SNOT-22) into Spanish. The SNOT-22 was translated and back translated, and a pretest trial was performed. The study included 119 individuals divided into 60 cases, who met diagnostic criteria for chronic rhinosinusitis according to the European Position Paper on Rhinosinusitis 2012, and 59 controls, who reported no sino-nasal disease. Internal consistency was evaluated with Cronbach's alpha test, reproducibility with the Kappa coefficient, reliability with the intraclass correlation coefficient (ICC), validity with the Mann-Whitney U test and responsiveness with the Wilcoxon test. In cases, Cronbach's alpha was 0.91 both before and after treatment; for controls, it was 0.90 at their first test assessment and 0.88 at 3 weeks. The Kappa coefficient was calculated for each item, with an average score of 0.69. The ICC was also calculated for each item, with a score of 0.87 for the overall score and an average among all items of 0.71. The median score was 47 for cases and 2 for controls, a highly significant difference (Mann-Whitney U test, p < 0.001). Clinical changes were observed among treated patients, with median scores of 47 and 13.5 before and after treatment, respectively (Wilcoxon test, p < 0.001). The effect size was 0.14 in treated patients whose status at 3 weeks was unvarying, 1.03 in those who were better and 1.89 in the much better group. All controls were unvarying, with an effect size of 0.05. The Spanish version of the SNOT-22 has the internal consistency, reliability, reproducibility, validity and responsiveness necessary to be a valid instrument for use in clinical practice.

Validity and Reliability of the Persian Version of Baecke Habitual Physical Activity Questionnaire in Healthy Subjects.

PubMed

Sadeghisani, Meissam; Dehghan Manshadi, Farideh; Azimi, Hadi; Montazeri, Ali

2016-09-01

The Baecke Habitual Physical Activity Questionnaire (BHPAQ) has been widely employed in clinical and laboratory studies as a tool for measuring subjects' physical activities. However, the reliability and validity of this questionnaire have not been investigated among Persian speakers. Therefore, the aim of the current study was to examine the reliability and validity of the Persian version of the BHPAQ in healthy Persian adults. After following the process of forward-backward translation, 32 subjects were invited to fill out the Persian version of the questionnaire in two independent sessions (3 - 7 days after the first session) in order to determine the reliability index. Also, the validity of the questionnaire was assessed through concurrent validity by 126 subjects (66 males and 60 females) answering both the Baecke and the International Physical Activity Questionnaire (IPAQ). An acceptable level of intraclass correlation coefficient (ICC of work score = 0.95, sport score = 0.93, and leisure score = 0.77) was achieved for the Persian Baecke questionnaire. Correlations between the Persian Baecke and the IPAQ with and without the score for sitting position were found to be 0.19 and 0.36, respectively. The Persian version of the BHPAQ is a reliable and valid instrument that can be used to measure the level of habitual functional activities in Persian-speaking subjects.
Development of the Japanese 15D instrument of health-related quality of life: verification of reliability and validity among elderly people.

PubMed

Okamoto, Nozomi; Hisashige, Akinori; Tanaka, Yuu; Kurumatani, Norio

2013-01-01

The 15D is a self-administered questionnaire for assessment of health-related quality of life, which contains 15 questions with 5 response options each. This study was conducted to evaluate the reliability and validity of the Japanese 15D. The subjects were 430 community-dwelling elderly people. Each item of the 15D was scored on a 5-point Likert scale, with level 1 being the best, score 1. Reliability was assessed by determination of the internal consistency and test-retest reliability. Criterion-based validity was assessed using the Japanese version of the Nottingham Health Profile (NHP) and Tokyo Metropolitan Institute of Gerontology Index of Competence (TMIG index). Acceptability was assessed by inquiring about the time required to complete the questionnaire and the burden felt in responding to it. The answers of 423 individuals who responded to all items were analyzed. The median time required to complete the questionnaire was 5.0 minutes, and the proportion of subjects who indicated that the questionnaire was easy to complete was 98.3%. The Cronbach's alpha coefficients for all 15 items in the 2 surveys were 0.793 and 0.792, respectively. The intraclass correlation coefficients for the 15 items ranged from 0.44 to 0.72. In the relationship between the 15D and the NHP, the correlation coefficients between the corresponding domains were higher than those between non-corresponding domains. The prevalence of disability in higher-level functional capacity was higher in the "level 2 to 5" group than in the "level 1" group. The Japanese version of the 15D showed sufficient internal consistency and moderate repeatability. Because of the short time required to complete the Japanese 15D and the significant relationships between the scores on the 15D and the NHP, and between the 15D and higher-level functional capacity, the acceptability and validity of the Japanese 15D were considered to be sufficient.
Reliability and validity of the new Tanaka B Intelligence Scale scores: a group intelligence test.

PubMed

Uno, Yota; Mizukami, Hitomi; Ando, Masahiko; Yukihiro, Ryoji; Iwasaki, Yoko; Ozaki, Norio

2014-01-01

The present study evaluated the reliability and concurrent validity of the new Tanaka B Intelligence Scale, which is an intelligence test that can be administered to groups within a short period of time. The new Tanaka B Intelligence Scale and Wechsler Intelligence Scale for Children-Third Edition were administered to 81 subjects (mean age ± SD 15.2 ± 0.7 years) residing in a juvenile detention home; reliability was assessed using Cronbach's alpha coefficient, and concurrent validity was assessed using the one-way analysis of variance intraclass correlation coefficient. Moreover, receiver operating characteristic analysis for screening for individuals who have a deficit in intellectual function (an FIQ<70) was performed. In addition, stratum-specific likelihood ratios for detection of intellectual disability were calculated. The Cronbach's alpha for the new Tanaka B Intelligence Scale IQ (BIQ) was 0.86, and the intraclass correlation coefficient with FIQ was 0.83. Receiver operating characteristic analysis demonstrated an area under the curve of 0.89 (95% CI: 0.85-0.96). In addition, the stratum-specific likelihood ratio for the BIQ≤65 stratum was 13.8 (95% CI: 3.9-48.9), and the stratum-specific likelihood ratio for the BIQ≥76 stratum was 0.1 (95% CI: 0.03-0.4). Thus, intellectual disability could be ruled out or determined. The present results demonstrated that the new Tanaka B Intelligence Scale score had high reliability and concurrent validity with the Wechsler Intelligence Scale for Children-Third Edition score. Moreover, the post-test probability for the BIQ could be calculated when screening for individuals who have a deficit in intellectual function. The new Tanaka B Intelligence Test is convenient and can be administered within a variety of settings. This enables evaluation of intellectual development even in settings where performing intelligence tests has previously been difficult.
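The post-test probability mentioned in this abstract follows from a likelihood ratio by converting the pre-test probability to odds, multiplying by the likelihood ratio, and converting back to a probability. A minimal sketch follows; the function name is hypothetical, and the 50% pre-test probability is an illustrative assumption, not a prevalence reported by the study.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Stratum-specific likelihood ratios quoted in the abstract, combined with
# an assumed (hypothetical) pre-test probability of 0.5.
assumed_pre_test = 0.5
p_low_stratum = post_test_probability(assumed_pre_test, 13.8)   # BIQ <= 65
p_high_stratum = post_test_probability(assumed_pre_test, 0.1)   # BIQ >= 76
```

Under that assumed pre-test probability, a likelihood ratio of 13.8 raises the probability to roughly 0.93 and a likelihood ratio of 0.1 lowers it to roughly 0.09; other pre-test probabilities give different results.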
Reliability and Validity of the New Tanaka B Intelligence Scale Scores: A Group Intelligence Test

PubMed Central

Uno, Yota; Mizukami, Hitomi; Ando, Masahiko; Yukihiro, Ryoji; Iwasaki, Yoko; Ozaki, Norio

2014-01-01

Objective: The present study evaluated the reliability and concurrent validity of the new Tanaka B Intelligence Scale, which is an intelligence test that can be administered to groups within a short period of time. Methods: The new Tanaka B Intelligence Scale and Wechsler Intelligence Scale for Children-Third Edition were administered to 81 subjects (mean age ± SD 15.2±0.7 years) residing in a juvenile detention home; reliability was assessed using Cronbach’s alpha coefficient, and concurrent validity was assessed using the one-way analysis of variance intraclass correlation coefficient. Moreover, receiver operating characteristic analysis for screening for individuals who have a deficit in intellectual function (an FIQ<70) was performed. In addition, stratum-specific likelihood ratios for detection of intellectual disability were calculated. Results: The Cronbach’s alpha for the new Tanaka B Intelligence Scale IQ (BIQ) was 0.86, and the intraclass correlation coefficient with FIQ was 0.83. Receiver operating characteristic analysis demonstrated an area under the curve of 0.89 (95% CI: 0.85–0.96). In addition, the stratum-specific likelihood ratio for the BIQ≤65 stratum was 13.8 (95% CI: 3.9–48.9), and the stratum-specific likelihood ratio for the BIQ≥76 stratum was 0.1 (95% CI: 0.03–0.4). Thus, intellectual disability could be ruled out or determined. Conclusion: The present results demonstrated that the new Tanaka B Intelligence Scale score had high reliability and concurrent validity with the Wechsler Intelligence Scale for Children-Third Edition score. Moreover, the post-test probability for the BIQ could be calculated when screening for individuals who have a deficit in intellectual function. The new Tanaka B Intelligence Test is convenient and can be administered within a variety of settings. This enables evaluation of intellectual development even in settings where performing intelligence tests has previously been difficult. PMID:24940880
Critical thinking evaluation in reflective writing: Development and testing of Carter Assessment of Critical Thinking in Midwifery (Reflection).

PubMed

Carter, Amanda G; Creedy, Debra K; Sidebotham, Mary

2017-11-01

To develop and test a tool designed for use by academics to evaluate pre-registration midwifery students' critical thinking skills in reflective writing. A descriptive cohort design was used. A random sample (n = 100) of archived student reflective writings, based on a clinical event or experience during 2014 and 2015, was drawn. A staged model for tool development was used to develop a fifteen-item scale involving item generation; mapping of draft items to critical thinking concepts and expert review to test content validity; inter-rater reliability testing; pilot testing of the tool on 100 reflective writings; and psychometric testing. Item scores were analysed for mean, range and standard deviation. Internal reliability, content and construct validity were assessed. Expert review of the tool revealed a high content validity index score of 0.98. Using two independent raters to establish inter-rater reliability, good absolute agreement of 72% was achieved with a Kappa coefficient K = 0.43 (p<0.0001). Construct validity via exploratory factor analysis revealed three factors: analyses context, reasoned inquiry, and self-evaluation. The mean total score for the tool was 50.48 (SD = 12.86). Total and subscale scores correlated significantly. The scale achieved good internal reliability with a Cronbach's alpha coefficient of .93. This study established the reliability and validity of the CACTiM (reflection) for use by academics to evaluate midwifery students' critical thinking in reflective writing. Validation with large diverse samples is warranted. Reflective practice is a key learning and teaching strategy in undergraduate Bachelor of Midwifery programmes and essential for safe, competent practice. There is the potential to enhance critical thinking development by assessing reflective writing with the CACTiM (reflection) tool to provide formative and summative feedback to students and inform teaching strategies. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
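The abstract above reports 72% absolute agreement alongside a kappa of 0.43. The gap between the two comes from the chance-agreement correction in Cohen's kappa, sketched below. The 2x2 table in the example is hypothetical, and the function names are illustrative; the abstract does not give the raters' cross-tabulation.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[float]]) -> float:
    """Unweighted Cohen's kappa from a square rater-by-rater count table."""
    size = len(table)
    if any(len(row) != size for row in table):
        raise ValueError("table must be square")
    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("table must contain at least one rating")
    observed = sum(table[i][i] for i in range(size)) / total
    row_margins = [sum(row) / total for row in table]
    col_margins = [sum(table[i][j] for i in range(size)) / total for j in range(size)]
    expected = sum(r * c for r, c in zip(row_margins, col_margins))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def implied_chance_agreement(observed_agreement: float, kappa: float) -> float:
    """Solve kappa = (po - pe) / (1 - pe) for the chance agreement pe."""
    return (observed_agreement - kappa) / (1.0 - kappa)


if __name__ == "__main__":
    hypothetical_table = [[20, 5], [10, 15]]  # illustrative counts only
    print(f"kappa = {cohens_kappa(hypothetical_table):.2f}")
    # Reported values from the abstract: 72% agreement, kappa 0.43.
    print(f"implied chance agreement = {implied_chance_agreement(0.72, 0.43):.2f}")
```

Under the unweighted formula, the reported figures imply that roughly half of the observed agreement would be expected by chance, assuming the authors used unweighted kappa, which the abstract does not state.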
Reliability and validity of the Korean version of Pediatric Voice Handicap Index: in school age children.

PubMed

Park, Sung Shin; Kwon, Tack-Kyun; Choi, Seong Hee; Lee, Won Yong; Hong, Young Hye; Jeong, Nyun Gi; Sung, Myung-Whun; Kim, Kwang Hyun

2013-01-01

The aim of this study was to assess the reliability and validity of the Pediatric Voice Handicap Index (pVHI) for cross-cultural adaptation of the Korean version with school age children. The questionnaire was translated into Korean and was completed by 101 Korean parents who have children with or without disordered voice. The Korean version-pVHI scores were obtained from 60 parents of normal children and 41 parents who have children with voice problems. Content validity was verified by five experienced speech-language pathologists with clinical specialization in voice disorders. Internal consistency was calculated through Cronbach's α coefficient, and test-retest reliability of the Korean version-pVHI score was determined using Pearson product-moment correlation coefficients. The Mann-Whitney U test was used to compare GRBAS ratings and Korean version-pVHI scores between the normal and dysphonia groups. The relationship between the parent-reported Korean version-pVHI total scores and perceptual ratings of voice quality from experts was investigated using Spearman correlation coefficients. The results showed that the Korean version-pVHI provided a high internal consistency (α=0.92) and test-retest reliability of its subscales: total (T) 0.97, functional (F) 0.90, physical (P) 0.95, emotional (E) 0.92. The Korean version-pVHI mean scores in the normal group were 1.28 (T), 0.62 (F), 0.35 (P) and 0.32 (E), respectively, whereas those in the group of children with dysphonia were 23.13 (T), 8.90 (F), 9.54 (P) and 4.93 (E). Significant differences in the Korean version-pVHI (T, F, P, E) and perceptual evaluation (grade, rough, breathy) between the normal and dysphonia groups were revealed (P<0.05). Moreover, a moderate-to-high correlation between the Korean version-pVHI total score (T) and the perceptual grade (G) was exhibited in children with dysphonia. The subjective Korean version-pVHI can be an applicable and useful supplementary tool for evaluating parents' perception of their children's voice dysfunction, identifying multiple factors in daily life affecting their children's voice, and measuring treatment efficacy before and after therapeutic intervention. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Translation and validation of a Nepalese version of the Psychosocial Impact of Dental Aesthetic Questionnaire (PIDAQ).

PubMed

Singh, Varun Pratap; Singh, Rajkumar

2014-03-01

The aim of this study was to develop a reliable and valid Nepali version of the Psychosocial Impact of Dental Aesthetic Questionnaire (PIDAQ). Cross-sectional descriptive validation study. B.P. Koirala Institute of Health Sciences, Dharan, Nepal. A rigorous translation process including conceptual and semantic evaluation, translation, back translation and pre-testing was carried out. Two hundred and fifty-two undergraduates, including equal numbers of males and females with an age ranging from 18 to 29 years (mean age: 22·33±2·114 years), participated in this study. Reliability was assessed by Cronbach's alpha coefficient, and the coefficient of correlation was used to assess correlation between items and test-retest reliability. The construct validity was tested by factorial analysis. Convergent construct validity was tested by comparison of PIDAQ scores with the aesthetic component of the index of orthodontic treatment needs (IOTN-AC) and perception of occlusion scale (POS), respectively. Discriminant construct validity was assessed by differences in score between those who demanded treatment and those who did not. The response rate was 100%. One hundred and twenty-three individuals had a demand for orthodontic treatment. The Nepali PIDAQ had excellent reliability with Cronbach's alpha of 0·945, corrected item correlation between 0·525 and 0·790 and overall test-retest reliability of 0·978. The construct validity was good with formation of a new sub-domain 'Dental self-consciousness'. The scale had good correlation with IOTN-AC and POS, fulfilling convergent construct validity. The discriminant construct validity was proved by significant differences in scores for subjects with demand and without demand for treatment. To conclude, the Nepali version of the PIDAQ has good psychometric properties and can be used effectively in this population group for further research.
Translation, cross-cultural adaptation, and validation of the Turkish version of the Harris Hip Score.

PubMed

Çelik, Derya; Can, Canan; Aslan, Yasemin; Ceylan, Hasan Huseyin; Bilsel, Kerem; Ozdincler, Arzu Razak

2014-01-01

The Harris Hip Score (HHS) was developed to assess function and pain from the perspective of patients with hip pathologies. The purpose of this study was to translate and culturally adapt the HHS into Turkish, and thereby determine the reliability and validity of the translated version. The HHS was translated into Turkish in accordance with the stages recommended by Beaton. The measurement properties of the HHS were tested in 80 patients (52 males; mean age 51 years, range 21-75 years) suffering from different hip pathologies. The test-retest reliability was tested in 58 patients (28 males; mean age 52 years, range 30-73 years) after an interval of seven days. Cronbach's alpha was used to assess internal consistency and the intra-class correlation coefficient (ICC) was used to estimate the test-retest reliability. Patients were asked to answer the Oxford Hip Score (OHS), the Western Ontario and McMaster Universities Arthritis Index (WOMAC), the VAS and the Short Form-36 (SF-36) for the validity of the estimation. The Turkish version of the HHS showed sufficient internal consistency (Cronbach's alpha, 0.70) and test-retest reliability (ICC = 0.91). The correlation coefficients between the HHS and the WOMAC and between the HHS and the OHS were 0.64 and 0.89, respectively. The highest correlations between the HHS and SF-36 were with the physical function scale (r = 0.72), and the lowest correlations were with the mental function scale (r = 0.10). We observed no floor or ceiling effects. The Turkish version of the HHS has sufficient reliability and validity to measure patient-reported outcome for Turkish-speaking individuals with a variety of hip disorders.
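Several records in this listing, including the one above, report internal consistency as Cronbach's alpha. For orientation, a minimal sketch of the usual raw-score formula is given here, using only the Python standard library. The function name and the tiny data set are illustrative assumptions; none of the numbers come from the studies listed.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_respondents = len(scores)
    n_items = len(scores[0]) if n_respondents else 0
    if n_respondents < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    toy_scores = [  # hypothetical responses: 4 respondents, 3 items
        [3, 4, 3],
        [2, 2, 3],
        [5, 4, 5],
        [1, 2, 1],
    ]
    print(f"alpha = {cronbach_alpha(toy_scores):.3f}")
```

This is the raw-score coefficient only; as the first record in this listing notes, it does not by itself describe the reliability of transformed scale scores.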
Developing an oropharyngeal cancer (OPC) knowledge and behaviors survey.

PubMed

Dodd, Virginia J; Riley Iii, Joseph L; Logan, Henrietta L

2012-09-01

To use the community participation research model to (1) develop a survey assessing knowledge about mouth and throat cancer and (2) field test and establish test-retest reliability with the newly developed instrument. Cognitive interviews were conducted with primarily rural African American adults to assess their perception and interpretation of survey items. Test-retest reliability was established with a racially diverse rural population. Test-retest reliabilities ranged from .79 to .40 for screening awareness and .74 to .19 for knowledge. Coefficients increased for composite scores. Community participation methodology provided a culturally appropriate survey instrument that demonstrated acceptable levels of reliability.
Manual muscle testing and hand-held dynamometry in people with inflammatory myopathy: An intra- and interrater reliability and validity study

PubMed Central

Baschung Pfister, Pierrette; Sterkele, Iris; Maurer, Britta; de Bie, Rob A.; Knols, Ruud H.

2018-01-01

Manual muscle testing (MMT) and hand-held dynamometry (HHD) are commonly used in people with inflammatory myopathy (IM), but their clinimetric properties have not yet been sufficiently studied. To evaluate the reliability and validity of MMT and HHD, maximum isometric strength was measured in eight muscle groups across three measurement events. To evaluate reliability of HHD, intra-class correlation coefficients (ICC), the standard error of measurements (SEM) and smallest detectable changes (SDC) were calculated. To measure reliability of MMT, linear Cohen's kappa was computed for single muscle groups and the ICC for the total score. Additionally, correlations between MMT8 and HHD were evaluated with Spearman correlation coefficients. Fifty people with myositis (56±14 years, 76% female) were included in the study. Intra- and interrater reliability of HHD yielded excellent ICCs (0.75–0.97) for all muscle groups, except for interrater reliability of ankle extension (0.61). The corresponding SEMs% ranged from 8 to 28% and the SDCs% from 23 to 65%. MMT8 total score revealed excellent intra- and interrater reliability (ICC>0.9). Intrarater reliability of single muscle groups was substantial for shoulder and hip abduction, elbow and neck flexion, and hip extension (0.64–0.69); moderate for wrist (0.53) and knee extension (0.49); and fair for ankle extension (0.35). Interrater reliability was moderate for neck flexion (0.54) and hip abduction (0.44); fair for shoulder abduction, elbow flexion, wrist and ankle extension (0.20–0.33); and slight for knee extension (0.08). Correlations between the two tests were low for wrist, knee, ankle, and hip extension; moderate for elbow flexion, neck flexion and hip abduction; and good for shoulder abduction. In conclusion, the MMT8 total score is a reliable assessment to consider general muscle weakness in people with myositis but not for single muscle groups. In contrast, our results confirm that HHD can be recommended to evaluate strength of single muscle groups. PMID:29596450
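The abstract above reports SEM and SDC values next to the ICCs. One common way to derive them, sketched below, is SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM. This is an assumption about the general approach; the abstract does not state which formulas the authors used, and the standard deviation in the example is hypothetical.

```python
import math


def sem_from_icc(standard_deviation: float, icc: float) -> float:
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    if standard_deviation < 0:
        raise ValueError("standard_deviation must be non-negative")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return standard_deviation * math.sqrt(1.0 - icc)


def sdc_from_sem(sem: float, z: float = 1.96) -> float:
    """Smallest detectable change for an individual at the confidence level implied by z."""
    # sqrt(2) accounts for measurement error in both of the two measurements compared.
    return z * math.sqrt(2.0) * sem


if __name__ == "__main__":
    assumed_sd = 10.0  # hypothetical between-subject SD, in the unit of the measure
    for icc in (0.75, 0.97):  # the range of HHD ICCs reported in the abstract
        sem = sem_from_icc(assumed_sd, icc)
        print(f"ICC={icc}: SEM={sem:.2f}, SDC={sdc_from_sem(sem):.2f}")
```

Because the SDC is a fixed multiple of the SEM (about 2.77 at z = 1.96), a muscle group with a lower ICC needs a proportionally larger change before that change can be distinguished from measurement error.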
The Effect of Music on the Test Scores of the Students in Limits and Derivatives Subject in the Mathematics Exams Done with Music

ERIC Educational Resources Information Center

Kesan, Cenk; Ozkalkan, Zuhal; Iric, Hamdullah; Kaya, Deniz

2012-01-01

This study sought to determine whether students' test scores on exams covering limits and derivatives differed according to the type of music listened to, compared with an environment without music. For this purpose, the achievement test including limits and derivatives and whose reliability coefficient of Cronbach Alpha is…
The relative reliability of actively participating and passively observing raters in a simulation-based assessment for selection to specialty training in anaesthesia.

PubMed

Roberts, M J; Gale, T C E; Sice, P J A; Anderson, I R

2013-06-01

Selection to specialty training is a high-stakes assessment demanding valuable consultant time. In one initial entry level and two higher level anaesthesia selection centres, we investigated the feasibility of using staff participating in simulation scenarios, rather than observing consultants, to rate candidate performance. We compared participant and observer scores using four different outcomes: inter-rater reliability; score distributions; correlation of candidate rankings; and percentage of candidates whose selection might be affected by substituting participants' for observers' ratings. Inter-rater reliability between observers was good (correlation coefficient 0.73-0.96) but lower between participants (correlation coefficient 0.39-0.92), particularly at higher level where participants also rated candidates more favourably than did observers. Station rank orderings were strongly correlated between the rater groups at entry level (rho 0.81, p < 0.001) but weaker at the two higher level centres (rho 0.52, p = 0.018; rho 0.58, p = 0.001). Substituting participants' for observers' ratings had less effect once scores were combined with those from other selection centre stations. Selection decisions for 0-20% of candidates could have changed, depending on the numbers of training posts available. We conclude that using participating raters is feasible at initial entry level only. Anaesthesia © 2013 The Association of Anaesthetists of Great Britain and Ireland.
Adaptation and Validation of a Nutrition Environment Measures Survey for University Grab-and-Go Establishments.

PubMed

Lo, Brian K C; Minaker, Leia; Chan, Alicia N T; Hrgetic, Jessica; Mah, Catherine L

2016-03-01

To adapt and validate a survey instrument to assess the nutrition environment of grab-and-go establishments at a university campus. A version of the Nutrition Environment Measures Survey for grab-and-go establishments (NEMS-GG) was adapted from existing NEMS instruments and tested for reliability and validity through a cross-sectional assessment of the grab-and-go establishments at the University of Toronto. Product availability, price, and presence of nutrition information were evaluated. Cohen's kappa coefficient and intra-class correlation coefficients (ICC) were assessed for inter-rater reliability, and construct validity was assessed using the known-groups comparison method (via store scores). Fifteen grab-and-go establishments were assessed. Inter-rater reliability was high with an almost perfect agreement for availability (mean κ = 0.995) and store scores (ICC = 0.999). The tool demonstrated good face and construct validity. About half of the venues carried fruit and vegetables (46.7% and 53.3%, respectively). Regular and healthier entrée items were generally the same price. Healthier grains were cheaper than regular options. Six establishments displayed nutrition information. Establishments operated by the university's Food Services consistently scored the highest across all food premise types for nutrition signage, availability, and cost of healthier options. Health promotion strategies are needed to address availability and variety of healthier grab-and-go options in university settings.
A polarized light microscopy method for accurate and reliable grading of collagen organization in cartilage repair.

PubMed

Changoor, A; Tran-Khanh, N; Méthot, S; Garon, M; Hurtig, M B; Shive, M S; Buschmann, M D

2011-01-01

Collagen organization, a feature that is critical for cartilage load bearing and durability, is not adequately assessed in cartilage repair tissue by present histological scoring systems. Our objectives were to develop a new polarized light microscopy (PLM) score for collagen organization and to test its reliability. This PLM score uses an ordinal scale of 0-5 to rate the extent that collagen network organization resembles that of young adult hyaline articular cartilage (score of 5) vs a totally disorganized tissue (score of 0). Inter-reader reliability was assessed using Intraclass Correlation Coefficients (ICC) for Agreement, calculated from scores of three trained readers who independently evaluated blinded sections obtained from normal (n=4), degraded (n=2) and repair (n=22) human cartilage biopsies. The PLM score succeeded in distinguishing normal, degraded and repair cartilages, where the latter displayed greater complexity in collagen structure. Excellent inter-reader reproducibility was found with ICCs for Agreement of 0.90 [ICC(2,1)] (lower boundary of the 95% confidence interval is 0.83) and 0.96 [ICC(2,3)] (lower boundary of the 95% confidence interval is 0.94), indicating the reliability of a single reader's scores and the mean of all three readers' scores, respectively. This PLM method offers a novel means for systematically evaluating collagen organization in repair cartilage. We propose that it be used to supplement current gold standard histological scoring systems for a more complete assessment of repair tissue quality. Copyright © 2010 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.
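The abstract above reports both a single-reader coefficient, ICC(2,1) = 0.90, and an average-of-three-readers coefficient, ICC(2,3) = 0.96. The two are linked by the Spearman-Brown step-up relation, sketched below as a quick consistency check. The function name is illustrative; the study itself computed both coefficients from its data rather than from this shortcut.

```python
def average_measure_icc(single_measure_icc: float, n_raters: int) -> float:
    """Step a single-rater ICC up to the reliability of the mean of n_raters raters.

    ICC(k) = k * ICC(1) / (1 + (k - 1) * ICC(1))
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    if not 0.0 <= single_measure_icc <= 1.0:
        raise ValueError("single_measure_icc must lie between 0 and 1")
    return n_raters * single_measure_icc / (1.0 + (n_raters - 1) * single_measure_icc)


if __name__ == "__main__":
    # Reported single-reader value from the abstract, stepped up to three readers.
    stepped_up = average_measure_icc(0.90, 3)
    print(f"ICC for the mean of 3 readers = {stepped_up:.2f}")
```

Stepping 0.90 up to three readers gives 2.7 / 2.8, which rounds to the reported 0.96, so the two published coefficients agree with each other.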
Alberta infant motor scale: reliability and validity when used on preterm infants in Taiwan.

PubMed

Jeng, S F; Yau, K I; Chen, L C; Hsiao, S F

2000-02-01

The goal of this study was to examine the reliability and validity of measurements obtained with the Alberta Infant Motor Scale (AIMS) for evaluation of preterm infants in Taiwan. Two independent groups of preterm infants were used to investigate the reliability (n=45) and validity (n=41) for the AIMS. In the reliability study, the AIMS was administered to the infants by a physical therapist, and infant performance was videotaped. The performance was then rescored by the same therapist and by 2 other therapists to examine the intrarater and interrater reliability. In the validity study, the AIMS and the Bayley Motor Scale were administered to the infants at 6 and 12 months of age to examine criterion-related validity. Intraclass correlation coefficients (ICCs) for intrarater and interrater reliability of measurements obtained with the AIMS were high (ICC=.97-.99). The AIMS scores correlated with the Bayley Motor Scale scores at 6 and 12 months (r=.78 and .90), although the AIMS scores at 6 months were only moderately predictive of the motor function at 12 months (r=.56). The results suggest that measurements obtained with the AIMS have acceptable reliability and concurrent validity but limited predictive value for evaluating preterm Taiwanese infants.
Comparison of formula and number-right scoring in undergraduate medical training: a Rasch model analysis.

PubMed

Cecilio-Fernandes, Dario; Medema, Harro; Collares, Carlos Fernando; Schuwirth, Lambert; Cohen-Schotanus, Janke; Tio, René A

2017-11-09

Progress testing is an assessment tool used to periodically assess all students at the end-of-curriculum level. Because students cannot know everything, it is important that they recognize their lack of knowledge. For that reason, the formula-scoring method has usually been used. However, where partial knowledge needs to be taken into account, the number-right scoring method is used. Research comparing both methods has yielded conflicting results. As far as we know, in all these studies, Classical Test Theory or Generalizability Theory was used to analyze the data. In contrast to these studies, we will explore the use of the Rasch model to compare both methods. A 2 × 2 crossover design was used in a study where 298 students from four medical schools participated. A sample of 200 previously used questions from the progress tests was selected. The data were analyzed using the Rasch model, which provides fit parameters, reliability coefficients, and response option analysis. The fit parameters were in the optimal interval ranging from 0.50 to 1.50, and the means were around 1.00. The person and item reliability coefficients were higher in the number-right condition than in the formula-scoring condition. The response option analysis showed that the majority of dysfunctional items emerged in the formula-scoring condition. The findings of this study support the use of number-right scoring over formula scoring. Rasch model analyses showed that tests with number-right scoring have better psychometric properties than formula scoring. However, choosing the appropriate scoring method should depend not only on psychometric properties but also on self-directed test-taking strategies and metacognitive skills.
Braden scale (ALB) for assessing pressure ulcer risk in hospital patients: A validity and reliability study.

PubMed

Chen, Hong-Lin; Cao, Ying-Juan; Zhang, Wei; Wang, Jing; Huai, Bao-Sha

2017-02-01

The inter-rater reliability of the Braden Scale is limited. We modified the Braden scale into the Braden(ALB) scale by defining the nutrition subscale based on serum albumin, then assessed its validity and reliability in hospital patients. We designed a retrospective study for validity analysis, and a prospective study for reliability analysis. The receiver operating characteristic (ROC) curve and area under the curve (AUC) were used to evaluate the predictive validity. The intra-class correlation coefficient (ICC) was used to investigate the inter-rater reliability. Two thousand five hundred twenty-five patients were included for validity analysis; 76 patients (3.0%) developed pressure ulcer. A positive correlation was found between serum albumin and nutrition score in the Braden scale (Spearman's coefficient 0.2203, P<0.0001). The AUCs for the Braden scale and Braden(ALB) scale predicting pressure ulcer risk were 0.813 (95% CI 0.797-0.828; P<0.0001), and 0.859 (95% CI 0.845-0.872; P<0.0001), respectively. The Braden(ALB) scale was even more valid than the Braden scale (z=1.860, P=0.0628). In different age subgroups, the Braden(ALB) scale also seems more valid than the original Braden scale, but no statistically significant differences were found (P>0.05). The inter-rater reliability study showed that the ICC value increased by 45.9% for nutrition and by 4.3% for total score. The Braden(ALB) scale has similar validity compared with the original Braden scale for in-hospital patients. However, the inter-rater reliability was significantly increased. Copyright © 2016 Elsevier Inc. All rights reserved.
Reliability and sources of variation of the ABILHAND-Kids questionnaire in children with cerebral palsy.

PubMed

de Jong, Lex D; van Meeteren, Annemiek; Emmelot, Cornelis H; Land, Nanne E; Dijkstra, Pieter U

2018-03-01

To determine reliability of the ABILHAND-Kids, explore sources of variation associated with these measurement results, and generate repeatability coefficients. A reliability study with a repeated measures design was performed in an ambulatory rehabilitation care department of a rehabilitation center and a center for special education. A physician, an occupational therapist, and parents of 27 children with spastic cerebral palsy independently rated the children's manual capacity when performing 21 standardized tasks of the ABILHAND-Kids from video recordings twice with a three-week time interval (27 first and 25 second video recordings available). Parents additionally rated their children's performance based on their own perception of their child's ability to perform manual activities in everyday life, resulting in eight ratings per child. ABILHAND-Kids ratings were systematically different between observers, sessions, and rating method. Participant × observer interaction (66%) and residual variance (20%) contributed the most to error variance (9%). Test-retest reliability was 0.92. Repeatability coefficients (between 0.81 and 1.82 logit points) were largest for the parents' performance-based ratings. ABILHAND-Kids scores can be reliably used as a performance- and capacity-based rating method across different raters. Parents' performance-based ratings are less reliable than their capacity-based ratings. Resulting repeatability coefficients can be used to interpret ABILHAND-Kids ratings with more confidence. Implications for Rehabilitation: The ABILHAND-Kids is a valuable tool to assess a child's unimanual and bimanual upper limb activities. The reliability of the ABILHAND-Kids is good across different observers as a performance- and capacity-based rating method. Parents' performance-based ratings are less reliable than their capacity-based ones. This study has generated repeatability coefficients for clinical decision making.
Development and psychometric evaluation of a clinical global impression for schizoaffective disorder scale.

PubMed

Allen, Michael H; Daniel, David G; Revicki, Dennis A; Canuso, Carla M; Turkoz, Ibrahim; Fu, Dong-Jing; Alphs, Larry; Ishak, K Jack; Bartko, John J; Lindenmayer, Jean-Pierre

2012-01-01

The Clinical Global Impression for Schizoaffective Disorder scale is a new rating scale adapted from the Clinical Global Impression scale for use in patients with schizoaffective disorder. The psychometric characteristics of the Clinical Global Impression for Schizoaffective Disorder are described. Content validity was assessed using an investigator questionnaire. Inter-rater reliability was determined with 12 sets of videotaped interviews rated independently by two trained individuals. Test-retest reliability was assessed using 30 randomly selected raters from clinical trials who evaluated the same videos on separate occasions two weeks apart. Convergent and divergent validity and effect size were evaluated by comparing scores between the Clinical Global Impression for Schizoaffective Disorder and the Positive and Negative Syndrome Scale, 21-item Hamilton Rating Scale for Depression, and Young Mania Rating Scale using pooled patient data from two clinical trials. Clinical Global Impression for Schizoaffective Disorder scores were then linked to corresponding Positive and Negative Syndrome Scale scores. Content validity was strong. Inter-rater agreement was good to excellent for most scales and subscales (intra-class correlation coefficient ≥ 0.50). Test-retest showed good reproducibility, with intraclass correlation coefficients ranging from 0.444 to 0.898. Spearman correlations between Clinical Global Impression for Schizoaffective Disorder domains and corresponding symptom scales were 0.60 or greater, and effect sizes for Clinical Global Impression for Schizoaffective Disorder overall and domain scores were similar to Positive and Negative Syndrome Scale, Young Mania Rating Scale, and 21-item Hamilton Rating Scale for Depression scores. Raters anticipated that the scale might be less effective in distinguishing negative from depressive symptoms, and, in fact, the results here may reflect that clinical reality. Multiple lines of evidence support the reliability and validity of the Clinical Global Impression for Schizoaffective Disorder for studies in schizoaffective disorder.
Development of a questionnaire to evaluate asthma control in Japanese asthma patients.

PubMed

Tohda, Yuji; Hozawa, Soichiro; Tanaka, Hiroshi

2018-01-01

The asthma control questionnaires used in Japan are Japanese translations of those developed outside Japan, and have some limitations; a questionnaire designed to optimally evaluate asthma control levels for Japanese may be necessary. The present study was conducted to validate the Japan Asthma Control Survey (JACS) questionnaire in Japanese asthma patients. A total of 226 adult patients with mild to severe persistent asthma were enrolled and responded to the JACS questionnaire, asthma control questionnaire (ACQ), and Mini asthma quality of life questionnaire (Mini AQLQ) at Weeks 0 and 4. The reliability, validity, and sensitivity/responsiveness of the JACS questionnaire were evaluated. The intra-class correlation coefficients (ICCs) were within the range of 0.55-0.75 for all JACS scores, indicating moderate/substantial reproducibility. For internal consistency, Cronbach's alpha coefficients ranged from 0.76 to 0.92 in total and subscale scores, which were greater than the lower limit of internal consistency. As for factor validity, the cumulative contribution ratio of four main factors was 0.66. For criterion-related validity, the correlation coefficients between the JACS total score and ACQ5, ACQ6, and Mini AQLQ scores were -0.78, -0.78, and 0.77, respectively, showing a significant correlation (p < 0.0001). The JACS questionnaire was validated in terms of reliability and validity. It will be necessary to evaluate the therapeutic efficacy measured by the JACS questionnaire and calculate cutoff values for the asthma control status in a higher number of patients. UMIN000016589. Copyright © 2017 Japanese Society of Allergology. Production and hosting by Elsevier B.V. All rights reserved.

Telepsychiatry: assessment of televideo psychiatric interview reliability with present- and next-generation internet infrastructures.

PubMed

Yoshino, A; Shigemura, J; Kobayashi, Y; Nomura, S; Shishikura, K; Den, R; Wakisaka, H; Kamata, S; Ashida, H

2001-09-01

We assessed the reliability of remote video psychiatric interviews conducted via the internet using narrow and broad bandwidths. Televideo psychiatric interviews conducted with 42 in-patients with chronic schizophrenia using two bandwidths (narrow, 128 kilobits/s; broad, 2 megabits/s) were assessed in terms of agreement with face-to-face interviews in a test-retest fashion. As a control, agreement was assessed between face-to-face interviews. Psychiatric symptoms were rated using the Oxford version of the Brief Psychiatric Rating Scale (BPRS), and agreement between interviews was estimated as the intraclass correlation coefficient (ICC). The ICC was significantly lower in the narrow bandwidth than in the broad bandwidth and the control for both positive symptoms score and total score. While reliability of televideo psychiatric interviews is insufficient using the present narrow-band internet infrastructure, the next generation of infrastructure (broad-band) may permit reliable diagnostic interviews.
Assessing fundamental 2-dimensional understanding of basic soft tissue techniques.

PubMed

Jabbour, Noel; Dobratz, Eric J; Dresner, Harley S; Hilger, Peter A

2011-01-01

To develop a written practical examination and scoring system for assessing trainee skills in basic soft-tissue techniques. A brief written practical examination was developed to assess the ability of trainees to sketch preoperative plans and postoperative results for common soft-tissue techniques: simple-excision, M-plasty, geometric broken line closure, Z-plasty, V-to-Y flap, and rhombic flap. A scoring system was developed to assign 0 to 5 points to each of 10 items on the examination for a total score of 0-50. The 15-minute examination was administered as a pretest, posttest, and 3-month posttest assessment as part of a soft-tissue course at our institution. University of Minnesota, Otolaryngology Department. Three raters reviewed all examination answer sheets independently. The pretest scores of examinees correlated strongly with their level of training; the average pretest for junior residents (PGY 1-2) compared with senior residents (PGY 4-5) was 17.3 (of 50) versus 26.0 (p < 0.01). The scoring system showed a high intrarater reliability and high interrater reliability with correlation coefficients of r = 0.99 and r = 0.95, respectively and agreement coefficients of κ = 0.82 and κ = 0.77, respectively. This written practical examination and scoring system may be used to assess the skills of trainees accurately in basic soft tissue techniques and to expose areas of deficiency that can be addressed in future training sessions. Copyright © 2011 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
Translation to Brazilian Portuguese, cultural adaptation and reproducibility of the questionnaire "Ankylosing Spondylitis: What do you know?"

PubMed

Orlandi, Aline; Brumini, Christine; Jones, Anamaria; Natour, Jamil

2016-09-26

Ankylosing spondylitis (AS) generates inflammation and pain in entheses, peripheral joints and the spine. Education regarding AS can improve patients' disability. Thus, it is important to assess patients' knowledge. There is no instrument in the literature for assessing knowledge of AS in Portuguese. The aim here was to translate to the Brazilian Portuguese language, culturally adapt and test the reliability of the questionnaire "Ankylosing Spondylitis: What do you know?" and to correlate the findings with other factors. Original article regarding validation of questionnaire, produced at the Federal University of Sao Paulo (Unifesp). For translation and cultural adaptation, the Guillemin methodology was used. After the first phase, the reliability was tested on 30 patients. Correlations between these scores and other factors were examined. In the interobserver assessment, the Pearson correlation coefficient and Cronbach's alpha were 0.831 and 0.895, respectively. In the intraobserver evaluation, the intraclass correlation coefficient and Cronbach's alpha were 0.79 and 0.883, respectively. At this stage, the score for area of knowledge A showed correlations with ethnicity and education; the score for area D, with age; the total score and scores for areas A and B with "social aspects" of SF-36; and the score for area D with "pain", "vitality" and "emotional aspects" of SF-36. The Brazilian version of the questionnaire "Ankylosing Spondylitis: What do you know?" was created. It is reproducible and correlates with education level, ethnicity and the SF-36 domains "social aspects" and "emotional aspects".
Validation of the Chinese Version of the Quality of Nursing Work Life Scale

PubMed Central

Fu, Xia; Xu, Jiajia; Song, Li; Li, Hua; Wang, Jing; Wu, Xiaohua; Hu, Yani; Wei, Lijun; Gao, Lingling; Wang, Qiyi; Lin, Zhanyi; Huang, Huigen

2015-01-01

Quality of Nursing Work Life (QNWL) serves as a predictor of a nurse’s intent to leave and hospital nurse turnover. However, QNWL measurement tools that have been validated for use in China are lacking. The present study evaluated the construct validity of the QNWL scale in China. A cross-sectional study with convenience sampling was conducted from June 2012 to January 2013 at five hospitals in Guangzhou, which employ 1938 nurses. The participants were asked to complete the QNWL scale and the World Health Organization Quality of Life abbreviated version (WHOQOL-BREF). A total of 1922 nurses provided the final data used for analyses. Sixty-five nurses from the first investigated division were re-measured two weeks later to assess the test-retest reliability of the scale. The internal consistency reliability of the QNWL scale was assessed using Cronbach’s α. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC). Criterion-related validity was assessed using the correlation of the total scores of the QNWL and the WHOQOL-BREF. Construct validity was assessed with the following indices: χ2 statistics and degrees of freedom; root mean square error of approximation (RMSEA); the Akaike information criterion (AIC); the consistent Akaike information criterion (CAIC); the goodness-of-fit index (GFI); the adjusted goodness of fit index; and the comparative fit index (CFI). The findings demonstrated high internal consistency (Cronbach’s α = 0.912) and test-retest reliability (intra-class correlation coefficient = 0.74) for the QNWL scale. The chi-square test (χ2 = 13879.60, df [degrees of freedom] = 813, P = 0.0001) was significant. The RMSEA value was 0.091, and AIC = 1806.00, CAIC = 7730.69, CFI = 0.93, and GFI = 0.74. The correlation coefficient between the QNWL total scores and the WHOQOL-BREF total scores was 0.605 (p<0.01). The QNWL scale was reliable and valid in Chinese-speaking nurses and could be used as a clinical and research instrument for measuring work-related factors among nurses in China. PMID:25950838
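
The reported RMSEA can be checked against the reported chi-square, degrees of freedom and sample size. The sketch below assumes the commonly used point estimate, sqrt(max(χ2 - df, 0) / (df × (N - 1))). The abstract does not say which software or variant the authors used, so the function name and the choice of N - 1 in the denominator are assumptions.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square (assumed N - 1 variant)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Values quoted in the abstract: chi-square 13879.60, df 813, N = 1922 nurses.
value = rmsea(13879.60, 813, 1922)
print(round(value, 3))  # 0.091
```

With these inputs the estimate rounds to the 0.091 given in the abstract, which suggests the reported figures are internally consistent under this formula.
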
Development of an audit instrument for nursing care plans in the patient record

PubMed Central

Bjorvell, C; Thorell-Ekstrand, I; Wredling, R

2000-01-01

Objectives—To develop, validate, and test the reliability of an audit instrument that measures the extent to which patient records describe important aspects of nursing care. Material—Twenty records from each of three hospital wards were collected and audited. The auditors were registered nurses with a knowledge of nursing documentation in accordance with the VIPS model—a model designed to structure nursing documentation. (VIPS is an acronym formed from the Swedish words for wellbeing, integrity, prevention, and security.) Methods—An audit instrument was developed by determining specific criteria to be met. The audit questions were aimed at revealing the content of the patient record for nursing assessment, nursing diagnosis, planned interventions, and outcome. Each of the 60 records was reviewed by the three auditors independently and the reliability of the instrument was tested by calculating the inter-rater reliability coefficient. Content validity was tested by using an expert panel and calculating the content validity ratio. The criterion related validity was estimated by the correlation between the score of the Cat-ch-Ing instrument and the score of an earlier developed and used audit instrument. The results were then tested by using Pearson's correlation coefficient. Results—The new audit instrument, named Cat-ch-Ing, consists of 17 questions designed to judge the nursing documentation. Both quantity and quality variables are judged on a rating scale from zero to three, with a maximum score of 80. The inter-rater reliability coefficients were 0.98, 0.98, and 0.92, respectively, for each group of 20 records; the content validity ratio ranged between 0.20 and 1.0; and the criterion related validity showed a significant correlation of r = 0.68 (p< 0.0001, 95% CI 0.57 to 0.76) between the two audit instruments. Conclusion—The Cat-ch-Ing instrument has proved to be a valid and reliable audit instrument for nursing records when the VIPS model is used as the basis of the documentation. (Quality in Health Care 2000;9:6–13) Key Words: audit instrument; nursing care plans; quality assurance PMID:10848373
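
The abstract reports a content validity ratio but gives neither the formula nor the panel size. A minimal sketch, assuming Lawshe's definition CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating an item essential and N is the panel size, is shown below. The panel of 10 is purely hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe-style CVR for one item (assumed definition, not confirmed by the abstract)."""
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts.
for n_essential in (6, 8, 10):
    print(n_essential, content_validity_ratio(n_essential, 10))
```

Under this definition the ratio runs from -1 (no expert rates the item essential) to 1 (all do), and 0 means exactly half the panel did.
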
Frame-of-reference training for simulation-based intraoperative communication assessment.

PubMed

Gardner, Aimee K; Russo, Michael A; Jabbour, Ibrahim I; Kosemund, Matthew; Scott, Daniel J

2016-09-01

The purpose of this study was to examine the impact of frame-of-reference (FOR) training on assessments of intraoperative communication skills and identify areas of need to inform curricular efforts. Simulation instructors (M.D., Ph.D., Research Fellow, Simulation Technician) underwent a 2-hour FOR training session with the operating room communication instrument. They then independently rated communication skills of 19 PGY1s who participated in a team-based simulation. Residents completed self-assessments via video review of the scenario. Intraclass correlation coefficients were used to examine inter-rater reliability. Relationships between trained raters and resident scores were assessed with Pearson correlation coefficients and paired sample t tests. Inter-rater reliability after FOR training was .91. The correlation between trained rater scores and resident evaluations was nonsignificant. Residents significantly underestimated their intraoperative communication skills (P < .05). Use of names, closed loop communication, and sharing information with team members demonstrated consistently low ratings among all residents. These findings reveal that a number of individuals can be trained to reliably rate resident intraoperative communication performance and that residents tend to under-rate their communication skills. Copyright © 2016 Elsevier Inc. All rights reserved.
[Eppendorf Schizophrenia Inventory (ESI) vs. Frankfurt Complaint Questionnaire (FCQ). Direct comparison in a clinical trial].

PubMed

Mass, R

2005-09-01

This study is the first to directly compare two clinical questionnaires which are both aimed at self-experienced cognitive dysfunctions of schizophrenia: Eppendorf Schizophrenia Inventory (ESI) and Frankfurt Complaint Questionnaire (FCQ). Evaluated were (a) diagnostic validity, (b) psychometric properties, (c) scale intercorrelations, and (d) factor analytic stability. Ad (a): schizophrenic subjects (n=36) show highly significant increases in the ESI scales and sum score when compared to other clinical groups (patients with depression, alcohol dependence, or obsessive-compulsive disorder, n>30, respectively); on the other hand, the FCQ yields no systematic group differences. Ad (b): mean of reliability coefficients (Cronbach alpha) of the ESI scales is r(tt)=0.86, mean of reliability coefficients of the FCQ scales is significantly lower. Ad (c): the mean intercorrelation between ESI and FCQ scales amounts to r(xy)=0.56 (minimum 0.29, maximum 0.73), corresponding to an average shared variance of about 31%. Ad (d): factor analysis yielded an ESI factor and an FCQ factor; one-way ANOVA with the factor scores confirms the diagnostic validity of the ESI. ESI and FCQ measure essentially different aspects of schizophrenic psychopathology. Regarding reliability and diagnostic validity, the ESI is superior to the FCQ.
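
The step from a mean intercorrelation of 0.56 to "about 31%" shared variance is the square of the correlation. The short sketch below reproduces that arithmetic for the three correlations quoted in the abstract. The function name is hypothetical.

```python
def shared_variance(r):
    """Proportion of variance two scales share, given their correlation r."""
    return r * r


for label, r in (("minimum", 0.29), ("mean", 0.56), ("maximum", 0.73)):
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")
```

Squaring the mean correlation only approximates the average shared variance. The mean of the squared correlations would generally be somewhat larger, but it cannot be computed from the abstract alone.
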
The Structured Interview & Scoring Tool-Massachusetts Alzheimer's Disease Research Center (SIST-M): development, reliability, and cross-sectional validation of a brief structured clinical dementia rating interview.

PubMed

Okereke, Olivia I; Copeland, Maura; Hyman, Bradley T; Wanggaard, Taylor; Albert, Marilyn S; Blacker, Deborah

2011-03-01

The Clinical Dementia Rating (CDR) and CDR Sum-of-Boxes can be used to grade mild but clinically important cognitive symptoms of Alzheimer disease. However, sensitive clinical interview formats are lengthy. To develop a brief instrument for obtaining CDR scores and to assess its reliability and cross-sectional validity. Using legacy data from expanded interviews conducted among 347 community-dwelling older adults in a longitudinal study, we identified 60 questions (from a possible 131) about cognitive functioning in daily life using clinical judgment, inter-item correlations, and principal components analysis. Items were selected in 1 cohort (n=147), and a computer algorithm for generating CDR scores was developed in this same cohort and re-run in a replication cohort (n=200) to evaluate how well the 60 items retained information from the original 131 items. Short interviews based on the 60 items were then administered to 50 consecutively recruited older individuals, with no symptoms or mild cognitive symptoms, at an Alzheimer's Disease Research Center. Clinical Dementia Rating scores based on short interviews were compared with those from independent long interviews. In the replication cohort, agreement between short and long CDR interviews ranged from κ=0.65 to 0.79, with κ=0.76 for Memory, κ=0.77 for global CDR, and intraclass correlation coefficient for CDR Sum-of-Boxes=0.89. In the cross-sectional validation, short interview scores were slightly lower than those from long interviews, but good agreement was observed for global CDR and Memory (κ≥0.70) as well as for CDR Sum-of-Boxes (intraclass correlation coefficient=0.73). The Structured Interview & Scoring Tool-Massachusetts Alzheimer's Disease Research Center is a brief, reliable, and sensitive instrument for obtaining CDR scores in persons with symptoms along the spectrum of mild cognitive change.
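
Agreement between short and long interviews is reported above as κ. As a minimal sketch, assuming the unweighted Cohen's kappa for two sets of categorical ratings (the abstract does not say whether a weighted variant was used), chance-corrected agreement can be computed as follows. The ratings are made up for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical global CDR ratings from a short and a long interview.
short_form = [0, 0, 0.5, 0.5, 0.5, 0, 0.5, 0, 0.5, 0]
long_form = [0, 0, 0.5, 0.5, 0, 0, 0.5, 0, 0.5, 0.5]
print(cohens_kappa(short_form, long_form))
```

For ordinal scales such as the CDR, a weighted kappa that gives partial credit to near misses is a common alternative. The unweighted form here treats every disagreement as equally serious.
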
Reliability and Validity of the Musculoskeletal Tumor Society Scoring System for the Upper Extremity in Japanese Patients.

PubMed

Uehara, Kosuke; Ogura, Koichi; Akiyama, Toru; Shinoda, Yusuke; Iwata, Shintaro; Kobayashi, Eisuke; Tanzawa, Yoshikazu; Yonemoto, Tsukasa; Kawano, Hirotaka; Kawai, Akira

2017-09-01

The Musculoskeletal Tumor Society (MSTS) scoring system developed in 1993 is a widely used disease-specific evaluation tool for assessment of physical function in patients with musculoskeletal tumors; however, only a few studies have confirmed its reliability and validity. The aim of this study was to validate the MSTS scoring system for the upper extremity (MSTS-UE) in Japanese patients with musculoskeletal tumors for use by others in research. Does the MSTS-UE have: (1) sufficient reliability and internal consistency; (2) adequate construct validity; and (3) reasonable criterion validity in comparison to the Toronto Extremity Salvage Score (TESS) or SF-36? Reliability was assessed using test-retest analysis, and internal consistency was evaluated with Cronbach's alpha coefficient. Construct validity was evaluated using a scree plot to confirm the construct number and the Akaike information criterion network. Criterion validity was evaluated by comparing the MSTS-UE with the TESS and SF-36. The test-retest reliability with intraclass correlation coefficient (0.95; 95% CI, 0.91-0.97) was excellent, and internal consistency with Cronbach's α (0.7; 95% CI, 0.53-0.81) was acceptable. There were no ceiling and floor effects. The Akaike Information Criterion network showed that lifting ability, pain, and dexterity played central roles among the components. The MSTS-UE showed substantial correlation with the TESS scoring scale (r = 0.75; p < 0.001) and fair correlation with the SF-36 physical component summary (r = 0.37; p = 0.007). Although the MSTS-UE showed slight correlation with the SF-36 mental component summary, the emotional acceptance component of the MSTS-UE showed fair correlation (r = 0.29; p = 0.039). We can conclude that the MSTS is not an adequate measure of general health-related quality of life; however, this system was designed mainly to be a simple measure of function in a single extremity. To evaluate the mental state of patients with musculoskeletal tumors in the upper extremity, further study is needed.
Reliability and validity of the Thai version of the Pediatric Quality of Life Inventory 4.0.

PubMed

Sritipsukho, Paskorn; Wisai, Matoorada; Thavorncharoensap, Montarat

2013-04-01

The study aimed to evaluate the reliability and validity of the Thai version of the Pediatric Quality of Life Inventory™ 4.0 Core Scales (PedsQL) as a measure of health-related quality of life (HRQOL). The PedsQL items were completed by 2,086 pupils aged 8-15 years and 1,914 parents from four schools, and 100 pediatric outpatients and 100 parents from a University Hospital. Test-retest reliability was assessed in a randomly selected sample of 150 pupils at a 1-month interval. Internal consistency reliability for the Total Scale score (α = 0.84 self-report, 0.88 proxy-report), Physical Health Summary score (α = 0.76 self-report, 0.79 proxy-report), and Psychosocial Health Summary score (α = 0.74 self-report, 0.85 proxy-report) exceeded the minimum reliability standard of 0.70. School children had significantly higher mean HRQOL scores compared to those with chronic health conditions for all subscales with the mean differences of 3.1-12.4 for self-report (p < 0.03) and 7.7-15.6 for proxy-report (p < 0.001). Test-retest reliability showed intraclass correlation coefficients above 0.60 in all subscales (p < 0.001). The Thai version of PedsQL had adequate reliability and validity and could be used as an outcome measure of HRQOL in Thai children aged 8-15 years.
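
Internal consistency in the record above is summarised by Cronbach's α and compared with a 0.70 threshold. A minimal sketch of the standard formula, α = k/(k - 1) × (1 - Σ item variances / variance of total scores), follows. The responses are invented and the function names are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` has one row per respondent, one column per item."""
    k = len(responses[0])
    item_variances = [
        sample_variance([row[j] for row in responses]) for j in range(k)
    ]
    total_variance = sample_variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from four respondents to a three-item scale.
toy_responses = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3), "meets 0.70 standard:", alpha >= 0.70)
```

Alpha rises with the number of items as well as with their intercorrelation, so a long scale can clear 0.70 even when individual items are only weakly related.
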
Reliability, validity, and significance of assessment of sense of contribution in the workplace.

PubMed

Takaki, Jiro; Taniguchi, Toshiyo; Fujii, Yasuhito

2014-01-29

The purpose of this study was to assess the validity and reliability of the Sense of Contribution Scale (SCS), a newly developed, 7-item questionnaire used to measure sense of contribution in the workplace. Workers at 272 organizations answered questionnaires that included the SCS. Because of non-participation or missing data, the number of subjects included in the analyses for internal consistency and validity varied from 1,675 to 2,462 (response rates 54.6%-80.2%). Fifty-four workers were included in the analysis of test-retest reliability (response rate, 77.1%). The SCS showed high internal consistency (Cronbach's α coefficients in men and women were 0.85 and 0.86, respectively) and test-retest reliability (intraclass correlation coefficient = 0.91). Significant (p < 0.001), positive, moderate correlations were found between the SCS score and scores for organization-based self-esteem and work engagement in both genders, which support the SCS's convergent and discriminant validity. The criterion validity of the SCS was supported by the finding that in both genders, the SCS scores were significantly (p < 0.05) and inversely associated with psychological distress and sleep disturbance in crude and in multivariable analyses that adjusted for demographics, organization-based self-esteem, work engagement, effort-reward ratio, workplace bullying, and procedural and interactional justice. The SCS is a psychometrically satisfactory measure of sense of contribution in the workplace. The SCS provides a new and useful instrument to measure sense of contribution, which is independently associated with mental health in workers, for studies in organizational science, occupational health psychology and occupational medicine.
Use of the script concordance approach to evaluate clinical reasoning in food-ruminant practitioners.

PubMed

Dufour, Simon; Latour, Sylvie; Chicoine, Yvan; Fecteau, Gilles; Forget, Sylvain; Moreau, Jean; Trépanier, André

2012-01-01

A script concordance test (SCT) was developed measuring clinical reasoning of food-ruminant practitioners for whom potential clinical competence difficulties were identified by their provincial professional organization. The SCT was designed to be used as part of a broader evaluation procedure. A scoring key was developed based on answers from a reference panel of 12 experts and using the modified aggregate method commonly used for SCTs. A convenience sample of 29 food-ruminant practitioners was constituted to assess the reliability and precision of the SCT and to determine a fair threshold value for success. Cronbach's α coefficients were computed to evaluate internal reliability. To evaluate SCT precision, a test-retest methodology was used and measures of agreement beyond chance were computed at question and test levels. After optimization, the 36-question SCT yielded acceptable internal reliability (Cronbach's α=0.70). Precision of the SCT at question level was excellent with 33 questions (92%) yielding moderate to almost perfect agreement between administrations. At test level, fair agreement (concordance correlation coefficient=0.32) was observed between administrations. A slight SCT score improvement (M=+2.8 points) on the second administration was in part responsible for some of the disagreement and was potentially a result of an adaptation to the SCT format. The score distribution was used to determine a fair threshold value for success, while considering the underlying objectives of the examination. The data suggest that the developed SCT can be used as a reliable and precise measurement of clinical reasoning of food-ruminant practitioners.
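
The abstract notes that a systematic score gain on the second administration contributed to the modest concordance correlation coefficient. The sketch below, assuming Lin's concordance correlation coefficient with population (divide-by-n) moments, shows how a constant shift lowers concordance while leaving the Pearson correlation untouched. The scores are hypothetical.

```python
import math


def _moments(x, y):
    """Means, population variances and population covariance of two lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cov


def pearson_r(x, y):
    _, _, vx, vy, cov = _moments(x, y)
    return cov / math.sqrt(vx * vy)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (assumed to be the CCC meant above)."""
    mx, my, vx, vy, cov = _moments(x, y)
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical test scores; the retest is the same ranking shifted up by 3 points.
first = [10, 12, 14, 16, 18]
second = [s + 3 for s in first]
print(pearson_r(first, second), lins_ccc(first, second))
```

Here the ranking of examinees is perfectly preserved, so the Pearson correlation is 1, yet the concordance coefficient falls because the scores no longer lie on the line of identity.
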
Reliability, Validity, and Significance of Assessment of Sense of Contribution in the Workplace

PubMed Central

Takaki, Jiro; Taniguchi, Toshiyo; Fujii, Yasuhito

2014-01-01

The purpose of this study was to assess the validity and reliability of the Sense of Contribution Scale (SCS), a newly developed, 7-item questionnaire used to measure sense of contribution in the workplace. Workers at 272 organizations answered questionnaires that included the SCS. Because of non-participation or missing data, the number of subjects included in the analyses for internal consistency and validity varied from 1,675 to 2,462 (response rates 54.6%–80.2%). Fifty-four workers were included in the analysis of test–retest reliability (response rate, 77.1%). The SCS showed high internal consistency (Cronbach’s α coefficients in men and women were 0.85 and 0.86, respectively) and test–retest reliability (intraclass correlation coefficient = 0.91). Significant (p < 0.001), positive, moderate correlations were found between the SCS score and scores for organization-based self-esteem and work engagement in both genders, which support the SCS’s convergent and discriminant validity. The criterion validity of the SCS was supported by the finding that in both genders, the SCS scores were significantly (p < 0.05) and inversely associated with psychological distress and sleep disturbance in crude and in multivariable analyses that adjusted for demographics, organization-based self-esteem, work engagement, effort–reward ratio, workplace bullying, and procedural and interactional justice. The SCS is a psychometrically satisfactory measure of sense of contribution in the workplace. The SCS provides a new and useful instrument to measure sense of contribution, which is independently associated with mental health in workers, for studies in organizational science, occupational health psychology and occupational medicine. PMID:24481035
Reliability of Verbal Handoff Assessment and Handoff Quality Before and After Implementation of a Resident Handoff Bundle.

PubMed

Feraco, Angela M; Starmer, Amy J; Sectish, Theodore C; Spector, Nancy D; West, Daniel C; Landrigan, Christopher P

2016-08-01

1) To develop validity evidence for the use of the Verbal Handoff Assessment Tool (VHAT) and examine the reliability of VHAT scores, and 2) to determine whether implementation of a resident handoff bundle (RHB) was associated with improved verbal patient handoffs among pediatric resident physicians. In a pre-post design, prospectively audio recorded verbal patient handoffs conducted at Boston Children's Hospital before and after implementation of the RHB were rated using the VHAT, which was developed for this study (primary outcome). Using generalizability theory, we evaluated the reliability of VHAT scores. Overall, VHAT scores increased after RHB implementation (mean 142 vs 191, possible score 0-500; P < .0001). When accounting for clustering according to resident physician, hospital unit, unit census, and patient complexity, implementation of the RHB was associated with a 63-point increase in VHAT score. Using generalizability theory, we determined that a resident's mean VHAT score on the basis of a handoff of 15 patients assessed by a single observer was sufficiently reliable for relative ranking decisions (ie, norm-based; generalizability coefficient, 0.81), whereas a VHAT score on the basis of a handoff of 21 patients would be sufficiently reliable for high-stakes, standard-based decisions (Phi, 0.80). Verbal handoffs improved after implementation of a RHB, although gains were variable across the 2 clinical units. The VHAT shows promise as an assessment tool for resident handoff skills. If used for competency or entrustment decisions, a resident's mean VHAT score should be on the basis of observation of verbal handoff of ≥21 patients. Copyright © 2016 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
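
The distinction drawn above between a generalizability coefficient (for relative, norm-based decisions) and Phi (for absolute, standard-based decisions) can be sketched with a simple decision-study calculation. The code assumes a single-facet design with residents crossed with patients, which may not match the study's actual design, and every variance component is hypothetical rather than an estimate from the paper.

```python
def g_coefficient(var_person, var_residual, n_patients):
    """Generalizability coefficient: only relative error counts against reliability."""
    return var_person / (var_person + var_residual / n_patients)


def phi_coefficient(var_person, var_patient, var_residual, n_patients):
    """Phi (dependability): patient difficulty variance also counts as error."""
    return var_person / (var_person + (var_patient + var_residual) / n_patients)


# Hypothetical variance components, chosen only for illustration.
VAR_PERSON, VAR_PATIENT, VAR_RESIDUAL = 1.0, 1.5, 3.5

for n in (5, 15, 21):
    g = g_coefficient(VAR_PERSON, VAR_RESIDUAL, n)
    phi = phi_coefficient(VAR_PERSON, VAR_PATIENT, VAR_RESIDUAL, n)
    print(f"n = {n:2d}  G = {g:.2f}  Phi = {phi:.2f}")
```

Because Phi adds the patient main effect to the error term, it can never exceed the generalizability coefficient for the same number of observations, which is why absolute decisions require more observed handoffs to reach the same threshold.
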
A comprehensive scoring system to measure healthy community design in land use plans and regulations.

PubMed

Maiden, Kristin M; Kaplan, Marina; Walling, Lee Ann; Miller, Patricia P; Crist, Gina

2017-02-01

Comprehensive land use plans and their corresponding regulations play a role in determining the nature of the built environment and community design, which are factors that influence population health and health disparities. To determine the extent to which a plan addresses healthy living and active design, there is a need for a systematic, reliable and valid method of analyzing and scoring health-related content in plans and regulations. This paper describes the development and validation of a scoring tool designed to measure the strength and comprehensiveness of health-related content found in land use plans and the corresponding regulations. The measures are scored based on the presence of a specific item and the specificity and action-orientation of language. To establish reliability and validity, 42 land use plans and regulations from across the United States were scored January-April 2016. Results of the psychometric analysis indicate the scorecard is a reliable scoring tool for land use plans and regulations related to healthy living and active design. Intraclass correlation coefficients (ICCs) showed strong inter-rater reliability for total strength and comprehensiveness. ICCs for total implementation scores showed acceptable consistency among scorers. Cronbach's alpha values for all focus areas were acceptable. Strong content validity was measured through a committee vetting process. The development of this tool has far-reaching implications, bringing standardization of measurement to the field of land use plan assessment, and paving the way for systematic inclusion of health-related design principles, policies, and requirements in land use plans and their corresponding regulations. Copyright © 2016 Elsevier Inc. All rights reserved.
An examination of the interrater reliability between practitioners and researchers on the static-99.

PubMed

Quesada, Stephen P; Calkins, Cynthia; Jeglic, Elizabeth L

2014-11-01

Many studies have validated the psychometric properties of the Static-99, the most widely used measure of sexual offender recidivism risk. However, much of this research relied on instrument coding completed by well-trained researchers. This study is the first to examine the interrater reliability (IRR) of the Static-99 between practitioners in the field and researchers. Using archival data from a sample of 1,973 formerly incarcerated sex offenders, field raters' scores on the Static-99 were compared with those of researchers. Overall, clinicians and researchers had excellent IRR on Static-99 total scores, with IRR coefficients ranging from "substantial" to "outstanding" for the 10 individual items of the scale. The most common causes of discrepancies were coding manual errors, followed by item subjectivity, inaccurate item scoring, and calculation errors. These results offer important data with regard to the frequency and perceived nature of scoring errors. © The Author(s) 2013.
Ages and stages questionnaires: adaptation to an Arabic speaking population and cultural sensitivity.

PubMed

Charafeddine, Lama; Sinno, Durriyah; Ammous, Farah; Yassin, Walid; Al-Shaar, Laila; Mikati, Mohamad A

2013-09-01

Early detection of developmental delay is essential to initiate early intervention. The Ages and Stages Questionnaires (ASQ) correlate well with physician's assessment and have high predictive value. No such tool exists in Arabic. The aim was to translate and test the applicability and reliability of the Arabic translated Ages and Stages Questionnaires (A-ASQ) in an Arabic speaking population. 733 healthy children were assessed. ASQ-II for 10 age groups (4-60 months) were translated to Arabic; back translations and cultural adaptation were performed. Test-retest reliability and internal consistency were evaluated using Pearson Correlation Coefficient (CC) and Cronbach's alpha (Cα). Mean scores per domain were compared to US normative scores using t-test. A-ASQ, after culturally relevant adaptations, was easily administered for the 4-36 month age groups but not for 4-5 year olds due to numerous cultural differences in the latter. For the 4-36 month age groups Pearson CC ranged from 0.345 to 0.833. The internal consistency coefficients (Cα) ranged from 0.111 to 0.816. Significant differences were found in the mean domain scores of all age groups between Lebanese and US normative sample (p-value <0.001) with some exceptions in gross motor, fine motor and personal social domains. A-ASQ was easily translated and administered with acceptable internal consistency and reliability in the younger age groups. It proved to be culturally sensitive, which should be taken into consideration when adapting such a tool to non-western populations. Copyright © 2013 European Paediatric Neurology Society. Published by Elsevier Ltd. All rights reserved.
Validity and Reliability of the Korean Version of the Hyperthyroidism Symptom Scale.

PubMed

Lee, Jie Eun; Lee, Dong Hwa; Oh, Tae Jung; Kim, Kyoung Min; Choi, Sung Hee; Lim, Soo; Park, Young Joo; Park, Do Joon; Jang, Hak Chul; Moon, Jae Hoon

2018-03-01

Thyrotoxicosis is a common disease resulting from an excess of thyroid hormones, which affects many organ systems. The clinical symptoms and signs are relatively nonspecific and can vary depending on age, sex, comorbidities, and the duration and cause of the disease. Several symptom rating scales have been developed in an attempt to assess these symptoms objectively and have been applied to diagnosis or to evaluation of the response to treatment. The aim of this study was to assess the reliability and validity of the Korean version of the hyperthyroidism symptom scale (K-HSS). Twenty-eight thyrotoxic patients and 10 healthy subjects completed the K-HSS at baseline and after follow-up at Seoul National University Bundang Hospital. The correlation between K-HSS scores and thyroid function was analyzed. K-HSS scores were compared between baseline and follow-up in patient and control groups. Cronbach's α coefficient was calculated to demonstrate the internal consistency of K-HSS. The mean age of the participants was 34.7±9.8 years and 13 (34.2%) were men. K-HSS scores demonstrated a significant positive correlation with serum free thyroxine concentration and decreased significantly with improved thyroid function. K-HSS scores were highest in subclinically thyrotoxic subjects, lower in patients who were euthyroid after treatment, and lowest in the control group at follow-up, but these differences were not significant. Cronbach's α coefficient for the K-HSS was 0.86. The K-HSS is a reliable and valid instrument for evaluating symptoms of thyrotoxicosis in Korean patients. Copyright © 2018 Korean Endocrine Society.
Validity and Reliability of the Korean Version of the Hyperthyroidism Symptom Scale

PubMed Central

Lee, Dong Hwa

2018-01-01

Background: Thyrotoxicosis is a common disease resulting from an excess of thyroid hormones, which affects many organ systems. The clinical symptoms and signs are relatively nonspecific and can vary depending on age, sex, comorbidities, and the duration and cause of the disease. Several symptom rating scales have been developed in an attempt to assess these symptoms objectively and have been applied to diagnosis or to evaluation of the response to treatment. The aim of this study was to assess the reliability and validity of the Korean version of the hyperthyroidism symptom scale (K-HSS). Methods: Twenty-eight thyrotoxic patients and 10 healthy subjects completed the K-HSS at baseline and after follow-up at Seoul National University Bundang Hospital. The correlation between K-HSS scores and thyroid function was analyzed. K-HSS scores were compared between baseline and follow-up in patient and control groups. Cronbach's α coefficient was calculated to demonstrate the internal consistency of K-HSS. Results: The mean age of the participants was 34.7±9.8 years and 13 (34.2%) were men. K-HSS scores demonstrated a significant positive correlation with serum free thyroxine concentration and decreased significantly with improved thyroid function. K-HSS scores were highest in subclinically thyrotoxic subjects, lower in patients who were euthyroid after treatment, and lowest in the control group at follow-up, but these differences were not significant. Cronbach's α coefficient for the K-HSS was 0.86. Conclusion: The K-HSS is a reliable and valid instrument for evaluating symptoms of thyrotoxicosis in Korean patients. PMID:29589389
Psychometric Properties of Difficulties of Working with Patients with Personality Disorders and Attitudes Towards Patients with Personality Disorders Scales.

PubMed

Eren, Nurhan

2014-12-01

In this study, we aimed to develop two reliable and valid assessment instruments for investigating the level of difficulties mental health workers experience while working with patients with personality disorders and the attitudes they develop toward the patients. The research was carried out based on the general screening model. The study sample consisted of 332 mental health workers in several mental health clinics of Turkey, with a certain amount of experience in working with personality disorders, who were selected with a random assignment method. In order to collect data, the Personal Information Questionnaire, Difficulty of Working with Personality Disorders Scale (PD-DWS), and Attitudes Towards Patients with Personality Disorders Scale (PD-APS), which are being examined for reliability and validity, were applied. To determine construct validity, the Adjective Check List, Maslach Burnout Inventory, and State and Trait Anxiety Inventory were used. Exploratory factor analysis was used for investigating the structural validity, and Cronbach alpha, Spearman-Brown, and Guttman split-half reliability analyses were utilized to examine the reliability. Also, item reliability and validity computations were carried out by investigating the corrected item-total correlations and discriminative indexes of the items in the scales. For the PD-DWS KMO test, the value was .946; also, a significant difference was found for the Bartlett sphericity test (p<.001). The computed test-retest reliability coefficient was .702; the Cronbach alpha value of the total test score was .952. For PD-APS KMO, the value was .925; a significant difference was found in Bartlett sphericity test (p<.001); the computed reliability coefficient based on continuity was .806; and the Cronbach alpha value of the total test score was .913. Analyses on both scales were based on total scores.
It was found that PD-DWS and PD-APS have good psychometric properties, measuring the structure that is being investigated, are compatible with other scales, have high levels of internal reliability between their items, and are consistent across time. Therefore, it was concluded that both scales are valid and reliable instruments.

Electrical impedance myography in facioscapulohumeral muscular dystrophy.

PubMed

Statland, Jeffrey M; Heatwole, Chad; Eichinger, Katy; Dilek, Nuran; Martens, William B; Tawil, Rabi

2016-10-01

In this study we determined the reliability and validity of electrical impedance myography (EIM) in facioscapulohumeral muscular dystrophy (FSHD). We performed a prospective study of EIM on 16 bilateral limb and trunk muscles in 35 genetically defined and clinically affected FSHD patients (reliability testing on 18 patients). Summary scores based on body region were derived. Reactance and phase (50 and 100 kHz) were compared with measures of strength, FSHD disease severity, and functional outcomes. Participants were mostly men, mean age 53.0 years, and included a full range of severity. Limb and trunk muscles showed good to excellent reliability [intraclass correlation coefficients (ICC) 0.72-0.99]. Summary scores for the arm, leg, and trunk showed excellent reliability (ICC 0.89-0.98). Reactance was the most sensitive EIM parameter to a broad range of FSHD disease metrics. EIM is a reliable measure of muscle composition in FSHD that offers the possibility to serially evaluate affected muscles. Muscle Nerve 54: 696-701, 2016. © 2016 Wiley Periodicals, Inc.
Reliability and validity of the Korean version of the Short Musculoskeletal Function Assessment questionnaire for patients with musculoskeletal disorder.

PubMed

Jung, Kyoung-Sim; Jung, Jin-Hwa; In, Tae-Sung; Cho, Hwi-Young

2016-09-01

[Purpose] The purpose of this study was to establish the reliability and validity of the Short Musculoskeletal Function Assessment questionnaire, which was translated into Korean, for patients with musculoskeletal disorder. [Subjects and Methods] Fifty-five subjects (26 males and 29 females) with musculoskeletal diseases participated in the study. The Short Musculoskeletal Function Assessment questionnaire focuses on a limited range of physical functions and includes a dysfunction index and a bother index. Reliability was determined using the intraclass correlation coefficient, and validity was examined by correlating short musculoskeletal function assessment scores with the 36-item Short-Form Health Survey (SF-36) score. [Results] The reliability was 0.97 for the dysfunction index and 0.94 for the bother index. Validity was established by comparison with Korean version of the SF-36. [Conclusion] This study demonstrated that the Korean version of the Short Musculoskeletal Function Assessment questionnaire is a reliable and valid instrument for the assessment of musculoskeletal disorders.
Evaluation of Fracture and Osteotomy Union in the Setting of Osteogenesis Imperfecta: Reliability of the Modified Radiographic Union Score for Tibial Fractures (RUST).

PubMed

Franzone, Jeanne M; Finkelstein, Mark S; Rogers, Kenneth J; Kruse, Richard W

2017-09-08

Evaluation of the union of osteotomies and fractures in patients with osteogenesis imperfecta (OI) is a critical component of patient care. Studies of the OI patient population have so far used varied criteria to evaluate bony union. The radiographic union score for tibial fractures (RUST), which was subsequently revised to the modified RUST, is an objective standardized method of evaluating fracture healing. We sought to evaluate the reliability of the modified RUST in the setting of the tibias of patients with OI. Tibial radiographs of 30 patients with OI fractures or osteotomies were scored by 3 observers on 2 separate occasions. Each of the 4 cortices was given a score (1=no callus, 2=callus present, 3=bridging callus, and 4=remodeled, fracture not visible) and the modified RUST is the sum of these scores (range, 4 to 16). The interobserver and intraobserver reliabilities were evaluated using intraclass correlation coefficients (ICC) with 95% confidence intervals. The ICC representing the interobserver reliability for the first iteration of scores was 0.926 (0.864 to 0.962) and for the second series was 0.915 (0.845 to 0.957). The ICCs representing the intraobserver reliability for each of the 3 reviewers for the measurements in series 1 and 2 were 0.860 (0.707 to 0.934), 0.994 (0.986 to 0.997), and 0.974 (0.946 to 0.988). The modified RUST has excellent interobserver and intraobserver reliability in the setting of OI despite challenges related to the poor quality of the bone and its dysplastic nature. The application and routine use of the modified RUST in the OI population will help standardize our evaluation of osteotomy and fracture healing. Level III-retrospective study of nonconsecutive patients.
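
Interobserver ICCs such as those reported above are usually derived from a two-way analysis of variance on a subjects-by-raters table. The abstract does not say which ICC form was used, so the following is only a sketch of one common variant, the two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1). The function name `icc_2_1` and the ratings are illustrative assumptions, not data from the study:

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of subjects, each a list of k scores (one per rater).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters, and error.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical 6 subjects x 4 raters, for illustration only.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))  # 0.29
```

In this made-up table the raters differ systematically in level, which pulls the absolute-agreement ICC down. A consistency-type ICC on the same data would be noticeably higher, which is why the chosen form matters when comparing published values.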
Standard Setting Methods for Pass/Fail Decisions on High-Stakes Objective Structured Clinical Examinations: A Validity Study.

PubMed

Yousuf, Naveed; Violato, Claudio; Zuberi, Rukhsana W

2015-01-01

CONSTRUCT: Authentic standard setting methods will demonstrate high convergent validity evidence of their outcomes, that is, cutoff scores and pass/fail decisions, with most other methods when compared with each other. The objective structured clinical examination (OSCE) was established for valid, reliable, and objective assessment of clinical skills in health professions education. Various standard setting methods have been proposed to identify objective, reliable, and valid cutoff scores on OSCEs. These methods may identify different cutoff scores for the same examinations. Identification of valid and reliable cutoff scores for OSCEs remains an important issue and a challenge. Thirty OSCE stations administered at least twice in the years 2010-2012 to 393 medical students in Years 2 and 3 at Aga Khan University are included. Psychometric properties of the scores are determined. Cutoff scores and pass/fail decisions of Wijnen, Cohen, Mean-1.5SD, Mean-1SD, Angoff, borderline group and borderline regression (BL-R) methods are compared with each other and with three variants of cluster analysis using repeated measures analysis of variance and Cohen's kappa. The mean psychometric indices on the 30 OSCE stations are reliability coefficient = 0.76 (SD = 0.12); standard error of measurement = 5.66 (SD = 1.38); coefficient of determination = 0.47 (SD = 0.19), and intergrade discrimination = 7.19 (SD = 1.89). BL-R and Wijnen methods show the highest convergent validity evidence among other methods on the defined criteria. Angoff and Mean-1.5SD demonstrated least convergent validity evidence. The three cluster variants showed substantial convergent validity with borderline methods. Although there was a high level of convergent validity of Wijnen method, it lacks the theoretical strength to be used for competency-based assessments. 
The BL-R method is found to show the highest convergent validity evidences for OSCEs with other standard setting methods used in the present study. We also found that cluster analysis using mean method can be used for quality assurance of borderline methods. These findings should be further confirmed by studies in other settings.
Evaluation of a modified Karnofsky score to assess physical and psychological wellbeing of cats in a hospital setting.

PubMed

Taffin, Elien Rl; Paepe, Dominique; Campos, Miguel; Duchateau, Luc; Goris, Nesya; De Roover, Katrien; Daminet, Sylvie

2016-11-01

Objectives The Karnofsky score (KS) modified for cats, a scoring system to rate health and quality of life (QOL) in cats, is used in clinical trials, but its reliability and validity are yet to be determined. The present study aims to evaluate the scientific robustness of the KS when adapted for use in a hospital setting. Methods A list of variables to consider during the physical examination, which informs the clinician's score (CS) part of the KS, was added and clinicians were allowed to choose a score anywhere between 0 and 50. The Karnofsky QOL questionnaire was adapted for use in a hospital setting. F-tests with Bonferroni correction and Spearman rank correlation coefficients were used to evaluate reliability and validity of the KS to assess the health and wellbeing of cats in a hospital setting. The records of 54 feline immunodeficiency virus-positive cats, which were recruited for a clinical trial and hospitalised for 6 weeks, were reviewed. Four veterinarians scored the CS, and one veterinarian and a veterinary nurse assessed the QOL score. Results Mean absolute difference between observers was significantly larger for the CS than for the QOL score (P<0.001) and two veterinarians scored significantly higher than the remaining two veterinarians (P<0.001). Inter-observer correlation ranged from 0.45 to 0.75 for the CS. For the QOL score, the absolute difference between observers was small, no significant difference was found between observers and a high degree of inter-observer correlation was noted (r = 0.91). Conclusions and relevance The results indicate low inter-observer reliability for the CS, requiring additional modifications to this part of the KS. The QOL score seems more reliable, and the questionnaire may serve as a reliable tool in the assessment of QOL in cats in a hospital setting. Consequently, further adaptation of the KS is mandatory when simultaneous assessment of both the cat's clinical health and perceived wellbeing is required.
Interobserver Reliability of Peripheral Muscle Strength Tests and Short Physical Performance Battery in Patients With Chronic Obstructive Pulmonary Disease: A Prospective Observational Study.

PubMed

Medina-Mirapeix, Francesc; Bernabeu-Mora, Roberto; Llamazares-Herrán, Eduardo; Sánchez-Martínez, Ma Piedad; García-Vidal, José Antonio; Escolar-Reina, Pilar

2016-11-01

To evaluate the interobserver reliability of the Short Physical Performance Battery (SPPB) and hand dynamometry when measuring isometric muscle strength in people with chronic obstructive pulmonary disease (COPD). Reliability study. Each patient was assessed by a pulmonology physician and a physical therapist in 2 separate sessions 7 to 14 days apart (mean, 9.8±0.8d). Each rater was blinded to the other's results. Pneumology unit of a public hospital. Random sample of outpatients with stable COPD (N=30). Not applicable. SPPB and muscle strength (kg) using electronic handgrip and handheld dynamometers. Reliability was assessed with intraclass correlation coefficients (ICCs), standard error of measurement values, and Bland-Altman plots. ICCs were calculated for the SPPB summary score and for its 3 subscales. The ICCs for the overall reliability of the SPPB summary score and for grip and quadriceps strength were .82 (95% confidence interval [CI], .62-.91), .97 (95% CI, .93-.98), and .76 (95% CI, .49-.88), respectively. The standard error of measurement values were .55 points, 1.30kg, and 1.22kg, respectively. The mean differences between the rater's scores were near zero for grip strength and SPPB summary score measures. The ICCs for the SPPB subscales were .84 (95% CI, .66-.92) for the chair subscale, .75 (95% CI, .48-.88) for gait, and .33 (95% CI, -.42 to .68) for balance. Interobserver reliability was good for quadriceps and handgrip dynamometry and for the SPPB summary score and its chair stand and gait speed subscales. Both pulmonary physicians and physical therapists can obtain and exchange the scores. Because the reliability of the balance subscale was questionable, it is better to use the SPPB summary score. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
The reliability of the Glasgow Coma Scale: a systematic review.

PubMed

Reith, Florence C M; Van den Brande, Ruben; Synnot, Anneliese; Gruen, Russell; Maas, Andrew I R

2016-01-01

The Glasgow Coma Scale (GCS) provides a structured method for assessment of the level of consciousness. Its derived sum score is applied in research and adopted in intensive care unit scoring systems. Controversy exists on the reliability of the GCS. The aim of this systematic review was to summarize evidence on the reliability of the GCS. A literature search was undertaken in MEDLINE, EMBASE and CINAHL. Observational studies that assessed the reliability of the GCS, expressed by a statistical measure, were included. Methodological quality was evaluated with the consensus-based standards for the selection of health measurement instruments checklist and its influence on results considered. Reliability estimates were synthesized narratively. We identified 52 relevant studies that showed significant heterogeneity in the type of reliability estimates used, patients studied, setting and characteristics of observers. Methodological quality was good (n = 7), fair (n = 18) or poor (n = 27). In good quality studies, kappa values were ≥0.6 in 85%, and all intraclass correlation coefficients indicated excellent reliability. Poor quality studies showed lower reliability estimates. Reliability for the GCS components was higher than for the sum score. Factors that may influence reliability include education and training, the level of consciousness and type of stimuli used. Only 13% of studies were of good quality and inconsistency in reported reliability estimates was found. Although the reliability was adequate in good quality studies, further improvement is desirable. From a methodological perspective, the quality of reliability studies needs to be improved. From a clinical perspective, a renewed focus on training/education and standardization of assessment is required.
Spanish validation of the social stigma scale: Community Attitudes towards Mental Illness.

PubMed

Ochoa, Susana; Martínez-Zambrano, Francisco; Vila-Badia, Regina; Arenas, Oti; Casas-Anguera, Emma; García-Morales, Esther; Villellas, Raúl; Martín, José Ramón; Pérez-Franco, María Belén; Valduciel, Tamara; García-Franco, Mar; Miguel, Jose; Balsera, Joaquim; Pascual, Gemma; Julia, Eugènia; Casellas, Diana; Haro, Josep Maria

2016-01-01

The stigma against people with mental illness is very high. In Spain there are currently no tools to assess this construct. The aim of this study was to validate the Spanish version of the Community Attitudes towards Mental Illness questionnaire in an adolescent population and to determine its internal consistency and temporal stability, with an additional analysis by gender. A translation and back-translation of the Community Attitudes towards Mental Illness was performed. A total of 150 students between 14 and 18 years old were evaluated with this tool in two stages. Internal consistency was tested using Cronbach α, and the intraclass correlation coefficient was used for test-retest reliability. Gender-stratified analyses were also performed. The Cronbach α was 0.861 for the first evaluation and 0.909 for the second evaluation. The values of the intraclass correlation coefficient ranged from 0.339 to 0.775 in the item by item analysis, and from 0.81 to 0.88 in the subscales. In the segmentation by gender, the intraclass correlation coefficient ranged from 0.797 to 0.863 for girls and from 0.774 to 0.889 for boys. In conclusion, the Community Attitudes towards Mental Illness is a reliable tool for the assessment of social stigma. Although reliable results have been found for boys and girls, our results found some gender differences in the analysis. Copyright © 2014 SEP y SEPB. Published by Elsevier España. All rights reserved.
Development of Reliable and Validated Tools to Evaluate Technical Resuscitation Skills in a Pediatric Simulation Setting: Resuscitation and Emergency Simulation Checklist for Assessment in Pediatrics.

PubMed

Faudeux, Camille; Tran, Antoine; Dupont, Audrey; Desmontils, Jonathan; Montaudié, Isabelle; Bréaud, Jean; Braun, Marc; Fournier, Jean-Paul; Bérard, Etienne; Berlengi, Noémie; Schweitzer, Cyril; Haas, Hervé; Caci, Hervé; Gatin, Amélie; Giovannini-Chami, Lisa

2017-09-01

To develop a reliable and validated tool to evaluate technical resuscitation skills in a pediatric simulation setting. Four Resuscitation and Emergency Simulation Checklist for Assessment in Pediatrics (RESCAPE) evaluation tools were created, following international guidelines: intraosseous needle insertion, bag mask ventilation, endotracheal intubation, and cardiac massage. We applied a modified Delphi methodology evaluation to binary rating items. Reliability was assessed comparing the ratings of 2 observers (1 in real time and 1 after a video-recorded review). The tools were assessed for content, construct, and criterion validity, and for sensitivity to change. Inter-rater reliability, evaluated with Cohen kappa coefficients, was perfect or near-perfect (>0.8) for 92.5% of items and each Cronbach alpha coefficient was ≥0.91. Principal component analyses showed that all 4 tools were unidimensional. Significant increases in median scores with increasing levels of medical expertise were demonstrated for RESCAPE-intraosseous needle insertion (P = .0002), RESCAPE-bag mask ventilation (P = .0002), RESCAPE-endotracheal intubation (P = .0001), and RESCAPE-cardiac massage (P = .0037). Significantly increased median scores over time were also demonstrated during a simulation-based educational program. RESCAPE tools are reliable and validated tools for the evaluation of technical resuscitation skills in pediatric settings during simulation-based educational programs. They might also be used for medical practice performance evaluations. Copyright © 2017 Elsevier Inc. All rights reserved.
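
The inter-rater agreement on binary checklist items described above was summarised with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch for two raters follows; the function name `cohen_kappa` and the example ratings are illustrative assumptions, not RESCAPE data:

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items (any labels)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_exp == 1:
        return 1.0  # both raters used one identical label throughout
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical item ratings (1 = done, 0 = not done), for illustration only.
live_rater = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
video_rater = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(round(cohen_kappa(live_rater, video_rater), 3))  # 0.583
```

Here the raters agree on 8 of 10 items (raw agreement 0.80), yet kappa is only about 0.58 because chance alone would produce 0.52 agreement with these marginals.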
Standardised clients as assessors in a veterinary communication OSCE: a reliability and validity study.

PubMed

Artemiou, E; Adams, C L; Hecker, K G; Vallevand, A; Violato, C; Coe, J B

2014-11-22

In human medicine, standardised patients (SP) have been shown to reliably and accurately assess learners' communication performance in high-stakes certification Objective Structured Clinical Examinations (OSCE), offering a feasible way to reduce the need for recruitment, time commitment and coordination of faculty assessors. In this study, we evaluated the use of standardised clients (SC) as a viable option for assessing veterinary students' communication performance. We designed a four-station, two-track communication skills OSCE. SC assessors used an adapted nine-item Liverpool Undergraduate Communication Assessment Scale (LUCAS). Faculty used a 21-item checklist derived from the Calgary-Cambridge Guide (CCG) and a five-point global rating scale. Participants were second year veterinary students (n=96). For the four stations, intrastation reliability (α) ranged from 0.63 to 0.82 for the LUCAS, and 0.73 to 0.87 for the CCG. The interstation reliability coefficients were 0.85 for the LUCAS and 0.89 for the CCG. The calculated Generalisability (G) coefficients were 0.62 for the LUCAS and 0.60 for the CCG. Supporting construct validity, SC and faculty assessors showed a significant correlation between the LUCAS and CCG total percent scores (r=0.45, P<0.001), and likewise between the LUCAS and global rating scores (r=0.49, P<0.001). Study results support that SC assessors offer a reliable and valid approach for assessing veterinary communication OSCE. British Veterinary Association.
Nuclear anxiety: a test-construction study

DOE Office of Scientific and Technical Information (OSTI.GOV)

Braunstein, A.L.

1986-01-01

The Nuclear Anxiety Scale was administered to 263 undergraduate and graduate students (on eight occasions in December, 1985 and January, 1986). (1) The obtained alpha coefficient was .91. This was significant at the .01 level, and demonstrated that the scale was internally homogeneous and consistent. (2) Item discrimination indices (point biserial correlation coefficients) computed for the thirty (30) items yielded a range of .25 to .64. All coefficients were significant at the .01 level, and all 30 items were retained as demonstrating significant discriminability. (3) The correlation between two administrations of the scale (with a 48-hour interval) was .83. This was significant at the .01 level, and demonstrated test-retest reliability and stability over time. (4) The point-biserial correlation coefficient between scores on the Nuclear Anxiety Scale, and the students' self-report of nuclear anxiety as being either a high or low ranked stressor, was .59. This was significant at the .01 level, and demonstrated concurrent validity. (5) The correlation coefficient between scores on the Nuclear Anxiety Scale and the Spielberger State-Trait Anxiety Inventory, A-Trait, (1970), was .41. This was significant at the .01 level, and demonstrated convergent validity. (6) The correlation coefficient between positively stated and negatively stated items (with scoring reversed) was .76. This was significant at the .01 level, and demonstrated freedom from response set bias.
Cross-cultural adaptation and reliability and validity of the Dutch Patient-Rated Tennis Elbow Evaluation (PRTEE-D).

PubMed

van Ark, Mathijs; Zwerver, Johannes; Diercks, Ronald L; van den Akker-Scheek, Inge

2014-08-11

Lateral Epicondylalgia (LE) is a common injury for which no reliable and valid measure exists to determine severity in the Dutch language. The Patient-Rated Tennis Elbow Evaluation (PRTEE) is the first questionnaire specifically designed for LE but in English. The aim of this study was to translate into Dutch and cross-culturally adapt the PRTEE and determine reliability and validity of the PRTEE-D (Dutch version). The PRTEE was cross-culturally adapted according to international guidelines. Participants (n = 122) were asked to fill out the PRTEE-D twice with a one week interval to assess test-retest reliability. Internal consistency of the PRTEE-D was determined by calculating Cronbach's alphas for the questionnaire and subscales. Intraclass Correlation Coefficients (ICC) were calculated for the overall PRTEE-D score, pain and function subscale and individual questions to determine test-retest reliability. Additionally, the Disabilities of the Arm, Shoulder and Hand questionnaire (DASH) and Visual Analogue Scale (VAS) pain scores were obtained from 30 patients to assess construct validity; Spearman's correlation coefficients were calculated between the PRTEE-D (subscales) and DASH and VAS-pain scores. The PRTEE was successfully cross-culturally adapted into Dutch (PRTEE-D). Cronbach's alpha for the first assessment of the PRTEE-D was 0.98; Cronbach's alpha was 0.93 for the pain subscale and 0.97 for the function subscale. ICC for the PRTEE-D was 0.98; subscales also showed excellent ICC values (pain scale 0.97 and function scale 0.97). A significant moderate correlation exists between PRTEE-D and DASH (0.65) and PRTEE-D and VAS pain (0.68). The PRTEE was successfully cross-culturally adapted and this study showed that the PRTEE-D is reliable and valid to obtain an indication of severity of LE. An easy-to-use instrument for practitioners is now available and this facilitates comparing Dutch and international research data.
Reliability, validity, and sensitivity to change of the lower extremity functional scale in individuals affected by stroke.

PubMed

Verheijde, Joseph L; White, Fred; Tompkins, James; Dahl, Peder; Hentz, Joseph G; Lebec, Michael T; Cornwall, Mark

2013-12-01

To investigate reliability, validity, and sensitivity to change of the Lower Extremity Functional Scale (LEFS) in individuals affected by stroke. The secondary objective was to test the validity and sensitivity of a single-item linear analog scale (LAS) of function. Prospective cohort reliability and validation study. A single rehabilitation department in an academic medical center. Forty-three individuals receiving neurorehabilitation for lower extremity dysfunction after stroke were studied. Their ages ranged from 32 to 95 years, with a mean of 70 years; 77% were men. Test-retest reliability was assessed by calculating the classical intraclass correlation coefficient, and the Bland-Altman limits of agreement. Validity was assessed by calculating the Pearson correlation coefficient between the instruments. Sensitivity to change was assessed by comparing baseline scores with end of treatment scores. Measurements were taken at baseline, after 1-3 days, and at 4 and 8 weeks. The LEFS, Short-Form-36 Physical Function Scale, Berg Balance Scale, Six-Minute Walk Test, Five-Meter Walk Test, Timed Up-and-Go test, and the LAS of function were used. The test-retest reliability of the LEFS was found to be excellent (ICC = 0.96). Correlated with the 6 other measures of function studied, the validity of the LEFS was found to be moderate to high (r = 0.40-0.71). Regarding the sensitivity to change, the mean LEFS scores from baseline to study end increased 1.2 SD and for LAS 1.1 SD. LEFS exhibits good reliability, validity, and sensitivity to change in patients with lower extremity impairments secondary to stroke. Therefore, the LEFS can be a clinically efficient outcome measure in the rehabilitation of patients with subacute stroke. The LAS is shown to be a time-saving and reasonable option to track changes in a patient's functional status. Copyright © 2013 American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
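
The Bland-Altman limits of agreement mentioned above complement the ICC by expressing test-retest disagreement in the units of the scale itself. A minimal sketch of the usual calculation (mean difference plus or minus 1.96 standard deviations of the differences) is given below; the function name `bland_altman_limits` and the paired scores are illustrative assumptions, not LEFS data:

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements.

    first, second: equal-length sequences of scores from two administrations.
    Limits are bias +/- 1.96 * sample SD of the paired differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least 2 paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical test and retest scores for four people, for illustration only.
test_scores = [10, 12, 14, 16]
retest_scores = [9, 13, 13, 17]
bias, lower, upper = bland_altman_limits(test_scores, retest_scores)
print(bias, round(lower, 2), round(upper, 2))
```

A bias near zero with narrow limits indicates that repeated administrations give interchangeable scores. How narrow is narrow enough depends on the clinically meaningful change for the instrument.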
Reliability and validation of the Dutch Achilles tendon Total Rupture Score.

PubMed

Opdam, K T M; Zwiers, R; Wiegerinck, J I; Kleipool, A E B; Haverlag, R; Goslings, J C; van Dijk, C N

2018-03-01

Patient-reported outcome measures (PROMs) have become a cornerstone for the evaluation of the effectiveness of treatment. The Achilles tendon Total Rupture Score (ATRS) is a PROM for outcome and assessment of an Achilles tendon rupture. The aim of this study was to translate the ATRS to Dutch and evaluate its reliability and validity in the Dutch population. A forward-backward translation procedure was performed according to the guidelines of cross-cultural adaptation process. The Dutch ATRS was evaluated for reliability and validity in patients treated for a total Achilles tendon rupture from 1 January 2012 to 31 December 2014 in one teaching hospital and one academic hospital. Reliability was assessed by the intraclass correlation coefficients (ICC), Cronbach's alpha and minimal detectable change (MDC). We assessed construct validity by calculation of Spearman's rho correlation coefficient with domains of the Foot and Ankle Outcome Score (FAOS), Victorian Institute of Sports Assessment-Achilles questionnaire (VISA-A) and Numeric Rating Scale (NRS) for pain in rest and during running. The Dutch ATRS had a good test-retest reliability (ICC = 0.852) and a high internal consistency (Cronbach's alpha = 0.96). MDC was 30.2 at individual level and 3.5 at group level. Construct validity was supported by 75 % of the hypothesized correlations. The Dutch ATRS had a strong correlation with NRS for pain during running (r = -0.746) and all the five subscales of the Dutch FAOS (r = 0.724-0.867). There was a moderate correlation with the VISA-A-NL (r = 0.691) and NRS for pain in rest (r = -0.580). The Dutch ATRS shows an adequate reliability and validity and can be used in the Dutch population for measuring the outcome of treatment of a total Achilles tendon rupture and for research purposes. Diagnostic study, Level I.
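
The abstract above reports a minimal detectable change (MDC) at both individual and group level without stating the formulas. One widely used convention derives the standard error of measurement (SEM) from the score SD and the ICC, scales it to a 95% MDC, and divides by the square root of the sample size for the group-level value. The sketch below follows that convention as an assumption; the function names and input numbers are hypothetical and are not taken from the study:

```python
from math import sqrt


def sem_from_icc(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * sqrt(1 - icc)


def mdc95_individual(sem):
    """95% minimal detectable change for one person: 1.96 * sqrt(2) * SEM."""
    return 1.96 * sqrt(2) * sem


def mdc95_group(mdc_individual, n):
    """Group-level MDC under the common MDC_individual / sqrt(n) convention."""
    return mdc_individual / sqrt(n)


# Hypothetical inputs: score SD of 10 points, ICC of 0.84, 16 participants.
sem = sem_from_icc(10, 0.84)
mdc_ind = mdc95_individual(sem)
mdc_grp = mdc95_group(mdc_ind, 16)
print(round(sem, 2), round(mdc_ind, 2), round(mdc_grp, 2))  # 4.0 11.09 2.77
```

Under this convention the group-level MDC shrinks with sample size, which is why an instrument can be too imprecise to track a single patient while still being usable for comparing groups.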
Training less-experienced faculty improves reliability of skills assessment in cardiac surgery.

PubMed

Lou, Xiaoying; Lee, Richard; Feins, Richard H; Enter, Daniel; Hicks, George L; Verrier, Edward D; Fann, James I

2014-12-01

Previous work has demonstrated high inter-rater reliability in the objective assessment of simulated anastomoses among experienced educators. We evaluated the inter-rater reliability of less-experienced educators and the impact of focused training with a video-embedded coronary anastomosis assessment tool. Nine less-experienced cardiothoracic surgery faculty members from different institutions evaluated 2 videos of simulated coronary anastomoses (1 by a medical student and 1 by a resident) at the Thoracic Surgery Directors Association Boot Camp. They then underwent a 30-minute training session using an assessment tool with embedded videos to anchor rating scores for 10 components of coronary artery anastomosis. Afterward, they evaluated 2 videos of a different student and resident performing the task. Components were scored on a 1 to 5 Likert scale, yielding an average composite score. Inter-rater reliabilities of component and composite scores were assessed using intraclass correlation coefficients (ICCs) and overall pass/fail ratings with kappa. All components of the assessment tool exhibited improvement in reliability, with 4 (bite, needle holder use, needle angles, and hand mechanics) improving the most from poor (ICC range, 0.09-0.48) to strong (ICC range, 0.80-0.90) agreement. After training, inter-rater reliabilities for composite scores improved from moderate (ICC, 0.76) to strong (ICC, 0.90) agreement, and for overall pass/fail ratings, from poor (kappa = 0.20) to moderate (kappa = 0.78) agreement. Focused, video-based anchor training facilitates greater inter-rater reliability in the objective assessment of simulated coronary anastomoses. Among raters with less teaching experience, such training may be needed before objective evaluation of technical skills. Published by Elsevier Inc.
Preliminary validation of 2 magnetic resonance image scoring systems for osteoarthritis of the hip according to the OMERACT filter.

PubMed

Maksymowych, Walter P; Cibere, Jolanda; Loeuille, Damien; Weber, Ulrich; Zubler, Veronika; Roemer, Frank W; Jaremko, Jacob L; Sayre, Eric C; Lambert, Robert G W

2014-02-01

Development of a validated magnetic resonance image (MRI) scoring system is essential in hip OA because radiographs are insensitive to change. We assessed the feasibility and reliability of 2 previously developed scoring methods: (1) the Hip Inflammation MRI Scoring System (HIMRISS) and (2) the Hip Osteoarthritis MRI Scoring System (HOAMS). Six readers (3 radiologists, 3 rheumatologists) participated in 2 reading exercises. In Reading Exercise 1, MRI of the hip of 20 subjects were read at a single time point followed by further standardization of methodology. In Reading Exercise 2, MRI of the hip of 18 subjects from a randomized controlled trial, assessed at 2 timepoints, and 27 subjects from a cross-sectional study were read for HIMRISS and HOAMS bone marrow lesions (BML) and synovitis. Reliability was assessed using intraclass correlation coefficient (ICC) and kappa statistics. Both methods were considered feasible. For Reading 1, HIMRISS ICC were 0.52, 0.61, 0.70, and 0.58 for femoral BML, acetabular BML, effusion, and total scores, respectively; and for HOAMS, summed BML and synovitis ICC were 0.52 and 0.46, respectively. For Reading 2, HIMRISS and HOAMS ICC for BML and synovitis-effusion improved substantially. Interobserver reliability for change scores was 0.81 and 0.71 for HIMRISS femoral and HOAMS summed BML, respectively. Responsiveness and discrimination were moderate to high for synovitis-effusion. Significant associations were noted between BML or synovitis scores and Western Ontario and McMaster Universities Osteoarthritis Index pain scores for baseline values (p ≤ 0.001). The BML and synovitis-effusion components of both HIMRISS and HOAMS scoring systems are feasible and reliable, and should be validated further.
The Comprehensive Snack Parenting Questionnaire (CSPQ): Development and Test-Retest Reliability.

PubMed

Gevers, Dorus W M; Kremers, Stef P J; de Vries, Nanne K; van Assema, Patricia

2018-04-26

The narrow focus of existing food parenting instruments led us to develop a food parenting practices instrument measuring the full range of food practices constructs with a focus on snacking behavior. We present the development of the questionnaire and our research on the test-retest reliability. The developed Comprehensive Snack Parenting Questionnaire (CSPQ) covers 21 constructs. Test-retest reliability was assessed by calculating intra class correlation coefficients and percentage agreement after two administrations of the CSPQ among a sample of 66 Dutch parents. Test-retest reliability analysis revealed acceptable intra class correlation coefficients (≥0.41) or agreement scores (≥0.60) for all items. These results, together with earlier work, suggest sufficient psychometric characteristics. The comprehensive, but brief CSPQ opens up chances for highly essential but unstudied research questions to understand and predict children’s snack intake. Example applications include studying the interactional nature of food parenting practices or interactions of food parenting with general parenting or child characteristics.
Validation of an instrument to measure quality of life in British children with inflammatory bowel disease.

PubMed

Ogden, C A; Akobeng, A K; Abbott, J; Aggett, P; Sood, M R; Thomas, A G

2011-09-01

To validate IMPACT-III (UK), a health-related quality of life (HRQoL) instrument, in British children with inflammatory bowel disease (IBD). One hundred six children and parents were invited to participate. IMPACT-III (UK) was validated by inspection by health professionals and children to assess face and content validity, factor analysis to determine optimum domain structure, use of Cronbach alpha coefficients to test internal reliability, ANOVA to assess discriminant validity, correlation with the Child Health Questionnaire to assess concurrent validity, and use of intraclass correlation coefficients to assess test-retest reliability. The independent samples t test was used to measure differences between sexes and age groups, and between paper and computerised versions of IMPACT-III (UK). IMPACT-III (UK) had good face and content validity. The most robust factor solution was a 5-domain structure: body image, embarrassment, energy, IBD symptoms, and worries/concerns about IBD, all of which demonstrated good internal reliability (α = 0.74-0.88). Discriminant validity was demonstrated by significant (P < 0.05, P < 0.01) differences in HRQoL scores between the severe, moderate, and inactive/mild symptom severity groups for the embarrassment scale (63.7 vs 81.0 vs 81.2), IBD symptom scale (45.0 vs 64.2 vs 80.6), and the energy scale (46.4 vs 62.1 vs 77.7). Concurrent validity of IMPACT-III (UK) with comparable domains of the Child Health Questionnaire was confirmed. Test-retest reliability was confirmed with good intraclass correlation coefficients of 0.66 to 0.84. Paper and computer versions of IMPACT-III (UK) collected comparable scores, and there were no differences between the sexes and age groups. IMPACT-III (UK) appears to be a useful tool to measure HRQoL in British children with IBD.
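The internal-reliability values in the record above (α = 0.74-0.88) are Cronbach alpha coefficients. The following is a minimal sketch of how such a coefficient can be computed from item-level responses, assuming the standard formula α = k/(k-1) · (1 - sum of item variances / variance of the total score). The function name `cronbach_alpha` and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent, k item scores each).

    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Transpose rows into per-item columns.
    items = list(zip(*responses))
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 4 respondents, 3 items that move together perfectly.
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(round(cronbach_alpha(toy), 3))
```

Items that rise and fall together push the coefficient toward 1, while unrelated items push it toward 0; the sketch does not handle missing responses.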
Translation, Cross-Cultural Adaptation, and Validation of the Activity Rating Scale for Disorders of the Knee.

PubMed

Flosadottir, Vala; Roos, Ewa M; Ageberg, Eva

2017-09-01

The Activity Rating Scale (ARS) for disorders of the knee evaluates the level of activity by the frequency of participation in 4 separate activities with high demands on knee function, with a score ranging from 0 (none) to 16 (pivoting activities 4 times/wk). To translate and cross-culturally adapt the ARS into Swedish and to assess measurement properties of the Swedish version of the ARS. Cohort study (diagnosis); Level of evidence, 2. The COSMIN guidelines were followed. Participants (N = 100 [55 women]; mean age, 27 years) who were undergoing rehabilitation for a knee injury completed the ARS twice for test-retest reliability. The Knee injury and Osteoarthritis Outcome Score (KOOS), Tegner Activity Scale (TAS), and modernized Saltin-Grimby Physical Activity Level Scale (SGPALS) were administered at baseline to validate the ARS. Construct validity and responsiveness of the ARS were evaluated by testing predefined hypotheses regarding correlations between the ARS, KOOS, TAS, and SGPALS. The Cronbach alpha, intraclass correlation coefficients, absolute reliability, standard error of measurement, smallest detectable change, and Spearman rank-order correlation coefficients were calculated. The ARS showed good internal consistency (α ≈ 0.96), good test-retest reliability (intraclass correlation coefficient >0.9), and no systematic bias between measurements. The standard error of measurement was less than 2 points, and the smallest detectable change was less than 1 point at the group level and less than 5 points at the individual level. More than 75% of the hypotheses were confirmed, indicating good construct validity and good responsiveness of the ARS. The Swedish version of the ARS is valid, reliable, and responsive for evaluating the level of activity based on the frequency of participation in high-demand knee sports activities in young adults with a knee injury.
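The record above reports a standard error of measurement (SEM) and a smallest detectable change (SDC) at group and individual level. The abstract does not state which formulas were used; the sketch below assumes the commonly used ones, SEM = SD · sqrt(1 - ICC), SDC (individual) = 1.96 · sqrt(2) · SEM, and SDC (group) = SDC (individual) / sqrt(n). The function names and the input values are hypothetical illustrations, not the study's data.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem, n=1, z=1.96):
    """Smallest detectable change; n=1 gives the individual level, n>1 the group level."""
    return z * math.sqrt(2.0) * sem / math.sqrt(n)


# Hypothetical inputs: score SD of 4.0 points, ICC of 0.91, 100 participants.
sem = sem_from_icc(4.0, 0.91)
print(round(sem, 2))
print(round(smallest_detectable_change(sem), 2))
print(round(smallest_detectable_change(sem, n=100), 2))
```

Because the group-level value divides by sqrt(n), it is much smaller than the individual-level value, which matches the pattern reported in the abstract (less than 1 point versus less than 5 points).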
Psychometric properties of the Calgary Cambridge guides to assess communication skills of undergraduate medical students.

PubMed

Simmenroth-Nayda, Anne; Heinemann, Stephanie; Nolte, Catharina; Fischer, Thomas; Himmel, Wolfgang

2014-12-06

The aim of this study was to analyse the psychometric properties of the short version of the Calgary Cambridge Guides and to decide whether it can be recommended for use in the assessment of communication skills in young undergraduate medical students. Using a translated version of the Guide, 30 members from the Department of General Practice rated 5 videotaped encounters between students and simulated patients twice. Item analysis was used to detect possible floor and/or ceiling effects. The construct validity was investigated using exploratory factor analysis. Intra-rater reliability was measured at an interval of 3 months; inter-rater reliability was assessed by the intraclass correlation coefficient. The score distribution of the items showed no ceiling or floor effects. Four of the five factors extracted from the factor analysis represented important constructs of doctor-patient communication. The ratings for the first and second round of assessing the videos correlated at 0.75 (p<0.0001). Intraclass correlation coefficients for each item were moderate and ranged from 0.05 to 0.57. Reasonable score distributions of most items without ceiling or floor effects as well as a good test-retest reliability and construct validity recommend the C-CG as an instrument for assessing communication skills in undergraduate medical students. Some deficiencies in inter-rater reliability are a clear indication that raters need a thorough instruction before using the C-CG.

Reliability and Accuracy of Cross-sectional Radiographic Assessment of Severe Knee Osteoarthritis: Role of Training and Experience.

PubMed

Klara, Kristina; Collins, Jamie E; Gurary, Ellen; Elman, Scott A; Stenquist, Derek S; Losina, Elena; Katz, Jeffrey N

2016-07-01

To determine the reliability of radiographic assessment of knee osteoarthritis (OA) by nonclinician readers compared to an experienced radiologist. The radiologist trained 3 nonclinicians to evaluate radiographic characteristics of knee OA. The radiologist and nonclinicians read preoperative films of 36 patients prior to total knee replacement. Intrareader and interreader reliability were measured using the weighted κ statistic and intraclass correlation coefficient (ICC). Scores κ < 0.20 indicated slight agreement, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.0 almost perfect agreement. Intrareader reliability among nonclinicians (κ) ranged from 0.40 to 1.0 for individual radiographic features and 0.72 to 1.0 for Kellgren-Lawrence (KL) grade. ICC ranged from 0.89 to 0.98 for the Osteoarthritis Research Society International (OARSI) summary score. Interreader agreement among nonclinicians ranged from κ of 0.45 to 0.94 for individual features, and 0.66 to 0.97 for KL grade. ICC ranged from 0.87 to 0.96 for the OARSI Summary Score. Interreader reliability between nonclinicians and the radiologist ranged from κ of 0.56 to 0.85 for KL grade. ICC ranged from 0.79 to 0.88 for the OARSI Summary Score. Intrareader and interreader agreement was variable for individual radiograph features but substantial for summary KL grade and OARSI Summary Score. Investigators face tradeoffs between cost and reader experience. These data suggest that in settings where costs are constrained, trained nonclinicians may be suitable readers of radiographic knee OA, particularly if a summary score (KL grade or OARSI Score) is used to determine radiographic severity.
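The agreement bands quoted in the record above (slight, fair, moderate, substantial, almost perfect) can be written as a small lookup. This is only a sketch: the function name `agreement_label` is hypothetical, and because the abstract leaves the exact boundary values (for example κ = 0.20) unassigned, the code assumes each band includes its upper bound.

```python
def agreement_label(kappa):
    """Map a kappa value to the agreement band quoted in the abstract.

    Assumption: each band is closed at its upper bound (e.g. 0.20 counts as slight).
    """
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Values reported in the abstract for KL grade between nonclinicians and the radiologist.
for k in (0.56, 0.85):
    print(k, agreement_label(k))
```

Under these assumed boundaries, the reported range of 0.56 to 0.85 spans the moderate to almost perfect bands.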
Quality Evaluation Scores are no more Reliable than Gestalt in Evaluating the Quality of Emergency Medicine Blogs: A METRIQ Study.

PubMed

Thoma, Brent; Sebok-Syer, Stefanie S; Colmers-Gray, Isabelle; Sherbino, Jonathan; Ankel, Felix; Trueger, N Seth; Grock, Andrew; Siemens, Marshall; Paddock, Michael; Purdy, Eve; Kenneth Milne, William; Chan, Teresa M

2018-01-30

Construct: We investigated the quality of emergency medicine (EM) blogs as educational resources. Online medical education resources such as blogs are increasingly used by EM trainees and clinicians. However, quality evaluations of these resources using gestalt are unreliable. We investigated the reliability of two previously derived quality evaluation instruments for blogs. Sixty English-language EM websites that published clinically oriented blog posts between January 1 and February 24, 2016, were identified. A random number generator selected 10 websites, and the 2 most recent clinically oriented blog posts from each site were evaluated using gestalt, the Academic Life in Emergency Medicine (ALiEM) Approved Instructional Resources (AIR) score, and the Medical Education Translational Resources: Impact and Quality (METRIQ-8) score, by a sample of medical students, EM residents, and EM attendings. Each rater evaluated all 20 blog posts with gestalt and 15 of the 20 blog posts with the ALiEM AIR and METRIQ-8 scores. Pearson's correlations were calculated between the average scores for each metric. Single-measure intraclass correlation coefficients (ICCs) evaluated the reliability of each instrument. Our study included 121 medical students, 88 EM residents, and 100 EM attendings who completed ratings. The average gestalt rating of each blog post correlated strongly with the average scores for ALiEM AIR (r = .94) and METRIQ-8 (r = .91). Single-measure ICCs were fair for gestalt (0.37, IQR 0.25-0.56), ALiEM AIR (0.41, IQR 0.29-0.60) and METRIQ-8 (0.40, IQR 0.28-0.59). The average scores of each blog post correlated strongly with gestalt ratings. However, neither ALiEM AIR nor METRIQ-8 showed higher reliability than gestalt. Improved reliability may be possible through rater training and instrument refinement.
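The record above reports single-measure intraclass correlation coefficients. The abstract does not say which ICC model was fitted, so the sketch below assumes the simplest case, a one-way random-effects single-measure ICC on a complete table (every target rated the same number of times): ICC = (MSB - MSW) / (MSB + (k - 1) · MSW). The function name `icc_single_oneway` and the toy ratings are hypothetical.

```python
def icc_single_oneway(ratings):
    """One-way random-effects, single-measure ICC.

    ratings: list of rows, one per rated target, each with k ratings.
    Assumes a complete, balanced table (no missing ratings).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two targets and two ratings per target")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-target and within-target mean squares.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("all ratings are identical; ICC is undefined")
    return (msb - msw) / denom


# Hypothetical toy data: 2 blog posts, 2 raters each.
print(round(icc_single_oneway([[1, 2], [3, 4]]), 3))
```

A design in which raters score only a subset of items, as in this study, would need a model that handles unbalanced data, which this sketch does not attempt.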
Validation and reliability of a Behcet's Syndrome Activity Scale in Korea.

PubMed

Choi, Hyo Jin; Seo, Mi Ryoung; Ryu, Hee Jung; Baek, Han Joo

2016-01-01

We prepared a cross-cultural adaptation of the Behcet's Syndrome Activity Scale (BSAS) and evaluated its reliability and validity in Korea. Fifty patients with Behcet's disease (BD) who attended the Rheumatology Clinic of Gachon University Gil Medical Center were included in this study. The first BSAS questionnaire was administered at each clinic visit, and the second questionnaire was completed at home within 24 hours of the visit. A Behcet's Disease Current Activity Form (BDCAF) and a Behcet's Disease Quality of Life (BDQOL) form were also given to patients. The test-retest reliability was analyzed by intraclass correlation coefficients (ICC). To assess the validity, the total BSAS score was compared with the BDCAF score, the patient/physician global assessment, and the BDQOL by Spearman rank correlation. Twelve males and 38 females were enrolled. The mean age was 48.5 years and the mean disease duration was 6.7 years. Thirty-eight patients (76.0%) returned the questionnaire by mail. For the test-retest reliability, the two assessments were significantly correlated on all 10 items of the BSAS questionnaire (p < 0.05) and the total BSAS score (ICC, 0.925; p < 0.001). The total BSAS score was statistically correlated with the BDQOL, BDCAF, and patient/physician global assessment (p < 0.01). The Korean version of BSAS is a reliable and valid instrument to measure BD activity.
Transcultural adaptation to Brazilian Portuguese and reliability of the effort-reward imbalance in household and family work

PubMed Central

de Vasconcellos, Ilmeire Ramos Rosembach; Griep, Rosane Härter; Portela, Luciana; Alves, Márcia Guimarães de Mello; Rotenberg, Lúcia

2016-01-01

ABSTRACT OBJECTIVE To describe the steps in the transcultural adaptation of the scale in the Effort-reward imbalance model to household and family work to the Brazilian context. METHODS We performed the translation, back-translation, and initial psychometric evaluation of the questionnaire that comprised three dimensions: (i) effort (eight items, emphasizing quantitative workload), (ii) reward (11 items that seek to capture the intrinsic value of family and household work, societal esteem, recognition from the spouse/partner, and affection from the children), and (iii) overcommitment (four items related to intrinsic effort). The scale was included in a sectional study conducted with 1,045 nursing workers. A subsample of 222 subjects answered the questionnaire for a second time, seven to 15 days thereafter. The data were collected between October 2012 and May 2013. The internal consistency of the scale was evaluated using Cronbach’s alpha and test-retest reliability analysis, square weighted kappa, prevalence and bias adjusted Kappa, and intraclass correlation coefficient. RESULTS Prevalence and bias-adjusted Kappa (ka) of the scale dimensions ranged from 0.80-0.83 for overcommitment, 0.78-0.90 for effort, and 0.76-0.93 for reward. In most dimensions, the values of minimum and maximum scores, average, standard deviation, and Cronbach’s alpha were similar in test and retest scores. Only on societal esteem subdimension (reward) was there little variation in standard deviation (test score of 2.24 and retest score of 3.36) and in Cronbach’s alpha coefficient (test score of 0.38 and retest score of 0.59). CONCLUSIONS The Brazilian version of the scale was found to have proper reliability indices regarding time stability, which suggests adapting it to be used in population with characteristics that are similar to the one in this study. PMID:27355466
Cross-Cultural Adaptation and Validation of the Voice Handicap Index into Thai.

PubMed

Jaruchinda, Pariyanan; Suwanwarangkool, Thadchai

2015-12-01

The Voice Handicap Index (VHI) is one of the most utilized instruments for measuring a patient's self-assessment of voice severity. The VHI has been translated into several languages, but not into Thai. To examine the psychometric properties of a Thai translation of the Voice Handicap Index (VHI) and assess the applicability in the screening diagnosis. After receiving permission from the American Speech Language Hearing Association (ASHA), the original VHI was translated and adapted to Thai by forward and backward standard translation. Eighty-five patients with voice disorders, divided in four groups according to the etiology of the diseases (neurogenic, structural, functional, and inflammatory), and 30 asymptomatic subjects were included in the present study. Internal consistency was analyzed through Cronbach's α coefficient. For the VHI test-retest reliability analysis, the Thai VHI was completed twice by 22 patients and assessed through the intraclass correlation coefficient. For clinical validity evaluation, the VHI scores from the pathological group were compared with the control group and compared among the four different pathological groups. The cutoff point for distinguishing the normal from the patient group was assessed by ROC analysis. Effects of age and gender on VHI scores were also evaluated. The Thai VHI showed significantly high internal consistency and test-retest reliability (Cronbach's α = 0.96 and r = 0.843, respectively). Mann-Whitney U test was used to compare the control group and pathological groups and revealed a significant difference in total scores and its three domain scores (p < 0.001). ROC analysis demonstrated that a VHI score of 13 should be considered the threshold for revealing the impact of quality of life in voice disorder patients. Age and gender did not affect the VHI scores in either the control or the patient group. The Thai VHI has high reliability and validity. The Thai version of VHI is considered to be a self-assessment tool for the severity of voice disorders in Thai patients.
Reliability of Lactation Assessment Tools Applied to Overweight and Obese Women.

PubMed

Chapman, Donna J; Doughty, Katherine; Mullin, Elizabeth M; Pérez-Escamilla, Rafael

2016-05-01

The interrater reliability of lactation assessment tools has not been evaluated in overweight/obese women. This study aimed to compare the interrater reliability of 4 lactation assessment tools in this population. A convenience sample of 45 women (body mass index > 27.0) was videotaped while breastfeeding (twice daily on days 2, 4, and 7 postpartum). Three International Board Certified Lactation Consultants independently rated each videotaped session using 4 tools (Infant Breastfeeding Assessment Tool [IBFAT], modified LATCH [mLATCH], modified Via Christi [mVC], and Riordan's Tool [RT]). For each day and tool, we evaluated interrater reliability with 1-way repeated-measures analyses of variance, intraclass correlation coefficients (ICCs), and percentage absolute agreement between raters. Analyses of variance showed significant differences between raters' scores on day 2 (all scales) and day 7 (RT). Intraclass correlation coefficient values reflected good (mLATCH) to excellent reliability (IBFAT, mVC, and RT) on days 2 and 7. All day 4 ICCs reflected good reliability. The ICC for mLATCH was significantly lower than all others on day 2 and was significantly lower than IBFAT (day 7). Percentage absolute interrater agreement for scale components ranged from 31% (day 2: observable swallowing, RT) to 92% (day 7: IBFAT, fixing; and mVC, latch time). Swallowing scores on all scales had the lowest levels of interrater agreement (31%-64%). We demonstrated differences in the interrater reliability of 4 lactation assessment tools when applied to overweight/obese women, with the lowest values observed on day 4. Swallowing assessment was particularly unreliable. Researchers and clinicians using these scales should be aware of the differences in their psychometric behavior. © The Author(s) 2015.
Reliability and validity of the Turkish version of the Berg Balance Scale.

PubMed

Sahin, Fusun; Yilmaz, Figen; Ozmaden, Asli; Kotevolu, Nurdan; Sahin, Tulay; Kuran, Banu

2008-01-01

The purpose of this study was to develop a Turkish version of the Berg Balance Scale (BBS) and assess its reliability and validity. Sixty healthy volunteers older than 65 years were included in the study. Subjects who had lower extremity amputation, or were armchair-bound or bedridden, were excluded. After the translation process, the Turkish version of the scale was administered to each participant twice with an interval of 2 weeks. The intraclass correlation coefficient (ICC) was calculated to assess intra- and inter-observer reliability. Cronbach alpha was calculated to evaluate internal consistency of the total BBS score. The intraclass correlation coefficient was calculated to examine test-retest reliability. Convergent validity was assessed by correlating the scale with Modified Barthel Index (MBI) and Timed Up and Go Test (TUG). Construct validity was assessed with factor analysis. The mean age in years of the participants was 77.00+/-5.67 (range: 67-92 yrs). The ICC for intra- and inter-observer reliability was 0.98 (p<0.0001) and 0.97 (p<0.0001), respectively. Cronbach alpha of the Turkish version of the BBS was 0.98. The test-retest reliability (ICC) of the Turkish version of the BBS was determined as 0.98 for the total score, and ranged from 0.86-0.99 for individual items. In terms of validity, the Turkish version of the BBS was correlated with the MBI (in positive direction) and TUG (in negative direction) (r=0.67 p<0.0001; r=-0.75 p<0.0001, respectively). The Turkish version of the BBS is a reliable and valid scale to be used in balance assessment of Turkish older adults.
Reliability and validity of the range of motion scale (ROMS) in patients with abnormal postures.

PubMed

van Rooijen, Diana E; Lalli, Stefania; Marinus, Johan; Maihöfner, Christian; McCabe, Candida S; Munts, Alex G; van der Plas, Anton A; Tijssen, Marina A J; van de Warrenburg, Bart P; Albanese, Alberto; van Hilten, Jacobus J

2015-03-01

Sustained abnormal postures (i.e., fixed dystonia) are the most frequently reported motor abnormalities in complex regional pain syndrome (CRPS), but these symptoms may also develop after peripheral trauma without CRPS. Currently, there is no valid and reliable measurement instrument available to measure the severity and distribution of these postures. The range of motion scale (ROMS) was therefore developed to assess the severity based on the possible active range of motion of all joints (arms, legs, trunk, and neck), and the present study evaluates its reliability and validity. Inter- and intra-rater reliability of the ROMS was determined in 16 patients with abnormal sustained postures, who were videotaped following a standard video protocol in a university hospital. The recordings were rated by a panel of international experts. In addition, 30 patients were clinically tested with both the Burke-Fahn-Marsden (BFM) scale as well as the ROMS to assess construct validity. Inter-rater reliability for total ROMS scores showed an intra-class correlation coefficient (ICC) of 0.85. The majority of the scores for the separate joints (13 out of 18) demonstrated an almost perfect agreement with ICCs ranging from 0.81 to 0.94; of the other items, one showed fair, one moderate, and three substantial agreement. The ICCs for the intra-rater reliability ranged from moderate to almost perfect (0.68-0.98). Spearman's correlation coefficients between corresponding body areas as measured with the ROMS or BFM were all above 0.82. The ROMS is a reliable and valid instrument to evaluate the severity and distribution of sustained abnormal postures. Wiley Periodicals, Inc.
Validity and reliability of the Self-Reported Physical Fitness (SRFit) survey.

PubMed

Keith, NiCole R; Clark, Daniel O; Stump, Timothy E; Miller, Douglas K; Callahan, Christopher M

2014-05-01

An accurate physical fitness survey could be useful in research and clinical care. To estimate the validity and reliability of a Self-Reported Fitness (SRFit) survey; an instrument that estimates muscular fitness, flexibility, cardiovascular endurance, BMI, and body composition (BC) in adults ≥ 40 years of age. 201 participants completed the SF-36 Physical Function Subscale, International Physical Activity Questionnaire (IPAQ), Older Adults' Desire for Physical Competence Scale (Rejeski), the SRFit survey, and the Rikli and Jones Senior Fitness Test. BC, height and weight were measured. SRFit survey items described BC, BMI, and Senior Fitness Test movements. Correlations between the Senior Fitness Test and the SRFit survey assessed concurrent validity. Cronbach's Alpha measured internal consistency within each SRFit domain. SRFit domain scores were compared with SF-36, IPAQ, and Rejeski survey scores to assess construct validity. Intraclass correlations evaluated test-retest reliability. Correlations between SRFit and the Senior Fitness Test domains ranged from 0.35 to 0.79. Cronbach's Alpha scores were .75 to .85. Correlations between SRFit and other survey scores were -0.23 to 0.72 and in the expected direction. Intraclass correlation coefficients were 0.79 to 0.93. All P-values were 0.001. Initial evaluation supports the SRFit survey's validity and reliability.
FLiGS Score: A New Method of Outcome Assessment for Lip Carcinoma–Treated Patients

PubMed Central

Grassi, Rita; Toia, Francesca; Di Rosa, Luigi; Cordova, Adriana

2015-01-01

Background: Lip cancer and its treatment have considerable functional and cosmetic effects with resultant nutritional and physical detriments. As we continue to investigate new treatment regimens, we are simultaneously required to assess postoperative outcomes to design interventions that lessen the adverse impact of this disease process. We wish to introduce Functional Lip Glasgow Scale (FLiGS) score as a new method of outcome assessment to measure the effect of lip cancer and its treatment on patients’ daily functioning. Methods: Fifty patients affected by lip squamous cell carcinoma were recruited between 2009 and 2013. Patients were asked to fill the FLiGS questionnaire before surgery, 1 month, 6 months, and 1 year after surgery. The subscores were used to calculate a total FLiGS score of global oral disability. Statistical analysis was performed to test validity and reliability. Results: FLiGS scores improved significantly from preoperative to 12 months postoperative values (P = 0.000). Statistical evidence of validity was provided through rs (Spearman correlation coefficient) that resulted >0.30 for all surveys and for which P < 0.001. FLiGS score reliability was shown through examination of internal consistency and test-retest reliability. Conclusions: FLiGS score is a simple way of assessing functional impairment related to lip cancer before and after surgery; it is sensitive, valid, reliable, and clinically relevant: it provides useful information to orient the physician in the postoperative management and in the rehabilitation program. PMID:26034652
Cross-cultural adaptation and validation of the Peripheral Artery Questionnaire: Korean version for patients with peripheral vascular diseases.

PubMed

Lee, Ji Hyun; Cho, Kyoung Im; Spertus, John; Kim, Seong Man

2012-08-01

The Peripheral Artery Questionnaire (PAQ), as developed in US English, is a validated scale to evaluate the health status of patients with peripheral artery disease (PAD). The aim of this study was to translate the PAQ into Korean and to evaluate its reliability and validity. A multi-step process of forward-translation, reconciliation, consultation with the developer, back-translation and proofreading was conducted. The test-retest reliability was evaluated at a 2-week interval using the intra-class correlation coefficient (ICC). The validity was assessed by identifying associations between Korean PAQ (KPAQ) scores and Korean Health Assessment Questionnaire (KHAQ) scores. A total of 100 PAD patients were enrolled: 63 without and 37 with severe claudication. The reliability of the KPAQ was adequate, with an ICC of 0.71. There were strong correlations between KPAQ's subscales. Cronbach's alpha for the summary score was 0.94, indicating good internal consistency and congruence with the original US version. The validity was supported by a significant correlation between the total KHAQ score and KPAQ physical function, stability, symptom, social limitation and quality of life scores (r = -0.24 to -0.90; p < 0.001) as well as between the KHAQ walking subscale and the KPAQ physical function score (r = -0.55, p < 0.001). Our results indicate that the KPAQ is a reliable, valid instrument to evaluate the health status of Korean patients with PAD.
Quality-of-life in insect venom allergy: validation of the Turkish version of the "Vespid Allergy Quality of Life Questionnaire" (VQLQ-T).

PubMed

Sin, Betül Ayşe; Öztuna, Derya; Gelincik, Aslı; Gürlek, Feridun; Baysan, Abdullah; Sin, Aytül Zerrin; Aydın, Ömür; Mısırlıgil, Zeynep

2016-01-01

"Vespid Allergy Quality of Life Questionnaire (VQLQ)" has been used to assess psychological burden of disease. The aim of this study was to evaluate validity, reliability and responsiveness to interventions of the Turkish version. The Turkish language Questionnaire (VQLQ-T) was administered to 81 patients with bee allergy and 65 patients with vespid allergy from different groups to achieve cross-sectional validation. To establish longitudinal validity, the questionnaire was administered to 36 patients treated with venom immunotherapy. The cross-sectional validation in patients with vespid venom allergy showed a correlation coefficient of 0.97 (Cronbach α). Spearman's correlation coefficient of the pretreatment VQLQ-T score with Expectation of Outcome (EoO) questionnaire score was 0.55 (p < 0.001). After treatment, correlation between VQLQ-T score and EoO score was 0.64 (p = 0.003) in these patients. The cross-sectional instrument validation for non-beekeepers with bee venom allergy yielded a correlation coefficient of 0.96 (Cronbach α). Spearman's correlation coefficient between pretreatment VQLQ-T score and EoO score was 0.47 (p < 0.001) and after treatment, correlation between VQLQ-T score and EoO score was 0.78 (p = 0.008) in these patients. These findings indicate cross-sectional validity of VQLQ-T. In the longitudinal validation, there was a positive correlation between EoO and VQLQ-T with a correlation coefficient of 0.562 (p < 0.001). While mean (±SD) VQLQ-T score was 5.27 (±1.29) in pretreatment, it was 2.78 (±1.01) after treatment (p < 0.001). The correlation between the mean change in VQLQ-T score and the mean change in EoO score was 0.42 (p = 0.011). The Turkish version of VQLQ-T enables measurement of Quality of Life (QoL) in patients with either vespid or bee venom allergy. Furthermore, responsiveness of this instrument demonstrates the questionnaire's ability to detect changes over time.
Validation of the Penn Acoustic Neuroma Quality-of-Life Scale (PANQOL) for Spanish-Speaking Patients.

PubMed

Medina, Maria Del Mar; Carrillo, Alvaro; Polo, Ruben; Fernandez, Borja; Alonso, Daniel; Vaca, Miguel; Cordero, Adela; Perez, Cecilia; Muriel, Alfonso; Cobeta, Ignacio

2017-04-01

Objective To perform translation, cross-cultural adaptation, and validation of the Penn Acoustic Neuroma Quality-of-Life Scale (PANQOL) to the Spanish language. Study Design Prospective study. Setting Tertiary neurotologic referral center. Subjects and Methods PANQOL was translated and translated back, and a pretest trial was performed. The study included 27 individuals diagnosed with vestibular schwannoma. Inclusion criteria were adults with untreated vestibular schwannoma, diagnosed in the past 12 months. Feasibility, internal consistency, test-retest reliability, construct validity, and ceiling and floor effects were assessed for the present study. Results The mean overall score of the PANQOL was 69.21 (0-100 scale, lowest to highest quality of life). Cronbach's α was 0.87. Intraclass correlation coefficient was performed for each item, with an overall score of 0.92. The κ coefficient scores were between moderate and almost perfect in more than 92% of patients. Anxiety and energy domains of the PANQOL were correlated with both physical and mental components of the SF-12. Hearing, balance, and pain domains were correlated with the SF-12 physical component. Facial and general domains were not significantly correlated with any component of the SF-12. Furthermore, the overall score of the PANQOL was correlated with the physical component of the SF-12. Conclusion Feasibility, internal consistency, reliability, and construct validity outcomes in the current study support the validity of the Spanish version of the PANQOL.
Test-retest reliability at the item level and total score level of the Norwegian version of the Spinal Cord Injury Falls Concern Scale (SCI-FCS).

PubMed

Roaldsen, Kirsti Skavberg; Måøy, Åsa Blad; Jørgensen, Vivien; Stanghelle, Johan Kvalvik

2016-05-01

Translation of the Spinal Cord Injury Falls Concern Scale (SCI-FCS), and investigation of test-retest reliability on item-level and total-score-level. Translation, adaptation and test-retest study. A specialized rehabilitation setting in Norway. Fifty-four wheelchair users with a spinal cord injury. The median age of the cohort was 49 years, and the median number of years after injury was 13. Interventions/measurements: The SCI-FCS was translated and back-translated according to guidelines. Individuals answered the SCI-FCS twice over the course of one week. We investigated item-level test-retest reliability using Svensson's rank-based statistical method for disagreement analysis of paired ordinal data. For relative reliability, we analyzed the total-score-level test-retest reliability with intraclass correlation coefficients (ICC2.1), the standard error of measurement (SEM), and the smallest detectable change (SDC) for absolute reliability/measurement-error assessment and Cronbach's alpha for internal consistency. All items showed satisfactory percentage agreement (≥69%) between test and retest. There were small but non-negligible systematic disagreements among three items; we recovered an 11-13% higher chance for a lower second score. There was no disagreement due to random variance. The test-retest agreement (ICC2.1) was excellent (0.83). The SEM was 2.6 (12%), and the SDC was 7.1 (32%). The Cronbach's alpha was high (0.88). The Norwegian SCI-FCS is highly reliable for wheelchair users with chronic spinal cord injuries.
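The SEM and SDC figures in the record above can be related with the conventional formulas SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM. The following is a minimal sketch under the assumption that the authors used these standard definitions (the abstract does not state the formulas); the function names are illustrative only.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a sample SD and a reliability coefficient.

    Assumes the conventional definition SEM = SD * sqrt(1 - ICC).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def sdc95(sem: float) -> float:
    """Smallest detectable change for an individual at the 95% level.

    Assumes the conventional definition SDC = 1.96 * sqrt(2) * SEM.
    """
    return 1.96 * math.sqrt(2.0) * sem


# SEM reported in the abstract (2.6 points); the SD is not reported there.
print(round(sdc95(2.6), 1))  # 7.2
```

With the rounded SEM of 2.6 the sketch gives about 7.2, close to the reported SDC of 7.1; the small gap may reflect rounding of the SEM before publication.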
The Arthroscopic Surgical Skill Evaluation Tool (ASSET).

PubMed

Koehler, Ryan J; Amsdell, Simon; Arendt, Elizabeth A; Bisson, Leslie J; Braman, Jonathan P; Bramen, Jonathan P; Butler, Aaron; Cosgarea, Andrew J; Harner, Christopher D; Garrett, William E; Olson, Tyson; Warme, Winston J; Nicandri, Gregg T

2013-06-01

Surgeries employing arthroscopic techniques are among the most commonly performed in orthopaedic clinical practice; however, valid and reliable methods of assessing the arthroscopic skill of orthopaedic surgeons are lacking. The Arthroscopic Surgery Skill Evaluation Tool (ASSET) will demonstrate content validity, concurrent criterion-oriented validity, and reliability when used to assess the technical ability of surgeons performing diagnostic knee arthroscopic surgery on cadaveric specimens. Cross-sectional study; Level of evidence, 3. Content validity was determined by a group of 7 experts using the Delphi method. Intra-articular performance of a right and left diagnostic knee arthroscopic procedure was recorded for 28 residents and 2 sports medicine fellowship-trained attending surgeons. Surgeon performance was assessed by 2 blinded raters using the ASSET. Concurrent criterion-oriented validity, interrater reliability, and test-retest reliability were evaluated. Content validity: The content development group identified 8 arthroscopic skill domains to evaluate using the ASSET. Concurrent criterion-oriented validity: Significant differences in the total ASSET score (P < .05) between novice, intermediate, and advanced experience groups were identified. Interrater reliability: The ASSET scores assigned by each rater were strongly correlated (r = 0.91, P < .01), and the intraclass correlation coefficient between raters for the total ASSET score was 0.90. Test-retest reliability: There was a significant correlation between ASSET scores for both procedures attempted by each surgeon (r = 0.79, P < .01). The ASSET appears to be a useful, valid, and reliable method for assessing surgeon performance of diagnostic knee arthroscopic surgery in cadaveric specimens. Studies are ongoing to determine its generalizability to other procedures as well as to the live operating room and other simulated environments.
The medial tibial stress syndrome score: a new patient-reported outcome measure.

PubMed

Winters, Marinus; Moen, Maarten H; Zimmermann, Wessel O; Lindeboom, Robert; Weir, Adam; Backx, Frank Jg; Bakker, Eric Wp

2016-10-01

At present, there is no validated patient-reported outcome measure (PROM) for patients with medial tibial stress syndrome (MTSS). Our aim was to select and validate previously generated items and create a valid, reliable and responsive PROM for patients with MTSS: the MTSS score. A prospective cohort study was performed in multiple sports medicine, physiotherapy and military facilities in the Netherlands. Participants with MTSS filled out the previously generated items for the MTSS score on 3 occasions. From previously generated items, we selected the best items. We assessed the MTSS score for its validity, reliability and responsiveness. The MTSS score was filled out by 133 participants with MTSS. Factor analysis showed the MTSS score to exhibit a single-factor structure with acceptable internal consistency (α=0.58) and good test-retest reliability (intraclass correlation coefficient=0.81). The MTSS score ranges from 0 to 10 points. The smallest detectable change in our sample was 0.69 at the group level and 4.80 at the individual level. Construct validity analysis showed significant moderate-to-large correlations (r=0.34-0.52, p<0.01). Responsiveness of the MTSS score was confirmed by a significant relation with the global perceived effect scale (β=-0.288, R²=0.21, p<0.001). The MTSS score is a valid, reliable and responsive PROM to measure the severity of MTSS. It is designed to evaluate treatment outcomes in clinical studies. Published by the BMJ Publishing Group Limited.
Oxford Knee Score: cross-cultural adaptation and validation of the Turkish version in patients with osteoarthritis of the knee.

PubMed

Tuğay, Baki Umut; Tuğay, Nazan; Güney, Hande; Kınıklı, Gizem İrem; Yüksel, İnci; Atilla, Bülent

2016-01-01

The Oxford Knee Score (OKS) is a valid, short, self-administered, and site-specific outcome measure specifically developed for patients with knee arthroplasty. This study aimed to cross-culturally adapt and validate the OKS to be used in Turkish-speaking patients with osteoarthritis of the knee. The OKS was translated and culturally adapted according to the guidelines in the literature. Ninety-one patients (mean age: 55.89±7.85 years) with knee osteoarthritis participated in the study. Patients completed the Turkish version of the Oxford Knee Score (OKS-TR), Short-Form 36 Health Survey (SF-36), and Western Ontario and McMaster Universities Index (WOMAC) questionnaires. Internal consistency was tested using Cronbach's α coefficient. Patients completed the OKS-TR questionnaire twice in 7 days to determine the reproducibility. Correlation between the total results of both tests was determined by Spearman's correlation coefficient and intraclass correlation coefficients (ICC). Validity was assessed by calculating Spearman's correlation coefficient between the OKS, WOMAC, and SF-36 scores. Floor and ceiling effects were analyzed. Internal consistency was high (Cronbach's α: 0.90). The reproducibility tested by 2 different methods showed no significant difference (p>0.05). The construct validity analyses showed a significant correlation between the OKS and the other scores (p<0.05). There was no floor or ceiling effect in total OKS score. The OKS-TR is a reliable and valid measure for the self-assessment of pain and function in Turkish-speaking patients with osteoarthritis of the knee.
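Several records in this listing report internal consistency as Cronbach's α. As a hedged illustration of how that coefficient is computed from item-level data, the sketch below uses the textbook formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the toy data are hypothetical and do not come from any of the studies.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and total scores.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    item_variance_sum = sum(variance(col) for col in zip(*rows))
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical toy data: two items that move together perfectly.
consistent = [[1, 2], [2, 3], [3, 4]]
# Hypothetical toy data: two items that are unrelated to each other.
unrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

The first data set yields α = 1 and the second α = 0 (up to floating-point error), which brackets the 0.58 to 0.97 range reported across the records here.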
The Consumer Quality Index in an accident and emergency department: internal consistency, validity and discriminative capacity.

PubMed

Bos, Nanne; Sturms, Leontien M; Stellato, Rebecca K; Schrijvers, Augustinus J P; van Stel, Henk F

2015-10-01

Patients' experiences are an indicator of health-care performance in the accident and emergency department (A&E). The Consumer Quality Index for the Accident and Emergency department (CQI A&E), a questionnaire to assess the quality of care as experienced by patients, was investigated. The internal consistency, construct validity and discriminative capacity of the questionnaire were examined. In the Netherlands, twenty-one A&Es participated in a cross-sectional survey, covering 4883 patients. The questionnaire consisted of 78 questions. Principal components analysis determined underlying domains. Internal consistency was determined by Cronbach's alpha coefficients, construct validity by Pearson's correlation coefficients and the discriminative capacity by intraclass correlation coefficients and reliability of A&E-level mean scores (G-coefficient). Seven quality domains emerged from the principal components analysis: information before treatment, timeliness, attitude of health-care professionals, professionalism of received care, information during treatment, environment and facilities, and discharge management. Domains were internally consistent (range: 0.67-0.84). Five domains and the 'global quality rating' had the capacity to discriminate among A&Es (significant intraclass correlation coefficient). Four domains and the 'global quality rating' were close to or above the threshold for reliably demonstrating differences among A&Es. The patients' experiences score on the domain timeliness showed the largest range between the worst- and best-performing A&E. The CQI A&E is a validated survey to measure health-care performance in the A&E from patients' perspective. Five domains regarding quality of care aspects and the 'global quality rating' had the capacity to discriminate among A&Es. © 2013 John Wiley & Sons Ltd.
Evaluative measurement properties of the patient-specific functional scale for primary shoulder complaints in physical therapy practice.

PubMed

Koehorst, Marije L S; van Trijffel, Emiel; Lindeboom, Robert

2014-08-01

Clinical measurement, longitudinal. To assess the test-retest reliability, construct validity, and responsiveness of the Patient-Specific Functional Scale (PSFS) in patients with a primary shoulder complaint. Health measurement outcomes have become increasingly important for evaluating treatment. Patient-specific questionnaires are useful tools for determining treatment goals and evaluating treatment in individual patients. These questionnaires have not yet been validated in patients with nonspecific shoulder pain. Patients completed the PSFS, the numeric pain rating scale, and the Shoulder Pain and Disability Index at baseline, and after 1 week and 4 to 6 weeks. Test-retest reliability was determined using intraclass correlation coefficients. To assess convergent validity, change scores of the PSFS were correlated with the numeric pain rating scale and Shoulder Pain and Disability Index change scores. Responsiveness was assessed by calculating the area under the curve, the minimal clinically important change, and minimal detectable change, using the global rating of change as an external criterion. Fifty patients (37 men; mean age, 47.7 years) participated in the study. Reliability was high (intraclass correlation coefficient = 0.87; 95% confidence interval [CI]: 0.72, 0.94). The correlations between the change scores of the PSFS and those of the Shoulder Pain and Disability Index and numeric pain rating scale were 0.45 (95% CI: 0.17, 0.80) and 0.55 (95% CI: 0.29, 0.73), respectively. The area under the curve for the PSFS was 0.67 (95% CI: 0.51, 0.83). The minimal detectable change and minimal clinically important change were 0.97 and 1.29 points, respectively. These results suggest that the PSFS is a reliable, valid, and responsive instrument that can be used as an evaluative instrument in patients with a primary shoulder complaint.
Cross-Cultural Adaptation and Validation of the Back Beliefs Questionnaire to the Arabic Language.

PubMed

Alamrani, Samia; Alsobayel, Hana; Alnahdi, Ali H; Moloney, Niamh; Mackey, Martin

2016-06-01

Translation, cross-cultural adaptation, and psychometric testing. To translate the Back Beliefs Questionnaire (BBQ) into Arabic and investigate its psychometric properties in an Arabic-speaking sample of individuals with low back pain (LBP). Back pain beliefs are associated with pain chronicity and disability in people with LBP. The BBQ is a recognized and frequently used tool for measuring these beliefs. To date the BBQ has not been translated into Arabic. The English version of the BBQ was translated and culturally adapted into Arabic (BBQ-Ar) according to published guidelines. The BBQ-Ar was then tested in a sample of 115 Arabic-speaking individuals with LBP. Reliability was evaluated through internal consistency (Cronbach α) and test-retest reliability (intraclass correlation coefficient), the latter in a subgroup of 25. Construct validity was assessed using exploratory factor analysis and by examining the correlation between the BBQ-Ar, the Oswestry Disability Index and a Numerical Pain Rating Scale. Internal consistency of the BBQ-Ar was good (Cronbach α = 0.77). Test-retest reliability was good (intraclass correlation coefficient [2,1] = 0.88). Exploratory factor analysis revealed a three-factor structure, explaining 46% of total variance, with the first factor alone explaining 24%. Eight of the nine scoring items were loaded on the first factor thus forming a unidimensional scale. A significant negative correlation was found between Oswestry Disability Index and BBQ-Ar scores (r = -0.307; P < 0.01), whereas no significant correlation was found between BBQ-Ar and Pain Rating Scale scores. No floor or ceiling effects were observed. The BBQ-Ar is a valid and reliable tool that can be used to assess back pain beliefs in Arabic-speaking individuals.
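The record above reports test-retest reliability as ICC(2,1), the two-way random-effects, absolute-agreement, single-measurement form. A minimal sketch of that coefficient, computed from the two-way ANOVA mean squares, is given below; the function name and the toy scores are hypothetical, and the abstract does not say which software or exact variant the authors used.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of n subjects (rows) by k occasions or raters (columns).

    ICC(2,1) = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(r) != k for r in scores):
        raise ValueError("need a rectangular table with n >= 2 and k >= 2")
    grand = sum(sum(r) for r in scores) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(c) / n for c in zip(*scores)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical toy data: identical test and retest scores.
identical = [[1, 1], [2, 2], [3, 3]]
# Hypothetical toy data: retest is always exactly one point higher.
shifted = [[1, 2], [2, 3], [3, 4]]

print(icc_2_1(identical))
print(icc_2_1(shifted))
```

Identical scores give 1, while the constant one-point shift gives about 0.67 even though the rank order is preserved. This is the sense in which the absolute-agreement form penalizes systematic differences between occasions.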

Assessment of Biopsychosocial Complexity and Health Care Needs: Measurement Properties of the INTERMED Self-Assessment Version.

PubMed

van Reedt Dortland, Arianne K B; Peters, Lilian L; Boenink, Annette D; Smit, Jan H; Slaets, Joris P J; Hoogendoorn, Adriaan W; Joos, Andreas; Latour, Corine H M; Stiefel, Friedrich; Burrus, Cyrille; Guitteny-Collas, Marie; Ferrari, Silvia

2017-05-01

The INTERMED Self-Assessment questionnaire (IMSA) was developed as an alternative to the observer-rated INTERMED (IM) to assess biopsychosocial complexity and health care needs. We studied feasibility, reliability, and validity of the IMSA within a large and heterogeneous international sample of adult hospital inpatients and outpatients as well as its predictive value for health care use (HCU) and quality of life (QoL). A total of 850 participants aged 17 to 90 years from five countries completed the IMSA and were evaluated with the IM. The following measurement properties were determined: feasibility by percentages of missing values; reliability by Cronbach α; interrater agreement by intraclass correlation coefficients; convergent validity of IMSA scores with mental health (Short Form 36 emotional well-being subscale and Hospital Anxiety and Depression Scale), medical health (Cumulative Illness Rating Scale) and QoL (Euroqol-5D) by Spearman rank correlations; and predictive validity of IMSA scores with HCU and QoL by (generalized) linear mixed models. Feasibility, face validity, and reliability (Cronbach α = 0.80) were satisfactory. Intraclass correlation coefficient between IMSA and IM total scores was .78 (95% CI = .75-.81). Correlations of the IMSA with the Short Form 36, Hospital Anxiety and Depression Scale, Cumulative Illness Rating Scale, and Euroqol-5D (convergent validity) were -.65, .15, .28, and -.59, respectively. The IMSA significantly predicted QoL and also HCU (emergency department visits, hospitalization, outpatient visits, and diagnostic examinations) after 3- and 6-month follow-up. Results were comparable between hospital sites, inpatients and outpatients, as well as age groups. The IMSA is a generic and time-efficient method to assess biopsychosocial complexity and to provide guidance for multidisciplinary care trajectories in adult patients, with good reliability and validity across different cultures.
Family Impact Scale (FIS): Cross-cultural Adaptation and Psychometric Properties for the Peruvian Spanish Language.

PubMed

Abanto, Jenny; Albites, Ursula; Bönecker, Marcelo; Paiva, Saul M; Castillo, Jorge L; Aguilar-Gálvez, Denisse

2015-12-01

The lack of a Family Impact Scale (FIS) in Spanish language limits its use as an indicator in Spanish-speaking countries and precludes comparisons with data from other cultural and ethnic groups. The purpose of this study was therefore to adapt the FIS cross-culturally to the Peruvian Spanish language and assess its reliability and validity. In order to translate and adapt the FIS cross-culturally, it was answered by 60 parents in two pilot tests, after which it was tested on 200 parents of children aged 11 to 14 years who were clinically examined for dental caries experience and malocclusions. Internal consistency was assessed by Cronbach's alpha coefficient while repeat administration of the FIS on the same 200 parents enabled the test-retest reliability to be assessed via intraclass correlation coefficient (ICC). Construct and discriminant validity were based on associations of the FIS with global ratings of oral health and clinical groups, respectively. Mean (standard deviation) FIS total score was 5.20 (5.86). Internal consistency was confirmed by Cronbach's alpha 0.84. Test-retest reliability revealed excellent reproducibility (ICC = 0.96). Construct validity was good, demonstrating statistically significant associations between total FIS score and global ratings of oral health (p=0.007) and overall wellbeing (p=0.002), as well as for the subscale scores (p<0.05) with exception of the financial burden subscale. The FIS was also able to discriminate between children with and without dental caries experience and malocclusions (p<0.05). Satisfactory psychometric results for the Peruvian Spanish FIS confirm it as a reliable, valid instrument for assessing the impact on the family caused by children's oral conditions. Sociedad Argentina de Investigación Odontológica.
Development and validation of the Pediatric Anesthesia Behavior score--an objective measure of behavior during induction of anesthesia.

PubMed

Beringer, Richard M; Greenwood, Rosemary; Kilpatrick, Nicky

2014-02-01

Measuring perioperative behavior changes requires validated objective rating scales. We developed a simple score for children's behavior during induction of anesthesia (Pediatric Anesthesia Behavior score) and assessed its reliability, concurrent validity, and predictive validity. Data were collected as part of a wider observational study of perioperative behavior changes in children undergoing general anesthesia for elective dental extractions. One-hundred and two healthy children aged 2-12 were recruited. Previously validated behavioral scales were used as follows: the modified Yale Preoperative Anxiety Scale (m-YPAS); the induction compliance checklist (ICC); the Pediatric Anesthesia Emergence Delirium scale (PAED); and the Post-Hospitalization Behavior Questionnaire (PHBQ). Pediatric Anesthesia Behavior (PAB) score was independently measured by two investigators, to allow assessment of interobserver reliability. Concurrent validity was assessed by examining the correlation between the PAB score, the m-YPAS, and the ICC. Predictive validity was assessed by examining the association between the PAB score, the PAED scale, and the PHBQ. The PAB score correlated strongly with both the m-YPAS (P < 0.001) and the ICC (P < 0.001). PAB score was significantly associated with the PAED score (P = 0.031) and with the PHBQ (P = 0.034). Two independent investigators recorded identical PAB scores for 94% of children and overall, there was close agreement between scores (Kappa coefficient of 0.886 [P < 0.001]). The PAB score is simple to use and may predict which children are at increased risk of developing postoperative behavioral disturbance. This study provides evidence for its reliability and validity. © 2013 John Wiley & Sons Ltd.
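The record above reports both raw agreement (94%) and a kappa coefficient for two observers. The sketch below shows how an unweighted Cohen's kappa corrects raw agreement for agreement expected by chance. It assumes the unweighted form, since the abstract does not say whether a weighted kappa was used; the function name and ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed proportion of exact agreement.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings: the raters agree on 3 of 4 children.
print(cohen_kappa([1, 1, 2, 2], [1, 1, 2, 1]))  # 0.5
```

Here 75% raw agreement corresponds to a kappa of only 0.5, because half of the agreement would be expected by chance given the marginal frequencies.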
Measuring teamwork and conflict among Emergency Medical Technician personnel

PubMed Central

Patterson, P. Daniel; Weaver, Matthew D.; Weaver, Sallie J.; Rosen, Michael A.; Todorova, Gergana; Weingart, Laurie R.; Krackhardt, David; Lave, Judith R.; Arnold, Robert M.; Yealy, Donald M.; Salas, Eduardo

2011-01-01

Objective We sought to develop a reliable and valid tool for measuring teamwork among Emergency Medical Technician (EMT) partnerships. Methods We adapted existing scales and developed new items to measure components of teamwork. After recruiting a convenience sample of 39 agencies, we tested a 122-item draft survey tool. We performed a series of Exploratory Factor Analyses (EFA) and Confirmatory Factor Analysis (CFA) to test reliability and construct validity, describing variation in domain and global scores using descriptive statistics. Results We received 687 completed surveys. The EFA analyses identified a 9-factor solution. We labeled these factors [1] Team Orientation, [2] Team Structure & Leadership, [3] Partner Communication, Team Support, & Monitoring, [4] Partner Trust and Shared Mental Models, [5] Partner Adaptability & Back-Up Behavior, [6] Process Conflict, [7] Strong Task Conflict, [8] Mild Task Conflict, and [9] Interpersonal Conflict. We tested a short form (30-item SF) and long form (45-item LF) version. The CFA analyses determined that both the SF and LF versions possess positive psychometric properties of reliability and construct validity. The EMT-TEAMWORK-SF has positive internal consistency properties with a mean Cronbach’s alpha coefficient ≥0.70 across all 9-factors (mean=0.84; min=0.78, max=0.94). The mean Cronbach’s alpha coefficient for the EMT-TEAMWORK-LF version was 0.87 (min=0.79, max=0.94). There was wide variation in weighted scores across all 9 factors and the global score for the SF and LF versions. Mean scores were lowest for the Team Orientation factor (48.1, SD 21.5 SF; 49.3 SD 19.8 LF) and highest (more positive) for the Interpersonal Conflict factor (87.7 SD 18.1 for both SF and LF). Conclusions We developed a reliable and valid survey to evaluate teamwork between EMT partners. PMID:22128909
Reliability and validity of abbreviated surveys derived from the National Eye Institute Visual Function Questionnaire: The Study of Osteoporotic Fractures

PubMed Central

Gergana, Kodjebacheva; Coleman, Anne L.; Ensrud, Kristine E.; Cauley, Jane A.; Yu, Fei; Stone, Katie L.; Pedula, Kathryn L.; Hochberg, Marc C.; Mangione, Carol M.

2010-01-01

Purpose To test the reliability and validity of questionnaires shortened from the National Eye Institute 25-item Vision Function Questionnaire (NEI VFQ-9 and NEI VFQ-8). Design A cross-sectional multi-center cohort study. Methods Reliability was assessed by Cronbach alpha coefficients. Validity was evaluated by studying the association of vision-targeted quality-of-life composite scores with objective visual function measurements. Study population: A total of 5,482 women between the ages of 65 and 100 years participated in the Year-10 clinic visit in the Study of Osteoporotic Fractures (SOF). A total of 3,631 women with complete data were included in the visual acuity (VA) and visual field (VF) analysis of the NEI VFQ-9, which is defined for those who care to drive, and 5,311 in the analysis of the NEI VFQ-8. To assess differences in prevalent eye diseases, which were ascertained for a random sample of SOF participants, 853 and 1,237 women were included in the NEI VFQ-9 and the NEI VFQ-8 analyses, respectively. Results Cronbach alpha coefficient for the NEI VFQ-9 scale was 0.83 and that of the NEI VFQ-8 was 0.84. Using both questionnaires, women with VA worse than 20/40 had lower composite scores compared to those with VA 20/40 or better (p<0.001). Participants with mild, moderate, and severe binocular VF loss had lower composite scores compared to those with no binocular VF loss (p<0.001). Compared to women without chronic eye diseases in both eyes, women with at least one chronic eye disease in at least one eye had lower composite scores. Conclusions Both questionnaires showed high reliability across items and validity with respect to clinical markers of eye disease. Future research should compare the properties of these shortened surveys to those of the NEI VFQ-25. PMID:20103058
[Reliability of a bibliometric tool used in France for hospital founding].

PubMed

Darmoni, Stefan J; Ladner, Joël; Devos, Patrick; Gehanno, Jean-François

2009-01-01

SIGAPS is a bibliometric score that aims to inventory, evaluate and promote the scientific publications of hospitals that perform research. It has become a major stake in France, since it is one of the most important components of the MERRI (Mission Training, Research, Reference and Innovation) funding of hospitals. The score is based on points attributed to the authors of articles published in journals indexed in Medline, according to the rank of the author and the Impact Factor of the journal. Our aim was to compare the reliability of the score under different ways of computing it and different weights for author rank or Impact Factor. We computed the scores of all the physicians of a University Hospital using the rules currently applied at the national level. We then used 4 different scenarios, with different weights given to the rank of authors or the Impact Factor. We compared the scores obtained by each author under the different scenarios using Spearman's rank and Pearson's correlation coefficients. The score is not significantly affected when no points are given to fourth and subsequent authors, when the last author gets more points, or when the points are changed according to the Impact Factor of the journal. The different scenarios do not lead to significant changes in the physicians' scores, and therefore in the cumulated score of the hospital. Despite the well-known limits of bibliometric indicators, the SIGAPS score appears reliable for comparing hospitals for funding decisions.
Measuring the Reliability of Picture Story Exercises like the TAT

PubMed Central

Gruber, Nicole; Kreuzpointner, Ludwig

2013-01-01

As frequently reported, psychometric assessments of Picture Story Exercises, especially variations of the Thematic Apperception Test, mostly reveal inadequate scores for internal consistency. We demonstrate that this apparent shortcoming is caused not by the coding system itself but by the incorrect use of internal consistency coefficients, especially Cronbach’s α. This problem could be eliminated by using the category-scores as items instead of the picture-scores. In addition to a theoretical explanation, we prove mathematically why the use of category-scores produces an adequate internal consistency estimation and examine our idea empirically with the original data set of the Thematic Apperception Test by Heckhausen and two additional data sets. We found generally higher values when using the category-scores as items instead of picture-scores. From an empirical and theoretical point of view, the estimated reliability is also superior to that obtained when each category within a picture is treated as an item. When comparing our suggestion with a multifaceted Rasch-model we provide evidence that our procedure better fits the underlying principles of PSE. PMID:24348902
Stability of scores for the Slosson Full-Range Intelligence Test.

PubMed

Williams, Thomas O; Eaves, Ronald C; Woods-Groves, Suzanne; Mariano, Gina

2007-08-01

The test-retest stability of the Slosson Full-Range Intelligence Test by Algozzine, Eaves, Mann, and Vance was investigated with test scores from a sample of 103 students. With a mean interval of 13.7 mo. and different examiners for each of the two test administrations, the test-retest reliability coefficients for the Full-Range IQ, Verbal Reasoning, Abstract Reasoning, Quantitative Reasoning, and Memory were .93, .85, .80, .80, and .83, respectively. Mean differences from the test-retest scores were not statistically significantly different for any of the scales. Results suggest that Slosson scores are stable over time even when different examiners administer the test.
Validity and reliability of a Nigerian-Yoruba version of the stroke-specific quality of life scale 2.0.

PubMed

Odetunde, Marufat Oluyemisi; Akinpelu, Aderonke Omobonike; Odole, Adesola Christiana

2017-10-19

Psychometric evidence is necessary to establish scientific integrity and clinical usefulness of translations and cultural adaptations of the Stroke-Specific Quality of Life (SS-QoL) scale. However, the limited evidence on psychometrics of the Yoruba version of SS-QoL 2.0 (SS-QoL(Y)) is a significant shortcoming. This study assessed the test-retest reliability, internal consistency, convergent, divergent, discriminant and known-group validity of the SS-QoL(Y). The Yoruba version of the WHOQoL-BREF was used to test the convergent and divergent validity of the SS-QoL(Y) among 100 consenting stroke survivors. The WHOQoL-BREF and SS-QoL(Y) were administered randomly in order to eliminate bias. The test-retest reliability of the SS-QoL(Y) was carried out among 68 of the respondents within an interval of 7 days. All respondents were purposively recruited from selected secondary and tertiary health facilities in South-west Nigeria. Data were analysed using descriptive statistics of mean and standard deviation, and inferential statistics of Spearman correlation, Cronbach's alpha, Intra-class Correlation Coefficient (ICC), Independent t-test and One-way ANOVA. Alpha level was set at p < 0.05. The physical health, psychological health, social relationship and environment domains on WHOQoL-BREF with correlation coefficient that ranged from 0.214 to 0.360 showed significant correlation with similar domains on SS-QoL(Y). Dissimilar domains between the two scales had r values from 0.035 to 0.366. Discriminant validity of SS-QoL(Y) showed that items' r value ranged from 0.711 to 0.920 with their hypothesized domains. The scale demonstrated moderate to strong test-retest reliability with Intra-class correlation coefficient (ICC) for the domains and overall scores (r = 0.47 to 0.81) and moderate to high internal consistency (Cronbach's alpha = 0.61 to 0.82) for domains scores. These correlations were also significant for the domains and overall scores (p < 0.05). There were no significant differences across different age groups or gender for the domains or overall scores of SS-QoL(Y). Discriminant and known-group validity, test-retest reliability and internal consistency of the Yoruba version of the Stroke Specific Quality of Life 2.0 are adequate while the convergent and divergent validity are low but acceptable. The SS-QoL(Y) is recommended for assessing health-related quality of life among Yoruba stroke survivors.
Validation of a score tool for measurement of histological severity in juvenile dermatomyositis and association with clinical severity of disease

PubMed Central

Varsani, Hemlata; Charman, Susan C; Li, Charles K; Marie, Suely K N; Amato, Anthony A; Banwell, Brenda; Bove, Kevin E; Corse, Andrea M; Emslie-Smith, Alison M; Jacques, Thomas S; Lundberg, Ingrid E; Minetti, Carlo; Nennesmo, Inger; Rushing, Elisabeth J; Sallum, Adriana M E; Sewry, Caroline; Pilkington, Clarissa A; Holton, Janice L; Wedderburn, Lucy R

2015-01-01

Objectives: To study muscle biopsy tissue from patients with juvenile dermatomyositis (JDM) in order to test the reliability of a score tool designed to quantify the severity of histological abnormalities when applied to biceps humeri in addition to quadriceps femoris. Additionally, to evaluate whether elements of the tool correlate with clinical measures of disease severity. Methods: 55 patients with JDM with muscle biopsy tissue and clinical data available were included. Biopsy samples (33 quadriceps, 22 biceps) were prepared and stained using standardised protocols. A Latin square design was used by the International Juvenile Dermatomyositis Biopsy Consensus Group to score cases using our previously published score tool. Reliability was assessed by intraclass correlation coefficient (ICC) and scorer agreement (α) by assessing variation in scorers’ ratings. Scores from the most reliable tool items correlated with clinical measures of disease activity at the time of biopsy. Results: Inter- and intraobserver agreement was good or high for many tool items, including overall assessment of severity using a Visual Analogue Scale. The tool functioned equally well on biceps and quadriceps samples. A modified tool using the most reliable score items showed good correlation with measures of disease activity. Conclusions: The JDM biopsy score tool has high inter- and intraobserver agreement and can be used on both biceps and quadriceps muscle tissue. Importantly, the modified tool correlates well with clinical measures of disease activity. We propose that standardised assessment of muscle biopsy tissue should be considered in diagnostic investigation and clinical trials in JDM. PMID:24064003
Development and Validation of an Instrument for Measuring the Quality of Teamwork in Teaching Teams in Postgraduate Medical Training (TeamQ)

PubMed Central

Slootweg, Irene A.; Lombarts, Kiki M. J. M. H.; Boerebach, Benjamin C. M.; Heineman, Maas Jan; Scherpbier, Albert J. J. A.; van der Vleuten, Cees P. M.

2014-01-01

Background: Teamwork between clinical teachers is a challenge in postgraduate medical training. Although there are several instruments available for measuring teamwork in health care, none of them are appropriate for teaching teams. The aim of this study is to develop an instrument (TeamQ) for measuring teamwork, to investigate its psychometric properties and to explore how clinical teachers assess their teamwork. Method: To select the items to be included in the TeamQ questionnaire, we conducted a content validation in 2011, using a Delphi procedure in which 40 experts were invited. Next, for pilot testing the preliminary tool, 1446 clinical teachers from 116 teaching teams were requested to complete the TeamQ questionnaire. For data analyses we used statistical strategies: principal component analysis, internal consistency reliability coefficient, and the number of evaluations needed to obtain reliable estimates. Lastly, the median TeamQ scores were calculated for teams to explore the levels of teamwork. Results: In total, 31 experts participated in the Delphi study. In total, 114 teams participated in the TeamQ pilot. The median team response was 7 evaluations per team. The principal component analysis revealed 11 factors; 8 were included. The reliability coefficients of the TeamQ scales ranged from 0.75 to 0.93. The generalizability analysis revealed that 5 to 7 evaluations were needed to obtain internal reliability coefficients of 0.70. In terms of teamwork, the clinical teachers scored residents' empowerment as the highest TeamQ scale and feedback culture as the area that would most benefit from improvement. Conclusions: This study provides initial evidence of the validity of an instrument for measuring teamwork in teaching teams. The high response rates and the low number of evaluations needed for reliably measuring teamwork indicate that TeamQ is feasible for use by teaching teams.
Future research could explore the effectiveness of feedback on teamwork in follow up measurements. PMID:25393006
Development and validation of an instrument for measuring the quality of teamwork in teaching teams in postgraduate medical training (TeamQ).

PubMed

Slootweg, Irene A; Lombarts, Kiki M J M H; Boerebach, Benjamin C M; Heineman, Maas Jan; Scherpbier, Albert J J A; van der Vleuten, Cees P M

2014-01-01

Teamwork between clinical teachers is a challenge in postgraduate medical training. Although there are several instruments available for measuring teamwork in health care, none of them are appropriate for teaching teams. The aim of this study is to develop an instrument (TeamQ) for measuring teamwork, to investigate its psychometric properties and to explore how clinical teachers assess their teamwork. To select the items to be included in the TeamQ questionnaire, we conducted a content validation in 2011, using a Delphi procedure in which 40 experts were invited. Next, for pilot testing the preliminary tool, 1446 clinical teachers from 116 teaching teams were requested to complete the TeamQ questionnaire. For data analyses we used statistical strategies: principal component analysis, internal consistency reliability coefficient, and the number of evaluations needed to obtain reliable estimates. Lastly, the median TeamQ scores were calculated for teams to explore the levels of teamwork. In total, 31 experts participated in the Delphi study. In total, 114 teams participated in the TeamQ pilot. The median team response was 7 evaluations per team. The principal component analysis revealed 11 factors; 8 were included. The reliability coefficients of the TeamQ scales ranged from 0.75 to 0.93. The generalizability analysis revealed that 5 to 7 evaluations were needed to obtain internal reliability coefficients of 0.70. In terms of teamwork, the clinical teachers scored residents' empowerment as the highest TeamQ scale and feedback culture as the area that would most benefit from improvement. This study provides initial evidence of the validity of an instrument for measuring teamwork in teaching teams. The high response rates and the low number of evaluations needed for reliably measuring teamwork indicate that TeamQ is feasible for use by teaching teams. Future research could explore the effectiveness of feedback on teamwork in follow up measurements.
Validity and applicability of a video-based animated tool to assess mobility in elderly Latin American populations

PubMed Central

Guerra, Ricardo Oliveira; Oliveira, Bruna Silva; Alvarado, Beatriz Eugenia; Curcio, Carmen Lucia; Rejeski, W Jack; Marsh, Anthony P; Ip, Edward H; Barnard, Ryan T; Guralnik, Jack M; Zunzunegui, Maria Victoria

2016-01-01

Aim: To assess the reliability and the validity of Portuguese- and Spanish-translated versions of the video-based short-form Mobility Assessment Tool in assessing self-reported mobility, and to provide evidence for the applicability of these videos in elderly Latin American populations as a complement to physical performance measures. Methods: The sample consisted of 300 elderly participants (150 from Brazil, 150 from Colombia) recruited at neighborhood social centers. Mobility was assessed with the Mobility Assessment Tool, and compared with the Short Physical Performance Battery score and self-reported functional limitations. Reliability was calculated using intraclass correlation coefficients. Multiple linear regression analyses were used to assess associations among mobility assessment tools and health, and sociodemographic variables. Results: A significant gradient of increasing Mobility Assessment Tool score with better physical function was observed for both self-reported and objective measures, and in each city. Associations between self-reported mobility and health were strong, and significant. Mobility Assessment Tool scores were lower in women at both sites. Intraclass correlation coefficients of the Mobility Assessment Tool were 0.94 (95% confidence interval 0.90–0.97) in Brazil and 0.81 (95% confidence interval 0.66–0.91) in Colombia. Mobility Assessment Tool scores were lower in Manizales than in Natal after adjustment by Short Physical Performance Battery, self-rated health and sex. Conclusions: These results provide evidence for high reliability and good validity of the Mobility Assessment Tool in its Spanish and Portuguese versions used in Latin American populations. In addition, the Mobility Assessment Tool can detect mobility differences related to environmental features that cannot be captured by objective performance measures. PMID:24666718
Validity and applicability of a video-based animated tool to assess mobility in elderly Latin American populations.

PubMed

Guerra, Ricardo Oliveira; Oliveira, Bruna Silva; Alvarado, Beatriz Eugenia; Curcio, Carmen Lucia; Rejeski, W Jack; Marsh, Anthony P; Ip, Edward H; Barnard, Ryan T; Guralnik, Jack M; Zunzunegui, Maria Victoria

2014-10-01

To assess the reliability and the validity of Portuguese- and Spanish-translated versions of the video-based short-form Mobility Assessment Tool in assessing self-reported mobility, and to provide evidence for the applicability of these videos in elderly Latin American populations as a complement to physical performance measures. The sample consisted of 300 elderly participants (150 from Brazil, 150 from Colombia) recruited at neighborhood social centers. Mobility was assessed with the Mobility Assessment Tool, and compared with the Short Physical Performance Battery score and self-reported functional limitations. Reliability was calculated using intraclass correlation coefficients. Multiple linear regression analyses were used to assess associations among mobility assessment tools and health, and sociodemographic variables. A significant gradient of increasing Mobility Assessment Tool score with better physical function was observed for both self-reported and objective measures, and in each city. Associations between self-reported mobility and health were strong, and significant. Mobility Assessment Tool scores were lower in women at both sites. Intraclass correlation coefficients of the Mobility Assessment Tool were 0.94 (95% confidence interval 0.90-0.97) in Brazil and 0.81 (95% confidence interval 0.66-0.91) in Colombia. Mobility Assessment Tool scores were lower in Manizales than in Natal after adjustment by Short Physical Performance Battery, self-rated health and sex. These results provide evidence for high reliability and good validity of the Mobility Assessment Tool in its Spanish and Portuguese versions used in Latin American populations. In addition, the Mobility Assessment Tool can detect mobility differences related to environmental features that cannot be captured by objective performance measures. © 2013 Japan Geriatrics Society.
Measurement Properties of the Lower Extremity Functional Scale: A Systematic Review.

PubMed

Mehta, Saurabh P; Fulton, Allison; Quach, Cedric; Thistle, Megan; Toledo, Cesar; Evans, Neil A

2016-03-01

Systematic review of measurement properties. Many primary studies have examined the measurement properties, such as reliability, validity, and sensitivity to change, of the Lower Extremity Functional Scale (LEFS) in different clinical populations. A systematic review summarizing these properties for the LEFS may provide an important resource. To locate and synthesize evidence on the measurement properties of the LEFS and to discuss the clinical implications of the evidence. A literature search was conducted in 4 databases (PubMed, MEDLINE, Embase, and CINAHL), using predefined search terms. Two reviewers performed a critical appraisal of the included studies using a standardized assessment form. A total of 27 studies were included in the review, of which 18 achieved a very good to excellent methodological quality level. The LEFS scores demonstrated excellent test-retest reliability (intraclass correlation coefficients ranging between 0.85 and 0.99) and demonstrated the expected relationships with measures assessing similar constructs (Pearson correlation coefficient values of greater than 0.7). The responsiveness of the LEFS scores was excellent, as suggested by consistently high effect sizes (greater than 0.8) in patients with different lower extremity conditions. Minimal detectable change at the 90% confidence level (MDC90) for the LEFS scores varied between 8.1 and 15.3 across different reassessment intervals in a wide range of patient populations. The pooled estimate of the MDC90 was 6 points and the minimal clinically important difference was 9 points in patients with lower extremity musculoskeletal conditions, which are indicative of true change and clinically meaningful change, respectively. The results of this review support the reliability, validity, and responsiveness of the LEFS scores for assessing functional impairment in a wide array of patient groups with lower extremity musculoskeletal conditions.
Validation of the Polish version of Diabetes Quality of Life - Brief Clinical Inventory (DQL-BCI) among patients with type 2 diabetes.

PubMed

Dudzińska, Marta; Tarach, Jerzy S; Burroughs, Thomas E; Zwolak, Agnieszka; Matuszek, Beata; Smoleń, Agata; Nowakowski, Andrzej

2014-10-27

The aim of the study was to develop a Polish version of the Diabetes Quality of Life Brief Clinical Inventory (DQL-BCI) and to perform validating evaluation of selected psychometric aspects. The translation process was performed in accordance with generally accepted international principles of translation and cultural adaptation of measurement tools. Two hundred and seventy-four subjects with type 2 diabetes completed the Polish version of DQL-BCI, the generic EQ-5D questionnaire and the diabetes-specific DSC-R. The examination provides information about the reliability (internal consistency, test-retest) and the construct validity of the studied tool (the relationship between the DQL-BCI score and EQ-5D and DSC-R scales, as well as selected clinical patient characteristics). Cronbach's α (internal consistency) for the translated version of DQL-BCI was 0.76. The test-retest Pearson correlation coefficient was 0.96. Spearman's correlation coefficients between the DQL-BCI score and the EQ-5D index and EQ-VAS were 0.6 (p = 0.0000001) and 0.61 (p = 0.0000001), respectively. The correlation between scores of the examined tool and DSC-R total score was -0.6 (p = 0.0000001). Quality of life was lower among patients with microvascular as well as macrovascular complications and among those with hypoglycemic episodes. The result of this study is a Polish scale for testing the quality of life of patients with diabetes, which covers the range of problems faced by patients while maintaining a patient-friendly form. High reliability of the scale and good construct validity qualify the Polish version of DQL-BCI as a reliable tool in both research and individual diagnostics.
Validation of the Polish version of Diabetes Quality of Life – Brief Clinical Inventory (DQL-BCI) among patients with type 2 diabetes

PubMed Central

Tarach, Jerzy S.; Burroughs, Thomas E.; Zwolak, Agnieszka; Matuszek, Beata; Smoleń, Agata; Nowakowski, Andrzej

2014-01-01

Introduction: The aim of the study was to develop a Polish version of the Diabetes Quality of Life Brief Clinical Inventory (DQL-BCI) and to perform validating evaluation of selected psychometric aspects. Material and methods: The translation process was performed in accordance with generally accepted international principles of translation and cultural adaptation of measurement tools. Two hundred and seventy-four subjects with type 2 diabetes completed the Polish version of DQL-BCI, the generic EQ-5D questionnaire and the diabetes-specific DSC-R. The examination provides information about the reliability (internal consistency, test-retest) and the construct validity of the studied tool (the relationship between the DQL-BCI score and EQ-5D and DSC-R scales, as well as selected clinical patient characteristics). Results: Cronbach's α (internal consistency) for the translated version of DQL-BCI was 0.76. The test-retest Pearson correlation coefficient was 0.96. Spearman's correlation coefficients between the DQL-BCI score and the EQ-5D index and EQ-VAS were 0.6 (p = 0.0000001) and 0.61 (p = 0.0000001), respectively. The correlation between scores of the examined tool and DSC-R total score was –0.6 (p = 0.0000001). Quality of life was lower among patients with microvascular as well as macrovascular complications and among those with hypoglycemic episodes. Conclusions: The result of this study is a Polish scale for testing the quality of life of patients with diabetes, which covers the range of problems faced by patients while maintaining a patient-friendly form. High reliability of the scale and good construct validity qualify the Polish version of DQL-BCI as a reliable tool in both research and individual diagnostics. PMID:25395940
Development of a Facebook Addiction Scale.

PubMed

Andreassen, Cecilie Schou; Torsheim, Torbjørn; Brunborg, Geir Scott; Pallesen, Ståle

2012-04-01

The Bergen Facebook Addiction Scale (BFAS), initially a pool of 18 items, three reflecting each of the six core elements of addiction (salience, mood modification, tolerance, withdrawal, conflict, and relapse), was constructed and administered to 423 students together with several other standardized self-report scales (Addictive Tendencies Scale, Online Sociability Scale, Facebook Attitude Scale, NEO-FFI, BIS/BAS scales, and Sleep questions). Within each of the six addiction elements, the item with the highest corrected item-total correlation was retained in the final scale. The factor structure of the scale was good (RMSEA = .046, CFI = .99) and coefficient alpha was .83. The 3-week test-retest reliability coefficient was .82. The scores converged with scores for other scales of Facebook activity. Also, they were positively related to Neuroticism and Extraversion, and negatively related to Conscientiousness. High scores on the new scale were associated with delayed bedtimes and rising times.
Validity and Reliability of Thai Version of the Foot and Ankle Ability Measure (FAAM) Subjective Form.

PubMed

Arunakul, Marut; Arunakul, Preeyaphan; Suesiritumrong, Chakhrist; Angthong, Chayanin; Chernchujit, Bancha

2015-06-01

Self-administered questionnaires have become an important part of clinical outcome assessment of foot and ankle-related problems. The Foot and Ankle Ability Measure (FAAM) subjective form is a region-specific questionnaire that is widely used and has shown sufficient validity and reliability in previous studies. The objective was to translate the original English version of FAAM into a Thai version and evaluate the validity and reliability of the Thai FAAM in patients with foot and ankle-related problems. The FAAM subjective form was translated into Thai using a forward-backward translation protocol. Afterward, reliability and validity were tested. Responses from 60 consecutive patients on two questionnaires, the Thai FAAM subjective form and the Short Form-36 (SF-36), were used. Validity was tested by correlating the scores from both questionnaires. Reliability was assessed by measuring the test-retest reliability and internal consistency. The Thai FAAM scores for the activity of daily life (ADL) and Sport subscales demonstrated sufficient correlations with the physical functioning (PF) and physical composite score (PCS) domains of the SF-36 (statistically significant at the p < 0.001 level, with values ≥ 0.5). The test-retest study revealed high intraclass correlation coefficients of 0.8 and 0.77, respectively. The internal consistency was strong (Cronbach's alpha = 0.94 and 0.88, respectively). The Thai version of the FAAM subjective form retained the characteristics of the original version and has proved to be a reliable evaluation instrument for patients with foot and ankle-related problems.
TEST-RETEST RELIABILITY OF THE CLOSED KINETIC CHAIN UPPER EXTREMITY STABILITY TEST (CKCUEST) IN ADOLESCENTS: RELIABILITY OF CKCUEST IN ADOLESCENTS.

PubMed

de Oliveira, Valéria M A; Pitangui, Ana C R; Nascimento, Vinícius Y S; da Silva, Hítalo A; Dos Passos, Muana H P; de Araújo, Rodrigo C

2017-02-01

The Closed Kinetic Chain Upper Extremity Stability Test (CKCUEST) has been proposed as an option to assess upper limb function and stability; however, there are few studies that support the use of this test in adolescents. The purpose of the present study was to investigate the intersession reliability and agreement of three CKCUEST scores in adolescents and establish clinimetric values for this test. Test-retest reliability. Twenty-five healthy adolescents of both sexes were evaluated. The subjects performed the CKCUEST twice, with an interval of one week between the tests. An intraclass correlation coefficient (ICC3,3) from a two-way mixed model with a 95% confidence interval was used to determine intersession reliability. A Bland-Altman graph was plotted to analyze the agreement between assessments. The presence of systematic error was evaluated by a one-sample t test. The difference between the evaluation and reevaluation was assessed using a paired-sample t test. The level of significance was set at 0.05. Standard errors of measurement and minimal detectable changes were calculated. The intersession reliability of the average touches score, normalized score, and power score was 0.68, 0.68 and 0.87, the standard errors of measurement were 2.17, 1.35 and 6.49, and the minimal detectable changes were 6.01, 3.74 and 17.98, respectively. The presence of systematic error (p < 0.014), the significant difference between the measurements (p < 0.05), and the analysis of the Bland-Altman graph indicate that CKCUEST is a discordant test with moderate to excellent reliability when used with adolescents. The CKCUEST is a measurement with moderate to excellent reliability for adolescents. 2b.
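The abstract above does not state how the minimal detectable changes were derived. As a rough check, the sketch below assumes the conventional formula MDC95 = 1.96 × √2 × SEM, with the hypothetical helper name `mdc95`; under that assumption the reported SEM values reproduce the reported MDC values to within rounding.

```python
import math

def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence, assuming MDC95 = 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem

# SEM values as reported in the abstract; the formula itself is an assumption.
reported_sem = {"average touches": 2.17, "normalized": 1.35, "power": 6.49}

for score_name, sem in reported_sem.items():
    print(f"{score_name}: SEM = {sem:.2f}, MDC95 = {mdc95(sem):.2f}")
```

A change between two sessions smaller than the MDC95 cannot be distinguished from measurement error at the 95% level, which is why the value is read alongside the ICC rather than in place of it.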

Considerations in the use of reflective writing for student assessment: issues of reliability and validity.

PubMed

Moniz, Tracy; Arntfield, Shannon; Miller, Kristina; Lingard, Lorelei; Watling, Chris; Regehr, Glenn

2015-09-01

Reflective writing is a popular tool to support the growth of reflective capacity in undergraduate medical learners. Its popularity stems from research suggesting that reflective capacity may lead to improvements in skills such as empathy, communication, collaboration and professionalism. This has led to assumptions that reflective writing can also serve as a tool for student assessment. However, evidence to support the reliability and validity of reflective writing as a meaningful assessment strategy is lacking. Using a published instrument for measuring 'reflective capacity' (the Reflection Evaluation for Learners' Enhanced Competencies Tool [REFLECT]), four trained raters independently scored four samples of writing from each of 107 undergraduate medical students to determine the reliability of reflective writing scores. REFLECT scores were then correlated with scores on a Year 4 objective structured clinical examination (OSCE) and Year 2 multiple-choice question (MCQ) examinations to examine, respectively, convergent and divergent validity. Across four writing samples, four-rater Cronbach's α-values ranged from 0.72 to 0.82, demonstrating reasonable inter-rater reliability with four raters using the REFLECT rubric. However, inter-sample reliability was fairly low (four-sample Cronbach's α = 0.54, single-sample intraclass correlation coefficient: 0.23), which suggests that performance on one reflective writing sample was not strongly indicative of performance on the next. Approximately 14 writing samples are required to achieve reasonable inter-sample reliability. The study found weak, non-significant correlations between reflective writing scores and both OSCE global scores (r = 0.13) and MCQ examination scores (r = 0.10), demonstrating a lack of relationship between reflective writing and these measures of performance. 
Our findings suggest that to draw meaningful conclusions about reflective capacity as a stable construct in individuals requires 14 writing samples per student, each assessed by four or five raters. This calls into question the feasibility and utility of using reflective writing rigorously as an assessment tool in undergraduate medical education. © 2015 John Wiley & Sons Ltd.
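The inter-sample figures in this abstract are consistent with the Spearman-Brown prophecy formula, although the abstract does not name the method used. The sketch below is illustrative only: the function names are hypothetical, and the 0.80 target for "reasonable" reliability is an assumption, since the abstract gives no threshold.

```python
import math

def spearman_brown(single_r: float, k: float) -> float:
    """Projected reliability of the mean of k parallel measurements."""
    return k * single_r / (1 + (k - 1) * single_r)

def samples_needed(single_r: float, target: float) -> int:
    """Smallest whole number of measurements whose projected reliability reaches the target."""
    k = target * (1 - single_r) / (single_r * (1 - target))
    return math.ceil(k)

single_sample_icc = 0.23  # reported single-sample ICC
print(round(spearman_brown(single_sample_icc, 4), 2))  # compare with the reported four-sample alpha of 0.54
print(samples_needed(single_sample_icc, 0.80))         # 0.80 target is an assumption
```

With a single-sample reliability of 0.23, the projection for four samples is about 0.54, and reaching 0.80 takes 14 samples, in line with the numbers reported above.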
Validation of the Mini-TQ in a Dutch-speaking population: a rapid assessment for tinnitus-related distress.

PubMed

Vanneste, S; Plazier, M; van der Loo, E; Ost, J; Meeus, O; Van de Heyning, P; De Ridder, D

2011-01-01

Up to 30% of the adult population experiences tinnitus at some point in life. The aim of the present study was to validate the Mini-Tinnitus Questionnaire (TQ) in a Dutch-speaking population for measuring tinnitus-related distress and compare it with the extended version normally used in clinical practice and research. We assessed 181 patients at the Tinnitus Research Initiative clinic of Antwerp University Hospital. Twelve items from the TQ chosen by Hiller and Goebel based on the optimal combination of high item correlation, reliability, and sensitivity were selected and correlated to the different subscale and global scores of the TQ. Internal consistency was evaluated using Cronbach's alpha coefficient, and the Guttman split-half coefficient was used to confirm reliability. Correlation to the global TQ score was .93, internal consistency was .87, and reliability was .89. This study further revealed that the Mini-TQ correlates better with the different subscales of the TQ in the Dutch-speaking population. Convergent validity was confirmed, ensuring that this new instrument measures distress. In addition, the norms suggested by Hiller and Goebel were verified and established. Based on these results, the Mini-TQ is recommended as a valid instrument for a compact, quick, and economical assessment of tinnitus-related distress in Dutch-speaking populations.
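For readers unfamiliar with the two coefficients named in this abstract, the sketch below computes Cronbach's alpha and the Guttman split-half coefficient from a respondents-by-items score matrix using their standard textbook formulas. The response data, the way the items are split into halves, and the function names are all hypothetical; none of them come from the study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def guttman_split_half(rows, first_half):
    """Guttman split-half: 2 * (1 - (var(half A) + var(half B)) / var(total))."""
    half_a = [sum(r[i] for i in first_half) for r in rows]
    half_b = [sum(v for i, v in enumerate(r) if i not in first_half) for r in rows]
    total = [a + b for a, b in zip(half_a, half_b)]
    return 2 * (1 - (variance(half_a) + variance(half_b)) / variance(total))

# Hypothetical responses: 5 respondents x 4 items (not data from the study).
scores = [
    [2, 1, 2, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [2, 2, 2, 2],
    [0, 1, 0, 0],
]
print(cronbach_alpha(scores))
print(guttman_split_half(scores, first_half={0, 1}))
```

The split-half value depends on which items are assigned to each half, so the two coefficients generally differ somewhat on the same data, as they do in the abstract (.87 and .89).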
Reliability and Normative Data for the Dynamic Visual Acuity Test for Vestibular Screening.

PubMed

Riska, Kristal M; Hall, Courtney D

2016-06-01

The purpose of this study was to determine reliability of computerized dynamic visual acuity (DVA) testing and to determine reference values for younger and older adults. A primary function of the vestibular system is to maintain gaze stability during head motion. The DVA test quantifies gaze stabilization with the head moving versus stationary. Commercially available computerized systems allow clinicians to incorporate DVA into their assessment; however, information regarding reliability and normative values of these systems is sparse. Forty-six healthy adults, grouped by age, with normal vestibular function were recruited. Each participant completed computerized DVA testing including static visual acuity, minimum perception time, and DVA using the NeuroCom inVision System. Testing was performed by two examiners in the same session and then repeated at a follow-up session 3 to 14 days later. Intraclass correlation coefficients (ICCs) were used to determine inter-rater and test-retest reliability. ICCs for inter-rater reliability ranged from 0.323 to 0.937 and from 0.434 to 0.909 for horizontal and vertical head movements, respectively. ICCs for test-retest reliability ranged from 0.154 to 0.856 and from 0.377 to 0.9062 for horizontal and vertical head movements, respectively. Overall, raw scores (left/right DVA and up/down DVA) were more reliable than DVA loss scores. A commercially available DVA system has poor-to-fair reliability for DVA loss scores. The use of a convergence paradigm and the absence of a forced-choice paradigm may contribute to poor reliability.
Concordance of DSM-IV Axis I and II diagnoses by personal and informant's interview.

PubMed

Schneider, Barbara; Maurer, Konrad; Sargk, Dieter; Heiskel, Harald; Weber, Bernhard; Frölich, Lutz; Georgi, Klaus; Fritze, Jürgen; Seidler, Andreas

2004-06-30

The validity and reliability of using psychological autopsies to diagnose a psychiatric disorder is a critical issue. Therefore, interrater and test-retest reliability of the Structured Clinical Interview for DSM-IV Axis I and Personality Disorders and the usefulness of these instruments for the psychological autopsy method were investigated. Diagnoses by informant's interview were compared with diagnoses generated by a personal interview of 35 persons. Interrater reliability and test-retest reliability were assessed in 33 and 29 persons, respectively. Chi-square analysis, kappa and intraclass correlation coefficients, and Kendall's tau were used to determine agreement of diagnoses. Kappa coefficients were above 0.84 for substance-related disorders, mood disorders, and anxiety and adjustment disorders, and above 0.65 for Axis II disorders for interrater and test-retest reliability. Agreement by personal and relative's interview generated kappa coefficients above 0.79 for most Axis I and above 0.65 for most personality disorder diagnoses; Kendall's tau for dimensional individual personality disorder scores ranged from 0.22 to 0.72. Despite the small number of psychiatric disorders in the selected population, the present results provide support for the validity of most diagnoses obtained through the best-estimate method using the Structured Clinical Interview for DSM-IV Axis I and Personality Disorders. This instrument can be recommended as a tool for the psychological autopsy procedure in post-mortem research. Copyright 2004 Elsevier Ireland Ltd.
[Reliability of the Japanese version of the Scale for the Assessment and Rating of Ataxia (SARA)].

PubMed

Sato, Kazunori; Yabe, Ichiro; Soma, Hiroyuki; Yasui, Kenichi; Ito, Mizuki; Shimohata, Takayoshi; Onodera, Osamu; Nakashima, Kenji; Sobue, Gen; Nishizawa, Masatoyo; Sasaki, Hidenao

2009-05-01

The International Cooperative Ataxia Rating Scale (ICARS) is widely used as a scale for the assessment of the severity of cerebellar ataxia. However, this scale comprises several items, which makes it insufficiently practical for the daily assessment of ataxic patients. A new rating scale, the Scale for the Assessment and Rating of Ataxia (SARA), was shown to provide highly reliable assessments; further, the scores on SARA correlated with the ICARS score and the Barthel index. After permission was obtained, the original SARA was translated into Japanese. The aim was to examine the reliability and internal consistency of the Japanese version of the SARA for the assessment of cerebellar ataxia in 66 patients with spinocerebellar degeneration. Intraclass correlation coefficients (ICCs) were greater than 0.8 except for the inter-rater "finger chase" and "fast alternating hand movement" tests. The Japanese version of SARA is highly reliable and very useful for the assessment of cerebellar ataxia on a daily basis.
Validity and reliability of portfolio assessment of competency in a baccalaureate dental hygiene program

NASA Astrophysics Data System (ADS)

Gadbury-Amyot, Cynthia C.

This study examined validity and reliability of portfolio assessment using Messick's (1996, 1995) unified framework of construct validity. Theoretical and empirical evidence was sought for six aspects of construct validity. The sample included twenty student portfolios. Each portfolio was evaluated by seven faculty raters using a primary trait analysis scoring rubric. There was a significant relationship (r = .81-.95; p < .01) between the seven subscales in the scoring rubric, demonstrating measurement of a common construct. Item analysis was conducted to examine convergent and discriminant empirical relationships of the 35 items in the scoring rubric. There was a significant relationship between all items (p < .01), and all but one item was more strongly correlated with its own subscale than with other subscales. However, correlations of items across subscales were predominantly moderate in strength, indicating that items did not strongly discriminate between subscales. A fully crossed, two-facet generalizability (G) study design was used to examine reliability. Analysis of variance demonstrated that the greatest source of variance was the scoring rubric itself, accounting for 78% of the total variance. The smallest source of variance was the interaction between portfolio and rubric (1.15%), indicating that while the seven subscales varied in difficulty level, the relative standing of individual portfolios was maintained across subscales. Faculty rater variance accounted for only 1.28% of total variance. A phi coefficient of .86, analogous to a reliability coefficient in classical test theory, was obtained in the Decision study by increasing the subscales to fourteen and decreasing faculty raters to three. There was a significant relationship between portfolios and grade point average (r = .70; p < .01), and the National Dental Hygiene Board Examination (r = .60; p < .01). The relationship between portfolios and the Central Regional Dental Testing Service examination was both weak and nonsignificant (r = .19; p > .05). An open-ended survey was used to elicit student feedback on portfolio development. A majority of the students (76%) perceived value in the development of programmatic portfolios. In conclusion, the pattern of findings from this study suggests that portfolios can serve as a valid and reliable measure for assessing student competency.
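The Decision-study step described in this record can be sketched as follows. This is a minimal illustration of the standard phi (index of dependability) formula for a fully crossed persons x raters x subscales design, not the author's computation; the function name and all variance components are hypothetical.

```python
def phi_coefficient(components, n_raters, n_subscales):
    """Index of dependability (phi) for a fully crossed p x r x s design.

    components: estimated variance components keyed by
        "p" (portfolios), "r" (raters), "s" (subscales),
        "pr", "ps", "rs" (two-way interactions),
        "prs_e" (three-way interaction confounded with residual error).
    """
    n_rs = n_raters * n_subscales
    # Absolute error variance: every component except "p", each divided by
    # the number of conditions it is averaged over in the D study.
    absolute_error = (
        components["r"] / n_raters
        + components["s"] / n_subscales
        + components["pr"] / n_raters
        + components["ps"] / n_subscales
        + components["rs"] / n_rs
        + components["prs_e"] / n_rs
    )
    return components["p"] / (components["p"] + absolute_error)


# Hypothetical variance components (illustration only, not the study's estimates).
example = {"p": 0.40, "r": 0.02, "s": 0.80, "pr": 0.03,
           "ps": 0.02, "rs": 0.01, "prs_e": 0.20}
phi_g_study = phi_coefficient(example, n_raters=7, n_subscales=7)
phi_d_study = phi_coefficient(example, n_raters=3, n_subscales=14)
print(round(phi_g_study, 3), round(phi_d_study, 3))
```

When the subscale facet contributes most of the error variance, as in these made-up components, adding subscales raises phi more than the loss from dropping raters lowers it, which is the trade-off a D study is meant to explore.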
Assessing physiotherapists' communication skills for promoting patient autonomy for self-management: reliability and validity of the communication evaluation in rehabilitation tool.

PubMed

Murray, Aileen; Hall, Amanda; Williams, Geoffrey C; McDonough, Suzanne M; Ntoumanis, Nikos; Taylor, Ian; Jackson, Ben; Copsey, Bethan; Hurley, Deirdre A; Matthews, James

2018-02-27

To assess the inter-rater reliability and concurrent validity of the Communication Evaluation in Rehabilitation Tool, which aims to externally assess physiotherapists' competency in using Self-Determination Theory-based communication strategies in practice. Audio recordings of initial consultations between 24 physiotherapists and 24 patients with chronic low back pain in four hospitals in Ireland were obtained as part of a larger randomised controlled trial. Three raters, all of whom had PhDs in psychology and expertise in motivation and physical activity, independently listened to the 24 audio recordings and completed the 18-item Communication Evaluation in Rehabilitation Tool. Inter-rater reliability between all three raters was assessed using intraclass correlation coefficients. Concurrent validity was assessed using Pearson's r correlations with a reference standard, the Health Care Climate Questionnaire. The total score for the Communication Evaluation in Rehabilitation Tool is an average of all 18 items. Total scores demonstrated good inter-rater reliability (Intraclass Correlation Coefficient (ICC) = 0.8) and concurrent validity with the Health Care Climate Questionnaire total score (range: r = 0.7-0.88). Item-level scores of the Communication Evaluation in Rehabilitation Tool identified five items that need improvement. Results provide preliminary evidence to support future use and testing of the Communication Evaluation in Rehabilitation Tool. Implications for Rehabilitation: Promoting patient autonomy is a learned skill and, while interventions exist to train clinicians in these skills, there are no tools to assess how well clinicians use these skills when interacting with a patient. The lack of robust assessment has severe implications regarding both the fidelity of clinician training packages and resulting outcomes for promoting patient autonomy. This study has developed a novel measurement tool, the Communication Evaluation in Rehabilitation Tool, and a comprehensive user manual to assess how well health care providers use autonomy-supportive communication strategies in real-world clinical settings. This tool has demonstrated good inter-rater reliability and concurrent validity in its initial testing phase. The Communication Evaluation in Rehabilitation Tool can be used in future studies to assess autonomy-supportive communication and undergo further measurement property testing as per our recommendations.
A Note on the Incremental Validity of Aggregate Predictors.

ERIC Educational Resources Information Center

Day, H. D.; Marshall, David

Three computer simulations were conducted to show that very high aggregate predictive validity coefficients can occur when the across-case variability in absolute score stability occurring in both the predictor and criterion matrices is quite small. In light of the increase in internal consistency reliability achieved by the method of aggregation…
Which is the most useful patient-reported outcome in femoroacetabular impingement? Test-retest reliability of six questionnaires.

PubMed

Hinman, Rana S; Dobson, Fiona; Takla, Amir; O'Donnell, John; Bennell, Kim L

2014-03-01

The most reliable patient-reported outcome (PRO) for people with femoroacetabular impingement (FAI) is unknown because there have been no direct comparisons of questionnaires. Thus, the aim was to evaluate the test-retest reliability of six existing PROs in a single cohort of young active people with hip/groin pain consistent with a clinical diagnosis of FAI. Young adults with clinical FAI completed six PRO questionnaires on two occasions, 1-2 weeks apart. The PROs were the modified Harris Hip Score, Hip dysfunction and Osteoarthritis Score, Hip Outcome Score, Non-Arthritic Hip Score, International Hip Outcome Tool, and Copenhagen Hip and Groin Outcome Score. Thirty young adults (mean age 24 years, SD 4 years, range 18-30 years; 15 men) with stable symptoms participated. Intraclass correlation coefficient (3,1) values ranged from 0.73 to 0.93 (95% CI 0.38 to 0.98), indicating that most questionnaires reached minimal reliability benchmarks. Measurement error at the individual level was quite large for most questionnaires (minimal detectable change (MDC95) 12.4-35.6, 95% CI 8.7 to 54.0). In contrast, measurement error at the group level was quite small for most questionnaires (MDC95 2.2-7.3, 95% CI 1.6 to 11). The majority of the questionnaires were reliable and precise enough for use at the group level. Samples of only 23-30 individuals were required to achieve acceptable measurement variation at the group level. Further direct comparisons of these questionnaires are required to assess other measurement properties such as validity, responsiveness and meaningful change in young people with FAI.
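The quantities reported in this record can be illustrated with a short sketch. It computes ICC(3,1) (two-way mixed effects, consistency, single measurement) from a subjects-by-occasions table and derives the minimal detectable change from it. The formulas SEM = SD x sqrt(1 - ICC), MDC95 = 1.96 x sqrt(2) x SEM, and group-level MDC95 = MDC95 / sqrt(n) are common conventions assumed here rather than confirmed from the paper; all function names and scores are hypothetical.

```python
import math


def icc_3_1(scores):
    """ICC(3,1) from a list of rows (one per subject, k repeated scores each)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, occasions, residual.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_occasions
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


def sem(sd, reliability):
    """Standard error of measurement (assumed form: SD * sqrt(1 - reliability))."""
    return sd * math.sqrt(1 - reliability)


def mdc95(sd, reliability):
    """Minimal detectable change for an individual at 95% confidence."""
    return 1.96 * math.sqrt(2) * sem(sd, reliability)


def mdc95_group(sd, reliability, n):
    """Group-level minimal detectable change (assumed form: MDC95 / sqrt(n))."""
    return mdc95(sd, reliability) / math.sqrt(n)


# Hypothetical test-retest scores for five people (illustration only).
retest = [[62, 65], [70, 68], [55, 60], [80, 79], [45, 50]]
print(round(icc_3_1(retest), 3))
print(round(mdc95(sd=10.0, reliability=0.75), 2),
      round(mdc95_group(sd=10.0, reliability=0.75, n=25), 2))
```

Dividing by sqrt(n) is what makes the group-level error so much smaller than the individual-level error, consistent with the contrast the abstract draws.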
The reliability and validity of a Japanese version of symptom checklist 90 revised

PubMed Central

Tomioka, Mitsunao; Shimura, Midori; Hidaka, Mikio; Kubo, Chiharu

2008-01-01

Objective To examine the validity and reliability of a Japanese version of the Symptom Checklist 90 Revised (SCL-90-R (J)). Methods The English SCL-90-R was translated to Japanese and the Japanese version confirmed by back-translation. To determine the factor validity and internal consistency of the nine primary subscales, 460 people from the community completed SCL-90-R(J). Test-retest reliability was examined for 104 outpatients and 124 healthy undergraduate students. The convergent-discriminant validity was determined for 80 inpatients who replied to both SCL-90-R(J) and the Minnesota Multiphasic Personality Inventory (MMPI). Results The correlation coefficients between the nine primary subscales and items were .26 to .78. Cronbach's alpha coefficients were from .76 (Phobic Anxiety) to .86 (Interpersonal Sensitivity). Pearson's correlation coefficients between test-retest scores were from .81 (Psychoticism) to .90 (Somatization) for the outpatients and were from .64 (Phobic Anxiety) to .78 (Paranoid Ideation) for the students. Each of the nine primary subscales correlated well with their corresponding constructs in the MMPI. Conclusion We confirmed the validity and reliability of SCL-90-R(J) for the measurement of individual distress. The nine primary subscales were consistent with the items of the original English version. PMID:18957078
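For readers unfamiliar with the internal-consistency statistic reported in this record, here is a minimal sketch of Cronbach's alpha computed from a respondents-by-items table. The function name and the response data are hypothetical; sample variances (n - 1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha from a list of rows (one per respondent, k item scores each)."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Variance of the summed scale score across respondents.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of six people to a four-item subscale (illustration only).
subscale = [
    [0, 1, 1, 0],
    [2, 2, 3, 2],
    [1, 1, 2, 1],
    [3, 4, 3, 4],
    [0, 0, 1, 1],
    [2, 3, 2, 2],
]
print(round(cronbach_alpha(subscale), 3))
```

Alpha approaches 1 when items covary strongly relative to their individual variances, and falls toward 0 when items are unrelated.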
Measuring the impact of diagnostic decision support on the quality of clinical decision making: development of a reliable and valid composite score.

PubMed

Ramnarayan, Padmanabhan; Kapoor, Ritika R; Coren, Michael; Nanduri, Vasantha; Tomlinson, Amanda L; Taylor, Paul M; Wyatt, Jeremy C; Britto, Joseph F

2003-01-01

Few previous studies evaluating the benefits of diagnostic decision support systems have simultaneously measured changes in diagnostic quality and clinical management prompted by use of the system. This report describes a reliable and valid scoring technique to measure the quality of clinical decision plans in an acute medical setting, where diagnostic decision support tools might prove most useful. Sets of differential diagnoses and clinical management plans generated by 71 clinicians for six simulated cases, before and after decision support from a Web-based pediatric differential diagnostic tool (ISABEL), were used. A composite quality score was calculated separately for each diagnostic and management plan by considering the appropriateness value of each component diagnostic or management suggestion, a weighted sum of individual suggestion ratings, relevance of the entire plan, and its comprehensiveness. The reliability and validity (face, concurrent, construct, and content) of these two final scores were examined. Two hundred fifty-two diagnostic and 350 management suggestions were included in the interrater reliability analysis. There was good agreement between raters (intraclass correlation coefficient, 0.79 for diagnoses, and 0.72 for management). No counterintuitive scores were demonstrated on visual inspection of the sets. Content validity was verified by a consultation process with pediatricians. Both scores discriminated adequately between the plans of consultants and medical students and correlated well with clinicians' subjective opinions of overall plan quality (Spearman rho 0.65, p < 0.01). The diagnostic and management scores for each episode showed moderate correlation (r = 0.51). The scores described can be used as key outcome measures in a larger study to fully assess the value of diagnostic decision aids, such as the ISABEL system.
The reliability and convergent and divergent validity of the Ruff Figural Fluency Test in healthy young adults.

PubMed

Ross, Thomas P

2014-12-01

The reliability and validity of standard and qualitative scores for the Ruff Figural Fluency Test (RFFT; Ruff, 1988) were examined in 102 healthy undergraduates. Participants (M age = 21.79; SD = 3.7; 80% Caucasian) were administered the RFFT and measures assessing executive functions (EF) and other cognitive domains. Inter-scorer reliability was excellent (0.9 range) for most RFFT indices. Test-retest coefficients (M interval = 7 weeks) ranged from 0.64 for the error ratio score to 0.87 for unique designs. RFFT indices correlated with Block Design performance and nonverbal measures of working memory, but were unrelated to measures of verbal fluency, verbal learning, or working memory for verbal material. RFFT novel design output correlated with most measures of EF, supporting the convergent validity of this measure. In contrast, correlations between measures of EF and qualitative scores were absent or weak. RFFT score interpretation is discussed in light of relevant models of EF and directions for future research are presented. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Measuring occupational balance and its relationship to perceived stress and health: Mesurer l'équilibre occupationnel et sa relation avec le stress perçus et la santé.

PubMed

Yu, Yu; Manku, Mandeep; Backman, Catherine L

2018-04-01

There is an assumption that occupational balance is integrally related to health and well-being. This study aimed to investigate test-retest reliability of the English-translated Occupational Balance Questionnaire (OBQ), its relationship to measures of health (Short Form Health Survey-36 Version 2.0 [SF-36v2]) and stress (Perceived Stress Scale-10; PSS-10), and demographic differences in OBQ scores in Canadian adults. Test-retest reliability (2 weeks) was assessed using intraclass correlation (ICC) coefficients. Online surveys from 86 adults were analyzed using descriptive, correlational, and t test statistics. OBQ test-retest reliability was ICC = 0.74 (95% CI [0.34, 0.90]; p = .003) when excluding an influential case ( n = 20). OBQ correlations with PSS-10 were r = -.72; with SF-36v2 Mental Component Score, r = .65; and with Physical Component Score, r = .31; all p < .001. Age and gender had no impact on OBQ scores. Findings help elucidate relationships among health, stress, and occupational balance; however, further psychometric testing is warranted before using OBQ for clinical purposes.
Promoting motivation through mode of instruction: The relationship between use of affective teaching techniques and motivation to learn science

NASA Astrophysics Data System (ADS)

Sanchez Rivera, Yamil

The purpose of this study is to add to what we know about the affective domain and to create a valid instrument for future studies. The Motivation to Learn Science (MLS) Inventory is based on Krathwohl's Taxonomy of Affective Behaviors (Krathwohl et al., 1964). The results of the Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) demonstrated that the MLS Inventory is a valid and reliable instrument. Therefore, the MLS Inventory is a uni-dimensional instrument composed of 9 items with convergent validity (no divergence). The instrument had a high Cronbach's alpha value of .898 during the EFA analysis and .919 with the CFA analysis. Factor loadings on the 9 items ranged from .617 to .800. Standardized regression weights ranged from .639 to .835 in the CFA analysis. Various indices (RMSEA = .033; NFI = .987; GFI = .985; CFI = 1.000) demonstrated a good fit of the proposed model. Hierarchical linear modeling was used to statistically analyze data where students' motivation to learn science scores (level-1) were nested within teachers (level-2). The analysis was geared toward identifying whether teachers' use of affective behavior (a level-2 classroom variable) was significantly related to students' MLS scores (level-1 criterion variable). Model testing proceeded in three phases: intercept-only model, means-as-outcome model, and a random-regression coefficient model. The intercept-only model revealed an intra-class correlation coefficient of .224 with an estimated reliability of .726. Therefore, data suggested that only 22.4% of the variance in MLS scores is between-classes and the remaining 77.6% is at the student-level. Due to the significant variance in MLS scores, χ²(62.756, p < .0001), teachers' TAB scores were added as a level-2 predictor. The regression coefficient was non-significant (p > .05). Therefore, the teachers' self-reported use of affective behaviors was not a significant predictor of students' motivation to learn science.
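The intercept-only model step in this record can be illustrated with a small sketch. It shows how an intra-class correlation and the reliability of a class mean follow from the between-class variance (tau00) and the within-class variance (sigma2) of a two-level model. The variance components below are hypothetical, chosen only so that the ICC equals the reported .224; the class size and function names are likewise assumptions.

```python
def intraclass_correlation(tau00, sigma2):
    """Share of total variance that lies between groups in an intercept-only model."""
    return tau00 / (tau00 + sigma2)


def group_mean_reliability(tau00, sigma2, n_per_group):
    """Reliability of one group's sample mean as an estimate of its true mean."""
    return tau00 / (tau00 + sigma2 / n_per_group)


# Hypothetical variance components scaled so that ICC = .224 (illustration only).
tau00, sigma2 = 0.224, 0.776
icc = intraclass_correlation(tau00, sigma2)
print(f"between-class share: {icc:.1%}, student-level share: {1 - icc:.1%}")

# Reliability of class means grows with (hypothetical) class size.
for n in (5, 10, 20):
    print(n, round(group_mean_reliability(tau00, sigma2, n), 3))
```

A modest ICC can therefore coexist with a much higher reliability of class means, because averaging over students shrinks the within-class error term.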
EQ-5D-5L and SF-6D Utility Measures in Symptomatic benign Thyroid Nodules: Acceptability and Psychometric Evaluation.

PubMed

Wong, Carlos K H; Lang, Brian H H; Yu, Hill M S; Lam, Cindy L K

2017-08-01

The aim of this study was to examine the acceptability, validity, and reliability of the EuroQoL Five-Dimension Five-Level (EQ-5D-5L) and Short-Form Six-Dimension (SF-6D) health utility measures in patients with symptomatic benign thyroid nodules. Data from a randomized controlled trial (ClinicalTrials.gov identifier: NCT02398721) of 294 patients with symptomatic benign thyroid nodules were utilized for this psychometric evaluation of health-related quality of life (HR-QOL) measurement. Three HR-QOL questionnaires (the generic 12-item Short Form Health Survey (SF-12v2), EQ-5D-5L, and SF-6D) were interviewer-administered at baseline and 2 weeks afterwards. Responses to SF-6D were transformed to SF-6D utility scores using a Hong Kong population scoring algorithm derived by standard gamble, whereas responses to EQ-5D-5L were mapped onto EQ-5D-3L responses via interim mapping algorithms and then converted to EQ-5D-5L utility scores using a Chinese-specific value set. Construct validity was determined by evaluating Spearman correlation between SF-12v2 scores and utility scores. Two-week test-retest reliability was assessed using the intra-class correlation coefficient. No significant (>15%) floor and ceiling effects were observed for SF-6D utility scores. The SF-6D utility scores had a moderate Spearman rank correlation with the SF-12v2 domain score, providing evidence for adequate construct validity. The SF-6D utility scores showed good test-retest reliability (0.794; range 0.696-0.860). Better reliability was observed in SF-6D utility scores than in EQ-5D-5L utility scores. While the EQ-5D-5L instrument was less reproducible, the SF-6D instrument appeared to be an applicable, valid, and reliable measure in assessing the HR-QOL of Chinese patients with symptomatic benign thyroid nodules. The impact of utility score selection on the effectiveness and cost effectiveness of clinical interventions targeted to these patients needs further exploration. NCT02398721, ClinicalTrials.gov.
Cumulative trauma disorders in the upper extremities: reliability of the postural and repetitive risk-factors index.

PubMed

James, C P; Harburn, K L; Kramer, J F

1997-08-01

This study addresses test-retest reliability of the Postural and Repetitive Risk-Factors Index (PRRI) for work-related upper body injuries. This assessment was developed by the present authors. A repeated measures design was used to assess the test-retest reliability of a videotaped work-site assessment of subjects' movements. Ten heavy users of video display terminals (VDTs) from a local banking industry participated in the study. The 10 subjects' movements were videotaped for 2 hours on each of 2 separate days, while working on-site at their VDTs. The videotaped assessment, which utilized known postural risk factors for developing musculoskeletal disorder, pain, and discomfort in heavy VDT users (ie, repetitiveness, awkward and static postures, and contraction time), was called the PRRI. The videotaped movement assessments were subsequently analyzed in 15-minute sessions (five sessions per 2-hour videotape, which produced a total of 10 sessions over the 2 testing days), and each session was chosen randomly from the videotape. The subjects' movements were given a postural risk score according to the criteria in the PRRI. Each subject was therefore tested a total of 10 times (ie, 10 sessions), over two days. The maximum PRRI score for both sides of the body was 216 points. Reliability coefficients (RCs) for the PRRI scores were calculated, and the reliability of any one session met the minimum criterion for excellent reliability, which was .75. A two-way analysis of variance (ANOVA) confirmed that there was no statistically significant difference between sessions (p < .05). Calculations using the standard error of measurement (SEM) indicated that an individual tested once, on one day and with a PRRI score of 25, required a change of at least 8 points in order to be confident that a true change in score had occurred. 
The significant results from the reliability tests indicated that the PRRI was a reliable measurement tool that could be used by occupational health practitioners on the job site.
The Reliability and Validity of the Perceived Dietary Adherence Questionnaire for People with Type 2 Diabetes

PubMed Central

Asaad, Ghada; Sadegian, Maryam; Lau, Rita; Xu, Yunke; Soria-Contreras, Diana C.; Bell, Rhonda C.; Chan, Catherine B.

2015-01-01

Nutrition therapy is essential for diabetes treatment, and assessment of dietary intake can be time consuming. The purpose of this study was to develop a reliable and valid instrument to measure diabetic patients’ adherence to Canadian diabetes nutrition recommendations. Specific information derived from three, repeated 24-h dietary recalls of 64 type 2 diabetic patients, aged 59.2 ± 9.7 years, was correlated with a total score and individual items of the Perceived Dietary Adherence Questionnaire (PDAQ). Test-retest reliability was completed by 27 type 2 diabetic patients, aged 62.8 ± 8.4 years. The correlation coefficients for PDAQ items versus 24-h recalls ranged from 0.46 to 0.11. The intra-class correlation (0.78) was acceptable, indicating good reliability. The results suggest that PDAQ is a valid and reliable measure of diabetes nutrition recommendations. Because it is quick to administer and score, it may be useful as a screening tool in research and as a clinical tool to monitor dietary adherence. PMID:26198247
Camera-tracking gaming control device for evaluation of active wrist flexion and extension.

PubMed

Shefer Eini, Dalit; Ratzon, Navah Z; Rizzo, Albert A; Yeh, Shih-Ching; Lange, Belinda; Yaffe, Batia; Daich, Alexander; Weiss, Patrice L; Kizony, Rachel

Cross sectional. Measuring wrist range of motion (ROM) is an essential procedure in hand therapy clinics. To test the reliability and validity of a dynamic ROM assessment, the Camera Wrist Tracker (CWT). Wrist flexion and extension ROM of 15 patients with distal radius fractures and 15 matched controls were assessed with the CWT and with a universal goniometer. One-way model intraclass correlation coefficient analysis indicated high test-retest reliability for extension (ICC = 0.92) and moderate reliability for flexion (ICC = 0.49). Standard error for extension was 2.45° and for flexion was 4.07°. Repeated-measures analysis revealed a significant main effect for group; ROM was greater in the control group (F[1, 28] = 47.35; P < .001). The concurrent validity of the CWT was partially supported. The results indicate that the CWT may provide highly reliable scores for dynamic wrist extension ROM, and moderately reliable scores for flexion, in people recovering from a distal radius fracture. N/A. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
The Chinese version of the Pediatric Quality of Life Inventory™ (PedsQL™) 3.0 Asthma Module: reliability and validity.

PubMed

Feng, Lifen; Zhang, Yingfen; Chen, Ruoqing; Hao, Yuantao

2011-08-07

Health-related quality of life (HRQOL) has been recognized as an important health outcome measurement for pediatric patients. One of the most promising instruments for measuring pediatric HRQOL to emerge in recent years is the Pediatric Quality of Life Inventory (PedsQL™). The PedsQL™ 3.0 Asthma Module, one of the PedsQL™ disease-specific scales, was designed to measure HRQOL dimensions specifically tailored for pediatric asthma. The present study aimed to evaluate the psychometric properties of the Chinese version of the PedsQL™ 3.0 Asthma Module. The PedsQL™ 3.0 Asthma Module was translated into Chinese following the PedsQL™ Measurement Model Translation Methodology. The Chinese version scale was administered to 204 children with asthma and 337 parents of children with asthma from four Triple A hospitals. The psychometric properties were then evaluated. The percentage of missing values for each item of the scale ranged from 0.00% to 8.31%. All child self-report subscales and parent proxy-report subscales approached or exceeded the minimum reliability standard of 0.70 for alpha coefficient, except 3 subscales of Young Child (aged 5-7) self-report (alphas ranging from 0.59 to 0.68). Test-retest reliability was satisfactory, with intraclass correlation coefficients (ICCs) that exceeded the recommended standard of 0.80 in all subscales. Correlation coefficients between items and their hypothesized subscales were higher than those with other subscales. The PedsQL™ 3.0 Asthma Module distinguished between outpatients and inpatients. Patients with mild asthma reported higher scores than those with moderate/severe asthma in the majority of subscales. The intercorrelations among the PedsQL™ 3.0 Asthma Module subscales and the PedsQL™ 4.0 Generic Core Scales were in the medium to large effect size range. The child self-report scores were consistent with the parent proxy-report scores. The Chinese version of the PedsQL™ 3.0 Asthma Module has acceptable psychometric properties, except the internal consistency reliability for Young Child (aged 5-7) self-report. Further studies should be focused on testing responsiveness of the Chinese version scale in longitudinal studies, evaluating the reliability and validity of the scale for the patients with severe asthma or teens independently, and assessing HRQOL of children with asthma in other areas.
The Cardiff Acne Disability Index (CADI): linguistic and cultural validation in Serbian.

PubMed

Jankovic, Slavenka; Vukicevic, Jelica; Djordjevic, Sanja; Jankovic, Janko; Marinkovic, Jelena; Basra, Mohammad K A

2013-02-01

The aims of this study were to translate the Cardiff Acne Disability Index (CADI) into Serbian and to assess its validity and reliability in Serbian acne patients. The CADI was translated into Serbian and linguistically validated according to published guidelines. This version of CADI, along with the Serbian version of the Children's Dermatology Life Quality Index (CDLQI) and a short demographic questionnaire, was administered to a cohort of secondary school pupils. The Global Acne Grading Score was used to measure the clinical severity of acne. The internal consistency reliability of the Serbian version of CADI was assessed by Cronbach's alpha coefficient while its concurrent validity was assessed by Spearman's correlation coefficient. Construct validity was examined by factor analysis. A total of 465 pupils completed questionnaires. Self-reported acne was present in 76% of pupils (353/465). The Serbian version of CADI showed high internal consistency reliability (Cronbach's alpha coefficient = 0.79). The mean item-total correlation coefficient was 0.74 with a range of 0.53-0.81. The concurrent validity of the scale was supported by a moderate but highly significant correlation with the CDLQI (Spearman's rho = 0.66; P < 0.001). Factor analysis revealed the presence of two dimensions underlying the factor structure of the scale. The Serbian version of the CADI is a reliable, valid, and valuable tool for assessing the impact of acne on the quality of life of Serbian-speaking patients.

The reliability and validity of qualitative scores for the Controlled Oral Word Association Test.

PubMed

Ross, Thomas P; Calhoun, Emily; Cox, Tara; Wenner, Carolyn; Kono, Whitney; Pleasant, Morgan

2007-05-01

The reliability and validity of two qualitative scoring systems for the Controlled Oral Word Association Test [Benton, A. L., Hamsher, de S. K., & Sivan, A. B. (1983). Multilingual aphasia examination (2nd ed.). Iowa City, IA: AJA Associates] were examined in 108 healthy young adults. The scoring systems developed by Troyer et al. [Troyer, A. K., Moscovich, M., & Winocur, G. (1997). Clustering and switching as two components of verbal fluency: Evidence from younger and older healthy adults. Neuropsychology, 11, 138-146] and by Abwender et al. [Abwender, D. A., Swan, J. G., Bowerman, J. T., & Connolly, S. W. (2001a). Qualitative analysis of verbal fluency output: Review and comparison of several scoring methods. Assessment, 8, 323-336] each demonstrated excellent interrater reliability (all indices at or above r(icc)=.9). Consistent with previous research [e.g., Ross, T. P. (2003). The reliability of cluster and switch scores for the COWAT. Archives of Clinical Psychology, 18, 153-164], test-retest reliability coefficients (N=53; M interval 44.6 days) for the qualitative scores were modest to poor (r(icc)=.6 to .4 range). Correlations among COWAT scores, measures of executive functioning, verbal learning, working memory, and vocabulary were examined. The idea that qualitative scores represent distinct executive functions such as cognitive flexibility or strategy utilization was not supported. We offer the interpretation that COWAT performance may require the ability to retrieve words in a non-routine manner while suppressing habitual responses and associated processing interference, presumably due to a spread of activation across semantic or lexical networks. This interpretation, though speculative at present, implies that clustering and switching on the COWAT may not be entirely deliberate, but rather an artifact of a passive (i.e., state-dependent) process. Ideas for future research, most notably experimental studies using cognitive methods (e.g., priming), are discussed.
Reliability and concurrent validity of the Infant Motor Profile.

PubMed

Heineman, Kirsten R; Middelburg, Karin J; Bos, Arend F; Eidhof, Lieke; La Bastide-Van Gemert, Sacha; Van Den Heuvel, Edwin R; Hadders-Algra, Mijna

2013-06-01

The Infant Motor Profile (IMP) is a qualitative assessment of motor behaviour in infancy. It consists of five domains: movement variation, variability, fluency, symmetry, and performance. The aim of this study was to assess interobserver reliability and concurrent validity of the IMP with the Alberta Infant Motor Scale (AIMS) and an age-specific neurological examination. Fifty-nine preterm infants (25 females, 34 males; median gestational age 29.7wks, median birthweight 1285g) and 146 term infants (74 females, 72 males; median gestational age 40.1wks, birthweight 3500g) were included. Assessments were performed at corrected ages of 4, 6, 10, 12, and 18 months and consisted of the IMP, AIMS, and an age-specific neurological examination. Interobserver reliability was investigated on a sample of 25 video recordings. Non-parametric statistics were used to analyse the data. Interobserver reliability was high (intraclass correlation coefficient 0.95). At all ages, AIMS scores correlated weakly to fairly with total IMP scores (Spearman's ρ 0.36-0.55), but moderately to strongly with scores on the performance domain of the IMP (Spearman's ρ 0.47-0.84). A clear relation was found between total IMP score and outcome of the neurological examination (Kruskal-Wallis p<0.001 at all ages). Interobserver reliability of the IMP is good. Concurrent validity with the AIMS is best for the IMP performance domain. Concurrent validity with age-specific neurological examination is very good. © The Authors. Developmental Medicine & Child Neurology © 2013 Mac Keith Press.
Validation of Yoruba Version of Family Burden Interview Schedule (Y-FBIS) on Caregivers of Schizophrenia Patients

PubMed Central

Lasebikan, Victor Olufolahan

2012-01-01

Objective. To validate the Yoruba version of the Family Burden Interview Schedule (Y-FBIS) for assessing the burden on caregivers of persons with schizophrenia. Methods. Three hundred and sixty-eight dyads of persons with schizophrenia and their caregivers were recruited from a psychiatric outpatient clinic. The Y-FBIS and the Yoruba version of the GHQ-12 (Y-GHQ-12) were applied to the caregivers. Patients' level of social functioning was assessed using the Global Assessment of Functioning scale. Results. All 368 caregivers were used for tests of internal consistency, 180 for interrater reliability, and another 180 for test-retest reliability. Internal consistency of the Y-FBIS was demonstrated by a significant Cronbach α of between 0.62 and 0.82 for each item. Concurrent validity of the Y-FBIS was illustrated by its significant positive correlation with Y-GHQ-12 (r = 0.633, P < 0.01). Split-half reliability was 0.849. Intraclass correlation coefficient for the total score of Y-FBIS was 0.849 at 95% confidence interval. Test-retest reliability of individual scales ranged from 0.780 to 0.874 and was 0.830 for total objective scale score. Convergent validity was shown by the significant positive correlation (r = 0.83) between the objective burden score and subjective burden score of Y-FBIS. ROC curve area was 0.981. Conclusion. The Y-FBIS is a valid, reliable, and sensitive instrument for assessing the burden on caregivers of persons with schizophrenia in Nigeria. PMID:23738196
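Split-half reliability, one of the statistics reported in this record, can be sketched as follows. The abstract does not say how the items were split or whether a correction was applied; this illustration assumes an odd/even item split with the Spearman-Brown correction, and the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """Odd/even split-half reliability with the Spearman-Brown correction."""
    first_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(first_half, second_half))


# Hypothetical item scores for five caregivers on a six-item scale (illustration only).
burden = [
    [0, 1, 0, 1, 1, 0],
    [2, 2, 1, 2, 2, 1],
    [1, 0, 1, 1, 0, 1],
    [2, 2, 2, 1, 2, 2],
    [0, 0, 1, 0, 0, 0],
]
print(round(split_half_reliability(burden), 3))
```

The correction is needed because the raw correlation between two halves describes a test only half as long as the one actually administered.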
Validation and reliability of a Behcet’s Syndrome Activity Scale in Korea

PubMed Central

Choi, Hyo Jin; Seo, Mi Ryoung; Ryu, Hee Jung; Baek, Han Joo

2016-01-01

Background/Aims: We prepared a cross-cultural adaptation of the Behcet’s Syndrome Activity Scale (BSAS) and evaluated its reliability and validity in Korea. Methods: Fifty patients with Behcet’s disease (BD) who attended the Rheumatology Clinic of Gachon University Gil Medical Center were included in this study. The first BSAS questionnaire was administered at each clinic visit, and the second questionnaire was completed at home within 24 hours of the visit. A Behcet’s Disease Current Activity Form (BDCAF) and a Behcet’s Disease Quality of Life (BDQOL) form were also given to patients. The test-retest reliability was analyzed by intraclass correlation coefficients (ICC). To assess the validity, the total BSAS score was compared with the BDCAF score, the patient/physician global assessment, and the BDQOL by Spearman rank correlation. Results: Twelve males and 38 females were enrolled. The mean age was 48.5 years and the mean disease duration was 6.7 years. Thirty-eight patients (76.0%) returned the questionnaire by mail. For the test-retest reliability, the two assessments were significantly correlated on all 10 items of the BSAS questionnaire (p < 0.05) and the total BSAS score (ICC, 0.925; p < 0.001). The total BSAS score was statistically correlated with the BDQOL, BDCAF, and patient/physician global assessment (p < 0.01). Conclusions: The Korean version of BSAS is a reliable and valid instrument to measure BD activity. PMID:26767871
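The abstract above reports a test-retest ICC but does not state which ICC form was used. As a minimal sketch, assuming a two-way random-effects, absolute-agreement, single-measurement coefficient (often written ICC(2,1)), the value can be computed from ANOVA mean squares. The function name and the ratings below are illustrative and are not taken from the study.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data: list of n subjects, each a list of k repeated scores
    (k raters, or k administrations in a test-retest design).
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Sums of squares for subjects (rows), occasions (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between-subjects mean square
    msc = ss_cols / (k - 1)             # between-occasions mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative ratings only: 6 subjects scored on 4 occasions.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_value = icc_2_1(ratings)
print(round(icc_value, 2))
```

Other ICC forms (for example, consistency rather than absolute agreement) use the same mean squares with a different denominator, so the choice of form should be reported alongside the coefficient.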
First quality score for referral letters in gastroenterology—a validation study

PubMed Central

Eskeland, Sigrun Losada; Brunborg, Cathrine; Seip, Birgitte; Wiencke, Kristine; Hovde, Øistein; Owen, Tanja; Skogestad, Erik; Huppertz-Hauss, Gert; Halvorsen, Fred-Arne; Garborg, Kjetil; Aabakken, Lars; de Lange, Thomas

2016-01-01

Objective: To create and validate an objective and reliable score to assess referral quality in gastroenterology. Design: An observational multicentre study. Setting and participants: 25 gastroenterologists participated in selecting variables for a Thirty Point Score (TPS) for quality assessment of referrals to gastroenterology specialist healthcare for 9 common indications. From May to September 2014, 7 hospitals from the South-Eastern Norway Regional Health Authority participated in collecting and scoring 327 referrals to a gastroenterologist. Main outcome measure: Correlation between the TPS and a visual analogue scale (VAS) for referral quality. Results: The 327 referrals had an average TPS of 13.2 (range 1–25) and an average VAS of 4.7 (range 0.2–9.5). The reliability of the score was excellent, with an intra-rater intraclass correlation coefficient (ICC) of 0.87 and inter-rater ICC of 0.91. The overall correlation between the TPS and the VAS was moderate (r=0.42), and ranged from fair to substantial for the various indications. Mean agreement was good (ICC=0.47, 95% CI (0.34 to 0.57)), ranging from poor to good. Conclusions: The TPS is reliable, objective and shows good agreement with the subjective VAS. The score may be a useful tool for assessing referral quality in gastroenterology, particularly important when evaluating the effect of interventions to improve referral quality. PMID:27855107
Reliability of a computer and Internet survey (Computer User Profile) used by adults with and without traumatic brain injury (TBI).

PubMed

Kilov, Andrea M; Togher, Leanne; Power, Emma

2015-01-01

To determine test-re-test reliability of the 'Computer User Profile' (CUP) in people with and without TBI. The CUP was administered on two occasions to people with and without TBI. The CUP investigated the nature and frequency of participants' computer and Internet use. Intra-class correlation coefficients and kappa coefficients were conducted to measure reliability of individual CUP items. Descriptive statistics were used to summarize content of responses. Sixteen adults with TBI and 40 adults without TBI were included in the study. All participants were reliable in reporting demographic information, frequency of social communication and leisure activities and computer/Internet habits and usage. Adults with TBI were reliable in 77% of their responses to survey items. Adults without TBI were reliable in 88% of their responses to survey items. The CUP was practical and valuable in capturing information about social, leisure, communication and computer/Internet habits of people with and without TBI. Adults without TBI scored more items with satisfactory reliability overall in their surveys. Future studies may include larger samples and could also include an exploration of how people with/without TBI use other digital communication technologies. This may provide further information on determining technology readiness for people with TBI in therapy programmes.
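As a rough illustration of the kappa coefficient mentioned in the abstract above, the following sketch computes unweighted Cohen's kappa for one categorical survey item answered on two occasions. The abstract does not say whether weighted or unweighted kappa was used, so the unweighted form is an assumption, and the responses below are invented.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(first) != len(second) or not first:
        raise ValueError("both lists must be non-empty and of equal length")
    n = len(first)

    # Observed proportion of exact agreement between the two occasions.
    observed = sum(a == b for a, b in zip(first, second)) / n

    # Agreement expected by chance, from the marginal category frequencies.
    counts_1, counts_2 = Counter(first), Counter(second)
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical responses to a single item ("How often do you use email?").
occasion_1 = ["daily", "daily", "weekly", "never", "weekly", "daily"]
occasion_2 = ["daily", "weekly", "weekly", "never", "weekly", "daily"]
kappa = cohen_kappa(occasion_1, occasion_2)
print(round(kappa, 3))
```

Kappa corrects raw percent agreement for chance, which is why an item can show high agreement but only moderate kappa when one response category dominates.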
Testing the validity, reliability and utility of the Self-Administration of Medication (SAM) tool in patients undergoing rehabilitation.

PubMed

Anderson, Jessica; Manias, Elizabeth; Kusljic, Snezana; Finch, Sue

2014-01-01

Determination of patients' ability to self-administer medications in the hospital has largely been determined using the subjective judgment of health professionals. To examine the validity, reliability and utility of the Self-Administration of Medication (SAM) tool as an objective means to determine patients' ability to self-administer in a rehabilitation unit of a public teaching hospital in Melbourne, Australia. To assess validity of the SAM tool, associations were examined between the total SAM tool score and the patients' competence to self-administer as perceived by the tool administrator, patients and nurses. Validity also was determined from a principal component analysis. Pearson correlations were calculated for how SAM scores related to scores obtained from the Functional Independence Measure (FIM) and Barthel Score Index (BSI). To assess the SAM tool's reliability, a Cronbach's alpha coefficient was calculated. Utility of the SAM tool was evidenced by documenting its administration time. One hundred patients participated in this study. The SAM tool had a Cronbach's alpha coefficient of 0.75 and took a mean time of 5.36 min to complete. The capability to self-medicate section of the SAM tool had strong correlations with the FIM (r = 0.485) and BSI (r = 0.472) data, respectively, and the total SAM tool had moderate and strong correlations with the nurses' (r = 0.315) and tool administrator's (r = 0.632) perceptions of patients' ability to self-administer, respectively. Bland-Altman and ROC curve analyses showed poor agreement between the total SAM tool score and the nurses' perceptions. The SAM tool demonstrated acceptable overall internal consistency. It only requires a short time to be completed and is more objective than seeking out health professionals' perceptions. Additional research is needed to further validate this approach to determining patients' ability to self-medicate. Crown Copyright © 2014. Published by Elsevier Inc. All rights reserved.
Development and Psychometric Evaluation of a Clinical Global Impression for Schizoaffective Disorder Scale

PubMed Central

Daniel, David G; Revicki, Dennis A; Canuso, Carla M; Turkoz, Ibrahim; Fu, Dong-Jing; Alphs, Larry; Ishak, K. Jack; Bartko, John J; Lindenmayer, Jean-Pierre

2012-01-01

Objective: The Clinical Global Impression for Schizoaffective Disorder scale is a new rating scale adapted from the Clinical Global Impression scale for use in patients with schizoaffective disorder. The psychometric characteristics of the Clinical Global Impression for Schizoaffective Disorder are described. Design: Content validity was assessed using an investigator questionnaire. Inter-rater reliability was determined with 12 sets of videotaped interviews rated independently by two trained individuals. Test-retest reliability was assessed using 30 randomly selected raters from clinical trials who evaluated the same videos on separate occasions two weeks apart. Convergent and divergent validity and effect size were evaluated by comparing scores between the Clinical Global Impression for Schizoaffective Disorder and the Positive and Negative Syndrome Scale, 21-item Hamilton Rating Scale for Depression, and Young Mania Rating Scale using pooled patient data from two clinical trials. Clinical Global Impression for Schizoaffective Disorder scores were then linked to corresponding Positive and Negative Syndrome Scale scores. Results: Content validity was strong. Inter-rater agreement was good to excellent for most scales and subscales (intra-class correlation coefficient ≥0.50). Test-retest showed good reproducibility, with intraclass correlation coefficients ranging from 0.444 to 0.898. Spearman correlations between Clinical Global Impression for Schizoaffective Disorder domains and corresponding symptom scales were 0.60 or greater, and effect sizes for Clinical Global Impression for Schizoaffective Disorder overall and domain scores were similar to Positive and Negative Syndrome Scale, Young Mania Rating Scale, and 21-item Hamilton Rating Scale for Depression scores. Raters anticipated that the scale might be less effective in distinguishing negative from depressive symptoms, and, in fact, the results here may reflect that clinical reality.
Conclusion: Multiple lines of evidence support the reliability and validity of the Clinical Global Impression for Schizoaffective Disorder for studies in schizoaffective disorder. PMID:22347687
Reliability and validity of TEMPS-A in a Japanese non-clinical population: application to unipolar and bipolar depressives.

PubMed

Matsumoto, Satoko; Akiyama, Tsuyoshi; Tsuda, Hitoshi; Miyake, Yuko; Kawamura, Yoshiya; Noda, Toshie; Akiskal, Kareen K; Akiskal, Hagop S

2005-03-01

In Japan, TEMPS-A has gathered much attention, because Kraepelin's concepts on "fundamental states" of mood disorder and temperaments have been widely respected. TEMPS-A was translated into Japanese (and after the approval of the English back translation by H.S.A.), it was administered to 1391 non-clinical subjects, and 29 unipolar and 30 bipolar patients in remission. Of the non-clinical sample, 426 were readministered the instrument again in 1 month. A control group matched for gender and age was drawn from the non-clinical sample. Regarding test-retest reliability, Spearman's coefficients for depressive, cyclothymic, hyperthymic, irritable and anxious temperaments were 0.79, 0.84, 0.87, 0.81 and 0.87, respectively; regarding internal consistency, Cronbach's alpha coefficients were 0.69, 0.84, 0.79, 0.83 and 0.87, respectively. The unipolar and bipolar groups showed significantly higher depressive, cyclothymic and anxious temperament scores than the control group. Curiously, the bipolar group showed significantly lower hyperthymic score than the control group; irritable temperament scores showed no significant differences. Depressive, cyclothymic, irritable and anxious temperament scores showed significant correlations with each other. Between the unipolar and bipolar groups, there was little difference regarding the temperament scores. Also the inter-temperament correlations showed the same pattern in the unipolar and bipolar groups. The clinically well cohort was 70% male. TEMPS-A showed a high reliability and validity (internal consistency) in a Japanese non-clinical sample. By and large, the hypothesized five temperament structure was upheld. Depressive, cyclothymic and anxious temperaments showed concurrent validity with mood disorder. Irritable temperament may represent a subtype of depressive, cyclothymic or anxious temperaments. There may be a temperamental commonality between unipolar and bipolar disorders. 
TEMPS-A will open new possibilities for international research on mood disorder and personality traits.
Evaluation of the Validity and Reliability of the Chinese Healthy Eating Index.

PubMed

Yuan, Ya-Qun; Li, Fan; Wu, Han; Wang, Ying-Chuan; Chen, Jing-Si; He, Geng-Sheng; Li, Shu-Guang; Chen, Bo

2018-01-24

The Chinese Healthy Eating Index (CHEI) is a measuring instrument of diet quality in accordance with the Dietary Guidelines for Chinese (DGC)-2016. The objective of the study was to evaluate the validity and reliability of the CHEI. Data from 12,473 adults from the China Health and Nutrition Survey (CHNS)-2011, including 3-day-24-h dietary recalls were used in this study. The CHEI was assessed by four exemplary menus developed by the DGC-2016, the general linear models, the independent t-test and the Mann-Whitney U-test, the Spearman's correlation analysis, the principal components analysis (PCA), the Cronbach's coefficient, and the Pearson correlation with nutrient intakes. A higher CHEI score was linked with lower exposure to known risk factors of Chinese diets. The CHEI scored nearly perfect for exemplary menus for adult men (99.8), adult women (99.7), and the healthy elderly (99.1), but not for young children (91.2). The CHEI was able to distinguish the difference in diet quality between smokers and non-smokers (P < 0.0001), people with higher and lower education levels (P < 0.0001), and people living in urban and rural areas (P < 0.0001). Low correlations with energy intake for the CHEI total and component scores (|r| < 0.34, P < 0.01) supported the index assessed diet quality independently of diet quantity. The PCA indicated that underlying multiple dimensions compose the CHEI, and Cronbach's coefficient α was 0.22. Components of dairy, fruits and cooking oils had the greatest impact on the total score. People with a higher CHEI score had not only a higher absolute intake of nutrients (P < 0.001), but also a more nutrient-dense diet (P < 0.001). Our findings support the validity and reliability of the CHEI when using the 3-day-24-h recalls.
Turkish Version of Kolcaba's Immobilization Comfort Questionnaire: A Validity and Reliability Study.

PubMed

Tosun, Betül; Aslan, Özlem; Tunay, Servet; Akyüz, Aygül; Özkan, Hüseyin; Bek, Doğan; Açıksöz, Semra

2015-12-01

The purpose of this study was to determine the validity and reliability of the Turkish version of the Immobilization Comfort Questionnaire (ICQ). The sample used in this methodological study consisted of 121 patients undergoing lower extremity arthroscopy in a training and research hospital. The validity study of the questionnaire assessed language validity, structural validity and criterion validity. Structural validity was evaluated via exploratory factor analysis. Criterion validity was evaluated by assessing the correlation between the visual analog scale (VAS) scores (i.e., the comfort and pain VAS scores) and the ICQ scores using Spearman's correlation test. The Kaiser-Meyer-Olkin coefficient and Bartlett's test of sphericity were used to determine the suitability of the data for factor analysis. Internal consistency was evaluated to determine reliability. The data were analyzed with SPSS version 15.00 for Windows. Descriptive statistics were presented as frequencies, percentages, means and standard deviations. A p value ≤ .05 was considered statistically significant. A moderate positive correlation was found between the ICQ scores and the VAS comfort scores; a moderate negative correlation was found between the ICQ and the VAS pain measures in the criterion validity analysis. Cronbach α values of .75 and .82 were found for the first and second measurements, respectively. The findings of this study reveal that the ICQ is a valid and reliable tool for assessing the comfort of patients in Turkey who are immobilized because of lower extremity orthopedic problems. Copyright © 2015. Published by Elsevier B.V.
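Several records in this listing, including the one above, report Cronbach's α as the measure of internal consistency. The following is a minimal sketch of the standard raw-score formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name and the response matrix are hypothetical and are not taken from any of the studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items table of numeric scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 items and >= 2 respondents")

    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents answering 4 Likert-type items.
demo = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(demo)
print(round(alpha, 3))
```

Because α rises with the number of items and assumes a single underlying dimension, a low value for a deliberately multidimensional index does not by itself show that the index is unreliable.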
Psychometric properties of the Calgary Cambridge guides to assess communication skills of undergraduate medical students

PubMed Central

Simmenroth-Nayda, Anne; Heinemann, Stephanie; Nolte, Catharina; Fischer, Thomas; Himmel, Wolfgang

2014-01-01

Objectives: The aim of this study was to analyse the psychometric properties of the short version of the Calgary Cambridge Guides and to decide whether it can be recommended for use in the assessment of communication skills in young undergraduate medical students. Methods: Using a translated version of the Guide, 30 members from the Department of General Practice rated 5 videotaped encounters between students and simulated patients twice. Item analysis should detect possible floor and/or ceiling effects. The construct validity was investigated using exploratory factor analysis. Intra-rater reliability was measured in an interval of 3 months, inter-rater reliability was assessed by the intraclass correlation coefficient. Results: The score distribution of the items showed no ceiling or floor effects. Four of the five factors extracted from the factor analysis represented important constructs of doctor-patient communication. The ratings for the first and second round of assessing the videos correlated at 0.75 (p < 0.0001). Intraclass correlation coefficients for each item were moderate and ranged from 0.05 to 0.57. Conclusions: Reasonable score distributions of most items without ceiling or floor effects as well as a good test-retest reliability and construct validity recommend the C-CG as an instrument for assessing communication skills in undergraduate medical students. Some deficiencies in inter-rater reliability are a clear indication that raters need a thorough instruction before using the C-CG. PMID:25480988
AHRQ's hospital survey on patient safety culture: psychometric analyses.

PubMed

Blegen, Mary A; Gearhart, Susan; O'Brien, Roxanne; Sehgal, Niraj L; Alldredge, Brian K

2009-09-01

This project analyzed the psychometric properties of the Agency for Healthcare Research and Quality Hospital Survey on Patient Safety Culture (HSOPSC) including factor structure, interitem reliability and intraclass correlations, usefulness for assessment, predictive validity, and sensitivity. The survey was administered to 454 health care staff in 3 hospitals before and after a series of multidisciplinary interventions designed to improve safety culture. Respondents (before, 434; after, 368) included nurses, physicians, pharmacists, and other hospital staff members. Factor analysis partially confirmed the validity of the HSOPSC subscales. Interitem consistency reliability was above 0.7 for 5 subscales; the staffing subscale had the lowest reliability coefficients. The intraclass correlation coefficients, agreement among the members of each unit, were within recommended ranges. The pattern of high and low scores across the subscales of the HSOPSC in the study hospitals were similar to the sample of Pacific region hospitals reported by the Agency for Healthcare Research and Quality and corresponded to the proportion of items in each subscale that are worded negatively (reverse scored). Most of the unit and hospital dimensions were correlated with the Safety Grade outcome measure in the tool. Overall, the tool was shown to have moderate-to-strong validity and reliability, with the exception of the staffing subscale. The usefulness in assessing areas of strength and weakness for hospitals or units among the culture subscales is questionable. The culture subscales were shown to correlate with the perceived outcomes, but further study is needed to determine true predictive validity.
Development and validation of an achievement test in introductory quantum mechanics: The Quantum Mechanics Visualization Instrument (QMVI)

NASA Astrophysics Data System (ADS)

Cataloglu, Erdat

The purpose of this study was to construct a valid and reliable multiple-choice achievement test to assess students' understanding of core concepts of introductory quantum mechanics. Development of the Quantum Mechanics Visualization Instrument (QMVI) occurred across four successive semesters in 1999--2001. During this time 213 undergraduate and graduate students attending the Pennsylvania State University (PSU) at University Park and Arizona State University (ASU) participated in this development and validation study. Participating students were enrolled in four distinct groups of courses: Modern Physics, Undergraduate Quantum Mechanics, Graduate Quantum Mechanics, and Chemistry Quantum Mechanics. Expert panels of professors of physics experienced in teaching quantum mechanics courses and graduate students in physics and science education established the core content and assisted in the validating of successive versions of the 24-question QMVI. Instrument development was guided by procedures outlined in the Standards for Educational and Psychological Testing (AERA-APA-NCME, 1999). Data gathered in this study provided information used in the development of successive versions of the QMVI. Data gathered in the final phase of administration of the QMVI also provided evidence that the intended score interpretation of the QMVI achievement test is valid and reliable. A moderate positive correlation coefficient of 0.49 was observed between the students' QMVI scores and their confidence levels. Analyses of variance indicated that students' scores in Graduate Quantum Mechanics and Undergraduate Quantum Mechanics courses were significantly higher than the mean scores of students in Modern Physics and Chemistry Quantum Mechanics courses (p < 0.05). That finding is consistent with the additional understanding and experience that should be anticipated in graduate students and junior-senior level students over sophomore physics majors and majors in another field. 
The moderate positive correlation coefficient of 0.42 observed between students' QMVI scores and their final course grades was also consistent with expectations in a valid instrument. In addition, the Cronbach-alpha reliability coefficient of the QMVI was found to be 0.82. Limited findings were drawn on students' understanding of introductory quantum mechanics concepts. Data suggested that the construct of quantum mechanics understanding is most likely multidimensional and the Main Topic defined as "Quantum Mechanics Postulates" may be an especially important factor for students in acquiring a successful understanding of quantum mechanics.
Using the Hemophilia Joint Health Score for assessment of children: Reliability of the Spanish version.

PubMed

R, Cuesta-Barriuso; A, Torres-Ortuño; S, Pérez-Alenda; J, Carrasco Juan; F, Querol; J, Nieto-Munuera; Ja, López-Pina

2018-02-27

Numerous measuring instruments for the evaluation of hemophilic arthropathy have been developed. One of the most used systems is the Hemophilia Joint Health Score (HJHS) given its sensitivity to clinical changes appearing in the joints because of recurrent hemarthrosis. The aim was to assess the interrater reliability of the Spanish version of the HJHS (version 2.1) in children with hemophilia. This was a reliability study of the Spanish version of the HJHS. A sample of 36 children aged 7-13 years diagnosed with hemophilia A or B was used. Two physiotherapists performed physical assessments with the Spanish version of the HJHS. Descriptive statistics (range, mean, standard deviation) and the analysis of interrater reliability were calculated. The interrater reliability was heterogeneous: the kappa coefficient (κ), although significant (p < 0.001), ranged from 0.31 to 1.00 across the HJHS variables (swelling, duration of swelling, muscle atrophy, crepitus on motion, flexion loss, extension loss, joint pain, strength, and global gait). In assessing observer bias with the Bland and Altman method, observer 1 scored 0.41 (CI [-0.67, 1.49]) units above observer 2, and the difference between the two was significant (t(36) = 4.48, p < 0.001). The interrater reliability of the Spanish population version of the HJHS is high. This scale should be used generically in evaluating musculoskeletal pediatric patients with hemophilia.
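The Bland and Altman method mentioned in the abstract above summarises observer bias as the mean of the paired differences, usually together with 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The sketch below is a generic illustration under that assumption; the abstract does not specify how its interval was derived, and the scores are invented.

```python
from statistics import mean, stdev


def bland_altman(observer_1, observer_2):
    """Return (bias, lower limit, upper limit) for paired scores."""
    if len(observer_1) != len(observer_2) or len(observer_1) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(observer_1, observer_2)]
    bias = mean(diffs)           # systematic difference between observers
    spread = 1.96 * stdev(diffs)  # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical total scores given by two observers to the same six children.
scores_1 = [2, 4, 1, 5, 3, 6]
scores_2 = [2, 3, 1, 4, 3, 5]
bias, lower, upper = bland_altman(scores_1, scores_2)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

A bias close to zero with narrow limits indicates that the two observers can be used interchangeably; a significant bias, as reported above, suggests one observer scores systematically higher.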
The reliability and validity of ultrasound to quantify muscles in older adults: a systematic review

PubMed Central

Scafoglieri, Aldo; Jager‐Wittenaar, Harriët; Hobbelen, Johannes S.M.; van der Schans, Cees P.

2017-01-01

This review evaluates the reliability and validity of ultrasound to quantify muscles in older adults. The databases PubMed, Cochrane, and Cumulative Index to Nursing and Allied Health Literature were systematically searched for studies. In 17 studies, the reliability (n = 13) and validity (n = 8) of ultrasound to quantify muscles in community‐dwelling older adults (≥60 years) or a clinical population were evaluated. Four out of 13 reliability studies investigated both intra‐rater and inter‐rater reliability. Intraclass correlation coefficient (ICC) scores for reliability ranged from −0.26 to 1.00. The highest ICC scores were found for the vastus lateralis, rectus femoris, upper arm anterior, and the trunk (ICC = 0.72 to 1.000). All included validity studies found ICC scores ranging from 0.92 to 0.999. Two studies describing the validity of ultrasound to predict lean body mass showed good validity as compared with dual‐energy X‐ray absorptiometry (r² = 0.92 to 0.96). This systematic review shows that ultrasound is a reliable and valid tool for the assessment of muscle size in older adults. More high‐quality research is required to confirm these findings in both clinical and healthy populations. Furthermore, ultrasound assessment of small muscles needs further evaluation. Ultrasound to predict lean body mass is feasible; however, future research is required to validate prediction equations in older adults with varying function and health. PMID:28703496
Reliability, validity, and minimal detectable change of the push-off test scores in assessing upper extremity weight-bearing ability.

PubMed

Mehta, Saurabh P; George, Hannah R; Goering, Christian A; Shafer, Danielle R; Koester, Alan; Novotny, Steven

2017-11-01

Clinical measurement study. The push-off test (POT) was recently conceived and found to be reliable and valid for assessing weight bearing through an injured wrist or elbow. However, further research with a larger sample can lend credence to the preliminary findings supporting the use of the POT. This study examined the interrater reliability, construct validity, and measurement error for the POT in patients with wrist conditions. Participants with musculoskeletal (MSK) wrist conditions were recruited. Performance on the POT, grip strength, and isometric strength of the wrist extensors were assessed. The shortened version of the Disabilities of the Arm, Shoulder and Hand and numeric pain rating scale were completed. The intraclass correlation coefficient assessed interrater reliability of the POT. Pearson correlation coefficients (r) examined the concurrent relationships between the POT and other measures. The standard error of measurement and the minimal detectable change at 90% confidence interval were assessed as measurement error and index of true change for the POT. A total of 50 participants with different elbow or wrist conditions (age: 48.1 ± 16.6 years) were included in this study. The results of this study strongly supported the interrater reliability (intraclass correlation coefficient: 0.96 and 0.93 for the affected and unaffected sides, respectively) of the POT in patients with wrist MSK conditions. The POT showed convergent relationships with the grip strength on the injured side (r = 0.89) and the wrist extensor strength (r = 0.7). The POT showed a small standard error of measurement (1.9 kg). The minimal detectable change at 90% confidence interval for the POT was 4.4 kg for the sample. This study provides additional evidence to support the reliability and validity of the POT. This is the first study that provides the values for the measurement error and true change on the POT scores in patients with wrist MSK conditions.
Further research should examine the responsiveness and discriminant validity of the POT in patients with wrist conditions. Copyright © 2017 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
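The standard error of measurement (SEM) and minimal detectable change (MDC) reported in the record above are conventionally related by SEM = SD · √(1 − ICC) and MDC90 = 1.645 · √2 · SEM. The sketch below assumes the study used these conventional formulas, which the abstract does not state explicitly. The SEM of 1.9 kg and ICC of 0.96 come from the abstract; the standard deviation of 9.5 kg is a hypothetical value chosen only to illustrate the first formula.

```python
import math

Z_90 = 1.645  # approximate two-sided 90% standard normal quantile


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and a reliability ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc(sem, z=Z_90):
    """Minimal detectable change; sqrt(2) reflects error in two measurements."""
    return z * math.sqrt(2.0) * sem


# SEM as reported in the abstract (1.9 kg) gives the reported MDC90.
print(round(mdc(1.9), 1))  # 4.4

# Hypothetical SD of 9.5 kg combined with the reported ICC of 0.96.
print(round(sem_from_icc(9.5, 0.96), 2))
```

Under these formulas, a change in a patient's push-off score smaller than about 4.4 kg cannot be distinguished from measurement error at the 90% confidence level.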
Development and testing of a scale to assess physician attitudes about handheld computers with decision support.

PubMed

Ray, Midge N; Houston, Thomas K; Yu, Feliciano B; Menachemi, Nir; Maisiak, Richard S; Allison, Jeroan J; Berner, Eta S

2006-01-01

The authors developed and evaluated a rating scale, the Attitudes toward Handheld Decision Support Software Scale (H-DSS), to assess physician attitudes about handheld decision support systems. The authors conducted a prospective assessment of psychometric characteristics of the H-DSS including reliability, validity, and responsiveness. Participants were 82 Internal Medicine residents. A higher score on each of the 14 five-point Likert scale items reflected a more positive attitude about handheld DSS. The H-DSS score is the mean across the fourteen items. Attitudes toward the use of the handheld DSS were assessed prior to and six months after receiving the handheld device. Cronbach's Alpha was used to assess internal consistency reliability. Pearson correlations were used to estimate and detect significant associations between scale scores and other measures (validity). Paired sample t-tests were used to test for changes in the mean attitude scale score (responsiveness) and for differences between groups. Internal consistency reliability for the scale was alpha = 0.73. In testing validity, moderate correlations were noted between the attitude scale scores and self-reported Personal Digital Assistant (PDA) usage in the hospital (correlation coefficient = 0.55) and clinic (0.48), p < 0.05 for both. The scale was responsive, in that it detected the expected increase in scores between the two administrations (3.99 (s.d. = 0.35) vs. 4.08, (s.d. = 0.34), p < 0.005). The authors' evaluation showed that the H-DSS scale was reliable, valid, and responsive. The scale can be used to guide future handheld DSS development and implementation.
The Arthroscopic Surgical Skill Evaluation Tool (ASSET)

PubMed Central

Koehler, Ryan J.; Amsdell, Simon; Arendt, Elizabeth A; Bisson, Leslie J; Braman, Jonathan P; Butler, Aaron; Cosgarea, Andrew J; Harner, Christopher D; Garrett, William E; Olson, Tyson; Warme, Winston J.; Nicandri, Gregg T.

2014-01-01

Background: Surgeries employing arthroscopic techniques are among the most commonly performed in orthopaedic clinical practice; however, valid and reliable methods of assessing the arthroscopic skill of orthopaedic surgeons are lacking. Hypothesis: The Arthroscopic Surgery Skill Evaluation Tool (ASSET) will demonstrate content validity, concurrent criterion-oriented validity, and reliability, when used to assess the technical ability of surgeons performing diagnostic knee arthroscopy on cadaveric specimens. Study Design: Cross-sectional study; Level of evidence, 3. Methods: Content validity was determined by a group of seven experts using a Delphi process. Intra-articular performance of a right and left diagnostic knee arthroscopy was recorded for twenty-eight residents and two sports medicine fellowship trained attending surgeons. Subject performance was assessed by two blinded raters using the ASSET. Concurrent criterion-oriented validity, inter-rater reliability, and test-retest reliability were evaluated. Results: Content validity: The content development group identified 8 arthroscopic skill domains to evaluate using the ASSET. Concurrent criterion-oriented validity: Significant differences in total ASSET score (p<0.05) between novice, intermediate, and advanced experience groups were identified. Inter-rater reliability: The ASSET scores assigned by each rater were strongly correlated (r=0.91, p<0.01) and the intra-class correlation coefficient between raters for the total ASSET score was 0.90. Test-retest reliability: There was a significant correlation between ASSET scores for both procedures attempted by each individual (r = 0.79, p<0.01). Conclusion: The ASSET appears to be a useful, valid, and reliable method for assessing surgeon performance of diagnostic knee arthroscopy in cadaveric specimens. Studies are ongoing to determine its generalizability to other procedures as well as to the live OR and other simulated environments. PMID:23548808
Gait Deviation Index, Gait Profile Score and Gait Variable Score in children with spastic cerebral palsy: Intra-rater reliability and agreement across two repeated sessions.

PubMed

Rasmussen, Helle Mätzke; Nielsen, Dennis Brandborg; Pedersen, Niels Wisbech; Overgaard, Søren; Holsgaard-Larsen, Anders

2015-07-01

The Gait Deviation Index (GDI) and Gait Profile Score (GPS) are the most used summary measures of gait in children with cerebral palsy (CP). However, the reliability and agreement of these indices have not been investigated, limiting their clinimetric quality for research and clinical practice. The aim of this study was to investigate the intra-rater reliability and agreement of summary measures of gait (GDI; GPS; and the Gait Variable Score (GVS) derived from the GPS). The intra-rater reliability and agreement were investigated across two repeated sessions in 18 children aged 5-12 years diagnosed with spastic CP. No systematic bias was observed between the sessions and no heteroscedasticity was observed in Bland-Altman plots. For the GDI and GPS, excellent reliability with intraclass correlation coefficient (ICC) values of 0.8-0.9 was found, while the GVS was found to have fair to good reliability with ICCs of 0.4-0.7. The agreement for the GDI and the logarithmically transformed GPS, in terms of the standard error of measurement as a percentage of the grand mean (SEM%) varied from 4.1 to 6.7%, whilst the smallest detectable change in percent (SDC%) ranged from 11.3 to 18.5%. For the logarithmically transformed GVS, we found a fair to large variation in SEM% from 7 to 29% and in SDC% from 18 to 81%. The GDI and GPS demonstrated excellent reliability and acceptable agreement proving that they can both be used in research and clinical practice. However, the observed large variability for some of the GVS requires cautious consideration when selecting outcome measures. Copyright © 2015 Elsevier B.V. All rights reserved.
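
The link between the SEM% and SDC% ranges quoted above can be illustrated with a short sketch. The abstract does not state which formulas were used; the code assumes the commonly used conventions SEM = SD × sqrt(1 − ICC) and SDC = 1.96 × sqrt(2) × SEM, and the function names are hypothetical. Under that assumption, the reported SEM% bounds of 4.1 and 6.7 give values close to the reported SDC% bounds of 11.3 and 18.5; the small differences are consistent with rounding of the inputs.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a between-subject SD and an ICC.

    Assumed convention: SEM = SD * sqrt(1 - ICC).
    """
    return sd * math.sqrt(1.0 - icc)


def sdc_from_sem(sem: float, z: float = 1.96) -> float:
    """Smallest detectable change for one individual measured twice.

    Assumed convention: SDC = z * sqrt(2) * SEM, where sqrt(2) accounts
    for measurement error being present in both sessions.
    """
    return z * math.sqrt(2.0) * sem


# SEM% bounds reported in the abstract for the GDI and log-transformed GPS.
for sem_pct in (4.1, 6.7):
    print(f"SEM% = {sem_pct:.1f} -> SDC% = {sdc_from_sem(sem_pct):.2f}")
```

Because the SDC is a fixed multiple (about 2.77) of the SEM under this convention, a measure with a large SEM% necessarily has a large SDC%, which is why the GVS ranges are wide on both indices.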

Development and Validation of the Behavioral Avoidance Test-Back Pain (BAT-Back) for Patients With Chronic Low Back Pain.

PubMed

Holzapfel, Sebastian; Riecke, Jenny; Rief, Winfried; Schneider, Jessica; Glombiewski, Julia A

2016-11-01

Pain-related fear and avoidance of physical activities are central elements of the fear-avoidance model of musculoskeletal pain. Pain-related fear has typically been measured by self-report instruments. In this study, we developed and validated a Behavioral Avoidance Test (BAT) for chronic low back pain (CLBP) patients with the aim of assessing pain-related avoidance behavior by direct observation. The BAT-Back was administered to a group of CLBP patients (N=97) and pain-free controls (N=31). Furthermore, pain, pain-related fear, disability, catastrophizing, and avoidance behavior were measured using self-report instruments. Reliability was assessed with intraclass correlation coefficient and Cronbach α. Validity was assessed by examining correlation and regression analysis. The intraclass correlation coefficient for the BAT-Back avoidance score was r=0.76. Internal consistency was α=0.95. CLBP patients and controls differed significantly on BAT-Back avoidance scores as well as self-report measures. BAT-Back avoidance scores were significantly correlated with scores on each of the self-report measures (rs=0.27 to 0.54). They were not significantly correlated with general anxiety and depression, age, body mass index, and pain duration. The BAT-Back avoidance score was able to capture unique variance in disability after controlling for other variables (eg, pain intensity and pain-related fear). Results indicate that the BAT-Back is a reliable and valid measure of pain-related avoidance behavior. It may be useful for clinicians in tailoring treatments for chronic pain as well as an outcome measure for exposure treatments.
RENZI SCORE FOR OBSTRUCTED DEFECATION SYNDROME - VALIDATION OF THE PORTUGUESE VERSION ACCORDING TO THE COSMIN CHECKLIST.

PubMed

Caetano, Ana Celia; Dias, Sara; Santa-Cruz, André; Rolanda, Carla

2018-01-01

Recently, the Obstructed Defecation Syndrome score (ODS score) was developed and validated by Renzi to assess clinical staging and to allow evaluation and comparison of the efficacy of treatment of this disorder. Our goal is to validate the Portuguese version of the Renzi ODS score according to the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist. Following guidelines for cross-cultural validity, the Renzi ODS score was translated into Portuguese. Then, a group of patients and healthy controls were invited to fill in the Renzi ODS score at baseline, after 2 weeks and 3 months, respectively. We assessed internal consistency, reliability and measurement error, content and construct validity, responsiveness and interpretability. A total of 113 individuals (77 patients; 36 healthy controls) completed the questionnaire. Seventy and 30 patients repeated the Renzi ODS score after 2 weeks and 3 months, respectively. Factor analysis confirmed the unidimensionality of the scale. A Cronbach's α coefficient of 0.77 supported the homogeneity of the items. A weighted quadratic kappa of 0.89 established test-retest reliability. The smallest detectable change at the individual level was 2.66 and at the group level was 0.30. The Renzi ODS score correlated negatively with the total (-0.32) and physical (-0.43) SF-36 scores. The patient and control groups differed significantly (11 points). The change in Renzi ODS score between baseline and 3 months correlated negatively with the clinical evolution (-0.86). ROC analysis showed a minimal important change of 2.00 with an AUC of 0.97. Neither floor nor ceiling effects were observed. This work validated the Portuguese version of the Renzi ODS score. We can now use this reliable, responsive, and interpretable (at the group level) tool to evaluate Portuguese ODS patients.
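
As a reminder of what an internal consistency coefficient such as the α reported above measures, the following is a minimal sketch of Cronbach's alpha computed from item-level scores. The function name and the toy data are hypothetical and are not taken from the study; the formula is the standard one, α = k/(k − 1) × (1 − Σ item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one inner list per item; each inner list contains the
    scores of the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical toy data: 3 items answered by 4 respondents.
toy_items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
print(round(cronbach_alpha(toy_items), 3))
```

Alpha rises when items covary strongly relative to their individual variances, so it is most meaningful when the scale is unidimensional, as the factor analysis in the abstract is said to confirm.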
Reliability of the test of gross motor development second edition (TGMD-2) for Kindergarten children in Myanmar

PubMed Central

Aye, Thanda; Oo, Khin Saw; Khin, Myo Thuzar; Kuramoto-Ahuja, Tsugumi; Maruyama, Hitoshi

2017-01-01

[Purpose] The purpose of this study was to investigate the reliability of the test of gross motor development second edition (TGMD-2) for Kindergarten children in Myanmar. [Subjects and Methods] Fifty healthy Kindergarten children (23 males, 27 females) whose parents/guardians had given written consent participated. All 12 gross motor skills of the TGMD-2 were explained and demonstrated to the subjects before the assessment. Each subject individually performed two trials for each gross motor skill and the performance was video recorded. Three raters separately watched the video recordings and rated them for inter-rater reliability. The second assessment was done one month later with 25 of the 50 subjects for test-retest reliability. The video recordings of 12 subjects were randomly selected from the first 50 recordings for intra-rater reliability six weeks after the first assessment. The agreement on the locomotor and object control raw scores and the gross motor quotient (GMQ) was calculated. [Results] All the reliability coefficients for the locomotor and object control raw scores and the GMQ were interpreted as indicating good to excellent reliability. [Conclusion] The results indicated that the TGMD-2 is a highly reliable and appropriate assessment tool for assessing gross motor skill development of Kindergarten children in Myanmar. PMID:29184278
Predictive validity of a selection centre testing non-technical skills for recruitment to training in anaesthesia.

PubMed

Gale, T C E; Roberts, M J; Sice, P J; Langton, J A; Patterson, F C; Carr, A S; Anderson, I R; Lam, W H; Davies, P R F

2010-11-01

Assessment centres are an accepted method of recruitment in industry and are gaining popularity within medicine. We describe the development and validation of a selection centre for recruitment to speciality training in anaesthesia based on an assessment centre model incorporating the rating of candidates' non-technical skills. Expert consensus identified non-technical skills suitable for assessment at the point of selection. Four stations (structured interview, portfolio review, presentation, and simulation) were developed, the latter two being realistic scenarios of work-related tasks. Evaluation of the selection centre focused on applicant and assessor feedback ratings, inter-rater agreement, and internal consistency reliability coefficients. Predictive validity was sought via correlations of selection centre scores with subsequent workplace-based ratings of appointed trainees. Two hundred and twenty-four candidates were assessed over two consecutive annual recruitment rounds; 68 were appointed and followed up during training. Candidates and assessors demonstrated strong approval of the selection centre with more than 70% of ratings 'good' or 'excellent'. Mean inter-rater agreement coefficients ranged from 0.62 to 0.77 and internal consistency reliability of the selection centre score was high (Cronbach's α=0.88-0.91). The overall selection centre score was a good predictor of workplace performance during the first year of appointment. An assessment centre model based on the rating of non-technical skills can produce a reliable and valid selection tool for recruitment to speciality training in anaesthesia. Early results on predictive validity are encouraging and justify further development and evaluation.
Validity of the Special Needs Education Assessment Tool (SNEAT), a Newly Developed Scale for Children with Disabilities.

PubMed

Kohara, Aiko; Han, ChangWan; Kwon, HaeJin; Kohzuki, Masahiro

2015-11-01

The improvement of the quality of life (QOL) of children with disabilities has been considered important. Therefore, the Special Needs Education Assessment Tool (SNEAT) was developed based on the concept of QOL to objectively evaluate the educational outcome of children with disabilities. SNEAT consists of 11 items in three domains: physical functioning, mental health, and social functioning. This study aimed to verify the reliability and construct validity of SNEAT using data from 93 children in the classes on independent activities of daily living for children with disabilities in Okinawa Prefecture between October and November 2014. Survey data were collected in a longitudinal prospective cohort study. The reliability of SNEAT was verified via the internal consistency method and the test-retest method; both Cronbach's α coefficient and the intra-class correlation coefficient were over 0.7. The validity of SNEAT was also verified via one-way repeated-measures ANOVA and the latent growth curve model. The scores of all the items and domains and the total scores obtained from one-way repeated-measures ANOVA were the same as the predicted scores. SNEAT is valid based on its goodness-of-fit values obtained using the latent growth curve model, where the values of the comparative fit index (0.983) and root mean square error of approximation (0.062) were within the goodness-of-fit range. These results indicate that SNEAT has high reliability and construct validity and may contribute to improving the QOL of children with disabilities in the classes on independent activities of daily living for children with disabilities.
Assessment of workplace bullying and harassment: reliability and validity of a Japanese version of the negative acts questionnaire.

PubMed

Takaki, Jiro; Tsutsumi, Akizumi; Fujii, Yasuhito; Taniguchi, Toshiyo; Hirokawa, Kumi; Hibino, Yuri; Lemmer, Richard J; Nashiwa, Hitomi; Wang, Da-Hong; Ogino, Keiki

2010-01-01

Interest in workplace bullying and harassment has been increasing in Japan. At present, the Negative Acts Questionnaire (NAQ) is one of the most frequently used questionnaires for assessing these issues. The purpose of this study was to develop a Japanese version of the NAQ. We translated the original version of the NAQ using a back-translation method. Participants in this study were recruited from 737 workers at a manufacturing company in Japan. Data were obtained from questionnaires completed by 517 respondents (response rate: 70.1%). We used a cross-validation approach. A three-factor model was obtained from exploratory factor analyses. The confirmatory factor analysis for this model revealed values of 0.94, 0.91, 0.95, and 0.054 for the goodness-of-fit index, the adjusted goodness-of-fit index, the comparative fit index, and the root mean square error of approximation, respectively. Pearson's correlation coefficients for the NAQ scores with the Job Content Questionnaire (JCQ) support scores and the Effort-Reward Imbalance Questionnaire scores for respect and job security were significant (p<0.001) and the direction of these associations was consistent with our expectations, with the exception of the correlations between the NAQ sexual harassment score and the JCQ support scores. Cronbach's alpha coefficients for the scores on the entire NAQ scale and on three subscales (person-related bullying, work-related bullying, and sexual harassment) were 0.90, 0.84, 0.60, and 0.60, respectively. A Japanese version of the NAQ was developed and it appears to have acceptable levels of internal consistency reliability and factor- and construct-validity.
Development and evaluation of the "BRISK Scale," a brief observational measure of risk communication competence.

PubMed

Han, Paul K J; Joekes, Katherine; Mills, Greg; Gutheil, Caitlin; Smith, Kahsi; Cochran, Nancy E; Elwyn, Glyn

2016-12-01

To develop and evaluate a brief observational measure of clinical risk communication competence. A 4-item checklist-type measure, the BRISK (Brief Risk Information Skill) Scale, was developed by selecting and refining items from a more comprehensive measure of clinical risk communication competence. Six volunteer raters received brief training on the measure and then used the BRISK Scale to evaluate 52 video-recorded encounters between 2nd-year medical students and standardized patients conducted as part of an Objective Structured Clinical Examination (OSCE) involving a risk communication task. Internal consistency reliability, inter-rater reliability, and criterion validity were assessed. Raters reported no difficulties using the BRISK Scale; scores across all raters and subjects ranged from 0 to 16 with a mean score of 6.49 (SD=3.17). The BRISK Scale showed good internal consistency reliability (α=0.64), and inter-rater reliability at the scale level (Intraclass Correlation Coefficient (ICC)=0.79 for consistency, and 0.75 for absolute agreement) and individual-item level (ICC range: 0.62-0.91). Novice raters' BRISK Scale scores were highly correlated (r=0.84, p<0.01) with expert raters' scores on the Risk Communication Content measure, a more comprehensive measure of risk communication competence. The BRISK Scale is a promising new brief observational measure of clinical risk communication competence. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Systematic review found AMSTAR, but not R(evised)-AMSTAR, to have good measurement properties.

PubMed

Pieper, Dawid; Buechter, Roland Brian; Li, Lun; Prediger, Barbara; Eikermann, Michaela

2015-05-01

To summarize all available evidence on measurement properties in terms of reliability, validity, and feasibility of the Assessment of Multiple Systematic Reviews (AMSTAR) tool, including R(evised)-AMSTAR. MEDLINE, EMBASE, Psycinfo, and CINAHL were searched in October 2013 for studies containing information on measurement properties of the tools. We extracted data on study characteristics and measurement properties. These data were analyzed following measurement criteria. We included 13 studies; four of them were labeled as validation studies. Nine articles dealt with AMSTAR, two articles dealt with R-AMSTAR, and one article dealt with both instruments. In terms of interrater reliability, most items showed a substantial agreement (>0.6). The median intraclass correlation coefficient (ICC) for the overall score of AMSTAR was 0.83 (range 0.60-0.98), indicating a high agreement. In terms of validity, ICCs were very high with all but one ICC lower than 0.8 when the AMSTAR score was compared with scores from other tools. Scoring AMSTAR takes between 10 and 20 minutes. AMSTAR seems to be reliable and valid. Further investigations for systematic reviews of other study designs than randomized controlled trials are needed. R-AMSTAR should be further investigated as evidence for its use is limited and its measurement properties have not been studied sufficiently. In general, test-retest reliability should be investigated in future studies. Copyright © 2015 Elsevier Inc. All rights reserved.
Applications of computerized adaptive testing (CAT) to the assessment of headache impact.

PubMed

Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew

2003-12-01

To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on: respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92) without 'ceiling' or 'floor' effects and with a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 77.6%). CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were 91.3% accurate for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79 and alternate forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
Adaptation, reliability and validity testing of a Persian version of the Health Assessment Questionnaire-Disability Index in Iranian patients with rheumatoid arthritis.

PubMed

Nazary-Moghadam, Salman; Zeinalzadeh, Afsaneh; Salavati, Mahyar; Almasi, Simin; Negahban, Hossein

2017-01-01

The aim of the present study was to culturally adapt the Health Assessment Questionnaire-Disability Index (HAQ-DI) and evaluate its reliability and validity in Iranian patients with rheumatoid arthritis (RA). A total of 234 patients with RA took part in the validation study and 86 participants in the reliability study. Test-retest relative reliability and internal consistency of the Persian version of the HAQ-DI were examined using the intraclass correlation coefficient (ICC) and Cronbach's alpha, respectively. Additionally, HAQ-DI construct validity (Spearman's correlation) was examined using the Persian version of the Short-Form 36 Health Survey (SF-36) and activity and severity parameters. The Persian version of the HAQ-DI total score showed excellent test-retest reliability (ICC = 0.98) and internal consistency (Cronbach's alpha = 0.95). Spearman's correlations between the total PHAQ-DI score and activity and severity parameters were above 0.55. Correlations between the PHAQ-DI and SF-36 Physical Health were higher than those with SF-36 Mental Health. The Persian version of the HAQ-DI is a reliable and valid culturally adapted instrument for measuring functional limitations in Iranian people with RA. Copyright © 2016 Elsevier Ltd. All rights reserved.
[Development of skill scale for communication skill measurement of pharmacist].

PubMed

Teramachi, Hitomi; Komada, Natsuki; Tanizawa, Katsuya; Kuzuya, Yumi; Tsuchiya, Teruo

2011-04-01

The purpose of this study was to develop a pharmacist communication skill scale. A 38-item scale was constructed, and 283 pharmacists responded. The original questionnaire consisted of 38 items, each rated on a 5-point Likert scale. Complete responses from 228 pharmacists were used to test the reliability and validity of the scale. The initial item pool from the original questionnaire comprised 38 items, and finally 38 original items were chosen for investigation of content validity, correlation coefficient and communality. Factor analysis yielded four factors across 31 items: patient respect reception skill, problem discovery and solution skill, positive approach skill, and feelings processing skill. The correlation between this scale and the KiSS-18 (social skills) was high (r=0.694). The scale showed high internal consistency (Cronbach α coefficient=0.951), and the validity testing supported high content validity. We therefore propose naming the pharmacist communication skill scale TePSS-31. The above findings indicate that the developed scale possesses adequate validity and reliability for practical use.
INTRA-RATER RELIABILITY OF THE MULTIPLE SINGLE-LEG HOP-STABILIZATION TEST AND RELATIONSHIPS WITH AGE, LEG DOMINANCE AND TRAINING.

PubMed

Sawle, Leanne; Freeman, Jennifer; Marsden, Jonathan

2017-04-01

Balance is a complex construct, affected by multiple components such as strength and co-ordination. However, whilst assessing an athlete's dynamic balance is an important part of clinical examination, there is no gold standard measure. The multiple single-leg hop-stabilization test is a functional test which may offer a method of evaluating the dynamic attributes of balance, but it needs to show adequate intra-tester reliability. The purpose of this study was to assess the intra-rater reliability of a dynamic balance test, the multiple single-leg hop-stabilization test, on the dominant and non-dominant legs. Intra-rater reliability study. Fifteen active participants were tested twice with a 10-minute break between tests. The outcome measure was the multiple single-leg hop-stabilization test score, based on a clinically assessed numerical scoring system. Results were analysed using an Intraclass Correlation Coefficient (ICC 2,1) and Bland-Altman plots. Regression analyses explored relationships between test scores, leg dominance, age and training (an alpha level of p = 0.05 was selected). ICCs for intra-rater reliability were 0.85 for the dominant and non-dominant legs (confidence intervals = 0.62-0.95 and 0.61-0.95 respectively). Bland-Altman plots showed scores within two standard deviations. A significant correlation was observed between the dominant and non-dominant leg on balance scores (R² = 0.49, p<0.05), and better balance was associated with younger participants in their non-dominant leg (R² = 0.28, p<0.05) and their dominant leg (R² = 0.39, p<0.05), and with a higher number of hours spent training for the non-dominant leg (R² = 0.37, p<0.05). The multiple single-leg hop-stabilisation test demonstrated strong intra-tester reliability with active participants. Younger participants who trained more had better balance scores. This test may be a useful measure for evaluating the dynamic attributes of balance. 3.
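
The ICC(2,1) named in this abstract is the two-way random-effects, single-measurement, absolute-agreement form described by Shrout and Fleiss. A minimal sketch of that computation follows, assuming a complete table with one row per participant and one column per session; the function name and the toy scores are hypothetical and are not the study's data.

```python
def icc_2_1(scores):
    """ICC(2,1) for a complete n-by-k table (rows: subjects, columns: sessions).

    Uses the two-way ANOVA mean squares:
    (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-session mean square
    mse = ss_error / ((n - 1) * (k - 1)) # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical toy data: session 2 is exactly 1 point higher for everyone.
shifted = [[1, 2], [2, 3], [3, 4]]
print(round(icc_2_1(shifted), 3))
```

In the toy data the two sessions rank participants identically, yet the coefficient is well below 1 because the absolute-agreement form penalises the systematic shift between sessions. This is one reason such studies pair the ICC with Bland-Altman plots, which show any systematic bias directly.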
Interest in Aesthetic Rhinoplasty Scale.

PubMed

Naraghi, Mohsen; Atari, Mohammad

2017-04-01

Interest in cosmetic surgery is increasing, with rhinoplasty being one of the most popular surgical procedures. It is essential that surgeons identify patients with existing psychological conditions before any procedure. This study aimed to develop and validate the Interest in Aesthetic Rhinoplasty Scale (IARS). Four studies were conducted to develop the IARS and to evaluate different indices of validity (face, content, construct, criterion, and concurrent validities) and reliability (internal consistency, split-half coefficient, and temporal stability) of the scale. The four study samples included a total of 463 participants. Statistical analysis revealed satisfactory psychometric properties in all samples. Scores on the IARS were negatively correlated with self-esteem scores (r = -0.296; p < 0.01) and positively associated with scores for psychopathologic symptoms (r = 0.164; p < 0.05), social dysfunction (r = 0.268; p < 0.01), and depression (r = 0.308; p < 0.01). The internal and test-retest coefficients of consistency were found to be high (α = 0.93; intraclass coefficient = 0.94). Rhinoplasty patients were found to have significantly higher IARS scores than nonpatients (p < 0.001). Findings of the present studies provided evidence for face, content, construct, criterion, and concurrent validities and internal and test-retest reliability of the IARS. This evidence supports the use of the scale in clinical and research settings. Thieme Medical Publishers.
Assessment of contrast sensitivity by Spaeth Richman Contrast Sensitivity Test and Pelli Robson Chart Test in patients with varying severity of glaucoma.

PubMed

Thakur, Sahil; Ichhpujani, Parul; Kumar, Suresh; Kaur, Ravneet; Sood, Sunandan

2018-05-14

This study was designed to assess the efficacy, reliability and repeatability of SPARCS (Spaeth Richman Contrast Sensitivity Test) as compared to the conventional Pelli Robson Chart Test for the assessment of contrast sensitivity in patients with glaucoma. We evaluated 135 eyes of 135 patients who were age and sex matched into three groups (controls, disc suspects and glaucoma) of 45 patients each. The glaucoma subgroup was further divided into subgroups of mild, moderate and severe based on the visual field damage. There was a strong positive correlation between Pelli Robson scores and SPARCS scores (S = 0.807, P < 0.001). Intraclass correlation coefficient (ICC) for Pelli Robson Test was 0.952 and 0.988 for SPARCS. The coefficient of repeatability (COR) for mean SPARCS was 5.65%, while COR of Pelli Robson Test was 12.44%. SPARCS was found to have better repeatability than Pelli Robson Test based on COR values. Pelli Robson score had a sensitivity of 80% and a specificity of 65.6% for detecting glaucoma patients as compared to 84.4% and 70%, respectively, for SPARCS scores. SPARCS is a better alternative to conventional Pelli Robson Chart Test for assessment of contrast sensitivity in patients with glaucoma. Being independent of the effects of literacy and educational status, it offers a universal way to measure contrast sensitivity. It can also be reliably used in patients with varying severity of glaucoma.
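
The sensitivity and specificity figures above follow from simple proportions, sketched below. The abstract does not report the underlying counts; the counts used here are hypothetical values back-calculated on the assumption that the 45 glaucoma patients were compared against the other 90 participants (controls plus disc suspects), and they are shown only because they reproduce the reported percentages.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of diseased eyes that the test flags as abnormal."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of non-diseased eyes that the test classifies as normal."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumption: 45 glaucoma vs. 90 non-glaucoma eyes).
tests = {
    "Pelli Robson": {"tp": 36, "fn": 9, "tn": 59, "fp": 31},
    "SPARCS": {"tp": 38, "fn": 7, "tn": 63, "fp": 27},
}

for name, c in tests.items():
    sens = 100 * sensitivity(c["tp"], c["fn"])
    spec = 100 * specificity(c["tn"], c["fp"])
    print(f"{name}: sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```

Under these assumed counts the two tests differ by only two additional true positives and four additional true negatives, which gives a sense of how small the absolute difference behind the reported percentages may be.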
Developing a Psychometric Instrument to Measure Physical Education Teachers’ Job Demands and Resources

PubMed Central

Zhang, Tan; Chen, Ang

2017-01-01

Based on the job demands–resources model, the study developed and validated an instrument that measures physical education teachers’ job demands–resources perception. Expert review established content validity with the average item rating of 3.6/5.0. Construct validity and reliability were determined with a teacher sample (n = 397). Exploratory factor analysis established a five-dimension construct structure matching the theoretical construct deliberated in the literature. The composite reliability scores for the five dimensions range from .68 to .83. Validity coefficients (intraclass correlational coefficients) are .69 for job resources items and .82 for job demands items. Inter-scale correlational coefficients range from −.32 to .47. Confirmatory factor analysis confirmed the construct validity with high dimensional factor loadings (ranging from .47 to .84 for job resources scale and from .50 to .85 for job demands scale) and adequate model fit indexes (root mean square error of approximation = .06). The instrument provides a tool to measure physical education teachers’ perception of their working environment. PMID:29200808
The Functional Arm Scale for Throwers (FAST)-Part II: Reliability and Validity of an Upper Extremity Region-Specific and Population-Specific Patient-Reported Outcome Scale for Throwing Athletes.

PubMed

Huxel Bliven, Kellie C; Snyder Valier, Alison R; Bay, R Curtis; Sauers, Eric L

2017-04-01

The Functional Arm Scale for Throwers (FAST) is an upper extremity (UE) region-specific and population-specific patient-reported outcome (PRO) scale developed to measure health-related quality of life in throwers with UE injuries. Stages I and II, described in a companion paper, of FAST development produced a 22-item scale and a 9-item pitcher module. Stage III of scale development, establishing reliability and validity of the FAST, is reported herein. To describe stage III of scale development: reliability and validity of the FAST. Cohort study (diagnosis); Level of evidence, 2. Data from throwing athletes collected over 5 studies were pooled to assess reliability and validity of the FAST. Reliability was estimated using FAST scores from 162 throwing athletes who were injured (n = 23) and uninjured (n = 139). Concurrent validity was estimated using FAST scores and Disabilities of the Arm, Shoulder, and Hand (DASH) and Kerlan-Jobe Orthopaedic Clinic (KJOC) scores from 106 healthy, uninjured throwing athletes. Known-groups validity was estimated using FAST scores from 557 throwing athletes who were injured (n = 142) and uninjured (n = 415). Reliability and validity were assessed using intraclass correlation coefficients (ICCs), and measurement error was assessed using standard error of measurement (SEM) and minimum detectable change (MDC). Receiver operating characteristic curves and sensitivity/specificity values were estimated for known-groups validity. Data from a separate group (n = 18) of postsurgical and nonoperative/conservative rehabilitation patients were analyzed to report responsiveness of the FAST. The FAST total, subscales, and pitcher module scores demonstrated excellent test-retest reliability (ICC, 0.91-0.98). The SEM95 and MDC95 for the FAST total score were 3.8 and 10.5 points, respectively. The SEM95 and MDC95 for the pitcher module score were 5.7 and 15.7 points, respectively.
The FAST scores showed acceptable correlation with DASH (ICC, 0.49-0.82) and KJOC (ICC, 0.62-0.81) scores. The FAST total score classified 85.1% of players into the correct injury group. For predicting UE injury status, a FAST total cutoff score of 10.0 out of 100.0 was 91% sensitive and 75% specific, and a pitcher module score of 10.0 out of 100.0 was 87% sensitive and 78% specific. The FAST total score demonstrated responsiveness on several indices between intake and discharge time points. The FAST is a reliable, valid, and responsive UE region-specific and population-specific PRO scale for measuring patient-reported health care outcomes in throwing athletes with injury.
The Functional Arm Scale for Throwers (FAST)—Part II: Reliability and Validity of an Upper Extremity Region-Specific and Population-Specific Patient-Reported Outcome Scale for Throwing Athletes

PubMed Central

Huxel Bliven, Kellie C.; Snyder Valier, Alison R.; Bay, R. Curtis; Sauers, Eric L.

2017-01-01

Background: The Functional Arm Scale for Throwers (FAST) is an upper extremity (UE) region-specific and population-specific patient-reported outcome (PRO) scale developed to measure health-related quality of life in throwers with UE injuries. Stages I and II, described in a companion paper, of FAST development produced a 22-item scale and a 9-item pitcher module. Stage III of scale development, establishing reliability and validity of the FAST, is reported herein. Purpose: To describe stage III of scale development: reliability and validity of the FAST. Study Design: Cohort study (diagnosis); Level of evidence, 2. Methods: Data from throwing athletes collected over 5 studies were pooled to assess reliability and validity of the FAST. Reliability was estimated using FAST scores from 162 throwing athletes who were injured (n = 23) and uninjured (n = 139). Concurrent validity was estimated using FAST scores and Disabilities of the Arm, Shoulder, and Hand (DASH) and Kerlan-Jobe Orthopaedic Clinic (KJOC) scores from 106 healthy, uninjured throwing athletes. Known-groups validity was estimated using FAST scores from 557 throwing athletes who were injured (n = 142) and uninjured (n = 415). Reliability and validity were assessed using intraclass correlation coefficients (ICCs), and measurement error was assessed using standard error of measurement (SEM) and minimum detectable change (MDC). Receiver operating characteristic curves and sensitivity/specificity values were estimated for known-groups validity. Data from a separate group (n = 18) of postsurgical and nonoperative/conservative rehabilitation patients were analyzed to report responsiveness of the FAST. Results: The FAST total, subscales, and pitcher module scores demonstrated excellent test-retest reliability (ICC, 0.91-0.98). The SEM95 and MDC95 for the FAST total score were 3.8 and 10.5 points, respectively. The SEM95 and MDC95 for the pitcher module score were 5.7 and 15.7 points, respectively. 
The FAST scores showed acceptable correlation with DASH (ICC, 0.49-0.82) and KJOC (ICC, 0.62-0.81) scores. The FAST total score classified 85.1% of players into the correct injury group. For predicting UE injury status, a FAST total cutoff score of 10.0 out of 100.0 was 91% sensitive and 75% specific, and a pitcher module score of 10.0 out of 100.0 was 87% sensitive and 78% specific. The FAST total score demonstrated responsiveness on several indices between intake and discharge time points. Conclusion: The FAST is a reliable, valid, and responsive UE region-specific and population-specific PRO scale for measuring patient-reported health care outcomes in throwing athletes with injury. PMID:28451614
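The measurement-error figures reported above are consistent with the conventional relation MDC95 = SEM × 1.96 × √2. The following is a minimal Python sketch under that assumption; the abstract does not state which formula the authors used, and the function names are illustrative only.

```python
import math

def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem: float) -> float:
    """Minimum detectable change at 95% confidence for a test-retest difference."""
    return sem * 1.96 * math.sqrt(2.0)

# SEM values as reported in the abstract (FAST total score, pitcher module)
print(round(mdc95(3.8), 1))  # 10.5
print(round(mdc95(5.7), 1))  # 15.8
```

The first value reproduces the reported 10.5 points. The second comes out at 15.8 rather than the reported 15.7, which would be expected if the published SEM of 5.7 was itself rounded.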
Validation of a Spanish version of the Leicester Cough Questionnaire in non-cystic fibrosis bronchiectasis.

PubMed

Muñoz, Gerard; Buxó, Maria; de Gracia, Javier; Olveira, Casilda; Martinez-Garcia, Miguel Angel; Giron, Rosa; Polverino, Eva; Alvarez, Antonio; Birring, Surinder S; Vendrell, Montserrat

2016-05-01

The Leicester Cough Questionnaire (LCQ) has been validated in non-cystic fibrosis bronchiectasis (NCFBC). The present study aimed to create and validate a Spanish version of the LCQ (LCQ-Sp) in NCFBC. The LCQ-Sp was developed following a standardized protocol. For reliability, we assessed internal consistency and the change in score over a 15-day period in stable state. For responsiveness, we assessed the change in scores between visit 1 and the first exacerbation. For validity, we evaluated convergent validity through correlation with the Saint George's Respiratory Questionnaire (SGRQ) and discriminant validity. Two hundred fifty-nine patients (118 mild bronchiectasis, 90 moderate bronchiectasis and 47 severe bronchiectasis) were included. Internal consistency was high for the total scoring and good for the different domains (Cronbach's α: 0.86-0.91). The test-retest reliability shows an intraclass correlation coefficient of 0.87 for the total score. The mean LCQ-Sp score at visit 1 decreased at the beginning of an exacerbation (15.13 ± 4.06 vs. 12.24 ± 4.64; p < 0.001). The correlation between LCQ-Sp and SGRQ scores was -0.66 (p < 0.01). The differences in the LCQ-Sp total score between the different groups of severity were significant (p < 0.001). The LCQ-Sp discriminates disease severity, is responsive to change when faced with exacerbations and is reliable for use in bronchiectasis. © The Author(s) 2016.

Validation of a Spanish version of the Leicester Cough Questionnaire in non-cystic fibrosis bronchiectasis

PubMed Central

Muñoz, Gerard; Buxó, Maria; de Gracia, Javier; Olveira, Casilda; Martinez-Garcia, Miguel Angel; Giron, Rosa; Polverino, Eva; Alvarez, Antonio; Birring, Surinder S

2016-01-01

The Leicester Cough Questionnaire (LCQ) has been validated in non-cystic fibrosis bronchiectasis (NCFBC). The present study aimed to create and validate a Spanish version of the LCQ (LCQ-Sp) in NCFBC. The LCQ-Sp was developed following a standardized protocol. For reliability, we assessed internal consistency and the change in score over a 15-day period in stable state. For responsiveness, we assessed the change in scores between visit 1 and the first exacerbation. For validity, we evaluated convergent validity through correlation with the Saint George’s Respiratory Questionnaire (SGRQ) and discriminant validity. Two hundred fifty-nine patients (118 mild bronchiectasis, 90 moderate bronchiectasis and 47 severe bronchiectasis) were included. Internal consistency was high for the total scoring and good for the different domains (Cronbach’s α: 0.86–0.91). The test–retest reliability shows an intraclass correlation coefficient of 0.87 for the total score. The mean LCQ-Sp score at visit 1 decreased at the beginning of an exacerbation (15.13 ± 4.06 vs. 12.24 ± 4.64; p < 0.001). The correlation between LCQ-Sp and SGRQ scores was −0.66 (p < 0.01). The differences in the LCQ-Sp total score between the different groups of severity were significant (p < 0.001). The LCQ-Sp discriminates disease severity, is responsive to change when faced with exacerbations and is reliable for use in bronchiectasis. PMID:26902541
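Internal consistency of the kind reported above is usually computed as Cronbach's α = k/(k − 1) × (1 − Σ item variances / variance of the total score). A short Python sketch of that standard formula follows; the response matrix is hypothetical and is not data from the study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (all the same length)."""
    k = len(rows[0])
    items = list(zip(*rows))                        # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])    # variance of the summed score
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

# Hypothetical responses: 5 respondents x 3 items
scores = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(scores), 2))  # 0.96
```

Sample variances (n − 1 denominator) are used throughout; because the ratio of variances is what matters, using population variances consistently would give the same α.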
Reliability of the ECHOWS Tool for Assessment of Patient Interviewing Skills.

PubMed

Boissonnault, Jill S; Evans, Kerrie; Tuttle, Neil; Hetzel, Scott J; Boissonnault, William G

2016-04-01

History taking is an important component of patient/client management. Assessment of student history-taking competency can be achieved via a standardized tool. The ECHOWS tool has been shown to be valid with modest intrarater reliability in a previous study but did not demonstrate sufficient power to definitively prove its stability. The purposes of this study were: (1) to assess the reliability of the ECHOWS tool for student assessment of patient interviewing skills and (2) to determine whether the tool discerns between novice and experienced skill levels. A reliability and construct validity assessment was conducted. Three faculty members from the United States and Australia scored videotaped histories from standardized patients taken by students and experienced clinicians from each of these countries. The tapes were scored twice, 3 to 6 weeks apart. Reliability was assessed using intraclass correlation coefficients (ICCs) and repeated measures. Analysis of variance models assessed the ability of the tool to discern between novice and experienced skill levels. The ECHOWS tool showed excellent intrarater reliability (ICC [3,1]=.74-.89) and good interrater reliability (ICC [2,1]=.55) as a whole. The summary of performance (S) section showed poor interrater reliability (ICC [2,1]=.27). There was no statistical difference in performance on the tool between novice and experienced clinicians. A possible ceiling effect may occur when standardized patients are not coached to provide complex and obtuse responses to interviewer questions. Variation in familiarity with the ECHOWS tool and in use of the online training may have influenced scoring of the S section. The ECHOWS tool demonstrates excellent intrarater reliability and moderate interrater reliability. Sufficient training with the tool prior to student assessment is recommended. The S section must evolve in order to provide a more discerning measure of interviewing skills.
© 2016 American Physical Therapy Association.
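The ICC [2,1] and ICC [3,1] labels in the record above follow the Shrout and Fleiss notation: both are single-rating coefficients from a two-way subjects-by-raters layout, with ICC(2,1) counting systematic rater differences as error and ICC(3,1) ignoring them. A Python sketch of the usual mean-square formulas follows; the ratings matrix is illustrative and is not data from the study.

```python
def icc_2_1_and_3_1(x):
    """x: n subjects x k raters. Returns (ICC(2,1), ICC(3,1))."""
    n, k = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * k)
    row_means = [sum(r) / k for r in x]
    col_means = [sum(c) / n for c in zip(*x)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((v - grand) ** 2 for r in x for v in r)
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31

# Illustrative ratings: 6 subjects scored by 4 raters
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc21, icc31 = icc_2_1_and_3_1(ratings)
print(round(icc21, 2), round(icc31, 2))  # 0.29 0.71
```

The gap between the two values in this example comes entirely from the large differences between rater means, which is one reason the same data can look reliable under a consistency definition and unreliable under an absolute-agreement definition.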
The role and reliability of the Psychopathy Checklist-Revised in U.S. sexually violent predator evaluations: a case law survey.

PubMed

DeMatteo, David; Edens, John F; Galloway, Meghann; Cox, Jennifer; Smith, Shannon Toney; Formon, Dana

2014-06-01

The civil commitment of offenders as sexually violent predators (SVPs) is a highly contentious area of U.S. mental health law. The Psychopathy Checklist-Revised (PCL-R) is frequently used in mental health evaluations in these cases to aid legal decision making. Although generally perceived to be a useful assessment tool in applied settings, recent research has raised questions about the reliability of PCL-R scores in SVP cases. In this report, we review the use of the PCL-R in SVP trials identified as part of a larger project investigating its role in U.S. case law. After presenting data on how the PCL-R is used in SVP cases, we examine the reliability of scores reported in these cases. We located 214 cases involving the PCL-R, 88 of which included an actual score and 29 of which included multiple scores. In the 29 cases with multiple scores, the intraclass correlation coefficient for a single evaluator for the PCL-R scores was only .58, and only 41.4% of the difference scores were within 1 standard error of measurement unit. The average score reported by prosecution experts was significantly higher than the average score reported by defense-retained experts, and prosecution experts reported PCL-R scores of 30 or above in nearly 50% of the cases, compared with less than 10% of the cases for defense witnesses (κ = .29). In conjunction with other recently published findings demonstrating the unreliability of PCL-R scores in applied settings, our results raise questions as to whether this instrument should be admitted into SVP proceedings.
Validity and reliability of the patient assessment of constipation quality of life questionnaire for the Turkish population.

PubMed

Bengi, Göksel; Yalçın, Mustafa; Akpınar, Hale; Keskinoğlu, Pembe; Ellidokuz, Hülya

2015-07-01

There are few specific evaluation forms for evaluating the quality of life among patients with chronic constipation. Our study aimed to determine the validity and reliability of the translated Patient Assessment of Constipation Quality of Life (PAC-QOL) questionnaire for the Turkish population because evidence of its reliability and validity is required to justify its use in other studies and clinical practice. This study included 154 patients with constipation who were treated at the Department of Gastroenterology, Dokuz Eylül University Hospital between January and June 2012. The translated PAC-QOL questionnaire was completed by patients at the clinic and also at a 2-week follow-up to test its reliability. Cronbach's alpha coefficient (internal consistency) was 0.91 (good) for the translated PAC-QOL questionnaire. Time validity was evaluated using the intraclass correlation coefficient (ICC) method, and the ICC value for all questions was confirmed as 0.68 at the 2-week follow-up. The validity of the tool in the study group was evaluated using factor analysis, and the results were highly significant (Kaiser-Meyer-Olkin value: 0.857; Bartlett's test: p=0.001). Questions were categorized according to six factors based on the factor analysis, and these factors explained 65.1% of the total variation. For hypothesis verification of the tool, the correlation coefficient for PAC-QOL and PAC Symptoms (PAC-SYM) was r=0.577 (p<0.001), whereas the correlation coefficient for PAC-QOL and constipation severity score was r=0.457 (p<0.001). The PAC-QOL questionnaire was reliable, although not valid because of the limited sample group.
Reliability and validity of simplified Chinese version of Swiss Spinal Stenosis Questionnaire for patients with degenerative lumbar spinal stenosis.

PubMed

Yi, Honglei; Wei, Xianzhao; Zhang, Wei; Chen, Ziqiang; Wang, Xinhui; Ji, Xinran; Zhu, Xiaodong; Wang, Fei; Xu, Ximing; Li, Zhikun; Fan, Jianping; Wang, Chuanfeng; Chen, Kai; Zhang, Guoyou; Zhao, Yinchuan; Li, Ming

2014-05-01

This was a prospective clinical validation study. To evaluate the reliability and validity of the adapted simplified Chinese version of Swiss Spinal Stenosis (SC-SSS) Questionnaire. The SSS Questionnaire is a reliable and valid instrument to assess the perception of function and pain for patients with degenerative lumbar spinal stenosis. However, there is no culturally adapted SSS Questionnaire for use in mainland China. This was a prospective clinical validation study. The adaptation was conducted according to International Quality of Life Assessment Project guidelines. To examine the psychometric properties of the adapted SC-SSS Questionnaire, a sample of 105 patients with lumbar spinal stenosis was included. Thirty-two patients were randomly selected to evaluate the test-retest reliability. Reliability assessment of the SC-SSS Questionnaire was determined by calculating Cronbach α and intraclass coefficient values. Concurrent validity was assessed by correlating SC-SSS Questionnaire scores with relevant domains of the 36-Item Short Form Health Survey. Cronbach α values for the symptom severity scale, physical function scale, and patient satisfaction scale of the SC-SSS Questionnaire were 0.89, 0.86, and 0.91, respectively, indicating very good internal consistency. The test-retest reproducibility was found to be excellent, with intraclass correlation coefficients of 0.93, 0.91, and 0.95. In terms of concurrent validity, SC-SSS Questionnaire had good correlation with physical functioning and bodily pain of 36-Item Short Form Health Survey (r = 0.663, 0.653) and low correlation with mental health (r = 0.289). The physical function scale had good correlation with physical functioning of 36-Item Short Form Health Survey (r = 0.637), whereas the rest had moderate correlation. The satisfaction scale score was highly correlated with the change in the symptom severity (r = 0.71) and physical function (r = 0.68) scale score.
The SC-SSS Questionnaire showed satisfactory reliability and validity in the evaluation of functionality in patients with lumbar spinal stenosis who are experiencing neurogenic claudication. It is simple and easy to use and can be recommended in clinical and research practice in mainland China. Level of evidence: 3.
The retest reliability of the six-minute walk test in patients referred to a cardiac rehabilitation programme.

PubMed

Hanson, Lisa C; McBurney, Helen; Taylor, Nicholas F

2012-03-01

The purpose of this paper was to determine if the Six-minute Walk Test (6MWT) was a reliable exercise test for patients referred to cardiac rehabilitation when up to three tests were performed and to determine if test scores differed according to between-test time interval. Thirty adults aged 63 ± 7.9 years referred to cardiac rehabilitation participated in a repeated measures reliability trial. Participants completed three 6MWTs within a one-week period. Participants were randomly allocated to one of three groups: on the first day, Group A completed three walks, Group B completed two walks and Group C completed one walk. Relative reliability was expressed as a ratio (ICC(2,1)), and absolute reliability was expressed in metres (95% confidence intervals) for group and individuals. The 6MWT demonstrated a high level of relative reliability (intraclass correlation coefficient [ICC] = 0.94) across the three walks. There was no statistically significant difference between the test scores of the three groups. However, there was an increase in distance walked from the first to the second to the third 6MWT. Absolute reliability indicated that a change of at least 44 m would be required to be interpreted as true change in a group, and at least 95 m to be interpreted as true change in an individual with 95% confidence. Three 6MWTs completed in relatively short timeframes were not sufficient for reliable results as there was an increase in the distance walked, and relatively large increases in distances would be required to be interpreted as change. It did not make any difference whether the tests were all completed on one day or over one week. This study highlighted problems that may arise when relying on reliability coefficients alone to interpret reliability. These results suggest that the 6MWT may not have sufficient reliability to be a suitable test to evaluate exercise tolerance in patients referred to cardiac rehabilitation. Copyright © 2011 John Wiley & Sons, Ltd.
Validity, Reliability, and Feasibility of Durometer Measurements of Scleroderma Skin Disease in a Multicenter Treatment Trial

PubMed Central

MERKEL, PETER A.; SILLIMAN, NANCY P.; DENTON, CHRISTOPHER P.; FURST, DANIEL E.; KHANNA, DINESH; EMERY, PAUL; HSU, VIVIEN M.; STREISAND, JAMES B.; POLISSON, RICHARD P.; ÅKESSON, ANITA; COPPOCK, JOHN; van den HOOGEN, FRANK; HERRICK, ARIANE; MAYES, MAUREEN D.; VEALE, DOUGLAS; SEIBOLD, JAMES R.; BLACK, CAROL M.; KORN, JOSEPH H.

2013-01-01

Objective: To determine the validity, reliability, and feasibility of durometer measurements of skin hardness as an outcome measure in clinical trials of scleroderma. Methods: Skin hardness was measured during a multicenter treatment trial for scleroderma using handheld digital durometers with a continuous scale. Skin thickness was measured by modified Rodnan skin score (MRSS). Other outcome data collected included the Scleroderma Health Assessment Questionnaire. In a reliability exercise in advance of the trial, 9 investigators examined the same 5 scleroderma patients by MRSS and durometry. Results: Forty-three patients with early diffuse cutaneous systemic sclerosis were studied at 11 international centers (mean age 49 years [range 24–76], median disease duration 6.4 months [range 0.3–23], and median baseline MRSS 22 [range 11–38]). The reliability of durometer measurements was excellent, with high interobserver intraclass correlation coefficients (ICCs) (0.82–0.92), and each result was greater than the corresponding skin site ICCs for MRSS (0.54–0.85). Baseline durometer scores correlated well with MRSS (r = 0.69, P < 0.0001), patient self-assessments of skin disease (r = 0.69, P < 0.0001), and Health Assessment Questionnaire (HAQ) disability scores (r = 0.34, P = 0.03). Change in durometer scores correlated with change in MRSS (r = 0.70, P < 0.0001), change in patient self-assessments of skin disease (r = 0.52, P = 0.003), and change in HAQ disability scores (r = 0.42, P = 0.017). The effect size was greater for durometry than for MRSS or patient self-assessment. Conclusion: Durometer measurements of skin hardness in patients with scleroderma are reliable, simple, accurate, demonstrate good sensitivity to change compared with traditional skin scoring, and reflect patients' self-assessments of their disease.
Durometer measurements are valid, objective, and scalable, and should be considered for use as a complementary outcome measure to skin scoring in clinical trials of scleroderma. PMID:18438905
Validity and reliability of global operative assessment of laparoscopic skills (GOALS) in novice trainees performing a laparoscopic cholecystectomy.

PubMed

Kramp, Kelvin H; van Det, Marc J; Hoff, Christiaan; Lamme, Bas; Veeger, Nic J G M; Pierie, Jean-Pierre E N

2015-01-01

Global Operative Assessment of Laparoscopic Skills (GOALS) assessment has been designed to evaluate skills in laparoscopic surgery. A longitudinal blinded study of randomized video fragments was conducted to estimate the validity and reliability of GOALS in novice trainees. In total, 10 trainees each performed 6 consecutive laparoscopic cholecystectomies. Sixty procedures were recorded on video. Video fragments of (1) opening of the peritoneum; (2) dissection of Calot's triangle and achievement of critical view of safety; and (3) dissection of the gallbladder from the liver bed were blinded, randomized, and rated by 2 consultant surgeons using GOALS. Also, a grade was given for overall competence. The correlation of GOALS with live observation Objective Structured Assessment of Technical Skills (OSATS) scores was calculated. Construct validity was estimated using the Friedman 2-way analysis of variance by ranks and the Wilcoxon signed-rank test. The interrater reliability was calculated using the absolute and consistency agreement 2-way random-effects model intraclass correlation coefficient. A high correlation was found between mean GOALS score (r = 0.879, p = 0.021) and mean OSATS score. The GOALS score increased significantly across the 6 procedures (p = 0.002). The trainees performed significantly better on their sixth when compared with their first cholecystectomy (p = 0.004). The consistency agreement interrater reliability was 0.37 for the mean GOALS score (p = 0.002) and 0.55 for overall competence (p < 0.001) of the 3 video fragments. The validity observed in this randomized blinded longitudinal study supports the existing evidence that GOALS is a valid tool for assessment of novice trainees. A relatively low reliability was found in this study. Copyright © 2014 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
Validity and Reliability of the Chronic Respiratory Disease Questionnaire in Elderly Individuals with Mild to Moderate Non-Cystic Fibrosis Bronchiectasis.

PubMed

Vodanovich, Domagoj A; Bicknell, Thomas J; Holland, Anne E; Hill, Catherine J; Cecins, Nola; Jenkins, Sue; McDonald, Christine F; Burge, Angela T; Thompson, Philip; Stirling, Robert G; Lee, Annemarie L

2015-01-01

The chronic respiratory disease questionnaire (CRDQ) is designed to assess health-related quality of life (HRQOL) in chronic respiratory conditions, but its reliability, validity and responsiveness in individuals with mild to moderate non-cystic fibrosis (CF) bronchiectasis are unclear. This study aimed to determine measurement properties of the CRDQ in non-CF bronchiectasis. Participants with non-CF bronchiectasis involved in a randomised controlled trial of exercise training were recruited. Internal consistency was assessed using Cronbach's α. Over 8 weeks, reliability was evaluated using intra-class correlation coefficients and Bland-Altman analysis for measures of agreement. Convergent and divergent validity was assessed by correlations with the other HRQOL questionnaires and the Hospital Anxiety and Depression Scale (HADS). The responsiveness to exercise training was assessed using effect sizes and standardised response means. Eighty-five participants were included (mean age ± SD, 64 ± 13 years). Internal consistency was adequate (>0.7) for all CRDQ domains and the total score. Test-retest reliability ranged from 0.69 to 0.85 for each CRDQ domain and was 0.82 for the total score. Dyspnoea (CRDQ) was related to St George's respiratory questionnaire (SGRQ) symptoms only (r = 0.38), with no relationship to the Leicester cough questionnaire (LCQ) or HADS. Moderate correlations were found between the total score of the CRDQ, the SGRQ (rs = -0.49) and the LCQ score (rs = 0.51). Lower CRDQ scores were associated with higher anxiety and depression (rs = -0.46 to -0.56). The responsiveness of the CRDQ was small (effect size 0.1-0.24). The CRDQ is a valid and reliable measure of HRQOL in mild to moderate non-CF bronchiectasis, but responsiveness was limited. © 2015 S. Karger AG, Basel.
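The Bland-Altman analysis mentioned in the record above summarises test-retest agreement as a mean difference (bias) and 95% limits of agreement, conventionally bias ± 1.96 × SD of the paired differences. A minimal Python sketch under that convention follows; the paired scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) for paired test-retest scores."""
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)   # sample SD of the differences
    return bias, bias - half_width, bias + half_width

# Hypothetical domain scores for 5 participants at baseline and retest
baseline = [4.0, 5.0, 3.5, 6.0, 4.5]
retest = [4.5, 5.0, 3.0, 6.5, 5.0]
bias, lower, upper = bland_altman(baseline, retest)
print(round(bias, 2), round(lower, 2), round(upper, 2))  # 0.2 -0.68 1.08
```

Unlike an ICC, the limits are expressed in the units of the questionnaire score, so they can be compared directly with a minimal important difference.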
The Reliability and Validity of the Thoracolumbar Injury Classification System in Pediatric Spine Trauma.

PubMed

Savage, Jason W; Moore, Timothy A; Arnold, Paul M; Thakur, Nikhil; Hsu, Wellington K; Patel, Alpesh A; McCarthy, Kathryn; Schroeder, Gregory D; Vaccaro, Alexander R; Dimar, John R; Anderson, Paul A

2015-09-15

The thoracolumbar injury classification system (TLICS) was evaluated in 20 consecutive pediatric spine trauma cases. The purpose of this study was to determine the reliability and validity of the TLICS in pediatric spine trauma. The TLICS was developed to improve the categorization and management of thoracolumbar trauma. TLICS has been shown to have good reliability and validity in the adult population. The clinical and radiographical findings of 20 pediatric thoracolumbar fractures were prospectively presented to 20 surgeons with disparate levels of training and experience with spinal trauma. These injuries were consecutively scored using the TLICS. Cohen unweighted κ coefficients and Spearman rank order correlation values were calculated for the key parameters (injury morphology, status of posterior ligamentous complex, neurological status, TLICS total score, and proposed management) to assess the inter-rater reliabilities. Five surgeons scored the same cases 3 months later to assess the intra-rater reliability. The actual management of each case was then compared with the treatment recommended by the TLICS algorithm to assess validity. The inter-rater κ statistics of all subgroups (injury morphology, status of the posterior ligamentous complex, neurological status, TLICS total score, and proposed treatment) were within the range of moderate to substantial reproducibility (0.524-0.958). All subgroups had excellent intra-rater reliability (0.748-1.000). The various indices for validity were calculated (80.3% correct, 0.836 sensitivity, 0.785 specificity, 0.676 positive predictive value, 0.899 negative predictive value). Overall, TLICS demonstrated good validity. The TLICS has good reliability and validity when used in the pediatric population. 
The inter-rater reliability of predicting management and indices for validity are lower than those in adults with thoracolumbar fractures, which is likely due to differences in the way children are treated for certain types of injuries. TLICS can be used to reliably categorize thoracolumbar injuries in the pediatric population; however, modifications may be needed to better guide treatment in this specific patient population. Level of evidence: 4.
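The validity indices listed in the record above (percent correct, sensitivity, specificity, positive and negative predictive value) all derive from a single 2 × 2 table of recommended versus actual management. A Python sketch of the standard definitions follows; the cell counts are hypothetical, since the abstract does not report them.

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Standard 2x2 indices; 'positive' is whichever outcome the index test flags."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts: 100 rater-by-case decisions
indices = diagnostic_indices(tp=30, fp=10, fn=5, tn=55)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Predictive values depend on how common the positive outcome is in the sample, so a low positive predictive value alongside good sensitivity and specificity is not contradictory when the positive outcome is the less frequent one.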
eHealth literacy in chronic disease patients: An item response theory analysis of the eHealth literacy scale (eHEALS).

PubMed

Paige, Samantha R; Krieger, Janice L; Stellefson, Michael; Alber, Julia M

2017-02-01

Chronic disease patients are affected by low computer and health literacy, which negatively affects their ability to benefit from access to online health information. To estimate reliability and confirm model specifications for eHealth Literacy Scale (eHEALS) scores among chronic disease patients using Classical Test Theory (CTT) and Item Response Theory techniques. A stratified sample of Black/African American (N=341) and Caucasian (N=343) adults with chronic disease completed an online survey including the eHEALS. Item discrimination was explored using bi-variate correlations and Cronbach's alpha for internal consistency. A categorical confirmatory factor analysis tested a one-factor structure of eHEALS scores. Item characteristic curves, in-fit/outfit statistics, omega coefficient, and item reliability and separation estimates were computed. A 1-factor structure of eHEALS was confirmed by statistically significant standardized item loadings, acceptable model fit indices (CFI/TLI>0.90), and 70% variance explained by the model. Item response categories increased with higher theta levels, and there was evidence of acceptable reliability (ω=0.94; item reliability=89; item separation=8.54). eHEALS scores are a valid and reliable measure of self-reported eHealth literacy among Internet-using chronic disease patients. Providers can use eHEALS to help identify patients' eHealth literacy skills. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
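For a one-factor model such as the one described above, coefficient omega can be approximated from standardized loadings as ω = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The Python sketch below assumes continuous indicators with uncorrelated residuals, which is a simplification of the categorical model used in the study; the loadings are hypothetical.

```python
def omega_total(loadings):
    """Coefficient omega for a one-factor model with standardized loadings."""
    common = sum(loadings) ** 2                      # variance due to the factor
    unique = sum(1.0 - l ** 2 for l in loadings)     # summed residual variances
    return common / (common + unique)

# Hypothetical standardized loadings for a short unidimensional scale
loadings = [0.8, 0.8, 0.8, 0.8]
print(round(omega_total(loadings), 3))  # 0.877
```

When all loadings are equal, omega and coefficient alpha coincide; they diverge as the loadings become more unequal, which is the usual argument for reporting omega.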
Assessment of reliability, validity, responsiveness and minimally important change of the German Hip dysfunction and osteoarthritis outcome score (HOOS) in patients with osteoarthritis of the hip.

PubMed

Arbab, Dariusch; van Ochten, Johannes H M; Schnurr, Christoph; Bouillon, Bertil; König, Dietmar

2017-12-01

Patient-reported outcome measures are a critical tool in evaluating the efficacy of orthopedic procedures. The intention of this study was to evaluate reliability, validity, responsiveness and minimally important change of the German version of the Hip dysfunction and osteoarthritis outcome score (HOOS). The German HOOS was investigated in 251 consecutive patients before and 6 months after total hip arthroplasty. All patients completed HOOS, Oxford-Hip Score, Short-Form (SF-36) and numeric scales for pain and disability. Test-retest reliability, internal consistency, floor and ceiling effects, construct validity and minimal important change were analyzed. The German HOOS demonstrated excellent test-retest reliability with intraclass correlation coefficient values > 0.7. Cronbach's alpha values demonstrated strong internal consistency. As hypothesized, HOOS subscales strongly correlated with corresponding OHS and SF-36 domains. All subscales showed excellent (effect size/standardized response means > 0.8) responsiveness between preoperative assessment and postoperative follow-up. The HOOS and all subdomains showed changes larger than the minimal detectable change, which indicates true change. The German version of the HOOS demonstrated good psychometric properties. It proved to be a valid, reliable, and responsive instrument for use in patients with hip osteoarthritis undergoing total hip replacement.
Test-retest and interrater reliability of the functional lower extremity evaluation.

PubMed

Haitz, Karyn; Shultz, Rebecca; Hodgins, Melissa; Matheson, Gordon O

2014-12-01

Repeated-measures clinical measurement reliability study. To establish the reliability and face validity of the Functional Lower Extremity Evaluation (FLEE). The FLEE is a 45-minute battery of 8 standardized functional performance tests that measures 3 components of lower extremity function: control, power, and endurance. The reliability and normative values for the FLEE in healthy athletes are unknown. A face validity survey for the FLEE was sent to sports medicine personnel to evaluate the level of importance and frequency of clinical usage of each test included in the FLEE. The FLEE was then administered and rated for 40 uninjured athletes. To assess test-retest reliability, each athlete was tested twice, 1 week apart, by the same rater. To assess interrater reliability, 3 raters scored each athlete during 1 of the testing sessions. Intraclass correlation coefficients were used to assess the test-retest and interrater reliability of each of the FLEE tests. In the face validity survey, the FLEE tests were rated as highly important by 58% to 71% of respondents but frequently used by only 26% to 45% of respondents. Interrater reliability intraclass correlation coefficients ranged from 0.83 to 1.00, and test-retest reliability ranged from 0.71 to 0.95. The FLEE tests are considered clinically important for assessing lower extremity function by sports medicine personnel but are underused. The FLEE also is a reliable assessment tool. Future studies are required to determine if use of the FLEE to make return-to-play decisions may reduce reinjury rates.
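The intraclass correlation coefficients reported in records such as the one above can be illustrated with a short sketch. The abstract does not say which ICC form was used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption made here for illustration, and the function name `icc_2_1` and the toy ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of n subjects, each a list of k ratings (one per rater).
    The ICC form is an assumption; the source abstract does not specify it.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data: rater 2 always scores one point higher than rater 1.
offset_ratings = [[1, 2], [2, 3], [3, 4], [4, 5]]
icc_offset = icc_2_1(offset_ratings)

# Hypothetical data: both raters agree exactly.
identical_ratings = [[1, 1], [2, 2], [3, 3]]
icc_identical = icc_2_1(identical_ratings)
```

Because this form measures absolute agreement, a constant offset between raters lowers the coefficient even though the rank order of subjects is identical; a consistency-type ICC would not penalize that offset.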
Test Reliability at the Individual Level

PubMed Central

Hu, Yueqin; Nesselroade, John R.; Erbacher, Monica K.; Boker, Steven M.; Burt, S. Alexandra; Keel, Pamela K.; Neale, Michael C.; Sisk, Cheryl L.; Klump, Kelly

2016-01-01

Reliability has a long history as one of the key psychometric properties of a test. However, a given test might not measure people equally reliably. Test scores from some individuals may have considerably greater error than others. This study proposed two approaches using intraindividual variation to estimate test reliability for each person. A simulation study suggested that the parallel tests approach and the structural equation modeling approach recovered the simulated reliability coefficients. Then in an empirical study, where forty-five females were measured daily on the Positive and Negative Affect Schedule (PANAS) for 45 consecutive days, separate estimates of reliability were generated for each person. Results showed that reliability estimates of the PANAS varied substantially from person to person. The methods provided in this article apply to tests measuring changeable attributes and require repeated measures across time on each individual. This article also provides a set of parallel forms of PANAS. PMID:28936107
Testing the reliability of the Fall Risk Screening Tool in an elderly ambulatory population.

PubMed

Fielding, Susan J; McKay, Michael; Hyrkas, Kristiina

2013-11-01

To identify and test the reliability of a fall risk screening tool in an ambulatory outpatient clinic. The Fall Risk Screening Tool (Albert Lea Medical Center, MN, USA) was scripted for an interview format. Two interviewers separately screened a convenience sample of 111 patients (age ≥ 65 years) in an ambulatory outpatient clinic in a northeastern US city. The interviewers' scoring of fall risk categories was similar. There was good internal consistency (Cronbach's α = 0.834-0.889) and inter-rater reliability [intra-class correlation coefficients (ICC) = 0.824-0.881] for total, Risk Factor and Client's Health Status subscales. The Physical Environment scores indicated acceptable internal consistency (Cronbach's α = 0.742) and adequate reliability (ICC = 0.688). Two Physical Environment items (furniture and medical equipment condition) had low reliabilities [Kappa (K) = 0.323, P = 0.08; K = -0.078, P = 0.648], respectively. The scripted Fall Risk Screening Tool demonstrated good reliability in this sample. Rewording two Physical Environment items will be considered. A reliable instrument such as the scripted Fall Risk Screening Tool provides a standardised assessment for identifying high fall risk patients. This tool is especially useful because it assesses personal, behavioural and environmental factors specific to community-dwelling patients; the interview format also facilitates patient-provider interaction. © 2013 John Wiley & Sons Ltd.
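For item-level agreement between two interviewers, as in the kappa values quoted above, a minimal sketch of unweighted Cohen's kappa may help. It assumes categorical ratings from exactly two raters; the function name `cohen_kappa` and the example ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters over the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set")
    n = len(rater_a)

    # Proportion of subjects on which the raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical item ratings from two interviewers for ten patients.
interviewer_1 = ["low", "low", "high", "high", "low",
                 "high", "low", "low", "high", "low"]
interviewer_2 = ["low", "low", "high", "low", "low",
                 "high", "low", "high", "high", "low"]
kappa = cohen_kappa(interviewer_1, interviewer_2)
```

Kappa becomes negative when observed agreement falls below chance agreement, which is how a value such as K = -0.078 can arise for a poorly worded item.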
Validation of the Quality-of-Life Questionnaire of the European Foundation for Osteoporosis (QUALEFFO-26) in Korean population.

PubMed

Lee, Jung Sub; Shin, Jong Ki; Son, Seung Min; An, Sung Jin; Kang, Sung Shik

2014-07-01

We aimed to evaluate the reliability and validity of the adapted Korean version of the Quality-of-Life Questionnaire of the European Foundation for Osteoporosis (QUALEFFO-26). Translation/retranslation of the English version of QUALEFFO was conducted, and all steps of the cross-cultural adaptation process were performed. The Korean version of the visual analog scale measure of pain, QUALEFFO-26 and the previously validated Short Form-36 (SF-36) were mailed to 162 consecutive patients with osteoporosis. Factor analysis and reliability assessment by kappa statistics of agreement for each item, the intraclass correlation coefficient and Cronbach's α were conducted. Construct validity was also evaluated by comparing the responses of QUALEFFO-26 with the responses of SF-36 using Pearson's correlation coefficient. Factor analysis extracted 3 factors. All items had a kappa statistic of agreement greater than 0.6. The QUALEFFO-26 showed good test/retest reliability (QUALEFFO-26: 0.8271). Internal consistency, as measured by Cronbach's α, was found to be very good (QUALEFFO-26: 0.873). The Korean version of QUALEFFO-26 showed good, significant correlation with the SF-36 total score and with single SF-36 domain scores. The adapted Korean version of the QUALEFFO-26 was successfully translated and showed acceptable measurement properties and, as such, is considered suitable for outcome assessments in Korean-speaking patients with osteoporosis.
Assessment of tinnitus-related impairments and disabilities using the German THI-12: sensitivity and stability of the scale over time.

PubMed

Görtelmeyer, Roman; Schmidt, Jürgen; Suckfüll, Markus; Jastreboff, Pawel; Gebauer, Alexander; Krüger, Hagen; Wittmann, Werner

2011-08-01

To evaluate the reliability, dimensionality, predictive validity, construct validity, and sensitivity to change of the THI-12 total and sub-scales as diagnostic aids to describe and quantify tinnitus-evoked reactions and evaluate treatment efficacy. Explorative analysis of the German tinnitus handicap inventory (THI-12) to assess potential sensitivity to tinnitus therapy in placebo-controlled randomized studies. Correlation analysis, including Cronbach's coefficient α and explorative common factor analysis (EFA), was conducted within and between assessments to demonstrate the construct validity, dimensionality, and factorial structure of the THI-12. N = 618 patients suffering from subjective tinnitus who were to be screened to participate in a randomized, placebo-controlled, 16-week, longitudinal study. The THI-12 can reliably diagnose tinnitus-related impairments and disabilities and assess changes over time. The test-retest coefficient for adjacent visits was r > 0.69; the internal consistency of the THI-12 total score was α ≤ 0.79 and α ≤ 0.89 at subsequent visits. Predictability of THI-12 total score and overall variance increased with successive measurements. The three-factorial structure allowed for evaluation of factors that affect aspects of patients' health-related quality of life. The THI-12, with its three-factorial structure, is a simple, reliable, and valid instrument for the diagnosis and assessment of tinnitus and associated impairment over time.
Ecologically relevant outcome measure for post-inpatient rehabilitation.

PubMed

Marquez de la Plata, Carlos; Qualls, Devin; Plenger, Patrick; Malec, James F; Hayden, Mary Ellen

2017-01-01

Transfer of skills learned within the clinic environment to patients' home or community is important in post-inpatient brain injury rehabilitation (PBIR). Outcome measures used in PBIR assess level of independence during functional tasks; however, available functional instruments do not quantitate the environment in which the behaviors occur. To examine the reliability and validity of an instrument used to assess patients' functional abilities while quantifying the amount of structure and distractions in the environment. 2501 patients who sustained a traumatic brain injury (TBI) or cerebrovascular accident (CVA) and participated in a multidisciplinary PBIR program between 2006 and 2014 were identified retrospectively for this study. The PERPOS and MPAI-4 were used to assess functional abilities at admission and at discharge. Construct validity was assessed using a bivariate Spearman rho analysis. A subsample of 56 consecutive admissions during 2014 was examined to determine inter-rater reliability. Intra-class correlation coefficient (ICC) and Kappa coefficients assessed inter-rater agreement of the total PERPOS and PERPOS subscales, respectively. The PERPOS and MPAI-4 demonstrated a strong negative association among both TBI and CVA patients. Kappa scores for the three PERPOS scales each demonstrated good to excellent inter-rater agreement. The ICC for overall PERPOS scores fell in the good agreement range. The PERPOS can be used reliably in PBIR to quantify patients' functional abilities within the context of environmental demands.
Internal and temporal reliability estimates for informant ratings of personality using the NEO PI-R and IAS. NEO Personality Inventory. Interpersonal Adjective Scales.

PubMed

Kurtz, J E; Lee, P A; Sherker, J L

1999-06-01

This study examines the internal consistency and temporal stability of informant ratings from two widely used instruments for normal personality assessment, the revised NEO Personality Inventory (NEO PI-R) and the Interpersonal Adjective Scales (IAS). Well-known adult targets were selected by 109 undergraduate students and rated on two occasions separated by a 6-month interval. With few exceptions, estimates of internal consistency are adequate to good for both instruments. NEO PI-R domain scores yield coefficient alphas ranging from .89 to .96, with a median of .80 for the 30 facet scales. IAS octant scales show coefficient alphas ranging from .83 to .92. Retest Pearson correlations are above .70 for each of the NEO PI-R domain scores and both IAS axis coordinates, and intraclass correlations are above .60 for all scales from both instruments. Score changes were small but statistically significant for three of the five NEO PI-R domains at retest. The retest stability of IAS type classifications varies as a function of the extremity of the associated octant scores.
THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS).

PubMed

McCunn, Robert; Aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

2017-02-01

The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the 'pure' intra-rater (intra-occasion) reliability for those movements. Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement.
Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate the relationship between test score and subsequent injury. The present results indicate acceptable reliability for this purpose; however, room for further development of the intra-rater reliability exists for some of the individual sub-tests. Level of evidence: 2b.

THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS)

PubMed Central

aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

2017-01-01

Background/purpose: The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. Methods: The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the ‘pure’ intra-rater (intra-occasion) reliability for those movements. Results: Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement.
Conclusions: Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate the relationship between test score and subsequent injury. The present results indicate acceptable reliability for this purpose; however, room for further development of the intra-rater reliability exists for some of the individual sub-tests. Level of evidence: 2b. PMID:28217416
A comparison of Google Glass and traditional video vantage points for bedside procedural skill assessment.

PubMed

Evans, Heather L; O'Shea, Dylan J; Morris, Amy E; Keys, Kari A; Wright, Andrew S; Schaad, Douglas C; Ilgen, Jonathan S

2016-02-01

This pilot study assessed the feasibility of using first person (1P) video recording with Google Glass (GG) to assess procedural skills, as compared with traditional third person (3P) video. We hypothesized that raters reviewing 1P videos would visualize more procedural steps with greater inter-rater reliability than 3P rating vantages. Seven subjects performed simulated internal jugular catheter insertions. Procedures were recorded by both Google Glass and an observer's head-mounted camera. Videos were assessed by 3 expert raters using a task-specific checklist (CL) and both an additive- and summative-global rating scale (GRS). Mean scores were compared by t-tests. Inter-rater reliabilities were calculated using intraclass correlation coefficients. The 1P vantage was associated with a significantly higher mean CL score than the 3P vantage (7.9 vs 6.9, P = .02). Mean GRS scores were not significantly different. Mean inter-rater reliabilities for the CL, additive-GRS, and summative-GRS were similar between vantages. 1P vantage recordings may improve visualization of tasks for behaviorally anchored instruments (eg, CLs), while maintaining similar global ratings and inter-rater reliability when compared with conventional 3P vantage recordings. Copyright © 2016 Elsevier Inc. All rights reserved.
Creation of the TXP parenting questionnaire and study of its psychometric properties.

PubMed

Benito, Ana; Calvo, Gema; Real-López, Matías; Gallego, María José; Francés, Sonia; Turbi, Ángel; Haro, Gonzalo

2018-01-15

Parenting is linked to conduct disorders (CD) and substance related disorders (SRD) in adolescents, but with differences according to cultural context. A questionnaire with two versions (parenting questionnaire TXP-A for adolescents and TXP-C for primary caregivers) was designed using the Delphi method to evaluate parenting practices related to CD and SRD in a Spanish population. It was validated in a community sample of 631 adolescents aged between 14 and 16 and their caregivers. Results suggest a 29-item TXP-A questionnaire with bifactorial structure: affection-communication and control-structure, with high internal (Cronbach’s alpha=0.89) and test-retest (intraclass correlation coefficient=0.94) reliabilities. Both factors are related to SRD (r=0.273, p<0.001) and with most of the psychopathological dimensions studied. The total score and affection-communication are related to dissocial disorder (t=3.259, p=0.001) and its severity (r=-0.119; p=0.003). Inter-observer reliability between adolescents and caregivers is low, in part because the 16-item TXP-C has a different bifactorial structure: affection-communication and prosocial values. TXP-C’s internal (Cronbach’s alpha=0.87) and test-retest (intraclass correlation coefficient=0.94) reliabilities are high. The total score and affection-communication were related to dissocial disorder (t=2.586; p=0.010) but TXP-C did not discriminate according to SRD. In conclusion, the TXP-A questionnaire for adolescents seems to be a reliable, valid and unbiased instrument that evaluates the perception of parenting practices, relating higher affection-communication and control-structure to less psychopathology and alcohol and drug use. TXP-C also seems to be reliable and unbiased, but shows less evidence of validity regarding substance use and psychopathology.
Determining the Appropriateness of the "What If" Situations Test (WIST) with Turkish Pre-Schoolers.

PubMed

Citak Tunc, Gulseren; Gorak, Gulay; Ozyazicioglu, Nurcan; Ak, Bedriye; Isil, Ozlem; Vural, Pinar

2018-04-01

Measurement instruments are needed to assess child sexual abuse prevention programs. The purpose of the study was to determine the reliability and validity of the WIST (What If Situations Test) for Turkish culture. Participants were children of the 3-6 age group attending pre-school education institutions and the sample size was identified by means of a power analysis. Seventy children were identified as the sample with 0.85 power and 0.05 type I error according to the power analysis. Language validity, content validity, internal validity coefficient (Cronbach alpha coefficient), and test-retest analyses were conducted in terms of validity and reliability in the scope of efforts for adaptation to Turkish culture. Firstly, Kendall W = 0.83 was the score for the expert opinions concerning the content validity of the language validity scale. It was found that the Cronbach alpha coefficients were between 0.68 and 0.90 for the scale sub-dimensions of appropriate and inappropriate recognition, saying, doing, telling, and reporting. The test-retest reliability of the scale was found to be r = 0.89 and the test-retest reliabilities for the sub-dimensions (appropriate recognition, inappropriate recognition, say skills, do skills, tell skills, and reporting skills) were between r = 0.48 and r = 0.92. The test-retest reliability for the Personal Safety Questionnaire (PSQ), which has items complementary to the WIST, was found to be r = 0.82. The reliability and validity analysis of the 'What If' Situations Test (WIST), used to evaluate pre-schoolers' skills regarding self-protection against sexual abuse, showed that the Test's adaptation to Turkish culture was reliable and valid.
The analysis of reliability and validity of the IT-MAIS, MAIS and MUSS.

PubMed

Zhong, Yan; Xu, Tianqiu; Dong, Ruijuan; Lyu, Jing; Liu, Bo; Chen, Xueqing

2017-05-01

The aim of this study was to investigate the reliability and validity of the Infant-toddler Meaningful Auditory Integration Scale (IT-MAIS), Meaningful Auditory Integration Scale (MAIS), and Meaningful Use of Speech Scale (MUSS). IT-MAIS, MAIS and MUSS were divided into 3 sub-dimensions. 300 children with cochlear implants (CI) were included in the investigation. To assess test-retest reliability of these questionnaires, 30 children were selected randomly to be evaluated at a two-week interval; the results indicated that there were no significant changes between test and retest. Furthermore, a random test analysis by different evaluators was also administered to 30 users. Reliability test: Test-retest reliability of the three scales proved to be satisfactory. All domains had correlation coefficients that exceeded 0.750 (P < 0.01). The Cronbach's α values of the three scales and their three domains were greater than 0.700. Reliability between evaluators of the three scales was considered to be satisfactory. All domains had correlation coefficients that exceeded 0.750 (P < 0.01). Validity test: The evaluation of content validity by expert review showed the questionnaire had good content validity. The correlation coefficients between the overall scores of the three scales and their three domains were 0.699-0.978 (P < 0.01). There were correlations among the three sub-domains but the strength of the correlations was relatively low. There was a certain degree of construct validity. The IT-MAIS, MAIS and MUSS scales have good reliability and validity, and can be used to measure hearing and speech outcomes in children with cochlear implants. Copyright © 2017 Elsevier B.V. All rights reserved.
Validity and Reliability of the Turkish version of DSM-5 Social Anxiety Disorder Severity Scale- Child Form.

PubMed

Yalin Sapmaz, Şermin; Ergin, Dilek; Şen Celasin, Nesrin; Karaarslan, Duygu; Öztürk, Masum; Özek Erkuran, Handan; Köroğlu, Ertuğrul; Aydemir, Ömer

2017-12-01

This study aimed to assess the validity and reliability of the Turkish version of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.) (DSM-5) Social Anxiety Disorder Severity Scale - Child Form. The scale was prepared by carrying out the translation and back translation of the DSM-5 Social Anxiety Disorder Severity Scale - Child Form. The study group consisted of 31 patients that had been treated in a child psychiatry unit and diagnosed with social anxiety disorder and 99 healthy volunteers that were attending middle or high school during the study period. For the assessment, the Screen for Child Anxiety and Related Emotional Disorders (SCARED) was also used along with the DSM-5 Social Anxiety Disorder Severity Scale - Child Form. Regarding reliability analyses, Cronbach's alpha internal consistency coefficient was calculated as 0.941, while item-total score correlation coefficients were measured between 0.566 and 0.866. A test-retest correlation coefficient was calculated as r=0.711. As for construct validity, one factor that could explain 66.0% of the variance was obtained. As for concurrent validity, the scale showed a high correlation with the SCARED. It was concluded that the Turkish version of the DSM-5 Social Anxiety Disorder Severity Scale - Child Form could be utilized as a valid and reliable tool both in clinical practice and for research purposes.
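Cronbach's alpha, the internal consistency coefficient reported in this and many neighbouring records, follows the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The sketch below is illustrative only: the function name `cronbach_alpha` and the toy responses are hypothetical and do not reproduce any study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha from a respondents-by-items score table.

    responses: list of respondents, each a list of k item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")

    # Variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents answering three items.
toy_responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
    [3, 3, 4],
]
alpha = cronbach_alpha(toy_responses)  # roughly 0.944 for this toy table
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as internal consistency of the raw summed score rather than as evidence of a single underlying factor.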
Psychometric properties of the foot and ankle outcome score in a community-based study of adults with and without osteoarthritis.

PubMed

Golightly, Yvonne M; Devellis, Robert F; Nelson, Amanda E; Hannan, Marian T; Lohmander, L Stefan; Renner, Jordan B; Jordan, Joanne M

2014-03-01

Foot and ankle problems are common in adults, and large observational studies are needed to advance our understanding of the etiology and impact of these conditions. Valid and reliable measures of foot and ankle symptoms and physical function are necessary for this research. This study examined psychometric properties of the Foot and Ankle Outcome Score (FAOS) subscales (pain, other symptoms, activities of daily living [ADL], sport and recreational function [sport/recreation], and foot- and ankle-related quality of life [QOL]) in a large, community-based sample of African American and white men and women ages ≥50 years. Johnston County Osteoarthritis Project participants (n = 1,670) completed the 42-item FAOS (mean age 69 years, 68% women, 31% African American, mean body mass index [BMI] 31.5 kg/m²). Internal consistency, test-retest reliability, convergent validity, and structural validity of each subscale were examined for the sample and for subgroups according to race, sex, age, BMI, presence of knee or hip osteoarthritis, and presence of knee, hip, or low back symptoms. For the sample and each subgroup, Cronbach's alpha coefficients ranged from 0.95-0.97 (pain), 0.97-0.98 (ADL), 0.94-0.96 (sport/recreation), 0.89-0.92 (QOL), and 0.72-0.82 (symptoms). Correlation coefficients ranged from 0.24-0.52 for pain and symptoms subscales with foot and ankle symptoms and from 0.30-0.55 for ADL and sport/recreation subscales with the Western Ontario and McMaster Universities Osteoarthritis Index function subscale. Intraclass correlation coefficients for test-retest reliability ranged from 0.63-0.81. Items loaded on a single factor for each subscale except symptoms (2 factors). The FAOS exhibited sufficient reliability and validity in this large cohort study. Copyright © 2014 by the American College of Rheumatology.
Translation, Cross-Cultural Adaptation, and Validation of the Activity Rating Scale for Disorders of the Knee

PubMed Central

Flosadottir, Vala; Roos, Ewa M.; Ageberg, Eva

2017-01-01

Background: The Activity Rating Scale (ARS) for disorders of the knee evaluates the level of activity by the frequency of participation in 4 separate activities with high demands on knee function, with a score ranging from 0 (none) to 16 (pivoting activities 4 times/wk). Purpose: To translate and cross-culturally adapt the ARS into Swedish and to assess measurement properties of the Swedish version of the ARS. Study Design: Cohort study (diagnosis); Level of evidence, 2. Methods: The COSMIN guidelines were followed. Participants (N = 100 [55 women]; mean age, 27 years) who were undergoing rehabilitation for a knee injury completed the ARS twice for test-retest reliability. The Knee injury and Osteoarthritis Outcome Score (KOOS), Tegner Activity Scale (TAS), and modernized Saltin-Grimby Physical Activity Level Scale (SGPALS) were administered at baseline to validate the ARS. Construct validity and responsiveness of the ARS were evaluated by testing predefined hypotheses regarding correlations between the ARS, KOOS, TAS, and SGPALS. The Cronbach alpha, intraclass correlation coefficients, absolute reliability, standard error of measurement, smallest detectable change, and Spearman rank-order correlation coefficients were calculated. Results: The ARS showed good internal consistency (α ≈ 0.96), good test-retest reliability (intraclass correlation coefficient >0.9), and no systematic bias between measurements. The standard error of measurement was less than 2 points, and the smallest detectable change was less than 1 point at the group level and less than 5 points at the individual level. More than 75% of the hypotheses were confirmed, indicating good construct validity and good responsiveness of the ARS. Conclusion: The Swedish version of the ARS is valid, reliable, and responsive for evaluating the level of activity based on the frequency of participation in high-demand knee sports activities in young adults with a knee injury. PMID:28979920
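The standard error of measurement (SEM) and smallest detectable change (SDC) reported above are commonly derived from the test-retest ICC. The sketch below uses widely cited conventions (SEM = SD·√(1 - ICC), SDC for an individual = 1.96·√2·SEM, SDC for a group = individual SDC/√n). Whether the study used exactly these formulas is an assumption, and the input values are hypothetical, not the study's data.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the score standard deviation and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def sdc_individual(sem, z=1.96):
    """Smallest detectable change for one person (95% confidence by default)."""
    return z * math.sqrt(2.0) * sem


def sdc_group(sdc_ind, n):
    """Smallest detectable change for the mean of a group of n people."""
    return sdc_ind / math.sqrt(n)


# Hypothetical inputs: SD of 4.0 scale points, ICC of 0.91, 100 participants.
sem_value = standard_error_of_measurement(sd=4.0, icc=0.91)
sdc_ind_value = sdc_individual(sem_value)
sdc_group_value = sdc_group(sdc_ind_value, n=100)
```

Dividing by √n is what makes the group-level threshold so much smaller than the individual-level one, matching the pattern in the abstract where change is detectable at a lower threshold for groups than for single patients.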
Evaluating trauma team performance in a Level I trauma center: Validation of the trauma team communication assessment (TTCA-24).

PubMed

DeMoor, Stephanie; Abdel-Rehim, Shady; Olmsted, Richard; Myers, John G; Parker-Raley, Jessica

2017-07-01

Nontechnical skills (NTS), such as team communication, are well-recognized determinants of trauma team performance and good patient care. Measuring these competencies during trauma resuscitations is essential, yet few valid and reliable tools are available. We aimed to demonstrate that the Trauma Team Communication Assessment (TTCA-24) is a valid and reliable instrument that measures communication effectiveness during activations. Two tools with adequate psychometric strength (Trauma Nontechnical Skills Scale [T-NOTECHS], Team Emergency Assessment Measure [TEAM]) were identified during a systematic review of medical literature and compared with TTCA-24. Three coders used each tool to evaluate 35 stable and 35 unstable patient activations (defined according to Advanced Trauma Life Support criteria). Interrater reliability was calculated between coders using the intraclass correlation coefficient. Spearman rank correlation coefficient was used to establish concurrent validity between TTCA-24 and the other two validated tools. Coders achieved an intraclass correlation coefficient of 0.87 for stable patient activations and 0.78 for unstable activations, both rated excellent according to interrater agreement guidelines. The median score for each assessment showed good team communication for all 70 videos (TEAM, 39.8 of 54; T-NOTECHS, 17.4 of 25; and TTCA-24, 87.4 of 96). A significant correlation between TTCA-24 and T-NOTECHS was revealed (p = 0.029), but no significant correlation between TTCA-24 and TEAM (p = 0.77). Team communication was rated slightly better across all assessments for stable versus unstable patient activations, but the difference was not statistically significant. TTCA-24 correlated with T-NOTECHS, an instrument measuring nontechnical skills for trauma teams, but not TEAM, a tool that assesses communication in generic emergency settings.
TTCA-24 is a reliable and valid assessment that can be a useful adjunct when evaluating interpersonal and team communication during trauma activations. Level of evidence: Diagnostic tests or criteria, level II.
Italian Validation of Homophobia Scale (HS).

PubMed

Ciocca, Giacomo; Capuano, Nicolina; Tuziak, Bogdan; Mollaioli, Daniele; Limoncin, Erika; Valsecchi, Diana; Carosa, Eleonora; Gravina, Giovanni L; Gianfrilli, Daniele; Lenzi, Andrea; Jannini, Emmanuele A

2015-09-01

The Homophobia Scale (HS) is a valid tool to assess homophobia. This test is self-reporting, composed of 25 items, which assesses a total score and three factors linked to homophobia: behavior/negative affect, affect/behavioral aggression, and negative cognition. The aim of this study was to validate the HS in the Italian context. An Italian translation of the HS was carried out by two bilingual people, after which a native English speaker translated the test back into English. A psychologist and sexologist checked the translated items from a clinical point of view. We recruited 100 subjects aged 18-65 for the Italian validation of the HS. The Pearson coefficient and Cronbach's α coefficient were computed to assess test-retest reliability and internal consistency. A sociodemographic questionnaire covering age, geographic distribution, partnership status, education, religious orientation, and sex orientation was administered together with the translated version of the HS. The analysis of the internal consistency showed an overall Cronbach's α coefficient of 0.92. In the four domains, the Cronbach's α coefficient was 0.90 in behavior/negative affect, 0.94 in affect/behavioral aggression, and 0.92 in negative cognition, whereas in the total score it was 0.86. The test-retest reliability showed the following results: the HS total score was r = 0.93 (P < 0.0001), behavior/negative affect was r = 0.79 (P < 0.0001), affect/behavioral aggression was r = 0.81 (P < 0.0001), and negative cognition was r = 0.75 (P < 0.0001). The Italian validation of the HS showed this self-report test to have good psychometric properties. This study offers a new tool to assess homophobia. In this regard, the HS can be introduced into clinical practice and into programs for the prevention of homophobic behavior.
Italian Validation of Homophobia Scale (HS)

PubMed Central

Ciocca, Giacomo; Capuano, Nicolina; Tuziak, Bogdan; Mollaioli, Daniele; Limoncin, Erika; Valsecchi, Diana; Carosa, Eleonora; Gravina, Giovanni L; Gianfrilli, Daniele; Lenzi, Andrea; Jannini, Emmanuele A

2015-01-01

Introduction The Homophobia Scale (HS) is a valid tool to assess homophobia. This self-report test comprises 25 items and yields a total score and three factors linked to homophobia: behavior/negative affect, affect/behavioral aggression, and negative cognition. Aim The aim of this study was to validate the HS in the Italian context. Methods An Italian translation of the HS was carried out by two bilingual people, after which a native English speaker translated the test back into English. A psychologist and sexologist checked the translated items from a clinical point of view. We recruited 100 subjects aged 18–65 for the Italian validation of the HS. The Pearson coefficient and Cronbach's α coefficient were used to assess test–retest reliability and internal consistency, respectively. Main Outcome Measures A sociodemographic questionnaire covering age, geographic distribution, partnership status, education, religious orientation, and sexual orientation was administered together with the translated version of the HS. Results The analysis of internal consistency showed an overall Cronbach's α coefficient of 0.92. Cronbach's α was 0.90 for behavior/negative affect, 0.94 for affect/behavioral aggression, and 0.92 for negative cognition, whereas for the total score it was 0.86. The test–retest reliability showed the following results: the HS total score was r = 0.93 (P < 0.0001), behavior/negative affect was r = 0.79 (P < 0.0001), affect/behavioral aggression was r = 0.81 (P < 0.0001), and negative cognition was r = 0.75 (P < 0.0001). Conclusions The Italian validation of the HS showed that this self-report test has good psychometric properties. This study offers a new tool to assess homophobia. In this regard, the HS can be introduced into clinical practice and into programs for the prevention of homophobic behavior. PMID:26468384
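Illustrative note: the internal-consistency figures above are Cronbach's α values. A minimal sketch of how α is commonly computed from an item-response matrix follows, using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total score). The function name `cronbach_alpha` and the small response matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the total (summed) score
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 4 items (NOT data from the study)
responses = [
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

For the hypothetical matrix the item variances sum to 7.7 and the total-score variance is 27.5, giving α = 0.96. Items that rise and fall together across respondents push α toward 1.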
Effect of rater training on reliability and accuracy of mini-CEX scores: a randomized, controlled trial.

PubMed

Cook, David A; Dupras, Denise M; Beckman, Thomas J; Thomas, Kris G; Pankratz, V Shane

2009-01-01

Mini-CEX scores assess resident competence. Rater training might improve mini-CEX score interrater reliability, but evidence is lacking. Evaluate a rater training workshop using interrater reliability and accuracy. Randomized trial (immediate versus delayed workshop) and single-group pre/post study (randomized groups combined). Academic medical center. Fifty-two internal medicine clinic preceptors (31 randomized and 21 additional workshop attendees). The workshop included rater error training, performance dimension training, behavioral observation training, and frame of reference training using lecture, video, and facilitated discussion. Delayed group received no intervention until after posttest. Mini-CEX ratings at baseline (just before workshop for workshop group), and four weeks later using videotaped resident-patient encounters; mini-CEX ratings of live resident-patient encounters one year preceding and one year following the workshop; rater confidence using mini-CEX. Among 31 randomized participants, interrater reliabilities in the delayed group (baseline intraclass correlation coefficient [ICC] 0.43, follow-up 0.53) and workshop group (baseline 0.40, follow-up 0.43) were not significantly different (p = 0.19). Mean ratings were similar at baseline (delayed 4.9 [95% confidence interval 4.6-5.2], workshop 4.8 [4.5-5.1]) and follow-up (delayed 5.4 [5.0-5.7], workshop 5.3 [5.0-5.6]; p = 0.88 for interaction). For the entire cohort, rater confidence (1 = not confident, 6 = very confident) improved from mean (SD) 3.8 (1.4) to 4.4 (1.0), p = 0.018. Interrater reliability for ratings of live encounters (entire cohort) was higher after the workshop (ICC 0.34) than before (ICC 0.18) but the standard error of measurement was similar for both periods. Rater training did not improve interrater reliability or accuracy of mini-CEX scores. clinicaltrials.gov identifier NCT00667940
Implementing the undergraduate mini-CEX: a tailored approach at Southampton University.

PubMed

Hill, Faith; Kendall, Kathleen; Galbraith, Kevin; Crossley, Jim

2009-04-01

The mini-clinical evaluation exercise (mini-CEX) is widely used in the UK to assess clinical competence, but there is little evidence regarding its implementation in the undergraduate setting. This study aimed to estimate the validity and reliability of the undergraduate mini-CEX and discuss the challenges involved in its implementation. A total of 3499 mini-CEX forms were completed. Validity was assessed by estimating associations between mini-CEX score and a number of external variables, examining the internal structure of the instrument, checking competency domain response rates and profiles against expectations, and by qualitative evaluation of stakeholder interviews. Reliability was evaluated by overall reliability coefficient (R), estimation of the standard error of measurement (SEM), and from stakeholders' perceptions. Variance component analysis examined the contribution of relevant factors to students' scores. Validity was threatened by various confounding variables, including: examiner status; case complexity; attachment specialty; patient gender, and case focus. Factor analysis suggested that competency domains reflect a single latent variable. Maximum reliability can be achieved by aggregating scores over 15 encounters (R = 0.73; 95% confidence interval [CI] +/- 0.28 based on a 6-point assessment scale). Examiner stringency contributed 29% of score variation and student attachment aptitude 13%. Stakeholder interviews revealed staff development needs but the majority perceived the mini-CEX as more reliable and valid than the previous long case. The mini-CEX has good overall utility for assessing aspects of the clinical encounter in an undergraduate setting. Strengths include fidelity, wide sampling, perceived validity, and formative observation and feedback. Reliability is limited by variable examiner stringency, and validity by confounding variables, but these should be viewed within the context of overall assessment strategies.
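Illustrative note: the abstract reports R = 0.73 when scores are aggregated over 15 encounters, obtained from a variance component analysis. As a rough sketch only, the Spearman-Brown prophecy formula (a simpler classical-test-theory stand-in, not the method the authors used, and one that assumes encounters behave as parallel measurements) shows how reliability grows with the number of encounters. The function names are hypothetical.

```python
def spearman_brown(r_single, n):
    """Reliability of the mean of n parallel measurements, given single-measurement reliability."""
    return n * r_single / (1 + (n - 1) * r_single)


def implied_single_reliability(r_n, n):
    """Invert Spearman-Brown: single-measurement reliability implied by r_n over n measurements."""
    return r_n / (n - (n - 1) * r_n)


# Reported in the abstract: R = 0.73 over 15 encounters
r1 = implied_single_reliability(0.73, 15)
for n in (1, 5, 10, 15, 30):
    print(n, round(spearman_brown(r1, n), 2))
```

Under these assumptions a single encounter would have a reliability of roughly 0.15, which is why many encounters must be aggregated before the composite score becomes dependable.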
Cross-cultural validity of a dietary questionnaire for studies of dental caries risk in Japanese

PubMed Central

2014-01-01

Background Diet is a major modifiable contributing factor in the etiology of dental caries. The purpose of this paper is to examine the reliability and cross-cultural validity of the Japanese version of the Food Frequency Questionnaire to assess dietary intake in relation to dental caries risk in Japanese. Methods The 38-item Food Frequency Questionnaire, in which Japanese food items were added to increase content validity, was translated into Japanese, and administered to two samples. The first sample comprised 355 pregnant women with mean age of 29.2 ± 4.2 years for the internal consistency and criterion validity analyses. Factor analysis (principal components with Varimax rotation) was used to determine dimensionality. The dietary cariogenicity score was calculated from the Food Frequency Questionnaire and used for the analyses. Salivary mutans streptococci level was used as a semi-quantitative assessment of dental caries risk and measured by Dentocult SM. Dentocult SM scores were compared with the dietary cariogenicity score computed from the Food Frequency Questionnaire to examine criterion validity, and assessed by Spearman’s correlation coefficient (rs) and Kruskal-Wallis test. Test-retest reliability of the Food Frequency Questionnaire was assessed with a second sample of 25 adults with mean age of 34.0 ± 3.0 years by using the intraclass correlation coefficient analysis. Results The Japanese language version of the Food Frequency Questionnaire showed high test-retest reliability (ICC = 0.70) and good criterion validity assessed by relationship with salivary mutans streptococci levels (rs = 0.22; p < 0.001). Factor analysis revealed four subscales that make up the questionnaire (solid sugars, solid and starchy sugars, liquid and semisolid sugars, sticky and slowly dissolving sugars). Internal consistency was low to acceptable (Cronbach’s alpha = 0.67 for the total scale, 0.46-0.61 for each subscale). Mean dietary cariogenicity scores were 50.8 ± 19.5 in the first sample, and 47.4 ± 14.1 and 40.6 ± 11.3 for the first and second administrations in the second sample. The distribution of Dentocult SM score was 6.8% (score = 0), 34.4% (score = 1), 39.4% (score = 2), and 19.4% (score = 3). Participants with higher scores were more likely to have higher dietary cariogenicity scores (p < 0.001; Kruskal-Wallis test). Conclusions These results provide the preliminary evidence for the reliability and validity of the Japanese language Food Frequency Questionnaire. PMID:24383547
Validity of faculty and resident global assessment of medical students' clinical knowledge during their pediatrics clerkship.

PubMed

Dudas, Robert A; Colbert, Jorie M; Goldstein, Seth; Barone, Michael A

2012-01-01

Medical knowledge is one of six core competencies in medicine. Medical student assessments should be valid and reliable. We assessed the relationship between faculty and resident global assessment of pediatric medical student knowledge and performance on a standardized test in medical knowledge. Retrospective cross-sectional study of medical students on a pediatric clerkship in academic year 2008-2009 at one academic health center. Faculty and residents rated students' clinical knowledge on a 5-point Likert scale. The inter-rater reliability of clinical knowledge ratings was assessed by calculating the intra-class correlation coefficient (ICC) for residents' ratings, faculty ratings, and both rating types combined. Convergent validity between clinical knowledge ratings and scores on the National Board of Medical Examiners (NBME) clinical subject examination in pediatrics was assessed with Pearson product moment correlation correction and the coefficient of the determination. There was moderate agreement for global clinical knowledge ratings by faculty and moderate agreement for ratings by residents. The agreement was also moderate when faculty and resident ratings were combined. Global ratings of clinical knowledge had high convergent validity with pediatric examination scores when students were rated by both residents and faculty. Our findings provide evidence for convergent validity of global assessment of medical students' clinical knowledge with NBME subject examination scores in pediatrics. Copyright © 2012 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
The Jebsen Taylor Test of Hand Function: A Pilot Test-Retest Reliability Study in Typically Developing Children.

PubMed

Reedman, Sarah Elizabeth; Beagley, Simon; Sakzewski, Leanne; Boyd, Roslyn N

2016-08-01

The aim of this pilot study was to evaluate reproducibility of the Jebsen Taylor Test of Hand Function (JTTHF) in children. Eighty-seven typically developing children 5 to 10 years old were included from five Outside School Hours Care centers in the Greater Brisbane Region, Australia. Hand function was assessed on two occasions with a modified JTTHF, then reproducibility was assessed using Intraclass Correlation Coefficient (ICC [3,1]) and the Standard Error of Measurement (SEM). Total scores for male and female children were not significantly different. Five-year-old children were significantly different to all other age groups and were excluded from further analysis. Results for 71 children, 6 to 10 years old were analyzed (mean age 8.31 years (SD 1.32); 33 males). Test-retest reliability for total scores on the dominant and nondominant hands were ICC 0.74 (95% CI 0.61, 0.83) and ICC 0.72 (95% CI 0.59, 0.82), respectively. 'Writing' and 'Simulated Feeding' subtests demonstrated poor reproducibility. The Smallest Real Difference was 5.09 seconds for total score on the dominant hand. Findings indicate good test-retest reliability for the JTTHF total score to measure hand function in typically developing children aged 6 to 10 years.
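Illustrative note: the abstract reports ICC(3,1), the Standard Error of Measurement and the Smallest Real Difference. The sketch below shows one common way these three quantities are derived from repeated measurements: ICC(3,1) from two-way ANOVA mean squares, SEM = SD·√(1 − ICC), and SRD = 1.96·√2·SEM. The choice of the pooled SD of all scores, the function names and the timing data are assumptions for illustration; the study's exact computation is not given in the abstract.

```python
from math import sqrt
from statistics import mean, stdev


def icc_3_1(scores):
    """ICC(3,1): two-way mixed, consistency, single measurement.

    scores: list of subjects, each a list of k repeated measurements.
    """
    n, k = len(scores), len(scores[0])
    flat = [x for row in scores for x in row]
    grand = mean(flat)
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for x in flat)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def sem_and_srd(scores, icc):
    """SEM from the pooled SD (an assumption) and the 95% Smallest Real Difference."""
    sd = stdev([x for row in scores for x in row])
    sem = sd * sqrt(1 - icc)
    srd = 1.96 * sqrt(2) * sem
    return sem, srd


# Hypothetical completion times in seconds, two sessions per child (NOT study data)
trials = [
    [30.1, 29.5], [35.4, 36.0], [28.7, 30.2],
    [40.3, 38.9], [33.0, 33.8], [37.6, 36.1],
]
icc = icc_3_1(trials)
sem, srd = sem_and_srd(trials, icc)
print(f"ICC(3,1) = {icc:.2f}, SEM = {sem:.2f} s, SRD = {srd:.2f} s")
```

A change between two sessions smaller than the SRD cannot be distinguished from measurement noise at the 95% level, which is how a figure such as the 5.09 seconds above is typically read.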
Implicit Review Instrument to Evaluate Quality of Care Delivered by Physicians to Children in Emergency Departments.

PubMed

Marcin, James P; Romano, Patrick S; Dharmar, Madan; Chamberlain, James M; Dudley, Nanette; Macias, Charles G; Nigrovic, Lise E; Powell, Elizabeth C; Rogers, Alexander J; Sonnett, Meridith; Tzimenatos, Leah; Alpern, Elizabeth R; Andrews-Dickert, Rebecca; Borgialli, Dominic A; Sidney, Erika; Casper, Charlie; Dean, Jonathan Michael; Kuppermann, Nathan

2018-06-01

To evaluate the consistency, reliability, and validity of an implicit review instrument that measures the quality of care provided to children in the emergency department (ED). Medical records of randomly selected children from 12 EDs in the Pediatric Emergency Care Applied Research Network (PECARN). Eight pediatric emergency medicine physicians applied the instrument to 620 medical records. We determined internal consistency using Cronbach's alpha and inter-rater reliability using the intraclass correlation coefficient (ICC). We evaluated the validity of the instrument by correlating scores with four condition-specific explicit review instruments. Individual reviewers' Cronbach's alpha had a mean of 0.85 with a range of 0.76-0.97; overall Cronbach's alpha was 0.90. The ICC was 0.49 for the summary score with a range from 0.40 to 0.46. Correlations between the quality of care score and the four condition-specific explicit review scores ranged from 0.24 to 0.38. The quality of care instrument demonstrated good internal consistency, moderate inter-rater reliability, high inter-rater agreement, and evidence supporting validity. The instrument could be useful for systems' assessment and research in evaluating the care delivered to children in the ED. © Health Research and Educational Trust.
4H Leukodystrophy: A Brain Magnetic Resonance Imaging Scoring System.

PubMed

Vrij-van den Bos, Suzanne; Hol, Janna A; La Piana, Roberta; Harting, Inga; Vanderver, Adeline; Barkhof, Frederik; Cayami, Ferdy; van Wieringen, Wessel N; Pouwels, Petra J W; van der Knaap, Marjo S; Bernard, Geneviève; Wolf, Nicole I

2017-06-01

4H (hypomyelination, hypodontia and hypogonadotropic hypogonadism) leukodystrophy (4H) is an autosomal recessive hypomyelinating white matter (WM) disorder with neurologic, dental, and endocrine abnormalities. The aim of this study was to develop and validate a magnetic resonance imaging (MRI) scoring system for 4H. A scoring system (0-54) was developed to quantify hypomyelination and atrophy of different brain regions. Pons diameter and bicaudate ratio were included as measures of cerebral and brainstem atrophy, and reference values were determined using controls. Five independent raters completed the scoring system in 40 brain MRI scans collected from 36 patients with genetically proven 4H. Interrater reliability (IRR) and correlations between MRI scores, age, gross motor function, gender, and mutated gene were assessed. IRR for total MRI severity was found to be excellent (intraclass correlation coefficient: 0.87; 95% confidence interval: 0.80-0.92) but varied between different items with some (e.g., myelination of the cerebellar WM) showing poor IRR. Atrophy increased with age in contrast to hypomyelination scores. MRI scores (global, hypomyelination, and atrophy scores) significantly correlated with clinical handicap (p < 0.01 for all three items) and differed between the different genotypes. Our 4H MRI scoring system reliably quantifies hypomyelination and atrophy in patients with 4H, and MRI scores reflect clinical disease severity. Georg Thieme Verlag KG Stuttgart · New York.
Stability of an ERP-based measure of brain network activation (BNA) in athletes: A new electrophysiological assessment tool for concussion.

PubMed

Eckner, James T; Rettmann, Ashley; Narisetty, Naveen; Greer, Jacob; Moore, Brandon; Brimacombe, Susan; He, Xuming; Broglio, Steven P

2016-01-01

To determine test-re-test reliabilities of novel Evoked Response Potential (ERP)-based Brain Network Activation (BNA) scores in healthy athletes. Observational, repeated-measures study. Forty-two healthy male and female high school and collegiate athletes completed auditory oddball and go/no-go ERP assessments at baseline, 1 week, 6 weeks and 1 year. The BNA algorithm was applied to the ERP data, considering electrode location, frequency band, peak latency and normalized amplitude to generate seven unique BNA scores for each testing session. Mean BNA scores, intra-class correlation coefficient (ICC) values and reliable change (RC) values were calculated for each of the seven BNA networks. BNA scores ranged from 46.3 ± 34.9 to 69.9 ± 22.8, ICC values ranged from 0.46-0.65 and 95% RC values ranged from 38.3-68.1 across the seven networks. The wide range of BNA scores observed in this population of healthy athletes suggests that a single BNA score or set of BNA scores from a single after-injury test session may be difficult to interpret in isolation without knowledge of the athlete's own baseline BNA score(s) and/or the results of serial tests performed at additional time points. The stability of each BNA network should be considered when interpreting test-re-test BNA score changes.
Reliability of Computerized Neurocognitive Tests for Concussion Assessment: A Meta-Analysis.

PubMed

Farnsworth, James L; Dargo, Lucas; Ragan, Brian G; Kang, Minsoo

2017-09-01

Although widely used, computerized neurocognitive tests (CNTs) have been criticized because of low reliability and poor sensitivity. A systematic review was published summarizing the reliability of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) scores; however, this was limited to a single CNT. Expansion of the previous review to include additional CNTs and a meta-analysis is needed. Therefore, our purpose was to analyze reliability data for CNTs using meta-analysis and examine moderating factors that may influence reliability. A systematic literature search (key terms: reliability, computerized neurocognitive test, concussion) of electronic databases (MEDLINE, PubMed, Google Scholar, and SPORTDiscus) was conducted to identify relevant studies. Studies were included if they met all of the following criteria: used a test-retest design, involved at least 1 CNT, provided sufficient statistical data to allow for effect-size calculation, and were published in English. Two independent reviewers investigated each article to assess inclusion criteria. Eighteen studies involving 2674 participants were retained. Intraclass correlation coefficients were extracted to calculate effect sizes and determine overall reliability. The Fisher Z transformation adjusted for sampling error associated with averaging correlations. Moderator analyses were conducted to evaluate the effects of the length of the test-retest interval, intraclass correlation coefficient model selection, participant demographics, and study design on reliability. Heterogeneity was evaluated using the Cochran Q statistic. The proportion of acceptable outcomes was greatest for the Axon Sports CogState Test (75%) and lowest for the ImPACT (25%). Moderator analyses indicated that the type of intraclass correlation coefficient model used significantly influenced effect-size estimates, accounting for 17% of the variation in reliability. The Axon Sports CogState Test, which has a higher proportion of acceptable outcomes and shorter test duration relative to other CNTs, may be a reliable option; however, future studies are needed to compare the diagnostic accuracy of these instruments.
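Illustrative note: the abstract mentions the Fisher Z transformation for averaging correlations. A minimal sketch of that idea follows: each coefficient is transformed with z = atanh(r), the z values are averaged, and the mean is back-transformed with tanh. The n − 3 weighting is the usual inverse-variance weight for Pearson correlations and is an assumption here; the abstract does not state the weighting actually used, and applying it to ICCs is an approximation. The function name is hypothetical.

```python
from math import atanh, tanh


def pooled_correlation(rs, ns):
    """Pool correlation-type coefficients via the Fisher z transformation.

    rs: coefficients from individual studies; ns: their sample sizes.
    Weights of n - 3 are assumed (inverse variance of z for Pearson r).
    """
    weights = [n - 3 for n in ns]
    z_bar = sum(w * atanh(r) for r, w in zip(rs, weights)) / sum(weights)
    return tanh(z_bar)


# Two hypothetical studies of equal size reporting 0.6 and 0.8
pooled = pooled_correlation([0.6, 0.8], [50, 50])
print(round(pooled, 3))  # 0.714, slightly above the arithmetic mean of 0.7
```

Averaging on the z scale avoids the downward bias that arises when skewed sampling distributions of correlations are averaged directly.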

The inter-rater reliability test of the modified Morse Fall Scale among patients ≥ 55 years old in an acute care hospital in Singapore.

PubMed

Tang, Wing Sze; Chow, Yeow Leng; Koh, Serena Siew Lin

2014-02-01

A prospective, descriptive study was conducted in an acute care hospital in Singapore to determine the inter-rater reliability of the modified Morse Fall Scale by evaluating the degrees of agreement on the ratings of the individual items and overall score between the 'gold standard' assessor and the facility assessors. One hundred and forty-two subjects were recruited during the 1.5 month data collection period. The simple and weighted κ-values were all > 0.8 except for the item 'effects of medications' (κ and κw = 0.63), and the correlation coefficient was high (rs = 0.89) and significant (p < 0.001). The modified Morse Fall Scale was shown to be a reliable fall risk assessment tool with a relatively high level of inter-rater reliability for the overall score and individual items. This study provides evidence-based psychometric support for the clinical application of this tool. © 2013 Wiley Publishing Asia Pty Ltd.
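Illustrative note: the abstract reports simple and weighted κ values. The sketch below computes Cohen's κ for two raters and, optionally, a linearly weighted κ for ordered categories, both expressed as 1 − (observed disagreement / chance-expected disagreement). Linear weights are an assumption; the abstract does not say which weighting scheme was used. The function name, category labels and ratings are hypothetical.

```python
def cohen_kappa(rater1, rater2, categories, weighted=False):
    """Cohen's kappa for two raters; linear weights if weighted=True.

    categories must list at least two labels, in order when weighted.
    """
    n = len(rater1)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions of (rater1 category, rater2 category)
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        obs[idx[a]][idx[b]] += 1 / n
    p1 = [sum(obs[i]) for i in range(k)]                       # rater1 marginals
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]  # rater2 marginals

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)   # linear weights
        return 0.0 if i == j else 1.0     # simple kappa

    d_obs = sum(disagreement(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(disagreement(i, j) * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


# Hypothetical fall-risk ratings from two assessors (NOT study data)
levels = ["low", "medium", "high"]
gold = ["low", "low", "medium", "high", "high", "medium", "low", "high"]
ward = ["low", "medium", "medium", "high", "high", "medium", "low", "medium"]
print(cohen_kappa(gold, ward, levels), cohen_kappa(gold, ward, levels, weighted=True))
```

With ordered categories the weighted version penalises a low/high disagreement more than a low/medium one, which is why both values are often reported side by side.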
Reliability and Construct Validity of the NEI VFQ-25 in a Subset of Patients With Geographic Atrophy From the Phase 2 Mahalo Study.

PubMed

Sivaprasad, Sobha; Tschosik, Elizabeth; Kapre, Audrey; Varma, Rohit; Bressler, Neil M; Kimel, Miriam; Dolan, Chantal; Silverman, David

2018-06-01

Geographic atrophy (GA) is an advanced form of age-related macular degeneration characterized by progressive, irreversible visual function loss. This analysis evaluates the psychometric properties of the 25-Item National Eye Institute Visual Function Questionnaire (NEI VFQ-25) composite, near activity, and distance activity scores in patients with GA. Reliability and validity study. Reliability and validity were tested with NEI VFQ-25 data collected from 100 subjects with GA from United States' sites of the phase 2 Mahalo study of lampalizumab (ClinicalTrials.gov identifier: NCT01229215). Strong internal consistency and reproducibility were demonstrated for the NEI VFQ-25 composite (Cronbach's α, 0.95; intraclass correlation coefficient [ICC], 0.86), near activity (Cronbach's α, 0.84; ICC, 0.80), and distance activity (Cronbach's α, 0.84; ICC, 0.84) scores. Convergent validity with the binocular measures, Minnesota Low-Vision Reading Test (MNRead) reading speed and Functional Reading Independence (FRI) index score, was demonstrated for baseline NEI VFQ-25 composite (Pearson correlation [r] = 0.61 and 0.69, respectively), near activities (r = 0.69 and 0.73), and distance activities (r = 0.57 and 0.64) scores. Known-group validity testing for baseline mean NEI VFQ-25 scores (composite, near activities, and distance activities) showed differences between patients with mean maximum MNRead reading speed ≥ 80 vs < 80 words per minute, and between mean FRI index score ≥ 2.5 vs < 2.5 (all P < .0001). Psychometric evidence supports the NEI VFQ-25 as a reliable and valid cross-sectional measure of the impact of GA on patient visual function and vision-related quality of life. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Assessing quality of life in severe obesity: development and psychometric properties of the ORWELL-R.

PubMed

Camolas, José; Ferreira, André; Mannucci, Edoardo; Mascarenhas, Mário; Carvalho, Manuel; Moreira, Pedro; do Carmo, Isabel; Santos, Osvaldo

2016-06-01

Several health-related quality-of-life (HRQoL) dimensions are affected by obesity. Our goal was to characterize the psychometric properties of the ORWELL-R, a new obesity-related quality-of-life instrument for assessing the "individual experience of overweightness". This psychometric assessment included two different samples: one multicenter clinical sample, used for assessing internal consistency, construct validity and temporal reliability; and a community sample (collected through a cross-sectional mailing survey design), used for additional construct validity assessment and model fit confirmation. Overall, 946 persons participated (188 from the clinical sample; 758 from the community sample). An alpha coefficient of 0.925 (clinical sample) and 0.934 (community sample) was found. Three subscales were identified (53.2% of variance): Body environment experience (alpha = 0.875), Illness perception and distress (alpha = 0.864), Physical symptoms (alpha = 0.674). Adequate test-retest reliability was confirmed (ICC: 0.78 for the overall score). ORWELL-R scores were worse in the clinical sample. Worse HRQoL, as measured by higher ORWELL-R scores, was associated with BMI increases. ORWELL-R scores were associated with IWQOL-Lite and lower scores in happiness. ORWELL-R shows good internal consistency and adequate test-retest reliability. Good construct validity was also observed (for convergent and discriminant validity) and confirmed through confirmatory factor analysis (in both clinical and community samples). The data presented support ORWELL-R as a reliable and useful instrument to assess obesity-related QoL, in both research and clinical contexts.
Animal-Based Measures to Assess the Welfare of Extensively Managed Ewes

PubMed Central

Hemsworth, Paul; Doyle, Rebecca

2017-01-01

Simple Summary The aim of this study was to assess the reliability and practicality of 10 animal-based welfare measures for extensively managed ewes, which were derived from the scientific literature, previous welfare protocols and through consultation with veterinarians and animal welfare scientists. Measures were examined on 100 Merino ewes, which were individually identified and repeatedly examined at mid-pregnancy, mid-lactation and weaning. Body condition score, fleece condition, skin lesions, tail length, dag score and lameness are proposed for on-farm use in welfare assessments of extensive sheep production systems. These six welfare measures, which address the main welfare concerns for extensively managed ewes, can be reliably and feasibly measured in the field. Abstract The reliability and feasibility of 10 animal-based measures of ewe welfare were examined for use in extensive sheep production systems. Measures were: Body condition score (BCS), rumen fill, fleece cleanliness, fleece condition, skin lesions, tail length, dag score, foot-wall integrity, hoof overgrowth and lameness, and all were examined on 100 Merino ewes (aged 2–4 years) during mid-pregnancy, mid-lactation and weaning by a pool of nine trained observers. The measures of BCS, fleece condition, skin lesions, tail length, dag score and lameness were deemed to be reliable and feasible. All had good observer agreement, as determined by the percentage of agreement, Kendall’s coefficient of concordance (W) and Kappa (k) values. When combined, these nutritional and health measures provide a snapshot of the current welfare status of ewes, as well as evidencing previous or potential welfare issues. PMID:29295551
The Depression Anxiety Stress Scales-21 (DASS-21): further examination of dimensions, scale reliability, and correlates.

PubMed

Osman, Augustine; Wong, Jane L; Bagge, Courtney L; Freedenthal, Stacey; Gutierrez, Peter M; Lozano, Gregorio

2012-12-01

We conducted two studies to examine the dimensions, internal consistency reliability estimates, and potential correlates of the Depression Anxiety Stress Scales-21 (DASS-21; Lovibond & Lovibond, 1995). Participants in Study 1 included 887 undergraduate students (363 men and 524 women, aged 18 to 35 years; mean [M] age = 19.46, standard deviation [SD] = 2.17) recruited from two public universities to assess the specificity of the individual DASS-21 items and to evaluate estimates of internal consistency reliability. Participants in a follow-up study (Study 2) included 410 students (168 men and 242 women, aged 18 to 47 years; M age = 19.65, SD = 2.88) recruited from the same universities to further assess factorial validity and to evaluate potential correlates of the original DASS-21 total and scale scores. Item bifactor and confirmatory factor analyses revealed that a general factor accounted for the greatest proportion of common variance in the DASS-21 item scores (Study 1). In Study 2, the fit statistics showed good fit for the bifactor model. In addition, the DASS-21 total scale score correlated more highly with scores on a measure of mixed depression and anxiety than with scores on the proposed specific scales of depression or anxiety. Coefficient omega estimates for the DASS-21 scale scores were good. Further investigations of the bifactor structure and psychometric properties of the DASS-21, specifically its incremental and discriminant validity, using known clinical groups are needed. © 2012 Wiley Periodicals, Inc.
First quality score for referral letters in gastroenterology-a validation study.

PubMed

Eskeland, Sigrun Losada; Brunborg, Cathrine; Seip, Birgitte; Wiencke, Kristine; Hovde, Øistein; Owen, Tanja; Skogestad, Erik; Huppertz-Hauss, Gert; Halvorsen, Fred-Arne; Garborg, Kjetil; Aabakken, Lars; de Lange, Thomas

2016-10-08

To create and validate an objective and reliable score to assess referral quality in gastroenterology. An observational multicentre study. 25 gastroenterologists participated in selecting variables for a Thirty Point Score (TPS) for quality assessment of referrals to gastroenterology specialist healthcare for 9 common indications. From May to September 2014, 7 hospitals from the South-Eastern Norway Regional Health Authority participated in collecting and scoring 327 referrals to a gastroenterologist. Correlation between the TPS and a visual analogue scale (VAS) for referral quality. The 327 referrals had an average TPS of 13.2 (range 1-25) and an average VAS of 4.7 (range 0.2-9.5). The reliability of the score was excellent, with an intra-rater intraclass correlation coefficient (ICC) of 0.87 and inter-rater ICC of 0.91. The overall correlation between the TPS and the VAS was moderate (r=0.42), and ranged from fair to substantial for the various indications. Mean agreement was good (ICC=0.47, 95% CI (0.34 to 0.57)), ranging from poor to good. The TPS is reliable, objective and shows good agreement with the subjective VAS. The score may be a useful tool for assessing referral quality in gastroenterology, particularly important when evaluating the effect of interventions to improve referral quality. Published by the BMJ Publishing Group Limited.
The Chinese-version of the CARE Measure reliably differentiates between doctors in primary care: a cross-sectional study in Hong Kong

PubMed Central

2011-01-01

Background The Consultation and Relational Empathy (CARE) Measure is a widely used patient-rated experience measure which has recently been translated into Chinese and has undergone preliminary qualitative and quantitative validation. The objective of this study was to determine the reliability of the Chinese-version of the CARE Measure in differentiating between doctors in a primary care setting in Hong Kong. Methods Data were collected from 984 primary care patients attending 20 doctors with differing levels of training in family medicine in 5 public clinics in Hong Kong. The acceptability of the Chinese-CARE measure to patients was assessed. The reliability of the measure in discriminating effectively between doctors was analysed by Generalisability-theory (G-Theory). Results The items in the Chinese-CARE measure were regarded as important by patients and there were few 'not applicable' responses. The measure showed high internal reliability (coefficient 0.95) and effectively differentiated between doctors with only 15-20 patient ratings per doctor (inter-rater reliability > 0.8). Doctors' mean CARE measure scores varied widely, ranging from 24.1 to 45.9 (maximum possible score 50) with a mean of 34.6. CARE Measure scores were positively correlated with level of training in family medicine (Spearman's rho 0.493, p < 0.05). Conclusion These data demonstrate the acceptability, feasibility and reliability of using the Chinese-CARE Measure in primary care in Hong Kong to differentiate between doctors' interpersonal competencies. Training in family medicine appears to enhance these key interpersonal skills. PMID:21631927
Creation and Validation of the Self-esteem/Self-image Female Sexuality (SESIFS) Questionnaire

PubMed Central

Lordello, Maria CO; Ambrogini, Carolina C; Fanganiello, Ana L; Embiruçu, Teresa R; Zaneti, Marina M; Veloso, Laise; Piccirillo, Livia B; Crude, Bianca L; Haidar, Mauro; Silva, Ivaldo

2014-01-01

INTRODUCTION Self-esteem and self-image are psychological aspects that affect sexual function. AIMS To validate a new measurement tool that correlates the concepts of self-esteem, self-image, and sexuality. METHODS A 20-question test (the self-esteem/self-image female sexuality [SESIFS] questionnaire) was created and tested on 208 women. Participants answered: Rosenberg’s self-esteem scale, the female sexual quotient (FSQ), and the SESIFS questionnaire. Pearson’s correlation coefficient was used to test concurrent validity of the SESIFS against Rosenberg’s self-esteem scale and the FSQ. Reliability was tested using the Cronbach’s alpha coefficient. RESULT The new questionnaire had a good overall reliability (Cronbach’s alpha r = 0.862, p < 0.001), but the sexual domain scored lower than expected (r = 0.65). The validity was good: overall score r = 0.38, p < 0.001, self-esteem domain r = 0.32, p < 0.001, self-image domain r = 0.31, p < 0.001, sexual domain r = 0.29, p < 0.001. CONCLUSIONS The SESIFS questionnaire has limitations in measuring the correlation among self-esteem, self-image, and sexuality domains. A new, revised version is being tested and will be presented in an upcoming publication. PMID:25574149
Creation and Validation of the Self-esteem/Self-image Female Sexuality (SESIFS) Questionnaire.

PubMed

Lordello, Maria Co; Ambrogini, Carolina C; Fanganiello, Ana L; Embiruçu, Teresa R; Zaneti, Marina M; Veloso, Laise; Piccirillo, Livia B; Crude, Bianca L; Haidar, Mauro; Silva, Ivaldo

2014-01-01

Self-esteem and self-image are psychological aspects that affect sexual function. To validate a new measurement tool that correlates the concepts of self-esteem, self-image, and sexuality. A 20-question test (the self-esteem/self-image female sexuality [SESIFS] questionnaire) was created and tested on 208 women. Participants answered: Rosenberg's self-esteem scale, the female sexual quotient (FSQ), and the SESIFS questionnaire. Pearson's correlation coefficient was used to test concurrent validity of the SESIFS against Rosenberg's self-esteem scale and the FSQ. Reliability was tested using the Cronbach's alpha coefficient. The new questionnaire had a good overall reliability (Cronbach's alpha r = 0.862, p < 0.001), but the sexual domain scored lower than expected (r = 0.65). The validity was good: overall score r = 0.38, p < 0.001, self-esteem domain r = 0.32, p < 0.001, self-image domain r = 0.31, p < 0.001, sexual domain r = 0.29, p < 0.001. The SESIFS questionnaire has limitations in measuring the correlation among self-esteem, self-image, and sexuality domains. A new, revised version is being tested and will be presented in an upcoming publication.
Hypertension Knowledge-Level Scale (HK-LS): a study on development, validity and reliability.

PubMed

Erkoc, Sultan Baliz; Isikli, Burhanettin; Metintas, Selma; Kalyoncu, Cemalettin

2012-03-01

This study was conducted to develop a scale to measure knowledge about hypertension among Turkish adults. The Hypertension Knowledge-Level Scale (HK-LS) was generated based on content, face, and construct validity, internal consistency, test re-test reliability, and discriminative validity procedures. The final scale had 22 items with six sub-dimensions. The scale was applied to 457 individuals aged ≥ 18 years, and 414 of them were re-evaluated for test-retest reliability. The six sub-dimensions encompassed 60.3% of the total variance. Cronbach alpha coefficients were 0.82 for the entire scale and 0.92, 0.59, 0.67, 0.77, 0.72, and 0.76 for the sub-dimensions of definition, medical treatment, drug compliance, lifestyle, diet, and complications, respectively. The scale ensured internal consistency in reliability and construct validity, as well as stability over time. Significant relationships were found between knowledge score and age, gender, educational level, and history of hypertension of the participants. No correlation was found between knowledge score and working at an income-generating job. The present scale, developed to measure the knowledge level of hypertension among Turkish adults, was found to be valid and reliable.
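Several records in this listing report Cronbach's alpha as the internal-consistency coefficient. As a rough illustration of what that number summarises, the following is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the example responses are hypothetical and are not taken from any of the studies listed here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 3 respondents answering 2 items (illustration only).
example = [[1, 2], [2, 4], [3, 3]]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}")
```

The sketch assumes complete data (no missing item responses) and at least two respondents whose total scores differ; real analyses would normally rely on an established statistics package.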
The development, validity, and reliability of the Addiction Profile Index (API).

PubMed

Ögel, Kültegin; Evren, Cüneyt; Karadağ, Figen; Gürol, Defne Tamar

2012-01-01

The objective of this study was to develop a practical questionnaire for multidimensional assessment of problems associated with alcohol and substance abuse that would also be useful for treatment planning. The Addiction Profile Index (API) is a self-report questionnaire consisting of 37 items and the following 5 subscales: characteristics of substance use; dependency diagnosis; the effects of substance use on the user; craving; motivation to quit using substances. The study included 345 alcohol and/or substance abusers from 2 addiction treatment clinics and a prison addiction service. The validity of the questionnaire was assessed using the Michigan Alcoholism Screening Test (MAST), Readiness to Change Questionnaire (SOCRATES), Penn Alcohol Craving Scale (PACS), Drug Craving Scale (DCS), Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I), and Addiction Severity Index (ASI). The Cronbach's alpha coefficient for the total API was 0.89 and for the subscales it ranged from 0.63 to 0.86. Item-total correlation coefficients ranged from 0.42 to 0.89. The Spearman-Brown split-half coefficient for the total API was 0.83. In all, 4 factors were obtained using exploratory factor analysis that represented 52.3% of the total variance. The API craving subscale was observed to be consistent with PACS and the API motivation subscale was consistent with SOCRATES. The API total score was strongly correlated with the mean MAST score, and the composite ASI medical status, substance use, legal status, and family social relations subscale scores. Based on ROC analyses, the area under curve was 0.90. With a total API cut-off score of 4, the scale's sensitivity and specificity were 0.85 and 0.78, respectively. The findings show that the API is a valid and reliable questionnaire that can be used to measure the severity of different dimensions of substance dependency.
Psychometric Properties of the Persian Translation of the Sexual Quality of Life–Male Questionnaire

PubMed Central

Maasoumi, Raziyeh; Mokarami, Hamidreza; Nazifi, Morteza; Stallones, Lorann; Taban, Abrahim; Yazdani Aval, Mohsen; Samimi, Kazem

2016-01-01

Sexual dysfunction has been demonstrated to be related to a poor quality of life. These dysfunctions are especially prevalent among men. This cross-sectional study aimed to investigate the psychometric properties of the Persian translation of the Sexual Quality of Life–Male (SQOL-M), translated and adapted to measure sexual quality of life among Iranian men. Forward–backward procedures were applied in translating the original SQOL-M into Persian, and then the psychometric properties of the Persian translation of the SQOL-M were studied. A total of 181 participants (23-60 years old) were included in the study. Validity was assessed by construct validity using confirmatory factor analysis, convergent validity, and content validity. The international index of erectile function (IIEF) and the work ability index were used to study the convergent validity. Reliability was evaluated through internal consistency and test–retest reliability analyses. The results from confirmatory factor analysis confirmed a one-factor solution for the Persian version of the SQOL-M. Content validity of the translated measure was endorsed by 10 specialists. Pearson correlations indicated that work ability index score, dimensions of the IIEF, and the IIEF total score were positively correlated with the Persian version of the SQOL-M (p < .001). Reliability evaluation indicated a high internal consistency and test–retest reliability. The Cronbach’s alpha coefficient and intraclass correlation coefficients were .96 and .95, respectively. Results indicated that the Persian version of the SQOL-M has good to excellent psychometric properties and can be used to assess the sexual quality of life among Iranian men. PMID:26856758
Psychometric Properties of the Persian Translation of the Sexual Quality of Life-Male Questionnaire.

PubMed

Maasoumi, Raziyeh; Mokarami, Hamidreza; Nazifi, Morteza; Stallones, Lorann; Taban, Abrahim; Yazdani Aval, Mohsen; Samimi, Kazem

2017-05-01

Sexual dysfunction has been demonstrated to be related to a poor quality of life. These dysfunctions are especially prevalent among men. This cross-sectional study aimed to investigate the psychometric properties of the Persian translation of the Sexual Quality of Life-Male (SQOL-M), translated and adapted to measure sexual quality of life among Iranian men. Forward-backward procedures were applied in translating the original SQOL-M into Persian, and then the psychometric properties of the Persian translation of the SQOL-M were studied. A total of 181 participants (23-60 years old) were included in the study. Validity was assessed by construct validity using confirmatory factor analysis, convergent validity, and content validity. The international index of erectile function (IIEF) and the work ability index were used to study the convergent validity. Reliability was evaluated through internal consistency and test-retest reliability analyses. The results from confirmatory factor analysis confirmed a one-factor solution for the Persian version of the SQOL-M. Content validity of the translated measure was endorsed by 10 specialists. Pearson correlations indicated that work ability index score, dimensions of the IIEF, and the IIEF total score were positively correlated with the Persian version of the SQOL-M (p < .001). Reliability evaluation indicated a high internal consistency and test-retest reliability. The Cronbach's alpha coefficient and intraclass correlation coefficients were .96 and .95, respectively. Results indicated that the Persian version of the SQOL-M has good to excellent psychometric properties and can be used to assess the sexual quality of life among Iranian men.
One year test-retest reliability of neurocognitive baseline scores in 10- to 12-year olds.

PubMed

Moser, Rosemarie Scolaro; Schatz, Philip; Grosner, Emily; Kollias, Kelly

2017-01-01

How often youth athletes 10-12 years of age should undergo neurocognitive baseline testing remains an unanswered question. We sought to examine the test-retest reliability of annual ImPACT data in a sample of middle school athletes. Participants were 30 youth athletes, ages 10-12 years (Mean = 11.6, SD = 0.6) selected from a larger database of 10-18 year old athletes, who completed two consecutive annual baseline evaluations using the online version of ImPACT. Athlete assent and parental consent were obtained for all participants. Assessments were conducted either individually or in small groups of 2 to 3 athletes, under the supervision of a neuropsychologist or post-doctoral fellow. Test-retest coefficients were as follows: Verbal Memory .71, Visual Memory .35, Visual Motor Speed .69, Reaction Time .34. Intra-class Correlation Coefficients (single/average) were as follows: Verbal Memory .70/.83, Visual Memory .35/.52, Visual Motor Speed .69/.82, Reaction Time .34/.50. Regression-based measures to correct for practice effects revealed that only a small percentage of cases fell outside 90 and 95% confidence intervals, reflecting stability across assessments. Findings indicate that test-retest reliability of Verbal Memory and Visual Motor Speed are generally stable in 10-12 year old athletes. Nevertheless, Visual Memory Index, Reaction Time Index, and Symptom Checklist scores appear to be less reliable over time, especially compared to published data on high school athletes, suggesting the utility of re-testing on an annual basis in this younger age group.
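The abstract above reports intraclass correlations as single/average pairs. A common way to relate the two is the Spearman-Brown step-up formula, r_avg = k * r_single / (1 + (k - 1) * r_single), with k = 2 for two test sessions. The sketch below applies it to the single-measure values quoted in the abstract. Note that the abstract does not state how its average-measure values were obtained, so this is an assumption for illustration; the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(r_single, k=2):
    """Step a single-measure reliability up to the reliability of the mean of k measures."""
    return k * r_single / (1 + (k - 1) * r_single)


# Single-measure ICCs as quoted in the abstract above.
single_iccs = {
    "Verbal Memory": 0.70,
    "Visual Memory": 0.35,
    "Visual Motor Speed": 0.69,
    "Reaction Time": 0.34,
}

for composite, r in single_iccs.items():
    # Stepped-up value for k = 2 sessions (an assumption, see lead-in).
    print(f"{composite}: single {r:.2f} -> average-of-2 {spearman_brown(r):.2f}")
```

The stepped-up values land within about .01 of the average-measure figures in the abstract, which is consistent with rounding of the underlying coefficients but does not confirm the authors' method.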
Reliability and validity of the Iranian version of the Pediatric Quality of Life Inventory™ 4.0 Generic Core Scales in adolescents.

PubMed

Amiri, Parisa; M Ardekani, Emad; Jalali-Farahani, Sara; Hosseinpanah, Farhad; Varni, James W; Ghofranipour, Fazlollah; Montazeri, Ali; Azizi, Fereidoun

2010-12-01

The objective of this study was to investigate the reliability and validity of the Iranian version of the Pediatric Quality of Life Inventory™ 4.0 (PedsQL™ 4.0) Generic Core Scales in adolescents. After linguistic validation, the Iranian version of the PedsQL™ 4.0 was completed by 848 healthy and 26 chronically ill adolescents aged 13-18 years and their parents. The internal consistency as measured by Cronbach's alpha coefficients exceeded the minimum reliability standard of .70. No floor effects were observed. Ceiling effects detected ranged from 1.5% for adolescent self-report total scale score to 42.2% for self-report social functioning. All monotrait-multimethod correlations were higher than multitrait-multimethod correlations. The intraclass correlation coefficients (ICC) between adolescent self-report and parent proxy-report showed good to excellent agreement. Exploratory factor analysis supported mainly comparable results with the original US English dialect version. The results of the confirmatory factor analysis for 5-factor models for both self-report and proxy-report indicated acceptable fit for the proposed models. Regarding gender and health status, as hypothesized from previous studies, girls reported lower health-related quality of life than boys on the total score, physical and emotional functioning, and healthy adolescents reported significantly higher health-related quality of life than those with chronic illnesses. The findings support the initial reliability and validity of the Iranian version of the PedsQL™ 4.0 as a generic instrument to measure HRQOL of adolescents in Iran.
Validation of the Spanish version of the "Questionnaire on the treatment of approximal and occlusal caries".

PubMed

Ruiz, Begoña; Urzúa, Iván; Cabello, Rodrigo; Rodríguez, Gonzalo; Espelid, Ivar

2013-01-01

To translate and validate a Spanish version of the "Questionnaire on the treatment of approximal and occlusal caries" as a method of collecting information about treatment decisions on caries management in Chilean primary health care services. The original questionnaire proposed by Espelid et al. was translated into Spanish using the forward-backward translation technique. Subsequently, validation of the Spanish version was undertaken. Data were collected from two separate samples; first, from 132 Spanish-speaking dentists recruited from primary health care services and second, from 21 individuals characterised as cariologists. Internal consistency was evaluated by the generation of Cronbach's alpha, test-retest reliability was evaluated by Cohen's kappa, convergent validity was evaluated by comparing the total scale scores to a global evaluation of treatment trends and discriminant validity was evaluated by investigating the differences in total scale scores between the Spanish-speaking dentist and cariologist samples. Cronbach's alpha indicated an internal consistency of 0.63 for the entire scale. Cohen's kappa correlation coefficient expressed a test-retest reliability of 0.83. Convergent validity determined a Pearson's correlation coefficient of 0.24 (p < 0.01). The comparison of proportions (chi-squared) indicated that discriminant validity was statistically significant (p < 0.01), using a one-tailed test. The Spanish version of the "Questionnaire on the treatment of approximal and occlusal caries" is a valid and reliable instrument for collecting information regarding treatment decisions in cariology. The clinical relevance of this study is to acquire a reliable instrument that allows for the determination of treatment decisions in Spanish-speaking dentists.
Acne-specific quality of life questionnaire (Acne-QoL): translation, cultural adaptation and validation into Brazilian-Portuguese language

PubMed Central

Kamamoto, Cristhine de Souza Leão; Hassun, Karime Marques; Bagatin, Ediléia; Tomimori, Jane

2014-01-01

BACKGROUND Many studies about the psychosocial impact of acne have been reported in international medical literature describing quality of life as a relevant clinical outcome. It is well known that the patient's perception about the disease may be different from the physician's evaluation. Therefore, it is important to use validated instruments that turn the patient's subjective opinion into objective information. OBJECTIVES To translate into the Brazilian-Portuguese language and to culturally adapt a quality of life questionnaire, the Acne-Specific Quality of Life Questionnaire (Acne-QoL), as well as to evaluate its reliability and validity. METHODS Measurement properties were assessed: 1) validity: comparison between severity and Acne-QoL domain scores, correlations between acne duration and Acne-QoL domain scores, and correlation between Acne-QoL domain scores and SF-36 components; 2) internal consistency: Cronbach's α coefficient; 3) test-retest reproducibility: intraclass correlation coefficient and Wilcoxon test. RESULTS Eighty subjects with a mean age of 20.5 ± 4.8 years presenting mild (33.8%), moderate (36.2%) and severe (30%) facial acne were enrolled. Acne-QoL domain scores were similar among the different acne severity groups except for role-social domain. Subjects with shorter acne duration presented significantly higher scores. Acne-QoL domains showed significant correlations, both between themselves and with SF-36 role-social and mental health components. Internal consistency (0.925-0.952) and test-retest reproducibility (0.768-0.836) were considered acceptable. CONCLUSIONS The Brazilian-Portuguese version of the Acne-QoL is a reliable, valid, and satisfactory outcome measure to be used in facial acne studies. PMID:24626652
CROSS-CULTURAL ADAPTATION AND VALIDATION OF THE KOREAN VERSION OF THE CUMBERLAND ANKLE INSTABILITY TOOL.

PubMed

Ko, Jupil; Rosen, Adam B; Brown, Cathleen N

2015-12-01

The Cumberland Ankle Instability Tool (CAIT) is a valid and reliable patient reported outcome used to assess the presence and severity of chronic ankle instability (CAI). The CAIT has been cross-culturally adapted into other languages for use in non-English speaking populations. However, there are no valid questionnaires to assess CAI in individuals who speak Korean. The purpose of this study was to translate, cross-culturally adapt, and validate the CAIT, for use in a Korean-speaking population with CAI. Cross-cultural reliability study. The CAIT was cross-culturally adapted into Korean according to accepted guidelines and renamed the Cumberland Ankle Instability Tool-Korean (CAIT-K). Twenty-three participants (12 males, 11 females) who were bilingual in English and Korean were recruited and completed the original and adapted versions to assess agreement between versions. An additional 168 national level Korean athletes (106 male, 62 females; age = 20.3 ± 1.1 yrs), who participated in ≥ 90 minutes of physical activity per week, completed the final version of the CAIT-K twice within 14 days. Their completed questionnaires were assessed for internal consistency, test-retest reliability, criterion validity, and construct validity. For bilingual participants, intra-class correlation coefficients (ICC2,1) between the CAIT and the CAIT-K for test-retest reliability were 0.95 (SEM=1.83) and 0.96 (SEM=1.50) in right and left limbs, respectively. The Cronbach's alpha coefficients were 0.92 and 0.90 for the CAIT-K in right and left limbs, respectively. For native Korean speakers, the CAIT-K had high internal consistency (Cronbach's α=0.89) and intra-class correlation coefficient (ICC2,1 = 0.94, SEM=1.72), correlation with the physical component score (rho=0.70, p = 0.001) of the Short-Form Health Survey (SF-36), and the Kaiser-Meyer-Olkin score was 0.87. The original CAIT was translated, cross-culturally adapted, and validated from English to Korean. 
The CAIT-K appears to be valid and reliable and could be useful in assessing the Korean speaking population with CAI.
Evidence of Validity for the Japanese Version of the Foot and Ankle Ability Measure

PubMed Central

Uematsu, Daisuke; Suzuki, Hidetomo; Sasaki, Shogo; Nagano, Yasuharu; Shinozuka, Nobuyuki; Sunagawa, Norihiko; Fukubayashi, Toru

2015-01-01

Context: The Foot and Ankle Ability Measure (FAAM) is a valid, reliable, and self-reported outcome instrument for the foot and ankle region. Objective: To provide evidence for translation, cross-cultural adaptation, validity, and reliability of the Japanese version of the FAAM (FAAM-J). Design: Cross-sectional study. Setting: Collegiate athletic training/sports medicine clinical setting. Patients or Other Participants: Eighty-three collegiate athletes. Main Outcome Measure(s): All participants completed the Activities of Daily Living and Sports subscales of the FAAM-J and the Physical Functioning and Mental Health subscales of the Japanese version of the Short Form-36v2 (SF-36). Also, 19 participants (23%) whose conditions were expected to be stable completed another FAAM-J 2 to 6 days later for test-retest reliability. We analyzed the scores of those subscales for convergent and divergent validity, internal consistency, and test-retest reliability. Results: The Activities of Daily Living and Sports subscales of the FAAM-J had correlation coefficients of 0.86 and 0.75, respectively, with the Physical Functioning section of the SF-36 for convergent validity. For divergent validity, the correlation coefficients with Mental Health of the SF-36 were 0.29 and 0.27 for each subscale, respectively. Cronbach α for internal consistency was 0.99 for the Activities of Daily Living and 0.98 for the Sports subscale. A 95% confidence interval with a single measure was ±8.1 and ±14.0 points for each subscale. The test-retest reliability measures revealed intraclass correlation coefficient values of 0.87 for the Activities of Daily Living and 0.91 for the Sports subscales with minimal detectable changes of ±6.8 and ±13.7 for the respective subscales. Conclusions: The FAAM was successfully translated for a Japanese version, and the FAAM-J was adapted cross-culturally. 
Thus, the FAAM-J can be used as a self-reported outcome measure for Japanese-speaking individuals; however, the scores must be interpreted with caution, especially when applied to different populations and other types of injury than those included in this study. PMID:25310247
Volunteer Functions Inventory: A systematic review.

PubMed

Chacón, Fernando; Gutiérrez, Gema; Sauto, Verónica; Vecina, María L; Pérez, Alfonso

2017-08-01

The objective of this research study was to conduct a systematic review of the research on volunteers using Clary et al.’s VFI (1998). A total of 48 research studies including 67 independent samples met eligibility criteria. The total sample of the studies analyzed ranged from 20375 to 21988 participants, depending on the motivation analyzed. The results show that the Values factor obtained the highest mean score, both overall and in each type of volunteering, whereas the lowest scores were for the Career and Enhancement factors. Studies conducted with samples with a mean age under 40 years obtain higher scores on Career and Understanding scales when compared to studies in older samples. The group of studies with less than 50% women yield higher mean scores on the Social scale than studies with more than 50% women in the sample. All the scales show reliability coefficients between .78 and .84. Only eight of the articles provide data on the reliability of the scale with a mean value of .90. Of the 26 studies that performed factor analysis, 18 confirmed the original structure of six factors.

Reliability of the Walker Cranial Nonmetric Method and Implications for Sex Estimation.

PubMed

Lewis, Cheyenne J; Garvin, Heather M

2016-05-01

The cranial trait scoring method presented in Buikstra and Ubelaker (Standards for data collection from human skeletal remains. Fayetteville, AR: Arkansas Archeological Survey Research Series No. 44, 1994) and Walker (Am J Phys Anthropol, 136, 2008, 39) is the most common nonmetric cranial sex estimation method utilized by physical and forensic anthropologists. As such, the reliability and accuracy of the method are vital to ensure its validity in forensic applications. In this study, inter- and intra-observer error rates for the Walker scoring method were calculated using a sample of U.S. White and Black individuals (n = 135). Cohen's weighted kappas, intraclass correlation coefficients, and percentage agreements indicate good agreement between trials and observers for all traits except the mental eminence. Slight disagreement in scoring, however, was found to impact sex classifications, leading to lower accuracy rates than those published by Walker. Furthermore, experience does appear to impact trait scoring and sex classification. The use of revised population-specific equations that avoid the mental eminence is highly recommended to minimize the potential for misclassifications. © 2016 American Academy of Forensic Sciences.
Impact of Alzheimer's Disease on Caregiver Questionnaire: internal consistency, convergent validity, and test-retest reliability of a new measure for assessing caregiver burden.

PubMed

Cole, Jason C; Ito, Diane; Chen, Yaozhu J; Cheng, Rebecca; Bolognese, Jennifer; Li-McLeod, Josephine

2014-09-04

There is a lack of validated instruments to measure the level of burden of Alzheimer's disease (AD) on caregivers. The Impact of Alzheimer's Disease on Caregiver Questionnaire (IADCQ) is a 12-item instrument with a seven-day recall period that measures AD caregiver's burden across emotional, physical, social, financial, sleep, and time aspects. Primary objectives of this study were to evaluate psychometric properties of IADCQ administered on the Web and to determine most appropriate scoring algorithm. A national sample of 200 unpaid AD caregivers participated in this study by completing the Web-based version of IADCQ and Short Form-12 Health Survey Version 2 (SF-12v2™). The SF-12v2 was used to measure convergent validity of IADCQ scores and to provide an understanding of the overall health-related quality of life of sampled AD caregivers. The IADCQ survey was also completed four weeks later by a randomly selected subgroup of 50 participants to assess test-retest reliability. Confirmatory factor analysis (CFA) was implemented to test the dimensionality of the IADCQ items. Classical item-level and scale-level psychometric analyses were conducted to estimate psychometric characteristics of the instrument. Test-retest reliability was performed to evaluate the instrument's stability and consistency over time. Virtually none (2%) of the respondents had either floor or ceiling effects, indicating the IADCQ covers an ideal range of burden. A single-factor model obtained appropriate goodness of fit and provided evidence that a simple sum score of the 12 items of IADCQ can be used to measure AD caregiver's burden. Scales-level reliability was supported with a coefficient alpha of 0.93 and an intra-class correlation coefficient (for test-retest reliability) of 0.68 (95% CI: 0.50-0.80). Low-moderate negative correlations were observed between the IADCQ and scales of the SF-12v2. 
The study findings suggest the IADCQ has appropriate psychometric characteristics as a unidimensional, Web-based measure of AD caregiver burden and is supported by strong model fit statistics from CFA, high degree of item-level reliability, good internal consistency, moderate test-retest reliability, and moderate convergent validity. Additional validation of the IADCQ is warranted to ensure invariance between the paper-based and Web-based administration and to determine an appropriate responder definition.
Psychometric properties of the Thought-Action Fusion Scale in a Turkish sample.

PubMed

Yorulmaz, Orçun; Yilmaz, A Esin; Gençöz, Tülin

2004-10-01

The aim of the present study was to reveal the cross-cultural utility of the Thought-Action Fusion Scale (TAFS; J. Anxiety Disord. 10 (1996) 379). Thought-action fusion (TAF) refers to the tendency to overvalue the significance and the consequences of thoughts. Two hundred and fifty one undergraduate Turkish students participated in the current study. The reliability and validity analyses of the Turkish version of the scale indicated that the TAFS had adequate psychometric properties in a Turkish sample. Consistent with the original TAF, the Turkish version of TAFS revealed two subscales as TAF-Likelihood and TAF-Morality. Reliability analysis showed that TAF Scale and its factors had adequate internal consistencies and split-half reliability coefficients. Confirming the expectations, TAFS scores were found to be significantly and positively correlated with obsessive-compulsive symptoms, responsibility, and guilt measures. Moreover, it was found that people with high obsessive-compulsive symptoms had higher TAFS scores than those with low symptoms.
Accuracy and Reliability of the Klales et al. (2012) Morphoscopic Pelvic Sexing Method.

PubMed

Lesciotto, Kate M; Doershuk, Lily J

2018-01-01

Klales et al. (2012) devised an ordinal scoring system for the morphoscopic pelvic traits described by Phenice (1969) and used for sex estimation of skeletal remains. The aim of this study was to test the accuracy and reliability of the Klales method using a large sample from the Hamann-Todd collection (n = 279). Two observers were blinded to sex, ancestry, and age and used the Klales et al. method to estimate the sex of each individual. Sex was correctly estimated for females with over 95% accuracy; however, the male allocation accuracy was approximately 50%. Weighted Cohen's kappa and intraclass correlation coefficient analysis for evaluating intra- and interobserver error showed moderate to substantial agreement for all traits. Although each trait can be reliably scored using the Klales method, low accuracy rates and high sex bias indicate better trait descriptions and visual guides are necessary to more accurately reflect the range of morphological variation. © 2017 American Academy of Forensic Sciences.
Tissue tonometry is a simple, objective measure for pliability of burn scar: is it reliable?

PubMed

Lye, Ian; Edgar, Dale W; Wood, Fiona M; Carroll, Sara

2006-01-01

Objective measurement of burn scar response to treatment is important to facilitate individual patient care, research, and service development. This work examines the validity and reliability of the tonometer as a means of quantifying scar pliability. Ten burn survivors were recruited into the study. Triplicate measures were taken for each of four scar and one normal skin point. The pliability score from the Vancouver Scar Scale also was used as a comparison. The tonometer demonstrated a high degree of reliability (intraclass correlation coefficients 0.91-0.94). It also was shown to provide a valid measure of pliability by quantifying decreased tissue deformation for scar (2.04 +/- 0.45 mm) compared with normal tissue (3.02 +/- 0.92 mm; t = 4.28, P = .004) and a moderate correlation with Vancouver Scar Scale scores. The tissue tonometer provides a repeatable, objective index of burn scar pliability. Using the methods described, it is a simple, clinically useful technique for monitoring an individual's scar.
The validity and reliability of a customized rigid supportive harness during Smith machine back squat exercise.

PubMed

Scott, Brendan R; Dascombe, Ben J; Delaney, Jace A; Elsworthy, Nathan; Lockie, Robert G; Sculley, Dean V; Slattery, Katie M

2014-03-01

Although the back squat exercise is commonly prescribed to both athletic and clinical populations, individuals with restricted glenohumeral mobility may be unable to safely support the bar on the upper trapezius using their hands. The aims of this study were to investigate the validity and reliability of a back squat variation using a rigid supportive harness that does not require unrestricted glenohumeral mobility for quantifying 1 repetition maximum (1RM). Thirteen young men (age = 25.3 ± 4.5 years, height = 179.2 ± 6.9 cm, and body mass = 86.6 ± 12.0 kg) with at least 2 years resistance training experience volunteered to participate in the study. Subjects reported to the lab on 3 occasions, each separated by 1 week. During testing sessions, subjects were assessed for 1RM using the traditional back squat (session 1) and harness back squat (HBS; sessions 2 and 3) exercises. Mean 1RM for the traditional back squat, and 2 testing sessions of the HBS (HBS1 and HBS2) were 148.4 ± 25.0 kg, 152.5 ± 25.7 kg, and 150.4 ± 22.6 kg, respectively. Back squat and mean HBS 1RM scores were very strongly correlated (r = 0.96; p < 0.001). There were no significant differences in 1RM scores between the 3 trials. The test-retest 1RM scores with the HBS demonstrated high reliability, with an intraclass correlation coefficient of 0.98 (95% confidence interval [CI] = 0.93-0.99), and a coefficient of variation of 2.6% (95% CI = 1.9-4.3). Taken together, these data suggest that the HBS exercise is a valid and reliable method for assessing 1RM in young men with previous resistance training experience and may be useful for individuals with restricted glenohumeral mobility.
Cross-cultural adaptation, reliability, and validation of the Korean version of the identification functional ankle instability (IdFAI).

PubMed

Ko, Jupil; Rosen, Adam B; Brown, Cathleen N

2017-09-12

To cross-culturally adapt the Identification of Functional Ankle Instability (IdFAI) for use with Korean-speaking participants. The English version of the IdFAI was cross-culturally adapted into Korean based on the guidelines. The psychometric properties of the Korean version of the IdFAI were measured for test-retest reliability, internal consistency, criterion-related validity, discriminative validity, and measurement error in 181 native Korean speakers. The intraclass correlation coefficient (ICC(2,1)) between the English and Korean versions of the IdFAI for test-retest reliability was 0.98 (standard error of measurement = 1.41). The Cronbach's alpha coefficient was 0.89 for the Korean version of the IdFAI. The Korean version of the IdFAI had a strong correlation with the SF-36 (r_s = -0.69, p < .001) and the Korean version of the Cumberland Ankle Instability Tool (r_s = -0.65, p < .001). A cutoff score of >10 was optimal for distinguishing between group memberships. The minimal detectable change of the Korean version of the IdFAI score was 3.91. The Korean version of the IdFAI has been shown to be an excellent, reliable, and valid instrument. It can be used by researchers and clinicians working among Korean-speaking populations to assess the presence of Chronic Ankle Instability. Implications for rehabilitation: The high recurrence rate of sprains may result in Chronic Ankle Instability (CAI). The Identification of Functional Ankle Instability Tool (IdFAI) has been validated and recommended to identify patients with CAI. The Korean version of the IdFAI may also be recommended to researchers and clinicians for assessing the presence of CAI in Korean-speaking populations.
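The abstract reports a standard error of measurement (SEM) of 1.41 and a minimal detectable change (MDC) of 3.91 but does not state how the second was derived from the first. A minimal sketch, assuming the widely used convention MDC95 = 1.96 × √2 × SEM, reproduces the reported figure. The function names are illustrative, and the SEM-from-ICC helper is a common textbook relation rather than something the abstract confirms.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a sample SD and a reliability
    coefficient, using the common relation SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """Minimal detectable change for a difference between two measurements.

    The sqrt(2) term reflects that a change score combines the error of
    two measurements; z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(2.0) * sem


# SEM value taken from the abstract; the formula itself is an assumption.
print(round(minimal_detectable_change(1.41), 2))  # 3.91
```

Under this assumption, a change in IdFAI score smaller than about 3.9 points cannot be distinguished from measurement error at the 95% level.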
Urdu version of the neck disability index: a reliability and validity study.

PubMed

Farooq, Muhammad Nazim; Mohseni-Bandpei, Mohammad A; Gilani, Syed Amir; Hafeez, Ambreen

2017-04-08

Despite the wide use of the neck disability index (NDI) for assessing disability in patients with neck pain, the NDI has not yet been translated and validated in Urdu. The first purpose of the present study was to translate and cross-culturally adapt the NDI into the Urdu language (NDI-U). The second purpose was to investigate the reliability, validity and responsiveness of the NDI-U in Urdu-speaking patients experiencing chronic mechanical neck pain (CMNP). Translation and cross-cultural adaptation of the original version of the NDI were carried out using previously described procedures. Seventy-six patients with CMNP and thirty healthy participants were recruited for the study. NDI-U and visual analogue scales for pain intensity (VAS pain) and disability (VAS disability) were administered to all the participants at baseline and to the patients 3 weeks after receiving physiotherapy intervention. The global rating of change scale (GROC) was also administered at this time. Test-retest reliability and internal consistency were carried out on forty-six randomly selected patients two days after they completed the NDI-U. The NDI-U was evaluated for factor analysis, content validity, construct validity (discriminative and convergent validity) and responsiveness. An intra-class correlation coefficient (ICC(2,1)) revealed excellent test-retest reliability for all items (ICC(2,1) = 0.86-0.98) and total scores (ICC(2,1) = 0.99) of the NDI-U. The NDI-U was found internally consistent with a Cronbach's alpha of 0.90 and a fair to good correlation between single items and the NDI-U total scores (r = 0.34 to 0.89). Factor analysis of the NDI-U produced two factors explaining 66.71% of the variance. Content validity was good, as no floor or ceiling effects were detected for the NDI-U total score. To determine discriminative validity, an independent t-test revealed a significant difference in the NDI-U total scores between the patients and healthy controls (P < 0.001).
For convergent validity, Pearson's correlation coefficient showed a strong correlation between NDI-U and VAS disability (r = 0.83, P < 0.001) and a moderate correlation between NDI-U and VAS pain (r = 0.62, P < 0.001). To measure responsiveness, an independent t-test showed a significant difference in the NDI-U change scores between the stable and the improved groups (P < 0.001). Furthermore, moderate correlations were found between the NDI-U change scores and the GROC (r = 0.50, P < 0.001), VAS disability change scores (r = 0.58, P < 0.001) and VAS pain change scores (r = 0.55, P < 0.001). The results showed that the NDI-U is a reliable, valid and responsive questionnaire to measure disability in Urdu-speaking patients with CMNP.
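Several records in this listing report ICC(2,1) for test-retest reliability. As a rough illustration of what that coefficient measures, the sketch below computes the Shrout and Fleiss two-way random-effects, absolute-agreement, single-measurement form from a subjects-by-sessions table. The data are invented for illustration and are not taken from any study above.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` is a list of rows, one per subject, each holding that
    subject's k repeated measurements (sessions or raters).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical test and retest scores for five subjects.
example = [[12, 13], [20, 19], [31, 33], [8, 8], [25, 27]]
print(round(icc_2_1(example), 3))
```

Because this form assesses absolute agreement, a systematic shift between sessions lowers the coefficient even when subjects keep exactly the same rank order.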
Short health scale: A valid measure of health-related quality of life in Korean-speaking patients with inflammatory bowel disease

PubMed Central

Park, Soo-Kyung; Ko, Bong Min; Goong, Hyeon Jeong; Seo, Jeong Yeon; Lee, Sang Hyuk; Baek, Hae Lim; Lee, Moon Sung; Park, Dong Il

2017-01-01

AIM To evaluate the short health scale (SHS), a new, simple, four-part visual analogue scale questionnaire that is designed to assess the impact of inflammatory bowel disease (IBD) on health-related quality of life (HRQOL), in Korean-speaking patients with IBD. METHODS The SHS was completed by 256 patients with Crohn’s disease (CD) and ulcerative colitis (UC). Individual SHS items were correlated with inflammatory bowel disease questionnaire (IBDQ) dimensions and with disease activity to assess validity. Test-retest reliability, responsiveness and patient or disease characteristics with probable association with high SHS scores were analyzed. RESULTS Of 256 patients with IBD, 139 (54.3%) had UC and 117 (45.7%) had CD. The correlation coefficients between SHS questions about “symptom burden”, “activities of daily living”, and “disease-related worry” and their corresponding dimensions in the IBDQ ranged from 0.62 to 0.71, compared with correlation coefficients ranging from -0.45 to -0.61 for their non-corresponding dimensions. There was a stepwise increase in SHS scores, with increasing disease activity in both CD and UC (all P values < 0.001). Reliability was confirmed with test-retest correlations ranging from 0.68 to 0.90 (all P values < 0.001). Responsiveness was confirmed with the patients who remained in remission. Their SHS scores remained unchanged, except for the SHS dimension “disease-related worry”. In the multivariate analysis, female sex was associated with worse “general well-being” (OR = 2.28, 95%CI: 1.02-5.08) along with worse disease activity. CONCLUSION The SHS is a valid and reliable measure of HRQOL in Korean-speaking patients with IBD. PMID:28596689
Assessing cancer-specific anxiety in Chinese men with prostate cancer: psychometric evaluation of the Chinese version of the Memorial Anxiety Scale for Prostate Cancer (MAX-PC).

PubMed

Huang, Qingmei; Jiang, Ping; Zhang, Zijun; Luo, Jie; Dai, Yun; Zheng, Li; Wang, Wei

2017-12-01

The Memorial Anxiety Scale for Prostate Cancer (MAX-PC) was developed to identify and assess cancer-specific anxiety among men with prostate cancer (PCa); however, there is no Chinese version. The aim of our study was to translate the English version of the MAX-PC into Chinese and evaluate its psychometric properties. The study cohort comprised 254 participants. Internal consistency, assessed with Cronbach's alpha coefficient and item-total correlations, was used to measure the reliability of the scale. Factor structure was analyzed by exploratory factor analysis, and concurrent validity by comparing MAX-PC scores with anxiety subscale scores of the Hospital Anxiety and Depression Scale (HADS). Divergent validity was assessed by correlating the MAX-PC with the HADS depression subscale, and discriminant ability by comparing MAX-PC scores between different patient groups. The Chinese version of the MAX-PC demonstrated good reliability; the Cronbach's alpha coefficients for the total scale and the three subscales (prostate cancer anxiety, PSA anxiety, and fear of recurrence) were 0.94, 0.93, 0.82, and 0.85, respectively. Exploratory factor analysis supported the three-factor structure of the scale established in the original version. Despite somewhat weak divergent validity, the scale demonstrated good concurrent validity, with a strong correlation with the HADS anxiety subscale (r = 0.71, p < 0.01). Moreover, discriminant ability was demonstrated by the ability to differentiate between disease stages. The MAX-PC Chinese version was confirmed to be a valid, reliable instrument and is thus appropriate for identifying and quantifying cancer-specific anxiety in Chinese PCa patients.
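Cronbach's alpha, the internal-consistency coefficient reported here and in many neighbouring records, can be computed directly from a respondents-by-items table. The following is a minimal sketch of the standard formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores); the response data are hypothetical and do not come from the MAX-PC study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for both the individual
    items and the summed total score.
    """
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical answers of six respondents to a four-item scale.
data = [
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 5, 4],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [2, 3, 2, 2],
]
print(round(cronbach_alpha(data), 3))
```

Alpha rises with both the average inter-item correlation and the number of items, which is one reason a long total scale can show a higher alpha than its shorter subscales.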
The IMPACT-III (HR) questionnaire: a valid measure of health-related quality of life in Croatian children with inflammatory bowel disease.

PubMed

Abdovic, Slaven; Mocic Pavic, Ana; Milosevic, Milan; Persic, Mladen; Senecic-Cala, Irena; Kolacek, Sanja

2013-12-01

To assess the reliability and validity of IMPACT-III (HR), a disease-specific, health-related quality of life instrument in Croatian children with inflammatory bowel disease. In a multicenter study, 104 children participated in a validation study of IMPACT-III (HR) cross-culturally adapted for Croatia. Factor analysis was used to determine optimal domain structure for this cohort, analysis of Cronbach's alpha coefficients to test internal reliability, ANOVA to assess discriminant validity, and correlation with Pediatric Quality of Life Inventory, Version 4.0 (PedsQL) using Pearson correlation coefficients to assess concurrent validity. Cronbach's alpha for the IMPACT-III (HR) total score was 0.92. The most robust factor solution was a 5-domain structure: Symptoms, Concerns, Socializing, Body Image, and Worry about Stool, all of which demonstrated good internal reliability (α = 0.60-0.89), but two items were dropped to achieve this. Discriminant validity was demonstrated by significant differences (P < 0.001) in mean IMPACT-III (HR) scores between quiescent and mild or moderate-severe disease activity groups for the total score (148 vs. 139 or 125) and the following factor scores: Symptoms (84 vs. 71 or 61), Socializing (91 vs. 83 or 76), and Worry about Stool (significant only between quiescent and moderate-severe groups, 90 vs. 62, respectively). Concurrent validity of IMPACT-III (HR) with PedsQL showed significant correlation, which was strongest when similar domains were compared. IMPACT-III (HR) appears to be a useful tool to measure health-related quality of life in Croatian children with Crohn's disease and ulcerative colitis. Copyright © 2012 European Crohn's and Colitis Organisation. Published by Elsevier B.V. All rights reserved.
Essential tremor quantification based on the combined use of a smartphone and a smartwatch: The NetMD study.

PubMed

López-Blanco, Roberto; Velasco, Miguel A; Méndez-Guerrero, Antonio; Romero, Juan Pablo; Del Castillo, María Dolores; Serrano, J Ignacio; Benito-León, Julián; Bermejo-Pareja, Félix; Rocon, Eduardo

2018-06-01

The use of wearable technology is an emerging field of research in movement disorders. This paper introduces a clinical study to evaluate the feasibility, clinical correlation and reliability of using a smartwatch-based system to quantify tremor in essential tremor (ET) patients and to check its acceptance as a clinical monitoring tool. The system is based on a commercial smartwatch and an Android smartphone. An investigational Android application controls the process of recording raw data from the smartwatch's three-dimensional gyroscopes. Thirty-four ET patients were consecutively enrolled in the experiments and assessed over one year. Arm tremor was videotaped and scored using the Fahn-Tolosa-Marin Tremor Rating Scale (FTM-TRS). Tremor intensity was quantified with the root mean square of angular velocity measured at the patients' wrists. Eighty-two assessments with smartwatches were performed. Spearman's correlation coefficients (ρ) between clinical tremor (FTM-TRS) scores and smartwatch measures of tremor intensity were ρ = 0.590 at rest, ρ = 0.738 in steady posture, ρ = 0.189 in finger-to-nose maneuvers, and ρ = 0.652 in the pouring water task. Smartwatch reliability was checked with intraclass reliability coefficients: 0.85, 0.95, 0.91, and 0.95, respectively. Most patients showed good acceptance of the system. This commodity hardware, used through customized Android smart devices as a clinical monitoring tool, helps quantify tremor objectively in the consulting room. The NetMD system for tremor analysis is feasible, well correlated with clinical scores, reliable, and well accepted by patients for tremor follow-up. Therefore, it could be an option to objectively quantify tremor in ET patients during their regular follow-up. Copyright © 2018 Elsevier B.V. All rights reserved.
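The abstract states that tremor intensity was the root mean square (RMS) of wrist angular velocity but does not say how the three gyroscope axes were combined. One plausible reading, sketched below purely as an assumption, is to take the RMS of the angular-velocity vector magnitude over the recording window. The function name, sample values, and units are hypothetical.

```python
import math


def rms_angular_velocity(samples):
    """RMS of the angular-velocity magnitude over a recording window.

    `samples` is a sequence of (x, y, z) gyroscope readings. Combining
    the axes through the vector magnitude is an assumption made for this
    sketch; the study may have processed the axes differently.
    """
    if not samples:
        raise ValueError("at least one gyroscope sample is required")
    mean_square = sum(x * x + y * y + z * z for x, y, z in samples) / len(samples)
    return math.sqrt(mean_square)


# Hypothetical readings in rad/s from a short window.
window = [(0.3, -0.1, 0.0), (-0.4, 0.2, 0.1), (0.5, -0.2, -0.1), (-0.3, 0.1, 0.0)]
print(rms_angular_velocity(window))
```

In practice a band-pass filter around the tremor frequency band would usually precede this step, but the abstract gives no detail on preprocessing, so none is shown.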
Translation and Cross-cultural Adaptation of the Hip Disability and Osteoarthritis Score into Persian Language: Reassessment of Validity and Reliability

PubMed Central

Mousavian, Alireza; Kachooie, Amir Reza; Birjandinejad, Ali; Khoshsaligheh, Masood; Ebrahimzadeh, Mohammad Hosein

2018-01-01

Background: This study aimed to translate the hip disability and osteoarthritis outcome score (HOOS) questionnaire into Persian and to validate the translation. Methods: The study was carried out in two phases. First, we translated the HOOS according to accepted guidelines. We then assessed the content and convergent validity of the HOOS in 203 patients with hip osteoarthritis using the SF-36. Internal consistency was tested using Cronbach's alpha coefficient, including alpha if each item was removed, and the intraclass correlation coefficient (ICC) was used to assess test-retest reproducibility. Results: Patients had a mean (standard deviation) age of 39 (17) years. The test-retest ICC for the whole questionnaire was 0.95 (P = 0.014), showing excellent reliability. ICC was 0.92 for the “pain” subscale (P = 0.02), 0.81 for the “symptom” subscale (P = 0.002), 0.81 for the “function of daily living (FDL)” subscale (P = 0.022), and 0.88 for the “function of sports and recreational activities” subscale (P = 0.006), but it was 0.62 (P = 0.1) for the “quality of life (QOL)” subscale. Cronbach's alpha was 0.92, 0.73, 0.97, 0.86, 0.80, and 0.80 for the pain, symptom, FDL, function of sports, QOL, and stiffness subscales, respectively, showing good to excellent internal consistency. With the SF-36 used to assess convergent validity, there was a strong correlation between the total HOOS score and the physical component summary domain of the SF-36 (r = 0.64, P = 0.0001), whereas the correlation with the mental component summary domain was weak (r = 0.16, P = 0.04). Conclusions: The Persian version of the HOOS questionnaire is a valid (regarding physical, not mental, aspects) and reliable assessment tool in patients with hip osteoarthritis. PMID:29619147
Short health scale: A valid measure of health-related quality of life in Korean-speaking patients with inflammatory bowel disease.

PubMed

Park, Soo-Kyung; Ko, Bong Min; Goong, Hyeon Jeong; Seo, Jeong Yeon; Lee, Sang Hyuk; Baek, Hae Lim; Lee, Moon Sung; Park, Dong Il

2017-05-21

To evaluate the short health scale (SHS), a new, simple, four-part visual analogue scale questionnaire that is designed to assess the impact of inflammatory bowel disease (IBD) on health-related quality of life (HRQOL), in Korean-speaking patients with IBD. The SHS was completed by 256 patients with Crohn's disease (CD) and ulcerative colitis (UC). Individual SHS items were correlated with inflammatory bowel disease questionnaire (IBDQ) dimensions and with disease activity to assess validity. Test-retest reliability, responsiveness and patient or disease characteristics with probable association with high SHS scores were analyzed. Of 256 patients with IBD, 139 (54.3%) had UC and 117 (45.7%) had CD. The correlation coefficients between SHS questions about "symptom burden", "activities of daily living", and "disease-related worry" and their corresponding dimensions in the IBDQ ranged from 0.62 to 0.71, compared with correlation coefficients ranging from -0.45 to -0.61 for their non-corresponding dimensions. There was a stepwise increase in SHS scores, with increasing disease activity in both CD and UC (all P values < 0.001). Reliability was confirmed with test-retest correlations ranging from 0.68 to 0.90 (all P values < 0.001). Responsiveness was confirmed with the patients who remained in remission. Their SHS scores remained unchanged, except for the SHS dimension "disease-related worry". In the multivariate analysis, female sex was associated with worse "general well-being" (OR = 2.28, 95%CI: 1.02-5.08) along with worse disease activity. The SHS is a valid and reliable measure of HRQOL in Korean-speaking patients with IBD.
The Reliability and Validity of the Japanese Version of the Stroke Impact Scale Version 3.0.

PubMed

Ochi, Mitsuhiro; Ohashi, Hiroshi; Hachisuka, Kenji; Saeki, Satoru

It is important to evaluate body functions and structures, activity, and participation in stroke rehabilitation. The Stroke Impact Scale (SIS), a new stroke-specific self-report measure that was developed by Duncan et al, is widely used to measure the multidimensional consequences of stroke for health-related quality of life. The SIS version 3.0 includes 9 domains (strength, hand function, activity of daily living and instrumental activity of daily living, mobility, communication, emotion, memory and thinking, participation, and recovery). For the stroke recovery domain, patients are asked to rate their recovery since their stroke as a percentage on a visual analog scale of 0 to 100. Each item in the 8 domains other than stroke recovery is given a raw score from 1 to 5, which is converted to a final score using the manual. We developed a Japanese version of the SIS version 3.0 and assessed its reliability and validity in 32 chronic stroke survivors. The internal consistency (Cronbach's α > 0.70) was satisfactory. The test-retest reliability (ICC, 0.86 to 0.96) was also satisfactory. Regarding convergent validity, a significant correlation (Spearman's correlation coefficient, P < 0.05) was found between the SIS physical domain score and Brunnstrom stage (r, 0.49 to 0.53) and short form 8 (r = 0.82). The Japanese version of the SIS version 3.0 is valid, reliable, and clinically useful for stroke survivors.
Applicability of the ReproQ client experiences questionnaire for quality improvement in maternity care

PubMed Central

Scheerhagen, Marisja; Tholhuijsen, Dominique J.C.; Birnie, Erwin; Franx, Arie; Bonsel, Gouke J.

2016-01-01

Background. The ReproQuestionnaire (ReproQ) measures the client’s experience with maternity care, following the WHO responsiveness model. In 2015, the ReproQ was designated the national client experience questionnaire and will be added to the national list of indicators in maternity care. To be used in quality improvement, the questionnaire should be able to identify best and worst practices. To achieve this, the ReproQ should be reliable and able to identify relevant differences. Methods and Findings. We sent questionnaires to 17,867 women six weeks after labor (response 32%). Additionally, we invited 915 women for the retest (response 29%). Next we determined the test–retest reliability, the Minimally Important Difference (MID) and six known group comparisons, using two scoring methods: the percentage of women with at least one negative experience and the mean score. The reliability of the percentage negative experience and of the mean score were both ‘good’ (absolute agreement = 79%; intraclass correlation coefficient = 0.78). The MID was 11% for the percentage negative and 0.15 for the mean score. Application of the MIDs revealed relevant differences in women’s experience with regard to professional continuity, setting continuity and having travel time. Conclusions. The measurement characteristics of the ReproQ support its use in the quality improvement cycle. Test–retest reliability was good, and the observed minimally important difference allows for discrimination of good and poor performers, also at the level of specific features of performance. PMID:27478690
Quantifying bone marrow edema in the rheumatoid cervical spine using magnetic resonance imaging.

PubMed

Suppiah, Ravi; Doyle, Anthony; Rai, Raylynne; Dalbeth, Nicola; Lobo, Maria; Braun, Jürgen; McQueen, Fiona M

2010-08-01

To determine the reliability and feasibility of a new magnetic resonance imaging (MRI) score to quantify bone marrow edema (BME), synovitis, and erosions in the cervical spine of patients with rheumatoid arthritis (RA); and to investigate the correlations among neck pain, clinical markers of RA disease activity, and MRI features of disease activity in the cervical spine. Thirty patients with RA (50% with neck pain) and a Disease Activity Score 28-joint count > 3.2 had an MRI scan of their cervical spine. STIR, VIBE, and T1-weighted postcontrast sequences were used to quantify BME. MRI scans were scored for total BME, synovitis, and erosions using a new scoring method developed by the authors and assessed for reliability and feasibility. Associations between neck pain and clinical markers of disease activity were investigated. BME was present in 14/30 patients; 9/14 (64%) had atlantoaxial BME, 10/14 (71%) had subaxial BME, and 5/14 (36%) had both. Interobserver reliability for total cervical BME score was moderate [intraclass correlation coefficient (ICC) = 0.51]. ICC improved to 0.67 if only the vertebral bodies and dens were considered. There was no correlation between neck pain or clinical measures of RA disease activity and the presence of any MRI features including BME, synovitis, or erosions. Current RA disease activity scores do not identify activity in the cervical spine. An MRI score that quantifies BME, synovitis, and erosions in the cervical spine may provide useful information regarding inflammation and damage. This could alert clinicians to the presence of significant pathology and influence management.
Reliability and validity of a new HIV-specific questionnaire with adults living with HIV in Canada and Ireland: the HIV Disability Questionnaire (HDQ).

PubMed

O'Brien, Kelly K; Solomon, Patricia; Bergin, Colm; O'Dea, Siobhán; Stratford, Paul; Iku, Nkem; Bayoumi, Ahmed M

2015-08-12

Our aim was to assess internal consistency reliability, construct validity, and test-retest reliability of the HDQ with adults living with HIV in Canada and Ireland. We recruited adults 18 years of age or older living with HIV from hospital clinics and AIDS service organizations in Canada and Ireland. We administered the HDQ paired with reference measures (World Health Organization Disability Assessment Schedule, SF-36 Questionnaire, Medical Outcomes Study Social Support Survey), and a demographic questionnaire. We calculated HDQ disability presence, severity and episodic scores (scored from 0-100). We calculated Cronbach's alpha and Intraclass Correlation Coefficients (ICC) (Canada only) for the disability severity and episodic scores and considered coefficients >0.80 and >0.70 as acceptable, respectively. To assess construct validity, we tested 40 a priori hypotheses of correlations between scores on the HDQ and reference measures and two known group hypotheses comparing HDQ presence and severity scores based on age and comorbidity. We considered acceptance of at least 75% of hypotheses as demonstrating support for construct validity. Of the 235 participants (139 Canada; 96 Ireland), the majority were men (74% Ireland; 82% Canada) and were taking antiretroviral therapy (88% Ireland; 91% Canada). Compared with Irish participants, Canadian participants were older (median age: 48 versus 41 years) and reported living with a higher median number of comorbidities (4 versus 1). Cronbach's alpha for Irish and Canadian participants were 0.97 (95% confidence interval (CI): 0.97-0.98) and 0.96 (95 % CI: 0.95-0.98), respectively, for the severity scale and 0.98 (95 % CI: 0.97-0.98) and 0.96 (95 % CI: 0.95-0.98), respectively, for the episodic scale. Of the 40 construct validity correlation hypotheses, 32 (80%) and 22 (55%) were supported among the Canadian and Irish samples respectively; both (100%) known group hypotheses were also supported. 
ICC values for Canadian participants ranged from 0.80 (95 % CI: 0.71, 0.86) in the cognitive domain to 0.89 (95 % CI: 0.83, 0.92) in the social inclusion domain. The HDQ demonstrates internal consistency reliability and a variable degree of construct validity when administered to adults living with HIV in Canada and Ireland. The HDQ demonstrates test-retest reliability when administered to adults with HIV in Canada. Further validation of the HDQ outside of Canada is needed.
The Premature Ejaculation Profile: validation of self-reported outcome measures for research and practice.

PubMed

Patrick, Donald L; Giuliano, François; Ho, Kai Fai; Gagnon, Dennis D; McNulty, Pauline; Rothman, Margaret

2009-02-01

To evaluate the reliability and validity of the Premature Ejaculation Profile (PEP), a self-reported outcome instrument for evaluating domains of PE and its treatment, comprised of four single-item measures, a profile, and an index score. Data were from men participating in observational studies in the USA (PE, 207 men; non-PE, 1380) and Europe (PE, 201; non-PE, 914) and from men with PE (1238) participating in a phase III randomized, placebo-controlled clinical trial of dapoxetine. The PEP contains four measures: perceived control over ejaculation, personal distress related to ejaculation, satisfaction with sexual intercourse, and interpersonal difficulty related to ejaculation, each assessed on five-point response scales. Test-retest reliability, known-groups validity, and ability to detect a patient-reported global impression of change (PGI) in condition were evaluated for the individual PEP measures and a PEP index score (the mean of all four measures). Profile analysis was conducted using multivariate analysis of variance. All PEP measures showed acceptable reliability (intraclass correlation coefficients ranged from 0.66 to 0.83) and mean scores for all measures differed significantly between PE and non-PE groups (P < 0.001). Men who reported a reduction in PE with treatment in the phase III trial had significantly greater scores on each of the four measures. The PEP profiles of men with and without PE differed significantly (P < 0.001) in both observational studies; higher levels of PGI were associated with higher PEP profiles (P < 0.001). The PEP index score also showed acceptable reliability and was significantly different between the PE and non-PE groups (P < 0.001). Men who reported an improvement in PE with treatment in the phase III trial had significantly greater PEP index scores. In the phase III trial, nausea was the most common adverse event with dapoxetine. 
The PEP provides a reliable, valid, and interpretable measure for use in monitoring outcomes of men with PE.
Reliability measures in item response theory: manifest versus latent correlation functions.

PubMed

Milanzi, Elasma; Molenberghs, Geert; Alonso, Ariel; Verbeke, Geert; De Boeck, Paul

2015-02-01

For item response theory (IRT) models, which belong to the class of generalized linear or non-linear mixed models, reliability at the scale of observed scores (i.e., manifest correlation) is more difficult to calculate than latent correlation based reliability, but usually of greater scientific interest. This is not least because it cannot be calculated explicitly when the logit link is used in conjunction with normal random effects. As such, approximations such as Fisher's information coefficient, Cronbach's α, or the latent correlation are calculated, allegedly because it is easy to do so. Cronbach's α has well-known and serious drawbacks, Fisher's information is not meaningful under certain circumstances, and there is an important but often overlooked difference between latent and manifest correlations. Here, manifest correlation refers to correlation between observed scores, while latent correlation refers to correlation between scores at the latent (e.g., logit or probit) scale. Thus, using one in place of the other can lead to erroneous conclusions. Taylor series based reliability measures, which are based on manifest correlation functions, are derived and a careful comparison of reliability measures based on latent correlations, Fisher's information, and exact reliability is carried out. The latent correlations are virtually always considerably higher than their manifest counterparts, Fisher's information measure shows no coherent behaviour (it is even negative in some cases), while the newly introduced Taylor series based approximations reflect the exact reliability very closely. Comparisons among the various types of correlations, for various IRT models, are made using algebraic expressions, Monte Carlo simulations, and data analysis. Given the light computational burden and the performance of Taylor series based reliability measures, their use is recommended. © 2014 The British Psychological Society.
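The gap between latent and manifest correlation can be illustrated with a deliberately simple case that is not the authors' Taylor-series method. The sketch assumes a random-intercept logistic model in which each person has an effect b ~ N(0, σ²) and answers two exchangeable binary items with success probability expit(b). On the logit scale the usual latent-variable formulation gives a correlation of σ² / (σ² + π²/3); on the observed 0/1 scale the correlation between the two responses is Var(p) / (E[p](1 − E[p])), which has no closed form and is estimated here by Monte Carlo. All names and settings are illustrative.

```python
import math
import random


def latent_correlation(sigma: float) -> float:
    """Correlation on the logit scale: the random-effect variance over the
    total latent variance, where pi^2 / 3 is the variance of the standard
    logistic residual."""
    return sigma ** 2 / (sigma ** 2 + math.pi ** 2 / 3.0)


def manifest_correlation(sigma: float, draws: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the correlation between two observed binary
    responses that share one normally distributed person effect."""
    rng = random.Random(seed)
    probs = [1.0 / (1.0 + math.exp(-rng.gauss(0.0, sigma))) for _ in range(draws)]
    mean_p = sum(probs) / draws
    var_p = sum((p - mean_p) ** 2 for p in probs) / draws
    # Cov(Y1, Y2) = Var(p) and Var(Y) = E[p] * (1 - E[p]) for Bernoulli responses.
    return var_p / (mean_p * (1.0 - mean_p))


sigma = 1.0
print("latent  :", round(latent_correlation(sigma), 3))
print("manifest:", round(manifest_correlation(sigma), 3))
```

With σ = 1 the latent value is about 0.23 and the manifest estimate comes out noticeably lower, in line with the abstract's remark that latent correlations tend to exceed their manifest counterparts.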

Translation, cross-cultural adaptation, and psychometric properties of the German version of the hip disability and osteoarthritis outcome score.

PubMed

Blasimann, Angela; Dauphinee, Sharon Wood; Staal, J Bart

2014-12-01

Clinical measurement. To translate and cross-culturally adapt the Hip disability and Osteoarthritis Outcome Score (HOOS) from English into German, and to study its psychometric properties in patients after hip surgery. There is no specific hip questionnaire in German that not only measures symptoms and function but also contains items about hip-related quality of life. The translation and cross-cultural adaptation involved forward translation, harmonization, cognitive debriefing, back translation, and comparison to the original HOOS following international guidelines. The German version was tested in 51 Swiss inpatients 8 weeks after different types of hip surgery, mainly total hip replacement. The mean age of the participants was 62.5 years, and the age range was from 27 to 87 years. Thirty (58.8%) of the participants were women. Internal consistency and test-retest reliability were estimated using Cronbach alpha and intraclass correlation coefficients for agreement. For construct validity, total scores of the German HOOS were correlated with those of the Western Ontario and McMaster Universities Osteoarthritis Index. The HOOS was also compared to the Medical Outcomes Study 36-Item Short-Form Health Survey. Cronbach alpha values for all German HOOS subscales were between .87 and .93. For test-retest reliability, the intraclass correlation coefficient for agreement was 0.85 for the total scores of the German HOOS. The Spearman rho for the Medical Outcomes Study 36-Item Short-Form Health Survey physical functioning subscale compared to the sum of all HOOS subscales was 0.71, and that for the Medical Outcomes Study 36-Item Short-Form Health Survey physical component summary was 0.97. The German HOOS has demonstrated adequate reliability and validity. Use of the German HOOS is recommended for assessment of patients after hip surgery, with the proviso that additional psychometric testing should be done in future research.
Cross-cultural adaptation and validation of the Japanese version of the new Knee Society Scoring System for osteoarthritic knee with total knee arthroplasty.

PubMed

Hamamoto, Yosuke; Ito, Hiromu; Furu, Moritoshi; Ishikawa, Masahiro; Azukizawa, Masayuki; Kuriyama, Shinichi; Nakamura, Shinichiro; Matsuda, Shuichi

2015-09-01

The purposes of this study were to translate the new Knee Society Score (KSS) into Japanese and to evaluate the construct and content validity, test-retest reliability, and internal consistency of the Japanese version of the new KSS. The Japanese version of the KSS was developed according to cross-cultural guidelines by using the "translation-back translation" method to ensure content validity. KSS data were then obtained from patients who had undergone total knee arthroplasty (TKA). The psychometric properties evaluated were as follows: for feasibility, response rate, and floor and ceiling effects; for construct validity, internal consistency using Cronbach's alpha, and correlations with quality of life. Construct validity was evaluated by using Spearman's correlation coefficient to quantify the correlation between the KSS and the Japanese version of the Oxford 12-item Knee Score or Short Form 36 Health Survey (SF-36) questionnaires. The Japanese version of the KSS was sent to 93 consecutive osteoarthritic patients who underwent primary TKA in our institution. Fifty-five patients completed the questionnaires and were included in this study. Neither a floor nor ceiling effect was observed. The reliability proved excellent in the majority of domains, with intraclass correlation coefficients of 0.65-0.88. Internal consistency, assessed by Cronbach's alpha, was good to excellent for all domains (0.78-0.94). All of the four domains of the KSS correlated significantly with the Oxford 12-item Knee Score. The activity and satisfaction domains of the KSS correlated significantly with all and the majority of subscales of the SF-36, respectively, whereas symptoms and expectation domains showed significant correlations only with bodily pain and vitality subscales and with the physical function, bodily pain, and vitality subscales, respectively. 
The Japanese version of the new KSS is a valid, reliable, and responsive instrument to capture subjective aspects of the functional symptoms and abilities of patients who undergo TKA.
Psychometric properties of the Brazilian version of the Orthognathic Quality of Life Questionnaire.

PubMed

Gava, Eveline Coutinho Baldoto; Miguel, José Augusto Mendes; de Araújo, Adriana Monteiro; de Oliveira, Branca Heloisa

2013-10-01

To assess the construct validity and reliability of the Brazilian version of the Orthognathic Quality of Life Questionnaire (B-OQLQ). A cross-sectional study was performed, and 101 patients in need of orthodontic-surgical treatment were recruited at a public hospital (Hospital Universitário Pedro Ernesto) and a public dental school (Faculdade de Odontologia da Universidade do Estado do Rio de Janeiro). The B-OQLQ was self-completed. The mean age of the participants was 26.51 ± 9.25 years, and most were female (58.42%; n = 59). The construct validity was assessed using Spearman's correlation coefficient between the B-OQLQ and the Oral Health Impact Profile (OHIP-14) scores and between the B-OQLQ and subjective health indicators' scores. The reliability was assessed in terms of internal consistency and stability (test-retest) using Cronbach's alpha and the intraclass correlation coefficient (ICC), respectively. Significant correlations were found between the B-OQLQ scores and the following: OHIP-14 total score (rs = 0.70, P < .001), perception of oral health (rs = -0.24, P = .02), single-item evaluation of quality of life (rs = -0.29, P = .03), satisfaction with physical appearance (rs = -0.40, P < .001), and satisfaction with facial appearance (rs = -0.39, P = .0001). Cronbach's alpha and the ICC were 0.95 and 0.90, respectively. The domains of B-OQLQ causing the most effect on the quality of life included "social aspects of deformity" (13.0 ± 10.54) and "facial aesthetics" (11.81 ± 6.23). The Brazilian version of the OQLQ was shown to be valid and reliable with good psychometric properties and might thus be considered an appropriate tool to assess the effect of dentofacial deformities on the quality of life of individuals with this condition.
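Several records in this listing report Cronbach's alpha as the index of internal consistency. A minimal sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score), is shown below. The function name and the response matrices are hypothetical illustrations, not data from any of the studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of scores.

    `responses` is a list of rows, one row per respondent, one column
    per item. Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(responses[0])  # number of items
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three items that move together perfectly.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(consistent))

# Hypothetical data: two items that are uncorrelated.
unrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]
print(cronbach_alpha(unrelated))
```

The two toy matrices bracket the usual range: items that vary together push alpha toward 1, while uncorrelated items push it toward 0.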
Psychometric performance of the National Eye Institute visual function questionnaire in Latinos and non-Latinos.

PubMed

Baker, Richard S; Bazargan, Mohsen; Calderón, José L; Hays, Ron D

2006-08-01

To compare the psychometric performance of Spanish versions of the 25-item National Eye Institute Visual Function Questionnaire (NEI VFQ-25) and the NEI VFQ-39 administered to Latino patients with the psychometric performance of the standard English NEI VFQ-25 and NEI VFQ-39 administered to non-Latino patients. Clinic-based cross-sectional survey. Four hundred three patients (160 Latinos and 243 non-Latinos) recruited from general ophthalmology clinics of an urban public hospital over a 6-month period. Structured face-to-face interviews were conducted in Spanish and English to collect data for the NEI VFQ-25 and NEI VFQ-39. We calculated the mean, standard deviation, and percentage of participants having the minimum (floor) and maximum (ceiling) possible score for each item and scale. Internal consistency reliability of the NEI VFQ-25 and NEI VFQ-39 was estimated using the Cronbach alpha and average inter-item correlation. Construct validity for the instruments was assessed by comparing scores for participants classified as having normal versus impaired visual acuity. Instrument scales for general health; general vision; ocular pain; near activities; distance activities; vision-specific social functioning, mental health, role difficulties, and dependency; driving; color vision; and peripheral vision. Internal consistency reliability was significantly lower in the Spanish version than in the English version for 3 scales of the NEI VFQ-25. More importantly, 3 scales in the Spanish version manifested inadequate reliability (alpha ≤ 0.70), compared with only 1 inadequately reliable subscale in the English version. Reliability coefficients associated with the Spanish NEI VFQ-39 scales exceeded commonly accepted minimum standards. Comparison of reliability coefficients between Latino and non-Latino subgroups demonstrated statistically significant differences for 4 scales: Ocular Pain, Mental Health, Role Difficulties, and Dependency.
In each case, the Latino group had the lower internal consistency reliability. However, only for the Ocular Pain subscale was reliability both significantly lower and inadequate (alpha < 0.70). Overall performance of the NEI VFQ in Latino populations is adequate. However, in the absence of modifications to improve the reliability of specific Spanish version subscales, comparisons between Latino and non-Latino subgroups using the NEI VFQ must be interpreted with appropriate caution.
Score Reliability and Construct Validity of the Flinn Performance Screening Tool for Adults With Symptoms of Carpal Tunnel Syndrome

PubMed Central

Flinn, Sharon R.; Pease, William S.; Freimer, Miriam L.

2013-01-01

OBJECTIVE We investigated the psychometric properties of the Flinn Performance Screening Tool (FPST) for people referred with symptoms of carpal tunnel syndrome (CTS). METHOD An occupational therapist collected data from 46 participants who completed the Functional Status Scale (FSS) and FPST after the participants’ nerve conduction velocity study to test convergent and contrasted-group validity. RESULTS Seventy-four percent of the participants had abnormal nerve conduction studies. Cronbach’s α coefficients for subscale and total scores of the FPST ranged from .96 to .98. Intrarater reliability for six shared items of the FSS and the FPST was supported by high agreement (71%) and a fair κ statistic (.36). Strong to moderate positive relationships were found between the FSS and FPST scores. Functional status differed significantly among severe, mild, and negative CTS severity groups. CONCLUSION The FPST shows adequate psychometric properties as a client-centered screening tool for occupational performance of people referred for symptoms of CTS. PMID:22549598
The Chinese version of the Severe Respiratory Insufficiency questionnaire for patients with chronic hypercapnic chronic obstructive pulmonary disease receiving non-invasive positive pressure ventilation.

PubMed

Chen, Rongchang; Guan, Lili; Wu, Weiliang; Yang, Zhicong; Li, Xiaoying; Luo, Qun; Liang, Zhenyu; Wang, Fengyan; Guo, Bingpeng; Huo, Yating; Yang, Yuqiong; Zhou, Luqian

2017-08-28

The Severe Respiratory Insufficiency (SRI) questionnaire is the best assessment tool for health-related quality of life in patients with chronic obstructive pulmonary disease (COPD) receiving non-invasive positive pressure ventilation (NIPPV). This study aimed to translate the SRI Questionnaire into Chinese and to validate it. Prospective validation study. A total of 149 participants with chronic hypercapnic COPD receiving NIPPV completed the study. The SRI questionnaire was translated into Chinese using translation and back-translation. Reliability was gauged using Cronbach's α coefficient. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were used to assess construct validity. Content validity was confirmed by evaluating the relationship between the score of each item and the total score of the relevant subscale. Cronbach's α coefficients for each subscale and summary scale were above 0.7. Using EFA, one factor was extracted from the anxiety and summary scales and two factors were extracted from the remaining six subscales. Based on the EFA results, subsequent CFA revealed a good model fit for each subscale, but the extracted factors of each subscale were correlated. Content validity was confirmed by the good relationship between the score of each item and the total score of the relevant subscale. The Chinese version of the SRI questionnaire is valid and reliable for patients with chronic hypercapnic COPD receiving NIPPV in China. NCT02499718.
Intra-rater and inter-rater reliability of ultrasonographic measurements of acromion-greater tuberosity distance in patients with post-stroke hemiplegia.

PubMed

Kumar, Praveen; Cruziah, Reynold; Bradley, Michael; Gray, Selena; Swinkels, Annette

2016-06-01

Glenohumeral subluxation (GHS) is reported in up to 81% of patients with stroke. Ultrasonographic measurements of GHS by measuring the acromion-greater tuberosity (AGT) have been found to be reliable for experienced raters. The primary aim was to assess the intra-rater reliability of measurements of AGT distance in people with stroke following a short course of rater training. A secondary aim was to compare the inter-rater reliability of these measurements between novice and experienced raters. Patients with stroke (n = 16; 5 men, 11 women; 74 ± 10 years) with 1-sided weakness who gave informed consent were recruited. Ultrasonographic measurements were recorded at the bedside by two physiotherapists with patients seated upright in a hospital chair. Reliability was assessed by intra-class correlation coefficients (ICCs) and the standard error of measurements (SEM). Minimum detectable change (MDC90) scores were used to estimate the magnitude of change that is likely to exceed measurement error. Mean ± SD AGT distances on the affected and unaffected sides for rater 1 were 2.2 ± 0.7 and 1.7 ± 0.4 cm, respectively. Corresponding values for rater 2 were 2.5 ± 0.6 and 2.0 ± 0.4 cm. Intra-class correlation coefficient values for the affected and unaffected shoulders for rater 1 were 0.96 and 0.91, respectively. Corresponding values for rater 2 were 0.95 and 0.90. SEM and MDC90 for both affected and unaffected shoulders were ≤ 0.2 cm. Inter-rater reliability coefficients were 0.86 for the affected shoulder and 0.76 for the unaffected shoulder. Ultrasonographic measurement of AGT distance demonstrates excellent intra-rater reliability for a novice rater. Inter-rater reliability of ultrasonographic measurement of AGT also demonstrates good reliability between novice and experienced raters.
Measuring illness insight in patients with alcohol-related cognitive dysfunction using the Q8 questionnaire: a validation study

PubMed Central

Walvoort, Serge JW; van der Heijden, Paul T; Kessels, Roy PC; Egger, Jos IM

2016-01-01

Aim Impaired illness insight may hamper treatment outcome in patients with alcohol-related cognitive deficits. In this study, a short questionnaire for the assessment of illness insight (eg, the Q8) was investigated in patients with Korsakoff’s syndrome (KS) and in alcohol use disorder (AUD) patients with mild neurocognitive deficits. Methods First, reliability coefficients were computed and internal structure was investigated. Then, comparisons were made between patients with KS and patients with AUD. Furthermore, correlations with the Dysexecutive Questionnaire (DEX) were investigated. Finally, Q8 total scores were correlated with neuropsychological tests for processing speed, memory, and executive function. Results Internal consistency of the Q8 was acceptable (ie, Cronbach’s α =0.73). The Q8 items represent one factor, and scores differ significantly between AUD and KS patients. The Q8 total score, related to the DEX discrepancy score and scores on neuropsychological tests as was hypothesized, indicates that a higher degree of illness insight is associated with a higher level of cognitive functioning. Conclusion The Q8 is a short, valid, and easy-to-administer questionnaire to reliably assess illness insight in patients with moderate-to-severe alcohol-related cognitive dysfunction. PMID:27445476
Fatigue in children: reliability and validity of the Dutch PedsQL™ Multidimensional Fatigue Scale.

PubMed

Gordijn, M Suzanne; Cremers, Eline M P; Kaspers, Gertjan J L; Gemke, Reinoud J B J

2011-09-01

The aim of the study is to report on the feasibility, reliability, validity, and the norm-references of the Dutch version of the PedsQL™ Multidimensional Fatigue Scale. The study participants were 497 parents of children aged 2-18 years and 366 children aged 5-18 years from various day care facilities, elementary schools, and a high school who completed the Dutch version of the PedsQL™ Multidimensional Fatigue Scale. The number of missing items was minimal. All scales showed satisfactory internal consistency reliability, with Cronbach's coefficient alpha exceeding 0.70. Test-retest reliability was good to excellent (ICCs 0.68-0.84) and inter-observer reliability varied from moderate to excellent (ICCs 0.56-0.93) for total scores. Parent/child concordance for total scores was poor to good (ICCs 0.25-0.68). The PedsQL™ Multidimensional Fatigue Scale was able to distinguish between healthy children and children with an impaired health condition. The Dutch version of the PedsQL™ Multidimensional Fatigue Scale demonstrates adequate feasibility, reliability, and validity in another sociocultural context. With the obtained norm-references, it can be utilized as a tool in the evaluation of fatigue in healthy and chronically ill children aged 2-18 years.
A validity and reliability study of the Turkish Multidimensional Assessment of Fatigue (MAF) scale in chronic musculoskeletal physical therapy patients.

PubMed

Yildirim, Yücel; Ergin, Gülbin

2013-01-01

Fatigue is primarily a subjective experience and self-report is the most common approach used to measure fatigue. Numerous self-report instruments have been developed to measure fatigue. Unfortunately, each of these measures was tailored for the situation in which fatigue was studied. Therefore, the aim of this study was to determine the reliability and validity of the Turkish language version of the Multidimensional Assessment of Fatigue Scale (MAF-T) in chronic musculoskeletal physical therapy patients. The MAF-T was supplied by the MAPI Research Institute, and 69 chronic musculoskeletal physical therapy patients were evaluated. To validate MAF-T, all participants completed the MAF-T and Short Form-36 (SF-36). The MAF was administered again one week later to assess test-retest reliability. The internal consistency reliability of the MAF-T, estimated using Cronbach α, was 0.90, and the Intraclass Correlation Coefficient (ICC) reliability was 0.96. Item-discriminant validity was calculated between r = 0.14 and r = 0.82. The correlations between the total scores of the MAF-T scale and the subscale scores of SF-36 were negative and significant (p < 0.01). The MAF-T is a valid and reliable scale for assessing fatigue in chronic musculoskeletal physical therapy patients.
Cross-cultural adaptation and validation of the Turkish version of the pain catastrophizing scale among patients with ankylosing spondylitis

PubMed Central

İlçin, Nursen; Gürpınar, Barış; Bayraktar, Deniz; Savcı, Sema; Çetin, Pınar; Sarı, İsmail; Akkoç, Nurullah

2016-01-01

[Purpose] This study describes the cultural adaptation, validation, and reliability of the Turkish version of the Pain Catastrophizing Scale in patients with ankylosing spondylitis. [Methods] The validity of the Turkish version of the Pain Catastrophizing Scale was assessed by evaluating data quality (missing data and floor and ceiling effects), principal components analysis, internal consistency (Cronbach’s alpha), and construct validity (Spearman’s rho). Reproducibility analyses included standard measurement error, minimum detectable change, limits of agreement, and intraclass correlation coefficients. [Results] Sixty-four adult patients with ankylosing spondylitis with a mean age of 42.2 years completed the study. Factor analysis revealed that all questionnaire items could be grouped into two factors. Excellent internal consistency was found, with a Cronbach’s alpha value of 0.95. Reliability analyses showed an intraclass correlation coefficient (95% confidence interval) of 0.96 for the total score. There was a low correlation coefficient between the Turkish version of the Pain Catastrophizing Scale and body mass index, pain levels at rest and during activity, health-related quality of life, and fear and avoidance behaviors. [Conclusion] The results of this study indicate that the Turkish version of the Pain Catastrophizing Scale is a valid and reliable clinical and research tool for patients with ankylosing spondylitis. PMID:26957778
The Children's Play Therapy Instrument (CPTI). Description, development, and reliability studies.

PubMed

Kernberg, P F; Chazan, S E; Normandin, L

1998-01-01

The Children's Play Therapy Instrument (CPTI), its development, and reliability studies are described. The CPTI is a new instrument to examine a child's play activity in individual psychotherapy. Three independent raters used the CPTI to rate eight videotaped play therapy vignettes. Results were compared with the authors' consensual scores from a preliminary study. Generally good to excellent levels of interrater reliability were obtained for the independent raters on intraclass correlation coefficients for ordinal categories of the CPTI. Likewise, kappa levels were acceptable to excellent for nominal categories of the scale. The CPTI holds promise to become a reliable measure of play activity in child psychotherapy. Further research is needed to assess discriminant validity of the CPTI for use as a diagnostic tool and as a measure of process and outcome.
The reliability and validity of fatigue measures during multiple-sprint work: an issue revisited.

PubMed

Glaister, Mark; Howatson, Glyn; Pattison, John R; McInnes, Gill

2008-09-01

The ability to repeatedly produce a high-power output or sprint speed is a key fitness component of most field and court sports. The aim of this study was to evaluate the validity and reliability of eight different approaches to quantify this parameter in tests of multiple-sprint performance. Ten physically active men completed two trials of each of two multiple-sprint running protocols with contrasting recovery periods. Protocol 1 consisted of 12 x 30-m sprints repeated every 35 seconds; protocol 2 consisted of 12 x 30-m sprints repeated every 65 seconds. All testing was performed in an indoor sports facility, and sprint times were recorded using twin-beam photocells. All but one of the formulae showed good construct validity, as evidenced by similar within-protocol fatigue scores. However, the assumptions on which many of the formulae were based, combined with poor or inconsistent test-retest reliability (coefficient of variation range: 0.8-145.7%; intraclass correlation coefficient range: 0.09-0.75), suggested many problems regarding logical validity. In line with previous research, the results support the percentage decrement calculation as the most valid and reliable method of quantifying fatigue in tests of multiple-sprint performance.
Reliability of intra-oral quantitative sensory testing (QST) in patients with atypical odontalgia and healthy controls - a multicentre study.

PubMed

Baad-Hansen, L; Pigg, M; Yang, G; List, T; Svensson, P; Drangsholt, M

2015-02-01

The reliability of a comprehensive intra-oral quantitative sensory testing (QST) protocol has not been examined systematically in patients with chronic oro-facial pain. The aim of the present multicentre study was to examine test-retest and interexaminer reliability of intra-oral QST measures in terms of absolute values and z-scores as well as within-session coefficients of variation (CV) values in patients with atypical odontalgia (AO) and healthy pain-free controls. Forty-five patients with AO and 68 healthy controls were subjected to bilateral intra-oral gingival QST and unilateral extratrigeminal QST (thenar) on three occasions (twice on 1 day by two different examiners and once approximately 1 week later by one of the examiners). Intra-class correlation coefficients and kappa values for interexaminer and test-retest reliability were computed. Most of the standardised intra-oral QST measures showed fair to excellent interexaminer (9-12 of 13 measures) and test-retest (7-11 of 13 measures) reliability. Furthermore, no robust differences in reliability measures or within-session variability (CV) were detected between patients with AO and the healthy reference group. These reliability results in chronic orofacial pain patients support earlier suggestions based on data from healthy subjects that intra-oral QST is sufficiently reliable for use as a part of a comprehensive evaluation of patients with somatosensory disturbances or neuropathic pain in the trigeminal region.
Reliability and fall experience discrimination of Cross Step Moving on Four Spots Test in the elderly.

PubMed

Yamaji, Shunsuke; Demura, Shinichi

2013-07-01

To examine the reliability and fall experience discrimination of the Cross Step Moving on Four Spots Test (CSFT) and the relationship between CSFT and fall-related physical function. The reliability of the CSFT was examined in a test-retest format with the same tester. Fall history, fall risk, fear of falling, activities of daily living (ADL), and various physical parameters were measured for all participants. A community center and university medical school. Elderly community-dwelling subjects (N=533; 62 men, 471 women) aged 65 to 94 years living independently. Not applicable. Time to complete all the CSFT steps required, fall risk score, ADL score, and fall-related physical function (isometric muscle strength: toe grip, plantar flexion, knee extension, hip flexion, hand grip; balance: 1-leg standing time with eyes open, functional reach test using an elastic stick; and gait: 10-m maximal walking speed). The trial-to-trial reliability test indicated good reliability of the CSFT in both sexes (intraclass correlation coefficient =.833 in men, .825 in women). However, trial-to-trial errors increased with an increase in the CSFT values in both sexes. Significant correlations were observed between the CSFT values and scores for most fall-related physical function tests in both sexes. However, the correlation coefficient for all significant correlations was <0.5. Two-way analysis of variance (sex × fall experience) revealed that the fall experience is a significant factor affecting CSFT values; values in fallers were significantly lower than those in nonfallers. The odds ratios in logistic regression analysis were significant in both sexes (men, 1.35; women, 1.48). As determined by the Youden index, the optimal cutoff value for identifying fall experience was 7.32 seconds, with an area under the curve of .676. The CSFT can detect fall experience and is useful in the evaluation of different fall-related physical functions including muscle strength, balance, and mobility. 
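The CSFT abstract above selects its cutoff with the Youden index, J = sensitivity + specificity − 1, taking the threshold that maximizes J. The sketch below illustrates that selection under stated assumptions: a higher score is treated as flagging a case, candidate cutoffs are the observed scores themselves, and the data are hypothetical rather than drawn from the study.

```python
def youden_cutoff(scores, is_case):
    """Return (cutoff, J) maximizing the Youden index.

    Assumes a score at or above the cutoff is classified as a case.
    `is_case` holds the true status for each score; both groups must
    be non-empty.
    """
    best = None
    for cut in sorted(set(scores)):
        pairs = list(zip(scores, is_case))
        tp = sum(1 for s, c in pairs if c and s >= cut)
        fn = sum(1 for s, c in pairs if c and s < cut)
        tn = sum(1 for s, c in pairs if not c and s < cut)
        fp = sum(1 for s, c in pairs if not c and s >= cut)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        # Keep the first cutoff that reaches the highest J so far.
        if best is None or j > best[1]:
            best = (cut, j)
    return best


# Hypothetical completion times (seconds) and fall status.
times = [5, 6, 7, 8, 9, 10]
fell = [False, False, False, True, True, True]
print(youden_cutoff(times, fell))
```

If lower scores indicated a case instead, the comparisons would simply be reversed; the abstract does not specify the implementation, so the direction here is only an assumption.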
Validity and test-retest reliability of the self-completion adult social care outcomes toolkit (ASCOT-SCT4) with adults with long-term physical, sensory and mental health conditions in England.

PubMed

Rand, Stacey; Malley, Juliette; Towers, Ann-Marie; Netten, Ann; Forder, Julien

2017-08-18

The Adult Social Care Outcomes Toolkit (ASCOT-SCT4) is a multi-attribute utility index designed for the evaluation of long-term social care services. The measure comprises eight attributes that capture aspects of social care-related quality of life. The instrument has previously been validated with a sample of older adults who used home care services in England. This paper aims to demonstrate the instrument's test-retest reliability and provide evidence for its validity in a diverse sample of adults who use publicly-funded, community-based social care in England. A survey of 770 social care service users was conducted in England. A subsample of 100 service users participated in a follow-up interview between 7 and 21 days after baseline. Spearman rank correlation coefficients between the ASCOT-SCT4 index score and the EQ-5D-3L, the ICECAP-A or ICECAP-O and overall quality of life were used to assess convergent validity. Data on variables hypothesised to be related to the ASCOT-SCT4 index score, as well as rating of individual attributes, were also collected. Hypothesised relationships were tested using one-way ANOVA or Fisher's exact test. Test-retest reliability was assessed using the intra-class correlation coefficient for the ASCOT-SCT4 index score at baseline and follow-up. There were moderate to strong correlations between the ASCOT-SCT4 index and EQ-5D-3L, the ICECAP-A or ICECAP-O, and overall quality of life (all correlations ≥ 0.3). The construct validity was further supported by statistically significant hypothesised relationships between the ASCOT-SCT4 index and individual characteristics in univariate and multivariate analysis. There was also further evidence for the construct validity for the revised Food and drink and Dignity items. The test-retest reliability was considered to be good (ICC = 0.783; 95% CI: 0.678-0.857).
The ASCOT-SCT4 index has good test-retest reliability for adults with physical or sensory disabilities who use social care services. The index score and the attributes appear to be valid for adults receiving social care for support reasons connected to underlying mental health problems, and physical or sensory disabilities. Further reliability testing with a wider sample of social care users is warranted, as is further exploration of the relationship between the ASCOT-SCT4, ICECAP-A/O and EQ-5D-3L indices.
Reliability and validity of the Adolescent Stress Questionnaire in a sample of European adolescents - the HELENA study

PubMed Central

2011-01-01

Background Since stress is hypothesized to play a role in the etiology of obesity during adolescence, research on associations between adolescent stress and obesity-related parameters and behaviours is essential. Due to lack of a well-established recent stress checklist for use in European adolescents, the study investigated the reliability and validity of the Adolescent Stress Questionnaire (ASQ) for assessing perceived stress in European adolescents. Methods The ASQ was translated into the languages of the participating cities (Ghent, Stockholm, Vienna, Zaragoza, Pecs and Athens) and was implemented within the HELENA cross-sectional study. A total of 1140 European adolescents provided a valid ASQ, comprising 10 component scales, used for internal reliability (Cronbach α) and construct validity (confirmatory factor analysis or CFA). Contributions of socio-demographic (gender, age, pubertal stage, socio-economic status) characteristics to the ASQ score variances were investigated. Two-hundred adolescents also provided valid saliva samples for cortisol analysis to compare with the ASQ scores (criterion validity). Test-retest reliability was investigated using two ASQ assessments from 37 adolescents. Results Cronbach α-values of the ASQ scales (0.57 to 0.88) demonstrated a moderate internal reliability of the ASQ, and intraclass correlation coefficients (0.45 to 0.84) established an insufficient test-retest reliability of the ASQ. The adolescents' gender (girls had higher stress scores than boys) and pubertal stage (those in a post-pubertal development had higher stress scores than others) significantly contributed to the variance in ASQ scores, while their age and socio-economic status did not. CFA results showed that the original scale construct fitted moderately with the data in our European adolescent population. 
Only in boys, four out of 10 ASQ scale scores were a significant positive predictor for baseline wake-up salivary cortisol, suggesting a rather poor criterion validity of the ASQ, especially in girls. Conclusions In our European adolescent sample, the ASQ had an acceptable internal reliability and construct validity and the adolescents' gender and pubertal stage systematically contributed to the ASQ variance, but its test-retest reliability and criterion validity were rather poor. Overall, the utility of the ASQ for assessing perceived stress in adolescents across Europe is uncertain and some aspects require further examination. PMID:21943341
Reliability and validity of the Chinese version of the Pediatric Quality of Life Inventory(TM) (PedsQL(TM)) 3.0 neuromuscular module in children with Duchenne muscular dystrophy.

PubMed

Hu, Jun; Jiang, Li; Hong, Siqi; Cheng, Li; Kong, Min; Ye, Yuanzhen

2013-03-15

The Pediatric Quality of Life Inventory(TM) (PedsQL(TM)) is a widely used instrument to measure pediatric health-related quality of life (HRQOL) in children aged 2 to 18 years. The current study aimed to evaluate the reliability and validity of the Chinese version of the PedsQL(TM) 3.0 Neuromuscular Module in children with Duchenne muscular dystrophy (DMD). The PedsQL(TM) 3.0 Neuromuscular Module was translated into Chinese following PedsQL(TM) Measurement Model Translation Methodology. The Chinese version scale was administered to 56 children with DMD and their parents, and the psychometric properties were evaluated. The missing value percentages for each item of the Chinese version scale ranged from 0.00% to 0.54%. Internal consistency reliability approached or exceeded the minimum reliability standard of α = 0.7 (child α = 0.81, parent α = 0.86). Test-retest reliability was satisfactory, with intraclass correlation coefficients (ICCs) of 0.66 for children and 0.88 for parents (P < 0.01). Correlation coefficients between items and their hypothesized subscales were higher than those with other subscales (P < 0.05). The subscale of "About My/My Child's Neuromuscular Disease" significantly related to mobility and stair climbing status (Child t = 2.21, Parent t = 2.83, P < 0.05). The inter-correlations among the Chinese version of the PedsQL(TM) 3.0 Neuromuscular Module and the PedsQL(TM) 4.0 Generic Core Scales had medium to large effect sizes (P < 0.05). The child self-report scores were in moderate agreement with the parent proxy-report scores (ICC = 0.51, P < 0.05). The Chinese version of the PedsQL(TM) 3.0 Neuromuscular Module has acceptable psychometric properties. It is a reliable measure of disease-specific HRQOL in Chinese children with DMD.
Construct validity and reliability of the Finnish version of the Knee Injury and Osteoarthritis Outcome Score.

PubMed

Multanen, Juhani; Honkanen, Mikko; Häkkinen, Arja; Kiviranta, Ilkka

2018-05-22

The Knee Injury and Osteoarthritis Outcome Score (KOOS) is a commonly used knee assessment and outcome tool in both clinical work and research. However, it has not been formally translated and validated in Finnish. The purpose of this study was to translate and culturally adapt the KOOS questionnaire into Finnish and to determine its validity and reliability among Finnish middle-aged patients with knee injuries. KOOS was translated and culturally adapted from English into Finnish. Subsequently, 59 patients with knee injuries completed the Finnish version of KOOS, Western Ontario and McMaster Osteoarthritis Index (WOMAC), Short-Form 36 Health Survey (SF-36) and Numeric Pain Rating Scale (Pain-NRS). The same KOOS questionnaire was re-administered 2 weeks later. Psychometric assessment of the Finnish KOOS was performed by testing its construct validity and reliability by using internal consistency, test-retest reliability and measurement error. The floor and ceiling effects were also examined. The cross-cultural adaptation revealed only minor cultural differences and was well received by the patients. For construct validity, high to moderate Spearman's correlation coefficients were found between the KOOS subscales and the WOMAC, SF-36, and Pain-NRS subscales. Cronbach's alpha ranged from 0.79 to 0.96 across subscales, indicating acceptable internal consistency. The test-retest reliability was good to excellent, with intraclass correlation coefficients ranging from 0.73 to 0.86 for all KOOS subscales. The minimal detectable change ranged from 17 to 34 on an individual level and from 2 to 4 on a group level. No floor or ceiling effects were observed. This study yielded an appropriately translated and culturally adapted Finnish version of KOOS which demonstrated good validity and reliability. Our data indicate that the Finnish version of KOOS is suitable for assessment of the knee status of Finnish patients with different knee complaints. Further studies are needed to evaluate the predictive ability of KOOS in the Finnish population.
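The relation between individual-level and group-level minimal detectable change (MDC) in the record above can be sketched as follows. The abstract does not state which formulas were used; this sketch assumes the commonly used conventions SEM = SD × sqrt(1 − ICC), MDC95 = 1.96 × sqrt(2) × SEM, and group-level MDC = individual-level MDC / sqrt(n). The function names and the SD value in the example are hypothetical.

```python
import math


def sem(sd, icc):
    """Standard error of measurement from the score SD and a test-retest ICC (assumed convention)."""
    return sd * math.sqrt(1.0 - icc)


def mdc_individual(sd, icc, z=1.96):
    """Individual-level minimal detectable change at 95% confidence (assumed convention)."""
    return z * math.sqrt(2.0) * sem(sd, icc)


def mdc_group(mdc_ind, n):
    """Group-level MDC, assuming it is the individual-level MDC divided by sqrt(n)."""
    return mdc_ind / math.sqrt(n)


# Hypothetical example: SD of 10 points and ICC of 0.75.
example_individual = mdc_individual(sd=10.0, icc=0.75)

# Individual-level range and sample size reported in the abstract (17 to 34, n = 59).
group_low = mdc_group(17, 59)
group_high = mdc_group(34, 59)
```

Under these assumptions, the reported individual-level range of 17 to 34 with n = 59 gives group-level values of roughly 2.2 and 4.4, which is consistent with the reported group-level range of 2 to 4.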
Cross-cultural adaptation and validation of the reliability of the Thai version of the Hip disability and Osteoarthritis Outcome Score (HOOS).

PubMed

Trathitiphan, Warayos; Paholpak, Permsak; Sirichativapee, Winai; Wisanuyotin, Taweechok; Laupattarakasem, Pat; Sukhonthamarn, Kamolsak; Jeeravipoolvarn, Polasak; Kosuwon, Weerachai

2016-10-01

HOOS was developed as an extension of the Western Ontario and McMaster Universities' Osteoarthritis Index questionnaire for measuring symptoms and functional limitations related to the hip(s) of patients with osteoarthritis. To determine the validity and reliability of the Thai version of the Hip disability and Osteoarthritis Outcome Score (HOOS) in hip osteoarthritis, the original HOOS was translated into Thai according to international recommendations. Patients with hip osteoarthritis (n = 57; 25 males) were asked to complete the Thai version of HOOS twice: once and then again after a 3-week interval. The test-retest reliability was analyzed using the intraclass correlation coefficient (ICC). Internal consistency was analyzed using Cronbach's alpha, while the construct validity was tested by comparing the Thai HOOS with the Thai modified SF-36 and calculating Spearman's rank correlation coefficients. The Thai HOOS produced good reliability (i.e., the ICC was greater than 0.9 in all five subscales). Cronbach's alpha values showed that the Thai HOOS had high internal consistency (Cronbach's alpha greater than 0.8), especially for the pain and ADL subscales (0.89 and 0.90, respectively). All five subscales of the Thai HOOS showed moderate Spearman's rank correlations with the Bodily Pain subscale of the Thai SF-36. The pain subscale of the Thai HOOS had a high correlation with the Vitality and Social Function subscales of the Thai SF-36 (r = 0.55 and 0.54), with which the symptom subscale had a moderate correlation. The Thai version of HOOS had excellent internal consistency, excellent test-retest reliability, and good construct validity. It can be used as a reliable tool for assessing quality of life for patients with hip osteoarthritis in Thailand.
