Marawar, Satyajit V; Madom, Ian A; Palumbo, Mark; Tallarico, Richard A; Ordway, Nathaniel R; Metkar, Umesh; Wang, Dongliang; Green, Adam; Lavelle, William F
2017-01-01
The treating surgeon's visual assessment of axial MRI images to ascertain the degree of stenosis has a critical impact on surgical decision-making. The purpose of this study was to prospectively analyze the impact of surgeon experience on inter-observer and intra-observer reliability of assessing severity of spinal stenosis on MRIs by spine surgeons directly involved in surgical decision-making. Seven fellowship-trained spine surgeons reviewed MRI studies of 30 symptomatic patients with lumbar stenosis and graded the stenosis in the central canal, the lateral recess and the foramen at T12-L1 to L5-S1 as none, mild, moderate or severe. No specific instructions were provided as to what constituted mild, moderate, or severe stenosis. Two surgeons were "senior" (>fifteen years of practice experience), two were "intermediate" (>four years of practice experience), and three were "junior" (<one year of practice experience). The concordance correlation coefficient (CCC) was calculated to assess inter-observer reliability. Seven MRI studies were duplicated and randomly re-read to evaluate intra-observer reliability. Surgeon experience was found to be a strong predictor of inter-observer reliability. Inter-observer reliability in the senior group was significantly higher than in the junior group when assessing central (p<0.001), foraminal (p=0.005) and lateral recess (p=0.001) stenosis. The senior group also showed significantly higher inter-observer reliability than the intermediate group when assessing foraminal stenosis (p=0.036). For intra-observer reliability, the results were contrary to those found for inter-observer reliability. Inter-observer reliability of assessing stenosis on MRIs increases with surgeon experience. Lower intra-observer reliability values among the senior group, although not clearly explained, may be due to the small number of MRIs evaluated and the quality of the MRI images. Level of evidence: Level 3.
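The abstract above names the concordance correlation coefficient (CCC) without giving its form. The following is a minimal sketch, assuming the two-rater version defined by Lin and assuming the four grades are coded 0 to 3; the study's own multi-rater computation is not described and may differ. The function name and the sample grades are hypothetical.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two raters.

    Uses population (divide-by-n) moments:
        CCC = 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both raters give the same constant grade")
    return 2 * cov_xy / denominator


# Hypothetical grades (0 = none, 1 = mild, 2 = moderate, 3 = severe).
rater_a = [0, 1, 2, 3]
rater_b = [1, 2, 3, 3]
print(round(concordance_correlation(rater_a, rater_b), 3))
```

Unlike Pearson's correlation, this coefficient falls below 1 when one rater grades systematically higher than the other, even if the two sets of grades rise and fall together.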
Exploring Differences in Measurement and Reporting of Classroom Observation Inter-Rater Reliability
ERIC Educational Resources Information Center
Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca
2018-01-01
Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…
Reliability of joint count assessment in rheumatoid arthritis: a systematic literature review.
Cheung, Peter P; Gossec, Laure; Mak, Anselm; March, Lyn
2014-06-01
Joint counts are central to the assessment of rheumatoid arthritis (RA) but reliability is an issue. To evaluate the reliability and agreement of joint counts (intra-observer and inter-observer) by health care professionals (physicians, nurses, and metrologists) and patients in RA, and the impact of training and standardization on joint count reliability through a systematic literature review. Articles reporting joint count reliability or agreement in RA in PubMed, EMBase, and the Cochrane library between 1960 and 2012 were selected. Data were extracted regarding tender joint counts (TJCs) and swollen joint counts (SJCs) derived by physicians, metrologists, or patients for intra-observer and inter-observer reliability. In addition, methods and effects of training or standardization were extracted. Statistics expressing reliability such as intraclass correlation coefficients (ICCs) were extracted. Data analysis was primarily descriptive due to high heterogeneity. Twenty-eight studies on health care professionals (HCP) and 20 studies on patients were included. Intra-observer reliability for TJCs and SJCs was good for HCPs and patients (range of ICC: 0.49-0.98). Inter-observer reliability between HCPs for TJCs was higher than for SJCs (range of ICC: 0.64-0.88 vs. 0.29-0.98). Patient inter-observer reliability with HCPs as comparators was better for TJCs (range of ICC: 0.31-0.91) compared to SJCs (0.16-0.64). Nine studies (7 with HCPs and 2 with patients) evaluated consensus or training, with improvement in reliability of TJCs but conflicting evidence for SJCs. Intra- and inter-observer reliability was high for TJCs for HCPs and patients: among all groups, reliability was better for TJCs than SJCs. Inter-observer reliability of SJCs was poorer for patients than HCPs. Data were inconclusive regarding the potential for training to improve SJC reliability. Overall, the results support further evaluation for patient-reported joint counts as an outcome measure. 
© 2013 Published by Elsevier Inc.
Smith, Toby O; Clark, Allan; Neda, Sophia; Arendt, Elizabeth A; Post, William R; Grelsamer, Ronald P; Dejour, David; Almqvist, Karl Fredrik; Donell, Simon T
2012-08-01
An accurate physical examination of patients with patellar instability is an important aspect of the diagnosis and treatment. While previous studies have assessed the diagnostic accuracy of such physical examination tests, little has been undertaken to assess the inter- and intra-tester reliability of such techniques. The purpose of this study was to determine the inter- and intra-tester reliability of the physical examination tests used for patients with patellar instability. Five patients (10 knees) with bilateral recurrent patellar instability were assessed by five members of the International Patellofemoral Study Group. Each surgeon assessed each patient twice using 18 reported physical examination tests. The inter- and intra-observer reliability was assessed using weighted Kappa statistics with 95% confidence intervals. The findings of the study suggested that there was very poor inter-observer reliability for the majority of the physical tests, with only the assessments of patellofemoral crepitus, foot arch position and the J-sign presenting with fair to moderate agreement. The intra-observer reliability indicated largely moderate to substantial agreement between the first and second tests performed by each assessor, with the greatest agreement seen for the assessment of tibial torsion, popliteal angle and the Bassett's sign. For the common physical examination tests used in the management of patients with patellar instability, inter-observer reliability is poor, while intra-observer reliability is moderate. Standardization of physical exam assessments and further study of these results among different clinicians and more divergent patient groups are indicated. Copyright © 2011 Elsevier B.V. All rights reserved.
Liu, Ying-Buh; Yang, Stephen S; Hsieh, Cheng-Hsing; Lin, Chia-Da; Chang, Shang-Jen
2014-05-01
To evaluate the inter-observer, intra-observer and intra-individual reliability of uroflowmetry and post-void residual urine (PVR) tests in adult men. Healthy volunteers aged over 40 years were enrolled. Every participant underwent two sets of uroflowmetry and PVR tests with a 2-week interval between the tests. The uroflowmetry tests were interpreted by four urologists independently. Uroflowmetry curves were classified as bell-shaped, bell-shaped with tail, obstructive, restrictive, staccato, interrupted and tower-shaped and scored from 1 (highly abnormal) to 5 (absolutely normal). The agreements between the observers, interpretations and tests within individuals were analyzed using kappa statistics and intraclass correlation coefficients. Generalizability theory with decision analysis was used to determine how many observers, tests, and interpretations were needed to obtain an acceptable reliability (> 0.80). Of 108 volunteers, we randomly selected the uroflowmetry results from 25 participants for the evaluation of reliability. The mean age of the studied adults was 55.3 years. The intra-individual and intra-observer reliability on uroflowmetry tests ranged from good to very good. However, the inter-observer reliability on normalcy and specific type of flow pattern were relatively lower. In generalizability theory, three observers were needed to obtain an acceptable reliability on normalcy of uroflow pattern if the patient underwent uroflowmetry tests twice with one observation. The intra-individual and intra-observer reliability on uroflowmetry tests were good while the inter-observer reliability was relatively lower. To improve inter-observer reliability, the definition of uroflowmetry should be clarified by the International Continence Society. © 2013 Wiley Publishing Asia Pty Ltd.
Karstad, Kristina; Rugulies, Reiner; Skotte, Jørgen; Munch, Pernille Kold; Greiner, Birgit A; Burdorf, Alex; Søgaard, Karen; Holtermann, Andreas
2018-05-01
The aim of the study was to develop and evaluate the reliability of the "Danish observational study of eldercare work and musculoskeletal disorders" (DOSES) observation instrument to assess physical and psychosocial risk factors for musculoskeletal disorders (MSD) in eldercare work. During 1.5 years, sixteen raters conducted 117 inter-rater observations from 11 nursing homes. Reliability was evaluated using percent agreement and Gwet's AC1 coefficient. Of the 18 examined items, inter-rater reliability was excellent for 7 items (AC1 >0.75), fair to good for 7 items (AC1 0.40-0.75) and poor for 2 items (AC1 0-0.40). For 2 items there was no agreement between the raters (AC1 <0). The reliability did not differ between the first and second half of the data collection period and the inter-rater observations were representative regarding occurrence of events in eldercare work. The instrument is appropriate for assessing physical and psychosocial risk factors for MSD among eldercare workers. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
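The abstract above reports Gwet's AC1 per item without showing how the coefficient is formed. A minimal sketch follows, assuming the two-rater form of AC1 and a fixed list of answer categories; the function name, the category labels, and the sample ratings are hypothetical and are not taken from the DOSES instrument.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 for two raters scoring the same observations.

    AC1 = (p_a - p_e) / (1 - p_e), where p_a is the observed agreement and
    p_e = sum(pi_q * (1 - pi_q)) / (Q - 1), with pi_q the share of all
    ratings (both raters pooled) that fall in category q.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    if len(categories) < 2:
        raise ValueError("need at least two categories")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    chance = 0.0
    for category in categories:
        share = (count_a[category] + count_b[category]) / (2 * n)
        chance += share * (1 - share)
    chance /= len(categories) - 1
    return (observed - chance) / (1 - chance)


# Hypothetical item that is rarely observed ("yes" in 1 of 10 observations).
rater_a = ["no"] * 9 + ["yes"]
rater_b = ["no"] * 8 + ["yes", "no"]
print(round(gwet_ac1(rater_a, rater_b, ["no", "yes"]), 3))
```

In this sample the raters agree on 8 of 10 observations, and AC1 stays high even though one category dominates, which is the usual motivation for preferring it to Cohen's kappa when events are rare.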
Dwyer, Tim; Whelan, Daniel B; Khoshbin, Amir; Wasserstein, David; Dold, Andrew; Chahal, Jaskarndip; Nauth, Aaron; Murnaghan, M Lucas; Ogilvie-Harris, Darrell J; Theodoropoulos, John S
2015-04-01
The objective of this study was to establish the intra- and inter-observer reliability of hamstring graft measurement using cylindrical sizing tubes. Hamstring tendons (gracilis and semitendinosus) were harvested from ten cadavers by a single surgeon and whip stitched together to create ten 4-strand hamstring grafts. Ten sports medicine surgeons and fellows sized each graft independently using either hollow cylindrical sizers or block sizers in 0.5-mm increments; the sizing technique used was applied consistently to each graft. Surgeons moved sequentially from graft to graft and measured each hamstring graft twice. Surgeons were asked to state the measured proximal (femoral) and distal (tibial) diameter of each graft, as well as the diameter of the tibial and femoral tunnels that they would drill if performing an anterior cruciate ligament (ACL) reconstruction using that graft. Reliability was established using intra-class correlation coefficients. Overall, both the inter-observer and intra-observer agreement were >0.9, demonstrating excellent reliability. The inter-observer reliability for drill sizes was also excellent (>0.9). Excellent correlation was seen between cylindrical sizing and drill sizes (>0.9). Sizing of hamstring grafts by multiple surgeons demonstrated excellent inter-observer and intra-observer reliability, potentially validating clinical studies exploring ACL reconstruction outcomes by hamstring graft diameter when standard techniques are used. III.
Bonasia, Davide Edoardo; Marmotti, Antongiulio; Massa, Alessandro Domenico Felice; Ferro, Andrea; Blonna, Davide; Castoldi, Filippo; Rossi, Roberto
2015-09-01
In the last two decades, many surgical techniques have been described for articular cartilage repair. Reliable histological scoring systems are fundamental tools to evaluate new procedures. Several histological scoring systems have been described, and these can be divided into elementary and comprehensive scores, according to the number of sub-items. The aim of this study was to test the inter- and intra-observer reliability of ten main scores used for the histological evaluation of in vivo cartilage repair. The authors tested the starting hypothesis that elementary scores would show superior intra- and inter-observer reliability compared with comprehensive scores. Fifty histological sections obtained from the trochlea of New Zealand rabbits and stained with Safranin-O fast green were used. The histological sections were analysed by 4 observers: 2 experienced in cartilage histology and 2 inexperienced. Histological evaluations were performed at time 1 and time 2, separated by a 30-day interval. The following scores were used: Mankin, O'Driscoll, Pineda, Wakitani, Fortier, Sellers, ICRS, ICRSII, Oswestry (OsScore) and modified O'Driscoll. Intra- and inter-observer reliability were evaluated for each score. In addition, the floor-ceiling effect and the Bland-Altman Coefficient of Repeatability were evaluated for each sub-item of every score. Intra-observer reliability was high for all observers in every score, even though the reliability was significantly lower for non-expert observers compared with expert counterparts. In terms of Coefficient of Repeatability, some scores performed better (O'Driscoll, Modified O'Driscoll and ICRSII) than others (Fortier, Sellers). Inter-observer reliability was high for all observers in every score, but significantly lower for non-expert compared with expert observers. In expert hands, all the scores showed high intra- and inter-observer reliability, independently of the complexity.
Although every score has advantages and disadvantages, ICRSII, O'Driscoll and Modified O'Driscoll scores should be preferred for the evaluation of in vivo cartilage repair in animal models.
The reliability of four widely used patellar height ratios.
van Duijvenbode, Dennis; Stavenuiter, Michel; Burger, Bart; van Dijke, Cees; Spermon, Jacco; Hoozemans, Marco
2016-03-01
The objective of this study was to evaluate the inter-observer reliability and the intra-observer reliability of four patellar height ratios: Insall-Salvati (IS), modified Insall-Salvati (MIS), Blackburne-Peel (BP) and Caton-Deschamps (CD). The patellar height ratios were assessed by four independent examiners using weight-bearing lateral knee radiographs in 30° flexion. Intra-class correlation coefficients and Fleiss' kappa values were determined. The inter-observer reliability was excellent for the IS and moderate for the other ratios. When the ratio values were categorized, the inter-observer reliability was strong for the IS, moderate for the MIS and BP, and poor for the CD. The intra-observer reliability was excellent for the IS, MIS and CD, and strong for the BP. When the ratio values were categorized, the intra-observer reliability was strong for the IS and MIS, and moderate for the other ratios. Although the IS showed the best reliability, we advise using the MIS, as it showed the second-best reliability but is, according to the literature, associated with better validity.
van Hamersvelt, Robbert W; Willemink, Martin J; Takx, Richard A P; Eikendal, Anouk L M; Budde, Ricardo P J; Leiner, Tim; Mol, Christian P; Isgum, Ivana; de Jong, Pim A
2014-07-01
To determine inter-observer and inter-examination variability for aortic valve calcification (AVC) and mitral valve and annulus calcification (MC) in low-dose unenhanced ungated lung cancer screening chest computed tomography (CT). We included 578 lung cancer screening trial participants who were examined by CT twice within 3 months to follow indeterminate pulmonary nodules. On these CTs, AVC and MC were measured in cubic millimetres. One hundred CTs were examined by five observers to determine the inter-observer variability. Reliability was assessed by kappa statistics (κ) and intra-class correlation coefficients (ICCs). Variability was expressed as the mean difference ± standard deviation (SD). Inter-examination reliability was excellent for AVC (κ = 0.94, ICC = 0.96) and MC (κ = 0.95, ICC = 0.90). Inter-examination variability was 12.7 ± 118.2 mm³ for AVC and 31.5 ± 219.2 mm³ for MC. Inter-observer reliability ranged from κ = 0.68 to κ = 0.92 for AVC and from κ = 0.20 to κ = 0.66 for MC. Inter-observer ICC was 0.94 for AVC and ranged from 0.56 to 0.97 for MC. Inter-observer variability ranged from -30.5 ± 252.0 mm³ to 84.0 ± 240.5 mm³ for AVC and from -95.2 ± 210.0 mm³ to 303.7 ± 501.6 mm³ for MC. AVC can be quantified with excellent reliability on ungated unenhanced low-dose chest CT, but manual detection of MC can be subject to substantial inter-observer variability. Lung cancer screening CT may be used for detection and quantification of cardiac valve calcifications. • Low-dose unenhanced ungated chest computed tomography can detect cardiac valve calcifications. • However, calcified cardiac valves are not reported by most radiologists. • Inter-observer and inter-examination variability of aortic valve calcifications is sufficient for longitudinal studies. • Volumetric measurement variability of mitral valve and annulus calcifications is substantial.
Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C. W.
2016-01-01
Study design: Observational inter-rater reliability study. Objectives: To examine: (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3) the inter-rater reliability of algorithm decisions in patients who clearly fit into a subgroup (clear classifications) and those who do not (unclear classifications). Methods: Patients were examined twice on the same day by two of three participating physical therapists with different levels of familiarity with the system. Patients were classified into one of four classification groups. Raters were blind to the others’ classification decision. In order to quantify the inter-rater reliability, percentages of agreement and Cohen’s Kappa were calculated. Results: A total of 36 patients were included (clear classification n = 23; unclear classification n = 13). The overall rate of agreement was 53% and the Kappa value was 0·34 [95% confidence interval (CI): 0·11–0·57], which indicated only fair inter-rater reliability. Inter-rater reliability for patients with a clear classification (agreement 52%, Kappa value 0·29) was not higher than for patients with an unclear classification (agreement 54%, Kappa value 0·33). Familiarity with the system (i.e. trained with written instructions and previous research experience with the algorithm) did not improve the inter-rater reliability. Conclusion: Our pilot study challenges the inter-rater reliability of the classification procedure in clinical practice. Therefore, more knowledge is needed about factors that affect the inter-rater reliability, in order to improve the clinical applicability of the classification scheme. PMID:27559279
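The abstract above reports both a percentage of agreement and Cohen's Kappa, which can diverge because kappa subtracts the agreement expected by chance. A minimal sketch of the two quantities for a pair of raters follows; the function name and the sample classifications are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def agreement_and_kappa(ratings_a, ratings_b):
    """Return (percent agreement as a fraction, Cohen's kappa) for two raters.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected from each rater's own category frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical classifications of ten patients into groups "A" and "B".
therapist_1 = ["A", "A", "B", "B", "A", "B", "A", "A", "B", "B"]
therapist_2 = ["A", "B", "B", "B", "A", "A", "A", "A", "B", "A"]
agreement, kappa = agreement_and_kappa(therapist_1, therapist_2)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

With these sample ratings the raters agree on 7 of 10 patients, yet half of all agreement would be expected by chance, so kappa is noticeably lower than the raw agreement.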
Connors, Brenda L.; Rende, Richard; Colton, Timothy J.
2014-01-01
The unique yield of collecting observational data on human movement has received increasing attention in a number of domains, including the study of decision-making style. As such, interest has grown in the nuances of core methodological issues, including the best ways of assessing inter-rater reliability. In this paper we focus on one key topic – the distinction between establishing reliability for the patterning of behaviors as opposed to the computation of raw counts – and suggest that reliability for each be compared empirically rather than determined a priori. We illustrate by assessing inter-rater reliability for key outcome measures derived from movement pattern analysis (MPA), an observational methodology that records body movements as indicators of decision-making style with demonstrated predictive validity. While reliability ranged from moderate to good for raw counts of behaviors reflecting each of two Overall Factors generated within MPA (Assertion and Perspective), inter-rater reliability for patterning (proportional indicators of each factor) was significantly higher and excellent (ICC = 0.89). Furthermore, patterning, as compared to raw counts, provided better prediction of observable decision-making process assessed in the laboratory. These analyses support the utility of using an empirical approach to inform the consideration of measuring patterning versus discrete behavioral counts of behaviors when determining inter-rater reliability of observable behavior. They also speak to the substantial reliability that may be achieved via application of theoretically grounded observational systems such as MPA that reveal thinking and action motivations via visible movement patterns. PMID:24999336
Hand assessment in older adults with musculoskeletal hand problems: a reliability study.
Myers, Helen L; Thomas, Elaine; Hay, Elaine M; Dziedzic, Krysia S
2011-01-07
Musculoskeletal hand pain is common in the general population. This study aims to investigate the inter- and intra-observer reliability of two trained observers conducting a simple clinical interview and physical examination for hand problems in older adults. The reliability of applying the American College of Rheumatology (ACR) criteria for hand osteoarthritis to community-dwelling older adults will also be investigated. Fifty-five participants aged 50 years and over with a current self-reported hand problem and registered with one general practice were recruited from a previous health questionnaire study. Participants underwent a standardised, structured clinical interview and physical examination by two independent trained observers and again by one of these observers a month later. Agreement beyond chance was summarised using Kappa statistics and intra-class correlation coefficients. Median values for inter- and intra-observer reliability for clinical interview questions were found to be "substantial" and "moderate" respectively [median agreement beyond chance (Kappa) was 0.75 (range: -0.03, 0.93) for inter-observer ratings and 0.57 (range: -0.02, 1.00) for intra-observer ratings]. Inter- and intra-observer reliability for physical examination items was variable, with good reliability observed for some items, such as grip and pinch strength, and poor reliability observed for others, notably assessment of altered sensation, pain on resisted movement and judgements based on observation and palpation of individual features at single joints, such as bony enlargement, nodes and swelling. Moderate agreement was observed both between and within observers when applying the ACR criteria for hand osteoarthritis. Standardised, structured clinical interview is reliable for taking a history in community-dwelling older adults with self reported hand problems. Agreement between and within observers for physical examination items is variable. 
Low Kappa values may have resulted, in part, from a low prevalence of clinical signs and symptoms in the study participants. The decision to use clinical interview and hand assessment variables in clinical practice or further research in primary care should include consideration of clinical applicability and training alongside reliability. Further investigation is required to determine the relationship between these clinical questions and assessments and the clinical course of hand pain and hand problems in community-dwelling older adults.
Reliability of anthropometric measurements in European preschool children: the ToyBox-study.
De Miguel-Etayo, P; Mesana, M I; Cardon, G; De Bourdeaudhuij, I; Góźdź, M; Socha, P; Lateva, M; Iotova, V; Koletzko, B V; Duvinage, K; Androutsos, O; Manios, Y; Moreno, L A
2014-08-01
The ToyBox-study aims to develop and test an innovative and evidence-based obesity prevention programme for preschoolers in six European countries: Belgium, Bulgaria, Germany, Greece, Poland and Spain. In multicentre studies, anthropometric measurements using standardized procedures that minimize errors in the data collection are essential to maximize reliability of measurements. The aim of this paper is to describe the standardization process and reliability (intra- and inter-observer) of height, weight and waist circumference (WC) measurements in preschoolers. All technical procedures and devices were standardized and centralized training was given to the fieldworkers. At least seven children per country participated in the intra- and inter-observer reliability testing. Intra-observer technical error ranged from 0.00 to 0.03 kg for weight and from 0.07 to 0.20 cm for height, with the overall reliability being above 99%. A second training was organized for WC due to low reliability observed in the first training. Intra-observer technical error for WC ranged from 0.12 to 0.71 cm during the first training and from 0.05 to 1.11 cm during the second training, and reliability above 92% was achieved. Epidemiological surveys need standardized procedures and training of researchers to reduce measurement error. In the ToyBox-study, very good intra- and inter-observer agreement was achieved for all anthropometric measurements performed. © 2014 World Obesity.
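The abstract above quotes a technical error in measurement units and a reliability in percent, but does not state the formulas. The sketch below assumes a convention that is common in anthropometry for two repeated measurements per child: TEM = sqrt(sum(d^2) / 2n) and reliability = 1 - TEM^2 / SD^2, with SD^2 taken as the sample variance of all measurements. Whether the study used exactly this form is an assumption, and the function names and sample heights are hypothetical.

```python
import math
import statistics


def technical_error(first, second):
    """Technical error of measurement for two repeated measurement series."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty series")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))


def reliability_percent(first, second):
    """Share of total variance not attributable to measurement error, in %."""
    tem = technical_error(first, second)
    total_variance = statistics.variance(list(first) + list(second))
    if total_variance == 0:
        raise ValueError("reliability is undefined when all measurements are equal")
    return 100 * (1 - tem ** 2 / total_variance)


# Hypothetical heights in cm, measured twice by the same observer.
first_round = [100.0, 105.0, 110.0]
second_round = [100.2, 104.8, 110.0]
print(round(technical_error(first_round, second_round), 3))
print(round(reliability_percent(first_round, second_round), 2))
```

Because the reliability figure depends on how much the children differ from one another, the same technical error yields a lower percentage in a more homogeneous sample.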
Eliasson, Kristina; Palm, Peter; Nyman, Teresia; Forsman, Mikael
2017-07-01
A common way to conduct practical risk assessments is to observe a job and report the observed long-term risks for musculoskeletal disorders. The aim of this study was to evaluate the inter- and intra-observer reliability of ergonomists' risk assessments without the support of an explicit risk assessment method. Twenty-one experienced ergonomists assessed the risk level (low, moderate, high risk) of eight upper body regions, as well as the global risk of 10 video-recorded work tasks. Intra-observer reliability was assessed by having nine of the ergonomists repeat the procedure at least three weeks after the first assessment. The ergonomists made their risk assessments based on their own experience and knowledge. The statistical parameters of reliability included agreement in %, kappa, linearly weighted kappa, intraclass correlation and Kendall's coefficient of concordance. The average inter-observer agreement of the global risk was 53% and the corresponding weighted kappa (Kw) was 0.32, indicating fair reliability. The intra-observer agreement was 61%, with a Kw of 0.41. This study indicates that risk assessments of the upper body, without the use of an explicit observational method, have non-acceptable reliability. It is therefore recommended to use systematic risk assessment methods to a higher degree. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
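The abstract above lists a linearly weighted kappa among its reliability statistics; for an ordered scale such as low, moderate, high, this gives partial credit when two ratings differ by only one level. A minimal two-rater sketch follows, assuming the levels are coded as integers 0 to k-1 and the weights are 1 - |i - j| / (k - 1). The function name and the sample ratings are hypothetical.

```python
def linear_weighted_kappa(ratings_a, ratings_b, n_levels):
    """Linearly weighted kappa for two raters on an ordinal scale 0..n_levels-1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    if n_levels < 2:
        raise ValueError("need at least two ordinal levels")
    n = len(ratings_a)
    # Joint proportions: joint[i][j] is the share of items rated i by A and j by B.
    joint = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(ratings_a, ratings_b):
        joint[a][b] += 1 / n
    row = [sum(joint[i]) for i in range(n_levels)]
    col = [sum(joint[i][j] for i in range(n_levels)) for j in range(n_levels)]
    observed = 0.0
    expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = 1 - abs(i - j) / (n_levels - 1)
            observed += weight * joint[i][j]
            expected += weight * row[i] * col[j]
    if expected == 1:
        raise ValueError("weighted kappa is undefined when both raters use one identical level")
    return (observed - expected) / (1 - expected)


# Hypothetical risk ratings of eight tasks (0 = low, 1 = moderate, 2 = high).
ergonomist_1 = [0, 1, 2, 1, 0, 2, 1, 1]
ergonomist_2 = [0, 2, 2, 1, 1, 2, 0, 1]
print(round(linear_weighted_kappa(ergonomist_1, ergonomist_2, 3), 3))
```

Setting every off-diagonal weight to zero instead would reduce this to the unweighted kappa, which treats a low-versus-high disagreement the same as a low-versus-moderate one.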
IRR (Inter-Rater Reliability) of a COP (Classroom Observation Protocol)--A Critical Appraisal
ERIC Educational Resources Information Center
Rui, Ning; Feldman, Jill M.
2012-01-01
Notwithstanding broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded striving readers…
Inter- and intra-observer reliability of clinical movement-control tests for marines
2012-01-01
Background: Musculoskeletal disorders, particularly in the back and lower extremities, are common among marines. Here, movement-control tests are considered clinically useful for screening and follow-up evaluation. However, few studies have addressed the reliability of clinical tests, and no such published data exist for marines. The present aim was therefore to determine the inter- and intra-observer reliability of clinically convenient tests emphasizing movement control of the back and hip among marines. A secondary aim was to investigate the sensitivity and specificity of these clinical tests for discriminating musculoskeletal pain disorders in this group of military personnel. Methods: This inter- and intra-observer reliability study used a test-retest approach with six standardized clinical tests focusing on movement control for back and hip. Thirty-three marines (age 28.7 yrs, SD 5.9) on active duty volunteered and were recruited. They followed an in-vivo observation test procedure that covered both low- and high-load (threshold) tasks relevant for marines on operational duty. Two independent observers simultaneously rated performance as “correct” or “incorrect” following a standardized assessment protocol. Re-testing followed 7–10 days thereafter. Reliability was analysed using kappa (κ) coefficients, while discriminative power of the best-fitting tests for back- and lower-extremity pain was assessed using a multiple-variable regression model. Results: Inter-observer reliability for the six tests was moderate to almost perfect with κ-coefficients ranging between 0.56-0.95. Three tests reached almost perfect inter-observer reliability with mean κ-coefficients > 0.81. However, intra-observer reliability was fair-to-moderate with mean κ-coefficients between 0.22-0.58. Three tests achieved moderate intra-observer reliability with κ-coefficients > 0.41.
Combinations of one low- and one high-threshold test best discriminated prior back pain, but results were inconsistent for lower-extremity pain. Conclusions: Our results suggest that clinical tests of movement control of back and hip are reliable for use in screening protocols using several observers with marines. However, test-retest reproducibility was less accurate, which should be considered in follow-up evaluations. The results also indicate that combinations of low- and high-threshold tests have discriminative validity for prior back pain, but were inconclusive for lower-extremity pain. PMID:23273285
Lavelle, William F; Ranade, Ashish; Samdani, Amer F; Gaughan, John P; D'Andrea, Linda P; Betz, Randal R
2014-01-01
Pedicle screws are used increasingly in spine surgery. Concerns about complications associated with screw breach necessitate accurate pedicle screw placement. Postoperative CT imaging helps to detect screw malposition and assess its severity. However, accuracy is dependent on the reading of the CT scans. Inter- and intra-observer variability could affect the reliability of CT scans to assess multiple screw types and sites. The purpose of this study was to assess the reliability of multi-observer analysis of CT scans for determining pedicle screw breach for various screw types and sites in patients with spinal deformity or degenerative pathologies. Axial CT scan images of 23 patients (286 screws) were read by four experienced spine surgeons. Pedicle screw placement was considered 'In' when the screw was fully contained and/or the pedicle wall breach was ≤2 mm. 'Out' was defined as a breach in the medial or lateral pedicle wall >2 mm. Intraclass correlation coefficients (ICCs) were calculated to assess the inter- and intra-observer reliability. Marked inter- and intra-observer variability was noted. The overall inter-observer ICC was 0.45 (95% confidence limits 0.25 to 0.65). The intra-observer ICC was 0.49 (95% confidence limits 0.29 to 0.69). Underlying spinal pathology, screw type, and patient age did not seem to impact the reliability of our CT assessments. Our results indicate the evaluation of pedicle screw breach on CT by a single surgeon is highly variable, and care should be taken when using individual CT evaluations of millimeters of breach as a basis for screw removal. This was a Level III study.
Hesketh, Kim; Sankar, Wudbhav; Joseph, Benjamin; Narayanan, Unni; Mulpuri, Kishore
2016-04-01
The incidence of avascular necrosis (AVN) following reconstructive hip surgery in cerebral palsy (CP) ranges from 0 to 69% in the current literature. The purpose of this study was to determine the inter- and intra-observer reliability of radiographically diagnosing AVN in children with CP after hip surgery. A retrospective review of 65 children with CP who had reconstructive hip surgery between 2009 and 2012 at BC Children's Hospital was completed. Anterior-posterior and lateral radiographs were presented to four pediatric orthopaedic surgeons over two rounds. Surgeons were asked to review the set of unidentified radiographs and comment 'yes' or 'no' for the presence of AVN. Two weeks later the same set of radiographs was sent in a different order and the surgeons were again asked to comment on AVN. Inter- and intra-observer reliability was determined using kappa statistics. The intra-observer reliability ranged from 0.65 to 0.88 with an average score of 0.76. Inter-observer reliability showed greater variability, ranging from 0.41 to 0.77 with an average score of 0.56 across all surgeons. Although the intra-rater reliability produced a strength of "good" and the inter-rater reliability a strength of "moderate" agreement, the variability within these scores is clinically important as it demonstrates the difficulty in identifying AVN. This may explain the variability in AVN that is reported in the literature. Further education and research in the diagnosis of AVN in children with CP who have undergone reconstructive hip surgery are clinically necessary.
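As an illustration of the kappa statistic named above, the following minimal sketch computes Cohen's kappa for two raters giving 'yes'/'no' ratings. The function name `cohen_kappa` and the example ratings are hypothetical and are not data from the study; the abstract does not state which kappa variant was used or how values were pooled across the four surgeons.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of ten radiographs by two surgeons.
surgeon_1 = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"]
surgeon_2 = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "yes"]
print(round(cohen_kappa(surgeon_1, surgeon_2), 3))
```

Here the raters agree on 8 of 10 items, but because chance alone would produce agreement on about half of them, the chance-corrected value is noticeably lower than the raw 80%.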
Reliability of visual and instrumental color matching.
Igiel, Christopher; Lehmann, Karl Martin; Ghinea, Razvan; Weyhrauch, Michael; Hangx, Ysbrand; Scheller, Herbert; Paravina, Rade D
2017-09-01
The aim of this investigation was to evaluate intra-rater and inter-rater reliability of visual and instrumental shade matching. Forty individuals with normal color perception participated in this study. The right maxillary central incisor of a teaching model was prepared and restored with 10 feldspathic all-ceramic crowns of different shades. A shade matching session consisted of the observer (rater) visually selecting the best match by using VITA classical A1-D4 (VC) and VITA Toothguide 3D Master (3D) shade guides and the VITA Easyshade Advance intraoral spectrophotometer (ES) to obtain both VC and 3D matches. Three shade matching sessions were held with 4 to 6 weeks between sessions. Intra-rater reliability was assessed based on the percentage of agreement for the three sessions for the same observer, whereas the inter-rater reliability was calculated as mean percentage of agreement between different observers. The Fleiss' Kappa statistical analysis was used to evaluate visual inter-rater reliability. The mean intra-rater reliability for the visual shade selection was 64(11) for VC and 48(10) for 3D. The corresponding ES values were 96(4) for both VC and 3D. The percentages of observers who matched the same shade with VC and 3D were 55(10) and 43(12), respectively, while corresponding ES values were 88(8) for VC and 92(4) for 3D. The results for visual shade matching exhibited a high to moderate level of inconsistency for both intra-rater and inter-rater comparisons. The VITA Easyshade Advance intraoral spectrophotometer exhibited significantly better reliability compared with visual shade selection. This study evaluates the ability of observers to consistently match the same shade visually and with a dental spectrophotometer in different sessions. 
The intra-rater and inter-rater reliability (agreement of repeated shade matching) of visual and instrumental tooth color matching strongly suggest the use of color matching instruments as a supplementary tool in everyday dental practice to enhance the esthetic outcome. © 2017 Wiley Periodicals, Inc.
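The abstract above names Fleiss' kappa for the visual inter-rater analysis. A minimal sketch of that statistic follows, assuming every item (here, a crown) is rated by the same number of observers; the function name `fleiss_kappa` and the shade labels are illustrative, not the study's data.

```python
from collections import Counter


def fleiss_kappa(ratings_per_item):
    """Fleiss' kappa; each inner list holds the labels one item received."""
    if not ratings_per_item:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings_per_item[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings_per_item):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_items = len(ratings_per_item)

    category_totals = Counter()
    per_item_agreement = []
    for labels in ratings_per_item:
        counts = Counter(labels)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_item_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))

    mean_agreement = sum(per_item_agreement) / n_items
    total_ratings = n_items * n_raters
    # Chance agreement from the overall share of each category.
    chance = sum((c / total_ratings) ** 2 for c in category_totals.values())
    if chance == 1.0:
        return 1.0  # only one category was ever used; undefined, reported as 1
    return (mean_agreement - chance) / (1.0 - chance)


# Hypothetical shade selections: three observers, two crowns.
print(fleiss_kappa([["A1", "A1", "A1"], ["B2", "B2", "B2"]]))
```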
O'Connor, S; McCaffrey, N; Whyte, E; Moran, K
2016-07-01
To adapt the trunk stability test to facilitate further sub-classification of higher levels of core stability in athletes for use as a screening tool. To establish the inter-tester and intra-tester reliability of this adapted core stability test. Reliability study. Collegiate athletic therapy facilities. Fifteen physically active male subjects (19.46 ± 0.63) free from any orthopaedic or neurological disorders were recruited from a convenience sample of collegiate students. The intraclass correlation coefficients (ICC) and 95% Confidence Intervals (CI) were computed to establish inter-tester and intra-tester reliability. Excellent ICC values were observed in the adapted core stability test for inter-tester reliability (0.97) and good to excellent intra-tester reliability (0.73-0.90). While the 95% CI were narrow for inter-tester reliability, Tester A and C 95% CI's were widely distributed compared to Tester B. The adapted core stability test developed in this study is a quick and simple field based test to administer that can further subdivide athletes with high levels of core stability. The test demonstrated high inter-tester and intra-tester reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Palm, Peter; Josephson, Malin; Mathiassen, Svend Erik; Kjellberg, Katarina
2016-06-01
We evaluated the intra- and inter-observer reliability and criterion validity of an observation protocol, developed in an iterative process involving practicing ergonomists, for assessment of working technique during cash register work for the purpose of preventing upper extremity symptoms. Two ergonomists independently assessed 17 15-min videos of cash register work on two occasions each, as a basis for examining reliability. Criterion validity was assessed by comparing these assessments with meticulous video-based analyses by researchers. Intra-observer reliability was acceptable (i.e. proportional agreement >0.7 and kappa >0.4) for 10/10 questions. Inter-observer reliability was acceptable for only 3/10 questions. An acceptable inter-observer reliability combined with an acceptable criterion validity was obtained only for one working technique aspect, 'Quality of movements'. Thus, major elements of the cashiers' working technique could not be assessed with an acceptable accuracy from short periods of observations by one observer, such as often desired by practitioners. Practitioner Summary: We examined an observation protocol for assessing working technique in cash register work. It was feasible in use, but inter-observer reliability and criterion validity were generally not acceptable when working technique aspects were assessed from short periods of work. We recommend the protocol to be used for educational purposes only.
Hudson, John M; Milot, Laurent; Parry, Craig; Williams, Ross; Burns, Peter N
2013-06-01
This study assessed the reproducibility of shear wave elastography (SWE) in the liver of healthy volunteers. Intra- and inter-operator reliability and repeatability were quantified in three different liver segments in a sample of 15 subjects, scanned during four independent sessions (two scans on day 1, two scans 1 wk later) by two operators. A total of 1440 measurements were made. Reproducibility was assessed using the intra-class correlation coefficient (ICC) and a repeated measures analysis of variance. The shear wave speed was measured and used to estimate Young's modulus using the Supersonics Imagine Aixplorer. The median Young's modulus measured through the inter-costal space was 5.55 ± 0.74 kPa. The intra-operator reliability was better for same-day evaluations (ICC = 0.91) than the inter-operator reliability (ICC = 0.78). Intra-observer agreement decreased when scans were repeated on a different day. Inter-session repeatability was between 3.3% and 9.9% for intra-day repeated scans, compared with 6.5%-12% for inter-day repeated scans. No significant difference was observed in subjects with a body mass index greater or less than 25 kg/m². Copyright © 2013 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
Validity and inter-observer reliability of subjective hand-arm vibration assessments.
Coenen, Pieter; Formanoy, Margriet; Douwes, Marjolein; Bosch, Tim; de Kraker, Heleen
2014-07-01
Exposure to mechanical vibrations at work (e.g., due to handling powered tools) is a potential occupational risk as it may cause upper extremity complaints. However, reliable and valid assessment methods for vibration exposure at work are lacking. Measuring hand-arm vibration objectively is often difficult and expensive, while often used information provided by manufacturers lacks detail. Therefore, a subjective hand-arm vibration assessment method was tested on validity and inter-observer reliability. In an experimental protocol, sixteen tasks handling powered tools were executed by two workers. Hand-arm vibration was assessed subjectively by 16 observers according to the proposed subjective assessment method. As a gold standard reference, hand-arm vibration was measured objectively using a vibration measurement device. Weighted κ's were calculated to assess validity, and intra-class correlation coefficients (ICCs) were calculated to assess inter-observer reliability. Inter-observer reliability of the subjective assessments depicting the agreement among observers can be expressed by an ICC of 0.708 (0.511-0.873). The validity of the subjective assessments as compared to the gold-standard reference can be expressed by a weighted κ of 0.535 (0.285-0.785). Besides, the percentage of exact agreement of the subjective assessment compared to the objective measurement was relatively low (i.e., 52% of all tasks). This study shows that subjectively assessed hand-arm vibrations are fairly reliable among observers and moderately valid. This assessment method is a first attempt to use subjective risk assessments of hand-arm vibration. Although this assessment method can benefit from some future improvement, it can be of use in future studies and in field-based ergonomic assessments. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.
NASA Astrophysics Data System (ADS)
Saini, K. K.; Sehgal, R. K.; Sethi, B. L.
2008-10-01
In this paper, major reliability estimators are analyzed and their comparative results are discussed. Their strengths and weaknesses are evaluated in this case study. Each of the reliability estimators has certain advantages and disadvantages. Inter-rater reliability is one of the best ways to estimate reliability when your measure is an observation. However, it requires multiple raters or observers. As an alternative, you could look at the correlation of ratings of the same single observer repeated on two different occasions. Each of the reliability estimators will give a different value for reliability. In general, the test-retest and inter-rater reliability estimates will be lower in value than the parallel forms and internal consistency ones because they involve measuring at different times or with different raters. Reliability estimates are often used in statistical analyses of quasi-experimental designs.
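The "correlation of ratings of the same single observer repeated on two different occasions" mentioned above can be sketched as a Pearson correlation. This is only one possible choice of test-retest estimator, assumed here for illustration; the function name `pearson_r` and the scores are hypothetical.

```python
import math


def pearson_r(first_occasion, second_occasion):
    """Pearson correlation between two sets of ratings of the same items."""
    n = len(first_occasion)
    if n != len(second_occasion) or n < 2:
        raise ValueError("need two equally long lists with at least two ratings")
    mean_x = sum(first_occasion) / n
    mean_y = sum(second_occasion) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(first_occasion, second_occasion))
    sxx = sum((x - mean_x) ** 2 for x in first_occasion)
    syy = sum((y - mean_y) ** 2 for y in second_occasion)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of ratings is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores given by one observer on two occasions.
print(pearson_r([1, 2, 3], [1, 3, 2]))
```

Note that a correlation measures consistency of ranking rather than absolute agreement: an observer who scores every item exactly twice as high on the second occasion would still obtain a correlation of 1.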
Reliability of a four-column classification for tibial plateau fractures.
Martínez-Rondanelli, Alfredo; Escobar-González, Sara Sofía; Henao-Alzate, Alejandro; Martínez-Cano, Juan Pablo
2017-09-01
A four-column classification system offers a different way of evaluating tibial plateau fractures. The aim of this study is to compare the intra-observer and inter-observer reliability between four-column and classic classifications. This is a reliability study, which included patients presenting with tibial plateau fractures between January 2013 and September 2015 in a level-1 trauma centre. Four orthopaedic surgeons blindly classified each fracture according to four different classifications: AO, Schatzker, Duparc and four-column. Kappa, intra-observer and inter-observer concordance were calculated for the reliability analysis. Forty-nine patients were included. The mean age was 39 ± 14.2 years, with no gender predominance (men: 51%; women: 49%), and 67% of the fractures included at least one of the posterior columns. The intra-observer and inter-observer concordance were calculated for each classification: four-column (84%/79%), Schatzker (60%/71%), AO (50%/59%) and Duparc (48%/58%), with a statistically significant difference among them (p = 0.001/p = 0.003). Kappa coefficient for intra-observer and inter-observer evaluations: Schatzker 0.48/0.39, four-column 0.61/0.34, Duparc 0.37/0.23, and AO 0.34/0.11. The proposed four-column classification showed the highest intra- and inter-observer agreement. When taking into account the agreement that occurs by chance, Schatzker classification showed the highest inter-observer kappa, but again the four-column had the highest intra-observer kappa value. The proposed classification is a more inclusive classification for the posteromedial and posterolateral fractures. We suggest, therefore, that it be used in addition to one of the classic classifications in order to better understand the fracture pattern, as it allows more attention to be paid to the posterior columns, it improves the surgical planning and allows the surgical approach to be chosen more accurately.
RELIABILITY AND VALIDITY OF SUBJECTIVE ASSESSMENT OF LUMBAR LORDOSIS IN CONVENTIONAL RADIOGRAPHY.
Ruhinda, E; Byanyima, R K; Mugerwa, H
2014-10-01
Reliability and validity studies of different lumbar curvature analysis and measurement techniques have been documented; however, there is limited literature on the reliability and validity of subjective visual analysis. Radiological assessment of lumbar lordotic curve aids in early diagnosis of conditions even before neurologic changes set in. To ascertain the level of reliability and validity of subjective assessment of lumbar lordosis in conventional radiography. A blinded, repeated-measures diagnostic test was carried out on lumbar spine x-ray radiographs. Radiology Department at Joint Clinical Research Centre (JCRC), Mengo-Kampala-Uganda. Seventy (70) lateral lumbar x-ray films were used for this study and were obtained from the archive of JCRC radiology department at Butikiro house, Mengo-Kampala. Poor observer agreement, both inter- and intra-observer, with kappa values of 0.16 was found. Inter-observer agreement was poorer than intra-observer agreement. Kappa values significantly rose when the lumbar lordosis was clustered into four categories without grading each abnormality. The results confirm that subjective assessment of lumbar lordosis has low reliability and validity. Film quality has limited influence on the observer reliability. This study further shows that fewer scale categories of lordosis abnormalities produce better observer reliability.
Inter-rater and intra-rater reliability of a movement control test in shoulder.
Rajasekar, S; Bangera, Rakshith K; Sekaran, Padmanaban
2017-07-01
Movement faults are commonly observed in patients with musculoskeletal pain. The Kinetic Medial Rotation Test (KMRT) is a movement control test used to identify movement faults of the scapula and gleno-humeral joints during arm movement. Objective tests such as the KMRT need to be reliable and valid for the results to be applied across different clinical settings and patient populations. The primary objective of the present study was to determine the intra-rater and inter-rater reliability of KMRT in subjects with and without shoulder pain. Sixty subjects were included in this study based on specific inclusion and exclusion criteria. Two musculoskeletal physiotherapists with different levels of clinical experience performed the tests. The intra-rater reliability was tested in twenty asymptomatic subjects by a single assessor at two week intervals. An equal number of subjects with and without shoulder pain were tested by both the assessors to determine the inter-rater reliability. Both components of the KMRT, the Gleno-Humeral Anterior Translation (GHAT) and the Scapular Forward Tilt (SCFT), were tested. The Kappa values for inter-rater reliability of the GHAT and SCFT were K = 0.68 and K = 0.65, respectively, in subjects with shoulder pain. In asymptomatic subjects, the inter-rater reliability of GHAT was K = 0.61 and SCFT was K = 0.85. Intra-rater reliability ranged from K = 0.66 for GHAT to K = 0.87 for SCFT. Our study found substantial agreement in inter-rater reliability of KMRT in subjects with shoulder pain, whereas substantial to near perfect agreement was found in intra-rater and inter-rater reliability of KMRT in subjects without shoulder pain. Copyright © 2017 Elsevier Ltd. All rights reserved.
Sánchez, Guillermo; Nova, John; Arias, Nilsa; Peña, Bibiana
2008-12-01
The Fitzpatrick phototype scale has been used to determine skin sensitivity to ultraviolet light. The reliability of this scale in estimating sensitivity permits risk evaluation of skin cancer based on phototype. Reliability and changes in intra- and inter-observer concordance were determined for the Fitzpatrick phototype scale after the assessment methods for establishing the phototype were standardized. An analytical study of intra- and inter-observer concordance was performed. The Fitzpatrick phototype scale was standardized using focus group methodology. To determine intra- and inter-observer agreement, the weighted kappa statistical method was applied. The standardization effect was measured using the equal kappa contrast hypothesis and Wald test for dependent measurements. The phototype scale was applied to 155 patients over 15 years of age who were assessed four times by two independent observers. The sample was drawn from patients of the Centro Dermatológico Federico Lleras Acosta. During the pre-standardization phase, the baseline and six-week inter-observer weighted kappa were 0.31 and 0.40, respectively. The intra-observer kappa values for observers A and B were 0.47 and 0.51, respectively. After the standardization process, the baseline and six-week inter-observer weighted kappa values were 0.77 and 0.82, respectively. Intra-observer kappa coefficients for observers A and B were 0.78 and 0.82. Statistically significant differences were found between coefficients before and after standardization (p<0.001) in all comparisons. Following a standardization exercise, the Fitzpatrick phototype scale yielded reliable, reproducible and consistent results.
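A weighted kappa gives partial credit when two observers choose neighbouring categories on an ordinal scale such as phototypes I to VI. The sketch below assumes linear disagreement weights, which the abstract does not specify; the function name `linear_weighted_kappa` and the example ratings are hypothetical.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Weighted kappa with linear weights for ratings coded 1..n_categories."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b) or n_categories < 2:
        raise ValueError("need equally long, non-empty ratings and >= 2 categories")
    span = n_categories - 1
    # Observed mean disagreement, scaled so the largest possible gap counts as 1.
    observed = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / (n * span)
    # Marginal proportion of each category for each rater.
    share_a = [rater_a.count(c) / n for c in range(1, n_categories + 1)]
    share_b = [rater_b.count(c) / n for c in range(1, n_categories + 1)]
    # Disagreement expected if the raters assigned categories independently.
    expected = sum(
        share_a[i] * share_b[j] * abs(i - j)
        for i in range(n_categories)
        for j in range(n_categories)
    ) / span
    if expected == 0:
        return 1.0  # both raters used one identical category throughout
    return 1.0 - observed / expected


# Hypothetical ratings on a three-level ordinal scale.
print(linear_weighted_kappa([1, 2, 3, 3], [1, 2, 3, 1], 3))
```

With only two categories the linear weights reduce to exact match or mismatch, so the result coincides with the unweighted Cohen's kappa.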
Høyer, C; Paludan, J P D; Pavar, S; Biurrun Manresa, J A; Petersen, L J
2014-03-01
To assess the intra- and inter-observer variation in laser Doppler flowmetry curve reading for measurement of toe and ankle pressures. A prospective single blinded diagnostic accuracy study was conducted on 200 patients with known or suspected peripheral arterial disease (PAD), with a total of 760 curve sets produced. The first curve reading for this study was performed by laboratory technologists blinded to clinical clues and previous readings at least 3 months after the primary data sampling. The pressure curves were later reassessed following another period of at least 3 months. Observer agreement in diagnostic classification according to TASC-II criteria was quantified using Cohen's kappa. Reliability was quantified using intra-class correlation coefficients, coefficients of variance, and Bland-Altman analysis. The overall agreement in diagnostic classification (PAD/not PAD) was 173/200 (87%) for intra-observer (κ = .858) and 175/200 (88%) for inter-observer data (κ = .787). Reliability analysis confirmed excellent correlation for both intra- and inter-observer data (ICC all ≥.931). The coefficients of variance ranged from 2.27% to 6.44% for intra-observer and 2.39% to 8.42% for inter-observer data. Subgroup analysis showed lower observer-variation for reading of toe pressures in patients with diabetes and/or chronic kidney disease than patients not diagnosed with these conditions. Bland-Altman plots showed higher variation in toe pressure readings than ankle pressure readings. This study shows substantial intra- and inter-observer agreement in diagnostic classification and reading of absolute pressures when using laboratory technologists as observers. The study emphasises that observer variation for curve reading is an important factor concerning the overall reproducibility of the method. Our data suggest diabetes and chronic kidney disease have an influence on toe pressure reproducibility. Copyright © 2013 European Society for Vascular Surgery. 
Published by Elsevier Ltd. All rights reserved.
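The Bland-Altman analysis mentioned above summarises paired readings by their mean difference (bias) and the 95% limits of agreement. A minimal sketch follows, assuming the common bias ± 1.96 SD definition; the function name `bland_altman_limits` and the pressure values are hypothetical, not study data.

```python
import statistics


def bland_altman_limits(first_reading, second_reading):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first_reading) != len(second_reading) or len(first_reading) < 2:
        raise ValueError("need at least two paired readings")
    differences = [a - b for a, b in zip(first_reading, second_reading)]
    bias = statistics.mean(differences)
    # Sample standard deviation of the paired differences.
    spread = statistics.stdev(differences)
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical toe pressures (mmHg) read twice from the same curves.
bias, lower, upper = bland_altman_limits([100, 110, 120, 130], [98, 112, 118, 132])
print(bias, round(lower, 2), round(upper, 2))
```

A wider interval between the two limits corresponds to the "higher variation" that the abstract reports for toe pressure readings relative to ankle pressure readings.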
Mosmuller, David; Tan, Robin; Mulder, Frans; Bachour, Yara; de Vet, Henrica; Don Griot, Peter
2016-10-01
It is essential to have a reliable assessment method in order to compare the results of cleft lip and palate surgery. In this study the computer-based program SymNose, a method for quantitative assessment of the nose and lip, will be assessed on usability and reliability. The symmetry of the nose and lip was measured twice in 50 six-year-old complete and incomplete unilateral cleft lip and palate patients by four observers. For the frontal view the asymmetry level of the nose and upper lip were evaluated and for the basal view the asymmetry level of the nose and nostrils were evaluated. A mean inter-observer reliability when tracing each image once or twice was 0.70 and 0.75, respectively. Tracing the photographs with 2 observers and 4 observers gave a mean inter-observer score of 0.86 and 0.92, respectively. The mean intra-observer reliability varied between 0.80 and 0.84. SymNose is a practical and reliable tool for the retrospective assessment of large caseloads of 2D photographs of cleft patients for research purposes. Moderate to high single inter-observer reliability was found. For future research with SymNose reliable outcomes can be achieved by using the average outcomes of single tracings of two observers. Copyright © 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
Høyer, Christian; Pavar, Susanne; Pedersen, Begitte H; Biurrun Manresa, José A; Petersen, Lars J
2013-08-01
Mercury-in-silastic strain gauge plethysmography (SGP) is a well-established technique for blood flow and blood pressure measurements. The aim of this study was to examine (i) the possible influence of clinical clues, e.g. the presence of wounds and color changes during blood pressure measurements, and (ii) intra- and inter-observer variation of curve interpretation for segmental blood pressure measurements. A total of 204 patients with known or suspected peripheral arterial disease (PAD) were included in a diagnostic accuracy trial. Toe and ankle pressures were measured in both limbs, and primary observers analyzed a total of 804 pressure curve sets. The SGP curves were later reanalyzed separately by two observers blinded to clinical clues. Intra- and inter-observer agreement was quantified using Cohen's kappa and reliability was quantified using intra-class correlation coefficients, coefficients of variance, and Bland-Altman analysis. There was an overall agreement regarding patient diagnostic classification (PAD/not PAD) in 202/204 (99.0%) for intra-observer (κ = 0.969, p < 0.001), and 201/204 (98.5%) for inter-observer readings (κ = 0.953, p < 0.001). Reliability analysis showed excellent correlation between blinded versus non-blinded and inter-observer readings for determination of absolute segmental pressures (all intraclass correlation coefficients ≥ 0.984). The coefficient of variance for determination of absolute segmental blood pressure ranged from 2.9-3.4% for blinded/non-blinded data and from 3.8-5.0% for inter-observer data. This study shows a low inter-observer variation among experienced laboratory technicians for reading strain gauge curves. The low variation between blinded/non-blinded readings indicates that SGP measurements are minimally biased by clinical clues.
2010-01-01
Introduction The Glasgow Coma Scale (GCS) is the most widely used scoring system for comatose patients in intensive care. Limitations of the GCS include the impossibility to assess the verbal score in intubated or aphasic patients, and an inconsistent inter-rater reliability. The FOUR (Full Outline of UnResponsiveness) score, a new coma scale not reliant on verbal response, was recently proposed. The aim of the present study was to compare the inter-rater reliability of the GCS and the FOUR score among unselected patients in general critical care. A further aim was to compare the inter-rater reliability of neurologists with that of intensive care unit (ICU) staff. Methods In this prospective observational study, scoring of GCS and FOUR score was performed by neurologists and ICU staff on 267 consecutive patients admitted to intensive care. Results In a total of 437 pair wise ratings the exact inter-rater agreement for the GCS was 71%, and for the FOUR score 82% (P = 0.0016); the inter-rater agreement within a range of ± 1 score point for the GCS was 90%, and for the FOUR score 92% (P = ns.). The exact inter-rater agreement among neurologists was superior to that among ICU staff for the FOUR score (87% vs. 79%, P = 0.04) but not for the GCS (73% vs. 73%). Neurologists and ICU staff did not significantly differ in the inter-rater agreement within a range of ± 1 score point for both GCS (88% vs. 93%) and the FOUR score (91% vs. 88%). Conclusions The FOUR score performed better than the GCS for exact inter-rater agreement, but not for the clinically more relevant agreement within the range of ± 1 score point. Though neurologists outperformed ICU staff with regard to exact inter-rater agreement, the inter-rater agreement of ICU staff within the clinically more relevant range of ± 1 score point equalled that of the neurologists. 
The small advantage in inter-rater reliability of the FOUR score is most likely insufficient to replace the GCS, a score with a long tradition in intensive care. PMID:20398274
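The two agreement measures compared in this abstract, exact agreement and agreement within ± 1 score point, can be sketched as simple proportions over paired ratings. The function name `agreement_rates` and the score pairs are hypothetical and serve only to illustrate the calculation.

```python
def agreement_rates(paired_scores, tolerance=1):
    """Return (exact agreement, agreement within +/- tolerance) as proportions."""
    if not paired_scores:
        raise ValueError("need at least one pair of ratings")
    n = len(paired_scores)
    exact = sum(a == b for a, b in paired_scores) / n
    within = sum(abs(a - b) <= tolerance for a, b in paired_scores) / n
    return exact, within


# Hypothetical coma scores assigned to four patients by two raters.
pairs = [(15, 15), (14, 13), (10, 12), (8, 8)]
print(agreement_rates(pairs))
```

Neither proportion is corrected for chance agreement, which is worth keeping in mind when comparing them with kappa values reported elsewhere.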
Nutrition Environment Measures Survey in stores (NEMS-S): development and evaluation.
Glanz, Karen; Sallis, James F; Saelens, Brian E; Frank, Lawrence D
2007-04-01
Eating, or nutrition, environments are believed to contribute to obesity and chronic diseases. There is a need for valid, reliable measures of nutrition environments. This article reports on the development and evaluation of measures of nutrition environments in retail food stores. The Nutrition Environment Measures Study developed observational measures of the nutrition environment within retail food stores (NEMS-S) to assess availability of healthy options, price, and quality. After pretesting, measures were completed by independent raters to evaluate inter-rater reliability and across two occasions to assess test-retest reliability in grocery and convenience stores in four neighborhoods differing on income and community design in the Atlanta metropolitan area. Data were collected and analyzed in 2004 and 2005. Ten food categories (e.g., fruits) or indicator food items (e.g., ground beef) were evaluated in 85 stores. Inter-rater reliability and test-retest reliability of availability were high: inter-rater reliability kappas were 0.84 to 1.00, and test-retest reliabilities were .73 to 1.00. Inter-rater reliability for quality across fresh produce was moderate (kappas, 0.44 to 1.00). Healthier options were higher priced for hot dogs, lean ground beef, and baked chips. More healthful options were available in grocery than convenience stores and in stores in higher income neighborhoods. The NEMS-S tool was found to have a high degree of inter-rater and test-retest reliability, and to reveal significant differences across store types and neighborhoods of high and low socioeconomic status. These observational measures of nutrition environments can be applied in multilevel studies of community nutrition, and can inform new approaches to conducting and evaluating nutrition interventions.
Aerts, Frank; Carrier, Kathy; Alwood, Becky
2016-01-01
Background: The assessment of clinical manifestation of muscle fatigue is an effective procedure in establishing therapeutic exercise dose. Few studies have evaluated physical therapist reliability in establishing muscle fatigue through detection of changes in quality of movement patterns in a live setting. Objective: The purpose of this study is to evaluate the inter-rater reliability of physical therapists’ ability to detect altered movement patterns due to muscle fatigue. Design: A reliability study in a live setting with multiple raters. Participants: Forty-four healthy individuals (ages 19-35) were evaluated by six physical therapists in a live setting. Methods: Participants were evaluated by physical therapists for altered movement patterns during resisted shoulder rotation. Each participant completed a total of four tests: right shoulder internal rotation, right shoulder external rotation, left shoulder internal rotation and left shoulder external rotation. Results: For all tests combined, the inter-rater reliability for a single rater scoring, ICC (2,1), was .65 (95% CI, .60 to .71). This corresponds to moderate inter-rater reliability between physical therapists. Limitations: The results of this study apply only to healthy participants and therefore cannot be generalized to a symptomatic population. Conclusion: Moderate inter-rater reliability was found between physical therapists in establishing muscle fatigue through the observation of sustained altered movement patterns during dynamic resistive shoulder internal and external rotation. PMID:27347241
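ICC (2,1) in the Shrout and Fleiss notation is the two-way random-effects, absolute-agreement, single-rater coefficient. The sketch below derives it from the two-way ANOVA mean squares, assuming a complete subjects-by-raters table; the function name `icc_2_1` is illustrative, and the example table is the six-subject, four-judge data set from Shrout and Fleiss (1979), not data from this study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a complete table: one row per subject, one column per rater."""
    n = len(scores)
    if n < 2 or len(scores[0]) < 2 or any(len(r) != len(scores[0]) for r in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Example table: six subjects (rows) rated by four judges (columns).
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(table), 2))
```

Because this form penalises systematic differences between raters, a rater who scores consistently higher than the others lowers ICC (2,1) even when all raters rank the subjects identically.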
Pattyn, Elise; Rajendran, Dévan
2014-04-01
Practitioners traditionally use observation to classify the position of patients' anatomical landmarks. This information may contribute to diagnosis and patient management. To calculate a) Inter-rater reliability of categorising the sagittal plane position of four anatomical landmarks (lateral femoral epicondyle, greater trochanter, mastoid process and acromion) on side-view photographs (with landmarks highlighted and not-highlighted) of anonymised subjects; b) Intra-rater reliability; c) Individual landmark inter-rater reliability; d) Validity against a 'gold standard' photograph. Online inter- and intra-rater reliability study. Photographed subjects: convenience sample of asymptomatic students; raters: randomly selected UK registered osteopaths. 40 photographs of 30 subjects were used, a priori clinically acceptable reliability was ≥0.4. Inter-rater arm: 20 photographs without landmark highlights plus 10 with highlights; Intra-rater arm: 10 duplicate photographs (non-highlighted landmarks). Validity arm: highlighted landmark scores versus 'gold standard' photographs with vertical line. Research ethics approval obtained. Osteopaths (n = 48) categorised landmark position relative to imagined vertical-line; Gwet's Agreement Coefficient 1 (AC1) calculated and chance-corrected coefficient benchmarked against Landis and Koch's scale; Validity calculation used Kendall's tau-B. Inter-rater reliability was 'fair' (AC1 = 0.342; 95% confidence interval (CI) = 0.279-0.404) for non-highlighted landmarks and 'moderate' (AC1 = 0.700; 95% CI = 0.596-0.805) for highlighted landmarks. Intra-rater reliability was 'fair' (AC1 = 0.522); range was 'poor' (AC1 = 0.160) to 'substantial' (AC1 = 0.896). No differences were found between individual landmarks. Validity was 'low' (TB = 0.327; p = 0.104). Both inter- and intra-rater reliability was 'fair' but below clinically acceptable levels, validity was 'low'. 
Together these results challenge the clinical practice of using observation to categorise anterio-posterior landmark position. Copyright © 2014 Elsevier Ltd. All rights reserved.
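Gwet's AC1, used above, is a chance-corrected agreement coefficient that is less sensitive to skewed category frequencies than kappa. The sketch below is the two-rater form only, which is a simplifying assumption: the study pooled 48 raters and would have needed the multi-rater form. The function name `gwet_ac1` and the category labels are hypothetical.

```python
def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters over a fixed list of possible categories."""
    n = len(rater_a)
    q = len(categories)
    if n == 0 or n != len(rater_b) or q < 2:
        raise ValueError("need equally long, non-empty ratings and >= 2 categories")
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the average share of each category across both raters.
    chance = 0.0
    for category in categories:
        share = (rater_a.count(category) + rater_b.count(category)) / (2 * n)
        chance += share * (1.0 - share)
    chance /= q - 1
    return (observed - chance) / (1.0 - chance)


# Hypothetical landmark positions judged by two raters on ten photographs.
first = ["anterior"] * 4 + ["posterior"] * 6
second = ["anterior"] * 3 + ["posterior"] * 6 + ["anterior"]
print(round(gwet_ac1(first, second, ["anterior", "posterior"]), 3))
```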
Suter, Basil; Testa, Enrique; Stämpfli, Patrick; Konala, Praveen; Rasch, Helmut; Friederich, Niklaus F; Hirschmann, Michael T
2015-03-20
The introduction of a standardized SPECT/CT algorithm including a localization scheme, which allows accurate identification of specific patterns and thresholds of SPECT/CT tracer uptake, could lead to a better understanding of the bone remodeling and specific failure modes of unicondylar knee arthroplasty (UKA). The purpose of the present study was to introduce a novel standardized SPECT/CT algorithm for patients after UKA and evaluate its clinical applicability, usefulness and inter- and intra-observer reliability. Tc-HDP-SPECT/CT images of consecutive patients (median age 65, range 48-84 years) with 21 knees after UKA were prospectively evaluated. The tracer activity on SPECT/CT was localized using a specific standardized UKA localization scheme. For tracer uptake analysis (intensity and anatomical distribution pattern) a 3D volumetric quantification method was used. The maximum intensity values were recorded for each anatomical area. In addition, ratios between the respective value in the measured area and the background tracer activity were calculated. The femoral and tibial component position (varus-valgus, flexion-extension, internal and external rotation) was determined in 3D-CT. The inter- and intraobserver reliability of the localization scheme, grading of the tracer activity and component measurements were determined by calculating the intraclass correlation coefficients (ICC). The localization scheme, grading of the tracer activity and component measurements showed high inter- and intra-observer reliabilities for all regions (tibia, femur and patella). For measurement of component position there was strong agreement between the readings of the two observers; the ICC for the orientation of the femoral component was 0.73-1.00 (intra-observer reliability) and 0.91-1.00 (inter-observer reliability). The ICC for the orientation of the tibial component was 0.75-1.00 (intra-observer reliability) and 0.77-1.00 (inter-observer reliability). 
The SPECT/CT algorithm presented combining the mechanical information on UKA component position, alignment and metabolic data is highly reliable and proved to be a valuable, consistent and useful tool for analysing postoperative knees after UKA. Using this standardized approach in clinical studies might be helpful in establishing the diagnosis in patients with pain after UKA.
Reliability of movement control tests in the lumbar spine
Luomajoki, Hannu; Kool, Jan; de Bruin, Eling D; Airaksinen, Olavi
2007-01-01
Background: Movement control dysfunction [MCD] reduces active control of movements. Patients with MCD might form an important subgroup among patients with non-specific low back pain. The diagnosis is based on the observation of active movements. Although these tests are widely used clinically, only a few studies have been performed to determine their reliability. The aim of this study was to determine the inter- and intra-observer reliability of movement control dysfunction tests of the lumbar spine. Methods: We videoed 27 patients with non-specific low back pain and 13 patients with other diagnoses but without back pain performing a standardized test battery of 10 active movement tests for motor control. Four physiotherapists independently rated test performances as correct or incorrect per observation, blinded to all other patient information and to each other. The study was conducted in a private physiotherapy outpatient practice in Reinach, Switzerland. Kappa coefficients, percentage agreements and confidence intervals for inter- and intra-rater results were calculated. Results: The kappa values for inter-tester reliability ranged from 0.24 to 0.71. Six tests out of ten showed substantial reliability [k > 0.6]. Intra-tester reliability ranged from 0.51 to 0.96; all tests but one showed substantial reliability [k > 0.6]. Conclusion: Physiotherapists were able to reliably rate most of the tests in this series of motor control tasks as being performed correctly or not, by viewing films of patients with and without back pain performing the task. PMID:17850669
Whitfield, Richard H; Newcombe, Robert G; Woollard, Malcolm
2003-12-01
The introduction of the European Resuscitation Guidelines (2000) for cardiopulmonary resuscitation (CPR) and automated external defibrillation (AED) prompted the development of an up-to-date and reliable method of assessing the quality of performance of CPR in combination with the use of an AED. The Cardiff Test of basic life support (BLS) and AED version 3.1 was developed to meet this need and uses standardised checklists to retrospectively evaluate performance from analyses of video recordings and data drawn from a laptop computer attached to a training manikin. This paper reports the inter- and intra-observer reliability of this test. Data used to assess reliability were obtained from an investigation of CPR and AED skill acquisition in a lay responder AED training programme. Six observers were recruited to evaluate performance in 33 data sets, repeating their evaluation after a minimum interval of 3 weeks. More than 70% of the 42 variables considered in this study had a kappa score of 0.70 or above for inter-observer reliability or were drawn from computer data and therefore not subject to evaluator variability. 85% of the 42 variables had kappa scores for intra-observer reliability of 0.70 or above or were drawn from computer data. The standard deviations for inter- and intra-observer measures of time to first shock were 11.6 and 7.7 s, respectively. The inter- and intra-observer reliability for the majority of the variables in the Cardiff Test of BLS and AED version 3.1 is satisfactory. However, reliability is less acceptable with respect to shaking when checking for responsiveness, initial check/clearing of the airway, checks for signs of circulation, time to first shock and performance of interventions in the correct sequence. Further research is required to determine if modifications to the method of assessing these variables can increase reliability.
Figueroa, José; Guarachi, Juan Pablo; Matas, José; Arnander, Magnus; Orrego, Mario
2016-04-01
Computed tomography (CT) is widely used to assess component rotation in patients with poor results after total knee arthroplasty (TKA). The purpose of this study was to simultaneously determine the accuracy and reliability of CT in measuring TKA component rotation. TKA components were implanted in dry-bone models and assigned to two groups. The first group (n = 7) had variable femoral component rotations, and the second group (n = 6) had variable tibial tray rotations. CT images were then used to assess component rotation. Accuracy of CT rotational assessment was determined by mean difference, in degrees, between implanted component rotation and CT-measured rotation. Intraclass correlation coefficient (ICC) was applied to determine intra-observer and inter-observer reliability. Femoral component accuracy showed a mean difference of 2.5° and the tibial tray a mean difference of 3.2°. There was good intra- and inter-observer reliability for both components, with a femoral ICC of 0.8 and 0.76, and tibial ICC of 0.68 and 0.65, respectively. CT rotational assessment accuracy can differ from true component rotation by approximately 3° for each component. It does, however, have good inter- and intra-observer reliability.
Suzuki, T; Sato, Y; Sotome, S; Arai, H; Arai, A; Yoshida, H
2017-06-01
This study was designed to investigate the reliability and validity of measurements of finger diameters with a ring gauge. A reliability study enrolled two independent samples (50 participants and seven examiners in Study I; 26 participants and 26 examiners in Study II). The sizes of each participant's little fingers were measured twice with a ring gauge by each examiner. To investigate the validity of the measurements, five hand therapists compared the finger size and hand volume of 30 participants with the ring gauge and with a figure-of-eight technique (Study III). The intra-class correlation coefficient for intra-observer reliability ranged from 0.97 to 0.99 in Study I, and 0.90 to 0.97 in Study II. The intra-class correlation coefficient for inter-observer reliability was 0.95 in Study I and 0.94 in Study II. The validity study showed a Pearson product moment correlation coefficient of 0.75. The ring gauge showed high reliability and validity for measurement of finger size. Level of evidence: III, diagnostic.
Groth, M; Forkert, N D; Buhk, J H; Schoenfeld, M; Goebell, E; Fiehler, J
2013-02-01
To compare intra- and inter-observer reliability of aneurysm measurements obtained by a 3D computer-aided technique with standard manual aneurysm measurements in different imaging modalities. A total of 21 patients with 29 cerebral aneurysms were studied. All patients underwent digital subtraction angiography (DSA), contrast-enhanced (CE-MRA) and time-of-flight magnetic resonance angiography (TOF-MRA). Aneurysm neck and depth diameters were manually measured by two observers in each modality. Additionally, semi-automatic computer-aided diameter measurements were performed using 3D vessel surface models derived from CE- (CE-com) and TOF-MRA (TOF-com) datasets. Bland-Altman analysis (BA) and intra-class correlation coefficient (ICC) were used to evaluate intra- and inter-observer agreement. BA revealed the narrowest relative limits of intra- and inter-observer agreement for aneurysm neck and depth diameters obtained by TOF-com (ranging between ±5.3 % and ±28.3 %) and CE-com (ranging between ±23.3 % and ±38.1 %). Direct measurements in DSA, TOF-MRA and CE-MRA showed considerably wider limits of agreement. The highest ICCs were observed for TOF-com and CE-com (ICC values, 0.92 or higher for intra- as well as inter-observer reliability). Computer-aided aneurysm measurement in 3D offers improved intra- and inter-observer reliability and a reproducible parameter extraction, which may be used in clinical routine and as objective surrogate end-points in clinical trials.
Neri, T; Barthelemy, R; Tourné, Y
2017-12-01
Among radiographic views available for assessing hindfoot alignment, the antero-posterior weight-bearing view with metal cerclage of the hindfoot (Méary view) is the most widely used in France. Internationally, the long axial view (LAV) and hindfoot alignment view (HAV) are also used. The objective of this study was to compare the reliability of these three views. The hypothesis was that the Méary view with cerclage of the hindfoot is as reliable as the LAV and HAV for assessing hindfoot alignment. All three views were obtained in each of 22 prospectively included patients. Intra-observer and inter-observer reliabilities were assessed by having two observers collect the radiographic measurements, then computing the intra-class correlation coefficients (ICCs). The intra-observer and inter-observer ICCs were 0.956 and 0.988 with the Méary view, 0.990 and 0.765 with the HAV, and 0.997 and 0.991 with the LAV, respectively. Correlations were far stronger between the LAV and HAV than between each of these and the Méary view. Compared to the LAV and HAV, the Méary view indicated a greater degree of hindfoot valgus. Intra-observer reliability was excellent with both the LAV and HAV, whereas inter-observer reliability was better with the LAV. Excellent reliability was also obtained with the Méary view. Combining the Méary view, to obtain a radiographic image of the clinical deformity, with the LAV, to measure the angular deviation of the hindfoot axis, may be useful when assessing hindfoot malalignment. A comparison of the three views in a larger population is needed before clinical recommendations can be made. Level of evidence: II, prospective study. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Park, Jung-Keun; Boyer, Jon; Tessler, Jamie; Casey, Jeffrey; Schemm, Linda; Gore, Rebecca; Punnett, Laura
2009-07-01
This study examined the inter-rater reliability of expert observations of ergonomic risk factors by four analysts. Ten jobs were observed at a hospital using a newly expanded version of the PATH method (Buchholz et al. 1996), to which selected upper extremity exposures had been added. Two of the four raters simultaneously observed each worker onsite for a total of 443 observation pairs containing 18 categorical exposure items each. For most exposure items, kappa coefficients were 0.4 or higher. For some items, agreement was higher both for the jobs with less rapid hand activity and for the analysts with a higher level of ergonomic job analysis experience. These upper extremity exposures could be characterised reliably with real-time observation, given adequate experience and training of the observers. The revised version of PATH is applicable to the analysis of jobs where upper extremity musculoskeletal strain is of concern.
Foppen, Wouter; van der Schaaf, Irene C; Beek, Frederik J A; Verkooijen, Helena M; Fischer, Kathelijn
2016-06-01
The radiological Pettersson score (PS) is widely applied for classification of arthropathy to evaluate costly haemophilia treatment. This study aims to assess and improve inter- and intra-observer reliability and agreement of the PS. Two series of X-rays (bilateral elbows, knees, and ankles) of 10 haemophilia patients (120 joints) with haemophilic arthropathy were scored by three observers according to the PS (maximum score 13/joint). Subsequently, (dis-)agreement in scoring was discussed until consensus. Example images were collected in an atlas. Thereafter, a second series of 120 joints was scored using the atlas. One observer rescored the second series after three months. Reliability was assessed by intraclass correlation coefficients (ICC), agreement by limits of agreement (LoA). Median Pettersson score at joint level (PSjoint) of affected joints was 6 (interquartile range 3-9). Using the consensus atlas, inter-observer reliability of the PSjoint improved significantly from 0.94 (95 % confidence interval (CI) 0.91-0.96) to 0.97 (CI 0.96-0.98). LoA improved from ±1.7 to ±1.1 for the PSjoint. Therefore, true differences in arthropathy were differences in the PSjoint of >2 points. Intra-observer reliability of the PSjoint was 0.98 (CI 0.97-0.98); intra-observer LoA were ±0.9 points. Reliability and agreement of the PS improved by using a consensus atlas. • Reliability of the Pettersson score significantly improved using the consensus atlas. • The presented consensus atlas improved the agreement among observers. • The consensus atlas could be recommended to obtain a reproducible Pettersson score.
Haga, Nienke; van der Heijden-Maessen, Hélène C; van Hoorn, Jessika F; Boonstra, Anne M; Hadders-Algra, Mijna
2007-12-01
Objective: To investigate the test-retest, inter-, and intraobserver reliability of the Quality of Upper Extremity Skills Test (QUEST) in young children with cerebral palsy (CP). Design: For test-retest reliability, a test-retest design was used; for the intra- and interobserver reliability, the videotaped test was scored on 2 occasions by 1 observer and by various observers. Setting: Groups of preschool-age children in 2 general rehabilitation centers. Participants: Twenty-one children with CP (12 boys, 9 girls) aged 2 to 4.5 years (mean, 39 mo). Interventions: Not applicable. Main outcome measure: Spearman correlation coefficient. Results: The data indicated that test-retest reliability was strong (rho range, .85-.94). Intraobserver agreement (rho range, .63-.95) and agreement between various observers (rho range, .72-.90) were moderate to strong. Conclusions: Test-retest and inter- and intraobserver reliability of the QUEST in preschool-age children with CP is good.
[A systematic social observation tool: methods and results of inter-rater reliability].
Freitas, Eulilian Dias de; Camargos, Vitor Passos; Xavier, César Coelho; Caiaffa, Waleska Teixeira; Proietti, Fernando Augusto
2013-10-01
Systematic social observation has been used as a health research methodology for collecting information from the neighborhood physical and social environment. The objectives of this article were to describe the operationalization of direct observation of the physical and social environment in urban areas and to evaluate the instrument's reliability. The systematic social observation instrument was designed to collect information in several domains. A total of 1,306 street segments belonging to 149 different neighborhoods in Belo Horizonte, Minas Gerais, Brazil, were observed. For the reliability study, 149 segments (1 per neighborhood) were re-audited, and Fleiss kappa was used to assess inter-rater agreement. Mean agreement was 0.57 (SD = 0.24); 53% had substantial or almost perfect agreement, and 20.4%, moderate agreement. The instrument appears to be appropriate for observing neighborhood characteristics that are not time-dependent, especially urban services, property characterization, pedestrian environment, and security.
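For readers unfamiliar with the statistic named in the abstract above, the following is a minimal sketch of how Fleiss' kappa can be computed from a table of category counts. The function name `fleiss_kappa`, the input layout (one row per audited item, one column per category) and the toy counts are assumptions made for illustration; they are not taken from the study, which does not report its raw ratings.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Hypothetical helper for illustration; every item must be rated
    by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    n_categories = len(counts[0])
    total_ratings = n_items * n_raters

    # Overall proportion of ratings that fell into each category.
    p_cat = [sum(row[j] for row in counts) / total_ratings
             for j in range(n_categories)]

    # Per-item agreement: share of rater pairs that agree on that item.
    p_item = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in counts]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented example: 4 items, 3 raters, 2 categories, perfect agreement.
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
print(fleiss_kappa(perfect))
```

Values are commonly read against verbal bands such as "moderate" or "substantial", which is the vocabulary the abstract uses; the cut-offs themselves are a convention and not part of the computation.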
Carpio, B; Brown, B
1993-01-01
The undergraduate nursing degree program (B.Sc.N.) at McMaster University School of Nursing uses small groups, and is learner-centered and problem-based. A study was conducted during the 1991 admissions cycle to determine the initial reliability and validity of the semi-structured personal interview which constitutes the final component of candidate selection for this program. During the interview, three-member teams assess applicant suitability to the program based on six dimensions: applicant motivation, awareness of the program, problem-solving abilities, ability to relate to others, self-appraisal skills, and career goals. Each interviewer assigns the applicant a global rating using a seven-point scale. For the purposes of this study four interviewer teams were randomly selected from the pool of 31 teams to interview four simulated (preprogrammed) applicants. Using two-factor repeated-measures ANOVA to analyze interview ratings, inter-rater and inter-team intraclass correlation coefficients (ICC) were calculated. Inter-team reliability ranged from .64 to .97 for the individual dimensions, and .66 to .89 on global ratings. Inter-rater ICC for the six dimensions ranged from .81 to .99, and .96 to .99 for the global ratings. The item-to-total correlation coefficients between individual dimensions and global ratings ranged from .8 to 1.0. Pearson correlations between items ranged from .77 to 1.0. The ICC were then calculated for the interview scores of 108 actual applicants to the program. Inter-rater reliability based on global ratings was .79 for the single (1 rater) observation, and .91 for the multiple (3 rater) observation. These findings support the continued use of the interview as a reliable instrument with face validity. Studies of predictive validity will be undertaken.
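The relationship between the single-rater and three-rater reliabilities reported above can be illustrated with the Spearman-Brown formula. This is a sketch under the assumption that the two figures are linked in this way; the abstract does not say how the three-rater value was obtained, and the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Projected reliability of the mean of n_raters ratings, given the
    reliability of a single rating (Spearman-Brown formula).

    Illustrative helper; not described in the source abstract.
    """
    r = single_rater_reliability
    return n_raters * r / (1.0 + (n_raters - 1) * r)


# Single-rater value reported in the abstract, projected to 3 raters.
projected = spearman_brown(0.79, 3)
print(round(projected, 3))
```

The projection lands close to the reported three-rater value of .91; a small gap of this size could come from rounding of the single-rater figure or from a different estimator, so it should not be read as a discrepancy in the study.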
Reliability of two social cognition tests: The combined stories test and the social knowledge test.
Thibaudeau, Élisabeth; Cellard, Caroline; Legendre, Maxime; Villeneuve, Karèle; Achim, Amélie M
2018-04-01
Deficits in social cognition are common in psychiatric disorders. Validated social cognition measures with good psychometric properties are necessary to assess and target social cognitive deficits. Two recent social cognition tests, the Combined Stories Test (COST) and the Social Knowledge Test (SKT), respectively assess theory of mind and social knowledge. Previous studies have shown good psychometric properties for these tests, but the test-retest reliability has never been documented. The aim of this study was to evaluate the test-retest reliability and the inter-rater reliability of the COST and the SKT. The COST and the SKT were administered twice to a group of forty-two healthy adults, with a delay of approximately four weeks between the assessments. Excellent test-retest reliability was observed for the COST, and good test-retest reliability was observed for the SKT. There was no evidence of a practice effect. Furthermore, excellent inter-rater reliability was observed for both tests. This study shows good reliability of the COST and the SKT, which adds to the good validity previously reported for these two tests. These good psychometric properties thus support the COST and the SKT as adequate measures for the assessment of social cognition. Copyright © 2018. Published by Elsevier B.V.
Takasaki, Hiroshi; Okuyama, Kousuke; Rosedale, Richard
2017-02-01
Mechanical Diagnosis and Therapy (MDT) is used in the treatment of extremity problems. Classifying clinical problems is one method of providing effective treatment to a target population. Classification reliability is a key factor to determine the precise clinical problem and to direct an appropriate intervention. This study aimed to explore inter-examiner reliability of the MDT classification for extremity problems in three reliability designs: 1) vignette reliability using surveys with patient vignettes, 2) concurrent reliability, where multiple assessors decide a classification by observing someone's assessment, 3) successive reliability, where multiple assessors independently assess the same patient at different times. The design was a systematic review with data synthesis in a quantitative format. Agreement of MDT subgroups was examined using the Kappa value, with the operational definition of acceptable reliability set at ≥ 0.6. The level of evidence was determined considering the methodological quality of the studies. Six studies were included and all studies met the criteria for high quality. Kappa values for the vignette reliability design (five studies) were ≥ 0.7. There were data from two cohorts in one study for the concurrent reliability design and the Kappa values ranged from 0.45 to 1.0. Kappa values for the successive reliability design (data from three cohorts in one study) were < 0.6. The current review found strong evidence of acceptable inter-examiner reliability of MDT classification for extremity problems in the vignette reliability design, limited evidence of acceptable reliability in the concurrent reliability design and unacceptable reliability in the successive reliability design. Copyright © 2017 Elsevier Ltd. All rights reserved.
Dinsdale, Graham; Moore, Tonia; O'Leary, Neil; Tresadern, Philip; Berks, Michael; Roberts, Christopher; Manning, Joanne; Allen, John; Anderson, Marina; Cutolo, Maurizio; Hesselstrand, Roger; Howell, Kevin; Pizzorni, Carmen; Smith, Vanessa; Sulli, Alberto; Wildt, Marie; Taylor, Christopher; Murray, Andrea; Herrick, Ariane L
2017-07-01
Our aim was to assess the reliability of nailfold capillary assessment in terms of image evaluability, image severity grade ('normal', 'early', 'active', 'late'), capillary density, capillary (apex) width, and presence of giant capillaries, and also to gain further insight into differences in these parameters between patients with systemic sclerosis (SSc), patients with primary Raynaud's phenomenon (PRP) and healthy control subjects. Videocapillaroscopy images (magnification 300×) were acquired from all 10 digits from 173 participants: 101 patients with SSc, 22 with PRP and 50 healthy controls. Ten capillaroscopy experts from 7 European centres evaluated the images. Custom image mark-up software allowed extraction of the following outcome measures: overall grade ('normal', 'early', 'active', 'late', 'non-specific', or 'ungradeable'), capillary density (vessels/mm), mean vessel apical width, and presence of giant capillaries. Observers analysed a median of 129 images each. Evaluability (i.e. the availability of measures) varied across outcome measures (e.g. 73.0% for density and 46.2% for overall grade in patients with SSc). Intra-observer reliability for evaluability was consistently higher than inter- (e.g. for density, intra-class correlation coefficient [ICC] was 0.71 within and 0.14 between observers). Conditional on evaluability, both intra- and inter-observer reliability were high for grade (ICC 0.93 and 0.78 respectively), density (0.91 and 0.64) and width (0.91 and 0.85). Evaluability is one of the major challenges in assessing nailfold capillaries. However, when images are evaluable, the high intra- and inter-reliabilities suggest that overall image grade, capillary density and apex width have potential as outcome measures in longitudinal studies. Copyright © 2017 Elsevier Inc. All rights reserved.
Medina-Mirapeix, Francesc; Vivo-Fernández, Iván; López-Cañizares, Juan; García-Vidal, José A; Benítez-Martínez, Josep Carles; Del Baño-Aledo, María Elena
2018-01-01
The objective was to determine the inter-observer and test-retest reliability of the "Five-repetition sit-to-stand" (5STS) test in patients with total knee replacement (TKR) and to explore the correlation between the 5STS and two mobility tests. A reliability study was conducted among 24 outpatients with TKR (mean age 72.13, S.D. 10.67; 50% were women). They were recruited from a traumatology unit of a public hospital via convenience sampling. A physiotherapist and a trauma physician assessed each patient at the same time. The same physiotherapist performed a second 5STS measurement 45-60 min after the first one. Reliability was assessed with intraclass correlation coefficients (ICCs) and Bland-Altman plots. The Pearson coefficient was calculated to assess the correlation between the 5STS, the timed up and go test (TUG) and four-meter gait speed (4MGS). ICCs for inter-observer and test-retest reliability of the 5STS were 0.998 (95% confidence interval [CI], 0.995-0.999) and 0.982 (95% CI, 0.959-0.992), respectively. The Bland-Altman plot for inter-observer reliability showed limits between -0.82 and 1.06 with a mean of 0.11 and no heteroscedasticity within the data. The Bland-Altman plot for test-retest reliability showed limits between 1.76 and 4.16, a mean of 1.20 and heteroscedasticity within the data. The Pearson correlation coefficient revealed significant correlation between the 5STS and TUG (r=0.7, p<0.001) and 4MGS (r=-0.583, p=0.003). This study demonstrates excellent inter-observer and test-retest reliability of the 5STS when it is used in people with TKR, and also significant correlation with other functional mobility tests. These findings support the use of the 5STS as an outcome measure in the TKR population. Copyright © 2017 Elsevier B.V. All rights reserved.
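The Bland-Altman limits of agreement mentioned in the abstract above can be sketched as follows. This is a minimal illustration assuming the conventional definition (mean difference ± 1.96 standard deviations of the paired differences); the function name `bland_altman_limits` and the sample times are invented and are not study data.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences; the limits are
    bias -/+ 1.96 * sample SD of the differences.
    Hypothetical helper for illustration.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with >= 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Invented 5STS-style times (seconds) from two observers.
observer_1 = [10.0, 12.0, 11.0, 13.0]
observer_2 = [9.0, 12.0, 12.0, 11.0]
bias, lower, upper = bland_altman_limits(observer_1, observer_2)
print(bias, lower, upper)
```

By construction the bias lies midway between the two limits, which is a quick consistency check when reading reported limits of agreement.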
Clinical assessment of effusion in knee osteoarthritis—A systematic review
Maricar, Nasimah; Callaghan, Michael J.; Parkes, Matthew J.; Felson, David T.; O'Neill, Terence W.
2016-01-01
Objective: The aim of this systematic review was to determine the validity and inter- and intra-observer reliability of the assessment of knee joint effusion in osteoarthritis (OA) of the knee. Methods: MEDLINE, Web of Knowledge, CINAHL, EMBASE, and AMED were searched from their inception to February 2015. Articles were included according to a priori defined criteria: samples containing participants with knee OA; prospective evaluation of clinical tests and assessments of knee effusion that included reliability, sensitivity, and specificity of these tests. Results: A total of 10 publications were reviewed. Eight of these considered reliability and four considered the validity of clinical assessments against ultrasound effusion. It was not possible to undertake a meta-analysis of reliability or validity because of differences in study designs and the clinical tests. Intra-observer kappa agreement for visible swelling ranged from 0.37 (suprapatellar) to 1.0 (prepatellar); for bulge sign 0.47 and balloon sign 0.37. Inter-observer kappa agreement for visible swelling ranged from −0.02 (prepatellar) to 0.65 (infrapatellar), the balloon sign −0.11 to 0.82, patellar tap −0.02 to 0.75 and bulge sign kappa −0.04 to 0.14 or reliability coefficient 0.97. Reliability and diagnostic accuracy tended to be better in experienced observers. Very few data addressed the performance of individual clinical tests, with sensitivity ranging 18.2–85.7% and specificity 35.3–93.3%, both higher with larger effusions. Conclusion: The majority of unstandardized clinical tests to assess joint effusion in knee OA had relatively low intra- and inter-observer reliability. There is some evidence that experience improved reliability and diagnostic accuracy of tests. Currently there is insufficient evidence to recommend any particular test in clinical practice. PMID:26581486
Clinical assessment of effusion in knee osteoarthritis-A systematic review.
Maricar, Nasimah; Callaghan, Michael J; Parkes, Matthew J; Felson, David T; O'Neill, Terence W
2016-04-01
The aim of this systematic review was to determine the validity and inter- and intra-observer reliability of the assessment of knee joint effusion in osteoarthritis (OA) of the knee. MEDLINE, Web of Knowledge, CINAHL, EMBASE, and AMED were searched from their inception to February 2015. Articles were included according to a priori defined criteria: samples containing participants with knee OA; prospective evaluation of clinical tests and assessments of knee effusion that included reliability, sensitivity, and specificity of these tests. A total of 10 publications were reviewed. Eight of these considered reliability and four considered the validity of clinical assessments against ultrasound effusion. It was not possible to undertake a meta-analysis of reliability or validity because of differences in study designs and the clinical tests. Intra-observer kappa agreement for visible swelling ranged from 0.37 (suprapatellar) to 1.0 (prepatellar); for bulge sign 0.47 and balloon sign 0.37. Inter-observer kappa agreement for visible swelling ranged from -0.02 (prepatellar) to 0.65 (infrapatellar), the balloon sign -0.11 to 0.82, patellar tap -0.02 to 0.75 and bulge sign kappa -0.04 to 0.14 or reliability coefficient 0.97. Reliability and diagnostic accuracy tended to be better in experienced observers. Very few data addressed the performance of individual clinical tests, with sensitivity ranging 18.2-85.7% and specificity 35.3-93.3%, both higher with larger effusions. The majority of unstandardized clinical tests to assess joint effusion in knee OA had relatively low intra- and inter-observer reliability. There is some evidence that experience improved reliability and diagnostic accuracy of tests. Currently there is insufficient evidence to recommend any particular test in clinical practice. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Santos, Maria P. M.; Rech, Cassiano R.; Alberico, Claudia O.; Fermino, Rogério C.; Rios, Ana P.; David, João; Reis, Rodrigo S.; Sarmiento, Olga L.; McKenzie, Thomas L.; Mota, Jorge
2016-01-01
The app for the System for Observing Play and Recreation in Communities (iSOPARC®) was developed to enhance System for Observing Play and Recreation in Communities data collection and management. The study aim was to examine the usability and inter-rater reliability of iSOPARC®. Trained observers collected data in 16 park areas in two Latin…
Reliability of videotaped observational gait analysis in patients with orthopedic impairments
Brunnekreef, Jaap J; van Uden, Caro JT; van Moorsel, Steven; Kooloos, Jan GM
2005-01-01
Background: In clinical practice, visual gait observation is often used to determine gait disorders and to evaluate treatment. Several reliability studies on observational gait analysis have been described in the literature and generally showed moderate reliability. However, patients with orthopedic disorders have received little attention. The objective of this study is to determine the reliability levels of visual observation of gait in patients with orthopedic disorders. Methods: The gait of thirty patients referred to a physical therapist for gait treatment was videotaped. Ten raters, 4 experienced, 4 inexperienced and 2 experts, individually evaluated these videotaped gait patterns of the patients twice, by using a structured gait analysis form. Reliability levels were established by calculating the Intraclass Correlation Coefficient (ICC), using a two-way random design and based on absolute agreement. Results: The inter-rater reliability among experienced raters (ICC = 0.42; 95%CI: 0.38–0.46) was comparable to that of the inexperienced raters (ICC = 0.40; 95%CI: 0.36–0.44). The expert raters reached a higher inter-rater reliability level (ICC = 0.54; 95%CI: 0.48–0.60). The average intra-rater reliability of the experienced raters was 0.63 (ICCs ranging from 0.57 to 0.70). The inexperienced raters reached an average intra-rater reliability of 0.57 (ICCs ranging from 0.52 to 0.62). The two expert raters attained ICC values of 0.70 and 0.74 respectively. Conclusion: Structured visual gait observation by use of a gait analysis form as described in this study was found to be moderately reliable. Clinical experience appears to increase the reliability of visual gait analysis. PMID:15774012
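The abstract above specifies a two-way random-effects ICC based on absolute agreement. A minimal sketch of the single-rater form of that coefficient, often written ICC(2,1), is given below. Whether the study used the single-rater or the average-rater form is not stated, so the choice here is an assumption, as are the function name `icc_2_1` and the toy scores.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    ratings[i][j] is the score given to subject i by rater j.
    Illustrative helper built from the usual ANOVA mean squares.
    """
    n = len(ratings)        # subjects
    k = len(ratings[0])     # raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-subject mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Invented scores: rater 2 is always one point higher than rater 1.
offset = [[1, 2], [2, 3], [3, 4]]
print(icc_2_1(offset))
```

The toy example shows why the absolute-agreement form matters: a constant offset between raters lowers the coefficient even though the raters rank the subjects identically.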
Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial
Hallgren, Kevin A.
2012-01-01
Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR. PMID:22833776
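The tutorial summarised above gives its computational examples in SPSS and R. For comparison, a minimal Python sketch of Cohen's kappa for two coders is shown below; it is not taken from the paper, and the function name `cohens_kappa` and the example codes are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders rating the same items with
    nominal categories. Hypothetical helper for illustration.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(coder_a)

    # Observed agreement: share of items coded identically.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement from each coder's marginal category frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both coders use one category")
    return (observed - expected) / (1.0 - expected)


# Invented binary codes for 10 items.
a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
b = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(cohens_kappa(a, b))
```

In this toy data the coders agree on 8 of 10 items while chance agreement is 0.5, so kappa is noticeably lower than the raw percentage agreement, which is the main reason the statistic is preferred over simple percent agreement.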
Reliability of the Robinson classification for displaced comminuted midshaft clavicular fractures.
Stegeman, Sylvia A; Fernandes, Nicole C; Krijnen, Pieta; Schipper, Inger B
2015-01-01
This study aimed to assess the reliability of the Robinson classification for displaced comminuted midshaft fractures. A total of 102 surgeons and 52 radiologists classified 15 displaced comminuted midshaft clavicular fractures on anteroposterior (AP) and 30-degree caudocephalad radiographs twice. For both surgeons and radiologists, inter-observer and intra-observer agreement significantly improved after showing the 30-degree caudocephalad view in addition to the AP view. Radiologists had significantly higher inter- and intra-observer agreement than surgeons after judging both radiographs (κmultirater of 0.81 vs. 0.56; κintra-observer of 0.73 vs. 0.44). We advise to use two-plane radiography and to routinely incorporate the Robinson classification in the radiology reports. Copyright © 2015 Elsevier Inc. All rights reserved.
Reliability of the Cooking Task in adults with acquired brain injury.
Poncet, Frédérique; Swaine, Bonnie; Taillefer, Chantal; Lamoureux, Julie; Pradat-Diehl, Pascale; Chevignard, Mathilde
2015-01-01
Acquired brain injury (ABI) often leads to deficits in executive functioning (EF) responsible for severe and long-standing disabilities in daily life activities. The Cooking Task is an ecological and valid test of EF involving multi-tasking in a real environment. Given its complex scoring system, it is important to establish the tool's reliability. The objective of the study was to examine the reliability of the Cooking Task (internal consistency, inter-rater and test-retest reliability). A total of 160 patients with ABI (113 men, mean age 37 years, SD = 14.3) were tested using the Cooking Task. For test-retest reliability, patients were assessed by the same rater on two occasions (mean interval 11 days), while two raters independently and simultaneously observed and scored patients' performances to estimate inter-rater reliability. Internal consistency was high for the global scale (Cronbach α = .74). Inter-rater reliability (n = 66) for total errors was also high (ICC = .93); however, the test-retest reliability (n = 11) was poor (ICC = .36). In general, the Cooking Task appears to be a reliable tool. The low test-retest results were expected given the importance of EF in the performance of novel tasks.
The development of a reliable amateur boxing performance analysis template.
Thomson, Edward; Lamb, Kevin; Nicholas, Ceri
2013-01-01
The aim of this study was to devise a valid performance analysis system for the assessment of the movement characteristics associated with competitive amateur boxing and assess its reliability using analysts of varying experience of the sport and performance analysis. Key performance indicators to characterise the demands of an amateur contest (offensive, defensive and feinting) were developed and notated using a computerised notational analysis system. Data were subjected to intra- and inter-observer reliability assessment using median sign tests and calculating the proportion of agreement within predetermined limits of error. For all performance indicators, intra-observer reliability revealed non-significant differences between observations (P > 0.05) and high agreement was established (80-100%) regardless of whether exact or the reference value of ±1 was applied. Inter-observer reliability was less impressive for both analysts (amateur boxer and experienced analyst), with the proportion of agreement ranging from 33-100%. Nonetheless, there was no systematic bias between observations for any indicator (P > 0.05), and the proportion of agreement within the reference range (±1) was 100%. A reliable performance analysis template has been developed for the assessment of amateur boxing performance and is available for use by researchers, coaches and athletes to classify and quantify the movement characteristics of amateur boxing.
Varga, Zsuzsanna; Cassoly, Estelle; Li, Qiyu; Oehlschlegel, Christian; Tapia, Coya; Lehr, Hans Anton; Klingbiel, Dirk; Thürlimann, Beat; Ruhstaller, Thomas
2015-01-01
Background Proliferative activity (Ki-67 Labelling Index) in breast cancer increasingly serves as an additional tool in the decision for or against adjuvant chemotherapy in midrange hormone receptor positive breast cancer. Ki-67 Index has been previously shown to suffer from high inter-observer variability especially in midrange (G2) breast carcinomas. In this study we conducted a systematic approach using different Ki-67 assessments on large tissue sections in order to identify the method with the highest reliability and the lowest variability. Materials and Methods Five breast pathologists retrospectively analyzed proliferative activity of 50 G2 invasive breast carcinomas using large tissue sections by assessing Ki-67 immunohistochemistry. Ki-67-assessments were done on light microscopy and on digital images following these methods: 1) assessing five regions, 2) assessing only darkly stained nuclei and 3) considering only condensed proliferative areas (‘hotspots’). An individual review (the first described assessment from 2008) was also performed. The assessments on light microscopy were done by estimating. All measurements were performed three times. Inter-observer and intra-observer reliabilities were calculated using the approach proposed by Eliasziw et al. Clinical cutoffs (14% and 20%) were tested using Fleiss’ Kappa. Results There was a good intra-observer reliability in 5 of 7 methods (ICC: 0.76–0.89). The highest inter-observer reliability was fair to moderate (ICC: 0.71 and 0.74) and was reached by 2 methods (region-analysis and individual-review) on light microscopy. Fleiss’ kappa values (14% cut-off) were the highest (moderate) using the original recommendation on light-microscope (Kappa 0.58). Fleiss’ kappa values (20% cut-off) were the highest (Kappa 0.48 each) in analyzing hotspots on light-microscopy and digital-analysis. No methodologies using digital-analysis were superior to the methods on light microscope.
Conclusion Our results show that all methods on light-microscopy for Ki-67 assessment in large tissue sections resulted in a good intra-observer reliability. Region analysis and individual review (the original recommendation) on light-microscopy yielded the highest inter-observer reliability. These results show slight improvement to previously published data on poor-reproducibility and thus might be a practical-pragmatic way for routine assessment of Ki-67 Index in G2 breast carcinomas. PMID:25885288
A Study on the Reliability of Sasang Constitutional Body Trunk Measurement
Jang, Eunsu; Kim, Jong Yeol; Lee, Haejung; Kim, Honggie; Baek, Younghwa; Lee, Siwoo
2012-01-01
Objective. Body trunk measurement in humans plays an important diagnostic role not only in conventional medicine but also in Sasang constitutional medicine (SCM). The Sasang constitutional body trunk measurement (SCBTM) consists of the 5 widths and the 8 circumferences, which are standard locations currently employed in the SCM society. This study examines to what extent comprehensive training can improve the reliability of the SCBTM. Methods. We recruited 10 male subjects and 5 male observers with no experience of anthropometric measurement. We conducted measurements twice, before and after comprehensive training. The relative technical error of measurement (%TEM) was calculated to assess intra- and inter-observer reliability. Results. Post-training intra-observer %TEMs of the SCBTM ranged from 0.27% to 1.85%, reduced from 0.27% to 6.26% before training. Post-training inter-observer %TEMs ranged from 0.56% to 1.66%, reduced from 1.00% to 9.60% before training. Post-training total %TEMs, which represent overall reliability, ranged from 0.68% to 2.18%, reduced from a pre-training maximum of 10.18%. Conclusion. Comprehensive training makes the SCBTM more reliable, hence giving a sufficiently confident diagnostic tool. Comprehensive training is strongly recommended before taking the SCBTM. PMID:21822442
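The abstract above reports %TEM values without stating the formula. A minimal sketch follows, assuming the common two-measurement definition, TEM = sqrt(sum(d^2) / 2n), divided by the grand mean and expressed as a percentage. The function name and example values are hypothetical and are not the study's data.

```python
import math


def relative_tem(first, second):
    """Relative technical error of measurement (%TEM) for paired measurements.

    `first` and `second` hold two measurements of the same n subjects
    (two sessions by one observer, or one session each by two observers).
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty measurement lists")
    n = len(first)

    # Absolute TEM from the squared paired differences.
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    tem = math.sqrt(squared_diffs / (2 * n))

    # Express TEM relative to the mean of all 2n measurements.
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    return 100 * tem / grand_mean


# Hypothetical circumferences in cm, measured twice on two subjects.
pct_tem = relative_tem([100.0, 100.0], [102.0, 98.0])
```

With paired differences of 2 cm around a mean of 100 cm, the sketch gives a %TEM of about 1.41%, which falls inside the post-training range quoted in the abstract.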
De Coninck, Kyra; Hambly, Karen; Dickinson, John W; Passfield, Louis
2018-06-01
Chronic lower back pain is still regarded as a poorly understood multifactorial condition. Recently, the thoracolumbar fascia complex has been found to be a contributing factor. Ultrasound imaging has shown that people with chronic lower back pain demonstrate both a significant decrease in shear strain, and a 25% increase in thickness of the thoracolumbar fascia. There is sparse data on whether medical practitioners agree on the level of disorganisation in ultrasound images of thoracolumbar fascia. The purpose of this study was to establish inter-rater reliability of the ranking of architectural disorganisation of thoracolumbar fascia on a scale from 'very disorganised' to 'very organised'. An exploratory analysis was performed using a fully crossed design of inter-rater reliability. Thirty observers were recruited, consisting of 21 medical doctors, 7 physiotherapists and 2 radiologists, with an average of 13.03 ± 9.6 years of clinical experience. All 30 observers independently rated the architectural disorganisation of the thoracolumbar fascia in 30 ultrasound scans, on a Likert-type scale with rankings from 1 = very disorganised to 10 = very organised. Internal consistency was assessed using Cronbach's alpha. Krippendorff's alpha was used to calculate the overall inter-rater reliability. Krippendorff's alpha was .61, indicating a modest degree of agreement between observers on the different morphologies of thoracolumbar fascia. Cronbach's alpha (0.98) indicated that there was a high degree of consistency between observers. Experience in ultrasound image analysis did not affect consistency between observers (Cronbach's alpha for experienced and inexperienced raters: 0.95 and 0.96 respectively). Medical practitioners agree on morphological features such as levels of organisation and disorganisation in ultrasound images of thoracolumbar fascia, regardless of experience.
Further analysis by an expert panel is required to develop specific classification criteria for thoracolumbar fascia.
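Cronbach's alpha, used in the abstract above as a measure of consistency between observers, can be sketched with the standard formula alpha = k/(k-1) * (1 - sum of per-rater variances / variance of the summed scores). Treating each rater as an "item" is an assumption about how the study set up the calculation. The function name and data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters treated as items.

    `ratings` is a list of rows, one per rated object (e.g. a scan),
    each holding one score per rater in a fixed rater order.
    """
    k = len(ratings[0])
    if len(ratings) < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and >= 2 raters")

    # Sample variance of each rater's scores across objects.
    rater_variances = [variance([row[j] for row in ratings]) for j in range(k)]

    # Sample variance of the per-object total score.
    total_variance = variance([sum(row) for row in ratings])

    return k / (k - 1) * (1 - sum(rater_variances) / total_variance)


# Hypothetical scores from two raters on three scans.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

Alpha reflects how consistently raters order the objects, not whether they give the same absolute scores, which is one reason a study can report a high Cronbach's alpha alongside a more modest Krippendorff's alpha. The sketch divides by zero if all total scores are identical.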
2013-01-01
Summary of background data Recent smartphones, such as the iPhone, are often equipped with an accelerometer and magnetometer, which, through software applications, can perform various inclinometric functions. Although these applications are intended for recreational use, they have the potential to measure and quantify range of motion. The purpose of this study was to estimate the intra- and inter-rater reliability as well as the criterion validity of the clinometer and compass applications of the iPhone in the assessment of cervical range of motion in healthy participants. Methods The sample consisted of 28 healthy participants. Two examiners measured cervical range of motion of each participant twice using the iPhone (for the estimation of intra- and inter-rater reliability) and once with the CROM (for the estimation of criterion validity). Estimates of reliability and validity were then established using the intraclass correlation coefficient (ICC). Results We observed a moderate intra-rater reliability for each movement (ICC = 0.65-0.85) but a poor inter-rater reliability (ICC < 0.60). For the criterion validity, the ICCs are moderate (>0.50) to good (>0.65) for movements of flexion, extension, lateral flexions and right rotation, but poor (<0.50) for the movement left rotation. Conclusion We found good intra-rater reliability and lower inter-rater reliability. When compared to the gold standard, these applications showed moderate to good validity. However, before using the iPhone as an outcome measure in clinical settings, studies should be done on patients presenting with cervical problems. PMID:23829201
Validation of different pediatric triage systems in the emergency department
Aeimchanbanjong, Kanokwan; Pandee, Uthen
2017-01-01
BACKGROUND: Triage in children seems to be more challenging than in adults because of their different response to physiological and psychosocial stressors. This study aimed to determine the best triage system in the pediatric emergency department. METHODS: This was a prospective observational study divided into two phases. The first phase determined the inter-rater reliability of five triage systems: Manchester Triage System (MTS), Emergency Severity Index (ESI) version 4, Pediatric Canadian Triage and Acuity Scale (CTAS), Australasian Triage Scale (ATS), and Ramathibodi Triage System (RTS) by triage nurses and pediatric residents. In the second phase, to analyze the validity of each triage system, patients were categorized into two groups, i.e., high acuity patients (triage level 1, 2) and low acuity patients (triage level 3, 4, and 5). Then we compared the triage acuity with actual admission. RESULTS: In phase I, RTS illustrated almost perfect inter-rater reliability with kappa of 1.0 (P<0.01). ESI and CTAS illustrated good inter-rater reliability with kappa of 0.8–0.9 (P<0.01). Meanwhile, ATS and MTS illustrated moderate to good inter-rater reliability with kappa of 0.5–0.7 (P<0.01). In phase II, we included 1,041 participants with average age of 4.7±4.2 years, of which 55% were male and 45% were female. In addition, 32% of the participants had underlying diseases, and 123 (11.8%) patients were admitted. We found that ESI illustrated the most appropriate predicting ability for admission with sensitivity of 52%, specificity of 81%, and AUC 0.78 (95%CI 0.74–0.81). CONCLUSION: RTS illustrated almost perfect inter-rater reliability. Meanwhile, ESI and CTAS illustrated good inter-rater reliability. Finally, ESI illustrated the most appropriate validity as a triage system. PMID:28680520
Intra- and Inter-Observer Reliability of the Trunk Impairment Scale for Children with Cerebral Palsy
ERIC Educational Resources Information Center
Saether, Rannei; Jorgensen, Lone
2011-01-01
Standardized scales to evaluate qualities of trunk movements in children with dysfunction are sparse. An examination of the reliability of scales that may be useful in the clinic is important. The aim of this study was to examine the reliability of the Trunk Impairment Scale (TIS) for children with cerebral palsy (CP). Standardized scales are…
Patange Subba Rao, Sheethal Prasad; Lewis, James; Haddad, Ziad; Paringe, Vishal; Mohanty, Khitish
2014-10-01
The aim of the study was to evaluate inter-observer reliability and intra-observer reproducibility between the three-column classification and Schatzker classification systems using 2D and 3D CT models. Fifty-two consecutive patients with tibial plateau fractures were evaluated by five orthopaedic surgeons. All patients were classified into Schatzker and three-column classification systems using x-rays and 2D and 3D CT images. The inter-observer reliability was evaluated in the first round and the intra-observer reliability was determined during the second round 2 weeks later. The average intra-observer reproducibility for the three-column classification was from substantial to excellent in all sub classifications, as compared with Schatzker classification. The inter-observer kappa values increased from substantial to excellent in three-column classification and to moderate in Schatzker classification. The average values for three-column classification for all the categories are as follows: (I-III) k2D = 0.718, 95% CI 0.554-0.864, p < 0.0001 and average k3D = 0.874, 95% CI 0.754-0.890, p < 0.0001. For Schatzker classification system, the average values for all six categories are as follows: (I-VI) k2D = 0.536, 95% CI 0.365-0.685, p < 0.0001 and average k3D = 0.552, 95% CI 0.405-0.700, p < 0.0001. The values are statistically significant. Statistically significant inter-observer values in both rounds were noted with the three-column classification, indicating excellent agreement. The intra-observer reproducibility for the three-column classification improved as compared with the Schatzker classification. The three-column classification seems to be an effective way to characterise and classify fractures of tibial plateau.
2012-01-01
Background Aims of the present study are the following: 1. to describe the rationale and methodology of the Services and Health for Elderly in Long TERm care (SHELTER) study, a project funded by the European Union, aimed at implementing the interRAI instrument for Long Term Care Facilities (interRAI LTCF) as a tool to assess and gather uniform information about nursing home (NH) residents across different health systems in European countries; 2. to present the results about the test-retest and inter-rater reliability of the interRAI LTCF instrument translated into the languages of participating countries; 3. to illustrate the characteristics of NH residents at study entry. Methods A 12-month prospective cohort study was conducted in 57 NH in 7 EU countries (Czech Republic, England, Finland, France, Germany, Italy, The Netherlands) and 1 non-EU country (Israel). Weighted kappa coefficients were used to evaluate the reliability of interRAI LTCF items. Results Mean age of 4156 residents entering the study was 83.4 ± 9.4 years; 73% were female. ADL disability and cognitive impairment were observed in 81.3% and 68.0% of residents, respectively. Clinical complexity of residents was confirmed by a high prevalence of behavioral symptoms (27.5% of residents), falls (18.6%), pressure ulcers (10.4%), pain (36.0%) and urinary incontinence (73.5%). Overall, 197 of the 198 items tested met or exceeded standard cut-offs for acceptable test-retest and inter-rater reliability after translation into the target languages. Conclusion The interRAI LTCF appears to be a reliable instrument. It enables the creation of databases that can be used to govern the provision of long-term care across different health systems in Europe, to answer relevant research and policy questions and to compare characteristics of NH residents across countries, languages and cultures. PMID:22230771
Inter-rater reliability of an observation-based ergonomics assessment checklist for office workers.
Pereira, Michelle Jessica; Straker, Leon Melville; Comans, Tracy Anne; Johnston, Venerina
2016-12-01
To establish the inter-rater reliability of an observation-based ergonomics assessment checklist for computer workers. A 37-item (38-item if a laptop was part of the workstation) comprehensive observational ergonomics assessment checklist comparable to government guidelines and up to date with empirical evidence was developed. Two trained practitioners assessed full-time office workers performing their usual computer-based work and evaluated the suitability of workstations used. Practitioners assessed each participant consecutively. The order of assessors was randomised, and the second assessor was blinded to the findings of the first. Unadjusted kappa coefficients between the raters were obtained for the overall checklist and subsections that were formed from question-items relevant to specific workstation equipment. Twenty-seven office workers were recruited. The inter-rater reliability between two trained practitioners achieved moderate to good reliability for all except one checklist component. This checklist has mostly moderate to good reliability between two trained practitioners. Practitioner Summary: This reliable ergonomics assessment checklist for computer workers was designed using accessible government guidelines and supplemented with up-to-date evidence. Employers in Queensland (Australia) can fulfil legislative requirements by using this reliable checklist to identify and subsequently address potential risk factors for work-related injury to provide a safe working environment.
Vieira, A; Battini, M; Can, E; Mattiello, S; Stilwell, G
2018-01-08
This study was conducted within the context of the Animal Welfare Indicators (AWIN) project and the underlying scientific motivation for the development of the study was the scarcity of data regarding inter-observer reliability (IOR) of welfare indicators, particularly given the importance of reliability as a further step for developing on-farm welfare assessment protocols. The objective of this study is therefore to evaluate IOR of animal-based indicators (at group and individual-level) of the AWIN welfare assessment protocol (prototype) for dairy goats. In the design of the study, two pairs of observers, one in Portugal and another in Italy, visited 10 farms each and applied the AWIN prototype protocol. Farms in both countries were visited between January and March 2014, and all the observers received the same training before the farm visits were initiated. Data collected during farm visits, and analysed in this study, include group-level and individual-level observations. The results of our study allow us to conclude that most of the group-level indicators presented the highest IOR level ('substantial', 0.85 to 0.99) in both field studies, pointing to a usable set of animal-based welfare indicators that were therefore included in the first level of the final AWIN welfare assessment protocol for dairy goats. Inter-observer reliability of individual-level indicators was lower, but the majority of them still reached 'fair to good' (0.41 to 0.75) and 'excellent' (0.76 to 1) levels. In the paper we explore reasons for the differences found in IOR between the group and individual-level indicators, including how the number of individual-level indicators to be assessed on each animal and the restraining method may have affected the results. Furthermore, we discuss the differences found in the IOR of individual-level indicators in both countries: the Portuguese pair of observers reached a higher level of IOR, when compared with the Italian observers. 
We argue how the reasons behind these differences may stem from the restraining method applied, or the different background and experience of the observers. Finally, the discussion of the results emphasizes the importance of considering that reliability is not an absolute attribute of an indicator, but derives from an interaction between the indicators, the observers and the situation in which the assessment is taking place. This highlights the importance of further considering the indicators' reliability while developing welfare assessment protocols.
Michaelsen, Stella M; Rocha, André S; Knabben, Rodrigo J; Rodrigues, Luciano P; Fernandes, Claudia G C
2011-01-01
Recently, the reliability of the Brazilian version of the Fugl-Meyer Assessment (FMA) was assessed through the scoring given according to observations made by a single evaluator who applied the test. When different raters apply the scale, the reliability may depend on the interpretation given to the assessment sheet. In such cases, a clear administration manual is essential for ensuring homogeneity of application. To translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate the inter-rater reliability when different evaluators apply the FMA on the basis of the information contained in the manual. Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part in this study. Eight patients participated in the first part of the study and 10 in the second part. Based on analyzing the results from part 1, an adapted version was developed, in which information and photos were added to illustrate the positions of the patient and evaluator. The inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The reliability of the FMA based on the adapted version of the manual was excellent for the total motor scores for the upper limbs (ICC=0.98) and lower limbs (ICC=0.90), as well as for movement sense (ICC=0.98) and upper and lower-limb passive range of motion (ICC=0.84 and 0.90, respectively). The reliability was moderate for tactile sensitivity (0.75). The joint pain assessment presented low reliability. The results showed that, except for pain assessment, application of the FMA based on the adapted version of the application manual for Brazilian Portuguese presented adequate inter-rater reliability.
McIver, Kerry L.; Brown, William H.; Pfeiffer, Karin A.; Dowda, Marsha; Pate, Russell R.
2016-01-01
Purpose This study describes the development and pilot testing of the Observational System for Recording Physical Activity-Elementary School (OSRAC-E) version. Methods This system was developed to observe and document the levels and types of physical activity and physical and social contexts of physical activity in elementary school students during the school day. Inter-observer agreement scores and summary data were calculated. Results All categories had Kappa statistics above 0.80, with the exception of the activity initiator category. Inter-observer agreement scores were 96% or greater. The OSRAC-E was shown to be a reliable observation system that allows researchers to assess physical activity behaviors, the contexts of those behaviors, and the effectiveness of physical activity interventions in the school environment. Conclusion The OSRAC-E can yield data with high interobserver reliability and provide relatively extensive contextual information about physical activity of students in elementary schools. PMID:26889587
Mischiati, Carolina R.; Comerford, Mark; Gosford, Emma; Swart, Jacqueline; Ewings, Sean; Botha, Nadine; Stokes, Maria; Mottram, Sarah L.
2015-01-01
Pre-season screening is well established within the sporting arena, and aims to enhance performance and reduce injury risk. With the increasing need to identify potential injury with greater accuracy, a new risk assessment process has been produced; The Performance Matrix (battery of movement control tests). As with any new method of objective testing, it is fundamental to establish whether the same results can be reproduced between examiners and by the same examiner on consecutive occasions. This study aimed to determine the intra-rater test re-test and inter-rater reliability of tests from a component of The Performance Matrix, The Foundation Matrix. Twenty participants were screened by two experienced musculoskeletal therapists using nine tests to assess the ability to control movement during specific tasks. Movement evaluation criteria for each test were rated as pass or fail. The therapists observed participants real-time and tests were recorded on video to enable repeated ratings four months later to examine intra-rater reliability (videos rated two weeks apart). Overall test percentage agreement was 87% for inter-rater reliability; 98% Rater 1, 94% Rater 2 for test re-test reliability; and 75% for real-time versus video. Intraclass-correlation coefficients (ICCs) were excellent between raters (0.81) and within raters (Rater 1, 0.96; Rater 2, 0.88) but poor for real-time versus video (0.23). Reliability for individual components of each test was more variable: inter-rater, 68-100%; intra-rater, 88-100% Rater 1, 75-100% Rater 2; and real-time versus video 31-100%. Cohen’s Kappa values for inter-rater reliability were 0.0-1.0; intra-rater 0.6-1.0 for Rater 1; -0.1-1.0 for Rater 2; and -0.1-1 for real-time versus video. It is concluded that both inter and intra-rater reliability of tests in The Foundation Matrix are acceptable when rated by experienced therapists. 
Recommendations are made for modifying some of the criteria to improve reliability where excellence was not reached. Key points: The movement control tests of The Foundation Matrix had acceptable reliability between raters and within raters on different days. Agreement between observations made on tests performed real-time and on video recordings was low, indicating poor validity of use of video recordings. Some movement evaluation criteria related to specific tests that did not achieve excellent agreement could be modified to improve reliability. PMID:25983594
Steidle-Kloc, E; Wirth, W; Ruhdorfer, A; Dannhauer, T; Eckstein, F
2016-03-01
The infra-patellar fat pad (IPFP), as intra-articular adipose tissue, represents a potential source of pro-inflammatory cytokines, and its size has been suggested to be associated with osteoarthritis (OA) of the knee. This study examines inter- and intra-observer reliability of fat-suppressed (fs) and non-fat-suppressed (nfs) MR imaging for determination of IPFP morphological measurements as novel biomarkers. The IPFP of nine right knees of healthy Osteoarthritis Initiative participants was segmented by five readers, using fs and nfs baseline sagittal MRIs. The intra-observer reliability was determined from baseline and 1-year follow-up images. All segmentations were quality controlled (QC) by an expert reader. Reliability was expressed as root mean square coefficient of variation (RMS CV%). After QC, the inter-observer reliability for fs (nfs) imaging was 2.0% (1.1%) for IPFP volume, 2.1%/2.5% (1.6%/1.8%) for anterior/posterior surface areas, 1.8% (1.8%) for depth, and 2.1% (2.4%) for maximum sagittal area. The intra-observer reliability was 3.1% (5.0%) for volume, 2.3%/2.8% (2.5%/2.9%) for anterior/posterior surfaces, 1.9% (3.5%) for depth, and 3.3% (4.5%) for maximum sagittal area. IPFP volume from nfs images was systematically greater (+7.3%) than from fs images, but highly correlated (r=0.98). The results suggest that quantitative measurements of IPFP morphology can be performed with satisfactory reliability when expert QC is implemented. The IPFP is more clearly depicted in nfs images, and there is a small systematic off-set versus analysis from fs images. However, the high linear relationship between fs and nfs imaging suggests that fs images can be used to analyze IPFP morphology when nfs images are not available.
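The abstract above expresses reliability as a root mean square coefficient of variation (RMS CV%). A minimal sketch of that statistic follows, assuming each knee's CV is the sample standard deviation of the repeated measurements divided by their mean. The function name and example values are hypothetical and are not the study's data.

```python
import math
from statistics import mean, stdev


def rms_cv_percent(measurements):
    """Root mean square coefficient of variation, in percent.

    `measurements` is a list of rows, one per subject (e.g. a knee), each
    holding the repeated measurements of that subject (e.g. one per reader).
    """
    if not measurements or any(len(row) < 2 for row in measurements):
        raise ValueError("each subject needs at least two measurements")

    # Per-subject CV, then the square root of the mean squared CV.
    squared_cvs = [(stdev(row) / mean(row)) ** 2 for row in measurements]
    return 100 * math.sqrt(mean(squared_cvs))


# Hypothetical volumes (cm^3) for two knees, each segmented by two readers.
rms_cv = rms_cv_percent([[10.0, 10.0], [9.0, 11.0]])
```

Averaging squared CVs rather than raw CVs weights larger disagreements more heavily, so a single poorly reproduced knee raises the summary value noticeably.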
Steidle-Kloc, E.; Wirth, W.; Ruhdorfer, A.; Dannhauer, T.; Eckstein, F.
2015-01-01
The infra-patellar fat pad (IPFP), as intra-articular adipose tissue, represents a potential source of pro-inflammatory cytokines, and its size has been suggested to be associated with osteoarthritis (OA) of the knee. This study examines inter- and intra-observer reliability of fat-suppressed (fs) and non-fat-suppressed (nfs) MR imaging for determination of IPFP morphological measurements as novel biomarkers. The IPFP of nine right knees of healthy Osteoarthritis Initiative participants was segmented by five readers, using fs and nfs baseline sagittal MRIs. The intra-observer reliability was determined from baseline and 1-year follow-up images. All segmentations were quality controlled (QC) by an expert reader. Reliability was expressed as root mean square coefficient of variation (RMS CV%). After QC, the inter-observer reliability for fs (nfs) imaging was 2.0% (1.1%) for IPFP volume, 2.1%/2.5% (1.6%/1.8%) for anterior/posterior surface areas, 1.8% (1.8%) for depth, and 2.1% (2.4%) for maximum sagittal area. The intra-observer reliability was 3.1% (5.0%) for volume, 2.3%/2.8% (2.5%/2.9%) for anterior/posterior surfaces, 1.9% (3.5%) for depth, and 3.3% (4.5%) for maximum sagittal area. IPFP volume from nfs images was systematically greater (+7.3%) than from fs images, but highly correlated (r = 0.98). The results suggest that quantitative measurements of IPFP morphology can be performed with satisfactory reliability when expert QC is implemented. The IPFP is more clearly depicted in nfs images, and there is a small systematic off-set versus analysis from fs images. However, the high linear relationship between fs and nfs imaging suggests that fs images can be used to analyze IPFP morphology when nfs images are not available. PMID:26569532
Inter-Rater Reliability of Cyclotorsion Measurements Using Fundus Photography.
Dysli, Muriel; Kanku, Madeleine; Traber, Ghislaine L
2018-04-01
The foveo-papillary angle (FPA) on fundus photographs is the accepted standard for the measurement of ocular cyclotorsion. We assessed the inter-rater reliability of this method in healthy subjects and in patients with trochlear nerve palsies. In this methodological study, fundus photographs of healthy subjects and of patients with trochlear nerve palsies were made with a fundus camera (Zeiss Fundus Camera FF 450 plus, Jena, Germany). Three independent observers measured the FPA on the fundus photographs of all subjects in synedra View (synedra View 16, Version 16.0.0.11, Innsbruck, Austria). One hundred and four eyes of 52 subjects (26 healthy controls and 26 patients) were assessed. The mean FPA of the healthy controls was 5.80 degrees (°) [± 0.44 standard error of the mean (SEM)] compared to 11.55° (± 0.80 SEM) for patients with trochlear nerve palsies. The inter-rater reliability of all measured FPAs showed an intraclass correlation coefficient (ICC) of 0.98 (95% CI 0.97 - 0.98). The inter-rater reliability of objective cyclotorsion measurements using fundus photographs was very high.
The inter and intra rater reliability of the Netball Movement Screening Tool.
Reid, Duncan A; Vanweerd, Rebecca J; Larmer, Peter J; Kingstone, Rachel
2015-05-01
To establish the inter- and intra-rater reliability of the Netball Movement Screening Tool for screening adolescent female netball players. Inter- and intra-rater reliability study. Forty secondary school netball players were recruited to take part in the study. Twenty subjects were screened simultaneously and independently by two raters to ascertain inter-rater agreement. Twenty subjects were scored by rater one on two occasions, separated by a week, to ascertain intra-rater agreement. Inter- and intra-rater agreement was assessed utilising the two-way mixed intraclass correlation coefficient and weighted kappa statistics. No significant demographic differences were found between the inter- and intra-rater groups of subjects. Intraclass correlation coefficients demonstrated excellent inter-rater (two-way mixed intraclass correlation coefficient 0.84, standard error of measurement 0.25) and intra-rater (two-way mixed intraclass correlation coefficient 0.96, standard error of measurement 0.13) reliability for the overall Netball Movement Screening Tool score and substantial-excellent inter-rater (two-way mixed intraclass correlation coefficients 1.0-0.65) and substantial-excellent intra-rater (two-way mixed intraclass correlation coefficients 0.96-0.79) reliability for the component scores of the Netball Movement Screening Tool. Kappa statistics showed substantial to poor inter-rater (k=0.75-0.32) and intra-rater (k=0.77-0.27) agreement for individual tests of the NMST. The Netball Movement Screening Tool may be a reliable screening tool for adolescent netball players; however, the individual test scores have low reliability. The screening tool can be administered reliably by raters with similar levels of training in the tool but variable clinical experience. Ongoing research needs to be undertaken to ascertain whether the Netball Movement Screening Tool is a valid tool in ascertaining increased injury risk for netball players.
Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Reliability of a survey tool for measuring consumer nutrition environment in urban food stores.
Hosler, Akiko S; Dharssi, Aliza
2011-01-01
Despite the increase in the volume and importance of food environment research, there is a general lack of reliable measurement tools. This study presents the development and reliability assessment of a tool for measuring consumer nutrition environment in urban food stores. Cross-sectional design. A racially diverse downtown portion (6 ZIP code areas) in Albany, New York. A sample of 39 food stores was visited by our research team in 2009 to 2010. These stores were randomly selected from 123 eligible food stores identified through multiple government lists and ground-truthing. The Food Retail Outlet Survey Tool was developed to assess the presence of selected food and nonfood items, placement, milk prices, physical characteristics of the store, policy implementation, and advertisements on outside windows. For in-store items, agreement of observations between experienced and lightly trained surveyors was assessed. For window advertisement assessments, inter-method agreement (on-site sketch vs digital photo) and inter-rater agreement (both on-site) among lightly trained surveyors were evaluated. Percent agreement, kappa, and prevalence-adjusted bias-adjusted kappa were calculated for in-store observations. Intraclass correlation coefficients were calculated for window observations. Twenty-seven of the 47 in-store items had 100% agreement. The prevalence-adjusted bias-adjusted kappa indicated excellent agreement (≥0.90) on all items except aisle width (0.74) and dark-green/orange colored fresh vegetables (0.85). The store type (nonconvenience store), the order of visits (first half), and the time to complete survey (>10 minutes) were associated with lower reliability in these 2 items. Both the inter-method and inter-rater agreements for window advertisements were uniformly high (intraclass correlation coefficient range 0.94-1.00), indicating high reliability. The Food Retail Outlet Survey Tool is a reliable tool for quickly measuring consumer nutrition environment.
It can be effectively used by an individual who attended a 30-minute group briefing and practiced with 3 to 4 stores.
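The three agreement measures named in this abstract (percent agreement, kappa, and prevalence-adjusted bias-adjusted kappa) can be compared on a single item with a short sketch. It assumes binary present/absent observations coded 1/0 and two surveyors; the function name and the example data are hypothetical and are not taken from the study.

```python
def agreement_stats(rater_a, rater_b):
    """Percent agreement, Cohen's kappa and PABAK for binary (0/1) items."""
    n = len(rater_a)
    # Observed proportion of stores on which both surveyors agree.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each surveyor's marginal rates.
    p_exp = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in (0, 1)
    )
    # Kappa is undefined when chance agreement is already perfect.
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else float("nan")
    # For two categories, PABAK depends only on observed agreement.
    pabak = 2 * p_obs - 1
    return {"percent_agreement": 100 * p_obs, "kappa": kappa, "pabak": pabak}


# Hypothetical item that is present in almost every store.
experienced = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
lightly_trained = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(agreement_stats(experienced, lightly_trained))
```

With these example values the surveyors agree on 9 of 10 stores, yet kappa is 0 because nearly all agreement is expected by chance when the item is almost always present, while PABAK is 0.8. This is the kind of high-prevalence situation in which PABAK is often reported alongside kappa.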
Reliability of 3D laser-based anthropometry and comparison with classical anthropometry.
Kuehnapfel, Andreas; Ahnert, Peter; Loeffler, Markus; Broda, Anja; Scholz, Markus
2016-05-26
Anthropometric quantities are widely used in epidemiologic research as possible confounders, risk factors, or outcomes. 3D laser-based body scans (BS) allow evaluation of dozens of quantities in short time with minimal physical contact between observers and probands. The aim of this study was to compare BS with classical manual anthropometric (CA) assessments with respect to feasibility, reliability, and validity. We performed a study on 108 individuals with multiple measurements of BS and CA to estimate intra- and inter-rater reliabilities for both. We suggested BS equivalents of CA measurements and determined validity of BS considering CA the gold standard. Throughout the study, the overall concordance correlation coefficient (OCCC) was chosen as indicator of agreement. BS was slightly more time consuming but better accepted than CA. For CA, OCCCs for intra- and inter-rater reliability were greater than 0.8 for all nine quantities studied. For BS, 9 of 154 quantities showed reliabilities below 0.7. BS proxies for CA measurements showed good agreement (minimum OCCC > 0.77) after offset correction. Thigh length showed higher reliability in BS while upper arm length showed higher reliability in CA. Except for these issues, reliabilities of CA measurements and their BS equivalents were comparable.
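The role of offset correction in agreement measures of this kind can be seen from a pairwise concordance correlation coefficient. The study reports the overall concordance correlation coefficient (OCCC) across several raters; the sketch below is a simplification that computes Lin's CCC for just two measurement series, with a hypothetical function name and made-up example values.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired series.

    Uses population (1/n) variances and covariance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalises a systematic offset.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical lengths (cm): manual values and scanner values shifted by 1 cm.
manual = [1.0, 2.0, 3.0]
scanner = [2.0, 3.0, 4.0]
print(round(concordance_correlation(manual, scanner), 3))
print(round(concordance_correlation(manual, [v - 1.0 for v in scanner]), 3))
```

In this toy case the two series are perfectly correlated, but the constant 1 cm shift lowers the CCC to 4/7 (about 0.57); removing the shift restores a CCC of 1.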
Inter-rater and intra-rater reliability of the Bahasa Melayu version of Rose Angina Questionnaire.
Hassan, N B; Choudhury, S R; Naing, L; Conroy, R M; Rahman, A R A
2007-01-01
The objective of the study was to translate the Rose Questionnaire (RQ) into a Bahasa Melayu version, adapt it cross-culturally, and measure its inter-rater and intra-rater reliability. This cross-sectional study was conducted in the respondents' homes or workplaces in Kelantan, Malaysia. One hundred respondents aged 30 and above with different socio-demographic status were interviewed for face validity. For each of the inter-rater and intra-rater reliability assessments, a sample of 150 respondents was interviewed. Inter-rater and intra-rater reliabilities were assessed by Cohen's kappa. The overall inter-rater agreement by the five pairs of interviewers at points one and two was 0.86, and the intra-rater reliability by the five interviewers on the seven-item questionnaire at points one and two was 0.88, as measured by the kappa coefficient. The translated Malay version of the RQ demonstrated almost perfect inter-rater and intra-rater reliability, and further validation, such as sensitivity and specificity analysis of this translated questionnaire, is highly recommended.
Llamas, José M.; Cibrián, Rosa; Gandia, José L.; Paredes, Vanessa
2012-01-01
Objectives: Cone Beam Computerized Tomography (CBCT) allows the possibility of modifying some of the diagnostic tools used in orthodontics, such as cephalometry. The first step must be to study the characteristics of these devices in terms of accuracy and reliability of the most commonly used landmarks. The aims were (1) to assess intra- and inter-observer reliability in the location of anatomical landmarks belonging to hard tissues of the skull in images taken with a CBCT device, (2) to determine which of those landmarks are more vs. less reliable, and (3) to introduce planes of reference so as to create cephalometric analyses appropriate to the 3D reality. Study design: Fifteen patients who had a CBCT (i-CAT®) as a diagnostic register were selected. To assess the reproducibility of landmark location and the differences in the measurements of two observers at different times, 41 landmarks were defined on the three spatial axes (X, Y, Z) and located. A total of 3,690 measurements were taken and, as each determination has 3 coordinates, 11,070 data points were processed with the SPSS® statistical package. To assess the reproducibility of the method of landmark location, an ANOVA was undertaken using two variation factors: time (t1, t2 and t3) and observer (Ob1 and Ob2) for each axis (X, Y and Z) and landmark. The order of the CBCT scans submitted to the observers (Ob1, Ob2) at t1, t2, and t3 was different and randomly allocated. Multiple comparisons were undertaken using the Bonferroni test. The intra- and inter-examiner ICCs were calculated. Results: Intra- and inter-examiner reliability was high, both being ICC ≥ 0.99, with the best frequency on axis Z. Conclusions: The most reliable landmarks were: Nasion, Sella, Basion, left Porion, point A, anterior nasal spine, Pogonion, Gnathion, Menton, frontozygomatic sutures, first lower molars and upper and lower incisors. Those with less reliability were the supraorbitals, right zygion and posterior nasal spine.
Key words: Cone Beam Computed Tomography, cephalometry, landmark, orthodontics, reliability. PMID:22322503
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies
Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry
2017-01-01
Objectives To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Design Systematic review and narrative synthesis of reproducibility studies. Data sources Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Eligibility criteria Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. Results From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range intraclass correlation coefficient 0.86 to κ−0.10).
Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies’ generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). Conclusions Despite their common use and far reaching consequences for workers claiming disabling injury or illness, research on the reliability of medical evaluations of disability for work is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed. PMID:28122727
Sureda, Xisca; Espelt, Albert; Villalbí, Joan R; Cebrecos, Alba; Baranda, Lucía; Pearce, Jamie; Franco, Manuel
2017-01-01
Objectives To describe the development and test–retest reliability of OHCITIES, an instrument characterising alcohol urban environment in terms of availability, promotion and signs of consumption. Design This study involved: (1) developing the conceptual framework for alcohol urban environment by means of literature reviewing and previous alcohol environment research experience; (2) pilot testing and redesigning the instrument; (3) instrument digitalisation; (4) instrument evaluation using test–retest reliability. Setting Data for testing the reliability of the instrument were collected in seven census sections in Madrid in 2016 by two observers. Primary and secondary outcome measures We computed per cent agreement and Cohen’s kappa coefficients to estimate inter-rater and test–retest reliability for alcohol outlet environment measures. We calculated intraclass correlation coefficients and their 95% CIs to provide a measure of inter-rater reliability for signs of alcohol consumption measures. Results We collected information on 92 on-premise and 24 off-premise alcohol outlets identified in the studied areas about availability, accessibility and promotion of alcohol. Most per cent-agreement values for alcohol measures in on-premise and off-premise alcohol outlets were greater than 80%, and inter-rater and test–retest reliability values were generally above 0.80. Observers identified 26 streets and 3 public squares with signs of alcohol consumption. Intraclass correlation coefficient between observers for any type of signs of alcohol consumption was 0.50 (95% CI −0.09 to 0.77). Few items promoting alcohol unrelated to alcohol outlets were found on public spaces. Conclusions The OHCITIES instrument is a reliable instrument to characterise alcohol urban environment.
This instrument might be used to understand how alcohol environment associates with alcohol behaviours and its related health outcomes, and can help in the design and evaluation of policies to reduce the harm caused by alcohol. PMID:28982829
Sureda, Xisca; Espelt, Albert; Villalbí, Joan R; Cebrecos, Alba; Baranda, Lucía; Pearce, Jamie; Franco, Manuel
2017-10-05
To describe the development and test-retest reliability of OHCITIES, an instrument characterising alcohol urban environment in terms of availability, promotion and signs of consumption. This study involved: (1) developing the conceptual framework for alcohol urban environment by means of literature reviewing and previous alcohol environment research experience; (2) pilot testing and redesigning the instrument; (3) instrument digitalisation; (4) instrument evaluation using test-retest reliability. Data for testing the reliability of the instrument were collected in seven census sections in Madrid in 2016 by two observers. We computed per cent agreement and Cohen's kappa coefficients to estimate inter-rater and test-retest reliability for alcohol outlet environment measures. We calculated intraclass correlation coefficients and their 95% CIs to provide a measure of inter-rater reliability for signs of alcohol consumption measures. We collected information on 92 on-premise and 24 off-premise alcohol outlets identified in the studied areas about availability, accessibility and promotion of alcohol. Most per cent-agreement values for alcohol measures in on-premise and off-premise alcohol outlets were greater than 80%, and inter-rater and test-retest reliability values were generally above 0.80. Observers identified 26 streets and 3 public squares with signs of alcohol consumption. Intraclass correlation coefficient between observers for any type of signs of alcohol consumption was 0.50 (95% CI -0.09 to 0.77). Few items promoting alcohol unrelated to alcohol outlets were found on public spaces. The OHCITIES instrument is a reliable instrument to characterise alcohol urban environment. This instrument might be used to understand how alcohol environment associates with alcohol behaviours and its related health outcomes, and can help in the design and evaluation of policies to reduce the harm caused by alcohol.
Kalichman, Leonid; Klindukhov, Alexander; Li, Ling; Linov, Lina
2016-11-01
A reliability and cross-sectional observational study. To introduce a scoring system for visible fat infiltration in paraspinal muscles; to evaluate intertester and intratester reliability of this system and its relationship with indices of muscle density; to evaluate the association between indices of paraspinal muscle degeneration and facet joint osteoarthritis. Current evidence suggests that the paraspinal muscles degeneration is associated with low back pain, facet joint osteoarthritis, spondylolisthesis, and degenerative disc disease. However, the evaluation of paraspinal muscles on computed tomography is not radiological routine, probably because of absence of simple and reliable indices of paraspinal degeneration. One hundred fifty consecutive computed tomography scans of the lower back (N=75) or abdomen (N=75) were evaluated. Mean radiographic density (in Hounsfield units) and SD of the density of multifidus and erector spinae were evaluated at the L4-L5 spinal level. A new index of muscle degeneration, radiographic density ratio=muscle density/SD of density, was calculated. To evaluate the visible fat infiltration in paraspinal muscles, we proposed a 3-graded scoring system. The prevalence of facet joint osteoarthritis was also evaluated. Intraclass correlation and κ statistics were used to evaluate inter-rater and intra-rater reliability. Logistic regression examined the association between paraspinal muscle indices and facet joint osteoarthritis. Intra-rater reliability for fat infiltration score (κ) ranged between 0.87 and 0.92; inter-rater reliability between 0.70 and 0.81. Intra-rater reliability (intraclass correlation) for mean density of paraspinal muscles ranged between 0.96 and 0.99, inter-rater reliability between 0.95 and 0.99; SD intra-rater reliability ranged between 0.82 and 0.91, inter-rater reliability between 0.80 and 0.89. 
Significant associations (P<0.01) were found between facet joint osteoarthritis, fat infiltration score, and radiographic density ratio. Two suggested indices of paraspinal muscle degeneration showed excellent reliability and were significantly associated with facet joint osteoarthritis. Additional studies are needed to evaluate the associations with other spinal degeneration features and low back pain.
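The radiographic density ratio defined in this abstract (muscle density divided by the SD of density) is simple to compute once the Hounsfield unit samples inside a muscle region are available. The sketch below assumes the sample standard deviation is used, which the abstract does not specify; the function name and the Hounsfield values are hypothetical.

```python
import statistics


def radiographic_density_ratio(hu_values):
    """Mean density divided by the SD of density for one muscle region.

    hu_values: Hounsfield unit samples from the region of interest.
    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    return statistics.mean(hu_values) / statistics.stdev(hu_values)


# Hypothetical regions with the same mean density but different spread.
uniform_region = [48.0, 50.0, 52.0, 50.0]
patchy_region = [20.0, 80.0, 35.0, 65.0]
print(round(radiographic_density_ratio(uniform_region), 2))
print(round(radiographic_density_ratio(patchy_region), 2))
```

For a fixed mean density, a wider spread of values lowers the ratio, so the index combines average attenuation and heterogeneity in a single number.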
Hellweg, Stephanie; Schuster-Amft, Corina
2016-07-19
Agitation is frequently observed during early recovery after traumatic brain injury (TBI). Agitated behaviour often interferes with a goal-orientated rehabilitation and can be a substantial hindrance to therapy. Despite the relatively high occurrence of agitation in the TBI population, there is no objective assessment available in German (G). An existing scale with excellent psychometric properties is the "Agitated Behavior Scale (ABS)" developed by Corrigan in 1989. The aim of the study was to translate the Agitated Behavior Scale (ABS) into German (ABS-G) and investigate the inter- and intrarater reliability and internal consistency in patients with moderate to severe TBI. A formal nine-step translation and cross-cultural adaptation procedure (TCCA) was applied. Subsequently a prospective observational patient study was conducted. To examine the interrater reliability and internal consistency, two therapists rated 20 patients independently after a therapy session. This procedure was repeated twice on a weekly basis. The intrarater reliability was assessed through video recordings from three patients. Nine raters scored the demonstrated behaviour on the videotape with the ABS-G independently twice within one month. The inter- and intrarater reliability were evaluated with the Spearman rank correlation coefficient and the quadratic weighted kappa. The internal consistency was tested with Cronbach's alpha. Behaviour of 20 patients (18 males; mean age 41 ± 20.7; mean Functional Independence Measure (FIM) cognitive score on admission 7.1 ± 4.04; mean ABS-G score at first observation 17.3 ± 2.83) was assessed threefold. Interrater reliability yielded a correlation coefficient for ABS-G total score of all 60 paired observations of r_s = 0.845 and a weighted kappa of 0.738. Intrarater reliability for ABS-G total score ranged between r_s = 0.719 and 0.953 and showed a weighted kappa between 0.871 and 0.953. Cronbach's alpha indicated moderate internal consistency with 0.661.
This study demonstrates that the ABS-G is a reliable instrument for evaluating agitation in patients with moderate to severe TBI. Hereby it would be possible to monitor agitation objectively and optimise the management of agitated patients according to international recommendations.
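The quadratic weighted kappa used in this study gives partial credit when two ratings are close on an ordinal scale. The sketch below is a minimal illustration, assuming ratings are already coded as integers from 0 to n_categories - 1; the function name and the example ratings are hypothetical and unrelated to the ABS-G data.

```python
def quadratic_weighted_kappa(rater_1, rater_2, n_categories):
    """Cohen's kappa with quadratic disagreement weights.

    Ratings must be integers in the range 0 .. n_categories - 1.
    """
    n = len(rater_1)
    k = n_categories
    # Contingency table of observed rating pairs.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_1, rater_2):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the squared category distance.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ordinal scores (0 to 3) from two raters for six observations.
therapist_a = [0, 1, 2, 3, 2, 1]
therapist_b = [0, 1, 2, 2, 2, 0]
print(round(quadratic_weighted_kappa(therapist_a, therapist_b, 4), 3))
```

A disagreement of one category is penalised far less than a disagreement spanning the whole scale, which is why this statistic is often preferred over unweighted kappa for ordinal scores.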
Brunner, Alexander; Gühring, Markus; Schmälzle, Traude; Weise, Kuno; Badke, Andreas
2009-01-01
Evaluation of the kyphosis angle in thoracic and lumbar burst fractures is often used to indicate surgical procedures. The kyphosis angle can be measured as vertebral, segmental and local kyphosis according to the method of Cobb. The vertebral, segmental and local kyphosis according to the method of Cobb were measured on 120 lateral X-rays and sagittal computed tomographies of 60 thoracic and 60 lumbar burst fractures by 3 independent observers on 2 separate occasions. Osteoporotic fractures were excluded. The intra- and interobserver reliability of these angles in X-ray and computed tomography was evaluated using the intraclass correlation coefficient (ICC). The segmental kyphosis showed the highest reproducibility, followed by the vertebral kyphosis. For thoracic fractures, segmental kyphosis showed “excellent” inter- and intraobserver reliabilities in X-ray (ICC = 0.826, 0.802), and for lumbar fractures “good” to “excellent” inter- and intraobserver reliabilities (ICC = 0.790, 0.803). In computed tomography, the segmental kyphosis showed “excellent” inter- and intraobserver reliabilities (ICC = 0.824, 0.801) for thoracic and “excellent” inter- and intraobserver reliabilities (ICC = 0.874, 0.835) for the lumbar fractures. Regarding both diagnostic work-ups (X-ray and computed tomography), significant differences were found in interobserver reliabilities for vertebral kyphosis measured in lumbar fracture X-rays (p = 0.035) and interobserver reliabilities for local kyphosis measured in thoracic fracture X-rays (p = 0.010). Regarding both fracture localizations (thoracic and lumbar fractures), significant differences were found only in interobserver reliabilities for the local kyphosis measured in computed tomographies (p = 0.045) and in intraobserver reliabilities for the vertebral kyphosis measured in X-rays (p = 0.024).
“Good” to “excellent” inter- and intraobserver reliabilities for vertebral, segmental and local kyphosis in X-ray make these angles a helpful tool for indicating surgical procedures. For practical use in lateral X-ray, we emphasize determining the segmental kyphosis, because this angle has the highest reproducibility. “Good” to “excellent” inter- and intraobserver reliabilities for these three angles were also found in computed tomography. Therefore, the use of these three angles also seems to be generally possible in computed tomography. For a direct correlation of the results in lateral X-ray and in computed tomography, further studies are needed. PMID:19953277
Evans, Heather L; O'Shea, Dylan J; Morris, Amy E; Keys, Kari A; Wright, Andrew S; Schaad, Douglas C; Ilgen, Jonathan S
2016-02-01
This pilot study assessed the feasibility of using first person (1P) video recording with Google Glass (GG) to assess procedural skills, as compared with traditional third person (3P) video. We hypothesized that raters reviewing 1P videos would visualize more procedural steps with greater inter-rater reliability than raters reviewing 3P videos. Seven subjects performed simulated internal jugular catheter insertions. Procedures were recorded by both Google Glass and an observer's head-mounted camera. Videos were assessed by 3 expert raters using a task-specific checklist (CL) and both an additive and a summative global rating scale (GRS). Mean scores were compared by t-tests. Inter-rater reliabilities were calculated using intraclass correlation coefficients. The 1P vantage was associated with a significantly higher mean CL score than the 3P vantage (7.9 vs 6.9, P = .02). Mean GRS scores were not significantly different. Mean inter-rater reliabilities for the CL, additive-GRS, and summative-GRS were similar between vantages. 1P vantage recordings may improve visualization of tasks for behaviorally anchored instruments (eg, CLs), while maintaining similar global ratings and inter-rater reliability when compared with conventional 3P vantage recordings. Copyright © 2016 Elsevier Inc. All rights reserved.
Wong, M T P; Ho, T P; Ho, M Y; Yu, C S; Wong, Y H; Lee, S Y
2002-05-01
The Geriatric Depression Scale (GDS) is a common screening tool for elderly depression in Hong Kong. This study aimed at (1) developing a standardized manual for the verbal administration and scoring of the GDS-SF, and (2) comparing the inter-rater reliability between the standardized and non-standardized verbal administration of GDS-SF. Two studies were reported. In Study 1, the process of developing the manual was described. In Study 2, we compared the inter-rater reliabilities of GDS-SF scores using the standardized verbal instructions and the traditional non-standardized administration. Results of Study 2 indicated that the standardized procedure in verbal administration and scoring improved the inter-rater reliabilities of GDS-SF. Copyright 2002 John Wiley & Sons, Ltd.
Development and testing of the cancer multidisciplinary team meeting observational tool (MDT-MOT)
Harris, Jenny; Taylor, Cath; Sevdalis, Nick; Jalil, Rozh; Green, James S.A.
2016-01-01
Abstract Objective To develop a tool for independent observational assessment of cancer multidisciplinary team meetings (MDMs), and test criterion validity, inter-rater reliability/agreement and describe performance. Design Clinicians and experts in teamwork used a mixed-methods approach to develop and refine the tool. Study 1 observers rated pre-determined optimal/sub-optimal MDM film excerpts and Study 2 observers independently rated video-recordings of 10 MDMs. Setting Study 2 included 10 cancer MDMs in England. Participants Testing was undertaken by 13 health service staff and a clinical and non-clinical observer. Intervention None. Main Outcome Measures Tool development, validity, reliability/agreement and variability in MDT performance. Results Study 1: Observers were able to discriminate between optimal and sub-optimal MDM performance (P ≤ 0.05). Study 2: Inter-rater reliability was good for 3/10 domains. Percentage of absolute agreement was high (≥80%) for 4/10 domains and percentage agreement within 1 point was high for 9/10 domains. Four MDTs performed well (scored 3+ in at least 8/10 domains), 5 MDTs performed well in 6–7 domains and 1 MDT performed well in only 4 domains. Leadership and chairing of the meeting, the organization and administration of the meeting, and clinical decision-making processes all varied significantly between MDMs (P ≤ 0.01). Conclusions MDT-MOT demonstrated good criterion validity. Agreement between clinical and non-clinical observers (within one point on the scale) was high but this was inconsistent with reliability coefficients and warrants further investigation. If further validated MDT-MOT might provide a useful mechanism for the routine assessment of MDMs by the local workforce to drive improvements in MDT performance. PMID:27084499
Development and testing of the cancer multidisciplinary team meeting observational tool (MDT-MOT).
Harris, Jenny; Taylor, Cath; Sevdalis, Nick; Jalil, Rozh; Green, James S A
2016-06-01
To develop a tool for independent observational assessment of cancer multidisciplinary team meetings (MDMs), and test criterion validity, inter-rater reliability/agreement and describe performance. Clinicians and experts in teamwork used a mixed-methods approach to develop and refine the tool. Study 1 observers rated pre-determined optimal/sub-optimal MDM film excerpts and Study 2 observers independently rated video-recordings of 10 MDMs. Study 2 included 10 cancer MDMs in England. Testing was undertaken by 13 health service staff and a clinical and non-clinical observer. None. Tool development, validity, reliability/agreement and variability in MDT performance. Study 1: Observers were able to discriminate between optimal and sub-optimal MDM performance (P ≤ 0.05). Study 2: Inter-rater reliability was good for 3/10 domains. Percentage of absolute agreement was high (≥80%) for 4/10 domains and percentage agreement within 1 point was high for 9/10 domains. Four MDTs performed well (scored 3+ in at least 8/10 domains), 5 MDTs performed well in 6-7 domains and 1 MDT performed well in only 4 domains. Leadership and chairing of the meeting, the organization and administration of the meeting, and clinical decision-making processes all varied significantly between MDMs (P ≤ 0.01). MDT-MOT demonstrated good criterion validity. Agreement between clinical and non-clinical observers (within one point on the scale) was high but this was inconsistent with reliability coefficients and warrants further investigation. If further validated MDT-MOT might provide a useful mechanism for the routine assessment of MDMs by the local workforce to drive improvements in MDT performance. © The Author 2016. Published by Oxford University Press in association with the International Society for Quality in Health Care; all rights reserved.
Taffin, Elien Rl; Paepe, Dominique; Campos, Miguel; Duchateau, Luc; Goris, Nesya; De Roover, Katrien; Daminet, Sylvie
2016-11-01
Objectives The Karnofsky score (KS) modified for cats, a scoring system to rate health and quality of life (QOL) in cats, is used in clinical trials, but its reliability and validity are yet to be determined. The present study aims to evaluate the scientific robustness of the KS when adapted for use in a hospital setting. Methods A list of variables to consider during the physical examination, which informs the clinician's score (CS) part of the KS, was added and clinicians were allowed to choose a score anywhere between 0 and 50. The Karnofsky QOL questionnaire was adapted for use in a hospital setting. F-tests with Bonferroni correction and Spearman rank correlation coefficients were used to evaluate reliability and validity of the KS to assess the health and wellbeing of cats in a hospital setting. The records of 54 feline immunodeficiency virus-positive cats, which were recruited for a clinical trial and hospitalised for 6 weeks, were reviewed. Four veterinarians scored the CS, and one veterinarian and a veterinary nurse assessed the QOL score. Results Mean absolute difference between observers was significantly larger for the CS than for the QOL score (P < 0.001) and two veterinarians scored significantly higher than the remaining two veterinarians (P < 0.001). Inter-observer correlation ranged from 0.45 to 0.75 for the CS. For the QOL score, the absolute difference between observers was small, no significant difference was found between observers and a high degree of inter-observer correlation was noted (r = 0.91). Conclusions and relevance The results indicate low inter-observer reliability for the CS, requiring additional modifications to this part of the KS. The QOL score seems more reliable, and the questionnaire may serve as a reliable tool in the assessment of QOL in cats in a hospital setting. Consequently, further adaptation of the KS is mandatory when simultaneous assessment of both the cat's clinical health and perceived wellbeing is required.
Evaluating the use of in-store measures in retail food stores and restaurants in Brazil.
Duran, Ana Clara; Lock, Karen; Latorre, Maria do Rosario D O; Jaime, Patricia Constante
2015-01-01
To assess inter-rater reliability, test-retest reliability, and construct validity of retail food store, open-air food market, and restaurant observation tools adapted to the Brazilian urban context. This study is part of a cross-sectional observation survey conducted in 13 districts across the city of Sao Paulo, Brazil in 2010-2011. Food store and restaurant observational tools were developed based on previously available tools and then tested. They included measures on the availability, variety, quality, pricing, and promotion of fruits and vegetables and ultra-processed foods. We used Kappa statistics and intra-class correlation coefficients to assess inter-rater reliability in samples of 142 restaurants and 97 retail food stores (including open-air food markets), and test-retest reliability in samples of 62 restaurants and 45 retail food stores (including open-air food markets). Construct validity, defined as the tools' ability to discriminate based on store types and different income contexts, was assessed in the entire sample: 305 retail food stores, 8 fruit and vegetable markets, and 472 restaurants. Inter-rater and test-retest reliability were generally high, with most Kappa values greater than 0.70 (range 0.49-1.00). Both tools discriminated between store types and neighborhoods with different median income. Fruits and vegetables were more likely to be found in middle to higher-income neighborhoods, while soda, fruit-flavored drink mixes, cookies, and chips were cheaper and more likely to be found in lower-income neighborhoods. The measures were reliable and able to reveal significant differences across store types and different contexts. Although some items may require revision, results suggest that the tools may be used to reliably measure the food stores and restaurant food environment in urban settings of middle-income countries. Such studies can help inform health promotion interventions and policies in these contexts.
Evaluating the use of in-store measures in retail food stores and restaurants in Brazil
Duran, Ana Clara; Lock, Karen; Latorre, Maria do Rosario D O; Jaime, Patricia Constante
2015-01-01
ABSTRACT OBJECTIVE To assess inter-rater reliability, test-retest reliability, and construct validity of retail food store, open-air food market, and restaurant observation tools adapted to the Brazilian urban context. METHODS This study is part of a cross-sectional observation survey conducted in 13 districts across the city of Sao Paulo, Brazil in 2010-2011. Food store and restaurant observational tools were developed based on previously available tools and then tested. They included measures on the availability, variety, quality, pricing, and promotion of fruits and vegetables and ultra-processed foods. We used Kappa statistics and intra-class correlation coefficients to assess inter-rater reliability in samples of 142 restaurants and 97 retail food stores (including open-air food markets), and test-retest reliability in samples of 62 restaurants and 45 retail food stores (including open-air food markets). Construct validity, defined as the tools’ ability to discriminate based on store types and different income contexts, was assessed in the entire sample: 305 retail food stores, 8 fruit and vegetable markets, and 472 restaurants. RESULTS Inter-rater and test-retest reliability were generally high, with most Kappa values greater than 0.70 (range 0.49-1.00). Both tools discriminated between store types and neighborhoods with different median income. Fruits and vegetables were more likely to be found in middle to higher-income neighborhoods, while soda, fruit-flavored drink mixes, cookies, and chips were cheaper and more likely to be found in lower-income neighborhoods. CONCLUSIONS The measures were reliable and able to reveal significant differences across store types and different contexts. Although some items may require revision, results suggest that the tools may be used to reliably measure the food stores and restaurant food environment in urban settings of middle-income countries. 
Such studies can help inform health promotion interventions and policies in these contexts. PMID:26538101
Kemp, Joanne L; Schache, Anthony G; Makdissi, Michael; Sims, Kevin J; Crossley, Kay M
2013-07-01
This study investigated tests of hip muscle strength and functional performance. The specific objectives were to: (i) establish intra- and inter-rater reliability; (ii) compare differences between dominant and non-dominant limbs; (iii) compare agonist and antagonist muscle strength ratios; (iv) compare differences between genders; and (v) examine relationships between hip muscle strength, baseline measures and functional performance. Reliability study and cross-sectional analysis of hip strength and functional performance. In healthy adults aged 18-50 years, normalised hip muscle peak torque and functional performance were evaluated to: (i) establish intra-rater and inter-rater reliability; (ii) analyse differences between limbs, between antagonistic muscle groups and between genders; and (iii) examine associations between strength and functional performance. Excellent reliability (intra-rater ICC=0.77-0.96; inter-rater ICC=0.82-0.95) was observed. No difference existed between dominant and non-dominant limbs. Differences in strength existed between antagonistic pairs of muscles: hip abduction was greater than adduction (p<0.001) and hip ER was greater than IR (p<0.001). Men had greater ER strength (p=0.006) and hop for distance (p<0.001) than women. Strong associations were observed between measures of hip muscle strength (except hip flexion) and age, height, and functional performance. Deficits in hip muscle strength or functional performance may influence hip pain. In order to provide targeted rehabilitation programmes to address patient-specific impairments, and determine when individuals are ready to return to physical activity, clinicians are increasingly utilising tests of hip strength and functional performance. This study provides a battery of reliable, clinically applicable tests which can be used for these purposes. Copyright © 2012 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Rickard, Mandy; Easterbrook, Bethany; Kim, Soojin; Farrokhyar, Forough; Stein, Nina; Arora, Steven; Belostotsky, Vladamir; DeMaria, Jorge; Lorenzo, Armando J; Braga, Luis H
2017-02-01
The urinary tract dilation (UTD) classification system was introduced to standardize terminology in the reporting of hydronephrosis (HN), and bridge a gap between pre- and postnatal classification such as the Society for Fetal Urology (SFU) grading system. Herein we compare the intra/inter-rater reliability of both grading systems. SFU (I-IV) and UTD (I-III) grades were independently assigned by 13 raters (9 pediatric urology staff, 2 nephrologists, 2 radiologists), twice, 3 weeks apart, to 50 sagittal postnatal ultrasonographic views of hydronephrotic kidneys. Data regarding ureteral measurements and bladder abnormalities were included to allow proper UTD categorization. Ten images were repeated to assess intra-rater reliability. Krippendorff's alpha coefficient was used to measure overall and by-grade intra/inter-rater reliability. Reliability between specialties and training levels was also analyzed. Overall inter-rater reliability was slightly higher for SFU (α = 0.842, 95% CI 0.812-0.879, in session 1; and α = 0.808, 95% CI 0.775-0.839, in session 2) than for UTD (α = 0.774, 95% CI 0.715-0.827, in session 1; and α = 0.679, 95% CI 0.605-0.750, in session 2). Reliability for intermediate grades (SFU II/III and UTD 2) of HN was poor regardless of the system. Reliabilities for SFU and UTD classifications among Urology, Nephrology, and Radiology, as well as between training levels, were not significantly different. Despite the introduction of HN grading systems to standardize the interpretation and reporting of renal ultrasound in infants with HN, none have been proven superior in allowing clinicians to distinguish between "moderate" grades. While this study demonstrated high reliability in distinguishing between "mild" (SFU I/II and UTD 1) and "severe" (SFU IV and UTD 3) grades of HN, the overall reliability between specialties was poor. This is in keeping with a previous report of modest inter-rater reliability of the SFU system. 
This drawback is likely explained by the subjective interpretation required to assign grades, which can be impacted by experience, image quality, and scanning technique. As shown in the figure, which demonstrates SFU II (a) and SFU III (b), as assigned by a radiologist, it is possible to make an argument that either of these images can be classified into both categories that were observed during the grading sessions of this study. Although both systems have acceptable reliability, the SFU grading system showed higher overall intra/inter-rater reliability regardless of rater specialty than the UTD classification. Inter-rater reliability for SFU grades II/III and UTD 2 was low, highlighting the limitations of both classifications in regards to properly segregating moderate HN grades. Copyright © 2016 Journal of Pediatric Urology Company. Published by Elsevier Ltd. All rights reserved.
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies.
Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry; Kunz, Regina
2017-01-25
To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Systematic review and narrative synthesis of reproducibility studies. Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range: intraclass correlation coefficient 0.86 to κ = -0.10). 
Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies' generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). Despite their common use and far reaching consequences for workers claiming disabling injury or illness, research on the reliability of medical evaluations of disability for work is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Inter-operator and inter-device agreement and reliability of the SEM Scanner.
Clendenin, Marta; Jaradeh, Kindah; Shamirian, Anasheh; Rhodes, Shannon L
2015-02-01
The SEM Scanner is a medical device designed for use by healthcare providers as part of pressure ulcer prevention programs. The objective of this study was to evaluate the inter-operator and inter-device agreement and reliability of the SEM Scanner. Thirty-one (31) volunteers free of pressure ulcers or broken skin at the sternum, sacrum, and heels were assessed with the SEM Scanner. Each of three operators utilized each of three devices to collect readings from four anatomical sites (sternum, sacrum, left and right heels) on each subject for a total of 108 readings per subject collected over approximately 30 min. For each combination of operator-device-anatomical site, three SEM readings were collected. Inter-operator and inter-device agreement and reliability were estimated. Over the course of this study, more than 3000 SEM Scanner readings were collected. Agreement between operators was good with mean differences ranging from -0.01 to 0.11. Inter-operator and inter-device reliability exceeded 0.80 at all anatomical sites assessed. The results of this study demonstrate the high reliability and good agreement of the SEM Scanner across different operators and different devices. Given the limitations of current methods to prevent and detect pressure ulcers, the SEM Scanner shows promise as an objective, reliable tool for assessing the presence or absence of pressure-induced tissue damage such as pressure ulcers. Copyright © 2015 Bruin Biometrics, LLC. Published by Elsevier Ltd. All rights reserved.
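The reading counts quoted in this abstract follow from the crossed design it describes. The short Python sketch below multiplies that design out; the operator and device labels are hypothetical placeholders, and the total assumes every planned reading was actually collected, which the abstract does not state.

```python
from itertools import product

operators = ["operator_1", "operator_2", "operator_3"]  # hypothetical labels
devices = ["device_1", "device_2", "device_3"]          # hypothetical labels
sites = ["sternum", "sacrum", "left heel", "right heel"]
readings_per_combination = 3
subjects = 31

# Every operator uses every device at every anatomical site.
combinations = list(product(operators, devices, sites))
readings_per_subject = len(combinations) * readings_per_combination
total_readings = readings_per_subject * subjects

print(readings_per_subject)  # 108
print(total_readings)        # 3348
```

The 108 readings per subject match the abstract, and the planned total of 3348 is consistent with its statement that more than 3000 readings were collected.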
RELIABILITY AND VALIDITY OF A BIOMECHANICALLY BASED ANALYSIS METHOD FOR THE TENNIS SERVE
Kibler, W. Ben; Lamborn, Leah; Smith, Belinda J.; English, Tony; Jacobs, Cale; Uhl, Tim L.
2017-01-01
Background An observational tennis serve analysis (OTSA) tool was developed using previously established body positions from three-dimensional kinematic motion analysis studies. These positions, defined as nodes, have been associated with efficient force production and minimal joint loading. However, the tool has yet to be examined scientifically. Purpose The primary purpose of this investigation was to determine the inter-observer reliability for each node between two health care professionals (HCPs) that developed the OTSA, and secondarily to investigate the validity of the OTSA. Methods Two separate studies were performed to meet these objectives. An inter-observer reliability study preceded the validity study by examining 28 videos of players serving. Two HCPs graded each video and scored the presence or absence of obtaining each node. Discriminant validity was determined in 33 tennis players using videotaped records of three first serves. Serve mechanics were graded using the OTSA, and players were categorized into those with good ( ≥ 5) and poor ( ≤ 4) mechanics. Participants performed a series of field tests to evaluate trunk flexibility, lower extremity and trunk power, and dynamic balance. Results The group with good mechanics demonstrated greater backward trunk flexibility (p=0.02), greater rotational power (p=0.02), and higher single leg countermovement jump (p=0.05). Reliability of the OTSA ranged from K = 0.36-1.0, with the majority of all the nodes displaying substantial reliability (K>0.61). Conclusion This study provides HCPs with a valid and reliable field tool used to assess serve mechanics. Physical characteristics of trunk mobility and power appear to discriminate serve mechanics between players. Future intervention studies are needed to determine if improvement in physical function contributes to improved serve mechanics. Level of Evidence 3 PMID:28593098
Margolin, Ezra J; Mlynarczyk, Carrie M; Mulhall, John P; Stember, Doron S; Stahl, Peter J
2017-06-01
Non-curvature penile deformities are prevalent and bothersome manifestations of Peyronie's disease (PD), but the quantitative metrics that are currently used to describe these deformities are inadequate and non-standardized, presenting a barrier to clinical research and patient care. To introduce erect penile volume (EPV) and percentage of erect penile volume loss (percent EPVL) as novel metrics that provide detailed quantitative information about non-curvature penile deformities and to study the feasibility and reliability of three-dimensional (3D) photography for measurement of quantitative penile parameters. We constructed seven penis models simulating deformities found in PD. The 3D photographs of each model were captured in triplicate by four observers using a 3D camera. Computer software was used to generate automated measurements of EPV, percent EPVL, penile length, minimum circumference, maximum circumference, and angle of curvature. The automated measurements were statistically compared with measurements obtained using water-displacement experiments, a tape measure, and a goniometer. Accuracy of 3D photography for average measurements of all parameters compared with manual measurements; inter-test, intra-observer, and inter-observer reliabilities of EPV and percent EPVL measurements as assessed by the intraclass correlation coefficient. The 3D images were captured in a median of 52 seconds (interquartile range = 45-61). On average, 3D photography was accurate to within 0.3% for measurement of penile length. It overestimated maximum and minimum circumferences by averages of 4.2% and 1.6%, respectively; overestimated EPV by an average of 7.1%; and underestimated percent EPVL by an average of 1.9%. All inter-test, inter-observer, and intra-observer intraclass correlation coefficients for EPV and percent EPVL measurements were greater than 0.75, reflective of excellent methodologic reliability. 
By providing highly descriptive and reliable measurements of penile parameters, 3D photography can empower researchers to better study volume-loss deformities in PD and enable clinicians to offer improved clinical assessment, communication, and documentation. This is the first study to apply 3D photography to the assessment of PD and to accurately measure the novel parameters of EPV and percent EPVL. This proof-of-concept study is limited by the lack of data in human subjects, which could present additional challenges in obtaining reliable measurements. EPV and percent EPVL are novel metrics that can be quickly, accurately, and reliably measured using computational analysis of 3D photographs and can be useful in describing non-curvature volume-loss deformities resulting from PD. Margolin EJ, Mlynarczyk CM, Mulhall JP, et al. Three-Dimensional Photography for Quantitative Assessment of Penile Volume-Loss Deformities in Peyronie's Disease. J Sex Med 2017;14:829-833. Copyright © 2017 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C
2012-01-01
Introduction Quality assessment of included studies is an important component of systematic reviews. Objective The authors investigated inter-rater and test–retest reliability for quality assessments conducted by inexperienced student raters. Design Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle–Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13–20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. Setting McMaster Integrative Neuroscience Discovery and Study Program. Participants 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. Main outcome measures The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test–retest reliability using ICC(2,1). Results Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI −0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were −0.14 (95% CI −0.28 to 0.00) to 0.39 (95% CI −0.02 to 0.81) for the NOS cohort and −0.20 (95% CI −0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case–control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test–retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. 
Corresponding ICC(2,1)s for the NOS cohort were −0.19 (95% CI −0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case–control, the ICC(2,1)s were 0.46 (95% CI −0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95). Conclusions Inter-rater reliability was generally poor to fair and test–retest reliability was fair to excellent. A pilot rating phase following rater training may be one way to improve agreement. PMID:22855629
Oremus, Mark; Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C
2012-01-01
Quality assessment of included studies is an important component of systematic reviews. The authors investigated inter-rater and test-retest reliability for quality assessments conducted by inexperienced student raters. Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle-Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13-20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. McMaster Integrative Neuroscience Discovery and Study Program. 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test-retest reliability using ICC(2,1). Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI -0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were -0.14 (95% CI -0.28 to 0.00) to 0.39 (95% CI -0.02 to 0.81) for the NOS cohort and -0.20 (95% CI -0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case-control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test-retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were -0.19 (95% CI -0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case-control, the ICC(2,1)s were 0.46 (95% CI -0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95). 
Inter-rater reliability was generally poor to fair and test-retest reliability was fair to excellent. A pilot rating phase following rater training may be one way to improve agreement.
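The abstract above reports ICC(2,1) values. As a rough sketch of what that statistic computes, the Python below derives the two-way random-effects, absolute-agreement, single-rater intraclass correlation from the ANOVA mean squares of a subjects-by-raters table. The function name and the ratings table are illustrative assumptions and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: rows are subjects, columns are raters."""
    if not ratings or len(ratings) < 2:
        raise ValueError("need at least two subjects")
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least two raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Illustrative table: 6 rated items (rows) scored by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
print(round(icc, 2))  # 0.29
```

Because the between-raters mean square appears in the denominator, systematic differences between raters lower ICC(2,1), which matches the abstract's note that its estimates account for such differences.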
Khadilkar, Leenesh; MacDermid, Joy C; Sinden, Kathryn E; Jenkyn, Thomas R; Birmingham, Trevor B; Athwal, George S
2014-01-01
Video-based movement analysis software (Dartfish) has potential for clinical applications for understanding shoulder motion if functional measures can be reliably obtained. The primary purpose of this study was to describe the functional range of motion (ROM) of the shoulder used to perform a subset of functional tasks. A second purpose was to assess the reliability of functional ROM measurements obtained by different raters using Dartfish software. Ten healthy participants, mean age 29 ± 5 years, were videotaped while performing five tasks selected from the Disabilities of the Arm, Shoulder and Hand (DASH). Video cameras and markers were used to obtain video images suitable for analysis in Dartfish software. Three repetitions of each task were performed. Shoulder movements from all three repetitions were analyzed using Dartfish software. The tracking tool of the Dartfish software was used to obtain shoulder joint angles and arcs of motion. Test-retest and inter-rater reliability of the measurements were evaluated using intraclass correlation coefficients (ICC). Maximum (coronal plane) abduction (118° ± 16°) and (sagittal plane) flexion (111° ± 15°) were observed during 'washing one's hair;' maximum extension (-68° ± 9°) was identified during 'washing one's own back.' Minimum shoulder ROM was observed during 'opening a tight jar' (33° ± 13° abduction and 13° ± 19° flexion). Test-retest reliability (ICC = 0.45 to 0.94) suggests high inter-individual task variability, and inter-rater reliability (ICC = 0.68 to 1.00) showed moderate to excellent agreement. KEY FINDINGS INCLUDE: 1) functional shoulder ROM identified in this study compared to similar studies; 2) healthy individuals require less than full ROM when performing five common ADL tasks; 3) high participant variability was observed during performance of the five ADL tasks; and 4) Dartfish software provides a clinically relevant tool to analyze shoulder function.
77 FR 54917 - Findings of Research Misconduct
Federal Register 2010, 2011, 2012, 2013, 2014
2012-09-06
... values for inter-observer reliabilities when coding was done by only one observer, in both cases leading... Research Integrity (ORI) has taken final action in the following case: Marc Hauser, Ph.D., Harvard... collaborators that he miscoded some of the trials and that the study failed to provide support for the initial...
Cutolo, Maurizio; Vanhaecke, Amber; Ruaro, Barbara; Deschepper, Ellen; Ickinger, Claudia; Melsens, Karin; Piette, Yves; Trombetta, Amelia Chiara; De Keyser, Filip; Smith, Vanessa
2018-06-06
A reliable tool to evaluate flow is paramount in systemic sclerosis (SSc). We describe herein a systematic literature review on the reliability of laser speckle contrast analysis (LASCA) for measuring peripheral blood perfusion (PBP) in SSc, together with an additional pilot study investigating the intra- and inter-rater reliability of LASCA. A systematic search was performed in 3 electronic databases, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. In the pilot study, 30 SSc patients and 30 healthy subjects (HS) underwent LASCA assessment. Intra-rater reliability was assessed by having a first anchor rater perform the measurements at 2 time-points, and inter-rater reliability by having the anchor rater and a team of second raters perform the measurements in 15 SSc and 30 HS. The measurements were repeated with a second anchor rater in the other 15 SSc patients, as external validation. Only 1 of the 14 records of interest identified through the systematic search was included in the final analysis. In the additional pilot study, the intra-class correlation coefficient (ICC) for intra-rater reliability of the first anchor rater was 0.95 in SSc and 0.93 in HS, and the ICC for inter-rater reliability was 0.97 in SSc and 0.93 in HS. Intra- and inter-rater reliability of the second anchor rater was 0.78 and 0.87. The identified literature regarding the reliability of LASCA measurements reports good to excellent inter-rater agreement. The pilot study confirmed the reliability of LASCA measurements, with good to excellent inter-rater agreement, and additionally found good to excellent intra-rater reliability. Furthermore, similar results were found in the external validation. Copyright © 2018. Published by Elsevier B.V.
ERIC Educational Resources Information Center
Saxton, Emily; Belanger, Secret; Becker, William
2012-01-01
The purpose of this study was to investigate the intra-rater and inter-rater reliability of the Critical Thinking Analytic Rubric (CTAR). The CTAR is composed of 6 rubric categories: interpretation, analysis, evaluation, inference, explanation, and disposition. To investigate inter-rater reliability, two trained raters scored four sets of…
Serrano-Ortega, Natalia; Frías-Osuna, Antonio; Recio-Gómez, Juan M; Del-Pino-Casado, Rafael
2015-11-01
To develop and validate a scale to measure caregiving dedication regarding activities of daily living in caregivers of dependent older people. Cross-sectional study. Primary Health Care (Andalusia, Spain). A probabilistic sample of 200 caregivers of older relatives from Córdoba, Spain. Content validation by experts, construct validity (by exploratory factor analysis), divergent validity and reliability (internal consistency, test-retest reliability and inter-observer reliability). Cronbach's alpha was 0.86. The intraclass correlation coefficient was 0.96 for test-retest reliability and 0.88 for inter-observer reliability. When the sample was divided into two groups according to perceived burden level (presence and absence), the perceived burden was significantly different in each group (P=.001). The factor analysis revealed a single factor that explained 64% of the variance. The scale provides a suitable measure of caregiving dedication regarding activities of daily living in caregivers of older people, because it allows quick, easy administration, is well accepted by caregivers, has acceptable psychometric results and includes the frequency of caregiving, the kind of need attended and the dependence level in each need. Copyright © 2014 Elsevier España, S.L.U. All rights reserved.
Hobbelen, Johannes S M; Koopmans, Raymond T C M; Verhey, Frans R J; Habraken, Kitty M; de Bie, Rob A
2008-08-01
Paratonia is one of the associated movement disorders characteristic of dementia. The aim of this study was to develop an assessment tool (the Paratonia Assessment Instrument, PAI), based on the new consensus definition of paratonia. An additional aim was to investigate the reliability and validity of the PAI. A three-phase cross-sectional survey was conducted. In the first two phases, the PAI was developed and validated. In the third phase, the inter-observer reliability and feasibility of the instrument was tested. The original PAI consisted of five criteria that all needed to be met in order to make the diagnosis. On the basis of a qualitative analysis, one criterion was reformulated and another was removed. Following this, inter-observer reliability between the two assessors resulted in an improvement of Cohen's kappa from 0.532 in the initial phase to 0.677 in the second phase. This improvement was substantiated in the third phase by two independent assessors with Cohen's kappa ranging from 0.625 to 1. The PAI is a reliable and valid assessment tool for diagnosing paratonia in elderly people with dementia that can be applied easily in daily practice.
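Cohen's kappa, the agreement statistic reported above, corrects the raw proportion of agreement between two assessors for the agreement expected by chance. The Python below is a minimal sketch of that calculation; the function name and the two lists of present/absent judgements are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)

    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical diagnoses (1 = paratonia present, 0 = absent) for 10 patients.
assessor_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
assessor_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(assessor_1, assessor_2)
print(round(kappa, 2))  # 0.6
```

In this made-up example the assessors agree on 8 of 10 patients, but half of that agreement is expected by chance, so kappa is 0.6 rather than 0.8.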
Seo, Jeong-Ho; Boedijono, Dimas
2016-01-01
Purpose The aim of this study was to investigate new point-connecting measurements for the hallux valgus angle (HVA) and the first intermetatarsal angle (IMA), which can reflect the degree of subluxation of the first metatarsophalangeal joint (MTPJ). Also, this study attempted to compare the validity of midline measurements and the new point-connecting measurements for the determination of HVA and IMA values. Materials and Methods Sixty feet of hallux valgus patients who underwent surgery between 2007 and 2011 were classified in terms of the severity of HVA, congruency of the first MTPJ, and type of chevron metatarsal osteotomy. On weight-bearing dorsal-plantar radiographs, HVA and IMA values were measured and compared preoperatively and postoperatively using both the conventional and new methods. Results Compared with midline measurements, point-connecting measurements showed higher inter- and intra-observer reliability for preoperative HVA/IMA and similar or higher inter- and intra-observer reliability for postoperative HVA/IMA. Patients who underwent distal chevron metatarsal osteotomy (DCMO) had higher intraclass correlation coefficient for inter- and intra-observer reliability for pre- and post-operative HVA and IMA measured by the point-connecting method compared with the midline method. All differences in the preoperative HVAs and IMAs determined by both the midline method and point-connecting methods were significant between the deviated group and subluxated groups (p=0.001). Conclusion The point-connecting method for measuring HVA and IMA in the subluxated first MTPJ may better reflect the severity of a HV deformity with higher reliability than the midline method, and is more useful in patients with DCMO than in patients with proximal chevron metatarsal osteotomy. PMID:26996576
ASSOCIATIONS BETWEEN THREE CLINICAL ASSESSMENT TOOLS FOR POSTURAL STABILITY
Saxion, Casie E.; Cameron, Kenneth L.; Gerber, J. Parry
2010-01-01
Study Design: Clinical Measurement, Correlation, Reliability Objectives: To assess the relationship between the Single Leg Balance (SLB), modified Balance Error Scoring System (mBESS), and modified Star Excursion Balance (mSEBT) tests and secondarily to assess inter-rater and test-retest reliability of these tests. Background: Ankle sprains often result in chronic instability and dysfunction. Several clinical tests assess postural deficits as a potential cause of this dysfunction; however, limited information exists pertaining to the relationship that these tests have with one another. Methods: Two independent examiners measured the performance of 34 healthy participants completing the SLB Test, mBESS test, and mSEBT at two different time periods. The relationship between tests was assessed using the Pearson Correlation and Fisher's Exact Tests. Inter-rater and test-retest reliability were assessed using the intraclass correlation coefficient (ICC) and Kappa statistics. Results: A significant correlation (r = -0.35) was observed between the mSEBT and the mBESS. Fisher's Exact Test showed a significant association between the SLB Test and mBESS (P = .048), but no association between the SLB and mSEBT (P = 1.000). Inter-rater reliability was excellent for the mSEBT and fair for the mBESS (ICCs of .91 and .61 respectively). Excellent agreement was observed between raters for the SLB test (k = 1.00). Test-retest reliability was excellent for the mSEBT (ICC = 0.98) and fair for the mBESS (ICC = 0.74). There was poor test-retest agreement for the SLB test (k = .211). Conclusion: There was a significant relationship observed between the SLB Test, mBESS test, and mSEBT; however, strength of association measures showed limited overlap between these tests. This suggests that these tests are interrelated but may not assess equal components of postural stability. PMID:21589668
Examiner Training and Reliability in Two Randomized Clinical Trials of Adult Dental Caries
Banting, David W.; Amaechi, Bennett T.; Bader, James D.; Blanchard, Peter; Gilbert, Gregg H.; Gullion, Christina M.; Holland, Jan Carlton; Makhija, Sonia K.; Papas, Athena; Ritter, André V.; Singh, Mabi L.; Vollmer, William M.
2013-01-01
Objectives This report describes the training of dental examiners participating in two dental caries clinical trials and reports the inter- and intra-examiner reliability scores from the initial standardization sessions. Methods Study examiners were trained to use a modified ICDAS-II system to detect the visual signs of non-cavitated and cavitated dental caries in adult subjects. Dental caries was classified as no caries (S), non-cavitated caries (D1), enamel caries (D2) and dentine caries (D3). Three standardization sessions involving 60 subjects and 3604 tooth surface calls were used to calculate several measures of examiner reliability. Results The prevalence of dental caries observed in the standardization sessions ranged from 1.4% to 13.5% of the coronal tooth surfaces examined. Overall agreement between pairs of examiners ranged from 0.88 to 0.99. An intra-class coefficient threshold of 0.60 was surpassed for all but one examiner. Inter-examiner unweighted kappa values were low (0.23–0.35) but weighted kappas and the ratio of observed to maximum kappas were more encouraging (0.42–0.83). The highest kappa values occurred for the S/D1 vs. D2/D3 two-level classification of dental caries, for which seven of the eight examiners achieved observed to maximum kappa values over 0.90. Intra-examiner reliability was notably higher than inter-examiner reliability for all measures and dental caries classification systems employed. Conclusion The methods and results for the initial examiner training and standardization sessions for two large clinical trials are reported. Recommendations for others planning examiner training and standardization sessions are offered. PMID:22320292
Examiner training and reliability in two randomized clinical trials of adult dental caries.
Banting, David W; Amaechi, Bennett T; Bader, James D; Blanchard, Peter; Gilbert, Gregg H; Gullion, Christina M; Holland, Jan Carlton; Makhija, Sonia K; Papas, Athena; Ritter, André V; Singh, Mabi L; Vollmer, William M
2011-01-01
This report describes the training of dental examiners participating in two dental caries clinical trials and reports the inter- and intra-examiner reliability scores from the initial standardization sessions. Study examiners were trained to use a modified International Caries Detection and Assessment System II system to detect the visual signs of non-cavitated and cavitated dental caries in adult subjects. Dental caries was classified as no caries (S), non-cavitated caries (D1), enamel caries (D2), and dentine caries (D3). Three standardization sessions involving 60 subjects and 3,604 tooth surface calls were used to calculate several measures of examiner reliability. The prevalence of dental caries observed in the standardization sessions ranged from 1.4 percent to 13.5 percent of the coronal tooth surfaces examined. Overall agreement between pairs of examiners ranged from 0.88 to 0.99. An intra-class coefficient threshold of 0.60 was surpassed for all but one examiner. Inter-examiner unweighted kappa values were low (0.23-0.35), but weighted kappas and the ratio of observed to maximum kappas were more encouraging (0.42-0.83). The highest kappa values occurred for the S/D1 versus D2/D3 two-level classification of dental caries, for which seven of the eight examiners achieved observed to maximum kappa values over 0.90. Intra-examiner reliability was notably higher than inter-examiner reliability for all measures and dental caries classifications employed. The methods and results for the initial examiner training and standardization sessions for two large clinical trials are reported. Recommendations for others planning examiner training and standardization sessions are offered. © 2011 American Association of Public Health Dentistry.
Wengert, G J; Helbich, T H; Woitek, R; Kapetas, P; Clauser, P; Baltzer, P A; Vogl, W-D; Weber, M; Meyer-Baese, A; Pinker, Katja
2016-11-01
To evaluate the inter-/intra-observer agreement of BI-RADS-based subjective visual estimation of the amount of fibroglandular tissue (FGT) with magnetic resonance imaging (MRI), and to investigate whether FGT assessment benefits from an automated, observer-independent, quantitative MRI measurement by comparing both approaches. Eighty women with no imaging abnormalities (BI-RADS 1 and 2) were included in this institutional review board (IRB)-approved prospective study. All women underwent un-enhanced breast MRI. Four radiologists independently assessed FGT with MRI by subjective visual estimation according to BI-RADS. Automated observer-independent quantitative measurement of FGT with MRI was performed using a previously described measurement system. Inter-/intra-observer agreements of qualitative and quantitative FGT measurements were assessed using Cohen's kappa (k). Inexperienced readers achieved moderate inter-/intra-observer agreement and experienced readers a substantial inter- and perfect intra-observer agreement for subjective visual estimation of FGT. Practice and experience reduced observer-dependency. Automated observer-independent quantitative measurement of FGT was successfully performed and revealed only fair to moderate agreement (k = 0.209-0.497) with subjective visual estimations of FGT. Subjective visual estimation of FGT with MRI shows moderate intra-/inter-observer agreement, which can be improved by practice and experience. Automated observer-independent quantitative measurements of FGT are necessary to allow a standardized risk evaluation. • Subjective FGT estimation with MRI shows moderate intra-/inter-observer agreement in inexperienced readers. • Inter-observer agreement can be improved by practice and experience. • Automated observer-independent quantitative measurements can provide reliable and standardized assessment of FGT with MRI.
Lee, Kyoung-bo; Lee, Paul; Yoo, Sang-won; Kim, Young-dong
2016-01-01
[Purpose] The aim of this study was to translate and adapt the Community Balance and Mobility Scale (CB&M) into Korean (K-CB&M) and to verify the reliability and validity of scores obtained with Korean patients. [Subjects and Methods] A total of 16 subjects were recruited from St. Vincent’s Hospital in South Korea. At each testing session, subjects completed the K-CB&M, Berg balance scale (BBS), timed up and go test (TUG), and functional reaching test. All tests were administered by a physical therapist, and subjects completed the tests in an identical standardized order during all testing sessions. [Results] The inter- and intra-rater reliability coefficients were high for most subscores, while moderate inter-rater reliability was observed for the items “walking and looking” and “walk, look, and carry”, and moderate intra-rater reliability was observed for “forward to backward walking”. There was a positive correlation between the K-CB&M and BBS and a negative correlation between the K-CB&M and TUG in the convergent validity assessments. [Conclusion] The reliability and validity of the K-CB&M was high, suggesting that clinical practitioners treating Korean patients with hemiplegia can use this material for assessing static and dynamic balance. PMID:27630420
Balaguier, Romain; Madeleine, Pascal; Vuillerme, Nicolas
2016-01-01
The assessment of pressure pain threshold (PPT) provides a quantitative value related to the mechanical sensitivity to pain of deep structures. Although excellent reliability of PPT has been reported in numerous anatomical locations, its absolute and relative reliability in the lower back region remains to be determined. Because of the high prevalence of low back pain in the general population and because low back pain is one of the leading causes of disability in industrialized countries, assessing pressure pain thresholds over the low back is particularly of interest. The purpose of this study was (1) to evaluate the intra- and inter-session absolute and relative reliability of PPT within 14 locations covering the low back region of asymptomatic individuals and (2) to determine the number of trials required to ensure reliable PPT measurements. Fifteen asymptomatic subjects were included in this study. PPTs were assessed among 14 anatomical locations in the low back region over two sessions separated by a one-hour interval. For the two sessions, three PPT assessments were performed on each location. Reliability was assessed by computing intraclass correlation coefficients (ICC), standard error of measurement (SEM) and minimum detectable change (MDC) for all possible combinations between trials and sessions. Bland-Altman plots were also generated to assess potential bias in the dataset. Relative reliability for both intra- and inter-session was almost perfect, with ICCs ranging from 0.85 to 0.99. With respect to the intra-session, no statistical difference was reported for ICCs and SEM regardless of the conducted comparisons between trials. Conversely, for inter-session, ICCs and SEM values were significantly larger when two consecutive PPT measurements were used for data analysis. No significant difference was observed for the comparison between two consecutive measurements and three measurements.
Excellent relative and absolute reliabilities were reported for both intra- and inter-session. Reliable measurements can be equally achieved when using the mean of two or three consecutive PPT measurements, as usually proposed in the literature, or with only the first one. Although reliability was almost perfect regardless of the conducted comparison between PPT assessments, our results suggest using two consecutive measurements to obtain higher short term absolute reliability.
The validity and reliability of a simple semantic classification of foot posture.
Cross, Hugh A; Lehman, Linda
2008-12-01
The Simple Semantic Classification (SSC) is described as a pragmatic method to assist in the assessment of the weight bearing foot. It was designed for application by therapists and technicians working in underdeveloped situations, after they have had basic orientation in foot function. To present evidence of the validity and inter-observer reliability of the SSC. 13 physiotherapists from LEPRA India projects and 12 physical therapists functioning within the National Programme for the Elimination of Hansen's Disease (PNEH), Brazil, participated in an inter-observer exercise. Inter-observer agreement was gauged using the Kappa statistic. The results of the inter-observer exercise were dependent on observations of foot posture made from photographs. This was necessary to ensure that the procedure was standardised for participants in different countries. The method had limitations which were partly reflected in the results. The level of agreement between the principal investigator and Indian physiotherapists was Kappa = 0.58. The level of agreement between Brazilian physical therapists and the principal investigator was Kappa = 0.70. The authors opine that the results were sufficiently compelling to suggest that the Simple Semantic Classification can be used as a field method to identify people at increased risk of foot pathologies.
Mochizuki, Yuta; Kaneko, Takao; Kawahara, Keisuke; Toyoda, Shinya; Kono, Norihiko; Hada, Masaru; Ikegami, Hiroyasu; Musha, Yoshiro
2017-11-20
The quadrant method was described by Bernard et al. and it has been widely used for postoperative evaluation of anterior cruciate ligament (ACL) reconstruction. The purpose of this research is to further develop the quadrant method measuring four points, which we named four-point quadrant method, and to compare with the quadrant method. Three-dimensional computed tomography (3D-CT) analyses were performed in 25 patients who underwent double-bundle ACL reconstruction using the outside-in technique. The four points in this study's quadrant method were defined as point1-highest, point2-deepest, point3-lowest, and point4-shallowest, in femoral tunnel position. Value of depth and height in each point was measured. Antero-medial (AM) tunnel is (depth1, height2) and postero-lateral (PL) tunnel is (depth3, height4) in this four-point quadrant method. The 3D-CT images were evaluated independently by 2 orthopaedic surgeons. A second measurement was performed by both observers after a 4-week interval. Intra- and inter-observer reliability was calculated by means of intra-class correlation coefficient (ICC). Also, the accuracy of the method was evaluated against the quadrant method. Intra-observer reliability was almost perfect for both AM and PL tunnel (ICC > 0.81). Inter-observer reliability of AM tunnel was substantial (ICC > 0.61) and that of PL tunnel was almost perfect (ICC > 0.81). The AM tunnel position was 0.13% deep, 0.58% high and PL tunnel position was 0.01% shallow, 0.13% low compared to quadrant method. The four-point quadrant method was found to have high intra- and inter-observer reliability and accuracy. This method can evaluate the tunnel position regardless of the shape and morphology of the bone tunnel aperture for use of comparison and can provide measurement that can be compared with various reconstruction methods. 
The four-point quadrant method of this study is considered to have clinical relevance in that it is a detailed and accurate tool for evaluating femoral tunnel position after ACL reconstruction. Case series, Level IV.
Inter-Observer Reliability of DSM-5 Substance Use Disorders
Denis, Cécile M.; Gelernter, Joel; Hart, Amy B.; Kranzler, Henry R.
2015-01-01
Aims Although studies have examined the impact of changes made in DSM-5 on the estimated prevalence of substance use disorder (SUD) diagnoses, there is limited evidence of the reliability of DSM-5 SUDs. We evaluated the inter-observer reliability of four DSM-5 SUDs in a sample in which we had previously evaluated the reliability of DSM-IV diagnoses, allowing us to compare the two systems. Methods Two different interviewers each assessed 173 subjects over a 2-week period using the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA). Using the percent agreement and kappa (κ) coefficient, we examined the reliability of DSM-5 lifetime alcohol, opioid, cocaine, and cannabis use disorders, which we compared to that of SSADDA-derived DSM-IV SUD diagnoses. We also assessed the effect of additional lifetime SUD and lifetime mood or anxiety disorder diagnoses on the reliability of the DSM-5 SUD diagnoses. Results Reliability was good to excellent for the four disorders, with κ values ranging from 0.65 to 0.94. Agreement was consistently lower for SUDs of mild severity than for moderate or severe disorders. DSM-5 SUD diagnoses showed greater reliability than DSM-IV diagnoses of abuse or dependence or dependence only. Co-occurring SUD and lifetime mood or anxiety disorders exerted a modest effect on the reliability of the DSM-5 SUD diagnoses. Conclusions For alcohol, opioid, cocaine and cannabis use disorders, DSM-5 criteria and diagnoses are at least as reliable as those of DSM-IV. PMID:26048641
Wollin, Martin; Purdam, Craig; Drew, Michael K
2016-01-01
To investigate inter- and intra-tester reliability of an externally fixed dynamometry unilateral hamstring strength test, in the elite sports setting. Reliability study. Sixteen injury-free, elite male youth football players (age=16.81±0.54 years, height=180.22±5.29cm, weight=73.88±6.54kg, BMI=22.57±1.42) gave written informed consent. Unilateral maximum isometric peak hamstring force was evaluated by externally fixed dynamometry for inter-tester, intra-day and intra-tester, inter-week reliability. The test position was standardised to correlate with the terminal swing phase of the running gait cycle. Inter- and intra-tester values demonstrated good to high levels of reliability. The intra-class coefficient (ICC) for inter-tester, intra-day reliability was 0.87 (95% CI=0.75-0.93) with standard error of measure percentage (SEM%) 4.7 and minimal detectable change percentage (MDC%) 12.9. Intra-tester, inter-week reliability results were ICC 0.86 (95% CI, 0.74-0.93), SEM% 5.0 and MDC% 14.0. This study demonstrates good to high inter- and intra-tester reliability of isometric externally fixed dynamometry unilateral hamstring strength testing in the regular elite sport setting involving elite male youth football players. The intra-class coefficient in association with the low standard error of measure and minimal detectable change percentages suggest that this procedure is appropriate for clinical and academic use as well as monitoring hamstring strength in the elite sport setting. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
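To see how the SEM and MDC figures above relate to one another, the sketch below applies the commonly used formulas SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM. The abstract does not say which formulas the authors used, so both are assumptions here, and the function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and an ICC (assumed formula)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level (assumed formula)."""
    return 1.96 * math.sqrt(2.0) * sem


# Feeding in the reported inter-tester SEM% of 4.7 gives an MDC% close to 13.
inter_tester_mdc = mdc95(4.7)
```

Under these assumptions the reported SEM% of 4.7 yields an MDC% of about 13.0, close to the published 12.9; the small gap would be consistent with SEM% having been rounded before publication.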
Baker, Nancy A; Cook, James R; Redfern, Mark S
2009-01-01
This paper describes the inter-rater and intra-rater reliability, and the concurrent validity of an observational instrument, the Keyboard Personal Computer Style instrument (K-PeCS), which assesses stereotypical postures and movements associated with computer keyboard use. Three trained raters independently rated the video clips of 45 computer keyboard users to ascertain inter-rater reliability, and then re-rated a sub-sample of 15 video clips to ascertain intra-rater reliability. Concurrent validity was assessed by comparing the ratings obtained using the K-PeCS to scores developed from a 3D motion analysis system. The overall K-PeCS had excellent reliability [inter-rater: intra-class correlation coefficients (ICC)=.90; intra-rater: ICC=.92]. Most individual items on the K-PeCS had from good to excellent reliability, although six items fell below ICC=.75. Those K-PeCS items that were assessed for concurrent validity compared favorably to the motion analysis data for all but two items. These results suggest that most items on the K-PeCS can be used to reliably document computer keyboarding style.
Bautmans, Ivan; Mets, Tony
2005-06-01
Although a wide variety of protocols are available for evaluating skeletal muscle fatigue resistance, they often necessitate important technological resources or are too complicated for elderly subjects. We present here a new test, designed for elderly persons, based on maintaining maximal voluntary grip strength as long as possible. The aim of the study was to determine the reliability of this test procedure in hospitalized geriatric patients and in young healthy persons. Fatigue resistance was considered as the time in which grip strength decreases to 50% of its maximum value. Twenty geriatric, hospitalized patients (age 83 +/- 6 yrs) and thirty-nine young, healthy persons (age 23 +/- 4 yrs) were evaluated for fatigue resistance by two different observers. Height, weight and body mass index were determined for each participant and the current amount of sports activity was recorded in the young subjects. All participants were able to perform the test. Inter- and intra-rater reliability in both subgroups was good to excellent, with ICC(3,1) values ranging from 0.77 to 0.94. No significant differences in inter- and intra-rater measurements were found, except for inter-observer evaluations of the dominant hand in hospitalized geriatric patients. No significant relationships were found between fatigue resistance and maximal grip strength, anthropometrics or gender. The proposed fatigue resistance test is a reliable tool to evaluate geriatric hospitalized patients as well as young, active and healthy persons. Fatigue resistance scores are not related to gender, maximal strength or anthropometrics within the observed subgroups.
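The ICC(3,1) values reported above correspond to the two-way mixed, consistency, single-measure form of the intraclass correlation. A minimal sketch of that calculation from a subjects-by-raters table follows; the function name and the ratings are illustrative assumptions, not the study's code or data.

```python
def icc_3_1(scores):
    """ICC(3,1) from a table with one row per subject and one column per rater."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 subjects and >= 2 raters")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: total, between subjects, between raters.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Illustrative ratings (not study data): 6 subjects scored by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_3_1(ratings)
```

Because ICC(3,1) measures consistency rather than absolute agreement, a rater who scores every subject higher by a constant amount does not lower the coefficient.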
A reliability study of the new sensors for movement analysis (SHARIF-HMIS).
Abedi, Mohen; Manshadi, Farideh Dehghan; Zavieh, Minoo Khalkhali; Ashouri, Sajad; Azimi, Hadi; Parnanpour, Mohamad
2016-04-01
SHARIF-HMIS is a new inertial sensor designed for movement analysis. The aim of the present study was to assess the inter-tester and intra-tester reliability of some kinematic parameters in different lumbar motions making use of this sensor. 24 healthy persons and 28 patients with low back pain participated in the current reliability study. The test was performed in five different lumbar motions consisting of lumbar flexion in 0, 15, and 30° in the right and left directions. For measuring inter-tester reliability, all the tests were carried out twice on the same day separately by two physiotherapists. Intra-tester reliability was assessed by reproducing the tests after 3 days by the same physiotherapist. The present study revealed satisfactory inter- and intra-tester reliability indices in different positions. ICCs for intra-tester reliability ranged from 0.65 to 0.98 and 0.59 to 0.81 for healthy and patient participants, respectively. Also, ICCs for inter-tester reliability ranged from 0.65 to 0.92 for the healthy and 0.65 to 0.87 for patient participants. In general, it can be inferred from the results that measuring the kinematic parameters in lumbar movements using inertial sensors enjoys acceptable reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Nagata, Yasufumi; Kado, Yuichiro; Onoue, Takeshi; Otani, Kyoko; Nakazono, Akemi; Otsuji, Yutaka; Takeuchi, Masaaki
2018-01-01
Background Left ventricular ejection fraction (LVEF) and global longitudinal strain (GLS) play important roles in diagnosis and management of cardiac diseases. However, the issue of the accuracy and reliability of LVEF and GLS remains to be solved. Image quality is one of the most important factors affecting measurement variability. The aim of this study was to investigate whether improved image quality could reduce observer variability. Methods Two sets of three apical images were acquired using relatively old- and new-generation ultrasound imaging systems (Vivid 7 and Vivid E95) in 308 subjects. Image quality was assessed by endocardial border delineation index (EBDI) using a 3-point scoring system. Three observers measured the LVEF and GLS, and these values and inter-observer variability were investigated. Results Image quality was significantly better with Vivid E95 (EBDI: 26.8 ± 5.9) than that with Vivid 7 (22.8 ± 6.3, P < 0.0001). Regarding the inter-observer variability of LVEF, the r-value, bias, 95% limit of agreement and intra-class correlation coefficient for Vivid 7 were comparable to those for Vivid E95. The % variabilities were significantly lower for Vivid E95 (5.3–6.5%) than those for Vivid 7 (6.5–7.5%). Regarding GLS, all observer variability parameters were better for Vivid E95 than for Vivid 7. Improvements in image quality yielded benefits to both LVEF and GLS measurement reliability. Multivariate analysis showed that image quality was indeed an important factor of observer variability in the measurement of LVEF and GLS. Conclusions The new-generation ultrasound imaging system offers improved image quality and reduces inter-observer variability in the measurement of LVEF and GLS. PMID:29432198
Nagata, Yasufumi; Kado, Yuichiro; Onoue, Takeshi; Otani, Kyoko; Nakazono, Akemi; Otsuji, Yutaka; Takeuchi, Masaaki
2018-03-01
Left ventricular ejection fraction (LVEF) and global longitudinal strain (GLS) play important roles in diagnosis and management of cardiac diseases. However, the issue of the accuracy and reliability of LVEF and GLS remains to be solved. Image quality is one of the most important factors affecting measurement variability. The aim of this study was to investigate whether improved image quality could reduce observer variability. Two sets of three apical images were acquired using relatively old- and new-generation ultrasound imaging systems (Vivid 7 and Vivid E95) in 308 subjects. Image quality was assessed by endocardial border delineation index (EBDI) using a 3-point scoring system. Three observers measured the LVEF and GLS, and these values and inter-observer variability were investigated. Image quality was significantly better with Vivid E95 (EBDI: 26.8 ± 5.9) than that with Vivid 7 (22.8 ± 6.3, P < 0.0001). Regarding the inter-observer variability of LVEF, the r-value, bias, 95% limit of agreement and intra-class correlation coefficient for Vivid 7 were comparable to those for Vivid E95. The % variabilities were significantly lower for Vivid E95 (5.3-6.5%) than those for Vivid 7 (6.5-7.5%). Regarding GLS, all observer variability parameters were better for Vivid E95 than for Vivid 7. Improvements in image quality yielded benefits to both LVEF and GLS measurement reliability. Multivariate analysis showed that image quality was indeed an important factor of observer variability in the measurement of LVEF and GLS. The new-generation ultrasound imaging system offers improved image quality and reduces inter-observer variability in the measurement of LVEF and GLS. © 2018 The authors.
Vanwolleghem, Griet; Van Dyck, Delfien; Ducheyne, Fabian; De Bourdeaudhuij, Ilse; Cardon, Greet
2014-06-10
Google Street View provides a valuable and efficient alternative to observe the physical environment compared to on-site fieldwork. However, studies on the use, reliability and validity of Google Street View in a cycling-to-school context are lacking. We aimed to study the intra-, inter-rater reliability and criterion validity of EGA-Cycling (Environmental Google Street View Based Audit - Cycling to school), a newly developed audit using Google Street View to assess the physical environment along cycling routes to school. Parents (n = 52) of 11-to-12-year old Flemish children, who mostly cycled to school, completed a questionnaire and identified their child's cycling route to school on a street map. Fifty cycling routes of 11-to-12-year olds were identified and physical environmental characteristics along the identified routes were rated with EGA-Cycling (5 subscales; 37 items), based on Google Street View. To assess reliability, two researchers performed the audit. Criterion validity of the audit was examined by comparing the ratings based on Google Street View with ratings through on-site assessments. Intra-rater reliability was high (kappa range 0.47-1.00). Large variations in the inter-rater reliability (kappa range -0.03-1.00) and criterion validity scores (kappa range -0.06-1.00) were reported, with acceptable inter-rater reliability values for 43% of all items and acceptable criterion validity for 54% of all items. EGA-Cycling can be used to assess physical environmental characteristics along cycling routes to school. However, to assess the micro-environment specifically related to cycling, on-site assessments have to be added.
Spörndly-Nees, Søren; Dåsberg, Brian; Nielsen, Rasmus Oestergaard; Boesen, Morten Ilum
2011-01-01
Background: Lower limb injuries are a large problem in athletes. However, there is a paucity of knowledge on the relationship between alignment of the medial longitudinal arch (MLA) of the foot and development of such injuries. A reliable and valid test to quantify foot type is needed to be able to investigate the relationship between arch type and injury likelihood. Feiss Line is a valid clinical measure of the MLA. However, no study has investigated the reliability of the test. Objectives: The purpose was to describe a modified version of the Feiss Line test and to determine the intra- and inter-tester reliability of this new foot alignment test. To emphasize the purpose of the modified test, the authors have named it The Navicular Position Test. Methods: Intra- and inter-tester reliability of The Navicular Position Test were evaluated using the ICC (intraclass correlation coefficient) and Bland-Altman limits of agreement in 43 healthy, young subjects. Results: Inter-tester mean difference -0.35 degrees [–1.32; 0.62] p = 0.47. Bland-Altman limits of agreement –6.55 to 5.85 degrees, ICC = 0.94. Intra-tester mean difference 0.47 degrees [–0.57; 1.50] p = 0.37. Bland-Altman limits of agreement –6.15 to 7.08 degrees, ICC = 0.91. Discussion: The present data support The Navicular Position Test as a reliable test of the navicular bone position during rest and loading measured in a simple test set-up. Conclusion: The Navicular Position Test was shown to have a high intraday-, intra- and inter-tester reliability. Once cut-off values to categorize the MLA into planus, rectus, or cavus feet have been determined and presented, the test could be used in prospective observational studies investigating the role of the arch type on the development of various lower limb injuries. PMID:21904698
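The Bland-Altman limits of agreement quoted above are conventionally computed as the mean difference between paired measurements plus or minus 1.96 standard deviations of those differences. The reported inter-tester limits (-6.55 to 5.85 degrees) are symmetric about the reported mean difference of -0.35, which is consistent with that convention, although the abstract does not spell out the calculation. A minimal sketch with a hypothetical function name and made-up angles:

```python
import statistics


def bland_altman_limits(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)   # mean difference between the two testers
    sd = statistics.stdev(diffs)    # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical angles in degrees from two testers (not study data).
tester_a = [10.0, 12.0, 14.0, 16.0]
tester_b = [9.0, 13.0, 13.0, 17.0]
bias, lower, upper = bland_altman_limits(tester_a, tester_b)
```

Working backwards from the reported inter-tester limits, the half-width of 6.2 degrees implies a standard deviation of the differences of roughly 3.2 degrees under this convention.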
Abdallah, Faraj W; Yu, Eugene; Cholvisudhi, Phantila; Niazi, Ahtsham U; Chin, Ki J; Abbas, Sherif; Chan, Vincent W
2017-01-01
Ultrasound (US) imaging of the airway may be useful in predicting difficulty of airway management (DAM); but its use is limited by lack of proof of its validity and reliability. We sought to validate US imaging of the airway by comparison to CT-scan, and to assess its inter- and intra-observer reliability. We used submandibular sonographic imaging of the mouth and oropharynx to examine how well the ratio of tongue thickness to oral cavity height correlates with the ratio of tongue volume to oral cavity volume, an established tomographic measure of DAM. A cohort of 34 patients undergoing CT-scan was recruited. Study standardized assessments included CT-measured ratios of tongue volume to oropharyngeal cavity volume; tongue thickness to oral cavity height; and US-measured ratio of tongue thickness to oral cavity height. Two sonographers independently performed US imaging of the airway before and after CT-scan. Our findings indicate that the US-measured ratio of tongue thickness to oral cavity height highly correlates with the CT-measured ratio of tongue volume to oral cavity volume. US measurements also demonstrated strong inter- and intra-observer reliability. This study suggests that US is a valid and reliable tool for imaging the oral and oropharyngeal parts of the airway, as well as for measuring the volumetric relationship between the tongue and oral cavity, and may therefore be a useful predictor of DAM. © 2016 by the American Institute of Ultrasound in Medicine.
Barthassat, Emilienne; Afifi, Faik; Konala, Praveen; Rasch, Helmut; Hirschmann, Michael T
2017-05-08
It was the primary purpose of our study to evaluate the inter- and intra-observer reliability of a standardized SPECT/CT algorithm for evaluating patients with painful primary total hip arthroplasty (THA). The secondary purpose was a comparison of semi-quantitative and 3D volumetric quantification methods for assessment of bone tracer uptake (BTU) in those patients. A novel SPECT/CT localization scheme consisting of 14 femoral and 4 acetabular regions on standardized axial and coronal slices was introduced and evaluated in terms of inter- and intra-observer reliability in 37 consecutive patients with hip pain after THA. BTU for each anatomical region was assessed semi-quantitatively using a color-coded Likert type scale (0-10) and volumetrically quantified using validated software. Two observers interpreted the SPECT/CT findings in all patients twice, in random order, with a six-week interval between interpretations. Semi-quantitative and quantitative measurements were compared in terms of reliability. In addition, the values were correlated using Pearson's correlation. A factorial cluster analysis of BTU was performed to identify clinically relevant regions, which should be grouped and analysed together. The localization scheme showed high inter- and intra-observer reliabilities for all femoral and acetabular regions independent of the measurement method used (semiquantitative versus 3D volumetric quantitative measurements). A high to moderate correlation between both measurement methods was shown for the distal femur, the proximal femur and the acetabular cup. The factorial cluster analysis showed that the anatomical regions might be summarized into three distinct anatomical regions. These were the proximal femur, the distal femur and the acetabular cup region. The SPECT/CT algorithm for assessment of patients with pain after THA is highly reliable independent of the measurement method used. Three clinically relevant anatomical regions (proximal femoral, distal femoral, acetabular) were identified.
Measuring symptoms and functioning of youth with ADHD in middle schools.
Evans, Steven W; Allen, Jessica; Moore, Sheryle; Strauss, Victoria
2005-12-01
Reliable and valid means for evaluating the effectiveness of school-based treatments and completing diagnostic evaluations of middle school aged students are needed. The present study examined the inter-rater agreement of teacher ratings and the relationship between ratings and observational data in a middle school setting. The data are interpreted in the context of differences between a secondary and elementary school setting. Teacher ratings and observational data were collected regularly over the course of two academic years for middle school students diagnosed with ADHD. The results indicate low rates of inter-rater agreement as well as low rates of agreement between teachers and observational data, and between observational data collected in different classrooms. Inter-rater agreement was lowest in late fall and gradually increased over the second half of the year. Implications for conducting treatment outcome evaluations of school-based treatment programs and diagnostic evaluations are discussed.
Effect of knee angle on neuromuscular assessment of plantar flexor muscles: A reliability study
Cornu, Christophe; Jubeau, Marc
2018-01-01
Introduction: This study aimed to determine the intra- and inter-session reliability of neuromuscular assessment of plantar flexor (PF) muscles at three knee angles. Methods: Twelve young adults were tested for three knee angles (90°, 30° and 0°) and at three time points separated by 1 hour (intra-session) and 7 days (inter-session). Electrical (H reflex, M wave) and mechanical (evoked and maximal voluntary torque, activation level) parameters were measured on the PF muscles. Intraclass correlation coefficients (ICC) and coefficients of variation were calculated to determine intra- and inter-session reliability. Results: The mechanical measurements presented excellent (ICC>0.75) intra- and inter-session reliabilities regardless of the knee angle considered. The reliability of electrical measurements was better for the 90° knee angle compared to the 0° and 30° angles. Conclusions: Changes in the knee angle may influence the reliability of neuromuscular assessments, which indicates the importance of considering the knee angle to collect consistent outcomes on the PF muscles. PMID:29596480
Rossettini, Giacomo; Rondoni, Angie; Lovato, Tommaso; Strobe, Marco; Verzè, Elisa; Vicentini, Marco; Testa, Marco
2016-06-03
Passive Intervertebral Movements (PIVMs) are commonly used to assess and treat patients with nonspecific neck pain. Only very few studies have investigated 3D movements until now. This study assessed intra- and inter-rater reliability of three-dimensional (3D) cervical PIVMs performed by physical therapy students in patients with nonspecific neck pain. Thirty-one patients, mean age 47.2 ± 7.2 years, were independently evaluated by 2 physical therapy students. The raters (A and B) assessed mobility, end-feel and pain provocation performing bilaterally the 3D cervical segmental side-bending test (3D CSSB) from levels C2-C3 to C6-C7. Percentage agreement (raw, positive and negative), Cohen's kappa (95% CI), prevalence index and bias index were calculated to estimate intra- and inter-rater reliability. Intra-rater reliability showed kappa values ranging between fair and substantial (k 0.29-0.80) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 90%. Inter-rater reliability presented kappa values ranging between fair and substantial (k 0.22-0.62) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 80%. Intra-rater reliability of 3D PIVMs was superior to inter-rater reliability in patients with nonspecific neck pain. The most repeatable evaluation parameter was pain. However, the overall poor reliability suggests avoiding the use of these techniques alone to examine patients and measure their outcome. Further studies are needed to investigate PIVM reliability in combination with other assessment procedures in symptomatic patients.
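The statistics named in this abstract (raw, positive and negative percentage agreement, Cohen's kappa, prevalence index and bias index) can all be derived from one 2x2 table of paired ratings. The sketch below uses the standard textbook definitions, which are assumed rather than taken from the study; the function name and cell counts are hypothetical.

```python
def agreement_stats(a, b, c, d):
    """Agreement statistics for a 2x2 table of paired binary ratings.

    a: both raters positive      b: rater A positive, rater B negative
    c: A negative, B positive    d: both raters negative
    """
    n = a + b + c + d
    p_obs = (a + d) / n                                       # raw agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    return {
        "percent_agreement": 100.0 * p_obs,
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
        "kappa": (p_obs - p_exp) / (1.0 - p_exp),
        "prevalence_index": abs(a - d) / n,
        "bias_index": abs(b - c) / n,
    }


# Hypothetical counts for 31 patients with a high share of positive findings.
stats = agreement_stats(a=20, b=5, c=3, d=3)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts raw agreement is about 74% while kappa is only about 0.27, which illustrates why a high prevalence index is usually reported alongside kappa.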
Litzenburger, Friederike; Heck, Katrin; Pitchika, Vinay; Neuhaus, Klaus W; Jost, Fabian N; Hickel, Reinhard; Jablonski-Momeni, Anahita; Welk, Alexander; Lederer, Alexander; Kühnisch, Jan
2018-02-01
The purpose of this in vitro study was to evaluate the inter- and intraexaminer reliability of digital bitewing (DBW) radiography and near-infrared light transillumination (NIRT) for proximal caries detection and assessment in posterior teeth. From a pool of 85 patients, 100 corresponding pairs of DBW and NIRT images (~1/3 healthy, ~1/3 with enamel caries and ~1/3 with dentin caries) were chosen. Twelve dentists with different professional status and clinical experience repeated the evaluation in two blinded cycles. Two experienced dentists provided a reference diagnosis after analysing all images independently. Statistical analysis included the calculation of simple (κ) and weighted Kappa (wκ) values as a measure of reliability. Logistic regression with a backward elimination model was used to investigate the influence of the diagnostic method, evaluation cycle, type of tooth, and clinical experience on reliability. Altogether, inter- and intraexaminer reliability exhibited good to excellent κ and wκ values for DBW radiography (Inter: κ = 0.60/0.63; wκ = 0.74/0.76; Intra: κ = 0.64; wκ = 0.77) and NIRT (Inter: κ = 0.74/0.64; wκ = 0.86/0.82; Intra: κ = 0.68; wκ = 0.84). The backward elimination model revealed NIRT to be significantly more reliable than DBW radiography. This study revealed a good to excellent inter- and intraexaminer reliability for proximal caries detection using DBW and NIRT images. The logistic regression analysis revealed significantly better reliability for NIRT. Additionally, the first evaluation cycle was more reliable according to the reference diagnoses.
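The difference between simple and weighted kappa on an ordinal scale (for example healthy, enamel caries, dentin caries) can be sketched as follows. The abstract does not say which weighting scheme was used, so linear weights are an assumption here, and the function name and ratings are hypothetical.

```python
def kappa(ratings_1, ratings_2, n_categories, linear_weights=False):
    """Cohen's kappa for two raters; categories are coded 0..n_categories-1."""
    n = len(ratings_1)
    k = n_categories
    # Joint proportion table of the paired ratings.
    table = [[0.0] * k for _ in range(k)]
    for i, j in zip(ratings_1, ratings_2):
        table[i][j] += 1.0 / n
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]

    p_obs = p_exp = 0.0
    for i in range(k):
        for j in range(k):
            if linear_weights:
                w = 1.0 - abs(i - j) / (k - 1)  # partial credit for near misses
            else:
                w = 1.0 if i == j else 0.0      # exact agreement only
            p_obs += w * table[i][j]
            p_exp += w * row[i] * col[j]
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical scores: 0 = healthy, 1 = enamel caries, 2 = dentin caries.
examiner_1 = [0, 0, 1, 1, 2, 2]
examiner_2 = [0, 1, 1, 1, 2, 1]
simple = kappa(examiner_1, examiner_2, 3)
weighted = kappa(examiner_1, examiner_2, 3, linear_weights=True)
print(f"simple kappa={simple:.3f}, weighted kappa={weighted:.3f}")
```

Because every disagreement in this toy data set is only one category apart, the weighted value exceeds the simple one, the same direction as the κ and wκ pairs reported above.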
Izumi, Betty T; Findholt, Nancy E; Pickus, Hayley A; Nguyen, Thuan; Cuneo, Monica K
2014-06-01
Food stores have gained attention as potential intervention targets for improving children's eating habits. There is a need for valid and reliable instruments to evaluate changes in food store snack and beverage availability secondary to intervention. The aim of this study was to develop a valid, reliable, and resource-efficient instrument to evaluate the healthfulness of food store environments faced by children. The SNACZ food store checklist was developed to assess availability of healthier alternatives to the energy-dense snacks and beverages commonly consumed by children. After pretesting, two trained observers independently assessed the availability of 48 snack and beverage items in 50 food stores located near elementary and middle schools in Portland, Oregon, over a 2-week period in summer 2012. Inter-rater reliability was calculated using the kappa statistic. Overall, the instrument had mostly high inter-rater reliability. Seventy-three percent of items assessed had almost perfect or substantial reliability. Two items had moderate reliability (0.41-0.60), and no items had a reliability score less than 0.41. Eleven items occurred too infrequently to generate a kappa score. The SNACZ food store checklist is a first-step toward developing a valid and reliable tool to evaluate the healthfulness of food store environments faced by children. The tool can be used to compare availability of healthier snack and beverage alternatives across communities and measure change secondary to intervention. As a wider variety of healthier snack and beverage alternatives become available in food stores, the checklist should be updated.
Beck, Stefanie; Ruhnke, Bjarne; Issleib, Malte; Daubmann, Anne; Harendza, Sigrid; Zöllner, Christian
2016-10-07
Training of lay-rescuers is essential to improve survival-rates after cardiac arrest. Multiple campaigns emphasise the importance of basic life support (BLS) training for school children. Trainings require a valid assessment to give feedback to school children and to compare the outcomes of different training formats. Considering these requirements, we developed an assessment of BLS skills using MiniAnne and tested the inter-rater reliability between professionals, medical students and trained school children as assessors. Fifteen professional assessors, 10 medical students and 111 trained school children (peers) assessed 1087 school children at the end of a CPR-training event using the new assessment format. Analyses of inter-rater reliability (intraclass correlation coefficient; ICC) were performed. Overall inter-rater reliability of the summative assessment was high (ICC = 0.84, 95% CI: 0.84 to 0.86, n = 889). The number of comparisons between peer-peer assessors (n = 303), peer-professional assessors (n = 339), and peer-student assessors (n = 191) was adequate to demonstrate high inter-rater reliability between peer- and professional-assessors (ICC: 0.76), peer- and student-assessors (ICC: 0.88) and peer- and other peer-assessors (ICC: 0.91). Systematic variation in rating of specific items was observed for three items between professional- and peer-assessors. Using this assessment and integrating peers and medical students as assessors gives the opportunity to assess hands-on skills of school children with high reliability.
Troester, Jordan C.; Jasmin, Jason G.; Duffield, Rob
2018-01-01
The present study examined the inter-trial (within test) and inter-test (between test) reliability of single-leg balance and single-leg landing measures performed on a force plate in professional rugby union players using commercially available software (SpartaMARS, Menlo Park, USA). Twenty-four players undertook test – re-test measures on two occasions (7 days apart) on the first training day of two respective pre-season weeks following 48h rest and similar weekly training loads. Two 20s single-leg balance trials were performed on a force plate with eyes closed. Three single-leg landing trials were performed by jumping off two feet and landing on one foot in the middle of a force plate 1m from the starting position. Single-leg balance results demonstrated acceptable inter-trial reliability (ICC = 0.60-0.81, CV = 11-13%) for sway velocity, anterior-posterior sway velocity, and mediolateral sway velocity variables. Acceptable inter-test reliability (ICC = 0.61-0.89, CV = 7-13%) was evident for all variables except mediolateral sway velocity on the dominant leg (ICC = 0.41, CV = 15%). Single-leg landing results only demonstrated acceptable inter-trial reliability for force based measures of relative peak landing force and impulse (ICC = 0.54-0.72, CV = 9-15%). Inter-test results indicate improved reliability through the averaging of three trials with force based measures again demonstrating acceptable reliability (ICC = 0.58-0.71, CV = 7-14%). Of the variables investigated here, total sway velocity and relative landing impulse are the most reliable measures of single-leg balance and landing performance, respectively. These measures should be considered for monitoring potential changes in postural control in professional rugby union. Key points Single-leg balance demonstrated acceptable inter-trial and inter-test reliability. 
Single-leg landing demonstrated good inter-trial and inter-test reliability for measures of relative peak landing force and relative impulse, but not time to stabilization. Of the variables investigated, sway velocity and relative landing impulse are the most reliable measures of single-leg balance and landing respectively, and should be considered for monitoring changes in postural control. PMID:29769817
The reliability of knee joint position testing using electrogoniometry
Piriyaprasarth, Pagamas; Morris, Meg E; Winter, Adele; Bialocerkowski, Andrea E
2008-01-01
Background: The current investigation examined the inter- and intra-tester reliability of knee joint angle measurements using a flexible Penny and Giles Biometric® electrogoniometer. The clinical utility of electrogoniometry was also addressed. Methods: The first study examined the inter- and intra-tester reliability of measurements of knee joint angles in supine, sitting and standing in 35 healthy adults. The second study evaluated inter-tester and intra-tester reliability of knee joint angle measurements in standing and after walking 10 metres in 20 healthy adults, using an enhanced measurement protocol with a more detailed electrogoniometer attachment procedure. Both inter-tester reliability studies involved two testers. Results: In the first study, inter-tester reliability (ICC[2,10]) ranged from 0.58–0.71 in supine, 0.68–0.79 in sitting and 0.57–0.80 in standing. The standard error of measurement between testers was less than 3.55° and the limits of agreement ranged from -12.51° to 12.21°. Reliability coefficients for intra-tester reliability (ICC[3,10]) ranged from 0.75–0.76 in supine, 0.86–0.87 in sitting and 0.87–0.88 in standing. The standard error of measurement for repeated measures by the same tester was less than 1.7° and the limits of agreement ranged from -8.13° to 7.90°. The second study showed that using a more detailed electrogoniometer attachment protocol reduced the error of measurement between testers to 0.5°. Conclusion: Using a standardised protocol, reliable measures of knee joint angles can be gained in standing, supine and sitting by using a flexible goniometer. PMID:18211714
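The ICC forms quoted above (model 2 for inter-tester, model 3 for intra-tester, each as an average of several readings) come from a two-way analysis of variance. The sketch below uses the standard Shrout and Fleiss formulas, which are an assumption about how the study computed its values; the function name and the small ratings matrix are illustrative only.

```python
def icc_two_way(ratings):
    """Two-way ICCs for a table with one row per subject, one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    return {
        # Raters treated as a random sample; rater bias counts as error.
        "ICC(2,1)": (ms_rows - ms_err)
        / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n),
        "ICC(2,k)": (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n),
        # Raters treated as fixed; rater bias is ignored.
        "ICC(3,1)": (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err),
        "ICC(3,k)": (ms_rows - ms_err) / ms_rows,
    }


# Illustrative scores: six subjects (rows) rated by four raters (columns).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
iccs = icc_two_way(scores)
for name, value in iccs.items():
    print(f"{name}: {value:.2f}")
```

Model 3 values are higher than model 2 values here because the raters differ systematically in level, and only model 2 penalises that difference.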
Inter-rater Reliability of Real-Time Ultrasound to Measure Acromiohumeral Distance.
Mackenzie, Tanya Anne; Bdaiwi, Alya H; Herrington, Lee; Cools, Ann
2016-07-01
Real-time ultrasound (RTUS) has been suggested as a reliable measure of acromiohumeral distance. However, to date, no rigorous assessment and reporting of inter-rater reliability of this method has been performed with the shoulder in a neutral position or with active and passive arm abduction. To assess intrasession inter-rater reliability of using RTUS to measure acromiohumeral distance with the shoulder in a neutral position and with 60° active and passive abduction. Inter-rater intrasession reliability of repeated measures. Human performance laboratory. Twenty persons (12 male and 8 female) with an average age of 29.86 years (standard deviation, 7.8). In an inter-rater, intrasession study, RTUS was used to measure the acromiohumeral distance with the shoulder in a neutral position and with 60° of both active and passive abduction. Acromiohumeral distance. Intraclass correlation coefficient (ICC(2,1)) scores ranged between 0.65 and 0.88 (standard error of the mean = 0.81-1.2 mm and minimal detectable differences with 95% confidence = 2.2-2.3 mm) for inter-rater intrasession reliability. RTUS was found to have fair to good inter-rater reliability as a tool to measure acromiohumeral distance with the shoulder in a neutral position and with 60° of both active and passive arm abduction. Copyright © 2016 American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
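Error estimates of the kind reported alongside the ICC are commonly derived from it. The sketch below assumes the usual formulas for the standard error of measurement (SEM = SD x sqrt(1 - ICC)) and the minimal detectable change at 95% confidence (MDC95 = 1.96 x sqrt(2) x SEM); whether the study used exactly these is not stated in the abstract, and the standard deviation below is a hypothetical value.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Smallest change exceeding measurement error with 95% confidence."""
    return 1.96 * math.sqrt(2.0) * sem


pooled_sd_mm = 2.0  # hypothetical between-subject SD of the distance, in mm
icc = 0.88          # upper end of the ICC range quoted in the abstract
sem = sem_from_icc(pooled_sd_mm, icc)
mdc = mdc95(sem)
print(f"SEM={sem:.2f} mm, MDC95={mdc:.2f} mm")
```

The factor sqrt(2) appears because a change score involves two measurements, each carrying its own error.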
Bervoets, Liene; Van Noten, Caroline; Van Roosbroeck, Sofie; Hansen, Dominique; Van Hoorenbeeck, Kim; Verheyen, Els; Van Hal, Guido; Vankerckhoven, Vanessa
2014-01-01
This study was designed to validate the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A). After adjustment of the original Canadian PAQ-C and PAQ-A (i.e. translation/back-translation and evaluation by expert committee), content validity of both PAQs was assessed and calculated using item-level (I-CVI) and scale-level (S-CVI) content validity indexes. Inter-item and inter-rater reliability of 196 PAQ-C and 95 PAQ-A filled in by both children or adolescents and their parent, were evaluated. Inter-item reliability was calculated by Cronbach's alpha (α) and inter-rater reliability was examined by percent observed agreement and weighted kappa (κ). Concurrent validity of PAQ-A was examined in a subsample of 28 obese and 16 normal-weight children by comparing it with concurrently measured physical activity using a maximal cardiopulmonary exercise test for the assessment of peak oxygen uptake (VO2 peak). For both PAQs, I-CVI ranged 0.67-1.00. S-CVI was 0.89 for PAQ-C and 0.90 for PAQ-A. A total of 192 PAQ-C and 94 PAQ-A were fully completed by both child and parent. Cronbach's α was 0.777 for PAQ-C and 0.758 for PAQ-A. Percent agreement ranged 59.9-74.0% for PAQ-C and 51.1-77.7% for PAQ-A, and weighted κ ranged 0.48-0.69 for PAQ-C and 0.51-0.68 for PAQ-A. The correlation between total PAQ-A score and VO2 peak - corrected for age, gender, height and weight - was 0.516 (p = 0.001). Both PAQs have an excellent content validity, an acceptable inter-item reliability and a moderate to good strength of inter-rater agreement. In addition, total PAQ-A score showed a moderate positive correlation with VO2 peak. Both PAQs have an acceptable to good reliability and validity, however, further validity testing is recommended to provide a more complete assessment of both PAQs.
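Inter-item reliability via Cronbach's alpha, as used for the PAQ-C and PAQ-A, can be sketched with the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The function name and questionnaire responses below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])  # variance of the sum scores
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical answers of four respondents to a three-item scale.
responses = [
    [3.0, 4.0, 3.0],
    [2.0, 2.0, 3.0],
    [4.0, 5.0, 5.0],
    [1.0, 2.0, 1.0],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha={alpha:.3f}")
```

Alpha rises when items vary together across respondents, which is why it is read as internal consistency rather than agreement between raters.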
Inter-observer reliability of DSM-5 substance use disorders.
Denis, Cécile M; Gelernter, Joel; Hart, Amy B; Kranzler, Henry R
2015-08-01
Although studies have examined the impact of changes made in DSM-5 on the estimated prevalence of substance use disorder (SUD) diagnoses, there is limited evidence concerning the reliability of DSM-5 SUDs. We evaluated the inter-observer reliability of four DSM-5 SUDs in a sample in which we had previously evaluated the reliability of DSM-IV diagnoses, allowing us to compare the two systems. Two different interviewers each assessed 173 subjects over a 2-week period using the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA). Using the percent agreement and kappa (κ) coefficient, we examined the reliability of DSM-5 lifetime alcohol, opioid, cocaine, and cannabis use disorders, which we compared to that of SSADDA-derived DSM-IV SUD diagnoses. We also assessed the effect of additional lifetime SUD and lifetime mood or anxiety disorder diagnoses on the reliability of the DSM-5 SUD diagnoses. Reliability was good to excellent for the four disorders, with κ values ranging from 0.65 to 0.94. Agreement was consistently lower for SUDs of mild severity than for moderate or severe disorders. DSM-5 SUD diagnoses showed greater reliability than DSM-IV diagnoses of abuse or dependence or dependence only. Co-occurring SUD and lifetime mood or anxiety disorders exerted a modest effect on the reliability of the DSM-5 SUD diagnoses. For alcohol, opioid, cocaine and cannabis use disorders, DSM-5 criteria and diagnoses are at least as reliable as those of DSM-IV. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Savage, Trevor Nicholas; McIntosh, Andrew Stuart
2017-03-01
It is important to understand factors contributing to and directly causing sports injuries to improve the effectiveness and safety of sports skills. The characteristics of injury events must be evaluated and described meaningfully and reliably. However, many complex skills cannot be effectively investigated quantitatively because of ethical, technological and validity considerations. Increasingly, qualitative methods are being used to investigate human movement for research purposes, but there are concerns about reliability and measurement bias of such methods. Using the tackle in Rugby union as an example, we outline a systematic approach for developing a skill analysis protocol with a focus on improving objectivity, validity and reliability. Characteristics for analysis were selected using qualitative analysis and biomechanical theoretical models and epidemiological and coaching literature. An expert panel comprising subject matter experts provided feedback and the inter-rater reliability of the protocol was assessed using ten trained raters. The inter-rater reliability results were reviewed by the expert panel and the protocol was revised and assessed in a second inter-rater reliability study. Mean agreement in the second study improved and was comparable (52-90% agreement and ICC between 0.6 and 0.9) with other studies that have reported inter-rater reliability of qualitative analysis of human movement.
Reliability of tristimulus colourimetry in the assessment of cutaneous bruise colour.
Scafide, Katherine N; Sheridan, Daniel J; Taylor, Laura A; Hayat, Matthew J
2016-06-01
Bruising is one of the most common types of injury clinicians observe among victims of violence and other trauma patients. However, research has shown commonly used qualitative description of cutaneous bruise colour via the naked eye is subjective and unreliable. No published work has formally evaluated the reliability of tristimulus colourimetry as an alternative for assessing bruise colour, despite its clinical and research applications in accurately assessing skin colour. The purpose of this study was to systematically evaluate the test-retest and inter-observer reliability of tristimulus colourimetry in the assessment of cutaneous bruise colour. Two researchers obtained repeated tristimulus colourimetry measures of cutaneous bruises with participants of diverse skin colour. Measures were obtained using the Minolta CR-400 Chroma Meter. Commission Internationale d'Eclairage (CIE) L*a*b* colour space was used. Data were analysed using intraclass correlation coefficients (ICC), Cronbach's alpha, and minimal detectable change (MDC) on all three L*a*b* values. The colorimeter demonstrated excellent test-retest or intra-rater reliability (L* ICC=0.999; a* ICC=0.973; b* ICC=0.892) and inter-rater reliability (L* ICC=0.997; a* ICC=0.976; b* ICC=0.982). With consistent placement, tristimulus colourimetry is reliable for the objective assessment and documentation of cutaneous bruise colour for purposes of clinical practice and research. Recommendations for use in practice/research are provided. Copyright © 2016 Elsevier Ltd. All rights reserved.
Peterson, Eleanor B; Calhoun, Aaron W; Rider, Elizabeth A
2014-09-01
With increased recognition of the importance of sound communication skills and communication skills education, reliable assessment tools are essential. This study reports on the psychometric properties of an assessment tool based on the Kalamazoo Consensus Statement Essential Elements Communication Checklist. The Gap-Kalamazoo Communication Skills Assessment Form (GKCSAF), a modified version of an existing communication skills assessment tool, the Kalamazoo Essential Elements Communication Checklist-Adapted, was used to assess learners in a multidisciplinary, simulation-based communication skills educational program using multiple raters. 118 simulated conversations were available for analysis. Internal consistency and inter-rater reliability were determined by calculating a Cronbach's alpha score and intra-class correlation coefficients (ICC), respectively. The GKCSAF demonstrated high internal consistency with a Cronbach's alpha score of 0.844 (faculty raters) and 0.880 (peer observer raters), and high inter-rater reliability with an ICC of 0.830 (faculty raters) and 0.89 (peer observer raters). The Gap-Kalamazoo Communication Skills Assessment Form is a reliable method of assessing the communication skills of multidisciplinary learners using multi-rater methods within the learning environment. The Gap-Kalamazoo Communication Skills Assessment Form can be used by educational programs that wish to implement a reliable assessment and feedback system for a variety of learners. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Bożek, Agnieszka; Reich, Adam
2017-08-01
A wide variety of psoriasis assessment tools have been proposed to evaluate the severity of psoriasis in clinical trials and daily practice. The most frequently used clinical instrument is the psoriasis area and severity index (PASI); however, none of the currently published severity scores used for psoriasis meets all the validation criteria required for an ideal score. The aim of this study was to compare and assess the reliability of 3 commonly used assessment instruments for psoriasis severity: the psoriasis area and severity index (PASI), body surface area (BSA) and physician global assessment (PGA). On the scoring day, 10 trained dermatologists evaluated 9 adult patients with plaque-type psoriasis using the PASI, BSA and PGA. All the subjects were assessed twice by each physician. Correlations between the assessments were analyzed using the Pearson correlation coefficient. Intra-class correlation coefficient (ICC) was calculated to analyze intra-rater reliability, and the coefficient of variation (CV) was used to assess inter-rater variability. Significant correlations were observed among the 3 scales in both assessments. In all 3 scales the ICCs were > 0.75, indicating high intra-rater reliability. The highest ICC was for the BSA (0.96) and the lowest one for the PGA (0.87). The CV for the PGA and PASI were 29.3 and 36.9, respectively, indicating moderate inter-rater variability. The CV for the BSA was 57.1, indicating high inter-rater variability. Comparing the PASI, PGA and BSA, it was shown that the PGA had the highest inter-rater reliability, whereas the BSA had the highest intra-rater reliability. The PASI showed intermediate values in terms of inter- and intra-rater reliability. None of the 3 assessment instruments showed a significant advantage over the others. A reliable assessment of psoriasis severity requires the use of several independent evaluations simultaneously.
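The coefficient of variation used here for inter-rater variability is the standard deviation expressed as a percentage of the mean. A minimal sketch, assuming it is computed per patient across raters (the abstract does not say how values were pooled) and using hypothetical scores:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical severity scores given to one patient by five raters.
scores = [10.0, 12.0, 8.0, 15.0, 9.0]
cv = coefficient_of_variation(scores)
print(f"CV={cv:.1f}%")
```

Because the CV is scaled by the mean, it allows instruments with different score ranges to be compared, although it becomes unstable when mean scores are close to zero.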
Jones, Terry L; Schlegel, Cara
2014-02-01
Accurate, precise, unbiased, reliable, and cost-effective estimates of nursing time use are needed to ensure safe staffing levels. Direct observation of nurses is costly, and conventional surrogate measures have limitations. To test the potential of electronic capture of time and motion through real time location systems (RTLS), a pilot study was conducted to assess efficacy (method agreement) of RTLS time use; inter-rater reliability of RTLS time-use estimates; and associated costs. Method agreement was high (mean absolute difference = 28 seconds); inter-rater reliability was high (ICC = 0.81-0.95; mean absolute difference = 2 seconds); and costs for obtaining RTLS time-use estimates on a single nursing unit exceeded $25,000. Continued experimentation with RTLS to obtain time-use estimates for nursing staff is warranted. © 2013 Wiley Periodicals, Inc.
Measuring the Pain Area: An Intra- and Inter-Rater Reliability Study Using Image Analysis Software.
Dos Reis, Felipe Jose Jandre; de Barros E Silva, Veronica; de Lucena, Raphaela Nunes; Mendes Cardoso, Bruno Alexandre; Nogueira, Leandro Calazans
2016-01-01
Pain drawings have frequently been used for clinical information and research. The aim of this study was to investigate intra- and inter-rater reliability of area measurements performed on pain drawings. Our secondary objective was to verify the reliability when using computers with different screen sizes, both with and without mouse hardware. Pain drawings were completed by patients with chronic neck pain or neck-shoulder-arm pain. Four independent examiners participated in the study. Examiners A and B used the same computer with a 16-inch screen and wired mouse hardware. Examiner C used a notebook with a 16-inch screen and no mouse hardware, and Examiner D used a computer with an 11.6-inch screen and a wireless mouse. Image measurements were obtained using GIMP and NIH ImageJ computer programs. The length of all the images was measured using GIMP software to set the scale in ImageJ. Each marked area was then encircled and the total surface area (cm²) was calculated for each pain drawing measurement. A total of 117 areas were identified and 52 pain drawings were analyzed. The intra-rater reliability for all examiners was high (ICC = 0.989). The inter-rater reliability was also high. No significant differences were observed when using different screen sizes or when using or not using the mouse hardware. This suggests that the precision of these measurements is acceptable for the use of this method as a measurement tool in clinical practice and research. © 2014 World Institute of Pain.
Ayala, Francisco; De Ste Croix, Mark; Sainz de Baranda, Pilar; Santonja, Fernando
2014-04-01
The purposes were twofold: (a) to ascertain the inter-session reliability of hamstrings total reaction time, pre-motor time and motor time; and (b) to examine sex-related differences in the hamstrings reaction times profile. Twenty-four men and 24 women completed the study. Biceps femoris and semitendinosus total reaction time, pre-motor time and motor time measured during eccentric isokinetic contractions were recorded on three different occasions. Inter-session reliability was examined through typical percentage error (CVTE), percentage change in the mean (CM) and intraclass correlations (ICC). For both biceps femoris and semitendinosus, total reaction time, pre-motor time and motor time measures demonstrated moderate inter-session reliability (CVTE<10%; CM<3%; ICC>0.7). The results also indicated that, although not statistically significant, women reported consistently longer hamstrings total reaction time (23.5ms), pre-motor time (12.7ms) and motor time (7.5ms) values than men. Therefore, an observed change larger than 5%, 9% and 8% for total reaction time, pre-motor time and motor time respectively from baseline scores after performing a training program would indicate that a real change was likely. Furthermore, while not statistically significant, sex differences were noted in the hamstrings reaction time profile which may play a role in the greater incidence of ACL injuries in women. Copyright © 2013 Elsevier Ltd. All rights reserved.
A Reliable, Feasible Method to Observe Neighborhoods at High Spatial Resolution
Kepper, Maura M.; Sothern, Melinda S.; Theall, Katherine P.; Griffiths, Lauren A.; Scribner, Richard; Tseng, Tung-Sung; Schaettle, Paul; Cwik, Jessica M.; Felker-Kantor, Erica; Broyles, Stephanie T.
2016-01-01
Introduction Systematic social observation (SSO) methods traditionally measure neighborhoods at street level and have been performed reliably using virtual applications to increase feasibility. Research indicates that collection at even higher spatial resolution may better elucidate the health impact of neighborhood factors, but whether virtual applications can reliably capture social determinants of health at the smallest geographic resolution (parcel level) remains uncertain. This paper presents a novel, parcel-level SSO methodology and assesses whether this new method can be collected reliably using Google Street View and is feasible. Methods Multiple raters (N=5) observed 42 neighborhoods. In 2016, inter-rater reliability (observed agreement and kappa coefficient) was compared for four SSO methods: (1) street-level in person; (2) street-level virtual; (3) parcel-level in person; and (4) parcel-level virtual. Intra-rater reliability (observed agreement and kappa coefficient) was calculated to determine whether parcel-level methods produce results comparable to traditional street-level observation. Results Substantial levels of inter-rater agreement were documented across all four methods; all methods had >70% of items with at least substantial agreement. Only physical decay showed higher levels of agreement (83% of items with >75% agreement) for direct versus virtual rating source. Intra-rater agreement comparing street- versus parcel-level methods resulted in observed agreement >75% for all but one item (90%). Conclusions Results support the use of Google Street View as a reliable, feasible tool for performing SSO at the smallest geographic resolution. Validation of a new parcel-level method collected virtually may improve the assessment of social determinants contributing to disparities in health behaviors and outcomes. PMID:27989289
Cann, A P; Connolly, M; Ruuska, R; MacNeil, M; Birmingham, T B; Vandervoort, A A; Callaghan, J P
2008-04-01
Despite the ongoing health problem of repetitive strain injuries, there are few tools currently available for ergonomic applications evaluating cumulative loading that have well-documented evidence of reliability and validity. The purpose of this study was to determine the inter-rater reliability of a posture matching based analysis tool (3DMatch, University of Waterloo) for predicting cumulative and peak spinal loads. A total of 30 food service workers were each videotaped for a 1-h period while performing typical work activities and a single work task was randomly selected from each for analysis by two raters. Inter-rater reliability was determined using intraclass correlation coefficients (ICC) model 2,1 and standard errors of measurement for cumulative and peak spinal and shoulder loading variables across all subjects. Overall, 85.5% of variables had moderate to excellent inter-rater reliability, with ICCs ranging from 0.30-0.99 for all cumulative and peak loading variables. 3DMatch was found to be a reliable ergonomic tool when more than one rater is involved.
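The ICC model 2,1 named in this abstract (two-way random effects, absolute agreement, single rater) can be computed from the mean squares of a subjects-by-raters table. The following is a minimal sketch, not the authors' analysis code; the function name `icc_2_1` and the example loads are hypothetical and chosen only for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows: one row per subject, one column per rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-subjects mean square
    msc = ss_cols / (k - 1)                 # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical cumulative-load scores for five workers, each scored by two raters.
example_loads = [[10.0, 11.0], [14.0, 13.5], [9.0, 9.5], [20.0, 18.0], [16.0, 17.0]]
print(round(icc_2_1(example_loads), 3))
```

Because ICC(2,1) treats raters as a random sample and penalises systematic offsets between them, it is usually lower than a consistency-type ICC computed on the same table.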
Terashima, Taiko; Yoshimura, Sadako
2018-03-01
To determine whether nurses can accurately assess the skin colour of replanted fingers displayed as digital images on a computer screen. Colour measurement and clinical diagnostic methods for medical digital images have been studied, but reproducing skin colour on a computer screen remains difficult. The inter-rater reliability of skin colour assessment scores was evaluated. In May 2014, 21 nurses who worked on a trauma ward in Japan participated in testing. Six digital images with different skin colours were used. Colours were scored from both digital images and direct observation of the patient. The score from a digital image was defined as the test score, and its difference from the direct assessment score as the difference score. Intraclass correlation coefficients were calculated. Nurses' opinions were classified and summarised. The intraclass correlation coefficients for the test scores were fair. Although the intraclass correlation coefficients for the difference scores were poor, they improved to good when three images that might have contributed to poor reliability were excluded. Most nurses stated that it is difficult to assess skin colour in digital images; they did not think it could be a substitute for direct visual assessment. However, most nurses were in favour of including images in nursing progress notes. Although the inter-rater reliability was fairly high, the reliability of colour reproduction in digital images as indicated by the difference scores was poor. Nevertheless, nurses expect the incorporation of digital images in nursing progress notes to be useful. This gap between the reliability of digital colour reproduction and nurses' expectations towards it must be addressed. High inter-rater reliability for digital images in nursing progress notes was not observed. Assessments of future improvements in colour reproduction technologies are required. Further digitisation and visualisation of nursing records might pose challenges. © 2017 John Wiley & Sons Ltd.
Inter-arch digital model vs. manual cast measurements: Accuracy and reliability.
Kiviahde, Heikki; Bukovac, Lea; Jussila, Päivi; Pesonen, Paula; Sipilä, Kirsi; Raustia, Aune; Pirttiniemi, Pertti
2017-06-28
The purpose of this study was to evaluate the accuracy and reliability of inter-arch measurements using digital dental models and conventional dental casts. Thirty sets of dental casts with permanent dentition were examined. Manual measurements were done with a digital caliper directly on the dental casts, and digital measurements were made on 3D models by two independent examiners. Intra-class correlation coefficients (ICC), a paired sample t-test or Wilcoxon signed-rank test, and Bland-Altman plots were used to evaluate intra- and inter-examiner error and to determine the accuracy and reliability of the measurements. The ICC values were generally good for manual and excellent for digital measurements. The Bland-Altman plots of all the measurements showed good agreement between the manual and digital methods and excellent inter-examiner agreement using the digital method. Inter-arch occlusal measurements on digital models are accurate and reliable and are superior to manual measurements.
Fuller, Catherine J; Bladon, Bruce M; Driver, Adam J; Barr, Alistair R S
2006-03-01
The objective of this study was to assess the reliability of lameness scoring in horses. One veterinary surgeon examined nineteen lame horses on four occasions. Gait was recorded by camcorder, and scored from 0 to 10 ranging from sound to non-weight bearing lameness. A global score of overall change in lameness during the study was also determined for each horse. To measure intra-assessor reliability of the scoring systems, one veterinary surgeon scored videotapes of the horses' gaits on two occasions. To measure inter-assessor reliability, three veterinary surgeons viewed the videotapes, assigning individual lameness scores plus global scores to each horse. Reliability of individual lameness scoring was good intra-assessor, but only just within our acceptable limit inter-assessor. However, global scoring of change in lameness throughout the study was found to be reliable overall. Since clinician scoring is commonly used to assess lameness in horses, this is an important finding, fundamental to future clinical studies.
Gómez-Cabello, Alba; Vicente-Rodríguez, Germán; Albers, Ulrike; Mata, Esmeralda; Rodriguez-Marroyo, Jose A.; Olivares, Pedro R.; Gusi, Narcis; Villa, Gerardo; Aznar, Susana; Gonzalez-Gross, Marcela; Casajús, Jose A.; Ara, Ignacio
2012-01-01
Background The elderly EXERNET multi-centre study aims to collect normative anthropometric data for older, functionally independent adults living in Spain. Purpose To describe the standardization process and reliability of the anthropometric measurements carried out in the pilot study and during the final workshop, examining both intra- and inter-rater errors for measurements. Materials and Methods A total of 98 elderly from five different regions participated in the intra-rater error assessment, and 10 different seniors living in the city of Toledo (Spain) participated in the inter-rater assessment. We examined both intra- and inter-rater errors for heights and circumferences. Results For height, intra-rater technical errors of measurement (TEMs) were smaller than 0.25 cm. For circumferences and knee height, TEMs were smaller than 1 cm, except for waist circumference in the city of Cáceres. Reliability for heights and circumferences was greater than 98% in all cases. Inter-rater TEMs were 0.61 cm for height, 0.75 cm for knee-height and ranged between 2.70 and 3.09 cm for the circumferences measured. Inter-rater reliabilities for anthropometric measurements were always higher than 90%. Conclusion The harmonization process, including the workshop and pilot study, guarantees the quality of the anthropometric measurements in the elderly EXERNET multi-centre study. High reliability and low TEM may be expected when assessing anthropometry in an elderly population. PMID:22860013
An, Hyeong Su; Moon, Won-Jin; Ryu, Jae-Kyun; Park, Ju Yeon; Yun, Won Sung; Choi, Jin Woo; Jahng, Geon-Ho; Park, Jang-Yeon
2017-12-01
This prospective multi-center study aimed to evaluate the inter-vendor and test-retest reliabilities of resting-state functional magnetic resonance imaging (RS-fMRI) by assessing the temporal signal-to-noise ratio (tSNR) and functional connectivity. The study included 10 healthy subjects, each scanned using three 3T MR scanners (GE Signa HDxt, Siemens Skyra, and Philips Achieva) in two sessions. The tSNR was calculated from the time course data. Inter-vendor and test-retest reliabilities were assessed with intra-class correlation coefficients (ICCs) derived from variance component analysis. Independent component analysis was performed to identify the connectivity of the default-mode network (DMN). The tSNR for the DMN was not significantly different among the GE, Philips, and Siemens scanners (P=0.638). In terms of vendor differences, the inter-vendor reliability was good (ICC=0.774). Regarding the test-retest reliability, the GE scanner showed excellent correlation (ICC=0.961), while the Philips (ICC=0.671) and Siemens (ICC=0.726) scanners showed relatively good correlation. The DMN showed identical patterns of functional connectivity between the two sessions for each scanner and across the three scanners. The inter-vendor and test-retest reliabilities of RS-fMRI using different 3T MR scanners are good. Thus, we suggest that RS-fMRI could be used in multicenter imaging studies as a reliable imaging marker. Copyright © 2017 Elsevier Inc. All rights reserved.
Hong, Jae Young; Modi, Hitesh N.; Hur, Chang Yong; Song, Hae Ryong; Park, Jong Hoon
2010-01-01
Several methods are used to measure lumbar lordosis. In adult scoliosis patients, the measurement is difficult due to degenerative changes in the vertebral endplate as well as the coronal and sagittal deformity. We conducted an observational study with three examiners to determine the reliability of six methods for measuring the global lumbar lordosis in adult scoliosis patients. Ninety lateral lumbar radiographs were collected for the study. The radiographs were divided into normal (Cobb < 10°), low-grade (Cobb 10°–19°), and high-grade (Cobb ≥ 20°) groups to determine the reliability of the Cobb L1–S1, Cobb L1–L5, centroid, posterior tangent L1–S1, posterior tangent L1–L5 and TRALL methods in adult scoliosis. The 90 lateral radiographs were measured twice by each of the three examiners using the six measurement methods. The data were analyzed to determine the inter- and intra-observer reliability. In general, for the six radiographic methods, the inter- and intra-class correlation coefficients (ICCs) were all ≥0.82. A comparison of the ICCs and 95% CI for the inter- and intra-observer reliability between the groups with varying degrees of scoliosis showed that the reliability of the lordosis measurement decreased with increasing severity of scoliosis. In the Cobb L1–S1, centroid and posterior tangent L1–S1 methods, the ICCs were relatively lower in the high-grade scoliosis group (≥0.60). The mean absolute difference (MAD) in these methods was high in the high-grade scoliosis group (≤7.17°). However, in the Cobb L1–L5 and posterior tangent L1–L5 methods, the ICCs were ≥0.86 in all groups, and in the TRALL method, the ICCs were ≥0.76 in all groups. In addition, in the Cobb L1–L5 and posterior tangent L1–L5 methods, the MAD was ≤3.63°, and in the TRALL method, the MAD was ≤3.84° in all groups. We concluded that the Cobb L1–L5 and the posterior tangent L1–L5 methods are reliable methods for measuring the global lumbar lordosis in adult scoliosis. The TRALL method is more reliable than the other methods that include the L5–S1 joint in the lordosis measurement. PMID:20437183
Impact of clinical history on chest radiograph interpretation.
Test, Matthew; Shah, Samir S; Monuteaux, Michael; Ambroggio, Lilliam; Lee, Edward Y; Markowitz, Richard I; Bixby, Sarah; Diperna, Stephanie; Servaes, Sabah; Hellinger, Jeffrey C; Neuman, Mark I
2013-07-01
The inclusion of clinical information may have unrecognized influence in the interpretation of diagnostic testing. The objective of the study was to determine the impact of clinical history on chest radiograph interpretation in the diagnosis of pneumonia. Prospective case-based study. Radiologists interpreted 110 radiographs of children evaluated for suspicion of pneumonia. Clinical information was withheld during the first interpretation. After 6 months the radiographs were reviewed with clinical information. Radiologists reported on pneumonia indicators described by the World Health Organization (ie, any infiltrate, alveolar infiltrate, interstitial infiltrate, air bronchograms, hilar adenopathy, pleural effusion). Children's Hospital of Philadelphia and Boston Children's Hospital. Six board-certified radiologists. Intra- and inter-rater reliability were assessed using the kappa statistic. The addition of clinical history did not have a substantial impact on the inter-rater reliability in the identification of any infiltrate, alveolar infiltrate, interstitial infiltrate, pleural effusion, or hilar adenopathy. Inter-rater reliability in the identification of air bronchograms improved from fair (k = 0.32) to moderate (k = 0.53). Intra-rater reliability for the identification of alveolar infiltrate remained substantial to almost perfect for all 6 raters with and without clinical information. One rater had a decrease in intra-rater reliability from almost perfect (k = 1.0) to fair (k = 0.21) in the identification of interstitial infiltrate with the addition of clinical history. Alveolar infiltrate and pleural effusion are findings with high intra- and inter-rater reliability in the diagnosis of bacterial pneumonia. The addition of clinical information did not have a substantial impact on the reliability of these findings. © 2012 Society of Hospital Medicine.
Post-traumatic subtalar osteoarthritis: which grading system should we use?
de Muinck Keizer, Robert-Jan O; Backes, Manouk; Dingemans, Siem A; Goslings, J Carel; Schepers, Tim
2016-09-01
To assess and compare post-traumatic osteoarthritis following intra-articular calcaneal fractures, one must have a reliable grading system that consistently grades the post-traumatic changes of the joint. A reliable grading system aids in the communication between treating physicians and improves the interpretation of research. To date, there is no consensus on what grading system to use in the evaluation of post-traumatic subtalar osteoarthritis. The objective of this study was to determine and compare the inter- and intra-rater reliability of two grading systems for post-traumatic subtalar osteoarthritis. Four observers evaluated 50 calcaneal fractures at least one year after trauma on conventional oblique lateral, internally and externally rotated views, and graded post-traumatic subtalar osteoarthritis using the Kellgren and Lawrence Grading Scale (KLGS) and the Paley Grading System (PGS). Inter- and intra-rater reliability were calculated and compared. The inter-rater reliability showed an intra-class correlation (ICC) of 0.54 (95 % CI 0.40-0.67) for the KLGS and an ICC of 0.41 (95 % CI 0.26 - 0.57) for the PGS. This difference was not statistically significant. The intra-rater reliability showed a mean weighted kappa of 0.62 for both the KLGS and the PGS. There is no statistically significant difference in reliability between the Kellgren and Lawrence Grading System (KLGS) and the Paley Grading System (PGS). The PGS allows for an easy two-step approach making it easy for everyday clinical purposes. For research purposes however, the more detailed and widely used KLGS seems preferable.
Engesæter, Ingvild Øvstebø; Laborie, Lene Bjerke; Lehmann, Trude Gundersen; Sera, Francesco; Fevang, Jonas; Pedersen, Douglas; Morcuende, José; Lie, Stein Atle; Engesæter, Lars Birger; Rosendahl, Karen
2012-07-01
To report on intra-observer, inter-observer, and inter-method reliability and agreement for radiological measurements used in the diagnosis of hip dysplasia at skeletal maturity, as obtained by a manual and a digital measurement technique. Pelvic radiographs from 95 participants (56 females) in a follow-up hip study of 18- to 19-year-old patients were included. Eleven radiological measurements relevant for hip dysplasia (Sharp's, Wiberg's, and Ogata's angles; acetabular roof angle of Tönnis; articulo-trochanteric distance; acetabular depth-width ratio; femoral head extrusion index; maximum teardrop width; and the joint space width in three different locations) were validated. Three observers measured the radiographs using both a digital measurement program and manually in AgfaWeb1000. Inter-method and inter- and intra-observer agreement were analyzed using the mean differences between the readings/readers, establishing the 95% limits of agreement. We also calculated the minimum detectable change and the intra-class correlation coefficient. Large variations among different radiological measurements were demonstrated. However, the variation was not related to the use of either the manual or digital measurement technique. For measurements with greater absolute values (Sharp's angle, femoral head extrusion index, and acetabular depth-width ratio) the inter- and intra-observer and inter-method agreements were better as compared to measurements with lower absolute values (acetabular roof angle, teardrop and joint space width). The inter- and intra-observer variation differs notably across different radiological measurements relevant for hip dysplasia at skeletal maturity, a fact that should be taken into account in clinical practice. The agreement between the manual and digital methods is good.
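The 95% limits of agreement mentioned in this abstract follow the usual Bland-Altman construction: the mean of the paired differences plus or minus 1.96 times their standard deviation. The sketch below illustrates that calculation only; the function name and the angle values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)      # mean difference between the two methods
    sd = stdev(diffs)       # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical angle readings (degrees) from a manual and a digital technique.
manual = [38.0, 41.5, 36.0, 44.0, 40.0]
digital = [38.5, 41.0, 36.5, 43.0, 40.5]
bias, lower, upper = limits_of_agreement(manual, digital)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Narrow limits relative to the size of the measurement indicate good agreement, which may help explain why measurements with greater absolute values appear to agree better.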
Inter-Observer and Intra-Observer Reliability of Clinical Assessments in Knee Osteoarthritis
Maricar, Nasimah; Callaghan, Michael J; Parkes, Matthew J; Felson, David T; O’Neill, Terence W
2016-01-01
Background Clinical examination of the knee is subject to measurement error. The aim of this analysis was to determine inter- and intra-observer reliability of commonly used clinical tests in patients with knee osteoarthritis (OA). Methods We studied subjects with symptomatic knee OA who were participants in an open-label clinical trial of intra-articular steroid therapy. Following standardisation of the clinical test procedures, two clinicians assessed 25 subjects independently at the same visit, and the same clinician assessed 88 subjects over an interval period of 2–10 weeks; in both cases prior to the steroid intervention. Clinical examination included assessment of bony enlargement, crepitus, quadriceps wasting, knee effusion, joint-line and anserine tenderness and knee range of movement (ROM). Intra-class correlation coefficients (ICC), estimated kappa (κ), weighted kappa (κω) and Bland and Altman plots were used to determine inter- and intra-observer levels of agreement. Results Using Landis and Koch criteria, inter-observer kappa scores were moderate for patellofemoral joint (κ=0.53) and anserine tenderness (κ=0.48); good for bony enlargement (κ=0.66), quadriceps wasting (κ=0.78), crepitus (κ=0.78), medial tibiofemoral joint tenderness (κ=0.76), and effusion assessed by ballottement (κ=0.73) and bulge sign (κω=0.78); and excellent for lateral tibiofemoral joint tenderness (κ=1.00), flexion (ICC=0.97) and extension (ICC=0.87) ROM. Intra-observer kappa scores were moderate for lateral tibiofemoral joint tenderness (κ=0.60), good for crepitus (κ=0.78), effusion assessed by ballottement test (κ=0.77), patellofemoral joint (κ=0.66), medial tibiofemoral joint (κ=0.64) and anserine (κ=0.73) tenderness and excellent for effusion assessed by bulge sign (κω=0.83), bony enlargement (κ=0.98), quadriceps wasting (κ=0.83), flexion (ICC=0.99) and extension (ICC=0.96) ROM.
Conclusion Among individuals with symptomatic knee OA, the reliability of clinical examination of the knee was at least good for the majority of clinical signs of knee OA. PMID:27909143
Broekstra, Dieuwke C; Lanting, Rosanne; Werker, Paul M N; van den Heuvel, Edwin R
2015-08-01
Dupuytren disease (DD) is a fibrosing disease affecting the palmar aponeurosis, and is mostly treated by surgery based on measurement of severity of flexion contracture of the fingers. Literature concerning the measurement reliability is scarce. This study aimed to determine the intra- and inter-observer agreement of four variables for diagnosing DD, determining severity of contracture, and disease extent. One of them is a new measurement on the area of nodules and cords for measuring the disease extent in early disease stages. An agreement study (n = 54) was performed by two trained investigators. Agreement was calculated per finger, based on an intraclass correlation coefficient (ICC) using a latent variable model on subjects for diagnosis and Tubiana stage. For total passive extension deficit (TPED) and the area of nodules and cords, agreement was calculated with an ICC using a one-way random effects model with subject as random effect. Inter-observer agreement was very good for diagnosing DD (ICC: 95.5%-99.9%) and good to very good for classifying Tubiana stage (ICC: 73.5%-94.9%). Agreements for area and TPED were moderate (middle finger) to very good (ICC: 48.4%-98.6% and 45.0%-99.5%, respectively). Intra-observer agreement was slightly higher on average than inter-observer agreement. Overall, the intra- and inter-observer agreement in diagnosing DD, and determining the severity of flexion contracture is high. Also, the newly introduced variable area of nodules and cords has high intra- and inter-observer agreement, indicating that it is suitable to measure disease extent. Copyright © 2015 Elsevier Ltd. All rights reserved.
Development of the Therapist Empathy Scale.
Decker, Suzanne E; Nich, Charla; Carroll, Kathleen M; Martino, Steve
2014-05-01
Few measures exist to examine therapist empathy as it occurs in session. A 9-item observer rating scale, called the Therapist Empathy Scale (TES), was developed based on Watson's (1999) work to assess affective, cognitive, attitudinal, and attunement aspects of therapist empathy. The aim of this study was to evaluate the inter-rater reliability, internal consistency, and construct and criterion validity of the TES. Raters evaluated therapist empathy in 315 client sessions conducted by 91 therapists, using data from a multi-site therapist training trial (Martino et al., 2010) in Motivational Interviewing (MI). Inter-rater reliability (ICC = .87 to .91) and internal consistency (Cronbach's alpha = .94) were high. Confirmatory factor analyses indicated some support for single-factor fit. Convergent validity was supported by correlations between TES scores and MI fundamental adherence (r range .50 to .67) and competence scores (r range .56 to .69). Discriminant validity was indicated by negative or nonsignificant correlations between TES and MI-inconsistent behavior (r range .05 to -.33). The TES demonstrates excellent inter-rater reliability and internal consistency. Results indicate some support for a single-factor solution and convergent and discriminant validity. Future studies should examine the use of the TES to evaluate therapist empathy in different psychotherapy approaches and to determine the impact of therapist empathy on client outcome.
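Internal consistency of a multi-item scale such as the one described here is commonly summarised with Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. The following sketch assumes complete data and uses a hypothetical function name and made-up item scores; it is an illustration of the formula, not the study's analysis.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    `items` is a list of per-item score lists, all the same length,
    with one score per rated session in each list.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per session
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings on three items across five sessions.
ratings = [
    [4, 5, 3, 6, 5],
    [4, 6, 3, 5, 5],
    [5, 5, 2, 6, 4],
]
print(round(cronbach_alpha(ratings), 3))
```

The function divides by the variance of the total scores, so it fails if every session has the same total; real analyses would also need a rule for missing item scores.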
Reliability of Measurements Performed by Community-Drawn Anthropometrists from Rural Ethiopia
Ayele, Berhan; Aemere, Abaineh; Gebre, Teshome; Tadesse, Zerihun; Stoller, Nicole E.; See, Craig W.; Yu, Sun N.; Gaynor, Bruce D.; McCulloch, Charles E.; Porco, Travis C.; Emerson, Paul M.; Lietman, Thomas M.; Keenan, Jeremy D.
2012-01-01
Background Undernutrition is an important risk factor for childhood mortality, and remains a major problem facing many developing countries. Millennium Development Goal 1 calls for a reduction in underweight children, implemented through a variety of interventions. To adequately judge the impact of these interventions, it is important to know the reproducibility of the main indicators for undernutrition. In this study, we trained individuals from rural communities in Ethiopia in anthropometry techniques and measured intra- and inter-observer reliability. Methods and Findings We trained 6 individuals without prior anthropometry experience to perform weight, height, and middle upper arm circumference (MUAC) measurements. Two anthropometry teams were dispatched to 18 communities in rural Ethiopia and measurements performed on all consenting pre-school children. Anthropometry teams performed a second independent measurement on a convenience sample of children in order to assess intra-anthropometrist reliability. Both teams measured the same children in 2 villages to assess inter-anthropometrist reliability. We calculated several metrics of measurement reproducibility, including the technical error of measurement (TEM) and relative TEM. In total, anthropometry teams performed measurements on 606 pre-school children, 84 of whom had repeat measurements performed by the same team, and 89 of whom had measurements performed by both teams. Intra-anthropometrist TEM (and relative TEM) were 0.35 cm (0.35%) for height, 0.05 kg (0.39%) for weight, and 0.18 cm (1.27%) for MUAC. Corresponding values for inter-anthropometrist reliability were 0.67 cm (0.75%) for height, 0.09 kg (0.79%) for weight, and 0.22 cm (1.53%) for MUAC. Inter-anthropometrist measurement error was greater for smaller children than for larger children.
Conclusion Measurements of height and weight were more reproducible than measurements of MUAC and measurements of larger children were more reliable than those for smaller children. Community-drawn anthropometrists can provide reliable measurements that could be used to assess the impact of interventions for childhood undernutrition. PMID:22291939
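For two measurements per child, the technical error of measurement is conventionally the square root of the sum of squared paired differences divided by twice the number of children, and the relative TEM expresses it as a percentage of the overall mean. The sketch below uses that standard two-measurement formula as an assumption about the method; the function names and height values are hypothetical.

```python
from math import sqrt


def tem(first, second):
    """Technical error of measurement for paired repeat measurements."""
    n = len(first)
    squared_diffs = sum((x - y) ** 2 for x, y in zip(first, second))
    return sqrt(squared_diffs / (2 * n))


def relative_tem(first, second):
    """TEM expressed as a percentage of the mean of all measurements."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100 * tem(first, second) / grand_mean


# Hypothetical heights (cm) measured twice on the same four children.
first_pass = [92.1, 101.4, 88.0, 97.3]
second_pass = [92.5, 101.0, 88.3, 97.1]
print(f"TEM={tem(first_pass, second_pass):.2f} cm, "
      f"relative TEM={relative_tem(first_pass, second_pass):.2f}%")
```

Because the relative TEM divides by the mean, a small absolute error on a small quantity such as MUAC can still yield a larger percentage than the same error on height.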
Development and reliability testing of a food store observation form.
Rimkus, Leah; Powell, Lisa M; Zenk, Shannon N; Han, Euna; Ohri-Vachaspati, Punam; Pugach, Oksana; Barker, Dianne C; Resnick, Elissa A; Quinn, Christopher M; Myllyluoma, Jaana; Chaloupka, Frank J
2013-01-01
To develop a reliable food store observational data collection instrument to be used for measuring product availability, pricing, and promotion. Observational data collection. A total of 120 food stores (26 supermarkets, 34 grocery stores, 54 gas/convenience stores, and 6 mass merchandise stores) in the Chicago metropolitan statistical area. Inter-rater reliability for product availability, pricing, and promotion measures on a food store observational data collection instrument. Cohen's kappa coefficient and proportion of overall agreement for dichotomous variables and intra-class correlation coefficient for continuous variables. Inter-rater reliability, as measured by average kappa coefficient, was 0.84 for food and beverage product availability measures, 0.80 for interior store characteristics, and 0.70 for exterior store characteristics. For continuous measures, average intra-class correlation coefficient was 0.82 for product pricing measures; 0.90 for counts of fresh, frozen, and canned fruit and vegetable options; and 0.85 for counts of advertisements on the store exterior and property. The vast majority of measures demonstrated substantial or almost perfect agreement. Although some items may require revision, results suggest that the instrument may be used to reliably measure the food store environment. Copyright © 2013 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
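For dichotomous items such as product availability, the proportion of overall agreement and Cohen's kappa can both be derived from two raters' paired codes; kappa corrects the observed agreement for the agreement expected by chance. This is a minimal sketch with hypothetical function names and made-up codes, not the instrument's scoring procedure.

```python
def observed_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same code."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    n = len(rater_a)
    p_observed = observed_agreement(rater_a, rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical availability codes (1 = product available) for ten stores.
rater_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
rater_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
print(observed_agreement(rater_1, rater_2), round(cohens_kappa(rater_1, rater_2), 2))
```

Kappa is undefined when both raters use a single identical category for every item, because chance agreement is then 1; this is one reason to report the proportion of overall agreement alongside it.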
Beardsley, Chris; Egerton, Tim; Skinner, Brendon
2016-01-01
Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81-0.88), test-re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and test-re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test-re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test-re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test-re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.
Construct Validity and Reliability of the SARA Gait and Posture Sub-scale in Early Onset Ataxia
Lawerman, Tjitske F.; Brandsma, Rick; Verbeek, Renate J.; van der Hoeven, Johannes H.; Lunsing, Roelineke J.; Kremer, Hubertus P. H.; Sival, Deborah A.
2017-01-01
Aim: In children, gait and posture assessment provides a crucial marker for the early characterization, surveillance and treatment evaluation of early onset ataxia (EOA). For reliable data entry in studies targeting gait and posture improvement, uniform quantitative biomarkers are necessary. Until now, the pediatric test construct of the gait and posture scores of the Scale for Assessment and Rating of Ataxia (SARA) sub-scale has remained unclear. In the present study, we aimed to assess the construct validity and reliability of the pediatric (SARAGAIT/POSTURE) sub-scale. Methods: We included 28 EOA patients [15.5 (6–34) years; median (range)]. For inter-observer reliability, we determined the ICC on EOA SARAGAIT/POSTURE sub-scores by three independent pediatric neurologists. For convergent validity, we associated SARAGAIT/POSTURE sub-scores with: (1) Ataxic gait Severity Measurement by Klockgether (ASMK; dynamic balance), (2) Pediatric Balance Scale (PBS; static balance), (3) Gross Motor Function Classification Scale - extended and revised version (GMFCS-E&R), (4) SARA-kinetic scores (SARAKINETIC; kinetic function of the upper and lower limbs), (5) Archimedes Spiral (AS; kinetic function of the upper limbs), and (6) total SARA scores (SARATOTAL; i.e., summed SARAGAIT/POSTURE, SARAKINETIC, and SARASPEECH sub-scores). For discriminant validity, we investigated whether EOA co-morbidity factors (myopathy and myoclonus) could influence SARAGAIT/POSTURE sub-scores. Results: The inter-observer agreement (ICC) on EOA SARAGAIT/POSTURE sub-scores was high (0.97). SARAGAIT/POSTURE was strongly correlated with the other ataxia and functional scales [ASMK (rs = -0.819; p < 0.001); PBS (rs = -0.943; p < 0.001); GMFCS-E&R (rs = -0.862; p < 0.001); SARAKINETIC (rs = 0.726; p < 0.001); AS (rs = 0.609; p = 0.002); and SARATOTAL (rs = 0.935; p < 0.001)].
Comorbid myopathy influenced SARAGAIT/POSTURE scores by concurrent muscle weakness, whereas comorbid myoclonus predominantly influenced SARAKINETIC scores. Conclusion: In young EOA patients, separate SARAGAIT/POSTURE parameters reveal a good inter-observer agreement and convergent validity, implicating the reliability of the scale. In perspective of incomplete discriminant validity, it is advisable to interpret SARAGAIT/POSTURE scores for comorbid muscle weakness. PMID:29326569
Lauricella, Leticia L; Costa, Priscila B; Salati, Michele; Pego-Fernandes, Paulo M; Terra, Ricardo M
2018-06-01
Database quality measurement should be considered a mandatory step to ensure an adequate level of confidence in data used for research and quality improvement. Several metrics have been described in the literature, but no standardized approach has been established. We aimed to describe a methodological approach applied to measure the quality and inter-rater reliability of a regional multicentric thoracic surgical database (Paulista Lung Cancer Registry). Data from the first 3 years of the Paulista Lung Cancer Registry underwent an audit process with 3 metrics: completeness, consistency, and inter-rater reliability. The first 2 methods were applied to the whole data set, and the last method was calculated using 100 cases randomized for direct auditing. Inter-rater reliability was evaluated using percentage of agreement between the data collector and auditor and through calculation of Cohen's κ and intraclass correlation. The overall completeness per section ranged from 0.88 to 1.00, and the overall consistency was 0.96. Inter-rater reliability showed many variables with high disagreement (>10%). For numerical variables, intraclass correlation was a better metric than inter-rater reliability. Cohen's κ showed that most variables had moderate to substantial agreement. The methodological approach applied to the Paulista Lung Cancer Registry showed that completeness and consistency metrics did not sufficiently reflect the real quality status of a database. The inter-rater reliability associated with κ and intraclass correlation was a better quality metric than completeness and consistency metrics because it could determine the reliability of specific variables used in research or benchmark reports. This report can be a paradigm for future studies of data quality measurement. Copyright © 2018 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
Reliability and validity of the Turkish version of the Berg Balance Scale.
Sahin, Fusun; Yilmaz, Figen; Ozmaden, Asli; Kotevolu, Nurdan; Sahin, Tulay; Kuran, Banu
2008-01-01
The purpose of this study was to develop a Turkish version of the Berg Balance Scale (BBS) and assess its reliability and validity. Sixty healthy volunteers older than 65 years were included in the study. Subjects who had a lower extremity amputation, or were armchair-bound or bedridden, were excluded. After the translation process, the Turkish version of the scale was administered to each participant twice with an interval of 2 weeks. The intraclass correlation coefficient (ICC) was calculated to assess intra- and inter-observer reliability. Cronbach's alpha was calculated to evaluate internal consistency of the total BBS score. The intraclass correlation coefficient was also calculated to examine test-retest reliability. Convergent validity was assessed by correlating the scale with the Modified Barthel Index (MBI) and the Timed Up and Go Test (TUG). Construct validity was assessed with factor analysis. The mean age of the participants was 77.00 ± 5.67 years (range: 67-92 years). The ICC for intra- and inter-observer reliability was 0.98 (p<0.0001) and 0.97 (p<0.0001), respectively. Cronbach's alpha of the Turkish version of the BBS was 0.98. The test-retest reliability (ICC) of the Turkish version of the BBS was determined as 0.98 for the total score, and ranged from 0.86 to 0.99 for individual items. In terms of validity, the Turkish version of the BBS was correlated with the MBI (in positive direction) and TUG (in negative direction) (r=0.67, p<0.0001; r=-0.75, p<0.0001, respectively). The Turkish version of the BBS is a reliable and valid scale to be used in balance assessment of Turkish older adults.
Childress, M O; Fulkerson, C M; Lahrman, S A; Weng, H-Y
2016-08-01
The purpose of this study was to assess reliability of lymph node measurements between and within raters in dogs with nodal lymphomas. Three raters measured lymph nodes from 20 dogs twice prior to and once after administering chemotherapy. Sum tumour volume (TV) and sum longest diameter (LD) of all lymph nodes at each time point, and the percent change in measurements following chemotherapy, were calculated for each dog. Inter- and intra-rater reliability were assessed with the intraclass correlation coefficient (ICC). ICC for inter-rater sum TV and sum LD prior to chemotherapy were 0.86 and 0.80, respectively. ICC for inter-rater sum TV and sum LD after chemotherapy were 0.95 and 0.91, respectively. ICC for percent change in sum TV and sum LD were 0.96 and 0.94, respectively. ICC for intra-rater reliability ranged from 0.90 to 0.98 for each rater. Inter- and intra-rater reliability in measurements among the three raters was good to excellent. © 2014 John Wiley & Sons Ltd.
van Trijffel, Emiel; Lindeboom, Robert; Bossuyt, Patrick Mm; Schmitt, Maarten A; Lucas, Cees; Koes, Bart W; Oostendorp, Rob Ab
2014-01-01
Background Manual spinal joint mobilisations and manipulations are widely used treatments in patients with neck and low-back pain. Inter-examiner reliability of passive intervertebral motion assessment of the cervical and lumbar spine, perceived as important for indicating these interventions, is poor within a univariable approach. The diagnostic process as a whole in daily practice in manual therapy has a multivariable character, however, in which the use and interpretation of passive intervertebral motion assessment depend on earlier results from the diagnostic process. To date, the inter-examiner reliability among manual therapists of a multivariable diagnostic decision-making process in patients with neck or low-back pain is unknown. Methods This study will be conducted as a repeated-measures design in which 14 pairs of manual therapists independently examine a consecutive series of a planned total of 165 patients with neck or low-back pain presenting in primary care physiotherapy. Primary outcome measure is therapists’ decision about whether or not manual spinal joint mobilisations or manipulations, or both, are indicated in each patient, alone or as part of a multimodal treatment. Therapists will largely be free to conduct the full diagnostic process based on their formulated examination objectives. For each pair of therapists, 2×2 tables will be constructed and reliability for the dichotomous decision will be expressed using Cohen’s kappa. In addition, observed agreement, prevalence of positive decisions, prevalence index, bias index, and specific agreement in positive and negative decisions will be calculated. Univariable logistic regression analysis of concordant decisions will be performed to explore which demographic, professional, or clinical factors contributed to reliability. 
Discussion This study will provide an estimate of the inter-examiner reliability among manual therapists of indicating spinal joint mobilisations or manipulations in patients with neck or low-back pain based on a multivariable diagnostic reasoning and decision-making process, as opposed to reliability of individual tests. As such, it is proposed as an initial step toward the development of an alternative approach to current classification systems and prediction rules for identifying those patients with spinal disorders that may show a better response to manual therapy which can be incorporated in randomised clinical trials. Potential methodological limitations of this study are discussed. PMID:24982754
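The agreement statistics listed in the Methods can all be derived from a single 2×2 table. The sketch below is one possible way to compute them, assuming the usual cell layout (a = both therapists say "indicated", b and c = the two kinds of disagreement, d = both say "not indicated"). The function name `two_by_two_agreement`, the definition of positive-decision prevalence as the mean of the two therapists' positive rates, and the counts are assumptions for illustration, not the protocol's own code or data.

```python
def two_by_two_agreement(a, b, c, d):
    """Agreement indices for two raters' dichotomous decisions.

    a: both positive; b: rater 1 positive, rater 2 negative;
    c: rater 1 negative, rater 2 positive; d: both negative.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table must contain at least one observation")
    p_o = (a + d) / n                                        # observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)  # chance agreement
    kappa = (p_o - p_e) / (1.0 - p_e) if p_e < 1.0 else float("nan")
    return {
        "observed_agreement": p_o,
        "kappa": kappa,
        # Assumed convention: mean proportion of positive decisions across both raters.
        "prevalence_positive": (2 * a + b + c) / (2 * n),
        "prevalence_index": abs(a - d) / n,
        "bias_index": abs(b - c) / n,
        "positive_agreement": 2 * a / (2 * a + b + c) if (2 * a + b + c) else float("nan"),
        "negative_agreement": 2 * d / (2 * d + b + c) if (2 * d + b + c) else float("nan"),
    }


# Hypothetical counts for one pair of therapists examining 100 patients.
result = two_by_two_agreement(a=40, b=5, c=10, d=45)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The prevalence and bias indices help interpret kappa: a high prevalence index or bias index can lower or raise kappa even when observed agreement is unchanged.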
Do you see what I see? Mobile eye-tracker contextual analysis and inter-rater reliability.
Stuart, S; Hunt, D; Nell, J; Godfrey, A; Hausdorff, J M; Rochester, L; Alcock, L
2018-02-01
Mobile eye-trackers are currently used during real-world tasks (e.g. gait) to monitor visual and cognitive processes, particularly in ageing and Parkinson's disease (PD). However, contextual analysis involving fixation locations during such tasks is rarely performed due to its complexity. This study adapted a validated algorithm and developed a classification method to semi-automate contextual analysis of mobile eye-tracking data. We further assessed inter-rater reliability of the proposed classification method. A mobile eye-tracker recorded eye-movements during walking in five healthy older adult controls (HC) and five people with PD. Fixations were identified using a previously validated algorithm, which was adapted to provide still images of fixation locations (n = 116). The fixation location was manually identified by two raters (DH, JN), who classified the locations. Cohen's kappa correlation coefficients determined the inter-rater reliability. The algorithm successfully provided still images for each fixation, allowing manual contextual analysis to be performed. The inter-rater reliability for classifying the fixation location was high for both PD (kappa = 0.80, 95% agreement) and HC groups (kappa = 0.80, 91% agreement), which indicated a reliable classification method. This study developed a reliable semi-automated contextual analysis method for gait studies in HC and PD. Future studies could adapt this methodology for various gait-related eye-tracking studies.
Campagna, Giuseppe; Zampetti, Simona; Gallozzi, Alessia; Giansanti, Sara; Chiesa, Claudio; Pacifico, Lucia; Buzzetti, Raffaella
2016-01-01
In a previous study, we found that wrist circumference, in particular its bone component, was associated with insulin resistance in a population of overweight/obese children. The aim of the present study was to evaluate the intra- and inter-operator variability in wrist circumference measurement in a population of obese children and adolescents. One hundred and two (54 male and 48 female) obese children and adolescents were consecutively enrolled. In all subjects, wrist circumference was measured twice by two different operators to assess intra- and inter-operator variability. Statistical analysis was performed using SAS v.9.4 and JMP v.12. Measurements of wrist circumference showed excellent inter-operator reliability, with intraclass correlation coefficients (ICC) of 0.96 and 0.97 for the first and the second measurement, respectively. The intra-operator reliability was also very strong, with a concordance correlation coefficient (CCC) of 0.98 for both operators. The high reproducibility demonstrated in our results suggests that wrist circumference measurement, being safe, non-invasive and repeatable, can be easily used in out-patient settings to identify youths with increased risk of insulin resistance. This can avoid testing the entire population of overweight/obese children for insulin resistance parameters. PMID:27294398
Robbrecht, Cedric; Claes, Steven; Cromheecke, Michiel; Mahieu, Peter; Kakavelakis, Kyriakos; Victor, Jan; Bellemans, Johan; Verdonk, Peter
2014-10-01
Post-operative widening of tibial and/or femoral bone tunnels is a common observation after ACL reconstruction, especially with soft-tissue grafts. There are no studies comparing tunnel widening in hamstring autografts versus tibialis anterior allografts. The goal of this study was to observe the difference in tunnel widening after the use of allograft vs. autograft for ACL reconstruction, by measuring it with a novel 3-D computed tomography based method. Thirty-five ACL-deficient subjects were included, underwent anatomic single-bundle ACL reconstruction and were evaluated at one year after surgery with the use of 3-D CT imaging. Three independent observers semi-automatically delineated femoral and tibial tunnel outlines, after which a best-fit cylinder was derived and the tunnel diameter was determined. Finally, intra- and inter-observer reliability of this novel measurement protocol was defined. In femoral tunnels, the intra-observer ICC was 0.973 (95% CI: 0.922-0.991) and the inter-observer ICC was 0.992 (95% CI: 0.982-0.996). In tibial tunnels, the intra-observer ICC was 0.955 (95% CI: 0.875-0.985). The combined inter-observer ICC was 0.970 (95% CI: 0.917-0.987). Tunnel widening was significantly higher in allografts compared to autografts, in the tibial tunnels (p=0.013) as well as in the femoral tunnels (p=0.007). To our knowledge, this is the first report of this novel, semi-automated 3D-computed tomography image processing method, which has been shown to yield highly reproducible results for the measurement of bone tunnel diameter and area. This series showed a significantly higher amount of tunnel widening in the allograft group at one-year follow-up. Level II, Prospective comparative study. Copyright © 2014 Elsevier B.V. All rights reserved.
Décary, Simon; Ouellet, Philippe; Vendittoli, Pascal-André; Desmeules, François
2016-12-01
Clinicians often rely on physical examination tests to guide them in the diagnostic process of knee disorders. However, reliability of these tests is often overlooked and may influence the consistency of results and overall diagnostic validity. Therefore, the objective of this study was to systematically review evidence on the reliability of physical examination tests for the diagnosis of knee disorders. A structured literature search was conducted in databases up to January 2016. Included studies needed to report reliability measures of at least one physical test for any knee disorder. Methodological quality was evaluated using the QAREL checklist. A qualitative synthesis of the evidence was performed. Thirty-three studies were included with a mean QAREL score of 5.5 ± 0.5. Based on low to moderate quality evidence, the Thessaly test for meniscal injuries reached moderate inter-rater reliability (k = 0.54). Based on moderate to excellent quality evidence, the Lachman for anterior cruciate ligament injuries reached moderate to excellent inter-rater reliability (k = 0.42 to 0.81). Based on low to moderate quality evidence, the Tibiofemoral Crepitus, Joint Line and Patellofemoral Pain/Tenderness, Bony Enlargement and Joint Pain on Movement tests for knee osteoarthritis reached fair to excellent inter-rater reliability (k = 0.29 to 0.93). Based on low to moderate quality evidence, the Lateral Glide, Lateral Tilt, Lateral Pull and Quality of Movement tests for patellofemoral pain reached moderate to good inter-rater reliability (k = 0.49 to 0.73). Many physical tests appear to reach good inter-rater reliability, but this is based on low-quality and conflicting evidence. High-quality research is required to evaluate the reliability of knee physical examination tests. Copyright © 2016 Elsevier Ltd. All rights reserved.
Downs, Stephen; Marquez, Jodie; Chiarelli, Pauline
2013-06-01
What is the intra-rater and inter-rater relative reliability of the Berg Balance Scale? What is the absolute reliability of the Berg Balance Scale? Does the absolute reliability of the Berg Balance Scale vary across the scale? Systematic review with meta-analysis of reliability studies. Any clinical population that has undergone assessment with the Berg Balance Scale. Relative intra-rater reliability, relative inter-rater reliability, and absolute reliability. Eleven studies involving 668 participants were included in the review. The relative intrarater reliability of the Berg Balance Scale was high, with a pooled estimate of 0.98 (95% CI 0.97 to 0.99). Relative inter-rater reliability was also high, with a pooled estimate of 0.97 (95% CI 0.96 to 0.98). A ceiling effect of the Berg Balance Scale was evident for some participants. In the analysis of absolute reliability, all of the relevant studies had an average score of 20 or above on the 0 to 56 point Berg Balance Scale. The absolute reliability across this part of the scale, as measured by the minimal detectable change with 95% confidence, varied between 2.8 points and 6.6 points. The Berg Balance Scale has a higher absolute reliability when close to 56 points due to the ceiling effect. We identified no data that estimated the absolute reliability of the Berg Balance Scale among participants with a mean score below 20 out of 56. The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects. The review was only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance. Copyright © 2013 Australian Physiotherapy Association. Published by .. All rights reserved.
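The "minimal detectable change with 95% confidence" reported above is commonly derived from the standard error of measurement (SEM). The sketch below uses the widely used formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM; whether the included studies used exactly this route is an assumption, and the standard deviation in the example is hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("sd must be non-negative and icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest difference."""
    # sqrt(2) accounts for measurement error in both the first and the repeat score.
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical sample: between-subject SD of 5.0 points, ICC of 0.97.
sem = sem_from_icc(sd=5.0, icc=0.97)
mdc = mdc95(sem)
print(f"SEM = {sem:.2f} points, MDC95 = {mdc:.2f} points")
```

This shows why a scale can have a very high relative reliability (ICC) and still need a change of several points before an individual's change exceeds measurement error.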
Elinder, L S; Brunosson, A; Bergström, H; Hagströmer, M; Patterson, E
2012-02-01
Dietary assessment is a challenge in general, and specifically in individuals with intellectual disabilities (ID). This study aimed to evaluate personal digital photography as a method of assessing different aspects of dietary quality in this target group. Eighteen adults with ID were recruited from community residences and activity centres in Stockholm County. Participants were instructed to photograph all foods and beverages consumed during 1 day, while observed. Photographs were coded by two raters. Observations and photographs of meal frequency, intake occasions of four specific food and beverage items, meal quality and dietary diversity were compared. Evaluation of inter-rater reliability and validity of the method was performed by intra-class correlation analysis. With reminders from staff, 85% of all observed eating or drinking occasions were photographed. The inter-rater reliability was excellent for all assessed variables (ICC ≥ 0.88), except for meal quality where ICC was 0.66. The correlations between items assessed in photos and observations were strong to almost perfect with ICC values ranging from 0.71 to 0.92 and all were statistically significant. Personal digital photography appears to be a feasible, reliable and valid method for assessing dietary quality in people with mild to moderate ID, who have daily staff support. © 2011 The Authors. Journal of Intellectual Disability Research © 2011 Blackwell Publishing Ltd.
Introducing a new definition of a near fall: intra-rater and inter-rater reliability.
Maidan, I; Freedman, T; Tzemah, R; Giladi, N; Mirelman, A; Hausdorff, J M
2014-01-01
Near falls (NFs) are more frequent than falls, and may occur before falls, potentially predicting fall risk. As such, identification of a NF is important. We aimed to assess intra and inter-rater reliability of the traditional definition of a NF and to demonstrate the potential utility of a new definition. To this end, 10 older adults, 10 idiopathic elderly fallers, and 10 patients with Parkinson's disease (PD) walked in an obstacle course while wearing a safety harness. All walks were videotaped. Forty-nine video segments were extracted to create 2 clips each of 8.48 min. Four raters scored each event using the traditional definition and, two weeks later, using the new definition. A fifth rater used only the new definition. Intra-rater reliability was determined using Kappa (K) statistics and inter-rater reliability was determined using ICC. Using the traditional definition, three raters had poor intra-rater reliability (K<0.054, p>0.137) and one rater had moderate intra-rater reliability (K=0.624, p<0.001). With the traditional definition, inter-rater reliability between the four raters was moderate (ICC=0.667, p<0.001). In contrast, the new NF definition showed high intra-rater (K>0.601, p<0.001) and excellent inter-rater reliability (ICC=0.815, p<0.001). A priori, it is easy to distinguish falls from usual walking and NFs, but it is more challenging to distinguish NFs from obstacle negotiation and usual walking. Therefore, a more precise definition of NF is required. The results of the present study suggest that the proposed new definition increases intra and inter-rater reliability, a critical step for using NFs to quantify fall risk. Copyright © 2013 Elsevier B.V. All rights reserved.
Visual judgements of steadiness in one-legged stance: reliability and validity.
Haupstein, T; Goldie, P
2000-01-01
There is a paucity of information about the validity and reliability of clinicians' visual judgements of steadiness in one-legged stance. Such judgements are used frequently in clinical practice to support decisions about treatment in the fields of neurology, sports medicine, paediatrics and orthopaedics. The aim of the present study was to address the validity and reliability of visual judgements of steadiness in one-legged stance in a group of physiotherapists. A videotape of 20 five-second performances was shown to 14 physiotherapists with median clinical experience of 6.75 years. Validity of visual judgement was established by correlating scores obtained from an 11-point rating scale with criterion scores obtained from a force platform. In addition, partial correlations were used to control for the potential influence of body weight on the relationship between the visual judgements and criterion scores. Inter-observer reliability was quantified between the physiotherapists; intra-observer reliability was quantified between two tests four weeks apart. Mean criterion-related validity was high, regardless of whether body weight was controlled for statistically (Pearson's r = 0.84, 0.83, respectively). The standard error of estimating the criterion score was 3.3 newtons. Inter-observer reliability was high (ICC (2,1) = 0.81 at Test 1 and 0.82 at Test 2). Intra-observer reliability was high (on average ICC (2,1) = 0.88; Pearson's r = 0.90). The standard error of measurement for the 11-point scale was one unit. The finding of higher accuracy of making visual judgements than previously reported may be due to several aspects of design: use of a criterion score derived from the variability of the force signal which is more discriminating than variability of centre of pressure; use of a discriminating visual rating scale; specificity and clear definition of the phenomenon to be rated.
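The ICC (2,1) reported above is, in the Shrout and Fleiss notation, the two-way random-effects, absolute-agreement, single-rater coefficient. A minimal sketch of its computation from two-way ANOVA mean squares follows; the function name `icc_2_1` and the score matrix are illustrative assumptions and are not the study's data.

```python
def icc_2_1(scores):
    """ICC(2,1) for a complete subjects-by-raters matrix (list of equal-length rows)."""
    n = len(scores)        # number of subjects (rows)
    k = len(scores[0])     # number of raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                  # between-subjects mean square
    msc = ss_cols / (k - 1)                  # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
print(f"ICC(2,1) = {icc:.3f}")
```

Because the between-raters mean square enters the denominator, systematic differences between raters lower ICC (2,1), which is why it is described as an absolute-agreement coefficient.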
Reliability of intracerebral hemorrhage classification systems: A systematic review.
Rannikmäe, Kristiina; Woodfield, Rebecca; Anderson, Craig S; Charidimou, Andreas; Chiewvit, Pipat; Greenberg, Steven M; Jeng, Jiann-Shing; Meretoja, Atte; Palm, Frederic; Putaala, Jukka; Rinkel, Gabriel Je; Rosand, Jonathan; Rost, Natalia S; Strbian, Daniel; Tatlisumak, Turgut; Tsai, Chung-Fen; Wermer, Marieke Jh; Werring, David; Yeh, Shin-Joe; Al-Shahi Salman, Rustam; Sudlow, Cathie Lm
2016-08-01
Accurately distinguishing non-traumatic intracerebral hemorrhage (ICH) subtypes is important since they may have different risk factors, causal pathways, management, and prognosis. We systematically assessed the inter- and intra-rater reliability of ICH classification systems. We sought all available reliability assessments of anatomical and mechanistic ICH classification systems from electronic databases and personal contacts until October 2014. We assessed included studies' characteristics, reporting quality and potential for bias; summarized reliability with kappa value forest plots; and performed meta-analyses of the proportion of cases classified into each subtype. We included 8 of 2152 studies identified. Inter- and intra-rater reliabilities were substantial to perfect for anatomical and mechanistic systems (inter-rater kappa values: anatomical 0.78-0.97 [six studies, 518 cases], mechanistic 0.89-0.93 [three studies, 510 cases]; intra-rater kappas: anatomical 0.80-1 [three studies, 137 cases], mechanistic 0.92-0.93 [two studies, 368 cases]). Reporting quality varied but no study fulfilled all criteria and none was free from potential bias. All reliability studies were performed with experienced raters in specialist centers. Proportions of ICH subtypes were largely consistent with previous reports suggesting that included studies are appropriately representative. Reliability of existing classification systems appears excellent but is unknown outside specialist centers with experienced raters. Future reliability comparisons should be facilitated by studies following recently published reporting guidelines. © 2016 World Stroke Organization.
Feijen, Stef; Kuppens, Kevin; Tate, Angela; Baert, Isabel; Struyf, Thomas; Struyf, Filip
2018-04-17
Measuring thoracic spine mobility can be of interest to competitive swimmers as it has been associated with shoulder girdle function and scapular position in subjects with and without shoulder pain. At present, no reliability data of thoracic spine mobility measurements are available in the swimming population. This study aims to evaluate the within-session intra- and inter-rater reliability of the "lumbar-locked rotation test" for thoracic spine rotation in competitive swimmers aged 10 to 18 years. This reliability study is part of a larger prospective cohort study investigating potential risk factors for the development of shoulder pain in competitive swimmers. Within-session, intra- and inter-rater reliability. Competitive swimming clubs in Belgium. 21 competitive swimmers. Intra- and inter-rater reliability of the lumbar-locked thoracic rotation test. Intraclass correlation coefficients (ICCs) ranged from 0.91 (95% CI 0.78 to 0.96) to 0.96 (95% CI 0.89 to 0.98) for intra-rater reliability. Inter-rater ICCs were 0.89 (95% CI 0.72 to 0.95) and 0.86 (95% CI 0.65 to 0.94) for right and left thoracic rotation, respectively. Results suggest good to excellent reliability of the lumbar-locked thoracic rotation test, indicating this test can be used reliably in clinical practice. Copyright © 2018 Elsevier Ltd. All rights reserved.
Gnat, Rafael; Saulicz, Edward; Miądowicz, Barbara
2012-08-01
To investigate intra- and inter-rater reliability of the ultrasound measurement of transversus abdominis (TrA) thickness and thickness change (difference between thickness at rest and during contraction) in asymptomatic, trained subjects. To define the number of repeated measurements that provide acceptable level of reliability. To investigate variability of the measurements over time of 5 days and the reliability of duplicate analysis of images. A single-group repeated-measures design was used to assess reliability. Healthy volunteers (n = 10) were subjected to 1-week training in voluntary activation of TrA. Real-time ultrasound imaging and subsequent measurement of the TrA thickness at rest and during voluntary contraction were repeated on Monday, Wednesday and Friday of the next week. Using a single repeated measurement, intraclass correlation coefficients (ICCs) for TrA thickness were: 0.86-0.95 (intra-rater), 0.86-0.92 (inter-rater); and for TrA thickness change: 0.34-0.56 (intra-rater), 0.47-0.61 (inter-rater). Using the mean of three repeated measurements respective values were: 0.97, 0.96-0.98; and 0.81-0.84, 0.80-0.90. No significant differences were found between mean values of TrA thickness as well as thickness change obtained on three consecutive measurement days. Duplicate analysis of the images was highly reliable with ICCs of 0.89-0.99. Two repeated measurements for TrA thickness and at least three measurements for TrA thickness change are needed to achieve acceptable levels of intra- and inter-rater reliability. In healthy trained volunteers TrA thickness and thickness change are relatively stable parameters over a 5-day period. Duplicate analysis of the same images by two blinded observers is reliable.
Dinsdale, Graham; Moore, Tonia; O'Leary, Neil; Berks, Michael; Roberts, Christopher; Manning, Joanne; Allen, John; Anderson, Marina; Cutolo, Maurizio; Hesselstrand, Roger; Howell, Kevin; Pizzorni, Carmen; Smith, Vanessa; Sulli, Alberto; Wildt, Marie; Taylor, Christopher; Murray, Andrea; Herrick, Ariane L
2017-09-01
Nailfold capillaroscopic parameters hold increasing promise as outcome measures for clinical trials in systemic sclerosis (SSc). Their inclusion as outcomes would often naturally require capillaroscopy images to be captured at several time points during any one study. Our objective was to assess repeatability of image acquisition (which has been little studied), as well as of measurement. 41 patients (26 with SSc, 15 with primary Raynaud's phenomenon) and 10 healthy controls returned for repeat high-magnification (300×) videocapillaroscopy mosaic imaging of 10 digits one week after initial imaging (as part of a larger study of reliability). Images were assessed in a random order by an expert blinded observer and 4 outcome measures extracted: (1) overall image grade and then (where possible) distal vessel locations were marked, allowing (2) vessel density (across the whole nailfold) to be calculated, (3) apex width measurement and (4) giant vessel count. Intra-rater intra-visit and intra-rater inter-visit (baseline vs. 1 week) reliability were examined in 475 and 392 images respectively. A linear, mixed-effects model was used to estimate variance components, from which intra-class correlation coefficients (ICCs) were determined. Intra-visit and inter-visit reliability estimates (ICCs) were (respectively): overall image grade, 0.97 and 0.90; vessel density, 0.92 and 0.65; mean vessel width, 0.91 and 0.79; presence of giant capillary, 0.68 and 0.56. These estimates were conditional on each parameter being measurable. Within-operator image analysis and acquisition are reproducible. Quantitative nailfold capillaroscopy, at least with a single observer, provides reliable outcome measures for clinical studies including randomised controlled trials. Copyright © 2017 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M.
2018-01-01
In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…
Gallagher, Ruairi M.; Kirkham, Jamie J.; Mason, Jennifer R.; Bird, Kim A.; Williamson, Paula R.; Nunn, Anthony J.; Turner, Mark A.; Smyth, Rosalind L.; Pirmohamed, Munir
2011-01-01
Aim To develop and test a new adverse drug reaction (ADR) causality assessment tool (CAT). Methods A comparison between seven assessors of a new CAT, formulated by an expert focus group, compared with the Naranjo CAT in 80 cases from a prospective observational study and 37 published ADR case reports (819 causality assessments in total). Main Outcome Measures Utilisation of causality categories, measure of disagreements, inter-rater reliability (IRR). Results The Liverpool ADR CAT, using 40 cases from an observational study, showed causality categories of 1 unlikely, 62 possible, 92 probable and 125 definite (1, 62, 92, 125) and ‘moderate’ IRR (kappa 0.48), compared to Naranjo (0, 100, 172, 8) with ‘moderate’ IRR (kappa 0.45). In a further 40 cases, the Liverpool tool (0, 66, 81, 133) showed ‘good’ IRR (kappa 0.6) while Naranjo (1, 90, 185, 4) remained ‘moderate’. Conclusion The Liverpool tool assigns the full range of causality categories and shows good IRR. Further assessment by different investigators in different settings is needed to fully assess the utility of this tool. PMID:22194808
Rathi, Sangeeta; Taylor, Nicholas F; Gee, Jamie; Green, Rodney A
2016-12-01
Ultrasonography is an economical and non-invasive method for measuring real-time joint movements. Although physiotherapists are increasingly using ultrasound imaging for rotator cuff disorders, there is a lack of evidence on their reliability in using ultrasonography to measure glenohumeral translation. The aim of this study was to evaluate the reliability of a physiotherapist in measuring anterior and posterior glenohumeral joint translation with ultrasound. Study design: within day reliability. Anterior and posterior glenohumeral translations were measured at rest, in response to passive accessory motion testing force, and with isometric internal and external rotation in 12 young healthy adults. All the measurements were made in real time by a physiotherapist and an experienced sonographer in two positions (neutral and abducted) and in two views (anterior and posterior). Intra-rater and inter-rater reliability were expressed using intraclass correlation coefficients (ICC) and measurement error (mm). Intra-rater reliability was good for both raters (ICC P : 0.86-0.98; ICC S : 0.85-0.96). The inter-rater reliability between the physiotherapist and sonographer was moderate to good for posterior measurements (ICC 0.50-0.75) and poor to moderate for anterior measurements (ICC 0.31-0.53). For both intra-rater and inter-rater measurements, posterior translation was more reliable than the anterior translation with smaller measurement errors (posterior: 0.1-0.2 mm, anterior: 0.2-0.3 mm). A physiotherapist with minimal training was reliable in measuring glenohumeral joint translations. The ultrasound method was reliable for repeated measurement of both anterior and posterior glenohumeral translations with posterior measurements being more reliable than anterior. This method is recommended for future research to investigate the stabilising role of rotator cuff muscles. Copyright © 2016 Elsevier Ltd. All rights reserved.
Zygmunt, Arkadiusz; Adamczewski, Zbigniew; Zygmunt, Agnieszka; Karbownik-Lewinska, Malgorzata; Lewinski, Andrzej
2017-01-01
Goitre incidence in school-aged children evaluated using ultrasonography is one of the essential indicators of iodine intake in a given area. The aim of the study was to examine what the difference is between the volume of the thyroid gland measured in the supine and sitting position and to determine the intra-observer, inter-observer, and inter-position variations. The survey was conducted among 87 children (56 girls and 31 boys aged 7-13 years, mean age 10.44 ± 1.72 years). The thyroid volume measured in a sitting position was significantly lower than that measured in the supine position. The intra-observer variations for the total thyroid volume equalled 9.56-9.65%. The inter-observer variations were significantly higher and amounted to 34.5-35.7%. The way in which ultrasound evaluation is performed is important for the analysis of the results. It is crucial to aim for the smallest inter-observer variation, which can be achieved by strictly defining the methods of the thyroid measurement and comparing one's measuring techniques with the reference method. The use of standards in ultrasound evaluation performed in the supine position, as well as the use of standards without a strict determination of the study method, can lead to erroneous conclusions. © 2017 S. Karger AG, Basel.
Reproducibility of African giant pouched rats detecting Mycobacterium tuberculosis.
Ellis, Haylee; Mulder, Christiaan; Valverde, Emilio; Poling, Alan; Edwards, Timothy
2017-04-24
African pouched rats sniffing sputum samples provided by local clinics have significantly increased tuberculosis case findings in Tanzania and Mozambique. The objective of this study was to determine the reproducibility of rat results. Over an 18-month period 11,869 samples were examined by the rats. Intra-rater reliability was assessed through Yule's Q. Inter-rater reliability was assessed with Krippendorff's alpha. Intra-rater reliability was high, with a mean Yule's Q of 0.9. Inter-rater agreement was fair, with Krippendorff's alpha ranging from 0.15 to 0.45. Both intra- and inter-rater reliability were independent of the sex of the animals, but they were positively correlated with age. Both intra- and inter-rater agreement were lowest for samples designated as smear-negative by the clinics. Overall, the reproducibility of tuberculosis detection rat results was fair and diagnostic results were therefore independent of the rats used.
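The abstract above names Yule's Q as its intra-rater statistic without giving the formula. As a minimal sketch, assuming each rat's repeated evaluations of the same samples are tabulated as a 2x2 table of paired binary calls, Q can be computed as below. The function name and the counts are hypothetical and are not taken from the study.

```python
def yules_q(both_pos: int, first_only: int, second_only: int, both_neg: int) -> float:
    """Yule's Q for a 2x2 table of paired binary calls.

    Q = (a*d - b*c) / (a*d + b*c), where a and d are the concordant cells
    and b and c are the discordant cells. Q ranges from -1 to +1.
    """
    concordant = both_pos * both_neg
    discordant = first_only * second_only
    if concordant + discordant == 0:
        raise ValueError("Q is undefined when a*d + b*c is zero")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts for one rater's first and second evaluation.
q = yules_q(both_pos=45, first_only=5, second_only=5, both_neg=45)
print(round(q, 3))  # 0.976
```

Q reaches +1 whenever one discordant cell is empty, so it tends to read higher than chance-corrected statistics computed on the same table.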
Pomerleau, J; Knai, C; Foster, C; Rutter, H; Darmon, N; Derflerova Brazdova, Z; Hadziomeragic, A F; Pekcan, G; Pudule, I; Robertson, A; Brunner, E; Suhrcke, M; Gabrijelcic Blenkus, M; Lhotska, L; Maiani, G; Mistura, L; Lobstein, T; Martin, B W; Elinder, L S; Logstrup, S; Racioppi, F; McKee, M
2013-03-01
The authors designed an instrument to measure objectively aspects of the built and food environments in urban areas, the EURO-PREVOB Community Questionnaire, within the EU-funded project 'Tackling the social and economic determinants of nutrition and physical activity for the prevention of obesity across Europe' (EURO-PREVOB). This paper describes its development, reliability, validity, feasibility and relevance to public health and obesity research. The Community Questionnaire is designed to measure key aspects of the food and built environments in urban areas of varying levels of affluence or deprivation, within different countries. The questionnaire assesses (1) the food environment and (2) the built environment. Pilot tests of the EURO-PREVOB Community Questionnaire were conducted in five to 10 purposively sampled urban areas of different socio-economic status in each of Ankara, Brno, Marseille, Riga, and Sarajevo. Inter-rater reliability was compared between two pairs of fieldworkers in each city centre using three methods: inter-observer agreement (IOA), kappa statistics, and intraclass correlation coefficients (ICCs). Data were collected successfully in all five cities. Overall reliability of the EURO-PREVOB Community Questionnaire was excellent (inter-observer agreement (IOA) > 0.87; intraclass correlation coefficients (ICCs) > 0.91 and kappa statistics > 0.7). However, assessment of certain aspects of the quality of the built environment yielded slightly lower IOA coefficients than the quantitative aspects. The EURO-PREVOB Community Questionnaire was found to be a reliable and practical observational tool for measuring differences in community-level data on environmental factors that can impact on dietary intake and physical activity. The next step is to evaluate its predictive power by collecting behavioural and anthropometric data relevant to obesity and its determinants. Copyright © 2013 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.
Measuring human remains in the field: Grid technique, total station, or MicroScribe?
Sládek, Vladimír; Galeta, Patrik; Sosna, Daniel
2012-09-10
Although three-dimensional (3D) coordinates for human intra-skeletal landmarks are among the most important data that anthropologists have to record in the field, little is known about the reliability of various measuring techniques. We compared the reliability of three techniques used for 3D measurement of human remains in the field: grid technique (GT), total station (TS), and MicroScribe (MS). We measured 365 field osteometric points on 12 skeletal sequences excavated at the Late Medieval/Early Modern churchyard in Všeruby, Czech Republic. We compared intra-observer, inter-observer, and inter-technique variation using mean difference (MD), mean absolute difference (MAD), standard deviation of difference (SDD), and limits of agreement (LA). All three measuring techniques can be used when accepted error ranges can be measured in centimeters. When a range of accepted error measurable in millimeters is needed, MS offers the best solution. TS can achieve the same reliability as does MS, but only when the laser beam is accurately pointed into the center of the prism. When the prism is not accurately oriented, TS produces unreliable data. TS is more sensitive to initialization than is MS. GT measures the human skeleton with acceptable reliability for general purposes but insufficiently when highly accurate skeletal data are needed. We observed high inter-technique variation, indicating that just one technique should be used when spatial data from one individual are recorded. Subadults are measured with slightly lower error than are adults. The effect of maximum excavated skeletal length has little practical significance in field recording. When MS is not available, we offer practical suggestions that can help to increase reliability when measuring the human skeleton in the field. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
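The four agreement statistics named in the abstract above (MD, MAD, SDD and LA) all derive from the paired differences between two sets of measurements. The following is a minimal sketch, assuming the limits of agreement are taken as MD ± 1.96 × SDD in the usual Bland-Altman manner; the abstract does not state the multiplier, and the function name and sample values are hypothetical.

```python
from statistics import mean, stdev


def agreement_summary(first, second, z=1.96):
    """Summarise paired differences between two measurement series.

    Returns the mean difference (MD), mean absolute difference (MAD),
    sample standard deviation of the differences (SDD) and the limits
    of agreement (LA), assumed here to be MD +/- z * SDD.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    md = mean(diffs)
    mad = mean([abs(d) for d in diffs])
    sdd = stdev(diffs)
    return {"MD": md, "MAD": mad, "SDD": sdd, "LA": (md - z * sdd, md + z * sdd)}


# Hypothetical coordinates (mm) of four landmarks recorded by two observers.
observer_1 = [10.0, 12.0, 11.0, 13.0]
observer_2 = [9.0, 12.0, 12.0, 11.0]
print(agreement_summary(observer_1, observer_2))
```

MD exposes systematic bias between the two series, whereas MAD and SDD describe the size and spread of the disagreement regardless of its direction.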
The development of an instrument to match individuals with disabilities and service animals.
Zapf, S A; Rough, R B
There has been an increase in the use of service animals assisting persons with disabilities in the past decade. However, many of the service dog agencies do not utilize an assessment that is designed to match the person to the animal in the rehabilitation and psycho-social domains. The purpose of this study was to develop the Service Animal Adaptive Intervention Assessment (SAAIA) and to measure the content validity, inter-rater reliability and clinical utility of the assessment. Two subject groups were used. Subject group one had 43 subjects who measured the content validity and clinical utility of the SAAIA Survey. Subject group two had 12 subjects who measured the inter-rater reliability by completing the SAAIA using information obtained through a video-taped client case scenario. Content validity results indicated a good to high percentage of agreement and a fair percentage of agreement for clinical utility. Inter-rater reliability results indicate good to high agreement on six of the eight variables of the SAAIA. However, the Kappa score indicates low inter-rater reliability. Results indicate the SAAIA has good content validity and inter-rater reliability and fair clinical utility based on percent agreement. However, further research is needed on the reliability of the SAAIA.
Neziraj, M; Sarac Kart, N; Samuelson, Karin
2011-08-01
The view of delirium has changed considerably over the last decade, and delirium is now a very topical issue within the intensive care unit (ICU) setting. Delirium has proved to be common in critically ill patients and is manifested as acute changes in mental status with reduced cognitive ability, incoherent thought patterns, impaired consciousness, agitation and acute confusion. In order to be able to prevent, identify and alleviate problems related to delirium it is important that validated instruments for delirium screening are implemented and evaluated. The aim of this study was to translate the Intensive Care Delirium Screening Checklist (ICDSC) into Swedish and test the inter-rater reliability in a Swedish general ICU setting. The study was carried out during 2009 in a general Swedish ICU. A translation of the scale from English into Swedish was made, including back-translation, critical review and pilot testing. A total of 49 paired ratings were carried out using the Swedish version of the ICDSC scale. The inter-rater reliability was tested using weighted kappa (κ) statistics (linear weighting). The ICDSC scale was successfully translated into Swedish and the inter-rater reliability testing of the Swedish version resulted in a weighted κ value of 0.92. The result of this study indicates that the Swedish version of the ICDSC scale has a very good inter-rater reliability. The high inter-rater reliability and the ease of administration make the ICDSC scale applicable for delirium screening in a Swedish ICU setting. © 2011 The Authors. Acta Anaesthesiologica Scandinavica © 2011 The Acta Anaesthesiologica Scandinavica Foundation.
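The linearly weighted kappa mentioned in the abstract above penalises a disagreement in proportion to how many scale steps separate the two ratings. The following is a minimal sketch of that statistic for two raters, assuming an ordinal scale supplied as an ordered list of categories; the function name and the paired scores are hypothetical and are not data from the study.

```python
def linear_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Cross-tabulate the paired ratings.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        table[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected


# Hypothetical paired scores on a 0-8 scale.
nurse_1 = [0, 2, 4, 4, 6, 8, 1, 3]
nurse_2 = [0, 2, 5, 4, 6, 7, 1, 3]
print(linear_weighted_kappa(nurse_1, nurse_2, list(range(9))))
```

With only two categories the weights reduce to 0 and 1, so the result equals unweighted Cohen's kappa.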
Benjamin, Sara E; Neelon, Brian; Ball, Sarah C; Bangdiwala, Shrikant I; Ammerman, Alice S; Ward, Dianne S
2007-01-01
Background Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions. 
We therefore recommend the Nutrition and Physical Activity Self-Assessment for Child Care (NAP SACC) instrument to researchers and practitioners interested in conducting healthy weight intervention in child care. However, a more robust, less subjective measure would be more appropriate for researchers seeking an outcome measure to assess intervention impact. PMID:17615078
Bedekar, Nilima; Suryawanshi, Mayuri; Rairikar, Savita; Sancheti, Parag; Shyam, Ashok
2014-01-01
Evaluation of range of motion (ROM) is an integral part of the assessment of the musculoskeletal system. It is required in health, fitness and pathological conditions, and it is also used as an objective outcome measure. Several methods are described to check spinal flexion range of motion. Different methods for measuring spine ranges have their advantages and disadvantages. Hence, a new device was introduced in this study using the method of dual inclinometer to measure lumbar spine flexion range of motion (ROM). To determine intra- and inter-rater reliability of a mobile device goniometer in measuring lumbar flexion range of motion. An iPod mobile device with goniometer software was used. The part being measured, i.e. the back of the subject, was suitably exposed. The subject was standing with feet shoulder width apart. The spinous processes of the second sacral vertebra (S2) and T12 were located; these were used as the reference points and readings were taken. Three readings were taken for each: inter-rater reliability as well as the intra-rater reliability. Sufficient rest was given between each flexion movement. Intra-rater reliability using ICC was r=0.920 and inter-rater r=0.812 at CI 95%. Validity r=0.95. The mobile device goniometer has high intra-rater reliability. The inter-rater reliability was moderate. This device can be used to assess range of motion of spine flexion, representing uni-planar movement.
NASA Astrophysics Data System (ADS)
Jaspers, Mariëlle E.; Maltha, Ilse M.; Klaessens, John H.; Vet, Henrica C.; Verdaasdonk, Rudolf M.; Zuijlen, Paul P.
2016-02-01
In burn wounds, early discrimination between the different depths plays an important role in the treatment strategy. The remaining vasculature in the wound determines its healing potential. Non-invasive measurement tools that can identify the vascularization are therefore considered to be of high diagnostic importance. Thermography is a non-invasive technique that can accurately measure the temperature distribution over a large skin or tissue area; the temperature is a measure of the perfusion of that area. The aim of this study was to investigate the clinimetric properties (i.e. reliability and validity) of thermography for measuring burn wound depth. In a cross-sectional study with 50 burn wounds of 35 patients, the inter-observer reliability and the validity between thermography and Laser Doppler Imaging were studied. With ROC curve analyses the ΔT cut-off points for different burn wound depths were determined. The inter-observer reliability, expressed by an intra-class correlation coefficient of 0.99, was found to be excellent. In terms of validity, a ΔT cut-off point of 0.96°C (sensitivity 71%; specificity 79%) differentiates between a superficial partial-thickness and deep partial-thickness burn. A ΔT cut-off point of -0.80°C (sensitivity 70%; specificity 74%) could differentiate between a deep partial-thickness and a full-thickness burn wound. This study demonstrates that thermography is a reliable method in the assessment of burn wound depths. In addition, thermography was reasonably able to discriminate among different burn wound depths, indicating its potential use as a diagnostic tool in clinical burn practice.
Valentim, Daniela Pereira; Sato, Tatiana de Oliveira; Comper, Maria Luiza Caíres; Silva, Anderson Martins da; Boas, Cristiana Villas; Padula, Rosimeire Simprini
There are very few observational methods for analysis of biomechanical exposure available in Brazilian-Portuguese. This study aimed to cross-culturally adapt and test the measurement properties of the Rapid Upper Limb Assessment (RULA) and Strain Index (SI). The cross-cultural adaptation and measurement properties test were established according to Beaton et al. and COSMIN guidelines, respectively. Several tasks that required static posture and/or repetitive motion of upper limbs were evaluated (n>100). The intra-rater reliability for the RULA ranged from poor to almost perfect (k: 0.00-0.93), and for the SI from poor to excellent (ICC(2,1): 0.05-0.99). The inter-rater reliability was very poor for the RULA (k: -0.12 to 0.13) and ranged from very poor to moderate for the SI (ICC(2,1): 0.00-0.53). The agreement was good for the RULA (75-100% intra-rater, and 42.24-100% inter-rater) and for the SI (EPM: -1.03% to 1.97% intra-rater, and -0.17% to 1.51% inter-rater). The internal consistency was appropriate for the RULA (α=0.88) and low for the SI (α=0.65). Moderate construct validity was observed between the RULA and SI for wrist/hand-wrist posture (rho: 0.61) and strength/intensity of exertion (rho: 0.39). The adapted versions of the RULA and SI presented semantic and cultural equivalence for Brazilian Portuguese. Reliability estimates for the RULA and SI ranged from very poor to almost perfect. The internal consistency of the RULA was better than that of the SI. The correlation between methods was moderate only for muscle request/movement repetition. Previous training is mandatory for the use of observational methods for biomechanical exposure assessment, although it does not guarantee good reproducibility of these measures. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.
Troester, Jordan C; Jasmin, Jason G; Duffield, Rob
2018-06-01
The present study examined the inter-trial (within test) and inter-test (between test) reliability of single-leg balance and single-leg landing measures performed on a force plate in professional rugby union players using commercially available software (SpartaMARS, Menlo Park, USA). Twenty-four players undertook test - re-test measures on two occasions (7 days apart) on the first training day of two respective pre-season weeks following 48h rest and similar weekly training loads. Two 20s single-leg balance trials were performed on a force plate with eyes closed. Three single-leg landing trials were performed by jumping off two feet and landing on one foot in the middle of a force plate 1m from the starting position. Single-leg balance results demonstrated acceptable inter-trial reliability (ICC = 0.60-0.81, CV = 11-13%) for sway velocity, anterior-posterior sway velocity, and mediolateral sway velocity variables. Acceptable inter-test reliability (ICC = 0.61-0.89, CV = 7-13%) was evident for all variables except mediolateral sway velocity on the dominant leg (ICC = 0.41, CV = 15%). Single-leg landing results only demonstrated acceptable inter-trial reliability for force based measures of relative peak landing force and impulse (ICC = 0.54-0.72, CV = 9-15%). Inter-test results indicate improved reliability through the averaging of three trials with force based measures again demonstrating acceptable reliability (ICC = 0.58-0.71, CV = 7-14%). Of the variables investigated here, total sway velocity and relative landing impulse are the most reliable measures of single-leg balance and landing performance, respectively. These measures should be considered for monitoring potential changes in postural control in professional rugby union.
Camara, Camila Thais Pinto; de Freitas, Sandra Maria Sbeghen Ferreira; de Lima, Waléria Paixão; Lima, Camila Astolphi; Amorim, César Ferreira; Perracini, Monica Rodrigues
2017-01-01
Our aim was to estimate inter-observer reliability, test-retest reliability, anthropometric and biomechanical adequacy and minimal detectable change when measuring the length of single-point adjustable canes in community-dwelling older adults. The study included 112 participants, men and women aged 60 years and over, who were attending an outpatient community health centre. An exploratory study design was used. Participants underwent two assessments within the same day by two independent observers and by the same observer at an interval of 15-45 days. Two measures were used to establish the length of a single-point adjustable cane: the distance from the distal wrist crease to the floor (WF) and the distance from the top of the greater trochanter of the femur to the floor (TF). Each individual was fitted according to these two measures, and elbow flexion angle was measured. Inter-observer reliability and the test-retest reliability were high in both TF (ICC(3,1) = 0.918 and ICC(2,1) = 0.935) and WF measures (ICC(3,1) = 0.967 and ICC(2,1) = 0.960). Only 1% of the individuals kept an elbow flexion angle within the standard recommendation of 30° ± 10° when the cane length was determined by the TF measure, and 30% of the participants when the cane was determined by the WF measure. The minimal detectable cane length change was 2.2 cm. Our results suggest that, even though both measures are reliable, cane length determined by WF distance is more appropriate to keep the elbow flexion angle within the standard recommendation. The minimal detectable change corresponds to approximately one hole in the cane adjustment. Copyright © 2015 John Wiley & Sons, Ltd.
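The abstract above reports a minimal detectable change but not the formula behind it. One common convention, given here only as an assumption about how such a value is typically obtained, derives it from the standard error of measurement (SEM), where SD is the between-subject standard deviation of the measurements and ICC is the test-retest coefficient:

```latex
\[
\mathrm{SEM} = \mathrm{SD}\,\sqrt{1 - \mathrm{ICC}},
\qquad
\mathrm{MDC}_{95} = 1.96 \,\sqrt{2}\;\mathrm{SEM}
\]
```

The factor $\sqrt{2}$ accounts for measurement error being present in both of the two measurements being compared, and 1.96 corresponds to a 95% confidence level.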
How reliable are Functional Movement Screening scores? A systematic review of rater reliability.
Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John
2016-05-01
Several physical assessment protocols to identify intrinsic risk factors for injury aetiology related to movement quality have been described. The Functional Movement Screen (FMS) is a standardised, field-expedient test battery intended to assess movement quality and has been used clinically in preparticipation screening and in sports injury research. To critically appraise and summarise research investigating the reliability of scores obtained using the FMS battery. Systematic literature review. Systematic search of Google Scholar, Scopus (including ScienceDirect and PubMed), EBSCO (including Academic Search Complete, AMED, CINAHL, Health Source: Nursing/Academic Edition), MEDLINE and SPORTDiscus. Studies meeting eligibility criteria were assessed by 2 reviewers for risk of bias using the Quality Appraisal of Reliability Studies checklist. Overall quality of evidence was determined using van Tulder's levels of evidence approach. 12 studies were appraised. Overall, there was a 'moderate' level of evidence in favour of 'acceptable' (intraclass correlation coefficient ≥0.6) inter-rater and intra-rater reliability for composite scores derived from live scoring. For inter-rater reliability of composite scores derived from video recordings there was 'conflicting' evidence, and 'limited' evidence for intra-rater reliability. For inter-rater reliability based on live scoring of individual subtests there was 'moderate' evidence of 'acceptable' reliability (κ≥0.4) for 4 subtests (Deep Squat, Shoulder Mobility, Active Straight-leg Raise, Trunk Stability Push-up) and 'conflicting' evidence for the remaining 3 (Hurdle Step, In-line Lunge, Rotary Stability). This review found 'moderate' evidence that raters can achieve acceptable levels of inter-rater and intra-rater reliability of composite FMS scores when using live ratings. 
Overall, there were few high-quality studies, and the quality of several studies was impacted by poor study reporting particularly in relation to rater blinding. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Test-retest reliability of myofascial trigger point detection in hip and thigh areas.
Rozenfeld, E; Finestone, A S; Moran, U; Damri, E; Kalichman, L
2017-10-01
Myofascial trigger points (MTrP's) are a primary source of pain in patients with musculoskeletal disorders. Nevertheless, they are frequently underdiagnosed. Reliable MTrP palpation is necessary for their diagnosis and treatment. The few studies that have looked for intra-tester reliability of MTrP detection in the upper body provide preliminary evidence that MTrP palpation is reliable. Reliability tests for MTrP palpation on the lower limb have not yet been performed. To evaluate inter- and intra-tester reliability of MTrP recognition in hip and thigh muscles. Reliability study. 21 patients (15 males and 6 females, mean age 21.1 years) referred to the physical therapy clinic, 10 with knee or hip pain and 11 with pain in an upper limb, low back, shin or ankle. Two experienced physical therapists performed the examinations, blinded to the subjects' identity, medical condition and results of the previous MTrP evaluation. Each subject was evaluated four times, twice by each examiner in a random order. Dichotomous findings included a palpable taut band, tenderness, referred pain, and relevance of referred pain to patient's complaint. Based on these, diagnosis of latent MTrP's or active MTrP's was established. The evaluation was performed on both legs and included a total of 16 locations in the following muscles: rectus femoris (proximal), vastus medialis (middle and distal), vastus lateralis (middle and distal) and gluteus medius (anterior, posterior and distal). Inter- and intra-tester reliability (Cohen's kappa (κ)) values for single sites ranged from -0.25 to 0.77. Median intra-tester reliability was 0.45 and 0.46 for latent and active MTrP's, and median inter-tester reliability was 0.51 and 0.64 for latent and active MTrP's, respectively. The examination of the distal vastus medialis was most reliable for latent and active MTrP's (intra-tester κ = 0.27-0.77, inter-tester κ = 0.77 and intra-tester κ = 0.53-0.72, inter-tester κ = 0.72, correspondingly). 
Inter- and intra-tester reliability of active and latent MTrP evaluation was moderate to substantial. Palpation evaluation can be used for clinical diagnosis of MTrP's in the hip and thigh muscles. This study provides evidence that MTrP palpation is a moderately reliable diagnostic tool in the hip and thigh muscles and can be used in clinical practice and research. Copyright © 2017 Elsevier Ltd. All rights reserved.
Alyusuf, Raja H; Prasad, Kameshwar; Abdel Satir, Ali M; Abalkhail, Ali A; Arora, Roopa K
2013-01-01
The exponential growth in the use of the internet as a learning resource, coupled with the varied quality of many websites, has led to a need to identify suitable websites for teaching purposes. The aim of this study is to develop and validate a tool that evaluates the quality of undergraduate medical educational websites, and to apply it to the field of pathology. A tool was devised through several steps of item generation, reduction, weightage, pilot testing, post-pilot modification of the tool and validation of the tool. Tool validation included measurement of inter-observer reliability and generation of criterion related, construct related and content related validity. The validated tool was subsequently tested by applying it to a population of pathology websites. Reliability testing showed a high internal consistency reliability (Cronbach's alpha = 0.92), high inter-observer reliability (Pearson's correlation r = 0.88), intraclass correlation coefficient = 0.85 and κ = 0.75. It showed high criterion related, construct related and content related validity. The tool showed moderately high concordance with the gold standard (κ = 0.61); 92.2% sensitivity, 67.8% specificity, 75.6% positive predictive value and 88.9% negative predictive value. The validated tool was applied to 278 websites; 29.9% were rated as recommended, 41.0% as recommended with caution and 29.1% as not recommended. A systematic tool was devised to evaluate the quality of websites for medical educational purposes. The tool was shown to yield reliable and valid inferences through its application to pathology websites.
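Sensitivity, specificity and the predictive values quoted in the abstract above are all derived from a 2x2 table of tool ratings against a gold standard. A minimal sketch of those definitions follows; the helper name `diagnostic_accuracy` and the counts are hypothetical, since the abstract does not report the underlying table.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return accuracy measures from 2x2 counts (illustrative helper).

    tp/fn: gold-standard positives rated positive/negative by the tool.
    fp/tn: gold-standard negatives rated positive/negative by the tool.
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly identified
        "specificity": tn / (tn + fp),  # negatives correctly identified
        "ppv": tp / (tp + fp),          # positive ratings that are correct
        "npv": tn / (tn + fn),          # negative ratings that are correct
    }


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
metrics = diagnostic_accuracy(tp=90, fp=30, fn=10, tn=70)
```

With these assumed counts, sensitivity is 90/100, specificity 70/100, positive predictive value 90/120 and negative predictive value 70/80.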
Aartun, Ellen; Degerfalk, Anna; Kentsdotter, Linn; Hestbaek, Lise
2014-02-10
Evidence on the reliability of clinical tests used for the spinal screening of children and adolescents is currently lacking. The aim of this study was to determine the inter- and intra-rater reliability and measurement error of clinical tests commonly used when screening young spines. Two experienced chiropractors independently assessed 111 adolescents aged 12-14 years who were recruited from a primary school in Denmark. A standardised examination protocol was used to test inter-rater reliability including tests for scoliosis, hypermobility, general mobility, inter-segmental mobility and end range pain in the spine. Seventy-five of the 111 subjects were re-examined after one to four hours to test intra-rater reliability. Percentage agreement and Cohen's Kappa were calculated for binary variables, and intraclass correlation coefficients (ICC) and Bland-Altman plots with Limits of Agreement (LoA) were calculated for continuous measures. Inter-rater percentage agreement for binary data ranged from 59.5% to 100%. Kappa ranged from 0.06 to 1.00. Kappa ≥ 0.40 was seen for elbow, thumb, fifth finger and trunk/hip flexion hypermobility, pain response in inter-segmental mobility and end range pain in lumbar flexion and extension. For continuous data, ICCs ranged from 0.40 to 0.95. Only forward flexion as measured by finger-to-floor distance reached an acceptable ICC (≥ 0.75). Overall, results for intra-rater reliability were better than for inter-rater reliability, but for both components the LoA were quite wide compared with the range of assessments. Some clinical tests showed good, and some tests poor, reliability when applied in a spinal screening of adolescents. The results could probably be improved by additional training and further test standardization. This is the first step in evaluating the value of these tests for the spinal screening of adolescents. Future research should determine the association between these tests and current and/or future neck and back pain.
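The Bland-Altman limits of agreement mentioned in the abstract above are conventionally taken as the mean difference between two sets of measurements plus or minus 1.96 standard deviations of the differences. The sketch below assumes that conventional definition; the function name `limits_of_agreement` and the measurements are hypothetical and not drawn from the study.

```python
import statistics


def limits_of_agreement(first, second, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(differences)   # systematic difference between raters
    spread = statistics.stdev(differences)  # sample SD of the differences
    return bias, bias - z * spread, bias + z * spread


# Hypothetical finger-to-floor distances (cm) recorded by two raters.
rater_1 = [10, 12, 15, 20, 8]
rater_2 = [11, 11, 17, 18, 9]
bias, lower, upper = limits_of_agreement(rater_1, rater_2)
```

Wide limits relative to the range of the measurements, as the abstract describes, indicate that two raters can differ substantially on an individual subject even when the average bias is small.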
Poulos, Natalie S; Pasch, Keryn E
2015-07-01
Few studies of the food environment have collected primary data, and even fewer have reported reliability of the tool used. This study focused on the development of an innovative electronic data collection tool used to document outdoor food and beverage (FB) advertising and establishments near 43 middle and high schools in the Outdoor MEDIA Study. Tool development used GIS based mapping, an electronic data collection form on handheld devices, and an easily adaptable interface to efficiently collect primary data within the food environment. For the reliability study, two teams of data collectors documented all FB advertising and establishments within one half-mile of six middle schools. Inter-rater reliability was calculated overall and by advertisement or establishment category using percent agreement. A total of 824 advertisements (n=233), establishment advertisements (n=499), and establishments (n=92) were documented (range=8-229 per school). Overall inter-rater reliability of the developed tool ranged from 69-89% for advertisements and establishments. Results suggest that the developed tool is highly reliable and effective for documenting the outdoor FB environment. Copyright © 2015 Elsevier Ltd. All rights reserved. PMID: 26022774
Objective measurements of excess skin in post bariatric patients--inter-rater reliability.
Biörserud, Christina; Fagevik Olsén, Monika; Elander, Anna; Wiklund, Malin
2016-01-01
An ability to reliably assess excess skin after massive weight loss using well-described and transferrable methods is important. The aim of this trial was to evaluate inter-rater reliability of ptosis and circumference measurements in patients with excess skin after bariatric surgery. Twenty-five postbariatric patients were included in the study, and their excess skin was measured 18 months after surgery. A protocol was designed to measure excess skin in a standardised way. To evaluate the inter-rater reliability in the measuring protocol, all patients were measured twice, by a specialist nurse and a specialist physiotherapist. All circumference measurements on different body parts had an ICC > 0.9, indicating high reliability. Furthermore, all breast and abdominal ptosis measurements had high reliability. In contrast, visual evaluation of abdominal ptosis had poor reliability. Measurements of ptoses on different body parts had an ICC > 0.6. There were no systematic differences between the results of the two testers, except for measurements of the buttocks and maximal knee circumference. The measuring protocol presented in this study has high reliability and, therefore, represents a useful instrument to provide a consistent and objective assessment of excess skin in the postbariatric patient.
Chen, Hong-Lin; Cao, Ying-Juan; Zhang, Wei; Wang, Jing; Huai, Bao-Sha
2017-02-01
The inter-rater reliability of the Braden Scale is limited. We modified the scale into the Braden(ALB) scale by defining the nutrition subscale based on serum albumin, then assessed its validity and reliability in hospital patients. We designed a retrospective study for validity analysis, and a prospective study for reliability analysis. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used to evaluate the predictive validity. The intra-class correlation coefficient (ICC) was used to investigate the inter-rater reliability. Two thousand five hundred twenty-five patients were included for validity analysis; 76 patients (3.0%) developed a pressure ulcer. A positive correlation was found between serum albumin and nutrition score in the Braden scale (Spearman's coefficient 0.2203, P<0.0001). The AUCs for the Braden scale and the Braden(ALB) scale predicting pressure ulcer risk were 0.813 (95% CI 0.797-0.828; P<0.0001) and 0.859 (95% CI 0.845-0.872; P<0.0001), respectively. The AUC of the Braden(ALB) scale was higher than that of the Braden scale, but the difference did not reach statistical significance (z=1.860, P=0.0628). In different age subgroups, the Braden(ALB) scale also appeared more valid than the original Braden scale, but no statistically significant differences were found (P>0.05). The inter-rater reliability study showed that the ICC for the nutrition subscale increased by 45.9% and the ICC for the total score increased by 4.3%. The Braden(ALB) scale has validity similar to that of the original Braden scale in hospitalized patients. However, the inter-rater reliability was significantly increased. Copyright © 2016 Elsevier Inc. All rights reserved.
Mitchell, S A; Miles, C L; Brennan, L; Matthews, J
2010-02-01
Assessment of children's diets is problematic, typically relying on error-prone parent or child recall or reporting, or resource intensive direct observation. The School Food Checklist (SFC) is an objective instrument comprising 20 food and beverage categories designed to measure the foods contained in children's packed lunches. The present study aimed to assess intra-rater and inter-rater reliability of each of the food and beverage categories of the SFC for both in-school audits and photograph analysis of children's school lunches. Participants comprised 176 children aged 5-8 years from five primary schools in Northern Metropolitan Melbourne. The SFC was used to measure the foods contained in children's packed lunches in the school setting and using photographs. Photograph analysis was conducted by the auditors 2-3 months after completion of in-school audits. Both intra-rater [intra-class correlation coefficient (ICC) = 0.78-1] and inter-rater (ICC = 0.50-0.95) reliability analysis indicated strong agreement for in-school auditing. With the exception of the food category titled 'leftovers', there was strong intra-rater reliability for auditors' live audits and their analysis of photographs [ICC = 0.57-0.98 (Auditor 1); ICC = 0.72-0.90 (Auditor 2)], and strong inter-rater reliability for photograph analysis (ICC = 0.68-0.92). The SFC is a reliable method of measuring the foods and beverages contained in children's packed lunches when used in the school setting or for photograph analysis. This finding has broad implications, particularly for the use of photograph analysis, because this approach offers a convenient and cost effective method of measuring what food and beverages children bring to school in home packed lunches.
Clinico-radiological diagnosis and grading of rapidly progressive osteoarthritis of the hip.
Zazgyva, Ancuţa; Gurzu, Simona; Gergely, István; Jung, Ioan; Roman, Ciprian O; Pop, Tudor S
2017-03-01
Due to the current lack of standard definitions for rapidly progressive osteoarthritis of the hip (RPOH) in the literature, this observational study aimed to describe new diagnostic criteria and a grading system for the disease. From a consecutive series of patients undergoing total hip replacement, 2 groups were selected: 1 with RPOH and 1 with primary hip osteoarthritis (POH), and their clinical, paraclinical, and demographic data were compared. The newly proposed clinico-radiological diagnostic criteria are based on characteristics of pain, joint mobility, and radiological assessment. The radiological grading system's inter- and intraobserver reliability was assessed through serial evaluations by 2 blinded reviewers. From the total 863 cases, 82 cases (9.5%) of RPOH were identified and compared with 107 cases of POH. Mean age and disease bilaterality were similar, with a predominance of female patients in the RPOH group (P = 0.03). There were significant differences between the 2 groups in disease onset and aggravation, and intraoperative blood loss. The grading system showed significant inter- and intraobserver agreement (weighted kappa 0.93 and 0.89). Our study presents distinctive, easily recognizable clinico-radiological characteristics of RPOH and confirmed the inter- and intraobserver reliability of the newly proposed grading system.
Wright, F Virginia; Ryan, Jennifer; Brewer, Kelly
2010-01-01
To examine inter-rater, intra-rater and test-re-test reliability of the Community Balance and Mobility Scale (CB&M) and compare reliability in live vs videotape rating contexts for children with acquired brain injury (ABI). Repeated measures design. Seven physiotherapists (PTs) were trained as assessors. The primary assessor administered and scored baseline CB&M while the second assessor observed and scored independently (inter-rater reliability). Re-assessment occurred 3-10 days later by primary assessor (test-re-test reliability). Assessments were videotaped. There were 32 participants with ABI (mean age = 14 years 1 month (SD = 2 years 1 month)). Baseline mean scores were 67.4% (18.2) and 66.7% (18.3) for primary and second assessor, respectively. Primary assessors' re-test mean score was 69.3%. Inter-rater reliability ICC was 0.93 (95% confidence interval (CI) = 0.87-0.97). Test-re-test ICC was 0.90 (95%CI = 0.81-0.95) and Bland-Altman plot indicated greatest test-re-test differences for mid-range CB&M scores. Minimum detectable change (MDC₉₀) was 13.5% points. The CB&M showed excellent reliability in youth. Reliability was comparable for live and videotape rating approaches, meaning that the easier and less expensive live-rating can be recommended. Future work should focus on evaluation of responsiveness to change in rehabilitation centre and community intervention contexts.
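The abstract above reports a minimum detectable change (MDC₉₀) without stating how it was derived. A common approach, assumed here, takes the standard error of measurement as SEM = SD × √(1 − ICC) and MDC₉₀ = 1.645 × √2 × SEM. The sketch below applies that formula; the function name is hypothetical, and pairing the baseline SD of 18.3 with the test-re-test ICC of 0.90 is an assumption about which reported values belong together.

```python
import math


def minimum_detectable_change(sd, icc, z=1.645):
    """MDC from a score SD and a reliability coefficient (assumed formula).

    z = 1.645 corresponds to a 90% confidence level (MDC90).
    """
    sem = sd * math.sqrt(1.0 - icc)  # standard error of measurement
    return z * math.sqrt(2.0) * sem  # sqrt(2): error is present in both tests


# SD and ICC are values quoted in the abstract; their pairing is assumed.
mdc90 = minimum_detectable_change(sd=18.3, icc=0.90)
```

Under these assumptions the result is close to the 13.5 percentage points quoted in the abstract, although the authors' exact inputs are not given.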
Burke, Shane M; Hwang, Steven W; Mehan, William A; Bedi, Harprit S; Ogbuji, Richard; Riesenburger, Ron I
2016-07-01
Cross-specialty inter-rater reliability has not been explicitly reported for imaging characteristics that are thought to be important in lumbar intervertebral disc degeneration. Sufficient cross-specialty reliability is an essential consideration if radiographic stratification of symptomatic patients to specific treatment modalities is to ever be realized. Therefore the purpose of this study was to directly compare the assessment of such characteristics between neurosurgeons and neuroradiologists. Sixty consecutive patients with a diagnosis of lumbago and appropriate imaging were selected for inclusion. Lumbar MRI were evaluated using the Tufts Degenerative Disc Classification by two neurosurgeons and two neuroradiologists. Inter-rater reliability was assessed using Cohen's κ values both within and between specialties. A sensitivity analysis was performed for a modified grading system, which excluded high intensity zones (HIZ), due to poor cross-specialty inter-rater reliability of HIZ between specialties. The reliability of HIZ between neurosurgeons and neuroradiologists was fair in two of the four cross-specialty comparisons in this study (neurosurgeon 1 versus both radiologists κ=0.364 and κ=0.290). Removing HIZ from the classification improved inter-rater reliability for all comparisons within and between specialties (0.465⩽κ⩽0.576). In addition, intra-rater reliability remained in the moderate to substantial range (0.523⩽κ⩽0.649). Given our findings and corroboration with previous studies, identification of HIZ seems to have a markedly variable reliability. Thus we recommend modification of the original Tufts Degenerative Disc Classification by removing HIZ in order to make the overall grade provided by this classification more reproducible when scored by practitioners of different training backgrounds. Copyright © 2015 Elsevier Ltd. All rights reserved.
Prasad, M. Krishna; Udupa, K.; Kishore, K. R.; Thirthalli, J.; Sathyaprabha, T. N.; Gangadhar, B. N.
2009-01-01
Background: The Hamilton depression rating scale (Ham-D) is the most widely used clinician rating scale for depression. There has been no Indian study that has examined the inter-rater reliability (IRR) of video-recorded interviews of the 21-item Ham-D. Aim: To study the IRR of scoring video-recorded interviews for 21-item Ham-D. Materials and Methods: Eighteen subjects with major depressive disorder involved in a larger study were interviewed using the semi-structured clinical interview of the 21-item Ham-D by a primary rater after informed consent. These interviews were video-recorded and portions edited to ensure rater blinding. Subsequently, the video-recorded interviews were rated by a “blind” rater. Both rated the different sub-domains of Ham-D according to Rhoades and Overall (1983). IRR was evaluated using intra-class correlation coefficient. Results: Excellent IRR was observed (0.9891) between the two raters. This was true for each of the primary factors and super-factors. Conclusion: Video-recorded 21-item Ham-D has excellent IRR. Video-recorded interviews of Ham-D can be reliably used to blind raters in research. PMID:19881046
Eechaute, Christophe; Vaes, Peter; Van Aerschot, Lieve; Asman, Sara; Duquet, William
2007-01-18
The assessment of outcomes from the patient's perspective becomes more recognized in health care. Also in patients with chronic ankle instability, the degree of present impairments, disabilities and participation problems should be documented from the perspective of the patient. The decision about which patient-assessed instrument is most appropriate for clinical practice should be based upon systematic reviews. Only rating scales constructed for patients with acute ligament injuries were systematically reviewed in the past. The aim of this study was to review systematically the clinimetric qualities of patient-assessed instruments designed for patients with chronic ankle instability. A computerized literature search of Medline, Embase, Cinahl, Web of Science, Sport Discus and the Cochrane Controlled Trial Register was performed to identify eligible instruments. Two reviewers independently evaluated the clinimetric qualities of the selected instruments using a criteria list. The inter-observer reliability of both the selection procedure and the clinimetric evaluation was calculated using modified kappa coefficients. The inter-observer reliability of the selection procedure was excellent (k = .86). Four instruments met the eligibility criteria: the Ankle Joint Functional Assessment Tool (AJFAT), the Functional Ankle Outcome Score (FAOS), the Foot and Ankle Disability Index (FADI) and the Functional Ankle Ability Measure (FAAM). The inter-observer reliability of the quality assessment was substantial to excellent (k between .64 and .88). Test-retest reliability was demonstrated for the FAOS, the FADI and the FAAM but not for the AJFAT. The FAOS and the FAAM met the criteria for content validity and construct validity. For none of the studied instruments, the internal consistency was sufficiently demonstrated. The presence of floor- and ceiling effects was assessed for the FAOS but ceiling effects were present for all subscales. 
Responsiveness was demonstrated for the AJFAT, FADI and the FAAM. Only for the FAAM, a minimal clinical important difference (MCID) was presented. The FADI and the FAAM can be considered as the most appropriate, patient-assessed tools to quantify functional disabilities in patients with chronic ankle instability. The clinimetric qualities of the FAAM need to be further demonstrated in a specific population of patients with chronic ankle instability.
Konge, L; Vilmann, P; Clementsen, P; Annema, J T; Ringsted, C
2012-10-01
Fine-needle aspiration (FNA) guided by endoscopic ultrasonography (EUS) is important in mediastinal staging of non-small cell lung cancer (NSCLC). Training standards and implementation strategies of this technique are currently under discussion. The aim of this study was to explore the reliability and validity of a newly developed EUS Assessment Tool (EUSAT) designed to measure competence in EUS - FNA for mediastinal staging of NSCLC. A total of 30 patients with proven or suspected NSCLC underwent EUS - FNA for mediastinal staging by three trainees and three experienced physicians. Their performances were assessed prospectively by three experts in EUS under direct observation and again 2 months later in a blinded fashion using digital video-recordings. Based on the assessments, intra-rater reliability, inter-rater reliability, and construct validity were explored. The intra-rater reliability was good (Cronbach's α = 0.80), but comparison of results based on direct observations and blinded video-recordings indicated a significant bias favoring consultants (P = 0.022). Inter-rater reliability was very good (Cronbach's α = 0.93). However, one rater assessing five procedures or two raters each assessing four procedures were necessary to secure a generalizability coefficient of 0.80. The assessment tool demonstrated construct validity by discriminating between trainees and experienced physicians (P = 0.034). Competency in mediastinal staging of NSCLC using EUS and EUS - FNA can be assessed in a reliable and valid way using the EUSAT assessment tool. Measuring and defining competency and training requirements could improve EUS quality and benefit patient care. © Georg Thieme Verlag KG Stuttgart · New York.
van Vugt, Jeroen L A; Levolger, Stef; Gharbharan, Arvind; Koek, Marcel; Niessen, Wiro J; Burger, Jacobus W A; Willemsen, Sten P; de Bruin, Ron W F; IJzermans, Jan N M
2017-04-01
The association between body composition (e.g. sarcopenia or visceral obesity) and treatment outcomes, such as survival, using single-slice computed tomography (CT)-based measurements has recently been studied in various patient groups. These studies have been conducted with different software programmes, each with their specific characteristics, of which the inter-observer, intra-observer, and inter-software correlation are unknown. Therefore, a comparative study was performed. Fifty abdominal CT scans were randomly selected from 50 different patients and independently assessed by two observers. Cross-sectional muscle area (CSMA, i.e. rectus abdominis, oblique and transverse abdominal muscles, paraspinal muscles, and the psoas muscle), visceral adipose tissue area (VAT), and subcutaneous adipose tissue area (SAT) were segmented by using standard Hounsfield unit ranges and computed for regions of interest. The inter-software, intra-observer, and inter-observer agreement for CSMA, VAT, and SAT measurements using FatSeg, OsiriX, ImageJ, and sliceOmatic were calculated using intra-class correlation coefficients (ICCs) and Bland-Altman analyses. Cohen's κ was calculated for the agreement of sarcopenia and visceral obesity assessment. The Jaccard similarity coefficient was used to compare the similarity and diversity of measurements. Bland-Altman analyses and ICC indicated that the CSMA, VAT, and SAT measurements between the different software programmes were highly comparable (ICC 0.979-1.000, P < 0.001). All programmes adequately distinguished between the presence or absence of sarcopenia (κ = 0.88-0.96 for one observer and all κ = 1.00 for all comparisons of the other observer) and visceral obesity (all κ = 1.00). Furthermore, excellent intra-observer (ICC 0.999-1.000, P < 0.001) and inter-observer (ICC 0.998-0.999, P < 0.001) agreement for all software programmes were found. 
Accordingly, excellent Jaccard similarity coefficients were found for all comparisons (mean ≥ 0.964). FatSeg, OsiriX, ImageJ, and sliceOmatic showed an excellent agreement for CSMA, VAT, and SAT measurements on abdominal CT scans. Furthermore, excellent inter-observer and intra-observer agreement were achieved. Therefore, results of studies using these different software programmes can reliably be compared. © 2016 The Authors. Journal of Cachexia, Sarcopenia and Muscle published by John Wiley & Sons Ltd on behalf of the Society on Sarcopenia, Cachexia and Wasting Disorders.
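The Jaccard similarity coefficient used in the abstract above measures the overlap between two segmentations as the size of their intersection divided by the size of their union. A minimal sketch follows, representing each segmentation as a set of pixel coordinates; the function name and the toy masks are hypothetical and do not reflect how the compared software packages store their results.

```python
def jaccard_similarity(mask_a, mask_b):
    """Jaccard coefficient of two segmentations given as sets of pixels."""
    set_a, set_b = set(mask_a), set(mask_b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty masks are identical
    return len(set_a & set_b) / len(union)


# Hypothetical (row, column) pixels labelled as muscle by two programmes.
programme_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
programme_2 = {(0, 1), (1, 1), (1, 2)}
similarity = jaccard_similarity(programme_1, programme_2)
```

A value of 1 means that the two segmentations cover exactly the same pixels, so the mean coefficients of at least 0.964 reported above correspond to almost complete overlap.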
Using image J to document healing in ulcers of the foot in diabetes.
Jeffcoate, William J; Musgrove, Alison J; Lincoln, Nadina B
2017-12-01
The aim of the study was to assess the reliability of measuring the cross-sectional area of diabetic foot ulcers using Image J software. The inter- and intra-rater reliability of ulcer area measures were assessed using digital images of acetate tracings of ulcers of the foot affecting 31 participants in an off-loading randomised trial. Observations were made independently by five specialist podiatrists, one of whom was experienced in the use of Image J software and educated the other four in a single session. The mean (±SD) of the mean cross-sectional areas of the 31 ulcers determined independently by the five observers was 1386·7 (±22·7) mm². The correlation between all pairs of observers was >0·99 (P < 0·001). There was no significant difference overall between the five observers (ANOVA F = 1·538; P = 0·165) and no difference between any two (paired samples t-test, t = -0·787 to 1·396; P ≥ 0·088). The correlation between the areas determined by two observers on two occasions separated by not less than 1 week was very high (0·997 and 0·999; P < 0·001 and <0·001, respectively). The inter- and intra-rater reliability of measurements made with the Image J software is very high, with no evidence of a difference either between or within observers. This technique should be considered for both research and clinical use in order to document changes in ulcer area. © 2017 Medicalhelplines.com Inc and John Wiley & Sons Ltd.
Gorgos, Kara S; Wasylyk, Nicole T; Van Lunen, Bonnie L; Hoch, Matthew C
2014-04-01
Joint mobilizations are commonly used by clinicians to decrease pain and restore joint arthrokinematics following musculoskeletal injury. The force applied during a joint mobilization treatment is subjective to the individual clinician but may have an effect on patient outcomes. The purpose of this systematic review was to critically appraise and synthesize the studies which examined the reliability of clinicians' force application during joint mobilization. A systematic search of PubMed and EBSCO Host databases from inception to March 1, 2013 was conducted to identify studies assessing the reliability of force application during joint mobilizations. Two reviewers utilized the Quality Appraisal of Reliability Studies (QAREL) assessment tool to determine the quality of included studies. The relative reliability of the included studies was examined through intraclass correlation coefficients (ICC) to synthesize study findings. All results were collated qualitatively with a level of evidence approach. A total of seven studies met the eligibility criteria and were included. Five studies were included that assessed inter-clinician reliability, and six studies were included that assessed intra-clinician reliability. The overall level of evidence for inter-clinician reliability was strong for poor-to-moderate reliability (ICC = -0.04 to 0.70). The overall level of evidence for intra-clinician reliability was strong for good reliability (ICC = 0.75-0.99). This systematic review indicates there is variability in force application between clinicians but individual clinicians apply forces consistently. The results of this systematic review suggest innovative instructional methods are needed to improve consistency and validate the forces applied during joint mobilization treatments. This is particularly evident for improving the consistency of force application across clinicians. Copyright © 2014 Elsevier Ltd. All rights reserved.
Are photographic records reliable for orthodontic screening?
Mandall, N A
2002-06-01
The aim of the study was to evaluate the reliability of a panel of orthodontists for accepting new patient referrals based on clinical photographs. Eight orthodontists from Greater Manchester, Lancashire, Chester, and Derbyshire observed clinical photographs of 40 consecutive new patients attending the orthodontic department, Hope Hospital, Salford. They recorded whether or not they would accept the patient, as a new patient referral, in their department. Each consultant was asked to take into account factors, such as oral hygiene, dental development, and severity of the malocclusion. Kappa statistic for multiple-rater agreement and kappa statistic for intra-observer reliability were calculated. Inter-observer panel agreement for accepting new patient referrals based on photographic information was low (multiple rater kappa score 0.37). Intra-examiner agreement was better (kappa range 0.34-0.90). Clinician agreement for screening and accepting orthodontic referrals based on clinical photographs is comparable to that previously reported for other clinical decision making.
A tool to assess sex-gender when selecting health research projects.
Tomás, Concepción; Yago, Teresa; Eguiluz, Mercedes; Samitier, M A Luisa; Oliveros, Teresa; Palacios, Gemma
2015-04-01
To validate the questionnaire "Gender Perspective in Health Research" (GPIHR) to assess the inclusion of gender perspective in research projects. Validation study in two stages. Feasibility was analysed in the first, and reliability, internal consistency and validity in the second. Aragón Institute of Health Science, Aragón, Spain. GPIHR was applied to 118 research projects funded in national and international competitive tenders from 2003 to 2012. Analysis of inter- and intra-observer reliability with Kappa index and internal consistency with Cronbach's alpha. Content validity analysed through literature review and construct validity with an exploratory factor analysis. Validated GPIHR has 10 questions: 3 in the introduction, 1 for objectives, 3 for methodology and 3 for research purpose. Average time of application was 13 min. Inter-observer reliability (Kappa) varied between 0.35 and 0.94 and intra-observer between 0.40 and 0.94. Theoretical construct is supported in the literature. Factor analysis identifies three levels of GP inclusion: "difference by sex", "gender sensitive" and "feminist research" with an internal consistency of 0.64, 0.87 and 0.81, respectively, which explain 74.78% of variance. GPIHR questionnaire is a valid tool to assess GP and useful for those researchers who would like to include GP in their projects. Copyright © 2014 Elsevier España, S.L.U. All rights reserved.
Park, Yoon Soo; Hyderi, Abbas; Bordage, Georges; Xing, Kuan; Yudkowsky, Rachel
2016-10-01
Recent changes to the patient note (PN) format of the United States Medical Licensing Examination have challenged medical schools to improve the instruction and assessment of students taking the Step-2 clinical skills examination. The purpose of this study was to gather validity evidence regarding response process and internal structure, focusing on inter-rater reliability and generalizability, to determine whether a locally-developed PN scoring rubric and scoring guidelines could yield reproducible PN scores. A randomly selected subsample of historical data (post-encounter PN from 55 of 177 medical students) was rescored by six trained faculty raters in November-December 2014. Inter-rater reliability (% exact agreement and kappa) was calculated for five standardized patient cases administered in a local graduation competency examination. Generalizability studies were conducted to examine the overall reliability. Qualitative data were collected through surveys and a rater-debriefing meeting. The overall inter-rater reliability (weighted kappa) was .79 (Documentation = .63, Differential Diagnosis = .90, Justification = .48, and Workup = .54). The majority of score variance was due to case specificity (13 %) and case-task specificity (31 %), indicating differences in student performance by case and by case-task interactions. Variance associated with raters and its interactions were modest (<5 %). Raters felt that justification was the most difficult task to score and that having case and level-specific scoring guidelines during training was most helpful for calibration. The overall inter-rater reliability indicates high level of confidence in the consistency of note scores. Designs for scoring notes may optimize reliability by balancing the number of raters and cases.
The development and validation of a custom built device for assessing frontal knee joint laxity.
Ismail, Shiek Abdullah; Simic, Milena; Clarke, Jillian L; Lopes, Thiago Jambo Alves; Pappas, Evangelos
2017-12-01
This study reports the development and validation of a quantitative technique of assessing frontal knee joint laxity through a custom built device named KLICP. The objectives of this study were to determine: (i) the intra- and inter-rater reliability and (ii) the validity of the device when compared to real time ultrasound. Twenty-five participants had their frontal knee joint laxity assessed by the KLICP, by manual varus/valgus tests and by ultrasound. Two raters independently assessed laxity manually by three repeated measurements, repeated at least 48h later. Results were validated by comparing them to the medial and lateral joint space opening measured by the ultrasound. Intraclass correlation coefficients and standard error of measurement reliability were calculated. Pearson's correlation coefficients were calculated to determine the correlation between the KLICP and the joint space. Intra-rater reliability (intra-session) for each rater was good on both sessions (0.91-0.98), intra-rater reliability (inter-sessions) was moderate to good (0.62-0.87), and inter-rater reliability (intra-session) was good (0.75-0.80). There is low agreement for intra-rater (inter-session) and for inter-rater (intra-session) reliability. The KLICP measurement has a significant positive fair to moderate correlation to the ultrasound measurement at the left (r: 0.61, p: 0.01) and right (r: 0.48, p: 0.02) knee in the valgus direction and at the left (r: 0.51, p: 0.01) and right (r: 0.39, p: 0.05) knee in the varus direction. There is low agreement between the KLICP and the RTU. Reliability and agreement was good only when measured for intra-rater, within session. Copyright © 2017 Elsevier B.V. All rights reserved.
A Turkish Version of the Critical-Care Pain Observation Tool: Reliability and Validity Assessment.
Aktaş, Yeşim Yaman; Karabulut, Neziha
2017-08-01
The study aim was to evaluate the validity and reliability of the Critical-Care Pain Observation Tool in critically ill patients. A repeated measures design was used for the study. A convenience sample of 66 patients who had undergone open-heart surgery in the cardiovascular surgery intensive care unit in Ordu, Turkey, was recruited for the study. The patients were evaluated by using the Critical-Care Pain Observation Tool at rest, during a nociceptive procedure (suctioning), and 20 minutes after the procedure while they were conscious and intubated after surgery. The Turkish version of the Critical-Care Pain Observation Tool has shown statistically acceptable levels of validity and reliability. Inter-rater reliability was supported by moderate-to-high-weighted κ coefficients (weighted κ coefficient = 0.55 to 1.00). For concurrent validity, significant associations were found between the scores on the Critical-Care Pain Observation Tool and the Behavioral Pain Scale scores. Discriminant validity was also supported by higher scores during suctioning (a nociceptive procedure) versus non-nociceptive procedures. The internal consistency of the Critical-Care Pain Observation Tool was 0.72 during a nociceptive procedure and 0.71 during a non-nociceptive procedure. The validity and reliability of the Turkish version of the Critical-Care Pain Observation Tool was determined to be acceptable for pain assessment in critical care, especially for patients who cannot communicate verbally. Copyright © 2016 American Society of PeriAnesthesia Nurses. Published by Elsevier Inc. All rights reserved.
Westbrook, Johanna I; Ampt, Amanda
2009-04-01
Evidence regarding how health information technologies influence clinicians' patterns of work and support efficient practices is limited. Traditional paper-based data collection methods are unable to capture clinical work complexity and communication patterns. The use of electronic data collection tools for such studies is emerging yet is rarely assessed for reliability or validity. Our aim was to design, apply and test an observational method which incorporated the use of an electronic data collection tool for work measurement studies which would allow efficient, accurate and reliable data collection, and capture greater degrees of work complexity than current approaches. We developed an observational method and software for personal digital assistants (PDAs) which captures multiple dimensions of clinicians' work tasks, namely what task, with whom, and with what; tasks conducted in parallel (multi-tasking); interruptions and task duration. During field-testing over 7 months across four hospital wards, fifty-two nurses were observed for 250 h. Inter-rater reliability was tested and validity was measured by (i) assessing whether observational data reflected known differences in clinical role work tasks and (ii) by comparing observational data with participants' estimates of their task time distribution. Observers took 15-20 h of training to master the method and data collection process. Only 1% of tasks observed did not match the classification developed and were classified as 'other'. Inter-rater reliability scores of observers were maintained at over 85%. The results discriminated between the work patterns of enrolled and registered nurses consistent with differences in their roles. Survey data (n=27) revealed consistent ratings of tasks by nurses, and their rankings of most to least time-consuming tasks were significantly correlated with those derived from the observational data. 
Over 40% of nurses' time was spent in direct care or professional communication, with 11.8% of time spent multi-tasking. Nurses were interrupted approximately every 49 min. One quarter of interruptions occurred while nurses were preparing or administering medications. This method efficiently produces reliable and valid data. The multi-dimensional nature of the data collected provides greater insights into patterns of clinicians' work and communication than has previously been possible using other methods.
Green, Dido; Meroz, Anat; Margalit, Adi Edit; Ratzon, Navah Z
2012-11-01
This study examines a potential instrument for measurement of typing postures of children. This paper describes inter-rater, test-retest reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS), an observational measurement of postures and movements during keyboarding, for use with children. Two trained raters independently rated videos of 24 children (aged 7-10 years). Six children returned one week later to assess test-retest reliability. Concurrent validity was assessed by comparing ratings obtained using the K-PeCS to scores from a 3D motion analysis system. Inter-rater reliability was moderate to high for 12 out of 16 items (Kappa: 0.46 to 1.00; correlation coefficients: 0.77-0.95) and test-retest reliability varied across items (Kappa: 0.25 to 0.67; correlation coefficients: r = 0.20 to r = 0.95). Concurrent validity compared favourably across arm pathlength, wrist extension and ulnar deviation. In light of the limitations of other tools, the K-PeCS offers a fairly affordable, reliable and valid instrument to address the gap for measurement of typing styles of children, despite the shortcomings of some items. However, further research is required to refine the instrument for use in evaluating typing among children. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John
2012-01-01
Objective The objective of the study was to measure the reliability between examiners of three basic maneuvers of the Total Body Functional Profile© physical examination test. The hypothesis was musculoskeletal health care providers of different disciplines could reliably use the three basic maneuvers as part of the musculoskeletal physical examination. Design A prospective observational study was conducted. Twenty-eight adult volunteers were measured on both the left and right side by two independent raters on a single occasion. Setting The subjects were recruited through advertisements placed by the orthopedic department at a tertiary university. Participants 28 volunteers were recruited and completed the study. The volunteers were between the ages of 18 and 51 years of age, had no symptoms in the lower extremity or spine, had no previous history of surgery or tumor involving the lower extremity, and no medical conditions that would preclude participation. Assessment On a single occasion, two examiners per one volunteer were blinded to their own and each others' measurements. Each examiner assessed the distance of frontal and sagittal plane lunge and angle of motion for transverse plane testing. Main Outcome Measurements Inter-rater agreement is expressed with intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals (CIs). The difference between raters is reported with 95% CIs. Baseline demographics, UCLA, and Harris hip questionnaires were completed by all participants. Results The UCLA and Harris hip scores showed no significant activity restrictions or pain limitations in all participants. The inter-rater reliability for sagittal, frontal, and transverse plane matrix testing was good with ICCs of 0.86 (95% CI 0.77, 0.91), 0.90 (95% CI 0.84, 0.94), and 0.85 (95% CI 0.75, 0.91) respectively. 
The rater reliability between disciplines for transverse, sagittal and frontal plane matrix testing was good with ICCs of 0.89 (95% CI 0.80, 0.94), 0.88 (95% CI 0.79, 0.94), 0.90 (95% CI 0.81, 0.95). Conclusion The inter-rater reliability for three basic maneuvers of the Total Body Functional Profile© is good amongst musculoskeletal healthcare providers of different disciplines. These three maneuvers may be used consistently as part of the musculoskeletal physical examination. PMID:19627956
Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John
2009-07-01
The objective of the study was to measure the reliability between examiners of 3 basic maneuvers of the Total Body Functional Profile physical examination test. The hypothesis was musculoskeletal health care providers of different disciplines could reliably use the 3 basic maneuvers as part of the musculoskeletal physical examination. A prospective observational study was conducted. Twenty-eight adult volunteers were measured on both the left and right side by 2 independent raters on a single occasion. The subjects were recruited through advertisements placed by the orthopedic department at a tertiary university. Twenty-eight volunteers were recruited and completed the study. The volunteers were between the ages of 18 and 51 years of age, had no symptoms in the lower extremity or spine, had no previous history of surgery or tumor involving the lower extremity, and no medical conditions that would preclude participation. On a single occasion, 2 examiners per 1 volunteer were blinded to their own and each others' measurements. Each examiner assessed the distance of frontal and sagittal plane lunge and angle of motion for transverse plane testing. Inter-rater agreement is expressed with intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals (CIs). The difference between raters is reported with 95% CIs. Baseline demographics, University of California Los Angeles (UCLA), and Harris hip questionnaires were completed by all participants. The UCLA and Harris hip scores showed no significant activity restrictions or pain limitations in all participants. The inter-rater reliability for sagittal, frontal, and transverse plane matrix testing was good with ICCs of 0.86 (95% CI 0.77-0.91), 0.90 (95% CI 0.84-0.94), and 0.85 (95% CI 0.75-0.91), respectively. 
The rater reliability between disciplines for transverse, sagittal, and frontal plane matrix testing was good with ICCs of 0.89 (95% CI 0.80-0.94), 0.88 (95% CI 0.79-0.94), and 0.90 (95% CI 0.81-0.95), respectively. The inter-rater reliability for 3 basic maneuvers of the Total Body Functional Profile is good among musculoskeletal health care providers of different disciplines. These 3 maneuvers may be used consistently as part of the musculoskeletal physical examination.
Kvistgaard Olsen, Jack; Fener, Dilay Kesgin; Waehrens, Eva Elisabet; Wulf Christensen, Anton; Jespersen, Anders; Danneskiold-Samsøe, Bente; Bartels, Else Marie
2017-07-01
Computerized pneumatic cuff pressure algometry (CPA) using the DoloCuff is a new method for pain assessment. Intra- and inter-rater reliabilities have not yet been established. Our aim was to examine the inter- and intrarater reliabilities of DoloCuff measures in healthy subjects. Twenty healthy subjects (ages 20 to 29 years) were assessed three times at 24-hour intervals by two trained raters. Inter-rater reliability was established based on the first and second assessments, whereas intrarater reliability was based on the second and third assessments. Subjects were randomized 1:1 to first assessment at either rater 1 or rater 2. The variables of interest were pressure pain threshold (PT), pressure pain tolerance (PTol), and temporal summation index (TSI). Reliability was estimated by a two-way mixed intraclass correlation coefficient (ICC) absolute agreement analysis. Reliability was considered excellent if ICC > 0.75, fair to good if 0.4 < ICC < 0.75, and poor if ICC < 0.4. Bias and random errors between raters and assessments were evaluated using 95% confidence interval (CI) and Bland-Altman plots. Inter-rater reliability for PT, PTol, and TSI was 0.88 (95% CI: 0.69 to 0.95), 0.86 (95% CI: 0.65 to 0.95), and 0.81 (95% CI: 0.42 to 0.94), respectively. The intrarater reliability for PT, PTol, and TSI was 0.81 (95% CI: 0.53 to 0.92), 0.89 (95% CI: 0.74 to 0.96), and 0.75 (95% CI: 0.28 to 0.91), respectively. Inter-rater reliability was excellent for PT, PTol, and TSI. Similarly, the intrarater reliability for PT and PTol was excellent, while borderline excellent/good for TSI. Therefore, the DoloCuff can be used to obtain reliable measures of pressure pain parameters in healthy subjects. © 2016 World Institute of Pain.
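The interpretation bands quoted in the abstract above (excellent if ICC > 0.75, fair to good if 0.4 < ICC < 0.75, poor if ICC < 0.4) can be sketched as a small lookup. This is only an illustration, not the authors' code: the function name `classify_icc` is hypothetical, and because the stated bands leave the exact values 0.4 and 0.75 unassigned, returning "borderline" at those points is an assumption (it happens to match the abstract's description of the intrarater TSI value of 0.75).

```python
def classify_icc(icc: float) -> str:
    """Label an ICC using the cut-offs quoted in the abstract.

    The abstract does not say how the exact boundary values 0.4 and 0.75
    are treated; returning "borderline" there is an assumption.
    """
    if icc > 0.75:
        return "excellent"
    if icc < 0.4:
        return "poor"
    if icc == 0.4 or icc == 0.75:
        return "borderline"
    return "fair to good"


# Point estimates reported in the abstract (inter-rater, then intrarater).
reported = {
    "inter PT": 0.88, "inter PTol": 0.86, "inter TSI": 0.81,
    "intra PT": 0.81, "intra PTol": 0.89, "intra TSI": 0.75,
}
for name, value in reported.items():
    print(f"{name}: {value:.2f} -> {classify_icc(value)}")
```

The labels apply to point estimates only; several of the reported 95% confidence intervals extend well below 0.75, which a single label does not convey.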
Boe, S G; Dalton, B H; Harwood, B; Doherty, T J; Rice, C L
2009-05-01
To establish the inter-rater reliability of decomposition-based quantitative electromyography (DQEMG) derived motor unit number estimates (MUNEs) and quantitative motor unit (MU) analysis. Using DQEMG, two examiners independently obtained a sample of needle and surface-detected motor unit potentials (MUPs) from the tibialis anterior muscle from 10 subjects. Coupled with a maximal M wave, surface-detected MUPs were used to derive a MUNE for each subject and each examiner. Additionally, size-related parameters of the individual MUs were obtained following quantitative MUP analysis. Test-retest MUNE values were similar with high reliability observed between examiners (ICC=0.87). Additionally, MUNE variability from test-retest as quantified by a 95% confidence interval was relatively low (+/-28 MUs). Lastly, quantitative data pertaining to MU size, complexity and firing rate were similar between examiners. MUNEs and quantitative MU data can be obtained with high reliability by two independent examiners using DQEMG. Establishing the inter-rater reliability of MUNEs and quantitative MU analysis using DQEMG is central to the clinical applicability of the technique. In addition to assessing response to treatments over time, multiple clinicians may be involved in the longitudinal assessment of the MU pool of individuals with disorders of the central or peripheral nervous system.
Values of a Patient and Observer Scar Assessment Scale to Evaluate the Facial Skin Graft Scar.
Chae, Jin Kyung; Kim, Jeong Hee; Kim, Eun Jung; Park, Kun
2016-10-01
The patient and observer scar assessment scale (POSAS) recently emerged as a promising method, reflecting both the observer's and the patient's opinions in evaluating scars. This tool was shown to be consistent and reliable in burn scar assessment, but it has not been tested in the setting of skin graft scars in skin cancer patients. To evaluate facial skin graft scars using the POSAS and to compare the results with objective scar assessment tools. Twenty-three patients who were diagnosed with facial cutaneous malignancy and received skin grafts after Mohs micrographic surgery were recruited. Observer assessment was performed by three independent raters using the observer component of the POSAS and the Vancouver scar scale (VSS). Patient self-assessment was performed using the patient component of the POSAS. To quantify scar color and scar thickness more objectively, a spectrophotometer and ultrasonography were applied. Inter-observer reliability was substantial with both the VSS and the observer component of the POSAS (average-measure intraclass correlation coefficient, 0.76 and 0.80, respectively). The observer component consistently showed significant correlations with patients' ratings for the parameters of the POSAS (all p-values < 0.05). The correlation between subjective assessment using the POSAS and objective assessment using the spectrophotometer and ultrasonography was low. In facial skin graft scar assessment in skin cancer patients, the POSAS showed acceptable inter-observer reliability. This tool was more comprehensive and had higher correlation with the patient's opinion.
The psychometric properties of Observer OPTION(5), an observer measure of shared decision making.
Barr, Paul J; O'Malley, Alistair James; Tsulukidze, Maka; Gionfriddo, Michael R; Montori, Victor; Elwyn, Glyn
2015-08-01
Observer OPTION(5) was designed as a more efficient version of OPTION(12), the most commonly used measure of shared decision making (SDM). The current paper assesses the psychometric properties of OPTION(5). Two raters used OPTION(5) to rate recordings of clinical encounters from two previous patient decision aid (PDA) trials (n=201; n=110). A subsample was re-rated two weeks later. We assessed discriminative validity, inter-rater reliability, intra-rater reliability, and concurrent validity. OPTION(5) demonstrated discriminative validity, with increases in SDM between usual care and PDA arms. OPTION(5) also demonstrated concurrent validity with OPTION(12), r=0.61 (95%CI 0.54, 0.68) and intra-rater reliability, r=0.93 (0.83, 0.97). The mean difference in rater score was 8.89 (95% Credibility Interval, 7.5, 10.3), with intraclass correlation (ICC) of 0.67 (95% Credibility Interval, 0.51, 0.91) for the accuracy of rater scores and 0.70 (95% Credibility Interval, 0.56, 0.94) for the consistency of rater scores across encounters, indicating good inter-rater reliability. Raters reported lower cognitive burden when using OPTION(5) compared to OPTION(12). OPTION(5) is a brief, theoretically grounded observer measure of SDM with promising psychometric properties in this sample and low burden on raters. OPTION(5) has potential to provide reliable, valid assessment of SDM in clinical encounters. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
2011-01-01
Background A clinical study was conducted to determine the intra- and inter-rater reliability of digital scanning and the neutral suspension casting technique to measure six foot parameters. The neutral suspension casting technique is a commonly utilised method for obtaining a negative impression of the foot prior to orthotic fabrication. Digital scanning offers an alternative to the traditional plaster of Paris techniques. Methods Twenty-one healthy participants volunteered to take part in the study. Six casts and six digital scans were obtained from each participant by two raters of differing clinical experience. The foot parameters chosen for investigation were cast length (mm), forefoot width (mm), rearfoot width (mm), medial arch height (mm), lateral arch height (mm) and forefoot to rearfoot alignment (degrees). Intraclass correlation coefficients (ICC) with 95% confidence intervals (CI) were calculated to determine the intra- and inter-rater reliability. Measurement error was assessed through the calculation of the standard error of the measurement (SEM) and smallest real difference (SRD). Results ICC values for all foot parameters using digital scanning ranged between 0.81 and 0.99 for both intra- and inter-rater reliability. For the neutral suspension casting technique, inter-rater reliability values ranged from 0.57 to 0.99 and intra-rater reliability values ranged from 0.36 to 0.99 for rater 1 and 0.49 to 0.99 for rater 2. Conclusions The findings of this study indicate that digital scanning is a reliable technique, irrespective of clinical experience, with reduced measurement variability in all foot parameters investigated when compared to neutral suspension casting. PMID:21375757
Dichter, Martin Nikolaus; Schwab, Christian G G; Meyer, Gabriele; Bartholomeyczik, Sabine; Dortmann, Olga; Halek, Margareta
2014-05-01
Quality of life (Qol) is an increasingly used outcome measure in dementia research. The QUALIDEM is a dementia-specific and proxy-rated Qol instrument. We aimed to determine the inter-rater and intra-rater reliability in residents with dementia in German nursing homes. The QUALIDEM consists of nine subscales that were applied to a sample of 108 people with mild to severe dementia and six consecutive subscales that were applied to a sample of 53 people with very severe dementia. The proxy raters were 49 registered nurses and nursing assistants. Inter-rater and intra-rater reliability scores were calculated on the subscale and item level. None of the QUALIDEM subscales showed strong inter-rater reliability based on the single-measure Intra-Class Correlation Coefficient (ICC) for absolute agreement ≥ 0.70. Based on the average-measure ICC for four raters, eight subscales for people with mild to severe dementia (care relationship, positive affect, negative affect, restless tense behavior, social relations, social isolation, feeling at home and having something to do) and five subscales for very severe dementia (care relationship, negative affect, restless tense behavior, social relations and social isolation) yielded a strong inter-rater agreement (ICC: 0.72-0.86). All of the QUALIDEM subscales, regardless of dementia severity, showed strong intra-rater agreement. The ICC values ranged between 0.70 and 0.79 for people with mild to severe dementia and between 0.75 and 0.87 for people with very severe dementia. This study demonstrated insufficient inter-rater reliability and sufficient intra-rater reliability for all subscales of both versions of the German QUALIDEM. The degree of inter-rater reliability can be improved by collaborative Qol rating by more than one nurse. The development of a measurement manual with accurate item definitions and a standardized education program for proxy raters is recommended.
Children's Reaction to Types of Television. Technical Report No. 28.
ERIC Educational Resources Information Center
Hines, Brainard W.
An observational system having high inter-rater reliability and providing a reliable estimate of patterns of behavior across time periods is developed and tested for use in evaluating children's responses to a number of television styles and modes of presentation. This project was designed to meet three goals: first, to develop a valid and…
Diagnostic reliability of MMPI-2 computer-based test interpretations.
Pant, Hina; McCabe, Brian J; Deskovitz, Mark A; Weed, Nathan C; Williams, John E
2014-09-01
Reflecting the common use of the MMPI-2 to provide diagnostic considerations, computer-based test interpretations (CBTIs) also typically offer diagnostic suggestions. However, these diagnostic suggestions can sometimes be shown to vary widely across different CBTI programs even for identical MMPI-2 profiles. The present study evaluated the diagnostic reliability of 6 commercially available CBTIs using a 20-item Q-sort task developed for this study. Four raters each sorted diagnostic classifications based on these 6 CBTI reports for 20 MMPI-2 profiles. Two questions were addressed. First, do users of CBTIs understand the diagnostic information contained within the reports similarly? Overall, diagnostic sorts of the CBTIs showed moderate inter-interpreter diagnostic reliability (mean r = .56), with sorts for the 1/2/3 profile showing the highest inter-interpreter diagnostic reliability (mean r = .67). Second, do different CBTI programs vary with respect to diagnostic suggestions? It was found that diagnostic sorts of the CBTIs had a mean inter-CBTI diagnostic reliability of r = .56, indicating moderate but not strong agreement across CBTIs in terms of diagnostic suggestions. The strongest inter-CBTI diagnostic agreement was found for sorts of the 1/2/3 profile CBTIs (mean r = .71). Limitations and future directions are discussed. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Hanskamp-Sebregts, Mirelle; Zegers, Marieke; Vincent, Charles; van Gurp, Petra J; de Vet, Henrica C W; Wollersheim, Hub
2016-01-01
Objectives Record review is the most used method to quantify patient safety. We systematically reviewed the reliability and validity of adverse event detection with record review. Design A systematic review of the literature. Methods We searched PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Library from their inception through February 2015. We included all studies that aimed to describe the reliability and/or validity of record review. Two reviewers conducted data extraction. We pooled κ values (κ) and analysed the differences in subgroups according to number of reviewers, reviewer experience and training level, adjusted for the prevalence of adverse events. Results In 25 studies, the psychometric data of the Global Trigger Tool (GTT) and the Harvard Medical Practice Study (HMPS) were reported and 24 studies were included for statistical pooling. The inter-rater reliability of the GTT and HMPS showed a pooled κ of 0.65 and 0.55, respectively. The inter-rater agreement was statistically significantly higher when the group of reviewers within a study consisted of a maximum of five reviewers. We found no studies reporting on the validity of the GTT and HMPS. Conclusions The reliability of record review is moderate to substantial and improved when a small group of reviewers carried out record review. The validity of the record review method has never been evaluated, while clinical data registries, autopsy or direct observations of patient care are potential reference methods that can be used to test concurrent validity. PMID:27550650
Phythian, C J; Toft, N; Cripps, P J; Michalopoulou, E; Winter, A C; Jones, P H; Grove-White, D; Duncan, J S
2013-07-01
A scientific literature review and consensus of expert opinion used the welfare definitions provided by the Farm Animal Welfare Council (FAWC) Five Freedoms as the framework for selecting a set of animal-based indicators that were sensitive to the current on-farm welfare issues of young lambs (aged ≤ 6 weeks). Ten animal-based indicators assessed by observation - demeanour, response to stimulation, shivering, standing ability, posture, abdominal fill, body condition, lameness, eye condition and salivation - were tested as part of the objective of developing valid, reliable and feasible animal-based measures of lamb welfare. The indicators were independently tested on 966 young lambs from 17 sheep flocks across Northwest England and Wales during December 2008 to April 2009 by four trained observers. Inter-observer reliability was assessed using Fleiss's kappa (κ), and the pair-wise agreement with an experienced observer designated as the 'test standard observer' (TSO) was examined using Cohen's κ. Latent class analysis (LCA) estimated the sensitivity (Se) and specificity (Sp) of each observer without assuming a gold standard and predicted the Se and Sp of randomly selected observers who may apply the indicators in the future. Overall, good levels of inter-observer reliability, and high levels of Sp were identified for demeanour (κ = 0.54, Se ≥ 0.70, Sp ≥ 0.98), stimulation (κ = 0.57, Se = 0.30 to 0.77, Sp ≥ 0.98), shivering (κ = 0.55, Se = 0.37 to 0.85, Sp ≥ 0.99), standing ability (κ = 0.54, Se ≥ 0.80, Sp ≥ 0.99), posture (κ = 0.45, Se ≥ 0.56, Sp = 0.99), abdominal fill (κ = 0.44, Se = 0.39 to 0.98, Sp = 0.99), body condition (κ = 0.72, Se ⩾ 0.38 to 0.90, Sp = 0.99), lameness (κ = 0.68, Se > 0.73, Sp = 1.00), and eye condition (κ = 0.72, Se ≥ 0.86, Sp = 0.99). LCA predicted that randomly selected observers had Se > 0.77 (acceptable), and Sp ≥ 0.98 (high) for assessments of demeanour, lameness, abdominal fill, posture, body condition and eye condition. 
The diagnostic performance of some indicators was influenced by the composition of the study population, and it would be useful to test the indicators on lambs with a greater level of outcomes associated with poor welfare. The findings presented in this paper could be applied in the selection of valid, reliable and feasible indicators used for the purposes of on-farm assessments of lamb welfare.
Tollafield, David R
2017-01-01
The management of plantar corns and callus has a low cost-benefit with reduced prioritisation in healthcare. The distinction between types of keratin lesions that form corns and callus has attracted limited interest. Observation is imperative to improving diagnostic predictions and a number of studies point to some confusion as to how best to achieve this. The use of photographic observation has been proposed to improve our understanding of intractable keratin lesions. Students from a podiatry school reviewed photographs where plantar keratin lesions were divided into four nominal groups: light callus (Grade 1), heavy defined callus (Grade 2), concentric keratin plugs (Grade 3) and callus with deeper density changes under the forefoot (Grade 4). A group of 'experts' assigned from qualified podiatrists validated the observer-rated responses by the students. Cohen's weighted statistic (k) was used to measure inter-observer reliability. First year students (unskilled) performed less well when viewing photographs (k = 0.33) compared to third year students (semi-skilled, k = 0.62). The experts performed better than students (k = 0.88), providing consistency with wound care models in other studies. Improved clinical annotation of clinical features, supported by classification of keratin-based lesions, combined with patient outcome tools, could improve the scientific rationale to prioritise patient care. Problems associated with photographic assessment involve trying to differentiate similar lesions without the benefit of direct palpation. Direct observation of callus with and without debridement requires further investigation alongside the model proposed in this paper.
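As an illustration of the weighted statistic named in the abstract above, the following is a minimal sketch of Cohen's kappa for two raters grading the same items on an ordinal scale such as Grades 1 to 4. The abstract does not state which weighting scheme was used, so the linear disagreement weights here are an assumption, and the function name and example gradings are hypothetical.

```python
def cohen_kappa(rater_a, rater_b, categories, weighted=False):
    """Cohen's kappa for two raters; optionally linearly weighted.

    `categories` lists the ordinal grades in order. Linear weights are an
    assumption; the abstract does not name a weighting scheme.
    """
    n = len(rater_a)
    k = len(categories)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must grade the same non-empty item set")
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}

    def disagreement(i, j):
        # 0 on the diagonal; off-diagonal cost is 1 (unweighted)
        # or proportional to the distance between grades (linear).
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    # Cross-tabulate the two raters' grades.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        table[index[a]][index[b]] += 1
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = sum(disagreement(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k)) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - observed / expected


# Hypothetical gradings of six photographs, for illustration only.
student = [1, 2, 2, 3, 4, 4]
expert = [1, 2, 3, 3, 4, 3]
print(cohen_kappa(student, expert, [1, 2, 3, 4]))
print(cohen_kappa(student, expert, [1, 2, 3, 4], weighted=True))
```

With weighting, a disagreement between adjacent grades costs less than one between distant grades, which suits an ordered grading scheme.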
Allepuz, Alejandro; Espallargues, Mireia; Moharra, Montse; Comas, Mercè; Pons, Joan MV
2008-01-01
Background Prioritisation instruments were developed for patients on waiting lists for hip and knee arthroplasties (AI) and cataract surgery (CI). The aim of the study was to assess their convergent and discriminant validity and inter-observer reliability. Methods Multicentre validation study which included orthopaedic surgeons and ophthalmologists from 10 hospitals. Participating doctors were asked to include all eligible patients placed on the waiting list for the procedures under study during the medical visit. Doctors assessed patients' priority through a visual analogue scale (VAS) and administered the prioritisation instrument. Information on socio-demographic data and health-related quality of life (HRQOL) (HUI3, EQ-5D, WOMAC and VF-14) was obtained through a telephone interview with patients. The correlation coefficients between the prioritisation instrument score and VAS and HRQOL were calculated. For the reliability study a self-administered questionnaire, which included hypothetical patient scenarios, was sent via postal mail to the doctors. The priority of these scenarios was assessed through the prioritisation instrument. The intraclass correlation coefficient (ICC) between doctors was calculated. Results Correlations with VAS were strong for the AI (0.64, CI95%: 0.59–0.68) and for the CI (0.65, CI95%: 0.62–0.69), and moderate between the WOMAC and the AI (0.39, CI95%: 0.33–0.45) and the VF-14 and the CI (0.38, CI95%: 0.33–0.43). The results of the discriminant analysis were in general as expected. Inter-observer reliability was 0.79 (CI95%: 0.64–0.94) for the AI, and 0.79 (CI95%: 0.63–0.95) for the CI. Conclusion The results show acceptable validity and reliability of the prioritisation instruments in establishing priority for surgery. PMID:18397519
Identifying and classifying hyperostosis frontalis interna via computerized tomography.
May, Hila; Peled, Nathan; Dar, Gali; Hay, Ori; Abbas, Janan; Masharawi, Youssef; Hershkovitz, Israel
2010-12-01
The aim of this study was to recognize the radiological characteristics of hyperostosis frontalis interna (HFI) and to establish a valid and reliable method for its identification and classification. A reliability test was carried out on 27 individuals who had undergone a head computerized tomography (CT) scan. Intra-observer reliability was obtained by examining the images three times, by the same researcher, with a 2-week interval between each sample ranking. The inter-observer test was performed by three independent researchers. A validity test was carried out using two methods for identifying and classifying HFI: 46 cadaver skullcaps were ranked twice via computerized tomography scans and then by direct observation. Reliability and validity were calculated using the Kappa test (SPSS 15.0). Reliability tests of ranking HFI via CT scans demonstrated good results (K > 0.7). As for validity, a very good consensus was obtained between the CT and direct observation when moderate and advanced types of HFI were present (K = 0.82). The suggested classification method for HFI, using CT, demonstrated a sensitivity of 84%, specificity of 90.5%, and positive predictive value of 91.3%. In conclusion, volume rendering is a reliable and valid tool for identifying HFI. The suggested three-scale classification is most suitable for radiological diagnosis of the phenomenon. Considering the increasing awareness of HFI as an early indicator of a developing malady, this study may assist radiologists in identifying and classifying the phenomenon.
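The sensitivity, specificity and positive predictive value quoted above can be related to a 2×2 table of CT result against direct observation. The sketch below uses counts that are inferred, not reported: 21 true positives, 4 false negatives, 2 false positives and 19 true negatives are one split of the 46 skullcaps that reproduces all three percentages, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and PPV from the cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)   # share of true cases that the test flags
    specificity = tn / (tn + fp)   # share of non-cases that the test clears
    ppv = tp / (tp + fp)           # share of positive calls that are correct
    return sensitivity, specificity, ppv

# Inferred counts (assumption): chosen so that the total is 46 skullcaps and
# the three reported percentages are reproduced. Not taken from the paper.
sens, spec, ppv = diagnostic_metrics(tp=21, fn=4, fp=2, tn=19)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, PPV {ppv:.1%}")
```

With only 46 specimens, moving a single skullcap between cells shifts each percentage by several points, which is worth keeping in mind when comparing such figures across studies.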
Inter-rater reliability of select physical examination procedures in patients with neck pain.
Hanney, William J; George, Steven Z; Kolber, Morey J; Young, Ian; Salamh, Paul A; Cleland, Joshua A
2014-07-01
This study evaluated the inter-rater reliability of select examination procedures in patients with neck pain (NP) conducted over a 24- to 48-h period. Twenty-two patients with mechanical NP participated in a standardized examination. One examiner performed standardized examination procedures and a second blinded examiner repeated the procedures 24-48 h later with no treatment administered between examinations. Inter-rater reliability was calculated with the Cohen Kappa and weighted Kappa for ordinal data while continuous level data were calculated using an intraclass correlation coefficient model 2,1 (ICC2,1). Coefficients for categorical variables ranged from poor to moderate agreement (-0.22 to 0.70 Kappa) and coefficients for continuous data ranged from slight to moderate (ICC2,1 0.28-0.74). The standard error of measurement for cervical range of motion ranged from 5.3° to 9.9° while the minimal detectable change ranged from 12.5° to 23.1°. This study is the first to report inter-rater reliability values for select components of the cervical examination in those patients with NP performed 24-48 h after the initial examination. There was considerably less reliability when compared to previous studies, thus clinicians should consider how the passage of time may influence variability in examination findings over a 24- to 48-h period.
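For readers unfamiliar with the statistics named above, the sketch below shows one common way to compute ICC(2,1) (two-way random effects, absolute agreement, single rater) from a subjects-by-raters table, and how a minimal detectable change (MDC) follows from a standard error of measurement (SEM). The ratings are illustrative numbers, not data from this study, and the default z value of 1.645 (a 90% level) is an assumption: the abstract does not state the confidence level, although the reported SEM and MDC pairs are roughly consistent with it.

```python
import math

def icc_2_1(ratings):
    """ICC(2,1) from a list of rows, one row per subject, one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

def mdc_from_sem(sem, z=1.645):
    """Minimal detectable change for a difference between two measurements."""
    return z * math.sqrt(2) * sem

# Illustrative ratings: six subjects scored by four raters (not study data).
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc = icc_2_1(example)
mdc_low, mdc_high = mdc_from_sem(5.3), mdc_from_sem(9.9)
print(round(icc, 2), round(mdc_low, 1), round(mdc_high, 1))
```

The low ICC in the illustrative table comes mainly from systematic differences between raters, which the absolute-agreement form penalises.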
Evaluating the reliability of an injury prevention screening tool: Test-retest study.
Gittelman, Michael A; Kincaid, Madeline; Denny, Sarah; Wervey Arnold, Melissa; FitzGerald, Michael; Carle, Adam C; Mara, Constance A
2016-10-01
A standardized injury prevention (IP) screening tool can identify family risks and allow pediatricians to address behaviors. To assess behavior changes on later screens, the tool must be reliable for an individual and ideally between household members. Little research has examined the reliability of safety screening tool questions. This study utilized test-retest reliability of parent responses on an existing IP questionnaire and also compared responses between household parents. Investigators recruited parents of children 0 to 1 year of age during admission to a tertiary care children's hospital. When both parents were present, one was chosen as the "primary" respondent. Primary respondents completed the 30-question IP screening tool after consent, and they were re-screened approximately 4 hours later to test individual reliability. The "second" parent, when present, only completed the tool once. All participants received a 10-dollar gift card. Cohen's Kappa was used to estimate test-retest reliability and inter-rater agreement. Standard test-retest criteria consider Kappa values: 0.0 to 0.40 poor to fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 as almost perfect reliability. One hundred five families participated, with five lost to follow-up. Thirty-two (30.5%) parent dyads completed the tool. Primary respondents were generally mothers (88%) and Caucasian (72%). Test-retest of the primary respondents showed their responses to be almost perfect; average 0.82 (SD = 0.13, range 0.49-1.00). Seventeen questions had almost perfect test-retest reliability and 11 had substantial reliability. However, inter-rater agreement between household members for 12 objective questions showed little agreement between responses; inter-rater agreement averaged 0.35 (SD = 0.34, range -0.19-1.00). One question had almost perfect inter-rater agreement and two had substantial inter-rater agreement. 
The IP screening tool used by a single individual had excellent test-retest reliability for nearly all questions. However, when a reporter changes from pre- to postintervention, differences may reflect poor reliability or different subjective experiences rather than true change.
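The interpretation bands quoted in this abstract can be written as a small lookup. This is only a sketch: the function name is hypothetical, and two details are assumptions because the stated bands do not cover them, namely that negative kappa values are grouped with "poor to fair" and that values between two listed bands (for example 0.405) fall into the higher one.

```python
def kappa_band(kappa):
    """Map a kappa value to the verbal bands listed in the abstract."""
    if kappa <= 0.40:
        # Assumption: negative values are grouped with the lowest band.
        return "poor to fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Averages reported in the abstract: test-retest 0.82, inter-rater 0.35.
for label, value in [("test-retest", 0.82), ("inter-rater", 0.35)]:
    print(label, value, kappa_band(value))
```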
Ihejirika, Rivka C; Thakore, Rachel V; Sathiyakumar, Vasanth; Ehrenfeld, Jesse M; Obremskey, William T; Sethi, Manish K
2015-04-01
Although recent literature has demonstrated the utility of the ASA score in predicting postoperative length of stay, complication risk and potential utilization of other hospital resources, the ASA score has been inconsistently assigned by anaesthesia providers. This study tested the reliability of assignment of the ASA score classification by both attending anaesthesiologists and anaesthesia residents specifically among the orthopaedic trauma patient population. Nine case-based scenarios were created involving preoperative patients with isolated operative orthopaedic trauma injuries. The cases were created and assigned a reference score by both an attending anaesthesiologist and orthopaedic trauma surgeon. Attending and resident anaesthesiologists were asked to assign an ASA score for each case. Rater versus reference and inter-rater agreement amongst respondents was then analyzed utilizing Fleiss's Kappa and weighted and unweighted Cohen's Kappa. Thirty-three individuals provided ASA scores for each of the scenarios. The average rater versus reference reliability was substantial (Kw=0.78, SD=0.131, 95% CI=0.73-0.83). The average rater versus reference Kuw was also substantial (Kuw=0.64, SD=0.21, 95% CI=0.56-0.71). The inter-rater reliability as evaluated by Fleiss's Kappa was moderate (K=0.51, p<.001). An inter-rater comparison within the group of attendings (K=0.50, p<.001) and within the group of residents (K=0.55, p<.001) showed moderate agreement in both groups. There was a significant increase in the level of inter-rater reliability from the self-reported 'very uncomfortable' participants to the 'very comfortable' participants (uncomfortable K=0.43, comfortable K=0.59, p<.001). This study shows substantial agreement strength for reliability of the ASA score among anaesthesiologists when evaluating orthopaedic trauma patients.
The significant increase in inter-rater reliability based on anaesthesiologists' comfort with the ASA scoring method implies a need for further evaluation of ASA assessment training and routine use on the ground. These findings support the use of the ASA score as a statistically reliable tool in orthopaedic trauma. Copyright © 2014 Elsevier Ltd. All rights reserved.
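Fleiss's kappa, used above for agreement among many raters, works from a table that counts how many raters chose each category for each case. The following is a minimal sketch with illustrative counts (ten cases, fourteen raters, five categories); none of the numbers come from this study, and the function name is hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss's kappa; counts[i][j] = raters assigning case i to category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])          # assumes every case has the same raters
    n_cats = len(counts[0])

    # Overall share of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / (n_cases * n_raters)
             for j in range(n_cats)]

    # Per-case agreement: proportion of rater pairs that agree on the case.
    p_case = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in counts]

    p_bar = sum(p_case) / n_cases      # mean observed agreement
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_bar - p_exp) / (1.0 - p_exp)

# Illustrative counts only; each row sums to 14 raters.
table = [
    [0, 0, 0, 0, 14], [0, 2, 6, 4, 2], [0, 0, 3, 5, 6], [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1], [7, 7, 0, 0, 0], [3, 2, 6, 3, 0], [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0], [0, 2, 2, 3, 7],
]
fk = fleiss_kappa(table)
print(round(fk, 3))
```

Unlike Cohen's kappa, this statistic does not track which rater gave which score, so it suits designs such as the one above where many respondents score the same set of cases.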
Ray, Stephen; Rayamajhi, Ajit; Bonnett, Laura J; Solomon, Tom; Kneen, Rachel; Griffiths, Michael J
2018-02-01
Background Acute encephalitis syndrome (AES) is a common cause of coma in Nepali children. The Glasgow coma scale (GCS) is used to assess the level of coma in these patients and predict outcome. Alternative coma scales may have better inter-rater reliability and prognostic value in encephalitis in Nepali children, but this has not been studied. The Adelaide coma scale (ACS), Blantyre coma scale (BCS) and the Alert, Verbal, Pain, Unresponsive scale (AVPU) are alternatives to the GCS which can be used. Methods Children aged 1-14 years who presented to Kanti Children's Hospital, Kathmandu with AES between September 2010 and November 2011 were recruited. All four coma scales (GCS, ACS, BCS and AVPU) were applied on admission, 48 h later and on discharge. Inter-rater reliability (unweighted kappa) was measured for each. Correlation and agreement between total coma score and outcome (Liverpool outcome score) was measured by Spearman's rank and Bland-Altman plot. The prognostic value of coma scales alone and in combination with physiological variables was investigated in a subgroup (n = 22). A multivariable logistic regression model was fitted by backward stepwise selection. Results Fifty children were recruited. Inter-rater reliability using the various scales was fair to moderate. However, the scales poorly predicted clinical outcome. Combining the scales with physiological parameters such as systolic blood pressure improved outcome prediction. Conclusion This is the first study to compare four coma scales in Nepali children with AES. The scales exhibited fair to moderate inter-rater reliability. However, the study is inadequately powered to answer the question on the relationship between coma scales and outcome. Further larger studies are required.
Systematic behavioural observation of executive performance after brain injury.
Lewis, Mark W; Babbage, Duncan R; Leathem, Janet M
2017-01-01
To develop an ecologically valid measure of executive functioning (i.e. Planning and Organization, Executive Memory, Initiation, Cognitive Shifting, Impulsivity, Sustained and Directed Attention, Error Detection, Error Correction and Time Management) during a functional chocolate brownie cooking task. In Study 1, the inter-rater reliability of a novel behavioural observation assessment method was assessed with 10 people with traumatic brain injury (TBI). In Study 2, 27 people with TBI and 16 healthy controls completed the functional task along with other measures of executive functioning to assess validity. Intraclass correlation coefficients for six of the nine aspects of executive functioning ranged from .54 to 1.00. Percentage agreements for the remaining aspects ranged from 70% to 90%. Significant and non-significant, moderate, correlations were found between the functional cooking task and standard neuropsychological measures. The healthy control group performed better than the TBI group in six areas (d = 0.56 to 1.23). In this initial trial of a novel assessment method, adequate inter-rater reliability was found. The measure was associated with standard neuropsychological measures, and our healthy control group performed better than the TBI group. The measure appears to be an ecologically valid measure of executive functioning.
Han, Paul K J; Joekes, Katherine; Mills, Greg; Gutheil, Caitlin; Smith, Kahsi; Cochran, Nancy E; Elwyn, Glyn
2016-12-01
To develop and evaluate a brief observational measure of clinical risk communication competence. A 4-item checklist-type measure, the BRISK (Brief Risk Information Skill) Scale, was developed by selecting and refining items from a more comprehensive measure of clinical risk communication competence. Six volunteer raters received brief training on the measure and then used the BRISK Scale to evaluate 52 video-recorded encounters between 2nd-year medical students and standardized patients conducted as part of an Observed Structured Clinical Examination (OSCE) involving a risk communication task. Internal consistency reliability, inter-rater reliability, and criterion validity were assessed. Raters reported no difficulties using the BRISK Scale; scores across all raters and subjects ranged from 0 to 16 with a mean score of 6.49 (SD=3.17). The BRISK Scale showed good internal consistency reliability (α=0.64), and inter-rater reliability at the scale level (Intraclass Correlation Coefficient (ICC)=0.79 for consistency, and 0.75 for absolute agreement) and individual-item level (ICC range: 0.62-0.91). Novice raters' BRISK Scale scores were highly correlated (r=0.84, p<0.01) with expert raters' scores on the Risk Communication Content measure, a more comprehensive measure of risk communication competence. The BRISK Scale is a promising new brief observational measure of clinical risk communication competence. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Just, Tino; Lankenau, Eva; Prall, Friedrich; Hüttmann, Gereon; Pau, Hans Wilhelm; Sommer, Konrad
2010-10-01
A newly developed microscope-based spectral-domain optical coherence tomography (SD-OCT) device and an endoscope-based time-domain OCT (TD-OCT) were used to assess the inter-rater reliability, sensitivity, specificity, and accuracy of OCT in evaluating benign and dysplastic laryngeal epithelial lesions. Prospective study. OCT during microlaryngoscopy was done on 35 patients with an endoscope-based TD-OCT, and on 26 patients by an SD-OCT system integrated into an operating microscope. Biopsies were taken from microscopically suspicious lesions allowing comparative study of OCT images and histology. Thickness of the epithelium was seen to be the main criterion for degree of dysplasia. The inter-rater reliability for two observers was found to be kappa = 0.74 (P < .001) for OCT. OCT provided test outcomes for differentiation between benign laryngeal lesions and dysplasia/CIS with sensitivity of 88%, specificity of 89%, PPV of 85%, NPV of 91%, and predictive accuracy of 88%. However, because of the limited penetration depth of the laser light primarily in hyperkeratotic lesions (thickness above 1.5 mm), the basal cell layer was no longer visible, precluding reliable assessment of such lesions. OCT allows for a fairly accurate assessment of benign and dysplastic laryngeal epithelial lesions and greatly facilitates the taking of precise biopsies. Laryngoscope, 2010.
Holland-Letz, Tim; Endres, Heinz G; Biedermann, Stefanie; Mahn, Matthias; Kunert, Joachim; Groh, Sabine; Pittrow, David; von Bilderling, Peter; Sternitzky, Reinhardt; Diehm, Curt
2007-05-01
The reliability of ankle-brachial index (ABI) measurements performed by different observer groups in primary care has not yet been determined. The aims of the study were to provide precise estimates for all effects influencing the variability of the ABI (patients' individual variability, intra- and inter-observer variability), with particular focus on the performance of different observer groups. Using a partially balanced incomplete block design, 144 unselected individuals aged ≥ 65 years underwent double ABI measurements by one vascular surgeon or vascular physician, one family physician and one nurse with training in Doppler sonography. Three groups comprising a total of 108 individuals were analyzed (only two with ABI < 0.90). Errors for two repeated measurements for all three observer groups did not differ (experts 8.5%, family physicians 7.7%, and nurses 7.5%, p = 0.39). There was no relevant bias among observer groups. Intra-observer variability expressed as standard deviation divided by the mean was 8%, and inter-observer variability was 9%. In conclusion, reproducibility of the ABI measurement was good in this cohort of elderly patients who almost all had values in the normal range. The mean error of 8-9% within or between observers is smaller than with established screening measures. Since there were no differences among observers with different training backgrounds, our study confirms the appropriateness of ABI assessment for screening peripheral arterial disease (PAD) and generalized atherosclerosis in the primary care setting. Given the importance of the early detection and management of PAD, this diagnostic tool should be used routinely as a standard for PAD screening. Additional studies will be required to confirm our observations in patients with PAD of various severities.
Kozel, Bernd; Grieser, Manuela; Abderhalden, Christoph; Cutcliffe, John R
2016-10-01
In comparison to the general population, the suicide rates of psychiatric inpatient populations in Germany and Switzerland are very high. An important preventive contribution to the lowering of the suicide rates in mental health care is to ensure that the risk of suicide of psychiatric inpatients is assessed as accurately as possible. While risk-assessment instruments can serve an important function in determining such risk, very few have been translated into German. Therefore, in the present study, we report on the German version of the Nurses' Global Assessment of Suicide Risk (NGASR) scale. After translating the original instrument into German and pretesting the German version, we tested the inter-rater reliability of the instrument. Twelve video case studies were evaluated by 13 raters with the NGASR scale in a 'laboratory' trial. In each case, the observers' agreement was calculated for the single items, the overall scale, the risk levels, and the sum scores. The statistical data analysis was conducted with kappa and AC1 statistics for dichotomous (items, scale) scales. A high-to-very high observers' agreement (AC1: 0.62-1.00, kappa: 0.00-1.00) was determined for 16 items of the German version of the NGASR scale. We conclude that the German version of the NGASR scale is a reliable instrument for evaluating risk factors for suicide. A reliable application in clinical practice appears to be enhanced by training in the use of the instrument and the right implementation instructions. © 2016 Australian College of Mental Health Nurses Inc.
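The wide kappa range reported above (0.00 to 1.00) alongside a much narrower AC1 range reflects a known property of kappa: when almost every rating falls in one category, chance-expected agreement approaches observed agreement and kappa collapses toward zero. The sketch below contrasts Cohen's kappa with Gwet's AC1 for two raters on a dichotomous item. The study used 13 raters, so this two-rater version and its data are simplifying assumptions for illustration only.

```python
def kappa_and_ac1(rater_a, rater_b):
    """Cohen's kappa and Gwet's AC1 for two raters scoring 0/1 items."""
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n             # share of 1s given by rater A
    p_b = sum(rater_b) / n             # share of 1s given by rater B

    # Kappa: chance agreement from the product of each rater's marginals.
    pe_kappa = p_a * p_b + (1 - p_a) * (1 - p_b)
    # AC1: chance agreement from the average prevalence of the category.
    prevalence = (p_a + p_b) / 2
    pe_ac1 = 2 * prevalence * (1 - prevalence)

    kappa = (p_obs - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1

# Hypothetical item: the raters agree on 19 of 20 cases, almost all rated 1.
rater_a = [1] * 20
rater_b = [1] * 19 + [0]
kappa, ac1 = kappa_and_ac1(rater_a, rater_b)
print(round(kappa, 2), round(ac1, 2))
```

In this made-up case the raters agree on 95% of cases, yet kappa is zero while AC1 stays high, which is the pattern the abstract's ranges suggest for some items.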
Ghobrial, Fady Emil Ibrahim; Eldin, Manal Salah; Razek, Ahmed Abdel Khalek Abdel; Atwan, Nadia Ibrahim; Shamaa, Sameh Sayed Ahmed
2017-01-01
To assess inter-observer agreement of revised RECIST criteria (version 1.1) for computed tomography assessment of hepatic metastases of breast cancer. A prospective study was conducted in 28 female patients with breast cancer and with at least one measurable metastatic lesion in the liver that was treated with 3 cycles of anthracycline-based chemotherapy. All patients underwent computed tomography of the abdomen with 64-row multi-detector CT at baseline and after 3 cycles of chemotherapy for response assessment. Image analysis was performed by 2 observers, based on the RECIST criteria (version 1.1). Computed tomography revealed partial response of hepatic metastases in 7 patients (25%) by one observer and in 10 patients (35.7%) by the other observer, with good inter-observer agreement (k=0.75, percent agreement of 89.29%). Stable disease was detected in 19 patients (67.8%) by one observer and in 16 patients (57.1%) by the other observer, with good agreement (k=0.774, percent agreement of 89.29%). Progressive disease was detected in 2 patients (7.2%) by both observers, with perfect agreement (k=1, percent agreement of 100%). The overall inter-observer agreement in the CT-based response assessment of hepatic metastasis between the two observers was good (k=0.793, percent agreement of 89.29%). We concluded that computed tomography is a reliable and reproducible imaging modality for response assessment of hepatic metastases of breast cancer according to the RECIST criteria (version 1.1).
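The agreement figures in this abstract can be checked against a cross-tabulation of the two observers. The abstract reports only each observer's totals, so the 3×3 table below is a reconstruction (an assumption): with 28 patients, 25 agreements (89.29%) and perfect agreement on progressive disease, the cell counts follow from the reported marginals. Function names are hypothetical.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa and observed agreement from a square table."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp), p_obs

def collapse(table, cat):
    """Reduce the table to 'category cat' versus 'any other category'."""
    k = len(table)
    a = table[cat][cat]
    b = sum(table[cat]) - a
    c = sum(table[i][cat] for i in range(k)) - a
    d = sum(sum(row) for row in table) - a - b - c
    return [[a, b], [c, d]]

# Reconstructed counts (assumption). Rows: observer 1, columns: observer 2,
# both in the order partial response, stable disease, progressive disease.
recist = [[7, 0, 0],
          [3, 16, 0],
          [0, 0, 2]]
overall_kappa, agreement = cohen_kappa(recist)
pr_kappa, _ = cohen_kappa(collapse(recist, 0))
sd_kappa, _ = cohen_kappa(collapse(recist, 1))
pd_kappa, _ = cohen_kappa(collapse(recist, 2))
print(round(overall_kappa, 3), round(100 * agreement, 2))
```

Under this reconstruction all three disagreements are cases that one observer called stable disease and the other called partial response.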
Alyusuf, Raja H.; Prasad, Kameshwar; Abdel Satir, Ali M.; Abalkhail, Ali A.; Arora, Roopa K.
2013-01-01
Background: The exponential use of the internet as a learning resource, coupled with the varied quality of many websites, has led to a need to identify suitable websites for teaching purposes. Aim: The aim of this study is to develop and to validate a tool which evaluates the quality of undergraduate medical educational websites, and to apply it to the field of pathology. Methods: A tool was devised through several steps of item generation, reduction, weightage, pilot testing, post-pilot modification of the tool and validating the tool. Tool validation included measurement of inter-observer reliability; and generation of criterion related, construct related and content related validity. The validated tool was subsequently tested by applying it to a population of pathology websites. Results and Discussion: Reliability testing showed a high internal consistency reliability (Cronbach's alpha = 0.92), high inter-observer reliability (Pearson's correlation r = 0.88), intraclass correlation coefficient = 0.85 and κ = 0.75. It showed high criterion related, construct related and content related validity. The tool showed moderately high concordance with the gold standard (κ = 0.61); 92.2% sensitivity, 67.8% specificity, 75.6% positive predictive value and 88.9% negative predictive value. The validated tool was applied to 278 websites; 29.9% were rated as recommended, 41.0% as recommended with caution and 29.1% as not recommended. Conclusion: A systematic tool was devised to evaluate the quality of websites for medical educational purposes. The tool was shown to yield reliable and valid inferences through its application to pathology websites. PMID:24392243
2013-01-01
Background This study investigates the reliability of muscle performance tests using cost- and time-effective methods similar to those used in clinical practice. When conducting reliability studies, great effort goes into standardising test procedures to facilitate a stable outcome. Therefore, several test trials are often performed. However, when muscle performance tests are applied in the clinical setting, clinicians often only conduct a muscle performance test once as repeated testing may produce fatigue and pain, and thus variation in test results. We aimed to investigate whether cervical muscle performance tests, which have shown promising psychometric properties, would remain reliable when examined under conditions similar to those of daily clinical practice. Methods The intra-rater (between-day) and inter-rater (within-day) reliability was assessed for five cervical muscle performance tests in patients with (n = 33) and without neck pain (n = 30). The five tests were joint position error, the cranio-cervical flexion test, the neck flexor muscle endurance test performed in supine and in a 45°-upright position and a new neck extensor test. Results Intra-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.48-0.82), the cranio-cervical flexion test (ICC ≥ 0.69), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.68) and in a 45°-upright position (ICC ≥ 0.41) with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement (ICC = 0.14-0.41). Likewise, inter-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.51-0.75), the cranio-cervical flexion test (ICC ≥ 0.85), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.70) and in a 45°-upright position (ICC ≥ 0.56). However, only slight to fair agreement was found for the neck extensor test (ICC = 0.19-0.25).
Conclusions Intra- and inter-rater reliability ranged from moderate to almost perfect agreement with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement. The significant variability observed suggests that tests like the neck extensor test and the neck flexor muscle endurance test performed in a 45°-upright position are too unstable to be used when evaluating neck muscle performance. PMID:24299621
Vrtovec, Tomaž; Pernuš, Franjo; Likar, Boštjan
2014-10-01
In this study, sagittal vertebral inclination (SVI) was systematically evaluated for 28 vertebrae (segments between T4 and L5) in magnetic resonance (MR) images of one normal and one scoliotic subject to compare the performance of manual and computerized measurements, and identify the most reproducible and reliable measurements. Manual measurements were performed by three observers, who identified on two occasions the distinctive anatomical landmarks required to evaluate SVI by six measurement methods, i.e. the superior tangents, inferior tangents, anterior tangents, posterior tangents, mid-endplate lines and mid-wall lines. Computerized measurements were performed by automatically evaluating SVI from the symmetry of vertebral anatomical structures in two-dimensional (2D) sagittal cross-sections and in three-dimensional (3D) volumetric images. The mid-wall lines and posterior tangents proved to be the manual measurements with the lowest intra-observer (standard deviation, SD, of 1.4° and 1.7°, respectively) and inter-observer variability (SD of 1.9° and 2.4°, respectively). The strongest inter-method agreement was found between the mid-wall lines and posterior tangents (SD of 2.0°). Computerized measurements in 2D and in 3D resulted in intra-observer (SD of 2.8° and 3.1°, respectively) and inter-observer variability (SD of 3.8° and 5.2°, respectively) that were comparable to those of the superior tangents (SD of 2.6° and 3.7°) and inferior tangents (SD of 3.2° and 4.5°), which represent standard Cobb angle measurements. It can be concluded that computerized measurements of SVI should be based on the inclination of vertebral body walls. Copyright © 2014 Elsevier Ltd. All rights reserved.
Taghipour, Morteza; Mohseni-Bandpei, Mohammad Ali; Behtash, Hamid; Abdollahi, Iraj; Rajabzadeh, Fatemeh; Pourahmadi, Mohammad Reza; Emami, Mahnaz
2018-04-24
Rehabilitative ultrasound (US) imaging is one of the popular methods for investigating muscle morphologic characteristics and dimensions in recent years. The reliability of this method has been investigated in different studies. As studies have been performed with different designs and quality, reported values of rehabilitative US have a wide range. The objective of this study was to systematically review the literature conducted on the reliability of rehabilitative US imaging for the assessment of deep abdominal and lumbar trunk muscle dimensions. The PubMed/MEDLINE, Scopus, Google Scholar, Science Direct, Embase, Physiotherapy Evidence, Ovid, and CINAHL databases were searched to identify original research articles conducted on the reliability of rehabilitative US imaging published from June 2007 to August 2017. The articles were qualitatively assessed; reliability data were extracted; and the methodological quality was evaluated by 2 independent reviewers. Of the 26 included studies, 16 were considered of high methodological quality. Except for 2 studies, all high-quality studies reported intraclass correlation coefficients (ICCs) for intra-rater reliability of 0.70 or greater. Also, ICCs reported for inter-rater reliability in high-quality studies were generally greater than 0.70. Among low-quality studies, reported ICCs ranged from 0.26 to 0.99 and 0.68 to 0.97 for intra- and inter-rater reliability, respectively. Also, the reported standard error of measurement and minimal detectable change for rehabilitative US were generally in an acceptable range. Generally, the results of the reviewed studies indicate that rehabilitative US imaging has good levels of both inter- and intra-rater reliability. © 2018 by the American Institute of Ultrasound in Medicine.
Inter-rater reliability of three standardized functional tests in patients with low back pain
Tidstrand, Johan; Horneij, Eva
2009-01-01
Background Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis, which several authors have characterised as delayed muscular responses, impaired postural control and impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests examining functional muscular coordination which are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of the present study was therefore to standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods Nineteen consecutive individuals, ten men and nine women, were included (mean age 42 years, SD ± 12 years). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift good (κ: 0.61) and moderate (κ: 0.47). Conclusion The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single limb stance and sitting on a Bobath ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good.
Validation of the tests in their ability to evaluate lumbar stability is required. PMID:19490644
Feasibility and inter-rater reliability of the ICU Mobility Scale.
Hodgson, Carol; Needham, Dale; Haines, Kimberley; Bailey, Michael; Ward, Alison; Harrold, Megan; Young, Paul; Zanni, Jennifer; Buhr, Heidi; Higgins, Alisa; Presneill, Jeff; Berney, Sue
2014-01-01
The objectives of this study were to develop a scale for measuring the highest level of mobility in adult ICU patients and to assess its feasibility and inter-rater reliability. Growing evidence supports the feasibility, safety and efficacy of early mobilization in the intensive care unit (ICU). However, there are no adequately validated tools to quickly, easily, and reliably describe the mobility milestones of adult patients in ICU. Identifying or developing such a tool is a priority for evaluating mobility and rehabilitation activities for research and clinical care purposes. This study was performed at two ICUs in Australia. Thirty ICU nursing and physiotherapy staff assessed the feasibility of the 'ICU Mobility Scale' (IMS) using a 10-item questionnaire. The inter-rater reliability of the IMS was assessed by 2 junior physical therapists, 2 senior physical therapists, and 16 nursing staff in 100 consecutive medical, surgical or trauma ICU patients. An 11-point IMS scale was developed based on multidisciplinary input. Participating clinicians reported that the scale was clear, with 95% of respondents reporting that it took <1 min to complete. The junior and senior physical therapists showed the highest inter-rater reliability with a weighted Kappa (95% confidence interval) of 0.83 (0.76-0.90), while the senior physical therapists and nurses and the junior physical therapists and nurses had a weighted Kappa of 0.72 (0.61-0.83) and 0.69 (0.56-0.81) respectively. The IMS is a feasible tool with strong inter-rater reliability for measuring the maximum level of mobility of adult patients in the ICU. Copyright © 2014 Elsevier Inc. All rights reserved.
Roberts, M J; Gale, T C E; Sice, P J A; Anderson, I R
2013-06-01
Selection to specialty training is a high-stakes assessment demanding valuable consultant time. In one initial entry level and two higher level anaesthesia selection centres, we investigated the feasibility of using staff participating in simulation scenarios, rather than observing consultants, to rate candidate performance. We compared participant and observer scores using four different outcomes: inter-rater reliability; score distributions; correlation of candidate rankings; and percentage of candidates whose selection might be affected by substituting participants' for observers' ratings. Inter-rater reliability between observers was good (correlation coefficient 0.73-0.96) but lower between participants (correlation coefficient 0.39-0.92), particularly at higher level where participants also rated candidates more favourably than did observers. Station rank orderings were strongly correlated between the rater groups at entry level (rho 0.81, p < 0.001) but weaker at the two higher level centres (rho 0.52, p = 0.018; rho 0.58, p = 0.001). Substituting participants' for observers' ratings had less effect once scores were combined with those from other selection centre stations. Selection decisions for 0-20% of candidates could have changed, depending on the numbers of training posts available. We conclude that using participating raters is feasible at initial entry level only. Anaesthesia © 2013 The Association of Anaesthetists of Great Britain and Ireland.
Redley, Bernice; Waugh, Rachael
2018-04-01
Nurse bedside handover quality is influenced by complex interactions related to the content, processes used and the work environment. Audit tools are seldom tested in 'real' settings. Examine the reliability, validity and usability of a quality improvement tool for audit of nurse bedside handover. Naturalistic, descriptive, mixed-methods. Six inpatient wards at a single large not-for-profit private health service in Victoria, Australia. Five nurse experts and 104 nurses involved in 199 change-of-shift bedside handovers. A focus group with experts and pilot test were used to examine content and face validity, and usability of the handover audit tool. The tool was examined for inter-rater reliability and usability using observation audits of handovers across six wards. Data were collected in 2013-2014. Two independent observers for 72 audits demonstrated acceptable inter-observer agreement for 27 (77%) items. Reliability was weak for items examining the handover environment. Seventeen items were not observed reflecting gaps in practices. Across 199 observation audits, gaps in nurse bedside handover practice most often related to process and environment, rather than content items. Usability was impacted by high observer burden, familiarity and non-specific illustrative behaviours. The reliability and validity of most items to audit handover content was acceptable. Gaps in practices for process and environment items were identified. Context specific exemplars and reducing the items used at each handover audit can enhance usability. Further research is needed to develop context specific exemplars and undertake additional reliability testing using a wide range of handover settings. Copyright © 2017 Elsevier Inc. All rights reserved.
The reliability of WorkWell Systems Functional Capacity Evaluation: a systematic review
2014-01-01
Background Functional capacity evaluation (FCE) determines a person’s ability to perform work-related tasks and is a major component of the rehabilitation process. The WorkWell Systems (WWS) FCE (formerly known as Isernhagen Work Systems FCE) is currently the most commonly used FCE tool in German rehabilitation centres. Our systematic review investigated the inter-rater, intra-rater and test-retest reliability of the WWS FCE. Methods We performed a systematic literature search of studies on the reliability of the WWS FCE and extracted item-specific measures of inter-rater, intra-rater and test-retest reliability from the identified studies. Intraclass correlation coefficients ≥ 0.75, percentages of agreement ≥ 80%, and kappa coefficients ≥ 0.60 were categorised as acceptable, otherwise they were considered non-acceptable. The extracted values were summarised for the five performance categories of the WWS FCE, and the results were classified as either consistent or inconsistent. Results From 11 identified studies, 150 item-specific reliability measures were extracted. 89% of the extracted inter-rater reliability measures, all of the intra-rater reliability measures and 96% of the test-retest reliability measures of the weight handling and strength tests had an acceptable level of reliability, compared to only 67% of the test-retest reliability measures of the posture/mobility tests and 56% of the test-retest reliability measures of the locomotion tests. Both of the extracted test-retest reliability measures of the balance test were acceptable. Conclusions Weight handling and strength tests were found to have consistently acceptable reliability. Further research is needed to explore the reliability of the other tests as inconsistent findings or a lack of data prevented definitive conclusions. PMID:24674029
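The acceptability rule described in the Methods above (ICC ≥ 0.75, percentage agreement ≥ 80%, kappa ≥ 0.60) can be written down as a short sketch. The thresholds come from the abstract; the names `THRESHOLDS`, `is_acceptable` and `share_acceptable`, and any values passed to them, are illustrative assumptions and not part of the review.

```python
# Cut-offs as stated in the abstract; the dictionary keys are hypothetical labels.
THRESHOLDS = {"icc": 0.75, "percent_agreement": 80.0, "kappa": 0.60}


def is_acceptable(measure, value):
    """Return True if a reliability value meets the cut-off for its measure."""
    return value >= THRESHOLDS[measure]


def share_acceptable(measures):
    """Percentage of (measure, value) pairs that meet their cut-off."""
    hits = sum(1 for measure, value in measures if is_acceptable(measure, value))
    return 100.0 * hits / len(measures)
```

Applied per performance category, `share_acceptable` would give the kind of percentage the Results quote (for example, the share of acceptable test-retest measures for locomotion tests), provided the extracted item-level values are available.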
Gilmore-Bykovskyi, Andrea L
2015-01-01
Mealtime behavioral symptoms are distressing and frequently interrupt eating for the individual experiencing them and others in the environment. A computer-assisted coding scheme was developed to measure caregiver person-centeredness and behavioral symptoms for nursing home residents with dementia during mealtime interactions. The purpose of this pilot study was to determine the feasibility, ease of use, and inter-observer reliability of the coding scheme, and to explore the clinical utility of the coding scheme. Trained observers coded 22 observations. Data collection procedures were acceptable to participants. Overall, the coding scheme proved to be feasible, easy to execute and yielded good to very good inter-observer agreement following observer re-training. The coding scheme captured clinically relevant, modifiable antecedents to mealtime behavioral symptoms, but would be enhanced by the inclusion of measures for resident engagement and consolidation of items for measuring caregiver person-centeredness that co-occurred and were difficult for observers to distinguish. Published by Elsevier Inc.
de la Cámara, Miguel Ángel; Higueras-Fresnillo, Sara; Martinez-Gomez, David; Veiga, Oscar L
2018-05-29
The inter-day reliability of the Intelligent Device for Energy Expenditure and Activity (IDEEA) has not been studied to date. The purpose of this study was to examine the inter-day variability and reliability of data collected with the IDEEA on two consecutive days, as well as to predict the number of days needed to provide a reliable estimate of several movement behaviors (walking and climbing stairs), non-movement behaviors (lying, reclining, sitting) and standing in older adults. The sample included 126 older adults (74 women) who wore the IDEEA for 48 h. Results showed low variability between the two days, and reliability ranged from moderate (ICC=0.34) to high (ICC=0.80) for most of the movement and non-movement behaviors analyzed. The Bland-Altman plots showed high-to-moderate agreement between days, and the Spearman-Brown formula estimated that from 1.2 to 9.1 days of monitoring with the IDEEA are needed to achieve ICCs≥0.70 in older adults (for sitting and climbing stairs, respectively).
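The Spearman-Brown step mentioned above can be sketched as follows. This is the standard prophecy formula, applied under the assumption that a single-day reliability is available for each behavior; the abstract does not report the per-behavior inputs, so the sketch does not attempt to reproduce the 1.2 and 9.1 day figures, and the function names are hypothetical.

```python
def spearman_brown(single_day_icc, n_days):
    """Projected reliability when averaging over n_days of monitoring."""
    return n_days * single_day_icc / (1 + (n_days - 1) * single_day_icc)


def days_needed(single_day_icc, target_icc=0.70):
    """Number of monitoring days needed to reach target_icc.

    Obtained by solving the Spearman-Brown formula for n_days.
    """
    return (target_icc / (1 - target_icc)) * ((1 - single_day_icc) / single_day_icc)
```

For an illustrative single-day ICC of 0.50, `days_needed(0.50)` gives roughly 2.3 days, and feeding that value back into `spearman_brown` recovers the 0.70 target.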
Arthroscopic Diagnosis of the Triangular Fibrocartilage Complex Foveal Tear: A Cadaver Assessment.
Trehan, Samir K; Wall, Lindley B; Calfee, Ryan P; Shen, Tony S; Dy, Christopher J; Yannascoli, Sarah M; Goldfarb, Charles A
2018-01-25
To determine whether the arthroscopic hook and trampoline tests are accurate and reliable diagnostic tests for foveal triangular fibrocartilage complex (TFCC) detachment. Wrist arthroscopy was performed on 10 cadaveric upper extremities. Arthroscopic hook and trampoline tests were performed and videos recorded (baseline). The deep foveal TFCC insertion was then sharply detached. Arthroscopic hook and trampoline tests were repeated. Subsequently, the foveal detachment was repaired via an ulnar tunnel technique and the hook test was repeated for a third time. Videos were independently reviewed at 2 time points by 2 fellowship-trained hand surgeons and 1 hand surgery fellow in a randomized and blinded fashion. Hook and trampoline tests were graded as positive or negative. Proportions of categorical variables were compared via 2-tailed Fisher exact test. Inter- and intraobserver reliabilities were assessed via Cohen kappa coefficient. The sensitivity and specificity of the hook test for foveal detachment diagnosis were 90% and 90%, respectively. There was 90% agreement among all 3 observers for the baseline and foveal detachment hook tests. Cohen kappa coefficients for the inter- and intraobserver reliabilities of the hook test were 0.87 and 0.81, respectively. Seventeen percent of trampoline tests were positive at baseline versus 43% after foveal detachment. The trampoline test had 45% agreement between the 3 observers. Cohen kappa coefficients for the inter- and intraobserver reliabilities of the trampoline test were 0.16 and 0.63, respectively. Following ulnar tunnel repair, 20% of hook tests were positive. The hook test is highly sensitive, specific, and reliable for the diagnosis of isolated TFCC foveal detachment. The trampoline test has insufficient reliability to assess foveal detachment. A TFCC foveal repair using an ulnar tunnel technique returns the hook test to baseline. 
The hook test is a sensitive, specific, and reliable test for the diagnosis of isolated TFCC foveal detachment. Copyright © 2017 American Society for Surgery of the Hand. Published by Elsevier Inc. All rights reserved.
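As a brief illustration of how the sensitivity and specificity figures above are defined, here is a minimal sketch. The counts in the usage note are hypothetical values chosen only to be consistent with the reported 90% and 90%; the abstract does not give the underlying 2x2 table.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity from the four cells of a 2x2 table.

    tp: detached fovea, test positive    fn: detached fovea, test negative
    tn: intact fovea, test negative      fp: intact fovea, test positive
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity
```

With hypothetical counts of 9 true positives, 1 false negative, 9 true negatives and 1 false positive, both values come out at 0.90.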
Reliability of doming and toe flexion testing to quantify foot muscle strength.
Ridge, Sarah Trager; Myrer, J William; Olsen, Mark T; Jurgensmeier, Kevin; Johnson, A Wayne
2017-01-01
Quantifying the strength of the intrinsic foot muscles has been a challenge for clinicians and researchers. The reliable measurement of this strength is important in order to assess weakness, which may contribute to a variety of functional issues in the foot and lower leg, including plantar fasciitis and hallux valgus. This study reports 3 novel methods for measuring foot strength - doming (previously unmeasured), hallux flexion, and flexion of the lesser toes. Twenty-one healthy volunteers performed the strength tests during two testing sessions which occurred one to five days apart. Each participant performed each series of strength tests (doming, hallux flexion, and lesser toe flexion) four times during the first testing session (twice with each of two raters) and two times during the second testing session (once with each rater). Intra-class correlation coefficients were calculated to test for reliability for the following comparisons: between raters during the same testing session on the same day (inter-rater, intra-day, intra-session), between raters on different days (inter-rater, inter-day, inter-session), between days for the same rater (intra-rater, inter-day, inter-session), and between sessions on the same day by the same rater (intra-rater, intra-day, inter-session). ICCs showed good to excellent reliability for all tests between days, raters, and sessions. Average doming strength was 99.96 ± 47.04 N. Average hallux flexion strength was 65.66 ± 24.5 N. Average lateral toe flexion was 50.96 ± 22.54 N. These simple tests using relatively low cost equipment can be used for research or clinical purposes. If repeated testing will be conducted on the same participant, it is suggested that the same researcher or clinician perform the testing each time for optimal reliability.
Zonnebeld, Niek; Maas, Tommy M G; Huberts, Wouter; van Loon, Magda M; Delhaas, Tammo; Tordoir, Jan H M
2017-11-01
Although clinical guidelines on arteriovenous fistula (AVF) creation advocate minimum luminal arterial and venous diameters, assessed by duplex ultrasonography (DUS), the clinical value of routine DUS examination is under debate. DUS might be an insufficiently repeatable and/or reproducible imaging modality because of its operator dependency. The present study aimed to assess intra- and inter-observer agreement of DUS examination in support of AVF surgery planning. Ten end stage renal disease patients were included, to assess intra- and inter-observer agreement of pre-operative DUS measurements. All measurements were performed by two trained and experienced vascular technicians, blinded to measurement readings. From the routine DUS protocol, representative measurements (venous diameters, and arterial diameters and volume flow in the upper arm and forearm) were selected. For intra-observer agreement the measurements were performed in triplicate, with the probe released from the skin between each. Intraclass correlation coefficients were calculated for intra- and inter-observer agreement, and Bland-Altman plots used to graphically display mean measurement differences and limits of agreement. Ten patients (6 male, 59.4±19.7 years) consented to participate, and all predefined measurements were obtained. Intraclass correlation coefficients for intra-observer agreement of diameter measurements were at least 0.90 (95% CI 0.74-0.97; radial artery). Inter-observer agreement was at least 0.83 (0.46-0.96; lateral diameter upper arm cephalic vein). The Bland-Altman plots showed acceptable mean measurement differences and limits of agreement. In experienced hands, excellent intra- and inter-observer agreement can be reached for the discrete pre-operative DUS measurements advocated in clinical guidelines. DUS is therefore a reliable imaging modality to support AVF surgery planning. The content of DUS protocols, however, needs further standardisation. 
Copyright © 2017 European Society for Vascular Surgery. Published by Elsevier Ltd. All rights reserved.
Grant, Andrew J; Vermunt, Jan D; Kinnersley, Paul; Houston, Helen
2007-01-01
Background Portfolio learning enables students to collect evidence of their learning. Component tasks making up a portfolio can be devised that relate directly to intended learning outcomes. Reflective tasks can stimulate students to recognise their own learning needs. Assessment of portfolios using a rating scale relating to intended learning outcomes offers high content validity. This study evaluated a reflective portfolio used during a final-year attachment in general practice (family medicine). Students were asked to evaluate the portfolio (which used significant event analysis as a basis for reflection) as a learning tool. The validity and reliability of the portfolio as an assessment tool were also measured. Methods 81 final-year medical students completed reflective significant event analyses as part of a portfolio created during a three-week attachment (clerkship) in general practice (family medicine). As well as two reflective significant event analyses each portfolio contained an audit and a health needs assessment. Portfolios were marked three times; by the student's GP teacher, the course organiser and by another teacher in the university department of general practice. Inter-rater reliability between pairs of markers was calculated. A questionnaire enabled the students' experience of portfolio learning to be determined. Results Benefits to learning from reflective learning were limited. Students said that they thought more about the patients they wrote up in significant event analyses but information as to the nature and effect of this was not forthcoming. Moderate inter-rater reliability (Spearman's Rho .65) was found between pairs of departmental raters dealing with larger numbers (20 – 60) of portfolios. Inter-rater reliability of marking involving GP tutors who only marked 1 – 3 portfolios was very low. Students rated highly their mentoring relationship with their GP teacher but found the portfolio tasks time-consuming. 
Conclusion The inter-rater reliability observed in this study should be viewed alongside the high validity afforded by the authenticity of the learning tasks (compared with a sample of a student's learning taken by an exam question). Validity is enhanced by the rating scale which directly connects the grade given with intended learning outcomes. The moderate inter-rater reliability may be increased if a portfolio is completed over a longer period of time and contains more component pieces of work. The questionnaire used in this study only accessed limited information about the effect of reflection on students' learning. Qualitative methods of evaluation would determine the students' experience in greater depth. It would be useful to evaluate the effects of reflective learning after students have had more time to get used to this unfamiliar method of learning and to overcome any problems in understanding the task. PMID:17397544
Reliability and Concurrent Validity of Dynamic Rotator Stability Test-A Cross Sectional study.
Binoy Mathew, K V; Eapen, Charu; Kumar, P Senthil
2012-01-01
To find the intra-rater and inter-rater reliability of the Dynamic Rotator Stability Test (DRST) and to find the concurrent validity of the DRST with the University of Pennsylvania Shoulder Score (PENN) Scale. Forty subjects of either gender, aged 18-70 years, with painful shoulder conditions of musculoskeletal origin were selected through convenience sampling. Tester 1 and tester 2 administered the DRST and PENN scale randomly. In a subgroup of 20 subjects the DRST was administered by both testers to determine inter-rater reliability. A 180° standard universal goniometer was used to take measurements. For intra-rater reliability, all the test variables showed highly significant correlation (p=.94-1). For inter-rater reliability, with tester 2, test variables such as position, ROM, force, direction of abnormal translation, pain during the test, and compensatory movement during the test were found to be significant (p=.71-1). Only some variables of the DRST showed significant correlation with the PENN scale (P=.320-.450). The Dynamic Rotator Stability Test has good intra-rater and moderate inter-rater reliability. Concurrent validity of the Dynamic Rotator Stability Test was found to be poor when compared to the PENN Shoulder Score.
Khan, Moin; Ranawat, Anil; Williams, Dale; Gandhi, Rajiv; Choudur, Hema; Parasu, Naveen; Simunovic, Nicole; Ayeni, Olufemi R
2015-09-01
Alpha and beta angles are commonly used radiographic measures to assess the sphericity of the proximal femur and distance between the pathologic head-neck junction and the acetabular rim, respectively. The aim of this study was to explore the relationship between these two measurements on frog-leg lateral hip radiographs. Fifty frog-leg lateral hip radiographs were evaluated by two orthopaedic surgeons and two radiologists. Each reviewer measured the alpha and beta angles on two separate occasions to determine the relationship between positive alpha and beta angles and the inter- and intra-observer reliability of these measurements. There was no significant association between positive alpha and beta angles, [kappa range -0.043 (95 % CI -0.17 to 0.086) to 0.54 (95 % CI 0.33-0.75)]. Intra-observer reliability was high [alpha angle intra-class correlation coefficient (ICC) range 0.74 (95 % CI 0.58-0.84) to 0.99 (95 % CI 0.98-0.99) and beta angle ICC range 0.86 (95 % CI 0.76-0.92) to 0.97 (95 % CI 0.95-0.98)]. There is no statistical or functional relationship between readings of positive alpha and beta angles. The radiographic measurements resulted in high intra-observer and fair-to-moderate inter-observer reliability. Results of this study suggest that the presence of a CAM lesion on lateral radiographs as suggested by a positive alpha angle does not necessitate a decrease in clearance between the femoral head and acetabular rim as measured by the beta angle and thus may not be the best measure of functional impingement. Understanding the relationship between these two aspects of femoroacetabular impingement improves a surgeon's ability to anticipate potential operative management.
Sharma, Shreela; Chuang, Ru-Jye; Skala, Katherine; Atteberry, Heather
2012-01-01
The purpose of this study is to describe the initial feasibility, reliability, and validity of an instrument to measure physical activity in preschoolers using direct observation. The System for Observing Fitness Instruction Time for Preschoolers was developed and tested among 3- to 6-year-old children over fall 2008 for feasibility and reliability (Phase I, n=67) and in fall 2009 for concurrent validity (Phase II, n=27). Phase I showed that preschoolers spent >75% of their active time at preschool in light physical activity. The mean inter-observer agreement scores were ≥.75 for physical activity level and type. Correlation coefficients, measuring construct validity between the lesson context and physical activity types with the activity levels, were moderately strong. Phase II showed moderately strong correlations ranging from .50 to .54 between the System for Observing Fitness Instruction Time for Preschoolers and Actigraph accelerometers for physical activity levels. The System for Observing Fitness Instruction Time for Preschoolers shows promising initial results as a new method for measuring physical activity among preschoolers. PMID:22485071
Knols, Ruud H; Aufdemkampe, Geert; de Bruin, Eling D; Uebelhart, Daniel; Aaronson, Neil K
2009-01-01
Background Hand-held dynamometry is a portable and inexpensive method to quantify muscle strength. To determine if muscle strength has changed, an examiner must know what part of the difference between a patient's pre-treatment and post-treatment measurements is attributable to real change, and what part is due to measurement error. This study aimed to determine the relative and absolute reliability of intra- and inter-observer strength measurements with a hand-held dynamometer (HHD). Methods Two observers performed maximum voluntary peak torque measurements (MVPT) for isometric knee extension in 24 patients with haematological malignancies. For each patient, the measurements were carried out on the same day. The main outcome measures were the intraclass correlation coefficient (ICC ± 95%CI), the standard error of measurement (SEM), the smallest detectable difference (SDD), the relative values as % of the grand mean of the SEM and SDD, and the limits of agreement for the intra- and inter-observer '3 repetition average' and the 'highest value of 3 MVPT' knee extension strength measures. Results The intra-observer ICCs were 0.94 for the average of 3 MVPT (95%CI: 0.86–0.97) and 0.86 for the highest value of 3 MVPT (95%CI: 0.71–0.94). The ICCs for the inter-observer measurements were 0.89 for the average of 3 MVPT (95%CI: 0.75–0.95) and 0.77 for the highest value of 3 MVPT (95%CI: 0.54–0.90). The SEMs for the intra-observer measurements were 6.22 Nm (3.98% of the grand mean (GM)) and 9.83 Nm (5.88% of GM). For the inter-observer measurements, the SEMs were 9.65 Nm (6.65% of GM) and 11.41 Nm (6.73% of GM). The SDDs for the generated parameters varied from 17.23 Nm (11.04% of GM) to 27.26 Nm (17.09% of GM) for intra-observer measurements, and 26.76 Nm (16.77% of GM) to 31.62 Nm (18.66% of GM) for inter-observer measurements, with similar results for the limits of agreement. 
Conclusion The results indicate that there is acceptable relative reliability for evaluating knee strength with a HHD, while the measurement error observed was modest. The HHD may be useful in detecting changes in knee extension strength at the individual patient level. PMID:19272149
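The relationship between the SEM and SDD values reported above can be sketched with two commonly used formulas. The abstract does not state how the SEM was derived, so the ICC-based expression below is an assumption; the SDD expression, 1.96 × √2 × SEM, is likewise assumed, although it does reproduce the reported SDDs to within rounding (for example 6.22 Nm gives about 17.24 Nm against the reported 17.23 Nm). Function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC.

    Assumption: SEM = SD * sqrt(1 - ICC), one common definition.
    """
    return sd * math.sqrt(1 - icc)


def sdd_from_sem(sem):
    """Smallest detectable difference at the 95% level for two measurements."""
    return 1.96 * math.sqrt(2) * sem


# Reported intra-observer SEM for the average of 3 MVPT, in Nm.
print(round(sdd_from_sem(6.22), 2))
```

In practice this means a change in an individual patient's knee extension torque smaller than the SDD cannot be distinguished from measurement error with 95% confidence.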
Inter- and intra-rater reliability of nasal auscultation in daycare children.
Santos, Rita; Silva Alexandrino, Ana; Tomé, David; Melo, Cristina; Mesquita Montes, António; Costa, Daniel; Pinto Ferreira, João
2018-02-01
The aim of this study was to assess nasal auscultation's intra- and inter-rater reliability and to analyze ear and respiratory clinical condition according to nasal auscultation. Cross-sectional study performed in 125 children aged up to 3 years old attending daycare centers. Nasal auscultation, tympanometry and Paediatric Respiratory Severity Score (PRSS) were applied to all children. Nasal sounds were classified by an expert panel in order to determine nasal auscultation's intra- and inter-rater reliability. The classification of nasal sounds was assessed against tympanometric and PRSS values. Nasal auscultation revealed substantial inter-rater (K=0.75) and intra-rater (K=0.69; K=0.61 and K=0.72) reliability. Children with a "non-obstructed" classification revealed a lower peak pressure (t=-3.599, P<0.001 in left ear; t=-2.258, P=0.026 in right ear) and a higher compliance (t=-2.728, P=0.007 in left ear; t=-3.830, P<0.001 in right ear) in both ears. There was an association between the classification of sounds and tympanogram types in both ears (X=11.437, P=0.003 in left ear; X=13.535, P=0.001 in right ear). Children with a "non-obstructed" classification had a healthier respiratory condition. Nasal auscultation revealed substantial intra- and inter-rater reliability. Nasal auscultation exhibited important differences according to ear and respiratory clinical conditions. Nasal auscultation in pediatrics seems to be an original topic as well as a simple method that can be used to identify early signs of nasopharyngeal obstruction.
The Integration of Research in Judgment and Decision Theory
1980-07-01
off at any one of a series of choice points in a basically linear, unidimensional, all-or-none series of relays is at least in part the result of the...Subjective and objective referents. An objective referent requires a series of observations in which inter-observer reliabilities approximate unity; as... series of studies by Brehmer (1980). More generally, research as far back as that of Krechevsky's in the 1930s was conducted precisely to show that
RELIABILITY OF ANKLE-FOOT MORPHOLOGY, MOBILITY, STRENGTH, AND MOTOR PERFORMANCE MEASURES.
Fraser, John J; Koldenhoven, Rachel M; Saliba, Susan A; Hertel, Jay
2017-12-01
Assessment of foot posture, morphology, intersegmental mobility, strength and motor control of the ankle-foot complex are commonly used clinically, but measurement properties of many assessments are unclear. To determine test-retest and inter-rater reliability, standard error of measurement, and minimal detectable change of morphology, joint excursion and play, strength, and motor control of the ankle-foot complex. Reliability study. 24 healthy, recreationally-active young adults without history of ankle-foot injury were assessed by two clinicians on two occasions, three to ten days apart. Measurement properties were assessed for foot morphology (foot posture index, total and truncated length, width, arch height), joint excursion (weight-bearing dorsiflexion, rearfoot and hallux goniometry, forefoot inclinometry, 1st metatarsal displacement) and joint play, strength (handheld dynamometry), and motor control rating during intrinsic foot muscle (IFM) exercises. Clinician order was randomized using a Latin Square. The clinicians performed independent examinations and did not confer on the findings for the duration of the study. Test-retest and inter-tester reliability and agreement were assessed using intraclass correlation coefficients (ICC2,k) and weighted kappa (Kw). Test-retest reliability ICC were as follows: morphology: .80-1.00, joint excursion: .58-.97, joint play: -.67-.84, strength: .67-.92, IFM motor rating: Kw -.01-.71. Inter-rater reliability ICC were as follows: morphology: .81-1.00, joint excursion: .32-.97, joint play: -1.06-1.00, strength: .53-.90, and IFM motor rating: Kw .02-.56. Measures of ankle-foot posture, morphology, joint excursion, and strength demonstrated fair to excellent test-retest and inter-rater reliability. Test-retest reliability for rating of perceived difficulty and motor performance was good to excellent for short-foot, toe-spread-out, and hallux exercises and poor to fair for lesser toe extension. 
Joint play measures had poor to fair reliability overall. The findings of this study should be considered when choosing methods of clinical assessment and outcome measures in practice and research. Level of evidence: 3.
Charlton, Paula C; Mentiplay, Benjamin F; Grimaldi, Alison; Pua, Yong-Hao; Clark, Ross A
2017-02-01
Firstly, to describe the reliability of assessing maximal isometric strength of the hip abductor and adductor musculature using a hand held dynamometry (HHD) protocol with simultaneous wireless surface electromyographic (sEMG) evaluation of the gluteus medius (GM) and adductor longus (AL). Secondly, to describe the correlation between isometric strength recorded with the HHD protocol and a laboratory standard isokinetic device. Reliability and correlational study. A sample of 24 elite, male, junior, rugby league athletes, age 16-20 years participated in repeated HHD and isometric Kin-Com (KC) strength testing with simultaneous sEMG assessment, on average (range) 6 (5-7) days apart by a single assessor. Strength tests included unilateral hip abduction (ABD) and adduction (ADD) and bilateral ADD assessed with squeeze (SQ) tests in 0 and 45° of hip flexion. HHD demonstrated good to excellent inter-session reliability for all outcome measures (ICC(2,1)=0.76-0.91) and good to excellent association with the laboratory reference KC (ICC(2,1)=0.80-0.88). Whilst intra-session, inter-trial reliability of EMG activation and co-activation outcome measures ranged from moderate to excellent (ICC(2,1)=0.70-0.94), inter-session reliability was poor (all ICC(2,1)<0.50). Isometric strength testing of the hip ABD and ADD musculature using HHD may be measured reliably in elite, junior rugby league athletes. Due to the poor inter-session reliability of sEMG measures, it is not recommended for athlete screening purposes if using the techniques implemented in this study. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Inter- and intra-rater reliability and agreement in determining subcutaneous tumour margins in dogs.
Ranganathan, B; Milovancev, M; Leeper, H; Townsend, K L; Bracha, S; Curran, K
2018-03-01
The objective of this prospective study was to evaluate agreement and reliability of calliper-based measurements of locally invasive subcutaneous malignant tumours in dogs. Four raters measured the longest diameter of 12 subcutaneous tumours (7 soft tissue sarcomas and 5 mast cell tumours) from 11 client-owned dogs during 3 randomized, blinded measurement trials, both pre- and post-sedation. Inter- and intra-rater reliability was evaluated using intra-class correlation coefficient (ICC) and agreement was evaluated using Bland-Altman plots. Inter- and intra-rater reliability was good (ICC range of 0.8694-0.89520) and excellent (ICC range of 0.9720-0.9966), respectively. For agreement calculations, an a priori clinically relevant limit of agreement of 10 mm was set. Inter- and intra-rater agreement was unacceptable with inter-rater limits of agreement ranging from 15.9 to 55.6 mm and intra-rater limit of agreement ranging from 11.9 to 28.1 mm. Review of the measurement trial photographs revealed that calliper orientation changes were frequent, occurring in 9/12 (75%) and 8/12 (67%) pre- and post-sedation cases. No significant correlation was found between inter-rater measurement standard deviations and calliper orientation changes or dog body condition score. These findings suggest veterinarians may have poor agreement in determining the gross edge of tumours, which is expected to introduce bias and inconsistency in tumour staging, assessing response to therapy, and surgical margin planning. Due to the potential consequences for veterinary cancer patients, future studies are needed to validate the present findings. © 2018 John Wiley & Sons Ltd.
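The contrast drawn above between high ICCs and wide limits of agreement is easier to see with a small Bland-Altman sketch. This is the usual bias ± 1.96 SD construction; the abstract does not describe the exact computation used in the study, and the function name and the measurements in the usage note are hypothetical.

```python
import statistics


def limits_of_agreement(measurements_a, measurements_b):
    """Bland-Altman bias and 95% limits of agreement for paired measurements.

    Returns (lower limit, bias, upper limit) in the units of the inputs.
    """
    diffs = [a - b for a, b in zip(measurements_a, measurements_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias - 1.96 * sd, bias, bias + 1.96 * sd
```

For hypothetical diameters of `[10, 20, 30, 40]` mm from one rater and `[12, 18, 33, 37]` mm from another, the bias is 0 mm and the limits are roughly ±5.8 mm. Two raters can therefore rank tumours almost identically, which yields a high ICC, while still differing by several millimetres on any single tumour.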
ERIC Educational Resources Information Center
Sheehan, Dwayne P.; Lafave, Mark R.; Katz, Larry
2011-01-01
This study was designed to test the intra- and inter-rater reliability of the University of North Carolina's Balance Error Scoring System in 9- and 10-year-old children. Additionally, a modified version of the Balance Error Scoring System was tested to determine if it was more sensitive in this population ("raw scores"). Forty-six…
Inter-rater agreement on PIVC-associated phlebitis signs, symptoms and scales.
Marsh, Nicole; Mihala, Gabor; Ray-Barruel, Gillian; Webster, Joan; Wallis, Marianne C; Rickard, Claire M
2015-10-01
Many peripheral intravenous catheter (PIVC) infusion phlebitis scales and definitions are used internationally, although no existing scale has demonstrated comprehensive reliability and validity. We examined inter-rater agreement between registered nurses on signs, symptoms and scales commonly used in phlebitis assessment. Seven PIVC-associated phlebitis signs/symptoms (pain, tenderness, swelling, erythema, palpable venous cord, purulent discharge and warmth) were observed daily by two raters (a research nurse and registered nurse). These data were modelled into phlebitis scores using 10 different tools. Proportions of agreement (e.g. positive, negative), observed and expected agreements, Cohen's kappa, the maximum achievable kappa, prevalence- and bias-adjusted kappa were calculated. Two hundred ten patients were recruited across three hospitals, with 247 sets of paired observations undertaken. The second rater was blinded to the first's findings. The Catney and Rittenberg scales were the most sensitive (phlebitis in >20% of observations), whereas the Curran, Lanbeck and Rickard scales were the most restrictive (≤2% phlebitis). Only tenderness and the Catney (one of pain, tenderness, erythema or palpable cord) and Rittenberg scales (one of erythema, swelling, tenderness or pain) had acceptable (more than two-thirds, 66.7%) levels of inter-rater agreement. Inter-rater agreement for phlebitis assessment signs/symptoms and scales is low. This likely contributes to the high degree of variability in phlebitis rates in literature. We recommend further research into assessment of infrequent signs/symptoms and the Catney or Rittenberg scales. New approaches to evaluating vein irritation that are valid, reliable and based on their ability to predict complications need exploration. © 2015 John Wiley & Sons, Ltd.
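The agreement statistics listed above (observed and expected agreement, positive and negative agreement, Cohen's kappa, maximum achievable kappa, and prevalence- and bias-adjusted kappa) can all be derived from one 2x2 table of paired ratings. The sketch below uses the standard textbook formulas; the function name and the cell labels `a`, `b`, `c`, `d` are assumptions for illustration, and the counts in any call are hypothetical rather than taken from the study.

```python
def agreement_stats(a, b, c, d):
    """Agreement statistics for two raters and a binary sign.

    a: both raters positive      b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive      d: both raters negative
    Assumes expected agreement is below 1 (otherwise kappa is undefined).
    """
    n = a + b + c + d
    po = (a + d) / n                                   # observed agreement
    r1_pos, r1_neg = a + b, c + d                      # rater 1 marginals
    r2_pos, r2_neg = a + c, b + d                      # rater 2 marginals
    pe = (r1_pos * r2_pos + r1_neg * r2_neg) / n ** 2  # chance agreement
    po_max = (min(r1_pos, r2_pos) + min(r1_neg, r2_neg)) / n
    return {
        "observed": po,
        "expected": pe,
        "kappa": (po - pe) / (1 - pe),
        "kappa_max": (po_max - pe) / (1 - pe),
        "pabak": 2 * po - 1,
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
    }
```

For rare signs such as purulent discharge, observed agreement can be very high while kappa stays low, which is why the prevalence- and bias-adjusted kappa is reported alongside it.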
Taffarel, Marilda Onghero; Luna, Stelio Pacca Loureiro; de Oliveira, Flavia Augusta; Cardoso, Guilherme Schiess; Alonso, Juliana de Moura; Pantoja, Jose Carlos; Brondani, Juliana Tabarelli; Love, Emma; Taylor, Polly; White, Kate; Murrell, Joanna C
2015-04-01
Quantification of pain plays a vital role in the diagnosis and management of pain in animals. In order to refine and validate an acute pain scale for horses, a prospective, randomized, blinded study was conducted. Twenty-four client-owned adult horses were recruited and allocated to one of the following four groups: anaesthesia only (GA); pre-emptive analgesia and anaesthesia (GAA); anaesthesia, castration and postoperative analgesia (GC); or pre-emptive analgesia, anaesthesia and castration (GCA). One investigator, unaware of the treatment group, assessed all horses at time-points before and after intervention and completed the pain scale. Videos were also obtained at these time-points and were evaluated by a further four blinded evaluators who also completed the scale. The data were used to investigate the relevance, specificity, criterion validity and inter- and intra-observer reliability of each item on the pain scale, and to evaluate construct validity and responsiveness of the scale. Construct validity was demonstrated by the observed differences in scores between the groups, four hours after anaesthetic recovery and before administration of systemic analgesia in the GC group. Inter- and intra-observer reliability for the items was only satisfactory. Subsequently the pain scale was refined, based on results for relevance, specificity and total item correlation. Scale refinement and exclusion of items that did not meet predefined requirements generated a selection of relevant pain behaviours in horses. After further validation for reliability, these may be used to evaluate pain under clinical and experimental conditions.
Bronchiolitis Score of Sant Joan de Déu: BROSJOD Score, validation and usefulness.
Balaguer, Mònica; Alejandre, Carme; Vila, David; Esteban, Elisabeth; Carrasco, Josep L; Cambra, Francisco José; Jordan, Iolanda
2017-04-01
To validate the bronchiolitis score of Sant Joan de Déu (BROSJOD) and to examine the previously defined scoring cutoff. Prospective, observational study. BROSJOD scoring was done by two independent physicians (at admission, 24 and 48 hr). Internal consistency of the score was assessed using Cronbach's α. To determine inter-rater reliability, the concordance correlation coefficient estimated as an intraclass correlation coefficient (CCC) and limits of agreement estimated as the 90% total deviation index (TDI) were estimated. An expert opinion was used to classify patients according to clinical severity. A validity analysis was conducted comparing the 3-level classification score to that expert opinion. Volume under the surface (VUS), predictive values, and probability of correct classification (PCC) were measured to assess discriminant validity. About 112 patients were recruited, 62 of them (55.4%) males. Median age: 52.5 days (IQR: 32.75-115.25). The admission Cronbach's α was 0.77 (CI95%: 0.71; 0.82) and at 24 hr it was 0.65 (CI95%: 0.48; 0.7). The inter-rater reliability analysis was: CCC at admission 0.96 (95%CI 0.94-0.97), at 24 h 0.77 (95%CI 0.65-0.86), and at 48 hr 0.94 (95%CI 0.94-0.97); TDI 90%: 1.6, 2.9, and 1.57, respectively. The discriminant validity at admission: VUS of 0.8 (95%CI 0.70-0.90), at 24 h 0.92 (95%CI 0.85-0.99), and at 48 hr 0.93 (95%CI 0.87-0.99). The predictive values and PCC values were within 38-100% depending on the level of clinical severity. There is a high inter-rater reliability, showing the BROSJOD score to be reliable and valid, even when different observers apply it. Pediatr Pulmonol. 2017;52:533-539. © 2016 Wiley Periodicals, Inc.
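For readers unfamiliar with the concordance correlation coefficient, the sketch below computes Lin's moment-based version for two raters. Note that the study above estimated the CCC as an intraclass correlation coefficient, so this is a swapped-in, simpler estimator shown only to illustrate the idea; the function name `lin_ccc` is an assumption.

```python
from statistics import mean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired scores.

    Uses population (1/n) variances and covariance, as in Lin's
    original moment estimator. Hypothetical helper for illustration.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalises both scatter (via covariance) and systematic shift (mx - my)
    return 2 * cov_xy / (var_x + var_y + (mx - my) ** 2)
```

Unlike a Pearson correlation, the CCC drops below 1 when one rater scores consistently higher than the other, even if the two sets of scores are perfectly linearly related.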
Chiu, Tsz-chun Roxy; Ngo, Hiu-ching; Lau, Lai-wa; Leung, King-wah; Lo, Man-him; Yu, Ho-fai; Ying, Michael
2016-01-01
Aims This study was undertaken to investigate the immediate effect of static stretching on normal Achilles tendon morphology and stiffness, and the difference in effect between dominant and non-dominant legs; and to evaluate inter-operator and intra-operator reliability of using shear-wave elastography in measuring Achilles tendon stiffness. Methods 20 healthy subjects (13 males, 7 females) were included in the study. Thickness, cross-sectional area and stiffness of Achilles tendons in both legs were measured before and after 5-min static stretching using grey-scale ultrasound and shear-wave elastography. Inter-operator and intra-operator reliability of tendon stiffness measurements of six operators were evaluated. Results The results showed that there was no significant change in the thickness and cross-sectional area of the Achilles tendon after static stretching in either the dominant or the non-dominant leg (p > 0.05). Tendon stiffness showed a significant increase in the non-dominant leg (p < 0.05) but not in the dominant leg (p > 0.05). The inter-operator reliability of shear-wave elastography measurements was 0.749 and the intra-operator reliability ranged from 0.751 to 0.941. Conclusion Shear-wave elastography is a useful and non-invasive imaging tool to assess the immediate stiffness change of the Achilles tendon in response to static stretching with high intra-operator and inter-operator reliability. PMID:27120097
Values of a Patient and Observer Scar Assessment Scale to Evaluate the Facial Skin Graft Scar
Chae, Jin Kyung; Kim, Eun Jung; Park, Kun
2016-01-01
Background The patient and observer scar assessment scale (POSAS) recently emerged as a promising method, reflecting both the observer's and the patient's opinions in evaluating scars. This tool was shown to be consistent and reliable in burn scar assessment, but it has not been tested in the setting of skin graft scars in skin cancer patients. Objective To evaluate facial skin graft scars using the POSAS and to compare the results with objective scar assessment tools. Methods Twenty-three patients who were diagnosed with facial cutaneous malignancy and received skin grafts after Mohs micrographic surgery were recruited. Observer assessment was performed by three independent raters using the observer component of the POSAS and the Vancouver scar scale (VSS). Patient self-assessment was performed using the patient component of the POSAS. To quantify scar color and scar thickness more objectively, a spectrophotometer and ultrasonography were applied. Results Inter-observer reliability was substantial with both the VSS and the observer component of the POSAS (average measure intraclass correlation coefficient, 0.76 and 0.80, respectively). The observer component consistently showed significant correlations with patients' ratings for the parameters of the POSAS (all p-values<0.05). The correlation between subjective assessment using the POSAS and objective assessment using spectrophotometer and ultrasonography was low. Conclusion In facial skin graft scar assessment in skin cancer patients, the POSAS showed acceptable inter-observer reliability. This tool was more comprehensive and had higher correlation with the patient's opinion. PMID:27746642
Self-audit of lockout/tagout in manufacturing workplaces: A pilot study.
Yamin, Samuel C; Parker, David L; Xi, Min; Stanley, Rodney
2017-05-01
Occupational health and safety (OHS) self-auditing is a common practice in industrial workplaces. However, few audit instruments have been tested for inter-rater reliability and accuracy. A lockout/tagout (LOTO) self-audit checklist was developed for use in manufacturing enterprises. It was tested for inter-rater reliability and accuracy using responses of business self-auditors and external auditors. Inter-rater reliability at ten businesses was excellent (κ = 0.84). Business self-auditors had high (100%) accuracy in identifying elements of LOTO practice that were present as well as those that were absent (81% accuracy). Reliability and accuracy increased further when problematic checklist questions were removed from the analysis. Results indicate that the LOTO self-audit checklist would be useful in manufacturing firms' efforts to assess and improve their LOTO programs. In addition, a reliable self-audit instrument removes the need for external auditors to visit worksites, thereby expanding capacity for outreach and intervention while minimizing costs. © 2017 Wiley Periodicals, Inc.
Kim, Min-Beom; Ban, Jae Ho
2012-12-01
To evaluate the test-retest reliability and convenience of simultaneous binaural acoustic-evoked ocular vestibular evoked myogenic potentials (oVEMP). Thirteen healthy subjects with no history of ear diseases participated in this study. All subjects underwent oVEMP test with both separated monaural acoustic stimulation and simultaneous binaural acoustic stimulation. For evaluating test-retest reliability, three repetitive sessions were performed in each ear for calculating the intraclass correlation coefficient (ICC) for both monaural and binaural tests. We analyzed data from the biphasic n1-p1 complex, such as latency of peak, inter-peak amplitude, and asymmetric ratio of amplitude in both ears. Finally, we checked the total time required to complete each test for evaluating test convenience. No significant difference was observed in amplitude and asymmetric ratio in comparison between monaural and binaural oVEMP. However, latency was slightly delayed in binaural oVEMP. In test-retest reliability analysis, binaural oVEMP showed excellent ICC values ranging from 0.68 to 0.98 in latency, asymmetric ratio, and inter-peak amplitude. Additionally, the test time was shorter in binaural than monaural oVEMP. oVEMP elicited from binaural acoustic stimulation yields similar satisfactory results as monaural stimulation. Further, excellent test-retest reliability and shorter test time were achieved in binaural than in monaural oVEMP.
Validity and reliability of the Diagnostic Adaptive Behaviour Scale.
Tassé, M J; Schalock, R L; Balboni, G; Spreat, S; Navas, P
2016-01-01
The Diagnostic Adaptive Behaviour Scale (DABS) is a new standardised adaptive behaviour measure that provides information for evaluating limitations in adaptive behaviour for the purpose of determining a diagnosis of intellectual disability. This article presents validity evidence and reliability data for the DABS. Validity evidence was based on comparing DABS scores with scores obtained on the Vineland Adaptive Behaviour Scale, second edition. The stability of the test scores was measured using a test and retest, and inter-rater reliability was assessed by computing the inter-respondent concordance. The DABS convergent validity coefficients ranged from 0.70 to 0.84, while the test-retest reliability coefficients ranged from 0.78 to 0.95, and the inter-rater concordance as measured by intraclass correlation coefficients ranged from 0.61 to 0.87. All obtained validity and reliability indicators were strong and comparable with the validity and reliability coefficients of the most commonly used adaptive behaviour instruments. These results and the advantages of the DABS for clinician and researcher use are discussed. © 2015 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Yoshida, Masahito; Collin, Phillipe; Josseaume, Thierry; Lädermann, Alexandre; Goto, Hideyuki; Sugimoto, Katumasa; Otsuka, Takanobu
2018-01-01
Magnetic resonance (MR) imaging is common in structural and qualitative assessment of the rotator cuff post-operatively. Rotator cuff integrity has been thought to be associated with clinical outcome. The purpose of this study was to evaluate the inter-observer reliability of cuff integrity (Sugaya's classification) and assess the correlation between Sugaya's classification and the clinical outcome. It was hypothesized that Sugaya's classification would show good reliability and good correlation with the clinical outcome. Post-operative MR images were taken two years post-operatively, following arthroscopic rotator cuff repair. For assessment of inter-rater reliability, all radiographic evaluations for the supraspinatus muscle were done by two orthopaedic surgeons and one radiologist. Rotator cuff integrity was classified into five categories, according to Sugaya's classification. Fatty infiltration was graded into four categories, based on the Fuchs' classification grading system. Muscle hypotrophy was graded into four grades, according to the scale proposed by Warner. The clinical outcome was assessed according to the Constant scoring system pre-operatively and 2 years post-operatively. Of the sixty-two consecutive patients with full-thickness rotator cuff tears, fifty-two patients were reviewed in this study. These subjects included twenty-three men and twenty-nine women, with an average age of fifty-seven years. In terms of the inter-rater reliability between orthopaedic surgeons, Sugaya's classification showed the highest agreement [ICC (2,1) = 0.82] for rotator cuff integrity. The grades of fatty infiltration and muscle atrophy demonstrated good agreement (0.722 and 0.758, respectively). With regard to the inter-rater reliability between orthopaedic surgeon and radiologist, Sugaya's classification showed good reliability [ICC (2,1) = 0.70].
On the other hand, fatty infiltration and muscle hypotrophy classifications demonstrated fair and moderate agreement [ICC (2,1) = 0.39 and 0.49]. Although no significant correlation was found between overall post-operative Constant score and Sugaya's classification, Sugaya's classification indicated significant correlation with the muscle strength score. Sugaya's classification showed repeatability and good agreement between the orthopaedist and radiologist, who are involved in the patient care for the rotator cuff tear. Common classification of rotator cuff integrity with good reliability will give appropriate information for clinicians to improve the patient care of the rotator cuff tear. This classification also would be helpful to predict the strength of arm abduction in the scapular plane. IV.
Razek, Ahmed Abdel Khalek Abdel; Shamaa, Sameh; Lattif, Mahmoud Abdel; Yousef, Hanan Hamid
2017-01-01
To assess inter-observer agreement of whole-body computed tomography (WBCT) in staging and response assessment in lymphoma according to the Lugano classification. Retrospective analysis was conducted of 115 consecutive patients with lymphomas (45 females, 70 males; mean age of 46 years). Patients underwent WBCT with a 64 multi-detector CT device for staging and response assessment after a complete course of chemotherapy. Image analysis was performed by 2 reviewers according to the Lugano classification for staging and response assessment. The overall inter-observer agreement of WBCT in staging of lymphoma was excellent (k=0.90, percent agreement=94.9%). There was an excellent inter-observer agreement for stage I (k=0.93, percent agreement=96.4%), stage II (k=0.90, percent agreement=94.8%), stage III (k=0.89, percent agreement=94.6%) and stage IV (k=0.88, percent agreement=94%). The overall inter-observer agreement in response assessment after a complete course of treatment was excellent (k=0.91, percent agreement=95.8%). There was an excellent inter-observer agreement in progressive disease (k=0.94, percent agreement=97.1%), stable disease (k=0.90, percent agreement=95%), partial response (k=0.96, percent agreement=98.1%) and complete response (k=0.87, percent agreement=93.3%). We concluded that WBCT is a reliable and reproducible imaging modality for staging and treatment assessment in lymphoma according to the Lugano classification.
Reliability and validity of the de Morton Mobility Index in individuals with sub-acute stroke.
Braun, Tobias; Marks, Detlef; Thiel, Christian; Grüneberg, Christian
2018-02-04
To establish the validity and reliability of the de Morton Mobility Index (DEMMI) in patients with sub-acute stroke. This cross-sectional study was performed in a neurological rehabilitation hospital. We assessed unidimensionality, construct validity, internal consistency reliability, inter-rater reliability, minimal detectable change and possible floor and ceiling effects of the DEMMI in adult patients with sub-acute stroke. The study included a total sample of 121 patients with sub-acute stroke. We analysed validity (n = 109) and reliability (n = 51) in two sub-samples. Rasch analysis indicated unidimensionality with an overall fit to the model (chi-square = 12.37, p = 0.577). All hypotheses on construct validity were confirmed. Internal consistency reliability (Cronbach's alpha = 0.94) and inter-rater reliability (intraclass correlation coefficient = 0.95; 95% confidence interval: 0.92-0.97) were excellent. The minimal detectable change with 90% confidence was 13 points. No floor or ceiling effects were evident. These results indicate unidimensionality, sufficient internal consistency reliability, inter-rater reliability, and construct validity of the DEMMI in patients with a sub-acute stroke. Advantages of the DEMMI in clinical application are the short administration time, no need for special equipment and interval level data. The de Morton Mobility Index, therefore, may be a useful performance-based bedside test to measure mobility in individuals with a sub-acute stroke across the whole mobility spectrum. Implications for Rehabilitation The de Morton Mobility Index (DEMMI) is a unidimensional measurement instrument of mobility in individuals with sub-acute stroke. The DEMMI has excellent internal consistency and inter-rater reliability, and sufficient construct validity. The minimal detectable change of the DEMMI with 90% confidence in stroke rehabilitation is 13 points.
The lack of any floor or ceiling effects on hospital admission indicates applicability across the whole mobility spectrum of patients with sub-acute stroke.
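A minimal detectable change at 90% confidence is commonly derived from the reliability coefficient through the standard error of measurement (SEM = SD × √(1 − ICC), MDC = z × √2 × SEM). The sketch below applies that common formula; it is an assumption that the study used this approach, and the sample standard deviation passed in any call is hypothetical because the abstract does not report one.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient."""
    return sd * sqrt(1.0 - icc)


def minimal_detectable_change(sd, icc, confidence=0.90):
    """MDC at the given two-sided confidence level (default 90%)."""
    # Two-sided z value, e.g. about 1.645 for 90% and 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # sqrt(2) accounts for measurement error in both test and retest
    return z * sqrt(2.0) * standard_error_of_measurement(sd, icc)
```

With an ICC of 0.95, a hypothetical SD of 25 points would give an MDC90 of roughly 13 points; a change smaller than the MDC cannot be distinguished from measurement error at that confidence level.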
Wiig, Ola; Terjesen, Terje; Svenningsen, Svein
2002-10-01
We evaluated the inter-observer agreement of radiographic methods when evaluating patients with Perthes' disease. The radiographs were assessed at the time of diagnosis and at the 1-year follow-up by local orthopaedic surgeons (O) and 2 experienced pediatric orthopedic surgeons (TT and SS). The Catterall, Salter-Thompson, and Herring lateral pillar classifications were compared, and the femoral head coverage (FHC), center-edge angle (CE-angle), and articulo-trochanteric distance (ATD) were measured in the affected and normal hips. On the primary evaluation, the lateral pillar and Salter-Thompson classifications had a higher level of agreement among the observers than the Catterall classification, but none of the classifications showed good agreement (weighted kappa values between O and SS 0.56, 0.54, 0.49, respectively). Combining Catterall groups 1 and 2 into one group, and groups 3 and 4 into another resulted in better agreement (kappa 0.55) than with the original 4-group system. The agreement was also better (kappa 0.62-0.70) between experienced than between less experienced examiners for all classifications. The femoral head coverage was a more reliable and accurate measure than the CE-angle for quantifying the acetabular covering of the femoral head, as indicated by higher intraclass correlation coefficients (ICC) and smaller inter-observer differences. The ATD showed good agreement in all comparisons and had low inter-observer differences. We conclude that all classifications of femoral head involvement are adequate in clinical work if the radiographic assessment is done by experienced examiners. When the examiners are less experienced, a 2-group classification or the lateral pillar classification is more reliable. For evaluation of containment of the femoral head, FHC is more appropriate than the CE-angle.
Chen, Hui; van Eijnatten, Maureen; Wolff, Jan; de Lange, Jan; van der Stelt, Paul F; Lobbezoo, Frank; Aarab, Ghizlane
2017-08-01
The aim of this study was to assess the reliability and accuracy of three different imaging software packages for three-dimensional analysis of the upper airway using CBCT images. To assess the reliability of the software packages, 15 NewTom 5G® (QR Systems, Verona, Italy) CBCT data sets were randomly and retrospectively selected. Two observers measured the volume, minimum cross-sectional area and the length of the upper airway using Amira® (Visage Imaging Inc., Carlsbad, CA), 3Diagnosys® (3diemme, Cantu, Italy) and OnDemand3D® (CyberMed, Seoul, Republic of Korea) software packages. The intra- and inter-observer reliability of the upper airway measurements were determined using intraclass correlation coefficients and Bland & Altman agreement tests. To assess the accuracy of the software packages, one NewTom 5G® CBCT data set was used to print a three-dimensional anthropomorphic phantom with known dimensions to be used as the "gold standard". This phantom was subsequently scanned using a NewTom 5G® scanner. Based on the CBCT data set of the phantom, one observer measured the volume, minimum cross-sectional area, and length of the upper airway using Amira®, 3Diagnosys®, and OnDemand3D®, and compared these measurements with the gold standard. The intra- and inter-observer reliability of the measurements of the upper airway using the different software packages were excellent (intraclass correlation coefficient ≥0.75). There was excellent agreement between all three software packages in volume, minimum cross-sectional area and length measurements. All software packages underestimated the upper airway volume by 8.8% to 12.3%, the minimum cross-sectional area by 6.2% to 14.6%, and the length by 1.6% to 2.9%. All three software packages offered reliable volume, minimum cross-sectional area and length measurements of the upper airway. The length measurements of the upper airway were the most accurate results in all software packages.
All software packages underestimated the upper airway dimensions of the anthropomorphic phantom.
Sarig Bahat, Hilla; Sprecher, Elliot; Sela, Itamar; Treleaven, Julia
2016-07-01
Virtual reality (VR) has previously been used for assessment and treatment of neck pain and has been shown to be reliable for cervical range of motion measures. Neck VR enables analysis of task-oriented neck movement by stimulating responsive movements to external stimuli. Therefore, the purpose of this study was to establish inter-tester reliability of neck kinematic measures so that the system can be used as a reliable assessment and treatment tool between clinicians. This reliability study included 46 asymptomatic participants, who were assessed using the neck VR system which displayed an interactive VR scenario via a head-mounted device, controlled by neck movements. The objective of the interactive assessment was to hit 16 targets, randomly appearing in four directions, as fast as possible. Each participant was tested twice by two different testers. Good reliability was found for neck motion kinematic measures in flexion, extension, and rotation (intraclass correlation 0.64-0.93). High reliability was shown for peak velocity globally (0.93), in left rotation (0.9), right rotation and extension (0.88), and flexion (0.86). Mean velocity had a good global reliability (0.84), except for left rotation directed movement with moderate reliability (0.68). Minimal detectable change for peak velocity ranged from 41 to 53 °/s, while mean velocity ranged from 20 to 25 °/s. The results suggest high reliability for peak and mean velocity as measured by the interactive Neck VR assessment of neck motion kinematics. VR appears to provide a reliable and more ecologically valid method of cervical motion evaluation than previous conventional methodologies.
ERIC Educational Resources Information Center
Kaya, Taciser; Goksel Karatepe, Altinay; Gunaydin, Rezzan; Koc, Aysegul; Altundal Ercan, Ulku
2011-01-01
The Modified Ashworth Scale (MAS) is commonly used in clinical practice for grading spasticity. However, it was modified recently by omitting grade "1+" of the MAS and redefining grade "2". The aim of this study was to investigate the inter-rater reliability of MAS and modified MAS (MMAS) for the assessment of poststroke elbow flexor spasticity.…
ERIC Educational Resources Information Center
Takeda, Kazuya; Tanabe, Shigeo; Koyama, Soichiro; Nagai, Tomoko; Sakurai, Hiroaki; Kanada, Yoshikiyo; Shomoto, Koji
2018-01-01
The aim of this study was to clarify the intra- and inter-rater reliability of the rate of force development in hip abductor muscle force measurements using a hand-held dynamometer. Thirty healthy adults were separately assessed by two independent raters on two separate days. Rate of force development was calculated from the slope of the…
Cerciello, Simone; Monk, Andrew Paul; Visonà, Enrico; Carbone, Stefano; Edwards, Thomas Bradley; Maffulli, Nicola; Walch, Gilles
2017-07-01
Secondary cuff failure after shoulder replacement is disabling and often requires additional surgery. Increased critical shoulder angle (CSA) has been found in patients with cuff tear compared to normal subjects. The inter-observer reliability of the CSA and the relationship between CSA and symptomatic secondary cuff failure after shoulder replacement were investigated. Nineteen patients with symptomatic cuff failure after anatomic shoulder replacement (mean FU 45 months) were compared to a control group of 29 patients showing no signs of symptomatic cuff failure (mean FU 105.7 months). The CSA was measured by two blinded surgeons at a mean follow-up of 45 and 105.7 months, respectively. Inter-observer reliability was calculated. The mean CSA in the study group in neutral, internal and external rotations were 33°, 34° and 34°, respectively. Corresponding values in the control group were 32°, 32° and 32°. The intraclass correlation coefficients for the whole population between the two examiners were 0.956 (P < 0.01), 0.964 (P < 0.01) and 0.955 (P < 0.01), respectively. There were no significant differences in CSA values between patients who had undergone shoulder replacement and experienced late cuff failure and those in whom the same procedure had been successful. A good inter-observer reliability was found for the CSA method.
Petterssen, Max; Eljamel, Sarah; Eljamel, Sam
2014-09-01
Protoporphyrin-IX (Pp-IX) fluorescence has been used frequently in recent years to guide microsurgical resection of high-grade gliomas (HGG), particularly following the publication of a randomized controlled trial demonstrating its advantages. However, Pp-IX fluorescence is dependent upon the surgeon's visual perception of red fluorescent colour. This study was designed to evaluate human eye fluorescence perception and establish a fluorescence scale. 20 of 108 pre-recorded images from intraoperative fluorescence of HGG were used to construct an 8-panel visual analogue fluorescence scale. The scale was validated by testing 56 participants with normal colour vision and three red-green colour-blind participants. For intra-rater agreement ten participants were tested twice and for inter-observer reliability the whole cohort was tested. The intra- and inter-observer reliability of the scale in normal colour vision participants was excellent. The scale was less reliable in the violet-blue panels of the scale. Colour-blind participants were not able to distinguish between red fluorescence and blue-violet colours. The 8-panel fluorescence scale is valid in differentiating red, pink and blue colours in a fluorescence surgical field among participants with normal colour perception and potentially useful to standardize fluorescence-guided surgery. However, colour-blind surgeons should not use fluorescence-guided surgery. Copyright © 2014 Elsevier B.V. All rights reserved.
Ahn, Su Joa; Lee, Jeong Min; Chang, Won; Lee, Sang Min; Kang, Hyo-Jin; Yang, Hyunkyung; Yoon, Jeong Hee; Park, Sae Jin; Han, Joon Koo
2017-01-01
To assess intra- and inter-observer reproducibility of a new point shear wave elastography technique (pSWE, S-Shearwave, Samsung Medison) and compare its accuracy in assessing liver stiffness (LS) with an established pSWE technique (Virtual Touch Quantification, VTQ). Thirty-three patients were enrolled in this Institutional Review Board-approved prospective study. LS values were measured by VTQ on an Acuson S2000 system (Siemens Healthineers) and S-Shearwave on an RS-80A (Samsung Medison) in the same session, followed by two further S-Shearwave sessions for inter- and intra-observer variation at 8-hour intervals. The technical success rate (SR) and reliability of the measurements of both pSWE techniques were compared. The intra- and inter-observer reproducibility of S-Shearwave was determined by intraclass correlation coefficients (ICCs). LS values were measured by both methods of pSWE. The diagnostic performance in severe fibrosis (F ≥ 3) and cirrhosis (F = 4) was evaluated using the receiver operating characteristics curve analysis and the Obuchowski measure with the LS values of transient elastography as the reference standard. The VTQ (100%, 33/33) and S-Shearwave (96.9%, 32/33) techniques did not display a significant difference in technical SR (p = 0.63) or reliability of LS measurements (96.9%, 32/33; 93.9%, 30/32, respectively, p = 0.61). The inter- and intra-observer agreement for LS measurements using the S-Shearwave technique was excellent (ICC = 0.98 and 0.99, respectively). The mean LS values of both pSWE techniques were not significantly different and exhibited a good correlation (r = 0.78).
To detect F ≥ 3 and F = 4, VTQ and S-Shearwave showed comparable diagnostic accuracy as indicated by the following outcomes: areas under receiver operating characteristics curve (AUROC) = 0.87 (95% confidence intervals [CI] 0.70-0.96), 0.89 for VTQ (95% CI 0.74-0.97), respectively; and AUROC = 0.84 (95% CI 0.67-0.94), 0.94 (95% CI 0.80-0.99) for S-Shearwave (p > 0.48), respectively. The Obuchowski measures were similarly high for S-Shearwave and VTQ (0.94 vs. 0.95). S-Shearwave shows excellent inter- and intra-observer agreement and diagnostic effectiveness comparable to VTQ in detecting LS.
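The abstract above reports agreement as intraclass correlation coefficients but does not say which ICC form was used. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-measurement form often written ICC(2,1), the coefficient can be computed from a subjects-by-raters table as follows. The function name and the toy ratings are illustrative only and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater.

    Assumption: two-way random effects, absolute agreement, single measurement.
    The study above does not state which ICC form it used.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters (or repeated sessions)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical stiffness readings from two observers (not study data).
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]  # second observer reads 1 unit higher
print(icc_2_1(identical), icc_2_1(offset))
```

Because this form measures absolute agreement, a constant offset between observers lowers the coefficient even when their rankings match perfectly.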
Validity of an Observation Method for Assessing Pain Behavior in Individuals With Multiple Sclerosis
Cook, Karon F.; Roddey, Toni S.; Bamer, Alyssa M.; Amtmann, Dagmar; Keefe, Francis J
2012-01-01
Context Pain is a common and complex experience for individuals who live with multiple sclerosis (MS) that interferes with physical, psychological and social function. A valid and reliable tool for quantifying observed pain behaviors in MS is critical to understanding how pain behaviors contribute to pain-related disability in this clinical population. Objectives To evaluate the reliability and validity of a pain behavioral observation protocol in individuals who have MS. Methods Community-dwelling volunteers with multiple sclerosis (N=30), back pain (N=5), or arthritis (N=8) were recruited based on clinician referrals, advertisements, fliers, web postings, and participation in previous research. Participants completed measures of pain severity, pain interference, and self-reported pain behaviors and were videotaped doing typical activities (e.g., walking, sitting). Two coders independently recorded frequencies of pain behaviors by category (e.g., guarding, bracing) and inter-rater reliability statistics were calculated. Naïve observers reviewed videotapes of individuals with MS and rated their pain. Spearman correlations were calculated between pain behavior frequencies and self-reported pain and pain ratings by naïve observers. Results Inter-rater reliability estimates indicated the reliability of pain codes in the MS sample. Kappa coefficients ranged from moderate agreement (sighing = 0.40) to substantial agreement (guarding = 0.83). These values were comparable to those obtained in the combined back pain and arthritis sample. Concurrent validity was supported by correlations with self-reported pain (0.46-0.53) and with self-reports of pain behaviors (0.58). Construct validity was supported by finding of 0.87 correlation between total pain behaviors observed by coders and mean pain ratings by naïve observers. Conclusion Results support use of the pain behavior observation protocol for assessing pain behaviors of individuals with MS. 
Valid assessments of pain behaviors of individuals with MS could lead to creative interventions in the management of chronic pain in this population. PMID:23159684
Venkatraman, Vijay K; Gonzalez, Christopher E.; Landman, Bennett; Goh, Joshua; Reiter, David A.; An, Yang; Resnick, Susan M.
2017-01-01
Diffusion tensor imaging (DTI) measures are commonly used as imaging markers to investigate individual differences in relation to behavioral and health-related characteristics. However, the ability to detect reliable associations in cross-sectional or longitudinal studies is limited by the reliability of the diffusion measures. Several studies have examined reliability of diffusion measures within (i.e. intra-site) and across (i.e. inter-site) scanners with mixed results. Our study compares the test-retest reliability of diffusion measures within and across scanners and field strengths in cognitively normal older adults with a follow-up interval less than 2.25 years. Intra-class correlation (ICC) and coefficient of variation (CoV) of fractional anisotropy (FA) and mean diffusivity (MD) were evaluated in sixteen white matter and twenty-six gray matter bilateral regions. The ICC for intra-site reliability (0.32 to 0.96 for FA and 0.18 to 0.95 for MD in white matter regions; 0.27 to 0.89 for MD and 0.03 to 0.79 for FA in gray matter regions) and inter-site reliability (0.28 to 0.95 for FA in white matter regions, 0.02 to 0.86 for MD in gray matter regions) with longer follow-up intervals were similar to earlier studies using shorter follow-up intervals. The reliability of across field strengths comparisons was lower than intra- and inter-site reliability. Within and across scanner comparisons showed that diffusion measures were more stable in larger white matter regions (> 1500 mm3). For gray matter regions, the MD measure showed stability in specific regions and was not dependent on region size. Linear correction factor estimated from cross-sectional or longitudinal data improved the reliability across field strengths. 
Our findings indicate that investigations relating diffusion measures to external variables must consider variable reliability across the distinct regions of interest and that correction factors can be used to improve consistency of measurement across field strengths. An important result of this work is that inter-scanner and field strength effects can be partially mitigated with linear correction factors specific to regions of interest. These data-driven linear correction techniques can be applied in cross-sectional or longitudinal studies. PMID:26146196
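The abstract above describes region-specific linear correction factors but gives no fitting details. The sketch below assumes an ordinary least-squares line that maps values measured at one field strength onto the scale of the other, fitted separately for each region of interest. The function names and the FA values are hypothetical, and the authors' actual procedure may differ.

```python
def fit_linear_correction(source, target):
    """Least-squares slope and intercept mapping `source` values onto `target`.

    Assumption: one simple linear fit per region of interest; this is a
    generic OLS fit, not necessarily the estimator used in the study.
    """
    n = len(source)
    mean_x = sum(source) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in source)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(source, target))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_correction(values, slope, intercept):
    """Map values from the source scale onto the target scale."""
    return [slope * v + intercept for v in values]


# Hypothetical FA values for one region at two field strengths (not study data).
fa_low_field = [0.30, 0.40, 0.50, 0.60]
fa_high_field = [1.1 * v - 0.02 for v in fa_low_field]

slope, intercept = fit_linear_correction(fa_low_field, fa_high_field)
corrected = apply_correction(fa_low_field, slope, intercept)
```

Fitting one line per region follows the abstract's point that the correction factors are specific to regions of interest.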
Panchani, Sunil; Reading, Jonathan; Mehta, Jaysheel
2016-06-01
The position of the lateral sesamoid on standard dorso-plantar weight bearing radiographs, with respect to the lateral cortex of the first metatarsal, has been shown to correlate well with the degree of the hallux valgus angle. This study aimed to assess the inter- and intra-observer error of this new classification system. Five orthopaedic consultants and five trainee orthopaedic surgeons were recruited to assess and document the degree of displacement of the lateral sesamoid on 144 weight-bearing dorso-plantar radiographs on two separate occasions. The severity of hallux valgus was defined as normal (0%), mild (≤50%), moderate (51-≤99%) or severe (≥100%) depending on the percentage displacement of the lateral sesamoid body from the lateral cortical border of the first metatarsal. Consultant intra-observer variability showed good agreement between repeated assessment of the radiographs (mean Kappa=0.75). Intra-observer variability for trainee orthopaedic surgeons also showed good agreement with a mean Kappa=0.73. Intraclass correlations for consultants and trainee surgeons were also high. The new classification system of assessing the severity of hallux valgus shows high inter- and intra-observer reliability with good agreement and reproducibility between surgeons of consultant and trainee grades. Copyright © 2015 Elsevier Ltd. All rights reserved.
Yen, Po-Yin; Kelley, Marjorie; Lopetegui, Marcelo; Rosado, Amber L.; Migliore, Elaina M.; Chipps, Esther M.; Buck, Jacalyn
2016-01-01
A fundamental understanding of multitasking within nursing workflow is important in today’s dynamic and complex healthcare environment. We conducted a time motion study to understand nursing workflow, specifically multitasking and task switching activities. We used TimeCaT, a comprehensive electronic time capture tool, to capture observational data. We established inter-observer reliability prior to data collection. We completed 56 hours of observation of 10 registered nurses. We found, on average, nurses had 124 communications and 208 hands-on tasks per 4-hour block of time. They multitasked (having communication and hands-on tasks simultaneously) 131 times, representing 39.48% of all times; the total multitasking duration ranges from 14.6 minutes to 109 minutes, 44.98 minutes (18.63%) on average. We also reviewed workflow visualization to uncover the multitasking events. Our study design and methods provide a practical and reliable approach to conducting and analyzing time motion studies from both quantitative and qualitative perspectives. PMID:28269924
Thomas, Jennifer J; Eddy, Kamryn T; Murray, Helen B; Tromp, Marilou D P; Hartmann, Andrea S; Stone, Melissa T; Levendusky, Philip G; Becker, Anne E
2015-09-30
This study evaluated the relative distribution and inter-rater reliability of revised DSM-5 criteria for eating disorders in a residential treatment program. Consecutive adolescent and young adult females (N=150) admitted to a residential eating disorder treatment facility were assigned both DSM-IV and DSM-5 diagnoses by a clinician (n=14) via routine clinical interview and a research assessor (n=4) via structured interview. We compared the frequency of diagnostic assignments under each taxonomy and by type of assessor. We evaluated concordance between clinician and researcher assignment through inter-rater reliability kappa and percent agreement. Significantly fewer patients received either clinician or researcher diagnoses of a residual eating disorder under DSM-5 (clinician-12.0%; researcher-31.3%) versus DSM-IV (clinician-28.7%; researcher-59.3%), with the majority of reassigned DSM-IV residual cases reclassified as DSM-5 anorexia nervosa. Researcher and clinician diagnoses showed moderate inter-rater reliability under DSM-IV (κ=.48) and DSM-5 (κ=.57), though agreement for specific DSM-5 other specified feeding or eating disorder (OSFED) presentations was poor (κ=.05). DSM-5 revisions were associated with significantly less frequent residual eating disorder diagnoses, but not with reduced inter-rater reliability. Findings support specific dimensions of clinical utility for revised DSM-5 criteria for eating disorders. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Palliative sedation: reliability and validity of sedation scales.
Arevalo, Jimmy J; Brinkkemper, Tijn; van der Heide, Agnes; Rietjens, Judith A; Ribbe, Miel; Deliens, Luc; Loer, Stephan A; Zuurmond, Wouter W A; Perez, Roberto S G M
2012-11-01
Observer-based sedation scales have been used to provide a measurable estimate of the comfort of nonalert patients in palliative sedation. However, their usefulness and appropriateness in this setting has not been demonstrated. To study the reliability and validity of observer-based sedation scales in palliative sedation. A prospective evaluation of 54 patients under intermittent or continuous sedation with four sedation scales was performed by 52 nurses. Included scales were the Minnesota Sedation Assessment Tool (MSAT), Richmond Agitation-Sedation Scale (RASS), Vancouver Interaction and Calmness Scale (VICS), and a sedation score proposed in the Guideline for Palliative Sedation of the Royal Dutch Medical Association (KNMG). Inter-rater reliability was tested with the intraclass correlation coefficient (ICC) and Cohen's kappa coefficient. Correlations between the scales using Spearman's rho tested concurrent validity. We also examined construct, discriminative, and evaluative validity. In addition, nurses completed a user-friendliness survey. Overall moderate to high inter-rater reliability was found for the VICS interaction subscale (ICC = 0.85), RASS (ICC = 0.73), and KNMG (ICC = 0.71). The largest correlation between scales was found for the RASS and KNMG (rho = 0.836). All scales showed discriminative and evaluative validity, except for the MSAT motor subscale and VICS calmness subscale. Finally, the RASS was less time consuming, clearer, and easier to use than the MSAT and VICS. The RASS and KNMG scales stand as the most reliable and valid among the evaluated scales. In addition, the RASS was less time consuming, clearer, and easier to use than the MSAT and VICS. Further research is needed to evaluate the impact of the scales on better symptom control and patient comfort. Copyright © 2012 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.
The TiltMeter app is a novel and accurate measurement tool for the weight bearing lunge test.
Williams, Cylie M; Caserta, Antoni J; Haines, Terry P
2013-09-01
The weight bearing lunge test is increasingly being used by health care clinicians who treat lower limb and foot pathology. This measure is commonly established accurately and reliably with the use of expensive equipment. This study aims to compare the digital inclinometer with a free app, TiltMeter, on an Apple iPhone. This was an intra-rater and inter-rater reliability study. Two raters (novice and experienced) conducted the measurements in both a bent knee and straight leg position to determine the intra-rater and inter-rater reliability. Concurrent validity was also established. Allied health practitioners were recruited as participants from the workplace. A preconditioning stretch was conducted and the ankle range of motion was established with the weight bearing lunge test position with firstly the leg straight and secondly with the knee bent. The measurement device and each participant were randomised during measurement. The intra-rater reliability and inter-rater reliability for the devices and in both positions were all over ICC 0.8 except for one intra-rater measure (Digital inclinometer, novice, ICC 0.65). The inter-rater reliability between the digital inclinometer and the TiltMeter was near perfect, ICC 0.96 (CI: 0.898-0.983); concurrent validity ICC between the two devices was 0.83 (CI: -0.740 to 0.445). The use of the TiltMeter app on the iPhone is a reliable and inexpensive tool to measure the available ankle range of motion. Health practitioners should use caution in applying these findings to other smart phone equipment if surface areas are not comparable. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
Urdu translation of the Hamilton Rating Scale for Depression: Results of a validation study
Hashmi, Ali M.; Naz, Shahana; Asif, Aftab; Khawaja, Imran S.
2016-01-01
Objective: To develop a standardized validated version of the Hamilton Rating Scale for Depression (HAM-D) in Urdu. Methods: After translation of the HAM-D into the Urdu language following standard guidelines, the final Urdu version (HAM-D-U) was administered to 160 depressed outpatients. Inter-item correlation was assessed by calculating Cronbach alpha. Correlation between HAM-D-U scores at baseline and after a 2-week interval was evaluated for test-retest reliability. Moreover, scores of two clinicians on HAM-D-U were compared for inter-rater reliability. For establishing concurrent validity, scores of HAM-D-U and BDI-U were compared by using Spearman correlation coefficient. The study was conducted at Mayo Hospital, Lahore, from May to December 2014. Results: The Cronbach alpha for HAM-D-U was 0.71. Composite scores for HAM-D-U at baseline and after a 2-week interval were also highly correlated with each other (Spearman correlation coefficient 0.83, p-value < 0.01) indicating good test-retest reliability. Composite scores for HAM-D-U and BDI-U were positively correlated with each other (Spearman correlation coefficient 0.85, p < 0.01) indicating good concurrent validity. Scores of two clinicians for HAM-D-U were also positively correlated (Spearman correlation coefficient 0.82, p-value < 0.01) indicating good inter-rater reliability. Conclusion: The HAM-D-U is a valid and reliable instrument for the assessment of Depression. It shows good inter-rater and test-retest reliability. The HAM-D-U can be a tool either for clinical management or research. PMID:28083049
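The abstract above summarises internal consistency with Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), is given below for a respondents-by-items score table. The function name and the toy scores are illustrative and are not HAM-D-U data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent, one column per item."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 3-item scores for four respondents (not study data).
toy_scores = [
    [1, 2, 1],
    [2, 1, 3],
    [3, 3, 2],
    [4, 4, 4],
]
print(round(cronbach_alpha(toy_scores), 3))  # 0.857
```

Sample variances are used for both the items and the totals. Using population variances for both gives the same alpha, because the normalising factor cancels in the ratio.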
Reeves, Mathew J; Mullard, Andrew J; Wehner, Susan
2008-01-01
Background The Paul Coverdell National Acute Stroke Registry (PCNASR) is a U.S. based national registry designed to monitor and improve the quality of acute stroke care delivered by hospitals. The registry monitors care through specific performance measures, the accuracy of which depends in part on the reliability of the individual data elements used to construct them. This study describes the inter-rater reliability of data elements collected in Michigan's state-based prototype of the PCNASR. Methods Over a 6-month period, 15 hospitals participating in the Michigan PCNASR prototype submitted data on 2566 acute stroke admissions. Trained hospital staff prospectively identified acute stroke admissions, abstracted chart information, and submitted data to the registry. At each hospital 8 randomly selected cases were re-abstracted by an experienced research nurse. Inter-rater reliability was estimated by the kappa statistic for nominal variables, and intraclass correlation coefficient (ICC) for ordinal and continuous variables. Factors that can negatively impact the kappa statistic (i.e., trait prevalence and rater bias) were also evaluated. Results A total of 104 charts were available for re-abstraction. Excellent reliability (kappa or ICC > 0.75) was observed for many registry variables including age, gender, black race, hemorrhagic stroke, discharge medications, and modified Rankin Score. Agreement was at least moderate (i.e., 0.75 > kappa ≥ 0.40) for ischemic stroke, TIA, white race, non-ambulance arrival, hospital transfer and direct admit. However, several variables had poor reliability (kappa < 0.40) including stroke onset time, stroke team consultation, time of initial brain imaging, and discharge destination. There were marked systematic differences between hospital abstractors and the audit abstractor (i.e., rater bias) for many of the data elements recorded in the emergency department.
Conclusion The excellent reliability of many of the data elements supports the use of the PCNASR to monitor and improve care. However, the poor reliability for several variables, particularly time-related events in the emergency department, indicates the need for concerted efforts to improve the quality of data collection. Specific recommendations include improvements to data definitions, abstractor training, and the development of ED-based real-time data collection systems. PMID:18547421
Objections to routine clinical outcomes measurement in mental health services: any evidence so far?
MacDonald, Alastair J D; Trauer, Tom
2010-12-01
Routine clinical outcomes measurement (RCOM) is gaining importance in mental health services. To examine whether criticisms published in advance of the development of RCOM have been borne out by data now available from such a programme. This was an observational study of routine ratings using HoNOS65+ at inception/admission and again at discharge in an old age psychiatry service from 1997 to 2008. Testable hypotheses were generated from each criticism amenable to empirical examination. Inter-rater reliability estimates were applied to observed differences in scores between community and ward patients using resampling. Five thousand one hundred eighty community inceptions and 862 admissions had HoNOS65+ ratings at referral/admission and discharge. We could find no evidence of gaming (artificially worse scores at inception and better at discharge), selection, attrition or detection bias, and ratings were consistent with diagnosis and level of service. Anticipated low levels of inter-rater reliability did not vitiate differences between levels of service. Although only hypotheses testable from within RCOM data were examined, and only 46% of eligible episodes had complete outcomes data, no evidence of the alleged biases was found. RCOM seems valid and practical in mental health services.
Development of the Responsiveness to Child Feeding Cues Scale
Hodges, Eric A.; Johnson, Susan L.; Hughes, Sheryl O.; Hopkinson, Judy M.; Butte, Nancy F.; Fisher, Jennifer O.
2013-01-01
Parent-child feeding interactions during the first two years of life are thought to shape child appetite and obesity risk, but remain poorly studied. This research was designed to develop and assess the Responsiveness to Child Feeding Cues Scale (RCFCS), an observational measure of caregiver responsiveness to child feeding cues relevant to obesity. General responsiveness during feeding as well as maternal responsiveness to child hunger and fullness were rated during mid-morning feeding occasions by 3 trained coders using digital recordings. Initial inter-rater reliability and criterion validity were evaluated in a sample of 144 ethnically-diverse mothers of healthy 7- to 24-month-old children. Maternal self-report of demographics and measurements of maternal/child anthropometrics were obtained. Inter-rater agreement for most variables was excellent (ICC>0.80). Mothers tended to be more responsive to child hunger than fullness cues (p<0.001). Feeding responsiveness dimensions were associated with demographics, including maternal education, maternal body mass index, and child age, and aspects of feeding, including breastfeeding duration, and self-feeding. The RCFCS is a reliable observational measure of responsive feeding for children <2 years of age that is relevant to obesity in early development. PMID:23419965
Inter-Rater Reliability of Total Body Score-A Scale for Quantification of Corpse Decomposition.
Nawrocka, Marta; Frątczak, Katarzyna; Matuszewski, Szymon
2016-05-01
The degree of body decomposition can be quantified using Total Body Score (TBS), a scale frequently used in taphonomic or entomological studies of decomposition. Here, the inter-rater reliability of the scale is analyzed. The study was made on 120 laymen, who were trained in the use of the scale. Participants scored decomposition of pig carcasses from photographs. It was found that the scale, when used by different people, gives homogeneous results irrespective of the user qualifications (the Krippendorff's alpha for all participants was 0.818). The study also indicated that carcasses in advanced decomposition receive significantly less accurate scores. Moreover, it was found that scores for cadavers in mosaic decomposition (i.e., representing signs of at least two stages of decomposition) are less accurate. These results demonstrate that the scale may be regarded as inter-rater reliable. Some propositions for refinement of the scale were also discussed. © 2016 American Academy of Forensic Sciences.
Larson, Tomas; Kerekes, Nóra; Selinus, Eva Norén; Lichtenstein, Paul; Gumpert, Clara Hellner; Anckarsäter, Henrik; Nilsson, Thomas; Lundström, Sebastian
2014-02-01
The Autism-Tics, AD/HD, and other Comorbidities (A-TAC) inventory is used in epidemiological research to assess neurodevelopmental problems and coexisting conditions. Although the A-TAC has been applied in various populations, data on retest reliability are limited. The objective of the present study was to present additional reliability data. The A-TAC was administered by lay assessors and was completed on two occasions by parents of 400 individual twins, with an average interval of 70 days between test sessions. Intra- and inter-rater reliability were analysed with intraclass correlations and Cohen's kappa. A-TAC showed excellent test-retest intraclass correlations for both autism spectrum disorder and attention deficit hyperactivity disorder (each at .84). Most modules in the A-TAC had intra- and inter-rater reliability intraclass correlation coefficients of ≥ .60. Cohen's kappa indicated acceptable reliability. The current study provides statistical evidence that the A-TAC yields good test-retest reliability in a population-based cohort of children.
Perraton, Luke G.; Bower, Kelly J.; Adair, Brooke; Pua, Yong-Hao; Williams, Gavin P.; McGaw, Rebekah
2015-01-01
Introduction Hand-held dynamometry (HHD) has never previously been used to examine isometric muscle power. Rate of force development (RFD) is often used for muscle power assessment, however no consensus currently exists on the most appropriate method of calculation. The aim of this study was to examine the reliability of different algorithms for RFD calculation and to examine the intra-rater, inter-rater, and inter-device reliability of HHD as well as the concurrent validity of HHD for the assessment of isometric lower limb muscle strength and power. Methods 30 healthy young adults (age: 23±5yrs, male: 15) were assessed on two sessions. Isometric muscle strength and power were measured using peak force and RFD respectively using two HHDs (Lafayette Model-01165 and Hoggan microFET2) and a criterion-reference KinCom dynamometer. Statistical analysis of reliability and validity comprised intraclass correlation coefficients (ICC), Pearson correlations, concordance correlations, standard error of measurement, and minimal detectable change. Results Comparison of RFD methods revealed that a peak 200ms moving window algorithm provided optimal reliability results. Intra-rater, inter-rater, and inter-device reliability analysis of peak force and RFD revealed mostly good to excellent reliability (coefficients ≥ 0.70) for all muscle groups. Concurrent validity analysis showed moderate to excellent relationships between HHD and fixed dynamometry for the hip and knee (ICCs ≥ 0.70) for both peak force and RFD, with mostly poor to good results shown for the ankle muscles (ICCs = 0.31–0.79). Conclusions Hand-held dynamometry has good to excellent reliability and validity for most measures of isometric lower limb strength and power in a healthy population, particularly for proximal muscle groups. To aid implementation we have created freely available software to extract these variables from data stored on the Lafayette device. 
Future research should examine the reliability and validity of these variables in clinical populations. PMID:26509265
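The abstract above reports that a peak 200 ms moving window algorithm gave the most reliable rate of force development (RFD) values, without spelling out the computation. The sketch below rests on one plausible reading: slide a 200 ms window along the force trace, take the force change across each window divided by the window duration, and keep the largest value. The function name, the sampling rate, and the synthetic trace are assumptions, and this is not the authors' released software.

```python
def peak_rfd(force_n, sample_rate_hz, window_s=0.2):
    """Largest force rise per second across any window of `window_s` seconds.

    Assumption: RFD in a window is the endpoint-to-endpoint force change
    divided by the window duration; the study may define it differently.
    """
    w = round(window_s * sample_rate_hz)    # window length in samples
    if w < 1 or len(force_n) <= w:
        raise ValueError("force trace is shorter than one window")
    duration_s = w / sample_rate_hz
    return max(
        (force_n[i + w] - force_n[i]) / duration_s
        for i in range(len(force_n) - w)
    )


# Synthetic trace at a hypothetical 100 Hz: 0.5 s rest, 0.5 s linear ramp
# to 150 N (300 N/s), then a 0.5 s hold.
trace = [0.0] * 50 + [3.0 * i for i in range(51)] + [150.0] * 50
print(peak_rfd(trace, sample_rate_hz=100))
```

A fixed-duration window keeps the result independent of how long the contraction lasts, so trials of different lengths stay comparable.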
McCool, Megan E.; Wahl, Josepha; Schlecht, Inga; Apfelbacher, Christian
2015-01-01
Patients actively seek information about how to cope with their health problems, but the quality of the information available varies. A number of instruments have been developed to assess the quality of patient information, though primarily in English. Little is known about the reliability of these instruments when applied to patient information in German. The objective of our study was to investigate and compare the reliability of two validated instruments, DISCERN and EQIP, in order to determine which of these instruments is better suited for a further study pertaining to the quality of information available to German patients with eczema. Two independent raters evaluated a random sample of 20 informational brochures in German. All the brochures addressed eczema as a disorder and/or therapy options and care. Intra-rater and inter-rater reliability were assessed by calculating intra-class correlation coefficients, agreement was tested with weighted kappas, and the correlation of the raters’ scores for each instrument was measured with Pearson’s correlation coefficient. DISCERN demonstrated substantial intra- and inter-rater reliability. It also showed slightly better agreement than EQIP. There was a strong correlation of the raters’ scores for both instruments. The findings of this study support the reliability of both DISCERN and EQIP. However, based on the results of the inter-rater reliability, agreement and correlation analyses, we consider DISCERN to be the more precise tool for our project on patient information concerning the treatment and care of eczema. PMID:26440612
Fotina, I; Lütgendorf-Caucig, C; Stock, M; Pötter, R; Georg, D
2012-02-01
Inter-observer studies represent a valid method for the evaluation of target definition uncertainties and contouring guidelines. However, data from the literature do not yet give clear guidelines for reporting contouring variability. Thus, the purpose of this work was to compare and discuss various methods to determine variability on the basis of clinical cases and a literature review. In this study, 7 prostate and 8 lung cases were contoured on CT images by 8 experienced observers. Analysis of variability included descriptive statistics, calculation of overlap measures, and statistical measures of agreement. Cross tables with ratios and correlations were established for overlap parameters. It was shown that the minimal set of parameters to be reported should include at least one of three volume overlap measures (i.e., generalized conformity index, Jaccard coefficient, or conformation number). High correlation between these parameters and scatter of the results was observed. A combination of descriptive statistics, overlap measure, and statistical measure of agreement or reliability analysis is required to fully report the interrater variability in delineation.
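The abstract above recommends reporting at least one volume overlap measure, such as the Jaccard coefficient or the generalized conformity index. As a minimal sketch, assume each observer's contour is stored as a set of voxel indices, and assume the generalized index is the sum of pairwise intersection volumes divided by the sum of pairwise union volumes, a common definition that the abstract does not itself state. The function names and the toy contours are illustrative.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| for two voxel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def generalized_conformity_index(contours):
    """Pairwise intersections over pairwise unions across all observers.

    Assumption: this is the commonly used pairwise definition; the abstract
    names the measure but does not give its formula.
    """
    pairs = list(combinations(contours, 2))
    intersection_total = sum(len(a & b) for a, b in pairs)
    union_total = sum(len(a | b) for a, b in pairs)
    return intersection_total / union_total


# Toy contours from three hypothetical observers (voxel indices, not study data).
obs1 = {1, 2, 3, 4}
obs2 = {3, 4, 5, 6}
obs3 = {1, 2, 3, 4}

print(round(jaccard(obs1, obs2), 3))                      # 0.333
print(generalized_conformity_index([obs1, obs2, obs3]))   # 0.5
```

With only two observers the generalized index reduces to the ordinary Jaccard coefficient, which may partly explain the high correlation between these measures that the abstract reports.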
Tang, Wing Sze; Chow, Yeow Leng; Koh, Serena Siew Lin
2014-02-01
A prospective, descriptive study was conducted in an acute care hospital in Singapore to determine the inter-rater reliability of the modified Morse Fall Scale by evaluating the degrees of agreement on the ratings of the individual items and overall score between the 'gold standard' assessor and the facility assessors. One hundred and forty-two subjects were recruited during the 1.5 month data collection period. The simple and weighted κ-values were all > 0.8 except for the item 'effects of medications' (κ and κw = 0.63), and the correlation coefficient (rs = 0.89) was high and statistically significant (p < 0.001). The modified Morse Fall Scale was shown to be a reliable fall risk assessment tool with a relatively high inter-rater reliability level for the overall score and individual items. This study provides evidence-based psychometric support for the clinical application of this tool. © 2013 Wiley Publishing Asia Pty Ltd.
Gritsiouk, Yaroslav; Hegsted, Damian; Gardiner, Stuart; Merriman, Lisa; Gubler, Kelly Dean
2013-05-01
Little is known about the reliability of data collected by abstractors without professional medical training. This investigation sought to determine the level of agreement among untrained volunteer abstractors as part of a study to evaluate the risk assessment of venous thromboembolism in patients who have undergone trauma. Forty-nine paper charts were chosen randomly from a volunteer-reviewed cohort of 2,339 and were compared with those of a single experienced abstractor. Inter-rater agreement was assessed using percent agreement, Cohen's kappa, and prevalence-adjusted bias-adjusted kappa (PABAK). Of the 71 data points, 28 had perfect agreement. The average agreement across all charts was 97%. Data with imperfect agreement had kappa values between .27 and .96 (mean, .75), with one additional value at zero even though it was associated with an agreement of 94%. PABAK values ranged from .67 to .98 (mean, .91), an average increase of .17 compared with kappa values. The performance of volunteers showed outstanding inter-rater reliability; however, limitations of interpretation can influence reliability. Copyright © 2013 Elsevier Inc. All rights reserved.
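The observation that kappa can be zero despite 94% agreement follows from how Cohen's kappa corrects for chance when one category dominates. The sketch below uses a hypothetical 2x2 table (50 charts, invented for illustration, not the study data) to show kappa and PABAK side by side for a binary item.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 table.

    a: both raters 'yes', b: rater 1 'yes' / rater 2 'no',
    c: rater 1 'no' / rater 2 'yes', d: both raters 'no'.
    """
    n = a + b + c + d
    po = (a + d) / n                              # observed agreement
    pe = ((a + b) / n) * ((a + c) / n) \
        + ((c + d) / n) * ((b + d) / n)           # chance agreement
    if pe == 1.0:
        return float("nan")                       # kappa is undefined here
    return (po - pe) / (1 - pe)


def pabak_2x2(a, b, c, d):
    """Prevalence-adjusted bias-adjusted kappa: 2 * observed agreement - 1."""
    n = a + b + c + d
    return 2 * (a + d) / n - 1


# Hypothetical item: rater 1 always records 'yes', rater 2 disagrees 3 times.
table = (47, 3, 0, 0)
print(cohen_kappa_2x2(*table))   # chance agreement equals observed agreement
print(pabak_2x2(*table))
```

In this invented table the observed agreement is 94%, yet the expected chance agreement is also 94%, so kappa collapses to zero while PABAK stays high.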
Muhamad, Zailani; Ramli, Ayiesah; Amat, Salleh
2015-05-01
The aim of this study was to determine the content validity, internal consistency, test-retest reliability and inter-rater reliability of the Clinical Competency Evaluation Instrument (CCEVI) in assessing the clinical performance of physiotherapy students. This study was carried out between June and September 2013 at University Kebangsaan Malaysia (UKM), Kuala Lumpur, Malaysia. A panel of 10 experts was identified to establish content validity by evaluating and rating each of the items used in the CCEVI with regard to their relevance in measuring students' clinical competency. A total of 50 UKM undergraduate physiotherapy students were assessed throughout their clinical placement to determine the construct validity of these items. The instrument's reliability was determined through a cross-sectional study involving a clinical performance assessment of 14 final-year undergraduate physiotherapy students. The content validity index of the entire CCEVI was 0.91, while the proportion of agreement on the content validity indices ranged from 0.83-1.00. The CCEVI construct validity was established with factor loading of ≥0.6, while internal consistency (Cronbach's alpha) overall was 0.97. Test-retest reliability of the CCEVI was confirmed with a Pearson's correlation range of 0.91-0.97 and an intraclass correlation coefficient range of 0.95-0.98. Inter-rater reliability of the CCEVI domains ranged from 0.59 to 0.97 on initial and subsequent assessments. This pilot study confirmed the content validity of the CCEVI. It also showed high internal consistency and moderate to excellent inter-rater reliability. However, additional refinement in the wording of the CCEVI items, particularly in the domains of safety and documentation, is recommended to further improve the validity and reliability of the instrument.
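Internal consistency of the kind reported above is commonly summarised with Cronbach's alpha. The following is a minimal sketch of the standard formula, assuming complete data with one row per student and one column per item; the scores are invented and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (respondents) by columns (items)."""
    k = len(scores[0])                             # number of items
    item_columns = list(zip(*scores))
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical ratings of 5 students on 3 items (1-5 scale).
ratings = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
print(round(cronbach_alpha(ratings), 3))
```

Alpha approaches 1 when the items rise and fall together across respondents; the sketch does not handle missing responses or a total score with zero variance.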
A study of the reliability of the Nociception Coma Scale.
Riganello, F; Cortese, M D; Arcuri, F; Candelieri, A; Guglielmino, F; Dolce, G; Sannita, W G; Schnakers, C
2015-04-01
In this study, we investigated the reliability of the Nociception Coma Scale which has recently been developed to assess nociception in non-communicative, severely brain-injured patients. Prospective cross-sequential study. Semi-intensive care unit and long-term brain injury care. Forty-four patients diagnosed as being in a vegetative state (n=26) or in a minimally conscious state (n=18). Patients were assessed by two experts (rater A and rater B) on two consecutive weeks to measure inter-rater agreement and test-retest reliability. Total scores and subscores of the Nociception Coma Scale. We performed a total of 176 assessments. The inter-rater agreement was moderate for the total scores (k = 0.57) and fair to substantial for the subscores (0.33 ≤ k ≤ 0.62) on week 2. The test-retest reliability was substantial for the total scores (k = 0.66) and moderate to almost perfect for the subscores (0.53 ≤ k ≤ 0.96) for rater A. The inter-rater agreement was weaker on week 1, whereas the test-retest reliability was lower for the least experienced rater (rater B). This study provides further evidence of the psychometric qualities of the Nociception Coma Scale. Future studies should assess the impact of practical experience and background on administration and scoring of the scale. © The Author(s) 2014.
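The verbal labels used above (fair, moderate, substantial, almost perfect) match the widely used Landis and Koch bands for kappa. The sketch below assumes those conventional cut-offs; the abstract itself does not state which benchmark was applied.

```python
def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement label."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Kappa values quoted in the abstract above.
for k in (0.57, 0.33, 0.62, 0.66, 0.53, 0.96):
    print(k, landis_koch(k))
```

Under these cut-offs the quoted values reproduce the wording of the abstract, for example 0.57 as moderate and 0.96 as almost perfect.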
Karbalaie, Abdolamir; Abtahi, Farhad; Fatemi, Alimohammad; Etehadtavakol, Mahnaz; Emrani, Zahra; Erlandsson, Björn-Erik
2017-09-01
Nailfold capillaroscopy is a practical method for identifying and obtaining morphological changes in capillaries which might reveal relevant information about diseases and health. Capillaroscopy is harmless, and seems simple and repeatable. However, there is a lack of established guidelines and instructions for acquisition as well as the interpretation of the obtained images, which might lead to various ambiguities. In addition, assessment and interpretation of the acquired images are very subjective. In an attempt to overcome some of these problems, in this study a new modified technique for assessment of nailfold capillary density is introduced. The new method is named elliptic broken line (EBL), which is an extension of the two previously known methods by defining clear criteria for finding the apex of capillaries in different scenarios by using a fitted ellipse. A graphical user interface (GUI) is developed for pre-processing, manual assessment of capillary apexes and automatic correction of selected apexes based on the 90° rule. Intra- and inter-observer reliability of EBL and corrected EBL is evaluated in this study. Four independent observers familiar with capillaroscopy performed the assessment for 200 nailfold videocapillaroscopy images, from healthy subjects and systemic lupus erythematosus patients, in two different sessions. The results show elevation from moderate (ICC=0.691) and good (ICC=0.753) agreements to good (ICC=0.750) and good (ICC=0.801) for intra- and inter-observer reliability after automatic correction of EBL. This clearly shows the potential of this method to improve the reliability and repeatability of assessment, which motivates us for further development of an automatic tool for the EBL method. Copyright © 2017 Elsevier Inc. All rights reserved.
Sellers, Ceri; Dall, Philippa; Grant, Margaret; Stansfield, Ben
2016-01-01
Characterisation of free-living physical activity requires the use of validated and reliable monitors. This study reports an evaluation of the validity and reliability of the activPAL3 monitor for the detection of posture and stepping in both adults and young people. Twenty adults (median 27.6 y; IQR 22.6 y) and 8 young people (12.0 y; IQR 4.1 y) performed standardised activities and activities of daily living (ADL) incorporating sedentary, upright and stepping activity. Agreement, specificity and positive predictive value were calculated between activPAL3 outcomes and the gold-standard of video observation. Inter-device reliability was calculated between 4 monitors. Sedentary and upright times for standardised activities were within ±5% of video observation as was step count (excluding jogging) for both adults and young people. Jogging step detection accuracy reduced with increasing cadence >150 steps·min⁻¹. For ADLs, sensitivity to stepping was very low for adults (40.4%) but higher for young people (76.1%). Inter-device reliability was either good (ICC(1,1)>0.75) or excellent (ICC(1,1)>0.90) for all outcomes. An excellent level of detection of standardised postures was demonstrated by the activPAL3. Postures such as seat-perching, kneeling and crouching were misclassified when compared to video observation. The activPAL3 appeared to accurately detect 'purposeful' stepping during ADL, but detection of smaller stepping movements was poor. Small variations in outcomes between monitors indicated that differences in monitor placement or hardware may affect outcomes. In general, the detection of posture and purposeful stepping with the activPAL3 was excellent, indicating that it is a suitable monitor for characterising free-living posture and purposeful stepping activity in healthy adults and young people. Copyright © 2015 Elsevier B.V. All rights reserved.
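The ICC(1,1) form quoted above is the one-way random-effects, single-measurement intraclass correlation. A minimal sketch of that formula follows, assuming a complete table with one row per trial and one column per monitor; the step counts are invented for illustration.

```python
def icc_1_1(data):
    """One-way random-effects, single-measure ICC(1,1).

    data: list of rows (targets), each with k measurements.
    """
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    # Between-target and within-target mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical step counts: 4 trials, each recorded by 4 monitors.
steps = [
    [100, 101, 99, 100],
    [150, 152, 149, 151],
    [200, 198, 201, 199],
    [120, 121, 119, 122],
]
print(round(icc_1_1(steps), 4))
```

The sketch assumes at least two rows, at least two columns, and some variation in the data; it is undefined when every value is identical.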
Fell, Matthew; Meirte, Jill; Anthonissen, Mieke; Maertens, Koen; Pleat, Jonathon; Moortgat, Peter
2016-03-01
Objective scar assessment tools were designed to help identify problematic scars and direct clinical management. Their use has been restricted by their measurement of a single scar property and the bulky size of equipment. The Scarbase Duo(®) was designed to assess both trans-epidermal water loss (TEWL) and colour of a burn scar whilst being compact and easy to use. Twenty patients with a burn scar were recruited and measurements taken using the Scarbase Duo(®) by two observers. The Scarbase Duo(®) measures TEWL via an open-chamber system and undertakes colorimetry via narrow-band spectrophotometry, producing values for relative erythema and melanin pigmentation. Validity was assessed by comparing the Scarbase Duo(®) against the Dermalab(®) and the Minolta Chromameter(®) respectively for TEWL and colorimetry measurements. The intra-class correlation coefficient (ICC) was used to assess reliability with standard error of measurement (SEM) used to assess reproducibility of measurements. The Pearson correlation coefficient (r) was used to assess the convergent validity. The Scarbase Duo(®) TEWL mode had excellent reliability when used on scars for both intra- (ICC=0.95) and inter-rater (ICC=0.96) measurements with moderate SEM values. The erythema component of the colorimetry mode showed good reliability for use on scars for both intra-(ICC=0.81) and inter-rater (ICC=0.83) measurements with low SEM values. Pigmentation values showed excellent reliability on scar tissue for both intra- (ICC=0.97) and inter-rater (ICC=0.97) with moderate SEM values. The Scarbase Duo(®) TEWL function had excellent correlation with the Dermalab(®) (r=0.93) whilst the colorimetry erythema value had moderate correlation with the Minolta Chromameter (r=0.72). The Scarbase Duo(®) is a reliable and objective scar assessment tool, which is specifically designed for burn scars. However, for clinical use, standardised measurement conditions are recommended. Copyright © 2015 Elsevier Ltd and ISBI. 
All rights reserved.
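The standard error of measurement paired with the ICC above is often derived as SD × sqrt(1 − ICC). The sketch below assumes that common formula; the abstract does not state which SEM definition was used, and the standard deviation here is hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), a common definition (assumed here)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


# Hypothetical between-subject SD of 2.0 units, with the inter-rater
# TEWL ICC of 0.96 quoted in the abstract.
print(round(standard_error_of_measurement(2.0, 0.96), 3))
```

Because the SEM is expressed in the units of the measurement, it complements the unitless ICC when judging whether a change between two visits exceeds measurement noise.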
Ammann, Claudia; Lindquist, Martin A; Celnik, Pablo A
It is well known that transcranial direct current stimulation (tDCS) is capable of modulating corticomotor excitability. However, a source of growing concern has been the observed inter- and intra-individual variability of tDCS-responses. Recent studies have assessed whether individuals respond in a predictable manner across repeated sessions of anodal tDCS (atDCS). The findings of these investigations have been inconsistent, and their methods have some limitations (i.e. lack of sham condition or testing only one tDCS intensity). To study inter- and intra-individual variability of atDCS effects at two different intensities on primary motor cortex (M1) excitability. Twelve subjects participated in a crossover study testing 7-min atDCS over M1 in three separate conditions (2 mA, 1 mA, sham) each repeated three times separated by 48 h. Motor evoked potentials were recorded before and after stimulation (up to 30min). Time of testing was maintained consistent within participants. To estimate the reliability of tDCS effects across sessions, we calculated the Intra-class Correlation Coefficient (ICC). AtDCS at 2 mA, but not 1 mA, significantly increased cortical excitability at the group level in all sessions. The overall ICC revealed fair to high reliability of tDCS effects for multiple sessions. Given that the distribution of responses showed important variability in the sham condition, we established a Sham Variability-Based Threshold to classify responses and to track individual changes across sessions. Using this threshold an intra-individual consistent response pattern was then observed only for the 2 mA condition. 2 mA anodal tDCS results in consistent intra- and inter-individual increases of M1 excitability. Copyright © 2017 Elsevier Inc. All rights reserved.
Manzi, Luigi; Villafañe, Jorge Hugo; Indino, Cristian; Tamini, Jacopo; Berjano, Pedro; Usuelli, Federico Giuseppe
2017-11-08
The purpose of this study was to investigate the test-retest reliability of the Phi angle in patients undergoing total ankle replacement (TAR) for end stage ankle osteoarthritis (OA) to assess the rotational alignment of the talar component. Retrospective observational cross-sectional study of prospectively collected data. Post-operative anteroposterior radiographs of the foot of 170 patients who underwent TAR for ankle OA were evaluated. Three physicians measured Phi on the 170 randomly sorted and anonymized radiographs on two occasions, one week apart (test and retest conditions), and inter- and intra-observer agreement were evaluated. Test-retest reliability of Phi angle measurement was excellent for patients with Hintegra TAR (ICC=0.995; p<0.001) and Zimmer TAR (ICC=0.995; p<0.001) on radiographs of subjects with ankle OA. There were no significant differences in the reliability of the Phi angle measurement between patients with Hintegra vs. Zimmer implants (p>0.05). Measurement of the Phi angle on weight-bearing dorsoplantar radiographs showed excellent reliability among orthopaedic surgeons in determining the position of the talar component in the axial plane. Level II, cross sectional study. Copyright © 2017 European Foot and Ankle Society. Published by Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Ghirardelli, Alyssa; Quinn, Valerie; Sugerman, Sharon
2011-01-01
Objective: To develop a retail grocery instrument with weighted scoring to be used as an indicator of the food environment. Participants/Setting: Twenty six retail food stores in low-income areas in California. Intervention: Observational. Main Outcome Measure(s): Inter-rater reliability for grocery store survey instrument. Description of store…
New endoscopic indicator of esophageal achalasia: "pinstripe pattern".
Minami, Hitomi; Isomoto, Hajime; Miuma, Satoshi; Kobayashi, Yasutoshi; Yamaguchi, Naoyuki; Urabe, Shigetoshi; Matsushima, Kayoko; Akazawa, Yuko; Ohnita, Ken; Takeshima, Fuminao; Inoue, Haruhiro; Nakao, Kazuhiko
2015-01-01
Endoscopic diagnosis of esophageal achalasia lacking typical endoscopic features can be extremely difficult. The aim of this study was to identify a simple and reliable early indicator of esophageal achalasia. This single-center retrospective study included 56 cases of esophageal achalasia without previous treatment. As a control, 60 non-achalasia subjects including reflux esophagitis and superficial esophageal cancer were also included in this study. Endoscopic findings were evaluated according to Descriptive Rules for Achalasia of the Esophagus as follows: (1) esophageal dilatation, (2) abnormal retention of liquid and/or food, (3) whitish change of the mucosal surface, (4) functional stenosis of the esophago-gastric junction, and (5) abnormal contraction. Additionally, the presence of the longitudinal superficial wrinkles of esophageal mucosa, "pinstripe pattern (PSP)", was evaluated endoscopically. Then, inter-observer diagnostic agreement was assessed for each finding. The prevalence rates of the above-mentioned findings (1-5) were 41.1%, 41.1%, 16.1%, 94.6%, and 43.9%, respectively. PSP was observed in 60.7% of achalasia cases, while none of the controls showed positivity for PSP. PSP was observed in 26 (62.5%) of 35 cases with a shorter history (< 10 years), which usually lack typical findings such as severe esophageal dilation and tortuosity. Inter-observer agreement level was substantial for food/liquid remnant (k = 0.6861) and PSP (k = 0.6098), and was fair for abnormal contraction and white change. The accuracy, sensitivity, and specificity for achalasia were 83.8%, 64.7%, and 100%, respectively. "Pinstripe pattern" could be a reliable indicator for early discrimination of primary esophageal achalasia.
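Accuracy, sensitivity, and specificity of a binary sign such as PSP are all derived from the same 2x2 table of sign status against diagnosis. The sketch below uses invented counts, not the study's data, purely to show the definitions.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from 2x2 diagnostic counts."""
    return {
        "sensitivity": tp / (tp + fn),   # sign present among diseased
        "specificity": tn / (tn + fp),   # sign absent among non-diseased
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts: 50 patients (30 sign-positive), 50 controls (none positive).
metrics = diagnostic_metrics(tp=30, fp=0, fn=20, tn=50)
print(metrics)
```

A sign that never appears in controls has a specificity of 100% regardless of how often it is missed among patients, which is why sensitivity has to be read alongside it.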
Ventura-Ríos, Lucio; Hernández-Díaz, Cristina; Ferrusquia-Toríz, Diana; Cruz-Arenas, Esteban; Rodríguez-Henríquez, Pedro; Alvarez Del Castillo, Ana Laura; Campaña-Parra, Alfredo; Canul, Efrén; Guerrero Yeo, Gerardo; Mendoza-Ruiz, Juan Jorge; Pérez Cristóbal, Mario; Sicsik, Sandra; Silva Luna, Karina
2017-12-01
This study aims to test the reliability of ultrasound to grade synovitis in static and video images, evaluating grayscale and power Doppler (PD) separately and combined. Thirteen trained rheumatologist ultrasonographers participated in two separate rounds reading 42 images, 15 static and 27 videos, of the 7-joint count [wrist, 2nd and 3rd metacarpophalangeal (MCP), 2nd and 3rd interphalangeal (IPP), 2nd and 5th metatarsophalangeal (MTP) joints]. The images were from six patients with rheumatoid arthritis and were acquired by one ultrasonographer. Synovitis was defined according to OMERACT. The scoring systems for grayscale and PD separately, and combined (GLOESS-Global OMERACT-EULAR Score System), were reviewed before the exercise. Intra- and inter-reading reliability was calculated with weighted Cohen's kappa and interpreted according to Landis and Koch. Kappa values for inter-reading were good to excellent. The lowest kappa was for GLOESS in static images, and the highest was for the same scoring in videos (k 0.59 and 0.85, respectively). Excellent values were obtained for static PD in the 5th MTP joint and for PD video in the 2nd MTP joint. Results for GLOESS in general were good to moderate. Poor agreement was observed in the 3rd MCP and 3rd IPP in all kinds of images. Intra-reading agreement was greater in grayscale and GLOESS in static images than in videos (k 0.86 vs. 0.77 and k 0.86 vs. 0.71, respectively), but for PD it was greater in videos than in static images (k 1.0 vs. 0.79). The reliability of the synovitis scoring through static images and videos is in general good to moderate when using grayscale and PD separately or combined.
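Weighted kappa, as used above for ordinal synovitis grades, penalises a disagreement by how far apart the two grades are. The sketch below implements the usual linear and quadratic disagreement weights; the grades are invented, and the abstract does not say which weighting scheme was applied.

```python
def weighted_kappa(ratings_1, ratings_2, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters on an ordinal scale."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_1)
    observed = [[0] * k for _ in range(k)]
    for r1, r2 in zip(ratings_1, ratings_2):
        observed[index[r1]][index[r2]] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2
    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power   # disagreement weight
            weighted_obs += w * observed[i][j] / n
            weighted_exp += w * row_totals[i] * col_totals[j] / (n * n)
    if weighted_exp == 0:
        return float("nan")   # undefined when no disagreement is expected
    return 1 - weighted_obs / weighted_exp


# Hypothetical synovitis grades (0-3) from two readers on 8 images.
reader_a = [0, 1, 2, 3, 2, 1, 0, 3]
reader_b = [0, 1, 2, 2, 2, 1, 1, 3]
print(round(weighted_kappa(reader_a, reader_b, [0, 1, 2, 3]), 3))
```

With quadratic weights a two-grade disagreement counts four times as much as a one-grade disagreement, so the choice of weighting can change the reported value noticeably.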
Reliability of plain radiographic parameters for developmental dysplasia of the hip in children.
Upasani, Vidyadhar V; Bomar, James D; Parikh, Gaurav; Hosalkar, Harish
2012-07-01
Few studies have evaluated the reliability and reproducibility of the femoral neck-shaft angle (NSA), center-edge angle (CEA), and acetabular index (AI) in young children with developmental dysplasia of the hip (DDH). We wanted to determine whether these parameters could be used reliably by practitioners. Fifty radiographs from 21 children with DDH were reviewed. Analysis was performed by three observers, at two time periods. The intra- and inter-observer reliability for each measure was assessed. At time period one, we noted a "high" level of agreement between observers when measuring the NSA, a "low" level when measuring the CEA, and a "moderate" level when measuring the AI. At time period two, we noted a "very high" level of agreement between observers when measuring the NSA and a "high" level when measuring the CEA and AI. When comparing the measurements of observer 1 at the two different time periods, we noted nearly "very high" agreement when measuring the NSA, a "moderate" agreement when measuring the CEA, and a "high" agreement for the AI. In comparing the measurements of observer 2, we noted "very high" agreement for the NSA and "high" agreement for the CEA and AI. In comparing the measurements for observer 3, we noted nearly "very high" agreement for the NSA, nearly "high" agreement for the CEA, and "high" agreement for the AI. It is difficult to reliably measure three-dimensional pelvic morphology on a frontal plane radiograph, especially when important pelvic landmarks have yet to ossify.
Validity of a smartphone protractor to measure sagittal parameters in adult spinal deformity.
Kunkle, William Aaron; Madden, Michael; Potts, Shannon; Fogelson, Jeremy; Hershman, Stuart
2017-10-01
Smartphones have become an integral tool in the daily life of health-care professionals (Franko 2011). Their ease of use and wide availability often make smartphones the first tool surgeons use to perform measurements. This technique has been validated for certain orthopedic pathologies (Shaw 2012; Quek 2014; Milanese 2014; Milani 2014), but never to assess sagittal parameters in adult spinal deformity (ASD). This study was designed to assess the validity, reproducibility, precision, and efficiency of using a smartphone protractor application to measure sagittal parameters commonly measured in ASD assessment and surgical planning. This study aimed to (1) determine the validity of smartphone protractor applications, (2) determine the intra- and interobserver reliability of smartphone protractor applications when used to measure sagittal parameters in ASD, (3) determine the efficiency of using a smartphone protractor application to measure sagittal parameters, and (4) elucidate whether a physician's level of experience impacts the reliability or validity of using a smartphone protractor application to measure sagittal parameters in ASD. An experimental validation study was carried out. Thirty standard 36″ standing lateral radiographs were examined. Three separate measurements were performed using a marker and protractor; then at a separate time point, three separate measurements were performed using a smartphone protractor application for all 30 radiographs. The first 10 radiographs were then re-measured two more times, for a total of three measurements from both the smartphone protractor and marker and protractor. The parameters included lumbar lordosis, pelvic incidence, and pelvic tilt. Three raters performed all measurements-a junior level orthopedic resident, a senior level orthopedic resident, and a fellowship-trained spinal deformity surgeon. 
All data, including the time to perform the measurements, were recorded, and statistical analysis was performed to determine intra- and interobserver reliability, as well as accuracy, efficiency, and precision. Intra- and interclass correlation coefficients were calculated using R (version 3.3.2, 2016) to determine the degree of intra- and interobserver reliability. High rates of intra- and interobserver reliability were observed between the junior resident, senior resident, and attending surgeon when using the smartphone protractor application, as demonstrated by high inter- and intra-class correlation coefficients greater than 0.909 and 0.874 respectively. High rates of inter- and intraobserver reliability were also seen between the junior resident, senior resident, and attending surgeon when a marker and protractor were used, as demonstrated by high inter- and intra-class correlation coefficients greater than 0.909 and 0.807 respectively. The lumbar lordosis, pelvic incidence, and pelvic tilt values were accurately measured by all three raters, with excellent inter- and intra-class correlation coefficient values. When the first 10 radiographs were re-measured at different time points, a high degree of precision was noted. Measurements performed using the smartphone application were consistently faster than those using a marker and protractor; this difference was statistically significant (p<.05). Adult spinal deformity radiographic parameters can be measured accurately, precisely, reliably, and more efficiently using a smartphone protractor application than with a standard protractor and wax pencil. A high degree of intra- and interobserver reliability was seen between the residents and attending surgeon, indicating measurements made with a smartphone protractor are unaffected by an observer's level of experience. As a result, smartphone protractors may be used when planning ASD surgery. Copyright © 2017 Elsevier Inc. All rights reserved.
Development and evaluation of the nurse quality of communication with patient questionnaire.
Vuković, Mira; Gvozdenović, Branislav S; Stamatović-Gajić, Branka; Ilić, Miodrag; Gajić, Tomislav
2010-01-01
The nurse/patient relationship, as a complex interrelation or as an interaction of patient and nurse factors, has been the subject of a number of studies during the past ten years. Nurse/patient communication is a special entity, usually observed within a framework of the wider nurse/patient relationship. In that regard, we wanted to develop a standardized questionnaire that could reliably measure the quality of communication between nurse and patient, and be used by nurses. The main goal of this study was to develop and evaluate construct validity of the Nurse Quality of Communication with Patient Questionnaire (NQCPQ), as well as to evaluate its reliability. A further goal was to establish a measure of inter-rater reliability, using two repeated measurements of item results and NQCPQ scores on the same observed units by two assessors. The initial NQCPQ, which consisted of 25 items, was filled in by two groups of nurses. Each nurse was questioned during morning and afternoon shifts, in order to evaluate their communication with hospitalized patients, using marks from 1 to 6. To evaluate construct validity, we used principal component analysis, while reliability was assessed using the intraclass correlation coefficient and Cronbach's alpha coefficient. To evaluate inter-rater reliability, we used the Pearson correlation coefficient. Using a group of 118 patients, one component comprising 6 items of the questionnaire explained 86% of the variance in the investigated phenomenon (nurse/patient communication). Inter-item correlation (alpha) in this component was 0.96. The Pearson correlation coefficient was highly significant, with a value of 0.7 by item, and the correlation coefficient for scores at repeated measurements was 0.84. The NQCPQ is a 6-item instrument with high construct validity. It can be used to measure quality of nurse/patient communication in a simple, fast and reliable way. 
It could contribute to more adequate research and defining of this problem, and as such could be used in studies of interaction of psychometric, clinical, biochemical, socio-cultural, demographic and other parameters as well.
Tsuno, Kanami; Yoshimasu, Kouichi; Hayashi, Takashi; Tatsuta, Nozomi; Ito, Yuki; Kamijima, Michihiro; Nakai, Kunihiko
2018-01-01
Nowadays, attention deficit hyperactivity (ADH) problems are observed commonly among school-age children. However, questionnaires specific to ADH behaviors among preschool children are very few. The aim of this study was to investigate the reliability and validity of the 25-item Behavioral Check List (BCL), which was developed from interviews of parents with children who were diagnosed as having attention-deficit/hyperactivity disorder (ADHD) and measures ADH behaviors in preschool age. We recruited 22 teachers from 10 nurseries/kindergartens in Miyagi Prefecture, Japan. A total of 138 preschool children were assessed using the BCL. To investigate inter-rater reliability, two teachers from each facility assessed seven to twenty children in their class, and intraclass correlation coefficients (ICCs) were calculated. The teachers additionally answered questions in the 1½-5 Caregiver-Teacher Report Form (C-TRF) to investigate the criterion validity of the BCL. To investigate structural validity, exploratory factor analysis with promax rotation and confirmatory factor analysis were performed. The internal consistency reliability of the BCL was good (α = 0.92) and correlation analyses also confirmed its excellent criterion validity. Although exploratory factor analysis for the BCL yielded a five-factor model whose factor structure differed from that of the original, the results were similar to the original six factors. The ICCs of the BCL were 0.38-0.99, indicating that inter-rater reliability was not high enough in some facilities. However, it may be possible to improve it by giving raters adequate explanations when using the BCL. The present study showed acceptable levels of reliability and validity of the BCL among Japanese preschool children.
Romero-Franco, Natalia; Jiménez-Reyes, Pedro; Montaño-Munuera, Juan A
2017-11-01
Lower limb isometric strength is a key parameter to monitor the training process or recognise muscle weakness and injury risk. However, valid and reliable methods to evaluate it often require high-cost tools. The aim of this study was to analyse the concurrent validity and reliability of a low-cost digital dynamometer for measuring isometric strength in lower limb. Eleven physically active and healthy participants performed maximal isometric strength for: flexion and extension of ankle, flexion and extension of knee, flexion, extension, adduction, abduction, internal and external rotation of hip. Data obtained by the digital dynamometer were compared with the isokinetic dynamometer to examine its concurrent validity. Data obtained by the digital dynamometer from 2 different evaluators and 2 different sessions were compared to examine its inter-rater and intra-rater reliability. Intra-class correlation (ICC) for validity was excellent in every movement (ICC > 0.9). Intra and inter-tester reliability was excellent for all the movements assessed (ICC > 0.75). The low-cost digital dynamometer demonstrated strong concurrent validity and excellent intra and inter-tester reliability for assessing isometric strength in the main lower limb movements.
Panzer, Stephanie; Mc Coy, Mark R; Hitzl, Wolfgang; Piombino-Mascali, Dario; Jankauskas, Rimantas; Zink, Albert R; Augat, Peter
2015-01-01
The purpose of this study was to develop a checklist for standardized assessment of soft tissue preservation in human mummies based on whole-body computed tomography examinations, and to add a scoring system to facilitate quantitative comparison of mummies. Computed tomography examinations of 23 mummies from the Capuchin Catacombs of Palermo, Sicily (17 adults, 6 children; 17 anthropogenically and 6 naturally mummified) and 7 mummies from the crypt of the Dominican Church of the Holy Spirit of Vilnius, Lithuania (5 adults, 2 children; all naturally mummified) were used to develop the checklist following previously published guidelines. The scoring system was developed by assigning equal scores for checkpoints with equivalent quality. The checklist was evaluated by intra- and inter-observer reliability. The finalized checklist was applied to compare the groups of anthropogenically and naturally mummified bodies. The finalized checklist contains 97 checkpoints and was divided into two main categories, "A. Soft Tissues of Head and Musculoskeletal System" and "B. Organs and Organ Systems", each including various subcategories. The complete checklist had an intra-observer reliability of 98% and an inter-observer reliability of 93%. Statistical comparison revealed significantly higher values in anthropogenically compared to naturally mummified bodies for the total score and for three subcategories. In conclusion, the developed checklist allows for a standardized assessment and documentation of soft tissue preservation in whole-body computed tomography examinations of human mummies. The scoring system facilitates a quantitative comparison of the soft tissue preservation status between single mummies or mummy collections.
Laar, Matilda E; Marquis, Grace S; Lartey, Anna; Gray-Donald, Katherine
2018-02-17
Length measurements are important in growth monitoring and promotion (GMP) for the surveillance of a child's weight-for-length and length-for-age. These two indices provide an indication of a child's risk of becoming wasted or stunted, and are more informative about a child's growth than the widely used weight-for-age index (underweight). Although the introduction of length measurements in GMP is recommended by the World Health Organization, concerns about the reliability of length measurements collected in rural outreach settings have been expressed by stakeholders. Our aim was to describe the reliability and challenges associated with community health personnel measuring length for rural outreach GMP activities. Two reliability studies (A and B), using 10 children less than 24 months each, were conducted in the GMP services of a rural district in Ghana. Fifteen nurses and 15 health volunteers (HV) with no prior experience in length measurements were trained. Intra- and inter-observer technical error of measurement (TEM), average bias from expert anthropometrist, and coefficient of reliability (R) of length measurements were assessed and compared across sessions. Observations and interviews were used to understand the ability and experiences of health personnel with measuring length at outreach GMP. Inter-observer TEM was larger than intra-observer TEM for both nurses and HV at both sessions and was unacceptably (compared to error standards) high in both groups at both time points. Average biases from the expert's measurements were within acceptable limits; however, both groups tended to underestimate length measurements. The R for lengths collected by nurses (92.3%) was higher at session B compared to that of HV (87.5%). Length measurements taken by nurses and HV, and those taken by an experienced anthropometrist at GMP sessions were of moderate agreement (kappa = 0.53, p < 0.0001).
The reliability of length measurements improved after two refresher trainings for nurses but not for HV. In addition, length measurements taken during GMP sessions may be susceptible to errors due to overburdened health personnel and crowded GMP clinics. There is need for both pre- and in-service training of nurses and HV on length measurements and procedures to improve reliability of length measurements.
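The TEM and R statistics named in this abstract can be sketched as follows. This is a minimal illustration, assuming the common two-measurement form TEM = sqrt(sum(d^2) / 2N) and R = 1 - TEM^2 / SD^2; the abstract does not state which formulas the authors used, the function names are hypothetical, and the lengths are invented for illustration.

```python
import math
from statistics import variance


def intra_observer_tem(first, second):
    """TEM for two repeated measurements per child: sqrt(sum(d^2) / (2 * N))."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two equally long, non-empty measurement lists")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))


def coefficient_of_reliability(tem, all_measurements):
    """R = 1 - TEM^2 / SD^2.

    Assumption: SD^2 is approximated by the sample variance of all
    measurements pooled across both passes.
    """
    return 1 - tem ** 2 / variance(all_measurements)


# Hypothetical lengths in cm (not data from the study).
first_pass = [60.0, 65.2, 70.1, 72.4]
second_pass = [60.4, 65.0, 70.5, 72.0]

tem = intra_observer_tem(first_pass, second_pass)
r = coefficient_of_reliability(tem, first_pass + second_pass)
print(f"TEM = {tem:.3f} cm, R = {r:.3f}")
```

A larger TEM means less precise measurement, while R close to 1 means that most of the observed variance reflects real differences between children rather than measurement error.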
Stegeman, Sylvia A; de Witte, Pieter Bas; Boonstra, Sjoerd; de Groot, Jurriaan H; Nagels, Jochem; Krijnen, Pieta; Schipper, Inger B
2016-08-01
Clavicular shortening after fracture is deemed prognostic for clinical outcome and is therefore generally assessed on radiographs. It is used for clinical decision making regarding operative or non-operative treatment in the first 2 weeks after trauma, although the reliability and accuracy of the measurements are unclear. This study aimed to assess the reliability of roentgen photogrammetry (2D) of clavicular length and shortening, and to compare these with 3D-spatial digitization measurements, obtained with an electromagnetic recording system (Flock of Birds). Thirty-two participants with a consolidated non-operatively treated two or multi-fragmented dislocated midshaft clavicular fracture were analysed. Two observers measured clavicular lengths and absolute and proportional clavicular shortening on radiographs taken before and after fracture consolidation. The clavicular lengths were also measured with spatial digitization. Inter-observer agreement on the radiographic measurements was assessed using the Intraclass Correlation Coefficient (ICC). Agreement between the radiographic and spatial digitization measurements was assessed using a Bland-Altman plot. The inter-observer agreement on clavicular length, and absolute and proportional shortening on trauma radiographs was almost perfect (ICC > 0.90), but moderate for absolute shortening after consolidation (ICC = 0.45). The Bland-Altman plot compared measurements of length on AP panorama radiographs with spatial digitization and showed that planar roentgen photogrammetry resulted in up to 37 mm longer and 34 mm shorter measurements than spatial digitization. Measurements of clavicular length on radiographs are highly reliable between observers, but may not reflect the actual length and shortening of the clavicle when compared to length measurements with spatial digitization. We recommend using proportional shortening when measuring clavicular length or shortening on radiographs for clinical decision making.
Copyright © 2015 Elsevier Ltd. All rights reserved.
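The Bland-Altman comparison used in the preceding abstract reduces to a mean difference (bias) and 95% limits of agreement. The sketch below assumes the conventional bias ± 1.96 × SD form; the function name and the paired lengths are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower, upper) using bias +/- 1.96 * SD of the differences."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical clavicular lengths in mm (illustrative only).
radiograph_mm = [150.0, 160.0, 155.0, 170.0]
digitization_mm = [148.0, 163.0, 150.0, 166.0]

bias, lower, upper = bland_altman_limits(radiograph_mm, digitization_mm)
print(f"bias = {bias:.2f} mm, limits of agreement = {lower:.2f} to {upper:.2f} mm")
```

Wide limits of agreement, such as the tens of millimetres described in the abstract, indicate that two methods cannot be used interchangeably even when each is reliable on its own.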
Establishing inter-rater reliability scoring in a state trauma system.
Read-Allsopp, Christine
2004-01-01
Trauma systems rely on accurate Injury Severity Scoring (ISS) to describe trauma patient populations. Twenty-seven (27) Trauma Nurse Coordinators and Data Managers across the New South Wales, Australia, state trauma network were instructed in the uses and techniques of the Abbreviated Injury Scale (AIS) from the Association for the Advancement of Automotive Medicine. The aim is to provide accurate, reliable and valid data for the state trauma network. Four (4) months after the course, a coding exercise was conducted to assess inter-rater reliability. The results show that inter-rater reliability is within accepted international standards.
Inter-rater reliability of measures to characterize the tobacco retail environment in Mexico.
Hall, Marissa G; Kollath-Cattano, Christy; Reynales-Shigematsu, Luz Myriam; Thrasher, James F
2015-01-01
To evaluate the inter-rater reliability of a data collection instrument to assess the tobacco retail environment in Mexico, after major marketing regulations were implemented. In 2013, two data collectors independently evaluated 21 stores in two census tracts, through a data collection instrument that assessed the presence of price promotions, whether single cigarettes were sold, the number of visible advertisements, the presence of signage prohibiting the sale of cigarettes to minors, and characteristics of cigarette pack displays. We evaluated the inter-rater reliability of the collected data, through the calculation of metrics such as intraclass correlation coefficient, percent agreement, Cohen's kappa and Krippendorff's alpha. Most measures demonstrated substantial or perfect inter-rater reliability. Our results indicate the potential utility of the data collection instrument for future point-of-sale research.
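Two of the metrics listed in this abstract, percent agreement and Cohen's kappa, can be sketched for a single binary item coded by two data collectors. This is a minimal illustration: the function names are hypothetical and the store codes are invented, not taken from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same code."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected) / (1 - expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for 10 stores: 1 = single cigarettes sold, 0 = not sold.
collector_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
collector_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

agreement = percent_agreement(collector_a, collector_b)
kappa = cohens_kappa(collector_a, collector_b)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.3f}")
```

Kappa is lower than raw agreement here because it discounts the agreement that two raters with these marginal frequencies would reach by chance.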
Nikolaidis, Pantelis T; Clemente, Filipe M; van der Linden, Cornelis M I; Rosemann, Thomas; Knechtle, Beat
2018-01-01
The objectives of the present study were to examine the validity and reliability of the 10 Hz Johan GPS unit in assessing in-line movement and change of direction. The validity was tested against the criterion measure of 200 m track-and-field (track-and-field athletes, n = 8) and 20 m shuttle run endurance test (female soccer players, n = 20). Intra-unit and inter-unit reliability was tested by intra-class correlation coefficient (ICC) and coefficient of variation (CV), respectively. An analysis of variance examined differences between the GPS measurement and five laps of 200 m at 15 km/h, and a t-test examined differences between the GPS measurement and 20 m shuttle run endurance test. The difference between the GPS measurement and 200 m distance ranged from -0.13 ± 3.94 m (95% CI -3.42; 3.17) in the first lap to 2.13 ± 2.64 m (95% CI -0.08; 4.33) in the fifth lap. A good intra-unit reliability was observed in 200 m (ICC = 0.833, 95% CI 0.535; 0.962). Inter-unit CV ranged from 1.31% (fifth lap) to 2.20% (third lap). The difference between the GPS measurement and 20 m shuttle run endurance test ranged from 0.33 ± 4.16 m (95% CI -10.01; 10.68) in 11.5 km/h to 9.00 ± 5.30 m (95% CI 6.44; 11.56) in 8.0 km/h. A moderate intra-unit reliability was shown in the second and third stage of the 20 m shuttle run endurance test (ICC = 0.718, 95% CI 0.222; 0.898) and good reliability in the fifth, sixth, seventh and eighth (ICC = 0.831, 95% CI -0.229; 0.996). Inter-unit CV ranged from 2.08% (11.5 km/h) to 3.92% (8.5 km/h). Based on these findings, it was concluded that the 10 Hz Johan system offers an affordable, valid and reliable tool for coaches and fitness trainers to monitor training and performance.
Test Assembly Implications for Providing Reliable and Valid Subscores
ERIC Educational Resources Information Center
Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J.
2017-01-01
This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…
Can emergency physicians accurately and reliably assess acute vertigo in the emergency department?
Vanni, Simone; Nazerian, Peiman; Casati, Carlotta; Moroni, Federico; Risso, Michele; Ottaviani, Maddalena; Pecci, Rudi; Pepe, Giuseppe; Vannucchi, Paolo; Grifoni, Stefano
2015-04-01
To validate a clinical diagnostic tool, used by emergency physicians (EPs), to diagnose the central cause of patients presenting with vertigo, and to determine interrater reliability of this tool. A convenience sample of adult patients presenting to a single academic ED with isolated vertigo (i.e. vertigo without other neurological deficits) was prospectively evaluated with STANDING (SponTAneous Nystagmus, Direction, head Impulse test, standiNG) by five trained EPs. The first step focused on the presence of spontaneous nystagmus, the second on the direction of nystagmus, the third on head impulse test and the fourth on gait. The local standard practice, senior audiologist evaluation corroborated by neuroimaging when deemed appropriate, was considered the reference standard. Sensitivity and specificity of STANDING were calculated. On the first 30 patients, inter-observer agreement among EPs was also assessed. Five EPs with limited experience in nystagmus assessment volunteered to participate in the present study enrolling 98 patients. Their average evaluation time was 9.9 ± 2.8 min (range 6-17). Central acute vertigo was suspected in 16 (16.3%) patients. There were 13 true positives, three false positives, 81 true negatives and one false negative, with a high sensitivity (92.9%, 95% CI 70-100%) and specificity (96.4%, 95% CI 93-38%) for central acute vertigo according to senior audiologist evaluation. The Cohen's kappas of the first, second, third and fourth steps of the STANDING were 0.86, 0.93, 0.73 and 0.78, respectively. The whole test showed a good inter-observer agreement (k = 0.76, 95% CI 0.45-1). In the hands of EPs, STANDING showed a good inter-observer agreement and accuracy validated against the local standard of care. © 2015 Australasian College for Emergency Medicine and Australasian Society for Emergency Medicine.
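The sensitivity and specificity point estimates in this abstract follow directly from the four reported counts. The sketch below recomputes them from those counts; the function name is hypothetical, and confidence intervals are left out because the abstract does not say which interval method was used.

```python
def diagnostic_accuracy(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)  # share of true central cases that were detected
    specificity = tn / (tn + fp)  # share of non-central cases correctly ruled out
    return sensitivity, specificity


# Counts reported in the abstract: 13 TP, 3 FP, 81 TN, 1 FN (98 patients).
tp, fp, tn, fn = 13, 3, 81, 1
sensitivity, specificity = diagnostic_accuracy(tp, fp, tn, fn)
suspected_share = (tp + fp) / (tp + fp + tn + fn)

print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
print(f"suspected central vertigo = {suspected_share:.1%}")
```

The recomputed values agree with the reported 92.9% sensitivity, 96.4% specificity and 16.3% of patients with suspected central vertigo.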
Römkens, Tessa E H; Kranenburg, Pim; Tilburg, Arjan van; Bronkhorst, Carolien; Nagtegaal, Iris D; Drenth, Joost P H; Hoentjen, Frank
2018-03-28
Histological remission [HR] is a potential treatment target in ulcerative colitis [UC]. Limited 'real world' data are available on the reliability of histological scoring when assessing minimal histological inflammation. The aim of this study was to investigate the reliability of UC histological scores in colonic biopsies showing mucosal healing [MH] and limited histological inflammation, and to compare the 'daily practice' histological assessment with expert reviews by gastrointestinal [GI] pathologists. We performed a retrospective single-centre study. Colonic biopsies from UC patients with MH [Mayo score ≤ 1] were included. All biopsies assessed in daily practice were reassessed by three blinded GI pathologists using three histological scores (Geboes score [GS], Riley score [RS], Harpaz [Gupta] Index [HGI]) and a global visual scale [GVS]. We evaluated inter- and intra-observer variation between GI pathologists and correlations between scores including the initial histological assessment using Cronbach's alpha and Spearman rho analysis. In total, 270 biopsies from 39 UC patients were included. The inter-observer concordance for all histological indexes was substantial to almost perfect [GS 0.84; HGI 0.61; GVS 0.74, RS 0.91]. Correlation between the RS and GS was almost perfect [R = 0.86], but we found no correlation between the primary histological assessment and reassessment by GI pathologists. Current UC histological scores reliably assess limited histological inflammation in UC patients. The discrepancy between the initial histological assessment and the reassessment by dedicated GI pathologists suggests a gap between daily practice and academic expertise. This issue may limit the implementation of HR as a treatment target for UC in daily practice.
Corner, E J; Wood, H; Englebretsen, C; Thomas, A; Grant, R L; Nikoletou, D; Soni, N
2013-03-01
To develop a scoring system to measure physical morbidity in critical care - the Chelsea Critical Care Physical Assessment Tool (CPAx). The development process was iterative involving content validity indices (CVI), a focus group and an observational study of 33 patients to test construct validity against the Medical Research Council score for muscle strength, peak cough flow, Australian Therapy Outcome Measures score, Glasgow Coma Scale score, Bloomsbury sedation score, Sequential Organ Failure Assessment score, Short Form 36 (SF-36) score, days of mechanical ventilation and inter-rater reliability. Trauma and general critical care patients from two London teaching hospitals. Users of the CPAx felt that it possessed content validity, giving a final CVI of 1.00 (P<0.05). Construct validation data showed moderate to strong significant correlations between the CPAx score and all secondary measures, apart from the mental component of the SF-36 which demonstrated weak correlation with the CPAx score (r=0.024, P=0.720). Reliability testing showed internal consistency of α=0.798 and inter-rater reliability of κ=0.988 (95% confidence interval 0.791 to 1.000) between five raters. This pilot work supports proof of concept of the CPAx as a measure of physical morbidity in the critical care population, and is a cogent argument for further investigation of the scoring system. Copyright © 2012 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
The Reliability of Environmental Measures of the College Alcohol Environment.
ERIC Educational Resources Information Center
Clapp, John D.; Whitney, Mike; Shillington, Audrey M.
2002-01-01
Assesses the inter-rater reliability of two environmental scanning tools designed to identify alcohol-related advertisements targeting college students. Inter-rater reliability for these forms varied across different rating categories and ranged from poor to excellent. Suggestions for future research are addressed. (Contains 26 references and 6…
New Endoscopic Indicator of Esophageal Achalasia: “Pinstripe Pattern”
Minami, Hitomi; Isomoto, Hajime; Miuma, Satoshi; Kobayashi, Yasutoshi; Yamaguchi, Naoyuki; Urabe, Shigetoshi; Matsushima, Kayoko; Akazawa, Yuko; Ohnita, Ken; Takeshima, Fuminao; Inoue, Haruhiro; Nakao, Kazuhiko
2015-01-01
Background and Study Aims: Endoscopic diagnosis of esophageal achalasia lacking typical endoscopic features can be extremely difficult. The aim of this study was to identify a simple and reliable early indicator of esophageal achalasia. Patients and Methods: This single-center retrospective study included 56 cases of esophageal achalasia without previous treatment. As a control, 60 non-achalasia subjects including reflux esophagitis and superficial esophageal cancer were also included in this study. Endoscopic findings were evaluated according to Descriptive Rules for Achalasia of the Esophagus as follows: (1) esophageal dilatation, (2) abnormal retention of liquid and/or food, (3) whitish change of the mucosal surface, (4) functional stenosis of the esophago-gastric junction, and (5) abnormal contraction. Additionally, the presence of the longitudinal superficial wrinkles of esophageal mucosa, “pinstripe pattern (PSP)”, was evaluated endoscopically. Then, inter-observer diagnostic agreement was assessed for each finding. Results: The prevalence rates of the above-mentioned findings (1–5) were 41.1%, 41.1%, 16.1%, 94.6%, and 43.9%, respectively. PSP was observed in 60.7% of achalasia, while none of the controls showed positivity for PSP. PSP was observed in 26 (62.5%) of 35 cases with shorter history < 10 years, which usually lacks typical findings such as severe esophageal dilation and tortuosity. Inter-observer agreement level was substantial for food/liquid remnant (k = 0.6861) and PSP (k = 0.6098), and was fair for abnormal contraction and white change. The accuracy, sensitivity, and specificity for achalasia were 83.8%, 64.7%, and 100%, respectively. Conclusion: “Pinstripe pattern” could be a reliable indicator for early discrimination of primary esophageal achalasia. PMID:25664812
Grant Peer Review: Improving Inter-Rater Reliability with Training
Sattler, David N.; McKnight, Patrick E.; Naney, Linda; ...
2015-06-15
In this study, we developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers, especially those with experience, have good understanding of the grant review rating scale. Our findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. Lastly, the results underscore the benefits of and need for specialized peer reviewer training.
Grant Peer Review: Improving Inter-Rater Reliability with Training.
Sattler, David N; McKnight, Patrick E; Naney, Linda; Mathis, Randy
2015-01-01
This study developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers--especially those with experience--have good understanding of the grant review rating scale. The findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. The results underscore the benefits of and need for specialized peer reviewer training.
Dibai-Filho, Almir V.; Guirro, Elaine C. O.; Ferreira, Vânia T. K.; Brandino, Hugo E.; Vaz, Maíta M. O. L. L.; Guirro, Rinaldo R. J.
2015-01-01
BACKGROUND: Infrared thermography is recognized as a viable method for evaluation of subjects with myofascial pain. OBJECTIVE: The aim of the present study was to assess the intra- and inter-rater reliability of infrared image analysis of myofascial trigger points in the upper trapezius muscle. METHOD: A reliability study was conducted with 24 volunteers of both genders (23 females) between 18 and 30 years of age (22.12±2.54), all having cervical pain and presence of active myofascial trigger point in the upper trapezius muscle. Two trained examiners performed analysis of point, line, and area of the infrared images at two different periods with a 1-week interval. The intra-class correlation coefficient (ICC2,1) was used to assess the intra- and inter-rater reliability. RESULTS: With regard to the intra-rater reliability, ICC values were between 0.591 and 0.993, with temperatures between 0.13 and 1.57 °C for values of standard error of measurement (SEM) and between 0.36 and 4.35 °C for the minimal detectable change (MDC). For the inter-rater reliability, ICC ranged from 0.615 to 0.918, with temperatures between 0.43 and 1.22 °C for the SEM and between 1.19 and 3.38 °C for the MDC. CONCLUSION: The methods of infrared image analyses of myofascial trigger points in the upper trapezius muscle employed in the present study are suitable for clinical and research practices. PMID:25993626
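The SEM and MDC ranges in this abstract are numerically linked. The abstract does not state the formula, so the sketch below assumes the widely used MDC95 = 1.96 × sqrt(2) × SEM and checks it against the four reported SEM/MDC endpoints; the function name is hypothetical.

```python
import math


def mdc95(sem):
    """Minimal detectable change at 95% confidence (assumed formula)."""
    return 1.96 * math.sqrt(2) * sem


# (SEM, reported MDC) endpoints in degrees Celsius, taken from the abstract.
reported_pairs = [(0.13, 0.36), (1.57, 4.35), (0.43, 1.19), (1.22, 3.38)]

for sem, reported in reported_pairs:
    print(f"SEM {sem:.2f} -> MDC95 {mdc95(sem):.2f} (reported {reported:.2f})")
```

Under this assumption the computed values round to the reported MDC figures, which suggests, but does not confirm, that this is the formula the authors applied.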
Measuring the Process and Quality of Informed Consent for Clinical Research: Development and Testing
Cohn, Elizabeth Gross; Jia, Haomiao; Smith, Winifred Chapman; Erwin, Katherine; Larson, Elaine L.
2013-01-01
Purpose/Objectives: To develop and assess the reliability and validity of an observational instrument, the Process and Quality of Informed Consent (P-QIC). Design: A pilot study of the psychometrics of a tool designed to measure the quality and process of the informed consent encounter in clinical research. The study used professionally filmed, simulated consent encounters designed to vary in process and quality. Setting: A major urban teaching hospital in the northeastern region of the United States. Sample: 63 students enrolled in health-related programs participated in psychometric testing, 16 students participated in test-retest reliability, and 5 investigator-participant dyads were observed for the actual consent encounters. Methods: For reliability and validity testing, students watched and rated videotaped simulations of four consent encounters intentionally varied in process and content and rated them with the proposed instrument. Test-retest reliability was established by raters watching the videotaped simulations twice. Inter-rater reliability was demonstrated by two simultaneous but independent raters observing an actual consent encounter. Main Research Variables: The essential elements of information and communication for informed consent. Findings: The initial testing of the P-QIC demonstrated reliable and valid psychometric properties in both the simulated standardized consent encounters and actual consent encounters in the hospital setting. Conclusions: The P-QIC is an easy-to-use observational tool that provides a quick assessment of the areas of strength and areas that need improvement in a consent encounter. It can be used in the initial trainings of new investigators or consent administrators and in ongoing programs of improvement for informed consent. Implications for Nursing: The development of a validated observational instrument will allow investigators to assess the consent process more accurately and evaluate strategies designed to improve it.
PMID:21708532
CLINICAL AUDIT OF IMAGE QUALITY IN RADIOLOGY USING VISUAL GRADING CHARACTERISTICS ANALYSIS.
Tesselaar, Erik; Dahlström, Nils; Sandborg, Michael
2016-06-01
The aim of this work was to assess whether an audit of clinical image quality could be efficiently implemented within a limited time frame using visual grading characteristics (VGC) analysis. Lumbar spine radiography, bedside chest radiography and abdominal CT were selected. For each examination, images were acquired or reconstructed in two ways. Twenty images per examination were assessed by 40 radiology residents using visual grading of image criteria. The results were analysed using VGC. Inter-observer reliability was assessed. The results of the visual grading analysis were consistent with expected outcomes. The inter-observer reliability was moderate to good and correlated with perceived image quality (r² = 0.47). The median observation time per image or image series was within 2 min. These results suggest that the use of visual grading of image criteria to assess the quality of radiographs provides a rapid method for performing an image quality audit in a clinical environment. © The Author 2015. Published by Oxford University Press. All rights reserved.
Saad, Karen Ruggeri; Colombo, Alexandra Siqueira; Ribeiro, Ana Paula; João, Sílvia Maria Amado
2012-04-01
The purpose of this study was to investigate the reliability of photogrammetry in the measurement of the postural deviations in individuals with idiopathic scoliosis. Twenty participants with scoliosis (17 women and three men), with a mean age of 23.1 ± 9 yrs, were photographed from the posterior and lateral views. The postural aspects were measured with CorelDRAW software. High inter-rater and test-retest reliability indices were found. Greater severity of scoliosis was associated with greater variation between the thoracic kyphosis and lumbar lordosis measures obtained by the same examiner from the left lateral view photographs. A greater body mass index (BMI) was associated with greater variability of the trunk rotation measures obtained by two independent examiners from the right lateral view (r = 0.656; p = 0.002). The severity of scoliosis was also associated with greater inter-rater variability in measures of trunk rotation obtained from the left lateral view (r = 0.483; p = 0.036). Photogrammetry proved to be a reliable method for the measurement of postural deviations from the posterior and lateral views of individuals with idiopathic scoliosis and could be employed as a complement to other assessment procedures, which could reduce the number of X-rays used for the follow-up assessments of these individuals. Copyright © 2011 Elsevier Ltd. All rights reserved.
Ay, Ali; Bulut, Hulya
2015-08-01
Many ostomy patients experience peristomal skin lesions. A descriptive study was conducted to assess the validity, usability, and reliability of the Peristomal Skin Lesions Assessment instrument (SACS instrument) adapted to Turkish from English. The SACS Instrument consists of 2 main assessments: lesion type (utilizing definitions and photographs) and lesion area by location around the ostomy. The study was performed in 2 stages: 1) the SACS language was changed and its content validity established; and 2) the instrument's content validity and inter-observer agreement (consistency) were determined among pairs of nurses who used the tool to assess peristomal skin lesions. Patients (included if they were >18 years old and receiving treatment/observation at 1 of the 4 participating stomatherapy units) and 8 stomatherapy nurses also completed appropriate sociodemographic questionnaires. Of the 393 patients screened during the 7-month study, 100 (average age 56.74 ± 14.03 years, 55 men) participated; most (79) had a planned operation. A little more than half (59) of the patients had colorectal cancer and 28 had their stoma site marked preoperatively by a stomatherapy nurse. The most common peristomal skin lesion risk factors were having an ileostomy and unplanned surgery. The content validity index of the entire Turkish SACS instrument was 1, and the inter-observer agreement Kappa statistic was very good (K = 0.90, 95% CI 0.80–0.99). Individual SACS item K values ranged from K = 0.84 (95% CI 0.63–1) to K = 1 (95% CI 1). Most (62.5%) nurses found the terms and pictures used in the SACS classification adequate and suitable, and 50% believed the Turkish version of the SACS instrument was a valid and suitable assessment tool for use by Turkish stomatherapy nurses. Validity and reliability studies involving larger and more diverse patient and nurse samples are warranted.
Goyal, Alka R; Bergh, Sverre; Engedal, Knut; Kirkevold, Marit; Kirkevold, Øyvind
2017-12-01
Dementia-specific anxiety scales in the Norwegian language are lacking; the aim of this study was to investigate the validity and inter-rater reliability of a Norwegian version of the Rating Anxiety in Dementia (RAID-N) scale. The validity of the RAID-N was tested in a sample of 101 patients with dementia from seven Norwegian nursing homes. One psychogeriatrician (n = 50) or a physician with long experience with nursing home patients (n = 51) 'blind' to the RAID-N score diagnosed anxiety according to DSM-5 criteria of generalised anxiety disorder (GAD). A receiver operating characteristic (ROC) analysis assessed the best cut-off point for the RAID-N, and the area under the curve (AUC) was calculated. Inter-rater reliability was tested in a subgroup of 53 patients by intraclass correlation (ICC) and Cohen's kappa. Twenty-eight of 101 (27.7%) met the GAD criteria. The mean RAID-N score for patients with GAD was 16.1 (SD 6.3) and without GAD, 8.8 (SD 6.5) (p < 0.001). A cut-off score of ≥12 on the RAID-N gave a sensitivity of 82.1%, specificity of 70.0%, and 73.3% accuracy in identifying clinically significant GAD in patients with dementia. Inter-rater reliability on overall RAID-N items was good (ICC = 0.82), Cohen's kappa was 0.58 for total RAID-N score, with satisfactory internal consistency (Cronbach's alpha = 0.81). The RAID-N has fairly good validity and inter-rater reliability, and could be useful to assess GAD in patients with dementia. Further studies should investigate the optimal RAID-N cut-off score in different settings.
Psychometric evaluation of a motor control test battery of the craniofacial region.
von Piekartz, H; Stotz, E; Both, A; Bahn, G; Armijo-Olivo, S; Ballenberger, N
2017-12-01
The primary objective of this study was to determine the structural and known-group validity as well as the inter-rater reliability of a test battery to evaluate the motor control of the craniofacial region. Seventy volunteers without TMD and 25 subjects with TMD (Axes I) per the DC/TMD were asked to execute a test battery consisting of eight tests. The tests were video-taped in the same sequence in a standardised manner. Two experienced physical therapists participated in this study as blinded assessors. We used exploratory factor analysis to identify the underlying component structure of the eight tests. Internal consistency (Cronbach's α), inter-rater reliability (intra-class correlation coefficient) and construct validity (i.e., hypothesis testing-known-group validity) (receiver operating curves) were also explored for the test battery. The structural validity showed the presence of one factor underlying the construct of the test battery. The internal consistency was excellent (0.90) as well as the inter-rater reliability. All values of reliability were close to 0.9 or above indicating very high inter-rater reliability. The area under the curve (AUC) was 0.93 for rater 1 and 0.94 for rater 2, respectively, indicating excellent discrimination between subjects with TMD and healthy controls. The results of the present study support the psychometric properties of the test battery to measure motor control of the craniofacial region when evaluated through videotaping. This test battery could be used to differentiate between healthy subjects and subjects with musculoskeletal impairments in the cervical and oro-facial regions. In addition, this test battery could be used to assess the effectiveness of management strategies in the craniofacial region. © 2017 John Wiley & Sons Ltd.
Poncumhak, Puttipong; Saengsuwan, Jiamjit; Amatachaya, Sugalya
2014-01-01
Background/Objectives: More than half of independent ambulatory patients with spinal cord injury (SCI) need a walking device to promote levels of independence. However, long-lasting use of a walking device may have negative impacts on the patients. Using a standard objective test relating to the requirement of a walking device may offer a quantitative criterion to effectively monitor levels of independence of the patients. Therefore, this study investigated (1) the ability of three functional tests, the five times sit-to-stand test (FTSST), timed up and go test (TUGT), and 10-meter walk test (10MWT), to determine the ability to walk without a walking device, and (2) the inter-tester reliability of the tests to assess functional ability in patients with SCI. Methods: Sixty independent ambulatory patients with SCI, who walked with and without a walking device (30 subjects/group), were assessed cross-sectionally for their functional ability using the three tests. The first 20 subjects also participated in the inter-tester reliability test. Results: Completion times of <14 seconds for the FTSST, <18 seconds for the TUGT, and <6 seconds for the 10MWT had good-to-excellent capability to determine the ability of subjects with SCI to walk without a walking device. These tests also showed excellent inter-tester reliability. Conclusions: Clinical evaluation of walking is usually performed using qualitative observation, which makes the results difficult to compare among testers and test intervals. Findings of this study offer a quantitative target criterion, or a clear level of ability at which patients with SCI could possibly walk without a walking device, which would benefit the monitoring process for the patients. PMID:24621030
Thorborg, Kristian; Bandholm, Thomas; Hölmich, Per
2013-03-01
In football, ice-hockey, and track and field, injuries have been predicted, and hip- and knee-strength deficits quantified using hand-held dynamometry (HHD). However, systematic bias exists when testers of different sex and strength perform the measurements. Belt-fixation of the dynamometer may resolve this. The aim of the present study was therefore to examine the inter-tester reliability concerning strength assessments of isometric hip abduction, adduction, flexion, extension and knee-flexion strength, using HHD with external belt-fixation. Twenty-one healthy athletes (6 women), 30 (8.6) (mean (SD)) years of age, were included. Two physiotherapy students (1 female and 1 male) performed all the measurements after careful instruction and procedure training. Isometric hip abduction, adduction, flexion, extension, and knee-flexion strength were tested. The tester-order and hip-action order were randomised. No systematic between-tester differences (bias) were observed for any of the hip or knee actions. The intra-class correlation coefficients (ICC 2.1) ranged from 0.76 to 0.95. Furthermore, standard errors of measurement in per cent (SEM %) ranged from 5 to 11 %, and minimal detectable change in per cent (MDC %) from 14 to 29 % for the different hip and knee actions. The present study shows that isometric hip- and knee-strength measurements have acceptable inter-tester reliability at the group level, when testing strong individuals, using HHD with belt-fixation. This procedure is therefore perfectly suited for the evaluation and monitoring of strong athletes with hip, groin and hamstring injuries, some of the most common and troublesome injuries in sports. Diagnostic, Level III.
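The relationship between the standard error of measurement (SEM) and the minimal detectable change (MDC) reported above can be sketched with the commonly used formulas SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM. The abstract does not state which formulas the authors applied, so this is an assumption, and the function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% level for a difference of two measurements."""
    return 1.96 * math.sqrt(2.0) * sem


# Applied to percentages: an SEM of 5 % gives an MDC of roughly 14 %.
print(round(mdc95(5.0), 1))
print(round(mdc95(11.0), 1))
```

Under these formulas an SEM of 5 % corresponds to an MDC of about 14 %, which agrees with the lower bound in the abstract; an SEM of 11 % gives about 30 %, close to the reported 29 %, a gap that could arise from rounding of the SEM before reporting.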
Assessing Reliability of Medical Record Reviews for the Detection of Hospital Adverse Events.
Ock, Minsu; Lee, Sang-il; Jo, Min-Woo; Lee, Jin Yong; Kim, Seon-Ha
2015-09-01
The purpose of this study was to assess the inter-rater reliability and intra-rater reliability of medical record review for the detection of hospital adverse events. We conducted a two-stage retrospective medical record review of a random sample of 96 patients from one acute-care general hospital. The first stage was an explicit patient record review by two nurses to detect the presence of 41 screening criteria (SC). The second stage was an implicit structured review by two physicians to identify the occurrence of adverse events from the positive cases on the SC. The inter-rater reliability of the two nurses and that of the two physicians were assessed. The intra-rater reliability was also evaluated using a test-retest method approximately two weeks later. In 84.2% of the patient medical records, the nurses agreed as to the necessity for the second stage review (kappa, 0.68; 95% confidence interval [CI], 0.54 to 0.83). In 93.0% of the patient medical records screened by nurses, the physicians agreed about the absence or presence of adverse events (kappa, 0.71; 95% CI, 0.44 to 0.97). When assessing intra-rater reliability, the kappa indices of the two nurses were 0.54 (95% CI, 0.31 to 0.77) and 0.67 (95% CI, 0.47 to 0.87), whereas those of the two physicians were 0.87 (95% CI, 0.62 to 1.00) and 0.37 (95% CI, -0.16 to 0.89). In this study, the medical record review for detecting adverse events showed an intermediate to good level of inter-rater and intra-rater reliability. A well-organized training program for reviewers and clearly defined SC are required to obtain more reliable results in hospital adverse event studies.
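The abstract above reports both raw percent agreement and kappa, and the two can differ considerably because kappa discounts agreement expected by chance. A minimal sketch of Cohen's kappa for two raters making a yes/no judgement follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def cohens_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa from the four cells of a 2x2 agreement table."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n          # raw proportion of agreement
    a_yes = (both_yes + a_yes_b_no) / n          # rater A's "yes" rate
    b_yes = (both_yes + a_no_b_yes) / n          # rater B's "yes" rate
    # Agreement expected if the two raters judged independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical table: 85% raw agreement, 50% expected by chance.
print(cohens_kappa(40, 5, 10, 45))
```

In this made-up table the raters agree on 85% of records, yet kappa is about 0.70, which mirrors the pattern in the abstract where 84.2% agreement accompanies a kappa of 0.68.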
ERIC Educational Resources Information Center
Nordness, Philip D.; Epstein, Michael H.; Cullinan, Douglas; Pierce, Corey D.
2014-01-01
The Emotional and Behavioral Screener (EBS) is a universal screening instrument designed to identify students whose excessive problem behaviors put them at risk of the education disability category of emotional disturbance (ED). This article reports findings from three studies that address the reliability and validity of the EBS. Studies 1 and 2…
ERIC Educational Resources Information Center
Jones, Corinne A.; Hoffman, Matthew R.; Geng, Zhixian; Abdelhalim, Suzan M.; Jiang, Jack J.; McCulloch, Timothy M.
2014-01-01
Purpose: The purpose of this study was to investigate inter- and intrarater reliability among expert users, novice users, and speech-language pathologists with a semiautomated high-resolution manometry analysis program. We hypothesized that all users would have high intrarater reliability and high interrater reliability. Method: Three expert…
Comparison of in vivo 3D cone-beam computed tomography tooth volume measurement protocols.
Forst, Darren; Nijjar, Simrit; Flores-Mir, Carlos; Carey, Jason; Secanell, Marc; Lagravere, Manuel
2014-12-23
The objective of this study is to analyze a set of previously developed and proposed image segmentation protocols for precision in both intra- and inter-rater reliability for in vivo tooth volume measurements using cone-beam computed tomography (CBCT) images. Six 3D volume segmentation procedures were proposed and tested for intra- and inter-rater reliability to quantify maxillary first molar volumes. Ten randomly selected maxillary first molars were measured in vivo in random order three times with 10 days separation between measurements. Intra- and inter-rater agreement for all segmentation procedures was attained using intra-class correlation coefficient (ICC). The highest precision was for automated thresholding with manual refinements. A tooth volume measurement protocol for CBCT images employing automated segmentation with manual human refinement on a 2D slice-by-slice basis in all three planes of space possessed excellent intra- and inter-rater reliability. Three-dimensional volume measurements of the entire tooth structure are more precise than 3D volume measurements of only the dental roots apical to the cemento-enamel junction (CEJ).
Validation of the Dementia Care Assessment Packet-Instrumental Activities of Daily Living
Lee, Seok Bum; Park, Jeong Ran; Yoo, Jeong-Hwa; Park, Joon Hyuk; Lee, Jung Jae; Yoon, Jong Chul; Jhoo, Jin Hyeong; Lee, Dong Young; Woo, Jong Inn; Han, Ji Won; Huh, Yoonseok; Kim, Tae Hui
2013-01-01
Objective: We aimed to evaluate the psychometric properties of the IADL measure included in the Dementia Care Assessment Packet (DCAP-IADL) in dementia patients. Methods: The study involved 112 dementia patients and 546 controls. The DCAP-IADL was scored in two ways: observed score (OS) and predicted score (PS). The reliability of the DCAP-IADL was evaluated by testing its internal consistency, inter-rater reliability and test-retest reliability. Discriminant validity was evaluated by comparing the mean OS and PS between dementia patients and controls by ANCOVA. Pearson or Spearman correlation analysis was performed with other instruments to assess concurrent validity. Receiver operating characteristic curve analysis was performed to examine diagnostic accuracy. Results: Cronbach's α coefficients of the DCAP-IADL were above 0.7. The values in dementia patients were much higher (OS=0.917, PS=0.927), indicating excellent degrees of internal consistency. Inter-rater reliabilities and test-retest reliabilities were statistically significant (p<0.05). PS exhibited higher reliabilities than OS. The mean OS and PS of dementia patients were significantly higher than those of the non-demented group after controlling for age, sex and education level. The DCAP-IADL was significantly correlated with other IADL instruments and MMSE-KC (p<0.001). Areas under the curves of the DCAP-IADL were above 0.9. Conclusion: The DCAP-IADL is a reliable and valid instrument for evaluating instrumental activities of daily living in the elderly, and may also be useful for screening dementia. Moreover, administering PS may enable the DCAP-IADL to overcome the differences in gender, culture and life style that hinder accurate evaluation of the elderly in previous IADL instruments. PMID:24302946
Reliability of Semi-Automated Segmentations in Glioblastoma.
Huber, T; Alber, G; Bette, S; Boeckh-Behrens, T; Gempt, J; Ringel, F; Alberts, E; Zimmer, C; Bauer, J S
2017-06-01
In glioblastoma, quantitative volumetric measurements of contrast-enhancing or fluid-attenuated inversion recovery (FLAIR) hyperintense tumor compartments are needed for an objective assessment of therapy response. The aim of this study was to evaluate the reliability of a semi-automated, region-growing segmentation tool for determining tumor volume in patients with glioblastoma among different users of the software. A total of 320 segmentations of tumor-associated FLAIR changes and contrast-enhancing tumor tissue were performed by different raters (neuroradiologists, medical students, and volunteers). All patients underwent high-resolution magnetic resonance imaging including a 3D-FLAIR and a 3D-MPRage sequence. Segmentations were done using a semi-automated, region-growing segmentation tool. Intra- and inter-rater reliability were assessed by intra-class correlation (ICC). Root-mean-square error (RMSE) was used to determine the precision error. The Dice score was calculated to measure the overlap between segmentations. Semi-automated segmentation showed a high ICC (> 0.985) for all groups, indicating excellent intra- and inter-rater reliability. Significantly smaller precision errors and higher Dice scores were observed for FLAIR segmentations compared with segmentations of contrast enhancement. Single-rater segmentations showed the lowest RMSE for FLAIR of 3.3 % (MPRage: 8.2 %). Both single raters and neuroradiologists had the lowest precision error for longitudinal evaluation of FLAIR changes. Semi-automated volumetry of glioblastoma was reliably performed by all groups of raters, even without neuroradiologic expertise. Interestingly, segmentations of tumor-associated FLAIR changes were more reliable than segmentations of contrast enhancement. In longitudinal evaluations, an experienced rater can reliably and quantitatively detect progressive FLAIR changes of less than 15 %, which could help to detect progressive disease earlier.
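The Dice score mentioned above measures the overlap of two segmentations as twice the size of their intersection divided by the sum of their sizes. The sketch below assumes each segmentation is given as a collection of voxel coordinates; that representation, the function name and the example masks are hypothetical and are not taken from the study's software.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap of two segmentations given as iterables of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty segmentations are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Two hypothetical three-voxel segmentations sharing two voxels.
rater_1 = [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
rater_2 = [(0, 0, 1), (0, 1, 0), (1, 1, 1)]
print(dice_score(rater_1, rater_2))
```

A score of 1 means identical segmentations and 0 means no shared voxels, so higher Dice scores for FLAIR than for contrast enhancement indicate closer agreement between raters on the FLAIR volumes.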
Evidence-based dentistry: analysis of dental anxiety scales for children.
Al-Namankany, A; de Souza, M; Ashley, P
2012-03-09
To review paediatric dental anxiety measures (DAMs) and assess the statistical methods used for validation and their clinical implications. A search of four computerised databases between 1960 and January 2011 associated with DAMs, using pre-specified search terms, to assess the method of validation including the reliability as intra-observer agreement 'repeatability or stability' and inter-observer agreement 'reproducibility' and all types of validity. Fourteen paediatric DAMs were predominantly validated in schools and not in the clinical setting while five of the DAMs were not validated at all. The DAMs that were validated were done so against other paediatric DAMs which may not have been validated previously. Reliability was not assessed in four of the DAMs. However, all of the validated studies assessed reliability which was usually 'good' or 'acceptable'. None of the current DAMs used a formal sample size technique. Diversity was seen between the studies ranging from a few simple pictograms to lists of questions reported by either the individual or an observer. To date there is no scale that can be considered as a gold standard, and there is a need to further develop an anxiety scale with a cognitive component for children and adolescents.
Øhre, Beate; Saltnes, Hege; von Tetzchner, Stephen; Falkum, Erik
2014-05-22
There is a need for psychiatric assessment instruments that enable reliable diagnoses in persons with hearing loss who have sign language as their primary language. The objective of this study was to assess the validity of the Norwegian Sign Language (NSL) version of the Mini International Neuropsychiatric Interview (MINI). The MINI was translated into NSL. Forty-one signing patients consecutively referred to two specialised psychiatric units were assessed with a diagnostic interview by clinical experts and with the MINI. Inter-rater reliability was assessed with Cohen's kappa and "observed agreement". There was 65% agreement between MINI diagnoses and clinical expert diagnoses. Kappa values indicated fair to moderate agreement, and observed agreement was above 76% for all diagnoses. The MINI diagnosed more co-morbid conditions than did the clinical expert interview (mean diagnoses: 1.9 versus 1.2). Kappa values indicated moderate to substantial agreement, and "observed agreement" was above 88%. The NSL version performs similarly to other MINI versions and demonstrates adequate reliability and validity as a diagnostic instrument for assessing mental disorders in persons who have sign language as their primary and preferred language.
Al-Amiry, Bariq; Mahmood, Sarwar; Krupic, Ferid; Sayed-Noor, Arkan
2017-09-01
Background: Restoration of femoral offset (FO) and leg length is an important goal in total hip arthroplasty (THA) as it improves functional outcome. Purpose: To analyze whether the problem of postoperative leg lengthening and FO reduction is related to the femoral stem or acetabular cup positioning or both. Material and Methods: Between September 2010 and April 2013, 172 patients with unilateral primary osteoarthritis treated with THA were included. Postoperative leg-length discrepancy (LLD) and global FO (summation of cup and FO) were measured by two observers using a standardized protocol for evaluation of antero-posterior plain hip radiographs. Patients with postoperative leg lengthening ≥10 mm (n = 41) or with reduced global FO >5 mm (n = 58) were further studied by comparing the stem and cup length of the operated side with the contralateral side in the lengthening group, and by comparing the stem and cup offset of the operated side with the contralateral side in the FO reduction group. We also evaluated the inter-observer and intra-observer reliability of the radiological measurements. Results: Both observers found that leg lengthening was related to the stem positioning while FO reduction was related to the positioning of both the femoral stem and acetabular cup. Both inter-observer reliability and intra-observer reproducibility were moderate to excellent (intra-class correlation coefficient, ICC ≥0.69). Conclusion: Post-THA leg lengthening was mainly caused by improper femoral stem positioning while global FO reduction resulted from improper positioning of both the femoral stem and the acetabular cup.
Inter-Rater Reliability and Intra-Rater Reliability of Assessing the 2-Minute Push-Up Test.
Fielitz, Lynn; Coelho, Jeffrey; Horne, Thomas; Brechue, William
2016-02-01
The purpose of this study was to assess inter-rater reliability and intra-rater reliability of the 2-minute, 90° push-up test as utilized in the Army Physical Fitness Test. Analysis of rater assessment reliability included both total score agreement and agreement across individual push-up repetitions. This study utilized 8 Raters who assessed 15 different videotaped push-up performances over 4 iterations separated by a minimum of 1 week. The 15 push-up participants were videotaped during the semiannual Army Physical Fitness Test. Each Rater randomly viewed the 15 push-up performances and verbally responded with a "yes" or "no" to each push-up repetition. The data generated were analyzed using the Pearson product-moment correlation as well as the kappa, modified kappa and the intra-class correlation coefficient (3,1). An attribute agreement analysis was conducted to determine the percent of inter-rater and intra-rater agreement across individual push-ups. The results indicated that Raters varied a great deal in assessing push-ups. Over the 4 trials of 15 participants, the overall scores of the Raters varied between 3.0 and 35.7 push-ups. Post hoc comparisons found that there was a significant increase in the grand mean of push-ups from trials 1-3 to trial 4 (p < 0.05). Also, there was a significant difference among raters over the 4 trials (p < 0.05). Pearson correlation coefficients for inter-rater reliability were between 0.10 and 0.97. Intra-rater coefficients were between 0.48 and 0.99. Intra-rater agreement for individual push-up repetitions ranged from 41.8% to 84.8%. The results indicated that the raters failed to assess the same push-up repetition with the same score (below 70% agreement) and also failed to agree when compared between raters (29%).
Interestingly, as previously mentioned, scores on trial 4 increased significantly, which might have been caused by rater drift or by the Raters not maintaining the push-up standard over the trials. It does appear that the final push-up scores received by each participant were a close approximation of actual performance (within 65%), but when assessing physical performance for retention in the Army, a more reliable test might be considered. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.
Henrique-Araújo, Ricardo; Osório, Flávia L; Gonçalves Ribeiro, Mônica; Soares Monteiro, Ivandro; Williams, Janet B W; Kalali, Amir; Alexandre Crippa, José; Oliveira, Irismar Reis De
2014-07-01
GRID-HAMD is a semi-structured interview guide developed to overcome flaws in HAM-D, and has been incorporated into an increasing number of studies. The aims of this study were to carry out the transcultural adaptation of GRID-HAMD into the Brazilian Portuguese language, evaluate the inter-rater reliability of this instrument and the impact of training upon this measure, and verify the raters' opinions of said instrument. The transcultural adaptation was conducted by appropriate methodology. Inter-rater reliability was measured using videos that were evaluated by 85 professionals before and after training in the use of this instrument. The intraclass correlation coefficient (ICC) remained between 0.76 and 0.90 for GRID-HAMD-21 and between 0.72 and 0.91 for GRID-HAMD-17. The training did not have an impact on the ICC, except for a few groups of participants with a lower level of experience. Most of the participants showed high acceptance of GRID-HAMD when compared to other versions of HAM-D. The scale presented adequate inter-rater reliability even before training began. Training did not have an impact on this measure, except for a few groups with less experience. GRID-HAMD received favorable opinions from most of the participants.
Calhoun Thielen, C; Sadowsky, C; Vogel, L C; Taylor, H; Davidson, L; Bultman, J; Gaughan, J; Mulcahey, M J
2017-05-01
Mixed methods were used in this study. The appropriateness of the levels of the Walking Index for Spinal Cord Injury II (WISCI-II) for application in children was critically reviewed by physical therapists using the Modified Delphi Technique, and the inter- and intra-rater reliability of the WISCI-II in children was evaluated. To examine the construct validity, and to establish reliability of the WISCI-II related to its use in children with spinal cord injury (SCI). United States of America. Using a Modified Delphi Technique, physical therapists critically reviewed the WISCI-II levels for pediatric utilization. Concurrently, ambulatory children under age 18 years with SCI were evaluated using the WISCI-II on two occasions by the same therapist to establish intra-rater reliability. One trial was photographed and de-identified. Each photograph was reviewed by four different physical therapists who gave WISCI-II scores to establish inter-rater reliability. Summary and descriptive statistics were used to calculate the frequency of yes/no responses for each WISCI-II level question and to determine the percent agreement for each question. Inter- and intra-rater reliability was calculated using intraclass correlation coefficients (ICCs) with 95% confidence intervals (CI). Construct validity was confirmed after one Delphi round during which at least 80% agreement was established by 51 physical therapists on the appropriateness of the WISCI-II levels for children. Fifty-two children with SCI aged 2-17 years completed repeated WISCI-II assessments and 40 de-identified photographs were scored by four physical therapists. Intra- and inter-rater reliability was high (ICC=0.997, CI=0.995-0.998 and ICC=0.97, CI=0.95-0.98, respectively). This study demonstrates support for the use of the WISCI-II in ambulatory children with SCI.
This study was funded by the Craig H Neilsen Foundation, Spinal Cord Injury Research on the Translation Spectrum, Senior Research Award #282592 (Mulcahey, PI).
Inter-Rater Reliability and Validity of the Australian Football League’s Kicking and Handball Tests
Cripps, Ashley J.; Hopper, Luke S.; Joyce, Christopher
2015-01-01
Talent identification tests used at the Australian Football League’s National Draft Combine assess the capacities of athletes to compete at a professional level. Tests created for the National Draft Combine are also commonly used for talent identification and athlete development in development pathways. The skills tests created by the Australian Football League required players to either handball (striking the ball with the hand) or kick to a series of 6 randomly generated targets. Assessors subjectively rate each skill execution, giving a 0-5 score for each disposal. This study aimed to investigate the inter-rater reliability and validity of the skills tests at an adolescent sub-elite level. Male Australian footballers were recruited from sub-elite adolescent teams (n = 121, age = 15.7 ± 0.3 years, height = 1.77 ± 0.07 m, mass = 69.17 ± 8.08 kg). The coaches (n = 7) of each team were also recruited. Inter-rater reliability was assessed using intra-class correlations (ICC) and Limits of Agreement statistics. Both the kicking (ICC = 0.96, p < .01) and handball tests (ICC = 0.89, p < .01) demonstrated strong reliability and acceptable levels of absolute agreement. Content validity was determined by examining the test scores’ sensitivity to laterality and distance. Concurrent validity was assessed by comparing coaches’ perceptions of skill to actual test outcomes. Multivariate analysis of variance (MANOVA) examined the main effect of laterality, with scores on the dominant hand (p = .04) and foot (p < .01) significantly higher compared to the non-dominant side. Follow-up univariate analysis reported significant differences at every distance in the kicking test. A poor correlation was found between coaches’ perceptions of skill and testing outcomes. The results of this study show that both skill tests have acceptable inter-rater reliability.
Partial content validity was confirmed for the kicking test; however, further research is required to confirm the validity of the handball test. Key points: The skill tests created by the AFL demonstrated acceptable levels of relative and absolute inter-rater reliability. Both of the AFL’s skills tests are able to differentiate between athletes’ dominant and non-dominant limbs. However, only the kicking test could consistently differentiate between score outcomes over a range of Australian Football specific disposal distances. Both tests demonstrated poor concurrent validity, with no correlation found between coaches’ perceptions of technical skills and actual skill outcomes measured. PMID:26336356
Zhou, Yuefang; Black, Rolf; Freeman, Ruth; Herron, Daniel; Humphris, Gerry; Menzies, Rachel; Quinn, Sandra; Scott, Lesley; Waller, Annalu
2014-11-01
The VR-CoDES has been previously applied in the dental context. However, we know little about how dental patients with intellectual disabilities (ID) and complex communication needs express their emotional distress during dental visits. This is the first study to explore the applicability of the VR-CoDES to a dental context involving patients with ID. Fourteen dental consultations were video recorded and coded using the VR-CoDES, assisted by the additional guidelines for the VR-CoDES in a dental context. Both inter- and intra-coder reliabilities were checked on the seven consultations where cues were observed. Sixteen cues (eight non-verbal) were identified within seven of the 14 consultations. Twenty responses were observed (12 reducing space) with four multiple responses. Cohen's kappa values were 0.76 (inter-coder) and 0.88 (intra-coder). With the additional guidelines, cues and responses were reliably identified. Cue expression was exhibited by non-verbal expression of emotion with people with ID in the literature. Further guidance is needed to improve the coding accuracy on multiple providers' responses and to investigate potential impacts of conflicting responses on patients. The findings provide a useful initial step towards an ongoing exploration of how healthcare providers identify and manage emotional distress of patients with ID. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Development and evaluation of an instrument for assessing brief behavioral change interventions.
Strayer, Scott M; Martindale, James R; Pelletier, Sandra L; Rais, Salehin; Powell, Jon; Schorling, John B
2011-04-01
To develop an observational coding instrument for evaluating the fidelity and quality of brief behavioral change interventions based on the behavioral theories of the 5 A's, Stages of Change and Motivational Interviewing. Content and face validity were assessed prior to an intervention where psychometric properties were evaluated with a prospective cohort of 116 medical students. Properties assessed included the inter-rater reliability of the instrument, internal consistency of the full scale and sub-scales and descriptive statistics of the instrument. Construct validity was assessed based on students' scores. Inter-rater reliability for the instrument was 0.82 (intraclass correlation). Internal consistency for the full scale was 0.70 (KR20). Internal consistencies for the sub-scales were as follows: MI intervention component (KR20=.7); stage-appropriate MI-based intervention (KR20=.55); MI spirit (KR20=.5); appropriate assessment (KR20=.45) and appropriate assisting (KR20=.56). The instrument demonstrated good inter-rater reliability and moderate overall internal consistency when used to assess the performance of brief behavioral change interventions by medical students. This practical instrument can be used with minimal training and demonstrates promising psychometric properties when evaluated with medical students counseling standardized patients. Further testing is required to evaluate its usefulness in clinical settings. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
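The KR20 values quoted above refer to the Kuder-Richardson formula 20, an internal-consistency index for items scored 0 or 1: KR20 = k/(k-1) x (1 - sum(p_i x q_i) / variance of total scores), where k is the number of items, p_i the proportion scoring 1 on item i, and q_i = 1 - p_i. The sketch below assumes the population variance of the total scores is used (conventions differ on this point); the function name and the response matrix are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a list of per-person lists of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    # Sum over items of p * q, the variance of each dichotomous item.
    item_variance_sum = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in responses) / n_people
        item_variance_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical checklist: four observed encounters, three yes/no items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(example))
```

Short sub-scales tend to produce lower KR20 values than the full scale, which is one plausible reading of the sub-scale figures of .45 to .7 reported above.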
Noh, Dong Koog; Koh, Jae-Hyun; You, Joshua Sung-H
2016-01-01
The purpose of this study was to determine intertester and intratester reliability of ultrasound measurements of bilateral diaphragm excursions in the thoracic and thoracolumbar spinal curves of 31 females with adolescent idiopathic scoliosis (AIS) (mean age = 14.1 ± 1.8 years). Subjects were tested during tidal breathing using real-time ultrasound imaging with a 3.5 MHz curvilinear transducer. There were no significant differences in intratester and intertester reliability values in bilateral diaphragmatic excursions measured at the thoracolumbar spinal curve, whereas significant differences were observed in measurements taken at the thoracic spinal curve (p < 0.05). Overall, the intertester and intratester reliabilities of the thoracic and thoracolumbar curves in AIS ranged from 0.764 to 0.998. These findings suggest that ultrasound imaging is highly reliable between and within testers and is useful to precisely discriminate pathological diaphragm movement in idiopathic thoracic scoliosis and idiopathic thoracolumbar scoliosis.
Standard setting: comparison of two methods.
George, Sanju; Haque, M Sayeed; Oyebode, Femi
2006-09-14
The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method. The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 4th-year medical students on a multiple-choice examination (MCQ). Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm reference and the Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method. The pass rate with the norm-reference method was 85% (66/78) and that by the Angoff method was 100% (78 out of 78). The percentage agreement between Angoff method and norm-reference was 78% (95% CI 69% - 87%). The modified Angoff method had an inter-rater reliability of 0.81-0.82 and a test-retest reliability of 0.59-0.74. There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
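The norm-reference rule described above (pass mark = mean minus 1 SD of the cohort's raw scores) can be sketched as follows. The abstract does not say whether the sample or population standard deviation was used, so the sample SD here is an assumption; the function names and scores are hypothetical.

```python
from statistics import mean, stdev


def norm_reference_cut(scores):
    """Pass mark set at the cohort mean minus one (sample) standard deviation."""
    return mean(scores) - stdev(scores)


def pass_rate(scores, cut):
    """Proportion of candidates scoring at or above the pass mark."""
    return sum(score >= cut for score in scores) / len(scores)


# Hypothetical raw scores for five candidates.
example_scores = [40, 50, 60, 70, 80]
cut = norm_reference_cut(example_scores)
print(round(cut, 2), pass_rate(example_scores, cut))
```

Because the pass mark moves with the cohort, a norm-referenced standard will usually fail some fixed share of candidates, whereas a criterion-referenced standard such as the modified Angoff method can pass everyone. That is consistent with the 66/78 (about 85%) versus 78/78 pass rates reported above.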
Lee, Michele D; Kaidonis, Georgia; Kim, Alice Y; Shields, Ryan A; Leng, Theodore
2017-09-01
Choroidal nevi are common benign intraocular tumors with a small risk of malignant transformation. This retrospective study investigates the use of en face spectral-domain optical coherence tomography angiography (SD-OCTA) in determining the clinical features and measurement of choroidal nevi. Patients with choroidal nevi were imaged with both OCTA and a fundus photography device. Greatest longitudinal dimension (GLD), perpendicular dimension (PD), and the GLD/PD ratio were assessed on each device. Inter-device variation and intra- and inter-rater reliability analyses were performed. Fourteen patients with choroidal nevi were included. No significant difference between the GLD/PD ratio as measured by all three devices was found (Chi-square = 2.8, 2 df, P = .247). Intraclass correlation coefficients were greater than 0.7 for repeated measures on all devices, suggesting good repeatability and reproducibility. This study demonstrated inter-device consistency and high intra- and inter-rater reliability when measuring choroidal nevi. [Ophthalmic Surg Lasers Imaging Retina. 2017;48:741-747.]. Copyright 2017, SLACK Incorporated.
Park, Dae-Sung; Lee, GyuChang
2014-06-10
A balance test provides important information such as the standard to judge an individual's functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment. PMID:24912769
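The two outcome measures named above, COP path length and COP velocity, can be sketched as follows. This is a minimal illustration that assumes the COP is sampled as (x, y) points at a fixed rate; the function names and the sampling-rate parameter are hypothetical and are not taken from the authors' software.

```python
import math

def cop_path_length(points):
    """Sum of straight-line distances between consecutive COP samples."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def cop_mean_velocity(points, sampling_rate_hz):
    """Path length divided by the recording duration in seconds."""
    duration = (len(points) - 1) / sampling_rate_hz
    return cop_path_length(points) / duration
```

Both quantities are in whatever length unit the board reports, so values from two devices are comparable only after the coordinates are expressed in the same unit.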
Lim, J X; Toh, R X; Chook, S K H; Sebastin, S J; Karjalainen, T
2014-06-01
Previous studies have established the role of quantitative measurements of palmar abduction strength of the thumb (PAST). This study compares the reliability of the 'make' versus the 'break' test in measuring PAST in healthy volunteers. In a 'make' test, the body part being tested is positioned at the start of its range of motion and the participant is asked to exert his/her maximal force. In a 'break' test, increasing force is applied to a body part after it has completed its range of motion, until the joint being tested gives way. PAST was measured in both hands in 100 healthy volunteers using a handheld device. Two examiners measured PAST using both the 'make' and 'break' tests to determine inter-rater reliability. The tests were repeated in 30 volunteers 6 weeks after the initial testing to determine intra-rater reliability. Our results showed that the 'make' test has better inter- and intra-rater reliability.
Narrow Band Imaging Enhances the Detection Rate of Penetration and Aspiration in FEES.
Nienstedt, Julie C; Müller, Frank; Nießen, Almut; Fleischer, Susanne; Koseki, Jana-Christiane; Flügel, Till; Pflug, Christina
2017-06-01
Narrow band imaging (NBI) is widely used in gastrointestinal, laryngeal, and urological endoscopy. Its original purpose was to visualize vessels and epithelial irregularities. Based on our observation that adding NBI to common white light (WL) improves the contrast of the test bolus in fiberoptic endoscopic evaluation of swallowing (FEES), we investigated the potential value of NBI in swallowing disorders. 148 FEES images were analyzed from 74 consecutive patients with swallowing disorders, including 74 with and 74 without NBI. All images were evaluated by four dysphagia specialists. Findings were classified according to Rosenbek's penetration-aspiration scale modified for evaluating these FEES images. Intra- and inter-rater reliability was determined as well as observer confidence. A better visualization of the bolus is the main advantage of NBI in FEES. This generally leads to sharper optical contrasts and better detection of small bolus quantities. Accordingly, NBI enhances the detection rate of penetration and aspiration. On average, identification of laryngeal penetration increased from 40 to 73% and of aspiration from 13 to 24% (each p < 0.01) of patients. In contrast to WL alone, the use of NBI also markedly increased the inter- and intra-rater reliability (p < 0.01) and the rating confidence of all experts (p < 0.05). NBI is an easy and cost-effective tool simplifying dysphagia evaluation and shortening FEES evaluation time. It leads to a markedly higher detection rate of pathological findings. The significantly better intra- and inter-rater reliability argues further for a better overall reproducibility of FEES interpretation.
Ultrasound measures of tendon thickness: Intra-rater, Inter-rater and Inter-machine reliability.
Del Baño-Aledo, María Elena; Martínez-Payá, Jacinto Javier; Ríos-Díaz, José; Mejías-Suárez, Silvia; Serrano-Carmona, Sergio; de Groot-Ferrando, Ana
2017-01-01
Ultrasound imaging is often used by physiotherapists and other healthcare professionals but the reliability of image acquisition with different ultrasound machines is unknown. The objective was to compare the intra-rater, inter-rater and inter-machine reliability of thickness measurements of the plantar fascia (PF), Achilles tendon (AT), patellar tendon (PT) and elbow common extensor tendon (ECET) with musculoskeletal ultrasound imaging (MSUS). Tendon thickness was measured in four anatomical structures (14 participants, 28 images per tendon) by two sonographers and with two different ultrasound machines. Intraclass Correlation Coefficients (ICCs) and Bland-Altman plots were calculated. The standard error of measurement (SEM) and minimum detectable difference (MDD) were calculated. Inter-rater reliability was excellent for AT (ICC=0.98; 95% CI= 0.96-0.99) and very good for PT (ICC=0.85; 95% CI = 0.67-0.93) and ECET (ICC=0.81; 95% CI= 0.72-0.94). Reliability for PF was moderate, with an ICC of 0.63 (95% CI= 0.20-0.83). Bland-Altman plots for inter-machine reliability showed a mean difference of 1 µm for PF measurements and mean differences of 4 µm and 20 µm for AT and PT. The relative SEMs were below 7% and the MDDs were below 0.7 mm. The MSUS reliability in measuring thickness of the four tendons is confirmed by the homogeneous readings within sonographers, between operators and between different machines. Tendon thickness can be measured reliably on different ultrasound devices, which is an important step forward in the use of this technique in daily clinical practice and research. Level of evidence: III.
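The SEM and MDD reported above are commonly derived from the ICC and the between-subject standard deviation. The sketch below uses the widely used textbook formulas (SEM = SD × √(1 − ICC), MDD at 95% = 1.96 × √2 × SEM); whether the authors used exactly these forms is an assumption, and the function names are hypothetical.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and the ICC."""
    return sd * math.sqrt(1.0 - icc)

def mdd_95(sem):
    """Minimum detectable difference at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem
```

A change between two measurements smaller than the MDD cannot be distinguished from measurement error at that confidence level.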
ERIC Educational Resources Information Center
Smith, Stacey L.; Vannest, Kimberly J.; Davis, John L.
2011-01-01
The reliability of data is a critical issue in decision-making for practitioners in the school. Percent Agreement and Cohen's kappa are the two most widely reported indices of inter-rater reliability; however, a recent Monte Carlo study on the reliability of multi-category scales found other indices to be more trustworthy given the type of data…
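For reference, the two indices named above can be computed for two raters as in the sketch below. The function names and any example labels are hypothetical; the formulas are the standard definitions (observed agreement, and agreement corrected for the chance agreement implied by each rater's marginal frequencies).

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of items on which the two raters give the same label."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa: agreement beyond what the marginals predict by chance.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters use a single identical category throughout.
    """
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[c] * c2[c] for c in c1) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)
```

Percent agreement ignores chance agreement entirely, which is one reason the two indices can diverge sharply on scales with few or unbalanced categories.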
Chen, Qing; Zhang, Jinxiu; Hu, Ze
2017-01-01
This article investigates the dynamic topology control problem of satellite cluster networks (SCNs) in Earth observation (EO) missions by applying a novel metric of stability for inter-satellite links (ISLs). The properties of the periodicity and predictability of satellites’ relative position are involved in the link cost metric which is to give a selection criterion for choosing the most reliable data routing paths. Also, a cooperative work model with reliability is proposed for the situation of emergency EO missions. Based on the link cost metric and the proposed reliability model, a reliability assurance topology control algorithm and its corresponding dynamic topology control (RAT) strategy are established to maximize the stability of data transmission in the SCNs. The SCNs scenario is tested through some numeric simulations of the topology stability of average topology lifetime and average packet loss rate. Simulation results show that the proposed reliable strategy applied in SCNs significantly improves the data transmission performance and prolongs the average topology lifetime. PMID:28241474
Ladwig, R; Vigo, A; Fedeli, L M G; Chambless, L E; Bensenor, I; Schmidt, M I; Vidigal, P G; Castilhos, C D; Duncan, B B
2016-08-01
Multi-center epidemiological studies must ascertain that their measurements are accurate and reliable. For laboratory measurements, reliability can be assessed through investigation of reproducibility of measurements in the same individual. In this paper, we present results from the quality control analysis of the baseline laboratory measurements from the ELSA-Brasil study. The study enrolled 15,105 civil servants at 6 research centers in 3 regions of Brazil between 2008-2010, with multiple biochemical analytes being measured at a central laboratory. Quality control was ascertained through standard laboratory evaluation of intra- and inter-assay variability and test-retest analysis in a subset of randomly chosen participants. An additional sample of urine or blood was collected from these participants, and these samples were handled in the same manner as the original ones, locally and at the central laboratory. Reliability was assessed with the intraclass correlation coefficient (ICC), estimated through a random effects model. Coefficients of variation (CV) and Bland-Altman plots were additionally used to assess measurement variability. Laboratory intra and inter-assay CVs varied from 0.86% to 7.77%. From test-retest analyses, the ICCs were high for the majority of the analytes. Notably lower ICCs were observed for serum sodium (ICC=0.50; 95%CI=0.31-0.65) and serum potassium (ICC=0.73; 95%CI=0.60-0.83), due to the small biological range of these analytes. The CVs ranged from 1 to 14%. The Bland-Altman plots confirmed these results. The quality control analyses showed that the collection, processing and measurement protocols utilized in the ELSA-Brasil produced reliable biochemical measurements.
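Two of the variability measures mentioned above, the coefficient of variation and the Bland-Altman limits of agreement, can be sketched as follows. The function names are hypothetical, and the use of the sample standard deviation and the conventional 1.96 multiplier are assumptions rather than details taken from the study.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV in percent: sample SD relative to the mean."""
    return 100.0 * stdev(values) / mean(values)

def bland_altman_limits(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread
```

Unlike the ICC, neither measure depends on how widely the analyte varies between individuals, which is consistent with the abstract's remark that a narrow biological range lowers the ICC for sodium and potassium.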
Maroto, A; Illescas, T; Meléndez, M; Arévalo, S; Rodó, C; Peiró, J L; Belfort, M; Cuxart, A; Carreras, E
2017-10-01
To assess the reliability of the interpretation of a new technique for the ultrasound evaluation of the level of neurological lesion in fetuses with myelomeningocele. Observational study including myelomeningocele fetuses, referred to our center for the sonographic assessment of the fetal lower-limb movements, made and recorded by an expert in Maternal-fetal medicine and a specialist in Rehabilitation. Two observers, with different levels of expertise and blinded to each other's results, interpreted each recorded scan two different times. The agreement for the segmental levels assigned between the observers and the gold standard, and the inter-observer and intra-observer reproducibility, were tested using the weighted kappa (wκ) index. Twenty-eight scans were recorded and evaluated. The agreement between the observers and the gold standard remained constant for the expert observer (wκ = 0.82) and increased (wκ = 0.66-wκ = 0.72) for the other one. The inter-observer and the intra-observer variability for the expert observer were wκ = 0.72 and wκ = 0.94, respectively. The agreement for the prenatal evaluation of the segmental neurological level was excellent, after a short training period, for observers with different degrees of expertise. The interpretation of this technique is reproducible enough and this supports its value for the prediction of postnatal motor function in myelomeningocele fetuses.
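A weighted kappa of the kind used above gives partial credit when two ratings of an ordinal level are close but not identical. The sketch below implements the linearly weighted variant for levels coded as integers 0 to n_categories - 1; the abstract does not state which weighting scheme was used, so the linear weights, the integer coding and the function name are all assumptions.

```python
def weighted_kappa(r1, r2, n_categories):
    """Linearly weighted kappa for ordinal ratings coded 0..n_categories-1."""
    n = len(r1)
    max_dist = n_categories - 1
    # Mean observed disagreement, scaled so the largest possible gap is 1.
    observed = sum(abs(a - b) for a, b in zip(r1, r2)) / (n * max_dist)
    p1 = [r1.count(c) / n for c in range(n_categories)]
    p2 = [r2.count(c) / n for c in range(n_categories)]
    # Disagreement expected by chance from the two raters' marginals.
    expected = sum(
        p1[i] * p2[j] * abs(i - j) / max_dist
        for i in range(n_categories)
        for j in range(n_categories)
    )
    return 1.0 - observed / expected
```

With this weighting, assigning a lesion one segment away from the reference is penalised less than assigning it several segments away.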
D'Agostino, Fabio; Barbaranelli, Claudio; Paans, Wolter; Belsito, Romina; Juarez Vela, Raul; Alvaro, Rosaria; Vellone, Ercole
2017-07-01
To evaluate the psychometric properties of the D-Catch instrument. A cross-sectional methodological study. Validity and reliability were estimated with confirmatory factor analysis (CFA) and internal consistency and inter-rater reliability, respectively. A sample of 250 nursing documentations was selected. CFA showed the adequacy of a 1-factor model (chronologically descriptive accuracy) with an outlier item (nursing diagnosis accuracy). Internal consistency and inter-rater reliability were adequate. The D-Catch is a valid and reliable instrument for measuring the accuracy of nursing documentation. Caution is needed when measuring diagnostic accuracy since only one item measures this dimension. The D-Catch can be used as an indicator of the accuracy of nursing documentation and the quality of nursing care. © 2015 NANDA International, Inc.
Modified personal interviews: resurrecting reliable personal interviews for admissions?
Hanson, Mark D; Kulasegaram, Kulamakan Mahan; Woods, Nicole N; Fechtig, Lindsey; Anderson, Geoff
2012-10-01
Traditional admissions personal interviews provide flexible faculty-student interactions but are plagued by low inter-interview reliability. Axelson and Kreiter (2009) retrospectively showed that multiple independent sampling (MIS) may improve reliability of personal interviews; thus, the authors incorporated MIS into the admissions process for medical students applying to the University of Toronto's Leadership Education and Development Program (LEAD). They examined the reliability and resource demands of this modified personal interview (MPI) format. In 2010-2011, LEAD candidates submitted written applications, which were used to screen for participation in the MPI process. Selected candidates completed four brief (10-12 minutes) independent MPIs each with a different interviewer. The authors blueprinted MPI questions to (i.e., aligned them with) leadership attributes, and interviewers assessed candidates' eligibility on a five-point Likert-type scale. The authors analyzed inter-interview reliability using the generalizability theory. Sixteen candidates submitted applications; 10 proceeded to the MPI stage. Reliability of the written application components was 0.75. The MPI process had overall inter-interview reliability of 0.79. Correlation between the written application and MPI scores was 0.49. A decision study showed acceptable reliability of 0.74 with only three MPIs scored using one global rating. Furthermore, a traditional admissions interview format would take 66% more time than the MPI format. The MPI format, used during the LEAD admissions process, achieved high reliability with minimal faculty resources. The MPI format's reliability and effective resource use were possible through MIS and employment of expert interviewers. MPIs may be useful for other admissions tasks.
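A decision study asks how reliability would change with a different number of interviews. A full generalizability analysis works from estimated variance components, which the abstract does not report; the sketch below instead uses the Spearman-Brown formula, which corresponds to the simplest single-facet case. It is a simplification and not the authors' analysis, and the function names are hypothetical.

```python
def single_unit_reliability(reliability, n_units):
    """Back out the reliability of one interview from a composite of n_units."""
    return reliability / (n_units - (n_units - 1) * reliability)

def projected_reliability(reliability, n_current, n_new):
    """Spearman-Brown projection from n_current interviews to n_new."""
    r1 = single_unit_reliability(reliability, n_current)
    return n_new * r1 / (1.0 + (n_new - 1) * r1)
```

For instance, `projected_reliability(0.79, 4, 3)` estimates the composite reliability if one of four equally informative interviews were dropped.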
Urrutia, Julio; Zamora, Tomas; Campos, Mauricio; Yurac, Ratko; Palma, Joaquin; Mobarec, Sebastian; Prada, Carlos
2016-07-01
We performed an agreement study using two subaxial cervical spine classification systems: the AOSpine and the Allen and Ferguson (A&F) classifications. We sought to determine which scheme allows better agreement by different evaluators and by the same evaluator on different occasions. Complete imaging studies of 65 patients with subaxial cervical spine injuries were classified by six evaluators (three spine sub-specialists and three senior orthopaedic surgery residents) using the AOSpine subaxial cervical spine classification system and the A&F scheme. The cases were displayed in a random sequence after a 6-week interval for repeat evaluation. The Kappa coefficient (κ) was used to determine inter- and intra-observer agreement. Inter-observer: considering the main AO injury types, the agreement was substantial for the AOSpine classification [κ = 0.61 (0.57-0.64)]; using AO sub-types, the agreement was moderate [κ = 0.57 (0.54-0.60)]. For the A&F classification, the agreement [κ = 0.46 (0.42-0.49)] was significantly lower than using the AOSpine scheme. Intra-observer: the agreement was substantial considering injury types [κ = 0.68 (0.62-0.74)] and considering sub-types [κ = 0.62 (0.57-0.66)]. Using the A&F classification, the agreement was also substantial [κ = 0.66 (0.61-0.71)]. No significant differences were observed between spine surgeons and orthopaedic residents in the overall inter- and intra-observer agreement, or in the inter- and intra-observer agreement of specific type of injuries. The AOSpine classification (using the four main injury types or at the sub-types level) allows a significantly better agreement than the A&F classification. The A&F scheme does not allow reliable communication between medical professionals.
Overview of intercalibration of satellite instruments
Chander, G.; Hewison, T.J.; Fox, N.; Wu, X.; Xiong, X.; Blackwell, W.J.
2013-01-01
Inter-calibration of satellite instruments is critical for detection and quantification of changes in the Earth’s environment, weather forecasting, understanding climate processes, and monitoring climate and land cover change. These applications use data from many satellites; for the data to be inter-operable, the instruments must be cross-calibrated. To meet the stringent needs of such applications requires that instruments provide reliable, accurate, and consistent measurements over time. Robust techniques are required to ensure that observations from different instruments can be normalized to a common scale that the community agrees on. The long-term reliability of this process needs to be sustained in accordance with established reference standards and best practices. Furthermore, establishing physical meaning to the information through robust Système International d'unités (SI) traceable Calibration and Validation (Cal/Val) is essential to fully understand the parameters under observation. The processes of calibration, correction, stability monitoring, and quality assurance need to be underpinned and evidenced by comparison with “peer instruments” and, ideally, highly calibrated in-orbit reference instruments. Inter-calibration between instruments is a central pillar of the Cal/Val strategies of many national and international satellite remote sensing organizations. Inter-calibration techniques as outlined in this paper not only provide a practical means of identifying and correcting relative biases in radiometric calibration between instruments but also enable potential data gaps between measurement records in a critical time series to be bridged. 
Use of a robust set of internationally agreed upon and coordinated inter-calibration techniques will lead to significant improvement in the consistency between satellite instruments and facilitate accurate monitoring of the Earth’s climate at uncertainty levels needed to detect and attribute the mechanisms of change. This paper summarizes the state-of-the-art of post-launch radiometric calibration of remote sensing satellite instruments, through inter-calibration.
Diffuse intrinsic pontine glioma: is MRI surveillance improved by region of interest volumetry?
Riley, Garan T; Armitage, Paul A; Batty, Ruth; Griffiths, Paul D; Lee, Vicki; McMullan, John; Connolly, Daniel J A
2015-02-01
Paediatric diffuse intrinsic pontine glioma (DIPG) is noteworthy for its fibrillary infiltration through neuroparenchyma and its resultant irregular shape. Conventional volumetry methods aim to approximate such irregular tumours to a regular ellipse, which could be less accurate when assessing treatment response on surveillance MRI. Region-of-interest (ROI) volumetry methods, using manually traced tumour profiles on contiguous imaging slices and subsequent computer-aided calculations, may prove more reliable. To evaluate whether the reliability of MRI surveillance of DIPGs can be improved by the use of ROI-based volumetry. We investigated the use of ROI- and ellipsoid-based methods of volumetry for paediatric DIPGs in a retrospective review of 22 MRI examinations. We assessed the inter- and intraobserver variability of the two methods when performed by four observers. ROI- and ellipsoid-based methods strongly correlated for all four observers. The ROI-based volumes showed slightly better agreement both between and within observers than the ellipsoid-based volumes (inter-[intra-]observer agreement 89.8% [92.3%] and 83.1% [88.2%], respectively). Bland-Altman plots show tighter limits of agreement for the ROI-based method. Both methods are reproducible and transferrable among observers. ROI-based volumetry appears to perform better with greater intra- and interobserver agreement for complex-shaped DIPG.
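The two volumetry approaches compared above can be sketched as follows. The ellipsoid function uses the standard formula for three orthogonal diameters, and the ROI function sums manually traced slice areas multiplied by the spacing between contiguous slices. The function and parameter names are hypothetical, and the abstract does not specify the exact software calculation.

```python
import math

def ellipsoid_volume(d1, d2, d3):
    """Volume of an ellipsoid from three orthogonal diameters."""
    return math.pi / 6.0 * d1 * d2 * d3

def roi_volume(slice_areas, slice_spacing):
    """Sum of traced areas times the distance between contiguous slices."""
    return sum(slice_areas) * slice_spacing
```

The ellipsoid estimate depends on only three measurements, so an irregular, infiltrating outline can change the true volume without changing the estimate; the slice-wise sum follows the traced outline on every slice.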
Hilgenfeld, Tim; Kästel, Thorsten; Heil, Alexander; Rammelsberg, Peter; Heiland, Sabine; Bendszus, Martin; Schwindling, Franz Sebastian
2018-04-01
To evaluate whether high-resolution, non-contrast-enhanced dental magnetic resonance imaging (MRI) can be used for accurate determination of palatal masticatory mucosa thickness (PMMT) and to locate the greater palatal artery (GPA). In five volunteers (four males, one female; mean age 30.2 ± 0.4 years), two independent raters measured PMMT by use of dental MRI in 180 positions. For comparison, clinical bone sounding was performed. The GPA was identified in time-of-flight (TOF) angiography and MSVAT-SPACE-prototype sequence. Intra- and inter-observer agreement for MRI measurements, agreement between MRI and bone sounding were analysed by intra-class correlation coefficient (ICC) and Cohen's kappa (κ). Reliability of dental MRI measurements was high (intra-observer-ICC 0.962; inter-observer ICC 0.959). Agreement of MRI measurements with bone sounding was moderate (ICC 0.744), and the GPA could be identified in 60% of measurement points using the TOF-angiography alone and in 85% with additional information of the MSVAT-SPACE. Good intra-observer agreement was observed for GPA identification (κ: 0.778). Palatal masticatory mucosa thickness measured by high-resolution, non-contrast enhanced dental MRI is comparable with that obtained by bone sounding. Dental MRI enables reliable, non-invasive and radiation-free planning of palatal tissue harvesting and can also be used for location of the GPA at 85% of measurement points, which might help reduce complications during surgery. © 2018 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Rosenson, Robert S; Miller, Kate; Bayliss, Martha; Sanchez, Robert J; Baccara-Dinet, Marie T; Chibedi-De-Roche, Daniela; Taylor, Beth; Khan, Irfan; Manvelian, Garen; White, Michelle; Jacobson, Terry A
2017-04-01
The Statin-Associated Muscle Symptom Clinical Index (SAMS-CI) is a method for assessing the likelihood that a patient's muscle symptoms (e.g., myalgia or myopathy) were caused or worsened by statin use. The objectives of this study were to prepare the SAMS-CI for clinical use, estimate its inter-rater reliability, and collect feedback from physicians on its practical application. For content validity, we conducted structured in-depth interviews with its original authors as well as with a panel of independent physicians. Estimation of inter-rater reliability involved an analysis of 30 written clinical cases which were scored by a sample of physicians. A separate group of physicians provided feedback on the clinical use of the SAMS-CI and its potential utility in practice. Qualitative interviews with providers supported the content validity of the SAMS-CI. Feedback on the clinical use of the SAMS-CI included several perceived benefits (such as brevity, clear wording, and simple scoring process) and some possible concerns (workflow issues and applicability in primary care). The inter-rater reliability of the SAMS-CI was estimated to be 0.77 (confidence interval 0.66-0.85), indicating high concordance between raters. With additional provider feedback, a revised SAMS-CI instrument was created suitable for further testing, both in the clinical setting and in prospective validation studies. With standardized questions, vetted language, easily interpreted scores, and demonstrated reliability, the SAMS aims to estimate the likelihood that a patient's muscle symptoms were attributable to statins. The SAMS-CI may support better detection of statin-associated muscle symptoms in clinical practice, optimize treatment for patients experiencing muscle symptoms, and provide a useful tool for further clinical research.
Towards an Operational Definition of Clinical Competency in Pharmacy
2015-01-01
Objective. To estimate the inter-rater reliability and accuracy of ratings of competence in student pharmacist/patient clinical interactions as depicted in videotaped simulations and to compare expert panelist and typical preceptor ratings of those interactions. Methods. This study used a multifactorial experimental design to estimate inter-rater reliability and accuracy of preceptors’ assessment of student performance in clinical simulations. The study protocol used nine 5-10 minute video vignettes portraying different levels of competency in student performance in simulated clinical interactions. Intra-Class Correlation (ICC) was used to calculate inter-rater reliability and Fisher exact test was used to compare differences in distribution of scores between expert and nonexpert assessments. Results. Preceptors (n=42) across 5 states assessed the simulated performances. Intra-Class Correlation estimates were higher for 3 nonrandomized video simulations compared to the 6 randomized simulations. Preceptors more readily identified high and low student performances compared to satisfactory performances. In nearly two-thirds of the rating opportunities, a higher proportion of expert panelists than preceptors rated the student performance correctly (18 of 27 scenarios). Conclusion. Valid and reliable assessments are critically important because they affect student grades and formative student feedback. Study results indicate the need for pharmacy preceptor training in performance assessment. The process demonstrated in this study can be used to establish minimum preceptor benchmarks for future national training programs. PMID:26089563
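The intraclass correlation used above for inter-rater reliability has several forms, and the abstract does not say which was applied. As one common choice, the sketch below computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater) from a complete subjects-by-raters table; the choice of form and the function name are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject, each holding k rater scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns) and total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Because this form penalises systematic differences between raters, one rater scoring consistently higher than another lowers the coefficient even when their rank orderings agree.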
Inter-rater reliability of twelve diagnostic systems of schizophrenia.
Helmes, E; Landmark, J; Kazarian, S S
1983-05-01
The present and past symptomatology of 31 chronic schizophrenics was rated by four independent judges, two experienced clinical psychiatrists and two psychiatric residents, in a context more representative of actual clinical practice than most research studies. Ratings were made on 64 symptoms derived from 12 diagnostic systems, based on either live or videotaped interviews for present symptomatology and case records for past symptomatology. Inter-rater reliabilities were higher for present than for past symptoms, and in general did not approach those reported for highly trained raters. There were no differences between live and videotaped interviews. Diagnostic systems differed widely in rater agreement. The most consistent across both past and present symptomatology were the systems of Langfeldt, Schneider, and DSM-III, for which the level of reliability was consistent with other studies.
López-de-Uralde-Villanueva, Ibai; Acuyo-Osorio, Mario; Prieto-Aldana, María; La Touche, Roy
2017-04-01
The Passive Neck Flexion Test (PNFT) can diagnose meningitis and potential spinal disorders. Little evidence is available concerning the use of a modified version of the PNFT (mPNFT) in patients with chronic nonspecific neck pain (CNSNP). To assess the reliability of the mPNFT in subjects with and without CNSNP. The secondary objective was to assess the differences in the symptoms provoked by the mPNFT between these two populations. We used a repeated-measures concordance design for the main objective and a cross-sectional design for the secondary objective. A total of 30 asymptomatic subjects and 34 patients with CNSNP were recruited. The following measures were recorded: the range of motion at the onset of symptoms (OS-mPNFT), the range of motion at the submaximal pain (SP-mPNFT), and evoked pain intensity on the mPNFT (VAS-mPNFT). Good to excellent reliability was observed for OS-mPNFT and SP-mPNFT in the asymptomatic group (intra-examiner reliability: 0.95-0.97; inter-examiner reliability: 0.86-0.90; intra-examiner test-retest reliability: 0.84-0.87). In the CNSNP group, good to excellent reliability was obtained for the OS-mPNFT (intra-examiner reliability: 0.89-0.96; inter-examiner reliability: 0.83-0.86; intra-examiner test-retest reliability: 0.83-0.85) and the SP-mPNFT (intra-examiner reliability: 0.94-0.98; inter-examiner reliability: 0.80-0.82; intra-examiner test-retest reliability: 0.88-0.91). The CNSNP group showed statistically significant differences in OS-mPNFT (t = 4.92; P < 0.001), SP-mPNFT (t = 2.79; P = 0.007) and in VAS-mPNFT (t = -10.39; P < 0.001) versus the asymptomatic group. The mPNFT is a reliable tool regardless of the examiner and the time factor. Patients with CNSNP have a decreased range of motion and more pain than asymptomatic subjects in the mPNFT. This exceeds the minimal detectable changes for OS-mPNFT and VAS-mPNFT. Copyright © 2017 Elsevier Ltd. All rights reserved.
Gilmore-Bykovskyi, Andrea L.
2015-01-01
Mealtime behavioral symptoms are distressing and frequently interrupt eating for the individual experiencing them and others in the environment. In order to enable identification of potential antecedents to mealtime behavioral symptoms, a computer-assisted coding scheme was developed to measure caregiver person-centeredness and behavioral symptoms for nursing home residents with dementia during mealtime interactions. The purpose of this pilot study was to determine the acceptability and feasibility of procedures for video-capturing naturally-occurring mealtime interactions between caregivers and residents with dementia, to assess the feasibility, ease of use, and inter-observer reliability of the coding scheme, and to explore the clinical utility of the coding scheme. Trained observers coded 22 observations. Data collection procedures were feasible and acceptable to caregivers, residents and their legally authorized representatives. Overall, the coding scheme proved to be feasible, easy to execute and yielded good to very good inter-observer agreement following observer re-training. The coding scheme captured clinically relevant, modifiable antecedents to mealtime behavioral symptoms, but would be enhanced by the inclusion of measures for resident engagement and consolidation of items for measuring caregiver person-centeredness that co-occurred and were difficult for observers to distinguish. PMID:25784080
Chanani, Ankit; Adhikari, Haridas Das
2017-01-01
Differential diagnosis of periapical cysts and granulomas is required as their treatment modalities are different. The aim of this study was to evaluate the efficacy of cone beam computed tomography (CBCT) in the differential diagnosis of periapical cysts from granulomas. A single-centered observational study was carried out in the Department of Conservative Dentistry and Endodontics, Dr. R. Ahmed Dental College and Hospital, using CBCT and dental operating microscope. Forty-five lesions were analyzed using CBCT scans. One evaluator analyzed each CBCT scan for the presence of the following six characteristic radiological features: cyst-like location, shape, periphery, internal structure, effect on the surrounding structures, and cortical plate perforation. Another independent evaluator analyzed the CBCT scans. This process was repeated after 6 months, and inter- and intrarater reliability of CBCT diagnoses was evaluated. Periapical surgeries were performed and tissue samples were obtained for histopathological analysis. To evaluate the efficacy, CBCT diagnoses were compared with histopathological diagnoses, and six receiver operating characteristic (ROC) curve analyses were conducted. ROC curve, Cronbach's alpha (α) test, and Cohen Kappa (κ) test were used for statistical analysis. Both inter- and intrarater reliability were excellent (α = 0.94, κ = 0.75 and 0.77, respectively). ROC curve with regard to ≥4 positive findings revealed the highest area under curve (0.66). CBCT is moderately accurate in the differential diagnosis of periapical cysts and granulomas.
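For readers unfamiliar with the kappa statistic reported above, here is a minimal sketch of unweighted Cohen's kappa for two raters. The function name and the ten example diagnoses are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical diagnoses for ten lesions (illustration only).
first_reader = ["cyst", "cyst", "granuloma", "cyst", "granuloma",
                "granuloma", "cyst", "granuloma", "cyst", "granuloma"]
second_reader = ["cyst", "granuloma", "granuloma", "cyst", "granuloma",
                 "granuloma", "cyst", "cyst", "cyst", "granuloma"]
print(round(cohen_kappa(first_reader, second_reader), 3))  # prints 0.6
```

In this example the readers agree on 8 of 10 lesions, but half of that agreement is expected by chance, so kappa is 0.6 rather than 0.8.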
Ringe, Kristina Imeen; Luetkens, Julian A; Fimmers, Rolf; Hammerstingl, Renate Maria; Layer, Günter; Maurer, Martin H; Nähle, Claas Philip; Michalik, Sabine; Reimer, Peter; Schraml, Christina; Schreyer, Andreas G; Stumpp, Patrick; Vogl, Thomas J; Wacker, Frank K; Willinek, Winfried; Kukuk, Guido Mattias
2018-04-01
To assess the interrater agreement and reliability of experienced abdominal radiologists in the characterization and grading of arterial phase gadoxetate disodium-related respiratory motion artifact on liver MRI. This prospective multicenter study was initiated by the working group for abdominal imaging within the German Roentgen Society (DRG), and approved by the local IRB of each participating center. 11 board-certified radiologists independently reviewed 40 gadoxetate disodium-enhanced liver MRI datasets. Motion artifacts in the arterial phase were assessed on a 5-point scale. Interrater agreement and reliability were calculated using the intraclass correlation coefficient (ICC) and Kendall coefficient of concordance (W), with p < 0.05 deemed significant. The ICC for interrater agreement and reliability were 0.983 (CI 0.973 - 0.990) and 0.985 (CI 0.978 - 0.991), respectively (both p < 0.0001), indicating excellent agreement and reliability. Kendall's W for interrater agreement was 0.865. A severe motion artifact, defined as a mean motion score ≥ 4 in the arterial phase, was observed in 12 patients. In these specific cases, a motion score ≥ 4 was assigned by all readers in 75 % (n = 9/12 cases). Differentiation and grading of arterial phase respiratory motion artifact is possible with a high level of inter-/intrarater agreement and interrater reliability, which is crucial for assessing the incidence of this phenomenon in larger multicenter studies. · Inter- and intrarater agreement for motion artifact scoring is excellent among experienced readers. · Interrater reliability for motion artifact scoring is excellent among experienced readers. · Characterization of severe motion artifacts proved feasible in this multicenter study. · Ringe KI, Luetkens JA, Fimmers R et al. Characterization of Severe Arterial Phase Respiratory Motion Artifact on Gadoxetate Disodium-Enhanced MRI - Assessment of Interrater Agreement and Reliability. Fortschr Röntgenstr 2017; 190: 341 - 347. © Georg Thieme Verlag KG Stuttgart · New York.
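The Kendall coefficient of concordance mentioned above can be sketched as follows. This version assumes the usual tie-corrected formula with average ranks, which matters for a 5-point scale where ties are frequent; the abstract does not state which variant was used, and the function names and scores are hypothetical.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores from 1..n, giving tied scores their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kendalls_w(ratings):
    """Tie-corrected Kendall's W; ratings[r][i] is rater r's score for item i."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    totals = [sum(ranked[r][i] for r in range(m)) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum of (t^3 - t) over each group of tied scores per rater.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denominator = m ** 2 * (n ** 3 - n) - m * ties
    if denominator == 0:
        raise ValueError("W is undefined when every rater ties all items")
    return 12 * s / denominator


# Hypothetical 5-point motion scores from three readers on five datasets.
scores = [[1, 2, 2, 4, 5],
          [1, 2, 3, 4, 5],
          [2, 2, 3, 5, 5]]
print(round(kendalls_w(scores), 3))
```

W ranges from 0 (no concordance) to 1 (all raters rank the items identically).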
MacLean, Sharon; Geddes, Fiona; Kelly, Michelle; Della, Phillip
2018-03-01
Simulated patients (SPs) are frequently used for training nursing students in communication skills. An acknowledged benefit of using SPs is the opportunity to provide a standardized approach by which participants can demonstrate and develop communication skills. However, relatively little evidence is available on how to best facilitate and evaluate the reliability and accuracy of SPs' performances. The aim of this study is to investigate the effectiveness of an evidence-based SP training framework to ensure standardization of SPs. The training framework was employed to improve inter-rater reliability of SPs. A quasi-experimental study was employed to assess SP post-training understanding of simulation scenario parameters using inter-rater reliability agreement indices. Two phases of data collection took place. Initially, in a trial phase, audio-visual (AV) recordings of two undergraduate nursing students completing a simulation scenario were rated by eight SPs using the Interpersonal Communication Assessments Scale (ICAS) and Quality of Discharge Teaching Scale (QDTS). In phase 2, eight SP raters and four nursing faculty raters independently evaluated students' (N=42) communication practices using the QDTS. Intraclass correlation coefficients (ICC) were >0.80 for both stages of the study in clinical communication skills. The results support the premise that if trained appropriately, SPs have a high degree of reliability and validity to both facilitate and evaluate student performance in nurse education. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.
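Several forms of the intraclass correlation coefficient exist, and the abstract above does not say which one was computed. The sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), built from two-way ANOVA mean squares. The function name and data are hypothetical.

```python
def icc_2_1(table):
    """ICC(2,1) from a subjects x raters table of numeric scores."""
    n, k = len(table), len(table[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: rater 2 is always one point above rater 1.
offset_scores = [[1, 2], [2, 3], [3, 4]]
print(round(icc_2_1(offset_scores), 3))
```

The example ranks the subjects identically yet yields an ICC well below 1, because the absolute-agreement form penalizes a systematic offset between raters.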
Delcroix, Olivier; Robin, Philippe; Gouillou, Maelenn; Le Duc-Pennec, Alexandra; Alavi, Zarrin; Le Roux, Pierre-Yves; Abgral, Ronan; Salaun, Pierre-Yves; Bourhis, David; Querellou, Solène
2018-02-12
xSPECT Bone® (xB) is a new reconstruction algorithm developed by Siemens® in bone hybrid imaging (SPECT/CT). A CT-based tissue segmentation is incorporated into SPECT reconstruction to provide SPECT images with bone anatomy appearance. The objectives of this study were to assess xB/CT reconstruction diagnostic reliability and accuracy in comparison with Flash 3D® (F3D)/CT in clinical routine. Two hundred thirteen consecutive patients referred to the Brest Nuclear Medicine Department for non-oncological bone diseases were evaluated retrospectively. Two hundred seven SPECT/CT were included. All SPECT/CT were independently interpreted by two nuclear medicine physicians (a junior and a senior expert) with xB/CT then with F3D/CT three months later. Inter-observer agreement (IOA) and diagnostic confidence were determined using the McNemar test and the unweighted Kappa coefficient. The study objectives were then re-assessed for validation through > 18 months of clinical and paraclinical follow-up. No statistically significant differences between IOA xB and IOA F3D were found (p = 0.532). Agreement for xB after categorical classification of the diagnoses was high (κ xB = 0.89 [95% CI 0.84 - 0.93]) but without a statistically significant difference from F3D (κ F3D = 0.90 [95% CI 0.86 - 0.94]). Thirty-one (14.9%) inter-reconstruction diagnostic discrepancies were observed, of which 21 (10.1%) were classified as major. The follow-up confirmed the diagnosis of F3D in 10 cases, xB in 6 cases and was non-contributory in 5 cases. The xB reconstruction algorithm was found reliable, providing high interobserver agreement and similar diagnostic confidence to F3D reconstruction in clinical routine.
Development and Validation of a Family Meeting Assessment Tool (FMAT).
Hagiwara, Yuya; Healy, Jennifer; Lee, Shuko; Ross, Jeanette; Fischer, Dixie; Sanchez-Reilly, Sandra
2018-01-01
A cornerstone procedure in Palliative Medicine is to perform family meetings. Learning how to lead a family meeting is an important skill for physicians and others who care for patients with serious illnesses and their families. There is limited evidence on how to assess best practice behaviors during end-of-life family meetings. Our aim was to develop and validate an observational tool to assess trainees' ability to lead a simulated end-of-life family meeting. Building on evidence from published studies and accrediting agency guidelines, an expert panel at our institution developed the Family Meeting Assessment Tool. All fourth-year medical students (MS4) and eight geriatric and palliative medicine fellows (GPFs) were invited to participate in a Family Meeting Objective Structured Clinical Examination, where each trainee assumed the physician role leading a complex family meeting. Two evaluators observed and rated randomly chosen students' performances using the Family Meeting Assessment Tool during the examination. Inter-rater reliability was measured using percent agreement. Internal consistency was measured using Cronbach α. A total of 141 trainees (MS4 = 133 and GPF = 8) and 26 interdisciplinary evaluators participated in the study. Internal reliability (Cronbach α) of the tool was 0.85. Number of trainees rated by two evaluators was 210 (MS4 = 202 and GPF = 8). Rater agreement was 84%. Composite scores, on average, were significantly higher for fellows than for medical students (P < 0.001). Expert-based content, high inter-rater reliability, good internal consistency, and ability to predict educational level provided initial evidence for construct validity for this novel assessment tool. Copyright © 2017 American Academy of Hospice and Palliative Medicine. All rights reserved.
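The two reliability figures reported above, percent agreement and Cronbach α, can be illustrated with a short sketch. It assumes the standard α formula based on sample variances of item scores and of total scores; the function names and example ratings are hypothetical and are not FMAT data.

```python
from statistics import variance


def percent_agreement(rater_a, rater_b):
    """Share of items (in percent) on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item set")
    return 100.0 * sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cronbach_alpha(scores):
    """Cronbach's alpha; scores[p][i] is person p's score on item i."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical checklist ratings (1 = behaviour observed, 0 = not observed).
evaluator_1 = [1, 1, 0, 1, 0, 1, 1, 0]
evaluator_2 = [1, 1, 0, 0, 0, 1, 1, 1]
print(percent_agreement(evaluator_1, evaluator_2))

# Hypothetical item scores for four trainees on three items.
item_scores = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4]]
print(round(cronbach_alpha(item_scores), 3))
```

Percent agreement does not correct for chance, which is one reason it is often reported alongside other indices.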
Mani, Suresh; Sharma, Shobha; Omar, Baharudin; Paungmali, Aatit; Joseph, Leonard
2017-04-01
Purpose The purpose of this review is to systematically explore and summarise the validity and reliability of telerehabilitation (TR)-based physiotherapy assessment for musculoskeletal disorders. Method A comprehensive systematic literature review was conducted using a number of electronic databases: PubMed, EMBASE, PsycINFO, Cochrane Library and CINAHL, published between January 2000 and May 2015. Studies that examined the validity and the inter- and intra-rater reliabilities of TR-based physiotherapy assessment for musculoskeletal conditions were included. Two independent reviewers used the Quality Appraisal Tool for studies of diagnostic Reliability (QAREL) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool to assess the methodological quality of reliability and validity studies respectively. Results A total of 898 hits were achieved, of which 11 articles based on inclusion criteria were reviewed. Nine studies explored the concurrent validity, inter- and intra-rater reliabilities, while two studies examined only the concurrent validity. Reviewed studies were moderate to good in methodological quality. The physiotherapy assessments such as pain, swelling, range of motion, muscle strength, balance, gait and functional assessment demonstrated good concurrent validity. However, the reported concurrent validity of lumbar spine posture, special orthopaedic tests, neurodynamic tests and scar assessments ranged from low to moderate. Conclusion TR-based physiotherapy assessment was technically feasible with overall good concurrent validity and excellent reliability, except for lumbar spine posture, orthopaedic special tests, neurodynamic tests and scar assessment.
Newton, R. L.; Thomson, J. L.; Rau, K.; Duhe’, S.; Sample, A.; Singleton, N.; Anton, S. D.; Webber, L. S.; Williamson, D. A.
2011-01-01
Purpose To evaluate the implementation of intervention components of the Louisiana Health study, which was a multi-component childhood obesity prevention program conducted in rural schools. Design Content analysis. Setting Process evaluation assessed implementation in the classrooms, gym classes, and cafeterias. Subjects Classroom teachers (n = 232), physical education teachers (n = 53), food service managers (n = 33), and trained observers (n = 9). Measures Five process evaluation measures were created: Physical Education Questionnaire (PEQ), Intervention Questionnaire (IQ), Food Service Manager Questionnaire (FSMQ), Classroom Observation (CO) and School Nutrition Environment Observation (SNEO). Analysis Inter-rater reliability and internal consistency were conducted on all measures. ANOVA and Chi-square were used to compare differences across study groups on questionnaires and observations. Results The PEQ and one sub-scale from the FSMQ were eliminated because their reliability coefficients fell below acceptable standards. The sub-scale internal consistencies for the IQ, FSMQ, CO, and SNEO (all Cronbach’s α > .60) were acceptable. Conclusions After the initial 4 months of intervention, there was evidence that the Louisiana Health intervention was being implemented as it was designed. In summary, four process evaluation measures were found to be sufficiently reliable and valid for assessing the delivery of various aspects of a school-based obesity prevention program. These process measures could be modified to evaluate the delivery of other similar school-based interventions. PMID:21721969
Jackson, Benjamin M; Polglaze, Ted; Dawson, Brian; King, Trish; Peeling, Peter
2018-02-21
To compare data from conventional GPS and new GNSS-enabled tracking devices, and to examine the inter-unit reliability of GNSS devices. Inter-device differences between 10 Hz GPS and GNSS devices were examined during laps (n=40) of a simulated game circuit (SGC) and during elite hockey matches (n=21); GNSS inter-unit reliability was also examined during the SGC laps. Differences in distance values and measures in three velocity categories (low <3 m·s⁻¹; moderate 3-5 m·s⁻¹; high >5 m·s⁻¹) and acceleration/deceleration counts (>1.46 m·s⁻² and < -1.46 m·s⁻²) were examined using one-way ANOVA. Inter-unit GNSS reliability was examined using the coefficient of variation (CV) and intra-class correlation coefficient (ICC). Inter-device differences (P <0.05) were found for measures of peak deceleration, low-speed distance, % total distance at low speed, and deceleration count during the SGC, and for all measures except total distance and low-speed distance during hockey matches. Inter-unit (GNSS) differences (P <0.05) were not found. The CV was below 5% for total distance, average and peak speeds and distance and % total distance of low-speed running. The GNSS devices had a lower HDoP score than GPS devices in all conditions. These findings suggest that GNSS devices may be more sensitive than GPS in quantifying the physical demands of team sport movements, but further study into the accuracy of GNSS devices is required.
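The coefficient of variation used above for inter-unit reliability is the sample standard deviation expressed as a percentage of the mean. A minimal sketch follows; the function name and the lap distances are hypothetical and are not measurements from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    centre = mean(values)
    if centre == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / centre


# Hypothetical total distance (m) logged by four units worn on the same lap.
lap_distance = [412.0, 405.5, 409.8, 415.1]
cv = coefficient_of_variation(lap_distance)
print(round(cv, 2), "below the 5% threshold" if cv < 5.0 else "at or above 5%")
```

A low CV indicates that units carried through the same movement report similar values, which is the sense in which the abstract uses the 5% threshold.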
Griessenauer, Christoph J; Foreman, Paul; Shoja, Mohammadali M; Kicielinski, Kimberly P; Deveikis, John P; Walters, Beverly C; Harrigan, Mark R
2015-04-01
Traumatic aneurysms occur in up to 20% of blunt traumatic extracranial carotid artery injuries. Currently there is no standardized method for characterization of traumatic aneurysms. For the carotid and vertebral injury study (CAVIS), a prospective study of traumatic cerebrovascular injury, we established a method for aneurysm characterization and tested its reliability. Saccular aneurysm size was defined as the greatest linear distance between the expected location of the normal artery wall and the outer edge of the aneurysm lumen ("depth"). Fusiform aneurysm size was defined as the "depth" and longitudinal distance ("length") paralleling the normal artery. The size of the aneurysm relative to the normal artery was also assessed. Reliability measurements were made using four raters who independently reviewed 15 computed tomographic angiograms (CTAs) and 13 digital subtraction angiograms (DSAs) demonstrating a traumatic aneurysm of the internal carotid artery. Raters categorized the aneurysms as either "saccular" or "fusiform" and made measurements. Five scans of each imaging modality were repeated to evaluate intra-rater reliability. Fleiss's free-marginal multi-rater kappa (κ), Cohen's kappa (κ), and intraclass correlation coefficient (ICC) determined inter- and intra-rater reliability. Inter-rater agreement as to the aneurysm "shape" was almost perfect for CTA (κ = 0.82) and DSA (κ = 0.897). Agreements on aneurysm "depth," "length," "aneurysm plus parent artery," and "parent artery" for CTA and DSA were excellent (ICC > 0.75). Intra-rater agreement as to aneurysm "shape" was substantial to almost perfect (κ > 0.60). The CAVIS method of traumatic aneurysm characterization has remarkable inter- and intra-rater reliability and will facilitate further studies of the natural history and management of extracranial cerebrovascular traumatic aneurysms. © The Author(s) 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
The development and testing of a qualitative instrument designed to assess critical thinking
NASA Astrophysics Data System (ADS)
Clauson, Cynthia Louisa
This study examined a qualitative approach to assess critical thinking. An instrument was developed that incorporates an assessment process based on Dewey's (1933) concepts of self-reflection and critical thinking as problem solving. The study was designed to pilot test the critical thinking assessment process with writing samples collected from a heterogeneous group of students. The pilot test included two phases. Phase 1 was designed to determine the validity and inter-rater reliability of the instrument using two experts in critical thinking, problem solving, and literacy development. Validity of the instrument was addressed by requesting both experts to respond to ten questions in an interview. The inter-rater reliability was assessed by analyzing the consistency of the two experts' scorings of the 20 writing samples to each other, as well as to my scoring of the same 20 writing samples. Statistical analyses included the Spearman Rho and the Kuder-Richardson (Formula 20). Phase 2 was designed to determine the validity and reliability of the critical thinking assessment process with seven science teachers. Validity was addressed by requesting the teachers to respond to ten questions in a survey and interview. Inter-rater reliability was addressed by comparing the seven teachers' scoring of five writing samples with my scoring of the same five writing samples. Again, the Spearman Rho and the Kuder-Richardson (Formula 20) were used to determine the inter-rater reliability. The validity results suggest that the instrument is helpful as a guide for instruction and provides a systematic method to teach and assess critical thinking while problem solving with students in the classroom. The reliability results show the critical thinking assessment instrument to possess fairly high reliability when used by the experts, but weak reliability when used by classroom teachers. 
A major conclusion was drawn that teachers, as well as students, would need to receive instruction in critical thinking and in how to use the assessment process in order to gain more consistent interpretations of the six problem-solving steps. Specific changes needing to be made in the instrument to improve the quality are included.
Ruehland, Warren R; O'Donoghue, Fergal J; Pierce, Robert J; Thornton, Andrew T; Singh, Parmjit; Copland, Janet M; Stevens, Bronwyn; Rochford, Peter D
2011-01-01
To examine the impact of using American Academy of Sleep Medicine (AASM) recommended EEG derivations (F4/M1, C4/M1, O2/M1) vs. a single derivation (C4/M1) in polysomnography (PSG) on the measurement of sleep and cortical arousals, including inter- and intra-observer variability. Prospective, non-blinded, randomized comparison. Three Australian tertiary-care hospital clinical sleep laboratories. 30 PSGs from consecutive patients investigated for obstructive sleep apnea (OSA) during December 2007 and January 2008. N/A. To examine the impact of EEG derivations on PSG summary statistics, 3 scorers from different Australian clinical sleep laboratories each scored separate sets of 10 PSGs twice, once using 3 EEG derivations and once using 1 EEG derivation. To examine the impact on inter- and intra-scorer reliability, all 3 scorers scored a subset of 10 PSGs 4 times, twice using each method. All PSGs were de-identified and scored in random order according to the 2007 AASM Manual for the Scoring of Sleep and Associated Events. Using 3 referential EEG derivations during PSG, as recommended in the AASM manual, instead of a single central EEG derivation, as originally suggested by Rechtschaffen and Kales (1968), resulted in a mean ± SE decrease in N1 sleep of 9.6 ± 3.9 min (P = 0.018) and an increase in N3 sleep of 10.6 ± 2.8 min (P = 0.001). No significant differences were observed for any other sleep or arousal scoring summary statistics; nor were any differences observed in inter-scorer or intra-scorer reliability for scoring sleep or cortical arousals. This study provides information for those changing practice to comply with the 2007 AASM recommendations for EEG placement in PSG, for those using portable devices that are unable to comply with the recommendations due to limited channel options, and for the development of future standards for PSG scoring and recording. 
As the use of multiple EEG derivations only led to small changes in the distribution of derived sleep stages and no significant differences in scoring reliability, this study calls into question the need to use multiple EEG derivations in clinical PSG as suggested in the AASM manual.
A Student Assessment Tool for Standardized Patient Simulations (SAT-SPS): Psychometric analysis.
Castro-Yuste, Cristina; García-Cabanillas, María José; Rodríguez-Cornejo, María Jesús; Carnicer-Fuentes, Concepción; Paloma-Castro, Olga; Moreno-Corral, Luis Javier
2018-05-01
The evaluation of the level of clinical competence acquired by the student is a complex process that must meet various requirements to ensure its quality. The psychometric analysis of the data collected by the assessment tools used is a fundamental aspect to guarantee the student's competence level. To conduct a psychometric analysis of an instrument which assesses clinical competence in nursing students at simulation stations with standardized patients in OSCE-format tests. The construct of clinical competence was operationalized as a set of observable and measurable behaviors, measured by the newly-created Student Assessment Tool for Standardized Patient Simulations (SAT-SPS), which was comprised of 27 items. The categories assigned to the items were 'incorrect or not performed' (0), 'acceptable' (1), and 'correct' (2). 499 nursing students. Data were collected by two independent observers during the assessment of the students' performance at a four-station OSCE with standardized patients. Descriptive statistics were used to summarize the variables. The difficulty levels and floor and ceiling effects were determined for each item. Reliability was analyzed using internal consistency and inter-observer reliability. The validity analysis was performed considering face validity, content and construct validity (through exploratory factor analysis), and criterion validity. Internal reliability and inter-observer reliability were higher than 0.80. The construct validity analysis suggested a three-factor model accounting for 37.1% of the variance. These three factors were named 'Nursing process', 'Communication skills', and 'Safe practice'. A significant correlation was found between the scores obtained and the students' grades in general, as well as with the grades obtained in subjects with clinical content. The assessment tool has proven to be sufficiently reliable and valid for the assessment of the clinical competence of nursing students using standardized patients. 
This tool has three main components: the nursing process, communication skills, and safety management. Copyright © 2018 Elsevier Ltd. All rights reserved.
Shipley, Hilary; Guedes, Alonso; Graham, Lynelle; Goudie-DeAngelis, Elizabeth; Wendt-Hornickle, Erin
2018-05-01
Objectives The objective of this study was to determine the inter-rater reliability and convergent validity of the Colorado State University Feline Acute Pain Scale (CSU-FAPS) in a preliminary appraisal of its performance in a clinical teaching setting. Methods Sixty-eight female cats were assessed for pain after ovariohysterectomy. A cohort of 21 cats was examined independently by four raters (two board-certified anesthesiologists and two anesthesia residents) with the CSU-FAPS, and intra-class correlation coefficient (ICC) was used to determine inter-rater reliability. Weighted Cohen's kappa was used to determine inter-rater reliability centered on the 'need to reassess analgesic plan' (dichotomous scale). A separate cohort of 47 cats was evaluated independently by two raters (one board-certified anesthesiologist and one veterinary small animal rotating intern) using the CSU-FAPS and the Glasgow Composite Measure Pain Scale (CMPS-Feline), and Spearman rank-order correlation was determined to assess convergent validity. Reliability was interpreted using Altman's classification as very good, good, moderate, fair and poor. Validity was considered adequate if correlation coefficients were between 0.4 and 0.8. Results The ICC was 0.61 for anesthesiologists and 0.67 for residents, indicating good reliability. Weighted Cohen's kappa was 0.79 for anesthesiologists and 0.44 for residents, indicating moderate to good reliability. The Spearman rank correlation indicated a statistically significant ( P = 0.0003) positive correlation (0.31; 95% confidence interval 0.14-0.46) between the CSU-FAPS and the CMPS-Feline. Conclusions and relevance The CSU-FAPS showed moderate-to-good inter-rater reliability when used by veterinarians to assess pain level or need to reassess analgesic plan after ovariohysterectomy in cats. The validity fell short of current guidelines for correlation coefficients and further refinement and testing are warranted to improve its performance.
[Inter-rater reliability and validity of the OPD-CA axes structure and conflict].
Benecke, Cord; Bock, Astrid; Wieser, Elke; Tschiesner, Reinhard; Lochmann, Martha; Küspert, Felicia; Schorn, Robert; Viertler, Bernhard; Steinmayr-Gensluckner, Maria
2011-01-01
The manual of the Operationalized Psychodynamic Diagnostics in childhood and adolescence (OPD-CA) is an instrument that has become widespread in clinical practice for assessing psychodynamic dimensions. Published data on inter-rater agreement and validity are still lacking. This study assessed the inter-rater reliability and validity for the axis structure and the axis conflict. 60 adolescents between 14 and 17 years, with and without psychic disorders, were diagnosed with the Operationalized Psychodynamic Diagnostics in childhood and adolescence (Arbeitskreis OPD-KJ, 2007) and SCID-II-interviews and questionnaires. A partial sample of 36 OPD-CA-interviews was the data basis for the assessment of inter-rater agreement. Calculations of validity for axis structure and axis conflict were made with the whole sample. Inter-rater agreement for the axis structure and the axis conflict showed good to very good weighted Kappa coefficients among the trained raters. Validity of the axis structure showed good results. The Operationalized Psychodynamic Diagnostics in childhood and adolescence (OPD-CA) allows reliable diagnosis of axis structure and axis conflict, if the ratings are done on the basis of semistructured videotaped interviews by trained raters. The axis structure shows validity, while the results concerning the validity of the axis conflict remain unclear.
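Weighted kappa, as reported above, gives partial credit when two raters choose neighbouring categories on an ordinal scale. The sketch below assumes linear disagreement weights |i - j| / (k - 1); the abstract does not specify the weighting scheme, and the function name, category labels and ratings are hypothetical.

```python
def linear_weighted_kappa(rater_a, rater_b, categories):
    """Linear-weighted Cohen's kappa for two raters on an ordered scale."""
    k = len(categories)
    if k < 2 or len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need >= 2 categories and equal-length, non-empty ratings")
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)
    # Cross-tabulate the two raters' category choices.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ordinal ratings on a three-level scale (illustration only).
levels = ["low", "moderate", "high"]
rater_1 = ["low", "moderate", "high", "high"]
rater_2 = ["low", "moderate", "high", "moderate"]
print(round(linear_weighted_kappa(rater_1, rater_2, levels), 3))
```

With only two categories the linear weights reduce to 0 and 1, so the result coincides with unweighted kappa.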
Developing the Person-Environment Apathy Rating for persons with dementia.
Jao, Ying-Ling; Algase, Donna L; Specht, Janet K; Williams, Kristine
2016-08-01
To develop the Person-Environment Apathy Rating (PEAR) scale that measures environmental stimulation and apathy in persons with dementia and to evaluate its psychometrics. The PEAR scale consists of the PEAR-Environment subscale and the PEAR-Apathy subscale. The items were developed via literature review, field testing, expert review, and pilot testing. The construct validity and reliability were examined through video observation. The parent study enrolled 185 institutionalized residents with dementia. For this study, 96 videos were selected from 24 participants. The PEAR-Environment subscale was validated using the Ambiance Scale and the Crowding Index. The PEAR-Apathy subscale was validated using the Neuropsychiatric Inventory (NPI)-Apathy, Passivity in Dementia Scale (PDS), and NPI-Depression. The PEAR-Environment and PEAR-Apathy subscales each consist of six items rated on a 1-4 scale. For validity, the Crowding Index slightly, yet significantly, correlated with the PEAR-Environment subscale total score and three of the individual scores. Ambiance Scale scores, both engaging and soothing, did not correlate with the PEAR-Environment subscale. The PEAR-Apathy subscale highly correlated with the PDS and NPI-Apathy and moderately correlated with the NPI-Depression, suggesting good convergent validity and moderate discriminant validity. For reliability, both environment and apathy subscales demonstrated excellent internal consistency. Although facial expression and eye contact showed moderate inter-rater reliability, all other items showed good to excellent inter-rater and intra-rater reliability. This study has successfully developed the PEAR scale and established its psychometrics based on the compatible scales available. The PEAR scale is the first scale that concurrently assesses apathy and environmental stimulation, and is recommended for use in persons with dementia.
Precise orbit determination of Multi-GNSS constellation including GPS GLONASS BDS and GALILEO
NASA Astrophysics Data System (ADS)
Dai, Xiaolei
2014-05-01
In addition to the existing American global positioning system (GPS) and the Russian global navigation satellite system (GLONASS), a new generation of GNSS is emerging and developing, such as the Chinese BeiDou satellite navigation system (BDS) and the European GALILEO system. Multi-constellation operation is expected to contribute to more accurate and reliable positioning and navigation services. However, the application of multi-constellation data challenges the traditional precise orbit determination (POD) strategy, which was usually designed for a single constellation. In this contribution, we exploit a more rigorous multi-constellation POD strategy for the ongoing IGS multi-GNSS experiment (MGEX) where the common parameters are identical for each system, and the frequency- and system-specific parameters are employed to account for the inter-frequency and inter-system biases. Since the authorized BDS attitude model is not yet released, different BDS attitude models are implemented and their impact on orbit accuracy is studied. The proposed POD strategy was implemented in the PANDA (Position and Navigation Data Analyst) software and can process observations from GPS, GLONASS, BDS and GALILEO together. The strategy is evaluated with the multi-constellation observations from about 90 MGEX stations and BDS observations from the BeiDou experimental tracking network (BETN) of Wuhan University (WHU). Of all the MGEX stations, 28 stations record BDS observations, and about 80 stations record GALILEO observations. All these data were processed together in our software, resulting in the multi-constellation POD solutions. We assessed the orbit accuracy for GPS and GLONASS by comparing our solutions with the IGS final orbit, and for BDS and GALILEO by overlapping our daily orbit solution. The stability of the inter-frequency bias of GLONASS and the inter-system biases w.r.t. GPS for GLONASS, BDS and GALILEO were investigated. 
At last, we carried out precise point positioning (PPP) using the multi-constellation POD orbit and clock products, and analyzed the contribution of these POD products to PPP. Keywords: Multi-GNSS, Precise Orbit Determination, Inter-frequency bias, Inter-system bias, Precise Point Positioning
Lee, Hoe C.; Yanting Chee, Derserri; Selander, Helena; Falkmer, Torbjorn
2012-01-01
Background: The current method of determining licence retention or cancellation is the on-road driving test. Previous research has shown that occupational therapists frequently assess drivers’ visual attention while sitting in the back seat on the opposite side of the driver. Since the eyes of the driver are not always visible, assessment by eye contact becomes problematic. Such procedural drawbacks may challenge validity and reliability of the visual attention assessments. In terms of correctly classified attention, the aim of the study was to establish the accuracy and the inter-rater reliability of driving assessments of visual attention from the back seat. Furthermore, by establishing eye contact between the assessor and the driver through an additional mirror on the windscreen, the present study aimed to establish how much such an intervention would enhance the accuracy of the visual attention assessment. Methods: Two drivers with Parkinson's disease (PD) and six control drivers drove a fixed route in a driving simulator while wearing a head mounted eye tracker. The eye tracker data showed where the foveal visual attention actually was directed. These data were time stamped and compared with the simultaneous manual scoring of the visual attention of the drivers. In four of the drivers, one with Parkinson's disease, a mirror on the windscreen was set up to arrange for eye contact between the driver and the assessor. Inter-rater reliability was assessed with one of the Parkinson drivers driving, but without the mirror. Results: Without the mirror, the overall accuracy was 56% when assessing the three control drivers and with the mirror 83%. However, for the PD driver without the mirror the accuracy was 94%, whereas for the PD driver with a mirror the accuracy was 90%. With respect to the inter-rater reliability, a 73% agreement was found.
Conclusion: If the final outcome of a driving assessment is dependent on the subcategory of a protocol assessing visual attention, we suggest the use of an additional mirror to establish eye contact between the assessor and the driver. The clinicians’ observations on-road should not be a standalone assessment in driving assessments. Instead, eye trackers should be employed for further analyses and correlation in cases where there is doubt about a driver's attention. PMID:22461850
The reliability and validity of the Turkish version of Fullerton Advanced Balance (FAB-T) scale.
Iyigun, Gozde; Kirmizigil, Berkiye; Angin, Ender; Oksuz, Sevim; Can, Filiz; Eker, Levent; Rose, Debra J
2018-06-04
The aim of this study was to evaluate the reliability and validity of the Turkish version of the FAB (FAB-T) scale in older Turkish adults. The reliability and validity of the scale were tested on 200 community-dwelling older adults. The FAB-T scale was scored by different physiotherapists on different days to evaluate inter-rater and intra-rater reliability. The Berg Balance Scale (BBS) was used for the evaluation of convergent validity, and the content validity of the FAB-T scale was investigated. The FAB-T scale showed very high inter- and intra-rater reliability. For inter-rater agreement, ICC values for the individual test items and total score were 0.92 (95% CI; 0.90-0.94) and 0.96 (95% CI; 0.95-0.97) respectively. For intra-rater agreement, ICC values for the individual test items and total score were 0.93 (95% CI; 0.91-0.95) and 0.96 (95% CI; 0.95-0.97) respectively. There was a good agreement between the FAB-T and BBS scales. A high correlation was found between the BBS and FAB-T scales [rho = 0.70 (95% CI; 0.62-0.76)] indicating good convergent validity. Considering the content validity of the FAB-T scale, no floor (floor score: 0%) or ceiling (ceiling score: 6.5%) effect was detected. The FAB-T scale was successfully translated from the original English version (FAB) and demonstrated strong psychometric features. It was found that the FAB-T scale has very high inter-rater and intra-rater reliability. Considering the convergent validity, the scale has high correlation with the BBS. The FAB-T has no floor and ceiling effect. Copyright © 2018 Elsevier B.V. All rights reserved.
Downer, Jason T.; Booren, Leslie M.; Lima, Olivia K.; Luckner, Amy E.; Pianta, Robert C.
2012-01-01
This paper introduces the Individualized Classroom Assessment Scoring System (inCLASS), an observation tool that targets children’s interactions in preschool classrooms with teachers, peers, and tasks. In particular, initial evidence is reported of the extent to which the inCLASS meets the following psychometric criteria: inter-rater reliability, normal distributions and adequate range, construct validity, and criterion-related validity. These initial findings suggest that the inCLASS has the potential to provide an authentic, contextualized assessment of young children’s classroom behaviors. Future directions for research with the inCLASS are discussed. PMID:23175598
Ban, Ilija; Troelsen, Anders; Kristensen, Morten Tange
2016-10-01
The Constant score (CS) has been the primary endpoint in most studies on clavicle fractures. However, the CS was not developed to assess patients with clavicle fractures. Our aim was to examine inter-rater reliability and agreement of the CS in patients with clavicle fractures. The secondary aim was to estimate the correlation between the CS and the Disabilities of the Arm, Shoulder and Hand score and the internal consistency of the 2 scores. On the basis of sample sizing, 36 patients (31 male and 5 female patients; mean age, 41.3 years) with clavicle fractures underwent standardized CS assessment at a mean of 6.8 weeks (SD, 1.0 weeks) after injury. Reliability and agreement of the CS were determined by 2 raters. The intraclass correlation coefficient (ICC2,1), standard error of measurement, minimal detectable change, Cronbach α coefficient, and Pearson correlation coefficient were estimated. Inter-rater reliability of the total CS was excellent (intraclass correlation coefficient, 0.94; 95% confidence interval, 0.88-0.97), with no systematic difference between the 2 raters (P = .75). The standard error of measurement (measurement error at the group level) was 4.9, whereas the minimal detectable change (smallest change needed to indicate a real change for an individual) was 13.6 CS points. The internal consistency of the 10 CS items was good, with a Cronbach α of .85, and we found a strong correlation (r = -0.92) between the CS and Disabilities of the Arm, Shoulder and Hand score. The CS was found to be reliable for assessing patients with clavicle fractures, especially at the group level. With high inter-rater reliability and agreement, in addition to good internal consistency, the standardized CS used in this study can be used for comparison of results from different settings. Copyright © 2016 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
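The relation between the two error figures reported above can be checked with a short calculation. The sketch below assumes the commonly used formula MDC95 = 1.96 × √2 × SEM; the abstract does not state which formula the authors applied, and the function name is hypothetical.

```python
import math


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """Minimal detectable change, assuming MDC = z * sqrt(2) * SEM.

    sem: standard error of measurement, in scale points.
    z:   normal quantile for the chosen confidence level (1.96 for 95%).
    """
    return z * math.sqrt(2) * sem


# SEM of 4.9 CS points, as reported in the abstract.
mdc = minimal_detectable_change(4.9)
print(round(mdc, 1))  # 13.6
```

Under that assumption the result agrees with the 13.6 CS points given in the abstract.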
Multidimensional measures validated for home health needs of older persons: A systematic review.
de Rossi Figueiredo, Daniela; Paes, Lucilene Gama; Warmling, Alessandra Martins; Erdmann, Alacoque Lorenzini; de Mello, Ana Lúcia Schaefer Ferreira
2018-01-01
To conduct a systematic review of the literature on valid and reliable multidimensional instruments to assess home health needs of older persons. Systematic review. Electronic databases, PubMed/Medline, Web of Science, Scopus, Cumulative Index to Nursing and Allied Health Literature, Scientific Electronic Library Online and the Latin American and Caribbean Health Sciences Information. All English, Portuguese and Spanish literature which included studies of reliability and validity of instruments that assessed at least two dimensions: physical, psychological, social support and functional independence, self-rated health behaviors and contextual environment and if such instruments proposed interventions after evaluation and/or monitoring changes over a period of time. Older persons aged 60 years or older. Of the 2397 studies identified, 32 were considered eligible. Two-thirds of the instruments proposed the physical, psychological, social support and functional independence dimensions. Inter-observer and intra-observer reliability and internal consistency values were 0.7 or above. More than two-thirds of the studies included validity (n=26) and more than one validity was tested in 15% (n=4) of these. Only 7% (n=2) proposed interventions after evaluation and/or monitoring changes over a period of time. Although the multidimensional assessment was performed, and the reliability values of the reviewed studies were satisfactory, different validity tests were not present in several studies. A gap at the instrument conception was observed related to interventions after evaluation and/or monitoring changes over a period of time. Further studies with this purpose are necessary for home health needs of the older persons. Copyright © 2017 Elsevier Ltd. All rights reserved.
Costa, Ana C S; Dibai Filho, Almir V; Packer, Amanda C; Rodrigues-Bigaton, Delaine
2013-01-01
Infrared thermography is an auxiliary tool that can be used to evaluate several pathologies given its efficiency in analyzing the distribution of skin surface temperature. To propose two forms of infrared image analysis of the masticatory and upper trapezius muscles, and to determine the intra- and inter-rater reliability of both forms of analysis. Infrared images of masticatory and upper trapezius muscles of 64 female volunteers with and without temporomandibular disorder (TMD) were collected. Two raters performed the infrared image analysis, which was done in two ways: temperature measurement along the muscle length and in the central portion of the muscle. The Intraclass Correlation Coefficient (ICC) was used to determine the intra- and inter-rater reliability. The ICC showed excellent intra- and inter-rater values for both measurements: temperature measurement of the muscle length (TMD group, intra-rater, ICC ranged from 0.996 to 0.999, inter-rater, ICC ranged from 0.992 to 0.999; control group, intra-rater, ICC ranged from 0.993 to 0.998, inter-rater, ICC ranged from 0.990 to 0.998), and temperature measurement of the central portion of the muscle (TMD group, intra-rater, ICC ranged from 0.981 to 0.998, inter-rater, ICC ranged from 0.971 to 0.998; control group, intra-rater, ICC ranged from 0.887 to 0.996, inter-rater, ICC ranged from 0.852 to 0.996). The results indicated that temperature measurements of the masticatory and upper trapezius muscles carried out by the analysis of the muscle length and central portion yielded excellent intra- and inter-rater reliability.
Infant polysomnography: reliability and validity of infant arousal assessment.
Crowell, David H; Kulp, Thomas D; Kapuniai, Linda E; Hunt, Carl E; Brooks, Lee J; Weese-Mayer, Debra E; Silvestri, Jean; Ward, Sally Davidson; Corwin, Michael; Tinsley, Larry; Peucker, Mark
2002-10-01
Infant arousal scoring based on the Atlas Task Force definition of transient EEG arousal was evaluated to determine (1) whether transient arousals can be identified and assessed reliably in infants and (2) whether arousal and no-arousal epochs scored previously by trained raters can be validated reliably by independent sleep experts. Phase I for inter- and intrarater reliability scoring was based on two datasets of sleep epochs selected randomly from nocturnal polysomnograms of healthy full-term, preterm, idiopathic apparent life-threatening event cases, and siblings of Sudden Infant Death Syndrome infants of 35 to 64 weeks postconceptional age. After training, test set 1 reliability was assessed and discrepancies identified. After retraining, test set 2 was scored by the same raters to determine interrater reliability. Later, three raters from the trained group rescored test set 2 to assess inter- and intrarater reliabilities. Interrater and intrarater reliability kappas, with 95% confidence intervals, ranged from substantial to almost perfect levels of agreement. Interrater reliabilities for spontaneous arousals were initially moderate and then substantial. During the validation phase, 315 previously scored epochs were presented to four sleep experts to rate as containing arousal or no-arousal events. Interrater expert agreements were diverse and considered as noninterpretable. Concordance in sleep experts' agreements, based on identification of the previously sampled arousal and no-arousal epochs, was used as a secondary evaluative technique. Results showed agreement by two or more experts on 86% of the Collaborative Home Infant Monitoring Evaluation Study arousal scored events. Conversely, only 1% of the Collaborative Home Infant Monitoring Evaluation Study-scored no-arousal epochs were rated as an arousal.
In summary, this study presents an empirically tested model with procedures and criteria for attaining improved reliability in transient EEG arousal assessments in infants using the modified Atlas Task Force standards. With training based on specific criteria, substantial inter- and intrarater agreement in identifying infant arousals was demonstrated. Corroborative validation results were too disparate for meaningful interpretation. Alternate evaluation based on concordance agreements supports reliance on infant EEG criteria for assessment. Results mandate additional confirmatory validation studies with specific training on infant EEG arousal assessment criteria.
Medial tibial stress syndrome can be diagnosed reliably using history and physical examination.
Winters, M; Bakker, E W P; Moen, M H; Barten, C C; Teeuwen, R; Weir, A
2017-02-08
The majority of sporting injuries are clinically diagnosed using history and physical examination as the cornerstone. There are no studies supporting the reliability of making a clinical diagnosis of medial tibial stress syndrome (MTSS). Our aim was to assess if MTSS can be diagnosed reliably, using history and physical examination. We also investigated if clinicians were able to reliably identify concurrent lower leg injuries. A clinical reliability study was performed at multiple sports medicine sites in The Netherlands. Athletes with non-traumatic lower leg pain were assessed for having MTSS by two clinicians, who were blinded to each other's diagnoses. We calculated the prevalence, percentage of agreement, observed percentage of positive agreement (Ppos), observed percentage of negative agreement (Pneg) and Kappa-statistic with 95%CI. Forty-nine athletes participated in this study, of whom 46 completed both assessments. The prevalence of MTSS was 74%. The percentage of agreement was 96%, with Ppos and Pneg of 97% and 92%, respectively. The inter-rater reliability was almost perfect; k=0.89 (95% CI 0.74 to 1.00), p<0.000001. Of the 34 athletes with MTSS, 11 (32%) had a concurrent lower leg injury, which was reliably noted by our clinicians, k=0.73, 95% CI 0.48 to 0.98, p<0.0001. Our findings show that MTSS can be reliably diagnosed clinically using history and physical examination, in clinical practice and research settings. We also found that concurrent lower leg injuries are common in athletes with MTSS. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
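The agreement measures named in this abstract (percentage of agreement, Ppos, Pneg and Cohen's kappa) can all be derived from a 2×2 table of the two clinicians' diagnoses. The sketch below uses the standard definitions; the cell counts are hypothetical, chosen only because they sum to 46 and reproduce the reported percentages, since the abstract does not give the actual table. The function name is likewise an assumption.

```python
def agreement_stats(a: int, b: int, c: int, d: int) -> dict:
    """Agreement measures for two raters and a binary diagnosis.

    a: both raters positive      b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive      d: both raters negative
    """
    n = a + b + c + d
    observed = (a + d) / n                      # overall proportion of agreement
    p_pos = 2 * a / (2 * a + b + c)             # positive agreement
    p_neg = 2 * d / (2 * d + b + c)             # negative agreement
    # Chance agreement from the marginal totals of each rater.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {"agreement": observed, "p_pos": p_pos, "p_neg": p_neg, "kappa": kappa}


# Hypothetical counts (not reported in the abstract): 33 agreed positive,
# 11 agreed negative, and one disagreement in each direction.
stats = agreement_stats(a=33, b=1, c=1, d=11)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
# agreement: 0.96, p_pos: 0.97, p_neg: 0.92, kappa: 0.89
```

With these illustrative counts the four figures round to the values reported above, which suggests the reported statistics are mutually consistent; it does not recover the study's real table.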
Salavati, M; Waninge, A; Rameckers, E A A; de Blécourt, A C E; Krijnen, W P; Steenbergen, B; van der Schans, C P
2015-02-01
The aims of this study were to adapt the Paediatric Evaluation of Disability Inventory, Dutch version (PEDI-NL) for children with cerebral visual impairment (CVI) and cerebral palsy (CP) and determine test-retest and inter-respondent reliability. The Delphi method was used to gain consensus among twenty-one health experts familiar with CVI. Test-retest and inter-respondent reliability were assessed for parents and caregivers of 75 children (aged 50-144 months) with CP and CVI. The percentages of identical item scores were computed, as well as the intraclass correlation coefficients (ICCs) and Cronbach's alphas of scale scores over the domains self-care, mobility, and social function. All experts agreed on the adaptation of the PEDI-NL for children with CVI. On item score, for the Functional Skills scale, mean percentage identical scores variations for test-retest reliability were 73-79 with Caregiver Assistance scale 73-81, and for inter-respondent reliability 21-76 with Caregiver Assistance scale 40-43. For all scales over all domains ICCs exceeded 0.87. For the domains self-care, mobility, and social function, the Functional Skills scale and the Caregiver Assistance scale have Cronbach's alpha above 0.88. The adapted PEDI-NL for children with CP and CVI is reliable and comparable to the original PEDI-NL. Copyright © 2014 Elsevier Ltd. All rights reserved.
Validation of the Italian version of the Coma Recovery Scale-Revised (CRS-R).
Sacco, Simona; Altobelli, Emma; Pistarini, Caterina; Cerone, Davide; Cazzulani, Benedetta; Carolei, Antonio
2011-01-01
To validate the Italian version of the Coma Recovery Scale-Revised (CRS-R). Two observers applied the Italian version of the CRS-R to selected patients. On day 1, observers A and B independently scored each patient; the comparison of their observations was used to evaluate inter-observer agreement. On day 2, observer A completed a second evaluation and the comparison of this observation with that obtained on day 1 by the same observer was used to evaluate test-re-test agreement. For each evaluation, a diagnostic impression (vegetative state/minimally conscious state) was also reported. Thirty-eight patients were evaluated (mean age ± SD, 58.9 ± 13.8 years). Inter-observer (ρ = 0.81; p < 0.001) as well as test-re-test agreement (ρ = 0.97; p < 0.001) for the total score was high. Inter-observer agreement was excellent for the communication sub-scale, good for the auditory, visual and motor sub-scales and moderate for the oromotor/verbal and arousal sub-scales. Test-re-test agreement was excellent for the visual, motor, oromotor/verbal and communication sub-scales, good for the auditory sub-scale and moderate for the arousal sub-scale. When considering the diagnostic impression, inter-observer agreement was good (κ = 0.75; p < 0.001) and test-re-test agreement was excellent (κ = 0.92; p < 0.001). The Italian version of the CRS-R can be administered reliably and can also be employed to discriminate between patients in a vegetative state and those in a minimally conscious state.
Are Various Forms of Locomotion-Speed Diverse or Unique Performance Quality?
Cavar, Mile; Corluka, Marin; Cerkez, Ivana; Culjak, Zoran; Sekulic, Damir
2013-01-01
The forward-sprint is considered to be, and is regularly performed as, a unique measure of “on-ground” linear-speed performance. Thus far, no investigation has simultaneously studied different forms of linear-speed or investigated whether different forms of linear-speed should be observed as unique performance quality. The purpose of this study was to determine (I) the achievements (i.e. execution time), and (II) the reliability and inter-relationships between various linear-speed performances. The participants were 42 male physical education students with substantial sport-specific backgrounds. We applied a total of six tests: three quadrupedal (supine backward, supine forward, and pronate backward locomotion) and three bipedal-performances (forward sprinting, backward sprinting, lateral shuffling). All of the tests showed appropriate reliability parameters (Cronbach Alpha ranged from 0.91 to 0.97; Inter-Item-R 0.78–0.92; Coefficient-of-Variation 1.3–9.1). The tests used in this study shared between 9% and 50% of the common variance. Our results suggest that different activities require activity-specific tests of linear-speed. This is particularly significant in those sports and activities in which quadrupedal locomotion patterns are highly important (wrestling, physically trained military services, law enforcement, fire and rescue, protective services). PMID:24235984
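Shared (common) variance between two tests is conventionally the square of their correlation coefficient. As a minimal sketch under that assumption, the reported range of 9% to 50% can be translated back into correlation magnitudes; the helper names are hypothetical and the abstract itself does not report the underlying coefficients.

```python
import math


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, taken as r squared."""
    return r ** 2


def correlation_magnitude(shared: float) -> float:
    """Absolute correlation implied by a given proportion of shared variance."""
    return math.sqrt(shared)


# Range of common variance reported in the abstract: 9% to 50%.
print(round(correlation_magnitude(0.09), 2))  # 0.3
print(round(correlation_magnitude(0.50), 2))  # 0.71
```

Read this way, even the most closely related pair of tests would correlate at roughly 0.7, which fits the authors' conclusion that the tests capture activity-specific qualities.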
A medical record review for functional somatic symptoms in children.
Rask, Charlotte Ulrikka; Borg, Carsten; Søndergaard, Charlotte; Schulz-Pedersen, Søren; Thomsen, Per Hove; Fink, Per
2010-04-01
The objectives of this study were to develop and test a systematic medical record review for functional somatic symptoms (FSSs) in paediatric patients and to estimate the inter-rater reliability of paediatricians' recognition of FSSs and their associated impairments while using this method. We developed the Medical Record Review for Functional Somatic Symptoms in Children (MRFC) for retrospective medical record review. Described symptoms were categorised as probably, definitely, or not FSSs. FSS-associated impairment was also determined. Three paediatricians performed the MRFC on the medical records of 54 children with a diagnosed, well-defined physical disease and 59 with 'symptom' diagnoses. The inter-rater reliabilities of the recognition and associated impairment of FSSs were tested on 20 of these records. The MRFC allowed identification of subgroups of children with multisymptomatic FSSs, long-term FSSs, and/or impairing FSSs. The FSS inter-rater reliability was good (combined kappa=0.69) but only fair as far as associated impairment was concerned (combined kappa=0.29). In the hands of skilled paediatricians, the MRFC is a reliable method for identifying paediatric patients with diverse types of FSSs for clinical research. However, additional information is needed for reliable judgement of impairment. The method may also prove useful in clinical practice. Copyright 2010 Elsevier Inc. All rights reserved.
A comparison of three observational techniques for assessing postural loads in industry.
Kee, Dohyung; Karwowski, Waldemar
2007-01-01
This study aims to compare 3 observational techniques for assessing postural load, namely, OWAS, RULA, and REBA. The comparison was based on the evaluation results generated by the classification techniques using 301 working postures. All postures were sampled from the iron and steel, electronics, automotive, and chemical industries, and a general hospital. While only about 21% of the 301 postures were classified at the action category/level 3 or 4 by both OWAS and REBA, about 56% of the postures were classified into action level 3 or 4 by RULA. The inter-method reliability for postural load category between OWAS and RULA was just 29.2%, and the reliability between RULA and REBA was 48.2%. These results showed that, compared to RULA, OWAS and REBA generally underestimated postural loads for the analyzed postures, irrespective of industry, work type, and whether or not the body postures were in a balanced state.
Kinematic repeatability of a multi-segment foot model for dance.
Carter, Sarah L; Sato, Nahoko; Hopper, Luke S
2018-03-01
The purpose of this study was to determine the intra- and inter-assessor repeatability of a modified Rizzoli Foot Model for analysing the foot kinematics of ballet dancers. Six university-level ballet dancers performed the following movements: parallel stance, turnout plié, turnout stance, turnout rise and flex-point-flex. The three-dimensional (3D) position of individual reflective markers and marker triads was used to model the movement of the dancers' tibia, entire foot, hindfoot, midfoot, forefoot and hallux. Intra- and inter-assessor reliability demonstrated excellent (ICC ≥ 0.75) repeatability for the first metatarsophalangeal joint in the sagittal plane. Intra-assessor reliability demonstrated excellent (ICC ≥ 0.75) repeatability during flex-point-flex across all inter-segmental angles except for the tibia-hindfoot and hindfoot-midfoot frontal planes. Inter-assessor repeatability ranged from poor to excellent (0.5 > ICC ≥ 0.75) for the 3D segment rotations. The most repeatable measure was the tibia-foot dorsiflexion/plantar flexion articulation whereas the least repeatable measure was the hindfoot-midfoot adduction/abduction articulation. The variation found in the inter-assessor results is likely due to inconsistencies in marker placement. This 3D dance specific multi-segment foot model provides insight into which kinematic measures can be reliably used to ascertain in vivo technical errors and/or biomechanical abnormalities in a dancer's foot motion.
A TWIN STUDY OF SCHIZOAFFECTIVE-MANIA, SCHIZOAFFECTIVE-DEPRESSION AND OTHER PSYCHOTIC SYNDROMES
Cardno, Alastair G; Rijsdijk, Frühling V; West, Robert M; Gottesman, Irving I; Craddock, Nick; Murray, Robin M; McGuffin, Peter
2012-01-01
The nosological status of schizoaffective disorders remains controversial. Twin studies are potentially valuable for investigating relationships between schizoaffective-mania, schizoaffective-depression and other psychotic syndromes, but no such study has yet been reported. We ascertained 224 probandwise twin pairs (106 monozygotic, 118 same-sex dizygotic), where probands had psychotic or manic symptoms, from the Maudsley Twin Register in London (1948–1993). We investigated Research Diagnostic Criteria schizoaffective-mania, schizoaffective-depression, schizophrenia, mania and depressive psychosis primarily using a non-hierarchical classification, and additionally using hierarchical and data-derived classifications, and a classification featuring broad schizophrenic and manic syndromes without separate schizoaffective syndromes. We investigated inter-rater reliability and co-occurrence of syndromes within twin probands and twin pairs. The schizoaffective syndromes showed only moderate inter-rater reliability. There was general significant co-occurrence between syndromes within twin probands and monozygotic pairs, and a trend for schizoaffective-mania and mania to have the greatest co-occurrence. Schizoaffective syndromes in monozygotic probands were associated with relatively high risk of a psychotic syndrome occurring in their co-twins. The classification of broad schizophrenic and manic syndromes without separate schizoaffective syndromes showed improved inter-rater reliability, but high genetic and environmental correlations between the two broad syndromes. The results are consistent with regarding schizoaffective-mania as due to co-occurring elevated liability to schizophrenia, mania and depression; and schizoaffective-depression as due to co-occurring elevated liability to schizophrenia and depression, but with less elevation of liability to mania. 
If in due course schizoaffective syndromes show satisfactory inter-rater reliability and some specific etiological factors they could alternatively be regarded as partly independent disorders. PMID:22213671
A twin study of schizoaffective-mania, schizoaffective-depression, and other psychotic syndromes.
Cardno, Alastair G; Rijsdijk, Frühling V; West, Robert M; Gottesman, Irving I; Craddock, Nick; Murray, Robin M; McGuffin, Peter
2012-03-01
The nosological status of schizoaffective disorders remains controversial. Twin studies are potentially valuable for investigating relationships between schizoaffective-mania, schizoaffective-depression, and other psychotic syndromes, but no such study has yet been reported. We ascertained 224 probandwise twin pairs [106 monozygotic (MZ), 118 same-sex dizygotic (DZ)], where probands had psychotic or manic symptoms, from the Maudsley Twin Register in London (1948-1993). We investigated Research Diagnostic Criteria schizoaffective-mania, schizoaffective-depression, schizophrenia, mania and depressive psychosis primarily using a non-hierarchical classification, and additionally using hierarchical and data-derived classifications, and a classification featuring broad schizophrenic and manic syndromes without separate schizoaffective syndromes. We investigated inter-rater reliability and co-occurrence of syndromes within twin probands and twin pairs. The schizoaffective syndromes showed only moderate inter-rater reliability. There was general significant co-occurrence between syndromes within twin probands and MZ pairs, and a trend for schizoaffective-mania and mania to have the greatest co-occurrence. Schizoaffective syndromes in MZ probands were associated with relatively high risk of a psychotic syndrome occurring in their co-twins. The classification of broad schizophrenic and manic syndromes without separate schizoaffective syndromes showed improved inter-rater reliability, but high genetic and environmental correlations between the two broad syndromes. The results are consistent with regarding schizoaffective-mania as due to co-occurring elevated liability to schizophrenia, mania, and depression; and schizoaffective-depression as due to co-occurring elevated liability to schizophrenia and depression, but with less elevation of liability to mania. 
If in due course schizoaffective syndromes show satisfactory inter-rater reliability and some specific etiological factors they could alternatively be regarded as partly independent disorders. Copyright © 2011 Wiley Periodicals, Inc.
Savoia, Elena; Biddinger, Paul D; Burstein, Jon; Stoto, Michael A
2010-01-01
As proxies for actual emergencies, drills and exercises can raise awareness, stimulate improvements in planning and training, and provide an opportunity to examine how different components of the public health system would combine to respond to a challenge. Despite these benefits, there remains a substantial need for widely accepted and prospectively validated tools to evaluate agencies' and hospitals' performance during such events. Unfortunately, to date, few studies have focused on addressing this need. The purpose of this study was to assess the validity and reliability of a qualitative performance assessment tool designed to measure hospitals' communication and operational capabilities during a functional exercise. The study population included 154 hospital personnel representing nine hospitals that participated in a functional exercise in Massachusetts in June 2008. A 25-item questionnaire was developed to assess the following three hospital functional capabilities: (1) inter-agency communication; (2) communication with the public; and (3) disaster operations. Analyses were conducted to examine internal consistency, associations among scales, the empirical structure of the items, and inter-rater agreement. Twenty-two questions were retained in the final instrument, which demonstrated reliability with alpha coefficients of 0.83 or higher for all scales. A three-factor solution from the principal components analysis accounted for 57% of the total variance, and the factor structure was consistent with the original hypothesized domains. Inter-rater agreement between participants' self reported scores and external evaluators' scores ranged from moderate to good. The resulting 22-item performance measurement tool reliably measured hospital capabilities in a functional exercise setting, with preliminary evidence of concurrent and criterion-related validity.
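Internal consistency of the kind reported here is usually summarised with Cronbach's alpha. The following is a minimal sketch of the standard formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores); the scores are made-up illustration data, not taken from the study, and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items[i][j] is the score of respondent j on item i; every item must
    have been answered by the same respondents in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores: three questionnaire items, five respondents.
example_items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
]
print(round(cronbach_alpha(example_items), 2))  # 0.94
```

Sample variances (n − 1 denominator) are used for both the items and the totals; using population variances throughout gives the same alpha because the factor cancels.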
THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS).
McCunn, Robert; Aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim
2017-02-01
The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the 'pure' intra-rater (intra-occasion) reliability for those movements. Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement.
Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate the relationship between test score and subsequent injury. The present results indicate acceptable reliability for this purpose; however, room for further development of the intra-rater reliability exists for some of the individual sub-tests. Level of evidence: 2b.
THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS)
aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim
2017-01-01
Background/purpose The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. Methods The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the ‘pure’ intra-rater (intra-occasion) reliability for those movements. Results Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement.
Conclusions Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate the relationship between test score and subsequent injury. The present results indicate acceptable reliability for this purpose; however, room for further development of the intra-rater reliability exists for some of the individual sub-tests. Level of evidence 2b PMID:28217416
Gao, Zhongyang; Song, Hui; Ren, Fenggang; Li, Yuhuan; Wang, Dong; He, Xijing
2017-12-01
The aim of the present study was to evaluate the reliability of the Cartesian Optoelectronic Dynamic Anthropometer (CODA) motion system in measuring the cervical range of motion (ROM) and verify the construct validity of the CODA motion system. A total of 26 patients with cervical spondylosis and 22 patients with anterior cervical fusion were enrolled and the CODA motion analysis system was used to measure the three-dimensional cervical ROM. Intra- and inter-rater reliability was assessed by intraclass correlation coefficients (ICCs), standard error of measurement (SEm), limits of agreement (LOA) and minimal detectable change (MDC). Independent samples t-tests were performed to examine the differences of cervical ROM between cervical spondylosis and anterior cervical fusion patients. The results revealed that in the cervical spondylosis group, the reliability was almost perfect (intra-rater reliability: ICC, 0.87-0.95; LOA, -12.86-13.70; SEm, 2.97-4.58; inter-rater reliability: ICC, 0.84-0.95; LOA, -13.09-13.48; SEm, 3.13-4.32). In the anterior cervical fusion group, the reliability was high (intra-rater reliability: ICC, 0.88-0.97; LOA, -10.65-11.08; SEm, 2.10-3.77; inter-rater reliability: ICC, 0.86-0.96; LOA, -10.91-13.66; SEm, 2.20-4.45). The cervical ROM in the cervical spondylosis group was significantly higher than that in the anterior cervical fusion group in all directions except for left rotation. In conclusion, the CODA motion analysis system is highly reliable in measuring cervical ROM and the construct validity was verified, as the system was sufficiently sensitive to distinguish between the cervical spondylosis and anterior cervical fusion groups based on their ROM.
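For readers unfamiliar with how SEm and MDC relate to the ICC, the following minimal Python sketch uses the commonly cited formulas SEm = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEm. The abstract does not state which formulas the authors applied, so this is an assumption, and the input values are hypothetical rather than taken from the study.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, using SEm = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEm."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical inputs (not from the study): between-subject SD of 10 degrees, ICC of 0.91.
sem = sem_from_icc(sd=10.0, icc=0.91)
mdc = mdc95(sem)
print(f"SEm = {sem:.2f} deg, MDC95 = {mdc:.2f} deg")
```

Under these assumptions, a measured change smaller than the MDC95 cannot be distinguished from measurement error.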
Grant, Jon E; Kim, Suck Won; McCabe, James S
2006-06-01
Kleptomania presents difficulties in diagnosis for clinicians. This study aimed to develop and test a DSM-IV-based diagnostic instrument for kleptomania. To assess for current kleptomania the Structured Clinical Interview for Kleptomania (SCI-K) was administered to 112 consecutive subjects requesting psychiatric outpatient treatment for a variety of disorders. Reliability and validity were determined. Classification accuracy was examined using the longitudinal course of illness. The SCI-K demonstrated excellent test-retest (Phi coefficient = 0.956 (95% CI = 0.937, 0.970)) and inter-rater reliability (phi coefficient = 0.718 (95% CI = 0.506, 0.848)) in the diagnosis of kleptomania. Concurrent validity was observed with a self-report measure using DSM-IV kleptomania criteria (phi coefficient = 0.769 (95% CI = 0.653, 0.850)). Discriminant validity was observed with a measure of depression (point biserial coefficient = -0.020 (95% CI = -0.205, 0.166)). The SCI-K demonstrated both high sensitivity and specificity based on longitudinal assessment. The SCI-K demonstrated excellent reliability and validity in diagnosing kleptomania in subjects presenting with various psychiatric problems. These findings require replication in larger groups, including non-psychiatric populations, to examine their generalizability. Copyright (c) 2006 John Wiley & Sons, Ltd.
Bastida Castillo, Alejandro; Gómez Carmona, Carlos D; De la Cruz Sánchez, Ernesto; Pino Ortega, José
2018-05-01
There is interest in the accuracy and inter-unit reliability of position-tracking systems to monitor players. Research into this technology, although relatively recent, has grown exponentially in recent years, and it is difficult to find a professional team sport that does not use at least Global Positioning System (GPS) technology. The aim of this study is to determine the accuracy of both GPS-based and Ultra Wide Band (UWB)-based systems on a soccer field and their inter- and intra-unit reliability. A secondary aim is to compare them for practical applications in sport science. Following institutional ethical approval and familiarization, 10 healthy and well-trained former soccer players (20 ± 1.6 years, 1.76 ± 0.08 m, and 69.5 ± 9.8 kg) performed three course tests: (i) linear course, (ii) circular course, and (iii) a zig-zag course, all using UWB and GPS technologies. The average speed and distance covered were compared with timing gates and the real distance as references. The UWB technology showed better accuracy (bias: 0.57-5.85%), test-retest reliability (%TEM: 1.19), and inter-unit reliability (bias: 0.18) in determining distance covered than the GPS technology (bias: 0.69-6.05%; %TEM: 1.47; bias: 0.25) overall. Also, UWB showed better results (bias: 0.09; ICC: 0.979; bias: 0.01) for mean velocity measurement than GPS (bias: 0.18; ICC: 0.951; bias: 0.03).
Validation of Clinical Observations of Mastication in Persons with ALS.
Simione, Meg; Wilson, Erin M; Yunusova, Yana; Green, Jordan R
2016-06-01
Amyotrophic lateral sclerosis (ALS) is a progressive neurological disease that can result in difficulties with mastication leading to malnutrition, choking or aspiration, and reduced quality of life. When evaluating mastication, clinicians primarily observe spatial and temporal aspects of jaw motion. The reliability and validity of clinical observations for detecting jaw movement abnormalities are unknown. The purpose of this study is to determine the reliability and validity of clinician-based ratings of chewing performance in neuro-typical controls and persons with varying degrees of chewing impairments due to ALS. Adults chewed a solid food consistency while full-face video was recorded along with jaw kinematic data using a 3D optical motion capture system. Five experienced speech-language pathologists watched the videos and rated the spatial and temporal aspects of chewing performance. The jaw kinematic data served as the gold-standard for validating the clinicians' ratings. Results showed that the clinician-based rating of temporal aspects of chewing performance had strong inter-rater reliability and correlated well with comparable kinematic measures. In contrast, the reliability of rating the spatial and spatiotemporal aspects of chewing (i.e., range of motion of the jaw, consistency of the chewing pattern) was mixed. Specifically, ratings of range of motion were at best only moderately reliable. Ratings of chewing movement consistency were reliable but only weakly correlated with comparable measures of jaw kinematics. These findings suggest that clinician ratings of temporal aspects of chewing are appropriate for clinical use, whereas ratings of the spatial and spatiotemporal aspects of chewing may not be reliable or valid.
Reliability and type of consumer health documents on the World Wide Web: an annotation study.
Martin, Melanie J
2011-01-01
In this paper we present a detailed scheme for annotating medical web pages designed for health care consumers. The annotation is along two axes: first, by reliability (the extent to which the medical information on the page can be trusted), second, by the type of page (patient leaflet, commercial, link, medical article, testimonial, or support). We analyze inter-rater agreement among three judges for each axis. Inter-rater agreement was moderate (0.77 accuracy, 0.62 F-measure, 0.49 Kappa) on the page reliability axis and good (0.81 accuracy, 0.72 F-measure, 0.73 Kappa) along the page type axis. We have shown promising results in this study that appropriate classes of pages can be developed and used by human annotators to annotate web pages with reasonable to good agreement.
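The gap between raw accuracy and kappa reported above comes from the chance correction that kappa applies. As a minimal illustration, the Python sketch below computes Cohen's kappa for two raters. The abstract does not say which kappa variant was used for its three judges, so the two-rater form is an assumption, and the labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical page-reliability labels from two judges (not data from the study).
judge_1 = ["reliable", "reliable", "unreliable", "unreliable", "reliable", "unreliable"]
judge_2 = ["reliable", "unreliable", "unreliable", "unreliable", "reliable", "reliable"]
kappa = cohen_kappa(judge_1, judge_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the judges agree on four of six pages, yet half of that agreement is expected by chance, so kappa is much lower than the raw agreement rate.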
Effect of image resolution manipulation in rearfoot angle measurements obtained with photogrammetry
Sacco, I.C.N.; Picon, A.P.; Ribeiro, A.P.; Sartor, C.D.; Camargo-Junior, F.; Macedo, D.O.; Mori, E.T.T.; Monte, F.; Yamate, G.Y.; Neves, J.G.; Kondo, V.E.; Aliberti, S.
2012-01-01
The aim of this study was to investigate the influence of image resolution manipulation on the photogrammetric measurement of the rearfoot static angle. The study design was that of a reliability study. We evaluated 19 healthy young adults (11 females and 8 males). The photographs were taken at 1536 pixels in the greatest dimension, resized into four different resolutions (1200, 768, 600, 384 pixels) and analyzed by three equally trained examiners on a 96-pixels per inch (ppi) screen. An experienced physiotherapist marked the anatomic landmarks of rearfoot static angles on two occasions within a 1-week interval. Three different examiners had marked angles on digital pictures. The systematic error and the smallest detectable difference were calculated from the angle values between the image resolutions and times of evaluation. Different resolutions were compared by analysis of variance. Inter- and intra-examiner reliability was calculated by intra-class correlation coefficients (ICC). The rearfoot static angles obtained by the examiners in each resolution were not different (P > 0.05); however, the higher the image resolution the better the inter-examiner reliability. The intra-examiner reliability (within a 1-week interval) was considered to be unacceptable for all image resolutions (ICC range: 0.08-0.52). The whole body image of an adult with a minimum size of 768 pixels analyzed on a 96-ppi screen can provide very good inter-examiner reliability for photogrammetric measurements of rearfoot static angles (ICC range: 0.85-0.92), although the intra-examiner reliability within each resolution was not acceptable. Therefore, this method is not a proper tool for follow-up evaluations of patients within a therapeutic protocol. PMID:22911379
Effect of image resolution manipulation in rearfoot angle measurements obtained with photogrammetry.
Sacco, I C N; Picon, A P; Ribeiro, A P; Sartor, C D; Camargo-Junior, F; Macedo, D O; Mori, E T T; Monte, F; Yamate, G Y; Neves, J G; Kondo, V E; Aliberti, S
2012-09-01
The aim of this study was to investigate the influence of image resolution manipulation on the photogrammetric measurement of the rearfoot static angle. The study design was that of a reliability study. We evaluated 19 healthy young adults (11 females and 8 males). The photographs were taken at 1536 pixels in the greatest dimension, resized into four different resolutions (1200, 768, 600, 384 pixels) and analyzed by three equally trained examiners on a 96-pixels per inch (ppi) screen. An experienced physiotherapist marked the anatomic landmarks of rearfoot static angles on two occasions within a 1-week interval. Three different examiners had marked angles on digital pictures. The systematic error and the smallest detectable difference were calculated from the angle values between the image resolutions and times of evaluation. Different resolutions were compared by analysis of variance. Inter- and intra-examiner reliability was calculated by intra-class correlation coefficients (ICC). The rearfoot static angles obtained by the examiners in each resolution were not different (P > 0.05); however, the higher the image resolution the better the inter-examiner reliability. The intra-examiner reliability (within a 1-week interval) was considered to be unacceptable for all image resolutions (ICC range: 0.08-0.52). The whole body image of an adult with a minimum size of 768 pixels analyzed on a 96-ppi screen can provide very good inter-examiner reliability for photogrammetric measurements of rearfoot static angles (ICC range: 0.85-0.92), although the intra-examiner reliability within each resolution was not acceptable. Therefore, this method is not a proper tool for follow-up evaluations of patients within a therapeutic protocol.
Izatt, Maree T; Bateman, Gary R; Adam, Clayton J
2012-07-30
Vertebral rotation found in structural scoliosis contributes to trunkal asymmetry which is commonly measured with a simple Scoliometer device on a patient's thorax in the forward flexed position. The new generation of mobile 'smartphones' have an integrated accelerometer, making accurate angle measurement possible, which provides a potentially useful clinical tool for assessing rib hump deformity. This study aimed to compare rib hump angle measurements performed using a Smartphone and traditional Scoliometer on a set of plaster torsos representing the range of torsional deformities seen in clinical practice. Nine observers measured the rib hump found on eight plaster torsos moulded from scoliosis patients with both a Scoliometer and an Apple iPhone on separate occasions. Each observer repeated the measurements at least a week after the original measurements, and were blinded to previous results. Intra-observer reliability and inter-observer reliability were analysed using the method of Bland and Altman and 95% confidence intervals were calculated. The Intra-Class Correlation Coefficients (ICC) were calculated for repeated measurements of each of the eight plaster torso moulds by the nine observers. Mean absolute difference between pairs of iPhone/Scoliometer measurements was 2.1 degrees, with a small (1 degrees) bias toward higher rib hump angles with the iPhone. 95% confidence intervals for intra-observer variability were +/- 1.8 degrees (Scoliometer) and +/- 3.2 degrees (iPhone). 95% confidence intervals for inter-observer variability were +/- 4.9 degrees (iPhone) and +/- 3.8 degrees (Scoliometer). The measurement errors and confidence intervals found were similar to or better than the range of previously published thoracic rib hump measurement studies. The iPhone is a clinically equivalent rib hump measurement tool to the Scoliometer in spinal deformity patients. 
The novel use of plaster torsos as rib hump models avoids the variables of patient fatigue and discomfort, inconsistent positioning and deformity progression using human subjects in a single or multiple measurement sessions.
2012-01-01
Background Vertebral rotation found in structural scoliosis contributes to trunkal asymmetry which is commonly measured with a simple Scoliometer device on a patient's thorax in the forward flexed position. The new generation of mobile 'smartphones' have an integrated accelerometer, making accurate angle measurement possible, which provides a potentially useful clinical tool for assessing rib hump deformity. This study aimed to compare rib hump angle measurements performed using a Smartphone and traditional Scoliometer on a set of plaster torsos representing the range of torsional deformities seen in clinical practice. Methods Nine observers measured the rib hump found on eight plaster torsos moulded from scoliosis patients with both a Scoliometer and an Apple iPhone on separate occasions. Each observer repeated the measurements at least a week after the original measurements, and were blinded to previous results. Intra-observer reliability and inter-observer reliability were analysed using the method of Bland and Altman and 95% confidence intervals were calculated. The Intra-Class Correlation Coefficients (ICC) were calculated for repeated measurements of each of the eight plaster torso moulds by the nine observers. Results Mean absolute difference between pairs of iPhone/Scoliometer measurements was 2.1 degrees, with a small (1 degrees) bias toward higher rib hump angles with the iPhone. 95% confidence intervals for intra-observer variability were +/- 1.8 degrees (Scoliometer) and +/- 3.2 degrees (iPhone). 95% confidence intervals for inter-observer variability were +/- 4.9 degrees (iPhone) and +/- 3.8 degrees (Scoliometer). The measurement errors and confidence intervals found were similar to or better than the range of previously published thoracic rib hump measurement studies. Conclusions The iPhone is a clinically equivalent rib hump measurement tool to the Scoliometer in spinal deformity patients. 
The novel use of plaster torsos as rib hump models avoids the variables of patient fatigue and discomfort, inconsistent positioning and deformity progression using human subjects in a single or multiple measurement sessions. PMID:22846346
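The Bland and Altman method named in the abstract summarizes agreement between two measurement methods as a mean difference (bias) plus 95% limits of agreement. A minimal Python sketch follows, assuming the simple form with one paired reading per torso; the angles are hypothetical, not the study's data.

```python
import statistics


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical rib hump angles in degrees (illustrative values only).
iphone = [6.0, 9.0, 12.0, 15.0, 20.0]
scoliometer = [5.0, 9.0, 10.0, 15.0, 18.0]
bias, lower, upper = bland_altman_limits(iphone, scoliometer)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

A positive bias here would correspond to the first device reading higher on average, which is the direction the abstract reports for the iPhone.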
Sled, Elizabeth A.; Sheehy, Lisa M.; Felson, David T.; Costigan, Patrick A.; Lam, Miu; Cooke, T. Derek V.
2010-01-01
The objective of the study was to evaluate the reliability of frontal plane lower limb alignment measures using a landmark-based method by (1) comparing inter- and intra-reader reliability between measurements of alignment obtained manually with those using a computer program, and (2) determining inter- and intra-reader reliability of computer-assisted alignment measures from full-limb radiographs. An established method for measuring alignment was used, involving selection of 10 femoral and tibial bone landmarks. 1) To compare manual and computer methods, we used digital images and matching paper copies of five alignment patterns simulating healthy and malaligned limbs drawn using AutoCAD. Seven readers were trained in each system. Paper copies were measured manually and repeat measurements were performed daily for 3 days, followed by a similar routine with the digital images using the computer. 2) To examine the reliability of computer-assisted measures from full-limb radiographs, 100 images (200 limbs) were selected as a random sample from 1,500 full-limb digital radiographs which were part of the Multicenter Osteoarthritis (MOST) Study. Three trained readers used the software program to measure alignment twice from the batch of 100 images, with two or more weeks between batch handling. Manual and computer measures of alignment showed excellent agreement (intraclass correlations [ICCs] 0.977 – 0.999 for computer analysis; 0.820 – 0.995 for manual measures). The computer program applied to full-limb radiographs produced alignment measurements with high inter- and intra-reader reliability (ICCs 0.839 – 0.998). In conclusion, alignment measures using a bone landmark-based approach and a computer program were highly reliable between multiple readers. PMID:19882339
Establishing Inter- and Intrarater Reliability for High-Stakes Testing Using Simulation.
Kardong-Edgren, Suzan; Oermann, Marilyn H; Rizzolo, Mary Anne; Odom-Maryon, Tamara
This article reports one method to develop a standardized training method to establish the inter- and intrarater reliability of a group of raters for high-stakes testing. Simulation is used increasingly for high-stakes testing, but without research into the development of inter- and intrarater reliability for raters. Eleven raters were trained using a standardized methodology. Raters scored 28 student videos over a six-week period. Raters then rescored all videos over a two-day period to establish both intra- and interrater reliability. One rater demonstrated poor intrarater reliability; a second rater failed all students. Kappa statistics improved from the moderate to substantial agreement range with the exclusion of the two outlier raters' scores. There may be faculty who, for different reasons, should not be included in high-stakes testing evaluations. All faculty are content experts, but not all are expert evaluators.
Mian, Nicholas D.; Carter, Alice S.; Pine, Daniel S.; Wakschlag, Lauren S.; Briggs-Gowan, Margaret J.
2015-01-01
Background Identifying anxiety disorders in preschool-age children represents an important clinical challenge. Observation is essential to clinical assessment and can help differentiate normative variation from clinically significant anxiety. Yet, most anxiety assessment methods for young children rely on parent-reports. The goal of this article is to present and preliminarily test the reliability and validity of a novel observational paradigm for assessing a range of fearful and anxious behaviors in young children, the Anxiety Dimensional Observation Schedule (Anx-DOS). Methods A diverse sample of 403 children, aged 3 to 6 years, and their mothers was studied. Reliability and validity in relation to parent reports (Preschool Age Psychiatric Assessment) and known risk factors, including indicators of behavioral inhibition (latency to touch novel objects) and attention bias to threat (in the dot-probe task) were investigated. Results The Anx-DOS demonstrated good inter-rater reliability and internal consistency. Evidence for convergent validity was demonstrated relative to mother-reported separation anxiety, social anxiety, phobic avoidance, trauma symptoms, and past service use. Finally, fearfulness was associated with observed latency and attention bias toward threat. Conclusions Findings support the Anx-DOS as a method for capturing early manifestations of fearfulness and anxiety in young children. Multimethod assessments incorporating standardized methods for assessing discrete, observable manifestations of anxiety may be beneficial for early identification and clinical intervention efforts. PMID:25773515
Why Are Experts Correlated? Decomposing Correlations between Judges
ERIC Educational Resources Information Center
Broomell, Stephen B.; Budescu, David V.
2009-01-01
We derive an analytic model of the inter-judge correlation as a function of five underlying parameters. Inter-cue correlation and the number of cues capture our assumptions about the environment, while differentiations between cues, the weights attached to the cues, and (un)reliability describe assumptions about the judges. We study the relative…
Schulman, A; Simpkins, K C
1975-07-01
The initial aim was to program a computer with information on the frequency of radiological signs in benign and malignant gastric ulcers in order to obtain a percentage probability of benignancy or malignancy in succeeding ulcers in clinical practice. However, only four of the many signs described in gastric ulcer were confirmed to be of validity (i.e. reliable existence) by an inter-observer variation study using two observers and the films from 69 barium meal examinations. These were projection or non-projection of the in-profile ulcer, presence or absence of adjacent mucosal folds, good or poor definition of the in-face ulcer's edge, and extension of radiating folds to the in-face ulcer's edge. A few more remained unassessed due to insufficient numbers of relevant cases. It is concluded that: as defined in the literature the majority of radiological signs in this field are of uncertain existence; and the four that were found to be valid do not fully describe the important appearances that may be seen in benign and malignant ulcers and would be inadequate to differentiate them to a sufficiently high degree of probability.
Assessing Lower Limb Alignment: Comparison of Standard Knee Xray vs Long Leg View.
Zampogna, Biagio; Vasta, Sebastiano; Amendola, Annunziato; Uribe-Echevarria Marbach, Bastian; Gao, Yubo; Papalia, Rocco; Denaro, Vincenzo
2015-01-01
High tibial osteotomy (HTO) is a well-established and commonly utilized technique in medial knee osteoarthritis secondary to varus malalignment. Accurate measurement of the preoperative limb alignment, and the amount of correction required are essential when planning limb realignment surgery. The hip-knee-ankle angle (HKA) measured on a full length weightbearing (FLWB) X-ray in the standing position is considered the gold standard, since it allows for reliable and accurate measurement of the mechanical axis of the whole lower extremity. In general practice, alignment is often evaluated on standard anteroposterior weightbearing (APWB) X-rays, as the angle between the femur and tibial anatomic axis (TFa). It is, therefore, of value to establish if measuring the anatomical axis from limited APWB is an effective measure of knee alignment especially in patients undergoing osteotomy about the knee. Three independent observers measured preoperative and postoperative FTa with standard method (FTa1) and with circles method (FTa2) on APWB X-ray and the HKA on FLWB X-ray at three different time-points separated by a two-week period. Intra-observer and inter-observer reliabilities and the comparison and relationship between anatomical and mechanical alignment were calculated. Intra- and interclass coefficients for all the three methods indicated excellent reliability, having all the values above 0.80. Using a paired Student's t-test, the comparison of HKA versus TFa1 and TFa2 showed a statistically significant difference (p<.0001) both for the pre-operative and post-operative sets of values. The correlation between the HKA and FTa1 was found poor for the preoperative set (R=0.26) and fair for the postoperative one (R=0.53), while the new circles method showed a higher correlation in both the preoperative (R=0.71) and postoperative sets (R=0.79). Intra-observer reliability was high for HKA, FTa1 and FTa2 on APWB x-rays in the pre- and post-operative setting.
Inter-rater reliability was higher for HKA and TFa2 compared to FTa1. The femoro-tibial angle as measured on APWB with the traditional method (FTa1) has a weak correlation with the HKA, and based on these findings, should not be used in everyday practice. The FTa2 showed better correlation with the HKA, although not excellent. Level III, Retrospective study.
Assessing Lower Limb Alignment: Comparison of Standard Knee Xray vs Long Leg View
Zampogna, Biagio; Vasta, Sebastiano; Amendola, Annunziato; Uribe-Echevarria Marbach, Bastian; Gao, Yubo; Papalia, Rocco; Denaro, Vincenzo
2015-01-01
Background High tibial osteotomy (HTO) is a well-established and commonly utilized technique in medial knee osteoarthritis secondary to varus malalignment. Accurate measurement of the preoperative limb alignment, and the amount of correction required are essential when planning limb realignment surgery. The hip-knee-ankle angle (HKA) measured on a full length weightbearing (FLWB) X-ray in the standing position is considered the gold standard, since it allows for reliable and accurate measurement of the mechanical axis of the whole lower extremity. In general practice, alignment is often evaluated on standard anteroposterior weightbearing (APWB) X-rays, as the angle between the femur and tibial anatomic axis (TFa). It is, therefore, of value to establish if measuring the anatomical axis from limited APWB is an effective measure of knee alignment especially in patients undergoing osteotomy about the knee. Methods Three independent observers measured preoperative and postoperative FTa with standard method (FTa1) and with circles method (FTa2) on APWB X-ray and the HKA on FLWB X-ray at three different time-points separated by a two-week period. Intra-observer and inter-observer reliabilities and the comparison and relationship between anatomical and mechanical alignment were calculated. Results Intra- and interclass coefficients for all the three methods indicated excellent reliability, having all the values above 0.80. Using a paired Student's t-test, the comparison of HKA versus TFa1 and TFa2 showed a statistically significant difference (p<.0001) both for the pre-operative and post-operative sets of values. The correlation between the HKA and FTa1 was found poor for the preoperative set (R=0.26) and fair for the postoperative one (R=0.53), while the new circles method showed a higher correlation in both the preoperative (R=0.71) and postoperative sets (R=0.79).
Conclusions Intra-observer reliability was high for HKA, FTa1 and FTa2 on APWB x-rays in the pre- and post-operative setting. Inter-rater reliability was higher for HKA and TFa2 compared to FTa1. The femoro-tibial angle as measured on APWB with the traditional method (FTa1) has a weak correlation with the HKA, and based on these findings, should not be used in everyday practice. The FTa2 showed better correlation with the HKA, although not excellent. Level of Evidence Level III, Retrospective study. PMID:26361444
And the Winner Is … : Inter-Rater Reliability among Scholarship Assessors
ERIC Educational Resources Information Center
Johnston, Lucy; Schluter, Philip J.
2017-01-01
With increasing competition for postgraduate research scholarships, awarding processes demand attention and scrutiny. We examine inter-rater reliability for two prestigious New Zealand scholarships, the Shirtcliffe Fellowship and the Gordon Watson Scholarship. For each scholarship, five assessors (three academic; two non-academic) independently…
A Study of Reliability of Marking and Absolute Grading in Secondary Schools
ERIC Educational Resources Information Center
Abdul Gafoor, K.; Jisha, P.
2014-01-01
Using a non-experimental comparative group design in a sample consisting of 100 English teachers randomly selected from 30 secondary schools of a district of Kerala and assigning fifty teachers to groups for marking and grading, this study compares inter and intra-individual reliability in marking and absolute grading. Studying (1) the in marking…
Gupta, Tejpal; Nair, Vimoj; Epari, Sridhar; Pietsch, Torsten; Jalali, Rakesh
2012-01-01
There is significant inter-observer variation amongst the neuro-pathologists in the typing, subtyping, and grading of glial neoplasms for diagnosis. Centralized pathology review has been proposed to minimize this inter-observer variation and is now almost mandatory for accrual into multicentric trials. We sought to assess the concordance between neuro-pathologists on histopathological diagnosis of glioblastoma. Comparison of local, institutional, and central neuro-oncopathology reporting in a cohort of 34 patients with newly diagnosed supratentorial glioblastoma accrued consecutively at a tertiary-care institution on a prospective trial testing the addition of a new agent to standard chemo-radiation regimen. Concordance was sub-optimal between local histological diagnosis and central review, fair between local diagnosis and institutional review, and good between institutional and central review, with respect to histological typing/subtyping. Twelve (39%) of 31 patients with local histological diagnosis had identical tumor type, subtype and grade on central review. Overall agreement was modestly better (52%) between local diagnosis and institutional review. In contrast, 28 (83%) of 34 patients had completely concordant histopathologic diagnosis between institutional and central review. The inter-observer reliability test showed poor agreement between local and central review (kappa statistic=0.12, 95% confidence interval (CI): -0.03-0.32, P=0.043), but moderate agreement between institutional and central review (kappa statistic=0.51, 95%CI: 0.17-0.84, P=0.00003). Agreement between local diagnosis and institutional review was fair. There exists significant inter-observer variation regarding histopathological diagnosis of glioblastoma with significant implications for clinical research and practice. There is a need for more objective, quantitative, robust, and reproducible criteria for better subtyping for accurate diagnosis.
Virues-Ortega, Javier; Montaño-Fidalgo, Montserrat; Froján-Parga, María Xesús; Calero-Elvira, Ana
2011-12-01
This study analyzes the interobserver agreement and hypothesis-based known-group validity of the Therapist's Verbal Behavior Category System (SISC-INTER). The SISC-INTER is a behavioral observation protocol comprised of a set of verbal categories representing putative behavioral functions of the in-session verbal behavior of a therapist (e.g., discriminative, reinforcing, punishing, and motivational operations). The complete therapeutic process of a clinical case of an individual with marital problems was recorded (10 sessions, 8 hours), and data were arranged in a temporal sequence using 10-min periods. Hypotheses based on the expected performance of the putative behavioral functions portrayed by the SISC-INTER codes across prevalent clinical activities (i.e., assessing, explaining, Socratic method, providing clinical guidance) were tested using autoregressive integrated moving average (ARIMA) models. Known-group validity analyses provided support to all hypotheses. The SISC-INTER may be a useful tool to describe therapist-client interaction in operant terms. The utility of reliable and valid protocols for the descriptive analysis of clinical practice in terms of verbal behavior is discussed. Copyright © 2011. Published by Elsevier Ltd.
Wong, Camilla L.; Norris, Mireille; Sinha, Samir S.; Zorzitto, Maria L.; Madala, Sushma; Hamid, Jemila S.
2016-01-01
Background: The Team Standardized Assessment of a Clinical Encounter Report (StACER) was designed for use in Geriatric Medicine residency programs to evaluate Communicator and Collaborator competencies. Methods: The Team StACER was completed by two geriatricians and interdisciplinary team members based on observations during a geriatric medicine team meeting. Postgraduate trainees were recruited from July 2010–November 2013. Inter-rater reliability between two geriatricians and between all team members was determined. Internal consistency of items for the constructs Communicator and Collaborator competencies was calculated. Raters completed a survey previously administered to Canadian geriatricians to assess face validity. Trainees completed a survey to determine the usefulness of this instrument as a feedback tool. Results: Thirty postgraduate trainees participated. The prevalence-adjusted bias-adjusted kappa values for inter-rater reliability ranged from 0.87 to 1.00 for Communicator items and from 0.86 to 1.00 for Collaborator items. The Cronbach’s alpha coefficient for Communicator and Collaborator items was 0.997 (95% CI: 0.993–1.00) and 0.997 (95% CI: 0.997–1.00), respectively. The instrument lacked discriminatory power, as all trainees scored “meets requirements” in the overall assessment. Ninety-three per cent and 86% of trainees found feedback useful for developing Communicator and Collaborator competencies, respectively. Conclusions: The Team StACER has adequate inter-rater reliability and internal consistency. Poor discriminatory power and face validity challenge the merit of using this evaluation tool. Trainees felt the tool provided useful feedback on Collaborator and Communicator competencies. PMID:28050222
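The prevalence-adjusted bias-adjusted kappa (PABAK) mentioned above depends only on the observed proportion of agreement, which makes it less sensitive than Cohen's kappa to skewed ratings such as "all trainees meet requirements". A minimal sketch, assuming two raters and equally likely categories; the function name and data are hypothetical.

```python
def pabak(rater_a, rater_b, n_categories=2):
    """Prevalence-adjusted bias-adjusted kappa for two raters.

    Uses (k * Po - 1) / (k - 1), which reduces to 2 * Po - 1 for two categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    # Observed proportion of agreement, Po.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    return (n_categories * observed - 1) / (n_categories - 1)


# Hypothetical binary ratings (1 = meets requirements) from two raters.
geriatrician_1 = [1, 1, 1, 1, 0]
geriatrician_2 = [1, 1, 1, 1, 1]
value = pabak(geriatrician_1, geriatrician_2)
```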
A preliminary psychometric evaluation of Music in Dementia Assessment Scales (MiDAS).
McDermott, Orii; Orgeta, Vasiliki; Ridder, Hanne Mette; Orrell, Martin
2014-06-01
Music in Dementia Assessment Scales (MiDAS), an observational outcome measure for music therapy with people with moderate to severe dementia, was developed from qualitative data of focus groups and interviews. Expert and peer consultations were conducted at each stage of the scale development to maximize its content validity. This study aimed to evaluate the psychometric properties of MiDAS. Care home residents with dementia attended weekly group music therapy for up to ten sessions. Music therapists and care home staff were requested to complete weekly MiDAS ratings. The Quality of Life Scale (QoL-AD) was completed at three time-points. A total of 629 (staff = 306, therapist = 323) MiDAS forms were completed. The statistical analysis revealed that MiDAS has high therapist inter-rater reliability, low staff inter-rater reliability, adequate staff test-retest reliability, adequate concurrent validity, and good construct validity. High factor loadings between the five MiDAS Visual Analogue Scale (VAS) items, levels of Interest, Response, Initiation, Involvement, and Enjoyment, were found. This study indicates that MiDAS has good psychometric properties despite the small sample size. Future research with a larger sample size could provide a more in-depth psychometric evaluation, including further exploration of the underlying factors. MiDAS provides a measure of engagement with musical experience and offers insight into who is likely to benefit on other outcomes such as quality of life or reduction in psychiatric symptoms.
Lampropoulou, Sofia I; Billis, Evdokia; Gedikoglou, Ingrid A; Michailidou, Christina; Nowicky, Alexander V; Skrinou, Dimitra; Michailidi, Fotini; Chandrinou, Danae; Meligkoni, Margarita
2018-02-23
This study aimed to investigate the psychometric characteristics of reliability, validity and ability to detect change of a newly developed balance assessment tool, the Mini-BESTest, in Greek patients with stroke. A prospective, observational design study with test-retest measures was conducted. A convenience sample of 21 Greek patients with chronic stroke (14 male, 7 female; age of 63 ± 16 years) was recruited. Two independent examiners administered the scale, for the inter-rater reliability, twice within 10 days for the test-retest reliability. Bland Altman Analysis for repeated measures assessed the absolute reliability and the Standard Error of Measurement (SEM) and the Minimum Detectable Change at 95% confidence interval (MDC95) were established. The Greek Mini-BESTest (Mini-BESTest GR) was correlated with the Greek Berg Balance Scale (BBS GR) for assessing the concurrent validity and with the Timed Up and Go (TUG), the Functional Reach Test (FRT) and the Greek Falls Efficacy Scale-International (FES-I GR) for the convergent validity. The Mini-BESTest GR demonstrated excellent inter-rater reliability (ICC (95% CI) = 0.997 (0.995-0.999), SEM = 0.46) with the scores of two raters within the limits of agreement (mean difference = -0.143 ± 0.727, p > 0.05) and test-retest reliability (ICC (95% CI) = 0.966 (0.926-0.988), SEM = 1.53). Additionally, the Mini-BESTest GR yielded very strong to moderate correlations with BBS GR (r = 0.924, p < 0.001), TUG (r = -0.823, p < 0.001), FES-I GR (r = -0.734, p < 0.001) and FRT (r = 0.689, p < 0.001). MDC95 was 4.25 points. The exceptionally high reliability and the equally good validity of the Mini-BESTest GR strongly support its utility in Greek people with chronic stroke. Its ability to identify clinically meaningful changes and falls risk needs further investigation.
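The SEM and MDC95 reported above are linked by a standard relationship. The sketch below assumes the common formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; the abstract does not state which formulas the authors used, and the function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimum detectable change at the 95% confidence level for a test-retest design."""
    # sqrt(2) accounts for measurement error in both the first and the second test.
    return 1.96 * math.sqrt(2.0) * sem


# Test-retest SEM quoted in the abstract.
change_threshold = mdc95(1.53)
```

Under these assumed formulas, an SEM of 1.53 gives an MDC95 of roughly 4.24 points, which is close to the reported 4.25; the small gap would be consistent with rounding of the SEM.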
Lee, Eugene; Choi, Jung-Ah; Oh, Joo Han; Ahn, Soyeon; Hong, Sung Hwan; Chai, Jee Won; Kang, Heung Sik
2013-09-01
To retrospectively evaluate fatty degeneration (FD) of rotator cuff muscles on CTA using Goutallier's grading system and quantitative measurements with comparison between pre- and postoperative states. IRB approval was obtained for this study. Two radiologists independently reviewed pre- and postoperative CTAs of 43 patients (24 males and 19 females, mean age, 58.1 years) with 46 shoulders confirmed as full-thickness tears with random distribution. FD of supraspinatus, infraspinatus/teres minor, and subscapularis was assessed using Goutallier's system and by quantitative measurements of Hounsfield units (HUs) on sagittal images. Changes in FD grades and HUs were compared between pre- and postoperative CTAs and analyzed with respect to preoperative tear size and postoperative cuff integrity. The correlations between qualitative grades and quantitative measurements and their inter-observer reliabilities were also assessed. There was statistically significant correlation between FD grades and HU measurements of all muscles on pre- and postoperative CTA (p < 0.05). Inter-observer reliability of fatty degeneration grades was excellent to substantial on both pre- and postoperative CTA in supraspinatus (0.8685 and 0.8535) and subscapularis muscles (0.7777 and 0.7972), but fair in infraspinatus/teres minor muscles (0.5791 and 0.5740); however, quantitative Hounsfield unit measurements showed excellent reliability for all muscles (ICC: 0.7950 and 0.9346 for SST, 0.7922 and 0.8492 for SSC, and 0.9254 and 0.9052 for IST/TM). No muscle showed improvement of fatty degeneration after surgical repair on qualitative and quantitative assessments; there was no difference in changes of fatty degeneration after surgical repair according to preoperative tear size and postoperative cuff integrity (p > 0.05). The average dose-length product (DLP) was 365.2 mGy · cm (range, 323.8-417.2 mGy · cm) and estimated average effective dose was 5.1 mSv.
Goutallier grades correlated well with HUs of rotator cuff muscles. Reliability was excellent for both systems, except for FD grade of IST/TM muscles, which may be more reliably assessed using quantitative measurements.
Karampatos, Sarah; Papaioannou, Alexandra; Beattie, Karen A; Maly, Monica R; Chan, Adrian; Adachi, Jonathan D; Pritchard, Janet M
2016-04-01
To determine the reliability of a magnetic resonance (MR) image segmentation protocol for quantifying intramuscular adipose tissue (IntraMAT), subcutaneous adipose tissue, total muscle and intermuscular adipose tissue (InterMAT) of the lower leg. Ten axial lower leg MRI slices were obtained from 21 postmenopausal women using a 1 Tesla peripheral MRI system. Images were analyzed using sliceOmatic™ software. The average cross-sectional areas of the tissues were computed for the ten slices. Intra-rater and inter-rater reliability were determined and expressed as the standard error of measurement (SEM) (absolute reliability) and intraclass correlation coefficient (ICC) (relative reliability). The intra-rater and inter-rater ICCs for IntraMAT were 0.991 (95% confidence interval [CI] 0.978-0.996, p < 0.05) and 0.983 (95% CI 0.958-0.993, p < 0.05), respectively. For the other soft tissue compartments, the ICCs were all >0.90 (p < 0.05). The absolute intra-rater and inter-rater reliability (expressed as SEM) for segmenting IntraMAT were 22.19 mm² (95% CI 16.97-32.04) and 78.89 mm² (95% CI 60.36-113.92), respectively. This is a reliable segmentation protocol for quantifying IntraMAT and other soft-tissue compartments of the lower leg. A standard operating procedure manual is provided to assist users, and SEM values can be used to estimate sample size and determine confidence in repeated measurements in future research.
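Several ICC variants exist, and the abstract above does not say which one was used. As an illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), from a subjects-by-raters table; the function name and the example areas are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per rater."""
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical cross-sectional areas (mm^2) segmented by two raters.
areas = [[410.0, 415.0], [520.0, 512.0], [630.0, 641.0], [305.0, 300.0]]
reliability = icc_2_1(areas)
```

The absolute-agreement form penalizes a systematic offset between raters; a consistency form would not, so the choice matters when one rater tends to segment larger areas than the other.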
Stokes, Verity; Gunn, Sarah; Schouwenaars, Katie; Badwan, Derar
2018-09-01
The Sensory Tool to Assess Responsiveness (STAR) is an interdisciplinary neurobehavioural diagnostic tool for individuals with prolonged disorders of consciousness. It utilises current diagnostic criteria and is intended to improve upon the high misdiagnosis rate in this population. This study assesses the inter-rater reliability of the STAR and its diagnostic validity in comparison with the Coma Recovery Scale-Revised (CRS-R) and the Wessex Head Injury Matrix (WHIM). Participants were patients with severe acquired brain injury resulting in a disorder of consciousness, who were admitted to the Royal Leamington Spa Rehabilitation Hospital between 1999 and 2009. Patients underwent sensory stimulation sessions during their period of admission, which were recorded on video. Using this footage, patients were re-assessed for this study using the STAR, WHIM and CRS-R criteria. The STAR demonstrated "moderate" inter-rater reliability, "substantial" diagnostic agreement with the CRS-R, and "moderate" agreement with the WHIM. There were no significant differences between diagnoses assigned by the different assessments. The STAR demonstrated a good degree of inter-rater reliability in identification of diagnoses for patients with disorders of consciousness. The diagnostic outcomes of the STAR agreed at a good level with the CRS-R, moderately with the WHIM, and did not significantly differ from either. This demonstrates the reliability and validity of the STAR, showing its appropriateness for clinical use. Future longitudinal studies and research into the STAR's applicability in long-stay rehabilitation are indicated.
Validity and reliability of the robotic objective structured assessment of technical skills
Siddiqui, Nazema Y.; Galloway, Michael L.; Geller, Elizabeth J.; Green, Isabel C.; Hur, Hye-Chun; Langston, Kyle; Pitter, Michael C.; Tarr, Megan E.; Martino, Martin A.
2015-01-01
Objective: Objective structured assessments of technical skills (OSATS) have been developed to measure the skill of surgical trainees. Our aim was to develop an OSATS specifically for trainees learning robotic surgery. Study Design: This is a multi-institutional study in eight academic training programs. We created an assessment form to evaluate robotic surgical skill through five inanimate exercises. Obstetrics/gynecology, general surgery, and urology residents, fellows, and faculty completed five robotic exercises on a standard training model. Study sessions were recorded and randomly assigned to three blinded judges who scored performance using the assessment form. Construct validity was evaluated by comparing scores between participants with different levels of surgical experience; inter- and intra-rater reliability were also assessed. Results: We evaluated 83 residents, 9 fellows, and 13 faculty, totaling 105 participants; 88 (84%) were from obstetrics/gynecology. Our assessment form demonstrated construct validity, with faculty and fellows performing significantly better than residents (mean scores: 89 ± 8 faculty; 74 ± 17 fellows; 59 ± 22 residents, p<0.01). In addition, participants with more robotic console experience scored significantly higher than those with fewer prior console surgeries (p<0.01). R-OSATS demonstrated good inter-rater reliability across all five drills (mean Cronbach's α: 0.79 ± 0.02). Intra-rater reliability was also high (mean Spearman's correlation: 0.91 ± 0.11). Conclusions: We developed an assessment form for robotic surgical skill that demonstrates construct validity, inter- and intra-rater reliability. When paired with standardized robotic skill drills this form may be useful to distinguish between levels of trainee performance. PMID:24807319
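Cronbach's α is usually an internal-consistency measure, but it can be applied to inter-rater data by treating each judge as an "item", which appears to be how it is used above. A minimal sketch under that assumption; the function name and scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per participant and one column per judge."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two participants and two judges")
    k = len(scores[0])
    # Sample variance of each judge's scores across participants.
    judge_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the per-participant totals.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(judge_vars) / total_var)


# Hypothetical drill scores: four participants rated by two judges.
drill_scores = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(drill_scores)
```

Note that α reflects how consistently judges rank participants, not whether they give the same absolute scores, so a judge who is uniformly stricter would not lower it.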
Reliability and agreement on embryo assessment: 5 years of an external quality control programme.
Martínez-Granados, Luis; Serrano, María; González-Utor, Antonio; Ortiz, Nereyda; Badajoz, Vicente; López-Regalado, María Luisa; Boada, Montserrat; Castilla, Jose A
2018-03-01
An external quality-control programme for morphology-based embryo quality assessment, incorporating a standardized embryo grading scheme, was evaluated over a period of 5 years to determine levels of inter-observer reliability and agreement between practising clinical embryologists at IVF centres and the opinions of a panel of experts. Following Guidelines for Reporting Reliability and Agreement Studies, the Gwet index and proportion of positive (Ppos) and negative agreement were calculated. For embryo morphology assessment, a substantial degree of reliability was measured between the centres and the panel of experts (Gwet index: 0.76; 95% CI 0.70 to 0.84). The agreement was higher for good- versus poor-quality embryos. When multinucleation or vacuoles were observed, low levels of reliability were obtained (Ppos: 0.56 and 0.43, respectively). In blastocysts, the characteristic that presented the largest discrepancy was that related to the inner cell mass. In decisions about the final disposition of the embryo, reliability between centre and the panel of experts was moderate (Gwet index: 0.51; 95% CI 0.41 to 0.60). In conclusion, the ability of clinical embryologists to evaluate the presence of multinucleation and vacuoles in the early cleavage embryo, and to determine the category of the inner cell mass in blastocysts, needs to be improved. Copyright © 2017 Reproductive Healthcare Ltd. All rights reserved.
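For a binary judgement such as "multinucleation present or absent", the indices named above can be computed from a 2×2 table. The sketch below assumes two raters (for example, one centre versus the expert panel) and uses the standard two-rater formulas for positive agreement, negative agreement, and Gwet's AC1; the function name, argument names, and counts are hypothetical.

```python
def binary_agreement(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Return (Ppos, Pneg, Gwet AC1) for two raters making a yes/no call."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    if n == 0:
        raise ValueError("the table is empty")
    disagreements = a_yes_b_no + a_no_b_yes
    # Proportions of specific agreement (undefined if that category is never used).
    p_pos = 2 * both_yes / (2 * both_yes + disagreements)
    p_neg = 2 * both_no / (2 * both_no + disagreements)
    # Gwet's AC1: chance agreement is based on the mean prevalence of "yes".
    observed = (both_yes + both_no) / n
    prevalence = ((both_yes + a_yes_b_no) / n + (both_yes + a_no_b_yes) / n) / 2
    chance = 2 * prevalence * (1 - prevalence)
    ac1 = (observed - chance) / (1 - chance)
    return p_pos, p_neg, ac1


# Hypothetical counts for 20 embryos assessed by a centre and the panel.
p_pos, p_neg, ac1 = binary_agreement(10, 2, 3, 5)
```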
Determining inter-system bias of GNSS signals with narrowly spaced frequencies for GNSS positioning
NASA Astrophysics Data System (ADS)
Tian, Yumiao; Liu, Zhizhao; Ge, Maorong; Neitzel, Frank
2017-12-01
Relative positioning using multi-GNSS (global navigation satellite systems) can improve accuracy, reliability, and availability compared to the use of a single constellation system. Intra-system double-difference (DD) ambiguities (ISDDAs) refer to the DD ambiguities between satellites of a single constellation system and can be fixed to an integer to derive the precise fixed solution. Inter-system ambiguities, which denote the DD ambiguities between different constellation systems, can also be fixed to integers on overlapping frequencies, once the inter-system bias (ISB) is removed. Compared with fixing ISDDAs, fixing both integer intra- and inter-system DD ambiguities (IIDDAs) means an increase of positioning precision through an integration of multiple GNSS constellations. Previously, researchers have studied IIDDA fixing with systems of the same frequencies, but not with systems of different frequencies. Integer IIDDAs can be determined from single-difference (SD) ambiguities, even if the frequencies of multi-GNSS signals used in the positioning are different. In this study, we investigated IIDDA fixing for multi-GNSS signals of narrowly spaced frequencies. First, the inter-system DD models of multi-GNSS signals of different frequencies are introduced, and the strategy for compensating for ISB is presented. The ISB is decomposed into three parts: 1) a float approximate ISB number that can be considered equal to the ISB of code pseudorange observations and thus can be estimated through single point positioning (SPP); 2) a number that is a multiple of the GNSS signal wavelength; and 3) a fractional ISB part, with a magnitude smaller than a single wavelength. Then, the relationship between intra- and inter-system DD ambiguity RATIO values and ISB was investigated by integrating GPS L1 and GLONASS L1 signals. 
In our numerical analyses with short baselines, the ISB parameter and IIDDA were successfully fixed, even if the number of observed satellites in each system was small.
Training less-experienced faculty improves reliability of skills assessment in cardiac surgery.
Lou, Xiaoying; Lee, Richard; Feins, Richard H; Enter, Daniel; Hicks, George L; Verrier, Edward D; Fann, James I
2014-12-01
Previous work has demonstrated high inter-rater reliability in the objective assessment of simulated anastomoses among experienced educators. We evaluated the inter-rater reliability of less-experienced educators and the impact of focused training with a video-embedded coronary anastomosis assessment tool. Nine less-experienced cardiothoracic surgery faculty members from different institutions evaluated 2 videos of simulated coronary anastomoses (1 by a medical student and 1 by a resident) at the Thoracic Surgery Directors Association Boot Camp. They then underwent a 30-minute training session using an assessment tool with embedded videos to anchor rating scores for 10 components of coronary artery anastomosis. Afterward, they evaluated 2 videos of a different student and resident performing the task. Components were scored on a 1 to 5 Likert scale, yielding an average composite score. Inter-rater reliabilities of component and composite scores were assessed using intraclass correlation coefficients (ICCs) and overall pass/fail ratings with kappa. All components of the assessment tool exhibited improvement in reliability, with 4 (bite, needle holder use, needle angles, and hand mechanics) improving the most from poor (ICC range, 0.09-0.48) to strong (ICC range, 0.80-0.90) agreement. After training, inter-rater reliabilities for composite scores improved from moderate (ICC, 0.76) to strong (ICC, 0.90) agreement, and for overall pass/fail ratings, from poor (kappa = 0.20) to moderate (kappa = 0.78) agreement. Focused, video-based anchor training facilitates greater inter-rater reliability in the objective assessment of simulated coronary anastomoses. Among raters with less teaching experience, such training may be needed before objective evaluation of technical skills. Published by Elsevier Inc.
ERIC Educational Resources Information Center
Matson, Johnny L.; Horovitz, Max; Mahan, Sara; Fodstad, Jill
2013-01-01
The purpose of this paper was to update the psychometrics of the "Matson Evaluation of Social Skills for Youngsters" ("MESSY") with children with Autism Spectrum Disorders (ASD), specifically with respect to internal consistency, split-half reliability, and inter-rater reliability. In Study 1, 114 children with ASD (Autistic Disorder, Asperger's…
Inter-rater reliability of the Sødring Motor Evaluation of Stroke patients (SMES).
Halsaa, K E; Sødring, K M; Bjelland, E; Finsrud, K; Bautz-Holter, E
1999-12-01
The Sødring Motor Evaluation of Stroke patients is an instrument for physiotherapists to evaluate motor function and activities in stroke patients. The rating reflects quality as well as quantity of the patient's unassisted performance within three domains: leg, arm and gross function. The inter-rater reliability of the method was studied in a sample of 30 patients admitted to a stroke rehabilitation unit. Three therapists were involved in the study; two therapists assessed the same patient on two consecutive days in a balanced design. Cohen's weighted kappa and McNemar's test of symmetry were used as measures of item reliability, and the intraclass correlation coefficient was used to express the reliability of the sumscores. For 24 out of 32 items the weighted kappa statistic was excellent (0.75-0.98), while 7 items had a kappa statistic within the range 0.53-0.74 (fair to good). The reliability of one item was poor (0.13). The intraclass correlation coefficient for the three sumscores was 0.97, 0.91 and 0.97. We conclude that the Sødring Motor Evaluation of Stroke patients is a reliable measure of motor function in stroke patients undergoing rehabilitation.
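Weighted kappa gives partial credit when two therapists choose neighbouring points on an ordinal scale. The abstract above does not state the weighting scheme, so the sketch below assumes linear weights and scores coded 0 to n_levels − 1; the function name and data are hypothetical.

```python
def linear_weighted_kappa(rater_a, rater_b, n_levels):
    """Cohen's weighted kappa with linear disagreement weights |i - j|."""
    rater_a, rater_b = list(rater_a), list(rater_b)
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of patients")
    if any(not 0 <= s < n_levels for s in rater_a + rater_b):
        raise ValueError("scores must be integers from 0 to n_levels - 1")
    n = len(rater_a)
    # Marginal proportion of each score level for each rater.
    prop_a = [rater_a.count(i) / n for i in range(n_levels)]
    prop_b = [rater_b.count(i) / n for i in range(n_levels)]
    # Mean observed distance between the paired scores.
    observed = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / n
    # Mean distance expected if the raters scored independently.
    expected = sum(
        prop_a[i] * prop_b[j] * abs(i - j)
        for i in range(n_levels)
        for j in range(n_levels)
    )
    if expected == 0:
        return float("nan")  # undefined when both raters always use one identical level
    return 1 - observed / expected


# Hypothetical item scores on a three-level scale for four patients.
therapist_1 = [0, 1, 2, 2]
therapist_2 = [0, 1, 2, 1]
kappa_w = linear_weighted_kappa(therapist_1, therapist_2, n_levels=3)
```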
ERIC Educational Resources Information Center
Soslau, Elizabeth; Lewis, Kandia
2014-01-01
For accreditation and programmatic decision making, education school administrators use inter-rater reliability analyses to judge credibility of student-teacher assessments. Although weak levels of agreement between university-appointed supervisors and cooperating teachers are usually interpreted to indicate that the process is not being…
Tabuse, Hideaki; Kalali, Amir; Azuma, Hideki; Ozaki, Norio; Iwata, Nakao; Naitoh, Hiroshi; Higuchi, Teruhiko; Kanba, Shigenobu; Shioe, Kunihiko; Akechi, Tatsuo; Furukawa, Toshi A
2007-09-30
The Hamilton Rating Scale for Depression (HAMD) is the de facto international gold standard for the assessment of depression. There are some criticisms, however, especially with regard to its inter-rater reliability, due to the lack of standardized questions or explicit scoring procedures. The GRID-HAMD was developed to provide standardized explicit scoring conventions and a structured interview guide for administration and scoring of the HAMD. We developed the Japanese version of the GRID-HAMD and examined its inter-rater reliability among experienced and inexperienced clinicians (n=70), how rater characteristics may affect it, and how training can improve it in the course of a model training program using videotaped interviews. The results showed that the inter-rater reliability of the GRID-HAMD total score was excellent to almost perfect and those of most individual items were also satisfactory to excellent, both with experienced and inexperienced raters, and both before and after the training. With its standardized definitions, questions and detailed scoring conventions, the GRID-HAMD appears to be the best achievable set of interview guides for the HAMD and can provide a solid tool for highly reliable assessment of depression severity.
Stroke and aphasia quality-of-life scale-39: Reliability and validity of the Turkish version.
Noyan-Erbaş, Ayşin; Toğram, Bülent
2016-10-01
The aim of this study was to adapt the stroke and aphasia quality-of-life scale-39 (SAQoL-39) to the Turkish language and carry out a reliability and validity study of the instrument in a group of patients with aphasia. The study was descriptive and comprised three phases: adaptation of the SAQoL-39 to the Turkish language, administration of the scale to 30 aphasia patients, and reliability and validity studies of the scale. Internal consistency was assessed with Cronbach's alpha and test-retest reliability was explored (n = 14). The adaptation process was completed based on inter-rater agreement on the translated items and within the scope of final editing by the authors of the study. The SAQoL-39 in Turkish exhibited high test-retest reliability (ICC = 0.97) as well as acceptability with minimal missing data (0-1.4). This instrument exhibited high internal consistency (Cronbach's α = 0.70-0.97), domain-total correlations (r = 0.76-0.85) and inter-domain correlations (r = 0.40-0.68). The analysis shows that the Turkish version of SAQoL-39 is a scale that is highly acceptable, valid and reliable and can be easily used in evaluating the quality-of-life of Turkish people with aphasia.
Semiautomatic estimation of breast density with DM-Scan software.
Martínez Gómez, I; Casals El Busto, M; Antón Guirao, J; Ruiz Perales, F; Llobet Azpitarte, R
2014-01-01
To evaluate the reproducibility of the calculation of breast density with DM-Scan software, which is based on the semiautomatic segmentation of fibroglandular tissue, and to compare it with the reproducibility of estimation by visual inspection. The study included 655 direct digital mammograms acquired using craniocaudal projections. Three experienced radiologists analyzed the density of the mammograms using DM-Scan, and the inter- and intra-observer agreement between pairs of radiologists for the Boyd and BI-RADS® scales were calculated using the intraclass correlation coefficient. The Kappa index was used to compare the inter- and intra-observer agreements with those obtained previously for visual inspection in the same set of images. For visual inspection, the mean interobserver agreement was 0.876 (95% CI: 0.873-0.879) on the Boyd scale and 0.823 (95% CI: 0.818-0.829) on the BI-RADS® scale. The mean intraobserver agreement was 0.813 (95% CI: 0.796-0.829) on the Boyd scale and 0.770 (95% CI: 0.742-0.797) on the BI-RADS® scale. For DM-Scan, the mean inter- and intra-observer agreement was 0.92, considerably higher than the agreement for visual inspection. The semiautomatic calculation of breast density using DM-Scan software is more reliable and reproducible than visual estimation and reduces the subjectivity and variability in determining breast density. Copyright © 2012 SERAM. Published by Elsevier Espana. All rights reserved.
Ruehland, Warren R.; O'Donoghue, Fergal J.; Pierce, Robert J.; Thornton, Andrew T.; Singh, Parmjit; Copland, Janet M.; Stevens, Bronwyn; Rochford, Peter D.
2011-01-01
Study Objective: To examine the impact of using American Academy of Sleep Medicine (AASM) recommended EEG derivations (F4/M1, C4/M1, O2/M1) vs. a single derivation (C4/M1) in polysomnography (PSG) on the measurement of sleep and cortical arousals, including inter- and intra-observer variability. Design: Prospective, non-blinded, randomized comparison. Setting: Three Australian tertiary-care hospital clinical sleep laboratories. Patients or Participants: 30 PSGs from consecutive patients investigated for obstructive sleep apnea (OSA) during December 2007 and January 2008. Interventions: N/A Measurements and Results: To examine the impact of EEG derivations on PSG summary statistics, 3 scorers from different Australian clinical sleep laboratories each scored separate sets of 10 PSGs twice, once using 3 EEG derivations and once using 1 EEG derivation. To examine the impact on inter- and intra-scorer reliability, all 3 scorers scored a subset of 10 PSGs 4 times, twice using each method. All PSGs were de-identified and scored in random order according to the 2007 AASM Manual for the Scoring of Sleep and Associated Events. Using 3 referential EEG derivations during PSG, as recommended in the AASM manual, instead of a single central EEG derivation, as originally suggested by Rechtschaffen and Kales (1968), resulted in a mean ± SE decrease in N1 sleep of 9.6 ± 3.9 min (P = 0.018) and an increase in N3 sleep of 10.6 ± 2.8 min (P = 0.001). No significant differences were observed for any other sleep or arousal scoring summary statistics; nor were any differences observed in inter-scorer or intra-scorer reliability for scoring sleep or cortical arousals. 
Conclusion: This study provides information for those changing practice to comply with the 2007 AASM recommendations for EEG placement in PSG, for those using portable devices that are unable to comply with the recommendations due to limited channel options, and for the development of future standards for PSG scoring and recording. As the use of multiple EEG derivations only led to small changes in the distribution of derived sleep stages and no significant differences in scoring reliability, this study calls into question the need to use multiple EEG derivations in clinical PSG as suggested in the AASM manual. Citation: Ruehland WR; O'Donoghue FJ; Pierce RJ; Thornton AT; Singh P; Copland JM; Stevens B; Rochford PD. The 2007 AASM recommendations for EEG electrode placement in polysomnography: impact on sleep and cortical arousal scoring. SLEEP 2011;34(1):73-81. PMID:21203376
Pijl, Mirjam Kj; Rommelse, Nanda Nj; Hendriks, Monica; De Korte, Manon Wp; Buitelaar, Jan K; Oosterling, Iris J
2018-02-01
The field of early autism research is in dire need of outcome measures that adequately reflect subtle changes in core autistic behaviors. This article compares the ability of a newly developed measure, the Brief Observation of Social Communication Change (BOSCC), and the Autism Diagnostic Observation Schedule (ADOS) to detect changes in core symptoms of autism in 44 toddlers. The results provide encouraging evidence for the Brief Observation of Social Communication Change as a candidate outcome measure, as reflected in sufficient inter- and intra-rater reliability, independence from other child characteristics, and sensitivity to capture change. Although the Brief Observation of Social Communication Change did not clearly outperform the Autism Diagnostic Observation Schedule on any of these quality criteria, the instrument may be better able to capture subtle, individual changes in core autistic symptoms. The promising findings warrant further study of this new instrument.
Falloon, I R H; Mizuno, M; Murakami, M; Roncone, R; Unoka, Z; Harangozo, J; Pullman, J; Gedye, R; Held, T; Hager, B; Erickson, D; Burnett, K
2005-01-01
To develop a reliable standardized assessment of psychiatric symptoms for use in clinical practice. A 50-item interview, the Current Psychiatric State 50 (CPS-50), was used to assess 237 patients with a range of psychiatric diagnoses. Ratings were made by interviewers after a 2-day training. Comparisons of inter-rater reliability on each item and on eight clinical subscales were made across four international centres and between psychiatrists and non-psychiatrists. A principal components analysis was used to validate these clinical scales. Acceptable inter-rater reliability (intra-class coefficient > 0.80) was found for 46 of the 50 items, and for all eight subscales. There was no difference between centres or between psychiatrists and non-psychiatrists. The principal components analysis factors were similar to the clinical scales. The CPS-50 is a reliable standardized assessment of current mental status that can be used in clinical practice by all mental health professionals after brief training. Blackwell Munksgaard 2004
Fokkens, Andrea S; Groothoff, Johan W; van der Klink, Jac J L; Popping, Roel; Stewart, Roy E; van de Ven, Lex; Brouwer, Sandra; Tuinstra, Jolanda
2015-09-01
An assessment tool was developed to assess disability in veterans who suffer from post-traumatic stress disorder (PTSD) due to a military mission. The objective of this study was to determine the reliability, intra-rater and inter-rater variation of the Mental Disability Military (MDM) assessment tool. Twenty-four assessment interviews of veterans with an insurance physician were videotaped. Each videotaped interview was assessed by a group of five independent raters on limitations of the veterans using the MDM assessment tool. After 2 months the raters repeated this procedure. Next the intra-rater and inter-rater variation was assessed with an adjusted version of AG09 computing weighted percentage agreement. The results of this study showed that both the intra-rater variation and inter-rater variation on the ten subcategories of the MDM assessment tool were small, with an agreement of 84-100% within raters and 93-100% between raters. The MDM assessment tool proves to be a reliable instrument to measure PTSD limitations in functioning in Dutch military veterans who apply for disability compensation. Further research is needed to assess the validity of this instrument.
Evaluation of airway protection: Quantitative timing measures versus penetration/aspiration score.
Kendall, Katherine A
2017-10-01
Quantitative measures of swallowing function may improve the reliability and accuracy of modified barium swallow (MBS) study interpretation. Quantitative study analysis has not been widely instituted, however, secondary to concerns about the time required to make measures and a lack of research demonstrating impact on MBS interpretation. This study compares the accuracy of the penetration/aspiration (PEN/ASP) scale (an observational visual-perceptual assessment tool) to quantitative measures of airway closure timing relative to the arrival of the bolus at the upper esophageal sphincter in identifying a failure of airway protection during deglutition. Retrospective review of clinical swallowing data from a university-based outpatient clinic. Swallowing data from 426 patients were reviewed. Patients with normal PEN/ASP scores were identified, and the results of quantitative airway closure timing measures for three liquid bolus sizes were evaluated. The incidence of significant airway closure delay with and without a normal PEN/ASP score was determined. Inter-rater reliability for the quantitative measures was calculated. In patients with a normal PEN/ASP score, 33% demonstrated a delay in airway closure on at least one swallow during the MBS study. There was no correlation between PEN/ASP score and airway closure delay. Inter-rater reliability for the quantitative measure of airway closure timing was nearly perfect (intraclass correlation coefficient = 0.973). The use of quantitative measures of swallowing function, in conjunction with traditional visual perceptual methods of MBS study interpretation, improves the identification of airway closure delay, and hence, potential aspiration risk, even when no penetration or aspiration is apparent on the MBS study. Level of evidence: 4. Laryngoscope, 127:2314-2318, 2017. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
Applicability of contrast-enhanced ultrasound in the diagnosis of plantar fasciitis.
Broholm, R; Pingel, J; Simonsen, L; Bülow, J; Johannsen, F
2017-12-01
Contrast-enhanced ultrasound (CEUS) is used to visualize the microvascularization in various tissues. The purpose of this study was to investigate whether CEUS could be used to visualize the microvascular volume (MV) in the plantar fascia, and to compare the method to clinical symptoms and B-mode ultrasound (US) in patients with plantar fasciitis (PF). Twenty patients with unilateral PF were included and were divided by US in insertional thickening (10), midsubstance thickening (5), and no US changes (5). The MV was measured simultaneously in both heels. Four areas in the plantar fascia and plantar fat pad were measured independently by two observers. Inter- and intra-observer correlation analyses were performed. The asymptomatic heels showed a constantly low MV, and for the whole group of patients, a significantly higher MV was found in the symptomatic plantar fascia and plantar fat pad. Inter-observer correlation as well as intra-observer agreement was excellent. The MV in the plantar fascia and plantar fat pad can be measured reliably using CEUS, suggesting that it is a reproducible method to examine patients with plantar fasciitis. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Chanani, Ankit; Adhikari, Haridas Das
2017-01-01
Background: Differential diagnosis of periapical cysts and granulomas is required as their treatment modalities are different. Aim: The aim of this study was to evaluate the efficacy of cone beam computed tomography (CBCT) in the differential diagnosis of periapical cysts from granulomas. Settings and Design: A single-centered observational study was carried out in the Department of Conservative Dentistry and Endodontics, Dr. R. Ahmed Dental College and Hospital, using CBCT and dental operating microscope. Methods: Forty-five lesions were analyzed using CBCT scans. One evaluator analyzed each CBCT scan for the presence of the following six characteristic radiological features: cyst-like location, shape, periphery, internal structure, effect on the surrounding structures, and cortical plate perforation. Another independent evaluator analyzed the CBCT scans. This process was repeated after 6 months, and inter- and intrarater reliability of CBCT diagnoses was evaluated. Periapical surgeries were performed and tissue samples were obtained for histopathological analysis. To evaluate the efficacy, CBCT diagnoses were compared with histopathological diagnoses, and six receiver operating characteristic (ROC) curve analyses were conducted. Statistical Analysis Used: ROC curve, Cronbach's alpha (α) test, and Cohen Kappa (κ) test were used for statistical analysis. Results: Both inter- and intrarater reliability were excellent (α = 0.94, κ = 0.75 and 0.77, respectively). ROC curve with regard to ≥4 positive findings revealed the highest area under curve (0.66). Conclusion: CBCT is moderately accurate in the differential diagnosis of periapical cysts and granulomas. PMID:29386780
Palmer, Janice L; Coats, Mary A; Roe, Catherine M; Hanko, Shelly M; Xiong, Chengjie; Morris, John C
2010-06-01
This paper is a report of a study to establish the inter-rater reliability of advanced practice nurse and neurologist neurological assessments which included ratings with the Unified Parkinson's Disease Rating Scale-Motor Exam. Around the world, advanced practice nurses are performing tasks once completed only by physicians. To promote consumer and provider confidence, it is important to establish that nurse and physician ratings using assessment tools are similar. In addition in research settings, when different raters are used, establishment of inter-rater reliability for study assessments is needed. Advanced practice nurses and neurologists independently recorded findings on neurological examinations of 46 participants in a study conducted between August 2007 and January 2008. An intraclass correlation coefficient was calculated to estimate overall agreement between the nurse and neurologist ratings. Agreement for individual items measured on a dichotomous scale was assessed by calculating Cohen's kappa. There was substantial agreement between advanced practice nurses and neurologists on the mean Unified Parkinson's Disease Rating Scale-Motor Exam ratings (intraclass correlation coefficient = 0.65) and the U.S. National Alzheimer's Coordinating Center Uniform Data Set neurological examination ratings of unremarkable findings (kappa = 0.74) and of gait disorder (kappa = 0.73). Moderate agreement (kappa = 0.53) was reached for the rating of whether all Unified Parkinson's Disease Rating Scale-Motor Exam items were normal. These findings are consistent with studies of the inter-rater agreement of the Unified Parkinson's Disease Rating Scale-Motor Exam and support the conduct of neurological assessments by advanced practice nurses.
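As a rough illustration of the kappa statistic reported above for dichotomous items, the following minimal sketch computes unweighted Cohen's kappa for two raters. The function name `cohens_kappa` and the example ratings are hypothetical and are not taken from the study; the sketch assumes each rater assigns exactly one label per participant.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, derived from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings (1 = abnormal, 0 = normal); NOT data from the study.
nurse = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
neurologist = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
kappa = cohens_kappa(nurse, neurologist)
print(f"kappa = {kappa:.2f}")  # observed 0.80, chance 0.52 -> kappa = 0.58
```

With these made-up ratings the raters agree on 8 of 10 participants, yet kappa is only about 0.58 because roughly half of that agreement would be expected by chance alone.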
Safety, reliability, and validity of a physiologic definition of bronchopulmonary dysplasia.
Walsh, Michele C; Wilson-Costello, Deanna; Zadell, Arlene; Newman, Nancy; Fanaroff, Avroy
2003-09-01
Bronchopulmonary dysplasia (BPD) is the focus of many intervention trials, yet the outcome measure when based solely on oxygen administration may be confounded by differing criteria for oxygen administration between physicians. Thus, we wished to define BPD by a standardized oxygen saturation monitoring at 36 weeks corrected age, and compare this physiologic definition with the standard clinical definition of BPD based solely on oxygen administration. A total of 199 consecutive very low birthweight infants (VLBW, 501 to 1500 g birthweight) were assessed prospectively at 36±1 weeks corrected age. Neonates on positive pressure support or receiving >30% supplemental oxygen were assigned the outcome BPD. Those receiving ≤30% oxygen underwent a stepwise 2% reduction in supplemental oxygen to room air while under continuous observation and oxygen saturation monitoring. Outcomes of the test were "no BPD" (saturations ≥88% for 60 minutes) or "BPD" (saturation <88%). At the conclusion of the test, all infants were returned to their baseline oxygen. Safety (apnea, bradycardia, increased oxygen use), inter-rater reliability, test-retest reliability, and validity of the physiologic definition vs the clinical definition were assessed. A total of 199 VLBW were assessed, of whom 45 (36%) were diagnosed with BPD by the clinical definition of oxygen use at 36 weeks corrected age. The physiologic definition identified 15 infants treated with oxygen who successfully passed the saturation monitoring test in room air. The physiologic definition diagnosed BPD in 30 (24%) of the cohort. All infants were safely studied. The test was highly reliable (inter-rater reliability, kappa=1.0; test-retest reliability, kappa=0.83) and highly correlated with discharge home in oxygen, length of hospital stay, and hospital readmissions in the first year of life. The physiologic definition of BPD is safe, feasible, reliable, and valid and improves the precision of the diagnosis of BPD. This may be of benefit in future multicenter clinical trials.
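The classification rule described in this abstract can be written down as a small decision function. The sketch below is only an illustration of the thresholds quoted in the abstract (positive pressure or >30% oxygen, and the 88% saturation cut-off); the function name `physiologic_bpd`, its parameters, and the idea of summarising the 60-minute room-air observation by its lowest saturation are assumptions made here, not details from the paper, and the code is not a clinical tool.

```python
def physiologic_bpd(on_positive_pressure, supplemental_o2_percent,
                    lowest_saturation_in_test=None):
    """Classify an infant as 'BPD' or 'no BPD' using the abstract's thresholds.

    lowest_saturation_in_test is a hypothetical summary of the room-air
    observation period: the lowest oxygen saturation (%) that was recorded.
    """
    # Positive pressure support or >30% supplemental oxygen: BPD without a test.
    if on_positive_pressure or supplemental_o2_percent > 30:
        return "BPD"
    # Infants on <=30% oxygen undergo the stepwise reduction to room air.
    if lowest_saturation_in_test is None:
        raise ValueError("a room-air test result is required at <=30% oxygen")
    # Saturations of 88% or more throughout the observation: no BPD.
    return "no BPD" if lowest_saturation_in_test >= 88 else "BPD"


print(physiologic_bpd(False, 25, lowest_saturation_in_test=92))
print(physiologic_bpd(False, 25, lowest_saturation_in_test=85))
```

Making the test result an explicit argument keeps the two routes to a BPD outcome (assigned directly versus failing the room-air test) visible in the code.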
Prather, H; Harris-Hayes, M; Hunt, D; Steger-May, K; Mathew, V; Clohisy, JC
2012-01-01
Objective: The objectives of this study are the following: 1) report passive hip ROM in asymptomatic young adults, 2) report the intra-tester and inter-tester reliability of hip ROM measurements among testers of multiple disciplines, 3) report the results of provocative hip tests and tester agreement. Design: Descriptive epidemiology study. Setting: Tertiary university. Participants: Twenty-eight young adult volunteers without musculoskeletal symptoms, history of disorder or surgery involving the lumbar spine or lower extremities were enrolled and completed the study. Methods: Asymptomatic young adult volunteers completed questionnaires and were examined by two blinded examiners during a single session. The testers were physical therapists and physicians. Hip range of motion and provocative tests were completed by both examiners on each hip. Main Outcome Measurements: Inter- and intra-rater reliability for ROM and agreement for provocative tests were determined. Results: Twenty-eight asymptomatic adults with mean age 31 years old (range 18–51 years) and mean modified Harris Hip Score of 99.5 ± 1.5 and UCLA Activity score of 8.8 ± 1.2 completed the study. Intra-rater agreement was excellent for all hip range of motion measurements, with intraclass correlation coefficients (ICCs) ranging from 0.76 to 0.97 with similar agreement if the examiner was a physical therapist or a physician. Excellent inter-rater reliability was found for hip flexion ICC 0.87 (95% CI 0.78 to 0.92), supine internal rotation ICC 0.75 (95% CI 0.60 to 0.84) and prone internal rotation ICC 0.79 (95% CI 0.66 to 0.87). The least reliable measurements were supine hip abduction (ICC 0.34) and supine external rotation (ICC 0.18). Agreement between examiners ranged from 96–100% for provocative hip tests which included the hip impingement, resisted straight leg raise, FABER/Patrick’s and log roll tests. Conclusions: Specific hip ROM measures show excellent inter-rater reliability and provocative hip tests show good agreement among multiple examiners and medical disciplines. Further studies are needed to assess the utilization of these measurements and tests as a part of a hip screening examination to assess for young adults at risk for intra-articular hip disorders prior to the onset of degenerative changes. PMID:20970757
ERIC Educational Resources Information Center
Carey, Michael D.; Mannell, Robert H.; Dunn, Peter K.
2011-01-01
This study investigated factors that could affect inter-examiner reliability in the pronunciation assessment component of speaking tests. We hypothesized that the rating of pronunciation is susceptible to variation in assessment due to the amount of exposure examiners have to nonnative English accents. An inter-rater variability analysis was…
Lange, Toni; Matthijs, Omer; Jain, Nitin B; Schmitt, Jochen; Lützner, Jörg; Kopkow, Christian
2017-03-01
Shoulder pain is common in the general population; to identify its aetiology, history taking, motion and muscle testing, and physical examination tests are usually performed. The aim of this systematic review was to summarise and evaluate intrarater and inter-rater reliability of physical examination tests in the diagnosis of shoulder pathologies. A comprehensive systematic literature search was conducted using MEDLINE, EMBASE, Allied and Complementary Medicine Database (AMED) and Physiotherapy Evidence Database (PEDro) through 20 March 2015. Methodological quality was assessed using the Quality Appraisal of Reliability Studies (QAREL) tool by 2 independent reviewers. The search strategy revealed 3259 articles, of which 18 finally met the inclusion criteria. These studies evaluated the reliability of 62 tests and test variations used for the specific physical examination tests for the diagnosis of shoulder pathologies. Methodological quality ranged from 2 to 7 positive criteria of the 11 items of the QAREL tool. This review identified a lack of high-quality studies evaluating inter-rater as well as intrarater reliability of specific physical examination tests for the diagnosis of shoulder pathologies. In addition, reliability measures differed between included studies, hindering proper cross-study comparisons. PROSPERO CRD42014009018. Published by the BMJ Publishing Group Limited.
McAlpine, R T; Bettany-Saltikov, J A; Warren, J G
2009-01-01
Assessment of spinal posture during physiotherapy practice is routine, yet few objective measures exist to this end. The Middlesbrough Integrated Digital Assessment System (MIDAS) is a low cost portable system able to record 3D information on posture. The purpose of this study was to assess both the intra-rater and inter-rater reliability of the MIDAS system. Twenty-five healthy subjects were recruited. A repeated measures design was used to record fifteen pre-palpated landmarks on the back of each subject. To limit the sources of variability, the principal researcher palpated the landmarks for each subject. Each of three raters took two measurements on each subject in a standardized upright posture. X (medio-lateral), Y (antero-posterior) and Z (height) landmark positions were recorded via a computer interface. Both intra-rater agreement (mean ICCs - rater 1 r=0.970, rater 2 r=0.965 and rater 3 r=0.965, p< 0.001) and inter-rater agreement (mean ICCs r=0.967, p< 0.001) were very high between repeated measures and between markers. Error values for the z-axis (height) were the lowest. The MIDAS demonstrated both high inter-rater and intra-rater reliability and provides an objective method for the assessment of posture in physiotherapy practice.
Reliability of injury grading systems for patients with blunt splenic trauma.
Olthof, D C; van der Vlies, C H; Scheerder, M J; de Haan, R J; Beenen, L F M; Goslings, J C; van Delden, O M
2014-01-01
The most widely used grading system for blunt splenic injury is the American Association for the Surgery of Trauma (AAST) organ injury scale. In 2007 a new grading system was developed. This 'Baltimore CT grading system' is superior to the AAST classification system in predicting the need for angiography and embolization or surgery. The objective of this study was to assess inter- and intraobserver reliability between radiologists in classifying splenic injury according to both grading systems. CT scans of 83 patients with blunt splenic injury admitted between 1998 and 2008 to an academic Level 1 trauma centre were retrospectively reviewed. Inter- and intrarater reliability were expressed in Cohen's or weighted kappa values. Overall weighted interobserver kappa coefficients for the AAST and 'Baltimore CT grading system' were respectively substantial (kappa=0.80) and almost perfect (kappa=0.85). Average weighted intraobserver kappa values were in the 'almost perfect' range (AAST: kappa=0.91, 'Baltimore CT grading system': kappa=0.81). The present study shows that overall the inter- and intraobserver reliability for grading splenic injury according to the AAST grading system and 'Baltimore CT grading system' are equally high. Because of the integration of vascular injury, the 'Baltimore CT grading system' supports clinical decision making. We therefore recommend use of this system in the classification of splenic injury. Copyright © 2012 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Dogan, C. Deha; Uluman, Müge
2017-01-01
The aim of this study was to determine the extent to which graded-category rating scales and rubrics contribute to inter-rater reliability. The research was designed as a correlational study. The study group consisted of 82 students attending sixth grade and three writing course teachers in a private elementary school. A performance task was…
Fatigue in children: reliability and validity of the Dutch PedsQL™ Multidimensional Fatigue Scale.
Gordijn, M Suzanne; Cremers, Eline M P; Kaspers, Gertjan J L; Gemke, Reinoud J B J
2011-09-01
The aim of the study is to report on the feasibility, reliability, validity, and the norm-references of the Dutch version of the PedsQL™ Multidimensional Fatigue Scale. The study participants are four hundred and ninety-seven parents of children aged 2-18 years and 366 children aged 5-18 years from various day care facilities, elementary schools, and a high school who completed the Dutch version of the PedsQL™ Multidimensional Fatigue Scale. The number of missing items was minimal. All scales showed satisfactory internal consistency reliability, with Cronbach's coefficient alpha exceeding 0.70. Test-retest reliability was good to excellent (ICCs 0.68-0.84) and inter-observer reliability varied from moderate to excellent (ICCs 0.56-0.93) for total scores. Parent/child concordance for total scores was poor to good (ICCs 0.25-0.68). The PedsQL™ Multidimensional Fatigue Scale was able to distinguish between healthy children and children with an impaired health condition. The Dutch version of the PedsQL™ Multidimensional Fatigue Scale demonstrates an adequate feasibility, reliability, and validity in another sociocultural context. With the obtained norm-references, it can be utilized as a tool in the evaluation of fatigue in healthy and chronically ill children aged 2-18 years.
A reliability analysis of the revised competitiveness index.
Harris, Paul B; Houston, John M
2010-06-01
This study examined the reliability of the Revised Competitiveness Index by investigating the test-retest reliability, interitem reliability, and factor structure of the measure based on a sample of 280 undergraduates (200 women, 80 men) ranging in age from 18 to 28 years (M = 20.1, SD = 2.1). The findings indicate that the Revised Competitiveness Index has high test-retest reliability, high inter-item reliability, and a stable factor structure. The results support the assertion that the Revised Competitiveness Index assesses competitiveness as a stable trait rather than a dynamic state.
Developing a General Outcome Measure of Growth in Movement for Infants and Toddlers.
ERIC Educational Resources Information Center
Greenwood, Charles R.; Luze, Gayle J.; Cline, Gabriel; Kuntz, Susan; Leitschuh, Carol
2002-01-01
The development of an experimental measure for assessing growth in movement in children (ages birth-3) is described. Results from the use of the Movement General Outcome Measurement with 29 infants and toddlers demonstrated the feasibility of the measure. The 6-minute assessment was found reliable in terms of inter-observer agreement. (Contains…
Johnson, Mark I.; Francis, Peter
2018-01-01
Context: The influence of methodological parameters on the measurement of muscle contractile properties using Tensiomyography (TMG) has not been published. Objective: To investigate (1) the reliability of the stimulus amplitude needed to elicit maximum muscle displacement (Dm), (2) the effect of changing inter-stimulus interval on Dm (using a fixed stimulus amplitude) and contraction time (Tc), and (3) the effect of changing inter-electrode distance on Dm and Tc. Design: Within subject, repeated measures. Participants: 10 participants for each objective. Main outcome measures: Dm and Tc of the rectus femoris, measured using TMG. Results: The coefficient of variation (CV) and the intra-class correlation (ICC) of the stimulus amplitude needed to elicit maximum Dm were 5.7% and 0.92, respectively. Dm was higher when using an inter-electrode distance of 7cm compared to 5cm [P = 0.03] and when using an inter-stimulus interval of 10s compared to 30s [P = 0.017]. Further analysis of inter-stimulus interval data found that during 10 repeated stimuli Tc became faster after the 5th measure when compared to the second measure [P<0.05]. The 30s inter-stimulus interval produced the most stable Tc over 10 measures compared to 10s and 5s respectively. Conclusion: Our data suggest that the stimulus amplitude producing maximum Dm of the rectus femoris is reliable. Inter-electrode distance and inter-stimulus interval can significantly influence Dm and/or Tc. Our results support the use of a 30s inter-stimulus interval over 10s or 5s. Future studies should determine the influence of methodological parameters on muscle contractile properties in a range of muscles. PMID:29451885
Done, Terence; Roelfsema, Chris; Harvey, Andrew; Schuller, Laura; Hill, Jocelyn; Schläppy, Marie-Lise; Lea, Alexandra; Bauer-Civiello, Anne; Loder, Jennifer
2017-04-15
Reef Check Australia (RCA) has collected data on benthic composition and cover at >70 sites along >1000km of Australia's Queensland coast from 2002 to 2015. This paper quantifies the accuracy, precision and power of RCA benthic composition data, to guide its application and interpretation. A simulation study established that the inherent accuracy of the Reef Check point sampling protocol is high (<±7% error absolute), in the range of estimates of benthic cover from 1% to 50%. A field study at three reef sites indicated that, despite minor observer- and deployment-related biases, the protocol does reliably document moderate ecological changes in coral communities. The error analyses were then used to guide the interpretation of inter-annual variability and long term trends at three study sites in RCA's major 2002-2015 data series for the Queensland coast. Copyright © 2017 Elsevier Ltd. All rights reserved.
Stefanatou, Pentagiotissa; Giannouli, Eleni; Konstantakopoulos, George; Vitoratou, Silia; Mavreas, Venetsanos
2014-11-01
Evaluation of mental health services based on patients' needs assessments has never taken place in Greece, although it is a crucial factor for the efficient use of their limited resources. To examine the inter-rater and test-retest reliability and the concurrent/convergent validity of the Greek research version of the Camberwell Assessment of Need-Research (CAN-R). A total of 53 schizophrenic patient-staff pairs were interviewed twice to test the inter-rater and test-retest reliability of the Greek version of the CAN-R. The World Health Organization Quality of Life-Brief Form (WHOQOL-BREF) and World Health Organization Disability Assessment Schedule-2.0 (WHODAS-2.0) were administered to the patients to examine concurrent validity. The inter-rater and test-retest reliability of patient and staff interviews for the 22 individual items and the eight summary scores of the instrument's four sections were good to excellent. Significant correlations emerged between CAN scores and the WHOQOL-BREF and WHODAS-2.0 domains for both patient and staff ratings, indicating good concurrent validity. Our results suggest that the Greek version of the CAN-R is a reliable instrument for assessing mental health patients' needs. Moreover, it is the first CAN-R validity study with satisfactory results using WHOQOL-BREF and WHODAS-2.0 as criterion variables. © The Author(s) 2013.
Reliability of Multi-Category Rating Scales
ERIC Educational Resources Information Center
Parker, Richard I.; Vannest, Kimberly J.; Davis, John L.
2013-01-01
The use of multi-category scales is increasing for the monitoring of IEP goals, classroom and school rules, and Behavior Improvement Plans (BIPs). Although they require greater inference than traditional data counting, little is known about the inter-rater reliability of these scales. This simulation study examined the performance of nine…
ERIC Educational Resources Information Center
Claes, C.; Van Hove, G.; van Loon, J.; Vandevelde, S.; Schalock, R. L.
2009-01-01
Background: Despite various reliability studies on the Supports Intensity Scale (SIS), to date there has not been an evaluation of the reliability of client vs. staff judgments. Such determination is important, given the increasing consumer-driven approach to services. Additionally, there has not been an evaluation of the instrument's construct…
ERIC Educational Resources Information Center
Klin, Ami; Lang, Jason; Cicchetti, Domenic V.; Volkmar, Fred R.
2000-01-01
This study examined the inter-rater reliability of clinician-assigned diagnosis of autism using or not using the criteria specified in the Diagnostic and Statistical Manual IV (DSM-IV). For experienced raters there was little difference in reliability in the two conditions. However, a clinically significant improvement in diagnostic reliability…
Makdissi, Michael; Davis, Gavin
2016-10-01
The objective of this study was to determine the reliability and validity of identifying clinical signs of concussion using video analysis in Australian football. Prospective cohort study. All impacts and collisions potentially resulting in a concussion were identified during 2012 and 2013 Australian Football League seasons. Consensus definitions were developed for clinical signs associated with concussion. For intra- and inter-rater reliability analysis, two experienced clinicians independently assessed 102 randomly selected videos on two occasions. Sensitivity, specificity, positive and negative predictive values were calculated based on the diagnosis provided by team medical staff. 212 incidents resulting in possible concussion were identified in 414 Australian Football League games. The intra-rater reliability of the video-based identification of signs associated with concussion was good to excellent. Inter-rater reliability was good to excellent for impact seizure, slow to get up, motor incoordination, ragdoll appearance (2 of 4 analyses), clutching at head and facial injury. Inter-rater reliability for loss of responsiveness and blank and vacant look was only fair and did not reach statistical significance. The feature with the highest sensitivity was slow to get up (87%), but this sign had a low specificity (19%). Other video signs had a high specificity but low sensitivity. Blank and vacant look (100%) and motor incoordination (81%) had the highest positive predictive value. Video analysis may be a useful adjunct to the side-line assessment of a possible concussion. Video analysis however should not replace the need for a thorough multimodal clinical assessment. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
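The four accuracy figures named in this abstract all derive from one 2x2 table of a video sign against the reference diagnosis. The sketch below shows that arithmetic under stated assumptions: the helper `diagnostic_metrics` and the counts are hypothetical, chosen only to mimic a sign with high sensitivity and low specificity, and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    def ratio(numerator, denominator):
        # Return None instead of dividing by zero when a margin is empty.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # sign present among concussed
        "specificity": ratio(tn, tn + fp),  # sign absent among non-concussed
        "ppv": ratio(tp, tp + fp),          # concussed among sign-positive
        "npv": ratio(tn, tn + fn),          # non-concussed among sign-negative
    }


# Hypothetical counts; NOT taken from the study.
metrics = diagnostic_metrics(tp=40, fp=30, fn=10, tn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this made-up table the sign picks up 80% of diagnosed concussions but is also present in 60% of incidents without one, which is the trade-off the abstract describes for some signs.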
Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne
2014-01-01
This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to disentangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent–teacher and 19 mother–father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, and the magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent–teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother–father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings. PMID:24994985
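One way to see why correlation and agreement need to be kept apart is a toy case in which one rater scores every child a fixed amount higher than the other. The sketch below compares the Pearson correlation with an absolute-agreement intraclass correlation for such data. It assumes the two-way, single-rating, absolute-agreement ICC form; the abstract does not say which ICC variant the authors used, and the function names and scores are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def icc_absolute_agreement(ratings):
    """Two-way, single-rating, absolute-agreement ICC.

    ratings holds one row per subject and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Mean squares for subjects (rows), raters (columns) and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    return (ms_rows - ms_error) / denominator


# Hypothetical scores: the second rater is always 5 points higher.
rater_1 = [10, 20, 30, 40, 50]
rater_2 = [15, 25, 35, 45, 55]
r = pearson_r(rater_1, rater_2)
icc = icc_absolute_agreement([list(pair) for pair in zip(rater_1, rater_2)])
print(f"Pearson r = {r:.3f}, ICC = {icc:.3f}")  # r = 1.000, ICC = 0.952
```

The correlation is perfect because the ranking of children is identical, while the absolute-agreement ICC drops below 1 because the constant offset between raters counts as disagreement.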
Chow, Clara K.; Lock, Karen; Madhavan, Manisha; Corsi, Daniel J.; Gilmore, Anna B.; Subramanian, S. V.; Li, Wei; Swaminathan, Sumathi; Lopez-Jaramillo, Patricio; Avezum, Alvaro; Lear, Scott A.; Dagenais, Gilles; Teo, Koon; McKee, Martin; Yusuf, Salim
2010-01-01
Background The environment in which people live is known to be important in influencing diet, physical activity, smoking, psychosocial and other risk factors for cardiovascular (CV) disease. However no instrument exists that evaluates communities for these multiple environmental factors and is suitable for use across different communities, regions and countries. This report describes the design and reliability of an instrument to measure environmental determinants of CV risk factors. Method/Principal Findings The Environmental Profile of Community Health (EPOCH) instrument comprises two parts: (I) an assessment of the physical environment, and (II) an interviewer-administered questionnaire to collect residents' perceptions of their community. We examined the inter-rater reliability amongst 3 observers from each region of the direct observation component of the instrument (EPOCH I) in 93 rural and urban communities in 5 countries (Canada, Colombia, Brazil, China and India). Data collection using the EPOCH instrument was feasible in all communities. Reliability of the instrument was excellent (Intraclass Correlation Coefficient - ICC>0.75) for 24 of 38 items and fair to good (ICC 0.4–0.75) for 14 of 38 items. Conclusion This report shows data collection with the EPOCH instrument is feasible and direct observation of community measures reliable. The EPOCH instrument will enable further research on environmental determinants of health for population studies from a broad range of settings. PMID:21170320
Orban, Pierre; Madjar, Cécile; Savard, Mélissa; Dansereau, Christian; Tam, Angela; Das, Samir; Evans, Alan C; Rosa-Neto, Pedro; Breitner, John C S; Bellec, Pierre
2015-01-01
We present a test-retest dataset of resting-state fMRI data obtained in 80 cognitively normal elderly volunteers enrolled in the "Pre-symptomatic Evaluation of Novel or Experimental Treatments for Alzheimer's Disease" (PREVENT-AD) Cohort. Subjects with a family history of Alzheimer's disease in first-degree relatives were recruited as part of an on-going double blind randomized clinical trial of Naproxen or placebo. Two pairs of scans were acquired ~3 months apart, allowing the assessment of both intra- and inter-session reliability, with the possible caveat of treatment effects as a source of inter-session variation. Using the NeuroImaging Analysis Kit (NIAK), we report on the standard quality of co-registration and motion parameters of the data, and assess their validity based on the spatial distribution of seed-based connectivity maps as well as intra- and inter-session reliability metrics in the default-mode network. This resource, released publicly as sample UM1 of the Consortium for Reliability and Reproducibility (CoRR), will benefit future studies focusing on the preclinical period preceding the appearance of dementia in Alzheimer's disease.
Towards a new protocol of scoliosis assessments and monitoring in clinical practice: A pilot study.
Lukovic, Tanja; Cukovic, Sasa; Lukovic, Vanja; Devedzic, Goran; Djordjevic, Dusica
2015-01-01
Although intensively investigated, the procedures for assessment and monitoring of scoliosis are still a subject of controversy. The aim of this study was to assess the validity and reliability of a number of physiotherapeutic measurements that could be used for clinical monitoring of scoliosis. Fifteen healthy (symmetric) subjects underwent a set of measurements twice, performed by two experienced and two inexperienced physiotherapists. Intra-observer and inter-observer reliability of the measurements were determined. The following measurements were performed: body height and weight, chest girth in inspiration and expiration, the length of the legs, the spine translation, the lateral pelvic tilt, the equality of the shoulders, position of the scapulas, the equality of stature triangles, the rib hump, the existence of m. iliopsoas contracture, Fröhner index, the size of lumbar lordosis and the angle of trunk rotation. The intraclass correlation coefficient was high (> 0.8) for the majority of measurements when experienced physiotherapists performed them, while inexperienced physiotherapists were precise only in basic, easy measurements. We showed in this pilot study on healthy subjects that the majority of basic physiotherapeutic measurements are valid and reliable when performed by a specialized physiotherapist, and it can be expected that this protocol will gain high value when measurements on subjects with scoliosis are performed.
Petrova, Tatjana; Kavookjian, Jan; Madson, Michael B; Dagley, John; Shannon, David; McDonough, Sharon K
2015-01-01
Motivational interviewing (MI) has demonstrated a significant impact as an intervention strategy for addiction management, change in lifestyle behaviors, and adherence to prescribed medication and other treatments. Key elements to studying MI include training in MI of professionals who will use it, assessment of skills acquisition in trainees, and the use of a validated skills assessment tool. The purpose of this research project was to develop a psychometrically valid and reliable tool designed to assess MI skills competence in health care provider trainees. The goal was to develop an assessment tool that would evaluate the acquisition and use of specific MI skills and principles, as well as the quality of the patient-provider therapeutic alliance in brief health care encounters. To address this purpose, specific steps were followed, beginning with a literature review. This review contributed to the development of relevant conceptual and operational definitions, the selection of a scaling technique and response format, and the choice of methods for analyzing validity and reliability. Internal consistency reliability was established on 88 video-recorded interactions. The inter-rater and test-retest reliability were established using 18 interactions randomly selected from the 88. The assessment tool Motivational Interviewing Skills for Health Care Encounters (MISHCE) and a manual for use of the tool were developed. Validity and reliability of MISHCE were examined. Face and content validity were supported with well-defined conceptual and operational definitions and feedback from an expert panel. Reliability was established through internal consistency, inter-rater reliability, and test-retest reliability. The overall internal consistency reliability (Cronbach's alpha) for all fifteen items was 0.75. MISHCE demonstrated good inter-rater reliability and good to excellent test-retest reliability.
MISHCE assesses the health provider's level of knowledge and skills in brief disease management encounters. MISHCE also evaluates quality of the patient-provider therapeutic alliance, i.e., the "flow" of the interaction. Copyright © 2015 Elsevier Inc. All rights reserved.
Development and validation of a Malawian version of the primary care assessment tool.
Dullie, Luckson; Meland, Eivind; Hetlevik, Øystein; Mildestvedt, Thomas; Gjesdal, Sturla
2018-05-16
Malawi does not have validated tools for assessing primary care performance from patients' experience. The aim of this study was to develop a Malawian version of Primary Care Assessment Tool (PCAT-Mw) and to evaluate its reliability and validity in the assessment of the core primary care dimensions from adult patients' perspective in Malawi. A team of experts assessed the South African version of the primary care assessment tool (ZA-PCAT) for face and content validity. The adapted questionnaire underwent forward and backward translation and a pilot study. The tool was then used in an interviewer administered cross-sectional survey in Neno district, Malawi, to test validity and reliability. Exploratory factor analysis was performed on a random half of the sample to evaluate internal consistency, reliability and construct validity of items and scales. The identified constructs were then tested with confirmatory factor analysis. Likert scale assumption testing and descriptive statistics were done on the final factor structure. The PCAT-Mw was further tested for intra-rater and inter-rater reliability. From the responses of 631 patients, a 29-item PCAT-Mw was constructed comprising seven multi-item scales, representing five primary care dimensions (first contact, continuity, comprehensiveness, coordination and community orientation). All the seven scales achieved good internal consistency, item-total correlations and construct validity. Cronbach's alpha coefficient ranged from 0.66 to 0.91. A satisfactory goodness of fit model was achieved (GFI = 0.90, CFI = 0.91, RMSEA = 0.05, PCLOSE = 0.65). The full range of possible scores was observed for all scales. Scaling assumptions tests were achieved for all except the two comprehensiveness scales. Intra-class correlation coefficient (ICC) was 0.90 (n = 44, 95% CI 0.81-0.94, p < 0.001) for intra-rater reliability and 0.84 (n = 42, 95% CI 0.71-0.96, p < 0.001) for inter-rater reliability. 
Comprehensive metric analyses supported the reliability and validity of PCAT-Mw in assessing the core concepts of primary care from adult patients' experience. This tool could be used for health service research in primary care in Malawi.
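Internal consistency in this record is summarized with Cronbach's alpha. A minimal sketch of the standard formula follows, assuming complete data and sample variances; the function name and the item scores are hypothetical and are not PCAT-Mw data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: one list per item, each holding that item's score for
    every respondent (same respondent order in every list).
    """
    k = len(item_scores)
    # Total scale score per respondent.
    totals = [sum(values) for values in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 3-item scale answered by 5 respondents.
items = [
    [3, 4, 2, 5, 4],
    [2, 4, 3, 5, 3],
    [3, 5, 2, 4, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

Alpha rises when items covary strongly relative to their individual variances, so a multi-item scale whose items move together approaches 1.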
Hughes, Michael; Tracey, Andrew; Bhushan, Monica; Chakravarty, Kuntal; Denton, Christopher P; Dubey, Shirish; Guiducci, Serena; Muir, Lindsay; Ong, Voon; Parker, Louise; Pauling, John D; Prabu, Athiveeraramapandian; Rogers, Christine; Roberts, Christopher; Herrick, Ariane L
2018-06-01
The reliability of clinician grading of systemic sclerosis-related digital ulcers has been reported to be poor to moderate at best, which has important implications for clinical trial design. The aim of this study was to examine the reliability of newly proposed UK Scleroderma Study Group digital ulcer definitions among UK clinicians with an interest in systemic sclerosis. Raters graded (through a custom-built interface) 90 images (80 unique and 10 repeat) of a range of digital lesions collected from patients with systemic sclerosis. Lesions were graded on an ordinal scale of severity: 'no ulcer', 'healed ulcer' or 'digital ulcer'. A total of 23 clinicians (18 rheumatologists, 3 dermatologists, 1 hand surgeon and 1 specialist rheumatology nurse) completed the study. A total of 2070 (1840 unique + 230 repeat) image gradings were obtained. For intra-rater reliability, across all images, the overall weighted kappa coefficient was high (0.71) and was moderate (0.55) when averaged across individual raters. Overall inter-rater reliability was poor (0.15). Although our proposed digital ulcer definitions had high intra-rater reliability, the overall inter-rater reliability was poor. Our study highlights the challenges of digital ulcer assessment by clinicians with an interest in systemic sclerosis and provides a number of useful insights for future clinical trial design. Further research is warranted to improve the reliability of digital ulcer definition/rating as an outcome measure in clinical trials, including examining the role for objective measurement techniques, and the development of digital ulcer patient-reported outcome measures.
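The weighted kappa coefficient used for the three-level ordinal grading above can be sketched as follows. This is an illustration only: it assumes linear disagreement weights and two gradings per image, neither of which is specified in the abstract, and the function name and gradings are hypothetical.

```python
def weighted_kappa(grades_1, grades_2, n_categories):
    """Cohen's weighted kappa with linear weights (an assumption).

    Grades are integers 0..n_categories-1, for example
    0 = no ulcer, 1 = healed ulcer, 2 = digital ulcer.
    """
    n = len(grades_1)
    k = n_categories
    # Observed joint proportions of (first grading, second grading).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(grades_1, grades_2):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 at the extremes.
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * observed[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    if d_exp == 0:
        return float("nan")  # undefined when only one category is ever used
    return 1 - d_obs / d_exp


# Hypothetical gradings of six images on two occasions.
first_read = [0, 1, 2, 2, 0, 1]
second_read = [0, 1, 2, 1, 0, 2]
kappa_w = weighted_kappa(first_read, second_read, n_categories=3)
print(round(kappa_w, 3))
```

Linear weights penalize a 'no ulcer' versus 'digital ulcer' disagreement twice as heavily as a disagreement between adjacent grades; quadratic weights would be an equally plausible choice.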
Font, P; Loscertales, J; Benavente, C; Bermejo, A; Callejas, M; Garcia-Alonso, L; Garcia-Marcilla, A; Gil, S; Lopez-Rubio, M; Martin, E; Muñoz, C; Ricard, P; Soto, C; Balsalobre, P; Villegas, A
2013-01-01
Morphology is the basis of the diagnosis of myelodysplastic syndromes (MDS). The WHO classification offers prognostic information and helps with treatment decisions. However, morphological changes are subject to potential inter-observer variance. The aim of our study was to explore the reliability of the 2008 WHO classification of MDS, reviewing 100 samples previously diagnosed with MDS using the 2001 WHO criteria. Specimens were collected from 10 hospitals and were evaluated by 10 morphologists, working in five pairs. Each observer evaluated 20 samples, and each sample was analyzed independently by two morphologists. The second observer was blinded to the clinical and laboratory data, except for the peripheral blood (PB) counts. Nineteen cases were considered as unclassified MDS (MDS-U) by the 2001 WHO classification, but only three remained as MDS-U by the 2008 WHO proposal. Discordance was observed in 26 of the 95 samples considered suitable (27%). Although a high number of observers took part, the rate of discordance was quite similar among the five pairs. The inter-observer concordance was very good regarding refractory anemia with excess blasts type 1 (RAEB-1) (10 of 12 cases, 84%) and RAEB-2 (nine of 10 cases, 90%), and also good regarding refractory cytopenia with multilineage dysplasia (37 of 50 cases, 74%). However, the categories with unilineage dysplasia were not reproducible in most of the cases. The rate of concordance was 40% (two of five cases) for refractory cytopenia with unilineage dysplasia and 25% (two of eight cases) for RA with ring sideroblasts. Our results show that the 2008 WHO classification gives a more accurate stratification of MDS but also illustrate the difficulty in diagnosing MDS with unilineage dysplasia.
A system framework of inter-enterprise machining quality control based on fractal theory
NASA Astrophysics Data System (ADS)
Zhao, Liping; Qin, Yongtao; Yao, Yiyong; Yan, Peng
2014-03-01
In order to meet the quality control requirements of dynamic and complicated product machining processes among enterprises, a system framework of inter-enterprise machining quality control based on fractal theory was proposed. In this system framework, the fractal-specific characteristics of the inter-enterprise machining quality control function were analysed, and the model of inter-enterprise machining quality control was constructed according to the nature of fractal structures. Furthermore, the goal-driven strategy of inter-enterprise quality control and the dynamic organisation strategy of inter-enterprise quality improvement were constructed through characteristic analysis of this model. In addition, the architecture of inter-enterprise machining quality control based on fractal theory was established by means of Web services. Finally, a case study for application was presented. The result showed that the proposed method was feasible and could provide guidance for quality control and support for product reliability in inter-enterprise machining processes.
Braunschmidt, Brigitte; Müller, Gerhard; Jukic-Puntigam, Margareta; Steininger, Alfred
2013-01-01
Incontinence-associated dermatitis (IAD) is the clinical manifestation of moisture-related skin damage (Beeckman, Woodward, & Gray, 2011). Valid assessment instruments are needed for risk assessment and classification of IAD. The aim of this quantitative-descriptive cross-sectional study was to determine the inter-rater reliability of the item scores of the German Incontinence Associated Dermatitis Intervention Tool (IADIT-D) between two independent assessors of nursing home residents (n = 381) in long-term care facilities. The 19 pairs of assessors consisted of registered nurses. The data analysis began with the calculation of the total percentage of agreement. Because this value is not corrected for chance agreement, kappa coefficients and the AC1 statistic were calculated as well. The total percentage of inter-rater agreement was 84% (n = 319). In a second step of analysis, the calculation across all items yielded high (kappa = .70) and very high (AC1 = .83) agreement levels, respectively. For the risk assessment (kappa = .82; AC1 = .94), the values amounted to very high agreement levels and for the classification (kappa(w) = .70; AC1 = .76) to high agreement levels. The high to very high agreement values of the IADIT-D demonstrate that the items can be regarded as stable with regard to inter-rater reliability for use in long-term care facilities. In addition, further validation studies are needed.
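The three agreement measures named in this abstract (percentage agreement, Cohen's kappa and Gwet's AC1) differ only in how chance agreement is estimated. A minimal two-rater sketch follows; the function name and the ratings are hypothetical and are not IADIT-D data.

```python
def agreement_stats(ratings_1, ratings_2, categories):
    """Return (percent agreement, Cohen's kappa, Gwet's AC1) for two raters."""
    n = len(ratings_1)
    # Raw proportion of identical ratings (no chance correction).
    pa = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    p1 = {c: ratings_1.count(c) / n for c in categories}
    p2 = {c: ratings_2.count(c) / n for c in categories}

    # Cohen's kappa: chance agreement from the product of the marginals.
    pe_kappa = sum(p1[c] * p2[c] for c in categories)
    kappa = (pa - pe_kappa) / (1 - pe_kappa)

    # Gwet's AC1: chance agreement from the mean category prevalence.
    pi = {c: (p1[c] + p2[c]) / 2 for c in categories}
    pe_ac1 = sum(pi[c] * (1 - pi[c]) for c in categories) / (len(categories) - 1)
    ac1 = (pa - pe_ac1) / (1 - pe_ac1)
    return pa, kappa, ac1


# Hypothetical ratings of 20 residents (0 = no IAD, 1 = IAD), low prevalence.
assessor_1 = [0] * 17 + [1, 0, 1]
assessor_2 = [0] * 17 + [0, 1, 1]
pa, kappa, ac1 = agreement_stats(assessor_1, assessor_2, categories=[0, 1])
print(round(pa, 2), round(kappa, 2), round(ac1, 2))
```

In this skewed example the raters agree on 90% of residents, yet kappa is about 0.44 while AC1 is about 0.88, which illustrates why AC1 values can exceed kappa values when one category dominates.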
van Loon, Johannes P A M; Van Dierendonck, Machteld C
2015-12-01
Although recognition of equine pain has been studied extensively over the past decades, there is still a need for improvement in the objective identification of pain in horses with acute colic. This study describes scale construction and clinical applicability of the Equine Utrecht University Scale for Composite Pain Assessment (EQUUS-COMPASS) and the Equine Utrecht University Scale for Facial Assessment of Pain (EQUUS-FAP) in horses with acute colic. A cohort follow-up study was performed using 50 adult horses (n = 25 with acute colic, n = 25 controls). Composite pain scores were assessed by direct observation; Visual Analog Scale (VAS) scores were assessed from video clips. Colic patients were assessed at arrival, and on the first and second mornings after arrival. Both the EQUUS-COMPASS and EQUUS-FAP scores showed high inter-observer reliability (ICC = 0.98 for EQUUS-COMPASS, ICC = 0.93 for EQUUS-FAP, P < 0.001), while a moderate inter-observer reliability for the VAS scores was found (ICC = 0.63, P < 0.001). The cut-off value for differentiation between healthy and colic horses for the EQUUS-COMPASS was 5, and for differentiation between conservatively treated and surgically treated or euthanased patients it was 11. For the EQUUS-FAP, cut-off values were 4 and 6, respectively. Internal sensitivity and specificity were good for both EQUUS-COMPASS (sensitivity 95.8%, specificity 84.0%) and EQUUS-FAP (sensitivity 87.5%, specificity 88.0%). The use of the EQUUS-COMPASS and EQUUS-FAP enabled repeated and objective scoring of pain in horses with acute colic. A follow-up study with new patients and control animals will be performed to further validate the constructed scales that are described in this study. Copyright © 2015 Elsevier Ltd. All rights reserved.
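How sensitivity and specificity follow from a score cut-off, as reported for the two scales above, can be sketched as follows. The sketch assumes that a score at or above the cut-off counts as a positive test, which the abstract does not state; the function name and scores are hypothetical and are not study data.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Sensitivity and specificity of the rule 'score >= cutoff is positive'.

    Assumption: the cut-off is inclusive.
    """
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_condition):
        positive = score >= cutoff
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical pain scores: five colic horses followed by five controls.
pain_scores = [7, 9, 5, 3, 12, 1, 0, 4, 6, 2]
is_colic = [True] * 5 + [False] * 5
sens, spec = sensitivity_specificity(pain_scores, is_colic, cutoff=5)
print(sens, spec)
```

Raising the cut-off generally trades sensitivity for specificity, so a reported cut-off is best read together with both values.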
General motor function assessment scale--reliability of a Norwegian version.
Langhammer, Birgitta; Lindmark, Birgitta
2014-01-01
The General Motor Function assessment scale (GMF) measures activity-related dependence, pain and insecurity among older people in frail health. The aim of the present study was to translate the GMF into a Norwegian version (N-GMF) and establish its reliability and clinical feasibility. The GMF was translated using a forward and backward process and tested on a convenience sample of 30 frail elderly people. The intra-rater reliability tests were performed by three physiotherapists, and the inter-rater reliability test was done by the same three plus nine independent colleagues. The statistical analyses were performed with a pairwise analysis for intra- and inter-rater reliability, using Cronbach's α, Percentage Agreement (PA), Svensson's rank transformable method and Cohen's κ. The Cronbach's α coefficients for the different subscales of N-GMF were 0.68 for Dependence, 0.73 for Pain and 0.75 for Insecurity. Intra-rater reliability: The variation in the PA for the total score was 40-70% in Dependence, 30-40% in Pain and 30-60% in Insecurity. The Relative Rank Variance (RV) indicated a modest individual bias and an augmented rank-order agreement coefficient r(a) of 0.96, 0.96 and 0.99, respectively. The variation in the κ statistics was 0.27-0.62 for Dependence, 0.17-0.35 for Pain and 0.13-0.47 for Insecurity. Inter-rater reliability: The PA between different testers in Dependence, Pain and Insecurity was 74%, 89% and 74%, respectively. The augmented rank-order agreement coefficients were: for Dependence, r(a) = 0.97; for Pain, r(a) = 0.99; and for Insecurity, r(a) = 0.99. The N-GMF is a fairly reliable instrument for use with frail elderly people, with intra-rater and inter-rater reliability moderate in Dependence and slight to fair in Pain and Insecurity. The clinical usefulness was stressed in regard to its main focus, the frail elderly, and for communication within a multidisciplinary team.
Implications for Rehabilitation: The Norwegian General Motor Function Assessment Scale (N-GMF) is a reliable instrument. The N-GMF is an instrument for screening and assessment of activity-related dependence, pain and insecurity in frail older people. The N-GMF may be used as a tool of communication in a multidisciplinary team.
Namkoong, Sun; Hong, Seung Phil; Kim, Myung Hwa; Park, Byung Cheol
2013-02-01
Nowadays, although its clinical value remains controversial, institutions utilize hair mineral analysis. Arguments about the reliability of hair mineral analysis persist, and there have been evaluations of commercial laboratories performing hair mineral analysis. The objective of this study was to assess the reliability of intra-laboratory and inter-laboratory data at three commercial laboratories conducting hair mineral analysis, compared to serum mineral analysis. Two divided hair samples taken from near the scalp of one healthy volunteer were submitted for analysis at the same time to all laboratories. Each laboratory sent a report consisting of quantitative results and their interpretation of health implications. Differences among intra-laboratory and inter-laboratory data were analyzed using SPSS version 12.0 (SPSS Inc., USA). All the laboratories used identical methods for quantitative analysis, and they generated consistent numerical results according to Friedman analysis of variance. However, the normal reference ranges of each laboratory varied. As such, each laboratory interpreted the patient's health differently. On intra-laboratory data, Wilcoxon analysis suggested that the laboratories generated relatively coherent data, except that laboratory B did not for one element, so its reliability was doubtful. In comparison with the blood test, laboratory C generated identical results, but laboratories A and B did not. Hair mineral analysis has its limitations, considering the reliability of inter- and intra-laboratory analysis compared with blood analysis. As such, clinicians should be cautious when applying hair mineral analysis as an ancillary tool. Each laboratory included in this study requires continuous refinement to establish standardized normal reference levels.
Brown Adipose Tissue Quantification in Human Neonates Using Water-Fat Separated MRI
Rasmussen, Jerod M.; Entringer, Sonja; Nguyen, Annie; van Erp, Theo G. M.; Guijarro, Ana; Oveisi, Fariba; Swanson, James M.; Piomelli, Daniele; Wadhwa, Pathik D.
2013-01-01
There is a major resurgence of interest in brown adipose tissue (BAT) biology, particularly regarding its determinants and consequences in newborns and infants. Reliable methods for non-invasive BAT measurement in human infants have yet to be demonstrated. The current study first validates methods for quantitative BAT imaging of rodents post mortem followed by BAT excision and re-imaging of excised tissues. Identical methods are then employed in a cohort of in vivo infants to establish the reliability of these measures and provide normative statistics for BAT depot volume and fat fraction. Using multi-echo water-fat MRI, fat- and water-based images of rodents and neonates were acquired and ratios of fat to the combined signal from fat and water (fat signal fraction) were calculated. Neonatal scans (n = 22) were acquired during natural sleep to quantify BAT and WAT deposits for depot volume and fat fraction. Acquisition repeatability was assessed based on multiple scans from the same neonate. Intra- and inter-rater measures of reliability in regional BAT depot volume and fat fraction quantification were determined based on multiple segmentations by two raters. Rodent BAT was characterized as having significantly higher water content than WAT in both in situ and ex vivo imaging assessments. Human neonate deposits indicative of bilateral BAT in spinal, supraclavicular and axillary regions were observed. Pairwise, WAT fat fraction was significantly greater than BAT fat fraction throughout the sample (Δ_WAT-BAT = 38%, p < 10^-4). Repeated scans demonstrated a high voxelwise correlation for fat fraction (R_all = 0.99). BAT depot volume and fat fraction measurements showed high intra-rater (ICC_BAT,VOL = 0.93, ICC_BAT,FF = 0.93) and inter-rater reliability (ICC_BAT,VOL = 0.86, ICC_BAT,FF = 0.93).
This study demonstrates the reliability of using multi-echo water-fat MRI in human neonates for quantification throughout the torso of BAT depot volume and fat fraction measurements. PMID:24205024
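The fat signal fraction described in this abstract is the ratio of the fat signal to the combined fat and water signal. A minimal voxelwise sketch follows; the function name and the signal values are hypothetical, and practical details such as noise handling and masking are not covered by the abstract and are omitted here.

```python
def fat_signal_fraction(fat_signal, water_signal):
    """Fat signal fraction F / (F + W) for a single voxel.

    Returns NaN for a voxel with no signal (an assumption made here
    to avoid division by zero).
    """
    total = fat_signal + water_signal
    if total <= 0:
        return float("nan")
    return fat_signal / total


# Hypothetical (fat, water) signal pairs for a few voxels in each depot.
bat_voxels = [(30.0, 70.0), (35.0, 65.0), (25.0, 75.0)]
wat_voxels = [(80.0, 20.0), (85.0, 15.0), (75.0, 25.0)]

bat_ff = [fat_signal_fraction(f, w) for f, w in bat_voxels]
wat_ff = [fat_signal_fraction(f, w) for f, w in wat_voxels]
mean_bat = sum(bat_ff) / len(bat_ff)
mean_wat = sum(wat_ff) / len(wat_ff)
print(round(mean_wat - mean_bat, 2))
```

A lower fat fraction in BAT than in WAT is consistent with the higher water content of BAT reported in the abstract.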