inter-examiner reliability study: Topics by Science.gov

Sample records for inter-examiner reliability study

The intra- and inter-observer reliability of the physical examination methods used to assess patients with patellofemoral joint instability.

PubMed

Smith, Toby O; Clark, Allan; Neda, Sophia; Arendt, Elizabeth A; Post, William R; Grelsamer, Ronald P; Dejour, David; Almqvist, Karl Fredrik; Donell, Simon T

2012-08-01

An accurate physical examination of patients with patellar instability is an important aspect of the diagnosis and treatment. While previous studies have assessed the diagnostic accuracy of such physical examination tests, little has been undertaken to assess the inter- and intra-tester reliability of such techniques. The purpose of this study was to determine the inter- and intra-tester reliability of the physical examination tests used for patients with patellar instability. Five patients (10 knees) with bilateral recurrent patellar instability were assessed by five members of the International Patellofemoral Study Group. Each surgeon assessed each patient twice using 18 reported physical examination tests. The inter- and intra-observer reliability was assessed using weighted Kappa statistics with 95% confidence intervals. The findings of the study suggested that there was very poor inter-observer reliability for the majority of the physical tests, with only the assessments of patellofemoral crepitus, foot arch position and the J-sign presenting with fair to moderate agreement. The intra-observer reliability indicated largely moderate to substantial agreement between the first and second tests performed by each assessor, with the greatest agreement seen for the assessment of tibial torsion, popliteal angle and Bassett's sign. For the common physical examination tests used in the management of patients with patellar instability, inter-observer reliability is poor, while intra-observer reliability is moderate. Standardization of physical exam assessments and further study of these results among different clinicians and more divergent patient groups are indicated. Copyright © 2011 Elsevier B.V. All rights reserved.
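The weighted Kappa statistic named in the abstract above can be illustrated with a minimal sketch. The abstract does not state which weighting scheme was used, so linear disagreement weights are an assumption here, and the function name `weighted_kappa` and the sample ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, categories):
    """Linear-weighted Cohen's kappa for two raters on an ordinal scale.

    `categories` lists the scale levels in order (assumed, not from the study).
    """
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (rater A level, rater B level).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight grows linearly with the distance between levels.
    obs_dis = 0.0
    exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            obs_dis += w * observed[i][j]
            exp_dis += w * row[i] * col[j]

    if exp_dis == 0.0:
        raise ValueError("kappa is undefined when both raters use one identical level")
    return 1.0 - obs_dis / exp_dis


# Hypothetical ordinal grades (0 = normal, 1 = mild, 2 = marked) from two examiners.
examiner_1 = [0, 0, 1, 1, 2, 2]
examiner_2 = [0, 1, 1, 2, 2, 2]
kappa_w = weighted_kappa(examiner_1, examiner_2, categories=[0, 1, 2])
```

With linear weights a one-level disagreement is penalised half as much as a two-level disagreement on a three-level scale, which is why weighted kappa suits ordinal examination findings better than the unweighted form.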
Inter-rater reliability of a modified version of Delitto et al.’s classification-based system for low back pain: a pilot study

PubMed Central

Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C. W.

2016-01-01

Study design: Observational inter-rater reliability study. Objectives: To examine (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3) the inter-rater reliability of algorithm decisions in patients who clearly fit into a subgroup (clear classifications) and those who do not (unclear classifications). Methods: Patients were examined twice on the same day by two of three participating physical therapists with different levels of familiarity with the system. Patients were classified into one of four classification groups. Raters were blind to the others’ classification decision. In order to quantify the inter-rater reliability, percentages of agreement and Cohen’s Kappa were calculated. Results: A total of 36 patients were included (clear classification n = 23; unclear classification n = 13). The overall rate of agreement was 53% and the Kappa value was 0.34 [95% confidence interval (CI): 0.11–0.57], which indicated only fair inter-rater reliability. Inter-rater reliability for patients with a clear classification (agreement 52%, Kappa value 0.29) was not higher than for patients with an unclear classification (agreement 54%, Kappa value 0.33). Familiarity with the system (i.e. trained with written instructions and previous research experience with the algorithm) did not improve the inter-rater reliability. Conclusion: Our pilot study challenges the inter-rater reliability of the classification procedure in clinical practice. Therefore, more knowledge is needed about factors that affect the inter-rater reliability, in order to improve the clinical applicability of the classification scheme. PMID:27559279
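The percentage agreement and Cohen's Kappa reported above can be sketched as follows. The confidence interval uses a simple large-sample standard error, which is an assumption: the abstract does not say how its interval was derived. The function names and the sample classifications are hypothetical.

```python
import math
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Return (observed agreement, chance agreement, kappa) for two raters."""
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_o, p_e, (p_o - p_e) / (1.0 - p_e)


def approx_kappa_ci(p_o, p_e, n, z=1.96):
    """Approximate interval using SE = sqrt(p_o(1 - p_o) / (n(1 - p_e)^2))."""
    kappa = (p_o - p_e) / (1.0 - p_e)
    se = math.sqrt(p_o * (1.0 - p_o) / (n * (1.0 - p_e) ** 2))
    return kappa - z * se, kappa + z * se


# Hypothetical classifications of six patients into four groups (A-D).
rater_1 = ["A", "A", "B", "B", "C", "D"]
rater_2 = ["A", "B", "B", "B", "C", "C"]
p_o, p_e, kappa = cohen_kappa(rater_1, rater_2)

# Back-of-envelope check with the abstract's rounded figures (53% agreement,
# kappa 0.34, n = 36); chance agreement is back-solved, not reported.
p_e_reported = (0.53 - 0.34) / (1.0 - 0.34)
low, high = approx_kappa_ci(0.53, p_e_reported, 36)
```

Under these assumptions the approximate interval comes out at roughly 0.11 to 0.57, in line with the reported interval, although that agreement does not show which method the authors used.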
Inter-arch digital model vs. manual cast measurements: Accuracy and reliability.

PubMed

Kiviahde, Heikki; Bukovac, Lea; Jussila, Päivi; Pesonen, Paula; Sipilä, Kirsi; Raustia, Aune; Pirttiniemi, Pertti

2017-06-28

The purpose of this study was to evaluate the accuracy and reliability of inter-arch measurements using digital dental models and conventional dental casts. Thirty sets of dental casts with permanent dentition were examined. Manual measurements were done with a digital caliper directly on the dental casts, and digital measurements were made on 3D models by two independent examiners. Intra-class correlation coefficients (ICC), a paired sample t-test or Wilcoxon signed-rank test, and Bland-Altman plots were used to evaluate intra- and inter-examiner error and to determine the accuracy and reliability of the measurements. The ICC values were generally good for manual and excellent for digital measurements. The Bland-Altman plots of all the measurements showed good agreement between the manual and digital methods and excellent inter-examiner agreement using the digital method. Inter-arch occlusal measurements on digital models are accurate and reliable and are superior to manual measurements.
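The Bland-Altman comparison mentioned above reduces to a mean difference (bias) and limits of agreement. The sketch below assumes the conventional bias ± 1.96 SD limits; the measurement values and the name `bland_altman` are hypothetical and not taken from the study.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired measurements in millimetres.
manual_mm = [10.0, 12.0, 11.0, 13.0]
digital_mm = [9.5, 12.5, 10.5, 13.5]
bias, lower, upper = bland_altman(manual_mm, digital_mm)
```

A bias near zero with narrow limits suggests the two methods can be used interchangeably; how narrow is narrow enough is a clinical judgement rather than something the statistic decides.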
Inter-rater reliability of select physical examination procedures in patients with neck pain.

PubMed

Hanney, William J; George, Steven Z; Kolber, Morey J; Young, Ian; Salamh, Paul A; Cleland, Joshua A

2014-07-01

This study evaluated the inter-rater reliability of select examination procedures in patients with neck pain (NP) conducted over a 24- to 48-h period. Twenty-two patients with mechanical NP participated in a standardized examination. One examiner performed standardized examination procedures and a second blinded examiner repeated the procedures 24-48 h later with no treatment administered between examinations. Inter-rater reliability was calculated with the Cohen Kappa and weighted Kappa for ordinal data while continuous level data were calculated using an intraclass correlation coefficient model 2,1 (ICC2,1). Coefficients for categorical variables ranged from poor to moderate agreement (-0.22 to 0.70 Kappa) and coefficients for continuous data ranged from slight to moderate (ICC2,1 0.28-0.74). The standard error of measurement for cervical range of motion ranged from 5.3° to 9.9° while the minimal detectable change ranged from 12.5° to 23.1°. This study is the first to report inter-rater reliability values for select components of the cervical examination in those patients with NP performed 24-48 h after the initial examination. There was considerably less reliability when compared to previous studies, thus clinicians should consider how the passage of time may influence variability in examination findings over a 24- to 48-h period.
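The standard error of measurement (SEM) and minimal detectable change (MDC) reported above are commonly derived from an ICC as shown in this sketch. The formulas are the widely used textbook definitions; the abstract does not state which formula or confidence level the authors applied, so both are assumptions here, and the function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject standard deviation."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM; z = 1.96 gives MDC95, z = 1.645 gives MDC90."""
    return z * math.sqrt(2.0) * sem


# Hypothetical input: SD of 10 degrees and an ICC of 0.75.
sem_example = sem_from_icc(10.0, 0.75)

# Applying both confidence levels to the abstract's rounded SEM range (5.3 to 9.9 degrees).
mdc90 = [minimal_detectable_change(s, z=1.645) for s in (5.3, 9.9)]
mdc95 = [minimal_detectable_change(s, z=1.96) for s in (5.3, 9.9)]
```

With the rounded SEM values, z = 1.645 yields about 12.3° and 23.0°, while z = 1.96 yields about 14.7° and 27.4°. The reported 12.5° to 23.1° sits closer to the 90% level, but this is an inference from rounded figures only.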
Inter-examiner classification reliability of Mechanical Diagnosis and Therapy for extremity problems - Systematic review.

PubMed

Takasaki, Hiroshi; Okuyama, Kousuke; Rosedale, Richard

2017-02-01

Mechanical Diagnosis and Therapy (MDT) is used in the treatment of extremity problems. Classifying clinical problems is one method of providing effective treatment to a target population. Classification reliability is a key factor to determine the precise clinical problem and to direct an appropriate intervention. To explore inter-examiner reliability of the MDT classification for extremity problems in three reliability designs: 1) vignette reliability using surveys with patient vignettes, 2) concurrent reliability, where multiple assessors decide a classification by observing someone's assessment, 3) successive reliability, where multiple assessors independently assess the same patient at different times. Systematic review with data synthesis in a quantitative format. Agreement of MDT subgroups was examined using the Kappa value, with the operational definition of acceptable reliability set at ≥ 0.6. The level of evidence was determined considering the methodological quality of the studies. Six studies were included and all studies met the criteria for high quality. Kappa values for the vignette reliability design (five studies) were ≥ 0.7. There was data from two cohorts in one study for the concurrent reliability design and the Kappa values ranged from 0.45 to 1.0. Kappa values for the successive reliability design (data from three cohorts in one study) were < 0.6. The current review found strong evidence of acceptable inter-examiner reliability of MDT classification for extremity problems in the vignette reliability design, limited evidence of acceptable reliability in the concurrent reliability design and unacceptable reliability in the successive reliability design. Copyright © 2017 Elsevier Ltd. All rights reserved.
Cardiac valve calcifications on low-dose unenhanced ungated chest computed tomography: inter-observer and inter-examination reliability, agreement and variability.

PubMed

van Hamersvelt, Robbert W; Willemink, Martin J; Takx, Richard A P; Eikendal, Anouk L M; Budde, Ricardo P J; Leiner, Tim; Mol, Christian P; Isgum, Ivana; de Jong, Pim A

2014-07-01

To determine inter-observer and inter-examination variability for aortic valve calcification (AVC) and mitral valve and annulus calcification (MC) in low-dose unenhanced ungated lung cancer screening chest computed tomography (CT). We included 578 lung cancer screening trial participants who were examined by CT twice within 3 months to follow indeterminate pulmonary nodules. On these CTs, AVC and MC were measured in cubic millimetres. One hundred CTs were examined by five observers to determine the inter-observer variability. Reliability was assessed by kappa statistics (κ) and intra-class correlation coefficients (ICCs). Variability was expressed as the mean difference ± standard deviation (SD). Inter-examination reliability was excellent for AVC (κ = 0.94, ICC = 0.96) and MC (κ = 0.95, ICC = 0.90). Inter-examination variability was 12.7 ± 118.2 mm³ for AVC and 31.5 ± 219.2 mm³ for MC. Inter-observer reliability ranged from κ = 0.68 to κ = 0.92 for AVC and from κ = 0.20 to κ = 0.66 for MC. Inter-observer ICC was 0.94 for AVC and ranged from 0.56 to 0.97 for MC. Inter-observer variability ranged from -30.5 ± 252.0 mm³ to 84.0 ± 240.5 mm³ for AVC and from -95.2 ± 210.0 mm³ to 303.7 ± 501.6 mm³ for MC. AVC can be quantified with excellent reliability on ungated unenhanced low-dose chest CT, but manual detection of MC can be subject to substantial inter-observer variability. Lung cancer screening CT may be used for detection and quantification of cardiac valve calcifications. • Low-dose unenhanced ungated chest computed tomography can detect cardiac valve calcifications. • However, calcified cardiac valves are not reported by most radiologists. • Inter-observer and inter-examination variability of aortic valve calcifications is sufficient for longitudinal studies. • Volumetric measurement variability of mitral valve and annulus calcifications is substantial.
Reliability of physical examination tests for the diagnosis of knee disorders: Evidence from a systematic review.

PubMed

Décary, Simon; Ouellet, Philippe; Vendittoli, Pascal-André; Desmeules, François

2016-12-01

Clinicians often rely on physical examination tests to guide them in the diagnostic process of knee disorders. However, reliability of these tests is often overlooked and may influence the consistency of results and overall diagnostic validity. Therefore, the objective of this study was to systematically review evidence on the reliability of physical examination tests for the diagnosis of knee disorders. A structured literature search was conducted in databases up to January 2016. Included studies needed to report reliability measures of at least one physical test for any knee disorder. Methodological quality was evaluated using the QAREL checklist. A qualitative synthesis of the evidence was performed. Thirty-three studies were included with a mean QAREL score of 5.5 ± 0.5. Based on low to moderate quality evidence, the Thessaly test for meniscal injuries reached moderate inter-rater reliability (k = 0.54). Based on moderate to excellent quality evidence, the Lachman for anterior cruciate ligament injuries reached moderate to excellent inter-rater reliability (k = 0.42 to 0.81). Based on low to moderate quality evidence, the Tibiofemoral Crepitus, Joint Line and Patellofemoral Pain/Tenderness, Bony Enlargement and Joint Pain on Movement tests for knee osteoarthritis reached fair to excellent inter-rater reliability (k = 0.29 to 0.93). Based on low to moderate quality evidence, the Lateral Glide, Lateral Tilt, Lateral Pull and Quality of Movement tests for patellofemoral pain reached moderate to good inter-rater reliability (k = 0.49 to 0.73). Many physical tests appear to reach good inter-rater reliability, but this is based on low-quality and conflicting evidence. High-quality research is required to evaluate the reliability of knee physical examination tests. Copyright © 2016 Elsevier Ltd. All rights reserved.
Inter-rater reliability and generalizability of patient note scores using a scoring rubric based on the USMLE Step-2 CS format.

PubMed

Park, Yoon Soo; Hyderi, Abbas; Bordage, Georges; Xing, Kuan; Yudkowsky, Rachel

2016-10-01

Recent changes to the patient note (PN) format of the United States Medical Licensing Examination have challenged medical schools to improve the instruction and assessment of students taking the Step-2 clinical skills examination. The purpose of this study was to gather validity evidence regarding response process and internal structure, focusing on inter-rater reliability and generalizability, to determine whether a locally developed PN scoring rubric and scoring guidelines could yield reproducible PN scores. A randomly selected subsample of historical data (post-encounter PN from 55 of 177 medical students) was rescored by six trained faculty raters in November-December 2014. Inter-rater reliability (% exact agreement and kappa) was calculated for five standardized patient cases administered in a local graduation competency examination. Generalizability studies were conducted to examine the overall reliability. Qualitative data were collected through surveys and a rater-debriefing meeting. The overall inter-rater reliability (weighted kappa) was .79 (Documentation = .63, Differential Diagnosis = .90, Justification = .48, and Workup = .54). The majority of score variance was due to case specificity (13%) and case-task specificity (31%), indicating differences in student performance by case and by case-task interactions. Variance associated with raters and their interactions was modest (<5%). Raters felt that justification was the most difficult task to score and that having case- and level-specific scoring guidelines during training was most helpful for calibration. The overall inter-rater reliability indicates a high level of confidence in the consistency of note scores. Designs for scoring notes may optimize reliability by balancing the number of raters and cases.
The reliability of knee joint position testing using electrogoniometry

PubMed Central

Piriyaprasarth, Pagamas; Morris, Meg E; Winter, Adele; Bialocerkowski, Andrea E

2008-01-01

Background: The current investigation examined the inter- and intra-tester reliability of knee joint angle measurements using a flexible Penny and Giles Biometric® electrogoniometer. The clinical utility of electrogoniometry was also addressed. Methods: The first study examined the inter- and intra-tester reliability of measurements of knee joint angles in supine, sitting and standing in 35 healthy adults. The second study evaluated inter-tester and intra-tester reliability of knee joint angle measurements in standing and after walking 10 metres in 20 healthy adults, using an enhanced measurement protocol with a more detailed electrogoniometer attachment procedure. Both inter-tester reliability studies involved two testers. Results: In the first study, inter-tester reliability (ICC[2,10]) ranged from 0.58–0.71 in supine, 0.68–0.79 in sitting and 0.57–0.80 in standing. The standard error of measurement between testers was less than 3.55° and the limits of agreement ranged from -12.51° to 12.21°. Reliability coefficients for intra-tester reliability (ICC[3,10]) ranged from 0.75–0.76 in supine, 0.86–0.87 in sitting and 0.87–0.88 in standing. The standard error of measurement for repeated measures by the same tester was less than 1.7° and the limits of agreement ranged from -8.13° to 7.90°. The second study showed that using a more detailed electrogoniometer attachment protocol reduced the error of measurement between testers to 0.5°. Conclusion: Using a standardised protocol, reliable measures of knee joint angles can be gained in standing, supine and sitting by using a flexible goniometer. PMID:18211714
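The ICC model 2 and model 3 coefficients cited above can be computed from two-way ANOVA mean squares, as in this minimal sketch using the standard two-way formulas. The score matrix is illustrative only, and the function name is hypothetical. How the abstract's second index (10) maps onto this layout is not stated, so the sketch simply uses the number of columns k for the averaged forms.

```python
def icc_two_way(scores):
    """ICC(2,1), ICC(2,k), ICC(3,1), ICC(3,k) for an n-subjects x k-raters matrix."""
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)             # between-subjects mean square
    jms = ss_cols / (k - 1)             # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))  # residual mean square

    return {
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(3,k)": (bms - ems) / bms,
    }


# Illustrative scores: six subjects (rows) rated by four raters (columns).
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_two_way(example_scores)
```

Model 2 treats raters as a random sample and penalises systematic offsets between them, while model 3 treats the raters as fixed, which is why the model 3 values are higher for this matrix.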
Harmonization Process and Reliability Assessment of Anthropometric Measurements in the Elderly EXERNET Multi-Centre Study

PubMed Central

Gómez-Cabello, Alba; Vicente-Rodríguez, Germán; Albers, Ulrike; Mata, Esmeralda; Rodriguez-Marroyo, Jose A.; Olivares, Pedro R.; Gusi, Narcis; Villa, Gerardo; Aznar, Susana; Gonzalez-Gross, Marcela; Casajús, Jose A.; Ara, Ignacio

2012-01-01

Background: The elderly EXERNET multi-centre study aims to collect normative anthropometric data for older, functionally independent adults living in Spain. Purpose: To describe the standardization process and reliability of the anthropometric measurements carried out in the pilot study and during the final workshop, examining both intra- and inter-rater errors for measurements. Materials and Methods: A total of 98 elderly participants from five different regions took part in the intra-rater error assessment, and 10 different seniors living in the city of Toledo (Spain) participated in the inter-rater assessment. We examined both intra- and inter-rater errors for heights and circumferences. Results: For height, intra-rater technical errors of measurement (TEMs) were smaller than 0.25 cm. For circumferences and knee height, TEMs were smaller than 1 cm, except for waist circumference in the city of Cáceres. Reliability for heights and circumferences was greater than 98% in all cases. Inter-rater TEMs were 0.61 cm for height, 0.75 cm for knee-height and ranged between 2.70 and 3.09 cm for the circumferences measured. Inter-rater reliabilities for anthropometric measurements were always higher than 90%. Conclusion: The harmonization process, including the workshop and pilot study, guarantees the quality of the anthropometric measurements in the elderly EXERNET multi-centre study. High reliability and low TEM may be expected when assessing anthropometry in the elderly population. PMID:22860013
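The technical error of measurement (TEM) and percentage reliability reported above are often computed as in the sketch below. It assumes the common two-measurement definition, TEM = sqrt(Σd² / 2n), and reliability R = 1 − TEM² / SD²; the abstract does not give its formulas, and the height values and function names are hypothetical.

```python
import math
import statistics


def technical_error(first, second):
    """TEM for two repeated measurements per subject: sqrt(sum(d^2) / (2n))."""
    n = len(first)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))


def reliability_percent(tem, all_values):
    """R = 100 * (1 - TEM^2 / SD^2), with SD taken over all measurements."""
    sd = statistics.stdev(all_values)
    return 100.0 * (1.0 - tem ** 2 / sd ** 2)


# Hypothetical repeated height measurements in centimetres.
height_first = [160.0, 170.0, 180.0, 165.0]
height_second = [160.2, 169.8, 180.2, 164.8]
tem = technical_error(height_first, height_second)
reliability = reliability_percent(tem, height_first + height_second)
```

Because R compares measurement error with between-subject spread, a fixed TEM looks more reliable in a heterogeneous sample than in a homogeneous one, which is worth keeping in mind when comparing percentages across studies.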
Intra- and inter-tester reliability and validity of normal finger size measurement using the Japanese ring gauge system.

PubMed

Suzuki, T; Sato, Y; Sotome, S; Arai, H; Arai, A; Yoshida, H

2017-06-01

This study was designed to investigate the reliability and validity of measurements of finger diameters with a ring gauge. A reliability study enrolled two independent samples (50 participants and seven examiners in Study I; 26 participants and 26 examiners in Study II). The sizes of each participant's little fingers were measured twice with a ring gauge by each examiner. To investigate the validity of the measurements, five hand therapists compared the finger size and hand volume of 30 participants with the ring gauge and with a figure-of-eight technique (Study III). The intra-class correlation coefficient for intra-observer reliability ranged from 0.97 to 0.99 in Study I, and 0.90 to 0.97 in Study II. The intra-class correlation coefficient for inter-observer reliability was 0.95 in Study I and 0.94 in Study II. The validity study showed a Pearson product moment correlation coefficient of 0.75. The ring gauge showed high reliability and validity for measurement of finger size. Level of evidence: III, diagnostic.
Effect of image resolution manipulation in rearfoot angle measurements obtained with photogrammetry

PubMed Central

Sacco, I.C.N.; Picon, A.P.; Ribeiro, A.P.; Sartor, C.D.; Camargo-Junior, F.; Macedo, D.O.; Mori, E.T.T.; Monte, F.; Yamate, G.Y.; Neves, J.G.; Kondo, V.E.; Aliberti, S.

2012-01-01

The aim of this study was to investigate the influence of image resolution manipulation on the photogrammetric measurement of the rearfoot static angle. The study design was that of a reliability study. We evaluated 19 healthy young adults (11 females and 8 males). The photographs were taken at 1536 pixels in the greatest dimension, resized into four different resolutions (1200, 768, 600, 384 pixels) and analyzed by three equally trained examiners on a 96-pixels per inch (ppi) screen. An experienced physiotherapist marked the anatomic landmarks of rearfoot static angles on two occasions within a 1-week interval. Three different examiners had marked angles on digital pictures. The systematic error and the smallest detectable difference were calculated from the angle values between the image resolutions and times of evaluation. Different resolutions were compared by analysis of variance. Inter- and intra-examiner reliability was calculated by intra-class correlation coefficients (ICC). The rearfoot static angles obtained by the examiners in each resolution were not different (P > 0.05); however, the higher the image resolution the better the inter-examiner reliability. The intra-examiner reliability (within a 1-week interval) was considered to be unacceptable for all image resolutions (ICC range: 0.08-0.52). The whole body image of an adult with a minimum size of 768 pixels analyzed on a 96-ppi screen can provide very good inter-examiner reliability for photogrammetric measurements of rearfoot static angles (ICC range: 0.85-0.92), although the intra-examiner reliability within each resolution was not acceptable. Therefore, this method is not a proper tool for follow-up evaluations of patients within a therapeutic protocol. PMID:22911379
IRR (Inter-Rater Reliability) of a COP (Classroom Observation Protocol)--A Critical Appraisal

ERIC Educational Resources Information Center

Rui, Ning; Feldman, Jill M.

2012-01-01

Notwithstanding broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded striving readers…
Test-retest reliability of myofascial trigger point detection in hip and thigh areas.

PubMed

Rozenfeld, E; Finestone, A S; Moran, U; Damri, E; Kalichman, L

2017-10-01

Myofascial trigger points (MTrPs) are a primary source of pain in patients with musculoskeletal disorders. Nevertheless, they are frequently underdiagnosed. Reliable MTrP palpation is necessary for their diagnosis and treatment. The few studies that have looked at intra-tester reliability of MTrP detection in the upper body provide preliminary evidence that MTrP palpation is reliable. Reliability tests for MTrP palpation on the lower limb have not yet been performed. To evaluate inter- and intra-tester reliability of MTrP recognition in hip and thigh muscles. Reliability study. 21 patients (15 males and 6 females, mean age 21.1 years) referred to the physical therapy clinic, 10 with knee or hip pain and 11 with pain in an upper limb, low back, shin or ankle. Two experienced physical therapists performed the examinations, blinded to the subjects' identity, medical condition and results of the previous MTrP evaluation. Each subject was evaluated four times, twice by each examiner in a random order. Dichotomous findings included a palpable taut band, tenderness, referred pain, and relevance of referred pain to patient's complaint. Based on these, diagnosis of latent MTrPs or active MTrPs was established. The evaluation was performed on both legs and included a total of 16 locations in the following muscles: rectus femoris (proximal), vastus medialis (middle and distal), vastus lateralis (middle and distal) and gluteus medius (anterior, posterior and distal). Inter- and intra-tester reliability (Cohen's kappa (κ)) values for single sites ranged from -0.25 to 0.77. Median intra-tester reliability was 0.45 and 0.46 for latent and active MTrPs, and median inter-tester reliability was 0.51 and 0.64 for latent and active MTrPs, respectively. The examination of the distal vastus medialis was most reliable for latent and active MTrPs (intra-tester κ = 0.27-0.77, inter-tester κ = 0.77 and intra-tester κ = 0.53-0.72, inter-tester κ = 0.72, respectively).
Inter- and intra-tester reliability of active and latent MTrP evaluation was moderate to substantial. Palpation evaluation can be used for clinical diagnosis of MTrPs in the hip and thigh muscles. This study provides evidence that MTrP palpation is a moderately reliable diagnostic tool in the hip and thigh muscles and can be used in clinical practice and research. Copyright © 2017 Elsevier Ltd. All rights reserved.
Assessment of Lower Limb Muscle Strength and Power Using Hand-Held and Fixed Dynamometry: A Reliability and Validity Study

PubMed Central

Perraton, Luke G.; Bower, Kelly J.; Adair, Brooke; Pua, Yong-Hao; Williams, Gavin P.; McGaw, Rebekah

2015-01-01

Introduction: Hand-held dynamometry (HHD) has never previously been used to examine isometric muscle power. Rate of force development (RFD) is often used for muscle power assessment; however, no consensus currently exists on the most appropriate method of calculation. The aim of this study was to examine the reliability of different algorithms for RFD calculation and to examine the intra-rater, inter-rater, and inter-device reliability of HHD as well as the concurrent validity of HHD for the assessment of isometric lower limb muscle strength and power. Methods: 30 healthy young adults (age: 23 ± 5 years, male: 15) were assessed on two sessions. Isometric muscle strength and power were measured using peak force and RFD respectively using two HHDs (Lafayette Model-01165 and Hoggan microFET2) and a criterion-reference KinCom dynamometer. Statistical analysis of reliability and validity comprised intraclass correlation coefficients (ICC), Pearson correlations, concordance correlations, standard error of measurement, and minimal detectable change. Results: Comparison of RFD methods revealed that a peak 200 ms moving window algorithm provided optimal reliability results. Intra-rater, inter-rater, and inter-device reliability analysis of peak force and RFD revealed mostly good to excellent reliability (coefficients ≥ 0.70) for all muscle groups. Concurrent validity analysis showed moderate to excellent relationships between HHD and fixed dynamometry for the hip and knee (ICCs ≥ 0.70) for both peak force and RFD, with mostly poor to good results shown for the ankle muscles (ICCs = 0.31–0.79). Conclusions: Hand-held dynamometry has good to excellent reliability and validity for most measures of isometric lower limb strength and power in a healthy population, particularly for proximal muscle groups. To aid implementation we have created freely available software to extract these variables from data stored on the Lafayette device.
Future research should examine the reliability and validity of these variables in clinical populations. PMID:26509265
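One plausible reading of the "peak 200 ms moving window" RFD calculation named in the abstract above is sketched here: slide a fixed-length window along the force trace and keep the steepest average slope. The abstract does not define the algorithm, so this interpretation, the function name `peak_rfd`, the 100 Hz sampling rate and the synthetic force trace are all assumptions, and the sketch is unrelated to the authors' software.

```python
def peak_rfd(force_newtons, sample_rate_hz, window_ms=200.0):
    """Largest average slope (N/s) over any window of the given duration."""
    step = int(round(window_ms / 1000.0 * sample_rate_hz))
    if step < 1 or len(force_newtons) <= step:
        raise ValueError("force trace is shorter than one window")
    window_s = step / sample_rate_hz
    # Average slope across each window is (end force - start force) / duration.
    return max(
        (force_newtons[i + step] - force_newtons[i]) / window_s
        for i in range(len(force_newtons) - step)
    )


# Synthetic trace at an assumed 100 Hz: rest, a 500 N/s ramp, then a plateau.
trace = [0.0] * 10 + [5.0 * i for i in range(1, 41)] + [200.0] * 10
rfd = peak_rfd(trace, sample_rate_hz=100.0)
```

Taking the peak over moving windows avoids having to pick a contraction onset by hand, which is one reason such an approach might be more repeatable than onset-based RFD measures.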
Inter-rater reliability of three standardized functional tests in patients with low back pain

PubMed Central

Tidstrand, Johan; Horneij, Eva

2009-01-01

Background: Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis, which several authors have characterized by delayed muscular responses, impaired postural control and impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests examining functional muscular coordination that are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of the present study was therefore to standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods: Nineteen consecutive individuals, ten men and nine women, were included (mean age 42 years, SD ± 12 years). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results: The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift: good (κ: 0.61) and moderate (κ: 0.47). Conclusion: The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath-ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good.
Validation of the tests in their ability to evaluate lumbar stability is required. PMID:19490644
The intra- and inter-assessor reliability of measurement of functional outcome by lameness scoring in horses.

PubMed

Fuller, Catherine J; Bladon, Bruce M; Driver, Adam J; Barr, Alistair R S

2006-03-01

The objective of this study was to assess the reliability of lameness scoring in horses. One veterinary surgeon examined nineteen lame horses on four occasions. Gait was recorded by camcorder, and scored from 0 to 10 ranging from sound to non-weight bearing lameness. A global score of overall change in lameness during the study was also determined for each horse. To measure intra-assessor reliability of the scoring systems, one veterinary surgeon scored videotapes of the horses' gaits on two occasions. To measure inter-assessor reliability, three veterinary surgeons viewed the videotapes, assigning individual lameness scores plus global scores to each horse. Reliability of individual lameness scoring was good intra-assessor, but only just within our acceptable limit inter-assessor. However, global scoring of change in lameness throughout the study was found to be reliable overall. Since clinician scoring is commonly used to assess lameness in horses, this is an important finding, fundamental to future clinical studies.
Hand assessment in older adults with musculoskeletal hand problems: a reliability study.

PubMed

Myers, Helen L; Thomas, Elaine; Hay, Elaine M; Dziedzic, Krysia S

2011-01-07

Musculoskeletal hand pain is common in the general population. This study aims to investigate the inter- and intra-observer reliability of two trained observers conducting a simple clinical interview and physical examination for hand problems in older adults. The reliability of applying the American College of Rheumatology (ACR) criteria for hand osteoarthritis to community-dwelling older adults will also be investigated. Fifty-five participants aged 50 years and over with a current self-reported hand problem and registered with one general practice were recruited from a previous health questionnaire study. Participants underwent a standardised, structured clinical interview and physical examination by two independent trained observers and again by one of these observers a month later. Agreement beyond chance was summarised using Kappa statistics and intra-class correlation coefficients. Median values for inter- and intra-observer reliability for clinical interview questions were found to be "substantial" and "moderate" respectively [median agreement beyond chance (Kappa) was 0.75 (range: -0.03, 0.93) for inter-observer ratings and 0.57 (range: -0.02, 1.00) for intra-observer ratings]. Inter- and intra-observer reliability for physical examination items was variable, with good reliability observed for some items, such as grip and pinch strength, and poor reliability observed for others, notably assessment of altered sensation, pain on resisted movement and judgements based on observation and palpation of individual features at single joints, such as bony enlargement, nodes and swelling. Moderate agreement was observed both between and within observers when applying the ACR criteria for hand osteoarthritis. Standardised, structured clinical interview is reliable for taking a history in community-dwelling older adults with self reported hand problems. Agreement between and within observers for physical examination items is variable. 
Low Kappa values may have resulted, in part, from a low prevalence of clinical signs and symptoms in the study participants. The decision to use clinical interview and hand assessment variables in clinical practice or further research in primary care should include consideration of clinical applicability and training alongside reliability. Further investigation is required to determine the relationship between these clinical questions and assessments and the clinical course of hand pain and hand problems in community-dwelling older adults.
Comparing Global Positioning System (GPS) and Global Navigation Satellite System (GNSS) Measures of Team Sport Movements.

PubMed

Jackson, Benjamin M; Polglaze, Ted; Dawson, Brian; King, Trish; Peeling, Peter

2018-02-21

To compare data from conventional GPS and new GNSS-enabled tracking devices, and to examine the inter-unit reliability of GNSS devices. Inter-device differences between 10 Hz GPS and GNSS devices were examined during laps (n=40) of a simulated game circuit (SGC) and during elite hockey matches (n=21); GNSS inter-unit reliability was also examined during the SGC laps. Differences in distance values and measures in three velocity categories (low <3 m·s⁻¹; moderate 3-5 m·s⁻¹; high >5 m·s⁻¹) and acceleration/deceleration counts (>1.46 m·s⁻² and <-1.46 m·s⁻²) were examined using one-way ANOVA. Inter-unit GNSS reliability was examined using the coefficient of variation (CV) and intra-class correlation coefficient (ICC). Inter-device differences (P <0.05) were found for measures of peak deceleration, low-speed distance, % total distance at low speed, and deceleration count during the SGC, and for all measures except total distance and low-speed distance during hockey matches. Inter-unit (GNSS) differences (P <0.05) were not found. The CV was below 5% for total distance, average and peak speeds and distance and % total distance of low-speed running. The GNSS devices had a lower HDoP score than GPS devices in all conditions. These findings suggest that GNSS devices may be more sensitive than GPS in quantifying the physical demands of team sport movements, but further study into the accuracy of GNSS devices is required.
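The coefficient of variation used above for inter-unit reliability is the standard deviation expressed as a percentage of the mean. The sketch below assumes the sample standard deviation and uses hypothetical distances; the abstract does not state how the CV was computed across units.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(values) / m


# Hypothetical total distances (m) reported by four units on the same lap.
lap_distances = [498.0, 502.0, 505.0, 495.0]
cv = coefficient_of_variation(lap_distances)
print(f"CV = {cv:.2f}%")
```

With these made-up values the CV is below 1%, which would fall under the 5% level mentioned in the abstract.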

Inter-clinician and intra-clinician reliability of force application during joint mobilization: a systematic review.

PubMed

Gorgos, Kara S; Wasylyk, Nicole T; Van Lunen, Bonnie L; Hoch, Matthew C

2014-04-01

Joint mobilizations are commonly used by clinicians to decrease pain and restore joint arthrokinematics following musculoskeletal injury. The force applied during a joint mobilization treatment is subjective to the individual clinician but may have an effect on patient outcomes. The purpose of this systematic review was to critically appraise and synthesize the studies which examined the reliability of clinicians' force application during joint mobilization. A systematic search of PubMed and EBSCO Host databases from inception to March 1, 2013 was conducted to identify studies assessing the reliability of force application during joint mobilizations. Two reviewers utilized the Quality Appraisal of Reliability Studies (QAREL) assessment tool to determine the quality of included studies. The relative reliability of the included studies was examined through intraclass correlation coefficients (ICC) to synthesize study findings. All results were collated qualitatively with a level of evidence approach. A total of seven studies met the eligibility criteria and were included. Five studies were included that assessed inter-clinician reliability, and six studies were included that assessed intra-clinician reliability. The overall level of evidence for inter-clinician reliability was strong for poor-to-moderate reliability (ICC = -0.04 to 0.70). The overall level of evidence for intra-clinician reliability was strong for good reliability (ICC = 0.75-0.99). This systematic review indicates there is variability in force application between clinicians but individual clinicians apply forces consistently. The results of this systematic review suggest innovative instructional methods are needed to improve consistency and validate the forces applied during joint mobilization treatments. This is particularly evident for improving the consistency of force application across clinicians. Copyright © 2014 Elsevier Ltd. All rights reserved.
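For readers unfamiliar with the intraclass correlation coefficient, the sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single measurement), from a subjects-by-raters table. The review does not state which ICC form each included study used, so this choice, the function name and the scores are assumptions for illustration only.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject, one column per rater."""
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores for six subjects rated by four raters (not review data).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))
```

Because this form penalizes systematic differences between raters, raters who rank subjects similarly but apply consistently different force levels would still obtain a low value.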
Trunk Muscle Size and Composition Assessment in Older Adults with Chronic Low Back Pain: An Intra-Examiner and Inter-Examiner Reliability Study.

PubMed

Sions, Jaclyn Megan; Smith, Andrew Craig; Hicks, Gregory Evan; Elliott, James Matthew

2016-08-01

To evaluate intra- and inter-examiner reliability for the assessment of relative cross-sectional area, muscle-to-fat infiltration indices, and relative muscle cross-sectional area, i.e., total cross-sectional area minus intramuscular fat, from T1-weighted magnetic resonance images obtained in older adults with chronic low back pain. Reliability study. n = 13 (69.3 ± 8.2 years old). After lumbar magnetic resonance imaging, two examiners produced relative cross-sectional area measurements of multifidi, erector spinae, psoas, and quadratus lumborum by tracing regions of interest just inside fascial borders. Pixel-intensity summaries were used to determine muscle-to-fat infiltration indices; relative muscle cross-sectional area was calculated. Intraclass correlation coefficients were used to estimate intra- and inter-examiner reliability; standard error of measurement was calculated. Intra-examiner intraclass correlation coefficient point estimates for relative cross-sectional area, muscle-to-fat infiltration indices, and relative muscle cross-sectional area were excellent for multifidi and erector spinae across levels L2-L5 (ICC = 0.77-0.99). At L3, intra-examiner reliability was excellent for relative cross-sectional area, muscle-to-fat infiltration indices, and relative muscle cross-sectional area for both psoas and quadratus lumborum (ICC = 0.81-0.99). Inter-examiner intraclass correlation coefficients ranged from poor to excellent for relative cross-sectional area, muscle-to-fat infiltration indices, and relative muscle cross-sectional area. Assessment of relative cross-sectional area, muscle-to-fat infiltration indices, and relative muscle cross-sectional area in older adults with chronic low back pain can be reliably determined by one examiner from T1-weighted images.
Such assessments provide valuable information, as muscle-to-fat infiltration indices and relative muscle cross-sectional area indicate that a substantial amount of relative cross-sectional area may be magnetic resonance-visible intramuscular fat in older adults with chronic low back pain. © 2015 American Academy of Pain Medicine. All rights reserved.
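The abstract reports a standard error of measurement (SEM) without giving the formula. A commonly used definition is SEM = SD × √(1 − ICC); the sketch below assumes that definition and uses hypothetical measurements and a hypothetical ICC value.

```python
import math
from statistics import stdev


def standard_error_of_measurement(measurements, icc):
    """SEM = sample SD of the measurements times sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("this formula expects an ICC between 0 and 1")
    return stdev(measurements) * math.sqrt(1.0 - icc)


# Hypothetical cross-sectional areas (cm^2) and a hypothetical ICC of 0.91.
areas = [10.0, 12.0, 14.0, 16.0, 18.0]
sem = standard_error_of_measurement(areas, icc=0.91)
print(round(sem, 2))
```

The SEM is expressed in the units of the measurement, so it indicates how much of an observed difference could be attributed to measurement error alone.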
Intra- and inter-rater reliability of 3D passive intervertebral motion in subjects with nonspecific neck pain assessed by physical therapy students: A pilot study.

PubMed

Rossettini, Giacomo; Rondoni, Angie; Lovato, Tommaso; Strobe, Marco; Verzè, Elisa; Vicentini, Marco; Testa, Marco

2016-06-03

Passive Intervertebral Movements (PIVMs) are commonly used to assess and treat patients with nonspecific neck pain. Only very few studies have investigated 3D movements until now. This study assessed intra- and inter-rater reliability of three-dimensional (3D) cervical PIVMs performed by physical therapy students in patients with nonspecific neck pain. Thirty-one patients, mean age 47.2 ± 7.2 years, were independently evaluated by 2 physical therapy students. The raters (A and B) assessed mobility, end-feel and pain provocation, performing the 3D cervical segmental side-bending test (3D CSSB) bilaterally from levels C2-C3 to C6-C7. Percentage agreement (raw, positive and negative), Cohen's kappa (95% CI), prevalence index and bias index were calculated to estimate intra- and inter-rater reliability. Intra-rater reliability showed kappa values ranging between fair and substantial (k 0.29-0.80) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 90%. Inter-rater reliability presented kappa values ranging between fair and substantial (k 0.22-0.62) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 80%. Intra-rater reliability of 3D PIVMs was superior to inter-rater reliability in patients with nonspecific neck pain. The most repeatable evaluation parameter was pain. However, the overall poor reliability suggests avoiding the use of these techniques alone to examine patients and measure their outcome. Further studies are needed to investigate the reliability of PIVMs in combination with other assessment procedures in symptomatic patients.
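The indices listed in this abstract (raw, positive and negative agreement, prevalence index, bias index and kappa) can all be derived from a single 2×2 table. The sketch below uses the usual textbook definitions, with absolute values for the two indices; the function name and cell counts are hypothetical and are not taken from the study.

```python
def two_by_two_indices(a, b, c, d):
    """Agreement indices for a 2x2 table of two raters' binary findings.

    a: both raters positive         b: rater A positive, rater B negative
    c: rater A negative, B positive d: both raters negative
    Assumes at least one positive and one negative finding in the table.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")
    p_o = (a + d) / n
    # Chance agreement from the marginal totals of each rater.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return {
        "raw_agreement": p_o,
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
        "prevalence_index": abs(a - d) / n,
        "bias_index": abs(b - c) / n,
        "kappa": (p_o - p_e) / (1 - p_e),
    }


# Hypothetical counts: most findings are positive for both raters.
indices = two_by_two_indices(a=20, b=5, c=3, d=2)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

In this made-up table raw agreement is about 73% while kappa is only about 0.17, which illustrates how a high prevalence index can depress kappa even when the raters often agree.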
Screening of the spine in adolescents: inter- and intra-rater reliability and measurement error of commonly used clinical tests.

PubMed

Aartun, Ellen; Degerfalk, Anna; Kentsdotter, Linn; Hestbaek, Lise

2014-02-10

Evidence on the reliability of clinical tests used for the spinal screening of children and adolescents is currently lacking. The aim of this study was to determine the inter- and intra-rater reliability and measurement error of clinical tests commonly used when screening young spines. Two experienced chiropractors independently assessed 111 adolescents aged 12-14 years who were recruited from a primary school in Denmark. A standardised examination protocol was used to test inter-rater reliability including tests for scoliosis, hypermobility, general mobility, inter-segmental mobility and end range pain in the spine. Seventy-five of the 111 subjects were re-examined after one to four hours to test intra-rater reliability. Percentage agreement and Cohen's Kappa were calculated for binary variables, and intraclass correlation coefficients (ICC) and Bland-Altman plots with Limits of Agreement (LoA) were calculated for continuous measures. Inter-rater percentage agreement for binary data ranged from 59.5% to 100%. Kappa ranged from 0.06 to 1.00. Kappa ≥ 0.40 was seen for elbow, thumb, fifth finger and trunk/hip flexion hypermobility, pain response in inter-segmental mobility and end range pain in lumbar flexion and extension. For continuous data, ICCs ranged from 0.40 to 0.95. Only forward flexion as measured by finger-to-floor distance reached an acceptable ICC (≥ 0.75). Overall, results for intra-rater reliability were better than for inter-rater reliability but for both components, the LoA were quite wide compared with the range of assessments. Some clinical tests showed good, and some tests poor, reliability when applied in a spinal screening of adolescents. The results could probably be improved by additional training and further test standardization. This is the first step in evaluating the value of these tests for the spinal screening of adolescents. Future research should determine the association between these tests and current and/or future neck and back pain.
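The Bland-Altman limits of agreement mentioned above are conventionally the mean difference between two raters plus or minus 1.96 standard deviations of the differences. The sketch below assumes that conventional definition; the function name and the finger-to-floor distances are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(rater_1, rater_2):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(rater_1) != len(rater_2) or len(rater_1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(rater_1, rater_2)]
    bias = mean(diffs)                # systematic difference between raters
    spread = 1.96 * stdev(diffs)      # half-width of the 95% limits
    return bias, bias - spread, bias + spread


# Hypothetical finger-to-floor distances (cm) from two raters.
first = [10, 12, 15, 20, 8]
second = [11, 10, 15, 18, 9]
bias, lower, upper = limits_of_agreement(first, second)
print(f"bias {bias:.2f} cm, LoA {lower:.2f} to {upper:.2f} cm")
```

Wide limits relative to the range of the measurement, as the abstract reports, mean that two raters can differ by a clinically relevant amount on the same subject.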
Hip range of motion and provocative physical examination tests reliability and agreement in asymptomatic volunteers

PubMed Central

Prather, H; Harris-Hayes, M; Hunt, D; Steger-May, K; Mathew, V; Clohisy, JC

2012-01-01

Objective: The objectives of this study are the following: 1) report passive hip ROM in asymptomatic young adults, 2) report the intra-tester and inter-tester reliability of hip ROM measurements among testers of multiple disciplines, 3) report the results of provocative hip tests and tester agreement. Design: Descriptive epidemiology study. Setting: Tertiary university. Participants: Twenty-eight young adult volunteers without musculoskeletal symptoms, history of disorder or surgery involving the lumbar spine or lower extremities were enrolled and completed the study. Methods: Asymptomatic young adult volunteers completed questionnaires and were examined by two blinded examiners during a single session. The testers were physical therapists and physicians. Hip range of motion and provocative tests were completed by both examiners on each hip. Main Outcome Measurements: Inter- and intra-rater reliability for ROM and agreement for provocative tests were determined. Results: Twenty-eight asymptomatic adults with mean age 31 years old (range 18–51 years) and mean modified Harris Hip Score of 99.5 ± 1.5 and UCLA Activity score of 8.8 ± 1.2 completed the study. Intra-rater agreement was excellent for all hip range of motion measurements, with intraclass correlation coefficients (ICCs) ranging from 0.76 to 0.97 with similar agreement if the examiner was a physical therapist or a physician. Excellent inter-rater reliability was found for hip flexion ICC 0.87 (95% CI 0.78 to 0.92), supine internal rotation ICC 0.75 (95% CI 0.60 to 0.84) and prone internal rotation ICC 0.79 (95% CI 0.66 to 0.87). The least reliable measurements were supine hip abduction (ICC 0.34) and supine external rotation (ICC 0.18). Agreement between examiners ranged from 96–100% for provocative hip tests which included the hip impingement, resisted straight leg raise, FABER/Patrick’s and log roll tests.
Conclusions: Specific hip ROM measures show excellent inter-rater reliability and provocative hip tests show good agreement among multiple examiners and medical disciplines. Further studies are needed to assess the utilization of these measurements and tests as a part of a hip screening examination to assess for young adults at risk of intra-articular hip disorders prior to the onset of degenerative changes. PMID:20970757
Reproducibility of African giant pouched rats detecting Mycobacterium tuberculosis.

PubMed

Ellis, Haylee; Mulder, Christiaan; Valverde, Emilio; Poling, Alan; Edwards, Timothy

2017-04-24

African pouched rats sniffing sputum samples provided by local clinics have significantly increased tuberculosis case findings in Tanzania and Mozambique. The objective of this study was to determine the reproducibility of rat results. Over an 18-month period 11,869 samples were examined by the rats. Intra-rater reliability was assessed through Yule's Q. Inter-rater reliability was assessed with Krippendorff's alpha. Intra-rater reliability was high, with a mean Yule's Q of 0.9. Inter-rater agreement was fair, with Krippendorff's alpha ranging from 0.15 to 0.45. Both intra- and inter-rater reliability were independent of the sex of the animals, but they were positively correlated with age. Both intra- and inter-rater agreement were lowest for samples designated as smear-negative by the clinics. Overall, the reproducibility of tuberculosis detection rat results was fair and diagnostic results were therefore independent of the rats used.
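Yule's Q, used above for intra-rater reliability, is computed from a 2×2 table of first versus second evaluations as Q = (ad − bc) / (ad + bc). The sketch below uses hypothetical counts; the cell layout is an assumption about how repeated evaluations of the same samples would be tabulated.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table.

    a: positive on both evaluations   b: positive first, negative second
    c: negative first, positive second d: negative on both evaluations
    """
    numerator = a * d - b * c
    denominator = a * d + b * c
    if denominator == 0:
        raise ValueError("Yule's Q is undefined when ad + bc is zero")
    return numerator / denominator


# Hypothetical counts for one rat evaluating 100 samples twice.
q = yules_q(a=30, b=5, c=3, d=62)
print(round(q, 3))
```

Q ranges from −1 to 1, with values near 1 indicating that the second evaluation almost always repeats the first.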
Measuring the Pain Area: An Intra- and Inter-Rater Reliability Study Using Image Analysis Software.

PubMed

Dos Reis, Felipe Jose Jandre; de Barros E Silva, Veronica; de Lucena, Raphaela Nunes; Mendes Cardoso, Bruno Alexandre; Nogueira, Leandro Calazans

2016-01-01

Pain drawings have frequently been used for clinical information and research. The aim of this study was to investigate intra- and inter-rater reliability of area measurements performed on pain drawings. Our secondary objective was to verify the reliability when using computers with different screen sizes, both with and without mouse hardware. Pain drawings were completed by patients with chronic neck pain or neck-shoulder-arm pain. Four independent examiners participated in the study. Examiners A and B used the same computer with a 16-inch screen and wired mouse hardware. Examiner C used a notebook with a 16-inch screen and no mouse hardware, and Examiner D used a computer with an 11.6-inch screen and a wireless mouse. Image measurements were obtained using GIMP and NIH ImageJ computer programs. The length of all the images was measured using GIMP software to a set scale in ImageJ. Thus, each marked area was encircled and the total surface area (cm²) was calculated for each pain drawing measurement. A total of 117 areas were identified and 52 pain drawings were analyzed. The intrarater reliability between all examiners was high (ICC = 0.989). The inter-rater reliability was also high. No significant differences were observed when using different screen sizes or when using or not using the mouse hardware. This suggests that the precision of these measurements is acceptable for the use of this method as a measurement tool in clinical practice and research. © 2014 World Institute of Pain.
Reliability of specific physical examination tests for the diagnosis of shoulder pathologies: a systematic review and meta-analysis.

PubMed

Lange, Toni; Matthijs, Omer; Jain, Nitin B; Schmitt, Jochen; Lützner, Jörg; Kopkow, Christian

2017-03-01

Shoulder pain is common in the general population; to identify its aetiology, history taking, motion and muscle testing, and physical examination tests are usually performed. The aim of this systematic review was to summarise and evaluate intrarater and inter-rater reliability of physical examination tests in the diagnosis of shoulder pathologies. A comprehensive systematic literature search was conducted using MEDLINE, EMBASE, Allied and Complementary Medicine Database (AMED) and Physiotherapy Evidence Database (PEDro) through 20 March 2015. Methodological quality was assessed using the Quality Appraisal of Reliability Studies (QAREL) tool by 2 independent reviewers. The search strategy revealed 3259 articles, of which 18 finally met the inclusion criteria. These studies evaluated the reliability of 62 tests and test variations used for the specific physical examination tests for the diagnosis of shoulder pathologies. Methodological quality ranged from 2 to 7 positive criteria of the 11 items of the QAREL tool. This review identified a lack of high-quality studies evaluating inter-rater as well as intrarater reliability of specific physical examination tests for the diagnosis of shoulder pathologies. In addition, reliability measures differed between included studies hindering proper cross-study comparisons. PROSPERO CRD42014009018. Published by the BMJ Publishing Group Limited.
Inter-rater reliability of direct observations of the physical and psychosocial working conditions in eldercare: An evaluation in the DOSES project.

PubMed

Karstad, Kristina; Rugulies, Reiner; Skotte, Jørgen; Munch, Pernille Kold; Greiner, Birgit A; Burdorf, Alex; Søgaard, Karen; Holtermann, Andreas

2018-05-01

The aim of the study was to develop and evaluate the reliability of the "Danish observational study of eldercare work and musculoskeletal disorders" (DOSES) observation instrument to assess physical and psychosocial risk factors for musculoskeletal disorders (MSD) in eldercare work. During 1.5 years, sixteen raters conducted 117 inter-rater observations from 11 nursing homes. Reliability was evaluated using percent agreement and Gwet's AC1 coefficient. Of the 18 examined items, inter-rater reliability was excellent for 7 items (AC1 >0.75), fair to good for 7 items (AC1 0.40-0.75) and poor for 2 items (AC1 0-0.40). For 2 items there was no agreement between the raters (AC1 <0). The reliability did not differ between the first and second half of the data collection period and the inter-rater observations were representative regarding occurrence of events in eldercare work. The instrument is appropriate for assessing physical and psychosocial risk factors for MSD among eldercare workers. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
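Gwet's AC1, the coefficient used in this study, has the same form as kappa but estimates chance agreement differently: p_e = Σ π_k(1 − π_k) / (q − 1), where π_k is the mean proportion of ratings in category k across the two raters and q is the number of categories. The sketch below is a two-rater illustration with hypothetical observations; the function name and data are not from the DOSES project.

```python
def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters over a fixed list of rating categories."""
    n = len(rater_a)
    q = len(categories)
    if n == 0 or n != len(rater_b) or q < 2:
        raise ValueError("need equal-length ratings and at least 2 categories")
    # Observed agreement.
    p_a = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Average share of each category across both raters.
    pi = [(rater_a.count(c) + rater_b.count(c)) / (2 * n) for c in categories]
    # Chance agreement as defined for AC1.
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical observations of a frequently occurring event.
a = ["yes"] * 9 + ["no"]
b = ["yes"] * 8 + ["no", "yes"]
ac1 = gwet_ac1(a, b, categories=["yes", "no"])
print(round(ac1, 3))
```

For these made-up ratings the observed agreement is 80% and AC1 is about 0.76, whereas Cohen's kappa on the same data would be slightly negative because both raters say "yes" almost all the time. This sensitivity of kappa to skewed prevalence is one reason AC1 is sometimes preferred.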
Intra- and Inter-Observer Reliability of the Trunk Impairment Scale for Children with Cerebral Palsy

ERIC Educational Resources Information Center

Saether, Rannei; Jorgensen, Lone

2011-01-01

Standardized scales to evaluate qualities of trunk movements in children with dysfunction are sparse. An examination of the reliability of scales that may be useful in the clinic is important. The aim of this study was to examine the reliability of the Trunk Impairment Scale (TIS) for children with cerebral palsy (CP). Standardized scales are…
Indicating spinal joint mobilisations or manipulations in patients with neck or low-back pain: protocol of an inter-examiner reliability study among manual therapists.

PubMed

van Trijffel, Emiel; Lindeboom, Robert; Bossuyt, Patrick Mm; Schmitt, Maarten A; Lucas, Cees; Koes, Bart W; Oostendorp, Rob Ab

2014-01-01

Manual spinal joint mobilisations and manipulations are widely used treatments in patients with neck and low-back pain. Inter-examiner reliability of passive intervertebral motion assessment of the cervical and lumbar spine, perceived as important for indicating these interventions, is poor within a univariable approach. The diagnostic process as a whole in daily practice in manual therapy has a multivariable character, however, in which the use and interpretation of passive intervertebral motion assessment depend on earlier results from the diagnostic process. To date, the inter-examiner reliability among manual therapists of a multivariable diagnostic decision-making process in patients with neck or low-back pain is unknown. This study will be conducted as a repeated-measures design in which 14 pairs of manual therapists independently examine a consecutive series of a planned total of 165 patients with neck or low-back pain presenting in primary care physiotherapy. Primary outcome measure is therapists' decision about whether or not manual spinal joint mobilisations or manipulations, or both, are indicated in each patient, alone or as part of a multimodal treatment. Therapists will largely be free to conduct the full diagnostic process based on their formulated examination objectives. For each pair of therapists, 2×2 tables will be constructed and reliability for the dichotomous decision will be expressed using Cohen's kappa. In addition, observed agreement, prevalence of positive decisions, prevalence index, bias index, and specific agreement in positive and negative decisions will be calculated. Univariable logistic regression analysis of concordant decisions will be performed to explore which demographic, professional, or clinical factors contributed to reliability. 
This study will provide an estimate of the inter-examiner reliability among manual therapists of indicating spinal joint mobilisations or manipulations in patients with neck or low-back pain based on a multivariable diagnostic reasoning and decision-making process, as opposed to reliability of individual tests. As such, it is proposed as an initial step toward the development of an alternative approach to current classification systems and prediction rules for identifying those patients with spinal disorders that may show a better response to manual therapy which can be incorporated in randomised clinical trials. Potential methodological limitations of this study are discussed.
Indicating spinal joint mobilisations or manipulations in patients with neck or low-back pain: protocol of an inter-examiner reliability study among manual therapists

PubMed Central

2014-01-01

Background Manual spinal joint mobilisations and manipulations are widely used treatments in patients with neck and low-back pain. Inter-examiner reliability of passive intervertebral motion assessment of the cervical and lumbar spine, perceived as important for indicating these interventions, is poor within a univariable approach. The diagnostic process as a whole in daily practice in manual therapy has a multivariable character, however, in which the use and interpretation of passive intervertebral motion assessment depend on earlier results from the diagnostic process. To date, the inter-examiner reliability among manual therapists of a multivariable diagnostic decision-making process in patients with neck or low-back pain is unknown. Methods This study will be conducted as a repeated-measures design in which 14 pairs of manual therapists independently examine a consecutive series of a planned total of 165 patients with neck or low-back pain presenting in primary care physiotherapy. Primary outcome measure is therapists’ decision about whether or not manual spinal joint mobilisations or manipulations, or both, are indicated in each patient, alone or as part of a multimodal treatment. Therapists will largely be free to conduct the full diagnostic process based on their formulated examination objectives. For each pair of therapists, 2×2 tables will be constructed and reliability for the dichotomous decision will be expressed using Cohen’s kappa. In addition, observed agreement, prevalence of positive decisions, prevalence index, bias index, and specific agreement in positive and negative decisions will be calculated. Univariable logistic regression analysis of concordant decisions will be performed to explore which demographic, professional, or clinical factors contributed to reliability. 
Discussion This study will provide an estimate of the inter-examiner reliability among manual therapists of indicating spinal joint mobilisations or manipulations in patients with neck or low-back pain based on a multivariable diagnostic reasoning and decision-making process, as opposed to reliability of individual tests. As such, it is proposed as an initial step toward the development of an alternative approach to current classification systems and prediction rules for identifying those patients with spinal disorders that may show a better response to manual therapy which can be incorporated in randomised clinical trials. Potential methodological limitations of this study are discussed. PMID:24982754
Does a Rater's Familiarity with a Candidate's Pronunciation Affect the Rating in Oral Proficiency Interviews?

ERIC Educational Resources Information Center

Carey, Michael D.; Mannell, Robert H.; Dunn, Peter K.

2011-01-01

This study investigated factors that could affect inter-examiner reliability in the pronunciation assessment component of speaking tests. We hypothesized that the rating of pronunciation is susceptible to variation in assessment due to the amount of exposure examiners have to nonnative English accents. An inter-rater variability analysis was…
Examiner Training and Reliability in Two Randomized Clinical Trials of Adult Dental Caries

PubMed Central

Banting, David W.; Amaechi, Bennett T.; Bader, James D.; Blanchard, Peter; Gilbert, Gregg H.; Gullion, Christina M.; Holland, Jan Carlton; Makhija, Sonia K.; Papas, Athena; Ritter, André V.; Singh, Mabi L.; Vollmer, William M.

2013-01-01

Objectives This report describes the training of dental examiners participating in two dental caries clinical trials and reports the inter- and intra-examiner reliability scores from the initial standardization sessions. Methods Study examiners were trained to use a modified ICDAS-II system to detect the visual signs of non-cavitated and cavitated dental caries in adult subjects. Dental caries was classified as no caries (S), non-cavitated caries (D1), enamel caries (D2) and dentine caries (D3). Three standardization sessions involving 60 subjects and 3604 tooth surface calls were used to calculate several measures of examiner reliability. Results The prevalence of dental caries observed in the standardization sessions ranged from 1.4% to 13.5% of the coronal tooth surfaces examined. Overall agreement between pairs of examiners ranged from 0.88 to 0.99. An intra-class coefficient threshold of 0.60 was surpassed for all but one examiner. Inter-examiner unweighted kappa values were low (0.23–0.35) but weighted kappas and the ratio of observed to maximum kappas were more encouraging (0.42–0.83). The highest kappa values occurred for the S/D1 vs. D2/D3 two-level classification of dental caries, for which seven of the eight examiners achieved observed to maximum kappa values over 0.90. Intra-examiner reliability was notably higher than inter-examiner reliability for all measures and dental caries classification systems employed. Conclusion The methods and results for the initial examiner training and standardization sessions for two large clinical trials are reported. Recommendations for others planning examiner training and standardization sessions are offered. PMID:22320292
Examiner training and reliability in two randomized clinical trials of adult dental caries.

PubMed

Banting, David W; Amaechi, Bennett T; Bader, James D; Blanchard, Peter; Gilbert, Gregg H; Gullion, Christina M; Holland, Jan Carlton; Makhija, Sonia K; Papas, Athena; Ritter, André V; Singh, Mabi L; Vollmer, William M

2011-01-01

This report describes the training of dental examiners participating in two dental caries clinical trials and reports the inter- and intra-examiner reliability scores from the initial standardization sessions. Study examiners were trained to use a modified International Caries Detection and Assessment System II system to detect the visual signs of non-cavitated and cavitated dental caries in adult subjects. Dental caries was classified as no caries (S), non-cavitated caries (D1), enamel caries (D2), and dentine caries (D3). Three standardization sessions involving 60 subjects and 3,604 tooth surface calls were used to calculate several measures of examiner reliability. The prevalence of dental caries observed in the standardization sessions ranged from 1.4 percent to 13.5 percent of the coronal tooth surfaces examined. Overall agreement between pairs of examiners ranged from 0.88 to 0.99. An intra-class coefficient threshold of 0.60 was surpassed for all but one examiner. Inter-examiner unweighted kappa values were low (0.23-0.35), but weighted kappas and the ratio of observed to maximum kappas were more encouraging (0.42-0.83). The highest kappa values occurred for the S/D1 versus D2/D3 two-level classification of dental caries, for which seven of the eight examiners achieved observed to maximum kappa values over 0.90. Intra-examiner reliability was notably higher than inter-examiner reliability for all measures and dental caries classifications employed. The methods and results for the initial examiner training and standardization sessions for two large clinical trials are reported. Recommendations for others planning examiner training and standardization sessions are offered. © 2011 American Association of Public Health Dentistry.
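The abstract distinguishes unweighted kappa, weighted kappa and the ratio of observed to maximum kappa. The sketch below computes all three from a square cross-classification table. It assumes linear disagreement weights and the usual definition of maximum kappa (the largest agreement attainable given each examiner's marginal totals); the abstract does not state the weighting scheme, and the three-category table is hypothetical.

```python
def kappa_summary(table):
    """Kappa statistics for table[i][j]: count rated i by A and j by B."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    p_o = sum(table[i][i] for i in range(k)) / n
    p_e = sum(row_p[i] * col_p[i] for i in range(k))
    kappa = (p_o - p_e) / (1 - p_e)

    # Largest observed agreement possible with these marginal totals.
    p_max = sum(min(row_p[i], col_p[i]) for i in range(k))
    kappa_max = (p_max - p_e) / (1 - p_e)

    # Linear weights: full credit on the diagonal, partial credit nearby.
    def weight(i, j):
        return 1 - abs(i - j) / (k - 1)

    po_w = sum(weight(i, j) * table[i][j]
               for i in range(k) for j in range(k)) / n
    pe_w = sum(weight(i, j) * row_p[i] * col_p[j]
               for i in range(k) for j in range(k))
    kappa_w = (po_w - pe_w) / (1 - pe_w)

    return {
        "kappa": kappa,
        "kappa_max": kappa_max,
        "ratio_to_max": kappa / kappa_max,
        "weighted_kappa": kappa_w,
    }


# Hypothetical counts for three ordered caries categories (not trial data).
table = [
    [40, 8, 2],
    [4, 20, 6],
    [1, 4, 15],
]
summary = kappa_summary(table)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Weighted kappa exceeds unweighted kappa in this example because most disagreements fall in adjacent categories, which the linear weights treat as partial agreement.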
Validity and reliability of a low-cost digital dynamometer for measuring isometric strength of lower limb.

PubMed

Romero-Franco, Natalia; Jiménez-Reyes, Pedro; Montaño-Munuera, Juan A

2017-11-01

Lower limb isometric strength is a key parameter to monitor the training process or recognise muscle weakness and injury risk. However, valid and reliable methods to evaluate it often require high-cost tools. The aim of this study was to analyse the concurrent validity and reliability of a low-cost digital dynamometer for measuring isometric strength in the lower limb. Eleven physically active and healthy participants performed maximal isometric contractions for flexion and extension of the ankle; flexion and extension of the knee; and flexion, extension, adduction, abduction, and internal and external rotation of the hip. Data obtained with the digital dynamometer were compared with those from an isokinetic dynamometer to examine its concurrent validity. Data obtained with the digital dynamometer by 2 different evaluators and in 2 different sessions were compared to examine its inter-rater and intra-rater reliability. Intra-class correlation (ICC) for validity was excellent in every movement (ICC > 0.9). Intra- and inter-tester reliability was excellent for all the movements assessed (ICC > 0.75). The low-cost digital dynamometer demonstrated strong concurrent validity and excellent intra- and inter-tester reliability for assessing isometric strength in the main lower limb movements.
Reliability of Single-Leg Balance and Landing Tests in Rugby Union; Prospect of Using Postural Control to Monitor Fatigue

PubMed Central

Troester, Jordan C.; Jasmin, Jason G.; Duffield, Rob

2018-01-01

The present study examined the inter-trial (within test) and inter-test (between test) reliability of single-leg balance and single-leg landing measures performed on a force plate in professional rugby union players using commercially available software (SpartaMARS, Menlo Park, USA). Twenty-four players undertook test – re-test measures on two occasions (7 days apart) on the first training day of two respective pre-season weeks following 48h rest and similar weekly training loads. Two 20s single-leg balance trials were performed on a force plate with eyes closed. Three single-leg landing trials were performed by jumping off two feet and landing on one foot in the middle of a force plate 1m from the starting position. Single-leg balance results demonstrated acceptable inter-trial reliability (ICC = 0.60-0.81, CV = 11-13%) for sway velocity, anterior-posterior sway velocity, and mediolateral sway velocity variables. Acceptable inter-test reliability (ICC = 0.61-0.89, CV = 7-13%) was evident for all variables except mediolateral sway velocity on the dominant leg (ICC = 0.41, CV = 15%). Single-leg landing results only demonstrated acceptable inter-trial reliability for force based measures of relative peak landing force and impulse (ICC = 0.54-0.72, CV = 9-15%). Inter-test results indicate improved reliability through the averaging of three trials with force based measures again demonstrating acceptable reliability (ICC = 0.58-0.71, CV = 7-14%). Of the variables investigated here, total sway velocity and relative landing impulse are the most reliable measures of single-leg balance and landing performance, respectively. These measures should be considered for monitoring potential changes in postural control in professional rugby union. Key points Single-leg balance demonstrated acceptable inter-trial and inter-test reliability. 
Single-leg landing demonstrated good inter-trial and inter-test reliability for measures of relative peak landing force and relative impulse, but not time to stabilization. Of the variables investigated, sway velocity and relative landing impulse are the most reliable measures of single-leg balance and landing respectively, and should be considered for monitoring changes in postural control. PMID:29769817
The Assessment of Minor Neurological Dysfunction in Infancy Using the Touwen Infant Neurological Examination: Strengths and Limitations

ERIC Educational Resources Information Center

Hadders-Algra, Mijna; Heineman, Kirsten R.; Bos, Arend F.; Middelburg, Karin J.

2010-01-01

Aim: Little is known of minor neurological dysfunction (MND) in infancy. This study aimed to evaluate the inter-assessor reliability of the assessment of MND with the Touwen Infant Neurological Examination (TINE) and the construct and predictive validity of MND in infancy. Method: Inter-assessor agreement was determined in a sample of 40 infants…
Inter-rater Reliability of Three Musculoskeletal Physical examination Techniques Used to Assess Motion in Three Planes While Standing

PubMed Central

Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John

2012-01-01

Objective: The objective of the study was to measure the reliability between examiners of three basic maneuvers of the Total Body Functional Profile© physical examination test. The hypothesis was that musculoskeletal health care providers of different disciplines could reliably use the three basic maneuvers as part of the musculoskeletal physical examination. Design: A prospective observational study was conducted. Twenty-eight adult volunteers were measured on both the left and right side by two independent raters on a single occasion. Setting: The subjects were recruited through advertisements placed by the orthopedic department at a tertiary university. Participants: 28 volunteers were recruited and completed the study. The volunteers were between 18 and 51 years of age, had no symptoms in the lower extremity or spine, had no previous history of surgery or tumor involving the lower extremity, and no medical conditions that would preclude participation. Assessment: On a single occasion, two examiners per volunteer were blinded to their own and each other's measurements. Each examiner assessed the distance of frontal and sagittal plane lunge and angle of motion for transverse plane testing. Main Outcome Measurements: Inter-rater agreement is expressed with intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals (CIs). The difference between raters is reported with 95% CIs. Baseline demographics, UCLA, and Harris hip questionnaires were completed by all participants. Results: The UCLA and Harris hip scores showed no significant activity restrictions or pain limitations in any participant. The inter-rater reliability for sagittal, frontal, and transverse plane matrix testing was good with ICCs of 0.86 (95% CI 0.77, 0.91), 0.90 (95% CI 0.84, 0.94), and 0.85 (95% CI 0.75, 0.91) respectively.
The rater reliability between disciplines for transverse, sagittal and frontal plane matrix testing was good with ICCs of 0.89 (95% CI 0.80, 0.94), 0.88 (95% CI 0.79, 0.94), and 0.90 (95% CI 0.81, 0.95) respectively. Conclusion: The inter-rater reliability for three basic maneuvers of the Total Body Functional Profile© is good amongst musculoskeletal healthcare providers of different disciplines. These three maneuvers may be used consistently as part of the musculoskeletal physical examination. PMID:19627956
Inter-rater reliability of three musculoskeletal physical examination techniques used to assess motion in three planes while standing.

PubMed

Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John

2009-07-01

The objective of the study was to measure the reliability between examiners of 3 basic maneuvers of the Total Body Functional Profile physical examination test. The hypothesis was that musculoskeletal health care providers of different disciplines could reliably use the 3 basic maneuvers as part of the musculoskeletal physical examination. A prospective observational study was conducted. Twenty-eight adult volunteers were measured on both the left and right side by 2 independent raters on a single occasion. The subjects were recruited through advertisements placed by the orthopedic department at a tertiary university. Twenty-eight volunteers were recruited and completed the study. The volunteers were between 18 and 51 years of age, had no symptoms in the lower extremity or spine, had no previous history of surgery or tumor involving the lower extremity, and no medical conditions that would preclude participation. On a single occasion, 2 examiners per volunteer were blinded to their own and each other's measurements. Each examiner assessed the distance of frontal and sagittal plane lunge and angle of motion for transverse plane testing. Inter-rater agreement is expressed with intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals (CIs). The difference between raters is reported with 95% CIs. Baseline demographics, University of California Los Angeles (UCLA), and Harris hip questionnaires were completed by all participants. The UCLA and Harris hip scores showed no significant activity restrictions or pain limitations in any participant. The inter-rater reliability for sagittal, frontal, and transverse plane matrix testing was good with ICCs of 0.86 (95% CI 0.77-0.91), 0.90 (95% CI 0.84-0.94), and 0.85 (95% CI 0.75-0.91), respectively.
The rater reliability between disciplines for transverse, sagittal, and frontal plane matrix testing was good with ICCs of 0.89 (95% CI 0.80-0.94), 0.88 (95% CI 0.79-0.94), and 0.90 (95% CI 0.81-0.95), respectively. The inter-rater reliability for 3 basic maneuvers of the Total Body Functional Profile is good among musculoskeletal health care providers of different disciplines. These 3 maneuvers may be used consistently as part of the musculoskeletal physical examination.

The reliability of four widely used patellar height ratios.

PubMed

van Duijvenbode, Dennis; Stavenuiter, Michel; Burger, Bart; van Dijke, Cees; Spermon, Jacco; Hoozemans, Marco

2016-03-01

The objective of this study was to evaluate the inter-observer reliability and the intra-observer reliability of four patellar height ratios: Insall-Salvati (IS), modified Insall-Salvati (MIS), Blackburne-Peel (BP) and Caton-Deschamps (CD). The patellar height ratios were assessed by four independent examiners using weight-bearing lateral knee radiographs in 30° flexion. Intra-class correlation coefficients and Fleiss' kappas were determined. The inter-observer reliability was excellent for the IS and moderate for the other ratios. When the ratio values were categorized, the inter-observer reliability was strong for the IS, moderate for the MIS and BP, and poor for the CD. The intra-observer reliability was excellent for the IS, MIS and CD, and strong for the BP. When the ratio values were categorized, the intra-observer reliability was strong for the IS and MIS, and moderate for the other ratios. Although the IS showed the best reliability, we advise using the MIS, as it showed the second-best reliability but is, according to the literature, associated with better validity.
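Fleiss' kappa extends chance-corrected agreement to more than two examiners rating categorized values. The following is a minimal sketch of the standard formula. The function name `fleiss_kappa`, the three category labels, and the counts are hypothetical and are not taken from the study.

```python
def fleiss_kappa(count_table):
    """Fleiss' kappa from a table where count_table[i][j] is the number of
    examiners who assigned radiograph i to category j."""
    n_subjects = len(count_table)
    n_raters = sum(count_table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in count_table):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_categories = len(count_table[0])

    # Share of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in count_table) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Per-subject agreement: proportion of rater pairs that agree.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in count_table
    ]
    p_bar = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical counts: 5 radiographs, 4 examiners, categories (low, normal, high).
table = [
    [4, 0, 0],
    [0, 4, 0],
    [0, 3, 1],
    [0, 0, 4],
    [1, 3, 0],
]
print(round(fleiss_kappa(table), 2))
```

In this toy table three radiographs are rated unanimously and two have one dissenting examiner, which gives a kappa of 0.68.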
Analysis of the reliability and reproducibility of goniometry compared to hand photogrammetry

PubMed Central

de Carvalho, Rosana Martins Ferreira; Mazzer, Nilton; Barbieri, Claudio Henrique

2012-01-01

Objective: To evaluate the intra- and inter-examiner reliability and reproducibility of goniometry in relation to photogrammetry of the hand, comparing the angles of thumb abduction, PIP joint flexion of the II finger and MCP joint flexion of the V finger. Methods: The study included 30 volunteers, who were divided into three groups: one group of 10 physiotherapy students, one group of 10 physiotherapists, and a third group of 10 hand therapists. Each examiner performed the measurements on the same hand mold, using the goniometer followed by two photogrammetry software programs, CorelDraw® and ALCimagem®. Results: The results revealed that the groups and the methods proposed presented inter-examiner reliability generally rated as excellent (ICC 0.998, 95% CI 0.995-0.999). In the intra-examiner evaluation, an excellent level of reliability was found in all three groups. In the comparison between groups for each angle and each method, no significant differences were found between the groups for most of the measurements. Conclusion: Goniometry and photogrammetry are reliable and reproducible methods for evaluating measurements of the hand. However, due to the lack of similar references, detailed studies are needed to define the normal parameters between the methods in the joints of the hand. Level of Evidence II, Diagnostic Study. PMID:24453594
Intra and Inter-Rater Reliability of Screening for Movement Impairments: Movement Control Tests from The Foundation Matrix

PubMed Central

Mischiati, Carolina R.; Comerford, Mark; Gosford, Emma; Swart, Jacqueline; Ewings, Sean; Botha, Nadine; Stokes, Maria; Mottram, Sarah L.

2015-01-01

Pre-season screening is well established within the sporting arena, and aims to enhance performance and reduce injury risk. With the increasing need to identify potential injury with greater accuracy, a new risk assessment process has been produced; The Performance Matrix (battery of movement control tests). As with any new method of objective testing, it is fundamental to establish whether the same results can be reproduced between examiners and by the same examiner on consecutive occasions. This study aimed to determine the intra-rater test re-test and inter-rater reliability of tests from a component of The Performance Matrix, The Foundation Matrix. Twenty participants were screened by two experienced musculoskeletal therapists using nine tests to assess the ability to control movement during specific tasks. Movement evaluation criteria for each test were rated as pass or fail. The therapists observed participants real-time and tests were recorded on video to enable repeated ratings four months later to examine intra-rater reliability (videos rated two weeks apart). Overall test percentage agreement was 87% for inter-rater reliability; 98% Rater 1, 94% Rater 2 for test re-test reliability; and 75% for real-time versus video. Intraclass-correlation coefficients (ICCs) were excellent between raters (0.81) and within raters (Rater 1, 0.96; Rater 2, 0.88) but poor for real-time versus video (0.23). Reliability for individual components of each test was more variable: inter-rater, 68-100%; intra-rater, 88-100% Rater 1, 75-100% Rater 2; and real-time versus video 31-100%. Cohen’s Kappa values for inter-rater reliability were 0.0-1.0; intra-rater 0.6-1.0 for Rater 1; -0.1-1.0 for Rater 2; and -0.1-1 for real-time versus video. It is concluded that both inter and intra-rater reliability of tests in The Foundation Matrix are acceptable when rated by experienced therapists. 
Recommendations are made for modifying some of the criteria to improve reliability where excellence was not reached. Key points The movement control tests of The Foundation Matrix had acceptable reliability between raters and within raters on different days. Agreement between observations made on tests performed real-time and on video recordings was low, indicating poor validity of use of video recordings. Some movement evaluation criteria related to specific tests that did not achieve excellent agreement could be modified to improve reliability. PMID:25983594
Reliability and Validity of the Psychoeducational Profile-Third Edition Caregiver Report in Children with Autism Spectrum Disorders

ERIC Educational Resources Information Center

Fu, Chung-Pei; Chen, Kuan-Lin; Tseng, Mei-Hui; Chiang, Fu-Mei; Hsieh, Ching-Lin

2012-01-01

The aim of this study was to examine the internal consistency, inter-respondent reliability, and convergent and divergent validity of the Psychoeducational Profile-third edition Caregiver Report (PEP3-CR) in children with Autism Spectrum Disorders (ASD). We examined the internal consistency on 66 mothers of children with ASD who completed the…
Inter-day Reliability of the IDEEA Activity Monitor for Measuring Movement and Non-Movement Behaviors in Older Adults.

PubMed

de la Cámara, Miguel Ángel; Higueras-Fresnillo, Sara; Martinez-Gomez, David; Veiga, Oscar L

2018-05-29

The inter-day reliability of the Intelligent Device for Energy Expenditure and Activity (IDEEA) has not been studied to date. The purpose of this study was to examine the inter-day variability and reliability of data collected with the IDEEA on two consecutive days, and to predict the number of days needed to provide a reliable estimate of several movement behaviors (walking and climbing stairs), non-movement behaviors (lying, reclining, sitting) and standing in older adults. The sample included 126 older adults (74 women) who wore the IDEEA for 48 h. Results showed low variability between the two days, and reliability ranged from moderate (ICC=0.34) to high (ICC=0.80) for most of the movement and non-movement behaviors analyzed. The Bland-Altman plots showed moderate-to-high agreement between days, and the Spearman-Brown formula estimated that between 1.2 and 9.1 days of monitoring with the IDEEA are needed to achieve ICCs≥0.70 in older adults, for sitting and climbing stairs respectively.
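The Spearman-Brown prophecy formula relates the reliability of a single day of monitoring to the reliability of an average over several days. The sketch below shows both directions of that calculation. The function names and the input values are hypothetical, and it does not attempt to reproduce the study's figures, since the abstract does not state which ICC form was entered into the formula.

```python
def projected_reliability(single_day_icc, n_days):
    """Reliability of the mean of n_days, given the single-day reliability."""
    return n_days * single_day_icc / (1 + (n_days - 1) * single_day_icc)


def days_needed(single_day_icc, target_icc=0.70):
    """Number of monitoring days whose mean would reach target_icc."""
    if not (0 < single_day_icc < 1 and 0 < target_icc < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target_icc * (1 - single_day_icc) / (single_day_icc * (1 - target_icc))


# Illustrative value only: a behavior with a single-day ICC of 0.50.
k = days_needed(0.50, target_icc=0.70)
print(f"{k:.2f} days needed; check: ICC={projected_reliability(0.50, k):.2f}")
```

A fractional result such as 2.33 days would in practice be rounded up to the next whole day of monitoring.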
And the Winner Is … : Inter-Rater Reliability among Scholarship Assessors

ERIC Educational Resources Information Center

Johnston, Lucy; Schluter, Philip J.

2017-01-01

With increasing competition for postgraduate research scholarships, awarding processes demand attention and scrutiny. We examine inter-rater reliability for two prestigious New Zealand scholarships, the Shirtcliffe Fellowship and the Gordon Watson Scholarship. For each scholarship, five assessors (three academic; two non-academic) independently…
Reliability of horizontal and vertical tube shift techniques in the localisation of supernumerary teeth.

PubMed

Mallineni, S K; Anthonappa, R P; King, N M

2016-12-01

To assess the reliability of the vertical tube shift technique (VTST) and horizontal tube shift technique (HTST) for the localisation of unerupted supernumerary teeth (ST) in the anterior region of the maxilla. A convenience sample of 83 patients who attended a major teaching hospital because of unerupted ST was selected. Only non-syndromic patients with ST who had complete clinical, radiographic and surgical records were included in the study. Ten examiners independently rated the paired set of radiographs for each technique. Chi-square test, paired t test and kappa statistics were employed to assess the intra- and inter-examiner reliability. Paired sets of 1660 radiographs (830 pairs for each technique) were available for the analysis. The overall sensitivity for VTST and HTST was 80.6% and 72.1% respectively, with slight inter-examiner and good intra-examiner reliability. Statistically significant differences were evident between the two localisation techniques (p < 0.05). Localisation of unerupted ST using VTST was more successful than HTST in the anterior region of the maxilla.
Reliability and criterion validity of an observation protocol for working technique assessments in cash register work.

PubMed

Palm, Peter; Josephson, Malin; Mathiassen, Svend Erik; Kjellberg, Katarina

2016-06-01

We evaluated the intra- and inter-observer reliability and criterion validity of an observation protocol, developed in an iterative process involving practicing ergonomists, for assessment of working technique during cash register work for the purpose of preventing upper extremity symptoms. Two ergonomists independently assessed 17 15-min videos of cash register work on two occasions each, as a basis for examining reliability. Criterion validity was assessed by comparing these assessments with meticulous video-based analyses by researchers. Intra-observer reliability was acceptable (i.e. proportional agreement >0.7 and kappa >0.4) for 10/10 questions. Inter-observer reliability was acceptable for only 3/10 questions. An acceptable inter-observer reliability combined with an acceptable criterion validity was obtained only for one working technique aspect, 'Quality of movements'. Thus, major elements of the cashiers' working technique could not be assessed with an acceptable accuracy from short periods of observations by one observer, such as often desired by practitioners. Practitioner Summary: We examined an observation protocol for assessing working technique in cash register work. It was feasible in use, but inter-observer reliability and criterion validity were generally not acceptable when working technique aspects were assessed from short periods of work. We recommend the protocol to be used for educational purposes only.
Reliability of the Cooking Task in adults with acquired brain injury.

PubMed

Poncet, Frédérique; Swaine, Bonnie; Taillefer, Chantal; Lamoureux, Julie; Pradat-Diehl, Pascale; Chevignard, Mathilde

2015-01-01

Acquired brain injury (ABI) often leads to deficits in executive functioning (EF) responsible for severe and long-standing disabilities in daily life activities. The Cooking Task is an ecological and valid test of EF involving multi-tasking in a real environment. Given its complex scoring system, it is important to establish the tool's reliability. The objective of the study was to examine the reliability of the Cooking Task (internal consistency, inter-rater and test-retest reliability). A total of 160 patients with ABI (113 men, mean age 37 years, SD = 14.3) were tested using the Cooking Task. For test-retest reliability, patients were assessed by the same rater on two occasions (mean interval 11 days), while two raters independently and simultaneously observed and scored patients' performances to estimate inter-rater reliability. Internal consistency was high for the global scale (Cronbach α = .74). Inter-rater reliability (n = 66) for total errors was also high (ICC = .93); however, the test-retest reliability (n = 11) was poor (ICC = .36). In general the Cooking Task appears to be a reliable tool. The low test-retest results were expected given the importance of EF in the performance of novel tasks.
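Internal consistency of a multi-item scale is commonly summarized with Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. A minimal sketch follows. The function name `cronbach_alpha` and the score matrix are hypothetical and do not represent Cooking Task data.

```python
from statistics import variance


def cronbach_alpha(score_rows):
    """Cronbach's alpha for score_rows[i][j] = score of person i on item j."""
    n_items = len(score_rows[0])
    if n_items < 2 or len(score_rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in score_rows):
        raise ValueError("every respondent needs a score for every item")
    # Sample variance of each item (column) and of the summed total score.
    item_variances = [variance(column) for column in zip(*score_rows)]
    total_variance = variance([sum(row) for row in score_rows])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 4 respondents on 3 items.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(round(cronbach_alpha(scores), 3))
```

Alpha rises when items vary together across respondents, so the strongly correlated toy items above give a value close to 1.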
Reliability of Single-Leg Balance and Landing Tests in Rugby Union; Prospect of Using Postural Control to Monitor Fatigue.

PubMed

Troester, Jordan C; Jasmin, Jason G; Duffield, Rob

2018-06-01

The present study examined the inter-trial (within test) and inter-test (between test) reliability of single-leg balance and single-leg landing measures performed on a force plate in professional rugby union players using commercially available software (SpartaMARS, Menlo Park, USA). Twenty-four players undertook test - re-test measures on two occasions (7 days apart) on the first training day of two respective pre-season weeks following 48h rest and similar weekly training loads. Two 20s single-leg balance trials were performed on a force plate with eyes closed. Three single-leg landing trials were performed by jumping off two feet and landing on one foot in the middle of a force plate 1m from the starting position. Single-leg balance results demonstrated acceptable inter-trial reliability (ICC = 0.60-0.81, CV = 11-13%) for sway velocity, anterior-posterior sway velocity, and mediolateral sway velocity variables. Acceptable inter-test reliability (ICC = 0.61-0.89, CV = 7-13%) was evident for all variables except mediolateral sway velocity on the dominant leg (ICC = 0.41, CV = 15%). Single-leg landing results only demonstrated acceptable inter-trial reliability for force based measures of relative peak landing force and impulse (ICC = 0.54-0.72, CV = 9-15%). Inter-test results indicate improved reliability through the averaging of three trials with force based measures again demonstrating acceptable reliability (ICC = 0.58-0.71, CV = 7-14%). Of the variables investigated here, total sway velocity and relative landing impulse are the most reliable measures of single-leg balance and landing performance, respectively. These measures should be considered for monitoring potential changes in postural control in professional rugby union.
Standard setting: comparison of two methods.

PubMed

George, Sanju; Haque, M Sayeed; Oyebode, Femi

2006-09-14

The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method. The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 4th-year medical students on a multiple-choice examination (MCQ). Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm reference and the Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method. The pass rate with the norm-reference method was 85% (66/78) and that by the Angoff method was 100% (78 out of 78). The percentage agreement between Angoff method and norm-reference was 78% (95% CI 69% - 87%). The modified Angoff method had an inter-rater reliability of 0.81-0.82 and a test-retest reliability of 0.59-0.74. There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
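The two standard-setting approaches compared above can be sketched in a few lines. This is an illustration under stated assumptions: the norm-reference cut is taken as the mean minus one sample standard deviation, and the Angoff cut is taken as the average across judges of the summed per-item probabilities that a borderline candidate answers correctly, which is the basic Angoff calculation and may differ in detail from the modified procedure the study used. All names and numbers are hypothetical.

```python
from statistics import mean, stdev


def norm_reference_cut(scores):
    """Pass mark set relative to the cohort: mean minus one sample SD."""
    return mean(scores) - stdev(scores)


def angoff_cut(judge_estimates):
    """Pass mark from judge_estimates[j][i], judge j's estimated probability
    that a borderline candidate answers item i correctly."""
    return mean([sum(item_probs) for item_probs in judge_estimates])


def pass_rate(scores, cut):
    """Proportion of candidates scoring at or above the cut."""
    return sum(score >= cut for score in scores) / len(scores)


# Hypothetical 5-item exam: raw scores of 8 candidates, estimates from 2 judges.
scores = [3, 4, 2, 5, 4, 3, 5, 1]
judges = [
    [0.6, 0.5, 0.7, 0.4, 0.3],
    [0.5, 0.5, 0.6, 0.5, 0.4],
]
norm_cut = norm_reference_cut(scores)
angoff = angoff_cut(judges)
# Share of candidates given the same pass/fail decision by both methods.
agreement = sum((s >= norm_cut) == (s >= angoff) for s in scores) / len(scores)
print(f"norm cut={norm_cut:.2f} pass={pass_rate(scores, norm_cut):.3f}")
print(f"Angoff cut={angoff:.2f} pass={pass_rate(scores, angoff):.3f}")
print(f"decision agreement={agreement:.3f}")
```

Because the norm-reference cut moves with the cohort while the Angoff cut depends only on the judges' view of the items, the two methods can pass different proportions of the same candidates.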
Validity and reliability of Internet-based physiotherapy assessment for musculoskeletal disorders: a systematic review.

PubMed

Mani, Suresh; Sharma, Shobha; Omar, Baharudin; Paungmali, Aatit; Joseph, Leonard

2017-04-01

Purpose: The purpose of this review is to systematically explore and summarise the validity and reliability of telerehabilitation (TR)-based physiotherapy assessment for musculoskeletal disorders. Method: A comprehensive systematic literature review was conducted using a number of electronic databases (PubMed, EMBASE, PsycINFO, Cochrane Library and CINAHL), covering studies published between January 2000 and May 2015. Studies that examined the validity and the inter- and intra-rater reliability of TR-based physiotherapy assessment for musculoskeletal conditions were included. Two independent reviewers used the Quality Appraisal Tool for studies of diagnostic Reliability (QAREL) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool to assess the methodological quality of reliability and validity studies respectively. Results: A total of 898 hits were retrieved, of which 11 articles meeting the inclusion criteria were reviewed. Nine studies explored the concurrent validity and the inter- and intra-rater reliability, while two studies examined only the concurrent validity. Reviewed studies were moderate to good in methodological quality. The physiotherapy assessments such as pain, swelling, range of motion, muscle strength, balance, gait and functional assessment demonstrated good concurrent validity. However, the reported concurrent validity of lumbar spine posture, special orthopaedic tests, neurodynamic tests and scar assessments ranged from low to moderate. Conclusion: TR-based physiotherapy assessment was technically feasible with overall good concurrent validity and excellent reliability, except for lumbar spine posture, special orthopaedic tests, neurodynamic tests and scar assessment.
Inter-rater reliability of motor unit number estimates and quantitative motor unit analysis in the tibialis anterior muscle.

PubMed

Boe, S G; Dalton, B H; Harwood, B; Doherty, T J; Rice, C L

2009-05-01

To establish the inter-rater reliability of decomposition-based quantitative electromyography (DQEMG) derived motor unit number estimates (MUNEs) and quantitative motor unit (MU) analysis. Using DQEMG, two examiners independently obtained a sample of needle and surface-detected motor unit potentials (MUPs) from the tibialis anterior muscle from 10 subjects. Coupled with a maximal M wave, surface-detected MUPs were used to derive a MUNE for each subject and each examiner. Additionally, size-related parameters of the individual MUs were obtained following quantitative MUP analysis. Test-retest MUNE values were similar with high reliability observed between examiners (ICC=0.87). Additionally, MUNE variability from test-retest as quantified by a 95% confidence interval was relatively low (+/-28 MUs). Lastly, quantitative data pertaining to MU size, complexity and firing rate were similar between examiners. MUNEs and quantitative MU data can be obtained with high reliability by two independent examiners using DQEMG. Establishing the inter-rater reliability of MUNEs and quantitative MU analysis using DQEMG is central to the clinical applicability of the technique. In addition to assessing response to treatments over time, multiple clinicians may be involved in the longitudinal assessment of the MU pool of individuals with disorders of the central or peripheral nervous system.
Ultrasonographic measurements of lower trapezius muscle thickness at rest and during isometric contraction: a reliability study.

PubMed

Talbott, Nancy R; Witt, Dexter W

2014-07-01

The purpose of this study was to determine the intra-rater reliability and inter-rater reliability of ultrasound imaging (USI) thickness measurements of the lower trapezius (LT) at rest and during active contractions when the transverse process and the lamina were used as reference sites for the measurement process. Twenty healthy individuals between the ages of 22 and 32 years volunteered. With the subject prone and the shoulder in 145° of abduction, images of the LT were taken bilaterally by one examiner as the subject: (1) rested; (2) actively held the test position; and (3) actively held the test position while holding a weight. Ten subjects returned and testing was repeated by the same examiner and by a second examiner. LT thickness measurements were recorded at the level of the transverse process and at the level of the lamina. Intra-class correlation coefficients (ICC) for within session intra-rater reliability (ICC3,3) ranged from 0.951 to 0.986 for both measurement sites while between session intra-rater reliability (ICC3,2) ranged from 0.935 to 0.962. Within session inter-rater reliability (ICC2,2) ranged from 0.934 to 0.973. USI can be used to reliably measure LT thickness at rest, during active contraction and during active contraction when holding a weight. The described protocol can be utilized during shoulder examinations to provide an additional assessment tool for monitoring changes in LT thickness.
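The subscripted ICC labels above follow the Shrout and Fleiss convention, in which the first index names the model (2 for raters treated as a random sample, 3 for fixed raters) and the second gives the number of measurements averaged. The sketch below computes the single-measure and averaged forms from a two-way table of mean squares. The function name `icc_two_way` and the ratings are illustrative values, not data from the study.

```python
def icc_two_way(ratings):
    """Shrout-Fleiss ICC(2,1), ICC(2,k), ICC(3,1), ICC(3,k) for
    ratings[i][j] = measurement of subject i by rater (or trial) j."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return {
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(3,k)": (bms - ems) / bms,
    }


# Illustrative table: 6 subjects (rows) measured by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
for name, value in icc_two_way(ratings).items():
    print(f"{name} = {value:.2f}")
```

The illustrative raters rank subjects similarly but differ in their average level, so the model 3 coefficients, which ignore systematic rater offsets, come out much higher than the model 2 coefficients.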
Reliability of Multi-Category Rating Scales

ERIC Educational Resources Information Center

Parker, Richard I.; Vannest, Kimberly J.; Davis, John L.

2013-01-01

The use of multi-category scales is increasing for the monitoring of IEP goals, classroom and school rules, and Behavior Improvement Plans (BIPs). Although they require greater inference than traditional data counting, little is known about the inter-rater reliability of these scales. This simulation study examined the performance of nine…
Reliability and validity of a nutrition and physical activity environmental self-assessment for child care

PubMed Central

Benjamin, Sara E; Neelon, Brian; Ball, Sarah C; Bangdiwala, Shrikant I; Ammerman, Alice S; Ward, Dianne S

2007-01-01

Background: Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods: To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results: For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion: This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions.
We therefore recommend the Nutrition and Physical Activity Self-Assessment for Child Care (NAP SACC) instrument to researchers and practitioners interested in conducting healthy weight intervention in child care. However, a more robust, less subjective measure would be more appropriate for researchers seeking an outcome measure to assess intervention impact. PMID:17615078
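The two agreement statistics named in this abstract, percent agreement and a weighted kappa, can be illustrated with a minimal Python sketch. It assumes two raters answering one question on an ordinal scale coded 0 to k-1 and uses linear disagreement weights by default; the abstract does not state the weighting scheme, and the function names and toy responses below are hypothetical, not NAP SACC data.

```python
from typing import Sequence


def percent_agreement(rater_1: Sequence[int], rater_2: Sequence[int]) -> float:
    """Share of paired responses that are identical, in percent."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("need two non-empty response lists of equal length")
    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    return 100.0 * matches / len(rater_1)


def weighted_kappa(rater_1: Sequence[int], rater_2: Sequence[int],
                   n_categories: int, quadratic: bool = False) -> float:
    """Weighted kappa for ordinal responses coded 0 .. n_categories - 1."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("need two non-empty response lists of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    k, n = n_categories, len(rater_1)

    # Cross-tabulate the paired responses.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_1, rater_2):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weights grow with the distance between categories.
    power = 2 if quadratic else 1
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_totals[i] * col_totals[j] / n
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical director and staff answers to one 3-level question.
director = [0, 1, 2, 2, 1, 0]
staff = [0, 1, 2, 1, 1, 0]
print(percent_agreement(director, staff))
print(weighted_kappa(director, staff, n_categories=3))
```

Percent agreement ignores agreement expected by chance, which is one reason an item can show high percent agreement alongside a low kappa.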
Reliability and criterion validity of two applications of the iPhone™ to measure cervical range of motion in healthy participants

PubMed Central

2013-01-01

Summary of background data Recent smartphones, such as the iPhone, are often equipped with an accelerometer and magnetometer, which, through software applications, can perform various inclinometric functions. Although these applications are intended for recreational use, they have the potential to measure and quantify range of motion. The purpose of this study was to estimate the intra- and inter-rater reliability as well as the criterion validity of the clinometer and compass applications of the iPhone in the assessment of cervical range of motion in healthy participants. Methods The sample consisted of 28 healthy participants. Two examiners measured cervical range of motion of each participant twice using the iPhone (for the estimation of intra- and inter-rater reliability) and once with the CROM (for the estimation of criterion validity). Estimates of reliability and validity were then established using the intraclass correlation coefficient (ICC). Results We observed a moderate intra-rater reliability for each movement (ICC = 0.65-0.85) but a poor inter-rater reliability (ICC < 0.60). For the criterion validity, the ICCs are moderate (>0.50) to good (>0.65) for movements of flexion, extension, lateral flexions and right rotation, but poor (<0.50) for left rotation. Conclusion We found good intra-rater reliability and lower inter-rater reliability. When compared to the gold standard, these applications showed moderate to good validity. However, before using the iPhone as an outcome measure in clinical settings, studies should be done on patients presenting with cervical problems. PMID:23829201
Reliability of Pain Measurements Using Computerized Cuff Algometry: A DoloCuff Reliability and Agreement Study.

PubMed

Kvistgaard Olsen, Jack; Fener, Dilay Kesgin; Waehrens, Eva Elisabet; Wulf Christensen, Anton; Jespersen, Anders; Danneskiold-Samsøe, Bente; Bartels, Else Marie

2017-07-01

Computerized pneumatic cuff pressure algometry (CPA) using the DoloCuff is a new method for pain assessment. Intra- and inter-rater reliabilities have not yet been established. Our aim was to examine the inter- and intrarater reliabilities of DoloCuff measures in healthy subjects. Twenty healthy subjects (ages 20 to 29 years) were assessed three times at 24-hour intervals by two trained raters. Inter-rater reliability was established based on the first and second assessments, whereas intrarater reliability was based on the second and third assessments. Subjects were randomized 1:1 to first assessment at either rater 1 or rater 2. The variables of interest were pressure pain threshold (PT), pressure pain tolerance (PTol), and temporal summation index (TSI). Reliability was estimated by a two-way mixed intraclass correlation coefficient (ICC) absolute agreement analysis. Reliability was considered excellent if ICC > 0.75, fair to good if 0.4 < ICC < 0.75, and poor if ICC < 0.4. Bias and random errors between raters and assessments were evaluated using 95% confidence interval (CI) and Bland-Altman plots. Inter-rater reliability for PT, PTol, and TSI was 0.88 (95% CI: 0.69 to 0.95), 0.86 (95% CI: 0.65 to 0.95), and 0.81 (95% CI: 0.42 to 0.94), respectively. The intrarater reliability for PT, PTol, and TSI was 0.81 (95% CI: 0.53 to 0.92), 0.89 (95% CI: 0.74 to 0.96), and 0.75 (95% CI: 0.28 to 0.91), respectively. Inter-rater reliability was excellent for PT, PTol, and TSI. Similarly, the intrarater reliability for PT and PTol was excellent, while borderline excellent/good for TSI. Therefore, the DoloCuff can be used to obtain reliable measures of pressure pain parameters in healthy subjects. © 2016 World Institute of Pain.
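As an illustration of the reliability estimate described above, the following sketch computes a single-measure, absolute-agreement ICC from a two-way ANOVA decomposition (often written ICC(A,1)) and applies the abstract's cut-offs. Whether the authors reported single or average measures is not stated, so that choice, the function names, the toy scores, and the handling of values exactly at 0.4 or 0.75 are assumptions.

```python
from typing import Sequence


def icc_absolute_agreement(scores: Sequence[Sequence[float]]) -> float:
    """Single-measure, absolute-agreement ICC for an n-subjects x k-raters table."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_subjects + (k - 1) * ms_error + (k / n) * (ms_raters - ms_error)
    if denominator == 0:
        raise ValueError("ICC is undefined when the scores do not vary")
    return (ms_subjects - ms_error) / denominator


def classify_icc(icc: float) -> str:
    """Cut-offs quoted in the abstract; boundary values are an assumption here."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.4:
        return "fair to good"
    return "poor"


# Hypothetical pressure pain thresholds: one row per subject, one column per rater.
toy_scores = [[20.0, 21.0], [35.0, 33.0], [28.0, 29.0], [42.0, 40.0]]
value = icc_absolute_agreement(toy_scores)
print(round(value, 3), classify_icc(value))
```

Because this is an absolute-agreement form, a constant offset between raters lowers the coefficient even when the raters rank subjects identically, which is why the study pairs it with Bland-Altman plots to examine bias.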
Brief Report: Interrater Reliability of Clinical Diagnosis and DSM-IV Criteria for Autistic Disorder: Results of the DSM-IV Autism Field Trial.

ERIC Educational Resources Information Center

Klin, Ami; Lang, Jason; Cicchetti, Domenic V.; Volkmar, Fred R.

2000-01-01

This study examined the inter-rater reliability of clinician-assigned diagnosis of autism using or not using the criteria specified in the Diagnostic and Statistical Manual IV (DSM-IV). For experienced raters there was little difference in reliability in the two conditions. However, a clinically significant improvement in diagnostic reliability…
Reliability and minimal detectable change of a modified passive neck flexion test in patients with chronic nonspecific neck pain and asymptomatic subjects.

PubMed

López-de-Uralde-Villanueva, Ibai; Acuyo-Osorio, Mario; Prieto-Aldana, María; La Touche, Roy

2017-04-01

The Passive Neck Flexion Test (PNFT) can diagnose meningitis and potential spinal disorders. Little evidence is available concerning the use of a modified version of the PNFT (mPNFT) in patients with chronic nonspecific neck pain (CNSNP). The primary objective was to assess the reliability of the mPNFT in subjects with and without CNSNP. The secondary objective was to assess the differences in the symptoms provoked by the mPNFT between these two populations. We used a repeated measures concordance design for the main objective and a cross-sectional design for the secondary objective. A total of 30 asymptomatic subjects and 34 patients with CNSNP were recruited. The following measures were recorded: the range of motion at the onset of symptoms (OS-mPNFT), the range of motion at the submaximal pain (SP-mPNFT), and evoked pain intensity on the mPNFT (VAS-mPNFT). Good to excellent reliability was observed for OS-mPNFT and SP-mPNFT in the asymptomatic group (intra-examiner reliability: 0.95-0.97; inter-examiner reliability: 0.86-0.90; intra-examiner test-retest reliability: 0.84-0.87). In the CNSNP group, good to excellent reliability was obtained for the OS-mPNFT (intra-examiner reliability: 0.89-0.96; inter-examiner reliability: 0.83-0.86; intra-examiner test-retest reliability: 0.83-0.85) and the SP-mPNFT (intra-examiner reliability: 0.94-0.98; inter-examiner reliability: 0.80-0.82; intra-examiner test-retest reliability: 0.88-0.91). The CNSNP group showed statistically significant differences in OS-mPNFT (t = 4.92; P < 0.001), SP-mPNFT (t = 2.79; P = 0.007) and in VAS-mPNFT (t = -10.39; P < 0.001) versus the asymptomatic group. The mPNFT is a reliable tool regardless of the examiner and the time factor. Patients with CNSNP have a decreased range of motion and more pain than asymptomatic subjects in the mPNFT. This exceeds the minimal detectable changes for OS-mPNFT and VAS-mPNFT. Copyright © 2017 Elsevier Ltd. All rights reserved.

Objective structured clinical examination for pharmacy students in Qatar: cultural and contextual barriers to assessment.

PubMed

Wilby, K J; Black, E K; Austin, Z; Mukhalalati, B; Aboulsoud, S; Khalifa, S I

2016-07-10

This study aimed to evaluate the feasibility and psychometric defensibility of implementing a comprehensive objective structured clinical examination (OSCE) on the complete pharmacy programme for pharmacy students in a Middle Eastern context, and to identify facilitators and barriers to implementation within new settings. Eight cases were developed, validated, and had standards set according to a blueprint, and were assessed with graduating pharmacy students. Assessor reliability was evaluated using intraclass correlation coefficients (ICCs). Concurrent validity was evaluated by comparing OSCE results to professional skills course grades. Field notes were maintained to generate recommendations for implementation in other contexts. The examination pass mark was 424 points out of 700 (60.6%). All 23 participants passed. Mean performance was 74.6%. Low to moderate inter-rater reliability was obtained for analytical and global components (average ICC 0.77 and 0.48, respectively). In conclusion, OSCE was feasible in Qatar but context-related validity and reliability concerns must be addressed prior to future iterations in Qatar and elsewhere.
An initial reliability and validity study of the Interaction, Communication, and Literacy Skills Audit.

PubMed

El-Choueifati, Nisrine; Purcell, Alison; McCabe, Patricia; Heard, Robert; Munro, Natalie

2014-06-01

Early childhood educators (ECEs) have an important role in promoting positive outcomes for children's language and literacy development. This paper reports the development of a new tool, The Interaction Communication and Literacy (ICL) Skills Audit, and pilots its reliability and validity. Intra- and inter-rater reliability was examined by three speech-language pathologists (SLPs). Five skill areas relating to ECE language and literacy practice were rated. The face and content validity of the ICL Skills Audit was examined by expert SLPs (n = 8) and expert ECEs (n = 4) via questionnaire. The overall intra-rater reliability for the ICL Skills Audit was excellent with percentage close agreement (PCA) of 91-94. Inter-rater agreement was PCA 68-80. Expert SLPs and ECEs agreed that the content was comprehensive and practical. Based on this preliminary study, the ICL Skills Audit appears to be a promising tool that can be used by SLPs and ECEs in collaboration to measure the skills of ECEs in the areas of language and literacy support. Future psychometric and outcome research on the revised ICL Skills Audit is warranted.
Inter-agency communication and operations capabilities during a hospital functional exercise: reliability and validity of a measurement tool.

PubMed

Savoia, Elena; Biddinger, Paul D; Burstein, Jon; Stoto, Michael A

2010-01-01

As proxies for actual emergencies, drills and exercises can raise awareness, stimulate improvements in planning and training, and provide an opportunity to examine how different components of the public health system would combine to respond to a challenge. Despite these benefits, there remains a substantial need for widely accepted and prospectively validated tools to evaluate agencies' and hospitals' performance during such events. Unfortunately, to date, few studies have focused on addressing this need. The purpose of this study was to assess the validity and reliability of a qualitative performance assessment tool designed to measure hospitals' communication and operational capabilities during a functional exercise. The study population included 154 hospital personnel representing nine hospitals that participated in a functional exercise in Massachusetts in June 2008. A 25-item questionnaire was developed to assess the following three hospital functional capabilities: (1) inter-agency communication; (2) communication with the public; and (3) disaster operations. Analyses were conducted to examine internal consistency, associations among scales, the empirical structure of the items, and inter-rater agreement. Twenty-two questions were retained in the final instrument, which demonstrated reliability with alpha coefficients of 0.83 or higher for all scales. A three-factor solution from the principal components analysis accounted for 57% of the total variance, and the factor structure was consistent with the original hypothesized domains. Inter-rater agreement between participants' self reported scores and external evaluators' scores ranged from moderate to good. The resulting 22-item performance measurement tool reliably measured hospital capabilities in a functional exercise setting, with preliminary evidence of concurrent and criterion-related validity.
Reliability and Validity of the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A).

PubMed

Bervoets, Liene; Van Noten, Caroline; Van Roosbroeck, Sofie; Hansen, Dominique; Van Hoorenbeeck, Kim; Verheyen, Els; Van Hal, Guido; Vankerckhoven, Vanessa

2014-01-01

This study was designed to validate the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A). After adjustment of the original Canadian PAQ-C and PAQ-A (i.e. translation/back-translation and evaluation by expert committee), content validity of both PAQs was assessed and calculated using item-level (I-CVI) and scale-level (S-CVI) content validity indexes. Inter-item and inter-rater reliability of 196 PAQ-C and 95 PAQ-A filled in by both children or adolescents and their parent, were evaluated. Inter-item reliability was calculated by Cronbach's alpha (α) and inter-rater reliability was examined by percent observed agreement and weighted kappa (κ). Concurrent validity of PAQ-A was examined in a subsample of 28 obese and 16 normal-weight children by comparing it with concurrently measured physical activity using a maximal cardiopulmonary exercise test for the assessment of peak oxygen uptake (VO2 peak). For both PAQs, I-CVI ranged 0.67-1.00. S-CVI was 0.89 for PAQ-C and 0.90 for PAQ-A. A total of 192 PAQ-C and 94 PAQ-A were fully completed by both child and parent. Cronbach's α was 0.777 for PAQ-C and 0.758 for PAQ-A. Percent agreement ranged 59.9-74.0% for PAQ-C and 51.1-77.7% for PAQ-A, and weighted κ ranged 0.48-0.69 for PAQ-C and 0.51-0.68 for PAQ-A. The correlation between total PAQ-A score and VO2 peak - corrected for age, gender, height and weight - was 0.516 (p = 0.001). Both PAQs have an excellent content validity, an acceptable inter-item reliability and a moderate to good strength of inter-rater agreement. In addition, total PAQ-A score showed a moderate positive correlation with VO2 peak. Both PAQs have an acceptable to good reliability and validity, however, further validity testing is recommended to provide a more complete assessment of both PAQs.
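The inter-item reliability statistic used in this abstract, Cronbach's alpha, follows a short formula: alpha = k/(k-1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The sketch below is a minimal illustration using sample variances; the function name and the toy questionnaire responses are hypothetical and are not PAQ-C or PAQ-A data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are questionnaire items."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(responses[0])
    if n_items < 2 or any(len(row) != n_items for row in responses):
        raise ValueError("every respondent needs the same number (>= 2) of items")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses])
                      for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 1-5 ratings from five respondents on four items.
toy_responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
]
print(round(cronbach_alpha(toy_responses), 3))
```

Alpha rises when items move together across respondents, so it describes internal consistency of the scale rather than agreement between child and parent, which the abstract assesses separately with weighted kappa.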
Inter-session reliability and sex-related differences in hamstrings total reaction time, pre-motor time and motor time during eccentric isokinetic contractions in recreational athlete.

PubMed

Ayala, Francisco; De Ste Croix, Mark; Sainz de Baranda, Pilar; Santonja, Fernando

2014-04-01

The purposes were twofold: (a) to ascertain the inter-session reliability of hamstrings total reaction time, pre-motor time and motor time; and (b) to examine sex-related differences in the hamstrings reaction times profile. Twenty-four men and 24 women completed the study. Biceps femoris and semitendinosus total reaction time, pre-motor time and motor time measured during eccentric isokinetic contractions were recorded on three different occasions. Inter-session reliability was examined through typical percentage error (CVTE), percentage change in the mean (CM) and intraclass correlations (ICC). For both biceps femoris and semitendinosus, total reaction time, pre-motor time and motor time measures demonstrated moderate inter-session reliability (CVTE<10%; CM<3%; ICC>0.7). The results also indicated that, although not statistically significant, women reported consistently longer hamstrings total reaction time (23.5ms), pre-motor time (12.7ms) and motor time (7.5ms) values than men. Therefore, an observed change larger than 5%, 9% and 8% for total reaction time, pre-motor time and motor time respectively from baseline scores after performing a training program would indicate that a real change was likely. Furthermore, while not statistically significant, sex differences were noted in the hamstrings reaction time profile which may play a role in the greater incidence of ACL injuries in women. Copyright © 2013 Elsevier Ltd. All rights reserved.
Measuring the needs of mental health patients in Greece: reliability and validity of the Greek version of the Camberwell assessment of need.

PubMed

Stefanatou, Pentagiotissa; Giannouli, Eleni; Konstantakopoulos, George; Vitoratou, Silia; Mavreas, Venetsanos

2014-11-01

Evaluation of mental health services based on patients' needs assessments has never taken place in Greece, although it is a crucial factor for the efficient use of their limited resources. To examine the inter-rater and test-retest reliability and the concurrent/convergent validity of the Greek research version of the Camberwell Assessment of Need-Research (CAN-R). A total of 53 schizophrenic patient-staff pairs were interviewed twice to test the inter-rater and test-retest reliability of the Greek version of the CAN-R. The World Health Organization Quality of Life-Brief Form (WHOQOL-BREF) and World Health Organization Disability Assessment Schedule-2.0 (WHODAS-2.0) were administered to the patients to examine concurrent validity. The inter-rater and test-retest reliability of patient and staff interviews for the 22 individual items and the eight summary scores of the instrument's four sections were good to excellent. Significant correlations emerged between CAN scores and the WHOQOL-BREF and WHODAS-2.0 domains for both patient and staff ratings, indicating good concurrent validity. Our results suggest that the Greek version of the CAN-R is a reliable instrument for assessing mental health patients' needs. Moreover, it is the first CAN-R validity study with satisfactory results using WHOQOL-BREF and WHODAS-2.0 as criterion variables. © The Author(s) 2013.
Inter-Rater Reliability and Validity of the Australian Football League’s Kicking and Handball Tests

PubMed Central

Cripps, Ashley J.; Hopper, Luke S.; Joyce, Christopher

2015-01-01

Talent identification tests used at the Australian Football League’s National Draft Combine assess the capacities of athletes to compete at a professional level. Tests created for the National Draft Combine are also commonly used for talent identification and athlete development in development pathways. The skills tests created by the Australian Football League required players to either handball (striking the ball with the hand) or kick to a series of 6 randomly generated targets. Assessors subjectively rate each skill execution giving a 0-5 score for each disposal. This study aimed to investigate the inter-rater reliability and validity of the skills tests at an adolescent sub-elite level. Male Australian footballers were recruited from sub-elite adolescent teams (n = 121, age = 15.7 ± 0.3 years, height = 1.77 ± 0.07 m, mass = 69.17 ± 8.08 kg). The coaches (n = 7) of each team were also recruited. Inter-rater reliability was assessed using intraclass correlations (ICC) and Limits of Agreement statistics. Both the kicking (ICC = 0.96, p < .01) and handball tests (ICC = 0.89, p < .01) demonstrated strong reliability and acceptable levels of absolute agreement. Content validity was determined by examining the test scores’ sensitivity to laterality and distance. Concurrent validity was assessed by comparing coaches’ perceptions of skill to actual test outcomes. Multivariate analysis of variance (MANOVA) examined the main effect of laterality, with scores on the dominant hand (p = .04) and foot (p < .01) significantly higher compared to the non-dominant side. Follow-up univariate analysis reported significant differences at every distance in the kicking test. A poor correlation was found between coaches’ perceptions of skill and testing outcomes. The results of this study show that both skill tests demonstrate acceptable inter-rater reliability. Partial content validity was confirmed for the kicking test; however, further research is required to confirm validity of the handball test. Key points The skill tests created by the AFL demonstrated acceptable levels of relative and absolute inter-rater reliability. Both the AFL’s skills tests are able to differentiate between athletes’ dominant and non-dominant limbs. However, only the kicking test could consistently differentiate between score outcomes over a range of Australian Football specific disposal distances. Both tests demonstrated poor concurrent validity, with no correlation found between coaches’ perceptions of technical skills and actual skill outcomes measured. PMID:26336356
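The Limits of Agreement statistic mentioned in this abstract is commonly computed in the Bland-Altman form: the mean difference between two raters (the bias) plus or minus 1.96 standard deviations of the differences. The sketch below assumes that form, which the abstract does not spell out; the function name and the toy assessor totals are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def limits_of_agreement(rater_a: Sequence[float], rater_b: Sequence[float],
                        z: float = 1.96) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired scores from two raters."""
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need at least two paired scores")
    # Work on the per-athlete differences between the two raters.
    differences = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(differences)
    half_width = z * stdev(differences)
    return bias, bias - half_width, bias + half_width


# Hypothetical total kicking scores (six disposals, 0-5 each) from two assessors.
assessor_1 = [22, 18, 25, 15, 20]
assessor_2 = [21, 18, 24, 16, 20]
bias, lower, upper = limits_of_agreement(assessor_1, assessor_2)
print(bias, lower, upper)
```

Unlike a correlation-type coefficient, these limits are expressed in the units of the score itself, so they indicate how far two assessors' scores for the same athlete may plausibly differ.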
Evaluating the reliability of an injury prevention screening tool: Test-retest study.

PubMed

Gittelman, Michael A; Kincaid, Madeline; Denny, Sarah; Wervey Arnold, Melissa; FitzGerald, Michael; Carle, Adam C; Mara, Constance A

2016-10-01

A standardized injury prevention (IP) screening tool can identify family risks and allow pediatricians to address behaviors. To assess behavior changes on later screens, the tool must be reliable for an individual and ideally between household members. Little research has examined the reliability of safety screening tool questions. This study assessed the test-retest reliability of parent responses on an existing IP questionnaire and also compared responses between household parents. Investigators recruited parents of children 0 to 1 year of age during admission to a tertiary care children's hospital. When both parents were present, one was chosen as the "primary" respondent. Primary respondents completed the 30-question IP screening tool after consent, and they were re-screened approximately 4 hours later to test individual reliability. The "second" parent, when present, only completed the tool once. All participants received a 10-dollar gift card. Cohen's Kappa was used to estimate test-retest reliability and inter-rater agreement. Standard test-retest criteria classify Kappa values of 0.0 to 0.40 as poor to fair, 0.41 to 0.60 as moderate, 0.61 to 0.80 as substantial, and 0.81 to 1.00 as almost perfect reliability. One hundred five families participated, with five lost to follow-up. Thirty-two (30.5%) parent dyads completed the tool. Primary respondents were generally mothers (88%) and Caucasian (72%). Test-retest of the primary respondents showed their responses to be almost perfect; average 0.82 (SD = 0.13, range 0.49 to 1.00). Seventeen questions had almost perfect test-retest reliability and 11 had substantial reliability. However, inter-rater agreement between household members for 12 objective questions showed little agreement between responses; inter-rater agreement averaged 0.35 (SD = 0.34, range -0.19 to 1.00). One question had almost perfect inter-rater agreement and two had substantial inter-rater agreement. The IP screening tool used by a single individual had excellent test-retest reliability for nearly all questions. However, when a reporter changes from pre- to postintervention, differences may reflect poor reliability or different subjective experiences rather than true change.
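The Kappa bands quoted in this abstract can be applied mechanically once Cohen's Kappa has been computed for a question. The sketch below computes unweighted Cohen's Kappa for two sets of categorical answers and maps the result onto those bands. The function names and toy answers are hypothetical, and placing negative values in the "poor to fair" band is an assumption, since the quoted criteria start at 0.0.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(answers_1: Sequence[Hashable], answers_2: Sequence[Hashable]) -> float:
    """Unweighted Cohen's Kappa for two paired sets of categorical answers."""
    if len(answers_1) != len(answers_2) or not answers_1:
        raise ValueError("need two non-empty answer lists of equal length")
    n = len(answers_1)
    # Observed proportion of identical answers.
    observed = sum(a == b for a, b in zip(answers_1, answers_2)) / n
    # Agreement expected by chance from each respondent's marginal frequencies.
    counts_1, counts_2 = Counter(answers_1), Counter(answers_2)
    expected = sum((counts_1[c] / n) * (counts_2[c] / n)
                   for c in set(counts_1) | set(counts_2))
    if expected == 1:
        raise ValueError("kappa is undefined when both give one identical answer")
    return (observed - expected) / (1 - expected)


def kappa_band(kappa: float) -> str:
    """Bands quoted in the abstract; negative values are assigned by assumption."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa <= 0.40:
        return "poor to fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical answers from two parents to one yes/no safety question.
parent_1 = ["yes", "yes", "no", "yes", "no", "no"]
parent_2 = ["yes", "no", "no", "yes", "no", "yes"]
k = cohen_kappa(parent_1, parent_2)
print(round(k, 2), kappa_band(k))
```

Applied to the averages reported above, this mapping places the test-retest mean of 0.82 in the "almost perfect" band and the between-parent mean of 0.35 in the "poor to fair" band.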
A reliability analysis of the revised competitiveness index.

PubMed

Harris, Paul B; Houston, John M

2010-06-01

This study examined the reliability of the Revised Competitiveness Index by investigating the test-retest reliability, interitem reliability, and factor structure of the measure based on a sample of 280 undergraduates (200 women, 80 men) ranging in age from 18 to 28 years (M = 20.1, SD = 2.1). The findings indicate that the Revised Competitiveness Index has high test-retest reliability, high inter-item reliability, and a stable factor structure. The results support the assertion that the Revised Competitiveness Index assesses competitiveness as a stable trait rather than a dynamic state.
Unified Parkinson's Disease Rating Scale-Motor Exam: inter-rater reliability of advanced practice nurse and neurologist assessments.

PubMed

Palmer, Janice L; Coats, Mary A; Roe, Catherine M; Hanko, Shelly M; Xiong, Chengjie; Morris, John C

2010-06-01

This paper is a report of a study to establish the inter-rater reliability of advanced practice nurse and neurologist neurological assessments which included ratings with the Unified Parkinson's Disease Rating Scale-Motor Exam. Around the world, advanced practice nurses are performing tasks once completed only by physicians. To promote consumer and provider confidence, it is important to establish that nurse and physician ratings using assessment tools are similar. In addition in research settings, when different raters are used, establishment of inter-rater reliability for study assessments is needed. Advanced practice nurses and neurologists independently recorded findings on neurological examinations of 46 participants in a study conducted between August 2007 and January 2008. An intraclass correlation coefficient was calculated to estimate overall agreement between the nurse and neurologist ratings. Agreement for individual items measured on a dichotomous scale was assessed by calculating Cohen's kappa. There was substantial agreement between advanced practice nurses and neurologists on the mean Unified Parkinson's Disease Rating Scale-Motor Exam ratings (intraclass correlation coefficient = 0.65) and the U.S. National Alzheimer's Coordinating Center Uniform Data Set neurological examination ratings of unremarkable findings (kappa = 0.74) and of gait disorder (kappa = 0.73). Moderate agreement (kappa = 0.53) was reached for the rating of whether all Unified Parkinson's Disease Rating Scale-Motor Exam items were normal. These findings are consistent with studies of the inter-rater agreement of the Unified Parkinson's Disease Rating Scale-Motor Exam and support the conduct of neurological assessments by advanced practice nurses.
Development of the Therapist Empathy Scale.

PubMed

Decker, Suzanne E; Nich, Charla; Carroll, Kathleen M; Martino, Steve

2014-05-01

Few measures exist to examine therapist empathy as it occurs in session. A 9-item observer rating scale, called the Therapist Empathy Scale (TES), was developed based on Watson's (1999) work to assess affective, cognitive, attitudinal, and attunement aspects of therapist empathy. The aim of this study was to evaluate the inter-rater reliability, internal consistency, and construct and criterion validity of the TES. Raters evaluated therapist empathy in 315 client sessions conducted by 91 therapists, using data from a multi-site therapist training trial (Martino et al., 2010) in Motivational Interviewing (MI). Inter-rater reliability (ICC = .87 to .91) and internal consistency (Cronbach's alpha = .94) were high. Confirmatory factor analyses indicated some support for single-factor fit. Convergent validity was supported by correlations between TES scores and MI fundamental adherence (r range .50 to .67) and competence scores (r range .56 to .69). Discriminant validity was indicated by negative or nonsignificant correlations between TES and MI-inconsistent behavior (r range .05 to -.33). The TES demonstrates excellent inter-rater reliability and internal consistency. Results indicate some support for a single-factor solution and convergent and discriminant validity. Future studies should examine the use of the TES to evaluate therapist empathy in different psychotherapy approaches and to determine the impact of therapist empathy on client outcome.
A comparison of the reliability of make versus break testing in measuring palmar abduction strength of the thumb.

PubMed

Lim, J X; Toh, R X; Chook, S K H; Sebastin, S J; Karjalainen, T

2014-06-01

Previous studies have established the role of quantitative measurements of palmar abduction strength of the thumb (PAST). This study compares the reliability of the 'make' versus the 'break' test in measuring PAST in healthy volunteers. In a 'make' test, the body part being tested is positioned at the start of its range of motion and the participant is asked to exert his/her maximal force. In a 'break' test, increasing force is applied to a body part after it has completed its range of motion, until the joint being tested gives way. PAST was measured in both hands in 100 healthy volunteers using a handheld device. Two examiners measured PAST using both the 'make' and 'break' test to determine inter-rater reliability. The tests were repeated in 30 volunteers 6 weeks after the initial testing to determine intra-rater reliability. Our results showed that the 'make' test has better inter- and intra-rater reliability.
RELIABILITY OF ANKLE-FOOT MORPHOLOGY, MOBILITY, STRENGTH, AND MOTOR PERFORMANCE MEASURES.

PubMed

Fraser, John J; Koldenhoven, Rachel M; Saliba, Susan A; Hertel, Jay

2017-12-01

Assessment of foot posture, morphology, intersegmental mobility, strength and motor control of the ankle-foot complex are commonly used clinically, but measurement properties of many assessments are unclear. To determine test-retest and inter-rater reliability, standard error of measurement, and minimal detectable change of morphology, joint excursion and play, strength, and motor control of the ankle-foot complex. Reliability study. 24 healthy, recreationally-active young adults without history of ankle-foot injury were assessed by two clinicians on two occasions, three to ten days apart. Measurement properties were assessed for foot morphology (foot posture index, total and truncated length, width, arch height), joint excursion (weight-bearing dorsiflexion, rearfoot and hallux goniometry, forefoot inclinometry, 1st metatarsal displacement) and joint play, strength (handheld dynamometry), and motor control rating during intrinsic foot muscle (IFM) exercises. Clinician order was randomized using a Latin Square. The clinicians performed independent examinations and did not confer on the findings for the duration of the study. Test-retest and inter-tester reliability and agreement were assessed using intraclass correlation coefficients (ICC2,k) and weighted kappa (Kw). Test-retest reliability ICC were as follows: morphology: .80-1.00, joint excursion: .58-.97, joint play: -.67 to .84, strength: .67-.92, IFM motor rating: Kw -.01 to .71. Inter-rater reliability ICC were as follows: morphology: .81-1.00, joint excursion: .32-.97, joint play: -1.06 to 1.00, strength: .53-.90, and IFM motor rating: Kw .02-.56. Measures of ankle-foot posture, morphology, joint excursion, and strength demonstrated fair to excellent test-retest and inter-rater reliability. Test-retest reliability for rating of perceived difficulty and motor performance was good to excellent for short-foot, toe-spread-out, and hallux exercises and poor to fair for lesser toe extension. Joint play measures had poor to fair reliability overall. The findings of this study should be considered when choosing methods of clinical assessment and outcome measures in practice and research.
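The abstract names the standard error of measurement (SEM) and minimal detectable change (MDC) without giving formulas. A common convention, assumed here rather than taken from the study, is SEM = SD × sqrt(1 − ICC) and MDC at 95% confidence = 1.96 × sqrt(2) × SEM. The sketch below applies that convention; the function names and example values are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC), a commonly used convention (assumed here)."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0.0 <= icc <= 1.0:
        # Restricting to 0..1 is an assumption of this sketch; coefficients
        # outside that range do not give an interpretable SEM.
        raise ValueError("this sketch expects an ICC between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """MDC = z * sqrt(2) * SEM; sqrt(2) reflects error in both measurements."""
    return z * math.sqrt(2.0) * sem


# Hypothetical example: a dorsiflexion measure with SD 4.0 degrees and ICC 0.90.
sem = standard_error_of_measurement(sd=4.0, icc=0.90)
mdc = minimal_detectable_change(sem)
print(round(sem, 2), round(mdc, 2))
```

Under this convention, a change between two visits smaller than the MDC cannot be distinguished from measurement error at the chosen confidence level.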
Reliability of digital ulcer definitions as proposed by the UK Scleroderma Study Group: A challenge for clinical trial design.

PubMed

Hughes, Michael; Tracey, Andrew; Bhushan, Monica; Chakravarty, Kuntal; Denton, Christopher P; Dubey, Shirish; Guiducci, Serena; Muir, Lindsay; Ong, Voon; Parker, Louise; Pauling, John D; Prabu, Athiveeraramapandian; Rogers, Christine; Roberts, Christopher; Herrick, Ariane L

2018-06-01

The reliability of clinician grading of systemic sclerosis-related digital ulcers has been reported to be poor to moderate at best, which has important implications for clinical trial design. The aim of this study was to examine the reliability of new proposed UK Scleroderma Study Group digital ulcer definitions among UK clinicians with an interest in systemic sclerosis. Raters graded (through a custom-built interface) 90 images (80 unique and 10 repeat) of a range of digital lesions collected from patients with systemic sclerosis. Lesions were graded on an ordinal scale of severity: 'no ulcer', 'healed ulcer' or 'digital ulcer'. A total of 23 clinicians - 18 rheumatologists, 3 dermatologists, 1 hand surgeon and 1 specialist rheumatology nurse - completed the study. A total of 2070 (1840 unique + 230 repeat) image gradings were obtained. For intra-rater reliability, across all images, the overall weighted kappa coefficient was high (0.71) and was moderate (0.55) when averaged across individual raters. Overall inter-rater reliability was poor (0.15). Although our proposed digital ulcer definitions had high intra-rater reliability, the overall inter-rater reliability was poor. Our study highlights the challenges of digital ulcer assessment by clinicians with an interest in systemic sclerosis and provides a number of useful insights for future clinical trial design. Further research is warranted to improve the reliability of digital ulcer definition/rating as an outcome measure in clinical trials, including examining the role for objective measurement techniques, and the development of digital ulcer patient-reported outcome measures.
Surgeon Reliability for the Assessment of Lumbar Spinal Stenosis on MRI: The Impact of Surgeon Experience.

PubMed

Marawar, Satyajit V; Madom, Ian A; Palumbo, Mark; Tallarico, Richard A; Ordway, Nathaniel R; Metkar, Umesh; Wang, Dongliang; Green, Adam; Lavelle, William F

2017-01-01

The treating surgeon's visual assessment of axial MRI images to ascertain the degree of stenosis has a critical impact on surgical decision-making. The purpose of this study was to prospectively analyze the impact of surgeon experience on inter-observer and intra-observer reliability of assessing severity of spinal stenosis on MRIs by spine surgeons directly involved in surgical decision-making. Seven fellowship-trained spine surgeons reviewed MRI studies of 30 symptomatic patients with lumbar stenosis and graded the stenosis in the central canal, the lateral recess and the foramen at T12-L1 to L5-S1 as none, mild, moderate or severe. No specific instructions were provided as to what constituted mild, moderate, or severe stenosis. Two surgeons were "senior" (>fifteen years of practice experience), two were "intermediate" (>four years of practice experience), and three were "junior" (<one year of practice experience). The concordance correlation coefficient (CCC) was calculated to assess inter-observer reliability. Seven MRI studies were duplicated and randomly re-read to evaluate intra-observer reliability. Surgeon experience was found to be a strong predictor of inter-observer reliability. Inter-observer reliability in the senior group was significantly higher than in the junior group when assessing central (p<0.001), foraminal (p=0.005) and lateral recess (p=0.001) stenosis. The senior group also showed significantly higher inter-observer reliability than the intermediate group when assessing foraminal stenosis (p=0.036). For intra-observer reliability, the results were contrary to those found for inter-observer reliability. Inter-observer reliability of assessing stenosis on MRIs increases with surgeon experience. Lower intra-observer reliability values among the senior group, although not clearly explained, may be due to the small number of MRIs evaluated and the quality of the MRI images. Level of evidence: Level 3.
Interobserver Reliability of the Respiratory Physical Examination in Premature Infants: A Multicenter Study

PubMed Central

Jensen, Erik A.; Panitch, Howard; Feng, Rui; Moore, Paul E.; Schmidt, Barbara

2017-01-01

Objective: To measure the inter-rater reliability of 7 visual and 3 auscultatory respiratory physical examination findings at 36–40 weeks’ postmenstrual age in infants born less than 29 weeks’ gestation. Physicians also estimated the probability that each infant would remain hospitalized for 3 months after the examination or be readmitted for a respiratory illness during that time. Study design: Prospective, multicenter, inter-rater reliability study using standardized audio-video recordings of respiratory physical examinations. Results: We recorded the respiratory physical examination of 30 infants at 2 centers and invited 32 physicians from 9 centers to review the examinations. The intraclass correlation values for physician agreement ranged from 0.73 (95% CI 0.57–0.85) for subcostal retractions to 0.22 (95% CI 0.11–0.41) for expiratory abdominal muscle use. Eight (27%) infants remained hospitalized or were readmitted within 3 months after the examination. The area under the receiver operating characteristic curve for prediction of this outcome was 0.82 (95% CI 0.78–0.86). Physician predictive accuracy was greater for infants receiving supplemental oxygen (0.90, 95% CI 0.86–0.95) compared with those breathing in room air (0.71, 95% CI 0.66–0.75). Conclusions: Physicians often do not agree on respiratory physical examination findings in premature infants. Physician prediction of short-term respiratory morbidity was more accurate for infants receiving supplemental oxygen compared with those breathing in room air. PMID:27567413
The inter and intra rater reliability of the Netball Movement Screening Tool.

PubMed

Reid, Duncan A; Vanweerd, Rebecca J; Larmer, Peter J; Kingstone, Rachel

2015-05-01

To establish the inter- and intra-rater reliability of the Netball Movement Screening Tool for screening adolescent female netball players. Inter- and intra-rater reliability study. Forty secondary school netball players were recruited to take part in the study. Twenty subjects were screened simultaneously and independently by two raters to ascertain inter-rater agreement. Twenty subjects were scored by rater one on two occasions, separated by a week, to ascertain intra-rater agreement. Inter- and intra-rater agreement was assessed utilising the two-way mixed intraclass correlation coefficient and weighted kappa statistics. No significant demographic differences were found between the inter- and intra-rater groups of subjects. Intraclass correlation coefficients demonstrated excellent inter-rater (two-way mixed intraclass correlation coefficient 0.84, standard error of measurement 0.25) and intra-rater (two-way mixed intraclass correlation coefficient 0.96, standard error of measurement 0.13) reliability for the overall Netball Movement Screening Tool score and substantial-excellent inter-rater (two-way mixed intraclass correlation coefficients 1.0-0.65) and substantial-excellent intra-rater (two-way mixed intraclass correlation coefficients 0.96-0.79) reliability for the component scores of the Netball Movement Screening Tool. Kappa statistics showed substantial to poor inter-rater (k=0.75-0.32) and intra-rater (k=0.77-0.27) agreement for individual tests of the NMST. The Netball Movement Screening Tool may be a reliable screening tool for adolescent netball players; however, the individual test scores have low reliability. The screening tool can be administered reliably by raters with similar levels of training in the tool but variable clinical experience. Ongoing research needs to be undertaken to ascertain whether the Netball Movement Screening Tool is a valid tool in ascertaining increased injury risk for netball players.
Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
The evaluation of lumbar multifidus muscle function via palpation: reliability and validity of a new clinical test.

PubMed

Hebert, Jeffrey J; Koppenhaver, Shane L; Teyhen, Deydre S; Walker, Bruce F; Fritz, Julie M

2015-06-01

The lumbar multifidus muscle provides an important contribution to lumbar spine stability, and the restoration of lumbar multifidus function is a frequent goal of rehabilitation. Currently, there are no reliable and valid physical examination procedures available to assess lumbar multifidus function among patients with low back pain. To examine the inter-rater reliability and concurrent validity of the multifidus lift test (MLT) to identify lumbar multifidus dysfunction among patients with low back pain. A cross-sectional analysis of reliability and concurrent validity performed in a university outpatient research facility. Thirty-two persons aged 18 to 60 years with current low back pain and a minimum modified Oswestry disability score of 20%. Study participants were excluded if they reported a history of lumbar spine surgery, lumbar radiculopathy, medical red flags, osteoporosis, or had recently been treated with spinal manipulation or trunk stabilization exercises. Concurrent measures of lumbar multifidus muscle function at the L4-L5 and L5-S1 levels were obtained with the MLT (index test) and real-time ultrasound imaging (reference standard). The inter-rater reliability of the MLT was examined by measuring the level of agreement between two blinded examiners. Concurrent validity of the MLT was investigated by comparing clinicians' judgments with real-time ultrasound imaging measures of lumbar multifidus function. Inter-rater reliability of the MLT was substantial to excellent (κ=0.75 to 0.81, p≤.01) and free from errors of bias and prevalence. When performed at L4-L5 or L5-S1, the MLT demonstrated evidence of concurrent validity through its relationship with the reference standard results at L4-L5 (rbis=0.59-0.73, p≤.01). The MLT generally failed to demonstrate a relationship with the reference standard results from the L5-S1 level. 
Our results provide preliminary evidence supporting the reliability and validity of the MLT to assess lumbar multifidus function at the L4-L5 spinal level. Additional research examining the measurement properties and utility of this test should be undertaken before confident implementation with patients. Copyright © 2015 Elsevier Inc. All rights reserved.
Inter-rater and intra-rater reliability of the Bahasa Melayu version of Rose Angina Questionnaire.

PubMed

Hassan, N B; Choudhury, S R; Naing, L; Conroy, R M; Rahman, A R A

2007-01-01

The objective of the study was to translate the Rose Questionnaire (RQ) into a Bahasa Melayu version, adapt it cross-culturally, and measure its inter-rater and intra-rater reliability. This cross-sectional study was conducted in the respondents' homes or workplaces in Kelantan, Malaysia. One hundred respondents aged 30 and above with different socio-demographic status were interviewed for face validity. For each of the inter-rater and intra-rater reliability assessments, a sample of 150 respondents was interviewed. Inter-rater and intra-rater reliabilities were assessed by Cohen's kappa. The overall inter-rater agreement by the five pairs of interviewers at points one and two was 0.86, and the intra-rater reliability by the five interviewers on the seven-item questionnaire at points one and two was 0.88, as measured by the kappa coefficient. The translated Malay version of the RQ demonstrated almost perfect inter-rater and intra-rater reliability; further validation of this translated questionnaire, such as sensitivity and specificity analysis, is highly recommended.
Considerations in the use of reflective writing for student assessment: issues of reliability and validity.

PubMed

Moniz, Tracy; Arntfield, Shannon; Miller, Kristina; Lingard, Lorelei; Watling, Chris; Regehr, Glenn

2015-09-01

Reflective writing is a popular tool to support the growth of reflective capacity in undergraduate medical learners. Its popularity stems from research suggesting that reflective capacity may lead to improvements in skills such as empathy, communication, collaboration and professionalism. This has led to assumptions that reflective writing can also serve as a tool for student assessment. However, evidence to support the reliability and validity of reflective writing as a meaningful assessment strategy is lacking. Using a published instrument for measuring 'reflective capacity' (the Reflection Evaluation for Learners' Enhanced Competencies Tool [REFLECT]), four trained raters independently scored four samples of writing from each of 107 undergraduate medical students to determine the reliability of reflective writing scores. REFLECT scores were then correlated with scores on a Year 4 objective structured clinical examination (OSCE) and Year 2 multiple-choice question (MCQ) examinations to examine, respectively, convergent and divergent validity. Across four writing samples, four-rater Cronbach's α-values ranged from 0.72 to 0.82, demonstrating reasonable inter-rater reliability with four raters using the REFLECT rubric. However, inter-sample reliability was fairly low (four-sample Cronbach's α = 0.54, single-sample intraclass correlation coefficient: 0.23), which suggests that performance on one reflective writing sample was not strongly indicative of performance on the next. Approximately 14 writing samples are required to achieve reasonable inter-sample reliability. The study found weak, non-significant correlations between reflective writing scores and both OSCE global scores (r = 0.13) and MCQ examination scores (r = 0.10), demonstrating a lack of relationship between reflective writing and these measures of performance. 
Our findings suggest that to draw meaningful conclusions about reflective capacity as a stable construct in individuals requires 14 writing samples per student, each assessed by four or five raters. This calls into question the feasibility and utility of using reflective writing rigorously as an assessment tool in undergraduate medical education. © 2015 John Wiley & Sons Ltd.
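The link between the single-sample intraclass correlation (0.23), the four-sample α (0.54) and the figure of 14 writing samples can be reproduced with the Spearman-Brown prophecy formula. The sketch below is illustrative only: the abstract does not name the formula or the reliability threshold, so both the use of Spearman-Brown and the target of 0.80 are assumptions, chosen because they reproduce the reported numbers. The function names are hypothetical.

```python
import math


def spearman_brown(single_reliability, n):
    """Predicted reliability of the average of n parallel measurements."""
    r = single_reliability
    return n * r / (1 + (n - 1) * r)


def samples_needed(single_reliability, target):
    """Smallest number of measurements whose predicted reliability reaches the target."""
    r = single_reliability
    return math.ceil(target * (1 - r) / (r * (1 - target)))


single_sample_icc = 0.23  # reported in the abstract
assumed_target = 0.80     # assumption: threshold for "reasonable" reliability

four_sample = spearman_brown(single_sample_icc, 4)
n_required = samples_needed(single_sample_icc, assumed_target)

print(round(four_sample, 2))  # 0.54, matching the reported four-sample alpha
print(n_required)             # 14, matching the reported number of samples
```

With a lower assumed target the required number of samples drops, which is why the threshold matters when reading the "14 samples" conclusion.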

Indices of Paraspinal Muscles Degeneration: Reliability and Association With Facet Joint Osteoarthritis: Feasibility Study.

PubMed

Kalichman, Leonid; Klindukhov, Alexander; Li, Ling; Linov, Lina

2016-11-01

A reliability and cross-sectional observational study. To introduce a scoring system for visible fat infiltration in paraspinal muscles; to evaluate intertester and intratester reliability of this system and its relationship with indices of muscle density; to evaluate the association between indices of paraspinal muscle degeneration and facet joint osteoarthritis. Current evidence suggests that the paraspinal muscles degeneration is associated with low back pain, facet joint osteoarthritis, spondylolisthesis, and degenerative disc disease. However, the evaluation of paraspinal muscles on computed tomography is not radiological routine, probably because of absence of simple and reliable indices of paraspinal degeneration. One hundred fifty consecutive computed tomography scans of the lower back (N=75) or abdomen (N=75) were evaluated. Mean radiographic density (in Hounsfield units) and SD of the density of multifidus and erector spinae were evaluated at the L4-L5 spinal level. A new index of muscle degeneration, radiographic density ratio=muscle density/SD of density, was calculated. To evaluate the visible fat infiltration in paraspinal muscles, we proposed a 3-graded scoring system. The prevalence of facet joint osteoarthritis was also evaluated. Intraclass correlation and κ statistics were used to evaluate inter-rater and intra-rater reliability. Logistic regression examined the association between paraspinal muscle indices and facet joint osteoarthritis. Intra-rater reliability for fat infiltration score (κ) ranged between 0.87 and 0.92; inter-rater reliability between 0.70 and 0.81. Intra-rater reliability (intraclass correlation) for mean density of paraspinal muscles ranged between 0.96 and 0.99, inter-rater reliability between 0.95 and 0.99; SD intra-rater reliability ranged between 0.82 and 0.91, inter-rater reliability between 0.80 and 0.89. 
Significant associations (P<0.01) were found between facet joint osteoarthritis, fat infiltration score, and radiographic density ratio. Two suggested indices of paraspinal muscle degeneration showed excellent reliability and were significantly associated with facet joint osteoarthritis. Additional studies are needed to evaluate the associations with other spinal degeneration features and low back pain.
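The radiographic density ratio defined above (mean muscle density divided by the SD of density) is straightforward to compute from the Hounsfield-unit values in a region of interest. The following Python sketch assumes the sample standard deviation and uses made-up pixel values; the abstract does not say which SD estimator was used, and the function name is hypothetical.

```python
from statistics import mean, stdev


def radiographic_density_ratio(hu_values):
    """Mean density divided by the SD of density for one region of interest (values in HU)."""
    if len(hu_values) < 2:
        raise ValueError("at least two density values are needed to estimate the SD")
    return mean(hu_values) / stdev(hu_values)  # assumption: sample SD (n - 1 denominator)


# Hypothetical Hounsfield-unit readings from a multifidus region of interest.
roi = [42, 55, 38, 61, 47, 50, 44, 58]
print(radiographic_density_ratio(roi))
```

A muscle with heavier fat infiltration would be expected to show a lower mean density and a larger spread, and therefore a lower ratio.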
Reliability of rehabilitative ultrasonographic imaging for muscle thickness measurement of the rhomboid major.

PubMed

Jeong, Ju Ri; Ko, Young Jun; Ha, Hyun Geun; Lee, Wan Hee

2016-03-01

This study aimed to establish the inter-rater and intra-rater reliability of the rehabilitative ultrasonographic imaging (RUSI) technique for muscle thickness measurement of the rhomboid major at rest and with the shoulder abducted to 90°. Twenty-four young adults (eight men, 16 women; right-handed; mean age [±SD], 24·4 years [±2·6]) with no history of neck, shoulder, or arm pain were recruited. Rhomboid major muscle images were obtained in the resting position and with the shoulder in 90° abduction using an ultrasonography system with a 7·5-MHz linear transducer. In these two positions, the examiners found the site at which the transducer could be placed. Two examiners obtained the images of all participants in three test sessions at random. Intraclass correlation coefficients (ICC) were used to estimate reliability. All ICCs (95% CI) were >0·75, ranging from 0·93 to 0·98, which indicates good reliability. The ICCs for inter-rater reliability ranged from 0·75 to 0·94. For the absolute value of the difference in the intra-examiner reliability between the right and left ratios, the ICCs ranged from 0·58 to 0·91. In this study, the intra- and inter-examiner reliability of muscle thickness measurements of the rhomboid major was good. Therefore, we suggest that muscle thickness measurements of the rhomboid major obtained with the RUSI technique would be useful for clinical rehabilitative assessment. © 2014 Scandinavian Society of Clinical Physiology and Nuclear Medicine. Published by John Wiley & Sons Ltd.
Reliability of the Functional Mobility Scale for Children with Cerebral Palsy

ERIC Educational Resources Information Center

Harvey, Adrienne R.; Morris, Meg E.; Graham, H. Kerr; Wolfe, Rory; Baker, Richard

2010-01-01

This study examined inter-rater reliability of the Functional Mobility Scale (FMS) for children with cerebral palsy (CP) and the presence of rater bias. A consecutive sample of 118 children with CP, 2-18 years old (mean 10.3 years, SD 3.6), was recruited from a hospital setting. Children were classified using the gross motor function…
Reliability of different methodologies of infrared image analysis of myofascial trigger points in the upper trapezius muscle

PubMed Central

Dibai-Filho, Almir V.; Guirro, Elaine C. O.; Ferreira, Vânia T. K.; Brandino, Hugo E.; Vaz, Maíta M. O. L. L.; Guirro, Rinaldo R. J.

2015-01-01

BACKGROUND: Infrared thermography is recognized as a viable method for evaluation of subjects with myofascial pain. OBJECTIVE: The aim of the present study was to assess the intra- and inter-rater reliability of infrared image analysis of myofascial trigger points in the upper trapezius muscle. METHOD: A reliability study was conducted with 24 volunteers of both genders (23 females) between 18 and 30 years of age (22.12±2.54), all having cervical pain and presence of active myofascial trigger point in the upper trapezius muscle. Two trained examiners performed analysis of point, line, and area of the infrared images at two different periods with a 1-week interval. The intra-class correlation coefficient (ICC2,1) was used to assess the intra- and inter-rater reliability. RESULTS: With regard to the intra-rater reliability, ICC values were between 0.591 and 0.993, with temperatures between 0.13 and 1.57 °C for values of standard error of measurement (SEM) and between 0.36 and 4.35 °C for the minimal detectable change (MDC). For the inter-rater reliability, ICC ranged from 0.615 to 0.918, with temperatures between 0.43 and 1.22 °C for the SEM and between 1.19 and 3.38 °C for the MDC. CONCLUSION: The methods of infrared image analyses of myofascial trigger points in the upper trapezius muscle employed in the present study are suitable for clinical and research practices. PMID:25993626
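The SEM and MDC ranges reported above are consistent with the conventional relationship MDC95 = 1.96 × √2 × SEM. The Python sketch below checks this against the reported endpoints. The abstract does not state which formulas the authors used, so the formulas and function names here are assumptions; the SEM-from-ICC helper is included only for completeness, since the abstract gives no standard deviations.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and the ICC."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2) * sem


# Reported SEM endpoints (degrees C): intra-rater 0.13 and 1.57, inter-rater 0.43 and 1.22.
for sem in (0.13, 1.57, 0.43, 1.22):
    print(sem, round(mdc95(sem), 2))
# prints 0.36, 4.35, 1.19 and 3.38, the MDC endpoints given in the abstract
```

Because the MDC is a fixed multiple of the SEM, any method whose SEM exceeds about 0.36 °C cannot detect a 1 °C change with 95% confidence under these assumptions.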
Dental students consistency in applying the ICDAS system within paediatric dentistry.

PubMed

Foley, J I

2012-12-01

To examine dental students' consistency in utilising the International Caries Detection and Assessment System (ICDAS) one and three months after training. A prospective study. All clinical dental students (Year Two: BDS2; Year Three: BDS3; Year Four: BDS4) as part of their education in Paediatric Dentistry at Aberdeen Dental School (n = 56) received baseline training by two "gold-standard" examiners and were advised to complete the 90-minute ICDAS e-learning program. Study One: One month later, the occlusal surface of 40 extracted primary and permanent molar teeth were examined and assigned both a caries (0-6 scale) and restorative code (0-9 scale). Study Two: The same teeth were examined three months later. Kappa statistics were used to determine inter- and intra-examiner reliability at baseline and after three months. In total, 31 students (BDS2: n = 9; BDS3: n = 8; BDS4: n = 14) completed both examinations. The inter-examiner reliability kappa scores for restoration codes for Study One and Study Two were: BDS2: 0.47 and 0.38; BDS3: 0.61 and 0.52 and BDS4: 0.56 and 0.52. The caries scores for the two studies were: BDS2: 0.31 and 0.20; BDS3: 0.45 and 0.32 and BDS4: 0.35 and 0.34. The intra-examiner reliability range for restoration codes were: BDS2: 0.20 to 0.55; BDS3: 0.34 to 0.72 and BDS4: 0.28 to 0.80. The intra-examiner reliability range for caries codes were: BDS2: 0.35 to 0.62; BDS3: 0.22 to 0.53 and BDS4: 0.22 to 0.65. The consistency of ICDAS codes varied between students and also, between year groups. In general, consistency was greater for restoration codes.
Unified Parkinson’s Disease Rating Scale-Motor Exam: Inter-rater reliability of advanced practice nurse and neurologist assessments

PubMed Central

Palmer, Janice L.; Coats, Mary A.; Roe, Catherine M.; Hanko, Shelly M.; Xiong, Chengjie; Morris, John C.

2010-01-01

Aim: This paper is a report of a study to establish the inter-rater reliability of advanced practice nurse and neurologist neurological assessments which included ratings with the Unified Parkinson’s Disease Rating Scale-Motor Exam. Background: Around the world, advanced practice nurses are performing tasks once completed by only physicians. To promote consumer and provider confidence, it is important to establish that nurse and physician ratings using assessment tools are similar. In addition in research settings, when different raters are used, establishment of inter-rater reliability for study assessments is needed. Method: Advanced practice nurses and neurologists independently recorded findings on neurological examinations of 46 participants in a study conducted between August 2007 and January 2008. An intraclass correlation coefficient was calculated to estimate overall agreement between the nurse and neurologist ratings. Agreement for individual items measured on a dichotomous scale was assessed by calculating Cohen’s kappa. Results: There was substantial agreement between advanced practice nurses and neurologists on the mean Unified Parkinson’s Disease Rating Scale-Motor Exam ratings (intraclass correlation coefficient = 0.65) and the U.S. National Alzheimer’s Coordinating Center Uniform Data Set neurological examination ratings of unremarkable findings (kappa = 0.74) and of gait disorder (kappa = 0.73). Moderate agreement (kappa = 0.53) was reached for the rating of whether all Unified Parkinson’s Disease Rating Scale-Motor Exam items were normal. Conclusion: These findings are consistent with studies of the inter-rater agreement of the Unified Parkinson’s Disease Rating Scale-Motor Exam and support the conduct of neurological assessments by advanced practice nurses. PMID:20546368
Validation of the Spanish adaptation of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V).

PubMed

Núñez-Batalla, Faustino; Morato-Galán, Marta; García-López, Isabel; Ávila-Menéndez, Arántzazu

2015-01-01

The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) was developed to promote a standardised approach to evaluating and documenting auditory perceptual judgments of vocal quality. This tool was originally developed in English, and no Spanish version yet exists. The aim of this study was to develop a Spanish adaptation of CAPE-V and to examine the reliability and empirical validity of this Spanish version. To adapt the CAPE-V protocol to the Spanish language, we proposed 6 phrases phonetically designed according to the CAPE-V requirements. Prospective instrument validation was performed. The validity of the Spanish version of the CAPE-V was examined in 4 ways: intra-rater reliability, inter-rater reliability and CAPE-V versus GRABS judgments. Inter-rater reliability coefficients for the CAPE-V ranged from 0.93 for overall severity to 0.54 for intensity; intra-rater reliability ranged from 0.98 for overall severity to 0.85 for intensity. The comparison of judgments between GRABS and CAPE-V ranged from 0.86 for overall severity to 0.61 for breathiness. The present study supports the use of the Spanish version of CAPE-V because of its validity and reliability. Copyright © 2014 Elsevier España, S.L.U. and Sociedad Española de Otorrinolaringología y Patología Cérvico-Facial. All rights reserved.
Interobserver Reliability of the Respiratory Physical Examination in Premature Infants: A Multicenter Study.

PubMed

Jensen, Erik A; Panitch, Howard; Feng, Rui; Moore, Paul E; Schmidt, Barbara

2016-11-01

To measure the inter-rater reliability of 7 visual and 3 auscultatory respiratory physical examination findings at 36-40 weeks' postmenstrual age in infants born less than 29 weeks' gestation. Physicians also estimated the probability that each infant would remain hospitalized for 3 months after the examination or be readmitted for a respiratory illness during that time. Prospective, multicenter, inter-rater reliability study using standardized audio-video recordings of respiratory physical examinations. We recorded the respiratory physical examination of 30 infants at 2 centers and invited 32 physicians from 9 centers to review the examinations. The intraclass correlation values for physician agreement ranged from 0.73 (95% CI 0.57-0.85) for subcostal retractions to 0.22 (95% CI 0.11-0.41) for expiratory abdominal muscle use. Eight (27%) infants remained hospitalized or were readmitted within 3 months after the examination. The area under the receiver operating characteristic curve for prediction of this outcome was 0.82 (95% CI 0.78-0.86). Physician predictive accuracy was greater for infants receiving supplemental oxygen (0.90, 95% CI 0.86-0.95) compared with those breathing in room air (0.71, 95% CI 0.66-0.75). Physicians often do not agree on respiratory physical examination findings in premature infants. Physician prediction of short-term respiratory morbidity was more accurate for infants receiving supplemental oxygen compared with those breathing in room air. Copyright © 2016 Elsevier Inc. All rights reserved.
Reliability of segmental accelerations measured using a new wireless gait analysis system.

PubMed

Kavanagh, Justin J; Morrison, Steven; James, Daniel A; Barrett, Rod

2006-01-01

The purpose of this study was to determine the inter- and intra-examiner reliability, and stride-to-stride reliability, of an accelerometer-based gait analysis system which measured 3D accelerations of the upper and lower body during self-selected slow, preferred and fast walking speeds. Eight subjects attended two testing sessions in which accelerometers were attached to the head, neck, lower trunk, and right shank. In the initial testing session, two different examiners attached the accelerometers and performed the same testing procedures. A single examiner repeated the procedure in a subsequent testing session. All data were collected using a new wireless gait analysis system, which features near real-time data transmission via a Bluetooth network. Reliability for each testing condition (4 locations, 3 directions, 3 speeds) was quantified using a waveform similarity statistic known as the coefficient of multiple determination (CMD). CMDs ranged from 0.60 to 0.98 across all test conditions and were not significantly different for inter-examiner (0.86), intra-examiner (0.87), and stride-to-stride reliability (0.86). The highest repeatability for the effect of location, direction and walking speed was for the shank segment (0.94), the vertical direction (0.91) and the fast walking speed (0.91), respectively. Overall, these results indicate that a high degree of waveform repeatability was obtained using a new gait system under test-retest conditions involving single and dual examiners. Furthermore, differences in acceleration waveform repeatability associated with the reapplication of accelerometers were small in relation to normal motor variability.
Inter-rater and test–retest reliability of quality assessments by novice student raters using the Jadad and Newcastle–Ottawa Scales

PubMed Central

Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

2012-01-01

Introduction: Quality assessment of included studies is an important component of systematic reviews. Objective: The authors investigated inter-rater and test–retest reliability for quality assessments conducted by inexperienced student raters. Design: Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle–Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13–20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. Setting: McMaster Integrative Neuroscience Discovery and Study Program. Participants: 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. Main outcome measures: The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test–retest reliability using ICC(2,1). Results: Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI −0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were −0.14 (95% CI −0.28 to 0.00) to 0.39 (95% CI −0.02 to 0.81) for the NOS cohort and −0.20 (95% CI −0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case–control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test–retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively.
Corresponding ICC(2,1)s for the NOS cohort were −0.19 (95% CI −0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case–control, the ICC(2,1)s were 0.46 (95% CI −0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95). Conclusions: Inter-rater reliability was generally poor to fair and test–retest reliability was fair to excellent. A pilot rating phase following rater training may be one way to improve agreement. PMID:22855629
Inter-rater and test-retest reliability of quality assessments by novice student raters using the Jadad and Newcastle-Ottawa Scales.

PubMed

Oremus, Mark; Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

2012-01-01

Quality assessment of included studies is an important component of systematic reviews. The authors investigated inter-rater and test-retest reliability for quality assessments conducted by inexperienced student raters. Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle-Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13-20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. McMaster Integrative Neuroscience Discovery and Study Program. 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test-retest reliability using ICC(2,1). Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI -0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were -0.14 (95% CI -0.28 to 0.00) to 0.39 (95% CI -0.02 to 0.81) for the NOS cohort and -0.20 (95% CI -0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case-control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test-retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were -0.19 (95% CI -0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case-control, the ICC(2,1)s were 0.46 (95% CI -0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95). 
Inter-rater reliability was generally poor to fair and test-retest reliability was fair to excellent. A pilot rating phase following rater training may be one way to improve agreement.
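The ICC(2,1) named in the abstract above can be illustrated with a minimal sketch, assuming the usual two-way random-effects, absolute-agreement, single-rater form computed from ANOVA mean squares. The function name `icc_2_1` and the example scores are illustrative assumptions; they are not the authors' code or data.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one row per rated item, each row holding
    one score per rater (hypothetical layout chosen for this sketch).
    """
    n = len(scores)                      # number of rated items
    k = len(scores[0]) if scores else 0  # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 items and 2 raters, with no missing scores")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for items (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-item mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative scores only: 6 items rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
```

Because the rater mean square enters the denominator, systematic differences between raters lower this coefficient. The estimate turns negative whenever the residual mean square exceeds the between-item mean square, which is one way values such as the -0.19 reported above can arise.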
Inter- and Intrarater Reliability Using Different Software Versions of E4D Compare in Dental Education.

PubMed

Callan, Richard S; Cooper, Jeril R; Young, Nancy B; Mollica, Anthony G; Furness, Alan R; Looney, Stephen W

2015-06-01

The problems associated with intra- and interexaminer reliability when assessing preclinical performance continue to hinder dental educators' ability to provide accurate and meaningful feedback to students. Many studies have been conducted to evaluate the validity of utilizing various technologies to assist educators in achieving that goal. The purpose of this study was to compare two different versions of E4D Compare software to determine if either could be expected to deliver consistent and reliable comparative results, independent of the individual utilizing the technology. Five faculty members obtained E4D digital images of students' attempts (sample model) at ideal gold crown preparations for tooth #30 performed on typodont teeth. These images were compared to an ideal (master model) preparation utilizing two versions of E4D Compare software. The percent correlations between and within these faculty members were recorded and averaged. The intraclass correlation coefficient was used to measure both inter- and intrarater agreement among the examiners. The study found that using the older version of E4D Compare did not result in acceptable intra- or interrater agreement among the examiners. However, the newer version of E4D Compare, when combined with the Nevo scanner, resulted in a remarkable degree of agreement both between and within the examiners. These results suggest that consistent and reliable results can be expected when utilizing this technology under the protocol described in this study.
Utility and Reliability of an App for the System for Observing Play and Recreation in Communities (iSOPARC®)

ERIC Educational Resources Information Center

Santos, Maria P. M.; Rech, Cassiano R.; Alberico, Claudia O.; Fermino, Rogério C.; Rios, Ana P.; David, João; Reis, Rodrigo S.; Sarmiento, Olga L.; McKenzie, Thomas L.; Mota, Jorge

2016-01-01

The app for the System for Observing Play and Recreation in Communities (iSOPARC®) was developed to enhance System for Observing Play and Recreation in Communities data collection and management. The study aim was to examine the usability and inter-rater reliability of iSOPARC®. Trained observers collected data in 16 park areas in two Latin…
Inter-Observer Reliability of DSM-5 Substance Use Disorders

PubMed Central

Denis, Cécile M.; Gelernter, Joel; Hart, Amy B.; Kranzler, Henry R.

2015-01-01

Aims: Although studies have examined the impact of changes made in DSM-5 on the estimated prevalence of substance use disorder (SUD) diagnoses, there is limited evidence of the reliability of DSM-5 SUDs. We evaluated the inter-observer reliability of four DSM-5 SUDs in a sample in which we had previously evaluated the reliability of DSM-IV diagnoses, allowing us to compare the two systems. Methods: Two different interviewers each assessed 173 subjects over a 2-week period using the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA). Using the percent agreement and kappa (κ) coefficient, we examined the reliability of DSM-5 lifetime alcohol, opioid, cocaine, and cannabis use disorders, which we compared to that of SSADDA-derived DSM-IV SUD diagnoses. We also assessed the effect of additional lifetime SUD and lifetime mood or anxiety disorder diagnoses on the reliability of the DSM-5 SUD diagnoses. Results: Reliability was good to excellent for the four disorders, with κ values ranging from 0.65 to 0.94. Agreement was consistently lower for SUDs of mild severity than for moderate or severe disorders. DSM-5 SUD diagnoses showed greater reliability than DSM-IV diagnoses of abuse or dependence or dependence only. Co-occurring SUD and lifetime mood or anxiety disorders exerted a modest effect on the reliability of the DSM-5 SUD diagnoses. Conclusions: For alcohol, opioid, cocaine and cannabis use disorders, DSM-5 criteria and diagnoses are at least as reliable as those of DSM-IV. PMID:26048641
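The two statistics named in the Methods above, percent agreement and the kappa coefficient, can be sketched for a pair of interviewers as follows. This is a minimal illustration assuming unweighted Cohen's kappa on categorical diagnoses; the function names and the example ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Percentage of subjects on which the two interviewers agree."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both interviewers must rate the same non-empty set of subjects")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b) / 100.0
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Chance agreement from each interviewer's marginal category proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one category is ever used
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses (1 = disorder present, 0 = absent) for 10 subjects.
interviewer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
interviewer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
print(percent_agreement(interviewer_1, interviewer_2))
print(round(cohen_kappa(interviewer_1, interviewer_2), 2))
```

Reporting both figures is useful because percent agreement alone can look high when one category dominates, whereas kappa discounts the agreement expected by chance.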
Region of Interest Correction Factors Improve Reliability of Diffusion Imaging Measures Within and Across Scanners and Field Strengths

PubMed Central

Venkatraman, Vijay K; Gonzalez, Christopher E.; Landman, Bennett; Goh, Joshua; Reiter, David A.; An, Yang; Resnick, Susan M.

2017-01-01

Diffusion tensor imaging (DTI) measures are commonly used as imaging markers to investigate individual differences in relation to behavioral and health-related characteristics. However, the ability to detect reliable associations in cross-sectional or longitudinal studies is limited by the reliability of the diffusion measures. Several studies have examined reliability of diffusion measures within (i.e. intra-site) and across (i.e. inter-site) scanners with mixed results. Our study compares the test-retest reliability of diffusion measures within and across scanners and field strengths in cognitively normal older adults with a follow-up interval less than 2.25 years. Intra-class correlation (ICC) and coefficient of variation (CoV) of fractional anisotropy (FA) and mean diffusivity (MD) were evaluated in sixteen white matter and twenty-six gray matter bilateral regions. The ICC for intra-site reliability (0.32 to 0.96 for FA and 0.18 to 0.95 for MD in white matter regions; 0.27 to 0.89 for MD and 0.03 to 0.79 for FA in gray matter regions) and inter-site reliability (0.28 to 0.95 for FA in white matter regions, 0.02 to 0.86 for MD in gray matter regions) with longer follow-up intervals were similar to earlier studies using shorter follow-up intervals. The reliability of comparisons across field strengths was lower than intra- and inter-site reliability. Within- and across-scanner comparisons showed that diffusion measures were more stable in larger white matter regions (> 1500 mm³). For gray matter regions, the MD measure showed stability in specific regions and was not dependent on region size. Linear correction factors estimated from cross-sectional or longitudinal data improved the reliability across field strengths.
Our findings indicate that investigations relating diffusion measures to external variables must consider variable reliability across the distinct regions of interest and that correction factors can be used to improve consistency of measurement across field strengths. An important result of this work is that inter-scanner and field strength effects can be partially mitigated with linear correction factors specific to regions of interest. These data-driven linear correction techniques can be applied in cross-sectional or longitudinal studies. PMID:26146196
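The abstract above does not give the form of its region-specific linear correction, so the following is only a sketch under stated assumptions: the coefficient of variation is taken as the sample standard deviation divided by the mean, and the correction is assumed to be an ordinary least-squares line mapping one scanner's values for a region onto another's. All function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def fit_linear_correction(source, target):
    """Least-squares slope and intercept mapping `source` values onto `target`.

    Assumption for this sketch: one fit per region of interest, using paired
    measurements of the same participants on two scanners.
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need at least two paired measurements")
    mx, my = mean(source), mean(target)
    sxx = sum((x - mx) ** 2 for x in source)
    if sxx == 0:
        raise ValueError("source values must not all be identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(source, target)) / sxx
    return slope, my - slope * mx


def apply_correction(values, slope, intercept):
    """Rescale measurements from the source scanner to the target scale."""
    return [slope * v + intercept for v in values]


# Hypothetical FA values for one region, measured on two scanners.
scanner_a = [0.40, 0.45, 0.50, 0.55]
scanner_b = [0.9 * v + 0.02 for v in scanner_a]  # synthetic, exactly linear

slope, intercept = fit_linear_correction(scanner_a, scanner_b)
corrected = apply_correction(scanner_a, slope, intercept)
print(round(slope, 3), round(intercept, 3))
```

In real data the relationship would not be exactly linear, so the corrected values would only approximate the target scanner's values.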
Inter-rater and intra-rater reliability of a movement control test in shoulder.

PubMed

Rajasekar, S; Bangera, Rakshith K; Sekaran, Padmanaban

2017-07-01

Movement faults are commonly observed in patients with musculoskeletal pain. The Kinetic Medial Rotation Test (KMRT) is a movement control test used to identify movement faults of the scapula and gleno-humeral joints during arm movement. Objective tests such as the KMRT need to be reliable and valid for the results to be applied across different clinical settings and patient populations. The primary objective of the present study was to determine the intra-rater and inter-rater reliability of KMRT in subjects with and without shoulder pain. Sixty subjects were included in this study based on specific inclusion and exclusion criteria. Two musculoskeletal physiotherapists with different levels of clinical experience performed the tests. The intra-rater reliability was tested in twenty asymptomatic subjects by a single assessor at two-week intervals. An equal number of subjects with and without shoulder pain were tested by both the assessors to determine the inter-rater reliability. Both components of the KMRT, the Gleno-Humeral Anterior Translation (GHAT) and the Scapular Forward Tilt (SCFT), were tested. The Kappa values for inter-rater reliability of the GHAT and SCFT were K = 0.68 and K = 0.65 respectively in subjects with shoulder pain. In asymptomatic subjects, the inter-rater reliability of GHAT was K = 0.61 and SCFT was K = 0.85. Intra-rater reliability ranged from K = 0.66 for GHAT to K = 0.87 for SCFT. Our study found substantial agreement in inter-rater reliability of KMRT in subjects with shoulder pain, whereas substantial to near perfect agreement was found in intra-rater and inter-rater reliability of KMRT in subjects without shoulder pain. Copyright © 2017 Elsevier Ltd. All rights reserved.
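The verbal labels in the abstract above ("substantial", "near perfect") are consistent with the widely used Landis and Koch benchmarks for kappa, although the abstract does not say which benchmark was applied. The sketch below assumes those benchmarks; the function name `landis_koch_label` is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to its Landis and Koch (1977) descriptive band."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values reported in the abstract above.
for name, value in [("GHAT, shoulder pain", 0.68), ("SCFT, shoulder pain", 0.65),
                    ("GHAT, asymptomatic", 0.61), ("SCFT, asymptomatic", 0.85)]:
    print(name, value, landis_koch_label(value))
```

Under this assumption the four inter-rater values fall into the "substantial" band except the asymptomatic SCFT value, which falls into the top band, in line with the wording of the abstract.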
Development and inter-rater reliability of a standardized verbal instruction manual for the Chinese Geriatric Depression Scale-short form.

PubMed

Wong, M T P; Ho, T P; Ho, M Y; Yu, C S; Wong, Y H; Lee, S Y

2002-05-01

The Geriatric Depression Scale (GDS) is a common screening tool for elderly depression in Hong Kong. This study aimed at (1) developing a standardized manual for the verbal administration and scoring of the GDS-SF, and (2) comparing the inter-rater reliability between the standardized and non-standardized verbal administration of GDS-SF. Two studies were reported. In Study 1, the process of developing the manual was described. In Study 2, we compared the inter-rater reliabilities of GDS-SF scores using the standardized verbal instructions and the traditional non-standardized administration. Results of Study 2 indicated that the standardized procedure in verbal administration and scoring improved the inter-rater reliabilities of GDS-SF. Copyright 2002 John Wiley & Sons, Ltd.
Exploring Differences in Measurement and Reporting of Classroom Observation Inter-Rater Reliability

ERIC Educational Resources Information Center

Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca

2018-01-01

Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…
Inter-operator and inter-device agreement and reliability of the SEM Scanner.

PubMed

Clendenin, Marta; Jaradeh, Kindah; Shamirian, Anasheh; Rhodes, Shannon L

2015-02-01

The SEM Scanner is a medical device designed for use by healthcare providers as part of pressure ulcer prevention programs. The objective of this study was to evaluate the inter-rater and inter-device agreement and reliability of the SEM Scanner. Thirty-one (31) volunteers free of pressure ulcers or broken skin at the sternum, sacrum, and heels were assessed with the SEM Scanner. Each of three operators utilized each of three devices to collect readings from four anatomical sites (sternum, sacrum, left and right heels) on each subject for a total of 108 readings per subject collected over approximately 30 min. For each combination of operator-device-anatomical site, three SEM readings were collected. Inter-operator and inter-device agreement and reliability were estimated. Over the course of this study, more than 3000 SEM Scanner readings were collected. Agreement between operators was good with mean differences ranging from -0.01 to 0.11. Inter-operator and inter-device reliability exceeded 0.80 at all anatomical sites assessed. The results of this study demonstrate the high reliability and good agreement of the SEM Scanner across different operators and different devices. Given the limitations of current methods to prevent and detect pressure ulcers, the SEM Scanner shows promise as an objective, reliable tool for assessing the presence or absence of pressure-induced tissue damage such as pressure ulcers. Copyright © 2015 Bruin Biometrics, LLC. Published by Elsevier Ltd. All rights reserved.
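The reading counts quoted in the abstract above follow directly from the fully crossed design it describes. The short check below uses only the figures stated there; the variable names are illustrative.

```python
# Design figures as stated in the abstract.
operators = 3
devices = 3
anatomical_sites = 4          # sternum, sacrum, left heel, right heel
readings_per_combination = 3  # per operator-device-site combination
subjects = 31

readings_per_subject = operators * devices * anatomical_sites * readings_per_combination
planned_total = subjects * readings_per_subject

print(readings_per_subject, planned_total)  # prints "108 3348"
```

The planned total of 3348 is consistent with the statement that more than 3000 readings were collected; the abstract does not report the exact number obtained.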
Digital assessment of the fetal alcohol syndrome facial phenotype: reliability and agreement study.

PubMed

Tsang, Tracey W; Laing-Aiken, Zoe; Latimer, Jane; Fitzpatrick, James; Oscar, June; Carter, Maureen; Elliott, Elizabeth J

2017-01-01

To examine the three facial features of fetal alcohol syndrome (FAS) in a cohort of Australian Aboriginal children from two-dimensional digital facial photographs to: (1) assess intrarater and inter-rater reliability; (2) identify the racial norms with the best fit for this population; and (3) assess agreement with clinician direct measures. Photographs and clinical data for 106 Aboriginal children (aged 7.4-9.6 years) were sourced from the Lililwan Project. Fifty-eight per cent had a confirmed prenatal alcohol exposure and 13 (12%) met the Canadian 2005 criteria for FAS/partial FAS. Photographs were analysed using the FAS Facial Photographic Analysis Software to generate the mean PFL three-point ABC-Score, five-point lip and philtrum ranks and four-point face rank in accordance with the 4-Digit Diagnostic Code. Intrarater and inter-rater reliability of digital ratings was examined in two assessors. Caucasian or African American racial norms for PFL and lip thickness were assessed for best fit; and agreement between digital and direct measurement methods was assessed. Reliability of digital measures was substantial within (kappa: 0.70-1.00) and between assessors (kappa: 0.64-0.89). Clinician and digital ratings showed moderate agreement (kappa: 0.47-0.58). Caucasian PFL norms and the African American Lip-Philtrum Guide 2 provided the best fit for this cohort. In an Aboriginal cohort with a high rate of FAS, assessment of facial dysmorphology using digital methods showed substantial inter- and intrarater reliability. Digital measurement of features has high reliability and until data are available from a larger population of Aboriginal children, the African American Lip-Philtrum Guide 2 and Caucasian (Strömland) PFL norms provide the best fit for Australian Aboriginal children.

Reliability and Validity of Objective Measures of Physical Activity in Youth With Cerebral Palsy Who Are Ambulatory.

PubMed

O'Neil, Margaret E; Fragala-Pinkham, Maria; Lennon, Nancy; George, Ameeka; Forman, Jeffrey; Trost, Stewart G

2016-01-01

Physical therapy for youth with cerebral palsy (CP) who are ambulatory includes interventions to increase functional mobility and participation in physical activity (PA). Thus, reliable and valid measures are needed to document PA in youth with CP. The purpose of this study was to evaluate the inter-instrument reliability and concurrent validity of 3 accelerometer-based motion sensors with indirect calorimetry as the criterion for measuring PA intensity in youth with CP. Fifty-seven youth with CP (mean age=12.5 years, SD=3.3; 51% female; 49.1% with spastic hemiplegia) participated. Inclusion criteria were: aged 6 to 20 years, ambulatory, Gross Motor Function Classification System (GMFCS) levels I through III, able to follow directions, and able to complete the full PA protocol. Protocol activities included standardized activity trials with increasing PA intensity (resting, writing, household chores, active video games, and walking at 3 self-selected speeds), as measured by weight-relative oxygen uptake (in mL/kg/min). During each trial, participants wore bilateral accelerometers on the upper arms, waist/hip, and ankle and a portable indirect calorimeter. Intraclass correlation coefficients (ICCs) were calculated to evaluate inter-instrument reliability (left-to-right accelerometer placement). Spearman correlations were used to examine concurrent validity between accelerometer output (activity and step counts) and indirect calorimetry. Friedman analyses of variance with post hoc pair-wise analyses were conducted to examine the validity of accelerometers to discriminate PA intensity across activity trials. All accelerometers exhibited excellent inter-instrument reliability (ICC=.94-.99) and good concurrent validity (rho=.70-.85). All accelerometers discriminated PA intensity across most activity trials. This PA protocol consisted of controlled activity trials. Accelerometers provide valid and reliable measures of PA intensity among youth with CP.
© 2016 American Physical Therapy Association.
Modified personal interviews: resurrecting reliable personal interviews for admissions?

PubMed

Hanson, Mark D; Kulasegaram, Kulamakan Mahan; Woods, Nicole N; Fechtig, Lindsey; Anderson, Geoff

2012-10-01

Traditional admissions personal interviews provide flexible faculty-student interactions but are plagued by low inter-interview reliability. Axelson and Kreiter (2009) retrospectively showed that multiple independent sampling (MIS) may improve reliability of personal interviews; thus, the authors incorporated MIS into the admissions process for medical students applying to the University of Toronto's Leadership Education and Development Program (LEAD). They examined the reliability and resource demands of this modified personal interview (MPI) format. In 2010-2011, LEAD candidates submitted written applications, which were used to screen for participation in the MPI process. Selected candidates completed four brief (10-12 minutes) independent MPIs each with a different interviewer. The authors blueprinted MPI questions to (i.e., aligned them with) leadership attributes, and interviewers assessed candidates' eligibility on a five-point Likert-type scale. The authors analyzed inter-interview reliability using the generalizability theory. Sixteen candidates submitted applications; 10 proceeded to the MPI stage. Reliability of the written application components was 0.75. The MPI process had overall inter-interview reliability of 0.79. Correlation between the written application and MPI scores was 0.49. A decision study showed acceptable reliability of 0.74 with only three MPIs scored using one global rating. Furthermore, a traditional admissions interview format would take 66% more time than the MPI format. The MPI format, used during the LEAD admissions process, achieved high reliability with minimal faculty resources. The MPI format's reliability and effective resource use were possible through MIS and employment of expert interviewers. MPIs may be useful for other admissions tasks.
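The abstract above reports a decision study from generalizability theory. As a rough stand-in, the sketch below uses the Spearman-Brown prophecy formula, which gives the same projection as a decision study only in the simplest one-facet case; it is not the authors' analysis. It starts from the reported reliability of 0.79 for four interviews; the function names are hypothetical.

```python
def single_unit_reliability(composite_reliability, n_units):
    """Invert Spearman-Brown: reliability of one unit given that of n units."""
    return composite_reliability / (n_units - (n_units - 1) * composite_reliability)


def spearman_brown(unit_reliability, n_units):
    """Projected reliability of a composite built from n parallel units."""
    return n_units * unit_reliability / (1.0 + (n_units - 1) * unit_reliability)


# Reported in the abstract: four MPIs with overall reliability 0.79.
one_interview = single_unit_reliability(0.79, 4)
three_interviews = spearman_brown(one_interview, 3)
print(round(one_interview, 3), round(three_interviews, 3))
```

The projection for three interviews lands close to the 0.74 reported for the decision study, but that study also changed the scoring to a single global rating, so the similarity should be read as a plausibility check and not as a reproduction of the result.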
Is laser speckle contrast analysis (LASCA) the new kid on the block in systemic sclerosis? A systematic literature review and pilot study to evaluate reliability of LASCA to measure peripheral blood perfusion in scleroderma patients.

PubMed

Cutolo, Maurizio; Vanhaecke, Amber; Ruaro, Barbara; Deschepper, Ellen; Ickinger, Claudia; Melsens, Karin; Piette, Yves; Trombetta, Amelia Chiara; De Keyser, Filip; Smith, Vanessa

2018-06-06

A reliable tool to evaluate flow is paramount in systemic sclerosis (SSc). We describe a systematic literature review on the reliability of laser speckle contrast analysis (LASCA) to measure the peripheral blood perfusion (PBP) in SSc, together with an additional pilot study investigating the intra- and inter-rater reliability of LASCA. A systematic search was performed in 3 electronic databases, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. In the pilot study, 30 SSc patients and 30 healthy subjects (HS) underwent LASCA assessment. Intra-rater reliability was assessed by having a first anchor rater perform the measurements at 2 time-points, and inter-rater reliability by having the anchor rater and a team of second raters perform the measurements in 15 SSc patients and 30 HS. The measurements were repeated with a second anchor rater in the other 15 SSc patients, as external validation. Only 1 of the 14 records of interest identified through the systematic search was included in the final analysis. In the additional pilot study, the intra-class correlation coefficient (ICC) for intra-rater reliability of the first anchor rater was 0.95 in SSc and 0.93 in HS, and the ICC for inter-rater reliability was 0.97 in SSc and 0.93 in HS. Intra- and inter-rater reliability of the second anchor rater was 0.78 and 0.87. The identified literature regarding the reliability of LASCA measurements reports good to excellent inter-rater agreement. This pilot study confirmed the reliability of LASCA measurements, with good to excellent inter-rater agreement, and additionally found good to excellent intra-rater reliability. Furthermore, similar results were found in the external validation. Copyright © 2018. Published by Elsevier B.V.
The new GRID Hamilton Rating Scale for Depression demonstrates excellent inter-rater reliability for inexperienced and experienced raters before and after training.

PubMed

Tabuse, Hideaki; Kalali, Amir; Azuma, Hideki; Ozaki, Norio; Iwata, Nakao; Naitoh, Hiroshi; Higuchi, Teruhiko; Kanba, Shigenobu; Shioe, Kunihiko; Akechi, Tatsuo; Furukawa, Toshi A

2007-09-30

The Hamilton Rating Scale for Depression (HAMD) is the de facto international gold standard for the assessment of depression. There are some criticisms, however, especially with regard to its inter-rater reliability, due to the lack of standardized questions or explicit scoring procedures. The GRID-HAMD was developed to provide standardized explicit scoring conventions and a structured interview guide for administration and scoring of the HAMD. We developed the Japanese version of the GRID-HAMD and examined its inter-rater reliability among experienced and inexperienced clinicians (n=70), how rater characteristics may affect it, and how training can improve it in the course of a model training program using videotaped interviews. The results showed that the inter-rater reliability of the GRID-HAMD total score was excellent to almost perfect and those of most individual items were also satisfactory to excellent, both with experienced and inexperienced raters, and both before and after the training. With its standardized definitions, questions and detailed scoring conventions, the GRID-HAMD appears to be the best achievable set of interview guides for the HAMD and can provide a solid tool for highly reliable assessment of depression severity.
The development and reliability of a simple field based screening tool to assess core stability in athletes.

PubMed

O'Connor, S; McCaffrey, N; Whyte, E; Moran, K

2016-07-01

To adapt the trunk stability test to facilitate further sub-classification of higher levels of core stability in athletes for use as a screening tool. To establish the inter-tester and intra-tester reliability of this adapted core stability test. Reliability study. Collegiate athletic therapy facilities. Fifteen physically active male subjects (19.46 ± 0.63) free from any orthopaedic or neurological disorders were recruited from a convenience sample of collegiate students. The intraclass correlation coefficients (ICC) and 95% Confidence Intervals (CI) were computed to establish inter-tester and intra-tester reliability. Excellent ICC values were observed in the adapted core stability test for inter-tester reliability (0.97) and good to excellent intra-tester reliability (0.73-0.90). While the 95% CIs were narrow for inter-tester reliability, the 95% CIs for Testers A and C were wide compared to those for Tester B. The adapted core stability test developed in this study is a quick and simple field based test to administer that can further subdivide athletes with high levels of core stability. The test demonstrated high inter-tester and intra-tester reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Brief Report: "Quick and (Not So) Dirty" Assessment of Change in Autism--Cross-Cultural Reliability of the Developmental Disabilities CGAS and the OSU Autism CGI

ERIC Educational Resources Information Center

Choque Olsson, Nora; Bölte, Sven

2014-01-01

There are few evaluated economic tools to assess change in autism. This study examined the inter-rater reliability of the Developmental Disabilities Children's Global Assessment Scale (DD-CGAS), and the OSU Autism Clinical Global Impression (OSU Autism CGI) in a European setting. Using these scales, 16 clinicians with multidisciplinary…
Assessing the Reliability and Use of the Expository Scoring Scheme as a Measure of Developmental Change in Monolingual English and Bilingual French/English Children

ERIC Educational Resources Information Center

Bird, Elizabeth Kay-Raining; Joshi, Nila; Cleave, Patricia L.

2016-01-01

Purpose: The Expository Scoring Scheme (ESS) is designed to analyze the macrostructure of descriptions of a favorite game or sport. This pilot study examined inter- and intrarater reliability of the ESS and use of the scale to capture developmental change in elementary school children. Method: Twenty-four children in 2 language groups (monolingual…
The Critical Thinking Analytic Rubric (CTAR): Investigating Intra-Rater and Inter-Rater Reliability of a Scoring Mechanism for Critical Thinking Performance Assessments

ERIC Educational Resources Information Center

Saxton, Emily; Belanger, Secret; Becker, William

2012-01-01

The purpose of this study was to investigate the intra-rater and inter-rater reliability of the Critical Thinking Analytic Rubric (CTAR). The CTAR is composed of 6 rubric categories: interpretation, analysis, evaluation, inference, explanation, and disposition. To investigate inter-rater reliability, two trained raters scored four sets of…
Assessing the environmental characteristics of cycling routes to school: a study on the reliability and validity of a Google Street View-based audit.

PubMed

Vanwolleghem, Griet; Van Dyck, Delfien; Ducheyne, Fabian; De Bourdeaudhuij, Ilse; Cardon, Greet

2014-06-10

Google Street View provides a valuable and efficient alternative to on-site fieldwork for observing the physical environment. However, studies on the use, reliability and validity of Google Street View in a cycling-to-school context are lacking. We aimed to study the intra- and inter-rater reliability and criterion validity of EGA-Cycling (Environmental Google Street View Based Audit - Cycling to school), a newly developed audit using Google Street View to assess the physical environment along cycling routes to school. Parents (n = 52) of 11-to-12-year-old Flemish children, who mostly cycled to school, completed a questionnaire and identified their child's cycling route to school on a street map. Fifty cycling routes of 11-to-12-year-olds were identified and physical environmental characteristics along the identified routes were rated with EGA-Cycling (5 subscales; 37 items), based on Google Street View. To assess reliability, two researchers performed the audit. Criterion validity of the audit was examined by comparing the ratings based on Google Street View with ratings through on-site assessments. Intra-rater reliability was high (kappa range 0.47-1.00). Large variations in the inter-rater reliability (kappa range -0.03-1.00) and criterion validity scores (kappa range -0.06-1.00) were reported, with acceptable inter-rater reliability values for 43% of all items and acceptable criterion validity for 54% of all items. EGA-Cycling can be used to assess physical environmental characteristics along cycling routes to school. However, to assess the micro-environment specifically related to cycling, on-site assessments have to be added.
Reliability of joint count assessment in rheumatoid arthritis: a systematic literature review.

PubMed

Cheung, Peter P; Gossec, Laure; Mak, Anselm; March, Lyn

2014-06-01

Joint counts are central to the assessment of rheumatoid arthritis (RA) but reliability is an issue. To evaluate the reliability and agreement of joint counts (intra-observer and inter-observer) by health care professionals (physicians, nurses, and metrologists) and patients in RA, and the impact of training and standardization on joint count reliability through a systematic literature review. Articles reporting joint count reliability or agreement in RA in PubMed, EMBase, and the Cochrane library between 1960 and 2012 were selected. Data were extracted regarding tender joint counts (TJCs) and swollen joint counts (SJCs) derived by physicians, metrologists, or patients for intra-observer and inter-observer reliability. In addition, methods and effects of training or standardization were extracted. Statistics expressing reliability such as intraclass correlation coefficients (ICCs) were extracted. Data analysis was primarily descriptive due to high heterogeneity. Twenty-eight studies on health care professionals (HCP) and 20 studies on patients were included. Intra-observer reliability for TJCs and SJCs was good for HCPs and patients (range of ICC: 0.49-0.98). Inter-observer reliability between HCPs for TJCs was higher than for SJCs (range of ICC: 0.64-0.88 vs. 0.29-0.98). Patient inter-observer reliability with HCPs as comparators was better for TJCs (range of ICC: 0.31-0.91) compared to SJCs (0.16-0.64). Nine studies (7 with HCPs and 2 with patients) evaluated consensus or training, with improvement in reliability of TJCs but conflicting evidence for SJCs. Intra- and inter-observer reliability was high for TJCs for HCPs and patients: among all groups, reliability was better for TJCs than SJCs. Inter-observer reliability of SJCs was poorer for patients than HCPs. Data were inconclusive regarding the potential for training to improve SJC reliability. Overall, the results support further evaluation for patient-reported joint counts as an outcome measure. 
© 2013 Published by Elsevier Inc.
[Overall cognitive assessment in Basque-speaking people with advanced dementia. Validation to the Basque language of the Severe Mini-Mental State Examination SMMSE (SMMSE-eus)].

PubMed

Buiza, Cristina; Yanguas, Javier; Zulaica, Amaia; Antón, Iván; Arriola, Enrique; García, Alvaro

2018-04-13

The adaptation and validation of tests to assess advanced cognitive impairment in the Basque language is an unmet need for Basque-speaking people. The present work shows the validation of the Basque version of the Severe Mini Mental State Examination (SMMSE). A total of 109 people with advanced dementia (MEC<15) took part in the validation study, and were classified as GDS 5-7 on the Geriatric Depression Scale (GDS). All participants were Spanish-Basque bilingual. It was shown that SMMSE-eus has a high internal consistency (alpha=0.92), a good test-retest reliability (r=0.88; P<.01), and a high inter-rater reliability (ICC=0.99; P<.00) for the overall score, as well as for each item. Both the high internal consistency and inter-rater reliability, and to a lesser extent, test-retest reliability, make the SMMSE-eus a valid test for the brief assessment of cognitive status in Basque-speaking people with advanced dementia. For this reason, the SMMSE-eus is a usable and reliable alternative for assessing Basque-speaking people in their mother tongue or preferred language. Copyright © 2017 SEGG. Published by Elsevier España, S.L.U. All rights reserved.
Reliability of a two-wavelength autofluorescence technique by Heidelberg Spectralis to measure macular pigment optical density in Asian subjects.

PubMed

Obana, Akira; Gellermann, Werner; Gohto, Yuko; Seto, Takahiko; Sasano, Hiroyuki; Tanito, Masaki; Okazaki, Shigetoshi

2018-03-01

This study evaluates the accuracy of an objective two-wavelength fundus autofluorescence technique for the purpose of measuring the macular pigment optical density (MPOD) in Asian pigmented eyes. Potential differences between MPOD values obtained via autofluorescence technique and subjective heterochromatic photometry (HFP) were examined. Inter-examiner reproducibility between three examiners and test-retest reliability over five time points were also explored. Subjects were 27 healthy Japanese volunteers aged 24 to 58 (mean ± standard deviation, 40.2 ± 9.0) years. An MPOD module of the Spectralis MultiColor instrument configuration (Spectralis-MP) was used for the autofluorescence technique, and a Macular Metrics Densitometer (MM) was used for HFP. The mean MPOD values at 0.25° and 0.5° eccentricities using the Spectralis-MP were 0.51 ± 0.12 and 0.48 ± 0.13, respectively. In comparison, the MM based values were 0.72 ± 0.23 and 0.61 ± 0.25, respectively. High correlations between the Spectralis-MP and MM instrument were found (Pearson's correlation coefficients of 0.73 and 0.87 at 0.25° and 0.5° eccentricities, respectively), but there was a systematic bias: the MPOD values by MM method were significantly higher than those by Spectralis-MP at 0.25° eccentricity. High inter-examiner reproducibility and test-retest reliability were found for MM measurements at 0.5° eccentricity, but not at 0.25°. The Spectralis-MP showed less inter-examiner and test-retest variability than the MM instrument at 0.25° and 0.5° eccentricities. We conclude that the Spectralis-MP, given its high agreement with the HFP method and due to its higher reproducibility and reliability, is well suited for clinical measurements of MPOD levels in Asian pigmented eyes. Copyright © 2018. Published by Elsevier Ltd.
Evaluation of the Walking Index for Spinal Cord Injury II (WISCI-II) in children with Spinal Cord Injury (SCI).

PubMed

Calhoun Thielen, C; Sadowsky, C; Vogel, L C; Taylor, H; Davidson, L; Bultman, J; Gaughan, J; Mulcahey, M J

2017-05-01

Mixed methods were used in this study. The appropriateness of the levels of the Walking Index for Spinal Cord Injury II (WISCI-II) for application in children was critically reviewed by physical therapists using the Modified Delphi Technique, and the inter- and intra-rater reliability of the WISCI-II in children was evaluated. To examine the construct validity, and to establish reliability of the WISCI-II related to its use in children with spinal cord injury (SCI). United States of America. Using a Modified Delphi Technique, physical therapists critically reviewed the WISCI-II levels for pediatric utilization. Concurrently, ambulatory children under age 18 years with SCI were evaluated using the WISCI-II on two occasions by the same therapist to establish intra-rater reliability. One trial was photographed and de-identified. Each photograph was reviewed by four different physical therapists who gave WISCI-II scores to establish inter-rater reliability. Summary and descriptive statistics were used to calculate the frequency of yes/no responses for each WISCI-II level question and to determine the percent agreement for each question. Inter- and intra-rater reliability was calculated using intraclass correlation coefficients (ICCs) with 95% confidence intervals (CI). Construct validity was confirmed after one Delphi round during which at least 80% agreement was established by 51 physical therapists on the appropriateness of the WISCI-II levels for children. Fifty-two children with SCI aged 2-17 years completed repeated WISCI-II assessments and 40 de-identified photographs were scored by four physical therapists. Intra- and inter-rater reliability was high (ICC=0.997, CI=0.995-0.998 and ICC=0.97, CI=0.95-0.98, respectively). This study demonstrates support for the use of the WISCI-II in ambulatory children with SCI.
This study was funded by the Craig H Neilsen Foundation, Spinal Cord Injury Research on the Translation Spectrum, Senior Research Award #282592 (Mulcahey, PI).
Gait assessment using the Microsoft Xbox One Kinect: Concurrent validity and inter-day reliability of spatiotemporal and kinematic variables.

PubMed

Mentiplay, Benjamin F; Perraton, Luke G; Bower, Kelly J; Pua, Yong-Hao; McGaw, Rebekah; Heywood, Sophie; Clark, Ross A

2015-07-16

The revised Xbox One Kinect, also known as the Microsoft Kinect V2 for Windows, includes enhanced hardware which may improve its utility as a gait assessment tool. This study examined the concurrent validity and inter-day reliability of spatiotemporal and kinematic gait parameters estimated using the Kinect V2 automated body tracking system and a criterion reference three-dimensional motion analysis (3DMA) marker-based camera system. Thirty healthy adults performed two testing sessions consisting of comfortable and fast paced walking trials. Spatiotemporal outcome measures related to gait speed, speed variability, step length, width and time, foot swing velocity and medial-lateral and vertical pelvis displacement were examined. Kinematic outcome measures including ankle flexion, knee flexion and adduction and hip flexion were examined. To assess the agreement between Kinect and 3DMA systems, Bland-Altman plots, relative agreement (Pearson's correlation) and overall agreement (concordance correlation coefficients) were determined. Reliability was assessed using intraclass correlation coefficients, Cronbach's alpha and standard error of measurement. The spatiotemporal measurements had consistently excellent (r≥0.75) concurrent validity, with the exception of modest validity for medial-lateral pelvis sway (r=0.45-0.46) and fast paced gait speed variability (r=0.73). In contrast kinematic validity was consistently poor to modest, with all associations between the systems weak (r<0.50). In those measures with acceptable validity, the inter-day reliability was similar between systems. In conclusion, while the Kinect V2 body tracking may not accurately obtain lower body kinematic data, it shows great potential as a tool for measuring spatiotemporal aspects of gait. Copyright © 2015 Elsevier Ltd. All rights reserved.
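The abstract above names Bland-Altman analysis as one of its agreement measures. As a rough illustration of that calculation (not the authors' code), the sketch below computes the bias and the conventional 95% limits of agreement as mean difference ± 1.96 × SD of the paired differences. The function name `bland_altman` and the sample step lengths are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # systematic difference between the two systems
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical step lengths in metres; not data from the study.
kinect = [0.62, 0.58, 0.71, 0.66, 0.60]
mocap = [0.60, 0.59, 0.69, 0.65, 0.61]
bias, lower, upper = bland_altman(kinect, mocap)
```

A bias near zero with narrow limits would suggest the two systems can be used interchangeably for that variable; how narrow is narrow enough is a clinical judgment, not a statistical one.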
Assessment of the severity of dementia: validity and reliability of the Chinese (Cantonese) version of the Hierarchic Dementia Scale (CV-HDS).

PubMed

Poon, Vickie Wan-kei; Lam, Linda Chiu-wa; Wong, Samuel Yeung-shan

2008-09-01

With the rapid growth of the older population, early detection of cognitive deficits is crucial in slowing down functional deterioration of elderly persons. To examine the validity and reliability of the Chinese (Cantonese) version of the Hierarchic Dementia Scale (CV-HDS) for Chinese older persons in Hong Kong. The HDS was translated into Cantonese Chinese. The content and cultural validity were evaluated by six expert panel members. Sixty-two participants with diagnosis of dementia were recruited for evaluation. Inter-rater reliability, test-retest reliability, internal consistency and concurrent validity were examined. The CV-HDS demonstrated satisfactory psychometric properties. Inter-rater reliability and test-retest reliability were high (alpha=0.89 and alpha=0.94 respectively). High value of Cronbach's alpha (alpha=0.94) demonstrated good internal consistency. The concurrent validity of the CV-HDS, through correlation of its scores with those of the Chinese version of the Mini Mental Status Examination, was established (ranged from r=0.58 to r=0.78, p<0.01). The CV-HDS is a reliable and valid instrument for assessing severity of cognitive impairment in Cantonese speaking Chinese people with dementia. It facilitates treatment planning to optimize the effects of functional training and rehabilitation.
RELIABILITY AND VALIDITY OF A BIOMECHANICALLY BASED ANALYSIS METHOD FOR THE TENNIS SERVE

PubMed Central

Kibler, W. Ben; Lamborn, Leah; Smith, Belinda J.; English, Tony; Jacobs, Cale; Uhl, Tim L.

2017-01-01

Background An observational tennis serve analysis (OTSA) tool was developed using previously established body positions from three-dimensional kinematic motion analysis studies. These positions, defined as nodes, have been associated with efficient force production and minimal joint loading. However, the tool has yet to be examined scientifically. Purpose The primary purpose of this investigation was to determine the inter-observer reliability for each node between two health care professionals (HCPs) that developed the OTSA, and secondarily to investigate the validity of the OTSA. Methods Two separate studies were performed to meet these objectives. An inter-observer reliability study preceded the validity study by examining 28 videos of players serving. Two HCPs graded each video and scored the presence or absence of obtaining each node. Discriminant validity was determined in 33 tennis players using videotaped records of three first serves. Serve mechanics were graded using the OTSA and players were categorized into those with good (≥ 5) and poor (≤ 4) mechanics. Participants performed a series of field tests to evaluate trunk flexibility, lower extremity and trunk power, and dynamic balance. Results The group with good mechanics demonstrated greater backward trunk flexibility (p=0.02), greater rotational power (p=0.02), and higher single leg countermovement jump (p=0.05). Reliability of the OTSA ranged from K = 0.36-1.0, with the majority of all the nodes displaying substantial reliability (K>0.61). Conclusion This study provides HCPs with a valid and reliable field tool used to assess serve mechanics. Physical characteristics of trunk mobility and power appear to discriminate serve mechanics between players. Future intervention studies are needed to determine if improvement in physical function contributes to improved serve mechanics. Level of Evidence 3 PMID:28593098
Inter-observer reliability of DSM-5 substance use disorders.

PubMed

Denis, Cécile M; Gelernter, Joel; Hart, Amy B; Kranzler, Henry R

2015-08-01

Although studies have examined the impact of changes made in DSM-5 on the estimated prevalence of substance use disorder (SUD) diagnoses, there is limited evidence concerning the reliability of DSM-5 SUDs. We evaluated the inter-observer reliability of four DSM-5 SUDs in a sample in which we had previously evaluated the reliability of DSM-IV diagnoses, allowing us to compare the two systems. Two different interviewers each assessed 173 subjects over a 2-week period using the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA). Using the percent agreement and kappa (κ) coefficient, we examined the reliability of DSM-5 lifetime alcohol, opioid, cocaine, and cannabis use disorders, which we compared to that of SSADDA-derived DSM-IV SUD diagnoses. We also assessed the effect of additional lifetime SUD and lifetime mood or anxiety disorder diagnoses on the reliability of the DSM-5 SUD diagnoses. Reliability was good to excellent for the four disorders, with κ values ranging from 0.65 to 0.94. Agreement was consistently lower for SUDs of mild severity than for moderate or severe disorders. DSM-5 SUD diagnoses showed greater reliability than DSM-IV diagnoses of abuse or dependence or dependence only. Co-occurring SUD and lifetime mood or anxiety disorders exerted a modest effect on the reliability of the DSM-5 SUD diagnoses. For alcohol, opioid, cocaine and cannabis use disorders, DSM-5 criteria and diagnoses are at least as reliable as those of DSM-IV. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
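The study above reports agreement between two interviewers as percent agreement and Cohen's kappa. The following is a minimal sketch of how kappa is computed for two raters, assuming each subject receives one categorical label per rater. The function name `cohens_kappa` and the example diagnoses are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning one category per subject."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty, equal-length rating lists")
    n = len(rater1)
    # Observed agreement: share of subjects given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    expected = sum(counts1[label] * counts2[label] for label in counts1) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical lifetime diagnoses (1 = disorder present, 0 = absent).
interviewer_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
interviewer_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(interviewer_a, interviewer_b)
```

In this toy example the interviewers agree on 8 of 10 subjects (80%), while chance agreement is 50%, so kappa is lower than raw percent agreement. This is why the two statistics are usually reported together.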
A study on the reproducibility of cephalometric landmarks when undertaking a three-dimensional (3D) cephalometric analysis

PubMed Central

Llamas, José M.; Cibrián, Rosa; Gandia, José L.; Paredes, Vanessa

2012-01-01

Objectives: Cone Beam Computerized Tomography (CBCT) allows the possibility of modifying some of the diagnostic tools used in orthodontics, such as cephalometry. The first step must be to study the characteristics of these devices in terms of accuracy and reliability of the most commonly used landmarks. The aims were 1- To assess intra- and inter-observer reliability in the location of anatomical landmarks belonging to hard tissues of the skull in images taken with a CBCT device, 2- To determine which of those landmarks are more vs. less reliable and 3- To introduce planes of reference so as to create cephalometric analyses appropriate to the 3D reality. Study design: Fifteen patients who had a CBCT (i-CAT®) as a diagnostic register were selected. To assess the reproducibility on landmark location and the differences in the measurements of two observers at different times, 41 landmarks were defined on the three spatial axes (X,Y,Z) and located. 3,690 measurements were taken and, as each determination has 3 coordinates, 11,070 data were processed with SPSS® statistical package. To discover the reproducibility of the method on landmark location, an ANOVA was undertaken using two variation factors: time (t1, t2 and t3) and observer (Ob1 and Ob2) for each axis (X, Y and Z) and landmark. The order of the CBCT scans submitted to the observers (Ob1, Ob2) at t1, t2, and t3, were different and randomly allocated. Multiple comparisons were undertaken using the Bonferroni test. The intra- and inter-examiner ICCs were calculated. Results: Intra- and inter-examiner reliability was high, both being ICC ≥ 0.99, with the best frequency on axis Z. Conclusions: The most reliable landmarks were: Nasion, Sella, Basion, left Porion, point A, anterior nasal spine, Pogonion, Gnathion, Menton, frontozygomatic sutures, first lower molars and upper and lower incisors. Those with less reliability were the supraorbitals, right zygion and posterior nasal spine.
Key words: Cone Beam Computed Tomography, cephalometry, landmark, orthodontics, reliability. PMID:22322503
Reliability of mercury-in-silastic strain gauge plethysmography curve reading: influence of clinical clues and observer variation.

PubMed

Høyer, Christian; Pavar, Susanne; Pedersen, Begitte H; Biurrun Manresa, José A; Petersen, Lars J

2013-08-01

Mercury-in-silastic strain gauge plethysmography (SGP) is a well-established technique for blood flow and blood pressure measurements. The aim of this study was to examine (i) the possible influence of clinical clues, e.g. the presence of wounds and color changes during blood pressure measurements, and (ii) intra- and inter-observer variation of curve interpretation for segmental blood pressure measurements. A total of 204 patients with known or suspected peripheral arterial disease (PAD) were included in a diagnostic accuracy trial. Toe and ankle pressures were measured in both limbs, and primary observers analyzed a total of 804 pressure curve sets. The SGP curves were later reanalyzed separately by two observers blinded to clinical clues. Intra- and inter-observer agreement was quantified using Cohen's kappa and reliability was quantified using intra-class correlation coefficients, coefficients of variance, and Bland-Altman analysis. There was an overall agreement regarding patient diagnostic classification (PAD/not PAD) in 202/204 (99.0%) for intra-observer (κ = 0.969, p < 0.001), and 201/204 (98.5%) for inter-observer readings (κ = 0.953, p < 0.001). Reliability analysis showed excellent correlation between blinded versus non-blinded and inter-observer readings for determination of absolute segmental pressures (all intraclass correlation coefficients ≥ 0.984). The coefficient of variance for determination of absolute segmental blood pressure ranged from 2.9-3.4% for blinded/non-blinded data and from 3.8-5.0% for inter-observer data. This study shows a low inter-observer variation among experienced laboratory technicians for reading strain gauge curves. The low variation between blinded/non-blinded readings indicates that SGP measurements are minimally biased by clinical clues.
Dental examiners consistency in applying the ICDAS criteria for a caries prevention community trial.

PubMed

Nelson, S; Eggertsson, H; Powell, B; Mandelaris, J; Ntragatakis, M; Richardson, T; Ferretti, G

2011-09-01

To examine dental examiners' one-year consistency in utilizing the International Caries Detection and Assessment System (ICDAS) criteria after baseline training and calibration. A total of three examiners received baseline training/calibration by a "gold standard" examiner, and one year later re-calibration was conducted. For the baseline training/calibration, subjects aged 8-16 years, and for the re-calibration subjects aged five to six years were recruited for the study. The ICDAS criteria were used to classify visual caries lesion severity (0-6 scale), lesion activity (active/inactive), and presence of filling material (0-9 scale) of all available tooth surfaces of permanent and primary teeth. The examination used a clinical light, mirror and air syringe. Kappa (weighted: Wkappa, unweighted: Kappa) statistics were used to determine inter- and intra-examiner reliability at baseline and re-calibration. For lesion severity and filling criteria, the baseline calibration on 35 subjects indicated an inter-rater Wkappa ranging from 0.69-0.92 and intra-rater Wkappa ranging from 0.81-0.92. Re-calibration on 22 subjects indicated an inter-rater Wkappa of 0.77-0.98 and intra-rater Wkappa ranged from 0.93-1.00. The Wkappa for filling was consistently in the excellent range, while lesion severity was in the good to excellent range. Activity kappa was in the poor to good range. All examiners improved with time. The baseline training/calibration in ICDAS was crucial to maintain the stability of the examiners' reliability over a one year period. The ICDAS can be an effective assessment tool for community-based clinical trials.

Reliability of automatic vibratory equipment for ultrasonic strain measurement of the median nerve.

PubMed

Yoshii, Yuichi; Ishii, Tomoo; Etou, Fumihiko; Sakai, Shinsuke; Tanaka, Toshikazu; Ochiai, Naoyuki

2014-10-01

The objective of this study was to test the reliability of ultrasonic median nerve strain measurements using automatic vibratory equipment. Strain ratios of the median nerve in the carpal tunnel model and the reference coupler were measured at three different settings of the transducer: 0, +2 and +4 mm (+ = compressing the model down 2-4 mm initially). After measurement of the carpal tunnel model, a +4-mm setting was chosen for in vivo measurement. The median nerve strains of 30 wrists were measured by two examiners using the equipment. Intra- and inter-examiner correlation coefficients (CCs) for the strain ratios were calculated. The closest ratio was found in the +4-mm placement (strain ratio: 0.73, Young's modulus ratio: 0.79). The intra-examiner CC was 0.91 (p < 0.01), and the inter-examiner CCs were 0.72-0.78 (p < 0.01). The automatic vibratory equipment was useful in quantifying median nerve strain at the wrist. Copyright © 2014 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
Implementing and Evaluating a National Certification Technical Skills Examination: The Colorectal Objective Structured Assessment of Technical Skill.

PubMed

de Montbrun, Sandra; Roberts, Patricia L; Satterthwaite, Lisa; MacRae, Helen

2016-07-01

To implement the Colorectal Objective Structured Assessment of Technical skill (COSATS) into American Board of Colon and Rectal Surgery (ABCRS) certification and build evidence of validity for the interpretation of the scores of this high stakes assessment tool. Currently, technical skill assessment is not a formal component of board certification. With the technical demands of surgical specialties, documenting competence in technical skill at the time of certification with a valid tool is ideal. In September 2014, the COSATS was a mandatory component of ABCRS certification. Seventy candidates took the examination, with their performance evaluated by expert colorectal surgeons using a task-specific checklist, global rating scale, and overall performance scale. Passing scores were set and compared using 2 standard setting methodologies, using a compensatory and conjunctive model. Inter-rater reliability and the reliability of the pass/fail decision were calculated using Cronbach alpha and Subkoviak methodology, respectively. Overall COSATS scores and pass/fail status were compared with results on the ABCRS oral examination. The pass rate ranged from 85.7% to 90%. Inter-rater reliability (0.85) and reliability of the pass/fail decision (0.87 and 0.84) were high. A low positive correlation (r= 0.25) was seen between the COSATS and oral examination. All individuals who failed the COSATS passed the ABCRS oral examination. COSATS is the first technical skill examination used in national surgical board certification. This study suggests that the current certification process may be failing to identify individuals who have demonstrated technical deficiencies on this standardized assessment tool.
Reliability of externally fixed dynamometry hamstring strength testing in elite youth football players.

PubMed

Wollin, Martin; Purdam, Craig; Drew, Michael K

2016-01-01

To investigate inter and intra-tester reliability of an externally fixed dynamometry unilateral hamstring strength test, in the elite sports setting. Reliability study. Sixteen, injury-free, elite male youth football players (age=16.81±0.54 years, height=180.22±5.29cm, weight 73.88±6.54kg, BMI=22.57±1.42) gave written informed consent. Unilateral maximum isometric peak hamstring force was evaluated by externally fixed dynamometry for inter-tester, intra-day and intra-tester, inter-week reliability. The test position was standardised to correlate with the terminal swing phase of the gait running cycle. Inter and intra-tester values demonstrated good to high levels of reliability. The intra-class coefficient (ICC) for inter-tester, intra-day reliability was 0.87 (95% CI=0.75-0.93) with standard error of measure percentage (SEM%) 4.7 and minimal detectable change percentage (MDC%) 12.9. Intra-tester, inter-week reliability results were ICC 0.86 (95% CI, 0.74-0.93), SEM% 5.0 and MDC% 14.0. This study demonstrates good to high inter and intra-tester reliability of isometric externally fixed dynamometry unilateral hamstring strength testing in the regular elite sport setting involving elite male youth football players. The intra-class coefficient in association with the low standard error of measure and minimal detectable change percentages suggest that this procedure is appropriate for clinical and academic use as well as monitoring hamstring strength in the elite sport setting. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
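The abstract above reports SEM% and MDC% alongside the ICC but does not state how they were derived. A common convention, assumed here, is SEM = SD × √(1 − ICC) and MDC at 95% confidence = 1.96 × √2 × SEM. The sketch below applies the second relation to the reported SEM% values; the function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest difference."""
    # sqrt(2) accounts for measurement error being present in both sessions.
    return 1.96 * math.sqrt(2.0) * sem


# SEM% values reported in the abstract, treated as percentages of the mean.
mdc_inter_tester = mdc95(4.7)  # abstract reports MDC% 12.9
mdc_inter_week = mdc95(5.0)    # abstract reports MDC% 14.0
```

Under this assumed formula the results come out at roughly 13.0 and 13.9, close to the reported 12.9 and 14.0. The small gaps are plausibly due to rounding of the published SEM% values, though the abstract does not confirm which formula was used.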
A Spanish validation of the Coma Recovery Scale-Revised (CRS-R).

PubMed

Tamashiro, Mercedes; Rivas, Maria Elisa; Ron, Melania; Salierno, Fernando; Dalera, Marisol; Olmos, Lisandro

2014-01-01

Analysis of inter-rater reliability and concurrent validity. To determine measurement properties of a Spanish version of The Coma Recovery Scale-Revised (CRS-R). A sample of 35 in-patients with severe acquired brain injury. To test concurrent validity of the translated scale, the Glasgow Coma Scale (GCS) and Disability Rating Scale (DRS) were also administered. Two experts in the field were recruited to assess inter-rater agreement. Inter-rater reliability was good for total CRS-R scores (Cronbach α = 0.973, p = 0.001). Sub-scale analysis showed moderate-to-high inter-rater agreement. Total CRS-R scores correlated significantly (p < 0.05) with total GCS (r = 0.74) and DRS (r = 0.54) scores, indicating acceptable concurrent validity. The Spanish version of CRS-R can be administered reliably by trained and experienced examiners. CRS-R appears capable of differentiating patients in Emergence from Minimally Conscious State (EMCS) or in Minimally Conscious State (MCS) from those in a Vegetative State (VS).
A reliability study of the new sensors for movement analysis (SHARIF-HMIS).

PubMed

Abedi, Mohen; Manshadi, Farideh Dehghan; Zavieh, Minoo Khalkhali; Ashouri, Sajad; Azimi, Hadi; Parnanpour, Mohamad

2016-04-01

SHARIF-HMIS is a new inertial sensor designed for movement analysis. The aim of the present study was to assess the inter-tester and intra-tester reliability of some kinematic parameters in different lumbar motions making use of this sensor. 24 healthy persons and 28 patients with low back pain participated in the current reliability study. The test was performed in five different lumbar motions consisting of lumbar flexion in 0, 15, and 30° in the right and left directions. For measuring inter-tester reliability, all the tests were carried out twice on the same day separately by two physiotherapists. Intra-tester reliability was assessed by reproducing the tests after 3 days by the same physiotherapist. The present study revealed satisfactory inter- and intra-tester reliability indices in different positions. ICCs for intra-tester reliability ranged from 0.65 to 0.98 and 0.59 to 0.81 for healthy and patient participants, respectively. Also, ICCs for inter-tester reliability ranged from 0.65 to 0.92 for the healthy and 0.65 to 0.87 for patient participants. In general, it can be inferred from the results that measuring the kinematic parameters in lumbar movements using inertial sensors enjoys acceptable reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Inter-rater reliability of PATH observations for assessment of ergonomic risk factors in hospital work.

PubMed

Park, Jung-Keun; Boyer, Jon; Tessler, Jamie; Casey, Jeffrey; Schemm, Linda; Gore, Rebecca; Punnett, Laura

2009-07-01

This study examined the inter-rater reliability of expert observations of ergonomic risk factors by four analysts. Ten jobs were observed at a hospital using a newly expanded version of the PATH method (Buchholz et al. 1996), to which selected upper extremity exposures had been added. Two of the four raters simultaneously observed each worker onsite for a total of 443 observation pairs containing 18 categorical exposure items each. For most exposure items, kappa coefficients were 0.4 or higher. For some items, agreement was higher both for the jobs with less rapid hand activity and for the analysts with a higher level of ergonomic job analysis experience. These upper extremity exposures could be characterised reliably with real-time observation, given adequate experience and training of the observers. The revised version of PATH is applicable to the analysis of jobs where upper extremity musculoskeletal strain is of concern.
Validity and Reliability of 10-Hz Global Positioning System to Assess In-line Movement and Change of Direction.

PubMed

Nikolaidis, Pantelis T; Clemente, Filipe M; van der Linden, Cornelis M I; Rosemann, Thomas; Knechtle, Beat

2018-01-01

The objectives of the present study were to examine the validity and reliability of the 10 Hz Johan GPS unit in assessing in-line movement and change of direction. The validity was tested against the criterion measure of 200 m track-and-field (track-and-field athletes, n = 8) and 20 m shuttle run endurance test (female soccer players, n = 20). Intra-unit and inter-unit reliability was tested by intra-class correlation coefficient (ICC) and coefficient of variation (CV), respectively. An analysis of variance examined differences between the GPS measurement and five laps of 200 m at 15 km/h, and t-test examined differences between the GPS measurement and 20 m shuttle run endurance test. The difference between the GPS measurement and 200 m distance ranged from -0.13 ± 3.94 m (95% CI -3.42; 3.17) in the first lap to 2.13 ± 2.64 m (95% CI -0.08; 4.33) in the fifth lap. A good intra-unit reliability was observed in 200 m (ICC = 0.833, 95% CI 0.535; 0.962). Inter-unit CV ranged from 1.31% (fifth lap) to 2.20% (third lap). The difference between the GPS measurement and 20 m shuttle run endurance test ranged from 0.33 ± 4.16 m (95% CI -10.01; 10.68) in 11.5 km/h to 9.00 ± 5.30 m (95% CI 6.44; 11.56) in 8.0 km/h. A moderate intra-unit reliability was shown in the second and third stage of the 20 m shuttle run endurance test (ICC = 0.718, 95% CI 0.222; 0.898) and good reliability in the fifth, sixth, seventh and eighth (ICC = 0.831, 95% CI -0.229; 0.996). Inter-unit CV ranged from 2.08% (11.5 km/h) to 3.92% (8.5 km/h). Based on these findings, it was concluded that the 10 Hz Johan system offers an affordable, valid and reliable tool for coaches and fitness trainers to monitor training and performance.
Inter-rater Reliability of Sustained Aberrant Movement Patterns as a Clinical Assessment of Muscular Fatigue

PubMed Central

Aerts, Frank; Carrier, Kathy; Alwood, Becky

2016-01-01

Background: The assessment of clinical manifestation of muscle fatigue is an effective procedure in establishing therapeutic exercise dose. Few studies have evaluated physical therapist reliability in establishing muscle fatigue through detection of changes in quality of movement patterns in a live setting. Objective: The purpose of this study is to evaluate the inter-rater reliability of physical therapists’ ability to detect altered movement patterns due to muscle fatigue. Design: A reliability study in a live setting with multiple raters. Participants: Forty-four healthy individuals (ages 19-35) were evaluated by six physical therapists in a live setting. Methods: Participants were evaluated by physical therapists for altered movement patterns during resisted shoulder rotation. Each participant completed a total of four tests: right shoulder internal rotation, right shoulder external rotation, left shoulder internal rotation and left shoulder external rotation. Results: For all tests combined, the inter-rater reliability for a single rater scoring ICC (2,1) was .65 (95%, .60, .71). This corresponds to moderate inter-rater reliability between physical therapists. Limitations: The results of this study apply only to healthy participants and therefore cannot be generalized to a symptomatic population. Conclusion: Moderate inter-rater reliability was found between physical therapists in establishing muscle fatigue through the observation of sustained altered movement patterns during dynamic resistive shoulder internal and external rotation. PMID:27347241
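The abstract above reports ICC (2,1), which in the usual notation denotes a two-way random-effects, absolute-agreement, single-rater intraclass correlation. As a sketch of that calculation, assuming a complete subjects-by-raters table with no missing scores, the function below derives the three ANOVA mean squares and combines them. The function name `icc_2_1` and the ratings are illustrative and are not the study's data.

```python
def icc_2_1(scores):
    """ICC(2,1) for scores[i][j] = rating of subject i by rater j."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-subject mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
```

Because the between-rater mean square appears in the denominator, systematic differences between raters lower ICC(2,1) even when the raters rank subjects similarly. In the illustrative table the second rater scores consistently low, which pulls the coefficient down.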
Specific algorithm method of scoring the Clock Drawing Test applied in cognitively normal elderly

PubMed Central

Mendes-Santos, Liana Chaves; Mograbi, Daniel; Spenciere, Bárbara; Charchat-Fichman, Helenice

2015-01-01

The Clock Drawing Test (CDT) is an inexpensive, fast and easily administered measure of cognitive function, especially in the elderly. This instrument is a popular clinical tool widely used in screening for cognitive disorders and dementia. The CDT can be applied in different ways and scoring procedures also vary. Objective The aims of this study were to analyze the performance of elderly on the CDT and evaluate inter-rater reliability of the CDT scored by using a specific algorithm method adapted from Sunderland et al. (1989). Methods We analyzed the CDT of 100 cognitively normal elderly aged 60 years or older. The CDT ("free-drawn") and Mini-Mental State Examination (MMSE) were administered to all participants. Six independent examiners scored the CDT of 30 participants to evaluate inter-rater reliability. Results and Conclusion A score of 5 on the proposed algorithm ("Numbers in reverse order or concentrated"), equivalent to 5 points on the original Sunderland scale, was the most frequent (53.5%). The CDT specific algorithm method used had high inter-rater reliability (p<0.01), and mean score ranged from 5.06 to 5.96. The high frequency of an overall score of 5 points may suggest the need to create more nuanced evaluation criteria, which are sensitive to differences in levels of impairment in visuoconstructive and executive abilities during aging. PMID:29213954
Ecologically relevant outcome measure for post-inpatient rehabilitation.

PubMed

Marquez de la Plata, Carlos; Qualls, Devin; Plenger, Patrick; Malec, James F; Hayden, Mary Ellen

2017-01-01

Transfer of skills learned within the clinic environment to patients' home or community is important in post-inpatient brain injury rehabilitation (PBIR). Outcome measures used in PBIR assess level of independence during functional tasks; however, available functional instruments do not quantitate the environment in which the behaviors occur. To examine the reliability and validity of an instrument used to assess patients' functional abilities while quantifying the amount of structure and distractions in the environment. 2501 patients who sustained a traumatic brain injury (TBI) or cerebrovascular accident (CVA) and participated in a multidisciplinary PBIR program between 2006 and 2014 were identified retrospectively for this study. The PERPOS and MPAI-4 were used to assess functional abilities at admission and at discharge. Construct validity was assessed using a bivariate Spearman rho analysis. A subsample of 56 consecutive admissions during 2014 was examined to determine inter-rater reliability. Intra-class correlation coefficient (ICC) and Kappa coefficients assessed inter-rater agreement of the total PERPOS and PERPOS subscales respectively. The PERPOS and MPAI-4 demonstrated a strong negative association among both TBI and CVA patients. Kappa scores for the three PERPOS scales each demonstrated good to excellent inter-rater agreement. The ICC for overall PERPOS scores fell in the good agreement range. The PERPOS can be used reliably in PBIR to quantify patients' functional abilities within the context of environmental demands.
Reliability and validity of CODA motion analysis system for measuring cervical range of motion in patients with cervical spondylosis and anterior cervical fusion.

PubMed

Gao, Zhongyang; Song, Hui; Ren, Fenggang; Li, Yuhuan; Wang, Dong; He, Xijing

2017-12-01

The aim of the present study was to evaluate the reliability of the Cartesian Optoelectronic Dynamic Anthropometer (CODA) motion system in measuring the cervical range of motion (ROM) and verify the construct validity of the CODA motion system. A total of 26 patients with cervical spondylosis and 22 patients with anterior cervical fusion were enrolled and the CODA motion analysis system was used to measure the three-dimensional cervical ROM. Intra- and inter-rater reliability was assessed by intraclass correlation coefficients (ICCs), standard error of measurement (SEm), limits of agreement (LOA) and minimal detectable change (MDC). Independent samples t-tests were performed to examine the differences of cervical ROM between cervical spondylosis and anterior cervical fusion patients. The results revealed that in the cervical spondylosis group, the reliability was almost perfect (intra-rater reliability: ICC, 0.87-0.95; LOA, -12.86 to 13.70; SEm, 2.97-4.58; inter-rater reliability: ICC, 0.84-0.95; LOA, -13.09 to 13.48; SEm, 3.13-4.32). In the anterior cervical fusion group, the reliability was high (intra-rater reliability: ICC, 0.88-0.97; LOA, -10.65 to 11.08; SEm, 2.10-3.77; inter-rater reliability: ICC, 0.86-0.96; LOA, -10.91 to 13.66; SEm, 2.20-4.45). The cervical ROM in the cervical spondylosis group was significantly higher than that in the anterior cervical fusion group in all directions except for left rotation. In conclusion, the CODA motion analysis system is highly reliable in measuring cervical ROM and the construct validity was verified, as the system was sufficiently sensitive to distinguish between the cervical spondylosis and anterior cervical fusion groups based on their ROM.
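The limits of agreement (LOA) quoted above are conventionally obtained as the mean difference between paired measurements plus or minus 1.96 standard deviations of those differences. The abstract does not say how its LOA were computed, so the following is only a minimal sketch of that common approach; the function name and the angle values are hypothetical and are not taken from the study.

```python
import statistics


def limits_of_agreement(first, second, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)  # sample SD of the differences
    return bias, bias - z * spread, bias + z * spread


# Hypothetical cervical flexion angles in degrees from two raters (not study data).
rater_1 = [50.0, 52.0, 48.0, 55.0, 60.0]
rater_2 = [49.0, 54.0, 47.0, 56.0, 58.0]
bias, lower, upper = limits_of_agreement(rater_1, rater_2)
print(f"bias = {bias:.2f}, LOA = {lower:.2f} to {upper:.2f}")
```

A narrow interval centred near zero indicates that the two raters can be used interchangeably; the width has to be judged against the clinically relevant change in ROM.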
The children's menu assessment: development, evaluation, and relevance of a tool for evaluating children's menus.

PubMed

Krukowski, Rebecca A; Eddings, Kenya; West, Delia Smith

2011-06-01

Restaurant foods represent a substantial portion of children's dietary intake, and consumption of foods away from home has been shown to contribute to excess adiposity. This descriptive study aimed to pilot-test and establish the reliability of a standardized and comprehensive assessment tool, the Children's Menu Assessment, for evaluating the restaurant food environment for children. The tool is an expansion of the Nutrition Environment Measures Survey-Restaurant. In 2009-2010, a randomly selected sample of 130 local and chain restaurants were chosen from within 20 miles of Little Rock, AR, to examine the availability of children's menus and to conduct initial calibration of the Children's Menu Assessment tool (final sample: n=46). Independent raters completed the Children's Menu Assessment in order to determine inter-rater reliability. Test-retest reliability was also examined. Inter-rater reliability was high: percent agreement was 97% and Spearman correlation was 0.90. Test-retest was also high: percent agreement was 91% and Spearman correlation was 0.96. Mean Children's Menu Assessment completion time was 14 minutes, 56 seconds ± 10 minutes, 21 seconds. Analysis of Children's Menu Assessment findings revealed that few healthier options were available on children's menus, and most menus did not provide parents with information for making healthy choices, including nutrition information or identification of healthier options. The Children's Menu Assessment tool allows for comprehensive, rapid measurement of the restaurant food environment for children with high inter-rater reliability. This tool has the potential to contribute to public health efforts to develop and evaluate targeted environmental interventions and/or policy changes regarding restaurant foods. Copyright © 2011 American Dietetic Association. Published by Elsevier Inc. All rights reserved.
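The two reliability figures reported above, percent agreement and Spearman correlation, answer different questions: the first counts exact matches, the second checks whether two sets of scores rank the restaurants in the same order. The sketch below shows one way to compute both in plain Python, assuming each rater produces one score per menu; the helper names and the sample scores are hypothetical.

```python
import math


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave exactly the same value."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for position in range(i, j + 1):
            ranks[order[position]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired scores")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one rater gives constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical total menu scores from two raters (not study data).
scores_a = [1, 2, 3, 4, 5]
scores_b = [2, 1, 4, 3, 5]
print(percent_agreement(scores_a, scores_b), spearman_rho(scores_a, scores_b))
```

In this toy data the raters match exactly on only one menu yet order the menus similarly, which is why the two statistics are usually reported together.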
Translation and validation of the Spanish version of the Health of the Nation Outcome Scales for People with Learning Disabilities (HoNOS-LD).

PubMed

Esteba-Castillo, Susanna; Torrents-Rodas, David; García-Alba, Javier; Ribas-Vidal, Núria; Novell-Alsina, Ramon

2016-12-21

The Health of the Nation Outcome Scales for People with Learning Disabilities (HoNOS-LD) is a brief instrument that assesses functioning in people with intellectual development disorder and mental health problems/behaviour disorders. The aim of the present study was to examine the evidence on the validity of the scores based on the Spanish version of the HoNOS-LD. The study included 111 participants that were assessed by the Spanish version of the HoNOS-LD and other questionnaires that measured different variables related to the scale. Thirty-three participants were assessed by 2 examiners, and retested 7 days later, in order to study inter-examiner reliability and test-retest reliabilities. Based on clinical and conceptual criteria, and on the results of the parallel analysis, a factorial solution with one factor was selected. Internal consistency was good (Omega coefficient of 0.87). Inter-examiner and test-retest reliabilities were excellent (intraclass correlation coefficients of 0.95 and 0.98, respectively). Correlations between sections of the HoNOS-LD and the related instruments showed the expected direction, and were highly significant (P<.001), and the HoNOS-LD score increased with the intensity of the support required by the participants. These results showed evidence of the validity of association with other external variables. The Spanish version of the HoNOS-LD is a brief, valid and reliable instrument, which will enable a routine assessment of functioning for different uses, including diagnosis and intervention. Copyright © 2016 SEP y SEPB. Publicado por Elsevier España, S.L.U. All rights reserved.
Delirium assessment in hospitalized elderly patients: Italian translation and validation of the nursing delirium screening scale.

PubMed

Spedale, Valentina; Di Mauro, Stefania; Del Giorno, Giulia; Barilaro, Monica; Villa, Candida E; Gaudreau, Jean D; Ausili, Davide

2017-08-01

Delirium is a high-incidence pathology associated with negative outcomes. Although highly preventable, half the cases are not recognized. One major cause of delirium misdiagnosis is the absence of a versatile instrument to measure it. Our objective was to translate the nursing delirium screening scale (Nu-DESC) and evaluate its performance in Italian settings. This was a methodological study conducted in two sequential phases. The first was the Italian translation of Nu-DESC through a translation and back-translation process. The second aimed to test the inter-rater reliability, sensitivity and specificity of the instrument on a convenience sample of 101 hospitalized elderly people admitted to relevant wards of the San Gerardo Hospital in Monza. To evaluate the inter-rater reliability, two examiners tested Nu-DESC on 20 patients concurrently without comparison. To measure the sensitivity and specificity of Nu-DESC, the confusion assessment method was used as a gold standard measure. The inter-rater reliability (Cohen Kappa) was 0.87, an excellent agreement between examiners. The study of the ROC curve showed an AUC value of 0.9461, suggesting high test accuracy. Using 3 as a cut-off value, Nu-DESC showed 100% sensitivity and 76% specificity. Further research is needed to test Nu-DESC on a larger sample. However, based on our results, Nu-DESC can be used in research and clinical practice in Italian settings because its performance was very good and similar to that of previous validation studies. The value of 3 appears to be the optimal cut-off in the Italian context.
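Sensitivity and specificity at a given cut-off follow directly from comparing the screening score with the reference diagnosis. The sketch below assumes that a score at or above the cut-off counts as a positive screen, which the abstract does not state explicitly; the function name and the scores are hypothetical and do not reproduce the study data.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff is a positive screen."""
    if len(scores) != len(has_condition):
        raise ValueError("one reference diagnosis is needed per score")
    tp = fn = fp = tn = 0
    for score, truth in zip(scores, has_condition):
        flagged = score >= cutoff  # assumption: the cut-off value itself is positive
        if truth and flagged:
            tp += 1
        elif truth:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one case with and one without the condition")
    return tp / (tp + fn), tn / (fp + tn)


# Hypothetical screening scores and reference diagnoses (not study data).
screen_scores = [0, 1, 2, 3, 4, 5, 2, 3]
reference = [False, False, False, True, True, True, False, False]
for cutoff in (3, 4):
    sens, spec = sensitivity_specificity(screen_scores, reference, cutoff)
    print(f"cut-off {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cut-off trades sensitivity for specificity, which is the trade-off a ROC curve summarises across all possible cut-offs.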
Development and Reliability Testing of the FEDS System for Classifying Glenohumeral Instability

PubMed Central

Kuhn, John E.; Helmer, Tara T.; Dunn, Warren R.; Throckmorton V, Thomas W.

2010-01-01

Background Classification systems for glenohumeral instability (GHI) are opinion based, not validated, and poorly defined. This study is designed to methodologically develop and test a GHI classification system. Methods: Classification System Development A systematic literature review identified 18 systems for classifying GHI. The frequency of the characteristics used was recorded. Additionally, 31 members of the American Shoulder and Elbow Surgeons responded to a survey to identify features important to characterize GHI. Frequency, Etiology, Direction, and Severity (FEDS) were found to be most important. Frequency was defined as solitary (one episode), occasional (2–5x/year), or frequent (>5x/year). Etiology was defined as traumatic or atraumatic. Direction referred to the primary direction of instability (anterior, posterior, or inferior). Severity was defined as either subluxation or dislocation. Methods: Reliability Testing Fifty GHI patients completed a questionnaire at their initial visit. One of six sports medicine fellowship trained physicians completed a similar questionnaire after examining the patient. Patients returned after two weeks and were examined by the original physician and two other physicians. Inter- and intra-rater agreement for the FEDS classification system was calculated. Results Agreement between patients and physicians was lowest for frequency (39%; k=0.130) and highest for direction (82%; k=0.636). Physician intra-rater agreement was 84–97% for the individual FEDS characteristics (k=0.69 to 0.87). Physician inter-rater agreement ranged from 82–90% (k=0.44 to 0.76). Conclusions The FEDS system has content validity and is highly reliable for classifying GHI. Physical examination using provocative testing to determine the primary direction of instability produces very high levels of inter- and intra-rater agreement. Level of evidence Level II, Development of Diagnostic Criteria with Consecutive Series of Patients, Diagnosis Study. PMID:21277809
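The results above pair each percent agreement with a kappa value because kappa discounts the agreement two raters would reach by chance given how often each uses every category. A minimal sketch of unweighted Cohen's kappa for two raters follows; the function name and the labels are hypothetical and are not drawn from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (percent agreement, unweighted Cohen's kappa) for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical direction-of-instability labels for ten patients (not study data).
physician_1 = ["anterior"] * 8 + ["posterior"] * 2
physician_2 = ["anterior"] * 7 + ["posterior"] * 3
agreement, kappa = cohen_kappa(physician_1, physician_2)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

Here the raters agree on 90% of cases, yet kappa is only about 0.74 because one category dominates. This is the same pattern as in the abstract, where high percent agreement coexists with more modest kappa values.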
An Investigation of the Impact of Guessing on Coefficient α and Reliability

PubMed Central

2014-01-01

Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
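For orientation, coefficient α can also be computed directly from a table of observed item scores using the classical formula, α = k/(k-1) × (1 - Σ item variances / variance of total scores). The study above instead expresses α through IRT model parameters, so the sketch below is not its method; it is only the standard sample formula, with a hypothetical function name and made-up scores.

```python
import statistics


def cronbach_alpha(scores):
    """Coefficient alpha; `scores` has one row per examinee and one column per item."""
    n_items = len(scores[0])
    if len(scores) < 2 or n_items < 2 or any(len(row) != n_items for row in scores):
        raise ValueError("need a complete table with >= 2 examinees and >= 2 items")
    item_variances = [
        statistics.variance([row[j] for row in scores]) for j in range(n_items)
    ]
    total_variance = statistics.variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return n_items / (n_items - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical scores of four examinees on three items (not study data).
item_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(f"alpha = {cronbach_alpha(item_scores):.3f}")
```

When every item is scored 0 or 1, this formula coincides with Kuder-Richardson Formula 20.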
Preliminary appraisal of the reliability and validity of the Colorado State University Feline Acute Pain Scale.

PubMed

Shipley, Hilary; Guedes, Alonso; Graham, Lynelle; Goudie-DeAngelis, Elizabeth; Wendt-Hornickle, Erin

2018-05-01

Objectives The objective of this study was to determine the inter-rater reliability and convergent validity of the Colorado State University Feline Acute Pain Scale (CSU-FAPS) in a preliminary appraisal of its performance in a clinical teaching setting. Methods Sixty-eight female cats were assessed for pain after ovariohysterectomy. A cohort of 21 cats was examined independently by four raters (two board-certified anesthesiologists and two anesthesia residents) with the CSU-FAPS, and intra-class correlation coefficient (ICC) was used to determine inter-rater reliability. Weighted Cohen's kappa was used to determine inter-rater reliability centered on the 'need to reassess analgesic plan' (dichotomous scale). A separate cohort of 47 cats was evaluated independently by two raters (one board-certified anesthesiologist and one veterinary small animal rotating intern) using the CSU-FAPS and the Glasgow Composite Measure Pain Scale (CMPS-Feline), and Spearman rank-order correlation was determined to assess convergent validity. Reliability was interpreted using Altman's classification as very good, good, moderate, fair and poor. Validity was considered adequate if correlation coefficients were between 0.4 and 0.8. Results The ICC was 0.61 for anesthesiologists and 0.67 for residents, indicating good reliability. Weighted Cohen's kappa was 0.79 for anesthesiologists and 0.44 for residents, indicating moderate to good reliability. The Spearman rank correlation indicated a statistically significant (P = 0.0003) positive correlation (0.31; 95% confidence interval 0.14-0.46) between the CSU-FAPS and the CMPS-Feline. Conclusions and relevance The CSU-FAPS showed moderate-to-good inter-rater reliability when used by veterinarians to assess pain level or need to reassess analgesic plan after ovariohysterectomy in cats. The validity fell short of current guidelines for correlation coefficients and further refinement and testing are warranted to improve its performance.
Reliability of lower limb alignment measures using an established landmark-based method with a customized computer software program

PubMed Central

Sled, Elizabeth A.; Sheehy, Lisa M.; Felson, David T.; Costigan, Patrick A.; Lam, Miu; Cooke, T. Derek V.

2010-01-01

The objective of the study was to evaluate the reliability of frontal plane lower limb alignment measures using a landmark-based method by (1) comparing inter- and intra-reader reliability between measurements of alignment obtained manually with those using a computer program, and (2) determining inter- and intra-reader reliability of computer-assisted alignment measures from full-limb radiographs. An established method for measuring alignment was used, involving selection of 10 femoral and tibial bone landmarks. 1) To compare manual and computer methods, we used digital images and matching paper copies of five alignment patterns simulating healthy and malaligned limbs drawn using AutoCAD. Seven readers were trained in each system. Paper copies were measured manually and repeat measurements were performed daily for 3 days, followed by a similar routine with the digital images using the computer. 2) To examine the reliability of computer-assisted measures from full-limb radiographs, 100 images (200 limbs) were selected as a random sample from 1,500 full-limb digital radiographs which were part of the Multicenter Osteoarthritis (MOST) Study. Three trained readers used the software program to measure alignment twice from the batch of 100 images, with two or more weeks between batch handling. Manual and computer measures of alignment showed excellent agreement (intraclass correlations [ICCs] 0.977 – 0.999 for computer analysis; 0.820 – 0.995 for manual measures). The computer program applied to full-limb radiographs produced alignment measurements with high inter- and intra-reader reliability (ICCs 0.839 – 0.998). In conclusion, alignment measures using a bone landmark-based approach and a computer program were highly reliable between multiple readers. PMID:19882339
Effect of knee angle on neuromuscular assessment of plantar flexor muscles: A reliability study

PubMed Central

Cornu, Christophe; Jubeau, Marc

2018-01-01

Introduction This study aimed to determine the intra- and inter-session reliability of neuromuscular assessment of plantar flexor (PF) muscles at three knee angles. Methods Twelve young adults were tested for three knee angles (90°, 30° and 0°) and at three time points separated by 1 hour (intra-session) and 7 days (inter-session). Electrical (H reflex, M wave) and mechanical (evoked and maximal voluntary torque, activation level) parameters were measured on the PF muscles. Intraclass correlation coefficients (ICC) and coefficients of variation were calculated to determine intra- and inter-session reliability. Results The mechanical measurements presented excellent (ICC>0.75) intra- and inter-session reliabilities regardless of the knee angle considered. The reliability of electrical measurements was better for the 90° knee angle compared to the 0° and 30° angles. Conclusions Changes in the knee angle may influence the reliability of neuromuscular assessments, which indicates the importance of considering the knee angle to collect consistent outcomes on the PF muscles. PMID:29596480
High inter-rater reliability, agreement, and convergent validity of Constant score in patients with clavicle fractures.

PubMed

Ban, Ilija; Troelsen, Anders; Kristensen, Morten Tange

2016-10-01

The Constant score (CS) has been the primary endpoint in most studies on clavicle fractures. However, the CS was not developed to assess patients with clavicle fractures. Our aim was to examine inter-rater reliability and agreement of the CS in patients with clavicle fractures. The secondary aim was to estimate the correlation between the CS and the Disabilities of the Arm, Shoulder and Hand score and the internal consistency of the 2 scores. On the basis of a sample size calculation, 36 patients (31 male and 5 female patients; mean age, 41.3 years) with clavicle fractures underwent standardized CS assessment at a mean of 6.8 weeks (SD, 1.0 weeks) after injury. Reliability and agreement of the CS were determined by 2 raters. The intraclass correlation coefficient (ICC(2,1)), standard error of measurement, minimal detectable change, Cronbach α coefficient, and Pearson correlation coefficient were estimated. Inter-rater reliability of the total CS was excellent (intraclass correlation coefficient, 0.94; 95% confidence interval, 0.88-0.97), with no systematic difference between the 2 raters (P = .75). The standard error of measurement (measurement error at the group level) was 4.9, whereas the minimal detectable change (smallest change needed to indicate a real change for an individual) was 13.6 CS points. The internal consistency of the 10 CS items was good, with a Cronbach α of .85, and we found a strong correlation (r = -0.92) between the CS and Disabilities of the Arm, Shoulder and Hand score. The CS was found to be reliable for assessing patients with clavicle fractures, especially at the group level. With high inter-rater reliability and agreement, in addition to good internal consistency, the standardized CS used in this study can be used for comparison of results from different settings. Copyright © 2016 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
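The standard error of measurement (SEM) and the minimal detectable change (MDC) reported above are linked by a widely used formula, MDC95 = 1.96 × √2 × SEM. The abstract does not state which formula the authors applied, but this one reproduces the reported figures: 1.96 × √2 × 4.9 ≈ 13.6. The sketch below encodes that relationship together with the common SEM = SD × √(1 - ICC) estimate; both function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a score SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%)."""
    return z * math.sqrt(2.0) * sem


# An SEM of 4.9 CS points is the value reported in the abstract.
mdc = minimal_detectable_change(4.9)
print(f"MDC95 = {mdc:.1f} CS points")
```

The factor √2 reflects that a change score combines the measurement error of two assessments, which is why the MDC for an individual is much larger than the SEM.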

The Reliability and Validity of the Computerized Double Inclinometer in Measuring Lumbar Mobility

PubMed Central

MacDermid, Joy Christine; Arumugam, Vanitha; Vincent, Joshua Israel; Carroll, Krista L

2014-01-01

Study Design: Repeated measures reliability/validity study. Objectives: To determine the concurrent validity, test-retest, inter-rater and intra-rater reliability of lumbar flexion and extension measurements using the Tracker M.E. computerized dual inclinometer (CDI) in comparison to the modified-modified Schober (MMS). Summary of Background: Numerous studies have evaluated the reliability and validity of the various methods of measuring spinal motion, but the results are inconsistent. Differences in equipment and techniques make it difficult to correlate results. Methods: Twenty subjects with back pain and twenty without back pain were selected through convenience sampling. Two examiners measured sagittal plane lumbar range of motion for each subject. Two separate tests with the CDI and one test with the MMS were conducted. Each test consisted of three trials. Instrument and examiner order was randomly assigned. Intra-class correlations (ICCs 2, 2 and 2, 2) and Pearson correlation coefficients (r) were used to calculate reliability and concurrent validity, respectively. Results: Intra-trial reliability was high to very high for both the CDI (ICCs 0.85 - 0.96) and MMS (ICCs 0.84 - 0.98). However, the reliability was poor to moderate when the CDI unit had to be repositioned either by the same rater (ICCs 0.16 - 0.59) or a different rater (ICCs 0.45 - 0.52). Inter-rater reliability for the MMS was moderate to high (ICCs 0.75 - 0.82), which bettered the moderate correlation obtained for the CDI (ICCs 0.45 - 0.52). Correlations between the CDI and MMS were poor for flexion (0.32; p<0.05) and poor to moderate (-0.42 to -0.51; p<0.05) for extension measurements. Conclusion: When using the CDI, an average of subsequent tests is required to obtain moderate reliability. The MMS was more reliable than the CDI. The MMS and the CDI measure lumbar movement on different metrics that are not highly related to each other. PMID:25352928
Inter- and intraexaminer reliability of bitewing radiography and near-infrared light transillumination for proximal caries detection and assessment.

PubMed

Litzenburger, Friederike; Heck, Katrin; Pitchika, Vinay; Neuhaus, Klaus W; Jost, Fabian N; Hickel, Reinhard; Jablonski-Momeni, Anahita; Welk, Alexander; Lederer, Alexander; Kühnisch, Jan

2018-02-01

The purpose of this in vitro study was to evaluate the inter- and intraexaminer reliability of digital bitewing (DBW) radiography and near-infrared light transillumination (NIRT) for proximal caries detection and assessment in posterior teeth. From a pool of 85 patients, 100 corresponding pairs of DBW and NIRT images (~1/3 healthy, ~1/3 with enamel caries and ~1/3 with dentin caries) were chosen. Twelve dentists with different professional status and clinical experience repeated the evaluation in two blinded cycles. Two experienced dentists provided a reference diagnosis after analysing all images independently. Statistical analysis included the calculation of simple (κ) and weighted Kappa (wκ) values as a measure of reliability. Logistic regression with a backward elimination model was used to investigate the influence of the diagnostic method, evaluation cycle, type of tooth, and clinical experience on reliability. Altogether, inter- and intraexaminer reliability exhibited good to excellent κ and wκ values for DBW radiography (Inter: κ = 0.60/0.63; wκ = 0.74/0.76; Intra: κ = 0.64; wκ = 0.77) and NIRT (Inter: κ = 0.74/0.64; wκ = 0.86/0.82; Intra: κ = 0.68; wκ = 0.84). The backward elimination model revealed NIRT to be significantly more reliable than DBW radiography. This study revealed a good to excellent inter- and intraexaminer reliability for proximal caries detection using DBW and NIRT images. The logistic regression analysis revealed significantly better reliability for NIRT. Additionally, the first evaluation cycle was more reliable according to the reference diagnoses.
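Weighted kappa differs from simple kappa in that it penalises a disagreement by how far apart the two ratings lie on an ordinal scale, so confusing enamel with dentin caries counts less than confusing a healthy surface with dentin caries. The abstract does not say which weighting scheme was used; the sketch below assumes linear weights and a hypothetical coding of 0 = healthy, 1 = enamel caries, 2 = dentin caries, with made-up ratings.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted kappa for ordinal ratings coded 0 .. n_categories - 1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b) or n_categories < 2:
        raise ValueError("need paired ratings and at least two categories")
    if any(not 0 <= r < n_categories for r in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must be integers in 0 .. n_categories - 1")
    max_distance = n_categories - 1
    # Observed disagreement: mean distance between paired ratings, scaled to 0..1.
    observed = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / (n * max_distance)
    # Expected disagreement if the raters' marginal distributions were independent.
    count_a = [list(rater_a).count(c) for c in range(n_categories)]
    count_b = [list(rater_b).count(c) for c in range(n_categories)]
    expected = sum(
        count_a[i] * count_b[j] * abs(i - j)
        for i in range(n_categories)
        for j in range(n_categories)
    ) / (n * n * max_distance)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - observed / expected


# Hypothetical ratings of six tooth surfaces by two examiners (not study data).
examiner_1 = [0, 0, 1, 1, 2, 2]
examiner_2 = [0, 1, 1, 1, 2, 1]
print(f"weighted kappa = {linear_weighted_kappa(examiner_1, examiner_2, 3):.3f}")
```

In this example every disagreement is between adjacent categories, so the weighted value (4/7, about 0.57) exceeds the unweighted kappa of 0.50 for the same ratings. That mirrors the abstract, where wκ is consistently higher than κ.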
The Oral Speech Mechanism Screening Examination (OSMSE).

ERIC Educational Resources Information Center

St. Louis, Kenneth O.; Ruscello, Dennis M.

Although speech-language pathologists are expected to be able to administer and interpret oral examinations, there are currently no screening tests available that provide careful administration instructions and data for intra-examiner and inter-examiner reliability. The Oral Speech Mechanism Screening Examination (OSMSE) is designed primarily for…
The development and testing of a qualitative instrument designed to assess critical thinking

NASA Astrophysics Data System (ADS)

Clauson, Cynthia Louisa

This study examined a qualitative approach to assess critical thinking. An instrument was developed that incorporates an assessment process based on Dewey's (1933) concepts of self-reflection and critical thinking as problem solving. The study was designed to pilot test the critical thinking assessment process with writing samples collected from a heterogeneous group of students. The pilot test included two phases. Phase 1 was designed to determine the validity and inter-rater reliability of the instrument using two experts in critical thinking, problem solving, and literacy development. Validity of the instrument was addressed by requesting both experts to respond to ten questions in an interview. The inter-rater reliability was assessed by analyzing the consistency of the two experts' scorings of the 20 writing samples to each other, as well as to my scoring of the same 20 writing samples. Statistical analyses included the Spearman Rho and the Kuder-Richardson (Formula 20). Phase 2 was designed to determine the validity and reliability of the critical thinking assessment process with seven science teachers. Validity was addressed by requesting the teachers to respond to ten questions in a survey and interview. Inter-rater reliability was addressed by comparing the seven teachers' scoring of five writing samples with my scoring of the same five writing samples. Again, the Spearman Rho and the Kuder-Richardson (Formula 20) were used to determine the inter-rater reliability. The validity results suggest that the instrument is helpful as a guide for instruction and provides a systematic method to teach and assess critical thinking while problem solving with students in the classroom. The reliability results show the critical thinking assessment instrument to possess fairly high reliability when used by the experts, but weak reliability when used by classroom teachers. 
A major conclusion was drawn that teachers, as well as students, would need to receive instruction in critical thinking and in how to use the assessment process in order to gain more consistent interpretations of the six problem-solving steps. Specific changes needing to be made in the instrument to improve the quality are included.
Inter-Rater Reliability and Generalizability of Patient Note Scores Using a Scoring Rubric Based on the USMLE Step-2 CS Format

ERIC Educational Resources Information Center

Park, Yoon Soo; Hyderi, Abbas; Bordage, Georges; Xing, Kuan; Yudkowsky, Rachel

2016-01-01

Recent changes to the patient note (PN) format of the United States Medical Licensing Examination have challenged medical schools to improve the instruction and assessment of students taking the Step-2 clinical skills examination. The purpose of this study was to gather validity evidence regarding response process and internal structure, focusing…
Inter-rater Reliability of Real-Time Ultrasound to Measure Acromiohumeral Distance.

PubMed

Mackenzie, Tanya Anne; Bdaiwi, Alya H; Herrington, Lee; Cools, Ann

2016-07-01

Real-time ultrasound (RTUS) has been suggested as a reliable measure of acromiohumeral distance. However, to date, no rigorous assessment and reporting of inter-rater reliability of this method has been performed with the shoulder in a neutral position or with active and passive arm abduction. To assess intrasession inter-rater reliability of using RTUS to measure acromiohumeral distance with the shoulder in a neutral position and with 60° active and passive abduction. Inter-rater intrasession reliability of repeated measures. Human performance laboratory. Twenty persons (12 male and 8 female) with an average age of 29.86 years (standard deviation, 7.8). In an inter-rater, intrasession study, RTUS was used to measure the acromiohumeral distance with the shoulder in a neutral position and with 60° of both active and passive abduction. Acromiohumeral distance. Intraclass correlation coefficient, ICC(2,1), scores ranged from 0.65 to 0.88 (standard error of the mean = 0.81-1.2 mm and minimal detectable differences with 95% confidence = 2.2-2.3 mm) for inter-rater intrasession reliability. RTUS was found to have fair to good inter-rater reliability as a tool to measure acromiohumeral distance with the shoulder in a neutral position and with 60° of both active and passive arm abduction. Copyright © 2016 American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
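The coefficient reported above is the two-way random-effects, absolute-agreement, single-rater form usually written ICC(2,1). It can be computed from the mean squares of a two-way analysis of variance as (MSR - MSE) / (MSR + (k - 1)·MSE + k·(MSC - MSE)/n), where MSR, MSC and MSE are the subject, rater and residual mean squares, n is the number of subjects and k the number of raters. The sketch below follows that formula; the function name is hypothetical and the ratings are illustrative only, not acromiohumeral distances from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject, each holding one score per rater.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns) and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denominator


# Illustrative ratings: six subjects scored by four raters (not study data).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(example):.2f}")
```

Because this form treats systematic differences between raters as error, a rater who consistently scores higher or lower than the others pulls the coefficient down even when all raters rank the subjects identically. The example data show this effect.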
Reliability of Chinese medicine diagnostic variables in the examination of patients with osteoarthritis of the knee.

PubMed

Hua, Bin; Abbas, Estelle; Hayes, Alan; Ryan, Peter; Nelson, Lisa; O'Brien, Kylie

2012-11-01

Chinese medicine (CM) has its own diagnostic indicators that are used as evidence of change in a patient's condition. The majority of studies investigating efficacy of Chinese herbal medicine (CHM) have utilized biomedical diagnostic endpoints. For CM clinical diagnostic variables to be incorporated into clinical trial designs, there would need to be evidence that these diagnostic variables are reliable. Previous studies have indicated that the reliability of CM syndrome diagnosis is variable. Little information is known about where the variability stems from--the basic data collection level or the synthesis of diagnostic data, or both. No previous studies have investigated systematically the reliability of all four diagnostic methods used in the CM diagnostic process (Inquiry, Inspection, Auscultation/Olfaction, and Palpation). The objective of this study was to assess the inter-rater reliability of data collected using the four diagnostic methods of CM in Australian patients with knee osteoarthritis (OA), in order to investigate if CM variables could be used with confidence as diagnostic endpoints in a clinical trial investigating the efficacy of a CHM in treating OA. An inter-rater reliability study was conducted as a substudy of a clinical trial investigating the treatment of knee OA with Chinese herbal medicine. Two (2) experienced CM practitioners conducted a CM examination separately, within 2 hours of each other, in 40 participants. A CM assessment form was utilized to record the diagnostic data. Cohen's κ coefficient was used as a measure of the level of agreement between 2 practitioners. There was a relatively good level of agreement for Inquiry and Auscultation variables, and, in general, a low level of agreement for (visual) Inspection and Palpation variables. There was variation in the level of agreement between 2 practitioners on clinical information collected using the Four Diagnostic Methods of a CM examination. 
Some aspects of CM diagnosis appear to be reliable, while others are not. Based on these results, it was inappropriate to use CM diagnostic variables as diagnostic endpoints in the main study, which was an investigation of efficacy of CHM treatment of knee OA.
Reliability of infrared thermometric measurements of skin temperature in the hand.

PubMed

Packham, Tara L; Fok, Diana; Frederiksen, Karen; Thabane, Lehana; Buckley, Norman

2012-01-01

Clinical measurement study. Skin temperature asymmetries (STAs) are used in the diagnosis of complex regional pain syndrome (CRPS), but little evidence exists for reliability of the equipment and methods. This study examined the reliability of an inexpensive infrared (IR) thermometer and measurement points in the hand for the study of STA. ST was measured three times at five points on both hands with an IR thermometer by two raters in 20 volunteers (12 normals and 8 CRPS). ST measurement results using IR thermometers support inter-rater reliability: intraclass correlation coefficient (ICC) estimate for single measures 0.80; all ST measurement points were also highly reliable (ICC single measures, 0.83-0.91). The equipment demonstrated excellent reliability, with little difference in the reliability of the five measurement sites. These preliminary findings support their use in future CRPS research. Not applicable. Copyright © 2012 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
Tackling reliability and construct validity: the systematic development of a qualitative protocol for skill and incident analysis.

PubMed

Savage, Trevor Nicholas; McIntosh, Andrew Stuart

2017-03-01

It is important to understand factors contributing to and directly causing sports injuries to improve the effectiveness and safety of sports skills. The characteristics of injury events must be evaluated and described meaningfully and reliably. However, many complex skills cannot be effectively investigated quantitatively because of ethical, technological and validity considerations. Increasingly, qualitative methods are being used to investigate human movement for research purposes, but there are concerns about reliability and measurement bias of such methods. Using the tackle in Rugby union as an example, we outline a systematic approach for developing a skill analysis protocol with a focus on improving objectivity, validity and reliability. Characteristics for analysis were selected using qualitative analysis and biomechanical theoretical models and epidemiological and coaching literature. An expert panel comprising subject matter experts provided feedback and the inter-rater reliability of the protocol was assessed using ten trained raters. The inter-rater reliability results were reviewed by the expert panel and the protocol was revised and assessed in a second inter-rater reliability study. Mean agreement in the second study improved and was comparable (52-90% agreement and ICC between 0.6 and 0.9) with other studies that have reported inter-rater reliability of qualitative analysis of human movement.
Reliability assessments in qualitative health promotion research.

PubMed

Cook, Kay E

2012-03-01

This article contributes to the debate about the use of reliability assessments in qualitative research in general, and health promotion research in particular. In this article, I examine the use of reliability assessments in qualitative health promotion research in response to health promotion researchers' commonly held misconception that reliability assessments improve the rigor of qualitative research. All qualitative articles published in the journal Health Promotion International from 2003 to 2009 employing reliability assessments were examined. In total, 31.3% (20/64) of articles employed some form of reliability assessment. The use of reliability assessments increased over the study period, ranging from <20% in 2003/2004 to 50% and above in 2008/2009, while at the same time the total number of qualitative articles decreased. The articles were then classified into four types of reliability assessments, including the verification of thematic codes, the use of inter-rater reliability statistics, congruence in team coding and congruence in coding across sites. The merits of each type were discussed, with the subsequent discussion focusing on the deductive nature of reliable thematic coding, the limited depth of immediately verifiable data and the usefulness of such studies to health promotion and the advancement of the qualitative paradigm.
Reproducibility of manual pressure force on provocation of the sacroiliac joint.

PubMed

Levin, U; Nilsson-Wikmar, L; Stenström, C H; Lundeberg, T

1998-01-01

Previous studies of pain-provocation sacroiliac (SI) joint tests have revealed conflicting results. The aim of the present study was to evaluate the intra- and inter-test reliability of pressure force applied during distraction test, compression test and pressure on the apex sacralis. Seventeen physiotherapists (PTs), median age 43 years and median clinical experience 11 years, all experienced in musculoskeletal evaluation and therapy, participated in the study. Each PT performed each test on the same healthy volunteer for 20 s, on three separate occasions, at intervals of one week using a specially constructed examination table which registered pressure force. The PTs were capable of maintaining a relatively constant pressure force for 20 s. The intra-test reliability was acceptable even though there were individual differences on different occasions between those PTs who used the SI joint tests often and those who seldom or never used them. The inter-test reliability was insufficient. The findings indicate the advantage of registering pressure force as a complement for standardized methods for pain-provoking tests and when learning provocation tests, since individual variability was considerable.
Psychometric properties of the Peer Proficiency Assessment (PEPA): a tool for evaluation of undergraduate peer counselors' motivational interviewing fidelity.

PubMed

Mastroleo, Nadine R; Mallett, Kimberly A; Turrisi, Rob; Ray, Anne E

2009-09-01

Despite the expanding use of undergraduate student peer counseling interventions aimed at reducing college student drinking, few programs evaluate peer counselors' competency to conduct these interventions. The present research describes the development and psychometric assessments of the Peer Proficiency Assessment (PEPA), a new tool for examining Motivational Interviewing adherence in undergraduate student peer delivered interventions. Twenty peer delivered sessions were evaluated by master and undergraduate student coders using a cross-validation design to examine peer based alcohol intervention sessions. Assessments revealed high inter-rater reliability between student and master coders and good correlations between previously established fidelity tools. Findings lend support for the use of the PEPA to examine peer counselor competency. The PEPA, training for use, inter-rater reliability information, construct and predictive validity, and tool usefulness are described.
Shear-wave sonoelastography for assessing masseter muscle hardness in comparison with strain sonoelastography: study with phantoms and healthy volunteers

PubMed Central

Ariji, Yoshiko; Nakayama, Miwa; Nishiyama, Wataru; Nozawa, Michihito; Ariji, Eiichiro

2016-01-01

Objectives: Shear-wave sonoelastography is expected to facilitate low operator dependency, high reproducibility and quantitative evaluation, whereas there are few reports on available normative values of in vivo tissue in head and neck fields. The purpose of this study was to examine the reliabilities on measuring hardness using shear-wave sonoelastography and to clarify normal values of masseter muscle hardness in healthy volunteers. Methods: Phantoms with known hardness ranging from 20 to 140 kPa were scanned with shear-wave sonoelastography, and inter- and intraoperator reliabilities were examined compared with strain sonoelastography. The relationships between the actual and measured hardness were analyzed. The masseter muscle hardness in 30 healthy volunteers was measured using shear-wave sonoelastography. Results: The inter- and intraoperator intraclass correlation coefficients were almost perfect. Strong correlations were seen between the actual and measured hardness. The mean hardness of the masseter muscles in healthy volunteers was 42.82 ± 5.56 kPa at rest and 53.36 ± 8.46 kPa during jaw clenching. Conclusions: The hardness measured with shear-wave sonoelastography showed high-level reliability. Shear-wave sonoelastography may be suitable for evaluation of the masseter muscles. PMID:26624000
Inter-rater reliability of output measures for a posture matching assessment approach: a pilot study with food service workers.

PubMed

Cann, A P; Connolly, M; Ruuska, R; MacNeil, M; Birmingham, T B; Vandervoort, A A; Callaghan, J P

2008-04-01

Despite the ongoing health problem of repetitive strain injuries, there are few tools currently available for ergonomic applications evaluating cumulative loading that have well-documented evidence of reliability and validity. The purpose of this study was to determine the inter-rater reliability of a posture matching based analysis tool (3DMatch, University of Waterloo) for predicting cumulative and peak spinal loads. A total of 30 food service workers were each videotaped for a 1-h period while performing typical work activities and a single work task was randomly selected from each for analysis by two raters. Inter-rater reliability was determined using intraclass correlation coefficients (ICC) model 2,1 and standard errors of measurement for cumulative and peak spinal and shoulder loading variables across all subjects. Overall, 85.5% of variables had moderate to excellent inter-rater reliability, with ICCs ranging from 0.30-0.99 for all cumulative and peak loading variables. 3DMatch was found to be a reliable ergonomic tool when more than one rater is involved.
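The "ICC model 2,1" mentioned above is commonly understood as the two-way random-effects, absolute-agreement, single-rater coefficient of Shrout and Fleiss. The sketch below computes it from ANOVA mean squares under that assumption; the function name `icc_2_1` and the ratings are illustrative only and are not data from the 3DMatch study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores is a list of rows, one row per subject and one column per rater.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
print(round(icc, 2))
```

Because ICC(2,1) penalizes systematic differences between raters, a table in which raters rank subjects consistently but score at different levels still yields a low value, which is one reason a wide range such as 0.30-0.99 can arise across variables.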
Mothers and Children as Informants of Bullying Victimization: Results from an Epidemiological Cohort of Children

ERIC Educational Resources Information Center

Shakoor, Sania; Jaffee, Sara R.; Andreou, Penelope; Bowes, Lucy; Ambler, Antony P.; Caspi, Avshalom; Moffitt, Terrie E.; Arseneault, Louise

2011-01-01

Stressful events early in life can affect children's mental health problems. Collecting valid and reliable information about children's bad experiences is important for research and clinical purposes. This study aimed to (1) investigate whether mothers and children provide valid reports of bullying victimization, (2) examine the inter-rater…
Frame-of-reference training for simulation-based intraoperative communication assessment.

PubMed

Gardner, Aimee K; Russo, Michael A; Jabbour, Ibrahim I; Kosemund, Matthew; Scott, Daniel J

2016-09-01

The purpose of this study was to examine the impact of frame-of-reference (FOR) training on assessments of intraoperative communication skills and identify areas of need to inform curricular efforts. Simulation instructors (M.D., Ph.D., Research Fellow, Simulation Technician) underwent a 2-hour FOR training session with the operating room communication instrument. They then independently rated communication skills of 19 PGY1s who participated in a team-based simulation. Residents completed self-assessments via video review of the scenario. Intraclass correlation coefficients were used to examine inter-rater reliability. Relationships between trained raters and resident scores were assessed with Pearson correlation coefficients and paired sample t tests. Inter-rater reliability after FOR training was .91. The correlation between trained rater scores and resident evaluations was nonsignificant. Residents significantly underestimated their intraoperative communication skills (P < .05). Use of names, closed loop communication, and sharing information with team members demonstrated consistently low ratings among all residents. These findings reveal that a number of individuals can be trained to reliably rate resident intraoperative communication performance and that residents tend to under-rate their communication skills. Copyright © 2016 Elsevier Inc. All rights reserved.
Greater understanding of normal hip physical function may guide clinicians in providing targeted rehabilitation programmes.

PubMed

Kemp, Joanne L; Schache, Anthony G; Makdissi, Michael; Sims, Kevin J; Crossley, Kay M

2013-07-01

This study investigated tests of hip muscle strength and functional performance. The specific objectives were to: (i) establish intra- and inter-rater reliability; (ii) compare differences between dominant and non-dominant limbs; (iii) compare agonist and antagonist muscle strength ratios; (iv) compare differences between genders; and (v) examine relationships between hip muscle strength, baseline measures and functional performance. Reliability study and cross-sectional analysis of hip strength and functional performance. In healthy adults aged 18-50 years, normalised hip muscle peak torque and functional performance were evaluated to: (i) establish intra-rater and inter-rater reliability; (ii) analyse differences between limbs, between antagonistic muscle groups and genders; and (iii) associations between strength and functional performance. Excellent reliability (intra-rater ICC=0.77-0.96; inter-rater ICC=0.82-0.95) was observed. No difference existed between dominant and non-dominant limbs. Differences in strength existed between antagonistic pairs of muscles: hip abduction was greater than adduction (p<0.001) and hip ER was greater than IR (p<0.001). Men had greater ER strength (p=0.006) and hop for distance (p<0.001) than women. Strong associations were observed between measures of hip muscle strength (except hip flexion) and age, height, and functional performance. Deficits in hip muscle strength or functional performance may influence hip pain. In order to provide targeted rehabilitation programmes to address patient-specific impairments, and determine when individuals are ready to return to physical activity, clinicians are increasingly utilising tests of hip strength and functional performance. This study provides a battery of reliable, clinically applicable tests which can be used for these purposes. Copyright © 2012 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Evaluation of the reliability and accuracy of using cone-beam computed tomography for diagnosing periapical cysts from granulomas.

PubMed

Guo, Jing; Simon, James H; Sedghizadeh, Parish; Soliman, Osman N; Chapman, Travis; Enciso, Reyes

2013-12-01

The purpose of this study was to evaluate the reliability and accuracy of cone-beam computed tomographic (CBCT) imaging against the histopathologic diagnosis for the differential diagnosis of periapical cysts (cavitated lesions) from (solid) granulomas. Thirty-six periapical lesions were imaged using CBCT scans. Apicoectomy surgeries were conducted for histopathological examination. Evaluator 1 examined each CBCT scan for the presence of 6 radiologic characteristics of a cyst (ie, location, periphery, shape, internal structure, effects on surrounding structure, and perforation of the cortical plate). Not every cyst showed all radiologic features (eg, not all cysts perforate the cortical plate). For the purpose of finding the minimum number of diagnostic criteria present in a scan to diagnose a lesion as a cyst, we conducted 6 receiver operating characteristic curve analyses comparing CBCT diagnoses with the histopathologic diagnosis. Two other independent evaluators examined the CBCT lesions. Statistical tests were conducted to examine the accuracy, inter-rater reliability, and intrarater reliability of CBCT images. Findings showed that a score of ≥4 positive findings was the optimal scoring system. The accuracies of differential diagnoses of 3 evaluators were moderate (area under the curve = 0.76, 0.70, and 0.69 for evaluators 1, 2, and 3, respectively). The inter-rater agreement of the 3 evaluators was excellent (α = 0.87). The intrarater agreement was good to excellent (κ = 0.71, 0.76, and 0.77). CBCT images can provide a moderately accurate diagnosis between cysts and granulomas. Copyright © 2013 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
Applying Resource Utilization Groups (RUG-III) in Hong Kong nursing homes.

PubMed

Chou, Kee-Lee; Chi, Iris; Leung, Joe C B

2008-01-01

Resource Utilization Groups III (RUG-III) is a case-mix system developed in the United States for categorization of nursing home residents and the financing of residential care services. In Hong Kong, RUG-III is based on several broad groups of residents. The aim of this study was to examine the reliability and validity of the RUG-III in Hong Kong nursing homes. A cross-sectional survey was conducted in seven residential facilities operated by one agency. Residents (N = 1,127) were assessed by the Minimum Data Set (MDS) and nursing as well as auxiliary staff care times were recorded within 2 weeks before or after the completion of MDS assessment. Forty-five of the 1,127 residents were re-interviewed by an independent assessor to assess the inter-rater reliability. The inter-rater reliability of MDS assessment was excellent (kappa = 0.76) and the original RUG-III accounted for about 30 per cent of nursing staff time. Results provide preliminary evidence to support that RUG-III is a reliable and valid case-mix system for Hong Kong nursing homes, but future studies must be explored to reduce the variance of resource use explained by this case-mix system.

Inter-vender and test-retest reliabilities of resting-state functional magnetic resonance imaging: Implications for multi-center imaging studies.

PubMed

An, Hyeong Su; Moon, Won-Jin; Ryu, Jae-Kyun; Park, Ju Yeon; Yun, Won Sung; Choi, Jin Woo; Jahng, Geon-Ho; Park, Jang-Yeon

2017-12-01

This prospective multi-center study aimed to evaluate the inter-vendor and test-retest reliabilities of resting-state functional magnetic resonance imaging (RS-fMRI) by assessing the temporal signal-to-noise ratio (tSNR) and functional connectivity. The study included 10 healthy subjects, and each subject was scanned using three 3T MR scanners (GE Signa HDxt, Siemens Skyra, and Philips Achieva) in two sessions. The tSNR was calculated from the time course data. Inter-vendor and test-retest reliabilities were assessed with intra-class correlation coefficients (ICCs) derived from variance component analysis. Independent component analysis was performed to identify the connectivity of the default-mode network (DMN). As a result, the tSNR for the DMN was not significantly different among the GE, Philips, and Siemens scanners (P=0.638). In terms of vendor differences, the inter-vendor reliability was good (ICC=0.774). Regarding the test-retest reliability, the GE scanner showed excellent correlation (ICC=0.961), while the Philips (ICC=0.671) and Siemens (ICC=0.726) scanners showed relatively good correlation. The DMN pattern of the subjects between the two sessions for each scanner and between the three scanners showed identical patterns of functional connectivity. The inter-vendor and test-retest reliabilities of RS-fMRI using different 3T MR scanners are good. Thus, we suggest that RS-fMRI could be used in multicenter imaging studies as a reliable imaging marker. Copyright © 2017 Elsevier Inc. All rights reserved.
Impact of clinical history on chest radiograph interpretation.

PubMed

Test, Matthew; Shah, Samir S; Monuteaux, Michael; Ambroggio, Lilliam; Lee, Edward Y; Markowitz, Richard I; Bixby, Sarah; Diperna, Stephanie; Servaes, Sabah; Hellinger, Jeffrey C; Neuman, Mark I

2013-07-01

The inclusion of clinical information may have unrecognized influence in the interpretation of diagnostic testing. The objective of the study was to determine the impact of clinical history on chest radiograph interpretation in the diagnosis of pneumonia. Prospective case-based study. Radiologists interpreted 110 radiographs of children evaluated for suspicion of pneumonia. Clinical information was withheld during the first interpretation. After 6 months the radiographs were reviewed with clinical information. Radiologists reported on pneumonia indicators described by the World Health Organization (ie, any infiltrate, alveolar infiltrate, interstitial infiltrate, air bronchograms, hilar adenopathy, pleural effusion). Children's Hospital of Philadelphia and Boston Children's Hospital. Six board-certified radiologists. Inter- and intra-rater reliability were assessed using the kappa statistic. The addition of clinical history did not have a substantial impact on the inter-rater reliability in the identification of any infiltrate, alveolar infiltrate, interstitial infiltrate, pleural effusion, or hilar adenopathy. Inter-rater reliability in the identification of air bronchograms improved from fair (k = 0.32) to moderate (k = 0.53). Intra-rater reliability for the identification of alveolar infiltrate remained substantial to almost perfect for all 6 raters with and without clinical information. One rater had a decrease in inter-rater reliability from almost perfect (k = 1.0) to fair (k = 0.21) in the identification of interstitial infiltrate with the addition of clinical history. Alveolar infiltrate and pleural effusion are findings with high intra- and inter-rater reliability in the diagnosis of bacterial pneumonia. The addition of clinical information did not have a substantial impact on the reliability of these findings. © 2012 Society of Hospital Medicine.
Inter-rater reliability of the Full Outline of UnResponsiveness score and the Glasgow Coma Scale in critically ill patients: a prospective observational study

PubMed Central

2010-01-01

Introduction The Glasgow Coma Scale (GCS) is the most widely used scoring system for comatose patients in intensive care. Limitations of the GCS include the impossibility to assess the verbal score in intubated or aphasic patients, and an inconsistent inter-rater reliability. The FOUR (Full Outline of UnResponsiveness) score, a new coma scale not reliant on verbal response, was recently proposed. The aim of the present study was to compare the inter-rater reliability of the GCS and the FOUR score among unselected patients in general critical care. A further aim was to compare the inter-rater reliability of neurologists with that of intensive care unit (ICU) staff. Methods In this prospective observational study, scoring of GCS and FOUR score was performed by neurologists and ICU staff on 267 consecutive patients admitted to intensive care. Results In a total of 437 pair wise ratings the exact inter-rater agreement for the GCS was 71%, and for the FOUR score 82% (P = 0.0016); the inter-rater agreement within a range of ± 1 score point for the GCS was 90%, and for the FOUR score 92% (P = ns.). The exact inter-rater agreement among neurologists was superior to that among ICU staff for the FOUR score (87% vs. 79%, P = 0.04) but not for the GCS (73% vs. 73%). Neurologists and ICU staff did not significantly differ in the inter-rater agreement within a range of ± 1 score point for both GCS (88% vs. 93%) and the FOUR score (91% vs. 88%). Conclusions The FOUR score performed better than the GCS for exact inter-rater agreement, but not for the clinically more relevant agreement within the range of ± 1 score point. Though neurologists outperformed ICU staff with regard to exact inter-rater agreement, the inter-rater agreement of ICU staff within the clinically more relevant range of ± 1 score point equalled that of the neurologists. 
The small advantage in inter-rater reliability of the FOUR score is most likely insufficient to replace the GCS, a score with a long tradition in intensive care. PMID:20398274
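The two agreement measures reported above (exact agreement and agreement within ± 1 score point) can be sketched as simple proportions over paired ratings. The function name `agreement_rates` and the score pairs below are hypothetical and are not data from the study.

```python
def agreement_rates(ratings_a, ratings_b, tolerance=1):
    """Return (exact agreement, agreement within +/- tolerance) for paired scores."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    pairs = list(zip(ratings_a, ratings_b))
    # Exact agreement: both raters gave the identical total score.
    exact = sum(a == b for a, b in pairs) / len(pairs)
    # Tolerant agreement: scores differ by at most `tolerance` points.
    within = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return exact, within


# Hypothetical total coma scores given to the same 8 patients by two raters.
rater_1 = [15, 12, 9, 3, 14, 8, 6, 10]
rater_2 = [15, 11, 9, 5, 14, 8, 7, 10]
exact, within_one = agreement_rates(rater_1, rater_2)
print(exact, within_one)
```

Neither proportion corrects for chance agreement, which is one difference between these measures and a kappa statistic.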
Can Physicians Identify Inappropriate Nuclear Stress Tests? An Examination of Inter-rater Reliability for the 2009 Appropriate Use Criteria for Radionuclide Imaging

PubMed Central

Ye, Siqin; Rabbani, LeRoy E.; Kelly, Christopher R.; Kelly, Maureen R.; Lewis, Matthew; Paz, Yehuda; Peck, Clara L.; Rao, Shaline; Bokhari, Sabahat; Weiner, Shepard D.; Einstein, Andrew J.

2014-01-01

Background We sought to determine inter-rater reliability of the 2009 Appropriate Use Criteria (AUC) for radionuclide imaging (RNI) and whether physicians at various levels of training can effectively identify nuclear stress tests with inappropriate indications. Methods and Results Four hundred patients were randomly selected from a consecutive cohort of patients undergoing nuclear stress testing at an academic medical center. Raters with different levels of training (including cardiology attending physicians, cardiology fellows, internal medicine hospitalists, and internal medicine interns) classified individual nuclear stress tests using the 2009 AUC. Consensus classification by two cardiologists was considered the operational gold standard, and sensitivity and specificity of individual raters for identifying inappropriate tests was calculated. Inter-rater reliability of the AUC was assessed using Cohen’s kappa statistics for pairs of different raters. The mean age of patients was 61.5 years; 214 (54%) were female. The cardiologists rated 256 (64%) of 400 NSTs as appropriate, 68 (18%) as uncertain, 55 (14%) as inappropriate; 21 (5%) tests were unable to be classified. Inter-rater reliability for non-cardiologist raters was modest (unweighted Cohen’s kappa, 0.51, 95% confidence interval, 0.45 to 0.55). Sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, while specificity ranged from 85% to 97%. Conclusions Inter-rater reliability for the 2009 AUC for RNI is modest, and there is considerable variation in the ability of raters at different levels of training to identify inappropriate tests. PMID:25563660
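Sensitivity and specificity of a rater against a reference standard, as used above with the cardiologists' consensus as the operational gold standard, can be sketched as follows. The function name `sensitivity_specificity` and the classifications are hypothetical; True marks a test classified as inappropriate.

```python
def sensitivity_specificity(reference, rater):
    """Return (sensitivity, specificity) of a rater against a reference standard.

    Both arguments are equal-length lists of booleans, where True means the
    test was classified as inappropriate.
    """
    if len(reference) != len(rater):
        raise ValueError("reference and rater must have equal length")
    pairs = list(zip(reference, rater))
    tp = sum(ref and rat for ref, rat in pairs)          # inappropriate, flagged
    fn = sum(ref and not rat for ref, rat in pairs)      # inappropriate, missed
    tn = sum(not ref and not rat for ref, rat in pairs)  # other, not flagged
    fp = sum(not ref and rat for ref, rat in pairs)      # other, wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical classifications of 10 stress tests.
consensus = [True, True, True, True, False, False, False, False, False, False]
intern = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(consensus, intern)
print(round(sens, 2), round(spec, 2))
```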
Motivational Interviewing Skills in Health Care Encounters (MISHCE): Development and psychometric testing of an assessment tool.

PubMed

Petrova, Tatjana; Kavookjian, Jan; Madson, Michael B; Dagley, John; Shannon, David; McDonough, Sharon K

2015-01-01

Motivational interviewing (MI) has demonstrated a significant impact as an intervention strategy for addiction management, change in lifestyle behaviors, and adherence to prescribed medication and other treatments. Key elements to studying MI include training in MI of professionals who will use it, assessment of skills acquisition in trainees, and the use of a validated skills assessment tool. The purpose of this research project was to develop a psychometrically valid and reliable tool that has been designed to assess MI skills competence in health care provider trainees. The goal was to develop an assessment tool that would evaluate the acquisition and use of specific MI skills and principles, as well as the quality of the patient-provider therapeutic alliance in brief health care encounters. To address this purpose, specific steps were followed, beginning with a literature review. This review contributed to the development of relevant conceptual and operational definitions, selecting a scaling technique and response format, and methods for analyzing validity and reliability. Internal consistency reliability was established on 88 video recorded interactions. The inter-rater and test-retest reliability were established using 18 interactions randomly selected from the 88. The assessment tool Motivational Interviewing Skills for Health Care Encounters (MISHCE) and a manual for use of the tool were developed. Validity and reliability of MISHCE were examined. Face and content validity were supported with well-defined conceptual and operational definitions and feedback from an expert panel. Reliability was established through internal consistency, inter-rater reliability, and test-retest reliability. The overall internal consistency reliability (Cronbach's alpha) for all fifteen items was 0.75. MISHCE demonstrated good inter-rater reliability and good to excellent test-retest reliability. 
MISHCE assesses the health provider's level of knowledge and skills in brief disease management encounters. MISHCE also evaluates quality of the patient-provider therapeutic alliance, i.e., the "flow" of the interaction. Copyright © 2015 Elsevier Inc. All rights reserved.
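For readers unfamiliar with the internal consistency statistic cited above, the following is a minimal sketch of Cronbach's alpha computed from item-level scores. The function name `cronbach_alpha` and the three-item data are hypothetical; they do not reproduce the fifteen MISHCE items or the study's recordings.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; responses holds one row per rated interaction,
    one column per item."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2 or any(len(r) != k for r in responses):
        raise ValueError("need a complete table with >= 2 rows and >= 2 items")
    # Sample variance of each item across rows.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the total score per row.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores on 3 items for 5 recorded interactions.
scores = [
    [3, 4, 3],
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 3, 2],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```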
Development and validation of the Bush-Francis Catatonia Rating Scale - Brazilian version.

PubMed

Nunes, Ana Letícia Santos; Filgueiras, Alberto; Nicolato, Rodrigo; Alvarenga, Jussara Mendonça; Silveira, Luciana Angélica Silva; Silva, Rafael Assis da; Cheniaux, Elie

2017-01-01

This article aims to describe the adaptation and translation process of the Bush-Francis Catatonia Rating Scale (BFCRS) and its reduced version, the Bush-Francis Catatonia Screening Instrument (BFCSI) for Brazilian Portuguese, as well as its validation. Semantic equivalence processes included four steps: translation, back translation, evaluation of semantic equivalence and a pilot-study. Validation consisted of simultaneous applications of the instrument in Portuguese by two examiners in 30 catatonic and 30 non-catatonic patients. Total scores averaged 20.07 for the complete scale and 7.80 for its reduced version among catatonic patients, compared with 0.47 and 0.20 among non-catatonic patients, respectively. Overall values of inter-rater reliability of the instruments were 0.97 for the BFCSI and 0.96 for the BFCRS. The scale's version in Portuguese proved to be valid and was able to distinguish between catatonic and non-catatonic patients. It was also reliable, with inter-evaluator reliability indexes as high as those of the original instrument.
Reliability and main findings of the FEES-Tensilon Test in patients with myasthenia gravis and dysphagia.

PubMed

Im, Sun; Suntrup-Krueger, Sonja; Colbow, Sigrid; Sauer, Sonja; Claus, Inga; Meuth, Sven G; Dziewas, Rainer; Warnecke, Tobias

2018-05-26

Diagnosis of pharyngeal dysphagia caused by myasthenia gravis (MG) based on clinical examination alone is often challenging. Flexible endoscopic evaluation of swallowing (FEES) combined with Tensilon (edrophonium) application, referred to as the FEES-Tensilon Test, was developed to improve diagnostic accuracy and to detect the main symptoms of pharyngeal dysphagia in MG. Here we investigated inter- and intra-rater reliability of the FEES-Tensilon Test and analyzed the main endoscopic findings. Four experienced raters reviewed a total of 20 FEES-Tensilon-Test videos in randomized order. Residue severity was graded at 4 different pharyngeal spaces before and after Tensilon administration. All interpretations were performed twice per rater, 4 weeks apart (a total of 160 scorings). Intra-rater test-retest reliability and inter-rater reliability levels were calculated. The most frequent FEES findings in MG patients before Tensilon application were prominent residues of semi solids spread all over the hypopharynx in varying locations. The reliability level in the interpretation of the FEES-Tensilon test was excellent regardless of the raters' profession or years of experience with FEES. All 4 raters showed high inter- and intra-rater reliability levels in interpreting the FEES-Tensilon Test based on residue clearance (kappa=0.922, 0.981). Degree of residue normalization in the vallecular space after Tensilon application showed the highest inter- and intra-rater reliability level (kappa=0.863, 0.957) followed by the epiglottis (kappa=0.813, 0.946) and pyriform sinuses (kappa=0.836, 0.929). Interpretation of the FEES-Tensilon Test based on residue severity and degree of Tensilon clearance, especially in the vallecular space, is consistent and reliable. This article is protected by copyright. All rights reserved.
Test-re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females.

PubMed

Beardsley, Chris; Egerton, Tim; Skinner, Brendon

2016-01-01

Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81-0.88), test-re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and test-re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test-re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test-re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test-re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.
The modified gait abnormality rating scale in patients with a conversion disorder: a reliability and responsiveness study.

PubMed

Vandenberg, Justin M; George, Deanna R; O'Leary, Andrea J; Olson, Lindsay C; Strassburg, Kaitlyn R; Hollman, John H

2015-01-01

Individuals with conversion disorder have neurologic symptoms that are not identified by an underlying organic cause. Often the symptoms manifest as gait disturbances. The modified gait abnormality rating scale (GARS-M) may be useful for quantifying gait abnormalities in these individuals. The purpose of this study was to examine the reliability, responsiveness and concurrent validity of GARS-M scores in individuals with conversion disorder. Data from 27 individuals who completed a rehabilitation program were included in this study. Pre- and post-intervention videos were obtained and walking speed was measured. Five examiners independently evaluated gait performance according to the GARS-M criteria. Inter- and intrarater reliability of GARS-M scores were estimated with intraclass correlation coefficients (ICCs). Responsiveness was estimated with the minimum detectable change (MDC). Pre- to post-treatment changes in GARS-M scores were analyzed with a dependent t-test. The correlation between GARS-M scores and walking speed was analyzed to assess concurrent validity. GARS-M scores were quantified with good-to-excellent inter- (ICC = 0.878) and intrarater reliability (ICC = 0.989). The MDC was 2 points. Mean GARS-M scores decreased from 7 ± 5 at baseline to 1 ± 2 at discharge (t26 = 7.411, p < 0.001) and 85% of patients improved beyond the MDC. Furthermore, GARS-M scores and walking speed measurements were moderately correlated (r = -0.582, p = 0.004), indicating that the GARS-M has acceptable concurrent validity. Our findings provide evidence that the GARS-M scores are reliable, valid and responsive for quantifying gait abnormalities in patients with conversion disorder. GARS-M scores provide objective measures upon which treatment effects can be assessed. Copyright © 2014 Elsevier B.V. All rights reserved.
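
The abstract above reports a minimum detectable change (MDC) of 2 points but does not say how it was derived. A common convention, assumed here and not confirmed by the source, takes the standard error of measurement as SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM. The sketch below applies that convention; the function names are hypothetical, and feeding in the baseline SD (5) and intrarater ICC (0.989) from the abstract is only an illustration, not a reconstruction of the authors' calculation.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC); a widely used convention, assumed here."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd: float, icc: float, z: float = 1.96) -> float:
    """MDC = z * sqrt(2) * SEM; z = 1.96 gives the 95% confidence level."""
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Illustrative inputs taken from the abstract (baseline SD and intrarater ICC).
mdc95 = minimal_detectable_change(sd=5.0, icc=0.989)
print(f"MDC95 = {mdc95:.2f} points")
```

Under these assumptions the result is roughly 1.5 points, which is of the same order as the reported 2 points; a change smaller than the MDC cannot be distinguished from measurement error.
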
Reliability and Validity of Autism Diagnostic Interview-Revised, Japanese Version

ERIC Educational Resources Information Center

Tsuchiya, Kenji J.; Matsumoto, Kaori; Yagi, Atsuko; Inada, Naoko; Kuroda, Miho; Inokuchi, Eiko; Koyama, Tomonori; Kamio, Yoko; Tsujii, Masatsugu; Sakai, Saeko; Mohri, Ikuko; Taniike, Masako; Iwanaga, Ryoichiro; Ogasahara, Kei; Miyachi, Taishi; Nakajima, Shunji; Tani, Iori; Ohnishi, Masafumi; Inoue, Masahiko; Nomura, Kazuyo; Hagiwara, Taku; Uchiyama, Tokio; Ichikawa, Hironobu; Kobayashi, Shuji; Miyamoto, Ken; Nakamura, Kazuhiko; Suzuki, Katsuaki; Mori, Norio; Takei, Nori

2013-01-01

To examine the inter-rater reliability of Autism Diagnostic Interview-Revised, Japanese Version (ADI-R-JV), the authors recruited 51 individuals aged 3-19 years, interviewed by two independent raters. Subsequently, to assess the discriminant and diagnostic validity of ADI-R-JV, the authors investigated 317 individuals aged 2-19 years, who were…
How Reliable Are Students' Evaluations of Teaching Quality? A Variance Components Approach

ERIC Educational Resources Information Center

Feistauer, Daniela; Richter, Tobias

2017-01-01

The inter-rater reliability of university students' evaluations of teaching quality was examined with cross-classified multilevel models. Students (N = 480) evaluated lectures and seminars over three years with a standardised evaluation questionnaire, yielding 4224 data points. The total variance of these student evaluations was separated into the…
Measurement of the Inter-Rater Reliability Rate Is Mandatory for Improving the Quality of a Medical Database: Experience with the Paulista Lung Cancer Registry.

PubMed

Lauricella, Leticia L; Costa, Priscila B; Salati, Michele; Pego-Fernandes, Paulo M; Terra, Ricardo M

2018-06-01

Database quality measurement should be considered a mandatory step to ensure an adequate level of confidence in data used for research and quality improvement. Several metrics have been described in the literature, but no standardized approach has been established. We aimed to describe a methodological approach applied to measure the quality and inter-rater reliability of a regional multicentric thoracic surgical database (Paulista Lung Cancer Registry). Data from the first 3 years of the Paulista Lung Cancer Registry underwent an audit process with 3 metrics: completeness, consistency, and inter-rater reliability. The first 2 methods were applied to the whole data set, and the last method was calculated using 100 cases randomized for direct auditing. Inter-rater reliability was evaluated using percentage of agreement between the data collector and auditor and through calculation of Cohen's κ and intraclass correlation. The overall completeness per section ranged from 0.88 to 1.00, and the overall consistency was 0.96. Inter-rater reliability showed many variables with high disagreement (>10%). For numerical variables, intraclass correlation was a better metric than inter-rater reliability. Cohen's κ showed that most variables had moderate to substantial agreement. The methodological approach applied to the Paulista Lung Cancer Registry showed that completeness and consistency metrics did not sufficiently reflect the real quality status of a database. The inter-rater reliability associated with κ and intraclass correlation was a better quality metric than completeness and consistency metrics because it could determine the reliability of specific variables used in research or benchmark reports. This report can be a paradigm for future studies of data quality measurement. Copyright © 2018 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
Inter- and intra-rater reliability of calliper-based lymph node measurement in dogs with peripheral nodal lymphomas.

PubMed

Childress, M O; Fulkerson, C M; Lahrman, S A; Weng, H-Y

2016-08-01

The purpose of this study was to assess reliability of lymph node measurements between and within raters in dogs with nodal lymphomas. Three raters measured lymph nodes from 20 dogs twice prior to and once after administering chemotherapy. Sum tumour volume (TV) and sum longest diameter (LD) of all lymph nodes at each time point, and the percent change in measurements following chemotherapy, were calculated for each dog. Inter- and intra-rater reliability were assessed with the intraclass correlation coefficient (ICC). ICC for inter-rater sum TV and sum LD prior to chemotherapy were 0.86 and 0.80, respectively. ICC for inter-rater sum TV and sum LD after chemotherapy were 0.95 and 0.91, respectively. ICC for percent change in sum TV and sum LD were 0.96 and 0.94, respectively. ICC for intra-rater reliability ranged from 0.90 to 0.98 for each rater. Inter- and intra-rater reliability in measurements among the three raters was good to excellent. © 2014 John Wiley & Sons Ltd.
A preliminary examination of the validity and reliability of a new brief rating scale for symptom domains of psychosis: Brief Evaluation of Psychosis Symptom Domains (BE-PSD).

PubMed

Takeuchi, Hiroyoshi; Fervaha, Gagan; Lee, Jimmy; Agid, Ofer; Remington, Gary

2016-09-01

Brief assessments have the potential to be widely adopted as outcome measures not only in research but also in routine clinical practice. Existing brief rating scales that assess symptoms of schizophrenia or psychosis have a number of limitations including inability to capture five symptom domains of psychosis and a lack of clearly defined operational anchor points for scoring. We developed a new brief rating scale for five symptom domains of psychosis with clearly defined operational anchor points - the Brief Evaluation of Psychosis Symptom Domains (BE-PSD). To examine the psychometric properties of the BE-PSD, fifty patients with schizophrenia or schizoaffective disorder were included in this preliminary cross-sectional study. To test the convergent and discriminant validity of the BE-PSD, correlational analyses were employed using the consensus Positive and Negative Syndrome Scale (PANSS) five-factor model. To examine the inter-rater reliability of the BE-PSD, single measures intraclass correlation coefficients (ICCs) were calculated for 11 patients. The BE-PSD domain scores demonstrated high convergent validity with the corresponding PANSS factor score (rs = 0.81-0.93) as well as good discriminant validity, as evidenced by lower correlations with the other PANSS factors (rs = 0.23-0.62). The BE-PSD also demonstrated excellent inter-rater reliability for each of the domain scores and the total scores (ICC(2,1) = 0.79-0.96). The present preliminary study found the BE-PSD measure to be valid and reliable; however, further studies are needed to establish the psychometric properties of the BE-PSD because of limitations such as the small sample size and the lack of data on test-retest reliability or sensitivity to change. Copyright © 2016 Elsevier Ltd. All rights reserved.
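
The abstract above reports single-measures ICC(2,1) values. As a minimal sketch of what that statistic computes, the function below derives ICC(2,1) (two-way random effects, absolute agreement, single rater) from the mean squares of a subjects-by-raters table. The function name and the small ratings table are illustrative assumptions and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Illustrative ratings only: 6 subjects scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
print(f"ICC(2,1) = {icc:.3f}")
```

Because ICC(2,1) penalises systematic differences between raters (the `jms` term), it is lower than a consistency-type ICC whenever some raters score consistently higher than others, as in this example table.
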
Reliability of photogrammetry in the evaluation of the postural aspects of individuals with structural scoliosis.

PubMed

Saad, Karen Ruggeri; Colombo, Alexandra Siqueira; Ribeiro, Ana Paula; João, Sílvia Maria Amado

2012-04-01

The purpose of this study was to investigate the reliability of photogrammetry in the measurement of the postural deviations in individuals with idiopathic scoliosis. Twenty participants with scoliosis (17 women and three men), with a mean age of 23.1 ± 9 yrs, were photographed from the posterior and lateral views. The postural aspects were measured with CorelDRAW software. High inter-rater and test-retest reliability indices were found. Greater severity of scoliosis was associated with greater variation between the thoracic kyphosis and lumbar lordosis measures obtained by the same examiner from the left lateral view photographs. A greater body mass index (BMI) was associated with greater variability of the trunk rotation measures obtained by two independent examiners from the right lateral view (r = 0.656; p = 0.002). The severity of scoliosis was also associated with greater inter-rater variability in measures of trunk rotation obtained from the left lateral view (r = 0.483; p = 0.036). Photogrammetry proved to be a reliable method for the measurement of postural deviations from the posterior and lateral views of individuals with idiopathic scoliosis and could complement existing assessment procedures, which could reduce the number of X-rays used for the follow-up assessments of these individuals. Copyright © 2011 Elsevier Ltd. All rights reserved.
Schedule for personality assessment from notes and documents (SPAN-DOC): Preliminary validation, links to the ICD-11 classification of personality disorder, and use in eating disorders.

PubMed

Kim, Youl-Ri; Tyrer, Peter; Lee, Hong-Seock; Kim, Sung-Gon; Connan, Frances; Kinnaird, Emma; Olajide, Kike; Crawford, Mike

2016-05-01

The underlying core of personality is insufficiently assessed by any single instrument. This has led to the development of instruments adapted for written records in the assessment of personality disorder. The aim was to test the construct validity and inter-rater reliability of a new personality assessment method. This study (four parts) assessed the construct validity of the Schedule for Personality Assessment from Notes and Documents (SPAN-DOC), a dimensional assessment from clinical records. We examined inter-rater reliability using case vignettes (Part 1) and convergent validity in three ways: by comparison with NEO Five-Factor Inventory in 130 Korean patients (Part 2), with agreed ICD-11 personality severity levels in two populations (Part 3) and determining its use in assessing the personality status in 90 British patients with eating disorders (Part 4). Internal consistency (alpha = .90) and inter-rater reliability (intraclass correlation coefficient ≥ .88) were satisfactory. Each factor in the five-factor model of personality was correlated with conceptually valid SPAN-DOC variables. The SPAN-DOC domain traits in those with eating disorders were categorized into 3 clusters: self-aggrandisement, emotionally unstable, and anxious/dependent. This study provides preliminary support for the usefulness of SPAN-DOC in the assessment of personality disorder. Copyright © 2016 John Wiley & Sons, Ltd.
Specvis: Free and open-source software for visual field examination.

PubMed

Dzwiniel, Piotr; Gola, Mateusz; Wójcik-Gryciuk, Anna; Waleszczyk, Wioletta J

2017-01-01

Visual field impairment affects more than 100 million people globally. However, due to the lack of the access to appropriate ophthalmic healthcare in undeveloped regions as a result of associated costs and expertise this number may be an underestimate. Improved access to affordable diagnostic software designed for visual field examination could slow the progression of diseases, such as glaucoma, allowing for early diagnosis and intervention. We have developed Specvis, a free and open-source application written in Java programming language that can run on any personal computer to meet this requirement (http://www.specvis.pl/). Specvis was tested on glaucomatous, retinitis pigmentosa and stroke patients and the results were compared to results using the Medmont M700 Automated Static Perimeter. The application was also tested for inter-test intrapersonal variability. The results from both validation studies indicated low inter-test intrapersonal variability, and suitable reliability for a fast and simple assessment of visual field impairment. Specvis easily identifies visual field areas of zero sensitivity and allows for evaluation of its levels throughout the visual field. Thus, Specvis is a new, reliable application that can be successfully used for visual field examination and can fill the gap between confrontation and perimetry tests. The main advantages of Specvis over existing methods are its availability (free), affordability (runs on any personal computer), and reliability (comparable to high-cost solutions).
Specvis: Free and open-source software for visual field examination

PubMed Central

Dzwiniel, Piotr; Gola, Mateusz; Wójcik-Gryciuk, Anna

2017-01-01

Visual field impairment affects more than 100 million people globally. However, due to the lack of the access to appropriate ophthalmic healthcare in undeveloped regions as a result of associated costs and expertise this number may be an underestimate. Improved access to affordable diagnostic software designed for visual field examination could slow the progression of diseases, such as glaucoma, allowing for early diagnosis and intervention. We have developed Specvis, a free and open-source application written in Java programming language that can run on any personal computer to meet this requirement (http://www.specvis.pl/). Specvis was tested on glaucomatous, retinitis pigmentosa and stroke patients and the results were compared to results using the Medmont M700 Automated Static Perimeter. The application was also tested for inter-test intrapersonal variability. The results from both validation studies indicated low inter-test intrapersonal variability, and suitable reliability for a fast and simple assessment of visual field impairment. Specvis easily identifies visual field areas of zero sensitivity and allows for evaluation of its levels throughout the visual field. Thus, Specvis is a new, reliable application that can be successfully used for visual field examination and can fill the gap between confrontation and perimetry tests. The main advantages of Specvis over existing methods are its availability (free), affordability (runs on any personal computer), and reliability (comparable to high-cost solutions). PMID:29028825
Do you see what I see? Mobile eye-tracker contextual analysis and inter-rater reliability.

PubMed

Stuart, S; Hunt, D; Nell, J; Godfrey, A; Hausdorff, J M; Rochester, L; Alcock, L

2018-02-01

Mobile eye-trackers are currently used during real-world tasks (e.g. gait) to monitor visual and cognitive processes, particularly in ageing and Parkinson's disease (PD). However, contextual analysis involving fixation locations during such tasks is rarely performed due to its complexity. This study adapted a validated algorithm and developed a classification method to semi-automate contextual analysis of mobile eye-tracking data. We further assessed inter-rater reliability of the proposed classification method. A mobile eye-tracker recorded eye-movements during walking in five healthy older adult controls (HC) and five people with PD. Fixations were identified using a previously validated algorithm, which was adapted to provide still images of fixation locations (n = 116). The fixation location was manually identified by two raters (DH, JN), who classified the locations. Cohen's kappa correlation coefficients determined the inter-rater reliability. The algorithm successfully provided still images for each fixation, allowing manual contextual analysis to be performed. The inter-rater reliability for classifying the fixation location was high for both PD (kappa = 0.80, 95% agreement) and HC groups (kappa = 0.80, 91% agreement), which indicated a reliable classification method. This study developed a reliable semi-automated contextual analysis method for gait studies in HC and PD. Future studies could adapt this methodology for various gait-related eye-tracking studies.
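
The abstract above reports both percent agreement and Cohen's kappa for two raters classifying fixation locations. The sketch below shows how the two quantities relate: kappa rescales observed agreement by the agreement expected from each rater's label frequencies. The function name, category labels and ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two raters' labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every category either rater used.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical fixation-location labels from two raters.
rater_1 = ["path", "path", "path", "path", "obstacle",
           "obstacle", "obstacle", "other", "other", "other"]
rater_2 = ["path", "path", "path", "obstacle", "obstacle",
           "obstacle", "obstacle", "other", "other", "path"]

agreement, kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.3f}")
```

In this toy example 80% raw agreement corresponds to a kappa of about 0.70, which illustrates why the two figures in an abstract can differ noticeably.
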
Reliability and validity of a Chinese version of the Diagnostic Interview for Borderlines-Revised.

PubMed

Wang, Lanlan; Yuan, Chenmei; Qiu, Jianying; Gunderson, John; Zhang, Min; Jiang, Kaida; Leung, Freedom; Zhong, Jie; Xiao, Zeping

2014-09-01

Borderline personality disorder (BPD) is the most studied of the axis II disorders. One of the most widely used diagnostic instruments is the Diagnostic Interview for Borderline Patients-Revised (DIB-R). The aim of this study was to test the reliability and validity of DIB-R for use in the Chinese culture. The reliability and validity of the DIB-R Chinese version were assessed in a sample of 236 outpatients with a probable BPD diagnosis. The Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II) was used as a standard. Test-retest reliability was tested six months later with 20 patients, and inter-rater reliability was tested on 32 patients. The Chinese version of the DIB-R showed good internal global consistency (Cronbach's α of 0.916), good test-retest reliability (Pearson correlation of 0.704), good inter-rater reliability (intra-class correlation coefficient of 0.892 and kappa of 0.861). When compared with the DSM-IV diagnosis as measured by the SCID-II, the DIB-R showed relatively good sensitivity (0.768) and specificity (0.891) at the cutoff of 7, moderate diagnostic convergence (kappa of 0.631), as well as good discriminating validity. The Chinese version of the DIB-R has good psychometric properties, which renders it a valuable method for examining the presence, the severity, and component phenotypes of BPD in Chinese samples. © 2013 Wiley Publishing Asia Pty Ltd.

Reliability and diagnostic validity of the slump knee bend neurodynamic test for upper/mid lumbar nerve root compression: a pilot study.

PubMed

Trainor, Kate; Pinnington, Mark A

2011-03-01

It has been proposed that neurodynamic examination can assist differential diagnosis of upper/mid lumbar nerve root compression; however, the diagnostic validity of many of these tests has yet to be established. This pilot study aimed to establish the diagnostic validity of the slump knee bend neurodynamic test for upper/mid lumbar nerve root compression in subjects with suspected lumbosacral radicular pain. Two independent examiners performed the slump knee bend test on subjects with radicular leg pain. Inter-tester reliability was calculated using the kappa coefficient. Slump knee bend test results were compared with magnetic resonance imaging findings, and diagnostic accuracy measures were calculated including sensitivity, specificity, predictive values and likelihood ratios. Orthopaedic spinal clinic, secondary care. Sixteen patients with radicular leg pain. All four subjects with mid lumbar nerve root compression on magnetic resonance imaging were correctly identified with the slump knee bend test; however, it was falsely positive in two individuals without the condition. Inter-tester reliability for the slump knee bend test using the kappa coefficient was 0.71 (95% confidence interval 0.33 to 1.0). Diagnostic validity calculations for the slump knee bend test (95% confidence intervals) were: sensitivity, 100% (40 to 100%); specificity, 83% (52 to 98%); positive predictive value, 67% (22 to 96%); negative predictive value, 100% (69 to 100%); positive likelihood ratio, 6.0 (1.58 to 19.4); and negative likelihood ratio, 0 (0 to 0.6). Results indicate good inter-tester reliability and suggest that the slump knee bend test has potential to be a useful clinical test for identifying patients with mid lumbar nerve root compression. Further investigation is needed on larger numbers of patients to confirm these findings. Copyright © 2010 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
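
The point estimates in the abstract above follow from a 2x2 table. The sketch below assumes the counts implied by the text: 16 patients, 4 true positives, 0 false negatives, 2 false positives and therefore 10 true negatives (the last figure is inferred by subtraction, not stated in the source). The function name is hypothetical, and confidence intervals are not reproduced.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates of common diagnostic accuracy measures from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # LR+ is unbounded when there are no false positives.
        "lr_positive": (sensitivity / false_positive_rate
                        if false_positive_rate > 0 else float("inf")),
        # LR- is unbounded when specificity is zero.
        "lr_negative": ((1 - sensitivity) / specificity
                        if specificity > 0 else float("inf")),
    }


# Counts inferred from the abstract (tn = 16 - 4 - 0 - 2 is an assumption).
results = diagnostic_accuracy(tp=4, fp=2, fn=0, tn=10)
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

With these counts the sketch gives sensitivity 1.00, specificity 0.83, positive predictive value 0.67, negative predictive value 1.00, positive likelihood ratio 6.0 and negative likelihood ratio 0, matching the reported point estimates.
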
Anatomical landmark position--can we trust what we see? Results from an online reliability and validity study of osteopaths.

PubMed

Pattyn, Elise; Rajendran, Dévan

2014-04-01

Practitioners traditionally use observation to classify the position of patients' anatomical landmarks. This information may contribute to diagnosis and patient management. To calculate a) Inter-rater reliability of categorising the sagittal plane position of four anatomical landmarks (lateral femoral epicondyle, greater trochanter, mastoid process and acromion) on side-view photographs (with landmarks highlighted and not-highlighted) of anonymised subjects; b) Intra-rater reliability; c) Individual landmark inter-rater reliability; d) Validity against a 'gold standard' photograph. Online inter- and intra-rater reliability study. Photographed subjects: convenience sample of asymptomatic students; raters: randomly selected UK registered osteopaths. 40 photographs of 30 subjects were used; the a priori clinically acceptable reliability was ≥0.4. Inter-rater arm: 20 photographs without landmark highlights plus 10 with highlights; Intra-rater arm: 10 duplicate photographs (non-highlighted landmarks). Validity arm: highlighted landmark scores versus 'gold standard' photographs with vertical line. Research ethics approval obtained. Osteopaths (n = 48) categorised landmark position relative to an imagined vertical line; Gwet's Agreement Coefficient 1 (AC1) was calculated and the chance-corrected coefficient benchmarked against Landis and Koch's scale; the validity calculation used Kendall's tau-B. Inter-rater reliability was 'fair' (AC1 = 0.342; 95% confidence interval (CI) = 0.279-0.404) for non-highlighted landmarks and 'moderate' (AC1 = 0.700; 95% CI = 0.596-0.805) for highlighted landmarks. Intra-rater reliability was 'fair' (AC1 = 0.522); range was 'poor' (AC1 = 0.160) to 'substantial' (AC1 = 0.896). No differences were found between individual landmarks. Validity was 'low' (TB = 0.327; p = 0.104). Both inter- and intra-rater reliability were 'fair' but below clinically acceptable levels, and validity was 'low'. Together these results challenge the clinical practice of using observation to categorise antero-posterior landmark position. Copyright © 2014 Elsevier Ltd. All rights reserved.
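
The study above used Gwet's AC1 across 48 raters. As a simplified sketch only, the function below implements the two-rater form of AC1, in which chance agreement is (1/(q − 1)) × Σ π_k(1 − π_k), with π_k the average of the two raters' proportions for category k and q the number of categories. The function name, category labels and ratings are hypothetical; the multi-rater calculation used in the study is not reproduced here.

```python
from collections import Counter


def gwet_ac1_two_raters(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters over a fixed list of categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # pi_k: mean of the two raters' marginal proportions for category k.
    pi = [(counts_a[c] + counts_b[c]) / (2 * n) for c in categories]
    chance = sum(p * (1 - p) for p in pi) / (q - 1)

    return (observed - chance) / (1 - chance)


# Hypothetical landmark classifications relative to a vertical line.
labels = ["anterior", "aligned", "posterior"]
rater_1 = ["anterior"] * 6 + ["aligned"] * 2 + ["posterior"] * 2
rater_2 = ["anterior"] * 5 + ["aligned"] * 3 + ["posterior", "anterior"]

ac1 = gwet_ac1_two_raters(rater_1, rater_2, labels)
print(f"AC1 = {ac1:.3f}")
```

Unlike Cohen's kappa, the chance term here stays small when one category dominates, which is one reason AC1 is sometimes preferred for skewed category distributions.
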
Correction of gene expression data: Performance-dependency on inter-replicate and inter-treatment biases.

PubMed

Darbani, Behrooz; Stewart, C Neal; Noeparvar, Shahin; Borg, Søren

2014-10-20

This report investigates for the first time the potential inter-treatment bias source of cell number for gene expression studies. Cell-number bias can affect gene expression analysis when comparing samples with unequal total cellular RNA content or with different RNA extraction efficiencies. For maximal reliability of analysis, therefore, comparisons should be performed at the cellular level. This could be accomplished using an appropriate correction method that can detect and remove the inter-treatment bias for cell-number. Based on inter-treatment variations of reference genes, we introduce an analytical approach to examine the suitability of correction methods by considering the inter-treatment bias as well as the inter-replicate variance, which allows use of the best correction method with minimum residual bias. Analyses of RNA sequencing and microarray data showed that the efficiencies of correction methods are influenced by the inter-treatment bias as well as the inter-replicate variance. Therefore, we recommend inspecting both of the bias sources in order to apply the most efficient correction method. As an alternative correction strategy, sequential application of different correction approaches is also advised. Copyright © 2014 Elsevier B.V. All rights reserved.
Reliability analysis for radiographic measures of lumbar lordosis in adult scoliosis: a case–control study comparing 6 methods

PubMed Central

Hong, Jae Young; Modi, Hitesh N.; Hur, Chang Yong; Song, Hae Ryong; Park, Jong Hoon

2010-01-01

Several methods are used to measure lumbar lordosis. In adult scoliosis patients, the measurement is difficult due to degenerative changes in the vertebral endplate as well as the coronal and sagittal deformity. We conducted an observational study with three examiners to determine the reliability of six methods for measuring the global lumbar lordosis in adult scoliosis patients. Ninety lateral lumbar radiographs were collected for the study. The radiographs were divided into normal (Cobb < 10°), low-grade (Cobb 10°–19°) and high-grade (Cobb ≥ 20°) groups to determine the reliability of the Cobb L1–S1, Cobb L1–L5, centroid, posterior tangent L1–S1, posterior tangent L1–L5 and TRALL methods in adult scoliosis. The 90 lateral radiographs were measured twice by each of the three examiners using the six measurement methods. The data were analyzed to determine the inter- and intra-observer reliability. In general, for the six radiographic methods, the inter- and intra-observer intraclass correlation coefficients (ICCs) were all ≥0.82. A comparison of the ICCs and 95% CI for the inter- and intra-observer reliability between the groups with varying degrees of scoliosis showed that the reliability of the lordosis measurement decreased with increasing severity of scoliosis. In the Cobb L1–S1, centroid and posterior tangent L1–S1 methods, the ICCs were relatively lower in the high-grade scoliosis group (≥0.60), and the mean absolute difference (MAD) in these methods was high in the high-grade scoliosis group (≤7.17°). However, in the Cobb L1–L5 and posterior tangent L1–L5 methods, the ICCs were ≥0.86 in all groups, and in the TRALL method, the ICCs were ≥0.76 in all groups. In addition, in the Cobb L1–L5 and posterior tangent L1–L5 methods, the MAD was ≤3.63°, and in the TRALL method, the MAD was ≤3.84° in all groups. We concluded that the Cobb L1–L5 and the posterior tangent L1–L5 methods are reliable methods for measuring the global lumbar lordosis in adult scoliosis. The TRALL method is more reliable than the other methods that include the L5–S1 joint in the lordosis measurement. PMID:20437183
Nutrition Environment Measures Survey in stores (NEMS-S): development and evaluation.

PubMed

Glanz, Karen; Sallis, James F; Saelens, Brian E; Frank, Lawrence D

2007-04-01

Eating, or nutrition, environments are believed to contribute to obesity and chronic diseases. There is a need for valid, reliable measures of nutrition environments. This article reports on the development and evaluation of measures of nutrition environments in retail food stores. The Nutrition Environment Measures Study developed observational measures of the nutrition environment within retail food stores (NEMS-S) to assess availability of healthy options, price, and quality. After pretesting, measures were completed by independent raters to evaluate inter-rater reliability and across two occasions to assess test-retest reliability in grocery and convenience stores in four neighborhoods differing on income and community design in the Atlanta metropolitan area. Data were collected and analyzed in 2004 and 2005. Ten food categories (e.g., fruits) or indicator food items (e.g., ground beef) were evaluated in 85 stores. Inter-rater reliability and test-retest reliability of availability were high: inter-rater reliability kappas were 0.84 to 1.00, and test-retest reliabilities were .73 to 1.00. Inter-rater reliability for quality across fresh produce was moderate (kappas, 0.44 to 1.00). Healthier options were higher priced for hot dogs, lean ground beef, and baked chips. More healthful options were available in grocery than convenience stores and in stores in higher income neighborhoods. The NEMS-S tool was found to have a high degree of inter-rater and test-retest reliability, and to reveal significant differences across store types and neighborhoods of high and low socioeconomic status. These observational measures of nutrition environments can be applied in multilevel studies of community nutrition, and can inform new approaches to conducting and evaluating nutrition interventions.
Inter- and intra-operator reliability and repeatability of shear wave elastography in the liver: a study in healthy volunteers.

PubMed

Hudson, John M; Milot, Laurent; Parry, Craig; Williams, Ross; Burns, Peter N

2013-06-01

This study assessed the reproducibility of shear wave elastography (SWE) in the liver of healthy volunteers. Intra- and inter-operator reliability and repeatability were quantified in three different liver segments in a sample of 15 subjects, scanned during four independent sessions (two scans on day 1, two scans 1 wk later) by two operators. A total of 1440 measurements were made. Reproducibility was assessed using the intra-class correlation coefficient (ICC) and a repeated measures analysis of variance. The shear wave speed was measured and used to estimate Young's modulus using the Supersonics Imagine Aixplorer. The median Young's modulus measured through the inter-costal space was 5.55 ± 0.74 kPa. The intra-operator reliability was better for same-day evaluations (ICC = 0.91) than the inter-operator reliability (ICC = 0.78). Intra-observer agreement decreased when scans were repeated on a different day. Inter-session repeatability was between 3.3% and 9.9% for intra-day repeated scans, compared with 6.5%-12% for inter-day repeated scans. No significant difference was observed between subjects with a body mass index greater than and less than 25 kg/m². Copyright © 2013 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
The Berg Balance Scale has high intra- and inter-rater reliability but absolute reliability varies across the scale: a systematic review.

PubMed

Downs, Stephen; Marquez, Jodie; Chiarelli, Pauline

2013-06-01

What is the intra-rater and inter-rater relative reliability of the Berg Balance Scale? What is the absolute reliability of the Berg Balance Scale? Does the absolute reliability of the Berg Balance Scale vary across the scale? Systematic review with meta-analysis of reliability studies. Any clinical population that has undergone assessment with the Berg Balance Scale. Relative intra-rater reliability, relative inter-rater reliability, and absolute reliability. Eleven studies involving 668 participants were included in the review. The relative intra-rater reliability of the Berg Balance Scale was high, with a pooled estimate of 0.98 (95% CI 0.97 to 0.99). Relative inter-rater reliability was also high, with a pooled estimate of 0.97 (95% CI 0.96 to 0.98). A ceiling effect of the Berg Balance Scale was evident for some participants. In the analysis of absolute reliability, all of the relevant studies had an average score of 20 or above on the 0 to 56 point Berg Balance Scale. The absolute reliability across this part of the scale, as measured by the minimal detectable change with 95% confidence, varied between 2.8 points and 6.6 points. The Berg Balance Scale has a higher absolute reliability when close to 56 points due to the ceiling effect. We identified no data that estimated the absolute reliability of the Berg Balance Scale among participants with a mean score below 20 out of 56. The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects. The review was only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance. Copyright © 2013 Australian Physiotherapy Association. Published by .. All rights reserved.
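The minimal detectable change mentioned above is commonly computed from the standard error of measurement (SEM). The sketch below uses the widely taught formulas SEM = SD × sqrt(1 − reliability) and MDC = z × sqrt(2) × SEM; whether the reviewed studies used exactly this route is an assumption, and the function name and input values are hypothetical.

```python
import math


def minimal_detectable_change(sd, reliability, z=1.96):
    """Minimal detectable change from a between-subject SD and a reliability coefficient.

    z=1.96 gives MDC95; z=1.645 would give MDC90.
    A sketch using the common SEM-based formula, not a method confirmed by the review.
    """
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    return z * math.sqrt(2.0) * sem          # sqrt(2): a change involves two measurements


# Hypothetical inputs: SD of 5 scale points and a reliability of 0.98.
print(round(minimal_detectable_change(5.0, 0.98), 2))
```

The formula makes the point in the abstract concrete: even with a relative reliability near 0.98, a sample with a wide spread of scores can still have an MDC of several points.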
Objections to routine clinical outcomes measurement in mental health services: any evidence so far?

PubMed

MacDonald, Alastair J D; Trauer, Tom

2010-12-01

Routine clinical outcomes measurement (RCOM) is gaining importance in mental health services. To examine whether criticisms published in advance of the development of RCOM have been borne out by data now available from such a programme. This was an observational study of routine ratings using HoNOS65+ at inception/admission and again at discharge in an old age psychiatry service from 1997 to 2008. Testable hypotheses were generated from each criticism amenable to empirical examination. Inter-rater reliability estimates were applied to observed differences in scores between community and ward patients using resampling. Five thousand one hundred eighty community inceptions and 862 admissions had HoNOS65+ ratings at referral/admission and discharge. We could find no evidence of gaming (artificially worse scores at inception and better at discharge), selection, attrition or detection bias, and ratings were consistent with diagnosis and level of service. Anticipated low levels of inter-rater reliability did not vitiate differences between levels of service. Although only hypotheses testable from within RCOM data were examined, and only 46% of eligible episodes had complete outcomes data, no evidence of the alleged biases was found. RCOM seems valid and practical in mental health services.
Introducing a new definition of a near fall: intra-rater and inter-rater reliability.

PubMed

Maidan, I; Freedman, T; Tzemah, R; Giladi, N; Mirelman, A; Hausdorff, J M

2014-01-01

Near falls (NFs) are more frequent than falls, and may occur before falls, potentially predicting fall risk. As such, identification of a NF is important. We aimed to assess intra and inter-rater reliability of the traditional definition of a NF and to demonstrate the potential utility of a new definition. To this end, 10 older adults, 10 idiopathic elderly fallers, and 10 patients with Parkinson's disease (PD) walked in an obstacle course while wearing a safety harness. All walks were videotaped. Forty-nine video segments were extracted to create 2 clips each of 8.48 min. Four raters scored each event using the traditional definition and, two weeks later, using the new definition. A fifth rater used only the new definition. Intra-rater reliability was determined using Kappa (K) statistics and inter-rater reliability was determined using ICC. Using the traditional definition, three raters had poor intra-rater reliability (K<0.054, p>0.137) and one rater had moderate intra-rater reliability (K=0.624, p<0.001). With the traditional definition, inter-rater reliability between the four raters was moderate (ICC=0.667, p<0.001). In contrast, the new NF definition showed high intra-rater (K>0.601, p<0.001) and excellent inter-rater reliability (ICC=0.815, p<0.001). A priori, it is easy to distinguish falls from usual walking and NFs, but it is more challenging to distinguish NFs from obstacle negotiation and usual walking. Therefore, a more precise definition of NF is required. The results of the present study suggest that the proposed new definition increases intra and inter-rater reliability, a critical step for using NFs to quantify fall risk. Copyright © 2013 Elsevier B.V. All rights reserved.
Inter-rater reliability of Hamilton depression rating scale using video-recorded interviews — Focus on rater-blinding

PubMed Central

Prasad, M. Krishna; Udupa, K.; Kishore, K. R.; Thirthalli, J.; Sathyaprabha, T. N.; Gangadhar, B. N.

2009-01-01

Background: The Hamilton depression rating scale (Ham-D) is the most widely used clinician rating scale for depression. There has been no Indian study that has examined the inter-rater reliability (IRR) of video-recorded interviews of the 21-item Ham-D. Aim: To study the IRR of scoring video-recorded interviews for the 21-item Ham-D. Materials and Methods: Eighteen subjects with major depressive disorder involved in a larger study were interviewed using the semi-structured clinical interview of the 21-item Ham-D by a primary rater after informed consent. These interviews were video-recorded and portions edited to ensure rater blinding. Subsequently, the video-recorded interviews were rated by a “blind” rater. Both rated the different sub-domains of Ham-D according to Rhoades and Overall (1983). IRR was evaluated using the intra-class correlation coefficient. Results: Excellent IRR was observed (0.9891) between the two raters. This was true for each of the primary factors and super-factors. Conclusion: The video-recorded 21-item Ham-D has excellent IRR. Video-recorded interviews of Ham-D can be reliably used to blind raters in research. PMID:19881046
Reliability of intracerebral hemorrhage classification systems: A systematic review.

PubMed

Rannikmäe, Kristiina; Woodfield, Rebecca; Anderson, Craig S; Charidimou, Andreas; Chiewvit, Pipat; Greenberg, Steven M; Jeng, Jiann-Shing; Meretoja, Atte; Palm, Frederic; Putaala, Jukka; Rinkel, Gabriel Je; Rosand, Jonathan; Rost, Natalia S; Strbian, Daniel; Tatlisumak, Turgut; Tsai, Chung-Fen; Wermer, Marieke Jh; Werring, David; Yeh, Shin-Joe; Al-Shahi Salman, Rustam; Sudlow, Cathie Lm

2016-08-01

Accurately distinguishing non-traumatic intracerebral hemorrhage (ICH) subtypes is important since they may have different risk factors, causal pathways, management, and prognosis. We systematically assessed the inter- and intra-rater reliability of ICH classification systems. We sought all available reliability assessments of anatomical and mechanistic ICH classification systems from electronic databases and personal contacts until October 2014. We assessed included studies' characteristics, reporting quality and potential for bias; summarized reliability with kappa value forest plots; and performed meta-analyses of the proportion of cases classified into each subtype. We included 8 of 2152 studies identified. Inter- and intra-rater reliabilities were substantial to perfect for anatomical and mechanistic systems (inter-rater kappa values: anatomical 0.78-0.97 [six studies, 518 cases], mechanistic 0.89-0.93 [three studies, 510 cases]; intra-rater kappas: anatomical 0.80-1 [three studies, 137 cases], mechanistic 0.92-0.93 [two studies, 368 cases]). Reporting quality varied but no study fulfilled all criteria and none was free from potential bias. All reliability studies were performed with experienced raters in specialist centers. Proportions of ICH subtypes were largely consistent with previous reports suggesting that included studies are appropriately representative. Reliability of existing classification systems appears excellent but is unknown outside specialist centers with experienced raters. Future reliability comparisons should be facilitated by studies following recently published reporting guidelines. © 2016 World Stroke Organization.
Intra- and interrater reliability of the 'lumbar-locked thoracic rotation test' in competitive swimmers ages 10 through 18 years.

PubMed

Feijen, Stef; Kuppens, Kevin; Tate, Angela; Baert, Isabel; Struyf, Thomas; Struyf, Filip

2018-04-17

Measuring thoracic spine mobility can be of interest to competitive swimmers as it has been associated with shoulder girdle function and scapular position in subjects with and without shoulder pain. At present, no reliability data of thoracic spine mobility measurements are available in the swimming population. This study aims to evaluate the within-session intra- and interrater reliability of the "lumbar-locked rotation test" for thoracic spine rotation in competitive swimmers aged 10 to 18 years. This reliability study is part of a larger prospective cohort study investigating potential risk factors for the development of shoulder pain in competitive swimmers. Within-session, intra- and inter-rater reliability. Competitive swimming clubs in Belgium. 21 competitive swimmers. Intra- and inter-rater reliability of the lumbar-locked thoracic rotation test. Intraclass correlation coefficients (ICCs) ranged from 0.91 (95% CI 0.78 to 0.96) to 0.96 (0.89-0.98) for intra-rater reliability. Inter-rater ICCs were 0.89 (0.72-0.95) and 0.86 (0.65-0.94) for right and left thoracic rotation, respectively. Results suggest good to excellent reliability of the lumbar-locked thoracic rotation test, indicating this test can be used reliably in clinical practice. Copyright © 2018 Elsevier Ltd. All rights reserved.
Reliability of the Community Balance and Mobility Scale (CB&M) in high-functioning school-aged children and adolescents who have an acquired brain injury.

PubMed

Wright, F Virginia; Ryan, Jennifer; Brewer, Kelly

2010-01-01

To examine inter-rater, intra-rater and test-re-test reliability of the Community Balance and Mobility Scale (CB&M) and compare reliability in live vs videotape rating contexts for children with acquired brain injury (ABI). Repeated measures design. Seven physiotherapists (PTs) were trained as assessors. The primary assessor administered and scored baseline CB&M while the second assessor observed and scored independently (inter-rater reliability). Re-assessment occurred 3-10 days later by primary assessor (test-re-test reliability). Assessments were videotaped. There were 32 participants with ABI (mean age = 14 years 1 month (SD = 2 years 1 month)). Baseline mean scores were 67.4% (18.2) and 66.7% (18.3) for primary and second assessor, respectively. Primary assessors' re-test mean score was 69.3%. Inter-rater reliability ICC was 0.93 (95% confidence interval (CI) = 0.87-0.97). Test-re-test ICC was 0.90 (95%CI = 0.81-0.95) and Bland-Altman plot indicated greatest test-re-test differences for mid-range CB&M scores. Minimum detectable change (MDC₉₀) was 13.5% points. The CB&M showed excellent reliability in youth. Reliability was comparable for live and videotape rating approaches, meaning that the easier and less expensive live-rating can be recommended. Future work should focus on evaluation of responsiveness to change in rehabilitation centre and community intervention contexts.
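For readers unfamiliar with the Bland-Altman analysis mentioned above, the sketch below computes its two summary quantities for paired test and re-test scores: the mean difference (bias) and the 95% limits of agreement, taken as bias ± 1.96 × SD of the differences. The function name and the example scores are hypothetical and are not taken from the study.

```python
import statistics


def bland_altman_limits(first, second, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Differences are taken as second - first, so a positive bias means
    scores were higher on the second occasion.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical percentage scores for three participants.
test_scores = [10.0, 12.0, 14.0]
retest_scores = [11.0, 12.0, 16.0]
bias, lower, upper = bland_altman_limits(test_scores, retest_scores)
print(bias, round(lower, 2), round(upper, 2))
```

In a full Bland-Altman plot each difference is also plotted against the pair's mean, which is how a pattern such as larger test-re-test differences at mid-range scores becomes visible.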
The reliability and validity of the Saliba Postural Classification System

PubMed Central

Collins, Cristiana Kahl; Johnson, Vicky Saliba; Godwin, Ellen M.; Pappas, Evangelos

2016-01-01

Objectives To determine the reliability and validity of the Saliba Postural Classification System (SPCS). Methods Two physical therapists classified pictures of 100 volunteer participants standing in their habitual posture for inter and intra-tester reliability. For validity, 54 participants stood on a force plate in a habitual and a corrected posture, while a vertical force was applied through the shoulders until the clinician felt a postural give. Data were extracted at the time the give was felt and at a time in the corrected posture that matched the peak vertical ground reaction force (VGRF) in the habitual posture. Results Inter-tester reliability demonstrated 75% agreement with a Kappa = 0.64 (95% CI = 0.524–0.756, SE = 0.059). Intra-tester reliability demonstrated 87% agreement with a Kappa = 0.8 (95% CI = 0.702–0.898, SE = 0.05) and 80% agreement with a Kappa = 0.706 (95% CI = 0.594–0.818, SE = 0.057). The examiner applied a significantly higher (p < 0.001) peak vertical force in the corrected posture prior to a postural give when compared to the habitual posture. Within the corrected posture, the %VGRF was higher when the test was ongoing vs. when a postural give was felt (p < 0.001). The %VGRF was not different between the two postures when comparing the peaks (p = 0.214). Discussion The SPCS has substantial agreement for inter- and intra-tester reliability and is largely a valid postural classification system as determined by the larger vertical forces in the corrected postures. Further studies on the correlation between the SPCS and diagnostic classifications are indicated. PMID:27559288
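The confidence intervals reported in this abstract are consistent with the usual normal approximation, kappa ± 1.96 × SE. The sketch below applies that approximation to the three kappa and SE pairs quoted in the abstract; the function name is hypothetical, and the assumption that the authors used this approximation is inferred from the numbers rather than stated in the source.

```python
def kappa_confidence_interval(kappa, se, z=1.96):
    """Normal-approximation confidence interval for a kappa estimate."""
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return kappa - z * se, kappa + z * se


# Kappa and SE pairs as quoted in the abstract.
for kappa, se in [(0.64, 0.059), (0.8, 0.05), (0.706, 0.057)]:
    lower, upper = kappa_confidence_interval(kappa, se)
    print(f"kappa={kappa}: {lower:.3f} to {upper:.3f}")
```

Under this approximation the third interval has an upper bound of about 0.818.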
The reliability and validity of the Saliba Postural Classification System.

PubMed

Collins, Cristiana Kahl; Johnson, Vicky Saliba; Godwin, Ellen M; Pappas, Evangelos

2016-07-01

To determine the reliability and validity of the Saliba Postural Classification System (SPCS). Two physical therapists classified pictures of 100 volunteer participants standing in their habitual posture for inter and intra-tester reliability. For validity, 54 participants stood on a force plate in a habitual and a corrected posture, while a vertical force was applied through the shoulders until the clinician felt a postural give. Data were extracted at the time the give was felt and at a time in the corrected posture that matched the peak vertical ground reaction force (VGRF) in the habitual posture. Inter-tester reliability demonstrated 75% agreement with a Kappa = 0.64 (95% CI = 0.524-0.756, SE = 0.059). Intra-tester reliability demonstrated 87% agreement with a Kappa = 0.8 (95% CI = 0.702-0.898, SE = 0.05) and 80% agreement with a Kappa = 0.706 (95% CI = 0.594-0.818, SE = 0.057). The examiner applied a significantly higher (p < 0.001) peak vertical force in the corrected posture prior to a postural give when compared to the habitual posture. Within the corrected posture, the %VGRF was higher when the test was ongoing vs. when a postural give was felt (p < 0.001). The %VGRF was not different between the two postures when comparing the peaks (p = 0.214). The SPCS has substantial agreement for inter- and intra-tester reliability and is largely a valid postural classification system as determined by the larger vertical forces in the corrected postures. Further studies on the correlation between the SPCS and diagnostic classifications are indicated.
Inter-Rater and Test-Retest (Between-Sessions) Reliability of the 4-Skills Scan for Dutch Elementary School Children

ERIC Educational Resources Information Center

van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M.

2018-01-01

In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…
Measurement of glenohumeral joint translation using real-time ultrasound imaging: A physiotherapist and sonographer intra-rater and inter-rater reliability study.

PubMed

Rathi, Sangeeta; Taylor, Nicholas F; Gee, Jamie; Green, Rodney A

2016-12-01

Ultrasonography is an economical and non-invasive method for measuring real-time joint movements. Although physiotherapists are increasingly using ultrasound imaging for rotator cuff disorders, there is a lack of evidence on their reliability in using ultrasonography to measure glenohumeral translation. The aim of this study was to evaluate the reliability of a physiotherapist in measuring anterior and posterior glenohumeral joint translation with ultrasound. Study design: within day reliability. Anterior and posterior glenohumeral translations were measured at rest, in response to passive accessory motion testing force, and with isometric internal and external rotation in 12 young healthy adults. All the measurements were made in real time by a physiotherapist and an experienced sonographer in two positions (neutral and abducted) and in two views (anterior and posterior). Intra-rater and inter-rater reliability were expressed using intraclass correlation coefficients (ICC) and measurement error (mm). Intra-rater reliability was good for both raters (physiotherapist ICC: 0.86-0.98; sonographer ICC: 0.85-0.96). The inter-rater reliability between the physiotherapist and sonographer was moderate to good for posterior measurements (ICC 0.50-0.75) and poor to moderate for anterior measurements (ICC 0.31-0.53). For both intra-rater and inter-rater measurements, posterior translation was more reliable than the anterior translation with smaller measurement errors (posterior: 0.1-0.2 mm, anterior: 0.2-0.3 mm). A physiotherapist with minimal training was reliable in measuring glenohumeral joint translations. The ultrasound method was reliable for repeated measurement of both anterior and posterior glenohumeral translations with posterior measurements being more reliable than anterior. This method is recommended for future research to investigate the stabilising role of rotator cuff muscles. Copyright © 2016 Elsevier Ltd. All rights reserved.
Assessing local instrument reliability and validity: a field-based example from northern Uganda.

PubMed

Betancourt, Theresa S; Bass, Judith; Borisova, Ivelina; Neugebauer, Richard; Speelman, Liesbeth; Onyango, Grace; Bolton, Paul

2009-08-01

This paper presents an approach for evaluating the reliability and validity of mental health measures in non-Western field settings. We describe this approach using the example of our development of the Acholi psychosocial assessment instrument (APAI), which is designed to assess depression-like (two tam, par and kumu), anxiety-like (ma lwor) and conduct problems (kwo maraco) among war-affected adolescents in northern Uganda. To examine the criterion validity of this measure in the absence of a traditional gold standard, we derived local syndrome terms from qualitative data and used self reports of these syndromes by indigenous people as a reference point for determining caseness. Reliability was examined using standard test-retest and inter-rater methods. Each of the subscale scores for the depression-like syndromes exhibited strong internal reliability ranging from alpha = 0.84-0.87. Internal reliability was good for anxiety (0.70), conduct problems (0.83), and the pro-social attitudes and behaviors (0.70) subscales. Combined inter-rater reliability and test-retest reliability were good for most subscales except for the conduct problem scale and prosocial scales. The pattern of significant mean differences in the corresponding APAI problem scale score between self-reported cases vs. noncases on local syndrome terms was confirmed in the data for all of the three depression-like syndromes, but not for the anxiety-like syndrome ma lwor or the conduct problem kwo maraco.
A validation study of the Keyboard Personal Computer Style instrument (K-PeCS) for use with children.

PubMed

Green, Dido; Meroz, Anat; Margalit, Adi Edit; Ratzon, Navah Z

2012-11-01

This study examines a potential instrument for measurement of typing postures of children. This paper describes inter-rater, test-retest reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS), an observational measurement of postures and movements during keyboarding, for use with children. Two trained raters independently rated videos of 24 children (aged 7-10 years). Six children returned one week later for identifying test-retest reliability. Concurrent validity was assessed by comparing ratings obtained using the K-PeCS to scores from a 3D motion analysis system. Inter-rater reliability was moderate to high for 12 out of 16 items (Kappa: 0.46 to 1.00; correlation coefficients: 0.77-0.95) and test-retest reliability varied across items (Kappa: 0.25 to 0.67; correlation coefficients: r = 0.20 to r = 0.95). Concurrent validity compared favourably across arm pathlength, wrist extension and ulnar deviation. In light of the limitations of other tools, the K-PeCS offers a fairly affordable, reliable and valid instrument to address the gap for measurement of typing styles of children, despite the shortcomings of some items. However, further research is required to refine the instrument for use in evaluating typing among children. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Validation of the one pass measure for motivational interviewing competence.

PubMed

McMaster, Fiona; Resnicow, Ken

2015-04-01

This paper examines the psychometric properties of the OnePass coding system: a new, user-friendly tool for evaluating practitioner competence in motivational interviewing (MI). We provide data on reliability and validity with the current gold-standard: Motivational Interviewing Treatment Integrity tool (MITI). We compared scores from 27 videotaped MI sessions performed by student counselors trained in MI and simulated patients using both OnePass and MITI, with three different raters for each tool. Reliability was estimated using intra-class coefficients (ICCs), and validity was assessed using Pearson's r. OnePass had high levels of inter-rater reliability with 19/23 items found from substantial to almost perfect agreement. Taking the pair of scores with the highest inter-rater reliability on the MITI, the concurrent validity between the two measures ranged from moderate to high. Validity was highest for evocation, autonomy, direction and empathy. OnePass appears to have good inter-rater reliability while capturing similar dimensions of MI as the MITI. Despite the moderate concurrent validity with the MITI, the OnePass shows promise in evaluating both traditional and novel interpretations of MI. OnePass may be a useful tool for developing and improving practitioner competence in MI where access to MITI coders is limited. Copyright © 2015. Published by Elsevier Ireland Ltd.

The development of an instrument to match individuals with disabilities and service animals.

PubMed

Zapf, S A; Rough, R B

There has been an increase in the use of service animals assisting persons with disabilities in the past decade. However, many of the service dog agencies do not utilize an assessment that is designed to match the person to the animal in the rehabilitation and psycho-social domains. The purpose of this study was to develop the Service Animal Adaptive Intervention Assessment (SAAIA) and to measure the content validity, inter-rater reliability and clinical utility of the assessment. Two subject groups were used. Subject group one had 43 subjects who measured the content validity and clinical utility of the SAAIA Survey. Subject group two had 12 subjects who measured the inter-rater reliability by completing the SAAIA using information obtained through a video-taped client case scenario. Content validity results indicated a good to high percentage of agreement and a fair percentage of agreement for clinical utility. Inter-rater reliability results indicate good to high agreement on six of the eight variables of the SAAIA. However, the Kappa score indicates low inter-rater reliability. Results indicate the SAAIA has good content validity and inter-rater reliability and fair clinical utility based on percent agreement. However, further research is needed on the reliability of the SAAIA.
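High percent agreement alongside a low Kappa, as reported above, can arise when one rating category dominates, because kappa subtracts the agreement expected by chance. The sketch below shows the effect for two raters using Cohen's kappa. The study involved 12 raters and does not specify its kappa variant, so the two-rater setup, the function names, and the ratings are illustrative assumptions only.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal proportions.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings: both raters say "yes" for almost every item.
rater_a = ["yes"] * 9 + ["no"]
rater_b = ["yes"] * 8 + ["no", "yes"]
print(percent_agreement(rater_a, rater_b), round(cohens_kappa(rater_a, rater_b), 3))
```

Here the raters agree on 8 of 10 items, yet chance agreement is 0.82, so kappa falls slightly below zero.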
The intensive care delirium screening checklist: translation and reliability testing in a Swedish ICU.

PubMed

Neziraj, M; Sarac Kart, N; Samuelson, Karin

2011-08-01

The view of delirium has changed considerably over the last decade, and delirium is now a very topical issue within the intensive care unit (ICU) setting. Delirium has proved to be common in critically ill patients and is manifested as acute changes in mental status with reduced cognitive ability, incoherent thought patterns, impaired consciousness, agitation and acute confusion. In order to be able to prevent, identify and alleviate problems related to delirium, it is important that validated instruments for delirium screening are implemented and evaluated. The aim of this study was to translate the Intensive Care Delirium Screening Checklist (ICDSC) into Swedish and test the inter-rater reliability in a Swedish general ICU setting. The study was carried out during 2009 in a general Swedish ICU. A translation of the scale from English into Swedish was made, including back-translation, critical review and pilot testing. A total of 49 paired ratings were carried out using the Swedish version of the ICDSC scale. The inter-rater reliability was tested using weighted kappa (κ) statistics (linear weighting). The ICDSC scale was successfully translated into Swedish and the inter-rater reliability testing of the Swedish version resulted in a weighted κ value of 0.92. The result of this study indicates that the Swedish version of the ICDSC scale has very good inter-rater reliability. The high inter-rater reliability and the ease of administration make the ICDSC scale applicable for delirium screening in a Swedish ICU setting. © 2011 The Authors. Acta Anaesthesiologica Scandinavica © 2011 The Acta Anaesthesiologica Scandinavica Foundation.
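A linearly weighted kappa, as named in this abstract, gives partial credit when two raters' ordinal scores are close. The sketch below is a minimal two-rater version in which disagreement between categories i and j is weighted by |i − j| / (k − 1). The function name, the integer coding of categories as 0 to k − 1, and the example scores are assumptions for illustration and are not the study's data.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted kappa for two raters scoring items as integers 0..n_categories-1."""
    n = len(rater_a)
    k = n_categories
    if n == 0 or n != len(rater_b) or k < 2:
        raise ValueError("need equally long, non-empty ratings and at least 2 categories")
    if any(not 0 <= x < k for x in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must be integers in 0..n_categories-1")

    # Observed mean disagreement, scaled so the largest possible gap counts as 1.
    observed = sum(abs(x - y) for x, y in zip(rater_a, rater_b)) / (n * (k - 1))

    # Expected mean disagreement if the raters scored independently
    # according to their own marginal distributions.
    p_a = [rater_a.count(c) / n for c in range(k)]
    p_b = [rater_b.count(c) / n for c in range(k)]
    expected = sum(
        p_a[i] * p_b[j] * abs(i - j) for i in range(k) for j in range(k)
    ) / (k - 1)
    if expected == 0:
        raise ValueError("weighted kappa is undefined when both raters use one identical category")
    return 1.0 - observed / expected


# Hypothetical scores on a 3-point ordinal scale.
scores_a = [0, 1, 2, 2]
scores_b = [0, 1, 2, 1]
print(round(linear_weighted_kappa(scores_a, scores_b, 3), 3))
```

With linear weights, a one-point disagreement on a long ordinal scale costs far less than it would under unweighted kappa, which treats every disagreement as total.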
Inter and intra-rater reliability of mobile device goniometer in measuring lumbar flexion range of motion.

PubMed

Bedekar, Nilima; Suryawanshi, Mayuri; Rairikar, Savita; Sancheti, Parag; Shyam, Ashok

2014-01-01

Evaluation of range of motion (ROM) is an integral part of the assessment of the musculoskeletal system. It is required in health fitness and pathological conditions, and it is also used as an objective outcome measure. Several methods are described to check spinal flexion range of motion. Different methods for measuring spine ranges have their advantages and disadvantages. Hence, a new device was introduced in this study using the method of dual inclinometer to measure lumbar spine flexion range of motion (ROM). To determine intra- and inter-rater reliability of a mobile device goniometer in measuring lumbar flexion range of motion. An iPod mobile device with goniometer software was used. The part being measured, i.e. the back of the subject, was suitably exposed. The subject stood with feet shoulder-width apart. The spinous processes of the second sacral vertebra (S2) and T12 were located and used as the reference points, and readings were taken. Three readings were taken for each: inter-rater reliability as well as the intra-rater reliability. Sufficient rest was given between each flexion movement. Intra-rater reliability using ICC was r=0.920 and inter-rater r=0.812 at CI 95%. Validity r=0.95. The mobile device goniometer has high intra-rater reliability. The inter-rater reliability was moderate. This device can be used to assess range of motion of spine flexion, representing uni-planar movement.
Is the Bayley Scales of Infant and Toddler Developmental Screening Test, Valid and Reliable for Persian Speaking Children?

PubMed

Soleimani, Farin; Azari, Nadia; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud

2016-10-01

Advances in perinatal and neonatal care have substantially improved the survival of at-risk infants over the past two decades. The purpose of this study was to assess the reliability and validity of the Bayley Scales of infant and toddler developmental Screening test in Persian-speaking children. This was a cross-sectional prospective study of 403 children aged 1 to 42 months. The Bayley scales screening instrument, which consists of five domains (cognitive, receptive, and expressive communication and fine and gross motor items), was used to measure infants' and toddlers' development. The psychometric properties examined included the face and content validity of the scale, in addition to cultural and linguistic modifications to the scale and its test-retest and inter-rater reliability. An expert team changed some of the test items relating to cultural and linguistic issues. In almost all the age groups, cultural or linguistic changes were made to items in the communication domains. According to Cronbach's alpha for internal consistency, the reliability of the cognitive scale was r = 0.79, and the reliability of the receptive scale was r = 0.76. The reliability for expressive communication, fine motor, and gross motor scales was r = 0.81, r = 0.80, and r = 0.81, respectively. The construct validity of the tests was confirmed using a factor analysis and comparison of the mean scores of the age groups. The intra- and inter-rater reliabilities of the Bayley Scales were good-to-excellent. The results indicated that the Bayley Scales had a high level of reliability in the present study. Thus, the scale can be used in a Persian population.
Cognitive emotion regulation questionnaire in hypertensive patients.

PubMed

Duan, Shu; Liu, Yiqun; Xiao, Jing; Zhao, Shuiping; Zhu, Xiongzhao

2011-06-01

To examine the reliability, validity, and practicability of the Cognitive Emotion Regulation Questionnaire (CERQ) in hypertensive patients in China. Altogether 434 hypertensive patients and 462 healthy subjects were recruited. All the subjects were assessed with the CERQ-Chinese version (CERQ-C), Dysfunctional Attitude Scale (DAS), Mood and Anxiety Symptom Questionnaire-Short Form (MASQ-SF), and Center for Epidemiologic Studies Depression Scale (CES-D). We calculated the mean inter-item correlations for the total CERQ and for each of the subscales. Cronbach's alpha coefficient was used to analyze the inter-correlation and reliability, and confirmatory factor analysis was used to examine the 9-factor model. 1) The hypertension group reported significantly higher scores than the healthy group on rumination (12.19 ± 2.51 vs. 11.51 ± 2.60, P<0.001), catastrophizing (8.82 ± 2.19 vs. 8.11 ± 2.70, P<0.001), and blaming others (10.76 ± 2.11 vs. 9.88 ± 2.48, P<0.001), and a significantly lower score than the healthy group on positive reappraisal (13.80 ± 3.55 vs. 14.71 ± 4.11, P<0.001). 2) Reliability: In the hypertension group the Cronbach's alpha for the total CERQ was 0.80, and that for the 9 subscales ranged from 0.71 (self-blame) to 0.90 (rumination). In the healthy group the Cronbach's alpha for the total CERQ was 0.79, and that for the 9 subscales ranged from 0.71 (positive reappraisal) to 0.90 (rumination). The mean inter-item correlation coefficient for the 9 subscales was 0.21-0.42 (the hypertension group) and 0.19-0.32 (the healthy group). In the hypertension group, the test-retest reliability of the total scale was 0.82, and the test-retest reliability of the 9 subscales ranged from 0.73 to 0.92. The confirmatory factor analysis showed that the 9 first-order factor model fitted the data of both samples well. The CERQ meets the psychometric standard and is reliable and valid for assessing cognitive emotion regulation strategies, and it may be regarded as an appropriate assessment tool.
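Cronbach's alpha, the internal-consistency statistic reported above, compares the sum of the item variances with the variance of the total score: alpha = k / (k − 1) × (1 − Σ item variances / variance of totals). The sketch below implements that formula; the function name and the small response table are hypothetical and unrelated to the CERQ data.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of items, each a list of scores with one entry per respondent
    (all items must cover the same respondents in the same order).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least 2 items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("each item needs the same number (>= 2) of respondents")

    item_variances = [statistics.variance(item) for item in items]
    totals = [sum(item[r] for item in items) for r in range(n)]
    total_variance = statistics.variance(totals)
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses: 3 items answered by 4 respondents.
responses = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises when items covary strongly relative to their individual variances, which is why subscales with few or heterogeneous items often show lower values than the total scale.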
Evaluation of previously embolized intracranial aneurysms: inter-and intra-rater reliability among neurosurgeons and interventional neuroradiologists.

PubMed

Zuckerman, Scott L; Lakomkin, Nikita; Magarik, Jordan A; Vargas, Jan; Stephens, Marcus; Akinpelu, Babatunde; Spiotta, Alejandro M; Ahmed, Azam; Arthur, Adam S; Fiorella, David; Hanel, Ricardo; Hirsch, Joshua A; Hui, Ferdinand K; James, Robert F; Kallmes, David F; Meyers, Philip M; Niemann, David B; Rasmussen, Peter; Turner, Raymond D; Welch, Babu G; Mocco, J

2018-05-01

The angiographic evaluation of previously coiled aneurysms can be difficult yet remains critical for determining re-treatment. The main objective of this study was to determine the inter-rater reliability for both the Raymond Scale and per cent embolization among a group of neurointerventionalists evaluating previously embolized aneurysms. A panel of 15 neurointerventionalists examined 92 distinct cases of immediate post-coil embolization and 1 year post-embolization angiographs. Each case was presented four times throughout the study, along with alterations in demographics in order to evaluate intra-rater reliability. All respondents were asked to provide the per cent embolization (0-100%) and Raymond Scale grade (1-3) for each aneurysm. Inter-rater reliability was evaluated by computing weighted kappa values (for the Raymond Scale) and intraclass correlation coefficients (ICC) for per cent embolization. 10 neurosurgeons and 5 interventional neuroradiologists evaluated 368 simulated cases. The agreement among all readers employing the Raymond Scale was fair (κ=0.35) while concordance in per cent embolization was good (ICC=0.64). Clinicians with fewer than 10 years of experience demonstrated a significantly greater level of agreement than the group with greater than 10 years (κ=0.39 and ICC=0.70 vs κ=0.28 and ICC=0.58). When the same aneurysm was presented multiple times, clinicians demonstrated excellent consistency when assessing per cent embolization (ICC=0.82), but moderate agreement when employing the Raymond classification (κ=0.58). Identifying the per cent embolization in previously coiled aneurysms resulted in good inter- and intra-rater agreement, regardless of years of experience. The strong agreement among providers employing per cent embolization may make it a valuable tool for embolization assessment in this patient population. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. 
No commercial use is permitted unless otherwise expressly granted.
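For readers unfamiliar with the weighted kappa used for ordinal grades such as the Raymond Scale (1-3), here is a minimal two-rater sketch in standard-library Python. It is an illustration under stated assumptions: the function name `weighted_kappa`, the choice of linear weights, and the toy grades are hypothetical, and the abstract does not state how the study pooled its 15 readers.

```python
def weighted_kappa(rater_a, rater_b, categories, weight="linear"):
    """Weighted Cohen's kappa for two raters scoring the same cases.

    categories must be listed in ordinal order; weight is "linear" or "quadratic".
    Computed as 1 - (weighted observed disagreement / weighted chance disagreement).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Joint proportion table: observed[i][j] = share of cases rated i by A and j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d * d

    obs_dis = sum(penalty(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(penalty(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1 - obs_dis / exp_dis


# Hypothetical grades from two readers for six aneurysms.
reader_1 = [1, 1, 2, 2, 3, 3]
reader_2 = [1, 2, 2, 3, 3, 3]
kappa = weighted_kappa(reader_1, reader_2, categories=[1, 2, 3])
```

With only two categories the weighted statistic reduces to the ordinary Cohen's kappa, which gives a convenient sanity check.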
The root coverage esthetic score: Intra-examiner reliability among students and faculty at Tufts University School of Dental Medicine.

PubMed

Isaia, Federica; Gyurko, Robert; Roomian, Tamar C; Hawley, Charles E

2018-04-06

The Root Coverage Esthetic Score (RES) was published in 2009 as an esthetic scoring system to measure visible final outcomes of root coverage procedures performed on Miller I and II recession defects. The aim of this study was to evaluate the intra-examiner, intra-group, and inter-examiner reliability of the RES when used by periodontal faculty, post-graduate students in periodontology, and pre-doctoral DMD students at Tufts University School of Dental Medicine (TUSDM). Thirty-three participants (12 second-year DMD students, 11 periodontal residents, and 10 faculty members) were assembled to evaluate 25 baseline and 6-month post-treatment outcomes of mucogingival surgeries using the RES. Each projection was shown for 30 seconds, during which the participants were asked to use the RES scoring system to evaluate the surgical outcomes. The results were then recorded on a standardized worksheet grid. To test intra-examiner reliability, 7 of the 25 projections were shown twice. Intra-examiner reliability and inter-examiner reliability were assessed using the intraclass correlation coefficient (ICC) from a two-way mixed effects model, stratified by education level. PG residents had the highest tendency to agree with each other, with an ICC of 0.53 (95% CI: 0.36-0.74). DMD students had an ICC of 0.51 (95% CI: 0.33-0.75), and PG faculty members had an ICC of 0.41 (95% CI: 0.24-0.64). There was no statistically significant difference in ICC among the three groups of participants (Kruskal-Wallis test, P = 0.2440). When the data for each RES element were combined, the mean ICC for the total inter-rater agreement for RES was 0.48 (95% CI: 0.32-0.71). This corresponds to an overall moderate agreement among all participants using the RES to evaluate the 25 surgical outcomes. The intra-examiner reliability within each of the three groups was quite high. 
The highest mean ICC was produced by the PG faculty (0.908). The mean ICC for PG residents was 0.867, and the mean ICC for DMD students was 0.855. The Kruskal-Wallis test (P = 0.46) failed to find any statistical difference in intra-examiner reliability among the three groups of participants. CONCLUSIONS: The RES is a "moderately" reliable scoring system for mucogingival treatments in a dental school setting and can be used even by operators with different levels of periodontal experience. The same examiner can repeat the scoring and obtain reliable results. This article is protected by copyright. All rights reserved. © 2018 American Academy of Periodontology.
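The abstract above reports ICCs from a two-way mixed effects model but does not say which ICC form was used. As a hedged illustration only, the sketch below assumes the single-measure consistency form, often labelled ICC(3,1), computed from two-way ANOVA mean squares. The function name `icc_3_1` and the toy ratings are hypothetical.

```python
def icc_3_1(ratings):
    """Single-measure consistency ICC from a two-way layout.

    ratings: list of n subjects, each a list of k scores (one per rater).
    ICC(3,1) = (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error)
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Partition the total sum of squares into subjects, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical scores: 4 surgical outcomes rated by 3 examiners.
toy_ratings = [[6, 7, 6], [8, 9, 9], [4, 5, 4], [9, 10, 9]]
icc = icc_3_1(toy_ratings)
```

Because this form measures consistency, a rater who scores every case exactly one point higher than a colleague still yields an ICC of 1. An absolute-agreement form would penalise that offset.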
The sizing of hamstring grafts for anterior cruciate reconstruction: intra- and inter-observer reliability.

PubMed

Dwyer, Tim; Whelan, Daniel B; Khoshbin, Amir; Wasserstein, David; Dold, Andrew; Chahal, Jaskarndip; Nauth, Aaron; Murnaghan, M Lucas; Ogilvie-Harris, Darrell J; Theodoropoulos, John S

2015-04-01

The objective of this study was to establish the intra- and inter-observer reliability of hamstring graft measurement using cylindrical sizing tubes. Hamstring tendons (gracilis and semitendinosus) were harvested from ten cadavers by a single surgeon and whip stitched together to create ten 4-strand hamstring grafts. Ten sports medicine surgeons and fellows sized each graft independently using either hollow cylindrical sizers or block sizers in 0.5-mm increments; the sizing technique used was applied consistently to each graft. Surgeons moved sequentially from graft to graft and measured each hamstring graft twice. Surgeons were asked to state the measured proximal (femoral) and distal (tibial) diameter of each graft, as well as the diameter of the tibial and femoral tunnels that they would drill if performing an anterior cruciate ligament (ACL) reconstruction using that graft. Reliability was established using intra-class correlation coefficients. Overall, both the inter-observer and intra-observer agreement were >0.9, demonstrating excellent reliability. The inter-observer reliability for drill sizes was also excellent (>0.9). Excellent correlation was seen between cylindrical sizing and drill sizes (>0.9). Sizing of hamstring grafts by multiple surgeons demonstrated excellent intra-observer and inter-observer reliability, potentially validating clinical studies exploring ACL reconstruction outcomes by hamstring graft diameter when standard techniques are used. Level of evidence: III.
How reliable are Functional Movement Screening scores? A systematic review of rater reliability.

PubMed

Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John

2016-05-01

Several physical assessment protocols to identify intrinsic risk factors for injury aetiology related to movement quality have been described. The Functional Movement Screen (FMS) is a standardised, field-expedient test battery intended to assess movement quality and has been used clinically in preparticipation screening and in sports injury research. To critically appraise and summarise research investigating the reliability of scores obtained using the FMS battery. Systematic literature review. Systematic search of Google Scholar, Scopus (including ScienceDirect and PubMed), EBSCO (including Academic Search Complete, AMED, CINAHL, Health Source: Nursing/Academic Edition), MEDLINE and SPORTDiscus. Studies meeting eligibility criteria were assessed by 2 reviewers for risk of bias using the Quality Appraisal of Reliability Studies checklist. Overall quality of evidence was determined using van Tulder's levels of evidence approach. 12 studies were appraised. Overall, there was a 'moderate' level of evidence in favour of 'acceptable' (intraclass correlation coefficient ≥0.6) inter-rater and intra-rater reliability for composite scores derived from live scoring. For inter-rater reliability of composite scores derived from video recordings there was 'conflicting' evidence, and 'limited' evidence for intra-rater reliability. For inter-rater reliability based on live scoring of individual subtests there was 'moderate' evidence of 'acceptable' reliability (κ≥0.4) for 4 subtests (Deep Squat, Shoulder Mobility, Active Straight-leg Raise, Trunk Stability Push-up) and 'conflicting' evidence for the remaining 3 (Hurdle Step, In-line Lunge, Rotary Stability). This review found 'moderate' evidence that raters can achieve acceptable levels of inter-rater and intra-rater reliability of composite FMS scores when using live ratings. 
Overall, there were few high-quality studies, and the quality of several studies was impacted by poor study reporting, particularly in relation to rater blinding. Published by the BMJ Publishing Group Limited.
Inter-Observer and Intra-Observer Reliability of Clinical Assessments in Knee Osteoarthritis

PubMed Central

Maricar, Nasimah; Callaghan, Michael J; Parkes, Matthew J; Felson, David T; O’Neill, Terence W

2016-01-01

Background: Clinical examination of the knee is subject to measurement error. The aim of this analysis was to determine inter- and intra-observer reliability of commonly used clinical tests in patients with knee osteoarthritis (OA). Methods: We studied subjects with symptomatic knee OA who were participants in an open-label clinical trial of intra-articular steroid therapy. Following standardisation of the clinical test procedures, two clinicians assessed 25 subjects independently at the same visit, and the same clinician assessed 88 subjects over an interval period of 2–10 weeks; in both cases prior to the steroid intervention. Clinical examination included assessment of bony enlargement, crepitus, quadriceps wasting, knee effusion, joint-line and anserine tenderness and knee range of movement (ROM). Intra-class correlation coefficients (ICC), estimated kappa (κ), weighted kappa (κω) and Bland and Altman plots were used to determine inter- and intra-observer levels of agreement. Results: Using Landis and Koch criteria, inter-observer kappa scores were moderate for patellofemoral joint (κ=0.53) and anserine tenderness (κ=0.48); good for bony enlargement (κ=0.66), quadriceps wasting (κ=0.78), crepitus (κ=0.78), medial tibiofemoral joint tenderness (κ=0.76), and effusion assessed by ballottement (κ=0.73) and bulge sign (κω=0.78); and excellent for lateral tibiofemoral joint tenderness (κ=1.00), flexion (ICC=0.97) and extension (ICC=0.87) ROM. Intra-observer kappa scores were moderate for lateral tibiofemoral joint tenderness (κ=0.60); good for crepitus (κ=0.78), effusion assessed by ballottement test (κ=0.77), patellofemoral joint (κ=0.66), medial tibiofemoral joint (κ=0.64) and anserine (κ=0.73) tenderness; and excellent for effusion assessed by bulge sign (κω=0.83), bony enlargement (κ=0.98), quadriceps wasting (κ=0.83), flexion (ICC=0.99) and extension (ICC=0.96) ROM. 
Conclusion: Among individuals with symptomatic knee OA, the reliability of clinical examination of the knee was at least good for the majority of clinical signs of knee OA. PMID:27909143
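The Bland and Altman approach mentioned in this abstract summarises agreement between two sets of continuous measurements by the mean difference (bias) and its limits of agreement. The sketch below shows that calculation in standard-library Python. The function name `bland_altman_limits`, the conventional 1.96 multiplier, and the toy flexion angles are assumptions for illustration, not values from the study.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences; the limits of agreement are
    bias -/+ z * SD of the differences (sample standard deviation).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical knee flexion angles (degrees) from two observers.
observer_1 = [120, 130, 125, 135]
observer_2 = [118, 131, 127, 133]
bias, lower, upper = bland_altman_limits(observer_1, observer_2)
```

A Bland and Altman plot would then chart each pair's difference against its mean, with horizontal lines at the bias and the two limits.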
Inter-rater reliability for movement pattern analysis (MPA): measuring patterning of behaviors versus discrete behavior counts as indicators of decision-making style

PubMed Central

Connors, Brenda L.; Rende, Richard; Colton, Timothy J.

2014-01-01

The unique yield of collecting observational data on human movement has received increasing attention in a number of domains, including the study of decision-making style. As such, interest has grown in the nuances of core methodological issues, including the best ways of assessing inter-rater reliability. In this paper we focus on one key topic – the distinction between establishing reliability for the patterning of behaviors as opposed to the computation of raw counts – and suggest that reliability for each be compared empirically rather than determined a priori. We illustrate by assessing inter-rater reliability for key outcome measures derived from movement pattern analysis (MPA), an observational methodology that records body movements as indicators of decision-making style with demonstrated predictive validity. While reliability ranged from moderate to good for raw counts of behaviors reflecting each of two Overall Factors generated within MPA (Assertion and Perspective), inter-rater reliability for patterning (proportional indicators of each factor) was significantly higher and excellent (ICC = 0.89). Furthermore, patterning, as compared to raw counts, provided better prediction of observable decision-making process assessed in the laboratory. These analyses support the utility of using an empirical approach to inform the consideration of measuring patterning versus discrete behavioral counts of behaviors when determining inter-rater reliability of observable behavior. They also speak to the substantial reliability that may be achieved via application of theoretically grounded observational systems such as MPA that reveal thinking and action motivations via visible movement patterns. PMID:24999336
Anthropometry of the Human Scaphoid Waist by Three-Dimensional Computed Tomography.

PubMed

Smith, Jennifer; Hofmeister, Eric P; Renninger, Christopher; Kroonen, Leo T

2015-01-01

Published measurements for the scaphoid are scarce. The purpose of this study is to define anthropometric norms for the waist of the scaphoid to assist in optimizing bone graft quantity and implant use. Computed tomography images of the wrist were reviewed by three surgeons. Anthropometric data were gathered, including the scaphoid waist diameter in two dimensions and the scaphoid waist volume. Each study was measured twice, allowing for determination of inter- and intraobserver reliability. Forty-three studies were examined (23 female and 20 male). Average measurements of the scaphoid waist were 11.28 ± 0.26 mm in the sagittal plane and 8.70 ± 0.17 mm in the coronal plane, and the waist volume was 715 ± 33.0 mm³. Specific measures of the narrowest portion of the scaphoid are provided by this study. Measurements of the scaphoid waist through the use of three-dimensional imaging are an accurate method with good inter- and intraobserver reliability. The measurements obtained from this study can be applied to guide graft and implant selection for treatment of scaphoid waist fractures and nonunions.
Reliability of analysis of the bone mineral density of the second and fifth metatarsals using dual-energy x-ray absorptiometry (DXA).

PubMed

Pritchard, N Stewart; Smoliga, James M; Nguyen, Anh-Dung; Branscomb, Micah C; Sinacore, David R; Taylor, Jeffrey B; Ford, Kevin R

2017-01-01

Metatarsal fractures, especially of the fifth metatarsal, are common injuries of the foot in a young athletic population, but the risk factors for this injury are not well understood. Dual-energy x-ray absorptiometry (DXA) provides reliable measures of regional bone mineral density to predict fracture risk in the hip and lumbar spine. Recently, sub-regional metatarsal reliability was established in fresh cadaveric specimens and associated with ultimate fracture force. The purpose of this study was to assess the reliability of DXA bone mineral density measurements of sub-regions of the second and fifth metatarsals in a young, active population. Thirty-two recreationally active individuals participated in the study, and the bone density of the second (2MT) and fifth (5MT) metatarsals of each subject was measured using a Hologic QDR x-ray bone densitometer. Scans were analyzed separately by two raters, and regional bone mineral density, bone mineral content, and area measurements were calculated for the proximal, shaft, and distal regions of the bone. Intra-rater, inter-rater, and scan-rescan reliability were then determined for each region. Proximal and shaft bone mineral density measurements of the second and fifth metatarsal were reliable. ICCs were variable across regions and metatarsals, with the distal region being the poorest. Bone mineral density measurements of the metatarsals may be a better indicator of fracture risk of the metatarsals than whole body measurements. A reliable method for measuring the regional bone mineral densities of the metatarsals was found. However, inter-rater reliability and scan-rescan reliability for the distal regions were poor. Future research should examine the relationship between DXA bone mineral density measurements and fracture risk at the metatarsals.
Measuring verbal and non-verbal communication in aphasia: reliability, validity, and sensitivity to change of the Scenario Test.

PubMed

van der Meulen, Ineke; van de Sandt-Koenderman, W Mieke E; Duivenvoorden, Hugo J; Ribbers, Gerard M

2010-01-01

This study explores the psychometric qualities of the Scenario Test, a new test to assess daily-life communication in severe aphasia. The test is innovative in that it: (1) examines the effectiveness of verbal and non-verbal communication; and (2) assesses patients' communication in an interactive setting, with a supportive communication partner. To determine the reliability, validity, and sensitivity to change of the Scenario Test and discuss its clinical value. The Scenario Test was administered to 122 persons with aphasia after stroke and to 25 non-aphasic controls. Analyses were performed for the entire group of persons with aphasia, as well as for a subgroup of persons unable to communicate verbally (n = 43). Reliability (internal consistency, test-retest reliability, inter-judge, and intra-judge reliability) and validity (internal validity, convergent validity, known-groups validity) and sensitivity to change were examined using standard psychometric methods. The Scenario Test showed high levels of reliability. Internal consistency (Cronbach's alpha = 0.96; item-rest correlations = 0.58-0.82) and test-retest reliability (ICC = 0.98) were high. Agreement between judges in total scores was good, as indicated by the high inter- and intra-judge reliability (ICC = 0.86-1.00). Agreement in scores on the individual items was also good (square-weighted kappa values 0.61-0.92). The test demonstrated good levels of validity. A principal component analysis for categorical data identified two dimensions, interpreted as general communication and communicative creativity. Correlations with three other instruments measuring communication in aphasia, that is, Spontaneous Speech interview from the Aachen Aphasia Test (AAT), Amsterdam-Nijmegen Everyday Language Test (ANELT), and Communicative Effectiveness Index (CETI), were moderate to strong (0.50-0.85) suggesting good convergent validity. 
Group differences were observed between persons with aphasia and non-aphasic controls, as well as between persons with aphasia unable to use speech to convey information and those able to communicate verbally; this indicates good known-groups validity. The test was sensitive to changes in performance, measured over a period of 6 months. The data support the reliability and validity of the Scenario Test as an instrument for examining daily-life communication in aphasia. The test focuses on multimodal communication; its psychometric qualities enable future studies on the effect of Alternative and Augmentative Communication (AAC) training in aphasia.
Reliability and Normative Data for the Dynamic Visual Acuity Test for Vestibular Screening.

PubMed

Riska, Kristal M; Hall, Courtney D

2016-06-01

The purpose of this study was to determine the reliability of computerized dynamic visual acuity (DVA) testing and to determine reference values for younger and older adults. A primary function of the vestibular system is to maintain gaze stability during head motion. The DVA test quantifies gaze stabilization with the head moving versus stationary. Commercially available computerized systems allow clinicians to incorporate DVA into their assessment; however, information regarding reliability and normative values of these systems is sparse. Forty-six healthy adults, grouped by age, with normal vestibular function were recruited. Each participant completed computerized DVA testing including static visual acuity, minimum perception time, and DVA using the NeuroCom inVision System. Testing was performed by two examiners in the same session and then repeated at a follow-up session 3 to 14 days later. Intraclass correlation coefficients (ICCs) were used to determine inter-rater and test-retest reliability. ICCs for inter-rater reliability ranged from 0.323 to 0.937 and from 0.434 to 0.909 for horizontal and vertical head movements, respectively. ICCs for test-retest reliability ranged from 0.154 to 0.856 and from 0.377 to 0.9062 for horizontal and vertical head movements, respectively. Overall, raw scores (left/right DVA and up/down DVA) were more reliable than DVA loss scores. A commercially available DVA system showed poor-to-fair reliability for DVA loss scores. The use of a convergence paradigm and not incorporating the forced choice paradigm may contribute to poor reliability.
The Outdoor MEDIA DOT: The development and inter-rater reliability of a tool designed to measure food and beverage outlets and outdoor advertising.

PubMed

Poulos, Natalie S; Pasch, Keryn E

2015-07-01

Few studies of the food environment have collected primary data, and even fewer have reported reliability of the tool used. This study focused on the development of an innovative electronic data collection tool used to document outdoor food and beverage (FB) advertising and establishments near 43 middle and high schools in the Outdoor MEDIA Study. Tool development used GIS-based mapping, an electronic data collection form on handheld devices, and an easily adaptable interface to efficiently collect primary data within the food environment. For the reliability study, two teams of data collectors documented all FB advertising and establishments within one half-mile of six middle schools. Inter-rater reliability was calculated overall and by advertisement or establishment category using percent agreement. A total of 824 items were documented: advertisements (n=233), establishment advertisements (n=499), and establishments (n=92) (range=8-229 per school). Overall inter-rater reliability of the developed tool ranged from 69% to 89% for advertisements and establishments. Results suggest that the developed tool is highly reliable and effective for documenting the outdoor FB environment. Copyright © 2015 Elsevier Ltd. All rights reserved.
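Percent agreement, the reliability statistic named in this abstract, is simply the share of items that two coders classify identically. A minimal standard-library Python sketch follows. The function name `percent_agreement` and the category labels are hypothetical, and the abstract does not describe how the two teams' records were matched item by item.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of paired items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return 100.0 * matches / len(codes_a)


# Hypothetical category codes recorded by two data-collection teams.
team_1 = ["advertisement", "establishment", "advertisement", "advertisement"]
team_2 = ["advertisement", "establishment", "establishment", "advertisement"]
agreement = percent_agreement(team_1, team_2)
```

Unlike kappa, percent agreement is not corrected for the agreement expected by chance, so it tends to look more favourable when one category dominates.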
Organizational readiness for implementing change: a psychometric assessment of a new measure.

PubMed

Shea, Christopher M; Jacobs, Sara R; Esserman, Denise A; Bruce, Kerry; Weiner, Bryan J

2014-01-10

Organizational readiness for change in healthcare settings is an important factor in successful implementation of new policies, programs, and practices. However, research on the topic is hindered by the absence of a brief, reliable, and valid measure. Until such a measure is developed, we cannot advance scientific knowledge about readiness or provide evidence-based guidance to organizational leaders about how to increase readiness. This article presents results of a psychometric assessment of a new measure called Organizational Readiness for Implementing Change (ORIC), which we developed based on Weiner's theory of organizational readiness for change. We conducted four studies to assess the psychometric properties of ORIC. In study one, we assessed the content adequacy of the new measure using quantitative methods. In study two, we examined the measure's factor structure and reliability in a laboratory simulation. In study three, we assessed the reliability and validity of an organization-level measure of readiness based on aggregated individual-level data from study two. In study four, we conducted a small field study utilizing the same analytic methods as in study three. Content adequacy assessment indicated that the items developed to measure change commitment and change efficacy reflected the theoretical content of these two facets of organizational readiness and distinguished the facets from hypothesized determinants of readiness. Exploratory and confirmatory factor analysis in the lab and field studies revealed two correlated factors, as expected, with good model fit and high item loadings. Reliability analysis in the lab and field studies showed high inter-item consistency for the resulting individual-level scales for change commitment and change efficacy. Inter-rater reliability and inter-rater agreement statistics supported the aggregation of individual level readiness perceptions to the organizational level of analysis. 
This article provides evidence in support of the ORIC measure. We believe this measure will enable testing of theories about determinants and consequences of organizational readiness and, ultimately, assist healthcare leaders to reduce the number of health organization change efforts that do not achieve desired benefits. Although ORIC shows promise, further assessment is needed to test for convergent, discriminant, and predictive validity.
Objective measurements of excess skin in post bariatric patients--inter-rater reliability.

PubMed

Biörserud, Christina; Fagevik Olsén, Monika; Elander, Anna; Wiklund, Malin

2016-01-01

An ability to reliably assess excess skin after massive weight loss using well-described and transferrable methods is important. The aim of this trial was to evaluate inter-rater reliability of ptosis and circumference measurements in patients with excess skin after bariatric surgery. Twenty-five postbariatric patients were included in the study, and their excess skin was measured 18 months after surgery. A protocol was designed to measure excess skin in a standardised way. To evaluate the inter-rater reliability in the measuring protocol, all patients were measured twice, by a specialist nurse and a specialist physiotherapist. All circumference measurements on different body parts had an ICC > 0.9, indicating high reliability. Furthermore, all breast and abdominal ptosis measurements had high reliability. In contrast, visual evaluation of abdominal ptosis had poor reliability. Measurements of ptoses on different body parts had an ICC > 0.6. There were no systematic differences between the results of the two testers, except for measurements of the buttocks and maximal knee circumference. The measuring protocol presented in this study has high reliability and, therefore, represents a useful instrument to provide a consistent and objective assessment of excess skin in the postbariatric patient.
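The ICC values above describe how much of the total variance in a measurement comes from differences between patients rather than between raters. The abstract does not say which ICC form was used. The following is a minimal Python sketch of one common choice, the two-way random-effects, absolute-agreement, single-measurement form often written ICC(2,1). The function name `icc_2_1` and the example readings are illustrative assumptions, not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows: one row per subject, one column per rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical circumference readings (cm): five patients, two raters.
example = [[92.0, 92.5], [101.0, 100.5], [88.5, 89.0],
           [110.0, 111.0], [95.5, 95.0]]
print(round(icc_2_1(example), 3))
```

Because this form penalizes systematic offsets between raters, a consistent difference such as the one reported for the buttock and knee measurements would lower the coefficient even when the raters rank patients identically.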
Assessing movement quality in persons with severe mental illness - Reliability and validity of the Body Awareness Scale Movement Quality and Experience.

PubMed

Hedlund, Lena; Gyllensten, Amanda Lundvik; Waldegren, Tomas; Hansson, Lars

2016-05-01

Motor disturbances and disturbed self-recognition are common features that affect mobility in persons with schizophrenia spectrum disorder and bipolar disorder. Physiotherapists in Scandinavia assess and treat movement difficulties in persons with severe mental illness. The Body Awareness Scale Movement Quality and Experience (BAS MQ-E) is a new and shortened version of the commonly used Body Awareness Scale-Health (BAS-H). The purpose of this study was to investigate the inter-rater reliability and the concurrent validity of BAS MQ-E in persons with severe mental illness. The concurrent validity was examined by investigating the relationships between neurological soft signs, alexithymia, fatigue, anxiety, and mastery. Sixty-two persons with severe mental illness participated in the study. The results showed a satisfactory inter-rater reliability (n = 53) and a concurrent validity (n = 62) with neurological soft signs, especially cognitive and perceptual based signs. There was also a concurrent validity linked to physical fatigue and aspects of alexithymia. The scores of BAS MQ-E were in general higher for persons with schizophrenia compared to persons with other diagnoses within the schizophrenia spectrum disorders and bipolar disorder. The clinical implications are presented in the discussion.
IOTN - a tool to prioritize treatment need in children and plan Dental Health services.

PubMed

Sharma, Jaideep; Sharma, Ruchi Dhir

2014-03-01

The study aimed to evaluate the orthodontic treatment need in school-going children in Moradabad, North India, and to assess malocclusion traits, concern towards dental health, and individual aesthetic perception compared to the orthodontist's opinion. The sample comprised 5232 children aged 11-14 years. The Dental Health Component (DHC) and Aesthetic Component (AC) were recorded as defined by Brook and Shaw, with slight modification for AC assessment. Statistical analysis revealed that only 12.5% of children had no treatment need, while 87.5% presented malocclusion with varying treatment needs. There was no significant sex difference in aesthetic perception. The examiner graded the children as less attractive than the children graded themselves. Class I was the most common malocclusion and crowding was the most common malocclusion trait. High intra-examiner and substantial inter-examiner agreements were observed for DHC, and substantial intra-examiner and moderate inter-examiner agreements for AC. It can be concluded from the present study that IOTN is a reliable epidemiologic tool that can benefit local health services in planning their budget and improve the focus of services by inducing greater uniformity and standardization in the assessment of orthodontic treatment need.
Braden scale (ALB) for assessing pressure ulcer risk in hospital patients: A validity and reliability study.

PubMed

Chen, Hong-Lin; Cao, Ying-Juan; Zhang, Wei; Wang, Jing; Huai, Bao-Sha

2017-02-01

The inter-rater reliability of the Braden Scale is limited. We modified the Braden scale by defining the nutrition subscale based on serum albumin, giving the Braden(ALB) scale, and then assessed its validity and reliability in hospital patients. We designed a retrospective study for validity analysis and a prospective study for reliability analysis. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to evaluate the predictive validity. The intra-class correlation coefficient (ICC) was used to investigate the inter-rater reliability. Two thousand five hundred twenty-five patients were included for validity analysis; 76 patients (3.0%) developed a pressure ulcer. A positive correlation was found between serum albumin and nutrition score in the Braden scale (Spearman's coefficient 0.2203, P<0.0001). The AUCs for the Braden scale and the Braden(ALB) scale predicting pressure ulcer risk were 0.813 (95% CI 0.797-0.828; P<0.0001) and 0.859 (95% CI 0.845-0.872; P<0.0001), respectively. The Braden(ALB) scale had a higher AUC than the Braden scale, although the difference did not reach conventional statistical significance (z=1.860, P=0.0628). In different age subgroups, the Braden(ALB) scale also appeared more valid than the original Braden scale, but no statistically significant differences were found (P>0.05). The inter-rater reliability study showed that the ICC value increased by 45.9% for the nutrition subscale and by 4.3% for the total score. The Braden(ALB) scale has similar validity compared with the original Braden scale for in-hospital patients. However, the inter-rater reliability was significantly increased. Copyright © 2016 Elsevier Inc. All rights reserved.
Reliability and group differences in quantitative cervicothoracic measures among individuals with and without chronic neck pain

PubMed Central

2012-01-01

Background Clinicians frequently rely on subjective categorization of impairments in mobility, strength, and endurance for clinical decision-making; however, these assessments are often unreliable and lack sensitivity to change. The objective of this study was to determine the inter-rater reliability, minimum detectable change (MDC), and group differences in quantitative cervicothoracic measures for individuals with and without chronic neck pain (NP). Methods Nineteen individuals with NP and 20 healthy controls participated in this case control study. Two physical therapists performed a 30-minute examination on separate days. A handheld dynamometer, gravity inclinometer, ruler, and stopwatch were used to quantify cervical range of motion (ROM), cervical muscle strength and endurance, and scapulothoracic muscle length and strength, respectively. Results Intraclass correlation coefficients for inter-rater reliability were significantly greater than zero for most impairment measures, with point estimates ranging from 0.45 to 0.93. The NP group exhibited reduced cervical ROM (P ≤ 0.012) and muscle strength (P ≤ 0.038) in most movement directions, reduced cervical extensor endurance (P = 0.029), and reduced rhomboid and middle trapezius muscle strength (P ≤ 0.049). Conclusions Results demonstrate the feasibility of obtaining objective cervicothoracic impairment measures with acceptable inter-rater agreement across time. The clinical utility of these measures is supported by evidence of impaired mobility, strength, and endurance among patients with NP, with corresponding MDC values that can help establish benchmarks for clinically significant change. PMID:23114092
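The abstract links inter-rater ICCs to minimum detectable change (MDC) but does not give the formula. A commonly used formulation derives the standard error of measurement (SEM) from the between-subject standard deviation and the ICC, then scales it to a 95% confidence level. The Python sketch below assumes that formulation. The function names and the standard deviation of 10 degrees are hypothetical; only the ICC endpoints 0.45 and 0.93 come from the abstract.

```python
import math


def sem(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc(sd, icc, z=1.96):
    """Minimum detectable change at the confidence level implied by z.

    The sqrt(2) factor accounts for error in both of the two
    measurements being compared.
    """
    return z * math.sqrt(2.0) * sem(sd, icc)


# Hypothetical between-subject SD of 10 degrees of cervical rotation,
# evaluated at the lowest and highest ICC point estimates reported.
for icc in (0.45, 0.93):
    print(f"ICC={icc:.2f}  SEM={sem(10.0, icc):.2f}  MDC95={mdc(10.0, icc):.2f}")
```

Under these assumptions the MDC shrinks as the ICC rises, so a measure with an ICC near 0.45 needs a much larger observed change before it can be distinguished from measurement error.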
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies

PubMed Central

Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry

2017-01-01

Objectives To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Design Systematic review and narrative synthesis of reproducibility studies. Data sources Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Eligibility criteria Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. Results From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range: intraclass correlation coefficient 0.86 to κ = −0.10).
Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies’ generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). Conclusions Despite their common use and far reaching consequences for workers claiming disabling injury or illness, research on the reliability of medical evaluations of disability for work is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed. PMID:28122727
Checklist and Scoring System for the Assessment of Soft Tissue Preservation in CT Examinations of Human Mummies.

PubMed

Panzer, Stephanie; Mc Coy, Mark R; Hitzl, Wolfgang; Piombino-Mascali, Dario; Jankauskas, Rimantas; Zink, Albert R; Augat, Peter

2015-01-01

The purpose of this study was to develop a checklist for standardized assessment of soft tissue preservation in human mummies based on whole-body computed tomography examinations, and to add a scoring system to facilitate quantitative comparison of mummies. Computed tomography examinations of 23 mummies from the Capuchin Catacombs of Palermo, Sicily (17 adults, 6 children; 17 anthropogenically and 6 naturally mummified) and 7 mummies from the crypt of the Dominican Church of the Holy Spirit of Vilnius, Lithuania (5 adults, 2 children; all naturally mummified) were used to develop the checklist following previously published guidelines. The scoring system was developed by assigning equal scores for checkpoints with equivalent quality. The checklist was evaluated by intra- and inter-observer reliability. The finalized checklist was applied to compare the groups of anthropogenically and naturally mummified bodies. The finalized checklist contains 97 checkpoints and was divided into two main categories, "A. Soft Tissues of Head and Musculoskeletal System" and "B. Organs and Organ Systems", each including various subcategories. The complete checklist had an intra-observer reliability of 98% and an inter-observer reliability of 93%. Statistical comparison revealed significantly higher values in anthropogenically compared to naturally mummified bodies for the total score and for three subcategories. In conclusion, the developed checklist allows for a standardized assessment and documentation of soft tissue preservation in whole-body computed tomography examinations of human mummies. The scoring system facilitates a quantitative comparison of the soft tissue preservation status between single mummies or mummy collections.
Reliability of the modified Tufts Lumbar Degenerative Disc Classification between neurosurgeons and neuroradiologists.

PubMed

Burke, Shane M; Hwang, Steven W; Mehan, William A; Bedi, Harprit S; Ogbuji, Richard; Riesenburger, Ron I

2016-07-01

Cross-specialty inter-rater reliability has not been explicitly reported for imaging characteristics that are thought to be important in lumbar intervertebral disc degeneration. Sufficient cross-specialty reliability is an essential consideration if radiographic stratification of symptomatic patients to specific treatment modalities is ever to be realized. Therefore, the purpose of this study was to directly compare the assessment of such characteristics between neurosurgeons and neuroradiologists. Sixty consecutive patients with a diagnosis of lumbago and appropriate imaging were selected for inclusion. Lumbar MRI scans were evaluated using the Tufts Degenerative Disc Classification by two neurosurgeons and two neuroradiologists. Inter-rater reliability was assessed using Cohen's κ values both within and between specialties. A sensitivity analysis was performed for a modified grading system that excluded high intensity zones (HIZ) because of the poor cross-specialty inter-rater reliability of HIZ. The reliability of HIZ between neurosurgeons and neuroradiologists was fair in two of the four cross-specialty comparisons in this study (neurosurgeon 1 versus both radiologists κ=0.364 and κ=0.290). Removing HIZ from the classification improved inter-rater reliability for all comparisons within and between specialties (0.465⩽κ⩽0.576). In addition, intra-rater reliability remained in the moderate to substantial range (0.523⩽κ⩽0.649). Given our findings and corroboration with previous studies, identification of HIZ seems to have markedly variable reliability. Thus we recommend modification of the original Tufts Degenerative Disc Classification by removing HIZ in order to make the overall grade provided by this classification more reproducible when scored by practitioners of different training backgrounds. Copyright © 2015 Elsevier Ltd. All rights reserved.
Inter-Observer, Intra-Observer and Intra-Individual Reliability of Uroflowmetry Tests in Aged Men: A Generalizability Theory Approach.

PubMed

Liu, Ying-Buh; Yang, Stephen S; Hsieh, Cheng-Hsing; Lin, Chia-Da; Chang, Shang-Jen

2014-05-01

To evaluate the inter-observer, intra-observer and intra-individual reliability of uroflowmetry and post-void residual urine (PVR) tests in adult men. Healthy volunteers aged over 40 years were enrolled. Every participant underwent two sets of uroflowmetry and PVR tests with a 2-week interval between the tests. The uroflowmetry tests were interpreted by four urologists independently. Uroflowmetry curves were classified as bell-shaped, bell-shaped with tail, obstructive, restrictive, staccato, interrupted and tower-shaped and scored from 1 (highly abnormal) to 5 (absolutely normal). The agreements between the observers, interpretations and tests within individuals were analyzed using kappa statistics and intraclass correlation coefficients. Generalizability theory with decision analysis was used to determine how many observers, tests, and interpretations were needed to obtain an acceptable reliability (> 0.80). Of 108 volunteers, we randomly selected the uroflowmetry results from 25 participants for the evaluation of reliability. The mean age of the studied adults was 55.3 years. The intra-individual and intra-observer reliability on uroflowmetry tests ranged from good to very good. However, the inter-observer reliability on normalcy and specific type of flow pattern were relatively lower. In generalizability theory, three observers were needed to obtain an acceptable reliability on normalcy of uroflow pattern if the patient underwent uroflowmetry tests twice with one observation. The intra-individual and intra-observer reliability on uroflowmetry tests were good while the inter-observer reliability was relatively lower. To improve inter-observer reliability, the definition of uroflowmetry should be clarified by the International Continence Society. © 2013 Wiley Publishing Asia Pty Ltd.
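The decision analysis described above asks how many observers must be averaged to reach a reliability above 0.80. A full generalizability study partitions variance across observers, tests, and interpretations, and the abstract does not report those components. As a much simpler stand-in, the Python sketch below uses the Spearman-Brown formula, which projects the reliability of the mean of k raters from a single-rater reliability. The function names and the single-rater value of 0.60 are hypothetical, and this is not the authors' analysis.

```python
def reliability_of_mean(single_rater_reliability, k):
    """Spearman-Brown projection for the mean of k parallel raters."""
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)


def observers_needed(single_rater_reliability, target=0.80, max_k=50):
    """Smallest k whose projected reliability reaches the target, or None."""
    for k in range(1, max_k + 1):
        if reliability_of_mean(single_rater_reliability, k) >= target:
            return k
    return None


# Hypothetical single-observer reliability of 0.60.
for k in (1, 2, 3):
    print(k, round(reliability_of_mean(0.60, k), 3))
print("observers needed:", observers_needed(0.60))
```

The projection treats observers as interchangeable and ignores the test and interpretation facets, so it only illustrates the general idea that averaging more observers raises reliability.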
Medial tibial stress syndrome can be diagnosed reliably using history and physical examination.

PubMed

Winters, M; Bakker, E W P; Moen, M H; Barten, C C; Teeuwen, R; Weir, A

2017-02-08

The majority of sporting injuries are clinically diagnosed using history and physical examination as the cornerstone. There are no studies supporting the reliability of making a clinical diagnosis of medial tibial stress syndrome (MTSS). Our aim was to assess if MTSS can be diagnosed reliably using history and physical examination. We also investigated if clinicians were able to reliably identify concurrent lower leg injuries. A clinical reliability study was performed at multiple sports medicine sites in The Netherlands. Athletes with non-traumatic lower leg pain were assessed for having MTSS by two clinicians, who were blinded to each other's diagnoses. We calculated the prevalence, percentage of agreement, observed percentage of positive agreement (Ppos), observed percentage of negative agreement (Pneg) and Kappa statistic with 95% CI. Forty-nine athletes participated in this study, of whom 46 completed both assessments. The prevalence of MTSS was 74%. The percentage of agreement was 96%, with Ppos and Pneg of 97% and 92%, respectively. The inter-rater reliability was almost perfect: κ=0.89 (95% CI 0.74 to 1.00), p<0.000001. Of the 34 athletes with MTSS, 11 (32%) had a concurrent lower leg injury, which was reliably noted by our clinicians (κ=0.73, 95% CI 0.48 to 0.98, p<0.0001). Our findings show that MTSS can be reliably diagnosed clinically using history and physical examination, in clinical practice and research settings. We also found that concurrent lower leg injuries are common in athletes with MTSS. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
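The agreement measures named in this abstract can all be derived from a 2x2 table of the two clinicians' diagnoses. The Python sketch below shows one way to compute them. The function name `agreement_stats` is an assumption, and the cell counts are illustrative: the abstract does not report the table, and these counts were chosen only because they round to figures close to those reported.

```python
def agreement_stats(a, b, c, d):
    """Agreement measures for two raters making a yes/no diagnosis.

    a: both raters positive      b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive      d: both raters negative
    """
    n = a + b + c + d
    p_obs = (a + d) / n                    # overall percentage agreement
    p_pos = 2 * a / (2 * a + b + c)        # positive agreement (Ppos)
    p_neg = 2 * d / (2 * d + b + c)        # negative agreement (Pneg)
    p1 = (a + b) / n                       # rater 1 positive rate
    p2 = (a + c) / n                       # rater 2 positive rate
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    kappa = (p_obs - p_exp) / (1 - p_exp)  # Cohen's kappa
    return {"agreement": p_obs, "p_pos": p_pos, "p_neg": p_neg, "kappa": kappa}


# Illustrative counts for 46 athletes (not taken from the study).
stats = agreement_stats(a=33, b=1, c=1, d=11)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Reporting Ppos and Pneg alongside kappa is useful when prevalence is high, as here, because kappa alone can be depressed by unbalanced marginals even when raw agreement is good.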
Segmental Musculoskeletal Examinations using Dual-Energy X-Ray Absorptiometry (DXA): Positioning and Analysis Considerations

PubMed Central

Hart, Nicolas H.; Nimphius, Sophia; Spiteri, Tania; Cochrane, Jodie L.; Newton, Robert U.

2015-01-01

Musculoskeletal examinations provide informative and valuable quantitative insight into muscle and bone health. DXA is one mainstream tool used to accurately and reliably determine body composition components and bone mass characteristics in vivo. Presently, whole-body scan models separate the body into axial and appendicular regions; however, there is a need for localised appendicular segmentation models to further examine regions of interest within the upper and lower extremities. Similarly, inconsistencies pertaining to patient positioning exist in the literature, which influence measurement precision and analysis outcomes, highlighting a need for a standardised procedure. This paper provides standardised and reproducible: 1) positioning and analysis procedures using DXA and 2) reliable segmental examinations through descriptive appendicular boundaries. Whole-body scans were performed on forty-six (n = 46) football athletes (age: 22.9 ± 4.3 yrs; height: 1.85 ± 0.07 m; weight: 87.4 ± 10.3 kg; body fat: 11.4 ± 4.5 %) using DXA. All segments across all scans were analysed three times by the main investigator on three separate days, and by three independent investigators a week following the original analysis. To examine intra-rater (between-day) and inter-rater (between-researcher) reliability, coefficients of variation (CV) and intraclass correlation coefficients (ICC) were determined. Positioning and segmental analysis procedures presented in this study produced very high, nearly perfect intra-tester (CV ≤ 2.0%; ICC ≥ 0.988) and inter-tester (CV ≤ 2.4%; ICC ≥ 0.980) reliability, demonstrating excellent reproducibility within and between practitioners. Standardised examinations of axial and appendicular segments are necessary. Future studies aiming to quantify and report segmental analyses of the upper- and lower-body musculoskeletal properties using whole-body DXA scans are encouraged to use the patient positioning and image analysis procedures outlined in this paper. 
Key points Musculoskeletal examinations using DXA technology require highly standardised and reproducible patient positioning and image analysis procedures to accurately measure and monitor axial, appendicular and segmental regions of interest. Internal rotation and fixation of the lower-limbs is strongly recommended during whole-body DXA scans to prevent undesired movement, improve frontal mass accessibility and enhance ankle joint visibility during scan performance and analysis. Appendicular segmental analyses using whole-body DXA scans are highly reliable for all regional upper-body and lower-body segmentations, with hard-tissue (CV ≤ 1.5%; R ≥ 0.990) achieving greater reliability and lower error than soft-tissue (CV ≤ 2.4%; R ≥ 0.980) masses when using our appendicular segmental boundaries. PMID:26336349
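The coefficient of variation (CV) quoted above expresses the spread of repeated analyses as a percentage of their mean. The abstract does not state the exact computation, so the Python sketch below assumes the usual definition, sample standard deviation divided by mean. The function name and the three repeated lean-mass readings are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation of repeated measurements, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Hypothetical lean-mass readings (kg) for one segment analysed on
# three separate days by the same investigator.
repeats = [9.8, 10.0, 10.2]
print(f"CV = {cv_percent(repeats):.1f}%")
```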
Validation of different pediatric triage systems in the emergency department

PubMed Central

Aeimchanbanjong, Kanokwan; Pandee, Uthen

2017-01-01

BACKGROUND: Triage in children seems to be more challenging than in adults because of their different response to physiological and psychosocial stressors. This study aimed to determine the best triage system in the pediatric emergency department. METHODS: This was a prospective observational study divided into two phases. The first phase determined the inter-rater reliability of five triage systems: Manchester Triage System (MTS), Emergency Severity Index (ESI) version 4, Pediatric Canadian Triage and Acuity Scale (CTAS), Australasian Triage Scale (ATS), and Ramathibodi Triage System (RTS) by triage nurses and pediatric residents. In the second phase, to analyze the validity of each triage system, patients were categorized into two groups, i.e., high acuity patients (triage levels 1 and 2) and low acuity patients (triage levels 3, 4, and 5). Then we compared the triage acuity with actual admission. RESULTS: In phase I, RTS illustrated almost perfect inter-rater reliability with kappa of 1.0 (P<0.01). ESI and CTAS illustrated good inter-rater reliability with kappa of 0.8–0.9 (P<0.01). Meanwhile, ATS and MTS illustrated moderate to good inter-rater reliability with kappa of 0.5–0.7 (P<0.01). In phase II, we included 1041 participants with an average age of 4.7±4.2 years, of whom 55% were male and 45% were female. In addition, 32% of the participants had underlying diseases, and 123 (11.8%) patients were admitted. We found that ESI illustrated the most appropriate predictive ability for admission, with sensitivity of 52%, specificity of 81%, and AUC 0.78 (95% CI 0.74–0.81). CONCLUSION: RTS illustrated almost perfect inter-rater reliability. Meanwhile, ESI and CTAS illustrated good inter-rater reliability. Finally, ESI illustrated appropriate validity as a triage system. PMID:28680520
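The phase II validity analysis compares a binary triage category (high versus low acuity) with actual admission, which yields sensitivity and specificity. The Python sketch below shows that calculation. The function name is an assumption and the cell counts are illustrative: only the totals (1041 participants, 123 admitted) come from the abstract, and the split within each group was chosen to round to the reported percentages.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) for a binary test.

    tp: admitted and triaged high acuity   fn: admitted, triaged low acuity
    tn: not admitted, triaged low acuity   fp: not admitted, triaged high
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Illustrative counts: 123 admitted and 918 not admitted (1041 in total).
sens, spec = sensitivity_specificity(tp=64, fn=59, tn=744, fp=174)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```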
The development and validation of a custom built device for assessing frontal knee joint laxity.

PubMed

Ismail, Shiek Abdullah; Simic, Milena; Clarke, Jillian L; Lopes, Thiago Jambo Alves; Pappas, Evangelos

2017-12-01

This study reports the development and validation of a quantitative technique of assessing frontal knee joint laxity through a custom built device named KLICP. The objectives of this study were to determine: (i) the intra- and inter-rater reliability and (ii) the validity of the device when compared to real time ultrasound. Twenty-five participants had their frontal knee joint laxity assessed by the KLICP, by manual varus/valgus tests and by ultrasound. Two raters independently assessed laxity manually by three repeated measurements, repeated at least 48h later. Results were validated by comparing them to the medial and lateral joint space opening measured by the ultrasound. Intraclass correlation coefficients and standard error of measurement reliability were calculated. Pearson's correlation coefficients were calculated to determine the correlation between the KLICP and the joint space. Intra-rater reliability (intra-session) for each rater was good on both sessions (0.91-0.98), intra-rater reliability (inter-sessions) was moderate to good (0.62-0.87), and inter-rater reliability (intra-session) was good (0.75-0.80). There is low agreement for intra-rater (inter-session) and for inter-rater (intra-session) reliability. The KLICP measurement has a significant positive fair to moderate correlation to the ultrasound measurement at the left (r: 0.61, p: 0.01) and right (r: 0.48, p: 0.02) knee in the valgus direction and at the left (r: 0.51, p: 0.01) and right (r: 0.39, p: 0.05) knee in the varus direction. There is low agreement between the KLICP and the RTU. Reliability and agreement was good only when measured for intra-rater, within session. Copyright © 2017 Elsevier B.V. All rights reserved.
The reliability and validity of measurements of human dental casts made by an intra-oral 3D scanner, with conventional hand-held digital callipers as the comparison measure.

PubMed

Rajshekar, Mithun; Julian, Roberta; Williams, Anne-Marie; Tennant, Marc; Forrest, Alex; Walsh, Laurence J; Wilson, Gary; Blizzard, Leigh

2017-09-01

Intra-oral 3D scanning of dentitions has the potential to provide a fast, accurate and non-invasive method of recording dental information. The aim of this study was to assess the reliability of measurements of human dental casts made using a portable intra-oral 3D scanner appropriate for field use. Two examiners each measured 84 tooth and 26 arch features of 50 sets of upper and lower human dental casts using digital hand-held callipers, and secondly using the measuring tool provided with the Zfx IntraScan intraoral 3D scanner applied to the virtual dental casts. The measurements were repeated at least one week later. Reliability and validity were quantified concurrently by calculation of intra-class correlation coefficients (ICC) and standard errors of measurement (SEM). The measurements of the 110 landmark features of human dental casts made using the intra-oral 3D scanner were virtually indistinguishable from measurements of the same features made using conventional hand-held callipers. The difference of means as a percentage of the average of the measurements by each method ranged between 0.030% and 1.134%. The intermethod SEMs ranged between 0.037% and 0.535%, and the inter-method ICCs ranged between 0.904 and 0.999, for both the upper and the lower arches. The inter-rater SEMs were one-half and the intra-method/rater SEMs were one-third of the inter-method values. This study demonstrates that the Zfx IntraScan intra-oral 3D scanner with its virtual on-screen measuring tool is a reliable and valid method for measuring the key features of dental casts. Copyright © 2017 Elsevier B.V. All rights reserved.
Pneumothorax size measurements on digital chest radiographs: Intra- and inter- rater reliability.

PubMed

Thelle, Andreas; Gjerdevik, Miriam; Grydeland, Thomas; Skorge, Trude D; Wentzel-Larsen, Tore; Bakke, Per S

2015-10-01

Detailed and reliable methods may be important for discussions on the importance of pneumothorax size in clinical decision-making. Rhea's method is widely used to estimate pneumothorax size in percent based on chest X-rays (CXRs) from three measure points. Choi's addendum is used for anteroposterior projections. The aim of this study was to examine the intrarater and interrater reliability of the Rhea and Choi method using digital CXRs on ward-based PACS monitors. Three physicians examined a retrospective series of 80 digital CXRs showing pneumothorax using Rhea and Choi's method, and repeated the measurements in random order two weeks later. We used the analysis of variance technique by Eliasziw et al. to assess the intrarater and interrater reliability in altogether 480 estimations of pneumothorax size. Estimated pneumothorax sizes ranged between 5% and 100%. The intrarater reliability coefficient was 0.98 (95% one-sided lower-limit confidence interval 0.96), and the interrater reliability coefficient was 0.95 (95% one-sided lower-limit confidence interval 0.93). This study has shown that the Rhea and Choi method for calculating pneumothorax size has high intrarater and interrater reliability. These results are valid across gender, side of pneumothorax and whether the patient is diagnosed with primary or secondary pneumothorax. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
ASSOCIATIONS BETWEEN THREE CLINICAL ASSESSMENT TOOLS FOR POSTURAL STABILITY

PubMed Central

Saxion, Casie E.; Cameron, Kenneth L.; Gerber, J. Parry

2010-01-01

Study Design: Clinical Measurement, Correlation, Reliability. Objectives: To assess the relationship between the Single Leg Balance (SLB), modified Balance Error Scoring System (mBESS), and modified Star Excursion Balance (mSEBT) tests and secondarily to assess inter-rater and test-retest reliability of these tests. Background: Ankle sprains often result in chronic instability and dysfunction. Several clinical tests assess postural deficits as a potential cause of this dysfunction; however, limited information exists pertaining to the relationship that these tests have with one another. Methods: Two independent examiners measured the performance of 34 healthy participants completing the SLB Test, mBESS test, and mSEBT at two different time periods. The relationship between tests was assessed using the Pearson correlation and Fisher's exact test. Inter-rater and test-retest reliability were assessed using the intraclass correlation coefficient (ICC) and Kappa statistics. Results: A significant correlation (r = -0.35) was observed between the mSEBT and the mBESS. Fisher's exact test showed a significant association between the SLB Test and mBESS (P = .048), but no association between the SLB and mSEBT (P = 1.000). Inter-rater reliability was excellent for the mSEBT and fair for the mBESS (ICCs of .91 and .61, respectively). Excellent agreement was observed between raters for the SLB test (κ = 1.00). Test-retest reliability was excellent for the mSEBT (ICC = 0.98) and fair for the mBESS (ICC = 0.74). There was poor test-retest agreement for the SLB test (κ = .211). Conclusion: There was a significant relationship observed between the SLB Test, mBESS test, and mSEBT; however, strength-of-association measures showed limited overlap between these tests. This suggests that these tests are interrelated but may not assess equal components of postural stability. PMID:21589668
Developing a standardized measurement of alcohol intoxication.

PubMed

Benoit, Justin L; Hart, Kimberly W; Soliman, Adam A; Barczak, Christopher M; Sibilia, Robert S; Lindsell, Christopher J; Fermann, Gregory J

2017-05-01

We assessed multiple examinations and assessment tools to develop a standardized measurement of alcohol intoxication to aid medical decision making in the Emergency Department. Volunteers underwent an alcohol challenge. Pre- and post-alcohol challenge, subjects were videotaped performing three standardized clinical examinations: (1) Standardized Field Sobriety Test (SFST) examination, (2) Hack's Impairment Index (HII) examination, and (3) Cincinnati Intoxication Examination (CIE). Emergency clinicians evaluated the level of intoxication using five standardized assessment tools in a blinded and randomized fashion: (1) SFST assessment tool (range 0-18), (2) HII assessment tool (range 0-1), (3) St. Elizabeth Alcohol Intoxication Scale (STE, range 0-17), (4) a Visual Analog Scale (VAS, range 0-100), and (5) a Binary Intoxication Question (BIQ). Construct validity was assessed along with inter- and intra-rater reliability. Median scores pre- and post-alcohol challenge were: SFST 6 (interquartile range 5) and 11 (3), respectively; HII 0 (0.05), 0.1 (0.1); STE 0 (1), 1 (2); VAS 10 (22), 33 (31). For BIQ, 59% and 91% indicated intoxication, respectively. Inter-rater reliability scores were: SFST 0.71 (95% confidence interval 0.48-0.86) to 0.93 (0.88-0.97) depending on examination component; HII 0.90 (0.82-0.95); STE 0.86 (0.75-0.93); VAS 0.92 (0.88-0.94); BIQ 0.3. Intra-rater reliability scores were: SFST 0.74 (0.64-0.82) to 0.87 (0.81-0.91); HII 0.85 (0.79-0.90); STE 0.78 (0.68-0.85); VAS 0.82 (0.74-0.87); BIQ 0.71. VAS reliability was best when paired with the HII and SFST examinations. HII examination, paired with either a VAS or HII assessment tool, yielded valid and reliable measurements of alcohol intoxication. Copyright © 2017 Elsevier Inc. All rights reserved.
Reliability and discriminatory power of methods for dental plaque quantification

PubMed Central

RAGGIO, Daniela Prócida; BRAGA, Mariana Minatel; RODRIGUES, Jonas Almeida; FREITAS, Patrícia Moreira; IMPARATO, José Carlos Pettorossi; MENDES, Fausto Medeiros

2010-01-01

Objective This in situ study evaluated the discriminatory power and reliability of methods of dental plaque quantification and the relationship between visual indices (VI) and fluorescence camera (FC) to detect plaque. Material and Methods Six volunteers used palatal appliances with six bovine enamel blocks presenting different stages of plaque accumulation. The presence of plaque with and without disclosing was assessed using VI. Images were obtained with FC and digital camera in both conditions. The area covered by plaque was assessed. Examinations were done by two independent examiners. Data were analyzed by Kruskal-Wallis and Kappa tests to compare different conditions of samples and to assess the inter-examiner reproducibility. Results Some methods presented adequate reproducibility. The Turesky index and the assessment of area covered by disclosed plaque in the FC images presented the highest discriminatory powers. Conclusions The Turesky index and images with FC with disclosing present good reliability and discriminatory power in quantifying dental plaque. PMID:20485931
The intra- and inter-rater reliability of five clinical muscle performance tests in patients with and without neck pain

PubMed Central

2013-01-01

Background This study investigates the reliability of muscle performance tests using cost- and time-effective methods similar to those used in clinical practice. When conducting reliability studies, great effort goes into standardising test procedures to facilitate a stable outcome. Therefore, several test trials are often performed. However, when muscle performance tests are applied in the clinical setting, clinicians often only conduct a muscle performance test once as repeated testing may produce fatigue and pain, thus variation in test results. We aimed to investigate whether cervical muscle performance tests, which have shown promising psychometric properties, would remain reliable when examined under conditions similar to those of daily clinical practice. Methods The intra-rater (between-day) and inter-rater (within-day) reliability was assessed for five cervical muscle performance tests in patients with (n = 33) and without neck pain (n = 30). The five tests were joint position error, the cranio-cervical flexion test, the neck flexor muscle endurance test performed in supine and in a 45°-upright position and a new neck extensor test. Results Intra-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.48-0.82), the cranio-cervical flexion test (ICC ≥ 0.69), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.68) and in a 45°-upright position (ICC ≥ 0.41) with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement (ICC = 0.14-0.41). Likewise, inter-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.51-0.75), the cranio-cervical flexion test (ICC ≥ 0.85), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.70) and in a 45°-upright position (ICC ≥ 0.56). However, only slight to fair agreement was found for the neck extensor test (ICC = 0.19-0.25). 
Conclusions Intra- and inter-rater reliability ranged from moderate to almost perfect agreement with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement. The significant variability observed suggests that tests like the neck extensor test and the neck flexor muscle endurance test performed in a 45°-upright position are too unstable to be used when evaluating neck muscle performance. PMID:24299621
Intra- and inter-observer reliability of quantitative analysis of the infra-patellar fat pad and comparison between fat- and non-fat-suppressed imaging--Data from the osteoarthritis initiative.

PubMed

Steidle-Kloc, E; Wirth, W; Ruhdorfer, A; Dannhauer, T; Eckstein, F

2016-03-01

The infra-patellar fat pad (IPFP), as intra-articular adipose tissue represents a potential source of pro-inflammatory cytokines and its size has been suggested to be associated with osteoarthritis (OA) of the knee. This study examines inter- and intra-observer reliability of fat-suppressed (fs) and non-fat-suppressed (nfs) MR imaging for determination of IPFP morphological measurements as novel biomarkers. The IPFP of nine right knees of healthy Osteoarthritis Initiative participants was segmented by five readers, using fs and nfs baseline sagittal MRIs. The intra-observer reliability was determined from baseline and 1-year follow-up images. All segmentations were quality controlled (QC) by an expert reader. Reliability was expressed as root mean square coefficient of variation (RMS CV%). After QC, the inter-observer reliability for fs (nfs) imaging was 2.0% (1.1%) for IPFP volume, 2.1%/2.5% (1.6%/1.8%) for anterior/posterior surface areas, 1.8% (1.8%) for depth, and 2.1% (2.4%) for maximum sagittal area. The intra-observer reliability was 3.1% (5.0%) for volume, 2.3%/2.8% (2.5%/2.9%) for anterior/posterior surfaces, 1.9% (3.5%) for depth, and 3.3% (4.5%) for maximum sagittal area. IPFP volume from nfs images was systematically greater (+7.3%) than from fs images, but highly correlated (r=0.98). The results suggest that quantitative measurements of IPFP morphology can be performed with satisfactory reliability when expert QC is implemented. The IPFP is more clearly depicted in nfs images, and there is a small systematic off-set versus analysis from fs images. However, the high linear relationship between fs and nfs imaging suggests that fs images can be used to analyze IPFP morphology, when nfs images are not available. Copyright © 2015 Elsevier GmbH. All rights reserved.
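The abstract expresses reliability as a root mean square coefficient of variation (RMS CV%). As a rough illustration, the sketch below computes that quantity under the assumption that each subject has several repeated readings and that the per-subject CV uses the sample standard deviation. The function name and the volume values are hypothetical.

```python
import math
import statistics


def rms_cv_percent(repeated_measurements):
    """RMS CV% across subjects; each inner list holds one subject's repeated readings."""
    squared_cvs = []
    for readings in repeated_measurements:
        mean = statistics.fmean(readings)
        # Sample SD (n - 1 denominator); this choice is an assumption of the sketch.
        sd = statistics.stdev(readings)
        squared_cvs.append((sd / mean) ** 2)
    # Root of the mean squared CV, expressed as a percentage.
    return 100.0 * math.sqrt(statistics.fmean(squared_cvs))


# Hypothetical volumes (cm^3) segmented by three readers for four knees.
volumes = [
    [28.1, 28.6, 27.9],
    [31.4, 31.0, 31.9],
    [25.2, 25.5, 25.0],
    [29.8, 30.3, 29.6],
]
print(round(rms_cv_percent(volumes), 2))
```

Squaring before averaging gives subjects with larger relative scatter more weight than a plain mean of the CVs would.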

Intra- and inter-observer reliability of quantitative analysis of the infra-patellar fat pad and comparison between fat- and non-fat-suppressed imaging—Data from the osteoarthritis initiative

PubMed Central

Steidle-Kloc, E.; Wirth, W.; Ruhdorfer, A.; Dannhauer, T.; Eckstein, F.

2015-01-01

The infra-patellar fat pad (IPFP), as intra-articular adipose tissue represents a potential source of pro-inflammatory cytokines and its size has been suggested to be associated with osteoarthritis (OA) of the knee. This study examines inter- and intra-observer reliability of fat-suppressed (fs) and non-fat-suppressed (nfs) MR imaging for determination of IPFP morphological measurements as novel biomarkers. The IPFP of nine right knees of healthy Osteoarthritis Initiative participants was segmented by five readers, using fs and nfs baseline sagittal MRIs. The intra-observer reliability was determined from baseline and 1-year follow-up images. All segmentations were quality controlled (QC) by an expert reader. Reliability was expressed as root mean square coefficient of variation (RMS CV%). After QC, the inter-observer reliability for fs (nfs) imaging was 2.0% (1.1%) for IPFP volume, 2.1%/2.5% (1.6%/1.8%) for anterior/posterior surface areas, 1.8% (1.8%) for depth, and 2.1% (2.4%) for maximum sagittal area. The intra-observer reliability was 3.1% (5.0%) for volume, 2.3%/2.8% (2.5%/2.9%) for anterior/posterior surfaces, 1.9% (3.5%) for depth, and 3.3% (4.5%) for maximum sagittal area. IPFP volume from nfs images was systematically greater (+7.3%) than from fs images, but highly correlated (r = 0.98). The results suggest that quantitative measurements of IPFP morphology can be performed with satisfactory reliability when expert QC is implemented. The IPFP is more clearly depicted in nfs images, and there is a small systematic off-set versus analysis from fs images. However, the high linear relationship between fs and nfs imaging suggests that fs images can be used to analyze IPFP morphology, when nfs images are not available. PMID:26569532
A new scale for the assessment of performance and capacity of hand function in children with hemiplegic cerebral palsy: reliability and validity studies.

PubMed

Rosa-Rizzotto, M; Visonà Dalla Pozza, L; Corlatti, A; Luparia, A; Marchi, A; Molteni, F; Facchin, P; Pagliano, E; Fedrizzi, E

2014-10-01

In hemiplegic children, recognizing the activity limitation pattern and grading its severity are relevant for clinicians when planning interventions, monitoring results, and predicting outcomes. The aim of the study was to examine the reliability and validity of the Besta scale, an instrument used in hemiplegic children from 18 months to 12 years of age to measure both grasp on request (capacity) and spontaneous use of the upper limb (performance) in bimanual play activities and in ADL. Psychometric analysis of the reliability and validity of the Besta scale was performed. Outpatient study sample. Reliability study: A sample of 39 patients was enrolled. The administration of the Besta scale was video-recorded in a standardized manner. All videos were scored by 20 independent raters on subsequent viewing. Three raters randomly selected from the group of 20 rescored the same video two years later for intra-rater reliability. Intra- and inter-rater reliability were calculated using the Intraclass Correlation Coefficient (ICC) and Kendall's coefficient (K), respectively. Internal consistency reliability was assessed using Cronbach's alpha coefficient. Validity study: A sample of 105 children was assessed 5 times (at t0 and 2, 3, 6 and 12 months later) by 20 independent raters. Each patient was administered and assessed with the QUEST and the Besta scale at the same time. Criterion validity was calculated using Pearson's correlation coefficient. Reliability study: The inter-rater reliability calculated with Kendall's coefficient was moderate (K=0.47). The intra-rater (or test-retest) reliability for 3 raters was excellent (ICC=0.927). The Cronbach's alpha for internal consistency was 0.972. Validity study: The Besta scale showed good criterion validity compared with the QUEST, increasing with age and severity of impairment. Pearson's correlation coefficient r was 0.81 (P<0.0001). Limitations:
In infants, the Besta scale has difficulty distinguishing between mildly and moderately impaired hand function. The Besta scale scoring system is a valid and reliable tool that can be used in a clinical setting to monitor the evolution of unimanual and bimanual manipulation and to distinguish the hand's capacity from its performance.
Intra- and inter-observer reliability of ten major histological scoring systems used for the evaluation of in vivo cartilage repair.

PubMed

Bonasia, Davide Edoardo; Marmotti, Antongiulio; Massa, Alessandro Domenico Felice; Ferro, Andrea; Blonna, Davide; Castoldi, Filippo; Rossi, Roberto

2015-09-01

In the last two decades, many surgical techniques have been described for articular cartilage repair. Reliable histological scoring systems are fundamental tools to evaluate new procedures. Several histological scoring systems have been described, and these can be divided into elementary and comprehensive scores, according to the number of sub-items. The aim of this study was to test the inter- and intra-observer reliability of ten main scores used for the histological evaluation of in vivo cartilage repair. The authors tested the starting hypothesis that elementary scores would show superior intra- and inter-observer reliability compared with comprehensive scores. Fifty histological sections obtained from the trochlea of New Zealand rabbits and stained with Safranin-O fast green were used. The histological sections were analysed by 4 observers: 2 experienced in cartilage histology and 2 inexperienced. Histological evaluations were performed at time 1 and time 2, separated by a 30-day interval. The following scores were used: Mankin, O'Driscoll, Pineda, Wakitani, Fortier, Sellers, ICRS, ICRSII, Oswestry (OsScore) and modified O'Driscoll. Intra- and inter-observer reliability were evaluated for each score. In addition, the floor-ceiling effect and the Bland-Altman Coefficient of Repeatability were evaluated for each sub-item of every score. Intra-observer reliability was high for all observers in every score, even though the reliability was significantly lower for non-expert observers compared with expert counterparts. In terms of Coefficient of Repeatability, some scores performed better (O'Driscoll, Modified O'Driscoll and ICRSII) than others (Fortier, Sellers). Inter-observer reliability was high for all observers in every score, but significantly lower for non-expert compared with expert observers. In expert hands, all the scores showed high intra- and inter-observer reliability, independently of the complexity.
Although every score has advantages and disadvantages, ICRSII, O'Driscoll and Modified O'Driscoll scores should be preferred for the evaluation of in vivo cartilage repair in animal models.
Reliability and validity of a questionnaire for self-assessment of complete dentures.

PubMed

Komagamine, Yuriko; Kanazawa, Manabu; Kaiba, Yoshinori; Sato, Yusuke; Minakuchi, Shunsuke

2014-05-02

Demand for complete denture treatment is expected to rise over several decades. However, to date, no questionnaire on complete dentures, as evaluated by edentulous patients, has been shown to be reliable and valid. This study sought to assess the reliability and validity of the Patient's Denture Assessment (PDA), which provides a multidimensional evaluation of dentures among edentulous patients. Patients who had new complete dentures fabricated at the University Hospital of Dentistry, Tokyo Medical and Dental University from 2009 to 2010 were enrolled. The reliability of the PDA was determined by examining internal consistency and test-retest reliability. Internal consistency for all of the question items and the six subscales was measured using Cronbach's α and average inter-item correlation coefficients among 93 participants. For 33 of these participants, test-retest reliability was determined at a 2-month interval using the intraclass correlation coefficients (ICCs) and 95% confidence interval for the summary scores and the six subscale scores. The PDA was validated in 93 participants by examining the difference in the summary score and the six subscale scores of the PDA before and after replacement with new dentures by the paired t-test. Ability to detect change was also tested in 93 patients using effect size. The Cronbach's α for the PDA ranged from 0.56 to 0.93. The average inter-item correlation coefficients ranged from 0.28 to 0.83. ICCs for the PDA ranged from 0.37 to 0.83. The paired t-test showed a significant difference between the summary score and the six subscale scores before and after replacement with new dentures (p < 0.05) and the effect size was 0.97. The PDA demonstrated good reliability by assessing internal consistency and test-retest reliability. In addition, the PDA demonstrated good validity by assessing discriminant validity.
Thus, the PDA could help dentists obtain a detailed understanding of the patients' perceptions in using their dentures.
Assessment of nursing home residents in Europe: the Services and Health for Elderly in Long TERm care (SHELTER) study

PubMed Central

2012-01-01

Background Aims of the present study are the following: 1. to describe the rationale and methodology of the Services and Health for Elderly in Long TERm care (SHELTER) study, a project funded by the European Union, aimed at implementing the interRAI instrument for Long Term Care Facilities (interRAI LTCF) as a tool to assess and gather uniform information about nursing home (NH) residents across different health systems in European countries; 2. to present the results about the test-retest and inter-rater reliability of the interRAI LTCF instrument translated into the languages of participating countries; 3. to illustrate the characteristics of NH residents at study entry. Methods A 12-month prospective cohort study was conducted in 57 NH in 7 EU countries (Czech Republic, England, Finland, France, Germany, Italy, The Netherlands) and 1 non-EU country (Israel). Weighted kappa coefficients were used to evaluate the reliability of interRAI LTCF items. Results Mean age of 4156 residents entering the study was 83.4 ± 9.4 years, 73% were female. ADL disability and cognitive impairment were observed in 81.3% and 68.0% of residents, respectively. Clinical complexity of residents was confirmed by a high prevalence of behavioral symptoms (27.5% of residents), falls (18.6%), pressure ulcers (10.4%), pain (36.0%) and urinary incontinence (73.5%). Overall, 197 of the 198 items tested met or exceeded standard cut-offs for acceptable test-retest and inter-rater reliability after translation into the target languages. Conclusion The interRAI LTCF appears to be a reliable instrument. It enables the creation of databases that can be used to govern the provision of long-term care across different health systems in Europe, to answer relevant research and policy questions and to compare characteristics of NH residents across countries, languages and cultures. PMID:22230771
Validity and reliability of a new ankle dorsiflexion measurement device.

PubMed

Gatt, Alfred; Chockalingam, Nachiappan

2013-08-01

The assessment of the maximum ankle dorsiflexion angle is an important clinical examination procedure. Evidence shows that the traditional goniometer is highly unreliable, and various designs of goniometers to measure the maximum ankle dorsiflexion angle rely on the application of a known force to obtain reliable results. Hence, an innovative ankle dorsiflexion measurement device was designed to make this measurement more reliable by holding the foot in a selected posture without the application of a known moment. This study reports on the comprehensive validity and reliability testing carried out on the new device. Following validity testing, four different trials to test reliability of the ankle dorsiflexion measurement device were performed. These trials included inter-rater and intra-rater testing with a controlled moment, intra-rater reliability testing with knees flexed and extended without a controlled moment, intra-rater testing with a patient population, and inter-rater reliability testing between four raters of varying experience without controlling moment. All raters were blinded. The design was a series of trials to test intra-rater and inter-rater reliability. Intra-rater reliability intraclass correlation coefficient was 0.98 and inter-rater reliability intraclass correlation coefficient (2,1) was 0.953 with a controlled moment. With uncontrolled moment, very high intra-tester reliability was also achieved (intraclass correlation coefficient = 0.94 with knees extended and intraclass correlation coefficient = 0.95 with knees flexed). For the trial investigating test-retest reliability with actual patients, an intraclass correlation coefficient of 0.99 was obtained. In the trial investigating four different raters with uncontrolled moment, an intraclass correlation coefficient of 0.91 was achieved.
The new ankle dorsiflexion measurement device is a valid and reliable device for measuring ankle dorsiflexion in both healthy subjects and patients, with both controlled and uncontrolled moments, even by multiple raters of varying experience when the foot is dorsiflexed to its end of range of motion. An ankle dorsiflexion measuring device has been designed to increase the reliability of ankle dorsiflexion measurement and replace the traditional goniometer. While the majority of similar devices rely on application of a known moment to perform this measurement, it has been shown that this is not required with the new ankle dorsiflexion measurement device and, rather, foot posture should be taken into consideration as this affects the maximum ankle dorsiflexion angle.
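The inter-rater figure above is reported as an intraclass correlation coefficient (2,1), which in the Shrout and Fleiss notation is the two-way random-effects, absolute-agreement, single-measure form. The sketch below computes it from a subjects-by-raters table using the standard mean-square formula. The function name and the dorsiflexion angles are hypothetical and are not taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a complete table where scores[i][j] is subject i rated by rater j."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, and residual.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical maximum dorsiflexion angles (degrees): five subjects, two raters.
angles = [
    [12.0, 13.0],
    [18.5, 18.0],
    [9.0, 10.5],
    [22.0, 21.0],
    [15.5, 16.0],
]
print(round(icc_2_1(angles), 3))
```

Because the rater mean square appears in the denominator, a systematic offset between raters lowers this coefficient even when their rankings of subjects agree.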
Reliability of capturing foot parameters using digital scanning and the neutral suspension casting technique

PubMed Central

2011-01-01

Background A clinical study was conducted to determine the intra- and inter-rater reliability of digital scanning and the neutral suspension casting technique to measure six foot parameters. The neutral suspension casting technique is a commonly utilised method for obtaining a negative impression of the foot prior to orthotic fabrication. Digital scanning offers an alternative to the traditional plaster of Paris techniques. Methods Twenty-one healthy participants volunteered to take part in the study. Six casts and six digital scans were obtained from each participant by two raters of differing clinical experience. The foot parameters chosen for investigation were cast length (mm), forefoot width (mm), rearfoot width (mm), medial arch height (mm), lateral arch height (mm) and forefoot to rearfoot alignment (degrees). Intraclass correlation coefficients (ICC) with 95% confidence intervals (CI) were calculated to determine the intra- and inter-rater reliability. Measurement error was assessed through the calculation of the standard error of the measurement (SEM) and smallest real difference (SRD). Results ICC values for all foot parameters using digital scanning ranged between 0.81-0.99 for both intra- and inter-rater reliability. For the neutral suspension casting technique, inter-rater reliability values ranged from 0.57-0.99 and intra-rater reliability values ranged from 0.36-0.99 for rater 1 and 0.49-0.99 for rater 2. Conclusions The findings of this study indicate that digital scanning is a reliable technique, irrespective of clinical experience, with reduced measurement variability in all foot parameters investigated when compared to neutral suspension casting. PMID:21375757
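The abstract names the standard error of the measurement (SEM) and smallest real difference (SRD) without giving formulas. A common convention, assumed here rather than confirmed by the source, is SEM = SD × √(1 − ICC) and SRD = 1.96 × √2 × SEM. The sketch below applies those two formulas to hypothetical values.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient (assumed formula)."""
    return sd * math.sqrt(1.0 - icc)


def smallest_real_difference(sem, z=1.96):
    """SRD at the 95% level: z * sqrt(2) * SEM (assumed formula)."""
    return z * math.sqrt(2.0) * sem


# Hypothetical inputs: SD of 4.0 mm for forefoot width and an ICC of 0.91.
sem = standard_error_of_measurement(sd=4.0, icc=0.91)
srd = smallest_real_difference(sem)
print(f"SEM = {sem:.2f} mm, SRD = {srd:.2f} mm")
```

The √2 factor reflects that a difference between two measurements carries the error of both, so a change smaller than the SRD cannot be separated from measurement noise at the chosen confidence level.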
Measuring the quality of life in mild to very severe dementia: testing the inter-rater and intra-rater reliability of the German version of the QUALIDEM.

PubMed

Dichter, Martin Nikolaus; Schwab, Christian G G; Meyer, Gabriele; Bartholomeyczik, Sabine; Dortmann, Olga; Halek, Margareta

2014-05-01

Quality of life (Qol) is an increasingly used outcome measure in dementia research. The QUALIDEM is a dementia-specific and proxy-rated Qol instrument. We aimed to determine the inter-rater and intra-rater reliability in residents with dementia in German nursing homes. The QUALIDEM consists of nine subscales that were applied to a sample of 108 people with mild to severe dementia and six consecutive subscales that were applied to a sample of 53 people with very severe dementia. The proxy raters were 49 registered nurses and nursing assistants. Inter-rater and intra-rater reliability scores were calculated on the subscale and item level. None of the QUALIDEM subscales showed strong inter-rater reliability based on the single-measure Intra-Class Correlation Coefficient (ICC) for absolute agreement ≥ 0.70. Based on the average-measure ICC for four raters, eight subscales for people with mild to severe dementia (care relationship, positive affect, negative affect, restless tense behavior, social relations, social isolation, feeling at home and having something to do) and five subscales for very severe dementia (care relationship, negative affect, restless tense behavior, social relations and social isolation) yielded a strong inter-rater agreement (ICC: 0.72-0.86). All of the QUALIDEM subscales, regardless of dementia severity, showed strong intra-rater agreement. The ICC values ranged between 0.70 and 0.79 for people with mild to severe dementia and between 0.75 and 0.87 for people with very severe dementia. This study demonstrated insufficient inter-rater reliability and sufficient intra-rater reliability for all subscales of both versions of the German QUALIDEM. The degree of inter-rater reliability can be improved by collaborative Qol rating by more than one nurse. The development of a measurement manual with accurate item definitions and a standardized education program for proxy raters is recommended.
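The contrast between the single-measure and the four-rater average-measure ICC reported above can be illustrated with the Spearman-Brown relation, which links the two forms when both come from the same ICC model. The sketch below is an illustration under that assumption; the single-measure value of 0.40 is hypothetical and does not come from the study.

```python
def average_measure_icc(single_icc, k):
    """Average-measure ICC for k raters from a single-measure ICC (Spearman-Brown)."""
    return k * single_icc / (1.0 + (k - 1) * single_icc)


# Hypothetical single-measure ICC of 0.40 averaged over one to four raters.
for raters in (1, 2, 3, 4):
    print(raters, round(average_measure_icc(0.40, raters), 3))
```

Averaging the ratings of several raters reduces rater-specific error, which is consistent with the abstract's recommendation that collaborative rating by more than one nurse can improve inter-rater reliability.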
[Inter-rater reliability and construct validity of the OPD-CA axis structure: first study results regarding the integration of OPD-CA into clinical practice].

PubMed

Cropp, Carola; Salzer, Simone; Häusser, Leonard F; Streeck-Fischer, Annette

2013-01-01

The axis structure of the Operationalized Psychodynamic Diagnostics in childhood and adolescence (OPD-CA) has proven to be a reliable and valid diagnostic tool under research conditions. However, corresponding data regarding the integration of OPD-CA axis structure into clinical practice is still lacking. Hence, this aspect was examined as part of a randomized controlled clinical trial realized at Asklepios Fachklinikum Tiefenbrunn. Here, the OPD-CA axis structure has been applied to assess the structural level of 42 adolescent patients (15-19 years). In contrast to previous studies, the assessment was not carried out by independent raters using a videotaped OPD-CA interview, but the rating was part of clinical routine procedures. Also under these conditions, inter-rater reliability was high, in particular regarding the four subscales of the OPD-CA axis structure. With respect to construct validity, the results of our study supported a two-factor solution, which is in accordance with the findings of two previous works. One factor corresponded to the dimension "self-regulation" while the other factor included both the dimension "self-perception and object perception" as well as the dimension "communication skills". Implications of the findings for research and practice are discussed.
Semi-structured Interview Measure of Stigma (SIMS) in psychosis: Assessment of psychometric properties.

PubMed

Wood, Lisa; Burke, Eilish; Byrne, Rory; Enache, Gabriela; Morrison, Anthony P

2016-10-01

Stigma is a significant difficulty for people who experience psychosis. To date, there have been no outcome measures developed to examine stigma exclusively in people with psychosis. The aim of this study was to develop and validate a semi-structured interview measure of stigma (SIMS) in psychosis. The SIMS is an eleven-item measure of stigma developed in consultation with service users who have experienced psychosis. 79 participants with experience of psychosis were recruited for the purposes of this study. They were administered the SIMS alongside a battery of other relevant outcome measures to examine reliability and validity. A one-factor solution was identified for the SIMS which encompassed all ten rateable items. The measure met all reliability and validity criteria and illustrated good internal consistency, inter-rater reliability, test-retest reliability, criterion validity, construct validity, sensitivity to change and had no floor or ceiling effects. The SIMS is a reliable and valid measure of stigma in psychosis. It may be more engaging and acceptable than other stigma measures due to its semi-structured interview format. Crown Copyright © 2016. Published by Elsevier B.V. All rights reserved.
Diagnostic reliability of MMPI-2 computer-based test interpretations.

PubMed

Pant, Hina; McCabe, Brian J; Deskovitz, Mark A; Weed, Nathan C; Williams, John E

2014-09-01

Reflecting the common use of the MMPI-2 to provide diagnostic considerations, computer-based test interpretations (CBTIs) also typically offer diagnostic suggestions. However, these diagnostic suggestions can sometimes be shown to vary widely across different CBTI programs even for identical MMPI-2 profiles. The present study evaluated the diagnostic reliability of 6 commercially available CBTIs using a 20-item Q-sort task developed for this study. Four raters each sorted diagnostic classifications based on these 6 CBTI reports for 20 MMPI-2 profiles. Two questions were addressed. First, do users of CBTIs understand the diagnostic information contained within the reports similarly? Overall, diagnostic sorts of the CBTIs showed moderate inter-interpreter diagnostic reliability (mean r = .56), with sorts for the 1/2/3 profile showing the highest inter-interpreter diagnostic reliability (mean r = .67). Second, do different CBTI programs vary with respect to diagnostic suggestions? It was found that diagnostic sorts of the CBTIs had a mean inter-CBTI diagnostic reliability of r = .56, indicating moderate but not strong agreement across CBTIs in terms of diagnostic suggestions. The strongest inter-CBTI diagnostic agreement was found for sorts of the 1/2/3 profile CBTIs (mean r = .71). Limitations and future directions are discussed. PsycINFO Database Record (c) 2014 APA, all rights reserved.
An assessment of the inter-rater reliability of the ASA physical status score in the orthopaedic trauma population.

PubMed

Ihejirika, Rivka C; Thakore, Rachel V; Sathiyakumar, Vasanth; Ehrenfeld, Jesse M; Obremskey, William T; Sethi, Manish K

2015-04-01

Although recent literature has demonstrated the utility of the ASA score in predicting postoperative length of stay, complication risk and potential utilization of other hospital resources, the ASA score has been inconsistently assigned by anaesthesia providers. This study tested the reliability of assignment of the ASA score classification by both attending anaesthesiologists and anaesthesia residents specifically among the orthopaedic trauma patient population. Nine case-based scenarios were created involving preoperative patients with isolated operative orthopaedic trauma injuries. The cases were created and assigned a reference score by both an attending anaesthesiologist and orthopaedic trauma surgeon. Attending and resident anaesthesiologists were asked to assign an ASA score for each case. Rater versus reference and inter-rater agreement amongst respondents were then analyzed utilizing Fleiss's Kappa and weighted and unweighted Cohen's Kappa. Thirty-three individuals provided ASA scores for each of the scenarios. The average rater versus reference reliability was substantial (Kw=0.78, SD=0.131, 95% CI=0.73-0.83). The average rater versus reference Kuw was also substantial (Kuw=0.64, SD=0.21, 95% CI=0.56-0.71). The inter-rater reliability as evaluated by Fleiss's Kappa was moderate (K=0.51, p<.001). Inter-rater agreement within the group of attendings (K=0.50, p<.001) and within the group of residents (K=0.55, p<.001) was moderate in both cases. There was a significant increase in the level of inter-rater reliability from the self-reported 'very uncomfortable' participants to the 'very comfortable' participants (uncomfortable K=0.43, comfortable K=0.59, p<.001). This study shows substantial agreement strength for reliability of the ASA score among anaesthesiologists when evaluating orthopaedic trauma patients.
The significant increase in inter-rater reliability based on anaesthesiologists' comfort with the ASA scoring method implies a need for further evaluation of ASA assessment training and routine use on the ground. These findings support the use of the ASA score as a statistically reliable tool in orthopaedic trauma. Copyright © 2014 Elsevier Ltd. All rights reserved.
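The record above names weighted and unweighted Cohen's kappa without showing how they are computed. The following is a minimal Python sketch of the standard two-rater formula, written in terms of disagreement weights. The function name `cohen_kappa`, the choice between linear and quadratic weights, and the sample ratings are illustrative assumptions; the abstract does not say which weighting scheme the authors used.

```python
def cohen_kappa(r1, r2, categories, weights=None):
    """Cohen's kappa for two raters scoring the same cases.

    categories: ordered list of every possible score.
    weights: None (unweighted), "linear" or "quadratic".
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equally long, non-empty rating lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # counts[i][j] = cases placed in category i by rater 1 and j by rater 2.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        counts[index[a]][index[b]] += 1
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal; grows with category distance when weighted.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(disagreement(i, j) * counts[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


# Hypothetical ASA classes from two raters for nine cases (not study data).
rater_a = [1, 2, 2, 3, 3, 3, 4, 2, 1]
rater_b = [1, 2, 3, 3, 3, 4, 4, 2, 2]
print(round(cohen_kappa(rater_a, rater_b, [1, 2, 3, 4]), 3))
print(round(cohen_kappa(rater_a, rater_b, [1, 2, 3, 4], "linear"), 3))
```

Weighted kappa gives partial credit for near misses on an ordinal scale, which is consistent with the weighted value in the abstract being higher than the unweighted one.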
Reliability and concurrent validity of a new iPhone® goniometric application for measuring active wrist range of motion: a cross-sectional study in asymptomatic subjects.

PubMed

Pourahmadi, Mohammad Reza; Ebrahimi Takamjani, Ismail; Sarrafzadeh, Javad; Bahramian, Mehrdad; Mohseni-Bandpei, Mohammad Ali; Rajabzadeh, Fatemeh; Taghipour, Morteza

2017-03-01

Measurement of wrist range of motion (ROM) is often considered to be an essential component of wrist physical examination. The measurement can be carried out through various instruments such as goniometers and inclinometers. Recent smartphones have been equipped with accelerometers and magnetometers, which, through specific software applications (apps) can be used for goniometric functions. This study, for the first time, aimed to evaluate the reliability and concurrent validity of a new smartphone goniometric app (Goniometer Pro©) for measuring active wrist ROM. In all, 120 wrists of 70 asymptomatic adults (38 men and 32 women; aged 18-40 years) were assessed in a physiotherapy clinic located at the School of Rehabilitation Sciences, Iran University of Medical Science and Health Services, Tehran, Iran. Following the recruitment process, active wrist ROM was measured using a universal goniometer and iPhone® 5 app. Two blinded examiners each utilized the universal goniometer and iPhone® to measure active wrist ROM using a volar/dorsal alignment technique in the following sequences: flexion, extension, radial deviation, and ulnar deviation. The second (2 h later) and third (48 h later) sessions were carried out in the same manner as the first session. All the measurements were conducted three times and the mean value of three repetitions for each measurement was used for analysis. Intraclass correlation coefficient (ICC) models (3, k) and (2, k) were used to determine the intra-rater and inter-rater reliability, respectively. The Pearson correlation coefficients were used to establish concurrent validity of the iPhone® app. Good to excellent intra-rater and inter-rater reliability was demonstrated for the goniometer with ICC values of ≥ 0.82 and ≥ 0.73 and the iPhone® app with ICC values of ≥ 0.83 and ≥ 0.79, respectively. Minimum detectable change at the 95% confidence level (MDC95) was computed as 1.96 × standard error of measurement × √2. The MDC95 ranged from 1.66° to 5.35° for the intra-rater analysis and from 1.97° to 6.15° for the inter-rater analysis. The concurrent validity between the two instruments was high, with r values of ≥ 0.80. From the results of this cross-sectional study, it can be concluded that the iPhone® app possesses good to excellent intra-rater and inter-rater reliability and concurrent validity. It seems that this app can be used for the measurement of wrist ROM. However, further research is needed to evaluate symptomatic subjects using this app. © 2016 Anatomical Society.
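The record above names ICC models (3, k) and (2, k) and gives the minimum detectable change as 1.96 × standard error of measurement × √2. A minimal Python sketch of that chain follows. The ICC forms are the usual two-way ANOVA definitions attributed to Shrout and Fleiss, and the step SEM = SD × √(1 − ICC) is a common convention that the abstract does not state, so both are assumptions here. The function names and the sample angles are hypothetical.

```python
import math
import statistics


def icc_two_way(data):
    """Two-way ICCs for a complete table: rows = subjects, columns = raters."""
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(r) != k for r in data):
        raise ValueError("need a complete table with >= 2 subjects and raters")
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(r) / k for r in data]
    col_means = [sum(r[j] for r in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in data for x in r)
    msr = ss_rows / (n - 1)                        # between-subject mean square
    msc = ss_cols / (k - 1)                        # between-rater mean square
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return {
        "ICC(2,1)": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "ICC(3,1)": (msr - mse) / (msr + (k - 1) * mse),
        "ICC(2,k)": (msr - mse) / (msr + (msc - mse) / n),
        "ICC(3,k)": (msr - mse) / msr,
    }


def sem_from_icc(sd, icc):
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1] for this formula")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimum detectable change: 1.96 * SEM * sqrt(2), as in the abstract."""
    return 1.96 * sem * math.sqrt(2)


# Hypothetical wrist flexion angles (degrees), two examiners, five wrists.
angles = [[62, 64], [70, 69], [55, 58], [81, 80], [66, 65]]
icc = icc_two_way(angles)["ICC(2,k)"]
sd = statistics.stdev(x for row in angles for x in row)
print(round(icc, 3), round(mdc95(sem_from_icc(sd, icc)), 2))
```

The √2 factor reflects that a change score is the difference of two measurements, each carrying its own measurement error.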
The inter-rater reliability and prognostic value of coma scales in Nepali children with acute encephalitis syndrome.

PubMed

Ray, Stephen; Rayamajhi, Ajit; Bonnett, Laura J; Solomon, Tom; Kneen, Rachel; Griffiths, Michael J

2018-02-01

Background: Acute encephalitis syndrome (AES) is a common cause of coma in Nepali children. The Glasgow coma scale (GCS) is used to assess the level of coma in these patients and predict outcome. Alternative coma scales may have better inter-rater reliability and prognostic value in encephalitis in Nepali children, but this has not been studied. The Adelaide coma scale (ACS), Blantyre coma scale (BCS) and the Alert, Verbal, Pain, Unresponsive scale (AVPU) are alternatives to the GCS which can be used. Methods: Children aged 1-14 years who presented to Kanti Children's Hospital, Kathmandu with AES between September 2010 and November 2011 were recruited. All four coma scales (GCS, ACS, BCS and AVPU) were applied on admission, 48 h later and on discharge. Inter-rater reliability (unweighted kappa) was measured for each. Correlation and agreement between total coma score and outcome (Liverpool outcome score) were measured by Spearman's rank correlation and Bland-Altman plot. The prognostic value of coma scales alone and in combination with physiological variables was investigated in a subgroup (n = 22). A multivariable logistic regression model was fitted by backward stepwise selection. Results: Fifty children were recruited. Inter-rater reliability using the various scales was fair to moderate. However, the scales poorly predicted clinical outcome. Combining the scales with physiological parameters such as systolic blood pressure improved outcome prediction. Conclusion: This is the first study to compare four coma scales in Nepali children with AES. The scales exhibited fair to moderate inter-rater reliability. However, the study is inadequately powered to answer the question on the relationship between coma scales and outcome. Further larger studies are required.
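The record above mentions a Bland-Altman plot as its agreement measure. The numbers behind such a plot are the mean difference (bias) and the 95% limits of agreement, sketched below in Python. This is the conventional definition, bias ± 1.96 × SD of the paired differences, and is assumed rather than taken from the abstract. The function name and the paired scores are hypothetical.

```python
import statistics


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)      # systematic difference between methods
    sd = statistics.stdev(diffs)       # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired scores from two scales on six patients (not study data).
scale_1 = [12, 9, 14, 7, 10, 13]
scale_2 = [11, 9, 12, 8, 10, 11]
bias, low, high = bland_altman(scale_1, scale_2)
print(round(bias, 2), round(low, 2), round(high, 2))
```

In the plot itself, each pair's difference is drawn against the pair's mean, with horizontal lines at the three returned values.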
Understanding Expenditure Data.

ERIC Educational Resources Information Center

Dyke, Frances L.

2000-01-01

Stresses the importance of common understandings of cost definitions and data collection in order to create reliable databases with optimal utility for inter-institutional analysis. Examines definitions of common expenditure categories, discusses cost-accumulation rules governing financial reporting, and explains differences between direct costs…
Assessing Variations in Areal Organization for the Intrinsic Brain: From Fingerprints to Reliability

PubMed Central

Xu, Ting; Opitz, Alexander; Craddock, R. Cameron; Wright, Margaret J.; Zuo, Xi-Nian; Milham, Michael P.

2016-01-01

Resting state fMRI (R-fMRI) is a powerful in-vivo tool for examining the functional architecture of the human brain. Recent studies have demonstrated the ability to characterize transitions between functionally distinct cortical areas through the mapping of gradients in intrinsic functional connectivity (iFC) profiles. To date, this novel approach has primarily been applied to iFC profiles averaged across groups of individuals, or in one case, a single individual scanned multiple times. Here, we used a publicly available R-fMRI dataset, in which 30 healthy participants were scanned 10 times (10 min per session), to investigate differences in full-brain transition profiles (i.e., gradient maps, edge maps) across individuals, and their reliability. 10-min R-fMRI scans were sufficient to achieve high accuracies in efforts to “fingerprint” individuals based upon full-brain transition profiles. Regarding test–retest reliability, the image-wise intraclass correlation coefficient (ICC) was moderate, and vertex-level ICC varied depending on region; larger durations of data yielded higher reliability scores universally. Initial application of gradient-based methodologies to a recently published dataset obtained from twins suggested inter-individual variation in areal profiles might have genetic and familial origins. Overall, these results illustrate the utility of gradient-based iFC approaches for studying inter-individual variation in brain function. PMID:27600846
German version, inter- and intrarater reliability and internal consistency of the "Agitated Behavior Scale" (ABS-G) in patients with moderate to severe traumatic brain injury.

PubMed

Hellweg, Stephanie; Schuster-Amft, Corina

2016-07-19

Agitation is frequently observed during early recovery after traumatic brain injury (TBI). Agitated behaviour often interferes with goal-orientated rehabilitation and can be a substantial hindrance to therapy. Despite the relatively high occurrence of agitation in the TBI population, there is no objective assessment available in German (G). An existing scale with excellent psychometric properties is the "Agitated Behavior Scale (ABS)" developed by Corrigan in 1989. The aim of the study was to translate the Agitated Behavior Scale (ABS) into German (ABS-G) and investigate the inter- and intrarater reliability and internal consistency in patients with moderate to severe TBI. A formal nine-step translation and cross-cultural adaptation procedure (TCCA) was applied. Subsequently a prospective observational patient study was conducted. To examine the interrater reliability and internal consistency, two therapists rated 20 patients independently after a therapy session. This procedure was repeated twice on a weekly basis. The intrarater reliability was assessed through video recordings from three patients. Nine raters scored the demonstrated behaviour on the videotape with the ABS-G independently twice within one month. The inter- and intrarater reliability were evaluated with the Spearman rank correlation coefficient and the quadratic weighted kappa. The internal consistency was tested with Cronbach's alpha. Behaviour of 20 patients (18 males; mean age 41 ± 20.7; mean Functional Independence Measure (FIM) cognitive score on admission 7.1 ± 4.04; mean ABS-G score at first observation 17.3 ± 2.83) was assessed threefold. Interrater reliability yielded a correlation coefficient for ABS-G total score of all 60 paired observations of r_s = 0.845 and a weighted Kappa of 0.738. Intrarater reliability for ABS-G total score ranged between r_s = 0.719 and 0.953 and showed a weighted Kappa between 0.871 and 0.953. Cronbach's alpha indicated moderate internal consistency with 0.661. This study demonstrates that the ABS-G is a reliable instrument for evaluating agitation in patients with moderate to severe TBI. This would make it possible to monitor agitation objectively and optimise the management of agitated patients according to international recommendations.
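The record above reports internal consistency as Cronbach's alpha. A minimal Python sketch of the usual formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total score), is given below. The function name `cronbach_alpha` and the sample item scores are hypothetical and are not taken from the study.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha; one row per observation, one column per scale item."""
    if len(scores) < 2:
        raise ValueError("need at least two observations")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least two items")
    # Sample variance of each item across observations.
    item_vars = [statistics.variance([row[j] for row in scores])
                 for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = statistics.variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings on a four-item scale for five observations.
ratings = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [1, 1, 2, 1],
    [4, 3, 4, 4],
]
print(round(cronbach_alpha(ratings), 3))
```

Alpha rises when items covary strongly relative to their individual variances, so a moderate value such as the one in the abstract suggests that some items track the total score only loosely.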
Six of one, half a dozen of the other: A measure of multidisciplinary inter/intra-rater reliability of the society for fetal urology and urinary tract dilation grading systems for hydronephrosis.

PubMed

Rickard, Mandy; Easterbrook, Bethany; Kim, Soojin; Farrokhyar, Forough; Stein, Nina; Arora, Steven; Belostotsky, Vladamir; DeMaria, Jorge; Lorenzo, Armando J; Braga, Luis H

2017-02-01

The urinary tract dilation (UTD) classification system was introduced to standardize terminology in the reporting of hydronephrosis (HN), and bridge a gap between pre- and postnatal classification such as the Society for Fetal Urology (SFU) grading system. Herein we compare the intra/inter-rater reliability of both grading systems. SFU (I-IV) and UTD (I-III) grades were independently assigned by 13 raters (9 pediatric urology staff, 2 nephrologists, 2 radiologists), twice, 3 weeks apart, to 50 sagittal postnatal ultrasonographic views of hydronephrotic kidneys. Data regarding ureteral measurements and bladder abnormalities were included to allow proper UTD categorization. Ten images were repeated to assess intra-rater reliability. Krippendorff's alpha coefficient was used to measure overall and by-grade intra/inter-rater reliability. Reliability between specialties and training levels was also analyzed. Overall inter-rater reliability was slightly higher for SFU (α = 0.842, 95% CI 0.812-0.879, in session 1; and α = 0.808, 95% CI 0.775-0.839, in session 2) than for UTD (α = 0.774, 95% CI 0.715-0.827, in session 1; and α = 0.679, 95% CI 0.605-0.750, in session 2). Reliability for intermediate grades (SFU II/III and UTD 2) of HN was poor regardless of the system. Reliabilities for SFU and UTD classifications among Urology, Nephrology, and Radiology, as well as between training levels, were not significantly different. Despite the introduction of HN grading systems to standardize the interpretation and reporting of renal ultrasound in infants with HN, none have been proven superior in allowing clinicians to distinguish between "moderate" grades. While this study demonstrated high reliability in distinguishing between "mild" (SFU I/II and UTD 1) and "severe" (SFU IV and UTD 3) grades of HN, the overall reliability between specialties was poor. This is in keeping with a previous report of modest inter-rater reliability of the SFU system.
This drawback is likely explained by the subjective interpretation required to assign grades, which can be impacted by experience, image quality, and scanning technique. As shown in the figure, which demonstrates SFU II (a) and SFU III (b), as assigned by a radiologist, it is possible to make an argument that either of these images can be classified into both categories that were observed during the grading sessions of this study. Although both systems have acceptable reliability, the SFU grading system showed higher overall intra/inter-rater reliability regardless of rater specialty than the UTD classification. Inter-rater reliability for SFU grades II/III and UTD 2 was low, highlighting the limitations of both classifications in regards to properly segregating moderate HN grades. Copyright © 2016 Journal of Pediatric Urology Company. Published by Elsevier Ltd. All rights reserved.
Translation, adaptation and inter-rater reliability of the administration manual for the Fugl-Meyer assessment.

PubMed

Michaelsen, Stella M; Rocha, André S; Knabben, Rodrigo J; Rodrigues, Luciano P; Fernandes, Claudia G C

2011-01-01

Recently, the reliability of the Brazilian version of the Fugl-Meyer Assessment (FMA) was assessed through the scoring given according to observations made by a single evaluator who applied the test. When different raters apply the scale, the reliability may depend on the interpretation given to the assessment sheet. In such cases, a clear administration manual is essential for ensuring homogeneity of application. To translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate the inter-rater reliability when different evaluators apply the FMA on the basis of the information contained in the manual. Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part in this study. Eight patients participated in the first part of the study and 10 in the second part. Based on analyzing the results from part 1, an adapted version was developed, in which information and photos were added to illustrate the positions of the patient and evaluator. The inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The reliability of the FMA based on the adapted version of the manual was excellent for the total motor scores for the upper limbs (ICC=0.98) and lower limbs (ICC=0.90), as well as for movement sense (ICC=0.98) and upper and lower-limb passive range of motion (ICC=0.84 and 0.90, respectively). The reliability was moderate for tactile sensitivity (0.75). The joint pain assessment presented low reliability. The results showed that, except for pain assessment, application of the FMA based on the adapted version of the application manual for Brazilian Portuguese presented adequate inter-rater reliability.
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies.

PubMed

Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry; Kunz, Regina

2017-01-25

To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Systematic review and narrative synthesis of reproducibility studies. Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range: intraclass correlation coefficient 0.86 to κ = −0.10).
Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) of studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies' generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). Despite their common use and far reaching consequences for workers claiming disabling injury or illness, research on the reliability of medical evaluations of disability for work is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed. Published by the BMJ Publishing Group Limited.

Reliability of visual and instrumental color matching.

PubMed

Igiel, Christopher; Lehmann, Karl Martin; Ghinea, Razvan; Weyhrauch, Michael; Hangx, Ysbrand; Scheller, Herbert; Paravina, Rade D

2017-09-01

The aim of this investigation was to evaluate intra-rater and inter-rater reliability of visual and instrumental shade matching. Forty individuals with normal color perception participated in this study. The right maxillary central incisor of a teaching model was prepared and restored with 10 feldspathic all-ceramic crowns of different shades. A shade matching session consisted of the observer (rater) visually selecting the best match by using VITA classical A1-D4 (VC) and VITA Toothguide 3D Master (3D) shade guides and the VITA Easyshade Advance intraoral spectrophotometer (ES) to obtain both VC and 3D matches. Three shade matching sessions were held with 4 to 6 weeks between sessions. Intra-rater reliability was assessed based on the percentage of agreement for the three sessions for the same observer, whereas the inter-rater reliability was calculated as mean percentage of agreement between different observers. The Fleiss' Kappa statistical analysis was used to evaluate visual inter-rater reliability. The mean intra-rater reliability for the visual shade selection was 64(11) for VC and 48(10) for 3D. The corresponding ES values were 96(4) for both VC and 3D. The percentages of observers who matched the same shade with VC and 3D were 55(10) and 43(12), respectively, while corresponding ES values were 88(8) for VC and 92(4) for 3D. The results for visual shade matching exhibited a high to moderate level of inconsistency for both intra-rater and inter-rater comparisons. The VITA Easyshade Advance intraoral spectrophotometer exhibited significantly better reliability compared with visual shade selection. This study evaluates the ability of observers to consistently match the same shade visually and with a dental spectrophotometer in different sessions. 
The intra-rater and inter-rater reliability (agreement of repeated shade matching) of visual and instrumental tooth color matching strongly suggest the use of color matching instruments as a supplementary tool in everyday dental practice to enhance the esthetic outcome. © 2017 Wiley Periodicals, Inc.
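The record above uses Fleiss' kappa for agreement among many observers. A minimal Python sketch of the standard calculation follows, assuming every item is judged by the same number of raters. The function name `fleiss_kappa` and the small table of shade choices are hypothetical and are not data from the study.

```python
def fleiss_kappa(table):
    """Fleiss' kappa.

    table[i][j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each item needs the same number (>= 2) of raters")
    n_categories = len(table[0])
    total = n_items * n_raters

    # Share of all assignments that fell in each category.
    p_cat = [sum(row[j] for row in table) / total for j in range(n_categories)]
    # Per-item agreement: proportion of rater pairs that agree on the item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in table]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: 4 crowns, 6 observers, 3 candidate shades.
choices = [
    [6, 0, 0],
    [4, 2, 0],
    [1, 4, 1],
    [0, 1, 5],
]
print(round(fleiss_kappa(choices), 3))
```

Unlike the mean percentage of agreement also reported in the abstract, kappa discounts the agreement that would be expected if observers chose shades at the overall category frequencies.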
Rater agreement reliability of the dial test in the ACL-deficient knee.

PubMed

Slichter, Malou E; Wolterbeek, Nienke; Auw Yang, K Gie; Zijl, Jacco A C; Piscaer, Tom M

2018-06-14

Posterolateral rotatory instability (PLRI) of the knee can easily be missed, because attention is paid to injury of the cruciate ligaments. If left untreated this clinical instability may persist after reconstruction of the cruciate ligaments and may put the graft at risk of failure. Even though the dial test is widely used to diagnose PLRI, no validity and reliability studies of the manual dial test have yet been performed in patients. This study focuses on the reliability of the manual dial test by determining the rater agreement. Two independent examiners performed the dial test in knees of 52 patients after knee distortion with a suspicion of ACL rupture. The dial test was performed in the prone position at 30°, 60° and 90° of knee flexion. A side-to-side difference of ≥10° was considered a positive dial test. For quantification of the amount of rotation in degrees, a measuring device was used with a standardized 6 Nm force, using a digital torque adapter on a booth. The intra-rater, inter-rater and rater-device agreement were determined by calculating kappa (κ) for the dial test. A positive dial test was found in 21.2% and 18.0% of the patients as assessed by a blinded examiner and orthopaedic surgeon respectively. Fair inter-rater agreement was found in 30° of flexion, κ_F = 0.29 (95% CI: 0.01 to 0.56), p = 0.044 and 90° of flexion, κ_F = 0.38 (95% CI: 0.10 to 0.66), p = 0.007. Almost perfect rater-device agreement was found in 30° of flexion, κ_C = 0.84 (95% CI: 0.52 to 1.15), p < 0.001. Moderate rater-device agreement was found in 30° and 90° combined, κ_C = 0.50 (95% CI: 0.13 to 0.86), p = 0.008. No significant intra-rater agreement was found. Rater agreement reliability of the manual dial test is questionable. It has a fair inter-rater agreement in 30° and 90° of flexion.
Validation of the secretion severity rating scale.

PubMed

Pluschinski, Petra; Zaretsky, Eugen; Stöver, Timo; Murray, Joseph; Sader, Robert; Hey, Christiane

2016-10-01

Accumulation of secretions within the hypopharynx, aditus laryngis, and trachea is one characteristic of severe dysphagia and is of high clinical and therapeutic relevance. For grading the secretion severity level, a secretion scale was provided by Murray et al. in 1996. The purpose of the study presented here is the validation of this scale by analyzing the intra-rater and inter-rater reliability as well as concurrent validity. For examination of reliability and validity, a reference standard was defined by two expert clinicians who reviewed 40 video recordings of fiberendoscopic swallowing evaluations, with 10 videos for each severity grade. These videos were rated and rerated independently and blinded by 4 ENT residents with an interval of 4 weeks. Both the intra-rater (Kendall's τ > 0.847***) and inter-rater reliability (Kendall's W > 0.951***) were highly significant and can be considered good or very good. Correlation of the median of all ratings with the reference standard was close to the highest possible value of 1 (τ = 0.984***). The scale proved to be a reliable and valid instrument for grading one of the principal symptoms of oropharyngeal dysphagia and is recommended as an evidence-based instrument for standardized fiberoptic endoscopic evaluation of swallowing.
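The record above reports inter-rater reliability as Kendall's W (coefficient of concordance). A minimal Python sketch follows, using average ranks and the usual tie correction, since a four-grade scale applied to 40 videos necessarily produces many ties. Whether the authors applied a tie correction is not stated, so that part is an assumption. The function names and sample grades are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W with tie correction.

    ratings[r][s] = grade that rater r gave to subject s.
    """
    m, n = len(ratings), len(ratings[0])
    if m < 2 or n < 2 or any(len(row) != n for row in ratings):
        raise ValueError("need a complete table with >= 2 raters and subjects")
    rank_sums = [0.0] * n
    tie_term = 0
    for row in ratings:
        for s, rank in enumerate(average_ranks(row)):
            rank_sums[s] += rank
        # Sum of (t^3 - t) over groups of tied grades for this rater.
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    mean_sum = m * (n + 1) / 2
    s_stat = sum((r - mean_sum) ** 2 for r in rank_sums)
    denom = m * m * (n ** 3 - n) - m * tie_term
    if denom == 0:
        raise ValueError("W is undefined when every rater gives a single grade")
    return 12 * s_stat / denom


# Hypothetical severity grades (0-3) from four raters for six recordings.
grades = [
    [0, 1, 1, 2, 3, 3],
    [0, 1, 2, 2, 3, 3],
    [0, 0, 1, 2, 2, 3],
    [1, 1, 1, 2, 3, 3],
]
print(round(kendalls_w(grades), 3))
```

W ranges from 0 (no concordance among raters) to 1 (identical rankings by all raters).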
Reliability of anthropometric measurements in European preschool children: the ToyBox-study.

PubMed

De Miguel-Etayo, P; Mesana, M I; Cardon, G; De Bourdeaudhuij, I; Góźdź, M; Socha, P; Lateva, M; Iotova, V; Koletzko, B V; Duvinage, K; Androutsos, O; Manios, Y; Moreno, L A

2014-08-01

The ToyBox-study aims to develop and test an innovative and evidence-based obesity prevention programme for preschoolers in six European countries: Belgium, Bulgaria, Germany, Greece, Poland and Spain. In multicentre studies, anthropometric measurements using standardized procedures that minimize errors in the data collection are essential to maximize reliability of measurements. The aim of this paper is to describe the standardization process and reliability (intra- and inter-observer) of height, weight and waist circumference (WC) measurements in preschoolers. All technical procedures and devices were standardized and centralized training was given to the fieldworkers. At least seven children per country participated in the intra- and inter-observer reliability testing. Intra-observer technical error ranged from 0.00 to 0.03 kg for weight and from 0.07 to 0.20 cm for height, with the overall reliability being above 99%. A second training was organized for WC due to low reliability observed in the first training. Intra-observer technical error for WC ranged from 0.12 to 0.71 cm during the first training and from 0.05 to 1.11 cm during the second training, and reliability above 92% was achieved. Epidemiological surveys need standardized procedures and training of researchers to reduce measurement error. In the ToyBox-study, very good intra- and inter-observer agreement was achieved for all anthropometric measurements performed. © 2014 World Obesity.
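The record above reports a technical error of measurement and a percentage reliability without giving formulas. A minimal Python sketch of the definitions commonly used in anthropometry follows: TEM = √(Σd² / 2n) for duplicate measurements, and reliability = 1 − TEM² / SD². These formulas are an assumption here, not a statement of the ToyBox protocol. The function names and sample heights are hypothetical.

```python
import math
import statistics


def technical_error(first, second):
    """Technical error of measurement for duplicate measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty series")
    n = len(first)
    squared = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared / (2 * n))


def reliability_percent(tem, sd):
    """Share of total variance not due to measurement error, in percent."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 100 * (1 - tem ** 2 / sd ** 2)


# Hypothetical repeated height measurements (cm) by one observer.
height_1 = [101.2, 98.4, 105.0, 110.3, 96.7, 103.5, 99.9]
height_2 = [101.0, 98.5, 105.2, 110.1, 96.9, 103.4, 100.0]
tem = technical_error(height_1, height_2)
sd = statistics.stdev(height_1 + height_2)
print(round(tem, 3), round(reliability_percent(tem, sd), 2))
```

Because reliability depends on the spread of the sample as well as on TEM, the same absolute error yields a lower percentage for a measure with little between-child variation.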
Comparative study between the hand-wrist method and cervical vertebral maturation method for evaluation skeletal maturity in cleft patients.

PubMed

Manosudprasit, Montian; Wangsrimongkol, Tasanee; Pisek, Poonsak; Chantaramungkorn, Melissa

2013-09-01

To test the measure of agreement between use of the Skeletal Maturation Index (SMI) method of Fishman using hand-wrist radiographs and the Cervical Vertebral Maturation Index (CVMI) method for assessing skeletal maturity of cleft patients. Hand-wrist and lateral cephalometric radiographs of 60 cleft subjects (35 females and 25 males, age range: 7-16 years) were used. Skeletal age was assessed using an adjustment to the SMI method of Fishman to compare with the CVMI method of Hassel and Farman. Agreement between skeletal age assessed by both methods and the intra- and inter-examiner reliability of both methods were tested by weighted kappa analysis. There was good agreement between the two methods with a kappa value of 0.80 (95% CI = 0.66-0.88, p-value <0.001). Intra- and inter-examiner reliability of both methods was very good, with kappa values ranging from 0.91 to 0.99. The CVMI method can be used as an alternative to the SMI method in skeletal age assessment in cleft patients, with the benefits of requiring no additional radiograph and avoiding extra radiation exposure. Comparing the two methods, the present study found better agreement from peak of adolescence onwards.
Is Ultrasound a Valid and Reliable Imaging Modality for Airway Evaluation?: An Observational Computed Tomographic Validation Study Using Submandibular Scanning of the Mouth and Oropharynx.

PubMed

Abdallah, Faraj W; Yu, Eugene; Cholvisudhi, Phantila; Niazi, Ahtsham U; Chin, Ki J; Abbas, Sherif; Chan, Vincent W

2017-01-01

Ultrasound (US) imaging of the airway may be useful in predicting difficulty of airway management (DAM); but its use is limited by lack of proof of its validity and reliability. We sought to validate US imaging of the airway by comparison to CT-scan, and to assess its inter- and intra-observer reliability. We used submandibular sonographic imaging of the mouth and oropharynx to examine how well the ratio of tongue thickness to oral cavity height correlates with the ratio of tongue volume to oral cavity volume, an established tomographic measure of DAM. A cohort of 34 patients undergoing CT-scan was recruited. Study standardized assessments included CT-measured ratios of tongue volume to oropharyngeal cavity volume; tongue thickness to oral cavity height; and US-measured ratio of tongue thickness to oral cavity height. Two sonographers independently performed US imaging of the airway before and after CT-scan. Our findings indicate that the US-measured ratio of tongue thickness to oral cavity height highly correlates with the CT-measured ratio of tongue volume to oral cavity volume. US measurements also demonstrated strong inter- and intra-observer reliability. This study suggests that US is a valid and reliable tool for imaging the oral and oropharyngeal parts of the airway, as well as for measuring the volumetric relationship between the tongue and oral cavity, and may therefore be a useful predictor of DAM. © 2016 by the American Institute of Ultrasound in Medicine.
Mixed methods evaluation of a quality improvement and audit tool for nurse-to-nurse bedside clinical handover in ward settings.

PubMed

Redley, Bernice; Waugh, Rachael

2018-04-01

Nurse bedside handover quality is influenced by complex interactions related to the content, processes used and the work environment. Audit tools are seldom tested in 'real' settings. Examine the reliability, validity and usability of a quality improvement tool for audit of nurse bedside handover. Naturalistic, descriptive, mixed-methods. Six inpatient wards at a single large not-for-profit private health service in Victoria, Australia. Five nurse experts and 104 nurses involved in 199 change-of-shift bedside handovers. A focus group with experts and pilot test were used to examine content and face validity, and usability of the handover audit tool. The tool was examined for inter-rater reliability and usability using observation audits of handovers across six wards. Data were collected in 2013-2014. Two independent observers for 72 audits demonstrated acceptable inter-observer agreement for 27 (77%) items. Reliability was weak for items examining the handover environment. Seventeen items were not observed reflecting gaps in practices. Across 199 observation audits, gaps in nurse bedside handover practice most often related to process and environment, rather than content items. Usability was impacted by high observer burden, familiarity and non-specific illustrative behaviours. The reliability and validity of most items to audit handover content was acceptable. Gaps in practices for process and environment items were identified. Context specific exemplars and reducing the items used at each handover audit can enhance usability. Further research is needed to develop context specific exemplars and undertake additional reliability testing using a wide range of handover settings. Copyright © 2017 Elsevier Inc. All rights reserved.
Reliability of Real-time Ultrasound Imaging for the Assessment of Trunk Stabilizer Muscles: A Systematic Review of the Literature.

PubMed

Taghipour, Morteza; Mohseni-Bandpei, Mohammad Ali; Behtash, Hamid; Abdollahi, Iraj; Rajabzadeh, Fatemeh; Pourahmadi, Mohammad Reza; Emami, Mahnaz

2018-04-24

Rehabilitative ultrasound (US) imaging is one of the popular methods for investigating muscle morphologic characteristics and dimensions in recent years. The reliability of this method has been investigated in different studies. As studies have been performed with different designs and quality, reported values of rehabilitative US have a wide range. The objective of this study was to systematically review the literature conducted on the reliability of rehabilitative US imaging for the assessment of deep abdominal and lumbar trunk muscle dimensions. The PubMed/MEDLINE, Scopus, Google Scholar, Science Direct, Embase, Physiotherapy Evidence, Ovid, and CINAHL databases were searched to identify original research articles conducted on the reliability of rehabilitative US imaging published from June 2007 to August 2017. The articles were qualitatively assessed; reliability data were extracted; and the methodological quality was evaluated by 2 independent reviewers. Of the 26 included studies, 16 were considered of high methodological quality. Except for 2 studies, all high-quality studies reported intraclass correlation coefficients (ICCs) for intra-rater reliability of 0.70 or greater. Also, ICCs reported for inter-rater reliability in high-quality studies were generally greater than 0.70. Among low-quality studies, reported ICCs ranged from 0.26 to 0.99 and 0.68 to 0.97 for intra- and inter-rater reliability, respectively. Also, the reported standard error of measurement and minimal detectable change for rehabilitative US were generally in an acceptable range. Generally, the results of the reviewed studies indicate that rehabilitative US imaging has good levels of both inter- and intra-rater reliability. © 2018 by the American Institute of Ultrasound in Medicine.
Reliability and validity of the Turkish version of the Berg Balance Scale.

PubMed

Sahin, Fusun; Yilmaz, Figen; Ozmaden, Asli; Kotevolu, Nurdan; Sahin, Tulay; Kuran, Banu

2008-01-01

The purpose of this study was to develop a Turkish version of the Berg Balance Scale (BBS) and assess its reliability and validity. Sixty healthy volunteers older than 65 years were included in the study. Subjects who had a lower extremity amputation, or were armchair-bound or bedridden, were excluded. After the translation process, the Turkish version of the scale was administered to each participant twice with an interval of 2 weeks. The intraclass correlation coefficient (ICC) was calculated to assess intra- and inter-observer reliability. Cronbach's alpha was calculated to evaluate internal consistency of the total BBS score. The intraclass correlation coefficient was also calculated to examine test-retest reliability. Convergent validity was assessed by correlating the scale with the Modified Barthel Index (MBI) and Timed Up and Go Test (TUG). Construct validity was assessed with factor analysis. The mean age of the participants was 77.00+/-5.67 years (range: 67-92 years). The ICC for intra- and inter-observer reliability was 0.98 (p<0.0001) and 0.97 (p<0.0001), respectively. Cronbach's alpha of the Turkish version of the BBS was 0.98. The test-retest reliability (ICC) of the Turkish version of the BBS was determined as 0.98 for the total score, and ranged from 0.86-0.99 for individual items. In terms of validity, the Turkish version of the BBS was correlated with the MBI (in positive direction) and TUG (in negative direction) (r=0.67 p<0.0001; r=-0.75 p<0.0001, respectively). The Turkish version of the BBS is a reliable and valid scale to be used in balance assessment of Turkish older adults.
Feasibility and inter-rater reliability of the ICU Mobility Scale.

PubMed

Hodgson, Carol; Needham, Dale; Haines, Kimberley; Bailey, Michael; Ward, Alison; Harrold, Megan; Young, Paul; Zanni, Jennifer; Buhr, Heidi; Higgins, Alisa; Presneill, Jeff; Berney, Sue

2014-01-01

The objectives of this study were to develop a scale for measuring the highest level of mobility in adult ICU patients and to assess its feasibility and inter-rater reliability. Growing evidence supports the feasibility, safety and efficacy of early mobilization in the intensive care unit (ICU). However, there are no adequately validated tools to quickly, easily, and reliably describe the mobility milestones of adult patients in ICU. Identifying or developing such a tool is a priority for evaluating mobility and rehabilitation activities for research and clinical care purposes. This study was performed at two ICUs in Australia. Thirty ICU nursing and physiotherapy staff assessed the feasibility of the 'ICU Mobility Scale' (IMS) using a 10-item questionnaire. The inter-rater reliability of the IMS was assessed by 2 junior physical therapists, 2 senior physical therapists, and 16 nursing staff in 100 consecutive medical, surgical or trauma ICU patients. An 11-point IMS scale was developed based on multidisciplinary input. Participating clinicians reported that the scale was clear, with 95% of respondents reporting that it took <1 min to complete. The junior and senior physical therapists showed the highest inter-rater reliability with a weighted Kappa (95% confidence interval) of 0.83 (0.76-0.90), while the senior physical therapists and nurses and the junior physical therapists and nurses had a weighted Kappa of 0.72 (0.61-0.83) and 0.69 (0.56-0.81) respectively. The IMS is a feasible tool with strong inter-rater reliability for measuring the maximum level of mobility of adult patients in the ICU. Copyright © 2014 Elsevier Inc. All rights reserved.
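As an illustration of the weighted Kappa statistic reported above, the following is a minimal Python sketch for two raters scoring the same patients on an ordinal scale. The function name `weighted_kappa` and the `power` parameter are hypothetical, and the abstract does not say whether linear or quadratic weights were used, so both are offered as options rather than as the authors' method.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories, power=1):
    """Weighted Cohen's kappa for two raters on an ordinal scale 0..n_categories-1.

    power=1 gives linear disagreement weights, power=2 quadratic weights.
    (Assumption: the source abstract does not state which weighting was used.)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of subjects")
    n = len(rater_a)
    max_dist = (n_categories - 1) ** power
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (score_a, score_b)
    marg_a = Counter(rater_a)                  # marginal counts for rater A
    marg_b = Counter(rater_b)                  # marginal counts for rater B
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j) ** power / max_dist
            obs_disagreement += weight * observed[(i, j)] / n
            exp_disagreement += weight * (marg_a[i] / n) * (marg_b[j] / n)
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - obs_disagreement / exp_disagreement


# Hypothetical scores on an 11-point (0-10) scale, for illustration only.
junior = [0, 3, 5, 7, 10, 4, 6]
senior = [0, 3, 6, 7, 9, 4, 6]
print(round(weighted_kappa(junior, senior, n_categories=11), 3))
```

Weighting matters on a scale with this many levels: a disagreement of one level is penalised far less than a disagreement of five, which unweighted kappa would treat identically.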
The reliability of WorkWell Systems Functional Capacity Evaluation: a systematic review

PubMed Central

2014-01-01

Background Functional capacity evaluation (FCE) determines a person’s ability to perform work-related tasks and is a major component of the rehabilitation process. The WorkWell Systems (WWS) FCE (formerly known as Isernhagen Work Systems FCE) is currently the most commonly used FCE tool in German rehabilitation centres. Our systematic review investigated the inter-rater, intra-rater and test-retest reliability of the WWS FCE. Methods We performed a systematic literature search of studies on the reliability of the WWS FCE and extracted item-specific measures of inter-rater, intra-rater and test-retest reliability from the identified studies. Intraclass correlation coefficients ≥ 0.75, percentages of agreement ≥ 80%, and kappa coefficients ≥ 0.60 were categorised as acceptable, otherwise they were considered non-acceptable. The extracted values were summarised for the five performance categories of the WWS FCE, and the results were classified as either consistent or inconsistent. Results From 11 identified studies, 150 item-specific reliability measures were extracted. 89% of the extracted inter-rater reliability measures, all of the intra-rater reliability measures and 96% of the test-retest reliability measures of the weight handling and strength tests had an acceptable level of reliability, compared to only 67% of the test-retest reliability measures of the posture/mobility tests and 56% of the test-retest reliability measures of the locomotion tests. Both of the extracted test-retest reliability measures of the balance test were acceptable. Conclusions Weight handling and strength tests were found to have consistently acceptable reliability. Further research is needed to explore the reliability of the other tests as inconsistent findings or a lack of data prevented definitive conclusions. PMID:24674029
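The acceptability rule described in the Methods above can be written down directly. This is a sketch only: the function names and the dictionary keys are hypothetical, and the thresholds are the three cut-offs quoted in the abstract.

```python
# Cut-offs quoted in the abstract: ICC >= 0.75, agreement >= 80%, kappa >= 0.60.
THRESHOLDS = {"icc": 0.75, "percent_agreement": 80.0, "kappa": 0.60}


def is_acceptable(statistic, value):
    """Return True if a reliability measure meets the review's cut-off."""
    return value >= THRESHOLDS[statistic]


def share_acceptable(measures):
    """Percentage of (statistic, value) pairs classified as acceptable."""
    if not measures:
        raise ValueError("no measures supplied")
    hits = sum(is_acceptable(stat, val) for stat, val in measures)
    return 100.0 * hits / len(measures)


# Hypothetical extracted measures, for illustration only.
example = [("icc", 0.81), ("kappa", 0.55), ("percent_agreement", 92.0), ("icc", 0.75)]
print(share_acceptable(example))
```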
Inter- and intraobserver reliability of the vertebral, local and segmental kyphosis in 120 traumatic lumbar and thoracic burst fractures: evaluation in lateral X-rays and sagittal computed tomographies

PubMed Central

Brunner, Alexander; Gühring, Markus; Schmälzle, Traude; Weise, Kuno; Badke, Andreas

2009-01-01

Evaluation of the kyphosis angle in thoracic and lumbar burst fractures is often used to indicate surgical procedures. The kyphosis angle can be measured as vertebral, segmental and local kyphosis according to the method of Cobb. The vertebral, segmental and local kyphosis according to the method of Cobb were measured on 120 lateral X-rays and sagittal computed tomographies of 60 thoracic and 60 lumbar burst fractures by 3 independent observers on 2 separate occasions. Osteoporotic fractures were excluded. The intra- and interobserver reliability of these angles in X-ray and computed tomography was evaluated using the intraclass correlation coefficient (ICC). The segmental kyphosis showed the highest reproducibility, followed by the vertebral kyphosis. For thoracic fractures, segmental kyphosis showed “excellent” inter- and intraobserver reliabilities in X-ray (ICC 0.826, 0.802), and for lumbar fractures “good” to “excellent” inter- and intraobserver reliabilities (ICC = 0.790, 0.803). In computed tomography, the segmental kyphosis showed “excellent” inter- and intraobserver reliabilities (ICC = 0.824, 0.801) for thoracic and “excellent” inter- and intraobserver reliabilities (ICC = 0.874, 0.835) for the lumbar fractures. Regarding both diagnostic work ups (X-ray and computed tomography), significant differences were found in interobserver reliabilities for vertebral kyphosis measured in lumbar fracture X-rays (p = 0.035) and interobserver reliabilities for local kyphosis measured in thoracic fracture X-rays (p = 0.010). Regarding both fracture localizations (thoracic and lumbar fractures), significant differences were found only in interobserver reliabilities for the local kyphosis measured in computed tomographies (p = 0.045) and in intraobserver reliabilities for the vertebral kyphosis measured in X-rays (p = 0.024).
“Good” to “excellent” inter- and intraobserver reliabilities for vertebral, segmental and local kyphosis in X-ray make these angles a helpful tool for indicating surgical procedures. For practical use in lateral X-ray, we emphasize the determination of the segmental kyphosis, because this angle has the highest reproducibility. “Good” to “excellent” inter- and intraobserver reliabilities for these three angles were also found in computed tomographies. Therefore, the use of these three angles also seems to be generally possible in computed tomography. Further studies are needed for a direct correlation of the results in lateral X-ray and in computed tomography. PMID:19953277
Reliability of doming and toe flexion testing to quantify foot muscle strength.

PubMed

Ridge, Sarah Trager; Myrer, J William; Olsen, Mark T; Jurgensmeier, Kevin; Johnson, A Wayne

2017-01-01

Quantifying the strength of the intrinsic foot muscles has been a challenge for clinicians and researchers. The reliable measurement of this strength is important in order to assess weakness, which may contribute to a variety of functional issues in the foot and lower leg, including plantar fasciitis and hallux valgus. This study reports 3 novel methods for measuring foot strength - doming (previously unmeasured), hallux flexion, and flexion of the lesser toes. Twenty-one healthy volunteers performed the strength tests during two testing sessions which occurred one to five days apart. Each participant performed each series of strength tests (doming, hallux flexion, and lesser toe flexion) four times during the first testing session (twice with each of two raters) and two times during the second testing session (once with each rater). Intra-class correlation coefficients were calculated to test for reliability for the following comparisons: between raters during the same testing session on the same day (inter-rater, intra-day, intra-session), between raters on different days (inter-rater, inter-day, inter-session), between days for the same rater (intra-rater, inter-day, inter-session), and between sessions on the same day by the same rater (intra-rater, intra-day, inter-session). ICCs showed good to excellent reliability for all tests between days, raters, and sessions. Average doming strength was 99.96 ± 47.04 N. Average hallux flexion strength was 65.66 ± 24.5 N. Average lateral toe flexion was 50.96 ± 22.54 N. These simple tests using relatively low cost equipment can be used for research or clinical purposes. If repeated testing will be conducted on the same participant, it is suggested that the same researcher or clinician perform the testing each time for optimal reliability.
Reliability of the nursing care hour measure: a descriptive study.

PubMed

Klaus, Susan F; Dunton, Nancy; Gajewski, Byron; Potter, Catima

2013-07-01

The nursing care hour has become an international standard unit of measure in research where nurse staffing is a key variable. Until now, there have been no studies verifying whether nursing care hours obtained from hospital data sources can be collected reliably. To examine the processes used by hospitals to generate nursing care hour data and to evaluate inter-rater reliability and guideline compliance with standards of the National Database of Nursing Quality Indicators(®) (NDNQI(®)) and the National Quality Forum. Two-phase descriptive study of all NDNQI hospitals that submitted data in third quarter of 2007. Data for phase I came from an online survey created by the authors to ascertain the processes used by hospitals to collect nursing care hours and their compliance with standardized data collection guidelines. In phase II, inter-rater reliability was measured using intra-class correlations between nursing care hours generated from clock hour files submitted to the study team by participants' payroll/accounting departments and aggregated data submitted previously. Phase I data were obtained from a total of 714 respondents. Nearly half (48%) of all sites use payroll records to obtain nursing care hour data and 70% use one of the standardized methods for converting the bi-weekly hours into months. Unit secretaries were reportedly included in NCH by 17.4% of respondents and only 26.2% of sites could accurately identify the point at which newly hired nurses should be included. The phase II findings (n=11) support the ability of two independent raters to obtain similar results when calculating total nursing care hours according to standard guidelines (ICC=0.76-0.99). Although barriers exist, this study found support for hospitals' abilities to collect reliable nursing care hour data. Copyright © 2012 Elsevier Ltd. All rights reserved.
Accuracy and reliability of the Pfeffer Questionnaire for the Brazilian elderly population

PubMed Central

Dutra, Marina Carneiro; Ribeiro, Raynan dos Santos; Pinheiro, Sarah Brandão; de Melo, Gislane Ferreira; Carvalho, Gustavo de Azevedo

2015-01-01

The aging population calls for instruments to assess functional and cognitive impairment in the elderly, aiming to prevent conditions that affect functional abilities. Objective To verify the accuracy and reliability of the Pfeffer (FAQ) scale for the Brazilian elderly population and to evaluate the reliability and reproducibility of the translated version of the Pfeffer Questionnaire. Methods The Brazilian version of the FAQ was applied to 110 elderly individuals divided into two groups. Both groups were assessed by two blinded investigators at baseline and again after 15 days. In order to verify the accuracy and reliability of the instrument, sensitivity and specificity measurements for the presence or absence of functional and cognitive decline were calculated for various cut-off points and the ROC curve. Intra- and inter-examiner reliability were assessed using the Intraclass Correlation Coefficient (ICC) and Bland-Altman plots. Results For the occurrence of cognitive decline, the ROC curve yielded an area under the curve of 0.909 (95%CI of 0.845 to 0.972), sensitivity of 75.68% (95%CI of 93.52% to 100%) and specificity of 97.26%. For the occurrence of functional decline, the ROC curve yielded an area under the curve of 0.851 (95%CI of 64.52% to 87.33%) and specificity of 80.36% (95%CI of 69.95% to 90.76%). The ICC was excellent, with all values exceeding 0.75. On the Bland-Altman plot, intra-examiner agreement was good, with p>0.05 consistently close to 0. A systematic difference was found for inter-examiner agreement. Conclusion The Pfeffer Questionnaire is applicable in the Brazilian elderly population and showed reliability and reproducibility compared to the original test. PMID:29213959
An Examination of the True Reliability of Lower Limb Stiffness Measures During Overground Hopping.

PubMed

Diggin, David; Anderson, Ross; Harrison, Andrew J

2016-06-01

Evidence suggests reports describing the reliability of leg-spring (kleg) and joint stiffness (kjoint) measures are contaminated by artifacts originating from digital filtering procedures. In addition, the intraday reliability of kleg and kjoint requires investigation. This study examined the effects of experimental procedures on the inter- and intraday reliability of kleg and kjoint. Thirty-two participants completed 2 trials of single-legged hopping at 1.5, 2.2, and 3.0 Hz at the same time of day across 3 days. On the final test day a fourth experimental bout took place 6 hours before or after participants' typical testing time. Kinematic and kinetic data were collected throughout. Stiffness was calculated using models of kleg and kjoint. Classifications of measurement agreement were established using thresholds for absolute and relative reliability statistics. Results illustrated that kleg and kankle exhibited strong agreement. In contrast, kknee and khip demonstrated weak-to-moderate consistency. Results suggest limits in kjoint reliability persist despite employment of appropriate filtering procedures. Furthermore, diurnal fluctuations in lower-limb muscle-tendon stiffness exhibit little effect on intraday reliability. The present findings support the existence of kleg as an attractor state during hopping, achieved through fluctuations in kjoint variables. Limits to kjoint reliability appear to represent biological function rather than measurement artifact.
Reliability and Concurrent Validity of Dynamic Rotator Stability Test-A Cross Sectional study.

PubMed

Binoy Mathew, K V; Eapen, Charu; Kumar, P Senthil

2012-01-01

To find the intra-rater and inter-rater reliability of the Dynamic Rotator Stability Test (DRST) and to find the concurrent validity of the DRST with the University of Pennsylvania Shoulder Score (PENN) Scale. Forty subjects of either gender, aged 18-70, with painful shoulder conditions of musculoskeletal origin were selected through convenience sampling. Tester 1 and tester 2 administered the DRST and PENN scale randomly. In a subgroup of 20 subjects, the DRST was administered by both testers to find the inter-rater reliability. A 180° standard universal goniometer was used to take measurements. For intra-rater reliability, all the test variables showed highly significant correlation (p=.94 - 1). For inter-rater reliability, with tester 2, test variables such as position, ROM, force, direction of abnormal translation, pain during the test, and compensatory movement during the test were found to be significant (p=.71-1). Only some variables of the DRST showed significant correlation with the PENN scale (P=.320-.450). The Dynamic Rotator Stability Test has good intra-rater and moderate inter-rater reliability. Concurrent validity of the Dynamic Rotator Stability Test was found to be poor when compared to the PENN Shoulder Score.
The admissions process of a bachelor of science in nursing program: initial reliability and validity of the personal interview.

PubMed

Carpio, B; Brown, B

1993-01-01

The undergraduate nursing degree program (B.Sc.N.) at McMaster University School of Nursing uses small groups, and is learner-centered and problem-based. A study was conducted during the 1991 admissions cycle to determine the initial reliability and validity of the semi-structured personal interview which constitutes the final component of candidate selection for this program. During the interview, three-member teams assess applicant suitability to the program based on six dimensions: applicant motivation, awareness of the program, problem-solving abilities, ability to relate to others, self-appraisal skills, and career goals. Each interviewer assigns the applicant a global rating using a seven-point scale. For the purposes of this study four interviewer teams were randomly selected from the pool of 31 teams to interview four simulated (preprogrammed) applicants. Using two-factor repeated-measures ANOVA to analyze interview ratings, inter-rater and inter-team intraclass correlation coefficients (ICC) were calculated. Inter-team reliability ranged from .64 to .97 for the individual dimensions, and .66 to .89 on global ratings. Inter-rater ICC for the six dimensions ranged from .81 to .99, and .96 to .99 for the global ratings. The item-to-total correlation coefficients between individual dimensions and global ratings ranged from .8 to 1.0. Pearson correlations between items ranged from .77 to 1.0. The ICC were then calculated for the interview scores of 108 actual applicants to the program. Inter-rater reliability based on global ratings was .79 for the single (1 rater) observation, and .91 for the multiple (3 rater) observation. These findings support the continued use of the interview as a reliable instrument with face validity. Studies of predictive validity will be undertaken.
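The step from single-rater reliability (.79) to three-rater reliability (.91) reported above is the kind of gain the Spearman-Brown formula predicts when ratings are averaged. The sketch below assumes that relationship; the abstract does not state how the three-rater figure was derived, and the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Predicted reliability of the mean of n_raters ratings (Spearman-Brown)."""
    r = single_rater_reliability
    return n_raters * r / (1.0 + (n_raters - 1) * r)


# With the reported single-rater value of .79 and three raters:
print(round(spearman_brown(0.79, 3), 3))
```

For .79 and three raters the formula gives roughly .92, in the neighbourhood of the reported .91; the small gap may simply reflect rounding of the published single-rater value.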
A new iPhone application for measuring active craniocervical range of motion in patients with non-specific neck pain: a reliability and validity study.

PubMed

Pourahmadi, Mohammad Reza; Bagheri, Rasool; Taghipour, Morteza; Takamjani, Ismail Ebrahimi; Sarrafzadeh, Javad; Mohseni-Bandpei, Mohammad Ali

2018-03-01

Measurement of cervical spine range of motion (ROM) is often considered to be an essential component of cervical spine physiotherapy assessment. This study aimed to investigate the reliability and validity of an iPhone application (app) (Goniometer Pro) for measuring active craniocervical ROM (ACCROM) in patients with non-specific neck pain. A cross-sectional study was conducted at the musculoskeletal biomechanics laboratory located at Iran University of Medical Sciences. Forty non-specific neck pain patients participated in this study. The outcome measure was the ACCROM, including flexion, extension, lateral flexion, and rotation. Following the recruitment process, ACCROM was measured using a universal goniometer (UG) and iPhone 7 app. Two blinded examiners each used the UG and iPhone to measure ACCROM in the following sequences: flexion, extension, lateral flexion, and rotation. The second (2 hours later) and third (48 hours later) sessions were carried out in the same manner as the first session. Intraclass correlation coefficient (ICC) models were used to determine the intra-rater and inter-rater reliability. The Pearson correlation coefficients were used to establish concurrent validity of the iPhone app. Minimum detectable change at the 95% confidence level (MDC95) was also computed. Good intra-rater and inter-rater reliability was demonstrated for the goniometer with ICC values of ≥0.66 and ≥0.70 and the iPhone app with ICC values of ≥0.62 and ≥0.65, respectively. The MDC95 ranged from 2.21° to 12.50° for the intra-rater analysis and from 3.40° to 12.61° for the inter-rater analysis. The concurrent validity between the two instruments was high, with r values of ≥0.63. The magnitude of the differences between the UG and iPhone app values (effect sizes) was small, with Cohen d values of ≤0.17. The iPhone app possesses good reliability and high validity. It seems that this app can be used for measuring ACCROM. Copyright © 2017 Elsevier Inc. All rights reserved.
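The abstract above reports MDC95 values without giving the formula. A commonly used convention derives the standard error of measurement (SEM) from the between-subject standard deviation and the ICC, then scales it to a 95% minimum detectable change. The sketch below assumes that convention; the function names and the example numbers are hypothetical and are not taken from the study.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), a common convention (assumed, not from the abstract)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1 here")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimum detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical values: between-subject SD of 10 degrees and an ICC of 0.75.
print(round(mdc95(10.0, 0.75), 2))
```

The sqrt(2) factor reflects that a change score combines the measurement error of two separate measurements.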
Inter- and intra-rater reliability of nasal auscultation in daycare children.

PubMed

Santos, Rita; Silva Alexandrino, Ana; Tomé, David; Melo, Cristina; Mesquita Montes, António; Costa, Daniel; Pinto Ferreira, João

2018-02-01

The aim of this study was to assess nasal auscultation's intra- and inter-rater reliability and to analyze ear and respiratory clinical condition according to nasal auscultation. Cross-sectional study performed in 125 children aged up to 3 years old attending daycare centers. Nasal auscultation, tympanometry and Paediatric Respiratory Severity Score (PRSS) were applied to all children. Nasal sounds were classified by an expert panel in order to determine nasal auscultation's intra- and inter-rater reliability. The classification of nasal sounds was assessed against tympanometric and PRSS values. Nasal auscultation revealed substantial inter-rater (K=0.75) and intra-rater (K=0.69; K=0.61 and K=0.72) reliability. Children with a "non-obstructed" classification revealed a lower peak pressure (t=-3.599, P<0.001 in left ear; t=-2.258, P=0.026 in right ear) and a higher compliance (t=-2.728, P=0.007 in left ear; t=-3.830, P<0.001 in right ear) in both ears. There was an association between the classification of sounds and tympanogram types in both ears (X=11.437, P=0.003 in left ear; X=13.535, P=0.001 in right ear). Children with a "non-obstructed" classification had a healthier respiratory condition. Nasal auscultation revealed substantial intra- and inter-rater reliability. Nasal auscultation exhibited important differences according to ear and respiratory clinical conditions. Nasal auscultation in pediatrics seems to be an original topic as well as a simple method that can be used to identify early signs of nasopharyngeal obstruction.

[Kennedy V Axis assessment in an Italian outpatient and inpatient population].

PubMed

Mundo, Emanuela; Bonalume, Laura; Del Corno, Franco; Madeddu, Fabio; Lang, Margherita

2010-01-01

Kennedy Axis V, or K Axis, is an alternative tool to the DSM-IV-TR Global Assessment of Functioning (GAF) Scale, which many researchers describe as a scale with poor inter-rater reliability and clinical utility. Unlike the GAF scale, K Axis provides a multidimensional and multiaxial approach to measuring personal, social and interpersonal functioning in psychiatric outpatients and inpatients. In this study, we examined the inter-rater reliability of K Axis by using it with an Italian clinical population. Clinicians used Kennedy Axis V to assess global functioning among 180 inpatients in 9 psychiatric services in Lombardia and Piemonte. Patients were divided into 4 different diagnostic groups, according to the DSM-IV-TR criteria. Intraclass correlations between two independent raters' scores reveal a high level of inter-rater reliability for all K Axis scales (0.633 < ICC < 0.813). Highly significant results in the Kruskal-Wallis test demonstrate that the patient diagnosis influences all the scale scores. Significant differences in patient functioning profiles in all K Axis scales, apart from the Violence scale, were noted between different diagnostic groups. In this study a high level of rater agreement was noted, even though K Axis scales were used in different mental health services by different clinicians. K Axis scales provide a useful profile of patient global functioning, in line with the specific pathology.
Reliability of the OSCE for Physical and Occupational Therapists

PubMed Central

Sakurai, Hiroaki; Kanada, Yoshikiyo; Sugiura, Yoshito; Motoya, Ikuo; Wada, Yosuke; Yamada, Masayuki; Tomita, Masao; Tanabe, Shigeo; Teranishi, Toshio; Tsujimura, Toru; Sawa, Syunji; Okanishi, Tetsuo

2014-01-01

[Purpose] To examine agreement rates between faculty members and clinical supervisors as OSCE examiners. [Subjects] The study involved physical and occupational therapists who had worked in clinical environments for 1 to 5 years after graduating from training schools as OSCE examinees, and a physical or occupational therapy faculty member and a clinical supervisor as examiners. Another clinical supervisor acted as a simulated patient. [Methods] The agreement rate between the examiners for each OSCE item was calculated based on Cohen’s kappa coefficient to confirm inter-rater reliability. [Results] The agreement rates for the behavioral aspects of the items were higher in the second than in the first examination. Similar increases were also observed in the agreement rates for the technical aspects until the initiation of each activity; however, the rates decreased during the middle to terminal stages of continuous movements. [Conclusion] The results may reflect the recent implementation of measures for the integration of therapist education in training schools and clinical training facilities. PMID:25202170
The reliability of a maximal isometric hip strength and simultaneous surface EMG screening protocol in elite, junior rugby league athletes.

PubMed

Charlton, Paula C; Mentiplay, Benjamin F; Grimaldi, Alison; Pua, Yong-Hao; Clark, Ross A

2017-02-01

Firstly to describe the reliability of assessing maximal isometric strength of the hip abductor and adductor musculature using a hand held dynamometry (HHD) protocol with simultaneous wireless surface electromyographic (sEMG) evaluation of the gluteus medius (GM) and adductor longus (AL). Secondly, to describe the correlation between isometric strength recorded with the HHD protocol and a laboratory standard isokinetic device. Reliability and correlational study. A sample of 24 elite, male, junior, rugby league athletes, age 16-20 years participated in repeated HHD and isometric Kin-Com (KC) strength testing with simultaneous sEMG assessment, on average (range) 6 (5-7) days apart by a single assessor. Strength tests included; unilateral hip abduction (ABD) and adduction (ADD) and bilateral ADD assessed with squeeze (SQ) tests in 0 and 45° of hip flexion. HHD demonstrated good to excellent inter-session reliability for all outcome measures (ICC (2,1) =0.76-0.91) and good to excellent association with the laboratory reference KC (ICC (2,1) =0.80-0.88). Whilst intra-session, inter-trial reliability of EMG activation and co-activation outcome measures ranged from moderate to excellent (ICC (2,1) =0.70-0.94), inter-session reliability was poor (all ICC (2,1) <0.50). Isometric strength testing of the hip ABD and ADD musculature using HHD may be measured reliably in elite, junior rugby league athletes. Due to the poor inter-session reliability of sEMG measures, it is not recommended for athlete screening purposes if using the techniques implemented in this study. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
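The ICC (2,1) values reported above follow the Shrout and Fleiss naming, in which the coefficient comes from a two-way random-effects ANOVA with absolute agreement and a single measurement. The sketch below computes it from a subjects-by-measurements table; in an inter-session design such as this one, the columns would be sessions rather than raters. The function name `icc_2_1` is hypothetical, and the study's own software and data are not described in the abstract.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores is a list of rows, one per subject, each holding k measurements
    (one per rater or session).
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 measurements, no gaps")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-measurement mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical strength values (two sessions per athlete), for illustration only.
sessions = [[210.0, 205.0], [180.0, 188.0], [250.0, 246.0], [195.0, 199.0]]
print(round(icc_2_1(sessions), 3))
```

Because the column (session or rater) mean square sits in the denominator, a systematic shift between sessions lowers ICC(2,1) even when subjects keep the same rank order.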
Narrative review: should teaching of the respiratory physical examination be restricted only to signs with proven reliability and validity?

PubMed

Benbassat, Jochanan; Baumal, Reuben

2010-08-01

To review the reported reliability (reproducibility, inter-examiner agreement) and validity (sensitivity, specificity and likelihood ratios) of respiratory physical examination (PE) signs, and suggest an approach to teaching these signs to medical students. Review of the literature. We searched Paper Chase between 1966 and June 2009 to identify and evaluate published studies on the diagnostic accuracy of respiratory PE signs. Most studies have reported low to fair reliability and sensitivity values. However, some studies have found high specificities for selected PE signs. None of the studies that we reviewed adhered to all of the STARD criteria for reporting diagnostic accuracy. Possible flaws in study designs may have led to underestimates of the observed diagnostic accuracy of respiratory PE signs. The reported poor reliabilities may have been due to differences in the PE skills of the participating examiners, while the sensitivities may have been confounded by variations in the severity of the diseases of the participating patients. IMPLICATION FOR PRACTICE AND MEDICAL EDUCATION: Pending the results of properly controlled studies, the reported poor reliability and sensitivity of most respiratory PE signs do not necessarily detract from their clinical utility. Therefore, we believe that a meticulously performed respiratory PE, which aims to explore a diagnostic hypothesis, as opposed to a PE that aims to detect a disease in an asymptomatic person, remains a cornerstone of clinical practice. We propose teaching the respiratory PE signs according to their importance, beginning with signs of life-threatening conditions and those that have been reported to have a high specificity, and ending with signs that are "nice to know," but are no longer employed because of the availability of more easily performed tests.
Narrative Review: Should Teaching of the Respiratory Physical Examination Be Restricted Only to Signs with Proven Reliability and Validity?

PubMed Central

Baumal, Reuben

2010-01-01

OBJECTIVE To review the reported reliability (reproducibility, inter-examiner agreement) and validity (sensitivity, specificity and likelihood ratios) of respiratory physical examination (PE) signs, and suggest an approach to teaching these signs to medical students. METHODS Review of the literature. We searched Paper Chase between 1966 and June 2009 to identify and evaluate published studies on the diagnostic accuracy of respiratory PE signs. RESULTS Most studies have reported low to fair reliability and sensitivity values. However, some studies have found high specificities for selected PE signs. None of the studies that we reviewed adhered to all of the STARD criteria for reporting diagnostic accuracy. CONCLUSIONS Possible flaws in study designs may have led to underestimates of the observed diagnostic accuracy of respiratory PE signs. The reported poor reliabilities may have been due to differences in the PE skills of the participating examiners, while the sensitivities may have been confounded by variations in the severity of the diseases of the participating patients. IMPLICATION FOR PRACTICE AND MEDICAL EDUCATION Pending the results of properly controlled studies, the reported poor reliability and sensitivity of most respiratory PE signs do not necessarily detract from their clinical utility. Therefore, we believe that a meticulously performed respiratory PE, which aims to explore a diagnostic hypothesis, as opposed to a PE that aims to detect a disease in an asymptomatic person, remains a cornerstone of clinical practice. We propose teaching the respiratory PE signs according to their importance, beginning with signs of life-threatening conditions and those that have been reported to have a high specificity, and ending with signs that are "nice to know," but are no longer employed because of the availability of more easily performed tests. PMID:20349154
Inter- and intra-rater reliability and agreement in determining subcutaneous tumour margins in dogs.

PubMed

Ranganathan, B; Milovancev, M; Leeper, H; Townsend, K L; Bracha, S; Curran, K

2018-03-01

The objective of this prospective study was to evaluate agreement and reliability of calliper-based measurements of locally invasive subcutaneous malignant tumours in dogs. Four raters measured the longest diameter of 12 subcutaneous tumours (7 soft tissue sarcomas and 5 mast cell tumours) from 11 client-owned dogs during 3 randomized, blinded measurement trials, both pre- and post-sedation. Inter- and intra-rater reliability was evaluated using intra-class correlation coefficient (ICC) and agreement was evaluated using Bland-Altman plots. Inter- and intra-rater reliability was good (ICC range of 0.8694-0.89520) and excellent (ICC range of 0.9720-0.9966), respectively. For agreement calculations, an a priori clinically relevant limit of agreement of 10 mm was set. Inter- and intra-rater agreement was unacceptable with inter-rater limits of agreement ranging from 15.9 to 55.6 mm and intra-rater limit of agreement ranging from 11.9 to 28.1 mm. Review of the measurement trial photographs revealed that calliper orientation changes were frequent, occurring in 9/12 (75%) and 8/12 (67%) pre- and post-sedation cases. No significant correlation was found between inter-rater measurement standard deviations and calliper orientation changes or dog body condition score. These findings suggest veterinarians may have poor agreement in determining the gross edge of tumours, which is expected to introduce bias and inconsistency in tumour staging, assessing response to therapy, and surgical margin planning. Due to the potential consequences for veterinary cancer patients, future studies are needed to validate the present findings. © 2018 John Wiley & Sons Ltd.
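The Bland-Altman limits of agreement mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the conventional definition (mean difference ± 1.96 × the sample standard deviation of the paired differences). The function name, the sample measurements, and the way the 10 mm threshold is applied are hypothetical and are not taken from the study.

```python
import statistics

def bland_altman_limits(a, b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - half_width, bias + half_width

# Hypothetical longest-diameter measurements (mm) from two raters; not study data.
rater_a = [42.0, 55.0, 31.0, 60.0, 48.0]
rater_b = [40.0, 61.0, 29.0, 52.0, 50.0]

bias, lower, upper = bland_altman_limits(rater_a, rater_b)

# Assumed reading of the a priori criterion: both limits must lie within +/- 10 mm.
acceptable = -10.0 <= lower and upper <= 10.0
print(f"bias={bias:.2f} mm, limits=({lower:.2f}, {upper:.2f}) mm, acceptable={acceptable}")
```

A sketch like this also shows why reliability and agreement can diverge: an ICC can be high when raters rank tumours consistently, while the limits of agreement stay wide in absolute millimetres.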
Intra-Rater and Inter-Rater Reliability of the Balance Error Scoring System in Pre-Adolescent School Children

ERIC Educational Resources Information Center

Sheehan, Dwayne P.; Lafave, Mark R.; Katz, Larry

2011-01-01

This study was designed to test the intra- and inter-rater reliability of the University of North Carolina's Balance Error Scoring System in 9- and 10-year-old children. Additionally, a modified version of the Balance Error Scoring System was tested to determine if it was more sensitive in this population ("raw scores"). Forty-six…
Measuring symptoms and functioning of youth with ADHD in middle schools.

PubMed

Evans, Steven W; Allen, Jessica; Moore, Sheryle; Strauss, Victoria

2005-12-01

Reliable and valid means are needed for evaluating the effectiveness of school-based treatments and for completing diagnostic evaluations of middle-school-aged students. The present study examined the inter-rater agreement of teacher ratings and the relationship between ratings and observational data in a middle school setting. The data are interpreted in the context of differences between a secondary and elementary school setting. Teacher ratings and observational data were collected regularly over the course of two academic years for middle school students diagnosed with ADHD. The results indicate low rates of inter-rater agreement as well as low rates of agreement between teachers and observational data, and between observational data collected in different classrooms. Inter-rater agreement was lowest in late fall and gradually increased over the second half of the year. Implications for conducting treatment outcome evaluations of school-based treatment programs and diagnostic evaluations are discussed.
An Investigation of the Immediate Effect of Static Stretching on the Morphology and Stiffness of Achilles Tendon in Dominant and Non-Dominant Legs

PubMed Central

Chiu, Tsz-chun Roxy; Ngo, Hiu-ching; Lau, Lai-wa; Leung, King-wah; Lo, Man-him; Yu, Ho-fai; Ying, Michael

2016-01-01

Aims This study was undertaken to investigate the immediate effect of static stretching on normal Achilles tendon morphology and stiffness, and the different effect on dominant and non-dominant legs; and to evaluate inter-operator and intra-operator reliability of using shear-wave elastography in measuring Achilles tendon stiffness. Methods 20 healthy subjects (13 males, 7 females) were included in the study. Thickness, cross-sectional area and stiffness of Achilles tendons in both legs were measured before and after 5-min static stretching using grey-scale ultrasound and shear-wave elastography. Inter-operator and intra-operator reliability of tendon stiffness measurements of six operators were evaluated. Results There was no significant change in the thickness and cross-sectional area of the Achilles tendon after static stretching in either the dominant or the non-dominant leg (p > 0.05). Tendon stiffness showed a significant increase in the non-dominant leg (p < 0.05) but not in the dominant leg (p > 0.05). The inter-operator reliability of shear-wave elastography measurements was 0.749 and the intra-operator reliability ranged from 0.751 to 0.941. Conclusion Shear-wave elastography is a useful and non-invasive imaging tool to assess the immediate stiffness change of Achilles tendon in response to static stretching with high intra-operator and inter-operator reliability. PMID:27120097
Self-audit of lockout/tagout in manufacturing workplaces: A pilot study.

PubMed

Yamin, Samuel C; Parker, David L; Xi, Min; Stanley, Rodney

2017-05-01

Occupational health and safety (OHS) self-auditing is a common practice in industrial workplaces. However, few audit instruments have been tested for inter-rater reliability and accuracy. A lockout/tagout (LOTO) self-audit checklist was developed for use in manufacturing enterprises. It was tested for inter-rater reliability and accuracy using responses of business self-auditors and external auditors. Inter-rater reliability at ten businesses was excellent (κ = 0.84). Business self-auditors had high (100%) accuracy in identifying elements of LOTO practice that were present as well as those that were absent (81% accuracy). Reliability and accuracy increased further when problematic checklist questions were removed from the analysis. Results indicate that the LOTO self-audit checklist would be useful in manufacturing firms' efforts to assess and improve their LOTO programs. In addition, a reliable self-audit instrument removes the need for external auditors to visit worksites, thereby expanding capacity for outreach and intervention while minimizing costs. © 2017 Wiley Periodicals, Inc.
The push-off test: development of a simple, reliable test of upper extremity weight-bearing capability.

PubMed

Vincent, Joshua I; MacDermid, Joy C; Michlovitz, Susan L; Rafuse, Richard; Wells-Rowsell, Christina; Wong, Owen; Bisbee, Leslie

2014-01-01

Longitudinal clinical measurement study. The push-off test (POT) is a novel and simple measure of upper extremity weight-bearing that can be measured with a grip dynamometer. There are no published studies on the validity and reliability of the POT. The relationship between upper extremity self-report activity/participation and impairment measures remain an unexplored realm. The primary purpose of this study is to estimate the intra and inter-rater reliability and construct validity of the POT. The secondary purpose is to estimate the relationship between upper extremity self-report activity/participation questionnaires and impairment measures. A convenience sample of 22 patients with wrist or elbow injuries were tested for POT, wrist/elbow range of motion (ROM), isometric wrist extension strength (WES) and grip strength; and completed two self-report activity/participation questionnaires: Disability of the Arm, Shoulder and the Hand (DASH) and Work Limitations Questionnaire (WLQ-26). POT's inter and intra-rater reliability and construct validity was tested. Pearson's correlations were run between the impairment measures and self-report questionnaires to look into the relationship amongst them. The POT demonstrated high inter-rater reliability (ICC affected = 0.97; 95% C.I. 0.93-0.99; ICC unaffected = 0.85; 95% C.I. 0.68-0.94) and intra-rater reliability (ICC affected = 0.96; 95% C.I. 0.92-0.97; ICC unaffected = 0.92; 95% C.I. 0.85-0.97). The POT was correlated moderately with the DASH (r = -0.47; p = 0.03). While examining the relationship between upper extremity self-reported activity/participation questionnaires and impairment measures the strongest correlation was between the DASH and the POT (r = -0.47; p = 0.03) and none of the correlations with the other physical impairment measures reached significance. At-work disability demonstrated insignificant correlations with physical impairments. 
The POT test provides a reliable and easily administered quantitative measure of ability to bear the load through an injured arm. Preliminary evidence supports a moderate relationship between loading bearing measured by the POT and upper extremity function measured by the DASH. 1b. Copyright © 2014 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
Quantitative outcome measures for systemic sclerosis-related Microangiopathy - Reliability of image acquisition in Nailfold Capillaroscopy.

PubMed

Dinsdale, Graham; Moore, Tonia; O'Leary, Neil; Berks, Michael; Roberts, Christopher; Manning, Joanne; Allen, John; Anderson, Marina; Cutolo, Maurizio; Hesselstrand, Roger; Howell, Kevin; Pizzorni, Carmen; Smith, Vanessa; Sulli, Alberto; Wildt, Marie; Taylor, Christopher; Murray, Andrea; Herrick, Ariane L

2017-09-01

Nailfold capillaroscopic parameters hold increasing promise as outcome measures for clinical trials in systemic sclerosis (SSc). Their inclusion as outcomes would often naturally require capillaroscopy images to be captured at several time points during any one study. Our objective was to assess repeatability of image acquisition (which has been little studied), as well as of measurement. 41 patients (26 with SSc, 15 with primary Raynaud's phenomenon) and 10 healthy controls returned for repeat high-magnification (300×) videocapillaroscopy mosaic imaging of 10 digits one week after initial imaging (as part of a larger study of reliability). Images were assessed in a random order by an expert blinded observer and 4 outcome measures extracted: (1) overall image grade and then (where possible) distal vessel locations were marked, allowing (2) vessel density (across the whole nailfold) to be calculated (3) apex width measurement and (4) giant vessel count. Intra-rater, intra-visit and intra-rater inter-visit (baseline vs. 1week) reliability were examined in 475 and 392 images respectively. A linear, mixed-effects model was used to estimate variance components, from which intra-class correlation coefficients (ICCs) were determined. Intra-visit and inter-visit reliability estimates (ICCs) were (respectively): overall image grade, 0.97 and 0.90; vessel density, 0.92 and 0.65; mean vessel width, 0.91 and 0.79; presence of giant capillary, 0.68 and 0.56. These estimates were conditional on each parameter being measurable. Within-operator image analysis and acquisition are reproducible. Quantitative nailfold capillaroscopy, at least with a single observer, provides reliable outcome measures for clinical studies including randomised controlled trials. Copyright © 2017 Elsevier Inc. All rights reserved.
The Influence of the Manner of Performing the Thyroid Ultrasound Examination on the Reliability of the Assessment of the Thyroid Size in School-Aged Children.

PubMed

Zygmunt, Arkadiusz; Adamczewski, Zbigniew; Zygmunt, Agnieszka; Karbownik-Lewinska, Malgorzata; Lewinski, Andrzej

2017-01-01

Goitre incidence in school-aged children evaluated using ultrasonography is one of the essential indicators of iodine intake in a given area. The aim of the study was to examine what the difference is between the volume of the thyroid gland measured in the supine and sitting position and to determine the intra-observer, inter-observer, and inter-position variations. The survey was conducted among 87 children (56 girls and 31 boys aged 7-13 years, mean age 10.44 ± 1.72 years). The thyroid volume measured in a sitting position was significantly lower than that measured in the supine position. The intra-observer variations for the total thyroid volume equalled 9.56-9.65%. The inter-observer variations were significantly higher and amounted to 34.5-35.7%. The way in which ultrasound evaluation is performed is important for the analysis of the results. It is crucial to aim for the smallest inter-observer variation, which can be achieved by strictly defining the methods of the thyroid measurement and comparing one's measuring techniques with the reference method. The use of standards in ultrasound evaluation performed in the supine position, as well as the use of standards without a strict determination of the study method, can lead to erroneous conclusions. © 2017 S. Karger AG, Basel.
Inter-observer reliability of radiographic classifications and measurements in the assessment of Perthes' disease.

PubMed

Wiig, Ola; Terjesen, Terje; Svenningsen, Svein

2002-10-01

We evaluated the inter-observer agreement of radiographic methods when evaluating patients with Perthes' disease. The radiographs were assessed at the time of diagnosis and at the 1-year follow-up by local orthopaedic surgeons (O) and 2 experienced pediatric orthopedic surgeons (TT and SS). The Catterall, Salter-Thompson, and Herring lateral pillar classifications were compared, and the femoral head coverage (FHC), center-edge angle (CE-angle), and articulo-trochanteric distance (ATD) were measured in the affected and normal hips. On the primary evaluation, the lateral pillar and Salter-Thompson classifications had a higher level of agreement among the observers than the Catterall classification, but none of the classifications showed good agreement (weighted kappa values between O and SS 0.56, 0.54, 0.49, respectively). Combining Catterall groups 1 and 2 into one group, and groups 3 and 4 into another resulted in better agreement (kappa 0.55) than with the original 4-group system. The agreement was also better (kappa 0.62-0.70) between experienced than between less experienced examiners for all classifications. The femoral head coverage was a more reliable and accurate measure than the CE-angle for quantifying the acetabular covering of the femoral head, as indicated by higher intraclass correlation coefficients (ICC) and smaller inter-observer differences. The ATD showed good agreement in all comparisons and had low interobserver differences. We conclude that all classifications of femoral head involvement are adequate in clinical work if the radiographic assessment is done by experienced examiners. When they are less experienced examiners, a 2-group classification or the lateral pillar classification is more reliable. For evaluation of containment of the femoral head, FHC is more appropriate than the CE-angle.
Reliability of movement control tests in the lumbar spine

PubMed Central

Luomajoki, Hannu; Kool, Jan; de Bruin, Eling D; Airaksinen, Olavi

2007-01-01

Background Movement control dysfunction [MCD] reduces active control of movements. Patients with MCD might form an important subgroup among patients with non specific low back pain. The diagnosis is based on the observation of active movements. Although widely used clinically, only a few studies have been performed to determine the test reliability. The aim of this study was to determine the inter- and intra-observer reliability of movement control dysfunction tests of the lumbar spine. Methods We videoed patients performing a standardized test battery consisting of 10 active movement tests for motor control in 27 patients with non specific low back pain and 13 patients with other diagnoses but without back pain. Four physiotherapists independently rated test performances as correct or incorrect per observation, blinded to all other patient information and to each other. The study was conducted in a private physiotherapy outpatient practice in Reinach, Switzerland. Kappa coefficients, percentage agreements and confidence intervals for inter- and intra-rater results were calculated. Results The kappa values for inter-tester reliability ranged between 0.24 – 0.71. Six tests out of ten showed a substantial reliability [k > 0.6]. Intra-tester reliability was between 0.51 – 0.96, all tests but one showed substantial reliability [k > 0.6]. Conclusion Physiotherapists were able to reliably rate most of the tests in this series of motor control tasks as being performed correctly or not, by viewing films of patients with and without back pain performing the task. PMID:17850669
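The kappa coefficients and percentage agreements reported here can be illustrated with a short sketch of Cohen's kappa for two raters scoring a test as correct (1) or incorrect (0). The study involved four physiotherapists, so the pairwise two-rater form below is an assumption about how such values might be computed, and the ratings are hypothetical.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of items on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohen_kappa(r1, r2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("ratings must be non-empty and of equal length")
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Expected chance agreement from each rater's marginal category proportions.
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical ratings of ten video-recorded test performances (1 = correct).
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohen_kappa(rater_1, rater_2)
print(f"agreement={agreement:.0%}, kappa={kappa:.2f}")
```

In this made-up example the raters agree on 80% of items, yet kappa is lower because part of that agreement would be expected by chance; this is why the abstract reports both statistics.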
Validity and reliability of the Diagnostic Adaptive Behaviour Scale.

PubMed

Tassé, M J; Schalock, R L; Balboni, G; Spreat, S; Navas, P

2016-01-01

The Diagnostic Adaptive Behaviour Scale (DABS) is a new standardised adaptive behaviour measure that provides information for evaluating limitations in adaptive behaviour for the purpose of determining a diagnosis of intellectual disability. This article presents validity evidence and reliability data for the DABS. Validity evidence was based on comparing DABS scores with scores obtained on the Vineland Adaptive Behaviour Scale, second edition. The stability of the test scores was measured using a test and retest, and inter-rater reliability was assessed by computing the inter-respondent concordance. The DABS convergent validity coefficients ranged from 0.70 to 0.84, while the test-retest reliability coefficients ranged from 0.78 to 0.95, and the inter-rater concordance as measured by intraclass correlation coefficients ranged from 0.61 to 0.87. All obtained validity and reliability indicators were strong and comparable with the validity and reliability coefficients of the most commonly used adaptive behaviour instruments. These results and the advantages of the DABS for clinician and researcher use are discussed. © 2015 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Design, implementation, and psychometric analysis of a scoring instrument for simulated pediatric resuscitation: a report from the EXPRESS pediatric investigators.

PubMed

Donoghue, Aaron; Ventre, Kathleen; Boulet, John; Brett-Fleegler, Marisa; Nishisaki, Akira; Overly, Frank; Cheng, Adam

2011-04-01

Robustly tested instruments for quantifying clinical performance during pediatric resuscitation are lacking. Examining Pediatric Resuscitation Education through Simulation and Scripting Collaborative was established to conduct multicenter trials of simulation education in pediatric resuscitation, evaluating performance with multiple instruments, one of which is the Clinical Performance Tool (CPT). We hypothesize that the CPT will measure clinical performance during simulated pediatric resuscitation in a reliable and valid manner. Using a pediatric resuscitation scenario as a basis, a scoring system was designed based on Pediatric Advanced Life Support algorithms comprising 21 tasks. Each task was scored as follows: task not performed (0 points); task performed partially, incorrectly, or late (1 point); and task performed completely, correctly, and within the recommended time frame (2 points). Study teams at 14 children's hospitals went through the scenario twice (PRE and POST) with an interposed 20-minute debriefing. Both scenarios for each of eight study teams were scored by multiple raters. A generalizability study, based on the PRE scores, was conducted to investigate the sources of measurement error in the CPT total scores. Inter-rater reliability was estimated based on the variance components. Validity was assessed by repeated measures analysis of variance comparing PRE and POST scores. Sixteen resuscitation scenarios were reviewed and scored by seven raters. Inter-rater reliability for the overall CPT score was 0.63. POST scores were found to be significantly improved compared with PRE scores when controlled for within-subject covariance (F1,15 = 4.64, P < 0.05). The variance component ascribable to rater was 2.4%. Reliable and valid measures of performance in simulated pediatric resuscitation can be obtained from the CPT. 
Future studies should examine the applicability of trichotomous scoring instruments to other clinical scenarios, as well as performance during actual resuscitations.
Inter-observer and intra-observer reliability in the radiographic diagnosis of avascular necrosis of the femoral head following reconstructive hip surgery in children with cerebral palsy.

PubMed

Hesketh, Kim; Sankar, Wudbhav; Joseph, Benjamin; Narayanan, Unni; Mulpuri, Kishore

2016-04-01

The incidence of avascular necrosis (AVN) following reconstructive hip surgery in cerebral palsy (CP) ranges from 0 to 69 % in the current literature. The purpose of this study was to determine the inter- and intra-observer reliability of radiographically diagnosing AVN in children with CP after hip surgery. A retrospective review of 65 children with CP who had reconstructive hip surgery between 2009 and 2012 at BC Children's Hospital was completed. Anterior-posterior and lateral radiographs were presented to four pediatric orthopaedic surgeons over two rounds. Surgeons were asked to review the set of unidentified radiographs and comment 'yes' or 'no' for the presence of AVN. Two weeks later the same set of radiographs was sent in a different order and the surgeons were again asked to comment on AVN. Inter- and intra-observer reliability was determined using kappa statistics. The intra-observer reliability ranged from 0.65 to 0.88 with an average score of 0.76. Inter-observer reliability showed greater variability, ranging from 0.41 to 0.77 with an average score of 0.56 across all surgeons. Although the intra-rater reliability produced a strength of "good" and the inter-rater reliability a strength of "moderate" agreement, the variability within these scores is clinically important as it demonstrates the difficulty in identifying AVN. This may explain the variability in AVN that is reported in the literature. The need for further education and research in the diagnosis of AVN in children with CP who have undergone reconstructive hip surgery is clinically necessary.
Evaluation of General Classes of Reliability Estimators Often Used in Statistical Analyses of Quasi-Experimental Designs

NASA Astrophysics Data System (ADS)

Saini, K. K.; Sehgal, R. K.; Sethi, B. L.

2008-10-01

In this paper the major reliability estimators are analyzed and their comparative results are discussed. Their strengths and weaknesses are evaluated in this case study. Each of the reliability estimators has certain advantages and disadvantages. Inter-rater reliability is one of the best ways to estimate reliability when your measure is an observation. However, it requires multiple raters or observers. As an alternative, you could look at the correlation of ratings of the same single observer repeated on two different occasions. Each of the reliability estimators will give a different value for reliability. In general, the test-retest and inter-rater reliability estimates will be lower in value than the parallel forms and internal consistency ones because they involve measuring at different times or with different raters. Reliability estimates are often used in statistical analyses of quasi-experimental designs.
Reliability and validity of the de Morton Mobility Index in individuals with sub-acute stroke.

PubMed

Braun, Tobias; Marks, Detlef; Thiel, Christian; Grüneberg, Christian

2018-02-04

To establish the validity and reliability of the de Morton Mobility Index (DEMMI) in patients with sub-acute stroke. This cross-sectional study was performed in a neurological rehabilitation hospital. We assessed unidimensionality, construct validity, internal consistency reliability, inter-rater reliability, minimal detectable change and possible floor and ceiling effects of the DEMMI in adult patients with sub-acute stroke. The study included a total sample of 121 patients with sub-acute stroke. We analysed validity (n = 109) and reliability (n = 51) in two sub-samples. Rasch analysis indicated unidimensionality with an overall fit to the model (chi-square = 12.37, p = 0.577). All hypotheses on construct validity were confirmed. Internal consistency reliability (Cronbach's alpha = 0.94) and inter-rater reliability (intraclass correlation coefficient = 0.95; 95% confidence interval: 0.92-0.97) were excellent. The minimal detectable change with 90% confidence was 13 points. No floor or ceiling effects were evident. These results indicate unidimensionality, sufficient internal consistency reliability, inter-rater reliability, and construct validity of the DEMMI in patients with a sub-acute stroke. Advantages of the DEMMI in clinical application are the short administration time, no need for special equipment and interval level data. The de Morton Mobility Index, therefore, may be a useful performance-based bedside test to measure mobility in individuals with a sub-acute stroke across the whole mobility spectrum. Implications for Rehabilitation The de Morton Mobility Index (DEMMI) is an unidimensional measurement instrument of mobility in individuals with sub-acute stroke. The DEMMI has excellent internal consistency and inter-rater reliability, and sufficient construct validity. The minimal detectable change of the DEMMI with 90% confidence in stroke rehabilitation is 13 points. 
The lack of any floor or ceiling effects on hospital admission indicates applicability across the whole mobility spectrum of patients with sub-acute stroke.
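The minimal detectable change reported above is commonly derived from the standard error of measurement. The sketch below assumes the widely used formulas SEM = SD × √(1 − ICC) and MDC90 = 1.645 × √2 × SEM; the abstract does not state which formula or standard deviation the authors used, so the SD value here is hypothetical and the result is not expected to reproduce the reported 13 points.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with ICC as the reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)

def minimal_detectable_change(sd, icc, z=1.645):
    """MDC = z * sqrt(2) * SEM; z = 1.645 gives 90% confidence."""
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)

# ICC = 0.95 is the inter-rater value from the abstract; SD = 20 points is a
# hypothetical between-subject standard deviation chosen only for illustration.
sem = standard_error_of_measurement(sd=20.0, icc=0.95)
mdc90 = minimal_detectable_change(sd=20.0, icc=0.95)
print(f"SEM={sem:.2f} points, MDC90={mdc90:.2f} points")
```

The factor √2 reflects that a change score combines measurement error from two assessments.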

Reliability and concurrent validity of postural asymmetry measurement in adolescent idiopathic scoliosis.

PubMed

Prowse, Ashleigh; Aslaksen, Berit; Kierkegaard, Marie; Furness, James; Gerdhem, Paul; Abbott, Allan

2017-01-18

To investigate the reliability and concurrent validity of the Baseline® Body Level/Scoliosis meter for adolescent idiopathic scoliosis postural assessment in three anatomical planes. This is an observational reliability and concurrent validity study of adolescent referrals to the Orthopaedic department for scoliosis screening at Karolinska University Hospital, Stockholm, Sweden between March-May 2012. A total of 31 adolescents with idiopathic scoliosis (13.6 ± 0.6 years old) of mild-moderate curvatures (25° ± 12°) were consecutively recruited. Measurement of cervical, thoracic and lumbar curvatures, pelvic and shoulder tilt, and axial thoracic rotation (ATR) were performed by two trained physiotherapists in one day. The intraclass correlation coefficient (ICC) was used to determine the inter-examiner reliability (ICC2,1) and the intra-rater reliability (ICC3,3) of the Baseline® Body Level/Scoliosis meter. Spearman's correlation analyses were used to estimate concurrent validity between the Baseline® Body Level/Scoliosis meter and Gold Standard Cobb angles from radiographs and the Orthopaedic Systems Inc. Scoliometer. There was excellent reliability between examiners for thoracic kyphosis (ICC2,1 = 0.94), ATR (ICC2,1 = 0.92) and lumbar lordosis (ICC2,1 = 0.79). There was adequate reliability between examiners for cervical lordosis (ICC2,1 = 0.51), however poor reliability for pelvic and shoulder tilt. Both devices were reproducible in the measurement of ATR when repeated by one examiner (ICC3,3 0.98-1.00). The device had a good correlation with the Scoliometer (rho = 0.78). When compared with Cobb angle from radiographs, there was a moderate correlation for ATR (rho = 0.627). The Baseline® Body Level/Scoliosis meter provides reliable transverse and sagittal cervical, thoracic and lumbar measurements and valid transverse plane measurements of mild-moderate scoliosis deformity.
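The inter-examiner coefficient ICC2,1 used in this abstract is usually read as the two-way random-effects, absolute-agreement, single-measure form. The sketch below assumes that reading and computes it from ANOVA mean squares as ICC(2,1) = (MSR − MSE) / (MSR + (k − 1)·MSE + k·(MSC − MSE)/n), where n is the number of subjects and k the number of raters. The function name and the ratings matrix are illustrative and do not come from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete subjects-by-raters table (list of equal-length rows)."""
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-subjects mean square
    msc = ss_cols / (k - 1)                 # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
print(f"ICC(2,1) = {icc:.2f}")
```

Because this form penalises systematic offsets between raters (the MSC term), it is a natural choice when different examiners are meant to be interchangeable, as in the inter-examiner comparison described above.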
Neck motion kinematics: an inter-tester reliability study using an interactive neck VR assessment in asymptomatic individuals.

PubMed

Sarig Bahat, Hilla; Sprecher, Elliot; Sela, Itamar; Treleaven, Julia

2016-07-01

The use of virtual reality (VR) for assessment and intervention of neck pain has previously been used and shown reliable for cervical range of motion measures. Neck VR enables analysis of task-oriented neck movement by stimulating responsive movements to external stimuli. Therefore, the purpose of this study was to establish inter-tester reliability of neck kinematic measures so that it can be used as a reliable assessment and treatment tool between clinicians. This reliability study included 46 asymptomatic participants, who were assessed using the neck VR system which displayed an interactive VR scenario via a head-mounted device, controlled by neck movements. The objective of the interactive assessment was to hit 16 targets, randomly appearing in four directions, as fast as possible. Each participant was tested twice by two different testers. Good reliability was found of neck motion kinematic measures in flexion, extension, and rotation (0.64-0.93 inter-class correlation). High reliability was shown for peak velocity globally (0.93), in left rotation (0.9), right rotation and extension (0.88), and flexion (0.86). Mean velocity had a good global reliability (0.84), except for left rotation directed movement with moderate reliability (0.68). Minimal detectable change for peak velocity ranged from 41 to 53 °/s, while mean velocity ranged from 20 to 25 °/s. The results suggest high reliability for peak and mean velocity as measured by the interactive Neck VR assessment of neck motion kinematics. VR appears to provide a reliable and more ecologically valid method of cervical motion evaluation than previous conventional methodologies.
Inter-Rater Reliability of the Modified Ashworth Scale and Modified Modified Ashworth Scale in Assessing Poststroke Elbow Flexor Spasticity

ERIC Educational Resources Information Center

Kaya, Taciser; Goksel Karatepe, Altinay; Gunaydin, Rezzan; Koc, Aysegul; Altundal Ercan, Ulku

2011-01-01

The Modified Ashworth Scale (MAS) is commonly used in clinical practice for grading spasticity. However, it was modified recently by omitting grade "1+" of the MAS and redefining grade "2". The aim of this study was to investigate the inter-rater reliability of MAS and modified MAS (MMAS) for the assessment of poststroke elbow flexor spasticity.…
Intra- and Inter-Rater Reliability of the Rate of Force Development of Hip Abductor Muscles Measured by Hand-Held Dynamometer

ERIC Educational Resources Information Center

Takeda, Kazuya; Tanabe, Shigeo; Koyama, Soichiro; Nagai, Tomoko; Sakurai, Hiroaki; Kanada, Yoshikiyo; Shomoto, Koji

2018-01-01

The aim of this study was to clarify the intra- and inter-rater reliability of the rate of force development in hip abductor muscle force measurements using a hand-held dynamometer. Thirty healthy adults were separately assessed by two independent raters on two separate days. Rate of force development was calculated from the slope of the…
PubMed

Brosseau, Lucie; Laroche, Chantal; Guitard, Paulette; King, Judy; Poitras, Stéphane; Casimiro, Lynn; Barette, Julie Alexandra; Cardinal, Dominique; Cavallo, Sabrina; Laferrière, Lucie; Martini, Rose; Champoux, Nicholas; Taverne, Jennifer; Paquette, Chanyque; Tremblay, Sébastien; Sutton, Ann; Galipeau, Roseline; Tourigny, Jocelyne; Toupin-April, Karine; Loew, Laurianne; Demers, Catrine; Sauvé-Schenk, Katrine; Paquet, Nicole; Savard, Jacinthe; Lagacé, Josée; Pharand, Denyse; Vaillancourt, Véronique

2017-01-01

Objectives: The primary objective was to produce a French-Canadian translation of AMSTAR (a measurement tool to assess systematic reviews) and to examine the validity of the translation's contents. The secondary and tertiary objectives were to assess the inter-rater reliability and factorial construct validity of this French-Canadian version of AMSTAR. Methods: A modified approach to Vallerand's methodology (1989) for cross-cultural validation was used. First, a parallel back-translation of AMSTAR 2 was performed, by both professionals and future professionals. Next, a first committee of experts (P1) examined the translations to create a first draft of the French-Canadian version of the AMSTAR tool. This draft was then evaluated and modified by a second committee of experts (P2). Following that, 18 future professionals (master's students in physiotherapy) rated this second draft of the instrument for clarity using a seven-point scale (1: very clear; 7: very ambiguous). Lastly, the principal co-investigators then reviewed the problematic elements and proposed final changes. Four independent raters used this French-Canadian version of AMSTAR to assess 20 systematic reviews that were published in French after the year 2000. An intraclass correlation coefficient (ICC) and kappa coefficient were calculated to measure the tool's inter-rater reliability. A Cronbach's alpha coefficient was also calculated to measure internal consistency. In addition, factor analysis was used to evaluate construct validity in order to determine the number of dimensions. Results: The statements on the final version of the AMSTAR tool received an average ambiguity rating of between 1.0 and 1.4. No statement received an average rating below 1.4, which indicates a high level of clarity. Inter-rater reliability (n = 4) for the instrument's total score was moderate, with an intraclass correlation coefficient of 0.61 (95% confidence interval [CI]: 0.29, 0.97). 
Inter-rater reliability for 82% of the individual items was good, according to the kappa values obtained. Internal consistency was excellent, with a Cronbach's alpha coefficient of 0.91 (95% CI: 0.83, 0.99). The French-Canadian version of AMSTAR is a unidimensional tool, as confirmed by factor analysis and communality values greater than 0.30. Conclusion: A valid French-Canadian version of AMSTAR was created using this rigorous five-step process. This version is unidimensional, with moderate inter-rater reliability for the elements overall, and with excellent internal consistency. This tool could be valuable to French-Canadian professionals and researchers, and could also be of interest to the international Francophone community.
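The abstract reports an ICC for the total score across four raters but does not say which ICC form was computed. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)), the coefficient can be derived from the mean squares of a subjects-by-raters table. The function name `icc_2_1` and the scores below are illustrative only and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per rated subject, one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and the total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Illustrative total scores: 6 rated items (rows) by 4 raters (columns).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(scores)
print(f"ICC(2,1) = {icc:.2f}")
```

Because this form penalises systematic differences between raters, a rater who scores consistently higher or lower than the others pulls the coefficient down even when the rank order of the rated items is preserved.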
Non-Weight-Bearing and Weight-Bearing Ultrasonography of Select Foot Muscles in Young, Asymptomatic Participants: A Descriptive and Reliability Study.

PubMed

Battaglia, Patrick J; Mattox, Ross; Winchester, Brett; Kettner, Norman W

The primary aim of this study was to determine the reliability of diagnostic ultrasound imaging for select intrinsic foot muscles using both non-weight-bearing and weight-bearing postures. Our secondary aim was to describe the change in muscle cross-sectional area (CSA) and dorsoplantar thickness when bearing weight. An ultrasound examination was performed with a linear ultrasound transducer operating between 9 and 12 MHz. Long-axis and short-axis ultrasound images of the abductor hallucis, flexor digitorum brevis, and quadratus plantae were obtained in both the non-weight-bearing and weight-bearing postures. Two examiners independently collected ultrasound images to allow for interexaminer and intraexaminer reliability calculation. The change in muscle CSA and dorsoplantar thickness when bearing weight was also studied. There were 26 participants (17 female) with a mean age of 25.5 ± 3.8 years and a mean body mass index of 28.0 ± 7.8 kg/m². Inter-examiner reliability was excellent when measuring the muscles in short axis (intraclass correlation coefficient >0.75) and fair to good in long axis (intraclass correlation coefficient >0.4). Intraexaminer reliability was excellent for the abductor hallucis and flexor digitorum brevis and ranged from fair to good to excellent for the quadratus plantae. Bearing weight did not reduce interexaminer or intraexaminer reliability. All muscles exhibited a significant increase in CSA when bearing weight. This is the first report to describe weight-bearing diagnostic ultrasound of the intrinsic foot muscles. Ultrasound imaging is reliable when imaging these muscles bearing weight. Furthermore, muscle CSA increases in the weight-bearing posture. Copyright © 2016. Published by Elsevier Inc.
The impact of revised DSM-5 criteria on the relative distribution and inter-rater reliability of eating disorder diagnoses in a residential treatment setting.

PubMed

Thomas, Jennifer J; Eddy, Kamryn T; Murray, Helen B; Tromp, Marilou D P; Hartmann, Andrea S; Stone, Melissa T; Levendusky, Philip G; Becker, Anne E

2015-09-30

This study evaluated the relative distribution and inter-rater reliability of revised DSM-5 criteria for eating disorders in a residential treatment program. Consecutive adolescent and young adult females (N=150) admitted to a residential eating disorder treatment facility were assigned both DSM-IV and DSM-5 diagnoses by a clinician (n=14) via routine clinical interview and a research assessor (n=4) via structured interview. We compared the frequency of diagnostic assignments under each taxonomy and by type of assessor. We evaluated concordance between clinician and researcher assignment through inter-rater reliability kappa and percent agreement. Significantly fewer patients received either clinician or researcher diagnoses of a residual eating disorder under DSM-5 (clinician-12.0%; researcher-31.3%) versus DSM-IV (clinician-28.7%; researcher-59.3%), with the majority of reassigned DSM-IV residual cases reclassified as DSM-5 anorexia nervosa. Researcher and clinician diagnoses showed moderate inter-rater reliability under DSM-IV (κ=.48) and DSM-5 (κ=.57), though agreement for specific DSM-5 other specified feeding or eating disorder (OSFED) presentations was poor (κ=.05). DSM-5 revisions were associated with significantly less frequent residual eating disorder diagnoses, but not with reduced inter-rater reliability. Findings support specific dimensions of clinical utility for revised DSM-5 criteria for eating disorders. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
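The two concordance measures named above, percent agreement and kappa, differ only in that kappa subtracts the agreement expected by chance from the raters' marginal frequencies. A minimal sketch of the unweighted Cohen's kappa follows; the function name and the diagnostic labels are hypothetical and are not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (percent agreement, Cohen's kappa) for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical diagnoses for ten patients (not data from the study).
clinician = ["AN", "AN", "BN", "OSFED", "BN", "AN", "OSFED", "BN", "AN", "OSFED"]
researcher = ["AN", "AN", "BN", "AN", "BN", "AN", "OSFED", "OSFED", "AN", "BN"]

agreement, kappa = cohen_kappa(clinician, researcher)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 7 of 10 cases, but 35% agreement would be expected by chance alone, so kappa is noticeably lower than the raw agreement.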
The TiltMeter app is a novel and accurate measurement tool for the weight bearing lunge test.

PubMed

Williams, Cylie M; Caserta, Antoni J; Haines, Terry P

2013-09-01

The weight bearing lunge test is increasingly being used by health care clinicians who treat lower limb and foot pathology. This measure is commonly established accurately and reliably with the use of expensive equipment. This study aims to compare the digital inclinometer with a free app, TiltMeter, on an Apple iPhone. This was an intra-rater and inter-rater reliability study. Two raters (novice and experienced) conducted the measurements in both a bent knee and straight leg position to determine the intra-rater and inter-rater reliability. Concurrent validity was also established. Allied health practitioners were recruited as participants from the workplace. A preconditioning stretch was conducted and the ankle range of motion was established in the weight bearing lunge test position, first with the leg straight and then with the knee bent. The measurement device and each participant were randomised during measurement. The intra-rater reliability and inter-rater reliability for the devices and in both positions were all over ICC 0.8 except for one intra-rater measure (digital inclinometer, novice, ICC 0.65). The inter-rater reliability between the digital inclinometer and the TiltMeter app was near perfect, ICC 0.96 (CI: 0.898-0.983); concurrent validity ICC between the two devices was 0.83 (CI: -0.740 to 0.445). The TiltMeter app on the iPhone is a reliable and inexpensive tool to measure the available ankle range of motion. Health practitioners should use caution in applying these findings to other smart phone equipment if surface areas are not comparable. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
Inter-rater reliability and aspects of validity of the parent-infant relationship global assessment scale (PIR-GAS)

PubMed Central

2013-01-01

Background The Parent-Infant Relationship Global Assessment Scale (PIR-GAS) signifies a conceptually relevant development in the multi-axial, developmentally sensitive classification system DC:0-3R for preschool children. However, information about the reliability and validity of the PIR-GAS is rare. A review of the available empirical studies suggests that in research, PIR-GAS ratings can be based on a ten-minute videotaped interaction sequence. The qualification of raters may be very heterogeneous across studies. Methods To test whether the use of the PIR-GAS still allows for a reliable assessment of the parent-infant relationship, our study compared PIR-GAS ratings based on a full-information procedure across multiple settings with ratings based on a ten-minute video by two doctoral candidates of medicine. For each mother-child dyad at a family day hospital (N = 48), we obtained two video ratings and one full-information rating at admission to therapy and at discharge. This pre-post design allowed for a replication of our findings across the two measurement points. We focused on the inter-rater reliability between the video coders, as well as between the video and full-information procedure, including mean differences and correlations between the raters. Additionally, we examined aspects of the validity of video and full-information ratings based on their correlation with measures of child and maternal psychopathology. Results Our results showed that ten-minute video and full-information PIR-GAS ratings were not interchangeable. Most results at admission could be replicated by the data obtained at discharge. We concluded that a higher degree of standardization of the assessment procedure should increase the reliability of the PIR-GAS, and a more thorough theoretical foundation of the manual should increase its validity. PMID:23705962
Inter- and intra- observer reliability of risk assessment of repetitive work without an explicit method.

PubMed

Eliasson, Kristina; Palm, Peter; Nyman, Teresia; Forsman, Mikael

2017-07-01

A common way to conduct practical risk assessments is to observe a job and report the observed long term risks for musculoskeletal disorders. The aim of this study was to evaluate the inter- and intra-observer reliability of ergonomists' risk assessments without the support of an explicit risk assessment method. Twenty-one experienced ergonomists assessed the risk level (low, moderate, high risk) of eight upper body regions, as well as the global risk of 10 video recorded work tasks. Intra-observer reliability was assessed by having nine of the ergonomists repeat the procedure at least three weeks after the first assessment. The ergonomists based their risk assessments on their own experience and knowledge. The statistical parameters of reliability included agreement in %, kappa, linearly weighted kappa, intraclass correlation and Kendall's coefficient of concordance. The average inter-observer agreement of the global risk was 53% and the corresponding weighted kappa (Kw) was 0.32, indicating fair reliability. The intra-observer agreement was 61% and 0.41 (Kw). This study indicates that risk assessments of the upper body, without the use of an explicit observational method, have non-acceptable reliability. It is therefore recommended to use systematic risk assessment methods to a higher degree. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
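On an ordered scale such as low, moderate and high risk, linearly weighted kappa gives partial credit for near misses: a low/moderate disagreement counts half as much as a low/high one. The following is a minimal sketch for two observers, with a hypothetical function name and made-up ratings that are not taken from the study.

```python
def linear_weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted kappa for two raters on an ordered scale."""
    k, n = len(categories), len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate the two raters' ratings.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        table[index[a]][index[b]] += 1
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for opposite extremes.
        return abs(i - j) / (k - 1)

    observed = sum(weight(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(weight(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k)) / (n * n)
    return 1 - observed / expected


# Hypothetical global risk ratings of ten work tasks by two observers.
levels = ["low", "moderate", "high"]
obs_1 = ["low", "low", "moderate", "moderate", "high",
         "high", "moderate", "low", "high", "moderate"]
obs_2 = ["low", "moderate", "moderate", "high", "high",
         "moderate", "moderate", "low", "high", "low"]

kw = linear_weighted_kappa(obs_1, obs_2, levels)
print(f"weighted kappa = {kw:.2f}")
```

Here every disagreement is between adjacent levels, so the weighted coefficient is higher than an unweighted kappa on the same ratings would be.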
Inter- and intra-observer reliability of measurement of pedicle screw breach assessed by postoperative CT scans.

PubMed

Lavelle, William F; Ranade, Ashish; Samdani, Amer F; Gaughan, John P; D'Andrea, Linda P; Betz, Randal R

2014-01-01

Pedicle screws are used increasingly in spine surgery. Concerns about complications associated with screw breach necessitate accurate pedicle screw placement. Postoperative CT imaging helps to detect screw malposition and assess its severity. However, accuracy is dependent on the reading of the CT scans. Inter- and intra-observer variability could affect the reliability of CT scans to assess multiple screw types and sites. The purpose of this study was to assess the reliability of multi-observer analysis of CT scans for determining pedicle screw breach for various screw types and sites in patients with spinal deformity or degenerative pathologies. Axial CT scan images of 23 patients (286 screws) were read by four experienced spine surgeons. Pedicle screw placement was considered 'In' when the screw was fully contained and/or the pedicle wall breach was ≤2 mm. 'Out' was defined as a breach in the medial or lateral pedicle wall >2 mm. Intra-class coefficients (ICC) were calculated to assess the inter- and intra-observer reliability. Marked inter- and intra-observer variability was noticed. The overall inter-observer ICC was 0.45 (95% confidence limits 0.25 to 0.65). The intra-observer ICC was 0.49 (95% confidence limits 0.29 to 0.69). Underlying spinal pathology, screw type, and patient age did not seem to impact the reliability of our CT assessments. Our results indicate the evaluation of pedicle screw breach on CT by a single surgeon is highly variable, and care should be taken when using individual CT evaluations of millimeters of breach as a basis for screw removal. This was a Level III study.
Urdu translation of the Hamilton Rating Scale for Depression: Results of a validation study

PubMed Central

Hashmi, Ali M.; Naz, Shahana; Asif, Aftab; Khawaja, Imran S.

2016-01-01

Objective: To develop a standardized validated version of the Hamilton Rating Scale for Depression (HAM-D) in Urdu. Methods: After translation of the HAM-D into the Urdu language following standard guidelines, the final Urdu version (HAM-D-U) was administered to 160 depressed outpatients. Inter-item correlation was assessed by calculating Cronbach alpha. Correlation between HAM-D-U scores at baseline and after a 2-week interval was evaluated for test-retest reliability. Moreover, scores of two clinicians on HAM-D-U were compared for inter-rater reliability. For establishing concurrent validity, scores of HAM-D-U and BDI-U were compared by using Spearman correlation coefficient. The study was conducted at Mayo Hospital, Lahore, from May to December 2014. Results: The Cronbach alpha for HAM-D-U was 0.71. Composite scores for HAM-D-U at baseline and after a 2-week interval were also highly correlated with each other (Spearman correlation coefficient 0.83, p-value < 0.01), indicating good test-retest reliability. Composite scores for HAM-D-U and BDI-U were positively correlated with each other (Spearman correlation coefficient 0.85, p < 0.01), indicating good concurrent validity. Scores of two clinicians for HAM-D-U were also positively correlated (Spearman correlation coefficient 0.82, p-value < 0.01), indicating good inter-rater reliability. Conclusion: The HAM-D-U is a valid and reliable instrument for the assessment of depression. It shows good inter-rater and test-retest reliability. The HAM-D-U can be a tool either for clinical management or research. PMID:28083049
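Cronbach's alpha, used above as the measure of inter-item consistency, compares the sum of the individual item variances with the variance of the total score. A minimal sketch follows; the function name and the item scores are hypothetical and are not HAM-D-U data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per respondent, one column per item."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores for five respondents on three items.
item_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 5, 5],
]
alpha = cronbach_alpha(item_scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises when items vary together, because the variance of the total then greatly exceeds the sum of the item variances.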
Inter-Rater Reliability of Total Body Score-A Scale for Quantification of Corpse Decomposition.

PubMed

Nawrocka, Marta; Frątczak, Katarzyna; Matuszewski, Szymon

2016-05-01

The degree of body decomposition can be quantified using Total Body Score (TBS), a scale frequently used in taphonomic or entomological studies of decomposition. Here, the inter-rater reliability of the scale is analyzed. The study involved 120 laypeople, who were trained in the use of the scale. Participants scored decomposition of pig carcasses from photographs. It was found that the scale, when used by different people, gives homogeneous results irrespective of user qualifications (Krippendorff's alpha for all participants was 0.818). The study also indicated that carcasses in advanced decomposition receive significantly less accurate scores. Moreover, it was found that scores for cadavers in mosaic decomposition (i.e., representing signs of at least two stages of decomposition) are less accurate. These results demonstrate that the scale may be regarded as inter-rater reliable. Some propositions for refinement of the scale were also discussed. © 2016 American Academy of Forensic Sciences.
The Pareidolia Test: A Simple Neuropsychological Test Measuring Visual Hallucination-Like Illusions.

PubMed

Mamiya, Yasuyuki; Nishio, Yoshiyuki; Watanabe, Hiroyuki; Yokoi, Kayoko; Uchiyama, Makoto; Baba, Toru; Iizuka, Osamu; Kanno, Shigenori; Kamimura, Naoto; Kazui, Hiroaki; Hashimoto, Mamoru; Ikeda, Manabu; Takeshita, Chieko; Shimomura, Tatsuo; Mori, Etsuro

2016-01-01

Visual hallucinations are a core clinical feature of dementia with Lewy bodies (DLB), and this symptom is important in the differential diagnosis and prediction of treatment response. The pareidolia test is a tool that evokes visual hallucination-like illusions, and these illusions may be a surrogate marker of visual hallucinations in DLB. We created a simplified version of the pareidolia test and examined its validity and reliability to establish the clinical utility of this test. The pareidolia test was administered to 52 patients with DLB, 52 patients with Alzheimer's disease (AD) and 20 healthy controls (HCs). We assessed the test-retest/inter-rater reliability using the intra-class correlation coefficient (ICC) and the concurrent validity using the Neuropsychiatric Inventory (NPI) hallucinations score as a reference. A receiver operating characteristic (ROC) analysis was used to evaluate the sensitivity and specificity of the pareidolia test to differentiate DLB from AD and HCs. The pareidolia test required approximately 15 minutes to administer, exhibited good test-retest/inter-rater reliability (ICC of 0.82), and moderately correlated with the NPI hallucinations score (rs = 0.42). Using an optimal cut-off score set according to the ROC analysis, the pareidolia test differentiated DLB from AD with a sensitivity of 81% and a specificity of 92%. Our study suggests that the simplified version of the pareidolia test is a valid and reliable surrogate marker of visual hallucinations in DLB.
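Sensitivity and specificity at a cut-off follow directly from counting how many patients in each group score at or above it. The abstract does not say how the optimal cut-off was chosen from the ROC analysis; the sketch below assumes the common choice of maximising Youden's J (sensitivity + specificity - 1). The function names and the scores are hypothetical and are not the study's data.

```python
def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Treat a score >= cutoff as test-positive."""
    true_pos = sum(s >= cutoff for s in case_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


def best_cutoff(case_scores, control_scores):
    """Cut-off maximising Youden's J (an assumed selection rule)."""
    candidates = sorted(set(case_scores) | set(control_scores))
    return max(
        candidates,
        key=lambda c: sum(sensitivity_specificity(case_scores, control_scores, c)),
    )


# Hypothetical illusion counts (not data from the study).
dlb_scores = [2, 2, 3, 4, 5, 6, 7, 9]
ad_scores = [0, 0, 0, 0, 1, 1, 1, 2]

cutoff = best_cutoff(dlb_scores, ad_scores)
sens, spec = sensitivity_specificity(dlb_scores, ad_scores, cutoff)
print(f"cut-off = {cutoff}, sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Raising the cut-off trades sensitivity for specificity, which is why a single pair of values should always be read together with the threshold that produced it.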
Reliability and accuracy of real-time visualization techniques for measuring school cafeteria tray waste: validating the quarter-waste method.

PubMed

Hanks, Andrew S; Wansink, Brian; Just, David R

2014-03-01

Measuring food waste is essential to determine the impact of school interventions on what children eat. There are multiple methods used for measuring food waste, yet it is unclear which method is most appropriate in large-scale interventions with restricted resources. This study examines which of three visual tray waste measurement methods is most reliable, accurate, and cost-effective compared with the gold standard of individually weighing leftovers. School cafeteria researchers used the following three visual methods to capture tray waste in addition to actual food waste weights for 197 lunch trays: the quarter-waste method, the half-waste method, and the photograph method. Inter-rater and inter-method reliability were highest for on-site visual methods (0.90 for the quarter-waste method and 0.83 for the half-waste method) and lowest for the photograph method (0.48). This low reliability is partially due to the inability of photographs to determine whether packaged items (such as milk or yogurt) are empty or full. In sum, the quarter-waste method was the most appropriate for calculating accurate amounts of tray waste, and the photograph method might be appropriate if researchers only wish to detect significant differences in waste or consumption of selected, unpackaged food. Copyright © 2014 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
Reliability of Autism-Tics, AD/HD, and other Comorbidities (A-TAC) inventory in a test-retest design.

PubMed

Larson, Tomas; Kerekes, Nóra; Selinus, Eva Norén; Lichtenstein, Paul; Gumpert, Clara Hellner; Anckarsäter, Henrik; Nilsson, Thomas; Lundström, Sebastian

2014-02-01

The Autism-Tics, AD/HD, and other Comorbidities (A-TAC) inventory is used in epidemiological research to assess neurodevelopmental problems and coexisting conditions. Although the A-TAC has been applied in various populations, data on retest reliability are limited. The objective of the present study was to present additional reliability data. The A-TAC was administered by lay assessors and was completed on two occasions by parents of 400 individual twins, with an average interval of 70 days between test sessions. Intra- and inter-rater reliability were analysed with intraclass correlations and Cohen's kappa. A-TAC showed excellent test-retest intraclass correlations for both autism spectrum disorder and attention deficit hyperactivity disorder (each at .84). Most modules in the A-TAC had intra- and inter-rater reliability intraclass correlation coefficients of ≥ .60. Cohen's kappa indicated acceptable reliability. The current study provides statistical evidence that the A-TAC yields good test-retest reliability in a population-based cohort of children.
Inter- and intra-observer reliability of clinical movement-control tests for marines

PubMed Central

2012-01-01

Background Musculoskeletal disorders, particularly in the back and lower extremities, are common among marines. Here, movement-control tests are considered clinically useful for screening and follow-up evaluation. However, few studies have addressed the reliability of clinical tests, and no such published data exist for marines. The present aim was therefore to determine the inter- and intra-observer reliability of clinically convenient tests emphasizing movement control of the back and hip among marines. A secondary aim was to investigate the sensitivity and specificity of these clinical tests for discriminating musculoskeletal pain disorders in this group of military personnel. Methods This inter- and intra-observer reliability study used a test-retest approach with six standardized clinical tests focusing on movement control for back and hip. Thirty-three marines (age 28.7 yrs, SD 5.9) on active duty volunteered and were recruited. They followed an in-vivo observation test procedure that covered both low- and high-load (threshold) tasks relevant for marines on operational duty. Two independent observers simultaneously rated performance as “correct” or “incorrect” following a standardized assessment protocol. Re-testing followed 7–10 days thereafter. Reliability was analysed using kappa (κ) coefficients, while discriminative power of the best-fitting tests for back- and lower-extremity pain was assessed using a multiple-variable regression model. Results Inter-observer reliability for the six tests was moderate to almost perfect with κ-coefficients ranging from 0.56 to 0.95. Three tests reached almost perfect inter-observer reliability with mean κ-coefficients > 0.81. However, intra-observer reliability was fair-to-moderate with mean κ-coefficients between 0.22 and 0.58. Three tests achieved moderate intra-observer reliability with κ-coefficients > 0.41.
Combinations of one low- and one high-threshold test best discriminated prior back pain, but results were inconsistent for lower-extremity pain. Conclusions Our results suggest that clinical tests of movement control of back and hip are reliable for use in screening protocols using several observers with marines. However, test-retest reproducibility was less accurate, which should be considered in follow-up evaluations. The results also indicate that combinations of low- and high-threshold tests have discriminative validity for prior back pain, but were inconclusive for lower-extremity pain. PMID:23273285
Evaluating Written Patient Information for Eczema in German: Comparing the Reliability of Two Instruments, DISCERN and EQIP

PubMed Central

McCool, Megan E.; Wahl, Josepha; Schlecht, Inga; Apfelbacher, Christian

2015-01-01

Patients actively seek information about how to cope with their health problems, but the quality of the information available varies. A number of instruments have been developed to assess the quality of patient information, though primarily in English. Little is known about the reliability of these instruments when applied to patient information in German. The objective of our study was to investigate and compare the reliability of two validated instruments, DISCERN and EQIP, in order to determine which of these instruments is better suited for a further study pertaining to the quality of information available to German patients with eczema. Two independent raters evaluated a random sample of 20 informational brochures in German. All the brochures addressed eczema as a disorder and/or therapy options and care. Intra-rater and inter-rater reliability were assessed by calculating intra-class correlation coefficients, agreement was tested with weighted kappas, and the correlation of the raters’ scores for each instrument was measured with Pearson’s correlation coefficient. DISCERN demonstrated substantial intra- and inter-rater reliability. It also showed slightly better agreement than EQIP. There was a strong correlation of the raters’ scores for both instruments. The findings of this study support the reliability of both DISCERN and EQIP. However, based on the results of the inter-rater reliability, agreement and correlation analyses, we consider DISCERN to be the more precise tool for our project on patient information concerning the treatment and care of eczema. PMID:26440612
The inter-rater reliability test of the modified Morse Fall Scale among patients ≥ 55 years old in an acute care hospital in Singapore.

PubMed

Tang, Wing Sze; Chow, Yeow Leng; Koh, Serena Siew Lin

2014-02-01

A prospective, descriptive study was conducted in an acute care hospital in Singapore to determine the inter-rater reliability of the modified Morse Fall Scale by evaluating the degrees of agreement on the ratings of the individual items and overall score between the 'gold standard' assessor and the facility assessors. One hundred and forty-two subjects were recruited during the 1.5-month data collection period. The simple and weighted κ-values were all > 0.8 except for the item 'effects of medications' (κ and κw = 0.63), and the correlation coefficient (rs = 0.89) was high and significant (p < 0.001). The modified Morse Fall Scale was shown to be a reliable fall risk assessment tool with relatively high inter-rater reliability for the overall score and individual items. This study provides evidence-based psychometric support for the clinical application of this tool. © 2013 Wiley Publishing Asia Pty Ltd.
Use of volunteer student abstractors for a retrospective cohort analysis: a study of inter-rater reliability.

PubMed

Gritsiouk, Yaroslav; Hegsted, Damian; Gardiner, Stuart; Merriman, Lisa; Gubler, Kelly Dean

2013-05-01

Little is known about the reliability of data collected by abstractors without professional medical training. This investigation sought to determine the level of agreement among untrained volunteer abstractors as part of a study to evaluate the risk assessment of venous thromboembolism in patients who have undergone trauma. Forty-nine paper charts were chosen randomly from a volunteer-reviewed cohort of 2,339 and were compared with those of a single experienced abstractor. Inter-rater agreement was assessed using percent agreement, Cohen's kappa, and prevalence-adjusted bias-adjusted kappa (PABAK). Of the 71 data points, 28 had perfect agreement. The average agreement across all charts was 97%. Data with imperfect agreement had kappa values between .27 and .96 (mean, .75), with one additional value at zero even though it was associated with an agreement of 94%. PABAK values ranged from .67 to .98 (mean, .91), an average increase of .17 compared with kappa values. The performance of volunteers showed outstanding inter-rater reliability; however, limitations of interpretation can influence reliability. Copyright © 2013 Elsevier Inc. All rights reserved.
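The kappa of zero alongside 94% agreement is an instance of the prevalence effect: when nearly all charts fall in one category, chance agreement is already very high and kappa collapses. PABAK avoids this by using only the observed agreement, which for a two-category item is 2 × Po − 1. The sketch below uses a hypothetical 2×2 table for 49 charts that reproduces this pattern; the counts and the function name are assumptions for illustration and are not the study's data.

```python
def kappa_and_pabak(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Observed agreement, Cohen's kappa and PABAK for a 2x2 agreement table."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    p_obs = (both_yes + both_no) / n
    a_yes = (both_yes + a_yes_b_no) / n
    b_yes = (both_yes + a_no_b_yes) / n
    # Chance agreement from each rater's marginal "yes"/"no" proportions.
    p_exp = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    kappa = (p_obs - p_exp) / (1 - p_exp)
    pabak = 2 * p_obs - 1
    return p_obs, kappa, pabak


# Hypothetical item: the reference abstractor records "no" on all 49 charts,
# while the volunteer records "yes" on 3 of them.
p_obs, kappa, pabak = kappa_and_pabak(
    both_yes=0, a_yes_b_no=0, a_no_b_yes=3, both_no=46
)
print(f"agreement = {p_obs:.0%}, kappa = {kappa:.2f}, PABAK = {pabak:.2f}")
```

With these counts the raters agree on 46 of 49 charts, yet kappa is zero because one rater never uses the "yes" category, whereas PABAK stays high.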
Validity and Reliability of the Clinical Competency Evaluation Instrument for Use among Physiotherapy Students: Pilot study.

PubMed

Muhamad, Zailani; Ramli, Ayiesah; Amat, Salleh

2015-05-01

The aim of this study was to determine the content validity, internal consistency, test-retest reliability and inter-rater reliability of the Clinical Competency Evaluation Instrument (CCEVI) in assessing the clinical performance of physiotherapy students. This study was carried out between June and September 2013 at Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur, Malaysia. A panel of 10 experts was identified to establish content validity by evaluating and rating each of the items used in the CCEVI with regard to their relevance in measuring students' clinical competency. A total of 50 UKM undergraduate physiotherapy students were assessed throughout their clinical placement to determine the construct validity of these items. The instrument's reliability was determined through a cross-sectional study involving a clinical performance assessment of 14 final-year undergraduate physiotherapy students. The content validity index of the entire CCEVI was 0.91, while the proportion of agreement on the content validity indices ranged from 0.83 to 1.00. The CCEVI construct validity was established with factor loading of ≥0.6, while internal consistency (Cronbach's alpha) overall was 0.97. Test-retest reliability of the CCEVI was confirmed with a Pearson's correlation range of 0.91-0.97 and an intraclass correlation coefficient range of 0.95-0.98. Inter-rater reliability of the CCEVI domains ranged from 0.59 to 0.97 on initial and subsequent assessments. This pilot study confirmed the content validity of the CCEVI. It showed high internal consistency, thereby providing evidence that the CCEVI has moderate to excellent inter-rater reliability. However, additional refinement in the wording of the CCEVI items, particularly in the domains of safety and documentation, is recommended to further improve the validity and reliability of the instrument.
A study of the reliability of the Nociception Coma Scale.

PubMed

Riganello, F; Cortese, M D; Arcuri, F; Candelieri, A; Guglielmino, F; Dolce, G; Sannita, W G; Schnakers, C

2015-04-01

In this study, we investigated the reliability of the Nociception Coma Scale which has recently been developed to assess nociception in non-communicative, severely brain-injured patients. Prospective cross-sequential study. Semi-intensive care unit and long-term brain injury care. Forty-four patients diagnosed as being in a vegetative state (n=26) or in a minimally conscious state (n=18). Patients were assessed by two experts (rater A and rater B) on two consecutive weeks to measure inter-rater agreement and test-retest reliability. Total scores and subscores of the Nociception Coma Scale. We performed a total of 176 assessments. The inter-rater agreement was moderate for the total scores (k = 0.57) and fair to substantial for the subscores (0.33 ≤ k ≤ 0.62) on week 2. The test-retest reliability was substantial for the total scores (k = 0.66) and moderate to almost perfect for the subscores (0.53 ≤ k ≤ 0.96) for rater A. The inter-rater agreement was weaker on week 1, whereas the test-retest reliability was lower for the least experienced rater (rater B). This study provides further evidence of the psychometric qualities of the Nociception Coma Scale. Future studies should assess the impact of practical experience and background on administration and scoring of the scale. © The Author(s) 2014.
The German Version of the Manchester Triage System and Its Quality Criteria – First Assessment of Validity and Reliability

PubMed Central

Gräff, Ingo; Goldschmidt, Bernd; Glien, Procula; Bogdanow, Manuela; Fimmers, Rolf; Hoeft, Andreas; Kim, Se-Chan; Grigutsch, Daniel

2014-01-01

Background The German Version of the Manchester Triage System (MTS) has found widespread use in EDs across German-speaking Europe. Studies about the quality criteria validity and reliability of the MTS currently only exist for the English-language version. Most importantly, the content of the German version differs from the English version with respect to presentation diagrams and change indicators, which have a significant impact on the category assigned. This investigation offers a preliminary assessment in terms of validity and inter-rater reliability of the German MTS. Methods Construct validity of assigned MTS level was assessed based on comparisons to hospitalization (general / intensive care), mortality, ED and hospital length of stay, level of prehospital care and number of invasive diagnostics. A sample of 45,469 patients was used. Inter-rater agreement between an expert and triage nurses (reliability) was calculated separately for a subset group of 167 emergency patients. Results For general hospital admission the area under the curve (AUC) of the receiver operating characteristic was 0.749; for admission to ICU it was 0.871. An examination of MTS-level and number of deceased patients showed that the higher the priority derived from MTS, the higher the number of deaths (p<0.0001 / χ2 Test). There was a substantial difference in the 30-day survival among the 5 MTS categories (p<0.0001 / log-rank test). The AUC for predicting 30-day mortality was 0.613. Categories orange and red had the highest numbers of heart catheter and endoscopy. Category red and orange were mostly accompanied by an emergency physician, whereas categories blue and green were walk-in patients. Inter-rater agreement between the expert and triage nurses was almost perfect (κ = 0.954). Conclusion The German version of the MTS is a reliable and valid instrument for a first assessment of emergency patients in the emergency department. PMID:24586477
Evaluation of a modified Karnofsky score to assess physical and psychological wellbeing of cats in a hospital setting.

PubMed

Taffin, Elien Rl; Paepe, Dominique; Campos, Miguel; Duchateau, Luc; Goris, Nesya; De Roover, Katrien; Daminet, Sylvie

2016-11-01

Objectives The Karnofsky score (KS) modified for cats, a scoring system to rate health and quality of life (QOL) in cats, is used in clinical trials, but its reliability and validity are yet to be determined. The present study aims to evaluate the scientific robustness of the KS when adapted for use in a hospital setting. Methods A list of variables to consider during the physical examination, which informs the clinician's score (CS) part of the KS, was added and clinicians were allowed to choose a score anywhere between 0 and 50. The Karnofsky QOL questionnaire was adapted for use in a hospital setting. F-tests with Bonferroni correction and Spearman rank correlation coefficients were used to evaluate reliability and validity of the KS to assess the health and wellbeing of cats in a hospital setting. The records of 54 feline immunodeficiency virus-positive cats, which were recruited for a clinical trial and hospitalised for 6 weeks, were reviewed. Four veterinarians scored the CS, and one veterinarian and a veterinary nurse assessed the QOL score. Results Mean absolute difference between observers was significantly larger for the CS than for the QOL score (P < 0.001) and two veterinarians scored significantly higher than the remaining two veterinarians (P < 0.001). Inter-observer correlation ranged from 0.45-0.75 for the CS. For the QOL score, the absolute difference between observers was small, no significant difference was found between observers and a high degree of inter-observer correlation was noted (r = 0.91). Conclusions and relevance The results indicate low inter-observer reliability for the CS, requiring additional modifications to this part of the KS. The QOL score seems more reliable, and the questionnaire may serve as a reliable tool in the assessment of QOL in cats in a hospital setting. Consequently, further adaptation of the KS is mandatory when simultaneous assessment of both the cat's clinical health and perceived wellbeing is required.
Psychometric testing of the modified Care Dependency Scale among hospitalized school-aged children in Germany.

PubMed

Tork, Hanan; Lohrmann, Christa; Dassen, Theo

2008-03-01

The objectives of this study were to examine the psychometric properties of the modified Care Dependency Scale in a pediatric setting and to explore the extent of dependency of school-aged children regarding their self-care. The data were collected from 130 hospitalized children, aged 6-12 years. The reliability was determined by Cronbach's alpha, which showed a high level of consistency. The subsequent inter-rater reliability revealed moderate-to-substantial agreement. The criterion-related validity was tested by comparing the sum scores of the Care Dependency Scale for Paediatrics and the Visual Analog Scale. Factor analysis was used to investigate the construct validity and resulted in a one-factor solution. In conclusion, this study provides evidence that the Care Dependency Scale for Paediatrics is a valid and reliable measure that offers a comprehensive assessment from a nursing perspective and enables nurses to help children acquire independence.
Inter-Rater Agreement of Pressure Ulcer Risk and Prevention Measures in the National Database of Nursing Quality Indicators(®) (NDNQI).

PubMed

Waugh, Shirley Moore; Bergquist-Beringer, Sandra

2016-06-01

In this descriptive multi-site study, we examined inter-rater agreement on 11 National Database of Nursing Quality Indicators(®) (NDNQI(®)) pressure ulcer (PrU) risk and prevention measures. One hundred twenty raters at 36 hospitals captured data from 1,637 patient records. At each hospital, agreement between the most experienced rater and each other team rater was calculated for each measure. In the ratings studied, 528 patients were rated as "at risk" for PrU and, therefore, were included in calculations of agreement for the prevention measures. Prevalence-adjusted kappa (PAK) was used to interpret inter-rater agreement because prevalence of single responses was high. The PAK values for eight measures indicated "substantial" to "near perfect" agreement between most experienced and other team raters: Skin assessment on admission (.977, 95% CI [.966-.989]), PrU risk assessment on admission (.978, 95% CI [.964-.993]), Time since last risk assessment (.790, 95% CI [.729-.852]), Risk assessment method (.997, 95% CI [.991-1.0]), Risk status (.877, 95% CI [.838-.917]), Any prevention (.856, 95% CI [.76-.943]), Skin assessment (.956, 95% CI [.904-1.0]), and Pressure-redistribution surface use (.839, 95% CI [.763-.916]). For three intervention measures, PAK values fell below the recommended value of ≥.610: Routine repositioning (.577, 95% CI [.494-.661]), Nutritional support (.500, 95% CI [.418-.581]), and Moisture management (.556, 95% CI [.469-.643]). Areas of disagreement were identified. Findings provide support for the reliability of 8 of the 11 measures. Further clarification of data collection procedures is needed to improve reliability for the less reliable measures. © 2016 Wiley Periodicals, Inc.
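The abstract above prefers a prevalence-adjusted kappa because one response dominated. The sketch below shows why that matters. It assumes the adjustment behaves like the widely used prevalence- and bias-adjusted form, (k * Po - 1) / (k - 1) for k categories, which may differ from the exact statistic the authors used. The two raters and their 100 records are invented for illustration.

```python
from collections import Counter


def percent_agreement(a, b):
    """Observed proportion of records on which both raters agree (Po)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa: (Po - Pe) / (1 - Pe)."""
    n = len(a)
    po = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal proportions.
    pe = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    return (po - pe) / (1 - pe)


def prevalence_adjusted_kappa(a, b, n_categories=2):
    """Assumed adjusted form: chance agreement fixed at 1 / n_categories."""
    po = percent_agreement(a, b)
    return (n_categories * po - 1) / (n_categories - 1)


# Hypothetical data: "yes" is recorded for almost every patient.
rater_a = ["yes"] * 96 + ["yes"] * 2 + ["no"] * 2
rater_b = ["yes"] * 96 + ["no"] * 2 + ["yes"] * 2

po = percent_agreement(rater_a, rater_b)
kappa = cohen_kappa(rater_a, rater_b)
adjusted = prevalence_adjusted_kappa(rater_a, rater_b)
print(round(po, 3), round(kappa, 3), round(adjusted, 3))
```

With these invented records the raters agree on 96% of cases, yet the unadjusted kappa is close to zero because chance agreement is already about 96%; the adjusted value stays high.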
Hip- and knee-strength assessments using a hand-held dynamometer with external belt-fixation are inter-tester reliable.

PubMed

Thorborg, Kristian; Bandholm, Thomas; Hölmich, Per

2013-03-01

In football, ice-hockey, and track and field, injuries have been predicted, and hip- and knee-strength deficits quantified using hand-held dynamometry (HHD). However, systematic bias exists when testers of different sex and strength perform the measurements. Belt-fixation of the dynamometer may resolve this. The aim of the present study was therefore to examine the inter-tester reliability concerning strength assessments of isometric hip abduction, adduction, flexion, extension and knee-flexion strength, using HHD with external belt-fixation. Twenty-one healthy athletes (6 women), 30 (8.6) (mean (SD)) years of age, were included. Two physiotherapy students (1 female and 1 male) performed all the measurements after careful instruction and procedure training. Isometric hip abduction, adduction, flexion, extension, and knee-flexion strength were tested. The tester-order and hip-action order were randomised. No systematic between-tester differences (bias) were observed for any of the hip or knee actions. The intra-class correlation coefficients (ICC 2.1) ranged from 0.76 to 0.95. Furthermore, standard errors of measurement in per cent (SEM %) ranged from 5 to 11 %, and minimal detectable change in per cent (MDC %) from 14 to 29 % for the different hip and knee actions. The present study shows that isometric hip- and knee-strength measurements have acceptable inter-tester reliability at the group level, when testing strong individuals, using HHD with belt-fixation. This procedure is therefore perfectly suited for the evaluation and monitoring of strong athletes with hip, groin and hamstring injuries, some of the most common and troublesome injuries in sports. Diagnostic, Level III.
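The abstract above reports ICC(2,1), SEM % and MDC % without giving formulas. A minimal sketch of one common way to obtain them follows: ICC(2,1) from the two-way ANOVA mean squares (two-way random effects, absolute agreement, single measurement), SEM as SD x sqrt(1 - ICC), and MDC at the 95% level as 1.96 x sqrt(2) x SEM. Using the pooled SD of all measurements is an assumption, since conventions vary, and the small table holds illustrative numbers, not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def icc_2_1(table):
    """ICC(2,1) from a table with one row per subject and one column per tester."""
    n, k = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(row[j] for row in table) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ms_rows = ss_rows / (n - 1)                      # between-subject mean square
    ms_cols = ss_cols / (k - 1)                      # between-tester mean square
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def sem_and_mdc(table, icc):
    """SEM and MDC95, in measurement units and as a percentage of the grand mean."""
    values = [v for row in table for v in row]
    sem = stdev(values) * sqrt(1 - icc)   # pooled SD is an assumed convention
    mdc95 = 1.96 * sqrt(2) * sem
    avg = mean(values)
    return {
        "SEM": sem,
        "MDC95": mdc95,
        "SEM%": 100 * sem / avg,
        "MDC%": 100 * mdc95 / avg,
    }


# Illustrative ratings: 6 subjects (rows) scored by 4 testers (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

icc = icc_2_1(ratings)
stats = sem_and_mdc(ratings, icc)
print(round(icc, 2), {name: round(value, 2) for name, value in stats.items()})
```

Because ICC(2,1) penalises systematic differences between testers, a table like this one, where one tester scores consistently higher, yields a low coefficient even though the testers rank subjects similarly.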
Is One Trial Sufficient to Obtain Excellent Pressure Pain Threshold Reliability in the Low Back of Asymptomatic Individuals? A Test-Retest Study.

PubMed

Balaguier, Romain; Madeleine, Pascal; Vuillerme, Nicolas

2016-01-01

The assessment of pressure pain threshold (PPT) provides a quantitative value related to the mechanical sensitivity to pain of deep structures. Although excellent reliability of PPT has been reported in numerous anatomical locations, its absolute and relative reliability in the lower back region remains to be determined. Because of the high prevalence of low back pain in the general population and because low back pain is one of the leading causes of disability in industrialized countries, assessing pressure pain thresholds over the low back is of particular interest. The purpose of this study was (1) to evaluate the intra- and inter-session absolute and relative reliability of PPT within 14 locations covering the low back region of asymptomatic individuals and (2) to determine the number of trials required to ensure reliable PPT measurements. Fifteen asymptomatic subjects were included in this study. PPTs were assessed among 14 anatomical locations in the low back region over two sessions separated by a one-hour interval. For the two sessions, three PPT assessments were performed on each location. Reliability was assessed by computing intraclass correlation coefficients (ICC), standard error of measurement (SEM) and minimum detectable change (MDC) for all possible combinations between trials and sessions. Bland-Altman plots were also generated to assess potential bias in the dataset. Relative reliability for both intra- and inter-session was almost perfect, with ICCs ranging from 0.85 to 0.99. With respect to the intra-session, no statistical difference was reported for ICCs and SEM regardless of the conducted comparisons between trials. Conversely, for inter-session, ICCs and SEM values were significantly larger when two consecutive PPT measurements were used for data analysis. No significant difference was observed for the comparison between two consecutive measurements and three measurements.
Excellent relative and absolute reliabilities were reported for both intra- and inter-session. Reliable measurements can be equally achieved when using the mean of two or three consecutive PPT measurements, as usually proposed in the literature, or with only the first one. Although reliability was almost perfect regardless of the conducted comparison between PPT assessments, our results suggest using two consecutive measurements to obtain higher short term absolute reliability.
Inter-Rater Reliability of Cyclotorsion Measurements Using Fundus Photography.

PubMed

Dysli, Muriel; Kanku, Madeleine; Traber, Ghislaine L

2018-04-01

The foveo-papillary angle (FPA) on fundus photographs is the accepted standard for the measurement of ocular cyclotorsion. We assessed the inter-rater reliability of this method in healthy subjects and in patients with trochlear nerve palsies. In this methodological study, fundus photographs of healthy subjects and of patients with trochlear nerve palsies were made with a fundus camera (Zeiss Fundus Camera FF 450 plus, Jena, Germany). Three independent observers measured the FPA on the fundus photographs of all subjects in synedra View (synedra View 16, Version 16.0.0.11, Innsbruck, Austria). One hundred and four eyes of 52 subjects (26 healthy controls and 26 patients) were assessed. The mean FPA of the healthy controls was 5.80 degrees (°) [± 0.44 standard error of the mean (SEM)] compared to 11.55° (± 0.80 SEM) for patients with trochlear nerve palsies. The inter-rater reliability of all measured FPAs showed an intraclass correlation coefficient (ICC) of 0.98 (95% CI 0.97 - 0.98). The inter-rater reliability of objective cyclotorsion measurements using fundus photographs was very high. Georg Thieme Verlag KG Stuttgart · New York.
An empirical look at the Defense Mechanism Test (DMT): reliability and construct validity.

PubMed

Ekehammar, Bo; Zuber, Irena; Konstenius, Marja-Liisa

2005-07-01

Although the Defense Mechanism Test (DMT) has been in use for almost half a century, there are still quite contradictory views about whether it is a reliable instrument, and if so, what it really measures. Thus, based on data from 39 female students, we first examined DMT inter-coder reliability by analyzing the agreement among trained judges in their coding of the same DMT protocols. Second, we constructed a "parallel" photographic picture that retained all structural characteristics of the original and analyzed DMT parallel-test reliability. Third, we examined the construct validity of the DMT by (a) employing three self-report defense-mechanism inventories and analyzing the intercorrelations between DMT defense scores and corresponding defenses in these instruments, (b) studying the relationships between DMT responses and scores on trait and state anxiety, and (c) relating DMT-defense scores to measures of self-esteem. The main results showed that the DMT can be coded with high reliability by trained coders, that the parallel-test reliability is unsatisfactory compared to traditional psychometric standards, that there is a certain generalizability in the number of perceptual distortions that people display from one picture to another, and that the construct validation provided meager empirical evidence for the conclusion that the DMT measures what it purports to measure, that is, psychological defense mechanisms.
The influence of incubation time, sample preparation and exposure to oxygen on the quality of the MALDI-TOF MS spectrum of anaerobic bacteria.

PubMed

Veloo, A C M; Elgersma, P E; Friedrich, A W; Nagy, E; van Winkelhoff, A J

2014-12-01

With matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), bacteria can be identified quickly and reliably. This applies especially to anaerobic bacteria. Because growth rate and oxygen sensitivity differ among anaerobic bacteria, we aimed to study the influence of incubation time, exposure to oxygen and sample preparation on the quality of the spectrum using the Bruker system. Also, reproducibility and inter-examiner variability were determined. Twenty-six anaerobic species, representing 17 genera, were selected based on gram-stain characteristics, growth rate and colony morphology. Inter-examiner variation showed that experience in the preparation of the targets can be a significant variable. The influence of incubation time was determined between 24 and 96 h of incubation. Reliable species identification was obtained after 48 h of incubation for gram-negative anaerobes and after 72 h for gram-positive anaerobes. Exposure of the cultures to oxygen did not influence the results of the MALDI-TOF MS identifications of all tested gram-positive species. Fusobacterium necrophorum and Prevotella intermedia could not be identified after >24 h and 48 h of exposure to oxygen, respectively. Other tested gram-negative bacteria could be identified after 48 h of exposure to oxygen. Most of the tested species could be identified using the direct spotting method. Bifidobacterium longum and Finegoldia magna needed on-target extraction with 70% formic acid in order to obtain reliable species identification and Peptoniphilus ivorii a full extraction. Spectrum quality was influenced by the amount of bacteria spotted on the target, the homogeneity of the smear and the experience of the examiner. © 2014 The Authors Clinical Microbiology and Infection © 2014 European Society of Clinical Microbiology and Infectious Diseases.
Reliability of 3D laser-based anthropometry and comparison with classical anthropometry.

PubMed

Kuehnapfel, Andreas; Ahnert, Peter; Loeffler, Markus; Broda, Anja; Scholz, Markus

2016-05-26

Anthropometric quantities are widely used in epidemiologic research as possible confounders, risk factors, or outcomes. 3D laser-based body scans (BS) allow evaluation of dozens of quantities in short time with minimal physical contact between observers and probands. The aim of this study was to compare BS with classical manual anthropometric (CA) assessments with respect to feasibility, reliability, and validity. We performed a study on 108 individuals with multiple measurements of BS and CA to estimate intra- and inter-rater reliabilities for both. We suggested BS equivalents of CA measurements and determined validity of BS considering CA the gold standard. Throughout the study, the overall concordance correlation coefficient (OCCC) was chosen as indicator of agreement. BS was slightly more time consuming but better accepted than CA. For CA, OCCCs for intra- and inter-rater reliability were greater than 0.8 for all nine quantities studied. For BS, 9 of 154 quantities showed reliabilities below 0.7. BS proxies for CA measurements showed good agreement (minimum OCCC > 0.77) after offset correction. Thigh length showed higher reliability in BS while upper arm length showed higher reliability in CA. Except for these issues, reliabilities of CA measurements and their BS equivalents were comparable.
Validation of the Dementia Care Assessment Packet-Instrumental Activities of Daily Living

PubMed Central

Lee, Seok Bum; Park, Jeong Ran; Yoo, Jeong-Hwa; Park, Joon Hyuk; Lee, Jung Jae; Yoon, Jong Chul; Jhoo, Jin Hyeong; Lee, Dong Young; Woo, Jong Inn; Han, Ji Won; Huh, Yoonseok; Kim, Tae Hui

2013-01-01

Objective We aimed to evaluate the psychometric properties of the IADL measure included in the Dementia Care Assessment Packet (DCAP-IADL) in dementia patients. Methods The study involved 112 dementia patients and 546 controls. The DCAP-IADL was scored in two ways: observed score (OS) and predicted score (PS). The reliability of the DCAP-IADL was evaluated by testing its internal consistency, inter-rater reliability and test-retest reliability. Discriminant validity was evaluated by comparing the mean OS and PS between dementia patients and controls by ANCOVA. Pearson or Spearman correlation analysis was performed with other instruments to assess concurrent validity. Receiver operating characteristics curve analysis was performed to examine diagnostic accuracy. Results Cronbach's α coefficients of the DCAP-IADL were above 0.7. The values in dementia patients were much higher (OS=0.917, PS=0.927), indicating excellent degrees of internal consistency. Inter-rater reliabilities and test-retest reliabilities were statistically significant (p<0.05). PS exhibited higher reliabilities than OS. The mean OS and PS of dementia patients were significantly higher than those of the non-demented group after controlling for age, sex and education level. The DCAP-IADL was significantly correlated with other IADL instruments and MMSE-KC (p<0.001). Areas under the curves of the DCAP-IADL were above 0.9. Conclusion The DCAP-IADL is a reliable and valid instrument for evaluating instrumental ability of daily living for the elderly, and may also be useful for screening dementia. Moreover, administering PS may enable the DCAP-IADL to overcome the differences in gender, culture and lifestyle that hinder accurate evaluation of the elderly in previous IADL instruments. PMID:24302946
Diminished neural network dynamics after moderate and severe traumatic brain injury.

PubMed

Gilbert, Nicholas; Bernier, Rachel A; Calhoun, Vincent D; Brenner, Einat; Grossner, Emily; Rajtmajer, Sarah M; Hillary, Frank G

2018-01-01

Over the past decade there has been increasing enthusiasm in the cognitive neurosciences around using network science to understand the system-level changes associated with brain disorders. A growing literature has used whole-brain fMRI analysis to examine changes in the brain's subnetworks following traumatic brain injury (TBI). Much of network modeling in this literature has focused on static network mapping, which provides a window into gross inter-nodal relationships, but is insensitive to more subtle fluctuations in network dynamics, which may be an important predictor of neural network plasticity. In this study, we examine the dynamic connectivity with focus on state-level connectivity (state) and evaluate the reliability of dynamic network states over the course of two runs of intermittent task and resting data. The goal was to examine the dynamic properties of neural networks engaged periodically with task stimulation in order to determine: 1) the reliability of inter-nodal and network-level characteristics over time and 2) the transitions between distinct network states after traumatic brain injury. To do so, we enrolled 23 individuals with moderate and severe TBI at least 1-year post injury and 19 age- and education-matched healthy adults using functional MRI methods, dynamic connectivity modeling, and graph theory. The results reveal several distinct network "states" that were reliably evident when comparing runs; the overall frequency of dynamic network states is highly reproducible (r-values>0.8) for both samples. Analysis of movement between states resulted in fewer state transitions in the TBI sample and, in a few cases, brain injury resulted in the appearance of states not exhibited by the healthy control (HC) sample. Overall, the findings presented here demonstrate the reliability of observable dynamic mental states during periods of on-task performance and support emerging evidence that brain injury may result in diminished network dynamics.
Diminished neural network dynamics after moderate and severe traumatic brain injury

PubMed Central

Gilbert, Nicholas; Bernier, Rachel A.; Calhoun, Vincent D.; Brenner, Einat; Grossner, Emily; Rajtmajer, Sarah M.

2018-01-01

Over the past decade there has been increasing enthusiasm in the cognitive neurosciences around using network science to understand the system-level changes associated with brain disorders. A growing literature has used whole-brain fMRI analysis to examine changes in the brain’s subnetworks following traumatic brain injury (TBI). Much of network modeling in this literature has focused on static network mapping, which provides a window into gross inter-nodal relationships, but is insensitive to more subtle fluctuations in network dynamics, which may be an important predictor of neural network plasticity. In this study, we examine the dynamic connectivity with focus on state-level connectivity (state) and evaluate the reliability of dynamic network states over the course of two runs of intermittent task and resting data. The goal was to examine the dynamic properties of neural networks engaged periodically with task stimulation in order to determine: 1) the reliability of inter-nodal and network-level characteristics over time and 2) the transitions between distinct network states after traumatic brain injury. To do so, we enrolled 23 individuals with moderate and severe TBI at least 1-year post injury and 19 age- and education-matched healthy adults using functional MRI methods, dynamic connectivity modeling, and graph theory. The results reveal several distinct network “states” that were reliably evident when comparing runs; the overall frequency of dynamic network states is highly reproducible (r-values>0.8) for both samples. Analysis of movement between states resulted in fewer state transitions in the TBI sample and, in a few cases, brain injury resulted in the appearance of states not exhibited by the healthy control (HC) sample. Overall, the findings presented here demonstrate the reliability of observable dynamic mental states during periods of on-task performance and support emerging evidence that brain injury may result in diminished network dynamics.
PMID:29883447
Inter-Rater Reliability of Neck Reflex Points in Women with Chronic Neck Pain.

PubMed

Weinschenk, Stefan; Göllner, Richard; Hollmann, Markus W; Hotz, Lorenz; Picardi, Susanne; Hubbert, Katharina; Strowitzki, Thomas; Meuser, Thomas

2016-01-01

Neck reflex points (NRP) are tender soft tissue areas of the cervical region that display reflectory changes in response to chronic inflammations of correlated regions in the visceral cranium. Six bilateral areas, NRP C0, C1, C2, C3, C4 and C7, are detectable by palpating the lateral neck. We investigated the inter-rater reliability of NRP to assess their potential clinical relevance. 32 consecutive patients with chronic neck pain were examined for NRP tenderness by an experienced physician and an inexperienced medical student in a blinded design. A detailed description of the palpation technique is included in this section. Absence of pain was defined as pain index (PI) = 0, slight tenderness = 1, and marked pain = 2. Findings were evaluated either by pair-wise Cohen's kappa (κ) or by percentage of agreement (PA). Examiners identified 40% and 41% of positive NRP, respectively (PI > 0, physician: 155, student: 157) with a slight preference for the left side (1.2:1). The number of patients identified with >6 positive NRP by the examiners was similar (13 vs. 12 patients). κ values ranged from 0.52 to 0.95. The overall kappa was κ = 0.80 for the left and κ = 0.74 for the right side. PA varied from 78.1% to 96.9% with strongest agreement at NRP C0, NRP C2, and NRP C7. Inter-rater agreement was independent of patients' age, gender, body mass index and examiner's experience. The high reproducibility suggests the clinical relevance of NRP in women. © 2016 S. Karger GmbH, Freiburg.
Establishing inter-rater reliability scoring in a state trauma system.

PubMed

Read-Allsopp, Christine

2004-01-01

Trauma systems rely on accurate Injury Severity Scoring (ISS) to describe trauma patient populations. Twenty-seven (27) Trauma Nurse Coordinators and Data Managers across the New South Wales, Australia state trauma network were instructed in the uses and techniques of the Abbreviated Injury Scale (AIS) from the Association for the Advancement of Automotive Medicine. The aim was to provide accurate, reliable and valid data for the state trauma network. Four (4) months after the course a coding exercise was conducted to assess inter-rater reliability. The results show that inter-rater reliability is within accepted international standards.
Inter-rater reliability of measures to characterize the tobacco retail environment in Mexico.

PubMed

Hall, Marissa G; Kollath-Cattano, Christy; Reynales-Shigematsu, Luz Myriam; Thrasher, James F

2015-01-01

To evaluate the inter-rater reliability of a data collection instrument to assess the tobacco retail environment in Mexico, after major marketing regulations were implemented. In 2013, two data collectors independently evaluated 21 stores in two census tracts, through a data collection instrument that assessed the presence of price promotions, whether single cigarettes were sold, the number of visible advertisements, the presence of signage prohibiting the sale of cigarettes to minors, and characteristics of cigarette pack displays. We evaluated the inter-rater reliability of the collected data, through the calculation of metrics such as intraclass correlation coefficient, percent agreement, Cohen's kappa and Krippendorff's alpha. Most measures demonstrated substantial or perfect inter-rater reliability. Our results indicate the potential utility of the data collection instrument for future point-of-sale research.

Measuring the suffering of end-stage dementia: reliability and validity of the Mini-Suffering State Examination.

PubMed

Aminoff, Bechor Z; Purits, Elena; Noy, Shlomo; Adunsky, Abraham

2004-01-01

Assessment of suffering is extremely important in dying end-stage dementia patients (ESDP). We have developed and examined the reliability and validity of the Mini-Suffering State Examination (MSSE), in 103 consecutive bedridden ESDP. Main outcome measures included inter-observer reliability and concurrent validity. Reliability of the MSSE questionnaire was satisfactory, with Cronbach alpha values of 0.735 and 0.718 for the two physicians (Ph-1, Ph-2), respectively. The kappa agreement coefficient was 0.791. There was a high agreement for seven items (kappa 0.882-0.972) and a substantial agreement for the other three items (kappa 0.621-0.682) of the MSSE. MSSE was validated versus the comfort assessment in dying with dementia (CAD-EOLD) scale and resulted in a significant Pearson correlation (r=-0.796, P<0.001). We conclude that the MSSE scale is a reliable and valid clinical tool, recommended for evaluating the severity of the patient's condition and the level of suffering of ESDP. Use of MSSE may improve medical management and facilitate communication between patients and caregivers.
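The abstract above reports Cronbach's alpha for each physician's scoring of the scale items. As a minimal sketch, alpha can be computed from the item variances and the variance of the total score: alpha = k / (k - 1) x (1 - sum of item variances / variance of totals), where k is the number of items. The function name and the yes/no scores below are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one row per patient, each row holding that patient's k item scores."""
    k = len(scores[0])
    # Sample variance of each item across patients.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the per-patient total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical yes/no (1/0) item scores for six patients on a 4-item scale.
example_scores = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
alpha = cronbach_alpha(example_scores)

# Sanity check: items that move in perfect lockstep give alpha = 1.
lockstep = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha_lockstep = cronbach_alpha(lockstep)
print(round(alpha, 3), round(alpha_lockstep, 3))
```

Alpha describes how consistently the items vary together within one rater's scores; agreement between the two physicians is a separate question, which the abstract addresses with kappa.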
[Primary care screening of problems in the elderly and a proposal for a screening protocol with a multidimensional approach].

PubMed

Lino, Valéria Teresa Saraiva; Portela, Margareth Crisóstomo; Camacho, Luiz Antonio Bastos; Rodrigues, Nadia Cristina Pinheiro; Andrade, Monica Kramer de Noronha; O'Dwyer, Gisele

2016-07-21

The objectives were to examine psychometric properties of a screening test for the elderly and to propose a protocol for use in primary care. The method consisted of four stages: (1) inter-evaluator reliability for performance tests and self-assessment questions for eight functions; (2) sensitivity and specificity of questions on depression and social support; (3) meeting of experts to select instrumental activities of daily living (IADL); and (4) elaboration of the protocol. Screening lasted 16 minutes. Inter-evaluator reliability was excellent for performance tests but poor for questions. Depression and social support showed satisfactory sensitivity and specificity (0.74/0.77 and 0.77/0.96). Four IADL were selected by more than 55% of the experts. Following the results, a screening protocol was elaborated that prioritized the use of performance tests, maintaining questions on mood, social support, and IADL. The study suggests better reproducibility of performance tests when compared to questions. For mood and social support, the questions may provide a first screening stage. The proposed protocol allows rapid screening of problems.
Test Assembly Implications for Providing Reliable and Valid Subscores

ERIC Educational Resources Information Center

Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J.

2017-01-01

This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…
Validity and inter-observer reliability of subjective hand-arm vibration assessments.

PubMed

Coenen, Pieter; Formanoy, Margriet; Douwes, Marjolein; Bosch, Tim; de Kraker, Heleen

2014-07-01

Exposure to mechanical vibrations at work (e.g., due to handling powered tools) is a potential occupational risk as it may cause upper extremity complaints. However, reliable and valid assessment methods for vibration exposure at work are lacking. Measuring hand-arm vibration objectively is often difficult and expensive, while the often-used information provided by manufacturers lacks detail. Therefore, a subjective hand-arm vibration assessment method was tested for validity and inter-observer reliability. In an experimental protocol, sixteen tasks handling powered tools were executed by two workers. Hand-arm vibration was assessed subjectively by 16 observers according to the proposed subjective assessment method. As a gold standard reference, hand-arm vibration was measured objectively using a vibration measurement device. Weighted κ's were calculated to assess validity, and intra-class correlation coefficients (ICCs) were calculated to assess inter-observer reliability. Inter-observer reliability of the subjective assessments, depicting the agreement among observers, can be expressed by an ICC of 0.708 (0.511-0.873). The validity of the subjective assessments as compared to the gold-standard reference can be expressed by a weighted κ of 0.535 (0.285-0.785). In addition, the percentage of exact agreement of the subjective assessment compared to the objective measurement was relatively low (i.e., 52% of all tasks). This study shows that subjectively assessed hand-arm vibrations are fairly reliable among observers and moderately valid. This assessment method is a first attempt to use subjective risk assessments of hand-arm vibration. Although this assessment method can benefit from some future improvement, it can be of use in future studies and in field-based ergonomic assessments. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.
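
The weighted kappa used here for validity gives partial credit when two ordinal ratings are close but not identical. The sketch below assumes linear disagreement weights, |i - j| / (k - 1); the abstract does not state which weighting scheme the authors used, and the ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted kappa for ordinal ratings coded 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    k = n_categories
    # Cross-tabulate the two sets of ratings.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected_disagreement == 0:
        raise ValueError("weighted kappa is undefined when only one category is used")
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical 3-level vibration classes (0 = low, 1 = medium, 2 = high).
subjective = [0, 1, 2, 2]
objective = [0, 2, 2, 1]
print(weighted_kappa(subjective, objective, 3))
```

With these weights a one-category miss costs half as much as a two-category miss, so near-misses reduce the coefficient less than they would under unweighted kappa.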
The reliability of Little's Irregularity Index for the upper dental arch using three dimensional (3D) digital models.

PubMed

Burns, Angus; Dowling, Adam H; Garvey, Thérèse M; Fleming, Garry J P

2014-10-01

To investigate the inter-examiner variability of contact point displacement measurements (used to calculate the overall Little's Irregularity Index (LII) score) from digital models of the maxillary arch by four independent examiners. Maxillary orthodontic pre-treatment study models of ten patients were scanned using the Lava™ Chairside Oral Scanner (LCOS) and 3D digital models were created using Creo® computer aided design (CAD) software. Four independent examiners measured the contact point displacements of the anterior maxillary teeth using the software. Measurements were recorded randomly on three separate occasions by the examiners and the measurements (n=600) obtained were analysed using correlation analyses and analyses of variance (ANOVA). LII contact point displacement measurements for the maxillary arch were reproducible for inter-examiner assessment when using the digital method and were highly correlated between examiner pairs for contact point displacement measurements >2mm. The digital measurement technique showed poor correlation for smaller contact point displacement measurements (<2mm) for repeated measurements. The coefficient of variation (CoV) of the digital contact point displacement measurements showed that 348 of the 600 measurements differed by more than 20% of the mean, compared with 516 of 600 for the same measurements performed using the conventional LII measurement technique. Although the inter-examiner variability of LII contact point displacement measurements on the maxillary arch was reduced using the digital compared with the conventional LII measurement methodology, neither method was considered appropriate for orthodontic research purposes, particularly when measuring small contact point displacements. Copyright © 2014 Elsevier Ltd. All rights reserved.
Effects of questionnaire-based diagnosis and training on inter-rater reliability among practitioners of traditional Chinese medicine.

PubMed

Mist, Scott; Ritenbaugh, Cheryl; Aickin, Mikel

2009-07-01

To investigate whether a training process that focused on a questionnaire-based diagnosis in Traditional Chinese Medicine (TCM), and developing diagnostic consensus, would improve the agreement of TCM diagnoses among 10 TCM practitioners evaluating patients with temporomandibular joint disorder (TMJD). Evaluation of a diagnostic training program at the Department of Family and Community Medicine, University of Arizona, Tucson, Arizona, and the Oregon College of Oriental Medicine, Portland, Oregon. Screened participants for a study of TCM for TMJD. PRACTITIONERS: Ten (10) licensed acupuncturists with a minimum of 5 years licensure and education in Chinese herbs. A training session using a questionnaire-based diagnostic form was conducted, followed by waves of diagnostic sessions. Between sessions, practitioners discussed the results of the previous round of participants with a focus on reducing variability in primary diagnosis and severity rating of each diagnosis: 3 waves of 5 patients were assessed by 4 practitioner pairs for a total of 120 diagnoses. At 18 months, practitioners completed a recalibration exercise with a similar format with a total of 32 diagnoses. These diagnoses were then examined with respect to the rate of agreement among the 10 practitioners using inter-rater correlations and kappas. The inter-rater correlation with respect to the TCM diagnoses among the 10 practitioners increased from 0.112 to 0.618 with training. Statistically significant improvements were found between the baseline and 18 month exercises (p < 0.01). Inter-rater reliability of TCM diagnosis may be improved through a training process and a questionnaire-based diagnosis process. The improvements varied by diagnosis, with the greatest congruence among primary and more severe diagnoses. Future TCM studies should consider including calibration training to improve the validity of results.
Development and Validation of a Family Meeting Assessment Tool (FMAT).

PubMed

Hagiwara, Yuya; Healy, Jennifer; Lee, Shuko; Ross, Jeanette; Fischer, Dixie; Sanchez-Reilly, Sandra

2018-01-01

A cornerstone procedure in Palliative Medicine is to perform family meetings. Learning how to lead a family meeting is an important skill for physicians and others who care for patients with serious illnesses and their families. There is limited evidence on how to assess best practice behaviors during end-of-life family meetings. Our aim was to develop and validate an observational tool to assess trainees' ability to lead a simulated end-of-life family meeting. Building on evidence from published studies and accrediting agency guidelines, an expert panel at our institution developed the Family Meeting Assessment Tool. All fourth-year medical students (MS4) and eight geriatric and palliative medicine fellows (GPFs) were invited to participate in a Family Meeting Objective Structured Clinical Examination, where each trainee assumed the physician role leading a complex family meeting. Two evaluators observed and rated randomly chosen students' performances using the Family Meeting Assessment Tool during the examination. Inter-rater reliability was measured using percent agreement. Internal consistency was measured using Cronbach α. A total of 141 trainees (MS4 = 133 and GPF = 8) and 26 interdisciplinary evaluators participated in the study. Internal reliability (Cronbach α) of the tool was 0.85. Number of trainees rated by two evaluators was 210 (MS4 = 202 and GPF = 8). Rater agreement was 84%. Composite scores, on average, were significantly higher for fellows than for medical students (P < 0.001). Expert-based content, high inter-rater reliability, good internal consistency, and ability to predict educational level provided initial evidence for construct validity for this novel assessment tool. Copyright © 2017 American Academy of Hospice and Palliative Medicine. All rights reserved.
The Reliability of Environmental Measures of the College Alcohol Environment.

ERIC Educational Resources Information Center

Clapp, John D.; Whitney, Mike; Shillington, Audrey M.

2002-01-01

Assesses the inter-rater reliability of two environmental scanning tools designed to identify alcohol-related advertisements targeting college students. Inter-rater reliability for these forms varied across different rating categories and ranged from poor to excellent. Suggestions for future research are addressed. (Contains 26 references and 6…
The 2007 AASM recommendations for EEG electrode placement in polysomnography: impact on sleep and cortical arousal scoring.

PubMed

Ruehland, Warren R; O'Donoghue, Fergal J; Pierce, Robert J; Thornton, Andrew T; Singh, Parmjit; Copland, Janet M; Stevens, Bronwyn; Rochford, Peter D

2011-01-01

To examine the impact of using American Academy of Sleep Medicine (AASM) recommended EEG derivations (F4/M1, C4/M1, O2/M1) vs. a single derivation (C4/M1) in polysomnography (PSG) on the measurement of sleep and cortical arousals, including inter- and intra-observer variability. Prospective, non-blinded, randomized comparison. Three Australian tertiary-care hospital clinical sleep laboratories. 30 PSGs from consecutive patients investigated for obstructive sleep apnea (OSA) during December 2007 and January 2008. To examine the impact of EEG derivations on PSG summary statistics, 3 scorers from different Australian clinical sleep laboratories each scored separate sets of 10 PSGs twice, once using 3 EEG derivations and once using 1 EEG derivation. To examine the impact on inter- and intra-scorer reliability, all 3 scorers scored a subset of 10 PSGs 4 times, twice using each method. All PSGs were de-identified and scored in random order according to the 2007 AASM Manual for the Scoring of Sleep and Associated Events. Using 3 referential EEG derivations during PSG, as recommended in the AASM manual, instead of a single central EEG derivation, as originally suggested by Rechtschaffen and Kales (1968), resulted in a mean ± SE decrease in N1 sleep of 9.6 ± 3.9 min (P = 0.018) and an increase in N3 sleep of 10.6 ± 2.8 min (P = 0.001). No significant differences were observed for any other sleep or arousal scoring summary statistics; nor were any differences observed in inter-scorer or intra-scorer reliability for scoring sleep or cortical arousals. This study provides information for those changing practice to comply with the 2007 AASM recommendations for EEG placement in PSG, for those using portable devices that are unable to comply with the recommendations due to limited channel options, and for the development of future standards for PSG scoring and recording.
As the use of multiple EEG derivations only led to small changes in the distribution of derived sleep stages and no significant differences in scoring reliability, this study calls into question the need to use multiple EEG derivations in clinical PSG as suggested in the AASM manual.
Longitudinal Improvement in Balance Error Scoring System Scores among NCAA Division-I Football Athletes.

PubMed

Mathiasen, Ross; Hogrefe, Christopher; Harland, Kari; Peterson, Andrew; Smoot, M Kyle

2018-02-15

The Balance Error Scoring System (BESS) is a commonly used concussion assessment tool. Recent studies have questioned the stability and reliability of baseline BESS scores. The purpose of this longitudinal prospective cohort study is to examine differences in yearly baseline BESS scores in athletes participating on an NCAA Division-I football team. NCAA Division-I freshman football athletes were videotaped performing the BESS test at matriculation and after 1 year of participation in the football program. Twenty-three athletes were enrolled in year 1 of the study, and 25 athletes were enrolled in year 2. Those athletes enrolled in year 1 were again videotaped after year 2 of the study. The paired t-test was used to assess for change in score over time for the firm surface, foam surface, and the cumulative BESS score. Additionally, inter- and intrarater reliability values were calculated. Cumulative errors on the BESS significantly decreased from a mean of 20.3 at baseline to 16.8 after 1 year of participation. The mean number of errors following the second year of participation was 15.0. Inter-rater reliability for the cumulative score ranged from 0.65 to 0.75. Intrarater reliability was 0.81. After 1 year of participation, there is a statistically and clinically significant improvement in BESS scores in an NCAA Division-I football program. Although additional improvement in BESS scores was noted after a second year of participation, it did not reach statistical significance. Football athletes should undergo baseline BESS testing at least yearly if the BESS is to be optimally useful as a diagnostic test for concussion.
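
The paired t-test mentioned in this abstract compares each athlete's score with that same athlete's later score. A minimal sketch of the test statistic follows; the error counts are hypothetical, and the p-value lookup (which needs a t distribution, e.g. from a statistics package) is intentionally left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired observations."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    n = len(diffs)
    # t = mean difference divided by its standard error.
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical cumulative BESS errors for four athletes at baseline and one year later.
baseline_errors = [20, 22, 19, 21]
year_one_errors = [17, 18, 17, 16]
t_value, dof = paired_t_statistic(baseline_errors, year_one_errors)
print(t_value, dof)
```

A negative t here indicates fewer errors at follow-up; whether the change is statistically significant depends on comparing t against the t distribution with the returned degrees of freedom.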
Grant Peer Review: Improving Inter-Rater Reliability with Training

DOE PAGES

Sattler, David N.; McKnight, Patrick E.; Naney, Linda; ...

2015-06-15

In this study, we developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers, especially those with experience, have good understanding of the grant review rating scale. Our findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. Lastly, the results underscore the benefits of and need for specialized peer reviewer training.
Grant Peer Review: Improving Inter-Rater Reliability with Training.

PubMed

Sattler, David N; McKnight, Patrick E; Naney, Linda; Mathis, Randy

2015-01-01

This study developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers--especially those with experience--have good understanding of the grant review rating scale. The findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. The results underscore the benefits of and need for specialized peer reviewer training.
Norwegian version of the rating anxiety in dementia scale (RAID-N): a validity and reliability study.

PubMed

Goyal, Alka R; Bergh, Sverre; Engedal, Knut; Kirkevold, Marit; Kirkevold, Øyvind

2017-12-01

Dementia-specific anxiety scales in the Norwegian language are lacking; the aim of this study was to investigate the validity and inter-rater reliability of a Norwegian version of the Rating Anxiety in Dementia (RAID-N) scale. The validity of the RAID-N was tested in a sample of 101 patients with dementia from seven Norwegian nursing homes. One psychogeriatrician (n = 50) or a physician with long experience with nursing home patients (n = 51) 'blind' to the RAID-N score diagnosed anxiety according to DSM-5 criteria of generalised anxiety disorder (GAD). A receiver operating characteristic (ROC) analysis assessed the best cut-off point for the RAID-N, and the area under the curve (AUC) was calculated. Inter-rater reliability was tested in a subgroup of 53 patients by intraclass correlation (ICC) and Cohen's kappa. Twenty-eight of 101 (27.7%) met the GAD criteria. The mean RAID-N score for patients with GAD was 16.1 (SD 6.3) and without GAD, 8.8 (SD 6.5) (p < 0.001). A cut-off score of ≥12 on the RAID-N gave a sensitivity of 82.1%, specificity of 70.0%, and 73.3% accuracy in identifying clinically significant GAD in patients with dementia. Inter-rater reliability on overall RAID-N items was good (ICC = 0.82), Cohen's kappa was 0.58 for total RAID-N score, with satisfactory internal consistency (Cronbach's alpha = 0.81). The RAID-N has fairly good validity and inter-rater reliability, and could be useful to assess GAD in patients with dementia. Further studies should investigate the optimal RAID-N cut-off score in different settings.
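
The reported accuracy can be cross-checked from the other figures in this abstract, since overall accuracy is the prevalence-weighted combination of sensitivity and specificity. The sketch below uses only reported values (sensitivity 82.1%, specificity 70.0%, 28 of 101 patients with GAD); the count of 73 patients without GAD is derived as 101 - 28 and is an assumption of this check, as is the function name.

```python
def accuracy_from_rates(sensitivity, specificity, n_positive, n_negative):
    """Overall accuracy implied by sensitivity, specificity and group sizes."""
    true_positives = sensitivity * n_positive   # correctly flagged cases
    true_negatives = specificity * n_negative   # correctly cleared non-cases
    return (true_positives + true_negatives) / (n_positive + n_negative)


# 28 patients met GAD criteria; 101 - 28 = 73 did not.
implied_accuracy = accuracy_from_rates(0.821, 0.700, 28, 73)
print(round(implied_accuracy, 3))
```

The implied value is within rounding error of the 73.3% accuracy given in the abstract, so the three reported figures are mutually consistent.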
Psychometric evaluation of a motor control test battery of the craniofacial region.

PubMed

von Piekartz, H; Stotz, E; Both, A; Bahn, G; Armijo-Olivo, S; Ballenberger, N

2017-12-01

The primary objective of this study was to determine the structural and known-group validity as well as the inter-rater reliability of a test battery to evaluate the motor control of the craniofacial region. Seventy volunteers without TMD and 25 subjects with TMD (Axes I) per the DC/TMD were asked to execute a test battery consisting of eight tests. The tests were video-taped in the same sequence in a standardised manner. Two experienced physical therapists participated in this study as blinded assessors. We used exploratory factor analysis to identify the underlying component structure of the eight tests. Internal consistency (Cronbach's α), inter-rater reliability (intra-class correlation coefficient) and construct validity (i.e., hypothesis testing-known-group validity) (receiver operating curves) were also explored for the test battery. The structural validity showed the presence of one factor underlying the construct of the test battery. The internal consistency was excellent (0.90) as well as the inter-rater reliability. All values of reliability were close to 0.9 or above, indicating very high inter-rater reliability. The area under the curve (AUC) was 0.93 for rater 1 and 0.94 for rater 2, indicating excellent discrimination between subjects with TMD and healthy controls. The results of the present study support the psychometric properties of the test battery to measure motor control of the craniofacial region when evaluated through videotaping. This test battery could be used to differentiate between healthy subjects and subjects with musculoskeletal impairments in the cervical and oro-facial regions. In addition, this test battery could be used to assess the effectiveness of management strategies in the craniofacial region. © 2017 John Wiley & Sons Ltd.
Reliability and validity of the upper-body dressing scale in Japanese patients with vascular dementia with hemiparesis.

PubMed

Endo, Arisa; Suzuki, Makoto; Akagi, Atsumi; Chiba, Naoyuki; Ishizaka, Ikuyo; Matsunaga, Atsuhiko; Fukuda, Michinari

2015-03-01

The purpose of this study was to examine the reliability and validity of the Upper-body Dressing Scale (UBDS) for buttoned shirt dressing, which evaluates the learning process of new component actions of upper-body dressing in patients diagnosed with dementia and hemiparesis. This was a preliminary correlational study of concurrent validity and reliability in which 10 vascular dementia patients with hemiparesis were enrolled and assessed repeatedly by six occupational therapists by means of the UBDS and the dressing item of the Functional Independence Measure (FIM). Intraclass correlation coefficient was 0.97 for intra-rater reliability and 0.99 for inter-rater reliability. The level of correlation between UBDS score and FIM dressing item scores was -0.93. UBDS scores for paralytic hand passed into the sleeve and sleeve pulled up beyond the shoulder joint were worse than the scores for the other components of the task. The UBDS has good reliability and validity for vascular dementia patients with hemiparesis. Further research is needed to investigate the relation between UBDS score and the effect of intervention and to clarify sensitivity or responsiveness of the scale to clinical change. Copyright © 2014 John Wiley & Sons, Ltd.
Assessing Reliability of Medical Record Reviews for the Detection of Hospital Adverse Events.

PubMed

Ock, Minsu; Lee, Sang-il; Jo, Min-Woo; Lee, Jin Yong; Kim, Seon-Ha

2015-09-01

The purpose of this study was to assess the inter-rater reliability and intra-rater reliability of medical record review for the detection of hospital adverse events. We conducted a two-stage retrospective medical record review of a random sample of 96 patients from one acute-care general hospital. The first stage was an explicit patient record review by two nurses to detect the presence of 41 screening criteria (SC). The second stage was an implicit structured review by two physicians to identify the occurrence of adverse events from the positive cases on the SC. The inter-rater reliability of two nurses and that of two physicians were assessed. The intra-rater reliability was also evaluated using a test-retest method approximately two weeks later. In 84.2% of the patient medical records, the nurses agreed as to the necessity for the second stage review (kappa, 0.68; 95% confidence interval [CI], 0.54 to 0.83). In 93.0% of the patient medical records screened by nurses, the physicians agreed about the absence or presence of adverse events (kappa, 0.71; 95% CI, 0.44 to 0.97). When assessing intra-rater reliability, the kappa indices of two nurses were 0.54 (95% CI, 0.31 to 0.77) and 0.67 (95% CI, 0.47 to 0.87), whereas those of two physicians were 0.87 (95% CI, 0.62 to 1.00) and 0.37 (95% CI, -0.16 to 0.89). In this study, the medical record review for detecting adverse events showed an intermediate to good level of inter-rater and intra-rater reliability. A well-organized training program for reviewers and clearly defined SC are required to obtain more reliable results in hospital adverse event studies.
Emotional and Behavioral Screener: Test-Retest Reliability, Inter-Rater Reliability, and Convergent Validity

ERIC Educational Resources Information Center

Nordness, Philip D.; Epstein, Michael H.; Cullinan, Douglas; Pierce, Corey D.

2014-01-01

The Emotional and Behavioral Screener (EBS) is a universal screening instrument designed to identify students whose excessive problem behaviors put them at risk of the education disability category of emotional disturbance (ED). This article reports findings from three studies that address the reliability and validity of the EBS. Studies 1 and 2…
Reliability of an Automated High-Resolution Manometry Analysis Program across Expert Users, Novice Users, and Speech-Language Pathologists

ERIC Educational Resources Information Center

Jones, Corinne A.; Hoffman, Matthew R.; Geng, Zhixian; Abdelhalim, Suzan M.; Jiang, Jack J.; McCulloch, Timothy M.

2014-01-01

Purpose: The purpose of this study was to investigate inter- and intrarater reliability among expert users, novice users, and speech-language pathologists with a semiautomated high-resolution manometry analysis program. We hypothesized that all users would have high intrarater reliability and high interrater reliability. Method: Three expert…
Comparison of in vivo 3D cone-beam computed tomography tooth volume measurement protocols.

PubMed

Forst, Darren; Nijjar, Simrit; Flores-Mir, Carlos; Carey, Jason; Secanell, Marc; Lagravere, Manuel

2014-12-23

The objective of this study is to analyze a set of previously developed and proposed image segmentation protocols for precision in both intra- and inter-rater reliability for in vivo tooth volume measurements using cone-beam computed tomography (CBCT) images. Six 3D volume segmentation procedures were proposed and tested for intra- and inter-rater reliability to quantify maxillary first molar volumes. Ten randomly selected maxillary first molars were measured in vivo in random order three times with 10 days separation between measurements. Intra- and inter-rater agreement for all segmentation procedures was attained using intra-class correlation coefficient (ICC). The highest precision was for automated thresholding with manual refinements. A tooth volume measurement protocol for CBCT images employing automated segmentation with manual human refinement on a 2D slice-by-slice basis in all three planes of space possessed excellent intra- and inter-rater reliability. Three-dimensional volume measurements of the entire tooth structure are more precise than 3D volume measurements of only the dental roots apical to the cemento-enamel junction (CEJ).
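
The abstract does not say which form of the intra-class correlation coefficient was used. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), computed from ANOVA mean squares. The volumes are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(ratings)       # subjects (e.g. teeth)
    k = len(ratings[0])    # raters
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical volumes: rater 2 reads every tooth exactly 1 unit higher than rater 1.
offset_ratings = [[1, 2], [2, 3], [3, 4]]
print(icc_2_1(offset_ratings))
```

In this made-up example the two raters rank the teeth identically, yet the coefficient is well below 1 because the absolute-agreement form penalizes the systematic offset between raters. A consistency-type ICC would not.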
Performance of a quality assurance program for assessing dental health in methamphetamine users.

PubMed

Dye, Bruce A; Harrell, Lauren; Murphy, Debra A; Belin, Thomas; Shetty, Vivek

2015-07-05

Systematic characterization of the dental consequences of methamphetamine (MA) abuse presupposes a rigorous quality assurance (QA) program to ensure the credibility of the data collected and the scientific integrity and validity of the clinical study. In this report we describe and evaluate the performance of a quality assurance program implemented in a large cross-sectional study of the dental consequences of MA use. A large community sample of MA users was recruited over a 30 month period during 2011-13 and received comprehensive oral examinations and psychosocial assessments by site examiners based at two large community health centers in Los Angeles. National Health and Nutrition Examination Survey (NHANES) protocols for oral health assessments were utilized to characterize dental disease. Using NHANES oral health quality assurance guidelines, examiner reliability statistics such as Cohen's Kappa coefficients and intra-class correlation coefficients were calculated to assess the magnitude of agreement between the site examiners and a reference examiner to ensure conformance and comparability with NHANES practices. Approximately 9% (n = 49) of the enrolled 574 MA users received a repeat dental caries and periodontal examination conducted by the reference examiner. There was high concordance between the reference examiner and the site examiners for identification of untreated dental disease (Kappa statistic values: 0.57-0.75, percent agreement 83-88%). For identification of untreated caries on at least 5 surfaces of anterior teeth, the Kappas ranged from 0.77 to 0.87, and percent agreement from 94 to 97%. The intra-class coefficients (ICCs) ranged from 0.87 to 0.89 for attachment loss across all periodontal sites assessed and the ICCs ranged from 0.79 to 0.81 for pocket depth. For overall gingival recession, the ICCs ranged from 0.88 to 0.91.
When Kappa was calculated based on the CDC/AAP case definitions for severe periodontitis, inter-examiner reliability for site examiners was low (Kappa 0.27-0.67). Overall, the quality assurance program confirmed the procedural adherence of the quality of the data collected on the distribution of dental caries and periodontal disease in MA-users. Examiner concordance was higher for dental caries but lower for specific periodontal assessments.

A Structured Clinical Interview for Kleptomania (SCI-K): preliminary validity and reliability testing.

PubMed

Grant, Jon E; Kim, Suck Won; McCabe, James S

2006-06-01

Kleptomania presents difficulties in diagnosis for clinicians. This study aimed to develop and test a DSM-IV-based diagnostic instrument for kleptomania. To assess for current kleptomania the Structured Clinical Interview for Kleptomania (SCI-K) was administered to 112 consecutive subjects requesting psychiatric outpatient treatment for a variety of disorders. Reliability and validity were determined. Classification accuracy was examined using the longitudinal course of illness. The SCI-K demonstrated excellent test-retest (Phi coefficient = 0.956 (95% CI = 0.937, 0.970)) and inter-rater reliability (phi coefficient = 0.718 (95% CI = 0.506, 0.848)) in the diagnosis of kleptomania. Concurrent validity was observed with a self-report measure using DSM-IV kleptomania criteria (phi coefficient = 0.769 (95% CI = 0.653, 0.850)). Discriminant validity was observed with a measure of depression (point biserial coefficient = -0.020 (95% CI = -0.205, 0.166)). The SCI-K demonstrated both high sensitivity and specificity based on longitudinal assessment. The SCI-K demonstrated excellent reliability and validity in diagnosing kleptomania in subjects presenting with various psychiatric problems. These findings require replication in larger groups, including non-psychiatric populations, to examine their generalizability. Copyright (c) 2006 John Wiley & Sons, Ltd.
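
The phi coefficient reported in this abstract measures association between two binary classifications, such as two raters' yes/no diagnoses. A minimal sketch from a 2x2 table of counts follows; the counts are hypothetical and the cell labels are illustrative.

```python
from math import sqrt


def phi_coefficient(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Phi for a 2x2 table: (ad - bc) / sqrt of the product of the four margins."""
    a, b, c, d = both_yes, a_yes_b_no, a_no_b_yes, both_no
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


# Hypothetical diagnoses by two raters for 8 subjects:
# 3 both positive, 1 positive only for rater A, 1 positive only for rater B, 3 both negative.
print(phi_coefficient(3, 1, 1, 3))  # 0.5
```

Phi is numerically the Pearson correlation computed on the two 0/1 variables, so it ranges from -1 to 1, with 1 meaning the two classifications coincide exactly.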
A novel standardized algorithm using SPECT/CT evaluating unhappy patients after unicondylar knee arthroplasty--a combined analysis of tracer uptake distribution and component position.

PubMed

Suter, Basil; Testa, Enrique; Stämpfli, Patrick; Konala, Praveen; Rasch, Helmut; Friederich, Niklaus F; Hirschmann, Michael T

2015-03-20

The introduction of a standardized SPECT/CT algorithm including a localization scheme, which allows accurate identification of specific patterns and thresholds of SPECT/CT tracer uptake, could lead to a better understanding of the bone remodeling and specific failure modes of unicondylar knee arthroplasty (UKA). The purpose of the present study was to introduce a novel standardized SPECT/CT algorithm for patients after UKA and evaluate its clinical applicability, usefulness and inter- and intra-observer reliability. Tc-HDP-SPECT/CT images of consecutive patients (median age 65, range 48-84 years) with 21 knees after UKA were prospectively evaluated. The tracer activity on SPECT/CT was localized using a specific standardized UKA localization scheme. For tracer uptake analysis (intensity and anatomical distribution pattern) a 3D volumetric quantification method was used. The maximum intensity values were recorded for each anatomical area. In addition, ratios between the respective value in the measured area and the background tracer activity were calculated. The femoral and tibial component position (varus-valgus, flexion-extension, internal and external rotation) was determined in 3D-CT. The inter- and intraobserver reliability of the localization scheme, grading of the tracer activity and component measurements were determined by calculating the intraclass correlation coefficients (ICC). The localization scheme, grading of the tracer activity and component measurements showed high inter- and intra-observer reliabilities for all regions (tibia, femur and patella). For measurement of component position there was strong agreement between the readings of the two observers; the ICC for the orientation of the femoral component was 0.73-1.00 (intra-observer reliability) and 0.91-1.00 (inter-observer reliability). The ICC for the orientation of the tibial component was 0.75-1.00 (intra-observer reliability) and 0.77-1.00 (inter-observer reliability). The SPECT/CT algorithm presented combining the mechanical information on UKA component position, alignment and metabolic data is highly reliable and proved to be a valuable, consistent and useful tool for analysing postoperative knees after UKA. Using this standardized approach in clinical studies might be helpful in establishing the diagnosis in patients with pain after UKA.

Inter-Rater Reliability and Intra-Rater Reliability of Assessing the 2-Minute Push-Up Test.

PubMed

Fielitz, Lynn; Coelho, Jeffrey; Horne, Thomas; Brechue, William

2016-02-01

The purpose of this study was to assess inter-rater reliability and intra-rater reliability of the 2-minute, 90° push-up test as utilized in the Army Physical Fitness Test. Analysis of rater assessment reliability included both total score agreement and agreement across individual push-up repetitions. This study utilized 8 Raters who assessed 15 different videotaped push-up performances over 4 iterations separated by a minimum of 1 week. The 15 push-up participants were videotaped during the semiannual Army Physical Fitness Test. Each Rater randomly viewed the 15 push-up performances and verbally responded with a "yes" or "no" to each push-up repetition. The data generated were analyzed using the Pearson product-moment correlation as well as the kappa, modified kappa and the intra-class correlation coefficient (3,1). An attribute agreement analysis was conducted to determine the percent of inter-rater and intra-rater agreement across individual push-ups. The results indicated that Raters varied a great deal in assessing push-ups. Over the 4 trials of 15 participants, the overall scores of the Raters varied between 3.0 and 35.7 push-ups. Post hoc comparisons found that there was a significant increase in the grand mean of push-ups from trials 1-3 to trial 4 (p < 0.05). Also, there was a significant difference among raters over the 4 trials (p < 0.05). Pearson correlation coefficients for inter-rater and intra-rater reliability identified inter-rater reliability coefficients were between 0.10 and 0.97. Intra-rater coefficients were between 0.48 and 0.99. Intra-rater agreement for individual push-up repetitions ranged from 41.8% to 84.8%. The results indicated that the raters failed to assess the same push-up repetition with the same score (below 70% agreement) as well as failed to agree when viewed between raters (29%). Interestingly, as previously mentioned, scores on trial 4 increased significantly which might have been caused by rater drift or that the Raters did not maintain the push-up standard over the trials. It does appear that the final push-up scores received by each participant were a close approximation of actual performance (within 65%) but when assessing physical performance for retention in the Army, a more reliable test might be considered. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.

CLINICAL AUDIT OF IMAGE QUALITY IN RADIOLOGY USING VISUAL GRADING CHARACTERISTICS ANALYSIS.

PubMed

Tesselaar, Erik; Dahlström, Nils; Sandborg, Michael

2016-06-01

The aim of this work was to assess whether an audit of clinical image quality could be efficiently implemented within a limited time frame using visual grading characteristics (VGC) analysis. Lumbar spine radiography, bedside chest radiography and abdominal CT were selected. For each examination, images were acquired or reconstructed in two ways. Twenty images per examination were assessed by 40 radiology residents using visual grading of image criteria. The results were analysed using VGC. Inter-observer reliability was assessed. The results of the visual grading analysis were consistent with expected outcomes. The inter-observer reliability was moderate to good and correlated with perceived image quality (r(2) = 0.47). The median observation time per image or image series was within 2 min. These results suggest that the use of visual grading of image criteria to assess the quality of radiographs provides a rapid method for performing an image quality audit in a clinical environment. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Transcultural Adaptation of GRID Hamilton Rating Scale For Depression (GRID-HAMD) to Brazilian Portuguese and Evaluation of the Impact of Training Upon Inter-Rater Reliability.

PubMed

Henrique-Araújo, Ricardo; Osório, Flávia L; Gonçalves Ribeiro, Mônica; Soares Monteiro, Ivandro; Williams, Janet B W; Kalali, Amir; Alexandre Crippa, José; Oliveira, Irismar Reis De

2014-07-01

GRID-HAMD is a semi-structured interview guide developed to overcome flaws in HAM-D, and has been incorporated into an increasing number of studies. Carry out the transcultural adaptation of GRID-HAMD into the Brazilian Portuguese language, evaluate the inter-rater reliability of this instrument and the training impact upon this measure, and verify the raters' opinions of said instrument. The transcultural adaptation was conducted by appropriate methodology. The measurement of inter-rater reliability was done by way of videos that were evaluated by 85 professionals before and after training for the use of this instrument. The intraclass correlation coefficient (ICC) remained between 0.76 and 0.90 for GRID-HAMD-21 and between 0.72 and 0.91 for GRID-HAMD-17. The training did not have an impact on the ICC, except for a few groups of participants with a lower level of experience. Most of the participants showed high acceptance of GRID-HAMD, when compared to other versions of HAM-D. The scale presented adequate inter-rater reliability even before training began. Training did not have an impact on this measure, except for a few groups with less experience. GRID-HAMD received favorable opinions from most of the participants.

A comparison of Google Glass and traditional video vantage points for bedside procedural skill assessment.

PubMed

Evans, Heather L; O'Shea, Dylan J; Morris, Amy E; Keys, Kari A; Wright, Andrew S; Schaad, Douglas C; Ilgen, Jonathan S

2016-02-01

This pilot study assessed the feasibility of using first person (1P) video recording with Google Glass (GG) to assess procedural skills, as compared with traditional third person (3P) video. We hypothesized that raters reviewing 1P videos would visualize more procedural steps with greater inter-rater reliability than 3P rating vantages. Seven subjects performed simulated internal jugular catheter insertions. Procedures were recorded by both Google Glass and an observer's head-mounted camera. Videos were assessed by 3 expert raters using a task-specific checklist (CL) and both an additive- and summative-global rating scale (GRS). Mean scores were compared by t-tests. Inter-rater reliabilities were calculated using intraclass correlation coefficients. The 1P vantage was associated with a significantly higher mean CL score than the 3P vantage (7.9 vs 6.9, P = .02). Mean GRS scores were not significantly different. Mean inter-rater reliabilities for the CL, additive-GRS, and summative-GRS were similar between vantages. 1P vantage recordings may improve visualization of tasks for behaviorally anchored instruments (eg, CLs), whereas maintaining similar global ratings and inter-rater reliability when compared with conventional 3P vantage recordings. Copyright © 2016 Elsevier Inc. All rights reserved.

The Pareidolia Test: A Simple Neuropsychological Test Measuring Visual Hallucination-Like Illusions

PubMed Central

Mamiya, Yasuyuki; Nishio, Yoshiyuki; Watanabe, Hiroyuki; Yokoi, Kayoko; Uchiyama, Makoto; Baba, Toru; Iizuka, Osamu; Kanno, Shigenori; Kamimura, Naoto; Kazui, Hiroaki; Hashimoto, Mamoru; Ikeda, Manabu; Takeshita, Chieko; Shimomura, Tatsuo; Mori, Etsuro

2016-01-01

Background Visual hallucinations are a core clinical feature of dementia with Lewy bodies (DLB), and this symptom is important in the differential diagnosis and prediction of treatment response. The pareidolia test is a tool that evokes visual hallucination-like illusions, and these illusions may be a surrogate marker of visual hallucinations in DLB. We created a simplified version of the pareidolia test and examined its validity and reliability to establish the clinical utility of this test. Methods The pareidolia test was administered to 52 patients with DLB, 52 patients with Alzheimer’s disease (AD) and 20 healthy controls (HCs). We assessed the test-retest/inter-rater reliability using the intra-class correlation coefficient (ICC) and the concurrent validity using the Neuropsychiatric Inventory (NPI) hallucinations score as a reference. A receiver operating characteristic (ROC) analysis was used to evaluate the sensitivity and specificity of the pareidolia test to differentiate DLB from AD and HCs. Results The pareidolia test required approximately 15 minutes to administer, exhibited good test-retest/inter-rater reliability (ICC of 0.82), and moderately correlated with the NPI hallucinations score (rs = 0.42). Using an optimal cut-off score set according to the ROC analysis, the pareidolia test differentiated DLB from AD with a sensitivity of 81% and a specificity of 92%. Conclusions Our study suggests that the simplified version of the pareidolia test is a valid and reliable surrogate marker of visual hallucinations in DLB. PMID:27171377
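
As a rough illustration of how the sensitivity and specificity figures in this record are defined, the sketch below computes both from confusion-matrix counts. The counts are hypothetical: they were chosen only because, with 52 DLB and 52 AD patients, they reproduce the rounded 81% and 92% reported above. The abstract itself does not give the underlying counts, and the function name is an assumption made for this example.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = true positives / all diseased cases
    specificity = true negatives / all non-diseased cases
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (not from the paper): 52 DLB cases, 52 AD cases.
sens, spec = sensitivity_specificity(tp=42, fn=10, tn=48, fp=4)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Both values depend on the chosen cut-off score, which is why the abstract ties them to the ROC analysis.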

THE NAVICULAR POSITION TEST – A RELIABLE MEASURE OF THE NAVICULAR BONE POSITION DURING REST AND LOADING

PubMed Central

Spörndly-Nees, Søren; Dåsberg, Brian; Nielsen, Rasmus Oestergaard; Boesen, Morten Ilum

2011-01-01

Background: Lower limb injuries are a large problem in athletes. However, there is a paucity of knowledge on the relationship between alignment of the medial longitudinal arch (MLA) of the foot and development of such injuries. A reliable and valid test to quantify foot type is needed to be able to investigate the relationship between arch type and injury likelihood. Feiss Line is a valid clinical measure of the MLA. However, no study has investigated the reliability of the test. Objectives: The purpose was to describe a modified version of the Feiss Line test and to determine the intra- and inter-tester reliability of this new foot alignment test. To emphasize the purpose of the modified test, the authors have named it The Navicular Position Test. Methods: Intra- and inter-tester reliability of The Navicular Position Test were evaluated using the ICC (intraclass correlation coefficient) and Bland-Altman limits of agreement in 43 healthy, young subjects. Results: Inter-tester mean difference -0.35 degrees [–1.32; 0.62] p = 0.47. Bland-Altman limits of agreement –6.55 to 5.85 degrees, ICC = 0.94. Intra-tester mean difference 0.47 degrees [–0.57; 1.50] p = 0.37. Bland-Altman limits of agreement –6.15 to 7.08 degrees, ICC = 0.91. Discussion: The present data support The Navicular Position Test as a reliable test of the navicular bone position during rest and loading measured in a simple test set-up. Conclusion: The Navicular Position Test was shown to have a high intraday-, intra- and inter-tester reliability. When cut-off values to categorize the MLA into planus, rectus, or cavus feet have been determined and presented, the test could be used in prospective observational studies investigating the role of the arch type on the development of various lower limb injuries. PMID:21904698
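
The Bland-Altman limits of agreement quoted in this record can be read with a short sketch. It assumes the conventional definition (mean difference plus or minus 1.96 standard deviations of the paired differences); the abstract does not state the multiplier, and the paired readings and function names below are hypothetical. Under that assumption, the reported inter-tester limits of -6.55 to 5.85 degrees are centred on -0.35, which matches the reported mean difference.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)          # mean difference between the two testers
    sd = stdev(diffs)           # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def implied_sd(lower, upper):
    """Back out the SD of differences from published limits (assumes 1.96 * SD)."""
    return (upper - lower) / (2 * 1.96)


# Hypothetical paired readings in degrees (not data from the study).
tester_a = [10.0, 12.5, 8.0, 15.0, 11.0]
tester_b = [11.0, 12.0, 9.5, 14.0, 12.0]
bias, lower, upper = bland_altman(tester_a, tester_b)
print(f"bias = {bias:.2f}, limits = [{lower:.2f}, {upper:.2f}]")

# SD of differences implied by the reported inter-tester limits.
print(f"implied SD = {implied_sd(-6.55, 5.85):.2f} degrees")
```

Unlike the ICC, the limits are expressed in the measurement's own units, so they show how far two testers may differ on a single subject.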

RELIABILITY AND VALIDITY OF SUBJECTIVE ASSESSMENT OF LUMBAR LORDOSIS IN CONVENTIONAL RADIOGRAPHY.

PubMed

Ruhinda, E; Byanyima, R K; Mugerwa, H

2014-10-01

Reliability and validity studies of different lumbar curvature analysis and measurement techniques have been documented however there is limited literature on the reliability and validity of subjective visual analysis. Radiological assessment of lumbar lordotic curve aids in early diagnosis of conditions even before neurologic changes set in. To ascertain the level of reliability and validity of subjective assessment of lumbar lordosis in conventional radiography. A blinded, repeated-measures diagnostic test was carried out on lumbar spine x-ray radiographs. Radiology Department at Joint Clinical Research Centre (JCRC), Mengo-Kampala-Uganda. Seventy (70) lateral lumbar x-ray films were used for this study and were obtained from the archive of JCRC radiology department at Butikiro house, Mengo-Kampala. Poor observer agreement, both inter- and intra-observer, with kappa values of 0.16 was found. Inter-observer agreement was poorer than intra-observer agreement. Kappa values significantly rose when the lumbar lordosis was clustered into four categories without grading each abnormality. The results confirm that subjective assessment of lumbar lordosis has low reliability and validity. Film quality has limited influence on the observer reliability. This study further shows that fewer scale categories of lordosis abnormalities produce better observer reliability.

Developing the Person-Environment Apathy Rating for persons with dementia.

PubMed

Jao, Ying-Ling; Algase, Donna L; Specht, Janet K; Williams, Kristine

2016-08-01

To develop the Person-Environment Apathy Rating (PEAR) scale that measures environmental stimulation and apathy in persons with dementia and to evaluate its psychometrics. The PEAR scale consists of the PEAR-Environment subscale and PEAR-Apathy subscales. The items were developed via literature review, field testing, expert review, and pilot testing. The construct validity and reliability were examined through video observation. The parent study enrolled 185 institutionalized residents with dementia. For this study, 96 videos were selected from 24 participants. The PEAR-Environment subscale was validated using the Ambiance Scale and the Crowding Index. The PEAR-Apathy subscale was validated using the Neuropsychiatric Inventory (NPI)-Apathy, Passivity in Dementia Scale (PDS), and NPI-Depression. The PEAR-Environment subscale and PEAR-Apathy subscales each consists of six items rated on a 1-4 scale. For validity, the Crowding Index slightly, yet significantly, correlated with the PEAR-Environment subscale total score and three of the individual scores. Ambiance Scale scores, both engaging and soothing, did not correlate with the PEAR-Environment subscale. The PEAR-Apathy highly correlated with the PDS and NPI-Apathy and moderately correlated with the NPI-Depression, suggesting good convergent validity and moderate discriminant validity. For reliability, both environment and apathy subscales demonstrated excellent internal consistency. Although facial expression and eye contact showed moderate inter-rater reliability, all other items showed good to excellent inter-rater and intra-rater reliability. This study has successfully developed the PEAR scale and established its psychometrics based on the compatible scales available. The PEAR scale is the first scale that concurrently assesses apathy and environmental stimulation, and is recommended for use in persons with dementia.

Is computed tomography an accurate and reliable method for measuring total knee arthroplasty component rotation?

PubMed

Figueroa, José; Guarachi, Juan Pablo; Matas, José; Arnander, Magnus; Orrego, Mario

2016-04-01

Computed tomography (CT) is widely used to assess component rotation in patients with poor results after total knee arthroplasty (TKA). The purpose of this study was to simultaneously determine the accuracy and reliability of CT in measuring TKA component rotation. TKA components were implanted in dry-bone models and assigned to two groups. The first group (n = 7) had variable femoral component rotations, and the second group (n = 6) had variable tibial tray rotations. CT images were then used to assess component rotation. Accuracy of CT rotational assessment was determined by mean difference, in degrees, between implanted component rotation and CT-measured rotation. Intraclass correlation coefficient (ICC) was applied to determine intra-observer and inter-observer reliability. Femoral component accuracy showed a mean difference of 2.5° and the tibial tray a mean difference of 3.2°. There was good intra- and inter-observer reliability for both components, with a femoral ICC of 0.8 and 0.76, and tibial ICC of 0.68 and 0.65, respectively. CT rotational assessment accuracy can differ from true component rotation by approximately 3° for each component. It does, however, have good inter- and intra-observer reliability.

En Face Optical Coherence Tomography Angiography Imaging Versus Fundus Photography in the Measurement of Choroidal Nevi.

PubMed

Lee, Michele D; Kaidonis, Georgia; Kim, Alice Y; Shields, Ryan A; Leng, Theodore

2017-09-01

Choroidal nevi are common benign intraocular tumors with a small risk of malignant transformation. This retrospective study investigates the use of en face spectral-domain optical coherence tomography angiography (SD-OCTA) in determining the clinical features and measurement of choroidal nevi. Patients with choroidal nevi were imaged with both OCTA and a fundus photography device. Greatest longitudinal dimension (GLD), perpendicular dimension (PD), and the GLD/PD ratio were assessed on each device. Inter-device variation and intra- and inter-rater reliability analyses were performed. Fourteen patients with choroidal nevi were included. No significant difference between the GLD/PD ratio as measured by all three devices was found (Chi-square = 2.8, 2 df, P = .247). Intraclass correlation coefficients were greater than 0.7 for repeated measures on all devices, suggesting good repeatability and reproducibility. This study demonstrated inter-device consistency and high intra- and inter-rater reliability when measuring choroidal nevi. [Ophthalmic Surg Lasers Imaging Retina. 2017;48:741-747.]. Copyright 2017, SLACK Incorporated.

Reliability of a four-column classification for tibial plateau fractures.

PubMed

Martínez-Rondanelli, Alfredo; Escobar-González, Sara Sofía; Henao-Alzate, Alejandro; Martínez-Cano, Juan Pablo

2017-09-01

A four-column classification system offers a different way of evaluating tibial plateau fractures. The aim of this study is to compare the intra-observer and inter-observer reliability between four-column and classic classifications. This is a reliability study, which included patients presenting with tibial plateau fractures between January 2013 and September 2015 in a level-1 trauma centre. Four orthopaedic surgeons blindly classified each fracture according to four different classifications: AO, Schatzker, Duparc and four-column. Kappa, intra-observer and inter-observer concordance were calculated for the reliability analysis. Forty-nine patients were included. The mean age was 39 ± 14.2 years, with no gender predominance (men: 51%; women: 49%), and 67% of the fractures included at least one of the posterior columns. The intra-observer and inter-observer concordance were calculated for each classification: four-column (84%/79%), Schatzker (60%/71%), AO (50%/59%) and Duparc (48%/58%), with a statistically significant difference among them (p = 0.001/p = 0.003). Kappa coefficients for intra-observer and inter-observer evaluations were: Schatzker 0.48/0.39, four-column 0.61/0.34, Duparc 0.37/0.23, and AO 0.34/0.11. The proposed four-column classification showed the highest intra- and inter-observer agreement. When taking into account the agreement that occurs by chance, the Schatzker classification showed the highest inter-observer kappa, but again the four-column had the highest intra-observer kappa value. The proposed classification is a more inclusive classification for the posteromedial and posterolateral fractures. We suggest, therefore, that it be used in addition to one of the classic classifications in order to better understand the fracture pattern, as it allows more attention to be paid to the posterior columns, it improves the surgical planning and allows the surgical approach to be chosen more accurately.
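
This record reports a higher inter-observer concordance for the four-column system (79%) than for Schatzker (71%), yet a lower kappa (0.34 versus 0.39). A minimal sketch of why that can happen is given below. It uses Cohen's kappa for two raters as a simplification; the study had four observers and the abstract does not say which kappa variant was used. All ratings and category labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(r1, r2):
    """Return (observed agreement, Cohen's kappa) for two raters' labels."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n           # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Agreement expected by chance from each rater's marginal frequencies.
    pe = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n ** 2
    return po, (po - pe) / (1 - pe)


# Hypothetical case 1: one category dominates, so chance agreement is high.
skew_r1 = ["A"] * 16 + ["A"] * 2 + ["B"] * 2
skew_r2 = ["A"] * 16 + ["B"] * 2 + ["A"] * 2

# Hypothetical case 2: categories are balanced, so chance agreement is lower.
bal_r1 = ["A"] * 7 + ["B"] * 7 + ["A"] * 3 + ["B"] * 3
bal_r2 = ["A"] * 7 + ["B"] * 7 + ["B"] * 3 + ["A"] * 3

for name, (x, y) in {"skewed": (skew_r1, skew_r2), "balanced": (bal_r1, bal_r2)}.items():
    po, kappa = cohen_kappa(x, y)
    print(f"{name}: agreement = {po:.2f}, kappa = {kappa:.2f}")
```

In the skewed case raw agreement is higher but kappa is lower, because most of that agreement would be expected by chance alone.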

Validity and reliability of balance assessment software using the Nintendo Wii balance board: usability and validation

PubMed Central

2014-01-01

Background A balance test provides important information such as the standard to judge an individual’s functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Methods Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). Results The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. Conclusion The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment. PMID:24912769

Validity and reliability of balance assessment software using the Nintendo Wii balance board: usability and validation.

PubMed

Park, Dae-Sung; Lee, GyuChang

2014-06-10

A balance test provides important information such as the standard to judge an individual's functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment.

Reliability and concurrent validity of postural asymmetry measurement in adolescent idiopathic scoliosis

PubMed Central

Prowse, Ashleigh; Aslaksen, Berit; Kierkegaard, Marie; Furness, James; Gerdhem, Paul; Abbott, Allan

2017-01-01

AIM To investigate the reliability and concurrent validity of the Baseline® Body Level/Scoliosis meter for adolescent idiopathic scoliosis postural assessment in three anatomical planes. METHODS This is an observational reliability and concurrent validity study of adolescent referrals to the Orthopaedic department for scoliosis screening at Karolinska University Hospital, Stockholm, Sweden between March-May 2012. A total of 31 adolescents with idiopathic scoliosis (13.6 ± 0.6 years old) of mild-moderate curvatures (25° ± 12°) were consecutively recruited. Measurement of cervical, thoracic and lumbar curvatures, pelvic and shoulder tilt, and axial thoracic rotation (ATR) were performed by two trained physiotherapists in one day. The intraclass correlation coefficient (ICC) was used to determine the inter-examiner reliability (ICC2,1) and the intra-rater reliability (ICC3,3) of the Baseline® Body Level/Scoliosis meter. Spearman’s correlation analyses were used to estimate concurrent validity between the Baseline® Body Level/Scoliosis meter and Gold Standard Cobb angles from radiographs and the Orthopaedic Systems Inc. Scoliometer. RESULTS There was excellent reliability between examiners for thoracic kyphosis (ICC2,1 = 0.94), ATR (ICC2,1 = 0.92) and lumbar lordosis (ICC2,1 = 0.79). There was adequate reliability between examiners for cervical lordosis (ICC2,1 = 0.51), however poor reliability for pelvic and shoulder tilt. Both devices were reproducible in the measurement of ATR when repeated by one examiner (ICC3,3 0.98-1.00). The device had a good correlation with the Scoliometer (rho = 0.78). When compared with Cobb angle from radiographs, there was a moderate correlation for ATR (rho = 0.627). CONCLUSION The Baseline® Body Level/Scoliosis meter provides reliable transverse and sagittal cervical, thoracic and lumbar measurements and valid transverse plane measurements of mild-moderate scoliosis deformity. PMID:28144582
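
The ICC forms named in this record (ICC2,1 and ICC3,3) follow the Shrout and Fleiss notation. The sketch below shows how such coefficients can be derived from two-way ANOVA mean squares. It is an illustration under the standard textbook formulas, not the authors' analysis code. The function name is an assumption, and the data are the six-target, four-judge worked example from Shrout and Fleiss (1979), not measurements from this study.

```python
def icc_two_way(scores):
    """ICC(2,1), ICC(3,1) and ICC(3,k) from a subjects-by-raters table.

    scores: list of rows, one row per subject, one column per rater
    (or per repeated measurement).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return {
        # Two-way random effects, absolute agreement, single measurement.
        "ICC(2,1)": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        # Two-way mixed effects, consistency, single measurement.
        "ICC(3,1)": (msr - mse) / (msr + (k - 1) * mse),
        # Two-way mixed effects, consistency, mean of k measurements.
        "ICC(3,k)": (msr - mse) / msr,
    }


# Worked example table from Shrout and Fleiss (1979): 6 targets, 4 judges.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
result = icc_two_way(ratings)
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

The gap between ICC(2,1) and ICC(3,1) on the same table shows how much systematic differences between raters lower an absolute-agreement coefficient, which is one reason records in this listing that report only "ICC" are hard to compare.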

Reliability of the Cardiff Test of basic life support and automated external defibrillation version 3.1.

PubMed

Whitfield, Richard H; Newcombe, Robert G; Woollard, Malcolm

2003-12-01

The introduction of the European Resuscitation Guidelines (2000) for cardiopulmonary resuscitation (CPR) and automated external defibrillation (AED) prompted the development of an up-to-date and reliable method of assessing the quality of performance of CPR in combination with the use of an AED. The Cardiff Test of basic life support (BLS) and AED version 3.1 was developed to meet this need and uses standardised checklists to retrospectively evaluate performance from analyses of video recordings and data drawn from a laptop computer attached to a training manikin. This paper reports the inter- and intra-observer reliability of this test. Data used to assess reliability were obtained from an investigation of CPR and AED skill acquisition in a lay responder AED training programme. Six observers were recruited to evaluate performance in 33 data sets, repeating their evaluation after a minimum interval of 3 weeks. More than 70% of the 42 variables considered in this study had a kappa score of 0.70 or above for inter-observer reliability or were drawn from computer data and therefore not subject to evaluator variability. 85% of the 42 variables had kappa scores for intra-observer reliability of 0.70 or above or were drawn from computer data. The standard deviations for inter- and intra-observer measures of time to first shock were 11.6 and 7.7 s, respectively. The inter- and intra-observer reliability for the majority of the variables in the Cardiff Test of BLS and AED version 3.1 is satisfactory. However, reliability is less acceptable with respect to shaking when checking for responsiveness, initial check/clearing of the airway, checks for signs of circulation, time to first shock and performance of interventions in the correct sequence. Further research is required to determine if modifications to the method of assessing these variables can increase reliability.

Ultrasound measures of tendon thickness: Intra-rater, Inter-rater and Inter-machine reliability.

PubMed

Del Baño-Aledo, María Elena; Martínez-Payá, Jacinto Javier; Ríos-Díaz, José; Mejías-Suárez, Silvia; Serrano-Carmona, Sergio; de Groot-Ferrando, Ana

2017-01-01

Ultrasound imaging is often used by physiotherapists and other healthcare professionals but the reliability of image acquisition with different ultrasound machines is unknown. The objective was to compare the intra-rater, inter-rater and intermachine reliability of thickness measurements of the plantar fascia (PF), Achilles tendon (AT), patellar tendon (PT) and elbow common extensor tendon (ECET) with musculoskeletal ultrasound imaging (MSUS). Tendon thickness was measured in four anatomical structures (14 participants, 28 images per tendon) by two sonographers and with two different ultrasound machines. Intraclass Correlation Coefficients (ICCs) and Bland-Altman plots were calculated. The standard error of measurement (SEM) and minimum detectable difference (MDD) were calculated. Inter-rater reliability was excellent for AT (ICC=0.98; 95% CI= 0.96-0.99) and very good for PT (ICC=0.85; 95% CI = 0.67-0.93) and ECET (ICC=0.81; 95% CI= 0.72-0.94). Reliability for PF was moderate, with an ICC of 0.63 (CI 95%= 0.20-0.83). Bland-Altman plot for inter-machine reliability showed a mean difference of 1 m for PF measurements and a mean difference of 4 m and 20 m for AT and PT. The relative SEMs were below 7% and the MDCs were below 0.7 mm. The MSUS reliability in measuring thickness of the four tendons is confirmed by the homogeneous readings intra sonographers, between operators and between different machines. Level of evidence: Tendon thickness can be measured reliably on different ultrasound devices, which is an important step forward in the use of this technique in daily clinical practice and research. III.
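
The standard error of measurement and minimum detectable difference mentioned in this record are commonly derived from the ICC. The sketch below uses the widely cited formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM. Whether the authors used exactly these formulas is an assumption, and the standard deviation below is hypothetical; only the ICC of 0.85 (patellar tendon, inter-rater) comes from the abstract.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest difference."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical between-subject SD of tendon thickness, in mm (not from the study).
sd_mm = 0.5
sem = sem_from_icc(sd_mm, icc=0.85)
mdc = mdc95(sem)
print(f"SEM = {sem:.3f} mm, MDC95 = {mdc:.3f} mm")
```

A change smaller than the MDC cannot be distinguished from measurement error, so it complements the ICC when judging whether a tendon has actually thickened.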
Seven Reliability Indices for High-Stakes Decision Making: Description, Selection, and Simple Calculation

ERIC Educational Resources Information Center

Smith, Stacey L.; Vannest, Kimberly J.; Davis, John L.

2011-01-01

The reliability of data is a critical issue in decision-making for practitioners in the school. Percent Agreement and Cohen's kappa are the two most widely reported indices of inter-rater reliability; however, a recent Monte Carlo study on the reliability of multi-category scales found other indices to be more trustworthy given the type of data…
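As a point of reference for the two indices named in this record, the following is a minimal sketch of how percent agreement and Cohen's kappa are conventionally computed for two raters. The function names and the example labels are illustrative assumptions, not material from the cited article.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal proportions.
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings of four items by two raters.
a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(percent_agreement(a, b), cohens_kappa(a, b))
```

Because kappa subtracts the agreement expected by chance, it is lower than percent agreement whenever the raters' marginal distributions make chance agreement likely.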
Are photographic records reliable for orthodontic screening?

PubMed

Mandall, N A

2002-06-01

The aim of the study was to evaluate the reliability of a panel of orthodontists for accepting new patient referrals based on clinical photographs. Eight orthodontists from Greater Manchester, Lancashire, Chester, and Derbyshire observed clinical photographs of 40 consecutive new patients attending the orthodontic department, Hope Hospital, Salford. They recorded whether or not they would accept the patient, as a new patient referral, in their department. Each consultant was asked to take into account factors, such as oral hygiene, dental development, and severity of the malocclusion. Kappa statistic for multiple-rater agreement and kappa statistic for intra-observer reliability were calculated. Inter-observer panel agreement for accepting new patient referrals based on photographic information was low (multiple rater kappa score 0.37). Intra-examiner agreement was better (kappa range 0.34-0.90). Clinician agreement for screening and accepting orthodontic referrals based on clinical photographs is comparable to that previously reported for other clinical decision making.

Psychometric Evaluation of the D-Catch, an Instrument to Measure the Accuracy of Nursing Documentation.

PubMed

D'Agostino, Fabio; Barbaranelli, Claudio; Paans, Wolter; Belsito, Romina; Juarez Vela, Raul; Alvaro, Rosaria; Vellone, Ercole

2017-07-01

To evaluate the psychometric properties of the D-Catch instrument. A cross-sectional methodological study. Validity and reliability were estimated with confirmatory factor analysis (CFA) and internal consistency and inter-rater reliability, respectively. A sample of 250 nursing documentations was selected. CFA showed the adequacy of a 1-factor model (chronologically descriptive accuracy) with an outlier item (nursing diagnosis accuracy). Internal consistency and inter-rater reliability were adequate. The D-Catch is a valid and reliable instrument for measuring the accuracy of nursing documentation. Caution is needed when measuring diagnostic accuracy since only one item measures this dimension. The D-Catch can be used as an indicator of the accuracy of nursing documentation and the quality of nursing care. © 2015 NANDA International, Inc.
Bronchiolitis Score of Sant Joan de Déu: BROSJOD Score, validation and usefulness.

PubMed

Balaguer, Mònica; Alejandre, Carme; Vila, David; Esteban, Elisabeth; Carrasco, Josep L; Cambra, Francisco José; Jordan, Iolanda

2017-04-01

To validate the bronchiolitis score of Sant Joan de Déu (BROSJOD) and to examine the previously defined scoring cutoff. Prospective, observational study. BROSJOD scoring was done by two independent physicians (at admission, 24 and 48 hr). Internal consistency of the score was assessed using Cronbach's α. To determine inter-rater reliability, the concordance correlation coefficient (CCC), estimated as an intraclass correlation coefficient, and the limits of agreement, estimated as the 90% total deviation index (TDI), were calculated. An expert opinion was used to classify patients according to clinical severity. A validity analysis was conducted comparing the 3-level classification score to that expert opinion. Volume under the surface (VUS), predictive values, and probability of correct classification (PCC) were measured to assess discriminant validity. About 112 patients were recruited, 62 of them (55.4%) males. Median age: 52.5 days (IQR: 32.75-115.25). The admission Cronbach's α was 0.77 (CI95%: 0.71; 0.82) and at 24 hr it was 0.65 (CI95%: 0.48; 0.7). The inter-rater reliability analysis was: CCC at admission 0.96 (95%CI 0.94-0.97), at 24 hr 0.77 (95%CI 0.65-0.86), and at 48 hr 0.94 (95%CI 0.94-0.97); TDI 90%: 1.6, 2.9, and 1.57, respectively. The discriminant validity at admission: VUS of 0.8 (95%CI 0.70-0.90), at 24 hr 0.92 (95%CI 0.85-0.99), and at 48 hr 0.93 (95%CI 0.87-0.99). The predictive values and PCC values were within 38-100% depending on the level of clinical severity. There is a high inter-rater reliability, showing the BROSJOD score to be reliable and valid, even when different observers apply it. Pediatr Pulmonol. 2017;52:533-539. © 2016 Wiley Periodicals, Inc.
The 2007 AASM Recommendations for EEG Electrode Placement in Polysomnography: Impact on Sleep and Cortical Arousal Scoring

PubMed Central

Ruehland, Warren R.; O'Donoghue, Fergal J.; Pierce, Robert J.; Thornton, Andrew T.; Singh, Parmjit; Copland, Janet M.; Stevens, Bronwyn; Rochford, Peter D.

2011-01-01

Study Objective: To examine the impact of using American Academy of Sleep Medicine (AASM) recommended EEG derivations (F4/M1, C4/M1, O2/M1) vs. a single derivation (C4/M1) in polysomnography (PSG) on the measurement of sleep and cortical arousals, including inter- and intra-observer variability. Design: Prospective, non-blinded, randomized comparison. Setting: Three Australian tertiary-care hospital clinical sleep laboratories. Patients or Participants: 30 PSGs from consecutive patients investigated for obstructive sleep apnea (OSA) during December 2007 and January 2008. Interventions: N/A Measurements and Results: To examine the impact of EEG derivations on PSG summary statistics, 3 scorers from different Australian clinical sleep laboratories each scored separate sets of 10 PSGs twice, once using 3 EEG derivations and once using 1 EEG derivation. To examine the impact on inter- and intra-scorer reliability, all 3 scorers scored a subset of 10 PSGs 4 times, twice using each method. All PSGs were de-identified and scored in random order according to the 2007 AASM Manual for the Scoring of Sleep and Associated Events. Using 3 referential EEG derivations during PSG, as recommended in the AASM manual, instead of a single central EEG derivation, as originally suggested by Rechtschaffen and Kales (1968), resulted in a mean ± SE decrease in N1 sleep of 9.6 ± 3.9 min (P = 0.018) and an increase in N3 sleep of 10.6 ± 2.8 min (P = 0.001). No significant differences were observed for any other sleep or arousal scoring summary statistics; nor were any differences observed in inter-scorer or intra-scorer reliability for scoring sleep or cortical arousals. 
Conclusion: This study provides information for those changing practice to comply with the 2007 AASM recommendations for EEG placement in PSG, for those using portable devices that are unable to comply with the recommendations due to limited channel options, and for the development of future standards for PSG scoring and recording. As the use of multiple EEG derivations only led to small changes in the distribution of derived sleep stages and no significant differences in scoring reliability, this study calls into question the need to use multiple EEG derivations in clinical PSG as suggested in the AASM manual. Citation: Ruehland WR; O'Donoghue FJ; Pierce RJ; Thornton AT; Singh P; Copland JM; Stevens B; Rochford PD. The 2007 AASM recommendations for EEG electrode placement in polysomnography: impact on sleep and cortical arousal scoring. SLEEP 2011;34(1):73-81. PMID:21203376
Brazilian version of the Nottingham Sensory Assessment: validity, agreement and reliability.

PubMed

Lima, Daniela H F; Queiroz, Ana P; De Salvo, Geovana; Yoneyama, Simone M; Oberg, Telma D; Lima, Núbia M F V

2010-01-01

To investigate the inter-rater and intra-rater reliability, construct validity and internal consistency of the Brazilian version of the Nottingham Sensory Assessment for Stroke Patients (NSA). The instrument was translated into Portuguese from its original in English by a bilingual translator and was then back-translated into English. Twenty-one hemiparetics were evaluated by two examiners using the NSA and the Fugl-Meyer Assessment (FMA) of physical performance. A significant correlation was found between the FMA and the NSA (r=0.752). The NSA showed excellent internal consistency (0.86), and there was acceptable inter- and intra-rater reliability for all items of the NSA, except temperature. Significant ceiling effects were found for the NSA and the FMA. The Brazilian version of the NSA met the criteria for agreement, internal consistency and concurrent validity. It was quick and easy to apply, and it could be used within clinical practice in neuro-rehabilitation outpatient clinics to assess sensory functions following stroke. The significant ceiling effect for the NSA did not limit its use, given that for the same patients, the FMA also showed ceiling effects.
The reliability and validity of hand-held refractometry water content measures of hydrogel lenses.

PubMed

Nichols, Jason J; Mitchell, G Lynn; Good, Gregory W

2003-06-01

To investigate within- and between-examiner reliability and validity of hand-held refractometry water content measures of hydrogel lenses. Nineteen lenses of various nominal water contents were examined by two examiners on two occasions separated by 1 hour. An Atago N2 hand-held refractometer was used for all water content measures. Lenses were presented in a random order to each examiner by a third party, and examiners were masked to any potential lens identifiers. Intraclass correlation coefficients (ICC), 95% limits of agreement, and Wilcoxon signed rank test were used to characterize the within- and between-examiner reliability and validity of lens water content measures. Within-examiner reliability was excellent (ICC, 0.97; 95% limits of agreement, -3.6% to +5.7%), and the inter-visit mean difference of 1.1 +/- 2.4% was not biased (p = 0.08). Between-examiner reliability was also excellent (ICC, 0.98; 95% limits of agreement, -4.1% to +3.9%). The mean difference between examiners was -0.1 +/- 2.1% (p = 0.83). The mean difference between the nominally reported water content and our water content measures was -2.1 +/- 1.7% (p < 0.001); the 95% limits of agreement for this difference were -5.4% to +1.1%. There is good reliability within and between examiners in measuring water content of hydrogel lenses. However, with our sample of lenses, examiners tended to overestimate the nominal water content of hydrogel lenses. As discussed, this bias may be associated with the Brix scale used in refractometry and is material dependent. Therefore, investigators may need to account for bias when measuring hydrogel lens water content via hand-held refractometry.
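The 95% limits of agreement reported in this record follow the usual Bland-Altman construction: mean difference plus or minus 1.96 standard deviations of the differences. The sketch below, with hypothetical function names, applies that construction to the rounded summary values quoted in the abstract. Because the inputs are rounded, the results land close to, but not exactly on, the published limits.

```python
import statistics


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """95% limits of agreement from a mean difference and the SD of differences."""
    half_width = z * sd_diff
    return mean_diff - half_width, mean_diff + half_width


def limits_from_pairs(first, second, z=1.96):
    """Same limits, computed from paired measurements (sample SD of differences)."""
    diffs = [x - y for x, y in zip(first, second)]
    return limits_of_agreement(statistics.mean(diffs), statistics.stdev(diffs), z)


# Rounded between-examiner values from the abstract: -0.1 +/- 2.1 (%).
print(limits_of_agreement(-0.1, 2.1))   # compare with the reported -4.1% to +3.9%
# Rounded within-examiner values from the abstract: 1.1 +/- 2.4 (%).
print(limits_of_agreement(1.1, 2.4))    # compare with the reported -3.6% to +5.7%
```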
Reliability of two social cognition tests: The combined stories test and the social knowledge test.

PubMed

Thibaudeau, Élisabeth; Cellard, Caroline; Legendre, Maxime; Villeneuve, Karèle; Achim, Amélie M

2018-04-01

Deficits in social cognition are common in psychiatric disorders. Validated social cognition measures with good psychometric properties are necessary to assess and target social cognitive deficits. Two recent social cognition tests, the Combined Stories Test (COST) and the Social Knowledge Test (SKT), respectively assess theory of mind and social knowledge. Previous studies have shown good psychometric properties for these tests, but the test-retest reliability has never been documented. The aim of this study was to evaluate the test-retest reliability and the inter-rater reliability of the COST and the SKT. The COST and the SKT were administered twice to a group of forty-two healthy adults, with a delay of approximately four weeks between the assessments. Excellent test-retest reliability was observed for the COST, and a good test-retest reliability was observed for the SKT. There was no evidence of practice effect. Furthermore, an excellent inter-rater reliability was observed for both tests. This study shows a good reliability of the COST and the SKT that adds to the good validity previously reported for these two tests. These good psychometric properties thus support that the COST and the SKT are adequate measures for the assessment of social cognition. Copyright © 2018. Published by Elsevier B.V.
The Statin-Associated Muscle Symptom Clinical Index (SAMS-CI): Revision for Clinical Use, Content Validation, and Inter-rater Reliability.

PubMed

Rosenson, Robert S; Miller, Kate; Bayliss, Martha; Sanchez, Robert J; Baccara-Dinet, Marie T; Chibedi-De-Roche, Daniela; Taylor, Beth; Khan, Irfan; Manvelian, Garen; White, Michelle; Jacobson, Terry A

2017-04-01

The Statin-Associated Muscle Symptom Clinical Index (SAMS-CI) is a method for assessing the likelihood that a patient's muscle symptoms (e.g., myalgia or myopathy) were caused or worsened by statin use. The objectives of this study were to prepare the SAMS-CI for clinical use, estimate its inter-rater reliability, and collect feedback from physicians on its practical application. For content validity, we conducted structured in-depth interviews with its original authors as well as with a panel of independent physicians. Estimation of inter-rater reliability involved an analysis of 30 written clinical cases which were scored by a sample of physicians. A separate group of physicians provided feedback on the clinical use of the SAMS-CI and its potential utility in practice. Qualitative interviews with providers supported the content validity of the SAMS-CI. Feedback on the clinical use of the SAMS-CI included several perceived benefits (such as brevity, clear wording, and simple scoring process) and some possible concerns (workflow issues and applicability in primary care). The inter-rater reliability of the SAMS-CI was estimated to be 0.77 (confidence interval 0.66-0.85), indicating high concordance between raters. With additional provider feedback, a revised SAMS-CI instrument was created suitable for further testing, both in the clinical setting and in prospective validation studies. With standardized questions, vetted language, easily interpreted scores, and demonstrated reliability, the SAMS aims to estimate the likelihood that a patient's muscle symptoms were attributable to statins. The SAMS-CI may support better detection of statin-associated muscle symptoms in clinical practice, optimize treatment for patients experiencing muscle symptoms, and provide a useful tool for further clinical research.
The use and reliability of SymNose for quantitative measurement of the nose and lip in unilateral cleft lip and palate patients.

PubMed

Mosmuller, David; Tan, Robin; Mulder, Frans; Bachour, Yara; de Vet, Henrica; Don Griot, Peter

2016-10-01

It is essential to have a reliable assessment method in order to compare the results of cleft lip and palate surgery. In this study the computer-based program SymNose, a method for quantitative assessment of the nose and lip, will be assessed on usability and reliability. The symmetry of the nose and lip was measured twice in 50 six-year-old complete and incomplete unilateral cleft lip and palate patients by four observers. For the frontal view the asymmetry level of the nose and upper lip were evaluated and for the basal view the asymmetry level of the nose and nostrils were evaluated. The mean inter-observer reliability when tracing each image once or twice was 0.70 and 0.75, respectively. Tracing the photographs with 2 observers and 4 observers gave a mean inter-observer score of 0.86 and 0.92, respectively. The mean intra-observer reliability varied between 0.80 and 0.84. SymNose is a practical and reliable tool for the retrospective assessment of large caseloads of 2D photographs of cleft patients for research purposes. Moderate to high single inter-observer reliability was found. For future research with SymNose reliable outcomes can be achieved by using the average outcomes of single tracings of two observers. Copyright © 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
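The gain in reliability from averaging several observers is commonly projected with the Spearman-Brown formula. The abstract does not state which method the authors used, so the sketch below is only an illustration. The function name is hypothetical, and taking 0.75 as the single-observer starting value is an assumption.

```python
def spearman_brown(single_reliability, k):
    """Projected reliability of the average of k raters (Spearman-Brown formula)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Assumed single-observer reliability of 0.75, projected to 2 and 4 observers.
for k in (2, 4):
    print(k, round(spearman_brown(0.75, k), 2))
```

With that assumed starting value the projections round to 0.86 and 0.92, which coincide with the figures quoted for 2 and 4 observers. This is a numerical coincidence worth noting, not confirmation of the authors' procedure.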
Towards an Operational Definition of Clinical Competency in Pharmacy

PubMed Central

2015-01-01

Objective. To estimate the inter-rater reliability and accuracy of ratings of competence in student pharmacist/patient clinical interactions as depicted in videotaped simulations and to compare expert panelist and typical preceptor ratings of those interactions. Methods. This study used a multifactorial experimental design to estimate inter-rater reliability and accuracy of preceptors’ assessment of student performance in clinical simulations. The study protocol used nine 5-10 minute video vignettes portraying different levels of competency in student performance in simulated clinical interactions. Intra-Class Correlation (ICC) was used to calculate inter-rater reliability and Fisher exact test was used to compare differences in distribution of scores between expert and nonexpert assessments. Results. Preceptors (n=42) across 5 states assessed the simulated performances. Intra-Class Correlation estimates were higher for 3 nonrandomized video simulations compared to the 6 randomized simulations. Preceptors more readily identified high and low student performances compared to satisfactory performances. In nearly two-thirds of the rating opportunities, a higher proportion of expert panelists than preceptors rated the student performance correctly (18 of 27 scenarios). Conclusion. Valid and reliable assessments are critically important because they affect student grades and formative student feedback. Study results indicate the need for pharmacy preceptor training in performance assessment. The process demonstrated in this study can be used to establish minimum preceptor benchmarks for future national training programs. PMID:26089563
Inter-rater reliability of twelve diagnostic systems of schizophrenia.

PubMed

Helmes, E; Landmark, J; Kazarian, S S

1983-05-01

The present and past symptomatology of 31 chronic schizophrenics was rated by four independent judges, two experienced clinical psychiatrists and two psychiatric residents, in a context more representative of actual clinical practice than most research studies. Ratings were made on 64 symptoms derived from 12 diagnostic systems, based on either live or videotaped interviews for present symptomatology and case records for past symptomatology. Inter-rater reliabilities were higher for present than for past symptoms, and in general did not approach those reported for highly trained raters. There were no differences between live and videotaped interviews. Diagnostic systems differed widely in rater agreement. The most consistent across both past and present symptomatology were the systems of Langfeldt, Schneider, and DSM-III, for which the level of reliability was consistent with other studies.
The assessment of fidelity in a motor speech-treatment approach

PubMed Central

Hayden, Deborah; Namasivayam, Aravind Kumar; Ward, Roslyn

2015-01-01

Objective: To demonstrate the application of the constructs of treatment fidelity for research and clinical practice for motor speech disorders, using the Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT) Fidelity Measure (PFM). Treatment fidelity refers to a set of procedures used to monitor and improve the validity and reliability of behavioral intervention. While the concept of treatment fidelity has been emphasized in medical and allied health sciences, documentation of procedures for the systematic evaluation of treatment fidelity in Speech-Language Pathology is sparse. Methods: The development of the PFM and the iterative process used to improve it are discussed. Further, the PFM is evaluated against recommended measurement strategies documented in the literature. This includes evaluating the appropriateness of goals and objectives; and the training of speech–language pathologists, using direct and indirect procedures. Three expert raters scored the PFM to examine inter-rater reliability. Results: Three raters, blinded to each other's scores, completed fidelity ratings on three separate occasions. Inter-rater reliability, using Krippendorff's Alpha, was >80% for the PFM on the final scoring occasion. This indicates strong inter-rater reliability. Conclusion: The development of fidelity measures for the training of service providers and treatment delivery is important in specialized treatment approaches where certain ‘active ingredients’ (e.g. specific treatment targets and therapeutic techniques) must be present in order for treatment to be effective. The PFM reflects evidence-based practice by integrating treatment delivery and clinical skill as a single quantifiable metric. PFM enables researchers and clinicians to objectively measure treatment outcomes within the PROMPT approach. PMID:26213623
Simulated patient training: Using inter-rater reliability to evaluate simulated patient consistency in nursing education.

PubMed

MacLean, Sharon; Geddes, Fiona; Kelly, Michelle; Della, Phillip

2018-03-01

Simulated patients (SPs) are frequently used for training nursing students in communication skills. An acknowledged benefit of using SPs is the opportunity to provide a standardized approach by which participants can demonstrate and develop communication skills. However, relatively little evidence is available on how to best facilitate and evaluate the reliability and accuracy of SPs' performances. The aim of this study is to investigate the effectiveness of an evidence-based SP training framework to ensure standardization of SPs. The training framework was employed to improve inter-rater reliability of SPs. A quasi-experimental study was employed to assess SP post-training understanding of simulation scenario parameters using inter-rater reliability agreement indices. Two phases of data collection took place. Initially, in a trial phase, audio-visual (AV) recordings of two undergraduate nursing students completing a simulation scenario were rated by eight SPs using the Interpersonal Communication Assessments Scale (ICAS) and Quality of Discharge Teaching Scale (QDTS). In phase 2, eight SP raters and four nursing faculty raters independently evaluated students' (N=42) communication practices using the QDTS. Intraclass correlation coefficients (ICC) were >0.80 for both stages of the study in clinical communication skills. The results support the premise that if trained appropriately, SPs have a high degree of reliability and validity to both facilitate and evaluate student performance in nurse education. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.
Examining the cultural validity of fear survey schedule for children: the contemporary fears of Turkish children and adolescents.

PubMed

Serim-Yildiz, Begüm; Erdur-Baker, Ozgür

2013-01-01

The authors examined the cultural validity of Fear Survey Schedule for Children (FSSC-AM) developed by J. J. Burnham (2005) with Turkish children. The relationships between demographic variables and the level of fear were also tested. Three independent data sets were used. The first data set comprised 676 participants (321 women and 355 men) and was used for examining factor structure and internal reliability of FSSC. The second data set comprised 639 participants (321 women and 318 men) and was used for testing internal reliability and to confirm the factor structure of FSSC. The third data set comprised 355 participants (173 women and 182 men) and was used for analyses of test-retest reliability, inter-item reliability, and convergent validity for the scores of FSSC. The sum of the first and second samples (1,315 participants; 642 women and 673 men) was used for testing the relationships between demographic variables and the level of fear. Results indicated that FSSC is a valid and reliable instrument to examine Turkish children's and adolescents' fears between the ages of 8 and 18 years. The younger, female, children of low-income parents reported a higher level of fear. The findings are discussed in light of the existing literature.
Carotid and vertebral injury study (CAVIS) technique for characterization of blunt traumatic aneurysms with reliability assessment.

PubMed

Griessenauer, Christoph J; Foreman, Paul; Shoja, Mohammadali M; Kicielinski, Kimberly P; Deveikis, John P; Walters, Beverly C; Harrigan, Mark R

2015-04-01

Traumatic aneurysms occur in up to 20% of blunt traumatic extracranial carotid artery injuries. Currently there is no standardized method for characterization of traumatic aneurysms. For the carotid and vertebral injury study (CAVIS), a prospective study of traumatic cerebrovascular injury, we established a method for aneurysm characterization and tested its reliability. Saccular aneurysm size was defined as the greatest linear distance between the expected location of the normal artery wall and the outer edge of the aneurysm lumen ("depth"). Fusiform aneurysm size was defined as the "depth" and longitudinal distance ("length") paralleling the normal artery. The size of the aneurysm relative to the normal artery was also assessed. Reliability measurements were made using four raters who independently reviewed 15 computed tomographic angiograms (CTAs) and 13 digital subtraction angiograms (DSAs) demonstrating a traumatic aneurysm of the internal carotid artery. Raters categorized the aneurysms as either "saccular" or "fusiform" and made measurements. Five scans of each imaging modality were repeated to evaluate intra-rater reliability. Fleiss's free-marginal multi-rater kappa (κ), Cohen's kappa (κ), and intraclass correlation coefficient (ICC) determined inter- and intra-rater reliability. Inter-rater agreement as to the aneurysm "shape" was almost perfect for CTA (κ = 0.82) and DSA (κ = 0.897). Agreements on aneurysm "depth," "length," "aneurysm plus parent artery," and "parent artery" for CTA and DSA were excellent (ICC > 0.75). Intra-rater agreement as to aneurysm "shape" was substantial to almost perfect (κ > 0.60). The CAVIS method of traumatic aneurysm characterization has remarkable inter- and intra-rater reliability and will facilitate further studies of the natural history and management of extracranial cerebrovascular traumatic aneurysms. © The Author(s) 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
Reliability, validity and minimal detectable change of the Mini-BESTest in Greek participants with chronic stroke.

PubMed

Lampropoulou, Sofia I; Billis, Evdokia; Gedikoglou, Ingrid A; Michailidou, Christina; Nowicky, Alexander V; Skrinou, Dimitra; Michailidi, Fotini; Chandrinou, Danae; Meligkoni, Margarita

2018-02-23

This study aimed to investigate the psychometric characteristics of reliability, validity and ability to detect change of a newly developed balance assessment tool, the Mini-BESTest, in Greek patients with stroke. A prospective, observational design study with test-retest measures was conducted. A convenience sample of 21 Greek patients with chronic stroke (14 male, 7 female; age of 63 ± 16 years) was recruited. Two independent examiners administered the scale, for the inter-rater reliability, twice within 10 days for the test-retest reliability. Bland Altman Analysis for repeated measures assessed the absolute reliability and the Standard Error of Measurement (SEM) and the Minimum Detectable Change at 95% confidence interval (MDC95%) were established. The Greek Mini-BESTest (Mini-BESTest GR) was correlated with the Greek Berg Balance Scale (BBS GR) for assessing the concurrent validity and with the Timed Up and Go (TUG), the Functional Reach Test (FRT) and the Greek Falls Efficacy Scale-International (FES-I GR) for the convergent validity. The Mini-BESTest GR demonstrated excellent inter-rater reliability (ICC (95%CI) = 0.997 (0.995-0.999), SEM = 0.46) with the scores of two raters within the limits of agreement (mean dif = -0.143 ± 0.727, p > 0.05) and test-retest reliability (ICC (95%CI) = 0.966 (0.926-0.988), SEM = 1.53). Additionally, the Mini-BESTest GR yielded very strong to moderate correlations with BBS GR (r = 0.924, p < 0.001), TUG (r = -0.823, p < 0.001), FES-I GR (r = -0.734, p < 0.001) and FRT (r = 0.689, p < 0.001). MDC95% was 4.25 points. The exceptionally high reliability and the equally good validity of the Mini-BESTest GR strongly support its utility in Greek people with chronic stroke. Its ability to identify clinically meaningful changes and falls risk needs further investigation.
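The SEM and MDC values in this record are related by formulas that are standard in the reliability literature: SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The abstract does not spell out its computation, so the sketch below assumes these conventional definitions. The function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the score SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimum detectable change at 95% confidence for a test-retest difference."""
    return 1.96 * math.sqrt(2.0) * sem


# Test-retest SEM quoted in the abstract: 1.53 points.
print(round(mdc95(1.53), 2))
```

Applied to the rounded SEM of 1.53, the conventional formula gives roughly 4.24 points, which is close to the reported MDC of 4.25. The small gap is consistent with rounding of the published SEM.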
The influence of critical shoulder angle on secondary rotator cuff insufficiency following shoulder arthroplasty.

PubMed

Cerciello, Simone; Monk, Andrew Paul; Visonà, Enrico; Carbone, Stefano; Edwards, Thomas Bradley; Maffulli, Nicola; Walch, Gilles

2017-07-01

Secondary cuff failure after shoulder replacement is disabling and often requires additional surgery. Increased critical shoulder angle (CSA) has been found in patients with cuff tear compared to normal subjects. The interobserver reliability of the CSA and the relationship between CSA and symptomatic secondary cuff failure after shoulder replacement were investigated. Nineteen patients with symptomatic cuff failure after anatomic shoulder replacement (mean FU 45 months) were compared to a control group of 29 patients showing no signs of symptomatic cuff failure (mean FU 105.7 months). The CSA was measured by two blinded surgeons at a mean follow-up of 45 and 105.7 months, respectively. Inter-observer reliability was calculated. The mean CSA in the study group in neutral, internal and external rotations were 33°, 34° and 34°, respectively. Corresponding values in the control group were 32°, 32° and 32°. The intraclass correlation coefficients for the whole population between the two examiners were 0.956 (P < 0.01), 0.964 (P < 0.01) and 0.955 (P < 0.01), respectively. There were no significant differences of CSA values between patients who had undergone shoulder replacement and experienced late cuff failure and those in whom the same procedure had been successful. A good inter-observer reliability was found for the CSA method.
Reliability of the Melbourne assessment of unilateral upper limb function.

PubMed

Randall, M; Carlin, J B; Chondros, P; Reddihough, D

2001-11-01

This study examines the reliability of the Melbourne Assessment of Unilateral Upper Limb Function: a quantitative test of quality of movement in children with neurological impairment. The assessment was administered to 20 children aged from 5 to 16 years (mean age 9 years 10 months, SD 2 years 10 months) who had various types and degrees of cerebral palsy (CP). The performances of the 20 children during assessment were videotaped for subsequent scoring by 15 occupational therapists. Scores were analyzed for internal consistency of test items, inter- and intrarater reliability of scorings of the same videotapes, and test-retest reliability using repeat videotaping. Results revealed very high internal consistency of test items (alpha=0.96), moderate to high agreement both within and between raters for all test items (intraclass correlations of at least 0.7) apart from item 16 (hand to mouth and down), and high interrater reliability (0.95) and intrarater reliability (0.97) for total test scores. Test-retest results revealed moderate to high intrarater reliability for item totals (mean of 0.83 and 0.79) for each rater and high reliability for test totals (0.98 and 0.97). These findings indicate that the Melbourne Assessment of Unilateral Upper Limb Function is a reliable tool for measuring the quality of unilateral upper-limb movement in children with CP.
[Reliability and reproducibility of the Fitzpatrick phototype scale for skin sensitivity to ultraviolet light].

PubMed

Sánchez, Guillermo; Nova, John; Arias, Nilsa; Peña, Bibiana

2008-12-01

The Fitzpatrick phototype scale has been used to determine skin sensitivity to ultraviolet light. The reliability of this scale in estimating sensitivity permits risk evaluation of skin cancer based on phototype. Reliability and changes in intra- and inter-observer concordance were determined for the Fitzpatrick phototype scale after the assessment methods for establishing the phototype were standardized. An analytical study of intra- and inter-observer concordance was performed. The Fitzpatrick phototype scale was standardized using focus group methodology. To determine intra- and inter-observer agreement, the weighted kappa statistical method was applied. The standardization effect was measured using the equal kappa contrast hypothesis and Wald test for dependent measurements. The phototype scale was applied to 155 patients over 15 years of age who were assessed four times by two independent observers. The sample was drawn from patients of the Centro Dermatológico Federico Lleras Acosta. During the pre-standardization phase, the baseline and six-week inter-observer weighted kappa values were 0.31 and 0.40, respectively. The intra-observer kappa values for observers A and B were 0.47 and 0.51, respectively. After the standardization process, the baseline and six-week inter-observer weighted kappa values were 0.77 and 0.82, respectively. Intra-observer kappa coefficients for observers A and B were 0.78 and 0.82. Statistically significant differences were found between coefficients before and after standardization (p<0.001) in all comparisons. Following a standardization exercise, the Fitzpatrick phototype scale yielded reliable, reproducible and consistent results.
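For readers unfamiliar with the weighted kappa used in the abstract above, the following is a minimal sketch of a linearly weighted Cohen's kappa for two observers rating an ordinal scale. The abstract does not state which weighting scheme the authors used, so linear weights are an assumption here, and the function name `weighted_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale.

    `categories` lists the scale levels in order (e.g. phototypes 1..6).
    Disagreement is weighted by the distance between the two levels.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)
    # Marginal counts of each level for each rater.
    count_a = Counter(index[r] for r in ratings_a)
    count_b = Counter(index[r] for r in ratings_b)
    # Mean observed disagreement (distance between paired ratings).
    observed = sum(abs(index[a] - index[b])
                   for a, b in zip(ratings_a, ratings_b)) / n
    # Mean disagreement expected if the raters were independent.
    expected = sum(count_a[i] * count_b[j] * abs(i - j)
                   for i in range(k) for j in range(k)) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when only one level is used")
    return 1.0 - observed / expected


# Hypothetical phototype ratings (I-VI coded as 1-6), not study data.
observer_a = [1, 2, 2, 3, 4, 4, 5, 6]
observer_b = [1, 2, 3, 3, 4, 5, 5, 6]
print(round(weighted_kappa(observer_a, observer_b, [1, 2, 3, 4, 5, 6]), 3))
```

Because the weights grow with the distance between levels, a one-step disagreement (for example II versus III) is penalized less than a three-step one, which is why weighted kappa suits ordinal scales such as the phototype.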
Inter-rater reliability of a food store checklist to assess availability of healthier alternatives to the energy-dense snacks and beverages commonly consumed by children.

PubMed

Izumi, Betty T; Findholt, Nancy E; Pickus, Hayley A; Nguyen, Thuan; Cuneo, Monica K

2014-06-01

Food stores have gained attention as potential intervention targets for improving children's eating habits. There is a need for valid and reliable instruments to evaluate changes in food store snack and beverage availability secondary to intervention. The aim of this study was to develop a valid, reliable, and resource-efficient instrument to evaluate the healthfulness of food store environments faced by children. The SNACZ food store checklist was developed to assess availability of healthier alternatives to the energy-dense snacks and beverages commonly consumed by children. After pretesting, two trained observers independently assessed the availability of 48 snack and beverage items in 50 food stores located near elementary and middle schools in Portland, Oregon, over a 2-week period in summer 2012. Inter-rater reliability was calculated using the kappa statistic. Overall, the instrument had mostly high inter-rater reliability. Seventy-three percent of items assessed had almost perfect or substantial reliability. Two items had moderate reliability (0.41-0.60), and no items had a reliability score less than 0.41. Eleven items occurred too infrequently to generate a kappa score. The SNACZ food store checklist is a first-step toward developing a valid and reliable tool to evaluate the healthfulness of food store environments faced by children. The tool can be used to compare availability of healthier snack and beverage alternatives across communities and measure change secondary to intervention. As a wider variety of healthier snack and beverage alternatives become available in food stores, the checklist should be updated.
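As an illustration of the per-item kappa calculation described above, here is a minimal sketch that computes unweighted Cohen's kappa for one checklist item and maps it to the Landis and Koch bands the abstract appears to use (moderate 0.41-0.60, substantial, almost perfect). The function names and the example observations are hypothetical; returning `None` when chance agreement is total is one assumed way to represent an item that occurs too rarely to yield a kappa score.

```python
def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters; None if undefined."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed proportion of stores where both observers agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each observer's marginals.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if p_e == 1.0:
        return None  # no variation at all, kappa cannot be computed
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch(kappa):
    """Verbal label for a kappa value (Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical availability of one item in 10 stores (1 = available).
observer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
observer_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(observer_1, observer_2)
print(round(kappa, 2), landis_koch(round(kappa, 2)))
```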
Knowledge translation from continuing education to physiotherapy practice in classifying patients with low back pain.

PubMed

Karvonen, Eira; Paatelma, Markku; Kesonen, Jukka-Pekka; Heinonen, Ari O

2015-05-01

Physical therapists have used continuing education as a method of improving their skills in conducting clinical examination of patients with low back pain (LBP). The purpose of this study was to evaluate how well the pathoanatomical classification of patients with acute or subacute LBP can be learned and applied through a continuing education format. The patients were seen in a direct access setting. The study was carried out in a large health-care center in Finland. The analysis included a total of 57 patient evaluations generated by six physical therapists on patients with LBP. We analyzed the consistency and level of agreement of the diagnostic decisions of the six physiotherapists (PTs), who participated in a 5-day, intensive continuing education session, and compared those with the diagnostic opinions of two expert physical therapists, who were blind to the original diagnostic decisions. Evaluation of the physical therapists' clinical examination of the patients was conducted by the two experts, in order to determine the accuracy and percentage agreement of the pathoanatomical diagnoses. The percentage of agreement between the experts and PTs was 72-77%. The overall inter-examiner reliability (kappa coefficient) for the subgroup classification between the six PTs and two experts was 0.63 [95% confidence interval (CI): 0.47-0.77], indicating good agreement between the PTs and the two experts. The overall inter-examiner reliability between the two experts was 0.63 (0.49-0.77), indicating a good level of agreement. Our results indicate that PTs were able to apply their continuing education training to clinical reasoning and make consistently accurate pathoanatomic based diagnostic decisions for patients with LBP. This would suggest that continuing education short-courses provide a reasonable format for knowledge translation (KT) by which physical therapists can learn and apply new information related to the examination and differential diagnosis of patients with acute or subacute LBP.

The reliability of three psoriasis assessment tools: Psoriasis area and severity index, body surface area and physician global assessment.

PubMed

Bożek, Agnieszka; Reich, Adam

2017-08-01

A wide variety of psoriasis assessment tools have been proposed to evaluate the severity of psoriasis in clinical trials and daily practice. The most frequently used clinical instrument is the psoriasis area and severity index (PASI); however, none of the currently published severity scores used for psoriasis meets all the validation criteria required for an ideal score. The aim of this study was to compare and assess the reliability of 3 commonly used assessment instruments for psoriasis severity: the psoriasis area and severity index (PASI), body surface area (BSA) and physician global assessment (PGA). On the scoring day, 10 trained dermatologists evaluated 9 adult patients with plaque-type psoriasis using the PASI, BSA and PGA. All the subjects were assessed twice by each physician. Correlations between the assessments were analyzed using the Pearson correlation coefficient. Intra-class correlation coefficient (ICC) was calculated to analyze intra-rater reliability, and the coefficient of variation (CV) was used to assess inter-rater variability. Significant correlations were observed among the 3 scales in both assessments. In all 3 scales the ICCs were > 0.75, indicating high intra-rater reliability. The highest ICC was for the BSA (0.96) and the lowest one for the PGA (0.87). The CV for the PGA and PASI were 29.3 and 36.9, respectively, indicating moderate inter-rater variability. The CV for the BSA was 57.1, indicating high inter-rater variability. Comparing the PASI, PGA and BSA, it was shown that the PGA had the highest inter-rater reliability, whereas the BSA had the highest intra-rater reliability. The PASI showed intermediate values in terms of inter- and intra-rater reliability. None of the 3 assessment instruments showed a significant advantage over the others. A reliable assessment of psoriasis severity requires the use of several independent evaluations simultaneously.
[Inter-rater reliability and validity of the OPD-CA axes structure and conflict].

PubMed

Benecke, Cord; Bock, Astrid; Wieser, Elke; Tschiesner, Reinhard; Lochmann, Martha; Küspert, Felicia; Schorn, Robert; Viertler, Bernhard; Steinmayr-Gensluckner, Maria

2011-01-01

The manual of the Operationalized Psychodynamic Diagnostics in childhood and adolescence (OPD-CA) is an instrument that is now widely used in clinical practice to assess psychodynamic dimensions. Published data on inter-rater agreement and validity are still lacking. This study assessed the inter-rater reliability and validity of the structure axis and the conflict axis. Sixty adolescents between 14 and 17 years, with and without psychic disorders, were diagnosed with the Operationalized Psychodynamic Diagnostics in childhood and adolescence (Arbeitskreis OPD-KJ, 2007) and SCID-II-interviews and questionnaires. A partial sample of 36 OPD-CA-interviews was the data basis for the assessment of inter-rater agreement. Calculations of validity for axis structure and axis conflict were made with the whole sample. Inter-rater agreement for the axis structure and the axis conflict showed good to very good weighted Kappa coefficients among the trained raters. Validity of the axis structure showed good results. The Operationalized Psychodynamic Diagnostics in childhood and adolescence (OPD-CA) allows a reliable diagnosis of axis structure and axis conflict, if the ratings are done on the basis of semistructured videotaped interviews by trained raters. The axis structure shows validity, while the results concerning the validity of the axis conflict remain unclear.
The reliability and validity of the Turkish version of Fullerton Advanced Balance (FAB-T) scale.

PubMed

Iyigun, Gozde; Kirmizigil, Berkiye; Angin, Ender; Oksuz, Sevim; Can, Filiz; Eker, Levent; Rose, Debra J

2018-06-04

The aim of this study was to evaluate the reliability and validity of the Turkish version of the FAB (FAB-T) scale in older Turkish adults. The reliability and validity of the scale were tested on 200 community-dwelling older adults. The FAB-T scale was scored by different physiotherapists on different days to evaluate inter-rater and intra-rater reliability. The Berg Balance Scale (BBS) was used for the evaluation of convergent validity, and the content validity of the FAB-T scale was investigated. The FAB-T scale showed very high inter- and intra-rater reliability. For inter-rater agreement, the ICC values for the individual test items and the total score were 0.92 (95% CI; 0.90-0.94) and 0.96 (95% CI; 0.95-0.97), respectively. For intra-rater agreement, the ICC values for the individual test items and the total score were 0.93 (95% CI; 0.91-0.95) and 0.96 (95% CI; 0.95-0.97), respectively. There was good agreement between the FAB-T and BBS scales. A high correlation was found between the BBS and FAB-T scales [rho = 0.70 (95% CI; 0.62-0.76)], indicating good convergent validity. Considering the content validity of the FAB-T scale, no floor (floor score: 0%) or ceiling (ceiling score: 6.5%) effect was detected. The FAB-T scale was successfully translated from the original English version (FAB) and demonstrated strong psychometric features. It was found that the FAB-T scale has very high inter-rater and intra-rater reliability. Considering the convergent validity, the scale has high correlation with the BBS. The FAB-T has no floor or ceiling effect. Copyright © 2018 Elsevier B.V. All rights reserved.
Five times sit-to-stand test in subjects with total knee replacement: Reliability and relationship with functional mobility tests.

PubMed

Medina-Mirapeix, Francesc; Vivo-Fernández, Iván; López-Cañizares, Juan; García-Vidal, José A; Benítez-Martínez, Josep Carles; Del Baño-Aledo, María Elena

2018-01-01

The objective was to determine the inter-observer and test-retest reliability of the "Five-repetition sit-to-stand" (5STS) test in patients with total knee replacement (TKR), and to explore the correlation between 5STS and two mobility tests. A reliability study was conducted among 24 (mean age 72.13, S.D. 10.67; 50% were women) outpatients with TKR. They were recruited from a traumatology unit of a public hospital via convenience sampling. A physiotherapist and a trauma physician assessed each patient at the same time. The same physiotherapist performed a second 5STS measurement 45-60 min after the first one. Reliability was assessed with intraclass correlation coefficients (ICCs) and Bland-Altman plots. The Pearson coefficient was calculated to assess the correlation between 5STS, the timed up and go test (TUG) and four meters gait speed (4MGS). ICCs for inter-observer and test-retest reliability of the 5STS were 0.998 (95% confidence interval [CI], 0.995-0.999) and 0.982 (95% CI, 0.959-0.992). The Bland-Altman plot for inter-observer reliability showed limits between -0.82 and 1.06 with a mean of 0.11 and no heteroscedasticity within the data. The Bland-Altman plot for test-retest showed the limits between 1.76 and 4.16, a mean of 1.20 and heteroscedasticity within the data. The Pearson correlation coefficient revealed significant correlation between 5STS and TUG (r=0.7, p<0.001) and 4MGS (r=-0.583, p=0.003). This study demonstrates excellent inter-observer and test-retest reliability of the 5STS when it is used in people with TKR, and also significant correlation with other functional mobility tests. These findings support the use of 5STS as an outcome measure in the TKR population. Copyright © 2017 Elsevier B.V. All rights reserved.
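The Bland-Altman figures quoted above (a mean difference with lower and upper limits) can be reproduced with a short calculation. The sketch below assumes the conventional limits of agreement, mean difference ± 1.96 standard deviations of the paired differences; the abstract does not state the multiplier, and the function name and timing data are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(measure_1, measure_2):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Limits of agreement are bias +/- 1.96 * SD of the differences,
    the conventional 95% limits (assumed, not stated in the abstract).
    """
    if len(measure_1) != len(measure_2) or len(measure_1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(measure_1, measure_2)]
    bias = mean(diffs)   # systematic difference between the two series
    sd = stdev(diffs)    # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical 5STS times in seconds from two observers, not study data.
physiotherapist = [10.0, 12.0, 14.0, 16.0]
physician = [9.0, 12.0, 13.0, 16.0]
bias, lower, upper = bland_altman(physiotherapist, physician)
print(f"bias={bias:.2f}s, limits of agreement {lower:.2f}s to {upper:.2f}s")
```

A bias close to zero with narrow limits indicates that the two observers can be used interchangeably. Checking for heteroscedasticity, as the study did, additionally requires looking at whether the differences grow with the mean of each pair.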
Validity and reliability of clinical prediction rules used to screen for cervical spine injury in alert low-risk patients with blunt trauma to the neck: part 2. A systematic review from the Cervical Assessment and Diagnosis Research Evaluation (CADRE) Collaboration.

PubMed

Moser, N; Lemeunier, N; Southerst, D; Shearer, H; Murnaghan, K; Sutton, D; Côté, P

2018-06-01

To update findings of the 2000-2010 Bone and Joint Decade Task Force on Neck Pain and its Associated Disorders (Neck Pain Task Force) on the validity and reliability of clinical prediction rules used to screen for cervical spine injury in alert low-risk adult patients with blunt trauma to the neck. We searched four databases from 2005 to 2015. Pairs of independent reviewers critically appraised eligible studies using the modified QUADAS-2 and QAREL criteria. We synthesized low risk of bias studies following best evidence synthesis principles. We screened 679 citations; five had a low risk of bias and were included in our synthesis. The sensitivity of the Canadian C-spine rule ranged from 0.90 to 1.00 with negative predictive values ranging from 99 to 100%. Inter-rater reliability of the Canadian C-spine rule varied from k = 0.60 between nurses and physicians to k = 0.93 among paramedics. The inter-rater reliability of the Nexus Low-Risk Criteria was k = 0.53 between resident physicians and faculty physicians. Our review adds new evidence to the Neck Pain Task Force and supports the use of clinical prediction rules in emergency care settings to screen for cervical spine injury in alert low-risk adult patients with blunt trauma to the neck. The Canadian C-spine rule consistently demonstrated excellent sensitivity and negative predictive values. Our review, however, suggests that the reproducibility of the clinical predictions rules varies depending on the examiners level of training and experience.
Reliability of laser Doppler flowmetry curve reading for measurement of toe and ankle pressures: intra- and inter-observer variation.

PubMed

Høyer, C; Paludan, J P D; Pavar, S; Biurrun Manresa, J A; Petersen, L J

2014-03-01

To assess the intra- and inter-observer variation in laser Doppler flowmetry curve reading for measurement of toe and ankle pressures. A prospective single blinded diagnostic accuracy study was conducted on 200 patients with known or suspected peripheral arterial disease (PAD), with a total of 760 curve sets produced. The first curve reading for this study was performed by laboratory technologists blinded to clinical clues and previous readings at least 3 months after the primary data sampling. The pressure curves were later reassessed following another period of at least 3 months. Observer agreement in diagnostic classification according to TASC-II criteria was quantified using Cohen's kappa. Reliability was quantified using intra-class correlation coefficients, coefficients of variance, and Bland-Altman analysis. The overall agreement in diagnostic classification (PAD/not PAD) was 173/200 (87%) for intra-observer (κ = .858) and 175/200 (88%) for inter-observer data (κ = .787). Reliability analysis confirmed excellent correlation for both intra- and inter-observer data (ICC all ≥.931). The coefficients of variance ranged from 2.27% to 6.44% for intra-observer and 2.39% to 8.42% for inter-observer data. Subgroup analysis showed lower observer-variation for reading of toe pressures in patients with diabetes and/or chronic kidney disease than patients not diagnosed with these conditions. Bland-Altman plots showed higher variation in toe pressure readings than ankle pressure readings. This study shows substantial intra- and inter-observer agreement in diagnostic classification and reading of absolute pressures when using laboratory technologists as observers. The study emphasises that observer variation for curve reading is an important factor concerning the overall reproducibility of the method. Our data suggest diabetes and chronic kidney disease have an influence on toe pressure reproducibility. Copyright © 2013 European Society for Vascular Surgery. Published by Elsevier Ltd. All rights reserved.
Palliative sedation: reliability and validity of sedation scales.

PubMed

Arevalo, Jimmy J; Brinkkemper, Tijn; van der Heide, Agnes; Rietjens, Judith A; Ribbe, Miel; Deliens, Luc; Loer, Stephan A; Zuurmond, Wouter W A; Perez, Roberto S G M

2012-11-01

Observer-based sedation scales have been used to provide a measurable estimate of the comfort of nonalert patients in palliative sedation. However, their usefulness and appropriateness in this setting has not been demonstrated. To study the reliability and validity of observer-based sedation scales in palliative sedation. A prospective evaluation of 54 patients under intermittent or continuous sedation with four sedation scales was performed by 52 nurses. Included scales were the Minnesota Sedation Assessment Tool (MSAT), Richmond Agitation-Sedation Scale (RASS), Vancouver Interaction and Calmness Scale (VICS), and a sedation score proposed in the Guideline for Palliative Sedation of the Royal Dutch Medical Association (KNMG). Inter-rater reliability was tested with the intraclass correlation coefficient (ICC) and Cohen's kappa coefficient. Correlations between the scales using Spearman's rho tested concurrent validity. We also examined construct, discriminative, and evaluative validity. In addition, nurses completed a user-friendliness survey. Overall moderate to high inter-rater reliability was found for the VICS interaction subscale (ICC = 0.85), RASS (ICC = 0.73), and KNMG (ICC = 0.71). The largest correlation between scales was found for the RASS and KNMG (rho = 0.836). All scales showed discriminative and evaluative validity, except for the MSAT motor subscale and VICS calmness subscale. Finally, the RASS was less time consuming, clearer, and easier to use than the MSAT and VICS. The RASS and KNMG scales stand as the most reliable and valid among the evaluated scales. In addition, the RASS was less time consuming, clearer, and easier to use than the MSAT and VICS. Further research is needed to evaluate the impact of the scales on better symptom control and patient comfort. Copyright © 2012 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.
Content Validity Index and Intra- and Inter-Rater Reliability of a New Muscle Strength/Endurance Test Battery for Swedish Soldiers

PubMed Central

Larsson, Helena; Tegern, Matthias; Monnier, Andreas; Skoglund, Jörgen; Helander, Charlotte; Persson, Emelie; Malm, Christer; Broman, Lisbet; Aasa, Ulrika

2015-01-01

The objective of this study was to examine the content validity of commonly used muscle performance tests in military personnel and to investigate the reliability of a proposed test battery. For the content validity investigation, thirty selected tests were those described in the literature and/or commonly used in the Nordic and North Atlantic Treaty Organization (NATO) countries. Nine selected experts rated, on a four-point Likert scale, the relevance of these tests in relation to five different work tasks: lifting, carrying equipment on the body or in the hands, climbing, and digging. Thereafter, a content validity index (CVI) was calculated for each work task. The result showed excellent CVI (≥0.78) for sixteen tests, which comprised of one or more of the military work tasks. Three of the tests; the functional lower-limb loading test (the Ranger test), dead-lift with kettlebells, and back extension, showed excellent content validity for four of the work tasks. For the development of a new muscle strength/endurance test battery, these three tests were further supplemented with two other tests, namely, the chins and side-bridge test. The inter-rater reliability was high (intraclass correlation coefficient, ICC2,1 0.99) for all five tests. The intra-rater reliability was good to high (ICC3,1 0.82–0.96) with an acceptable standard error of mean (SEM), except for the side-bridge test (SEM%>15). Thus, the final suggested test battery for a valid and reliable evaluation of soldiers’ muscle performance comprised the following four tests; the Ranger test, dead-lift with kettlebells, chins, and back extension test. The criterion-related validity of the test battery should be further evaluated for soldiers exposed to varying physical workload. PMID:26177030
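The ICC2,1 and ICC3,1 values reported above follow the two-way ANOVA formulation of the intraclass correlation (Shrout and Fleiss notation). The sketch below shows how both single-measure coefficients can be derived from a subjects-by-raters table. It is an illustration under that assumption rather than the authors' analysis code, and the function name and score table are illustrative values, not study data.

```python
def icc_two_way(scores):
    """Return (ICC(2,1), ICC(3,1)) for scores[subject][rater].

    ICC(2,1): two-way random effects, absolute agreement, single rater.
    ICC(3,1): two-way mixed effects, consistency, single rater.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters (or repeated sessions)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 rows and columns")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares for subjects, raters, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    icc_2_1 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc_3_1 = (bms - ems) / (bms + (k - 1) * ems)
    return icc_2_1, icc_3_1


# Illustrative table: 6 subjects scored by 4 raters.
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc21, icc31 = icc_two_way(table)
print(round(icc21, 2), round(icc31, 2))
```

In this table the raters rank subjects similarly but differ systematically in level, so the consistency form ICC(3,1) comes out much higher than the absolute-agreement form ICC(2,1). That difference is why the choice of ICC form matters when comparing reliability figures across studies.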
Intra and inter-rater reliability of infrared image analysis of masticatory and upper trapezius muscles in women with and without temporomandibular disorder.

PubMed

Costa, Ana C S; Dibai Filho, Almir V; Packer, Amanda C; Rodrigues-Bigaton, Delaine

2013-01-01

Infrared thermography is an aid tool that can be used to evaluate several pathologies given its efficiency in analyzing the distribution of skin surface temperature. To propose two forms of infrared image analysis of the masticatory and upper trapezius muscles, and to determine the intra and inter-rater reliability of both forms of analysis. Infrared images of masticatory and upper trapezius muscles of 64 female volunteers with and without temporomandibular disorder (TMD) were collected. Two raters performed the infrared image analysis, which occurred in two ways: temperature measurement of the muscle length and in central portion of the muscle. The Intraclass Correlation Coefficient (ICC) was used to determine the intra and inter-rater reliability. The ICC showed excellent intra and inter-rater values for both measurements: temperature measurement of the muscle length (TMD group, intra-rater, ICC ranged from 0.996 to 0.999, inter-rater, ICC ranged from 0.992 to 0.999; control group, intra-rater, ICC ranged from 0.993 to 0.998, inter-rater, ICC ranged from 0.990 to 0.998), and temperature measurement of the central portion of the muscle (TMD group, intra-rater, ICC ranged from 0.981 to 0.998, inter-rater, ICC ranged from 0.971 to 0.998; control group, intra-rater, ICC ranged from 0.887 to 0.996, inter-rater, ICC ranged from 0.852 to 0.996). The results indicated that temperature measurements of the masticatory and upper trapezius muscles carried out by the analysis of the muscle length and central portion yielded excellent intra and inter-rater reliability.
Multidisciplinary assessment measure for individuals with disorders of consciousness.

PubMed

Gollega, Ana; Meghji, Chamine; Renton, Sharon; Lazoruk, Arlene; Haynes, Elizabeth; Lawson, Denise; Ostapovitch, MaryAnne

2015-01-01

This study introduces the Comprehensive Assessment Measure for the Minimally Responsive Individual (CAMMRI) and reports on its development, inter-rater reliability, construct validity and clinical value. A multidisciplinary team of therapists developed this measure, which comprises 12 sub-tests that examine three main areas: Response to the Environment, Motor Control and Communication and Swallowing. The sub-tests are scored using a 7-point scale; sub-tests can also be administered individually. The measure was administered during a pilot project and then 1 year later to 12 adult clients with severe acquired brain injury at a long-term rehabilitation programme. The age range of the participants was 18-65 years; individuals were 1.5-10 years post-injury. Comparison measures included the Western Neuro Sensory Stimulation Profile (WNSSP), the Coma Recovery Scale-Revised (CRS-R) and the Chedoke McMaster Impairment Inventory (CMII). Inter-rater reliability of each sub-test ranged from 0.87-1.0, with an average of 0.90 in the first year of the assessments. Validity data supported the use of the CAMMRI for minimally conscious adults with ABI to measure behavioural changes and plan treatment for this population. Future research should focus on using this measure with other neurological populations.
BurnCase 3D software validation study: Burn size measurement accuracy and inter-rater reliability.

PubMed

Parvizi, Daryousch; Giretzlehner, Michael; Wurzer, Paul; Klein, Limor Dinur; Shoham, Yaron; Bohanon, Fredrick J; Haller, Herbert L; Tuca, Alexandru; Branski, Ludwik K; Lumenta, David B; Herndon, David N; Kamolz, Lars-P

2016-03-01

The aim of this study was to compare the accuracy of burn size estimation using the computer-assisted software BurnCase 3D (RISC Software GmbH, Hagenberg, Austria) with that using a 2D scan, considered to be the actual burn size. Thirty artificial burn areas were pre planned and prepared on three mannequins (one child, one female, and one male). Five trained physicians (raters) were asked to assess the size of all wound areas using BurnCase 3D software. The results were then compared with the real wound areas, as determined by 2D planimetry imaging. To examine inter-rater reliability, we performed an intraclass correlation analysis with a 95% confidence interval. The mean wound area estimations of the five raters using BurnCase 3D were in total 20.7±0.9% for the child, 27.2±1.5% for the female and 16.5±0.1% for the male mannequin. Our analysis showed relative overestimations of 0.4%, 2.8% and 1.5% for the child, female and male mannequins respectively, compared to the 2D scan. The intraclass correlation between the single raters for mean percentage of the artificial burn areas was 98.6%. There was also a high intraclass correlation between the single raters and the 2D Scan visible. BurnCase 3D is a valid and reliable tool for the determination of total body surface area burned in standard models. Further clinical studies including different pediatric and overweight adult mannequins are warranted. Copyright © 2016 Elsevier Ltd and ISBI. All rights reserved.
Performance of regional oxygen saturation monitoring by near-infrared spectroscopy (NIRS) in pediatric inter-hospital transports with special reference to air ambulance transports: a methodological study.

PubMed

Hamrin, Tova Hannegård; Radell, Peter J; Fläring, Urban; Berner, Jonas; Eksborg, Staffan

2017-12-28

The aim of the present study was to evaluate the performance of regional oxygen saturation (rSO2) monitoring with near infrared spectroscopy (NIRS) during pediatric inter-hospital transports and to optimize processing of the electronically stored data. Cerebral (rSO2-C) and abdominal (rSO2-A) NIRS sensors were used during transport in air ambulance and connecting ground ambulance. Data were electronically stored by the monitor during transport, extracted and analyzed off-line after the transport. After removal of all zero and floor effect values, the Savitzky-Golay algorithm of data smoothing was applied on the NIRS-signal. The second order of smoothing polynomial was used and the optimal number of neighboring points for the smoothing procedure was evaluated. NIRS-data from 38 pediatric patients were examined. Reliability, defined as measurements without values of 0 or 15%, was acceptable during transport (> 90% of all measurements). There were, however, individual patients with < 90% reliable measurements during transport, while no patient was found to have < 90% reliable measurements in hospital. Satisfactory noise reduction of the signal, without distortion of the underlying information, was achieved when 20-50 neighbors ("window-size") were used. The use of NIRS for measuring rSO2 in clinical studies during pediatric transport in ground and air-ambulance is feasible but hampered by unreliable values and signal interference. By applying the Savitzky-Golay algorithm, the signal-to-noise ratio was improved and enabled better post-hoc signal evaluation.
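The processing pipeline described above (drop 0 and floor values, then apply second-order Savitzky-Golay smoothing) can be sketched in a few lines of standard-library Python. This is an illustration only: the function names are hypothetical, the floor value of 15 is taken from the abstract's reliability definition, the closed-form weights are the standard quadratic Savitzky-Golay smoothing coefficients for a symmetric window, and leaving the edge samples unsmoothed is an assumption, since the abstract does not describe edge handling.

```python
def drop_invalid(values, invalid=(0, 15)):
    """Remove zero and floor-effect readings (0 or 15%) before smoothing."""
    return [v for v in values if v not in invalid]


def savgol_quadratic_weights(half_width):
    """Savitzky-Golay smoothing weights for a 2nd-order polynomial.

    The window spans 2 * half_width + 1 points centred on the sample.
    """
    m = half_width
    if m < 1:
        raise ValueError("half_width must be at least 1")
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom
            for i in range(-m, m + 1)]


def savgol_smooth(signal, half_width):
    """Smooth the interior of `signal`; edge samples are left unchanged."""
    weights = savgol_quadratic_weights(half_width)
    m = half_width
    smoothed = list(signal)
    for t in range(m, len(signal) - m):
        window = signal[t - m:t + m + 1]
        smoothed[t] = sum(w * x for w, x in zip(weights, window))
    return smoothed


# Hypothetical rSO2 trace in percent, including dropout values.
raw = [72, 71, 0, 74, 73, 15, 70, 69, 75, 74, 72, 71, 73, 70, 72]
clean = drop_invalid(raw)
print([round(v, 1) for v in savgol_smooth(clean, half_width=2)])
```

A window of 20 to 50 neighbors, as evaluated in the study, would correspond here to a `half_width` of roughly 10 to 25, depending on whether "neighbors" counts points on one side or both, which the abstract does not specify.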
Correlation between musical responsiveness and developmental age among early age children as assessed by the Non-Verbal Measurement of the Musical Responsiveness of Children.

PubMed

Matsuyama, Kumi

2005-10-01

The currently-available standardized music tests are not suitable for administration to young children and children with special needs because they are complicated and require verbal instructions and verbal responses. A test that was named the Non-Verbal Measurement of the Musical Responsiveness of Children, was developed to assess the musical responsiveness of young children. This test does not depend on verbal instructions, and is composed of two parts, Rhythm and Melody. Ninety-two children [age, range, 6-69 months; 36.39+/-17.61 (mean +/-standard deviation) months] who attended mainstream pre-schools were studied. Each child was tested to see whether the child correctly imitated 7 different patterns of rhythm and 6 different patterns of melody that were delivered by clapping of hands or the voice of the examiner, respectively. The examiner rated whether the child could imitate each pattern and the total score was the sum of successfully reproduced patterns. Two independent observers viewed videotapes of the testing sessions and assigned scores in a similar manner. The inter-rater reliability among the three raters was assessed. The total score in Melody (R=0.63, p<0.001) and the total score in Rhythm (R=0.81, p<0.001) were each correlated with developmental age. The inter-rater reliability was good (Melody: Kendall's W=0.78, Rhythm: Kendall's W=0.95). The degree of musical responsiveness of normal young children is correlated with general development. This measurement tool is valid and reliable for use in young children who lack sufficient verbal understanding to take standardized music tests. This test may also be administered to children with special needs.
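Kendall's W, used above to summarize agreement among the three raters, measures how similarly several raters rank the same set of subjects. The following minimal sketch computes W from raw scores. The function names and example scores are hypothetical, and for brevity the sketch uses average ranks for ties but omits the tie correction term, which a full analysis of tied total scores would normally include.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for q in range(pos, end + 1):
            ranks[order[q]] = avg
        pos = end + 1
    return ranks


def kendalls_w(scores_by_rater):
    """Kendall's coefficient of concordance (no tie correction)."""
    m = len(scores_by_rater)       # number of raters
    n = len(scores_by_rater[0])    # number of subjects
    if m < 2 or n < 2 or any(len(s) != n for s in scores_by_rater):
        raise ValueError("need >= 2 raters scoring the same >= 2 subjects")
    rank_sums = [0.0] * n
    for scores in scores_by_rater:
        for i, r in enumerate(average_ranks(scores)):
            rank_sums[i] += r
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m * m * (n ** 3 - n))


# Hypothetical Rhythm totals for 5 children from 3 raters, not study data.
examiner = [7, 5, 3, 6, 1]
observer_1 = [7, 4, 3, 6, 2]
observer_2 = [6, 5, 2, 7, 1]
print(round(kendalls_w([examiner, observer_1, observer_2]), 2))
```

W ranges from 0 (no agreement in ranking) to 1 (identical rankings), so the reported values of 0.78 for Melody and 0.95 for Rhythm both indicate that the raters ordered the children similarly.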
Current management and prognostic factors in physiotherapy practice for patients with shoulder pain: design of a prospective cohort study.

PubMed

Karel, Yasmaine H J M; Scholten-Peeters, Wendy G M; Thoomes-de Graaf, Marloes; Duijn, Edwin; Ottenheijm, Ramon P G; van den Borne, Maaike P J; Koes, Bart W; Verhagen, Arianne P; Dinant, Geert-Jan; Tetteroo, Eric; Beumer, Annechien; van Broekhoven, Joost B; Heijmans, Marcel

2013-02-11

Shoulder pain is disabling and has a considerable socio-economic impact. Over 50% of patients presenting in primary care still have symptoms after 6 months; moreover, prognostic factors such as pain intensity, age, disability level and duration of complaints are associated with poor outcome. Most shoulder complaints in this group are categorized as non-specific. Musculoskeletal ultrasound might be a useful imaging method to detect subgroups of patients with subacromial disorders. This article describes the design of a prospective cohort study evaluating the influence of known prognostic and possible prognostic factors, such as findings from musculoskeletal ultrasound outcome and working alliance, on the recovery of shoulder pain. Further aims are to assess the usual physiotherapy care for shoulder pain and to examine the inter-rater reliability of musculoskeletal ultrasound between radiologists and physiotherapists for patients with shoulder pain. A prospective cohort study including an inter-rater reliability study. Patients presenting in primary care physiotherapy practice with shoulder pain are enrolled. At baseline validated questionnaires are used to measure patient characteristics, disease-specific characteristics and social factors. Physical examination is performed according to the expertise of the physiotherapists. Follow-up measurements will be performed 6, 12 and 26 weeks after inclusion. Primary outcome measure is perceived recovery, measured on a 7-point Likert scale. Logistic regression analysis will be used to evaluate the association between prognostic factors and recovery. The ShoCoDiP (Shoulder Complaints and using Diagnostic ultrasound in Physiotherapy practice) cohort study will provide information on current management of patients with shoulder pain in primary care, provide data to develop a prediction model for shoulder pain in primary care and evaluate whether musculoskeletal ultrasound can improve prognosis.
Infant polysomnography: reliability and validity of infant arousal assessment.

PubMed

Crowell, David H; Kulp, Thomas D; Kapuniai, Linda E; Hunt, Carl E; Brooks, Lee J; Weese-Mayer, Debra E; Silvestri, Jean; Ward, Sally Davidson; Corwin, Michael; Tinsley, Larry; Peucker, Mark

2002-10-01

Infant arousal scoring based on the Atlas Task Force definition of transient EEG arousal was evaluated to determine (1) whether transient arousals can be identified and assessed reliably in infants and (2) whether arousal and no-arousal epochs scored previously by trained raters can be validated reliably by independent sleep experts. Phase I for inter- and intrarater reliability scoring was based on two datasets of sleep epochs selected randomly from nocturnal polysomnograms of healthy full-term, preterm, idiopathic apparent life-threatening event cases, and siblings of Sudden Infant Death Syndrome infants of 35 to 64 weeks postconceptional age. After training, test set 1 reliability was assessed and discrepancies identified. After retraining, test set 2 was scored by the same raters to determine interrater reliability. Later, three raters from the trained group rescored test set 2 to assess inter- and intrarater reliabilities. Interrater and intrarater reliability kappas, with 95% confidence intervals, ranged from substantial to almost perfect levels of agreement. Interrater reliabilities for spontaneous arousals were initially moderate and then substantial. During the validation phase, 315 previously scored epochs were presented to four sleep experts to rate as containing arousal or no-arousal events. Interrater expert agreements were diverse and considered noninterpretable. Concordance in sleep experts' agreements, based on identification of the previously sampled arousal and no-arousal epochs, was used as a secondary evaluative technique. Results showed agreement by two or more experts on 86% of the Collaborative Home Infant Monitoring Evaluation Study arousal scored events. Conversely, only 1% of the Collaborative Home Infant Monitoring Evaluation Study-scored no-arousal epochs were rated as an arousal.
In summary, this study presents an empirically tested model with procedures and criteria for attaining improved reliability in transient EEG arousal assessments in infants using the modified Atlas Task Force standards. With training based on specific criteria, substantial inter- and intrarater agreement in identifying infant arousals was demonstrated. Corroborative validation results were too disparate for meaningful interpretation. Alternate evaluation based on concordance agreements supports reliance on infant EEG criteria for assessment. Results mandate additional confirmatory validation studies with specific training on infant EEG arousal assessment criteria.
Reliability of the modified Paediatric Evaluation of Disability Inventory, Dutch version (PEDI-NL) for children with cerebral palsy and cerebral visual impairment.

PubMed

Salavati, M; Waninge, A; Rameckers, E A A; de Blécourt, A C E; Krijnen, W P; Steenbergen, B; van der Schans, C P

2015-02-01

The aims of this study were to adapt the Paediatric Evaluation of Disability Inventory, Dutch version (PEDI-NL) for children with cerebral visual impairment (CVI) and cerebral palsy (CP) and determine test-retest and inter-respondent reliability. The Delphi method was used to gain consensus among twenty-one health experts familiar with CVI. Test-retest and inter-respondent reliability were assessed for parents and caregivers of 75 children (aged 50-144 months) with CP and CVI. The percentage of identical item scores was computed, as well as the interclass coefficients (ICC) and Cronbach's alphas of scale scores over the domains self-care, mobility, and social function. All experts agreed on the adaptation of the PEDI-NL for children with CVI. At item level, the mean percentage of identical scores for test-retest reliability ranged from 73 to 79 for the Functional Skills scale and from 73 to 81 for the Caregiver Assistance scale; for inter-respondent reliability it ranged from 21 to 76 and from 40 to 43, respectively. For all scales over all domains ICCs exceeded 0.87. For the domains self-care, mobility, and social function, the Functional Skills scale and the Caregiver Assistance scale have Cronbach's alpha above 0.88. The adapted PEDI-NL for children with CP and CVI is reliable and comparable to the original PEDI-NL. Copyright © 2014 Elsevier Ltd. All rights reserved.
Development and psychometric properties of an informant assessment scale of theory of mind for adults with traumatic brain injury.

PubMed

Zhang, Dengke; Pang, Yanxia; Cai, Weixiong; Fazio, Rachel L; Ge, Jianrong; Su, Qiaorong; Xu, Shuiqin; Pan, Yinan; Chen, Sanmei; Zhang, Hongwei

2016-08-01

Impairment of theory of mind (ToM) is a common phenomenon following traumatic brain injury (TBI) that has clear effects on patients' social functioning. A growing body of research has focused on this area, and several methods have been developed to assess ToM deficiency. Although an informant assessment scale would be useful for examining individuals with TBI, very few studies have adopted this approach. The purpose of the present study was to develop an informant assessment scale of ToM for adults with traumatic brain injury (IASToM-aTBI) and to test its reliability and validity with 196 adults with TBI and 80 normal adults. A 44-item scale was developed following a literature review, interviews with patient informants, consultations with experts, item analysis, and exploratory factor analysis (EFA). The following three common factors were extracted: social interaction, understanding of beliefs, and understanding of emotions. The psychometric analyses indicate that the scale has good internal consistency reliability, split-half reliability, test-retest reliability, inter-rater reliability, structural validity, discriminant validity and criterion validity. These results provide preliminary evidence that supports the reliability and validity of the IASToM-aTBI as a ToM assessment tool for adults with TBI.
A medical record review for functional somatic symptoms in children.

PubMed

Rask, Charlotte Ulrikka; Borg, Carsten; Søndergaard, Charlotte; Schulz-Pedersen, Søren; Thomsen, Per Hove; Fink, Per

2010-04-01

The objectives of this study were to develop and test a systematic medical record review for functional somatic symptoms (FSSs) in paediatric patients and to estimate the inter-rater reliability of paediatricians' recognition of FSSs and their associated impairments while using this method. We developed the Medical Record Review for Functional Somatic Symptoms in Children (MRFC) for retrospective medical record review. Described symptoms were categorised as probably, definitely, or not FSSs. FSS-associated impairment was also determined. Three paediatricians performed the MRFC on the medical records of 54 children with a diagnosed, well-defined physical disease and 59 with 'symptom' diagnoses. The inter-rater reliabilities of the recognition and associated impairment of FSSs were tested on 20 of these records. The MRFC allowed identification of subgroups of children with multisymptomatic FSSs, long-term FSSs, and/or impairing FSSs. The FSS inter-rater reliability was good (combined kappa=0.69) but only fair as far as associated impairment was concerned (combined kappa=0.29). In the hands of skilled paediatricians, the MRFC is a reliable method for identifying paediatric patients with diverse types of FSSs for clinical research. However, additional information is needed for reliable judgement of impairment. The method may also prove useful in clinical practice. Copyright 2010 Elsevier Inc. All rights reserved.
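The kappa values quoted in the record above are chance-corrected agreement statistics. As a minimal sketch of how such a value is obtained, the following computes unweighted Cohen's kappa for two raters. The function name `cohen_kappa` and the example ratings are hypothetical, and the study itself reports a combined kappa across three paediatricians, which this two-rater sketch does not reproduce.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who rated the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings of 10 medical records (illustrative only, not study data).
rater_1 = ["FSS", "FSS", "none", "none", "FSS", "none", "FSS", "none", "none", "FSS"]
rater_2 = ["FSS", "none", "none", "none", "FSS", "none", "FSS", "FSS", "none", "FSS"]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this illustration the raters agree on 8 of 10 records while 5 of 10 agreements would be expected by chance, so raw agreement overstates reliability; that gap is what kappa corrects for.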
Analyses of inter-rater reliability between professionals, medical students and trained school children as assessors of basic life support skills.

PubMed

Beck, Stefanie; Ruhnke, Bjarne; Issleib, Malte; Daubmann, Anne; Harendza, Sigrid; Zöllner, Christian

2016-10-07

Training of lay-rescuers is essential to improve survival-rates after cardiac arrest. Multiple campaigns emphasise the importance of basic life support (BLS) training for school children. Trainings require a valid assessment to give feedback to school children and to compare the outcomes of different training formats. Considering these requirements, we developed an assessment of BLS skills using MiniAnne and tested the inter-rater reliability between professionals, medical students and trained school children as assessors. Fifteen professional assessors, 10 medical students and 111 trained school children (peers) assessed 1087 school children at the end of a CPR-training event using the new assessment format. Analyses of inter-rater reliability (intraclass correlation coefficient; ICC) were performed. Overall inter-rater reliability of the summative assessment was high (ICC = 0.84, 95% CI: 0.84 to 0.86, n = 889). The number of comparisons between peer-peer assessors (n = 303), peer-professional assessors (n = 339), and peer-student assessors (n = 191) was adequate to demonstrate high inter-rater reliability between peer- and professional-assessors (ICC: 0.76), peer- and student-assessors (ICC: 0.88) and peer- and other peer-assessors (ICC: 0.91). Systematic variation in rating of specific items was observed for three items between professional- and peer-assessors. Using this assessment and integrating peers and medical students as assessors gives the opportunity to assess hands-on skills of school children with high reliability.
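Several records in this list report an intraclass correlation coefficient without stating which form was used. As a hedged illustration only, the sketch below computes one common form, ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater), from a subjects-by-raters table of scores. The choice of this form, the function name `icc_2_1`, and the example scores are assumptions made for illustration, not details taken from any of the studies.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one row per subject, one score per rater.
    """
    n = len(scores)                      # number of subjects
    k = len(scores[0]) if scores else 0  # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Systematic rater differences (ms_cols) lower the coefficient in this form.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores for 6 subjects rated by 4 raters (not data from any study).
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example_scores)
print(f"ICC(2,1) = {icc:.2f}")
```

Other ICC forms (consistency rather than absolute agreement, or averaged rather than single ratings) can give noticeably different values on the same table, which is one reason values from different records are hard to compare unless the form is reported.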
Kinematic repeatability of a multi-segment foot model for dance.

PubMed

Carter, Sarah L; Sato, Nahoko; Hopper, Luke S

2018-03-01

The purpose of this study was to determine the intra- and inter-assessor repeatability of a modified Rizzoli Foot Model for analysing the foot kinematics of ballet dancers. Six university-level ballet dancers performed the following movements: parallel stance, turnout plié, turnout stance, turnout rise and flex-point-flex. The three-dimensional (3D) position of individual reflective markers and marker triads was used to model the movement of the dancers' tibia, entire foot, hindfoot, midfoot, forefoot and hallux. Intra- and inter-assessor reliability demonstrated excellent (ICC ≥ 0.75) repeatability for the first metatarsophalangeal joint in the sagittal plane. Intra-assessor reliability demonstrated excellent (ICC ≥ 0.75) repeatability during flex-point-flex across all inter-segmental angles except for the tibia-hindfoot and hindfoot-midfoot frontal planes. Inter-assessor repeatability ranged from poor (ICC < 0.5) to excellent (ICC ≥ 0.75) for the 3D segment rotations. The most repeatable measure was the tibia-foot dorsiflexion/plantar flexion articulation whereas the least repeatable measure was the hindfoot-midfoot adduction/abduction articulation. The variation found in the inter-assessor results is likely due to inconsistencies in marker placement. This 3D dance specific multi-segment foot model provides insight into which kinematic measures can be reliably used to ascertain in vivo technical errors and/or biomechanical abnormalities in a dancer's foot motion.

Overcoming the Challenges of Unstructured Data in Multi-site, Electronic Medical Record-based Abstraction

PubMed Central

Polnaszek, Brock; Gilmore-Bykovskyi, Andrea; Hovanes, Melissa; Roiland, Rachel; Ferguson, Patrick; Brown, Roger; Kind, Amy JH

2014-01-01

Background: Unstructured data encountered during retrospective electronic medical record (EMR) abstraction has routinely been identified as challenging to reliably abstract, as this data is often recorded as free text, without limitations to format or structure. There is increased interest in reliably abstracting this type of data given its prominent role in care coordination and communication, yet limited methodological guidance exists. Objective: As standard abstraction approaches resulted in sub-standard data reliability for unstructured data elements collected as part of a multi-site, retrospective EMR study of hospital discharge communication quality, our goal was to develop, apply and examine the utility of a phase-based approach to reliably abstract unstructured data. This approach is examined using the specific example of discharge communication for warfarin management. Research Design: We adopted a “fit-for-use” framework to guide the development and evaluation of abstraction methods using a four-step, phase-based approach including (1) team building, (2) identification of challenges, (3) adaptation of abstraction methods, and (4) systematic data quality monitoring. Measures: Unstructured data elements were the focus of this study, including elements communicating steps in warfarin management (e.g., warfarin initiation) and medical follow-up (e.g., timeframe for follow-up). Results: After implementation of the phase-based approach, inter-rater reliability for all unstructured data elements demonstrated kappas of ≥ 0.89, an average increase of +0.25 for each unstructured data element. Conclusions: As compared to standard abstraction methodologies, this phase-based approach was more time intensive, but did markedly increase abstraction reliability for unstructured data elements within multi-site EMR documentation. PMID:27624585
A TWIN STUDY OF SCHIZOAFFECTIVE-MANIA, SCHIZOAFFECTIVE-DEPRESSION AND OTHER PSYCHOTIC SYNDROMES

PubMed Central

Cardno, Alastair G; Rijsdijk, Frühling V; West, Robert M; Gottesman, Irving I; Craddock, Nick; Murray, Robin M; McGuffin, Peter

2012-01-01

The nosological status of schizoaffective disorders remains controversial. Twin studies are potentially valuable for investigating relationships between schizoaffective-mania, schizoaffective-depression and other psychotic syndromes, but no such study has yet been reported. We ascertained 224 probandwise twin pairs (106 monozygotic, 118 same-sex dizygotic), where probands had psychotic or manic symptoms, from the Maudsley Twin Register in London (1948–1993). We investigated Research Diagnostic Criteria schizoaffective-mania, schizoaffective-depression, schizophrenia, mania and depressive psychosis primarily using a non-hierarchical classification, and additionally using hierarchical and data-derived classifications, and a classification featuring broad schizophrenic and manic syndromes without separate schizoaffective syndromes. We investigated inter-rater reliability and co-occurrence of syndromes within twin probands and twin pairs. The schizoaffective syndromes showed only moderate inter-rater reliability. There was general significant co-occurrence between syndromes within twin probands and monozygotic pairs, and a trend for schizoaffective-mania and mania to have the greatest co-occurrence. Schizoaffective syndromes in monozygotic probands were associated with relatively high risk of a psychotic syndrome occurring in their co-twins. The classification of broad schizophrenic and manic syndromes without separate schizoaffective syndromes showed improved inter-rater reliability, but high genetic and environmental correlations between the two broad syndromes. The results are consistent with regarding schizoaffective-mania as due to co-occurring elevated liability to schizophrenia, mania and depression; and schizoaffective-depression as due to co-occurring elevated liability to schizophrenia and depression, but with less elevation of liability to mania. 
If in due course schizoaffective syndromes show satisfactory inter-rater reliability and some specific etiological factors they could alternatively be regarded as partly independent disorders. PMID:22213671
A twin study of schizoaffective-mania, schizoaffective-depression, and other psychotic syndromes.

PubMed

Cardno, Alastair G; Rijsdijk, Frühling V; West, Robert M; Gottesman, Irving I; Craddock, Nick; Murray, Robin M; McGuffin, Peter

2012-03-01

The nosological status of schizoaffective disorders remains controversial. Twin studies are potentially valuable for investigating relationships between schizoaffective-mania, schizoaffective-depression, and other psychotic syndromes, but no such study has yet been reported. We ascertained 224 probandwise twin pairs [106 monozygotic (MZ), 118 same-sex dizygotic (DZ)], where probands had psychotic or manic symptoms, from the Maudsley Twin Register in London (1948-1993). We investigated Research Diagnostic Criteria schizoaffective-mania, schizoaffective-depression, schizophrenia, mania and depressive psychosis primarily using a non-hierarchical classification, and additionally using hierarchical and data-derived classifications, and a classification featuring broad schizophrenic and manic syndromes without separate schizoaffective syndromes. We investigated inter-rater reliability and co-occurrence of syndromes within twin probands and twin pairs. The schizoaffective syndromes showed only moderate inter-rater reliability. There was general significant co-occurrence between syndromes within twin probands and MZ pairs, and a trend for schizoaffective-mania and mania to have the greatest co-occurrence. Schizoaffective syndromes in MZ probands were associated with relatively high risk of a psychotic syndrome occurring in their co-twins. The classification of broad schizophrenic and manic syndromes without separate schizoaffective syndromes showed improved inter-rater reliability, but high genetic and environmental correlations between the two broad syndromes. The results are consistent with regarding schizoaffective-mania as due to co-occurring elevated liability to schizophrenia, mania, and depression; and schizoaffective-depression as due to co-occurring elevated liability to schizophrenia and depression, but with less elevation of liability to mania. 
If in due course schizoaffective syndromes show satisfactory inter-rater reliability and some specific etiological factors they could alternatively be regarded as partly independent disorders. Copyright © 2011 Wiley Periodicals, Inc.
THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS).

PubMed

McCunn, Robert; Aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

2017-02-01

The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the 'pure' intra-rater (intra-occasion) reliability for those movements. Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement.
Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate the relationship between test score and subsequent injury. The present results indicate acceptable reliability for this purpose; however, room for further development of the intra-rater reliability exists for some of the individual sub-tests. Level of evidence: 2b.
THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS)

PubMed Central

McCunn, Robert; aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

2017-01-01

Background/purpose: The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. Methods: The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the ‘pure’ intra-rater (intra-occasion) reliability for those movements. Results: Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement.
Conclusions: Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate the relationship between test score and subsequent injury. The present results indicate acceptable reliability for this purpose; however, room for further development of the intra-rater reliability exists for some of the individual sub-tests. Level of evidence: 2b. PMID:28217416
Accuracy, intra- and inter-unit reliability, and comparison between GPS and UWB-based position-tracking systems used for time-motion analyses in soccer.

PubMed

Bastida Castillo, Alejandro; Gómez Carmona, Carlos D; De la Cruz Sánchez, Ernesto; Pino Ortega, José

2018-05-01

There is interest in the accuracy and inter-unit reliability of position-tracking systems to monitor players. Research into this technology, although relatively recent, has grown exponentially in recent years, and it is difficult to find a professional team sport that does not use at least Global Positioning System (GPS) technology. The aim of this study is to determine the accuracy of both GPS-based and Ultra Wide Band (UWB)-based systems on a soccer field and their inter- and intra-unit reliability. A secondary aim is to compare them for practical applications in sport science. Following institutional ethical approval and familiarization, 10 healthy and well-trained former soccer players (20 ± 1.6 years, 1.76 ± 0.08 m, and 69.5 ± 9.8 kg) performed three course tests: (i) a linear course, (ii) a circular course, and (iii) a zig-zag course, all using UWB and GPS technologies. The average speed and distance covered were compared with timing gates and the real distance as references. The UWB technology showed better accuracy (bias: 0.57-5.85%), test-retest reliability (%TEM: 1.19), and inter-unit reliability (bias: 0.18) in determining distance covered than the GPS technology (bias: 0.69-6.05%; %TEM: 1.47; bias: 0.25) overall. Also, UWB showed better results (bias: 0.09; ICC: 0.979; bias: 0.01) for mean velocity measurement than GPS (bias: 0.18; ICC: 0.951; bias: 0.03).
Psychometric Properties of the Self-Perception Profile for Children in Children with Chronic Illness.

PubMed

Ferro, Mark A; Tang, Jennie

2017-07-01

The Self-Perception Profile for Children (SPPC) is a commonly used measure of self-concept in children, but little research has examined its psychometric properties in children newly-diagnosed with chronic illness. Confirmatory factor analysis and examination of reliability and convergent and discriminant validity of the SPPC were conducted in 31 children newly-diagnosed with asthma, diabetes, epilepsy, food allergy, or juvenile arthritis. The unidimensionality of each domain of the SPPC was confirmed, internal reliability was robust (α=.83-.95), and inter-domain polychoric correlations ranged from weak to strong (ρ=.05-.85). Convergent validity was demonstrated with measures of global self-concept and domains of quality of life. The Global Self-worth domain showed discriminant validity between children with and without comorbid mental disorder. Findings extend the psychometric properties of the SPPC as a valid and reliable scale in children newly-diagnosed with chronic illness.
Reliability and type of consumer health documents on the World Wide Web: an annotation study.

PubMed

Martin, Melanie J

2011-01-01

In this paper we present a detailed scheme for annotating medical web pages designed for health care consumers. The annotation is along two axes: first, by reliability (the extent to which the medical information on the page can be trusted), second, by the type of page (patient leaflet, commercial, link, medical article, testimonial, or support). We analyze inter-rater agreement among three judges for each axis. Inter-rater agreement was moderate (0.77 accuracy, 0.62 F-measure, 0.49 Kappa) on the page reliability axis and good (0.81 accuracy, 0.72 F-measure, 0.73 Kappa) along the page type axis. The results of this study are promising: appropriate classes of pages can be developed and used by human annotators to annotate web pages with reasonable to good agreement.
Establishing Inter- and Intrarater Reliability for High-Stakes Testing Using Simulation.

PubMed

Kardong-Edgren, Suzan; Oermann, Marilyn H; Rizzolo, Mary Anne; Odom-Maryon, Tamara

This article reports one method to develop a standardized training method to establish the inter- and intrarater reliability of a group of raters for high-stakes testing. Simulation is used increasingly for high-stakes testing, but without research into the development of inter- and intrarater reliability for raters. Eleven raters were trained using a standardized methodology. Raters scored 28 student videos over a six-week period. Raters then rescored all videos over a two-day period to establish both intra- and interrater reliability. One rater demonstrated poor intrarater reliability; a second rater failed all students. Kappa statistics improved from the moderate to substantial agreement range with the exclusion of the two outlier raters' scores. There may be faculty who, for different reasons, should not be included in high-stakes testing evaluations. All faculty are content experts, but not all are expert evaluators.
Test-retest and inter- and intrareliability of the quality of the upper-extremity skills test in preschool-age children with cerebral palsy.

PubMed

Haga, Nienke; van der Heijden-Maessen, Hélène C; van Hoorn, Jessika F; Boonstra, Anne M; Hadders-Algra, Mijna

2007-12-01

Objective: To investigate the test-retest, inter-, and intraobserver reliability of the Quality of Upper Extremity Skills Test (QUEST) in young children with cerebral palsy (CP). Design: For test-retest reliability, a test-retest design was used; for the intra- and interobserver reliability, the videotaped test was scored on 2 occasions by 1 observer and by various observers. Setting: Groups of preschool-age children in 2 general rehabilitation centers. Participants: Twenty-one children with CP (12 boys, 9 girls) aged 2 to 4.5 years (mean, 39 mo). Interventions: Not applicable. Main outcome measure: Spearman correlation coefficient. Results: The data indicated that test-retest reliability was strong (rho range, .85-.94). Intraobserver agreement (rho range, .63-.95) and agreement between various observers (rho range, .72-.90) were moderate to strong. Conclusions: Test-retest and inter- and intraobserver reliability of the QUEST in preschool-age children with CP is good.
Why Are Experts Correlated? Decomposing Correlations between Judges

ERIC Educational Resources Information Center

Broomell, Stephen B.; Budescu, David V.

2009-01-01

We derive an analytic model of the inter-judge correlation as a function of five underlying parameters. Inter-cue correlation and the number of cues capture our assumptions about the environment, while differentiations between cues, the weights attached to the cues, and (un)reliability describe assumptions about the judges. We study the relative…
A Study of Reliability of Marking and Absolute Grading in Secondary Schools

ERIC Educational Resources Information Center

Abdul Gafoor, K.; Jisha, P.

2014-01-01

Using a non-experimental comparative group design in a sample consisting of 100 English teachers randomly selected from 30 secondary schools of a district of Kerala and assigning fifty teachers to groups for marking and grading, this study compares inter and intra-individual reliability in marking and absolute grading. Studying (1) the in marking…
Validity of a smartphone protractor to measure sagittal parameters in adult spinal deformity.

PubMed

Kunkle, William Aaron; Madden, Michael; Potts, Shannon; Fogelson, Jeremy; Hershman, Stuart

2017-10-01

Smartphones have become an integral tool in the daily life of health-care professionals (Franko 2011). Their ease of use and wide availability often make smartphones the first tool surgeons use to perform measurements. This technique has been validated for certain orthopedic pathologies (Shaw 2012; Quek 2014; Milanese 2014; Milani 2014), but never to assess sagittal parameters in adult spinal deformity (ASD). This study was designed to assess the validity, reproducibility, precision, and efficiency of using a smartphone protractor application to measure sagittal parameters commonly measured in ASD assessment and surgical planning. This study aimed to (1) determine the validity of smartphone protractor applications, (2) determine the intra- and interobserver reliability of smartphone protractor applications when used to measure sagittal parameters in ASD, (3) determine the efficiency of using a smartphone protractor application to measure sagittal parameters, and (4) elucidate whether a physician's level of experience impacts the reliability or validity of using a smartphone protractor application to measure sagittal parameters in ASD. An experimental validation study was carried out. Thirty standard 36″ standing lateral radiographs were examined. Three separate measurements were performed using a marker and protractor; then at a separate time point, three separate measurements were performed using a smartphone protractor application for all 30 radiographs. The first 10 radiographs were then re-measured two more times, for a total of three measurements from both the smartphone protractor and marker and protractor. The parameters included lumbar lordosis, pelvic incidence, and pelvic tilt. Three raters performed all measurements: a junior level orthopedic resident, a senior level orthopedic resident, and a fellowship-trained spinal deformity surgeon.
All data, including the time to perform the measurements, were recorded, and statistical analysis was performed to determine intra- and interobserver reliability, as well as accuracy, efficiency, and precision. Intra- and interclass correlation coefficients were calculated using R (version 3.3.2, 2016) to determine the degree of intra- and interobserver reliability. High rates of intra- and interobserver reliability were observed between the junior resident, senior resident, and attending surgeon when using the smartphone protractor application as demonstrated by high inter- and intra-class correlation coefficients greater than 0.909 and 0.874 respectively. High rates of inter- and intraobserver reliability were also seen between the junior resident, senior resident, and attending surgeon when a marker and protractor were used as demonstrated by high inter- and intra-class correlation coefficients greater than 0.909 and 0.807 respectively. The lumbar lordosis, pelvic incidence, and pelvic tilt values were accurately measured by all three raters, with excellent inter- and intra-class correlation coefficient values. When the first 10 radiographs were re-measured at different time points, a high degree of precision was noted. Measurements performed using the smartphone application were consistently faster than using a marker and protractor, and this difference reached statistical significance (p<.05). Adult spinal deformity radiographic parameters can be measured accurately, precisely, reliably, and more efficiently using a smartphone protractor application than with a standard protractor and wax pencil. A high degree of intra- and interobserver reliability was seen between the residents and attending surgeon, indicating measurements made with a smartphone protractor are unaffected by an observer's level of experience. As a result, smartphone protractors may be used when planning ASD surgery. Copyright © 2017 Elsevier Inc. All rights reserved.
Can we perceptually rate alaryngeal voice? Developing the Sunderland Tracheoesophageal Voice Perceptual Scale.

PubMed

Hurren, A; Hildreth, A J; Carding, P N

2009-12-01

To investigate the inter and intra reliability of raters (in relation to both profession and expertise) when judging two alaryngeal voice parameters: 'Overall Grade' and 'Neoglottal Tonicity'. Reliable perceptual assessment is essential for surgical and therapeutic outcome measurement but has been minimally researched to date. Test of inter and intra rater agreement from audio recordings of 55 tracheoesophageal speakers. Cancer Unit. Twelve speech and language therapists and ten Ear, Nose and Throat surgeons. Perceptual voice parameters of 'Overall Grade' rated with a 0-3 equally appearing interval scale and 'Neoglottal Tonicity' with an 11-point bipolar semantic scale. All raters achieved 'good' agreement for 'Overall Grade' with mean weighted kappa coefficients of 0.78 for intra and 0.70 for inter-rater agreement. All raters achieved 'good' intra-rater agreement for 'Neoglottal Tonicity' (0.64) but inter-rater agreement was only 'moderate' (0.40). However, the expert speech and language therapists sub-group attained 'good' inter-rater agreement with this parameter (0.63). The effect of 'Neoglottal Tonicity' on 'Overall Grade' was examined utilising only expert speech and language therapists data. Linear regression analysis resulted in an r-squared coefficient of 0.67. Analysis of the perceptual impression of hypotonicity and hypertonicity in relation to mean 'Overall Grade' score demonstrated neither tone was linked to a more favourable grade (P = 0.42). Expert speech and language therapist raters may be the optimal judges for tracheoesophageal voice assessment. Tonicity appears to be a good predictor of 'Overall Grade'. These scales have clinical applicability to investigate techniques that facilitate optotonic neoglottal voice quality.
Pelvis and lower limb anatomical landmark calibration precision and its propagation to bone geometry and joint angles.

PubMed

della Croce, U; Cappozzo, A; Kerrigan, D C

1999-03-01

Human movement analysis using stereophotogrammetry is based on the reconstruction of the instantaneous laboratory position of selected bony anatomical landmarks (AL). For this purpose, knowledge of an AL's position in relevant bone-embedded frames is required. Because ALs are not points but relatively large and curved areas, their identification by palpation or other means is subject to both intra- and inter-examiner variability. In addition, the local position of ALs, as reconstructed using an ad hoc experimental procedure (AL calibration), is affected by photogrammetric errors. The intra- and inter-examiner precision with which local positions of pelvis and lower limb palpable bony ALs can be identified and reconstructed were experimentally assessed. Six examiners and two subjects participated in the study. Intra- and inter-examiner precision (RMS distance from the mean position) resulted in the range 6-21 mm and 13-25 mm, respectively. Propagation of the imprecision of ALs to the orientation of bone-embedded anatomical frames and to hip, knee and ankle joint angles was assessed. Results showed that this imprecision may cause distortion in joint angle against time functions to the extent that information relative to angular movements in the range of 10 degrees or lower may be concealed. Bone geometry parameters estimated using the same data showed that the relevant precision does not allow for reliable bone geometry description. These findings, together with those relative to skin movement artefacts reported elsewhere, assist the human movement analyst's consciousness of the possible limitations involved in 3D movement analysis using stereophotogrammetry and call for improvements of the relevant experimental protocols.
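The precision measure named in this abstract (RMS distance from the mean position) can be illustrated with a short Python sketch. The function name and the coordinates below are hypothetical and are not taken from the study; the sketch only shows how such a value could be computed for repeated identifications of a single landmark.

```python
import math

def rms_distance_from_mean(points):
    """RMS distance of repeated 3D landmark positions from their mean position."""
    n = len(points)
    # Mean position over all repeated identifications, one value per axis.
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    # Squared Euclidean distance of each identification from the mean position.
    squared = [sum((p[i] - mean[i]) ** 2 for i in range(3)) for p in points]
    return math.sqrt(sum(squared) / n)

# Hypothetical repeated palpations of one landmark, in mm (illustrative only).
trials = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 6.0, 0.0), (6.0, 6.0, 0.0)]
print(round(rms_distance_from_mean(trials), 2))
```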
The reliability of a segmentation methodology for assessing intramuscular adipose tissue and other soft-tissue compartments of lower leg MRI images.

PubMed

Karampatos, Sarah; Papaioannou, Alexandra; Beattie, Karen A; Maly, Monica R; Chan, Adrian; Adachi, Jonathan D; Pritchard, Janet M

2016-04-01

Determine the reliability of a magnetic resonance (MR) image segmentation protocol for quantifying intramuscular adipose tissue (IntraMAT), subcutaneous adipose tissue, total muscle and intermuscular adipose tissue (InterMAT) of the lower leg. Ten axial lower leg MRI slices were obtained from 21 postmenopausal women using a 1 Tesla peripheral MRI system. Images were analyzed using sliceOmatic™ software. The average cross-sectional areas of the tissues were computed for the ten slices. Intra-rater and inter-rater reliability were determined and expressed as the standard error of measurement (SEM) (absolute reliability) and intraclass correlation coefficient (ICC) (relative reliability). Intra-rater and inter-rater reliability for IntraMAT were 0.991 (95% confidence interval [CI] 0.978-0.996, p < 0.05) and 0.983 (95% CI 0.958-0.993, p < 0.05), respectively. For the other soft tissue compartments, the ICCs were all >0.90 (p < 0.05). The absolute intra-rater and inter-rater reliability (expressed as SEM) for segmenting IntraMAT were 22.19 mm² (95% CI 16.97-32.04) and 78.89 mm² (95% CI 60.36-113.92), respectively. This is a reliable segmentation protocol for quantifying IntraMAT and other soft-tissue compartments of the lower leg. A standard operating procedure manual is provided to assist users, and SEM values can be used to estimate sample size and determine confidence in repeated measurements in future research.
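The relationship between relative reliability (ICC) and absolute reliability (SEM) can be sketched in Python as follows. The abstract does not state which ICC form or SEM formula the authors used, so this sketch assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), and the common convention SEM = SD × √(1 − ICC). The function names and the ratings are hypothetical.

```python
import math

def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def sem_from_icc(sd, icc):
    """Standard error of measurement under the assumed SD * sqrt(1 - ICC) convention."""
    return sd * math.sqrt(1.0 - icc)

# Hypothetical cross-sectional areas (mm^2) segmented by two raters.
areas = [[510.0, 498.0], [730.0, 741.0], [620.0, 615.0], [880.0, 902.0]]
print(icc_2_1(areas))
```

Other ICC forms (one-way, consistency, or average-measures) give different values for the same data, so the form should be reported alongside the coefficient.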
Neurobehavioural assessment and diagnosis in disorders of consciousness: a preliminary study of the Sensory Tool to Assess Responsiveness (STAR).

PubMed

Stokes, Verity; Gunn, Sarah; Schouwenaars, Katie; Badwan, Derar

2018-09-01

The Sensory Tool to Assess Responsiveness (STAR) is an interdisciplinary neurobehavioural diagnostic tool for individuals with prolonged disorders of consciousness. It utilises current diagnostic criteria and is intended to improve upon the high misdiagnosis rate in this population. This study assesses the inter-rater reliability of the STAR and its diagnostic validity in comparison with the Coma Recovery Scale-Revised (CRS-R) and the Wessex Head Injury Matrix (WHIM). Participants were patients with severe acquired brain injury resulting in a disorder of consciousness, who were admitted to the Royal Leamington Spa Rehabilitation Hospital between 1999 and 2009. Patients underwent sensory stimulation sessions during their period of admission, which were recorded on video. Using this footage, patients were re-assessed for this study using the STAR, WHIM and CRS-R criteria. The STAR demonstrated "moderate" inter-rater reliability, "substantial" diagnostic agreement with the CRS-R, and "moderate" agreement with the WHIM. There were no significant differences between diagnoses assigned by the different assessments. The STAR demonstrated a good degree of inter-rater reliability in identification of diagnoses for patients with disorders of consciousness. The diagnostic outcomes of the STAR agreed at a good level with the CRS-R, moderately with the WHIM, and did not significantly differ from either. This demonstrates the reliability and validity of the STAR, showing its appropriateness for clinical use. Future longitudinal studies and research into the STAR's applicability in long-stay rehabilitation are indicated.
Validity and reliability of the robotic objective structured assessment of technical skills

PubMed Central

Siddiqui, Nazema Y.; Galloway, Michael L.; Geller, Elizabeth J.; Green, Isabel C.; Hur, Hye-Chun; Langston, Kyle; Pitter, Michael C.; Tarr, Megan E.; Martino, Martin A.

2015-01-01

Objective: Objective structured assessments of technical skills (OSATS) have been developed to measure the skill of surgical trainees. Our aim was to develop an OSATS specifically for trainees learning robotic surgery. Study Design: This is a multi-institutional study in eight academic training programs. We created an assessment form to evaluate robotic surgical skill through five inanimate exercises. Obstetrics/gynecology, general surgery, and urology residents, fellows, and faculty completed five robotic exercises on a standard training model. Study sessions were recorded and randomly assigned to three blinded judges who scored performance using the assessment form. Construct validity was evaluated by comparing scores between participants with different levels of surgical experience; inter- and intra-rater reliability were also assessed. Results: We evaluated 83 residents, 9 fellows, and 13 faculty, totaling 105 participants; 88 (84%) were from obstetrics/gynecology. Our assessment form demonstrated construct validity, with faculty and fellows performing significantly better than residents (mean scores: 89 ± 8 faculty; 74 ± 17 fellows; 59 ± 22 residents, p<0.01). In addition, participants with more robotic console experience scored significantly higher than those with fewer prior console surgeries (p<0.01). R-OSATS demonstrated good inter-rater reliability across all five drills (mean Cronbach's α: 0.79 ± 0.02). Intra-rater reliability was also high (mean Spearman's correlation: 0.91 ± 0.11). Conclusions: We developed an assessment form for robotic surgical skill that demonstrates construct validity, inter- and intra-rater reliability. When paired with standardized robotic skill drills this form may be useful to distinguish between levels of trainee performance. PMID:24807319
Do children with tuberous sclerosis complex have superior musical skill?--A unique tendency of musical responsiveness in children with TSC.

PubMed

Matsuyama, Kumi; Ohsawa, Isao; Ogawa, Toyoaki

2007-04-01

Tuberous sclerosis complex (TSC) is an autosomal dominant disorder that manifests with symptoms that might include mental retardation, epilepsy, skin lesions, and hamartomas in the heart, brain, and kidneys. Anecdotal reports have characterized children with TSC as having high music responsiveness despite their developmental delay. This study is intended to investigate this putative musical skill of children with TSC and to elucidate the presence of non-delayed facets of their development. This study examined 11 children with TSC, 10 children with DSM-IV autism, and 92 healthy children who participated as control subjects. Correlation was examined between results obtained using Non-Verbal MMRC, which is a validated musical responsiveness battery, and results of a scientifically accepted standardized pediatric developmental test: the New Edition of the Kyoto Scale of Psychological Development. Inter-rater reliability among the three raters was also assessed. The rhythm or melody score on the Non-Verbal MMRC and developmental age (DA) among children with TSC showed no significant correlation. In contrast, a significant correlation was found among normal children and those with autism. Moreover, the inter-rater reliability was good. The results demonstrate that children with TSC show high responsiveness to musical stimuli despite otherwise delayed development (e.g., language, cognition, motor skills). This report is the first stating that children with TSC have a unique tendency in terms of correlation between music and developmental age. These findings indicate a non-delayed area of TSC children's development and suggest the use of music as therapeutic intervention.
Reliability of Measurements Performed by Community-Drawn Anthropometrists from Rural Ethiopia

PubMed Central

Ayele, Berhan; Aemere, Abaineh; Gebre, Teshome; Tadesse, Zerihun; Stoller, Nicole E.; See, Craig W.; Yu, Sun N.; Gaynor, Bruce D.; McCulloch, Charles E.; Porco, Travis C.; Emerson, Paul M.; Lietman, Thomas M.; Keenan, Jeremy D.

2012-01-01

Background: Undernutrition is an important risk factor for childhood mortality, and remains a major problem facing many developing countries. Millennium Development Goal 1 calls for a reduction in underweight children, implemented through a variety of interventions. To adequately judge the impact of these interventions, it is important to know the reproducibility of the main indicators for undernutrition. In this study, we trained individuals from rural communities in Ethiopia in anthropometry techniques and measured intra- and inter-observer reliability. Methods and Findings: We trained 6 individuals without prior anthropometry experience to perform weight, height, and middle upper arm circumference (MUAC) measurements. Two anthropometry teams were dispatched to 18 communities in rural Ethiopia and measurements performed on all consenting pre-school children. Anthropometry teams performed a second independent measurement on a convenience sample of children in order to assess intra-anthropometrist reliability. Both teams measured the same children in 2 villages to assess inter-anthropometrist reliability. We calculated several metrics of measurement reproducibility, including the technical error of measurement (TEM) and relative TEM. In total, anthropometry teams performed measurements on 606 pre-school children, 84 of whom had repeat measurements performed by the same team, and 89 of whom had measurements performed by both teams. Intra-anthropometrist TEM (and relative TEM) were 0.35 cm (0.35%) for height, 0.05 kg (0.39%) for weight, and 0.18 cm (1.27%) for MUAC. Corresponding values for inter-anthropometrist reliability were 0.67 cm (0.75%) for height, 0.09 kg (0.79%) for weight, and 0.22 cm (1.53%) for MUAC. Inter-anthropometrist measurement error was greater for smaller children than for larger children.
Conclusion: Measurements of height and weight were more reproducible than measurements of MUAC and measurements of larger children were more reliable than those for smaller children. Community-drawn anthropometrists can provide reliable measurements that could be used to assess the impact of interventions for childhood undernutrition. PMID:22291939
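For readers unfamiliar with the technical error of measurement, the following Python sketch shows the usual two-measurement form, TEM = √(Σd² / 2n), with relative TEM expressed as a percentage of the overall mean. This is an assumption about the computation; the abstract does not give the formula the authors applied. The function name and the heights are hypothetical.

```python
import math

def technical_error_of_measurement(first, second):
    """Return (TEM, relative TEM in percent) for paired repeat measurements."""
    n = len(first)
    # Sum of squared differences between the two measurements of each child.
    sum_sq = sum((a - b) ** 2 for a, b in zip(first, second))
    tem = math.sqrt(sum_sq / (2 * n))
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    return tem, 100.0 * tem / grand_mean

# Hypothetical repeat height measurements in cm (illustrative only).
height_1 = [92.1, 101.4, 87.9, 110.2]
height_2 = [92.5, 101.0, 88.3, 109.9]
tem, rel_tem = technical_error_of_measurement(height_1, height_2)
print(tem, rel_tem)
```

Because relative TEM is scaled by the mean, it allows comparison across measures with different units, such as height in cm and weight in kg.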

Training less-experienced faculty improves reliability of skills assessment in cardiac surgery.

PubMed

Lou, Xiaoying; Lee, Richard; Feins, Richard H; Enter, Daniel; Hicks, George L; Verrier, Edward D; Fann, James I

2014-12-01

Previous work has demonstrated high inter-rater reliability in the objective assessment of simulated anastomoses among experienced educators. We evaluated the inter-rater reliability of less-experienced educators and the impact of focused training with a video-embedded coronary anastomosis assessment tool. Nine less-experienced cardiothoracic surgery faculty members from different institutions evaluated 2 videos of simulated coronary anastomoses (1 by a medical student and 1 by a resident) at the Thoracic Surgery Directors Association Boot Camp. They then underwent a 30-minute training session using an assessment tool with embedded videos to anchor rating scores for 10 components of coronary artery anastomosis. Afterward, they evaluated 2 videos of a different student and resident performing the task. Components were scored on a 1 to 5 Likert scale, yielding an average composite score. Inter-rater reliabilities of component and composite scores were assessed using intraclass correlation coefficients (ICCs) and overall pass/fail ratings with kappa. All components of the assessment tool exhibited improvement in reliability, with 4 (bite, needle holder use, needle angles, and hand mechanics) improving the most from poor (ICC range, 0.09-0.48) to strong (ICC range, 0.80-0.90) agreement. After training, inter-rater reliabilities for composite scores improved from moderate (ICC, 0.76) to strong (ICC, 0.90) agreement, and for overall pass/fail ratings, from poor (kappa = 0.20) to moderate (kappa = 0.78) agreement. Focused, video-based anchor training facilitates greater inter-rater reliability in the objective assessment of simulated coronary anastomoses. Among raters with less teaching experience, such training may be needed before objective evaluation of technical skills. Published by Elsevier Inc.
Reliability of the Matson Evaluation of Social Skills with Youngsters (MESSY) for Children with Autism Spectrum Disorders

ERIC Educational Resources Information Center

Matson, Johnny L.; Horovitz, Max; Mahan, Sara; Fodstad, Jill

2013-01-01

The purpose of this paper was to update the psychometrics of the "Matson Evaluation of Social Skills for Youngsters" ("MESSY") with children with Autism Spectrum Disorders (ASD), specifically with respect to internal consistency, split-half reliability, and inter-rater reliability. In Study 1, 114 children with ASD (Autistic Disorder, Asperger's…
Inter-rater reliability of the Sødring Motor Evaluation of Stroke patients (SMES).

PubMed

Halsaa, K E; Sødring, K M; Bjelland, E; Finsrud, K; Bautz-Holter, E

1999-12-01

The Sødring Motor Evaluation of Stroke patients is an instrument for physiotherapists to evaluate motor function and activities in stroke patients. The rating reflects quality as well as quantity of the patient's unassisted performance within three domains: leg, arm and gross function. The inter-rater reliability of the method was studied in a sample of 30 patients admitted to a stroke rehabilitation unit. Three therapists were involved in the study; two therapists assessed the same patient on two consecutive days in a balanced design. Cohen's weighted kappa and McNemar's test of symmetry were used as measures of item reliability, and the intraclass correlation coefficient was used to express the reliability of the sumscores. For 24 out of 32 items the weighted kappa statistic was excellent (0.75-0.98), while 7 items had a kappa statistic within the range 0.53-0.74 (fair to good). The reliability of one item was poor (0.13). The intraclass correlation coefficient for the three sumscores was 0.97, 0.91 and 0.97. We conclude that the Sødring Motor Evaluation of Stroke patients is a reliable measure of motor function in stroke patients undergoing rehabilitation.
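Cohen's weighted kappa, used here for item-level reliability, can be sketched in Python as below. The abstract does not say whether linear or quadratic weights were used, so the weighting scheme is left as a parameter. The function name and the ratings are hypothetical, and the sketch assumes ordinal categories supplied in rank order and at least some variation in the ratings (otherwise the denominator is zero).

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same subjects."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted observed disagreement versus weighted chance disagreement.
    num = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - num / den

# Hypothetical item scores on a 1-5 ordinal scale from two therapists.
day_1 = [1, 2, 3, 4, 5, 3, 2, 4]
day_2 = [1, 2, 4, 4, 5, 3, 1, 4]
print(weighted_kappa(day_1, day_2, categories=[1, 2, 3, 4, 5]))
```

With only two categories the weighted and unweighted statistics coincide, since every disagreement then carries the same weight.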
Rater reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS).

PubMed

Baker, Nancy A; Cook, James R; Redfern, Mark S

2009-01-01

This paper describes the inter-rater and intra-rater reliability, and the concurrent validity of an observational instrument, the Keyboard Personal Computer Style instrument (K-PeCS), which assesses stereotypical postures and movements associated with computer keyboard use. Three trained raters independently rated the video clips of 45 computer keyboard users to ascertain inter-rater reliability, and then re-rated a sub-sample of 15 video clips to ascertain intra-rater reliability. Concurrent validity was assessed by comparing the ratings obtained using the K-PeCS to scores developed from a 3D motion analysis system. The overall K-PeCS had excellent reliability [inter-rater: intra-class correlation coefficients (ICC)=.90; intra-rater: ICC=.92]. Most individual items on the K-PeCS had from good to excellent reliability, although six items fell below ICC=.75. Those K-PeCS items that were assessed for concurrent validity compared favorably to the motion analysis data for all but two items. These results suggest that most items on the K-PeCS can be used to reliably document computer keyboarding style.
Movement Assessment of Children (MAC): validity, reliability, stability and sensitivity to change in typically developing children.

PubMed

Chandler, L S; Terhorst, L; Rogers, J C; Holm, M B

2016-07-01

The purpose of this study was to establish the validity, reliability, stability and sensitivity to change of the family-centred Movement Assessment of Children (MAC) in typically developing infants/toddlers from 2 months (1 month 16 days) to 2 years (24 months 15 days) of age. Assessment of infant/toddler motor development is critical so that infants and toddlers who are at-risk for developmental delay or whose functional motor development is delayed can be monitored and receive therapy to improve their developmental outcomes. Infants/toddlers are thought to be more responsive during the MAC assessment because parents and siblings participate and elicit responses. Two hundred seventy six children and 405 assessments contributed to the establishment of age-related parameters for typically developing infants and toddlers on the MAC. The MAC assesses three core domains of functional movement (head control, upper extremities and hands, pelvis and lower extremities), and generates a core total score. Four explanatory domains serve to alert examiners to factors that may impact atypical development (general observations, special senses, primitive reflexes/reactions, muscle tone). Construct validity of functional motor development was examined using the relationship between incremental increases in scores and increases in participants' ages. Subsamples were used to establish inter-rater reliability, test-retest reliability, stability and sensitivity to change. Construct validity was established and inter-rater reliability ICCs for the core items and core total ranged from 0.83 to 0.99. Percent agreement for the explanatory items ranged from 0.72 to 0.96. Stability within age grouping was consistent from baseline to 6 months post-baseline, and sensitivity to change from baseline to 6 months was significant for all core items and the total score. The MAC has proven to be a well-constructed assessment of infant and toddler functional motor development. 
It is a family-centred and efficient tool that can be used for the assessment and follow-up of infants and toddlers from 2 months to 2 years. © 2016 John Wiley & Sons Ltd.
Leveraging Data Sampling and Practical Knowledge: Field Instructors' Perceptions about Inter-Rater Reliability Data

ERIC Educational Resources Information Center

Soslau, Elizabeth; Lewis, Kandia

2014-01-01

For accreditation and programmatic decision making, education school administrators use inter-rater reliability analyses to judge credibility of student-teacher assessments. Although weak levels of agreement between university-appointed supervisors and cooperating teachers are usually interpreted to indicate that the process is not being…
Stroke and aphasia quality-of-life scale-39: Reliability and validity of the Turkish version.

PubMed

Noyan-ErbaŞ, AyŞin; Toğram, Bülent

2016-10-01

The aim of this study was to adapt the stroke and aphasia quality-of-life scale-39 (SAQoL-39) to the Turkish language and carry out a reliability and validity study of the instrument in a group of patients with aphasia. The study was a descriptive study and contained three phases: adaptation of the SAQoL-39 to the Turkish language, administration of the scale to 30 aphasia patients and reliability and validity studies of the scale. Internal consistency was assessed with Cronbach's alpha and test-retest reliability was explored (n = 14). The adaptation process was completed based on inter-rater agreement on the translated items and within the scope of final editing by the authors of the study. The SAQoL-39 in Turkish exhibited high test-retest reliability (ICC = 0.97) as well as acceptability with minimal missing data (0-1.4). This instrument exhibited high internal consistency (Cronbach's α = 0.70-0.97), domain-total correlations (r = 0.76-0.85) and inter-domain correlations (r = 0.40-0.68). The analysis shows that the Turkish version of SAQoL-39 is a scale that is highly acceptable, valid and reliable and can be easily used in evaluating the quality-of-life of Turkish people with aphasia.
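Cronbach's alpha, the internal-consistency statistic reported in this abstract, can be computed as in the Python sketch below. It uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the item scores are hypothetical and do not come from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, each with one score per respondent."""
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))

# Hypothetical scores for three items answered by five respondents.
item_scores = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
print(cronbach_alpha(item_scores))
```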
Structured assessment of current mental state in clinical practice: an international study of the reliability and validity of the Current Psychiatric State interview, CPS-50.

PubMed

Falloon, I R H; Mizuno, M; Murakami, M; Roncone, R; Unoka, Z; Harangozo, J; Pullman, J; Gedye, R; Held, T; Hager, B; Erickson, D; Burnett, K

2005-01-01

To develop a reliable standardized assessment of psychiatric symptoms for use in clinical practice. A 50-item interview, the Current Psychiatric State 50 (CPS-50), was used to assess 237 patients with a range of psychiatric diagnoses. Ratings were made by interviewers after a 2-day training. Comparisons of inter-rater reliability on each item and on eight clinical subscales were made across four international centres and between psychiatrists and non-psychiatrists. A principal components analysis was used to validate these clinical scales. Acceptable inter-rater reliability (intra-class coefficient > 0.80) was found for 46 of the 50 items, and for all eight subscales. There was no difference between centres or between psychiatrists and non-psychiatrists. The principal components analysis factors were similar to the clinical scales. The CPS-50 is a reliable standardized assessment of current mental status that can be used in clinical practice by all mental health professionals after brief training. Blackwell Munksgaard 2004
[Spanish validation of the MacArthur Competence Assessment Tool for Treatment interview to assess patients competence to consent treatment].

PubMed

Alvarez Marrodán, Ignacio; Baón Pérez, Beatriz; Navío Acosta, Mercedes; López-Antón, Raul; Lobo Escolar, Elena; Ventura Faci, Tirso

2014-09-09

To validate the MacArthur Competence Assessment Tool for Treatment (MacCAT-T) Spanish version, which assesses the mental capacity of patients to consent to treatment, by examining 4 areas (Understanding, Appreciation, Reasoning and Expressing a choice). 160 subjects (80 Internal Medicine inpatients, 40 Psychiatric inpatients and 40 healthy controls). MacCAT-T, Mini-Mental Status Examination (MMSE). Feasibility study, reliability and validity calculations (against the gold standard of a clinical expert). Mean duration of the MacCAT-T interview was 18 min. Inter-rater reliability: Intraclass correlation coefficient for Understanding=0.98, Appreciation=0.97, Reasoning=0.98, Expressing a choice=0.91. Internal consistency (Cronbach's alpha): Understanding=0.87, Appreciation=0.76, Reasoning=0.86. Patients considered to be incapable (gold standard) scored lower in all the MacCAT-T areas. Poor performance on the MacCAT-T was related to cognitive impairment assessed by MMSE. The Spanish version of the MacCAT-T is feasible, reliable, and valid for assessing the capacity of patients to consent to treatment. Copyright © 2013 Elsevier España, S.L. All rights reserved.
The Mental Disability Military Assessment Tool: A Reliable Tool for Determining Disability in Veterans with Post-traumatic Stress Disorder.

PubMed

Fokkens, Andrea S; Groothoff, Johan W; van der Klink, Jac J L; Popping, Roel; Stewart, Roy E; van de Ven, Lex; Brouwer, Sandra; Tuinstra, Jolanda

2015-09-01

An assessment tool was developed to assess disability in veterans who suffer from post-traumatic stress disorder (PTSD) due to a military mission. The objective of this study was to determine the reliability, intra-rater and inter-rater variation of the Mental Disability Military (MDM) assessment tool. Twenty-four assessment interviews of veterans with an insurance physician were videotaped. Each videotaped interview was assessed by a group of five independent raters on limitations of the veterans using the MDM assessment tool. After 2 months the raters repeated this procedure. Next the intra-rater and inter-rater variation was assessed with an adjusted version of AG09 computing weighted percentage agreement. The results of this study showed that both the intra-rater variation and inter-rater variation on the ten subcategories of the MDM assessment tool were small, with an agreement of 84-100% within raters and 93-100% between raters. The MDM assessment tool proves to be a reliable instrument to measure PTSD limitations in functioning in Dutch military veterans who apply for disability compensation. Further research is needed to assess the validity of this instrument.
Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

PubMed Central

Hallgren, Kevin A.

2012-01-01

Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR. PMID:22833776
Oxytocin enhances inter-brain synchrony during social coordination in male adults

PubMed Central

Mu, Yan; Guo, Chunyan

2016-01-01

Recent brain imaging research has revealed oxytocin (OT) effects on an individual's brain activity during social interaction but tells little about whether and how OT modulates the coherence of inter-brain activity related to two individuals' coordination behavior. We developed a new real-time coordination game that required two individuals of a dyad to synchronize with a partner (coordination task) or with a computer (control task) by counting in mind rhythmically. Electroencephalography (EEG) was recorded simultaneously from a dyad to examine OT effects on inter-brain synchrony of neural activity during interpersonal coordination. Experiment 1 found that dyads showed smaller interpersonal time lags of counting and greater inter-brain synchrony of alpha-band neural oscillations during the coordination (vs control) task and these effects were reliably observed in female but not male dyads. Moreover, the increased alpha-band inter-brain synchrony predicted better interpersonal behavioral synchrony across all participants. Experiment 2, using a double blind, placebo-controlled between-subjects design, revealed that intranasal OT vs placebo administration in male dyads improved interpersonal behavioral synchrony in both the coordination and control tasks but specifically enhanced alpha-band inter-brain neural oscillations during the coordination task. Our findings provide first evidence that OT enhances inter-brain synchrony in male adults to facilitate social coordination. PMID:27510498
Placido disk-based topography versus high-resolution rotating Scheimpflug camera for corneal power measurements in keratoconic and post-LASIK eyes: reliability and agreement.

PubMed

Penna, Rachele R; de Sanctis, Ugo; Catalano, Martina; Brusasco, Luca; Grignolo, Federico M

2017-01-01

To compare the repeatability/reproducibility of measurement by high-resolution Placido disk-based topography with that of a high-resolution rotating Scheimpflug camera and assess the agreement between the two instruments in measuring corneal power in eyes with keratoconus and post-laser in situ keratomileusis (LASIK). One eye each of 36 keratoconic patients and 20 subjects who had undergone LASIK was included in this prospective observational study. Two independent examiners worked in a random order to take three measurements of each eye with both instruments. Four parameters were measured on the anterior cornea: steep keratometry (Ks), flat keratometry (Kf), mean keratometry (Km), and astigmatism (Ks-Kf). Intra-examiner repeatability and inter-examiner reproducibility were evaluated by calculating the within-subject standard deviation (Sw), the coefficient of repeatability (R), the coefficient of variation (CoV), and the intraclass correlation coefficient (ICC). Agreement between instruments was tested with the Bland-Altman method by calculating the 95% limits of agreement (95% LoA). In keratoconic eyes, the intra-examiner and inter-examiner ICC were >0.95. As compared with measurement by high-resolution Placido disk-based topography, the intra-examiner R of the high-resolution rotating Scheimpflug camera was lower for Kf (0.32 vs 0.88), Ks (0.61 vs 0.88), and Km (0.32 vs 0.84) but higher for Ks-Kf (0.70 vs 0.57). Inter-examiner R values were lower for all parameters measured using the high-resolution rotating Scheimpflug camera. The 95% LoA were -1.28 to +0.55 for Kf, -1.36 to +0.99 for Ks, -1.08 to +0.50 for Km, and -1.11 to +1.48 for Ks-Kf. In the post-LASIK eyes, the intra-examiner and inter-examiner ICC were >0.87 for all parameters. The intra-examiner and inter-examiner R were lower for all parameters measured using the high-resolution rotating Scheimpflug camera.
The intra-examiner R was 0.17 vs 0.88 for Kf, 0.21 vs 0.88 for Ks, 0.17 vs 0.86 for Km, and 0.28 vs 0.33 for Ks-Kf. The inter-examiner R was 0.09 vs 0.64 for Kf, 0.15 vs 0.56 for Ks, 0.09 vs 0.59 for Km, and 0.18 vs 0.23 for Ks-Kf. The 95% LoA were -0.54 to +0.58 for Kf, -0.51 to +0.53 for Ks and Km, and -0.28 to +0.27 for Ks-Kf. As compared with Placido disk-based topography, the high-resolution rotating Scheimpflug camera provides more repeatable and reproducible measurements of Ks, Kf and Km in keratoconic and post-LASIK eyes. Agreement between instruments is fair in keratoconus and very good in post-LASIK eyes.
Placido disk-based topography versus high-resolution rotating Scheimpflug camera for corneal power measurements in keratoconic and post-LASIK eyes: reliability and agreement

PubMed Central

Penna, Rachele R.; de Sanctis, Ugo; Catalano, Martina; Brusasco, Luca; Grignolo, Federico M.

2017-01-01

AIM To compare the repeatability/reproducibility of measurement by high-resolution Placido disk-based topography with that of a high-resolution rotating Scheimpflug camera and assess the agreement between the two instruments in measuring corneal power in eyes with keratoconus and post-laser in situ keratomileusis (LASIK). METHODS One eye each of 36 keratoconic patients and 20 subjects who had undergone LASIK was included in this prospective observational study. Two independent examiners worked in a random order to take three measurements of each eye with both instruments. Four parameters were measured on the anterior cornea: steep keratometry (Ks), flat keratometry (Kf), mean keratometry (Km), and astigmatism (Ks-Kf). Intra-examiner repeatability and inter-examiner reproducibility were evaluated by calculating the within-subject standard deviation (Sw), the coefficient of repeatability (R), the coefficient of variation (CoV), and the intraclass correlation coefficient (ICC). Agreement between instruments was tested with the Bland-Altman method by calculating the 95% limits of agreement (95% LoA). RESULTS In keratoconic eyes, the intra-examiner and inter-examiner ICC were >0.95. As compared with measurement by high-resolution Placido disk-based topography, the intra-examiner R of the high-resolution rotating Scheimpflug camera was lower for Kf (0.32 vs 0.88), Ks (0.61 vs 0.88), and Km (0.32 vs 0.84) but higher for Ks-Kf (0.70 vs 0.57). Inter-examiner R values were lower for all parameters measured using the high-resolution rotating Scheimpflug camera. The 95% LoA were -1.28 to +0.55 for Kf, -1.36 to +0.99 for Ks, -1.08 to +0.50 for Km, and -1.11 to +1.48 for Ks-Kf. In the post-LASIK eyes, the intra-examiner and inter-examiner ICC were >0.87 for all parameters. The intra-examiner and inter-examiner R were lower for all parameters measured using the high-resolution rotating Scheimpflug camera.
The intra-examiner R was 0.17 vs 0.88 for Kf, 0.21 vs 0.88 for Ks, 0.17 vs 0.86 for Km, and 0.28 vs 0.33 for Ks-Kf. The inter-examiner R was 0.09 vs 0.64 for Kf, 0.15 vs 0.56 for Ks, 0.09 vs 0.59 for Km, and 0.18 vs 0.23 for Ks-Kf. The 95% LoA were -0.54 to +0.58 for Kf, -0.51 to +0.53 for Ks and Km, and -0.28 to +0.27 for Ks-Kf. CONCLUSION As compared with Placido disk-based topography, the high-resolution rotating Scheimpflug camera provides more repeatable and reproducible measurements of Ks, Kf and Km in keratoconic and post-LASIK eyes. Agreement between instruments is fair in keratoconus and very good in post-LASIK eyes. PMID:28393039
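The repeatability and agreement statistics named in this abstract can be sketched in Python as follows. The abstract does not give its formulas, so this assumes the common Bland-Altman conventions: Sw as the square root of the mean within-subject variance (equal numbers of repeats per eye), R = 1.96 × √2 × Sw, and 95% LoA as the mean difference ± 1.96 standard deviations of the differences. All function names and sample values are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def within_subject_sd(repeats):
    """Sw: square root of the mean within-subject variance.

    `repeats` is a list of per-subject lists of repeated measurements,
    assumed here to have the same number of repeats for every subject.
    """
    return math.sqrt(mean([variance(subject) for subject in repeats]))


def repeatability_coefficient(repeats):
    """R = 1.96 * sqrt(2) * Sw (about 2.77 * Sw)."""
    return 1.96 * math.sqrt(2) * within_subject_sd(repeats)


def limits_of_agreement(method_a, method_b):
    """Bland-Altman 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical keratometry readings (dioptres): three repeats for each of two eyes.
repeats = [[43.0, 43.2, 43.1], [45.5, 45.4, 45.6]]
print(repeatability_coefficient(repeats))

# Hypothetical paired readings of the same eyes from two instruments.
print(limits_of_agreement([43.1, 45.5, 44.0], [43.4, 45.3, 44.2]))
```

Under these conventions, a smaller R means repeated readings of the same eye are expected to lie closer together, which is how the abstract's "lower R" comparisons should be read.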
Development of the Music Therapy Assessment Tool for Advanced Huntington's Disease: A Pilot Validation Study.

PubMed

O'Kelly, Julian; Bodak, Rebeka

2016-01-01

Case studies of people with Huntington's disease (HD) report that music therapy provides a range of benefits that may improve quality of life; however, no robust music therapy assessment tools exist for this population. The objective was to develop and conduct preliminary psychometric testing of a music therapy assessment tool for patients with advanced HD. First, we established content and face validity of the Music Therapy Assessment Tool for Advanced HD (MATA-HD) through focus groups and field testing. Second, we examined psychometric properties of the resulting MATA-HD in terms of its construct validity, internal consistency, and inter-rater and intra-rater reliability over 10 group music therapy sessions with 19 patients. The resulting MATA-HD included a total of 15 items across six subscales (Arousal/Attention, Physical Presentation, Communication, Musical, Cognition, and Psychological/Behavioral). We found good construct validity (r ≥ 0.7) for Mood, Communication Level, Communication Effectiveness, Choice, Social Behavior, Arousal, and Attention items. Cronbach's α of 0.825 indicated good internal consistency across 11 items with a common focus of engagement in therapy. The inter-rater reliability (IRR) intraclass correlation coefficient (ICC) scores averaged 0.65, and a mean intra-rater ICC of 0.68 was obtained. Further training and retesting provided a mean IRR ICC of 0.7. Preliminary data indicate that the MATA-HD is a promising tool for measuring patient responses to music therapy interventions across psychological, physical, social, and communication domains of functioning in patients with advanced HD. © the American Music Therapy Association 2016. All rights reserved.
A cross-validation study of the TGMD-2: The case of an adolescent population.

PubMed

Issartel, Johann; McGrane, Bronagh; Fletcher, Richard; O'Brien, Wesley; Powell, Danielle; Belton, Sarahjane

2017-05-01

This study proposes an extension of a widely used test evaluating fundamental movement skills proficiency to an adolescent population, with a specific emphasis on validity and reliability for this older age group. Cross-sectional observational study. A total of 844 participants (456 male; age 12.03±0.49 years) participated in this study. The 12 fundamental movement skills of the TGMD-2 were assessed. Inter-rater reliability was examined to ensure a minimum of 95% consistency between coders. Confirmatory factor analysis was undertaken with a one-factor model (all 12 skills) and two-factor model (6 locomotor skills and 6 object-control skills) as proposed by Ulrich et al. (2000). The model fit was examined using χ², TLI, CFI and RMSEA. Test-retest reliability was carried out with a subsample of 35 participants. The test-retest reliability reached intraclass correlation coefficients of 0.78 (locomotor), 0.76 (object related) and 0.91 (gross motor skill proficiency). The confirmatory factor analysis did not display a good fit for either the one-factor or two-factor model due to a very low contribution of several skills. A reduction in the number of skills to just seven (run, gallop, hop, horizontal jump, bounce, kick and roll) revealed an overall good fit by TLI, CFI and RMSEA measures. The proposed new model offers the possibility of longitudinal studies to track the maturation of fundamental movement skills across the child and adolescent spectrum, while also giving researchers a valid assessment tool to evaluate adolescent fundamental movement skills proficiency level. Copyright © 2016 Sports Medicine Australia. All rights reserved.
Computerized back postural assessment in physiotherapy practice: Intra-rater and inter-rater reliability of the MIDAS system.

PubMed

McAlpine, R T; Bettany-Saltikov, J A; Warren, J G

2009-01-01

Assessment of spinal posture during physiotherapy practice is routine, yet few objective measures exist to this end. The Middlesbrough Integrated Digital Assessment System (MIDAS) is a low cost portable system able to record 3D information on posture. The purpose of this study was to assess both the intra-rater and inter-rater reliability of the MIDAS system. Twenty-five healthy subjects were recruited. A repeated measures design was used to record fifteen pre-palpated landmarks on the back of each subject. To limit the sources of variability, the principal researcher palpated the landmarks for each subject. Each of three raters took two measurements on each subject in a standardized upright posture. X (medio-lateral), Y (antero-posterior) and Z (height) landmark positions were recorded via a computer interface. Both intra-rater agreement (mean ICCs: rater 1 r=0.970, rater 2 r=0.965 and rater 3 r=0.965, p<0.001) and inter-rater agreement (mean ICC r=0.967, p<0.001) were very high between repeated measures and between markers. Error values for the z-axis (height) were the lowest. The MIDAS demonstrated both high inter-rater and intra-rater reliability and provides an objective method for the assessment of posture in physiotherapy practice.
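For readers unfamiliar with how an inter-rater ICC of this kind is obtained, a minimal Python sketch follows. The abstract does not state which ICC form was used; the sketch assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1) in Shrout and Fleiss notation, computed from the usual two-way ANOVA mean squares. The function name `icc_2_1` and the sample data are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of n subjects (rows) by k raters (columns).

    Assumes a complete table: every rater measured every subject once.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical landmark heights (cm) for four subjects measured by three raters.
heights = [
    [101.2, 101.0, 101.4],
    [98.7, 98.9, 98.6],
    [105.1, 105.3, 105.0],
    [99.9, 100.2, 100.0],
]
print(icc_2_1(heights))
```

Because this form measures absolute agreement, a rater who reads consistently higher than the others lowers the ICC even if the ranking of subjects is identical.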
Reliability of injury grading systems for patients with blunt splenic trauma.

PubMed

Olthof, D C; van der Vlies, C H; Scheerder, M J; de Haan, R J; Beenen, L F M; Goslings, J C; van Delden, O M

2014-01-01

The most widely used grading system for blunt splenic injury is the American Association for the Surgery of Trauma (AAST) organ injury scale. In 2007 a new grading system was developed. This 'Baltimore CT grading system' is superior to the AAST classification system in predicting the need for angiography and embolization or surgery. The objective of this study was to assess inter- and intraobserver reliability between radiologists in classifying splenic injury according to both grading systems. CT scans of 83 patients with blunt splenic injury admitted between 1998 and 2008 to an academic Level 1 trauma centre were retrospectively reviewed. Inter- and intraobserver reliability were expressed as Cohen's or weighted kappa values. Overall weighted interobserver kappa coefficients for the AAST and 'Baltimore CT grading system' were substantial (kappa=0.80) and almost perfect (kappa=0.85), respectively. Average weighted intraobserver kappa values were in the 'almost perfect' range (AAST: kappa=0.91, 'Baltimore CT grading system': kappa=0.81). The present study shows that overall the inter- and intraobserver reliability for grading splenic injury according to the AAST grading system and 'Baltimore CT grading system' are equally high. Because of the integration of vascular injury, the 'Baltimore CT grading system' supports clinical decision making. We therefore recommend use of this system in the classification of splenic injury. Copyright © 2012 Elsevier Ltd. All rights reserved.
A Comparison of Rubrics and Graded Category Rating Scales with Various Methods Regarding Raters' Reliability

ERIC Educational Resources Information Center

Dogan, C. Deha; Uluman, Müge

2017-01-01

The aim of this study was to determine the extent to which graded-category rating scales and rubrics contribute to inter-rater reliability. The research was designed as a correlational study. The study group consisted of 82 students attending sixth grade and three writing course teachers in a private elementary school. A performance task was…
Pre-operative Duplex Ultrasonography in Arteriovenous Fistula Creation: Intra- and Inter-observer Agreement.

PubMed

Zonnebeld, Niek; Maas, Tommy M G; Huberts, Wouter; van Loon, Magda M; Delhaas, Tammo; Tordoir, Jan H M

2017-11-01

Although clinical guidelines on arteriovenous fistula (AVF) creation advocate minimum luminal arterial and venous diameters, assessed by duplex ultrasonography (DUS), the clinical value of routine DUS examination is under debate. DUS might be an insufficiently repeatable and/or reproducible imaging modality because of its operator dependency. The present study aimed to assess intra- and inter-observer agreement of DUS examination in support of AVF surgery planning. Ten end stage renal disease patients were included, to assess intra- and inter-observer agreement of pre-operative DUS measurements. All measurements were performed by two trained and experienced vascular technicians, blinded to measurement readings. From the routine DUS protocol, representative measurements (venous diameters, and arterial diameters and volume flow in the upper arm and forearm) were selected. For intra-observer agreement the measurements were performed in triplicate, with the probe released from the skin between each. Intraclass correlation coefficients were calculated for intra- and inter-observer agreement, and Bland-Altman plots used to graphically display mean measurement differences and limits of agreement. Ten patients (6 male, 59.4±19.7 years) consented to participate, and all predefined measurements were obtained. Intraclass correlation coefficients for intra-observer agreement of diameter measurements were at least 0.90 (95% CI 0.74-0.97; radial artery). Inter-observer agreement was at least 0.83 (0.46-0.96; lateral diameter upper arm cephalic vein). The Bland-Altman plots showed acceptable mean measurement differences and limits of agreement. In experienced hands, excellent intra- and inter-observer agreement can be reached for the discrete pre-operative DUS measurements advocated in clinical guidelines. DUS is therefore a reliable imaging modality to support AVF surgery planning. The content of DUS protocols, however, needs further standardisation. 
Copyright © 2017 European Society for Vascular Surgery. Published by Elsevier Ltd. All rights reserved.

Manual unloading of the lumbar spine: can it identify immediate responders to mechanical traction in a low back pain population? A study of reliability and criterion referenced predictive validity

PubMed Central

Swanson, Brian T.; Riley, Sean P.; Cote, Mark P.; Leger, Robin R.; Moss, Isaac L.; Carlos, John

2016-01-01

Background To date, no research has examined the reliability or predictive validity of manual unloading tests of the lumbar spine to identify potential responders to lumbar mechanical traction. Purpose To determine: (1) the intra- and inter-rater reliability of a manual unloading test of the lumbar spine and (2) the criterion referenced predictive validity for the manual unloading test. Methods Ten volunteers with low back pain (LBP) underwent a manual unloading test to establish reliability. In a separate procedure, 30 consecutive patients with LBP (age 50·86±11·51) were assessed for pain in their most provocative standing position (visual analog scale (VAS) 49·53±25·52 mm). Patients were assessed with a manual unloading test in their most provocative position followed by a single application of intermittent mechanical traction. Post traction, pain in the provocative position was reassessed and utilized as the outcome criterion. Results The test of unloading demonstrated substantial intra- and inter-rater reliability (K = 1·00, P = 0·002 and K = 0·737, P = 0·001, respectively). There were statistically significant within group differences for pain response following traction for patients with a positive manual unloading test (P<0·001), while patients with a negative manual unloading test did not demonstrate a statistically significant change (P>0·05). There were significant between group differences for proportion of responders to traction based on manual unloading response (P = 0·031), and manual unloading response demonstrated a moderate to strong relationship with traction response (Phi = 0·443, P = 0·015). Discussion and conclusion The manual unloading test appears to be a reliable test and has a moderate to strong correlation with pain relief that exceeds the minimal clinically important difference (MCID) following traction, supporting the validity of this test. PMID:27559274
Post-traumatic subtalar osteoarthritis: which grading system should we use?

PubMed

de Muinck Keizer, Robert-Jan O; Backes, Manouk; Dingemans, Siem A; Goslings, J Carel; Schepers, Tim

2016-09-01

To assess and compare post-traumatic osteoarthritis following intra-articular calcaneal fractures, one must have a reliable grading system that consistently grades the post-traumatic changes of the joint. A reliable grading system aids in the communication between treating physicians and improves the interpretation of research. To date, there is no consensus on what grading system to use in the evaluation of post-traumatic subtalar osteoarthritis. The objective of this study was to determine and compare the inter- and intra-rater reliability of two grading systems for post-traumatic subtalar osteoarthritis. Four observers evaluated 50 calcaneal fractures at least one year after trauma on conventional oblique lateral, internally and externally rotated views, and graded post-traumatic subtalar osteoarthritis using the Kellgren and Lawrence Grading Scale (KLGS) and the Paley Grading System (PGS). Inter- and intra-rater reliability were calculated and compared. The inter-rater reliability showed an intra-class correlation (ICC) of 0.54 (95% CI 0.40-0.67) for the KLGS and an ICC of 0.41 (95% CI 0.26-0.57) for the PGS. This difference was not statistically significant. The intra-rater reliability showed a mean weighted kappa of 0.62 for both the KLGS and the PGS. There is no statistically significant difference in reliability between the KLGS and the PGS. The PGS allows for an easy two-step approach, making it convenient for everyday clinical use. For research purposes, however, the more detailed and widely used KLGS seems preferable.
Repeated stimulation, inter-stimulus interval and inter-electrode distance alters muscle contractile properties as measured by Tensiomyography

PubMed Central

Johnson, Mark I.; Francis, Peter

2018-01-01

Context The influence of methodological parameters on the measurement of muscle contractile properties using Tensiomyography (TMG) has not been published. Objective To investigate (1) the reliability of the stimulus amplitude needed to elicit maximum muscle displacement (Dm), (2) the effect of changing inter-stimulus interval on Dm (using a fixed stimulus amplitude) and contraction time (Tc), and (3) the effect of changing inter-electrode distance on Dm and Tc. Design Within subject, repeated measures. Participants 10 participants for each objective. Main outcome measures Dm and Tc of the rectus femoris, measured using TMG. Results The coefficient of variation (CV) and the intra-class correlation (ICC) of the stimulus amplitude needed to elicit maximum Dm were 5.7% and 0.92, respectively. Dm was higher when using an inter-electrode distance of 7 cm compared to 5 cm [P = 0.03] and when using an inter-stimulus interval of 10 s compared to 30 s [P = 0.017]. Further analysis of inter-stimulus interval data found that during 10 repeated stimuli Tc became faster after the 5th measure when compared to the second measure [P<0.05]. The 30 s inter-stimulus interval produced the most stable Tc over 10 measures compared to 10 s and 5 s, respectively. Conclusion Our data suggest that the stimulus amplitude producing maximum Dm of the rectus femoris is reliable. Inter-electrode distance and inter-stimulus interval can significantly influence Dm and/or Tc. Our results support the use of a 30 s inter-stimulus interval over 10 s or 5 s. Future studies should determine the influence of methodological parameters on muscle contractile properties in a range of muscles. PMID:29451885
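The CV reported in this abstract is a simple ratio, and a minimal Python sketch may make it concrete. It assumes the usual definition, sample standard deviation divided by the mean and expressed as a percentage; the abstract does not say how its CV was pooled across participants, and the function name and sample amplitudes below are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation divided by the mean."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / avg


# Hypothetical stimulus amplitudes (mA) from repeated sessions in one participant.
print(coefficient_of_variation([48.0, 50.0, 52.0]))  # 4.0
```

A low CV alongside a high ICC, as reported here, indicates both small within-participant scatter and good discrimination between participants.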
Intra-and inter-observer reliability of nailfold videocapillaroscopy - A possible outcome measure for systemic sclerosis-related microangiopathy.

PubMed

Dinsdale, Graham; Moore, Tonia; O'Leary, Neil; Tresadern, Philip; Berks, Michael; Roberts, Christopher; Manning, Joanne; Allen, John; Anderson, Marina; Cutolo, Maurizio; Hesselstrand, Roger; Howell, Kevin; Pizzorni, Carmen; Smith, Vanessa; Sulli, Alberto; Wildt, Marie; Taylor, Christopher; Murray, Andrea; Herrick, Ariane L

2017-07-01

Our aim was to assess the reliability of nailfold capillary assessment in terms of image evaluability, image severity grade ('normal', 'early', 'active', 'late'), capillary density, capillary (apex) width, and presence of giant capillaries, and also to gain further insight into differences in these parameters between patients with systemic sclerosis (SSc), patients with primary Raynaud's phenomenon (PRP) and healthy control subjects. Videocapillaroscopy images (magnification 300×) were acquired from all 10 digits from 173 participants: 101 patients with SSc, 22 with PRP and 50 healthy controls. Ten capillaroscopy experts from 7 European centres evaluated the images. Custom image mark-up software allowed extraction of the following outcome measures: overall grade ('normal', 'early', 'active', 'late', 'non-specific', or 'ungradeable'), capillary density (vessels/mm), mean vessel apical width, and presence of giant capillaries. Observers analysed a median of 129 images each. Evaluability (i.e. the availability of measures) varied across outcome measures (e.g. 73.0% for density and 46.2% for overall grade in patients with SSc). Intra-observer reliability for evaluability was consistently higher than inter- (e.g. for density, intra-class correlation coefficient [ICC] was 0.71 within and 0.14 between observers). Conditional on evaluability, both intra- and inter-observer reliability were high for grade (ICC 0.93 and 0.78 respectively), density (0.91 and 0.64) and width (0.91 and 0.85). Evaluability is one of the major challenges in assessing nailfold capillaries. However, when images are evaluable, the high intra- and inter-reliabilities suggest that overall image grade, capillary density and apex width have potential as outcome measures in longitudinal studies. Copyright © 2017 Elsevier Inc. All rights reserved.
Comparison of 3D computer-aided with manual cerebral aneurysm measurements in different imaging modalities.

PubMed

Groth, M; Forkert, N D; Buhk, J H; Schoenfeld, M; Goebell, E; Fiehler, J

2013-02-01

To compare intra- and inter-observer reliability of aneurysm measurements obtained by a 3D computer-aided technique with standard manual aneurysm measurements in different imaging modalities. A total of 21 patients with 29 cerebral aneurysms were studied. All patients underwent digital subtraction angiography (DSA), contrast-enhanced (CE-MRA) and time-of-flight magnetic resonance angiography (TOF-MRA). Aneurysm neck and depth diameters were manually measured by two observers in each modality. Additionally, semi-automatic computer-aided diameter measurements were performed using 3D vessel surface models derived from CE- (CE-com) and TOF-MRA (TOF-com) datasets. Bland-Altman analysis (BA) and intra-class correlation coefficient (ICC) were used to evaluate intra- and inter-observer agreement. BA revealed the narrowest relative limits of intra- and inter-observer agreement for aneurysm neck and depth diameters obtained by TOF-com (ranging between ±5.3 % and ±28.3 %) and CE-com (ranging between ±23.3 % and ±38.1 %). Direct measurements in DSA, TOF-MRA and CE-MRA showed considerably wider limits of agreement. The highest ICCs were observed for TOF-com and CE-com (ICC values, 0.92 or higher for intra- as well as inter-observer reliability). Computer-aided aneurysm measurement in 3D offers improved intra- and inter-observer reliability and a reproducible parameter extraction, which may be used in clinical routine and as objective surrogate end-points in clinical trials.
A French validation study of the Coma Recovery Scale-Revised (CRS-R).

PubMed

Schnakers, Caroline; Majerus, Steve; Giacino, Joseph; Vanhaudenhuyse, Audrey; Bruno, Marie-Aurelie; Boly, Melanie; Moonen, Gustave; Damas, Pierre; Lambermont, Bernard; Lamy, Maurice; Damas, Francois; Ventura, Manfredi; Laureys, Steven

2008-09-01

The aim of the present study was to explore the concurrent validity, inter-rater agreement and diagnostic sensitivity of a French adaptation of the Coma Recovery Scale-Revised (CRS-R) as compared to other coma scales such as the Glasgow Coma Scale (GCS), the Full Outline of UnResponsiveness scale (FOUR) and the Wessex Head Injury Matrix (WHIM). Multi-centric prospective study. To test concurrent validity and diagnostic sensitivity, the four behavioural scales were administered in a randomized order in 77 vegetative and minimally conscious patients. Twenty-four clinicians with different professional backgrounds, levels of expertise and CRS-R experience were recruited to assess inter-rater agreement. Good concurrent validity was obtained between the CRS-R and the three other standardized behavioural scales. Inter-rater reliability for the CRS-R total score and sub-scores was good, indicating that the scale yields reproducible findings across examiners and does not appear to be systematically biased by profession, level of expertise or CRS-R experience. Finally, the CRS-R demonstrated a significantly higher sensitivity to detect MCS patients, as compared to the GCS, the FOUR and the WHIM. The results show that the French version of the CRS-R is a valid and sensitive scale which can be used in severely brain damaged patients by all members of the medical staff.
Reproducibility of electronic tooth colour measurements.

PubMed

Ratzmann, Anja; Klinke, Thomas; Schwahn, Christian; Treichel, Anja; Gedrange, Tomasz

2008-10-01

Clinical methods of investigation, such as tooth colour determination, should be simple, quick and reproducible. The determination of tooth colours usually relies upon manual comparison of a patient's tooth colour with a colour ring. After some days, however, measurement results frequently lack unequivocal reproducibility. This study aimed to examine an electronic method for reliable colour measurement. The colours of the teeth 14 to 24 were determined by three different examiners in 10 subjects using the colour measuring device Shade Inspector. In total, 12 measurements per tooth were taken. Two measurement time points were scheduled to be taken, namely at study onset (T(1)) and after 6 months (T(2)). At either time point, two measurement series per subject were taken by the different examiners at 2-week intervals. The inter-examiner and intra-examiner agreement of the measurement results was assessed. The concordance for lightness and colour intensity (saturation) was represented by the intra-class correlation coefficient. The categorical variable colour shade (hue) was assessed using the kappa statistic. The study results show that tooth colour can be measured independently of the examiner. Good agreement was found between the examiners.
A comparison of four measures of moral reasoning.

PubMed

Wilmoth, G H; McFarland, S G

1977-08-01

Kohlberg's Moral Judgment Scale, Gilligan et al.'s Sexual Moral Judgment Scale, Maitland and Goldman's Objective Moral Judgment Scale, and Hogan's Maturity of Moral Judgment Scale were examined for reliability and inter-scale relationships. All measures except the Objective Moral Judgment Scale had good reliabilities. The obtained relations between the Moral Judgment Scale and the Sexual Moral Judgment Scale replicated previous research. The Objective Moral Judgment Scale was not found to validly assess the Kohlberg stages. The Maturity of Moral Judgment Scale scores were strongly related to the subjects' classification on the Kohlberg stages, and the scale appears to offer a reliable, quickly scored, and valid index of mature thought, although the scale's continuous scores do not permit clear stage classification.
Evaluating the Inter-Respondent (Consumer vs. Staff) Reliability and Construct Validity (SIS vs. Vineland) of the Supports Intensity Scale on a Dutch Sample

ERIC Educational Resources Information Center

Claes, C.; Van Hove, G.; van Loon, J.; Vandevelde, S.; Schalock, R. L.

2009-01-01

Background: Despite various reliability studies on the Supports Intensity Scale (SIS), to date there has not been an evaluation of the reliability of client vs. staff judgments. Such determination is important, given the increasing consumer-driven approach to services. Additionally, there has not been an evaluation of the instrument's construct…
The reliability and validity of video analysis for the assessment of the clinical signs of concussion in Australian football.

PubMed

Makdissi, Michael; Davis, Gavin

2016-10-01

The objective of this study was to determine the reliability and validity of identifying clinical signs of concussion using video analysis in Australian football. Prospective cohort study. All impacts and collisions potentially resulting in a concussion were identified during 2012 and 2013 Australian Football League seasons. Consensus definitions were developed for clinical signs associated with concussion. For intra- and inter-rater reliability analysis, two experienced clinicians independently assessed 102 randomly selected videos on two occasions. Sensitivity, specificity, positive and negative predictive values were calculated based on the diagnosis provided by team medical staff. 212 incidents resulting in possible concussion were identified in 414 Australian Football League games. The intra-rater reliability of the video-based identification of signs associated with concussion was good to excellent. Inter-rater reliability was good to excellent for impact seizure, slow to get up, motor incoordination, ragdoll appearance (2 of 4 analyses), clutching at head and facial injury. Inter-rater reliability for loss of responsiveness and blank and vacant look was only fair and did not reach statistical significance. The feature with the highest sensitivity was slow to get up (87%), but this sign had a low specificity (19%). Other video signs had a high specificity but low sensitivity. Blank and vacant look (100%) and motor incoordination (81%) had the highest positive predictive value. Video analysis may be a useful adjunct to the side-line assessment of a possible concussion. Video analysis however should not replace the need for a thorough multimodal clinical assessment. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
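The validity measures in this abstract all derive from a 2×2 table of video sign (present or absent) against the reference diagnosis. The following Python sketch shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 table counts.

    tp: sign present, concussion diagnosed    fp: sign present, no concussion
    fn: sign absent, concussion diagnosed     tn: sign absent, no concussion
    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # diagnosed cases that showed the sign
        "specificity": tn / (tn + fp),  # non-cases that did not show the sign
        "ppv": tp / (tp + fp),          # incidents with the sign that were cases
        "npv": tn / (tn + fn),          # incidents without the sign that were non-cases
    }


# Hypothetical counts for a single video sign.
for name, value in diagnostic_metrics(tp=8, fp=2, fn=2, tn=88).items():
    print(f"{name}: {value:.2f}")
```

A sign such as "slow to get up" can combine high sensitivity with low specificity when it appears in most incidents, whether or not a concussion is later diagnosed.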
Radiologic analysis of hindfoot alignment: Comparison of Méary, long axial, and hindfoot alignment views.

PubMed

Neri, T; Barthelemy, R; Tourné, Y

2017-12-01

Among radiographic views available for assessing hindfoot alignment, the antero-posterior weight-bearing view with metal cerclage of the hindfoot (Méary view) is the most widely used in France. Internationally, the long axial view (LAV) and hindfoot alignment view (HAV) are also used. The objective of this study was to compare the reliability of these three views. The hypothesis was that the Méary view with cerclage of the hindfoot is as reliable as the LAV and HAV for assessing hindfoot alignment. All three views were obtained in each of 22 prospectively included patients. Intra-observer and inter-observer reliabilities were assessed by having two observers collect the radiographic measurements then computing the intra-class correlation coefficients (ICCs). The intra-observer and inter-observer ICCs were 0.956 and 0.988 with the Méary view, 0.990 and 0.765 with the HAV, and 0.997 and 0.991 with the LAV, respectively. Correlations were far stronger between the LAV and HAV than between each of these and the Méary view. Compared to the LAV and HAV, the Méary view indicated a greater degree of hindfoot valgus. Intra-observer reliability was excellent with both the LAV and HAV, whereas inter-observer reliability was better with the LAV. Excellent reliability was also obtained with the Méary view. Combining the Méary view to obtain a radiographic image of the clinical deformity with the LAV to measure the angular deviation of the hindfoot axis may be useful when assessing hindfoot malalignment. A comparison of the three views in a larger population is needed before clinical recommendations can be made. Level of evidence: II, prospective study. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Clinical assessment of effusion in knee osteoarthritis—A systematic review

PubMed Central

Maricar, Nasimah; Callaghan, Michael J.; Parkes, Matthew J.; Felson, David T.; O'Neill, Terence W.

2016-01-01

Objective. The aim of this systematic review was to determine the validity and inter- and intra-observer reliability of the assessment of knee joint effusion in osteoarthritis (OA) of the knee. Methods. MEDLINE, Web of Knowledge, CINAHL, EMBASE, and AMED were searched from their inception to February 2015. Articles were included according to a priori defined criteria: samples containing participants with knee OA; prospective evaluation of clinical tests and assessments of knee effusion that included reliability, sensitivity, and specificity of these tests. Results. A total of 10 publications were reviewed. Eight of these considered reliability and four considered the validity of clinical assessments against ultrasound effusion. It was not possible to undertake a meta-analysis of reliability or validity because of differences in study designs and the clinical tests. Intra-observer kappa agreement for visible swelling ranged from 0.37 (suprapatellar) to 1.0 (prepatellar); for bulge sign 0.47 and balloon sign 0.37. Inter-observer kappa agreement for visible swelling ranged from −0.02 (prepatellar) to 0.65 (infrapatellar), the balloon sign −0.11 to 0.82, patellar tap −0.02 to 0.75 and bulge sign kappa −0.04 to 0.14 or reliability coefficient 0.97. Reliability and diagnostic accuracy tended to be better in experienced observers. Very few data looked at performance of individual clinical tests with sensitivity ranging 18.2–85.7% and specificity 35.3–93.3%, both higher with larger effusions. Conclusion. The majority of unstandardized clinical tests to assess joint effusion in knee OA had relatively low intra- and inter-observer reliability. There is some evidence that experience improved reliability and diagnostic accuracy of tests. Currently there is insufficient evidence to recommend any particular test in clinical practice. PMID:26581486
Clinical assessment of effusion in knee osteoarthritis-A systematic review.

PubMed

Maricar, Nasimah; Callaghan, Michael J; Parkes, Matthew J; Felson, David T; O'Neill, Terence W

2016-04-01

The aim of this systematic review was to determine the validity and inter- and intra-observer reliability of the assessment of knee joint effusion in osteoarthritis (OA) of the knee. MEDLINE, Web of Knowledge, CINAHL, EMBASE, and AMED were searched from their inception to February 2015. Articles were included according to a priori defined criteria: samples containing participants with knee OA; prospective evaluation of clinical tests and assessments of knee effusion that included reliability, sensitivity, and specificity of these tests. A total of 10 publications were reviewed. Eight of these considered reliability and four considered the validity of clinical assessments against ultrasound effusion. It was not possible to undertake a meta-analysis of reliability or validity because of differences in study designs and the clinical tests. Intra-observer kappa agreement for visible swelling ranged from 0.37 (suprapatellar) to 1.0 (prepatellar); for bulge sign 0.47 and balloon sign 0.37. Inter-observer kappa agreement for visible swelling ranged from -0.02 (prepatellar) to 0.65 (infrapatellar), the balloon sign -0.11 to 0.82, patellar tap -0.02 to 0.75 and bulge sign kappa -0.04 to 0.14 or reliability coefficient 0.97. Reliability and diagnostic accuracy tended to be better in experienced observers. Very few data looked at performance of individual clinical tests with sensitivity ranging 18.2-85.7% and specificity 35.3-93.3%, both higher with larger effusions. The majority of unstandardized clinical tests to assess joint effusion in knee OA had relatively low intra- and inter-observer reliability. There is some evidence that experience improved reliability and diagnostic accuracy of tests. Currently there is insufficient evidence to recommend any particular test in clinical practice. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs

PubMed Central

Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne

2014-01-01

This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to dis-entangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent–teacher and 19 mother–father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent–teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother–father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings. PMID:24994985
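The distinction the abstract above draws between correlation and agreement can be made concrete with a small Python sketch. It compares the Pearson correlation with an absolute-agreement intraclass correlation for two raters whose scores differ by a constant offset. The choice of ICC(2,1) (two-way random effects, absolute agreement, single rater, in Shrout and Fleiss notation) is an assumption made for illustration; the abstract does not say which ICC form the authors used, and the ratings are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject, one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Invented scores: rater B is always exactly 20 points above rater A.
rater_a = [10, 20, 30, 40, 50]
rater_b = [30, 40, 50, 60, 70]

r = pearson_r(rater_a, rater_b)
icc = icc_2_1([list(pair) for pair in zip(rater_a, rater_b)])
print(f"Pearson r = {r:.3f}, ICC(2,1) = {icc:.3f}")
```

In this example the two raters rank the children identically, so the correlation is perfect, while the systematic offset pulls the absolute-agreement ICC well below 1. That is the kind of discrepancy that motivates reporting reliability, agreement and correlation separately.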
Tests examining skill outcomes in sport: a systematic review of measurement properties and feasibility.

PubMed

Robertson, Samuel J; Burnett, Angus F; Cochrane, Jodie

2014-04-01

A high level of participant skill is influential in determining the outcome of many sports. Thus, tests assessing skill outcomes in sport are commonly used by coaches and researchers to estimate an athlete's ability level, to evaluate the effectiveness of interventions or for the purpose of talent identification. The objective of this systematic review was to examine the methodological quality, measurement properties and feasibility characteristics of sporting skill outcome tests reported in the peer-reviewed literature. A search of both SPORTDiscus and MEDLINE databases was undertaken. Studies that examined tests of sporting skill outcomes were reviewed. Only studies that investigated measurement properties of the test (reliability or validity) were included. A total of 22 studies met the inclusion/exclusion criteria. A customised checklist of assessment criteria, based on previous research, was utilised for the purpose of this review. A range of sports were the subject of the 22 studies included in this review, with considerations relating to methodological quality being generally well addressed by authors. A range of methods and statistical procedures were used by researchers to determine the measurement properties of their skill outcome tests. The majority (95%) of the reviewed studies investigated test-retest reliability, and where relevant, inter and intra-rater reliability was also determined. Content validity was examined in 68% of the studies, with most tests investigating multiple skill domains relevant to the sport. Only 18% of studies assessed all three reviewed forms of validity (content, construct and criterion), with just 14% investigating the predictive validity of the test. Test responsiveness was reported in only 9% of studies, whilst feasibility received varying levels of attention. In organised sport, further tests may exist which have not been investigated in this review. 
This could be due to such tests firstly not being published in the peer-review literature and secondly, not having their measurement properties (i.e., reliability or validity) examined formally. Of the 22 studies included in this review, items relating to test methodological quality were, on the whole, well addressed. Test-retest reliability was determined in all but one of the reviewed studies, whilst most studies investigated at least two aspects of validity (i.e., content, construct or criterion-related validity). Few studies examined predictive validity or responsiveness. While feasibility was addressed in over half of the studies, practicality and test limitations were rarely addressed. Consideration of study quality, measurement properties and feasibility components assessed in this review can assist future researchers when developing or modifying tests of sporting skill outcomes.
Test-retest resting-state fMRI in healthy elderly persons with a family history of Alzheimer's disease.

PubMed

Orban, Pierre; Madjar, Cécile; Savard, Mélissa; Dansereau, Christian; Tam, Angela; Das, Samir; Evans, Alan C; Rosa-Neto, Pedro; Breitner, John C S; Bellec, Pierre

2015-01-01

We present a test-retest dataset of resting-state fMRI data obtained in 80 cognitively normal elderly volunteers enrolled in the "Pre-symptomatic Evaluation of Novel or Experimental Treatments for Alzheimer's Disease" (PREVENT-AD) Cohort. Subjects with a family history of Alzheimer's disease in first-degree relatives were recruited as part of an on-going double blind randomized clinical trial of Naproxen or placebo. Two pairs of scans were acquired ~3 months apart, allowing the assessment of both intra- and inter-session reliability, with the possible caveat of treatment effects as a source of inter-session variation. Using the NeuroImaging Analysis Kit (NIAK), we report on the standard quality of co-registration and motion parameters of the data, and assess their validity based on the spatial distribution of seed-based connectivity maps as well as intra- and inter-session reliability metrics in the default-mode network. This resource, released publicly as sample UM1 of the Consortium for Reliability and Reproducibility (CoRR), will benefit future studies focusing on the preclinical period preceding the appearance of dementia in Alzheimer's disease.
Reliability and validity of neurobehavioral function on the Psychology Experimental Building Language test battery in young adults

PubMed Central

Mueller, Shane T.; Geerken, Alexander R.; Dixon, Kyle L.; Kroliczak, Gregory; Olsen, Reid H.J.; Miller, Jeremy K.

2015-01-01

Background. The Psychology Experiment Building Language (PEBL) software consists of over one hundred computerized tests based on classic and novel cognitive neuropsychology and behavioral neurology measures. Although the PEBL tests are becoming more widely utilized, there is currently very limited information about the psychometric properties of these measures. Methods. Study I examined inter-relationships among nine PEBL tests including indices of motor-function (Pursuit Rotor and Dexterity), attention (Test of Attentional Vigilance and Time-Wall), working memory (Digit Span Forward), and executive-function (PEBL Trail Making Test, Berg/Wisconsin Card Sorting Test, Iowa Gambling Test, and Mental Rotation) in a normative sample (N = 189, ages 18–22). Study II evaluated test–retest reliability with a two-week inter-test interval between administrations in a separate sample (N = 79, ages 18–22). Results. Moderate intra-test, but low inter-test, correlations were observed and ceiling/floor effects were uncommon. Sex differences were identified on the Pursuit Rotor (Cohen’s d = 0.89) and Mental Rotation (d = 0.31) tests. The correlation between the test and retest was high for tests of motor learning (Pursuit Rotor time on target r = .86) and attention (Test of Attentional Vigilance response time r = .79), intermediate for memory (digit span r = .63) but lower for the executive function indices (Wisconsin/Berg Card Sorting Test perseverative errors r = .45, Tower of London moves r = .15). Significant practice effects were identified on several indices of executive function. Conclusions. These results are broadly supportive of the reliability and validity of individual PEBL tests in this sample. These findings indicate that the freely downloadable, open-source PEBL battery (http://pebl.sourceforge.net) is a versatile research tool to study individual differences in neurocognitive performance. PMID:26713233
Reliability and validity of neurobehavioral function on the Psychology Experimental Building Language test battery in young adults.

PubMed

Piper, Brian J; Mueller, Shane T; Geerken, Alexander R; Dixon, Kyle L; Kroliczak, Gregory; Olsen, Reid H J; Miller, Jeremy K

2015-01-01

Background. The Psychology Experiment Building Language (PEBL) software consists of over one hundred computerized tests based on classic and novel cognitive neuropsychology and behavioral neurology measures. Although the PEBL tests are becoming more widely utilized, there is currently very limited information about the psychometric properties of these measures. Methods. Study I examined inter-relationships among nine PEBL tests including indices of motor-function (Pursuit Rotor and Dexterity), attention (Test of Attentional Vigilance and Time-Wall), working memory (Digit Span Forward), and executive-function (PEBL Trail Making Test, Berg/Wisconsin Card Sorting Test, Iowa Gambling Test, and Mental Rotation) in a normative sample (N = 189, ages 18-22). Study II evaluated test-retest reliability with a two-week inter-test interval between administrations in a separate sample (N = 79, ages 18-22). Results. Moderate intra-test, but low inter-test, correlations were observed and ceiling/floor effects were uncommon. Sex differences were identified on the Pursuit Rotor (Cohen's d = 0.89) and Mental Rotation (d = 0.31) tests. The correlation between the test and retest was high for tests of motor learning (Pursuit Rotor time on target r = .86) and attention (Test of Attentional Vigilance response time r = .79), intermediate for memory (digit span r = .63) but lower for the executive function indices (Wisconsin/Berg Card Sorting Test perseverative errors r = .45, Tower of London moves r = .15). Significant practice effects were identified on several indices of executive function. Conclusions. These results are broadly supportive of the reliability and validity of individual PEBL tests in this sample. These findings indicate that the freely downloadable, open-source PEBL battery (http://pebl.sourceforge.net) is a versatile research tool to study individual differences in neurocognitive performance.
A system framework of inter-enterprise machining quality control based on fractal theory

NASA Astrophysics Data System (ADS)

Zhao, Liping; Qin, Yongtao; Yao, Yiyong; Yan, Peng

2014-03-01

In order to meet the quality control requirement of dynamic and complicated product machining processes among enterprises, a system framework of inter-enterprise machining quality control based on fractal was proposed. In this system framework, the fractal-specific characteristic of inter-enterprise machining quality control function was analysed, and the model of inter-enterprise machining quality control was constructed by the nature of fractal structures. Furthermore, the goal-driven strategy of inter-enterprise quality control and the dynamic organisation strategy of inter-enterprise quality improvement were constructed by the characteristic analysis on this model. In addition, the architecture of inter-enterprise machining quality control based on fractal was established by means of Web service. Finally, a case study for application was presented. The result showed that the proposed method was available, and could provide guidance for quality control and support for product reliability in inter-enterprise machining processes.
Development and Validation of a Practical Instrument for Injury Prevention: The Occupational Safety and Health Monitoring and Assessment Tool (OSH-MAT).

PubMed

Sun, Yi; Arning, Martin; Bochmann, Frank; Börger, Jutta; Heitmann, Thomas

2018-06-01

The Occupational Safety and Health Monitoring and Assessment Tool (OSH-MAT) is a practical instrument that is currently used in the German woodworking and metalworking industries to monitor safety conditions at workplaces. The 12-item scoring system has three subscales rating technical, organizational, and personnel-related conditions in a company. Each item has a rating value ranging from 1 to 9, with higher values indicating a higher standard of safety conditions. The reliability of this instrument was evaluated in a cross-sectional survey among 128 companies and its validity among 30,514 companies. The inter-rater reliability of the instrument was examined independently and simultaneously by two well-trained safety engineers. Agreement between the double ratings was quantified by the intraclass correlation coefficient and absolute agreement of the rating values. The content validity of the OSH-MAT was evaluated by quantifying the association between OSH-MAT values and 5-year average injury rates by Poisson regression analysis adjusted for the size of the companies and industrial sectors. The construct validity of OSH-MAT was examined by principal component factor analysis. Our analysis indicated good to very good inter-rater reliability (intraclass correlation coefficient = 0.64-0.74) of OSH-MAT values with an absolute agreement of between 72% and 81%. Factor analysis identified three component subscales that met exactly the structure theory of this instrument. The Poisson regression analysis demonstrated a statistically significant exposure-response relationship between OSH-MAT values and the 5-year average injury rates. These analyses indicate that OSH-MAT is a valid and reliable instrument that can be used effectively to monitor safety conditions at workplaces.

The acceleration dependent validity and reliability of 10 Hz GPS.

PubMed

Akenhead, Richard; French, Duncan; Thompson, Kevin G; Hayes, Philip R

2014-09-01

To examine the validity and inter-unit reliability of 10 Hz GPS for measuring instantaneous velocity during maximal accelerations. Experimental. Two 10 Hz GPS devices secured to a sliding platform mounted on a custom built monorail were towed whilst sprinting maximally over 10 m. Displacement of GPS devices was measured using a laser sampling at 2000 Hz, from which velocity and mean acceleration were derived. Velocity data was pooled into acceleration thresholds according to mean acceleration. Agreement between laser and GPS measures of instantaneous velocity within each acceleration threshold was examined using least squares linear regression and Bland-Altman limits of agreement (LOA). Inter-unit reliability was expressed as typical error (TE) and a Pearson correlation coefficient. Mean bias ± 95% LOA during accelerations of 0-0.99 m·s⁻² was 0.12 ± 0.27 m·s⁻¹, decreasing to -0.40 ± 0.67 m·s⁻¹ during accelerations >4 m·s⁻². Standard error of the estimate ± 95% CI (SEE) increased from 0.12 ± 0.02 m·s⁻¹ during accelerations of 0-0.99 m·s⁻² to 0.32 ± 0.06 m·s⁻¹ during accelerations >4 m·s⁻². TE increased from 0.05 ± 0.01 to 0.12 ± 0.01 m·s⁻¹ during accelerations of 0-0.99 m·s⁻² and >4 m·s⁻² respectively. The validity and reliability of 10 Hz GPS for the measurement of instantaneous velocity has been shown to be inversely related to acceleration. Those using 10 Hz GPS should be aware that during accelerations of over 4 m·s⁻², accuracy is compromised. Copyright © 2013 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
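The agreement statistics named in the abstract above can be sketched in a few lines of Python. The example computes the Bland-Altman mean bias with 95% limits of agreement (bias ± 1.96 × SD of the paired differences) and a typical error taken as the SD of the between-unit differences divided by √2. That typical-error definition is a common convention assumed here; the abstract does not state the exact formula used. All velocity values are invented, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(test, criterion):
    """Return (bias, lower LOA, upper LOA) for paired measurements."""
    diffs = [t - c for t, c in zip(test, criterion)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the differences
    return bias, bias - half_width, bias + half_width


def typical_error(unit_a, unit_b):
    """Typical error between two units: SD of differences / sqrt(2)."""
    diffs = [a - b for a, b in zip(unit_a, unit_b)]
    return stdev(diffs) / sqrt(2)


# Invented instantaneous velocities in m/s (not data from the study).
laser = [2.0, 3.1, 4.0, 5.0, 6.0]   # criterion measure
gps_1 = [2.1, 3.0, 4.2, 5.1, 6.3]   # first GPS unit
gps_2 = [2.0, 3.1, 4.1, 5.2, 6.2]   # second GPS unit

bias, lower, upper = bland_altman(gps_1, laser)
te = typical_error(gps_1, gps_2)
print(f"bias = {bias:.2f} m/s, 95% LOA = [{lower:.2f}, {upper:.2f}] m/s")
print(f"typical error = {te:.3f} m/s")
```

Running the same calculation separately within each acceleration band would mirror the way the abstract reports bias and typical error per acceleration threshold.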
Comparison of human septal nuclei MRI measurements using automated segmentation and a new manual protocol based on histology

PubMed Central

Butler, Tracy; Zaborszky, Laszlo; Pirraglia, Elizabeth; Li, Jinyu; Wang, Xiuyuan Hugh; Li, Yi; Tsui, Wai; Talos, Delia; Devinsky, Orrin; Kuchna, Izabela; Nowicki, Krzysztof; French, Jacqueline; Kuzniecky, Rubin; Wegiel, Jerzy; Glodzik, Lidia; Rusinek, Henry; DeLeon, Mony J.; Thesen, Thomas

2014-01-01

Septal nuclei, located in basal forebrain, are strongly connected with hippocampi and important in learning and memory, but have received limited research attention in human MRI studies. While probabilistic maps for estimating septal volume on MRI are now available, they have not been independently validated against manual tracing of MRI, typically considered the gold standard for delineating brain structures. We developed a protocol for manual tracing of the human septal region on MRI based on examination of neuroanatomical specimens. We applied this tracing protocol to T1 MRI scans (n=86) from subjects with temporal epilepsy and healthy controls to measure septal volume. To assess the inter-rater reliability of the protocol, a second tracer used the same protocol on 20 scans that were randomly selected from the 72 healthy controls. In addition to measuring septal volume, maximum septal thickness between the ventricles was measured and recorded. The same scans (n=86) were also analysed using septal probabilistic maps and Dartel toolbox in SPM. Results show that our manual tracing algorithm is reliable, and that septal volume measurements obtained via manual and automated methods correlate significantly with each other (p<.001). Both manual and automated methods detected significantly enlarged septal nuclei in patients with temporal lobe epilepsy in accord with a proposed compensatory neuroplastic process related to the strong connections between septal nuclei and hippocampi. Septal thickness, which was simple to measure with excellent inter-rater reliability, correlated well with both manual and automated septal volume, suggesting it could serve as an easy-to-measure surrogate for septal volume in future studies. Our results call attention to the important though understudied human septal region, confirm its enlargement in temporal lobe epilepsy, and provide a reliable new manual delineation protocol that will facilitate continued study of this critical region. 
PMID:24736183
Comparison of human septal nuclei MRI measurements using automated segmentation and a new manual protocol based on histology.

PubMed

Butler, Tracy; Zaborszky, Laszlo; Pirraglia, Elizabeth; Li, Jinyu; Wang, Xiuyuan Hugh; Li, Yi; Tsui, Wai; Talos, Delia; Devinsky, Orrin; Kuchna, Izabela; Nowicki, Krzysztof; French, Jacqueline; Kuzniecky, Rubin; Wegiel, Jerzy; Glodzik, Lidia; Rusinek, Henry; deLeon, Mony J; Thesen, Thomas

2014-08-15

Septal nuclei, located in basal forebrain, are strongly connected with hippocampi and important in learning and memory, but have received limited research attention in human MRI studies. While probabilistic maps for estimating septal volume on MRI are now available, they have not been independently validated against manual tracing of MRI, typically considered the gold standard for delineating brain structures. We developed a protocol for manual tracing of the human septal region on MRI based on examination of neuroanatomical specimens. We applied this tracing protocol to T1 MRI scans (n=86) from subjects with temporal epilepsy and healthy controls to measure septal volume. To assess the inter-rater reliability of the protocol, a second tracer used the same protocol on 20 scans that were randomly selected from the 72 healthy controls. In addition to measuring septal volume, maximum septal thickness between the ventricles was measured and recorded. The same scans (n=86) were also analyzed using septal probabilistic maps and DARTEL toolbox in SPM. Results show that our manual tracing algorithm is reliable, and that septal volume measurements obtained via manual and automated methods correlate significantly with each other (p<.001). Both manual and automated methods detected significantly enlarged septal nuclei in patients with temporal lobe epilepsy in accord with a proposed compensatory neuroplastic process related to the strong connections between septal nuclei and hippocampi. Septal thickness, which was simple to measure with excellent inter-rater reliability, correlated well with both manual and automated septal volume, suggesting it could serve as an easy-to-measure surrogate for septal volume in future studies. Our results call attention to the important though understudied human septal region, confirm its enlargement in temporal lobe epilepsy, and provide a reliable new manual delineation protocol that will facilitate continued study of this critical region. 
Copyright © 2014 Elsevier Inc. All rights reserved.
Development and initial validation of the Localized Scleroderma Skin Damage Index and Physician Global Assessment of disease Damage: a proof-of-concept study

PubMed Central

Vilaiyuk, Soamarat; Torok, Kathryn S.; Medsger, Thomas A.

2010-01-01

Objective. To develop and assess the psychometric properties of the Localized Scleroderma (LS) Skin Damage Index (LoSDI) and Physician Global Assessment of disease Damage (PGA-D). Methods. Damage was defined as irreversible/persistent changes (>6 months) due to previous active disease/complications of therapy. Eight rheumatologists assessed the importance of 17 variables in formulating the PGA-D/LoSDI. LS patients were evaluated by two rheumatologists using both tools to assess their psychometric properties. LoSDI was calculated by summing three scores for cutaneous features of damage [dermal atrophy (DAT), subcutaneous atrophy (SAT) and dyspigmentation (DP)] measured at 18 anatomic sites. Patient GA of disease severity (PtGA-S), Children's Dermatology Life Quality Index (CDLQI) and PGA-D were recorded at the time of each examination. Results. Thirty LS patients (112 lesions) and nine patient-visit pairs (18 lesions) were included for inter- and intra-rater reliability study. LoSDI and its domains DAT, SAT, DP and PGA-D demonstrated excellent inter- and intra-rater reliability (reliability coefficients 0.86–0.99 and 0.74–0.96, respectively). LoSDI correlated moderately with PGA-D and poorly with PtGA-S and CDLQI. PGA-D correlated moderately with PtGA-S, but poorly with CDLQI. Conclusions. To complete the LS Cutaneous Assessment Tool (LoSCAT), we developed and evaluated the psychometric properties of the LoSDI and PGA-D in addition to the LS Skin Severity Index (LoSSI). These instruments will facilitate evaluation of LS patients for individual patient management and clinical trials. LoSDI and PGA-D demonstrated excellent reliability and high validity. LoSCAT provides an improved understanding of LS natural history. Further study in a larger group of patients is needed to confirm these preliminary findings. PMID:20008472
Fluorescein angiography-based diagnosis for retinopathy of prematurity: expert-non expert comparison.

PubMed

Guagliano, Rosanna; Barillà, Donatella; Bertone, Chiara; Maffia, Anna; Periti, Francesca; Spallone, Laura; Anselmetti, Giovanni; Giacosa, Elisabetta; Stronati, Mauro; Tinelli, Carmine; Bianchi, Paolo Emilio

2013-01-01

To evaluate accuracy and inter-rater reliability of RetCam fundus images and digital camera fluorangioscopic images in acute retinopathy of prematurity (ROP) by comparing diagnoses given by trainee ophthalmologists with those provided by expert ophthalmologists.   This is a multicenter retrospective observational study of diagnostic data from 48 eyes of 24 premature infants with classical ROP, stage II, as evaluated by RetCam 3 and fluorescein angiography (FA). Average gestational age was 25.4 weeks, average weight 804.7 g. A staging grid (with ocular fundus divided into 3 concentric zones) and 24 15° sectors centered around the optic papilla were superimposed on 360° retina photomontages (Photoshop) made from RetCam and FA images. Non expert vs expert diagnosis agreement was measured for each sector by means of Cohen kappa (Fleiss, 1981).  A high degree of concordance was found. Inter-rater agreement between expert and non expert interpretations of retinal photomontages was greater for fluorangiographic images than for RetCam images, with κ = 0.61-1 for 120/152 (78.9%) sectors examined on the RetCam images and  κ = 0.61-1 for 168/198 (84.8%) sectors examined on the FA images.  The FA images appear to be easier to interpret than RetCam images, both by expert and non expert ophthalmologists. The results confirm that FA is a good examination technique with a high degree of reliability, even where trainee practitioners are involved. This suggests that retinopathy management can be improved by entrusting diagnostic responsibilities to trainee ophthalmologists, in order to extend access to correct diagnosis, recognition of threshold lesions, and prompt treatment.
Anthropometric Study of Three-Dimensional Facial Morphology in Malay Adults

PubMed Central

Majawit, Lynnora Patrick; Mohd Razi, Roziana

2016-01-01

Objectives To establish the three-dimensional (3D) facial soft tissue morphology of adult Malaysian subjects of the Malay ethnic group; and to determine the morphological differences between the genders, using a non-invasive stereo-photogrammetry 3D camera. Material and Methods One hundred and nine subjects participated in this research, 54 Malay men and 55 Malay women, aged 20–30 years old with healthy BMI and with no adverse skeletal deviation. Twenty-three facial landmarks were identified on 3D facial images captured using a VECTRA M5-360 Head System (Canfield Scientific Inc, USA). Two angular, 3 ratio and 17 linear measurements were identified using Canfield Mirror imaging software. Intra- and inter-examiner reliability tests were carried out using 10 randomly selected images, analyzed using the intra-class correlation coefficient (ICC). Multivariate analysis of variance (MANOVA) was carried out to investigate morphologic differences between genders. Results ICC scores were generally good for both intra-examiner (range 0.827–0.987) and inter-examiner reliability (range 0.700–0.983) tests. Generally, all facial measurements were larger in men than women, except the facial profile angle which was larger in women. Clinically significant gender dimorphisms existed in biocular width, nose height, nasal bridge length, face height and lower face height values (mean difference > 3mm). Clinical significance was set at 3mm. Conclusion Facial soft tissue morphological values can be gathered efficiently and measured effectively from images captured by a non-invasive stereo-photogrammetry 3D camera. Adult men in Malaysia when compared to women had a wider distance between the eyes, a longer and more prominent nose and a longer face. PMID:27706220
The inter-rater reliability of the incontinence-associated dermatitis intervention tool-D (IADIT-D) between two independent registered nurses of nursing home residents in long-term care facilities.

PubMed

Braunschmidt, Brigitte; Müller, Gerhard; Jukic-Puntigam, Margareta; Steininger, Alfred

2013-01-01

Incontinence-associated dermatitis (IAD) is the clinical manifestation of moisture related skin damage (Beeckman, Woodward, & Gray, 2011). Valid assessment instruments are needed for risk assessment and classification of IAD. The aim of this quantitative-descriptive cross-sectional study was to determine the inter-rater reliability of the item scores of the German Incontinence Associated Dermatitis Intervention Tool (IADIT-D) between two independent assessors of nursing home residents (n = 381) in long-term care facilities. The 19 pairs of assessors consisted of registered nurses. The data analysis began with the calculation of the total percentage of agreement. Because this value is not adjusted for chance agreement, the Kappa coefficients and the AC1 statistic were calculated as well. The total percentage of the inter-rater agreement was 84% (n = 319). In a second step of analysis, the calculation across all items yielded high (kappa = .70) and very high agreement (AC1 = .83) levels, respectively. For the risk assessment (kappa = .82; AC1 = .94), the values amounted to very high agreement levels and for the classification (kappa(w) = .70; AC1 = .76) to high agreement levels. The high to very high agreement values of IADIT-D demonstrate that the items can be regarded as stable with regard to inter-rater reliability for use in long-term care facilities. In addition, further validation studies are needed.
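The three statistics named in the abstract above (raw percent agreement, Cohen's kappa, and Gwet's AC1) can be compared side by side for two raters. The following is a minimal sketch, not the authors' analysis: the function name `agreement_stats` and the example ratings are hypothetical, and the data are invented to show how kappa can fall well below AC1 when one category dominates.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b):
    """Return (percent agreement, Cohen's kappa, Gwet's AC1) for two raters.

    Hypothetical helper for illustration; ratings are nominal category labels.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories must be observed")

    # Observed agreement: share of subjects rated identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    count_a, count_b = Counter(rater_a), Counter(rater_b)

    # Cohen's kappa: chance agreement is the product of the raters' marginals.
    p_e_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

    # Gwet's AC1: chance agreement uses the average marginal per category.
    pi = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}
    p_e_ac1 = sum(pi[c] * (1 - pi[c]) for c in categories) / (q - 1)
    ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)

    return p_o, kappa, ac1


# Invented example: 20 residents, condition rated "absent" (0) for most.
nurse_1 = [0] * 17 + [1, 0, 1]
nurse_2 = [0] * 17 + [0, 1, 1]
p_o, kappa, ac1 = agreement_stats(nurse_1, nurse_2)
print(f"agreement={p_o:.2f} kappa={kappa:.2f} AC1={ac1:.2f}")
```

With skewed prevalence like this, percent agreement is high while kappa is pulled down by its large chance-agreement term; AC1 is less sensitive to that effect, which is one common reason for reporting both.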
Identifying and classifying hyperostosis frontalis interna via computerized tomography.

PubMed

May, Hila; Peled, Nathan; Dar, Gali; Hay, Ori; Abbas, Janan; Masharawi, Youssef; Hershkovitz, Israel

2010-12-01

The aim of this study was to recognize the radiological characteristics of hyperostosis frontalis interna (HFI) and to establish a valid and reliable method for its identification and classification. A reliability test was carried out on 27 individuals who had undergone a head computerized tomography (CT) scan. Intra-observer reliability was obtained by examining the images three times, by the same researcher, with a 2-week interval between each sample ranking. The inter-observer test was performed by three independent researchers. A validity test was carried out using two methods for identifying and classifying HFI: 46 cadaver skullcaps were ranked twice via computerized tomography scans and then by direct observation. Reliability and validity were calculated using Kappa test (SPSS 15.0). Reliability tests of ranking HFI via CT scans demonstrated good results (K > 0.7). As for validity, a very good consensus was obtained between the CT and direct observation, when moderate and advanced types of HFI were present (K = 0.82). The suggested classification method for HFI, using CT, demonstrated a sensitivity of 84%, specificity of 90.5%, and positive predictive value of 91.3%. In conclusion, volume rendering is a reliable and valid tool for identifying HFI. The suggested three-scale classification is most suitable for radiological diagnosis of the phenomenon. Considering the increasing awareness of HFI as an early indicator of a developing malady, this study may assist radiologists in identifying and classifying the phenomenon.
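Sensitivity, specificity, and positive predictive value all derive from a 2×2 table of test result against reference standard. The abstract above does not report that table, so the counts in this sketch are an assumption: they were chosen only because they sum to 46 and reproduce the three reported percentages. The function name `diagnostic_metrics` is likewise hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute sensitivity, specificity and PPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all with the condition
        "specificity": tn / (tn + fp),  # true negatives among all without it
        "ppv": tp / (tp + fp),          # true positives among all positive calls
    }


# Assumed counts (NOT given in the abstract); they total 46 observations.
metrics = diagnostic_metrics(tp=21, fn=4, fp=2, tn=19)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

Other tables could yield the same rounded percentages, so this should be read as a worked illustration of the definitions rather than a reconstruction of the study data.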
General motor function assessment scale--reliability of a Norwegian version.

PubMed

Langhammer, Birgitta; Lindmark, Birgitta

2014-01-01

The General Motor Function assessment scale (GMF) measures activity-related dependence, pain and insecurity among older people in frail health. The aim of the present study was to translate the GMF into a Norwegian version (N-GMF) and establish its reliability and clinical feasibility. The procedure used in translating the GMF was a forward and backward process, testing a convenience sample of 30 frail elderly people with it. The intra-rater reliability tests were performed by three physiotherapists, and the inter-rater reliability test was done by the same three plus nine independent colleagues. The statistical analyses were performed with a pairwise analysis for intra- and inter-rater reliability, using Cronbach's α, Percentage Agreement (PA), Svensson's rank transformable method and Cohen's κ. The Cronbach's α coefficients for the different subscales of N-GMF were 0.68 for Dependency, 0.73 for Pain and 0.75 for Insecurity. Intra-rater reliability: The variation in the PA for the total score was 40-70% in Dependence, 30-40% in Pain and 30-60% in Insecurity. The Relative Rank Variant (RV) indicated a modest individual bias and an augmented rank-order agreement coefficient ra of 0.96, 0.96 and 0.99, respectively. The variation in the κ statistics was 0.27-0.62 for Dependence, 0.17-0.35 for Pain and 0.13-0.47 for Insecurity. Inter-rater reliability: The PA between different testers in Dependence, Pain and Insecurity was 74%, 89% and 74%, respectively. The augmented rank-order agreement coefficients were: for Dependence r(a) = 0.97; for Pain, r(a) = 0.99; and for Insecurity, r(a) = 0.99. The N-GMF is a fairly reliable instrument for use with frail elderly people, with intra-rater and inter-rater reliability moderate in Dependence and slight to fair in Pain and Insecurity. The clinical usefulness was stressed in regard to its main focus, the frail elderly, and for communication within a multidisciplinary team.
Implications for Rehabilitation: The Norwegian-General Motor Function Assessment Scale (N-GMF) is a reliable instrument. The N-GMF is an instrument for screening and assessment of activity-related dependence, pain and insecurity in frail older people. The N-GMF may be used as a tool of communication in a multidisciplinary team.
Reliability on intra-laboratory and inter-laboratory data of hair mineral analysis comparing with blood analysis.

PubMed

Namkoong, Sun; Hong, Seung Phil; Kim, Myung Hwa; Park, Byung Cheol

2013-02-01

Although its clinical value remains controversial, institutions currently utilize hair mineral analysis. Arguments about the reliability of hair mineral analysis persist, and there have been evaluations of commercial laboratories performing hair mineral analysis. The objective of this study was to assess the reliability of intra-laboratory and inter-laboratory data at three commercial laboratories conducting hair mineral analysis, compared to serum mineral analysis. Two divided hair samples taken from near the scalp were submitted for analysis at the same time, to all laboratories, from one healthy volunteer. Each laboratory sent a report consisting of quantitative results and their interpretation of health implications. Differences among intra-laboratory and inter-laboratory data were analyzed using SPSS version 12.0 (SPSS Inc., USA). All the laboratories used identical methods for quantitative analysis, and they generated consistent numerical results according to Friedman analysis of variance. However, the normal reference ranges of each laboratory varied. As such, each laboratory interpreted the patient's health differently. On intra-laboratory data, Wilcoxon analysis suggested they generated relatively coherent data, but laboratory B could not in one element, so its reliability was doubtful. In comparison with the blood test, laboratory C generated identical results, but laboratories A and B did not. Hair mineral analysis has its limitations, considering the reliability of inter- and intra-laboratory analysis compared with blood analysis. As such, clinicians should be cautious when applying hair mineral analysis as an ancillary tool. Each laboratory included in this study requires continuous refinement in order to establish standardized normal reference levels.
Reproducibility of the index of orthognathic functional treatment need scores derived from plaster study casts and their three-dimensional digital equivalents: a pilot study.

PubMed

McCrory, Emma; McGuinness, Niall Jp; Ulhaq, Aman

2018-06-01

To determine the reproducibility of Index of Orthognathic Functional Treatment Need (IOFTN) scores derived from plaster casts and their three-dimensional (3D) digital equivalents. Pilot study, prospective analytical. UK hospital orthodontic department. Thirty casts and their digital equivalents, representing the pre-treatment malocclusions of patients requiring orthodontic-orthognathic surgical treatment, were scored by four clinicians using IOFTN. Casts were scanned using a 3Shape digital scanner and 3D models produced using OrthoAnalyzer™ (3Shape Ltd, Copenhagen, Denmark). Examiners independently determined the IOFTN scores for the casts and digital models, to test their inter- and intra-operator reliability using weighted Kappa scores. Intra-operator agreement with IOFTN major categories (1-5: treatment need) was very good for plaster casts (0.83-0.98) and good-very good for digital models (0.78-0.83). Inter-operator agreement was moderate-very good for casts (0.58-0.82) and good-very good for digital models (0.65-0.92). Intra-operator agreement with IOFTN sub-categories (1-14: feature of malocclusion) was good-very good for casts (0.70-0.97) and digital models (0.80-0.94). Inter-operator agreement was moderate-good for casts (0.53-0.77); and moderate-very good for the digital models (0.58-0.90). Digital models are an acceptable alternative to plaster casts for examining the malocclusion of patients requiring combined orthodontic-orthognathic surgical treatment and determining treatment need.
Inter-Rater Reliability of Provider Interpretations of Irritable Bowel Syndrome Food and Symptom Journals

PubMed Central

Chung, Chia-Fang; Xu, Kaiyuan; Dong, Yi; Schenk, Jeanette M.; Cain, Kevin; Munson, Sean; Heitkemper, Margaret M.

2017-01-01

There are currently no standardized methods for identifying trigger food(s) from irritable bowel syndrome (IBS) food and symptom journals. The primary aim of this study was to assess the inter-rater reliability of providers’ interpretations of IBS journals. A second aim was to describe whether these interpretations varied for each patient. Eight providers reviewed 17 IBS journals and rated how likely key food groups (fermentable oligo-di-monosaccharides and polyols, high-calorie, gluten, caffeine, high-fiber) were to trigger IBS symptoms for each patient. Agreement of trigger food ratings was calculated using Krippendorff’s α-reliability estimate. Providers were also asked to write down recommendations they would give to each patient. Estimates of agreement of trigger food likelihood ratings were poor (average α = 0.07). Most providers gave similar trigger food likelihood ratings for over half the food groups. Four providers gave the exact same written recommendation(s) (range 3–7) to over half the patients. Inter-rater reliability of provider interpretations of IBS food and symptom journals was poor. Providers favored certain trigger food likelihood ratings and written recommendations. This supports the need for a more standardized method for interpreting these journals and/or more rigorous techniques to accurately identify personalized IBS food triggers. PMID:29113044
Optimation of Operation System Integration between Main and Feeder Public Transport (Case Study: Trans Jakarta-Kopaja Bus Services)

NASA Astrophysics Data System (ADS)

Miharja, M.; Priadi, Y. N.

2018-05-01

Promoting a better public transport is a key strategy to cope with urban transport problems which are mostly caused by a huge private vehicle usage. A better public transport service quality not only focuses on one type of public transport mode, but also concerns on inter modes service integration. Fragmented inter mode public transport service leads to a longer trip chain as well as average travel time which would result in its failure to compete with a private vehicle. This paper examines the optimation process of operation system integration between Trans Jakarta Bus as the main public transport mode and Kopaja Bus as feeder public transport service in Jakarta. Using scoring-interview method combined with standard parameters in operation system integration, this paper identifies the key factors that determine the success of the two public transport operation system integrations. The study found that some key integration parameters, such as the cancellation of “system setoran”, passenger get in-get out at official stop points, and systematic payment, positively contribute to a better service integration. However, some parameters such as fine system, time and changing point reliability, and information system reliability are among those which need improvement. These findings are very useful for the authority to set the right strategy to improve operation system integration between Trans Jakarta and Kopaja Bus services.
Measurement of inter- and intra-annual variability of landscape fire activity at a continental scale: The Australian case

Treesearch

Grant J. Williamson; Lynda D. Prior; Matt Jolly; Mark A. Cochrane; Brett P. Murphy; David M. J. S. Bowman

2016-01-01

Climate dynamics at diurnal, seasonal and inter-annual scales shape global fire activity, although difficulties of assembling reliable fire and meteorological data with sufficient spatio-temporal resolution have frustrated quantification of this variability. Using Australia as a case study, we combine data from 4760 meteorological stations with 12 years of satellite-...
Reliability of a survey tool for measuring consumer nutrition environment in urban food stores.

PubMed

Hosler, Akiko S; Dharssi, Aliza

2011-01-01

Despite the increase in the volume and importance of food environment research, there is a general lack of reliable measurement tools. This study presents the development and reliability assessment of a tool for measuring consumer nutrition environment in urban food stores. Cross-sectional design. A racially diverse downtown portion (6 ZIP code areas) in Albany, New York. A sample of 39 food stores was visited by our research team in 2009 to 2010. These stores were randomly selected from 123 eligible food stores identified through multiple government lists and ground-truthing. The Food Retail Outlet Survey Tool was developed to assess the presence of selected food and nonfood items, placement, milk prices, physical characteristics of the store, policy implementation, and advertisements on outside windows. For in-store items, agreement of observations between experienced and lightly trained surveyors was assessed. For window advertisement assessments, inter-method agreement (on-site sketch vs digital photo), and inter-rater agreement (both on-site) among lightly trained surveyors were evaluated. Percent agreement, Kappa, and prevalence-adjusted bias-adjusted kappa were calculated for in-store observations. Intraclass correlation coefficients were calculated for window observations. Twenty-seven of the 47 in-store items had 100% agreement. The prevalence-adjusted bias-adjusted kappa indicated excellent agreement (≥0.90) on all items, except aisle width (0.74) and dark-green/orange colored fresh vegetables (0.85). The store type (nonconvenience store), the order of visits (first half), and the time to complete survey (>10 minutes) were associated with lower reliability in these 2 items. Both the inter-method and inter-rater agreements for window advertisements were uniformly high (intraclass correlation coefficient ranged 0.94-1.00), indicating high reliability. The Food Retail Outlet Survey Tool is a reliable tool for quickly measuring consumer nutrition environment.
It can be effectively used by an individual who attended a 30-minute group briefing and practiced with 3 to 4 stores.
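For presence/absence items, the prevalence-adjusted bias-adjusted kappa (PABAK) mentioned in the abstract above reduces to 2 × (observed agreement) − 1. The sketch below contrasts it with Cohen's kappa on invented data in which an item is present in nearly every store; the function names and the ratings are hypothetical and are not taken from the study.

```python
def percent_agreement(a, b):
    """Share of stores on which the two surveyors recorded the same value."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa_binary(a, b):
    """Cohen's kappa for 0/1 ratings from two surveyors."""
    n = len(a)
    p_o = percent_agreement(a, b)
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    # Chance agreement from each surveyor's own rate of recording "present".
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (p_o - p_e) / (1 - p_e)


def pabak_binary(a, b):
    """PABAK for two categories: chance agreement is fixed at 0.5."""
    return 2 * percent_agreement(a, b) - 1


# Invented ratings for 39 stores: item recorded as present (1) almost everywhere.
experienced = [1] * 37 + [1, 0]
lightly_trained = [1] * 37 + [0, 1]

print(f"percent agreement: {percent_agreement(experienced, lightly_trained):.3f}")
print(f"Cohen's kappa:     {cohen_kappa_binary(experienced, lightly_trained):.3f}")
print(f"PABAK:             {pabak_binary(experienced, lightly_trained):.3f}")
```

Here the surveyors agree on 37 of 39 stores, yet kappa is slightly negative because almost all agreement is attributed to chance; PABAK stays close to the raw agreement, which is why it is often reported for items with extreme prevalence.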
Vowels Development in Babbling of typically developing 6-to-12-month old Persian-learning Infants.

PubMed

Fotuhi, Mina; Yadegari, Fariba; Teymouri, Robab

2017-10-01

Pre-linguistic vocalizations including early consonants, vowels, and their combinations into syllables are considered as important predictors of the speech and language development. The purpose of this study was to examine vowel development in babblings of normally developing Persian-learning infants. Eight typically developing 6-8-month-old Persian-learning infants (3 boys and 5 girls) participated in this 4-month longitudinal descriptive-analytic study. A weekly 30-60-minute audio- and video-recording was obtained at home from the comfort state vocalizations of infants and the mother-child interactions. A total of 74:02:03 hours of vocalizations were phonetically transcribed. Seven vowels comprising /i/, /e/, /a/, /u/, /o/, /ɑ/, and /ə/ were identified in the babblings. The inter-rater reliability was obtained for 20% of vocalizations. The data were analyzed by repeated measures ANOVA and Pearson's correlation coefficient using SPSS software version 20. The results showed that two vowels /a/ (46.04) and /e/ (23.60) were produced with the highest mean frequency of occurrence, respectively. Regarding front/back dimension, the front vowels were the most prominent ones (71.87); in terms of height, low (46.78) and mid (32.45) vowels occurred maximally. A good inter-rater reliability was obtained (0.99, P < .01). The increased frequency of occurrence of the low and mid front vowels in the current study was consistent with previous studies on the emergence of vowels in pre-linguistic vocalization in other languages.
Reliability of Untrained and Experienced Raters on FEES: Rating Overall Residue is a Simple Task.

PubMed

Pisegna, Jessica M; Borders, James C; Kaneoka, Asako; Coster, Wendy J; Leonard, Rebecca; Langmore, Susan E

2018-03-07

The purpose of this study was to investigate the reliability of residue ratings on Fiberoptic Endoscopic Evaluation of Swallowing (FEES). We also examined rating differences based on experience to determine if years of experience influenced residue ratings. A group of 44 raters watched 81 FEES videos representing a wide range of residue severities for thin liquid, applesauce, and cracker boluses. Raters were untrained on the rating scales and simply rated their overall impression of residue amount on a visual analog scale (VAS) and a five-point ordinal scale in a randomized fashion across two sessions. Intra-class correlation coefficients, kappa coefficients, and ANOVAs were used to analyze agreement and differences in ratings. Residue ratings on both the VAS and ordinal scales had acceptable inter- and intra-rater reliability. Inter-rater agreement was acceptable (ICC > 0.7) for all comparisons. Intra-rater agreement was excellent on the VAS scale (r_c = 0.9) and good on the ordinal scale (k = 0.78). There was no significant difference between expert ratings and other raters based on years of experience for cracker ratings (p = 0.2119) and applesauce ratings (p = 0.2899), but there was a significant difference between clinicians on thin liquid ratings (p = 0.0005). Without any specific training, raters demonstrated high reliability when rating the overall amount of residue on FEES. Years of experience with FEES did not influence residue ratings, suggesting that expert ratings of overall residue amount are not unique or specialized. Rating the overall amount of residue on FEES appears to be a simple visual-perceptual task for puree and cracker boluses.
The development of a semi-structured home interview (CHIF) to directly assess function in cognitively impaired elderly people in two cultures

PubMed Central

Hendrie, H. C.; Lane, K. A.; Ogunniyi, A.; Baiyewu, O.; Gureje, O.; Evans, R.; Smith-Gamble, V.; Pettaway, M.; Unverzagt, F. W.; Gao, S.; Hall, K. S.

2010-01-01

Background: Assessing function is a crucial element in the diagnosis of dementia. This information is usually obtained from key informants. However, reliable informants are not always available. Methods: A 10-item semi-structured home interview (the CHIF, or Clinician Home-based Interview to assess Function) to assess function primarily by measuring instrumental activities of daily living directly was developed and tested for inter-rater reliability and validity as part of the Indianapolis–Ibadan dementia project. The primary validity measurements were correlations between scores on the CHIF and independently gathered scores on the Blessed Dementia Scale (from informants) and the Mini-mental State Examination (MMSE). Sensitivities and specificities of scores on the CHIF and receiver operator characteristic (ROC) curves were constructed with dementia as the dependent variable. Results: Inter-rater reliability for the CHIF was high (Pearson’s correlation coefficient 0.99 in Indianapolis and 0.87 in Ibadan). Internal consistency, in both samples, was good (Cronbach’s α 0.95 in Indianapolis and 0.83 in Ibadan). Scores on the CHIF correlated well with the Blessed Dementia scores at both sites (−0.71, p < 0.0001 for Indianapolis and −0.56, p < 0.0001 for Ibadan) and with the MMSE (0.75, p < 0.0001 for Indianapolis and 0.44, p < 0.0001 for Ibadan). For all items at both sites, the subjects without dementia performed significantly better than those with dementia. The area under the ROC curve for dementia diagnosis was 0.965 for Indianapolis and 0.925 for Ibadan. Conclusion: The CHIF is a useful instrument to assess function directly in elderly participants in international studies, particularly in the absence of reliable informants. PMID:16640794
Friedman tongue position: age distribution and relationship to sleep-disordered breathing.

PubMed

Ingram, David G; Ruiz, Amanda; Friedman, Norman R

2015-05-01

Friedman tongue position (FTP) may play an important role in the evaluation of children with sleep-related breathing disorders (SRBD), but there are no previous data on FTP distribution by age. The objective of the current study was to determine the distribution of FTP by age and examine the relationship between FTP and snoring in children. In this prospective cross-sectional study, 199 children (mean age, 6.8 years; 59% male) had tongue position assessed by FTP as part of their clinical examination of the oral cavity during routine ENT visits at a tertiary care children's hospital. The FTP and snoring frequency of participants were examined across the entire age range as well as by comparing those older (middle childhood and above) and younger than 5 years of age. Tongue position did not correlate with age or snoring frequency. The proportion of children with FTP III/IV was not significantly different in children younger than five years of age compared to those older than five. Habitual snoring was not associated with having a higher FTP. Among children who snored <3 times per week, those who had previously undergone tonsillectomy did have higher FTP compared to those who had not (p=0.007). BMI-%-for-age was significantly correlated with FTP (p=0.003). The percent of children having FTP class III/IV differed significantly between ethnicities (22% of whites, 26% of others, 45% of Hispanics, 53% of African-Americans; p=0.011). Inter-rater reliability among pediatric otolaryngologists was excellent (kappa=0.93, p<0.001). There does not appear to be an association of FTP with age or snoring frequency in children. The excellent inter-rater reliability for FTP among pediatric ENT providers suggests the null findings are not due to rater bias. These findings may serve as an important reference for those studying the role of tongue position in pediatric SRBD and complement previous studies examining FTP among children with known OSA or snoring.
Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Development and evaluation of the "BRISK Scale," a brief observational measure of risk communication competence.

PubMed

Han, Paul K J; Joekes, Katherine; Mills, Greg; Gutheil, Caitlin; Smith, Kahsi; Cochran, Nancy E; Elwyn, Glyn

2016-12-01

To develop and evaluate a brief observational measure of clinical risk communication competence. A 4-item checklist-type measure, the BRISK (Brief Risk Information Skill) Scale, was developed by selecting and refining items from a more comprehensive measure of clinical risk communication competence. Six volunteer raters received brief training on the measure and then used the BRISK Scale to evaluate 52 video-recorded encounters between 2nd-year medical students and standardized patients conducted as part of an Observed Structured Clinical Examination (OSCE) involving a risk communication task. Internal consistency reliability, inter-rater reliability, and criterion validity were assessed. Raters reported no difficulties using the BRISK Scale; scores across all raters and subjects ranged from 0 to 16 with a mean score of 6.49 (SD=3.17). The BRISK Scale showed good internal consistency reliability (α=0.64), and inter-rater reliability at the scale level (Intraclass Correlation Coefficient (ICC)=0.79 for consistency, and 0.75 for absolute agreement) and individual-item level (ICC range: 0.62-.91). Novice raters' BRISK Scale scores were highly correlated (r=0.84, p<0.01) with expert raters' scores on the Risk Communication Content measure, a more comprehensive measure of risk communication competence. The BRISK Scale is a promising new brief observational measure of clinical risk communication competence. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
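The abstract above reports separate intraclass correlation coefficients for consistency and for absolute agreement. The difference can be illustrated with the single-measure, two-way formulas built from ANOVA mean squares. Which ICC model the authors actually used is not stated, so the choice of single-measure forms here is an assumption, and the function name `icc_single` and the scores are hypothetical.

```python
def icc_single(ratings):
    """Single-measure two-way ICCs: returns (consistency, absolute agreement).

    ratings: list of rows, one per subject, each holding one score per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares for subjects (rows), raters (columns) and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Consistency ignores systematic rater offsets; absolute agreement penalizes them.
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    absolute = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return consistency, absolute


# Invented scores: rater 2 is always exactly 2 points above rater 1.
scores = [[1, 3], [2, 4], [3, 5], [4, 6]]
consistency, absolute = icc_single(scores)
print(f"consistency={consistency:.3f} absolute={absolute:.3f}")
```

Because the second rater ranks subjects identically but scores them uniformly higher, the consistency ICC is perfect while the absolute-agreement ICC is much lower. A small gap between the two, as reported in the abstract (0.79 vs 0.75), suggests little systematic offset between raters.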
