rater blinded reference: Topics by Science.gov

Sample records for rater blinded reference

Inter-rater reliability of Hamilton depression rating scale using video-recorded interviews — Focus on rater-blinding

PubMed Central

Prasad, M. Krishna; Udupa, K.; Kishore, K. R.; Thirthalli, J.; Sathyaprabha, T. N.; Gangadhar, B. N.

2009-01-01

Background: Hamilton depression rating scale (Ham-D) is the most widely used clinician rating scale for depression. There has been no Indian study that has examined the inter-rater reliability (IRR) of video-recorded interviews of the 21-item Ham-D. Aim: To study the IRR of scoring video-recorded interviews for 21-item Ham-D. Materials and Methods: Eighteen subjects with major depressive disorder involved in a larger study were interviewed using the semi-structured clinical interview of the 21-item Ham-D by a primary rater after informed consent. These interviews were video-recorded and portions edited to ensure rater blinding. Subsequently, the video-recorded interviews were rated by a “blind” rater. Both rated the different sub-domains of Ham-D according to Rhoades and Overall (1983). IRR was evaluated using intra-class correlation coefficient. Results: Excellent IRR was observed (0.9891) between the two raters. This was true for each of the primary factors and super-factors. Conclusion: Video-recorded 21-item Ham-D has excellent IRR. Video-recorded interviews of Ham-D can be reliably used to blind raters in research. PMID:19881046
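The abstract reports an intra-class correlation coefficient but does not say which ICC form was used. As a minimal sketch, assuming a two-way layout (subjects × raters) with one score per cell, the single-rater ICC(2,1) (absolute agreement) and ICC(3,1) (consistency) can be computed from ANOVA mean squares. The function name `icc_two_way` is hypothetical and not taken from the study.

```python
def icc_two_way(scores):
    """Return (ICC(2,1), ICC(3,1)) for an n-subjects x k-raters table of scores."""
    n = len(scores)        # subjects (rows)
    k = len(scores[0])     # raters (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # ICC(3,1): raters treated as fixed, so systematic rater offsets are ignored.
    icc3 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    # ICC(2,1): raters treated as random, so systematic offsets lower agreement.
    icc2 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return icc2, icc3
```

For example, `icc_two_way([[1, 2], [2, 3], [3, 4]])` returns roughly (0.667, 1.0): a constant one-point offset between the two raters leaves consistency perfect but lowers absolute agreement.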
Frame of Reference Rater Training Issues: Recall, Time and Behavior Observation Training.

ERIC Educational Resources Information Center

Roch, Sylvia G.; O'Sullivan, Brian J.

2003-01-01

Graduate students were trained as raters using frame of reference (FOR, n=22), behavior observation training (BOT, n=21), or performance appraisal (controls, n=21). They rated videotaped lecturers twice. FOR increased the number of behaviors recalled; FOR and BOT improved recall quality. FOR improved rating accuracy even after 2 weeks.…
A Randomized, Rater-Blinded, Parallel Trial of Intensive Speech Therapy in Sub-Acute Post-Stroke Aphasia: The SP-I-R-IT Study

ERIC Educational Resources Information Center

Martins, Isabel Pavao; Leal, Gabriela; Fonseca, Isabel; Farrajota, Luisa; Aguiar, Marta; Fonseca, Jose; Lauterbach, Martin; Goncalves, Luis; Cary, M. Carmo; Ferreira, Joaquim J.; Ferro, Jose M.

2013-01-01

Background: There is conflicting evidence regarding the benefits of intensive speech and language therapy (SLT), particularly because intensity is often confounded with total SLT provided. Aims: A two-centre, randomized, rater-blinded, parallel study was conducted to compare the efficacy of 100 h of SLT in a regular (RT) versus intensive (IT)…
Rater methodology for stroboscopy: a systematic review.

PubMed

Bonilha, Heather Shaw; Focht, Kendrea L; Martin-Harris, Bonnie

2015-01-01

Laryngeal endoscopy with stroboscopy (LES) remains the clinical gold standard for assessing vocal fold function. LES is used to evaluate the efficacy of voice treatments in research studies and clinical practice. LES as a voice treatment outcome tool is only as good as the clinician interpreting the recordings. Research using LES as a treatment outcome measure should be evaluated based on rater methodology and reliability. The purpose of this literature review was to evaluate the rater-related methodology from studies that use stroboscopic findings as voice treatment outcome measures. This was a systematic literature review. Computerized journal databases were searched for relevant articles using the terms stroboscopy and treatment. Eligible articles were categorized and evaluated for the use of rater-related methodology, reporting of number of raters, types of raters, blinding, and rater reliability. Of the 738 articles reviewed, 80 articles met inclusion criteria. More than one-third of the studies included in the review did not report the number of raters who participated in the study. Eleven studies reported results of rater reliability analysis, with only two studies reporting good inter- and intrarater reliability. The comparability and use of results from treatment studies that use LES are limited by a lack of rigor in rater methodology and variable, mostly poor, inter- and intrarater reliability. To improve our ability to evaluate and use the findings from voice treatment studies that use LES features as outcome measures, greater consistency in reporting rater methodology characteristics across studies and improved rater reliability are needed. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
On Rater Agreement and Rater Training

ERIC Educational Resources Information Center

Wang, Binhong

2010-01-01

This paper first analyzes two studies on rater factors and rating criteria to raise the problem of rater agreement. The author then reveals the causes of discrepancies in rating administration by discussing rater variability and rater bias. The author argues that rater bias cannot be eliminated completely; we can only reduce the error to a…
The Smile Esthetic Index (SEI): A method to measure the esthetics of the smile. An intra-rater and inter-rater agreement study.

PubMed

Rotundo, Roberto; Nieri, Michele; Bonaccini, Daniele; Mori, Massimiliano; Lamberti, Elena; Massironi, Domenico; Giachetti, Luca; Franchi, Lorenzo; Venezia, Piero; Cavalcanti, Raffaele; Bondi, Elena; Farneti, Mauro; Pinchi, Vilma; Buti, Jacopo

2015-01-01

To propose a method to measure the esthetics of the smile and to report its validation by means of an intra-rater and inter-rater agreement analysis. Ten variables were chosen as determinants of the esthetics of a smile: smile line and facial midline, tooth alignment, tooth deformity, tooth dyschromia, gingival dyschromia, gingival recession, gingival excess, gingival scars and diastema/missing papillae. One examiner consecutively selected seventy smile pictures, all in frontal view. Ten examiners, with different levels of clinical experience and specialties, applied the proposed assessment method twice to the selected pictures, independently and blindly. Intraclass correlation coefficient (ICC) and Fleiss' kappa statistics were used to analyse the intra-rater and inter-rater agreement. Considering the cumulative assessment of the Smile Esthetic Index (SEI), the ICC value for the inter-rater agreement of the 10 examiners was 0.62 (95% CI: 0.51 to 0.72), representing substantial agreement. Intra-rater agreement ranged from 0.86 to 0.99. Inter-rater agreement (Fleiss' kappa statistics) calculated for each variable ranged from 0.17 to 0.75. The SEI was a reproducible method to assess the esthetic component of the smile, useful for the diagnostic phase and for setting appropriate treatment plans.
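Fleiss' kappa extends chance-corrected agreement to more than two raters. The sketch below assumes every picture was rated by the same number of examiners and takes a subjects × categories table of counts; `fleiss_kappa` is a hypothetical helper, not code from the study.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who put subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("this sketch assumes the same number of raters per subject")
    n_categories = len(counts[0])
    # Overall share of all assignments that fall in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects     # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)
```

For instance, `fleiss_kappa([[3, 0], [0, 3]])` returns 1.0, because all three raters agree on both subjects while the categories are used equally often.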
Psychiatric comorbidity may not predict suicide during and after hospitalization. A nested case-control study with blinded raters.

PubMed

Walby, Fredrik A; Odegaard, Erik; Mehlum, Lars

2006-06-01

To investigate the differential impact of DSM-IV axis-I and axis-II disorders on completed suicide and to study whether psychiatric comorbidity increases the risk of suicide in currently and previously hospitalized psychiatric patients. A nested case-control design based on case notes from 136 suicides and 166 matched controls. All cases and controls were rediagnosed using the SCID-CV for axis-I and the DSM-IV criteria for axis-II disorders, and the inter-rater reliability was satisfactory. Raters were blind to the case and control status and the original hospital diagnoses. Depressive disorders and bipolar disorders were associated with an increased risk of suicide. No such effect was found for comorbidity between axis-I disorders or for comorbidity between axis-I and axis-II disorders. Psychiatric diagnoses, although made using a structured and criteria-based approach, were based on information recorded in case notes. Axis-II comorbidity could only be investigated at an aggregated level. Psychiatric comorbidity did not predict suicide in this sample. Mood disorders did, however, increase the risk significantly, independent of a history of previous suicide attempts. Both findings can inform identification and treatment of patients at high risk for completed suicide.
Effects of a rater training on rating accuracy in a physical examination skills assessment.

PubMed

Weitz, Gunther; Vinzentius, Christian; Twesten, Christoph; Lehnert, Hendrik; Bonnemeier, Hendrik; König, Inke R

2014-01-01

The accuracy and reproducibility of medical skills assessment is generally low. Rater training has little or no effect. Our knowledge in this field, however, relies on studies involving video ratings of overall clinical performances. We hypothesised that a rater training focussing on the frame of reference could improve accuracy in grading the curricular assessment of a highly standardised physical head-to-toe examination. Twenty-one raters assessed the performance of 242 third-year medical students. Eleven raters had been randomly assigned to undergo a brief frame-of-reference training a few days before the assessment. 218 encounters were successfully recorded on video and re-assessed independently by three additional observers. Accuracy was defined as the concordance between the raters' grade and the median of the observers' grade. After the assessment, both students and raters filled in a questionnaire about their views on the assessment. Rater training did not have a measurable influence on accuracy. However, trained raters rated significantly more stringently than untrained raters, and their overall stringency was closer to the stringency of the observers. The questionnaire indicated a higher awareness of the halo effect in the trained raters group. Although the self-assessment of the students mirrored the assessment of the raters in both groups, the students assessed by trained raters felt more discontent with their grade. While training had some marginal effects, it failed to have an impact on the individual accuracy. These results in real-life encounters are consistent with previous studies on rater training using video assessments of clinical performances. The high degree of standardisation in this study was not suitable to harmonize the trained raters' grading. The data support the notion that the process of appraising medical performance is highly individual. A frame-of-reference training as applied does not effectively adjust the physicians' judgement
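The accuracy criterion described above, agreement between each rater's grade and the median grade of the three video observers, can be sketched as follows. It assumes that "concordance" means an exact match on a discrete grade scale, which the abstract does not spell out; the function name and the grades in the usage note are illustrative only.

```python
from statistics import median


def accuracy_against_observers(rater_grades, observer_grades):
    """Share of encounters where the rater's grade equals the observers' median.

    Assumption: 'concordance' is read as an exact match between grades.
    """
    if len(rater_grades) != len(observer_grades):
        raise ValueError("one set of observer grades per encounter is required")
    hits = sum(
        1
        for grade, observers in zip(rater_grades, observer_grades)
        if grade == median(observers)
    )
    return hits / len(rater_grades)
```

With three observers the median is always one of their grades, so `accuracy_against_observers([2, 3, 1], [[2, 2, 3], [1, 3, 3], [2, 2, 2]])` returns 2/3.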
Rater Wealth Predicts Perceptions of Outgroup Competence

PubMed Central

Chan, Wayne; McCrae, Robert R.; Rogers, Darrin L.; Weimer, Amy A.; Greenberg, David M.; Terracciano, Antonio

2011-01-01

National income has a pervasive influence on the perception of ingroup stereotypes, with high status and wealthy targets perceived as more competent. In two studies we investigated the degree to which economic wealth of raters related to perceptions of outgroup competence. Raters’ economic wealth predicted trait ratings when 1) raters in 48 other cultures rated Americans’ competence and 2) Mexican Americans rated Anglo Americans’ competence. Rater wealth also predicted ratings of interpersonal warmth on the culture level. In conclusion, raters’ economic wealth, either nationally or individually, is significantly associated with perception of outgroup members, supporting the notion that ingroup conditions or stereotypes function as frames of reference in evaluating outgroup traits. PMID:22379232
Accelerated resolution of laser-induced bruising with topical 20% arnica: a rater-blinded randomized controlled trial.

PubMed

Leu, S; Havey, J; White, L E; Martin, N; Yoo, S S; Rademaker, A W; Alam, M

2010-09-01

Dermatological procedures can result in disfiguring bruises that resolve slowly. To assess the comparative utility of topical formulations in hastening the resolution of skin bruising. Healthy volunteers, age range 21-65 years, were enrolled for this double (patient and rater) blinded randomized controlled trial. For each subject, four standard bruises of 7 mm diameter each were created on the bilateral upper inner arms, 5 cm apart, two per arm, using a 595-nm pulsed-dye laser (Vbeam; Candela Corp., Wayland, MA, U.S.A.). Randomization was used to assign one topical agent (5% vitamin K, 1% vitamin K and 0·3% retinol, 20% arnica, or white petrolatum) to exactly one bruise per subject, which was then treated under occlusion twice a day for 2 weeks. A dermatologist not involved with subject assignment rated bruises [visual analogue scale, 0 (least)-10 (most)] in standardized photographs immediately after bruise creation and at week 2. There was a significant difference in the change in the rater bruising score associated with the four treatments (ANOVA, P=0·016). Pairwise comparisons indicated that the mean improvement associated with 20% arnica was greater than with white petrolatum (P=0·003), and the improvement with arnica was greater than with the mixture of 1% vitamin K and 0·3% retinol (P=0·01). Improvement with arnica was not greater than with 5% vitamin K cream, however. Topical 20% arnica ointment may be able to reduce bruising more effectively than placebo and more effectively than low-concentration vitamin K formulations, such as 1% vitamin K with 0·3% retinol. © 2010 The Authors. Journal Compilation © 2010 British Association of Dermatologists.
Effects of a rater training on rating accuracy in a physical examination skills assessment

PubMed Central

Weitz, Gunther; Vinzentius, Christian; Twesten, Christoph; Lehnert, Hendrik; Bonnemeier, Hendrik; König, Inke R.

2014-01-01

Background: The accuracy and reproducibility of medical skills assessment is generally low. Rater training has little or no effect. Our knowledge in this field, however, relies on studies involving video ratings of overall clinical performances. We hypothesised that a rater training focussing on the frame of reference could improve accuracy in grading the curricular assessment of a highly standardised physical head-to-toe examination. Methods: Twenty-one raters assessed the performance of 242 third-year medical students. Eleven raters had been randomly assigned to undergo a brief frame-of-reference training a few days before the assessment. 218 encounters were successfully recorded on video and re-assessed independently by three additional observers. Accuracy was defined as the concordance between the raters' grade and the median of the observers' grade. After the assessment, both students and raters filled in a questionnaire about their views on the assessment. Results: Rater training did not have a measurable influence on accuracy. However, trained raters rated significantly more stringently than untrained raters, and their overall stringency was closer to the stringency of the observers. The questionnaire indicated a higher awareness of the halo effect in the trained raters group. Although the self-assessment of the students mirrored the assessment of the raters in both groups, the students assessed by trained raters felt more discontent with their grade. Conclusions: While training had some marginal effects, it failed to have an impact on the individual accuracy. These results in real-life encounters are consistent with previous studies on rater training using video assessments of clinical performances. The high degree of standardisation in this study was not suitable to harmonize the trained raters’ grading. The data support the notion that the process of appraising medical performance is highly individual. A frame-of-reference training as applied does not
Exploring the role of first impressions in rater-based assessments.

PubMed

Wood, Timothy J

2014-08-01

Medical education relies heavily on assessment formats that require raters to assess the competence and skills of learners. Unfortunately, there are often inconsistencies and variability in the scores raters assign. To ensure the scores from these assessment tools have validity, it is important to understand the underlying cognitive processes that raters use when judging the abilities of their learners. The goal of this paper, therefore, is to contribute to a better understanding of the cognitive processes used by raters. Representative findings from the social judgment and decision making, cognitive psychology, and educational measurement literature will be used to illuminate the underpinnings of these rater-based assessments. Of particular interest is the impact that judgments referred to as first impressions (or thin slices) have on rater-based assessments. These are judgments about people made very quickly and based on very little information. A narrative review will provide a synthesis of research in these three literatures (social judgment and decision making, educational psychology, and cognitive psychology) and will focus on the underlying cognitive processes, the accuracy and the impact of first impressions on rater-based assessments. The application of these findings to the types of rater-based assessments used in medical education will then be reviewed. Gaps in understanding will be identified and suggested directions for future research studies will be discussed.
Introducing a new definition of a near fall: intra-rater and inter-rater reliability.

PubMed

Maidan, I; Freedman, T; Tzemah, R; Giladi, N; Mirelman, A; Hausdorff, J M

2014-01-01

Near falls (NFs) are more frequent than falls and may occur before falls, potentially predicting fall risk. As such, identification of an NF is important. We aimed to assess intra- and inter-rater reliability of the traditional definition of an NF and to demonstrate the potential utility of a new definition. To this end, 10 older adults, 10 idiopathic elderly fallers, and 10 patients with Parkinson's disease (PD) walked in an obstacle course while wearing a safety harness. All walks were videotaped. Forty-nine video segments were extracted to create 2 clips, each of 8.48 min. Four raters scored each event using the traditional definition and, two weeks later, using the new definition. A fifth rater used only the new definition. Intra-rater reliability was determined using Kappa (K) statistics and inter-rater reliability was determined using ICC. Using the traditional definition, three raters had poor intra-rater reliability (K<0.054, p>0.137) and one rater had moderate intra-rater reliability (K=0.624, p<0.001). With the traditional definition, inter-rater reliability between the four raters was moderate (ICC=0.667, p<0.001). In contrast, the new NF definition showed high intra-rater (K>0.601, p<0.001) and excellent inter-rater reliability (ICC=0.815, p<0.001). A priori, it is easy to distinguish falls from usual walking and NFs, but it is more challenging to distinguish NFs from obstacle negotiation and usual walking. Therefore, a more precise definition of NF is required. The results of the present study suggest that the proposed new definition increases intra- and inter-rater reliability, a critical step for using NFs to quantify fall risk. Copyright © 2013 Elsevier B.V. All rights reserved.
Inter-Rater Reliability and Intra-Rater Reliability of Assessing the 2-Minute Push-Up Test.

PubMed

Fielitz, Lynn; Coelho, Jeffrey; Horne, Thomas; Brechue, William

2016-02-01

The purpose of this study was to assess the inter-rater and intra-rater reliability of the 2-minute, 90° push-up test as used in the Army Physical Fitness Test. Analysis of rater assessment reliability included both total score agreement and agreement across individual push-up repetitions. This study used 8 Raters who assessed 15 different videotaped push-up performances over 4 iterations separated by a minimum of 1 week. The 15 push-up participants were videotaped during the semiannual Army Physical Fitness Test. Each Rater viewed the 15 push-up performances in random order and responded verbally with a "yes" or "no" to each push-up repetition. The data were analyzed using the Pearson product-moment correlation as well as the kappa, modified kappa and the intra-class correlation coefficient (3,1). An attribute agreement analysis was conducted to determine the percent of inter-rater and intra-rater agreement across individual push-ups. The results indicated that Raters varied a great deal in assessing push-ups. Over the 4 trials of 15 participants, the overall scores of the Raters varied between 3.0 and 35.7 push-ups. Post hoc comparisons found a significant increase in the grand mean of push-ups from trials 1-3 to trial 4 (p < 0.05). There was also a significant difference among raters over the 4 trials (p < 0.05). Pearson correlation coefficients for inter-rater reliability ranged from 0.10 to 0.97, and intra-rater coefficients ranged from 0.48 to 0.99. Intra-rater agreement for individual push-up repetitions ranged from 41.8% to 84.8%. The results indicated that the raters failed to assess the same push-up repetition with the same score (below 70% agreement) and failed to agree with one another (29%). Interestingly, as previously mentioned, scores on trial 4 increased significantly, which might have been caused by rater drift or that the Raters did not maintain
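The distinction between total-score agreement and repetition-by-repetition agreement matters because two raters can report the same total while disagreeing on every individual repetition. A minimal sketch, assuming each rater's yes/no calls are stored in the same repetition order; both helper names are hypothetical.

```python
def total_score(calls):
    """Number of repetitions a rater counted as valid ("yes")."""
    return sum(1 for call in calls if call)


def percent_agreement(calls_a, calls_b):
    """Percent of repetitions on which two sets of yes/no calls match."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both sets of calls must cover the same repetitions")
    matches = sum(1 for a, b in zip(calls_a, calls_b) if a == b)
    return 100.0 * matches / len(calls_a)


rater_1 = [True, True, False, False]
rater_2 = [False, False, True, True]
print(total_score(rater_1), total_score(rater_2))  # 2 2
print(percent_agreement(rater_1, rater_2))  # 0.0
```

Here the totals match exactly, yet the raters never agree on a single repetition, which is why the study analyzed both levels.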
Rating the raters in a mixed model: An approach to deciphering the rater reliability

NASA Astrophysics Data System (ADS)

Shang, Junfeng; Wang, Yougui

2013-05-01

Rating the raters has attracted extensive attention in recent years. Ratings are complex in that subjective assessment and a number of criteria are involved in a rating system. Whenever human judgment is part of ratings, inconsistency among ratings is a source of variance in scores, so it is natural to verify the trustworthiness of ratings. Accordingly, estimating rater reliability is of great interest. To facilitate the evaluation of rater reliability in a rating system, we propose a mixed model in which the scores a rater assigns to the ratees are described by fixed effects determined by the ability of the ratees and random effects produced by disagreement among the raters. In this mixed model, we derive the posterior distribution of the rater random effects in order to predict them. To make a quantitative decision about which raters are unreliable, the predictive influence function (PIF) serves as a criterion that compares the posterior distributions of the random effects between the full data set and rater-deleted data sets. The benchmark for this criterion is also discussed. The proposed methodology for deciphering rater reliability is investigated on multiple simulated data sets and two real data sets.
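As a much simplified illustration of predicting a rater random effect, assume the model y_ij = θ_i + b_j + e_ij, with the ratee effects θ_i treated as known, b_j ~ N(0, σ_b²) and e_ij ~ N(0, σ_e²), and both variances known. Under those assumptions the posterior of b_j is normal, with a mean that shrinks the rater's average residual toward zero. This is not the authors' estimation procedure and it omits the predictive influence function entirely; the function name is hypothetical.

```python
def rater_effect_posterior(residuals, sigma_b2, sigma_e2):
    """Posterior (mean, variance) of one rater's random effect b_j.

    residuals: y_ij - theta_i for every ratee i scored by rater j, with the
    ratee effects theta_i assumed known. sigma_b2 and sigma_e2 are the
    (assumed known) rater-effect and error variances.
    """
    n = len(residuals)
    # Normal prior N(0, sigma_b2) combined with n normal observations.
    precision = 1.0 / sigma_b2 + n / sigma_e2
    mean = (sum(residuals) / sigma_e2) / precision
    return mean, 1.0 / precision
```

For example, a rater who scores four ratees one point above their known ability, with both variances set to 1, gets `rater_effect_posterior([1, 1, 1, 1], 1.0, 1.0)` of approximately (0.8, 0.2): the raw offset of 1 is shrunk toward 0, and the shrinkage weakens as the rater scores more ratees.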
Rater agreement reliability of the dial test in the ACL-deficient knee.

PubMed

Slichter, Malou E; Wolterbeek, Nienke; Auw Yang, K Gie; Zijl, Jacco A C; Piscaer, Tom M

2018-06-14

Posterolateral rotatory instability (PLRI) of the knee can easily be missed because attention is focused on injury of the cruciate ligaments. If left untreated, this clinical instability may persist after reconstruction of the cruciate ligaments and may put the graft at risk of failure. Even though the dial test is widely used to diagnose PLRI, no validity or reliability studies of the manual dial test have yet been performed in patients. This study focuses on the reliability of the manual dial test by determining the rater agreement. Two independent examiners performed the dial test in the knees of 52 patients after knee distortion with a suspicion of ACL rupture. The dial test was performed in prone position at 30°, 60° and 90° of knee flexion. A side-to-side difference of ≥10° was considered a positive dial test. To quantify the amount of rotation in degrees, a measuring device was used with a standardized 6 Nm force, applied through a digital torque adapter on a boot. The intra-rater, inter-rater and rater-device agreement were determined by calculating kappa (κ) for the dial test. A positive dial test was found in 21.2% and 18.0% of the patients as assessed by a blinded examiner and an orthopaedic surgeon, respectively. Fair inter-rater agreement was found at 30° of flexion, κ F = 0.29 (95% CI: 0.01 to 0.56), p = 0.044, and at 90° of flexion, κ F = 0.38 (95% CI: 0.10 to 0.66), p = 0.007. Almost perfect rater-device agreement was found at 30° of flexion, κ C = 0.84 (95% CI: 0.52 to 1.15), p < 0.001. Moderate rater-device agreement was found at 30° and 90° combined, κ C = 0.50 (95% CI: 0.13 to 0.86), p = 0.008. No significant intra-rater agreement was found. The rater agreement reliability of the manual dial test is questionable. It has fair inter-rater agreement at 30° and 90° of flexion.
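One way a reported kappa interval can extend above 1 (as in the 0.52 to 1.15 interval above) is a symmetric large-sample (Wald) interval, which is not bounded to [-1, 1]. The sketch below computes Cohen's kappa from a square agreement table (rater A in rows, rater B in columns) and uses the simple approximate standard error sqrt(p_o(1 - p_o) / (n(1 - p_e)^2)). The abstract does not say which interval method the authors used, so this choice, the function name and the example counts are assumptions.

```python
import math


def cohen_kappa_wald(table, z=1.96):
    """Cohen's kappa with an approximate Wald confidence interval."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n          # observed agreement
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample standard error; the interval is not clipped to [-1, 1].
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - z * se, kappa + z * se)
```

With a small, hypothetical table such as `cohen_kappa_wald([[9, 1], [0, 2]])`, kappa is 0.75 and the upper bound is about 1.22, illustrating how a Wald interval can exceed the maximum possible kappa when samples are small.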
The Effects of Rater Training on Inter-Rater Agreement

ERIC Educational Resources Information Center

Pufpaff, Lisa A.; Clarke, Laura; Jones, Ruth E.

2015-01-01

This paper addresses the effects of rater training on the rubric-based scoring of three preservice teacher candidate performance assessments. This project sought to evaluate the consistency of ratings assigned to student learning outcome measures being used for program accreditation and to explore the need for rater training in order to increase…
Inter-rater and intra-rater reliability of a movement control test in shoulder.

PubMed

Rajasekar, S; Bangera, Rakshith K; Sekaran, Padmanaban

2017-07-01

Movement faults are commonly observed in patients with musculoskeletal pain. The Kinetic Medial Rotation Test (KMRT) is a movement control test used to identify movement faults of the scapula and gleno-humeral joints during arm movement. Objective tests such as the KMRT need to be reliable and valid for the results to be applied across different clinical settings and patient populations. The primary objective of the present study was to determine the intra-rater and inter-rater reliability of KMRT in subjects with and without shoulder pain. Sixty subjects were included in this study based on specific inclusion and exclusion criteria. Two musculoskeletal physiotherapists with different levels of clinical experience performed the tests. The intra-rater reliability was tested in twenty asymptomatic subjects by a single assessor at a two-week interval. An equal number of subjects with and without shoulder pain were tested by both assessors to determine the inter-rater reliability. Both components of the KMRT, the Gleno-Humeral Anterior Translation (GHAT) and the Scapular Forward Tilt (SCFT), were tested. The Kappa values for inter-rater reliability of the GHAT and SCFT were K = 0.68 and K = 0.65, respectively, in subjects with shoulder pain. In asymptomatic subjects, the inter-rater reliability of GHAT was K = 0.61 and SCFT was K = 0.85. Intra-rater reliability ranged from K = 0.66 for GHAT to K = 0.87 for SCFT. Our study found substantial agreement in inter-rater reliability of KMRT in subjects with shoulder pain, whereas substantial to near perfect agreement was found in intra-rater and inter-rater reliability of KMRT in subjects without shoulder pain. Copyright © 2017 Elsevier Ltd. All rights reserved.
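The verbal labels in this abstract ("substantial", "near perfect") match the benchmark bands commonly attributed to Landis and Koch (1977). The abstract does not name the scale it used, so applying these cut-offs here is an assumption. A small lookup makes the bands explicit; the function name is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) benchmark label."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

Under this scale, the reported GHAT and SCFT values of 0.61 to 0.68 fall in the "substantial" band, while 0.85 and 0.87 fall in the "almost perfect" band.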
Inter-rater agreement on PIVC-associated phlebitis signs, symptoms and scales.

PubMed

Marsh, Nicole; Mihala, Gabor; Ray-Barruel, Gillian; Webster, Joan; Wallis, Marianne C; Rickard, Claire M

2015-10-01

Many peripheral intravenous catheter (PIVC) infusion phlebitis scales and definitions are used internationally, although no existing scale has demonstrated comprehensive reliability and validity. We examined inter-rater agreement between registered nurses on signs, symptoms and scales commonly used in phlebitis assessment. Seven PIVC-associated phlebitis signs/symptoms (pain, tenderness, swelling, erythema, palpable venous cord, purulent discharge and warmth) were observed daily by two raters (a research nurse and registered nurse). These data were modelled into phlebitis scores using 10 different tools. Proportions of agreement (e.g. positive, negative), observed and expected agreements, Cohen's kappa, the maximum achievable kappa, prevalence- and bias-adjusted kappa were calculated. Two hundred ten patients were recruited across three hospitals, with 247 sets of paired observations undertaken. The second rater was blinded to the first's findings. The Catney and Rittenberg scales were the most sensitive (phlebitis in >20% of observations), whereas the Curran, Lanbeck and Rickard scales were the most restrictive (≤2% phlebitis). Only tenderness and the Catney (one of pain, tenderness, erythema or palpable cord) and Rittenberg scales (one of erythema, swelling, tenderness or pain) had acceptable (more than two-thirds, 66.7%) levels of inter-rater agreement. Inter-rater agreement for phlebitis assessment signs/symptoms and scales is low. This likely contributes to the high degree of variability in phlebitis rates in literature. We recommend further research into assessment of infrequent signs/symptoms and the Catney or Rittenberg scales. New approaches to evaluating vein irritation that are valid, reliable and based on their ability to predict complications need exploration. © 2015 John Wiley & Sons, Ltd.
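Raw agreement can be high while Cohen's kappa is low when one category (here, no phlebitis) dominates, which is why the abstract also reports positive/negative agreement, maximum achievable kappa and prevalence- and bias-adjusted kappa (PABAK). The sketch below handles a 2 × 2 table, assuming `table[0][0]` counts observations both raters scored positive; the helper name and the counts in the usage note are illustrative, not study data.

```python
def two_by_two_agreement(table):
    """Agreement statistics for a 2 x 2 table [[a, b], [c, d]].

    a: both raters positive, d: both negative, b and c: discordant pairs.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    p_o = (a + d) / n
    row = (a + b, c + d)  # rater 1 positive / negative totals
    col = (a + c, b + d)  # rater 2 positive / negative totals
    p_e = (row[0] * col[0] + row[1] * col[1]) / n ** 2
    # Highest observed agreement possible given each rater's own marginals.
    p_o_max = (min(row[0], col[0]) + min(row[1], col[1])) / n
    return {
        "observed": p_o,
        "expected": p_e,
        "kappa": (p_o - p_e) / (1 - p_e),
        "kappa_max": (p_o_max - p_e) / (1 - p_e),
        "pabak": 2 * p_o - 1,
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
    }
```

For a hypothetical table `[[2, 3], [4, 91]]`, observed agreement is 0.93 and PABAK is 0.86, yet kappa is only about 0.33 and positive agreement about 0.36, because nearly all of the agreement comes from the common negative category.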
Inter- and intra-rater reliability and agreement in determining subcutaneous tumour margins in dogs.

PubMed

Ranganathan, B; Milovancev, M; Leeper, H; Townsend, K L; Bracha, S; Curran, K

2018-03-01

The objective of this prospective study was to evaluate agreement and reliability of calliper-based measurements of locally invasive subcutaneous malignant tumours in dogs. Four raters measured the longest diameter of 12 subcutaneous tumours (7 soft tissue sarcomas and 5 mast cell tumours) from 11 client-owned dogs during 3 randomized, blinded measurement trials, both pre- and post-sedation. Inter- and intra-rater reliability was evaluated using intra-class correlation coefficient (ICC) and agreement was evaluated using Bland-Altman plots. Inter- and intra-rater reliability was good (ICC range of 0.8694-0.89520) and excellent (ICC range of 0.9720-0.9966), respectively. For agreement calculations, an a priori clinically relevant limit of agreement of 10 mm was set. Inter- and intra-rater agreement was unacceptable with inter-rater limits of agreement ranging from 15.9 to 55.6 mm and intra-rater limit of agreement ranging from 11.9 to 28.1 mm. Review of the measurement trial photographs revealed that calliper orientation changes were frequent, occurring in 9/12 (75%) and 8/12 (67%) pre- and post-sedation cases. No significant correlation was found between inter-rater measurement standard deviations and calliper orientation changes or dog body condition score. These findings suggest veterinarians may have poor agreement in determining the gross edge of tumours, which is expected to introduce bias and inconsistency in tumour staging, assessing response to therapy, and surgical margin planning. Due to the potential consequences for veterinary cancer patients, future studies are needed to validate the present findings. © 2018 John Wiley & Sons Ltd.
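Reliability (ICC) and agreement (limits of agreement) answer different questions, which is how this study can report good ICCs yet unacceptable agreement. A minimal Bland-Altman sketch for two raters' paired measurements uses the conventional bias ± 1.96 SD limits and checks them against an a priori clinical limit such as the 10 mm used here. The function names and the measurements in the usage note are illustrative assumptions.

```python
from statistics import mean, stdev


def bland_altman(first, second, z=1.96):
    """Return (bias, (lower, upper)) limits of agreement for paired measurements."""
    if len(first) != len(second):
        raise ValueError("measurements must be paired")
    diffs = [x - y for x, y in zip(first, second)]
    bias = mean(diffs)   # mean difference between the two raters
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, (bias - z * sd, bias + z * sd)


def within_clinical_limit(limits, max_abs_mm):
    """True if both limits of agreement lie inside +/- max_abs_mm."""
    lower, upper = limits
    return -max_abs_mm <= lower and upper <= max_abs_mm
```

For example, `bland_altman([10, 12, 14, 16], [9, 12, 13, 18])` gives a bias of 0 with limits of about -2.77 to 2.77 mm, which `within_clinical_limit(..., 10)` accepts; limits as wide as those reported in the abstract would be rejected.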

Inter-rater and intra-rater reliability of the Bahasa Melayu version of Rose Angina Questionnaire.

PubMed

Hassan, N B; Choudhury, S R; Naing, L; Conroy, R M; Rahman, A R A

2007-01-01

The objective of the study is to translate the Rose Questionnaire (RQ) into a Bahasa Melayu version, adapt it cross-culturally, and measure its inter-rater and intrarater reliability. This cross-sectional study was conducted in the respondents' homes or workplaces in Kelantan, Malaysia. One hundred respondents aged 30 and above with different socio-demographic status were interviewed for face validity. For each of the inter-rater and intra-rater reliability assessments, a sample of 150 respondents was interviewed. Inter-rater and intra-rater reliabilities were assessed by Cohen's kappa. The overall inter-rater agreement of the five pairs of interviewers at time points one and two was 0.86, and the intrarater reliability of the five interviewers on the seven-item questionnaire at time points one and two was 0.88, as measured by the kappa coefficient. The translated Malay version of RQ demonstrated almost perfect inter-rater and intra-rater reliability, and further validation, such as sensitivity and specificity analysis of this translated questionnaire, is highly recommended.
Inter-Rater Variability as Mutual Disagreement: Identifying Raters' Divergent Points of View

ERIC Educational Resources Information Center

Gingerich, Andrea; Ramlo, Susan E.; van der Vleuten, Cees P. M.; Eva, Kevin W.; Regehr, Glenn

2017-01-01

Whenever multiple observers provide ratings, even of the same performance, inter-rater variation is prevalent. The resulting "idiosyncratic rater variance" is considered to be unusable error of measurement in psychometric models and is a threat to the defensibility of our assessments. Prior studies of inter-rater variation in clinical…
Longitudinal Rater Modeling with Splines

ERIC Educational Resources Information Center

Dobria, Lidia

2011-01-01

Performance assessments rely on the expert judgment of raters for the measurement of the quality of responses, and raters unavoidably introduce error in the scoring process. Defined as the tendency of a rater to assign higher or lower ratings, on average, than those assigned by other raters, even after accounting for differences in examinee…
How reliable are Functional Movement Screening scores? A systematic review of rater reliability.

PubMed

Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John

2016-05-01

rater blinding. Published by the BMJ Publishing Group Limited.
A Hierarchical Rater Model for Constructed Responses, with a Signal Detection Rater Model

ERIC Educational Resources Information Center

DeCarlo, Lawrence T.; Kim, YoungKoung; Johnson, Matthew S.

2011-01-01

The hierarchical rater model (HRM) recognizes the hierarchical structure of data that arises when raters score constructed response items. In this approach, raters' scores are not viewed as being direct indicators of examinee proficiency but rather as indicators of essay quality; the (latent categorical) quality of an examinee's essay in turn…
The Assignment of Raters to Items: Controlling for Rater Effects.

ERIC Educational Resources Information Center

Sykes, Robert C.; Heidorn, Mark; Lee, Guemin

A study was conducted to evaluate the effect of different modes (modalities) of assigning raters to test items. The impact on total constructed response (c.r.) score, and subsequently on total test score, of assigning a single versus multiple raters to an examination reading of a student's set of c.r. responses was evaluated for several mixed-item…
Computerized back postural assessment in physiotherapy practice: Intra-rater and inter-rater reliability of the MIDAS system.

PubMed

McAlpine, R T; Bettany-Saltikov, J A; Warren, J G

2009-01-01

Assessment of spinal posture during physiotherapy practice is routine, yet few objective measures exist to this end. The Middlesbrough Integrated Digital Assessment System (MIDAS) is a low-cost portable system able to record 3D information on posture. The purpose of this study was to assess both the intra-rater and inter-rater reliability of the MIDAS system. Twenty-five healthy subjects were recruited. A repeated measures design was used to record fifteen pre-palpated landmarks on the back of each subject. To limit the sources of variability, the principal researcher palpated the landmarks for each subject. Each of three raters took two measurements on each subject in a standardized upright posture. X (medio-lateral), Y (antero-posterior) and Z (height) landmark positions were recorded via a computer interface. Both intra-rater agreement (mean ICCs: rater 1 r = 0.970, rater 2 r = 0.965 and rater 3 r = 0.965, p < 0.001) and inter-rater agreement (mean ICC r = 0.967, p < 0.001) were very high between repeated measures and between markers. Error values for the z-axis (height) were the lowest. The MIDAS demonstrated both high inter-rater and intra-rater reliability and provides an objective method for the assessment of posture in physiotherapy practice.
An assessment of the inter-rater reliability of the ASA physical status score in the orthopaedic trauma population.

PubMed

Ihejirika, Rivka C; Thakore, Rachel V; Sathiyakumar, Vasanth; Ehrenfeld, Jesse M; Obremskey, William T; Sethi, Manish K

2015-04-01

Although recent literature has demonstrated the utility of the ASA score in predicting postoperative length of stay, complication risk and potential utilization of other hospital resources, the ASA score has been inconsistently assigned by anaesthesia providers. This study tested the reliability of ASA score assignment by both attending anaesthesiologists and anaesthesia residents, specifically in the orthopaedic trauma patient population. Nine case-based scenarios were created involving preoperative patients with isolated operative orthopaedic trauma injuries. The cases were created and assigned a reference score by both an attending anaesthesiologist and an orthopaedic trauma surgeon. Attending and resident anaesthesiologists were asked to assign an ASA score for each case. Rater-versus-reference agreement and inter-rater agreement amongst respondents were then analyzed using Fleiss's kappa and weighted and unweighted Cohen's kappa. Thirty-three individuals provided ASA scores for each of the scenarios. The average rater-versus-reference reliability (weighted kappa) was substantial (Kw=0.78, SD=0.131, 95% CI=0.73-0.83). The average rater-versus-reference unweighted kappa was also substantial (Kuw=0.64, SD=0.21, 95% CI=0.56-0.71). The inter-rater reliability as evaluated by Fleiss's kappa was moderate (K=0.51, p<.001). Inter-rater agreement was moderate both within the group of attendings (K=0.50, p<.001) and within the group of residents (K=0.55, p<.001). There was a significant increase in the level of inter-rater reliability from the self-reported 'very uncomfortable' participants to the 'very comfortable' participants (uncomfortable K=0.43, comfortable K=0.59, p<.001). This study shows substantial agreement strength for reliability of the ASA score among anaesthesiologists when evaluating orthopaedic trauma patients. The significant increase in inter-rater reliability based on anaesthesiologists' comfort with the ASA scoring method implies a need for further evaluation
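Fleiss' kappa, used in this record for agreement among all 33 respondents, extends chance-corrected agreement to more than two raters. Below is a minimal sketch. It assumes every case is scored by the same number of raters, and it takes a subjects × categories table of counts as input. The function name and the example counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from an N x k table where counts[i][j] is the number of
    raters who put subject i in category j (same number of raters per subject)."""
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    m = sum(counts[0])  # raters per subject
    k = len(counts[0])
    if m < 2 or any(len(row) != k or sum(row) != m for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    # mean per-subject agreement: proportion of agreeing rater pairs
    p_bar = sum((sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts) / n_subjects
    # chance agreement from the overall category proportions
    p_j = [sum(row[j] for row in counts) / (n_subjects * m) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 4 cases, 3 raters each, 2 categories
table = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(table), 3))  # 0.333
```

Counts per subject, rather than a rater-by-subject grid, are used because Fleiss' kappa does not require the same individuals to rate every case. It only requires the same number of ratings per case.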
Evaluating Rater Accuracy in Rater-Mediated Assessments Using an Unfolding Model

ERIC Educational Resources Information Center

Wang, Jue; Engelhard, George, Jr.; Wolfe, Edward W.

2016-01-01

The number of performance assessments continues to increase around the world, and it is important to explore new methods for evaluating the quality of ratings obtained from raters. This study describes an unfolding model for examining rater accuracy. Accuracy is defined as the difference between observed and expert ratings. Dichotomous accuracy…
Workplace-based assessment: raters' performance theories and constructs.

PubMed

Govaerts, M J B; Van de Wiel, M W J; Schuwirth, L W T; Van der Vleuten, C P M; Muijtjens, A M M

2013-08-01

Weaknesses in the nature of rater judgments are generally considered to compromise the utility of workplace-based assessment (WBA). In order to gain insight into the underpinnings of rater behaviours, we investigated how raters form impressions of and make judgments on trainee performance. Using theoretical frameworks of social cognition and person perception, we explored raters' implicit performance theories, use of task-specific performance schemas and the formation of person schemas during WBA. We used think-aloud procedures and verbal protocol analysis to investigate schema-based processing by experienced (N = 18) and inexperienced (N = 16) raters (supervisor-raters in general practice residency training). Qualitative data analysis was used to explore schema content and usage. We quantitatively assessed rater idiosyncrasy in the use of performance schemas and we investigated effects of rater expertise on the use of (task-specific) performance schemas. Raters used different schemas in judging trainee performance. We developed a normative performance theory comprising seventeen inter-related performance dimensions. Levels of rater idiosyncrasy were substantial and unrelated to rater expertise. Experienced raters made significantly more use of task-specific performance schemas compared to inexperienced raters, suggesting more differentiated performance schemas in experienced raters. Most raters started to develop person schemas the moment they began to observe trainee performance. The findings further our understanding of processes underpinning judgment and decision making in WBA. Raters make and justify judgments based on personal theories and performance constructs. Raters' information processing seems to be affected by differences in rater expertise. The results of this study can help to improve rater training, the design of assessment instruments and decision making in WBA.
Kappa and Rater Accuracy: Paradigms and Parameters

ERIC Educational Resources Information Center

Conger, Anthony J.

2017-01-01

Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Systematic review of blinding assessment in randomized controlled trials in schizophrenia and affective disorders 2000-2010.

PubMed

Baethge, Christopher; Assall, Oliver P; Baldessarini, Ross J

2013-01-01

Blinding is an integral part of many randomized controlled trials (RCTs). However, both blinding and blinding assessment seem to be rarely documented in trial reports. Systematic review of articles on RCTs in schizophrenia and affective disorders research during 2000-2010. Among 2,467 publications, 61 (2.5%; 95% confidence interval: 1.9-3.1%) reported assessing participant, rater, or clinician blinding: 5/672 reports on schizophrenia (0.7%; 0.3-1.6%) and 33/1,079 (3.1%; 2.1-4.2%) on affective disorders, without significant trends across the decade. Blinding was rarely assessed at the beginning; in most studies, assessment took place at the end. The proportions of patients' and raters' correct guesses of study arm averaged 54.4% and 62.0% per study, with slightly more correct guesses in treatment arms than in placebo arms. Three-fourths of responders correctly guessed that they received the active agent. Blinding assessment was more frequently reported in papers on psychotherapy and brain stimulation than in papers on drug trials (5.1%, 1.7-11.9%, vs. 8.3%, 4.3-14.4%, vs. 2.1%, 1.5-2.8%). Lack of assessment of blinding was associated with: (a) positive findings, (b) full industrial sponsorship, and (c) diagnosis of schizophrenia. There was a moderate association of treatment success and blinding status of both trial participants (r = 0.51, p = 0.002) and raters (r = 0.55, p = 0.067). Many RCT reports did not meet CONSORT standards regarding documentation of persons blinded (60%) or of efforts to match interventions (50%). Recent treatment trials in major psychiatric disorders rarely reported on or evaluated blinding. We recommend routine documentation of blinding strategies in reports. Copyright © 2013 S. Karger AG, Basel.
Intra-rater and inter-rater reliability of ultrasonographic measurements of acromion-greater tuberosity distance in patients with post-stroke hemiplegia.

PubMed

Kumar, Praveen; Cruziah, Reynold; Bradley, Michael; Gray, Selena; Swinkels, Annette

2016-06-01

Glenohumeral subluxation (GHS) is reported in up to 81% of patients with stroke. Ultrasonographic measurements of GHS using the acromion-greater tuberosity (AGT) distance have been found to be reliable for experienced raters. The primary aim was to assess the intra-rater reliability of measurements of AGT distance in people with stroke following a short course of rater training. A secondary aim was to compare the inter-rater reliability of these measurements between novice and experienced raters. Patients with stroke (n = 16; 5 men, 11 women; 74 ± 10 years) with 1-sided weakness who gave informed consent were recruited. Ultrasonographic measurements were recorded at the bedside by two physiotherapists with patients seated upright in a hospital chair. Reliability was assessed by intra-class correlation coefficients (ICCs) and the standard error of measurement (SEM). Minimum detectable change (MDC90) scores were used to estimate the magnitude of change that is likely to exceed measurement error. Mean ± SD AGT distances on the affected and unaffected sides for rater 1 were 2.2 ± 0.7 and 1.7 ± 0.4 cm, respectively. Corresponding values for rater 2 were 2.5 ± 0.6 and 2.0 ± 0.4 cm. Intra-class correlation coefficient values for the affected and unaffected shoulders for rater 1 were 0.96 and 0.91, respectively. Corresponding values for rater 2 were 0.95 and 0.90. SEM and MDC90 for both affected and unaffected shoulders were ≤ 0.2 cm. Inter-rater reliability coefficients were 0.86 for the affected shoulder and 0.76 for the unaffected shoulder. Ultrasonographic measurement of AGT distance demonstrates excellent intra-rater reliability for a novice rater. It also shows good inter-rater reliability between novice and experienced raters.
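This record reports SEM and MDC90 alongside ICCs. One common way to relate them is SEM = SD·√(1 − ICC) and MDC90 = 1.645·√2·SEM. These formulas are assumed here because the abstract does not state the ones it used. The function names are hypothetical, and the SD and ICC in the example are made up rather than taken from the study.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement, SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1] for this formula")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.645):
    """MDC = z * sqrt(2) * SEM; z = 1.645 gives MDC90, z = 1.96 gives MDC95."""
    # sqrt(2) accounts for measurement error in both the first and the repeat measurement
    return z * math.sqrt(2.0) * sem


# Hypothetical inputs: between-subject SD of 0.5 cm and an ICC of 0.91
sem = sem_from_icc(0.5, 0.91)
print(round(sem, 2), round(minimal_detectable_change(sem), 2))  # 0.15 0.35
```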
How Do Raters Judge Spoken Vocabulary?

ERIC Educational Resources Information Center

Li, Hui

2016-01-01

The aim of the study was to investigate how raters come to their decisions when judging spoken vocabulary. Segmental rating was introduced to quantify raters' decision-making process. It is hoped that this simulated study brings fresh insight to future methodological considerations with spoken data. Twenty trainee raters assessed five Chinese…
Test-re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females.

PubMed

Beardsley, Chris; Egerton, Tim; Skinner, Brendon

2016-01-01

Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81-0.88), test-re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and test-re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test-re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test-re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test-re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.
Inter-rater reliability of an observation-based ergonomics assessment checklist for office workers.

PubMed

Pereira, Michelle Jessica; Straker, Leon Melville; Comans, Tracy Anne; Johnston, Venerina

2016-12-01

To establish the inter-rater reliability of an observation-based ergonomics assessment checklist for computer workers. A 37-item (38-item if a laptop was part of the workstation) comprehensive observational ergonomics assessment checklist comparable to government guidelines and up to date with empirical evidence was developed. Two trained practitioners assessed full-time office workers performing their usual computer-based work and evaluated the suitability of workstations used. Practitioners assessed each participant consecutively. The order of assessors was randomised, and the second assessor was blinded to the findings of the first. Unadjusted kappa coefficients between the raters were obtained for the overall checklist and subsections that were formed from question-items relevant to specific workstation equipment. Twenty-seven office workers were recruited. The inter-rater reliability between two trained practitioners achieved moderate to good reliability for all except one checklist component. This checklist has mostly moderate to good reliability between two trained practitioners. Practitioner Summary: This reliable ergonomics assessment checklist for computer workers was designed using accessible government guidelines and supplemented with up-to-date evidence. Employers in Queensland (Australia) can fulfil legislative requirements by using this reliable checklist to identify and subsequently address potential risk factors for work-related injury to provide a safe working environment.
Kappa and Rater Accuracy: Paradigms and Parameters.

PubMed

Conger, Anthony J

2017-12-01

Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa (κ). Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another (concordance), using both nonstochastic and stochastic category membership. Using a probability model to express category assignments in terms of rater accuracy and random error, it is shown that observed agreement (Po) depends only on rater accuracy and number of categories; however, expected agreement (Pe) and κ depend additionally on category frequencies. Moreover, category frequencies affect Pe and κ solely through the variance of the category proportions, regardless of the specific frequencies underlying the variance. Paradoxically, some judgment paradigms involving stochastic categories are shown to yield higher κ values than their nonstochastic counterparts. Using the stated probability model, assignments to categories were generated for 552 combinations of paradigms, rater and category parameters, category frequencies, and number of stimuli. Observed means and standard errors for Po, Pe, and κ were fully consistent with theory expectations. Guidelines for interpretation of rater accuracy and reliability are offered, along with a discussion of alternatives to the basic model.
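The claim that category frequencies drive Pe, and hence κ, even when Po is unchanged is easy to check numerically. The sketch below computes Po, Pe and κ from a two-rater contingency table. All three tables are hypothetical. The first two share Po = 0.90 but have different marginals. The third has one rare category, and it shows how very high raw agreement can coexist with a slightly negative κ. The NOT-S record on this page reports that pattern for one domain (92% agreement, κ = −0.04).

```python
def kappa_from_table(table):
    """Return (Po, Pe, kappa) for a square contingency table of two raters' labels.

    table[i][j] counts items that rater A put in category i and rater B in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n  # observed agreement
    row_p = [sum(table[i]) / n for i in range(k)]  # rater A marginals
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater B marginals
    p_e = sum(r * c for r, c in zip(row_p, col_p))  # chance agreement
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


examples = {
    "balanced": [[45, 5], [5, 45]],
    "skewed": [[85, 5], [5, 5]],
    "rare category": [[0, 2], [2, 44]],
}
for name, table in examples.items():
    p_o, p_e, kappa = kappa_from_table(table)
    print(f"{name}: Po={p_o:.2f} Pe={p_e:.2f} kappa={kappa:.2f}")
# balanced: Po=0.90 Pe=0.50 kappa=0.80
# skewed: Po=0.90 Pe=0.82 kappa=0.44
# rare category: Po=0.92 Pe=0.92 kappa=-0.04
```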
The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed-Response Items. Research Report. ETS RR-13-23

ERIC Educational Resources Information Center

Wang, Zhen; Yao, Lihua

2013-01-01

The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…
Age Matters, and so May Raters: Rater Differences in the Assessment of Foreign Accents

ERIC Educational Resources Information Center

Huang, Becky H.; Jun, Sun-Ah

2015-01-01

Research on the age of learning effect on second language learners' foreign accents utilizes human judgments to determine speech production outcomes. Inferences drawn from analyses of these ratings are then used to inform theories. The present study focuses on rater differences in the age of learning effect research. Three groups of raters who…
Inter-rater reliability of a modified version of Delitto et al.’s classification-based system for low back pain: a pilot study

PubMed Central

Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C. W.

2016-01-01

Study design Observational inter-rater reliability study. Objectives To examine: (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3) the inter-rater reliability of algorithm decisions in patients who clearly fit into a subgroup (clear classifications) and those who do not (unclear classifications). Methods Patients were examined twice on the same day by two of three participating physical therapists with different levels of familiarity with the system. Patients were classified into one of four classification groups. Raters were blinded to each other’s classification decisions. To quantify the inter-rater reliability, percentages of agreement and Cohen’s Kappa were calculated. Results A total of 36 patients were included (clear classification n = 23; unclear classification n = 13). The overall rate of agreement was 53% and the Kappa value was 0.34 [95% confidence interval (CI): 0.11–0.57], which indicated only fair inter-rater reliability. Inter-rater reliability for patients with a clear classification (agreement 52%, Kappa value 0.29) was not higher than for patients with an unclear classification (agreement 54%, Kappa value 0.33). Familiarity with the system (i.e. training with written instructions and previous research experience with the algorithm) did not improve the inter-rater reliability. Conclusion Our pilot study challenges the inter-rater reliability of the classification procedure in clinical practice. Therefore, more knowledge is needed about factors that affect the inter-rater reliability, in order to improve the clinical applicability of the classification scheme. PMID:27559279

Comparing the Effectiveness of Self-Paced and Collaborative Frame-of-Reference Training on Rater Accuracy in a Large-Scale Writing Assessment

ERIC Educational Resources Information Center

Raczynski, Kevin R.; Cohen, Allan S.; Engelhard, George, Jr.; Lu, Zhenqiu

2015-01-01

There is a large body of research on the effectiveness of rater training methods in the industrial and organizational psychology literature. Less has been reported in the measurement literature on large-scale writing assessments. This study compared the effectiveness of two widely used rater training methods--self-paced and collaborative…
Inter and intra-rater reliability of mobile device goniometer in measuring lumbar flexion range of motion.

PubMed

Bedekar, Nilima; Suryawanshi, Mayuri; Rairikar, Savita; Sancheti, Parag; Shyam, Ashok

2014-01-01

Evaluation of range of motion (ROM) is an integral part of assessment of the musculoskeletal system. It is required in both health fitness and pathological conditions, and it is also used as an objective outcome measure. Several methods have been described for measuring spinal flexion range of motion, each with its own advantages and disadvantages. Hence, a new device was introduced in this study, using the dual-inclinometer method to measure lumbar spine flexion ROM. The aim was to determine the intra- and inter-rater reliability of a mobile device goniometer in measuring lumbar flexion range of motion. An iPod mobile device with goniometer software was used. The part being measured, i.e. the subject's back, was suitably exposed. The subject stood with feet shoulder-width apart. The spinous processes of the second sacral vertebra (S2) and T12 were located and used as the reference points, and readings were taken. Three readings were taken for each of inter-rater and intra-rater reliability. Sufficient rest was given between flexion movements. Intra-rater reliability using ICC was r=0.920 and inter-rater reliability was r=0.812 at 95% CI. Validity was r=0.95. The mobile device goniometer has high intra-rater reliability; its inter-rater reliability was moderate. This device can be used to assess spinal flexion range of motion, representing uni-planar movement.
Inter-rater reliability of select physical examination procedures in patients with neck pain.

PubMed

Hanney, William J; George, Steven Z; Kolber, Morey J; Young, Ian; Salamh, Paul A; Cleland, Joshua A

2014-07-01

This study evaluated the inter-rater reliability of select examination procedures in patients with neck pain (NP) conducted over a 24- to 48-h period. Twenty-two patients with mechanical NP participated in a standardized examination. One examiner performed standardized examination procedures and a second blinded examiner repeated the procedures 24-48 h later with no treatment administered between examinations. Inter-rater reliability was calculated with the Cohen Kappa and weighted Kappa for ordinal data while continuous level data were calculated using an intraclass correlation coefficient model 2,1 (ICC2,1). Coefficients for categorical variables ranged from poor to moderate agreement (-0.22 to 0.70 Kappa) and coefficients for continuous data ranged from slight to moderate (ICC2,1 0.28-0.74). The standard error of measurement for cervical range of motion ranged from 5.3° to 9.9° while the minimal detectable change ranged from 12.5° to 23.1°. This study is the first to report inter-rater reliability values for select components of the cervical examination in those patients with NP performed 24-48 h after the initial examination. There was considerably less reliability when compared to previous studies, thus clinicians should consider how the passage of time may influence variability in examination findings over a 24- to 48-h period.
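Several records on this page, including this one, report ICC model 2,1: two-way random effects, absolute agreement, single rater. Below is a sketch of the Shrout and Fleiss (1979) mean-square formula, ICC(2,1) = (MSR − MSE)/(MSR + (k − 1)MSE + k(MSC − MSE)/n). It assumes a complete subjects × raters matrix with no missing values. The function name and the score matrix are illustrative. For confidence intervals, a dedicated statistics package is the safer choice.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    data: complete n-subjects x k-raters matrix (no missing values).
    """
    if len(data) < 2 or len(data[0]) < 2 or any(len(row) != len(data[0]) for row in data):
        raise ValueError("need a complete n x k matrix with n, k >= 2")
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)  # between-subjects mean square
    msc = ss_cols / (k - 1)  # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative scores: 6 subjects (rows) rated by 4 raters (columns)
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))  # 0.29
```

The between-raters mean square (MSC) in the denominator is what makes this an absolute-agreement index. Systematic differences between raters lower ICC(2,1), whereas a consistency-type ICC would ignore them.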
The Critical Thinking Analytic Rubric (CTAR): Investigating Intra-Rater and Inter-Rater Reliability of a Scoring Mechanism for Critical Thinking Performance Assessments

ERIC Educational Resources Information Center

Saxton, Emily; Belanger, Secret; Becker, William

2012-01-01

The purpose of this study was to investigate the intra-rater and inter-rater reliability of the Critical Thinking Analytic Rubric (CTAR). The CTAR is composed of 6 rubric categories: interpretation, analysis, evaluation, inference, explanation, and disposition. To investigate inter-rater reliability, two trained raters scored four sets of…
Inter-rater and test-retest reliability of quality assessments by novice student raters using the Jadad and Newcastle-Ottawa Scales.

PubMed

Oremus, Mark; Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

2012-01-01

Quality assessment of included studies is an important component of systematic reviews. The authors investigated inter-rater and test-retest reliability for quality assessments conducted by inexperienced student raters. Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle-Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13-20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. The setting was the McMaster Integrative Neuroscience Discovery and Study Program, and the participants were 10 students taking its courses. The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test-retest reliability using ICC(2,1). Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI -0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were -0.14 (95% CI -0.28 to 0.00) to 0.39 (95% CI -0.02 to 0.81) for the NOS cohort and -0.20 (95% CI -0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case-control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test-retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were -0.19 (95% CI -0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case-control, the ICC(2,1)s were 0.46 (95% CI -0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95). Inter-rater reliability was generally poor
Individualized exergame training improves postural control in advanced degenerative spinocerebellar ataxia: A rater-blinded, intra-individually controlled trial.

PubMed

Schatton, Cornelia; Synofzik, Matthis; Fleszar, Zofia; Giese, Martin A; Schöls, Ludger; Ilg, Winfried

2017-06-01

Treatment options are rare in degenerative ataxias, especially in advanced, multisystemic disease. Exergame training might offer a novel treatment strategy, but its effectiveness has not been investigated in advanced stages. We examined the effectiveness of a 12-week home-based training with body-controlled videogames in 10 young subjects with advanced degenerative ataxia who were unable or barely able to stand. Training was structured in two 6-week phases, allowing the training to be adapted to individual training progress. Rater-blinded clinical assessment (Scale for the Assessment and Rating of Ataxia; SARA), individual goal-attainment scoring (GAS), and quantitative movement analysis were performed two weeks before training, immediately prior to training, and after training phases 1 and 2 (intra-individual control design). This study is registered with ClinicalTrials.gov (NCT02874911). After intervention, ataxia symptoms were reduced (SARA -2.5 points, p < 0.01), with benefits correlating with the amount of training (p = 0.04). Goal attainment during daily living was higher than expected (GAS: 0.45). Movement analysis revealed reduced body sway while sitting (p < 0.01), which correlated with improvements in SARA posture and gait (p = 0.005), indicating training-induced improvements in posture control mechanisms. This study provides the first evidence that, even in advanced stages, subjects with degenerative ataxia may benefit from individualized training, with effects translating into daily living and improving underlying control mechanisms. The proposed training strategy can be performed at home, is motivating and facilitates patient self-empowerment. Copyright © 2017 Elsevier Ltd. All rights reserved.
Inter-rater and test–retest reliability of quality assessments by novice student raters using the Jadad and Newcastle–Ottawa Scales

PubMed Central

Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

2012-01-01

Introduction Quality assessment of included studies is an important component of systematic reviews. Objective The authors investigated inter-rater and test–retest reliability for quality assessments conducted by inexperienced student raters. Design Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle–Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13–20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. Setting McMaster Integrative Neuroscience Discovery and Study Program. Participants 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. Main outcome measures The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test–retest reliability using ICC(2,1). Results Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI −0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were −0.14 (95% CI −0.28 to 0.00) to 0.39 (95% CI −0.02 to 0.81) for the NOS cohort and −0.20 (95% CI −0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case–control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test–retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were −0.19 (95% CI −0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case–control, the ICC(2
Workplace-Based Assessment: Raters' Performance Theories and Constructs

ERIC Educational Resources Information Center

Govaerts, M. J. B.; Van de Wiel, M. W. J.; Schuwirth, L. W. T.; Van der Vleuten, C. P. M.; Muijtjens, A. M. M.

2013-01-01

Weaknesses in the nature of rater judgments are generally considered to compromise the utility of workplace-based assessment (WBA). In order to gain insight into the underpinnings of rater behaviours, we investigated how raters form impressions of and make judgments on trainee performance. Using theoretical frameworks of social cognition and…
Inter-rater and intra-rater agreement on the Nordic Orofacial Test-Screening examination in children, adolescents and young adults with cerebral palsy.

PubMed

Edvinsson, Siv Elisabet; Lundqvist, Lars-Olov

2014-02-01

To evaluate inter-rater and intra-rater agreement on the Nordic Orofacial Test-Screening (NOT-S) examination applied to children, adolescents and young adults with cerebral palsy (CP). Using the NOT-S examination, two speech and language pathologists independently assessed video recordings of 48 subjects with CP aged 5-22 years and representing all CP sub-diagnoses and levels of gross motor function and manual ability. Thirty-one subjects were reassessed. Fifteen out of 17 items in the NOT-S examination domains (1) Face at rest, (2) Nose breathing, (3) Facial expression, (4) Masticatory muscle and jaw function, (5) Oral motor function and (6) Speech were rated using a 'yes' (dysfunction observed)/'no' format, generating an overall score of 0-6. Inter-rater agreement: Twelve out of 15 items and five out of six domains showed acceptable unweighted kappa values (κ = 0.46-1.00). The lowest kappa value was found for domain 4 (κ = -0.04), although it had high inter-rater agreement (92%). The linear weighted kappa value for the overall NOT-S examination score was 0.65 (95% CI = 0.49-0.82). Intra-rater agreement: All items and domains showed acceptable unweighted kappa values (items 0.58-1.00 and 0.59-1.00, domains 0.81-1.00 and 0.62-0.89) for both raters. The linear weighted kappa value for the overall NOT-S examination score was 0.81 (95% CI = 0.63-0.99) for rater A and 0.54 (95% CI = 0.25-0.82) for rater B. The NOT-S examination has acceptable inter-rater and intra-rater agreement when used in young individuals with CP.
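The gap between domain 4's 92% raw agreement and its κ of -0.04 is consistent with the well-known kappa paradox: when nearly every subject falls into one category, the agreement expected by chance is already very high, so a handful of disagreements can push κ below zero. The Python sketch below computes unweighted Cohen's kappa for two raters; the function name and the 50-subject yes/no example are hypothetical illustrations, not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same label by both raters
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = count_a.keys() | count_b.keys()
    p_chance = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if p_chance == 1:
        return float("nan")  # both raters used one identical label throughout: kappa undefined
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings of 50 subjects: each rater says "yes" twice, never for the same subject
rater_a = ["yes"] * 2 + ["no"] * 48
rater_b = ["no"] * 2 + ["yes"] * 2 + ["no"] * 46
print(round(cohens_kappa(rater_a, rater_b), 3))  # prints -0.042 although raw agreement is 0.92
```

Because chance agreement here is 0.9232, the observed 0.92 falls just below it, which is why κ is slightly negative despite the raters agreeing on 46 of 50 subjects.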
Accuracy of Surgery Clerkship Performance Raters.

ERIC Educational Resources Information Center

Littlefield, John H.; And Others

1991-01-01

Interrater reliability in numerical ratings of clerkship performance (n=1,482 students) in five surgery programs was studied. Raters were classified as accurate or moderately or significantly stringent or lenient. Results indicate that increasing the proportion of accurate raters would substantially improve the precision of class rankings.
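The abstract does not describe how raters were classified. One simple approach, assumed here purely for illustration, compares each score with the mean score the same student received from all raters and averages those deviations per rater: positive values suggest leniency and negative values stringency. The function names, the data layout and the 0.5-point cut-off are hypothetical, and the sketch collapses the study's moderate/significant distinction into a single cut-off.

```python
from collections import defaultdict
from statistics import mean


def rater_deviation(ratings):
    """Mean deviation of each rater's scores from the per-student mean score.

    ratings: iterable of (rater, student, score) tuples (hypothetical format).
    """
    ratings = list(ratings)
    scores_by_student = defaultdict(list)
    for _, student, score in ratings:
        scores_by_student[student].append(score)
    student_mean = {s: mean(v) for s, v in scores_by_student.items()}

    deviations = defaultdict(list)
    for rater, student, score in ratings:
        deviations[rater].append(score - student_mean[student])
    return {rater: mean(d) for rater, d in deviations.items()}


def classify_rater(deviation, cutoff=0.5):
    """Label a rater using a hypothetical symmetric cut-off in score points."""
    if deviation > cutoff:
        return "lenient"
    if deviation < -cutoff:
        return "stringent"
    return "accurate"


example = [("A", "s1", 4), ("B", "s1", 2), ("C", "s1", 3),
           ("A", "s2", 5), ("B", "s2", 3), ("C", "s2", 4)]
dev = rater_deviation(example)
print({rater: classify_rater(d) for rater, d in dev.items()})
```

This only works when raters overlap on the students they score; a student seen by a single rater contributes a deviation of zero and carries no information about that rater.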
A rater training protocol to assess team performance.

PubMed

Eppich, Walter; Nannicelli, Anna P; Seivert, Nicholas P; Sohn, Min-Woong; Rozenfeld, Ranna; Woods, Donna M; Holl, Jane L

2015-01-01

Simulation-based methodologies are increasingly used to assess teamwork and communication skills and provide team training. Formative feedback regarding team performance is an essential component. While effective use of simulation for assessment or training requires accurate rating of team performance, examples of rater-training programs in health care are scarce. We describe our rater training program and report interrater reliability during phases of training and independent rating. We selected an assessment tool shown to yield valid and reliable results and developed a rater training protocol with an accompanying rater training handbook. The rater training program was modeled after previously described high-stakes assessments and was delivered in 3 facilitated training sessions. Adjacent agreement was used to measure interrater reliability between raters. Nine raters with a background in health care and/or patient safety evaluated team performance of 42 in-situ simulations using post-hoc video review. Adjacent agreement increased from the second training session (83.6%) to the third training session (85.6%) when evaluating the same video segments. Adjacent agreement for the rating of overall team performance, which was added for the third training session, was 78.3%. Adjacent agreement was 97% 4 weeks posttraining and 90.6% at the end of independent rating of all simulation videos. Rater training is an important element in team performance assessment, and providing examples of rater training programs is essential. Articulating key rating anchors promotes adequate interrater reliability. In addition, using adjacent agreement as a measure allows differentiation between high- and low-performing teams on video review. © 2015 The Alliance for Continuing Education in the Health Professions, the Society for Academic Continuing Medical Education, and the Council on Continuing Medical Education, Association for Hospital Medical Education.
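Adjacent agreement is commonly taken to mean that two raters' ordinal scores are identical or differ by at most one scale point; the abstract does not spell out its definition, so the one-point tolerance below is an assumption. With nine raters, one option (again an assumption, not necessarily the authors' method) is to average the statistic over all rater pairs. The function names are hypothetical.

```python
from itertools import combinations


def adjacent_agreement(scores_a, scores_b, tolerance=1):
    """Percentage of items whose two ordinal scores differ by at most `tolerance` points."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return 100.0 * hits / len(scores_a)


def mean_pairwise_adjacent_agreement(scores_by_rater, tolerance=1):
    """Average adjacent agreement over every pair of raters (each list covers the same items)."""
    pairs = list(combinations(scores_by_rater, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(adjacent_agreement(a, b, tolerance) for a, b in pairs) / len(pairs)


print(adjacent_agreement([3, 4, 2, 5], [3, 5, 4, 5]))               # 75.0
print(adjacent_agreement([3, 4, 2, 5], [3, 5, 4, 5], tolerance=0))  # 50.0 (exact agreement)
```

Setting tolerance=0 recovers exact percent agreement, which makes it easy to see how much more forgiving the adjacent criterion is on the same ratings.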
Rater Variables Associated with ITER Ratings

ERIC Educational Resources Information Center

Paget, Michael; Wu, Caren; McIlwrick, Joann; Woloschuk, Wayne; Wright, Bruce; McLaughlin, Kevin

2013-01-01

Advocates of holistic assessment consider the ITER a more authentic way to assess performance. But this assessment format is subjective and, therefore, susceptible to rater bias. Here our objective was to study the association between rater variables and ITER ratings. In this observational study our participants were clerks at the University of…
Analyzing Written Comments by Performance Raters.

ERIC Educational Resources Information Center

Littlefield, John; And Others

A four-level taxonomy is proposed to define the usefulness of rater written comments for supporting letters of recommendation. The taxonomy is used to classify comments on 220 rating forms by 25 raters from two surgery departments regarding performance by third-year medical students. Written comments were classified by the following taxonomy: (1)…
Rater cognition: review and integration of research findings.

PubMed

Gauthier, Geneviève; St-Onge, Christina; Tavares, Walter

2016-05-01

Given the complexity of competency frameworks, associated skills and abilities, and contexts in which they are to be assessed in competency-based education (CBE), there is an increased reliance on rater judgements when considering trainee performance. This increased dependence on rater-based assessment has led to the emergence of rater cognition as a field of research in health professions education. The topic, however, is often conceptualised and ultimately investigated using many different perspectives and theoretical frameworks. Critically analysing how researchers think about, study and discuss rater cognition or the judgement processes in assessment frameworks may provide meaningful and efficient directions in how the field continues to explore the topic. We conducted a critical and integrative review of the literature to explore common conceptualisations and unified terminology associated with rater cognition research. We identified 1045 articles on rater-based assessment in health professions education using Scopus, Medline and ERIC, and 78 articles were included in our review. We propose a three-phase framework of observation, processing and integration. We situate nine specific mechanisms and sub-mechanisms described across the literature within these phases: (i) generating automatic impressions about the person; (ii) formulating high-level inferences; (iii) focusing on different dimensions of competencies; (iv) categorising through well-developed schemata based on (a) personal concept of competence, (b) comparison with various exemplars and (c) task and context specificity; (v) weighting and synthesising information differently; (vi) producing narrative judgements; and (vii) translating narrative judgements into scales. Our review has allowed us to identify common underlying conceptualisations of observed rater mechanisms and subsequently propose a comprehensive, although complex, framework for the dynamic and contextual nature of the rating process.
Resampling probability values for weighted kappa with multiple raters.

PubMed

Mielke, Paul W; Berry, Kenneth J; Johnston, Janis E

2008-04-01

A new procedure to compute weighted kappa with multiple raters is described. A resampling procedure to compute approximate probability values for weighted kappa with multiple raters is presented. Applications of weighted kappa are illustrated with an example analysis of classifications by three independent raters.
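Mielke, Berry and Johnston's procedure covers weighted kappa for multiple raters; the sketch below shows only the simpler two-rater case to make the resampling idea concrete. It computes a linear-weighted Cohen's kappa and an approximate one-sided p-value by repeatedly shuffling one rater's scores, which keeps both raters' marginal distributions fixed while breaking any association between them. The linear weights, the add-one correction and all names are assumptions for illustration; this is not the authors' published algorithm.

```python
import random


def weighted_kappa(a, b, n_categories):
    """Linear-weighted Cohen's kappa for two raters; categories are ints 0..n_categories-1."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty rating lists of equal length")
    counts = [[0] * n_categories for _ in range(n_categories)]
    for x, y in zip(a, b):
        counts[x][y] += 1
    row = [sum(counts[i]) / n for i in range(n_categories)]
    col = [sum(counts[i][j] for i in range(n_categories)) / n for j in range(n_categories)]
    cells = [(i, j) for i in range(n_categories) for j in range(n_categories)]
    # Linear disagreement weight |i - j|: near-misses count as partial agreement
    observed = sum(abs(i - j) * counts[i][j] / n for i, j in cells)
    expected = sum(abs(i - j) * row[i] * col[j] for i, j in cells)
    if expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1 - observed / expected


def resampling_p_value(a, b, n_categories, n_resamples=10_000, seed=0):
    """Observed kappa and the share of shuffles whose kappa is at least as large."""
    rng = random.Random(seed)
    kappa = weighted_kappa(a, b, n_categories)
    shuffled = list(b)
    at_least_as_large = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if weighted_kappa(a, shuffled, n_categories) >= kappa:
            at_least_as_large += 1
    return kappa, (at_least_as_large + 1) / (n_resamples + 1)


ratings_a = [0, 1, 2, 0, 1, 2]
ratings_b = [0, 1, 2, 0, 1, 2]
kappa, p_value = resampling_p_value(ratings_a, ratings_b, n_categories=3)
print(kappa, p_value)
```

With only six subjects, perfect agreement is still not guaranteed to be "significant"; about 1 in 90 random shuffles also match perfectly, so the p-value here lands near 0.011.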
Measuring the quality of life in mild to very severe dementia: testing the inter-rater and intra-rater reliability of the German version of the QUALIDEM.

PubMed

Dichter, Martin Nikolaus; Schwab, Christian G G; Meyer, Gabriele; Bartholomeyczik, Sabine; Dortmann, Olga; Halek, Margareta

2014-05-01

Quality of life (QoL) is an increasingly used outcome measure in dementia research. The QUALIDEM is a dementia-specific and proxy-rated QoL instrument. We aimed to determine the inter-rater and intra-rater reliability in residents with dementia in German nursing homes. The QUALIDEM consists of nine subscales that were applied to a sample of 108 people with mild to severe dementia and six consecutive subscales that were applied to a sample of 53 people with very severe dementia. The proxy raters were 49 registered nurses and nursing assistants. Inter-rater and intra-rater reliability scores were calculated on the subscale and item level. None of the QUALIDEM subscales showed strong inter-rater reliability based on the single-measure Intra-Class Correlation Coefficient (ICC) for absolute agreement ≥ 0.70. Based on the average-measure ICC for four raters, eight subscales for people with mild to severe dementia (care relationship, positive affect, negative affect, restless tense behavior, social relations, social isolation, feeling at home and having something to do) and five subscales for very severe dementia (care relationship, negative affect, restless tense behavior, social relations and social isolation) yielded a strong inter-rater agreement (ICC: 0.72-0.86). All of the QUALIDEM subscales, regardless of dementia severity, showed strong intra-rater agreement. The ICC values ranged between 0.70 and 0.79 for people with mild to severe dementia and between 0.75 and 0.87 for people with very severe dementia. This study demonstrated insufficient inter-rater reliability and sufficient intra-rater reliability for all subscales of both versions of the German QUALIDEM. The degree of inter-rater reliability can be improved by collaborative QoL rating by more than one nurse. The development of a measurement manual with accurate item definitions and a standardized education program for proxy raters is recommended.
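Single-measure and average-measure ICCs for absolute agreement are computed from the same two-way ANOVA mean squares, and the average-measure value equals the Spearman-Brown step-up of the single-measure value. That is why a subscale can miss a 0.70 threshold for one rater yet clear it when four raters' scores are averaged. The Python sketch below implements the two-way, absolute-agreement forms (Shrout and Fleiss's ICC(2,1) and ICC(2,k)); it assumes a complete subjects-by-raters matrix with no missing ratings, the function names are hypothetical, and the example matrix is the small illustration from Shrout and Fleiss (1979), not QUALIDEM data.

```python
def icc_absolute_agreement(ratings):
    """Two-way absolute-agreement ICCs from an n-subjects x k-raters matrix.

    Returns (single_measure, average_measure), i.e. ICC(2,1) and ICC(2,k).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual error
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n)
    average = (ms_subjects - ms_error) / (ms_subjects + (ms_raters - ms_error) / n)
    return single, average


def spearman_brown(icc_single, k):
    """Reliability of the mean of k raters, given the single-rater reliability."""
    return k * icc_single / (1 + (k - 1) * icc_single)


# Six subjects rated by four raters (the Shrout and Fleiss, 1979, illustration)
data = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
        [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
single, average = icc_absolute_agreement(data)
print(round(single, 2), round(average, 2))   # 0.29 0.62
print(round(spearman_brown(single, 4), 2))   # 0.62, matching the average-measure ICC
```

The step-up is exact for these two forms, so averaging four raters can turn a modest single-rater ICC into a value above 0.70 without any change in how consistently an individual rater scores.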
Does a Rater's Professional Background Influence Communication Skills Assessment?

PubMed

Artemiou, Elpida; Hecker, Kent G; Adams, Cindy L; Coe, Jason B

2015-01-01

There is increasing pressure in veterinary education to teach and assess communication skills, with the Objective Structured Clinical Examination (OSCE) being the most common assessment method. Previous research reveals that raters are a large source of variance in OSCEs. This study focused on examining the effect of raters' professional background as a source of variance when assessing students' communication skills. Twenty-three raters were categorized according to their professional background: clinical sciences (n=11), basic sciences (n=4), clinical communication (n=5), or hospital administrator/clinical skills technicians (n=3). Raters from each professional background were assigned to the same station and assessed the same students during two four-station OSCEs. Students were in year 2 of their pre-clinical program. Repeated-measures ANOVA results showed that OSCE scores awarded by the rater groups differed significantly (matched station 1: F(2, 91) = 6.97, p = .002; matched station 2: F(3, 90) = 13.95, p = .001; matched station 3: F(3, 90) = 8.76, p = .001; matched station 4: F(2, 91) = 30.60, p = .001). A significant time effect between the two OSCEs was found for matched stations 1, 2, and 4, indicating improved student performance. Raters with a clinical communication skills background assigned scores that were significantly lower than those of the other rater groups. Analysis of written feedback provided by the clinical sciences raters showed that they were influenced by the students' clinical knowledge of the case and that they did not rely solely on the communication checklist items. This study shows that it is important to consider rater background in both recruitment and training programs for communication skills assessment.
Rater Cognition Research: Some Possible Directions for the Future

ERIC Educational Resources Information Center

Myford, Carol M.

2012-01-01

Over the last several decades, researchers have studied many and varied aspects of rater cognition. Those interested in pursuing basic research have focused on gaining an understanding of raters' thought processes as they score different types of performances and products, striving to understand how raters' mental representations and the cognitive…
Exploring Rating Quality in Rater-Mediated Assessments Using Mokken Scale Analysis

PubMed Central

Wind, Stefanie A.; Engelhard, George

2015-01-01

Mokken scale analysis is a probabilistic nonparametric approach that offers statistical and graphical tools for evaluating the quality of social science measurement without placing potentially inappropriate restrictions on the structure of a data set. In particular, Mokken scaling provides a useful method for evaluating important measurement properties, such as invariance, in contexts where response processes are not well understood. Because rater-mediated assessments involve complex interactions among many variables, including assessment contexts, student artifacts, rubrics, individual rater characteristics, and others, rater-assigned scores are suitable candidates for Mokken scale analysis. The purposes of this study are to describe a suite of indices that can be used to explore the psychometric quality of data from rater-mediated assessments and to illustrate the substantive interpretation of Mokken-based statistics and displays in this context. Techniques that are commonly used in polytomous applications of Mokken scaling are adapted for use with rater-mediated assessments, with a focus on the substantive interpretation related to individual raters. Overall, the findings suggest that indices of rater monotonicity, rater scalability, and invariant rater ordering based on Mokken scaling provide diagnostic information at the level of individual raters related to the requirements for invariant measurement. These Mokken-based indices serve as an additional suite of diagnostic tools for exploring the quality of data from rater-mediated assessments that can supplement rating quality indices based on parametric models. PMID:29795883
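Mokken's scalability coefficient H compares observed Guttman errors with the number expected if raters scored independently: H = 1 means no errors, and H near 0 means no scalable structure. Wind and Engelhard work with polytomous ratings and also examine rater monotonicity and invariant rater ordering, none of which is shown here. The sketch below assumes the simplest dichotomous case, treating each rater as an "item" and each scored performance as a respondent; the function name is hypothetical.

```python
from itertools import combinations


def loevinger_h(scores):
    """Scalability coefficients for dichotomous (0/1) ratings.

    scores: n performances x m raters, no missing values; every rater must award
    at least one 0 and one 1. Returns (scale_H, [H_i for each rater]).
    """
    n, m = len(scores), len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(m)]  # proportion of 1s per rater
    errors = [[0.0] * m for _ in range(m)]    # observed Guttman errors per rater pair
    expected = [[0.0] * m for _ in range(m)]  # errors expected under independence
    for i, j in combinations(range(m), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: a 0 from the more lenient rater paired with a 1 from the stricter one
        f = sum(1 for row in scores if row[easy] == 0 and row[hard] == 1)
        e = n * (1 - p[easy]) * p[hard]
        errors[i][j] = errors[j][i] = f
        expected[i][j] = expected[j][i] = e
    pairs = list(combinations(range(m), 2))
    scale_h = 1 - sum(errors[i][j] for i, j in pairs) / sum(expected[i][j] for i, j in pairs)
    rater_h = [1 - sum(errors[i]) / sum(expected[i]) for i in range(m)]
    return scale_h, rater_h


guttman = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(loevinger_h(guttman))  # (1.0, [1.0, 1.0, 1.0])
```

A low H_i flags a rater whose pass/fail decisions do not line up with the ordering implied by the other raters, which is the kind of individual-rater diagnostic the abstract describes.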
Multi-rater feedback with gap analysis: an innovative means to assess communication skill and self-insight.

PubMed

Calhoun, Aaron W; Rider, Elizabeth A; Peterson, Eleanor; Meyer, Elaine C

2010-09-01

Multi-rater assessment with gap analysis is a powerful method for assessing communication skills and self-insight, and enhancing self-reflection. We demonstrate the use of this methodology. The Program for the Approach to Complex Encounters (PACE) is an interdisciplinary simulation-based communication skills program. Encounters are assessed using an expanded Kalamazoo Consensus Statement Essential Elements Checklist adapted for multi-rater feedback and gap analysis. Data from a representative conversation were analyzed. Likert and forced-choice data with gap analysis are used to assess performance. Participants were strong in Demonstrating Empathy and Providing Closure, and needed to improve Relationship Building, Gathering Information, and understanding the Patient's/Family's Perspective. Participants under-appraised their abilities in Relationship Building, Providing Closure, and Demonstrating Empathy, as well as their overall performance. The conversion of these results into verbal feedback is discussed. We describe an evaluation methodology using multi-rater assessment with gap analysis to assess communication skills and self-insight. This methodology enables faculty to identify undervalued skills and perceptual blind spots, provide comprehensive, data-driven feedback, and encourage reflection. Implementation of graphical feedback forms coupled with one-on-one discussion using the above methodology has the potential to enhance trainee self-awareness and reflection, improving the impact of educational programs. Copyright (c) 2010 Elsevier Ireland Ltd. All rights reserved.
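At its simplest, gap analysis subtracts the ratings others give a participant from the participant's self-rating in each domain; a negative gap means the participant under-appraised that skill relative to the other raters. The abstract does not describe the PACE checklist's scoring, so the sketch below assumes one numeric score per domain per rater; the domain names come from the abstract, but the scores, the -0.5 threshold and the function names are hypothetical.

```python
from statistics import mean


def gap_analysis(self_scores, other_scores):
    """Self-rating minus the mean rating from other raters, per domain.

    self_scores: {domain: score}; other_scores: {domain: [scores from other raters]}.
    Negative gaps indicate under-appraisal, positive gaps over-appraisal.
    """
    return {domain: self_scores[domain] - mean(other_scores[domain]) for domain in self_scores}


def under_appraised(gaps, threshold=-0.5):
    """Domains whose gap falls below a hypothetical threshold, most negative first."""
    return sorted((d for d, g in gaps.items() if g < threshold), key=gaps.get)


self_scores = {"Demonstrating Empathy": 3, "Providing Closure": 4, "Gathering Information": 4}
other_scores = {"Demonstrating Empathy": [4, 5, 4],
                "Providing Closure": [4, 4, 4],
                "Gathering Information": [3, 3, 4]}
gaps = gap_analysis(self_scores, other_scores)
print(under_appraised(gaps))  # ['Demonstrating Empathy']
```

Sorting by the most negative gap gives a natural order for the one-on-one feedback discussion the abstract recommends.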

A paired comparison analysis of third-party rater thyroidectomy scar preference.

PubMed

Rajakumar, C; Doyle, P C; Brandt, M G; Moore, C C; Nichols, A; Franklin, J H; Yoo, J; Fung, K

2017-01-01

To determine the length and position of a thyroidectomy scar that is cosmetically most appealing to naïve raters. Images of thyroidectomy scars were reproduced on male and female necks using digital imaging software. Surgical variables studied were scar position and length. Fifteen raters were presented with 56 scar pairings and asked to identify which was preferred cosmetically. Twenty duplicate pairings were included to assess rater reliability. Analysis of variance was used to determine preference. Raters preferred low, short scars, followed by high, short scars, with long scars in either position being less desirable (p < 0.05). Twelve of 15 raters had acceptable intra-rater and inter-rater reliability. Naïve raters preferred low, short scars over the alternatives. High, short scars were the next most favourably rated. If other factors influencing incision choice are considered equal, surgeons should consider these preferences in scar position and length when planning their thyroidectomy approach.
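Including duplicate pairings lets each rater's consistency be checked directly, as the share of duplicated pairs on which the rater made the same choice both times. The abstract does not give the threshold used for "acceptable" reliability, so the sketch below only computes the proportion; the data layout, pairing IDs and function name are hypothetical.

```python
def duplicate_consistency(choices, duplicate_pairs):
    """Proportion of duplicated pairings on which a rater chose the same option twice.

    choices: {pairing_id: chosen_option}; duplicate_pairs: [(first_id, repeat_id), ...].
    """
    if not duplicate_pairs:
        raise ValueError("need at least one duplicated pairing")
    same = sum(choices[first] == choices[repeat] for first, repeat in duplicate_pairs)
    return same / len(duplicate_pairs)


# Hypothetical choices by one rater; pairings 101-103 repeat pairings 1-3
choices = {1: "low short", 2: "high short", 3: "low short",
           101: "low short", 102: "high long", 103: "low short"}
pairs = [(1, 101), (2, 102), (3, 103)]
print(duplicate_consistency(choices, pairs))
```

In this toy example the rater repeats two of three choices, a consistency of about 0.67; whether that counts as acceptable depends on a criterion the abstract does not state.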
Rater agreement of visual lameness assessment in horses during lungeing.

PubMed

Hammarberg, M; Egenvall, A; Pfau, T; Rhodin, M

2016-01-01

Lungeing is an important part of lameness examinations as the circular path may accentuate low-grade lameness. Movement asymmetries related to the circular path, to compensatory movements and to pain make the lameness evaluation complex. Scientific studies have shown high inter-rater variation when assessing lameness during straight line movement. The aim was to estimate inter- and intra-rater agreement of equine veterinarians evaluating lameness from videos of sound and lame horses during lungeing and to investigate the influence of veterinarians' experience and the objective degree of movement asymmetry on rater agreement. This was a cross-sectional observational study. Video recordings and quantitative gait analysis with inertial sensors were performed in 23 riding horses of various breeds. The horses were examined at trot on a straight line and during lungeing on soft or hard surfaces in both directions. One video sequence was recorded per condition and the horses were classified as forelimb lame, hindlimb lame or sound from objective straight line symmetry measurements. Equine veterinarians (n = 86), including 43 with >5 years of orthopaedic experience, participated in a web-based survey and were asked to identify the lamest limb on 60 videos, including 10 repeats. The agreements between (inter-rater) and within (intra-rater) veterinarians were analysed with κ statistics (Fleiss, Cohen). Inter-rater agreement κ was 0.31 (0.38/0.25 for experienced/less experienced) and higher for forelimb (0.33) than for hindlimb lameness (0.11) or soundness (0.08) evaluation. Median intra-rater agreement κ was 0.57. Inter-rater agreement was poor for less experienced raters, and for all raters when evaluating hindlimb lameness. Since identification of the lame limb/limbs is a prerequisite for successful diagnosis, treatment and recovery, the high inter-rater variation when evaluating lameness on the lunge is likely to influence the accuracy and repeatability of lameness examinations.
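Fleiss' kappa, one of the statistics named in the abstract, extends chance-corrected agreement to many raters by comparing the average proportion of agreeing rater pairs per subject with the agreement expected from the overall category proportions. The sketch below assumes every video was classified by the same number of raters and that the ratings are summarised as a subjects-by-categories count table; the function name is hypothetical, and the study's own software is not known.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from counts[i][j] = number of raters placing subject i in category j.

    Every subject must be rated by the same number (>= 2) of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # Overall proportion of all assignments falling into each category
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-subject agreement: proportion of rater pairs that agree on that subject
    p_subject = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                 for row in counts]
    p_bar = sum(p_subject) / n_subjects
    p_chance = sum(p * p for p in p_cat)
    if p_chance == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_chance) / (1 - p_chance)


# Three videos, three raters, three categories: every rater agrees on every video
print(fleiss_kappa([[3, 0, 0], [0, 3, 0], [0, 0, 3]]))  # 1.0
```

Cohen's kappa, the other statistic named, fits the intra-rater case, where each veterinarian's first and repeat classifications of the 10 repeated videos form the two "raters".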
Rater agreement of visual lameness assessment in horses during lungeing

PubMed Central

Hammarberg, M.; Egenvall, A.; Pfau, T.

2015-01-01

Reasons for performing study: Lungeing is an important part of lameness examinations as the circular path may accentuate low‐grade lameness. Movement asymmetries related to the circular path, to compensatory movements and to pain make the lameness evaluation complex. Scientific studies have shown high inter‐rater variation when assessing lameness during straight line movement. Objectives: The aim was to estimate inter‐ and intra‐rater agreement of equine veterinarians evaluating lameness from videos of sound and lame horses during lungeing and to investigate the influence of veterinarians’ experience and the objective degree of movement asymmetry on rater agreement. Study design: Cross‐sectional observational study. Methods: Video recordings and quantitative gait analysis with inertial sensors were performed in 23 riding horses of various breeds. The horses were examined at trot on a straight line and during lungeing on soft or hard surfaces in both directions. One video sequence was recorded per condition and the horses were classified as forelimb lame, hindlimb lame or sound from objective straight line symmetry measurements. Equine veterinarians (n = 86), including 43 with >5 years of orthopaedic experience, participated in a web‐based survey and were asked to identify the lamest limb on 60 videos, including 10 repeats. The agreements between (inter‐rater) and within (intra‐rater) veterinarians were analysed with κ statistics (Fleiss, Cohen). Results: Inter‐rater agreement κ was 0.31 (0.38/0.25 for experienced/less experienced) and higher for forelimb (0.33) than for hindlimb lameness (0.11) or soundness (0.08) evaluation. Median intra‐rater agreement κ was 0.57. Conclusions: Inter‐rater agreement was poor for less experienced raters, and for all raters when evaluating hindlimb lameness. Since identification of the lame limb/limbs is a prerequisite for successful diagnosis, treatment and recovery, the high inter‐rater variation
Body Shape Preferences: Associations with Rater Body Shape and Sociosexuality

PubMed Central

Price, Michael E.; Pound, Nicholas; Dunn, James; Hopkins, Sian; Kang, Jinsheng

2013-01-01

There is accumulating evidence of condition-dependent mate choice in many species, that is, individual preferences varying in strength according to the condition of the chooser. In humans, for example, people with more attractive faces/bodies, and who are higher in sociosexuality, exhibit stronger preferences for attractive traits in opposite-sex faces/bodies. However, previous studies have tended to use only relatively simple, isolated measures of rater attractiveness. Here we use 3D body scanning technology to examine associations between strength of rater preferences for attractive traits in opposite-sex bodies, and raters’ body shape, self-perceived attractiveness, and sociosexuality. For 118 raters and 80 stimuli models, we used a 3D scanner to extract body measurements associated with attractiveness (male waist-chest ratio [WCR], female waist-hip ratio [WHR], and volume-height index [VHI] in both sexes) and also measured rater self-perceived attractiveness and sociosexuality. As expected, WHR and VHI were important predictors of female body attractiveness, while WCR and VHI were important predictors of male body attractiveness. Results indicated that male rater sociosexuality scores were positively associated with strength of preference for attractive (low) VHI and attractive (low) WHR in female bodies. Moreover, male rater self-perceived attractiveness was positively associated with strength of preference for low VHI in female bodies. The only evidence of condition-dependent preferences in females was a positive association between attractive VHI in female raters and preferences for attractive (low) WCR in male bodies. No other significant associations were observed in either sex between aspects of rater body shape and strength of preferences for attractive opposite-sex body traits. These results suggest that among male raters, rater self-perceived attractiveness and sociosexuality are important predictors of preference strength for attractive opposite
Genetics Home Reference: X-linked congenital stationary night blindness

MedlinePlus

... X-linked congenital stationary night blindness is a disorder ...
Measuring Symmetry in Children With Unrepaired Cleft Lip: Defining a Standard for the Three-Dimensional Midfacial Reference Plane.

PubMed

Wu, Jia; Heike, Carrie; Birgfeld, Craig; Evans, Kelly; Maga, Murat; Morrison, Clinton; Saltzman, Babette; Shapiro, Linda; Tse, Raymond

2016-11-01

Quantitative measures of facial form to evaluate treatment outcomes for cleft lip (CL) are currently limited. Computer-based analysis of three-dimensional (3D) images provides an opportunity for efficient and objective analysis. The purpose of this study was to define a computer-based standard of identifying the 3D midfacial reference plane of the face in children with unrepaired cleft lip for measurement of facial symmetry. The 3D images of 50 subjects (35 with unilateral CL, 10 with bilateral CL, five controls) were included in this study. Five methods of defining a midfacial plane were applied to each image, including two human-based (Direct Placement, Manual Landmark) and three computer-based (Mirror, Deformation, Learning) methods. Six blinded raters (three cleft surgeons, two craniofacial pediatricians, and one craniofacial researcher) independently ranked and rated the accuracy of the defined planes. Among computer-based methods, the Deformation method performed significantly better than the others. Although human-based methods performed best, there was no significant difference compared with the Deformation method. The average correlation coefficient among raters was .4; however, it was .7 and .9 when the angular difference between planes was greater than 6° and 8°, respectively. Raters can agree on the 3D midfacial reference plane in children with unrepaired CL using digital surface mesh. The Deformation method performed best among computer-based methods evaluated and can be considered a useful tool to carry out automated measurements of facial symmetry in children with unrepaired cleft lip.
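Comparing two candidate midfacial planes by the angle between them reduces to the angle between their normal vectors. The abstract does not say how the angular difference was computed, so the sketch below assumes each plane is given by a normal vector in the 3D image's coordinate system; the function name and the example normals are hypothetical.

```python
import math


def plane_angle_deg(normal_a, normal_b):
    """Angle in degrees between two planes, given their (not necessarily unit) normal vectors."""
    dot = sum(a * b for a, b in zip(normal_a, normal_b))
    norm_a = math.sqrt(sum(a * a for a in normal_a))
    norm_b = math.sqrt(sum(b * b for b in normal_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("normal vectors must be non-zero")
    # abs(): a plane's normal has no preferred direction, so angles fall in [0, 90] degrees
    cos_theta = abs(dot) / (norm_a * norm_b)
    return math.degrees(math.acos(min(1.0, cos_theta)))  # min() guards against rounding > 1


print(round(plane_angle_deg((1, 0, 0), (1, 0.1, 0)), 2))  # 5.71
```

Flipping either normal's sign leaves the result unchanged, which matters when different methods return planes with opposite orientations.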
Measurement of glenohumeral joint translation using real-time ultrasound imaging: A physiotherapist and sonographer intra-rater and inter-rater reliability study.

PubMed

Rathi, Sangeeta; Taylor, Nicholas F; Gee, Jamie; Green, Rodney A

2016-12-01

Ultrasonography is an economical and non-invasive method for measuring real-time joint movements. Although physiotherapists are increasingly using ultrasound imaging for rotator cuff disorders, there is a lack of evidence on their reliability in using ultrasonography to measure glenohumeral translation. The aim of this study was to evaluate the reliability of a physiotherapist in measuring anterior and posterior glenohumeral joint translation with ultrasound. Study design: within-day reliability. Anterior and posterior glenohumeral translations were measured at rest, in response to passive accessory motion testing force, and with isometric internal and external rotation in 12 young healthy adults. All the measurements were made in real time by a physiotherapist and an experienced sonographer in two positions (neutral and abducted) and in two views (anterior and posterior). Intra-rater and inter-rater reliability were expressed using intraclass correlation coefficients (ICC) and measurement error (mm). Intra-rater reliability was good for both raters (physiotherapist ICC: 0.86-0.98; sonographer ICC: 0.85-0.96). The inter-rater reliability between the physiotherapist and sonographer was moderate to good for posterior measurements (ICC 0.50-0.75) and poor to moderate for anterior measurements (ICC 0.31-0.53). For both intra-rater and inter-rater measurements, posterior translation was more reliable than the anterior translation with smaller measurement errors (posterior: 0.1-0.2 mm, anterior: 0.2-0.3 mm). A physiotherapist with minimal training was reliable in measuring glenohumeral joint translations. The ultrasound method was reliable for repeated measurement of both anterior and posterior glenohumeral translations with posterior measurements being more reliable than anterior. This method is recommended for future research to investigate the stabilising role of rotator cuff muscles. Copyright © 2016 Elsevier Ltd. All rights reserved.
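The study reports measurement error in millimetres alongside unitless ICCs. A common way to express such error is the standard error of measurement, SEM = SD × √(1 − ICC), although the abstract does not say which formula was used here. The sketch below assumes that formula with the sample standard deviation; the function name, the measurements and the ICC of 0.84 are hypothetical.

```python
import math
from statistics import stdev


def standard_error_of_measurement(scores, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the sample standard deviation of observed scores."""
    if not 0 <= icc <= 1:
        raise ValueError("this formula needs an ICC between 0 and 1")
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate SD")
    return stdev(scores) * math.sqrt(1 - icc)


# Hypothetical translation measurements in mm, with a hypothetical ICC of 0.84
translations_mm = [2.0, 2.5, 3.0, 3.5, 4.0]
print(round(standard_error_of_measurement(translations_mm, 0.84), 2))  # 0.32
```

The result is in the same units as the scores (mm here), so it complements the ICC by stating how large a typical measurement error is in clinical terms.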
The inter and intra rater reliability of the Netball Movement Screening Tool.

PubMed

Reid, Duncan A; Vanweerd, Rebecca J; Larmer, Peter J; Kingstone, Rachel

2015-05-01

To establish the inter- and intra-rater reliability of the Netball Movement Screening Tool (NMST) for screening adolescent female netball players. Inter- and intra-rater reliability study. Forty secondary school netball players were recruited to take part in the study. Twenty subjects were screened simultaneously and independently by two raters to ascertain inter-rater agreement. Twenty subjects were scored by rater one on two occasions, separated by a week, to ascertain intra-rater agreement. Inter- and intra-rater agreement was assessed utilising the two-way mixed intraclass correlation coefficient and weighted kappa statistics. No significant demographic differences were found between the inter- and intra-rater groups of subjects. Intraclass correlation coefficients demonstrated excellent inter-rater (two-way mixed intraclass correlation coefficient 0.84, standard error of measurement 0.25) and intra-rater (two-way mixed intraclass correlation coefficient 0.96, standard error of measurement 0.13) reliability for the overall Netball Movement Screening Tool score, and substantial-to-excellent inter-rater (two-way mixed intraclass correlation coefficients 0.65-1.0) and intra-rater (two-way mixed intraclass correlation coefficients 0.79-0.96) reliability for the component scores of the Netball Movement Screening Tool. Kappa statistics showed substantial to poor inter-rater (κ = 0.75-0.32) and intra-rater (κ = 0.77-0.27) agreement for individual tests of the NMST. The Netball Movement Screening Tool may be a reliable screening tool for adolescent netball players; however, the individual test scores have low reliability. The screening tool can be administered reliably by raters with similar levels of training in the tool but variable clinical experience. Ongoing research is needed to ascertain whether the Netball Movement Screening Tool is a valid tool for identifying increased injury risk in netball players. Copyright © 2014 Sports
Rating the raters: assessing the quality of Hamilton rating scale for depression clinical interviews in two industry-sponsored clinical drug trials.

PubMed

Engelhardt, Nina; Feiger, Alan D; Cogger, Kenneth O; Sikich, Dawn; DeBrota, David J; Lipsitz, Joshua D; Kobak, Kenneth A; Evans, Kenneth R; Potter, William Z

2006-02-01

The quality of clinical interviews conducted in industry-sponsored clinical drug trials is an important but frequently overlooked variable that may influence the outcome of a study. We evaluated the quality of Hamilton Rating Scale for Depression (HAM-D) clinical interviews performed at baseline in 2 similar multicenter, randomized, placebo-controlled depression trials sponsored by 2 pharmaceutical companies. A total of 104 audiotaped HAM-D clinical interviews were evaluated by a blinded expert reviewer for interview quality using the Rater Applied Performance Scale (RAPS). The RAPS assesses adherence to a structured interview guide, clarification of and follow-up to patient responses, neutrality, rapport, and adequacy of information obtained. HAM-D interviews were brief and cursory and the quality of interviews was below what would be expected in a clinical drug trial. Thirty-nine percent of the interviews were conducted in 10 minutes or less, and most interviews were rated fair or unsatisfactory on most RAPS dimensions. Results from our small sample illustrate that the clinical interview skills of raters who administered the HAM-D were below what many would consider acceptable. Evaluation and training of clinical interview skills should be considered as part of a rater training program.
Emotions and assessment: considerations for rater-based judgements of entrustment.

PubMed

Gomez-Garibello, Carlos; Young, Meredith

2018-03-01

Assessment is subject to increasing scrutiny as medical education transitions towards a competency-based medical education (CBME) model. Traditional perspectives on the roles of assessment emphasise high-stakes, summative assessment, whereas CBME argues for formative assessment. Revisiting conceptualisations about the roles and formats of assessment in medical education provides opportunities to examine understandings and expectations of the assessment of learners. The rater's act of generating scores might be considered an exclusively cognitive exercise; however, the current literature has drawn attention to the notion of raters as measurement instruments, thereby attributing additional factors to their decision-making processes, such as social considerations and intuition. Yet the literature has not comprehensively examined the influence of raters' emotions during assessment. In this narrative review, we explore the influence of raters' emotions in the assessment of learners. We summarise existing literature that describes the role of emotions in assessment broadly, and rater-based assessment specifically, across a variety of fields. The literature related to emotions and assessment is examined from different perspectives, including those of educational context, decision making and rater cognition. We use the concept of entrustable professional activities (EPAs) to contextualise a discussion of the ways in which raters' emotions may have meaningful impacts on the decisions they make in clinical settings. This review summarises findings from different perspectives and identifies areas for consideration for the role of emotion in rater-based assessment, and areas for future research. We identify and discuss three different interpretations of the influence of raters' emotions during assessments: (i) emotions lead to biased decision making; (ii) emotions contribute random noise to assessment, and (iii) emotions constitute legitimate sources of information that
An Investigation of Rater Cognition in the Assessment of Projects

ERIC Educational Resources Information Center

Crisp, Victoria

2012-01-01

In the United Kingdom, the majority of national assessments involve human raters. The processes by which raters determine the scores to award are central to the assessment process and affect the extent to which valid inferences can be made from assessment outcomes. Thus, understanding rater cognition has become a growing area of research in the…
Rater Accuracy and Training Group Effects in Expert- and Supervisor-Based Monitoring Systems

ERIC Educational Resources Information Center

Baird, Jo-Anne; Meadows, Michelle; Leckie, George; Caro, Daniel

2017-01-01

This study evaluated rater accuracy with rater-monitoring data from high stakes examinations in England. Rater accuracy was estimated with cross-classified multilevel modelling. The data included face-to-face training and monitoring of 567 raters in 110 teams, across 22 examinations, giving a total of 5500 data points. Two rater-monitoring systems…
A novel approach to rater training and certification in multinational trials.

PubMed

Jeglic, Elizabeth; Kobak, Kenneth A; Engelhardt, Nina; Williams, Janet B W; Lipsitz, Joshua D; Salvucci, Donna; Bryson, Heather; Bellew, Kevin

2007-07-01

Clinical trials are becoming increasingly international in scope. Global studies pose unique challenges in training and calibrating raters owing to language and cultural differences. Recent findings that poorly conducted interviews reduce study power make attention to raters' clinical skills critical. In this study, 109 raters from 14 countries went through a two-step certification process on the Hamilton Depression and Anxiety Rating Scales: (i) an online didactic tutorial on scoring conventions, and (ii) applied clinical training, consisting of small language-specific groups in which raters took turns interviewing patients while observed by an expert trainer, and observation and evaluation of individual interviews. Translators were used when native-language trainers were unavailable. Those who were unable to attend the startup meeting received the training individually via telephone. Results showed a significant improvement in raters' knowledge of scoring conventions, with the mean number of correct answers on the 20-item test improving from 14.59 to 17.83, P<0.0001. In addition, raters' clinical skills improved significantly, with the mean score on the Rater Applied Performance Scale improving from 10.25 at the first testing to 11.31 at the second, P=0.003. These results support the efficacy of this applied training model in improving raters' applied clinical skills in multinational trials.
Effect of rater training on reliability and accuracy of mini-CEX scores: a randomized, controlled trial.

PubMed

Cook, David A; Dupras, Denise M; Beckman, Thomas J; Thomas, Kris G; Pankratz, V Shane

2009-01-01

Mini-CEX scores assess resident competence. Rater training might improve mini-CEX score interrater reliability, but evidence is lacking. Evaluate a rater training workshop using interrater reliability and accuracy. Randomized trial (immediate versus delayed workshop) and single-group pre/post study (randomized groups combined). Academic medical center. Fifty-two internal medicine clinic preceptors (31 randomized and 21 additional workshop attendees). The workshop included rater error training, performance dimension training, behavioral observation training, and frame of reference training using lecture, video, and facilitated discussion. The delayed group received no intervention until after the posttest. Mini-CEX ratings at baseline (just before the workshop for the workshop group) and four weeks later using videotaped resident-patient encounters; mini-CEX ratings of live resident-patient encounters one year preceding and one year following the workshop; rater confidence using mini-CEX. Among 31 randomized participants, interrater reliabilities in the delayed group (baseline intraclass correlation coefficient [ICC] 0.43, follow-up 0.53) and workshop group (baseline 0.40, follow-up 0.43) were not significantly different (p = 0.19). Mean ratings were similar at baseline (delayed 4.9 [95% confidence interval 4.6-5.2], workshop 4.8 [4.5-5.1]) and follow-up (delayed 5.4 [5.0-5.7], workshop 5.3 [5.0-5.6]; p = 0.88 for interaction). For the entire cohort, rater confidence (1 = not confident, 6 = very confident) improved from mean (SD) 3.8 (1.4) to 4.4 (1.0), p = 0.018. Interrater reliability for ratings of live encounters (entire cohort) was higher after the workshop (ICC 0.34) than before (ICC 0.18) but the standard error of measurement was similar for both periods. Rater training did not improve interrater reliability or accuracy of mini-CEX scores. clinicaltrials.gov identifier NCT00667940
Virtual Raters for Reproducible and Objective Assessments in Radiology

NASA Astrophysics Data System (ADS)

Kleesiek, Jens; Petersen, Jens; Döring, Markus; Maier-Hein, Klaus; Köthe, Ullrich; Wick, Wolfgang; Hamprecht, Fred A.; Bendszus, Martin; Biller, Armin

2016-04-01

Volumetric measurements in radiologic images are important for monitoring tumor growth and treatment response. To make these more reproducible and objective, we introduce the concept of virtual raters (VRs). A virtual rater is obtained by combining the knowledge of machine-learning algorithms trained on past annotations of multiple human raters with the instantaneous rating of one human expert. Thus, the human expert is virtually guided by several experts. To evaluate the approach, we perform experiments with multi-channel magnetic resonance imaging (MRI) data sets. In addition to gross tumor volume (GTV), we also investigate subcategories such as edema, contrast-enhancing and non-enhancing tumor. The first data set consists of N = 71 longitudinal follow-up scans of 15 patients suffering from glioblastoma (GB). The second data set comprises N = 30 scans of low- and high-grade gliomas. For comparison, we computed Pearson correlation, the intraclass correlation coefficient (ICC) and the Dice score. Virtual raters always led to an improvement in inter- and intra-rater agreement. Comparing the 2D Response Assessment in Neuro-Oncology (RANO) measurements with the volumetric measurements of the virtual raters yields a deviating rating in one third of the cases. Hence, we believe that our approach will have an impact on the evaluation of clinical studies as well as on routine imaging diagnostics.
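The Dice score mentioned in the abstract measures the overlap of two segmentations as 2|A ∩ B| / (|A| + |B|). The sketch below is a generic illustration, not the authors' code: it assumes each segmentation is supplied as a flattened list of 0/1 voxel labels of equal length, and it assumes, as a labeled convention, that two empty segmentations count as perfect agreement. The function name is hypothetical.

```python
def dice(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    overlap = sum(x and y for x, y in zip(a, b))  # voxels labelled by both raters
    size_a, size_b = sum(a), sum(b)
    if size_a + size_b == 0:
        return 1.0  # assumption: two empty segmentations are treated as agreeing
    return 2 * overlap / (size_a + size_b)
```

For example, two four-voxel masks [1, 1, 0, 0] and [1, 0, 1, 0] share one labelled voxel out of two each, giving a Dice score of 0.5.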
The new GRID Hamilton Rating Scale for Depression demonstrates excellent inter-rater reliability for inexperienced and experienced raters before and after training.

PubMed

Tabuse, Hideaki; Kalali, Amir; Azuma, Hideki; Ozaki, Norio; Iwata, Nakao; Naitoh, Hiroshi; Higuchi, Teruhiko; Kanba, Shigenobu; Shioe, Kunihiko; Akechi, Tatsuo; Furukawa, Toshi A

2007-09-30

The Hamilton Rating Scale for Depression (HAMD) is the de facto international gold standard for the assessment of depression. There are some criticisms, however, especially with regard to its inter-rater reliability, due to the lack of standardized questions or explicit scoring procedures. The GRID-HAMD was developed to provide standardized explicit scoring conventions and a structured interview guide for administration and scoring of the HAMD. We developed the Japanese version of the GRID-HAMD and examined its inter-rater reliability among experienced and inexperienced clinicians (n=70), how rater characteristics may affect it, and how training can improve it in the course of a model training program using videotaped interviews. The results showed that the inter-rater reliability of the GRID-HAMD total score was excellent to almost perfect and those of most individual items were also satisfactory to excellent, both with experienced and inexperienced raters, and both before and after the training. With its standardized definitions, questions and detailed scoring conventions, the GRID-HAMD appears to be the best achievable set of interview guides for the HAMD and can provide a solid tool for highly reliable assessment of depression severity.
Inter-rater Reliability of Three Musculoskeletal Physical Examination Techniques Used to Assess Motion in Three Planes While Standing

PubMed Central

Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John

2012-01-01

Objective The objective of the study was to measure the reliability between examiners of three basic maneuvers of the Total Body Functional Profile© physical examination test. The hypothesis was that musculoskeletal health care providers of different disciplines could reliably use the three basic maneuvers as part of the musculoskeletal physical examination. Design A prospective observational study was conducted. Twenty-eight adult volunteers were measured on both the left and right sides by two independent raters on a single occasion. Setting The subjects were recruited through advertisements placed by the orthopedic department at a tertiary university. Participants 28 volunteers were recruited and completed the study. The volunteers were between 18 and 51 years of age, had no symptoms in the lower extremity or spine, had no previous history of surgery or tumor involving the lower extremity, and had no medical conditions that would preclude participation. Assessment On a single occasion, the two examiners assessing each volunteer were blinded to their own and each other's measurements. Each examiner assessed the distance of frontal and sagittal plane lunge and angle of motion for transverse plane testing. Main Outcome Measurements Inter-rater agreement is expressed with intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals (CIs). The difference between raters is reported with 95% CIs. Baseline demographics, UCLA, and Harris hip questionnaires were completed by all participants. Results The UCLA and Harris hip scores showed no significant activity restrictions or pain limitations in any participant. The inter-rater reliability for sagittal, frontal, and transverse plane matrix testing was good, with ICCs of 0.86 (95% CI 0.77, 0.91), 0.90 (95% CI 0.84, 0.94), and 0.85 (95% CI 0.75, 0.91), respectively. The rater reliability between disciplines for transverse, sagittal and frontal plane matrix testing was good with ICCs of 0.89 (95% CI 0.80, 0
Inter-rater reliability of three musculoskeletal physical examination techniques used to assess motion in three planes while standing.

PubMed

Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John

2009-07-01

The objective of the study was to measure the reliability between examiners of 3 basic maneuvers of the Total Body Functional Profile physical examination test. The hypothesis was that musculoskeletal health care providers of different disciplines could reliably use the 3 basic maneuvers as part of the musculoskeletal physical examination. A prospective observational study was conducted. Twenty-eight adult volunteers were measured on both the left and right sides by 2 independent raters on a single occasion. The subjects were recruited through advertisements placed by the orthopedic department at a tertiary university. Twenty-eight volunteers were recruited and completed the study. The volunteers were between 18 and 51 years of age, had no symptoms in the lower extremity or spine, had no previous history of surgery or tumor involving the lower extremity, and had no medical conditions that would preclude participation. On a single occasion, the 2 examiners assessing each volunteer were blinded to their own and each other's measurements. Each examiner assessed the distance of frontal and sagittal plane lunge and angle of motion for transverse plane testing. Inter-rater agreement is expressed with intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals (CIs). The difference between raters is reported with 95% CIs. Baseline demographics, University of California Los Angeles (UCLA), and Harris hip questionnaires were completed by all participants. The UCLA and Harris hip scores showed no significant activity restrictions or pain limitations in any participant. The inter-rater reliability for sagittal, frontal, and transverse plane matrix testing was good, with ICCs of 0.86 (95% CI 0.77-0.91), 0.90 (95% CI 0.84-0.94), and 0.85 (95% CI 0.75-0.91), respectively. The rater reliability between disciplines for transverse, sagittal, and frontal plane matrix testing was good with ICCs of 0.89 (95% CI 0.80-0.94), 0.88 (95% CI 0.79-0.94), and 0.90 (95% CI 0
Rater variables associated with ITER ratings.

PubMed

Paget, Michael; Wu, Caren; McIlwrick, Joann; Woloschuk, Wayne; Wright, Bruce; McLaughlin, Kevin

2013-10-01

Advocates of holistic assessment consider the ITER a more authentic way to assess performance. But this assessment format is subjective and, therefore, susceptible to rater bias. Here our objective was to study the association between rater variables and ITER ratings. In this observational study our participants were clerks at the University of Calgary and preceptors who completed online ITERs between February 2008 and July 2009. Our outcome variable was global rating on the ITER (rated 1-5), and we used a generalized estimating equation model to identify variables associated with this rating. Students were rated "above expected level" or "outstanding" on 66.4 % of 1050 online ITERs completed during the study period. Two rater variables attenuated ITER ratings: the log transformed time taken to complete the ITER [β = -0.06, 95 % confidence interval (-0.10, -0.02), p = 0.002], and the number of ITERs that a preceptor completed over the time period of the study [β = -0.008 (-0.02, -0.001), p = 0.02]. In this study we found evidence of leniency bias that resulted in two thirds of students being rated above expected level of performance. This leniency bias appeared to be attenuated by delay in ITER completion, and was also blunted in preceptors who rated more students. As all biases threaten the internal validity of the assessment process, further research is needed to confirm these and other sources of rater bias in ITER ratings, and to explore ways of limiting their impact.
A Qualitative Analysis of Rater Behavior on an L2 Speaking Assessment

ERIC Educational Resources Information Center

Kim, Hyun Jung

2015-01-01

Human raters are normally involved in L2 performance assessment; as a result, rater behavior has been widely investigated to reduce rater effects on test scores and to provide validity arguments. Yet raters' cognition and use of rubrics in their actual rating have rarely been explored qualitatively in L2 speaking assessments. In this study three…

A sequential test for assessing observed agreement between raters.

PubMed

Bersimis, Sotiris; Sachlas, Athanasios; Chakraborti, Subha

2018-01-01

Assessing the agreement between two or more raters is an important topic in medical practice. Existing techniques for categorical data are based on contingency tables. This is often an obstacle in practice, as one may have to wait a long time to collect a sample of subjects large enough to construct the contingency table. In this paper, we introduce a nonparametric sequential test for assessing agreement that can be applied as data accrue and does not require a contingency table, thus facilitating a rapid assessment of agreement. The proposed test is based on the cumulative sum of the number of disagreements between the two raters and a suitable statistic representing the waiting time until the cumulative sum exceeds a predefined threshold. We treat the cases of testing two raters' agreement with respect to one or more characteristics and using two or more classification categories, the case of extreme disagreement between the two raters, and finally the case of testing agreement among more than two raters. The numerical investigation shows that the proposed test has excellent performance. Compared with existing methods, the proposed method appears to require a significantly smaller sample size for equivalent power. Moreover, the proposed method is easily generalizable and brings the problem of assessing agreement between two or more raters on one or more characteristics under a unified framework, thus providing an easy-to-use tool for medical practitioners. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
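The abstract describes its statistic only in outline: a running count of disagreements and the waiting time until that count crosses a threshold. The sketch below illustrates that idea for two raters and is not the authors' test or critical values. It assumes, for illustration, that the threshold h counts as crossed when the cumulative disagreements reach h, and that under the null hypothesis each subject is independently a disagreement with some acceptable probability p0. Under that assumption, stopping within t subjects is the event Binomial(t, p0) >= h. All names are hypothetical.

```python
from math import comb


def waiting_time(rater_a, rater_b, h):
    """Return the 1-based subject index at which cumulative disagreements first reach h.

    Returns None if fewer than h disagreements occur in the data seen so far.
    """
    disagreements = 0
    for t, (a, b) in enumerate(zip(rater_a, rater_b), start=1):
        if a != b:
            disagreements += 1
            if disagreements >= h:
                return t
    return None


def prob_stop_by(t, h, p0):
    """P(T_h <= t) when each subject is independently a disagreement with probability p0.

    Reaching h disagreements within t subjects is the event Binomial(t, p0) >= h.
    """
    return sum(comb(t, j) * p0 ** j * (1 - p0) ** (t - j) for j in range(h, t + 1))
```

With hypothetical ratings, a short waiting time t = waiting_time(a, b, h) paired with a small prob_stop_by(t, h, p0) suggests that disagreements are accumulating faster than the acceptable rate p0 would predict; because the count is updated subject by subject, no contingency table is ever built.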
Consensus Conference Follow-up: Inter-rater Reliability Assessment of the Best Evidence in Emergency Medicine (BEEM) Rater Scale, a Medical Literature Rating Tool for Emergency Physicians

PubMed Central

Worster, Andrew; Kulasegaram, Kulamakan; Carpenter, Christopher R.; Vallera, Teresa; Upadhye, Suneel; Sherbino, Jonathan; Haynes, R. Brian

2011-01-01

Background Studies published in general and specialty medical journals have the potential to improve emergency medicine (EM) practice, but there can be delayed awareness of this evidence because emergency physicians (EPs) are unlikely to read most of these journals. Also, not all published studies are intended for or ready for clinical practice application. The authors developed “Best Evidence in Emergency Medicine” (BEEM) to ameliorate these problems by searching for, identifying, appraising, and translating potentially practice-changing studies for EPs. An initial step in the BEEM process is the BEEM rater scale, a novel tool for EPs to collectively evaluate the relative clinical relevance of EM-related studies found in more than 120 journals. The BEEM rater process was designed to serve as a clinical relevance filter to identify those studies with the greatest potential to affect EM practice. Therefore, only those studies identified by BEEM raters as having the highest clinical relevance are selected for the subsequent critical appraisal process and, if found methodologically sound, are promoted as the best evidence in EM. Objectives The primary objective was to measure inter-rater reliability (IRR) of the BEEM rater scale. Secondary objectives were to determine the minimum number of EP raters needed for the BEEM rater scale to achieve acceptable reliability and to compare performance of the scale against a previously published evidence rating system, the McMaster Online Rating of Evidence (MORE), in an EP population. Methods The authors electronically distributed the title, conclusion, and a PubMed link for 23 recently published studies related to EM to a volunteer group of 134 EPs. The volunteers answered two demographic questions and rated the articles using one of two randomly assigned seven-point Likert scales, the BEEM rater scale (n = 68) or the MORE scale (n = 66), over two separate administrations. The IRR of each scale was measured using
The Effect of Year-to-Year Rater Variation on IRT Linking

ERIC Educational Resources Information Center

Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg

2005-01-01

Year-to-year rater variation may result in constructed response (CR) parameter changes, making CR items inappropriate to use in anchor sets for linking or equating. This study demonstrates how rater severity affected the writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology…
Statistically Comparing the Performance of Multiple Automated Raters across Multiple Items

ERIC Educational Resources Information Center

Kieftenbeld, Vincent; Boyer, Michelle

2017-01-01

Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
An evaluation of the predictive validity and inter-rater reliability of clinical diagnostic criteria for senile dementia of Lewy body type.

PubMed

McKeith, I G; Fairbairn, A F; Bothwell, R A; Moore, P B; Ferrier, I N; Thompson, P; Perry, R H

1994-05-01

Several recent autopsy studies suggest that senile dementia of Lewy body type (SDLT) may be the second most common neuropathologic cause of dementia in the elderly, accounting for 7 to 30% of all cases. Operational criteria for the antemortem clinical diagnosis of SDLT have already been proposed by our group. Their performance is now examined by randomizing the case notes from a new series of SDLT, Alzheimer, and multi-infarct dementia patients for psychiatric assessment by four raters of varying clinical experience who were blind to the pathologic diagnosis. Using the SDLT criteria, the two most experienced raters agreed in 94% of cases (kappa = 0.87), with the least experienced rater agreeing in 78% (kappa = 0.50). Diagnostic specificity for SDLT was uniformly high (90.0 to 97.0%). The mean sensitivity of detection was 74% and was greater for the experienced (90.0%) than for the least experienced (55%) clinician. The antemortem identification of SDLT patients can therefore be achieved with a high degree of diagnostic specificity using such operationalized criteria, although there remains a minority of patients who present with either "typical" Alzheimer-type symptoms or with paranoid or delusional symptoms in the absence of substantial cognitive impairment. Sensitivity to neuroleptics may be a useful diagnostic pointer in these patients.
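The abstract pairs percent agreement with Cohen's kappa, which corrects observed agreement p_o for the agreement p_e expected by chance from each rater's marginal totals: kappa = (p_o - p_e) / (1 - p_e). The abstract gives no contingency table, so the sketch below is generic and the counts in its usage note are hypothetical. It writes kappa as 1 minus the ratio of observed to chance-expected disagreement, which is algebraically the same as the formula above for the unweighted case and extends to linear or quadratic weights for ordered categories.

```python
def cohen_kappa(table, weights=None):
    """Cohen's kappa from a k x k table of counts (rows: rater 1, columns: rater 2).

    weights=None gives unweighted kappa; "linear" or "quadratic" gives weighted kappa.
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the largest possible disagreement.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d ** 2

    observed = sum(w(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row_tot[i] * col_tot[j] / n ** 2
                   for i in range(k) for j in range(k))
    return 1 - observed / expected
```

For a hypothetical 2 x 2 table [[20, 5], [10, 15]], observed agreement is 70% but chance agreement is 50%, so kappa is only about 0.4. This is why two rater pairs with similar percent agreement can report quite different kappa values.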
The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

ERIC Educational Resources Information Center

Yun, Jiyeo

2017-01-01

Since researchers began investigating automated scoring systems in writing assessment, they have examined relationships between human and machine scoring and have proposed evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Resistance versus Balance Training to Improve Postural Control in Parkinson's Disease: A Randomized Rater Blinded Controlled Study

PubMed Central

Schlenstedt, Christian; Paschen, Steffen; Kruse, Annika; Raethjen, Jan; Weisser, Burkhard; Deuschl, Günther

2015-01-01

Background Reduced muscle strength is an independent risk factor for falls and related to postural instability in individuals with Parkinson’s disease. The ability of resistance training to improve postural control still remains unclear. Objective To compare resistance training with balance training to improve postural control in people with Parkinson’s disease. Methods 40 patients with idiopathic Parkinson’s disease (Hoehn & Yahr stage 2.5–3.0) were randomly assigned to resistance or balance training (2x/week for 7 weeks). Assessments were performed at baseline, 8- and 12-weeks follow-up: primary outcome: Fullerton Advanced Balance (FAB) scale; secondary outcomes: center of mass analysis during surface perturbations, Timed-up-and-go-test, Unified Parkinson’s Disease Rating Scale, Clinical Global Impression, gait analysis, maximal isometric leg strength, PDQ-39, Beck Depression Inventory. Clinical tests were videotaped and analysed by a second rater, blind to group allocation and assessment time. Results 32 participants (resistance training: n = 17, balance training: n = 15; 8 drop-outs) were analyzed at 8-weeks follow-up. No significant difference was found in the FAB scale when comparing the effects of the two training types (p = 0.14; effect size (Cohen’s d) = -0.59). Participants in the resistance training group, but not those in the balance training group, significantly improved on the FAB scale (resistance training: +2.4 points, Cohen’s d = -0.46; balance training: +0.3 points, Cohen’s d = -0.08). Within the resistance training group, improvements of the FAB scale were significantly correlated with improvements of rate of force development and stride time variability. No significant differences were found in the secondary outcome measures when comparing the training effects of both training types. Conclusions The difference between resistance and balance training to improve postural control in people with Parkinson’s disease was small and not
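The abstract reports effect sizes as Cohen's d without stating which standard deviation was used as the denominator. The sketch below assumes the textbook two-group form, the difference in means divided by the pooled standard deviation; the authors may have used baseline SDs or change scores instead, and the function name and sample values are hypothetical. The sign of d depends only on which group is subtracted from which.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Cohen's d: difference in means divided by the pooled (sample) standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_var)
```

For example, two hypothetical groups [1, 2, 3] and [2, 3, 4] differ by one point in mean, and each has a sample SD of 1, so cohens_d returns -1.0 in that order and 1.0 with the arguments swapped.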
Explaining sexual harassment judgments: looking beyond gender of the rater.

PubMed

O'Connor, Maureen; Gutek, Barbara A; Stockdale, Margaret; Geer, Tracey M; Melançon, Renée

2004-02-01

In two decades of research on sexual harassment, one finding that appears repeatedly is that gender of the rater influences judgments about sexual harassment such that women are more likely than men to label behavior as sexual harassment. Yet, sexual harassment judgments are complex, particularly in situations that culminate in legal proceedings, and this one variable, gender, may have been overemphasized to the exclusion of other situational and rater characteristic variables. Moreover, why do gender differences appear? As Wiener and his colleagues have done (R. L. Wiener et al., 2002; R. L. Wiener & L. Hurt, 2000; R. L. Wiener, L. Hurt, B. Russell, K. Mannen, & C. Gasper, 1997), this study attempts to look beyond gender to answer this question. In the studies reported here, raters (undergraduates and community adults) either read a written scenario or viewed a videotaped reenactment of a sexual harassment trial. The nature of the work environment was manipulated to see what, if any, effect the context would have on gender effects. Additionally, a number of rater characteristics beyond gender were measured, including ambivalent sexism attitudes of the raters, their judgments of complainant credibility, and self-referencing that might help explain rater judgments. Respondent gender, work environment, and community vs. student sample differences produced reliable differences in sexual harassment ratings in both the written and video trial versions of the study. The gender and sample differences in the sexual harassment ratings, however, are explained by a model which incorporates hostile sexism, perceptions of the complainant's credibility, and raters' own ability to put themselves in the complainant's position (self-referencing).
Sustained-release bupropion versus naltrexone in the treatment of pathological gambling: a preliminary blind-rater study.

PubMed

Dannon, Pinhas N; Lowengrub, Katherine; Musin, Ernest; Gonopolski, Yehudit; Kotler, Moshe

2005-12-01

Pathological gambling (PG) is a relatively common and highly disabling impulse control disorder. A range of psychotherapeutic agents, including selective serotonin reuptake inhibitors, mood stabilizers, and opioid antagonists, has been shown to be effective in the treatment of PG. The use of selective serotonin reuptake inhibitors and opioid antagonists for PG is consistent with the observation that PG shares features of both the obsessive-compulsive spectrum disorders and addictive disorders. The aim of the study is to compare the effectiveness of sustained-release bupropion versus naltrexone in the treatment of PG. Thirty-six male pathological gamblers were enrolled in our study. A comprehensive psychiatric diagnostic evaluation was performed at baseline on all patients, and patients were screened for symptoms of gambling, depression, and anxiety using the South Oaks Gambling Screen, the Hamilton Depression Rating Scale, the Hamilton Anxiety Rating Scale, and the Clinical Global Impression-Severity Scale. In addition, the patients completed self-report questionnaires about their demographic status. Patients were randomized into 2 groups and received either naltrexone (n = 19) or sustained-release bupropion (n = 17) for 12 weeks in a parallel fashion. Treatment response was monitored using the Clinical Global Impression-Improvement Scale, which was administered at weeks 2, 4, 8, and 12. Patients were also assessed for the presence of gambling behavior via an unstructured interview, which was also performed at weeks 2, 4, 6, 8, and 12. Raters were blind to the study treatment. The majority of patients responded well to the drug treatment. Twelve of 17 patients in the sustained-release bupropion group completed the 12-week study, and 13 of 19 naltrexone patients completed the study. Nine (75%) of the 12 completers were rated as full responders in the sustained-release bupropion group versus 10 (76%) of 13 in the naltrexone group. Three (25%) of 12 completers in the
Ultrasound measures of tendon thickness: Intra-rater, Inter-rater and Inter-machine reliability.

PubMed

Del Baño-Aledo, María Elena; Martínez-Payá, Jacinto Javier; Ríos-Díaz, José; Mejías-Suárez, Silvia; Serrano-Carmona, Sergio; de Groot-Ferrando, Ana

2017-01-01

Ultrasound imaging is often used by physiotherapists and other healthcare professionals, but the reliability of image acquisition with different ultrasound machines is unknown. The objective was to compare the intra-rater, inter-rater and inter-machine reliability of thickness measurements of the plantar fascia (PF), Achilles tendon (AT), patellar tendon (PT) and elbow common extensor tendon (ECET) with musculoskeletal ultrasound imaging (MSUS). Tendon thickness was measured in four anatomical structures (14 participants, 28 images per tendon) by two sonographers using two different ultrasound machines. Intraclass correlation coefficients (ICCs) and Bland-Altman plots were calculated, as were the standard error of measurement (SEM) and minimum detectable difference (MDD). Inter-rater reliability was excellent for AT (ICC=0.98; 95% CI 0.96-0.99) and very good for PT (ICC=0.85; 95% CI 0.67-0.93) and ECET (ICC=0.81; 95% CI 0.72-0.94). Reliability for PF was moderate, with an ICC of 0.63 (95% CI 0.20-0.83). Bland-Altman plots for inter-machine reliability showed a mean difference of 1 µm for PF measurements and mean differences of 4 µm and 20 µm for AT and PT, respectively. The relative SEMs were below 7% and the MDCs were below 0.7 mm. The MSUS reliability in measuring the thickness of the four tendons is confirmed by homogeneous readings within sonographers, between operators and between different machines. Tendon thickness can be measured reliably on different ultrasound devices, which is an important step forward in the use of this technique in daily clinical practice and research. Level of evidence: III.
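The abstract summarises inter-machine agreement with Bland-Altman plots and reports minimum detectable values below 0.7 mm, without stating the formulas. The sketch below assumes the conventional choices: the bias is the mean of the paired differences, the 95% limits of agreement are bias ± 1.96 × SD of the differences, and the minimum detectable change at 95% confidence is 1.96 × sqrt(2) × SEM, where the sqrt(2) accounts for error in both measurements being compared. The function names and sample thicknesses (in mm) are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(machine_a, machine_b):
    """Return (bias, lower, upper): mean paired difference and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(machine_a, machine_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


def minimum_detectable_change(sem, z=1.96):
    """MDC at 95% confidence under the assumed convention z * sqrt(2) * SEM."""
    return z * sqrt(2) * sem
```

For instance, if one machine read every hypothetical tendon exactly 1 mm thicker than the other, bland_altman would return a bias of 1 with both limits also at 1, because the differences have no spread. Plotting each difference against the pair's mean would then give the Bland-Altman plot itself.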
Rater Effects in Clinical Performance Ratings of Surgery Residents

ERIC Educational Resources Information Center

Iramaneerat, Cherdsak; Myford, Carol M.

2006-01-01

A multi-faceted Rasch measurement (MFRM) approach was used to analyze clinical performance ratings of 24 first-year residents in one surgery residency program in Thailand to investigate three types of rater effects: leniency, rater inconsistency, and restriction of range. Faculty from 14 surgical services rated the clinical performance of…
Training Raters to Assess Adult ADHD: Reliability of Ratings

ERIC Educational Resources Information Center

Adler, Lenard A.; Spencer, Thomas; Faraone, Stephen V.; Reimherr, Fred W.; Kelsey, Douglas; Michelson, David; Biederman, Joseph

2005-01-01

The standardization of ADHD ratings in adults is important given their differing symptom presentation. The authors investigated the agreement and reliability of rater standardization in a large-scale trial of atomoxetine in adults with ADHD. Training of 91 raters for the investigator-administered ADHD Rating Scale (ADHDRS-IV-Inv) occurred prior to…
Rater Severity in Large-Scale Assessment: Is It Invariant?

ERIC Educational Resources Information Center

McQueen, Joy; Congdon, Peter J.

A study was conducted to investigate the stability of rater severity over an extended rating period. Multifaceted Rasch analysis was applied to ratings of writing performances of 8,285 primary school (elementary) students. Each performance was rated on two performance dimensions by two trained raters over a period of 7 rating days. Performances…
Investigating Raters' Development of Rating Ability on a Second Language Speaking Assessment

ERIC Educational Resources Information Center

Kim, Hyun Jung

2011-01-01

The purpose of the study was to investigate the extent to which raters coming from diverse backgrounds exhibited different levels of rating ability while scoring speaking performances. The study also aimed to examine how raters with different backgrounds could develop their rating ability over time. For this purpose, raters' background…
Rater Expertise in a Second Language Speaking Assessment: The Influence of Training and Experience

ERIC Educational Resources Information Center

Davis, Lawrence Edward

2012-01-01

Speaking performance tests typically employ raters to produce scores; accordingly, variability in raters' scoring decisions has important consequences for test reliability and validity. One such source of variability is the rater's level of expertise in scoring. Therefore, it is important to understand how raters' performance is influenced by…
Detecting rater bias using a person-fit statistic: a Monte Carlo simulation study.

PubMed

Aubin, André-Sébastien; St-Onge, Christina; Renaud, Jean-Sébastien

2018-04-01

With the Standards voicing concern for the appropriateness of response processes, we need to explore strategies that would allow us to identify inappropriate rater response processes. Although certain statistics can be used to help detect rater bias, their use is complicated either by a lack of data about their actual power to detect rater bias or by the difficulty of applying them in the context of health professions education. This exploratory study aimed to establish whether the use of lz to detect rater bias is worth pursuing. We conducted a Monte Carlo simulation study to investigate the power of a specific detection statistic, the standardized log-likelihood lz person-fit statistic (PFS). Our primary outcome was the detection rate of biased raters, namely raters whom we manipulated into being either stringent (giving lower scores) or lenient (giving higher scores), using the lz statistic while controlling for the number of biased raters in a sample (6 levels) and the rate of bias per rater (6 levels). Overall, stringent raters (M = 0.84, SD = 0.23) were easier to detect than lenient raters (M = 0.31, SD = 0.28). More biased raters were easier to detect than less biased raters (60% bias: M = 0.62, SD = 0.37; 10% bias: M = 0.43, SD = 0.36). The lz PFS appears to offer interesting potential for identifying biased raters. We observed detection rates as high as 90% for stringent raters, for whom we manipulated more than half their checklist. Although these results are promising, we cannot generalize them to the use of PFS with estimated item/station parameters or real data. Such studies should be conducted to assess the feasibility of using PFS to identify rater bias.
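The lz statistic compares the log-likelihood of an observed 0/1 response pattern with its expected value under the model, standardized by its variance. In the usual convention, large negative values flag misfit. The Python sketch below assumes the item probabilities are known, as in the simulated setting described above. The checklist probabilities and the "stringent" pattern are hypothetical illustrations.

```python
import math


def lz_statistic(responses: list[int], probs: list[float]) -> float:
    """Standardized log-likelihood person-fit statistic lz for 0/1 responses.

    probs[i] is the model-implied probability of a 1 on item i; here it is
    assumed known rather than estimated from data.
    """
    l0 = 0.0        # observed log-likelihood of the pattern
    expected = 0.0  # expected log-likelihood under the model
    variance = 0.0  # variance of the log-likelihood under the model
    for u, p in zip(responses, probs):
        logit = math.log(p / (1.0 - p))
        l0 += u * math.log(p) + (1 - u) * math.log(1.0 - p)
        expected += p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
        variance += p * (1.0 - p) * logit ** 2
    return (l0 - expected) / math.sqrt(variance)


# Hypothetical checklist: probability that each item is marked "done".
probs = [0.9, 0.85, 0.8, 0.75, 0.7, 0.6]
consistent = [1, 1, 1, 1, 1, 0]  # pattern in line with the model
stringent = [0, 0, 0, 1, 1, 0]   # easy items withheld, as a stringent rater might

print(f"consistent lz = {lz_statistic(consistent, probs):.2f}")
print(f"stringent lz  = {lz_statistic(stringent, probs):.2f}")
```

A one-sided cutoff such as lz < -1.645 is one common flagging rule. It is an assumption here, not the threshold used in the study.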
Individual Differences in Rater Decision-Making Style: An Exploratory Mixed-Methods Study

ERIC Educational Resources Information Center

Baker, Beverly Anne

2012-01-01

Researchers of high-stakes, subjectively scored writing assessments have done much work to better understand the process that raters go through in applying a rating scale to a language performance to arrive at a score. However, there is still unexplained, systematic variability in rater scoring that resists rater training (see Hoyt & Kerns,…
Factors Influencing Mini-CEX Rater Judgments and Their Practical Implications: A Systematic Literature Review.

PubMed

Lee, Victor; Brain, Keira; Martin, Jenepher

2017-06-01

At present, little is known about how mini-clinical evaluation exercise (mini-CEX) raters translate their observations into judgments and ratings. The authors of this systematic literature review aim both to identify the factors influencing mini-CEX rater judgments in the medical education setting and to translate these findings into practical implications for clinician assessors. The authors searched for internal and external factors influencing mini-CEX rater judgments in the medical education setting from 1980 to 2015 using the Ovid MEDLINE, PsycINFO, ERIC, PubMed, and Scopus databases. They extracted the following information from each study: country of origin, educational level, study design and setting, type of observation, occurrence of rater training, provision of feedback to the trainee, research question, and identified factors influencing rater judgments. The authors also conducted a quality assessment for each study. Seventeen articles met the inclusion criteria. The authors identified both internal and external factors that influence mini-CEX rater judgments. They subcategorized the internal factors into intrinsic rater factors, judgment-making factors (conceptualization, interpretation, attention, and impressions), and scoring factors (scoring integration and domain differentiation). The current theories of rater-based judgment have not helped clinicians resolve the issues of rater idiosyncrasy, bias, gestalt, and conflicting contextual factors; therefore, the authors believe the most important solution is to increase the justification of rater judgments through the use of specific narrative and contextual comments, which are more informative for trainees. Finally, more real-world research is required to bridge the gap between the theory and practice of rater cognition.
Inter-rater Reliability of Real-Time Ultrasound to Measure Acromiohumeral Distance.

PubMed

Mackenzie, Tanya Anne; Bdaiwi, Alya H; Herrington, Lee; Cools, Ann

2016-07-01

Background: Real-time ultrasound (RTUS) has been suggested as a reliable measure of acromiohumeral distance. However, to date, no rigorous assessment and reporting of inter-rater reliability of this method has been performed with the shoulder in a neutral position or with active and passive arm abduction. Objective: To assess intrasession inter-rater reliability of using RTUS to measure acromiohumeral distance with the shoulder in a neutral position and with 60° active and passive abduction. Design: Inter-rater intrasession reliability of repeated measures. Setting: Human performance laboratory. Participants: Twenty persons (12 male and 8 female) with an average age of 29.86 years (standard deviation, 7.8). Methods: In an inter-rater, intrasession study, RTUS was used to measure the acromiohumeral distance with the shoulder in a neutral position and with 60° of both active and passive abduction. Main Outcome Measurements: Acromiohumeral distance. Results: Intraclass correlation coefficient ICC(2,1) scores ranged from 0.65 to 0.88 (standard error of the mean = 0.81-1.2 mm and minimal detectable differences with 95% confidence = 2.2-2.3 mm) for inter-rater intrasession reliability. Conclusions: RTUS was found to have fair to good inter-rater reliability as a tool to measure acromiohumeral distance with the shoulder in a neutral position and with 60° of both active and passive arm abduction. Copyright © 2016 American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
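In the Shrout and Fleiss notation, ICC(2,1) is the two-way random-effects, absolute-agreement, single-rater coefficient. The Python sketch below computes it from two-way ANOVA mean squares. It assumes complete data, meaning every rater measures every subject, and the ratings are made-up values for illustration only.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the measurement of subject i by rater j (complete data).
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical distances (mm) from two raters on three subjects.
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]  # rater 2 always reads 1 mm higher
print(icc_2_1(identical), icc_2_1(offset))
```

Because ICC(2,1) measures absolute agreement, a constant offset between raters lowers it even when the raters rank subjects identically. A consistency-type ICC would not penalize that offset.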
A Comparison of Assessment Methods and Raters in Product Creativity

ERIC Educational Resources Information Center

Lu, Chia-Chen; Luh, Ding-Bang

2012-01-01

Although previous studies have attempted to rate product creativity with raters of differing experience by adopting the Consensual Assessment Technique (CAT), the validity of replacing CAT with another measurement tool has not been adequately tested. This study aimed to compare raters with different levels of experience (expert vs.…

Measuring the Impact of Rater Negotiation in Writing Performance Assessment

ERIC Educational Resources Information Center

Trace, Jonathan; Janssen, Gerriet; Meier, Valerie

2017-01-01

Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Exploring the Role of First Impressions in Rater-Based Assessments

ERIC Educational Resources Information Center

Wood, Timothy J.

2014-01-01

Medical education relies heavily on assessment formats that require raters to assess the competence and skills of learners. Unfortunately, there are often inconsistencies and variability in the scores raters assign. To ensure the scores from these assessment tools have validity, it is important to understand the underlying cognitive processes that…
Inter-rater reliability of twelve diagnostic systems of schizophrenia.

PubMed

Helmes, E; Landmark, J; Kazarian, S S

1983-05-01

The present and past symptomatology of 31 chronic schizophrenics was rated by four independent judges, two experienced clinical psychiatrists and two psychiatric residents, in a context more representative of actual clinical practice than most research studies. Ratings were made on 64 symptoms derived from 12 diagnostic systems, based on either live or videotaped interviews for present symptomatology and case records for past symptomatology. Inter-rater reliabilities were higher for present than for past symptoms, and in general did not approach those reported for highly trained raters. There were no differences between live and videotaped interviews. Diagnostic systems differed widely in rater agreement. The most consistent across both past and present symptomatology were the systems of Langfeldt, Schneider, and DSM-III, for which the level of reliability was consistent with other studies.
The Influence of Training and Experience on Rater Performance in Scoring Spoken Language

ERIC Educational Resources Information Center

Davis, Larry

2016-01-01

Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…
The Stability of Rater Severity in Large-Scale Assessment Programs.

ERIC Educational Resources Information Center

Congdon, Peter J.; McQueen, Joy

2000-01-01

Studied the stability of rater severity over an extended rating period by applying multifaceted Rasch analysis to ratings of 16 raters of writing performances of 8,285 elementary school students. Findings cast doubt on the practice of using a single calibration of rater severity as the basis for adjustment of person measures. (SLD)
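In a many-facet (multifaceted) Rasch model, rater severity enters the log-odds of a rating alongside person ability and task difficulty. The Python sketch below uses a simplified dichotomous version. The writing ratings in these studies use multi-category scales, so a rating-scale facets model would normally be fitted. All logit values are hypothetical.

```python
import math


def facets_probability(ability: float, difficulty: float, severity: float) -> float:
    """P(positive rating) under a dichotomous many-facet Rasch model.

    log(P / (1 - P)) = ability - difficulty - severity, all in logits.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty - severity)))


# Hypothetical logits: one student, one writing task, one rater whose
# severity drifts from day 1 to day 7.
student, task = 1.0, 0.5
for day, severity in [("day 1", -0.5), ("day 7", 0.8)]:
    p = facets_probability(student, task, severity)
    print(f"{day}: P = {p:.3f}")
```

If severity drifts like this over the rating period, adjusting person measures with a single day-1 severity estimate would misstate the students rated later. That is the concern the abstract raises.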
Rater reliability and construct validity of a mobile application for posture analysis

PubMed Central

Szucs, Kimberly A.; Brown, Elena V. Donoso

2018-01-01

[Purpose] Measurement of posture is important for those with a clinical diagnosis as well as researchers aiming to understand the impact of faulty postures on the development of musculoskeletal disorders. A reliable, cost-effective and low tech posture measure may be beneficial for research and clinical applications. The purpose of this study was to determine rater reliability and construct validity of a posture screening mobile application in healthy young adults. [Subjects and Methods] Pictures of subjects were taken in three standing positions. Two raters independently digitized the static standing posture image twice. The app calculated posture variables, including sagittal and coronal plane translations and angulations. Intra- and inter-rater reliability were calculated using the appropriate ICC models for complete agreement. Construct validity was determined through comparison of known groups using repeated measures ANOVA. [Results] Intra-rater reliability ranged from 0.71 to 0.99. Inter-rater reliability was good to excellent for all translations. ICCs were stronger for translations than for angulations. The construct validity analysis found that the app was able to detect the change in the four variables selected. [Conclusion] The posture mobile application has demonstrated strong rater reliability and preliminary evidence of construct validity. This application may have utility in clinical and research settings. PMID:29410561
Intra and Inter-Rater Reliability of Screening for Movement Impairments: Movement Control Tests from The Foundation Matrix

PubMed Central

Mischiati, Carolina R.; Comerford, Mark; Gosford, Emma; Swart, Jacqueline; Ewings, Sean; Botha, Nadine; Stokes, Maria; Mottram, Sarah L.

2015-01-01

Pre-season screening is well established within the sporting arena, and aims to enhance performance and reduce injury risk. With the increasing need to identify potential injury with greater accuracy, a new risk assessment process has been produced: The Performance Matrix (a battery of movement control tests). As with any new method of objective testing, it is fundamental to establish whether the same results can be reproduced between examiners and by the same examiner on consecutive occasions. This study aimed to determine the intra-rater test re-test and inter-rater reliability of tests from a component of The Performance Matrix, The Foundation Matrix. Twenty participants were screened by two experienced musculoskeletal therapists using nine tests to assess the ability to control movement during specific tasks. Movement evaluation criteria for each test were rated as pass or fail. The therapists observed participants in real time, and tests were recorded on video to enable repeated ratings four months later to examine intra-rater reliability (videos rated two weeks apart). Overall test percentage agreement was 87% for inter-rater reliability; 98% (Rater 1) and 94% (Rater 2) for test re-test reliability; and 75% for real-time versus video. Intraclass correlation coefficients (ICCs) were excellent between raters (0.81) and within raters (Rater 1, 0.96; Rater 2, 0.88) but poor for real-time versus video (0.23). Reliability for individual components of each test was more variable: inter-rater, 68-100%; intra-rater, 88-100% for Rater 1 and 75-100% for Rater 2; and real-time versus video, 31-100%. Cohen’s kappa values ranged from 0.0 to 1.0 for inter-rater reliability; from 0.6 to 1.0 (Rater 1) and from -0.1 to 1.0 (Rater 2) for intra-rater reliability; and from -0.1 to 1.0 for real-time versus video. It is concluded that both inter- and intra-rater reliability of tests in The Foundation Matrix are acceptable when rated by experienced therapists. Recommendations are made for modifying some of the criteria to improve reliability where
Adjusting for Year to Year Rater Variation in IRT Linking--An Empirical Evaluation

ERIC Educational Resources Information Center

Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg

2005-01-01

The main purpose of this study was to illustrate a polytomous IRT-based linking procedure that adjusts for rater variations. Test scores from two administrations of a statewide reading assessment were used. An anchor set of Year 1 students' constructed responses were rescored by Year 2 raters. To adjust for year-to-year rater variation in IRT…
Inter-Rater Reliability of Neck Reflex Points in Women with Chronic Neck Pain.

PubMed

Weinschenk, Stefan; Göllner, Richard; Hollmann, Markus W; Hotz, Lorenz; Picardi, Susanne; Hubbert, Katharina; Strowitzki, Thomas; Meuser, Thomas

2016-01-01

Neck reflex points (NRP) are tender soft tissue areas of the cervical region that display reflectory changes in response to chronic inflammations of correlated regions in the visceral cranium. Six bilateral areas, NRP C0, C1, C2, C3, C4 and C7, are detectable by palpating the lateral neck. We investigated the inter-rater reliability of NRP to assess their potential clinical relevance. 32 consecutive patients with chronic neck pain were examined for NRP tenderness by an experienced physician and an inexperienced medical student in a blinded design. A detailed description of the palpation technique is included in this section. Absence of pain was defined as pain index (PI) = 0, slight tenderness = 1, and marked pain = 2. Findings were evaluated either by pair-wise Cohen's kappa (κ) or by percentage of agreement (PA). Examiners identified 40% and 41% of positive NRP, respectively (PI > 0, physician: 155, student: 157) with a slight preference for the left side (1.2:1). The number of patients identified with >6 positive NRP by the examiners was similar (13 vs. 12 patients). κ values ranged from 0.52 to 0.95. The overall kappa was κ = 0.80 for the left and κ = 0.74 for the right side. PA varied from 78.1% to 96.9% with strongest agreement at NRP C0, NRP C2, and NRP C7. Inter-rater agreement was independent of patients' age, gender, body mass index and examiner's experience. The high reproducibility suggests the clinical relevance of NRP in women. © 2016 S. Karger GmbH, Freiburg.
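Percentage of agreement and Cohen's kappa for two examiners can be computed directly from their paired ratings. Kappa corrects the observed agreement for the agreement expected by chance from each examiner's marginal frequencies. The Python sketch below treats the 0/1/2 pain index as nominal (unweighted kappa). This is an assumption, since the abstract does not say whether a weighted kappa was used, and the ratings are hypothetical.

```python
from collections import Counter


def agreement_stats(r1: list, r2: list) -> tuple[float, float]:
    """Return (percentage agreement, unweighted Cohen's kappa) for two raters."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of the two raters' marginal proportions per category.
    p_exp = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / n ** 2
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_obs - p_exp) / (1.0 - p_exp)
    return 100.0 * p_obs, kappa


# Hypothetical pain-index ratings (0, 1, 2) of eight reflex points.
physician = [0, 1, 2, 0, 1, 2, 0, 0]
student = [0, 1, 2, 0, 2, 2, 0, 1]
pa, kappa = agreement_stats(physician, student)
print(f"PA = {pa:.1f}%, kappa = {kappa:.3f}")  # PA = 75.0%, kappa = 0.619
```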
Investigating Differences between American and Indian Raters in Assessing TOEFL iBT Speaking Tasks

ERIC Educational Resources Information Center

Wei, Jing; Llosa, Lorena

2015-01-01

This article reports on an investigation of the role raters' language background plays in raters' assessment of test takers' speaking ability. Specifically, this article examines differences between American and Indian raters in their scores and scoring processes when rating Indian test takers' responses to the Test of English as a Foreign…
Rater Cognition: Implications for Validity

ERIC Educational Resources Information Center

Bejar, Isaac I.

2012-01-01

The scoring process is critical in the validation of tests that rely on constructed responses. Documenting that readers carry out the scoring in ways consistent with the construct and measurement goals is an important aspect of score validity. In this article, rater cognition is approached as a source of support for a validity argument for scores…
Inter-rater reliability of surgical reviews for AREN03B2: a COG renal tumor committee study.

PubMed

Hamilton, Thomas E; Barnhart, Douglas; Gow, Kenneth; Ferrer, Fernando; Kandel, Jessica; Glick, Richard; Dasgupta, Roshni; Naranjo, Arlene; He, Ying; Gratias, Eric; Geller, James; Mullen, Elizabeth; Ehrlich, Peter

2014-01-01

The Children's Oncology Group (COG) renal tumor study (AREN03B2) requires real-time central review of radiology, pathology, and the surgical procedure to determine appropriate risk-based therapy. The purpose of this study was to determine the inter-rater reliability of the surgical reviews. Of the first 3200 enrolled AREN03B2 patients, a sample of 100 enriched for blood vessel involvement, spill, rupture, and lymph node involvement was selected for analysis. The surgical assessment was then performed independently by two blinded surgical reviewers and compared to the original assessment, which had been completed by another of the committee surgeons. Variables assessed included surgeon-determined local tumor stage, overall disease stage, type of renal procedure performed, presence of tumor rupture, occurrence of intraoperative tumor spill, blood vessel involvement, presence of peritoneal implants, and interpretation of residual disease. Inter-rater reliability was measured using Fleiss' kappa with two-sided hypothesis tests (kappa, p-value). Local tumor stage agreed across all 3 reviews except in one case (kappa = 0.9775, p<0.001). Similarly, overall disease stage had excellent agreement (0.9422, p<0.001). There was strong agreement for type of renal procedure (0.8357, p<0.001), presence of tumor rupture (0.6858, p<0.001), intraoperative tumor spill (0.6493, p<0.001), and blood vessel involvement (0.6470, p<0.001). Variables with lower agreement were the presence of peritoneal implants (0.2753, p<0.001) and interpretation of residual disease status (0.5310, p<0.001). The inter-rater reliability of the surgical review is high, based on the strong consistency of the 3 independent review results. This analysis provides validation and establishes precedent for real-time central surgical review to determine treatment assignment in a risk-based strategy for multimodal cancer therapy. © 2014.
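Fleiss' kappa generalizes chance-corrected agreement to more than two raters. It does this by averaging, across subjects, the proportion of agreeing rater pairs and comparing that average with the agreement expected from the overall category proportions. The Python sketch below covers only the point estimate; the hypothesis test that produces the p-values is not shown. The counts are hypothetical "rupture yes/no" judgements by 3 reviewers on 4 cases.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa.

    counts[i][j] = number of raters assigning subject i to category j;
    every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # P_i: proportion of agreeing rater pairs for subject i.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # p_j: share of all assignments that fell in category j.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical: columns are [rupture, no rupture], 3 reviewers per case.
cases = [[3, 0], [0, 3], [3, 0], [2, 1]]
print(round(fleiss_kappa(cases), 3))  # 0.625
```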
Assessing the influence of rater and subject characteristics on measures of agreement for ordinal ratings.

PubMed

Nelson, Kerrie P; Mitani, Aya A; Edwards, Don

2017-09-10

Widespread inconsistencies are commonly observed between physicians' ordinal classifications of screening test results such as mammography. These discrepancies have motivated large-scale agreement studies where many raters contribute ratings. The primary goal of these studies is to identify factors related to physicians and to patients' test results that may lead to stronger consistency between raters' classifications. While ordered categorical scales are frequently used to classify screening test results, very few statistical approaches exist to model agreement between multiple raters. Here we develop a flexible and comprehensive approach to assess the influence of rater and subject characteristics on agreement between multiple raters' ordinal classifications in large-scale agreement studies. Our approach is based upon the class of generalized linear mixed models. Novel summary model-based measures are proposed to assess agreement between all raters, or a subgroup of raters such as experienced physicians. Hypothesis tests are described to formally identify factors, such as physicians' level of experience, that play an important role in improving consistency of ratings between raters. We demonstrate how unique characteristics of individual raters can be assessed via conditional modes generated during the modeling process. Simulation studies are presented to demonstrate the performance of the proposed methods and summary measure of agreement. The methods are applied to a large-scale mammography agreement study to investigate the effects of rater and patient characteristics on the strength of agreement between radiologists. Copyright © 2017 John Wiley & Sons, Ltd.
Two Models of Raters in a Structured Oral Examination: Does It Make a Difference?

ERIC Educational Resources Information Center

Touchie, Claire; Humphrey-Murto, Susan; Ainslie, Martha; Myers, Kathryn; Wood, Timothy J.

2010-01-01

Oral examinations have become more standardized over recent years. Traditionally a small number of raters were used for this type of examination. Past studies suggested that more raters should improve reliability. We compared the results of a multi-station structured oral examination using two different rater models, those based in a station,…
Rater reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS).

PubMed

Baker, Nancy A; Cook, James R; Redfern, Mark S

2009-01-01

This paper describes the inter-rater and intra-rater reliability, and the concurrent validity of an observational instrument, the Keyboard Personal Computer Style instrument (K-PeCS), which assesses stereotypical postures and movements associated with computer keyboard use. Three trained raters independently rated the video clips of 45 computer keyboard users to ascertain inter-rater reliability, and then re-rated a sub-sample of 15 video clips to ascertain intra-rater reliability. Concurrent validity was assessed by comparing the ratings obtained using the K-PeCS to scores developed from a 3D motion analysis system. The overall K-PeCS had excellent reliability [inter-rater: intra-class correlation coefficients (ICC)=.90; intra-rater: ICC=.92]. Most individual items on the K-PeCS had from good to excellent reliability, although six items fell below ICC=.75. Those K-PeCS items that were assessed for concurrent validity compared favorably to the motion analysis data for all but two items. These results suggest that most items on the K-PeCS can be used to reliably document computer keyboarding style.
Summary measures of agreement and association between many raters' ordinal classifications.

PubMed

Mitani, Aya A; Freer, Phoebe E; Nelson, Kerrie P

2017-10-01

Interpretation of screening tests such as mammograms usually requires a radiologist's subjective visual assessment of images, often resulting in substantial discrepancies between radiologists' classifications of subjects' test results. In clinical screening studies to assess the strength of agreement between experts, multiple raters are often recruited to assess subjects' test results using an ordinal classification scale. However, using traditional measures of agreement in some studies is challenging because of the presence of many raters, the use of an ordinal classification scale, and unbalanced data. We assess and compare the performance of existing measures of agreement and association, as well as a newly developed model-based measure of agreement, in three large-scale clinical screening studies involving many raters' ordinal classifications. We also conduct a simulation study to demonstrate the key properties of the summary measures. The assessment of agreement and association varied according to the choice of summary measure. Some measures were influenced by the underlying prevalence of disease and raters' marginal distributions and/or were limited in use to balanced data sets where every rater classifies every subject. Our simulation study indicated that popular measures of agreement and association are sensitive to the underlying disease prevalence. Model-based measures provide a flexible approach for calculating agreement and association and are robust to missing and unbalanced data as well as the underlying disease prevalence. Copyright © 2017 Elsevier Inc. All rights reserved.
Rater Training to Support High-Stakes Simulation-Based Assessments

PubMed Central

Feldman, Moshe; Lazzara, Elizabeth H.; Vanderbilt, Allison A.; DiazGranados, Deborah

2013-01-01

Competency-based assessment and an emphasis on obtaining higher-level outcomes that reflect physicians’ ability to demonstrate their skills have created a need for more advanced assessment practices. Simulation-based assessments provide medical education planners with tools to better evaluate the 6 Accreditation Council for Graduate Medical Education (ACGME) and American Board of Medical Specialties (ABMS) core competencies by affording physicians opportunities to demonstrate their skills within a standardized and replicable testing environment, thus filling a gap in the current state of assessment for regulating the practice of medicine. Observational performance assessments derived from simulated clinical tasks and scenarios enable stronger inferences about the skill level a physician may possess, but also introduce the potential of rater errors into the assessment process. This article reviews the use of simulation-based assessments for certification, credentialing, initial licensure, and relicensing decisions and describes rater training strategies that may be used to reduce rater errors, increase rating accuracy, and enhance the validity of simulation-based observational performance assessments. PMID:23280532
Specific agreement on dichotomous outcomes can be calculated for more than two raters.

PubMed

de Vet, Henrica C W; Dikmans, Rieky E; Eekhout, Iris

2017-03-01

For assessing interrater agreement, the concepts of observed agreement and specific agreement have been proposed. The situation of two raters and dichotomous outcomes has been described, whereas multiple raters are often involved. We aim to extend it to more than two raters and examine how to calculate agreement estimates and 95% confidence intervals (CIs). As an illustration, we used a reliability study that includes the scores of four plastic surgeons classifying photographs of breasts of 50 women after breast reconstruction into "satisfied" or "not satisfied." In a simulation study, we checked the hypothesized sample size for calculation of 95% CIs. For m raters, all pairwise tables [ie, m (m - 1)/2] were summed. Then, the discordant cells were averaged before observed and specific agreements were calculated. The total number (N) in the summed table is m (m - 1)/2 times larger than the number of subjects (n); in the example, the 6 pairwise tables for m = 4 raters give N = 300, compared to n = 50 subjects. A correction of n√(m - 1) was appropriate to find 95% CIs comparable to bootstrapped CIs. The concepts of observed agreement and specific agreement can be extended to more than two raters with a valid estimation of the 95% CIs. Copyright © 2017 Elsevier Inc. All rights reserved.
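The pooling procedure above can be sketched in Python. The sketch sums all m(m − 1)/2 pairwise 2×2 tables and averages the two discordant cells. It then computes observed agreement, (a + d)/N, and positive and negative specific agreement, 2a/(2a + b + c) and 2d/(2d + b + c). The CI correction is not implemented, and the 4-rater, 5-subject ratings are hypothetical.

```python
from itertools import combinations


def multi_rater_specific_agreement(scores: list[list[int]]) -> dict[str, float]:
    """Observed, positive and negative specific agreement for m >= 2 raters.

    scores[r][s] is rater r's 0/1 rating of subject s. All pairwise 2x2
    tables are summed, then the two discordant cells are averaged.
    """
    a = b = c = d = 0  # a: both 1, d: both 0, b and c: the two discordant cells
    for r1, r2 in combinations(scores, 2):
        for x, y in zip(r1, r2):
            if x == 1 and y == 1:
                a += 1
            elif x == 0 and y == 0:
                d += 1
            elif x == 1:
                b += 1
            else:
                c += 1
    disc = (b + c) / 2      # averaged discordant cell (symmetric summed table)
    n_total = a + d + 2 * disc  # N = m(m - 1)/2 * n
    return {
        "N": n_total,
        "observed": (a + d) / n_total,
        "positive": 2 * a / (2 * a + 2 * disc),
        "negative": 2 * d / (2 * d + 2 * disc),
    }


# Hypothetical: 4 raters classify 5 subjects as satisfied (1) / not satisfied (0).
ratings = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]
print(multi_rater_specific_agreement(ratings))
```

Averaging the discordant cells does not change the specific-agreement values, since b + c is preserved. It makes the summed table symmetric, so it no longer depends on which rater of each pair is listed first.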
Inter- and intra-rater reliability of calliper-based lymph node measurement in dogs with peripheral nodal lymphomas.

PubMed

Childress, M O; Fulkerson, C M; Lahrman, S A; Weng, H-Y

2016-08-01

The purpose of this study was to assess reliability of lymph node measurements between and within raters in dogs with nodal lymphomas. Three raters measured lymph nodes from 20 dogs twice prior to and once after administering chemotherapy. Sum tumour volume (TV) and sum longest diameter (LD) of all lymph nodes at each time point, and the percent change in measurements following chemotherapy, were calculated for each dog. Inter- and intra-rater reliability were assessed with the intraclass correlation coefficient (ICC). ICC for inter-rater sum TV and sum LD prior to chemotherapy were 0.86 and 0.80, respectively. ICC for inter-rater sum TV and sum LD after chemotherapy were 0.95 and 0.91, respectively. ICC for percent change in sum TV and sum LD were 0.96 and 0.94, respectively. ICC for intra-rater reliability ranged from 0.90 to 0.98 for each rater. Inter- and intra-rater reliability in measurements among the three raters was good to excellent. © 2014 John Wiley & Sons Ltd.

The Effects of Primacy on Rater Cognition: An Eye-Tracking Study

ERIC Educational Resources Information Center

Ballard, Laura

2017-01-01

Rater scoring has an impact on writing test reliability and validity. Thus, there has been a continued call for researchers to investigate issues related to rating (Crusan, 2015). Investigating the scoring process and understanding how raters arrive at particular scores are critical "because the score is ultimately what will be used in making…
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies.

PubMed

Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry; Kunz, Regina

2017-01-25

To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Systematic review and narrative synthesis of reproducibility studies. Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. From 4562 references, 101 full-text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real-life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range intraclass correlation coefficient 0.86 to κ −0.10). Inter-rater reliability was poor in six studies (37
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies

PubMed Central

Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry

2017-01-01

Objectives To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Design Systematic review and narrative synthesis of reproducibility studies. Data sources Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Eligibility criteria Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. Results From 4562 references, 101 full-text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real-life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range intraclass correlation coefficient 0.86 to κ−0
Rater Agreement Indexes for Performance Assessment.

ERIC Educational Resources Information Center

Burry-Stock, Judith A.; And Others

1996-01-01

It is argued that interrater agreement is a psychometric property which is theoretically different from classic reliability. Formulas are presented to illustrate a set of algebraically equivalent rater agreement indices that are intended to provide educational and psychological researchers with a practical way to establish a measure of rater…
Magnetic resonance enterography has good inter-rater agreement and diagnostic accuracy for detecting inflammation in pediatric Crohn disease.

PubMed

Church, Peter C; Greer, Mary-Louise C; Cytter-Kuint, Ruth; Doria, Andrea S; Griffiths, Anne M; Turner, Dan; Walters, Thomas D; Feldman, Brian M

2017-05-01

Magnetic resonance enterography (MRE) is increasingly relied upon for noninvasive assessment of intestinal inflammation in Crohn disease. However, very few studies have examined the diagnostic accuracy of individual MRE signs in children. We have created an MR-based multi-item measure of intestinal inflammation in children with Crohn disease - the Pediatric Inflammatory Crohn's MRE Index (PICMI). To inform item selection for this instrument, we explored the inter-rater agreement and diagnostic accuracy of individual MRE signs of inflammation in pediatric Crohn disease and compared our findings with the reference standards of the weighted Pediatric Crohn's Disease Activity Index (wPCDAI) and C-reactive protein (CRP). In this cross-sectional single-center study, MRE studies in 48 children with diagnosed Crohn disease (66% male, median age 15.5 years) were reviewed by two independent radiologists for the presence of 15 MRE signs of inflammation. Using kappa statistics we explored inter-rater agreement for each MRE sign across 10 anatomical segments of the gastrointestinal tract. We correlated MRE signs with the reference standards using correlation coefficients. Radiologists measured the length of inflamed bowel in each segment of the gastrointestinal tract. In each segment, MRE signs were scored as either binary (0-absent, 1-present), or ordinal (0-absent, 1-mild, 2-marked). These segmental scores were weighted by the length of involved bowel and were summed to produce a weighted score per patient for each MRE sign. Using a combination of wPCDAI≥12.5 and CRP≥5 to define active inflammation, we calculated area under the receiver operating characteristic curve (AUC) for each weighted MRE sign. Bowel wall enhancement, wall T2 hyperintensity, wall thickening and wall diffusion-weighted imaging (DWI) hyperintensity were most commonly identified. Inter-rater agreement was best for decreased motility and wall DWI hyperintensity (kappa≥0.64). Correlation between MRE
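The abstract's scoring pipeline can be sketched as follows. Segment scores are weighted by the length of involved bowel and summed per patient. An AUC then compares these scores against a binary reference standard. Several parts are assumptions for illustration: the data layout, the names `weighted_sign_score` and `auc`, and the rank-based (Mann-Whitney) formulation of the AUC with ties counted as one half. The study does not state its software or tie handling.

```python
from typing import Sequence, Tuple

def weighted_sign_score(segments: Sequence[Tuple[int, float]]) -> float:
    """Length-weighted score for one MRE sign in one patient (hypothetical helper).

    Each item is (segment_score, involved_length): the score is 0/1 (binary)
    or 0/1/2 (ordinal); the length is the inflamed bowel in that segment.
    """
    return sum(score * length for score, length in segments)

def auc(scores_active: Sequence[float], scores_inactive: Sequence[float]) -> float:
    """AUC as P(active > inactive) + 0.5 * P(tie) over all patient pairs."""
    total = 0.0
    for p in scores_active:
        for q in scores_inactive:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(scores_active) * len(scores_inactive))
```

Patients would be split into "active" and "inactive" groups by the abstract's wPCDAI/CRP criterion before calling `auc` on their weighted scores.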
Intra- and inter-rater agreement between an ophthalmologist and mid-level ophthalmic personnel to diagnose retinal diseases based on fundus photographs at a primary eye center in Nepal: the Bhaktapur Retina Study.

PubMed

Thapa, Raba; Bajimaya, Sanyam; Bouman, Renske; Paudyal, Govinda; Khanal, Shankar; Tan, Stevie; Thapa, Suman S; van Rens, Ger

2016-07-18

Early detection can reduce irreversible blindness from retinal diseases. This study aims to assess the intra- and inter-rater agreement of retinal pathologies observed on fundus photographs between an ophthalmologist and two mid-level ophthalmic personnel (MLOPs). A population-based, cross-sectional study was conducted among subjects aged 60 years and above in the Bhaktapur district of Nepal. Fundus photographs of 500 eyes of 500 subjects were assessed. The macula-centered 45-degree photographs were graded twice by one ophthalmologist and two MLOPs. Intra-rater and inter-rater agreements were assessed for the ophthalmologist and the MLOPs. Mean age was 70.22 years ± 6.94 (SD). Retinal pathologies were observed in 55.6 % of photographs (age-related macular degeneration: 34.2 %; diabetic retinopathy: 4.2 %; retinal vein occlusion: 3.8 %). Twelve (2.4 %) fundus pictures were non-gradable. The intra-rater agreement for overall retinal pathologies, retinal hemorrhage, and maculopathy was substantial both for the ophthalmologist and for the MLOPs. There was moderate inter-rater agreement between the ophthalmologist and the first MLOP on second rating for overall retinal pathologies [kappa (k); 95 % CI = 0.59 (0.51-0.66)], retinal hemorrhage [k; 95 % CI = 0.60 (0.41-0.78)], and maculopathy [k; 95 % CI = 0.52 (0.43-0.60)]. Inter-rater agreement between the ophthalmologist and the second MLOP for second rating was moderate for overall retinal pathologies [k; 95 % CI = 0.52 (0.44-0.60)], substantial for retinal hemorrhage [k; 95 % CI = 0.68 (0.52-0.84)], and moderate for maculopathy [k; 95 % CI = 0.59 (0.50-0.67)]. There is moderate agreement between the MLOPs and the ophthalmologist in grading fundus photographs for retinal hemorrhages and maculopathy.
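The kappa values above are reported with 95% CIs. A minimal Python sketch of Cohen's kappa for two raters is shown below. It builds an approximate interval from the simple large-sample standard error sqrt(p_o(1 − p_o) / (n(1 − p_e)²)). The abstract does not say how its intervals were obtained, so this standard-error formula is an assumption; more refined variance estimators exist. The function name `cohen_kappa_ci` is hypothetical.

```python
import math
from typing import List, Tuple

def cohen_kappa_ci(table: List[List[int]], z: float = 1.96) -> Tuple[float, float, float]:
    """Cohen's kappa with an approximate CI from a square agreement table.

    table[i][j] counts subjects put in category i by rater A and j by rater B.
    Returns (kappa, lower, upper).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n  # observed agreement
    row_m = [sum(table[i]) / n for i in range(k)]  # rater A marginals
    col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater B marginals
    p_e = sum(r * c for r, c in zip(row_m, col_m))  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample SE (assumption; not necessarily the study's method).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - z * se, kappa + z * se
```

For a 2×2 table `[[40, 10], [5, 45]]`, this gives κ = 0.70 with an interval of roughly 0.56 to 0.84.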
Comparing Native and Non-Native Raters of US Federal Government Speaking Tests

ERIC Educational Resources Information Center

Brooks, Rachel Lunde

2013-01-01

Previous Language Testing research has largely reported that although many raters' characteristics affect their evaluations of language assessments (Reed & Cohen, 2001), being a native speaker or non-native speaker rater does not significantly affect final ratings (Kim, 2009). In Second Language Acquisition, some researchers conclude that…
Qualitative analysis of MMI raters' scorings of medical school candidates: A matter of taste?

PubMed

Christensen, Mette K; Lykkegaard, Eva; Lund, Ole; O'Neill, Lotte D

2018-05-01

Recent years have seen leading medical educationalists repeatedly call for a paradigm shift in the way we view, value and use subjectivity in assessment. The argument is that subjective expert raters generally bring desired quality, not just noise, to performance evaluations. While several reviews document the psychometric qualities of the Multiple Mini-Interview (MMI), we currently lack qualitative studies examining what we can learn from MMI raters' subjectivity. The present qualitative study therefore investigates rater subjectivity or taste in MMI selection interviews. Taste (Bourdieu 1984) is a practical sense, which makes it possible at a pre-reflective level to apply 'invisible' or 'tacit' categories of perception for distinguishing between good and bad. The study draws on data from explorative in-depth interviews with 12 purposefully selected MMI raters. We find that MMI raters spontaneously applied subjective criteria (their taste), enabling them to assess the candidates' interpersonal attributes and to predict the candidates' potential. In addition, MMI raters seemed to share a taste for certain qualities in the candidates (e.g. reflectivity, resilience, empathy, contact, alikeness, 'the good colleague'); hence, taste may be the result of an ongoing enculturation in medical education and healthcare systems. This study suggests that taste is an inevitable condition in the assessment of students' performance. The MMI set-up should therefore make room for MMI raters' taste and their connoisseurship, i.e. their ability to taste, to improve the quality of their assessment of medical school candidates.
Assessing Agreement between Multiple Raters with Missing Rating Information, Applied to Breast Cancer Tumour Grading

PubMed Central

Ellis, Ian O.; Green, Andrew R.; Hanka, Rudolf

2008-01-01

Background We consider the problem of assessing inter-rater agreement when there are missing data and a large number of raters. Previous studies have shown only ‘moderate’ agreement between pathologists in grading breast cancer tumour specimens. We analyse a large but incomplete data-set consisting of 24177 grades, on a discrete 1–3 scale, provided by 732 pathologists for 52 samples. Methodology/Principal Findings We review existing methods for analysing inter-rater agreement for multiple raters and demonstrate two further methods. Firstly, we examine a simple non-chance-corrected agreement score based on the observed proportion of agreements with the consensus for each sample, which makes no allowance for missing data. Secondly, treating grades as lying on a continuous scale representing tumour severity, we use a Bayesian latent trait method to model cumulative probabilities of assigning grade values as functions of the severity and clarity of the tumour and of rater-specific parameters representing boundaries between grades 1–2 and 2–3. We simulate from the fitted model to estimate, for each rater, the probability of agreement with the majority. Both methods suggest that there are differences between raters in terms of rating behaviour, most often caused by consistent over- or under-estimation of the grade boundaries, and also considerable variability in the distribution of grades assigned to many individual samples. The Bayesian model addresses the tendency of the agreement score to be biased upwards for raters who, by chance, see a relatively ‘easy’ set of samples. Conclusions/Significance Latent trait models can be adapted to provide novel information about the nature of inter-rater agreement when the number of raters is large and there are missing data. In this large study there is substantial variability between pathologists and uncertainty in the identity of the ‘true’ grade of many of the breast cancer tumours, a fact often ignored in
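Below is a minimal Python sketch of the simple non-chance-corrected score described above. It makes several assumptions. "Consensus" is taken to mean the unique modal grade among the available ratings of a sample, and samples with a tied mode are skipped. Missing ratings are stored as `None` and ignored. A rater's own grade counts toward the consensus. The published definition may differ in these details, and the name `consensus_agreement` is hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

def consensus_agreement(grades: Sequence[Sequence[Optional[int]]]) -> List[Optional[float]]:
    """Per-rater proportion of ratings that match each sample's consensus grade.

    grades[s][r] is rater r's grade (e.g. 1-3) for sample s, or None if rater r
    did not grade that sample. Returns None for raters with no scored samples.
    """
    n_raters = len(grades[0])
    hits = [0] * n_raters
    seen = [0] * n_raters
    for row in grades:
        counts = Counter(g for g in row if g is not None)
        if not counts:
            continue
        top = max(counts.values())
        modes = [g for g, cnt in counts.items() if cnt == top]
        if len(modes) > 1:
            continue  # assumption: no consensus when the mode is tied
        consensus = modes[0]
        for r, g in enumerate(row):
            if g is None:
                continue  # missing ratings are simply skipped
            seen[r] += 1
            if g == consensus:
                hits[r] += 1
    return [h / s if s else None for h, s in zip(hits, seen)]
```

Each rater's score depends only on the samples that rater happened to see. A rater given mostly "easy" samples can therefore score high by chance, which is the upward bias the Bayesian model is meant to address.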
A Surgery Oral Examination: Interrater Agreement and the Influence of Rater Characteristics.

ERIC Educational Resources Information Center

Burchard, Kenneth W.; And Others

1995-01-01

A study measured interrater reliability among 140 United States and Canadian surgery exam raters and the influences of age, years in practice, and experience as an examiner on individual scores. Results indicate three aspects of examinee performance influenced scores: verbal style, dress, and content of answers. No rater characteristic…
A Note on the Interpretation of Weighted Kappa and its Relations to Other Rater Agreement Statistics for Metric Scales

ERIC Educational Resources Information Center

Schuster, Christof

2004-01-01

This article presents a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance that is particularly helpful in emphasizing that weighted kappa is an absolute agreement measure in the sense that it is sensitive to differences in raters' marginal distributions. Specifically, rater mean differences will decrease…
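For quadratic weights, a standard identity expresses weighted kappa in terms of means, variances and covariance: κ_w = 2 s_xy / (s_x² + s_y² + (x̄ − ȳ)²), with variances and covariance divided by n. We assume this is the kind of formula the abstract refers to, since its truncated text does not show it. The Python sketch below checks the identity numerically against the usual disagreement-weight definition. Both function names are hypothetical. The squared mean difference in the denominator is why a shift between raters lowers κ_w even when their scores are perfectly correlated.

```python
from typing import Sequence

def qwk_from_scores(x: Sequence[float], y: Sequence[float]) -> float:
    """Quadratic-weighted kappa via disagreement weights (a - b)^2.

    The usual normalization by (k - 1)^2 cancels in the ratio, so it is omitted.
    """
    n = len(x)
    observed = sum((a - b) ** 2 for a, b in zip(x, y)) / n
    # Expected disagreement if the two raters' marginals were independent.
    expected = sum((a - b) ** 2 for a in x for b in y) / (n * n)
    return 1 - observed / expected

def qwk_from_moments(x: Sequence[float], y: Sequence[float]) -> float:
    """Same quantity from rater means, variances and covariance (divide-by-n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cxy / (vx + vy + (mx - my) ** 2)
```

For `x = [1, 2, 3, 3, 2]` and `y = [1, 2, 2, 3, 3]`, both functions return 9/14 ≈ 0.643.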
Inter- and intra-rater reliability of nasal auscultation in daycare children.

PubMed

Santos, Rita; Silva Alexandrino, Ana; Tomé, David; Melo, Cristina; Mesquita Montes, António; Costa, Daniel; Pinto Ferreira, João

2018-02-01

The aim of this study was to assess nasal auscultation's intra- and inter-rater reliability and to analyze ear and respiratory clinical condition according to nasal auscultation. A cross-sectional study was performed in 125 children aged up to 3 years attending daycare centers. Nasal auscultation, tympanometry and Paediatric Respiratory Severity Score (PRSS) were applied to all children. Nasal sounds were classified by an expert panel in order to determine nasal auscultation's intra- and inter-rater reliability. The classification of nasal sounds was assessed against tympanometric and PRSS values. Nasal auscultation revealed substantial inter-rater (κ=0.75) and intra-rater (κ=0.69; κ=0.61 and κ=0.72) reliability. Children with a "non-obstructed" classification revealed a lower peak pressure (t=-3.599, P<0.001 in left ear; t=-2.258, P=0.026 in right ear) and a higher compliance (t=-2.728, P=0.007 in left ear; t=-3.830, P<0.001 in right ear) in both ears. There was an association between the classification of sounds and tympanogram types in both ears (χ²=11.437, P=0.003 in left ear; χ²=13.535, P=0.001 in right ear). Children with a "non-obstructed" classification had a healthier respiratory condition. Nasal auscultation revealed substantial intra- and inter-rater reliability. Nasal auscultation exhibited important differences according to ear and respiratory clinical conditions. Nasal auscultation in pediatrics seems to be an original topic as well as a simple method that can be used to identify early signs of nasopharyngeal obstruction.
Exploring Differences in Measurement and Reporting of Classroom Observation Inter-Rater Reliability

ERIC Educational Resources Information Center

Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca

2018-01-01

Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…
Effects of Rating Purpose and Rater Self-Esteem on Performance Ratings.

DTIC Science & Technology

1983-03-01

...examined in a laboratory study, using a 2x2 analysis of variance design. Results indicate that low self-esteem raters assign significantly higher performance ratings when performance appraisal information will be used... studies indicated that individuals low in self-esteem have less self-confidence, feel less competent, and rely more on others' opinions than do individuals
Intra-Rater and Inter-Rater Reliability of the Balance Error Scoring System in Pre-Adolescent School Children

ERIC Educational Resources Information Center

Sheehan, Dwayne P.; Lafave, Mark R.; Katz, Larry

2011-01-01

This study was designed to test the intra- and inter-rater reliability of the University of North Carolina's Balance Error Scoring System in 9- and 10-year-old children. Additionally, a modified version of the Balance Error Scoring System was tested to determine if it was more sensitive in this population ("raw scores"). Forty-six…
Overview on Deaf-Blindness

ERIC Educational Resources Information Center

Miles, Barbara

2008-01-01

It may seem that deaf-blindness refers to a total inability to see or hear. However, in reality deaf-blindness is a condition in which the combination of hearing and visual losses in children cause "such severe communication and other developmental and educational needs that they cannot be accommodated in special education programs solely for…
An Examination of Rater Performance on a Local Oral English Proficiency Test: A Mixed-Methods Approach

ERIC Educational Resources Information Center

Yan, Xun

2014-01-01

This paper reports on a mixed-methods approach to evaluate rater performance on a local oral English proficiency test. Three types of reliability estimates were reported to examine rater performance from different perspectives. Quantitative results were also triangulated with qualitative rater comments to arrive at a more representative picture of…
Inter-Rater Reliability of Cyclotorsion Measurements Using Fundus Photography.

PubMed

Dysli, Muriel; Kanku, Madeleine; Traber, Ghislaine L

2018-04-01

The foveo-papillary angle (FPA) on fundus photographs is the accepted standard for the measurement of ocular cyclotorsion. We assessed the inter-rater reliability of this method in healthy subjects and in patients with trochlear nerve palsies. In this methodological study, fundus photographs of healthy subjects and of patients with trochlear nerve palsies were made with a fundus camera (Zeiss Fundus Camera FF 450 plus, Jena, Germany). Three independent observers measured the FPA on the fundus photographs of all subjects in synedra View (synedra View 16, Version 16.0.0.11, Innsbruck, Austria). One hundred and four eyes of 52 subjects (26 healthy controls and 26 patients) were assessed. The mean FPA of the healthy controls was 5.80 degrees (°) [± 0.44 standard error of the mean (SEM)] compared to 11.55° (± 0.80 SEM) for patients with trochlear nerve palsies. The inter-rater reliability of all measured FPAs showed an intraclass correlation coefficient (ICC) of 0.98 (95% CI 0.97 - 0.98). The inter-rater reliability of objective cyclotorsion measurements using fundus photographs was very high. Georg Thieme Verlag KG Stuttgart · New York.
Workplace-Based Assessment: Effects of Rater Expertise

ERIC Educational Resources Information Center

Govaerts, M. J. B.; Schuwirth, L. W. T.; Van der Vleuten, C. P. M.; Muijtjens, A. M. M.

2011-01-01

Traditional psychometric approaches towards assessment tend to focus exclusively on quantitative properties of assessment outcomes. This may limit more meaningful educational approaches towards workplace-based assessment (WBA). Cognition-based models of WBA argue that assessment outcomes are determined by cognitive processes by raters which are…
[Inter-rater reliability and validity of the OPD-CA axes structure and conflict].

PubMed

Benecke, Cord; Bock, Astrid; Wieser, Elke; Tschiesner, Reinhard; Lochmann, Martha; Küspert, Felicia; Schorn, Robert; Viertler, Bernhard; Steinmayr-Gensluckner, Maria

2011-01-01

The manual of the Operationalized Psychodynamic Diagnostics in childhood and adolescence (OPD-CA) is an instrument now widespread in clinical practice for assessing psychodynamic dimensions. Published data on its inter-rater agreement and validity are still lacking. This study assessed the inter-rater reliability and validity of the axis structure and the axis conflict. Sixty adolescents aged 14 to 17 years, with and without mental disorders, were diagnosed with the Operationalized Psychodynamic Diagnostics in childhood and adolescence (Arbeitskreis OPD-KJ, 2007) and with SCID-II interviews and questionnaires. A partial sample of 36 OPD-CA interviews was the data basis for the assessment of inter-rater agreement. Validity of the axis structure and the axis conflict was calculated on the whole sample. Inter-rater agreement for the axis structure and the axis conflict showed good to very good weighted kappa coefficients among the trained raters. Validity of the axis structure showed good results. The Operationalized Psychodynamic Diagnostics in childhood and adolescence (OPD-CA) allows reliable diagnosis of the axis structure and the axis conflict, provided the ratings are made by trained raters on the basis of semistructured videotaped interviews. The axis structure shows validity, while the results concerning the validity of the axis conflict remain unclear.

A Simulation Study of Rater Agreement Measures with 2x2 Contingency Tables

ERIC Educational Resources Information Center

Ato, Manuel; Lopez, Juan Jose; Benavente, Ana

2011-01-01

A comparison between six rater agreement measures obtained using three different approaches was achieved by means of a simulation study. Rater coefficients suggested by Bennett's σ (1954), Scott's π (1955), Cohen's κ (1960) and Gwet's γ (2008) were selected to represent the classical, descriptive approach, α agreement…
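The four "classical, descriptive" coefficients share the form (p_o − p_e) / (1 − p_e) and differ only in the chance term p_e, as the Python sketch below shows for a 2×2 table:

- Bennett's coefficient assumes uniform chance (1/2 for two categories).
- Scott's π uses the pooled marginals.
- Cohen's κ uses each rater's own marginals.
- Gwet's coefficient (AC1, which we assume is the "γ" named above) uses 2π(1 − π).

The cell labels and the function name are assumptions for illustration, and the simulation design of the study is not reproduced.

```python
from typing import Dict

def two_by_two_coefficients(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Chance-corrected agreement coefficients for a 2x2 table (hypothetical helper).

    a: both raters 'yes'; b: rater 1 yes, rater 2 no;
    c: rater 1 no, rater 2 yes; d: both 'no'.
    """
    n = a + b + c + d
    p_o = (a + d) / n
    p1 = (a + b) / n    # rater 1's proportion of 'yes'
    p2 = (a + c) / n    # rater 2's proportion of 'yes'
    pi = (p1 + p2) / 2  # pooled proportion of 'yes'

    def corrected(p_e: float) -> float:
        return (p_o - p_e) / (1 - p_e)

    return {
        "bennett": corrected(0.5),                                # uniform chance
        "scott_pi": corrected(pi ** 2 + (1 - pi) ** 2),           # pooled marginals
        "cohen_kappa": corrected(p1 * p2 + (1 - p1) * (1 - p2)),  # separate marginals
        "gwet_ac1": corrected(2 * pi * (1 - pi)),                 # Gwet's AC1
    }
```

For a = 40, b = 10, c = 5, d = 45, Bennett's coefficient and Cohen's κ both equal 0.70. Scott's π is about 0.699 and Gwet's AC1 about 0.701. The coefficients diverge much more when the marginals are highly skewed.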
A risk-based classification scheme for genetically modified foods. III: Evaluation using a panel of reference foods.

PubMed

Chao, Eunice; Krewski, Daniel

2008-12-01

This paper presents an exploratory evaluation of four functional components of a proposed risk-based classification scheme (RBCS) for crop-derived genetically modified (GM) foods in a concordance study. Two independent raters assigned concern levels to 20 reference GM foods using a rating form based on the proposed RBCS. The four components of evaluation were: (1) degree of concordance, (2) distribution across concern levels, (3) discriminating ability of the scheme, and (4) ease of use. At least one of the 20 reference foods was assigned to each of the possible concern levels, demonstrating the ability of the scheme to identify GM foods of different concern with respect to potential health risk. There was reasonably good concordance between the two raters for the three separate parts of the RBCS. The raters agreed that the criteria in the scheme were sufficiently clear in discriminating reference foods into different concern levels, and that with some experience, the scheme was reasonably easy to use. Specific issues and suggestions for improvements identified in the concordance study are discussed.
Assessment and Correlation of Customer and Rater Response to Cold-Start and Warmup Driveability

DTIC Science & Technology

1993-08-01

Customer satisfaction fleet Year N % 1986 13 18 1988 10 14 1987 12 18 1988 12 16 1989 14 19 1990 9 12 1991 3 4 Consumer/Rater Fleet Hydrocarbon fuel...2 4 1991 0 0 Fuel system * Customer satisfaction fleet Fuel system N % Carbureted 19 26 PFI 33 48 1T1 21 29 Consumer/Rater Fleet Hydrocarbon fuel...between the customer fleet and one of the consumer/rater subfleets; these vehicles are included in both places in the tables above. 30 TABLE 2 AVERAGE
On individual differences in person perception: raters' personality traits relate to their psychopathy checklist-revised scoring tendencies.

PubMed

Miller, Audrey K; Rufino, Katrina A; Boccaccini, Marcus T; Jackson, Rebecca L; Murrie, Daniel C

2011-06-01

This study investigated raters' personality traits in relation to scores they assigned to offenders using the Psychopathy Checklist-Revised (PCL-R). A total of 22 participants, including graduate students and faculty members in clinical psychology programs, completed a PCL-R training session, independently scored four criminal offenders using the PCL-R, and completed a comprehensive measure of their own personality traits. A priori hypotheses specified that raters' personality traits, and their similarity to psychopathy characteristics, would relate to raters' PCL-R scoring tendencies. As hypothesized, some raters assigned consistently higher scores on the PCL-R than others, especially on PCL-R Facets 1 and 2. Also as hypothesized, raters' scoring tendencies related to their own personality traits (e.g., higher rater Agreeableness was associated with lower PCL-R Interpersonal facet scoring). Overall, findings underscore the need for future research to examine the role of evaluator characteristics on evaluation results and the need for clinical training to address evaluators' personality influences on their ostensibly objective evaluations.
Intra- and inter-rater reliability of digital image analysis for skin color measurement

PubMed Central

Sommers, Marilyn; Beacham, Barbara; Baker, Rachel; Fargo, Jamison

2013-01-01

Background We determined the intra- and inter-rater reliability of data from digital image color analysis between an expert and novice analyst. Methods Following training, the expert and novice independently analyzed 210 randomly ordered images. Both analysts used Adobe® Photoshop lasso or color sampler tools based on the type of image file. After color correction with Pictocolor® in camera software, they recorded L*a*b* (L*=light/dark; a*=red/green; b*=yellow/blue) color values for all skin sites. We computed intra-rater and inter-rater agreement within anatomical region, color value (L*, a*, b*), and technique (lasso, color sampler) using a series of one-way intra-class correlation coefficients (ICCs). Results Results of ICCs for intra-rater agreement showed high levels of internal consistency reliability within each rater for the lasso technique (ICC ≥ 0.99) and a somewhat lower, yet acceptable, level of agreement for the color sampler technique (ICC = 0.91 for expert, ICC = 0.81 for novice). Skin L*, skin b*, and labia L* values reached the highest level of agreement (ICC ≥ 0.92) and skin a*, labia b*, and vaginal wall b* were the lowest (ICC ≥ 0.64). Conclusion Data from novice analysts can achieve high levels of agreement with data from expert analysts with training and the use of a detailed, standard protocol. PMID:23551208
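The one-way ICCs mentioned above treat every measurement of a target as exchangeable. No separate rater effect is modelled. A minimal Python sketch of the single-measure one-way coefficient, ICC(1,1), from one-way ANOVA mean squares follows. It assumes complete, balanced data, and the function name `icc_oneway` is hypothetical. The study's exact ICC variant (single or average measure) is not stated.

```python
from typing import List

def icc_oneway(data: List[List[float]]) -> float:
    """ICC(1,1): one-way random effects, single measurement.

    data[i][j] is the j-th measurement of target i (e.g. one skin site);
    every target must have the same number k >= 2 of measurements.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(data, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

On the 6 × 4 table `[[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8], [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]`, this returns about 0.17. The value is low because the one-way model folds systematic differences between measurers into the error term.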
Reference Books in Special Media. Reference Circular No. 82-4.

ERIC Educational Resources Information Center

Library of Congress, Washington, DC. National Library Service for the Blind and Physically Handicapped.

Based on information contained in producers' catalogs and on responses to a survey conducted by the Reference Section of the Library of Congress National Library Service (NLS) for the Blind and Physically Handicapped, this publication lists reference materials produced in braille or in large type, and sound recordings of reference works available…
Oscillatory activity reflects differential use of spatial reference frames by sighted and blind individuals in tactile attention.

PubMed

Schubert, Jonathan T W; Buchholz, Verena N; Föcker, Julia; Engel, Andreas K; Röder, Brigitte; Heed, Tobias

2015-08-15

Touch can be localized either on the skin in anatomical coordinates, or, after integration with posture, in external space. Sighted individuals are thought to encode touch in both coordinate systems concurrently, whereas congenitally blind individuals exhibit a strong bias for using anatomical coordinates. We investigated the neural correlates of this differential dominance in the use of anatomical and external reference frames by assessing oscillatory brain activity during a tactile spatial attention task. The EEG was recorded while sighted and congenitally blind adults received tactile stimulation to uncrossed and crossed hands while detecting rare tactile targets at one cued hand only. In the sighted group, oscillatory alpha-band activity (8-12Hz) in the cue-target interval was reduced contralaterally and enhanced ipsilaterally with uncrossed hands. Hand crossing attenuated the degree of posterior parietal alpha-band lateralization, indicating that attention deployment was affected by external spatial coordinates. Beamforming suggested that this posture effect originated in the posterior parietal cortex. In contrast, cue-related lateralization of central alpha-band as well as of beta-band activity (16-24Hz) were unaffected by hand crossing, suggesting that these oscillations exclusively encode anatomical coordinates. In the blind group, central alpha-band activity was lateralized, but did not change across postures. The pattern of beta-band activity was indistinguishable between groups. Because the neural mechanisms for posterior alpha-band generation seem to be linked to developmental vision, we speculate that the lack of this neural mechanism in blind individuals is related to their preferred use of anatomical over external spatial codes in sensory processing. Copyright © 2015 Elsevier Inc. All rights reserved.
Evaluating Rater Responses to an Online Training Program for L2 Writing Assessment

ERIC Educational Resources Information Center

Elder, Catherine; Barkhuizen, Gary; Knoch, Ute; von Randow, Janet

2007-01-01

The use of online rater self-training is growing in popularity and has obvious practical benefits, facilitating access to training materials and rating samples and allowing raters to reorient themselves to the rating scale and self-monitor their behaviour at their own convenience. However, there has thus far been little research into rater…
Establishing inter-rater reliability scoring in a state trauma system.

PubMed

Read-Allsopp, Christine

2004-01-01

Trauma systems rely on accurate Injury Severity Scoring (ISS) to describe trauma patient populations. Twenty-seven (27) Trauma Nurse Coordinators and Data Managers across the New South Wales (Australia) state trauma network were instructed in the uses and techniques of the Abbreviated Injury Scale (AIS) from the Association for the Advancement of Automotive Medicine. The aim is to provide accurate, reliable and valid data for the state trauma network. Four (4) months after the course, a coding exercise was conducted to assess inter-rater reliability. The results show that inter-rater reliability is within accepted international standards.
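Because the ISS is derived arithmetically from AIS codes, any inter-rater disagreement in AIS coding propagates directly into the score. A minimal sketch, assuming the standard Baker et al. (1974) definition: take the highest AIS value in each of six body regions, square the three largest and sum them, and set the score to 75 if any injury is coded AIS 6. The region keys below are hypothetical labels chosen for illustration, not part of any coding standard.

```python
# Hypothetical keys for the six ISS body regions: head/neck, face, chest,
# abdomen (pelvic contents), extremities (incl. pelvic girdle), external.
REGIONS = {"head_neck", "face", "chest", "abdomen", "extremities", "external"}


def injury_severity_score(injuries):
    """Compute ISS from an iterable of (region, ais) pairs with AIS 1-6."""
    worst = {}  # highest AIS seen in each region
    for region, ais in injuries:
        if region not in REGIONS:
            raise ValueError(f"unknown region: {region}")
        if not 1 <= ais <= 6:
            raise ValueError(f"AIS must be 1-6, got {ais}")
        worst[region] = max(worst.get(region, 0), ais)
    if any(a == 6 for a in worst.values()):
        return 75  # by convention, any AIS 6 injury sets ISS to the maximum
    top_three = sorted(worst.values(), reverse=True)[:3]
    return sum(a * a for a in top_three)
```

Only the worst injury per region counts, so two chest injuries coded 3 and 2 contribute 3² = 9, not 13.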
Putting Raters in Ratees' Shoes: Perspective Taking and Assessment of Creative Products

ERIC Educational Resources Information Center

Han, Jiantao; Long, Haiying; Pang, Weiguo

2017-01-01

This study reported two experiments examining the effect of perspective taking on the assessment of creative products by human raters. Forty responses to two alternative uses tasks (AUTs) and 15 alien stories generated by 6th-grade students were used as assessment materials. Undergraduate students, as novice raters, assessed the products under…
Inter-rater Reliability of Sustained Aberrant Movement Patterns as a Clinical Assessment of Muscular Fatigue

PubMed Central

Aerts, Frank; Carrier, Kathy; Alwood, Becky

2016-01-01

Background: The assessment of clinical manifestation of muscle fatigue is an effective procedure in establishing therapeutic exercise dose. Few studies have evaluated physical therapist reliability in establishing muscle fatigue through detection of changes in quality of movement patterns in a live setting. Objective: The purpose of this study is to evaluate the inter-rater reliability of physical therapists’ ability to detect altered movement patterns due to muscle fatigue. Design: A reliability study in a live setting with multiple raters. Participants: Forty-four healthy individuals (ages 19-35) were evaluated by six physical therapists in a live setting. Methods: Participants were evaluated by physical therapists for altered movement patterns during resisted shoulder rotation. Each participant completed a total of four tests: right shoulder internal rotation, right shoulder external rotation, left shoulder internal rotation and left shoulder external rotation. Results: For all tests combined, the single-rater inter-rater reliability, ICC(2,1), was 0.65 (95% CI, 0.60-0.71). This corresponds to moderate inter-rater reliability between physical therapists. Limitations: The results of this study apply only to healthy participants and therefore cannot be generalized to a symptomatic population. Conclusion: Moderate inter-rater reliability was found between physical therapists in establishing muscle fatigue through the observation of sustained altered movement patterns during dynamic resistive shoulder internal and external rotation. PMID:27347241
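ICC(2,1) is the Shrout and Fleiss two-way random-effects, absolute-agreement, single-rater form. A minimal sketch of that formula computed from ANOVA mean squares, assuming a complete subjects-by-raters table with no missing scores; the function name `icc_2_1` is ours, not taken from the study.

```python
from statistics import fmean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one per subject, each holding one score
    per rater (same rater order in every row, no missing values).
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and raters")
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because the rater mean square appears in the denominator, a rater who scores systematically higher than the others lowers ICC(2,1) even if the ranking of subjects is identical; that is the "absolute agreement" part of the definition.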
A Multicenter, Rater-Blinded, Randomized Controlled Study of Auditory Processing-Focused Cognitive Remediation Combined With Open-Label Lurasidone in Patients With Schizophrenia and Schizoaffective Disorder.

PubMed

Kantrowitz, Joshua T; Sharif, Zafar; Medalia, Alice; Keefe, Richard S E; Harvey, Philip; Bruder, Gerard; Barch, Deanna M; Choo, Tse; Lee, Seonjoo; Lieberman, Jeffrey A

2016-06-01

Small-scale studies of auditory processing cognitive remediation programs have demonstrated efficacy in schizophrenia. We describe a multicenter, rater-blinded, randomized, controlled study of auditory-focused cognitive remediation, conducted from June 24, 2010, to June 14, 2013, and approved by the local institutional review board at all sites. Prior to randomization, participants with schizophrenia (DSM-IV-TR) were stabilized on a standardized antipsychotic regimen (lurasidone [40-160 mg/d]), followed by randomization to adjunctive cognitive remediation: auditory focused (Brain Fitness) versus control (nonspecific video games), administered 1-2 times weekly for 30 sessions. Coprimary outcome measures included MATRICS Consensus Cognitive Battery (MCCB) and the University of California, San Diego, Performance-Based Skills Assessment-Brief scale. A total of 120 participants were randomized and completed at least 1 auditory-focused cognitive remediation (n = 56) or video game control session (n = 64). Seventy-four participants completed ≥ 25 sessions and postrandomization assessments. At study completion, the change from prestabilization was statistically significant for MCCB composite score (d = 0.42, P < .0001) across groups. Participants randomized to auditory-focused cognitive remediation had a trend-level higher mean MCCB composite score compared to participants randomized to control cognitive remediation (P = .08). After controlling for scores at the time of randomization, there were no significant between-treatment group differences at study completion. Auditory processing cognitive remediation combined with lurasidone did not lead to differential improvement over nonspecific video games. Across-group improvement from prestabilization baseline to study completion was observed, but since all participants were receiving lurasidone open label, it is difficult to interpret the source of these effects. Future studies comparing both pharmacologic and behavioral cognitive enhancers
Genetics Home Reference: autosomal dominant congenital stationary night blindness

MedlinePlus

Autosomal dominant congenital stationary night blindness is a disorder of the retina, which is the specialized tissue at the back of the eye that detects light and color. People with this condition typically have difficulty seeing ...
Reliability of Untrained and Experienced Raters on FEES: Rating Overall Residue is a Simple Task.

PubMed

Pisegna, Jessica M; Borders, James C; Kaneoka, Asako; Coster, Wendy J; Leonard, Rebecca; Langmore, Susan E

2018-03-07

The purpose of this study was to investigate the reliability of residue ratings on Fiberoptic Endoscopic Evaluation of Swallowing (FEES). We also examined rating differences based on experience to determine if years of experience influenced residue ratings. A group of 44 raters watched 81 FEES videos representing a wide range of residue severities for thin liquid, applesauce, and cracker boluses. Raters were untrained on the rating scales and simply rated their overall impression of residue amount on a visual analog scale (VAS) and a five-point ordinal scale in a randomized fashion across two sessions. Intra-class correlation coefficients, kappa coefficients, and ANOVAs were used to analyze agreement and differences in ratings. Residue ratings on both the VAS and ordinal scales had acceptable inter- and intra-rater reliability. Inter-rater agreement was acceptable (ICC > 0.7) for all comparisons. Intra-rater agreement was excellent on the VAS (r_c = 0.9) and good on the ordinal scale (κ = 0.78). There was no significant difference between expert ratings and other raters based on years of experience for cracker ratings (p = 0.2119) and applesauce ratings (p = 0.2899), but there was a significant difference between clinicians on thin liquid ratings (p = 0.0005). Without any specific training, raters demonstrated high reliability when rating the overall amount of residue on FEES. Years of experience with FEES did not influence residue ratings, suggesting that expert ratings of overall residue amount are not unique or specialized. Rating the overall amount of residue on FEES appears to be a simple visual-perceptual task for puree and cracker boluses.
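The abstract does not define the r_c statistic reported for intra-rater agreement on the continuous VAS. Assuming it is Lin's (1989) concordance correlation coefficient, a common choice for agreement on continuous scales, a minimal sketch follows. Unlike Pearson's r, it penalizes a systematic offset between the two sets of ratings.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired ratings.

    Uses population (1/n) moments, as in Lin's original definition.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and the same length")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The (mx - my)^2 term is what distinguishes agreement from correlation.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)
```

For example, session-two ratings that are always exactly one VAS unit higher than session-one ratings ([1, 2, 3] versus [2, 3, 4]) have a Pearson correlation of 1 but a concordance of only 4/7.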
THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS).

PubMed

McCunn, Robert; Aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

2017-02-01

The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the 'pure' intra-rater (intra-occasion) reliability for those movements. Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66 to 0.72 and from 0.79 to 0.86, respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35 to 0.91, indicating fair to almost perfect agreement. Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate
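The verbal labels "fair" and "almost perfect" match the widely used Landis and Koch (1977) benchmarks for kappa, although the abstract does not name its source; treating them as the scheme in use is an assumption. A minimal sketch of that mapping follows. The bands are descriptive conventions, not statistical thresholds.

```python
# Upper bounds (inclusive) of the Landis & Koch (1977) descriptive bands.
BANDS = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
         (0.80, "substantial"), (1.00, "almost perfect")]


def landis_koch_label(kappa):
    """Return the Landis & Koch descriptive label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")
```

Applied to the reported range, 0.35 falls in "fair" and 0.91 in "almost perfect".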
THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS)

PubMed Central

aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

2017-01-01

Background/purpose The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. Methods The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the ‘pure’ intra-rater (intra-occasion) reliability for those movements. Results Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66 to 0.72 and from 0.79 to 0.86, respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35 to 0.91, indicating fair to almost perfect agreement. Conclusions Establishing the reliability of the SIMS is a
The Hierarchical Rater Model for Rated Test Items and Its Application to Large-Scale Educational Assessment Data.

ERIC Educational Resources Information Center

Patz, Richard J.; Junker, Brian W.; Johnson, Matthew S.; Mariano, Louis T.

2002-01-01

Discusses the hierarchical rater model (HRM) of R. Patz (1996) and shows how it can be used to scale examinees and items, model aspects of consensus among raters, and model individual rater severity and consistency effects. Also shows how the HRM fits into the generalizability theory framework. Compares the HRM to the conventional item response…
In the eye of the beholder: the effect of rater variability and different rating scales on QTL mapping.

PubMed

Poland, Jesse A; Nelson, Rebecca J

2011-02-01

The agronomic importance of developing durably resistant cultivars has led to substantial research in the field of quantitative disease resistance (QDR) and, in particular, mapping quantitative trait loci (QTL) for disease resistance. The assessment of QDR is typically conducted by visual estimation of disease severity, which raises concern over the accuracy and precision of visual estimates. Although previous studies have examined the factors affecting the accuracy and precision of visual disease assessment in relation to the true value of disease severity, the impact of this variability on the identification of disease resistance QTL has not been assessed. In this study, the effects of rater variability and rating scales on mapping QTL for northern leaf blight resistance in maize were evaluated in a recombinant inbred line population grown under field conditions. The population of 191 lines was evaluated by 22 different raters using a direct percentage estimate, a 0-to-9 ordinal rating scale, or both. It was found that more experienced raters had higher precision and that using a direct percentage estimation of diseased leaf area produced higher precision than using an ordinal scale. QTL mapping was then conducted on each rater's disease estimates with stepwise general linear model selection (GLM) and inclusive composite interval mapping (ICIM). For GLM, the same QTL were largely found across raters, though some QTL were only identified by a subset of raters. The magnitudes of estimated allele effects at identified QTL varied drastically, sometimes by as much as threefold. ICIM produced highly consistent results across raters and for the different rating scales in identifying the location of QTL. We conclude that, despite variability between raters, the identification of QTL was largely consistent among raters, particularly when using ICIM. However, care should be taken in estimating QTL allele effects, because this was highly variable and rater
Intra- and inter-rater reliability of digital image analysis for skin color measurement.

PubMed

Sommers, Marilyn; Beacham, Barbara; Baker, Rachel; Fargo, Jamison

2013-11-01

We determined the intra- and inter-rater reliability of data from digital image color analysis between an expert and novice analyst. Following training, the expert and novice independently analyzed 210 randomly ordered images. Both analysts used Adobe® Photoshop lasso or color sampler tools based on the type of image file. After color correction with Pictocolor® in camera software, they recorded L*a*b* (L*=light/dark; a*=red/green; b*=yellow/blue) color values for all skin sites. We computed intra-rater and inter-rater agreement within anatomical region, color value (L*, a*, b*), and technique (lasso, color sampler) using a series of one-way intra-class correlation coefficients (ICCs). Results of ICCs for intra-rater agreement showed high levels of internal consistency reliability within each rater for the lasso technique (ICC ≥ 0.99) and a somewhat lower, yet acceptable, level of agreement for the color sampler technique (ICC = 0.91 for expert, ICC = 0.81 for novice). Skin L*, skin b*, and labia L* values reached the highest level of agreement (ICC ≥ 0.92) and skin a*, labia b*, and vaginal wall b* were the lowest (ICC ≥ 0.64). Data from novice analysts can achieve high levels of agreement with data from expert analysts with training and the use of a detailed, standard protocol. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
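The study used one-way ICCs. In the one-way random-effects model, systematic differences between analysts are not separated out but are pooled into the within-target error. A minimal sketch of the single-rating form, ICC(1,1) in Shrout and Fleiss's notation, assuming one row per image with the same number of ratings in each row; the function name is ours.

```python
from statistics import fmean


def icc_1_1(scores):
    """One-way random-effects ICC for a single rating, ICC(1,1).

    `scores` holds one row per target (e.g. image), each with k ratings.
    Rater identity is ignored: all rating differences count as error.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 targets, each with the same k >= 2 ratings")
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(scores, row_means) for x in row)
    msb = ss_between / (n - 1)        # between-targets mean square
    msw = ss_within / (n * (k - 1))   # within-target mean square
    return (msb - msw) / (msb + (k - 1) * msw)
```

On the same data, this value is typically lower than a two-way ICC, because any consistent offset between the expert and the novice is treated as disagreement.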
Effects of measurement method and transcript availability on inexperienced raters' stuttering frequency scores.

PubMed

Chakraborty, Nalanda; Logan, Kenneth J

To examine the effects of measurement method and transcript availability on the accuracy, reliability, and efficiency of inexperienced raters' stuttering frequency measurements. Forty-four adults, all inexperienced at evaluating stuttered speech, underwent 20 min of preliminary training in stuttering measurement and then analyzed a series of sentences, with and without access to transcripts of sentence stimuli, using either a syllable-based analysis (SBA) or an utterance-based analysis (UBA). Participants' analyses were compared between groups and to a composite analysis from two experienced evaluators. Stuttering frequency scores from the SBA and UBA groups differed significantly from the experienced evaluators' scores; however, UBA scores were significantly closer to the experienced evaluators' scores and were completed significantly faster than the SBA scores. Transcript availability facilitated scoring accuracy and efficiency in both groups. The internal reliability of stuttering frequency scores was acceptable for the SBA and UBA groups; however, the SBA group demonstrated only modest point-by-point agreement with ratings from the experienced evaluators. Given its accuracy and efficiency advantages over syllable-based analysis, utterance-based fluency analysis appears to be an appropriate context for introducing stuttering frequency measurement to raters who have limited experience in stuttering measurement. To address accuracy gaps between experienced and inexperienced raters, however, use of either analysis must be supplemented with training activities that expose inexperienced raters to the decision-making processes used by experienced raters when identifying stuttered syllables. Copyright © 2018 Elsevier Inc. All rights reserved.

Feasibility and inter-rater reliability of the ICU Mobility Scale.

PubMed

Hodgson, Carol; Needham, Dale; Haines, Kimberley; Bailey, Michael; Ward, Alison; Harrold, Megan; Young, Paul; Zanni, Jennifer; Buhr, Heidi; Higgins, Alisa; Presneill, Jeff; Berney, Sue

2014-01-01

The objectives of this study were to develop a scale for measuring the highest level of mobility in adult ICU patients and to assess its feasibility and inter-rater reliability. Growing evidence supports the feasibility, safety and efficacy of early mobilization in the intensive care unit (ICU). However, there are no adequately validated tools to quickly, easily, and reliably describe the mobility milestones of adult patients in ICU. Identifying or developing such a tool is a priority for evaluating mobility and rehabilitation activities for research and clinical care purposes. This study was performed at two ICUs in Australia. Thirty ICU nursing and physiotherapy staff assessed the feasibility of the 'ICU Mobility Scale' (IMS) using a 10-item questionnaire. The inter-rater reliability of the IMS was assessed by 2 junior physical therapists, 2 senior physical therapists, and 16 nursing staff in 100 consecutive medical, surgical or trauma ICU patients. An 11-point IMS was developed based on multidisciplinary input. Participating clinicians reported that the scale was clear, with 95% of respondents reporting that it took <1 min to complete. The junior and senior physical therapists showed the highest inter-rater reliability with a weighted kappa (95% confidence interval) of 0.83 (0.76-0.90), while the senior physical therapists and nurses and the junior physical therapists and nurses had a weighted kappa of 0.72 (0.61-0.83) and 0.69 (0.56-0.81), respectively. The IMS is a feasible tool with strong inter-rater reliability for measuring the maximum level of mobility of adult patients in the ICU. Copyright © 2014 Elsevier Inc. All rights reserved.
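Weighted kappa suits an ordinal scale like the IMS because a one-level disagreement is penalized less than a five-level one. The abstract does not say whether linear or quadratic weights were used, so this minimal sketch supports both as an assumption. `categories` would be `range(11)` for an 11-point scale scored 0 to 10.

```python
def weighted_kappa(r1, r2, categories, weight="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    `categories` lists the ordered scale points; weight is "linear"
    or "quadratic" (disagreement weight grows with distance).
    """
    if weight not in ("linear", "quadratic"):
        raise ValueError("weight must be 'linear' or 'quadratic'")
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and paired")
    idx = {c: i for i, c in enumerate(categories)}
    m, n = len(idx), len(r1)
    if m < 2:
        raise ValueError("need at least two categories")
    obs = [[0.0] * m for _ in range(m)]  # joint proportion table
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    row = [sum(obs[i]) for i in range(m)]                     # rater 1 marginals
    col = [sum(obs[i][j] for i in range(m)) for j in range(m)]  # rater 2 marginals
    power = 1 if weight == "linear" else 2
    w = [[(abs(i - j) / (m - 1)) ** power for j in range(m)] for i in range(m)]
    observed = sum(w[i][j] * obs[i][j] for i in range(m) for j in range(m))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(m) for j in range(m))
    return 1 - observed / expected
```

Kappa is undefined (division by zero) when both raters use a single identical category for every patient, since expected disagreement is then zero.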
Can Raters with Reduced Job Descriptive Information Provide Accurate Position Analysis Questionnaire (PAQ) Ratings?

ERIC Educational Resources Information Center

Friedman, Lee; Harvey, Robert J.

1986-01-01

Job-naive raters provided with job descriptive information made Position Analysis Questionnaire (PAQ) ratings which were validated against ratings of job analysts who were also job content experts. None of the reduced job descriptive information conditions enabled job-naive raters to obtain either acceptable levels of convergent validity with…
Is the Parkinson Anxiety Scale comparable across raters?

PubMed

Forjaz, Maria João; Ayala, Alba; Martinez-Martin, Pablo; Dujardin, Kathy; Pontone, Gregory M; Starkstein, Sergio E; Weintraub, Daniel; Leentjens, Albert F G

2015-04-01

The Parkinson Anxiety Scale is a new scale developed to measure anxiety severity in Parkinson's disease specifically. It consists of three dimensions: persistent anxiety, episodic anxiety, and avoidance behavior. This study aimed to assess the measurement properties of the scale while controlling for the rater (self- vs. clinician-rated) effect. The Parkinson Anxiety Scale was administered to a cross-sectional multicenter international sample of 362 Parkinson's disease patients. Both patients and clinicians rated the patient's anxiety independently. A many-facet Rasch model design was applied to estimate and remove the rater effect. The following measurement properties were assessed: fit to the Rasch model, unidimensionality, reliability, differential item functioning, local item independence, interrater reliability (self or clinician), and scale targeting. In addition, test-retest stability, construct validity, precision, and diagnostic properties of the Parkinson Anxiety Scale were also analyzed. A good fit to the Rasch model was obtained for Parkinson Anxiety Scale dimensions A and B, after the removal of one item and rescoring of the response scale for certain items, whereas dimension C showed marginal fit. Self versus clinician rating differences were of small magnitude, with patients reporting higher anxiety levels than clinicians. The linear measure for Parkinson Anxiety Scale dimensions A and B showed good convergent construct validity with other anxiety measures and good diagnostic properties. Parkinson Anxiety Scale modified dimensions A and B provide valid and reliable measures of anxiety in Parkinson's disease that are comparable across raters. Further studies are needed with dimension C. © 2014 International Parkinson and Movement Disorder Society.
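In a many-facet Rasch model, the rater becomes one more additive term on the logit scale, which is what allows its effect to be estimated and removed. Assuming Linacre's rating-scale formulation (the abstract does not give the exact parameterization used), the log-odds of patient n receiving category k rather than k - 1 on item i from rater j are:

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here B_n is the patient's anxiety level, D_i the difficulty of item i, C_j the severity of rater j (here self or clinician), and F_k the threshold of category k relative to k - 1. Under this formulation, a small estimated difference between the two C_j values corresponds to the small self-versus-clinician differences reported.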
Cultural values and performance appraisal: assessing the effects of rater self-construal on performance ratings.

PubMed

Mishra, Vipanchi; Roch, Sylvia G

2013-01-01

Much of the prior research investigating the influence of cultural values on performance ratings has focused either on conducting cross-national comparisons among raters or using cultural-level individualism/collectivism scales to measure the effects of cultural values on performance ratings. Recent research has shown that there is considerable within-country variation in cultural values, i.e. people in one country can be more individualistic or collectivistic in nature. Taking the latter perspective, the present study used Markus and Kitayama's (1991) conceptualization of independent and interdependent self-construals as measures of individual variations in cultural values to investigate within-culture variations in performance ratings. Results suggest that rater self-construal has a significant influence on overall performance evaluations; specifically, raters with a highly interdependent self-construal tend to show a preference for interdependent ratees, whereas raters high on independent self-construal do not show a preference for a specific type of ratee when making overall performance evaluations. Although rater self-construal significantly influenced overall performance evaluations, no such effects were observed for specific dimension ratings. Implications of these results for performance appraisal research and practice are discussed.
A Cross-Linguistic Investigation of the Effect of Raters' Accent Familiarity on Speaking Assessment

ERIC Educational Resources Information Center

Huang, Becky; Alegre, Analucia; Eisenberg, Ann

2016-01-01

The project aimed to examine the effect of raters' familiarity with accents on their judgments of non-native speech. Participants included three groups of raters who were either from Spanish Heritage, Spanish Non-Heritage, or Chinese Heritage backgrounds (n = 16 in each group) using Winke & Gass's (2013) definition of a heritage learner as…
The Problem of Limited Inter-rater Agreement in Modelling Music Similarity

PubMed Central

Flexer, Arthur; Grill, Thomas

2016-01-01

One of the central goals of Music Information Retrieval (MIR) is the quantification of similarity between or within pieces of music. These quantitative relations should mirror the human perception of music similarity, which, however, is highly subjective, with low inter-rater agreement. Unfortunately, this principal problem has been given little attention in MIR so far. Since it is not meaningful to have computational models that go beyond the level of human agreement, these levels of inter-rater agreement present a natural upper bound for any algorithmic approach. We will illustrate this fundamental problem in the evaluation of MIR systems using results from two typical application scenarios: (i) modelling of music similarity between pieces of music; (ii) music structure analysis within pieces of music. For both applications, we derive upper bounds of performance which are due to the limited inter-rater agreement. We compare these upper bounds to the performance of state-of-the-art MIR systems and show how the upper bounds prevent further progress in developing better MIR systems. PMID:28190932
Intra and inter-rater reliability of infrared image analysis of masticatory and upper trapezius muscles in women with and without temporomandibular disorder.

PubMed

Costa, Ana C S; Dibai Filho, Almir V; Packer, Amanda C; Rodrigues-Bigaton, Delaine

2013-01-01

Infrared thermography is a supplementary tool that can be used to evaluate several pathologies, given its efficiency in analyzing the distribution of skin surface temperature. To propose two forms of infrared image analysis of the masticatory and upper trapezius muscles, and to determine the intra- and inter-rater reliability of both forms of analysis. Infrared images of masticatory and upper trapezius muscles of 64 female volunteers with and without temporomandibular disorder (TMD) were collected. Two raters performed the infrared image analysis, which occurred in two ways: temperature measurement along the muscle length and in the central portion of the muscle. The Intraclass Correlation Coefficient (ICC) was used to determine the intra- and inter-rater reliability. The ICC showed excellent intra- and inter-rater values for both measurements: temperature measurement along the muscle length (TMD group, intra-rater, ICC ranged from 0.996 to 0.999, inter-rater, ICC ranged from 0.992 to 0.999; control group, intra-rater, ICC ranged from 0.993 to 0.998, inter-rater, ICC ranged from 0.990 to 0.998), and temperature measurement of the central portion of the muscle (TMD group, intra-rater, ICC ranged from 0.981 to 0.998, inter-rater, ICC ranged from 0.971 to 0.998; control group, intra-rater, ICC ranged from 0.887 to 0.996, inter-rater, ICC ranged from 0.852 to 0.996). The results indicated that temperature measurements of the masticatory and upper trapezius muscles carried out along the muscle length and in the central portion yielded excellent intra- and inter-rater reliability.
Inter-rater reliability of measures to characterize the tobacco retail environment in Mexico.

PubMed

Hall, Marissa G; Kollath-Cattano, Christy; Reynales-Shigematsu, Luz Myriam; Thrasher, James F

2015-01-01

To evaluate the inter-rater reliability of a data collection instrument to assess the tobacco retail environment in Mexico, after major marketing regulations were implemented. In 2013, two data collectors independently evaluated 21 stores in two census tracts, through a data collection instrument that assessed the presence of price promotions, whether single cigarettes were sold, the number of visible advertisements, the presence of signage prohibiting the sale of cigarettes to minors, and characteristics of cigarette pack displays. We evaluated the inter-rater reliability of the collected data, through the calculation of metrics such as intraclass correlation coefficient, percent agreement, Cohen's kappa and Krippendorff's alpha. Most measures demonstrated substantial or perfect inter-rater reliability. Our results indicate the potential utility of the data collection instrument for future point-of-sale research.
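Percent agreement and Cohen's kappa answer different questions. The first counts matching codes, and the second discounts the agreement two collectors would reach by chance given how often each uses each code. A minimal sketch for two raters and categorical codes such as "single cigarettes sold: yes/no" follows (Krippendorff's alpha, also reported, is not shown here).

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Share of items on which two raters gave the same code."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and paired")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement,
    where chance is computed from each rater's own code frequencies."""
    p_o = percent_agreement(r1, r2)
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[c] * c2[c] for c in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

For example, if two collectors code four stores as ["yes", "yes", "no", "no"] and ["yes", "no", "no", "no"], they agree on 75% of stores. Chance agreement is 0.5, however, so kappa is only 0.5.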
Inter-rater agreement in the assessment of abnormal chest X-ray findings for tuberculosis between two Asian countries

PubMed Central

2012-01-01

Background Inter-rater agreement in the interpretation of chest X-ray (CXR) films is crucial for clinical and epidemiological studies of tuberculosis. We compared the readings of CXR films used for a survey of tuberculosis between raters from two Asian countries. Methods Of the 11,624 people enrolled in a prevalence survey in Hanoi, Viet Nam, in 2003, we studied 258 individuals whose CXR films did not exclude the possibility of active tuberculosis. Follow-up films obtained from accessible individuals in 2006 were also analyzed. Two Japanese and two Vietnamese raters read the CXR films based on a coding system proposed by Den Boon et al. and another system newly developed in this study. Inter-rater agreement was evaluated by kappa statistics. Marginal homogeneity was evaluated by the generalized estimating equation (GEE). Results CXR findings suspected of tuberculosis differed between the four raters. The frequencies of infiltrates and fibrosis/scarring detected on the films significantly differed between the raters from the two countries (P < 0.0001 and P = 0.0082, respectively, by GEE). The definitions of findings such as primary cavity used in the coding systems also affected the degree of agreement. Conclusions CXR findings were inconsistent between the raters with different backgrounds. High inter-rater agreement is a necessary component of an optimal CXR coding system, particularly in international studies. An analysis of reading results and a thorough discussion to reach a consensus would be necessary to achieve further consistency and high quality of reading. PMID:22296612
A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment

PubMed Central

2018-01-01

We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data over a period of 5 years. Empirical results suggest that the model provides a good fit both for the total scores and when applied to individual rubrics. We estimate the median impact of rater effects on the final grade to be ± 2 points on a 50-point scale, while 10% of essays would receive a score differing by at least 5 points from their actual quality. Most of the impact is due to rater unreliability, not rater bias. PMID:29614129
Y-balance test: a reliability study involving multiple raters.

PubMed

Shaffer, Scott W; Teyhen, Deydre S; Lorenson, Chelsea L; Warren, Rick L; Koreerat, Christina M; Straseske, Crystal A; Childs, John D

2013-11-01

The Y-balance test (YBT) is one of the few field-expedient tests that have shown predictive validity for injury risk in an athletic population. However, analysis of the YBT in a heterogeneous population of active adults (e.g., military, specific occupations) involving multiple raters with limited experience in a mass screening setting is lacking. The primary purpose of this study was to determine interrater test-retest reliability of the YBT in a military setting using multiple raters. Sixty-four service members (53 males, 11 females) actively conducting military training volunteered to participate. Interrater test-retest reliability of the maximal reach had intraclass correlation coefficients (2,1) of 0.80 to 0.85 with a standard error of measurement ranging from 3.1 to 4.2 cm for the 3 reach directions (anterior, posteromedial, and posterolateral). Interrater test-retest reliability of the average reach of 3 trials had an intraclass correlation coefficient (2,3) range of 0.85 to 0.93 with an associated standard error of measurement ranging from 2.0 to 3.5 cm. The YBT showed good interrater test-retest reliability with an acceptable level of measurement error among multiple raters screening active duty service members. In addition, 31.3% (n = 20 of 64) of participants exhibited an anterior reach asymmetry of >4 cm, suggesting impaired balance symmetry and potentially increased risk for injury. Reprint & Copyright © 2013 Association of Military Surgeons of the U.S.
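The standard error of measurement translates a unitless ICC back into centimetres of reach, which is what a screener needs when judging whether a change or asymmetry is real. The abstract does not say how its SEMs were derived. A common formulation is SEM = SD × √(1 − ICC), and this minimal sketch assumes it. The SD must come from the sample, and none is given here.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), in the same units as SD.

    `reliability` is typically an ICC between 0 and 1.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    if sd < 0:
        raise ValueError("sd must be non-negative")
    return sd * math.sqrt(1 - reliability)
```

With purely hypothetical numbers, a between-subject SD of 10 cm and an ICC of 0.84 give an SEM of 4 cm. Averaging three trials raises the ICC, which is consistent with the smaller SEMs reported for the average reach.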
Inter-rater Agreement of End-of-shift Evaluations Based on a Single Encounter

PubMed Central

Warrington, Steven; Beeson, Michael; Bradford, Amber

2017-01-01

Introduction End-of-shift evaluation (ESE) forms, also known as daily encounter cards, represent a subset of encounter-based assessment forms. Encounter cards have become prevalent for formative evaluation, with some suggesting a potential for summative evaluation. Our objective was to evaluate the inter-rater agreement of ESE forms using a single scripted encounter at a conference of emergency medicine (EM) educators. Methods Following institutional review board exemption, we created a scripted video simulating an encounter between an intern and a patient with an ankle injury. That video was shown during a lecture at the Council of EM Residency Director’s Academic Assembly, and attendees were asked to evaluate the “resident” using one of eight possible ESE forms, randomly distributed. Descriptive statistics were used to analyze the results, with Fleiss’ kappa used to evaluate inter-rater agreement. Results Most of the 324 respondents (66%) held leadership positions in residency programs, with a range of 29–47 responses per evaluation form. Few individuals (5%) felt they were experts in assessing residents based on EM milestones. Fleiss’ kappa ranged from 0.157 to 0.308 and did not improve much in two post hoc subgroup analyses. Conclusion The kappa ranges found show only slight to fair inter-rater agreement and raise concerns about the use of ESE forms in the assessment of EM residents. Despite the limitations of this study, these results and the lack of other studies on inter-rater agreement of encounter cards should prompt further study of such assessment methods. Additionally, EM educators should focus research on methods to improve inter-rater agreement of ESE forms, or on evaluating other methods of assessing EM residents. PMID:28435505
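Fleiss' kappa generalizes chance-corrected agreement to more than two raters. It needs a table of counts: for each rated subject, how many raters chose each category, with the same number of raters per subject. How the ESE study arranged its responses (for example, one row per form item) is not stated in the abstract, so the layout below is an assumption, and `fleiss_kappa` is a hypothetical helper rather than the authors' code:

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa from an N-subjects x C-categories table of rating counts.

    counts[i][j] is how many raters put subject i into category j; every row
    must sum to the same number of raters n (n >= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    # Per-subject agreement: share of rater pairs that agree on subject i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall category proportions.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: 3 raters, 2 subjects, 2 categories, split 2-1 each time.
print(fleiss_kappa([[2, 1], [1, 2]]))
```

The toy call returns about -0.333, i.e. worse than chance. On the widely used Landis and Koch benchmarks, values between 0.157 and 0.308 fall in the "slight" to "fair" bands, which matches the abstract's interpretation.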
Proxies and Other External Raters: Methodological Considerations

PubMed Central

Snow, A Lynn; Cook, Karon F; Lin, Pay-Shin; Morgan, Robert O; Magaziner, Jay

2005-01-01

Objective The purpose of this paper is to introduce researchers to the measurement and subsequent analysis considerations involved when using externally rated data. We will define and describe two categories of externally rated data, recommend methodological approaches for analyzing and interpreting data in these two categories, and explore factors affecting agreement between self-rated and externally rated reports. We conclude with a discussion of needs for future research. Data Sources/Study Setting Data sources for this paper are previously published studies and reviews comparing self-rated with externally rated data. Study Design/Data Collection/Extraction Methods This is a psychometric conceptual paper. Principal Findings We define two types of externally rated data: proxy data and other-rated data. Proxy data refer to those collected from someone who speaks for a patient who cannot, will not, or is unavailable to speak for him or herself, whereas we use the term other-rated data to refer to situations in which the researcher collects ratings from a person other than the patient to gain multiple perspectives on the assessed construct. These two types of data differ in the way the measurement model is defined, the definition of the gold standard against which the measurements are validated, the analysis strategies appropriately used, and how the analyses are interpreted. There are many factors affecting the discrepancies between self- and external ratings, including characteristics of the patient, the proxy, and the rated construct. Several psychological theories can be helpful in predicting such discrepancies. Conclusions Externally rated data have an important place in health services research, but use of such data requires careful consideration of the nature of the data and how they will be analyzed and interpreted. PMID:16179002
Blindness and Severe Visual Impairment in Pupils at Schools for the Blind in Burundi

PubMed Central

Ruhagaze, Patrick; Njuguna, Kahaki Kimani Margaret; Kandeke, Lévi; Courtright, Paul

2013-01-01

Purpose: To determine the causes of childhood blindness and severe visual impairment in pupils attending schools for the blind in Burundi in order to assist planning for services in the country. Materials and Methods: All pupils attending three schools for the blind in Burundi were examined. A modified WHO/PBL eye examination record form for children with blindness and low vision was used to record the findings. Data were analyzed for those who became blind or severely visually impaired before the age of 16 years. Results: Overall, 117 pupils who became visually impaired before 16 years of age were examined. Of these, 109 (93.2%) were blind or severely visually impaired. The major anatomical cause of blindness or severe visual impairment was cornea pathology/phthisis (23.9%), followed by lens pathology (18.3%), uveal lesions (14.7%) and optic nerve lesions (11.9%). In the majority of pupils with blindness or severe visual impairment, the underlying etiology of visual loss was unknown (74.3%). More than half of the pupils with lens-related blindness had not had surgery; among those who had surgery, outcomes were generally poor. Conclusion: The causes identified indicate the importance of continuing preventive public health strategies, as well as the development of specialist pediatric ophthalmic services in the management of childhood blindness in Burundi. The geographic distribution of pupils at the schools for the blind indicates a need for community-based programs to identify and refer children in need of services. PMID:23580854
Blindness and severe visual impairment in pupils at schools for the blind in Burundi.

PubMed

Ruhagaze, Patrick; Njuguna, Kahaki Kimani Margaret; Kandeke, Lévi; Courtright, Paul

2013-01-01

To determine the causes of childhood blindness and severe visual impairment in pupils attending schools for the blind in Burundi in order to assist planning for services in the country. All pupils attending three schools for the blind in Burundi were examined. A modified WHO/PBL eye examination record form for children with blindness and low vision was used to record the findings. Data were analyzed for those who became blind or severely visually impaired before the age of 16 years. Overall, 117 pupils who became visually impaired before 16 years of age were examined. Of these, 109 (93.2%) were blind or severely visually impaired. The major anatomical cause of blindness or severe visual impairment was cornea pathology/phthisis (23.9%), followed by lens pathology (18.3%), uveal lesions (14.7%) and optic nerve lesions (11.9%). In the majority of pupils with blindness or severe visual impairment, the underlying etiology of visual loss was unknown (74.3%). More than half of the pupils with lens-related blindness had not had surgery; among those who had surgery, outcomes were generally poor. The causes identified indicate the importance of continuing preventive public health strategies, as well as the development of specialist pediatric ophthalmic services in the management of childhood blindness in Burundi. The geographic distribution of pupils at the schools for the blind indicates a need for community-based programs to identify and refer children in need of services.
Exploring the Effects of Rater Linking Designs and Rater Fit on Achievement Estimates within the Context of Music Performance Assessments

ERIC Educational Resources Information Center

Wind, Stefanie A.; Engelhard, George, Jr.; Wesolowski, Brian

2016-01-01

When good model-data fit is observed, the Many-Facet Rasch (MFR) model acts as a linking and equating model that can be used to estimate student achievement, item difficulties, and rater severity on the same linear continuum. Given sufficient connectivity among the facets, the MFR model provides estimates of student achievement that are equated to…
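For reference, the rating-scale form of the Many-Facet Rasch model places student achievement θ_n, item difficulty δ_i, rater severity λ_j and category thresholds τ_k on one logit scale: log(P_nijk / P_nij(k−1)) = θ_n − δ_i − λ_j − τ_k. Whether the study used this rating-scale form or a partial-credit variant is not stated in the truncated abstract. A minimal sketch that turns assumed parameter values into category probabilities; the function names and numbers are hypothetical:

```python
import math


def mfr_category_probs(theta: float, delta: float, severity: float,
                       thresholds: list[float]) -> list[float]:
    """Probabilities of score categories 0..m under a rating-scale MFR model.

    Adjacent-category log-odds are theta - delta - severity - tau_k.
    """
    # Cumulative sums of the adjacent-category logits; category 0 gets 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - delta - severity - tau))
    # Softmax with max-subtraction for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs: list[float]) -> float:
    """Expected category score, sum of k * P(k)."""
    return sum(k * p for k, p in enumerate(probs))


# Same student and item, judged by a lenient (0.0) and a severe (1.0) rater.
taus = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a 0-3 scale
print(expected_score(mfr_category_probs(0.5, 0.0, 0.0, taus)))
print(expected_score(mfr_category_probs(0.5, 0.0, 1.0, taus)))
```

Because severity enters the same linear predictor as achievement, a more severe rater lowers the expected score by the same mechanism as a harder item, which is what allows the facets to be placed on one continuum.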
The Effect of Raters and Rating Conditions on the Reliability of the Missionary Teaching Assessment

ERIC Educational Resources Information Center

Ure, Abigail C.

2011-01-01

This study investigated how 2 different rating conditions, the controlled rating condition (CRC) and the uncontrolled rating condition (URC), affected rater behavior and the reliability of a performance assessment (PA) known as the Missionary Teaching Assessment (MTA). The CRC gives raters the capability to manipulate (pause, rewind, fast-forward)…
Individual Differences in Susceptibility to Inattentional Blindness

ERIC Educational Resources Information Center

Seegmiller, Janelle K.; Watson, Jason M.; Strayer, David L.

2011-01-01

Inattentional blindness refers to the finding that people do not always see what appears in their gaze. Though inattentional blindness affects large percentages of people, it is unclear if there are individual differences in susceptibility. The present study addressed whether individual differences in attentional control, as reflected by…
The Scarbase Duo(®): Intra-rater and inter-rater reliability and validity of a compact dual scar assessment tool.

PubMed

Fell, Matthew; Meirte, Jill; Anthonissen, Mieke; Maertens, Koen; Pleat, Jonathon; Moortgat, Peter

2016-03-01

Objective scar assessment tools were designed to help identify problematic scars and direct clinical management. Their use has been restricted by their measurement of a single scar property and the bulky size of equipment. The Scarbase Duo(®) was designed to assess both trans-epidermal water loss (TEWL) and colour of a burn scar whilst being compact and easy to use. Twenty patients with a burn scar were recruited and measurements were taken using the Scarbase Duo(®) by two observers. The Scarbase Duo(®) measures TEWL via an open-chamber system and undertakes colorimetry via narrow-band spectrophotometry, producing values for relative erythema and melanin pigmentation. Validity was assessed by comparing the Scarbase Duo(®) against the Dermalab(®) for TEWL and the Minolta Chromameter(®) for colorimetry measurements. The intra-class correlation coefficient (ICC) was used to assess reliability, with the standard error of measurement (SEM) used to assess reproducibility of measurements. The Pearson correlation coefficient (r) was used to assess convergent validity. The Scarbase Duo(®) TEWL mode had excellent reliability when used on scars for both intra- (ICC=0.95) and inter-rater (ICC=0.96) measurements with moderate SEM values. The erythema component of the colorimetry mode showed good reliability for use on scars for both intra- (ICC=0.81) and inter-rater (ICC=0.83) measurements with low SEM values. Pigmentation values showed excellent reliability on scar tissue for both intra- (ICC=0.97) and inter-rater (ICC=0.97) measurements with moderate SEM values. The Scarbase Duo(®) TEWL function had excellent correlation with the Dermalab(®) (r=0.93), whilst the colorimetry erythema value had moderate correlation with the Minolta Chromameter(®) (r=0.72). The Scarbase Duo(®) is a reliable and objective scar assessment tool, which is specifically designed for burn scars. However, for clinical use, standardised measurement conditions are recommended. Copyright © 2015 Elsevier
Accuracy and reliability of the sensory test performed using the laryngopharyngeal endoscopic esthesiometer and rangefinder in patients with suspected obstructive sleep apnoea hypopnoea: protocol for a prospective double-blinded, randomised, exploratory study.

PubMed

Giraldo-Cadavid, Luis Fernando; Bastidas, Alirio Rodrigo; Padilla-Ortiz, Diana Marcela; Concha-Galan, Diana Carolina; Bazurto, María Angelica; Vargas, Leslie

2017-08-21

Patients with obstructive sleep apnoea hypopnoea syndrome (OSA) might have varying degrees of laryngopharyngeal mechanical hyposensitivity that might impair the brain's capacity to prevent airway collapse during sleep. However, current knowledge about sensory impairment in OSA comes from studies performed using methods with little evidence of their validity. Hence, the purpose of this study is to assess the reliability and accuracy of the measurement of laryngopharyngeal mechanosensitivity in patients with OSA using a recently developed laryngopharyngeal endoscopic esthesiometer and rangefinder (LPEER). The study will be prospective and double-blinded, with a randomised crossover assignment of raters performing the sensory tests. Subjects will be recruited from patients with suspected OSA referred for baseline polysomnography to a university hospital sleep laboratory. Intra-rater and inter-rater reliability will be evaluated using the Bland-Altman limits-of-agreement plot, the intraclass correlation coefficient, and the Pearson or Spearman correlation coefficient, depending on the distribution of the variables. Diagnostic accuracy will be evaluated by plotting ROC curves, using standard baseline polysomnography as the reference. The sensory threshold values for patients with mild, moderate and severe OSA will be determined and compared using ANOVA or the Kruskal-Wallis test, depending on the distribution of the variables. The LPEER could be a new tool for evaluating and monitoring laryngopharyngeal sensory impairment in patients with OSA. If it is shown to be valid, it could help to increase our understanding of the pathophysiological mechanisms of this condition and potentially help in finding new therapeutic interventions for OSA. The protocol has been approved by the Institutional Review Board of Fundacion Neumologica Colombiana. The results will be disseminated through conference presentations and peer-reviewed publication. This trial was registered at Clinical
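The protocol lists Bland-Altman limits of agreement among its reliability analyses. The usual 95% limits are the mean between-rater difference (the bias) ± 1.96 times the standard deviation of the differences. A minimal sketch, assuming paired sensory-threshold readings from two raters on the same patients; the function name and data are hypothetical:

```python
from statistics import mean, stdev


def bland_altman_limits(rater_a: list[float],
                        rater_b: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower, upper) 95% limits of agreement for paired readings."""
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)             # systematic difference between raters
    spread = 1.96 * stdev(diffs)   # 1.96 x SD of the paired differences
    return bias, bias - spread, bias + spread


# Hypothetical thresholds measured by two raters on four patients.
print(bland_altman_limits([10, 12, 14, 16], [9, 12, 13, 16]))
```

For this toy input the bias is 0.5 and the limits are roughly -0.63 to 1.63. The plot itself places each pair's mean on the x-axis and its difference on the y-axis, with horizontal lines at these three values.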

Comparison of "E-Rater"® Automated Essay Scoring Model Calibration Methods Based on Distributional Targets

ERIC Educational Resources Information Center

Zhang, Mo; Williamson, David M.; Breyer, F. Jay; Trapani, Catherine

2012-01-01

This article describes two separate, related studies that provide insight into the effectiveness of "e-rater" score calibration methods based on different distributional targets. In the first study, we developed and evaluated a new type of "e-rater" scoring model that was cost-effective and applicable under conditions of absent human rating and…
Examining the interrater reliability of the Hare Psychopathy Checklist-Revised across a large sample of trained raters.

PubMed

Blais, Julie; Forth, Adelle E; Hare, Robert D

2017-06-01

The goal of the current study was to assess the interrater reliability of the Psychopathy Checklist-Revised (PCL-R) among a large sample of trained raters (N = 280). All raters completed PCL-R training at some point between 1989 and 2012 and subsequently provided complete coding for the same 6 practice cases. Overall, 3 major conclusions can be drawn from the results: (a) reliability of individual PCL-R items largely fell below any appropriate standards, while the estimates for Total PCL-R scores and factor scores were good (but not excellent); (b) the cases representing individuals with high psychopathy scores showed better reliability than did the cases of individuals in the moderate to low PCL-R score range; and (c) there was a high degree of variability among raters; however, rater-specific differences had no consistent effect on scoring the PCL-R. Therefore, despite low reliability estimates for individual items, Total scores and factor scores can be reliably scored among trained raters. We temper these conclusions by noting that scoring standardized videotaped case studies does not allow the rater to interact directly with the offender. Real-world PCL-R assessments typically involve a face-to-face interview and much more extensive collateral information. We offer recommendations for new web-based training procedures. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Writing Evaluation: Rater and Task Effects on the Reliability of Writing Scores for Children in Grades 3 and 4

PubMed Central

Kim, Grace Young-Suk; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie

2017-01-01

We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach desirable reliability levels of .90 and .80 for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in each of the narrative and expository genres, and their written compositions were evaluated using methods widely used for developing writers: holistic scoring, productivity, and curriculum-based writing scores. Results showed that 54% and 52% of the variance in narrative and expository compositions, respectively, were attributable to true individual differences in writing. Students’ scores varied largely by task (30.44% and 28.61% of variance), but not by rater. To reach a reliability of .90, multiple tasks and raters were needed; to reach a reliability of .80, a single rater and multiple tasks were needed. These findings offer important implications for reliably evaluating children’s writing skills, given that writing is typically evaluated by a single task and a single rater in classrooms and even in state accountability systems. PMID:29075050
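Questions of the form "how many raters and tasks are needed to reach .80 or .90" are usually answered with a generalizability-theory decision (D) study. For a fully crossed persons × tasks × raters design and relative decisions, the generalizability coefficient is σ²_p / (σ²_p + σ²_pt/n_t + σ²_pr/n_r + σ²_ptr,e/(n_t·n_r)). The abstract does not report the full variance-component breakdown, so the components below are made-up values, and the helper names are hypothetical; this sketches the method, not the study's numbers:

```python
from typing import NamedTuple, Optional


class Components(NamedTuple):
    person: float        # sigma^2_p: true differences between children
    person_task: float   # sigma^2_pt: child-by-task interaction
    person_rater: float  # sigma^2_pr: child-by-rater interaction
    residual: float      # sigma^2_ptr,e: three-way interaction plus error


def g_coefficient(c: Components, n_tasks: int, n_raters: int) -> float:
    """Relative G coefficient for a crossed p x t x r random design."""
    error = (c.person_task / n_tasks
             + c.person_rater / n_raters
             + c.residual / (n_tasks * n_raters))
    return c.person / (c.person + error)


def smallest_design(c: Components, target: float,
                    max_n: int = 10) -> Optional[tuple[int, int]]:
    """Cheapest (n_tasks, n_raters), by total ratings, reaching the target."""
    hits = [(nt * nr, nt, nr)
            for nt in range(1, max_n + 1) for nr in range(1, max_n + 1)
            if g_coefficient(c, nt, nr) >= target]
    if not hits:
        return None
    _, nt, nr = min(hits)
    return nt, nr


# Hypothetical components in which person x task variance dominates.
comps = Components(person=1.0, person_task=0.5, person_rater=0.1, residual=0.4)
print(g_coefficient(comps, 1, 1))      # one task, one rater
print(smallest_design(comps, 0.75))    # cheapest design reaching .75
```

With these invented components a single task and rater give 0.5, and the cheapest design reaching .75 adds tasks rather than raters, (4, 1). This illustrates why large task variance pushes a design toward multiple tasks, as the abstract reports.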
Face transplantation for the blind: more than being blind in a sighted world.

PubMed

Lee, Joseph

2018-06-01

Face transplantation (FT) is a landmark in reconstructive surgery involving vascularised composite allotransplantation. The question of whether FT should be offered to patients who are blind has recently arisen. Some bioethicists recommend not excluding a patient who is blind, as this may amount to discrimination. From an ethical standpoint, FT for those with blindness is appropriate in selected candidates. This article seeks to add to the clinical evidence supporting FT for those with blindness by detailing a complementary psychosocial perspective. Currently, there is little relevant research about the subjectivity of the blind. This is critical since the arguments against FT for the blind refer to their inability to see their face and to view the reaction of others to their disfigured faces. We begin with a brief look at examples of FT involving blindness and the associated arguments. The next part is a multidisciplinary investigation of the experiences of the blind. These are gleaned from a close reading of the literature and by drawing inferences, as direct studies are rare. The discussion analyses identity themes of the blind in relation to their faces: as they experience it; the face they wish to show to the world; and how others perceive and react to their face in an environment saturated with imagery and visual communication. Disability and the blind person's experience of faces are well-founded considerations for medical practitioners and ethics boards in the process of FT decision-making. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Cervical auscultation as an adjunct to the clinical swallow examination: a comparison with fibre-optic endoscopic evaluation of swallowing.

PubMed

Bergström, Liza; Svensson, Per; Hartelius, Lena

2014-10-01

This prospective, single-blinded study investigated the validity and reliability of cervical auscultation (CA) under two conditions: (1) CA-only, using isolated swallow-sound clips, and (2) CSE + CA, using additional clinical swallow examination (CSE) information, such as patient case history and oromotor assessment, together with the same swallow-sound clips as in condition one. The two CA conditions were compared against a fibre-optic endoscopic evaluation of swallowing (FEES) reference test. Each CA condition consisted of 18 swallow samples compiled from 12 adult patients consecutively referred to the FEES clinic. Patients' swallow sounds were simultaneously recorded during FEES via a Littmann E3200 electronic stethoscope. These 18 swallow samples were sent to 13 experienced dysphagia clinicians, recruited from the UK and Australia, who were blinded to the FEES results. Samples were rated for (1) whether the patient was dysphagic, (2) whether the patient was safe on the consistency trialled, and (3) dysphagia severity. Sensitivity ranged from 83% to 95%, and specificity from 50% to 92%, across the conditions. Intra-rater agreement ranged from 69% to 97% total agreement. Inter-rater reliability for dysphagia severity showed substantial agreement (rs = 0.68 and 0.74). Results show good rater reliability for CA-trained speech-language pathologists. Sensitivity and specificity for both CA conditions in this study are comparable to, and often better than, those of other well-established CSE components.
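Sensitivity and specificity here compare each clinician's judgement (for example, "dysphagic" or "not dysphagic") against the FEES reference. A minimal sketch, assuming both are coded as booleans per swallow sample; the helper name and the toy judgements are hypothetical:

```python
def sensitivity_specificity(index_positive: list[bool],
                            reference_positive: list[bool]) -> tuple[float, float]:
    """Sensitivity and specificity of an index test against a reference test."""
    pairs = list(zip(index_positive, reference_positive))
    tp = sum(1 for i, r in pairs if i and r)          # both say dysphagic
    fn = sum(1 for i, r in pairs if not i and r)      # missed by the index test
    tn = sum(1 for i, r in pairs if not i and not r)  # both say normal
    fp = sum(1 for i, r in pairs if i and not r)      # false alarm
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical: one clinician's CA judgements vs FEES on five samples.
ca_says = [True, True, False, False, True]
fees_says = [True, False, False, True, True]
print(sensitivity_specificity(ca_says, fees_says))
```

Sensitivity is the share of reference-positive samples the clinician flags, and specificity the share of reference-negative samples correctly cleared; the toy call gives about (0.667, 0.5).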
SU-E-T-511: Inter-Rater Variability in Classification of Incidents in a New Incident Reporting System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pappas, D; Reis, S; Ali, A

Purpose To determine how consistent the results of different raters are when reviewing the same cases within the Radiation Oncology Incident Learning System (ROILS). Methods Three second-year medical physics graduate students filled out incident reports in spreadsheets set up to mimic ROILS. All students studied the same 33 cases and independently entered their assessments, for a total of 99 reviewed cases. The narratives for these cases were obtained from a published International Commission on Radiological Protection (ICRP) report, which included shorter narratives selected from the Radiation Oncology Safety Information System (ROSIS) database. Each category of questions was reviewed to see how consistent the results were, using free-marginal multirater kappa analysis. The percentage of cases where all raters shared full agreement or full disagreement was recorded to show which questions were answered consistently by multiple raters for a given case. Consistency among the raters was compared between ICRP and ROSIS cases to see whether either group led to more reliable results. Results The categories where all raters agreed 100 percent in their choices were the event type (93.94 percent of cases; kappa 0.946) and the likelihood of the event being harmful to the patient (42.42 percent of cases; kappa 0.409). The categories where all raters disagreed 100 percent in their choices were the dosimetric severity scale (39.39 percent of cases; kappa 0.139) and the potential future toxicity (48.48 percent of cases; kappa 0.205). ROSIS had more cases where all raters disagreed than ICRP (23.06 percent of cases compared to 15.58 percent, respectively). Conclusion Despite reviewing the same cases, the results among the three raters varied widely. ROSIS narratives were shorter than ICRP narratives, which suggests that longer narratives lead to more consistent results. This study shows that the incident reporting system can be optimized to yield more consistent results.
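Free-marginal multirater kappa (Randolph's formulation is assumed here) uses the same observed-agreement term as Fleiss' kappa but fixes chance agreement at 1/k for k available categories instead of estimating it from the observed marginals. A minimal sketch together with the "full agreement" share the abstract reports; the helper names and the toy table are hypothetical:

```python
def free_marginal_kappa(counts: list[list[int]]) -> float:
    """Free-marginal multirater kappa from a cases x categories count table.

    counts[i][j] is how many raters chose category j for case i; the row
    length is the number of categories available to raters.
    """
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Observed agreement: mean share of agreeing rater pairs per case.
    p_o = sum((sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts) / len(counts)
    p_e = 1.0 / n_categories  # chance agreement under free marginals
    return (p_o - p_e) / (1.0 - p_e)


def full_agreement_share(counts: list[list[int]]) -> float:
    """Fraction of cases in which every rater chose the same category."""
    return sum(1 for row in counts if max(row) == sum(row)) / len(counts)


# Hypothetical: 3 raters, 2 cases, 3 categories.
table = [[3, 0, 0], [1, 1, 1]]
print(free_marginal_kappa(table), full_agreement_share(table))
```

Because chance agreement depends only on how many categories are offered, question categories with many options can reach a moderate kappa from fairly modest raw agreement.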
A randomized, rater-blinded, crossover study of the effects of oxymorphone extended release, fed versus fasting, on cognitive performance as tested with CANTAB in opioid-tolerant subjects.

PubMed

Spierings, Egilius L H; Volkerts, Edmund R; Heitland, Ivo; Thomson, Heather

2014-02-01

The maximum plasma concentration (Cmax) of oxymorphone extended release (ER) 20 mg and 40 mg is approximately 50% higher in fed than in fasted subjects, with most of the difference in area-under-the-curve (AUC) occurring in the first 4 hours post-dose. Hence, the US FDA recommends in the approved labeling that oxymorphone ER is taken at least 1 hour before or 2 hours after eating. In order to determine the potential impact on cognitive performance of the increased absorption of oxymorphone ER, fed versus fasting, we conducted a randomized, rater-blinded, crossover study in 30 opioid-tolerant subjects, using tests from the Cambridge Neuropsychological Test Automated Battery (CANTAB). The subjects randomly received 40 mg oxymorphone ER after a high-fat meal of approximately 1,010 kcal or after fasting for 8-12 hours, and were tested 1 hour and 3 hours post-dose. The CANTAB tests, Spatial Recognition Memory (SRM) and Spatial Working Memory (SWM), showed no statistically significant differences between the fed and fasting conditions. However, sustained attention, as measured by the Rapid Visual Information Processing (RVP) CANTAB test, showed a statistically significant interaction of fed versus fasting and post-dose time of testing (F[1,28] = 6.88, P = 0.01), suggesting that 40 mg oxymorphone ER after a high-fat meal versus fasting mitigates the learning effect in this particular cognition domain from 1 hour to 3 hours post-dose. Oxymorphone 40 mg ER affected cognitive performance similarly within 3 hours post-dose, whether given on an empty stomach or after a high-fat meal, suggesting that the effect of food on plasma concentration may not be relevant in the medication's impact on cognition. Wiley Periodicals, Inc.
The mental health of individuals referred for assessment of autism spectrum disorder in adulthood: A clinic report.

PubMed

Russell, Ailsa J; Murphy, Clodagh M; Wilson, Ellie; Gillan, Nicola; Brown, Cordelia; Robertson, Dene M; Craig, Michael C; Deeley, Quinton; Zinkstok, Janneke; Johnston, Kate; McAlonan, Grainne M; Spain, Deborah; Murphy, Declan Gm

2016-07-01

Growing awareness of autism spectrum disorders has increased the demand for diagnostic services in adulthood. High rates of mental health problems have been reported in young people and adults with autism spectrum disorder. However, sampling and methodological issues mean prevalence estimates and conclusions about specificity in psychiatric co-morbidity in autism spectrum disorder remain unclear. A retrospective case review of 859 adults referred for assessment of autism spectrum disorder compares International Classification of Diseases, Tenth Revision diagnoses in those who met criteria for autism spectrum disorder (n = 474) with those who did not (n = 385). Rates of psychiatric diagnosis (>57%) were equivalent across both groups and exceeded general population rates for a number of conditions. The prevalence of anxiety disorders, particularly obsessive compulsive disorder, was significantly higher in adults with autism spectrum disorder than in adults without autism spectrum disorder. Limitations of this observational clinic study, which may impact the generalisability of the findings, include the lack of standardised structured psychiatric diagnostic assessments by assessors blind to autism spectrum disorder diagnosis, and the absence of inter-rater reliability assessment. The implications of this study highlight the need for careful consideration of mental health needs in all adults referred for autism spectrum disorder diagnosis. © The Author(s) 2015.
Can Physicians Identify Inappropriate Nuclear Stress Tests? An Examination of Inter-rater Reliability for the 2009 Appropriate Use Criteria for Radionuclide Imaging

PubMed Central

Ye, Siqin; Rabbani, LeRoy E.; Kelly, Christopher R.; Kelly, Maureen R.; Lewis, Matthew; Paz, Yehuda; Peck, Clara L.; Rao, Shaline; Bokhari, Sabahat; Weiner, Shepard D.; Einstein, Andrew J.

2014-01-01

Background We sought to determine the inter-rater reliability of the 2009 Appropriate Use Criteria (AUC) for radionuclide imaging (RNI) and whether physicians at various levels of training can effectively identify nuclear stress tests with inappropriate indications. Methods and Results Four hundred patients were randomly selected from a consecutive cohort of patients undergoing nuclear stress testing at an academic medical center. Raters with different levels of training (including cardiology attending physicians, cardiology fellows, internal medicine hospitalists, and internal medicine interns) classified individual nuclear stress tests using the 2009 AUC. Consensus classification by two cardiologists was considered the operational gold standard, and the sensitivity and specificity of individual raters for identifying inappropriate tests were calculated. Inter-rater reliability of the AUC was assessed using Cohen’s kappa statistics for pairs of different raters. The mean age of patients was 61.5 years; 214 (54%) were female. The cardiologists rated 256 (64%) of the 400 nuclear stress tests (NSTs) as appropriate, 68 (17%) as uncertain, and 55 (14%) as inappropriate; 21 (5%) tests could not be classified. Inter-rater reliability for non-cardiologist raters was modest (unweighted Cohen’s kappa, 0.51; 95% confidence interval, 0.45 to 0.55). Sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, while specificity ranged from 85% to 97%. Conclusions Inter-rater reliability for the 2009 AUC for RNI is modest, and there is considerable variation in the ability of raters at different levels of training to identify inappropriate tests. PMID:25563660
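Unweighted Cohen's kappa, used here for pairs of raters, compares observed agreement p_o with the agreement p_e expected from each rater's own category frequencies: κ = (p_o − p_e)/(1 − p_e). A minimal sketch; the helper name and the toy AUC classifications are hypothetical:

```python
from collections import Counter


def cohens_kappa(rater_1: list[str], rater_2: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters classifying the same items."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_1)
    p_o = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # Chance agreement from each rater's marginal category frequencies.
    c1, c2 = Counter(rater_1), Counter(rater_2)
    p_e = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical classifications: A = appropriate, U = uncertain, I = inappropriate.
r1 = ["A", "A", "I", "U"]
r2 = ["A", "I", "I", "U"]
print(cohens_kappa(r1, r2))
```

The toy pair agrees on 3 of 4 tests (p_o = 0.75) with p_e = 5/16, giving κ = 7/11 ≈ 0.64; "unweighted" means an appropriate-versus-inappropriate disagreement counts the same as one involving "uncertain".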
Managing Rater Effects through the Use of FACETS Analysis: The Case of a University Placement Test

ERIC Educational Resources Information Center

Wu, Siew Mei; Tan, Susan

2016-01-01

Rating essays is a complex task where students' grades could be adversely affected by test-irrelevant factors such as rater characteristics and rating scales. Understanding these factors and controlling their effects are crucial for test validity. Rater behaviour has been extensively studied through qualitative methods such as questionnaires and…
Definition of blindness under National Programme for Control of Blindness: Do we need to revise it?

PubMed Central

Vashist, Praveen; Senjam, Suraj Singh; Gupta, Vivek; Gupta, Noopur; Kumar, Atul

2017-01-01

To review the appropriateness of the current definition of blindness under the National Programme for Control of Blindness (NPCB), Government of India. An online search of peer-reviewed published scientific literature and guidelines using PubMed, the World Health Organization (WHO) IRIS, and Google Scholar with the keywords blindness and visual impairment, along with an offline examination of reports of national and international organizations and their cross-references, was conducted up to December 2016 to identify relevant documents on the definition of blindness. The evidence for the historical and currently adopted definitions of blindness under the NPCB, the WHO, and other countries was reviewed. Differences between the NPCB and WHO definitions were analyzed to assess the impact on the epidemiological status of blindness and visual impairment in India. The differences in the criteria for blindness under the NPCB and WHO definitions cause an overestimation of the prevalence of blindness in India. These variations are also associated with an over-representation of refractive errors as a cause of blindness and an under-representation of other causes under the NPCB definition. The targets for eliminating blindness also become much more difficult to achieve under the NPCB definition. Ignoring differences in definitions when comparing the global and Indian prevalence of blindness will cause erroneous interpretations. We recommend that appropriate modifications be made to the NPCB definition of blindness to make it consistent with the WHO definition. PMID:28345562
Definition of blindness under National Programme for Control of Blindness: Do we need to revise it?

PubMed

Vashist, Praveen; Senjam, Suraj Singh; Gupta, Vivek; Gupta, Noopur; Kumar, Atul

2017-02-01

To review the appropriateness of the current definition of blindness under the National Programme for Control of Blindness (NPCB), Government of India. An online search of peer-reviewed published scientific literature and guidelines using PubMed, the World Health Organization (WHO) IRIS, and Google Scholar with the keywords blindness and visual impairment, along with an offline examination of reports of national and international organizations and their cross-references, was conducted up to December 2016 to identify relevant documents on the definition of blindness. The evidence for the historical and currently adopted definitions of blindness under the NPCB, the WHO, and other countries was reviewed. Differences between the NPCB and WHO definitions were analyzed to assess the impact on the epidemiological status of blindness and visual impairment in India. The differences in the criteria for blindness under the NPCB and WHO definitions cause an overestimation of the prevalence of blindness in India. These variations are also associated with an over-representation of refractive errors as a cause of blindness and an under-representation of other causes under the NPCB definition. The targets for eliminating blindness also become much more difficult to achieve under the NPCB definition. Ignoring differences in definitions when comparing the global and Indian prevalence of blindness will cause erroneous interpretations. We recommend that appropriate modifications be made to the NPCB definition of blindness to make it consistent with the WHO definition.
Can training improve the quality of inferences made by raters in competency modeling? A quasi-experiment.

PubMed

Lievens, Filip; Sanchez, Juan I

2007-05-01

A quasi-experiment was conducted to investigate the effects of frame-of-reference training on the quality of competency modeling ratings made by consultants. Human resources consultants from a large consulting firm were randomly assigned to either a training or a control condition. The discriminant validity, interrater reliability, and accuracy of the competency ratings were significantly higher in the training group than in the control group. Further, the discriminant validity and interrater reliability of competency inferences were highest among an additional group of trained consultants who also had competency modeling experience. Together, these results suggest that procedural interventions such as rater training can significantly enhance the quality of competency modeling. © 2007 APA, all rights reserved.
Validity and reliability of exposure assessors' ratings of exposure intensity by type of occupational questionnaire and type of rater.

PubMed

Friesen, Melissa C; Coble, Joseph B; Katki, Hormuzd A; Ji, Bu-Tian; Xue, Shouzheng; Lu, Wei; Stewart, Patricia A

2011-07-01

In epidemiologic studies that rely on professional judgment to assess occupational exposures, accurate assessment by the raters is vital for detecting associations. We examined the influence of the type of questionnaire, type of industry, and type of rater on the raters' ability to reliably and validly assess within-industry differences in exposure. Our aim was to identify areas where improvements in exposure assessment may be possible. Subjects from three foundries (n = 72) and three textile plants (n = 74) in Shanghai, China, completed an occupational history (OH) and an industry-specific questionnaire (IQ). Six total dust measurements were collected per subject and were used to calculate a subject-specific measurement mean, which was used as the gold standard. Six raters independently ranked the intensity of each subject's current job on an ordinal scale (1-4) based on the OH alone and on the OH and IQ together. Aggregate ratings were calculated for the group, for industrial hygienists, and for occupational physicians. We calculated intra-class correlation coefficients (ICCs) to evaluate the reliability of the raters. We calculated the correlation between the subject-specific measurement means and the ratings to evaluate the raters' validity. Analyses were stratified by industry, type of questionnaire, and type of rater. We also examined the agreement between the ratings by exposure category, where the subject-specific measurement means were categorized into two and four categories. The reliability and validity measures were higher for the aggregate ratings than for the ratings from the individual raters. The group's performance was maximized with three raters. Both the reliability and validity measures were higher for the foundry industry than for the textile industry. The ICCs were consistently lower in the OH/IQ round than in the OH round in both industries. In contrast, the correlations with the measurement means were higher in the OH/IQ round than in the OH round
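Aggregating raters and checking validity against the measured gold standard can be sketched in a few lines. This assumes complete data, a simple mean across raters as the aggregate rating, and Pearson's r as the correlation; the abstract does not say which correlation coefficient was used, and the function names and data are illustrative only.

```python
import math


def aggregate(ratings_by_rater):
    """Mean rating per subject; ratings_by_rater[r][i] is rater r's rating of subject i."""
    n_raters = len(ratings_by_rater)
    n_subjects = len(ratings_by_rater[0])
    return [sum(r[i] for r in ratings_by_rater) / n_raters for i in range(n_subjects)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ordinal (1-4) ratings from three raters for five subjects,
# and hypothetical subject-specific measurement means (the gold standard).
ratings = [[1, 2, 2, 3, 4], [1, 1, 3, 3, 4], [2, 2, 2, 4, 3]]
measured = [0.8, 1.1, 2.5, 3.9, 6.2]
print(pearson(aggregate(ratings), measured))
```

Averaging smooths out individual raters' idiosyncrasies, which is one plausible reason aggregate ratings outperformed single raters here.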
DeuteRater: a tool for quantifying peptide isotope precision and kinetic proteomics.

PubMed

Naylor, Bradley C; Porter, Michael T; Wilson, Elise; Herring, Adam; Lofthouse, Spencer; Hannemann, Austin; Piccolo, Stephen R; Rockwood, Alan L; Price, John C

2017-05-15

Using mass spectrometry to measure the concentration and turnover of the individual proteins in a proteome enables the calculation of individual synthesis and degradation rates for each protein. Software to analyze concentration is readily available, but software to analyze turnover is lacking. Data analysis workflows typically do not access the full breadth of information about instrument precision and accuracy that is present in each peptide isotopic envelope measurement. This method uses both the isotope distribution and changes in neutromer spacing, which benefits the analysis of both concentration and turnover. We have developed a data analysis tool, DeuteRater, to measure protein turnover from metabolic D₂O labeling. DeuteRater uses theoretical predictions for the label-dependent change in isotope abundance and inter-peak (neutromer) spacing within the isotope envelope to calculate the protein turnover rate. We have also used these metrics to evaluate the accuracy and precision of peptide measurements and thereby determined the optimal data acquisition parameters of different instruments, as well as the effect of data processing steps. We show that these combined measurements can be used to remove noise and increase confidence in the protein turnover measurement for each protein. Source code and ReadMe for Python 2 and 3 versions of DeuteRater are available at https://github.com/JC-Price/DeuteRater . Data are at https://chorusproject.org/pages/index.html, project number 1147. Critical intermediate calculation files are provided as Tables S3 and S4. The software has only been tested on Windows machines. Contact: jcprice@chem.byu.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
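As a point of reference for what a turnover rate means, the sketch below fits a generic first-order labeling model, in which the fraction of newly synthesized protein is `f(t) = 1 - exp(-k*t)`, to hypothetical time-course data. This is not DeuteRater's algorithm, which works from isotope abundance and neutromer spacing; it assumes the new fraction has already been estimated and normalized to a plateau of 1, and it estimates `k` by least squares on the linearized form `-ln(1 - f) = k*t`.

```python
import math


def fit_turnover_rate(times, fraction_new):
    """Least-squares k for -ln(1 - f) = k * t (a line through the origin)."""
    ys = [-math.log(1.0 - f) for f in fraction_new]  # requires 0 <= f < 1
    return sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)


def half_life(k):
    """Protein half-life implied by a first-order rate constant."""
    return math.log(2) / k


# Hypothetical time points (days) and noise-free fractions generated with k = 0.2/day.
days = [1, 2, 4, 8]
fractions = [1 - math.exp(-0.2 * t) for t in days]
k = fit_turnover_rate(days, fractions)
print(round(k, 6), round(half_life(k), 3))  # 0.2 3.466
```

The log transform makes the fit a one-line formula, but it inflates noise at late time points where `f` approaches 1; a nonlinear fit would usually be preferred on real data.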
Rater Judgment and English Language Speaking Proficiency. Research Report

ERIC Educational Resources Information Center

Chalhoub-Deville, Micheline; Wigglesworth, Gillian

2005-01-01

The paper investigates whether there is a shared perception of speaking proficiency among raters from different English speaking countries. More specifically, this study examines whether there is a significant difference among English language learning (ELL) teachers, residing in Australia, Canada, the UK, and the USA when rating speech samples of…
On the Performance of the Marginal Homogeneity Test to Detect Rater Drift.

PubMed

Sgammato, Adrienne; Donoghue, John R

2018-06-01

When constructed response items are administered repeatedly, "trend scoring" can be used to test for rater drift. In trend scoring, raters rescore responses from the previous administration. Two simulation studies evaluated the utility of Stuart's Q measure of marginal homogeneity as a way of evaluating rater drift when monitoring trend scoring. In the first study, data were generated based on trend scoring tables obtained from an operational assessment. The second study tightly controlled table margins to disentangle certain features present in the empirical data. In addition to Q, the paired t test was included as a comparison, because of its widespread use in monitoring trend scoring. Sample size, number of score categories, interrater agreement, and symmetry/asymmetry of the margins were manipulated. For identical margins, both statistics had good Type I error control. For a unidirectional shift in margins, both statistics had good power. As expected, when shifts in the margins were balanced across categories, the t test had little power. Q demonstrated good power for all conditions and identified almost all items identified by the t test. Q shows substantial promise for monitoring of trend scoring.
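Stuart's Q (also called the Stuart-Maxwell statistic) compares the row and column margins of the square table that cross-classifies the original scores against the trend rescores. A minimal pure-Python sketch, assuming a complete k x k count table: it drops one category (the k margin differences always sum to zero), builds the standard covariance matrix of the remaining differences, and returns Q with its k - 1 degrees of freedom. The p-value would come from a chi-square distribution, for example `scipy.stats.chi2.sf(q, df)`; the helper names are hypothetical.

```python
def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def stuart_q(table):
    """Stuart-Maxwell marginal homogeneity statistic for a k x k table of counts."""
    k = len(table)
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Margin differences for the first k - 1 categories (the last is redundant).
    d = [row[i] - col[i] for i in range(k - 1)]
    v = [[row[i] + col[i] - 2 * table[i][i] if i == j else -(table[i][j] + table[j][i])
          for j in range(k - 1)] for i in range(k - 1)]
    q = sum(di * xi for di, xi in zip(d, _solve(v, d)))
    return q, k - 1


# Rows: original scores; columns: trend rescores (hypothetical counts).
print(stuart_q([[10, 5], [1, 20]]))  # for 2 x 2 this equals McNemar's 4**2 / 6
```

Unlike the paired t test, which looks only at the mean score shift, Q responds to any change in the margins, including balanced shifts that leave the mean unchanged.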
Adjacent-Categories Mokken Models for Rater-Mediated Assessments

PubMed Central

Wind, Stefanie A.

2016-01-01

Molenaar extended Mokken’s original probabilistic-nonparametric scaling models for use with polytomous data. These polytomous extensions of Mokken’s original scaling procedure have facilitated the use of Mokken scale analysis as an approach to exploring fundamental measurement properties across a variety of domains in which polytomous ratings are used, including rater-mediated educational assessments. Because their underlying item step response functions (i.e., category response functions) are defined using cumulative probabilities, polytomous Mokken models can be classified as cumulative models based on the classifications of polytomous item response theory models proposed by several scholars. In order to permit a closer conceptual alignment with educational performance assessments, this study presents an adjacent-categories variation on the polytomous monotone homogeneity and double monotonicity models. Data from a large-scale rater-mediated writing assessment are used to illustrate the adjacent-categories approach, and results are compared with the original formulations. Major findings suggest that the adjacent-categories models provide additional diagnostic information related to individual raters’ use of rating scale categories that is not observed under the original formulation. Implications are discussed in terms of methods for evaluating rating quality. PMID:29795916
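The difference between the cumulative and adjacent-categories definitions of an item step can be made concrete with empirical proportions. A minimal sketch, assuming responses have already been grouped by rest score; the function names are hypothetical, and this illustrates the two step definitions rather than the author's estimation or model-checking procedure.

```python
def step_probabilities(counts):
    """Empirical item step probabilities for one rest-score group.

    counts[c] is the number of responses in score category c (0..m).
    Returns (cumulative, adjacent), each with one entry per step k = 1..m.
    """
    total = sum(counts)
    m = len(counts) - 1
    # Cumulative definition: P(X >= k).
    cumulative = [sum(counts[k:]) / total for k in range(1, m + 1)]
    # Adjacent-categories definition: P(X = k | X in {k - 1, k}).
    adjacent = []
    for k in range(1, m + 1):
        pair = counts[k - 1] + counts[k]
        adjacent.append(counts[k] / pair if pair else float("nan"))
    return cumulative, adjacent


def nondecreasing(values):
    """Monotonicity check across rest-score groups ordered from low to high (NaN fails)."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical counts for a 0-2 rating scale in three rest-score groups.
groups = [[6, 3, 1], [3, 4, 3], [1, 3, 6]]
adjacent_by_group = [step_probabilities(g)[1] for g in groups]
for step in range(2):
    print(step + 1, nondecreasing([a[step] for a in adjacent_by_group]))
```

Because each adjacent-categories step conditions only on two neighbouring categories, it isolates how a rater separates those two categories, which is the kind of category-level diagnostic the abstract describes.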
Inter-rater reliability and generalizability of patient note scores using a scoring rubric based on the USMLE Step-2 CS format.

PubMed

Park, Yoon Soo; Hyderi, Abbas; Bordage, Georges; Xing, Kuan; Yudkowsky, Rachel

2016-10-01

Recent changes to the patient note (PN) format of the United States Medical Licensing Examination have challenged medical schools to improve the instruction and assessment of students taking the Step-2 clinical skills examination. The purpose of this study was to gather validity evidence regarding response process and internal structure, focusing on inter-rater reliability and generalizability, to determine whether a locally developed PN scoring rubric and scoring guidelines could yield reproducible PN scores. A randomly selected subsample of historical data (post-encounter PNs from 55 of 177 medical students) was rescored by six trained faculty raters in November-December 2014. Inter-rater reliability (% exact agreement and kappa) was calculated for five standardized patient cases administered in a local graduation competency examination. Generalizability studies were conducted to examine the overall reliability. Qualitative data were collected through surveys and a rater-debriefing meeting. The overall inter-rater reliability (weighted kappa) was .79 (Documentation = .63, Differential Diagnosis = .90, Justification = .48, and Workup = .54). The majority of score variance was due to case specificity (13%) and case-task specificity (31%), indicating differences in student performance by case and by case-task interactions. Variance associated with raters and their interactions was modest (<5%). Raters felt that justification was the most difficult task to score and that having case- and level-specific scoring guidelines during training was most helpful for calibration. The overall inter-rater reliability indicates a high level of confidence in the consistency of note scores. Designs for scoring notes may optimize reliability by balancing the number of raters and cases.
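Percent exact agreement and weighted kappa, the two agreement indices named above, can be computed as follows. A minimal sketch, assuming integer scores 0..k-1 and complete pairs of ratings from two raters; the abstract does not say whether linear or quadratic weights were used, so both are offered, and `weights="none"` gives Cohen's unweighted kappa. The function names and data are illustrative.

```python
def percent_exact_agreement(a, b):
    """Share of items on which the two raters gave identical scores."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a, b, n_categories, weights="quadratic"):
    """Cohen's kappa for two raters with scores in 0..n_categories-1.

    weights: "none" (unweighted), "linear", or "quadratic" disagreement weights.
    Raises ZeroDivisionError if chance disagreement is zero (e.g. one category used).
    """
    n = len(a)
    k = n_categories
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, growing with distance.
        if weights == "none":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs_dis = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(w(i, j) * row[i] * col[j] / n for i in range(k) for j in range(k))
    return 1.0 - obs_dis / exp_dis


# Hypothetical scores (0-2) from two raters on four notes.
rater_a, rater_b = [0, 1, 2, 2], [0, 1, 1, 2]
print(percent_exact_agreement(rater_a, rater_b), weighted_kappa(rater_a, rater_b, 3))
```

Weighting gives partial credit for near misses, which is why a weighted kappa can be high even when exact agreement is moderate.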
Building "e-rater"® Scoring Models Using Machine Learning Methods. Research Report. ETS RR-16-04

ERIC Educational Resources Information Center

Chen, Jing; Fife, James H.; Bejar, Isaac I.; Rupp, André A.

2016-01-01

The "e-rater"® automated scoring engine used at Educational Testing Service (ETS) scores the writing quality of essays. In the current practice, e-rater scores are generated via a multiple linear regression (MLR) model as a linear combination of various features evaluated for each essay and human scores as the outcome variable. This…

Frame-of-reference training for simulation-based intraoperative communication assessment.

PubMed

Gardner, Aimee K; Russo, Michael A; Jabbour, Ibrahim I; Kosemund, Matthew; Scott, Daniel J

2016-09-01

The purpose of this study was to examine the impact of frame-of-reference (FOR) training on assessments of intraoperative communication skills and identify areas of need to inform curricular efforts. Simulation instructors (M.D., Ph.D., Research Fellow, Simulation Technician) underwent a 2-hour FOR training session with the operating room communication instrument. They then independently rated communication skills of 19 PGY1s who participated in a team-based simulation. Residents completed self-assessments via video review of the scenario. Intraclass correlation coefficients were used to examine inter-rater reliability. Relationships between trained raters and resident scores were assessed with Pearson correlation coefficients and paired sample t tests. Inter-rater reliability after FOR training was .91. The correlation between trained rater scores and resident evaluations was nonsignificant. Residents significantly underestimated their intraoperative communication skills (P < .05). Use of names, closed loop communication, and sharing information with team members demonstrated consistently low ratings among all residents. These findings reveal that individuals from a range of professional backgrounds can be trained to reliably rate resident intraoperative communication performance and that residents tend to under-rate their communication skills. Copyright © 2016 Elsevier Inc. All rights reserved.
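The comparison of trained-rater scores with self-assessments can be sketched as a paired-sample t statistic computed on per-resident differences. A minimal standard-library sketch with hypothetical scores; the p-value would come from a t distribution with n - 1 degrees of freedom (for example `scipy.stats.t.sf`), and `scipy.stats.ttest_rel` performs the whole test in one call.

```python
import math
import statistics


def paired_t(x, y):
    """Paired-sample t statistic and degrees of freedom for x[i] vs y[i].

    Raises ZeroDivisionError if all differences are identical (zero spread).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical scores: trained raters vs residents' self-assessments.
trained = [3.5, 3.0, 4.0, 3.5, 2.5]
self_rated = [3.0, 2.5, 3.0, 3.5, 2.0]
print(paired_t(trained, self_rated))
```

A positive t here means the trained raters scored residents higher than the residents scored themselves, the direction the abstract reports.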
Rating Written Performance: What Do Raters Do and Why?

ERIC Educational Resources Information Center

Kuiken, Folkert; Vedder, Ineke

2014-01-01

This study investigates the relationship in L2 writing between raters' judgments of communicative adequacy and linguistic complexity by means of six-point Likert scales, and general measures of linguistic performance. The participants were 39 learners of Italian and 32 of Dutch, who wrote two short argumentative essays. The same writing tasks…
Construct Validity of "e-rater"® in Scoring TOEFL® Essays. Research Report. ETS RR-07-21

ERIC Educational Resources Information Center

Attali, Yigal

2007-01-01

This study examined the construct validity of the "e-rater"® automated essay scoring engine as an alternative to human scoring in the context of TOEFL® essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two "e-rater" scores were investigated in this study, the first…
Inter-rater reliability of output measures for a posture matching assessment approach: a pilot study with food service workers.

PubMed

Cann, A P; Connolly, M; Ruuska, R; MacNeil, M; Birmingham, T B; Vandervoort, A A; Callaghan, J P

2008-04-01

Despite the ongoing health problem of repetitive strain injuries, there are few tools currently available for ergonomic applications evaluating cumulative loading that have well-documented evidence of reliability and validity. The purpose of this study was to determine the inter-rater reliability of a posture matching based analysis tool (3DMatch, University of Waterloo) for predicting cumulative and peak spinal loads. A total of 30 food service workers were each videotaped for a 1-h period while performing typical work activities, and a single work task was randomly selected from each for analysis by two raters. Inter-rater reliability was determined using intraclass correlation coefficients (ICC) model 2,1 and standard errors of measurement for cumulative and peak spinal and shoulder loading variables across all subjects. Overall, 85.5% of variables had moderate to excellent inter-rater reliability, with ICCs ranging from 0.30 to 0.99 for all cumulative and peak loading variables. 3DMatch was found to be a reliable ergonomic tool when more than one rater is involved.
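ICC model 2,1 (two-way random effects, single rater, in Shrout and Fleiss's notation) can be computed from a two-way ANOVA decomposition of a subjects x raters table. A minimal pure-Python sketch, assuming complete data with no missing ratings; the function name is illustrative. On the six-subject, four-rater example of Shrout and Fleiss (1979) it gives about 0.29.

```python
def icc_2_1(ratings):
    """ICC(2,1); ratings[i][j] is the score given to subject i by rater j."""
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


shrout_fleiss = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                 [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(shrout_fleiss), 2))  # 0.29
```

The `k * (jms - ems) / n` term penalizes systematic differences between raters, so ICC(2,1) measures absolute agreement rather than mere consistency.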
BurnCase 3D software validation study: Burn size measurement accuracy and inter-rater reliability.

PubMed

Parvizi, Daryousch; Giretzlehner, Michael; Wurzer, Paul; Klein, Limor Dinur; Shoham, Yaron; Bohanon, Fredrick J; Haller, Herbert L; Tuca, Alexandru; Branski, Ludwik K; Lumenta, David B; Herndon, David N; Kamolz, Lars-P

2016-03-01

The aim of this study was to compare the accuracy of burn size estimation using the computer-assisted software BurnCase 3D (RISC Software GmbH, Hagenberg, Austria) with that using a 2D scan, considered to be the actual burn size. Thirty artificial burn areas were preplanned and prepared on three mannequins (one child, one female, and one male). Five trained physicians (raters) were asked to assess the size of all wound areas using BurnCase 3D software. The results were then compared with the real wound areas, as determined by 2D planimetry imaging. To examine inter-rater reliability, we performed an intraclass correlation analysis with a 95% confidence interval. The mean wound area estimations of the five raters using BurnCase 3D were in total 20.7±0.9% for the child, 27.2±1.5% for the female and 16.5±0.1% for the male mannequin. Our analysis showed relative overestimations of 0.4%, 2.8% and 1.5% for the child, female and male mannequins, respectively, compared to the 2D scan. The intraclass correlation between the single raters for the mean percentage of the artificial burn areas was 98.6%. There was also a high intraclass correlation between the single raters and the 2D scan. BurnCase 3D is a valid and reliable tool for determining the total body surface area burned in standard models. Further clinical studies including different pediatric and overweight adult mannequins are warranted. Copyright © 2016 Elsevier Ltd and ISBI. All rights reserved.
Individual differences in susceptibility to inattentional blindness.

PubMed

Seegmiller, Janelle K; Watson, Jason M; Strayer, David L

2011-05-01

Inattentional blindness refers to the finding that people do not always see what appears in their gaze. Though inattentional blindness affects large percentages of people, it is unclear if there are individual differences in susceptibility. The present study addressed whether individual differences in attentional control, as reflected by variability in working memory capacity, modulate susceptibility to inattentional blindness. Participants watched a classic inattentional blindness video (Simons & Chabris, 1999) and were instructed to count passes among basketball players, wherein 58% noticed the unexpected: a person wearing a gorilla suit. When participants were accurate with their pass counts, individuals with higher working memory capacity were more likely to report seeing the gorilla (67%) than those with lesser working memory capacity (36%). These results suggest that variability in attentional control is a potential mechanism underlying the apparent modulation of inattentional blindness across individuals.
Prompt and Rater Effects in Second Language Writing Performance Assessment

ERIC Educational Resources Information Center

Lim, Gad S.

2009-01-01

Performance assessments have become the norm for evaluating language learners' writing abilities in international examinations of English proficiency. Two aspects of these assessments are usually systematically varied: test takers respond to different prompts, and their responses are read by different raters. This raises the possibility of undue…
A Literature Review of Inattentional and Change Blindness in Transportation

DOT National Transportation Integrated Search

2009-12-01

Inattentional blindness refers to situations in which a person is unaware of a change that is occurring because attention is not currently focused on what is changing. Change blindness occurs when a change takes place during an eye movement or blink ...
Use of e-rater[R] in Scoring of the TOEFL iBT[R] Writing Test. Research Report. ETS RR-11-25

ERIC Educational Resources Information Center

Haberman, Shelby J.

2011-01-01

Alternative approaches are discussed for use of e-rater[R] to score the TOEFL iBT[R] Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…
Challenges in using rater judgements in medical education.

PubMed

Albanese, M A

2000-08-01

Changes in the healthcare environment are putting increasing pressure on medical schools to make faculty accountable and to document the quality of the medical education they provide. Faculty's ratings of students' performances and students' ratings of faculty's teaching are important elements in these efforts to document educational quality. This article discusses selected research related to factors affecting raters' judgements, analyses how changes in the health care environment are influencing such judgements, offers some suggestions to moderate some of the effects and links these influences to the system that upholds professional standards. Ratings are known to have a positive bias (generosity error), provide limited discrimination and often fail to document serious deficits. The potential sources of these problems relate to the mechanics of the rating task, the system used to obtain ratings and factors affecting rater judgement. As managed care demands reduce the time faculty have for teaching, as system-wide disincentives to provide negative ratings proliferate and as social engineering challenges, such as the Americans with Disabilities Act, impose differential standards for students, the natural tendency to avoid giving negative ratings becomes even harder to resist. Ultimately, these forces compromise the capability of faculty to uphold the standards of the profession. The author calls for a national effort to stem the erosion of those standards.
Validation of the secretion severity rating scale.

PubMed

Pluschinski, Petra; Zaretsky, Eugen; Stöver, Timo; Murray, Joseph; Sader, Robert; Hey, Christiane

2016-10-01

Accumulation of secretions within the hypopharynx, aditus laryngis, and trachea is one characteristic of severe dysphagia and is of high clinical and therapeutic relevance. For grading the secretion severity level, a secretion scale was provided by Murray et al. in 1996. The purpose of the study presented here is to validate this scale by analyzing intra-rater and inter-rater reliability as well as concurrent validity. For examination of reliability and validity, a reference standard was defined by two expert clinicians who reviewed 40 video recordings of fiberoptic endoscopic swallowing evaluations, with 10 videos for each severity grade. These videos were rated and re-rated independently and blinded by four ENT residents with an interval of 4 weeks. Both the intra-rater (Kendall's τ > 0.847***) and inter-rater reliability (Kendall's W > 0.951***) were highly significant and can be considered good or very good. Correlation of the median of all ratings with the reference standard was close to the highest possible value of 1 (τ = 0.984***). The scale proved to be a reliable and valid instrument for grading one of the principal symptoms of oropharyngeal dysphagia and is recommended as an evidence-based instrument for standardized fiberoptic endoscopic evaluation of swallowing.
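Kendall's W, used above for inter-rater agreement, can be computed from the raters' rank sums. A minimal sketch, assuming complete data (every rater grades every video); scores are converted to average ranks within each rater, and the usual correction for tied ranks is omitted, so with many ties (as on a four-grade scale) this version understates W relative to the tie-corrected formula. Function names and data are illustrative.

```python
def average_ranks(scores):
    """1-based ranks, with tied scores sharing the average of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W without tie correction; ratings[r][i] is rater r's score for object i."""
    m = len(ratings)        # raters
    n = len(ratings[0])     # objects (e.g. videos)
    ranked = [average_ranks(r) for r in ratings]
    totals = [sum(ranked[r][i] for r in range(m)) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical severity grades from four raters for five videos.
grades = [[1, 2, 3, 4, 4], [1, 2, 3, 3, 4], [1, 2, 2, 4, 4], [1, 1, 3, 4, 4]]
print(round(kendalls_w(grades), 3))
```

W runs from 0 (no agreement in rank order) to 1 (all raters rank the objects identically).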
Inter-rater reliability of h-index scores calculated by Web of Science and Scopus for clinical epidemiology scientists.

PubMed

Walker, Benjamin; Alavifard, Sepand; Roberts, Surain; Lanes, Andrea; Ramsay, Tim; Boet, Sylvain

2016-06-01

We investigated the inter-rater reliability of Web of Science (WoS) and Scopus when calculating the h-index of 25 senior scientists in the Clinical Epidemiology Program of the Ottawa Hospital Research Institute. Bibliometric information and the h-indices for the subjects were computed by four raters using the automatic calculators in WoS and Scopus. Correlation and agreement between ratings were assessed using Spearman's correlation coefficient and a Bland-Altman plot, respectively. Data could not be gathered from Google Scholar due to feasibility constraints. The Spearman's rank correlation between raters' h-index values was 0.81 (95% CI 0.72-0.92) with WoS and 0.95 (95% CI 0.92-0.99) with Scopus. The Bland-Altman plot showed no significant rater bias in WoS or Scopus; however, agreement between ratings was higher in Scopus than in WoS. Our results showed a stronger relationship and increased agreement between raters when calculating the h-index of a scientist using Scopus compared to WoS. The higher inter-rater reliability and simple user interface of Scopus may make it the more effective database for calculating the h-index of senior scientists in epidemiology. © 2016 Health Libraries Group.
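The quantity that the WoS and Scopus calculators report can be stated directly: a scientist has index h if h of their papers have at least h citations each. A minimal sketch, assuming a plain list of per-paper citation counts (the function name is illustrative); it does not capture how each database decides which papers and citations to count.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```

Sorting in descending order lets the loop stop at the first paper whose citation count falls below its rank.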
Effects of Rater Characteristics and Scoring Methods on Speaking Assessment

ERIC Educational Resources Information Center

Matsugu, Sawako

2013-01-01

Understanding the sources of variance in speaking assessment is important in Japan where society's high demand for English speaking skills is growing. Three challenges threaten fair assessment of speaking. First, in Japanese university speaking courses, teachers are typically the only raters, but teachers' knowledge of their students may unfairly…
Exploring the Impact of Mental Workload on Rater-Based Assessments

ERIC Educational Resources Information Center

Tavares, Walter; Eva, Kevin W.

2013-01-01

When appraising the performance of others, assessors must acquire relevant information and process it in a meaningful way in order to translate it effectively into ratings, comments, or judgments about how well the performance meets appropriate standards. Rater-based assessment strategies in health professional education, including scale and…
Exploring Examiner Judgement of Professional Competence in Rater Based Assessment

ERIC Educational Resources Information Center

Naumann, Fiona L.; Marshall, Stephen; Shulruf, Boaz; Jones, Philip D.

2016-01-01

Exercise physiology courses have become competency based, forcing universities to rethink assessment to ensure that students are competent to practice. This study built on earlier research to explore rater cognition, capturing factors that contribute to assessors' decision making about students' competency. The aims were to determine the source…
Inter-rater reliability of three standardized functional tests in patients with low back pain

PubMed Central

Tidstrand, Johan; Horneij, Eva

2009-01-01

Background Of all patients with low back pain, 85% are diagnosed with "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis, which several authors have characterized by delayed muscular responses, impaired postural control and impaired muscular coordination in these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests that examine functional muscular coordination and are also applicable outside the laboratory. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of the present study was therefore to standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods Nineteen consecutive individuals (ten men and nine women; mean age 42 years, SD ± 12 years) were included. Two independent examiners assessed three tests, "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift", on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results The inter-rater reliability for the right and the left leg, respectively, was very good for the single limb stance (κ: 0.88–1.0), good (κ: 0.79) and very good (κ: 0.88) for sitting on a Bobath ball, and good (κ: 0.61) and moderate (κ: 0.47) for the unilateral pelvic lift. Conclusion The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good. Validation of the tests in their ability to evaluate lumbar
Rubella Deaf-Blind Child: Implications of Psychological Assessment. Proceedings.

ERIC Educational Resources Information Center

Rouin, Carole

Presented are proceedings of a conference involving authorities in testing and evaluating the blind, deaf, and deaf-blind. In a paper titled "Psychological Implications of Assessing the Deaf", C. Goetzinger discusses references used in audiology, anatomy and physiology of the ear, degrees of hearing impairment, and implications of the various…
Evaluating the Construct-Coverage of the e-rater[R] Scoring Engine. Research Report. ETS RR-09-01

ERIC Educational Resources Information Center

Quinlan, Thomas; Higgins, Derrick; Wolff, Susanne

2009-01-01

This report evaluates the construct coverage of the e-rater[R] scoring engine. The matter of construct coverage depends on whether one defines writing skill in terms of process or product. Originally, the e-rater engine consisted of a large set of components with a proven ability to predict human holistic scores. By organizing these capabilities…
The Child and Adolescent Services Assessment: Interrater Reliability and Predictors of Rater Disagreement.

PubMed

Schwartz, Karen T G; Bowling, Amanda A; Dickerson, John F; Lynch, Frances L; Brent, David A; Porta, Giovanna; Iyengar, Satish; Weersing, V Robin

2018-05-24

The current study evaluated the interrater reliability of the Child and Adolescent Services Assessment (CASA), a widely used structured interview measuring pediatric mental health service use. Interviews (N = 72) were randomly selected from a pediatric effectiveness trial, and audio was coded by an independent rater. Regressions were employed to identify predictors of rater disagreement. Interrater reliability was high for items (> 94%) and summary metrics (ICC > .79) across service sectors. Predictors of disagreement varied by domain; significant predictors indexed higher clinical severity or social disadvantage. Results support the CASA as a reliable and robust assessment of pediatric service use, but administrators should be alert when assessing vulnerable populations.
Do you see what I see? Mobile eye-tracker contextual analysis and inter-rater reliability.

PubMed

Stuart, S; Hunt, D; Nell, J; Godfrey, A; Hausdorff, J M; Rochester, L; Alcock, L

2018-02-01

Mobile eye-trackers are currently used during real-world tasks (e.g. gait) to monitor visual and cognitive processes, particularly in ageing and Parkinson's disease (PD). However, contextual analysis involving fixation locations during such tasks is rarely performed due to its complexity. This study adapted a validated algorithm and developed a classification method to semi-automate contextual analysis of mobile eye-tracking data. We further assessed the inter-rater reliability of the proposed classification method. A mobile eye-tracker recorded eye movements during walking in five healthy older adult controls (HC) and five people with PD. Fixations were identified using a previously validated algorithm, which was adapted to provide still images of fixation locations (n = 116). The fixation location was manually identified by two raters (DH, JN), who classified the locations. Cohen's kappa coefficients were used to determine the inter-rater reliability. The algorithm successfully provided still images for each fixation, allowing manual contextual analysis to be performed. The inter-rater reliability for classifying the fixation location was high for both PD (kappa = 0.80, 95% agreement) and HC groups (kappa = 0.80, 91% agreement), which indicated a reliable classification method. This study developed a reliable semi-automated contextual analysis method for gait studies in HC and PD. Future studies could adapt this methodology for various gait-related eye-tracking studies.

Deaf-Blindness: National Organizations and Resources. Reference Circular No. 93-1.

ERIC Educational Resources Information Center

Library of Congress, Washington, DC. National Library Service for the Blind and Physically Handicapped.

This circular lists national organizations and print and audiovisual resources on areas of service to persons with deaf blindness, including rehabilitation, education, information and referral, recreation, and sources for adaptive devices and products. Section I is an alphabetical list of 40 national organizations and resources, including…
Expert and Naive Raters Using the PAQ: Does it Matter?

ERIC Educational Resources Information Center

Cornelius, Edwin T.; And Others

1984-01-01

Questions the observed correlation between job experts and naive raters using the Position Analysis Questionnaire (PAQ); and conducts a replication of the Smith and Hakel study (1979) with college students (N=39). Concluded that PAQ ratings from job experts and college students are not equivalent and therefore are not interchangeable. (LLL)
CPS-Rater: Automated Sequential Annotation for Conversations in Collaborative Problem-Solving Activities. Research Report. ETS RR-17-58

ERIC Educational Resources Information Center

Hao, Jiangang; Chen, Lei; Flor, Michael; Liu, Lei; von Davier, Alina A.

2017-01-01

Conversations in collaborative problem-solving activities can be used to probe the collaboration skills of the team members. Annotating the conversations into different collaboration skills by human raters is laborious and time-consuming. In this report, we describe our work on developing an automated annotation system, CPS-rater, for conversational…
[Intra-rater Reliability for the Questionnaire on Activity Limitations and Participation Restrictions of Children With ADHD].

PubMed

Salamanca Duque, Luisa Matilde; Naranjo Aristizábal, María Mercedes; Gutiérrez Ríos, Gladys Helena; Prieto, Jaime Bayona

2014-03-01

Questionnaires for evaluating activity limitations and participation restrictions in children with ADHD (CLARP-TDAH) have recently been developed in Colombia, based on the suggestions made by the WHO in the International Classification of Functioning, Disability and Health (ICF), allowing clinical evaluation beyond an evaluation of the functionality and functioning of children in their family and school environments. Previous research found the questionnaire useful in the multidisciplinary approach to Colombian children with ADHD. This study determines the level of intra-rater reliability of the CLARP-TDAH Parents and Teachers questionnaires. The study included a non-random sample of 203 Colombian children attending school and diagnosed with ADHD. Intra-rater reliability and the reproducibility of the results were determined using the Kappa index. The informants were parents and teachers. Kappa values >0.7 were obtained for the intra-rater reliability of the questionnaire domains of CLARP-TDAH Parents, while for CLARP-TDAH Teachers domains these values were >0.8. CLARP-TDAH questionnaires are a tool with a good level of intra-rater reliability, which allows a reliable assessment of activity limitations and participation restrictions in order to determine the level of functioning at home and school. Copyright © 2014 Asociación Colombiana de Psiquiatría. Published by Elsevier España. All rights reserved.
And the Winner Is … : Inter-Rater Reliability among Scholarship Assessors

ERIC Educational Resources Information Center

Johnston, Lucy; Schluter, Philip J.

2017-01-01

With increasing competition for postgraduate research scholarships, awarding processes demand attention and scrutiny. We examine inter-rater reliability for two prestigious New Zealand scholarships, the Shirtcliffe Fellowship and the Gordon Watson Scholarship. For each scholarship, five assessors (three academic; two non-academic) independently…
Blindness prevention programmes: past, present, and future.

PubMed Central

Resnikoff, S.; Pararajasegaram, R.

2001-01-01

Blindness and visual impairment have far-reaching implications for society, the more so when it is realized that 80% of visual disability is avoidable. The marked increase in the size of the elderly population, with their greater propensity for visually disabling conditions, presents a further challenge in this respect. However, if available knowledge and skills were made accessible to those communities in greatest need, much of this needless blindness could be alleviated. Since its inception over 50 years ago, and beginning with trachoma control, WHO has spearheaded efforts to assist Member States to meet the challenge of needless blindness. Since the establishment of the WHO Programme for the Prevention of Blindness in 1978, vast strides have been made through various forms of technical support to establish national prevention of blindness programmes. A more recent initiative, "The Global Initiative for the Elimination of Avoidable Blindness" (referred to as "VISION 2020--The Right to Sight"), launched in 1999, is a collaborative effort between WHO and a number of international nongovernmental organizations and other interested partners. This effort is poised to take the steps necessary to achieve the goal of eliminating avoidable blindness worldwide by the year 2020. PMID:11285666
Quality assessment for color reproduction using a blind metric

NASA Astrophysics Data System (ADS)

Bringier, B.; Quintard, L.; Larabi, M.-C.

2007-01-01

This paper deals with image quality assessment, a field that now plays an important role in various image processing applications. A number of objective image quality metrics, which may or may not correlate with subjective quality, have been developed during the last decade. Two categories of metrics can be distinguished: full-reference and no-reference. A full-reference metric evaluates the distortion introduced into an image relative to the reference. A no-reference approach attempts to model the judgment of image quality blindly, without access to a reference. Unfortunately, no universal image quality model is on the horizon, and empirical models established through psychophysical experimentation are generally used. In this paper, we focus only on the second category to evaluate the quality of color reproduction, introducing a blind metric based on human visual system modeling. The objective results are validated by single-media and cross-media subjective tests.
Comparison of Algorithm-based Estimates of Occupational Diesel Exhaust Exposure to Those of Multiple Independent Raters in a Population-based Case–Control Study

PubMed Central

Friesen, Melissa C.

2013-01-01

Objectives: Algorithm-based exposure assessments based on patterns in questionnaire responses and professional judgment can readily apply transparent exposure decision rules to thousands of jobs quickly. However, we need to better understand how algorithms compare to a one-by-one job review by an exposure assessor. We compared algorithm-based estimates of diesel exhaust exposure to those of three independent raters within the New England Bladder Cancer Study, a population-based case–control study, and identified conditions under which disparities occurred in the assessments of the algorithm and the raters. Methods: Occupational diesel exhaust exposure was assessed previously using an algorithm and a single rater for all 14 983 jobs reported by 2631 study participants during personal interviews conducted from 2001 to 2004. Two additional raters independently assessed a random subset of 324 jobs that were selected based on strata defined by the cross-tabulations of the algorithm and the first rater’s probability assessments for each job, oversampling their disagreements. The algorithm and each rater assessed the probability, intensity and frequency of occupational diesel exhaust exposure, as well as a confidence rating for each metric. Agreement among the raters, their aggregate rating (average of the three raters’ ratings) and the algorithm were evaluated using proportion of agreement, kappa and weighted kappa (κw). Agreement analyses on the subset used inverse probability weighting to extrapolate the subset to estimate agreement for all jobs. Classification and Regression Tree (CART) models were used to identify patterns in questionnaire responses that predicted disparities in exposure status (i.e., unexposed versus exposed) between the first rater and the algorithm-based estimates. Results: For the probability, intensity and frequency exposure metrics, moderate to moderately high agreement was observed among raters (κw = 0.50–0.76) and between the
Deriving Oral Assessment Scales across Different Tests and Rater Groups.

ERIC Educational Resources Information Center

Chalhoub-Deville, Micheline

1995-01-01

The purpose of this study was to derive the criteria/dimensions underlying learners' second-language oral ability scores across three tests: an oral interview, a narration, and a read-aloud. A stimulus tape of 18 speech samples was presented to 3 native speaker rater groups for evaluation. Results indicate that researchers might need to reconsider…
Simulated patient training: Using inter-rater reliability to evaluate simulated patient consistency in nursing education.

PubMed

MacLean, Sharon; Geddes, Fiona; Kelly, Michelle; Della, Phillip

2018-03-01

Simulated patients (SPs) are frequently used for training nursing students in communication skills. An acknowledged benefit of using SPs is the opportunity to provide a standardized approach by which participants can demonstrate and develop communication skills. However, relatively little evidence is available on how best to facilitate and evaluate the reliability and accuracy of SPs' performances. The aim of this study is to investigate the effectiveness of an evidence-based SP training framework to ensure standardization of SPs. The training framework was employed to improve inter-rater reliability of SPs. A quasi-experimental study was employed to assess SP post-training understanding of simulation scenario parameters using inter-rater reliability agreement indices. Two phases of data collection took place. In the initial trial phase, audio-visual (AV) recordings of two undergraduate nursing students completing a simulation scenario were rated by eight SPs using the Interpersonal Communication Assessments Scale (ICAS) and Quality of Discharge Teaching Scale (QDTS). In phase 2, eight SP raters and four nursing faculty raters independently evaluated students' (N=42) communication practices using the QDTS. Intraclass correlation coefficients (ICC) were >0.80 for both stages of the study in clinical communication skills. The results support the premise that, if trained appropriately, SPs have a high degree of reliability and validity to both facilitate and evaluate student performance in nurse education. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.
Inter-rater reliability of direct observations of the physical and psychosocial working conditions in eldercare: An evaluation in the DOSES project.

PubMed

Karstad, Kristina; Rugulies, Reiner; Skotte, Jørgen; Munch, Pernille Kold; Greiner, Birgit A; Burdorf, Alex; Søgaard, Karen; Holtermann, Andreas

2018-05-01

The aim of the study was to develop and evaluate the reliability of the "Danish observational study of eldercare work and musculoskeletal disorders" (DOSES) observation instrument to assess physical and psychosocial risk factors for musculoskeletal disorders (MSD) in eldercare work. Over 1.5 years, sixteen raters conducted 117 inter-rater observations in 11 nursing homes. Reliability was evaluated using percent agreement and Gwet's AC1 coefficient. Of the 18 examined items, inter-rater reliability was excellent for 7 items (AC1 > 0.75), fair to good for 7 items (AC1 0.40-0.75), and poor for 2 items (AC1 0-0.40). For 2 items there was no agreement between the raters (AC1 < 0). The reliability did not differ between the first and second half of the data collection period, and the inter-rater observations were representative regarding occurrence of events in eldercare work. The instrument is appropriate for assessing physical and psychosocial risk factors for MSD among eldercare workers. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
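The DOSES abstract reports Gwet's AC1 alongside percent agreement. AC1 replaces Cohen's chance-agreement term with one based on the average category proportions. This keeps the statistic from collapsing when one category dominates. The following is a minimal Python sketch for one pair of raters using a nominal scale. The function name and the example data are illustrative and were not taken from the study.

```python
def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters and a nominal scale with the given categories.

    Hypothetical helper; `categories` should list every category on the scale,
    including any that neither rater happened to use.
    """
    n, q = len(rater_a), len(categories)
    if n == 0 or n != len(rater_b) or q < 2:
        raise ValueError("need two equal-length, non-empty rating lists and >= 2 categories")
    # Observed agreement.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # pi_k: share of all 2n ratings that fall into category k.
    pi = [(rater_a.count(k) + rater_b.count(k)) / (2 * n) for k in categories]
    # AC1 chance agreement; it never exceeds 1/q, so the denominator stays positive.
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical binary item where "yes" dominates: 18 of 20 observations agree.
obs_1 = ["yes"] * 18 + ["no", "yes"]
obs_2 = ["yes"] * 18 + ["yes", "no"]
print(round(gwet_ac1(obs_1, obs_2, ["yes", "no"]), 3))  # 0.89
```

On the same data, Cohen's kappa would be about -0.05 even though 90% of observations agree. AC1 was designed to ease this prevalence problem.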
Unblinding the dark matter blind spots

DOE Office of Scientific and Technical Information (OSTI.GOV)

Han, Tao; Kling, Felix; Su, Shufang

The dark matter (DM) blind spots in the Minimal Supersymmetric Standard Model (MSSM) refer to the parameter regions where the couplings of the DM particles to the $Z$-boson or the Higgs boson are almost zero, leading to vanishingly small signals for the DM direct detections. In this paper, we carry out comprehensive analyses for the DM searches under the blind-spot scenarios in MSSM. Guided by the requirement of acceptable DM relic abundance, we explore the complementary coverage for the theory parameters at the LHC, the projection for the future underground DM direct searches, and the indirect searches from the relic DM annihilation into photons and neutrinos. We find that (i) the spin-independent (SI) blind spots may be rescued by the spin-dependent (SD) direct detection in the future underground experiments, and possibly by the indirect DM detections from IceCube and SuperK neutrino experiments; (ii) the detection of gamma rays from Fermi-LAT may not reach the desirable sensitivity for searching for the DM blind-spot regions; (iii) the SUSY searches at the LHC will substantially extend the discovery region for the blind-spot parameters. As a result, the dark matter blind spots may be unblinded with the collective efforts in future DM searches.
Unblinding the dark matter blind spots

DOE PAGES

Han, Tao; Kling, Felix; Su, Shufang; ...

2017-02-10

The dark matter (DM) blind spots in the Minimal Supersymmetric Standard Model (MSSM) refer to the parameter regions where the couplings of the DM particles to the $Z$-boson or the Higgs boson are almost zero, leading to vanishingly small signals for the DM direct detections. In this paper, we carry out comprehensive analyses for the DM searches under the blind-spot scenarios in MSSM. Guided by the requirement of acceptable DM relic abundance, we explore the complementary coverage for the theory parameters at the LHC, the projection for the future underground DM direct searches, and the indirect searches from the relic DM annihilation into photons and neutrinos. We find that (i) the spin-independent (SI) blind spots may be rescued by the spin-dependent (SD) direct detection in the future underground experiments, and possibly by the indirect DM detections from IceCube and SuperK neutrino experiments; (ii) the detection of gamma rays from Fermi-LAT may not reach the desirable sensitivity for searching for the DM blind-spot regions; (iii) the SUSY searches at the LHC will substantially extend the discovery region for the blind-spot parameters. As a result, the dark matter blind spots may be unblinded with the collective efforts in future DM searches.
Ultrasonographic measurement of the acromiohumeral distance in spinal cord injury: Reliability and effects of shoulder positioning.

PubMed

Lin, Yen-Sheng; Boninger, Michael L; Day, Kevin A; Koontz, Alicia M

2015-11-01

This study aimed to investigate the reliability of ultrasonographic measurement of acromiohumeral distance (AHD) and the effects of shoulder positioning on AHD among manual wheelchair users (MWUs) with spinal cord injury (SCI) and an able-bodied control group. Ten MWUs with SCI and 10 able-bodied subjects participated. Ultrasonographic measurements of AHD were obtained from each subject by two raters during passive and active scapular plane arm elevation in neutral, 45°, 90° with and without resistance, and in a weight relief raise position. Each rater recorded the measurements again using the same procedures after a 30-minute interval. All raters were blinded to each other's measurements. The study was conducted at university laboratories and a Veterans Affairs Healthcare System. Intra-rater (intraclass correlation coefficient, ICC > 0.83) and inter-rater (ICC > 0.78) reliability was excellent for both the MWUs with SCI and the able-bodied group across all arm positions, except for the 45° position in the control group for one of the raters (intra-rater: ICC < 0.40; inter-rater: ICC < 0.60). AHD was significantly reduced when the shoulder was in the 90° arm-elevated positions with or without resistance. These findings demonstrate that ultrasonography is a reliable means of evaluating AHD both in able-bodied individuals and in individuals with SCI, who are known to have significant shoulder pathology. This technique could be used to develop reference measures and to identify changes in AHD caused by interventions.
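The abstract does not say which ICC form was used. As an assumption, the sketch below computes the Shrout and Fleiss ICC(2,1), which uses two-way random effects, absolute agreement and a single rater. It works from a complete subjects × raters table using the two-way ANOVA mean squares. The function name is hypothetical, and the example table is illustrative rather than data from this study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the rating of subject i by rater j; the table must be
    complete (every rater scores every subject). Hypothetical helper.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")
    grand = sum(map(sum, scores)) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters
    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Illustrative table: 6 subjects (rows) scored by 4 raters (columns).
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(table), 2))  # 0.29
```

ICC(2,1) counts systematic differences between raters as error. In this table, the large gaps between the raters' average scores keep the value low, even though the raters largely agree on which subjects score high.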
Inter-rater reliability of the Full Outline of UnResponsiveness score and the Glasgow Coma Scale in critically ill patients: a prospective observational study

PubMed Central

2010-01-01

Introduction: The Glasgow Coma Scale (GCS) is the most widely used scoring system for comatose patients in intensive care. Limitations of the GCS include the impossibility of assessing the verbal score in intubated or aphasic patients, and inconsistent inter-rater reliability. The FOUR (Full Outline of UnResponsiveness) score, a new coma scale not reliant on verbal response, was recently proposed. The aim of the present study was to compare the inter-rater reliability of the GCS and the FOUR score among unselected patients in general critical care. A further aim was to compare the inter-rater reliability of neurologists with that of intensive care unit (ICU) staff. Methods: In this prospective observational study, scoring of GCS and FOUR score was performed by neurologists and ICU staff on 267 consecutive patients admitted to intensive care. Results: In a total of 437 pairwise ratings, the exact inter-rater agreement for the GCS was 71%, and for the FOUR score 82% (P = 0.0016); the inter-rater agreement within a range of ± 1 score point for the GCS was 90%, and for the FOUR score 92% (P = n.s.). The exact inter-rater agreement among neurologists was superior to that among ICU staff for the FOUR score (87% vs. 79%, P = 0.04) but not for the GCS (73% vs. 73%). Neurologists and ICU staff did not significantly differ in the inter-rater agreement within a range of ± 1 score point for both GCS (88% vs. 93%) and the FOUR score (91% vs. 88%). Conclusions: The FOUR score performed better than the GCS for exact inter-rater agreement, but not for the clinically more relevant agreement within the range of ± 1 score point. Though neurologists outperformed ICU staff with regard to exact inter-rater agreement, the inter-rater agreement of ICU staff within the clinically more relevant range of ± 1 score point equalled that of the neurologists. The small advantage in inter-rater reliability of the FOUR score is most likely insufficient to replace the GCS, a score with a long…
Improving Teacher Selection: The Effect of Inter-Rater Reliability in the Screening Process. CEDR Working Paper. WP #2015-7

ERIC Educational Resources Information Center

Martinkova, Patricia; Goldhaber, Dan

2015-01-01

Inter-rater reliability, commonly assessed by intra-class correlation coefficient ICC, is an important index for describing the extent to which there is consistency amongst two or more raters in assigned measures. In organizational research, the data structure is often hierarchical and designs deviate substantially from the ideal of a balanced…
Attenuation of Change Blindness in Children with Autism Spectrum Disorders

ERIC Educational Resources Information Center

Fletcher-Watson, Sue; Leekam, Susan R.; Connolly, Brenda; Collis, Jess M.; Findlay, John M.; McConachie, Helen; Rodgers, Jacqui

2012-01-01

Change blindness refers to the difficulty most people find in detecting a difference between two pictures when these are presented successively, with a brief interruption between. Attention at the site of the change is required for detection. A number of studies have investigated change blindness in adults and children with autism spectrum…
Grant Peer Review: Improving Inter-Rater Reliability with Training.

PubMed

Sattler, David N; McKnight, Patrick E; Naney, Linda; Mathis, Randy

2015-01-01

This study developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers--especially those with experience--have good understanding of the grant review rating scale. The findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. The results underscore the benefits of and need for specialized peer reviewer training.
Physical Education for Blind Children.

ERIC Educational Resources Information Center

Buell, Charles E.

A practical rather than a theoretical reference guide, the book discusses the need of the blind or visually impaired child for physical education. Past and present programs in public and residential schools, recreation and leisure time activities (a guide for parents), sports and interscholastic competition, active games, contests, relays, and…
Rating Communication in GP Consultations: The Association Between Ratings Made by Patients and Trained Clinical Raters

PubMed Central

Burt, Jenni; Abel, Gary; Elmore, Natasha; Newbould, Jenny; Davey, Antoinette; Llanwarne, Nadia; Maramba, Inocencio; Paddison, Charlotte; Benson, John; Silverman, Jonathan; Elliott, Marc N.; Campbell, John; Roland, Martin

2016-01-01

Patient evaluations of physician communication are widely used, but we know little about how these relate to professionally agreed norms of communication quality. We report an investigation into the association between patient assessments of communication quality and an observer-rated measure of communication competence. Consent was obtained to video record consultations with Family Practitioners in England, following which patients rated the physician’s communication skills. A sample of consultation videos was subsequently evaluated by trained clinical raters using an instrument derived from the Calgary-Cambridge guide to the medical interview. Consultations scored highly for communication by clinical raters were also scored highly by patients. However, when clinical raters judged communication to be of lower quality, patient scores ranged from “poor” to “very good.” Some patients may be inhibited from rating poor communication negatively. Patient evaluations can be useful for measuring relative performance of physicians’ communication skills, but absolute scores should be interpreted with caution. PMID:27698072

Congenital blindness limits allocentric to egocentric switching ability.

PubMed

Ruggiero, Gennaro; Ruotolo, Francesco; Iachini, Tina

2018-03-01

Many everyday spatial activities require the cooperation or switching between egocentric (subject-to-object) and allocentric (object-to-object) spatial representations. The literature on blind people has reported that the lack of vision (congenital blindness) may limit the capacity to represent allocentric spatial information. However, research has mainly focused on the selective involvement of egocentric or allocentric representations, not the switching between them. Here we investigated the effect of visual deprivation on the ability to switch between spatial frames of reference. To this aim, congenitally blind (long-term visual deprivation), blindfolded sighted (temporary visual deprivation) and sighted (full visual availability) participants were compared on the Ego-Allo switching task. This task assessed the capacity to verbally judge the relative distances between memorized stimuli in switching (from egocentric-to-allocentric: Ego-Allo; from allocentric-to-egocentric: Allo-Ego) and non-switching (only-egocentric: Ego-Ego; only-allocentric: Allo-Allo) conditions. Results showed a difficulty in congenitally blind participants when switching from allocentric to egocentric representations, not when the first anchor point was egocentric. In line with previous results, a deficit in processing allocentric representations in non-switching conditions also emerged. These findings suggest that the allocentric deficit in congenital blindness may determine a difficulty in simultaneously maintaining and combining different spatial representations. This deficit alters the capacity to switch between reference frames specifically when the first anchor point is external and not body-centered.
A Completely Blind Video Integrity Oracle.

PubMed

Mittal, Anish; Saad, Michele A; Bovik, Alan C

2016-01-01

Considerable progress has been made toward developing still picture perceptual quality analyzers that do not require any reference picture and that are not trained on human opinion scores of distorted images. However, there do not yet exist any such completely blind video quality assessment (VQA) models. Here, we attempt to bridge this gap by developing a new VQA model called the video intrinsic integrity and distortion evaluation oracle (VIIDEO). The new model does not require the use of any additional information other than the video being quality evaluated. VIIDEO embodies models of intrinsic statistical regularities that are observed in natural videos, which are used to quantify disturbances introduced by distortions. An algorithm derived from the VIIDEO model is thereby able to predict the quality of distorted videos without any external knowledge about the pristine source, anticipated distortions, or human judgments of video quality. Even with such a paucity of information, we are able to show that the VIIDEO algorithm performs much better than the legacy full-reference quality measure MSE on the LIVE VQA database and delivers performance comparable with a leading human-judgment-trained blind VQA model. We believe that the VIIDEO algorithm is a significant step toward making real-time monitoring of completely blind video quality possible.
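VIIDEO uses a no-reference design, which contrasts with the MSE baseline mentioned above. MSE is a full-reference measure and can only be computed when the pristine source is available. The following is a minimal NumPy sketch of that baseline. It assumes the reference and distorted videos are already decoded into equally shaped arrays of frames × height × width. The PSNR helper and the 8-bit peak value of 255 are illustrative assumptions and are not part of the VIIDEO paper.

```python
import numpy as np


def mse(reference, distorted):
    """Full-reference mean squared error between two equally shaped arrays."""
    ref = np.asarray(reference, dtype=np.float64)
    dist = np.asarray(distorted, dtype=np.float64)
    if ref.shape != dist.shape:
        raise ValueError("reference and distorted videos must have the same shape")
    # With equal frame sizes, this equals the mean of the per-frame MSE values.
    return float(np.mean((ref - dist) ** 2))


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(reference, distorted)
    return float("inf") if err == 0 else float(10 * np.log10(peak**2 / err))


# Hypothetical 2-frame, 4 x 4 grayscale clip and a copy offset by 2 grey levels.
clean = np.zeros((2, 4, 4))
shifted = clean + 2
print(mse(clean, shifted))  # 4.0
```

Both functions need `clean`, the pristine source. A completely blind model such as VIIDEO must instead estimate quality from `shifted` alone.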
Inter-Rater Agreement of Auscultation, Palpable Fremitus, and Ventilator Waveform Sawtooth Patterns Between Clinicians.

PubMed

Berry, Marc P; Martí, Joan-Daniel; Ntoumenopoulos, George

2016-10-01

Clinicians often use numerous bedside assessments for secretion retention in participants who are receiving invasive mechanical ventilation. This study aimed to evaluate inter-rater agreement between clinicians when using standard clinical assessments of secretion retention and whether differences in clinician experience influenced inter-rater agreement. Seventy-one mechanically ventilated participants were assessed by a research clinician and by one of 13 ICU clinicians. Each clinician conducted a standardized assessment of lung auscultation, palpation for chest-wall (rhonchal) fremitus, and ventilator inspiratory/expiratory flow-time waveforms for the sawtooth pattern. On the presence of breath sounds, agreement ranged from absolute to moderate in the upper zones and the lower zones, respectively. Kappa values for abnormal and adventitious lung sounds achieved moderate agreement in the upper zones, less than chance agreement to substantial agreement in the middle zones, and moderate agreement to almost perfect agreement in the lower zones. Moderate to almost perfect agreement was established for palpable fremitus in the upper zones, moderate to substantial agreement in the middle zones, and less than chance to moderate agreement in the lower zones. Inter-rater agreement on the presence of expiratory sawtooth pattern identification showed moderate agreement. The level of percentage agreement between the research and ICU clinicians for each respiratory assessment studied did not relate directly to level of clinical experience. Inter-rater agreement for all assessments showed variability between lung regions but maintained reasonable percentage agreement in mechanically ventilated participants. The level of percentage agreement achieved between clinicians did not directly relate to clinical experience for all respiratory assessments. Therefore, these respiratory assessments should not necessarily be viewed in isolation but interpreted within the context of a full
Perceptual, not memorial, disruption underlies emotion-induced blindness.

PubMed

Kennedy, Briana L; Most, Steven B

2012-04-01

Emotion-induced blindness refers to impaired awareness of stimuli appearing in the temporal wake of an emotionally arousing stimulus (S. B. Most, Chun, Widders, & Zald, 2005). In previous emotion-induced blindness experiments, participants withheld target responses until the end of a rapid stream of stimuli, even though each target appeared in the middle of the stream. The resulting interval between the targets' offset and participants' initiation of a response leaves open the possibility that emotion-induced blindness reflects a failure to encode or maintain target information in memory rather than a failure of perception. In the present study, participants engaged in a typical emotion-induced blindness task but initiated a response immediately upon seeing each target. Emotion-induced blindness was nevertheless robust. This suggests that emotion-induced blindness is not attributable to the delay between awareness of a target and the initiation of a response, but rather reflects the disruptive impact of emotional distractors on mechanisms driving conscious perception. (PsycINFO Database Record (c) 2012 APA, all rights reserved).
Writing Evaluation: Rater and Task Effects on the Reliability of Writing Scores for Children in Grades 3 and 4

ERIC Educational Resources Information Center

Kim, Young-Suk Grace; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie

2017-01-01

We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach a desirable level of 0.90 and 0.80 reliabilities for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written…
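A generalizability-theory decision (D) study is one common way to answer the question of how many raters and tasks are needed. The snippet is truncated, so the exact method is an assumption here. The sketch below assumes a fully crossed persons × raters × tasks design and uses the relative generalizability coefficient σ²p / (σ²p + σ²pr/n_r + σ²pt/n_t + σ²prt,e/(n_r·n_t)). The snippet does not report variance components, so the example numbers are hypothetical placeholders. The helper names are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Variance components from a persons x raters x tasks G study (hypothetical values)."""
    person: float
    person_rater: float
    person_task: float
    residual: float  # person x rater x task interaction confounded with error


def g_coefficient(vc, n_raters, n_tasks):
    """Relative generalizability coefficient when averaging over n_raters and n_tasks."""
    relative_error = (
        vc.person_rater / n_raters
        + vc.person_task / n_tasks
        + vc.residual / (n_raters * n_tasks)
    )
    return vc.person / (vc.person + relative_error)


def cheapest_design(vc, target, max_raters=10, max_tasks=10):
    """Raters x tasks combination with the fewest ratings per child that reaches target."""
    best = None
    for n_r in range(1, max_raters + 1):
        for n_t in range(1, max_tasks + 1):
            if g_coefficient(vc, n_r, n_t) >= target:
                if best is None or n_r * n_t < best[0] * best[1]:
                    best = (n_r, n_t)
    return best  # None if the target is unreachable within the search limits


# Hypothetical components in which task-related variance outweighs rater variance.
vc = VarianceComponents(person=0.50, person_rater=0.02, person_task=0.20, residual=0.30)
for target in (0.80, 0.90):
    print(target, cheapest_design(vc, target))
```

This sketch uses total ratings per child as a simple cost criterion. A real decision would also weigh the separate costs of recruiting raters and of administering additional writing tasks.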
Least-Squares Models to Correct for Rater Effects in Performance Assessment.

ERIC Educational Resources Information Center

Raymond, Mark R.; Viswesvaran, Chockalingam

This study illustrates the use of three least-squares models to control for rater effects in performance evaluation: (1) ordinary least squares (OLS); (2) weighted least squares (WLS); and (3) OLS subsequent to applying a logistic transformation to observed ratings (LOG-OLS). The three models were applied to ratings obtained from four…
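Of the three models listed, plain OLS is the simplest to sketch. The sketch assumes an additive model: each observed rating equals the examinee's true level, plus a rater severity or leniency effect, plus error. Under that model, least squares can estimate both sets of effects jointly, even when each rater scores only some examinees, as long as shared examinees link all raters together. The function below is a hypothetical illustration built on NumPy's `lstsq`. It does not reproduce the WLS or LOG-OLS variants or the authors' exact parameterization.

```python
import numpy as np


def ols_rater_adjust(ratings, n_examinees, n_raters):
    """Estimate examinee levels and rater effects by OLS (hypothetical helper).

    ratings: iterable of (examinee_index, rater_index, score) triples. The design may
    be incomplete, but every rater must be linked to the others through shared examinees.
    Returns (adjusted_examinee_scores, rater_effects), with rater effects summing to zero.
    """
    ratings = list(ratings)
    # One indicator column per examinee, then one per rater except the last,
    # whose effect is fixed at 0 so that the model is identifiable.
    X = np.zeros((len(ratings), n_examinees + n_raters - 1))
    y = np.empty(len(ratings))
    for row, (i, j, score) in enumerate(ratings):
        X[row, i] = 1.0
        if j < n_raters - 1:
            X[row, n_examinees + j] = 1.0
        y[row] = score
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    examinee = coef[:n_examinees]
    rater = np.append(coef[n_examinees:], 0.0)
    # Re-express rater effects as deviations from the average rater.
    shift = rater.mean()
    return examinee + shift, rater - shift


# Hypothetical incomplete design: three examinees, each scored by two of three raters.
data = [(0, 0, 4.0), (0, 1, 2.5), (1, 1, 4.5), (1, 2, 4.5), (2, 0, 5.0), (2, 2, 3.5)]
scores, effects = ols_rater_adjust(data, n_examinees=3, n_raters=3)
print(np.round(scores, 2), np.round(effects, 2))
```

In this example, rater 0 comes out about one point more lenient than the average rater. The examinees it scored therefore move down from their raw means. Examinee 1, seen only by the two harsher raters, moves up.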
Inter-Rater Agreement of Pressure Ulcer Risk and Prevention Measures in the National Database of Nursing Quality Indicators® (NDNQI).

PubMed

Waugh, Shirley Moore; Bergquist-Beringer, Sandra

2016-06-01

In this descriptive multi-site study, we examined inter-rater agreement on 11 National Database of Nursing Quality Indicators® (NDNQI®) pressure ulcer (PrU) risk and prevention measures. One hundred twenty raters at 36 hospitals captured data from 1,637 patient records. At each hospital, agreement between the most experienced rater and each other team rater was calculated for each measure. In the ratings studied, 528 patients were rated as "at risk" for PrU and, therefore, were included in calculations of agreement for the prevention measures. Prevalence-adjusted kappa (PAK) was used to interpret inter-rater agreement because prevalence of single responses was high. The PAK values for eight measures indicated "substantial" to "near perfect" agreement between most experienced and other team raters: Skin assessment on admission (.977, 95% CI [.966-.989]), PrU risk assessment on admission (.978, 95% CI [.964-.993]), Time since last risk assessment (.790, 95% CI [.729-.852]), Risk assessment method (.997, 95% CI [.991-1.0]), Risk status (.877, 95% CI [.838-.917]), Any prevention (.856, 95% CI [.76-.943]), Skin assessment (.956, 95% CI [.904-1.0]), and Pressure-redistribution surface use (.839, 95% CI [.763-.916]). For three intervention measures, PAK values fell below the recommended value of ≥.610: Routine repositioning (.577, 95% CI [.494-.661]), Nutritional support (.500, 95% CI [.418-.581]), and Moisture management (.556, 95% CI [.469-.643]). Areas of disagreement were identified. Findings provide support for the reliability of 8 of the 11 measures. Further clarification of data collection procedures is needed to improve reliability for the less reliable measures. © 2016 Wiley Periodicals, Inc.
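The abstract uses a prevalence-adjusted kappa because most responses fell into a single category. That imbalance pushes Cohen's chance-agreement term toward the observed agreement. The sketch assumes that PAK here corresponds to the common prevalence-adjusted bias-adjusted kappa (PABAK). PABAK depends only on the observed agreement p_o and the number of categories k, as (k·p_o - 1)/(k - 1). Under that assumption, a minimal Python comparison looks like this. The function names and the record data are hypothetical.

```python
from collections import Counter


def observed_agreement(rater_a, rater_b):
    """Share of records on which the two raters gave the same answer."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same, non-empty set of records")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def pabak(rater_a, rater_b, n_categories=2):
    """Prevalence-adjusted bias-adjusted kappa: (k * p_o - 1) / (k - 1)."""
    p_o = observed_agreement(rater_a, rater_b)
    return (n_categories * p_o - 1) / (n_categories - 1)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa, included only for comparison."""
    n = len(rater_a)
    p_o = observed_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_e == 1.0:
        return float("nan")  # both raters gave one identical answer; kappa is undefined
    return (p_o - p_e) / (1 - p_e)


# Hypothetical yes/no measure on 40 records: 37 both "yes", 1 both "no", 2 disagreements.
expert = ["yes"] * 37 + ["no", "yes", "no"]
other = ["yes"] * 37 + ["no", "no", "yes"]
print(round(pabak(expert, other), 3), round(cohens_kappa(expert, other), 3))  # 0.9 0.474
```

With 95% raw agreement, Cohen's kappa drops below 0.5 purely because "yes" dominates, while PABAK stays at 0.9. This is the high-prevalence situation the abstract describes.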
High inter-rater reliability, agreement, and convergent validity of Constant score in patients with clavicle fractures.

PubMed

Ban, Ilija; Troelsen, Anders; Kristensen, Morten Tange

2016-10-01

The Constant score (CS) has been the primary endpoint in most studies on clavicle fractures. However, the CS was not developed to assess patients with clavicle fractures. Our aim was to examine inter-rater reliability and agreement of the CS in patients with clavicle fractures. The secondary aim was to estimate the correlation between the CS and the Disabilities of the Arm, Shoulder and Hand score and the internal consistency of the 2 scores. On the basis of a sample size calculation, 36 patients (31 male and 5 female patients; mean age, 41.3 years) with clavicle fractures underwent standardized CS assessment at a mean of 6.8 weeks (SD, 1.0 weeks) after injury. Reliability and agreement of the CS were determined by 2 raters. The intraclass correlation coefficient (ICC(2,1)), standard error of measurement, minimal detectable change, Cronbach α coefficient, and Pearson correlation coefficient were estimated. Inter-rater reliability of the total CS was excellent (intraclass correlation coefficient, 0.94; 95% confidence interval, 0.88-0.97), with no systematic difference between the 2 raters (P = .75). The standard error of measurement (measurement error at the group level) was 4.9, whereas the minimal detectable change (smallest change needed to indicate a real change for an individual) was 13.6 CS points. The internal consistency of the 10 CS items was good, with a Cronbach α of .85, and we found a strong correlation (r = -0.92) between the CS and the Disabilities of the Arm, Shoulder and Hand score. The CS was found to be reliable for assessing patients with clavicle fractures, especially at the group level. With high inter-rater reliability and agreement, in addition to good internal consistency, the standardized CS used in this study can be used for comparison of results from different settings. Copyright © 2016 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
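The reported standard error of measurement and minimal detectable change are consistent with the usual relation MDC95 = 1.96 × √2 × SEM (1.96 × 1.414 × 4.9 ≈ 13.6). The following is a minimal sketch with three assumptions. First, it uses the Shrout and Fleiss two-way random-effects, absolute-agreement, single-rater form for ICC(2,1). Second, it uses the common SEM = SD × √(1 − ICC) formula; the abstract does not say which SEM formula was used. Third, the toy scores are hypothetical, not study data.

```python
import math
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is an n x k table: one row per patient, one column per rater.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the pooled SD and a reliability coefficient."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem: float) -> float:
    """Smallest individual change exceeding measurement error with 95% confidence."""
    return 1.96 * math.sqrt(2) * sem


if __name__ == "__main__":
    # Rater 2 scores every (hypothetical) patient exactly 1 point higher than rater 1.
    offset = [[1, 2], [2, 3], [3, 4]]
    print(f"ICC(2,1) = {icc_2_1(offset):.3f}")  # systematic bias lowers absolute agreement
    print(f"MDC95 for SEM 4.9 = {mdc95(4.9):.1f}")
```

Because ICC(2,1) measures absolute agreement, a constant offset between raters lowers it (to 0.667 in the toy data), even though the two raters rank patients identically.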
[Inter-rater concordance of the "Nursing Activities Score" in intensive care].

PubMed

Valls-Matarín, Josefa; Salamero-Amorós, Maria; Roldán-Gil, Carmen; Quintana-Riera, Salvador

2015-01-01

To evaluate inter-rater concordance in scoring the "Nursing Activities Score". A cross-sectional descriptive study was conducted from December 2012 to June 2013 in a twelve-bed general intensive care unit. Three nurse evaluators, simultaneously and independently, used the patients' daily charts to score nursing workload on the Nursing Activities Score scale for all admitted patients older than 18 years. Three hundred thirty-nine records were collected. The intraclass correlation coefficient (ICC) between evaluators was 0.92 (0.89-0.94). Perfect concordance was obtained for 39.1% of the items, high concordance for 52.2%, and lower concordance for 8.7%; the last group corresponded to two items with multiple scoring options. A significant difference was found between two of the evaluators (P=.049). Although inter-rater concordance was high, more accurate records are needed to reduce the variability of the items with multiple options and to allow more accurate interpretation and measurement of nursing workload data. Copyright © 2015 Elsevier España, S.L.U. All rights reserved.
Rater Biases in Genetically Informative Research Designs: Comment on Bartels, Boomsma, Hudziak, van Beijsterveldt, and van den Oord (2007)

ERIC Educational Resources Information Center

Hoyt, William T.

2007-01-01

Rater biases are of interest to behavior genetic researchers, who often use ratings data as a basis for studying heritability. Inclusion of multiple raters for each sibling pair (M. Bartels, D. I. Boomsma, J. J. Hudziak, T. C. E. M. van Beijsterveldt, & E. J. C. G. van den Oord, 2007) is a promising strategy for controlling bias variance and may…
Development and inter-rater reliability of a standardized verbal instruction manual for the Chinese Geriatric Depression Scale-short form.

PubMed

Wong, M T P; Ho, T P; Ho, M Y; Yu, C S; Wong, Y H; Lee, S Y

2002-05-01

The Geriatric Depression Scale (GDS) is a common screening tool for depression in older adults in Hong Kong. This study aimed (1) to develop a standardized manual for the verbal administration and scoring of the GDS-SF, and (2) to compare the inter-rater reliability of standardized and non-standardized verbal administration of the GDS-SF. Two studies were reported. In Study 1, the process of developing the manual was described. In Study 2, we compared the inter-rater reliabilities of GDS-SF scores using the standardized verbal instructions and the traditional non-standardized administration. Results of Study 2 indicated that the standardized procedure in verbal administration and scoring improved the inter-rater reliabilities of GDS-SF. Copyright 2002 John Wiley & Sons, Ltd.
Reliability and diagnostic characteristics of the JFK coma recovery scale-revised: exploring the influence of rater's level of experience.

PubMed

Løvstad, Marianne; Frøslie, Kathrine F; Giacino, Joseph T; Skandsen, Toril; Anke, Audny; Schanke, Anne-Kristine

2010-01-01

To confirm the reliability and diagnostic validity of the JFK Coma Recovery Scale-Revised (CRS-R) across raters with varying levels of experience. Thirty-one patients with disorders of consciousness were recruited from 6 Norwegian hospitals. The measures used were the CRS-R and the Disability Rating Scale. Reliability measures were good for the CRS-R total scores and moderate to good for its subscales. Diagnostic agreement among examiners was good. Raters' experience with the CRS-R favorably influenced reliability. Sensitivity and specificity analyses demonstrated better detection of patients in a minimally conscious state on the CRS-R relative to the Disability Rating Scale. The CRS-R is a reliable tool for diagnosing vegetative state and minimally conscious state. Raters' level of experience influences the reliability of the CRS-R scores.
Ultrasonographic measurement of the acromiohumeral distance in spinal cord injury: Reliability and effects of shoulder positioning

PubMed Central

Lin, Yen-Sheng; Boninger, Michael L.; Day, Kevin A.

2015-01-01

Objective To investigate the reliability of ultrasonographic measurement of acromiohumeral distance (AHD) and the effects of shoulder positioning on AHD among manual wheelchair users (MWUs) with spinal cord injury (SCI) and an able-bodied control group. Methods Ten MWUs with SCI and 10 able-bodied subjects participated in this study. The ultrasonographic measurements of AHD from each subject were obtained by two raters during passive and active scapular plane arm elevation in neutral, 45°, 90° with and without resistance and in a weight relief raise position. The measurements were recorded again by each rater using the same procedures after a 30-minute time interval. All raters were blinded to each other's measurements. Setting University Laboratories and Veteran Affairs Healthcare System. Results Intra-rater (intraclass correlation coefficient, ICC > 0.83) and inter-rater (ICC > 0.78) reliability was excellent for both the MWUs with SCI and able-bodied groups across all arm positions except for the 45° position in the control group for one of the raters (intra-rater: ICC < 0.40 and inter-rater: ICC < 0.60). AHD was significantly reduced when the shoulder was in the 90° arm elevation positions with or without resistance. Conclusion Findings from our study demonstrated that ultrasonography is a reliable means to evaluate AHD in both able-bodied individuals and individuals with SCI, who are known to have significant shoulder pathology. This technique could be used to develop reference measures and to identify changes in AHD caused by interventions. PMID:24968117
Screening of the spine in adolescents: inter- and intra-rater reliability and measurement error of commonly used clinical tests.

PubMed

Aartun, Ellen; Degerfalk, Anna; Kentsdotter, Linn; Hestbaek, Lise

2014-02-10

Evidence on the reliability of clinical tests used for the spinal screening of children and adolescents is currently lacking. The aim of this study was to determine the inter- and intra-rater reliability and measurement error of clinical tests commonly used when screening young spines. Two experienced chiropractors independently assessed 111 adolescents aged 12-14 years who were recruited from a primary school in Denmark. A standardised examination protocol was used to test inter-rater reliability, including tests for scoliosis, hypermobility, general mobility, inter-segmental mobility and end range pain in the spine. Seventy-five of the 111 subjects were re-examined after one to four hours to test intra-rater reliability. Percentage agreement and Cohen's Kappa were calculated for binary variables, and intraclass correlation coefficients (ICC) and Bland-Altman plots with Limits of Agreement (LoA) were calculated for continuous measures. Inter-rater percentage agreement for binary data ranged from 59.5% to 100%. Kappa ranged from 0.06 to 1.00. Kappa ≥ 0.40 was seen for elbow, thumb, fifth finger and trunk/hip flexion hypermobility, pain response in inter-segmental mobility and end range pain in lumbar flexion and extension. For continuous data, ICCs ranged from 0.40 to 0.95. Only forward flexion as measured by finger-to-floor distance reached an acceptable ICC (≥ 0.75). Overall, results for intra-rater reliability were better than for inter-rater reliability, but for both components, the LoA were quite wide compared with the range of assessments. Some clinical tests showed good, and some tests poor, reliability when applied in a spinal screening of adolescents. The results could probably be improved by additional training and further test standardization. This is the first step in evaluating the value of these tests for the spinal screening of adolescents. Future research should determine the association between these tests and current and/or future neck and back pain.
Beyond Essay Length: Evaluating e-rater[R]'s Performance on TOEFL[R] Essays. Research Reports. Report 73. RR-04-04

ERIC Educational Resources Information Center

Chodorow, Martin; Burstein, Jill

2004-01-01

This study examines the relation between essay length and holistic scores assigned to Test of English as a Foreign Language[TM] (TOEFL[R]) essays by e-rater[R], the automated essay scoring system developed by ETS. Results show that an early version of the system, e-rater99, accounted for little variance in human reader scores beyond that which…
Intra- and Inter-Rater Reliability of the Rate of Force Development of Hip Abductor Muscles Measured by Hand-Held Dynamometer

ERIC Educational Resources Information Center

Takeda, Kazuya; Tanabe, Shigeo; Koyama, Soichiro; Nagai, Tomoko; Sakurai, Hiroaki; Kanada, Yoshikiyo; Shomoto, Koji

2018-01-01

The aim of this study was to clarify the intra- and inter-rater reliability of the rate of force development in hip abductor muscle force measurements using a hand-held dynamometer. Thirty healthy adults were separately assessed by two independent raters on two separate days. Rate of force development was calculated from the slope of the…
A Critical Review of Some Qualitative Research Methods Used to Explore Rater Cognition

ERIC Educational Resources Information Center

Suto, Irenka

2012-01-01

Internationally, many assessment systems rely predominantly on human raters to score examinations. Arguably, this facilitates the assessment of multiple sophisticated educational constructs, strengthening assessment validity. It can introduce subjectivity into the scoring process, however, engendering threats to accuracy. The present objectives…
Translation, adaptation and inter-rater reliability of the administration manual for the Fugl-Meyer assessment.

PubMed

Michaelsen, Stella M; Rocha, André S; Knabben, Rodrigo J; Rodrigues, Luciano P; Fernandes, Claudia G C

2011-01-01

Recently, the reliability of the Brazilian version of the Fugl-Meyer Assessment (FMA) was assessed through the scoring given according to observations made by a single evaluator who applied the test. When different raters apply the scale, the reliability may depend on the interpretation given to the assessment sheet. In such cases, a clear administration manual is essential for ensuring homogeneity of application. The objectives were to translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate the inter-rater reliability when different evaluators apply the FMA on the basis of the information contained in the manual. Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part in this study. Eight patients participated in the first part of the study and 10 in the second part. Based on the results from part 1, an adapted version was developed, in which information and photos were added to illustrate the positions of the patient and evaluator. The inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The reliability of the FMA based on the adapted version of the manual was excellent for the total motor scores for the upper limbs (ICC=0.98) and lower limbs (ICC=0.90), as well as for movement sense (ICC=0.98) and upper and lower-limb passive range of motion (ICC=0.84 and 0.90, respectively). The reliability was moderate for tactile sensitivity (ICC=0.75). The joint pain assessment presented low reliability. The results showed that, except for pain assessment, application of the FMA based on the adapted version of the application manual for Brazilian Portuguese presented adequate inter-rater reliability.
Measuring the morphological characteristics of thoracolumbar fascia in ultrasound images: an inter-rater reliability study.

PubMed

De Coninck, Kyra; Hambly, Karen; Dickinson, John W; Passfield, Louis

2018-06-01

Chronic lower back pain is still regarded as a poorly understood multifactorial condition. Recently, the thoracolumbar fascia complex has been found to be a contributing factor. Ultrasound imaging has shown that people with chronic lower back pain demonstrate both a significant decrease in shear strain and a 25% increase in thickness of the thoracolumbar fascia. There are sparse data on whether medical practitioners agree on the level of disorganisation in ultrasound images of thoracolumbar fascia. The purpose of this study was to establish inter-rater reliability of the ranking of architectural disorganisation of thoracolumbar fascia on a scale from 'very disorganised' to 'very organised'. An exploratory analysis was performed using a fully crossed design of inter-rater reliability. Thirty observers were recruited, consisting of 21 medical doctors, 7 physiotherapists and 2 radiologists, with an average of 13.03 ± 9.6 years of clinical experience. All 30 observers independently rated the architectural disorganisation of the thoracolumbar fascia in 30 ultrasound scans, on a Likert-type scale with rankings from 1 = very disorganised to 10 = very organised. Internal consistency was assessed using Cronbach's alpha. Krippendorff's alpha was used to calculate the overall inter-rater reliability. The Krippendorff's alpha was .61, indicating a modest degree of agreement between observers on the different morphologies of thoracolumbar fascia. The Cronbach's alpha (0.98) indicated that there was a high degree of consistency between observers. Experience in ultrasound image analysis did not affect consistency between observers (Cronbach's alpha for experienced and inexperienced raters: 0.95 and 0.96, respectively). Medical practitioners agree on morphological features such as levels of organisation and disorganisation in ultrasound images of thoracolumbar fascia, regardless of experience. Further analysis by an expert panel is required to develop specific
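The gap between Cronbach's alpha (0.98) and Krippendorff's alpha (.61) is consistent with a known difference between the two. Cronbach's alpha rewards consistency (observers ordering scans the same way), whereas Krippendorff's alpha penalises any difference in the actual scores. The sketch below is illustrative only. It assumes the interval metric for Krippendorff's alpha, complete data with every observer rating every scan, and a hypothetical two-observer toy set. None of these choices is stated in the abstract.

```python
from itertools import permutations
from statistics import variance
from typing import Sequence

Ratings = Sequence[Sequence[float]]  # rows = scans (units), columns = observers


def cronbach_alpha(ratings: Ratings) -> float:
    """Cronbach's alpha treating each observer as an 'item'."""
    k = len(ratings[0])
    per_rater_var = [variance([row[j] for row in ratings]) for j in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(per_rater_var) / total_var)


def krippendorff_alpha_interval(ratings: Ratings) -> float:
    """Krippendorff's alpha, interval metric, complete data (no missing ratings)."""
    m = len(ratings[0])                       # ratings per unit
    values = [v for row in ratings for v in row]
    n = len(values)                           # total pairable values
    # Observed disagreement: ordered within-unit pairs, each unit weighted 1/(m - 1).
    d_o = sum(
        (a - b) ** 2 for row in ratings for a, b in permutations(row, 2)
    ) / ((m - 1) * n)
    # Expected disagreement: ordered pairs drawn from all values pooled together.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    return 1 - d_o / d_e


if __name__ == "__main__":
    # Hypothetical: observer B rates every scan exactly 3 points higher than observer A.
    toy = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8]]
    print(f"Cronbach's alpha     = {cronbach_alpha(toy):.2f}")
    print(f"Krippendorff's alpha = {krippendorff_alpha_interval(toy):.2f}")
```

In the toy data, the observers are perfectly consistent, so Cronbach's alpha is 1.00. They never give the same score, however, so Krippendorff's alpha is close to zero (about 0.05).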
Systematic reviews need to consider applicability to disadvantaged populations: inter-rater agreement for a health equity plausibility algorithm

PubMed Central

2012-01-01

Background Systematic reviews have been challenged to consider effects on disadvantaged groups. A priori specification of subgroup analyses is recommended to increase the credibility of these analyses. This study aimed to develop and assess inter-rater agreement for an algorithm for systematic review authors to predict whether differences in effect measures are likely for disadvantaged populations relative to advantaged populations (only relative effect measures were addressed). Methods A health equity plausibility algorithm was developed using clinimetric methods with three items based on literature review, key informant interviews and methodology studies. The three items dealt with the plausibility of differences in relative effects across sex or socioeconomic status (SES) due to: 1) patient characteristics; 2) intervention delivery (i.e., implementation); and 3) comparators. Thirty-five respondents (consisting of clinicians, methodologists and research users) assessed the likelihood of differences across sex and SES for ten systematic reviews with these questions. We assessed inter-rater reliability using Fleiss multi-rater kappa. Results The proportion agreement was 66% for patient characteristics (95% confidence interval: 61% to 71%), 67% for intervention delivery (95% confidence interval: 62% to 72%) and 55% for the comparator (95% confidence interval: 50% to 60%). Fleiss kappa ranged from 0 to 0.199, representing very low agreement beyond chance. Conclusions Users of systematic reviews judged important differences in relative effects across sex and socioeconomic status to be plausible for a range of individual and population-level interventions. However, there was very low inter-rater agreement for these assessments. There is an unmet need for discussion of plausibility of differential effects in systematic reviews. Increased consideration of external validity and applicability to different populations and settings is

Systematic reviews need to consider applicability to disadvantaged populations: inter-rater agreement for a health equity plausibility algorithm.

PubMed

Welch, Vivian; Brand, Kevin; Kristjansson, Elizabeth; Smylie, Janet; Wells, George; Tugwell, Peter

2012-12-19

Systematic reviews have been challenged to consider effects on disadvantaged groups. A priori specification of subgroup analyses is recommended to increase the credibility of these analyses. This study aimed to develop and assess inter-rater agreement for an algorithm for systematic review authors to predict whether differences in effect measures are likely for disadvantaged populations relative to advantaged populations (only relative effect measures were addressed). A health equity plausibility algorithm was developed using clinimetric methods with three items based on literature review, key informant interviews and methodology studies. The three items dealt with the plausibility of differences in relative effects across sex or socioeconomic status (SES) due to: 1) patient characteristics; 2) intervention delivery (i.e., implementation); and 3) comparators. Thirty-five respondents (consisting of clinicians, methodologists and research users) assessed the likelihood of differences across sex and SES for ten systematic reviews with these questions. We assessed inter-rater reliability using Fleiss multi-rater kappa. The proportion agreement was 66% for patient characteristics (95% confidence interval: 61% to 71%), 67% for intervention delivery (95% confidence interval: 62% to 72%) and 55% for the comparator (95% confidence interval: 50% to 60%). Fleiss kappa ranged from 0 to 0.199, representing very low agreement beyond chance. Users of systematic reviews judged important differences in relative effects across sex and socioeconomic status to be plausible for a range of individual and population-level interventions. However, there was very low inter-rater agreement for these assessments. There is an unmet need for discussion of plausibility of differential effects in systematic reviews. Increased consideration of external validity and applicability to different populations and settings is warranted in systematic reviews to meet this
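The abstract reports proportion agreement of 55% to 67% alongside a Fleiss kappa no higher than 0.199. The following is a minimal sketch of the standard Fleiss kappa formula. It assumes that every item was rated by the same number of respondents, and its counts are hypothetical. It shows how this pattern can arise: when most ratings fall into one category, chance agreement is high and kappa stays low even though many rater pairs agree.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N subjects, each rated by the same number of raters.

    counts[i][j] = number of raters who put subject i in category j.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # Per-subject agreement: share of rater pairs that chose the same category.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Overall category proportions give the chance-agreement term.
    n_categories = len(counts[0])
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical: 5 raters, 4 reviews, 2 categories ("plausible", "not plausible").
    skewed = [[5, 0], [5, 0], [4, 1], [4, 1]]
    print(f"kappa = {fleiss_kappa(skewed):.3f}")  # 80% pairwise agreement, kappa < 0
```

In this toy table, 80% of rater pairs agree. One category nevertheless holds 90% of all ratings, so expected chance agreement (0.82) exceeds observed agreement and kappa is slightly negative.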
Using Raters from India to Score a Large-Scale Speaking Test

ERIC Educational Resources Information Center

Xi, Xiaoming; Mollaun, Pam

2011-01-01

We investigated the scoring of the Speaking section of the Test of English as a Foreign Language[TM] Internet-based (TOEFL iBT[R]) test by speakers of English and one or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the TOEFL examinees with mixed first languages…
The Treatment of Brain Arteriovenous Malformation Study (TOBAS): A preliminary inter- and intra-rater agreement study on patient management.

PubMed

Fahed, Robert; Batista, André L; Darsaut, Tim E; Gentric, Jean-Christophe; Ducroux, Célina; Chaalala, Chiraz; Roberge, David; Bojanowski, Michel W; Weill, Alain; Roy, Daniel; Magro, Elsa; Raymond, Jean

2017-07-01

The best management of brain arteriovenous malformation (bAVM) patients remains unknown. Randomized allocation may be more readily accepted when there is uncertainty and disagreement regarding the management of potential participants. In planning for a trial, we aimed to assess variability and agreement among physicians managing bAVM patients. A portfolio composed of 35 patients was sent to 47 clinicians of various specialties managing bAVM patients. For each patient, physicians were asked their best management decision (surgery/embolization/radiosurgery/conservative), their confidence level, and whether they would include the patient in a randomized trial comparing conservative and curative management. Seven physicians, who had access to all images of each patient, independently responded twice, to assess inter- and intra-rater agreement using kappa statistics. The inter-rater agreement (30 raters, including 16 neuroradiologists) for best management decision was only "fair" (κ [95%CI]=0.210[0.157; 0.295]). Agreement remained below "substantial" (κ<.6) between physicians of the same specialty, and when no distinctions were made between various treatments (when responses were dichotomized as conservative versus curative). With access to all images, the inter-rater agreement remained fair. The intra-rater agreement reached "substantial" only for the dichotomized decisions. Responding clinicians were willing to include 54.4% of patients (mainly unruptured bAVMs) in a randomized trial. There is a lack of agreement among clinicians involved in the management of bAVM patients. In this study, a substantial proportion of clinicians were willing to offer randomized allocation of management options to a substantial number of patients. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Raters Interpret Positively and Negatively Worded Items Similarly in a Quality of Life Instrument for Children

PubMed Central

Lin, Chung-Ying; Strong, Carol; Tsai, Meng-Che; Lee, Chih-Ting

2017-01-01

Measurement invariance is an important assumption to meaningfully compare children’s quality of life (QoL) between different raters (eg, children and parents) and across genders. Moreover, QoL instruments may combine negatively and positively worded items, a common method to reduce response bias. However, the wording effects may have different levels of impact on different raters and genders. Our aim was to investigate the measurement invariance of Kid-KINDL, a commonly used QoL instrument, across genders and raters and to consider the wording effects simultaneously. Third to sixth graders (208 boys and 235 girls) completed the self-rated Kid-KINDL, and one parent of each of 241 children completed the parent-rated Kid-KINDL. The wording effects were accounted for by a correlated traits-uncorrelated methods model. The measurement invariance was examined using multigroup confirmatory factor analysis. Item loadings and item intercepts were invariant across gender and rater when we simultaneously accounted for the wording effects of Kid-KINDL. Our results suggest that Kid-KINDL could be used to compare QoL across gender and that parent-rated Kid-KINDL could be used to measure children’s QoL. Specifically, the invariant factor loadings across child-rated and parent-rated Kid-KINDL suggest that the score weights in each item were the same for both children and parents (ie, the important items identified by the children are the same items identified by the parents). The invariant item intercepts suggest that both children and parents share the same threshold for each item. Based on the results, we tentatively recommend that each parent-rated Kid-KINDL score be taken to represent the corresponding child’s QoL. PMID:28292193
Surveying for "artifacts": the susceptibility of the OCB-performance evaluation relationship to common rater, item, and measurement context effects.

PubMed

Podsakoff, Nathan P; Whiting, Steven W; Welsh, David T; Mai, Ke Michael

2013-09-01

Despite the increased attention paid to biases attributable to common method variance (CMV) over the past 50 years, researchers have only recently begun to systematically examine the effect of specific sources of CMV in previously published empirical studies. Our study contributes to this research by examining the extent to which common rater, item, and measurement context characteristics bias the relationships between organizational citizenship behaviors and performance evaluations using a mixed-effects analytic technique. Results from 173 correlations reported in 81 empirical studies (N = 31,146) indicate that even after controlling for study-level factors, common rater and anchor point number similarity substantially biased the focal correlations. Indeed, these sources of CMV (a) led to estimates that were between 60% and 96% larger when comparing measures obtained from a common rater, versus different raters; (b) led to 39% larger estimates when a common source rated the scales using the same number, versus a different number, of anchor points; and (c) when taken together with other study-level predictors, accounted for over half of the between-study variance in the focal correlations. We discuss the implications for researchers and practitioners and provide recommendations for future research. PsycINFO Database Record (c) 2013 APA, all rights reserved
Inter-rater reliability of categorical versus continuous scoring of fish vitality: Does it affect the utility of the reflex action mortality predictor (RAMP) approach?

PubMed Central

Yochum, Noëlle; Kochzius, Marc; Ampe, Bart; Tuyttens, Frank A. M.

2017-01-01

Scoring reflex responsiveness and injury of aquatic organisms has gained popularity as a predictor of discard survival. Because this method relies upon the individual interpretation of scoring criteria, its robustness is evaluated here by testing whether protocol-instructed, multiple raters with diverse backgrounds (research scientist, technician, and student) are able to produce similar or identical reflex and injury scores for one and the same flatfish (European plaice, Pleuronectes platessa) after experiencing commercial fishing stressors. Inter-rater reliability for three raters was assessed by using a 3-point categorical scale (‘absent’, ‘weak’, ‘strong’) and a tagged visual analogue continuous scale (tVAS, a 10 cm bar with 0 labelled ‘absent’ and the rest split into three labelled sections: ‘weak’, ‘moderate’, and ‘strong’) for six reflex responses, and a 4-point scale for four injury types. Plaice (n = 304) were sampled from 17 research beam-trawl deployments during four trips. Fleiss kappa (categorical scores) and intra-class correlation coefficients (ICC, continuous scores) indicated variable inter-rater agreement by reflex type (ranging between 0.55 and 0.88, and 67% and 91% for Fleiss kappa and ICC, respectively), with least agreement among raters on extent of injury (Fleiss kappa between 0.08 and 0.27). Despite differences among raters, which did not significantly influence the relationship between impairment and predicted survival, combining categorical reflex and injury scores always produced a close relationship between such vitality indices and observed delayed mortality. The use of the continuous scale did not improve the fit of these models compared with using the reflex impairment index based on categorical scores. Given these findings, we recommend using a 3-point categorical scale over a continuous scale. We also determined that training rather than experience of raters minimised inter-rater differences. Our results suggest that cost-efficient reflex
Bridging the gap between DeafBlind minds: interactional and social foundations of intention attribution in the Seattle DeafBlind community

PubMed Central

Edwards, Terra

2015-01-01

This article is concerned with social and interactional processes that simplify pragmatic acts of intention attribution. The empirical focus is a series of interactions among DeafBlind people in Seattle, Washington, where pointing signs are used to individuate objects of reference in the immediate environment. Most members of this community are born deaf and slowly become blind. They come to Seattle using Visual American Sign Language, which has emerged and developed in a field organized around visual modes of access. As vision deteriorates, however, links between deictic signs (such as pointing) and the present, remembered, or imagined environment erode in idiosyncratic ways across the community of language-users, and as a result, it becomes increasingly difficult for participants to converge on objects of reference. In the past, DeafBlind people addressed this problem by relying on sighted interpreters. Under the influence of the recent “pro-tactile” movement, they have turned instead to one another to find new solutions to these referential problems. Drawing on analyses of 120 h of videorecorded interaction and language-use, detailed fieldnotes collected during 12 months of sustained anthropological fieldwork, and more than 15 years of involvement in this community in a range of capacities, I argue that DeafBlind people are generating new and reciprocal modes of access to their environment, and this process is aligning language with context in novel ways. I discuss two mechanisms that can account for this process: embedding in the social field and deictic integration. I argue that together, these social and interactional processes yield a deictic system set to retrieve a restricted range of values from the extra-linguistic context, thereby attenuating the cognitive demands of intention attribution and narrowing the gap between DeafBlind minds. PMID:26500576
Blindness - resources

MedlinePlus

Resources - blindness ... The following organizations are good resources for information on blindness: American Foundation for the Blind -- www.afb.org Foundation Fighting Blindness -- www.blindness.org National Eye Institute -- ...
Inter-Rater and Test-Retest Reliability of the Beery VMI in Schoolchildren

PubMed Central

Harvey, Erin M.; Leonard-Green, Tina K.; Mohan, Kathleen M.; Kulp, Marjean Taylor; Davis, Amy L.; Miller, Joseph M.; Twelker, J. Daniel; Campus, Irene; Dennis, Leslie K.

2017-01-01

Purpose To assess inter-rater and test-retest reliability of the 6th Edition Beery-Buktenica Developmental Test of Visual-Motor Integration (VMI) and test-retest reliability of the VMI Visual Perception Supplemental Test (VMIp) in school-age children. Methods Subjects were 163 Native American 3rd to 8th grade students with no significant refractive error (astigmatism < 1.00 D, myopia < 0.75 D, hyperopia < 2.50 D, anisometropia < 1.50 D) or ocular abnormalities. The VMI and VMIp were administered twice, on separate days. All VMI tests were scored by two trained scorers, and a subset of 50 tests was also scored by an experienced scorer. Scorers strictly applied objective scoring criteria. Analyses included inter-rater and test-retest assessments of bias, 95% limits of agreement, and intraclass correlation analysis. Results Trained scorers had no significant scoring bias compared to the experienced scorer. One of the two trained scorers tended to provide higher scores than the other (mean difference in standardized scores = 1.54). Inter-rater correlations were strong (0.75 to 0.88). VMI and VMIp test-retest comparisons indicated no significant bias (subjects did not tend to score better on retest). Test-retest correlations were moderate (0.54 to 0.58). The 95% LOAs for the VMI were −24.14 to 24.67 (scorer 1) and −26.06 to 26.58 (scorer 2), and the 95% LOAs for the VMIp were −27.11 to 27.34. Conclusions The 95% LOA for test-retest differences will be useful for determining whether the VMI and VMIp have sufficient sensitivity for detecting change with treatment in both clinical and research settings. Further research on test-retest reliability reporting 95% LOAs for children across different age ranges is recommended, particularly if the test is to be used to detect changes due to intervention or treatment. PMID:28422801
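The 95% limits of agreement (LOAs) reported here follow the Bland-Altman approach. Each limit is the mean difference between paired scores ± 1.96 times the standard deviation of those differences. The following is a minimal sketch under that standard definition. The abstract does not state the direction of the difference or any adjustment the authors may have applied, and the scores below are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def limits_of_agreement(
    first: Sequence[float], second: Sequence[float]
) -> Tuple[float, float, float]:
    """Bland-Altman bias and 95% limits of agreement for paired measurements.

    Differences are taken as second - first (e.g. retest minus test).
    Returns (bias, lower limit, upper limit).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the differences
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    test = [10, 12, 14, 16]     # hypothetical standardized scores, day 1
    retest = [11, 11, 15, 17]   # the same children, day 2
    bias, lo, hi = limits_of_agreement(test, retest)
    print(f"bias = {bias:.2f}, 95% LoA = {lo:.2f} to {hi:.2f}")
```

Roughly symmetric limits around zero, like the VMI values above, indicate little systematic bias. Their width is what matters for deciding whether an individual child's change exceeds measurement noise.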
20 CFR 416.1720 - Whom we refer.

Code of Federal Regulations, 2013 CFR

2013-04-01

... Treatment of Alcoholism Or Drug Addiction § 416.1720 Whom we refer. We will refer you to an approved facility for treatment of your alcoholism or drug addiction if— (a) You are disabled; (b) You are not blind; (c) You are not 65 years old or older; and (d) Alcoholism or drug addiction is a contributing factor...
20 CFR 416.1720 - Whom we refer.

Code of Federal Regulations, 2012 CFR

2012-04-01

... Treatment of Alcoholism Or Drug Addiction § 416.1720 Whom we refer. We will refer you to an approved facility for treatment of your alcoholism or drug addiction if— (a) You are disabled; (b) You are not blind; (c) You are not 65 years old or older; and (d) Alcoholism or drug addiction is a contributing factor...
20 CFR 416.1720 - Whom we refer.

Code of Federal Regulations, 2014 CFR

2014-04-01

... Treatment of Alcoholism Or Drug Addiction § 416.1720 Whom we refer. We will refer you to an approved facility for treatment of your alcoholism or drug addiction if— (a) You are disabled; (b) You are not blind; (c) You are not 65 years old or older; and (d) Alcoholism or drug addiction is a contributing factor...
20 CFR 416.1720 - Whom we refer.

Code of Federal Regulations, 2011 CFR

2011-04-01

... Treatment of Alcoholism Or Drug Addiction § 416.1720 Whom we refer. We will refer you to an approved facility for treatment of your alcoholism or drug addiction if— (a) You are disabled; (b) You are not blind; (c) You are not 65 years old or older; and (d) Alcoholism or drug addiction is a contributing factor...
Shape Perception and Navigation in Blind Adults

PubMed Central

Gori, Monica; Cappagli, Giulia; Baud-Bovy, Gabriel; Finocchietti, Sara

2017-01-01

Different sensory systems interact to generate a representation of space and to support navigation. Vision plays a critical role in the development of spatial representation. During navigation, vision is integrated with auditory and mobility cues. In blind individuals, visual experience is not available, and navigation therefore lacks this important sensory signal. Blind individuals can adopt compensatory mechanisms to improve their spatial and navigation skills, but the limits of these mechanisms are not completely clear: both enhanced and impaired reliance on auditory cues in blind individuals have been reported. Here, we develop a new paradigm to test both auditory perception and navigation skills in blind and sighted individuals and to investigate the effect that visual experience has on the ability to reproduce simple and complex paths. During the navigation task, early blind, late blind and sighted individuals were required first to listen to an audio shape and then to recognize and reproduce it by walking. After each audio shape was presented, a static sound was played and the participants were asked to reach it. Movements were recorded with a motion tracking system. Our results show three main impairments specific to early blind individuals. The first is a tendency to compress the shapes reproduced during navigation. The second is difficulty recognizing complex audio stimuli. The third is difficulty reproducing the intended shape: early blind participants occasionally reported perceiving a square but actually reproduced a circle during the navigation task. We discuss these results in terms of compromised spatial reference frames due to the lack of visual input during the early period of development. PMID:28144226
Unified Parkinson's Disease Rating Scale-Motor Exam: inter-rater reliability of advanced practice nurse and neurologist assessments.

PubMed

Palmer, Janice L; Coats, Mary A; Roe, Catherine M; Hanko, Shelly M; Xiong, Chengjie; Morris, John C

2010-06-01

This paper is a report of a study to establish the inter-rater reliability of advanced practice nurse and neurologist neurological assessments, which included ratings with the Unified Parkinson's Disease Rating Scale-Motor Exam. Around the world, advanced practice nurses are performing tasks once completed only by physicians. To promote consumer and provider confidence, it is important to establish that nurse and physician ratings using assessment tools are similar. In addition, in research settings where different raters are used, the inter-rater reliability of study assessments needs to be established. Advanced practice nurses and neurologists independently recorded findings on neurological examinations of 46 participants in a study conducted between August 2007 and January 2008. An intraclass correlation coefficient was calculated to estimate overall agreement between the nurse and neurologist ratings. Agreement for individual items measured on a dichotomous scale was assessed by calculating Cohen's kappa. There was substantial agreement between advanced practice nurses and neurologists on the mean Unified Parkinson's Disease Rating Scale-Motor Exam ratings (intraclass correlation coefficient = 0.65) and on the U.S. National Alzheimer's Coordinating Center Uniform Data Set neurological examination ratings of unremarkable findings (kappa = 0.74) and of gait disorder (kappa = 0.73). Moderate agreement (kappa = 0.53) was reached for the rating of whether all Unified Parkinson's Disease Rating Scale-Motor Exam items were normal. These findings are consistent with studies of the inter-rater agreement of the Unified Parkinson's Disease Rating Scale-Motor Exam and support the conduct of neurological assessments by advanced practice nurses.
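Cohen's kappa, used above for the dichotomous items, corrects the observed proportion of agreement for the agreement expected by chance from each rater's marginal frequencies: kappa = (p_o − p_e) / (1 − p_e). Below is a minimal unweighted sketch in Python. The function name and the example labels are hypothetical, and the sketch does not reproduce the study's own analysis.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical nurse vs. neurologist calls on four examinations.
nurse = ["normal", "normal", "abnormal", "abnormal"]
neurologist = ["normal", "abnormal", "abnormal", "abnormal"]
print(cohens_kappa(nurse, neurologist))  # prints 0.5
```

In this toy example the raters agree on 75% of items, but half of that agreement is expected by chance alone. That is why kappa (0.5) is lower than raw percent agreement.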
Examining Rater Effects of the TGMD-2 on Children with Intellectual Disability

ERIC Educational Resources Information Center

Kim, Youngdeok; Park, Ilhyeok; Kang, Minsoo

2012-01-01

The purpose of this study was to investigate rater effects on the TGMD-2 when it is applied to children with intellectual disability. A total of 22 children with intellectual disabilities participated in this study. Children's performances in each of 12 subtests of the TGMD-2 were recorded via video and scored by three adapted physical activity…
Task demands affect spatial reference frame weighting during tactile localization in sighted and congenitally blind adults

PubMed Central

Schubert, Jonathan T. W.; Badde, Stephanie; Röder, Brigitte

2017-01-01

Task demands modulate tactile localization in sighted humans, presumably through weight adjustments in the spatial integration of anatomical, skin-based, and external, posture-based information. In contrast, previous studies have suggested that congenitally blind humans, by default, refrain from automatic spatial integration and localize touch using only skin-based information. Here, sighted and congenitally blind participants localized tactile targets on the palm or back of one hand, while ignoring simultaneous tactile distractors at congruent or incongruent locations on the other hand. We probed the interplay of anatomical and external location codes for spatial congruency effects by varying hand posture: the palms either both faced down, or one faced down and one up. In the latter posture, externally congruent target and distractor locations were anatomically incongruent and vice versa. Target locations had to be reported either anatomically (“palm” or “back” of the hand), or externally (“up” or “down” in space). Under anatomical instructions, performance was more accurate for anatomically congruent than incongruent target-distractor pairs. In contrast, under external instructions, performance was more accurate for externally congruent than incongruent pairs. These modulations were evident in sighted and blind individuals. Notably, distractor effects were overall far smaller in blind than in sighted participants, despite comparable target-distractor identification performance. Thus, the absence of developmental vision seems to be associated with an increased ability to focus tactile attention towards a non-spatially defined target. Nevertheless, the finding that hand posture and task instructions modulated blind participants' congruency effects suggests that, like sighted individuals, they automatically integrate anatomical and external information during tactile localization. Moreover, spatial integration in tactile processing is, thus, flexibly
Blind image quality assessment without training on human opinion scores

NASA Astrophysics Data System (ADS)

Mittal, Anish; Soundararajan, Rajiv; Muralidhar, Gautam S.; Bovik, Alan C.; Ghosh, Joydeep

2013-03-01

We propose a family of image quality assessment (IQA) models based on natural scene statistics (NSS) that can predict the subjective quality of a distorted image without reference to a corresponding distortionless image and without any training on human opinion scores of distorted images. These 'completely blind' models compete well with standard non-blind image quality indices in terms of subjective predictive performance when tested on the large, publicly available 'LIVE' Image Quality database.
Color-coded fluid-attenuated inversion recovery images improve inter-rater reliability of fluid-attenuated inversion recovery signal changes within acute diffusion-weighted image lesions.

PubMed

Kim, Bum Joon; Kim, Yong-Hwan; Kim, Yeon-Jung; Ahn, Sung Ho; Lee, Deok Hee; Kwon, Sun U; Kim, Sang Joon; Kim, Jong S; Kang, Dong-Wha

2014-09-01

Diffusion-weighted image fluid-attenuated inversion recovery (FLAIR) mismatch has been considered to indicate ischemic lesion age. However, inter-rater agreement on diffusion-weighted image FLAIR mismatch is low. We hypothesized that color-coded images would increase inter-rater agreement. Patients with ischemic stroke within 24 hours of a clear onset were studied retrospectively. FLAIR signal change was rated by visual inspection as negative, subtle, or obvious on conventional and color-coded FLAIR images. Inter-rater agreement was evaluated using κ and percent agreement. The predictive value of diffusion-weighted image FLAIR mismatch for identifying patients within 4.5 hours of symptom onset was evaluated. One hundred and thirteen patients were enrolled. Inter-rater agreement on FLAIR signal change improved from 69.9% (κ=0.538) with conventional images to 85.8% (κ=0.754) with color-coded images (P=0.004). Patients rated discrepantly on conventional but not on color-coded images had a higher prevalence of cardioembolic stroke (P=0.02) and cortical infarction (P=0.04). For the two raters, the positive predictive values for identifying patients within 4.5 hours of onset were 85.3% and 71.9% with conventional images and 95.7% and 82.1% with color-coded images. Color-coded FLAIR images increased inter-rater agreement on diffusion-weighted image FLAIR mismatch and may ultimately help identify unknown-onset stroke patients appropriate for thrombolysis. © 2014 American Heart Association, Inc.
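The positive predictive values above answer a narrower question than agreement does. Among the patients a rater scored as showing diffusion-weighted image FLAIR mismatch, what fraction actually presented within 4.5 hours of onset? The Python sketch below assumes one boolean per patient for the rater's mismatch call and one for the known onset window. Both the function name and the example data are hypothetical.

```python
def positive_predictive_value(flagged, condition_present):
    """PPV = true positives / all cases flagged positive.

    flagged[i] is True when the rater called mismatch for patient i;
    condition_present[i] is True when that patient was truly within the window.
    """
    if len(flagged) != len(condition_present):
        raise ValueError("inputs must be paired")
    outcomes = [truth for flag, truth in zip(flagged, condition_present) if flag]
    if not outcomes:
        raise ValueError("no positive calls; PPV is undefined")
    return sum(outcomes) / len(outcomes)


# Hypothetical data for five patients read by one rater.
mismatch_called = [True, True, False, True, False]
within_4_5_hours = [True, False, False, True, True]
print(positive_predictive_value(mismatch_called, within_4_5_hours))
```

PPV is computed separately for each rater's calls, so the same image set can yield a different value for each reader. This is why the abstract reports two PPVs per image type.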
How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs

PubMed Central

Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne

2014-01-01

This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating pairs and to disentangle these often-confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent–teacher and 19 mother–father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this reliability analysis and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, and the magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity of distinguishing between reliability measures, agreement and correlation. They also demonstrate how the reliability estimate employed affects agreement evaluations. This study provides evidence that parent–teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother–father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings. PMID:24994985
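A toy example shows the distinction the authors draw between correlation and agreement. If one rater always scores exactly five points higher than the other, the Pearson correlation is perfect even though the two raters never give the same score. The Python sketch below uses hypothetical scores and helper names. It illustrates the concept only and does not reproduce the study's analysis.

```python
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def mean_difference(x, y):
    """Average signed difference x - y: size and direction of systematic disagreement."""
    return statistics.mean([a - b for a, b in zip(x, y)])


# Hypothetical vocabulary scores: the second rater is always 5 points higher.
parent = [10, 20, 30, 40]
teacher = [15, 25, 35, 45]
print(pearson_r(parent, teacher))        # perfect correlation
print(mean_difference(parent, teacher))  # yet a constant 5-point gap
```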

Rater Perceptions of Bias Using the Multiple Mini-Interview Format: A Qualitative Study

ERIC Educational Resources Information Center

Alweis, Richard L.; Fitzpatrick, Caroline; Donato, Anthony A.

2015-01-01

Introduction: The Multiple Mini-Interview (MMI) format appears to mitigate individual rater biases. However, the format itself may introduce structural systematic bias, favoring extroverted personality types. This study aimed to gain a better understanding of these biases from the perspective of the interviewer. Methods: A sample of MMI…
A FACETS Analysis of Rater Bias in Measuring Japanese Second Language Writing Performance.

ERIC Educational Resources Information Center

Kondo-Brown, Kimi

2002-01-01

Using FACETS, investigates how judgments of trained teacher raters are biased toward certain types of candidates and certain criteria in assessing Japanese second language writing. Explores the potential for using a modified version of a rating scale for norm-referenced decisions about Japanese second language writing ability. (Author/VWL)
76 FR 32969 - National Technical Assistance and Dissemination Center for Children Who Are Deaf-Blind; Proposed...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-06-07

... "individuals who are deaf-blind" refers to infants, toddlers, children, youth and young adults through age 21... grants funded under the Projects for Children and Young Adults who are Deaf-Blind program (CFDA Number 84...
Rehabilitation of cortical blindness secondary to stroke.

PubMed

Gaber, Tarek A-Z K

2010-01-01

Cortical blindness is a rare complication of posterior circulation stroke. However, its complex presentation with sensory, physical, cognitive and behavioural impairments makes it one of the most challenging complications. An appropriate rehabilitation approach has never been reported. Our study aims to discuss the rehabilitation methods and outcomes of a cohort of patients with cortical blindness. The notes of all patients with cortical blindness referred to a local NHS rehabilitation service in the last 6 years were examined. Patients' demographics, presenting symptoms, scan findings, rehabilitation programmes and outcomes were documented. Seven patients presented to our service, six of whom were male. The mean age was 63. Patients 1, 2 and 3 had total blindness with severe cognitive and behavioural impairments, wandering and akathisia. None of them responded to any rehabilitation effort, and the focus was on damage limitation. Pharmacological interventions had a modest impact on behaviour and sleep pattern. These 3 patients were discharged to a nursing facility. Patients 4, 5, 6 and 7 had partial blindness of variable severity. All of them had significant memory impairment, but none had any behavioural, physical or other cognitive impairment. Rehabilitation of 3 of these patients was carried out collaboratively by brain injury occupational therapists and sensory disability officers. All patients experienced significant improvement in handicap, and all maintained community placements. This small cohort suggests that the rehabilitation philosophy and outcomes of these 2 distinct groups, total and partial cortical blindness, differ significantly.
Blind Astronomers

NASA Astrophysics Data System (ADS)

Hockey, Thomas A.

2011-01-01

The phrase "blind astronomer" is used as an allegorical oxymoron. However, there were and are blind astronomers. What of famous blind astronomers? First, it must be stated that these astronomers were not martyrs to their craft. It is a myth that astronomers blind themselves by observing the Sun. As early as France's William of Saint-Cloud (circa 1290), astronomers knew that staring at the Sun was ill-advised and avoided it. Galileo Galilei did not invent the astronomical telescope and then proceed to blind himself with one. Galileo observed the Sun near sunrise and sunset or through projection. More than two decades later he became blind, as many septuagenarians do, for reasons unrelated to his profession. Even Isaac Newton temporarily blinded himself, staring at the reflection of the Sun when he was a twentysomething. But permanent Sun-induced blindness? No, it did not happen. For instance, it was a stroke that left Scotland's James Gregory (1638-1675) blind. (You will remember the Gregorian telescope.) However, he died days later, so blindness scarcely interfered with his occupation. English Abbot Richard of Wallingford (circa 1291 to circa 1335) wrote astronomical works and designed astronomical instruments. He was also blind in one eye. Yet since he also suffered from leprosy, his blindness seems the lesser of Richard's maladies. Perhaps the most famous professionally active blind (or almost blind) astronomer is Dominique-Francois Arago (1786-1853), director until his death of the powerful nineteenth-century Paris Observatory. I will share other examples, some of them poignant, such as: William Campbell, whose blindness drove him to suicide; Leonhard Euler, astronomy's Beethoven, who did nearly half of his life's work while almost totally blind; and Edwin Frost, who "observed" a total solar eclipse while completely sightless.
Intra- and Inter-rater Agreement of Superior Vena Cava Flow and Right Ventricular Outflow Measurements in Late Preterm and Term Neonates.

PubMed

Mahoney, Liam; Fernandez-Alvarez, Jose R; Rojas-Anaya, Hector; Aiton, Neil; Wertheim, David; Seddon, Paul; Rabe, Heike

2018-02-24

To explore the intra- and inter-rater agreement of superior vena cava (SVC) flow and right ventricular (RV) outflow in healthy and unwell late preterm neonates (33-37 weeks' gestational age), term neonates (≥37 weeks' gestational age), and neonates receiving total-body cooling. Intra- and inter-rater agreement for SVC flow and RV outflow (n = 25 and 41 neonates, respectively) was determined by echocardiography in healthy and unwell late preterm and term neonates using Bland-Altman plots, the repeatability coefficient, the repeatability index, and intraclass correlation coefficients. The intra-rater repeatability index values were 41% for SVC flow and 31% for RV outflow, with intraclass correlation coefficients indicating good agreement for both measures. The inter-rater repeatability index values for SVC flow and RV outflow were 63% and 51%, respectively, with intraclass correlation coefficients indicating moderate agreement for both measures. If SVC flow or RV outflow is used in the hemodynamic treatment of neonates, sequential measurements should ideally be performed by the same clinician to reduce potential variability. © 2018 by the American Institute of Ultrasound in Medicine.
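The repeatability coefficient (RC) mentioned above is commonly computed from duplicate measurements as 1.96 × √2 × s_w, where s_w is the within-subject standard deviation. For pairs, s_w² is the sum of squared paired differences divided by 2n. The abstract does not define the repeatability index, so the sketch below assumes one common convention: RC expressed as a percentage of the overall mean. The function name, units and data are hypothetical.

```python
import math
import statistics


def repeatability(first, second):
    """Return (RC, RC as % of overall mean) for duplicate measurements.

    first[i] and second[i] are two measurements of the same neonate
    (same observer for intra-rater, two observers for inter-rater).
    ASSUMPTION: repeatability index = 100 * RC / overall mean.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need non-empty paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    # Within-subject SD from duplicates: s_w^2 = sum(d^2) / (2n).
    s_w = math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))
    rc = 1.96 * math.sqrt(2) * s_w
    overall_mean = statistics.mean(list(first) + list(second))
    return rc, 100 * rc / overall_mean


# Hypothetical SVC flow values (mL/kg/min) measured twice in four neonates.
scan_1 = [80.0, 95.0, 110.0, 70.0]
scan_2 = [88.0, 90.0, 101.0, 76.0]
rc, index_pct = repeatability(scan_1, scan_2)
print(f"RC = {rc:.1f} mL/kg/min, repeatability index = {index_pct:.0f}%")
```

RC grows with s_w. A higher inter-rater than intra-rater index, as reported above, therefore means that switching clinicians adds variability. This is consistent with the recommendation that the same clinician perform sequential measurements.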
The inter-rater reliability and prognostic value of coma scales in Nepali children with acute encephalitis syndrome.

PubMed

Ray, Stephen; Rayamajhi, Ajit; Bonnett, Laura J; Solomon, Tom; Kneen, Rachel; Griffiths, Michael J

2018-02-01

Background: Acute encephalitis syndrome (AES) is a common cause of coma in Nepali children. The Glasgow coma scale (GCS) is used to assess the level of coma in these patients and to predict outcome. Alternative coma scales may have better inter-rater reliability and prognostic value in encephalitis in Nepali children, but this has not been studied. The Adelaide coma scale (ACS), Blantyre coma scale (BCS) and the Alert, Verbal, Pain, Unresponsive scale (AVPU) are alternatives to the GCS that can be used. Methods: Children aged 1-14 years who presented to Kanti Children's Hospital, Kathmandu, with AES between September 2010 and November 2011 were recruited. All four coma scales (GCS, ACS, BCS and AVPU) were applied on admission, 48 h later and on discharge. Inter-rater reliability (unweighted kappa) was measured for each. Correlation and agreement between total coma score and outcome (Liverpool outcome score) were measured using Spearman's rank correlation and a Bland-Altman plot. The prognostic value of the coma scales alone and in combination with physiological variables was investigated in a subgroup (n = 22). A multivariable logistic regression model was fitted using backward stepwise selection. Results: Fifty children were recruited. Inter-rater reliability of the various scales was fair to moderate. However, the scales poorly predicted clinical outcome. Combining the scales with physiological parameters such as systolic blood pressure improved outcome prediction. Conclusion: This is the first study to compare four coma scales in Nepali children with AES. The scales exhibited fair to moderate inter-rater reliability. However, the study is inadequately powered to answer the question of the relationship between coma scales and outcome. Further, larger studies are required.
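The study above used Spearman's rank correlation to relate total coma score to the Liverpool outcome score. Spearman's rho is the Pearson correlation computed on ranks, with tied values given the average of the ranks they span. Ordinal scales such as these produce many ties, so the tie handling matters. The following is a self-contained Python sketch; the function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when all values are tied")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical admission GCS totals and outcome scores for six children.
gcs = [3, 7, 9, 12, 15, 15]
outcome = [1, 3, 2, 4, 5, 4]
print(round(spearman_rho(gcs, outcome), 3))
```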
International inter-rater agreement in scoring acne severity utilizing cloud-based image sharing of mobile phone photographs.

PubMed

Foolad, Negar; Ornelas, Jennifer N; Clark, Ashley K; Ali, Ifrah; Sharon, Victoria R; Al Mubarak, Luluah; Lopez, Andrés; Alikhan, Ali; Al Dabagh, Bishr; Firooz, Alireza; Awasthi, Smita; Liu, Yu; Li, Chin-Shang; Sivamani, Raja K

2017-09-01

Cloud-based image sharing technology facilitates the sharing of images, but it has not been well studied for acne assessment or treatment preferences among international evaluators. We evaluated the inter-rater variability of acne grading and treatment recommendations among an international group of dermatologists who assessed photographs. This was a prospective, single-visit photographic study assessing inter-rater agreement on acne photographs shared through an integrated mobile device, cloud-based, HIPAA-compliant platform. Inter-rater agreement for global acne assessment and acne lesion counts was evaluated using Kendall's coefficient of concordance (KCC), while correlations between treatment recommendations and acne severity were calculated using Spearman's rank correlation coefficient. There was good agreement for the evaluation of inflammatory lesions (KCC = 0.62, P < 0.0001), noninflammatory lesions (KCC = 0.62, P < 0.0001), and the global acne grading system score (KCC = 0.69, P < 0.0001). Topical retinoid, oral antibiotic, and isotretinoin treatment preferences correlated with photograph-based acne severity. Our study supports the use of mobile phone photography and cloud-based image sharing for acne assessment. Cloud-based sharing may facilitate acne care and research among international collaborators. © 2017 The International Society of Dermatology.
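Kendall's coefficient of concordance (W) summarizes how consistently m raters rank the same n subjects. The calculation has three steps:

1. Convert each rater's scores to ranks.
2. Sum the ranks for each subject.
3. Compute W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of those rank sums from their mean.

W runs from 0 (no concordance) to 1 (identical rankings). The Python sketch below is a minimal illustration with hypothetical names and data. It omits the correction for tied ranks, so W is slightly underestimated when ties are present.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for m raters (rows) scoring the same n subjects (columns).

    No tie correction is applied.
    """
    m = len(ratings)
    n = len(ratings[0]) if ratings else 0
    if m < 2 or n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("need >= 2 raters and >= 2 subjects, all rows of length n")
    rank_rows = [average_ranks(r) for r in ratings]
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_sum = m * (n + 1) / 2  # expected rank sum per subject
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical global acne grades from three dermatologists for five photographs.
grades = [
    [1, 3, 2, 4, 5],
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
]
print(round(kendalls_w(grades), 3))
```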
Acamprosate and Baclofen were Not Effective in the Treatment of Pathological Gambling: Preliminary Blind Rater Comparison Study.

PubMed

Dannon, Pinhas N; Rosenberg, Oded; Schoenfeld, Netta; Kotler, Moshe

2011-01-01

Pathological gambling (PG) is a highly prevalent and disabling impulse control disorder. A range of psychopharmacological options are available for the treatment of PG, including selective serotonin reuptake inhibitors, opioid receptor antagonists, anti-addiction drugs, and mood stabilizers. In this preliminary study, we examined the efficacy of two anti-addiction drugs, baclofen and acamprosate, in the treatment of PG. Seventeen male gamblers were randomly divided into two groups. Each group received one of the two drugs, and patients were not blinded to treatment. All patients underwent a comprehensive psychiatric diagnostic evaluation and completed a series of semi-structured interviews. During the 6 months of the study, monthly evaluations were carried out to assess improvement and relapse. Relapse was defined as recurrent gambling behavior. None of the 17 patients achieved 6 months of abstinence. One patient receiving baclofen sustained abstinence for 4 months. Fourteen patients sustained abstinence for 1-3 months. Two patients stopped attending monthly evaluations. Baclofen and acamprosate did not prove effective in treating pathological gamblers.
Commitment to Change and Challenges to Implementing Changes After Workplace-Based Assessment Rater Training.

PubMed

Kogan, Jennifer R; Conforti, Lisa N; Yamazaki, Kenji; Iobst, William; Holmboe, Eric S

2017-03-01

Faculty development for clinical faculty who assess trainees is necessary to improve assessment quality and important for competency-based education. Little is known about what faculty plan to do differently after training. This study explored the changes faculty intended to make after workplace-based assessment rater training, their ability to implement change, predictors of change, and barriers encountered. In 2012, 45 outpatient internal medicine faculty preceptors (who supervised residents) from 26 institutions participated in rater training. They completed a commitment-to-change form listing up to five commitments and rated (on a 1-5 scale) their motivation for, and anticipated difficulty in, implementing each change. Three months later, participants were interviewed about their ability to implement change and the barriers they encountered. The authors used logistic regression to examine predictors of change. Of 191 total commitments, the most common focused on what faculty would change about their own teaching (57%) and on increasing direct observation (31%). Of the 183 commitments for which follow-up data were available, 39% were fully implemented, 40% were partially implemented, and 20% were not implemented. Lack of time/competing priorities was the most commonly cited barrier. Higher initial motivation (odds ratio [OR] 2.02; 95% confidence interval [CI] 1.14, 3.57) predicted change. As anticipated difficulty increased, implementation became less likely (OR 0.67; 95% CI 0.49, 0.93). While higher baseline motivation predicted change, multiple system-level barriers undermined the ability to implement change. Rater-training faculty development programs should address how faculty motivation and organizational barriers interact and influence the ability to change.
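Logistic-regression odds ratios like those above are usually reported with Wald-type confidence intervals built on the log scale, exp(β ± 1.96·SE). If these intervals were computed that way (an assumption, since the abstract does not say), two things follow. The point estimate should lie near the geometric mean of the two limits, and the standard error of log(OR) can be recovered from the interval width. The helper name below is hypothetical. The inputs are the values reported in the abstract, and because the published figures are rounded, the check is only approximate.

```python
import math


def check_log_scale_ci(lower, upper, z=1.96):
    """For a Wald CI exp(beta +/- z*SE), return (implied OR, implied SE of log OR).

    The implied OR is the geometric mean of the limits, i.e. exp of the
    midpoint of the interval on the log scale.
    """
    if not 0 < lower < upper:
        raise ValueError("limits must satisfy 0 < lower < upper")
    implied_or = math.sqrt(lower * upper)
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * z)
    return implied_or, se_log_or


# Reported: motivation OR 2.02 (95% CI 1.14, 3.57); difficulty OR 0.67 (0.49, 0.93).
for label, reported, low, high in [("motivation", 2.02, 1.14, 3.57),
                                   ("difficulty", 0.67, 0.49, 0.93)]:
    implied, se = check_log_scale_ci(low, high)
    print(f"{label}: reported {reported}, implied {implied:.2f}, SE(log OR) {se:.2f}")
```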
Implicit Binding of Facial Features During Change Blindness

PubMed Central

Lyyra, Pessi; Mäkelä, Hanna; Hietanen, Jari K.; Astikainen, Piia

2014-01-01

Change blindness refers to the inability to detect visual changes if they are introduced together with an eye movement, blink, flash of light, or distracting stimuli. Evidence of implicit detection of changed visual features during change blindness has been reported in a number of studies using both behavioral and neurophysiological measurements. However, it is not known whether implicit detection occurs only at the level of single features or whether complex organizations of features can be implicitly detected as well. We tested this in adult humans using intact and scrambled versions of schematic faces as stimuli in a change blindness paradigm while recording event-related potentials (ERPs). An enlargement of the face-sensitive N170 ERP component was observed at the right temporal electrode site to changes from scrambled to intact faces, even if the participants were not consciously able to report such changes (change blindness). Similarly, the disintegration of an intact face to scrambled features resulted in attenuated N170 responses during change blindness. Other ERP deflections were modulated by changes, but unlike the N170 component, they were indifferent to the direction of the change. The bidirectional modulation of the N170 component during change blindness suggests that implicit change detection can also occur at the level of complex features in the case of facial stimuli. PMID:24498165
Leveraging Data Sampling and Practical Knowledge: Field Instructors' Perceptions about Inter-Rater Reliability Data

ERIC Educational Resources Information Center

Soslau, Elizabeth; Lewis, Kandia

2014-01-01

For accreditation and programmatic decision making, education school administrators use inter-rater reliability analyses to judge credibility of student-teacher assessments. Although weak levels of agreement between university-appointed supervisors and cooperating teachers are usually interpreted to indicate that the process is not being…
Frame-of-Reference Training Effectiveness: Effects of Goal Orientation and Self-Efficacy on Affective, Cognitive, Skill-Based, and Transfer Outcomes

ERIC Educational Resources Information Center

Dierdorff, Erich C.; Surface, Eric A.; Brown, Kenneth G.

2010-01-01

Empirical evidence supporting frame-of-reference (FOR) training as an effective intervention for calibrating raters is convincing. Yet very little is known about who does better or worse in FOR training. We conducted a field study of how motivational factors influence affective, cognitive, and behavioral learning outcomes, as well as near transfer…
Objective measurements of excess skin in post bariatric patients--inter-rater reliability.

PubMed

Biörserud, Christina; Fagevik Olsén, Monika; Elander, Anna; Wiklund, Malin

2016-01-01

The ability to reliably assess excess skin after massive weight loss using well-described and transferable methods is important. The aim of this trial was to evaluate the inter-rater reliability of ptosis and circumference measurements in patients with excess skin after bariatric surgery. Twenty-five postbariatric patients were included in the study, and their excess skin was measured 18 months after surgery. A protocol was designed to measure excess skin in a standardised way. To evaluate the inter-rater reliability of the measuring protocol, each patient was measured twice, once by a specialist nurse and once by a specialist physiotherapist. All circumference measurements on different body parts had an ICC > 0.9, indicating high reliability. Furthermore, all breast and abdominal ptosis measurements had high reliability. In contrast, visual evaluation of abdominal ptosis had poor reliability. Measurements of ptoses on different body parts had an ICC > 0.6. There were no systematic differences between the results of the two testers, except for measurements of the buttocks and maximal knee circumference. The measuring protocol presented in this study has high reliability and therefore represents a useful instrument for providing a consistent and objective assessment of excess skin in the postbariatric patient.
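The abstract does not state which ICC form was used. One common choice when two raters measure every patient is the two-way random-effects, single-measure, absolute-agreement ICC (ICC(2,1) in Shrout and Fleiss's notation), computed from ANOVA mean squares. Unlike a correlation, this form is lowered by a systematic offset between raters, which links it to the check for systematic differences between the two testers. Below is a minimal Python sketch under that assumption; the names and data are hypothetical.

```python
import statistics


def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    data[i][j] is the measurement of subject i by rater j (no missing values).
    """
    n = len(data)
    k = len(data[0]) if data else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = statistics.mean([x for row in data for x in row])
    row_means = [statistics.mean(row) for row in data]
    col_means = [statistics.mean([row[j] for row in data]) for j in range(k)]
    # Partition the total sum of squares into subjects, raters and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical waist circumferences (cm): nurse vs. physiotherapist.
measurements = [[102.0, 103.5], [95.0, 94.0], [120.5, 121.0], [88.0, 89.5]]
print(round(icc_2_1(measurements), 3))
```

For example, if one rater reads exactly 1 cm higher on every patient, ICC(2,1) falls below 1 even though the Pearson correlation would still be perfect.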
Allocentric and contra-aligned spatial representations of a town environment in blind people.

PubMed

Chiesa, Silvia; Schmidt, Susanna; Tinti, Carla; Cornoldi, Cesare

2017-10-01

Evidence concerning the representation of space by blind individuals is still unclear: sometimes blind people behave as sighted people do, while at other times they have difficulties. A better understanding of blind people's difficulties, especially with reference to the strategies used to form the representation of the environment, may help to enhance knowledge of the consequences of the absence of vision. The present study examined the representation of the locations of landmarks of a real town by using pointing tasks that entailed either allocentric points of reference with mental rotations of different degrees, or contra-aligned representations. Results showed that, in general, people had difficulty when they had to point from a different perspective to aligned landmarks or from the original perspective to contra-aligned landmarks, but this difficulty was particularly evident for the blind. The examination of the strategies adopted to perform the tasks showed that only a small group of blind participants used a survey strategy and that this group performed better than people who adopted route or verbal strategies. Implications for understanding the consequences of the absence of visual experience for spatial cognition are discussed, focusing in particular on conceivable interventions. Copyright © 2017 Elsevier B.V. All rights reserved.
False predictions about the detectability of visual changes: the role of beliefs about attention, memory, and the continuity of attended objects in causing change blindness blindness.

PubMed

Levin, Daniel T; Drivdahl, Sarah B; Momen, Nausheen; Beck, Melissa R

2002-12-01

Recently, a number of experiments have emphasized the degree to which subjects fail to detect large changes in visual scenes. This finding, referred to as "change blindness," is often considered surprising because many people have the intuition that such changes should be easy to detect. Earlier work documented this intuition by showing that the majority of subjects believe they would notice changes that are actually very rarely detected. Thus subjects exhibit a metacognitive error we refer to as "change blindness blindness" (CBB). Here, we test whether CBB is caused by a misestimation of the perceptual experience associated with visual changes and show that it persists even when the pre- and postchange views are separated by long delays. In addition, subjects overestimate their change detection ability both when the relevant changes are illustrated by still pictures and when they are illustrated using videos showing the changes occurring in real time. We conclude that CBB is a robust phenomenon that cannot be accounted for by failure to understand the specific perceptual experience associated with a change. Copyright 2002 Elsevier Science (USA)
IRR (Inter-Rater Reliability) of a COP (Classroom Observation Protocol)--A Critical Appraisal

ERIC Educational Resources Information Center

Rui, Ning; Feldman, Jill M.

2012-01-01

Notwithstanding the broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item- and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded striving readers…
Does training improve diagnostic accuracy and inter-rater agreement in applying the Berlin radiographic definition of acute respiratory distress syndrome? A multicenter prospective study.

PubMed

Peng, Jin-Min; Qian, Chuan-Yun; Yu, Xiang-You; Zhao, Ming-Yan; Li, Shu-Sheng; Ma, Xiao-Chun; Kang, Yan; Zhou, Fa-Chun; He, Zhen-Yang; Qin, Tie-He; Yin, Yong-Jie; Jiang, Li; Hu, Zhen-Jie; Sun, Ren-Hua; Lin, Jian-Dong; Li, Tong; Wu, Da-Wei; An, You-Zhong; Ai, Yu-Hang; Zhou, Li-Hua; Cao, Xiang-Yuan; Zhang, Xi-Jing; Sun, Rong-Qing; Chen, Er-Zhen; Du, Bin

2017-01-20

Poor inter-rater reliability in chest radiograph interpretation has been reported in the context of acute respiratory distress syndrome (ARDS), although not for the Berlin definition of ARDS. We sought to examine the effect of training material on the accuracy and consistency of intensivists' chest radiograph interpretations for ARDS diagnosis. We conducted a rater agreement study in which 286 intensivists (residents 41.3%, junior attending physicians 35.3%, and senior attending physicians 23.4%) independently reviewed the same 12 chest radiographs developed by the ARDS Definition Task Force ("the panel") before and after training. Radiographic diagnoses by the panel were classified into the consistent (n = 4), equivocal (n = 4), and inconsistent (n = 4) categories and were used as a reference. The 1.5-hour training course attended by all 286 intensivists included an introduction to the diagnostic rationale and a subsequent in-depth discussion to reach consensus for all 12 radiographs. Overall diagnostic accuracy, which was defined as the percentage of chest radiographs that were interpreted correctly, improved but remained poor after training (42.0 ± 14.8% before training vs. 55.3 ± 23.4% after training, p < 0.001). Diagnostic sensitivity and specificity improved after training for all diagnostic categories (p < 0.001), with the exception of specificity for the equivocal category (p = 0.883). Diagnostic accuracy was higher for the consistent category than for the inconsistent and equivocal categories (p < 0.001). Comparisons of pre-training and post-training results revealed that inter-rater agreement was poor and did not improve after training, as assessed by overall agreement (0.450 ± 0.406 vs. 0.461 ± 0.575, p = 0.792), Fleiss's kappa (0.133 ± 0.575 vs. 0.178 ± 0.710, p = 0.405), and intraclass correlation coefficient (ICC; 0.219 vs. 0.276, p = 0.470). The radiographic diagnostic accuracy and
Inter-rater reliability of malaria parasite counts and comparison of methods

PubMed Central

2009-01-01

Background The introduction of artemisinin-based treatment for falciparum malaria has led to a shift away from symptom-based diagnosis. Diagnosis may be achieved by using rapid non-microscopic diagnostic tests (RDTs), of which many are available. Light microscopy, however, has a central role in parasite identification and quantification, remains the main method of parasite-based diagnosis in clinic and hospital settings, and is necessary for monitoring the accuracy of RDTs. The World Health Organization has prepared a proficiency testing panel containing a range of malaria-positive blood samples of known parasitaemia, to be used for the assessment of commercially available malaria RDTs. Different blood film and counting methods may be used for this purpose, which raises questions regarding accuracy and reproducibility. A comparison was made of the established methods for parasitaemia estimation to determine which would give the least inter-rater and inter-method variation. Methods Experienced malaria microscopists counted asexual parasitaemia on different slides using three methods: the thin film method using the total erythrocyte count, the thick film method using the total white cell count, and the Earle and Perez method. All the slides were stained using Giemsa pH 7.2. Analysis of variance (ANOVA) models were used to estimate the inter-rater reliability of the different methods. The paired t-test was used to assess any systematic bias between the two methods, and a regression analysis was used to see if there was a changing bias with parasite count level. Results The thin blood film gave parasite counts around 30% higher than those obtained by the thick film and Earle and Perez methods, but exhibited a loss of sensitivity with low parasitaemia. The thick film and Earle and Perez methods showed little or no bias in counts between the two methods; however, estimated inter-rater reliability was slightly better for the thick film method. Conclusion The thin film

Ethnicity and Deprivation are Associated With Blindness Among Adults With Primary Glaucoma in Nigeria: Results From the Nigeria National Blindness and Visual Impairment Survey.

PubMed

Kyari, Fatima; Wormald, Richard; Murthy, Gudlavalleti V S; Evans, Jennifer R; Gilbert, Clare E

2016-10-01

We explored the risk factors for glaucoma blindness among adults aged 40 years and above with primary glaucoma in Nigeria. A total of 13,591 participants aged 40 years and above were examined in the Nigeria Blindness Survey; 682 (5.02%; 95% CI, 4.60%-5.47%) had glaucoma by ISGEO criteria. This was a case-control study (n=890 eyes of 629 persons): persons blind from glaucoma were cases and persons with glaucoma who were not blind were controls. Education and occupation were used to determine socioeconomic status scores, which were divided into 3 tertiles (affluent, medium, deprived). We assessed sociodemographic, biophysical, and ocular factors by logistic regression analysis for association with glaucoma blindness. Multinomial regression analysis was also performed with nonglaucoma as the reference category. A total of 119/629 (18.9%; 95% CI, 15.9%-22.4%) persons were blind in both eyes; 510 were controls. There was interethnic variation in odds of blindness; age, male sex, socioeconomic status, prior diagnosis of glaucoma, hypertension, intraocular pressure, and lens opacity were associated with glaucoma blindness. Axial length, mean ocular perfusion pressure, and angle-closure glaucoma were associated with blind glaucoma eyes. In multivariate analysis, Igbo ethnicity (OR=2.79; 95% CI, 1.03-7.57), male sex (OR=4.59; 95% CI, 1.73-12.16), and being unmarried (OR=2.50; 95% CI, 1.03-6.07) were associated with higher risk. Deprivation (OR=3.57; 95% CI, 1.46-8.72), prior glaucoma diagnosis (OR=5.89; 95% CI, 1.79-19.40), and intraocular pressure (OR=1.07; 95% CI, 1.04-1.09) were also independent risk factors for glaucoma blindness. Approximately 1 in 5 people with primary glaucoma were blind. Male sex, ethnicity and deprivation were strongly associated with blindness. Services for glaucoma need to improve in Nigeria, focusing on poor communities and men.
Inter-Rater Reliability of Total Body Score-A Scale for Quantification of Corpse Decomposition.

PubMed

Nawrocka, Marta; Frątczak, Katarzyna; Matuszewski, Szymon

2016-05-01

The degree of body decomposition can be quantified using Total Body Score (TBS), a scale frequently used in taphonomic or entomological studies of decomposition. Here, the inter-rater reliability of the scale is analyzed. The study involved 120 laypersons who were trained in the use of the scale. Participants scored decomposition of pig carcasses from photographs. The scale gave homogeneous results when used by different people, irrespective of the users' qualifications (Krippendorff's alpha for all participants was 0.818). The study also indicated that carcasses in advanced decomposition receive significantly less accurate scores. Moreover, scores for cadavers in mosaic decomposition (i.e., showing signs of at least two stages of decomposition) were less accurate. These results demonstrate that the scale may be regarded as inter-rater reliable. Some proposals for refining the scale are also discussed. © 2016 American Academy of Forensic Sciences.
Evaluation by Native and Non-Native English Teacher-Raters of Japanese Students' Summaries

ERIC Educational Resources Information Center

Hijikata-Someya, Yuko; Ono, Masumi; Yamanishi, Hiroyuki

2015-01-01

Although the importance of summary writing is well documented in prior studies, few have investigated the evaluation of written summaries. Due to the complex nature of L2 summary writing, which requires one to read the original material and summarize its content in the L2, raters often emphasize different features when judging the quality of L2…
Intra- and inter-rater reliability of 3D passive intervertebral motion in subjects with nonspecific neck pain assessed by physical therapy students: A pilot study.

PubMed

Rossettini, Giacomo; Rondoni, Angie; Lovato, Tommaso; Strobe, Marco; Verzè, Elisa; Vicentini, Marco; Testa, Marco

2016-06-03

Passive Intervertebral Movements (PIVMs) are commonly used to assess and treat patients with nonspecific neck pain. Very few studies have investigated 3D movements to date. This study assessed intra- and inter-rater reliability of three-dimensional (3D) cervical PIVMs performed by physical therapy students in patients with nonspecific neck pain. Thirty-one patients, mean age 47.2 ± 7.2 years, were independently evaluated by 2 physical therapy students. The raters (A and B) assessed mobility, end-feel and pain provocation by performing the 3D cervical segmental side-bending test (3D CSSB) bilaterally from levels C2-C3 to C6-C7. Percentage agreement (raw, positive and negative), Cohen's kappa (95% CI), prevalence index and bias index were calculated to estimate intra- and inter-rater reliability. Intra-rater reliability showed kappa values ranging between fair and substantial (κ 0.29-0.80) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 90%. Inter-rater reliability presented kappa values ranging between fair and substantial (κ 0.22-0.62) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 80%. Intra-rater reliability of 3D PIVMs was superior to inter-rater reliability in patients with nonspecific neck pain. The most repeatable evaluation parameter was pain. However, the overall poor reliability suggests avoiding the use of these techniques alone to examine patients and measure their outcome. Further studies are needed to investigate PIVM reliability in combination with other assessment procedures in symptomatic patients.
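The record above lists several agreement statistics for binary findings. The sketch below shows how they are conventionally computed from a 2×2 table of two raters' judgements. The abstract gives no formulas, so the definitions used here are an assumption. For the prevalence and bias indices they follow Byrt, Bishop and Carlin (1993): PI = |a − d| / n and BI = |b − c| / n. The function name is hypothetical, and the counts in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryAgreement:
    raw: float               # observed proportion of agreement
    positive: float          # positive specific agreement
    negative: float          # negative specific agreement
    kappa: float             # Cohen's kappa
    prevalence_index: float  # |a - d| / n
    bias_index: float        # |b - c| / n


def binary_agreement(a: int, b: int, c: int, d: int) -> BinaryAgreement:
    """Agreement statistics for two raters on one binary finding.

    a: both raters positive          b: rater A positive, rater B negative
    c: rater A negative, B positive  d: both raters negative
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    p_obs = (a + d) / n
    # Chance agreement from each rater's marginal rate of positive calls
    pa_pos, pb_pos = (a + b) / n, (a + c) / n
    p_exp = pa_pos * pb_pos + (1 - pa_pos) * (1 - pb_pos)
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else float("nan")
    pos = 2 * a / (2 * a + b + c) if a + b + c else float("nan")
    neg = 2 * d / (2 * d + b + c) if d + b + c else float("nan")
    return BinaryAgreement(p_obs, pos, neg, kappa, abs(a - d) / n, abs(b - c) / n)


# Invented counts for illustration only (not data from the study).
stats = binary_agreement(a=20, b=5, c=3, d=12)
print(round(stats.kappa, 3), stats.prevalence_index, stats.bias_index)  # 0.584 0.2 0.05
```

Reporting PI and BI next to kappa helps the reader judge whether a modest kappa reflects unbalanced prevalence or systematic rater bias rather than poor agreement as such.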
Memory impairment is not sufficient for choice blindness to occur.

PubMed

Sagana, Anna; Sauerland, Melanie; Merckelbach, Harald

2014-01-01

Choice blindness refers to the phenomenon that people can be easily misled about the choices they made in the recent past. The aim of this study was to explore the cognitive mechanisms underlying choice blindness. Specifically, we tested whether memory impairment may account for choice blindness. A total of N = 88 participants provided sympathy ratings on 10-point scales for 20 female faces. Subsequently, participants gave reasons for some of their ratings. However, on three trials, they were presented with sympathy ratings that deviated from their original ratings by three full scale points. On nearly 41% of the trials, participants failed to detect the manipulation (i.e., were blind to it). After a short interval, participants were informed that some trials had been manipulated and were asked to recall their original ratings. Participants adopted the manipulated outcome in only 3% of the trials. Furthermore, the extent to which the original ratings were accurately remembered was not higher for detected as compared with non-detected trials. From a theoretical point of view, our findings indicate that memory impairment does not fully account for blindness phenomena.
Memory impairment is not sufficient for choice blindness to occur

PubMed Central

Sagana, Anna; Sauerland, Melanie; Merckelbach, Harald

2014-01-01

Choice blindness refers to the phenomenon that people can be easily misled about the choices they made in the recent past. The aim of this study was to explore the cognitive mechanisms underlying choice blindness. Specifically, we tested whether memory impairment may account for choice blindness. A total of N = 88 participants provided sympathy ratings on 10-point scales for 20 female faces. Subsequently, participants gave reasons for some of their ratings. However, on three trials, they were presented with sympathy ratings that deviated from their original ratings by three full scale points. On nearly 41% of the trials, participants failed to detect the manipulation (i.e., were blind to it). After a short interval, participants were informed that some trials had been manipulated and were asked to recall their original ratings. Participants adopted the manipulated outcome in only 3% of the trials. Furthermore, the extent to which the original ratings were accurately remembered was not higher for detected as compared with non-detected trials. From a theoretical point of view, our findings indicate that memory impairment does not fully account for blindness phenomena. PMID:24904467
Learning, Behaviour and Reaction Framework: A Model for Training Raters to Improve Assessment Quality

ERIC Educational Resources Information Center

Chen, Chung-Yang; Chang, Huiju; Hsu, Wen-Chin; Sheen, Gwo-Ji

2017-01-01

This paper proposes a training model for raters, with the goal of improving the intra- and inter-rater consistency of evaluation quality for higher education curricula. The model, termed the learning, behaviour and reaction (LBR) circular training model, is an interdisciplinary application from the business and organisational training domain. The…
Reliability of the standard goniometry and diagrammatic recording of finger joint angles: a comparative study with healthy subjects and non-professional raters.

PubMed

Macionis, Valdas

2013-01-09

Diagrammatic recording of finger joint angles by using two criss-crossed paper strips can be a quick substitute for standard goniometry. As a preliminary step toward clinical validation of the diagrammatic technique, the current study employed healthy subjects and non-professional raters to explore whether reliability estimates of diagrammatic goniometry are comparable with those of the standard procedure. The study included two procedurally different parts, which were replicated by assigning 24 medical students to act interchangeably as 12 subjects and 12 raters. A larger component of the study was designed to compare goniometers side-by-side in measurement of finger joint angles varying from subject to subject. In the rest of the study, the instruments were compared by parallel evaluations of joint angles similar for all subjects in a situation of simulated change of joint range of motion over time. The subjects used special guides to position the joints of their left ring finger at varying angles of flexion and extension. The obtained diagrams of joint angles were converted to numerical values by computerized measurements. The statistical approaches included calculation of appropriate intraclass correlation coefficients, standard errors of measurement, proportions of measurement differences of 5 degrees or less, and significant differences between paired observations. Reliability estimates were similar for both goniometers. Intra-rater and inter-rater intraclass correlation coefficients ranged from 0.69 to 0.93. The corresponding standard errors of measurement ranged from 2.4 to 4.9 degrees. Repeated measurements of a considerable number of raters fell within clinically non-meaningful 5 degrees of each other in proportions comparable with a criterion value of 0.95. Data collected with both instruments could be similarly interpreted in a simulated situation of change of joint range of motion over time. The paper goniometer and the standard goniometer can
Reliability of the standard goniometry and diagrammatic recording of finger joint angles: a comparative study with healthy subjects and non-professional raters

PubMed Central

2013-01-01

Background Diagrammatic recording of finger joint angles by using two criss-crossed paper strips can be a quick substitute for standard goniometry. As a preliminary step toward clinical validation of the diagrammatic technique, the current study employed healthy subjects and non-professional raters to explore whether reliability estimates of diagrammatic goniometry are comparable with those of the standard procedure. Methods The study included two procedurally different parts, which were replicated by assigning 24 medical students to act interchangeably as 12 subjects and 12 raters. A larger component of the study was designed to compare goniometers side-by-side in measurement of finger joint angles varying from subject to subject. In the rest of the study, the instruments were compared by parallel evaluations of joint angles similar for all subjects in a situation of simulated change of joint range of motion over time. The subjects used special guides to position the joints of their left ring finger at varying angles of flexion and extension. The obtained diagrams of joint angles were converted to numerical values by computerized measurements. The statistical approaches included calculation of appropriate intraclass correlation coefficients, standard errors of measurement, proportions of measurement differences of 5 degrees or less, and significant differences between paired observations. Results Reliability estimates were similar for both goniometers. Intra-rater and inter-rater intraclass correlation coefficients ranged from 0.69 to 0.93. The corresponding standard errors of measurement ranged from 2.4 to 4.9 degrees. Repeated measurements of a considerable number of raters fell within clinically non-meaningful 5 degrees of each other in proportions comparable with a criterion value of 0.95. Data collected with both instruments could be similarly interpreted in a simulated situation of change of joint range of motion over time. Conclusions The paper
Evaluation of previously embolized intracranial aneurysms: inter- and intra-rater reliability among neurosurgeons and interventional neuroradiologists.

PubMed

Zuckerman, Scott L; Lakomkin, Nikita; Magarik, Jordan A; Vargas, Jan; Stephens, Marcus; Akinpelu, Babatunde; Spiotta, Alejandro M; Ahmed, Azam; Arthur, Adam S; Fiorella, David; Hanel, Ricardo; Hirsch, Joshua A; Hui, Ferdinand K; James, Robert F; Kallmes, David F; Meyers, Philip M; Niemann, David B; Rasmussen, Peter; Turner, Raymond D; Welch, Babu G; Mocco, J

2018-05-01

The angiographic evaluation of previously coiled aneurysms can be difficult yet remains critical for determining re-treatment. The main objective of this study was to determine the inter-rater reliability for both the Raymond Scale and per cent embolization among a group of neurointerventionalists evaluating previously embolized aneurysms. A panel of 15 neurointerventionalists examined 92 distinct cases of immediate post-coil embolization and 1 year post-embolization angiograms. Each case was presented four times throughout the study, with altered demographics, in order to evaluate intra-rater reliability. All respondents were asked to provide the per cent embolization (0-100%) and Raymond Scale grade (1-3) for each aneurysm. Inter-rater reliability was evaluated by computing weighted kappa values (for the Raymond Scale) and intraclass correlation coefficients (ICC) for per cent embolization. Ten neurosurgeons and 5 interventional neuroradiologists evaluated 368 simulated cases. The agreement among all readers employing the Raymond Scale was fair (κ=0.35) while concordance in per cent embolization was good (ICC=0.64). Clinicians with fewer than 10 years of experience demonstrated a significantly greater level of agreement than those with more than 10 years (κ=0.39 and ICC=0.70 vs κ=0.28 and ICC=0.58). When the same aneurysm was presented multiple times, clinicians demonstrated excellent consistency when assessing per cent embolization (ICC=0.82), but moderate agreement when employing the Raymond classification (κ=0.58). Identifying the per cent embolization in previously coiled aneurysms resulted in good inter- and intra-rater agreement, regardless of years of experience. The strong agreement among providers employing per cent embolization may make it a valuable tool for embolization assessment in this patient population. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights
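The study scored the ordinal Raymond Scale (grades 1-3) with weighted kappa, but the abstract does not say which weights were used. The sketch below is for a single pair of raters. It assumes linear disagreement weights, with quadratic weights offered as an option. The function name and the toy ratings are hypothetical. Pooling 15 readers, as the study did, would need a multi-rater extension that is not shown here.

```python
from typing import Sequence


def weighted_kappa(
    r1: Sequence,
    r2: Sequence,
    categories: Sequence,
    weighting: str = "linear",
) -> float:
    """Cohen's weighted kappa for two raters on an ordered scale.

    Disagreement weights are |i - j| / (k - 1) (linear) or their square
    (quadratic); kappa_w = 1 - sum(w * observed) / sum(w * expected).
    """
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k, n = len(categories), len(r1)
    if k < 2 or n == 0 or n != len(r2):
        raise ValueError("need >= 2 categories and two equal-length, non-empty rating lists")
    idx = {c: i for i, c in enumerate(categories)}

    # Observed joint counts of (rater 1 grade, rater 2 grade)
    counts = [[0] * k for _ in range(k)]
    for x, y in zip(r1, r2):
        counts[idx[x]][idx[y]] += 1
    row = [sum(counts[i]) / n for i in range(k)]                       # rater 1 marginals
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]  # rater 2 marginals

    def w(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed = sum(w(i, j) * counts[i][j] / n for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        return float("nan")  # e.g. both raters used the same single grade
    return 1 - observed / expected


# Hypothetical Raymond grades from two readers for six aneurysms
reader_a = [1, 1, 2, 2, 3, 3]
reader_b = [1, 2, 2, 3, 3, 1]
print(round(weighted_kappa(reader_a, reader_b, categories=[1, 2, 3]), 3))  # 0.25
```

Weighting means that a 1-versus-3 disagreement counts twice as much as a 1-versus-2 disagreement. This is why weighted kappa is usually preferred over plain kappa for ordinal grades such as the Raymond Scale.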
Inter-rater agreement between trachoma graders: comparison of grades given in field conditions versus grades from photographic review

PubMed Central

Gebresillasie, Sintayehu; Tadesse, Zerihun; Shiferaw, Ayalew; Yu, Sun N.; Stoller, Nicole E.; Zhou, Zhaoxia; Emerson, Paul M.; Gaynor, Bruce D.; Lietman, Thomas M.; Keenan, Jeremy D.

2016-01-01

Purpose Trachoma surveillance is most commonly performed by direct observation, usually by non-ophthalmologists using the World Health Organization (WHO) simplified grading system. However, conjunctival photographs may offer several benefits over direct clinical observation, including the potential for greater inter-rater agreement. This study assesses whether inter-rater agreement of trachoma grading differs when trained graders review conjunctival photographs versus when they perform conjunctival examinations in the field. Methods Three trained trachoma graders each performed an independent examination of the everted right tarsal conjunctiva of 269 children aged 0-9 years, and then reviewed photographs of these same conjunctivae in a random order. For each eye, the grader documented the presence or absence of follicular trachoma (TF) and intense trachomatous inflammation (TI) according to the WHO simplified grading system. Results Inter-rater agreement for grade of TF was significantly higher in the field (κ=0.73, 95% confidence interval [CI] 0.67-0.80) than by photographic review (κ=0.55, 95% CI 0.49-0.63; difference in κ between field grading and photo grading 0.18, 95% CI 0.09-0.26). When field and photographic grades were each assessed as the consensus grade from the 3 graders, agreement between in-field and photographic graders was high for TF (κ=0.75, 95% CI 0.68-0.84). Conclusions In an area with hyperendemic trachoma, inter-rater agreement was lower for photographic assessment of trachoma than for in-field assessment. However, the trachoma grade reached by a consensus of photographic graders agreed well with the grade given by a consensus of in-field graders. PMID:26158573
Unified Parkinson’s Disease Rating Scale-Motor Exam: Inter-rater reliability of advanced practice nurse and neurologist assessments

PubMed Central

Palmer, Janice L.; Coats, Mary A.; Roe, Catherine M.; Hanko, Shelly M.; Xiong, Chengjie; Morris, John C.

2010-01-01

Aim This paper is a report of a study to establish the inter-rater reliability of advanced practice nurse and neurologist neurological assessments which included ratings with the Unified Parkinson’s Disease Rating Scale-Motor Exam. Background Around the world, advanced practice nurses are performing tasks once completed by only physicians. To promote consumer and provider confidence, it is important to establish that nurse and physician ratings using assessment tools are similar. In addition in research settings, when different raters are used, establishment of inter-rater reliability for study assessments is needed. Method Advanced practice nurses and neurologists independently recorded findings on neurological examinations of 46 participants in a study conducted between August 2007 and January 2008. An intraclass correlation coefficient was calculated to estimate overall agreement between the nurse and neurologist ratings. Agreement for individual items measured on a dichotomous scale was assessed by calculating Cohen’s kappa. Results There was substantial agreement between advanced practice nurses and neurologists on the mean Unified Parkinson’s Disease Rating Scale-Motor Exam ratings (intraclass correlation coefficient = 0.65) and the U.S. National Alzheimer’s Coordinating Center Uniform Data Set neurological examination ratings of unremarkable findings (kappa = 0.74) and of gait disorder (kappa = 0.73). Moderate agreement (kappa = 0.53) was reached for the rating of whether all Unified Parkinson’s Disease Rating Scale-Motor Exam items were normal. Conclusion These findings are consistent with studies of the inter-rater agreement of the Unified Parkinson’s Disease Rating Scale-Motor Exam and support the conduct of neurological assessments by advanced practice nurses. PMID:20546368
Sex Education for Deaf-Blind Youths and Adults.

ERIC Educational Resources Information Center

Ingraham, Cynthia L.; Vernon, McCay; Clemente, Brenda; Olney, Linda

2000-01-01

This article describes a model sex education program developed by the Helen Keller National Center for Deaf-Blind Youths and Adults for youths and adults who are deafblind. It also discusses major related issues and presents general recommendations and a resource for further information. (Contains 11 references.) (Author/CR)
The pattern of childhood blindness in Karnataka, South India.

PubMed

Gogate, Parikshit; Kishore, H; Dole, Kuldeep; Shetty, Jyoti; Gilbert, Clare; Ranade, Satish; Kumar, Mohan; Srihari; Deshpande, Madan

2009-01-01

To determine the causes of severe visual impairment and blindness in children in schools for the blind in southern Karnataka state of India. Children aged less than 16 years with a visual acuity of < 6/60 in the better eye who attended residential schools for the blind in Karnataka were examined in 2005-2006. History taking, visual acuity estimation, external ocular examination, retinoscopy, and fundoscopy were done on all students. Refraction and low vision work-up were done where indicated. The anatomical and etiological causes of severe visual impairment (< 6/60-3/60) and blindness (< 3/60 in the better eye) were classified using the World Health Organization's prevention of blindness programs' record system. A total of 1,179 students were examined, 891 of whom fulfilled the eligibility criteria. The major anatomical sites of visual loss were congenital anomalies (microphthalmos, anophthalmos) (321, 35.7%), corneal conditions (mainly scarring due to vitamin A deficiency, measles, trauma) (133, 14.9%), cataract or aphakia (102, 11.4%), and retinal disorders (mainly dystrophies) (177, 19.9%). Over one-quarter of children (27.8%) were blind from conditions that could have been prevented or treated, 87 of whom were referred for surgery. Low vision devices improved near acuity in 27 children (3%), and 43 (4.8%) benefited from refraction. Congenital anomalies, cataract, and retinal conditions account for most of the blindness in children.
ICT in Portuguese Reference Schools for the Education of Blind and Partially Sighted Students

ERIC Educational Resources Information Center

Ramos, Sara Isabel Moca; de Andrade, António Manuel Valente

2016-01-01

Technology has become an essential component in our society and considering its impact in the educational system, Information and Communication Technologies (ICT) cannot be dissociated from the educational process and, in particular, from pedagogical practices adopted for students who are blind or partially sighted. This study focuses on…
Inter-rater reliability of PATH observations for assessment of ergonomic risk factors in hospital work.

PubMed

Park, Jung-Keun; Boyer, Jon; Tessler, Jamie; Casey, Jeffrey; Schemm, Linda; Gore, Rebecca; Punnett, Laura

2009-07-01

This study examined the inter-rater reliability of expert observations of ergonomic risk factors by four analysts. Ten jobs were observed at a hospital using a newly expanded version of the PATH method (Buchholz et al. 1996), to which selected upper extremity exposures had been added. Two of the four raters simultaneously observed each worker onsite for a total of 443 observation pairs containing 18 categorical exposure items each. For most exposure items, kappa coefficients were 0.4 or higher. For some items, agreement was higher both for the jobs with less rapid hand activity and for the analysts with a higher level of ergonomic job analysis experience. These upper extremity exposures could be characterised reliably with real-time observation, given adequate experience and training of the observers. The revised version of PATH is applicable to the analysis of jobs where upper extremity musculoskeletal strain is of concern.
The Berg Balance Scale has high intra- and inter-rater reliability but absolute reliability varies across the scale: a systematic review.

PubMed

Downs, Stephen; Marquez, Jodie; Chiarelli, Pauline

2013-06-01

Question: What is the intra-rater and inter-rater relative reliability of the Berg Balance Scale? What is the absolute reliability of the Berg Balance Scale? Does the absolute reliability of the Berg Balance Scale vary across the scale? Design: Systematic review with meta-analysis of reliability studies. Participants: Any clinical population that has undergone assessment with the Berg Balance Scale. Outcome measures: Relative intra-rater reliability, relative inter-rater reliability, and absolute reliability. Results: Eleven studies involving 668 participants were included in the review. The relative intra-rater reliability of the Berg Balance Scale was high, with a pooled estimate of 0.98 (95% CI 0.97 to 0.99). Relative inter-rater reliability was also high, with a pooled estimate of 0.97 (95% CI 0.96 to 0.98). A ceiling effect of the Berg Balance Scale was evident for some participants. In the analysis of absolute reliability, all of the relevant studies had an average score of 20 or above on the 0 to 56 point Berg Balance Scale. The absolute reliability across this part of the scale, as measured by the minimal detectable change with 95% confidence, varied between 2.8 points and 6.6 points. The Berg Balance Scale has a higher absolute reliability when close to 56 points due to the ceiling effect. We identified no data that estimated the absolute reliability of the Berg Balance Scale among participants with a mean score below 20 out of 56. Conclusion: The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects. The review was only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance. Copyright © 2013 Australian Physiotherapy Association. All rights reserved.
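The minimal detectable change quoted above (2.8 to 6.6 points) is commonly derived from a reliability coefficient and the sample standard deviation via the standard error of measurement. The sketch below assumes the usual formulation SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; the review may have obtained its values differently, and the inputs shown are hypothetical, not taken from the included studies.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd: float, icc: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM.

    The sqrt(2) term reflects measurement error in both the test and the retest.
    """
    return 1.96 * math.sqrt(2.0) * sem_from_icc(sd, icc)


# Hypothetical inputs (not from the review): SD = 5 points, ICC = 0.97
print(round(mdc95(5.0, 0.97), 2))  # prints 2.4
```

Because SEM scales with the standard deviation, a high ICC does not by itself guarantee a small MDC: doubling SD to 10 points with the same ICC doubles the MDC to about 4.8 points.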
The Influence of Skin Redness on Blinding in Transcranial Direct Current Stimulation Studies: A Crossover Trial.

PubMed

Ezquerro, Fernando; Moffa, Adriano H; Bikson, Marom; Khadka, Niranjan; Aparicio, Luana V M; de Sampaio-Junior, Bernardo; Fregni, Felipe; Bensenor, Isabela M; Lotufo, Paulo A; Pereira, Alexandre Costa; Brunoni, Andre R

2017-04-01

To evaluate whether, and to what extent, skin redness (erythema) affects investigator blinding in transcranial direct current stimulation (tDCS) trials. Twenty-six volunteers received sham and active tDCS, which was applied with saline-soaked sponges of different thicknesses. High-resolution skin images, taken before and 5, 15, and 30 min after stimulation, were randomized and presented to experienced raters, who evaluated erythema intensity and judged the likely stimulation condition (sham vs. active). In addition, semi-automated image processing generated probability heatmaps and the surface area covered by erythema. Adverse events were also collected. Erythema was present, but less intense in sham than in active groups. Erythema intensity was inversely associated with correct allocation to the sham group and directly associated with correct allocation to the active group. Our image analyses found that erythema also occurs after sham and that its distribution is homogeneous below the electrodes. Tingling was more frequent with thin than with thick sponges, whereas erythema was more intense under thick sponges. Optimal investigator blinding is achieved when erythema after tDCS is mild. Erythema distribution under the electrode is patchy, occurs after sham tDCS and varies according to sponge thickness. We discuss methods to address skin erythema-related tDCS unblinding. © 2016 International Neuromodulation Society.
Measurement of the Inter-Rater Reliability Rate Is Mandatory for Improving the Quality of a Medical Database: Experience with the Paulista Lung Cancer Registry.

PubMed

Lauricella, Leticia L; Costa, Priscila B; Salati, Michele; Pego-Fernandes, Paulo M; Terra, Ricardo M

2018-06-01

Database quality measurement should be considered a mandatory step to ensure an adequate level of confidence in data used for research and quality improvement. Several metrics have been described in the literature, but no standardized approach has been established. We aimed to describe a methodological approach applied to measure the quality and inter-rater reliability of a regional multicentric thoracic surgical database (Paulista Lung Cancer Registry). Data from the first 3 years of the Paulista Lung Cancer Registry underwent an audit process with 3 metrics: completeness, consistency, and inter-rater reliability. The first 2 methods were applied to the whole data set, and the last method was calculated using 100 cases randomized for direct auditing. Inter-rater reliability was evaluated using percentage of agreement between the data collector and auditor and through calculation of Cohen's κ and intraclass correlation. The overall completeness per section ranged from 0.88 to 1.00, and the overall consistency was 0.96. Inter-rater reliability showed many variables with high disagreement (>10%). For numerical variables, intraclass correlation was a better metric than percentage agreement. Cohen's κ showed that most variables had moderate to substantial agreement. The methodological approach applied to the Paulista Lung Cancer Registry showed that completeness and consistency metrics did not sufficiently reflect the real quality status of a database. The inter-rater reliability associated with κ and intraclass correlation was a better quality metric than completeness and consistency metrics because it could determine the reliability of specific variables used in research or benchmark reports. This report can be a paradigm for future studies of data quality measurement. Copyright © 2018 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
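The registry audit contrasts raw percentage agreement with Cohen's κ, which discounts the agreement expected by chance from each rater's marginal frequencies. A minimal sketch for one categorical variable, assuming two aligned lists of codes (data collector and auditor); the function names and example values are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of cases on which the two raters recorded the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equally long rating lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters."""
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: sum over categories of the product of marginal proportions
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical audit of one yes/no variable across 10 cases
collector = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
auditor = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
print(percent_agreement(collector, auditor), round(cohens_kappa(collector, auditor), 2))  # prints 0.8 0.58
```

In this made-up example the raters agree on 80% of cases, but because each codes "yes" 60% of the time, chance agreement is already 52%, so κ is only about 0.58.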
Analyses of inter-rater reliability between professionals, medical students and trained school children as assessors of basic life support skills.

PubMed

Beck, Stefanie; Ruhnke, Bjarne; Issleib, Malte; Daubmann, Anne; Harendza, Sigrid; Zöllner, Christian

2016-10-07

Training of lay rescuers is essential to improve survival rates after cardiac arrest. Multiple campaigns emphasise the importance of basic life support (BLS) training for school children. Training requires a valid assessment to give feedback to school children and to compare the outcomes of different training formats. Considering these requirements, we developed an assessment of BLS skills using MiniAnne and tested the inter-rater reliability between professionals, medical students and trained school children as assessors. Fifteen professional assessors, 10 medical students and 111 trained school children (peers) assessed 1087 school children at the end of a CPR-training event using the new assessment format. Inter-rater reliability was analysed using the intraclass correlation coefficient (ICC). Overall inter-rater reliability of the summative assessment was high (ICC = 0.84, 95% CI: 0.84 to 0.86, n = 889). The numbers of comparisons between peer-peer assessors (n = 303), peer-professional assessors (n = 339), and peer-student assessors (n = 191) were adequate to demonstrate high inter-rater reliability between peer and professional assessors (ICC: 0.76), peer and student assessors (ICC: 0.88) and peer and other peer assessors (ICC: 0.91). Systematic variation in the rating of three specific items was observed between professional and peer assessors. Using this assessment and integrating peers and medical students as assessors makes it possible to assess the hands-on skills of school children with high reliability.
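The abstract reports ICCs without stating which form was used. As an illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-rater coefficient, often written ICC(2,1) after Shrout and Fleiss, from ANOVA mean squares on a complete subjects × raters matrix. The score matrix is hypothetical, and designs in which each subject is scored by a different set of assessors generally call for a one-way model instead.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score given to subject i by rater j; the matrix must be
    complete (every rater scores every subject).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical scores: 6 subjects (rows) rated by 4 assessors (columns)
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))  # prints 0.29
```

The low value here comes largely from systematic differences between assessors (the second column is consistently lower); a consistency-type ICC, which ignores such offsets, is about 0.71 for the same data.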

Blind Quantum Signature with Blind Quantum Computation

NASA Astrophysics Data System (ADS)

Li, Wei; Shi, Ronghua; Guo, Ying

2017-04-01

Blind quantum computation allows a client without quantum abilities to interact with a quantum server to perform an unconditionally secure computing protocol while protecting the client's privacy. Motivated by the confidentiality of blind quantum computation, a blind quantum signature scheme with a laconic structure is designed. Unlike traditional signature schemes, the signing and verifying operations are performed through measurement-based quantum computation. Inputs of blind quantum computation are securely controlled with multi-qubit entangled states. The unique signature of the transmitted message is generated by the signer without leaking information over imperfect channels. The receiver, in turn, can verify the validity of the signature using the quantum matching algorithm. Security is guaranteed by the entanglement of the quantum system used for blind quantum computation. The scheme offers a potential practical application for e-commerce in cloud computing and first-generation quantum computation.
Emotional Bias in Classroom Observations: Within-Rater Positive Emotion Predicts Favorable Assessments of Classroom Quality

ERIC Educational Resources Information Center

Floman, James L.; Hagelskamp, Carolin; Brackett, Marc A.; Rivers, Susan E.

2017-01-01

Classroom observations increasingly inform high-stakes decisions and research in education, including the allocation of school funding and the evaluation of school-based interventions. However, trends in rater scoring tendencies over time may undermine the reliability of classroom observations. Accordingly, the present investigations, grounded in…
Raters' L2 Background as a Potential Source of Bias in Rating Oral Performance

ERIC Educational Resources Information Center

Winke, Paula; Gass, Susan; Myford, Carol

2013-01-01

Based on evidence that listeners may favor certain foreign accents over others (Gass & Varonis, 1984; Major, Fitzmaurice, Bunta, & Balasubramanian, 2002; Tauroza & Luk, 1997) and that language-test raters may better comprehend and/or rate the speech of test takers whose native languages (L1s) are more familiar on some level (Carey,…
A protocol for the Hamilton Rating Scale for Depression: Item scoring rules, Rater training, and outcome accuracy with data on its application in a clinical trial.

PubMed

Rohan, Kelly J; Rough, Jennifer N; Evans, Maggie; Ho, Sheau-Yan; Meyerhoff, Jonah; Roberts, Lorinda M; Vacek, Pamela M

2016-08-01

We present a fully articulated protocol for the Hamilton Rating Scale for Depression (HAM-D), including item scoring rules, rater training procedures, and a data management algorithm to increase accuracy of scores prior to outcome analyses. The latter involves identifying potentially inaccurate scores as interviews with discrepancies between two independent raters, defined either by scores differing by 5 or more points or by disagreement on meeting the threshold for depression recurrence status, a long-term treatment outcome with public health significance. Discrepancies are resolved by assigning two new raters, identifying items with disagreement per an algorithm, and reaching consensus on the most accurate scores for those items. These methods were applied in a clinical trial where the primary outcome was the Structured Interview Guide for the Hamilton Rating Scale for Depression-Seasonal Affective Disorder version (SIGH-SAD), which includes the 21-item HAM-D and 8 items assessing atypical symptoms. Overall, 177 seasonally depressed adult patients were enrolled and interviewed at 10 time points across treatment and the 2-year follow-up interval, yielding 1589 completed interviews, of which 1535 (96.6%) were archived. Inter-rater reliability ICCs ranged from .923 to .967. Only 86 (5.6%) interviews met criteria for a between-rater discrepancy. HAM-D items "Depressed Mood", "Work and Activities", "Middle Insomnia", and "Hypochondriasis" and Atypical items "Fatigability" and "Hypersomnia" contributed most to discrepancies. Generalizability beyond well-trained, experienced raters in a clinical trial is unknown. Researchers might want to consider adopting this protocol in part or full. Clinicians might want to tailor it to their needs. Copyright © 2016 Elsevier B.V. All rights reserved.
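The first stage of the data-management rule, flagging an interview when two independent raters either differ by 5 or more points or disagree about whether the recurrence threshold is met, can be sketched as below. The recurrence criterion is reduced here to a single hypothetical cutoff (`recurrence_cutoff`) on the total score, because the protocol's actual recurrence definition is not given in the abstract; the class and function names are likewise hypothetical, and the later item-level consensus step is not modelled.

```python
from dataclasses import dataclass


@dataclass
class PairedInterview:
    interview_id: str
    total_rater_a: int  # total score from the first independent rater
    total_rater_b: int  # total score from the second independent rater


def needs_reconciliation(p: PairedInterview, recurrence_cutoff: int, max_gap: int = 5) -> bool:
    """Hypothetical flagging rule.

    True if the two totals differ by at least max_gap points, or if exactly one
    rater's total meets the (assumed, single-number) recurrence cutoff.
    """
    large_gap = abs(p.total_rater_a - p.total_rater_b) >= max_gap
    status_split = (p.total_rater_a >= recurrence_cutoff) != (p.total_rater_b >= recurrence_cutoff)
    return large_gap or status_split


interviews = [
    PairedInterview("001", 20, 14),  # 6-point gap -> flagged
    PairedInterview("002", 12, 9),   # straddles the assumed cutoff -> flagged
    PairedInterview("003", 8, 6),    # small gap, both below the cutoff -> not flagged
]
flagged = [p.interview_id for p in interviews if needs_reconciliation(p, recurrence_cutoff=10)]
print(flagged)  # prints ['001', '002']
```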
Inter-rater agreement among orthodontists in a blocked experiment.

PubMed

Korn, E L; Baumrind, S

1985-01-01

Five orthodontists were asked to predict for 64 patients a particular dichotomous outcome of treatment based on pre-treatment X-ray films. The orthodontists rated the cases in blocks of size 4-6 with the knowledge of the number of positive outcomes in each block. We discuss the reasons why this blocked design is appropriate whenever clinicians are asked to rate cases which have not been randomly selected from a clinical practice similar to their own. We give a simple description of the inter-rater agreement for this type of blocked experiment as well as a procedure to test that the agreement is no better than that expected by random independent assignment.
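The abstract does not reproduce Korn and Baumrind's own test. As a swapped-in illustration of the same null hypothesis, the sketch below uses a Monte Carlo approximation: under the null, each rater independently picks, within every block, a uniformly random set of cases to call positive, with the block's announced number of positives held fixed. The agreement statistic (mean pairwise percentage agreement), the function names and the data are assumptions for illustration.

```python
import itertools
import random
from typing import List, Sequence


def mean_pairwise_agreement(ratings: Sequence[Sequence[int]]) -> float:
    """Average, over all rater pairs, of the fraction of cases given the same 0/1 call."""
    n_cases = len(ratings[0])
    pairs = list(itertools.combinations(range(len(ratings)), 2))
    same = sum(sum(ratings[r][i] == ratings[s][i] for i in range(n_cases)) for r, s in pairs)
    return same / (len(pairs) * n_cases)


def random_blocked_calls(blocks: List[List[int]], positives: List[int],
                         n_cases: int, rng: random.Random) -> List[int]:
    """One rater under the null: in each block, call the known number of cases positive at random."""
    calls = [0] * n_cases
    for block, m in zip(blocks, positives):
        for i in rng.sample(block, m):
            calls[i] = 1
    return calls


def blocked_agreement_test(ratings, blocks, n_sim=10_000, seed=0):
    """Return (observed agreement, Monte Carlo p-value for 'no better than random')."""
    n_cases = len(ratings[0])
    positives = [sum(ratings[0][i] for i in b) for b in blocks]
    for r in ratings:  # in a blocked design every rater uses the announced count
        if [sum(r[i] for i in b) for b in blocks] != positives:
            raise ValueError("raters disagree on the number of positives per block")
    observed = mean_pairwise_agreement(ratings)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        null = [random_blocked_calls(blocks, positives, n_cases, rng) for _ in ratings]
        if mean_pairwise_agreement(null) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


# Hypothetical data: 3 raters, two blocks of 4 cases, 2 positives per block
ratings = [
    [1, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 0, 1],
]
blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]
obs, p = blocked_agreement_test(ratings, blocks, n_sim=5_000)
print(round(obs, 3), p)
```

Fixing the per-block counts in the null builds in the agreement raters would reach simply from knowing how many positives each block contains, so only agreement beyond that is credited to the raters.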
Greek mythology: the eye, ophthalmology, eye disease, and blindness.

PubMed

Trompoukis, Constantinos; Kourkoutas, Dimitrios

2007-06-01

In distant eras, mythology was a form of expression used by many peoples. A study of the Greek myths reveals concealed medical knowledge, in many cases relating to the eye. An analysis was made of the ancient Greek texts for mythological references relating to an understanding of vision, visual abilities, the eye, its congenital and acquired abnormalities, blindness, and eye injuries and their treatment. The Homeric epics contain anatomical descriptions of the eyes and the orbits, and an elementary knowledge of physiology is also apparent. The concept of the visual field can be seen in the myth of Argos Panoptes. Many myths describe external eye disease ("knyzosis"), visual disorders (amaurosis), and cases of blinding that, depending on the story, are ascribed to various causes. In addition, ocular motility abnormalities, congenital anomalies (cyclopia), injuries, and special treatments, such as the "licking" method, are mentioned. The study of mythological references to the eye reveals reliable medical observations of the ancient Greeks, which are concealed within the myths.
Blind prediction of natural video quality.

PubMed

Saad, Michele A; Bovik, Alan C; Charrier, Christophe

2014-03-01

We propose a blind (no reference or NR) video quality evaluation model that is nondistortion specific. The approach relies on a spatio-temporal model of video scenes in the discrete cosine transform domain, and on a model that characterizes the type of motion occurring in the scenes, to predict video quality. We use the models to define video statistics and perceptual features that are the basis of a video quality assessment (VQA) algorithm that does not require a pristine reference video in order to predict a perceptual quality score. The contributions of this paper are threefold. 1) We propose a spatio-temporal natural scene statistics (NSS) model for videos. 2) We propose a motion model that quantifies motion coherency in video scenes. 3) We show that the proposed NSS and motion coherency models are appropriate for quality assessment of videos, and we utilize them to design a blind VQA algorithm that correlates highly with human judgments of quality. The proposed algorithm, called video BLIINDS, is tested on the LIVE VQA database and on the EPFL-PoliMi video database and shown to perform close to the level of top-performing reduced- and full-reference VQA algorithms.
A Study on the Impact of Fatigue on Human Raters When Scoring Speaking Responses

ERIC Educational Resources Information Center

Ling, Guangming; Mollaun, Pamela; Xi, Xiaoming

2014-01-01

The scoring of constructed responses may introduce construct-irrelevant factors to a test score and affect its validity and fairness. Fatigue is one of the factors that could negatively affect human performance in general, yet little is known about its effects on a human rater's scoring quality on constructed responses. In this study, we compared…
Perception of blindness and blinding eye conditions in rural communities.

PubMed

Ashaye, Adeyinka; Ajuwon, Ademola Johnson; Adeoti, Caroline

2006-06-01

The purpose of this qualitative study was to explore the causes and management of blindness and blinding eye conditions as perceived by rural dwellers of two Yoruba communities in Oyo State, Nigeria. Four focus group discussions were conducted among residents of Iddo and Isale Oyo, two rural Yoruba communities in Oyo State, Nigeria. Participants consisted of sighted residents, residents who were partially or totally blind, and community leaders. Ten patent medicine sellers and 12 traditional healers were also interviewed on their perception of the causes and management of blindness in their communities. Blindness was perceived as an increasing problem in the communities. Multiple factors were perceived to cause blindness, including germs, onchocerciasis and supernatural forces. Traditional healers believed that blindness could be cured, and many claimed to have cured blindness in the past. However, all agreed that patience was an important requirement for curing blindness. The patent medicine sellers' reports were similar to those of the traditional healers. The barriers to use of orthodox medicine were mainly fear, misconceptions and perceived high costs of care. Group discussants and informants agreed that blindness has severe social and economic consequences, including not being able to see and assess the quality of what the sufferer eats, perpetual sadness, loss of sleep and dependence on other persons for daily activities. The local beliefs identified regarding the causation, symptoms and management of blindness and blinding eye conditions among rural Yoruba communities provide a bridge for understanding local perspectives and a basis for implementing appropriate primary eye care programs.
The Statin-Associated Muscle Symptom Clinical Index (SAMS-CI): Revision for Clinical Use, Content Validation, and Inter-rater Reliability.

PubMed

Rosenson, Robert S; Miller, Kate; Bayliss, Martha; Sanchez, Robert J; Baccara-Dinet, Marie T; Chibedi-De-Roche, Daniela; Taylor, Beth; Khan, Irfan; Manvelian, Garen; White, Michelle; Jacobson, Terry A

2017-04-01

The Statin-Associated Muscle Symptom Clinical Index (SAMS-CI) is a method for assessing the likelihood that a patient's muscle symptoms (e.g., myalgia or myopathy) were caused or worsened by statin use. The objectives of this study were to prepare the SAMS-CI for clinical use, estimate its inter-rater reliability, and collect feedback from physicians on its practical application. For content validity, we conducted structured in-depth interviews with its original authors as well as with a panel of independent physicians. Estimation of inter-rater reliability involved an analysis of 30 written clinical cases that were scored by a sample of physicians. A separate group of physicians provided feedback on the clinical use of the SAMS-CI and its potential utility in practice. Qualitative interviews with providers supported the content validity of the SAMS-CI. Feedback on the clinical use of the SAMS-CI included several perceived benefits (such as brevity, clear wording, and a simple scoring process) and some possible concerns (workflow issues and applicability in primary care). The inter-rater reliability of the SAMS-CI was estimated to be 0.77 (confidence interval 0.66-0.85), indicating high concordance between raters. With additional provider feedback, a revised SAMS-CI instrument was created suitable for further testing, both in the clinical setting and in prospective validation studies. With standardized questions, vetted language, easily interpreted scores, and demonstrated reliability, the SAMS-CI aims to estimate the likelihood that a patient's muscle symptoms were attributable to statins. The SAMS-CI may support better detection of statin-associated muscle symptoms in clinical practice, optimize treatment for patients experiencing muscle symptoms, and provide a useful tool for further clinical research.
Belief in narcissistic insecurity: Perceptions of lay raters and their personality and psychopathology relations.

PubMed

Stanton, Kasey; Watson, David; Clark, Lee Anna

2018-02-01

This study advances research on interpersonal perceptions of narcissism by examining the degree to which overt displays of narcissism (e.g. being boastful and arrogant) are viewed by lay raters as resulting from covert insecurity. We wrote a brief set of items to assess this view and collected responses from a large sample of community adults (n = 5 528). We present results both for participants reporting (n = 617; patient subsample) and not reporting (n = 4 911; non-patient subsample) current psychiatric treatment. Results revealed that (1) overt grandiose narcissistic traits generally are viewed as being linked to covert insecurity and vulnerability and (2) items intended to assess this link define a meaningful construct, referred to here as Belief in Narcissistic Insecurity. Patient subsample participants also completed measures of personality and psychopathology. Belief in Narcissistic Insecurity showed modest positive relations with self-rated narcissism and with favourable views of one's personality (i.e. seeing oneself as extraverted and conscientious). These findings contribute to research aimed at explicating how perceptions of narcissism are related to self-views and interpersonal functioning. Copyright © 2017 John Wiley & Sons, Ltd.
Does a Rater's Familiarity with a Candidate's Pronunciation Affect the Rating in Oral Proficiency Interviews?

ERIC Educational Resources Information Center

Carey, Michael D.; Mannell, Robert H.; Dunn, Peter K.

2011-01-01

This study investigated factors that could affect inter-examiner reliability in the pronunciation assessment component of speaking tests. We hypothesized that the rating of pronunciation is susceptible to variation in assessment due to the amount of exposure examiners have to nonnative English accents. An inter-rater variability analysis was…
The Effect of Instrument-Specific Rater Training on Interrater Reliability and Counseling Skills Performance Differentiation

ERIC Educational Resources Information Center

Meacham, Paul Douglas, Jr.

2013-01-01

The purpose of this study was to explore the effect of instrument-specific rater training on interrater reliability (IRR) and counseling skills performance differentiation. Strong IRR is of primary concern to effective program evaluation (McCullough, Kuhn, Andrews, Valen, Hatch, & Osimo, 2003; Schanche, Nielsen, McCullough, Valen, &…
Perception of blindness and blinding eye conditions in rural communities.

PubMed Central

Ashaye, Adeyinka; Ajuwon, Ademola Johnson; Adeoti, Caroline

2006-01-01

PURPOSE: The purpose of this qualitative study was to explore the causes and management of blindness and blinding eye conditions as perceived by rural dwellers of two Yoruba communities in Oyo State, Nigeria. METHODS: Four focus group discussions were conducted among residents of Iddo and Isale Oyo, two rural Yoruba communities in Oyo State, Nigeria. Participants consisted of sighted residents, residents who were partially or totally blind, and community leaders. Ten patent medicine sellers and 12 traditional healers were also interviewed on their perception of the causes and management of blindness in their communities. FINDINGS: Blindness was perceived as an increasing problem in the communities. Multiple factors were perceived to cause blindness, including germs, onchocerciasis and supernatural forces. Traditional healers believed that blindness could be cured, and many claimed to have cured blindness in the past. However, all agreed that patience was an important requirement for curing blindness. The patent medicine sellers' reports were similar to those of the traditional healers. The barriers to use of orthodox medicine were mainly fear, misconceptions and perceived high costs of care. Group discussants and informants agreed that blindness has severe social and economic consequences, including not being able to see and assess the quality of what the sufferer eats, perpetual sadness, loss of sleep and dependence on other persons for daily activities. CONCLUSION: The local beliefs identified regarding the causation, symptoms and management of blindness and blinding eye conditions among rural Yoruba communities provide a bridge for understanding local perspectives and a basis for implementing appropriate primary eye care programs. PMID:16775910
Teaching Life Sciences to Blind and Visually Impaired Learners

ERIC Educational Resources Information Center

Fraser, William John; Maguvhe, Mbulaheni Obert

2008-01-01

This study reports on the teaching of life sciences (biology) to blind and visually impaired learners in South Africa at 11 special schools with specific reference to the development of science process skills in outcomes-based classrooms. Individual structured interviews were conducted with nine science educators teaching at the different special…
The relative reliability of actively participating and passively observing raters in a simulation-based assessment for selection to specialty training in anaesthesia.

PubMed

Roberts, M J; Gale, T C E; Sice, P J A; Anderson, I R

2013-06-01

Selection to specialty training is a high-stakes assessment demanding valuable consultant time. In one initial entry level and two higher level anaesthesia selection centres, we investigated the feasibility of using staff participating in simulation scenarios, rather than observing consultants, to rate candidate performance. We compared participant and observer scores using four different outcomes: inter-rater reliability; score distributions; correlation of candidate rankings; and percentage of candidates whose selection might be affected by substituting participants' for observers' ratings. Inter-rater reliability between observers was good (correlation coefficient 0.73-0.96) but lower between participants (correlation coefficient 0.39-0.92), particularly at higher level where participants also rated candidates more favourably than did observers. Station rank orderings were strongly correlated between the rater groups at entry level (rho 0.81, p < 0.001) but weaker at the two higher level centres (rho 0.52, p = 0.018; rho 0.58, p = 0.001). Substituting participants' for observers' ratings had less effect once scores were combined with those from other selection centre stations. Selection decisions for 0-20% of candidates could have changed, depending on the numbers of training posts available. We conclude that using participating raters is feasible at initial entry level only. Anaesthesia © 2013 The Association of Anaesthetists of Great Britain and Ireland.
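The station rank comparisons above are reported as rho; assuming this denotes Spearman's rank correlation, a minimal sketch computes it as the Pearson correlation of average ranks, which also handles tied scores. The function names and the candidate scores for the two rater groups are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical station scores for six candidates
observer = [14, 11, 17, 9, 15, 12]
participant = [16, 12, 17, 13, 15, 15]  # participants score more generously
print(round(spearman_rho(observer, participant), 2))  # prints 0.84
```

Because rho depends only on rank order, participants scoring every candidate more generously would not by itself lower it; only a reordering of candidates does.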
Blindness to Curvature and Blindness to Illusory Curvature.

PubMed

Bertamini, Marco; Kitaoka, Akiyoshi

2018-01-01

We compare two versions of two known phenomena, the Curvature blindness and the Kite mesh illusions, to highlight how similar manipulations lead to blindness to curvature and blindness to illusory curvature, respectively. The critical factor is a change in luminance polarity; this factor interferes with the computation of curvature along the contour, for both real and illusory curvature.
Inter-Rater Reliability and Validity of the Australian Football League’s Kicking and Handball Tests

PubMed Central

Cripps, Ashley J.; Hopper, Luke S.; Joyce, Christopher

2015-01-01

Talent identification tests used at the Australian Football League’s National Draft Combine assess the capacities of athletes to compete at a professional level. Tests created for the National Draft Combine are also commonly used for talent identification and athlete development in development pathways. The skills tests created by the Australian Football League required players to either handball (striking the ball with the hand) or kick to a series of 6 randomly generated targets. Assessors subjectively rate each skill execution, giving a 0-5 score for each disposal. This study aimed to investigate the inter-rater reliability and validity of the skills tests at an adolescent sub-elite level. Male Australian footballers were recruited from sub-elite adolescent teams (n = 121, age = 15.7 ± 0.3 years, height = 1.77 ± 0.07 m, mass = 69.17 ± 8.08 kg). The coaches (n = 7) of each team were also recruited. Inter-rater reliability was assessed using intraclass correlation coefficients (ICC) and Limits of Agreement statistics. Both the kicking (ICC = 0.96, p < .01) and handball tests (ICC = 0.89, p < .01) demonstrated strong reliability and acceptable levels of absolute agreement. Content validity was determined by examining the sensitivity of test scores to laterality and distance. Concurrent validity was assessed by comparing coaches’ perceptions of skill to actual test outcomes. Multivariate analysis of variance (MANOVA) examined the main effect of laterality, with scores on the dominant hand (p = .04) and foot (p < .01) significantly higher than on the non-dominant side. Follow-up univariate analysis reported significant differences at every distance in the kicking test. A poor correlation was found between coaches’ perceptions of skill and testing outcomes. The results of this study show that both skill tests have acceptable inter-rater reliability. Partial content validity was confirmed for the kicking test; however, further research is required to confirm
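Alongside the ICC, the study reports Limits of Agreement, which in the usual Bland-Altman form are the mean between-assessor difference (bias) ± 1.96 standard deviations of the differences. A minimal sketch, assuming one summed test score per player from each of two assessors (six targets scored 0-5 would give totals of 0-30); the function name and scores are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def limits_of_agreement(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower, upper) 95% limits of agreement for paired scores."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical summed scores for six players from two assessors
assessor_1 = [22, 18, 25, 15, 20, 27]
assessor_2 = [21, 18, 23, 16, 19, 26]
bias, lower, upper = limits_of_agreement(assessor_1, assessor_2)
print(round(bias, 2), round(lower, 2), round(upper, 2))  # prints 0.67 -1.36 2.69
```

Unlike the ICC, the limits are expressed in test-score units, so they show directly how far two assessors' totals for the same player can be expected to differ.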
Inconsistency in the analysis of morphological deformities in chironomidae (Insecta: Diptera) larvae.

PubMed

Salmelin, Johanna; Vuori, Kari-Matti; Hämäläinen, Heikki

2015-08-01

The incidence of morphological deformities of chironomid larvae as an indicator of sediment toxicity has been studied for decades. However, standards for deformity analysis are lacking. The authors evaluated whether 25 experts diagnosed larval deformities in a similar manner. Based on high-quality digital images, the experts rated 211 menta of Chironomus spp. larvae as normal or deformed. The larvae were from a site with polluted sediments or from a reference site. The authors revealed this to a random half of the experts, and the rest conducted the assessment blind. The authors quantified the interrater agreement by kappa coefficient, tested whether open and blind assessments differed in deformity incidence and in differentiation between the sites, and identified those deformity types rated most consistently or inconsistently. The total deformity incidence varied greatly, from 10.9% to 66.4% among experts. Kappa coefficient across rater pairs averaged 0.52, indicating insufficient agreement. The deformity types rated most consistently were those missing teeth or with extra teeth. The open and blind assessments did not differ, but differentiation between sites was clearest for raters who counted primarily absolute deformities such as missing and extra teeth and excluded apparent mechanical aberrations or deviations in tooth size or symmetry. The highly differing criteria in deformity assignment have likely led to inconsistent results in midge larval deformity studies and indicate an urgent need for standardization of the analysis. © 2015 SETAC.
A Comparison of EFL Raters' Essay-Rating Processes across Two Types of Rating Scales

ERIC Educational Resources Information Center

Li, Hang; He, Lianzhen

2015-01-01

This study used think-aloud protocols to compare essay-rating processes across holistic and analytic rating scales in the context of China's College English Test Band 6 (CET-6). A group of 9 experienced CET-6 raters scored the same batch of 10 CET-6 essays produced in an operational CET-6 administration twice, using both the CET-6 holistic…

An Alternative Method Used in Evaluating Agreement among Repeat Measurements by Two Raters in Education

ERIC Educational Resources Information Center

Erdogan, Semra; Orekici Temel, Gülhan; Selvi, Hüseyin; Ersöz Kaya, Irem

2017-01-01

Taking more than one measurement of the same variable also hosts the possibility of contamination from error sources, both singly and in combination as a result of interactions. Therefore, although the internal consistency of scores received from measurement tools is examined by itself, it is necessary to ensure interrater or intra-rater agreement…
Ocular morbidity patterns among children in schools for the blind in Chennai.

PubMed

Prakash, M Vs; Sivakumar, S; Dayal, Ashutosh; Chitra, A; Subramaniam, Sudharshini

2017-08-01

To identify the morbidity patterns causing blindness in children attending schools for the blind in Chennai and to compare our data with similar previous studies. A cross-sectional prevalence study was carried out in two schools for the blind in Chennai. The schools were visited by a team of ophthalmologists and optometrists. Students with best-corrected visual acuity (BCVA) worse than 3/60 in the better eye were included and relevant history was noted. Every student underwent anterior segment evaluation and detailed fundus examination. Morbidity of the better eye was taken as the cause of blindness. Health records maintained by the school were referred to wherever available. The anatomical causes of blindness include optic nerve disorders in 75 (24.8%) cases, retinal disorders in 55 (18.2%), corneal disorders in 47 (15.6%), lens-related disorders in 39 (12.9%), congenital anomalies in 11 (3.6%), and congenital glaucoma in 20 (6.6%) cases. The whole globe was involved in six cases (1.99%). Among conditions causing blindness, optic atrophy, seen in 73 (24.17%) cases, was the most common, followed by retinal dystrophy in 44 (14.56%), corneal scarring in 35 (11.59%), cataract in 22 (7.28%), and congenital glaucoma in 20 (6.6%) cases. Avoidable causes of blindness were found in 31% of cases and incurable causes in 45%. Optic nerve atrophy and retinal dystrophy are the emerging causes of blindness, underlining the need for genetic counseling and low-vision rehabilitation centers, along with a targeted approach for avoidable causes of blindness.
A Comparison of Rubrics and Graded Category Rating Scales with Various Methods Regarding Raters' Reliability

ERIC Educational Resources Information Center

Dogan, C. Deha; Uluman, Müge

2017-01-01

The aim of this study was to determine the extent to which graded-category rating scales and rubrics contribute to inter-rater reliability. The research was designed as a correlational study. The study group consisted of 82 students attending sixth grade and three writing course teachers in a private elementary school. A performance task was…
The intra- and inter-rater reliability of five clinical muscle performance tests in patients with and without neck pain

PubMed Central

2013-01-01

Background This study investigates the reliability of muscle performance tests using cost- and time-effective methods similar to those used in clinical practice. When conducting reliability studies, great effort goes into standardising test procedures to facilitate a stable outcome. Therefore, several test trials are often performed. However, when muscle performance tests are applied in the clinical setting, clinicians often conduct a muscle performance test only once, as repeated testing may produce fatigue and pain and thus variation in test results. We aimed to investigate whether cervical muscle performance tests, which have shown promising psychometric properties, would remain reliable when examined under conditions similar to those of daily clinical practice. Methods The intra-rater (between-day) and inter-rater (within-day) reliability was assessed for five cervical muscle performance tests in patients with (n = 33) and without neck pain (n = 30). The five tests were joint position error, the cranio-cervical flexion test, the neck flexor muscle endurance test performed in supine and in a 45°-upright position and a new neck extensor test. Results Intra-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.48-0.82), the cranio-cervical flexion test (ICC ≥ 0.69), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.68) and in a 45°-upright position (ICC ≥ 0.41) with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement (ICC = 0.14-0.41). Likewise, inter-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.51-0.75), the cranio-cervical flexion test (ICC ≥ 0.85), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.70) and in a 45°-upright position (ICC ≥ 0.56). However, only slight to fair agreement was found for the neck extensor test (ICC…
[Blindness: Three Papers.]

ERIC Educational Resources Information Center

Jernigan, Kenneth

Three papers by the president of the National Federation of the Blind are presented. The first, "A Definition of Blindness," examines definitions of blindness, asserting the advantages of a functional or sociological definition over a physical or medical definition. He cites harm in legal distinctions between partial and full blindness and between…
Inter-rater reliability for movement pattern analysis (MPA): measuring patterning of behaviors versus discrete behavior counts as indicators of decision-making style.

PubMed

Connors, Brenda L; Rende, Richard; Colton, Timothy J

2014-01-01

The unique yield of collecting observational data on human movement has received increasing attention in a number of domains, including the study of decision-making style. As such, interest has grown in the nuances of core methodological issues, including the best ways of assessing inter-rater reliability. In this paper we focus on one key topic - the distinction between establishing reliability for the patterning of behaviors as opposed to the computation of raw counts - and suggest that reliability for each be compared empirically rather than determined a priori. We illustrate by assessing inter-rater reliability for key outcome measures derived from movement pattern analysis (MPA), an observational methodology that records body movements as indicators of decision-making style with demonstrated predictive validity. While reliability ranged from moderate to good for raw counts of behaviors reflecting each of two Overall Factors generated within MPA (Assertion and Perspective), inter-rater reliability for patterning (proportional indicators of each factor) was significantly higher and excellent (ICC = 0.89). Furthermore, patterning, as compared to raw counts, provided better prediction of observable decision-making process assessed in the laboratory. These analyses support the utility of using an empirical approach to inform the consideration of measuring patterning versus discrete behavior counts when determining inter-rater reliability of observable behavior. They also speak to the substantial reliability that may be achieved via application of theoretically grounded observational systems such as MPA that reveal thinking and action motivations via visible movement patterns.
Mainstream Teacher Candidates' Perspectives on ESL Writing: The Effects of Writer Identity and Rater Background

ERIC Educational Resources Information Center

Kang, Hyun-Sook; Veitch, Hillary

2017-01-01

This study explored the extent to which the ethnic identity of a writer and the background (gender and area of teaching) of a rater can influence mainstream teacher candidates' evaluation of English as a second language (ESL) writing, using a matched-guise method. A one-page essay was elicited from an ESL learner enrolled in an intensive English…
Inter-rater reliability of data elements from a prototype of the Paul Coverdell National Acute Stroke Registry

PubMed Central

Reeves, Mathew J; Mullard, Andrew J; Wehner, Susan

2008-01-01

Background The Paul Coverdell National Acute Stroke Registry (PCNASR) is a U.S.-based national registry designed to monitor and improve the quality of acute stroke care delivered by hospitals. The registry monitors care through specific performance measures, the accuracy of which depends in part on the reliability of the individual data elements used to construct them. This study describes the inter-rater reliability of data elements collected in Michigan's state-based prototype of the PCNASR. Methods Over a 6-month period, 15 hospitals participating in the Michigan PCNASR prototype submitted data on 2566 acute stroke admissions. Trained hospital staff prospectively identified acute stroke admissions, abstracted chart information, and submitted data to the registry. At each hospital, 8 randomly selected cases were re-abstracted by an experienced research nurse. Inter-rater reliability was estimated by the kappa statistic for nominal variables and by the intraclass correlation coefficient (ICC) for ordinal and continuous variables. Factors that can negatively impact the kappa statistic (i.e., trait prevalence and rater bias) were also evaluated. Results A total of 104 charts were available for re-abstraction. Excellent reliability (kappa or ICC > 0.75) was observed for many registry variables including age, gender, black race, hemorrhagic stroke, discharge medications, and modified Rankin Score. Agreement was at least moderate (i.e., 0.75 > kappa ≥ 0.40) for ischemic stroke, TIA, white race, non-ambulance arrival, hospital transfer and direct admit. However, several variables had poor reliability (kappa < 0.40) including stroke onset time, stroke team consultation, time of initial brain imaging, and discharge destination. There were marked systematic differences between hospital abstractors and the audit abstractor (i.e., rater bias) for many of the data elements recorded in the emergency department. Conclusion The excellent reliability of many of the data elements
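A 2x2 agreement table shows why trait prevalence and rater bias can depress kappa. The Python sketch below computes Cohen's kappa together with the prevalence index |a - d|/n and the bias index |b - c|/n that are commonly reported alongside it. The cell counts are invented for illustration and are not registry data. Both hypothetical tables have 85% observed agreement, yet the one where "yes" is highly prevalent yields a much lower kappa.

```python
def kappa_2x2(a, b, c, d):
    """Cohen's kappa with prevalence and bias indices for a 2x2 agreement table.

    a: both abstractors record 'yes'    b: hospital 'yes', auditor 'no'
    c: hospital 'no', auditor 'yes'     d: both record 'no'
    """
    n = a + b + c + d
    p_o = (a + d) / n                      # observed agreement
    p1_yes, p2_yes = (a + b) / n, (a + c) / n
    p_e = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    prevalence_index = abs(a - d) / n      # imbalance between 'yes' and 'no' agreements
    bias_index = abs(b - c) / n            # systematic difference between abstractors
    return kappa, p_o, prevalence_index, bias_index

# Two hypothetical tables with identical observed agreement (85 of 100 charts).
balanced = kappa_2x2(40, 9, 6, 45)
skewed = kappa_2x2(80, 10, 5, 5)
print("balanced: kappa=%.2f, p_o=%.2f, PI=%.2f, BI=%.2f" % balanced)
print("skewed:   kappa=%.2f, p_o=%.2f, PI=%.2f, BI=%.2f" % skewed)
```

Reporting the two indices next to kappa helps a reader tell a low kappa caused by genuine disagreement from one driven mostly by skewed prevalence.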
Representing vision and blindness.

PubMed

Ray, Patrick L; Cox, Alexander P; Jensen, Mark; Allen, Travis; Duncan, William; Diehl, Alexander D

2016-01-01

There have been relatively few attempts to represent vision or blindness ontologically. This is unsurprising as the related phenomena of sight and blindness are difficult to represent ontologically for a variety of reasons. Blindness has escaped ontological capture at least in part because: blindness or the employment of the term 'blindness' seems to vary from context to context, blindness can present in a myriad of types and degrees, and there is no precedent for representing complex phenomena such as blindness. We explore current attempts to represent vision or blindness, and show how these attempts fail at representing subtypes of blindness (viz., color blindness, flash blindness, and inattentional blindness). We examine the results found through a review of current attempts and identify where they have failed. By analyzing our test cases of different types of blindness along with the strengths and weaknesses of previous attempts, we have identified the general features of blindness and vision. We propose an ontological solution to represent vision and blindness, which capitalizes on resources afforded to one who utilizes the Basic Formal Ontology as an upper-level ontology. The solution we propose here involves specifying the trigger conditions of a disposition as well as the processes that realize that disposition. Once these are specified we can characterize vision as a function that is realized by certain (in this case) biological processes under a range of triggering conditions. When the range of conditions under which the processes can be realized are reduced beyond a certain threshold, we are able to say that blindness is present. We characterize vision as a function that is realized as a seeing process and blindness as a reduction in the conditions under which the sight function is realized. This solution is desirable because it leverages current features of a major upper-level ontology, accurately captures the phenomenon of blindness, and can be
The assessment of fidelity in a motor speech-treatment approach

PubMed Central

Hayden, Deborah; Namasivayam, Aravind Kumar; Ward, Roslyn

2015-01-01

Objective To demonstrate the application of the constructs of treatment fidelity for research and clinical practice for motor speech disorders, using the Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT) Fidelity Measure (PFM). Treatment fidelity refers to a set of procedures used to monitor and improve the validity and reliability of behavioral interventions. While the concept of treatment fidelity has been emphasized in medical and allied health sciences, documentation of procedures for the systematic evaluation of treatment fidelity in Speech-Language Pathology is sparse. Methods The development of the PFM and the iterative process used to improve it are discussed. Further, the PFM is evaluated against recommended measurement strategies documented in the literature. This includes evaluating the appropriateness of goals and objectives and the training of speech–language pathologists, using direct and indirect procedures. Three expert raters scored the PFM to examine inter-rater reliability. Results Three raters, blinded to each other's scores, completed fidelity ratings on three separate occasions. Inter-rater reliability, using Krippendorff's Alpha, was >80% for the PFM on the final scoring occasion. This indicates strong inter-rater reliability. Conclusion The development of fidelity measures for the training of service providers and treatment delivery is important in specialized treatment approaches where certain ‘active ingredients’ (e.g. specific treatment targets and therapeutic techniques) must be present in order for treatment to be effective. The PFM reflects evidence-based practice by integrating treatment delivery and clinical skill as a single quantifiable metric. PFM enables researchers and clinicians to objectively measure treatment outcomes within the PROMPT approach. PMID:26213623
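Krippendorff's alpha compares observed disagreement with the disagreement expected by chance: alpha = 1 - D_o / D_e. The abstract does not state which level of measurement was used for the PFM, so this Python sketch assumes nominal categories. The input format (one list of assigned labels per scored item, with missing ratings simply omitted), the function name and the example codes are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one list per rated item, holding the labels the raters gave it
    (missing ratings are simply left out of that item's list).
    """
    # Coincidence matrix: each ordered pair of values within an item
    # contributes 1 / (m - 1), where m is the number of values for that item.
    coincidences = defaultdict(float)
    for values in units:
        m = len(values)
        if m < 2:
            continue  # an item rated only once cannot be paired
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    # Marginal totals n_c and the total number of pairable values n.
    n_c = Counter()
    for (c, _), weight in coincidences.items():
        n_c[c] += weight
    n = sum(n_c.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        return float("nan")  # only one category occurs: alpha undefined
    # Nominal alpha = 1 - (n - 1) * (off-diagonal coincidences) / (sum over c != k of n_c * n_k)
    return 1.0 - (n - 1) * observed / expected

# Hypothetical fidelity codes from three raters on five items; one rating is missing.
items = [["met", "met", "met"], ["met", "not_met", "met"],
         ["not_met", "not_met", "not_met"], ["met", "met"],
         ["not_met", "not_met", "met"]]
print(f"alpha = {krippendorff_alpha_nominal(items):.2f}")
```

For ordinal or interval scores, the 0/1 nominal difference function would be replaced by a distance-based one.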
Rubber Hands Feel Touch, but Not in Blind Individuals

PubMed Central

Ehrsson, H. Henrik

2012-01-01

Psychology and neuroscience have a long-standing tradition of studying blind individuals to investigate how visual experience shapes perception of the external world. Here, we study how blind people experience their own body by exposing them to a multisensory body illusion: the somatic rubber hand illusion. In this illusion, healthy blindfolded participants experience that they are touching their own right hand with their left index finger, when in fact they are touching a rubber hand with their left index finger while the experimenter touches their right hand in a synchronized manner (Ehrsson et al. 2005). We compared the strength of this illusion in a group of blind individuals (n = 10), all of whom had experienced severe visual impairment or complete blindness from birth, and a group of age-matched blindfolded sighted participants (n = 12). The illusion was quantified subjectively using questionnaires and behaviorally by asking participants to point to the felt location of the right hand. The results showed that the sighted participants experienced a strong illusion, whereas the blind participants experienced no illusion at all, a difference that was evident in both tests employed. A further experiment testing the participants' basic ability to localize the right hand in space without vision (proprioception) revealed no difference between the two groups. Taken together, these results suggest that blind individuals with impaired visual development have a more veridical percept of self-touch and a less flexible and dynamic representation of their own body in space compared to sighted individuals. We speculate that the multisensory brain systems that re-map somatosensory signals onto external reference frames are less developed in blind individuals and therefore do not allow efficient fusion of tactile and proprioceptive signals from the two upper limbs into a single illusory experience of self-touch as in sighted individuals. PMID:22558268
Rubber hands feel touch, but not in blind individuals.

PubMed

Petkova, Valeria I; Zetterberg, Hedvig; Ehrsson, H Henrik

2012-01-01

Psychology and neuroscience have a long-standing tradition of studying blind individuals to investigate how visual experience shapes perception of the external world. Here, we study how blind people experience their own body by exposing them to a multisensory body illusion: the somatic rubber hand illusion. In this illusion, healthy blindfolded participants experience that they are touching their own right hand with their left index finger, when in fact they are touching a rubber hand with their left index finger while the experimenter touches their right hand in a synchronized manner (Ehrsson et al. 2005). We compared the strength of this illusion in a group of blind individuals (n = 10), all of whom had experienced severe visual impairment or complete blindness from birth, and a group of age-matched blindfolded sighted participants (n = 12). The illusion was quantified subjectively using questionnaires and behaviorally by asking participants to point to the felt location of the right hand. The results showed that the sighted participants experienced a strong illusion, whereas the blind participants experienced no illusion at all, a difference that was evident in both tests employed. A further experiment testing the participants' basic ability to localize the right hand in space without vision (proprioception) revealed no difference between the two groups. Taken together, these results suggest that blind individuals with impaired visual development have a more veridical percept of self-touch and a less flexible and dynamic representation of their own body in space compared to sighted individuals. We speculate that the multisensory brain systems that re-map somatosensory signals onto external reference frames are less developed in blind individuals and therefore do not allow efficient fusion of tactile and proprioceptive signals from the two upper limbs into a single illusory experience of self-touch as in sighted individuals.
Ocular morbidity patterns among children in schools for the blind in Chennai

PubMed Central

Prakash, MVS; Sivakumar, S; Dayal, Ashutosh; Chitra, A; Subramaniam, Sudharshini

2017-01-01

Purpose: To identify the morbidity patterns causing blindness in children attending schools for the blind in Chennai and to compare our data with similar previous studies. Methods: A cross-sectional prevalence study was carried out in two schools for the blind in Chennai. The schools were visited by a team of ophthalmologists and optometrists. Students with best-corrected visual acuity (BCVA) worse than 3/60 in the better eye were included and relevant history was noted. Every student underwent anterior segment evaluation and detailed fundus examination. Morbidity of the better eye was taken as the cause of blindness. Health records maintained by the school were referred to wherever available. Results: The anatomical causes of blindness include optic nerve disorders in 75 (24.8%) cases, retinal disorders in 55 (18.2%), corneal disorders in 47 (15.6%), lens-related disorders in 39 (12.9%), congenital anomalies in 11 (3.6%), and congenital glaucoma in 20 (6.6%) cases. The whole globe was involved in six cases (1.99%). Among conditions causing blindness, optic atrophy, seen in 73 (24.17%) cases, was the most common, followed by retinal dystrophy in 44 (14.56%), corneal scarring in 35 (11.59%), cataract in 22 (7.28%), and congenital glaucoma in 20 (6.6%) cases. Conclusion: Avoidable causes of blindness were found in 31% of cases and incurable causes in 45%. Optic nerve atrophy and retinal dystrophy are the emerging causes of blindness, underlining the need for genetic counseling and low-vision rehabilitation centers, along with a targeted approach for avoidable causes of blindness. PMID:28820161
20 CFR 416.1720 - Whom we refer.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 20 Employees' Benefits 2 2010-04-01 2010-04-01 false Whom we refer. 416.1720 Section 416.1720 Employees' Benefits SOCIAL SECURITY ADMINISTRATION SUPPLEMENTAL SECURITY INCOME FOR THE AGED, BLIND, AND DISABLED Referral of Persons Eligible for Supplemental Security Income to Other Agencies Referral for Treatment of Alcoholism Or Drug Addiction § 41...
Towards a framework for developing semantic relatedness reference standards.

PubMed

Pakhomov, Serguei V S; Pedersen, Ted; McInnes, Bridget; Melton, Genevieve B; Ruggieri, Alexander; Chute, Christopher G

2011-04-01

Our objective is to develop a framework for creating reference standards for functional testing of computerized measures of semantic relatedness. Currently, research on computerized approaches to semantic relatedness between biomedical concepts relies on reference standards created for specific purposes using a variety of methods for their analysis. In most cases, these reference standards are not publicly available and the published information provided in manuscripts that evaluate computerized semantic relatedness measurement approaches is not sufficient to reproduce the results. Our proposed framework is based on the experiences of medical informatics and computational linguistics communities and addresses practical and theoretical issues with creating reference standards for semantic relatedness. We demonstrate the use of the framework on a pilot set of 101 medical term pairs rated for semantic relatedness by 13 medical coding experts. While the reliability of this particular reference standard is in the "moderate" range, we show that using clustering and factor analyses offers a data-driven approach to finding systematic differences among raters and identifying groups of potential outliers. We test two ontology-based measures of relatedness and provide both the reference standard containing individual ratings and the R program used to analyze the ratings as open-source. Currently, these resources are intended to be used to reproduce and compare results of studies involving computerized measures of semantic relatedness. Our framework may be extended to the development of reference standards in other research areas in medical informatics including automatic classification, information retrieval from medical records and vocabulary/ontology development. Copyright © 2010 Elsevier Inc. All rights reserved.
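The authors used clustering and factor analyses to find systematic differences among raters. As a simpler, swapped-in screen (not the authors' method, and not their R program), one can correlate each rater's scores with the mean of the remaining raters; a rater with a low or negative correlation is a candidate outlier. This Python sketch assumes every rater scored the same term pairs on a numeric relatedness scale; the rater names and scores are invented.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns nan if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

def rater_vs_rest(ratings):
    """Correlate each rater with the mean of all other raters (leave-one-out).

    ratings: dict mapping rater -> list of scores over the same term pairs.
    """
    result = {}
    for rater, scores in ratings.items():
        others = [ratings[o] for o in ratings if o != rater]
        rest_mean = [mean(col) for col in zip(*others)]  # per-pair mean of the others
        result[rater] = pearson(scores, rest_mean)
    return result

# Hypothetical relatedness scores (1-5) from four raters on five term pairs.
ratings = {
    "rater_a": [1, 2, 3, 4, 5],
    "rater_b": [1, 3, 3, 4, 5],
    "rater_c": [2, 2, 3, 5, 5],
    "rater_d": [5, 4, 3, 2, 1],
}
for rater, r in sorted(rater_vs_rest(ratings).items(), key=lambda kv: kv[1]):
    print(f"{rater}: r = {r:.2f}")
```

Unlike clustering, this screen flags individual raters rather than groups of raters who share a systematic pattern.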
Towards a Framework for Developing Semantic Relatedness Reference Standards

PubMed Central

Pakhomov, Serguei V.S.; Pedersen, Ted; McInnes, Bridget; Melton, Genevieve B.; Ruggieri, Alexander; Chute, Christopher G.

2010-01-01

Our objective is to develop a framework for creating reference standards for functional testing of computerized measures of semantic relatedness. Currently, research on computerized approaches to semantic relatedness between biomedical concepts relies on reference standards created for specific purposes using a variety of methods for their analysis. In most cases, these reference standards are not publicly available and the published information provided in manuscripts that evaluate computerized semantic relatedness measurement approaches is not sufficient to reproduce the results. Our proposed framework is based on the experiences of medical informatics and computational linguistics communities and addresses practical and theoretical issues with creating reference standards for semantic relatedness. We demonstrate the use of the framework on a pilot set of 101 medical term pairs rated for semantic relatedness by 13 medical coding experts. While the reliability of this particular reference standard is in the “moderate” range, we show that using clustering and factor analyses offers a data-driven approach to finding systematic differences among raters and identifying groups of potential outliers. We test two ontology-based measures of relatedness and provide both the reference standard containing individual ratings and the R program used to analyze the ratings as open-source. Currently, these resources are intended to be used to reproduce and compare results of studies involving computerized measures of semantic relatedness. Our framework may be extended to the development of reference standards in other research areas in medical informatics including automatic classification, information retrieval from medical records and vocabulary/ontology development. PMID:21044697
Reproducibility of dynamically represented acoustic lung images from healthy individuals

PubMed Central

Maher, T M; Gat, M; Allen, D; Devaraj, A; Wells, A U; Geddes, D M

2008-01-01

Background and aim: Acoustic lung imaging offers a unique method for visualising the lung. This study was designed to demonstrate reproducibility of acoustic lung images recorded from healthy individuals at different time points and to assess intra- and inter-rater agreement in the assessment of dynamically represented acoustic lung images. Methods: Recordings from 29 healthy volunteers were made on three separate occasions using vibration response imaging. Reproducibility was measured using quantitative, computerised assessment of vibration energy. Dynamically represented acoustic lung images were scored by six blinded raters. Results: Quantitative measurement of acoustic recordings was highly reproducible with an intraclass correlation score of 0.86 (very good agreement). Intraclass correlations for inter-rater agreement and reproducibility were 0.61 (good agreement) and 0.86 (very good agreement), respectively. There was no significant difference found between the six raters at any time point. Raters ranged from 88% to 95% in their ability to identically evaluate the different features of the same image presented to them blinded on two separate occasions. Conclusion: Acoustic lung imaging is reproducible in healthy individuals. Graphic representation of lung images can be interpreted with a high degree of accuracy by the same and by different reviewers. PMID:18024534
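The abstract reports intraclass correlations without naming the ICC form. As an illustration only, this Python sketch computes the two-way random-effects, absolute-agreement, single-rater coefficient (ICC(2,1) in Shrout and Fleiss's notation) from a complete subjects x raters score matrix. Whether this matches the form the authors used is an assumption, and the example matrix is a small hypothetical one.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score given to subject i by rater j (no missing cells).
    """
    n = len(scores)      # subjects
    k = len(scores[0])   # raters
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Hypothetical scores: 6 subjects rated by 4 raters.
scores = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
          [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(f"ICC(2,1) = {icc_2_1(scores):.2f}")  # prints ICC(2,1) = 0.29
```

Here the raters rank subjects fairly consistently but differ in their overall level, and the absolute-agreement form penalizes that level difference through the rater mean square.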
[Quality assurance in coding expertise of hospital cases in the German DRG system. Evaluation of inter-rater reliability in MDK expertise].

PubMed

Huber, H; Brambrink, M; Funk, R; Rieger, M

2012-10-01

The purpose of this study was to evaluate differences in the G-DRG results of hospital cases coded by 2 independent MDK raters. Inter-rater reliability between the 2 raters was calculated by examining the coding of individual hospital cases. The reasons for the non-agreement of the expert evaluations and suggestions to improve the process are discussed. A random sample of 0.7% of the 57,375 expert evaluations in the MDK-WL pool was drawn. Whether the sample's distribution matched that of the full pool was tested with the χ² test or, where applicable, Fisher's exact test. For the total of 402 individual hospital cases, the G-DRG case sums of 2 experts of the MDK were determined independently and the results checked for each individual case for agreement or non-agreement. The corresponding confidence intervals with standard errors were analysed to test whether certain major diagnostic categories (MDC) were significantly more affected by differing expert results than others. In 280 of the total 402 tested hospital cases, the 2 MDK raters independently reached the same G-DRG results; in 122 cases the G-DRG case sums determined by the 2 raters differed (agreement 70%; CI 65.2-74.1). Different DRG results between the 2 experts occurred across the entire MDC spectrum. No MDC chapter with significant differences between the 2 raters could be identified. The results of our study demonstrate almost 70% agreement in the evaluation of hospital cost accounts by 2 independently operating MDK experts. This result leaves room for improvement, and the results point to areas for optimisation. Potential improvements include regular further training, the expansion of binding internal coding recommendations and the exchange of coding-relevant information among experts in internal forums. The presented model is in principle suitable for cross-border examinations within the MDK system with the advantage that
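The reported agreement and its confidence interval can be recomputed from the counts given (280 concordant cases out of 402). The abstract does not name the interval method; this Python sketch assumes a normal-approximation (Wald) interval, which reproduces the reported 65.2% to 74.1% when rounded to one decimal place. The function name is illustrative.

```python
import math

def agreement_with_wald_ci(agreements, total, z=1.96):
    """Proportion of concordant cases with a normal-approximation (Wald) 95% CI."""
    p = agreements / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of a proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

p, lower, upper = agreement_with_wald_ci(280, 402)
print(f"agreement {p:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
# prints: agreement 69.7%, 95% CI 65.2% to 74.1%
```

The unrounded agreement is 69.7%, consistent with the abstract's "almost 70%". For proportions near 0 or 1, or small samples, a Wilson interval is usually preferred over the Wald form.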
The quality of evidence of psychometric properties of three-dimensional spinal posture-measuring instruments

PubMed Central

2011-01-01

Background Psychometric properties include validity, reliability and sensitivity to change. Establishing the psychometric properties of an instrument that measures three-dimensional human posture is essential prior to applying it in clinical practice or research. Methods This paper reports the findings of a systematic literature review which aimed to 1) identify non-invasive three-dimensional (3D) human posture-measuring instruments; and 2) assess the quality of reporting of the methodological procedures undertaken to establish their psychometric properties, using a purpose-built critical appraisal tool. Results Seventeen instruments were identified, of which nine were supported by research into psychometric properties. Eleven and six papers, respectively, reported on validity and reliability testing. Rater qualification and reference standards were generally poorly addressed, and there was variable quality in the reporting of rater blinding and statistical analysis. Conclusions There is a lack of current research to establish the psychometric properties of non-invasive 3D human posture-measuring instruments. PMID:21569486
Alcohol and Other Drug Abuse as Coexisting Disabilities: Considerations for Counselors Serving Individuals Who Are Blind or Visually Impaired.

ERIC Educational Resources Information Center

Koch, D. Shane; Nelipovich, Michael; Sneed, Zach

2002-01-01

This article identifies the potential effect of alcohol and other drug abuse (AODA) on people who are blind or visually impaired, the barriers to providing effective AODA services for those people, and strategies for improving services for people with coexisting blindness or visual impairments and AODA. (Contains references.) (CR)

Transcultural Adaptation of GRID Hamilton Rating Scale For Depression (GRID-HAMD) to Brazilian Portuguese and Evaluation of the Impact of Training Upon Inter-Rater Reliability.

PubMed

Henrique-Araújo, Ricardo; Osório, Flávia L; Gonçalves Ribeiro, Mônica; Soares Monteiro, Ivandro; Williams, Janet B W; Kalali, Amir; Alexandre Crippa, José; Oliveira, Irismar Reis De

2014-07-01

GRID-HAMD is a semi-structured interview guide developed to overcome flaws in HAM-D, and has been incorporated into an increasing number of studies. Carry out the transcultural adaptation of GRID-HAMD into the Brazilian Portuguese language, evaluate the inter-rater reliability of this instrument and the training impact upon this measure, and verify the raters' opinions of said instrument. The transcultural adaptation was conducted by appropriate methodology. The measurement of inter-rater reliability was done by way of videos that were evaluated by 85 professionals before and after training for the use of this instrument. The intraclass correlation coefficient (ICC) remained between 0.76 and 0.90 for GRID-HAMD-21 and between 0.72 and 0.91 for GRID-HAMD-17. The training did not have an impact on the ICC, except for a few groups of participants with a lower level of experience. Most of the participants showed high acceptance of GRID-HAMD, when compared to other versions of HAM-D. The scale presented adequate inter-rater reliability even before training began. Training did not have an impact on this measure, except for a few groups with less experience. GRID-HAMD received favorable opinions from most of the participants.
Inter-rater reliability for movement pattern analysis (MPA): measuring patterning of behaviors versus discrete behavior counts as indicators of decision-making style

PubMed Central

Connors, Brenda L.; Rende, Richard; Colton, Timothy J.

2014-01-01

The unique yield of collecting observational data on human movement has received increasing attention in a number of domains, including the study of decision-making style. As such, interest has grown in the nuances of core methodological issues, including the best ways of assessing inter-rater reliability. In this paper we focus on one key topic – the distinction between establishing reliability for the patterning of behaviors as opposed to the computation of raw counts – and suggest that reliability for each be compared empirically rather than determined a priori. We illustrate by assessing inter-rater reliability for key outcome measures derived from movement pattern analysis (MPA), an observational methodology that records body movements as indicators of decision-making style with demonstrated predictive validity. While reliability ranged from moderate to good for raw counts of behaviors reflecting each of two Overall Factors generated within MPA (Assertion and Perspective), inter-rater reliability for patterning (proportional indicators of each factor) was significantly higher and excellent (ICC = 0.89). Furthermore, patterning, as compared to raw counts, provided better prediction of observable decision-making process assessed in the laboratory. These analyses support using an empirical approach to inform the choice between measuring patterning and counting discrete behaviors when determining the inter-rater reliability of observable behavior. They also speak to the substantial reliability that may be achieved via application of theoretically grounded observational systems such as MPA that reveal thinking and action motivations via visible movement patterns. PMID:24999336
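The sketch below is a minimal illustration of why a proportional (patterning) measure can agree across raters even when their raw counts differ. It rests on an assumption that is not the published MPA scoring rule: a rater's patterning index is taken to be the share of coded movements falling under Assertion, out of all Assertion and Perspective movements. The function name and the counts are hypothetical.

```python
def patterning(assertion_count, perspective_count):
    """Hypothetical patterning index: share of coded movements that fall under Assertion."""
    total = assertion_count + perspective_count
    if total == 0:
        raise ValueError("no coded movements")
    return assertion_count / total

# Hypothetical counts: rater B codes more movements overall than rater A.
rater_a = patterning(30, 20)
rater_b = patterning(45, 30)
print(rater_a, rater_b)  # the raw counts differ, the proportions do not
```

Scaling both counts by the same factor leaves the proportion unchanged. This is one plausible reason a proportional indicator could show higher inter-rater reliability than raw counts.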
The impact of revised DSM-5 criteria on the relative distribution and inter-rater reliability of eating disorder diagnoses in a residential treatment setting.

PubMed

Thomas, Jennifer J; Eddy, Kamryn T; Murray, Helen B; Tromp, Marilou D P; Hartmann, Andrea S; Stone, Melissa T; Levendusky, Philip G; Becker, Anne E

2015-09-30

This study evaluated the relative distribution and inter-rater reliability of revised DSM-5 criteria for eating disorders in a residential treatment program. Consecutive adolescent and young adult females (N=150) admitted to a residential eating disorder treatment facility were assigned both DSM-IV and DSM-5 diagnoses by a clinician (n=14) via routine clinical interview and a research assessor (n=4) via structured interview. We compared the frequency of diagnostic assignments under each taxonomy and by type of assessor. We evaluated concordance between clinician and researcher assignment through inter-rater reliability kappa and percent agreement. Significantly fewer patients received either clinician or researcher diagnoses of a residual eating disorder under DSM-5 (clinician 12.0%, researcher 31.3%) versus DSM-IV (clinician 28.7%, researcher 59.3%), with the majority of reassigned DSM-IV residual cases reclassified as DSM-5 anorexia nervosa. Researcher and clinician diagnoses showed moderate inter-rater reliability under DSM-IV (κ=.48) and DSM-5 (κ=.57), though agreement for specific DSM-5 other specified feeding or eating disorder (OSFED) presentations was poor (κ=.05). DSM-5 revisions were associated with significantly less frequent residual eating disorder diagnoses, but not with reduced inter-rater reliability. Findings support specific dimensions of clinical utility for revised DSM-5 criteria for eating disorders. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
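The abstract reports clinician-researcher concordance both as percent agreement and as Cohen's kappa. Below is a minimal Python sketch of the two statistics for two raters assigning nominal diagnoses. The diagnosis labels and ratings are hypothetical and are not the study's data.

```python
from collections import Counter

def percent_agreement(a, b):
    """Proportion of items on which two raters assign the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    p_e = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical diagnoses for six patients.
clinician = ["AN", "AN", "BN", "OSFED", "AN", "BED"]
researcher = ["AN", "OSFED", "BN", "OSFED", "AN", "AN"]
print(percent_agreement(clinician, researcher), cohens_kappa(clinician, researcher))
```

Kappa discounts the agreement expected from each rater's category frequencies alone. This is why a moderate kappa can coexist with a noticeably higher percent agreement.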
Intra-rater reliability of hallux flexor strength measures using the Nintendo Wii Balance Board.

PubMed

Quek, June; Treleaven, Julia; Brauer, Sandra G; O'Leary, Shaun; Clark, Ross A

2015-01-01

The purpose of this study was to investigate the intra-rater reliability of a new method, used in combination with the Nintendo Wii Balance Board (NWBB), for measuring the strength of the hallux flexor muscle. Thirty healthy individuals (age: 34.9 ± 12.9 years, height: 170.4 ± 10.5 cm, weight: 69.3 ± 15.3 kg, female = 15) participated. Repeated testing was completed within 7 days. Participants performed strength testing in sitting using a wooden platform in combination with the NWBB. This new method was set up to selectively recruit an intrinsic muscle of the foot, specifically the flexor hallucis brevis muscle. Statistical analysis was performed using intraclass correlation coefficients and ordinary least product analysis. To estimate measurement error, the standard error of measurement (SEM), minimal detectable change (MDC) and percentage error were calculated. Results indicate excellent intra-rater reliability (ICC = 0.982, CI = 0.96-0.99) with an absence of systematic bias. SEM, MDC and percentage error values were 0.5, 1.4 and 12%, respectively. This study demonstrates that the new method, in combination with the NWBB, is reliable for measuring hallux flexor strength and has potential for use in future research and clinical application.
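SEM and MDC are commonly derived from the reliability coefficient with the following formulas:

- SEM = SD × √(1 − ICC)
- MDC95 = 1.96 × √2 × SEM

The sketch below assumes these conventional formulas; the abstract does not spell out which ones were used. Definitions of "percentage error" vary, so `percent_of_mean` (expressing a value relative to the sample mean) is only one assumed convention. The input SD is hypothetical.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement: between-subject SD times sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)

def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest difference."""
    return 1.96 * math.sqrt(2) * sem

def percent_of_mean(value, mean):
    """Express an error measure as a percentage of the sample mean (one convention among several)."""
    return 100 * value / mean

# Hypothetical SD; the ICC is the value reported in the abstract.
sem = sem_from_icc(sd=3.7, icc=0.982)
mdc = mdc95(sem)
print(f"SEM={sem:.2f}, MDC95={mdc:.2f}")
```

Note that an SEM of 0.5 yields an MDC95 of about 1.39 under these formulas. This is consistent with the reported pair of 0.5 and 1.4.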
Inter-rater Agreement of Clinicians' Treatment Recommendations Based on Modified Barium Swallow Study Reports.

PubMed

Slovarp, Laurie; Danielson, Jennifer; Liss, Julie

2018-06-07

The modified barium swallow study (MBSS) is a commonly used radiographic procedure for the diagnosis and treatment of swallowing disorders. Despite attempts by dysphagia specialists to standardize the MBSS, most institutions have not adopted such standardized procedures. High variability in assessment patterns arguably contributes to variability in the treatment recommendations made from diagnostic information in the MBSS report. An online survey was distributed to speech-language pathologists (SLPs) participating in American Speech-Language-Hearing Association (ASHA) listservs. Sixty-three SLPs who treat swallowing disorders participated. Participating SLPs reviewed two MBSS reports and chose physiologic treatment targets (e.g., tongue base retraction) based on each report. One report primarily contained symptomatology (e.g., aspiration, pharyngeal residue) with minimal information on impaired physiology (e.g., laryngeal incompetence, reduced hyolaryngeal elevation/excursion). In contrast, the second report contained a clear description of impaired physiology to explain the dysphagia symptoms. Fleiss kappa coefficients were used to analyze inter-rater agreement across the high and low physiology report types. Results revealed significantly higher inter-rater agreement across clinicians when reviewing reports with clear explanation(s) of physiologic impairment relative to reports that primarily focused on symptomatology. Clinicians also reported significantly greater satisfaction and treatment confidence following review of reports with clear description(s) of impaired physiology.
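Fleiss' kappa extends chance-corrected agreement to more than two raters, such as the 63 clinicians in this survey. Below is a minimal sketch, assuming every subject is rated by the same number of raters into a fixed set of categories. The input is a subjects-by-categories table of counts, and the small example table is hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a subjects-by-categories table.

    counts[i][j] is the number of raters who assigned subject i to category j;
    every subject must have the same total number of ratings (at least 2)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    # Per-subject agreement: fraction of ordered rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Overall share of all assignments falling in each category.
    n_categories = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 4 clinicians choose between two treatment targets for 4 reports.
counts = [[4, 0], [3, 1], [0, 4], [2, 2]]
print(round(fleiss_kappa(counts), 3))
```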
Inter-Rater Reliability of Provider Interpretations of Irritable Bowel Syndrome Food and Symptom Journals

PubMed Central

Chung, Chia-Fang; Xu, Kaiyuan; Dong, Yi; Schenk, Jeanette M.; Cain, Kevin; Munson, Sean; Heitkemper, Margaret M.

2017-01-01

There are currently no standardized methods for identifying trigger food(s) from irritable bowel syndrome (IBS) food and symptom journals. The primary aim of this study was to assess the inter-rater reliability of providers’ interpretations of IBS journals. A second aim was to describe whether these interpretations varied for each patient. Eight providers reviewed 17 IBS journals and rated how likely key food groups (fermentable oligosaccharides, disaccharides, monosaccharides and polyols; high-calorie; gluten; caffeine; high-fiber) were to trigger IBS symptoms for each patient. Agreement of trigger food ratings was calculated using Krippendorff’s α-reliability estimate. Providers were also asked to write down recommendations they would give to each patient. Estimates of agreement of trigger food likelihood ratings were poor (average α = 0.07). Most providers gave similar trigger food likelihood ratings for over half the food groups. Four providers gave the exact same written recommendation(s) (range 3–7) to over half the patients. Inter-rater reliability of provider interpretations of IBS food and symptom journals was poor. Providers favored certain trigger food likelihood ratings and written recommendations. This supports the need for a more standardized method for interpreting these journals and/or more rigorous techniques to accurately identify personalized IBS food triggers. PMID:29113044
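Krippendorff's alpha compares the disagreement observed within units with the disagreement expected across all pairable values, and it tolerates missing ratings. The abstract does not state which difference metric was used. The sketch below assumes the interval (squared-difference) metric, treating likelihood ratings as numeric; an ordinal metric may suit Likert-type ratings better. The ratings are hypothetical.

```python
from itertools import permutations

def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    units: one list per rated unit containing the numeric ratings it received;
    missing ratings are simply left out. Units with fewer than 2 ratings are
    not pairable and are ignored."""
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one unit with two or more ratings")
    # Observed disagreement: ordered within-unit pairs, each unit weighted by 1/(m_u - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n
    # Expected disagreement: ordered pairs drawn from all pairable values.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all values are identical")
    return 1 - d_o / d_e

# Hypothetical likelihood ratings (1-5) from up to three providers for four units;
# the last unit has a missing rating.
units = [[1, 2, 1], [3, 3, 4], [2, 4, 3], [5, 4]]
print(round(krippendorff_alpha_interval(units), 3))
```

Alpha is 1 for perfect agreement, near 0 when agreement is at chance level (as with the reported average of 0.07), and negative for systematic disagreement.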
CD-ROM: A New Light for the Blind and Visually Impaired.

ERIC Educational Resources Information Center

Mates, Barbara T.

1990-01-01

Describes ways of using CD-ROM technology for the benefit of blind and visually impaired library patrons. Science, reference, and American historical documents that can be converted to braille, large print, or voice output from CD-ROMs are described, and hardware, software, and staff considerations are discussed. (LRW)
On Praising Convergent Thinking: Creativity as Blind Variation and Selective Retention

ERIC Educational Resources Information Center

Simonton, Dean Keith

2015-01-01

Arthur Cropley (2006) emphasized the critical place that convergent thinking has in creativity. Although he briefly refers to the blind variation and selective retention (BVSR) theory of creativity, his discussion could not reflect the most recent theoretical and empirical developments in BVSR, especially the resulting combinatorial models.…
Validity and intra-rater reliability of an Android phone application to measure cervical range-of-motion

PubMed Central

2014-01-01

Background: Concurrent validity and intra-rater reliability of a customized Android phone application for measuring cervical-spine range-of-motion (ROM) have not previously been validated against a gold-standard three-dimensional motion analysis (3DMA) system. Findings: Twenty-one healthy individuals (age: 31 ± 9.1 years; male: 11) participated, with 16 re-examined for intra-rater reliability 1–7 days later. An Android phone was fixed on a helmet, which was then securely fastened on the participant’s head. Cervical-spine movements in flexion, extension, lateral flexion and rotation were performed in sitting, with concurrent measurements obtained from both a 3DMA system and the phone. The phone demonstrated moderate to excellent (ICC = 0.53-0.98, Spearman ρ = 0.52-0.98) concurrent validity for ROM measurements in cervical flexion, extension, lateral flexion and rotation. However, cervical rotation demonstrated both proportional and fixed bias. Excellent intra-rater reliability was demonstrated for cervical flexion, extension and lateral flexion (ICC = 0.82-0.90), but reliability was poor for right and left rotation (ICC = 0.05-0.33) using the phone. A possible explanation is that flexion, extension and lateral-flexion measurements are detected by gravity-dependent accelerometers, whereas rotation measurements are detected by the magnetometer, which can be adversely affected by surrounding magnetic fields. Conclusion: The results of this study demonstrate that the tested Android phone application is valid and reliable for measuring cervical-spine ROM in flexion, extension and lateral flexion, but not in rotation, likely because of magnetic interference. The clinical implication of this study is that therapists should be mindful of the plane of measurement when using the Android phone to measure cervical-spine ROM. PMID:24742001
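The abstract distinguishes fixed bias from proportional bias. One common way to check both is a Bland-Altman analysis; this is an assumption, since the abstract does not state the exact procedure used. In that analysis:

- the mean of the paired differences estimates fixed bias;
- a non-zero slope when the differences are regressed on the pair means suggests proportional bias.

The sketch below uses hypothetical ROM values.

```python
import statistics

def bland_altman(device, reference):
    """Bland-Altman summary for paired measurements from two methods.

    Returns the mean difference (fixed bias), the 95% limits of agreement and
    the least-squares slope of the differences on the pair means, which is one
    common check for proportional bias."""
    if len(device) != len(reference) or len(device) < 3:
        raise ValueError("need at least 3 paired measurements")
    diffs = [d - r for d, r in zip(device, reference)]
    means = [(d + r) / 2 for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    mean_of_means = statistics.mean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    if sxx == 0:
        raise ValueError("pair means do not vary; slope is undefined")
    slope = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs)) / sxx
    return bias, limits, slope

# Hypothetical ROM in degrees: the phone reads 10% high relative to motion capture.
phone = [44, 55, 66, 77]
motion_capture = [40, 50, 60, 70]
print(bland_altman(phone, motion_capture))
```

A constant offset shows up only in the bias. A multiplicative error, as in the example, also produces a positive slope.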
Validity and intra-rater reliability of an android phone application to measure cervical range-of-motion.

PubMed

Quek, June; Brauer, Sandra G; Treleaven, Julia; Pua, Yong-Hao; Mentiplay, Benjamin; Clark, Ross Allan

2014-04-17

Concurrent validity and intra-rater reliability of a customized Android phone application for measuring cervical-spine range-of-motion (ROM) have not previously been validated against a gold-standard three-dimensional motion analysis (3DMA) system. Twenty-one healthy individuals (age: 31 ± 9.1 years; male: 11) participated, with 16 re-examined for intra-rater reliability 1-7 days later. An Android phone was fixed on a helmet, which was then securely fastened on the participant's head. Cervical-spine movements in flexion, extension, lateral flexion and rotation were performed in sitting, with concurrent measurements obtained from both a 3DMA system and the phone. The phone demonstrated moderate to excellent (ICC = 0.53-0.98, Spearman ρ = 0.52-0.98) concurrent validity for ROM measurements in cervical flexion, extension, lateral flexion and rotation. However, cervical rotation demonstrated both proportional and fixed bias. Excellent intra-rater reliability was demonstrated for cervical flexion, extension and lateral flexion (ICC = 0.82-0.90), but reliability was poor for right and left rotation (ICC = 0.05-0.33) using the phone. A possible explanation is that flexion, extension and lateral-flexion measurements are detected by gravity-dependent accelerometers, whereas rotation measurements are detected by the magnetometer, which can be adversely affected by surrounding magnetic fields. The results of this study demonstrate that the tested Android phone application is valid and reliable for measuring cervical-spine ROM in flexion, extension and lateral flexion, but not in rotation, likely because of magnetic interference. The clinical implication of this study is that therapists should be mindful of the plane of measurement when using the Android phone to measure cervical-spine ROM.
On the functional order of binocular rivalry and blind spot filling-in.

PubMed

Qian, Cheng S; Brascamp, Jan W; Liu, Taosheng

2017-07-01

Binocular rivalry is an important phenomenon for understanding the mechanisms of visual awareness. Here we assessed the functional locus of binocular rivalry relative to blind spot filling-in, which is thought to take place in V1, thus providing a reference point for assessing the locus of rivalry. We conducted two experiments to explore the functional order of binocular rivalry and blind spot filling-in. Experiment 1 examined whether the information filled in at the blind spot can engage in rivalry with a physical stimulus at the corresponding location in the fellow eye. Participants' perceptual reports showed no difference between this condition and a condition where filling-in was precluded by presenting the same stimuli away from the blind spot, suggesting that the rivalry process is not influenced by any filling-in that might occur. In Experiment 2, we presented the fellow eye's stimulus directly in rivalry with the 'inducer' stimulus that surrounds the blind spot, and compared it with two control conditions away from the blind spot: one involving a ring physically identical to the inducer, and one involving a disc that resembled the filled-in percept. Perceptual reports in the blind spot condition resembled those in the 'ring' condition more than those in the 'disc' condition, indicating that a perceptually suppressed inducer does not engender filling-in. Thus, our behavioral data suggest that binocular rivalry functionally precedes blind spot filling-in. We conjecture that the neural substrate of binocular rivalry suppression includes processing stages at or before V1. Copyright © 2017 Elsevier Ltd. All rights reserved.
Rater Evaluations for Psychiatric Instruments and Cultural Differences: The PANSS in China and United States

PubMed Central

Aggarwal, Neil Krishan; Zhang, Xiang Yang; Stefanovics, Elina; Chen, Da Chun; Xiu, Mei Hong; Xu, Ke; Rosenheck, Robert A.

2013-01-01

This article compares Positive and Negative Syndrome Scale (PANSS) data from Chinese and American inpatients with chronic schizophrenia to show how differences in item ratings may reflect the cultural attitudes of raters. The Chinese sample (N=504) came from Beijing Huilongguan Hospital. The American sample came from 268 PANSS assessments of CATIE subjects hospitalized for 15 days or more, to optimize the equivalence of the samples. Controlling for age and gender, the Chinese sample scored significantly lower on the total score (by 25%, p<.0001), the positive sub-scale (by 35%, p<.0001) and the general sub-scale (by 32%, p<.0001), but did not differ significantly on the negative sub-scale score (+0.26%, p=0.76). However, the Chinese sample scored 26% higher on the item on poor rapport (p<.0001), 10.2% higher on passive social withdrawal (p=.003), and, most notably, 46% higher on the item on lack of judgment and insight (p<.0001). These results remain broadly consistent across gender sub-group analyses. The differences seem best explained both by cultural differences in patients' clinical presentations and by varying American and Chinese cultural values affecting rater judgment. PMID:22922237
CRM Assessment: Determining the Generalization of Rater Calibration Training. Summary of Research Report: Gold Standards Training

NASA Technical Reports Server (NTRS)

Baker, David P.

2002-01-01

The extent to which pilot instructors are trained to assess crew resource management (CRM) skills accurately during Line-Oriented Flight Training (LOFT) and Line Operational Evaluation (LOE) scenarios is critical. Pilot instructors must make accurate performance ratings to ensure that proper feedback is provided to flight crews and appropriate decisions are made regarding certification to fly the line. Furthermore, the Federal Aviation Administration's (FAA) Advanced Qualification Program (AQP) requires that instructors be trained explicitly to evaluate both technical and CRM performance (i.e., rater training) and also requires that proficiency and standardization of instructors be verified periodically. To address the critical need for effective pilot instructor training, the American Institutes for Research (AIR) reviewed the relevant research on rater training and, based on "best practices" from this research, developed a new strategy for training pilot instructors to assess crew performance. In addition, we explored new statistical techniques for assessing the effectiveness of pilot instructor training. The results of our research are briefly summarized below. This summary is followed by abstracts of articles and book chapters published under this grant.
Use of volunteer student abstractors for a retrospective cohort analysis: a study of inter-rater reliability.

PubMed

Gritsiouk, Yaroslav; Hegsted, Damian; Gardiner, Stuart; Merriman, Lisa; Gubler, Kelly Dean

2013-05-01

Little is known about the reliability of data collected by abstractors without professional medical training. This investigation sought to determine the level of agreement among untrained volunteer abstractors as part of a study to evaluate the risk assessment of venous thromboembolism in patients who have undergone trauma. Forty-nine paper charts were chosen randomly from a volunteer-reviewed cohort of 2,339 and were compared with those of a single experienced abstractor. Inter-rater agreement was assessed using percent agreement, Cohen's kappa, and prevalence-adjusted bias-adjusted kappa (PABAK). Of the 71 data points, 28 had perfect agreement. The average agreement across all charts was 97%. Data with imperfect agreement had kappa values between .27 and .96 (mean, .75), with one additional value at zero even though it was associated with an agreement of 94%. PABAK values ranged from .67 to .98 (mean, .91), an average increase of .17 compared with kappa values. The performance of volunteers showed outstanding inter-rater reliability; however, limitations of interpretation can influence reliability. Copyright © 2013 Elsevier Inc. All rights reserved.
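The kappa of zero alongside 94% agreement is the prevalence problem that PABAK is meant to address. When almost every chart has the same value, the agreement expected by chance is already close to the observed agreement, so kappa collapses. The minimal sketch below reproduces that pattern for one binary data point, using hypothetical codes for 50 charts. For two categories, PABAK equals 2 × (observed agreement) − 1.

```python
def binary_agreement(a, b):
    """Percent agreement, Cohen's kappa and PABAK for two raters' 0/1 codes."""
    if len(a) != len(b) or not a:
        raise ValueError("codes must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n          # each rater's proportion of 1s
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)    # chance agreement from the marginals
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    pabak = 2 * p_o - 1                        # kappa with chance agreement fixed at 0.5
    return p_o, kappa, pabak

# Hypothetical: the volunteer codes "present" on all 50 charts; the expert disagrees on 3.
volunteer = [1] * 50
expert = [1] * 47 + [0] * 3
print(binary_agreement(volunteer, expert))
```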
Toronto Bariatric Interprofessional Psychosocial Assessment Suitability Scale: Evaluating A New Clinical Assessment Tool for Bariatric Surgery Candidates.

PubMed

Thiara, Gurneet; Yanofksy, Richard; Abdul-Kader, Sayed; Santiago, Vincent A; Cassin, Stephanie; Okrainec, Allan; Jackson, Timothy; Hawa, Raed; Sockalingam, Sanjeev

2016-01-01

Patients who are referred for possible bariatric surgery (BS) intervention undergo a series of assessments conducted by an interdisciplinary health care team to determine suitability for surgery. Herein, we report the initial validation and reliability studies of the Bariatric Interprofessional Psychosocial Assessment Suitability Scale (BIPASS) and its relationship to interdisciplinary psychosocial assessment practices for BS. This study was conducted at the Toronto Western Hospital, a Level 1A BS center of excellence accredited by the American College of Surgeons. Phase I: a total of 4 blinded raters applied the BIPASS to 31 randomly selected BS cases referred to our program to establish interrater reliability. Phase II: in all, 3 raters with clinical experience in bariatric psychosocial care applied the BIPASS to 54 randomly selected BS cases. In total, 46 of 54 (85.1%) patients were women. The median age of all patient cases was 49 years (range: 21-74). Raters' BIPASS scores ranged from 4-52 (median = 19.24, standard deviation = 10.38). BIPASS scores were highly predictive of the BS psychosocial outcome (area under curve = 0.915; 95% CI: 0.844-0.985; p < 0.001). A BIPASS score of ≥16 was chosen as the cutoff score for further clinical assessment before proceeding with surgical evaluation, based on a receiver operating characteristic curve analysis (sensitivity = 0.839; specificity = 0.783). The instrument has very good interrater reliability (Pearson correlation coefficient = 0.847), even among novice raters. The findings show that the BIPASS is a comprehensive screening tool in the psychosocial assessment of BS candidates, which standardizes the evaluation process and systematically identifies at-risk patients for negative outcomes after BS. Copyright © 2016 The Academy of Psychosomatic Medicine. Published by Elsevier Inc. All rights reserved.
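The cutoff of 16 is reported as the result of a receiver operating characteristic analysis, together with its sensitivity and specificity. The sketch below shows how sensitivity and specificity follow from a "score ≥ cutoff" rule. It then selects the observed score that maximizes Youden's J, which is one common way to pick a cutoff from an ROC analysis. This criterion is an assumption: the paper does not say which criterion it used. The scores and outcomes are hypothetical.

```python
def sens_spec_at_cutoff(scores, outcomes, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' counts as screen-positive.

    outcomes[i] is True when case i had the outcome being screened for."""
    tp = sum(1 for s, y in zip(scores, outcomes) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, outcomes) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, outcomes) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, outcomes) if not y and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the outcome")
    return tp / (tp + fn), tn / (tn + fp)

def youden_cutoff(scores, outcomes):
    """Observed score maximising Youden's J = sensitivity + specificity - 1."""
    return max(sorted(set(scores)),
               key=lambda c: sum(sens_spec_at_cutoff(scores, outcomes, c)) - 1)

# Hypothetical scores and outcomes for eight candidates.
scores = [5, 10, 14, 16, 18, 22, 30, 12]
outcomes = [False, False, False, True, True, True, True, False]
cutoff = youden_cutoff(scores, outcomes)
print(cutoff, sens_spec_at_cutoff(scores, outcomes, cutoff))
```

Raising the cutoff trades sensitivity for specificity. Other criteria, such as fixing a minimum sensitivity for a screening tool, can reasonably select a different point.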
On Individual Differences in Person Perception: Raters' Personality Traits Relate to Their Psychopathy Checklist-Revised Scoring Tendencies

ERIC Educational Resources Information Center

Miller, Audrey K.; Rufino, Katrina A.; Boccaccini, Marcus T.; Jackson, Rebecca L.; Murrie, Daniel C.

2011-01-01

This study investigated raters' personality traits in relation to scores they assigned to offenders using the Psychopathy Checklist-Revised (PCL-R). A total of 22 participants, including graduate students and faculty members in clinical psychology programs, completed a PCL-R training session, independently scored four criminal offenders using the…
Prevalence and causes of severe visual impairment and blindness among children in the Lorestan province of Iran, using the key informant method.

PubMed

Razavi, Hessom; Kuper, Hannah; Rezvan, Farhad; Amelie, Khatere; Mahboobi-Pur, Hassan; Oladi, Mohammad Reza; Muhit, Mohammad; Hashemi, Hassan

2010-03-01

To estimate the prevalence and causes of severe visual impairment and blindness among children in Lorestan province of Iran, and to assess the feasibility of the Key Informant Method in this setting. Potential cases were identified using the Key Informant Method in 3 counties of Lorestan province during June through August 2008 and referred for examination. Causes of severe visual impairment/blindness were determined and categorized using standard World Health Organization methods. Of 123 children referred for examination, 27 children were confirmed to have severe visual impairment or blindness. The median age was 11 years (interquartile range 6-13), and 59% were girls. After adjusting for non-attenders, the estimated prevalence of severe visual impairment/blindness was 0.04% (0.03-0.05). The main site of abnormality was the retina (44%), followed by disorders of the whole eye (33%). The majority of causes had a hereditary etiology (70%), which was associated with a family history of blindness (P = 0.002). Potentially avoidable causes of severe visual impairment/blindness were found in 14 children (52%). Almost all children with severe visual impairment/blindness had a history of parental consanguinity (93%). Our findings suggest a moderate prevalence of childhood blindness in Lorestan province of Iran, a high proportion of which may be avoidable, given improved access to ophthalmic and genetic counselling services in rural areas. The Key Informant Method is feasible in Iran; future research is discussed.
Validity and intra-rater reliability of MyJump app on iPhone 6s in jump performance.

PubMed

Stanton, Robert; Wintour, Sally-Anne; Kean, Crystal O

2017-05-01

Smartphone applications are increasingly used by researchers, coaches, athletes and clinicians. The aim of this study was to examine the concurrent validity and intra-rater reliability of the smartphone-based application MyJump against laboratory-based force plate measurements. Cross-sectional study. Participants completed counter-movement jumps (CMJ) (n=29) and 30cm drop jumps (DJ) (n=27) on a force plate, which were simultaneously recorded using MyJump. To assess concurrent validity, jump height derived from the flight time acquired by each device was compared for each jump type. Intra-rater reliability was determined by replicating the data analysis of MyJump recordings on two occasions separated by seven days. CMJ and DJ heights derived from MyJump showed excellent agreement with the force plate (ICC values ranged from 0.991 for CMJ to 0.993 for DJ). However, mean DJ height from the force plate was significantly higher than that from MyJump (mean difference: 0.87cm, 95% CI: 0.69-1.04cm). Intra-rater reliability of MyJump for both CMJ and DJ was almost perfect (ICC values ranged from 0.997 for CMJ to 0.998 for DJ); however, mean CMJ and DJ jump heights for Day 1 were significantly higher than for Day 2 (CMJ: 0.43cm, 95% CI: 0.23-0.62cm; DJ: 0.38cm, 95% CI: 0.23-0.53cm). The present study finds MyJump to be a valid and highly reliable tool for researchers, coaches, athletes and clinicians; however, systematic bias should be considered when comparing MyJump outputs with other testing devices. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
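Both devices derive jump height from flight time. The standard projectile relation is h = g·t²/8, which assumes that the body is in the same position at take-off and landing. The sketch below applies it. As an additional assumption, it takes flight time from the take-off and landing frames of a slow-motion video. The function names, frame numbers and frame rate are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)

def flight_time_from_frames(takeoff_frame, landing_frame, fps):
    """Flight time in seconds from the video frames marking take-off and landing."""
    if landing_frame <= takeoff_frame or fps <= 0:
        raise ValueError("landing must follow take-off and fps must be positive")
    return (landing_frame - takeoff_frame) / fps

def jump_height_cm(flight_time_s, g=G):
    """Jump height in cm from flight time, h = g * t^2 / 8."""
    return 100 * g * flight_time_s ** 2 / 8

# Hypothetical: 120 frames of flight filmed at 240 frames per second.
t = flight_time_from_frames(1200, 1320, 240)
print(t, round(jump_height_cm(t), 2))
```

Because height scales with t², small timing errors matter. At 240 frames per second, one frame is about 4.2 ms. This is one plausible source of the small systematic differences between devices and between analysis days.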
Color blindness

MedlinePlus

Color deficiency; Blindness - color ... Color blindness occurs when there is a problem with the pigments in certain nerve cells of the eye that sense color. These cells are called cones. They are found ...
Overview on Deaf-Blindness.

ERIC Educational Resources Information Center

Miles, Barbara

1995-01-01

This overview provides basic information on the causes of deaf-blindness and the particular challenges faced by individuals with deaf-blindness. Causes of deaf-blindness include various syndromes, multiple congenital anomalies, prematurity, congenital prenatal dysfunction, and various postnatal causes. Differences between people deaf-blind from…

Using Consensus Building Procedures with Expert Raters to Establish Comparison Scores of Behavior for Direct Behavior Rating

ERIC Educational Resources Information Center

Jaffery, Rose; Johnson, Austin H.; Bowler, Mark C.; Riley-Tillman, T. Chris; Chafouleas, Sandra M.; Harrison, Sayward E.

2015-01-01

To date, rater accuracy when using Direct Behavior Rating (DBR) has been evaluated by comparing DBR-derived data to scores yielded through systematic direct observation. The purpose of this study was to evaluate an alternative method for establishing comparison scores using expert-completed DBR alongside best practices in consensus building…
Choice of Target Population Weights in Rater Comparability Scoring and Equating. Research Report. ETS RR-13-03

ERIC Educational Resources Information Center

Puhan, Gautam

2013-01-01

The purpose of this study was to demonstrate that the choice of sample weights when defining the target population under poststratification equating can be a critical factor in determining the accuracy of the equating results under a unique equating scenario, known as "rater comparability scoring and equating." The nature of data…
A comparison of seminar and computer based training on the accuracy and reliability of raters using the Children's Global Assessment Scale (CGAS).

PubMed

Lundh, Anna; Kowalski, Jan; Sundberg, Carl Johan; Landén, Mikael

2012-11-01

The aim of this study was to compare two methods of conducting CGAS rater training. A total of 648 raters were randomized to training (CD or seminar) and rated five cases before and 12 months after training. The ICC at baseline/end of study was 0.71/0.78 (seminar), 0.76/0.78 (CD), and 0.67/0.79 (comparison). There were no differences in training effect in terms of agreement with expert ratings, which speaks in favor of using the less resource-demanding CD. However, the effect was modest in both groups, and the untrained comparison group improved by a similar magnitude, which suggests that more extensive training is needed.
Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

PubMed Central

Hallgren, Kevin A.

2012-01-01

Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR. PMID:22833776
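The tutorial's computational examples use SPSS and R. As a language-neutral sketch, the Python function below computes one common ICC form from the two-way ANOVA mean squares: ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single rater). Other forms, such as consistency or average-of-raters ICCs, use different formulas, and choosing among them is a study-design decision. The example ratings are illustrative only.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score subject i received from rater j; the table must be
    complete (every rater scores every subject)."""
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative scores: six subjects (rows) rated by four raters (columns).
ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(ratings), 2))
```

Here the raters differ strongly in their overall level (large column effect). The absolute-agreement ICC penalizes this, whereas a consistency ICC would not.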
Daily Behavior Report Cards: An Investigation of the Consistency of On-Task Data across Raters and Methods

ERIC Educational Resources Information Center

Chafouleas, Sandra M.; Riley-Tillman, T. Chris; Sassu, Kari A.; LaFrance, Mary J.; Patwa, Shamim S.

2007-01-01

In this study, the consistency of on-task data collected across raters using either a Daily Behavior Report Card (DBRC) or systematic direct observation was examined to begin to understand the decision reliability of using DBRCs to monitor student behavior. Results suggested very similar conclusions might be drawn when visually examining data…
Blindness and visual impairment in the Americas and the Caribbean

PubMed Central

Muñoz, B; West, S K

2002-01-01

Aim: To summarise available data on the prevalence and causes of visual impairment and blindness in the Americas and the Caribbean. Methods: The published literature was searched in Medline and LILACS using the following key words: blindness, visual impairment, prevalence. Articles were reviewed, and their reference lists were also searched for relevant articles, which were also reviewed. Results: Using mortality in children under the age of 5 as an indicator, the overall prevalence of childhood blindness (in the under-15 age group) for the region was estimated at 0.45/1000, with the majority (67%) living in countries with under-5 mortality above 30/1000 live births. Corneal opacities were more common in countries where under-5 mortality is above 30/1000 live births, and retinopathy of prematurity (ROP) was an important cause in countries with intermediate death rates. For adults, overall blindness rates were not estimated because of the social, economic, and ethnic diversity in the region. The primary causes of visual loss in adults in the Americas were age related eye diseases, notably cataract and glaucoma in the African-American and Hispanic populations, and age related macular degeneration in the white population. Uncorrected refractive error was a significant cause of decreased vision across ages, ethnic groups, and countries. Conclusion: More data are needed on the magnitude and causes of visual loss for the Caribbean and Latin American countries. Rates of blindness and visual loss from available data within these countries are widely disparate. Prevention and control of avoidable blindness needs to be an ongoing focus in this region. PMID:11973241
Global data on blindness.

PubMed Central

Thylefors, B.; Négrel, A. D.; Pararajasegaram, R.; Dadzie, K. Y.

1995-01-01

Globally, it is estimated that there are 38 million persons who are blind. Moreover, a further 110 million people have low vision and are at great risk of becoming blind. The main causes of blindness and low vision are cataract, trachoma, glaucoma, onchocerciasis, and xerophthalmia; however, insufficient data on blindness from causes such as diabetic retinopathy and age-related macular degeneration preclude specific estimations of their global prevalence. The age-specific prevalences of the major causes of blindness that are related to age indicate that the trend will be for an increase in such blindness over the decades to come, unless energetic efforts are made to tackle these problems. More data collected through standardized methodologies, using internationally accepted (ICD-10) definitions, are needed. Data on the incidence of blindness due to common causes would be useful for calculating future trends more precisely. PMID:7704921
Effects of questionnaire-based diagnosis and training on inter-rater reliability among practitioners of traditional Chinese medicine.

PubMed

Mist, Scott; Ritenbaugh, Cheryl; Aickin, Mikel

2009-07-01

To investigate whether a training process that focused on a questionnaire-based diagnosis in Traditional Chinese Medicine (TCM), and developing diagnostic consensus, would improve the agreement of TCM diagnoses among 10 TCM practitioners evaluating patients with temporomandibular joint disorder (TMJD). Evaluation of a diagnostic training program at the Department of Family and Community Medicine, University of Arizona, Tucson, Arizona, and the Oregon College of Oriental Medicine, Portland, Oregon. Screened participants for a study of TCM for TMJD. PRACTITIONERS: Ten (10) licensed acupuncturists with a minimum of 5 years licensure and education in Chinese herbs. A training session using a questionnaire-based diagnostic form was conducted, followed by waves of diagnostic sessions. Between sessions, practitioners discussed the results of the previous round of participants with a focus on reducing variability in primary diagnosis and severity rating of each diagnosis: 3 waves of 5 patients were assessed by 4 practitioner pairs for a total of 120 diagnoses. At 18 months, practitioners completed a recalibration exercise with a similar format with a total of 32 diagnoses. These diagnoses were then examined with respect to the rate of agreement among the 10 practitioners using inter-rater correlations and kappas. The inter-rater correlation with respect to the TCM diagnoses among the 10 practitioners increased from 0.112 to 0.618 with training. Statistically significant improvements were found between the baseline and 18 month exercises (p < 0.01). Inter-rater reliability of TCM diagnosis may be improved through a training process and a questionnaire-based diagnosis process. The improvements varied by diagnosis, with the greatest congruence among primary and more severe diagnoses. Future TCM studies should consider including calibration training to improve the validity of results.
Inter-rater reliability and aspects of validity of the parent-infant relationship global assessment scale (PIR-GAS)

PubMed Central

2013-01-01

Background: The Parent-Infant Relationship Global Assessment Scale (PIR-GAS) represents a conceptually relevant development in the multi-axial, developmentally sensitive classification system DC:0-3R for preschool children. However, information about the reliability and validity of the PIR-GAS is scarce. A review of the available empirical studies suggests that, in research, PIR-GAS ratings can be based on a ten-minute videotaped interaction sequence. Rater qualifications may also vary considerably across studies. Methods: To test whether the PIR-GAS still allows a reliable assessment of the parent-infant relationship under these conditions, we compared PIR-GAS ratings based on a full-information procedure across multiple settings with ratings of a ten-minute video made by two medical doctoral candidates. For each mother-child dyad at a family day hospital (N = 48), we obtained two video ratings and one full-information rating, both at admission to therapy and at discharge. This pre-post design allowed us to replicate our findings across the two measurement points. We focused on inter-rater reliability between the video coders, and between the video and full-information procedures, including mean differences and correlations between the raters. Additionally, we examined aspects of the validity of the video and full-information ratings based on their correlations with measures of child and maternal psychopathology. Results: PIR-GAS ratings based on a ten-minute video and those based on full information were not interchangeable. Most results at admission were replicated in the data obtained at discharge. We concluded that a more standardized assessment procedure should increase the reliability of the PIR-GAS, and a more thorough theoretical foundation of the manual should increase its validity. PMID:23705962
A Study of the Use of the "e-rater"® Scoring Engine for the Analytical Writing Measure of the "GRE"® revised General Test. Research Report. ETS RR-14-24

ERIC Educational Resources Information Center

Breyer, F. Jay; Attali, Yigal; Williamson, David M.; Ridolfi-McCulla, Laura; Ramineni, Chaitanya; Duchnowski, Matthew; Harris, April

2014-01-01

In this research, we investigated the feasibility of implementing the "e-rater"® scoring engine as a check score in place of all-human scoring for the "Graduate Record Examinations"® ("GRE"®) revised General Test (rGRE) Analytical Writing measure. This report provides the scientific basis for the use of e-rater as a…
20 CFR 416.1710 - Whom we refer and when.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 20 Employees' Benefits; Section 416.1710, Whom we refer and when. Social Security Administration; Supplemental Security Income for the Aged, Blind, and Disabled; Referral of Persons Eligible for Supplemental Security Income to Other Agencies; Referral for Vocational Rehabilitation Services §...
The Intra-Rater Reliability of Nine Content-Validated Technical Skill Assessment Instruments (TSAI) for Athletic Taping Skills

ERIC Educational Resources Information Center

Lagumen, Niko G.; Butterwick, Dale J.; Paskevich, David M.; Fung, Tak S.; Donnon, Tyrone L.

2008-01-01

Objective: To establish the intra-rater reliability of nine content-validated Technical Skill Assessment Instruments (TSAI) for the skills of athletic taping. Setting: University of Calgary. Subjects: Canadian Certified Athletic Therapists, CAT(C), with a mean ± SD of 9.6 ± 10.8 years as a CAT(C), 7.8 ± 10.9 years as a Supervisory Athletic…
Blind Source Parameters for Performance Evaluation of Despeckling Filters.

PubMed

Biradar, Nagashettappa; Dewal, M L; Rohit, ManojKumar; Gowre, Sanjaykumar; Gundge, Yogesh

2016-01-01

Speckle noise is inherent in transthoracic echocardiographic images, and no standard noise-free reference echocardiographic image exists. Evaluating filters with traditional parameters such as peak signal-to-noise ratio, mean square error, and structural similarity index may therefore not reflect their true performance on echocardiographic images. Instead, despeckling performance can be evaluated using blind assessment metrics such as the speckle suppression index, the speckle suppression and mean preservation index (SMPI), and the beta metric, which remove the need for a noise-free reference image. This paper presents a comprehensive analysis and evaluation of eleven types of despeckling filters for echocardiographic images in terms of blind and traditional performance parameters, along with clinical validation. The noise is effectively suppressed using logarithmic neighborhood shrinkage (NeighShrink) embedded with Stein's unbiased risk estimation (SURE). The SMPI is three times more effective compared with the wavelet-based generalized likelihood estimation approach. The quantitative evaluation and clinical validation reveal that filters such as the nonlocal mean, posterior sampling based Bayesian estimation, hybrid median, and probabilistic patch based filters are acceptable, whereas median, anisotropic diffusion, fuzzy, and Ripplet nonlinear approximation filters have limited applications for echocardiographic images.
Blind Source Parameters for Performance Evaluation of Despeckling Filters

PubMed Central

Biradar, Nagashettappa; Dewal, M. L.; Rohit, ManojKumar; Gowre, Sanjaykumar; Gundge, Yogesh

2016-01-01

Speckle noise is inherent in transthoracic echocardiographic images, and no standard noise-free reference echocardiographic image exists. Evaluating filters with traditional parameters such as peak signal-to-noise ratio, mean square error, and structural similarity index may therefore not reflect their true performance on echocardiographic images. Instead, despeckling performance can be evaluated using blind assessment metrics such as the speckle suppression index, the speckle suppression and mean preservation index (SMPI), and the beta metric, which remove the need for a noise-free reference image. This paper presents a comprehensive analysis and evaluation of eleven types of despeckling filters for echocardiographic images in terms of blind and traditional performance parameters, along with clinical validation. The noise is effectively suppressed using logarithmic neighborhood shrinkage (NeighShrink) embedded with Stein's unbiased risk estimation (SURE). The SMPI is three times more effective compared with the wavelet-based generalized likelihood estimation approach. The quantitative evaluation and clinical validation reveal that filters such as the nonlocal mean, posterior sampling based Bayesian estimation, hybrid median, and probabilistic patch based filters are acceptable, whereas median, anisotropic diffusion, fuzzy, and Ripplet nonlinear approximation filters have limited applications for echocardiographic images. PMID:27298618
How do trained raters take context factors into account when assessing GP trainee communication performance? An exploratory, qualitative study.

PubMed

Essers, Geurt; Dielissen, Patrick; van Weel, Chris; van der Vleuten, Cees; van Dulmen, Sandra; Kramer, Anneke

2015-03-01

Communication assessment in real-life consultations is a complex task. Generic assessment instruments help but may also have disadvantages. The generic nature of the skills being assessed does not provide indications for context-specific behaviour required in practice situations; context influences are mostly taken into account implicitly. Our research questions are: 1. What factors do trained raters observe when rating workplace communication? 2. How do they take context factors into account when rating communication performance with a generic rating instrument? Nineteen general practitioners (GPs), trained in communication assessment with a generic rating instrument (the MAAS-Global), participated in a think-aloud protocol reflecting concurrent thought processes while assessing videotaped real-life consultations. They were subsequently interviewed to answer questions explicitly asking them to comment on the influence of predefined contextual factors on the assessment process. Results from both data sources were analysed. We used a grounded theory approach to untangle the influence of context factors on GP communication and on communication assessment. Both from the think-aloud procedure and from the interviews we identified various context factors influencing communication, which were categorised into doctor-related (17), patient-related (13), consultation-related (18), and education-related factors (18). Participants had different views and practices on how to incorporate context factors into the GP(-trainee) communication assessment. Raters acknowledge that context factors may affect communication in GP consultations, but struggle with how to take contextual influences into account when assessing communication performance in an educational context. To assess practice situations, raters need extra guidance on how to handle specific contextual factors.
Six of one, half a dozen of the other: A measure of multidisciplinary inter/intra-rater reliability of the society for fetal urology and urinary tract dilation grading systems for hydronephrosis.

PubMed

Rickard, Mandy; Easterbrook, Bethany; Kim, Soojin; Farrokhyar, Forough; Stein, Nina; Arora, Steven; Belostotsky, Vladamir; DeMaria, Jorge; Lorenzo, Armando J; Braga, Luis H

2017-02-01

The urinary tract dilation (UTD) classification system was introduced to standardize terminology in the reporting of hydronephrosis (HN), and bridge a gap between pre- and postnatal classification such as the Society for Fetal Urology (SFU) grading system. Herein we compare the intra/inter-rater reliability of both grading systems. SFU (I-IV) and UTD (I-III) grades were independently assigned by 13 raters (9 pediatric urology staff, 2 nephrologists, 2 radiologists), twice, 3 weeks apart, to 50 sagittal postnatal ultrasonographic views of hydronephrotic kidneys. Data regarding ureteral measurements and bladder abnormalities were included to allow proper UTD categorization. Ten images were repeated to assess intra-rater reliability. Krippendorff's alpha coefficient was used to measure overall and by-grade intra/inter-rater reliability. Reliability between specialties and training levels was also analyzed. Overall inter-rater reliability was slightly higher for SFU (α = 0.842, 95% CI 0.812-0.879, in session 1; and α = 0.808, 95% CI 0.775-0.839, in session 2) than for UTD (α = 0.774, 95% CI 0.715-0.827, in session 1; and α = 0.679, 95% CI 0.605-0.750, in session 2). Reliability for intermediate grades (SFU II/III and UTD 2) of HN was poor regardless of the system. Reliabilities for SFU and UTD classifications among Urology, Nephrology, and Radiology, as well as between training levels, were not significantly different. Despite the introduction of HN grading systems to standardize the interpretation and reporting of renal ultrasound in infants with HN, none have been proven superior in allowing clinicians to distinguish between "moderate" grades. While this study demonstrated high reliability in distinguishing between "mild" (SFU I/II and UTD 1) and "severe" (SFU IV and UTD 3) grades of HN, the overall reliability between specialties was poor. This is in keeping with a previous report of modest inter-rater reliability of the SFU system. This drawback is
Evaluation of "e-rater"® for the "Praxis I"® Writing Test. Research Report. ETS RR-15-03

ERIC Educational Resources Information Center

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.

2015-01-01

Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
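The record above lists quadratic weighted kappa among the statistics used to compare automated and human essay scores. As a minimal sketch, assuming integer scores on a known shared scale and the common definition κ_w = 1 − Σ w_ij O_ij / Σ w_ij E_ij with quadratic weights w_ij = (i − j)² / (k − 1)², the Python below computes it from the observed score matrix and its chance expectation. This is not ETS's implementation; the function name and the example scores are hypothetical.

```python
from collections.abc import Sequence


def quadratic_weighted_kappa(
    scores_a: Sequence[int], scores_b: Sequence[int], min_score: int, max_score: int
) -> float:
    """Quadratic weighted kappa between two integer scorings of the same essays."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("the score scale needs at least two points")
    for s in (*scores_a, *scores_b):
        if not min_score <= s <= max_score:
            raise ValueError(f"score {s} is outside [{min_score}, {max_score}]")

    k = max_score - min_score + 1
    n = len(scores_a)

    # Observed matrix: observed[i][j] counts essays scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(scores_a, scores_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms for each scorer.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at opposite ends.
            w = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += w * observed[i][j]
            # Expected count under independence, scaled to the same total n.
            weighted_expected += w * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0.0:
        # Both scorers gave every essay the same single score; kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


# Invented example: a human and an automated engine score six essays on a 1-4 scale.
human = [1, 2, 3, 4, 4, 2]
engine = [1, 2, 3, 3, 4, 1]
print(round(quadratic_weighted_kappa(human, engine, 1, 4), 3))  # 0.87
```

Quadratic weights penalize a two-point disagreement four times as heavily as a one-point disagreement, which is why this variant suits ordinal essay scores better than unweighted kappa.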
The effect of rater training on scoring performance and scale-specific expertise amongst occupational therapists participating in a multicentre study: a single-group pre-post-test study.

PubMed

Hansen, Tina; Elholm Madsen, Esben; Sørensen, Annette

2016-01-01

In order to enhance the quality of the data collected in a multicentre validation study of a revised Danish version of the McGill Ingestive Skills Assessment (MISA), the authors developed a rater training programme. The purpose of the present study was to evaluate the effect of the training on scoring performance and scale-specific expertise amongst raters. During 2 days of rater training, 81 occupational therapists (OTs) were qualified to observe and score dysphagic clients' mealtime performance according to the criteria of 36 MISA items. The training effects were evaluated pre- to post-training using percentage exact agreement (PA) of scored MISA items of a case vignette and a Likert-scale self-report of scale-specific expertise. PA increased significantly from pre- to post-training (Z = -4.404, p < 0.001), although items for which the case vignette reflected deficient mealtime performance appeared most difficult to score. The OTs' scale-specific expertise improved significantly (knowledge: Z = -7.857, p < 0.001 and confidence: Z = -7.838, p < 0.001). Rater training improved OTs' scoring performance when using the Danish MISA as well as their perceived scale-specific expertise. Future rater training should emphasize the items identified as most difficult to score. Additionally, further studies addressing different training approaches and durations are warranted. When occupational therapists (OTs) use the McGill Ingestive Skills Assessment (MISA), they observe, interpret and record the occupational performance of dysphagic clients participating in a meal. This is a highly complex task, which might introduce unwanted variability in measurement scores. A 2-day rater training programme was developed, building on the findings of several studies which suggest that combinations of different training methods tend to yield the most effective results. Participation in the newly developed training programme on how to administer the MISA significantly reduces unwanted
Delivering Effective Instruction to Students with Deaf-Blindness and/or Other Severe Disabilities.

ERIC Educational Resources Information Center

North Carolina State Dept. of Public Instruction, Raleigh.

A guide to identifying, placing, and instructing children with severe disabilities, including deaf-blindness, is presented. Identification and placement information focuses on locating and referring children in need of special education services, the role of committees and staff members, the individualized education program, entrance and placement…
Fear of blindness and perceptions about blind people. The Andhra Pradesh Eye Disease Study.

PubMed

Giridhar, Pyda; Dandona, Rakhi; Prasad, Mudigonda N; Kovai, Vilas; Dandona, Lalit

2002-09-01

This study assessed the fear of being affected by illness and disability including blindness, and perceptions of the population towards blind people in the Indian state of Andhra Pradesh. A total of 11,786 subjects of all ages were sampled from 94 clusters in one urban and three rural study areas of Andhra Pradesh using stratified, random, cluster, systematic sampling to represent the population of this state. A total of 10,293 subjects of all ages underwent a detailed interview and dilated ocular evaluation. Subjects > 15 years of age (7,432) were interviewed regarding fear of illness/disability and their perceptions of blind people. The fear of blindness was assessed in comparison to cancer, severe mental illness, heart attack, losing limbs, deafness, inability to speak, and paralysis. A majority of the study population feared all the illnesses and disabilities assessed. The prevalence of fear of blindness was 90.9% (95% confidence interval 89.1-92.8%) and 92.1% (95% confidence interval 90.6-93.6%) in urban and rural study areas respectively. With multiple logistic regression the fear of blindness was significantly higher for those with any level of education and for those living in the rural study areas. The proportion of those having positive feelings towards blind people was higher in the urban study area. A high prevalence of blindness, 1.84%, has been reported in this population previously. These data suggest that this population feared blindness, and yet there is a high rate of blindness. This reflects the need for increasing awareness about blindness in this population through eye health promotion strategies in order to reduce blindness, and awareness regarding the availability of rehabilitation services.

The Sokoto blind beggars: causes of blindness and barriers to rehabilitation services.

PubMed

Balarabe, Aliyu Hamza; Mahmoud, Abdulraheem O; Ayanniyi, Abdulkabir Ayansiji

2014-01-01

To determine the causes of blindness and the barriers to accessing rehabilitation services (RS) among blind street beggars (bsb) in Sokoto, Nigeria. A cross-sectional survey of 202 bsb (VA < 3/60) using an interviewer-administered questionnaire. The causes of blindness were diagnosed by clinical ophthalmic examination. There were 107 (53%) males and 95 (47%) females, with a mean age of 49 years (SD 12.2). Most bsb, 191 (94.6%), had non-formal education. Of the 190 (94.1%) irreversibly blind bsb, 180/190 (94.7%) had no light perception (NPL) bilaterally. The major causes of blindness were non-trachomatous corneal opacity (60.8%) and trachomatous corneal opacity (12.8%). A total of 166 (82%) were blind from avoidable causes, and 190 (94.1%) were irreversibly blind, with 76.1% due to avoidable causes. The available, sub-standard RS were educational, vocational, and financial support. Past barriers to RS included non-availability 151 (87.8%), inability to afford 2 (1.2%), unfelt need 4 (2.3%), family refusal 1 (0.6%), ignorance 6 (3.5%), and not being linked 8 (4.7%). During the study period, 72 subjects (35.6%) were unable to access RS, 59 (81.9%) of them because of lack of linkage to the existing services. Corneal opacification was the major cause of blindness among bsb. The main challenges to RS include inadequate available services and societal and user factors. Renewed efforts are warranted toward the prevention of avoidable causes of blindness, especially corneal opacities. The quality of life of blind street beggars should be improved through available, accessible, affordable, well-maintained, and sustained rehabilitation services.
Telemedicine Physical Examination Utilizing a Consumer Device Demonstrates Poor Concordance with In-Person Physical Examination in Emergency Department Patients with Sore Throat: A Prospective Blinded Study.

PubMed

Akhtar, Moneeb; Van Heukelom, Paul G; Ahmed, Azeemuddin; Tranter, Rachel D; White, Erinn; Shekem, Nathaniel; Walz, David; Fairfield, Catherine; Vakkalanka, J Priyanka; Mohr, Nicholas M

2018-02-22

Telemedicine allows patients to connect with healthcare providers remotely. It has recently expanded to evaluate low-acuity illnesses such as pharyngitis by using patients' personal communication devices. The purpose of our study was to compare the telemedicine-facilitated physical examination with an in-person examination in emergency department (ED) patients with sore throat. This was a prospective, observational, blinded diagnostic concordance study of patients being seen for sore throat in a 60,000-visit Midwestern academic ED. A telemedicine and a face-to-face examination were performed independently by two advanced practice providers (APP), blinded to the results of the other evaluator. The primary outcome was agreement on pharyngeal redness between the evaluators, with secondary outcomes of agreement and inter-rater reliability on 14 other aspects of the pharyngeal physical examination. We also conducted a survey of patients and providers to evaluate perceptions and preferences for sore throat evaluation using telemedicine. Sixty-two patients were enrolled, with a median tonsil size of 1.0. Inter-rater agreement (kappa) for tonsil size was 0.394, which was worse than our predetermined concordance threshold. Other kappa values ranged from 0 to 0.434, and telemedicine was best for detecting abnormal coloration of the palate and tender superficial cervical lymph nodes (anterior structures), but poor for detecting abnormal submandibular lymph nodes or asymmetry of the posterior pharynx (posterior structures). In survey responses, telemedicine was judged easier to use and more comfortable for providers than patients; however, neither patients nor providers preferred in-person to telemedicine evaluation. Telemedicine exhibited poor agreement with the in-person physical examination on the primary outcome of tonsil size, but exhibited moderate agreement on coloration of the palate and cervical lymphadenopathy. Future work should better characterize the importance of
The effect of vertical and horizontal symmetry on memory for tactile patterns in late blind individuals.

PubMed

Cattaneo, Zaira; Vecchi, Tomaso; Fantino, Micaela; Herbert, Andrew M; Merabet, Lotfi B

2013-02-01

Visual stimuli that exhibit vertical symmetry are easier to remember than stimuli symmetric along other axes, an advantage that extends to the haptic modality as well. Critically, the vertical symmetry memory advantage has not been found in early blind individuals, despite their overall superior memory, as compared with sighted individuals, and the presence of an overall advantage for identifying symmetric over asymmetric patterns. The absence of the vertical axis memory advantage in the early blind may depend on their total lack of visual experience or on the effect of prolonged visual deprivation. To disentangle this issue, in this study, we measured the ability of late blind individuals to remember tactile spatial patterns that were either vertically or horizontally symmetric or asymmetric. Late blind participants showed better memory performance for symmetric patterns. An additional advantage for the vertical axis of symmetry over the horizontal one was reported, but only for patterns presented in the frontal plane. In the horizontal plane, no difference was observed between vertical and horizontal symmetric patterns, due to the latter being recalled particularly well. These results are discussed in terms of the influence of the spatial reference frame adopted during exploration. Overall, our data suggest that prior visual experience is sufficient to drive the vertical symmetry memory advantage, at least when an external reference frame based on geocentric cues (i.e., gravity) is adopted.
Introducing a means of quantifying community reputation: the print media as a data source.

PubMed

McLaren, Lindsay; Perry, Rosemary; Carruthers, Lesley; Hawe, Penelope

2005-06-01

A community's reputation may have implications for self-esteem, morale, or other health outcomes of residents. In this study, we introduce a means of quantifying the reputation of communities in Calgary, Canada based on their portrayal in the daily citywide newspaper. Publication dates were selected from an 8.5-year period using constructed week sampling. For communities designated as high or low in well-being, sampled references were rated as positive, negative, or neutral in topic, by two independent raters who were blind to community identity. Findings suggest that the print media represent a convenient and discriminating data source for characterising some aspects of community reputation.
Removing Bias towards World Englishes: The Development of a Rater Attitude Instrument Using Indian English as a Stimulus

ERIC Educational Resources Information Center

Hsu, Tammy Huei-Lien

2016-01-01

This study explores the attitudes of raters of English speaking tests towards the global spread of English and the challenges in rating speakers of Indian English in descriptive speaking tasks. The claims put forward by language attitude studies indicate a validity issue in English speaking tests: listeners tend to hold negative attitudes towards…
The Consistency between Human Raters and an Automated Essay Scoring System in Grading High School Students' English Writing

ERIC Educational Resources Information Center

Tsai, Min-hsiu

2012-01-01

This study investigates the consistency between human raters and an automated essay scoring system in grading high school students' English compositions. A total of 923 essays from 23 classes of 12 senior high schools in Taiwan (Republic of China) were obtained and scored manually and electronically. The results show that the consistency between…
Plant disease severity assessment - How rater bias, assessment method and experimental design affect hypothesis testing and resource use efficiency

USDA-ARS's Scientific Manuscript database

The impact of rater bias and assessment method on hypothesis testing was studied for different experimental designs for plant disease assessment using balanced and unbalanced data sets. Data sets with the same number of replicate estimates for each of two treatments are termed ‘balanced’, and those ...
The Sokoto Blind Beggars: Causes of Blindness and Barriers to Rehabilitation Services

PubMed Central

Balarabe, Aliyu Hamza; Mahmoud, Abdulraheem O.; Ayanniyi, Abdulkabir Ayansiji

2014-01-01

Purpose: To determine the causes of blindness and the barriers to accessing rehabilitation services (RS) among blind street beggars (bsb) in Sokoto, Nigeria. Materials and Methods: A cross-sectional survey of 202 bsb (VA < 3/60) using an interviewer-administered questionnaire. The causes of blindness were diagnosed by clinical ophthalmic examination. Results: There were 107 (53%) males and 95 (47%) females, with a mean age of 49 years (SD 12.2). Most bsb, 191 (94.6%), had non-formal education. Of the 190 (94.1%) irreversibly blind bsb, 180/190 (94.7%) had no light perception (NPL) bilaterally. The major causes of blindness were non-trachomatous corneal opacity (60.8%) and trachomatous corneal opacity (12.8%). A total of 166 (82%) were blind from avoidable causes, and 190 (94.1%) were irreversibly blind, with 76.1% due to avoidable causes. The available, sub-standard RS were educational, vocational, and financial support. Past barriers to RS included non-availability 151 (87.8%), inability to afford 2 (1.2%), unfelt need 4 (2.3%), family refusal 1 (0.6%), ignorance 6 (3.5%), and not being linked 8 (4.7%). During the study period, 72 subjects (35.6%) were unable to access RS, 59 (81.9%) of them because of lack of linkage to the existing services. Conclusion: Corneal opacification was the major cause of blindness among bsb. The main challenges to RS include inadequate available services and societal and user factors. Renewed efforts are warranted toward the prevention of avoidable causes of blindness, especially corneal opacities. The quality of life of blind street beggars should be improved through available, accessible, affordable, well-maintained, and sustained rehabilitation services. PMID:24791106
Automated Scoring of Mathematics Tasks in the Common Core Era: Enhancements to M-Rater in Support of "CBAL"™ Mathematics and the Common Core Assessments. Research Reports. ETS RR-13-26

ERIC Educational Resources Information Center

Fife, James H.

2013-01-01

The m-rater scoring engine has been used successfully for the past several years to score "CBAL"™ mathematics tasks, for the most part without the need for human scoring. During this time, various improvements to m-rater and its scoring keys have been implemented in response to specific CBAL needs. In 2012, with the general move toward…
Estimated Statistics on Blindness and Vision Problems. National Society for the Prevention of Blindness Fact Book.

ERIC Educational Resources Information Center

Hatfield, Elizabeth M.

Current estimates and some trend data are presented on the following subjects: population growth (1940-1960), prevalence of legal blindness, new cases of legal blindness, age distribution of legally blind persons, causes of legal blindness, changing patterns in causes of legal blindness, cases of glaucoma, school children needing eye care,…
Inter-rater reliability of kinesthetic measurements with the KINARM robotic exoskeleton.

PubMed

Semrau, Jennifer A; Herter, Troy M; Scott, Stephen H; Dukelow, Sean P

2017-05-22

Kinesthesia (sense of limb movement) has been extremely difficult to measure objectively, especially in individuals who have survived a stroke. The development of valid and reliable measurements for proprioception is important for developing a better understanding of proprioceptive impairments after stroke and their impact on the ability to perform daily activities. We recently developed a robotic task to evaluate kinesthetic deficits after stroke and found that the majority (~60%) of stroke survivors exhibit significant deficits in kinesthesia within the first 10 days post-stroke. Here we aim to determine the inter-rater reliability of this robotic kinesthetic matching task. Twenty-five neurologically intact control subjects and 15 individuals with first-time stroke were evaluated on a robotic kinesthetic matching task (KIN). Subjects sat in a robotic exoskeleton with their arms supported against gravity. In the KIN task, the robot moved the subjects' stroke-affected arm at a preset speed, direction and distance. As soon as subjects felt the robot begin to move their affected arm, they matched the robot movement with the unaffected arm. Subjects performed the KIN task in two sessions, an initial session and a second session (on average 18.2 ± 13.8 h after the initial session for stroke subjects), supervised by different technicians. The task was performed both with and without vision in both sessions. To determine the reliability of the robotic task, we evaluated intra-class correlations of 8 spatial and temporal parameters that quantify kinesthetic behavior. The parameters exhibited moderate to high intra-class correlations between the initial and retest conditions (range of r = 0.53-0.97). The robotic KIN task exhibited good inter-rater reliability. This validates the KIN task as a reliable, objective method for quantifying
A treatment schedule of conventional physical therapy provided to enhance upper limb sensorimotor recovery after stroke: expert criterion validity and intra-rater reliability.

PubMed

Donaldson, Catherine; Tallis, Raymond C; Pomeroy, Valerie M

2009-06-01

Inadequate description of treatment hampers progress in stroke rehabilitation. The aim was to develop a valid, reliable, standardised treatment schedule of conventional physical therapy provided for the paretic upper limb after stroke. Eleven neurophysiotherapists took part in the established development methodology: semi-structured interviews, focus groups, and piloting of a draft treatment schedule in clinical practice. A different group of physiotherapists (n=13) used the treatment schedule to record treatment given to stroke patients with mild, moderate and severe upper limb paresis. The adequacy of the treatment schedule was rated on a visual analogue scale (0 to 100 mm), and mean visual analogue scores with 95% confidence intervals were calculated (expert criterion validity). For intra-rater reliability, each physiotherapist watched a videotape of their own treatment and immediately completed a treatment schedule recording form on two separate occasions, 4 to 6 weeks apart; the Kappa statistic was calculated for intra-rater reliability. The treatment schedule consists of a one-page A4 recording form and a user booklet detailing 50 treatment activities. Expert criterion validity was 79 (95% confidence interval 74 to 84). Intra-rater Kappa was 0.81 (P<0.001). This treatment schedule can be used to document conventional physical therapy in subsequent clinical trials in the geographical area where it was developed. Further work is needed to investigate its generalisability beyond this area.
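The intra-rater figure above is a Kappa statistic. As an illustration only, here is a minimal sketch of unweighted Cohen's kappa for two recording occasions, assuming each occasion yields one categorical code per item (for example, whether a given treatment activity was recorded). The abstract does not say whether a weighted variant was used; the function name and the sample codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa for two categorical ratings of the same items."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    # Observed agreement: share of items coded identically on both occasions.
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement, from each occasion's marginal category frequencies.
    counts_1, counts_2 = Counter(first), Counter(second)
    p_chance = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes (1 = activity recorded, 0 = not recorded) from two occasions.
occasion_1 = [1, 1, 0, 0]
occasion_2 = [1, 0, 0, 0]
print(cohens_kappa(occasion_1, occasion_2))  # 0.5
```

In this toy example observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5: kappa credits only the agreement beyond what chance alone would produce.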
Repetition blindness and homophone blindness in young and older adults.

PubMed

Tyrrell, Caitlin J; James, Lori E; Noble, Paula M

2016-11-01

We tested age effects on repetition blindness (RB), defined as the reduced probability of reporting a target word following presentation of the same word in a rapidly presented list. We also tested age effects on homophone blindness (HB), in which the first word is a homophone of the target word rather than a repeated word. Thirty young and 28 older adults viewed rapidly presented lists of words containing repeated, homophone, or unrepeated word pairs and reported all of the words immediately after each list. Older adults exhibited a greater degree of RB and HB than young adults using a conditional scoring method that provides certainty that blindness has occurred. The existence of RB and HB for both age groups, and increased blindness for older compared to young adults, supports predictions of a binding theory that has successfully accounted for a wide range of phenomena in cognitive aging.
Reliable and valid assessment of Lichtenstein hernia repair skills.

PubMed

Carlsen, C G; Lindorff-Larsen, K; Funch-Jensen, P; Lund, L; Charles, P; Konge, L

2014-08-01

Lichtenstein hernia repair is a common surgical procedure and one of the first procedures performed by a surgical trainee. However, few formal assessment tools have been developed for this procedure, and they are sparsely validated. The aim of this study was to determine the reliability and validity of an assessment tool designed to measure surgical skills in Lichtenstein hernia repair. Key issues were identified through a focus group interview, and on this basis an eight-item assessment tool was designed. Ten surgeons and surgical trainees (four experts, three intermediates, and three novices) were video recorded while performing Lichtenstein hernia repair. The videos were assessed blindly and independently by three raters (surgical consultants) using the assessment tool, and these assessments were used to explore validity and reliability. The internal consistency of the items was high (Cronbach's alpha = 0.97). Inter-rater reliability was very good, with an intra-class correlation coefficient (ICC) of 0.93. Generalizability analysis showed a coefficient above 0.8 even with one rater, improving to 0.92 with three raters. One-way analysis of variance found a significant difference between the three groups (p < 0.001), indicating construct validity. Lichtenstein hernia repair skills can be assessed blindly by a single rater in a reliable and valid fashion with the new procedure-specific assessment tool. We recommend this tool for future assessment of trainees performing Lichtenstein hernia repair to ensure that the objectives of competency-based surgical training are met.
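The rise from a single-rater coefficient of about 0.8 to 0.92 with three raters matches what the Spearman-Brown formula predicts for averaging parallel raters, ρ_k = kρ_1 / (1 + (k − 1)ρ_1). In a simple persons × raters design this is equivalent to a generalizability D-study on the number of raters. The sketch below assumes that simple design; the study's variance components are not reported here, so 0.8 is only an illustrative input.

```python
def spearman_brown(single_rater: float, n_raters: int) -> float:
    """Projected reliability of the mean score from n_raters parallel raters."""
    if not 0.0 <= single_rater <= 1.0:
        raise ValueError("single-rater coefficient must lie in [0, 1]")
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    k = n_raters
    return k * single_rater / (1 + (k - 1) * single_rater)


# Projected coefficient for one, two and three raters, starting from 0.8.
for k in (1, 2, 3):
    print(k, round(spearman_brown(0.8, k), 3))
```

With 0.8 as the single-rater value, three raters project to about 0.923, in line with the reported 0.92; each additional rater yields a smaller gain.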
Corneal blindness: a global perspective.

PubMed Central

Whitcher, J. P.; Srinivasan, M.; Upadhyay, M. P.

2001-01-01

Diseases affecting the cornea are a major cause of blindness worldwide, second only to cataract in overall importance. The epidemiology of corneal blindness is complicated and encompasses a wide variety of infectious and inflammatory eye diseases that cause corneal scarring, which ultimately leads to functional blindness. In addition, the prevalence of corneal disease varies from country to country and even from one population to another. While cataract is responsible for nearly 20 million of the 45 million blind people in the world, the next major cause is trachoma, which blinds 4.9 million individuals, mainly as a result of corneal scarring and vascularization. Ocular trauma and corneal ulceration are significant causes of corneal blindness that are often underreported but may be responsible for 1.5-2.0 million new cases of monocular blindness every year. Causes of childhood blindness (about 1.5 million blind children worldwide, with 5 million visually disabled) include xerophthalmia (350,000 cases annually), ophthalmia neonatorum, and less frequently seen ocular diseases such as herpes simplex virus infections and vernal keratoconjunctivitis. Even though the control of onchocerciasis and leprosy is a public health success story, these diseases are still significant causes of blindness, each affecting a quarter of a million individuals. Traditional eye medicines have also been implicated as a major risk factor in the current epidemic of corneal ulceration in developing countries. Because corneal blindness is difficult to treat once it has occurred, public health prevention programmes are the most cost-effective means of decreasing its global burden. PMID:11285665
Inter-Rater and Test-Retest (Between-Sessions) Reliability of the 4-Skills Scan for Dutch Elementary School Children

ERIC Educational Resources Information Center

van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M.

2018-01-01

In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, this study determined its test-retest and inter-rater reliability. Data from 624 (test-retest) and 557 (inter-rater) Dutch 6- to 12-year-old children were analyzed for…
[Visual impairment and blindness in children in a Malawian school for the blind].

PubMed

Schulze Schwering, M; Nyrenda, M; Spitzer, M S; Kalua, K

2013-08-01

The aim of this study was to determine the anatomic sites of severe visual impairment and blindness in children in an integrated school for the blind in Malawi, and to compare the results with those of previous Malawian blind school studies. Children attending an integrated school for the blind in Malawi were examined in September 2011 using the standard WHO/PBL eye examination record for children with blindness and low vision. Visual acuity [VA] of the better eye was classified using the standardised WHO reporting form. Fifty-five pupils aged 6 to 19 years were examined: 39 (71%) males and 16 (29%) females. Thirty-eight (69%) were blind [BL], 8 (15%) were severely visually impaired [SVI], 8 (15%) were visually impaired [VI], and 1 (1.8%) was not visually impaired [NVI]. The major anatomic sites of visual loss were the optic nerve (16%) and retina (16%), followed by lens/cataract (15%), cornea (11%), lesions of the whole globe (11%), uveal pathologies (6%) and cortical blindness (2%). The exact aetiology of VI or BL could not be determined in most children. Albinism accounted for 13% (7/55) of the visual impairments. 24% of the cases were considered potentially avoidable: refractive amblyopia among pseudophakic patients and corneal scarring. Optic atrophy, retinal diseases (mostly albinism) and cataracts were the major causes of severe visual impairment and blindness in children in an integrated school for the blind in Malawi. Corneal scarring is now the fourth most common cause of visual impairment, whereas it was the commonest cause 35 years ago. Congenital cataract and its postoperative outcome were the commonest remediable causes of visual impairment. Georg Thieme Verlag KG Stuttgart · New York.
Causes and emerging trends of childhood blindness: findings from schools for the blind in Southeast Nigeria.

PubMed

Aghaji, Ada; Okoye, Obiekwe; Bowman, Richard

2015-06-01

The aim was to ascertain the causes of severe visual impairment and blindness (SVI/BL) in schools for the blind in southeast Nigeria and to evaluate temporal trends. All children who developed blindness at <15 years of age in all three schools for the blind in southeast Nigeria were examined. All data were recorded on a WHO/Prevention of Blindness (WHO/PBL) form, entered into a Microsoft Access database and transferred to STATA V.12.1 for analysis. To estimate temporal trends in causes of blindness, older (>15 years) children were compared with younger (≤15 years) children. 124 children were identified with SVI/BL. The most common anatomical site of blindness was the lens (33.9%). Overall, avoidable blindness accounted for 73.4% of all blindness. Comparing children ≤15 years of age with those >15 years old, this study shows a reduction in avoidable blindness but an increase in cortical visual impairment in the younger age group. The decrease in avoidable blindness in children ≤15 years old was statistically significant. Corneal blindness appears to be decreasing, but cortical visual impairment seems to be emerging in the younger age group. Appropriate strategies for the prevention of avoidable childhood blindness in Nigeria need to be developed and implemented. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Programs for the Deaf-Blind.

ERIC Educational Resources Information Center

American Annals of the Deaf, 1991

1991-01-01

This directory lists contact information for programs for the deaf-blind in the United States in 3 categories: (1) programs for deaf-blind children and youth (29 programs listed); (2) Helen Keller National Center for Deaf-Blind Youth and Adults (1 national and 10 regional offices); and (3) programs for training teachers of the deaf-blind (4…
Reliable and valid assessment of competence in endoscopic ultrasonography and fine-needle aspiration for mediastinal staging of non-small cell lung cancer.

PubMed

Konge, L; Vilmann, P; Clementsen, P; Annema, J T; Ringsted, C

2012-10-01

Fine-needle aspiration (FNA) guided by endoscopic ultrasonography (EUS) is important in mediastinal staging of non-small cell lung cancer (NSCLC). Training standards and implementation strategies for this technique are currently under discussion. The aim of this study was to explore the reliability and validity of a newly developed EUS Assessment Tool (EUSAT) designed to measure competence in EUS-FNA for mediastinal staging of NSCLC. A total of 30 patients with proven or suspected NSCLC underwent EUS-FNA for mediastinal staging by three trainees and three experienced physicians. Their performances were assessed by three EUS experts, first prospectively under direct observation and again 2 months later in a blinded fashion using digital video recordings. Based on these assessments, intra-rater reliability, inter-rater reliability, and construct validity were explored. Intra-rater reliability was good (Cronbach's α = 0.80), but comparison of scores based on direct observation with those based on blinded video recordings indicated a significant bias favoring consultants (P = 0.022). Inter-rater reliability was very good (Cronbach's α = 0.93). However, one rater assessing five procedures, or two raters each assessing four procedures, were necessary to secure a generalizability coefficient of 0.80. The assessment tool demonstrated construct validity by discriminating between trainees and experienced physicians (P = 0.034). Competency in mediastinal staging of NSCLC using EUS and EUS-FNA can be assessed in a reliable and valid way using the EUSAT. Measuring and defining competency and training requirements could improve EUS quality and benefit patient care. © Georg Thieme Verlag KG Stuttgart · New York.
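Both reliability figures here are reported as Cronbach's α. A minimal sketch of the standard formula, α = k/(k − 1) · (1 − Σσ²_i / σ²_total), treating each rater (or each occasion, for intra-rater reliability) as one of the k columns and each assessed procedure as a row. The table layout and the sample scores are hypothetical; the abstract does not describe how EUSAT item scores were aggregated.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of scores: one row per procedure, one column per rater."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every row needs the same number (>= 2) of columns")
    # Sum of the per-column (per-rater) sample variances.
    column_variance_sum = sum(variance(column) for column in zip(*rows))
    # Sample variance of each row's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - column_variance_sum / total_variance)


# Hypothetical total scores from two raters on three procedures.
scores = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(scores), 3))  # 0.667
```

When the raters agree perfectly, the column variances account for exactly half of the total variance with two raters, and α reaches 1.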

Safety perception referents of permanent and temporary employees: safety climate boundaries in the industrial workplace.

PubMed

Luria, Gil; Yagil, Dana

2010-09-01

The aim was to explore the significant referents of safety perceptions among permanent and temporary employees in order to identify the boundaries of safety climate in a heterogeneous workforce. Data were collected through semi-structured interviews with employees in manufacturing organizations, using a combination of qualitative and quantitative methods to identify basic safety perceptions. Independent raters used content analysis to examine the data. The analysis revealed differences between safety themes at the organization, group and individual levels. Themes relating to the individual were more prevalent among temporary employees, while those relating to the group and the organization prevailed among permanent employees. Permanent employees view the organizational and group levels as significant referents of safety perceptions, while temporary employees focus on the individual level. The results challenge the current view of safety climate as a uniform concept for all employees and prescribe boundary conditions for safety climate. It is suggested that organizations implement "tailor-made" safety-climate practices according to the referents of employee sub-groups. 2009 Elsevier Ltd. All rights reserved.
Unconditionally Secure Blind Signatures

NASA Astrophysics Data System (ADS)

Hara, Yuki; Seito, Takenobu; Shikata, Junji; Matsumoto, Tsutomu

The blind signature scheme introduced by Chaum allows a user to obtain a valid signature on a message from a signer while keeping the message secret from the signer. Blind signature schemes have so far been studied mainly from the viewpoint of computational security. In this paper, we study blind signatures in the unconditional setting. Specifically, we introduce a model of unconditionally secure blind signature schemes (USBS, for short), propose security notions and their formalization in this model, and propose a construction method for USBS that is provably secure under these security notions.
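The USBS model in this paper is information-theoretic and is not reproduced here. To illustrate only the blinding idea the abstract starts from (a signer signs a message without seeing it), the sketch below follows Chaum's classic computationally secure RSA-based construction with textbook-sized toy parameters. It omits hashing and padding and must not be used for real signatures; the key values and helper names are illustrative assumptions.

```python
import math

# Toy textbook RSA key: N = 61 * 53, and E * D = 1 mod (60 * 52). Far too small for real use.
N, E, D = 3233, 17, 2753


def blind(message: int, r: int) -> int:
    """User: hide the message by multiplying it by r^E mod N."""
    if math.gcd(r, N) != 1:
        raise ValueError("blinding factor must be coprime to N")
    return (message * pow(r, E, N)) % N


def sign(blinded: int) -> int:
    """Signer: plain RSA signature on the blinded value; the message itself is never seen."""
    return pow(blinded, D, N)


def unblind(blind_signature: int, r: int) -> int:
    """User: multiply by r^-1 mod N to obtain a signature on the original message."""
    return (blind_signature * pow(r, -1, N)) % N  # modular inverse via pow needs Python 3.8+


def verify(message: int, signature: int) -> bool:
    """Anyone: check the signature with the public exponent."""
    return pow(signature, E, N) == message % N


message, r = 65, 7
signature = unblind(sign(blind(message, r)), r)
print(verify(message, signature))  # True
```

Because (m·r^E)^D ≡ m^D·r (mod N), removing r leaves exactly the ordinary RSA signature m^D mod N, even though the signer only ever saw the blinded value.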
Event-related potentials reveal rapid registration of features of infrequent changes during change blindness

PubMed Central

2010-01-01

Background: Change blindness refers to a failure to detect changes between consecutively presented images separated by, for example, a brief blank screen. As an explanation of change blindness, it has been suggested that our representations of the environment are sparse outside focal attention and even that changed features may not be represented at all. In order to find electrophysiological evidence of neural representations of changed features during change blindness, we recorded event-related potentials (ERPs) in adults in an oddball variant of the change blindness flicker paradigm. Methods: ERPs were recorded when subjects performed a change detection task in which the modified images were infrequently interspersed (p = .2) among the frequently (p = .8) presented unmodified images. Responses to modified and unmodified images were compared in the time window of 60-100 ms after stimulus onset. Results: ERPs to infrequent modified images were found to differ in amplitude from those to frequent unmodified images at the midline electrodes (Fz, Pz, Cz and Oz) at the latency of 60-100 ms even when subjects were unaware of changes (change blindness). Conclusions: The results suggest that the brain registers changes very rapidly, and that changed features in images are neurally represented even without participants' ability to report them. PMID:20181126
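The oddball variant interleaves rare modified images (p = .2) with frequent unmodified ones (p = .8). A minimal sketch of building such a trial list, assuming a fixed 20% share of modified trials in random order; the study's actual sequencing constraints (for example, limits on consecutive modified images) are not described, and the function name is hypothetical.

```python
import random
from typing import List, Optional


def oddball_sequence(n_trials: int, p_modified: float = 0.2,
                     seed: Optional[int] = None) -> List[str]:
    """Trial list with a fixed proportion of rare 'modified' images in random order."""
    if n_trials < 1 or not 0.0 <= p_modified <= 1.0:
        raise ValueError("need n_trials >= 1 and p_modified in [0, 1]")
    n_modified = round(n_trials * p_modified)
    trials = ["modified"] * n_modified + ["unmodified"] * (n_trials - n_modified)
    random.Random(seed).shuffle(trials)  # seeded shuffle for a reproducible order
    return trials


trials = oddball_sequence(100, seed=1)
print(trials.count("modified"), trials.count("unmodified"))  # 20 80
```

Fixing the count, rather than drawing each trial independently with probability .2, keeps the deviant rate at exactly 20% in every block.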
NBI‐98854, a selective monoamine transport inhibitor for the treatment of tardive dyskinesia: A randomized, double‐blind, placebo‐controlled study

PubMed Central

Jimenez, Roland; Hauser, Robert A.; Factor, Stewart A.; Burke, Joshua; Mandri, Daniel; Castro‐Gayol, Julio C.

2015-01-01

Background: Tardive dyskinesia is a persistent movement disorder induced by chronic neuroleptic exposure. NBI‐98854 is a novel, highly selective, vesicular monoamine transporter 2 inhibitor. We present results of a randomized, 6‐week, double‐blind, placebo‐controlled, dose‐titration study evaluating the safety, tolerability, and efficacy of NBI‐98854 for the treatment of tardive dyskinesia. Methods: Male and female adult subjects with moderate or severe tardive dyskinesia were included. NBI‐98854 or placebo was given once per day starting at 25 mg and then escalated by 25 mg to a maximum of 75 mg based on dyskinesia and tolerability assessment. The primary efficacy endpoint was the change in Abnormal Involuntary Movement Scale from baseline at week 6 scored by blinded, central video raters. The secondary endpoint was the Clinical Global Impression of Change-Tardive Dyskinesia score assessed by the blinded investigator. Results: Two hundred five potential subjects were screened, and 102 were randomized; 76% of NBI‐98854 subjects and 80% of placebo subjects reached the maximum allowed dose. Abnormal Involuntary Movement Scale scores for NBI‐98854 compared with placebo were significantly reduced (p = 0.0005). Active drug was also superior on the Clinical Global Impression of Change-Tardive Dyskinesia (p < 0.0001). Treatment‐emergent adverse event rates were 49% in the NBI‐98854 and 33% in the placebo subjects. The most common adverse events (active vs. placebo) were fatigue and headache (9.8% vs. 4.1%) and constipation and urinary tract infection (3.9% vs. 6.1%). No clinically relevant changes in safety assessments were noted. Conclusion: NBI‐98854 significantly improved tardive dyskinesia and was well tolerated in patients. These results support the phase 3 clinical trials of NBI‐98854 now underway. © 2015 The Authors. Movement Disorders published by Wiley Periodicals, Inc. on behalf of International Parkinson and Movement Disorder
NBI-98854, a selective monoamine transport inhibitor for the treatment of tardive dyskinesia: A randomized, double-blind, placebo-controlled study.

PubMed

O'Brien, Christopher F; Jimenez, Roland; Hauser, Robert A; Factor, Stewart A; Burke, Joshua; Mandri, Daniel; Castro-Gayol, Julio C

2015-10-01

Tardive dyskinesia is a persistent movement disorder induced by chronic neuroleptic exposure. NBI-98854 is a novel, highly selective, vesicular monoamine transporter 2 inhibitor. We present results of a randomized, 6-week, double-blind, placebo-controlled, dose-titration study evaluating the safety, tolerability, and efficacy of NBI-98854 for the treatment of tardive dyskinesia. Male and female adult subjects with moderate or severe tardive dyskinesia were included. NBI-98854 or placebo was given once per day starting at 25 mg and then escalated by 25 mg to a maximum of 75 mg based on dyskinesia and tolerability assessment. The primary efficacy endpoint was the change in Abnormal Involuntary Movement Scale from baseline at week 6 scored by blinded, central video raters. The secondary endpoint was the Clinical Global Impression of Change-Tardive Dyskinesia score assessed by the blinded investigator. Two hundred five potential subjects were screened, and 102 were randomized; 76% of NBI-98854 subjects and 80% of placebo subjects reached the maximum allowed dose. Abnormal Involuntary Movement Scale scores for NBI-98854 compared with placebo were significantly reduced (p = 0.0005). Active drug was also superior on the Clinical Global Impression of Change-Tardive Dyskinesia (p < 0.0001). Treatment-emergent adverse event rates were 49% in the NBI-98854 and 33% in the placebo subjects. The most common adverse events (active vs. placebo) were fatigue and headache (9.8% vs. 4.1%) and constipation and urinary tract infection (3.9% vs. 6.1%). No clinically relevant changes in safety assessments were noted. NBI-98854 significantly improved tardive dyskinesia and was well tolerated in patients. These results support the phase 3 clinical trials of NBI-98854 now underway. © 2015 The Authors. Movement Disorders published by Wiley Periodicals, Inc. on behalf of International Parkinson and Movement Disorder Society.
Inter-rater agreement of comorbid DSM-IV personality disorders in substance abusers.

PubMed

Hesse, Morten; Thylstrup, Birgitte

2008-05-17

Little is known about the inter-rater agreement of personality disorder ratings in clinical settings. Clinicians rated 75 patients with substance use disorders on the DSM-IV criteria for personality disorders, presented in random order, and on rating scales representing the severity of each disorder. Convergent validity was moderate (r = 0.55 to 0.67) for cluster B disorders rated with DSM-IV criteria, and discriminant validity was moderate for eight of the ten personality disorders. Convergent validity of the rating scales was only moderate for antisocial and narcissistic personality disorder. Dimensional ratings may be used with some caution in research studies and clinical practice, and may be collected as one of several sources of information describing a patient's personality.
[Orbital compartment syndrome. The most frequent cause of blindness following facial trauma].

PubMed

Klenk, Gusztáv; Katona, József; Kenderfi, Gábor; Lestyán, János; Gombos, Katalin; Hirschberg, Andor

2017-09-01

Although orbital compartment syndrome is a rare condition, it is still the most common cause of blindness following simple or complicated facial fractures. Its pathomechanism is similar to that of compartment syndrome in the limb: even a small amount of extra fluid or tissue (blood, oedema, brain, foreign body) in a non-expandable space produces a rapid rise in pressure. Unless urgent surgical intervention is performed, the blocked circulation of the central retinal artery will result in irreversible optic nerve damage and blindness. Aim, material and method: A retrospective analysis covering ten years (2007-2017) of patients referred to our hospital with facial-head trauma combined with blindness. 571 patients had fractures involving the orbit, and 23 patients became blind from various causes. The most common cause was orbital compartment syndrome, in 17 patients, all of whom also had retrobulbar haematomas. 6 patients with retrobulbar haematoma did not develop compartment syndrome. Compartment syndrome occurred in patients with both extensive and minimal fractures, and with both large and small haematomas. Early lateral canthotomy and decompression saved 7 patients from blindness. We cannot predict, and do not know, why some patients develop orbital compartment syndrome. Compartment syndrome seems independent of fracture mechanism, comminution, dislocation and amount of orbital bleeding. All patients with midface fractures are at potential risk. We strongly suspect that orbital compartment syndrome has been overlooked in the recommended textbooks of our medical universities and in postgraduate training; as a result, compartment syndrome is not recognized. Teaching, training and early surgical decompression are the only way to save the eye from blindness. Orv Hetil. 2017; 158(36): 1410-1420.
An improved image non-blind image deblurring method based on FoEs

NASA Astrophysics Data System (ADS)

Zhu, Qidan; Sun, Lei

2013-03-01

Traditional non-blind image deblurring algorithms typically use maximum a posteriori (MAP) estimation. Compared with maximum likelihood (ML) estimation, MAP estimates involving natural image priors can effectively reduce ripple (ringing) artifacts. However, their restoration performance has been found lacking. To address this, we replace traditional MAP with MAP plus a KL penalty. We develop an image reconstruction algorithm that minimizes the KL divergence between the reference distribution and the prior distribution. The approximate KL penalty can restrain the oversmoothing caused by MAP. We use three groups of images and Harris corner detection to evaluate our method. The experimental results show that our non-blind image restoration algorithm can effectively reduce the ringing effect and achieve state-of-the-art deblurring results.
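The abstract does not state how the reference and prior distributions are represented or in which direction the divergence is taken. As a hedged illustration of the quantity itself only, the sketch computes D_KL(P‖Q) = Σ p_i log(p_i / q_i) for two discrete distributions, such as normalized histograms (an assumption); it is not the authors' deblurring algorithm.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # by convention 0 * log(0 / q) contributes nothing
        if q_i == 0:
            return math.inf  # P has mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

The last two calls return different values: KL divergence is not symmetric, so which distribution plays the role of P matters when it is used as a penalty.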
Underdetermined blind separation of three-way fluorescence spectra of PAHs in water

NASA Astrophysics Data System (ADS)

Yang, Ruifang; Zhao, Nanjing; Xiao, Xue; Zhu, Wei; Chen, Yunan; Yin, Gaofang; Liu, Jianguo; Liu, Wenqing

2018-06-01

In this work, an underdetermined blind decomposition method based on sparse component analysis (SCA) is developed to identify individual components from the three-way fluorescence spectra of their mixtures. The mixing matrix is estimated from the mixtures using a fuzzy data clustering algorithm together with the scatter points corresponding to local energy maxima in the time-frequency domain, and the spectra of the target components are recovered using a pseudo-inverse technique. As an example, the spectra of three and four pure components were blindly extracted from only two samples of their mixture, with similarities between resolved and reference spectra all above 0.80. This work opens a new and effective path toward monitoring PAHs in water by three-way fluorescence spectroscopy.
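The final recovery step uses a pseudo-inverse. A minimal NumPy sketch of that step alone, assuming the mixing matrix A has already been estimated and the mixtures are stacked as rows of X, so that Ŝ = A⁺X. With fewer mixtures than components, this gives the minimum-norm solution consistent with the data, not necessarily the true spectra; SCA methods rely on sparsity (for example, applying the pseudo-inverse only to components assumed active at each point), which this sketch does not model. The clustering-based estimation of A is not shown, and all data are synthetic.

```python
import numpy as np


def pseudo_inverse_recovery(mixtures: np.ndarray, mixing: np.ndarray) -> np.ndarray:
    """Minimum-norm estimate S_hat = pinv(A) @ X for mixtures X = A @ S."""
    return np.linalg.pinv(mixing) @ mixtures


rng = np.random.default_rng(0)
A = rng.random((2, 3))   # 2 mixture samples, 3 components (underdetermined)
S = rng.random((3, 50))  # synthetic component spectra, 50 points each
X = A @ S                # observed mixtures
S_hat = pseudo_inverse_recovery(X, A)
print(S_hat.shape, np.allclose(A @ S_hat, X))  # (3, 50) True
```

The estimate reproduces the mixtures exactly, yet S_hat generally differs from S; that gap is why underdetermined methods need the sparsity assumption.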
From Perception to Metacognition: Auditory and Olfactory Functions in Early Blind, Late Blind, and Sighted Individuals

PubMed Central

Cornell Kärnekull, Stina; Arshamian, Artin; Nilsson, Mats E.; Larsson, Maria

2016-01-01

Although evidence is mixed, studies have shown that blind individuals perform better than sighted individuals at specific auditory, tactile, and chemosensory tasks. However, few studies have assessed blind and sighted individuals across different sensory modalities in the same study. We tested early blind (n = 15), late blind (n = 15), and sighted (n = 30) participants with analogous olfactory and auditory tests of absolute threshold, discrimination, identification, episodic recognition, and metacognitive ability. Although the multivariate analysis of variance (MANOVA) showed no overall effect of blindness and no interaction with modality, follow-up between-group contrasts indicated a blind-over-sighted advantage in auditory episodic recognition that was most pronounced in early blind individuals. In contrast to the auditory modality, there was no empirical support for compensatory effects in any of the olfactory tasks. There was no conclusive evidence for group differences in the metacognitive ability to predict episodic recognition performance. Taken together, the results showed no evidence of overall superior performance in blind relative to sighted individuals across olfactory and auditory functions, although early blind individuals excelled in episodic auditory recognition memory. This observation may be related to an experience-induced increase in auditory attentional capacity. PMID:27729884
Olfactory Performance in a Large Sample of Early-Blind and Late-Blind Individuals.

PubMed

Sorokowska, Agnieszka

2016-10-01

Previous examinations of olfactory sensitivity in blind people have produced contradictory findings, so it is unclear whether visual impairment is associated with increased olfactory abilities. In the present investigation, I aimed to resolve the existing questions via a relatively large-scale study comprising early-blind (N = 43), late-blind (N = 41) and sighted (N = 84) individuals matched for gender and age. To compare the results with those of previous studies, I combined data from a free odor identification test, extensive psychophysical testing (Sniffin' Sticks test), and self-assessed olfactory performance. The analyses revealed no significant effects of sight on olfactory threshold, odor discrimination, cued identification, or free identification scores, and the performance of the early-blind and late-blind participants did not differ significantly. Additionally, the self-assessed olfactory abilities of the blind people did not differ from those of the sighted people. These results suggest that sensory compensation in visually impaired people is not pronounced with regard to olfactory abilities as measured by standardized smell tests. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Enhancing the Entertainment Experience of Blind and Low-Vision Theatregoers through Touch Tours

ERIC Educational Resources Information Center

Udo, J. P.; Fels, D. I.

2010-01-01

In this paper we demonstrate how universal design theory and the research available on museum-based touch tours can be used to develop a touch tour for blind and low-vision theatregoers. We discuss these theoretical and practical approaches with reference to data collected and experience gained from the creation and execution of a touch tour for…
Blinded randomized controlled study of a web-based otoscopy simulator in undergraduate medical education.

PubMed

Stepniak, Camilla; Wickens, Brandon; Husein, Murad; Paradis, Josee; Ladak, Hanif M; Fung, Kevin; Agrawal, Sumit K

2017-06-01

OtoTrain is a Web-based otoscopy simulator that has previously been shown to have face and content validity. The objective of this study was to evaluate the effectiveness of this Web-based otoscopy simulator in teaching diagnostic otoscopy to novice learners. Study design: prospective, blinded, randomized controlled trial. Second-year medical students were invited to participate in the study. A pretest consisted of a series of otoscopy videos followed by an open-answer assessment of the characteristics and diagnosis of each video. Participants were then randomly divided into a control group and a simulator group. Following the pretest, both groups attended standard otology lectures, but the simulator group was additionally given unlimited access to OtoTrain for 1 week. A post-test was completed using a separate set of otoscopy videos. Tests were graded using a comprehensive marking scheme. The pretest and post-test were anonymized, and the three evaluators were blinded to student allocation. A total of 41 medical students were enrolled in the study and randomized to the control group (n = 20) and the simulator group (n = 21). There was no significant difference between the two groups' pretest scores. With the standard otology lectures, the control group had a 31% improvement in their post-test score (mean ± standard error of the mean, 30.4 ± 1.5) compared with their pretest score (23.3 ± 1.8) (P < .001). The simulator group, which used OtoTrain in addition to the otology lectures, improved by 71% on the post-test (37.8 ± 1.6) compared with the pretest (22.1 ± 1.9) (P < .001). On the post-test, the simulator group scored 24% higher than the control group (P < .002). Inter-rater reliability between the blinded evaluators was excellent (r = 0.953, P < .001). The use of OtoTrain increased the diagnostic otoscopic performance in novice learners. OtoTrain may be an effective teaching adjunct for undergraduate
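The percentage gains above are relative changes in the group means, 100 × (after − before) / before. A quick check using the rounded means reported in the abstract; the small gap between about 30.5% and the reported 31% for the control group may reflect rounding of the published means or averaging of per-student changes, which the abstract does not specify.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


control_gain = percent_change(23.3, 30.4)         # about 30.5% (reported: 31%)
simulator_gain = percent_change(22.1, 37.8)       # about 71.0% (reported: 71%)
post_test_advantage = percent_change(30.4, 37.8)  # about 24.3% (reported: 24%)
for label, value in [("control", control_gain), ("simulator", simulator_gain),
                     ("post-test advantage", post_test_advantage)]:
    print(f"{label}: {value:.1f}%")
```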
Inter-Rater Reliability of the Modified Ashworth Scale and Modified Modified Ashworth Scale in Assessing Poststroke Elbow Flexor Spasticity

ERIC Educational Resources Information Center

Kaya, Taciser; Goksel Karatepe, Altinay; Gunaydin, Rezzan; Koc, Aysegul; Altundal Ercan, Ulku

2011-01-01

The Modified Ashworth Scale (MAS) is commonly used in clinical practice for grading spasticity. However, it was modified recently by omitting grade "1+" of the MAS and redefining grade "2". The aim of this study was to investigate the inter-rater reliability of MAS and modified MAS (MMAS) for the assessment of poststroke elbow flexor spasticity.…
Blind estimation of blur in hyperspectral images

NASA Astrophysics Data System (ADS)

Zhang, Mo; Vozel, Benoit; Chehdi, Kacem; Uss, Mykhail; Abramov, Sergey; Lukin, Vladimir

2017-10-01

Hyperspectral images acquired by remote sensing systems are generally degraded by noise and can sometimes be more severely degraded by blur. When nothing is known about the degradations present in the original image, only blind restoration methods can be considered. By blind, we mean that nothing is known about the blur point spread function (PSF), the original latent channel, or the noise level. In this study, we address the blind restoration of the degraded channels component-wise, according to a sequential scheme. For each degraded channel, the sequential scheme estimates the blur PSF in a first stage and, in a second and final stage, deconvolves the degraded channel using the previously estimated PSF. We propose a new component-wise blind method for estimating the blur PSF effectively and accurately. This method follows recent approaches suggesting the detection, selection and use of sufficiently salient edges in the currently processed channel to support regularized blur PSF estimation. Several beneficial modifications are introduced in our work. Salient edges are selected in a new way, by appropriately thresholding the cumulative distribution of their gradient magnitudes. In addition, quasi-automatic and spatially adaptive tuning of the regularization parameters involved is considered. To demonstrate the applicability and higher efficiency of the proposed method, we compare it against the method it originates from and against four representative edge-sparsifying regularized methods from the literature that were already assessed in a previous work. Our attention is mainly paid to the objective analysis (via the l1-norm) of the accuracy of blur PSF estimation. The tests are performed on a synthetic hyperspectral image. This synthetic hyperspectral image has been built from various samples from classified areas of a real-life hyperspectral image, in order to benefit from realistic spatial
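The edge-selection step in this abstract keeps only pixels whose gradient magnitude is high enough in the cumulative distribution. A few lines of NumPy can sketch it, but this is only an illustration under stated assumptions, not the authors' method. The abstract does not give the threshold rule, so the hypothetical `keep_fraction` parameter simply keeps the top share of gradient magnitudes. Central-difference gradients from `np.gradient` stand in for whatever edge detector the paper actually uses.

```python
import numpy as np


def salient_edge_mask(channel: np.ndarray, keep_fraction: float = 0.02) -> np.ndarray:
    """Boolean mask of pixels whose gradient magnitude lies in the top
    keep_fraction of the empirical cumulative distribution (hypothetical rule)."""
    # Central-difference gradients along rows (y) and columns (x).
    gy, gx = np.gradient(channel.astype(float))
    magnitude = np.hypot(gx, gy)

    # Empirical CDF of the gradient magnitudes.
    values = np.sort(magnitude.ravel())
    cdf = np.arange(1, values.size + 1) / values.size

    # First sorted magnitude whose CDF value exceeds 1 - keep_fraction.
    idx = np.searchsorted(cdf, 1.0 - keep_fraction, side="right")
    threshold = values[min(idx, values.size - 1)]
    return magnitude >= threshold
```

On a synthetic 8 × 8 step image (left half 0, right half 1), only the two columns straddling the step are flagged. In a real pipeline the resulting mask would feed the regularized PSF estimate.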
Childhood Fears among Children Who Are Blind: The Perspective of Teachers Who Are Blind

ERIC Educational Resources Information Center

Al-Zboon, Eman

2017-01-01

The aim of this study was to investigate childhood fears in children who are blind from the perspective of teachers who are blind. The study was conducted in Jordan. Forty-six teachers were interviewed. Results revealed that the main fear content in children who are blind includes fear of the unknown; environment-, transportation- and…
Effect of standardized training on the reliability of the Cochrane risk of bias assessment tool: a prospective study.

PubMed

da Costa, Bruno R; Beckett, Brooke; Diaz, Alison; Resta, Nina M; Johnston, Bradley C; Egger, Matthias; Jüni, Peter; Armijo-Olivo, Susan

2017-03-03

The Cochrane risk of bias tool is commonly criticized for having low reliability. We aimed to investigate whether training raters with objective and standardized instructions on how to assess risk of bias can improve the reliability of the Cochrane risk of bias tool. In this pilot study, four raters inexperienced in risk of bias assessment were randomly allocated to minimal or intensive standardized training for risk of bias assessment of randomized trials of physical therapy treatments for patients with knee osteoarthritis pain. Two raters were experienced risk of bias assessors who served as reference. The primary outcome of our study was between-group reliability, defined as the agreement of the risk of bias assessments of inexperienced raters with the reference assessments of experienced raters. Consensus-based assessments were used for this purpose. The secondary outcome was within-group reliability, defined as the agreement of assessments within pairs of inexperienced raters. We calculated the chance-corrected weighted Kappa to quantify agreement within and between groups of raters for each of the domains of the risk of bias tool. A total of 56 trials were included in our analysis. The Kappa for the agreement of inexperienced raters with the reference across items of the risk of bias tool ranged from 0.10 to 0.81 for the minimal training group and from 0.41 to 0.90 for the standardized training group. The Kappa values for the agreement within pairs of inexperienced raters across the items of the risk of bias tool ranged from 0 to 0.38 for the minimal training group and from 0.93 to 1 for the standardized training group. Between-group differences in Kappa for the agreement of inexperienced raters with the reference always favored the standardized training group and were most pronounced for incomplete outcome data (difference in Kappa 0.52, p < 0.001) and allocation concealment (difference in Kappa 0.30, p = 0.004). Intensive, standardized training on
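A chance-corrected weighted Kappa compares the observed disagreement between two raters with the disagreement expected if they judged independently. Larger ordinal disagreements receive larger penalties: κ_w = 1 − Σ w_ij p_ij / Σ w_ij e_ij. The minimal sketch below implements this. The abstract does not say which weighting scheme the authors used, so linear weights are an assumption here. So is the ordering low < unclear < high for the risk-of-bias judgments. The function name `weighted_kappa` is illustrative.

```python
import numpy as np


def weighted_kappa(r1, r2, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordered category scale.

    categories lists the levels from lowest to highest; weights is
    "linear" or "quadratic" (which one a given study used is an assumption).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions p_ij (rater 1 in rows, rater 2 in columns).
    observed = np.zeros((k, k))
    for a, b in zip(r1, r2):
        observed[index[a], index[b]] += 1
    observed /= observed.sum()
    # Proportions expected by chance: outer product of the two marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Disagreement weights: 0 on the diagonal, 1 for the most distant pair.
    i, j = np.indices((k, k))
    distance = np.abs(i - j) / (k - 1)
    if weights == "linear":
        w = distance
    elif weights == "quadratic":
        w = distance ** 2
    else:
        raise ValueError("weights must be 'linear' or 'quadratic'")
    chance_disagreement = (w * expected).sum()
    if chance_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one same category")
    return 1.0 - (w * observed).sum() / chance_disagreement
```

For example, `weighted_kappa(["low", "low", "unclear", "high"], ["low", "unclear", "unclear", "high"], ["low", "unclear", "high"])` returns 5/7 (about 0.71). The single one-step disagreement is only partly penalized.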
Memory blindness: Altered memory reports lead to distortion in eyewitness memory.

PubMed

Cochran, Kevin J; Greenspan, Rachel L; Bogart, Daniel F; Loftus, Elizabeth F

2016-07-01

Choice blindness refers to the finding that people can often be misled about their own self-reported choices. However, little research has investigated the more long-term effects of choice blindness. We examined whether people would detect alterations to their own memory reports, and whether such alterations could influence participants' memories. Participants viewed slideshows depicting crimes, and then either reported their memories for episodic details of the event (Exp. 1) or identified a suspect from a lineup (Exp. 2). Then we exposed participants to manipulated versions of their memory reports, and later tested their memories a second time. The results indicated that the majority of participants failed to detect the misinformation, and that exposing witnesses to misleading versions of their own memory reports caused their memories to change to be consistent with those reports. These experiments have implications for eyewitness memory.
Rating Movies and Rating the Raters Who Rate Them

PubMed Central

Zhou, Hua; Lange, Kenneth

2010-01-01

The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data. PMID:20802818
Rating Movies and Rating the Raters Who Rate Them.

PubMed

Zhou, Hua; Lange, Kenneth

2009-11-01

The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data.

Causes of blindness and career choice among pupils in a blind school; South Western Nigeria.

PubMed

Fadamiro, Christianah Olufunmilayo

2014-01-01

The causes of blindness vary from place to place, with about 80% of cases being avoidable. Furthermore, blind people face many challenges in career choice, which limits their economic potential and full integration into society. This study aims at identifying the causes of blindness and career choice among pupils in a school for the blind in south-western Nigeria. This is a descriptive study of causes of blindness and career choice among 38 pupils residing in a school for the blind at Ikere-Ekiti, south-western Nigeria. Thirty-eight pupils, comprising 25 males (65.8%) and 13 females (34.2%) aged 6 to 39 years, were seen for the study. The commonest cause of blindness was cataract, with 14 cases (36.84%), while congenital glaucoma and infection each accounted for 5 cases (13.16%). Avoidable causes constituted the greatest proportion of the causes, 27 (71.05%), while unavoidable causes accounted for 11 (28.9%). Law was the profession most desired by the pupils, 11 (33.3%), followed by teaching, 9 (27.3%); other desired professions included engineering, journalism and farming. The greatest proportion of causes of blindness identified in this study is avoidable. There is a need to create public awareness of some of the notable causes, particularly cataract, and to motivate the community to utilize available eye care services. Furthermore, there is a need for career talks in schools for the blind so that pupils can choose careers in which their potential can be fully maximized.
Vibrotactile masking experiments reveal accelerated somatosensory processing in congenitally blind braille readers.

PubMed

Bhattacharjee, Arindam; Ye, Amanda J; Lisak, Joy A; Vargas, Maria G; Goldreich, Daniel

2010-10-27

Braille reading is a demanding task that requires the identification of rapidly varying tactile patterns. During proficient reading, neighboring characters impact the fingertip at ∼100 ms intervals, and adjacent raised dots within a character at 50 ms intervals. Because the brain requires time to interpret afferent sensorineural activity, among other reasons, tactile stimuli separated by such short temporal intervals pose a challenge to perception. How, then, do proficient Braille readers successfully interpret inputs arising from their fingertips at such rapid rates? We hypothesized that somatosensory perceptual consolidation occurs more rapidly in proficient Braille readers. If so, Braille readers should outperform sighted participants on masking tasks, which demand rapid perceptual processing, but would not necessarily outperform the sighted on tests of simple vibrotactile sensitivity. To investigate, we conducted two-interval forced-choice vibrotactile detection, amplitude discrimination, and masking tasks on the index fingertips of 89 sighted and 57 profoundly blind humans. Sighted and blind participants had similar unmasked detection (25 ms target tap) and amplitude discrimination (compared with 100 μm reference tap) thresholds, but congenitally blind Braille readers, the fastest readers among the blind participants, exhibited significantly less masking than the sighted (masker, 50 Hz, 50 μm; target-masker delays, ±50 and ±100 ms). Indeed, Braille reading speed correlated significantly and specifically with masking task performance, and in particular with the backward masking decay time constant. We conclude that vibrotactile sensitivity is unchanged but that perceptual processing is accelerated in congenitally blind Braille readers.
Vibrotactile masking experiments reveal accelerated somatosensory processing in congenitally blind Braille readers

PubMed Central

Bhattacharjee, Arindam; Ye, Amanda J.; Lisak, Joy A.; Vargas, Maria G.; Goldreich, Daniel

2010-01-01

Braille reading is a demanding task that requires the identification of rapidly varying tactile patterns. During proficient reading, neighboring characters impact the fingertip at about 100-ms intervals, and adjacent raised dots within a character at 50-ms intervals. Because the brain requires time to interpret afferent sensorineural activity, among other reasons, tactile stimuli separated by such short temporal intervals pose a challenge to perception. How, then, do proficient Braille readers successfully interpret inputs arising from their fingertips at such rapid rates? We hypothesized that somatosensory perceptual consolidation occurs more rapidly in proficient Braille readers. If so, Braille readers should outperform sighted participants on masking tasks, which demand rapid perceptual processing, but would not necessarily outperform the sighted on tests of simple vibrotactile sensitivity. To investigate, we conducted two-interval forced-choice vibrotactile detection, amplitude discrimination, and masking tasks on the index fingertips of 89 sighted and 57 profoundly blind humans. Sighted and blind participants had similar unmasked detection (25-ms target tap) and amplitude discrimination (compared to 100-micron reference tap) thresholds, but congenitally blind Braille readers, the fastest readers among the blind participants, exhibited significantly less masking than the sighted (masker: 50-Hz, 50-micron; target-masker delays ±50 and ±100 ms). Indeed, Braille reading speed correlated significantly and specifically with masking task performance, and in particular with the backward masking decay time constant. We conclude that vibrotactile sensitivity is unchanged, but that perceptual processing is accelerated in congenitally blind Braille readers. PMID:20980584
Education of Blind Persons in Ethiopia.

ERIC Educational Resources Information Center

Maru, A. A.; Cook, M. J.

1990-01-01

The paper reviews the historical and cultural attitudes of Ethiopians toward blind children, the education of blind children, the special situation of orphaned blind children, limitations of existing educational models, and development of a new model that relies on elements of community-based rehabilitation and the employment of blind high school…
What the comprehensive economics of blindness and visual impairment can help us understand.

PubMed

Frick, Kevin D

2012-01-01

Since the year 2000, the amount written about the economics of blindness and visual impairment has increased substantially. In some cases, the studies listed under this heading are calculations of the costs related to vision impairment and blindness at a national or global level; in other cases the studies examine the cost-effectiveness of strategies to prevent or modify visual impairment or blindness that are intended to be applied as a guide to treatment recommendations and coverage decisions. In each case the references are just examples of many that could be cited. These important studies have helped advocates, policy makers, practitioners, educators, and others interested in eye and vision health to understand the magnitude of the impact that visual impairment and blindness have on the world, regions, nations, and individuals and the tradeoffs that need to be made to limit the impact. However, these studies only begin to tap into the insights that economic logic might offer to those interested in this field. This paper presents multiple case studies that demonstrate that the economics of blindness and visual impairment encompasses much more than simply measures of the burden of the condition. Case studies demonstrating the usefulness of economic insight include analysis of the prevention of conditions that lead to impairment, decisions about refractive error and presbyopia, decisions about disease and injury treatment, decisions about behavior among those with uncorrectable impairment, and decisions about how to regulate the market; all of these have important economic inputs.
What the comprehensive economics of blindness and visual impairment can help us understand

PubMed Central

Frick, Kevin D

2012-01-01

Since the year 2000, the amount written about the economics of blindness and visual impairment has increased substantially. In some cases, the studies listed under this heading are calculations of the costs related to vision impairment and blindness at a national or global level; in other cases the studies examine the cost-effectiveness of strategies to prevent or modify visual impairment or blindness that are intended to be applied as a guide to treatment recommendations and coverage decisions. In each case the references are just examples of many that could be cited. These important studies have helped advocates, policy makers, practitioners, educators, and others interested in eye and vision health to understand the magnitude of the impact that visual impairment and blindness have on the world, regions, nations, and individuals and the tradeoffs that need to be made to limit the impact. However, these studies only begin to tap into the insights that economic logic might offer to those interested in this field. This paper presents multiple case studies that demonstrate that the economics of blindness and visual impairment encompasses much more than simply measures of the burden of the condition. Case studies demonstrating the usefulness of economic insight include analysis of the prevention of conditions that lead to impairment, decisions about refractive error and presbyopia, decisions about disease and injury treatment, decisions about behavior among those with uncorrectable impairment, and decisions about how to regulate the market; all of these have important economic inputs. PMID:22944750
Evaluation of established and new reference lines for the standardization of transperineal ultrasound.

PubMed

Hennemann, J; Kennes, L N; Maass, N; Najjari, L

2014-11-01

To examine the performance of a new reference line for the assessment of pelvic organ descent by transperineal ultrasound. We compared our newly proposed reference line, between two hyperechoic contours of the symphysis pubis (Line 3), with the horizontal reference line proposed by Dietz and Wilson (Line 1) and the central pubic line proposed by Schaer et al. (Line 2). Ultrasound volumes of 94 women obtained in routine clinical practice were analyzed. The perpendicular distance from the reference lines to the internal sphincter and the most dependent part of the bladder base was measured for volumes obtained at rest, on pelvic floor muscle contraction, on Valsalva maneuver and during coughing. Measurements were repeated 4 months later by the same examiner. Rates of assessment were calculated, and intrarater reliability was evaluated using Bland-Altman plots and intraclass correlation coefficients. Line 2 had to be excluded from reliability analysis because of an assessment rate of only 12%, whereas Lines 1 and 3 could be assessed in 100% of volumes. The intrarater repeatability of Lines 1 and 3 was shown to be very similar. In this comparison of three potential reference lines for the assessment of pelvic organ descent by transperineal ultrasound, the central pubic line was shown to be inferior owing to poor visibility in our volumes. Inter-rater reliability analysis and validation studies are required to confirm our results. Copyright © 2014 ISUOG. Published by John Wiley & Sons Ltd.
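Bland-Altman analysis summarizes test-retest agreement in two quantities. The first is the mean difference between paired measurements (the bias). The second is the pair of 95% limits of agreement, bias ± 1.96 SD of the differences. The sketch below computes these for repeated distance measurements, such as the two sessions four months apart described in this abstract. It is a generic illustration: the function name is hypothetical, and the example values are made up, not data from the study.

```python
import numpy as np


def bland_altman(first, second):
    """Return (bias, lower, upper): the mean paired difference and the
    95% limits of agreement, bias +/- 1.96 * SD of the differences."""
    a = np.asarray(first, dtype=float)
    b = np.asarray(second, dtype=float)
    if a.shape != b.shape or a.size < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

For the made-up pairs `bland_altman([10, 12, 14, 16], [9, 12, 15, 15])`, the bias is 0.25 and the limits of agreement are roughly −1.63 to 2.13. A Bland-Altman plot shows each pair's difference against its mean, with horizontal lines at these three values.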
Humanizing Blindness through Public Education.

ERIC Educational Resources Information Center

Augusto, C. R.; McGraw, J. M.

1990-01-01

Public attitudes toward blindness are shaped by limited contacts with visually impaired people and unrealistic portrayals of blind people in the media. Proactive efforts including national and local public education programs are needed to change stereotyped thinking, humanize blindness, and lead to greater opportunities for fuller participation in…
Programs for the Deaf Blind.

ERIC Educational Resources Information Center

American Annals of the Deaf, 1987

1987-01-01

The directory lists 30 programs for deaf-blind children and youth, the 10 regional offices of the Helen Keller National Center for Deaf-Blind Youths and Adults, and five programs for training teachers of the deaf-blind. Provided for each program is address, director's name, and phone number. (DB)
Project Word-Back: Exploratory Follow-Up Study on Deaf-Blind (Rubella) Children in California.

ERIC Educational Resources Information Center

Scheffelin, Edward J.

Project Word-Back, an exploratory followup study of 21 young deaf-blind (Rubella) children (6 to 9 years old), was conducted to establish a tentative reference source of information, obtain teacher estimates on selected aspects of the current functioning level of a sample of children, and provide basic data from which hypotheses may be formulated…
Underdetermined blind separation of three-way fluorescence spectra of PAHs in water.

PubMed

Yang, Ruifang; Zhao, Nanjing; Xiao, Xue; Zhu, Wei; Chen, Yunan; Yin, Gaofang; Liu, Jianguo; Liu, Wenqing

2018-06-15

In this work, an underdetermined blind decomposition method is developed to recognize individual components from the three-way fluorescence spectra of their mixtures using sparse component analysis (SCA). The mixing matrix is estimated from the mixtures using a fuzzy data clustering algorithm together with the scatter points corresponding to local energy maxima in the time-frequency domain, and the spectra of the object components are recovered by a pseudo-inverse technique. As an example, this method blindly extracted the spectra of three and four pure components from two samples of their mixture, with similarities between resolved and reference spectra all above 0.80. This work opens a new and effective path toward monitoring PAHs in water by three-way fluorescence spectroscopy. Copyright © 2018 Elsevier B.V. All rights reserved.
Ocular Pain and Impending Blindness During Facial Cosmetic Injections: Is Your Office Prepared?

PubMed

Prado, Giselle; Rodríguez-Feliz, Jose

2017-02-01

Soft tissue filler injections are the second most common non-surgical procedure performed by plastic surgeons. Embolization of intravascular material after facial injection is a rare but terrifying outcome because of the high likelihood of long-term sequelae such as blindness and cerebrovascular accident. The literature is replete with examples of permanent blindness caused by injection of autologous fat, soft tissue fillers such as hyaluronic acid, PLLA and calcium hydroxylapatite, and even corticosteroid suspensions. However, missing from the discussion is an effective treatment algorithm that injecting physicians can follow quickly and safely in the case of an intravascular injection with impending blindness. In this report, we present the case of a 64-year-old woman who suffered blindness and hemiparesis after facial cosmetic injections performed by a family physician. We use this case to raise awareness that this complication has become more common as the number of injectors and patients seeking these treatments has increased exponentially over the past few years. We share our experience with incorporating a "blindness safety kit" in each of our offices to promptly initiate treatment in someone with embolization and impending blindness. The kit contains a step-by-step protocol to follow in the event of arterial embolization of filler material associated with ocular pain and impending loss of vision. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.
Introduction to Deaf-Blindness Workshop.

ERIC Educational Resources Information Center

Rhodes, Larry

This document presents the agenda and materials distributed at a 1-day introductory workshop on deaf-blindness. Introductory material explains the workshop's purpose and rules. A short test contrasts facts and myths about deaf-blindness. A handout presents information on the dynamics of deaf-blindness, etiologies in the adult deaf-blind…
Blindness and visual impairment in opera.

PubMed

Aydin, Pinar; Ritch, Robert; O'Dwyer, John

2018-01-01

The performing arts mirror the human condition. This study sought to analyze the reasons for inclusion of visually impaired characters in opera, the cause of the blindness or near blindness, and the dramatic purpose of the blindness in the storyline. We reviewed operas from the 18th century to 2010 and included all characters with ocular problems. We classified the cause of each character's ocular problem (organic, nonorganic, and other) in relation to the thematic setting of the opera: biblical and mythical, blind beggars or blind musicians, historical (real or fictional characters), and contemporary or futuristic. Cases of blindness in 55 characters (2 as a choir) from 38 operas were detected over 3 centuries of repertoire: 11 had trauma-related visual impairment, 5 had congenital blindness, 18 had visual impairment of unknown cause, 9 had psychogenic or malingering blindness, and 12 were symbolic or miracle-related. One opera featured an ophthalmologist curing a patient. The research illustrates that visual impairment was frequently used as an artistic device to enhance the intent and situate an opera in its time.
Assessing disease severity: accuracy and reliability of rater estimates in relation to number of diagrams in a standard area diagram set

USDA-ARS?s Scientific Manuscript database

Errors in rater estimates of plant disease severity occur, and standard area diagrams (SADs) help improve accuracy and reliability. The effect of the number of diagrams in a SAD set on accuracy and reliability is unknown. The objective of this study was to compare estimates of pecan scab severity made witho...
"Color-Blind" Racism.

ERIC Educational Resources Information Center

Carr, Leslie G.

Examining race relations in the United States from a historical perspective, this book explains how the constitution is racist and how color blindness is actually a racist ideology. It is argued that Justice Harlan, in his dissenting opinion in Plessy v. Ferguson, meant that the constitution and the law must remain blind to the existence of race…
Deaf-Blind Perspectives, 1998-1999.

ERIC Educational Resources Information Center

Malloy, Peggy, Ed.

1998-01-01

This collection of three issues focuses on competencies for teachers of learners who are deaf-blind, living with deaf-blindness, and resources in Australia for parents and families of students who are deaf-blind. Articles include: (1) "Research-to-Practice Focus: Competencies for Teachers of Learners Who Are Deafblind" (Marianne Riggio),…
Measuring the Pain Area: An Intra- and Inter-Rater Reliability Study Using Image Analysis Software.

PubMed

Dos Reis, Felipe Jose Jandre; de Barros E Silva, Veronica; de Lucena, Raphaela Nunes; Mendes Cardoso, Bruno Alexandre; Nogueira, Leandro Calazans

2016-01-01

Pain drawings have frequently been used for clinical information and research. The aim of this study was to investigate the intra- and inter-rater reliability of area measurements performed on pain drawings. Our secondary objective was to verify reliability when using computers with different screen sizes, with and without a mouse. Pain drawings were completed by patients with chronic neck pain or neck-shoulder-arm pain. Four independent examiners participated in the study. Examiners A and B used the same computer with a 16-inch screen and a wired mouse. Examiner C used a notebook with a 16-inch screen and no mouse, and Examiner D used a computer with an 11.6-inch screen and a wireless mouse. Image measurements were obtained using the GIMP and NIH ImageJ computer programs. The length of all the images was measured in GIMP to set the scale in ImageJ. Each marked area was then encircled, and the total surface area (cm²) was calculated for each pain drawing measurement. A total of 117 areas were identified and 52 pain drawings were analyzed. The intrarater reliability across all examiners was high (ICC = 0.989). The inter-rater reliability was also high. No significant differences were observed between screen sizes or with and without a mouse. This suggests that the precision of these measurements is acceptable for using this method as a measurement tool in clinical practice and research. © 2014 World Institute of Pain.
Inter-rater variability in motor function assessment in Parkinson's disease between experts in movement disorders and nurses specialising in PD management.

PubMed

de Deus Fonticoba, T; Santos García, D; Macías Arribí, M

2017-05-23

In clinical practice, assessing patients with Parkinson's disease (PD) is a complex, time-consuming task. Our purpose is to provide a rigorous and objective evaluation of how motor function in PD patients is assessed by neurologists specialising in movement disorders, on the one hand, and by nurses specialising in PD management, on the other. We conducted an observational, cross-sectional, single-centre study of 50 patients with PD (52% men; mean age: 64.7 ± 8.7 years) who were assessed between 5 January 2016 and 20 July 2016. A neurologist and a nurse evaluated motor function in the early morning hours using the Unified Parkinson's Disease Rating Scale (UPDRS) parts III and IV and Hoehn & Yahr (H&Y) scale. Tests were administered in the same PD periods (in 48 patients during the 'off' time and in 2 patients during the 'on' time). Inter-rater variability was estimated with the intraclass correlation coefficient (ICC). Forty-nine patients (98%) were classified in the same H&Y stage by both raters. Assessment times were similar for both raters. ICC for UPDRS-IV and UPDRS-III total scores were 0.955 (P<.0001) and 0.954 (P<.0001), respectively. The greatest variability was found for UPDRS-III item 29 (gait; ICC=0.746; P<.0001) and the lowest, for item 30 (postural stability; ICC=0.918; P<.0001). Motor function assessment of PD patients by a trained nurse is equivalent to that made by an expert neurologist and takes the same time to complete. Copyright © 2017 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.
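The ICC values in this abstract quantify how much of the score variance comes from real differences between patients rather than between raters. The abstract does not state which ICC form was used. The sketch below assumes the common two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's notation, computed from two-way ANOVA mean squares. For the design described here, the hypothetical `scores` argument would be a 50 × 2 array (patients × raters).

```python
import numpy as np


def icc_2_1(scores) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    scores has shape (n_subjects, k_raters), with n >= 2 and k >= 2."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for subjects (rows), raters (columns) and residual.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    # Corresponding mean squares.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Because this form measures absolute agreement, a rater who scores every patient exactly 2 points higher still lowers it slightly. Three patients scored 10/12, 20/22 and 30/32 give 200/204 ≈ 0.98. A consistency-type ICC would ignore such a constant offset.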
Programs for the Deaf-Blind.

ERIC Educational Resources Information Center

American Annals of the Deaf, 1990

1990-01-01

The directory lists 28 state or multistate programs for deaf-blind children and youth, the national center and 10 regional offices of the Helen Keller National Center for Deaf-Blind Youths and Adults, and 4 programs for training teachers of the deaf-blind. Information usually provided includes address, director's name, and phone number. (DB)

When Emotion Blinds: A Spatiotemporal Competition Account of Emotion-Induced Blindness

PubMed Central

Wang, Lingling; Kennedy, Briana L.; Most, Steven B.

2012-01-01

Emotional visual scenes are such powerful attractors of attention that they can disrupt perception of other stimuli that appear soon afterward, an effect known as emotion-induced blindness. What mechanisms underlie this impact of emotion on perception? Evidence suggests that emotion-induced blindness may be distinguishable from closely related phenomena such as the orienting of spatial attention to emotional stimuli or the central resource bottlenecks commonly associated with the attentional blink. Instead, we suggest that emotion-induced blindness reflects relatively early competition between targets and emotional distractors, where spontaneous prioritization of emotional stimuli leads to suppression of competing perceptual representations potentially linked to an overlapping point in time and space. PMID:23162497
SLO blind data set inversion and classification using physically complete models

NASA Astrophysics Data System (ADS)

Shamatava, I.; Shubitidze, F.; Fernández, J. P.; Barrowes, B. E.; O'Neill, K.; Grzegorczyk, T. M.; Bijamov, A.

2010-04-01

Discrimination studies carried out on TEMTADS and Metal Mapper blind data sets collected at the San Luis Obispo UXO site are presented. The data sets included four types of targets of interest (TOI): 2.36" rockets, 60-mm mortar shells, 81-mm projectiles, and 4.2" mortar items. The total parameterized normalized magnetic source (NSMS) amplitudes were used to discriminate TOI from metallic clutter and to distinguish among the different hazardous UXO. First, the total NSMS were determined in the object's frame coordinates for each TOI along three orthogonal axes, using the training data provided by the Strategic Environmental Research and Development Program (SERDP) along with the referred blind data sets. The inverted total NSMS were then used to extract the time-decay classification features. Once our inversion and classification algorithms had been tested on the calibration data sets, we applied the same procedure to all blind data sets. The combined NSMS and differential evolution (DE) algorithm is used to determine the NSMS strengths for each cell. The resulting total NSMS time-decay curves were used to extract the discrimination features and to perform classification using the training data as reference. In addition, for cross-validation, the locations and orientations inverted with the NSMS-DE algorithm were compared against those obtained via the magnetic field, vector and scalar potentials (HAP) method and the combined dipole and Gauss-Newton technique. We examined the entire time-decay history of the total NSMS case by case for classification purposes. We also used several multi-class statistical classification algorithms to separate the dangerous objects from non-hazardous items. The inverted targets were ranked by target ID and submitted to SERDP for independent scoring. The independent scoring results are presented.
Blind Data Attack on BGP Routers

DTIC Science & Technology

2017-03-01

...implementations may not properly implement blind attack protection, leaving long-standing connections, such as Border Gateway Protocol (BGP) sessions, vulnerable to exploitation. ...protection measures should a discovered vulnerability reduce attack complexity. Subject terms: BGP, TCP, blind attack, blind data attack.
Deaf-Blind Perspectives, 1997-1998.

ERIC Educational Resources Information Center

Deaf-Blind Perspectives, 1998

1998-01-01

This one-year collection of three serial issues focuses on problem solving skills for children with deaf-blindness, the history and change in the education of children who are deaf-blind since the rubella epidemic of the 1960's, and early identification of infants who are deaf-blind. Specific articles include: (1) "Research to Practice Focus…
Inter-rater reliability of motor unit number estimates and quantitative motor unit analysis in the tibialis anterior muscle.

PubMed

Boe, S G; Dalton, B H; Harwood, B; Doherty, T J; Rice, C L

2009-05-01

To establish the inter-rater reliability of decomposition-based quantitative electromyography (DQEMG) derived motor unit number estimates (MUNEs) and quantitative motor unit (MU) analysis. Using DQEMG, two examiners independently obtained a sample of needle and surface-detected motor unit potentials (MUPs) from the tibialis anterior muscle of 10 subjects. Coupled with a maximal M wave, surface-detected MUPs were used to derive a MUNE for each subject and each examiner. Additionally, size-related parameters of the individual MUs were obtained following quantitative MUP analysis. Test-retest MUNE values were similar, with high reliability observed between examiners (ICC=0.87). Additionally, MUNE variability from test-retest, as quantified by a 95% confidence interval, was relatively low (±28 MUs). Lastly, quantitative data pertaining to MU size, complexity and firing rate were similar between examiners. MUNEs and quantitative MU data can be obtained with high reliability by two independent examiners using DQEMG. Establishing the inter-rater reliability of MUNEs and quantitative MU analysis using DQEMG is central to the clinical applicability of the technique. In addition to assessing response to treatments over time, multiple clinicians may be involved in the longitudinal assessment of the MU pool of individuals with disorders of the central or peripheral nervous system.
Evaluation of the "e-rater"® Scoring Engine for the "TOEFL"® Independent and Integrated Prompts. Research Report. ETS RR-12-06

ERIC Educational Resources Information Center

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent

2012-01-01

Scoring models for the "e-rater"® system were built and evaluated for the "TOEFL"® exam's independent and integrated writing prompts. Prompt-specific and generic scoring models were built, and evaluation statistics, such as weighted kappas, Pearson correlations, standardized differences in mean scores, and correlations with…
Our Blind Child: Bringing Up a Blind Child During Its Early Years.

ERIC Educational Resources Information Center

Pielasch, Helmut, Ed.; And Others

The document contains 10 author contributed chapters (in four languages) which resulted from a 1976 international symposium on problems concerning the preschool education of blind children and the guidance of their parents. Chapters have the following titles (with authors and nationality in parentheses): "Development of the Blind Child"…
INTRA-RATER RELIABILITY OF THE MULTIPLE SINGLE-LEG HOP-STABILIZATION TEST AND RELATIONSHIPS WITH AGE, LEG DOMINANCE AND TRAINING.

PubMed

Sawle, Leanne; Freeman, Jennifer; Marsden, Jonathan

2017-04-01

Balance is a complex construct, affected by multiple components such as strength and coordination. However, whilst assessing an athlete's dynamic balance is an important part of clinical examination, there is no gold-standard measure. The multiple single-leg hop-stabilization test is a functional test that may offer a method of evaluating the dynamic attributes of balance, but it needs to show adequate intra-tester reliability. The purpose of this study was to assess the intra-rater reliability of a dynamic balance test, the multiple single-leg hop-stabilization test, on the dominant and non-dominant legs. Study design: intra-rater reliability study. Fifteen active participants were tested twice with a 10-minute break between tests. The outcome measure was the multiple single-leg hop-stabilization test score, based on a clinically assessed numerical scoring system. Results were analysed using an intraclass correlation coefficient (ICC(2,1)) and Bland-Altman plots. Regression analyses explored relationships between test scores, leg dominance, age and training (an alpha level of p = 0.05 was selected). ICCs for intra-rater reliability were 0.85 for both the dominant and non-dominant legs (confidence intervals = 0.62-0.95 and 0.61-0.95, respectively). Bland-Altman plots showed scores within two standard deviations. A significant correlation was observed between dominant and non-dominant leg balance scores (R² = 0.49, p<0.05). Better balance was associated with younger age for the non-dominant leg (R² = 0.28, p<0.05) and the dominant leg (R² = 0.39, p<0.05), and with a higher number of hours spent training for the non-dominant leg (R² = 0.37, p<0.05). The multiple single-leg hop-stabilisation test demonstrated strong intra-tester reliability with active participants. Younger participants who trained more had better balance scores. This test may be a useful measure for evaluating the dynamic attributes of balance. Level of evidence: 3.
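The ICC(2,1) named above is the Shrout and Fleiss two-way random-effects, single-measure form. Below is a minimal pure-Python sketch of how it is computed from a two-way ANOVA table. It assumes a complete subjects × sessions (or subjects × raters) table with no missing scores. The function name `icc_2_1` is hypothetical. The toy table is the classic Shrout and Fleiss (1979) six-subject, four-judge example, not data from this study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` is a list of n rows (subjects), each holding k ratings
    (raters or test sessions). A complete table is assumed.
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters/sessions (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Shrout & Fleiss (1979) example: 6 subjects rated by 4 judges
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

In a test-retest design like the one above, the columns would be the two testing sessions rather than different raters. The formula is unchanged.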
Blind-Anchor-Nut-Installation Fixture (BANIF)

NASA Technical Reports Server (NTRS)

Willey, Norman F., Jr.; Linker, James F.

1994-01-01

Blind-anchor-nut-installation fixture, BANIF, developed for replacing or installing anchor nuts in blind holes or other inaccessible places. Attachment of anchor nut to BANIF enables placement of anchor nut on blind side of component.
CT-P6 compared with reference trastuzumab for HER2-positive breast cancer: a randomised, double-blind, active-controlled, phase 3 equivalence trial.

PubMed

Stebbing, Justin; Baranau, Yauheni; Baryash, Valeriy; Manikhas, Alexey; Moiseyenko, Vladimir; Dzagnidze, Giorgi; Zhavrid, Edvard; Boliukh, Dmytro; Stroyakovskii, Daniil; Pikiel, Joanna; Eniu, Alexandru; Komov, Dmitry; Morar-Bolba, Gabriela; Li, Rubi K; Rusyn, Andriy; Lee, Sang Joon; Lee, Sung Young; Esteva, Francisco J

2017-07-01

CT-P6 is a proposed biosimilar to reference trastuzumab. In this study, we aimed to establish equivalence of CT-P6 to reference trastuzumab in neoadjuvant treatment of HER2-positive early-stage breast cancer. In this randomised, double-blind, active-controlled, phase 3 equivalence trial, we recruited women aged 18 years or older with stage I-IIIa operable HER2-positive breast cancer from 112 centres in 23 countries. Inclusion criteria were an Eastern Cooperative Oncology Group performance status score of 0 or 1; a normal left ventricular ejection fraction of at least 55%; adequate bone marrow, hepatic, and renal function; at least one measurable lesion; and known oestrogen and progesterone receptor status. Exclusion criteria included bilateral breast cancer, previous breast cancer treatment, previous anthracycline treatment, and pregnancy or lactation. We randomly allocated patients 1:1 to receive neoadjuvant CT-P6 or reference trastuzumab intravenously (eight cycles, each lasting 3 weeks, for 24 weeks; 8 mg/kg on day 1 of cycle 1 and 6 mg/kg on day 1 of cycles 2-8) in conjunction with neoadjuvant docetaxel (75 mg/m² on day 1 of cycles 1-4) and FEC (fluorouracil [500 mg/m²], epirubicin [75 mg/m²], and cyclophosphamide [500 mg/m²]; day 1 of cycles 5-8) therapy. We stratified randomisation by clinical stage, receptor status, and country and used permuted blocks. We did surgery within 3-6 weeks of the final neoadjuvant study drug dose, followed by an adjuvant treatment period of up to 1 year. We monitored long-term safety and efficacy for 3 years after the last patient was enrolled. Participants and investigators were masked to treatment until study completion. The primary efficacy endpoint, analysed in the per-protocol population, was pathological complete response, assessed via specimens obtained during surgery and analysed by masked central review of local histopathology reports. The equivalence margin was -0·15 to 0·15. This trial is registered with
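An equivalence margin such as -0·15 to 0·15 is typically applied by checking that the whole confidence interval for the between-arm difference lies inside it. The abstract does not say which interval method the trial used. The sketch below assumes a simple Wald two-sided 95% CI on the difference in pathological complete response proportions. The function names and the counts are hypothetical, for illustration only. Real trial analyses may instead use other intervals, such as stratified or Newcombe intervals.

```python
import math


def risk_difference_ci(x1, n1, x2, n2, z=1.959964):
    """Wald CI for p1 - p2; the default z gives a two-sided 95% interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


def within_margin(lower, upper, margin=0.15):
    """Equivalence holds only if the whole CI lies strictly inside (-margin, margin)."""
    return -margin < lower and upper < margin


# Hypothetical counts: responders / patients in each arm
diff, lo, hi = risk_difference_ci(50, 100, 50, 100)
print(round(lo, 3), round(hi, 3), within_margin(lo, hi))  # -0.139 0.139 True
```

A point estimate near zero is not enough on its own. With the same 10-point gap but fewer patients, the interval widens and can cross the margin, so equivalence cannot be concluded.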
Sublexical Processing in Visual Recognition of Chinese Characters: Evidence from Repetition Blindness for Subcharacter Components

ERIC Educational Resources Information Center

Yeh, Su-Ling; Li, Jing-Ling

2004-01-01

Repetition blindness (RB) refers to the failure to detect the second occurrence of a repeated item in rapid serial visual presentation (RSVP). In two experiments using RSVP, the ability to report two critical characters was found to be impaired when these two characters were identical (Experiment 1) or similar by sharing one repeated component…
Programs for Deaf-Blind Children and Adults.

ERIC Educational Resources Information Center

American Annals of the Deaf, 2000

2000-01-01

This annual directory lists programs for deaf-blind children and adults including programs for deaf-blind children and youth (national and state level), the Helen Keller Centers for deaf-blind youth and adults, and programs for training teachers of deaf-blind students. (DB)
Evaluation of the "e-rater"® Scoring Engine for the "GRE"® Issue and Argument Prompts. Research Report. ETS RR-12-02

ERIC Educational Resources Information Center

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent

2012-01-01

Automated scoring models for the "e-rater"® scoring engine were built and evaluated for the "GRE"® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…
Blindness

MedlinePlus

... is to “conduct and support research, training, health information dissemination, and other programs with respect to blinding eye ...
Poverty and blindness in Pakistan: results from the Pakistan national blindness and visual impairment survey.

PubMed

Gilbert, Clare E; Shah, S P; Jadoon, M Z; Bourne, R; Dineen, B; Khan, M A; Johnson, G J; Khan, M D

2008-01-05

To explore the association between blindness and deprivation in a nationally representative sample of adults in Pakistan. Cross-sectional, population-based survey. 221 rural and urban clusters selected randomly throughout Pakistan. Nationally representative sample of 16 507 adults aged 30 or above (95.3% response rate). Associations between visual impairment and poverty assessed by a cluster-level deprivation index and a household-level poverty indicator; prevalence and causes of blindness; measures of the rate of uptake and quality of eye care services. 561 blind participants (<3/60 in the better eye) were identified during the survey. Clusters in urban Sindh province were the most affluent, whereas rural areas in Balochistan were the poorest. The prevalence of blindness in adults living in affluent clusters was 2.2%, compared with 3.7% in medium clusters and 3.9% in poor clusters (P<0.001 for affluent v poor). The highest prevalence of blindness was found in rural Balochistan (5.2%). The prevalence of total blindness (bilateral no light perception) was more than three times higher in poor clusters than in affluent clusters (0.24% v 0.07%, P<0.001). The prevalences of blindness caused by cataract, glaucoma, and corneal opacity were lower in affluent clusters and households. Reflecting access to eye care services, cataract surgical coverage was higher in affluent clusters (80.6%) than in medium (76.8%) and poor areas (75.1%). Intraocular lens implantation rates were significantly lower in participants from poorer households. 10.2% of adults living in affluent clusters presented to the examination station wearing spectacles, compared with 6.7% in medium clusters and 4.4% in poor cluster areas. Spectacle coverage in affluent areas was more than double that in poor clusters (23.5% v 11.1%, P<0.001). Blindness is associated with poverty in Pakistan; lower access to eye care services was one contributory factor. To reduce blindness, strategies targeting poor people will
KINECT 3: A Phase 3 Randomized, Double-Blind, Placebo-Controlled Trial of Valbenazine for Tardive Dyskinesia.

PubMed

Hauser, Robert A; Factor, Stewart A; Marder, Stephen R; Knesevich, Mary Ann; Ramirez, Paul M; Jimenez, Roland; Burke, Joshua; Liang, Grace S; O'Brien, Christopher F

2017-05-01

Tardive dyskinesia is a persistent movement disorder induced by dopamine receptor blockers, including antipsychotics. Valbenazine (NBI-98854) is a novel, highly selective vesicular monoamine transporter 2 inhibitor that demonstrated favorable efficacy and tolerability in the treatment of tardive dyskinesia in phase 2 studies. This phase 3 study further evaluated the efficacy, safety, and tolerability of valbenazine as a treatment for tardive dyskinesia. This 6-week, randomized, double-blind, placebo-controlled trial included patients with schizophrenia, schizoaffective disorder, or a mood disorder who had moderate or severe tardive dyskinesia. Participants were randomly assigned in a 1:1:1 ratio to once-daily placebo, valbenazine at 40 mg/day, or valbenazine at 80 mg/day. The primary efficacy endpoint was change from baseline to week 6 in the 80 mg/day group compared with the placebo group on the Abnormal Involuntary Movement Scale (AIMS) dyskinesia score (items 1-7), as assessed by blinded central AIMS video raters. Safety assessments included adverse event monitoring, laboratory tests, ECG, and psychiatric measures. The intent-to-treat population included 225 participants, of whom 205 completed the study. Approximately 65% of participants had schizophrenia or schizoaffective disorder, and 85.5% were receiving concomitant antipsychotics. Least squares mean change from baseline to week 6 in AIMS dyskinesia score was -3.2 for the 80 mg/day group, compared with -0.1 for the placebo group, a significant difference. AIMS dyskinesia score was also reduced in the 40 mg/day group (-1.9 compared with -0.1). The incidence of adverse events was consistent with previous studies. Once-daily valbenazine significantly improved tardive dyskinesia in participants with underlying schizophrenia, schizoaffective disorder, or mood disorder. Valbenazine was generally well tolerated, and psychiatric status remained stable. Longer trials are necessary to understand the long-term effects
The sensory construction of dreams and nightmare frequency in congenitally blind and late blind individuals.

PubMed

Meaidi, Amani; Jennum, Poul; Ptito, Maurice; Kupers, Ron

2014-05-01

We aimed to assess dream content in groups of congenitally blind (CB), late blind (LB), and age- and sex-matched sighted control (SC) participants. We conducted an observational study of 11 CB, 14 LB, and 25 SC participants and collected dream reports over a 4-week period. Every morning participants filled in a questionnaire related to the sensory construction of the dream, its emotional and thematic content, and the possible occurrence of nightmares. We also assessed participants' ability of visual imagery during waking cognition, sleep quality, and depression and anxiety levels. All blind participants had fewer visual dream impressions compared to SC participants. In LB participants, duration of blindness was negatively correlated with duration, clarity, and color content of visual dream impressions. CB participants reported more auditory, tactile, gustatory, and olfactory dream components compared to SC participants. In contrast, LB participants only reported more tactile dream impressions. Blind and SC participants did not differ with respect to emotional and thematic dream content. However, CB participants reported more aggressive interactions and more nightmares compared to the other two groups. Our data show that blindness considerably alters the sensory composition of dreams and that onset and duration of blindness play an important role. The increased occurrence of nightmares in CB participants may be related to a higher number of threatening experiences in daily life in this group. Copyright © 2014 Elsevier B.V. All rights reserved.
High-dose transdermal nicotine in Parkinson's disease patients: a randomized, open-label, blinded-endpoint evaluation phase 2 study.

PubMed

Villafane, G; Thiriez, C; Audureau, E; Straczek, C; Kerschen, P; Cormier-Dequaire, F; Van Der Gucht, A; Gurruchaga, J-M; Quéré-Carne, M; Evangelista, E; Paul, M; Defer, G; Damier, P; Remy, P; Itti, E; Fénelon, G

2018-01-01

Studies of the effects of nicotine on motor symptoms in Parkinson's disease (PD) have yielded discordant results. The aim of the present study was to evaluate the efficacy and safety of high doses of transdermal nicotine on motor symptoms in PD. Forty PD patients were randomly assigned to treated and untreated arms in an open-label study. Treated patients received increasing doses of nicotine to reach 90 mg/day by 11 weeks. This dosage was maintained for 28 weeks (until week 39, W39) and then reduced over 6 weeks. Final evaluation was performed 6 weeks after washout. The main outcome measure was the OFF-DOPA Unified Parkinson's Disease Rating Scale (UPDRS) motor score, measured on video recordings by raters blinded to the medication status of the patients. There was no significant difference in OFF-DOPA UPDRS motor scores between the nicotine-treated and non-treated groups, either at W39 (19.4 ± 9.3 vs. 21.5 ± 14.2) or when considering W39 differences from baseline (-1.5 ± 12.1 vs. +0.9 ± 12.1). The 39-item Parkinson's disease questionnaire scores decreased in nicotine-treated patients and increased in non-treated patients, but the difference was not significant. Overall tolerability was acceptable, and 12/20 treated patients reached the maximal dosage. High doses of transdermal nicotine were tolerated, but our study failed to demonstrate significant improvement in UPDRS motor scores. Improvement in unblinded secondary outcomes (UPDRS-II, UPDRS-IV, doses of l-DOPA equivalents) suggests a possible benefit for patients treated with nicotine, which should be confirmed in larger double-blind, placebo-controlled studies. © 2017 EAN.
42 CFR 435.530 - Definition of blindness.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 42 Public Health 4 ... ISLANDS, AND AMERICAN SAMOA; Categorical Requirements for Eligibility; Blindness. § 435.530 Definition of blindness. (a) Definition. The agency must use the same definition of blindness as used under SSI, except...
5 CFR 2634.403 - Qualified blind trusts.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 5 Administrative Personnel 3 ... § 2634.403 Qualified blind trusts. (a) Definition. A qualified blind trust is a trust in which the filer, his spouse... instrument which establishes a blind trust must adhere substantively to model drafts circulated by the Office...

"VisionTouch Phone" for the Blind.

PubMed

Yong, Robest

2013-10-01

Our objective is to enable blind people to use touchscreen smartphones to make calls and send text messages (SMS) with ease, speed, and accuracy. We believe that, with our proposed platform, which enables blind users to locate the position of the keypads, new games and educational and safety applications will increasingly be developed for the blind. This innovative idea can also be implemented on tablets, allowing blind users to access information websites such as Wikipedia and newspaper portals.
Reversible blindness associated with alcoholic ketoacidosis.

PubMed

Yanagawa, Youichi; Kiyozumi, Teturou; Hatanaka, Kousuke; Itoh, Toshitaka; Sakamoto, Toshihisa; Okada, Yoshiaki

2004-04-01

To report a case of reversible blindness associated with severe alcoholic ketoacidosis. Observational case report. A 44-year-old male presented with gradual bilateral blindness that developed within a 24-hour period. He suffered from ethanol-induced severe ketoacidosis and shock and was resuscitated with epinephrine and sodium bicarbonate. The treatment of acidosis led to a rapid resolution of the patient's blindness. It is important to understand the role of severe acidosis as the sole causative factor of reversible bilateral blindness.
[A systematic social observation tool: methods and results of inter-rater reliability].

PubMed

Freitas, Eulilian Dias de; Camargos, Vitor Passos; Xavier, César Coelho; Caiaffa, Waleska Teixeira; Proietti, Fernando Augusto

2013-10-01

Systematic social observation has been used as a health research methodology for collecting information on the neighborhood physical and social environment. The objectives of this article were to describe the operationalization of direct observation of the physical and social environment in urban areas and to evaluate the instrument's reliability. The systematic social observation instrument was designed to collect information in several domains. A total of 1,306 street segments belonging to 149 different neighborhoods in Belo Horizonte, Minas Gerais, Brazil, were observed. For the reliability study, 149 segments (1 per neighborhood) were re-audited, and Fleiss kappa was used to assess inter-rater agreement. Mean agreement was 0.57 (SD = 0.24); 53% had substantial or almost perfect agreement, and 20.4% had moderate agreement. The instrument appears to be appropriate for observing neighborhood characteristics that are not time-dependent, especially urban services, property characterization, pedestrian environment, and security.
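Fleiss' kappa corrects observed agreement for chance when every subject is rated by the same number of raters. A minimal sketch follows. It assumes one item at a time, recorded as a table of counts per street segment and category. With a re-audit like the one above, there would be two ratings per segment. The abstract names only the statistic. The function names and the toy table are hypothetical. The verbal bands in `landis_koch` are an assumption as well: the labels "moderate", "substantial" and "almost perfect" match the Landis and Koch scale, but the abstract does not cite it.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a subjects x categories table of rating counts.

    counts[i][j] = number of raters who put subject i in category j.
    Every row must sum to the same number of raters m (m >= 2).
    """
    if not counts:
        raise ValueError("empty table")
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("every row must sum to the same number of raters (>= 2)")
    n = len(counts)

    # Observed agreement: proportion of agreeing rater pairs per subject, averaged
    p_bar = sum((sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts) / n

    # Chance agreement from the overall proportion of ratings in each category
    p_j = [sum(row[j] for row in counts) / (n * m) for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal label for kappa using Landis & Koch bands (assumed, see lead-in)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical binary item (present / absent), 4 segments, 2 audits each
item = [[2, 0], [0, 2], [1, 1], [2, 0]]
k = fleiss_kappa(item)
print(round(k, 3), landis_koch(k))  # 0.467 moderate
```

A summary such as the mean agreement reported above would then be the average of the per-item kappas.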
Analysis of the Rater Effects on the Scoring of Diagnostic Trees Prepared by Teacher Candidates with the Many-Facet Rasch Model

ERIC Educational Resources Information Center

Nalbantoglu Yilmaz, Funda

2017-01-01

This study used the many-facet Rasch model to investigate the leniency/severity, bias and halo effects of the raters who scored diagnostic trees prepared by teacher candidates. The study group consisted of 24 teacher candidates taking a measurement and evaluation course, drawn from the students…
Poverty and Blindness in Nigeria: Results from the National Survey of Blindness and Visual Impairment.

PubMed

Tafida, A; Kyari, F; Abdull, M M; Sivasubramaniam, S; Murthy, G V S; Kana, I; Gilbert, Clare E

2015-01-01

Poverty can be a cause and consequence of blindness. Some causes only affect the poorest communities (e.g. trachoma), and poor individuals are less likely to access services. In low income countries, cataract blind adults have been shown to be less economically active, indicating that blindness can exacerbate poverty. This study aims to explore associations between poverty and blindness using national survey data from Nigeria. Participants ≥40 years were examined in 305 clusters (2005-2007). Sociodemographic information, including literacy and occupation, was obtained by interview. Presenting visual acuity (PVA) was assessed using a reduced tumbling E LogMAR chart. Full ocular examination was undertaken by experienced ophthalmologists on all with PVA <6/12 in either eye. Causes of vision loss were determined using World Health Organization guidelines. Households were categorized into three levels of poverty based on literacy and occupation at household level. A total of 569/13,591 participants were blind (PVA <3/60, better eye; prevalence 4.2%, 95% confidence interval [CI] 3.8-4.6%). Prevalences of blindness were 8.5% (95% CI 7.7-9.5%), 2.5% (95% CI 2.0-3.1%), and 1.5% (95% CI 1.2-2.0%) in poorest, medium and affluent households, respectively (p = 0.001). Cause-specific prevalences of blindness from cataract, glaucoma, uncorrected aphakia and corneal opacities were significantly higher in poorer households. Cataract surgical coverage was low (37.2%), being lowest in females in poor households (25.3%). Spectacle coverage was 3 times lower in poor than affluent households (2.4% vs. 7.5%). In Nigeria, blindness is associated with poverty, in part reflecting lower access to services. Reducing avoidable causes will not be achieved unless access to services improves, particularly for the poor and women.
Blind Adolescents' Perceptions of Parental Attitudes.

ERIC Educational Resources Information Center

Agarwal, Rita; Piplani, Rashmi

1990-01-01

This study examined the perception of parental attitudes of 50 blind adolescents in northern India. Results indicated that blind girls perceived their parents as being more accepting and less rejecting than did blind boys, a result explained by culturally determined differences in social sex roles. (DB)
Inter-rater reliability of the German version of the Nurses' Global Assessment of Suicide Risk scale.

PubMed

Kozel, Bernd; Grieser, Manuela; Abderhalden, Christoph; Cutcliffe, John R

2016-10-01

In comparison to the general population, the suicide rates of psychiatric inpatient populations in Germany and Switzerland are very high. An important preventive contribution to lowering suicide rates in mental health care is to ensure that the suicide risk of psychiatric inpatients is assessed as accurately as possible. While risk-assessment instruments can serve an important function in determining such risk, very few have been translated into German. Therefore, in the present study, we report on the German version of the Nurses' Global Assessment of Suicide Risk (NGASR) scale. After translating the original instrument into German and pretesting the German version, we tested the inter-rater reliability of the instrument. Twelve video case studies were evaluated by 13 raters with the NGASR scale in a 'laboratory' trial. In each case, observer agreement was calculated for the single items, the overall scale, the risk levels, and the sum scores. The statistical analysis used kappa and AC1 statistics for the dichotomous scales (items, scale). High to very high observer agreement (AC1: 0.62-1.00, kappa: 0.00-1.00) was determined for 16 items of the German version of the NGASR scale. We conclude that the German version of the NGASR scale is a reliable instrument for evaluating risk factors for suicide. Reliable application in clinical practice appears to be enhanced by training in the use of the instrument and appropriate implementation instructions. © 2016 Australian College of Mental Health Nurses Inc.
Causes of Severe Visual Impairment and Blindness: Comparative Data From Bhutanese and Laotian Schools for the Blind.

PubMed

Farmer, Lachlan David Mailey; Ng, Soo Khai; Rudkin, Adam; Craig, Jamie; Wangmo, Dechen; Tsang, Hughie; Southisombath, Khamphoua; Griffiths, Andrew; Muecke, James

2015-01-01

To determine and compare the major causes of childhood blindness and severe visual impairment in Bhutan and Laos. Independent cross-sectional surveys. This survey consists of 2 cross-sectional observational studies. The Bhutanese component was undertaken at the National Institute for Vision Impairment, the only dedicated school for the blind in Bhutan. The Laotian study was conducted at the National Ophthalmology Centre and Vientiane School for the Blind. Children younger than age 16 were invited to participate. A detailed history and examination were performed consistent with the World Health Organization Prevention of Blindness Eye Examination Record. Of the 53 children examined in both studies, 30 were from Bhutan and 23 were from Laos. Forty percent of Bhutanese and 87.1% of Laotian children assessed were blind, with 26.7% and 4.3%, respectively, being severely visually impaired. Congenital causes of blindness were the most common, representing 45% and 43.5% of the Bhutanese and Laotian children, respectively. Anatomically, the primary site of blinding pathology differed between the cohorts. In Bhutan, the lens comprised 25%, with whole globe at 20% and retina at 15%, but in Laos, whole globe and cornea equally contributed at 30.4%, followed by retina at 17.4%. There was an observable difference in the rates of blindness/severe visual impairment due to measles, with no cases observed in the Bhutanese children but 20.7% of the total pathologies in the Laotian children attributable to congenital measles infection. Consistent with other studies, there is a high rate of blinding disease, which may be prevented, treated, or ameliorated.
20 CFR 416.983 - How we evaluate statutory blindness.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 20 Employees' Benefits 2 2010-04-01 2010-04-01 false How we evaluate statutory blindness. 416.983... AGED, BLIND, AND DISABLED Determining Disability and Blindness Blindness § 416.983 How we evaluate statutory blindness. We will find that you are blind if you are statutorily blind within the meaning of...
20 CFR 416.982 - Blindness under a State plan.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 20 Employees' Benefits 2 2010-04-01 2010-04-01 false Blindness under a State plan. 416.982 Section..., BLIND, AND DISABLED Determining Disability and Blindness Blindness § 416.982 Blindness under a State... plan because of your blindness for the month of December 1973; and (c) You continue to be blind as...
Using the STROBE statement to assess reporting in blindness prevalence surveys in low and middle income countries.

PubMed

Ramke, Jacqueline; Palagyi, Anna; Jordan, Vanessa; Petkovic, Jennifer; Gilbert, Clare E

2017-01-01

Cross-sectional blindness prevalence surveys are essential for planning and monitoring eye care services. Incomplete or inaccurate reporting can prevent effective translation of research findings. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement is a 22-item checklist developed to improve the reporting of observational studies. The aim of this study was to use STROBE to assess the completeness of reporting in blindness prevalence surveys in low- and middle-income countries (LMICs). MEDLINE, EMBASE and Web of Science databases were searched on April 8, 2016 to identify cross-sectional blindness prevalence surveys undertaken in LMICs and published after STROBE was published in December 2007. The STROBE tool was applied to all included studies, and each STROBE item was categorized as 'yes' (met criteria), 'no' (did not meet criteria) or 'not applicable'. The 'Completeness of reporting (COR) score' for each manuscript was calculated as COR score = yes / [yes + no]. For journals that published included studies, the instructions to authors and reviewers were checked for references to STROBE. The 89 included studies were undertaken in 32 countries and published in 37 journals. The mean COR score was 60.9% (95% confidence interval [CI] 58.1-63.7%; range 30.8-88.9%). The mean COR score did not differ between surveys published in journals whose author instructions referred to STROBE (10/37 journals; 61.1%, 95% CI 56.4-65.8%) and those published in journals where STROBE was not mentioned (60.9%, 95% CI 57.4-64.3%; p = 0.93). While reporting in blindness prevalence surveys is strong in some areas, others need improvement. We recommend that more journals adopt the STROBE checklist and ensure it is used by authors and reviewers.
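The COR score is a simple ratio in which 'not applicable' items drop out of both the numerator and the denominator. The following minimal Python sketch assumes each manuscript's STROBE assessment is stored as a list of 'yes'/'no'/'not applicable' strings. That representation, the helper names, and the example ratings are all hypothetical illustrations, not the authors' tooling or data.

```python
from statistics import mean
from typing import Sequence


def cor_score(ratings: Sequence[str]) -> float:
    """Completeness of reporting: yes / (yes + no); 'not applicable' items are ignored."""
    yes = sum(1 for r in ratings if r == "yes")
    no = sum(1 for r in ratings if r == "no")
    if yes + no == 0:
        raise ValueError("manuscript has no applicable STROBE items")
    return yes / (yes + no)


def mean_cor(manuscripts: Sequence[Sequence[str]]) -> float:
    """Unweighted mean COR score across manuscripts."""
    return mean([cor_score(m) for m in manuscripts])


# Invented ratings for two manuscripts (a real assessment covers every STROBE item).
manuscripts = [
    ["yes", "yes", "no", "not applicable"],  # 2 / (2 + 1)
    ["yes", "no", "no", "yes"],              # 2 / (2 + 2)
]
for i, m in enumerate(manuscripts, start=1):
    print(f"manuscript {i}: COR = {cor_score(m):.1%}")
print(f"mean COR = {mean_cor(manuscripts):.1%}")
```

Excluding 'not applicable' items keeps a manuscript from being penalized for STROBE items that cannot apply to its design.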
42 CFR 436.530 - Definition of blindness.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 42 Public Health 4 2010-10-01 2010-10-01 false Definition of blindness. 436.530 Section 436.530... Requirements for Medicaid Eligibility Blindness § 436.530 Definition of blindness. (a) Definition. The agency must use the definition of blindness that is used in the State plan for AB or AABD. (b) State plan...
Programs for Deaf-Blind Children and Adults.

ERIC Educational Resources Information Center

American Annals of the Deaf, 1995

1995-01-01

This report of the annual survey of programs for deaf-blind children and adults lists, by state, programs for deaf-blind children and youth, Helen Keller Centers for deaf-blind youth and adults, and programs for training teachers of deaf-blind students. Provided are program names, addresses, telephone numbers, and names of directors. (DB)
The Relationship between Lexical Frequency Profiling Measures and Rater Judgements of Spoken and Written General English Language Proficiency on the CELPIP-General Test

ERIC Educational Resources Information Center

Douglas, Scott Roy

2015-01-01

Independent confirmation that vocabulary in use unfolds across levels of performance as expected can contribute to a more complete understanding of validity in standardized English language tests. This study examined the relationship between Lexical Frequency Profiling (LFP) measures and rater judgements of test-takers' overall levels of…
Blindness and the age of enlightenment: Diderot's letter on the blind.

PubMed

Margo, Curtis E; Harman, Lynn E; Smith, Don B

2013-01-01

Several months after anonymously publishing an essay in 1749 titled "Letter on the Blind for the Use of Those Who Can See," the chief editor of the French Encyclopédie was arrested and taken to the prison fortress of Vincennes, just east of Paris, France. The suspected author, correctly identified as Denis Diderot, was 35 years old and had not yet left his imprint on the Age of Enlightenment. His letter, which recounted the life of Nicolas Saunderson, a blind mathematician, was intended to advance secular empiricism and disparage the religiously tinged rationalism put forward by René Descartes. The letter's discussion of sensory perception in men born blind dismissed the supposed primacy of visual imagery in abstract thinking. The essay did little to resolve any philosophical controversy, but it marked a turning point in Western attitudes toward visual disability.
Early social-emotional development in blind infants.

PubMed

Tröster, H; Brambring, M

1992-01-01

To study the impact of blindness on social and emotional development during the first year of life, the level of social-emotional development was compared in blind and sighted 9- and 12-month-old infants. The blind infants (five aged 9 months and 17 aged 12 months) had been completely blind from birth and had no other serious disabilities. Social-emotional development was assessed with a scale from the Bielefeld Developmental Test for Blind Infants and Preschoolers containing three subscales: emotions, social interaction, and impulse control. Compared with non-disabled infants, blind infants exhibited a more limited repertoire of facial expressions and were less responsive. They attempted less often than sighted infants to initiate contact with their mothers (self-initiated interactions) and complied less often with simple requests and prohibitions. These differences in the social-emotional development of blind and sighted infants are traced back to the effects of blindness on mother-child interaction. The lack of visual perception appears to impede, in particular, the acquisition of a dialogue concept.
Inter-rater reliability and review of the VA unresolved narratives.

PubMed Central

Eagon, J. C.; Hurdle, J. F.; Lincoln, M. J.

1996-01-01

To better understand how VA clinicians use medical vocabulary in everyday practice, we set out to characterize terms entered in the Problem List module of the VA's DHCP system that were not mapped to terms in the DHCP controlled-vocabulary lexicon. When an entered term fails to match a lexicon term, a note is sent to a central repository. When our study began, that repository contained 16,783 terms. We wished to characterize the potential reasons why these terms failed to match terms in the lexicon. After examining two small samples of randomly selected terms, we used group consensus to develop a set of rating criteria and a rating form. To be sure that the results of multiple reviewers could be confidently compared, we analyzed the inter-rater agreement of our rating process. Two raters used this form to rate the same 400 terms. We found that modifiers and numeric data were common and consistently selected reasons for failure to match, while others, such as use of synonyms and absence of the concept from the lexicon, were common but less consistently selected. PMID:8947642
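The abstract does not name the agreement statistic used for the two raters. One common choice when two raters label the same items is Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance from each rater's own label frequencies. The following sketch assumes that choice. The per-term "modifier selected" labels are hypothetical and only illustrate the calculation for a single rating criterion.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:  # both raters used one identical label throughout: kappa undefined
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for 10 terms: 1 = rater selected "modifier" as the failure reason.
rater_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
rater_2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # kappa = 0.60
```

Raw agreement in this example is 80%. However, each rater selects the label for half the terms, so 50% agreement is expected by chance, which leaves kappa at 0.60. Applied per criterion, this kind of calculation is one way to separate consistently selected reasons from inconsistently selected ones.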
Multiply-Impaired Blind Children: A National Problem.

ERIC Educational Resources Information Center

Graham, Milton D.

In 1966, a national survey reported on 8,887 multiply impaired (MI) blind children. About 56% were boys; 85% had been blind since before age 3, and half were totally blind. The principal causes of blindness were retrolental fibroplasia and congenital cataracts. Almost 63% had two or more additional disabilities (86.8% of those under age 6), such…
Learning Receptive Fields and Quality Lookups for Blind Quality Assessment of Stereoscopic Images.

PubMed

Shao, Feng; Lin, Weisi; Wang, Shanshan; Jiang, Gangyi; Yu, Mei; Dai, Qionghai

2016-03-01

Blind quality assessment of 3D images poses more challenges than its 2D counterpart. In this paper, we propose a blind quality assessment method for stereoscopic images that learns the characteristics of receptive fields (RFs) from the perspective of dictionary learning and constructs quality lookups to replace human opinion scores without performance loss. An important feature of the proposed method is that it does not need a large set of distorted stereoscopic images and corresponding human opinion scores to learn a regression model. More specifically, in the training phase, we learn local RFs (LRFs) and global RFs (GRFs) from the reference and distorted stereoscopic images, respectively, and construct the corresponding local quality lookups (LQLs) and global quality lookups (GQLs). In the testing phase, blind quality pooling can be easily achieved by searching for the optimal GRF and LRF indexes in the learnt LQLs and GQLs, and the quality score is obtained by combining the LRF and GRF indexes. Experimental results on three publicly available 3D image quality assessment databases demonstrate that, compared with existing methods, the proposed algorithm achieves highly consistent alignment with subjective assessment.
