Science.gov

Sample records for rater blinded reference

  1. Frame of Reference Rater Training Issues: Recall, Time and Behavior Observation Training.

    ERIC Educational Resources Information Center

    Roch, Sylvia G.; O'Sullivan, Brian J.

    2003-01-01

    Graduate students were trained as raters using either frame of reference training (FOR, n=22), behavior observation training (BOT, n=21), or performance appraisal training (controls, n=21). They rated videotaped lecturers twice. FOR increased the number of behaviors recalled; FOR and BOT improved recall quality. FOR improved rating accuracy even after 2 weeks.…

  3. An Open-Label, Rater-Blinded, Augmentation Study of Aripiprazole in Treatment-Resistant Depression

    PubMed Central

    Patkar, Ashwin A.; Peindl, Kathleen; Mago, Rajnish; Mannelli, Paolo; Masand, Prakash S.

    2006-01-01

    Background: About 30% to 46% of patients with major depressive disorder (MDD) fail to fully respond to initial antidepressants. While treatment-resistant depression commonly refers to nonresponse or partial response to at least 2 adequate trials with antidepressants from different classes, due to variability in terminology, a staging system based on prior treatment response has been suggested. Aripiprazole is a novel atypical antipsychotic with partial agonism at dopamine D2 and serotonin 5-HT1A receptors and antagonism at the 5-HT2 receptors. The present study evaluated whether augmentation with aripiprazole would be beneficial and tolerable in patients with treatment-resistant MDD who had failed 1 or more trials of antidepressants. Method: In an open-label, rater-blinded study conducted from March 2003 through December 2003, 10 patients with DSM-IV MDD without psychotic features who had failed to respond to an adequate trial of at least 1 antidepressant were prescribed aripiprazole (10–30 mg/day) for 6 weeks. The dose of preexisting antidepressants remained unchanged. Treatment response was defined as a 50% or greater reduction in score on the Hamilton Rating Scale for Depression (HAM-D) from baseline to end of treatment. Secondary efficacy measures included scores on the Clinical Global Impressions-Improvement (CGI-I) and -Severity (CGI-S) scales. Results: Eight of 10 patients had failed 2 or more antidepressant trials. The mean daily dose of aripiprazole was 13.21 mg. Intent-to-treat analysis showed that mean ± SD HAM-D scores reduced significantly from baseline (23.0 ± 8.1) to end of treatment (8.1 ± 6.0) (p < .001). There was a significant reduction in CGI-I (p < .05) and a trend toward decrease in CGI-S (p = .06) score. Seventy percent of the subjects were responders and 30% achieved remission. Common adverse effects were akathisia (20%), nausea (20%), and restlessness (20%). Conclusions: The study indicates the potential utility of aripiprazole as an

  4. A Randomized, Rater-Blinded, Parallel Trial of Intensive Speech Therapy in Sub-Acute Post-Stroke Aphasia: The SP-I-R-IT Study

    ERIC Educational Resources Information Center

    Martins, Isabel Pavao; Leal, Gabriela; Fonseca, Isabel; Farrajota, Luisa; Aguiar, Marta; Fonseca, Jose; Lauterbach, Martin; Goncalves, Luis; Cary, M. Carmo; Ferreira, Joaquim J.; Ferro, Jose M.

    2013-01-01

    Background: There is conflicting evidence regarding the benefits of intensive speech and language therapy (SLT), particularly because intensity is often confounded with total SLT provided. Aims: A two-centre, randomized, rater-blinded, parallel study was conducted to compare the efficacy of 100 h of SLT in a regular (RT) versus intensive (IT)…

  6. Measurement-Based Care Versus Standard Care for Major Depression: A Randomized Controlled Trial With Blind Raters.

    PubMed

    Guo, Tong; Xiang, Yu-Tao; Xiao, Le; Hu, Chang-Qing; Chiu, Helen F K; Ungvari, Gabor S; Correll, Christoph U; Lai, Kelly Y C; Feng, Lei; Geng, Ying; Feng, Yuan; Wang, Gang

    2015-10-01

    The authors compared measurement-based care with standard treatment in major depression. Outpatients with moderate to severe major depression were consecutively randomized to 24 weeks of either measurement-based care (guideline- and rating scale-based decisions; N=61), or standard treatment (clinicians' choice decisions; N=59). Pharmacotherapy was restricted to paroxetine (20-60 mg/day) or mirtazapine (15-45 mg/day) in both groups. Depressive symptoms were measured with the Hamilton Depression Rating Scale (HAM-D) and the Quick Inventory of Depressive Symptomatology-Self-Report (QIDS-SR). Time to response (a decrease of at least 50% in HAM-D score) and remission (a HAM-D score of 7 or less) were the primary endpoints. Outcomes were evaluated by raters blind to study protocol and treatment. Significantly more patients in the measurement-based care group than in the standard treatment group achieved response (86.9% compared with 62.7%) and remission (73.8% compared with 28.8%). Similarly, time to response and remission were significantly shorter with measurement-based care (for response, 5.6 weeks compared with 11.6 weeks, and for remission, 10.2 weeks compared with 19.2 weeks). HAM-D scores decreased significantly in both groups, but the reduction was significantly larger for the measurement-based care group (-17.8 compared with -13.6). The measurement-based care group had significantly more treatment adjustments (44 compared with 23) and higher antidepressant dosages from week 2 to week 24. Rates of study discontinuation, adverse effects, and concomitant medications did not differ between groups. The results demonstrate the feasibility and effectiveness of measurement-based care for outpatients with moderate to severe major depression, suggesting that this approach can be incorporated in the clinical care of patients with major depression.

  7. Switching from clozapine to zotepine in patients with schizophrenia: a 12-week prospective, randomized, rater blind, and parallel study.

    PubMed

    Lin, Chao-Cheng; Chiu, Hsien-Jane; Chen, Jen-Yeu; Liou, Ying-Jay; Wang, Ying-Chieh; Chen, Tzu-Ting; Bai, Ya-Mei

    2013-04-01

    Clozapine is the most effective antipsychotic for patients with treatment-refractory schizophrenia, but many adverse effects are noted. Clinicians usually hesitate to switch from clozapine to other antipsychotics because of the risk of a re-emergence or worsening of the psychosis, although empirical studies are very limited. Zotepine, an atypical antipsychotic with a pharmacologic profile similar to clozapine, was found to be an effective treatment for patients with treatment-resistant schizophrenia in Japan. This 12-week study is the first prospective, randomized, and rater-blind study to investigate the efficacy and tolerability of switching from clozapine to zotepine. Fifty-nine patients with schizophrenia, who had taken clozapine for at least 6 months with a Clinical Global Impression-Severity score of at least 3, were randomly allocated to the zotepine and the clozapine groups. At the end of the study, 52 patients (88%) had completed the trial. The 7 withdrawal cases were all in the zotepine group. The final mean (SD) dose of zotepine and clozapine was 397.1 (75.7) versus 377.1 (62.5) mg/d, respectively. Patients in the zotepine group showed a significant increase in the Brief Psychiatric Rating Scale [mean (SD), 4.7 (8.7) vs -1.3 (6.3); P = 0.005], more general adverse effects as revealed by the Udvalg for Kliniske Undersogelser Rating Scale [mean (SD), 1.74 (3.9) vs -0.2 (2.8); P = 0.039], more extrapyramidal adverse effects as demonstrated by the Simpson and Angus Scale [mean (SD), 1.29 (3.5) vs 0.17 (2.1); P = 0.022], an increased use of propranolol (37.1% vs 0%, P < 0.0001) and anticholinergics (25.7% vs 0%, P = 0.008), and an increased level of prolactin (29.6 vs -3.8 ng/mL, P < 0.0005), compared with the clozapine group. The results suggested that switching from clozapine to zotepine treatment should be done with caution.

  8. Generalized periodic discharges and 'triphasic waves': A blinded evaluation of inter-rater agreement and clinical significance.

    PubMed

    Foreman, Brandon; Mahulikar, Advait; Tadi, Prasanna; Claassen, Jan; Szaflarski, Jerzy; Halford, Jonathan J; Dean, Brian C; Kaplan, Peter W; Hirsch, Lawrence J; LaRoche, Suzette

    2016-02-01

    Generalized periodic discharges (GPDs) are associated with nonconvulsive seizures. Triphasic waves (TWs), a subtype of GPDs, have been described in relation to metabolic encephalopathy and not felt to be associated with seizures. We sought to establish the consistency of use of this descriptive term and its association with seizures. 11 experts in continuous EEG monitoring scored 20 cEEG samples containing GPDs using Standardized Critical Care EEG Terminology. In the absence of patient information, the inter-rater agreement (IRA) for EEG descriptors including TWs was assessed along with raters' clinical EEG interpretation and compared with actual patient information. The IRA for 'generalized' and 'periodic' was near-perfect (kappa=0.81), but fair for 'triphasic' (kappa=0.33). Patients with TWs were as likely to develop seizures as those without (25% vs 26%, N.S.) and surprisingly, patients with TWs were less likely to have toxic-metabolic encephalopathy than those without TWs (55% vs 79%, p<0.01). While IRA for the terms "generalized" and "periodic" is high, it is only fair for TWs. EEG interpreted as TWs presents similar risk for seizures as GPDs without triphasic appearance. GPDs are commonly associated with metabolic encephalopathy, but 'triphasic' appearance is not predictive. Conventional association of 'triphasic waves' with specific clinical conditions may lead to inaccurate EEG interpretation. Copyright © 2015 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  9. Genetics Home Reference: autosomal dominant congenital stationary night blindness

    MedlinePlus

    ... Autosomal dominant congenital stationary night blindness is a disorder of the retina, which is ...

  10. Genetics Home Reference: autosomal recessive congenital stationary night blindness

    MedlinePlus

    ... Autosomal recessive congenital stationary night blindness is a disorder of the retina, which is ...

  11. Genetics Home Reference: X-linked congenital stationary night blindness

    MedlinePlus

    ... X-linked congenital stationary night blindness is a disorder of the retina, which is ...

  12. A randomized, rater-blinded, parallel trial of intensive speech therapy in sub-acute post-stroke aphasia: the SP-I-R-IT study.

    PubMed

    Martins, Isabel Pavão; Leal, Gabriela; Fonseca, Isabel; Farrajota, Luísa; Aguiar, Marta; Fonseca, José; Lauterbach, Martin; Gonçalves, Luís; Cary, M Carmo; Ferreira, Joaquim J; Ferro, Jose M

    2013-01-01

    There is conflicting evidence regarding the benefits of intensive speech and language therapy (SLT), particularly because intensity is often confounded with total SLT provided. A two-centre, randomized, rater-blinded, parallel study was conducted to compare the efficacy of 100 h of SLT in a regular (RT) versus intensive (IT) treatment in sub-acute post-stroke aphasia. Consecutive patients with aphasia, within 3 months of a left hemisphere ischemic stroke, were randomized to IT (2 h per day × 5 days per week, 10 weeks) or RT (2 h per week × 50 weeks). Evaluations took place at 10, 50 and 62 weeks. Primary outcome was the frequency of responders, defined by 15% increase of Aphasia Quotient (AQ) from the baseline to 50 weeks. Secondary outcomes were changes from the baseline in AQ and functional communication profile (FCP) at 50 and 62 weeks and improvement stability between 50 and 62 weeks. Thirty patients were randomized and 18 completed the study. No significant differences were found between groups in primary or secondary outcomes, although IT patients (N = 9) obtained higher scores in language measures between 10 and 62 weeks in per protocol analysis. The number of non-completions was identical between groups. This study suggests that, in the sub-acute period following stroke and controlling for the number of hours of SLT provided, there is a trend for a greater improvement in language and functional communication measures with IT compared with RT. The lack of statistical significance in results was probably due to the small sample size. © 2013 Royal College of Speech and Language Therapists.

  13. Lexical References to Sensory Modalities in Verbal Descriptions of People and Objects by Congenitally Blind, Late Blind and Sighted Adults

    PubMed Central

    Chauvey, Valérie; Hatwell, Yvette; Verine, Bertrand; Kaminski, Gwenael; Gentaz, Edouard

    2012-01-01

    Background: Some previous studies have revealed that while congenitally blind people have a tendency to refer to visual attributes (‘verbalism’), references to auditory and tactile attributes are scarcer. However, this statement may be challenged by current theories claiming that cognition is linked to the perceptions and actions from which it derives. Verbal productions by the blind could therefore differ from those of the sighted because of their specific perceptual experience. The relative weight of each sense in oral descriptions was compared in three groups with different visual experience: congenitally blind (CB), late blind (LB) and blindfolded sighted (BS) adults. Methodology/Principal Findings: Participants were asked to give an oral description of their mother and their father, and of four familiar manually-explored objects. The number of visual references obtained when describing people was relatively high, and was the same in the CB and BS groups (“verbalism” in the CB). While references to touch were scarce in all groups, the CB referred to audition more frequently than the LB and the BS groups. There were, by contrast, no differences between groups in descriptions of objects, and references to touch dominated the other modalities. Conclusion/Significance: The relative weight of each modality varies according to the cognitive processes involved in each task. Long term memory, internal representations and information acquired through social communication, which are at work in the People task, seem to favour visual references in both the blind and the sighted, whereas the congenitally blind also refer often to audition. By contrast, the perceptual encoding and working memory at work in the Objects task enhance sensory references to touch in a similar way in all groups. These results attenuate the impact of verbalism in blindness, and support (albeit moderately) the idea that the perceptual experience of the congenitally blind is to some extent reflected in

  14. Comparing the Effectiveness of Self-Paced and Collaborative Frame-of-Reference Training on Rater Accuracy in a Large-Scale Writing Assessment

    ERIC Educational Resources Information Center

    Raczynski, Kevin R.; Cohen, Allan S.; Engelhard, George, Jr.; Lu, Zhenqiu

    2015-01-01

    There is a large body of research on the effectiveness of rater training methods in the industrial and organizational psychology literature. Less has been reported in the measurement literature on large-scale writing assessments. This study compared the effectiveness of two widely used rater training methods--self-paced and collaborative…

  16. Blindness

    MedlinePlus

    ... help, are sometimes called "legally blind." What Causes Blindness? Vision problems can develop before a baby is ...

  18. Comparison between blinded and partially blinded detection of gastric cancer with multidetector CT using surgery and endoscopic submucosal dissection as reference standards.

    PubMed

    Kim, H J; Lee, D H; Ko, Y T

    2010-08-01

    The aim of this study is to compare blinded with partially blinded detection of gastric cancer with multidetector (MD) CT by using surgery and endoscopic submucosal dissection (ESD) as reference standards. 44 patients with gastric cancer underwent MDCT with air as an oral contrast agent. Surgery was performed on 37 patients, ESD on six and surgery after ESD on one. To provide comparison cases of blinded evaluation, 38 MDCT examinations were added for cases where no focal gastric lesion was seen on endoscopy. Two radiologists, blinded to the presence, number and location of the tumours, evaluated axial and axial plus multiplanar reformation (MPR) images of 82 MDCT examinations with or without gastric cancer. For partially blinded evaluation, the same radiologists, blinded to the location and number of tumours, evaluated axial and axial plus MPR images of 44 MDCT examinations of gastric cancer. Differences in assessment were resolved by consensus. 45 gastric cancers were found in surgical and ESD specimens. Detection rates of gastric cancer from axial and axial plus MPR images during blinded evaluation and from axial and axial plus MPR images during partially blinded evaluation were 62% (28/45), 64% (29/45), 64% (29/45) and 71% (32/45), respectively. There was no statistical significance for the comparison between blinded and partially blinded detection rates of gastric cancer. The detection rate of gastric cancer with MDCT during blinded evaluation showed no specific difference compared with the detection rate of gastric cancer with MDCT during partially blinded evaluation.

  19. Blind and reference channel-based time interleaved ADC calibration schemes: a comparison

    NASA Astrophysics Data System (ADS)

    Cimmino, Rosario F.; Centurelli, Francesco; Monsurrò, Pietro; Romano, Francesco; Trifiletti, Alessandro

    2016-07-01

    Many digital background calibration techniques exist which correct for offset, gain, timing and bandwidth mismatches in time-interleaved (TI) ADCs. Some require an additional reference channel, whereas others are blind and rely on the presence of a band where no signal is present (usually around the Nyquist frequency) or exploit other properties of the input signal. Blind calibration techniques, which don't use a reference channel, are suitable for correction of commercial TI-ADCs, or TI-ADC systems using commercial ADCs as channels. Techniques employing additional channels require a more complex layout (especially for the clock tree) and need an additional ADC, whose overhead cost is significant, especially for 2- or 4- channel TI-ADCs. However, we show that the estimation process is faster and more accurate when a reference channel is present, and many different error models can be used (exploiting different points in the accuracy / complexity trade-off).

  20. Longitudinal Rater Modeling with Splines

    ERIC Educational Resources Information Center

    Dobria, Lidia

    2011-01-01

    Performance assessments rely on the expert judgment of raters for the measurement of the quality of responses, and raters unavoidably introduce error in the scoring process. Defined as the tendency of a rater to assign higher or lower ratings, on average, than those assigned by other raters, even after accounting for differences in examinee…

  2. A randomized, rater-blinded, crossover study comparing the clinical efficacy of Ritalin® LA (methylphenidate) treatment in children with attention-deficit hyperactivity disorder under different breakfast conditions over 2 weeks.

    PubMed

    Schulz, Eberhard; Fleischhaker, Christian; Hennighausen, Klaus; Heiser, Philip; Haessler, Frank; Linder, Martin; Stollhoff, Kirsten; Warnke, Andreas; Baier, Monika; Klatt, Jan

    2010-11-01

    Several extended-release methylphenidate medications are available for treatment of children with ADHD. Pharmacokinetic investigations suggest that the serum levels of methylphenidate are partially altered when the medication is taken without breakfast. Clinical data comparing different breakfast situations are missing. In this study, different breakfast compositions and their influence on treatment with Ritalin LA are investigated. A total of 150 patients were enrolled in a rater-blinded, randomized crossover trial that compared a minimal breakfast with a standard breakfast in patients under stable treatment with Ritalin LA. Ratings for clinical efficacy were carried out after 1 week by teachers and parents (FBB-ADHS), as well as physicians (CGI). Additionally, a math test was administered to the patients. Of the total patients, 144 finished the trial with a breakfast compliance of 93%. All of the clinical rating scales consistently showed no difference between the two breakfast conditions. Non-inferiority of minimal breakfast versus standard breakfast was shown to be statistically significant (FBB-ADHS (Teacher): 0.97 with minimal breakfast, 1.01 with standard breakfast, P < 0.0001). The clinical efficacy of Ritalin LA is not influenced by breakfast and works independently of food intake.

  3. The efficacy of cerebellar vermal deep high frequency (theta range) repetitive transcranial magnetic stimulation (rTMS) in schizophrenia: A randomized rater blind-sham controlled study.

    PubMed

    Garg, Shobit; Sinha, Vinod Kumar; Tikka, Sai Krishna; Mishra, Preeti; Goyal, Nishant

    2016-09-30

    Repetitive transcranial magnetic stimulation (rTMS) is a promising therapeutic for schizophrenia. Treatment effects of rTMS have been variable across different symptom clusters, with negative symptoms showing better response, followed by auditory hallucinations. Cerebellum, especially vermis and its abnormalities (both structural and functional) have been implicated in cognitive, affective and positive symptoms of schizophrenia. rTMS to this alternate site has been suggested as a novel target for treating patients with this disorder. Hypothesizing cerebellar vermal magnetic stimulation as an adjunct to treat schizophrenia psychopathology, we conducted a double blind randomized sham controlled rTMS study. In this study, forty patients were randomly allocated (using block randomization method) to active high frequency (theta patterned) rTMS (n=20) and sham (n=20) groups. They received 10 sessions over 2 weeks. The Positive and Negative Syndrome Scale (PANSS) and Calgary Depression Scale for Schizophrenia (CDSS) scores were assessed at baseline, after last session and at 4 weeks (2 weeks post-rTMS). We found a significantly greater improvement in the group receiving active rTMS sessions, compared to the sham group on negative symptoms, and depressive symptoms. We conclude that cerebellar stimulation can be used as an effective adjunct to treat negative and affective symptoms.

  4. An open-label, rater-blinded, 8-week trial of bupropion hydrochloride extended-release in patients with major depressive disorder with atypical features.

    PubMed

    Seo, H-J; Lee, B C; Seok, J-H; Jeon, H J; Paik, J-W; Kim, W; Kwak, K-P; Han, C; Lee, K-U; Pae, C-U

    2013-09-01

    The present study aimed at investigating the effectiveness and tolerability of bupropion hydrochloride extended release (XL) in major depressive disorder (MDD) patients with atypical features (AF). 51 patients were prescribed bupropion XL for 8 weeks (6 visits: screening, baseline, weeks 1, 2, 4 and 8). The primary efficacy measure was a change of the Structured Interview Guide for the Hamilton Depression Rating Scale-Seasonal Affective Disorder Version (SIGH-SAD) from baseline to endpoint. Secondary efficacy measures included the SIGH-SAD atypical symptoms subscale, Clinical Global Impression-Severity (CGI-S), Sheehan Disability Scale (SDS) and Epworth Sleepiness Questionnaire (ESQ). Response or remission was defined as ≥50% reduction or ≤7 in SIGH-SAD total scores, respectively, at end of treatment. The HAM-D-29 total score reduced by 55.3% from baseline (27.3±6.5) to end of treatment (12.2±6.3) (p<0.001). Atypical symptom subscale scores also reduced by 54.5% from baseline (9.2±3.0) to end of treatment (4.2±2.8) (p<0.001). At the end of treatment, 24.4% (n=10) and 51.2% (n=21) subjects were classified as remitters and responders, respectively. The most frequently reported AEs were headache (13.7%), dry mouth (11.8%), dizziness (9.8%), and dyspepsia (9.8%). Our preliminary study indicates that bupropion XL may be beneficial in the treatment of MDD with atypical features. Adequately powered, randomized, double-blind, placebo-controlled trials are necessary to determine our results.

  5. Adrenocorticotropic hormone versus methylprednisolone added to interferon β in patients with multiple sclerosis experiencing breakthrough disease: a randomized, rater-blinded trial

    PubMed Central

    Berkovich, Regina; Bakshi, Rohit; Amezcua, Lilyana; Axtell, Robert C.; Cen, Steven Y.; Tauhid, Shahamat; Neema, Mohit; Steinman, Lawrence

    2016-01-01

    Background: The objective of this study was to evaluate monthly intramuscular adrenocorticotropic hormone (ACTH) gel versus intravenous methylprednisolone (IVMP) add-on therapy to interferon β for breakthrough disease in patients with relapsing forms of multiple sclerosis. Methods: This was a prospective, open-label, examiner-blinded, 15-month pilot study evaluating patients with Expanded Disability Status Scale (EDSS) score 3.0–6.5 and at least one clinical relapse or new T2 or gadolinium-enhanced lesion in the previous year. Twenty-three patients were randomized to ACTH (n = 12) or IVMP (n = 11) and completed the study. The primary outcome measure was the cumulative number of relapses. Secondary outcomes included EDSS, Mental Health Inventory (MHI), plasma cytokines, MS Functional Composite (MSFC), Quality-of-Life (MS-QOL) score, bone mineral density (BMD), and new or worsened psychiatric symptoms per month. Brain magnetic resonance imaging was analyzed post hoc. This was a preliminary and small-scale study. Results: Relapse rates differed significantly [ACTH 0.08, 95% confidence interval (CI) 0.01–0.54 versus IVMP 0.80, 95% CI 0.36–1.75; rate ratio, IVMP versus ACTH: 9.56, 95% CI 1.23–74.6; p = 0.03]. ACTH improved (p = 0.03) MHI (slope 0.95 ± 0.38 points/month; p = 0.02 versus slope −0.38 ± 0.43 points/month; p = 0.39). On-study decreases (all p < 0.05) in eight cytokine levels occurred only in the ACTH group. However, on-study EDSS, MSFC, MS-QOL, BMD, and MRI lesion changes were not significant between groups. Psychiatric symptoms per patient were greater with IVMP than ACTH (0.55, 95% CI 0.12–2.6 versus 0; p < 0.0001). Other common adverse events were insomnia and urinary tract infections (IVMP, seven events each) and fatigue or flu symptoms (ACTH, five events each). Conclusions: This study provided class II evidence that ACTH produced better examiner-assessed cumulative rates of relapses per patient than IVMP in the adjunctive treatment of

  6. Blindness

    MedlinePlus

    ... visual function, preservation of sight, and the special health problems and requirements of the blind.” ...

  7. Building a Library Collection on Blindness and Physical Disabilities: Basic Materials and Resources. Reference Circular No. 90-3.

    ERIC Educational Resources Information Center

    Library of Congress, Washington, DC. National Library Service for the Blind and Physically Handicapped.

    The materials listed in this reference circular are recommended to libraries and organizations as basic resources for providing a current information service on visual impairments and physical disabilities. The selections, which are based on the holdings of the Reference Section of the National Library Service (NLS) for the Blind and Physically…

  8. The Effects of Rater Training on Inter-Rater Agreement

    ERIC Educational Resources Information Center

    Pufpaff, Lisa A.; Clarke, Laura; Jones, Ruth E.

    2015-01-01

    This paper addresses the effects of rater training on the rubric-based scoring of three preservice teacher candidate performance assessments. This project sought to evaluate the consistency of ratings assigned to student learning outcome measures being used for program accreditation and to explore the need for rater training in order to increase…

  10. Reference frame preferences in haptics differ for the blind and sighted in the horizontal but not in the vertical plane.

    PubMed

    Struiksma, Marijn E; Noordzij, Matthijs L; Postma, Albert

    2011-01-01

    We investigated which reference frames are preferred when matching spatial language to the haptic domain. Sighted, low-vision, and blind participants were tested on a haptic-sentence-verification task where participants had to haptically explore different configurations of a ball and a shoe and judge the relation between them. Results from the spatial relation "above", in the vertical plane, showed that various reference frames are available after haptic inspection of a configuration. Moreover, the pattern of results was similar for all three groups and resembled patterns found for the sighted on visual sentence-verification tasks. In contrast, when judging the spatial relation "in front", in the horizontal plane, the blind showed a markedly different response pattern. The sighted and low-vision participants did not show a clear preference for either the absolute/relative or the intrinsic reference frame when these frames were dissociated. The blind, on the other hand, showed a clear preference for the intrinsic reference frame. In the absence of a dominant cue, such as gravity in the vertical plane, the blind might emphasise the functional relationship between the objects owing to enhanced experience with haptic exploration of objects.

  11. The Work-ability Support Scale: evaluation of scoring accuracy and rater reliability.

    PubMed

    Turner-Stokes, Lynne; Fadyl, Joanna; Rose, Hilary; Williams, Heather; Schlüter, Philip; McPherson, Kathryn

    2014-09-01

    The Work-ability Support Scale (WSS) is a new tool designed to assess vocational ability and support needs following onset of acquired disability, to assist decision-making in vocational rehabilitation. In this article, we report an iterative process of development through evaluation of inter- and intra-rater reliability and scoring accuracy, using vignettes. The impact of different methodological approaches to analysis of reliability is highlighted. Following preliminary evaluation using case-histories, six occupational therapists scored vignettes, first individually and then together in two teams. Scoring was repeated blind after 1 month. Scoring accuracy was tested against agreed 'reference standard' vignette scores using intraclass correlation coefficients (ICCs) for total scores and linear-weighted kappas (kw) for individual items. Item-by-item inter- and intra-rater reliability was evaluated for both individual and team scores, using two different statistical methods. ICCs for scoring accuracy ranged from 0.95 (95% CI 0.78-0.98) to 0.96 (0.89-0.99) for Part A, and from 0.78 (95% CI 0.67-0.85) to 0.84 (0.69-0.92) for Part B. Item-by-item analysis of scoring accuracy, inter- and intra-rater reliability all showed 'substantial' to 'almost perfect' agreement (kw ≥ 0.60) for all Part A and 8/12 Part B items, although multi-rater kappa (Fleiss) produced more conservative results (mK = 0.34-0.79). Team rating produced marginal improvements for Part A but not Part B. Four problematic contextual items were identified, leading to adjustment of the scoring manual. This vignette-based study demonstrates generally acceptable levels of scoring accuracy and reliability for the WSS. Further testing in real-life situations is now warranted.
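
    The linear-weighted kappa mentioned in this record can be illustrated with a minimal sketch. It assumes two raters scoring the same items on an ordinal scale coded 0..n_levels-1; the function name and the example scores are hypothetical and are not taken from the WSS study.

```python
def linear_weighted_kappa(rater_a, rater_b, n_levels):
    """Linear-weighted kappa for two raters; scores are integers 0..n_levels-1.

    Illustrative sketch only. It is undefined (division by zero) when both
    raters use one and the same single level for every item.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same non-empty set of items")
    if n_levels < 2:
        raise ValueError("at least two ordinal levels are required")

    # Joint proportion of items scored i by rater A and j by rater B.
    joint = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(rater_a, rater_b):
        joint[a][b] += 1.0 / n

    row = [sum(joint[i]) for i in range(n_levels)]
    col = [sum(joint[i][j] for i in range(n_levels)) for j in range(n_levels)]

    # Disagreement weight grows linearly with the distance between levels.
    observed = expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = abs(i - j) / (n_levels - 1)
            observed += weight * joint[i][j]
            expected += weight * row[i] * col[j]
    return 1.0 - observed / expected


# Hypothetical scores on a 4-level scale.
print(linear_weighted_kappa([0, 1, 2, 3, 3], [0, 1, 2, 3, 2], 4))
```

    Near-misses (adjacent levels) are penalised less than distant disagreements, which is why weighted kappa is commonly preferred for ordinal items.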

  12. Rater agreement in lung scintigraphy.

    PubMed

    Christiansen, F; Andersson, T; Rydman, H; Qvarner, N; Måre, K

    1996-09-01

    The PIOPED criteria in their original and revised forms are today's standards in the interpretation of ventilation-perfusion scintigraphy. When the PIOPED criteria are used by experienced raters with training in consensus interpretation, the agreement rates have been demonstrated to be excellent. Our purpose was to investigate the rates of agreement between 2 experienced raters from different hospitals who had no training in consensus interpretation. The 2 raters investigated a population of 195 patients. This group included 72 patients from a previous study who had an intermediate probability of pulmonary embolism and who had also been examined by pulmonary angiography. The results demonstrated moderate agreement rates with a kappa value of 0.54 (0.45-0.63 in a 95% confidence interval), which is similar to the kappa value of the PIOPED study but significantly lower than the kappa values of agreement rates among consensus-trained raters. There was a low consistency in the intermediate probability category, with a proportional agreement rate of 0.39 between the experienced raters. The moderate agreement rates between raters from different hospitals make it difficult to compare study populations of a certain scintigraphic category in different hospitals. Further investigations are mandatory for accurate diagnosis when the scintigrams are in the category of intermediate probability of pulmonary embolism.

  13. Intra- and Inter-Rater Reliability of the Modified Tuck Jump Assessment

    PubMed Central

    Fort-Vanmeerhaeghe, Azahara; Montalvo, Alicia M.; Lloyd, Rhodri S.; Read, Paul; Myer, Gregory D.

    2017-01-01

    The Tuck Jump Assessment (TJA) is a clinician-friendly screening tool that was designed to support practitioners with identification of neuromuscular deficits associated with anterior cruciate ligament injury. This study aimed to evaluate the inter- and intra-rater reliability of the modified scoring (0 to 2) TJA to add an additional range of objectivity for each criterion. A total of 24 elite youth volleyball athletes (12 males and 12 females) were included in this study. Each participant’s recorded performance of the TJA was scored independently by two raters across ten criteria using the modified scale. The two raters then scored the same videos one week later. Another investigator who was blind to the identity of the raters analyzed the scores from both raters for each participant. Kappa coefficient (k) and percentage of exact agreement (PEA) for both intra- and inter-rater reliability were analyzed for each item. Intraclass correlation coefficients (ICC) were calculated to determine intra- and inter-rater reliability of the modified TJA total score. Intra- and inter-rater k was good to excellent for most items (0.65-0.91). Average PEA between the two raters and two sessions ranged from 83.3 to 100% in all scored items. The ICC for the total score was excellent in both intra- and inter-rater correlations (0.94-0.96). This research demonstrated that the modified version of the TJA predominantly shows good to excellent intra- and inter-rater reliability in all analyzed criteria. Key points: The modified TJA shows good to excellent intra- and inter-rater reliability. This test is useful for assessing repeated jump-landing technique. This test provides a user-friendly option for assessing high-risk movement patterns. PMID: 28344460
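
    As a rough illustration of the two per-item statistics named in this record, the sketch below computes the percentage of exact agreement (PEA) and an unweighted Cohen's kappa for two raters. The function names and the scores are hypothetical; the study's own software and data are not given in the record.

```python
from collections import Counter


def percent_exact_agreement(rater_a, rater_b):
    """Share of items (in percent) on which both raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal proportions.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical scores on the modified 0-2 scale for one criterion.
rater_1 = [0, 1, 2, 2, 1, 0]
rater_2 = [0, 1, 2, 1, 1, 0]
print(percent_exact_agreement(rater_1, rater_2))
print(cohen_kappa(rater_1, rater_2))
```

    PEA can be high even when kappa is modest, because kappa discounts the agreement expected from the raters' marginal score frequencies alone.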

  14. Randomized, double-blind study comparing proposed biosimilar LA-EP2006 with reference pegfilgrastim in breast cancer.

    PubMed

    Harbeck, Nadia; Lipatov, Oleg; Frolova, Mona; Udovitsa, Dmitry; Topuzov, Eldar; Ganea-Motan, Doina Elena; Nakov, Roumen; Singh, Pritibha; Rudy, Anita; Blackwell, Kimberly

    2016-06-01

    This randomized, double-blind trial compared proposed biosimilar LA-EP2006 with reference pegfilgrastim in women receiving chemotherapy for breast cancer (PROTECT-1). Women (≥18 years) were randomized to receive LA-EP2006 (n = 159) or reference (n = 157) pegfilgrastim (Neulasta®, Amgen) for ≤6 cycles of (neo)-adjuvant TAC chemotherapy. Primary end point was duration of severe neutropenia (DSN) during cycle 1 (number of consecutive days with absolute neutrophil count <0.5 × 10⁹/l) with equivalence confirmed if 90% and 95% CIs were within a ±1 day margin. For DSN, LA-EP2006 was equivalent to reference (difference: 0.07 days; 90% CI: -0.09-0.23; 95% CI: -0.12-0.26). LA-EP2006 and reference pegfilgrastim showed no clinically meaningful differences regarding efficacy and safety in breast cancer patients receiving chemotherapy.
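
    The equivalence criterion described in this record (confidence interval entirely inside a ±1 day margin) can be checked with a few lines. This is a sketch: the function name is hypothetical, and treating the margin bounds as exclusive is an assumption, since the record does not say how boundary cases were handled.

```python
def ci_within_margin(ci_low, ci_high, margin):
    """True when the whole confidence interval lies inside (-margin, +margin).

    Assumption: the margin bounds are treated as exclusive.
    """
    return -margin < ci_low and ci_high < margin


margin_days = 1.0  # equivalence margin stated in the record
print(ci_within_margin(-0.09, 0.23, margin_days))  # 90% CI from the record
print(ci_within_margin(-0.12, 0.26, margin_days))  # 95% CI from the record
```

    Both intervals reported in the record fall well inside the margin, which is what the stated equivalence conclusion rests on.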

  15. ICT in Portuguese Reference Schools for the Education of Blind and Partially Sighted Students

    ERIC Educational Resources Information Center

    Ramos, Sara Isabel Moca; de Andrade, António Manuel Valente

    2016-01-01

    Technology has become an essential component in our society and considering its impact in the educational system, Information and Communication Technologies (ICT) cannot be dissociated from the educational process and, in particular, from pedagogical practices adopted for students who are blind or partially sighted. This study focuses on…

  17. The Smile Esthetic Index (SEI): A method to measure the esthetics of the smile. An intra-rater and inter-rater agreement study.

    PubMed

    Rotundo, Roberto; Nieri, Michele; Bonaccini, Daniele; Mori, Massimiliano; Lamberti, Elena; Massironi, Domenico; Giachetti, Luca; Franchi, Lorenzo; Venezia, Piero; Cavalcanti, Raffaele; Bondi, Elena; Farneti, Mauro; Pinchi, Vilma; Buti, Jacopo

    2015-01-01

    To propose a method to measure the esthetics of the smile and to report its validation by means of an intra-rater and inter-rater agreement analysis. Ten variables were chosen as determinants for the esthetics of a smile: smile line and facial midline, tooth alignment, tooth deformity, tooth dischromy, gingival dischromy, gingival recession, gingival excess, gingival scars and diastema/missing papillae. One examiner consecutively selected seventy smile pictures, which were in the frontal view. Ten examiners, with different levels of clinical experience and specialties, applied the proposed assessment method twice on the selected pictures, independently and blindly. Intraclass correlation coefficient (ICC) and Fleiss' kappa statistics were performed to analyse the intra-rater and inter-rater agreement. Considering the cumulative assessment of the Smile Esthetic Index (SEI), the ICC value for the inter-rater agreement of the 10 examiners was 0.62 (95% CI: 0.51 to 0.72), representing a substantial agreement. Intra-rater agreement ranged from 0.86 to 0.99. Inter-rater agreement (Fleiss' kappa statistics) calculated for each variable ranged from 0.17 to 0.75. The SEI was a reproducible method to assess the esthetic component of the smile, useful for the diagnostic phase and for setting appropriate treatment plans.
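
    Fleiss' kappa, used in this record for agreement among ten examiners, generalises chance-corrected agreement to more than two raters. The sketch below assumes every subject is rated by the same number of raters and that the input is a table of counts per subject and category; the function name and the tiny example table are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a table counts[subject][category].

    Each row holds how many raters assigned that subject to each category;
    every row must sum to the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Mean per-subject agreement: share of rater pairs that agree.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_chance = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)

    return (p_bar - p_chance) / (1.0 - p_chance)


# Hypothetical table: 4 pictures, 3 examiners, 2 categories (absent/present).
print(fleiss_kappa([[3, 0], [0, 3], [2, 1], [1, 2]]))
```

    Because chance agreement depends on how evenly the categories are used, variables that are rarely present can show low kappa despite high raw agreement.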

  18. How Do Raters Judge Spoken Vocabulary?

    ERIC Educational Resources Information Center

    Li, Hui

    2016-01-01

    The aim of the study was to investigate how raters come to their decisions when judging spoken vocabulary. Segmental rating was introduced to quantify raters' decision-making process. It is hoped that this simulated study brings fresh insight to future methodological considerations with spoken data. Twenty trainee raters assessed five Chinese…

  19. Rater reliability of fragile X mutation size estimates: A multilaboratory analysis

    SciTech Connect

    Fisch, G.S.; Maddalena, A.

    1996-08-09

    Notwithstanding the use of comparable molecular protocols, description and measurement of the fra(X) (fragile X) mutation may vary according to its appearance as a discrete band, smear, multiple bands, or mosaic. Estimation of mutation size may also differ from one laboratory to another. We report on the description of a mutation size estimate for a large sample of individuals tested for the fra(X) pre- or full mutation. Of 63 DNA samples evaluated, 45 were identified previously as fra(X) pre- or full mutations. DNA from 18 unaffected individuals was used as control. Genomic DNA was extracted from peripheral blood, and DNA fragments from each of four laboratories were sent to a single center where Southern blots were prepared and hybridized with the pE5.1 probe. Photographs from autoradiographs were returned to each site, and raters blind to the identity of the specimens were asked to evaluate them. Raters' estimates of mutation size compared favorably with a reference test. Intrarater reliability was good to excellent. Variability in mutation size estimates was comparable across band types. Variability in estimates was moderate, and was significantly correlated with absolute mutation size and band type. 9 refs., 1 fig., 3 tabs.

  20. Rater Effects on Essay Scoring: A Multilevel Analysis of Severity Drift, Central Tendency, and Rater Experience

    ERIC Educational Resources Information Center

    Leckie, George; Baird, Jo-Anne

    2011-01-01

    This study examined rater effects on essay scoring in an operational monitoring system from England's 2008 national curriculum English writing test for 14-year-olds. We fitted two multilevel models and analyzed: (1) drift in rater severity effects over time; (2) rater central tendency effects; and (3) differences in rater severity and central…

  1. Retrieval of diffusing surface by two-frame interferometric method with blind phase shift of a reference wave

    NASA Astrophysics Data System (ADS)

    Muravsky, Leonid I.; Kmet', Arkady B.; Voronyak, Taras I.

    2011-08-01

    A two-frame interferometric method with a blind phase shift of the reference wave, originally developed for retrieving smooth surfaces, is considered. The ability of this method to reconstruct the macrorelief of diffusing surfaces with a given roughness is studied. Computer simulations confirmed that the developed interferogram processing algorithm reliably reconstructs, with low noise, the macrorelief of a diffusing surface whose roughness heights have a standard deviation of up to λ/10. The simulations also showed that the proposed correlation approach, which is used to determine the blind phase shift of the reference wave, is more suitable for a diffusing surface than for a smooth one: an increase in surface roughness reduces the error of the phase shift determination fourfold compared with a smooth surface. The method was verified experimentally on real diffusing surfaces of given roughness, using a setup based on a Twyman-Green interferometer and a roughness comparison specimen. The experimental results virtually coincided with the computer simulation results, which indicates that the considered method can retrieve not only smooth but also diffusing surfaces.

  2. Microhomogeneity in reference materials for microanalytical methods - a possible recourse from a blind alley?

    NASA Astrophysics Data System (ADS)

    Renno, A. D.; Michalak, P. P.; Munnik, F.; Tolosana-Delgado, R.; van den Boogaart, G. K.

    2013-12-01

    It is assumed that reference materials for microanalytical methods must be homogeneous, i.e. have the same concentration of the relevant element(s) overall, to ensure that they can be used reliably to get comparison values during the analysis with non-absolute methods. With increasing resolution it becomes more and more difficult to ensure such homogeneity, up to the point that it is not possible for several microanalytical methods. A painstaking search for homogeneous natural minerals of gem quality, or elaborate and expensive methods to produce synthetic minerals, present themselves as obvious solutions to the problem. We propose a way to get reliable reference values with some types of inhomogeneous material, based on multiple probing of the reference material. Consider a reference material whose average concentration of the relevant element and its microscale variability have been adequately characterized by a destructive method at a series of grid spots. The minimal number of probing spots required for a certain precision level can be derived from the variance calculations. This procedure is always valid whenever the heterogeneity value distribution of the reference material has a variance, but at the price that the number of spots will be huge if the variance is large. However, using adequate models of local heterogeneity can greatly reduce that number. Geostatistics can be used in random, systematic and periodic heterogeneities, while robust methods are useful in cases of nugget heterogeneities. Typical examples of natural and synthetic minerals, analysed by electron microprobe and micro-PIXE (particle induced X-ray emission) for microhomogeneity/microheterogeneity, are shown. The distinctions between the two strategies of using these materials as a potential reference material are demonstrated.
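
    The idea of deriving a minimal number of probing spots from the variance can be sketched with the textbook sample-size formula for a mean, n ≥ (z·σ/E)². This is an assumption for illustration only: it treats spots as independent and normally distributed, which the record's geostatistical and robust approaches are designed to improve on, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def min_probing_spots(sigma, half_width, confidence=0.95):
    """Smallest n so that the mean of n independent spots has a confidence
    interval of at most +/- half_width, given spot-to-spot std. dev. sigma.

    Textbook approximation, not the procedure of the cited abstract.
    """
    if sigma <= 0 or half_width <= 0:
        raise ValueError("sigma and half_width must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * sigma / half_width) ** 2)


# Hypothetical numbers: spot-to-spot std. dev. of 2.0 units, target +/- 0.5.
print(min_probing_spots(2.0, 0.5))
```

    The required number grows with the square of σ, which matches the record's remark that the number of spots becomes huge when the variance is large.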

  3. Exploring the role of first impressions in rater-based assessments.

    PubMed

    Wood, Timothy J

    2014-08-01

    Medical education relies heavily on assessment formats that require raters to assess the competence and skills of learners. Unfortunately, there are often inconsistencies and variability in the scores raters assign. To ensure the scores from these assessment tools have validity, it is important to understand the underlying cognitive processes that raters use when judging the abilities of their learners. The goal of this paper, therefore, is to contribute to a better understanding of the cognitive processes used by raters. Representative findings from the social judgment and decision making, cognitive psychology, and educational measurement literature will be used to enlighten the underpinnings of these rater-based assessments. Of particular interest is the impact judgments referred to as first impressions (or thin slices) have on rater-based assessments. These are judgments about people made very quickly and based on very little information. A narrative review will provide a synthesis of research in these three literatures (social judgment and decision making, educational psychology, and cognitive psychology) and will focus on the underlying cognitive processes, the accuracy and the impact of first impressions on rater-based assessments. The application of these findings to the types of rater-based assessments used in medical education will then be reviewed. Gaps in understanding will be identified and suggested directions for future research studies will be discussed.

  4. Rater Types in Writing Performance Assessments: A Classification Approach to Rater Variability

    ERIC Educational Resources Information Center

    Eckes, Thomas

    2008-01-01

    Research on rater effects in language performance assessments has provided ample evidence for a considerable degree of variability among raters. Building on this research, I advance the hypothesis that experienced raters fall into types or classes that are clearly distinguishable from one another with respect to the importance they attach to…

  5. Effects of Marking Method and Rater Experience on ESL Essay Scores and Rater Performance

    ERIC Educational Resources Information Center

    Barkaoui, Khaled

    2011-01-01

    This study examined the effects of marking method and rater experience on ESL (English as a Second Language) essay test scores and rater performance. Each of 31 novice and 29 experienced raters rated a sample of ESL essays both holistically and analytically. Essay scores were analysed using a multi-faceted Rasch model to compare test-takers'…

  6. A Hierarchical Rater Model for Constructed Responses, with a Signal Detection Rater Model

    ERIC Educational Resources Information Center

    DeCarlo, Lawrence T.; Kim, YoungKoung; Johnson, Matthew S.

    2011-01-01

    The hierarchical rater model (HRM) recognizes the hierarchical structure of data that arises when raters score constructed response items. In this approach, raters' scores are not viewed as being direct indicators of examinee proficiency but rather as indicators of essay quality; the (latent categorical) quality of an examinee's essay in turn…

  7. Weight-Based Classification of Raters and Rater Cognition in an EFL Speaking Test

    ERIC Educational Resources Information Center

    Cai, Hongwen

    2015-01-01

    This study is an attempt to classify raters according to their weighting patterns and explore systematic differences between rater types in the rating process. In the context of an EFL speaking test, 126 raters were classified into three types--form-oriented, balanced, and content-oriented--through cluster analyses of their weighting patterns…

  9. Variance Estimation of Nominal-Scale Inter-Rater Reliability with Random Selection of Raters

    ERIC Educational Resources Information Center

    Gwet, Kilem Li

    2008-01-01

    Most inter-rater reliability studies using nominal scales suggest the existence of two populations of inference: the population of subjects (collection of objects or persons to be rated) and that of raters. Consequently, the sampling variance of the inter-rater reliability coefficient can be seen as a result of the combined effect of the sampling…

  10. Inter-Rater Variability as Mutual Disagreement: Identifying Raters' Divergent Points of View

    ERIC Educational Resources Information Center

    Gingerich, Andrea; Ramlo, Susan E.; van der Vleuten, Cees P. M.; Eva, Kevin W.; Regehr, Glenn

    2017-01-01

    Whenever multiple observers provide ratings, even of the same performance, inter-rater variation is prevalent. The resulting "idiosyncratic rater variance" is considered to be unusable error of measurement in psychometric models and is a threat to the defensibility of our assessments. Prior studies of inter-rater variation in clinical…

  11. Efficacy and safety of generic escitalopram (Lexacure®) in patients with major depressive disorder: a 6-week multicenter, randomized, rater-blinded, escitalopram-comparative, non-inferiority study

    PubMed Central

    Jeong, Jong-Hyun; Bahk, Won-Myong; Woo, Young Sup; Lee, Kyung-Uk; Kim, Do Hoon; Kim, Moon-Doo; Kim, Won; Yang, Jong-Chul; Lee, Kwang Heun

    2015-01-01

    Objectives: The primary aim of this non-inferiority study was to investigate the clinical effectiveness and safety of generic escitalopram (Lexacure®) versus branded escitalopram (Lexapro®) for patients with major depressive disorder (MDD). Methods: The present study included 158 patients, who were randomized (1:1) to receive a flexible dose of generic escitalopram (n=78) or branded escitalopram (n=80) over a 6-week single-blind treatment period. The clinical benefits in the two groups were evaluated using the Montgomery–Åsberg Depression Rating Scale (MADRS), the 17-item Hamilton Depression Rating Scale (HDRS), the Clinical Global Impressions-Severity scale (CGI-S), and the Clinical Global Impressions-Improvement scale (CGI-I) at baseline, week 1, week 2, week 4, and week 6. The frequency of adverse events (AEs) was also assessed to determine safety at each follow-up visit. Results: During the 6-week study period, 30 patients (38.5%) from the generic escitalopram group and 28 patients (30.0%) from the branded escitalopram group dropped out of the study (P=0.727). The MADRS, HDRS, CGI-S, and CGI-I scores significantly decreased in both groups, and there were no significant differences between the groups. At week 6, 28 patients (57.1%) in the generic escitalopram group and 35 patients (67.3%) in the branded escitalopram group had responded to treatment (as indicated by a ≥50% decrease from the baseline MADRS score; P=0.126), and the remission rates (MADRS score: ≤10) were 42.9% (n=21) in generic escitalopram group and 53.8% (n=28) in the branded escitalopram group (P=0.135). The most frequently reported AEs were nausea (17.9%), sleepiness/somnolence (7.7%), weight gain (3.8%), and dry mouth (2.6%) in the generic escitalopram group and nausea (20.0%), sleepiness/somnolence (3.8%), weight gain (2.5%), and dry mouth (2.5%) in the branded escitalopram group. Conclusion: The present non-inferiority study demonstrated that generic escitalopram is a safe and an
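
    The response and remission definitions quoted in this record (a ≥50% decrease from the baseline MADRS score, and a MADRS score ≤10) translate directly into a small classifier. The sketch below is illustrative; the function name and the example scores are hypothetical.

```python
def madrs_outcome(baseline, endpoint, remission_cutoff=10):
    """Classify one patient using the definitions quoted in the record."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    reduction = (baseline - endpoint) / baseline
    return {
        "responder": reduction >= 0.5,            # >= 50% decrease from baseline
        "remitter": endpoint <= remission_cutoff,  # MADRS score <= 10
    }


# Hypothetical patients.
print(madrs_outcome(30, 15))
print(madrs_outcome(30, 8))
```

    A patient can respond without remitting, as in the first example, which is why the record reports the two rates separately.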

  12. Rater Variables Associated with ITER Ratings

    ERIC Educational Resources Information Center

    Paget, Michael; Wu, Caren; McIlwrick, Joann; Woloschuk, Wayne; Wright, Bruce; McLaughlin, Kevin

    2013-01-01

    Advocates of holistic assessment consider the ITER a more authentic way to assess performance. But this assessment format is subjective and, therefore, susceptible to rater bias. Here our objective was to study the association between rater variables and ITER ratings. In this observational study our participants were clerks at the University of…

  13. Accuracy of Surgery Clerkship Performance Raters.

    ERIC Educational Resources Information Center

    Littlefield, John H.; And Others

    1991-01-01

    Interrater reliability in numerical ratings of clerkship performance (n=1,482 students) in five surgery programs was studied. Raters were classified as accurate or moderately or significantly stringent or lenient. Results indicate that increasing the proportion of accurate raters would substantially improve the precision of class rankings. (MSE)

  14. Agreement between Two Independent Groups of Raters

    ERIC Educational Resources Information Center

    Vanbelle, Sophie; Albert, Adelin

    2009-01-01

    We propose a coefficient of agreement to assess the degree of concordance between two independent groups of raters classifying items on a nominal scale. This coefficient, defined on a population-based model, extends the classical Cohen's kappa coefficient for quantifying agreement between two raters. Weighted and intraclass versions of the…

  15. Effects of Assigning Raters to Items

    ERIC Educational Resources Information Center

    Sykes, Robert C.; Ito, Kyoko; Wang, Zhen

    2008-01-01

    Student responses to a large number of constructed response items in three Math and three Reading tests were scored on two occasions using three ways of assigning raters: single reader scoring, a different reader for each response (item-specific), and three readers each scoring a rater item block (RIB) containing approximately one-third of a…

  19. Inter-rater, intra-rater, and inter-machine reliability of quantitative ultrasound measurements of the patellar tendon.

    PubMed

    Gellhorn, Alfred C; Carlson, M Jake

    2013-05-01

    The use of ultrasound (US) to perform quantitative measurements of musculoskeletal tissues requires accurate and reliable measurements between investigators and ultrasound machines. The objective of this study was to evaluate inter-rater and intra-rater reliability of patellar tendon measurements between providers with different levels of US experience and inter-machine reliability of US machines. Sixteen subjects without a history of knee pain were evaluated with US examinations of the patellar tendon. Each tendon was scanned independently by two investigators using two different ultrasound machines. Tendon length and cross-sectional area (CSA) were obtained, and examiners were blinded to each other's results. Tendon length was measured using a validated system involving surface markers and calipers, and CSA was measured using each machine's measuring software. Intra-class correlation coefficients (ICCs) were used to determine reliability of measurements between observers, where ICC > 0.75 was considered good and ICC > 0.9 was considered excellent. Inter-rater reliability between sonographers was excellent and revealed an ICC of 0.90 to 0.92 for patellar tendon CSA and an ICC of 0.96 for tendon length. ICC for intra-rater reliability of tendon CSA was also generally excellent, with ICC between 0.87 and 0.96. Inter-machine reliability was excellent, with ICC of 0.91-0.98 for tendon CSA and 0.96-0.98 for tendon length. Bland-Altman plots were constructed to measure validity and demonstrated a mean difference between sonographers of 0.03 mm² for CSA measurements and 0.2 mm for tendon length. Using well-defined scanning protocols, a novice and an experienced musculoskeletal sonographer attained high levels of inter-rater agreement, with similarly excellent results for intra-rater and inter-machine reliability. To our knowledge, this study is the first to report inter-machine reliability in the setting of quantitative musculoskeletal ultrasound.
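
    The quantities behind a Bland-Altman plot, as mentioned in this record, are the mean difference between two raters (the bias) and the limits of agreement around it. The sketch below uses the conventional bias ± 1.96 SD limits; the function name and the measurements are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def bland_altman(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Limits of agreement use the conventional bias +/- 1.96 * SD of the
    paired differences (sample standard deviation).
    """
    diffs = [float(a) - float(b) for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical tendon lengths (mm) measured by two sonographers.
novice = [44.1, 46.0, 42.3, 47.8, 45.2]
expert = [44.0, 45.7, 42.5, 47.5, 45.1]
print(bland_altman(novice, expert))
```

    A bias near zero with narrow limits indicates that the two raters can be used interchangeably, which an ICC alone does not show.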

  20. Introducing a new definition of a near fall: intra-rater and inter-rater reliability.

    PubMed

    Maidan, I; Freedman, T; Tzemah, R; Giladi, N; Mirelman, A; Hausdorff, J M

    2014-01-01

    Near falls (NFs) are more frequent than falls, and may occur before falls, potentially predicting fall risk. As such, identification of a NF is important. We aimed to assess intra- and inter-rater reliability of the traditional definition of a NF and to demonstrate the potential utility of a new definition. To this end, 10 older adults, 10 idiopathic elderly fallers, and 10 patients with Parkinson's disease (PD) walked in an obstacle course while wearing a safety harness. All walks were videotaped. Forty-nine video segments were extracted to create 2 clips each of 8.48 min. Four raters scored each event using the traditional definition and, two weeks later, using the new definition. A fifth rater used only the new definition. Intra-rater reliability was determined using Kappa (K) statistics and inter-rater reliability was determined using ICC. Using the traditional definition, three raters had poor intra-rater reliability (K<0.054, p>0.137) and one rater had moderate intra-rater reliability (K=0.624, p<0.001). With the traditional definition, inter-rater reliability between the four raters was moderate (ICC=0.667, p<0.001). In contrast, the new NF definition showed high intra-rater (K>0.601, p<0.001) and excellent inter-rater reliability (ICC=0.815, p<0.001). A priori, it is easy to distinguish falls from usual walking and NFs, but it is more challenging to distinguish NFs from obstacle negotiation and usual walking. Therefore, a more precise definition of NF is required. The results of the present study suggest that the proposed new definition increases intra- and inter-rater reliability, a critical step for using NFs to quantify fall risk.
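
    The record does not state which form of the ICC was used. As one common choice, the sketch below computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater) from a complete subjects-by-raters table; the function name is hypothetical, and the example table is the widely reproduced Shrout and Fleiss teaching data, not data from this study.

```python
def icc_2_1(scores):
    """ICC(2,1) from scores[subject][rater] with no missing values."""
    n = len(scores)      # subjects (here: rated events)
    k = len(scores[0])   # raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Shrout and Fleiss teaching example: 6 subjects, 4 raters.
example = [
    [9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
    [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
```

    Other ICC forms (consistency rather than absolute agreement, or averaged raters) give different values from the same table, so the form should be reported alongside the coefficient.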

  1. Reducing inter-rater variability in the assessment of nuchal translucency image quality.

    PubMed

    Nisbet, D; McLennan, A; Robertson, A; Schluter, P J; Hyett, J

    2011-01-01

    Standardization of first-trimester nuchal translucency (NT) image acquisition is crucial to the success of screening for Down syndrome. Rigorous audit of operator performance and constructive feedback from assessors maintain standards. This process relies on good inter-rater agreement on image assessment. We describe the Australian approach to NT image assessment and evaluate the impact of a targeted intervention on inter-rater agreement. Between 2002 and 2008 a group of experienced practitioners met nine times to compare their assessment of a series of NT images. Each assessor had previously scored the images according to a system described in 2002. Inter-rater agreement was evaluated before and after an intervention where the assessors were required to refer to a detailed resource manual designed to reduce the subjectivity inherent in image assessment. There was a statistical improvement in inter-rater agreement for all elements of image assessment (original scores and individual component scores) after the intervention, apart from horizontal fetal position. However, even after the intervention, inter-rater agreement levels generally remained moderate (kappa range: 0.14-0.58). This study has shown that provision of detailed resource documentation to experienced assessors can significantly improve inter-rater agreement in all facets of NT image assessment. It also highlights areas of image assessment that require critical review. It is recommended that all audit bodies regularly review their inter-rater agreement to ensure consistent feedback to operators who submit images for expert peer review. 2011 S. Karger AG, Basel.

  2. Introducing a new definition of a near fall: Intra-rater and inter-rater reliability

    PubMed Central

    Maidan, I; Freedman, T; Tzemah, R; Giladi, N; Mirelman, A; Hausdorff, JM

    2013-01-01

    Near falls (NFs) are more frequent than falls, and may occur before falls, potentially predicting fall risk. As such, identification of a NF is important. We aimed to assess intra and inter-rater reliability of the traditional definition of a NF and to demonstrate the potential utility of a new definition. To this end, 10 older adults, 10 idiopathic elderly fallers, and 10 patients with Parkinson’s disease (PD) walked in an obstacle course while wearing a safety harness. All walks were videotaped. Forty-nine video segments were extracted to create 2 clips each of 8.48 minutes. Four raters scored each event using the traditional definition and, two weeks later, using the new definition. A fifth rater used only the new definition. Intra-rater reliability was determined using Kappa (K) statistics and inter-rater reliability was determined using ICC. Using the traditional definition, three raters had poor intra-rater reliability (K<0.054, p>0.137) and one rater had moderate intra-rater reliability (K=0.624, p<0.001). With the traditional definition, inter-rater reliability between the four raters was moderate (ICC=0.667, p<0.001). In contrast, the new NF definition showed high intra-rater (K>0.601, p<0.001) and high inter-rater reliability (ICC=0.815, p<0.001). A priori, it is easy to distinguish falls from usual walking and NFs, but it is more challenging to distinguish NFs from obstacle negotiation and usual walking. Therefore, a more precise definition of NF is required. The results of the present study suggest that the proposed new definition increases intra and inter-rater reliability, a critical step for using NFs to quantify fall risk. PMID:23972512

  3. Rating the raters in a mixed model: An approach to deciphering the rater reliability

    NASA Astrophysics Data System (ADS)

    Shang, Junfeng; Wang, Yougui

    2013-05-01

    Rating the raters has attracted extensive attention in recent years. Ratings are complex because a rating system involves subjective assessment and a number of criteria. Whenever human judgment is part of a rating, inconsistency among ratings is a source of variance in the scores, so it is natural to verify the trustworthiness of the ratings. Accordingly, estimating rater reliability is of great interest. To facilitate the evaluation of rater reliability in a rating system, we propose a mixed model in which the scores a rater assigns to the ratees are described by fixed effects determined by the ability of the ratees and random effects produced by the disagreement of the raters. In this mixed model, we derive the posterior distribution of the rater random effects in order to predict them. To decide quantitatively which raters are unreliable, the predictive influence function (PIF) serves as a criterion that compares the posterior distributions of the random effects between the full data set and the rater-deleted data sets. The benchmark for this criterion is also discussed. The proposed methodology is investigated on multiple simulated data sets and two real data sets.

  4. Proxies and Other External Raters: Methodological Considerations

    PubMed Central

    Snow, A Lynn; Cook, Karon F; Lin, Pay-Shin; Morgan, Robert O; Magaziner, Jay

    2005-01-01

    Objective: The purpose of this paper is to introduce researchers to the measurement and subsequent analysis considerations involved when using externally rated data. We will define and describe two categories of externally rated data, recommend methodological approaches for analyzing and interpreting data in these two categories, and explore factors affecting agreement between self-rated and externally rated reports. We conclude with a discussion of needs for future research. Data Sources/Study Setting: Data sources for this paper are previous published studies and reviews comparing self-rated with externally rated data. Study Design/Data Collection/Extraction Methods: This is a psychometric conceptual paper. Principal Findings: We define two types of externally rated data: proxy data and other-rated data. Proxy data refer to those collected from someone who speaks for a patient who cannot, will not, or is unavailable to speak for him or herself, whereas we use the term other-rater data to refer to situations in which the researcher collects ratings from a person other than the patient to gain multiple perspectives on the assessed construct. These two types of data differ in the way the measurement model is defined, the definition of the gold standard against which the measurements are validated, the analysis strategies appropriately used, and how the analyses are interpreted. There are many factors affecting the discrepancies between self- and external ratings, including characteristics of the patient, the proxy, and of the rated construct. Several psychological theories can be helpful in predicting such discrepancies. Conclusions: Externally rated data have an important place in health services research, but use of such data requires careful consideration of the nature of the data and how it will be analyzed and interpreted. PMID:16179002

  5. Intra-rater and inter-rater reliabilities of real-time acceleration gait analysis system.

    PubMed

    Osaka, Hiroshi; Shinkoda, Koichi; Watanabe, Susumu; Fujita, Daisuke; Kobara, Kenichi; Yoshimura, Yosuke; Ito, Tomotaka

    2016-01-01

    The purposes of this study were to construct a real-time acceleration gait analysis system equipped with software to analyse real-time trunk acceleration during walking and to examine the intra-rater and inter-rater reliabilities of this system. The system comprises an accelerometer, an acceleration amplifier, a transmitter, two foot switches, a receiver and a personal computer installed with the real-time acceleration analysis software. The acceleration signals received were analysed using the real-time acceleration analysis software, and gait parameters were calculated. The subjects were 20 healthy individuals, and there were two raters. The intra-rater and inter-rater reliabilities of the measurement results obtained from this system were examined using intraclass correlation coefficients (ICC) and Bland-Altman analysis. The intra-rater and inter-rater ICCs ranged from 0.61 to 0.92 across the gait parameters. In the Bland-Altman analysis, neither fixed nor proportional bias was found in any of the gait parameters. The ICC and Bland-Altman results indicate that intra-rater and inter-rater gait measurements made with this system had good reproducibility. With this system, the clinical efficiency of gait analysis and gait training for physiotherapy can be improved. Implications for Rehabilitation: This study focused on the advantage of a gait analysis method using an accelerometer and constructed a gait analysis system that calculates real-time gait parameters from trunk acceleration measurements during walking. The gait analysis using this system has good intra-rater and inter-rater reliabilities, and using this system can improve the clinical efficiency of gait analysis and gait training.
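
    The Bland-Altman analysis mentioned in this record can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the 1.96 multiplier assumes approximately normal differences, and the slope is only a descriptive indicator of proportional bias (no significance test is performed).

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # fixed bias: mean of the paired differences
    sd = stdev(diffs)    # sample standard deviation of the differences
    # 95% limits of agreement, assuming roughly normal differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def proportional_slope(a, b):
    """Least-squares slope of the paired differences regressed on the pair means.

    A slope far from zero hints at proportional bias (assumption: a formal
    analysis would also test the slope for significance).
    """
    diffs = [x - y for x, y in zip(a, b)]
    avgs = [(x + y) / 2 for x, y in zip(a, b)]
    m_avg, m_diff = mean(avgs), mean(diffs)
    sxx = sum((v - m_avg) ** 2 for v in avgs)
    sxy = sum((v - m_avg) * (d - m_diff) for v, d in zip(avgs, diffs))
    return sxy / sxx
```

    A bias near zero with narrow limits, together with a slope near zero, would be consistent with the "neither fixed nor proportional bias" finding reported above.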

  6. Measuring Essay Assessment: Intra-Rater and Inter-Rater Reliability

    ERIC Educational Resources Information Center

    Kayapinar, Ulas

    2014-01-01

    Problem Statement: There have been many attempts to research the effective assessment of writing ability, and many proposals for how this might be done. In this sense, rater reliability plays a crucial role for making vital decisions about testees in different turning points of both educational and professional life. Intra-rater and inter-rater…

  7. Cheiloscopy: Lip Print Inter-rater Reliability.

    PubMed

    Furnari, Winnie; Janal, Malvin N

    2017-05-01

    Lip print analysis, or cheiloscopy, has the potential to join fingerprints and retinal scans as an additional method to determine human identification. This preliminary study sought to determine agreement among 20 raters, forensic odontologists, using an often referenced system that categorizes lip prints into six classes related to the dominant pattern of vertical, horizontal, and intersecting lines. Lip prints were taken from 13 individuals, and raters categorized eight distinct regions of each print. In addition to ratings made while viewing the actual prints, the raters repeated the exercise using photographs of the lip prints. Multirater kappa, a chance-corrected measure of agreement, ranged between 0.15 for the actual prints and 0.25 for the photos, indicating only poor to fair levels of inter-rater reliability. While these results fail to support the use of lip prints for human identification, it is possible that more intensive training may yet produce adequate levels of reliability. © 2016 American Academy of Forensic Sciences.
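
    A chance-corrected multirater agreement statistic of the kind described in this record can be sketched with Fleiss' kappa. The abstract does not say which multirater kappa was used, so the choice of Fleiss' formula, the function name, and the input layout are all assumptions made for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item must have the same number
    of raters (at least two)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of raters (>= 2)")
    n_cats = len(counts[0])
    total = n_items * n_raters

    # Proportion of all assignments that fall in each category
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    # Observed agreement per item, then averaged over items
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_item) / n_items
    # Agreement expected by chance
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_obs - p_exp) / (1 - p_exp)
```

    For a design like the one above, each row would be one lip-print region and each column one of the six pattern classes, with the 20 raters' choices tallied per row.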

  8. Direct Behavior Rating: Considerations for Rater Accuracy

    ERIC Educational Resources Information Center

    Harrison, Sayward E.; Riley-Tillman, T. Chris; Chafouleas, Sandra M.

    2014-01-01

    Direct behavior rating (DBR) offers users a flexible, feasible method for the collection of behavioral data. Previous research has supported the validity of using DBR to rate three target behaviors: academic engagement, disruptive behavior, and compliance. However, the effect of the base rate of behavior on rater accuracy has not been established.…

  9. A comparison of face-to-face and remote assessment of inter-rater reliability on the Hamilton Depression Rating Scale via videoconferencing.

    PubMed

    Kobak, Kenneth A; Williams, Janet B W; Engelhardt, Nina

    2008-02-28

    Poor inter-rater reliability (IRR) is an important methodological factor that may contribute to failed trials. The sheer number of raters at diverse sites in multicenter trials presents a formidable challenge in calibration. Videoconferencing allows for the evaluation of IRR of raters at diverse sites by enabling raters at different sites to each independently interview a common patient. This is a more rigorous test of IRR than passive rating of videotapes. To evaluate the potential impact of videoconferencing on IRR, we compared IRR obtained via videoconference to IRR obtained using face-to-face interviews. Four raters at three different locations were paired using all pair-wise combinations of raters. Using videoconferencing, each paired rater independently conducted an interview with the same patient, who was at a third, central location. Raters were blind to each other's scores. ICC from this cohort (n=22) was not significantly different from the ICC obtained by a cohort using two face-to-face interviews (n=21) (0.90 vs. 0.93, respectively) nor from a cohort using one face-to-face interview and one remote interview (n=21) (0.88). The mean Hamilton Depression Rating Scale (HAMD) scores obtained were not significantly different. There appears to be no loss of signal using remote methods of calibration compared with traditional face-to-face methods.

  10. Comparison of Models and Indices for Detecting Rater Centrality.

    PubMed

    Wolfe, Edward W; Song, Tian

    2015-01-01

    To date, much of the research concerning rater effects has focused on rater severity/leniency. Consequently, other potentially important rater effects have largely been ignored by those conducting operational scoring projects. This simulation study compares four rater centrality indices (rater fit, residual-expected correlations, rater slope, and rater threshold variance) in terms of their Type I and Type II error rates under varying levels of centrality magnitude, centrality pervasiveness, and rating scale construction when each of four latent trait models is fitted to the simulated data (Rasch rating scale and partial credit models and the generalized rating scale and partial credit models). Results indicate that the residual-expected correlation may be most appropriately sensitive to rater centrality under most conditions.

  11. Detecting and Correcting for Rater Effects in Performance Assessment.

    ERIC Educational Resources Information Center

    Raymond, Mark R.; Houston, Walter M.

    Performance rating systems frequently use multiple raters in order to improve the reliability of ratings. However, unless all candidates are rated by the same raters, some candidates will be at an unfair advantage or disadvantage solely because they were rated by more stringent or lenient raters. To obtain fair and accurate evaluations of…

  12. Automated Essay Scoring With e-rater® V.2

    ERIC Educational Resources Information Center

    Attali, Yigal; Burstein, Jill

    2006-01-01

    E-rater® has been used by the Educational Testing Service for automated essay scoring since 1999. This paper describes a new version of e-rater (V.2) that is different from other automated essay scoring systems in several important respects. The main innovations of e-rater V.2 are a small, intuitive, and meaningful set of features used for…

  13. Workplace-Based Assessment: Raters' Performance Theories and Constructs

    ERIC Educational Resources Information Center

    Govaerts, M. J. B.; Van de Wiel, M. W. J.; Schuwirth, L. W. T.; Van der Vleuten, C. P. M.; Muijtjens, A. M. M.

    2013-01-01

    Weaknesses in the nature of rater judgments are generally considered to compromise the utility of workplace-based assessment (WBA). In order to gain insight into the underpinnings of rater behaviours, we investigated how raters form impressions of and make judgments on trainee performance. Using theoretical frameworks of social cognition and…

  14. Retrieving the relief of a low-roughness surface using a two-step interferometric method with blind phase shift of a reference wave

    NASA Astrophysics Data System (ADS)

    Muravsky, Leonid I.; Kmet', Arkady B.; Voronyak, Taras I.

    2012-11-01

    A two-step interferometric method with blind phase shift of a reference wave for surface relief retrieval is considered. The possibility of using this method to reconstruct a macrorelief and microrelief of low-roughness surfaces is studied. Computer simulations have testified to the possibility of obtaining a reliable low-noise reconstruction of a low-roughness surface macrorelief and microrelief with standard deviation of the roughness heights up to λ/10 by using the developed interferogram-processing algorithm. The simulations have shown that the correlation approach, which is used to determine the reference wave blind phase shift, is more suitable for a rough surface than for a smooth one and the increase of surface roughness leads to a sharp decrease of error in comparison with that for a smooth surface. The experiment-based verification of the performance of the proposed interferometric method has been done using an experiment setup based on a Twyman-Green interferometer. Peculiarities of choosing the sampling interval for a rough surface recording are discussed. The experimental results that were obtained virtually coincided with the computer simulation results, proving the feasibility of retrieving both smooth and low-roughness surfaces by the considered method.

  15. Inter and intra-rater reliability of mobile device goniometer in measuring lumbar flexion range of motion.

    PubMed

    Bedekar, Nilima; Suryawanshi, Mayuri; Rairikar, Savita; Sancheti, Parag; Shyam, Ashok

    2014-01-01

    Evaluation of range of motion (ROM) is an integral part of the assessment of the musculoskeletal system. It is required in health, fitness and pathological conditions, and it is also used as an objective outcome measure. Several methods have been described for checking spinal flexion range of motion, each with its own advantages and disadvantages. Hence, a new device was introduced in this study, using the dual inclinometer method to measure lumbar spine flexion ROM. The aim was to determine the intra-rater and inter-rater reliability of a mobile device goniometer in measuring lumbar flexion range of motion. An iPod mobile device with goniometer software was used. The part being measured, i.e. the back of the subject, was suitably exposed. The subject stood with feet shoulder-width apart. The spinous processes of the second sacral vertebra (S2) and T12 were located and used as the reference points, and readings were taken. Three readings were taken for each of inter-rater and intra-rater reliability. Sufficient rest was given between flexion movements. Intra-rater reliability using ICC was r=0.920 and inter-rater r=0.812 at CI 95%. Validity r=0.95. The mobile device goniometer has high intra-rater reliability. The inter-rater reliability was moderate. This device can be used to assess range of motion of spine flexion, representing uni-planar movement.

  16. Inter-rater reliability of a modified version of Delitto et al.'s classification-based system for low back pain: a pilot study.

    PubMed

    Apeldoorn, Adri T; van Helvoirt, Hans; Ostelo, Raymond W; Meihuizen, Hanneke; Kamper, Steven J; van Tulder, Maurits W; de Vet, Henrica C W

    2016-05-01

    Observational inter-rater reliability study. To examine: (1) the inter-rater reliability of a modified version of Delitto et al.'s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3) the inter-rater reliability of algorithm decisions in patients who clearly fit into a subgroup (clear classifications) and those who do not (unclear classifications). Patients were examined twice on the same day by two of three participating physical therapists with different levels of familiarity with the system. Patients were classified into one of four classification groups. Raters were blind to the others' classification decision. In order to quantify the inter-rater reliability, percentages of agreement and Cohen's Kappa were calculated. A total of 36 patients were included (clear classification n = 23; unclear classification n = 13). The overall rate of agreement was 53% and the Kappa value was 0·34 [95% confidence interval (CI): 0·11-0·57], which indicated only fair inter-rater reliability. Inter-rater reliability for patients with a clear classification (agreement 52%, Kappa value 0·29) was not higher than for patients with an unclear classification (agreement 54%, Kappa value 0·33). Familiarity with the system (i.e. trained with written instructions and previous research experience with the algorithm) did not improve the inter-rater reliability. Our pilot study challenges the inter-rater reliability of the classification procedure in clinical practice. Therefore, more knowledge is needed about factors that affect the inter-rater reliability, in order to improve the clinical applicability of the classification scheme.
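
    The two statistics named in this record, percentage of agreement and Cohen's Kappa, can be sketched for two raters as follows. This is an illustrative sketch only: the function name and the category labels in the usage note are hypothetical, and no confidence interval is computed.

```python
from collections import Counter


def agreement_and_kappa(rater1, rater2):
    """Return (observed agreement, Cohen's kappa) for two raters' labels."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater1)
    # Observed agreement: share of patients given the same label by both raters
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal proportions per label
    c1, c2 = Counter(rater1), Counter(rater2)
    p_exp = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when both raters use one label")
    return p_obs, (p_obs - p_exp) / (1 - p_exp)
```

    With the made-up labels `["a", "a", "b", "b"]` and `["a", "b", "b", "b"]`, the function returns an agreement of 0.75 and a kappa of 0.5, which shows how kappa discounts the agreement expected by chance.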

  17. Inter-rater reliability of a modified version of Delitto et al.’s classification-based system for low back pain: a pilot study

    PubMed Central

    Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C. W.

    2016-01-01

    Study design: Observational inter-rater reliability study. Objectives: To examine: (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3) the inter-rater reliability of algorithm decisions in patients who clearly fit into a subgroup (clear classifications) and those who do not (unclear classifications). Methods: Patients were examined twice on the same day by two of three participating physical therapists with different levels of familiarity with the system. Patients were classified into one of four classification groups. Raters were blind to the others’ classification decision. In order to quantify the inter-rater reliability, percentages of agreement and Cohen’s Kappa were calculated. Results: A total of 36 patients were included (clear classification n = 23; unclear classification n = 13). The overall rate of agreement was 53% and the Kappa value was 0·34 [95% confidence interval (CI): 0·11–0·57], which indicated only fair inter-rater reliability. Inter-rater reliability for patients with a clear classification (agreement 52%, Kappa value 0·29) was not higher than for patients with an unclear classification (agreement 54%, Kappa value 0·33). Familiarity with the system (i.e. trained with written instructions and previous research experience with the algorithm) did not improve the inter-rater reliability. Conclusion: Our pilot study challenges the inter-rater reliability of the classification procedure in clinical practice. Therefore, more knowledge is needed about factors that affect the inter-rater reliability, in order to improve the clinical applicability of the classification scheme. PMID:27559279

  18. The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed-Response Items. Research Report. ETS RR-13-23

    ERIC Educational Resources Information Center

    Wang, Zhen; Yao, Lihua

    2013-01-01

    The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…

  19. Blindness - resources

    MedlinePlus

    Resources - blindness ... The following organizations are good resources for information on blindness: American Foundation for the Blind -- www.afb.org Foundation Fighting Blindness -- www.blindness.org National Eye Institute -- ...

  20. Inter-Rater Reliability and Intra-Rater Reliability of Assessing the 2-Minute Push-Up Test.

    PubMed

    Fielitz, Lynn; Coelho, Jeffrey; Horne, Thomas; Brechue, William

    2016-02-01

    The purpose of this study was to assess inter-rater reliability and intra-rater reliability of the 2-minute, 90° push-up test as utilized in the Army Physical Fitness Test. Analysis of rater assessment reliability included both total score agreement and agreement across individual push-up repetitions. This study utilized 8 Raters who assessed 15 different videotaped push-up performances over 4 iterations separated by a minimum of 1 week. The 15 push-up participants were videotaped during the semiannual Army Physical Fitness Test. Each Rater randomly viewed the 15 push-up performances and verbally responded with a "yes" or "no" to each push-up repetition. The data generated were analyzed using the Pearson product-moment correlation as well as the kappa, modified kappa and the intra-class correlation coefficient (3,1). An attribute agreement analysis was conducted to determine the percent of inter-rater and intra-rater agreement across individual push-ups. The results indicated that Raters varied a great deal in assessing push-ups. Over the 4 trials of 15 participants, the overall scores of the Raters varied between 3.0 and 35.7 push-ups. Post hoc comparisons found that there was a significant increase in the grand mean of push-ups from trials 1-3 to trial 4 (p < 0.05). Also, there was a significant difference among raters over the 4 trials (p < 0.05). Pearson correlation coefficients showed that inter-rater reliability coefficients were between 0.10 and 0.97. Intra-rater coefficients were between 0.48 and 0.99. Intra-rater agreement for individual push-up repetitions ranged from 41.8% to 84.8%. The results indicated that the raters failed to assess the same push-up repetition with the same score (below 70% agreement) and also failed to agree with one another (29%). Interestingly, as previously mentioned, scores on trial 4 increased significantly which might have been caused by rater drift or that the Raters did not maintain
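
    The intra-class correlation coefficient (3,1) named in this record can be sketched from a two-way ANOVA decomposition, following the usual Shrout and Fleiss form (two-way mixed effects, consistency, single rater). The function name and the input layout are assumptions for illustration; the record does not describe how the authors computed it.

```python
def icc_3_1(scores):
    """ICC(3,1) for a complete table where scores[i][j] is the rating of
    subject i by rater j (at least 2 subjects and 2 raters)."""
    n = len(scores)      # number of subjects (rows)
    k = len(scores[0])   # number of raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```

    Because this form measures consistency rather than absolute agreement, a rater who counts a constant number of extra push-ups for every participant does not lower the coefficient.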

  1. Body shape preferences: associations with rater body shape and sociosexuality.

    PubMed

    Price, Michael E; Pound, Nicholas; Dunn, James; Hopkins, Sian; Kang, Jinsheng

    2013-01-01

    There is accumulating evidence of condition-dependent mate choice in many species, that is, individual preferences varying in strength according to the condition of the chooser. In humans, for example, people with more attractive faces/bodies, and who are higher in sociosexuality, exhibit stronger preferences for attractive traits in opposite-sex faces/bodies. However, previous studies have tended to use only relatively simple, isolated measures of rater attractiveness. Here we use 3D body scanning technology to examine associations between strength of rater preferences for attractive traits in opposite-sex bodies, and raters' body shape, self-perceived attractiveness, and sociosexuality. For 118 raters and 80 stimuli models, we used a 3D scanner to extract body measurements associated with attractiveness (male waist-chest ratio [WCR], female waist-hip ratio [WHR], and volume-height index [VHI] in both sexes) and also measured rater self-perceived attractiveness and sociosexuality. As expected, WHR and VHI were important predictors of female body attractiveness, while WCR and VHI were important predictors of male body attractiveness. Results indicated that male rater sociosexuality scores were positively associated with strength of preference for attractive (low) VHI and attractive (low) WHR in female bodies. Moreover, male rater self-perceived attractiveness was positively associated with strength of preference for low VHI in female bodies. The only evidence of condition-dependent preferences in females was a positive association between attractive VHI in female raters and preferences for attractive (low) WCR in male bodies. No other significant associations were observed in either sex between aspects of rater body shape and strength of preferences for attractive opposite-sex body traits. These results suggest that among male raters, rater self-perceived attractiveness and sociosexuality are important predictors of preference strength for attractive opposite

  2. Effects of Rating Training on Inter-Rater Consistency for Developing a Dental Hygiene Clinical Rater Qualification System

    PubMed Central

    Oh, Jung Sook; Chae, Moungae; Jung, Jae Yeon; Bae, Sung Suk

    2007-01-01

    We tried to develop itemized evaluation criteria and a clinical rater qualification system through rating training of inter-rater consistency for experienced clinical dental hygienists and dental hygiene clinical educators. A total of 15 clinical dental hygienists with 1-year careers participated as clinical examination candidates, while 5 dental hygienists with 3-year educations and clinical careers or longer participated as clinical raters. They all took the clinical examination as examinees. The results were compared, and the consistency of competence was measured. The comparison of clinical competence between candidates and clinical raters showed that the candidate group's mean clinical competence ranged from 2.96 to 3.55 on a 5-point system in a total of 3 instruments (Probe, Explorer, Curet), while the clinical rater group's mean clinical competence ranged from 4.05 to 4.29. There was a higher inter-rater consistency after education of raters in the following 4 items: Probe, Explorer, Curet, and insertion on distal surface. The mean score distribution of clinical raters ranged from 75% to 100%, which was more uniform in the competence to detect an artificial calculus than that of candidates (25% to 100%). According to the above results, there was a necessity in the operating clinical rater qualification system for comprehensive dental hygiene clinicians. Furthermore, in order to execute the clinical rater qualification system, it will be necessary to keep conducting a series of studies on educational content, time, frequency, and educator level. PMID:19224006

  3. Pooled analysis of two randomized, double-blind trials comparing proposed biosimilar LA-EP2006 with reference pegfilgrastim in breast cancer.

    PubMed

    Blackwell, K; Gascon, P; Jones, C M; Nixon, A; Krendyukov, A; Nakov, R; Li, Y; Harbeck, N

    2017-09-01

    Following the functional and physicochemical characterization of a proposed biosimilar, comparative clinical studies help to confirm biosimilarity by demonstrating similar safety and efficacy to the reference product in a sensitive patient population. LA-EP2006 is a proposed biosimilar that has been developed for pegfilgrastim, a long-acting form of granulocyte colony-stimulating factor for the prevention of neutropenia. The current analysis reports data pooled from two independent, multinational, prospective, randomized, controlled, double-blind phase III studies of similar design comparing the safety and efficacy of reference pegfilgrastim with LA-EP2006 in patients with breast cancer receiving myelotoxic (neo)adjuvant TAC (docetaxel, doxorubicin, and cyclophosphamide) chemotherapy and requiring granulocyte colony-stimulating factor. A total of 624 patients were randomized in the PROTECT-1 and PROTECT-2 studies (NCT01735175; NCT01516736) (LA-EP2006: n = 314; reference: n = 310). Baseline characteristics of patients were well balanced across treatment groups. The primary end point, mean duration of severe neutropenia in the first chemotherapy cycle, was similar in both the LA-EP2006 and reference groups (1.05 ± 1.055 days versus 1.01 ± 0.958 days), with a treatment difference of -0.04 days [95% confidence interval (CI): -0.19 to 0.11] that met the equivalence criteria (the 95% CI was within the defined margin of ±1 day). Secondary end points, such as the nadir of absolute neutrophil count and the incidence of febrile neutropenia, were also similar between LA-EP2006 and reference pegfilgrastim. The safety and tolerability profile of LA-EP2006 was similar to that observed with reference pegfilgrastim, and there were no reports of neutralizing antibodies. This pooled analysis confirms, as part of a totality-of-evidence approach, that the proposed biosimilar pegfilgrastim LA-EP2006 has a comparable efficacy and safety profile to reference
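
    The equivalence criterion stated in this record (the 95% CI lying within a margin of ±1 day) can be sketched as a simple interval check. The function name is hypothetical, and whether the boundaries count as inside the margin is an assumption; strict inequalities are used here.

```python
def ci_within_margin(ci_low, ci_high, margin):
    """Return True if the whole confidence interval lies strictly inside
    the symmetric equivalence margin (-margin, +margin)."""
    if margin <= 0 or ci_low > ci_high:
        raise ValueError("margin must be positive and ci_low <= ci_high")
    return -margin < ci_low and ci_high < margin


# Values quoted in the abstract: 95% CI -0.19 to 0.11 days, margin of 1 day
meets_criterion = ci_within_margin(-0.19, 0.11, 1.0)
```

    With the interval quoted above, `meets_criterion` is `True`; an interval extending past either bound, such as -1.2 to 0.3, would fail the check.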

  4. How Good Are Our Raters? Rater Errors in Clinical Skills Assessment

    ERIC Educational Resources Information Center

    Iramaneerat, Cherdsak; Yudkowsky, Rachel

    2006-01-01

    A multi-faceted Rasch measurement (MFRM) model was used to analyze a clinical skills assessment of 173 fourth-year medical students in a Midwestern medical school to investigate four types of rater errors: leniency, inconsistency, halo, and restriction of range. Each student performed six clinical tasks with six standardized patients (SPs), who…

  5. Inter-rater reliability of an observation-based ergonomics assessment checklist for office workers.

    PubMed

    Pereira, Michelle Jessica; Straker, Leon Melville; Comans, Tracy Anne; Johnston, Venerina

    2016-12-01

    To establish the inter-rater reliability of an observation-based ergonomics assessment checklist for computer workers. A 37-item (38-item if a laptop was part of the workstation) comprehensive observational ergonomics assessment checklist comparable to government guidelines and up to date with empirical evidence was developed. Two trained practitioners assessed full-time office workers performing their usual computer-based work and evaluated the suitability of workstations used. Practitioners assessed each participant consecutively. The order of assessors was randomised, and the second assessor was blinded to the findings of the first. Unadjusted kappa coefficients between the raters were obtained for the overall checklist and subsections that were formed from question-items relevant to specific workstation equipment. Twenty-seven office workers were recruited. The inter-rater reliability between two trained practitioners achieved moderate to good reliability for all except one checklist component. This checklist has mostly moderate to good reliability between two trained practitioners. Practitioner Summary: This reliable ergonomics assessment checklist for computer workers was designed using accessible government guidelines and supplemented with up-to-date evidence. Employers in Queensland (Australia) can fulfil legislative requirements by using this reliable checklist to identify and subsequently address potential risk factors for work-related injury to provide a safe working environment.
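
    As a rough illustration of the unadjusted (unweighted) kappa coefficient mentioned in this abstract, the sketch below computes Cohen's kappa for two raters. The function name `cohen_kappa` and the ratings are made up for illustration and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical checklist item scored 1 (suitable) or 0 (unsuitable).
first = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
second = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
print(round(cohen_kappa(first, second), 2))  # prints 0.6
```

    Here the raters agree on 8 of 10 items (0.8) while chance agreement is 0.5, giving (0.8 - 0.5) / (1 - 0.5) = 0.6.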

  6. Inter-rater reliability of select physical examination procedures in patients with neck pain.

    PubMed

    Hanney, William J; George, Steven Z; Kolber, Morey J; Young, Ian; Salamh, Paul A; Cleland, Joshua A

    2014-07-01

    This study evaluated the inter-rater reliability of select examination procedures in patients with neck pain (NP) conducted over a 24- to 48-h period. Twenty-two patients with mechanical NP participated in a standardized examination. One examiner performed standardized examination procedures and a second blinded examiner repeated the procedures 24-48 h later with no treatment administered between examinations. Inter-rater reliability was calculated with the Cohen Kappa and weighted Kappa for ordinal data while continuous level data were calculated using an intraclass correlation coefficient model 2,1 (ICC2,1). Coefficients for categorical variables ranged from poor to moderate agreement (-0.22 to 0.70 Kappa) and coefficients for continuous data ranged from slight to moderate (ICC2,1 0.28-0.74). The standard error of measurement for cervical range of motion ranged from 5.3° to 9.9° while the minimal detectable change ranged from 12.5° to 23.1°. This study is the first to report inter-rater reliability values for select components of the cervical examination in those patients with NP performed 24-48 h after the initial examination. There was considerably less reliability when compared to previous studies, thus clinicians should consider how the passage of time may influence variability in examination findings over a 24- to 48-h period.
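
    The ICC model 2,1 named in this abstract is commonly understood as the two-way random-effects, absolute-agreement, single-rater form. A minimal sketch of that formula from two-way ANOVA mean squares follows; the function name `icc_2_1` and the ratings table are illustrative assumptions, not the study's data or code.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: one row per subject, one column per rater."""
    n = len(ratings)      # subjects
    k = len(ratings[0])   # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative table: 6 subjects scored by 4 raters.
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(table), 2))  # prints 0.29
```

    Because this form penalises systematic differences between raters (the column term), it can be much lower than a consistency-only ICC on the same data.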

  7. Rater Accuracy and Training Group Effects in Expert- and Supervisor-Based Monitoring Systems

    ERIC Educational Resources Information Center

    Baird, Jo-Anne; Meadows, Michelle; Leckie, George; Caro, Daniel

    2017-01-01

    This study evaluated rater accuracy with rater-monitoring data from high stakes examinations in England. Rater accuracy was estimated with cross-classified multilevel modelling. The data included face-to-face training and monitoring of 567 raters in 110 teams, across 22 examinations, giving a total of 5500 data points. Two rater-monitoring systems…

  9. Pelvic floor muscle injuries 6 weeks post partum-an intra- and inter-rater study.

    PubMed

    Staer-Jensen, Jette; Siafarikas, Franziska; Hilde, Gunvor; Braekken, Ingeborg H; Bø, Kari; Engh, Marie Ellström

    2013-09-01

    To evaluate intra- and inter-rater reliability when diagnosing major defects, and inter-rater reliability of diagnosing minor defects and muscle thickness of the pubovisceral muscle in primiparous women 6 weeks after vaginal delivery, using 3D/4D transperineal ultrasound. Forty primiparous women were assessed using 3D/4D transperineal ultrasound. Volumes were acquired at maximal pelvic floor muscle (PFM) contraction, and muscle defects were diagnosed using tomographic ultrasound imaging (TUI) of the axial plane. Thickness was measured in three central levels of TUI. The stored volumes were analyzed offline by two investigators blinded to each other's results and the women's clinical data. Cohen's kappa (κ) and percentage agreement were calculated for defects; intraclass correlation coefficients (ICC) with 95% confidence intervals were calculated for thickness. Excellent intra-rater values were found for all major defects. Inter-rater values were excellent for bilateral and right-sided defects and good for left-sided defects. Agreement for minor defects was poor. For thickness, an ICC of 0.72 was found for the left side and 0.48 for the right side, although up to half of the cases had to be excluded owing to poor demarcation of the muscle. Tomographic ultrasound imaging of the axial plane using three central slices seems to be a reliable tool for detecting major pubovisceral muscle defects shortly after childbirth. Minor defects showed low reliability. Muscle thickness measurements showed moderate reliability, but too many cases had to be excluded for this to be a useful method for determining muscle thinning 6 weeks after delivery. Copyright © 2012 Wiley Periodicals, Inc.

  10. Training Raters to Assess Adult ADHD: Reliability of Ratings

    ERIC Educational Resources Information Center

    Adler, Lenard A.; Spencer, Thomas; Faraone, Stephen V.; Reimherr, Fred W.; Kelsey, Douglas; Michelson, David; Biederman, Joseph

    2005-01-01

    The standardization of ADHD ratings in adults is important given their differing symptom presentation. The authors investigated the agreement and reliability of rater standardization in a large-scale trial of atomoxetine in adults with ADHD. Training of 91 raters for the investigator-administered ADHD Rating Scale (ADHDRS-IV-Inv) occurred prior to…

  11. A Comparison of Assessment Methods and Raters in Product Creativity

    ERIC Educational Resources Information Center

    Lu, Chia-Chen; Luh, Ding-Bang

    2012-01-01

    Although previous studies have attempted to use different experiences of raters to rate product creativity by adopting the Consensus Assessment Method (CAT) approach, the validity of replacing CAT with another measurement tool has not been adequately tested. This study aimed to compare raters with different levels of experience (expert vs.…

  12. Rater Strategies for Reaching Agreement on Pupil Text Quality

    ERIC Educational Resources Information Center

    Jølle, Lennart

    2015-01-01

    Novice members of a Norwegian national rater panel tasked with assessing Year 8 pupils' written texts were studied during three successive preparation sessions (2011-2012). The purpose was to investigate how the raters successfully make use of different decision-making strategies in an assessment situation where pre-set criteria and standards give…

  13. Individual Feedback to Enhance Rater Training: Does It Work?

    ERIC Educational Resources Information Center

    Elder, Cathie; Knoch, Ute; Barkhuizen, Gary; von Randow, Janet

    2005-01-01

    Research on the utility of feedback to raters in the form of performance reports has produced mixed findings (Lunt, Morton, & Wigglesworth, 1994; Wigglesworth, 1993) and has thus far been trialled only in oral assessment contexts. This article reports on a study investigating raters' attitudes and responsiveness to feedback on their ratings of…

  14. A Method To Compare Rater Severity across Several Administrations.

    ERIC Educational Resources Information Center

    O'Neill, Thomas R.; Lunz, Mary E.

    This paper illustrates a method to study rater severity across exam administrations. A multi-facet Rasch model defined the ratings as being dominated by four facets: examinee ability, rater severity, project difficulty, and task difficulty. Ten years of data from administrations of a histotechnology performance assessment were pooled and analyzed…

  15. Twins and the Study of Rater (Dis)agreement

    ERIC Educational Resources Information Center

    Bartels, Meike; Boomsma, Dorret I.; Hudziak, James J.; van Beijsterveldt, Toos C. E. M.; van den Oord, Edwin J. C. G.

    2007-01-01

    Genetically informative data can be used to address fundamental questions concerning the measurement of behavior in children. The authors illustrate this with longitudinal multiple-rater data on internalizing problems in twins. Valid information on the behavior of a child is obtained for behavior that multiple raters agree upon and for…

  16. Measuring the Impact of Rater Negotiation in Writing Performance Assessment

    ERIC Educational Resources Information Center

    Trace, Jonathan; Janssen, Gerriet; Meier, Valerie

    2017-01-01

    Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…

  19. Intra-rater and Inter-rater Reliability of Mandibular Range of Motion Measures Considering a Neutral Craniocervical Position

    PubMed Central

    Beltran-Alacreu, Hector; López-de-Uralde-Villanueva, Ibai; Paris-Alemany, Alba; Angulo-Díaz-Parreño, Santiago; La Touche, Roy

    2014-01-01

    [Purpose] The aim of this study was to determine the inter-rater and intra-rater reliability of the mandibular range of motion (ROM) considering the neutral craniocervical position when performing the measurements. [Subjects and Methods] The sample consisted of 50 asymptomatic subjects. Two raters measured four mandibular ROMs (maximal mouth opening (MMO), laterals, and protrusion) using the craniomandibular scale. Subjects alternated between raters, receiving two complete trials per day, two days apart. Intra- and inter-rater reliability was determined using intra-class correlation coefficients (ICCs). Bland-Altman analysis was used to assess reliability, bias, and variability. Finally, the standard error of measurement (SEM) and minimal detectable change (MDC) were analyzed to measure responsiveness. [Results] Reliability was good for MMO (inter-rater, ICC= 0.95−0.96; intra-rater, ICC= 0.95−0.96) and for protrusion (inter-rater, ICC= 0.92−0.94; intra-rater, ICC= 0.93−0.96). Reliability was moderate for lateral excursions. The MMO and protrusion SEM ranged from 0.74 to 0.82 mm and from 0.29 to 0.49 mm, while the MDCs ranged from 1.73 to 1.91 mm and from 0.69 to 0.14 mm respectively. The analysis showed no random or systematic error, suggesting that a learning effect did not affect reliability. [Conclusion] A standardized protocol for assessment of mandibular ROM in a neutral craniocervical position obtained good inter- and intra-rater reliability for MMO and protrusion and moderate inter- and intra-rater reliability for lateral excursions. PMID:25013296
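
    The relationship between SEM and MDC reported in this abstract can be sketched with the usual formula MDC = SEM × z × √2. The abstract does not state the confidence level; the default z = 1.645 (a 90% level) is an assumption chosen because it approximately reproduces the reported MMO values, and the function name is hypothetical.

```python
import math


def minimal_detectable_change(sem, z=1.645):
    """MDC from the standard error of measurement.

    z = 1.645 corresponds to a 90% confidence level (an assumption here);
    sqrt(2) accounts for measurement error in both of the two measurements.
    """
    return sem * z * math.sqrt(2)


# SEM bounds quoted in the abstract, in millimetres.
for sem in (0.74, 0.82, 0.29, 0.49):
    print(sem, round(minimal_detectable_change(sem), 2))
```

    Under this assumption the MMO bounds come out at about 1.72 and 1.91 mm, close to the reported 1.73 to 1.91 mm, and the protrusion SEM of 0.49 mm corresponds to about 1.14 mm.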

  20. How faculty members experience workplace-based assessment rater training: a qualitative study.

    PubMed

    Kogan, Jennifer R; Conforti, Lisa N; Bernabeo, Elizabeth; Iobst, William; Holmboe, Eric

    2015-07-01

    Direct observation of clinical skills is a common approach in workplace-based assessment (WBA). Despite widespread use of the mini-clinical evaluation exercise (mini-CEX), faculty development efforts are typically required to improve assessment quality. Little consensus exists regarding the most effective training methods, and few studies explore faculty members' reactions to rater training. This study was conducted to qualitatively explore the experiences of faculty staff with two rater training approaches - performance dimension training (PDT) and a modified approach to frame of reference training (FoRT) - to elucidate how such faculty development can be optimally designed. In a qualitative study of a multifaceted intervention using complex intervention principles, 45 out-patient resident faculty preceptors from 26 US internal medicine residency programmes participated in a rater training faculty development programme. All participants were interviewed individually and in focus groups during and after the programme to elicit how the training influenced their approach to assessment. A constructivist grounded theory approach was used to analyse the data. Many participants perceived that rater training positively influenced their approach to direct observation and feedback, their ability to use entrustment as the standard for assessment, and their own clinical skills. However, barriers to implementation and change included: (i) a preference for holistic assessment over frameworks; (ii) challenges in defining competence; (iii) difficulty in changing one's approach to assessment, and (iv) concerns about institutional culture and buy-in. Rater training using PDT and a modified approach to FoRT can provide faculty staff with assessment skills that are congruent with principles of criterion-referenced assessment and entrustment, and foundational principles of competency-based education, while providing them with opportunities to reflect on their own clinical skills

  1. Inter-rater Reliability Assessment of ASPECT-R

    PubMed Central

    Bossie, Cynthia A.; Williamson, David; Mao, Lian; Kurut, Clennon

    2016-01-01

    Objective: The increasing importance of real-world data for clinical and policy decision making is driving a need for close attention to the pragmatic versus explanatory features of trial designs. ASPECT-R (A Study Pragmatic-Explanatory Characterization Tool-Rating) is an instrument informed by the PRECIS tool, which was developed to assist researchers in designing trials that are more pragmatic or explanatory. ASPECT-R refined the PRECIS domains and includes a detailed anchored rating system. This analysis established the inter-rater reliability of ASPECT-R. Design: Nine raters (identified from a convenience sample of persons knowledgeable about psychiatry clinical research/study design) received ASPECT-R training materials and 12 study publications. Selected studies assessed antipsychotic treatment in schizophrenia, were published in peer-reviewed journals, and represented a range of studies across a pragmatic-explanatory continuum as determined by authors (CB/LA). After completing training, raters reviewed the 12 studies and rated the study domains using ASPECT-R. Intraclass correlation coefficients were estimated for total and domain scores. Qualitative ratings were then assigned to describe the inter-rater reliability. Results: ASPECT-R scores for the 12 studies were completed by seven raters. The ASPECT-R total score intraclass correlation coefficient was 0.87, corresponding to excellent inter-rater reliability. Domain intraclass correlation coefficients ranged from 0.85 to 0.31, corresponding to excellent to poor inter-rater reliability. Conclusion: The inter-rater reliability of the ASPECT-R total score was excellent, with excellent to good inter-rater reliability for most domains. The fair to poor inter-rater reliability for two domains may reflect a need for improved domain definition, anchoring, or training materials. ASPECT-R can be used to help understand the pragmatic-explanatory nature of completed or planned trials. PMID:27354926

  2. Comparison of haloperidol and midazolam in restless management of patients referred to the Emergency Department: A double-blinded, randomized clinical trial

    PubMed Central

    Esmailian, Mehrdad; Ahmadi, Omid; Taheri, Mehrsa; Zamani, Majid

    2015-01-01

    Background: Restless and violent behaviors are common in Emergency Departments (EDs) and in most cases require therapeutic intervention. First-generation antipsychotic drugs are among the most applicable therapeutic agents for managing such patients, but their use has some limitations. Some studies suggest midazolam as an alternative. This study therefore aimed to compare the efficacy and safety of haloperidol and midazolam in managing restlessness in patients referred to EDs. Materials and Methods: This double-blinded trial was conducted on patients who needed sedation and were referred to the ED of Alzahra Hospital, Isfahan, Iran, in 2014. The patients were randomly assigned to two groups receiving intramuscular haloperidol (5 mg) or midazolam (2.5 mg for those weighing <50 kg and 5 mg for those weighing >50 kg). The time to achieve sedation, need for a rescue dose, need for resedation within the first 60 min, and adverse drug effects were compared between the groups. Results: Forty-eight patients were enrolled in the study. The mean age in the haloperidol and midazolam groups was 44.8 ± 4.1 years and 45.5 ± 4.7 years, respectively (P = 0.91). The mean time of sedation in the haloperidol and midazolam groups was 5.6 ± 0.3 min and 5.2 ± 0.1 min, respectively (P = 0.31). The mean time of full consciousness after sedation was 36.2 ± 4.5 min and 38.2 ± 3.4 min in the haloperidol and midazolam groups, respectively (P = 0.72). On average, time to arousal in the midazolam group was 10.33 min longer than in the haloperidol group, but the difference was not statistically significant. Conclusion: The results of the present study show that midazolam and haloperidol have similar efficacy in the treatment of restlessness, with the same recovery time from drug effects, in patients referred to the ED. In addition, no adverse effects were observed in this study. PMID:26759570

  3. An assessment of the inter-rater reliability of the ASA physical status score in the orthopaedic trauma population.

    PubMed

    Ihejirika, Rivka C; Thakore, Rachel V; Sathiyakumar, Vasanth; Ehrenfeld, Jesse M; Obremskey, William T; Sethi, Manish K

    2015-04-01

    Although recent literature has demonstrated the utility of the ASA score in predicting postoperative length of stay, complication risk and potential utilization of other hospital resources, the ASA score has been inconsistently assigned by anaesthesia providers. This study tested the reliability of assignment of the ASA score classification by both attending anaesthesiologists and anaesthesia residents specifically among the orthopaedic trauma patient population. Nine case-based scenarios were created involving preoperative patients with isolated operative orthopaedic trauma injuries. The cases were created and assigned a reference score by both an attending anaesthesiologist and orthopaedic trauma surgeon. Attending and resident anaesthesiologists were asked to assign an ASA score for each case. Rater versus reference and inter-rater agreement amongst respondents was then analyzed utilizing Fleiss's Kappa and weighted and unweighted Cohen's Kappa. Thirty-three individuals provided ASA scores for each of the scenarios. The average rater versus reference reliability was substantial (Kw=0.78, SD=0.131, 95% CI=0.73-0.83). The average rater versus reference Kuw was also substantial (Kuw=0.64, SD=0.21, 95% CI=0.56-0.71). The inter-rater reliability as evaluated by Fleiss's Kappa was moderate (K=0.51, p<.001). Inter-rater agreement within the group of attendings (K=0.50, p<.001) and within the group of residents (K=0.55, p<.001) was moderate in both cases. There was a significant increase in the level of inter-rater reliability from the self-reported 'very uncomfortable' participants to the 'very comfortable' participants (uncomfortable K=0.43, comfortable K=0.59, p<.001). This study shows substantial agreement strength for reliability of the ASA score among anaesthesiologists when evaluating orthopaedic trauma patients. The significant increase in inter-rater reliability based on anaesthesiologists' comfort with the ASA scoring method implies a need for further evaluation

  4. Statistical fusion of surface labels provided by multiple raters

    NASA Astrophysics Data System (ADS)

    Bogovic, John A.; Landman, Bennett A.; Bazin, Pierre-Louis; Prince, Jerry L.

    2010-03-01

    Studies of the size and morphology of anatomical structures rely on accurate and reproducible delineation of the structures, obtained either by human raters or automatic segmentation algorithms. Measures of reproducibility and variability are vital aspects of such studies and are usually estimated using repeated scans or repeated delineations (in the case of human raters). Methods exist for simultaneously estimating the true structure and rater performance parameters from multiple segmentations and have been demonstrated on volumetric images. In this work, we extend the applicability of previous methods onto two-dimensional surfaces parameterized as triangle meshes. Label homogeneity is enforced using a Markov random field formulated with an energy that addresses the challenges introduced by the surface parameterization. The method was tested using both simulated raters and cortical gyral labels. Simulated raters are computed using a global error model as well as a novel and more realistic boundary error model. We study the impact of raters and their accuracy based on both models, and show how effectively this method estimates the true segmentation on simulated surfaces. The Markov random field formulation was shown to effectively enforce homogeneity for raters suffering from label noise. We demonstrated that our method provides substantial improvements in accuracy over single-atlas methods for all experimental conditions.

  5. Assessment of Differential Rater Functioning in Latent Classes with New Mixture Facets Models.

    PubMed

    Jin, Kuan-Yu; Wang, Wen-Chung

    2017-01-01

    Multifaceted data are very common in the human sciences. For example, test takers' responses to essay items are marked by raters. If multifaceted data are analyzed with standard facets models, it is assumed there is no interaction between facets. In reality, an interaction between facets can occur, referred to as differential facet functioning. A special case of differential facet functioning is the interaction between ratees and raters, referred to as differential rater functioning (DRF). In existing DRF studies, the group membership of ratees is known, such as gender or ethnicity. However, DRF may occur when the group membership is unknown (latent) and thus has to be estimated from data. To solve this problem, in this study, we developed a new mixture facets model to assess DRF when the group membership is latent and we provided two empirical examples to demonstrate its applications. A series of simulations were also conducted to evaluate the performance of the new model in the DRF assessment in the Bayesian framework. Results supported the use of the mixture facets model because all parameters were recovered fairly well, and the more data there were, the better the parameter recovery.

  6. Does a Rater's Professional Background Influence Communication Skills Assessment?

    PubMed

    Artemiou, Elpida; Hecker, Kent G; Adams, Cindy L; Coe, Jason B

    2015-01-01

    There is increasing pressure in veterinary education to teach and assess communication skills, with the Objective Structured Clinical Examination (OSCE) being the most common assessment method. Previous research reveals that raters are a large source of variance in OSCEs. This study focused on examining the effect of raters' professional background as a source of variance when assessing students' communication skills. Twenty-three raters were categorized according to their professional background: clinical sciences (n=11), basic sciences (n=4), clinical communication (n=5), or hospital administrator/clinical skills technicians (n=3). Raters from each professional background were assigned to the same station and assessed the same students during two four-station OSCEs. Students were in year 2 of their pre-clinical program. Repeated-measures ANOVA results showed that OSCE scores awarded by the rater groups differed significantly: (F(matched_station_1) [2,91]=6.97, p=.002), (F(matched_station_2) [3,90]=13.95, p=.001), (F(matched_station_3) [3,90]=8.76, p=.001), and (F(matched_station_4) [2,91]=30.60, p=.001). A significant time effect between the two OSCEs was calculated for matched stations 1, 2, and 4, indicating improved student performances. Raters with a clinical communication skills background assigned scores that were significantly lower compared to the other rater groups. Analysis of written feedback provided by the clinical sciences raters showed that they were influenced by the students' clinical knowledge of the case and that they did not rely solely on the communication checklist items. This study shows that it is important to consider rater background both in recruitment and training programs for communication skills' assessment.

  7. A rater training protocol to assess team performance.

    PubMed

    Eppich, Walter; Nannicelli, Anna P; Seivert, Nicholas P; Sohn, Min-Woong; Rozenfeld, Ranna; Woods, Donna M; Holl, Jane L

    2015-01-01

    Simulation-based methodologies are increasingly used to assess teamwork and communication skills and provide team training. Formative feedback regarding team performance is an essential component. While effective use of simulation for assessment or training requires accurate rating of team performance, examples of rater-training programs in health care are scarce. We describe our rater training program and report interrater reliability during phases of training and independent rating. We selected an assessment tool shown to yield valid and reliable results and developed a rater training protocol with an accompanying rater training handbook. The rater training program was modeled after previously described high-stakes assessments in the setting of 3 facilitated training sessions. Adjacent agreement was used to measure interrater reliability between raters. Nine raters with a background in health care and/or patient safety evaluated team performance of 42 in-situ simulations using post-hoc video review. Adjacent agreement increased from the second training session (83.6%) to the third training session (85.6%) when evaluating the same video segments. Adjacent agreement for the rating of overall team performance was 78.3%, which was added for the third training session. Adjacent agreement was 97% 4 weeks posttraining and 90.6% at the end of independent rating of all simulation videos. Rater training is an important element in team performance assessment, and providing examples of rater training programs is essential. Articulating key rating anchors promotes adequate interrater reliability. In addition, using adjacent agreement as a measure allows differentiation between high- and low-performing teams on video review. © 2015 The Alliance for Continuing Education in the Health Professions, the Society for Academic Continuing Medical Education, and the Council on Continuing Medical Education, Association for Hospital Medical Education.
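
    The abstract does not define adjacent agreement. A minimal sketch under the common definition (the percentage of paired ratings that are identical or differ by at most one scale point) is given below; the function name, the `tolerance` parameter, and the ratings are hypothetical.

```python
def adjacent_agreement(rater_a, rater_b, tolerance=1):
    """Percentage of paired ratings that differ by at most `tolerance` points."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    close = sum(abs(a - b) <= tolerance for a, b in zip(rater_a, rater_b))
    return 100.0 * close / len(rater_a)


# Made-up scores from two raters on a 5-point scale.
first = [3, 4, 2, 5, 1]
second = [3, 5, 4, 5, 2]
print(adjacent_agreement(first, second))  # prints 80.0
```

    Setting `tolerance=0` gives exact agreement, which is always less than or equal to the adjacent figure for the same ratings.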

  8. A phase III randomised, double-blind, parallel-group study comparing SB4 with etanercept reference product in patients with active rheumatoid arthritis despite methotrexate therapy

    PubMed Central

    Emery, Paul; Vencovský, Jiří; Sylwestrzak, Anna; Leszczyński, Piotr; Porawska, Wieslawa; Baranauskaite, Asta; Tseluyko, Vira; Zhdan, Vyacheslav M; Stasiuk, Barbara; Milasiene, Roma; Barrera Rodriguez, Aaron Alejandro; Cheong, Soo Yeon; Ghil, Jeehoon

    2017-01-01

    Objectives: To compare the efficacy and safety of SB4 (an etanercept biosimilar) with reference product etanercept (ETN) in patients with moderate to severe rheumatoid arthritis (RA) despite methotrexate (MTX) therapy. Methods: This is a phase III, randomised, double-blind, parallel-group, multicentre study with a 24-week primary endpoint. Patients with moderate to severe RA despite MTX treatment were randomised to receive weekly dose of 50 mg of subcutaneous SB4 or ETN. The primary endpoint was the American College of Rheumatology 20% (ACR20) response at week 24. Other efficacy endpoints as well as safety, immunogenicity and pharmacokinetic parameters were also measured. Results: 596 patients were randomised to either SB4 (N=299) or ETN (N=297). The ACR20 response rate at week 24 in the per-protocol set was 78.1% for SB4 and 80.3% for ETN. The 95% CI of the adjusted treatment difference was −9.41% to 4.98%, which is completely contained within the predefined equivalence margin of −15% to 15%, indicating therapeutic equivalence between SB4 and ETN. Other efficacy endpoints and pharmacokinetic endpoints were comparable. The incidence of treatment-emergent adverse events was comparable (55.2% vs 58.2%), and the incidence of antidrug antibody development up to week 24 was lower in SB4 compared with ETN (0.7% vs 13.1%). Conclusions: SB4 was shown to be equivalent with ETN in terms of efficacy at week 24. SB4 was well tolerated with a lower immunogenicity profile. The safety profile of SB4 was comparable with that of ETN. Trial registration numbers: NCT01895309, EudraCT 2012-005026-30. PMID:26150601

  9. A phase III randomised, double-blind, parallel-group study comparing SB4 with etanercept reference product in patients with active rheumatoid arthritis despite methotrexate therapy.

    PubMed

    Emery, Paul; Vencovský, Jiří; Sylwestrzak, Anna; Leszczyński, Piotr; Porawska, Wieslawa; Baranauskaite, Asta; Tseluyko, Vira; Zhdan, Vyacheslav M; Stasiuk, Barbara; Milasiene, Roma; Barrera Rodriguez, Aaron Alejandro; Cheong, Soo Yeon; Ghil, Jeehoon

    2017-01-01

    To compare the efficacy and safety of SB4 (an etanercept biosimilar) with reference product etanercept (ETN) in patients with moderate to severe rheumatoid arthritis (RA) despite methotrexate (MTX) therapy. This is a phase III, randomised, double-blind, parallel-group, multicentre study with a 24-week primary endpoint. Patients with moderate to severe RA despite MTX treatment were randomised to receive a weekly dose of 50 mg of subcutaneous SB4 or ETN. The primary endpoint was the American College of Rheumatology 20% (ACR20) response at week 24. Other efficacy endpoints as well as safety, immunogenicity and pharmacokinetic parameters were also measured. 596 patients were randomised to either SB4 (N=299) or ETN (N=297). The ACR20 response rate at week 24 in the per-protocol set was 78.1% for SB4 and 80.3% for ETN. The 95% CI of the adjusted treatment difference was -9.41% to 4.98%, which is completely contained within the predefined equivalence margin of -15% to 15%, indicating therapeutic equivalence between SB4 and ETN. Other efficacy endpoints and pharmacokinetic endpoints were comparable. The incidence of treatment-emergent adverse events was comparable (55.2% vs 58.2%), and the incidence of antidrug antibody development up to week 24 was lower with SB4 than with ETN (0.7% vs 13.1%). SB4 was shown to be equivalent to ETN in terms of efficacy at week 24. SB4 was well tolerated with a lower immunogenicity profile. The safety profile of SB4 was comparable with that of ETN. NCT01895309, EudraCT 2012-005026-30.

  10. Comparing biosimilar SB2 with reference infliximab after 54 weeks of a double-blind trial: clinical, structural and safety results.

    PubMed

    Smolen, Josef S; Choe, Jung-Yoon; Prodanovic, Nenad; Niebrzydowski, Jaroslaw; Staykov, Ivan; Dokoupilova, Eva; Baranauskaite, Asta; Yatsyshyn, Roman; Mekic, Mevludin; Porawska, Wieskawa; Ciferska, Hana; Jedrychowicz-Rosiak, Krystyna; Zielinska, Agnieszka; Choi, Jasmine; Rho, Young Hee

    2017-10-01

    SB2 is a biosimilar to the reference infliximab (INF). Similar efficacy, safety and immunogenicity between SB2 and INF up to 30 weeks were previously reported. This report investigates such clinical similarity up to 54 weeks, including structural joint damage. In this phase III, double-blind, parallel-group, multicentre study, patients with moderate to severe RA despite MTX were randomized (1:1) to receive 3 mg/kg of either SB2 or INF at weeks 0, 2 and 6 and every 8 weeks thereafter. Dose escalation by 1.5 mg/kg up to a maximum dose of 7.5 mg/kg was allowed after week 30. Efficacy, safety and immunogenicity were measured at each visit up to week 54. Radiographic damage evaluated by modified total Sharp score was measured at baseline and week 54. A total of 584 patients were randomized to receive SB2 (n = 291) or INF (n = 293). The rate of radiographic progression was comparable between SB2 and INF (mean modified total Sharp score difference: SB2, 0.38; INF, 0.37) at 1 year. ACR responses, 28-joint DAS, Clinical Disease Activity Index and Simplified Disease Activity Index were comparable between SB2 and INF up to week 54. The incidences of treatment-emergent adverse events and anti-drug antibodies were comparable between treatment groups. Such comparable trends of efficacy, safety and immunogenicity were consistent from baseline up to 54 weeks. The pattern of dose increment was also comparable between SB2 and INF. SB2 maintained similar efficacy, safety and immunogenicity with INF up to 54 weeks in patients with moderate to severe RA. Radiographic progression was comparable at 1 year. ClinicalTrials.gov (http://clinicaltrials.gov; NCT01936181) and EudraCT (https://www.clinicaltrialsregister.eu; 2012-005733-37).

  11. Treatment of functional dyspepsia with a fixed peppermint oil and caraway oil combination preparation as compared to cisapride. A multicenter, reference-controlled double-blind equivalence study.

    PubMed

    Madisch, A; Heydenreich, C J; Wieland, V; Hufnagel, R; Hotz, J

    1999-11-01

    The therapeutic equivalence of a fixed combination preparation consisting of peppermint oil and caraway oil (PCC, Enteroplant) and the prokinetic agent cisapride (CIS, CAS 81098-60-4) was investigated in a four-week randomized controlled double-blind study with planned adaptive interim analysis. The study comprised 120 outpatients with functional dyspepsia. The efficacy was evaluated in 118 patients. Of these, 60 patients received the enteric-coated combination preparation (2 x 1 capsule containing 90 mg peppermint oil + 50 mg caraway oil per day) and 58 patients received the reference preparation cisapride (3 x 10 mg/day). The mean reduction of the pain score (primary variable) recorded on a visual analog scale (VAS) during the four-week treatment was 4.62 points with the peppermint oil/caraway oil preparation. This score was comparable with the mean reduction under cisapride (4.60 points) (p = 0.021; test for equivalence). In an analysis carried out on an exploratory basis, equivalence was also found in the secondary variable "frequency of pain", with a reduction by 4.65 points under PCC and by 4.16 points under cisapride (p = 0.0034). Comparable results were attained with both treatments in the Dyspeptic Discomfort Score, which included the other dyspeptic symptoms as well as intestinal and extraintestinal autonomic symptoms, in the prognosis as appraised by the physician and in the CGI scales (Clinical Global Impressions). Corresponding results were also found in Helicobacter pylori-positive patients and patients with initially intense epigastric pain in the two treatment groups. The combination preparation consisting of peppermint oil and caraway oil appears to be comparable with cisapride and provides an effective means for treatment of functional dyspepsia. Both medications were tolerated well (adverse events were reported in 12 patients of the PCC group and in 14 patients of the CIS group).

  12. Body Shape Preferences: Associations with Rater Body Shape and Sociosexuality

    PubMed Central

    Price, Michael E.; Pound, Nicholas; Dunn, James; Hopkins, Sian; Kang, Jinsheng

    2013-01-01

    There is accumulating evidence of condition-dependent mate choice in many species, that is, individual preferences varying in strength according to the condition of the chooser. In humans, for example, people with more attractive faces/bodies, and who are higher in sociosexuality, exhibit stronger preferences for attractive traits in opposite-sex faces/bodies. However, previous studies have tended to use only relatively simple, isolated measures of rater attractiveness. Here we use 3D body scanning technology to examine associations between strength of rater preferences for attractive traits in opposite-sex bodies, and raters’ body shape, self-perceived attractiveness, and sociosexuality. For 118 raters and 80 stimuli models, we used a 3D scanner to extract body measurements associated with attractiveness (male waist-chest ratio [WCR], female waist-hip ratio [WHR], and volume-height index [VHI] in both sexes) and also measured rater self-perceived attractiveness and sociosexuality. As expected, WHR and VHI were important predictors of female body attractiveness, while WCR and VHI were important predictors of male body attractiveness. Results indicated that male rater sociosexuality scores were positively associated with strength of preference for attractive (low) VHI and attractive (low) WHR in female bodies. Moreover, male rater self-perceived attractiveness was positively associated with strength of preference for low VHI in female bodies. The only evidence of condition-dependent preferences in females was a positive association between attractive VHI in female raters and preferences for attractive (low) WCR in male bodies. No other significant associations were observed in either sex between aspects of rater body shape and strength of preferences for attractive opposite-sex body traits. These results suggest that among male raters, rater self-perceived attractiveness and sociosexuality are important predictors of preference strength for attractive opposite

  13. Rater agreement of visual lameness assessment in horses during lungeing

    PubMed Central

    Hammarberg, M.; Egenvall, A.; Pfau, T.

    2015-01-01

    Summary. Reasons for performing study: Lungeing is an important part of lameness examinations as the circular path may accentuate low‐grade lameness. Movement asymmetries related to the circular path, to compensatory movements and to pain make the lameness evaluation complex. Scientific studies have shown high inter‐rater variation when assessing lameness during straight line movement. Objectives: The aim was to estimate inter‐ and intra‐rater agreement of equine veterinarians evaluating lameness from videos of sound and lame horses during lungeing and to investigate the influence of veterinarians’ experience and the objective degree of movement asymmetry on rater agreement. Study design: Cross‐sectional observational study. Methods: Video recordings and quantitative gait analysis with inertial sensors were performed in 23 riding horses of various breeds. The horses were examined at trot on a straight line and during lungeing on soft or hard surfaces in both directions. One video sequence was recorded per condition and the horses were classified as forelimb lame, hindlimb lame or sound from objective straight line symmetry measurements. Equine veterinarians (n = 86), including 43 with >5 years of orthopaedic experience, participated in a web‐based survey and were asked to identify the lamest limb on 60 videos, including 10 repeats. The agreements between (inter‐rater) and within (intra‐rater) veterinarians were analysed with κ statistics (Fleiss, Cohen). Results: Inter‐rater agreement κ was 0.31 (0.38/0.25 for experienced/less experienced) and higher for forelimb (0.33) than for hindlimb lameness (0.11) or soundness (0.08) evaluation. Median intra‐rater agreement κ was 0.57. Conclusions: Inter‐rater agreement was poor for less experienced raters, and for all raters when evaluating hindlimb lameness. Since identification of the lame limb/limbs is a prerequisite for successful diagnosis, treatment and recovery, the high inter‐rater variation
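
    Several records in this listing report Fleiss' κ for agreement among more than two raters. As a rough illustration of how that statistic is computed, the sketch below implements the standard Fleiss formula in plain Python. The ratings are invented for the example and are not data from the study; `fleiss_kappa` is a hypothetical helper name.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for categorical ratings.

    ratings: one list per subject, holding one label per rater.
    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][c] = number of raters who assigned category c to subject i
    counts = [Counter(subject) for subject in ratings]
    # Per-subject observed agreement
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Share of all assignments that went to each category
    p_j = [
        sum(c[cat] for c in counts) / (n_subjects * n_raters)
        for cat in categories
    ]
    p_expected = sum(p ** 2 for p in p_j)
    return (p_bar - p_expected) / (1 - p_expected)


# Invented example: 4 videos, 3 raters each (not data from the study).
labels = ["forelimb", "hindlimb", "sound"]
video_ratings = [
    ["forelimb", "forelimb", "forelimb"],
    ["hindlimb", "sound", "hindlimb"],
    ["sound", "sound", "forelimb"],
    ["hindlimb", "hindlimb", "hindlimb"],
]
print(round(fleiss_kappa(video_ratings, labels), 3))
```

    A value of 1 means complete agreement, 0 means agreement no better than chance given the category frequencies, and negative values mean agreement worse than chance.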

  14. OC10 - Inter-rater agreement of the Paediatric Early Warning Score tools used in the central Denmark region.

    PubMed

    Jensen, Claus Sixtus; Aagaard, Hanne; Vebert Olesen, Hanne; Kirkegaard, Hans

    2016-05-09

    Theme: Patient safety. Background: Paediatric Early Warning Score (PEWS) tools can assist healthcare providers in the rapid detection and recognition of changes in patient condition. In the central Denmark region, two different PEWS tools were tested in a large-scale RCT. However, data from PEWS instruments are only as reliable and accurate as the caregiver who obtains and documents the parameters. The purpose was to evaluate the inter-rater agreement among nurses using the PEWS systems. The study was conducted in five paediatric departments. Inter-observer reliability was investigated through simultaneous blinded PEWS assessment of the same patients by two nurses. Fleiss' kappa was used to determine the level of agreement among the raters. With a paucity of published reliability testing studies, this research attempts to address identified research gaps and will thus inform nursing practice.

  15. A sequential test for assessing observed agreement between raters.

    PubMed

    Bersimis, Sotiris; Sachlas, Athanasios; Chakraborti, Subha

    2017-09-12

    Assessing the agreement between two or more raters is an important topic in medical practice. Existing techniques, which deal with categorical data, are based on contingency tables. This is often an obstacle in practice as we have to wait for a long time to collect the appropriate sample size of subjects to construct the contingency table. In this paper, we introduce a nonparametric sequential test for assessing agreement, which can be applied as data accrue and does not require a contingency table, facilitating a rapid assessment of the agreement. The proposed test is based on the cumulative sum of the number of disagreements between the two raters and a suitable statistic representing the waiting time until the cumulative sum exceeds a predefined threshold. We treat the cases of testing two raters' agreement with respect to one or more characteristics and using two or more classification categories, the case where the two raters disagree to an extreme degree, and finally the case of testing more than two raters' agreement. The numerical investigation shows that the proposed test has excellent performance. Compared to the existing methods, the proposed method appears to require a significantly smaller sample size with equivalent power. Moreover, the proposed method is easily generalizable and brings the problem of assessing the agreement between two or more raters and one or more characteristics under a unified framework, thus providing an easy-to-use tool to medical practitioners. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
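
    The abstract describes the running quantities behind the test (a cumulative sum of disagreements and the waiting time until it exceeds a threshold) but not the decision rule or how the threshold is chosen. The sketch below only illustrates those two running quantities under that reading; it is not the authors' test. The function name, the threshold and the ratings are all assumptions made for the example.

```python
def waiting_time_to_threshold(rater_a, rater_b, threshold):
    """Number of subjects seen when cumulative disagreements first exceed
    `threshold`, or None if that never happens in the data so far.

    Illustrative only: the published test's decision rule is not reproduced.
    """
    disagreements = 0
    for n_seen, (a, b) in enumerate(zip(rater_a, rater_b), start=1):
        if a != b:
            disagreements += 1
        if disagreements > threshold:
            return n_seen
    return None


# Invented ratings arriving one subject at a time.
rater_a = [1, 1, 0, 1, 0, 0]
rater_b = [1, 0, 0, 0, 0, 1]
print(waiting_time_to_threshold(rater_a, rater_b, threshold=1))
```

    Because the statistic is updated subject by subject, it can be evaluated as ratings arrive, which is the property the abstract contrasts with contingency-table methods.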

  16. Virtual Raters for Reproducible and Objective Assessments in Radiology

    NASA Astrophysics Data System (ADS)

    Kleesiek, Jens; Petersen, Jens; Döring, Markus; Maier-Hein, Klaus; Köthe, Ullrich; Wick, Wolfgang; Hamprecht, Fred A.; Bendszus, Martin; Biller, Armin

    2016-04-01

    Volumetric measurements in radiologic images are important for monitoring tumor growth and treatment response. To make these more reproducible and objective we introduce the concept of virtual raters (VRs). A virtual rater is obtained by combining knowledge of machine-learning algorithms trained with past annotations of multiple human raters with the instantaneous rating of one human expert. Thus, he is virtually guided by several experts. To evaluate the approach we perform experiments with multi-channel magnetic resonance imaging (MRI) data sets. Next to gross tumor volume (GTV) we also investigate subcategories like edema, contrast-enhancing and non-enhancing tumor. The first data set consists of N = 71 longitudinal follow-up scans of 15 patients suffering from glioblastoma (GB). The second data set comprises N = 30 scans of low- and high-grade gliomas. For comparison we computed Pearson Correlation, Intra-class Correlation Coefficient (ICC) and Dice score. Virtual raters always lead to an improvement w.r.t. inter- and intra-rater agreement. Comparing the 2D Response Assessment in Neuro-Oncology (RANO) measurements to the volumetric measurements of the virtual raters results in one-third of the cases in a deviating rating. Hence, we believe that our approach will have an impact on the evaluation of clinical studies as well as on routine imaging diagnostics.
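
    Of the three comparison measures named here, the Dice score is the one specific to segmentation overlap. A minimal sketch of its usual definition for two binary masks follows; the masks are invented, and treating two empty masks as a perfect match is a convention assumed for the example, not something stated in the abstract.

```python
def dice_score(mask_a, mask_b):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree completely
    return 2.0 * intersection / size_sum


# Invented flattened masks from two raters outlining the same region.
rater_1 = [1, 1, 1, 0, 0, 0, 1, 0]
rater_2 = [1, 1, 0, 0, 0, 1, 1, 0]
print(dice_score(rater_1, rater_2))
```

    Dice ranges from 0 (no overlap) to 1 (identical masks) and, unlike a volume comparison, penalises two segmentations that have the same volume but sit in different places.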

  17. Virtual Raters for Reproducible and Objective Assessments in Radiology

    PubMed Central

    Kleesiek, Jens; Petersen, Jens; Döring, Markus; Maier-Hein, Klaus; Köthe, Ullrich; Wick, Wolfgang; Hamprecht, Fred A.; Bendszus, Martin; Biller, Armin

    2016-01-01

    Volumetric measurements in radiologic images are important for monitoring tumor growth and treatment response. To make these more reproducible and objective we introduce the concept of virtual raters (VRs). A virtual rater is obtained by combining knowledge of machine-learning algorithms trained with past annotations of multiple human raters with the instantaneous rating of one human expert. Thus, he is virtually guided by several experts. To evaluate the approach we perform experiments with multi-channel magnetic resonance imaging (MRI) data sets. Next to gross tumor volume (GTV) we also investigate subcategories like edema, contrast-enhancing and non-enhancing tumor. The first data set consists of N = 71 longitudinal follow-up scans of 15 patients suffering from glioblastoma (GB). The second data set comprises N = 30 scans of low- and high-grade gliomas. For comparison we computed Pearson Correlation, Intra-class Correlation Coefficient (ICC) and Dice score. Virtual raters always lead to an improvement w.r.t. inter- and intra-rater agreement. Comparing the 2D Response Assessment in Neuro-Oncology (RANO) measurements to the volumetric measurements of the virtual raters results in one-third of the cases in a deviating rating. Hence, we believe that our approach will have an impact on the evaluation of clinical studies as well as on routine imaging diagnostics. PMID:27118379

  18. A Qualitative Analysis of Rater Behavior on an L2 Speaking Assessment

    ERIC Educational Resources Information Center

    Kim, Hyun Jung

    2015-01-01

    Human raters are normally involved in L2 performance assessment; as a result, rater behavior has been widely investigated to reduce rater effects on test scores and to provide validity arguments. Yet raters' cognition and use of rubrics in their actual rating have rarely been explored qualitatively in L2 speaking assessments. In this study three…

  19. Rater Expertise in a Second Language Speaking Assessment: The Influence of Training and Experience

    ERIC Educational Resources Information Center

    Davis, Lawrence Edward

    2012-01-01

    Speaking performance tests typically employ raters to produce scores; accordingly, variability in raters' scoring decisions has important consequences for test reliability and validity. One such source of variability is the rater's level of expertise in scoring. Therefore, it is important to understand how raters' performance is influenced by…

  1. Test-re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females.

    PubMed

    Beardsley, Chris; Egerton, Tim; Skinner, Brendon

    2016-01-01

    Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81-0.88), test-re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and test-re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test-re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test-re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test-re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.
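
    The abstract reports ICCs without saying which ICC form was used. As one common choice, the sketch below computes the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) from the usual ANOVA mean squares. The choice of form is an assumption, and the tilt values are invented for illustration.

```python
def icc_2_1(table):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    table: one row per subject, one column per rater.
    """
    n = len(table)       # number of subjects
    k = len(table[0])    # number of raters
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual.
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Invented pelvic tilt readings in degrees: rows are subjects, columns are raters.
tilt_degrees = [
    [10.5, 11.0],
    [8.0, 8.5],
    [12.0, 11.5],
    [9.5, 10.0],
]
print(round(icc_2_1(tilt_degrees), 2))
```

    Because this form measures absolute agreement, a rater who reads consistently higher than the other lowers the ICC even when the two rank the subjects identically.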

  2. Explaining sexual harassment judgments: looking beyond gender of the rater.

    PubMed

    O'Connor, Maureen; Gutek, Barbara A; Stockdale, Margaret; Geer, Tracey M; Melançon, Renée

    2004-02-01

    In two decades of research on sexual harassment, one finding that appears repeatedly is that gender of the rater influences judgments about sexual harassment such that women are more likely than men to label behavior as sexual harassment. Yet sexual harassment judgments are complex, particularly in situations that culminate in legal proceedings, and this one variable, gender, may have been overemphasized to the exclusion of other situational and rater characteristic variables. Moreover, why do gender differences appear? As the work of Wiener and his colleagues has done (R. L. Wiener et al., 2002; R. L. Wiener & L. Hurt, 2000; R. L. Wiener, L. Hurt, B. Russell, K. Mannen, & C. Gasper, 1997), this study attempts to look beyond gender to answer this question. In the studies reported here, raters (undergraduates and community adults) either read a written scenario or viewed a videotaped reenactment of a sexual harassment trial. The nature of the work environment was manipulated to see what, if any, effect the context would have on gender effects. Additionally, a number of rater characteristics beyond gender that might help explain rater judgments were measured, including ambivalent sexism attitudes of the raters, their judgments of complainant credibility, and self-referencing. Respondent gender, work environment, and community vs. student sample differences produced reliable differences in sexual harassment ratings in both the written and video trial versions of the study. The gender and sample differences in the sexual harassment ratings, however, are explained by a model which incorporates hostile sexism, perceptions of the complainant's credibility, and raters' own ability to put themselves in the complainant's position (self-referencing).

  3. Change blindness images.

    PubMed

    Ma, Li-Qian; Xu, Kun; Wong, Tien-Tsin; Jiang, Bi-Ye; Hu, Shi-Min

    2013-11-01

    Change blindness refers to human inability to recognize large visual changes between images. In this paper, we present the first computational model of change blindness to quantify the degree of blindness between an image pair. It comprises a novel context-dependent saliency model and a measure of change, the former dependent on the site of the change, and the latter describing the amount of change. This saliency model in particular addresses the influence of background complexity, which plays an important role in the phenomenon of change blindness. Using the proposed computational model, we are able to synthesize changed images with desired degrees of blindness. User studies and comparisons to state-of-the-art saliency models demonstrate the effectiveness of our model.

  4. Workplace-based assessment: effects of rater expertise.

    PubMed

    Govaerts, M J B; Schuwirth, L W T; Van der Vleuten, C P M; Muijtjens, A M M

    2011-05-01

    Traditional psychometric approaches towards assessment tend to focus exclusively on quantitative properties of assessment outcomes. This may limit more meaningful educational approaches towards workplace-based assessment (WBA). Cognition-based models of WBA argue that assessment outcomes are determined by cognitive processes by raters which are very similar to reasoning, judgment and decision making in professional domains such as medicine. The present study explores cognitive processes that underlie judgment and decision making by raters when observing performance in the clinical workplace. It specifically focuses on how differences in rating experience influence information processing by raters. Verbal protocol analysis was used to investigate how experienced and non-experienced raters select and use observational data to arrive at judgments and decisions about trainees' performance in the clinical workplace. Differences between experienced and non-experienced raters were assessed with respect to time spent on information analysis and representation of trainee performance; performance scores; and information processing--using qualitative-based quantitative analysis of verbal data. Results showed expert-novice differences in time needed for representation of trainee performance, depending on complexity of the rating task. Experts paid more attention to situation-specific cues in the assessment context and they generated (significantly) more interpretations and fewer literal descriptions of observed behaviors. There were no significant differences in rating scores. Overall, our findings seemed to be consistent with other findings on expertise research, supporting theories underlying cognition-based models of assessment in the clinical workplace. Implications for WBA are discussed.

  5. Rater agreement on gait assessment during neurologic examination of horses.

    PubMed

    Olsen, E; Dunkel, B; Barker, W H J; Finding, E J T; Perkins, J D; Witte, T H; Yates, L J; Andersen, P H; Baiker, K; Piercy, R J

    2014-01-01

    Reproducible and accurate recognition of presence and severity of ataxia in horses with neurologic disease is important when establishing a diagnosis, assessing response to treatment, and making recommendations that might influence rider safety or a decision for euthanasia. The objective was to determine the reproducibility and validity of the gait assessment component in the neurologic examination of horses. The study included twenty-five horses referred to the Royal Veterinary College Equine Referral Hospital for neurological assessment (n = 15), purchased (without a history of gait abnormalities) for an unrelated study (n = 5), or donated because of perceived ataxia (n = 5). Using a prospective study design, a group of board-certified medicine (n = 2) and surgery (n = 2) clinicians and residents (n = 2) assessed components of the equine neurologic examination (live and video recorded) and assigned individual and overall neurologic gait deficit grades (0-4). Inter-rater agreement and assessment-reassessment reliability were quantified using intraclass correlation coefficients (ICC). The ICCs of the selected components of the neurologic examination ranged from 0 to 0.69. "Backing up" and "recognition of mistakes over obstacle" were the only components with an ICC > 0.6. Assessment-reassessment agreement was poor to fair. The agreement on gait grading was good overall (ICC = 0.74), but poor for grades ≤ 1 (ICC = 0.08) and fair for ataxia grades ≥ 2 (ICC = 0.43). Clinicians with prior knowledge of a possible gait abnormality were more likely to assign a grade higher than the median grade. Clinicians should be aware of poor agreement even between skilled observers of equine gait abnormalities, especially when the clinical signs are subtle. Copyright © 2014 The Authors. Journal of Veterinary Internal Medicine published by Wiley Periodicals, Inc. on behalf of American College of Veterinary Internal Medicine.

  6. Stulberg classification system for evaluation of Legg-Calvé-Perthes disease: intra-rater and inter-rater reliability.

    PubMed

    Neyt, J G; Weinstein, S L; Spratt, K F; Dolan, L; Morcuende, J; Dietz, F R; Guyton, G; Hart, R; Kraut, M S; Lervick, G; Pardubsky, P; Saterbak, A

    1999-09-01

    Researchers and clinicians commonly use the classification system of Stulberg et al. as a basis for treatment decisions during the active phase of Legg-Calvé-Perthes disease because of its putative utility as a predictor of long-term outcome. It is generally assumed that this system has an acceptable degree of reliability. This assumption, however, is not convincingly supported by the literature. The purpose of the present study was to assess the inter-rater and intra-rater reliability of the classification system of Stulberg et al. with use of a pre-test, post-test design. During the pre-test phase, nine raters independently used the system to evaluate the radiographs of skeletally mature patients who had been managed for Legg-Calvé-Perthes disease. The intervention between the pre-test and post-test phases consisted of a consensus-building session during which all raters jointly arrived at standardized definitions of the various joint structures that are assessed with use of the classification system. The effect of these definitions on reliability then was assessed by reevaluating the radiographs during the post-test phase. The pre-test intra-rater reliability coefficients ranged from 0.709 to 0.915, and the post-test coefficients ranged from 0.568 to 0.874. The pre-test inter-rater reliability coefficients ranged from 0.603 to 0.732, and the post-test coefficients ranged from 0.648 to 0.744. Contributing to the variance was a lack of agreement concerning the assessment of joint structures and the way in which the raters translated these evaluations into a classification according to the system of Stulberg et al. Although intra-rater reliability was marginally acceptable, the degree of variability between the classifications assigned by different raters, even after the intervention, calls into question the reliability of the system of Stulberg et al.; consequently, the validity of any treatment decisions, outcome evaluations, or epidemiological studies based on

  7. Color blindness

    MedlinePlus

    ... have trouble telling the difference between red and green. This is the most common type of color ... color blindness often have problems seeing reds and greens, too. The most severe form of color blindness ...

  8. Blind Astronomers

    NASA Astrophysics Data System (ADS)

    Hockey, Thomas A.

    2011-01-01

    The phrase "blind astronomer" is used as an allegorical oxymoron. However, there were and are blind astronomers. What of famous blind astronomers? First, it must be stated that these astronomers were not martyrs to their craft. It is a myth that astronomers blind themselves by observing the Sun. As early as France's William of Saint-Cloud (circa 1290) astronomers knew that staring at the Sun was ill-advised and avoided it. Galileo Galilei did not invent the astronomical telescope and then proceed to blind himself with one. Galileo observed the Sun near sunrise and sunset or through projection. More than two decades later he became blind, as many septuagenarians do, unrelated to their profession. Even Isaac Newton temporarily blinded himself, staring at the reflection of the Sun when he was a twentysomething. But permanent Sun-induced blindness? No, it did not happen. For instance, it was a stroke that left Scotland's James Gregory (1638-1675) blind. (You will remember the Gregorian telescope.) However, he died days later. Thus, blindness little interfered with his occupation. English Abbot Richard of Wallingford (circa 1291 - circa 1335) wrote astronomical works and designed astronomical instruments. He was also blind in one eye. Yet as he further suffered from leprosy, his blindness seems the lesser of Richard's maladies. Perhaps the most famous professionally active, blind astronomer (or almost blind astronomer) is Dominique-Francois Arago (1786-1853), director until his death of the powerful nineteenth-century Paris Observatory. I will share other (some poignant) examples such as: William Campbell, whose blindness drove him to suicide; Leonhard Euler, astronomy's Beethoven, who did nearly half of his life's work while almost totally blind; and Edwin Frost, who "observed" a total solar eclipse while completely sightless.

  9. A Note on the Interpretation of Weighted Kappa and its Relations to Other Rater Agreement Statistics for Metric Scales

    ERIC Educational Resources Information Center

    Schuster, Christof

    2004-01-01

    This article presents a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance that is particularly helpful in emphasizing that weighted kappa is an absolute agreement measure in the sense that it is sensitive to differences in rater's marginal distributions. Specifically, rater mean differences will decrease…
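
The abstract above is truncated and does not print the formula, so the following is only a sketch of the standard identity for quadratically weighted kappa, which may differ in notation from the article. Assuming quadratic weights and population (divide-by-n) moments, kappa equals 2·cov / (var_x + var_y + (mean_x − mean_y)²), which makes the sensitivity to rater mean differences visible. The function names and the ratings are hypothetical.

```python
from itertools import product


def weighted_kappa_quadratic(x, y):
    """Quadratically weighted kappa computed from its definition.

    Observed disagreement: mean squared difference over the paired ratings.
    Chance disagreement: mean squared difference over all cross pairs,
    i.e. what is expected if the two raters' marginals were independent.
    """
    n = len(x)
    observed = sum((a - b) ** 2 for a, b in zip(x, y)) / n
    expected = sum((a - b) ** 2 for a, b in product(x, y)) / (n * n)
    return 1.0 - observed / expected


def weighted_kappa_from_moments(x, y):
    """Same quantity from rater means, variances, and the covariance.

    Uses population moments (denominator n), an assumption of this sketch.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical ratings on a 1-5 scale from two raters.
rater_a = [1, 2, 3, 4, 5, 3, 2]
rater_b = [1, 3, 3, 5, 5, 2, 2]
# A rater who is perfectly correlated with rater_a but one point more severe.
rater_shifted = [a + 1 for a in rater_a]

kappa_definition = weighted_kappa_quadratic(rater_a, rater_b)
kappa_moments = weighted_kappa_from_moments(rater_a, rater_b)
kappa_shifted = weighted_kappa_from_moments(rater_a, rater_shifted)
```

In this sketch the two functions agree on the same data, and the shifted rater obtains a kappa below 1 despite a perfect correlation, because the squared mean difference enters the denominator.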

  10. Inter-rater Reliability of Three Musculoskeletal Physical Examination Techniques Used to Assess Motion in Three Planes While Standing

    PubMed Central

    Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John

    2012-01-01

    Objective: The objective of the study was to measure the reliability between examiners of three basic maneuvers of the Total Body Functional Profile© physical examination test. The hypothesis was that musculoskeletal health care providers of different disciplines could reliably use the three basic maneuvers as part of the musculoskeletal physical examination. Design: A prospective observational study was conducted. Twenty-eight adult volunteers were measured on both the left and right side by two independent raters on a single occasion. Setting: The subjects were recruited through advertisements placed by the orthopedic department at a tertiary university. Participants: 28 volunteers were recruited and completed the study. The volunteers were between 18 and 51 years of age, had no symptoms in the lower extremity or spine, had no previous history of surgery or tumor involving the lower extremity, and no medical conditions that would preclude participation. Assessment: On a single occasion, two examiners per volunteer were blinded to their own and each other's measurements. Each examiner assessed the distance of frontal and sagittal plane lunge and angle of motion for transverse plane testing. Main Outcome Measurements: Inter-rater agreement is expressed with intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals (CIs). The difference between raters is reported with 95% CIs. Baseline demographics, UCLA, and Harris hip questionnaires were completed by all participants. Results: The UCLA and Harris hip scores showed no significant activity restrictions or pain limitations in all participants. The inter-rater reliability for sagittal, frontal, and transverse plane matrix testing was good with ICCs of 0.86 (95% CI 0.77, 0.91), 0.90 (95% CI 0.84, 0.94), and 0.85 (95% CI 0.75, 0.91) respectively. The rater reliability between disciplines for transverse, sagittal and frontal plane matrix testing was good with ICCs of 0.89 (95% CI 0.80, 0
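
The record above reports inter-rater agreement as ICCs but does not say which ICC form was used. As a hedged illustration only, the sketch below computes one common choice, the two-way random-effects, absolute-agreement, single-measure coefficient, often written ICC(2,1), from the mean squares of a subjects-by-raters table. The function name and the ratings are illustrative and are not the study's data.

```python
def icc_two_way_random_single(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` is a list of rows, one row per subject, one column per rater.
    The choice of this ICC form is an assumption of the sketch.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Illustrative table: 6 subjects (rows) scored by 4 raters (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_example = icc_two_way_random_single(example_ratings)
```

Because this form measures absolute agreement, systematic differences between raters (the column effect) lower the coefficient even when the raters rank subjects similarly.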

  11. Inter-rater reliability of modified Alberta Stroke program early computerized tomography score in patients with brain infarction

    PubMed Central

    Ghandehari, Kavian; Rezvani, Mohammad Reza; Shakeri, Mohammad Taghi; Mohammadifard, Mahdi; Ehsanbakhsh, Alireza; Mohammadifard, Mahyar; Mirgholami, Alireza; Boostani, Reza; Ghandehari, Kosar; Izadi-Mood, Zahra

    2011-01-01

    BACKGROUND: The Alberta Stroke Program Early Computerized Tomography Score (ASPECTS) was used to detect significant early ischemic changes on brain CT of acute stroke patients. We designed the modified ASPECTS and compared it to the above system based on the inter-rater reliability. METHODS: A cross-sectional validation study was conducted based on the inter-rater reliability. The CT images were chosen from the stroke data bank of Ghaem hospital, Mashhad in 2010. The inclusion criteria were the presence of middle cerebral artery territory infarction and performance of CT within 6 hours after stroke onset. Axial CT scans were performed on a third-generation CT scanner (Siemens, ARTX, Germany). Section thickness above posterior fossa was 10 mm (130 kV, 150 mAs). Films were made at window level of 35 HU. The brain CTs were scored by four independent radiologists based on the ASPECTS and modified ASPECTS. The readers were blind to clinical information except symptom side. Cochran Q and Kappa tests served for statistical analysis. RESULTS: 24 CT scans were available and of sufficient quality. Difference in distribution of dichotomized ≤7 and >7 ASPECT scores between four raters was significant (Q=13.071, df=3, p=0.04). Distribution of dichotomized <6 and ≥6 scores based on modified ASPECT system between 4 raters was not significantly different (Q=6.349, df=3, p=0.096). CONCLUSIONS: Modified ASPECT method is more reliable than ASPECTS in detecting major early ischemic changes in stroke patients who are candidates for tPA thrombolysis. PMID:22973327
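
The Q statistics in the record above compare dichotomized scores from four raters on the same scans. A minimal sketch of how Cochran's Q is computed from a scans-by-raters table of 0/1 outcomes follows; the function name and the small table are hypothetical and do not reproduce the study's data. The statistic is referred to a chi-square distribution with k − 1 degrees of freedom, where k is the number of raters.

```python
def cochran_q(table):
    """Cochran's Q for k related binary ratings of the same items.

    `table` has one row per item (e.g. CT scan) and one 0/1 column per rater.
    Returns (Q, degrees_of_freedom). Q is undefined (division by zero) when
    every row is all zeros or all ones; this sketch does not guard that case.
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)

    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1


# Hypothetical dichotomized scores: 4 scans (rows) read by 3 raters (columns).
toy_table = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
q_value, q_df = cochran_q(toy_table)
```

A larger Q means the raters' proportions of positive calls differ more than chance would suggest, which is how a significant Q is read as weaker inter-rater consistency.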

  12. Exploring Examiner Judgement of Professional Competence in Rater Based Assessment

    ERIC Educational Resources Information Center

    Naumann, Fiona L.; Marshall, Stephen; Shulruf, Boaz; Jones, Philip D.

    2016-01-01

    Exercise physiology courses have transitioned to a competency-based format, forcing universities to rethink assessment to ensure students are competent to practice. This study built on earlier research to explore rater cognition, capturing factors that contribute to assessor decision making about students' competency. The aims were to determine the source…

  13. Effects of Rater Characteristics and Scoring Methods on Speaking Assessment

    ERIC Educational Resources Information Center

    Matsugu, Sawako

    2013-01-01

    Understanding the sources of variance in speaking assessment is important in Japan where society's high demand for English speaking skills is growing. Three challenges threaten fair assessment of speaking. First, in Japanese university speaking courses, teachers are typically the only raters, but teachers' knowledge of their students may unfairly…

  14. Another Look at Inter-Rater Agreement. Research Report.

    ERIC Educational Resources Information Center

    Zwick, Rebecca

    Most currently used measures of inter-rater agreement for the nominal case incorporate a correction for "chance agreement." The definition of chance agreement is not the same for all coefficients, however. Three chance-corrected coefficients are Cohen's Kappa; Scott's Pi; and the S index of Bennett, Goldstein, and Alpert, which has…
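
To make the point about differing definitions of chance agreement concrete, the sketch below computes the three coefficients named above with the same correction, (observed − chance) / (1 − chance), and changes only the chance term. These are the textbook definitions for two raters; the function name and the ratings are hypothetical, and nothing here is taken from the truncated report itself.

```python
from collections import Counter


def chance_corrected_agreement(r1, r2, categories):
    """Cohen's kappa, Scott's pi, and Bennett et al.'s S for two raters.

    All three share the form (p_obs - p_chance) / (1 - p_chance) and differ
    only in how chance agreement is defined.
    """
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    m1, m2 = Counter(r1), Counter(r2)

    # Cohen: each rater keeps their own marginal distribution.
    pe_cohen = sum((m1[c] / n) * (m2[c] / n) for c in categories)
    # Scott: both raters share one pooled marginal distribution.
    pe_scott = sum(((m1[c] + m2[c]) / (2 * n)) ** 2 for c in categories)
    # Bennett, Goldstein, and Alpert: all categories are equally likely.
    pe_s = 1 / len(categories)

    def correct(p_chance):
        return (p_obs - p_chance) / (1 - p_chance)

    return {"kappa": correct(pe_cohen), "pi": correct(pe_scott), "S": correct(pe_s)}


# Hypothetical nominal ratings with unequal marginals for the two raters.
first_rater = ["a", "a", "a", "a", "b", "b"]
second_rater = ["a", "a", "b", "b", "b", "b"]
coefficients = chance_corrected_agreement(first_rater, second_rater, ["a", "b"])
```

With these ratings the observed agreement is the same for all three coefficients, yet kappa differs from pi and S because the raters' marginal distributions differ.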

  16. Effects of Rater-Ratee Similarity on Performance Ratings.

    ERIC Educational Resources Information Center

    Zalesny, Mary D.; Kirsch, Michael P.

    Social psychological research on attitude/belief similarity points to perceived similarity between people as leading to greater attraction and more favorable evaluations. When applied to the area of performance evaluations, organizational research on raters and ratees has focused primarily on the characteristics of the participants of the…

  17. Qualities of Judgmental Ratings by Four Rater Sources.

    ERIC Educational Resources Information Center

    Tsui, Anne S.

    Quality of performance data yielded by subjective judgment is of major concern to researchers in performance appraisal. However, some confusion exists in the analysis of quality on ratings obtained from different rating scale formats and from different raters. To clarify this confusion, a study was conducted to assess the quality of judgmental…

  18. Inter-Rater Reliability of an Electronic Discussion Coding System.

    ERIC Educational Resources Information Center

    MacKinnon, Gregory R.

    A "cognote" system was developed for coding electronic discussion groups and promoting critical thinking. Previous literature has provided an account of the strategy as applied to several academic settings. This paper addresses the research around establishing the inter-rater reliability of the cognote system. The findings suggest three indicators…

  19. Workplace-Based Assessment: Effects of Rater Expertise

    ERIC Educational Resources Information Center

    Govaerts, M. J. B.; Schuwirth, L. W. T.; Van der Vleuten, C. P. M.; Muijtjens, A. M. M.

    2011-01-01

    Traditional psychometric approaches towards assessment tend to focus exclusively on quantitative properties of assessment outcomes. This may limit more meaningful educational approaches towards workplace-based assessment (WBA). Cognition-based models of WBA argue that assessment outcomes are determined by cognitive processes by raters which are…

  20. Rating Format Effects on Rater Agreement and Reliability.

    ERIC Educational Resources Information Center

    Littlefield, John H.; Troendle, G. Roger

    This study compares intra- and inter-rater agreement and reliability when using three different rating form formats to assess the same stimuli. One format requests assessment by marking detailed criteria without an overall judgement; the second format requests only an overall judgement without the use of detailed criteria; and the third format…

  1. Double-blind, placebo-controlled study of three-month treatment with lymecycline in reactive arthritis, with special reference to Chlamydia arthritis.

    PubMed

    Lauhio, A; Leirisalo-Repo, M; Lähdevirta, J; Saikku, P; Repo, H

    1991-01-01

    We conducted a double-blind, placebo-controlled, randomized study of 3-month treatment with lymecycline, a form of tetracycline, in reactive arthritis (ReA). Lymecycline therapy significantly decreased the duration of the illness in patients with Chlamydia trachomatis-triggered ReA, but not in other ReA patients. In 2 ReA patients, C trachomatis was found in the throat, an uncommon locale for this organism. Our results suggest that it is important to verify the triggering microbe and that it is beneficial to treat Chlamydia arthritis patients with a prolonged course of tetracycline.

  2. Measuring the quality of life in mild to very severe dementia: testing the inter-rater and intra-rater reliability of the German version of the QUALIDEM.

    PubMed

    Dichter, Martin Nikolaus; Schwab, Christian G G; Meyer, Gabriele; Bartholomeyczik, Sabine; Dortmann, Olga; Halek, Margareta

    2014-05-01

    Quality of life (Qol) is an increasingly used outcome measure in dementia research. The QUALIDEM is a dementia-specific and proxy-rated Qol instrument. We aimed to determine the inter-rater and intra-rater reliability in residents with dementia in German nursing homes. The QUALIDEM consists of nine subscales that were applied to a sample of 108 people with mild to severe dementia and six consecutive subscales that were applied to a sample of 53 people with very severe dementia. The proxy raters were 49 registered nurses and nursing assistants. Inter-rater and intra-rater reliability scores were calculated on the subscale and item level. None of the QUALIDEM subscales showed strong inter-rater reliability based on the single-measure Intra-Class Correlation Coefficient (ICC) for absolute agreement ≥ 0.70. Based on the average-measure ICC for four raters, eight subscales for people with mild to severe dementia (care relationship, positive affect, negative affect, restless tense behavior, social relations, social isolation, feeling at home and having something to do) and five subscales for very severe dementia (care relationship, negative affect, restless tense behavior, social relations and social isolation) yielded a strong inter-rater agreement (ICC: 0.72-0.86). All of the QUALIDEM subscales, regardless of dementia severity, showed strong intra-rater agreement. The ICC values ranged between 0.70 and 0.79 for people with mild to severe dementia and between 0.75 and 0.87 for people with very severe dementia. This study demonstrated insufficient inter-rater reliability and sufficient intra-rater reliability for all subscales of both versions of the German QUALIDEM. The degree of inter-rater reliability can be improved by collaborative Qol rating by more than one nurse. The development of a measurement manual with accurate item definitions and a standardized education program for proxy raters is recommended.
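
The contrast above between single-measure and average-measure ICCs follows from the Spearman-Brown relation: averaging k raters raises the reliability of the mean rating above that of one rater. The sketch below applies that relation; the function names and the input value 0.40 are hypothetical and are not the study's figures.

```python
def average_measure_icc(single_icc, k):
    """Spearman-Brown step-up: reliability of the mean of k raters."""
    return k * single_icc / (1 + (k - 1) * single_icc)


def single_measure_icc(average_icc, k):
    """Inverse relation: single-rater reliability implied by a k-rater mean."""
    return average_icc / (k - (k - 1) * average_icc)


# Hypothetical single-measure ICC below the 0.70 threshold.
single = 0.40
averaged_over_four = average_measure_icc(single, 4)      # about 0.73
implied_single = single_measure_icc(averaged_over_four, 4)
```

Under these assumptions, a single-measure value well below 0.70 can still produce an average-measure value above 0.70 for four raters, consistent with the recommendation to have more than one nurse rate collaboratively.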

  3. A paired comparison analysis of third-party rater thyroidectomy scar preference.

    PubMed

    Rajakumar, C; Doyle, P C; Brandt, M G; Moore, C C; Nichols, A; Franklin, J H; Yoo, J; Fung, K

    2017-01-01

    To determine the length and position of a thyroidectomy scar that is cosmetically most appealing to naïve raters. Images of thyroidectomy scars were reproduced on male and female necks using digital imaging software. Surgical variables studied were scar position and length. Fifteen raters were presented with 56 scar pairings and asked to identify which was preferred cosmetically. Twenty duplicate pairings were included to assess rater reliability. Analysis of variance was used to determine preference. Raters preferred low, short scars, followed by high, short scars, with long scars in either position being less desirable (p < 0.05). Twelve of 15 raters had acceptable intra-rater and inter-rater reliability. Naïve raters preferred low, short scars over the alternatives. High, short scars were the next most favourably rated. If other factors influencing incision choice are considered equal, surgeons should consider these preferences in scar position and length when planning their thyroidectomy approach.

  4. Intra-Rater and Inter-Rater Reliability of the Balance Error Scoring System in Pre-Adolescent School Children

    ERIC Educational Resources Information Center

    Sheehan, Dwayne P.; Lafave, Mark R.; Katz, Larry

    2011-01-01

    This study was designed to test the intra- and inter-rater reliability of the University of North Carolina's Balance Error Scoring System in 9- and 10-year-old children. Additionally, a modified version of the Balance Error Scoring System was tested to determine if it was more sensitive in this population ("raw scores"). Forty-six…

  6. Overview on Deaf-Blindness

    ERIC Educational Resources Information Center

    Miles, Barbara

    2008-01-01

    It may seem that deaf-blindness refers to a total inability to see or hear. However, in reality deaf-blindness is a condition in which the combination of hearing and visual losses in children cause "such severe communication and other developmental and educational needs that they cannot be accommodated in special education programs solely for…

  7. Statistically Comparing the Performance of Multiple Automated Raters across Multiple Items

    ERIC Educational Resources Information Center

    Kieftenbeld, Vincent; Boyer, Michelle

    2017-01-01

    Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…

  8. Investigating Differences between American and Indian Raters in Assessing TOEFL iBT Speaking Tasks

    ERIC Educational Resources Information Center

    Wei, Jing; Llosa, Lorena

    2015-01-01

    This article reports on an investigation of the role raters' language background plays in raters' assessment of test takers' speaking ability. Specifically, this article examines differences between American and Indian raters in their scores and scoring processes when rating Indian test takers' responses to the Test of English as a Foreign…

  9. The Influence of Training and Experience on Rater Performance in Scoring Spoken Language

    ERIC Educational Resources Information Center

    Davis, Larry

    2016-01-01

    Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…

  10. Rater Sensitivity to Lexical Accuracy, Sophistication and Range when Assessing Writing

    ERIC Educational Resources Information Center

    Fritz, Erik; Ruegg, Rachael

    2013-01-01

    Although raters can be trained to evaluate the lexical qualities of student essays, the question remains as to what extent raters follow the "lexis" scale descriptors in the rating scale when evaluating or rate according to their own criteria. The current study examines the extent to which 27 trained university EFL raters take various lexical…

  11. Investigating Raters' Development of Rating Ability on a Second Language Speaking Assessment

    ERIC Educational Resources Information Center

    Kim, Hyun Jung

    2011-01-01

    The purpose of the study was to investigate the extent to which raters coming from diverse backgrounds exhibited different levels of rating ability while scoring speaking performances. The study also aimed to examine how raters with different backgrounds could develop their rating ability over time. For this purpose, raters' background…

  12. Individual Differences in Rater Decision-Making Style: An Exploratory Mixed-Methods Study

    ERIC Educational Resources Information Center

    Baker, Beverly Anne

    2012-01-01

    Researchers of high-stakes, subjectively scored writing assessments have done much work to better understand the process that raters go through in applying a rating scale to a language performance to arrive at a score. However, there is still unexplained, systematic variability in rater scoring that resists rater training (see Hoyt & Kerns,…

  13. Adjusting for Year to Year Rater Variation in IRT Linking--An Empirical Evaluation

    ERIC Educational Resources Information Center

    Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg

    2005-01-01

    The main purpose of this study was to illustrate a polytomous IRT-based linking procedure that adjusts for rater variations. Test scores from two administrations of a statewide reading assessment were used. An anchor set of Year 1 students' constructed responses were rescored by Year 2 raters. To adjust for year-to-year rater variation in IRT…

  14. Two Models of Raters in a Structured Oral Examination: Does It Make a Difference?

    ERIC Educational Resources Information Center

    Touchie, Claire; Humphrey-Murto, Susan; Ainslie, Martha; Myers, Kathryn; Wood, Timothy J.

    2010-01-01

    Oral examinations have become more standardized over recent years. Traditionally a small number of raters were used for this type of examination. Past studies suggested that more raters should improve reliability. We compared the results of a multi-station structured oral examination using two different rater models, those based in a station,…

  15. Do Raters Demonstrate Halo Error When Scoring a Series of Responses?

    ERIC Educational Resources Information Center

    Ridge, Kirk

    This study investigated whether raters in two different training groups would demonstrate halo error when each rater scored all five responses to five different mathematics performance-based items from each student. One group of 20 raters was trained by an experienced scoring director with item-specific scoring rubrics and the opportunity to…

  17. The Effect of Year-to-Year Rater Variation on IRT Linking

    ERIC Educational Resources Information Center

    Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg

    2005-01-01

    Year-to-year rater variation may result in constructed response (CR) parameter changes, making CR items inappropriate to use in anchor sets for linking or equating. This study demonstrates how rater severity affected the writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology…

  18. Rater Behaviour When Judging Language Learners' Pragmatic Appropriateness in Extended Discourse

    ERIC Educational Resources Information Center

    Sydorenko, Tetyana; Maynard, Carson; Guntly, Erin

    2014-01-01

    The criteria by which raters judge pragmatic appropriateness of language learners' speech acts are underexamined, especially when raters evaluate extended discourse. To shed more light on this process, the present study investigated what factors are salient to raters when scoring pragmatic appropriateness of extended request sequences, and which…

  1. [Global blindness].

    PubMed

    Schulze Schwering, M

    2007-10-01

    Worldwide there are 37 million people who are completely blind and another 112 million whose sight is severely restricted. Of all blind people throughout the world, 85% live in developing countries. In three quarters of cases, blindness could be prevented or treated. The VISION 2020 campaign is dedicated to halving the number of people suffering from the diseases leading to blindness by means of disease control, training of specialist ophthalmic staff and development of appropriate infrastructures. More effort is needed if these goals are to be met. German ophthalmologists engaged in conservative and surgical treatments who join in and support VISION 2020 will be welcomed.

  2. Blind digital holographic microscopy

    NASA Astrophysics Data System (ADS)

    Anderson, Patrick N.; Wiegandt, Florian; Treacher, Daniel J.; Mang, Matthias M.; Gianani, Ilaria; Schiavi, Andrea; Lloyd, David T.; O'Keeffe, Kevin; Hooker, Simon M.; Walmsley, Ian A.

    2017-03-01

    A blind variant of digital holographic microscopy is presented that removes the requirement for a well-characterized, highly divergent reference beam. This is achieved by adopting an off-axis recording geometry where a sequence of holograms is recorded as the reference is tilted, and an iterative algorithm that estimates the amplitudes and phases of both beams while simultaneously enhancing the numerical aperture. Numerical simulations have demonstrated the accuracy and robustness of this approach when applied to the coherent imaging problem.

  3. Inter-rater agreement in the diagnosis of mucositis and peri-implantitis.

    PubMed

    Merli, Mauro; Bernardelli, Francesco; Giulianelli, Erica; Toselli, Ivano; Moscatelli, Marco; Pagliaro, Umberto; Nieri, Michele

    2014-09-01

    The objective was to assess the inter-rater agreement in the diagnosis of mucositis and peri-implantitis. Adult patients with ≥ 1 dental implant were eligible. Three operators examined the patients. One examiner allocated the patients to three groups of nine as follows: nine implants with peri-implantitis, nine implants with mucositis, and nine implants with healthy mucosa. For all 27 patients (one implant per patient), each examiner recorded recessions, probing depth, bleeding on probing, suppuration, keratinized tissue depth and bone loss, leading to a final diagnosis of mucositis, peri-implantitis or healthy mucosa. Examiners were independent and blinded to each other. Fleiss k-statistic with quadratic weight in the diagnosis of peri-implantitis and mucositis was 0.66 [CI95%: 0.45-0.87]. A complete agreement was obtained only in 14 cases (52%). Fleiss k-statistics in bleeding on probing and bone loss were respectively 0.31 [CI95%: 0.20-0.41] and 0.70 [CI95%: 0.45-0.94]. Intra-class correlation coefficients for recession, probing depth and keratinized tissue depth were respectively 0.69 [CI95%: 0.62-0.75], 0.54 [CI95%: 0.44-0.63] and 0.56 [CI95%: 0.27-0.77]. The inter-rater agreement in the diagnosis of peri-implant disease was qualified as merely good. This could also be due in part to the unclear definition of peri-implantitis and mucositis. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  4. Patient education programme on immunotherapy in multiple sclerosis (PEPIMS): a controlled rater-blinded study.

    PubMed

    Köpke, S; Kasper, J; Flachenecker, P; Meißner, H; Brandt, A; Hauptmann, B; Bender, G; Backhus, I; Rahn, A C; Pöttgen, J; Vettorazzi, E; Heesen, C

    2017-02-01

    To investigate the effectiveness of a multi-component evidence-based education programme on disease modifying therapies in multiple sclerosis. Controlled trial with two consecutive patient cohorts and a gap of two months between cohorts. Three neurological rehabilitation centres. Patients with multiple sclerosis within rehabilitation. Control group (CG) participants were recruited and received standard information. Two months later, intervention group (IG) participants were recruited and received a six-hour nurse-led interactive group education programme consisting of two parts and a comprehensive information brochure. Primary endpoint was "informed choice", comprising adequate risk knowledge in combination with congruency between attitude towards immunotherapy and actual immunotherapy uptake. Further outcomes comprised risk knowledge, decision autonomy, anxiety and depression, self-efficacy, and fatigue. A total of 156 patients were included (IG=75, CG=81). The intervention led to significantly more participants with informed choice (IG: 47% vs. CG: 23%, P=0.004). The rate of persons with adequate risk knowledge was significantly higher in the IG two weeks after the intervention (IG: 54% vs. CG: 31%, P=0.007), but not after six months (IG: 48% vs. CG: 31%, P=0.058). No significant differences were shown for positive attitude towards disease modifying therapy (IG: 62% vs. CG: 71%, P=0.29) and for disease modifying therapy status after six months (IG: 61.5% vs CG: 68.6%, P=0.39). Also no differences were found for autonomy preferences and decisional conflict after six months. Delivering evidence-based information on multiple sclerosis disease modifying therapies within a rehabilitation setting led to a marked increase of informed choices.

  5. The Inter-rater Reliability of the Functional Movement Screen Within an Athletic Population Using Untrained Raters.

    PubMed

    Leeder, Jade E; Horsley, Ian G; Herrington, Lee C

    2016-09-01

    Elias JE. The inter-rater reliability of the functional movement screen within an athletic population using untrained raters. J Strength Cond Res 30(9): 2591-2599, 2016. The functional movement screen (FMS) is a commonly used screening tool designed to identify restrictions to movement patterns and increased injury risk using 7 predesigned tests. The purpose of this study was to analyze the inter-rater reliability of scoring of the FMS using a group of "untrained" subjects. Additionally, the study examined whether clinical experience level had any effect on reliability. Twenty fully qualified physiotherapists working with elite athletes at the English Institute of Sport volunteered to participate in the study. The group comprised both level 2 and level 3 physiotherapists based on clinical experience levels. Five elite athletes, free from injury, were recruited and videoed completing 6 of the 7 FMS tests using a 3 camera system. The videos were scored by each physiotherapist using the standardized scoring sheet, as developed by Cook et al. Each practitioner marked each athlete completing the 6 tests. The total scores were calculated for each athlete (maximum score of 18). The inter-rater reliability of the test was shown to be high (intraclass correlation coefficient 0.906). An independent t test showed no significant differences between the level 2 and level 3 practitioners in the total scores (p = 0.502). The results of the test indicate that the FMS is a reliable screening tool when used by untrained practitioners in determining faulty movement patterns and that clinical experience level does not affect the reliability; therefore, it may be a useful tool in the screening of athletic populations.

  6. Establishing intra- and inter-rater agreement of the Face, Legs, Activity, Cry, Consolability scale for evaluating pain in toddlers during immunization

    PubMed Central

    Gomez, Rebecca J; Barrowman, Nick; Elia, Sonja; Manias, Elizabeth; Royle, Jenny; Harrison, Denise

    2013-01-01

    BACKGROUND: The Face, Legs, Activity, Cry, Consolability (FLACC) scale is a five-item tool that was developed to assess postoperative pain in young children. The tool is frequently used as an outcome measure in studies investigating acute procedural pain in young children; however, there are limited published psychometric data in this context. OBJECTIVE: To establish inter-rater and intrarater agreement of the FLACC scale in toddlers during immunization. METHODS: Participants comprised a convenience sample of toddlers recruited from an immunization drop-in service, who were part of a larger pilot randomized controlled trial. Toddlers were video- and audiotaped during immunization procedures. The first rater scored each video twice in random order over a period of three weeks (intrarater agreement), while the second rater scored each video once and was blinded to the first rater’s scores (inter-rater agreement). The FLACC scale was scored at four time-points throughout the procedure. Intraclass correlation coefficients were used to assess agreement of the FLACC scale. RESULTS: Thirty toddlers between 12 and 18 months of age were recruited, and video data were available for 29. Intrarater agreement coefficients were 0.88 at baseline, 0.97 at insertion of first needle, and 0.80 and 0.81 at 15 s and 30 s following the final injection, respectively. Inter-rater coefficients were 0.40 at baseline, 0.95 at insertion of first needle, and 0.81 and 0.78 at 15 s and 30 s following the final injection, respectively. CONCLUSIONS: The FLACC scale has sufficient agreement in assessing pain in toddlers during immunizations, especially during the most painful periods of the procedure. PMID:24308028

  7. A Randomized, Placebo-Controlled, Active-Reference, Double-Blind, Flexible-Dose Study of the Efficacy of Vortioxetine on Cognitive Function in Major Depressive Disorder.

    PubMed

    Mahableshwarkar, Atul R; Zajecka, John; Jacobson, William; Chen, Yinzhong; Keefe, Richard S E

    2015-07-01

    This multicenter, randomized, double-blind, placebo-controlled, active-referenced (duloxetine 60 mg), parallel-group study evaluated the short-term efficacy and safety of vortioxetine (10-20 mg) on cognitive function in adults (aged 18-65 years) diagnosed with major depressive disorder (MDD) who self-reported cognitive dysfunction. Efficacy was evaluated using ANCOVA for the change from baseline to week 8 in the digit symbol substitution test (DSST)-number of correct symbols as the prespecified primary end point. The patient-reported perceived deficits questionnaire (PDQ) and physician-assessed clinical global impression (CGI) were analyzed in a prespecified hierarchical testing sequence as key secondary end points. Additional predefined end points included the objective performance-based University of San Diego performance-based skills assessment (UPSA) (ANCOVA) to measure functionality, MADRS (MMRM) to assess efficacy in depression, and a prespecified multiple regression analysis (path analysis) to calculate direct vs indirect effects of vortioxetine on cognitive function. Safety and tolerability were assessed at all visits. Vortioxetine was statistically superior to placebo on the DSST (P < 0.05), PDQ (P < 0.01), CGI-I (P < 0.001), MADRS (P < 0.05), and UPSA (P < 0.001). Path analysis indicated that vortioxetine's cognitive benefit was primarily a direct treatment effect rather than due to alleviation of depressive symptoms. Duloxetine was not significantly different from placebo on the DSST or UPSA, but was superior to placebo on the PDQ, CGI-I, and MADRS. Common adverse events (incidence ⩾ 5%) for vortioxetine were nausea, headache, and diarrhea. In this study of MDD adults who self-reported cognitive dysfunction, vortioxetine significantly improved cognitive function, depression, and functionality and was generally well tolerated.

  8. A Randomized, Placebo-Controlled, Active-Reference, Double-Blind, Flexible-Dose Study of the Efficacy of Vortioxetine on Cognitive Function in Major Depressive Disorder

    PubMed Central

    Mahableshwarkar, Atul R; Zajecka, John; Jacobson, William; Chen, Yinzhong; Keefe, Richard SE

    2015-01-01

    This multicenter, randomized, double-blind, placebo-controlled, active-referenced (duloxetine 60 mg), parallel-group study evaluated the short-term efficacy and safety of vortioxetine (10–20 mg) on cognitive function in adults (aged 18–65 years) diagnosed with major depressive disorder (MDD) who self-reported cognitive dysfunction. Efficacy was evaluated using ANCOVA for the change from baseline to week 8 in the digit symbol substitution test (DSST)–number of correct symbols as the prespecified primary end point. The patient-reported perceived deficits questionnaire (PDQ) and physician-assessed clinical global impression (CGI) were analyzed in a prespecified hierarchical testing sequence as key secondary end points. Additional predefined end points included the objective performance-based University of San Diego performance-based skills assessment (UPSA) (ANCOVA) to measure functionality, MADRS (MMRM) to assess efficacy in depression, and a prespecified multiple regression analysis (path analysis) to calculate direct vs indirect effects of vortioxetine on cognitive function. Safety and tolerability were assessed at all visits. Vortioxetine was statistically superior to placebo on the DSST (P<0.05), PDQ (P<0.01), CGI-I (P<0.001), MADRS (P<0.05), and UPSA (P<0.001). Path analysis indicated that vortioxetine's cognitive benefit was primarily a direct treatment effect rather than due to alleviation of depressive symptoms. Duloxetine was not significantly different from placebo on the DSST or UPSA, but was superior to placebo on the PDQ, CGI-I, and MADRS. Common adverse events (incidence ⩾5%) for vortioxetine were nausea, headache, and diarrhea. In this study of MDD adults who self-reported cognitive dysfunction, vortioxetine significantly improved cognitive function, depression, and functionality and was generally well tolerated. PMID:25687662

  9. Genetics Home Reference: sialidosis

    MedlinePlus

    ... features. Sialidosis type I, also referred to as cherry-red spot myoclonus syndrome, is the less severe ... or night blindness. An eye abnormality called a cherry-red spot, which can be identified with an ...

  10. Inter-rater Agreement of Nasal Endoscopy in Patients with a Prior History of Endoscopic Sinus Surgery

    PubMed Central

    McCoul, Edward D.; Smith, Timothy L.; Mace, Jess C.; Anand, Vijay K.; Senior, Brent A.; Hwang, Peter H.; Stankiewicz, James A.; Tabaee, Abtin

    2012-01-01

    OBJECTIVE Nasal endoscopy is an important part of the clinical evaluation of patients with chronic rhinosinusitis. However, its objectivity and inter-rater agreement have not been well studied, especially in patients who have previously had sinus surgery. METHODS Patients with a history of endoscopic sinus surgery for chronic rhinosinusitis were prospectively enrolled from a tertiary rhinology practice. Fourteen endoscopic nasal examinations were recorded using digital video capture software. Each patient also underwent computerized tomography (CT) and completed the Sinonasal Outcome Test (SNOT-22). Blinded review of inflammatory and anatomic findings for each video was independently performed by 5 academic rhinologists at separate institutions. Comparisons were performed using the unweighted Fleiss’ kappa statistic (Kf) and the prevalence- and bias-adjusted kappa (PABAK). RESULTS There were no significant correlations between age, Lund-Mackay score or SNOT-22 score. Inter-rater agreement was variable across the characteristics studied. Mean PABAK was excellent for the assessment of polyps (Kf =0.886); moderate for the assessments of middle turbinate (MT) integrity (Kf =0.543), MT position (Kf =0.443), maxillary sinus patency (Kf =0.593) and ethmoid sinus patency (Kf =0.429); fair for discharge (Kf =0.314), synechiae (Kf =0.257) and middle meatus patency (Kf =0.229); and poor for MT mucosal changes (Kf =0.148) and uncinate process (Kf =0.126). CONCLUSIONS The current study was notable for variability in the inter-rater agreement among the inflammatory and anatomic attributes that were examined. Further standardization of nasal endoscopy with regard to interpretation may improve the reliability of this procedure in clinical practice. PMID:22696506
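
    The two statistics named in this abstract can be sketched from a table of category counts per subject. The code below is a minimal illustration, assuming every subject is rated by the same number of raters. For PABAK it uses the form (k * Po - 1) / (k - 1), which reduces to 2 * Po - 1 for two categories. Whether the study used exactly this multi-rater form is an assumption, and the function names and counts are hypothetical.

```python
import numpy as np

def _observed_agreement(counts):
    """Mean pairwise agreement Po across subjects (Fleiss' definition)."""
    c = np.asarray(counts, dtype=float)
    m = c[0].sum()                      # raters per subject (assumed constant)
    p_i = ((c ** 2).sum(axis=1) - m) / (m * (m - 1))
    return p_i.mean()

def fleiss_kappa(counts):
    """counts[i][j] = number of raters placing subject i in category j."""
    c = np.asarray(counts, dtype=float)
    p_obs = _observed_agreement(c)
    p_j = c.sum(axis=0) / c.sum()       # overall share of each category
    p_exp = (p_j ** 2).sum()            # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

def pabak(counts):
    """Prevalence- and bias-adjusted kappa: chance agreement fixed at 1/k."""
    c = np.asarray(counts, dtype=float)
    k = c.shape[1]
    return (k * _observed_agreement(c) - 1) / (k - 1)

# Illustrative only: 4 videos, 5 raters, finding present / absent.
example = [[5, 0], [4, 1], [0, 5], [5, 0]]
print(fleiss_kappa(example), pabak(example))
```

    When one category dominates, the expected agreement in Fleiss' kappa becomes large and the kappa value can drop sharply even though raters mostly agree; PABAK sidesteps this by fixing chance agreement at 1/k.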

  11. Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies

    PubMed Central

    Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry

    2017-01-01

    Objectives To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Design Systematic review and narrative synthesis of reproducibility studies. Data sources Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Eligibility criteria Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. Results From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range intraclass correlation coefficient 0.86 to κ−0

  12. Is the Parkinson Anxiety Scale comparable across raters?

    PubMed

    Forjaz, Maria João; Ayala, Alba; Martinez-Martin, Pablo; Dujardin, Kathy; Pontone, Gregory M; Starkstein, Sergio E; Weintraub, Daniel; Leentjens, Albert F G

    2015-04-01

    The Parkinson Anxiety Scale is a new scale developed to measure anxiety severity in Parkinson's disease specifically. It consists of three dimensions: persistent anxiety, episodic anxiety, and avoidance behavior. This study aimed to assess the measurement properties of the scale while controlling for the rater (self- vs. clinician-rated) effect. The Parkinson Anxiety Scale was administered to a cross-sectional multicenter international sample of 362 Parkinson's disease patients. Both patients and clinicians rated the patient's anxiety independently. A many-facet Rasch model design was applied to estimate and remove the rater effect. The following measurement properties were assessed: fit to the Rasch model, unidimensionality, reliability, differential item functioning, item local independency, interrater reliability (self or clinician), and scale targeting. In addition, test-retest stability, construct validity, precision, and diagnostic properties of the Parkinson Anxiety Scale were also analyzed. A good fit to the Rasch model was obtained for Parkinson Anxiety Scale dimensions A and B, after the removal of one item and rescoring of the response scale for certain items, whereas dimension C showed marginal fit. Self versus clinician rating differences were of small magnitude, with patients reporting higher anxiety levels than clinicians. The linear measure for Parkinson Anxiety Scale dimensions A and B showed good convergent construct with other anxiety measures and good diagnostic properties. Parkinson Anxiety Scale modified dimensions A and B provide valid and reliable measures of anxiety in Parkinson's disease that are comparable across raters. Further studies are needed with dimension C. © 2014 International Parkinson and Movement Disorder Society.

  13. The Effect of Spiritual Intervention on Postmenopausal Depression in Women Referred to Urban Healthcare Centers in Isfahan: A Double-Blind Clinical Trial

    PubMed Central

    Shafiee, Zohre; Zandiyeh, Zahra; Moeini, Mahin; Gholami, Ali

    2016-01-01

    Background: Depression is not only common after menopause, but also affects postmenopausal women more than other women. Some studies show the positive effects of spiritual intervention on postmenopausal women and depressed patients. However, there is inadequate experimental data for supporting the effectiveness of such interventions. Objectives: This study investigated the effect of a spiritual intervention on postmenopausal depression in women referred to urban healthcare centers in Isfahan, Iran. Patients and Methods: A randomized controlled clinical trial was conducted on postmenopausal women referred to the healthcare centers of Isfahan. Sixty-four women with postmenopausal depression were assigned randomly into an experimental group (n = 32) and a control group (n = 32). The experimental group received eight sessions of spiritual intervention while the control group received two sessions of training on healthy diet for postmenopausal women. All subjects in the experimental group and the control group responded to the Beck’s depression inventory at the start of the study, at the end of the fourth week, and a month after the last educational session. In addition to descriptive statistics, the chi-square test, independent samples t-test and repeated measures analysis of variance were used to analyze the data. Results: Before the intervention, the study groups did not differ significantly in terms of mean depression scores (20.76 ± 4.61 vs. 19.58 ± 5.27, P = 0.33). However, immediately after intervention and after one month, the mean depression scores of 11.01 ± 7.85 and 11.21 ± 9.23 in the experimental group were significantly lower than the control group (19.22 ± 4.94 and 19.34 ± 4.92, respectively) (P = 0.001). In repeated measures analysis of variance, Mauchly’s test of sphericity was not significant (P = 0.672), and in the test of within-subjects effects, a significant interaction was found between the spiritual intervention and time. Conclusions

  14. A comparison of analysis procedures for correlated binary data in dedicated multi-rater imaging trials.

    PubMed

    Kunz, Michael

    2015-01-01

    In this paper, three analysis procedures for repeated correlated binary data with no a priori ordering of the measurements are described and subsequently investigated. Examples for correlated binary data could be the binary assessments of subjects obtained by several raters in the framework of a clinical trial. This topic is especially of relevance when success criteria have to be defined for dedicated imaging trials involving several raters conducted for regulatory purposes. First, an analytical result on the expectation of the 'Majority rater' is presented when only the marginal distributions of the single raters are given. The paper provides a simulation study where all three analysis procedures are compared for a particular setting. It turns out that in many cases, 'Average rater' is associated with a gain in power. Settings were identified where 'Majority significant' has favorable properties. 'Majority rater' is in many cases difficult to interpret. Copyright © 2014 John Wiley & Sons, Ltd.
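
    To make two of the named procedures concrete, the sketch below computes a success proportion from a subjects-by-raters table of binary assessments in two ways. The definitions are assumptions inferred from the procedure names: "Average rater" is taken as the mean of the individual raters' success proportions, and "Majority rater" as the proportion of subjects for whom more than half of the raters recorded a success. The paper's exact definitions, and its third procedure "Majority significant", are not reproduced here.

```python
import numpy as np

def average_rater(assessments):
    """Mean of the per-rater success proportions (assumed definition)."""
    a = np.asarray(assessments, dtype=float)
    return a.mean(axis=0).mean()

def majority_rater(assessments):
    """Share of subjects rated a success by more than half of the raters.

    Assumed definition; with an even number of raters, ties count as failure.
    """
    a = np.asarray(assessments)
    n_raters = a.shape[1]
    return (a.sum(axis=1) > n_raters / 2).mean()

# Illustrative only: 4 subjects (rows) assessed by 3 raters (columns), 1 = success.
reads = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
print(average_rater(reads), majority_rater(reads))
```

    The two summaries can differ noticeably on the same data, because the majority vote discards how close each per-subject vote was.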

  15. Assessment and Correlation of Customer and Rater Response to Cold-Start and Warmup Driveability

    DTIC Science & Technology

    1993-08-01

    Satisfaction Versus Fuel Volatility Level - TWD Vehicles ... Comparison of Customer and Rater Results ... AD-A271 775 CRC Report No. 585 ASSESSMENT AND CORRELATION OF CUSTOMER AND RATER RESPONSE TO COLD-START AND WARMUP DRIVEABILITY ... Society of Automotive Engineers, Inc.

  16. Intra and inter-rater reliability study of pelvic floor muscle dynamometric measurements

    PubMed Central

    Martinho, Natalia M.; Marques, Joseane; Silva, Valéria R.; Silva, Silvia L. A.; Carvalho, Leonardo C.; Botelho, Simone

    2015-01-01

    OBJECTIVE: The aim of this study was to evaluate the intra and inter-rater reliability of pelvic floor muscle (PFM) dynamometric measurements for maximum and average strengths, as well as endurance. METHOD: A convenience sample of 18 nulliparous women, without any urogynecological complaints, aged between 19 and 31 (mean age of 25.4±3.9) participated in this study. They were evaluated using a pelvic floor dynamometer based on load cell technology. The dynamometric evaluations were repeated in three successive sessions: two on the same day with a rest period of 30 minutes between them, and the third on the following day. All participants were evaluated twice in each session; first by examiner 1 followed by examiner 2. The vaginal dynamometry data were analyzed using three parameters: maximum strength, average strength, and endurance. The Intraclass Correlation Coefficient (ICC) was applied to estimate the PFM dynamometric measurement reliability, considering a good level as being above 0.75. RESULTS: The intra and inter-raters' analyses showed good reliability for maximum strength (ICCintra-rater1=0.96, ICCintra-rater2=0.95, and ICCinter-rater=0.96), average strength (ICCintra-rater1=0.96, ICCintra-rater2=0.94, and ICCinter-rater=0.97), and endurance (ICCintra-rater1=0.88, ICCintra-rater2=0.86, and ICCinter-rater=0.92) dynamometric measurements. CONCLUSIONS: The PFM dynamometric measurements showed good intra- and inter-rater reliability for maximum strength, average strength and endurance, which demonstrates that this is a reliable device that can be used in clinical practice. PMID:25993624

  17. A double-blind, randomized, placebo-controlled, active reference study of Lu AA21004 in patients with major depressive disorder.

    PubMed

    Alvarez, Enric; Perez, Victor; Dragheim, Marianne; Loft, Henrik; Artigas, Francesc

    2012-06-01

    The efficacy, safety, and tolerability of Lu AA21004 vs. placebo using venlafaxine XR as active reference in patients with DSM-IV-TR major depressive disorder (MDD) were evaluated. Lu AA21004 is a novel antidepressant that is a 5-HT3 and 5-HT7 receptor antagonist, 5-HT1A receptor agonist, 5-HT1B receptor partial agonist and inhibitor of the 5-HT transporter in recombinant cell lines. In this 6-wk, multi-site study, 429 patients were randomly assigned (1:1:1:1) to 5 or 10 mg Lu AA21004, placebo or 225 mg venlafaxine XR. All patients had a baseline Montgomery-Åsberg Depression Rating Scale (MADRS) total score ≥ 30. The primary efficacy analysis was based on the MADRS total score adjusting for multiplicity using a hierarchical testing procedure starting with the highest dose vs. placebo. Lu AA21004 was statistically significantly superior to placebo (n=105) in mean change from baseline in MADRS total score at week 6 (p<0.0001, last observation carried forward), with a mean treatment difference vs. placebo of 5.9 (5 mg, n=108), and 5.7 (10 mg, n=100) points. Venlafaxine XR (n=112) was also significantly superior to placebo at week 6 (p<0.0001). In total, 30 patients withdrew due to adverse events (AEs)--placebo: four (4%); 5 mg Lu AA21004: three (3%); 10 mg Lu AA21004: seven (7%); and venlafaxine: 16 (14%). The most common AEs were nausea, headache, hyperhidrosis, and dry mouth. No clinically relevant changes over time were seen in the clinical laboratory results, vital signs, weight, or ECG parameters. In this study, treatment with 5 mg and 10 mg Lu AA21004 for 6 wk was efficacious and well tolerated in patients with MDD.

  18. A double-blind, randomized, placebo-controlled, active reference study of Lu AA21004 in patients with major depressive disorder

    PubMed Central

    Alvarez, Enric; Perez, Victor; Dragheim, Marianne; Loft, Henrik; Artigas, Francesc

    2012-01-01

    The efficacy, safety, and tolerability of Lu AA21004 vs. placebo using venlafaxine XR as active reference in patients with DSM-IV-TR major depressive disorder (MDD) were evaluated. Lu AA21004 is a novel antidepressant that is a 5-HT3 and 5-HT7 receptor antagonist, 5-HT1A receptor agonist, 5-HT1B receptor partial agonist and inhibitor of the 5-HT transporter in recombinant cell lines. In this 6-wk, multi-site study, 429 patients were randomly assigned (1:1:1:1) to 5 or 10 mg Lu AA21004, placebo or 225 mg venlafaxine XR. All patients had a baseline Montgomery–Åsberg Depression Rating Scale (MADRS) total score ⩾30. The primary efficacy analysis was based on the MADRS total score adjusting for multiplicity using a hierarchical testing procedure starting with the highest dose vs. placebo. Lu AA21004 was statistically significantly superior to placebo (n=105) in mean change from baseline in MADRS total score at week 6 (p<0.0001, last observation carried forward), with a mean treatment difference vs. placebo of 5.9 (5 mg, n=108), and 5.7 (10 mg, n=100) points. Venlafaxine XR (n=112) was also significantly superior to placebo at week 6 (p<0.0001). In total, 30 patients withdrew due to adverse events (AEs) – placebo: four (4%); 5 mg Lu AA21004: three (3%); 10 mg Lu AA21004: seven (7%); and venlafaxine: 16 (14%). The most common AEs were nausea, headache, hyperhidrosis, and dry mouth. No clinically relevant changes over time were seen in the clinical laboratory results, vital signs, weight, or ECG parameters. In this study, treatment with 5 mg and 10 mg Lu AA21004 for 6 wk was efficacious and well tolerated in patients with MDD. PMID:21767441

  19. Can training improve the quality of inferences made by raters in competency modeling? A quasi-experiment.

    PubMed

    Lievens, Filip; Sanchez, Juan I

    2007-05-01

    A quasi-experiment was conducted to investigate the effects of frame-of-reference training on the quality of competency modeling ratings made by consultants. Human resources consultants from a large consulting firm were randomly assigned to either a training or a control condition. The discriminant validity, interrater reliability, and accuracy of the competency ratings were significantly higher in the training group than in the control group. Further, the discriminant validity and interrater reliability of competency inferences were highest among an additional group of trained consultants who also had competency modeling experience. Together, these results suggest that procedural interventions such as rater training can significantly enhance the quality of competency modeling.

  20. Grant Peer Review: Improving Inter-Rater Reliability with Training

    SciTech Connect

    Sattler, David N.; McKnight, Patrick E.; Naney, Linda; Mathis, Randy

    2015-06-15

    In this study, we developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers—especially those with experience—have good understanding of the grant review rating scale. Our findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. Lastly, the results underscore the benefits of and need for specialized peer reviewer training.

  1. Acute blindness.

    PubMed

    Abboud, H; Sabbagh, C

    2008-11-01

    A 15-year-old man presenting with cortical blindness as the initial symptom of mitochondrial encephalomyopathy, lactic acidosis and stroke-like episodes (MELAS) is reported. He showed fluctuating consciousness and severe occipital headache with nausea and vomiting. T2 and diffusion-weighted magnetic resonance imaging showed high signal intensity in the occipital lobes. Electroencephalography showed diffuse sharp waves with focal epileptic discharges over the posterior region. The nature of stroke-like episodes and seizure mechanisms is unexplained in MELAS. Consequently, the possible mechanisms of the cortical blindness in this case are discussed.

  2. Intra- and inter-rater reliability of isometric shoulder extensor and internal rotator strength measurements performed using a hand-held dynamometer.

    PubMed

    Awatani, Takenori; Morikita, Ikuhiro; Shinohara, Junji; Mori, Seigo; Nariai, Miki; Tatsumi, Yasutaka; Nagata, Akinori; Koshiba, Hiroya

    2016-11-01

    [Purpose] The purpose of the present study was to establish the intra- and inter-rater reliability of measurement of extensor strength in the maximum shoulder abducted position and internal rotator strength in the 90° abducted and the 90° external rotated position using a hand-held dynamometer. [Subjects and Methods] Twelve healthy volunteers (12 male; mean ± SD: age 19.0 ± 1.1 years) participated in the study. The examiners were two students who had nonclinical experience with hand-held dynamometer measurement. The examiners and participants were blinded to measurement results by the recorder. Participants in the prone position were instructed to hold the contraction against the ground reaction force, and peak isometric force was recorded using the hand-held dynamometer on the floor. Reliability was determined using intraclass correlation coefficients. [Results] The intra- and inter-rater reliability data were found to be "almost perfect". [Conclusion] This study investigated intra- and inter-rater reliability and revealed high reliability. Thus, the measurement method used in the present study can evaluate muscle strength by a simple measurement technique.

  3. Rating the raters: assessing the quality of Hamilton rating scale for depression clinical interviews in two industry-sponsored clinical drug trials.

    PubMed

    Engelhardt, Nina; Feiger, Alan D; Cogger, Kenneth O; Sikich, Dawn; DeBrota, David J; Lipsitz, Joshua D; Kobak, Kenneth A; Evans, Kenneth R; Potter, William Z

    2006-02-01

    The quality of clinical interviews conducted in industry-sponsored clinical drug trials is an important but frequently overlooked variable that may influence the outcome of a study. We evaluated the quality of Hamilton Rating Scale for Depression (HAM-D) clinical interviews performed at baseline in 2 similar multicenter, randomized, placebo-controlled depression trials sponsored by 2 pharmaceutical companies. A total of 104 audiotaped HAM-D clinical interviews were evaluated by a blinded expert reviewer for interview quality using the Rater Applied Performance Scale (RAPS). The RAPS assesses adherence to a structured interview guide, clarification of and follow-up to patient responses, neutrality, rapport, and adequacy of information obtained. HAM-D interviews were brief and cursory and the quality of interviews was below what would be expected in a clinical drug trial. Thirty-nine percent of the interviews were conducted in 10 minutes or less, and most interviews were rated fair or unsatisfactory on most RAPS dimensions. Results from our small sample illustrate that the clinical interview skills of raters who administered the HAM-D were below what many would consider acceptable. Evaluation and training of clinical interview skills should be considered as part of a rater training program.

  4. Blindness Clues

    ERIC Educational Resources Information Center

    Science Teacher, 2005

    2005-01-01

    Age-related macular degeneration is the leading cause of blindness in older adults, yet researchers are still in the dark about many of the factors that cause this incurable disease. But new insight from University of Florida (UF) and German researchers about a genetic link between rhesus monkeys with macular degeneration and humans could unlock…

  5. Blind Ambition

    ERIC Educational Resources Information Center

    Olson, Catherine Applefeld

    2009-01-01

    No matter how dedicated they may be, some teachers are daunted by extreme challenges. Carol Agler, music director at the Ohio State School for the Blind (OSSB), is not one of those teachers. Since joining the OSSB staff 11 years ago, Agler has revived the school's long-dormant band program and created its first marching band. Next January, she…

  7. Foundation Fighting Blindness

    MedlinePlus

    ... Campaign to End Blindness Other Ways to Fight Blindness Corporate Support Volunteer Take Action Honor a Loved ... taking place nationwide. Join Us We Are Ending Blindness The urgent mission of the Foundation Fighting Blindness ...

  8. Wounds measured from digital photographs using photodigital planimetry software: validation and rater reliability.

    PubMed

    Wendelken, Martin E; Berg, William T; Lichtenstein, Philip; Markowitz, Lee; Comfort, Christopher; Alvarez, Oscar M

    2011-09-01

    Traditional wound tracing technique consists of tracing the perimeter of the wound on clear acetate with a fine-tip marker, then placing the tracing on graph paper and counting the grids to calculate the surface area. Standard wound measurement technique for calculating wound surface area (wound tracing) was compared to a new wound measurement method using digital photo-planimetry software ([DPPS], PictZar® Digital Planimetry). Two hundred wounds of varying etiologies were measured and traced by experienced examiners (raters). Simultaneously, digital photographs were also taken of each wound. The digital photographs were downloaded onto a PC, and using DPPS software, the wounds were measured and traced by the same examiners. Accuracy, intra- and interrater reliability of wound measurements obtained from tracings and from DPPS were studied and compared. Both accuracy and rater variability were directly related to wound size when wounds were measured and traced in the traditional manner. In small (< 4 cm2), regularly shaped (round or oval) wounds, accuracy and rater reliability were 98% and 95%, respectively. However, in larger, irregularly shaped wounds or wounds with epithelial islands, DPPS was more accurate than traditional measuring (3.9% vs. 16.2% [average error]). The mean inter-rater reliability score was 94% for DPPS and 84% for traditional measuring. The mean intrarater reliability score was 98.3% for DPPS and 89.3% for traditional measuring. In contrast to traditional measurements, DPPS may provide a more objective assessment since it can be done by a technician who is blinded to the treatment plan. Planimetry of digital photographs allows for a closer examination (zoom) of the wound and better visibility of advancing epithelium. Measurements of wounds performed on digital photographs using planimetry software were simple and convenient. It was more accurate, more objective, and resulted in better correlation within and

  9. Inter-rater agreement among orthodontists in a blocked experiment.

    PubMed

    Korn, E L; Baumrind, S

    1985-01-01

    Five orthodontists were asked to predict for 64 patients a particular dichotomous outcome of treatment based on pre-treatment X-ray films. The orthodontists rated the cases in blocks of size 4-6 with the knowledge of the number of positive outcomes in each block. We discuss the reasons why this blocked design is appropriate whenever clinicians are asked to rate cases which have not been randomly selected from a clinical practice similar to their own. We give a simple description of the inter-rater agreement for this type of blocked experiment as well as a procedure to test that the agreement is no better than that expected by random independent assignment.

  10. Innovations in Measuring Rater Accuracy in Standard Setting: Assessing "Fit" to Item Characteristic Curves

    ERIC Educational Resources Information Center

    Hurtz, Gregory M.; Jones, J. Patrick

    2009-01-01

    Standard setting methods such as the Angoff method rely on judgments of item characteristics; item response theory empirically estimates item characteristics and displays them in item characteristic curves (ICCs). This study evaluated several indexes of rater fit to ICCs as a method for judging rater accuracy in their estimates of expected item…

  11. Assessing and quantifying inter-rater variation for dichotomous ratings using a Rasch model.

    PubMed

    Petersen, Jørgen Holm; Larsen, Klaus; Kreiner, Svend

    2012-12-01

    We present a new model-based approach to the analysis of agreement between raters in a situation where all raters have supplied dichotomous ratings of the same cases in a sample. The model is a logistic regression model with random effects--a Rasch model. In the rater setting, the Rasch model includes parameters that allow raters to have different propensities to score a given set of individuals positively or negatively--the rater bias. An exact score test of the hypothesis of no rater bias is proposed and is shown to be an exact generalised McNemar's test. Based on the model, we suggest quantifying the rater variation as a suitable measure of the variation of the rater odds ratios. An important example that will serve to motivate and illustrate the proposed model, is the study of Umbilical artery Doppler velocimetry used by obstetricians to assess the status of a foetus. The purpose of the assessment is to improve the foetus' chance of survival by choosing the optimal time of elective delivery. In the study, data related to 139 perinatal deaths were sent to 32 experts who were asked whether the use of Doppler velocimetry might have prevented each death.

  12. A novel approach to rater training and certification in multinational trials.

    PubMed

    Jeglic, Elizabeth; Kobak, Kenneth A; Engelhardt, Nina; Williams, Janet B W; Lipsitz, Joshua D; Salvucci, Donna; Bryson, Heather; Bellew, Kevin

    2007-07-01

    Clinical trials are becoming increasingly international in scope. Global studies pose unique challenges in training and calibrating raters owing to language and cultural differences. Recent findings that poorly conducted interviews reduce study power make attention to raters' clinical skills critical. In this study, 109 raters from 14 countries went through a two-step certification process on the Hamilton Depression and Anxiety Rating Scales: (i) an online didactic tutorial on scoring conventions, and (ii) applied clinical training, consisting of small language-specific groups in which raters took turns interviewing patients while observed by an expert trainer, and observation and evaluation of individual interviews. Translators were used when native-language trainers were unavailable. Those who were unable to attend the startup meeting received the training individually via telephone. Results showed a significant improvement in raters' knowledge of scoring conventions, with the mean number of correct answers on the 20-item test improving from 14.59 to 17.83, P<0.0001. In addition, raters' clinical skills improved significantly, with the mean score on the Rater Applied Performance Scale improving from 10.25 at first testing to 11.31 at second testing, P=0.003. These results support the efficacy of this applied training model in improving raters' applied clinical skills in multinational trials.

  13. A Simulation Study of Rater Agreement Measures with 2x2 Contingency Tables

    ERIC Educational Resources Information Center

    Ato, Manuel; Lopez, Juan Jose; Benavente, Ana

    2011-01-01

    A comparison between six rater agreement measures obtained using three different approaches was achieved by means of a simulation study. Rater coefficients suggested by Bennet's [sigma] (1954), Scott's [pi] (1955), Cohen's [kappa] (1960) and Gwet's [gamma] (2008) were selected to represent the classical, descriptive approach, [alpha] agreement…
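
    The four descriptive coefficients named in this record differ only in how they estimate chance agreement. The sketch below is not taken from the study; it assumes the usual textbook definitions for a 2x2 table (Bennett's S with chance agreement fixed at 1/2, Scott's pi from pooled marginals, Cohen's kappa from each rater's own marginals, and Gwet's AC1 as the "gamma" coefficient). The function name and the example counts are hypothetical.

```python
def agreement_2x2(a, b, c, d):
    """Chance-corrected agreement for a 2x2 table of counts.

    a: both raters positive      b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive      d: both raters negative
    """
    n = a + b + c + d
    p_o = (a + d) / n                 # observed agreement
    r1_pos = (a + b) / n              # rater 1 share of positive ratings
    r2_pos = (a + c) / n              # rater 2 share of positive ratings
    pooled = (r1_pos + r2_pos) / 2    # pooled share of positive ratings

    # Each coefficient uses a different expected-agreement term.
    chance = {
        "bennett_s": 0.5,
        "scott_pi": pooled ** 2 + (1 - pooled) ** 2,
        "cohen_kappa": r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos),
        "gwet_ac1": 2 * pooled * (1 - pooled),
    }
    # Common form: (observed - chance) / (1 - chance).
    return {name: (p_o - p_e) / (1 - p_e) for name, p_e in chance.items()}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in agreement_2x2(40, 10, 5, 45).items():
        print(f"{name}: {value:.3f}")
```

    With nearly balanced marginals, as in the hypothetical counts above, the four values lie close together; they diverge as the marginal distributions become more skewed.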

  14. A Cross-Linguistic Investigation of the Effect of Raters' Accent Familiarity on Speaking Assessment

    ERIC Educational Resources Information Center

    Huang, Becky; Alegre, Analucia; Eisenberg, Ann

    2016-01-01

    The project aimed to examine the effect of raters' familiarity with accents on their judgments of non-native speech. Participants included three groups of raters who were either from Spanish Heritage, Spanish Non-Heritage, or Chinese Heritage backgrounds (n = 16 in each group) using Winke & Gass's (2013) definition of a heritage learner as…

  15. The Effect of Raters and Rating Conditions on the Reliability of the Missionary Teaching Assessment

    ERIC Educational Resources Information Center

    Ure, Abigail C.

    2011-01-01

    This study investigated how 2 different rating conditions, the controlled rating condition (CRC) and the uncontrolled rating condition (URC), affected rater behavior and the reliability of a performance assessment (PA) known as the Missionary Teaching Assessment (MTA). The CRC gives raters the capability to manipulate (pause, rewind, fast-forward)…

  16. Comparing Native and Non-Native Raters of US Federal Government Speaking Tests

    ERIC Educational Resources Information Center

    Brooks, Rachel Lunde

    2013-01-01

    Previous Language Testing research has largely reported that although many raters' characteristics affect their evaluations of language assessments (Reed & Cohen, 2001), being a native speaker or non-native speaker rater does not significantly affect final ratings (Kim, 2009). In Second Language Acquisition, some researchers conclude that…

  17. The inter-rater reliability of Strain Index and OCRA Checklist task assessments in cheese processing.

    PubMed

    Paulsen, Robert; Gallu, Tommaso; Gilkey, David; Reiser, Raoul; Murgia, Lelia; Rosecrance, John

    2015-11-01

    The purpose of this study was to characterize the inter-rater reliability of two physical exposure assessment methods of the upper extremity, the Strain Index (SI) and Occupational Repetitive Actions (OCRA) Checklist. These methods are commonly used in occupational health studies and by occupational health practitioners. Seven raters used the SI and OCRA Checklist to assess task-level physical exposures to the upper extremity of workers performing 21 cheese manufacturing tasks. Inter-rater reliability was characterized using a single-measure, agreement-based intraclass correlation coefficient (ICC). Inter-rater reliability of SI assessments was moderate to good (ICC = 0.59, 95% CI: 0.45-0.73), a similar finding to prior studies. Inter-rater reliability of OCRA Checklist assessments was excellent (ICC = 0.80, 95% CI: 0.70-0.89). Task complexity had a small, but non-significant, effect on the inter-rater reliability of SI and OCRA Checklist scores. Both the SI and OCRA Checklist assessments possess adequate inter-rater reliability for the purposes of occupational health research and practice. The OCRA Checklist inter-rater reliability scores were among the highest reported in the literature for semi-quantitative physical exposure assessment tools of the upper extremity. The OCRA Checklist, however, required more training time and more time to conduct the risk assessments than the SI.
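
    A single-measure, agreement-based ICC of the kind described above can be computed from a two-way ANOVA decomposition. The sketch below assumes the two-way random-effects, absolute-agreement form often written ICC(2,1); the record does not state which ICC model the authors used, so this is an illustration rather than their method. The function name and the small ratings table are hypothetical.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    ratings: list of rows, one row per rated task, one column per rater.
    """
    n = len(ratings)        # number of tasks (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for tasks, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Rater mean differences (ms_cols) count against agreement.
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Hypothetical table: 6 tasks scored by 4 raters.
    table = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(table), 3))
```

    Because systematic differences between raters enter the denominator, this form penalizes a rater who is consistently more lenient or more severe than the others, which a consistency-based ICC would not.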

  18. Monitoring Rater Performance over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use

    ERIC Educational Resources Information Center

    Myford, Carol M.; Wolfe, Edward W.

    2009-01-01

    In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…

  19. Exploring the Role of First Impressions in Rater-Based Assessments

    ERIC Educational Resources Information Center

    Wood, Timothy J.

    2014-01-01

    Medical education relies heavily on assessment formats that require raters to assess the competence and skills of learners. Unfortunately, there are often inconsistencies and variability in the scores raters assign. To ensure the scores from these assessment tools have validity, it is important to understand the underlying cognitive processes that…

  20. Managing Rater Effects through the Use of FACETS Analysis: The Case of a University Placement Test

    ERIC Educational Resources Information Center

    Wu, Siew Mei; Tan, Susan

    2016-01-01

    Rating essays is a complex task where students' grades could be adversely affected by test-irrelevant factors such as rater characteristics and rating scales. Understanding these factors and controlling their effects are crucial for test validity. Rater behaviour has been extensively studied through qualitative methods such as questionnaires and…

  2. Comparison of "E-Rater"[R] Automated Essay Scoring Model Calibration Methods Based on Distributional Targets

    ERIC Educational Resources Information Center

    Zhang, Mo; Williamson, David M.; Breyer, F. Jay; Trapani, Catherine

    2012-01-01

    This article describes two separate, related studies that provide insight into the effectiveness of "e-rater" score calibration methods based on different distributional targets. In the first study, we developed and evaluated a new type of "e-rater" scoring model that was cost-effective and applicable under conditions of absent human rating and…

  3. Approximate measurement invariance in cross-classified rater-mediated assessments.

    PubMed

    Kelcey, Ben; McGinn, Dan; Hill, Heather

    2014-01-01

    An important assumption underlying meaningful comparisons of scores in rater-mediated assessments is that measurement is commensurate across raters. When raters differentially apply the standards established by an instrument, scores from different raters are on fundamentally different scales and no longer preserve a common meaning and basis for comparison. In this study, we developed a method to accommodate measurement noninvariance across raters when measurements are cross-classified within two distinct hierarchical units. We conceptualized random item effects cross-classified graded response models and used random discrimination and threshold effects to test, calibrate, and account for measurement noninvariance among raters. By leveraging empirical estimates of rater-specific deviations in the discrimination and threshold parameters, the proposed method allows us to identify noninvariant items and empirically estimate and directly adjust for this noninvariance within a cross-classified framework. Within the context of teaching evaluations, the results of a case study suggested substantial noninvariance across raters and that establishing an approximately invariant scale through random item effects improves model fit and predictive validity.

  5. Controlling Rater Stringency Error in Clinical Performance Rating: Further Validation of a Performance Rating Theory.

    ERIC Educational Resources Information Center

    Cason, Gerald J.; And Others

    Prior research in a single clinical training setting has shown Cason and Cason's (1981) simplified model of their performance rating theory can improve rating reliability and validity through statistical control of rater stringency error. Here, the model was applied to clinical performance ratings of 14 cohorts (about 250 students and 200 raters)…

  6. Complementing Human Judgment of Essays Written by English Language Learners with E-Rater[R] Scoring

    ERIC Educational Resources Information Center

    Enright, Mary K.; Quinlan, Thomas

    2010-01-01

    E-rater[R] is an automated essay scoring system that uses natural language processing techniques to extract features from essays and to model statistically human holistic ratings. Educational Testing Service has investigated the use of e-rater, in conjunction with human ratings, to score one of the two writing tasks on the TOEFL-iBT[R] writing…

  8. The Meaning and Suitability of Various Effect Sizes for Structured Rater [times] Ratee Designs

    ERIC Educational Resources Information Center

    Honekopp, Johannes; Becker, Betsy Jane; Oswald, Frederick L.

    2006-01-01

    Four types of analysis are commonly applied to data from structured Rater [times] Ratee designs. These types are characterized by the unit of analysis, which is either raters or ratees, and by the design used, which is either between-units or within-unit design. The 4 types of analysis are quite different, and therefore they give rise to effect…

  10. Effects of Rating Task Instructions on Consistency and Accuracy of Expert Raters.

    ERIC Educational Resources Information Center

    Littlefield, John H.; Troendle, G. Roger

    The effect of different types of rating task instructions on rater behavior was examined using experts, as opposed to novices, as raters. The experts were instructed to (1) form a global categorical judgment (early hypothesis generation); (2) assess 19 detailed elements; or (3) both. Subjects were 8 dental faculty members who ranged in age from 28…

  11. Approximate measurement invariance in cross-classified rater-mediated assessments

    PubMed Central

    Kelcey, Ben; McGinn, Dan; Hill, Heather

    2014-01-01

    An important assumption underlying meaningful comparisons of scores in rater-mediated assessments is that measurement is commensurate across raters. When raters differentially apply the standards established by an instrument, scores from different raters are on fundamentally different scales and no longer preserve a common meaning and basis for comparison. In this study, we developed a method to accommodate measurement noninvariance across raters when measurements are cross-classified within two distinct hierarchical units. We conceptualized random item effects cross-classified graded response models and used random discrimination and threshold effects to test, calibrate, and account for measurement noninvariance among raters. By leveraging empirical estimates of rater-specific deviations in the discrimination and threshold parameters, the proposed method allows us to identify noninvariant items and empirically estimate and directly adjust for this noninvariance within a cross-classified framework. Within the context of teaching evaluations, the results of a case study suggested substantial noninvariance across raters and that establishing an approximately invariant scale through random item effects improves model fit and predictive validity. PMID:25566145

  12. Intra-rater variability in low-grade glioma segmentation.

    PubMed

    Bø, Hans Kristian; Solheim, Ole; Jakola, Asgeir Store; Kvistad, Kjell-Arne; Reinertsen, Ingerid; Berntsen, Erik Magnus

    2017-01-01

    Assessment of size and growth are key radiological factors in low-grade gliomas (LGGs), both for prognostication and treatment evaluation, but the reliability of LGG-segmentation is scarcely studied. With a diffuse and invasive growth pattern, usually without contrast enhancement, these tumors can be difficult to delineate. The aim of this study was to investigate the intra-observer variability in LGG-segmentation for a radiologist without prior segmentation experience. Pre-operative 3D FLAIR images of 23 LGGs were segmented three times in the software 3D Slicer. Tumor volumes were calculated, together with the absolute and relative difference between the segmentations. To quantify the intra-rater variability, we used the Jaccard coefficient comparing both two (J2) and three (J3) segmentations as well as the Hausdorff Distance (HD). The variability measured with J2 improved significantly between the two last segmentations compared to the two first, going from 0.87 to 0.90 (p = 0.04). Between the last two segmentations, larger tumors showed a tendency towards smaller relative volume difference (p = 0.07), while tumors with well-defined borders had significantly less variability measured with both J2 (p = 0.04) and HD (p < 0.01). We found no significant relationship between variability and histological sub-types or Apparent Diffusion Coefficients (ADC). We found that the intra-rater variability can be considerable in serial LGG-segmentation, but the variability seems to decrease with experience and higher grade of border conspicuity. Our findings highlight that some criteria defining tumor borders and progression in 3D volumetric segmentation are needed, if moving from 2D to 3D assessment of size and growth of LGGs.
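
    The Jaccard coefficient used above is the size of the overlap between segmentations divided by the size of their union. The sketch below treats each segmentation as a set of voxel coordinates and assumes that the three-segmentation variant (J3) is the common intersection over the common union; the record does not spell out that definition, so it is an assumption. The function name and the toy voxel sets are hypothetical.

```python
def jaccard(*segmentations):
    """Jaccard coefficient for two or more segmentations.

    Each segmentation is an iterable of voxel coordinates, e.g. (x, y, z).
    Returns |intersection| / |union|; two or more empty masks count as 1.0.
    """
    voxel_sets = [set(seg) for seg in segmentations]
    if len(voxel_sets) < 2:
        raise ValueError("need at least two segmentations")
    union = set.union(*voxel_sets)
    if not union:
        return 1.0
    return len(set.intersection(*voxel_sets)) / len(union)


if __name__ == "__main__":
    # Toy 2D "voxel" sets, for illustration only.
    first = {(0, 0), (0, 1), (1, 0), (1, 1)}
    second = {(0, 1), (1, 1), (1, 2)}
    third = {(1, 1), (1, 2), (2, 2)}
    print("J2:", jaccard(first, second))
    print("J3:", jaccard(first, second, third))
```

    Under this definition J3 can never exceed any pairwise J2, since adding a third segmentation can only shrink the intersection and grow the union.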

  13. Acute Blindness.

    PubMed

    Meekins, Jessica M

    2015-09-01

    Sudden loss of vision is an ophthalmic emergency with numerous possible causes. Abnormalities may occur at any point within the complex vision pathway, from retina to optic nerve to the visual center in the occipital lobe. This article reviews specific prechiasm (retina and optic nerve) and cerebral cortical diseases that lead to acute blindness. Information regarding specific etiologies, pathophysiology, diagnosis, treatment, and prognosis for vision is discussed. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Inter-rater reliability of manual and automated region-of-interest delineation for PiB PET.

    PubMed

    Rosario, Bedda L; Weissfeld, Lisa A; Laymon, Charles M; Mathis, Chester A; Klunk, William E; Berginc, Michael D; James, Jeffrey A; Hoge, Jessica A; Price, Julie C

    2011-04-01

    A major challenge in positron emission tomography (PET) amyloid imaging studies of Alzheimer's disease (AD) is the reliable detection of early amyloid deposition in human brain. Manual region-of-interest (ROI) delineation on structural magnetic resonance (MR) images is generally the reference standard for the extraction of count-rate data from PET images, as compared to automated MR-template(s) methods that utilize spatial normalization and a single set of ROIs. The goal of this work was to assess the inter-rater reliability of manual ROI delineation for PiB PET amyloid retention measures and the impact of CSF dilution correction (CSF) on this reliability for data acquired in elderly control (n=5) and AD (n=5) subjects. The intraclass correlation coefficient (ICC) was used to measure reliability. As a secondary goal, ICC scores were also computed for PiB outcome measures obtained by an automated MR-template ROI method and one manual rater; to assess the level of reliability that could be achieved using different processing methods. Fourteen ROIs were evaluated that included anterior cingulate (ACG), precuneus (PRC) and cerebellum (CER). The PiB outcome measures were the volume of distribution (V(T)), summed tissue uptake (SUV), and corresponding ratios that were computed using CER as reference (DVR and SUVR). Substantial reliability (ICC≥0.932) was obtained across 3 manual raters for V(T) and SUV measures when CSF correction was applied across all outcomes and regions and was similar in the absence of CSF correction. The secondary analysis revealed substantial reliability in primary cortical areas between the automated and manual SUV [ICC≥0.979 (ACG/PRC)] and SUVR [ICC≥0.977/0.952 (ACG/PRC)] outcomes. The current study indicates the following rank order among the various reliability results in primary cortical areas and cerebellum (high to low): 1) V(T) or SUV manual delineation, with or without CSF correction; 2) DVR or SUVR manual delineation, with or

  15. Inter-rater reliability of manual and automated region-of-interest delineation for PiB PET

    PubMed Central

    Rosario, Bedda L.; Weissfeld, Lisa A.; Laymon, Charles M.; Mathis, Chester A.; Klunk, William E.; Berginc, Michael D.; James, Jeffrey A.; Hoge, Jessica A.; Price, Julie C.

    2011-01-01

    A major challenge in positron emission tomography (PET) amyloid imaging studies of Alzheimer’s disease is the reliable detection of early amyloid deposition in human brain. Manual region-of-interest (ROI) delineation on structural magnetic resonance (MR) images is generally the reference standard for the extraction of count-rate data from PET images, as compared to automated MR-template(s) methods that utilize spatial normalization and a single set of ROIs. The goal of this work was to assess the inter-rater reliability of manual ROI delineation for PiB PET amyloid retention measures and the impact of CSF dilution correction (CSF) on this reliability for data acquired in elderly control (n=5) and AD (n=5) subjects. The intraclass correlation coefficient (ICC) was used to measure reliability. As a secondary goal, ICC scores were also computed for PiB outcome measures obtained by an automated MR-template ROI method and one manual rater; to assess the level of reliability that could be achieved using different processing methods. Fourteen ROIs were evaluated that included anterior cingulate (ACG), precuneus (PRC) and cerebellum (CER). The PiB outcome measures were the volume of distribution (VT), summed tissue uptake (SUV), and corresponding ratios that were computed using CER as reference (DVR and SUVR). Substantial reliability (ICC = 0.932) was obtained across 3 manual raters for VT and SUV measures when CSF correction was applied across all outcomes and regions and was similar in the absence of CSF correction. The secondary analysis revealed substantial reliability in primary cortical areas between the automated and manual SUV [ICC = 0.979 (ACG/PRC)] and SUVR [ICC = 0.977/0.952 (ACG/PRC)] outcomes. The current study indicates the following rank order among the various reliability results in primary cortical areas and cerebellum (high to low): 1) VT or SUV manual delineation, with or without CSF correction; 2) DVR or SUVR manual delineation, with or without CSF

  16. Magnetic resonance enterography has good inter-rater agreement and diagnostic accuracy for detecting inflammation in pediatric Crohn disease.

    PubMed

    Church, Peter C; Greer, Mary-Louise C; Cytter-Kuint, Ruth; Doria, Andrea S; Griffiths, Anne M; Turner, Dan; Walters, Thomas D; Feldman, Brian M

    2017-05-01

    Magnetic resonance enterography (MRE) is increasingly relied upon for noninvasive assessment of intestinal inflammation in Crohn disease. However, very few studies have examined the diagnostic accuracy of individual MRE signs in children. We have created an MR-based multi-item measure of intestinal inflammation in children with Crohn disease - the Pediatric Inflammatory Crohn's MRE Index (PICMI). To inform item selection for this instrument, we explored the inter-rater agreement and diagnostic accuracy of individual MRE signs of inflammation in pediatric Crohn disease and compared our findings with the reference standards of the weighted Pediatric Crohn's Disease Activity Index (wPCDAI) and C-reactive protein (CRP). In this cross-sectional single-center study, MRE studies in 48 children with diagnosed Crohn disease (66% male, median age 15.5 years) were reviewed by two independent radiologists for the presence of 15 MRE signs of inflammation. Using kappa statistics we explored inter-rater agreement for each MRE sign across 10 anatomical segments of the gastrointestinal tract. We correlated MRE signs with the reference standards using correlation coefficients. Radiologists measured the length of inflamed bowel in each segment of the gastrointestinal tract. In each segment, MRE signs were scored as either binary (0-absent, 1-present), or ordinal (0-absent, 1-mild, 2-marked). These segmental scores were weighted by the length of involved bowel and were summed to produce a weighted score per patient for each MRE sign. Using a combination of wPCDAI≥12.5 and CRP≥5 to define active inflammation, we calculated area under the receiver operating characteristic curve (AUC) for each weighted MRE sign. Bowel wall enhancement, wall T2 hyperintensity, wall thickening and wall diffusion-weighted imaging (DWI) hyperintensity were most commonly identified. Inter-rater agreement was best for decreased motility and wall DWI hyperintensity (kappa≥0.64). Correlation between MRE

  17. Intra-Rater, Inter-Rater and Test-Retest Reliability of an Instrumented Timed Up and Go (iTUG) Test in Patients with Parkinson's Disease.

    PubMed

    van Lummel, Rob C; Walgaard, Stefan; Hobert, Markus A; Maetzler, Walter; van Dieën, Jaap H; Galindo-Garre, Francisca; Terwee, Caroline B

    2016-01-01

    The "Timed Up and Go" (TUG) is a widely used measure of physical functioning in older people and in neurological populations, including Parkinson's Disease. When using an inertial sensor measurement system (instrumented TUG [iTUG]), the individual components of the iTUG and the trunk kinematics can be measured separately, which may provide relevant additional information. The aim of this study was to determine intra-rater, inter-rater and test-retest reliability of the iTUG in patients with Parkinson's Disease. Twenty-eight PD patients, aged 50 years or older, were included. For the iTUG the DynaPort Hybrid (McRoberts, The Hague, The Netherlands) was worn at the lower back. The device measured acceleration and angular velocity in three directions at a rate of 100 samples/s. Patients performed the iTUG five times on two consecutive days. Repeated measurements by the same rater on the same day were used to calculate intra-rater reliability. Repeated measurements by different raters on the same day were used to calculate intra-rater and inter-rater reliability. Repeated measurements by the same rater on different days were used to calculate test-retest reliability. Nineteen ICC values (15%) were ≥ 0.9, which is considered excellent reliability. Sixty-four ICC values (49%) were ≥ 0.70 and < 0.90, which is considered good reliability. Thirty-one ICC values (24%) were ≥ 0.50 and < 0.70, indicating moderate reliability. Sixteen ICC values (12%) were ≥ 0.30 and < 0.50, indicating poor reliability. Two ICC values (2%) were < 0.30, indicating very poor reliability. In conclusion, in patients with Parkinson's disease the intra-rater, inter-rater, and test-retest reliability of the individual components of the instrumented TUG (iTUG) was excellent to good for total duration and for turning durations, and good to low for the sub durations and for the kinematics of the SiSt and StSi. The results of this fully automated analysis of instrumented TUG movements demonstrate

  18. Congenitally Blind Counselor, Adventitiously Blind Client.

    ERIC Educational Resources Information Center

    Roberts, A. H.

    1994-01-01

    A counselor blind from birth describes personal difficulties in fully understanding the experience of clients who are adventitiously blind. Congenitally blind counselors are urged to recognize that adaptive methods cannot compensate for the panoramic view of the environment provided by vision and that recently blinded individuals need to deal with…

  19. The Effectiveness and Efficiency of Distributed Online, Regional Online, and Regional Face-to-Face Training for Writing Assessment Raters

    ERIC Educational Resources Information Center

    Wolfe, Edward W.; Matthews, Staci; Vickers, Daisy

    2010-01-01

    This study examined the influence of rater training and scoring context on training time, scoring time, qualifying rate, quality of ratings, and rater perceptions. One hundred twenty raters participated in the study and experienced one of three training contexts: (a) online training in a distributed scoring context, (b) online training in a…

  20. An Examination of Rater Performance on a Local Oral English Proficiency Test: A Mixed-Methods Approach

    ERIC Educational Resources Information Center

    Yan, Xun

    2014-01-01

    This paper reports on a mixed-methods approach to evaluate rater performance on a local oral English proficiency test. Three types of reliability estimates were reported to examine rater performance from different perspectives. Quantitative results were also triangulated with qualitative rater comments to arrive at a more representative picture of…

  1. Using Verbal Reports to Explore Rater Perceptual Processes in Scoring: A Mixed Methods Application to Oral Communication Assessment

    ERIC Educational Resources Information Center

    Joe, Jilliam N.; Harmes, J. Christine; Hickerson, Corey A.

    2011-01-01

    In recent years, there has been a growth in the use of rater cognitive data to inform test development and validity arguments. In this study, we examined differences in feature attention and categorisation between experienced and inexperienced raters for a college-level assessment of oral communication. The focus was two-fold: (a) rater cognition…

  2. Rater Effects in ITA Testing: ESL Teachers' versus American Undergraduates' Judgments of Accentedness, Comprehensibility, and Oral Proficiency

    ERIC Educational Resources Information Center

    Hsieh, Ching-Ni

    2011-01-01

    Second language (L2) oral performance assessment always involves raters' subjective judgments and is thus subject to rater variability. The variability due to rater characteristics has important consequential impacts on decision-making processes, particularly in high-stakes testing situations (Bachman, Lynch, & Mason, 1995; A. Brown, 1995;…

  3. The Development and Maintenance of Rating Quality in Performance Writing Assessment: A Longitudinal Study of New and Experienced Raters

    ERIC Educational Resources Information Center

    Lim, Gad S.

    2011-01-01

    Raters are central to writing performance assessment, and rater development--training, experience, and expertise--involves a temporal dimension. However, few studies have examined new and experienced raters' rating performance longitudinally over multiple time points. This study uses operational data from the writing section of the MELAB (n =…

  5. Mesalazine Has No Effect on Mucosal Immune Biomarkers in Patients with Diarrhea-Dominant Irritable Bowel Syndrome Referred to Shariati Hospital: A Randomized Double-Blind, Placebo-Controlled Trial

    PubMed Central

    Ghadir, Mohammad Reza; Poradineh, Mehri; Sotodeh, Masoud; Ansari, Reza; Kolahdoozan, Shadi; Hormati, Ahmad; Yousefi, Mohammad Hosein; Mirzaei, Samaneh; Vahedi, Homayoon

    2017-01-01

    BACKGROUND Intestinal mast cells may cause gastrointestinal symptoms in patients with diarrhea-dominant irritable bowel syndrome (IBS). The objective of this study was to determine the effect of mesalazine on the number of lamina propria mast cells and clinical manifestations of patients with diarrhea-dominant IBS referred to Shariati Hospital affiliated to Tehran University of Medical Sciences. METHODS This was a randomized placebo-controlled double-blind trial conducted on 49 patients with diarrhea-dominant IBS. The patients were randomly assigned to either the experiment or the control group. The patients in the experiment group took 2400 mg mesalazine daily in three divided doses for 8 weeks, and the patients in the control group took placebo on the same schedule. The first targeted outcome was a reduction of the mast cell number to the safe colonic baseline, and the second was a marked palliation of disease symptoms. Data were analyzed according to the intention-to-treat method. We used the MANCOVA test to compare both outcomes in the two groups. We also compared the data with baseline values in both groups. All statistical tests were performed at the significance level of 0.05. RESULTS There was no significant difference between the mesalazine and placebo groups regarding the number of mast cells (p value=0.396), abdominal pain (p value=0.054), bloating (p value=0.365), defecation urgency (p value=0.212), and defecation frequency (p value=0.702). CONCLUSION Mesalazine had no significant effect either on the number of mast cells or on the severity of disease symptoms. This finding seems to be inconsistent with the hypothesis indicating immune mechanisms as potential therapeutic targets in IBS. The possible difference in this effect of mesalazine should be evaluated in further studies among populations varying in racial, ethnic, and geographical characteristics.

  6. Accuracy of a combined heart rate and motion sensor for assessing energy expenditure in free-living adults during a double-blind crossover caffeine trial using doubly labeled water as the reference method.

    PubMed

    Silva, A M; Santos, D A; Matias, C N; Júdice, P B; Magalhães, J P; Ekelund, U; Sardinha, L B

    2015-01-01

    A combined heart rate (HR) and motion sensor (Actiheart) has been proposed as an accurate method for assessing total energy expenditure (TEE) and physical activity energy expenditure (PAEE). However, the extent to which factors such as caffeine may affect the accuracy by which the estimated HR-related PAEE contribution will affect TEE and PAEE estimates is unknown. Therefore, we examined the validity of Actiheart in estimating TEE and PAEE in free-living adults under a caffeine trial compared with doubly labeled water (DLW) as reference criterion. Using a double-blind crossover trial (Clinicaltrials.gov ID: #NCT01477294) with two conditions (4-day each with a 3-day-washout period), randomly ordered as caffeine (5 mg/kg per day) and placebo (malt-dextrine) intake, TEE was measured by DLW in 17 physically active men (20-38 years) who were non-caffeine users. In each condition, resting energy expenditure (REE) was assessed by indirect calorimetry and PAEE was calculated as (TEE-(REE+0.1 TEE)). Simultaneously, PAEE and TEE were estimated by Actiheart using an individual calibration (ACC+HRstep). Under caffeine, ACC+HRstep explained 76 and 64% of TEE and PAEE from DLW, respectively; corresponding results for the placebo condition were 82 and 66%. No mean bias was found between ACC+HRstep and DLW for TEE (caffeine:-468 kJ per day; placebo:-407 kJ per day), although PAEE was slightly underestimated (caffeine:-856 kJ per day; placebo:-1147 kJ per day). Similar limits of agreement were observed in both conditions ranging from -2066 to 3002 and from -3488 to 1776 kJ per day for TEE and PAEE, respectively. Regardless of caffeine intake, the combined HR and motion sensor is valid for estimating free-living energy expenditure in a group of healthy men but is less accurate for an individual assessment.
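
    The PAEE calculation quoted in the abstract above, PAEE = TEE - (REE + 0.1 TEE), can be sketched in Python as follows. The function name, parameter names, and input values are hypothetical and are not taken from the study; only the formula itself comes from the abstract.

```python
def paee_from_tee(tee_kj: float, ree_kj: float, tee_fraction: float = 0.1) -> float:
    """Physical activity energy expenditure in kJ per day.

    Implements PAEE = TEE - (REE + tee_fraction * TEE), the formula stated
    in the abstract, where tee_fraction defaults to the abstract's 0.1.
    """
    return tee_kj - (ree_kj + tee_fraction * tee_kj)


# Hypothetical inputs for illustration only (not data from the trial).
example_paee = paee_from_tee(tee_kj=12000.0, ree_kj=7000.0)
print(example_paee)
```

    Keeping the 0.1 fraction as a parameter makes the assumption visible instead of burying it in the arithmetic.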

  7. A randomised, double-blind, phase III study comparing SB2, an infliximab biosimilar, to the infliximab reference product Remicade in patients with moderate to severe rheumatoid arthritis despite methotrexate therapy

    PubMed Central

    Choe, Jung-Yoon; Prodanovic, Nenad; Niebrzydowski, Jaroslaw; Staykov, Ivan; Dokoupilova, Eva; Baranauskaite, Asta; Yatsyshyn, Roman; Mekic, Mevludin; Porawska, Wieskawa; Ciferska, Hana; Jedrychowicz-Rosiak, Krystyna; Zielinska, Agnieszka; Choi, Jasmine; Rho, Young Hee; Smolen, Josef S

    2017-01-01

    Objectives To compare the efficacy, safety, immunogenicity and pharmacokinetics (PK) of SB2 to the infliximab reference product (INF) in patients with moderate to severe rheumatoid arthritis (RA) despite methotrexate therapy. Methods This is a phase III, randomised, double-blind, multinational, multicentre parallel group study. Patients with moderate to severe RA despite methotrexate therapy were randomised in a 1:1 ratio to receive either SB2 or INF of 3 mg/kg. The primary end point was the American College of Rheumatology 20% (ACR20) response at week 30. Inclusion of the 95% CI of the ACR20 response difference within a ±15% margin was required for equivalence. Results 584 subjects were randomised into SB2 (N=291; 290 analysed) or INF (N=293). The ACR20 response at week 30 in the per-protocol set was 64.1% in SB2 versus 66.0% in INF. The adjusted rate difference was −1.88% (95% CI −10.26% to 6.51%), which was within the predefined equivalence margin. Other efficacy outcomes such as ACR50/70, disease activity score measured by 28 joints and European League against Rheumatism response were similar between SB2 and INF. The incidence of treatment-emergent adverse events was comparable (57.6% in SB2 vs 58.0% in INF) as well as the incidence of antidrug antibodies (ADA) to infliximab up to week 30 (55.1% in SB2 vs 49.7% in INF). The PK profile was similar between SB2 and INF. Efficacy, safety and PK by ADA subgroup were comparable between SB2 and INF. Conclusions SB2 was equivalent to INF in terms of ACR20 response at week 30. SB2 was well tolerated with a comparable safety profile, immunogenicity and PK to INF. Trial registration number NCT01936181. PMID:26318384
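
    The equivalence criterion described in the abstract above (the whole 95% CI of the ACR20 response difference must lie within a ±15% margin) can be illustrated with a small Python check. The function name is hypothetical, and treating the margin bounds as inclusive is an assumption; the numbers passed in are the ones reported in the abstract.

```python
def ci_within_margin(ci_lower: float, ci_upper: float, margin: float) -> bool:
    """Return True when the whole confidence interval lies inside [-margin, +margin].

    Assumption: the bounds are treated as inclusive; the abstract does not
    say how a CI limit exactly equal to the margin would be handled.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return -margin <= ci_lower and ci_upper <= margin


# Values reported in the abstract: 95% CI -10.26% to 6.51%, margin 15%.
print(ci_within_margin(-10.26, 6.51, 15.0))
```

    Note that the point estimate (-1.88%) plays no role in the decision; only the two CI limits are compared with the margin.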

  8. Individual Differences in Susceptibility to Inattentional Blindness

    ERIC Educational Resources Information Center

    Seegmiller, Janelle K.; Watson, Jason M.; Strayer, David L.

    2011-01-01

    Inattentional blindness refers to the finding that people do not always see what appears in their gaze. Though inattentional blindness affects large percentages of people, it is unclear if there are individual differences in susceptibility. The present study addressed whether individual differences in attentional control, as reflected by…

  10. Rater reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS).

    PubMed

    Baker, Nancy A; Cook, James R; Redfern, Mark S

    2009-01-01

    This paper describes the inter-rater and intra-rater reliability, and the concurrent validity of an observational instrument, the Keyboard Personal Computer Style instrument (K-PeCS), which assesses stereotypical postures and movements associated with computer keyboard use. Three trained raters independently rated the video clips of 45 computer keyboard users to ascertain inter-rater reliability, and then re-rated a sub-sample of 15 video clips to ascertain intra-rater reliability. Concurrent validity was assessed by comparing the ratings obtained using the K-PeCS to scores developed from a 3D motion analysis system. The overall K-PeCS had excellent reliability [inter-rater: intra-class correlation coefficients (ICC)=.90; intra-rater: ICC=.92]. Most individual items on the K-PeCS had from good to excellent reliability, although six items fell below ICC=.75. Those K-PeCS items that were assessed for concurrent validity compared favorably to the motion analysis data for all but two items. These results suggest that most items on the K-PeCS can be used to reliably document computer keyboarding style.

  11. Rater agreement of a test battery designed to assess adolescents' resistance training skill competency.

    PubMed

    Barnett, Lisa; Reynolds, John; Faigenbaum, Avery D; Smith, Jordan J; Harries, Simon; Lubans, David R

    2015-01-01

    The study aim was to assess rater agreement of the Resistance Training Skills Battery (RTSB) for adolescents. The RTSB provides an assessment of resistance training skill competency and includes six exercises. The RTSB can be used to assess performance and progress in adolescent resistance training programmes and to provide associated feedback to participants. Individual skill scores are based on the number of performance criteria successfully demonstrated and an overall resistance training skill quotient (RTSQ) is created by summing the six skill scores. The eight raters had varying experience in movement skill assessment and resistance training and completed a 2-3h training session in how to assess resistance training performance using the RTSB. The raters then completed an assessment on six skills for 12 adolescents (mean age=15.1 years, SD=1.0, six male and six female) in a randomised order. Agreement between seven of the eight raters was high (20 of the 21 pairwise correlations were greater than 0.7 and 13 of the 21 were greater than 0.8). Correlations between the eighth rater and each of the other seven raters were generally lower (0.45-0.78). Most variation in the assigned RTSB scores (67%) was between cases, a relatively small amount of the variation (10%) was between raters and the remainder (23%) was between periods within raters. The between-raters coefficient of variation was approximately 5%. The RTSB can be used reliably by those with experience in movement skill assessment and resistance training to assess the resistance skill of adolescents. Copyright © 2013 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

  12. Colour blindness.

    PubMed

    Gordon, N

    1998-03-01

    The physiology of colour vision is discussed, as is the way in which the human eye can detect various combinations of red, green and blue. Red-green colour blindness, with X-linked inheritance, is the most common, but other types are also considered. Methods of testing relating to the age of the child are reviewed. The use of colours in teaching is widespread, but there is controversy over the difficulties this may cause a colour blind child. A review of the literature does not reveal much information on this, and any problems that do arise are likely to be individual to the child, and to depend on such factors as overall intelligence, the attitude of the teacher, and the personality of the child. There is no doubt that it is essential to recognise colour vision defects when it comes to choosing a career, and that tests must be done during secondary schooling, but in order to avoid some affected children being disadvantaged there is enough evidence to support testing at school entry.

  13. Classification of substandard factors in perinatal care: development and multidisciplinary inter-rater agreement of the Groningen-system.

    PubMed

    van Diem, Mariet Th; Timmer, Albertus; Gordijn, Sanne J; Bergman, Klasien A; Korteweg, Fleurisca J; Ravise, Joke; Vreugdenhil, Ellen; Erwich, Jan Jaap H M

    2015-09-11

    Perinatal audit is an established method for improving the quality of perinatal care. In audit meetings substandard factors (SSF) are identified in cases of perinatal mortality and morbidity. To our knowledge there is no classification system specifically designed for the classification of substandard factors. Such a classification may help to standardise allocation of substandard factors to categories. This will help to prioritise, guide and implement actions in quality improvement programs. A classification system of 284 substandard factors (SSF) identified in perinatal audit meetings between 2007 and 2011 was drawn up using the WHO Conceptual Framework for the International Classification for Patient Safety as a starting point. Discussions were held on inter-rater disagreements, inclusion of items, format and organisation and definitions of the main- and subcategories. A guideline was developed. An independent multidisciplinary group tested the classification. Independent of inter-rater agreement the allocations to categories were counted. For the counts in the subcategories one and two, we used the allocations in the main category as reference. The chance corrected agreement between classifiers was tested with Cohen's kappa statistic. The classification consists of 9 main categories with one or two subcategories. The main categories are (1) Equipment and Materials, (2) Medication, (3) Additional tests/investigations, (4) Transportation, (5) Documentation, (6) Communication, (7) Medical practice, (8) Other and (9) non classifiable. Of 3663 allocations by 13 classifiers 1452 SSFs were allocated (40%) to 'medical practice' and 1247 (34%) to 'documentation'. 118 (3%) times SSFs were not classifiable, mainly due to unclear phrasing of the SSF. The chance corrected agreement of 284 substandard factors in the main category was 0.68 (95% CI 0.66-0.70) and 0.57 (95% CI 0.54-0.59) for the CDG and the IGD respectively. Classifying substandard factors has given insight…

  14. PHYSICAL EDUCATION FOR BLIND CHILDREN.

    ERIC Educational Resources Information Center

    BUELL, CHARLES E.

    A PRACTICAL RATHER THAN A THEORETICAL REFERENCE GUIDE, THE BOOK DISCUSSES THE NEED OF THE BLIND OR VISUALLY IMPAIRED CHILD FOR PHYSICAL EDUCATION. PAST AND PRESENT PROGRAMS IN PUBLIC AND RESIDENTIAL SCHOOLS, RECREATION AND LEISURE TIME ACTIVITIES (A GUIDE FOR PARENTS), SPORTS AND INTERSCHOLASTIC COMPETITION, ACTIVE GAMES, CONTESTS, RELAYS, AND…

  15. Item and rater analysis of constructed response items via the multi-faceted Rasch model.

    PubMed

    Wolfe, Edward W

    2009-01-01

    This article describes how the multi-faceted Rasch model (MFRM) can be applied to item and rater analysis and the types of information that are made available by a multifaceted analysis of constructed-response items. In particular, the text describes evidence from such analyses that is relevant to improving item and rubric development as well as rater training and monitoring. The article provides an introduction to MFRM extensions of the family of Rasch models, a description of item analysis procedures, a description of rater analysis procedures, and concludes with an example analysis conducted using a commercially available program that implements the MFRM, Facets.

  16. Reproducibility of range of motion and muscle strength measurements in patients with hip osteoarthritis – an inter-rater study

    PubMed Central

    2012-01-01

    Background Assessment of range of motion (ROM) and muscle strength is fundamental in the clinical diagnosis of hip osteoarthritis (OA) but reproducibility of these measurements has mostly involved clinicians from secondary care and has rarely reported agreement parameters. Therefore, the primary objective of the study was to determine the inter-rater reproducibility of ROM and muscle strength measurements. Furthermore, the reliability of the overall assessment of clinical hip OA was evaluated. Reporting is in accordance with proposed guidelines for the reporting of reliability and agreement studies (GRRAS). Methods In a university hospital, four blinded raters independently examined patients with unilateral hip OA; two hospital orthopaedists independently examined 48 (24 men) patients and two primary care chiropractors examined 61 patients (29 men). ROM was measured in degrees (deg.) with a standard two-arm goniometer and muscle strength in Newton (N) using a hand-held dynamometer. Reproducibility is reported as agreement and reliability between paired raters of the same profession. Agreement is reported as limits of agreement (LoA) and reliability is reported with intraclass correlation coefficients (ICC). Reliability of the overall assessment of clinical OA is reported as weighted kappa. Results Between orthopaedists, agreement for ROM ranged from LoA [-28–12 deg.] for internal rotation to [-8–13 deg.] for extension. ICC ranged between 0.53 and 0.73, highest for flexion. For muscle strength between orthopaedists, LoA ranged from [-65–47N] for external rotation to [-10 –59N] for flexion. ICC ranged between 0.52 and 0.85, highest for abduction. Between chiropractors, agreement for ROM ranged from LoA [-25–30 deg.] for internal rotation to [-13–21 deg.] for flexion. ICC ranged between 0.14 and 0.79, highest for flexion. For muscle strength between chiropractors, LoA ranged between [-80–20N] for external rotation to [-146–55N] for abduction. ICC

  17. The inter-rater reliability of the diagnosis of surgical site infection in the context of a clinical trial

    PubMed Central

    Nuttall, J.; Evaniew, N.; Thornley, P.; Griffin, A.; Deheshi, B.; O’Shea, T.; Wunder, J.; Ferguson, P.; Randall, R. L.; Turcotte, R.; Schneider, P.; McKay, P.; Bhandari, M.

    2016-01-01

    Objectives The diagnosis of surgical site infection following endoprosthetic reconstruction for bone tumours is frequently a subjective diagnosis. Large clinical trials use blinded Central Adjudication Committees (CACs) to minimise the variability and bias associated with assessing a clinical outcome. The aim of this study was to determine the level of inter-rater and intra-rater agreement in the diagnosis of surgical site infection in the context of a clinical trial. Materials and Methods The Prophylactic Antibiotic Regimens in Tumour Surgery (PARITY) trial CAC adjudicated 29 non-PARITY cases of lower extremity endoprosthetic reconstruction. The CAC members classified each case according to the Centers for Disease Control (CDC) criteria for surgical site infection (superficial, deep, or organ space). Combinatorial analysis was used to calculate the smallest CAC panel size required to maximise agreement. A final meeting was held to establish a consensus. Results Full or near consensus was reached in 20 of the 29 cases. The Fleiss kappa value was calculated as 0.44 (95% confidence interval (CI) 0.35 to 0.53), or moderate agreement. The greatest statistical agreement was observed in the outcome of no infection, 0.61 (95% CI 0.49 to 0.72, substantial agreement). Panelists reached a full consensus in 12 of 29 cases and near consensus in five of 29 cases when CDC criteria were used (superficial, deep or organ space). A stable maximum Fleiss kappa of 0.46 (95% CI 0.50 to 0.35) at CAC sizes greater than three members was obtained. Conclusions There is substantial agreement among the members of the PARITY CAC regarding the presence or absence of surgical site infection. Agreement on the level of infection, however, is more challenging. Additional clinical information routinely collected by the prospective PARITY trial may improve the discriminatory capacity of the CAC in the parent study for the diagnosis of infection.
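
    For readers unfamiliar with the Fleiss kappa reported in the record above, the statistic can be sketched in plain Python as follows. This is a textbook formulation under stated assumptions, not the PARITY committee's code: the function name and the count-matrix layout (one row per case, one column per category, each row summing to the number of raters) are illustrative, and the toy data are made up.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a cases-by-categories matrix of rating counts.

    counts[i][j] is the number of raters who put case i in category j.
    Every case must be rated by the same number of raters (at least 2).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement among rater pairs, computed per case then averaged.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up toy data: 2 cases, 3 raters, 2 categories (not study data).
print(fleiss_kappa([[3, 0], [0, 3]]))
```

    With perfect agreement on every case, as in the toy data, observed agreement is 1 and the statistic reaches its maximum.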

  18. Grant Peer Review: Improving Inter-Rater Reliability with Training

    DOE PAGES

    Sattler, David N.; McKnight, Patrick E.; Naney, Linda; ...

    2015-06-15

    In this study, we developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers—especially those with experience—have good understanding of the grant review rating scale. Our findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. Lastly, the results underscore the benefits of and need for specialized peer reviewer training.

  19. Rating Movies and Rating the Raters Who Rate Them

    PubMed Central

    Zhou, Hua; Lange, Kenneth

    2010-01-01

    The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data. PMID:20802818

  20. Rating Movies and Rating the Raters Who Rate Them.

    PubMed

    Zhou, Hua; Lange, Kenneth

    2009-11-01

    The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data.

  1. Inter-rater reliability of the STEP protocol.

    PubMed

    Edman, A; Mahnfeldt, M; Wallin, A

    2001-01-01

    An inter-rater reliability test of the Stepwise Comparative Status Analysis (STEP) is presented. The STEP is a protocol for the clinical examination of patients with dementia, within the scope of a neuropsychiatric investigation. It combines psychiatric and neurologic bedside examination methods. The analysis is made in three steps where primary, observable symptom variables are successively aggregated via compound variables to the final determination of one of seven possible dominant regional brain syndromes (global, frontal, subcortical, parietal, frontosubcortical, frontoparietal, other), here also called complex variables. In the present study, two senior physicians assessed 50 patients independently and simultaneously. None of the patients was known to both physicians. In 42 patients (84%), the same dominant brain syndrome was determined by the two clinicians. The probability (P value) of this (or better) agreement was calculated at 2.0 × 10^-12. Kappa coefficients were calculated as a measure of assessment agreement regarding the 50 STEP variables. For 20 variables, the coefficient was 0.75 or above, indicating excellent agreement; for 22 variables, the coefficient was below 0.75 and above 0.40, indicating moderate agreement; and for 4 variables, the value was 0.40 or below, indicating poor agreement. Kappa calculations regarding the assessments of four variables were either not possible or were considered inappropriate.
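
    The two-rater kappa coefficient and the agreement bands quoted in the abstract above can be sketched in Python as follows. The function names and the toy ratings are hypothetical; the cut-offs (0.75 and 0.40) are the ones stated in the abstract, and the kappa formula is the standard Cohen definition, which the abstract does not spell out.

```python
from collections import Counter


def cohens_kappa(ratings_a: list, ratings_b: list) -> float:
    """Cohen's kappa for two raters who rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty item list")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category: kappa is undefined.
        raise ValueError("kappa cannot be computed for these ratings")
    return (observed - expected) / (1.0 - expected)


def agreement_band(kappa: float) -> str:
    """Label kappa using the cut-offs stated in the abstract."""
    if kappa >= 0.75:
        return "excellent"
    if kappa > 0.40:
        return "moderate"
    return "poor"


# Made-up toy ratings for illustration (not STEP data).
k = cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"])
print(k, agreement_band(k))
```

    The undefined case raised in the sketch is one situation in which a kappa calculation is not possible; whether it is the reason given for the four excluded STEP variables is not stated in the abstract.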

  2. Examining rating quality in writing assessment: rater agreement, error, and accuracy.

    PubMed

    Wind, Stefanie A; Engelhard, George

    2012-01-01

    The use of performance assessments in which human raters evaluate student achievement has become increasingly prevalent in high-stakes assessment systems such as those associated with recent policy initiatives (e.g., Race to the Top). In this study, indices of rating quality are compared between two measurement perspectives. Within the context of a large-scale writing assessment, this study focuses on the alignment between indices of rater agreement, error, and accuracy based on traditional and Rasch measurement theory perspectives. Major empirical findings suggest that Rasch-based indices of model-data fit for ratings provide information about raters that is comparable to direct measures of accuracy. The use of easily obtained approximations of direct accuracy measures holds significant implications for monitoring rating quality in large-scale rater-mediated performance assessments.

  3. PANSS rater training using Internet and videoconference: results from a pilot study.

    PubMed

    Kobak, Kenneth A; Opler, Mark G A; Engelhardt, Nina

    2007-05-01

    Problems associated with the clinician-administered rating scales have led to new approaches to improve rater training. These include interactive, on-line didactic tutorials and live, remote evaluation of raters' clinical skills through the use of videoconferencing. The purpose of this study was to evaluate this approach in training novice raters on the administration of the Positive and Negative Symptom Scale (PANSS). Twelve trainees with no prior PANSS experience completed didactic training via CD-ROM and two remote training sessions where they interviewed a standardized patient-actor while being remotely observed in real time and given feedback. Results found a significant improvement in trainees' conceptual knowledge and an improvement in trainees' clinical skills. The use of these technologies allows for training to be more effectively delivered to diverse sites in multi-center trials, and for evaluation of raters' applied clinical skills, an area that has previously been overlooked.

  4. The effect of rater severity on person ability measure: a Rasch model analysis.

    PubMed

    Lunz, M E; Stahl, J A

    1993-04-01

    This paper presents a method for analyzing oral examinations with an extended, many-faceted Rasch model that calibrates medical specialty candidates, protocols, and raters. Significant variance was found among protocol difficulties and rater severities. When candidates' raw scores were compared with calibrated measures corrected for the bias caused by the particular protocols and raters encountered, variation between candidate scores and measures was observed. The data were found to fit the Rasch model well enough to be suitable for making measurement on oral examinations more objective as well as providing specific feedback to oral examination raters. In this example a medical oral examination was used; however, the techniques are applicable to any situation in which trained professionals rate candidate or patient performances. For occupational therapists, potential applications include evaluation of a student's fieldwork performance or observation of a patient's task performance.

  5. PEER RATING VALIDITY AS A FUNCTION OF RATER INTELLIGENCE AND RATING SCORE RECEIVED

    DTIC Science & Technology

    intelligence there is little reason to take into consideration rater intelligence when concerned with the validity of the ratings he gives. This is also true for the Peer Rating score received by the rater.

  6. Definition of blindness under National Programme for Control of Blindness: Do we need to revise it?

    PubMed

    Vashist, Praveen; Senjam, Suraj Singh; Gupta, Vivek; Gupta, Noopur; Kumar, Atul

    2017-02-01

    A review of the appropriateness of the current definition of blindness under the National Programme for Control of Blindness (NPCB), Government of India. Online search of peer-reviewed scientific published literature and guidelines using PubMed, the World Health Organization (WHO) IRIS, and Google Scholar with keywords, namely blindness and visual impairment, along with offline examination of reports of national and international organizations, as well as their cross-references was done until December 2016, to identify relevant documents on the definition of blindness. The evidence for the historical and currently adopted definition of blindness under the NPCB, the WHO, and other countries was reviewed. Differences in the NPCB and WHO definitions were analyzed to assess the impact on the epidemiological status of blindness and visual impairment in India. The differences in the criteria for blindness under the NPCB and the WHO definitions cause an overestimation of the prevalence of blindness in India. These variations are also associated with an over-representation of refractive errors as a cause of blindness and an under-representation of other causes under the NPCB definition. The targets for achieving elimination of blindness also become much more difficult to achieve under the NPCB definition. Ignoring differences in definitions when comparing the global and Indian prevalence of blindness will cause erroneous interpretations. We recommend that the appropriate modifications should be made in the NPCB definition of blindness to make it consistent with the WHO definition.

  7. Definition of blindness under National Programme for Control of Blindness: Do we need to revise it?

    PubMed Central

    Vashist, Praveen; Senjam, Suraj Singh; Gupta, Vivek; Gupta, Noopur; Kumar, Atul

    2017-01-01

    A review of the appropriateness of the current definition of blindness under the National Programme for Control of Blindness (NPCB), Government of India. Online search of peer-reviewed scientific published literature and guidelines using PubMed, the World Health Organization (WHO) IRIS, and Google Scholar with keywords, namely blindness and visual impairment, along with offline examination of reports of national and international organizations, as well as their cross-references was done until December 2016, to identify relevant documents on the definition of blindness. The evidence for the historical and currently adopted definition of blindness under the NPCB, the WHO, and other countries was reviewed. Differences in the NPCB and WHO definitions were analyzed to assess the impact on the epidemiological status of blindness and visual impairment in India. The differences in the criteria for blindness under the NPCB and the WHO definitions cause an overestimation of the prevalence of blindness in India. These variations are also associated with an over-representation of refractive errors as a cause of blindness and an under-representation of other causes under the NPCB definition. The targets for achieving elimination of blindness also become much more difficult to achieve under the NPCB definition. Ignoring differences in definitions when comparing the global and Indian prevalence of blindness will cause erroneous interpretations. We recommend that the appropriate modifications should be made in the NPCB definition of blindness to make it consistent with the WHO definition. PMID:28345562

  8. Inter- and intra-rater agreement of static posture analysis using a mobile application

    PubMed Central

    Boland, David M.; Neufeld, Eric V.; Ruddell, Jack; Dolezal, Brett A.; Cooper, Christopher B.

    2016-01-01

    [Purpose] To determine the intra- and inter-rater agreement of a mobile application, PostureScreen Mobile® (PSM), that assesses static standing posture. [Subjects and Methods] Three examiners with different levels of experience of assessing posture, one licensed physical therapist and two untrained undergraduate students, performed repeated postural assessments of 10 subjects, fully clothed or minimally clothed, using PSM on two nonconsecutive days. Anterior and right lateral images were captured and seventeen landmarks were identified on them. Intraclass correlation coefficients (ICCs) were calculated for each of 13 postural measures to evaluate inter-rater agreement on the first visit (fully or minimally clothed), as well as intra-rater agreement between the first and second visits (minimally clothed). [Results] Eleven postural measures were ultimately analyzed for inter- and intra-rater agreement. Inter-rater agreement was almost perfect (ICC≥0.81) for four measures and substantial (0.60 < ICC […]) […] rater agreement was almost perfect for four measures and substantial for four measures. Intra-rater agreement between two minimally clothed exams was almost perfect for two measures and substantial for five measures. [Conclusion] PSM is a widely available, inexpensive postural screening tool that requires little formal training. To maximize inter- and intra-rater agreement, postural screening using this mobile application should be conducted with subjects wearing minimal clothing. Assessing static standing posture via PSM gives repeatable measures for anatomical landmarks that were found to have substantial or almost perfect agreement. Our data also suggest that this technology may also be useful for diagnosing forward head posture. PMID:28174460

  9. A randomised, double-blind, phase III study comparing SB2, an infliximab biosimilar, to the infliximab reference product Remicade in patients with moderate to severe rheumatoid arthritis despite methotrexate therapy.

    PubMed

    Choe, Jung-Yoon; Prodanovic, Nenad; Niebrzydowski, Jaroslaw; Staykov, Ivan; Dokoupilova, Eva; Baranauskaite, Asta; Yatsyshyn, Roman; Mekic, Mevludin; Porawska, Wieskawa; Ciferska, Hana; Jedrychowicz-Rosiak, Krystyna; Zielinska, Agnieszka; Choi, Jasmine; Rho, Young Hee; Smolen, Josef S

    2017-01-01

    To compare the efficacy, safety, immunogenicity and pharmacokinetics (PK) of SB2 to the infliximab reference product (INF) in patients with moderate to severe rheumatoid arthritis (RA) despite methotrexate therapy. This is a phase III, randomised, double-blind, multinational, multicentre parallel group study. Patients with moderate to severe RA despite methotrexate therapy were randomised in a 1:1 ratio to receive either SB2 or INF of 3 mg/kg. The primary end point was the American College of Rheumatology 20% (ACR20) response at week 30. Inclusion of the 95% CI of the ACR20 response difference within a ±15% margin was required for equivalence. 584 subjects were randomised into SB2 (N=291; 290 analysed) or INF (N=293). The ACR20 response at week 30 in the per-protocol set was 64.1% in SB2 versus 66.0% in INF. The adjusted rate difference was -1.88% (95% CI -10.26% to 6.51%), which was within the predefined equivalence margin. Other efficacy outcomes such as ACR50/70, disease activity score measured by 28 joints and European League against Rheumatism response were similar between SB2 and INF. The incidence of treatment-emergent adverse events was comparable (57.6% in SB2 vs 58.0% in INF) as well as the incidence of antidrug antibodies (ADA) to infliximab up to week 30 (55.1% in SB2 vs 49.7% in INF). The PK profile was similar between SB2 and INF. Efficacy, safety and PK by ADA subgroup were comparable between SB2 and INF. SB2 was equivalent to INF in terms of ACR20 response at week 30. SB2 was well tolerated with a comparable safety profile, immunogenicity and PK to INF. NCT01936181. Published by the BMJ Publishing Group Limited.
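
    The equivalence criterion in this record (the whole 95% CI of the ACR20 response difference must fall inside a ±15% margin) can be illustrated with a minimal sketch. The function name `ci_within_margin` is hypothetical, and treating the margin bounds as exclusive is an assumption; the abstract does not say how a CI touching the margin would be handled.

```python
def ci_within_margin(ci_low, ci_high, margin):
    """Return True when the whole confidence interval lies strictly
    inside (-margin, +margin). Exclusive bounds are an assumption."""
    return -margin < ci_low and ci_high < margin


# Figures quoted in the abstract, in percentage points.
rate_difference_ci = (-10.26, 6.51)
equivalence_margin = 15.0

print(ci_within_margin(*rate_difference_ci, equivalence_margin))
```

    Checking both ends of the interval, rather than the point estimate alone, is what separates an equivalence conclusion from a simple "no significant difference" result.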

  10. Repetition blindness is orientation blind.

    PubMed

    Corballis, Michael C; Armstrong, Cole

    2007-03-01

    In identifying rapid sequences of three letters, subjects were worse at identifying the first and third letters when they were the same than when they were different, indicating repetition blindness (RB). This effect occurred regardless of the angular orientations of the letters, but was more pronounced when the orientations of the repeated letters were different than when they were the same. In a second experiment, RB was also evident when the first and third letters were lowercase bs or ds, presented upright or inverted, even though they are differently named when inverted (q and p, respectively). Conversely, a third experiment showed that RB occurred when the letters had the same names but were repeated in different case. These results suggest that the early extraction of letter shape is independent of its orientation and left-right sense, and that RB can occur at the levels of both shape and name.

  11. Specific agreement on dichotomous outcomes can be calculated for more than two raters.

    PubMed

    de Vet, Henrica C W; Dikmans, Rieky E; Eekhout, Iris

    2017-03-01

    For assessing interrater agreement, the concepts of observed agreement and specific agreement have been proposed. The situation of two raters and dichotomous outcomes has been described, whereas often, multiple raters are involved. We aim to extend it for more than two raters and examine how to calculate agreement estimates and 95% confidence intervals (CIs). As an illustration, we used a reliability study that includes the scores of four plastic surgeons classifying photographs of breasts of 50 women after breast reconstruction into "satisfied" or "not satisfied." In a simulation study, we checked the hypothesized sample size for calculation of 95% CIs. For m raters, all pairwise tables [ie, m (m - 1)/2] were summed. Then, the discordant cells were averaged before observed and specific agreements were calculated. The total number (N) in the summed table is m (m - 1)/2 times larger than the number of subjects (n), in the example, N = 300 compared to n = 50 subjects times m = 4 raters. A correction of n√(m - 1) was appropriate to find 95% CIs comparable to bootstrapped CIs. The concept of observed agreement and specific agreement can be extended to more than two raters with a valid estimation of the 95% CIs. Copyright © 2017 Elsevier Inc. All rights reserved.
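
    The pairwise-summing procedure described in this record can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the function name and data layout are hypothetical, and the specific-agreement formulas used are the standard two-rater ones, 2a/(2a+b+c) and 2d/(2d+b+c), applied to the summed table. Averaging the two discordant cells leaves their sum unchanged, so it does not alter these estimates. The confidence-interval correction mentioned in the abstract is not reproduced here.

```python
from itertools import combinations


def multi_rater_agreement(ratings):
    """ratings: one list per subject, each holding one 0/1 score per rater.

    Sums the 2x2 tables of all m(m-1)/2 rater pairs, then returns
    (observed agreement, positive specific agreement,
     negative specific agreement, N in the summed table).
    """
    a = b = c = d = 0  # cells: both 1, (1,0), (0,1), both 0
    m = len(ratings[0])
    for subject in ratings:
        for i, j in combinations(range(m), 2):
            x, y = subject[i], subject[j]
            if x == 1 and y == 1:
                a += 1
            elif x == 1 and y == 0:
                b += 1
            elif x == 0 and y == 1:
                c += 1
            else:
                d += 1
    total = a + b + c + d  # equals n * m * (m - 1) / 2
    observed = (a + d) / total
    # Guard against empty categories to avoid division by zero.
    positive = 2 * a / (2 * a + b + c) if (2 * a + b + c) else float("nan")
    negative = 2 * d / (2 * d + b + c) if (2 * d + b + c) else float("nan")
    return observed, positive, negative, total


# Toy data: 3 subjects rated by 4 raters (1 = "satisfied").
toy = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0]]
print(multi_rater_agreement(toy))
```

    With 50 subjects and 4 raters the same function would report N = 300 in the summed table, matching the count given in the abstract.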

  12. Examining rating scales using Rasch and Mokken models for rater-mediated assessments.

    PubMed

    Wind, Stephanie A

    2014-01-01

    A variety of methods for evaluating the psychometric quality of rater-mediated assessments have been proposed, including rater effects based on latent trait models (e.g., Engelhard, 2013; Wolfe, 2009). Although information about rater effects contributes to the interpretation and use of rater-assigned scores, it is also important to consider ratings in terms of the structure of the rating scale on which scores are assigned. Further, concern with the validity of rater-assigned scores necessitates investigation of these quality control indices within student subgroups, such as gender, language, and race/ethnicity groups. Using a set of guidelines for evaluating the interpretation and use of rating scales adapted from Linacre (1999, 2004), this study demonstrates methods that can be used to examine rating scale functioning within and across student subgroups with indicators from Rasch measurement theory (Rasch, 1960) and Mokken scale analysis (Mokken, 1971). Specifically, this study illustrates indices of rating scale effectiveness based on Rasch models and models adapted from Mokken scaling, and considers whether the two approaches to evaluating the interpretation and use of rating scales lead to comparable conclusions within the context of a large-scale rater-mediated writing assessment. Major findings suggest that indices of rating scale effectiveness based on a parametric and nonparametric approach provide related, but slightly different, information about the structure of rating scales. Implications for research, theory, and practice are discussed.

  13. Inter-rater Reliability of Sustained Aberrant Movement Patterns as a Clinical Assessment of Muscular Fatigue

    PubMed Central

    Aerts, Frank; Carrier, Kathy; Alwood, Becky

    2016-01-01

    Background: The assessment of clinical manifestation of muscle fatigue is an effective procedure in establishing therapeutic exercise dose. Few studies have evaluated physical therapist reliability in establishing muscle fatigue through detection of changes in quality of movement patterns in a live setting. Objective: The purpose of this study is to evaluate the inter-rater reliability of physical therapists’ ability to detect altered movement patterns due to muscle fatigue. Design: A reliability study in a live setting with multiple raters. Participants: Forty-four healthy individuals (ages 19-35) were evaluated by six physical therapists in a live setting. Methods: Participants were evaluated by physical therapists for altered movement patterns during resisted shoulder rotation. Each participant completed a total of four tests: right shoulder internal rotation, right shoulder external rotation, left shoulder internal rotation and left shoulder external rotation. Results: For all tests combined, the inter-rater reliability for a single rater scoring, ICC (2,1), was .65 (95% CI, .60 to .71). This corresponds to moderate inter-rater reliability between physical therapists. Limitations: The results of this study apply only to healthy participants and therefore cannot be generalized to a symptomatic population. Conclusion: Moderate inter-rater reliability was found between physical therapists in establishing muscle fatigue through the observation of sustained altered movement patterns during dynamic resistive shoulder internal and external rotation. PMID:27347241
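
    Several records in this list report ICC(2,1), the single-rater intraclass correlation from a two-way random-effects model. A minimal sketch of that statistic, using the usual mean-square formulas (between-subjects, between-raters and residual mean squares), is given below. The function name and the sample matrix are hypothetical; the sketch assumes a complete subjects-by-raters table with no missing scores and does not compute a confidence interval.

```python
def icc_two_way_random(scores):
    """scores: n subjects x k raters, complete table of numeric ratings.

    Returns (ICC(2,1), ICC(2,k)): single-rater and average-of-k-raters
    absolute-agreement coefficients under a two-way random-effects model.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    average = (bms - ems) / (bms + (jms - ems) / n)
    return single, average


# Hypothetical ratings: 5 subjects scored by 3 raters.
example = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 3, 3]]
print(icc_two_way_random(example))
```

    The average-score form is always at least as high as the single-score form when raters agree better than chance, which is why studies often report both.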

  14. Quinine blindness.

    PubMed

    Naraqi, S; Okem, S; Moyia, N; Dutta, T K; Zzferio, B; Lalloo, D

    1992-12-01

    A young woman was treated with intravenous quinine and chloramphenicol for suspected severe malaria and/or typhoid fever. On the second day of quinine therapy (after 2.25 g of quinine) she suddenly developed total bilateral loss of vision. Both drugs were stopped and cyclandelate therapy was started. She showed slight improvement in vision but on referral her visual acuity was limited to seeing waving hand movement only; visual fields were constricted and colour vision was absent. Both pupils were fixed and dilated. The fundi showed macular oedema and attenuated retinal arteries. She was treated with dexamethasone, cyclandelate, vitamin B complex and vitamin C. Colour vision was completely recovered after 5 days of treatment. Full recovery of the direct light reflex occurred after 10 days. Visual acuity improved slowly over a period of one month to 6/15 vision in both eyes. At this time macular oedema and retinal arteriolar attenuation were still present but less severe. In the context of this case report the condition of quinine blindness is briefly reviewed and the management discussed.

  15. Inter‐rater agreement in the assessment of exposure to carcinogens in the offshore petroleum industry

    PubMed Central

    Steinsvåg, Kjersti; Bråtveit, Magne; Moen, Bente E; Kromhout, Hans

    2007-01-01

    Objectives: To evaluate the reliability of an expert team assessing exposure to carcinogens in the offshore petroleum industry and to study how the information provided influenced the agreement among raters. Methods: Eight experts individually assessed the likelihood of exposure for combinations of 17 carcinogens, 27 job categories and four time periods (1970–1979, 1980–1989, 1990–1999 and 2000–2005). Each rater assessed 1836 combinations based on summary documents on carcinogenic agents, which included descriptions of sources of exposure and products, descriptions of work processes carried out within the different job categories, and monitoring data. Inter‐rater agreement was calculated using Cohen's kappa index and single and average score intraclass correlation coefficients (ICC) (ICC(2,1) and ICC(2,8), respectively). Differences in inter‐rater agreement for time periods, raters, International Agency for Research on Cancer groups and the amount of information provided were consequently studied. Results: Overall, 18% of the combinations were denoted as possible exposure, and 14% scored probable exposure. Stratified by the 17 carcinogenic agents, the probable exposure prevalence ranged from 3.8% for refractory ceramic fibres to 30% for crude oil. Overall mean kappa was 0.42 (ICC(2,1) = 0.62 and ICC(2,8) = 0.93). Providing limited quantitative measurement data was associated with less agreement than for equally well described carcinogens without sampling data. Conclusion: The overall κ and single‐score ICC indicate that the raters agree on exposure estimates well above the chance level. The levels of inter‐rater agreement were higher than in other comparable studies. The average score ICC indicates reliable mean estimates and implies that sufficient raters were involved. The raters seemed to have enough documentation on which to base their estimates, but provision of limited monitoring data leads to more incongruence among raters. Having real

  16. Blindness and severe visual impairment in pupils at schools for the blind in Burundi.

    PubMed

    Ruhagaze, Patrick; Njuguna, Kahaki Kimani Margaret; Kandeke, Lévi; Courtright, Paul

    2013-01-01

    To determine the causes of childhood blindness and severe visual impairment in pupils attending schools for the blind in Burundi in order to assist planning for services in the country. All pupils attending three schools for the blind in Burundi were examined. A modified WHO/PBL eye examination record form for children with blindness and low vision was used to record the findings. Data was analyzed for those who became blind or severely visually impaired before the age of 16 years. Overall, 117 pupils who became visually impaired before 16 years of age were examined. Of these, 109 (93.2%) were blind or severely visually impaired. The major anatomical cause of blindness or severe visual impairment was cornea pathology/phthisis (23.9%), followed by lens pathology (18.3%), uveal lesions (14.7%) and optic nerve lesions (11.9%). In the majority of pupils with blindness or severe visual impairment, the underlying etiology of visual loss was unknown (74.3%). More than half of the pupils with lens related blindness had not had surgery; among those who had surgery, outcomes were generally poor. The causes identified indicate the importance of continuing preventive public health strategies, as well as the development of specialist pediatric ophthalmic services in the management of childhood blindness in Burundi. The geographic distribution of pupils at the schools for the blind indicates a need for community-based programs to identify and refer children in need of services.

  17. Blindness and Severe Visual Impairment in Pupils at Schools for the Blind in Burundi

    PubMed Central

    Ruhagaze, Patrick; Njuguna, Kahaki Kimani Margaret; Kandeke, Lévi; Courtright, Paul

    2013-01-01

    Purpose: To determine the causes of childhood blindness and severe visual impairment in pupils attending schools for the blind in Burundi in order to assist planning for services in the country. Materials and Methods: All pupils attending three schools for the blind in Burundi were examined. A modified WHO/PBL eye examination record form for children with blindness and low vision was used to record the findings. Data was analyzed for those who became blind or severely visually impaired before the age of 16 years. Results: Overall, 117 pupils who became visually impaired before 16 years of age were examined. Of these, 109 (93.2%) were blind or severely visually impaired. The major anatomical cause of blindness or severe visual impairment was cornea pathology/phthisis (23.9%), followed by lens pathology (18.3%), uveal lesions (14.7%) and optic nerve lesions (11.9%). In the majority of pupils with blindness or severe visual impairment, the underlying etiology of visual loss was unknown (74.3%). More than half of the pupils with lens related blindness had not had surgery; among those who had surgery, outcomes were generally poor. Conclusion: The causes identified indicate the importance of continuing preventive public health strategies, as well as the development of specialist pediatric ophthalmic services in the management of childhood blindness in Burundi. The geographic distribution of pupils at the schools for the blind indicates a need for community-based programs to identify and refer children in need of services. PMID:23580854

  18. Spondylolisthesis: Intra-rater and Inter-rater Reliabilities of Radiographic Sagittal Spinopelvic Parameters Using Standard Picture Archiving and Communication System Measurement Tools.

    PubMed

    Montgomery, Robert A; Hresko, M Timothy; Kalish, Leslie A; Gold, Meryl; Li, Ying; Haus, Brian; Glotzbecker, Michael; Berthonnaud, Eric

    2013-11-01

    Reliability analysis. To determine the intra-rater and inter-rater reliability of common sagittal spinopelvic measurements from Digital Imaging and Communications in Medicine images on a commercial Picture Archiving and Communication System for patients with developmental spondylolisthesis. Computer-aided analysis of digital radiographs has been used in research protocols to define anatomic and positional characteristics of developmental spondylolisthesis. Previous studies have shown poor reliability and weak correlations of manual measurements used in clinical practice with research measurements, which limit the clinical value of prior research. Five raters of varying experience measured lateral spinopelvic images of 30 patients with developmental spondylolisthesis. Measurements were repeated after 1 week. Intra-rater and inter-rater reliabilities for each measurement were determined. Measurements were compared with those obtained from a computer-based image enhancement research system. Continuous variables were assessed by analysis of variance, whereas kappa statistics were determined for categorical variables. Excellent intraclass correlation coefficients (ICCs) were obtained for all radiographic measurements based on linear values (slip ratio and C7 balance) as well as pelvic tilt angle. Angular measurements had good to excellent ICCs but were weaker when the sacral plate was involved. There was poor agreement with classification of sacral doming. Some measurements had reduced reliability in the images with evidence of doming. Excellent ICCs were found with measurements from Digital Imaging and Communications in Medicine images using commercial Picture Archiving and Communication System tools. Sacral doming affected the reliability. A radiographic classification of spondylolisthesis will be most reliable when based on slip ratio, C7 balance, and pelvic tilt. Copyright © 2013 Scoliosis Research Society. Published by Elsevier Inc. All rights reserved.

  19. Clothing-Selection Habits of Teenage Girls Who Are Sighted and Blind.

    ERIC Educational Resources Information Center

    Kaufman, Al

    2000-01-01

    A study that compared the clothing-selection habits of 15 adolescent girls with blindness and 15 sighted girls found parents played a larger role in selecting the clothing for the girls with blindness, girls with blindness wore less makeup and jewelry, and care requirements were more important to girls with blindness. (Contains 12 references.) (CR)

  20. [Inter-rater reliability and validity of the OPD-CA axes structure and conflict].

    PubMed

    Benecke, Cord; Bock, Astrid; Wieser, Elke; Tschiesner, Reinhard; Lochmann, Martha; Küspert, Felicia; Schorn, Robert; Viertler, Bernhard; Steinmayr-Gensluckner, Maria

    2011-01-01

    The manual of the Operationalized Psychodynamic Diagnostics in childhood and adolescence (OPD-CA) is an instrument now widely used in clinical practice to assess psychodynamic dimensions. Data on inter-rater agreement and validity have not yet been published. This study assessed the inter-rater reliability and validity for the axis structure and the axis conflict. Sixty adolescents between 14 and 17 years of age, with and without mental disorders, were diagnosed with the Operationalized Psychodynamic Diagnostics in childhood and adolescence (Arbeitskreis OPD-KJ, 2007) as well as SCID-II interviews and questionnaires. A subsample of 36 OPD-CA interviews was the data basis for the assessment of inter-rater agreement. Calculations of validity for axis structure and axis conflict were made with the whole sample. Inter-rater agreement for the axis structure and the axis conflict showed good to very good weighted Kappa coefficients among the trained raters. Validity of the axis structure showed good results. The Operationalized Psychodynamic Diagnostics in childhood and adolescence (OPD-CA) allows a reliable diagnosis of axis structure and axis conflict if the ratings are done on the basis of semistructured videotaped interviews by trained raters. The axis structure shows validity, while the results concerning the validity of the axis conflict remain unclear.

  1. Intra- and inter-rater reliability of digital image analysis for skin color measurement

    PubMed Central

    Sommers, Marilyn; Beacham, Barbara; Baker, Rachel; Fargo, Jamison

    2013-01-01

    Background: We determined the intra- and inter-rater reliability of data from digital image color analysis between an expert and novice analyst. Methods: Following training, the expert and novice independently analyzed 210 randomly ordered images. Both analysts used Adobe® Photoshop lasso or color sampler tools based on the type of image file. After color correction with Pictocolor® in camera software, they recorded L*a*b* (L*=light/dark; a*=red/green; b*=yellow/blue) color values for all skin sites. We computed intra-rater and inter-rater agreement within anatomical region, color value (L*, a*, b*), and technique (lasso, color sampler) using a series of one-way intra-class correlation coefficients (ICCs). Results: Results of ICCs for intra-rater agreement showed high levels of internal consistency reliability within each rater for the lasso technique (ICC ≥ 0.99) and somewhat lower, yet acceptable, level of agreement for the color sampler technique (ICC = 0.91 for expert, ICC = 0.81 for novice). Skin L*, skin b*, and labia L* values reached the highest level of agreement (ICC ≥ 0.92) and skin a*, labia b*, and vaginal wall b* were the lowest (ICC ≥ 0.64). Conclusion: Data from novice analysts can achieve high levels of agreement with data from expert analysts with training and the use of a detailed, standard protocol. PMID:23551208

  2. Cultural values and performance appraisal: assessing the effects of rater self-construal on performance ratings.

    PubMed

    Mishra, Vipanchi; Roch, Sylvia G

    2013-01-01

    Much of the prior research investigating the influence of cultural values on performance ratings has focused either on conducting cross-national comparisons among raters or using cultural level individualism/collectivism scales to measure the effects of cultural values on performance ratings. Recent research has shown that there is considerable within country variation in cultural values, i.e. people in one country can be more individualistic or collectivistic in nature. Taking the latter perspective, the present study used Markus and Kitayama's (1991) conceptualization of independent and interdependent self-construals as measures of individual variations in cultural values to investigate within culture variations in performance ratings. Results suggest that rater self-construal has a significant influence on overall performance evaluations; specifically, raters with a highly interdependent self-construal tend to show a preference for interdependent ratees, whereas raters high on independent self-construal do not show a preference for specific type of ratees when making overall performance evaluations. Although rater self-construal significantly influenced overall performance evaluations, no such effects were observed for specific dimension ratings. Implications of these results for performance appraisal research and practice are discussed.

  3. A template for reliable assessment of resident operative performance: assessment intervals, numbers of cases and raters.

    PubMed

    Williams, Reed G; Verhulst, Steven; Colliver, Jerry A; Sanfey, Hilary; Chen, Xiaodong; Dunnington, Gary L

    2012-10-01

    Operative performance rating (OPR) instruments have been developed to assess operative performance (OP). To guide program implementation, this study determined: 1) Appropriate intervals for OP progress decisions, 2) Number of OPRs and raters required per interval to achieve reproducible results. 21 surgeons rated 897 OPs (3 procedures) by 36 residents. Six-month PGY intervals were compared to determine length of stable operative performance intervals. Variance component analyses established rating factor importance. Generalizability analyses and decision studies determined number of OPRs required for reproducible OP decisions (reliabilities = 0.80). Resident OPRs are stable across single PGY years. 2.3 OPRs/resident/month provided a dependable basis for annual or semi-annual resident OP decisions. Results were similar for all procedures and training years. Rater idiosyncrasies accounted for most score variation (63% when interaction effects involving rater idiosyncrasies were included). Resident ability was the next most important source of variation (12%). Procedure was a less important source (5%). Annual resident OP decisions are supported. 2.3 OPRs per month provide a dependable basis for judging resident OP. These numbers are sufficient regardless of training year or procedure mix though efforts should be made to balance procedure mix. Multiple raters should rate each resident to control for rater idiosyncrasies. Copyright © 2012 Mosby, Inc. All rights reserved.
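
    The decision-study step in this record (how many ratings are needed to reach a reliability of 0.80) can be illustrated with the simplest one-facet generalizability formula, in which reliability for the mean of n ratings is person variance divided by person variance plus error variance over n. This is a simplified sketch, not the authors' multi-facet analysis: the function names are hypothetical, and the variance figures in the usage lines are placeholders chosen for illustration, not the study's estimated components.

```python
def reliability_of_mean(person_var, error_var, n_ratings):
    """Generalizability coefficient for the mean of n_ratings ratings
    in a one-facet design (everything except person variance is error)."""
    return person_var / (person_var + error_var / n_ratings)


def ratings_needed(person_var, error_var, target):
    """Number of ratings (possibly fractional) at which the mean rating
    reaches the target reliability; obtained by solving the formula above."""
    return (target / (1.0 - target)) * (error_var / person_var)


# Placeholder variance shares (assumption, for illustration only).
person_share, error_share = 12.0, 88.0
print(ratings_needed(person_share, error_share, 0.80))
print(reliability_of_mean(person_share, error_share, 30))
```

    The formula makes the practical point of the abstract visible: when most score variation comes from sources other than the person being rated, many ratings from multiple raters are needed before the average becomes dependable.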

  4. Intra- and inter-rater reliability of digital image analysis for skin color measurement.

    PubMed

    Sommers, Marilyn; Beacham, Barbara; Baker, Rachel; Fargo, Jamison

    2013-11-01

    We determined the intra- and inter-rater reliability of data from digital image color analysis between an expert and novice analyst. Following training, the expert and novice independently analyzed 210 randomly ordered images. Both analysts used Adobe® Photoshop lasso or color sampler tools based on the type of image file. After color correction with Pictocolor® in camera software, they recorded L*a*b* (L*=light/dark; a*=red/green; b*=yellow/blue) color values for all skin sites. We computed intra-rater and inter-rater agreement within anatomical region, color value (L*, a*, b*), and technique (lasso, color sampler) using a series of one-way intra-class correlation coefficients (ICCs). Results of ICCs for intra-rater agreement showed high levels of internal consistency reliability within each rater for the lasso technique (ICC ≥ 0.99) and somewhat lower, yet acceptable, level of agreement for the color sampler technique (ICC = 0.91 for expert, ICC = 0.81 for novice). Skin L*, skin b*, and labia L* values reached the highest level of agreement (ICC ≥ 0.92) and skin a*, labia b*, and vaginal wall b* were the lowest (ICC ≥ 0.64). Data from novice analysts can achieve high levels of agreement with data from expert analysts with training and the use of a detailed, standard protocol. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  5. Measuring teacher dispositions using the DAATS battery: a multifaceted Rasch analysis of rater effect.

    PubMed

    Lang, W Steve; Wilkerson, Judy R; Rea, Dorothy C; Quinn, David; Batchelder, Heather L; Englehart, Dierdre S; Jennings, Kelly J

    2014-01-01

    The purpose of this study was to examine the extent to which raters' subjectivity impacts measures of teacher dispositions using the Dispositions Assessments Aligned with Teacher Standards (DAATS) battery. This is an important component of the collection of evidence of validity and reliability of inferences made using the scale. It also provides needed support for the use of subjective affective measures in teacher training and other professional preparation programs, since these measures are often feared to be unreliable because of rater effect. It demonstrates the advantages of using the Multi-Faceted Rasch Model as a better alternative to the typical methods used in preparation programs, such as Cohen's Kappa. DAATS instruments require subjective scoring using a six-point rating scale derived from the affective taxonomy as defined by Krathwohl, Bloom, and Masia (1956). Rater effect is a serious challenge and can worsen or drift over time. Errors in rater judgment can impact the accuracy of ratings, and these effects are common, but can be lessened through training of raters and monitoring of their efforts. This effort uses the multifaceted Rasch measurement models (MFRM) to detect and understand the nature of these effects.

  6. Unsupervised Blind Deconvolution

    NASA Astrophysics Data System (ADS)

    Baena-Galle, R.; Kann, L.; Mugnier, L.; Gudimetla, R.; Johnson, R.; Gladysz, S.

    2013-09-01

    "Blind" deconvolution is rarely executed blindly. All available methods have parameters which the user fine-tunes until the most visually-appealing reconstruction is achieved. The "art" of deconvolution is to find constraints which allow for the best estimate of an object to be recovered, but in practice these parameterized constraints often reduce deconvolution to the struggle of trial and error. In the course of AFOSR-sponsored activities we are developing a general maximum a posteriori framework for the problem of imaging through atmospheric turbulence, with the emphasis on multi-frame blind deconvolution. Our aim is to develop a deconvolution strategy which is reference-less, i.e. no calibration PSF is required, extendable to longer exposures, and applicable to imaging with adaptive optics. In the first part of the project the focus has been on developing a new theory of statistics of images taken through turbulence, both with and without adaptive optics. Images and their Fourier transforms have been described as random phasor sums, their fluctuations controlled by wavefront "cells" and moments of the phase. The models were validated using simulations and real data from the 3.5m telescope at the Starfire Optical Range in New Mexico. Another important ingredient of the new framework is the capability to estimate the average PSF automatically from the target observations. A general approach, applicable to any type of object, has been proposed. Here use is made of an object-cancelling transformation of the image sequence. This transformation yields information about the atmospheric PSF. Currently, the PSF estimation module and the theoretical constraints on PSF variability are being incorporated into multi-frame blind deconvolution. In preliminary simulation tests we obtained significantly sharper images with respect to the starting observations and PSF estimates which closely track the input kernels. Thanks to access to the SOR 3.5m telescope we are now testing

  7. Inter- and intra-rater reliability of the manual handling component of the WorkHab Functional Capacity Evaluation.

    PubMed

    James, Carole; Mackenzie, Lynette; Capra, Mike

    2011-01-01

    The WorkHab Functional Capacity Evaluation (FCE) is widely used in Australian workplace injury management and occupational rehabilitation arenas; however, there is a lack of published literature regarding its reliability and validity.  This study investigated the intra- and inter-rater reliability of the manual handling component of this FCE.  A DVD was produced containing footage of the manual handling components of the WorkHab conducted with four injured workers. Therapist raters (n = 17) who were trained and accredited in use of the WorkHab FCE scored these components and 14 raters re-evaluated them after approximately 2 weeks. Ratings were compared using intraclass correlation coefficients (ICCs), paired sample t-tests (intra-rater), chi-squared (inter-rater) and percentage agreement. Intra-rater agreement was high with ICCs for the manual handling components and manual handling score showing excellent reliability (0.94-0.98) and good reliability for identification of the safe maximal lift (ICC: 0.81). Overall inter-rater agreement ranged from good to excellent for the manual handling components and safe maximal lift determination (ICC > 0.9). Agreement for safe maximal lift identification was good.  Ratings demonstrated substantial levels of intra-rater and inter-rater reliability for the lifting components of the WorkHab FCEs.
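
    The intraclass correlation coefficients reported above can be illustrated with a short sketch. The abstract does not say which ICC form the authors used, so the code below simply shows two common single-measure forms, ICC(1,1) and ICC(2,1), computed from ANOVA mean squares. The function name `icc_single` and the small ratings table are illustrative assumptions, not data from the study.

```python
def icc_single(ratings):
    """Return (ICC(1,1), ICC(2,1)) for a table with one row per subject
    and one column per rater. Illustrative helper, not the study's code."""
    n = len(ratings)       # number of subjects (rows)
    k = len(ratings[0])    # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the one-way and two-way ANOVA decompositions.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)

    ms_rows = ss_rows / (n - 1)                      # between subjects
    ms_cols = ss_cols / (k - 1)                      # between raters
    ms_within = (ss_total - ss_rows) / (n * (k - 1)) # within subjects
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    # One-way random effects, single measure.
    icc_1_1 = (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within)
    # Two-way random effects, absolute agreement, single measure.
    icc_2_1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return icc_1_1, icc_2_1


# Illustrative table: 6 subjects rated by 4 raters (not study data).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc11, icc21 = icc_single(example)
print(round(icc11, 2), round(icc21, 2))
```

    In this toy table the raters differ systematically in their mean level, which is why both coefficients come out low even though the raters rank the subjects similarly.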

  8. Representing vision and blindness.

    PubMed

    Ray, Patrick L; Cox, Alexander P; Jensen, Mark; Allen, Travis; Duncan, William; Diehl, Alexander D

    2016-01-01

    There have been relatively few attempts to represent vision or blindness ontologically. This is unsurprising as the related phenomena of sight and blindness are difficult to represent ontologically for a variety of reasons. Blindness has escaped ontological capture at least in part because: blindness or the employment of the term 'blindness' seems to vary from context to context, blindness can present in a myriad of types and degrees, and there is no precedent for representing complex phenomena such as blindness. We explore current attempts to represent vision or blindness, and show how these attempts fail at representing subtypes of blindness (viz., color blindness, flash blindness, and inattentional blindness). We examine the results found through a review of current attempts and identify where they have failed. By analyzing our test cases of different types of blindness along with the strengths and weaknesses of previous attempts, we have identified the general features of blindness and vision. We propose an ontological solution to represent vision and blindness, which capitalizes on resources afforded to one who utilizes the Basic Formal Ontology as an upper-level ontology. The solution we propose here involves specifying the trigger conditions of a disposition as well as the processes that realize that disposition. Once these are specified we can characterize vision as a function that is realized by certain (in this case) biological processes under a range of triggering conditions. When the range of conditions under which the processes can be realized are reduced beyond a certain threshold, we are able to say that blindness is present. We characterize vision as a function that is realized as a seeing process and blindness as a reduction in the conditions under which the sight function is realized. This solution is desirable because it leverages current features of a major upper-level ontology, accurately captures the phenomenon of blindness, and can be

  9. Assessment of Interpersonal Motivation in Transcripts (AIMIT): an inter- and intra-rater reliability study of a new method of detection of interpersonal motivational systems in psychotherapy.

    PubMed

    Fassone, G; Valcella, F; Pallini, S; Scarcella, F; Tombolini, L; Ivaldi, A; Prunetti, E; Manaresi, F; Liotti, G

    2012-01-01

    Assessing Interpersonal Motivations in Transcripts (AIMIT) is a coding system aiming to systematically detect the activity of interpersonal motivational systems (IMS) in the therapeutic dialogue. An inter- and intra-rater reliability study has been conducted. Sixteen video-recorded psychotherapy sessions were selected and transcribed according to the AIMIT criteria. Sessions relate to 16 patients with an Axis II diagnosis, with a mean Global Assessment of Functioning of 51. For the intra-rater reliability evaluation, five sessions have been selected and assigned to five independent coders who were asked to make a first evaluation, and then a second independent one 14 days later. For the inter-rater reliability study, the sessions coded by the therapist-coder were jointly revised with another coder and finally classified as gold standard. The 16 standard sessions were sent to other evaluators for the independent coding. The agreement (κ) was estimated according to the following parameters for each coding unit: evaluation units supported by the 'codable' activation of one or more IMS; motivational interaction with reference to the ongoing relation between patient and therapist; an interaction between the patient and another person reported/narrated by the patient; detection of specific IMS: attachment (At), caregiving (CG), rank (Ra), sexuality (Se), peer cooperation (PC); and transitions from one IMS to another were also scored. The intra-rater agreement was evaluated through the parameters 'cod', 'At', 'CG', 'Ra', 'Se' and 'PC' described above. A total of 2443 coding units were analysed. For the nine parameters on which the agreement was calculated, eight ['coded (Cod)', 'ongoing relation (Rel)', 'narrated relation (Nar)', 'At', 'CG', 'Ra', 'Se' and 'PC'] have κ values between 0.62 (CG) and 0.81 (Cod) and were therefore satisfactory. The scoring of 'transitions' showed agreement values slightly below desired cut-off (0.56). Intra-rater reliability was

  10. Inter- and intra-rater reliability of postural assessment for scoliosis.

    PubMed

    Suwanasri, Chompunoot; Sakullertphasuk, Wimonrat; Tosiriphattana, Meena; Sa-ngounsak, Thachakorn; Ekabutr, Worawan

    2014-07-01

    The purpose of the present study is to assess the inter- and intra-rater reliability of a postural assessment for detecting scoliosis between 4th year physical therapy (PT) students and PT specialists by using 30 postural indices. Six examiners, three 4th year PT students and three certified PT specialists, performed the postural indices on 10 asymptomatic subjects. Inter-rater reliability between 4th year students and PT specialists and intra-rater reliability ranged from poor to almost perfect. Two items that needed rectification were PSIS level and iliac crest. There was a variable range of values in agreement either within or between examiners for the assessment of scoliosis screening. This shows that the assessment in scoliosis screening should be used with caution by 4th year students.

  11. Inter-rater reliability of measures to characterize the tobacco retail environment in Mexico.

    PubMed

    Hall, Marissa G; Kollath-Cattano, Christy; Reynales-Shigematsu, Luz Myriam; Thrasher, James F

    2015-01-01

    To evaluate the inter-rater reliability of a data collection instrument to assess the tobacco retail environment in Mexico, after major marketing regulations were implemented. In 2013, two data collectors independently evaluated 21 stores in two census tracts, through a data collection instrument that assessed the presence of price promotions, whether single cigarettes were sold, the number of visible advertisements, the presence of signage prohibiting the sale of cigarettes to minors, and characteristics of cigarette pack displays. We evaluated the inter-rater reliability of the collected data, through the calculation of metrics such as intraclass correlation coefficient, percent agreement, Cohen's kappa and Krippendorff's alpha. Most measures demonstrated substantial or perfect inter-rater reliability. Our results indicate the potential utility of the data collection instrument for future point-of-sale research.
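
    Two of the agreement metrics named above, percent agreement and Cohen's kappa, can be sketched for a single yes/no item coded by two data collectors. The function names and the ten toy observations are assumptions made for illustration; they are not taken from the study's instrument or data.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which the two raters gave the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa for two raters coding the same items.

    Undefined (division by zero) when chance agreement equals 1,
    e.g. when both raters use a single category throughout.
    """
    n = len(a)
    p_observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category frequencies.
    p_expected = sum(
        count_a[c] * count_b[c] for c in set(a) | set(b)
    ) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical item: "single cigarettes sold?" (1 = yes, 0 = no) in 10 stores.
rater_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
rater_b = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

print(percent_agreement(rater_a, rater_b))       # 8 of 10 items match
print(round(cohens_kappa(rater_a, rater_b), 2))  # chance-corrected agreement
```

    With these toy codes the raters agree on 80% of stores, but half of that agreement is expected by chance, so kappa is noticeably lower than the raw percentage.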

  12. Cultural adaptation, content validity and inter-rater reliability of the "STAR Skin Tear Classification System".

    PubMed

    Strazzieri-Pulido, Kelly Cristina; Santos, Vera Lúcia Conceição de Gouveia; Carville, Keryln

    2015-01-01

    To perform the cultural adaptation of the STAR Skin Tear Classification System into the Portuguese language and to test the content validity and inter-rater reliability of the adapted version. Methodological study with a quantitative approach. The cultural adaptation was developed in three phases: translation, evaluation by a committee of judges and back-translation. The instrument was tested regarding content validity and inter-rater reliability. The adapted version obtained a regular level of concordance when it was applied by nurses using photographs of friction injuries. Regarding its application in clinical practice, the adapted version obtained a moderate and statistically significant level of concordance. The study tested the content validity and inter-rater reliability of the version adapted into the Portuguese language. Its inclusion in clinical practice will enable the correct identification of this type of injury, as well as the implementation of protocols for the prevention and treatment of friction injuries.

  13. An inter-rater reliability study for the Rorschach performance assessment system.

    PubMed

    Viglione, Donald J; Blume-Marcovici, Amy C; Miller, Heidi L; Giromini, Luciano; Meyer, Gregory

    2012-01-01

    Based on available research findings, the Rorschach performance assessment system (Meyer, Viglione, Mihura, Erard, & Erdberg, 2011) was recently developed in an attempt to ground the administration, coding, and interpretation of the Rorschach in its evidence base, improve its normative foundation, integrate international findings, reduce examiner variability, and increase utility. This study sought to establish inter-rater reliability for the coding decisions in this new system. We randomly selected 50 Rorschach records from ongoing research projects using R-Optimized administration. The records were administered by 16 examiners and came from a diverse sample in terms of age, sex, ethnicity, educational background, and patient status. Results demonstrated a mean intraclass correlation of .88 and median of .92. Overall, the findings indicate good to excellent inter-rater reliability for the great majority of codes and are consistent with previous findings of strong inter-rater reliability for alternative Rorschach systems and scores.

  14. Examining rater and occasion influences in observational assessments obtained from within the clinical environment

    PubMed Central

    Kreiter, Clarence D.; Wilson, Adam B.; Humbert, Aloysius J.; Wade, Patricia A.

    2016-01-01

    Background: When ratings of student performance within the clerkship consist of a variable number of ratings per clinical teacher (rater), an important measurement question arises regarding how to combine such ratings to accurately summarize performance. As previous G studies have not estimated the independent influence of occasion and rater facets in observational ratings within the clinic, this study was designed to provide estimates of these two sources of error. Method: During 2 years of an emergency medicine clerkship at a large midwestern university, 592 students were evaluated an average of 15.9 times. Ratings were performed at the end of clinical shifts, and students often received multiple ratings from the same rater. A completely nested G study model (occasion: rater: person) was used to analyze sampled rating data. Results: The variance component (VC) related to occasion was small relative to the VC associated with rater. The D study clearly demonstrates that having a preceptor rate a student on multiple occasions does not substantially enhance the reliability of a clerkship performance summary score. Conclusions: Although further research is needed, it is clear that case-specific factors do not explain the low correlation between ratings and that having one or two raters repeatedly rate a student on different occasions/cases is unlikely to yield a reliable mean score. This research suggests that it may be more efficient to have a preceptor rate a student just once. However, when multiple ratings from a single preceptor are available for a student, it is recommended that a mean of the preceptor's ratings be used to calculate the student's overall mean performance score. PMID:26925540
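
    The D study reasoning above can be sketched numerically. For a completely nested occasion:rater:person design, the generalizability coefficient is commonly written as the person variance divided by the person variance plus the relative error, where the error is the rater component divided by the number of raters plus the occasion component divided by the number of raters times the number of occasions. The variance components below are hypothetical values chosen only so that the occasion component is small relative to the rater component; the abstract does not report the actual estimates.

```python
def g_coefficient(var_person, var_rater, var_occasion, n_raters, n_occasions):
    """Generalizability coefficient for an occasion:rater:person design.

    var_rater is the rater-within-person component; var_occasion is the
    occasion-within-rater-within-person component (confounded with residual).
    """
    relative_error = (
        var_rater / n_raters + var_occasion / (n_raters * n_occasions)
    )
    return var_person / (var_person + relative_error)


# Hypothetical variance components (not estimates from the study).
VAR_PERSON, VAR_RATER, VAR_OCCASION = 0.10, 0.30, 0.05

one_rater_once = g_coefficient(VAR_PERSON, VAR_RATER, VAR_OCCASION, 1, 1)
one_rater_four_times = g_coefficient(VAR_PERSON, VAR_RATER, VAR_OCCASION, 1, 4)
four_raters_once = g_coefficient(VAR_PERSON, VAR_RATER, VAR_OCCASION, 4, 1)

print(round(one_rater_once, 3))
print(round(one_rater_four_times, 3))
print(round(four_raters_once, 3))
```

    Under these assumed components, repeating ratings from the same rater barely changes the coefficient, while adding different raters raises it substantially, which mirrors the pattern the abstract describes.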

  15. Auditory Spatial Recalibration in Congenital Blind Individuals

    PubMed Central

    Finocchietti, Sara; Cappagli, Giulia; Gori, Monica

    2017-01-01

    Blind individuals show impairments for auditory spatial skills that require complex spatial representation of the environment. We suggest that this is partially due to the egocentric frame of reference used by blind individuals. Here we investigate the possibility of reducing the mentioned auditory spatial impairments with an audio-motor training. Our hypothesis is that the association between a motor command and the corresponding movement's sensory feedback can provide an allocentric frame of reference and consequently help blind individuals in understanding complex spatial relationships. Subjects were required to localize the end point of a moving sound before and after either 2 min of audio-motor training or a complete rest. During the training, subjects were asked to move their hand, and consequently the sound source, to freely explore the space around the setup and the body. Both congenital blind (N = 20) and blindfolded healthy controls (N = 28) participated in the study. Results suggest that the audio-motor training was effective in improving space perception of blind individuals. The improvement was not observed in subjects who did not perform the training. This study demonstrates that it is possible to recalibrate the auditory spatial representation in congenital blind individuals with a short audio-motor training and provides new insights for rehabilitation protocols in blind people. PMID:28261053

  16. Three-dimensional intra-rater and inter-rater reliability during a posed smile using a video-based motion analyzing system.

    PubMed

    Mishima, Katsuaki; Umeda, Hirotsugu; Nakano, Asuka; Shiraishi, Ruriko; Hori, Sayaka; Ueyama, Yoshiya

    2014-07-01

    The purpose of this study was to determine the three-dimensional reproducibility of lip movement during a posed smile using a video-based motion analyzing system. In six adult volunteers (4 males and 2 females), the lip motions during a posed smile were recorded six times. Using our recently-developed motion analyzing system, range images were produced across the whole sequence during the posed smile. Virtual grids of 5 × 5 were fitted onto the surfaces, and the three-dimensional coordinates of the intersections of these grids were then computed. The magnitude of the shift of the intersections during smiling was calculated and summed in each area. Intraclass correlation coefficients (ICC), ICC (1,1) for intra-rater reliability and ICC (2,1) for inter-rater reliability, were calculated. The number of repeated measurements necessary for an ICC level beyond 0.8 was determined using the Spearman-Brown formula. The ICC (1,1) and ICC (2,1) ranged from 0.71 to 0.83 and from 0.77 to 0.99, respectively. The number of repeated measurements necessary for an ICC beyond 0.8 was 2. From the present study, both the three-dimensional intra-rater and inter-rater reliabilities during a posed smile were considered to be relatively high, and sufficient reliability could be expected by averaging the values from two measurements. However, because the sample size was very small, these findings cannot be readily generalized. Copyright © 2013 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
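
    The Spearman-Brown step mentioned above can be worked through in a few lines. The formula predicts the reliability of the mean of k measurements from the single-measure reliability, and solving it for k gives the number of repetitions needed to reach a target. The function names are illustrative; the inputs 0.71 and 0.8 are the lowest reported ICC (1,1) and the threshold from the abstract.

```python
import math


def spearman_brown(r_single, k):
    """Predicted reliability of the mean of k repeated measurements."""
    return k * r_single / (1 + (k - 1) * r_single)


def measurements_needed(r_single, target):
    """Smallest whole number of measurements whose mean reaches `target`."""
    k = target * (1 - r_single) / (r_single * (1 - target))
    return math.ceil(k)


# Lowest ICC (1,1) reported above, with the 0.8 threshold.
print(measurements_needed(0.71, 0.8))      # 2
print(round(spearman_brown(0.71, 2), 2))   # 0.83
```

    Starting from the lowest single-measure value, two measurements are enough to pass 0.8, which agrees with the number reported in the abstract.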

  17. Effects of Rating Purpose and Rater Self-Esteem on Performance Ratings.

    DTIC Science & Technology

    1983-03-01

    Report type: Interim ... Key words: Performance Ratings, Rating Purpose, Self-Esteem ... Abstract: The influences of intended rating purpose (administrative vs. employee counseling) and rater self-esteem on ratings of employee performance were

  18. Establishing inter-rater reliability scoring in a state trauma system.

    PubMed

    Read-Allsopp, Christine

    2004-01-01

    Trauma systems rely on accurate Injury Severity Scoring (ISS) to describe trauma patient populations. Twenty-seven (27) Trauma Nurse Coordinators and Data Managers across the New South Wales, Australia, state trauma network were instructed in the uses and techniques of the Abbreviated Injury Scale (AIS) from the Association for the Advancement of Automotive Medicine. The aim is to provide accurate, reliable and valid data for the state trauma network. Four (4) months after the course a coding exercise was conducted to assess inter-rater reliability. The results show that inter-rater reliability is within accepted international standards.

  19. Structured interview for mild traumatic brain injury after military blast: inter-rater agreement and development of diagnostic algorithm.

    PubMed

    Walker, William C; Cifu, David X; Hudak, Anne M; Goldberg, Gary; Kunz, Richard D; Sima, Adam P

    2015-04-01

    The existing gold standard for diagnosing a suspected previous mild traumatic brain injury (mTBI) is clinical interview. But it is prone to bias, especially for parsing the physical versus psychological effects of traumatic combat events, and its inter-rater reliability is unknown. Several standardized TBI interview instruments have been developed for research use but have similar limitations. Therefore, we developed the Virginia Commonwealth University (VCU) retrospective concussion diagnostic interview, blast version (VCU rCDI-B), and undertook this cross-sectional study aiming to 1) measure agreement among clinicians' mTBI diagnosis ratings, 2) using clinician consensus develop a fully structured diagnostic algorithm, and 3) assess accuracy of this algorithm in a separate sample. Two samples (n = 66; n = 37) of individuals within 2 years of experiencing blast effects during military deployment underwent semistructured interview regarding their worst blast experience. Five highly trained TBI physicians independently reviewed and interpreted the interview content and gave blinded ratings of whether or not the experience was probably an mTBI. Paired inter-rater reliability was extremely variable, with kappa ranging from 0.194 to 0.825. In sample 1, the physician consensus prevalence of probable mTBI was 84%. Using these diagnosis ratings, an algorithm was developed and refined from the fully structured portion of the VCU rCDI-B. The final algorithm considered certain symptom patterns more specific for mTBI than others. For example, an isolated symptom of "saw stars" was deemed sufficient to indicate mTBI, whereas an isolated symptom of "dazed" was not. The accuracy of this algorithm, when applied against the actual physician consensus in sample 2, was almost perfect (correctly classified = 97%; Cohen's kappa = 0.91). In conclusion, we found that highly trained clinicians often disagree on historical blast-related mTBI determinations. A fully structured interview

  20. Inter- and intra-rater reliability of the GAITRite system among individuals with sub-acute stroke.

    PubMed

    Wong, Jennifer S; Jasani, Hardika; Poon, Vivien; Inness, Elizabeth L; McIlroy, William E; Mansfield, Avril

    2014-01-01

    Technology-based assessment tools with semi-automated processing, such as pressure-sensitive mats used for gait assessment, may be considered to be objective; therefore it may be assumed that rater reliability is not a concern. However, user input is often required and rater reliability must be determined. The purpose of this study was to assess the inter- and intra-rater reliability of spatial and temporal characteristics of gait in stroke patients using the GAITRite system. Forty-six individuals with stroke attending in-patient rehabilitation walked across the pressure-sensitive mat 2-4 times at preferred walking speeds, with or without a gait aid. Five raters independently processed gait data. Three raters re-processed the data after a delay of at least one month. The intraclass correlation coefficients (ICC) and 95% confidence intervals of the ICC were determined for velocity, step time, step length, and step width. Inter-rater reliability for velocity, step time, and step length were high (ICC>0.90). Intra-rater reliability was generally greater than inter-rater reliability (from 0.81 to >0.99 for inter-rater versus 0.77 to >0.99 for intra-rater reliability). Overall, this study suggests that GAITRite is a reliable assessment tool; however, there still remains subjectivity in processing the data, resulting in no patients with perfect agreement between raters. Additional logic checking within the processing software or standardization of training could help to reduce potential errors in processing.

  1. Blind Quantum Signature with Blind Quantum Computation

    NASA Astrophysics Data System (ADS)

    Li, Wei; Shi, Ronghua; Guo, Ying

    2017-04-01

    Blind quantum computation allows a client without quantum abilities to interact with a quantum server to perform an unconditionally secure computing protocol while protecting the client's privacy. Motivated by the confidentiality of blind quantum computation, a blind quantum signature scheme is designed with a laconic structure. Different from the traditional signature schemes, the signing and verifying operations are performed through measurement-based quantum computation. Inputs of blind quantum computation are securely controlled with multi-qubit entangled states. The unique signature of the transmitted message is generated by the signer without leaking information in imperfect channels. The receiver, in turn, can verify the validity of the signature using the quantum matching algorithm. The security is guaranteed by the entanglement of the quantum system for blind quantum computation. It provides a potential practical application for e-commerce in cloud computing and first-generation quantum computation.

  2. Blind Quantum Signature with Blind Quantum Computation

    NASA Astrophysics Data System (ADS)

    Li, Wei; Shi, Ronghua; Guo, Ying

    2016-12-01

    Blind quantum computation allows a client without quantum abilities to interact with a quantum server to perform an unconditionally secure computing protocol while protecting the client's privacy. Motivated by the confidentiality of blind quantum computation, a blind quantum signature scheme is designed with a laconic structure. Different from the traditional signature schemes, the signing and verifying operations are performed through measurement-based quantum computation. Inputs of blind quantum computation are securely controlled with multi-qubit entangled states. The unique signature of the transmitted message is generated by the signer without leaking information in imperfect channels. The receiver, in turn, can verify the validity of the signature using the quantum matching algorithm. The security is guaranteed by the entanglement of the quantum system for blind quantum computation. It provides a potential practical application for e-commerce in cloud computing and first-generation quantum computation.

  3. Onchocerciasis (River Blindness) FAQs

    MedlinePlus

    ... The CDC Parasites - Onchocerciasis (also known as River Blindness) ... infected Simulium blackfly. It is also called River Blindness because the fly that transmits infection breeds in ...

  4. Active comparator-controlled, rater-blinded study of corticotropin-based immunotherapies for opsoclonus-myoclonus syndrome.

    PubMed

    Tate, Elizabeth D; Pranzatelli, Michael R; Verhulst, Steven J; Markwell, Stephen J; Franz, David Neal; Graf, William D; Joseph, S Anne; Khakoo, Yasmin N; Lo, Warren D; Mitchell, Wendy G; Sivaswamy, Lalitha

    2012-07-01

    To test the efficacy and safety of corticotropin-based immunotherapies in pediatric opsoclonus-myoclonus syndrome, 74 children received corticotropin alone or with intravenous immunoglobulin (groups 1 and 2, active controls); or both with rituximab (group 3) or cyclophosphamide (group 4); or with rituximab plus chemotherapy (group 5) or steroid sparers (group 6). There was 65% improvement in motor severity score across groups (P < .0001), but treatment combinations were more effective than corticotropin alone (P = .0009). Groups 3, 4, and 5 responded better than group 1; groups 3 and 5 responded better than group 2. The response frequency to corticotropin was higher than to prior corticosteroids (P < .0001). Fifty-five percent had adverse events (corticosteroid excess), more so with multiagents (P = .03); and 10% had serious adverse events. This study demonstrates greater efficacy of corticotropin-based multimodal therapy compared with conventional therapy, greater response to corticotropin than corticosteroid-based therapy, and overall tolerability.

  5. Blindness and anorexia nervosa.

    PubMed

    McFarlane, A C

    1989-06-01

    Two cases of anorexia nervosa in blind patients are reported. They demonstrate that blind children experience many developmental problems which are thought to be important in the etiology of anorexia nervosa. Similarly, blind children are unusually susceptible to misperceiving their body size and weight. The apparent absence of a strong association between congenital blindness and anorexia nervosa challenges the presumed aetiological link between disturbed body image and identity diffusion, and anorexia nervosa.

  6. Blind Loop Syndrome

    MedlinePlus

    ... breeding ground for bacteria. The bacteria may produce toxins as well as block the absorption of nutrients. The greater the length of small bowel involved in the blind loop, the greater the chance of bacterial overgrowth. What triggers blind loop syndrome? Blind loop ...

  7. Evaluating the Construct-Coverage of the e-rater[R] Scoring Engine. Research Report. ETS RR-09-01

    ERIC Educational Resources Information Center

    Quinlan, Thomas; Higgins, Derrick; Wolff, Susanne

    2009-01-01

    This report evaluates the construct coverage of the e-rater[R] scoring engine. The matter of construct coverage depends on whether one defines writing skill in terms of process or product. Originally, the e-rater engine consisted of a large set of components with a proven ability to predict human holistic scores. By organizing these capabilities…

  8. Construct Validity of "e-rater"® in Scoring TOEFL® Essays. Research Report. ETS RR-07-21

    ERIC Educational Resources Information Center

    Attali, Yigal

    2007-01-01

    This study examined the construct validity of the "e-rater"® automated essay scoring engine as an alternative to human scoring in the context of TOEFL® essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two "e-rater" scores were investigated in this study, the first…

  9. Rater Biases in Genetically Informative Research Designs: Comment on Bartels, Boomsma, Hudziak, van Beijsterveldt, and van den Oord (2007)

    ERIC Educational Resources Information Center

    Hoyt, William T.

    2007-01-01

    Rater biases are of interest to behavior genetic researchers, who often use ratings data as a basis for studying heritability. Inclusion of multiple raters for each sibling pair (M. Bartels, D. I. Boomsma, J. J. Hudziak, T. C. E. M. van Beijsterveldt, & E. J. C. G. van den Oord, 2007) is a promising strategy for controlling bias variance and may…

  10. Diagnostic Accuracy of the Halstead-Reitan and Luria-Nebraska Neuropsychological Batteries: Performance of Clinical Raters.

    ERIC Educational Resources Information Center

    Kane, Robert L.; And Others

    1987-01-01

    Three experienced neuropsychologists rated brain damaged and control subjects for brain damage using the Halstead-Reitan Battery and the Luria-Nebraska Neuropsychological Battery. Using either battery, raters were accurate in judging the presence of brain damage. There was a high degree of consistency between raters and test batteries when both…

  11. Writing Evaluation: Rater and Task Effects on the Reliability of Writing Scores for Children in Grades 3 and 4

    ERIC Educational Resources Information Center

    Kim, Young-Suk Grace; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie

    2017-01-01

    We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach a desirable level of 0.90 and 0.80 reliabilities for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written…

  12. Building "e-rater"® Scoring Models Using Machine Learning Methods. Research Report. ETS RR-16-04

    ERIC Educational Resources Information Center

    Chen, Jing; Fife, James H.; Bejar, Isaac I.; Rupp, André A.

    2016-01-01

    The "e-rater"® automated scoring engine used at Educational Testing Service (ETS) scores the writing quality of essays. In the current practice, e-rater scores are generated via a multiple linear regression (MLR) model as a linear combination of various features evaluated for each essay and human scores as the outcome variable. This…

  13. Liking and attributions of motives as mediators of the relationships between individuals' reputations, helpful behaviors, and raters' reward decisions.

    PubMed

    Johnson, Diane E; Erez, Amir; Kiker, D Scott; Motowidlo, Stephan J

    2002-08-01

    Two studies investigated the mediating effects of liking and attributions of motives on the relationship between a ratee's reputation and helpful behaviors and raters' reward decisions. During managerial simulations, raters evaluated individuals after watching videotapes in which the individual's reputation and helpful behaviors were manipulated. Results indicated an interaction effect between reputation and helpful behaviors such that a helpful person with a good reputation received more rewards than did a helpful person with a bad reputation. In contrast, an unhelpful person with a good reputation did not receive better rewards than an unhelpful person with a bad reputation. Moreover, raters' liking of ratees and the motives raters attributed to ratees' helpful behaviors mediated the relationship between the manipulations and raters' reward decisions.

  14. Phone and Video-Based Modalities of Central Blinded Adjudication of Modified Rankin Scores in an Endovascular Stroke Trial.

    PubMed

    López-Cancio, Elena; Salvat, Mercè; Cerdà, Neus; Jiménez, Marta; Codas, Javier; Llull, Laura; Boned, Sandra; Cano, Luis M; Lara, Blanca; Molina, Carlos; Cobo, Erik; Dávalos, Antoni; Jovin, Tudor G; Serena, Joaquín

    2015-12-01

    The standard outcome measure in stroke research is modified Rankin scale (mRS) evaluated by local blinded investigators. We aimed to assess feasibility and reliability of 2 central adjudication methods of mRS in the setting of a randomized endovascular stroke trial. This is a secondary analysis derived from the Randomized Trial of Revascularization With Solitaire FR Device Versus Best Medical Therapy in the Treatment of Acute Stroke Due to Anterior Circulation Large Vessel Occlusion Presenting Within Eight Hours of Symptom Onset (REVASCAT) trial cohort. Primary outcome was distribution of mRS at 90 days. Local evaluation was done by certified investigators masked to treatment assignment using structured face-to-face interviews. In addition, central assessment was performed by 2 independent raters via structured phone interview (n=120) and via video recordings of the face-to-face interviews with local investigators (n=106). Interrater agreement was evaluated using kappa and discordance statistics. Sensitivity analyses for the primary end point using different adjudication approaches were performed. Correlation between mRS obtained with each modality and 24-hour follow-up infarct volumes was studied. Using local evaluation as the reference, higher agreement rates were noted with central video than with central phone evaluations (kw 0.92 [0.88-0.96] versus 0.77 [0.72-0.83]). Discrepancies in mRS scoring between local and central raters (phone- and video-based) were similar in both treatment allocation arms. Sensitivity analyses showed benefit of endovascular treatment irrespective of adjudication method, but higher odds ratios were observed with local evaluations. Final infarct volume was similarly correlated with mRS across all 3 evaluation modalities. Central adjudication of mRS is feasible, reducing interrater variability and avoiding potential problems related to lack of blinding. Our findings may have implications in the planning of future randomized acute stroke

  15. A Comparison of EFL Raters' Essay-Rating Processes across Two Types of Rating Scales

    ERIC Educational Resources Information Center

    Li, Hang; He, Lianzhen

    2015-01-01

    This study used think-aloud protocols to compare essay-rating processes across holistic and analytic rating scales in the context of China's College English Test Band 6 (CET-6). A group of 9 experienced CET-6 raters scored the same batch of 10 CET-6 essays produced in an operational CET-6 administration twice, using both the CET-6 holistic…

  16. Emotional Bias in Classroom Observations: Within-Rater Positive Emotion Predicts Favorable Assessments of Classroom Quality

    ERIC Educational Resources Information Center

    Floman, James L.; Hagelskamp, Carolin; Brackett, Marc A.; Rivers, Susan E.

    2017-01-01

    Classroom observations increasingly inform high-stakes decisions and research in education, including the allocation of school funding and the evaluation of school-based interventions. However, trends in rater scoring tendencies over time may undermine the reliability of classroom observations. Accordingly, the present investigations, grounded in…

  17. Raters' L2 Background as a Potential Source of Bias in Rating Oral Performance

    ERIC Educational Resources Information Center

    Winke, Paula; Gass, Susan; Myford, Carol

    2013-01-01

    Based on evidence that listeners may favor certain foreign accents over others (Gass & Varonis, 1984; Major, Fitzmaurice, Bunta, & Balasubramanian, 2002; Tauroza & Luk, 1997) and that language-test raters may better comprehend and/or rate the speech of test takers whose native languages (L1s) are more familiar on some level (Carey,…

  18. The Perception of Nonverbal Behavior in Function of the Age and the Sex of the Rater.

    ERIC Educational Resources Information Center

    von Raffler-Engel, Walburga

    Research was conducted on the age and sex differences in raters' evaluations of job applicants' nonverbal behaviors. A ten-minute videotape of five interviews was shown to 28 members (7 females and 21 males) of the Industrial Personnel Association who had varying years of experience in personnel work. The simulations depicted job applicants whose…

  19. Analysis of Rater Severity on Written Expression Exam Using Many Faceted Rasch Measurement

    ERIC Educational Resources Information Center

    Prieto, Gerardo; Nieto, Eloísa

    2014-01-01

    This paper describes how a Many Faceted Rasch Measurement (MFRM) approach can be applied to performance assessment focusing on rater analysis. The article provides an introduction to MFRM, a description of MFRM analysis procedures, and an example to illustrate how to examine the effects of various sources of variability on test takers' performance…

  20. An Alternative Method Used in Evaluating Agreement among Repeat Measurements by Two Raters in Education

    ERIC Educational Resources Information Center

    Erdogan, Semra; Orekici Temel, Gülhan; Selvi, Hüseyin; Ersöz Kaya, Irem

    2017-01-01

    Taking more than one measurement of the same variable also hosts the possibility of contamination from error sources, both singly and in combination as a result of interactions. Therefore, although the internal consistency of scores received from measurement tools is examined by itself, it is necessary to ensure interrater or intra-rater agreement…

  1. The Effect of Instrument-Specific Rater Training on Interrater Reliability and Counseling Skills Performance Differentiation

    ERIC Educational Resources Information Center

    Meacham, Paul Douglas, Jr.

    2013-01-01

    The purpose of this study was to explore the effect of instrument-specific rater training on interrater reliability (IRR) and counseling skills performance differentiation. Strong IRR is of primary concern to effective program evaluation (McCullough, Kuhn, Andrews, Valen, Hatch, & Osimo, 2003; Schanche, Nielsen, McCullough, Valen, &…

  2. Rater Perceptions of Bias Using the Multiple Mini-Interview Format: A Qualitative Study

    ERIC Educational Resources Information Center

    Alweis, Richard L.; Fitzpatrick, Caroline; Donato, Anthony A.

    2015-01-01

    Introduction: The Multiple Mini-Interview (MMI) format appears to mitigate individual rater biases. However, the format itself may introduce structural systematic bias, favoring extroverted personality types. This study aimed to gain a better understanding of these biases from the perspective of the interviewer. Methods: A sample of MMI…

  3. Foreign Accentedness Revisited: Canadian and Singaporean Raters' Perception of Japanese-Accented English

    ERIC Educational Resources Information Center

    Saito, Kazuya; Shintani, Natsuko

    2016-01-01

    The current study examined how two groups of native speakers--monolingual Canadians and multilingual Singaporeans--differentially perceive foreign accentedness in spontaneous second language (L2) speech. The Singaporean raters, who had exposure to various models of English and also spoke multiple L2s on a daily basis, demonstrated more lenient…

  4. Does a Rater's Familiarity with a Candidate's Pronunciation Affect the Rating in Oral Proficiency Interviews?

    ERIC Educational Resources Information Center

    Carey, Michael D.; Mannell, Robert H.; Dunn, Peter K.

    2011-01-01

    This study investigated factors that could affect inter-examiner reliability in the pronunciation assessment component of speaking tests. We hypothesized that the rating of pronunciation is susceptible to variation in assessment due to the amount of exposure examiners have to nonnative English accents. An inter-rater variability analysis was…

  5. A Study on the Impact of Fatigue on Human Raters When Scoring Speaking Responses

    ERIC Educational Resources Information Center

    Ling, Guangming; Mollaun, Pamela; Xi, Xiaoming

    2014-01-01

    The scoring of constructed responses may introduce construct-irrelevant factors to a test score and affect its validity and fairness. Fatigue is one of the factors that could negatively affect human performance in general, yet little is known about its effects on a human rater's scoring quality on constructed responses. In this study, we compared…

  6. Using Raters from India to Score a Large-Scale Speaking Test

    ERIC Educational Resources Information Center

    Xi, Xiaoming; Mollaun, Pam

    2011-01-01

    We investigated the scoring of the Speaking section of the Test of English as a Foreign Language[TM] Internet-based (TOEFL iBT[R]) test by speakers of English and one or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the TOEFL examinees with mixed first languages…

  7. A Critical Review of Some Qualitative Research Methods Used to Explore Rater Cognition

    ERIC Educational Resources Information Center

    Suto, Irenka

    2012-01-01

    Internationally, many assessment systems rely predominantly on human raters to score examinations. Arguably, this facilitates the assessment of multiple sophisticated educational constructs, strengthening assessment validity. It can introduce subjectivity into the scoring process, however, engendering threats to accuracy. The present objectives…

  9. Learning, Behaviour and Reaction Framework: A Model for Training Raters to Improve Assessment Quality

    ERIC Educational Resources Information Center

    Chen, Chung-Yang; Chang, Huiju; Hsu, Wen-Chin; Sheen, Gwo-Ji

    2017-01-01

    This paper proposes a training model for raters, with the goal to improve the intra- and inter-consistency of evaluation quality for higher education curricula. The model, termed the learning, behaviour and reaction (LBR) circular training model, is an interdisciplinary application from the business and organisational training domain. The…

  10. AcceleRater: a web application for supervised learning of behavioral modes from acceleration measurements.

    PubMed

    Resheff, Yehezkel S; Rotics, Shay; Harel, Roi; Spiegel, Orr; Nathan, Ran

    2014-01-01

    The study of animal movement has progressed rapidly in recent years, driven by technological advancement. Biologgers with acceleration (ACC) recordings are becoming increasingly popular in the fields of animal behavior and movement ecology for estimating energy expenditure and identifying behavior, with prospects for other potential uses as well. Supervised learning of behavioral modes from acceleration data has shown promising results in many species and for a diverse range of behaviors. However, broad implementation of this technique in movement ecology research has been limited by technical difficulties and complicated analysis, deterring many practitioners from applying this approach. This highlights the need to develop a broadly applicable tool for classifying behavior from acceleration data. Here we present a free-access Python-based web application called AcceleRater, for rapidly training, visualizing and using models for supervised learning of behavioral modes from ACC measurements. We introduce AcceleRater, and illustrate its successful application for classifying vulture behavioral modes from acceleration data obtained from free-ranging vultures. The seven models offered in the AcceleRater application achieved overall accuracy of between 77.68% (Decision Tree) and 84.84% (Artificial Neural Network), with a mean overall accuracy of 81.51% and standard deviation of 3.95%. Notably, variation in performance was larger between behavioral modes than between models. AcceleRater provides the means to identify animal behavior, offering a user-friendly tool for ACC-based behavioral annotation, which will be dynamically upgraded and maintained.

  12. A Comparison of Teacher Effectiveness Measures Calculated Using Three Multilevel Models for Raters Effects

    ERIC Educational Resources Information Center

    Murphy, Daniel L.; Beretvas, S. Natasha

    2015-01-01

    This study examines the use of cross-classified random effects models (CCrem) and cross-classified multiple membership random effects models (CCMMrem) to model rater bias and estimate teacher effectiveness. Effect estimates are compared using CTT versus item response theory (IRT) scaling methods and three models (i.e., conventional multilevel…

  13. Comparing Foreign Accent in L1 Attrition and L2 Acquisition: Range and Rater Effects

    ERIC Educational Resources Information Center

    Schmid, Monika S.; Hopp, Holger

    2014-01-01

    This study examines the methodology of global foreign accent ratings in studies on L2 speech production. In three experiments, we test how variation in raters, range within speech samples, as well as instructions and procedures affects ratings of accent in predominantly monolingual speakers of German, non-native speakers of German, as well as…

  15. A Longitudinal Examination of Rater and Ratee Effects in Performance Ratings.

    ERIC Educational Resources Information Center

    Vance, Robert J.; And Others

    1983-01-01

    Investigated the consistency and loci of leniency, halo, and range restriction effects in performance ratings in a longitudinal study. Policy supervisors (N=90) rated 350 subordinates on five occasions. Concluded that reliable variance in mean ratings is partly attributable to ratees, but mainly introduced by raters. (JAC)

  16. Exploring the Impact of Mental Workload on Rater-Based Assessments

    ERIC Educational Resources Information Center

    Tavares, Walter; Eva, Kevin W.

    2013-01-01

    When appraising the performance of others, assessors must acquire relevant information and process it in a meaningful way in order to translate it effectively into ratings, comments, or judgments about how well the performance meets appropriate standards. Rater-based assessment strategies in health professional education, including scale and…

  17. Usability by Raters of the Barber Scales of Self-Regard for Preschool Children

    ERIC Educational Resources Information Center

    Barber, Lucie W.; Barton, Kimberly

    The seven Barber Scales of Self-Regard for Preschool Children were developed to provide instruments for assessing levels of development for individual children. The purpose of this study was to probe into the question of whether or not raters (mothers, fathers, teachers) had difficulties rating children on the scales. Two sources of evidence were…

  18. Improving Creativity Performance Assessment: A Rater Effect Examination with Many Facet Rasch Model

    ERIC Educational Resources Information Center

    Hung, Su-Pin; Chen, Po-Hsi; Chen, Hsueh-Chih

    2012-01-01

    Product assessment is widely applied in creative studies, typically as an important dependent measure. Within this context, this study had 2 purposes. First, the focus of this research was on methods for investigating possible rater effects, an issue that has not received a great deal of attention in past creativity studies. Second, the…

  19. The Use of Analytic Rubric in the Assessment of Writing Performance--Inter-Rater Concordance Study

    ERIC Educational Resources Information Center

    Beyreli, Latif; Ari, Gokhan

    2009-01-01

    In this study, the purpose was to determine whether there was concordance among raters in the assessment of the writing performance using analytic rubric; furthermore, factors affecting the assessment process were examined. The analytic rubric used in the study consists of three sections and ten properties: External structure (format, spelling and…

  2. Inter-rater Agreement on Identification of Electrographic Seizures and Periodic Discharges in ICU EEG Recordings

    PubMed Central

    Halford, J.J.; Shiau, D.; Desrochers, J.A.; Kolls, B.J.; Dean, B.C.; Waters, C.G.; Azar, N.J.; Haas, K.F.; Kutluay, E.; Martz, G.U.; Sinha, S.R.; Kern, R.T.; Kelly, K.M.; Sackellares, J.C.; LaRoche, S.M.

    2015-01-01

    Objective: This study investigated inter-rater agreement (IRA) among EEG experts for the identification of electrographic seizures and periodic discharges (PDs) in continuous ICU EEG recordings. Methods: Eight board-certified EEG experts independently identified seizures and PDs in thirty 1-hour EEG segments which were selected from ICU EEG recordings collected from three medical centers. IRA was compared between seizure and PD identifications, as well as among rater groups that have passed an ICU EEG Certification Test, developed by the Critical Care EEG Monitoring Research Consortium (CCEMRC). Results: Both kappa and event-based IRA statistics showed higher mean values in identification of seizures compared to PDs (k = 0.58 vs. 0.38; p < 0.001). The group of rater pairs who had both passed the ICU EEG Certification Test had a significantly higher mean IRA in comparison to rater pairs in which neither had passed the test. Conclusions: IRA among experts is significantly higher for identification of electrographic seizures compared to PDs. Additional instruction, such as the training module and certification test developed by the CCEMRC, could enhance this IRA. Significance: This study demonstrates more disagreement in the labeling of PDs in comparison to seizures. This may be improved by education about standard EEG nomenclature. PMID:25481336
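
    As an illustration of the kappa statistic reported in this record, the following is a minimal sketch of unweighted Cohen's kappa for one pair of raters. It assumes each rater assigns one label per fixed epoch; the record does not describe how events were discretized, and the function name `cohen_kappa` and the labels below are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label counts.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical per-epoch labels, not data from the study.
rater_1 = ["seizure", "seizure", "none", "none", "PD", "none", "seizure", "none"]
rater_2 = ["seizure", "none", "none", "none", "PD", "PD", "seizure", "none"]
kappa = cohen_kappa(rater_1, rater_2)
```

    With more than two raters, as in the record, one common approach is to average this pairwise value over all rater pairs; whether the authors did so is not stated in the abstract.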

  3. High-Dimensional Explanatory Random Item Effects Models for Rater-Mediated Assessments

    ERIC Educational Resources Information Center

    Kelcey, Ben; Wang, Shanshan; Cox, Kyle

    2016-01-01

    Valid and reliable measurement of unobserved latent variables is essential to understanding and improving education. A common and persistent approach to assessing latent constructs in education is the use of rater inferential judgment. The purpose of this study is to develop high-dimensional explanatory random item effects models designed for…

  5. The Effect of Rater Training on Reducing Social Style Bias in Peer Evaluation

    ERIC Educational Resources Information Center

    May, Gary L.

    2008-01-01

    This study employed a quasiexperimental control group design in a university setting to test the effect of a rater-training program on reducing social style bias in intragroup peer evaluations after controlling for ability based on GPA. Comparison of rating scores of the test group to the control group indicated minimal social style rating bias in…

  6. The Accuracy of Performance Task Scores after Resolution of Rater Disagreement: A Monte Carlo Study

    ERIC Educational Resources Information Center

    Penny, James A.; Johnson, Robert L.

    2011-01-01

    When multiple raters score a writing sample, on occasion they will award discrepant scores. To report a single score to the examinee, some method of resolving those differences must be applied to the ratings before an operational score can be reported. Several forms of resolving score discrepancies have been described in the literature. Initial…

  7. Leveraging Data Sampling and Practical Knowledge: Field Instructors' Perceptions about Inter-Rater Reliability Data

    ERIC Educational Resources Information Center

    Soslau, Elizabeth; Lewis, Kandia

    2014-01-01

    For accreditation and programmatic decision making, education school administrators use inter-rater reliability analyses to judge credibility of student-teacher assessments. Although weak levels of agreement between university-appointed supervisors and cooperating teachers are usually interpreted to indicate that the process is not being…

  8. A Film Evaluation Checklist to Describe Instructionally Relevant Film Characteristics: A Rater Reliability Study.

    ERIC Educational Resources Information Center

    Burke, Nancy E.; And Others

    A Film Evaluation Checklist, developed at the National Technical Institute for the Deaf, describes instructionally relevant film characteristics. The instrument is organized in terms of structural characteristics of the medium, modes of concept emphasis using film techniques, and organization of instructional material. Raters independently rated…

  10. IRR (Inter-Rater Reliability) of a COP (Classroom Observation Protocol)--A Critical Appraisal

    ERIC Educational Resources Information Center

    Rui, Ning; Feldman, Jill M.

    2012-01-01

    Notwithstanding broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded striving readers…

  11. Mainstream Teacher Candidates' Perspectives on ESL Writing: The Effects of Writer Identity and Rater Background

    ERIC Educational Resources Information Center

    Kang, Hyun-Sook; Veitch, Hillary

    2017-01-01

    This study explored the extent to which the ethnic identity of a writer and the background (gender and area of teaching) of a rater can influence mainstream teacher candidates' evaluation of English as a second language (ESL) writing, using a matched-guise method. A one-page essay was elicited from an ESL learner enrolled in an intensive English…

  16. Inter-rater reliability of muscle contractile property measurements using non-invasive tensiomyography.

    PubMed

    Tous-Fajardo, Julio; Moras, Gerard; Rodríguez-Jiménez, Sergio; Usach, Robert; Doutres, Daniel Moreno; Maffiuletti, Nicola A

    2010-08-01

    Tensiomyography (TMG) is a relatively novel technique to assess muscle mechanical response based on radial muscle belly displacement following a single electrical stimulus. Although intra-session reliability has been found to be good, inter-rater reliability and the influence of sensor repositioning and electrode placement on TMG measurements are unknown. The purpose of this study was to analyze the inter-rater reliability of vastus medialis muscle contractile property measurements obtained with TMG as well as the effect of inter-electrode distance (IED). Five contractile parameters were analyzed from vastus medialis muscle belly displacement-time curves: maximal displacement (Dm), contraction time (Tc), sustain time (Ts), delay time (Td), and half-relaxation time (Tr). The inter-rater reliability and IED effect on these measurements were evaluated in 18 subjects. Intra-class correlation coefficients, standard errors of measurement, Bland and Altman systematic bias and random error, as well as coefficients of variation were used as measures of reliability. Overall, good to excellent inter-rater reliability was found for all contractile parameters except Tr, which showed insufficient reliability. Alterations in IED significantly affected Dm, with a trend for all the other parameters. The present results support the use of TMG for the assessment of vastus medialis muscle contractile properties, particularly for Dm and Tc. It is recommended to avoid Tr quantification and IED modifications during multiple TMG measurements.
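
    To make the intra-class correlation coefficient mentioned in this record concrete, here is a minimal sketch of the one-way random-effects form, ICC(1,1), computed from between-subject and within-subject mean squares. The abstract does not say which ICC form the authors used, so the choice of ICC(1,1), the function name `icc_1_1`, and the measurements below are all assumptions for illustration.

```python
def icc_1_1(ratings):
    """One-way random-effects ICC(1,1).

    ratings: list of rows, one row per subject, one value per rater.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects and >= 2 raters, rectangular data")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-subject mean square (df = n - 1).
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-subject mean square (df = n * (k - 1)).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variance in the data; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical Dm values (mm) for four subjects measured by two raters.
dm_by_rater = [[5.1, 5.3], [7.0, 6.8], [9.2, 9.0], [6.1, 6.2]]
icc = icc_1_1(dm_by_rater)
```

    Values near 1 indicate that most of the variance lies between subjects rather than between raters measuring the same subject.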

  17. Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners

    ERIC Educational Resources Information Center

    Kachchaf, Rachel; Solano-Flores, Guillermo

    2012-01-01

    We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…

  18. Inter-rater reliability of two paediatric early warning score tools.

    PubMed

    Jensen, Claus S; Aagaard, Hanne; Olesen, Hanne V; Kirkegaard, Hans

    2017-07-31

    Paediatric early warning score (PEWS) assessment tools can assist healthcare providers in the timely detection and recognition of subtle changes in patient condition that signal clinical deterioration. However, data from PEWS tools are only as reliable and accurate as the caregivers who obtain and document the parameters. The aim of this study was to evaluate inter-rater reliability among nurses using PEWS systems. The study was carried out in five paediatric departments in the Central Denmark Region. Inter-rater reliability was investigated through parallel observations. A total of 108 children and 69 nurses participated. Two nurses simultaneously performed a PEWS assessment on the same patient. Before the assessment, the two participating nurses drew lots to decide who would be the active observer. Intraclass correlation coefficient, Fleiss' κ and Bland-Altman limits of agreement were used to determine inter-rater reliability. The intraclass correlation coefficients for the aggregated PEWS score of the two PEWS models were 0.98 and 0.95, respectively. The κ value on the individual PEWS measurements ranged from 0.70 to 1.0, indicating good to very good agreement. The nurses assigned the exact same aggregated score for both PEWS models in 76% of the cases. In 98% of the PEWS assessments, the aggregated PEWS scores assigned by the nurses were equal to or below 1 point in both models. The study showed good to very good inter-rater reliability in the two PEWS models used in the Central Denmark Region.
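
    The Bland-Altman limits of agreement named in this record can be sketched as the mean difference between paired scores plus or minus 1.96 sample standard deviations of those differences. The helper `bland_altman_limits` and the scores below are hypothetical and only illustrate the calculation; they are not the study's data or code.

```python
from statistics import mean, stdev


def bland_altman_limits(scores_a, scores_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired measurements")
    # Per-patient difference between the two raters' scores.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)           # systematic difference between raters
    spread = z * stdev(diffs)    # half-width of the ~95% agreement band
    return bias, bias - spread, bias + spread


# Hypothetical aggregated PEWS scores from two nurses for six patients.
nurse_a = [0, 1, 2, 3, 4, 2]
nurse_b = [0, 1, 3, 3, 3, 2]
bias, lower, upper = bland_altman_limits(nurse_a, nurse_b)
```

    A bias near zero with narrow limits suggests the two raters can be used interchangeably for that score.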

  19. THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS)

    PubMed Central

    aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

    2017-01-01

    Background/purpose: The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. Methods: The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the ‘pure’ intra-rater (intra-occasion) reliability for those movements. Results: Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement. Conclusions: Establishing the reliability of the SIMS is a

  20. Feasibility and Inter-Rater Reliability of Physical Performance Measures in Acutely Admitted Older Medical Patients

    PubMed Central

    Bodilsen, Ann Christine; Juul-Larsen, Helle Gybel; Petersen, Janne; Beyer, Nina; Andersen, Ove; Bandholm, Thomas

    2015-01-01

    Objective Physical performance measures can be used to predict functional decline and increased dependency in older persons. However, few studies have assessed the feasibility or reliability of such measures in hospitalized older patients. Here we assessed the feasibility and inter-rater reliability of four simple measures of physical performance in acutely admitted older medical patients. Design During the first 24 hours of hospitalization, the following were assessed twice by different raters in 52 patients (≥ 65 years) admitted for acute medical illness: isometric hand grip strength, 4-meter gait speed, 30-s chair stand and Cumulated Ambulation Score. Relative reliability was expressed as weighted kappa for the Cumulated Ambulation Score or as intra-class correlation coefficient (ICC1,1) and lower limit of the 95%-confidence interval (LL95%) for grip strength, gait speed, and 30-s chair stand. Absolute reliability was expressed as the standard error of measurement and the smallest real difference as a percentage of their respective means (SEM% and SRD%). Results The primary reasons for admission of the 52 included patients were infectious disease and cardiovascular illness. The mean ± SD age was 78 ± 8.3 years, and 73.1% were women. All patients performed grip strength and Cumulated Ambulation Score testing, 81% performed the gait speed test, and 54% completed the 30-s chair stand test (46% were unable to rise without using the armrests). No systematic bias was found between first and second tests or between raters. The weighted kappa for the Cumulated Ambulation Score was 0.76 (0.60–0.92). The ICC1,1 values were as follows: grip strength, 0.95 (LL95% 0.92); gait speed, 0.92 (LL95% 0.73), and 30-s chair stand, 0.82 (LL95% 0.67). The SEM% values for grip strength, gait speed, and 30-s chair stand were 8%, 7%, and 18%, and the SRD95% values were 22%, 17%, and 49%. Conclusion In acutely admitted older medical patients, grip strength, gait speed, and the
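The record above reports ICC1,1 values without showing how such a coefficient is obtained. As a minimal sketch, assuming the standard one-way random-effects definition (ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW)), the calculation could look like the following. The function name `icc_1_1` and the grip-strength values are hypothetical and are not taken from the study.

```python
def icc_1_1(ratings):
    """One-way random-effects ICC(1,1).

    `ratings` is a list of rows, one row per subject, each row holding
    the k ratings that subject received (hypothetical layout).
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # ratings per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical grip-strength readings (kg) from two raters per patient.
example = [[20.0, 21.0], [30.0, 29.0], [25.0, 25.5], [18.0, 18.5]]
print(round(icc_1_1(example), 3))
```

The one-way form treats raters as interchangeable, which may or may not match how the raters were assigned in the study itself.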

  1. THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS).

    PubMed

    McCunn, Robert; Aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

    2017-02-01

    The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the 'pure' intra-rater (intra-occasion) reliability for those movements. Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement. Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate
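The verbal labels attached to the kappa range above ("fair to almost perfect") match the widely used Landis and Koch (1977) bands. The abstract does not name its benchmark, so the mapping below is an assumption, and the function name `landis_koch_label` is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) verbal label."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1.0")


# The two ends of the sub-test range reported in the record above.
print(landis_koch_label(0.35), "to", landis_koch_label(0.91))
```

Other benchmark scales draw the cut-offs differently, so the same kappa can receive a different label elsewhere in this listing.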

  2. Global data on blindness.

    PubMed Central

    Thylefors, B.; Négrel, A. D.; Pararajasegaram, R.; Dadzie, K. Y.

    1995-01-01

    Globally, it is estimated that there are 38 million persons who are blind. Moreover, a further 110 million people have low vision and are at great risk of becoming blind. The main causes of blindness and low vision are cataract, trachoma, glaucoma, onchocerciasis, and xerophthalmia; however, insufficient data on blindness from causes such as diabetic retinopathy and age-related macular degeneration preclude specific estimations of their global prevalence. The age-specific prevalences of the major causes of blindness that are related to age indicate that the trend will be for an increase in such blindness over the decades to come, unless energetic efforts are made to tackle these problems. More data collected through standardized methodologies, using internationally accepted (ICD-10) definitions, are needed. Data on the incidence of blindness due to common causes would be useful for calculating future trends more precisely. PMID:7704921

  3. Inter-rater Reliability Assessment of ASPECT-R: (A Study Pragmatic-Explanatory Characterization Tool-Rating).

    PubMed

    Bossie, Cynthia A; Alphs, Larry D; Williamson, David; Mao, Lian; Kurut, Clennon

    2016-01-01

    The increasing importance of real-world data for clinical and policy decision making is driving a need for close attention to the pragmatic versus explanatory features of trial designs. ASPECT-R (A Study Pragmatic-Explanatory Characterization Tool-Rating) is an instrument informed by the PRECIS tool, which was developed to assist researchers in designing trials that are more pragmatic or explanatory. ASPECT-R refined the PRECIS domains and includes a detailed anchored rating system. This analysis established the inter-rater reliability of ASPECT-R. Nine raters (identified from a convenience sample of persons knowledgeable about psychiatry clinical research/study design) received ASPECT-R training materials and 12 study publications. Selected studies assessed antipsychotic treatment in schizophrenia, were published in peer-reviewed journals, and represented a range of studies across a pragmatic-explanatory continuum as determined by authors (CB/LA). After completing training, raters reviewed the 12 studies and rated the study domains using ASPECT-R. Intraclass correlation coefficients were estimated for total and domain scores. Qualitative ratings then were assigned to describe the inter-rater reliability. ASPECT-R scores for the 12 studies were completed by seven raters. The ASPECT-R total score intraclass correlation coefficient was 0.87, corresponding to an excellent inter-rater reliability. Domain intraclass correlation coefficients ranged from 0.85 to 0.31, corresponding to excellent to poor inter-rater reliability. The inter-rater reliability of the ASPECT-R total score was excellent, with excellent to good inter-rater reliability for most domains. The fair to poor inter-rater reliability for two domains may reflect a need for improved domain definition, anchoring, or training materials. ASPECT-R can be used to help understand the pragmatic-explanatory nature of completed or planned trials.

  4. Impact of educational intervention on the inter-rater agreement of nasal endoscopy interpretation

    PubMed Central

    Colley, Patrick; Mace, Jess C.; Schaberg, Madeleine R.; Smith, Timothy L.; Tabaee, Abtin

    2015-01-01

    OBJECTIVE Nasal endoscopy is integral to the evaluation of sinonasal disorders. However, prior studies have shown significant variability in the inter-rater agreement of nasal endoscopy interpretation amongst practicing rhinologists. The objective of the current study is to evaluate the inter-rater agreement of nasal endoscopy amongst otolaryngology residents from a single training program at baseline and following an educational intervention. METHODS 11 otolaryngology residents completed nasal endoscopy grading forms for 8 digitally recorded nasal endoscopic examinations. An instructional lecture reviewing nasal endoscopy interpretation was subsequently provided. The residents then completed grading forms for 8 different nasal endoscopic examinations. Inter-rater agreement amongst residents for the pre- and post-lecture videos was calculated using the unweighted Fleiss’ kappa statistic (Kf) and intra-class correlation agreement (ICC). RESULTS Inter-rater agreement improved from a baseline level of fair (Kf range 0.268–0.383) to a post-educational level of moderate (Kf range 0.401–0.547) for nasal endoscopy findings of middle meatus mucosa, middle turbinate mucosa, middle meatus discharge, sphenoethmoid recess mucosa, sphenoethmoid recess discharge and atypical lesions (ICC, p<0.001). The baseline level of agreement for evaluation of nasal septum deviation was poor/fair and did not improve following educational intervention. CONCLUSIONS This study demonstrates a limited baseline level of inter-rater agreement of nasal endoscopy interpretation amongst otolaryngology residents. The inter-rater agreement for the majority of the characteristics that were evaluated improved after educational intervention. Further study is needed to improve nasal endoscopy interpretation. PMID:25781864
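The unweighted Fleiss' kappa used in the record above generalizes agreement statistics to more than two raters. A minimal sketch of the standard formula follows, assuming every item is graded by the same number of raters. The function name `fleiss_kappa` and the example counts are hypothetical and are not data from the study.

```python
def fleiss_kappa(counts):
    """Unweighted Fleiss' kappa.

    `counts[i][j]` is the number of raters who assigned item i to
    category j; every row must sum to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all assignments that fell into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-item agreement: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: 4 videos, 11 raters, 2 categories (normal/abnormal).
example = [[9, 2], [3, 8], [10, 1], [5, 6]]
print(round(fleiss_kappa(example), 3))
```

The counts layout (items by categories) is a design choice for the sketch; a raters-by-items table would first need to be tallied into this form.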

  5. Ultrasound assessment for grading structural tendon changes in supraspinatus tendinopathy: an inter-rater reliability study

    PubMed Central

    Hjarbaek, John; Eshoej, Henrik; Larsen, Camilla Marie; Vobbe, Jette; Juul-Kristensen, Birgit

    2016-01-01

    Aim To evaluate the inter-rater reliability of measuring structural changes in the tendon of patients, clinically diagnosed with supraspinatus tendinopathy (cases) and healthy participants (controls), on ultrasound (US) images captured by standardised procedures. Methods A total of 40 participants (24 patients) were included for assessing inter-rater reliability of measurements of fibrillar disruption, neovascularity, as well as the number and total length of calcifications and tendon thickness. Linear weighted κ, intraclass correlation (ICC), SEM, limits of agreement (LOA) and minimal detectable change (MDC) were used to evaluate reliability. Results ‘Moderate—almost perfect’ κ was found for grading fibrillar disruption, neovascularity and number of calcifications (κ 0.60–0.96). For total length of calcifications and tendon thickness, ICC was ‘excellent’ (0.85–0.90), with SEM(Agreement) ranging from 0.63 to 2.94 mm and MDC(group) ranging from 0.28 to 1.29 mm. In general, SEM, LOA and MDC showed larger variation for calcifications than for tendon thickness. Conclusions Inter-rater reliability was moderate to almost perfect when a standardised procedure was applied for measuring structural changes on captured US images and movie sequences of relevance for patients with supraspinatus tendinopathy. Future studies should test intra-rater and inter-rater reliability of the method in vivo for use in clinical practice, in addition to validation against a gold standard, such as MRI. Trial registration number NCT01984203; Pre-results. PMID:27221128

  6. Blindness prevention programmes: past, present, and future.

    PubMed Central

    Resnikoff, S.; Pararajasegaram, R.

    2001-01-01

    Blindness and visual impairment have far-reaching implications for society, the more so when it is realized that 80% of visual disability is avoidable. The marked increase in the size of the elderly population, with their greater propensity for visually disabling conditions, presents a further challenge in this respect. However, if available knowledge and skills were made accessible to those communities in greatest need, much of this needless blindness could be alleviated. Since its inception over 50 years ago, and beginning with trachoma control, WHO has spearheaded efforts to assist Member States to meet the challenge of needless blindness. Since the establishment of the WHO Programme for the Prevention of Blindness in 1978, vast strides have been made through various forms of technical support to establish national prevention of blindness programmes. A more recent initiative, "The Global Initiative for the Elimination of Avoidable Blindness" (referred to as "VISION 2020--The Right to Sight"), launched in 1999, is a collaborative effort between WHO and a number of international nongovernmental organizations and other interested partners. This effort is poised to take the steps necessary to achieve the goal of eliminating avoidable blindness worldwide by the year 2020. PMID:11285666

  7. Favorable outcome of epileptic blindness in children.

    PubMed

    Shahar, Eli; Barak, Shai

    2003-01-01

    Acute blindness is a rare presentation of epileptic seizures, referring to loss of sight without loss of consciousness associated with electroencephalographic (EEG) epileptic discharges, mainly representing an ictal phase but also either pre- or postictal. We report a series of 14 children with documented epileptic blindness, describing the accompanying fits and thereafter the response to therapy to resolve the blindness and control associated seizures. All patients experienced episodes of acute complete visual obscuration lasting for 1 to 10 minutes. Seven patients had accompanying generalized seizures, with a photosensitive response recorded in three of them. All of these seven children were treated with valproic acid, regaining full vision, and six of them became seizure free. Three patients with acute blindness who had accompanying focal motor seizures and unilateral temporooccipital posterior epileptic discharges were treated with carbamazepine and regained full vision and complete seizure control. Four additional children had the constellation of migrainous headaches, focal motor phenomena, and complete blindness, along with occipital discharges compatible with Gastaut syndrome, benign childhood epilepsy with occipital paroxysms. All four patients were started on carbamazepine and became asymptomatic. Our overall experience suggests that epileptic blindness in children is associated with a favorable outcome when promptly diagnosed and treated appropriately, resulting in complete resolution of blindness in all children and satisfactory control of seizures in most of them. We therefore recommend performing a prompt EEG in any child presenting with acute visual obscuration, even in the absence of other epileptic phenomena.

  8. Refractive error blindness.

    PubMed Central

    Dandona, R.; Dandona, L.

    2001-01-01

    Recent data suggest that a large number of people are blind in different parts of the world due to high refractive error because they are not using appropriate refractive correction. Refractive error as a cause of blindness has been recognized only recently with the increasing use of presenting visual acuity for defining blindness. In addition to blindness due to naturally occurring high refractive error, inadequate refractive correction of aphakia after cataract surgery is also a significant cause of blindness in developing countries. Blindness due to refractive error in any population suggests that eye care services in general in that population are inadequate since treatment of refractive error is perhaps the simplest and most effective form of eye care. Strategies such as vision screening programmes need to be implemented on a large scale to detect individuals suffering from refractive error blindness. Sufficient numbers of personnel to perform reasonable quality refraction need to be trained in developing countries. Also adequate infrastructure has to be developed in underserved areas of the world to facilitate the logistics of providing affordable reasonable-quality spectacles to individuals suffering from refractive error blindness. Long-term success in reducing refractive error blindness worldwide will require attention to these issues within the context of comprehensive approaches to reduce all causes of avoidable blindness. PMID:11285669

  9. Adjunctive Electroconvulsive Therapy for Schizophrenia: A Meta-analysis of Randomized Rater-Masked Controlled Trials [RETRACTED].

    PubMed

    Zheng, Wei; Xiang, Yu-Tao; Tang, Yi-Lang; Xiang, Ying-Qiang; Li, Xian-Bin; Cao, Xiao-Lan; Guo, Tong; Liu, Zheng-Rong; Chiu, Helen F K; Ungvari, Gabor S; de Leon, Jose

    2016-08-03

    The aim of the study was to examine published randomized controlled trials (RCTs) for the efficacy and safety of adjunctive electroconvulsive therapy (ECT) when combined with antipsychotics (APs) versus AP therapy for schizophrenia and related disorders during the acute phase. Two evaluators independently selected studies, extracted data, and conducted quality assessment and data synthesis. Standardized and weighted mean differences (SMD/WMD), risk ratio (RR) ±95% confidence intervals (CIs), number needed to treat (NNT), and number needed to harm (NNH) were calculated. Twenty-two RCTs (n = 1365, age = 36.9 years, male = 53%), including double-blind (8 RCTs) and rater-masked (14 RCTs) designs, were identified and analyzed. Adjunctive ECT was superior to AP therapy regarding (1) symptomatic improvement at last-observation endpoint (standardized mean difference, -0.67; P < 0.00001; I² = 79%); (2) study-defined response (RR = 1.81, I² = 0%, P < 0.00001, NNT = 4) and remission (RR = 2.05, I² = 0%, P = 0.0004, NNT = 13); and (3) positive, negative, and general psychopathology subscores (weighted mean difference, -4.01 to -1.79; P = 0.005-0.0001). Results were similar in all preplanned subgroup analyses including Chinese (11 RCTs) versus non-Chinese (7 RCTs) origin, those with a Jadad score 3 or higher (12 RCTs) versus lower than 3 (6 RCTs), and those with clozapine (5 RCTs) versus those with non-clozapine treatments (13 RCTs). Compared with AP therapy, adjunctive ECT AP was significantly associated with more headache (RR = 2.72, P = 0.04, NNH = 5) and memory impairment (RR = 14.24, P = 0.01, NNH = 7). Adjunctive ECT seems to be an effective and safe option for schizophrenia and related disorders during acute phases but was associated with transient memory impairment and headaches.

  10. Use of e-rater[R] in Scoring of the TOEFL iBT[R] Writing Test. Research Report. ETS RR-11-25

    ERIC Educational Resources Information Center

    Haberman, Shelby J.

    2011-01-01

    Alternative approaches are discussed for use of e-rater[R] to score the TOEFL iBT[R] Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…

  11. The mental health of individuals referred for assessment of autism spectrum disorder in adulthood: A clinic report.

    PubMed

    Russell, Ailsa J; Murphy, Clodagh M; Wilson, Ellie; Gillan, Nicola; Brown, Cordelia; Robertson, Dene M; Craig, Michael C; Deeley, Quinton; Zinkstok, Janneke; Johnston, Kate; McAlonan, Grainne M; Spain, Deborah; Murphy, Declan Gm

    2016-07-01

    Growing awareness of autism spectrum disorders has increased the demand for diagnostic services in adulthood. High rates of mental health problems have been reported in young people and adults with autism spectrum disorder. However, sampling and methodological issues mean prevalence estimates and conclusions about specificity in psychiatric co-morbidity in autism spectrum disorder remain unclear. A retrospective case review of 859 adults referred for assessment of autism spectrum disorder compares International Classification of Diseases, Tenth Revision diagnoses in those that met criteria for autism spectrum disorder (n = 474) with those that did not (n = 385). Rates of psychiatric diagnosis (>57%) were equivalent across both groups and exceeded general population rates for a number of conditions. The prevalence of anxiety disorders, particularly obsessive compulsive disorder, was significantly higher in adults with autism spectrum disorder than adults without autism spectrum disorder. Limitations of this observational clinic study, which may impact generalisability of the findings, include the lack of standardised structured psychiatric diagnostic assessments by assessors blind to autism spectrum disorder diagnosis and inter-rater reliability. The implications of this study highlight the need for careful consideration of mental health needs in all adults referred for autism spectrum disorder diagnosis.

  12. Hemianopic colour blindness.

    PubMed

    Albert, M L; Reches, A; Silverberg, R

    1975-06-01

    A man developed cortical blindness after cerebral infarction in the distribution of both posterior cerebral arteries. When he recovered from this condition, he was found to be colour blind in the left visual field, but not in the right. This unusual situation resulted in apparently contradictory performances on hemifield and free-field tasks of colour discrimination, naming, and recognition. The contradictions may be explained by interhemispheric competition between a hemisphere which could discriminate colours and a hemisphere which was colour blind.

  13. Attenuation of Change Blindness in Children with Autism Spectrum Disorders

    ERIC Educational Resources Information Center

    Fletcher-Watson, Sue; Leekam, Susan R.; Connolly, Brenda; Collis, Jess M.; Findlay, John M.; McConachie, Helen; Rodgers, Jacqui

    2012-01-01

    Change blindness refers to the difficulty most people find in detecting a difference between two pictures when these are presented successively, with a brief interruption between. Attention at the site of the change is required for detection. A number of studies have investigated change blindness in adults and children with autism spectrum…

  15. Social Comparison of Ability in Blind Children and Adolescents.

    ERIC Educational Resources Information Center

    Morin, Stephen F.; Jones, Reginald L.

    Forty-five blind, school aged subjects (aged 6-18 years) were questioned to determine the influence of age on the choice of the blind as a reference group for social comparison of abilities. To assess the direction of social comparison behavior, each subject was presented with a replication of three questions (which differed in the degree to which…

  16. Does training improve diagnostic accuracy and inter-rater agreement in applying the Berlin radiographic definition of acute respiratory distress syndrome? A multicenter prospective study.

    PubMed

    Peng, Jin-Min; Qian, Chuan-Yun; Yu, Xiang-You; Zhao, Ming-Yan; Li, Shu-Sheng; Ma, Xiao-Chun; Kang, Yan; Zhou, Fa-Chun; He, Zhen-Yang; Qin, Tie-He; Yin, Yong-Jie; Jiang, Li; Hu, Zhen-Jie; Sun, Ren-Hua; Lin, Jian-Dong; Li, Tong; Wu, Da-Wei; An, You-Zhong; Ai, Yu-Hang; Zhou, Li-Hua; Cao, Xiang-Yuan; Zhang, Xi-Jing; Sun, Rong-Qing; Chen, Er-Zhen; Du, Bin

    2017-01-20

    Poor inter-rater reliability in chest radiograph interpretation has been reported in the context of acute respiratory distress syndrome (ARDS), although not for the Berlin definition of ARDS. We sought to examine the effect of training material on the accuracy and consistency of intensivists' chest radiograph interpretations for ARDS diagnosis. We conducted a rater agreement study in which 286 intensivists (residents 41.3%, junior attending physicians 35.3%, and senior attending physician 23.4%) independently reviewed the same 12 chest radiographs developed by the ARDS Definition Task Force ("the panel") before and after training. Radiographic diagnoses by the panel were classified into the consistent (n = 4), equivocal (n = 4), and inconsistent (n = 4) categories and were used as a reference. The 1.5-hour training course attended by all 286 intensivists included introduction of the diagnostic rationale, and a subsequent in-depth discussion to reach consensus for all 12 radiographs. Overall diagnostic accuracy, which was defined as the percentage of chest radiographs that were interpreted correctly, improved but remained poor after training (42.0 ± 14.8% before training vs. 55.3 ± 23.4% after training, p < 0.001). Diagnostic sensitivity and specificity improved after training for all diagnostic categories (p < 0.001), with the exception of specificity for the equivocal category (p = 0.883). Diagnostic accuracy was higher for the consistent category than for the inconsistent and equivocal categories (p < 0.001). Comparisons of pre-training and post-training results revealed that inter-rater agreement was poor and did not improve after training, as assessed by overall agreement (0.450 ± 0.406 vs. 0.461 ± 0.575, p = 0.792), Fleiss's kappa (0.133 ± 0.575 vs. 0.178 ± 0.710, p = 0.405), and intraclass correlation coefficient (ICC; 0.219 vs. 0.276, p = 0.470). The radiographic diagnostic accuracy and

  17. Inter-Rater Reliability of Total Body Score-A Scale for Quantification of Corpse Decomposition.

    PubMed

    Nawrocka, Marta; Frątczak, Katarzyna; Matuszewski, Szymon

    2016-05-01

    The degree of body decomposition can be quantified using Total Body Score (TBS), a scale frequently used in taphonomic or entomological studies of decomposition. Here, the inter-rater reliability of the scale is analyzed. The study was made on 120 laymen, who were trained in the use of the scale. Participants scored decomposition of pig carcasses from photographs. It was found that the scale, when used by different people, gives homogeneous results irrespective of the user qualifications (the Krippendorff's alpha for all participants was 0.818). The study also indicated that carcasses in advanced decomposition receive significantly less accurate scores. Moreover, it was found that scores for cadavers in mosaic decomposition (i.e., representing signs of at least two stages of decomposition) are less accurate. These results demonstrate that the scale may be regarded as inter-rater reliable. Some propositions for refinement of the scale were also discussed.

  18. Using Google Street View to audit the built environment: inter-rater reliability results.

    PubMed

    Kelly, Cheryl M; Wilson, Jeffrey S; Baker, Elizabeth A; Miller, Douglas K; Schootman, Mario

    2013-02-01

    Observational field audits are recommended for public health research to collect data on built environment characteristics. A reliable, standardized alternative to field audits that uses publicly available information could provide the ability to efficiently compare results across different study sites and time. This study aimed to assess inter-rater reliability of built environment audits conducted using Google Street View imagery. In 2011, street segments from St. Louis and Indianapolis were geographically stratified to ensure representation of neighborhoods with different land use and socioeconomic characteristics in both cities. Inter-rater reliability was assessed using observed agreement and the prevalence-adjusted bias-adjusted kappa statistic (PABAK). The mean PABAK for all items was 0.84. Ninety-five percent of the items had substantial (PABAK ≥ 0.60) or nearly perfect (PABAK ≥ 0.80) agreement. Using Google Street View imagery to audit the built environment is a reliable method for assessing characteristics of the built environment.
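The prevalence-adjusted bias-adjusted kappa (PABAK) in the record above depends only on observed agreement. As a sketch, assuming binary audit items (the abstract does not state the number of response categories), PABAK = 2 × Po - 1, which generalizes to (k × Po - 1) / (k - 1) for k categories. The function names are hypothetical.

```python
def pabak(observed_agreement, n_categories=2):
    """Prevalence-adjusted bias-adjusted kappa from observed agreement."""
    return (n_categories * observed_agreement - 1) / (n_categories - 1)


def observed_agreement_from_pabak(value, n_categories=2):
    """Invert the PABAK formula to recover the observed agreement."""
    return (value * (n_categories - 1) + 1) / n_categories


# Under the binary-item assumption, the mean PABAK of 0.84 reported above
# corresponds to raters agreeing on about 92% of paired observations.
print(round(observed_agreement_from_pabak(0.84), 2))
```

If the audit items had more than two response options, the implied observed agreement would differ, so the 92% figure is illustrative only.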

  19. DeuteRater: a tool for quantifying peptide isotope precision and kinetic proteomics.

    PubMed

    Naylor, Bradley C; Porter, Michael T; Wilson, Elise; Herring, Adam; Lofthouse, Spencer; Hannemann, Austin; Piccolo, Stephen R; Rockwood, Alan L; Price, John C

    2017-05-15

    Using mass spectrometry to measure the concentration and turnover of the individual proteins in a proteome enables the calculation of individual synthesis and degradation rates for each protein. Software to analyze concentration is readily available, but software to analyze turnover is lacking. Data analysis workflows typically don't access the full breadth of information about instrument precision and accuracy that is present in each peptide isotopic envelope measurement. This method utilizes both isotope distribution and changes in neutromer spacing, which benefits the analysis of both concentration and turnover. We have developed a data analysis tool, DeuteRater, to measure protein turnover from metabolic D2O labeling. DeuteRater uses theoretical predictions for label-dependent change in isotope abundance and inter-peak (neutromer) spacing within the isotope envelope to calculate protein turnover rate. We have also used these metrics to evaluate the accuracy and precision of peptide measurements and thereby determined the optimal data acquisition parameters of different instruments, as well as the effect of data processing steps. We show that these combined measurements can be used to remove noise and increase confidence in the protein turnover measurement for each protein. Source code and ReadMe for Python 2 and 3 versions of DeuteRater are available at https://github.com/JC-Price/DeuteRater . Data is at https://chorusproject.org/pages/index.html project number 1147. Critical Intermediate calculation files provided as Tables S3 and S4. Software has only been tested on Windows machines. Contact: jcprice@chem.byu.edu. Supplementary data are available at Bioinformatics online.

  20. Inter-rater reliability and validity of two ataxia rating scales in children with brain tumours.

    PubMed

    Hartley, H; Pizer, B; Lane, S; Sneade, C; Pratt, R; Bishop, A; Kumar, R

    2015-05-01

    This study aimed to investigate the inter-rater reliability and construct validity of the Scale for the Assessment and Rating of Ataxia (SARA) and Brief Ataxia Rating Scale (BARS) in children with posterior fossa tumours. These scales have been developed for adults with genetic ataxias, and the performance of these scales in children with brain tumours has not previously been described. The participants, who had undergone surgical resection for a posterior fossa tumour (inclusion criteria age 4-18 years), were recruited from the neuro-oncology service at a tertiary children's hospital. Children were assessed using the SARA, BARS and Paediatric Evaluation of Disability Index (PEDI) mobility domain, a measure of function. Children were independently rated by two therapists to determine the inter-rater reliability of the SARA and BARS. The construct validity was determined by assessing the correlation between the two scales with the PEDI. Forty-four children were recruited. Inter-rater reliability was good for both scales, with strong correlations (SARA, r = 0.94; BARS, r = 0.91) and good consistency (93% of SARA and 90% of BARS paired scores differing by less than 2 points) between the two raters. Both ataxia scales demonstrated a strong negative correlation with the mobility domain of the PEDI (SARA, r = -0.77; BARS, r = -0.76), indicating that more severe ataxia was associated with worse mobility. The mean completion time was 4.5 min for the SARA and 2.7 min for the BARS. The SARA and BARS are reliable and valid measures and appear to be of equal value in determining the severity of ataxia in children with posterior fossa tumours.

  1. Inter-rater agreement between children's self-reported and parents' proxy-reported dental anxiety.

    PubMed

    Patel, H; Reid, C; Wilson, K; Girdler, N M

    2015-02-01

    Healthcare professionals often rely on parents to provide accurate dental anxiety assessment for their children. To date no studies have reported on inter-rater agreement between children's self-reported and their parents'/guardians' proxy-reported dental anxiety in the UK. This study aimed to assess the frequency of self-reported dental anxiety in 7-16-year-old children and the inter-rater agreement between children's self-reported and parent/guardian proxy-reported dental anxiety for their children. Data were collected prospectively from 7-16-year-old children and their parents/guardians attending two community dental clinics in Fife, Scotland (July 2012-January 2013). Dental anxiety was assessed using the faces version of the Modified Child Dental Anxiety Scale. Questionnaires were completed separately and independently by children and their accompanying parent or guardian. One hundred and thirty-two child-parent/guardian pairs participated in this study. The frequency of self-reported dental anxiety among children was 18% (n=24, 95% CI 12-25). Inter-rater agreement between children and their parent/guardian was poor for dental filling (linear weighted kappa coefficient 0.17) and tooth extraction (0.20), whereas other questions had fair inter-rater agreement (0.21-0.34). Parents' proxy-reported assessments failed to recognise dental anxiety in 46% (n=11) of dentally anxious children, a significant proportion (p=0.0004). Parent/guardian proxy-reported dental anxiety differs from children's self-reported dental anxiety, suggesting children should be encouraged to self-report their dental anxiety.
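
    The linear weighted kappa named in the abstract above can be sketched as follows. This is a minimal illustration, assuming two raters (child and parent) who score the same items on one ordinal scale; the function name and the sample ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, categories):
    """Linear weighted kappa for two raters on an ordinal scale.

    `categories` lists the scale points in order (at least two of them).
    Hypothetical helper for illustration; not code from the study.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    def disagreement(i, j):
        # Penalty grows linearly with the distance between scale points.
        return abs(i - j) / (k - 1)

    observed = sum(
        disagreement(index[a], index[b]) for a, b in zip(rater_a, rater_b)
    ) / n
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)
    # Expected disagreement if the two raters were independent.
    expected = sum(
        disagreement(i, j) * marg_a[i] * marg_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


# Made-up ratings on a 3-point scale, for illustration only.
child = [1, 2, 3, 3]
parent = [1, 2, 3, 1]
kappa = linear_weighted_kappa(child, parent, categories=[1, 2, 3])
```

    Disagreement weights are used here instead of agreement weights; the two formulations give the same coefficient.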

  2. Inter-Rectus Distance Measurement Using Ultrasound Imaging: Does the Rater Matter?

    PubMed

    Keshwani, Nadia; Hills, Nicole; McLean, Linda

    2016-01-01

    Purpose: To investigate the interrater reliability of inter-rectus distance (IRD) measured from ultrasound images acquired at rest and during a head-lift task in parous women and to establish the standard error of measurement (SEM) and minimal detectable change (MDC) between two raters. Methods: Two physiotherapists independently acquired ultrasound images of the anterior abdominal wall from 17 parous women and measured IRD at four locations along the linea alba: at the superior border of the umbilicus, at 3 cm and 5 cm above the superior border of the umbilicus, and at 3 cm below the inferior border of the umbilicus. The interrater reliability of the IRD measurements was determined using intra-class correlation coefficients (ICCs). Bland-Altman analyses were used to detect bias between the raters, and SEM and MDC values were established for each measurement site. Results: When the two raters performed their own image acquisition and processing, ICCs(3,5) ranged from 0.72 to 0.91 at rest and from 0.63 to 0.96 during head lift, depending on the anatomical measurement site. Bland-Altman analyses revealed no systematic bias between the raters. SEM values ranged from 0.23 cm to 0.71 cm, and MDC values ranged from 0.64 cm to 1.97 cm. Conclusion: When using ultrasound imaging to measure IRD in women, it is acceptable for different therapists to compare IRDs between patients and within patients over time if IRD is measured above or below the umbilicus. Interrater reliability of IRD measurement is poorest at the level of the superior border of the umbilicus.
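
    The relationship between SEM and MDC can be sketched with the conventional formulas SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM. The abstract does not state which formulas the authors used, so both are assumptions here; the reported SEM and MDC ranges are, however, numerically consistent with the MDC95 formula. Function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and a reliability coefficient.

    Conventional formula, assumed here; the abstract does not give one.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level.

    The sqrt(2) factor accounts for error in both of the two measurements
    being compared.
    """
    return 1.96 * math.sqrt(2.0) * sem


# SEM values reported in the abstract (cm) and the MDC each one implies.
mdc_low = round(mdc95(0.23), 2)
mdc_high = round(mdc95(0.71), 2)
```

    Under these assumptions, `mdc_low` and `mdc_high` reproduce the 0.64 cm and 1.97 cm bounds reported above.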

  3. Inter-rater reliability and acceptance of the structured diagnostic interview for regulatory problems in infancy.

    PubMed

    Popp, Lukka; Fuths, Sabrina; Seehagen, Sabine; Bolten, Margarete; Gross-Hemmi, Mirja; Wolke, Dieter; Schneider, Silvia

    2016-01-01

    Regulatory problems such as excessive crying, sleeping and feeding difficulties in infancy are some of the earliest precursors of later mental health difficulties emerging throughout the lifespan. In the present study, the inter-rater reliability and acceptance of a structured computer-assisted diagnostic interview for regulatory problems (Baby-DIPS) was investigated. Using a community sample, 132 mothers of infants aged between 3 and 18 months (mean age = 10 months) were interviewed with the Baby-DIPS regarding current and former (combined = lifetime) regulatory problems. Severity of the symptoms was also rated. The interviews were conducted face-to-face at a psychology department at the university (51.5%), the mother's home (23.5%), or via telephone (25.0%). Inter-rater reliability was assessed with Cohen's kappa (κ). A sample of 48 mothers and their interviewers filled in acceptance questionnaires after the interview. Good to excellent inter-rater reliability on the levels of current and lifetime regulatory problems (κ = 0.77-0.98) was found. High inter-rater agreement was also found for ratings of severity (ICC = 0.86-0.97). Participants' and interviewers' overall acceptance ratings of the computer-assisted interview were favourable. Acceptance scores did not differ between interviews that revealed one or more clinically relevant regulatory problem(s) and those that revealed no regulatory problems. The Baby-DIPS was found to be a reliable instrument for the assessment of current and lifetime problems in crying and sleeping behaviours. The computer-assisted version of the Baby-DIPS was well accepted by interviewers and mothers. The Baby-DIPS appears to be well-suited for research and clinical use to identify infant regulatory problems.
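
    Cohen's kappa, the agreement statistic used in the abstract above, corrects the observed agreement between two raters for the agreement expected by chance from each rater's own category frequencies. The sketch below is illustrative only; the function name and the example diagnoses are hypothetical and unrelated to the Baby-DIPS data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters assigning nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    # Share of items on which both raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement from the product of the raters' marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1.0 - expected)


# Made-up present/absent diagnoses from two interviewers, for illustration only.
interviewer_1 = ["present", "present", "absent", "absent"]
interviewer_2 = ["present", "absent", "absent", "absent"]
kappa = cohens_kappa(interviewer_1, interviewer_2)
```

    In this toy example the raters agree on three of four cases (75%), chance agreement is 50%, and kappa is therefore 0.5.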

  4. Inter-Rater Agreement of Auscultation, Palpable Fremitus, and Ventilator Waveform Sawtooth Patterns Between Clinicians.

    PubMed

    Berry, Marc P; Martí, Joan-Daniel; Ntoumenopoulos, George

    2016-10-01

    Clinicians often use numerous bedside assessments for secretion retention in participants who are receiving invasive mechanical ventilation. This study aimed to evaluate inter-rater agreement between clinicians when using standard clinical assessments of secretion retention and whether differences in clinician experience influenced inter-rater agreement. Seventy-one mechanically ventilated participants were assessed by a research clinician and by one of 13 ICU clinicians. Each clinician conducted a standardized assessment of lung auscultation, palpation for chest-wall (rhonchal) fremitus, and ventilator inspiratory/expiratory flow-time waveforms for the sawtooth pattern. On the presence of breath sounds, agreement ranged from absolute to moderate in the upper zones and the lower zones, respectively. Kappa values for abnormal and adventitious lung sounds achieved moderate agreement in the upper zones, less than chance agreement to substantial agreement in the middle zones, and moderate agreement to almost perfect agreement in the lower zones. Moderate to almost perfect agreement was established for palpable fremitus in the upper zones, moderate to substantial agreement in the middle zones, and less than chance to moderate agreement in the lower zones. Inter-rater agreement on the presence of expiratory sawtooth pattern identification showed moderate agreement. The level of percentage agreement between the research and ICU clinicians for each respiratory assessment studied did not relate directly to level of clinical experience. Inter-rater agreement for all assessments showed variability between lung regions but maintained reasonable percentage agreement in mechanically ventilated participants. The level of percentage agreement achieved between clinicians did not directly relate to clinical experience for all respiratory assessments. Therefore, these respiratory assessments should not necessarily be viewed in isolation but interpreted within the context of a full

  5. The Problem of Limited Inter-rater Agreement in Modelling Music Similarity.

    PubMed

    Flexer, Arthur; Grill, Thomas

    2016-07-02

    One of the central goals of Music Information Retrieval (MIR) is the quantification of similarity between or within pieces of music. These quantitative relations should mirror the human perception of music similarity, which is however highly subjective with low inter-rater agreement. Unfortunately this principal problem has been given little attention in MIR so far. Since it is not meaningful to have computational models that go beyond the level of human agreement, these levels of inter-rater agreement present a natural upper bound for any algorithmic approach. We will illustrate this fundamental problem in the evaluation of MIR systems using results from two typical application scenarios: (i) modelling of music similarity between pieces of music; (ii) music structure analysis within pieces of music. For both applications, we derive upper bounds of performance which are due to the limited inter-rater agreement. We compare these upper bounds to the performance of state-of-the-art MIR systems and show how the upper bounds prevent further progress in developing better MIR systems.

  6. The Problem of Limited Inter-rater Agreement in Modelling Music Similarity

    PubMed Central

    Flexer, Arthur; Grill, Thomas

    2016-01-01

    One of the central goals of Music Information Retrieval (MIR) is the quantification of similarity between or within pieces of music. These quantitative relations should mirror the human perception of music similarity, which is however highly subjective with low inter-rater agreement. Unfortunately this principal problem has been given little attention in MIR so far. Since it is not meaningful to have computational models that go beyond the level of human agreement, these levels of inter-rater agreement present a natural upper bound for any algorithmic approach. We will illustrate this fundamental problem in the evaluation of MIR systems using results from two typical application scenarios: (i) modelling of music similarity between pieces of music; (ii) music structure analysis within pieces of music. For both applications, we derive upper bounds of performance which are due to the limited inter-rater agreement. We compare these upper bounds to the performance of state-of-the-art MIR systems and show how the upper bounds prevent further progress in developing better MIR systems. PMID:28190932

  7. Inter-rater reliability determination for two tests of ulnar nerve conduction across the elbow.

    PubMed

    Carroll, Craig G; Landau, Mark E; Rouhanian, Minoo; Campbell, William W

    2017-05-01

    The inter-rater variability in determination of ulnar nerve conduction across the elbow compromises test accuracy. The extent of this variability is unknown. The objective of this study was to determine and compare inter-rater reliability of variables derived from 2 different ulnar nerve conduction studies (NCSs) across the elbow. Two investigators performed a standard ulnar NCS and a 6-cm conduction time (Six-Centimeter Conduction Time test, SCCT) on 60 extremities of asymptomatic subjects. In the standard test, below-elbow (BE) and above-elbow (AE) stimulation points were ≥ 10 cm apart, measured along a curved path, to calculate across-elbow NCV. In SCCT, BE and AE were precisely 6 cm apart measured linearly to calculate CTE (conduction time elbow). Inter-rater reliability was assessed by means of intraclass correlation coefficients (ICC). ICC for across-elbow NCV and CTE were 0.726 and 0.801, respectively. Reliability of CTE and across-elbow NCV are similar. Shorter distances, if measured linearly, can be used to determine across-elbow ulnar nerve conduction. Muscle Nerve 55: 664-668, 2017. © 2016 Wiley Periodicals, Inc.
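
    The intraclass correlation coefficient used in the abstract above can be computed from a two-way analysis of variance of a subjects-by-raters table. The abstract does not say which ICC form was used, so the sketch below returns two common single-measure forms, the two-way random-effects, absolute-agreement ICC(2,1) and the two-way mixed-effects, consistency ICC(3,1); this choice is an assumption. The function name and the ratings table are illustrative and are not data from the study.

```python
def icc_two_way(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a complete subjects x raters table.

    `ratings` is a list of rows, one per subject, each holding one score
    per rater. Illustrative helper; not code from the study.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc_2_1 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    icc_3_1 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc_2_1, icc_3_1


# Illustrative table: 6 subjects rated by 4 raters.
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_agreement, icc_consistency = icc_two_way(table)
```

    The agreement form is lower than the consistency form whenever raters differ systematically in their mean level, because rater variance counts as error only in ICC(2,1).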

  8. Assessment and correlation of customer and rater response to cold-start and warmup driveability

    SciTech Connect

    Not Available

    1993-08-01

    A program was conducted from January 14 through March 8, 1991, at Southwest Research Institute (SwRI) in San Antonio, Texas, to establish a relationship between demerits observed in CRC Cold-Start and Warmup Driveability assessments and customer satisfaction levels, and to determine which of several performance deficiencies associated with low volatility gasolines are most troublesome to customers during normal vehicle warmup. Customers used their vehicles in daily service, and a subset of the test fleet was evaluated by trained raters using the established CRC test procedure. There were 7,206 driveability performance assessments by customers, which were correlated with 661 trained-rater cold-start driveability evaluations. One hundred sixty-seven SwRI employees participated in the program. Hesitation was the most widely observed problem and was the primary cause of dissatisfaction. The gasoline-ethanol and hydrocarbon-only fuel sets had distinctly different malfunction patterns. Hesitation was strongly associated with gasoline-ethanol blends, while surge and stumble were strongly associated with hydrocarbon-only fuels. The current total weighted demerit (TWD) system was found to correlate poorly with customer satisfaction; however, customer observations of problems correlated no better with customer satisfaction. If TWD is to be an indicator of customer perception of driveability performance, there should be uniform weighting of rater-observed malfunctions, and start-time should be assigned a greater weighting and a shorter grace period.

  9. Rater training to support high-stakes simulation-based assessments.

    PubMed

    Feldman, Moshe; Lazzara, Elizabeth H; Vanderbilt, Allison A; DiazGranados, Deborah

    2012-01-01

    Competency-based assessment and an emphasis on obtaining higher-level outcomes that reflect physicians' ability to demonstrate their skills has created a need for more advanced assessment practices. Simulation-based assessments provide medical education planners with tools to better evaluate the 6 Accreditation Council for Graduate Medical Education (ACGME) and American Board of Medical Specialties (ABMS) core competencies by affording physicians opportunities to demonstrate their skills within a standardized and replicable testing environment, thus filling a gap in the current state of assessment for regulating the practice of medicine. Observational performance assessments derived from simulated clinical tasks and scenarios enable stronger inferences about the skill level a physician may possess, but also introduce the potential of rater errors into the assessment process. This article reviews the use of simulation-based assessments for certification, credentialing, initial licensure, and relicensing decisions and describes rater training strategies that may be used to reduce rater errors, increase rating accuracy, and enhance the validity of simulation-based observational performance assessments.

  10. Inter-rater reliability and validity of the Stroke Rehabilitation Assessment of Movement (STREAM) instrument.

    PubMed

    Wang, Chun-Hou; Hsieh, Ching-Lin; Dai, May-Hui; Chen, Chia-Hui; Lai, Yu-Fen

    2002-01-01

    The Stroke Rehabilitation Assessment of Movement (STREAM) instrument is used to measure motor and mobility problems in patients who have experienced a stroke. The purposes of the study were to examine the interrater reliability and the concurrent and convergent validity of the STREAM instrument in stroke patients. Fifty-four stroke patients participated in the study. To assess interrater reliability, the STREAM instrument was administered by two raters to each patient within a 2-day period. Validity was assessed by comparing the patients' scores on the STREAM instrument with those obtained from other well-established measures. Weighted kappa statistics for inter-rater agreement on scores for individual items ranged from 0.55 to 0.94. The intraclass correlation coefficient for the total score was 0.96, indicating very high inter-rater reliability. The intraclass correlation coefficients were also very high in each of the subscales. The total STREAM score was moderately to highly associated with the scores of the Barthel Index and the Fugl-Meyer motor assessment scale (rho = 0.67 and 0.95, respectively). The STREAM subscale scores were closely associated with scores of the other well-validated measures. Our results demonstrate that consistent and valid information can be obtained from the STREAM instrument and support its use in the evaluation of motor and mobility recovery in persons who have experienced a stroke.

  11. Rater Training to Support High-Stakes Simulation-Based Assessments

    PubMed Central

    Feldman, Moshe; Lazzara, Elizabeth H.; Vanderbilt, Allison A.; DiazGranados, Deborah

    2013-01-01

    Competency-based assessment and an emphasis on obtaining higher-level outcomes that reflect physicians’ ability to demonstrate their skills has created a need for more advanced assessment practices. Simulation-based assessments provide medical education planners with tools to better evaluate the 6 Accreditation Council for Graduate Medical Education (ACGME) and American Board of Medical Specialties (ABMS) core competencies by affording physicians opportunities to demonstrate their skills within a standardized and replicable testing environment, thus filling a gap in the current state of assessment for regulating the practice of medicine. Observational performance assessments derived from simulated clinical tasks and scenarios enable stronger inferences about the skill level a physician may possess, but also introduce the potential of rater errors into the assessment process. This article reviews the use of simulation-based assessments for certification, credentialing, initial licensure, and relicensing decisions and describes rater training strategies that may be used to reduce rater errors, increase rating accuracy, and enhance the validity of simulation-based observational performance assessments. PMID:23280532

  12. "Color-Blind" Racism.

    ERIC Educational Resources Information Center

    Carr, Leslie G.

    Examining race relations in the United States from a historical perspective, this book explains how the constitution is racist and how color blindness is actually a racist ideology. It is argued that Justice Harlan, in his dissenting opinion in Plessy v. Ferguson, meant that the constitution and the law must remain blind to the existence of race…

  13. Blindness after intranasal ethmoidectomy.

    PubMed

    Sözeri, B; Ataman, M; Gürsel, B

    1993-06-01

    Orbital haemorrhage is an unusual and frustrating complication of ethmoid surgery. A case of reversible blindness due to intra-operative orbital haemorrhage occurring after intranasal ethmoidectomy is presented. Prevention and management of this complication are discussed. This kind of blindness can be reversed if treated aggressively.

  15. Blindness and Yoga

    ERIC Educational Resources Information Center

    Heyes, Anthony David

    1974-01-01

    Evidence is presented to support the claims that, among many blind persons, physical inactivity leads to poor physical fitness; that a state of anxiety is often a concomitant of unguided blind mobility; and that Yogic practices offer a solution to both difficulties. (GW)

  16. Unblinding the dark matter blind spots

    DOE PAGES

    Han, Tao; Kling, Felix; Su, Shufang; ...

    2017-02-10

    The dark matter (DM) blind spots in the Minimal Supersymmetric Standard Model (MSSM) refer to the parameter regions where the couplings of the DM particles to the $Z$-boson or the Higgs boson are almost zero, leading to vanishingly small signals for the DM direct detections. In this paper, we carry out comprehensive analyses for the DM searches under the blind-spot scenarios in MSSM. Guided by the requirement of acceptable DM relic abundance, we explore the complementary coverage for the theory parameters at the LHC, the projection for the future underground DM direct searches, and the indirect searches from the relic DM annihilation into photons and neutrinos. We find that (i) the spin-independent (SI) blind spots may be rescued by the spin-dependent (SD) direct detection in the future underground experiments, and possibly by the indirect DM detections from IceCube and SuperK neutrino experiments; (ii) the detection of gamma rays from Fermi-LAT may not reach the desirable sensitivity for searching for the DM blind-spot regions; (iii) the SUSY searches at the LHC will substantially extend the discovery region for the blind-spot parameters. As a result, the dark matter blind spots thus may be unblinded with the collective efforts in future DM searches.

  17. Unblinding the dark matter blind spots

    NASA Astrophysics Data System (ADS)

    Han, Tao; Kling, Felix; Su, Shufang; Wu, Yongcheng

    2017-02-01

    The dark matter (DM) blind spots in the Minimal Supersymmetric Standard Model (MSSM) refer to the parameter regions where the couplings of the DM particles to the Z-boson or the Higgs boson are almost zero, leading to vanishingly small signals for the DM direct detections. In this paper, we carry out comprehensive analyses for the DM searches under the blind-spot scenarios in MSSM. Guided by the requirement of acceptable DM relic abundance, we explore the complementary coverage for the theory parameters at the LHC, the projection for the future underground DM direct searches, and the indirect searches from the relic DM annihilation into photons and neutrinos. We find that (i) the spin-independent (SI) blind spots may be rescued by the spin-dependent (SD) direct detection in the future underground experiments, and possibly by the indirect DM detections from IceCube and SuperK neutrino experiments; (ii) the detection of gamma rays from Fermi-LAT may not reach the desirable sensitivity for searching for the DM blind-spot regions; (iii) the SUSY searches at the LHC will substantially extend the discovery region for the blind-spot parameters. The dark matter blind spots thus may be unblinded with the collective efforts in future DM searches.

  18. Inter-rater reliability of two depression rating scales, MADRS and DRRS, based on videotape records of structured interviews.

    PubMed

    Corruble, E; Purper, D; Payan, C; Guelfi, J

    1998-08-01

    The inter-rater reliability of the French versions of the MADRS and the DRRS was studied on the basis of 58 videotape records of structured standardised interviews of depressed inpatients under antidepressant treatment. Each patient was assessed by two trained raters, from the same videotape recording. The inter-rater reliability of total scores was high with both scales (intra-class correlation coefficients: 0.86 for MADRS and 0.77 for DRRS). However, the inter-rater reliability for individual items was higher and more homogeneous for the MADRS than for the DRRS. Finally, the structured interview in French appears to be relevant for the MADRS, but it should be improved for the DRRS.

  19. Cross-national differences in the assessment of psychopathy: do they reflect variations in raters' perceptions of symptoms?

    PubMed

    Cooke, David J; Hart, Stephen D; Michie, Christine

    2004-09-01

    Cross-national differences in the prevalence of psychopathy have been reported. This study examined whether rater effects could account for these differences. Psychopathy was assessed with the Psychopathy Checklist-Revised (PCL-R; R. D. Hare, 1991). Videotapes of 6 Scottish prisoners and 6 Canadian prisoners were rated by 10 Scottish and 10 Canadian raters. No significant main or interaction effects involving the nationality of raters were detected at the level of full scores or factor scores. Using a generalizability theory approach, it was demonstrated that the interrater reliability of total scores was good, that is, the proportion of variance in test scores attributable to raters was small. The interrater reliability of factor scores was lower, typically falling in the fair range. Overall, the results suggest that the reported cross-national differences are more likely to be in the expression of the disorder rather than in the eye of the beholder.

  20. Blinded by Irrelevance: Pure Irrelevance Induced "Blindness"

    ERIC Educational Resources Information Center

    Eitam, Baruch; Yeshurun, Yaffa; Hassan, Kinneret

    2013-01-01

    To what degree does our representation of the immediate world depend solely on its relevance to what we are currently doing? We examined whether relevance per se can cause "blindness," even when there is no resource limitation. In a novel paradigm, people looked at a colored circle surrounded by a differently colored ring--the task relevance of…

  2. How "Blind" Are Double-Blind Studies?

    ERIC Educational Resources Information Center

    Margraf, Jurgen; And Others

    1991-01-01

    Compared alprazolam, imipramine, and placebo in the treatment of panic disorder patients (n=59) to investigate concerns about the internal validity of the double-blind design. Found that the great majority of patients and physicians were able to rate accurately whether active drug or placebo had been given and physicians could distinguish between…

  3. The Scarbase Duo®: Intra-rater and inter-rater reliability and validity of a compact dual scar assessment tool.

    PubMed

    Fell, Matthew; Meirte, Jill; Anthonissen, Mieke; Maertens, Koen; Pleat, Jonathon; Moortgat, Peter

    2016-03-01

    Objective scar assessment tools were designed to help identify problematic scars and direct clinical management. Their use has been restricted by their measurement of a single scar property and the bulky size of equipment. The Scarbase Duo® was designed to assess both trans-epidermal water loss (TEWL) and colour of a burn scar whilst being compact and easy to use. Twenty patients with a burn scar were recruited and measurements taken using the Scarbase Duo® by two observers. The Scarbase Duo® measures TEWL via an open-chamber system and undertakes colorimetry via narrow-band spectrophotometry, producing values for relative erythema and melanin pigmentation. Validity was assessed by comparing the Scarbase Duo® against the Dermalab® and the Minolta Chromameter® respectively for TEWL and colorimetry measurements. The intra-class correlation coefficient (ICC) was used to assess reliability with standard error of measurement (SEM) used to assess reproducibility of measurements. The Pearson correlation coefficient (r) was used to assess the convergent validity. The Scarbase Duo® TEWL mode had excellent reliability when used on scars for both intra- (ICC=0.95) and inter-rater (ICC=0.96) measurements with moderate SEM values. The erythema component of the colorimetry mode showed good reliability for use on scars for both intra- (ICC=0.81) and inter-rater (ICC=0.83) measurements with low SEM values. Pigmentation values showed excellent reliability on scar tissue for both intra- (ICC=0.97) and inter-rater (ICC=0.97) measurements with moderate SEM values. The Scarbase Duo® TEWL function had excellent correlation with the Dermalab® (r=0.93) whilst the colorimetry erythema value had moderate correlation with the Minolta Chromameter® (r=0.72). The Scarbase Duo® is a reliable and objective scar assessment tool, which is specifically designed for burn scars. However, for clinical use, standardised measurement conditions are recommended.

  4. Automated Essay Scoring with e-rater® v.2.0. Research Report. ETS RR-04-45

    ERIC Educational Resources Information Center

    Attali, Yigal; Burstein, Jill

    2005-01-01

    The e-rater® system has been used by ETS for automated essay scoring since 1999. This paper describes a new version of e-rater (v.2.0) that differs from the previous one (v.1.3) with regard to the feature set and model building approach. The paper describes the new version, compares the new and previous versions in terms of performance, and…

  5. Ocular Motor Score (OMS): a clinical tool to evaluating ocular motor functions in children. Intrarater and inter-rater agreement.

    PubMed

    Olsson, Monica; Teär Fahnehjelm, Kristina; Rydberg, Agneta; Ygge, Jan

    2015-08-01

    Ocular motor score (OMS) is a new clinical test protocol for evaluating ocular motor functions in children and young adults. OMS is a set of 15 important and relevant non-invasive ocular motor function parameters derived from clinical practice. The aim of the study was to evaluate OMS according to intrarater and inter-rater agreement. Forty children aged 4-10 years, 23 girls (median age 6.5, range 4.3-9.3) and 17 boys (median age 5.8, range 4.1-9.8), were included. The ocular motor functions were assessed and scored according to the OMS protocol. The examinations were videotaped. To obtain the intrarater agreement, the first author examined and scored the children twice, first in the clinic and 2 weeks later by watching the videotape. To obtain the inter-rater agreement, three other raters independently scored the ocular motor function of the children by watching the videotapes. The overall observed intrarater agreement was 88%, and the observed inter-rater agreement between the three raters was 80%. For none of the subtests was there an observed intrarater agreement lower than 65%. Three of the subtests had an observed inter-rater agreement of 65% or below. Overall there was high observed intra- and inter-rater agreement for the OMS test protocol. Subtests such as saccades and smooth pursuit were more difficult for raters to score similarly according to the clinical OMS test protocol. © 2015 Acta Ophthalmologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.

  6. The Intra- and Inter-Rater Reliability of an Instrumented Spasticity Assessment in Children with Cerebral Palsy

    PubMed Central

    Schless, Simon-Henri; Desloovere, Kaat; Aertbeliën, Erwin; Molenaers, Guy; Huenaerts, Catherine; Bar-On, Lynn

    2015-01-01

    Aim: Despite the impact of spasticity, there is a lack of objective, clinically reliable and valid tools for its assessment. This study aims to evaluate the reliability of various performance- and spasticity-related parameters collected with a manually controlled instrumented spasticity assessment in four lower limb muscles in children with cerebral palsy (CP). Method: The lateral gastrocnemius, medial hamstrings, rectus femoris and hip adductors of 12 children with spastic CP (12.8 years, ±4.13 years, bilateral/unilateral involvement n=7/5) were passively stretched in the sagittal plane at incremental velocities. Muscle activity, joint motion, and torque were synchronously recorded using electromyography, inertial sensors, and a force/torque load-cell. Reliability was assessed on three levels: (1) intra- and (2) inter-rater within session, and (3) intra-rater between session. Results: Parameters were found to be reliable in all three analyses, with 90% containing intra-class correlation coefficients >0.6, and 70% of standard error of measurement values <20% of the mean values. The most reliable analysis was intra-rater within session, followed by intra-rater between session, and then inter-rater within session. The hip adductors (Adds) evaluation had a slightly lower level of reliability than that of the other muscles. Conclusions: Limited intrinsic/extrinsic errors were introduced by repeated stretch repetitions. The parameters were more reliable when the same rater, rather than different raters, performed the evaluation. Standardisation and training should be further improved to reduce extrinsic error when different raters perform the measurement. Errors were also muscle specific, or related to the measurement set-up. They need to be accounted for, in particular when assessing pre-post interventions or longitudinal follow-up. The parameters of the instrumented spasticity assessment demonstrate a wide range of applications for both research and clinical environments in the

  7. SU-E-T-511: Inter-Rater Variability in Classification of Incidents in a New Incident Reporting System

    SciTech Connect

    Pappas, D; Reis, S; Ali, A; Kapur, A

    2015-06-15

    Purpose To determine how consistent the results of different raters are when reviewing the same cases within the Radiation Oncology Incident Learning System (ROILS). Methods Three second-year medical physics graduate students filled out incident reports in spreadsheets set up to mimic ROILS. All students studied the same 33 cases and independently entered their assessments, for a total of 99 reviewed cases. The narratives for these cases were obtained from a published International Commission on Radiological Protection (ICRP) report which included shorter narratives selected from the Radiation Oncology Safety Information System (ROSIS) database. Each category of questions was reviewed to see how consistent the results were by utilizing free-marginal multirater kappa analysis. The percentage of cases where all raters shared full agreement or full disagreement was recorded to show which questions were answered consistently by multiple raters for a given case. The consistency among the raters was analyzed between ICRP and ROSIS cases to see if either group led to more reliable results. Results The categories where all raters agreed 100 percent in their choices were the event type (93.94 percent of cases, 0.946 kappa) and the likelihood of the event being harmful to the patient (42.42 percent of cases, 0.409 kappa). The categories where all raters disagreed 100 percent in their choices were the dosimetric severity scale (39.39 percent of cases, 0.139 kappa) and the potential future toxicity (48.48 percent of cases, 0.205 kappa). ROSIS had more cases where all raters disagreed than ICRP (23.06 percent of cases compared to 15.58 percent, respectively). Conclusion Despite reviewing the same cases, the results among the three raters varied widely. ROSIS narratives were shorter than ICRP narratives, which suggests that longer narratives lead to more consistent results. This study shows that the incident reporting system can be optimized to yield more consistent results.
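
    The free-marginal multirater kappa named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name, the example labels, and the number of categories are assumptions, and the formula used is the commonly cited free-marginal form, kappa = (P_o - 1/k) / (1 - 1/k), where P_o is the mean proportion of agreeing rater pairs per case and k is the number of answer categories.

```python
from collections import Counter


def free_marginal_kappa(ratings, n_categories):
    """Free-marginal multirater kappa.

    ratings: list of cases, each a list with one label per rater.
    n_categories: number of answer options available (assumed known).
    """
    agreement_sum = 0.0
    for case in ratings:
        n_raters = len(case)
        counts = Counter(case)
        # Ordered pairs of raters that chose the same label for this case.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))
    p_observed = agreement_sum / len(ratings)
    p_chance = 1.0 / n_categories  # chance agreement with free marginals
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical data: 4 cases, 3 raters, 3 possible categories.
example = [
    ["A", "A", "A"],  # full agreement
    ["A", "B", "A"],  # partial agreement
    ["B", "B", "B"],  # full agreement
    ["A", "B", "C"],  # full disagreement
]
kappa = free_marginal_kappa(example, n_categories=3)
```

    Because chance agreement depends only on the number of categories, a question with many answer options can score a fairly high kappa even when full agreement is reached in a minority of cases.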

  8. Inter-rater reliability of three standardized functional tests in patients with low back pain

    PubMed Central

    Tidstrand, Johan; Horneij, Eva

    2009-01-01

    Background Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis, which several authors have characterized by delayed muscular responses, impaired postural control, and impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests examining functional muscular coordination which are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of the present study was therefore to standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods Nineteen consecutive individuals, ten men and nine women, were included (mean age 42 years, SD ± 12 years). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift: good (κ: 0.61) and moderate (κ: 0.47). Conclusion The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath-ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good. Validation of the tests in their ability to evaluate lumbar
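
    Cohen's kappa and percentage agreement for two examiners, as used in this abstract, can be sketched as below. The function names and the example ratings are hypothetical and are not taken from the study; the sketch assumes nominal labels and that chance agreement is below 1 (otherwise the denominator is zero).

```python
from collections import Counter


def percentage_agreement(rater_a, rater_b):
    """Share of subjects on which the two raters gave the same label."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters and nominal labels."""
    n = len(rater_a)
    p_observed = percentage_agreement(rater_a, rater_b)
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Chance agreement from each rater's own marginal label frequencies.
    labels = freq_a.keys() | freq_b.keys()
    p_chance = sum(freq_a[label] * freq_b[label] for label in labels) / n**2
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical test outcomes for 10 patients, scored by two examiners.
examiner_1 = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]
examiner_2 = ["pos", "pos", "neg", "neg", "neg", "neg", "pos", "neg", "pos", "neg"]
agreement = percentage_agreement(examiner_1, examiner_2)
kappa = cohens_kappa(examiner_1, examiner_2)
```

    Reporting both figures is useful because percentage agreement alone does not correct for the agreement two raters would reach by chance.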

  9. Shape Perception and Navigation in Blind Adults

    PubMed Central

    Gori, Monica; Cappagli, Giulia; Baud-Bovy, Gabriel; Finocchietti, Sara

    2017-01-01

    Different sensory systems interact to generate a representation of space and to navigate. Vision plays a critical role in the representation of space development. During navigation, vision is integrated with auditory and mobility cues. In blind individuals, visual experience is not available and navigation therefore lacks this important sensory signal. In blind individuals, compensatory mechanisms can be adopted to improve spatial and navigation skills. On the other hand, the limitations of these compensatory mechanisms are not completely clear. Both enhanced and impaired reliance on auditory cues in blind individuals have been reported. Here, we develop a new paradigm to test both auditory perception and navigation skills in blind and sighted individuals and to investigate the effect that visual experience has on the ability to reproduce simple and complex paths. During the navigation task, early blind, late blind and sighted individuals were required first to listen to an audio shape and then to recognize and reproduce it by walking. After each audio shape was presented, a static sound was played and the participants were asked to reach it. Movements were recorded with a motion tracking system. Our results show three main impairments specific to early blind individuals. The first is the tendency to compress the shapes reproduced during navigation. The second is the difficulty to recognize complex audio stimuli, and finally, the third is the difficulty in reproducing the desired shape: early blind participants occasionally reported perceiving a square but they actually reproduced a circle during the navigation task. We discuss these results in terms of compromised spatial reference frames due to lack of visual input during the early period of development. PMID:28144226

  10. Shape Perception and Navigation in Blind Adults.

    PubMed

    Gori, Monica; Cappagli, Giulia; Baud-Bovy, Gabriel; Finocchietti, Sara

    2017-01-01

    Different sensory systems interact to generate a representation of space and to navigate. Vision plays a critical role in the representation of space development. During navigation, vision is integrated with auditory and mobility cues. In blind individuals, visual experience is not available and navigation therefore lacks this important sensory signal. In blind individuals, compensatory mechanisms can be adopted to improve spatial and navigation skills. On the other hand, the limitations of these compensatory mechanisms are not completely clear. Both enhanced and impaired reliance on auditory cues in blind individuals have been reported. Here, we develop a new paradigm to test both auditory perception and navigation skills in blind and sighted individuals and to investigate the effect that visual experience has on the ability to reproduce simple and complex paths. During the navigation task, early blind, late blind and sighted individuals were required first to listen to an audio shape and then to recognize and reproduce it by walking. After each audio shape was presented, a static sound was played and the participants were asked to reach it. Movements were recorded with a motion tracking system. Our results show three main impairments specific to early blind individuals. The first is the tendency to compress the shapes reproduced during navigation. The second is the difficulty to recognize complex audio stimuli, and finally, the third is the difficulty in reproducing the desired shape: early blind participants occasionally reported perceiving a square but they actually reproduced a circle during the navigation task. We discuss these results in terms of compromised spatial reference frames due to lack of visual input during the early period of development.

  11. Inter and Intra Rater Reliability of the 10 Meter Walk Test in the Community Dweller Adults with Spastic Cerebral Palsy

    PubMed Central

    BAHRAMI, Fariba; NOORIZADEH DEHKORDI, Shohreh; DADGOO, Mehdi

    2017-01-01

    Objective We aimed to investigate the intra-rater and inter-rater reliability of the 10 meter walk test (10 MWT) in adults with spastic cerebral palsy (CP). Materials & Methods Thirty ambulatory adults with spastic CP participated in the summer of 2014 (19 men, 11 women; mean age 28 ± 7 yr, range 18-46 yr). Individuals were non-randomly selected by convenience sampling from the Ra’ad Rehabilitation Goodwill Complex in Tehran, Iran. They had GMFCS levels below IV (I, II, and III). The retest interval for the inter-rater study was one week. During the tests, participants walked at their maximum speed. Intraclass correlation coefficients (ICC) estimated reliability. Results The 10 MWT ICC for intra-rater reliability was 0.98 (95% confidence interval (CI) 0.96-0.99) for participants, and >0.89 in GMFCS subgroups (95% CI lower bound >0.67). The 10 MWT inter-rater ICC was 0.998 (95% CI 0.996-0.999), and >0.993 in GMFCS subgroups (95% CI lower bound >0.977). Standard error of the measurement (SEM) values for both studies were small (0.02 < SEM < 0.07). Conclusion Excellent intra-rater and inter-rater reliability of the 10 MWT in adults with CP, especially in those with moderate motor impairments (GMFCS level III), indicates that this tool can be used in clinics to assess the results of interventions. PMID:28277557

  12. [The relationship between state anxiety and the impression of sandplay productions in terms of factors of the makers and raters].

    PubMed

    Endo, Ayumu

    2012-02-01

    This study investigated the relationship between the state anxiety of Sandplay makers and raters, and the raters' impressions of the Sandplay productions. The S-Anxiety subscale of the STAI was administered to college students. One group (N = 20) created Sandplay productions which were photographed. Three works were selected from higher S-Anxiety subjects (H-works) and three from lower S-Anxiety subjects (L-works). Then another group of 58 college students were asked to rate these Sandplay productions using the SD method. Factor analysis extracted three factors of Flexibility, Integration, and Activity. The raters were divided into two groups based on their S-Anxiety scores, and their subscale scores were examined using ANOVA. Significant main effects for the makers involved Flexibility and Activity (L-works < H-works). This suggests that the S-Anxiety and ego function of the makers influence their works. Furthermore, an interaction was found with Integration. Higher S-Anxiety raters rated the Integration of L-works lower than did the lower S-Anxiety raters. This indicates that higher S-Anxiety raters observed the free expression of lower S-anxiety makers from a partial perspective.

  13. Reliability of novice raters in using the movement system impairment approach to classify people with low back pain

    PubMed Central

    Henry, Sharon M.; Van Dillen, Linda R.; Trombley, Andrea R.; Dee, Justine M.; Bunn, Janice Y.

    2013-01-01

    Observational cross sectional study. To examine the inter-rater reliability of novice raters in using the Movement System Impairment (MSI) approach system and to explore the patterns of disagreement in classification errors. The inter-rater reliability of individual tests items used in the MSI approach is moderate to good; however, the reliability of the classification algorithm has been tested only preliminarily. Using previously recorded patient data (n = 21), 13 novice raters classified patients according to the MSI schema. The overall percent agreement using the kappa statistic as well as the agreement/disagreement among pair-wise comparisons in classification assignments were examined. There was an overall 87.4% agreement in the pairs of classification judgments with a kappa coefficient of 0.81 (95% CI: 0.79, 0.83). Raters were most likely to agree on the classification of Flexion (100%) and least likely to agree on the classification of Rotation (84%). The MSI classification algorithm can be learned by novice users and with training, their inter-rater reliability in applying the algorithm for classification judgments is good and similar to that reported in other studies. However, some degree of error persists in the classification decision-making associated with the MSI system, in particular for the Rotation category. PMID:22796388

  14. On individual differences in person perception: raters' personality traits relate to their psychopathy checklist-revised scoring tendencies.

    PubMed

    Miller, Audrey K; Rufino, Katrina A; Boccaccini, Marcus T; Jackson, Rebecca L; Murrie, Daniel C

    2011-06-01

    This study investigated raters' personality traits in relation to scores they assigned to offenders using the Psychopathy Checklist-Revised (PCL-R). A total of 22 participants, including graduate students and faculty members in clinical psychology programs, completed a PCL-R training session, independently scored four criminal offenders using the PCL-R, and completed a comprehensive measure of their own personality traits. A priori hypotheses specified that raters' personality traits, and their similarity to psychopathy characteristics, would relate to raters' PCL-R scoring tendencies. As hypothesized, some raters assigned consistently higher scores on the PCL-R than others, especially on PCL-R Facets 1 and 2. Also as hypothesized, raters' scoring tendencies related to their own personality traits (e.g., higher rater Agreeableness was associated with lower PCL-R Interpersonal facet scoring). Overall, findings underscore the need for future research to examine the role of evaluator characteristics on evaluation results and the need for clinical training to address evaluators' personality influences on their ostensibly objective evaluations.

  15. Accuracy and reliability of the sensory test performed using the laryngopharyngeal endoscopic esthesiometer and rangefinder in patients with suspected obstructive sleep apnoea hypopnoea: protocol for a prospective double-blinded, randomised, exploratory study.

    PubMed

    Giraldo-Cadavid, Luis Fernando; Bastidas, Alirio Rodrigo; Padilla-Ortiz, Diana Marcela; Concha-Galan, Diana Carolina; Bazurto, María Angelica; Vargas, Leslie

    2017-08-21

    Patients with obstructive sleep apnoea hypopnoea syndrome (OSA) might have varying degrees of laryngopharyngeal mechanical hyposensitivity that might impair the brain's capacity to prevent airway collapse during sleep. However, this knowledge about sensory compromises in OSA comes from studies performed using methods with little evidence of their validity. Hence, the purpose of this study is to assess the reliability and accuracy of the measurement of laryngopharyngeal mechanosensitivity in patients with OSA using a recently developed laryngopharyngeal endoscopic esthesiometer and rangefinder (LPEER). The study will be prospective and double blinded, with a randomised crossover assignment of raters performing the sensory tests. Subjects will be recruited from patients with suspected OSA referred for baseline polysomnography to a university hospital sleep laboratory. Intra-rater and inter-rater reliability will be evaluated using the Bland-Altman limits of agreement plot, the intraclass correlation coefficient, and the Pearson or Spearman correlation coefficient, depending on the distribution of the variables. Diagnostic accuracy will be evaluated by plotting ROC curves using standard baseline polysomnography as a reference. The sensory threshold values for patients with mild, moderate and severe OSA will be determined and compared using ANOVA or the Kruskal-Wallis test, depending on the distribution of the variables. The LPEER could be a new tool for evaluating and monitoring laryngopharyngeal sensory impairment in patients with OSA. If it is shown to be valid, it could help to increase our understanding of the pathophysiological mechanisms of this condition and potentially help in finding new therapeutic interventions for OSA. The protocol has been approved by the Institutional Review Board of Fundacion Neumologica Colombiana. The results will be disseminated through conference presentations and peer-reviewed publication. This trial was registered at Clinical
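
    The Bland-Altman limits of agreement mentioned in this protocol are conventionally computed as the mean difference between paired measurements plus or minus 1.96 standard deviations of the differences. The sketch below assumes that convention; the function name and the paired threshold values are hypothetical, and the protocol itself does not specify implementation details.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Uses the conventional 95% limits: bias +/- 1.96 * SD of the differences.
    """
    differences = [a - b for a, b in zip(first, second)]
    bias = mean(differences)
    spread = stdev(differences)  # sample standard deviation
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical sensory thresholds measured twice on four subjects.
session_1 = [10.0, 12.0, 14.0, 16.0]
session_2 = [9.0, 13.0, 13.0, 17.0]
bias, lower, upper = bland_altman_limits(session_1, session_2)
```

    A bias near zero with narrow limits suggests that the two sets of measurements can be used interchangeably; how narrow is narrow enough is a clinical judgment, not a statistical one.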

  16. The occurrence and inter-rater reliability of myofascial trigger points in the quadratus lumborum and gluteus medius: a prospective study in non-specific low back pain patients and controls in general practice.

    PubMed

    Njoo, K H; Van der Does, E

    1994-09-01

    The presence of a trigger point is essential to the myofascial pain syndrome. This study centres on identifying clearer criteria for the presence of trigger points in the quadratus lumborum and gluteus medius muscle by investigating the occurrence and inter-rater reliability of trigger point symptoms. Using the symptoms and signs as described by Simons' 1990 definition and two other former sets of criteria, 61 non-specific low back pain patients and 63 controls were examined in general practice by 5 observers, working in pairs. From the two major criteria of Simons' 1990 definition only 'localized tenderness' has good discriminative ability and inter-rater reliability (kappa > 0.5). This study does not find proof for the clinical usefulness of 'referred pain', which has neither of these two abilities. The criteria 'jump sign' and 'recognition', on the condition that localized tenderness is present, also have good discriminative ability and inter-rater reliability. Trigger points defined by the criteria found eligible in this study allow significant distinction between non-specific low back pain patients and controls. This is not the case with trigger points defined by Simons' 1990 criteria. Concerning reliability there is also a significant difference between the two different criteria sets. This study suggests that the clinical usefulness of trigger points is increased when localized tenderness and the presence of either jump sign or patient's recognition of his pain complaint are used as criteria for the presence of trigger points in the M. quadratus lumborum and the M. gluteus medius.

  17. Inter-rater reliability between musculoskeletal radiologists and orthopedic surgeons on computed tomography imaging features of spinal metastases.

    PubMed

    Khan, L; Mitera, G; Probyn, L; Ford, M; Christakis, M; Finkelstein, J; Donovan, A; Zhang, L; Zeng, L; Rubenstein, J; Yee, A; Holden, L; Chow, E

    2011-12-01

    The primary objective of this pilot study was to examine the inter-rater reliability in scoring the computed tomography (CT) imaging features of spinal metastases in patients referred for radiotherapy (RT) for bone pain. In a retrospective review, 3 musculoskeletal radiologists and 2 orthopedic spinal surgeons independently evaluated CT imaging features for 41 patients with spinal metastases treated with RT in an outpatient radiation clinic from January 2007 to October 2008. The evaluation used spinal assessment criteria that had been developed in-house, with reference to osseous and soft tissue tumour extent, presence of a pathologic fracture, severity of vertebral height loss, and presence of kyphosis. The Cohen kappa coefficient between the two specialties was calculated. Mean patient age was 69.2 years (30 men, 11 women). The mean total daily oral morphine equivalent was 73.4 mg. Treatment dose-fractionation schedules included 8 Gy/1 (n = 28), 20 Gy/5 (n = 12), and 20 Gy/8 (n = 1). Areas of moderate agreement in identifying the CT imaging appearance of spinal metastasis included extent of vertebral body involvement (κ = 0.48) and soft-tissue component (κ = 0.59). Areas of fair agreement included extent of pedicle involvement (κ = 0.28), extent of lamina involvement (κ = 0.35), and presence of pathologic fracture (κ = 0.20). Areas of poor agreement included nerve-root compression (κ = 0.14) and vertebral body height loss (κ = 0.19). The range of agreement between musculoskeletal radiologists and orthopedic surgeons for most spinal assessment criteria is moderate to poor. A consensus for managing challenging vertebral injuries secondary to spinal metastases needs to be established so as to best triage patients to the most appropriate therapeutic modality.

  18. Hemianopic colour blindness.

    PubMed Central

    Albert, M L; Reches, A; Silverberg, R

    1975-01-01

    A man developed cortical blindness after cerebral infarction in the distribution of both posterior cerebral arteries. When he recovered from this condition, he was found to be colour blind in the left visual field, but not in the right. This unusual situation resulted in apparently contradictory performances on hemifield and free-field tasks of colour discrimination, naming, and recognition. The contradictions may be explained by interhemispheric competition between a hemisphere which could discriminate colours and a hemisphere which was colour blind. PMID:1080190

  19. Reference Books in Special Media. Reference Circular No. 82-4.

    ERIC Educational Resources Information Center

    Library of Congress, Washington, DC. National Library Service for the Blind and Physically Handicapped.

    Based on information contained in producers' catalogs and on responses to a survey conducted by the Reference Section of the Library of Congress National Library Service (NLS) for the Blind and Physically Handicapped, this publication lists reference materials produced in braille or in large type, and sound recordings of reference works available…

  20. Can Physicians Identify Inappropriate Nuclear Stress Tests? An Examination of Inter-rater Reliability for the 2009 Appropriate Use Criteria for Radionuclide Imaging

    PubMed Central

    Ye, Siqin; Rabbani, LeRoy E.; Kelly, Christopher R.; Kelly, Maureen R.; Lewis, Matthew; Paz, Yehuda; Peck, Clara L.; Rao, Shaline; Bokhari, Sabahat; Weiner, Shepard D.; Einstein, Andrew J.

    2014-01-01

    Background We sought to determine inter-rater reliability of the 2009 Appropriate Use Criteria (AUC) for radionuclide imaging (RNI) and whether physicians at various levels of training can effectively identify nuclear stress tests with inappropriate indications. Methods and Results Four hundred patients were randomly selected from a consecutive cohort of patients undergoing nuclear stress testing at an academic medical center. Raters with different levels of training (including cardiology attending physicians, cardiology fellows, internal medicine hospitalists, and internal medicine interns) classified individual nuclear stress tests using the 2009 AUC. Consensus classification by two cardiologists was considered the operational gold standard, and sensitivity and specificity of individual raters for identifying inappropriate tests was calculated. Inter-rater reliability of the AUC was assessed using Cohen’s kappa statistics for pairs of different raters. The mean age of patients was 61.5 years; 214 (54%) were female. The cardiologists rated 256 (64%) of 400 NSTs as appropriate, 68 (18%) as uncertain, 55 (14%) as inappropriate; 21 (5%) tests were unable to be classified. Inter-rater reliability for non-cardiologist raters was modest (unweighted Cohen’s kappa, 0.51, 95% confidence interval, 0.45 to 0.55). Sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, while specificity ranged from 85% to 97%. Conclusions Inter-rater reliability for the 2009 AUC for RNI is modest, and there is considerable variation in the ability of raters at different levels of training to identify inappropriate tests. PMID:25563660
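
    Sensitivity and specificity of a single rater against a consensus gold standard, as reported in this abstract, can be sketched as follows. The function name and the example flags are hypothetical; the sketch treats "inappropriate" as the positive class and assumes the gold standard contains at least one positive and one negative case.

```python
def sensitivity_specificity(rater_flags, gold_flags):
    """Compare one rater's flags with gold-standard flags.

    Each flag is True when a test is classified as inappropriate.
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(rater_flags, gold_flags))
    true_pos = sum(r and g for r, g in pairs)
    false_neg = sum((not r) and g for r, g in pairs)
    true_neg = sum((not r) and (not g) for r, g in pairs)
    false_pos = sum(r and (not g) for r, g in pairs)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical classifications of six stress tests.
rater = [True, True, False, False, True, False]
gold = [True, False, False, False, True, True]
sens, spec = sensitivity_specificity(rater, gold)
```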

  1. Can physicians identify inappropriate nuclear stress tests? An examination of inter-rater reliability for the 2009 appropriate use criteria for radionuclide imaging.

    PubMed

    Ye, Siqin; Rabbani, LeRoy E; Kelly, Christopher R; Kelly, Maureen R; Lewis, Matthew; Paz, Yehuda; Peck, Clara L; Rao, Shaline; Bokhari, Sabahat; Weiner, Shepard D; Einstein, Andrew J

    2015-01-01

    We sought to determine inter-rater reliability of the 2009 Appropriate Use Criteria for radionuclide imaging and whether physicians at various levels of training can effectively identify nuclear stress tests with inappropriate indications. Four hundred patients were randomly selected from a consecutive cohort of patients undergoing nuclear stress testing at an academic medical center. Raters with different levels of training (including cardiology attending physicians, cardiology fellows, internal medicine hospitalists, and internal medicine interns) classified individual nuclear stress tests using the 2009 Appropriate Use Criteria. Consensus classification by 2 cardiologists was considered the operational gold standard, and sensitivity and specificity of individual raters for identifying inappropriate tests were calculated. Inter-rater reliability of the Appropriate Use Criteria was assessed using Cohen κ statistics for pairs of different raters. The mean age of patients was 61.5 years; 214 (54%) were female. The cardiologists rated 256 (64%) of 400 nuclear stress tests as appropriate, 68 (18%) as uncertain, 55 (14%) as inappropriate; 21 (5%) tests were unable to be classified. Inter-rater reliability for noncardiologist raters was modest (unweighted Cohen κ, 0.51, 95% confidence interval, 0.45-0.55). Sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, while specificity ranged from 85% to 97%. Inter-rater reliability for the 2009 Appropriate Use Criteria for radionuclide imaging is modest, and there is considerable variation in the ability of raters at different levels of training to identify inappropriate tests. © 2015 American Heart Association, Inc.

  2. Leading Causes of Blindness

    MedlinePlus

    ... Cataract. Photo courtesy of National Eye Institute, NIH Cataracts Cataracts are a clouding of the lenses in your ... older people. More than 22 million Americans have cataracts. They are the leading cause of blindness in ...

  3. Facts About Color Blindness

    MedlinePlus

    ... visual function, preservation of sight, and the special health problems and requirements of the blind.” ...

  4. What is Color Blindness?

    MedlinePlus

    ... three color cone cells to determine our color perception. Color blindness can occur when one or more ... Anyone who experiences a significant change in color perception should see an ophthalmologist (Eye M.D.). Next ...

  5. Color Blindness Simulations

    MedlinePlus

    ... gov Site Map News Organization Search NWS All NOAA Home Why build an accessible web? How many ... Updates, additions Contact Us For assistance contact your NOAA Line Office Section 508 Coordinator Color blindness Simulations ...

  6. Adjustment to Blindness

    ERIC Educational Resources Information Center

    Delafield, George L.

    1976-01-01

    The author examines various factors that can or should be used to determine adjustment to a disability such as blindness and discusses the need for developing ways to accurately measure the process. (Author)

  7. Commitment to Change and Challenges to Implementing Changes After Workplace-Based Assessment Rater Training.

    PubMed

    Kogan, Jennifer R; Conforti, Lisa N; Yamazaki, Kenji; Iobst, William; Holmboe, Eric S

    2017-03-01

    Faculty development for clinical faculty who assess trainees is necessary to improve assessment quality and important for competency-based education. Little is known about what faculty plan to do differently after training. This study explored the changes faculty intended to make after workplace-based assessment rater training, their ability to implement change, predictors of change, and barriers encountered. In 2012, 45 outpatient internal medicine faculty preceptors (who supervised residents) from 26 institutions participated in rater training. They completed a commitment to change form listing up to five commitments and ranked (on a 1-5 scale) their motivation for and anticipated difficulty implementing each change. Three months later, participants were interviewed about their ability to implement change and barriers encountered. The authors used logistic regression to examine predictors of change. Of 191 total commitments, the most common commitments focused on what faculty would change about their own teaching (57%) and increasing direct observation (31%). Of the 183 commitments for which follow-up data were available, 39% were fully implemented, 40% were partially implemented, and 20% were not implemented. Lack of time/competing priorities was the most commonly cited barrier. Higher initial motivation (odds ratio [OR] 2.02; 95% confidence interval [CI] 1.14, 3.57) predicted change. As anticipated difficulty increased, implementation became less likely (OR 0.67; 95% CI 0.49, 0.93). While higher baseline motivation predicted change, multiple system-level barriers undermined ability to implement change. Rater-training faculty development programs should address how faculty motivation and organizational barriers interact and influence ability to change.

  8. The intra-rater reliability of a revised 3-point grading system for accessory joint mobilizations.

    PubMed

    Ward, Jennifer; Hebron, Clair; Petty, Nicola J

    2017-09-01

    Joint mobilizations are often quantified using a 4-point grading system based on the physiotherapist's detection of resistance. It is suggested that the initial resistance to joint mobilizations is imperceptible to physiotherapists, but that at some point through range it becomes perceptible, a point termed R1. Grades of mobilization traditionally hinge around this concept and are performed either before or after R1. Physiotherapists, however, show poor reliability in applying grades of mobilization. The definition of R1 is ambiguous and dependent on the skills of the individual physiotherapist. The aim of this study is to test a revised grading system where R1 is considered at the beginning of range, and the entire range, as perceived by the physiotherapist's maximum force application, is divided into three, creating 3 grades of mobilization. Thirty-two post-registration physiotherapists and nineteen pre-registration students assessed end of range (point R2) and then applied 3 grades of AP mobilizations, over the talus, in an asymptomatic model's ankle. Vertical forces were recorded through a force platform. Intra-class Correlation Coefficients, Standard Error of Measurement, and Minimal Detectable Change were calculated to explore intra-rater reliability on intra-day and inter-day testing. T-tests determined group differences. Intra-rater reliability was excellent for intra-day testing (ICC 0.96-0.97), and inter-day testing (ICC 0.85-0.93). No statistical difference was found between pre- and post-registration groups. Standardizing the definition of grades of mobilization, by moving R1 to the beginning of range and separating grades into thirds, results in excellent intra-rater reliability on intra-day and inter-day tests. 3b.
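
    The Standard Error of Measurement and Minimal Detectable Change named in this abstract are often derived from the ICC with the formulas SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM. The sketch below assumes those common formulas; the abstract does not state which variants the authors used, and the function names and input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%)."""
    return z * math.sqrt(2.0) * sem


# Hypothetical values: SD of 10 N in applied force and an ICC of 0.96.
sem = standard_error_of_measurement(sd=10.0, icc=0.96)
mdc95 = minimal_detectable_change(sem)
```

    A change between two measurements smaller than the MDC cannot be distinguished from measurement error at the chosen confidence level.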

  9. A comparison of human raters and an intra-oral spectrophotometer.

    PubMed

    Browning, William D; Chan, Daniel C; Blalock, John S; Brackett, Martha G

    2009-01-01

    Consistently choosing an accurate shade match is far more difficult than it appears. Recently, several electronic shade-matching devices have been marketed. One device is an intraoral spectrophotometer, Easyshade. The current study compared the accuracy and consistency of the Easyshade (ES) device to three clinicians experienced in tooth whitening trials and trained in the use of the Vitapan 3D Master shade. The maxillary anteriors of 16 participants were matched on three separate occasions one month apart. At each appointment, the three clinicians (R1, R2 & R3) and ES independently chose a single 3D Master tab. A trained research assistant used the Easyshade device to record CIE L*, C* and H* and a shade tab. In addition, color differences between shade tabs were calculated using the Delta E 2000 (delta e 00) formula. The CIE L*C*H* data were also used to establish standards for the five lightness groups of the 3D Master. Intra-rater agreement was evaluated using an intraclass correlation statistic, and inter-rater agreement was evaluated using a weighted Kappa statistic. The percentages of exact matches were: ES = 41%; R1 = 27%; R2 = 22% and R3 = 17%. Matches within a half-shade were also calculated. This represents a mismatch that is perceptible but acceptable. The percentages of matches within a half-tab were: ES = 91%; R1 = 69%; R2 = 85% and R3 = 79%. In terms of lightness, the intra-rater agreement was considered to be very good for ES and R2 and good for R1 and R3. For chroma, agreement for ES was considered good, and for the three clinicians, it was considered moderate. The mean color difference for the L*, C*, H* data recorded at each evaluation was 1.5, or only slightly greater than the color difference between the same tab on different guides (1.2). The delta e 00 data were the most accurate data collected, and they were used to establish a standard to which the tab choices of the four raters were compared. A weighted Kappa statistic was performed

  10. Preliminary inter-rater reliability of the wheelchair components questionnaire for condition.

    PubMed

    Rispin, Karen; DiFrancesco, John; Raymond, Lawrence A; Riseling, Kristopher; Wee, Joy

    2017-07-07

    The Wheelchair Components Questionnaire for Condition (WCQ-C) enables the collection of data on wheelchair maintenance condition and durability in resource-limited environments. It can be used in large studies to indicate typical patterns of wear at a location, or for a type of wheelchair. It can also be used in clinical settings as an evidence-based indication that a wheelchair may need repair or replacement. This type of data can enable effective use of limited funds by wheelchair providers, manufacturers and users. The goal of this study was to investigate the inter-rater reliability of the WCQ-C. Two therapists from North America who have worked extensively in low-resource areas used the WCQ-C to independently evaluate 46 wheelchairs at a primary school for children with disabilities in Kenya. Mean scores of ratings for each wheelchair by the two raters were used to calculate a two-way random intraclass correlation coefficient. A value of 0.82 with a 95% confidence interval of 0.67-0.89 indicated good preliminary reliability. Preliminary results indicate that the WCQ-C is a reliable method of assessment. Additional studies are needed with larger and more diverse groups of raters. Because WCQ-C findings are specific to wheelchair wear and maintenance at each location, studies at other locations are also needed. Implications for rehabilitation: The importance of inter-rater reliability testing in confirming the reliability of an assessment tool such as the WCQ-C. The use of the WCQ-C to monitor wheelchair condition in low-resource settings and other field settings. If used at regular intervals, it can produce data that can be used to describe typical changes over time at each individual setting. This could enable proactive planning at that setting to avoid typical breakdowns and the injuries or clinical complications that could result. The use of the WCQ-C to monitor the condition of groups of wheelchairs of the same type. 
It can describe typical patterns of wear and

  11. Anomalous information reception by research mediums under blinded conditions II: replication and extension.

    PubMed

    Beischel, Julie; Boccuzzi, Mark; Biuso, Michael; Rock, Adam J

    2015-01-01

    The examination of the accuracy and specificity of information reported by mediums addresses the existence of non-local information transfer. This study was designed to replicate and extend a previous methodology achieving positive findings regarding the anomalous reception of information about deceased individuals by research mediums under experimental conditions that eliminate conventional explanations, including cold reading, rater bias, experimenter cueing, and fraud. Mediumship readings were performed over the phone under blinded conditions in which mediums, raters, and experimenters were all blinded. A total of 20 Windbridge Certified Research Mediums (WCRMs) participated in 86 readings. Accuracy and specificity were assessed through item scores, global reading scores, and forced-choice selections provided by blinded sitters. (1) Comparisons between blinded target and decoy readings regarding the estimated percentage accuracy of reading items (n = 27, P = .05, d = 0.49), (2) comparisons regarding the calculated percentage accuracy of reading items (n = 31, P = .002, d = 0.75), (3) comparisons regarding hits vs. misses (n = 31, P < .0001 and P = .002 for different reading sections), (4) comparisons regarding global scores (n = 58, P = .001, d = 0.57), and (5) forced-choice reading selections between blinded target and decoy readings (n = 58, P = .01) successfully replicate and extend previous findings demonstrating the phenomenon of anomalous information reception (AIR), the reporting of accurate and specific information without prior knowledge, in the absence of sensory feedback, and without using deceptive means. Because the experimental conditions of this study eliminated normal, sensory sources for the information mediums report, a non-local source (however controversial) remains the most likely explanation for the accuracy and specificity of their statements. Copyright © 2015. Published by Elsevier Inc.

  12. Inter-rater reliability of h-index scores calculated by Web of Science and Scopus for clinical epidemiology scientists.

    PubMed

    Walker, Benjamin; Alavifard, Sepand; Roberts, Surain; Lanes, Andrea; Ramsay, Tim; Boet, Sylvain

    2016-06-01

    We investigated the inter-rater reliability of Web of Science (WoS) and Scopus when calculating the h-index of 25 senior scientists in the Clinical Epidemiology Program of the Ottawa Hospital Research Institute. Bibliometric information and the h-indices for the subjects were computed by four raters using the automatic calculators in WoS and Scopus. Correlation and agreement between ratings was assessed using Spearman's correlation coefficient and a Bland-Altman plot, respectively. Data could not be gathered from Google Scholar due to feasibility constraints. The Spearman's rank correlation between the h-index of scientists calculated with WoS was 0.81 (95% CI 0.72-0.92) and with Scopus was 0.95 (95% CI 0.92-0.99). The Bland-Altman plot showed no significant rater bias in WoS and Scopus; however, the agreement between ratings is higher in Scopus compared to WoS. Our results showed a stronger relationship and increased agreement between raters when calculating the h-index of a scientist using Scopus compared to WoS. The higher inter-rater reliability and simple user interface used in Scopus may render it the more effective database when calculating the h-index of senior scientists in epidemiology. © 2016 Health Libraries Group.
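    The quantities discussed in this abstract can be illustrated with a short sketch. This is not the authors' analysis: `h_index` follows the standard definition of the h-index, `bland_altman` returns the mean difference (bias) and the conventional 95% limits of agreement (bias ± 1.96 SD of the paired differences), and the function names and rater values below are hypothetical.

```python
from statistics import mean, stdev


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def bland_altman(ratings_a, ratings_b):
    """Return (bias, lower limit, upper limit) for paired ratings."""
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


# Hypothetical citation counts for one scientist.
print(h_index([10, 8, 5, 4, 3]))  # 4 papers have at least 4 citations each

# Hypothetical h-indices reported by two raters for the same five scientists.
rater_1 = [12, 20, 33, 8, 15]
rater_2 = [11, 22, 30, 8, 17]
bias, lower, upper = bland_altman(rater_1, rater_2)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

    A bias near zero with narrow limits of agreement suggests that the two raters are interchangeable for practical purposes; wide limits indicate disagreement even when the correlation is high.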

  13. The intra- and inter-rater reliabilities of the Short Form Berg Balance Scale in institutionalized elderly people.

    PubMed

    Kim, Seong-Gil; Kim, Myoung-Kwon

    2015-09-01

    [Purpose] The purpose of this study was to examine the intra- and inter-rater reliabilities of the Short Form Berg Balance Scale in institutionalized elderly people. [Subjects and Methods] A total of 30 elderly people in a nursing facility in Y city, South Korea, participated in this study. Two examiners administered the Short Form Berg Balance Scale to one subject to investigate inter-rater reliability. After a week, the same examiners administered the Short Form Berg Balance Scale once more to investigate intra-rater reliability. [Results] The intra-rater reliability was 0.83. The inter-rater reliability was 0.79. Both reliabilities were high (more than 0.7). [Conclusion] The Short Form Berg Balance Scale is a version of the Berg Balance Scale shortened by reducing the number of items, but its reliabilities were not lower than those of the Berg Balance Scale. The Short Form Berg Balance Scale can be useful clinically due to its short measurement time.

  14. The feasibility of external blind DNA proficiency testing. II. Experience with actual blind tests.

    PubMed

    Peterson, Joseph L; Lin, George; Ho, Monica; Chen, Yingyu; Gaensslen, R E

    2003-01-01

    The background and goals of a national study to determine the feasibility of blind proficiency testing in U.S. forensic DNA laboratories are discussed. Part of the project involved designing and executing a series of fifteen blind proficiency tests. Execution included biological specimen donor recruitment and case evidence manufacturing. Simulated cases were submitted to DNA laboratories by law enforcement agencies and in some cases by other forensic-science laboratories. Replicate-manufactured evidence was submitted to reference laboratories to simulate the workings of a larger-scale program. Ten tests were straightforward, and essentially tested analytical ability. Five tests involved selecting on the basis of case facts appropriate bloodstains for typing from a bloodstain pattern. We describe in detail our experience in designing and conducting these blind proficiency test trials, and relate those experiences to the overall issue of blind proficiency testing as a quality-assurance tool in forensic DNA laboratories. In this feasibility test series, one blind test was detected by a laboratory, a second one was shown to the lab by law enforcement, and a third was never completed because of lapses in communication. Turnaround times were relatively fast in the independent/commercial labs and relatively slow in the larger public laboratories. Two cross-state case-to-case CODIS "hits" were "planted" among the first series of ten blind tests. One pair was detected. One member of the second pair went to a lab that was not CODIS-ready.

  15. Reference Services.

    ERIC Educational Resources Information Center

    Bunge, Charles A.

    1999-01-01

    Discusses library reference services. Topics include the historical development of reference services; instruction in library use, particularly in college and university libraries; guidance; information and referral services and how they differ from traditional question-answering service; and future concerns, including user fees and the planning…

  16. Reference Assessment

    ERIC Educational Resources Information Center

    Bivens-Tatum, Wayne

    2006-01-01

    This article presents interesting articles that explore several different areas of reference assessment, including practical case studies and theoretical articles that address a range of issues such as librarian behavior, patron satisfaction, virtual reference, or evaluation design. They include: (1) "Evaluating the Quality of a Chat Service"…

  19. Reference Revolutions.

    ERIC Educational Resources Information Center

    Mason, Marilyn Gell

    1998-01-01

    Describes developments in Online Computer Library Center (OCLC) electronic reference services. Presents a background on networked cataloging and the initial implementation of reference services by OCLC. Discusses the introduction of OCLC FirstSearch service, which today offers access to over 65 databases, future developments in integrated…

  20. Inter-rater reliability of a musculoskeletal screen as administered to female professional contemporary dancers.

    PubMed

    Karim, Annette; Millet, Victoria; Massie, Kate; Olson, Sharon; Morganthaler, Andrea

    2011-01-01

    The purpose of this study is to determine the inter-rater reliability of commonly used musculoskeletal screening components in a population of contemporary professional dancers. Study participants were 30 women from six contemporary dance companies between the ages of 18 and 32, with a mean age of 24, and Body Mass Index of 22.4. 101 items were assessed in the categories of Static Posture, the Beighton 9-Point Hypermobility Test, Flexibility, Strength, and Dynamic Posture, based upon the Pilot 2006 Dance USA Annual Post-Hire Health Screen for Professional Dancers. Testing was non-ordered, using 2 of the 4 available testers, with variable assignment of the lead tester. High percent agreement was found for the subcategories of hallux valgus, pelvic tilt, and forefoot alignment, flexor hallucis, iliopsoas, hip internal rotation flexed, external rotation extended, and soleus extensibility, composite Beighton, and for most measures within the dynamic posture category. Low to moderate percent agreement was found in the strength tests. Although this study demonstrated moderate to high percent agreement between raters, further test refinement is needed to improve the reliability of the measurement components.

  1. CRM Assessment: Determining the Generalization of Rater Calibration Training. Summary of Research Report: Gold Standards Training

    NASA Technical Reports Server (NTRS)

    Baker, David P.

    2002-01-01

    The extent to which pilot instructors are trained to assess crew resource management (CRM) skills accurately during Line-Oriented Flight Training (LOFT) and Line Operational Evaluation (LOE) scenarios is critical. Pilot instructors must make accurate performance ratings to ensure that proper feedback is provided to flight crews and appropriate decisions are made regarding certification to fly the line. Furthermore, the Federal Aviation Administration's (FAA) Advanced Qualification Program (AQP) requires that instructors be trained explicitly to evaluate both technical and CRM performance (i.e., rater training) and also requires that proficiency and standardization of instructors be verified periodically. To address the critical need for effective pilot instructor training, the American Institutes for Research (AIR) reviewed the relevant research on rater training and, based on "best practices" from this research, developed a new strategy for training pilot instructors to assess crew performance. In addition, we explored new statistical techniques for assessing the effectiveness of pilot instructor training. The results of our research are briefly summarized below. This summary is followed by abstracts of articles and book chapters published under this grant.

  2. A multi-rater framework for studying personality: The trait-reputation-identity model.

    PubMed

    McAbee, Samuel T; Connelly, Brian S

    2016-10-01

    Personality and social psychology have historically been divided between personality researchers who study the impact of traits and social-cognitive researchers who study errors in trait judgments. However, a broader view of personality incorporates not only individual differences in underlying traits but also individual differences in the distinct ways a person's personality is construed by oneself and by others. Such unique insights are likely to appear in the idiosyncratic personality judgments that raters make and are likely to have etiologies and causal force independent of trait perceptions shared across raters. Drawing on the logic of the Johari window (Luft & Ingham, 1955), the Self-Other Knowledge Asymmetry Model (Vazire, 2010), and Socioanalytic Theory (Hogan, 1996; Hogan & Blickle, 2013), we present a new model that separates personality variance into consensus about underlying traits (Trait), unique self-perceptions (Identity), and impressions conveyed to others that are distinct from self-perceptions (Reputation). We provide three demonstrations of how this Trait-Reputation-Identity (TRI) Model can be used to understand (a) consensus and discrepancies across rating sources, (b) personality's links with self-evaluation and self-presentation, and (c) gender differences in traits. We conclude by discussing how researchers can use the TRI Model to achieve a more sophisticated view of personality's impact on life outcomes, developmental trajectories, genetic origins, person-situation interactions, and stereotyped judgments. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  3. [Inter-rater concordance of the "Nursing Activities Score" in intensive care].

    PubMed

    Valls-Matarín, Josefa; Salamero-Amorós, Maria; Roldán-Gil, Carmen; Quintana-Riera, Salvador

    2015-01-01

    To evaluate inter-rater concordance in the valuation of the "Nursing Activities Score". Cross-sectional descriptive study conducted from December 2012 until June 2013 in a general intensive care unit with twelve beds. Three evaluator nurses, simultaneously and independently, scored the nursing workload from the patients' daily charts using the Nursing Activities Score scale for all admitted patients over 18 years old. Three hundred and thirty-nine records were collected. The intra-class correlation coefficient (ICC) between evaluators was 0.92 (0.89-0.94). A perfect concordance was obtained in 39.1% of the items, with 52.2% having a high, and 8.7% having lower concordance, corresponding to two of the items with multiple scoring options. Significant differences between two of the evaluators (P=.049) were found. Although the inter-rater concordance was high, more accurate records are needed to reduce the variability of the items with multiple options and to allow more accuracy in the interpretation and measurement of the data regarding nursing workload. Copyright © 2015 Elsevier España, S.L.U. All rights reserved.

  4. The inter-rater reliability of the Risk Instrument for Screening in the Community.

    PubMed

    Weathers, Elizabeth; O'Caoimh, Rónán; O'Sullivan, Ronan; Paúl, Constança; Orfilia, Frances; Clarnette, Roger; Fitzgerald, Carol; Svendrovski, Anton; Cornally, Nicola; Leahy-Warren, Patricia; Molloy, D William

    2016-09-01

    Predicting risk of adverse healthcare outcomes is important to enable targeted delivery of interventions. The Risk Instrument for Screening in the Community (RISC), designed for use by public health nurses (PHNs), measures the 1-year risk of hospitalisation, institutionalisation and death in community-dwelling older adults according to a five-point global risk score: from low (score 1,2) to medium (3) to high (4,5). We examined the inter-rater reliability (IRR) of the RISC between student PHNs (n=32) and expert raters using six cases (two low, medium and high-risk), scored before and after RISC training. Correlations increased for each adverse outcome, statistically significantly for institutionalisation (r=0.72 to 0.80, p=0.04) and hospitalisation (r=0.51 to 0.71, p<0.01) but not death. Training improved accuracy for low-risk but not all high-risk cases. Overall, the RISC showed good IRR, which increased after RISC training. That reliability fell for some high-risk cases suggests that the training programme requires adjustment to improve IRR further.

  5. Field Inter-Rater Reliability of the Psychopathy Checklist-Revised.

    PubMed

    Ismail, Ghena; Looman, Jan

    2016-06-01

    Strong inter-rater reliability has been established for the Hare Psychopathy Checklist-Revised (PCL-R), specifically by examiners in research contexts. However, there is less support for inter-rater reliability in applied settings. This study examined archival data that included a sample of sex offenders (n = 178) who entered federal custody between 1992 and 1998. The offenders were assessed using the PCL-R on two occasions. The first assessment occurred at Millhaven Institution, the intake unit for federally incarcerated offenders in the province of Ontario. The second assessment took place upon inmates' transfer to the Regional Treatment Center, which admits federal inmates with intense psychological and psychiatric needs. Intra-class correlation coefficients (ICCs) were calculated for item, total, factor, and facet scores. The ICC absolute agreement for the PCL-R total and factor scores from raters across both settings was slightly better than what has been previously reported by Hare. Results of this study show that the reliability of PCL-R scores in field settings can be comparable to those in research settings. The authors conclude by highlighting the importance of training, consultation, considering different scores for a given item, and following the guidelines of the manual, in addition to considering measures that enhance neutrality and reliability of findings in the criminal justice system.
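    An absolute-agreement ICC of the kind mentioned above can be computed from a two-way ANOVA decomposition. The sketch below assumes the two-way random-effects, single-rater form commonly written ICC(2,1); the abstract does not state which ICC model was used, so this choice, the function name `icc_2_1`, and the ratings are illustrative assumptions only.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject and one column per rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings (not from the study): 6 subjects scored by 4 raters.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))
```

    Because the rater (column) variance appears in the denominator, systematic differences between raters lower this coefficient, which is what distinguishes absolute agreement from consistency-only forms of the ICC.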

  6. Using Google Street View to Audit the Built Environment: Inter-rater Reliability Results

    PubMed Central

    Wilson, Jeffrey S.; Baker, Elizabeth A.; Miller, Douglas K.; Schootman, Mario

    2012-01-01

    Background: Observational field audits are recommended for public health research to collect data on built environment characteristics. A reliable, standardized alternative to field audits that uses publicly available information could provide the ability to efficiently compare results across different study sites and time. Purpose: This study aimed to assess inter-rater reliability of built environment audits conducted using Google Street View imagery. Methods: In 2011, street segments from St. Louis and Indianapolis were geographically stratified to ensure representation of neighborhoods with different land use and socioeconomic characteristics in both cities. Inter-rater reliability was assessed using observed agreement and the prevalence-adjusted bias-adjusted kappa statistic (PABAK). Results: The mean PABAK for all items was 0.84. Ninety-five percent of the items had substantial (PABAK ≥ 0.60) or nearly perfect (PABAK ≥ 0.80) agreement. Conclusions: Using Google Street View imagery to audit the built environment is a reliable method for assessing characteristics of the built environment. PMID:23054943
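    Observed agreement and PABAK are simple to compute by hand. The sketch below uses the general definition PABAK = (k·Po − 1)/(k − 1), where Po is the observed agreement and k the number of response categories; for two categories this reduces to 2·Po − 1. The function names and audit data are hypothetical and are not taken from the study.

```python
def observed_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same answer."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def pabak(rater_a, rater_b, categories=2):
    """Prevalence-adjusted bias-adjusted kappa for `categories` response options."""
    p_o = observed_agreement(rater_a, rater_b)
    return (categories * p_o - 1) / (categories - 1)


# Hypothetical yes/no audit item (1 = feature present) rated on ten street segments.
auditor_1 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
auditor_2 = [1, 1, 0, 1, 0, 1, 1, 1, 1, 1]

print(f"observed agreement = {observed_agreement(auditor_1, auditor_2):.2f}")
print(f"PABAK = {pabak(auditor_1, auditor_2):.2f}")
```

    Unlike Cohen's kappa, PABAK depends only on the observed agreement, so it is not pulled down when a feature is very common or very rare across the audited segments.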

  7. Epidemiology of blindness in Nepal

    PubMed Central

    Brilliant, L. B.; Pokhrel, R. P.; Grasset, N. C.; Lepkowski, J. M.; Kolstad, A.; Hawks, W.; Pararajasegaram, R.; Brilliant, G. E.; Gilbert, S.; Shrestha, S. R.; Kuo, J.

    1985-01-01

    This report presents the major findings of the Nepal Blindness Survey, the first nationwide epidemiological survey of blindness, which was conducted in 1979-80. The survey was designed to gather data that could be used to estimate the prevalence and causes of blindness in the country. Ancillary studies were conducted to obtain information on socioeconomic correlates and other risk factors of blinding conditions and patterns of health care utilization. The nationwide blindness prevalence rate is 0.84%. Cataract is the leading cause of blindness, accounting for over 80% of all avoidable blindness. Trachoma is the most prevalent blinding condition, affecting 6.5% of the population. Very few cases of childhood blindness were detected. The implications of the survey findings for programme planning, health manpower development, and health education are discussed. PMID:3874717

  8. Blinded by headlights.

    PubMed

    Stevanovski, Biljana; Oriet, Chris; Jolicoeur, Pierre

    2002-06-01

    Target identification is impaired when targets are presented during the planning or execution of a compatible response (e.g., right-pointing arrow during a right keypress) relative to an incompatible response (Müsseler & Hommel, 1997 a, b). Examinations of this blindness to response-compatible stimuli have typically used arrowheads as targets ("<" and ">"). The importance of the target symbol was examined by manipulating subjects' interpretation of that symbol (i.e., ">" interpreted as a right-pointing arrow or as a headlight shining to the left). Targets were presented at varying times during the planning or execution of a response in order to examine the time-course of the effect. Results showed that the interpretation, and not the physical identity, of the target was important for the blindness effect. Although the blindness effect was largest during the planning and execution of a response, it was not always confined to that temporal interval.

  9. Meningeal carcinomatosis and blindness

    PubMed Central

    Altrocchi, Paul H.; Eckman, Paul B.

    1973-01-01

    The clinical syndrome of meningeal carcinomatosis includes headache, dementia, radiculopathy, and cranial nerve palsies. Blindness may be the first, or most prominent, symptom. When blindness occurs in adult life, meningeal carcinomatosis should be included in the differential diagnosis, even in the absence of other symptoms and in the absence of known malignancy. Although all pathophysiological mechanisms of the blindness in meningeal carcinomatosis have not yet been elucidated, optic nerve involvement by meningeal tumour-cuffing, by chronic papilloedema, and by direct tumour infiltration represent the likeliest causes. In the neuropathological analysis of such cases, the importance of analysing the intra-orbital portion of the optic nerves, in addition to the portions of the optic nerve and chiasm usually examined at routine necropsy, is emphasized. A case is described to illustrate this point, with the only pathological abnormality in the optic nerves being found within 6 mm of the retina. PMID:4708455

  10. Optimal Blind Quantum Computation

    NASA Astrophysics Data System (ADS)

    Mantri, Atul; Pérez-Delgado, Carlos A.; Fitzsimons, Joseph F.

    2013-12-01

    Blind quantum computation allows a client with limited quantum capabilities to interact with a remote quantum computer to perform an arbitrary quantum computation, while keeping the description of that computation hidden from the remote quantum computer. While a number of protocols have been proposed in recent years, little is currently understood about the resources necessary to accomplish the task. Here, we present general techniques for upper and lower bounding the quantum communication necessary to perform blind quantum computation, and use these techniques to establish concrete bounds for common choices of the client’s quantum capabilities. Our results show that the universal blind quantum computation protocol of Broadbent, Fitzsimons, and Kashefi comes within a factor of 8/3 of optimal when the client is restricted to preparing single qubits. However, we describe a generalization of this protocol which requires exponentially less quantum communication when the client has a more sophisticated device.

  11. [Inter-rater reliability of healthcare professional skills' portfolio assessments: The Andalusian Agency for Healthcare Quality model].

    PubMed

    Almuedo-Paz, Antonio; Herrera-Usagre, Manuel; Buiza-Camacho, Begoña; Julián-Carrión, José; Carrascosa-Salmoral, María del Pilar; Martín-García, Sheila María; Salguero-Cabalgante, Rocío

    2014-07-17

    This study aims to determine the reliability of the assessment criteria used for portfolios at the Andalusian Agency for Healthcare Quality (ACSA). It covered all competence certification processes, regardless of discipline, during 2010-2011. Three types of tests were used: 368 certificates, 17,895 reports and 22,642 clinical practice reports (N = 3,010 candidates). The tests were evaluated in pairs by the ACSA team of raters using two categories: valid and invalid. The percentage agreement in assessments of certificates was 89.9%; for the reports of clinical practice, 85.1%; and for clinical practice reports, 81.7%. The inter-rater agreement coefficients (kappa) ranged from 0.468 to 0.711. The results of this study show that the inter-rater reliability of assessments varies from fair to good. Compared with other similar studies, the results put the reliability of the model in a comfortable position. Criteria were reviewed and progressive automation of evaluations was carried out.
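    The gap between high percentage agreement and more modest kappa values, as reported above, follows from how Cohen's kappa corrects for chance: kappa = (Po − Pe)/(1 − Pe), where Po is the observed agreement and Pe the agreement expected from each rater's marginal frequencies. The sketch below is a minimal illustration with hypothetical valid/invalid judgements; the function name and data are assumptions, not the ACSA data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    n = len(rater_a)
    p_o = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement from the product of the raters' marginal proportions.
    p_e = sum(counts_a[c] * counts_b[c] for c in labels) / n**2
    # Note: undefined (division by zero) when p_e == 1, i.e. both raters
    # always use the same single category.
    return (p_o - p_e) / (1 - p_e)


# Hypothetical paired judgements on ten pieces of evidence.
rater_1 = ["valid"] * 8 + ["invalid"] * 2
rater_2 = ["valid"] * 7 + ["invalid"] + ["invalid", "valid"]

agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
print(f"percentage agreement = {agreement:.0%}")
print(f"kappa = {cohen_kappa(rater_1, rater_2):.3f}")
```

    When one category dominates, as "valid" does here, chance agreement is already high, so a kappa well below the raw percentage agreement is expected.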

  12. Cultural adaptation, content validity and inter-rater reliability of the "STAR Skin Tear Classification System"

    PubMed Central

    Strazzieri-Pulido, Kelly Cristina; Santos, Vera Lúcia Conceição de Gouveia; Carville, Keryln

    2015-01-01

    AIMS: to perform the cultural adaptation of the STAR Skin Tear Classification System into the Portuguese language and to test the content validity and inter-rater reliability of the adapted version. METHODS: methodological study with a quantitative approach. The cultural adaptation was developed in three phases: translation, evaluation by a committee of judges and back-translation. The instrument was tested regarding content validity and inter-rater reliability. RESULTS: the adapted version obtained a regular level of concordance when it was applied by nurses using photographs of friction injuries. Regarding its application in clinical practice, the adapted version obtained a moderate and statistically significant level of concordance. CONCLUSION: the study tested the content validity and inter-rater reliability of the version adapted into the Portuguese language. Its inclusion in clinical practice will enable the correct identification of this type of injury, as well as the implementation of protocols for the prevention and treatment of friction injuries. PMID:25806644

  13. Poor multi-rater reliability in TCM pattern diagnoses and variation in the use of symptoms to obtain a diagnosis.

    PubMed

    Birkeflet, Oddveig; Laake, Petter; Vøllestad, Nina K

    2014-08-01

    Pattern differentiation and diagnosis are fundamental principles of Traditional Chinese Medicine (TCM). Studies have shown low inter-rater reliability in TCM pattern diagnoses. This variability may originate from both the identification and the interpretation of symptoms and signs. To examine the inter-rater reliability in TCM pattern diagnoses made in the style of Maciocia for 25 case histories by eight acupuncturists and to explore the impact of demographic factors on the diagnostic conclusion. Further, the association between the diagnosis and the presence of symptoms was examined for a single TCM diagnosis. Eight acupuncturists independently diagnosed 25 women (15 fertile, 10 infertile) based on written case histories. Descriptive statistics, logistic regression and inter-rater reliability (κ) were used. Poor inter-rater reliability on TCM patterns (κ<0.20) and large variation in the number of TCM pattern diagnoses were found. Sex, duration of practice and education had a highly significant effect (p<0.001) on the use of TCM patterns and working hours had a significant effect (p=0.029). There was considerable intra- and inter-rater variation in the use of symptoms to make a diagnosis. Symptoms occurring frequently as well as infrequently were inconsistently used to diagnose Liver Qi Stagnation. The study was limited by a small sample size. The results showed extensive variation and poor inter-rater reliability in TCM diagnoses. Demographic variables influenced the frequency of diagnoses and symptoms were used inconsistently to set a diagnosis. The variability shown could impede individually tailored treatment. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  14. Ready Reference.

    ERIC Educational Resources Information Center

    Koltay, Emery

    1999-01-01

    Includes the following ready reference information: "Publishers' Toll-Free Telephone Numbers"; "How to Obtain an ISBN (International Standard Book Number)"; "How to Obtain an ISSN (International Standard Serial Number)"; and "How to Obtain an SAN (Standard Address Number)". (AEF)

  15. A method for quantitative measurement of lumbar intervertebral disc structures: an intra- and inter-rater agreement and reliability study

    PubMed Central

    2013-01-01

    Background: There is a shortage of agreement studies relevant for measuring changes over time in lumbar intervertebral disc structures. The objectives of this study were: 1) to develop a method for measurement of intervertebral disc height, anterior and posterior disc material and dural sac diameter using MRI, 2) to evaluate intra- and inter-rater agreement and reliability for the measurements included, and 3) to identify factors compromising agreement. Methods: Measurements were performed on MRIs from 16 people with and 16 without lumbar disc herniation, purposefully chosen to represent all possible disc contours among participants in a general population study cohort. Using the new method, MRIs were measured twice by one rater and once by a second rater. Agreement on the sagittal start- and end-slice was evaluated using weighted Kappa. Length and volume measurements were conducted on available slices between intervertebral foramens, and cross-sectional areas (CSA) were calculated from length measurements and slice thickness. Results were reported as Bland and Altman’s limits of agreement (LOA) and intraclass correlation coefficients (ICC). Results: Weighted Kappa (Kw (95% CI)) for start- and end-slice were: intra-rater: 0.82 (0.60; 0.97) & 0.71 (0.43; 0.93); inter-rater: 0.56 (0.29; 0.78) & 0.60 (0.35; 0.81). For length measurements, LOA ranged from [−1.0; 1.0] mm to [−2.0; 2.3] mm for intra-rater, and from [−1.1; 1.4] mm to [−2.6; 2.0] mm for inter-rater. For volume measurements, LOA ranged from [−293; 199] mm³ to [−582; 382] mm³ for intra-rater, and from [−17; 801] mm³ to [−450; 713] mm³ for inter-rater. For CSAs, LOA ranged between [−21.3; 18.8] mm² and [−31.2; 43.7] mm² for intra-rater, and between [−10.8; 16.4] mm² and [−64.6; 27.1] mm² for inter-rater. In general, LOA as a proportion of mean values gradually decreased with increasing size of the measured structures. Agreement was compromised by difficulties in identifying the vertebral corners, the anterior and

  16. Can the Dyskinesia Impairment Scale be used by inexperienced raters? A reliability study.

    PubMed

    Monbaliu, Elegast; Ortibus, Els; Prinzie, Peter; Dan, Bernard; De Cat, Josse; De Cock, Paul; Feys, Hilde

    2013-05-01

    The Dyskinesia Impairment Scale (DIS) is a new scale for measuring dystonia and choreoathetosis in dyskinetic Cerebral Palsy (CP). Previously, reliability of this scale has only been assessed for raters highly experienced in discriminating between dystonia and choreoathetosis. The aims of this study are to examine the reliability of the DIS used by inexperienced raters, new to discriminating between dystonia and choreoathetosis and to determine the effect of clinical expertise on reliability. Twenty-five patients (17 males; 8 females; age range 5-22 years; mean age = 13 years 6 months; SD = 5 years 4 months) with dyskinetic CP were filmed with the DIS standard video protocol. Two junior physiotherapists (PTs) and three senior PTs, all of whom were new to discriminating between dystonia and choreoathetosis, were trained in scoring the DIS. Afterward, they independently scored all patients from the video recordings using the DIS. Reliability was assessed by (1) Intraclass Correlation Coefficient (ICC), (2) Standard Error of Measurement (SEM) and Minimal Detectable Difference (MDD) and (3) Cronbach's alpha for internal consistency. Interrater reliability for the total DIS, and for the dystonia and choreoathetosis subscales was good for the junior PTs and moderately high to excellent for the senior PTs. SEM and MDD values for the total DIS were 6% and 15% respectively for the junior PTs and 4% and 12% respectively for the senior PTs. Cronbach's alpha ranged between 0.87 and 0.95 for the junior PTs and between 0.76 and 0.93 for the senior PTs. Reliability of the DIS scores for the inexperienced junior and senior PTs was sufficient in comparison with scores from the experienced raters in the previous study, indicating that the DIS can be used by inexperienced PTs new to discriminating between dystonia and choreoathetosis, and also that its reliability is not dependent on clinical expertise. However, based on the measurement errors and questionnaire data, familiarity
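
    The abstract reports the Standard Error of Measurement (SEM) and Minimal Detectable Difference (MDD) without stating the formulas used. A minimal sketch, assuming the conventional definitions SEM = SD × sqrt(1 − ICC) and MDD = 1.96 × sqrt(2) × SEM, is given below; the function names and input values are hypothetical and do not come from the study.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the score SD and an ICC (assumed formula)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdd_95(sem):
    """Minimal detectable difference at the 95% level (assumed formula)."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical values: score SD of 10 percentage points and ICC of 0.91.
sem = sem_from_icc(10.0, 0.91)
mdd = mdd_95(sem)
print(f"SEM={sem:.2f}, MDD={mdd:.2f}")
```

    Under these assumed definitions the MDD is always about 2.77 times the SEM, so a higher ICC lowers both values together.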

  17. [Blindness and visual rehabilitation].

    PubMed

    Matonti, F; Roux, S; Denis, D; Picaud, S; Chavane, F

    2015-02-01

    Blindness and visual impairment are a major public health problem all over the world and in all societies. A large amount of basic science and clinical research aims to rehabilitate patients and help them become more independent. Various methods are explored from cell and molecular therapy to prosthetic interfaces. We review the various treatment alternatives, describing their results and their limitations.

  18. Blindness in the Toybox

    ERIC Educational Resources Information Center

    Swartz, Edward M.

    1973-01-01

    The author proposes that toys which have been shown to cause blindness (such as dart guns, bows and arrows, peashooters, air guns, and slingshots) be banned, and suggests that government regulatory agencies and the toy industry have been lax in acting on their expressed concern for safety. (DB)

  19. Homer: The Blind Bard

    ERIC Educational Resources Information Center

    Doorley, Rachelle; King, Judith

    2005-01-01

    This article describes notable cultural, historical, and artistic elements emanating from sculptures originating in ancient Greece. The "blind bard" and its connection to the legendary Greek poet, Homer; Homer's impact on literary history; trends among Roman sculptures; and Roman replication of Greek art are described. Questions to…

  20. The blind beautiful eye.

    PubMed

    Feinsod, M

    2000-03-01

    Master Jehan Yperman, a medieval surgeon, observed that when the optic nerve is injured, the eye becomes blind and beautiful. This is an attempt to trace the footsteps of this forgotten surgeon and to track the history of the cosmetic use of the belladonna herb, as well as the concept of amaurotic mydriasis.

  1. On Simulating Blindness.

    ERIC Educational Resources Information Center

    Kappan, David

    Many educators in facilitative roles have approached the subject of visual disabilities by constructing activities designed to simulate blindness, using a blindfold or similar device. Participants are subsequently encouraged to perform rudimentary tasks such as eating a meal or moving about with a sighted companion as a guide. Frequently,…

  2. Folklore of Blindness.

    ERIC Educational Resources Information Center

    Wagner-Lampl, A.; Oliver, G. W.

    1994-01-01

    This article uses both case examples and reports from archives and oral literature to illustrate the broad range of connections between blindness and superstitions, folklore, beliefs, and mythology. Clinicians are urged to be aware of these beliefs as they counsel individuals adapting to the loss of vision. (Author/DB)

  3. Blinded by Science.

    ERIC Educational Resources Information Center

    Snyder, Tom

    1994-01-01

    Huge infusion of technology is coming into education; nothing can stop it, because so much money is involved. With computer marketers in driver seat instead of teachers, schools risk being blinded by science. Vendors have coopted progressive education buzzwords, including "frontal teaching," "linear thinking," and "computer…

  5. Testing Children for Color Blindness

    MedlinePlus

    ... blindness as soon as age 4, finds Caucasian boys most likely to be color blind among different ... age 4. In addition, researchers found that Caucasian boys have the highest prevalence among four major ethnicities, ...

  6. Cognitive Mapping by the Blind.

    ERIC Educational Resources Information Center

    Casey, Steven M.

    1978-01-01

    In an effort to study the cognitive mapping abilities of blind persons, tactile maps of a school campus were made by ten congenitally blind and ten blindfolded partially sighted high school students. (Author)

  7. Sighted Children Learn About Blindness

    ERIC Educational Resources Information Center

    Scheffers, Wenda L.

    1977-01-01

    In a 20-lesson unit, sighted second- to fourth-grade students were taught about the long cane, guide dogs, daily living skills, eye physiology, causes of blindness, eye care, braille, and attitudes toward blindness. (CL)

  9. Swimming: An Introduction to Swimming, Diving, and SCUBA Diving for Blind and Physically Handicapped Individuals. Leisure Pursuit Series.

    ERIC Educational Resources Information Center

    Cylke, Frank Kurt, Ed.

    The annotated guide lists information sources available from the National Library Service for the Blind and Physically Handicapped in print, disc, cassette, and braille formats concerning swimming and diving with special reference to blind swimmers. The guide begins with a brief sketch of a champion swimmer who is also legally blind and an…

  10. Corneal blindness and xenotransplantation.

    PubMed

    Lamm, Vladimir; Hara, Hidetaka; Mammen, Alex; Dhaliwal, Deepinder; Cooper, David K C

    2014-01-01

    Approximately 39 million people are blind worldwide, with an estimated 285 million visually impaired. The developing world shoulders 90% of the world's blindness, with 80% of causative diseases being preventable or treatable. Blindness has a major detrimental impact on the patient, community, and healthcare spending. Corneal diseases are significant causes of blindness, affecting at least 4 million people worldwide. The prevalence of corneal disease varies between parts of the world. Trachoma, for instance, is the second leading cause of blindness in Africa, after cataracts, but is rarely found today in developed nations. When preventive strategies have failed, corneal transplantation is the most effective treatment for advanced corneal disease. The major surgical techniques for corneal transplantation include penetrating keratoplasty (PK), anterior lamellar keratoplasty, and endothelial keratoplasty (EK). Indications for corneal transplantation vary between countries, with Fuchs' dystrophy being the leading indication in the USA and keratoconus in Australia. With the exception of the USA, where EK will soon overtake PK as the most common surgical procedure, PK is the overwhelming procedure of choice. Success using corneal grafts in developing nations, such as Nepal, demonstrates the feasibility of corneal transplantation on a global scale. The number of suitable corneas from deceased human donors that becomes available will never be sufficient, and so research into various alternatives, for example stem cells, amniotic membrane transplantation, synthetic and biosynthetic corneas, and xenotransplantation, is progressing. While each of these has potential, we suggest that xenotransplantation holds the greatest potential for a corneal replacement. With the increasing availability of genetically engineered pigs, pig corneas may alleviate the global shortage of corneas in the near future.

  11. CORNEAL BLINDNESS AND XENOTRANSPLANTATION

    PubMed Central

    Lamm, Vladimir; Hara, Hidetaka; Mammen, Alex; Dhaliwal, Deepinder; Cooper, David K.C.

    2014-01-01

    Approximately 39 million people are blind worldwide, with an estimated 285 million visually impaired. The developing world shoulders 90% of the world’s blindness, with 80% of causative diseases being preventable or treatable. Blindness has a major detrimental impact on the patient, community, and healthcare spending. Corneal diseases are significant causes of blindness, affecting at least 4 million people worldwide. The prevalence of corneal disease varies among parts of the world. Trachoma, for instance, is the second leading cause of blindness in Africa, after cataracts, but is rarely found today in developed nations. When preventive strategies have failed, corneal transplantation is the most effective treatment for advanced corneal disease. The major surgical techniques for corneal transplantation include penetrating keratoplasty (PK), anterior lamellar keratoplasty (ALK), and endothelial keratoplasty (EK). Indications for corneal transplantation vary among countries, with Fuchs’ dystrophy being the leading indication in the U.S. and keratoconus in Australia. With the exception of the US, where EK will soon overtake PK as the most common surgical procedure, PK is the overwhelming procedure of choice. Success using corneal grafts in developing nations, such as Nepal, demonstrates the feasibility of corneal transplantation on a global scale. The number of suitable corneas from deceased human donors that becomes available will never be sufficient, and so research into various alternatives, e.g., stem cells, amniotic membrane transplantation, synthetic and biosynthetic corneas, and xenotransplantation, is progressing. While each of these has potential, we suggest that xenotransplantation holds the greatest potential for a corneal replacement. With the increasing availability of genetically-engineered pigs, pig corneas may alleviate the global shortage of corneas in the near future. PMID:25268248

  12. Controlling for rater effects when comparing survey items with incomplete Likert data.

    PubMed

    Schulz, E M; Sun, A

    2001-01-01

    The rating scale model (Andrich, 1978) was applied to data from a survey that directed students to rate their satisfaction with college services on a five point Likert scale. Because students used different services, and students were directed to rate only the services they used, the items were differentially exposed to a person factor that we call "pleasability." Differential exposure to pleasability makes items' average rating a biased measure of their performance. In contrast, item parameter estimates in the rating scale model corrected for differential exposure to pleasability. Compared to items' average ratings, item parameter estimates in the rating scale model did a better job of predicting which item received the higher rating when any two items were rated by the same rater.

  13. [A systematic social observation tool: methods and results of inter-rater reliability].

    PubMed

    Freitas, Eulilian Dias de; Camargos, Vitor Passos; Xavier, César Coelho; Caiaffa, Waleska Teixeira; Proietti, Fernando Augusto

    2013-10-01

    Systematic social observation has been used as a health research methodology for collecting information from the neighborhood physical and social environment. The objectives of this article were to describe the operationalization of direct observation of the physical and social environment in urban areas and to evaluate the instrument's reliability. The systematic social observation instrument was designed to collect information in several domains. A total of 1,306 street segments belonging to 149 different neighborhoods in Belo Horizonte, Minas Gerais, Brazil, were observed. For the reliability study, 149 segments (1 per neighborhood) were re-audited, and Fleiss kappa was used to assess inter-rater agreement. Mean agreement was 0.57 (SD = 0.24); 53% had substantial or almost perfect agreement, and 20.4%, moderate agreement. The instrument appears to be appropriate for observing neighborhood characteristics that are not time-dependent, especially urban services, property characterization, pedestrian environment, and security.
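
    Fleiss' kappa, the agreement statistic named in this record, compares the mean observed pairwise agreement per subject with the agreement expected from the overall category proportions. The sketch below assumes every subject (here, a street segment) receives the same number of ratings; the function name and the toy table are hypothetical and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # Share of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    # Observed pairwise agreement for each subject, then averaged.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical item: 4 segments, 2 audits each, 2 categories (present / absent).
table = [[2, 0], [0, 2], [1, 1], [2, 0]]
kappa = fleiss_kappa(table)
print(f"Fleiss' kappa = {kappa:.3f}")
```

    In a design like the one described, one such table would be built per instrument item, and the resulting kappa values summarized across items.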

  14. Effect of prior performance on subsequent performance evaluation by field independent-dependent raters.

    PubMed

    Sisco, Howard; Leventhal, Gloria

    2007-12-01

    The importance of accurate performance appraisals is central to many aspects of personnel activities in organizations. This study examined threats due to past performance to accuracy of evaluation of subsequent performance by raters differing in scores on field dependence. 162 college students were classified as Field-dependent (n = 81) or Field-independent (n = 81), using a median split on the Group Embedded Figures Test. Past performance (a lecture) was good or poor, presented directly via a videotape or indirectly via a written evaluation to the Field-independent or Field-dependent groups. Analysis indicated the hypothesized contrast effect (ratings in the opposite direction from that of prior ratings) in the Direct condition and an unexpected, albeit smaller, contrast effect in the Indirect condition. There were also differential effects of performance, presentation, and field dependency on rating of lecturer's style and ability.

  15. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

    PubMed Central

    Hallgren, Kevin A.

    2012-01-01

    Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR. PMID:22833776
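
    The paper described here gives its computational examples in SPSS and R. As a language-neutral illustration of the first statistic it names, Cohen's kappa for two coders and nominal codes can be sketched in Python as follows; the function name and the sample codes are hypothetical and are not taken from the tutorial.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who assigned nominal codes to the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected by chance from each coder's marginal frequencies.
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both coders use a single code")
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for six observations.
codes_a = ["y", "y", "n", "n", "y", "n"]
codes_b = ["y", "n", "n", "n", "y", "y"]
kappa_ab = cohen_kappa(codes_a, codes_b)
print(f"Cohen's kappa = {kappa_ab:.3f}")
```

    Here the coders agree on four of six items (0.667) while chance agreement is 0.5, so kappa is well below the raw percentage of agreement.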

  16. Inter-rater agreement for diagnoses of epilepsy in pregnant women.

    PubMed

    Khoshbin, Shahram; Herring, Amy; Holmes, Gregory L; Schomer, Donald; Hoch, Daniel; Dooling, Elizabeth C; Vining, Eileen P G; Holmes, Lewis B

    2013-04-01

    We report on inter-rater agreement in assessing the types of seizures exhibited by one hundred mothers ascertained in a study of the teratogenicity of maternal epilepsy and antiepileptic drugs. A summary of each woman's medical record and a one-page report of her responses to questions about her epilepsy were reviewed independently by six neurologists, three in pediatric neurology and three in adult neurology. Agreement was measured by the kappa statistic and log-linear modeling techniques. The adult neurologists agreed with each other 59% of the time, with the agreement higher when all three used information from the patients' records, such as an EEG, rather than when depending on the patients' responses to questions about their epilepsy. The pediatric neurologists agreed with each other 44% of the time and tended to rely more heavily on information in the patients' records, such as an EEG or a prior diagnosis, compared with the adult neurologists.

  17. Improving Teacher Selection: The Effect of Inter-Rater Reliability in the Screening Process. CEDR Working Paper. WP #2015-7

    ERIC Educational Resources Information Center

    Martinkova, Patricia; Goldhaber, Dan

    2015-01-01

    Inter-rater reliability, commonly assessed by the intra-class correlation coefficient (ICC), is an important index for describing the extent to which there is consistency amongst two or more raters in assigned measures. In organizational research, the data structure is often hierarchical and designs deviate substantially from the ideal of a balanced…

  18. Investigating Native and Non-Native English-Speaking Teacher Raters' Judgements of Oral Proficiency in the College English Test-Spoken English Test (CET-SET)

    ERIC Educational Resources Information Center

    Zhang, Ying; Elder, Catherine

    2014-01-01

    This study investigates the impact of raters' language background on their judgements of the speaking performance in the College English Test-Spoken English Test (CET-SET) of China, by comparing the rating patterns of non-native English-speaking (NNES) teacher raters, who are currently employed to assess performance on the CET-SET, with those of…

  20. Inter-rater reliability of the evaluation of muscular chains associated with posture alterations in scoliosis.

    PubMed

    Fortin, Carole; Feldman, Debbie Ehrmann; Tanaka, Clarice; Houde, Michelle; Labelle, Hubert

    2012-05-28

    In the Global postural re-education (GPR) evaluation, posture alterations are associated with anterior or posterior muscular chain impairments. Our goal was to assess the reliability of the GPR muscular chain evaluation. Inter-rater reliability study. Fifty physical therapists (PTs) and two experts trained in GPR assessed the standing posture from photographs of five youths with idiopathic scoliosis using a posture analysis grid with 23 posture indices (PI). The PTs and experts indicated the muscular chain associated with posture alterations. The PTs were also divided into three groups according to their experience in GPR. Experts' results (after consensus) were used to verify agreement between PTs and experts for muscular chain and posture assessments. We used Kappa coefficients (K) and the percentage of agreement (%A) to assess inter-rater reliability and intra-class coefficients (ICC) for determining agreement between PTs and experts. For the muscular chain evaluation, reliability was moderate to substantial for 12 PI for the PTs (%A: 56 to 82; K: 0.42 to 0.76) and perfect for 19 PI for the experts. For posture assessment, reliability was moderate to substantial for 12 PI for the PTs (%A > 60%; K: 0.42 to 0.75) and moderate to perfect for 18 PI for the experts (%A: 80 to 100; K: 0.55 to 1.00). The agreement between PTs and experts was good for most muscular chain evaluations (18 PI; ICC: 0.82 to 0.99) and PI (19 PI; ICC: 0.78 to 1.00). The GPR muscular chain evaluation has good reliability for most posture indices. GPR evaluation should help guide physical therapists in targeting affected muscles for treatment of abnormal posture patterns.

  1. Validity and inter-rater reliability of inertial gait measurements in Parkinson's disease: a pilot study.

    PubMed

    Esser, Patrick; Dawes, Helen; Collett, Johnny; Feltham, Max G; Howells, Ken

    2012-03-30

    Walking models driven by centre of mass (CoM) data obtained from inertial measurement units (IMU) or optical motion capture systems (OMCS) can be used to objectively measure gait. However, current models have only been validated within typically developed adults (TDA). The purpose of this study was to compare the projected CoM movement within Parkinson's disease (PD) measured by an IMU with data collected from an OMCS after which spatio-temporal gait measures were derived using an inverted pendulum model. The inter-rater reliability of spatio-temporal parameters was explored between expert researchers and clinicians using the IMU processed data. Participants walked 10 m with an IMU attached over their centre of mass which was simultaneously recorded by an OMCS. Data was collected on two occasions, each by an expert researcher and clinician. Ten people with PD showed no difference (p=0.13) for vertical, translatory acceleration, velocity and relative position of the projected centre of mass between IMU and OMCS data. Furthermore no difference (p=0.18) was found for the derived step time, stride length and walking speed for people with PD. Measurements of step time (p=0.299), stride length (p=0.883) and walking speed (p=0.751) did not differ between experts and clinicians. There was good inter-rater reliability for these parameters (ICC(3,1)=0.979, ICC(3,1)=0.958 and ICC(3,1)=0.978, respectively). The findings are encouraging and support the use of IMUs by clinicians to measure CoM movement in people with PD. Copyright © 2012 Elsevier B.V. All rights reserved.

  2. Measuring the Pain Area: An Intra- and Inter-Rater Reliability Study Using Image Analysis Software.

    PubMed

    Dos Reis, Felipe Jose Jandre; de Barros E Silva, Veronica; de Lucena, Raphaela Nunes; Mendes Cardoso, Bruno Alexandre; Nogueira, Leandro Calazans

    2016-01-01

    Pain drawings have frequently been used for clinical information and research. The aim of this study was to investigate intra- and inter-rater reliability of area measurements performed on pain drawings. Our secondary objective was to verify the reliability when using computers with different screen sizes, both with and without mouse hardware. Pain drawings were completed by patients with chronic neck pain or neck-shoulder-arm pain. Four independent examiners participated in the study. Examiners A and B used the same computer with a 16-inch screen and wired mouse hardware. Examiner C used a notebook with a 16-inch screen and no mouse hardware, and Examiner D used a computer with an 11.6-inch screen and a wireless mouse. Image measurements were obtained using GIMP and NIH ImageJ computer programs. The length of all the images was measured using GIMP software to a set scale in ImageJ. Thus, each marked area was encircled and the total surface area (cm²) was calculated for each pain drawing measurement. A total of 117 areas were identified and 52 pain drawings were analyzed. The intrarater reliability between all examiners was high (ICC = 0.989). The inter-rater reliability was also high. No significant differences were observed when using different screen sizes or when using or not using the mouse hardware. This suggests that the precision of these measurements is acceptable for the use of this method as a measurement tool in clinical practice and research. © 2014 World Institute of Pain.

  3. Inter-rater reliability of the evaluation of muscular chains associated with posture alterations in scoliosis

    PubMed Central

    2012-01-01

    Background: In the Global postural re-education (GPR) evaluation, posture alterations are associated with anterior or posterior muscular chain impairments. Our goal was to assess the reliability of the GPR muscular chain evaluation. Methods: Design: Inter-rater reliability study. Fifty physical therapists (PTs) and two experts trained in GPR assessed the standing posture from photographs of five youths with idiopathic scoliosis using a posture analysis grid with 23 posture indices (PI). The PTs and experts indicated the muscular chain associated with posture alterations. The PTs were also divided into three groups according to their experience in GPR. Experts’ results (after consensus) were used to verify agreement between PTs and experts for muscular chain and posture assessments. We used Kappa coefficients (K) and the percentage of agreement (%A) to assess inter-rater reliability and intra-class coefficients (ICC) for determining agreement between PTs and experts. Results: For the muscular chain evaluation, reliability was moderate to substantial for 12 PI for the PTs (%A: 56 to 82; K: 0.42 to 0.76) and perfect for 19 PI for the experts. For posture assessment, reliability was moderate to substantial for 12 PI for the PTs (%A > 60%; K: 0.42 to 0.75) and moderate to perfect for 18 PI for the experts (%A: 80 to 100; K: 0.55 to 1.00). The agreement between PTs and experts was good for most muscular chain evaluations (18 PI; ICC: 0.82 to 0.99) and PI (19 PI; ICC: 0.78 to 1.00). Conclusions: The GPR muscular chain evaluation has good reliability for most posture indices. GPR evaluation should help guide physical therapists in targeting affected muscles for treatment of abnormal posture patterns. PMID:22639838

  4. Measuring inter-rater reliability for nominal data - which coefficients and confidence intervals are appropriate?

    PubMed

    Zapf, Antonia; Castell, Stefanie; Morawietz, Lars; Karch, André

    2016-08-05

    Reliability of measurements is a prerequisite of medical research. For nominal data, Fleiss' kappa (in the following labelled as Fleiss' K) and Krippendorff's alpha provide the highest flexibility of the available reliability measures with respect to number of raters and categories. Our aim was to investigate which measures and which confidence intervals provide the best statistical properties for the assessment of inter-rater reliability in different situations. We performed a large simulation study to investigate the precision of the estimates for Fleiss' K and Krippendorff's alpha and to determine the empirical coverage probability of the corresponding confidence intervals (asymptotic for Fleiss' K and bootstrap for both measures). Furthermore, we compared measures and confidence intervals in a real world case study. Point estimates of Fleiss' K and Krippendorff's alpha did not differ from each other in all scenarios. In the case of missing data (completely at random), Krippendorff's alpha provided stable estimates, while the complete case analysis approach for Fleiss' K led to biased estimates. For shifted null hypotheses, the coverage probability of the asymptotic confidence interval for Fleiss' K was low, while the bootstrap confidence intervals for both measures provided a coverage probability close to the theoretical one. Fleiss' K and Krippendorff's alpha with bootstrap confidence intervals are equally suitable for the analysis of reliability of complete nominal data. The asymptotic confidence interval for Fleiss' K should not be used. In the case of missing data or data of higher than nominal order, Krippendorff's alpha is recommended. Together with this article, we provide an R-script for calculating Fleiss' K and Krippendorff's alpha and their corresponding bootstrap confidence intervals.
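
    The bootstrap confidence intervals favoured in this record can be illustrated by resampling subjects with replacement and recomputing Fleiss' K on each resample. The sketch below is not the authors' R script: it assumes a simple percentile interval, complete data, and hypothetical function names and ratings, since the abstract does not specify the bootstrap variant used.

```python
import random


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return None  # undefined when every rating falls in one category
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    return (sum(p_i) / n_subjects - p_e) / (1.0 - p_e)


def bootstrap_ci(counts, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for Fleiss' kappa, resampling subjects."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(counts) for _ in counts]
        kappa = fleiss_kappa(resample)
        if kappa is not None:  # skip degenerate resamples
            estimates.append(kappa)
    if not estimates:
        raise ValueError("no resample produced a defined kappa")
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lo_index = int(tail * len(estimates))
    hi_index = min(len(estimates) - 1, int((1.0 - tail) * len(estimates)))
    return estimates[lo_index], estimates[hi_index]


# Hypothetical complete data: 10 subjects, 3 raters, 3 nominal categories.
ratings = [
    [3, 0, 0], [0, 3, 0], [0, 0, 3], [2, 1, 0], [0, 2, 1],
    [3, 0, 0], [1, 1, 1], [0, 3, 0], [0, 0, 3], [2, 0, 1],
]
point = fleiss_kappa(ratings)
ci_low, ci_high = bootstrap_ci(ratings)
print(f"K = {point:.3f}, 95% bootstrap CI = [{ci_low:.3f}; {ci_high:.3f}]")
```

    Resampling whole subjects keeps each subject's set of ratings together, which is why the sketch draws rows of the count table and not individual ratings.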

  5. Programs for the Deaf Blind.

    ERIC Educational Resources Information Center

    American Annals of the Deaf, 1987

    1987-01-01

    The directory lists 30 programs for deaf-blind children and youth, the 10 regional offices of the Helen Keller National Center for Deaf-Blind Youths and Adults, and five programs for training teachers of the deaf-blind. Provided for each program is address, director's name, and phone number. (DB)

  6. Tool for Assessing Responsibility-Based Education (TARE): Instrument Development, Content Validity, and Inter-Rater Reliability

    ERIC Educational Resources Information Center

    Wright, Paul M.; Craig, Mark W.

    2011-01-01

    Numerous scholars have stressed the importance of personal and social responsibility in physical activity settings; however, there is a lack of instrumentation to study the implementation of responsibility-based teaching strategies. The development, content validity, and initial inter-rater reliability testing of the Tool for Assessing…

  7. Inter-Rater Reliability of the Modified Ashworth Scale and Modified Modified Ashworth Scale in Assessing Poststroke Elbow Flexor Spasticity

    ERIC Educational Resources Information Center

    Kaya, Taciser; Goksel Karatepe, Altinay; Gunaydin, Rezzan; Koc, Aysegul; Altundal Ercan, Ulku

    2011-01-01

    The Modified Ashworth Scale (MAS) is commonly used in clinical practice for grading spasticity. However, it was modified recently by omitting grade "1+" of the MAS and redefining grade "2". The aim of this study was to investigate the inter-rater reliability of MAS and modified MAS (MMAS) for the assessment of poststroke elbow flexor spasticity.…

  9. Using Consensus Building Procedures with Expert Raters to Establish Comparison Scores of Behavior for Direct Behavior Rating

    ERIC Educational Resources Information Center

    Jaffery, Rose; Johnson, Austin H.; Bowler, Mark C.; Riley-Tillman, T. Chris; Chafouleas, Sandra M.; Harrison, Sayward E.

    2015-01-01

    To date, rater accuracy when using Direct Behavior Rating (DBR) has been evaluated by comparing DBR-derived data to scores yielded through systematic direct observation. The purpose of this study was to evaluate an alternative method for establishing comparison scores using expert-completed DBR alongside best practices in consensus building…

  10. Non-inferiority test and confidence interval for the difference in correlated proportions in diagnostic procedures based on multiple raters.

    PubMed

    Saeki, Hiroyuki; Tango, Toshiro

    2011-12-10

    The efficacy of diagnostic procedures is generally evaluated on the basis of the results from multiple raters. However, there are few adequate methods of performing non-inferiority tests with confidence intervals to compare the accuracies (sensitivities or specificities) when multiple raters are considered. We propose new statistical methods for comparing the accuracies of two diagnostic procedures in a non-inferiority trial, on the basis of the results from multiple independent raters who are also independent of the study centers. We consider a study design in which each patient is subjected to two diagnostic procedures and all images are read by all raters. By assuming a multinomial distribution for matched-pair categorical data arising from the study design, we derive a score-based full menu, that is, a non-inferiority test, confidence interval and sample size formula, for inference of the difference in correlated proportions between the two diagnostic procedures. We conduct Monte Carlo simulation studies to examine the validity of the proposed methods, which showed that the proposed test has a size closer to the nominal significance level than a Wald-type test and that the proposed confidence interval has better empirical coverage probability than a Wald-type confidence interval. We illustrate the proposed methods with data from a study of diagnostic procedures for the diagnosis of oesophageal carcinoma infiltrating the tracheobronchial tree.
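
    For orientation, the Wald-type interval that the abstract uses as its comparator can be sketched for the simplest case of a single rater. This is not the score-based method the authors propose; the function names, the counts and the 0.10 margin below are illustrative assumptions, not values from the study.

```python
import math


def wald_paired_difference(n10, n01, n, z=1.959964):
    """Wald-type CI for p1 - p2 from matched-pair binary outcomes.

    n10: pairs where procedure 1 is correct and procedure 2 is not
    n01: pairs where procedure 2 is correct and procedure 1 is not
    n:   total number of pairs (concordant pairs included)
    z:   standard normal quantile (default gives a two-sided 95% interval)
    """
    diff = (n10 - n01) / n
    # Variance of the difference between two correlated proportions;
    # only the discordant pairs contribute.
    var = ((n10 + n01) - (n10 - n01) ** 2 / n) / n ** 2
    se = math.sqrt(var)
    return diff, diff - z * se, diff + z * se


def is_non_inferior(lower, margin):
    """Procedure 1 is declared non-inferior if the lower limit exceeds -margin."""
    return lower > -margin


# Hypothetical counts: 100 patients, 12 + 8 discordant pairs.
diff, lower, upper = wald_paired_difference(n10=12, n01=8, n=100)
print(f"difference = {diff:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("non-inferior at margin 0.10:", is_non_inferior(lower, 0.10))
```

    The abstract reports that this kind of Wald-type interval has poorer empirical coverage than the proposed score-based interval, so the sketch is best read as the baseline being improved upon.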

  11. Removing Bias towards World Englishes: The Development of a Rater Attitude Instrument Using Indian English as a Stimulus

    ERIC Educational Resources Information Center

    Hsu, Tammy Huei-Lien

    2016-01-01

    This study explores the attitudes of raters of English speaking tests towards the global spread of English and the challenges in rating speakers of Indian English in descriptive speaking tasks. The claims put forward by language attitude studies indicate a validity issue in English speaking tests: listeners tend to hold negative attitudes towards…

  13. The Intra- and Inter-rater Reliabilities of the Forward Head Posture Assessment of Normal Healthy Subjects.

    PubMed

    Nam, Seok Hyun; Son, Sung Min; Kwon, Jung Won; Lee, Na Kyung

    2013-06-01

    [Purpose] Assessment of posture is an important goal of physical therapy interventions for preventing the progression of forward head posture (FHP). The purpose of this study was to determine the inter- and intra-rater reliabilities of the assessment of FHP. [Subjects and Methods] We recruited 45 participants (20 male subjects, 25 female subjects) from a university student population. Two physical therapists assessed FHP using images of head extension. FHP is characterized by the measurement of angles and distances between anatomical landmarks. Forward shoulder angle of 54° or less was defined as FHP. Intra- and inter-rater reliabilities were estimated using Kendall's tau-b correlation coefficients. [Results] Intra-class correlation of intra-rater measurements indicated an excellent level of reliability (0.91), and intra-class correlation of inter-rater measurements showed a good level of reliability in the assessment of FHP (0.75). [Conclusion] Assessment of FHP is an important component of evaluation and affects the design of the treatment regimen. The assessment of FHP was reliably measured by two physical therapists. It could therefore become a useful method for assessing FHP in the clinical setting. Future studies will be needed to provide more detailed quantitative data for accurate assessment of posture.

  14. Inter-rater reliability for measurement of passive physiological movements in lower extremity joints is generally low: a systematic review.

    PubMed

    van Trijffel, Emiel; van de Pol, Rachel J; Oostendorp, Rob Ab; Lucas, Cees

    2010-01-01

    What is the inter-rater reliability for measurements of passive physiological or accessory movements in lower extremity joints? Systematic review of studies of inter-rater reliability. Individuals with and without lower extremity disorders. Range of motion and end-feel using methods feasible in daily practice. 17 studies were included of which 5 demonstrated acceptable inter-rater reliability. Reliability of measurements of physiological range of motion ranged from Kappa -0.02 for measuring knee extension using a goniometer to ICC 0.97 for measuring knee flexion using vision. Measuring range of knee flexion consistently yielded acceptable reliability using either vision or instruments. Measurements of end-feel were unreliable for all hip and knee movements. Two studies satisfied all criteria for internal validity while reporting acceptable reliability for measuring physiological range of knee flexion and extension. Overall, however, methodological quality of included studies was poor. Inter-rater reliability of measurement of passive movements in lower extremity joints is generally low. We provide specific recommendations for the conduct and reporting of future research. Awaiting new evidence, clinicians should be cautious when relying on results from measurements of passive movements in joints for making decisions about patients with lower extremity disorders.

  15. Plant disease severity assessment - How rater bias, assessment method and experimental design affect hypothesis testing and resource use efficiency

    USDA-ARS's Scientific Manuscript database

    The impact of rater bias and assessment method on hypothesis testing was studied for different experimental designs for plant disease assessment using balanced and unbalanced data sets. Data sets with the same number of replicate estimates for each of two treatments are termed ‘balanced’, and those ...

  16. Automated inter-rater reliability assessment and electronic data collection in a multi-center breast cancer study

    PubMed Central

    Thwin, Soe Soe; Clough-Gorr, Kerri M; McCarty, Maribet C; Lash, Timothy L; Alford, Sharon H; Buist, Diana SM; Enger, Shelley M; Field, Terry S; Frost, Floyd; Wei, Feifei; Silliman, Rebecca A

    2007-01-01

    Background: The choice between paper data collection methods and electronic data collection (EDC) methods has become a key question for clinical researchers. There remains a need to examine potential benefits, efficiencies, and innovations associated with an EDC system in a multi-center medical record review study. Methods: A computer-based automated menu-driven system with 658 data fields was developed for a cohort study of women aged 65 years or older, diagnosed with invasive histologically confirmed primary breast cancer (N = 1859), at 6 Cancer Research Network sites. Medical record review with direct data entry into the EDC system was implemented. An inter-rater and intra-rater reliability (IRR) system was developed using a modified version of the EDC. Results: Automation of EDC accelerated the flow of study information and resulted in an efficient data collection process. Data collection time was reduced by approximately four months compared to the project schedule and funded time available for manuscript preparation increased by 12 months. In addition, an innovative modified version of the EDC permitted an automated evaluation of inter-rater and intra-rater reliability across six data collection sites. Conclusion: Automated EDC is a powerful tool for research efficiency and innovation, especially when multiple data collection sites are involved. PMID:17577410

  17. When experts disagreed, who was correct? A comparison of PCL-R scores from independent raters and opposing forensic experts.

    PubMed

    Rufino, Katrina A; Boccaccini, Marcus T; Hawes, Samuel W; Murrie, Daniel C

    2012-12-01

    Researchers recently found that Psychopathy Checklist-Revised (PCL-R; Hare, 2003) scores reported by state experts were much higher than those reported by defense experts in sexually violent predator cases pursued for civil commitment (Murrie, Boccaccini, Johnson, & Janke, 2008), which raised the question of which scores were more accurate. In this study, two independent raters rescored the PCL-R from file review for 44 offenders from that sample who had opposing evaluator scores (allegiance cases) and 44 who had state expert, but not defense expert, scores (comparison cases). The independent raters agreed with one another in their scoring of the allegiance and comparison cases (Intraclass Correlation Coefficient [ICC]; ICC(A,1) = .95), but they disagreed with both state (ICC(A,1) = .29) and defense (ICC(A,1) = .14) experts in the allegiance cases. Agreement was stronger between state experts and independent raters for the comparison cases (ICC(A,1) = .63), but the independent raters assigned significantly higher PCL-R scores than experts for both the allegiance and comparison cases. These findings suggest that offenders who were selected for rescoring by the defense may have been more difficult to score. Findings also raise questions about the extent to which PCL-R scores based on correctional file review only are comparable to those based on file and interview.

  18. On Individual Differences in Person Perception: Raters' Personality Traits Relate to Their Psychopathy Checklist-Revised Scoring Tendencies

    ERIC Educational Resources Information Center

    Miller, Audrey K.; Rufino, Katrina A.; Boccaccini, Marcus T.; Jackson, Rebecca L.; Murrie, Daniel C.

    2011-01-01

    This study investigated raters' personality traits in relation to scores they assigned to offenders using the Psychopathy Checklist-Revised (PCL-R). A total of 22 participants, including graduate students and faculty members in clinical psychology programs, completed a PCL-R training session, independently scored four criminal offenders using the…

  19. Minimum Competency Standards Set by Three Divergent Groups of Raters Using Three Judgmental Procedures: Implications for Validity.

    ERIC Educational Resources Information Center

    Halpin, Gerald; And Others

    1983-01-01

    Although arbitrary, whenever multiple judgmental standard-setting procedures are utilized by different groups concurrently, stability across raters can be achieved and decisions can be made in a relatively judicious manner. Greater stability across methods (Ebel, Nedelsky, Angoff) may be effected by slightly modifying the Ebel approach. (Author/PN)

  20. [Intra-rater Reliability for the Questionnaire on Activity Limitations and Participation Restrictions of Children With ADHD].

    PubMed

    Salamanca Duque, Luisa Matilde; Naranjo Aristizábal, María Mercedes; Gutiérrez Ríos, Gladys Helena; Prieto, Jaime Bayona

    2014-03-01

    Questionnaires for evaluating activity limitations and participation restrictions in children with ADHD (CLARP-TDAH) have recently been developed in Colombia, based on the suggestions made by the WHO in the International Classification of Functioning, Disability and Health (ICF), allowing clinical evaluation beyond an evaluation of the functionality and functioning of children in their family and school environments. Previous research with the questionnaire proved useful in the multidisciplinary approach to Colombian children with ADHD. This study determines the level of intra-rater reliability for the CLARP-TDAH Parents and Teachers questionnaires. The study included a non-random sample of 203 Colombian children attending school and diagnosed with ADHD. Intra-rater reliability and the reproducibility of the results were determined using the Kappa index. The informants were parents and teachers. Kappa values >0.7 were obtained for the intra-rater reliability of the questionnaire domains of CLARP-TDAH Parents, while for CLARP-TDAH Teachers domains these values were >0.8. CLARP-TDAH questionnaires are a tool with a good level of intra-rater reliability, which allows a reliable assessment of activity limitations and participation restrictions in order to determine the level of functioning at home and school. Copyright © 2014 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.

  1. A comparison of raters and disease assessment methods for estimating disease severity for purposes of hypothesis testing

    USDA-ARS's Scientific Manuscript database

    Assessment of disease severity is most often made visually, and estimates can be inaccurate. Nearest percent estimates (NPEs) of Septoria leaf blotch on leaves of winter wheat by four raters (R1-R4) assessing non-treated (NT) and fungicide-treated (FT) plots were compared to true values using Lin’s ...

  3. Cross-National Differences in the Assessment of Psychopathy: Do They Reflect Variations in Raters' Perceptions of Symptoms?

    ERIC Educational Resources Information Center

    Cooke, David J.; Hart, Stephen D.; Michie, Christine

    2004-01-01

    Cross-national differences in the prevalence of psychopathy have been reported. This study examined whether rater effects could account for these differences. Psychopathy was assessed with the Psychopathy Checklist-Revised (PCL-R; R. D. Hare, 1991). Videotapes of 6 Scottish prisoners and 6 Canadian prisoners were rated by 10 Scottish and 10…

  4. The Consistency between Human Raters and an Automated Essay Scoring System in Grading High School Students' English Writing

    ERIC Educational Resources Information Center

    Tsai, Min-hsiu

    2012-01-01

    This study investigates the consistency between human raters and an automated essay scoring system in grading high school students' English compositions. A total of 923 essays from 23 classes of 12 senior high schools in Taiwan (Republic of China) were obtained and scored manually and electronically. The results show that the consistency between…

  5. Inter- and intra-rater reliability of diffusion tensor imaging parameters in the normal pediatric spinal cord

    PubMed Central

    Barakat, Nadia; Shah, Pallav; Faro, Scott H; Gaughan, John P; Middleton, Devon; Mulcahey, MJ; Mohamed, Feroze B

    2015-01-01

    AIM: To assess inter- and intra-rater reliability (agreement) between two region of interest (ROI) methods in pediatric spinal cord diffusion tensor imaging (DTI). METHODS: Inner-Field-of-View DTI data previously acquired from ten pediatric healthy subjects (mean age = 12.10 years) were used to assess reliability. ROIs were drawn by two neuroradiologists on each subject's data twice within a 3-month interval. ROIs were placed on axial B0 maps along the cervical spine using free-hand and fixed-size ROIs. Agreement analyses for fractional anisotropy (FA), axial diffusivity, radial diffusivity and mean diffusivity were performed using intra-class-correlation (ICC) and Cronbach’s alpha statistical methods. RESULTS: Inter- and intra-rater agreement between the two ROI methods ranged from moderate (ICC = 0.5) to strong (ICC = 0.84). There were significant differences between raters in the number of pixels selected using free-hand ROIs (P < 0.05). However, no significant differences were observed in DTI parameter values. FA showed highest variability in ICC values (0.10-0.87). Cronbach’s alpha showed moderate-high values for raters and ROI methods. CONCLUSION: The study showed that high reproducibility in spinal cord DTI can be achieved, and demonstrated the importance of setting detailed methodology for post-processing DTI data, specifically the placement of ROIs. PMID:26435778

  7. [The blindness in the literature-Jose Saramago: blindness and Albert Bang: the blind witness].

    PubMed

    Permin, H; Norn, M

    2001-01-01

    Two novels present different aspects of blindness as seen through doctors' eyes. The Portuguese Nobel Prize winner José Saramago tells the story of a city struck by an epidemic of "white blindness", where the truth is what we cannot bear to see. The crime novel by the Danish author and unskilled labourer Albert Bang (pseudonym of Karl E. Rasmussen) describes a butcher, blind or pretending to be blind, who witnesses a murder. Both novels are lyrical, thought-provoking and insightful.

  8. Poroelastic references

    SciTech Connect

    Morency, Christina

    2014-12-12

    This file contains a list of relevant references on the Biot theory (forward and inverse approaches), the double-porosity and dual-permeability theory, and seismic wave propagation in fractured porous media, in RIS format, to approach seismic monitoring in a complex fractured porous medium such as Brady's Geothermal Field.

  9. Reference Roundup.

    ERIC Educational Resources Information Center

    Silver, Linda; And Others

    1982-01-01

    Briefly describes the nature and availability of reference books for children and adolescents and then reviews some recent publications of this type, including works of a general nature and works on social science, science, the arts, language, history and geography, and biography. (JL)

  10. Ready Reference.

    ERIC Educational Resources Information Center

    Koltay, Emery

    2001-01-01

    Includes four articles that relate to ready reference, including a list of publishers' toll-free telephone numbers and Web sites; how to obtain an ISBN (International Standard Book Number) and an ISSN (International Standard Serial Number); and how to obtain an SAN (Standard Address Number), for organizations that are involved in the book…

  11. A Completely Blind Video Integrity Oracle.

    PubMed

    Mittal, Anish; Saad, Michele A; Bovik, Alan C

    2016-01-01

    Considerable progress has been made toward developing still picture perceptual quality analyzers that do not require any reference picture and that are not trained on human opinion scores of distorted images. However, there do not yet exist any such completely blind video quality assessment (VQA) models. Here, we attempt to bridge this gap by developing a new VQA model called the video intrinsic integrity and distortion evaluation oracle (VIIDEO). The new model does not require the use of any additional information other than the video whose quality is being evaluated. VIIDEO embodies models of intrinsic statistical regularities that are observed in natural videos, which are used to quantify disturbances introduced due to distortions. An algorithm derived from the VIIDEO model is thereby able to predict the quality of distorted videos without any external knowledge about the pristine source, anticipated distortions, or human judgments of video quality. Even with such a paucity of information, we are able to show that the VIIDEO algorithm performs much better than the legacy full reference quality measure MSE on the LIVE VQA database and delivers performance comparable with a leading human judgment trained blind VQA model. We believe that the VIIDEO algorithm is a significant step toward making real-time monitoring of completely blind video quality possible.

  12. Gene therapy for blindness.

    PubMed

    Sahel, José-Alain; Roska, Botond

    2013-07-08

    Sight-restoring therapy for the visually impaired and blind is a major unmet medical need. Ocular gene therapy is a rational choice for restoring vision or preventing the loss of vision because most blinding diseases originate in cellular components of the eye, a compartment that is optimally suited for the delivery of genes, and many of these diseases have a genetic origin or genetic component. In recent years we have witnessed major advances in the field of ocular gene therapy, and proof-of-concept studies are under way to evaluate the safety and efficacy of human gene therapies. Here we discuss the concepts and recent advances in gene therapy in the retina. Our review discusses traditional approaches such as gene replacement and neuroprotection and also new avenues such as optogenetic therapies. We conjecture that advances in gene therapy in the retina will pave the way for gene therapies in other parts of the brain.

  13. Is inter-rater reliability of Global Trigger Tool results altered when members of the review team are replaced?

    PubMed

    Mevik, Kjersti; Griffin, Frances A; Hansen, Tonje Elisabeth; Deilkås, Ellen; Vonen, Barthold

    2016-09-01

    To evaluate the inter-rater reliability of results from Global Trigger Tool (GTT) reviews when one of the three reviewers remains consistent, while one or two reviewers rotate. Comparison of results from retrospective record review performed as a cross-sectional study with three review teams each consisting of two non-physicians and one physician; Team I (three consistent reviewers), Team II (one of the two non-physician reviewers and/or the physician from Team I are replaced for different review periods) and Team III (three consistent reviewers different from reviewers in Team I and Team II). Medium-sized hospital trust in Northern Norway. A total of 120 records were selected as biweekly samples of 10 from discharge lists between 1 July and 31 December 2010 for a 3-fold review. Replacement of review team members was tested to assess impact on inter-rater reliability and adverse events measurement. Inter-rater reliability assessed with the Cohen kappa coefficient between different teams regarding the presence and severity level of adverse events. Substantial inter-rater reliability regarding the presence and severity level of adverse events was obtained between Teams I and II, while moderate inter-rater reliability was obtained between Teams I and III. Replacement of reviewers did not influence the results provided that one of the non-physician reviewers remains consistent. The experience of the consistent reviewer can result in continued consistency in interpretation with the new reviewer through discussion of events. These findings could encourage more hospitals to rotate reviewers in order to optimize resources when using the GTT. © The Author 2016. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
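
    As a rough illustration of the agreement statistic named in this abstract, Cohen's kappa for two review teams judging the same records could be computed along the following lines. The ratings are invented for the example and are not data from the study, and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters assigning one category per item."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical example: 1 = adverse event present, 0 = absent, for 10 records.
team_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
team_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(f"kappa = {cohen_kappa(team_1, team_2):.2f}")
```

    In this made-up example the teams agree on 8 of 10 records, but because 52% agreement would be expected by chance, kappa is only about 0.58.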

  14. BurnCase 3D software validation study: Burn size measurement accuracy and inter-rater reliability.

    PubMed

    Parvizi, Daryousch; Giretzlehner, Michael; Wurzer, Paul; Klein, Limor Dinur; Shoham, Yaron; Bohanon, Fredrick J; Haller, Herbert L; Tuca, Alexandru; Branski, Ludwik K; Lumenta, David B; Herndon, David N; Kamolz, Lars-P

    2016-03-01

    The aim of this study was to compare the accuracy of burn size estimation using the computer-assisted software BurnCase 3D (RISC Software GmbH, Hagenberg, Austria) with that using a 2D scan, considered to be the actual burn size. Thirty artificial burn areas were preplanned and prepared on three mannequins (one child, one female, and one male). Five trained physicians (raters) were asked to assess the size of all wound areas using BurnCase 3D software. The results were then compared with the real wound areas, as determined by 2D planimetry imaging. To examine inter-rater reliability, we performed an intraclass correlation analysis with a 95% confidence interval. The mean wound area estimations of the five raters using BurnCase 3D were in total 20.7±0.9% for the child, 27.2±1.5% for the female and 16.5±0.1% for the male mannequin. Our analysis showed relative overestimations of 0.4%, 2.8% and 1.5% for the child, female and male mannequins respectively, compared to the 2D scan. The intraclass correlation between the single raters for mean percentage of the artificial burn areas was 98.6%. A high intraclass correlation was also evident between the single raters and the 2D scan. BurnCase 3D is a valid and reliable tool for the determination of total body surface area burned in standard models. Further clinical studies including different pediatric and overweight adult mannequins are warranted. Copyright © 2016 Elsevier Ltd and ISBI. All rights reserved.

  15. Blind shaft development

    SciTech Connect

    Fiscor, S.

    2009-02-15

    The article discusses how Shaft Drillers International (SDI) is breaking new ground in shaft development and ground stabilization. Techniques of blind shaft drilling and raise bore shaft development developed by SDI are briefly explained. An associated company, Coastal Drilling East, deals with all types of ground improvement such as pre-grouting work for shafts, grouting of poor soil and water leaks into the mine. 3 photos.

  16. Geometry of blind thrusts

    SciTech Connect

    Kligfield, R.; Geiser, P.; Geiser, J.

    1985-01-01

    Blind thrusts are structures which at no time in their history broke the erosion surface and along which displacement progressively changes upwards. Faults of the stiff layer along which displacement progressively decreases to zero (tip) are one prominent type of blind thrust structure. Shortening above such tips is accommodated entirely by folding whereas shortening below the tip is partitioned between folding and faulting. For these types of faults it is possible to determine the original length of the stiff layer for balancing purposes. A systematic methodology for line length and area restoration is outlined for determining blind thrust geometry. Application of the methodology is particularly suitable for use with microcomputers. If the folded form of the cover is known along with the position of the fault and its tip, then it is possible to locate hanging and footwall cutoffs. If the fault trajectory, tip, and a single hanging wall footwall cutoff pair are known, then the folded form of the cover layer can be determined. In these constructions it is necessary to specify pin lines for balancing purposes. These pin lines may or may not have a zero displacement gradient, depending upon the amount of simple shear deformation. Examples are given from both Laramide structures of the western USA and the Appalachians.

  17. Implementation of blinded outcome assessment in the Effective Verruca Treatments trial (EVerT) - lessons learned.

    PubMed

    Cockayne, Sarah; Hewitt, Catherine; Hashmi, Farina; Hicks, Kate; Concannon, Michael; McIntosh, Caroline; Thomas, Kim; Hall, Jill; Watson, Judith; Torgerson, David; Watt, Ian

    2016-01-01

    Trials using inadequate levels of blinding may report larger effect sizes than blinded studies. It has been suggested that blinded outcome assessment in open trials may in some cases be undertaken by assessments of photographs. The aim of this paper is to explore the effect of using different methods to assess the primary outcome in the EVerT (Effective Verruca Treatments) trial. It also aims to give an overview of the experiences of using digital photographs within the trial. We undertook a secondary analysis to explore the effect of using three different methods to assess the primary outcome in the EVerT trial: assessment of digital photographs by blinded healthcare professionals; blinded healthcare professional assessment at the recruiting site and patient self-report. The verruca clearance rates were calculated using the three different methods of assessment. A Cohen's kappa measure of inter-rater agreement was used to assess the agreement between the methods. We also investigated the experiences of healthcare professionals using digital photographs within the trial. Digital photographs for 189 out of 240 (79 %) patients in the trial were received for outcome assessment. Of the 189 photographs, 30 (16 %) were uninterpretable. The overall verruca clearance rates were 21 % (43/202) using the unblinded patient self-reported outcome, 6 % (9/159) using blinded assessment of digital photographs and 14 % (30/210) using blinded outcome assessment at the site. Despite differences in the clearance rates found using different methods of outcome assessment, this did not change the original conclusion of the trial, that there is no evidence of a difference in effectiveness between cryotherapy and salicylic acid. Future trials using digital photographs should consider individual training needs at sites and have a backup method of assessment agreed a priori. ISRCTN Registry ISRCTN18994246.

  18. How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs

    PubMed Central

    Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne

    2014-01-01

    This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to disentangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent–teacher and 19 mother–father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, and the magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent–teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother–father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings. PMID:24994985
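
    The distinction this abstract draws between correlation and agreement can be illustrated with a small sketch. The abstract does not say which ICC form the authors used, so the two-way, absolute-agreement, single-rater form (often written ICC(A,1)) is an assumption here, and the scores and function names are invented for illustration.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_a1(table):
    """Two-way, absolute-agreement, single-rater ICC.

    table: one row per rated child, one column per rater.
    """
    n, k = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(row[j] for row in table) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: the teacher rates every child 5 points higher.
parent = [10, 20, 30, 40, 50]
teacher = [15, 25, 35, 45, 55]
print(f"Pearson r = {pearson_r(parent, teacher):.3f}")
print(f"ICC(A,1)  = {icc_a1(list(zip(parent, teacher))):.3f}")
```

    With a constant offset between raters the Pearson correlation stays at 1 while the absolute-agreement ICC drops below 1 (about 0.95 here), which is one way the two concepts come apart.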

  19. Validity and Reliability of Exposure Assessors’ Ratings of Exposure Intensity by Type of Occupational Questionnaire and Type of Rater

    PubMed Central

    Friesen, Melissa C.; Coble, Joseph B.; Katki, Hormuzd A.; Ji, Bu-Tian; Xue, Shouzheng; Stewart, Patricia A.

    2011-01-01

    Background: In epidemiologic studies that rely on professional judgment to assess occupational exposures, the raters’ accurate assessment is vital to detect associations. We examined the influence of the type of questionnaire, type of industry, and type of rater on the raters’ ability to reliably and validly assess within-industry differences in exposure. Our aim was to identify areas where improvements in exposure assessment may be possible. Methods: Subjects from three foundries (n = 72) and three textile plants (n = 74) in Shanghai, China, completed an occupational history (OH) and an industry-specific questionnaire (IQ). Six total dust measurements were collected per subject and were used to calculate a subject-specific measurement mean, which was used as the gold standard. Six raters independently ranked the intensity of each subject’s current job on an ordinal scale (1–4) based on the OH alone and on the OH and IQ together. Aggregate ratings were calculated for the group, for industrial hygienists, and for occupational physicians. We calculated intra-class correlation coefficients (ICCs) to evaluate the reliability of the raters. We calculated the correlation between the subject-specific measurement means and the ratings to evaluate the raters’ validity. Analyses were stratified by industry, type of questionnaire, and type of rater. We also examined the agreement between the ratings by exposure category, where the subject-specific measurement means were categorized into two and four categories. Results: The reliability and validity measures were higher for the aggregate ratings than for the ratings from the individual raters. The group’s performance was maximized with three raters. Both the reliability and validity measures were higher for the foundry industry than for the textile industry. The ICCs were consistently lower in the OH/IQ round than in the OH round in both industries. In contrast, the correlations with the measurement means were

  20. Temporary blindness after an anterior chamber cosmetic filler injection.

    PubMed

    Kim, Deok-Yeol; Eom, Jin-Sup; Kim, Jae Yong

    2015-06-01

    Blindness is a rare but devastating complication of cosmetic filler injection. A primary cause of blindness following hyaluronic acid filler injection is retrograde intravascular embolization into the small ocular arteries. We report here a case of temporary blindness associated with the injection of hyaluronic acid filler into the anterior chamber of the eye. This is the first report of temporary blindness after cosmetic filler injection into the anterior chamber, and the first described case that recovered completely after the filler was removed. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.